HACKER Q&A
📣 williamstein

What is going on regarding quality of service for API access to LLMs?


I saw in the latest ChatGPT Plus announcements that you get better speed if you pay them $20/month. This made me wonder how the speed of the Plus version of ChatGPT compares to the API that we pay for (to integrate ChatGPT into https://cocalc.com). We have had solid usage over the last 2 months, and I keep track of exactly how long the complete response takes for each API request. I just checked the stats, and the average API response times for gpt-3.5-turbo and gpt-4 have both gotten MASSIVELY WORSE for us over time:

    smc=# select model, sum(total_time_s)/count(*) from openai_chatgpt_log where time >= now() - interval '1 weeks' group by model;
         model     |      ?column?      
    ---------------+--------------------
     gpt-4         |  64.17583870967742
     gpt-3.5-turbo | 22.513887411945003
    (2 rows)

    smc=# select model, sum(total_time_s)/count(*) from openai_chatgpt_log where time >= now() - interval '8 weeks' and time <= now() - interval '7 weeks' group by model;
         model     |      ?column?      
    ---------------+--------------------
     gpt-4         |  30.74102777777778
     gpt-3.5-turbo | 10.309548475729441
    (2 rows)

The times have more than doubled on average! (I checked, and the average total tokens per request hasn't changed at all.) Does OpenAI publish any stats about API response times?
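For anyone who wants to track this themselves, here is a minimal sketch of the kind of instrumentation described above: time each API call, then summarize. All names here are illustrative, not CoCalc's actual code, and the sample durations are made up to mimic the two windows in the query output. One caveat worth noting is that an average like `sum(total_time_s)/count(*)` can hide tail latency, so a percentile is worth reporting alongside it.

```python
import statistics
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds).

    time.monotonic() is immune to wall-clock adjustments,
    so it's the right clock for measuring durations.
    """
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start


def latency_summary(durations_s):
    """Mean plus 95th percentile of per-request response times (seconds)."""
    cuts = statistics.quantiles(durations_s, n=20)  # 19 cut points at 5% steps
    return {
        "mean": statistics.fmean(durations_s),  # same as sum(x)/count(x) in SQL
        "p95": cuts[18],  # the 19th of 20 cut points = 95th percentile
    }


# Hypothetical samples resembling the two windows in the post:
last_week = [60.0, 62.0, 66.0, 70.0, 64.0]
eight_weeks_ago = [28.0, 30.0, 33.0, 31.0, 32.0]
print(latency_summary(last_week))
print(latency_summary(eight_weeks_ago))
```

If the p95 drifts up while the mean stays flat, the slowdown is concentrated in a subset of requests (e.g. throttled ones) rather than uniform.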

I also subscribed to ChatGPT Plus, and anecdotally it does seem much faster than the API for us. So maybe OpenAI is increasingly throttling API access for customers who are not marked as special? I wonder if some API users get much faster response times than others.

Given how valuable LLMs are for products like ours, what does this mean for us? Does it mean that relying on API access isn't a viable long-term way to stay competitive? There are other LLM API providers, like Anthropic (and potentially Google), but so far they are vaporware for us, since it's just waitlists forever.

This gives me new appreciation for the approach repl.it is taking of building their own open source models.


  👤 williamstein Accepted Answer ✓
There is an official answer at the end of this thread from somebody at OpenAI claiming that they do not intentionally slow down the API: https://community.openai.com/t/we-proved-the-api-is-intentio...

It sounds like they are just swamped with usage and are trying to keep it working at all…