HACKER Q&A
📣 oidar

How are you handling OpenAI's API latency?


I have a fairly complicated prompt setup due to context requirements. I need to make three calls to GPT-3.5 Turbo to get the required output, which comes to about 10k tokens of input and output per user interaction. If the LLM were fast (2-3 seconds per API call) that would be fine, but I'm dealing with around 45-60 seconds of wait time. Embeddings and fine-tuning wouldn't speed this up because the context is constantly shifting.
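
Roughly, the setup looks like this, a minimal sketch assuming the pre-1.0 openai Python SDK and that each call's output feeds the next (the function and prompts here are placeholders, not my actual ones):

    import openai  # pre-1.0 SDK, where openai.ChatCompletion is the chat endpoint

    openai.api_key = "sk-..."  # placeholder

    def chat(messages):
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
        )
        return resp["choices"][0]["message"]["content"]

    def handle_interaction(context, user_input):
        # Three sequential calls; each depends on the previous output,
        # so the per-call latencies add up instead of overlapping.
        step1 = chat([{"role": "system", "content": context},
                      {"role": "user", "content": user_input}])
        step2 = chat([{"role": "system", "content": context},
                      {"role": "user", "content": step1}])
        return chat([{"role": "system", "content": context},
                     {"role": "user", "content": step2}])

Because the calls are strictly sequential, there's nothing to parallelize, and the shifting context means I can't cache or precompute any of the steps.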


  👤 0x6A75616E Accepted Answer ✓
A group I'm working with has found that using the OpenAI models through Azure can give faster response times, though I don't have hard numbers to share right now.
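
If you want to try it, pointing the pre-1.0 openai Python SDK at an Azure OpenAI resource is a small config change. A minimal sketch, where the resource name, key, and deployment name are placeholders you'd replace with your own (on Azure you pass a deployment name via engine rather than a model name):

    import openai

    # Azure OpenAI uses its own endpoint, auth, and deployment model.
    openai.api_type = "azure"
    openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"  # placeholder
    openai.api_version = "2023-05-15"
    openai.api_key = "YOUR-AZURE-KEY"  # placeholder

    resp = openai.ChatCompletion.create(
        engine="my-gpt35-deployment",  # your Azure deployment name, not "gpt-3.5-turbo"
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp["choices"][0]["message"]["content"])

The rest of your call chain should work unchanged, so it's cheap to A/B the latency against the regular OpenAI endpoint.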