HACKER Q&A
📣 oidar

How are you handling OpenAI's API latency?


I have a fairly complicated prompt setup due to context requirements. I need to make three calls to GPT-3.5 Turbo to get the required output, which comes to about 10k tokens of input and output per user interaction. If the LLM were fast (2-3 seconds per API call) that would be fine, but I'm dealing with around 45-60 seconds of wait time. Embeddings and fine-tuning wouldn't speed this up because the context is constantly shifting.
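
Roughly, the setup looks like this, a minimal sketch assuming the pre-1.0 openai Python SDK and that each call's output feeds the next (the function and prompts here are placeholders, not my actual ones):

    import openai  # pre-1.0 SDK, where openai.ChatCompletion is the chat endpoint

    openai.api_key = "sk-..."  # placeholder

    def chat(messages):
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
        )
        return resp["choices"][0]["message"]["content"]

    def handle_interaction(context, user_input):
        # Three sequential calls; each depends on the previous output,
        # so the per-call latencies add up instead of overlapping.
        step1 = chat([{"role": "system", "content": context},
                      {"role": "user", "content": user_input}])
        step2 = chat([{"role": "system", "content": context},
                      {"role": "user", "content": step1}])
        return chat([{"role": "system", "content": context},
                     {"role": "user", "content": step2}])

Because the calls are strictly sequential, there's nothing to parallelize, and the shifting context means I can't cache or precompute any of the steps.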


  👤 0x6A75616E Accepted Answer ✓
A group I'm working with has found that using the OpenAI models through Azure can give faster response times, though I don't have hard numbers to share right now.
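
If you want to try it, pointing the pre-1.0 openai Python SDK at an Azure OpenAI resource is a small config change. A minimal sketch, where the resource name, key, and deployment name are placeholders you'd replace with your own (on Azure you pass a deployment name via engine rather than a model name):

    import openai

    # Azure OpenAI uses its own endpoint, auth, and deployment model.
    openai.api_type = "azure"
    openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"  # placeholder
    openai.api_version = "2023-05-15"
    openai.api_key = "YOUR-AZURE-KEY"  # placeholder

    resp = openai.ChatCompletion.create(
        engine="my-gpt35-deployment",  # your Azure deployment name, not "gpt-3.5-turbo"
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp["choices"][0]["message"]["content"])

The rest of your call chain should work unchanged, so it's cheap to A/B the latency against the regular OpenAI endpoint.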