1. I pay for all user prompts, even for duplicate ones.
2. I am at the response-time mercy of the LLM API.
I could easily cache all prompts locally in a KV store and simply return the cached answer for duplicates. Why isn't everyone doing this?
I assume one reason is that LLM responses are not deterministic, since the same query can return different responses, but that could be handled with a "forceRefresh" parameter on the query.
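For reference, here's roughly what I have in mind: a minimal in-memory sketch in Node/TypeScript. `callLLM` is just a placeholder for whatever API client is actually used, and in practice the `Map` would be Redis or some other persistent KV store.

```ts
import { createHash } from "crypto";

// In-memory stand-in for a real KV store (Redis, DynamoDB, etc.).
const cache = new Map<string, string>();

// Placeholder for the actual LLM API call (OpenAI, Anthropic, ...).
async function callLLM(prompt: string): Promise<string> {
  // ... real API request goes here ...
  return `response for: ${prompt}`;
}

// Return the cached answer for an identical prompt unless forceRefresh is set.
async function cachedCompletion(prompt: string, forceRefresh = false): Promise<string> {
  const key = createHash("sha256").update(prompt).digest("hex");
  if (!forceRefresh && cache.has(key)) {
    return cache.get(key)!; // duplicate prompt: no API cost, no API latency
  }
  const answer = await callLLM(prompt);
  cache.set(key, answer);
  return answer;
}
```

This only hits for byte-identical prompts, which may be part of the answer to my own question: in many apps exact duplicates are rare, so the cache hit rate would be low unless you go to semantic/similarity caching.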