HACKER Q&A
📣 alaeddine-13

What do you think of continuously learning LLMs in production


What do you think of continuous learning in LLM apps? Current LLM technology supports generation and offline training, but offers little support for continuous learning. This means that LLM apps cannot directly benefit from feedback acquired in production, for instance from users or the environment. I see use cases this could enable: a coding assistant learning from interpreter results, a customer-support LLM agent learning from user feedback, an AI browser learning from a failed tool call, or an AI worker going through initial onboarding or training. The current state of research can enable some of this, e.g. RAG, DPO, KTO, or model editing, but issues like sample efficiency and catastrophic forgetting remain.

The big question, though, is whether learning should always happen offline or whether it would be useful to have models learn in production. Let me hear your thoughts.
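To make the "learning from production feedback" idea concrete, here is a minimal sketch of one offline-friendly approach: logging production signals (e.g. interpreter success vs. failure) as preference pairs that a DPO/KTO-style trainer could later consume. The `FeedbackLogger` class and the thumbs-up/down framing are illustrative assumptions, not an existing API.

```python
# Hypothetical sketch: turning production feedback into DPO-style
# preference pairs. FeedbackLogger is an invented name for illustration.
import json
from dataclasses import dataclass, asdict

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response the feedback signal approved
    rejected: str  # response the feedback signal rejected

class FeedbackLogger:
    """Accumulates production feedback into a preference dataset."""
    def __init__(self):
        self.pairs = []

    def log(self, prompt, good_response, bad_response):
        self.pairs.append(PreferencePair(prompt, good_response, bad_response))

    def export_jsonl(self):
        # One JSON object per line, a common input format for DPO trainers.
        return "\n".join(json.dumps(asdict(p)) for p in self.pairs)

logger = FeedbackLogger()
# e.g. a coding assistant: interpreter result as the feedback signal
logger.log(
    prompt="Write a function that reverses a string.",
    good_response="def rev(s): return s[::-1]",
    bad_response="def rev(s): return s.reverse()",  # failed in the interpreter
)
print(logger.export_jsonl())
```

This still trains offline in batches, which sidesteps catastrophic forgetting from per-sample online updates, at the cost of a delay between feedback and behavior change.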


  👤 dougmwne Accepted Answer ✓
I’m not sure what the use case is outside of having real-time information about the world as a search replacement.

For example, if a major event or discovery happened, the model would know about it once a critical mass of news stories and discussions had been generated online. You’d probably be looking at a few days before the content accumulated to the point where it would affect the model weights. Those weights encode essentially all digitized human knowledge, so a single news article among a training set of trillions of tokens is not enough to move them.

If you want a long-term memory of user interactions, long context and RAG seem to do the job nicely, since a single fact can be pulled out of a context of millions of tokens.
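The retrieval-as-memory idea above can be sketched in a few lines. This is a toy version, assuming a keyword-overlap scorer in place of a real embedding model and vector store; the function names and example memories are invented for illustration.

```python
# Toy sketch of RAG as long-term memory: store facts from past
# interactions, then retrieve the most relevant one for a new query.
import re

def tokens(text):
    # Lowercase word tokens; a stand-in for a real embedding.
    return set(re.findall(r"[a-z']+", text.lower()))

def score(query, memory):
    # Keyword overlap instead of cosine similarity over embeddings.
    return len(tokens(query) & tokens(memory))

def retrieve(query, memories, k=1):
    return sorted(memories, key=lambda m: score(query, m), reverse=True)[:k]

memories = [
    "The user's name is Ada.",
    "The user prefers dark mode.",
    "The user's favorite language is OCaml.",
]
print(retrieve("What is the user's favorite language?", memories))
```

The retrieved snippet would then be prepended to the LLM prompt, so the app "remembers" without any weight updates.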