Now I thought about automating this and programmatically 'scanning' an LLM's implicit knowledge (of a specific domain) and compiling it into some kind of knowledge graph - e.g. an entity-relationship diagram of physics concepts.
Could be interesting. With the right scanning technique, it might be possible to extract a semantic representation of all of an LLM's 'knowledge', or of the information in a document.
Has anyone here already dealt with something like this?
LLMs are very good at dealing with contextual polysemy; the catch is that an embedding of, say, a topic will be far from embeddings of that topic in its different possible contexts. So a knowledge graph would be possible, but how you would find these possible areas, or why you would constrain them, is sort of another question.
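To make that drift concrete, here's a minimal sketch, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model (both my own choices, not anything prescribed here): embed a term on its own and in two different contexts, then compare cosine similarities.

    # Hedged sketch: how far a term's embedding drifts across contexts.
    # Assumes the sentence-transformers package; the model choice is arbitrary.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    texts = [
        "field",                         # bare term
        "a field in abstract algebra",   # math context
        "a field of wheat on the farm",  # agriculture context
    ]
    # Normalized embeddings, so a dot product is cosine similarity.
    emb = model.encode(texts, normalize_embeddings=True)

    print("field vs. algebra:", float(emb[0] @ emb[1]))
    print("field vs. farming:", float(emb[0] @ emb[2]))
    print("algebra vs. farming:", float(emb[1] @ emb[2]))

In my experience the two contextualized embeddings land far apart from each other, which is exactly the "which area of the graph does this node belong to" problem above.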
Now, if you are just asking about education, you can get it to generate lists of relations and map those into concept maps etc. (quite a few tools do this), but that's pretty superficial as such...
FWIW, back when LLMs were more prone to hallucination, I worked on a quiz-generating application (2020/21) that did this: generations whose embeddings fell inside the convex hull of the embeddings of known ground-truth statements were more likely to be truthful and relevant than generations outside it.
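For anyone wanting to reproduce that check: embedding spaces are too high-dimensional for scipy.spatial.ConvexHull, but hull membership reduces to a linear-programming feasibility test. A minimal sketch (the LP formulation is standard; the helper name and the stand-in data are mine):

    # Hedged sketch: is `point` a convex combination of the rows of `points`?
    # Feasibility LP: find lambda >= 0 with sum(lambda) = 1 and lambda @ points = point.
    import numpy as np
    from scipy.optimize import linprog

    def in_convex_hull(point, points):
        points = np.asarray(points)
        n = points.shape[0]
        A_eq = np.vstack([points.T, np.ones(n)])  # combination + sum-to-one constraints
        b_eq = np.append(point, 1.0)
        res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
        return res.success

    # Usage idea: embed the ground-truth statements as rows, embed a candidate
    # generation, and flag the candidate if it falls outside the hull.
    truths = np.random.rand(50, 8)       # stand-in for ground-truth embeddings
    candidate = truths.mean(axis=0)      # a convex combination, so inside
    print(in_convex_hull(candidate, truths))  # True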
In my opinion, though, you should try to embrace this malleable nature rather than constrain it...
For example, if you ask an LLM to create a graph of all proteins related to X disease, and show how they interact, it will oblige. (You can try this yourself easily in the OpenAI playground. Just ask it to send you back a list like X -> Y -> Z or whatever. Or an array of source/target/relation triplets.)
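A minimal sketch of that playground experiment as a script, assuming the official openai Python client; the model name, the disease, and the prompt wording are all my own placeholders:

    # Hedged sketch: asking a chat model for source/relation/target triplets.
    # Assumes the `openai` package and an OPENAI_API_KEY in the environment.
    import json
    from openai import OpenAI

    client = OpenAI()

    prompt = (
        "List proteins related to Alzheimer's disease and how they interact. "
        "Reply with only a JSON array of objects with keys "
        '"source", "relation", "target".'
    )

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; any chat model works
        messages=[{"role": "user", "content": prompt}],
    )

    # Real outputs may wrap the JSON in markdown fences and need stripping first.
    triplets = json.loads(resp.choices[0].message.content)
    for t in triplets:
        print(f'{t["source"]} -[{t["relation"]}]-> {t["target"]}')

Run it twice and you'll likely get two different graphs, which is the phrasing-dependence problem below.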
The challenge is that what you get will be very dependent on how you phrase your request. So you’ll never know if you’re getting a “complete” graph or just the most probable graph for the request you made. If you’re an expert in the domain, you’ll know, but if you’re an expert you might not need the graph in the first place.
Sample: https://twitter.com/yoheinakajima/status/1706848028014068118