HACKER Q&A
📣 lagrange77

Extracting Knowledge Graphs from LLMs


Today I was asking my LLM about some physics concepts and found myself repeatedly asking it for the relation between some of those concepts.

Now I thought about automating this: programmatically 'scanning' the LLM's implicit knowledge (of a specific domain) and compiling it into some kind of knowledge graph, e.g. an entity-relationship diagram of physics concepts.

Could be interesting. With the right scanning technique, it may be possible to extract a semantic representation of all of the LLM's 'knowledge', or of the information in a document.
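As a minimal sketch of such a scan (Python): loop over every pair of concepts, ask for their relation, and keep the labelled edges. `fake_ask_relation` is a made-up stand-in for a real model call and its canned answers are only illustrative; nothing here is a specific library's API.

```python
from itertools import combinations
from typing import Callable, Optional


# Hypothetical stand-in for a real model call: returns a short relation label
# for a pair of concepts, or None when it sees no direct relation.
# A real version would send a prompt to an LLM and parse the reply.
def fake_ask_relation(a: str, b: str) -> Optional[str]:
    canned = {
        ("force", "mass"): "proportional to (at fixed acceleration)",
        ("energy", "force"): "is the path integral of",
    }
    return canned.get((a, b))


def scan_relations(concepts, ask: Callable[[str, str], Optional[str]]):
    """Query every unordered pair of concepts once and collect labelled edges."""
    edges = []
    for a, b in combinations(sorted(concepts), 2):
        relation = ask(a, b)
        if relation is not None:
            edges.append((a, relation, b))
    return edges


edges = scan_relations(["mass", "force", "energy"], fake_ask_relation)
for source, relation, target in edges:
    print(f"{source} --[{relation}]--> {target}")
```

The number of pairs grows quadratically with the number of concepts, so a real scan would probably need some way to prune pairs before querying.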

Has anyone here already dealt with something like this?


  👤 james-revisoai Accepted Answer ✓
Look at the Kaggle competitions that use embeddings, such as those by the Learning Agency. There was also a paper two weeks back and one two days back about inverting embeddings back to original semantic forms.

LLMs are very good at dealing with contextual polysemy. The catch is that an embedding of, say, a topic will be far from that topic as it appears in its different possible contexts. So a knowledge graph would be possible, but how you would find these possible areas, or why you would constrain it, is sort of another question.

Now if you are just asking about education, you can get it to generate lists of relations and map those into concept maps etc. (quite a few tools do this), but that's pretty superficial, as such...

FWIW, when I worked on a quiz-generating application in 2020/21, back when LLMs were more prone to hallucination, generations whose embeddings fell inside the convex hull of the embeddings of known ground-truth statements were more likely to be truthful and relevant than those outside it.
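One way to sketch that hull check in plain Python: compute the distance from a candidate vector to the convex hull of the reference vectors and treat "distance near zero" as "inside". This uses Frank-Wolfe iterations, which is my choice for the sketch and not necessarily what the original application did. The 2-D vectors are toy stand-ins for real embeddings, and `distance_to_hull` / `in_hull` are hypothetical names.

```python
import math


def distance_to_hull(x, points, max_iter=1000, gap_tol=1e-12):
    """Approximate Euclidean distance from x to the convex hull of points.

    Frank-Wolfe with exact line search; the iterate y always stays a
    convex combination of the given points.
    """
    y = list(points[0])
    for _ in range(max_iter):
        grad = [yi - xi for yi, xi in zip(y, x)]  # gradient of 0.5 * ||y - x||^2
        # Linear minimisation step: hull vertex with the smallest grad . p
        vertex = min(points, key=lambda p: sum(g * pi for g, pi in zip(grad, p)))
        d = [vi - yi for vi, yi in zip(vertex, y)]
        gap = -sum(g * di for g, di in zip(grad, d))  # Frank-Wolfe duality gap
        if gap <= gap_tol:
            break
        step = min(1.0, gap / sum(di * di for di in d))  # exact line search, clamped
        y = [yi + step * di for yi, di in zip(y, d)]
    return math.dist(x, y)


def in_hull(x, points, tol=1e-5):
    """True when x is (numerically) inside the hull of points."""
    return distance_to_hull(x, points) <= tol


# Toy 2-D "embeddings" of ground-truth statements (illustrative only).
truth = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(in_hull([0.25, 0.25], truth))  # candidate inside the triangle
print(in_hull([1.0, 1.0], truth))    # candidate outside the triangle
```

With real embeddings the dimension is usually far larger than the number of reference statements, so the hull is very thin and almost nothing lands exactly inside it; in that setting a distance threshold is probably more useful than a strict inside/outside test.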

In my opinion, though, you should try to embrace this malleable nature rather than constrain it...


👤 simonmesmith
Having worked on this problem in biology, I think one of the challenges you’ll find is that the knowledge graph will be extremely context-dependent and biased towards highly probable nodes and edges.

For example, if you ask an LLM to create a graph of all proteins related to X disease, and show how they interact, it will oblige. (You can try this yourself easily in the OpenAI playground. Just ask it to send you back a list like X -> Y -> Z or whatever. Or an array of source/target/relation triplets.)
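A minimal sketch (Python) of parsing both reply shapes into something graph-like. The reply text is a hypothetical example with placeholder names (`protein_A` etc. are not real biology), and `parse_chain` / `parse_triplets` are names made up for this sketch.

```python
import json
from collections import defaultdict


def parse_chain(line):
    """Turn 'X -> Y -> Z' into [('X', 'Y'), ('Y', 'Z')]."""
    nodes = [part.strip() for part in line.split("->") if part.strip()]
    return list(zip(nodes, nodes[1:]))


def parse_triplets(reply_text):
    """Turn a JSON array of source/target/relation objects into an adjacency map."""
    graph = defaultdict(list)
    for item in json.loads(reply_text):
        graph[item["source"]].append((item["relation"], item["target"]))
    return dict(graph)


# Hypothetical model reply; the names are placeholders, not real proteins.
reply = """[
  {"source": "protein_A", "target": "protein_B", "relation": "activates"},
  {"source": "protein_B", "target": "protein_C", "relation": "inhibits"}
]"""

print(parse_chain("X -> Y -> Z"))
print(parse_triplets(reply))
```

Real replies often wrap the JSON in extra prose or drop a field, so an actual pipeline would need more defensive parsing than this.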

The challenge is that what you get will be very dependent on how you phrase your request. So you’ll never know if you’re getting a “complete” graph or just the most probable graph for the request you made. If you’re an expert in the domain, you’ll know, but if you’re an expert you might not need the graph in the first place.
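One rough way to at least measure that phrasing dependence is to request the graph several times with different wordings and compare the edge sets. A minimal sketch in Python, using hand-written edge sets in place of real model output; `jaccard` and `edge_support` are hypothetical helper names.

```python
from collections import Counter


def jaccard(edges_a, edges_b):
    """Overlap between two edge sets: |A & B| / |A | B| (1.0 if both are empty)."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def edge_support(runs):
    """Fraction of runs in which each edge appears."""
    counts = Counter(edge for run in runs for edge in set(run))
    return {edge: n / len(runs) for edge, n in counts.items()}


# Hand-written stand-ins for the edges returned under two prompt phrasings.
run_1 = [("A", "B"), ("B", "C")]
run_2 = [("A", "B"), ("C", "D")]

print(jaccard(run_1, run_2))
print(edge_support([run_1, run_2]))
```

Low overlap tells you the output is sensitive to the prompt; high overlap still does not prove the graph is complete, only that the model keeps returning the same most probable edges.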



👤 birdplanellama
Look into tostino/Inkbot on Hugging Face.