HACKER Q&A
📣 ChrisMartin33

Is it possible to create an uncensored "science chatbot"?


For starters I would like to have a chatbot who knows the scientific literature in a certain field, like climate science. Then I would like to be able to ask questions which are answered with pure scientific facts only, just based on scientific literature. Without ever answering with "...this is not scientifically proven, but the consensus is..."


  👤 LucieXLiu Accepted Answer ✓
Chatbots with retrieval access to scientific literature exist. I tried two of them and built a third one myself. They are good for surfacing references that might have escaped your attention, but getting a coherent, critically thought-through answer from them is difficult. For now, a RAG setup can give users the illusion that it answers with a coherent understanding of a body of knowledge, but it is really pretty scatter-brained, and the UI should make sure users open the paper and check its quality themselves before using it as a reference. There is work to be done on improving the retrieval and fine-tuning, but I haven't tried that yet. Getting human feedback on these responses also requires expertise in extremely finely divided subfields, which is somewhat harder than labeling cats and dogs.
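For anyone wondering what the retrieval half of such a RAG setup looks like, here is a minimal sketch (not my actual system): toy bag-of-words retrieval over a hypothetical corpus of abstract snippets, with the retrieved passages stuffed into a prompt. The corpus entries and paper IDs are made up for illustration, and the LLM call itself is left out.

```python
import math
from collections import Counter

# Hypothetical corpus standing in for real paper abstracts.
CORPUS = {
    "paper_a": "ocean heat content has increased steadily since 1970",
    "paper_b": "climate sensitivity estimates range from 2.5 to 4 degrees",
    "paper_c": "reproducibility of priming studies in psychology is low",
}

def tokenize(text):
    return text.lower().split()

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, k=2):
    # Rank papers by similarity to the query; real systems use
    # dense embeddings, but the shape of the pipeline is the same.
    q = Counter(tokenize(query))
    scored = sorted(CORPUS.items(),
                    key=lambda kv: cosine(q, Counter(tokenize(kv[1]))),
                    reverse=True)
    return [pid for pid, _ in scored[:k]]

def build_prompt(query):
    # Prepend the retrieved excerpts so the model has to cite them.
    ids = retrieve(query)
    context = "\n".join(f"[{pid}] {CORPUS[pid]}" for pid in ids)
    return (f"Answer using only these excerpts, citing [paper_id]:\n"
            f"{context}\n\nQ: {query}")

print(build_prompt("how much has ocean heat content increased"))
```

The scatter-brained behavior I mentioned lives exactly here: the answer can only be as coherent as whatever the top-k retrieval happens to pull in.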

Another issue with the rapid-fire facts Q&A setup is that it is at odds with the scientific method. Science is a dynamic process of observing patterns, proposing hypotheses to explain them, and testing those hypotheses with experiments to produce theories. In your question, "pure fact" probably refers to a theory unanimously supported by good experiments, and "BS" is the opposite of that. However, there is a lot of ground between fact and BS, because a theory can be supported by early experiments and then proven false by later ones. The newer the theory, the more likely it is to sit somewhere in the middle of this spectrum.

Many experimental fields are having a bit of a reproducibility crisis, and many theoretical fields are divided over theories that are difficult to test. So for the newer literature you really need to judge for yourself, and for the older literature there is a real risk that somebody has come up with new observations that prove it wrong, even if it has gathered a lot of citations over the years.


👤 PurpleRamen
> For starters I would like to have a chatbot who knows the scientific literature in a certain field,

You can feed it whatever you want. How much censoring (of what, exactly?) happens is up to you.

> Then I would like to be able to ask questions which are answered with pure scientific facts only,

That would demand a very deep level of understanding, which today's AIs are not able to deliver.

> Without ever answering with "...this is not scientifically proven, but the consensus is..."

That would not be scientific. To understand the scientific literature in the first place, one needs to evaluate the papers, rate their value, and judge how reliable they are within the greater known picture. Taking them blindly would be worthless; that would just be connecting words and believing the result is science. That's basically what all LLMs already do, and the reason why they lie/hallucinate.

And evaluating texts and putting them into a proper context is not censoring, if that is your impression.


👤 jruohonen
> For starters I would like to have a chatbot who knows the scientific literature in a certain field, like climate science.

Mission impossible, as most of the relevant literature is paywalled.

> Then I would like to be able to ask questions which are answered with pure scientific facts only, just based on scientific literature.

Mission impossible because LLMs cannot evaluate facts. Nor can they do source criticism.