HACKER Q&A
📣 cossatot

How to exhaustively search the scientific literature?


I need a comprehensive database of a certain type of event described in the scientific literature. For what it's worth, the event is a 'paleoearthquake', which is a historic or prehistoric earthquake that is found in the geologic record, usually by digging a trench across a fault line and identifying the disturbances in the geologic strata across or adjacent to the fault and, if possible, dating them via radiocarbon or other geochronological methods. However, I don't think the specifics are particularly important.

The issue is that these are generally reported in the literature from local investigations of one or two faults, yielding a few events. These studies are done wherever there are earthquakes on land, so we have a global scope and language issues. Even limiting the results to the English peer-reviewed literature, however, it's a huge distributed search.

I estimate that there are on the order of 10,000 published events, and a mean of 2-3 events per publication, so roughly 3,300 to 5,000 publications to find.

For my immediate use of the database, it is very important for the database to be as complete as possible--I'm not looking for a sort of statistically representative sample. The literature itself is quite incomplete of course, but we're limited to what exists for now.

Starting with the first step of collating publications, what tools would one use? I have access to most journals through various university affiliations. Are there particular APIs? Web scraping tools? LLMs?

Thanks!


  👤 CamperBob2 Accepted Answer ✓
One option that shouldn't be overlooked: get a temporary subscription to an OpenAI plan that lets you run what they originally called "deep research" (nowadays called "Extended Pro" mode). This isn't available on the freebie chat page; it will require at least a $20/month subscription (and maybe more, not sure).

Then, basically paste your post into the prompt and let it crunch. It will take up to 30 minutes or so, and will often give you a reasonably comprehensive report in which most of the references actually exist. It is absolutely a better-Google-than-Google class of resource.

I'll do that and see if it comes up with anything meaningful, and also try it on Gemini 3.1. For a query like this I wouldn't expect it to return a list of thousands of individual reports, but it might give you some good leads that you can follow up with your existing journal access.

Edit:

GPT results: https://chatgpt.com/share/699df5db-b3d4-800b-b737-224319593e...

Gemini 3.1 Pro results: https://gemini.google.com/share/bd22eb43c13b


👤 snowhale
for systematic completeness at that scale, Semantic Scholar's API (semanticscholar.org/product/api) and OpenAlex (openalex.org) are worth knowing about. both have full-text search, citation graph traversal, and free bulk access -- Semantic Scholar covers 200M+ papers. you can query by keyword, field of study, even author affiliation, then follow citation chains to surface papers that cite your known key works. deep research tools are fine for discovery but won't give you completeness guarantees; a proper systematic review workflow usually combines S2/OpenAlex keyword search + snowballing through citation graphs + dedup by DOI.
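
A rough sketch of that workflow in Python, in case it helps. Everything here is illustrative rather than a vetted pipeline: the function names (`normalize_doi`, `dedup_by_doi`, `snowball`, `search_openalex`) are made up for this example, the title fallback for records without a DOI is my own assumption, and `search_openalex` assumes the OpenAlex `/works` endpoint with its `search`, `per-page` and `cursor` parameters, so check the current API docs before relying on it.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

OPENALEX_WORKS = "https://api.openalex.org/works"
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi):
    """Lowercase a DOI and strip resolver prefixes so duplicates compare equal."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def dedup_by_doi(records):
    """Keep the first record per DOI.

    Records without a DOI fall back to a whitespace-normalized lowercase
    title as the key (an assumption; fuzzy matching may work better).
    """
    seen, unique = set(), []
    for rec in records:
        title = " ".join((rec.get("title") or "").lower().split())
        key = normalize_doi(rec.get("doi")) or "title:" + title
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def snowball(seeds, neighbors, max_depth=2):
    """Breadth-first walk of a citation graph from known key works.

    `neighbors(work_id)` returns the ids of works that cite (or are cited
    by) `work_id`; it could wrap an API call or a local bulk snapshot.
    """
    found = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        work_id, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for nxt in neighbors(work_id):
            if nxt not in found:
                found.add(nxt)
                queue.append((nxt, depth + 1))
    return found


def search_openalex(query, max_pages=5):
    """Cursor-paginated keyword search (network call; parameters assumed)."""
    cursor, results = "*", []
    for _ in range(max_pages):
        params = urllib.parse.urlencode(
            {"search": query, "per-page": 200, "cursor": cursor})
        with urllib.request.urlopen(f"{OPENALEX_WORKS}?{params}") as resp:
            page = json.load(resp)
        results.extend(page["results"])
        cursor = page["meta"].get("next_cursor")
        if not cursor:
            break
    return results
```

The idea would be to run `search_openalex` for several phrasings ("paleoseismology", "paleoearthquake", "trench" plus fault names), merge the results with `dedup_by_doi`, then feed the works you already know are relevant into `snowball` as seeds. Keeping `neighbors` as a plain callable means the same traversal works against either API or a downloaded snapshot.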