HACKER Q&A
📣 rahimnathwani

Is anyone building a question answering system using the HN corpus?


Today, if someone wants to know what the HN community knows/thinks about a topic, they can either:

A) Search past HN comments on hn.algolia.com, or

B) Post a new 'Ask HN'.
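
For reference, option A can also be done programmatically. Below is a minimal sketch, assuming the public HN Search API behind hn.algolia.com accepts `query`, `tags` and `hitsPerPage` parameters and returns hits with `author` and `comment_text` fields. The helper names are hypothetical, and the endpoint and field names should be checked against the API's own documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public HN Search API endpoint (the service behind hn.algolia.com).
API_URL = "https://hn.algolia.com/api/v1/search"


def build_search_url(query, tags="comment", hits_per_page=20):
    """Build a search URL restricted to items carrying the given tag."""
    params = {"query": query, "tags": tags, "hitsPerPage": hits_per_page}
    return API_URL + "?" + urlencode(params)


def extract_comments(payload):
    """Pull (author, text) pairs out of a decoded JSON response.

    Assumes each hit carries 'author' and 'comment_text' keys; hits
    without comment text (e.g. stories) are skipped.
    """
    results = []
    for hit in payload.get("hits", []):
        text = hit.get("comment_text")
        if text:
            results.append((hit.get("author"), text))
    return results


def search_comments(query):
    """Fetch and parse matching comments (performs a network request)."""
    with urlopen(build_search_url(query), timeout=10) as resp:
        return extract_comments(json.load(resp))
```

Calling `search_comments("password manager")` would return raw matching comments, which is exactly the limitation: the reader still has to synthesize an answer from them.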

LLMs could provide a new way to find answers within a corpus. Such approaches have been described elsewhere, e.g.

- https://github.com/openai/openai-cookbook/blob/main/examples...

- https://news.ycombinator.com/item?id=34477543

I keep expecting someone (maybe minimaxir or simonw?) to post a 'Show HN: Get your question answered by the collective wisdom of HN', but no one has so far (unless I missed the submission?).

Is someone already working on this?


  👤 flemhans Accepted Answer ✓
I'd love to do this offline, so I could feed it all my mail. Am I right that it's still going to be a while before we can do that? Or could it perhaps be done with a less capable model than GPT-3?

👤 olivierduval
Hmm... and what about copyright? I mean: may I dump all of HN and then treat it as a book to be sold for my own profit? And if I can't do that, what is the difference between it and using HN to train an LLM? And what if I don't want my comments to be part of this LLM? Or what about the "trash" (throwaway) accounts that don't want to be identified?

Don't get me wrong: the idea could be nice but... isn't it time to think twice about all this before applying the latest technological fad?


👤 leobg
Been thinking about this many times. I regularly check what HN thinks about a specific book, what services HN recommends to perform a particular task, etc.

To the sibling comment that asked about doing this locally: there’s really no need for an LLM, much less for GPT-3. All you need is, well, attention. Sentence-transformer embeddings. Perhaps even just fastText.
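
A minimal sketch of that retrieval-only idea: embed every comment, embed the question, and rank by cosine similarity. Note that `toy_embed` below is a hypothetical stand-in (a hashed bag-of-words vector), not a sentence-transformer. It is used only so the example runs offline with the standard library. In practice one would pass a real embedding function, for instance a sentence-transformers or fastText model, as the `embed` argument.

```python
import math
import re
import zlib


def toy_embed(text, dim=512):
    """Hypothetical stand-in embedding: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 gives a hash that is stable across runs, unlike built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def top_k(query, comments, k=3, embed=toy_embed):
    """Return the k (score, comment) pairs most similar to the query.

    Vectors are unit length, so the dot product equals cosine similarity.
    """
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(c))), c) for c in comments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


comments = [
    "I recommend Fastmail for email hosting",
    "The Rust borrow checker is strict",
    "Postgres is a great default database",
]
best = top_k("which email hosting do you recommend", comments, k=1)
```

With a real sentence embedding the ranking would capture paraphrases rather than just shared words, and for a corpus the size of HN the comment vectors would be precomputed and stored in an index instead of being re-embedded per query.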


👤 dyeje
I would assume OpenAI products are already trained on it, amongst many other sites.

👤 gschoeni
Has anybody crawled Hacker News and made a corpus out of it? Is it maintained?