Is anyone building a question answering system using the HN corpus?

Question

Today, if someone wants to know what the HN community knows or thinks about a topic, they can either:

A) Search past HN comments on hn.algolia.com, or
B) Post a new 'Ask HN'.

LLMs could provide a new way to find answers within a corpus. Such approaches have been described elsewhere, e.g.

- https://github.com/openai/openai-cookbook/blob/main/examples...
- https://news.ycombinator.com/item?id=34477543

I keep expecting someone (maybe minimaxir or simonw?) to post a 'Show HN: Get your question answered by the collective wisdom of HN', but no one has so far (unless I missed the submission?).

Is someone already working on this?

flemhans · Accepted Answer

I'd love to do this offline, so I could feed it all my mail. Am I right that it's still going to be a while before we can do that? Or could it perhaps work with a weaker model than GPT-3?

olivierduval · Answer

Mmmm... and what about copyright? I mean: may I dump all of HN and then consider it a book to be sold for my own profit? And if I can't do that... what is the difference between this idea and using HN to train an LLM? And what if I don't want my comments to be part of this LLM? Or what about the "trash" accounts that don't want to be identified?

Don't get me wrong: the idea could be nice but... ain't it time to think twice about all this before applying the latest technological fad?

leobg · Answer

Been thinking about this many times. I regularly check what HN thinks about a specific book, what services HN recommends to perform a particular task, etc.

To the sibling comment that asked about doing this locally: there's really no need for an LLM, much less for GPT-3. All you need is, well, attention. Sentence-transformer embeddings. Perhaps even just fastText.
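
A rough sketch of the local retrieval idea, for illustration only. It is not a sentence-transformer or fastText setup: the `embed` function below is a toy bag-of-words stand-in so the example runs with the standard library alone, and in practice you would swap it for real embeddings. The function names and the sample comments are made up for this sketch.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: a real setup would return a dense vector from a
    sentence-embedding model here instead of word counts.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, comments, top_k=3):
    """Return the top_k (score, comment) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in comments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical comments, invented for this example (not real HN data).
COMMENTS = [
    "I use Fastmail for email hosting and it has been reliable.",
    "The best book on compilers is still the Dragon Book.",
    "For email hosting I would recommend running your own server.",
]

if __name__ == "__main__":
    for score, comment in search("email hosting", COMMENTS, top_k=2):
        print(f"{score:.2f}  {comment}")
```

The same shape (embed every comment once, embed the query, rank by cosine similarity) should carry over when `embed` is replaced by a real model; only the vector type changes.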

dyeje · Answer

I would assume OpenAI products are already trained on it, amongst many other sites.

gschoeni · Answer

Has somebody crawled and made a corpus out of hacker news? Is it maintained?