Does HN scrape submitted links?

Question

I was very intrigued when I submitted this to HN: https://news.ycombinator.com/item?id=32459886. The link I actually submitted was https://www.theatlantic.com/science/archive/2022/08/bird-fli..., but HN automatically changed the domain to quantamagazine.org and marked the submission as [dupe], which is correct given that the same article was submitted 10 days ago: https://news.ycombinator.com/item?id=32342139.

How was HN able to determine that the article from The Atlantic is the same as the one from Quanta Magazine? They don't share a title or a URL structure. The only explanation I can imagine is that there was some scraping involved. Any other ideas?

detaro · Accepted Answer

HN follows rel=canonical headers/tags
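
For illustration, a republished article can point at its original source with a `<link rel="canonical" href="...">` tag in the page's `<head>`, and a site that fetches the submitted page can read that value and swap in the original URL. Below is a minimal sketch of the lookup using only Python's standard library. It is not HN's actual implementation; the names `CanonicalLinkParser` and `find_canonical_url` and the `example.org` URL are made up for the example.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, e.g. rel="canonical nofollow".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


# Made-up page standing in for a republished article.
sample_page = """
<html><head>
<title>Republished article</title>
<link rel="canonical" href="https://example.org/original-article">
</head><body>...</body></html>
"""

print(find_canonical_url(sample_page))
```

Two submissions whose pages resolve to the same canonical URL can then be treated as the same story, even when the titles and URL paths differ.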

seydor · Answer

I think it's manual. Scraping is not really easy nowadays, with Cloudflare stopping you.