HACKER Q&A
📣 cactusplant7374

What is Substack doing with HN data?


After I posted a link to my website on HN, I noticed this in my logs:

44.195.67.189 - - [30/Apr/2023:20:38:44 +0000] "GET / HTTP/1.1" 200 11321 "-" "SubstackContentFetch/1.0 (https://substack.com/)"

I've never seen this user agent before.


  👤 KomoD Accepted Answer ✓
My guess is that it's the Open Graph[0] crawler for Substack Notes[1], the Twitter-like product that Substack is building. When someone posts a link to your blog there, it visits the page and grabs the meta tags to display a link preview.

[0]: https://ogp.me/

[1]: https://substack.com/notes
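
For anyone wondering what "grabs the meta tags" amounts to, here is a minimal sketch in Python of how a link-preview fetcher might read Open Graph properties from a page. This is not Substack's code. The `OpenGraphParser` class, the `extract_open_graph` helper, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        # Keep only Open Graph properties; the first occurrence wins.
        if prop.startswith("og:") and content is not None:
            self.tags.setdefault(prop, content)


def extract_open_graph(html):
    """Return a dict of og:* properties found in the given HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


# Made-up page, standing in for whatever the crawler downloads.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="My blog">
<meta property="og:description" content="Notes on things">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_open_graph(SAMPLE_HTML))
```

A real fetcher would first download the page over HTTP and then feed the response body to something like this. That download is the `GET /` that shows up in the logs.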

Edit: I was indeed correct. I went on Substack Notes and typed in a link to my own site. The bad part is that it crawls as you type your post out, so instead of one request you get several!

> Sun Apr 30 2023 23:56:03 GMT+0000 (Coordinated Universal Time)]: | Ip=34.200.242.86 | Req_page=/?substack_notes | Agent=SubstackContentFetch/1.0 (https://substack.com/)


👤 verdverm
Was it a Substack employee clicking the link while on the corporate VPN?

tl;dr - you won't know much from a single log entry

generally, there are many groups scraping HN and the links found there
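
To look past a single entry, one option is to tally requests per user agent across the whole access log. Below is a rough sketch in Python. It assumes the server writes the common "combined" log format, which the line in the question appears to follow. `LOG_PATTERN`, `count_user_agents`, and the second sample line are hypothetical and only there for illustration.

```python
import re
from collections import Counter

# Assumed layout: ip ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_user_agents(lines):
    """Count how many requests each user agent made; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


SAMPLE_LINES = [
    # The entry from the question.
    '44.195.67.189 - - [30/Apr/2023:20:38:44 +0000] "GET / HTTP/1.1" 200 11321 '
    '"-" "SubstackContentFetch/1.0 (https://substack.com/)"',
    # Hypothetical entry, invented for this example.
    '203.0.113.5 - - [30/Apr/2023:20:39:10 +0000] "GET / HTTP/1.1" 200 11321 '
    '"-" "ExampleBot/0.1"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE_LINES).most_common():
        print(hits, agent)
```

Run against the real log (for example by passing an open file object instead of `SAMPLE_LINES`), this gives a quick view of which crawlers showed up after the HN post and how often.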