I think you could hack a way to do this with BrowserBox, which, funnily enough, already "replaces web pages with their screenshots" (we stream screenshots or a screencast from the headless browser and serve that).
However, to get what you really need, I see some difficulties:
- you could scroll the entire page first, stitch the screenshots together, and then stop updating, but this breaks in some weird edge cases:
  - dynamically loaded content that changes the scroll height as you move down will not be loaded or displayed reliably with this method.
  - that height change will then throw off any clicks that land below it.
  - this will still not prevent banners and the like from appearing.
- you could use the --print-to-pdf command-line flag for Chromium-based browsers (Chrome, Edge, Brave). I'm not sure whether hyperlinks work in the resulting PDFs. If they do, you could set up a basic proxy that takes all HTTP and HTTPS requests and passes them through your headless "to pdf" render process.
  - if hyperlinks don't work in these PDFs, you could use Playwright (or CDP if you're braver) to evaluate the "document.links" property (see: https://developer.mozilla.org/en-US/docs/Web/API/Document/li...), build a simple "list of links" HTML page templated from the result, and append it to the bottom of your PDF document (using some command-line tool alongside your to-pdf proxy).
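
For what it's worth, the document.links fallback could look roughly like the sketch below, using Playwright's Python sync API. The function names (render_links_page, page_to_pdf_with_links) and the output file names are made up for illustration, and the code assumes Playwright and its Chromium build are installed. It writes two PDFs and leaves the merge step to whatever PDF tool you prefer.

```python
import html
from pathlib import Path


def render_links_page(links):
    """Build a minimal HTML page listing (text, href) pairs as clickable links."""
    items = []
    for text, href in links:
        # Fall back to the URL itself when the anchor has no visible text.
        label = html.escape(text.strip() or href)
        items.append(f'<li><a href="{html.escape(href, quote=True)}">{label}</a></li>')
    return "<!doctype html>\n<title>Links</title>\n<ul>\n" + "\n".join(items) + "\n</ul>\n"


def page_to_pdf_with_links(url, out_dir):
    """Hypothetical helper: save the page as a PDF plus a second PDF listing its links."""
    # Imported here so render_links_page stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default; page.pdf needs headless Chromium
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        # document.links covers every <a> and <area> element that has an href.
        links = page.evaluate(
            "() => Array.from(document.links, a => [a.textContent || '', a.href])"
        )
        page.pdf(path=str(out / "page.pdf"))
        # Reuse the same tab to render the generated link list.
        page.set_content(render_links_page(links))
        page.pdf(path=str(out / "links.pdf"))
        browser.close()
    return out / "page.pdf", out / "links.pdf"
```

Calling page_to_pdf_with_links("https://example.com/", "out") should leave out/page.pdf and out/links.pdf behind, which you can then concatenate.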
Anyway, fascinating idea, good luck with it! :)
Search engine: archive.is
Shortcut: a
URL with %s in place of query: https://archive.is/?run=1&url=%s&
This effectively makes 99% of pages static. Based on past experience, I believe the archiver allows JS during load but disables it at runtime (though I haven't actually verified that).
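
If you'd rather build the same URL from a script, here is a small sketch of what the browser does with the %s placeholder. The helper name archive_url is made up, and I'm assuming the typed URL gets fully percent-encoded before substitution; browsers may differ slightly in which characters they escape.

```python
from urllib.parse import quote

# Same template as the search engine entry above.
TEMPLATE = "https://archive.is/?run=1&url=%s&"


def archive_url(page_url):
    """Substitute a percent-encoded page URL into the search template."""
    # safe="" also encodes ':' and '/', so the target URL survives as one query value.
    return TEMPLATE.replace("%s", quote(page_url, safe=""))


print(archive_url("https://example.com/a?b=1"))
```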
There is also an unofficial Chrome extension with similar functionality, but it is closed source and requires permission to read your entire browsing history for some reason, so...
If you truly want a screenshot, you can write a program that spins up a headless browser (e.g. via Selenium), loads the page, and saves the result somewhere. People do this all the time as part of UI and integration tests, and the tooling is decent enough.
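
A minimal sketch of that kind of program, using Playwright's Python API rather than Selenium because its screenshot call has a full_page option. The function names, the "shots" directory, and the viewport size are arbitrary choices for illustration; it assumes Playwright and its Chromium build are installed.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def screenshot_path(url, out_dir="shots"):
    """Derive a filesystem-friendly PNG path from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parsed.netloc + parsed.path).strip("-")
    return Path(out_dir) / f"{slug or 'page'}.png"


def capture(url, out_dir="shots"):
    """Load the page in headless Chromium and save a full-page screenshot."""
    # Imported here so screenshot_path can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    target = screenshot_path(url, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        page = browser.new_page(viewport={"width": 1280, "height": 800})
        page.goto(url, wait_until="networkidle")
        # full_page=True captures the whole scrollable page, not just the viewport.
        page.screenshot(path=str(target), full_page=True)
        browser.close()
    return target
```

Content that only loads as you scroll can still be missing from the image, so treat the result as a best effort.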