HACKER Q&A
📣 pipeline_peak

How do RSS readers handle items missing pubDates?


I'm writing an RSS reader, and I noticed that items are often missing publication dates. Even so, other readers seem to be able to get them anyway; I'm guessing they pull them from the actual site.

Is this generally done through web scraping, archival information, web crawler data, or something else entirely?

I'm asking because I would hate to resort to those types of methods as they are generally heavy on resources and unstable.


  👤 Minor49er Accepted Answer ✓
It's possible that the feed reader attaches the date itself, based on when the item first appears in the feed, since the reader keeps polling the same feed over time.

👤 warrenm
When I've done it in the past, I've just attached the current timestamp from when the item is checked as the pubDate.

It might be wrong, but it's usually pretty dang close to right (presuming the RSS feed isn't brand-new to the reader)
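A minimal sketch of that fallback, assuming items can be keyed by a guid or link. The `FirstSeenStore` class and its `effective_date` method are hypothetical names for illustration, not taken from any particular reader. The arrival time is recorded only on the first poll that sees the item, so later polls don't shift the date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


class FirstSeenStore:
    """Remembers when each item was first seen, keyed by guid or link (hypothetical helper)."""

    def __init__(self):
        self._first_seen = {}

    def effective_date(self, item_id, pub_date, now=None):
        now = now or datetime.now(timezone.utc)
        # Record the arrival time only the first time the item is encountered.
        arrived = self._first_seen.setdefault(item_id, now)
        if pub_date:
            try:
                # RSS 2.0 pubDate values use the RFC 822 date format.
                return parsedate_to_datetime(pub_date)
            except (TypeError, ValueError):
                pass  # unparseable pubDate: fall through to the arrival time
        return arrived
```

In a real reader the dictionary would be persisted (for example in the same database as the items), otherwise every restart would reset the arrival times.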


👤 dive
As an example, NetNewsWire uses ‘dateArrived’ [0] as a fallback option.

[0] https://github.com/Ranchero-Software/NetNewsWire/blob/941342...


👤 toomuchtodo
Query the Internet Archive’s CDX server for this info. Consider caching the data for performance and good netizen reasons.

https://github.com/internetarchive/wayback/blob/master/wayba...
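A rough sketch of such a lookup, assuming the public CDX endpoint at `web.archive.org/cdx/search/cdx` with its `url`, `output=json`, `fl`, and `limit` parameters. The function names and the in-memory cache are hypothetical. Note that the first capture only tells you the page existed by that time, so it is an upper bound on the publication date rather than the date itself.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
_cache = {}  # page URL -> first capture time (or None); in-memory only


def cdx_query_url(page_url):
    # Ask for just the timestamp field of a single capture, as JSON.
    params = {"url": page_url, "output": "json", "fl": "timestamp", "limit": "1"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_first_capture(body):
    rows = json.loads(body) if body.strip() else []
    # With output=json the first row holds the field names; data rows follow.
    if len(rows) < 2:
        return None
    # CDX timestamps are 14 digits: YYYYMMDDhhmmss, in UTC.
    return datetime.strptime(rows[1][0], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def first_capture(page_url, fetch=None):
    # Cache results so each article URL is queried at most once.
    if page_url not in _cache:
        fetch = fetch or (lambda u: urlopen(u, timeout=10).read().decode("utf-8"))
        _cache[page_url] = parse_first_capture(fetch(cdx_query_url(page_url)))
    return _cache[page_url]
```

The `fetch` parameter exists only so the network call can be swapped out in tests or replaced with a rate-limited client.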


👤 6510
There is always a date somewhere. Find or derive the candidates, then order them by how resource-heavy and/or crappy they are. Choose wisely for the type of reader you are making and/or create configuration options for it.
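One way to sketch that ordering, assuming each date source is a function with a rough cost rank and that a `max_cost` setting acts as the configuration option. `DateSource`, `resolve_date`, and the item keys used below are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class DateSource:
    name: str
    cost: int  # rough rank: higher means more resource-heavy or less reliable
    lookup: Callable[[dict], Optional[datetime]]


def resolve_date(item, sources, max_cost):
    """Try sources from cheapest to most expensive, skipping any above max_cost."""
    for source in sorted(sources, key=lambda s: s.cost):
        if source.cost > max_cost:
            break
        found = source.lookup(item)
        if found is not None:
            return source.name, found
    return None, None


# Example wiring with made-up item keys; a scraper would get a high cost.
sources = [
    DateSource("pub_date", 0, lambda item: item.get("pub_date")),
    DateSource("first_seen", 1, lambda item: item.get("first_seen")),
    DateSource("scrape", 5, lambda item: item.get("scraped")),
]
```

Returning the source name alongside the date lets the reader show or log how trustworthy a given timestamp is.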

👤 kevincox
I'd be surprised if many readers did something besides default to the current time if a date wasn't specified.

👤 tacone
Looking at it from another point of view: should an RSS reader trust pubDate?