How do RSS readers handle items missing pubDates?

Question

I'm writing an RSS reader, and I've noticed that items are often missing publication dates. Other readers seem to be able to obtain them anyway, I'm guessing from the site itself. Is this generally done through web scraping, archival information, web crawler data, or something else entirely? I'm asking because I would hate to resort to those kinds of methods, as they are generally resource-heavy and unstable.

Minor49er · Accepted Answer

It's possible that the feed reader attaches the date itself, based on when the item first appears in the feed, since the reader keeps polling the same feed over time.

warrenm · Answer

When I've done it in the past, I've just attached the current timestamp of when the item's checked as the pubDate. It might be wrong, but it's usually pretty dang close to right (presuming the RSS feed isn't brand-new to the reader).
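A minimal sketch of that approach, assuming items can be identified by a stable key such as the GUID or link. The `FirstSeenStore` class and its in-memory dict are hypothetical; a real reader would persist the first-seen times between runs.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


class FirstSeenStore:
    """Remembers when each item was first observed (hypothetical helper)."""

    def __init__(self) -> None:
        # Maps an item key (GUID or link) to the time it was first polled.
        self._first_seen: Dict[str, datetime] = {}

    def effective_date(
        self,
        item_key: str,
        pub_date: Optional[datetime],
        now: Optional[datetime] = None,
    ) -> datetime:
        """Return the feed's pubDate if present, else the first-seen time."""
        if now is None:
            now = datetime.now(timezone.utc)
        # setdefault records `now` only on the first sighting of this key,
        # so later polls keep returning the original timestamp.
        first_seen = self._first_seen.setdefault(item_key, now)
        return pub_date if pub_date is not None else first_seen
```

On the very first poll of a feed every item gets the same timestamp, which is the "brand-new to the reader" caveat above; after that, the error is bounded by the polling interval.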

dive · Answer

NetNewsWire, for example, uses ‘dateArrived’ as a fallback option [0].
[0] https://github.com/Ranchero-Software/NetNewsWire/blob/941342...
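The same idea as a rough Python sketch. This is not NetNewsWire's code; `parse_pub_date` and `logical_date` are hypothetical names, and `date_arrived` just mirrors the field name mentioned above. It assumes the feed's pubDate, when present, is in the RFC 822 style that RSS 2.0 calls for.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_pub_date(raw: Optional[str]) -> Optional[datetime]:
    """Parse an RFC 822 style pubDate; return None if missing or malformed."""
    if raw is None or not raw.strip():
        return None
    try:
        parsed = parsedate_to_datetime(raw.strip())
    except (TypeError, ValueError):
        # Older Python versions raise TypeError on bad input, newer ones ValueError.
        return None
    if parsed.tzinfo is None:
        # Assumption: treat dates without a usable zone as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def logical_date(raw_pub_date: Optional[str], date_arrived: datetime) -> datetime:
    """Prefer the published date; fall back to when the item arrived."""
    parsed = parse_pub_date(raw_pub_date)
    return parsed if parsed is not None else date_arrived
```

Treating a malformed date the same as a missing one keeps the fallback path in a single place.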

toomuchtodo · Answer

Query the Internet Archive’s CDX server for this info. Consider caching the data for performance and good netizen reasons.
https://github.com/internetarchive/wayback/blob/master/wayba...
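A hedged sketch of what such a lookup might look like. The endpoint and the `url`, `output`, `fl`, `filter`, and `limit` parameters reflect my understanding of the CDX server API and should be checked against its documentation; the function names are hypothetical, and `lru_cache` stands in for a real persistent cache.

```python
import json
from datetime import datetime, timezone
from functools import lru_cache
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public CDX endpoint; verify before relying on it.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url: str) -> str:
    """Build a query asking for the timestamp of the earliest OK capture."""
    params = {
        "url": page_url,
        "output": "json",
        "fl": "timestamp",
        "filter": "statuscode:200",
        "limit": "1",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_response(body: str) -> Optional[datetime]:
    """Parse the JSON body: a header row followed by data rows (assumed shape)."""
    rows = json.loads(body) if body.strip() else []
    if len(rows) < 2:
        return None  # no captures recorded for this URL
    # Timestamps are 14-digit strings of the form YYYYMMDDhhmmss.
    return datetime.strptime(rows[1][0], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


@lru_cache(maxsize=4096)
def earliest_capture(page_url: str) -> Optional[datetime]:
    """Fetch the earliest capture time, cached in memory to avoid repeat requests."""
    with urlopen(build_cdx_query(page_url), timeout=10) as response:
        return parse_cdx_response(response.read().decode("utf-8"))
```

The earliest capture is only an upper bound on the publication date, since the page may have existed for some time before it was first archived.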

6510 · Answer

There is always a date someplace. Find or derive the candidates, and order them by how resource-heavy and/or crappy they are. Choose wisely for the type of reader you are making and/or create configuration options for it.
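One way to sketch that ordering, assuming each candidate source is a small function that either returns a date or gives up. The `DateSource` type, the ranks, the item keys, and the `allow_network` option are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DateSource:
    name: str
    rank: int            # lower rank is tried first (hypothetical ordering)
    needs_network: bool  # expensive sources can be switched off by config
    resolve: Callable[[dict], Optional[datetime]]


def resolve_date(
    item: dict,
    sources: Sequence[DateSource],
    allow_network: bool = False,
) -> Optional[Tuple[datetime, str]]:
    """Return the first date found and the name of the source that gave it."""
    for source in sorted(sources, key=lambda s: s.rank):
        if source.needs_network and not allow_network:
            continue  # skipped because the configuration forbids it
        found = source.resolve(item)
        if found is not None:
            return found, source.name
    return None


# Example wiring; the dict keys are placeholders for whatever the reader stores.
SOURCES = [
    DateSource("pub_date", 1, False, lambda item: item.get("pub_date")),
    DateSource("archive_lookup", 2, True, lambda item: item.get("archived")),
    DateSource("first_seen", 3, False, lambda item: item.get("first_seen")),
]
```

Returning the source name alongside the date lets the reader show or log how trustworthy a given timestamp is.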

kevincox · Answer

I'd be surprised if many readers did anything besides default to the current time when a date wasn't specified.

tacone · Answer

Looking at it from another point of view: should an RSS reader trust pubDate?
