HACKER Q&A
📣 dusted

Is the past disappearing on the web?


I have the habit of looking at the date of things I consume online. It gives me a sense of relevance and context, both when I'm looking for things that are "from now" and, more importantly, when I'm looking for things in a particular temporal context: for instance, programming for an old compiler, or finding out how to do something with an old piece of hardware or electronics.

I feel like I'm encountering more and more sites and articles where I can't seem to find the date. Google will return irrelevant results from today rather than relevant results from 10 years ago.

I feel it's getting worse, is it just me?


  👤 samwillis Accepted Answer ✓
> I feel like I'm encountering more and more sites and articles where I can't seem to find the date.

It seems to me that it's become standard practice on marketing-type blogs for corporate websites to remove the date from their posts. I think it's because (from personal experience) the company will go through a burst of "blog productivity", create a load of content, but then not touch it for years. They don't want that content to look out of date or their website to look stagnant.

Removing the date from their posts, or any other content, hides how old it is and therefore obscures how active they are at creating new content.

Most companies try to use their blogs to attract new customers. A new customer may visit their website once or twice and will never see the blog again, and it's not important that they do. They don't want it to look stale.

As a counterexample, an interesting thread from yesterday [0] was about how CloudFlare use their blog not as a marketing tool but for technical content and attracting employees. They use their blog very regularly, and so keep the date on it, showing how fresh it is.

0: https://news.ycombinator.com/item?id=30070422


👤 m-i-l
Anecdote: One of the best friends of my oldest daughter moved to a new rented flat recently. When I saw the address for a playdate, I recognised it and said to my daughter "I think you went to a birthday party at this address when you were 3-5 years old when another of your friends must have lived here". To corroborate this, I looked up the calendar on my phone, where I'm pretty organised about putting dates and locations of events. However, at that point I discovered that Google Calendar only keeps 2 years of history!

So modern technology is literally erasing our pasts. Not just calendar entries, but messaging systems (people used to keep handwritten letters for decades), and possibly even photos (if we're not careful about preserving them).

Edit: See my clarification of the 2 years in comment https://news.ycombinator.com/item?id=30084620 below. I still think the point remains - we do not own or value our digital data in the same way as physical objects, and there is a much heightened risk of that data disappearing as a result, either by the owners of the platforms the data is stored on archiving the data or by us not valuing it enough to preserve exports and backups through long periods of time.


👤 marmarama
Google's search quality has taken a serious nosedive in the last couple of years - the last 6 months in particular.

I think they implemented some new form of search term widening which is far too strong, so the results you want are often buried among pages and pages of results for the general category of things that you searched for rather than close matches for your keywords. Combine that with the recency bias that other people have talked about and you end up with a much less useful tool for precise searching.

This coincides with a large increase in the number of surveys that my partner has been getting through the Google Rewards program that ask whether or not a recently used search term gave relevant results. Obviously that's just anecdotal, but it does feel like there are substantial changes in the algorithm, and not necessarily for the better.


👤 skinkestek
> Google will return irrelevant results from today rather than relevant results from 10 years ago.

Tip: leave Google behind for now.

That site has been very useful over the last few years, but only in the same way as my very cheap electrical saw: because I didn't have access to anything better.

For someone who has tried good tools like Festo, Milwaukee, Hitachi or old Google it is just a painful reminder of the past and how good life used to be.

It works but hasn't sparked joy for close to a decade.

After kagi and marginalia came into my life my life has improved significantly.

Note: I'm not saying Googlers are evil or dumb now but I will point out that engineers there have incentives stacked against them.


👤 btrettel
A warning for those who remove dates from their content and care about invalidating bad patents and rejecting bad patent applications: If there's no date, likely a patent examiner can't use it as prior art.

I used to work as a patent examiner and I was disappointed when I found web content describing an element of a patent application I was working on, but there was no date that could be used to be certain the document was available before the priority date of the application.

You can use the Wayback Machine and similar archivers to get a date, but frequently the archivers didn't capture the page or didn't capture it in time in my experience (even if it likely was published in time, I can't establish that legally).

Before I quit, I spent some time saving a ton of webpages in one of the areas I was working on (water heaters) just because I wasn't sure how long I'd be at the USPTO and I could be certain of the date given that I myself archived the documents. It was a long-term investment, but could have been quite useful if (for example) a company tries to patent something they previously sold a long time ago and forgot about. The Wayback Machine often had spotty coverage of corporate webpages so I couldn't see all their products at a particular time.


👤 fer
I've certainly noticed Google ignoring results more than 3-4 years old when something barely related, but more recent, matches. I call it "recency trap". Because of that I've found myself more and more systematically setting the date range of the desired results (which isn't 100% of the time useful as many sites reply with misleading metadata).

To point to a recent example (and given the current events): a number of Russian officials blamed the sinking of the Kursk on NATO (either on purpose or by accident), and I recall such statements from back then, but via Google it's been almost impossible to find a primary source. Most results were about the 2021 statements insisting on that from a retired admiral who was involved back then, while the relevant content from 2000/2001 was certainly tough to find.

Part of it is because this is 2000/2001 and many links rotted away, another part because the existing links usually don't respect basic SEO, and finally because Google, in my experience, very strongly prioritizes now/recent content.


👤 marginalia_nu
Google has a pretty big recency bias. From a content producer-perspective it makes sense, if you put up new content you want to see traffic to that content. From a consumer perspective it's questionable at best. Given the Lindy-effect, odds are the quality of old content is higher than average.

I also do kinda think we should be thinking more about what legacy we leave than we presently do. HTML has some serious problems with that regard, especially in terms of link rot, and especially now that we treat it as a way to build platforms. Archive.org is great and all, but is it enough? How will SPAs fare when the backend server is down in 30 years? How much value will be lost?


👤 bencollier49
Even more frustrating than this are the sites which automatically append the current year to their article titles.

"Best CMS frameworks (2022)", for example, and yet the content is out of date.


👤 robbomacrae
The past is always disappearing. I know of no data recording device that we could use to store anything for more than a million years except DNA (for fun check out the Arch Mission Foundation that attempts to use DNA for backups of human knowledge) [0] or 5D nanostructured glass [1].

This post reminded me of a great Kurzgesagt video [2] that went briefly into how much of the past life on earth we have no information on and will never be able to know. Incidentally, it took me a few seconds to find that video. Before the internet, if I was trying to look up a clip I had seen a month ago on TV, I don't even know where I would have begun searching...

However I think we are getting increasingly better at preserving information and making it easy to access with tools like the internet archive, and cloud backups for your photos. This is despite the sheer quantity of data (such as the number of photos you take) growing at an exponential rate. Would you have been able to easily find instructions for a machine that was decades old before the internet?

So the past is disappearing but possibly at a decreasing rate.

0: https://en.wikipedia.org/wiki/Arch_Mission_Foundation

1: https://en.wikipedia.org/wiki/5D_optical_data_storage

2: https://www.youtube.com/watch?v=xaQJbozY_Is


👤 nomilk
I take a lot of notes. The links I place in them often 404 just a few years later. I'm grateful for wayback machine. But it doesn't capture everything. Sometimes I'm just left with a link. And some links don't reveal any information about the source (e.g. a youtube link gives no info about what a removed video was). I've started adding a little more plain text to my notes to help protect against this (e.g. source: "3m20s into Bob's Rails tutorial on Webpacker") just in case it disappears I'll have something to latch on to.

👤 Terry_Roll
It's not you, the search engines have become weaponised.

The most recent example: until a few months ago, searching for Sue Gray, a top British civil servant, would show her career controversies in the search results. Since it was announced that she would carry out an investigation into parties at 10 Downing Street, it's become impossible to see her career controversies in search results.

Eli Pariser also highlighted changes going on with Google back in 2011, as you can see from this talk, but I think that was just the start. https://www.ted.com/talks/eli_pariser_beware_online_filter_b...

IMO, the search engines have now got a lot worse with what they show in search results, like the Sue Gray example above.

Society is becoming like Fahrenheit 451 https://en.wikipedia.org/wiki/Fahrenheit_451


👤 codeptualize
Yes, and even worse; I have a strong suspicion some websites (automatically) update the date in titles and other places.

Especially "best [product] in [year]" articles and lists, they somehow always are about the current year, even in early January, and even if they are only about outdated things.

The crap dirty tricks SEO content does seem to work quite well atm. It's probably pretty hard to determine whether something is relevant or not.


👤 fireflymetavrse
Also, youtube videos sometimes don't have a posting date in their description, not sure why. And I hate when platforms use contextual time like '1 year ago' instead of the posting date.

👤 sillysaurusx
One way to date a post is to copy the url into https://archive.org/.

But I've shared similar frustrations, yes.
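The archive.org lookup mentioned above can also be scripted. Below is a minimal sketch, assuming the Wayback Machine's public availability endpoint (`https://archive.org/wayback/available`) and its JSON shape with `archived_snapshots.closest.timestamp`. The function names are made up for illustration, and asking for the capture closest to an early timestamp is only a heuristic for finding an old capture, not a guarantee of finding the first one.

```python
import json
from datetime import datetime
from urllib.parse import urlencode


def availability_url(page_url, timestamp="19960101"):
    """Build an availability-API query for the capture closest to `timestamp`.

    `timestamp` uses the Wayback format YYYYMMDD[hhmmss]; an early default
    is used here to bias the answer toward old captures (an assumption).
    """
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + query


def closest_capture_date(response_text):
    """Extract the capture time from an availability-API JSON response.

    Returns None when the archive reports no usable capture.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    # Wayback timestamps are 14 digits: YYYYMMDDhhmmss
    return datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
```

Fetch the URL from `availability_url(...)` with any HTTP client and pass the body to `closest_capture_date`. Keep in mind that a capture date only tells you the page existed by then; it may have been published much earlier.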


👤 comboy
I remember checking caching headers to learn about content date at some point.
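For anyone wanting to try the caching-header approach, here is a rough sketch that reads the standard `Last-Modified` response header. The helper name is hypothetical, and the header only reflects what the server chooses to report: many dynamic sites omit it or set it to the time of the request, so treat the result as a hint rather than a publication date.

```python
from email.utils import parsedate_to_datetime


def last_modified(headers):
    """Return the Last-Modified header as a datetime, or None.

    `headers` is any mapping of response header names to values.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("last-modified")
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Malformed date string; older Pythons raise TypeError here.
        return None
```

For example, `last_modified({"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"})` gives a timezone-aware datetime for 21 October 2015.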

It's not just you. Also sites being gone and content getting lost. I think by now pretty much everything text should just not disappear anymore, and we seem to only have web archive which is doing something right (google cache seemed to have lost its persistence at some point but I may be imagining it because that's like 2 data points).

I wish web archive would skip videos and instead fetch more obscure websites but I'm guessing that being able to tell what's spam and what is not is not easy.

Shared distributed history cache for visited websites could be nice but within a short time I spent thinking about it I couldn't figure out a way in which this could work that would make me install it myself.


👤 mejutoco
I noticed a bias for recency that a lot of people share, or take for granted (sometimes myself). I was surprised to see that this idea, that history is better and better with time, goes back to Aristotle.

https://plato.stanford.edu/entries/progress/


👤 steinuil
I think Google prioritizes newer content because AFAIK modifying the date of a page to make it look like it's been published more recently is a common SEO trick.

👤 habibur
TikTok made this more popular. Every other social media had a date of the post or the video. TikTok had none. Still doesn't. That kept videos fresh over time.

👤 gitfan86
Yes, I discovered this back when reddit was just a few years old. Before then I was able to search google for a specific reddit thread and find it. Now there is no chance. I've been screenshotting and downloading things that I like for years now.

👤 ChuckMcM
Traditionally 'date' has been a negative SEO thing in that Google prefers to surface "new" things rather than "older" things. As a result, web sites will pick one of three strategies: remove all date indicators from the content; put the date on when "new" but remove the date after the content has been up for more than a set period of time; or use a js function to always render a date from yesterday as the date of the material.

Google isn't about "reference" data (who clicks on ads when they are looking for a history of stories about topic X?) so the archival and reference function falls to meta services like Wikipedia where a human curates the history and provides links back to that history.

Of course such links get very few visitors and often the place hosting the content will simply retire it rather than spend a couple of nanocents on leaving it up, and the result is link rot.

Yes, I am cynical about how Google is now an agent for "destroying the world's information" when at one point they were simply trying to organize it.


👤 ajuc
I often use before:YYYY-MM-DD and after:YYYY-MM-DD keywords in google search, and some websites mess with that data so that their content looks like it was created today when it obviously wasn't.
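If you use these operators a lot, building the query can be scripted. This is only a sketch: `dated_query` and `search_url` are made-up helper names, the `before:`/`after:` syntax is as described above, and the only URL assumption is the usual `https://www.google.com/search?q=...` form.

```python
from datetime import date
from urllib.parse import urlencode


def dated_query(terms, after=None, before=None):
    """Append after:/before: operators (YYYY-MM-DD) to a search query."""
    parts = [terms]
    if after is not None:
        parts.append(f"after:{after.isoformat()}")
    if before is not None:
        parts.append(f"before:{before.isoformat()}")
    return " ".join(parts)


def search_url(terms, after=None, before=None):
    """Build a Google search URL for the dated query."""
    query = dated_query(terms, after=after, before=before)
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `dated_query("turbo pascal overlay", after=date(2000, 1, 1), before=date(2005, 12, 31))` produces `turbo pascal overlay after:2000-01-01 before:2005-12-31`. As noted above, the filter is only as good as the dates the sites themselves report.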

👤 ecf
I’ve started noticing this when using Google to search for Reddit threads. Threads posted years ago now appear to have been posted in the last few days due to Reddit hacking Google relevancy metrics.

Probably Reddit’s doing, but it’s made finding older topics impossible and I partially blame Google for letting companies abuse their service in this way.


👤 masswerk
I think this started already at the height of Web 2.0, when Google started to favour more recent (often copy-and-paste) blog content over older content, regardless of prestige or authority. If I'm not mistaken, age and/or recent updates have now officially been part of the algorithm for quite a while.

👤 oversocialized
Yes, the old, decentralized web is disappearing and being replaced by centralized big tech. I remember when google search came out and it was amazing. Now it restricts anything "not approved". Personal websites, geocities, angelfire, myspace. Then blogs and facebook appeared for people to put their content there. Then reddit came and ate the forums. Reddit never used to show up in search results but many helpful forums would. Now it is the opposite. Then google ate up the bloggers. The entire point is to direct content to these big tech approved sites. Try wiby.me to find some of the old sites. There is also a movement to decentralize. Take your stuff off github, off reddit, off facebook, and onto your own site. Form webrings like the good old days.

👤 analyte123
It's not just you - it's always Year Zero [1] on Google SRPs now. When you're looking for information on a topic or historical event, you're not going to easily find classical sources, contemporary primary sources, bloggers - it's always content from the last two years from what Google considers authoritative / trusted sources. Searching has become a skill again; you have to use your existing knowledge to drill down to associated terms and also search books, archives, social media, and so on, up to asking literal people.

[1] https://en.wikipedia.org/wiki/Year_Zero_(political_notion)


👤 hwers
It's to a huge degree google giving crap results and not many actual pages disappearing. The other day I searched for "f.position.vsub is not a function"[1] and after the first 3 results it starts giving just completely irrelevant results like 'the definition of function according to mathematics' (https://en.wikipedia.org/wiki/Function_(mathematics) ). Just wild garbage stuff.

[1] https://www.google.com/search?q=f.position.vsub+is+not+a+fun...


👤 geodel
It also reminds me of what Thomas Piketty said: prices of things used to be mentioned prominently in novels in the past. Nowadays that does not happen unless pricing itself is a salient part of the plot. The reason was inflation. Once inflation took over, prices of objects no longer carried any accuracy or authenticity, and they slowly disappeared from novels, movies, etc.

I can see the same thing with dates: with billions of events generated and captured every second, the actual date/time of an event can be demoted to the level of the thousands of other attributes captured. So it will be mentioned only when the date itself is the point of the article, etc.


👤 gwern
That past does seem to disappear in Google: https://news.ycombinator.com/item?id=23977375 https://news.ycombinator.com/item?id=19604135 Maybe people are evolving to erase dates as a countermeasure?

I sometimes wonder how much of this is just a ratchet of things like banning spam: on a long enough time horizon, the survival rate of everything goes to zero?


👤 twinge
Related anecdote: the 2008 flash crash of United Airlines stock [0]. A six-year-old article about United Airlines' 2002 bankruptcy was republished from the Chicago Tribune in The South Florida Sun-Sentinel (apparently without the date), then got picked up by Google News, then by an investment firm, then ultimately by Bloomberg, leading to a flash crash from $12 to $3.

0: https://www.wired.com/2008/09/six-year-old-st/


👤 gbuk2013
I think this is partly a second-order consequence of the reasonable expectation that, due to the rate of innovation and general churn in technology, computer-related information goes stale quickly and hence more recent sources should be prioritised. Having a date stamp on the page then becomes a liability (I for one am guilty of adding a current year number to my search terms for some queries to try to get more accurate results).

Another reason might be that SEO got really good several years ago so older content just can't compete.


👤 sebastianconcpt
I've noticed that too. For some explicit or implicit reason, designers are making UIs without the date. Of course this leaves readers unable to orient themselves in time. I wonder if they are mindlessly influenced by the transient nature of social media design. "Conversation"-oriented things can gain from peer interaction, but for highly technical content an informal UX will prevent high engineering from _flowing_ (emerging smoothly), or at least make it more difficult than needed.

👤 shime
I think this is the effect of various companies trying to capture as much of our attention as possible because it makes sense for their bottom line. We are all victims of recency bias, so it makes sense for companies to prioritize more recent content. If it wasn't the case, the various social media apps wouldn't be as addictive as they are.

I am not aware of a social media app feed that puts quality above recency (not counting the plugins that enable that, like Twemex [1]). Instead they keep us in what David Perell calls "Never-Ending Now" [2]. We endlessly consume temporary, short-lived content and we are mostly blind to the past.

Google search is not social media, but I wouldn't be surprised if Google ranked more recent content higher, given how they have changed the Youtube algorithm.

1: https://chrome.google.com/webstore/detail/twemex-sidebar-for...

2: https://perell.com/essay/never-ending-now/


👤 tiborsaas
Google has some neat tools, click the "tools" menu and select any time, you can enter a range there: https://imgur.com/a/XQSMT9n

Not that I ever needed to use it, but it's there.

Google does know an indirect date of the page even if it's not written explicitly. The first crawling date should be saved and if no better indicator exists I'd assume they are using that timestamp.


👤 chrsw
I also had the same observation. It's pretty annoying and an example of how the web in some cases is becoming less useful. For Google in particular, more and more often I find myself skipping the first wave of results because they aren't very helpful, they're just crafted to show up in search results. I guess now that so many of us are already in the Google ecosystem, there's less incentive to cater to users.

👤 jlengrand
There is worse than content without a date. There is content with a fake date, changed to appear like it's relevant while it isn't.

There is nothing more frustrating than typing "XXXX 2022", clicking a link and seeing "XXXX-2020" in the URL.

People legit not changing their article but updating the title to stay on top of the SEO game. Usually found on generic searches that drive big traffic. I freaking hate that so much.
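The title-versus-URL mismatch described above is easy to check mechanically. Here is a toy heuristic, with hypothetical function names and an `example.com` URL used purely for illustration: it flags a page when the URL contains a year older than the newest year in the title. It will miss pages whose URLs carry no year, and it can misfire on URLs where a four-digit number is not a year.

```python
import re

# A standalone four-digit year starting with 19 or 20.
YEAR = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def years_in(text):
    """Return the set of year-like numbers found in `text`."""
    return {int(match) for match in YEAR.findall(text)}


def year_mismatch(title, url):
    """True if the URL mentions a year older than the newest year in the title."""
    title_years = years_in(title)
    url_years = years_in(url)
    if not title_years or not url_years:
        return False  # nothing to compare
    return min(url_years) < max(title_years)
```

With this, `year_mismatch("Best CMS frameworks (2022)", "https://example.com/best-cms-frameworks-2020")` is true, while a title and URL that agree on the year are not flagged.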


👤 varispeed
I find myself using DDG more and more. When I use Google I am mostly finding spam nowadays. Even if I try to filter results by date or add very specific keywords, I get mostly garbage. I also have a feeling that results that don't support current narratives are disappearing from search. When I find something useful, I am now bookmarking it, where I didn't have to do that a year or so ago.

👤 motohagiography
Search results seem only to preserve the most controversial stuff, because that's what people look for, which is just the way the internet and news work. The idea that search engines aren't honest brokers or trustworthy sources is probably an indirectly positive development, as they shouldn't be a social credit record or an immutable journal, and people should not feel they are subject to them, or expect that the results are a true or accurate reflection of a person's life or work. Search engines were originally great for searching technical material about things, but they're profoundly awful for telling you anything accurate about people, and I'd suggest that may be a positive thing.

I did some professional writing for newspapers and magazines in the 00's, and it's all no longer online, which was a bit of a surprise to discover, as the articles in those publications were part of what I saw as significant personal accomplishments. Even the photos from the security cons have been largely scrubbed. There's some mercy in things falling off your social credit record with time, but for me that has been double edged. The good stuff is gone, and the lame stuff persists, but to me it's a small price for the freedom that the relative privacy provides. Some sites I understand what happened as some of it was personal, and other sites I checked to see how far back their online content went, and my pieces were just from before their current historical cutoff.

Of course as a natural conspiracy practitioner, I think there is an ideological effect of progressing search results to emphasize the present and downplay the good and value in the past as a matter of permanent revolution, but even I would be surprised if that was ever explicitly articulated anywhere, and the bitrot of internet history can be explained by other more concrete and plausible incentives.


👤 faoileag
Three things might come into play here:

1. AFAIK Google ranks pages (amongst other metrics) by how "fresh" they are. A date given on a web page might count as a measure of "freshness" so it would be good SEO practice to eliminate the date.

2. Google can measure how good its search results are by simply tracking the click-through. So, assuming search term "t" can lead to older results (technology ten years ago) or newer results (technology now), Google can refine results for the term by looking at the click-through rates. And if most of your fellow searchers look for newer tech, you looking for the older stuff might be marginalized.

3. With tracking being as sophisticated as it is, you might simply be in the wrong "cohort". If that's the case, you might try to alter what google knows about you by looking until you find the results relevant to you and then clicking on them. Even if that means going to page 10 of the results.


👤 therufa
> Google will return irrelevant results from today rather than relevant results from 10 years ago.

It's the main reason behind why instead of bookmarking stuff, I instead archive stuff. Search engines aren't as convenient as they used to be, especially for non-trending topics. At least that's my experience.


👤 Barrin92
I feel about the same and recently had to think about Warren Ellis's Transmetropolitan comic series which very much takes place in a future where history and precise dates as well as geographies seem to have been abolished or simply are irrelevant.

It's not only the case with literal dates but also with style. It's getting harder to tell precisely how old 'relatively new' stuff is. 'Retro'-design seems to have kind of disappeared. There's now such a flood of information that there's no real well-ordered periods any more. Even the way platforms now present information, not chronologically in an absolute sense but ordered according to personal preference kind of breaks time by design.


👤 jonathankoren
Google is over emphasizing most recent results. Just last night, I was looking up something about a medical treatment, and all the results were covid + treatment, none of which was what I wanted.

I’m glad I remembered the -keyword trick to exclude the term.


👤 slightwinder
The past is always moving and disappearing. Old servers die, content is moved to new structures and lost along the line. That's why the wayback machine exists. Also, google is just a window, it's not the whole world. And for some years now they limit their crawling to the last years of content AFAIK, for multiple reasons.

Over all, I think this is good, because it's more likely that old content is outdated and wrong. But of course old content can also be valuable. So for this stuff it might be better to use other ways than google to find them. Like searching directly on the websites or in specialized archives-search engines.


👤 cpcat
All I can say is I get really frustrated when I don't see a date on an article, which to me means I cannot rely on any info in there. I'm always looking for the latest, and with no date I can't tell.

👤 indymike
Here's the advice my company's SEO consultant gave us: "If you date your content, it will drop in the search results." This advice, if true, is bad for human knowledge.

👤 antifa
This is a huge pet peeve of mine. Articles are increasingly hiding their publish date. Web authors are arrogantly claiming their blog is "timeless". It turns out that whether the information I'm looking at is 1 year old, 4 years old, or 10 years old is actually very important in guiding how it needs to be interpreted. If you're lucky, there will be a publish date in "view source".

This applies to programming and current events.
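The "view source" check can be automated to some degree. A minimal sketch follows, assuming the page uses one of a few common conventions: an Open Graph `article:published_time` meta tag, a `datePublished` itemprop, or an HTML `<time datetime="...">` element. The class and function names are invented for this example, and plenty of pages use none of these, in which case the result is simply empty.

```python
from html.parser import HTMLParser

# Meta keys that commonly carry a publish date (a non-exhaustive assumption).
META_KEYS = {"article:published_time", "datePublished", "date"}


class PublishDateFinder(HTMLParser):
    """Collect (source, value) pairs that look like publish dates."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name") or attrs.get("itemprop")
            if key in META_KEYS and attrs.get("content"):
                self.candidates.append((key, attrs["content"]))
        elif tag == "time" and attrs.get("datetime"):
            self.candidates.append(("time", attrs["datetime"]))


def find_publish_dates(html):
    """Return date candidates in document order; empty list if none found."""
    finder = PublishDateFinder()
    finder.feed(html)
    return finder.candidates
```

The values are returned as raw strings because sites format them inconsistently. And since the site controls this metadata, it can be just as misleading as a visible date.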


👤 throwawayacc2
I’d imagine it’s a little bit of both to be honest.

There’s definitely algorithmic prioritisation for new content but I think it’s also a bit of a “seek and ye shall find” moment happening.

As an anecdote, very recently I had the opposite thing happening. I was researching something about React and noticed one of the comments ( still applicable actually! ) was from 2015. My head exploded when I realised 2015 is actually 7 years ago so for a short time afterwards I just kept noticing old comments or old content everywhere.


👤 zelphirkalt
> I feel like I'm encountering more and more sites and articles where I can't seem to find the date.

In the past, a friend also recommended that I not write a date, but I did not really see the reason. And simply not writing it because others don't is not reason enough for me. To me, adding the date to any content is a kind of honesty, and if not that, then at least useful information that can easily be added.

> Google will return irrelevant results from today rather than relevant results from 10 years ago.

This might be on Google, not the Internet in general. However, I also miss some things, that I did not store in earlier times and that are now nowhere to be found.

> I feel it's getting worse, is it just me?

I think it is not only you. I think it is a result of how the Internet has (d)evolved into more and more walled gardens. More and more short time engagement is optimized for, rather than long term quality websites and information. Many good content sources have long shut down.

I blame society for its overall mediocrity and lust for the social media quick fix, without realizing what is destroyed by that. There are too many people online who don't have a clue about how the Internet works, heck, how even a single website works. They make up such a big part of online communities that it becomes more profitable to cater to them than to the people who actually know how stuff works. And why not? It is easier for them to do so, plus they make more money from the crowd. The majority does not care about sitting in walled gardens. They do not care about being able to host services themselves. They do not care about services being served by big corp and not being decentralized and extensible. They mostly do not care about their choices being taken away, which they never knew they had in the first place. There are countries where the "Internet" is served by Facebook. They did not get to know the Internet by writing their own HTML by hand and putting that online. Most of them will never want to learn about the web's basics anyway. Today you are a "creator" when you produce content that goes through the filters of massive platforms, which are owned by FAANG and others. This is how we end up with a situation that is less and less what people in the know would like. This is how the many ruin it for the few.


👤 tannhaeuser
Well, the past disappearing has been a staple in mainstream literature for some time now (eg. The Second Sleep by Robert Harris), and not just in SciFi (eg. A Fire Upon the Deep by Vernor Vinge, which I came around to reading over Xmas, where it's implicitly assumed civilizations can fall back into a second or third medieval age). So it's not just you; it's Google and the incentives they've created on the web since about 2010 or so.

👤 city41
On a similar note, I've noticed Google search results are almost always very "recent". For example I just searched for "linux find command" and the top result has a date of Nov 21, 2021. I could be wrong, but I don't think `find` has had any changes to necessitate such a recent article on it? It seems like Google gives preference to recent stuff so sites make what would otherwise be stable content, "recent".

👤 agumonkey
It's been the case for quite a while. I often find myself peeking at html source or even HTTP requests to try to retrieve dates. It's super strange.

👤 nmstoker
I agree it's getting worse, but this is not a new phenomenon. I don't have evidence to hand but this is something I've noticed for at least ten years. It usually undermines content trust and value, especially on more professional sites (you can expect it on news clickbait sites but national/international news sites and other generally reputable sites do it too)

👤 rsecora
The now has been always preponderous over the past.

There are different ways to look at the issue.

* general assumption: the future is synonim of progress, so now is more relevant.

* general assumption: we know the past, so focus on the unknown aka the now

* fact: maintaining memories takes energy, and energy is a scarce resource. So societies forget their past to focus on current issues and preserve themselves.

BTW, same vibes when looking at Twitter timelines.


👤 Jimmc414
I think it is Google. I recently noticed how different my returned search results were if I'm logged in vs logged out of Google.

👤 JKCalhoun
> Google will return irrelevant results from today

My annoyance when searching for "Alice in Wonderland" and getting Disney, not Carroll.


👤 krapp
> Google will return irrelevant results from today rather than relevant results from 10 years ago.

I can't imagine many subjects for which decades-old results would be more relevant than current ones.

Also, Google gives you the option to search by date or date range, so if you just wanted 10 year old results, you could just do that. It even adds the date to the results.
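
As a rough illustration of date-restricted searching: as far as I know, Google also accepts `after:` and `before:` operators with `YYYY-MM-DD` dates directly in the query, though that is an assumption here and how strictly they are honoured may vary. The helper names `dated_query` and `search_url` are made up for this sketch.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode


def dated_query(terms: str, after: Optional[date] = None,
                before: Optional[date] = None) -> str:
    """Append after:/before: operators (assumed syntax) to a search query."""
    parts = [terms]
    if after is not None:
        parts.append(f"after:{after.isoformat()}")
    if before is not None:
        parts.append(f"before:{before.isoformat()}")
    return " ".join(parts)


def search_url(query: str) -> str:
    """Build a plain search URL carrying the query in the q parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = dated_query("linux find command", date(2010, 1, 1), date(2012, 12, 31))
    print(search_url(query))
```

This only builds the URL; whether the results are actually any good for a given topic is a separate question.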


👤 nicbou
Google explicitly values recent results, and it's related to their campaign against incorrect information.

In many contexts, this makes sense. Things change, and old content describes the old reality. I think it's generally good, because it encourages content creators to maintain their content, instead of publishing it then forgetting about it.


👤 Atlas667
The internet is probably the biggest set of tools right now for preserving the past and keeping it at arm's reach.

You're referring to Google's search practices, which are part of a for-profit industry centered around marketing. People need to see those ads, and they can make sure of that.

The discrepancy is between the interests of a company and the interests of the majority.


👤 lovecg
What’s more insidious is sites that do include a date, but silently edit the old content over time anyway. NYTimes is famous for this, but I’m sure others do it too. We’re retreating from a world of hard record keeping back to a sort of an oral history tradition where the past is flexible.

👤 illwrks
As mentioned by others, I think search is becoming more muddy and that is part of the issue.

The thought I have in the back of my head is that for thousands of years lots of information was written down, but not any more. What happens if digital storage is affected/destroyed for whatever reason...

Back to the stone age?


👤 crucialfelix
The world would be very different if we had 10000 years of detailed correspondence, news, diaries and paintings from a majority of the population.

What a treasure and a curse it would be. We would drown in our past.

Though I think with AI it would be a blessing. So much to learn from.


👤 wodenokoto
It’s getting really bad. Having a date, or at least an accurate date, on an article is a liability for Google traffic. Even old articles about things that happened a long time ago will be “updated” every 6 months in order to have a relevant date on them.

👤 irthomasthomas
And Chrome-based browsers delete history after 3 months. Yes, the past is most definitely disappearing. But don't worry, some FAANG thought-leaders will be along soon to tell us how we're mistaken: it is only being deemphasised, I am sure.

👤 dorfsmay
Yes, very annoying! I once wrote a script to estimate the publication date based on archive.org first appearance, for an entire site for a blogger who had not put a date on any page.

Might be worth turning it into a per page browser extension...
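
Not the original script, but a minimal sketch of the idea, assuming the Wayback Machine's CDX endpoint at `web.archive.org/cdx/search/cdx` with `output=json`, `fl=timestamp` and `limit=1`. The function names are made up for illustration, and the first capture is only an upper bound on the real publication date.

```python
import json
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url: str) -> str:
    """Build a CDX query asking for a single capture timestamp of page_url."""
    params = {"url": page_url, "output": "json", "fl": "timestamp", "limit": "1"}
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_first_capture(cdx_json: str) -> Optional[datetime]:
    """Return the earliest capture time found in a CDX JSON response, if any."""
    rows = json.loads(cdx_json) if cdx_json.strip() else []
    # With output=json the first row is a header (["timestamp"]); data rows follow.
    timestamps = [row[0] for row in rows[1:]]
    if not timestamps:
        return None
    earliest = min(timestamps)  # 14-digit timestamps sort chronologically as strings
    return datetime.strptime(earliest, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def first_capture(page_url: str) -> Optional[datetime]:
    """Network call: fetch the CDX response for page_url and parse it."""
    with urlopen(cdx_query_url(page_url), timeout=30) as response:
        return parse_first_capture(response.read().decode("utf-8"))
```

Running `first_capture` over every URL of a site would give a rough "no later than" date per page, which is about as good as it gets when the author left dates out.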


👤 princevegeta89
Old links are definitely disappearing from search engines. I used to keep track of results over a period of 5 years, and all I could see was newer results showing up while the ones from the past slowly faded away.

👤 QuantumYeti
> I feel like I'm encountering more and more sites and articles where I can't seem to find the date.

I think this is referred to as "Evergreen Content", and is encouraged by marketing/SEO companies.


👤 pessimizer
It's because search engines are extremely stupid and a plurality of people are looking for information from today, so they've been strongly biased to get that win on 20% of searches.

👤 topspin
> is it just me?

No. I too have observed this. It's another dark pattern driven by SEO. In the internal page ranking algorithm I apply to web content, lack of dates is an important factor.


👤 hbarka
Probably an opportunity for a browser extension where a mouseover to the title will query archive.org and return metadata of original create date and latest modified.

👤 7373737373
I hate that not all scientific papers include a date by default

👤 rc_mob
Yes, I hate this trend, and Google doesn't seem to care.

👤 mrfusion
If there are multiple past states that can lead to the present state then is the past any more fixed than the future?

👤 phkahler
I've noticed that most science and math publications - including the paywalled stuff - omit the date. It can be very frustrating finding such information and not knowing its vintage. It IS relevant.

👤 ralusek
It's honestly been hard to find news sources from the beginning of the pandemic...

👤 gbrindisi
Can it be a side effect of SEO? IIRC Google et al. have recency bias to determine page ranks

👤 mot0rola
yeah, this annoys me as well. although maybe not intentional, it appears deceitful. actually, maybe this could be useful for blockchain tech. i don't know of another way you can check the date on a webpage from the frontend?

👤 oxff
Only the hash horizon made of proof of work will protect you.

👤 piyushpr134
Google has become shitty. That they "return irrelevant results from today rather than relevant results from 10 years ago" is a mfing disease. I have completely moved away from it to DDG and Bing. Much better results.

👤 mountainb
The truth is that most web content is irrelevant garbage. Do you care that an ad doesn't have a 'published on' date? Do you care that the brochures you get in the mail that you instantly throw out do not have a publish date?

Most web content falls under this category: ephemeral junk. It's irresponsible to use the web for meaningful research purposes. For most technical questions, go to manuals, and for most humanistic questions, hit the books. The web can be a means to inform research, but the majority of material is basically "39 weird teeth whitening tricks the dentists don't want you to know."


👤 throwaway_Aef8
Heequaet4geongeel1bag0xie9inaB4u

👤 toolcombinator
Nah, I see it too.

YouTube is especially bad.


👤 zxcvbn4038
A lot of sites recycle old content and/or remove the dates so you can’t easily tell how frequently the sites update. Very frustrating when you're researching technical issues and you keep encountering outdated materials. Not to mention historic content that disappears behind paywalls.

👤 jonahbenton
The TikTok effect.

👤 wholinator2
As an aside, when I saw on the news that Iran had created a propaganda video appearing to show them air striking Donald Trump at a golf course, every news outlet that told me about the video also refused to show it. I went to Google and not a single link returned contained the actual video. I was getting pretty mad. I tried the exact same words in Bing and every link returned had the video.

I had never actually experienced Google so blatantly hiding information from me. My search terms were very clear, as evidenced by Bing's results. But honestly just that one experience is enough for me to never trust Google again.


👤 grappler
I share an interest in this topic, and have come up with a partial set of coping strategies.

- I open the popup for the 'SEOInfo' Firefox plugin, which displays metadata from the page (JSON-LD, microdata, and a catch-all "other meta" field often show created/modified timestamps for the current page).

- A page may give a date, but if the page has a whiff of marketing vibe about it and that date is very recent and not in the URL, I am immediately suspicious of the date it gives.

- I click over to archive sites (Internet Archive/Wayback Machine, archive.today). I'll often bookmark a good early capture from one of these next to the actual page for the sole reason that the capture has some semblance of a date, even if it's not close to the actual published date. I want that simply because I like to order links by date and so I need some date to use for that.

- Academic papers are especially frustrating here. Some have a date prominently in the pdf but most don't. This is apparently an artifact of a process whereby papers often simply don't have a definitive date. The authors worked on it over the course of a year or more, so many differing versions are floating around. For these I google the title and can often get an arxiv preprint page with a submission date, or journal publications with a submission date (though these often only give a month or a year).

- There is a lot of good stuff on a Stack Overflow page but what's the "date" of that page? A question could have been asked a decade ago and have answers running from that day to today. I really like the Wayback Machine for these. I can start with an early version of the page and click forward through time, bookmarking when I stop. Then if I pick it up again later, I can resume from the bookmark and continue forward through time from there.

- Quora pages are especially frustrating because they won't give the date the question was asked, only the dates of the answers. And you can't do the Wayback Machine thing I described above for Stack Overflow because Quora has blocked the Internet Archive. For these reasons I dread trying to do anything with Quora pages.
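
For the metadata step in the first bullet, here is a rough sketch of pulling date hints out of a page by hand, using only the Python standard library. This is not how the plugin works; the key lists and the names `DateHintParser` and `date_hints` are assumptions for illustration, and real pages vary a lot.

```python
import json
from html.parser import HTMLParser
from typing import List, Tuple

# Illustrative, non-exhaustive lists of keys that commonly carry dates.
DATE_META_KEYS = {"article:published_time", "article:modified_time",
                  "datepublished", "datemodified", "date", "last-modified"}
DATE_JSON_KEYS = ("datePublished", "dateModified", "dateCreated")


class DateHintParser(HTMLParser):
    """Collect (source, value) date hints from <meta>, <time> and JSON-LD."""

    def __init__(self) -> None:
        super().__init__()
        self.hints: List[Tuple[str, str]] = []
        self._in_jsonld = False
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            key = (attr.get("property") or attr.get("name")
                   or attr.get("itemprop") or "").lower()
            if key in DATE_META_KEYS and attr.get("content"):
                self.hints.append((f"meta:{key}", attr["content"]))
        elif tag == "time" and attr.get("datetime"):
            self.hints.append(("time", attr["datetime"]))
        elif tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self._walk(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped rather than treated as an error

    def _walk(self, node):
        """Recursively look for date keys anywhere in the JSON-LD structure."""
        if isinstance(node, dict):
            for key, value in node.items():
                if key in DATE_JSON_KEYS and isinstance(value, str):
                    self.hints.append((f"json-ld:{key}", value))
                else:
                    self._walk(value)
        elif isinstance(node, list):
            for item in node:
                self._walk(item)


def date_hints(html: str) -> List[Tuple[str, str]]:
    """Return every date hint found in the HTML, in document order."""
    parser = DateHintParser()
    parser.feed(html)
    parser.close()
    return parser.hints
```

The hints are returned raw on purpose: a "modified" date from yesterday on a marketing page deserves the same suspicion as a visible one.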

What I'd really wish for is some way browsers can get create/modify dates for any page, so the user doesn't need to hunt for it and plugins can do stuff with it. For example I'd love a plugin that could order the tabs in a window by, say, modification date. The workflow I have in mind is: open search results into tabs. Make the browser order the tabs by date for you without manual labor. Then you can read through your search results in chronological order. Save the set. Restore it a week or a year later. Click "reorder" again to account for pages that have received updates. Now your tabs are in chronological order again. I would gladly pay for this capability!
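
The ordering step of that workflow is the easy part. A minimal sketch, assuming each tab already comes with a modification date or `None` when none could be found (the `Tab` type and `order_tabs` are hypothetical, and all dates are assumed to be either all naive or all timezone-aware):

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Tab(NamedTuple):
    url: str
    modified: Optional[datetime]  # None when no date could be found


def order_tabs(tabs: List[Tab]) -> List[Tab]:
    """Oldest first; undated tabs go last and keep their relative order."""
    # The boolean sorts dated tabs (False) before undated ones (True);
    # datetime.min is only a placeholder so undated tabs have a comparable key.
    return sorted(tabs, key=lambda tab: (tab.modified is None,
                                         tab.modified or datetime.min))
```

Getting trustworthy dates into `modified` in the first place is the hard part, and so is hooking this up to real browser tabs, neither of which this sketch attempts.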


👤 jollybean
I sense there might be a bit of a business model here for companies that actually do want to 'archive' stuff.

The company would take a bunch of arbitrary content and do minimal presentation for it, and host it, forever.

Basically 'hands off archiving'. I'll bet a lot of companies would be interested in this.