I feel like I'm encountering more and more sites and articles where I can't seem to find the date. Google will return irrelevant results from today rather than relevant results from 10 years ago.
I feel it's getting worse, is it just me?
It seems to me that it's become standard practice on marketing-type blogs for corporate websites to remove the date from their posts. I think it's because (from personal experience) the company will go through a burst of "blog productivity" and create a load of content, but then not touch it for years. They don't want that content to look out of date or their website to look stagnant.
Removing the date from their posts, or any other content, hides how old it is and therefore obscures how active they are at creating new content.
Most companies try to use their blogs to attract new customers. A new customer may visit their website once or twice and never see the blog again, and it's not important that they do. The company just doesn't want it to look stale.
As a counterexample, an interesting thread from yesterday [0] was about how Cloudflare uses its blog not as a marketing tool but for technical content and attracting employees. They use their blog very regularly, and so keep the date on it, showing how fresh it is.
So modern technology is literally erasing our pasts. Not just calendar entries, but messaging systems (people used to keep handwritten letters for decades), and possibly even photos (if we're not careful about preserving them).
Edit: See my clarification of the 2 years in comment https://news.ycombinator.com/item?id=30084620 below. I still think the point remains - we do not own or value our digital data in the same way as physical objects, and there is a much heightened risk of that data disappearing as a result, either by the owners of the platforms the data is stored on archiving the data or by us not valuing it enough to preserve exports and backups through long periods of time.
I think they implemented some new form of search term widening which is far too strong, so the results you want are often buried among pages and pages of results for the general category of things that you searched for rather than close matches for your keywords. Combine that with the recency bias that other people have talked about and you end up with a much less useful tool for precise searching.
This coincides with a large increase in the number of surveys that my partner has been getting through the Google Rewards program that ask whether or not a recently used search term gave relevant results. Obviously that's just anecdotal, but it does feel like there are substantial changes in the algorithm, and not necessarily for the better.
Tip: leave Google behind for now.
For the last few years that site has been very useful, but only in the same way as my very cheap electrical saw: because I didn't have access to anything better.
For someone who has tried good tools like Festo, Milwaukee, Hitachi or old Google, it is just a painful reminder of the past and how good life used to be.
It works but hasn't sparked joy for close to a decade.
Since Kagi and Marginalia came into it, my life has improved significantly.
Note: I'm not saying Googlers are evil or dumb now but I will point out that engineers there have incentives stacked against them.
I used to work as a patent examiner and I was disappointed when I found web content describing an element of a patent application I was working on, but there was no date that could be used to be certain the document was available before the priority date of the application.
You can use the Wayback Machine and similar archivers to get a date, but frequently the archivers didn't capture the page or didn't capture it in time in my experience (even if it likely was published in time, I can't establish that legally).
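For anyone who wants to script that lookup, here is a minimal sketch of asking the Wayback Machine for its earliest capture of a page. It assumes the public CDX endpoint at web.archive.org/cdx/search/cdx accepts `url`, `output=json` and `limit` parameters and returns a header row followed by capture rows, oldest first; the function names are my own. The capture date is only an upper bound on when the page was published, which is exactly the limitation described above.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine's CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url: str) -> str:
    """Build a CDX query asking for a single capture row as JSON."""
    params = {"url": page_url, "output": "json", "limit": 1}
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_wayback_timestamp(ts: str) -> datetime:
    """Wayback timestamps look like YYYYMMDDhhmmss and are in UTC."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def earliest_capture(rows: List[List[str]]) -> Optional[datetime]:
    """rows[0] is the header row; rows[1] is the first (oldest) capture."""
    if len(rows) < 2:
        return None  # the page was never captured
    header, first = rows[0], rows[1]
    return parse_wayback_timestamp(first[header.index("timestamp")])


def fetch_earliest_capture(page_url: str) -> Optional[datetime]:
    """Network call; returns None when the archive has no capture."""
    with urlopen(cdx_query_url(page_url), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    return earliest_capture(json.loads(body) if body.strip() else [])
```

`fetch_earliest_capture("example.com")` would give you the oldest date the archive can vouch for, or `None` if there was never a capture.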
Before I quit, I spent some time saving a ton of webpages in one of the areas I was working on (water heaters) just because I wasn't sure how long I'd be at the USPTO and I could be certain of the date given that I myself archived the documents. It was a long-term investment, but could have been quite useful if (for example) a company tries to patent something they previously sold a long time ago and forgot about. The Wayback Machine often had spotty coverage of corporate webpages so I couldn't see all their products at a particular time.
To point to a recent example (and given current events): a number of Russian officials blamed the sinking of the Kursk on NATO (either on purpose or by accident), and I recall such statements from back then, but via Google it's been almost impossible to find a primary source. Most results were about the 2021 statements to that effect from a retired admiral who was involved back then; relevant content from 2000/2001 was certainly tough to find.
Part of it is because this is 2000/2001 and many links rotted away, another part because the existing links usually don't respect basic SEO, and finally because Google, in my experience, very strongly prioritizes now/recent content.
I also do kinda think we should be thinking more about what legacy we leave than we presently do. HTML has some serious problems in that regard, especially in terms of link rot, and especially now that we treat it as a way to build platforms. Archive.org is great and all, but is it enough? How will SPAs fare when the backend server is down in 30 years? How much value will be lost?
"Best CMS frameworks (2022)", for example, and yet the content is out of date.
This post reminded me of a great Kurzgesagt video [2] that went briefly into how much of the past life on earth we have no information on and will never be able to know. Incidentally, it took me a few seconds to find that video. Before the internet, if I was trying to look up a clip I had seen a month ago on TV, I don't even know where I would have begun searching...
However I think we are getting increasingly better at preserving information and making it easy to access with tools like the internet archive, and cloud backups for your photos. This is despite the sheer quantity of data (such as the number of photos you take) growing at an exponential rate. Would you have been able to easily find instructions for a machine that was decades old before the internet?
So the past is disappearing but possibly at a decreasing rate.
0: https://en.wikipedia.org/wiki/Arch_Mission_Foundation
The most recent example: Sue Gray, a top British civil servant, used to have her career controversies visible in search results when you searched for her, until a few months ago. Since it was announced she would carry out an investigation into parties at 10 Downing Street, it's become impossible to see her career controversies in search results.
Eli Pariser also highlighted changes going on with Google back in 2011, as you can see from this talk, but I think that was just the start. https://www.ted.com/talks/eli_pariser_beware_online_filter_b...
IMO, the search engines have now gotten a lot worse in what they show in search results, as in the Sue Gray example above.
Society is becoming like Fahrenheit 451 https://en.wikipedia.org/wiki/Fahrenheit_451
Especially "best [product] in [year]" articles and lists: they somehow are always about the current year, even in early January, and even if they are only about outdated things.
The crap dirty tricks SEO content does seem to work quite well atm. It's probably pretty hard to determine whether something is relevant or not.
But I've shared similar frustrations, yes.
It's not just you. There are also sites being gone and content getting lost. I think by now pretty much everything that is text should just not disappear anymore, and we seem to only have the Web Archive doing something right (Google cache seemed to have lost its persistence at some point, but I may be imagining it because that's like 2 data points).
I wish web archive would skip videos and instead fetch more obscure websites but I'm guessing that being able to tell what's spam and what is not is not easy.
Shared distributed history cache for visited websites could be nice but within a short time I spent thinking about it I couldn't figure out a way in which this could work that would make me install it myself.
Google isn't about "reference" data (who clicks on ads when they are looking for a history of stories about topic X?), so the archival and reference function falls to meta services like Wikipedia, where a human curates the history and provides links back to that history.
Of course such links get very few visitors and often the place hosting the content will simply retire it rather than spend a couple of nanocents on leaving it up, and the result is link rot.
Yes, I am cynical about how Google is now an agent for "destroying the world's information" when at one point they were simply trying to organize it.
Probably Reddit’s doing, but it’s made finding older topics impossible and I partially blame Google for letting companies abuse their service in this way.
[1] https://en.wikipedia.org/wiki/Year_Zero_(political_notion)
[1] https://www.google.com/search?q=f.position.vsub+is+not+a+fun...
I can see the same thing with dates: with billions of events generated and captured every second, the actual date/time of an event can be demoted to the level of the thousands of other attributes captured. So it will only be mentioned when the date itself is the point of the article, etc.
I sometimes wonder how much of this is just a ratchet of things like banning spam: on a long enough time horizon, the survival rate of everything goes to zero?
Another reason might be that SEO got really good several years ago so older content just can't compete.
I am not aware of a social media app feed that puts quality above recency (not counting the plugins that enable that, like Twemex [1]). Instead they keep us in what David Perell calls "Never-Ending Now" [2]. We endlessly consume temporary, short-lived content and we are mostly blind to the past.
Google search is not social media, but I wouldn't be surprised if Google ranked more recent content higher, given how they have changed the YouTube algorithm.
1: https://chrome.google.com/webstore/detail/twemex-sidebar-for...
Not that I ever needed to use it, but it's there.
Google does know an indirect date of the page even if it's not written explicitly. The first crawling date should be saved and if no better indicator exists I'd assume they are using that timestamp.
There is nothing more frustrating than typing "XXXX 2022", clicking a link and seeing "XXXX-2020" in the URL.
People are legit not changing their article but updating the title to stay on top of the SEO game. It's usually found on generic searches that drive big traffic. I freaking hate that so much.
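If you wanted to flag this automatically, a rough sketch could compare the years in a result's title with the years in its URL. This is only a heuristic I made up for illustration (the function names and the 19xx/20xx year pattern are my assumptions), and it will miss pages whose URLs carry no year at all.

```python
import re
from typing import List, Optional, Tuple

# Four-digit years 1900-2099 that are not part of a longer number (e.g. an item id).
YEAR = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def years_in(text: str) -> List[int]:
    """All standalone four-digit years found in the text, in order."""
    return [int(m.group()) for m in YEAR.finditer(text)]


def stale_year_hint(title: str, url: str) -> Optional[Tuple[int, int]]:
    """Return (title_year, url_year) when the URL carries an older year than the title."""
    title_years, url_years = years_in(title), years_in(url)
    if not title_years or not url_years:
        return None  # nothing to compare
    newest_title, oldest_url = max(title_years), min(url_years)
    if oldest_url < newest_title:
        return (newest_title, oldest_url)
    return None
```

For a title of "XXXX 2022" and a URL ending in "XXXX-2020" it returns `(2022, 2020)`, which is the retitled-but-not-rewritten case described above.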
I did some professional writing for newspapers and magazines in the 00's, and it's all no longer online, which was a bit of a surprise to discover, as the articles in those publications were part of what I saw as significant personal accomplishments. Even the photos from the security cons have been largely scrubbed. There's some mercy in things falling off your social credit record with time, but for me that has been double edged. The good stuff is gone, and the lame stuff persists, but to me it's a small price for the freedom that the relative privacy provides. Some sites I understand what happened as some of it was personal, and other sites I checked to see how far back their online content went, and my pieces were just from before their current historical cutoff.
Of course, as a natural conspiracy practitioner, I think there is an ideological effect of progressing search results to emphasize the present and downplay the good and value in the past as a matter of permanent revolution, but even I would be surprised if that was ever explicitly articulated anywhere, and the bitrot of internet history can be explained by other more concrete and plausible incentives.
1. AFAIK Google ranks pages (amongst other metrics) by how "fresh" they are. A date given on a web page might count as a measure of "freshness" so it would be good SEO practice to eliminate the date.
2. Google can measure how good its search results are by simply tracking the click-through. So, assuming search term "t" can lead to older results (technology ten years ago) or newer results (technology now), Google can refine results for the term by looking at the click-through rates. And if most of your fellow searchers look for newer tech, you looking for the older stuff might be marginalized.
3. With tracking being as sophisticated as it is, you might simply be in the wrong "cohort". If that's the case, you might try to alter what Google knows about you by looking until you find the results relevant to you and then clicking on them, even if that means going to page 10 of the results.
It's the main reason behind why instead of bookmarking stuff, I instead archive stuff. Search engines aren't as convenient as they used to be, especially for non-trending topics. At least that's my experience.
It's not only the case with literal dates but also with style. It's getting harder to tell precisely how old 'relatively new' stuff is. 'Retro' design seems to have kind of disappeared. There's now such a flood of information that there are no real well-ordered periods any more. Even the way platforms now present information, not chronologically in an absolute sense but ordered according to personal preference, kind of breaks time by design.
I’m glad I remembered the -keyword trick to exclude the term.
Overall, I think this is good, because it's more likely that old content is outdated and wrong. But of course old content can also be valuable. So for this stuff it might be better to use ways other than Google to find it, like searching directly on the websites or in specialized archive search engines.
This applies to programming and current events.
There’s definitely algorithmic prioritisation for new content but I think it’s also a bit of a “seek and ye shall find” moment happening.
As an anecdote, very recently I had the opposite thing happen. I was researching something about React and noticed one of the comments (still applicable, actually!) was from 2015. My head exploded when I realised 2015 is actually 7 years ago, so for a short time afterwards I just kept noticing old comments or old content everywhere.
A friend once recommended that I not put a date on my writing, but I did not really see the reason. And simply leaving it out because others do is not reason enough for me. To me, adding the date to any content is a kind of honesty, and if not that, then at least useful information that can easily be added.
> Google will return irrelevant results from today rather than relevant results from 10 years ago.
This might be on Google, not the Internet in general. However, I also miss some things, that I did not store in earlier times and that are now nowhere to be found.
> I feel it's getting worse, is it just me?
I think it is not only you. I think it is a result of how the Internet has (d)evolved into more and more walled gardens. More and more, short-term engagement is optimized for, rather than long-term quality websites and information. Many good content sources have long since shut down.
I blame society for its overall mediocrity and lust for the social media quick fix, without realizing what is destroyed by that. There are too many people online who don't have a clue about how the Internet works, or heck, even how a single website works. They make up such a big part of online communities that it becomes more profitable to cater to them than to the people who actually know how stuff works. And why not? It is easier to do so, plus there is more money to be made from the crowd. The majority does not care about sitting in walled gardens. They do not care about being able to host services themselves. They do not care about services being served by big corp rather than being decentralized and extensible. They mostly do not care about their choices being taken away, choices they never knew they had in the first place. There are countries where the "Internet" is served by Facebook. They did not get to know the Internet by writing their own HTML by hand and putting it online. Most of them will never want to learn about the web's basics anyway. Today you are a "creator" when you produce content that goes through the filters of massive platforms, which are owned by FAANG and others. This is how we end up with a situation that is less and less what people in the know would like. This is how the many ruin it for the few.
There are different ways to look at the issue.
* general assumption: the future is synonymous with progress, so now is more relevant.
* general assumption: we know the past, so focus on the unknown, aka the now.
* fact: maintaining memories needs energy, and energy is a scarce resource. So societies forget their past to focus on current issues and preserve themselves.
BTW, same vibes looking at Twitter timelines.
My annoyance when searching for "Alice in Wonderland" and getting Disney, not Carroll.
I can't imagine many subjects for which decades-old results would be more relevant than current ones.
Also, Google gives you the option to search by date or date range, so if you just wanted 10 year old results, you could just do that. It even adds the date to the results.
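For reference, the date range can also be typed straight into the query. The sketch below assumes Google's `before:` and `after:` search operators with YYYY-MM-DD dates, and assumes the usual `https://www.google.com/search?q=` URL shape; the helper names and the example search terms are mine.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode


def dated_query(terms: str, after: Optional[date] = None, before: Optional[date] = None) -> str:
    """Append after:/before: operators (assumed syntax) to the search terms."""
    parts = [terms]
    if after is not None:
        parts.append("after:" + after.isoformat())
    if before is not None:
        parts.append("before:" + before.isoformat())
    return " ".join(parts)


def search_url(query: str) -> str:
    """Assumed URL shape for a Google search; the query is URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Roughly "results from about 10 years ago" for a made-up topic.
    q = dated_query("water heater anode", after=date(2011, 1, 1), before=date(2012, 12, 31))
    print(q)
    print(search_url(q))
```

Whether the results you get back are actually any good is a separate matter, since the date Google attaches to a page is its own guess.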
In many contexts, this makes sense. Things change, and old content describes the old reality. I think it's generally good, because it encourages content creators to maintain their content, instead of publishing it then forgetting about it.
You're referring to Google's search practices. Google is a for-profit business centered around marketing; people need to see those ads, and they can make sure of that.
The discrepancy is of the interests of a company vs the interests of the majority.
The thought I have in the back of my head is that for thousands of years lots of information was written down, but not any more. What happens if digital storage is affected/destroyed for whatever reason...
Back to the stone age?
What a treasure and a curse it would be. We would drown in our past.
Though I think with AI it would be a blessing. So much to learn from.
Might be worth turning it into a per page browser extension...
I think this is referred to as "Evergreen Content", and is encouraged by marketing/SEO companies.
No. I too have observed this. It's another dark pattern driven by SEO. In the internal page ranking algorithm I apply to web content lack of dates is an important factor.
Most web content falls under this category: ephemeral junk. It's irresponsible to use the web for meaningful research purposes. For most technical questions, go to manuals, and for most humanistic questions, hit the books. The web can be a means to inform research, but the majority of material is basically "39 weird teeth whitening tricks the dentists don't want you to know."
YouTube is especially bad.
I had never actually experienced Google hiding information from me so blatantly. My search terms were very clear, as evidenced by Bing's results. But honestly, just that one experience is enough for me to never trust Google again.
- I open the popup for the 'SEOInfo' firefox plugin which displays metadata from the page (JSON-LD, microdata, and a catch-all "other meta" field often show created/modified timestamps for the current page)
- A page may give a date but if the page has a whiff of marketing vibe about it and that date is very recent and not in the url I am immediately suspicious of the date it gives.
- I click over to archive sites (Internet Archive/wayback machine, archive.today). I'll often bookmark a good early capture from one of these next to the actual page for the sole reason that the capture has some semblance of a date, even if it's not close to the actual published date. I want that simply because I like to order links by date and so I need some date to use for that.
- Academic papers are especially frustrating here. Some have a date prominently in the pdf but most don't. This is apparently an artifact of a process whereby papers often simply don't have a definitive date. The authors worked on it over the course of a year or more, so many differing versions are floating around. For these I google the title and can often get an arxiv preprint page with a submission date, or journal publications with a submission date (though these often only give a month or a year).
- There is a lot of good stuff on a stackoverflow page but what's the "date" of that page? A question could have been asked a decade ago and have answers running from that day to today. I really like the wayback machine for these. I can start with an early version of the page and click forward through time, bookmarking when I stop. Then if I pick it up again later, I can resume from the bookmark and continue forward through time from there.
- Quora pages are especially frustrating because they won't give the date the question was asked; just the answers. And you can't do the wayback machine thing I described above for stackoverflow because Quora has blocked internet archive. For these reasons I dread trying to do anything with quora pages.
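The metadata lookup from the first step can be approximated without a plugin. Below is a minimal sketch using only the Python standard library that pulls `datePublished`, `dateModified` and `dateCreated` (schema.org property names) out of any JSON-LD blocks in a page. It is not how the SEOInfo plugin works, just my guess at the simplest equivalent; it ignores microdata and plain meta tags, and the function names are made up.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List

# schema.org date properties commonly found in JSON-LD article metadata.
DATE_KEYS = ("datePublished", "dateModified", "dateCreated")


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buf: List[str] = []
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False


def _walk(node, found: Dict[str, List[str]]) -> None:
    """Recursively search nested dicts/lists (e.g. @graph) for date properties."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in DATE_KEYS and isinstance(value, str):
                found.setdefault(key, []).append(value)
            else:
                _walk(value, found)
    elif isinstance(node, list):
        for item in node:
            _walk(item, found)


def jsonld_dates(html: str) -> Dict[str, List[str]]:
    """Map each date property to the values found in the page's JSON-LD."""
    collector = _JsonLdCollector()
    collector.feed(html)
    found: Dict[str, List[str]] = {}
    for block in collector.blocks:
        try:
            _walk(json.loads(block), found)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail the whole page
    return found
```

The same suspicion applies as in the second point: these values are whatever the site chose to publish, so a very fresh `dateModified` on a marketing page proves little.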
What I'd really wish for is some way browsers can get create/modify dates for any page, so the user doesn't need to hunt for it and plugins can do stuff with it. For example I'd love a plugin that could order the tabs in a window by, say, modification date. The workflow I have in mind is: open search results into tabs. Make the browser order the tabs by date for you without manual labor. Then you can read through your search results in chronological order. Save the set. Restore it a week or a year later. Click "reorder" again to account for pages that have received updates. Now your tabs are in chronological order again. I would gladly pay for this capability!
The company would take a bunch of arbitrary content and do minimal presentation for it, and host it, forever.
Basically 'hands off archiving'. I'll bet a lot of companies would be interested in this.