HACKER Q&A
📣 jl6

When has metadata ever worked?


Cory Doctorow’s metacrap essay is legendary, but I wonder if it might be a touch too nihilistic. Are there some standout examples of where Cory’s meta-pitfalls have been successfully overcome? What have we learned?


  👤 jfengel Accepted Answer ✓
"Metadata" works all the time. SMTP is chock full of metadata. HTTP is chock full of metadata.
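
To make that concrete, here is a minimal sketch using Python's standard `email` module. The sample message, its addresses, and the helper name `split_metadata` are made up for illustration. Strictly speaking, these headers belong to the message format that SMTP carries rather than to SMTP itself, but the point is the same: the metadata is machine-readable and every mail client relies on it.

```python
from email import message_from_string

# Hypothetical sample message: header lines, a blank line, then the body.
RAW_MESSAGE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch?\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Are you free at noon?\n"
)


def split_metadata(raw):
    """Return (headers, body) for a raw message string."""
    msg = message_from_string(raw)
    headers = dict(msg.items())  # the metadata: routing, subject, content type
    body = msg.get_payload()     # the data the metadata describes
    return headers, body


headers, body = split_metadata(RAW_MESSAGE)
for name, value in headers.items():
    print(f"{name}: {value}")
```

HTTP headers follow the same "Name: value" convention, which is why this style of metadata is so familiar that it tends to go unnoticed.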

Doctorow appears to be talking about a very specific kind of metadata: industry-standard XML, in situations where people aren't actually exchanging data. That's the kind of thing pushed by the Semantic Web project, which failed pretty badly.

I was involved in Semantic Web, and I have my own reasons for why I think it failed. Some of them align with Doctorow's reasoning; others don't. It sure didn't help that Semantic Web was neither semantic nor web. Bad naming doesn't have to be a problem, but here it made it hard for people to agree on what they were even talking about.

And some of it was that Big Data happened along just when they were supposed to be getting going. In theory Semantic Web stuff should have obviated a lot of the data cleaning that goes into Big Data, and made a lot more information available, but Big Data gave sexier results than Semantic Web, and people just didn't want to work on the latter. Why grind out standards when the neurons can just, ya know, figure it out, kinda sorta maybe?

The fact is that data standards work all the time. "Metadata" as Doctorow means it is just one narrow way of defining standards, and that too exists all over the place. Any time people need to exchange data, they set up a standard. It's boring and frustrating and people hate it, and the results are usually bad, but when people need to, it works.

What Semantic Web hoped for was that people would just voluntarily come up with standards and throw their data out there in hopes that somebody else would use it, and that their competitors would use the same standard. Doctorow was right that this is hard, for all kinds of reasons, and that it rarely actually happens, especially not with the mechanisms that XML supplies, even supplemented with RDF and OWL.

But in a lot of ways, this essay comes down to Standards Are Hard. Which is true and non-controversial.