HACKER Q&A
📣 itsokaywelldoit

What can we do with 20B links collected from websites?


Hello, geeks of the world. It's my first time posting here. A friend and I built a project that takes a list of domains, such as www.wikipedia.org (just an example!), and saves every link found in the HTML of each domain's start page. It sorts the links into types: images, internal site links, and so on. We had a list of about 200 million domains and parsed them all, which gave us 20 billion links, each paired with its source domain and a link type.
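To make the pipeline concrete, here is a minimal sketch of the extract-and-sort step using only the Python standard library. It is not our actual code: the `LinkCollector` and `extract_links` names are made up for illustration, and so are the `INTERNAL_LINK` and `IMAGE` labels (only `EXTERNAL_LINK` appears in our sample record). It assumes the start page's HTML has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects <a href> and <img src> targets from one page (hypothetical helper)."""

    def __init__(self, source_url):
        super().__init__()
        self.source_url = source_url
        self.source_domain = urlparse(source_url).netloc
        self.records = []
        self._open_anchor = None  # record of the <a> whose text is being read

    def _record(self, raw_url, link_type=None):
        # Resolve relative URLs against the page they were found on.
        destination_url = urljoin(self.source_url, raw_url)
        parsed = urlparse(destination_url)
        if parsed.scheme not in ("http", "https"):
            return None  # skip mailto:, javascript:, tel: and similar
        if link_type is None:
            same_site = parsed.netloc == self.source_domain
            link_type = "INTERNAL_LINK" if same_site else "EXTERNAL_LINK"
        record = {
            "source_url": self.source_url,
            "source_domain": self.source_domain,
            "destination_url": destination_url,
            "destination_domain": parsed.netloc,
            "link_type": link_type,
            "anchor_text": "",
        }
        self.records.append(record)
        return record

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._open_anchor = self._record(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self._record(attrs["src"], "IMAGE")

    def handle_data(self, data):
        if self._open_anchor is not None:
            self._open_anchor["anchor_text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._open_anchor is not None:
            # Collapse runs of whitespace in the collected anchor text.
            text = self._open_anchor["anchor_text"]
            self._open_anchor["anchor_text"] = " ".join(text.split())
            self._open_anchor = None


def extract_links(source_url, html):
    """Return one record per link found in the given HTML."""
    collector = LinkCollector(source_url)
    collector.feed(html)
    collector.close()
    return collector.records
```

Calling `extract_links("http://www.wikipedia.org", html)` returns a list of dicts with the same keys as the sample record further down.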

We also found a service that provides lists of newly registered domains and of domains that recently died, so we can keep our database up to date. On top of that, we could generate regular reports on what changed, improve how the links are sorted, and write custom processors for clients to pull more data from sites that meet their criteria.

Our idea was to query for links to WordPress plugins used on sites and generate reports for commercial plugin developers, with regular updates on who uses the plugin, who the new users are, and who stopped using it. But we haven't sent out many emails yet, so no replies so far.
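A rough sketch of how such a plugin report could be computed from the link records. It relies on one assumption: that a plugin's assets are linked under the default `/wp-content/plugins/<slug>/` path, which a site owner can change, so this would undercount. The function names are hypothetical, and the two inputs are imagined as snapshots of the same dataset taken at different times.

```python
import re
from urllib.parse import urlparse

# Assumption: default WordPress layout, /wp-content/plugins/<slug>/...
PLUGIN_PATH = re.compile(r"/wp-content/plugins/([^/]+)/")


def plugin_slug(destination_url):
    """Return the plugin slug found in the URL path, or None."""
    match = PLUGIN_PATH.search(urlparse(destination_url).path)
    return match.group(1) if match else None


def sites_by_plugin(records):
    """Map each plugin slug to the set of source domains that link to it."""
    usage = {}
    for record in records:
        slug = plugin_slug(record["destination_url"])
        if slug:
            usage.setdefault(slug, set()).add(record["source_domain"])
    return usage


def usage_changes(previous, current, slug):
    """Compare two sites_by_plugin() snapshots for a single plugin."""
    before = previous.get(slug, set())
    after = current.get(slug, set())
    return {
        "new": sorted(after - before),
        "stopped": sorted(before - after),
        "still_using": sorted(before & after),
    }
```

The three lists correspond to the report sections mentioned above: new users, users who stopped, and current users who were already there.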

Example of the juicy part of our data:

{
  "source_url": "http://www.wikipedia.org",
  "source_domain": "www.wikipedia.org",
  "destination_url": "https://creativecommons.org/licenses/by-sa/4.0/",
  "destination_domain": "creativecommons.org",
  "link_type": "EXTERNAL_LINK",
  "anchor_text": "Creative Commons Attribution-ShareAlike License"
}

Please help us gather ideas about who could be interested in this data and which insights it could yield. Some we have thought of: leads from sites that use a competitor's product; sites that use your plugin, and which plugins are most often combined with yours; which sites are linked to most often from other sites; which sites have contact forms or "Contact us" pages (we could collect those forms); Google Analytics and Google Ads usage; whether a site links to Google Play and/or the App Store; and links to social media sites, including which social media accounts appear most often on home pages.

We're sitting on a well of information, and we don't know what to look for in it or what people would find interesting. Damn, we could be making graphs and maps, and instead we're just sitting here going "ehh, what would be interesting to people", with a lot of "m"s, like that doggo.
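Two of these ideas (most-referenced sites, and app store or social media links) reduce to simple counts over the link records. The sketch below only shows the shape of the query. At 20 billion rows this would run in a database or a batch job rather than in memory, and the `CATEGORIES` table is a made-up, incomplete example rather than a real classification.

```python
from collections import Counter

# Hypothetical, incomplete lookup table; a real one would be much longer.
CATEGORIES = {
    "play.google.com": "GOOGLE_PLAY",
    "apps.apple.com": "APP_STORE",
    "facebook.com": "SOCIAL",
    "instagram.com": "SOCIAL",
    "linkedin.com": "SOCIAL",
}


def strip_www(domain):
    """Treat www.example.com and example.com as the same destination."""
    return domain[4:] if domain.startswith("www.") else domain


def most_referenced(records, top=10):
    """Rank destination domains by how many distinct sites link to them."""
    pairs = {
        (r["source_domain"], strip_www(r["destination_domain"]))
        for r in records
        if r["link_type"] == "EXTERNAL_LINK"
    }
    return Counter(destination for _, destination in pairs).most_common(top)


def sites_per_category(records):
    """Count distinct source domains linking to each known category."""
    seen = {}
    for r in records:
        category = CATEGORIES.get(strip_www(r["destination_domain"]))
        if category:
            seen.setdefault(category, set()).add(r["source_domain"])
    return {category: len(domains) for category, domains in seen.items()}
```

Counting distinct source domains instead of raw links keeps a single site with hundreds of footer links from dominating the ranking.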

Help us come up with ideas for what to do with this data and who to target.

If you want some of this data to do something fun with, write to us: we can put together the query and send you the results, no problem :)

Thanks a bunch! Looking forward to your comments.


  👤 ordinaryalice Accepted Answer ✓
Why did you decide to mine this data specifically if you have trouble finding a market for it?