HACKER Q&A
📣 throw9078686

How do news API services get their data/content?


I'm looking for a service that can provide me with news articles across the web via an API call. There seem to be a lot of such services, all costing roughly similar amounts -- $100+ per month.

Makes me think they are all wrappers around some other service, but I can't find anything else. Does anyone happen to know how this works?


  👤 instagraham Accepted Answer ✓
I'm not too clued in on the API side of aggregators, but news publishers usually have individual subscriptions to the major wire agencies like Reuters, AP and AFP, plus the national ones (in India these are PTI, ANI and UNI).

These wire agencies must cost more than the amount you mentioned. But they would be considered primary sources, a layer 1 (L1) for news and information. A lot of news websites essentially just repackage L1 facts, add some stories about tweets famous people made, and throw in a handful of original reports.

Cheaper wire services that deal more with business plugs and PR would be businessnewswire and prnewswire. These are promotional by nature.

I doubt this answers your question, but I'd be interested in knowing how accessible this essential L1 information would be to ordinary users (not companies). While you may end up paying for Reuters directly, another website that includes a Reuters feed could give you the same content for less.



👤 altdataseller
Having run/started one of these (technically) in the past, I can say it's not very hard to maintain a list of major news sources and crawl them every x minutes. Alternatively, if you have access to the Twitter firehose, you can pick up real-time news there, as most URLs get posted the instant they're published (some by bots, some by publishers themselves).

On a scale of 1 to 10, the difficulty is like a 2 or 3
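For anyone wondering what "crawl a list every x minutes" might look like, here is a minimal sketch using only the Python standard library. It is not the setup described above, just one way to do it: the names `LinkCollector`, `extract_links`, `fetch_html` and `poll_once` are made up for illustration, and a real crawler would also need robots.txt checks, rate limiting and a filter to tell article links apart from navigation links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, href))


def extract_links(base_url, html):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch_html(url):
    # Plain HTTP GET with a timeout; no retries or politeness logic here.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def poll_once(sources, seen, fetch=fetch_html):
    """Fetch each source page once and return links not seen in earlier polls."""
    fresh = []
    for url in sources:
        try:
            html = fetch(url)
        except OSError:
            continue  # source is unreachable right now; try again next poll
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                fresh.append(link)
    return fresh
```

To run it "every x minutes", you would keep one `seen` set around and call `poll_once(sources, seen)` in a loop with `time.sleep(x * 60)` between rounds.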


👤 sturza
The last time I built a real-time news service, I used the paid Inoreader API: I'd manage my own RSS feeds on their platform and poll their API every once in a while. The Inoreader API cost ~100 per year, not per month. To directly answer your question: they use RSS.
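As a rough illustration of the RSS route, this sketch parses an RSS 2.0 document directly with the standard library. It does not use the Inoreader API (whose endpoints are not shown in this thread); `parse_rss` is a hypothetical helper, and it assumes plain RSS 2.0 `<item>` elements rather than Atom.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_rss(xml_text):
    """Return a list of dicts with title, link and published time per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        try:
            # RSS 2.0 dates use the RFC 822 format, e.g. "Mon, 06 Sep 2021 12:00:00 GMT".
            published = parsedate_to_datetime(raw_date) if raw_date else None
        except (TypeError, ValueError):
            published = None  # malformed date; keep the item anyway
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": published,
        })
    return items
```

You would fetch the feed body yourself (or from whatever reader service you use) and pass the XML text in; polling then comes down to re-fetching and skipping links you have already stored.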

👤 jeromechoo
We maintain a list of around 1000 major publishers across the world and crawl it every 15 minutes. Every other publisher (smaller blogs, etc.) comes through our global crawl.

The list itself isn't particularly hard to maintain. What's hard is the myriad rules and configurations required to crawl and scrape each publisher. We built a model that extracts article data, and it does a good job of figuring out headlines, images, authors, and text.

Scraping rules are very manageable if you're planning on crawling just a few publishers. But it gets exponentially more difficult to crawl hundreds.
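To give a feel for the extraction step, here is a minimal sketch that pulls a headline, image and author out of page metadata. This is not the model mentioned above; it is a simple heuristic that assumes the publisher fills in Open Graph tags (`og:title`, `og:image`) or an `author` meta tag, and falls back to `<title>`. The names `MetaExtractor` and `extract_article` are hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects <meta> property/name pairs and the text of <title>."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                # Keep the first value seen for each key.
                self.meta.setdefault(key, content)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_article(html):
    """Best-effort article fields; any field may be None if the page lacks it."""
    parser = MetaExtractor()
    parser.feed(html)
    title_tag = "".join(parser.title_parts).strip()
    return {
        "headline": parser.meta.get("og:title") or title_tag or None,
        "image": parser.meta.get("og:image"),
        "author": parser.meta.get("author") or parser.meta.get("article:author"),
    }
```

A heuristic like this covers well-behaved sites; the per-publisher rules come in when the metadata is missing or wrong, which is where the maintenance burden described above starts to show.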


👤 cranberryturkey
https://brisk.news has an api and is free

👤 nwroot
AP Exchange is a monthly subscription that 100% of newspapers use.

👤 kaiwenwang
The URL itself is a GET request
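In other words, fetching an article page is an ordinary HTTP GET. A small sketch with the standard library, assuming the site serves the page without a login or JavaScript rendering; `build_request`, `fetch` and the user-agent string are made up for illustration.

```python
from urllib.request import Request, urlopen


def build_request(url, user_agent="example-news-bot/0.1"):
    # No request body is attached, so urllib sends this as a GET.
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(build_request(url), timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```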

👤 specproc
I work with news data professionally, and have found two broad categories of suppliers: the buyers and the scrapers.

Buyers are often a bit old school and frankly far more expensive than they're worth. Also, good luck getting a usable, affordable API. I'm looking here at people like Meltwater, LexisNexis etc., who have licensing agreements with publishers.

Then there are the scrapers. The one I use is newsapi.ai, and I can broadly recommend them. They've got a decent selection, are happy to add stuff for you, and have lots of nice goodies baked in (e.g., NERD).

Most of the other ones you'll find with a cursory "news api" search also fall into this category AFAICT, but few, if any, provide full text, which is what I need.

From conversations I've had with my supplier, I believe they've got a Scrapy box running somewhere pulling largely off RSS feeds. I wouldn't want their job, to be honest; there's so much to look after.

This approach is fine for some needs, but you can literally see the gaps in the time series where something has fallen over.
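One way to spot those gaps programmatically is to sort the article timestamps for a source and flag any pair of neighbours that are further apart than you would expect. A minimal sketch; `find_gaps` and the one-hour default threshold are assumptions for illustration, and a sensible threshold depends on how often the source normally publishes.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap=timedelta(hours=1)):
    """Return (start, end) pairs where consecutive timestamps are more than max_gap apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]
```

A long silence from a normally busy source is a hint that the scraper for it fell over, rather than that nothing was published.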

I'm very interested in this space and would love to hear others' experiences.


👤 joshxyz
not an api but brutalistreport offers a nice ui