I recently discovered that the SEC has a RESTful API for company filing data. You can query extremely fine-grained financial data, for free, about any company that files with the SEC[1]. For instance, this link returns Berkshire Hathaway's weighted-average basic shares outstanding:
https://data.sec.gov/api/xbrl/companyconcept/CIK0001067983/us-gaap/WeightedAverageNumberOfSharesOutstandingBasic.json
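For anyone who wants to reproduce this, here is a minimal sketch that fetches the same companyconcept URL and lists every reported value by period end date. It assumes the JSON has a top-level `units` mapping whose values are lists of facts carrying `end`, `val` and `form` fields (that is what the endpoint returned for me, but treat the field names as an assumption). SEC asks automated clients to send a descriptive User-Agent, so `USER_AGENT` below is a placeholder you should replace; the helper names are made up for illustration.

```python
import json
import urllib.request

# Placeholder contact string: SEC asks automated clients to identify themselves
# with a descriptive User-Agent. Replace with your own name and email.
USER_AGENT = "Example Name example@example.com"

CONCEPT_URL = (
    "https://data.sec.gov/api/xbrl/companyconcept/"
    "CIK{cik:010d}/{taxonomy}/{tag}.json"
)


def concept_url(cik: int, taxonomy: str, tag: str) -> str:
    """Build a companyconcept URL; the CIK is zero-padded to 10 digits."""
    return CONCEPT_URL.format(cik=cik, taxonomy=taxonomy, tag=tag)


def fetch_json(url: str) -> dict:
    """GET a data.sec.gov URL and decode the JSON body."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def facts_by_unit(concept: dict) -> dict:
    """Flatten the (assumed) "units" mapping into (end, val, form) tuples,
    one sorted list per unit, ordered by period end date."""
    out = {}
    for unit, facts in concept.get("units", {}).items():
        out[unit] = sorted((f["end"], f["val"], f["form"]) for f in facts)
    return out


if __name__ == "__main__":
    url = concept_url(1067983, "us-gaap", "WeightedAverageNumberOfSharesOutstandingBasic")
    data = fetch_json(url)
    for unit, rows in facts_by_unit(data).items():
        for end, val, form in rows:
            print(unit, end, val, form)
```

The last line printed for each unit is the most recent period end in the response, which makes the cutoff easy to see.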
Unfortunately, as far as I can tell, the data is incomplete. The JSON returned by the link above only covers periods up to the end of 2015, nothing more recent. Yet the SEC website itself has more recent Berkshire Hathaway data (including basic shares outstanding); their latest filing was last week.
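One possibility worth ruling out before calling the API faulty is that later filings report the figure under a different concept name, so only this one tag stops in 2015. The sketch below is a hypothetical check using the companyfacts endpoint listed on the same documentation page[1]; it assumes the response nests concepts as `facts -> taxonomy -> concept -> units -> [facts with an "end" date]`. The function names and the 2016-01-01 cutoff are illustrative choices, not anything from the SEC docs.

```python
import json
import urllib.request
from typing import Optional

# Placeholder contact string; SEC asks for a descriptive User-Agent.
USER_AGENT = "Example Name example@example.com"


def latest_end(units: dict) -> Optional[str]:
    """Return the most recent "end" date across all units, or None if absent."""
    ends = [f["end"] for facts in units.values() for f in facts if "end" in f]
    return max(ends) if ends else None


def recent_concepts(company_facts: dict, taxonomy: str, cutoff: str) -> dict:
    """Map concept name -> latest end date, keeping only concepts with data
    after `cutoff`. ISO dates (YYYY-MM-DD) compare correctly as strings."""
    result = {}
    for tag, concept in company_facts.get("facts", {}).get(taxonomy, {}).items():
        end = latest_end(concept.get("units", {}))
        if end is not None and end > cutoff:
            result[tag] = end
    return result


def fetch_company_facts(cik: int) -> dict:
    """GET all XBRL facts for one company (CIK zero-padded to 10 digits)."""
    url = f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    facts = fetch_company_facts(1067983)
    # Print share-related us-gaap concepts that still have post-2015 data.
    for tag, end in sorted(recent_concepts(facts, "us-gaap", "2016-01-01").items()):
        if "Shares" in tag:
            print(tag, end)
```

If nothing share-related shows up after 2015, the gap is more likely on the API side than a change in tagging.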
So, what gives? Is the API faulty? And does this explain why all the Python and R libraries still scrape SEC data by traversing HTML?
[1] https://www.sec.gov/edgar/sec-api-documentation