Open Data APIs: Exploring OpenCorporates’ 80 million companies in 100+ jurisdictions

The largest open database of companies in the world

Building the underlying dataset of all company data
• A critical underpinning of understanding the corporate world
• De-siloing data from official corporate registers and other government data, especially regulatory data
• Linking critical, and previously obscure, datasets

One entry per legal entity
• Assembled from company registers around the world
• All automatically ingested: no manual imports
• Over 100 jurisdictions already, with more added every month
• Disparate data normalised to key fields
• Searchable across jurisdictions
• Foreign branches automatically matched to their home companies
• Company-register-based identifiers: non-proprietary, non-monopoly, avoiding lock-in
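To make register-based identifiers concrete, here is a minimal Python sketch. It assumes an identifier is simply the pair (jurisdiction code, company number as issued by the official register). The `CompanyId` class, the `parse_company_path` helper and the `/companies/<jurisdiction>/<number>` path shape are illustrative assumptions, not a documented OpenCorporates API, and the company number is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompanyId:
    """Hypothetical identifier built only from public register data."""
    jurisdiction_code: str  # e.g. "gb" (lowercase codes are an assumption)
    company_number: str     # the number issued by the official register

    def path(self) -> str:
        # Assumed path shape: /companies/<jurisdiction_code>/<company_number>
        return f"/companies/{self.jurisdiction_code}/{self.company_number}"


def parse_company_path(path: str) -> CompanyId:
    """Inverse of CompanyId.path(); rejects anything that is not a company path."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "companies":
        raise ValueError(f"not a company path: {path!r}")
    return CompanyId(parts[1], parts[2])


if __name__ == "__main__":
    cid = CompanyId("gb", "01234567")  # made-up company number
    print(cid.path())
    print(parse_company_path(cid.path()) == cid)
```

Because both parts come from the public register, anyone can rebuild the same identifier without a licence, which is what avoids proprietary lock-in. The sketch assumes company numbers never contain `/`; some registers may need escaping.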

Eventual target: every bit of company-related public data in the world, matched to the relevant company
• Already have millions of company-related records from disparate public sources: WIPO trademarks, UK government spending, corporate structures from SEC filings and the Federal Reserve
• Current focus: every bank licence in the world
• Next targets: business licences, other financial licences, non-profit data, government gazettes
• Driven by user demand, or where there is structural benefit (e.g. corporate relationship data)
• ‘Open’ is critical to both mission and quality
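One naive way to match an external record (say, a trademark owner's name) to a company is to compare normalised names within a single jurisdiction. The Python sketch below uses that approach purely for illustration. It is not OpenCorporates' matching method, and the suffix list, the function names and the record shape (`id`, `name`, `jurisdiction_code` keys) are hypothetical.

```python
import re

# Legal-form suffixes dropped before comparing names (illustrative, far from complete)
SUFFIXES = {"ltd", "limited", "plc", "inc", "incorporated", "llc", "corp", "corporation", "co"}


def normalise_name(name: str) -> str:
    """Lowercase, spell out '&', keep alphanumeric tokens, strip trailing legal forms."""
    tokens = re.findall(r"[a-z0-9]+", name.lower().replace("&", " and "))
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_record(record_name, jurisdiction, companies):
    """Return ids of companies in `jurisdiction` whose normalised name equals the record's."""
    key = normalise_name(record_name)
    return [c["id"] for c in companies
            if c["jurisdiction_code"] == jurisdiction and normalise_name(c["name"]) == key]


if __name__ == "__main__":
    companies = [  # hypothetical records
        {"id": "gb/01234567", "name": "ACME WIDGETS LIMITED", "jurisdiction_code": "gb"},
        {"id": "ie/765432", "name": "Acme Widgets Ltd", "jurisdiction_code": "ie"},
    ]
    print(match_record("Acme Widgets Ltd.", "gb", companies))
```

Returning a list rather than a single id is deliberate: names are not unique, so more than one hit signals that a stronger key, such as a register number, is needed.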

Common proprietary data quality issues (problem: cause)
• Data accuracy: Data is re-keyed. Few eyeballs. Often little downside to lying.
• Gaps in data: High (and often duplicated) cost of data entry. Limited to payers.
• Lack of granularity: Legacy systems/data models are hard to re-engineer in a closed world.
• Errors go uncorrected: Few feedback mechanisms.
• Black box/no provenance: Can’t reveal (sometimes dubious) sources. Limits usefulness/trust.
• Isolated: Proprietary IDs are internal identifiers and are barriers to sharing and to improved data quality.

v0.4 just released

Just released: v0.4
• Search by registered address, plus search for companies starting with a given phrase (e.g. ‘Barclays Bank’)
• Filter by multiple jurisdictions (e.g. Ireland and UK)
• Filter by country (e.g. US)
• Richer filtering of inactive and branch companies
• A new nonprofit filter, to restrict results to (or exclude) companies with a nonprofit company type
• Users with API keys can now get addresses (and dates of birth) for directors/officers
• Search officers by address, date of birth, position or status
• More powerful date searching
• A completely new way of representing industry codes that is far more granular and allows more powerful search filtering
https://www.flickr.com/photos/usairforce/6904504692
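The Python sketch below builds a companies-search URL, giving a rough sense of how several of these filters might combine in a single request. The base URL, the parameter names (`q`, `jurisdiction_code`, `country_code`, `registered_address`, `inactive`, `branch`, `nonprofit`, `api_token`) and the `|` separator for multiple jurisdictions are assumptions to check against the v0.4 API documentation. The function name is hypothetical, and the code only builds the URL without making a network call.

```python
from urllib.parse import urlencode

# Assumed base URL and parameter names; verify against the v0.4 API docs before use.
BASE = "https://api.opencorporates.com/v0.4"


def companies_search_url(q=None, *, jurisdictions=(), country=None, registered_address=None,
                         inactive=None, branch=None, nonprofit=None, api_token=None):
    """Hypothetical helper: build (but do not fetch) a companies-search URL."""
    params = {}
    if q:
        params["q"] = q
    if jurisdictions:
        # Several jurisdictions in one filter, joined with "|" (assumed separator)
        params["jurisdiction_code"] = "|".join(jurisdictions)
    if country:
        params["country_code"] = country
    if registered_address:
        params["registered_address"] = registered_address
    # Boolean filters: True restricts to, False excludes, None leaves unfiltered
    for name, value in (("inactive", inactive), ("branch", branch), ("nonprofit", nonprofit)):
        if value is not None:
            params[name] = "true" if value else "false"
    if api_token:
        params["api_token"] = api_token
    return f"{BASE}/companies/search?{urlencode(params)}"


if __name__ == "__main__":
    print(companies_search_url("bank", jurisdictions=("ie", "gb"), inactive=False))
```

For example, `companies_search_url("bank", jurisdictions=("ie", "gb"), inactive=False)` yields a URL ending in `q=bank&jurisdiction_code=ie%7Cgb&inactive=false`, since `urlencode` percent-encodes the `|`.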

What use is that?
• Companies with a registered address at the Empire State Building
• Companies with ‘condominium’ in the name in the US and Canada
• Officers who were born over 105 years ago but are still active (requires API token)
• Nonprofit companies in the UK and US with ‘political’ in the name, incorporated in 2014 (requires API token)
• Companies in the UK or Belgium with ‘tax’ in the name and with the EU industry code for “Accounting, bookkeeping and auditing activities; tax consultancy”*
• Companies based in Berlin with foreign branches
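The "born over 105 years ago, but still active" query turns on a small piece of date arithmetic: work out the latest date of birth that still means at least 105 years old, then search for officers born on or before it. A minimal Python sketch follows. The endpoint, the `date_of_birth` range syntax `from:to` with an open start, and the `inactive`/`api_token` parameter names are assumptions, and `old_active_officers_url` is a hypothetical helper.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` years earlier; 29 February maps to 28 February if needed."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # d is 29 February and the target year is not a leap year
        return d.replace(year=d.year - years, day=28)


def old_active_officers_url(api_token: str, min_age: int = 105,
                            today: Optional[date] = None) -> str:
    """Hypothetical helper: active officers born on or before the age cutoff."""
    today = today or date.today()
    cutoff = years_before(today, min_age)
    params = {
        "date_of_birth": f":{cutoff.isoformat()}",  # assumed "from:to" range, open start
        "inactive": "false",                         # assumed: exclude inactive officers
        "api_token": api_token,                      # dates of birth need an API token
    }
    return "https://api.opencorporates.com/v0.4/officers/search?" + urlencode(params)
```

Passing `today` explicitly keeps the function deterministic, which makes the cutoff easy to test for awkward dates such as 29 February.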

Innovative business model: Share-Alike or paid for
• Cross-subsidy model brings the best of both worlds
• Public benefit: free and open website
• Plus free access to data under a share-alike licence for open data projects
• Many eyes: improves quality
• Benefit from efficiencies of scale
• No black-box data: full provenance for all data (source + date retrieved)
• Gives added context and confidence
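The provenance point (source + date retrieved for all data) can be pictured as a value that never travels without its origin. The Python sketch below assumes that shape. The class and field names and the example URL are hypothetical, not OpenCorporates' actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Provenance:
    source_url: str     # where the value was taken from
    retrieved_on: date  # when it was retrieved


@dataclass(frozen=True)
class SourcedValue:
    value: str
    provenance: Provenance

    def citation(self) -> str:
        """Human-readable value with its source and retrieval date attached."""
        p = self.provenance
        return f"{self.value} (source: {p.source_url}, retrieved {p.retrieved_on.isoformat()})"


if __name__ == "__main__":
    name = SourcedValue("ACME WIDGETS LIMITED",  # hypothetical company and register URL
                        Provenance("https://register.example/companies/01234567", date(2015, 4, 1)))
    print(name.citation())
```

Making the records immutable (`frozen=True`) means a value cannot be silently edited without also producing a new provenance record.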

Who's using our data
• World Bank
• LinkedIn
• Bureau van Dijk
• Stripe
• Avention (OneSource)
• Creditsafe
• Palantir
• Funding Circle
• etc.

We don’t need crowdsourcing https://www.flickr.com/photos/oblongpictures/4516124048

We need Ninja Sourcing https://www.flickr.com/photos/danielygo/5531024732

Helping the open data community work together
• Missions: a platform for collaborating on data-sourcing, scraping and cleansing
• Turbot: a Docker-based framework for scrapers
• #FlashHacks: collaborative crowdscraping events for fun and the public good
Next FlashHacks: April 29, London & Berlin
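Here is a minimal Python sketch of what a scraper in this kind of framework might look like. It parses a register listing and emits one JSON record per line, each carrying its source URL and retrieval time. Emitting JSON lines to standard output is an assumption about how such a bot could hand data on, not Turbot's documented contract. The HTML, field names and URL are hypothetical, and no Docker packaging is shown.

```python
import json
from datetime import datetime, timezone
from html.parser import HTMLParser


class RegisterTableParser(HTMLParser):
    """Collects <td> text, one list per <tr>; header rows made only of <th> are skipped."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None  # cells of the row being read, or None outside a <tr>
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._in_cell = False
            self._row[-1] = self._row[-1].strip()
        elif tag == "tr":
            if self._row:  # skip rows with no <td> cells
                self.rows.append(self._row)
            self._row = None


def scrape(html, source_url):
    """Yield one record per data row, tagged with its source and retrieval time."""
    parser = RegisterTableParser()
    parser.feed(html)
    parser.close()
    retrieved_at = datetime.now(timezone.utc).isoformat()
    for row in parser.rows:
        if len(row) >= 2:
            yield {"company_number": row[0], "name": row[1],
                   "source_url": source_url, "retrieved_at": retrieved_at}


# Hypothetical register listing
SAMPLE = ("<table><tr><th>No.</th><th>Name</th></tr>"
          "<tr><td>01234567</td><td>Acme Widgets Ltd</td></tr></table>")

if __name__ == "__main__":
    for record in scrape(SAMPLE, "https://register.example/list"):
        print(json.dumps(record))  # one JSON object per line
```

Keeping the scraper's output a plain line-oriented stream means it can run in any container and be checked or cleansed by separate tools.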

API Strategy & Practice Conference
