
Getting started with OCCRP Data

A quick overview of the data platform published by the Organized Crime and Corruption Reporting Project. It provides an easy-to-use tool to background people, search leaked data and share information between reporters.

Friedrich Lindenberg

December 05, 2018

  1. Leaks, court documents, news archives, company registries, persons of interest, gazettes, customs declarations, licenses and concessions, sanctions lists, land registries, procurement awards, voter databases, regulatory filings, air and maritime registers.

    300+ datasets, 27,000,000+ documents, 450,000,000+ entities.
  2. How does it work?

    We map all data into a common language of corruption investigations: companies, people, contracts, emails, documents…
    • We import leaked data from confidential sources and the open web.
    • To add context, we also regularly scrape over 200 online sources.
  3. Backgrounding

    Quick way to check:
    • Whether a person is sanctioned or involved in politics or crime.
    • Mentions in historic leaks like WikiLeaks Cables, ICIJ, HackingTeam, Kazaword, …
    • Many official documents from offshore jurisdictions, Eastern Europe, and Africa.
  4. What is OCCRP Data? 2

    A secure place to upload and search your own documents.
  5. Upload

    We import and preview a wide range of document types, and do image text recognition and entity extraction for: PDF, Word, Excel, E-Mail, PST, mbox, Zip, RAR, Tarballs, 7z, Access, SQLite, DBF, ODS, ODF, CSV, images, TIFF, video and audio metadata, XML, plain text, etc.
  6. Access Control

    Additional datasets are available to trusted reporters. You will be granted access when you work on a major OCCRP cooperation.
    Datasets can be shared publicly, with project teams or individual users. Reporters can also upload and share documents.
  7. What is OCCRP Data? 3

    Find links between all documents, people and companies in our archive.
  8. Finding links

    Documents, people and companies are connected through tags, such as names, emails, phone numbers, addresses or even IBANs.
  9. Finding links

    We perform cross-referencing on large sets of data, e.g. to match all members of a country’s parliament against leaked offshore finance records, luxury real estate or foreign company ownership.
  10. Use the technology

    The technology behind OCCRP Data, Aleph, is a re-usable open source package. We support technologists who want to set up a copy in-house or on their own servers. Contributions, translations and ideas: https://github.com/alephdata
  11. “Truth cannot penetrate a closed mind. If all places in the universe are in the Aleph, then all stars, all lamps, all sources of light are in it, too.”

    J.L. Borges
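
The "Finding links" slides describe documents, people and companies being connected through shared tags such as names, emails, phone numbers or IBANs. A minimal Python sketch of that idea follows. It is not Aleph's actual implementation: the function names `normalize` and `find_links`, the record identifiers and all sample values are hypothetical, and the normalization rules are deliberately simplistic assumptions.

```python
import re
from collections import defaultdict
from itertools import combinations


def normalize(kind, value):
    """Reduce a tag to a canonical form so trivial variants compare equal.

    These rules are illustrative assumptions, not Aleph's own logic.
    """
    value = value.strip()
    if kind == "email":
        return value.lower()
    if kind == "phone":
        # Keep digits only, dropping "+", spaces and punctuation.
        return re.sub(r"\D", "", value)
    if kind == "iban":
        # Remove whitespace and upper-case the result.
        return re.sub(r"\s", "", value).upper()
    # Names and addresses: lower-case and collapse runs of whitespace.
    return " ".join(value.lower().split())


def find_links(records):
    """Map each pair of record ids to the set of normalized tags they share."""
    # Inverted index: normalized tag -> ids of the records carrying it.
    index = defaultdict(set)
    for record_id, tags in records.items():
        for kind, value in tags:
            index[(kind, normalize(kind, value))].add(record_id)

    # Every pair of records under the same tag is a candidate link.
    links = defaultdict(set)
    for tag, ids in index.items():
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].add(tag)
    return dict(links)


# Fictional sample records, each a list of (kind, value) tags.
records = {
    "leak/email-17": [
        ("email", "J.Doe@Example.org"),
        ("phone", "+44 20 7946 0000"),
    ],
    "registry/acme-ltd": [
        ("name", "Acme  Holdings Ltd"),
        ("email", "j.doe@example.org"),
    ],
    "gazette/notice-3": [
        ("name", "acme holdings ltd"),
        ("iban", "GB00 TEST 0000 0000 0000 00"),
    ],
}

links = find_links(records)
for pair, shared in sorted(links.items()):
    print(pair, sorted(shared))
```

In this sketch the leaked email and the registry entry are linked by a shared email address, and the registry entry and the gazette notice by a shared company name. A shared tag is only a lead for a reporter to verify; common names in particular will produce false matches.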