
Getting started with OCCRP Data

A quick overview of the data platform published by the Organized Crime and Corruption Reporting Project. It provides an easy-to-use tool to background people, search leaked data and share information between reporters.


Friedrich Lindenberg

December 05, 2018


  1. Getting started with OCCRP Data

  2. What is OCCRP Data? A platform for investigative reporters working on cross-border collaborations: https://data.occrp.org
  3. What is OCCRP Data? (1) Search through hundreds of leaks and scraped databases.
  4. 300+ datasets, 27,000,000+ documents and 450,000,000+ entities, drawn from leaks, court documents, news archives, company registries, persons of interest, gazettes, customs declarations, licenses and concessions, sanctions lists, land registries, procurement awards, voter databases, regulatory filings, and air and maritime registers.
  5. Get an overview of the archive

  6. How does it work? We map all data into a common language of corruption investigations: companies, people, contracts, emails, documents…
    • We import leaked data from confidential sources and the open web.
    • To add context, we also regularly scrape over 200 online sources.
  7. Backgrounding: a quick way to check whether a person is sanctioned or involved in politics or crime.
    • Mentions in historic leaks such as the WikiLeaks cables, ICIJ, HackingTeam, Kazaword, …
    • Many official documents from offshore jurisdictions, Eastern Europe, and Africa.
  8. Backgrounding: register to get more access.

  9. Preview: quickly view search results.

  10. What is OCCRP Data? (2) A secure place to upload and search your own documents.
  11. Upload: share and search your documents.

  12. Upload: we import and preview a wide range of document types, and run text recognition on images (OCR) and entity extraction for: PDF, Word, Excel, email, PST, mbox, Zip, RAR, tarballs, 7z, Access, SQLite, DBF, ODS, ODF, CSV, images, TIFF, video and audio metadata, XML, plain text, etc.
  13. Access control: datasets can be shared publicly, with project teams or with individual users. Reporters can also upload and share documents. Additional datasets are available to trusted reporters; you will be granted access when you work on a major OCCRP collaboration.
  14. What is OCCRP Data? (3) Find links between all documents, people and companies in our archive.
  15. Finding links: documents, people and companies are connected through tags such as names, email addresses, phone numbers, addresses or even IBANs.
  16. Finding links: we cross-reference large sets of data, e.g. to match all members of a country’s parliament against leaked offshore finance, luxury real estate or foreign company ownership.
  17. Finding links: link entities across sources.

  18. Use the technology: OCCRP Data is built on Aleph, a reusable open-source package. We help technologists set up their own copy, in-house or on their own servers. Contributions, translations and ideas are welcome: https://github.com/alephdata
  19. https://data.occrp.org Contact: data@occrp.org Open source: github.com/alephdata

  20. “Truth cannot penetrate a closed mind. If all places in the universe are in the Aleph, then all stars, all lamps, all sources of light are in it, too.” (J.L. Borges)
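Slide 6 says all imported data is mapped into a common language of companies, people, contracts, emails and documents. The Python sketch below illustrates that idea under stated assumptions. The `Entity` class, the schema and property names, and the CSV column names (`reg_no`, `company_name`, `country`, `director`) are all hypothetical and do not reproduce Aleph's actual data model. The sketch turns one row of an imagined company registry into a company, a person and a directorship that links the two by ID.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical simplified entity: a schema name plus multi-valued properties."""
    id: str
    schema: str
    properties: dict[str, list[str]] = field(default_factory=dict)

    def add(self, prop: str, value: str) -> None:
        # Ignore blank cells so sparse source rows don't create empty properties.
        value = value.strip()
        if value:
            self.properties.setdefault(prop, []).append(value)


def make_id(prefix: str, *parts: str) -> str:
    # Derive a stable ID from the identifying values, so re-importing
    # the same row yields the same entity ID.
    digest = hashlib.sha1("|".join(parts).lower().encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:12]}"


def map_registry_row(row: dict[str, str]) -> list[Entity]:
    """Map one row of a hypothetical company-registry CSV to common entities."""
    company = Entity(make_id("company", row["country"], row["reg_no"]), "Company")
    company.add("name", row["company_name"])
    company.add("registrationNumber", row["reg_no"])
    company.add("jurisdiction", row["country"])

    # Keying a person on name alone is a simplification: namesakes collide.
    person = Entity(make_id("person", row["director"]), "Person")
    person.add("name", row["director"])

    # The relationship is an entity of its own that points at both sides by ID.
    link = Entity(make_id("directorship", company.id, person.id), "Directorship")
    link.add("organization", company.id)
    link.add("director", person.id)
    return [company, person, link]


row = {"reg_no": "HE 12345", "company_name": "Example Holdings Ltd",
       "country": "cy", "director": "Jane Doe"}
for entity in map_registry_row(row):
    print(entity.schema, entity.properties)
```

Every source ends up as the same kinds of entities. A director found in a registry can therefore be compared directly with a person of the same name in a leak.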
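Slide 12 mentions entity extraction on uploaded documents, and slide 15 lists emails and IBANs among the tags that connect records. The following sketch is a hedged illustration only. It pulls email addresses and IBANs out of plain text with regular expressions and keeps only the IBANs that pass the mod-97 check digits defined for IBANs (ISO 13616). The function names are hypothetical and the patterns are deliberately simple. This is not how Aleph's own extraction pipeline is implemented.

```python
import re

# Deliberately simple patterns; real-world extraction needs more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Country code, two check digits, then groups of four characters,
# optionally separated by single spaces, plus a short final group.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")


def iban_is_valid(candidate: str) -> bool:
    """Check an IBAN's length and its mod-97 check digits."""
    iban = candidate.replace(" ", "").upper()
    if not (15 <= len(iban) <= 34 and iban.isascii() and iban.isalnum()):
        return False
    # Move country code and check digits to the end, then map A=10 ... Z=35.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def extract_tags(text: str) -> dict[str, list[str]]:
    """Return email addresses and checksum-valid IBANs found in text."""
    emails = EMAIL_RE.findall(text)
    ibans = [m.group().replace(" ", "") for m in IBAN_RE.finditer(text)]
    return {"email": emails, "iban": [i for i in ibans if iban_is_valid(i)]}


sample = "Pay GB82 WEST 1234 5698 7654 32 and write to data@occrp.org."
print(extract_tags(sample))
```

The checksum step discards many false positives that a pattern alone would accept, such as a typo in one digit. It cannot tell whether an account actually exists.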
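Slide 13 describes three ways of sharing a dataset: publicly, with project teams, or with individual users. Below is a minimal sketch of such a read-permission check. It assumes a hypothetical model in which each dataset carries a public flag plus sets of team names and user names. The class and function names are illustrative and do not describe Aleph's actual permission system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    name: str
    teams: set[str] = field(default_factory=set)  # project teams the user belongs to


@dataclass
class Dataset:
    name: str
    public: bool = False
    teams: set[str] = field(default_factory=set)  # teams granted read access
    users: set[str] = field(default_factory=set)  # individually granted users


def can_read(user: Optional[User], dataset: Dataset) -> bool:
    """Return True if the (possibly anonymous) user may read the dataset."""
    if dataset.public:
        return True
    if user is None:  # anonymous visitors only see public datasets
        return False
    # Access is granted individually or through any shared project team.
    return user.name in dataset.users or bool(user.teams & dataset.teams)


leak = Dataset("example-leak", teams={"project-a"})
alice = User("alice", teams={"project-a"})
bob = User("bob")
print(can_read(alice, leak), can_read(bob, leak))
```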
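Slides 15 and 17 describe linking documents, people and companies through shared tags such as names, email addresses, phone numbers and IBANs. One straightforward way to sketch this is an inverted index from normalized tag values to document IDs, in which any two documents listed under the same key are linked. This is an assumption, not a description of Aleph's internals. The normalization rules below (lowercasing, keeping only the digits of phone numbers) are illustrative and much cruder than production matching needs.

```python
from collections import defaultdict
from itertools import combinations

Tag = tuple[str, str]  # (tag type, normalized value)


def normalize(tag_type: str, value: str) -> str:
    """Crude normalization so trivially different spellings share one key."""
    if tag_type == "phone":
        return "".join(ch for ch in value if ch.isdigit())
    if tag_type == "iban":
        return value.replace(" ", "").upper()
    return " ".join(value.lower().split())


def build_index(docs: dict[str, dict[str, list[str]]]) -> dict[Tag, set[str]]:
    """Inverted index: each normalized tag -> IDs of the documents carrying it."""
    index: dict[Tag, set[str]] = defaultdict(set)
    for doc_id, tags in docs.items():
        for tag_type, values in tags.items():
            for value in values:
                index[(tag_type, normalize(tag_type, value))].add(doc_id)
    return index


def find_links(index: dict[Tag, set[str]]) -> dict[tuple[str, str], set[Tag]]:
    """Pair up documents that share at least one tag, recording which tags."""
    links: dict[tuple[str, str], set[Tag]] = defaultdict(set)
    for tag, doc_ids in index.items():
        for a, b in combinations(sorted(doc_ids), 2):
            links[(a, b)].add(tag)
    return dict(links)


docs = {
    "email-17": {"email": ["J.Doe@example.com"], "phone": ["+44 20 7946 0000"]},
    "contract-3": {"email": ["j.doe@example.com"]},
    "invoice-9": {"phone": ["+44 (20) 7946-0000"]},
}
for pair, shared in find_links(build_index(docs)).items():
    print(pair, shared)
```

Pairing every document under a very common tag grows quadratically with the number of documents carrying it. A larger system would likely cap or down-weight such frequent values.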
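Slide 16 describes cross-referencing whole lists, for example matching a country's members of parliament against leaked offshore records. The sketch below shows the basic shape of such a batch match under simple assumptions:

- Each name is reduced to a comparison key by stripping accents, lowercasing, dropping punctuation and sorting the tokens.
- Two names match only if their keys are identical.

The function names are hypothetical. Exact-key matching misses typos and transliteration variants, so treat this as an illustration rather than Aleph's cross-referencing method.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Reduce a name to a crude comparison key: no accents, case, punctuation or order."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped.lower())
    return " ".join(sorted(cleaned.split()))


def cross_reference(watchlist: list[str], leaked: list[str]) -> list[tuple[str, str]]:
    """Return (watchlist name, leaked name) pairs whose keys are identical."""
    by_key: dict[str, list[str]] = defaultdict(list)
    for name in leaked:
        by_key[name_key(name)].append(name)
    # Index the leaked names once, then look up each watchlist entry by key.
    return [(person, hit) for person in watchlist for hit in by_key.get(name_key(person), [])]


parliament = ["José Pérez", "Anna Novak"]
offshore_officers = ["PEREZ, Jose", "John Smith", "Novák Anna"]
print(cross_reference(parliament, offshore_officers))
```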