Getting started with OCCRP Data

A quick overview of the data platform published by the Organized Crime and Corruption Reporting Project. It provides an easy-to-use tool to background people, search leaked data and share information between reporters.

Friedrich Lindenberg

December 05, 2018

Transcript

  1. Getting started with
    OCCRP Data

  2. What is OCCRP Data?
    A platform for investigative reporters
    working on cross-border collaborations.
    https://data.occrp.org

  3. What is OCCRP Data?
    1
    Search through hundreds of leaks and
    scraped databases.

  4. Leaks, court documents, news archives,
    company registries, persons of interest,
    gazettes, customs declarations, licenses
    and concessions, sanctions lists, land
    registries, procurement awards, voter
    databases, regulatory filings, air and
    maritime registers
    300+ datasets
    27,000,000+ documents
    450,000,000+ entities

  5. Get an overview
    of the archive

  6. How does it work?
    We map all data into a common language of
    corruption investigations: companies,
    people, contracts, emails, documents…
    • We import leaked data from confidential
    sources and the open web.
    • To add context, we also regularly scrape
    over 200 online sources.
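
The idea of mapping differently shaped sources into one shared vocabulary can be sketched roughly as follows. This is only an illustration and not the platform's actual data model: the `Entity` class, the schema names and both source record layouts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A record in a shared vocabulary (hypothetical, for illustration)."""
    schema: str                      # e.g. "Person" or "Company"
    name: str
    source: str                      # dataset the record came from
    properties: dict = field(default_factory=dict)


def from_company_registry(row: dict) -> Entity:
    # Assumed layout of a scraped company registry row.
    return Entity(
        schema="Company",
        name=row["company_name"].strip(),
        source="company_registry",
        properties={"jurisdiction": row.get("country", "")},
    )


def from_sanctions_list(row: dict) -> Entity:
    # Assumed layout of a sanctions list entry.
    return Entity(
        schema="Person",
        name=f'{row["first"]} {row["last"]}'.strip(),
        source="sanctions_list",
        properties={"program": row.get("program", "")},
    )


entities = [
    from_company_registry({"company_name": " Example Holdings Ltd ", "country": "cy"}),
    from_sanctions_list({"first": "Jane", "last": "Doe", "program": "EXAMPLE"}),
]

for entity in entities:
    print(entity.schema, "|", entity.name, "|", entity.source)
```

Once every source is converted by a small mapping function like these, search and linking code only has to understand one record shape.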

  7. Backgrounding
    Quick way to check:
    • If a person is sanctioned or involved in
    politics or crime.
    • Mentions in historic leaks like WikiLeaks
    Cables, ICIJ, HackingTeam, Kazaword, …
    • Many official documents from offshore
    jurisdictions, Eastern Europe, and Africa.

  8. Backgrounding
    Register to get
    more access.

  9. Preview
    Quickly view
    search results

  10. What is OCCRP Data?
    2
    A secure place to upload and search
    your own documents.

  11. Upload
    Share and search
    your documents

  12. Upload
    We import and preview a wide range of
    document types, and do image text
    recognition and entity extraction for:
    PDF, Word, Excel, E-Mail, PST, mbox, Zip,
    RAR, Tarballs, 7z, Access, SQLite, DBF, ODS,
    ODF, CSV, images, TIFF, video and audio
    metadata, XML, plain text, etc.

  13. Access Control
    Datasets can be shared publicly, with
    project teams or individual users. Reporters
    can also upload and share documents.
    Additional datasets are available to trusted
    reporters. You will be granted access when
    you work on a major OCCRP collaboration.
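
The three sharing levels on this slide (public, project team, individual user) could be modelled along these lines. This is a minimal sketch, not how the platform implements permissions: the `User` and `Dataset` classes and the `can_read` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    teams: set = field(default_factory=set)    # project teams the user belongs to


@dataclass
class Dataset:
    label: str
    public: bool = False
    teams: set = field(default_factory=set)    # project teams with access
    users: set = field(default_factory=set)    # individual user names with access


def can_read(user: User, dataset: Dataset) -> bool:
    """Readable if public, shared with the user, or shared with one of their teams."""
    if dataset.public:
        return True
    if user.name in dataset.users:
        return True
    return bool(user.teams & dataset.teams)


sanctions = Dataset("Sanctions lists", public=True)
leak = Dataset("Project leak", teams={"project-x"})
notes = Dataset("Personal uploads", users={"alice"})

alice = User("alice", teams={"project-x"})
bob = User("bob")

print(can_read(alice, leak), can_read(bob, leak))
```

Keeping the check in one function means search results, previews and downloads can all apply the same rule.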

  14. What is OCCRP Data?
    3
    Find links between all documents, people
    and companies in our archive

  15. Finding links
    Documents, people
    and companies are
    connected through
    tags, such as
    names, emails,
    phone numbers,
    addresses or even
    IBANs.
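
Linking records through shared tags can be illustrated with a small inverted index. This is a rough sketch under simplifying assumptions, not the platform's extraction pipeline: the regular expressions are deliberately loose, the function names are hypothetical, and the document IDs and contents are made up.

```python
import re
from collections import defaultdict

# Simplified patterns for illustration; real-world extraction needs validation.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IBAN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def extract_tags(text: str) -> set:
    """Return normalised (kind, value) tags found in a text."""
    tags = {("email", m.lower()) for m in EMAIL.findall(text)}
    tags |= {("iban", m) for m in IBAN.findall(text)}
    return tags


def link_by_tags(documents: dict) -> dict:
    """Map each tag that occurs in two or more documents to those documents."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for tag in extract_tags(text):
            index[tag].add(doc_id)
    return {tag: sorted(ids) for tag, ids in index.items() if len(ids) > 1}


documents = {
    "leak-001": "Please wire the fee to GB82WEST12345698765432 today.",
    "registry-17": "Director contact: J.Doe@example.org",
    "email-42": "From: j.doe@example.org, account GB82WEST12345698765432",
}

links = link_by_tags(documents)
for tag, doc_ids in links.items():
    print(tag, "->", doc_ids)
```

Here the email message ends up connected to the registry entry through the address and to the leaked document through the account number, even though the other two never mention each other.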

  16. Finding links
    We perform cross-referencing on large sets
    of data, e.g. to match all members of a
    country’s parliament against leaked offshore
    finance, luxury real estate or foreign
    company ownership.
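
Cross-referencing two lists of names might look roughly like this. It is a minimal sketch using Python's standard library rather than the matching method the platform actually uses: the `normalise` and `cross_reference` helpers, the 0.9 threshold and the sample names are all assumptions made for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the name parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in ascii_only.lower())
    return " ".join(sorted(cleaned.split()))


def cross_reference(left, right, threshold=0.9):
    """Yield (left_name, right_name, score) for pairs at or above the threshold."""
    for a in left:
        for b in right:
            score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
            if score >= threshold:
                yield a, b, round(score, 2)


parliament = ["Ján Novák", "Maria Example"]
offshore_officers = ["NOVAK, Jan", "M. Exemple", "Unrelated Person"]

matches = list(cross_reference(parliament, offshore_officers))
for match in matches:
    print(match)
```

Comparing every pair is quadratic, so for large datasets a real system would first narrow candidates (for example by shared name tokens or country) before scoring. Any match is a lead for a reporter to verify, not proof that two records describe the same person.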

  17. Finding links
    Link entities
    across sources

  18. Use the technology
    The technology behind OCCRP Data, Aleph,
    is a re-usable open source package.
    We help technologists set up a copy
    in-house or on their own servers.
    Contributions, translations and ideas:
    https://github.com/alephdata

  19. https://data.occrp.org
    Contact: [email protected]
    Open source: github.com/alephdata

  20. “Truth cannot penetrate a
    closed mind. If all places
    in the universe are in the
    Aleph, then all stars, all
    lamps, all sources of light
    are in it, too.”
    J.L. Borges
