Slide 1

Slide 1 text

Getting started with OCCRP Data

Slide 2

Slide 2 text

What is OCCRP Data?
A platform for investigative reporters working on cross-border cooperation.
https://data.occrp.org

Slide 3

Slide 3 text

What is OCCRP Data?
1. Search through hundreds of leaks and scraped databases.

Slide 4

Slide 4 text

Leaks, court documents, news archives, company registries, persons of interest, gazettes, customs declarations, licenses and concessions, sanctions lists, land registries, procurement awards, voter databases, regulatory filings, air and maritime registers
• 300+ datasets
• 27,000,000+ documents
• 450,000,000+ entities

Slide 5

Slide 5 text

Get an overview of the archive

Slide 6

Slide 6 text

How does it work?
• We import leaked data from confidential sources and the open web.
• To add context, we also regularly scrape over 200 online sources.
We map all data into a common language of corruption investigations: companies, people, contracts, emails, documents…

Slide 7

Slide 7 text

Backgrounding
Quick way to check:
• Whether a person is sanctioned or involved in politics or crime.
• Mentions in historic leaks like Wikileaks Cables, ICIJ, HackingTeam, Kazaword, …
• Many official documents from offshore jurisdictions, Eastern Europe, and Africa.

Slide 8

Slide 8 text

Backgrounding
Register to get more access.

Slide 9

Slide 9 text

Preview
Quickly view search results

Slide 10

Slide 10 text

What is OCCRP Data?
2. A secure place to upload and search your own documents.

Slide 11

Slide 11 text

Upload
Share and search your documents

Slide 12

Slide 12 text

Upload
We import and preview a wide range of document types, and do image text recognition and entity extraction for:
PDF, Word, Excel, E-Mail, PST, mbox, Zip, RAR, Tarballs, 7z, Access, SQLite, DBF, ODS, ODF, CSV, images, TIFF, video and audio metadata, XML, plain text, etc.

Slide 13

Slide 13 text

Access Control
Datasets can be shared publicly, with project teams or individual users. Reporters can also upload and share documents.
Additional datasets are available to trusted reporters. You will be granted access when you work on a major OCCRP cooperation.

Slide 14

Slide 14 text

What is OCCRP Data?
3. Find links between all documents, people and companies in our archive.

Slide 15

Slide 15 text

Finding links
Documents, people and companies are connected through tags, such as names, emails, phone numbers, addresses or even IBANs.
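
As an illustration only, the idea of connecting records through shared tags can be sketched in Python. The simplified regular expressions, the sample records, and the function names `extract_tags` and `find_links` are all assumptions made for this sketch; they do not describe how Aleph itself is implemented.

```python
import re
from collections import defaultdict
from itertools import combinations

# Simplified patterns, assumed for illustration; real extraction is far more careful.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def extract_tags(text):
    """Return the set of (kind, value) tags found in a piece of text."""
    tags = {("email", m.lower()) for m in EMAIL_RE.findall(text)}
    tags |= {("iban", m) for m in IBAN_RE.findall(text)}
    return tags


def find_links(records):
    """Map each pair of record ids to the set of tags they share."""
    by_tag = defaultdict(set)
    for record_id, text in records.items():
        for tag in extract_tags(text):
            by_tag[tag].add(record_id)
    links = defaultdict(set)
    for tag, ids in by_tag.items():
        # Every pair of records carrying the same tag is considered linked.
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].add(tag)
    return dict(links)


# Hypothetical sample records.
records = {
    "leak-1": "Payment to DE89370400440532013000, contact jane@example.org",
    "registry-7": "Director: Jane Doe <Jane@Example.org>",
    "court-3": "Account DE89370400440532013000 was frozen.",
}
links = find_links(records)
```

In this toy data, the leak is linked to the registry entry through a shared email address and to the court document through a shared IBAN, while the registry entry and the court document have no direct link.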

Slide 16

Slide 16 text

Finding links
We perform cross-referencing on large sets of data, e.g. to match all members of a country’s parliament against leaked offshore finance, luxury real estate or foreign company ownership records.
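
A minimal sketch of what cross-referencing two lists of names might look like, assuming exact matching on normalized names. The helper names `normalize_name` and `cross_reference` and the sample data are hypothetical; a production system would likely also use fuzzy matching and further attributes such as birth dates or countries, and every match is a lead to verify, not a finding.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and sort the name parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in stripped.lower())
    return " ".join(sorted(cleaned.split()))


def cross_reference(list_a, list_b):
    """Return (name_a, name_b) pairs whose normalized forms are equal."""
    index = {}
    for name in list_b:
        index.setdefault(normalize_name(name), []).append(name)
    return [(a, b) for a in list_a for b in index.get(normalize_name(a), [])]


# Hypothetical sample data.
parliament = ["Željko Novák", "Maria Petrova"]
offshore_officers = ["NOVAK, Zeljko", "John Smith"]
matches = cross_reference(parliament, offshore_officers)
```

Sorting the name parts lets "Surname, Given" and "Given Surname" spellings compare as equal, at the cost of occasionally matching two different people whose names share the same parts.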

Slide 17

Slide 17 text

Finding links
Link entities across sources

Slide 18

Slide 18 text

Use the technology
The technology behind OCCRP Data, Aleph, is a reusable open source package. We help technologists set up a copy in-house or on their own servers.
Contributions, translations and ideas: https://github.com/alephdata

Slide 19

Slide 19 text

https://data.occrp.org
Contact: [email protected]
Open source: github.com/alephdata

Slide 20

Slide 20 text

“Truth cannot penetrate a closed mind. If all places in the universe are in the Aleph, then all stars, all lamps, all sources of light are in it, too.” J.L. Borges