Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
270
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
410
Getting started with OCCRP Data
pudo
0
1.5k
#nr16: Recherche-Tools
pudo
1
100
data.occrp.org
pudo
0
150
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
240
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
280
Dr. Freezefile
pudo
2
400
Intro presentation for Naivasha
pudo
1
160
Other Decks in Technology
See All in Technology
Кто отправит outbox? Валентин Удальцов, автор канала Пых
lamodatech
0
300
AIにどこまで任せる?実務で使える(かもしれない)AIエージェント設計の考え方
har1101
3
1.3k
登壇ネタの見つけ方 / How to find talk topics
pinkumohikan
3
300
米国国防総省のDevSecOpsライフサイクルをAWSのセキュリティサービスとOSSで実現
syoshie
2
820
本当に使える?AutoUpgrade の新機能を実践検証してみた
oracle4engineer
PRO
1
120
rubygem開発で鍛える設計力
joker1007
1
130
監視のこれまでとこれから/sakura monitoring seminar 2025
fujiwara3
10
2.9k
Amazon S3標準/ S3 Tables/S3 Express One Zoneを使ったログ分析
shigeruoda
2
390
In Praise of "Normal" Engineers (LDX3)
charity
3
1.2k
Observability в PHP без боли. Олег Мифле, тимлид Altenar
lamodatech
0
300
プロダクトエンジニアリング組織への歩み、その現在地 / Our journey to becoming a product engineering organization
hiro_torii
0
110
25分で解説する「最小権限の原則」を実現するための AWS「ポリシー」大全
opelab
9
2.2k
Featured
See All Featured
How GitHub (no longer) Works
holman
314
140k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
107
19k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.2k
Designing Experiences People Love
moore
142
24k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.4k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
4
200
Unsuck your backbone
ammeep
671
58k
Mobile First: as difficult as doing things right
swwweet
223
9.7k
Writing Fast Ruby
sferik
628
61k
The Cost Of JavaScript in 2023
addyosmani
51
8.4k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None