Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
280
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
420
Getting started with OCCRP Data
pudo
0
1.6k
#nr16: Recherche-Tools
pudo
1
110
data.occrp.org
pudo
0
170
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
250
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
290
Dr. Freezefile
pudo
2
440
Intro presentation for Naivasha
pudo
1
170
Other Decks in Technology
See All in Technology
usermode linux without MMU - fosdem2026 kernel devroom
thehajime
0
230
Introduction to Bill One Development Engineer
sansan33
PRO
0
360
Codex 5.3 と Opus 4.6 にコーポレートサイトを作らせてみた / Codex 5.3 vs Opus 4.6
ama_ch
0
150
Agile Leadership Summit Keynote 2026
m_seki
1
610
名刺メーカーDevグループ 紹介資料
sansan33
PRO
0
1k
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
6
68k
Context Engineeringの取り組み
nutslove
0
340
ブロックテーマでサイトをリニューアルした話 / 2026-01-31 Kansai WordPress Meetup
torounit
0
460
Embedded SREの終わりを設計する 「なんとなく」から計画的な自立支援へ
sansantech
PRO
3
2.4k
モダンUIでフルサーバーレスなAIエージェントをAmplifyとCDKでサクッとデプロイしよう
minorun365
4
190
Data Hubグループ 紹介資料
sansan33
PRO
0
2.7k
Context Engineeringが企業で不可欠になる理由
hirosatogamo
PRO
3
570
Featured
See All Featured
The Illustrated Guide to Node.js - THAT Conference 2024
reverentgeek
0
250
Unlocking the hidden potential of vector embeddings in international SEO
frankvandijk
0
170
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
359
30k
The Mindset for Success: Future Career Progression
greggifford
PRO
0
240
Claude Code のすすめ
schroneko
67
210k
30 Presentation Tips
portentint
PRO
1
210
HU Berlin: Industrial-Strength Natural Language Processing with spaCy and Prodigy
inesmontani
PRO
0
210
Google's AI Overviews - The New Search
badams
0
900
Docker and Python
trallard
47
3.7k
Marketing to machines
jonoalderson
1
4.6k
We Are The Robots
honzajavorek
0
160
KATA
mclloyd
PRO
34
15k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None