Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
270
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
410
Getting started with OCCRP Data
pudo
0
1.5k
#nr16: Recherche-Tools
pudo
1
100
data.occrp.org
pudo
0
150
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
240
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
280
Dr. Freezefile
pudo
2
400
Intro presentation for Naivasha
pudo
1
160
Other Decks in Technology
See All in Technology
成長し続けるアプリのためのテストと設計の関係、そして意思決定の記録。
sansantech
PRO
0
140
How to Quickly Call American Airlines®️ U.S. Customer Care : Full Guide
flyaahelpguide
0
240
Copilot coding agentにベットしたいCTOが開発組織で取り組んだこと / GitHub Copilot coding agent in Team
tnir
0
120
マーケットプレイス版Oracle WebCenter Content For OCI
oracle4engineer
PRO
3
980
2025-07-06 QGIS初級ハンズオン「はじめてのQGIS」
kou_kita
0
180
DBのスキルで生き残る技術 - AI時代におけるテーブル設計の勘所
soudai
PRO
54
22k
Claude Code に プロジェクト管理やらせたみた
unson
7
4.8k
AIエージェントが書くのなら直接CloudFormationを書かせればいいじゃないですか何故AWS CDKを使う必要があるのさ
watany
14
5.9k
TLSから見るSREの未来
atpons
2
190
SEQUENCE object comparison - db tech showcase 2025 LT2
nori_shinoda
0
270
開発生産性を測る前にやるべきこと - 組織改善の実践 / Before Measuring Dev Productivity
kaonavi
14
8k
データ基盤からデータベースまで?広がるユースケースのDatabricksについて教えるよ!
akuwano
3
150
Featured
See All Featured
Docker and Python
trallard
44
3.5k
Why Our Code Smells
bkeepers
PRO
336
57k
Code Reviewing Like a Champion
maltzj
524
40k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.9k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
53
2.9k
Designing Experiences People Love
moore
142
24k
Become a Pro
speakerdeck
PRO
29
5.4k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Raft: Consensus for Rubyists
vanstee
140
7k
Side Projects
sachag
455
42k
Building Adaptive Systems
keathley
43
2.7k
We Have a Design System, Now What?
morganepeng
53
7.7k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None