Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Transcript
Data doesn’t grow in tables
Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
– An investigative reporter
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
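As a rough illustration of the last item, regular expressions can cover the reporter's request above: search a set of documents for keywords and tag the ones that match. This is a minimal sketch, not a tool from the talk. The document names, their text, the tag names and the `tag_documents` helper are all hypothetical, and it assumes the text has already been extracted (for example by OCR).

```python
import re

# Hypothetical corpus: document name -> text already extracted (e.g. by OCR).
documents = {
    "contract_001.txt": "Payment of 40,000 USD to Acme Holdings Ltd. on 2013-05-02.",
    "memo_017.txt": "Meeting notes. No payments discussed.",
    "letter_042.txt": "Acme Holdings Ltd. requested a second payment on 2013-06-11.",
}

# Hypothetical tags, each defined by a regular expression.
tag_patterns = {
    "acme": re.compile(r"\bacme holdings\b", re.IGNORECASE),
    "payment": re.compile(r"\bpayments?\b", re.IGNORECASE),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def tag_documents(docs, patterns):
    """Return {document name: sorted list of tags whose pattern matches the text}."""
    tags = {}
    for name, text in docs.items():
        tags[name] = sorted(
            tag for tag, pattern in patterns.items() if pattern.search(text)
        )
    return tags


tagged = tag_documents(documents, tag_patterns)
for name, tags in tagged.items():
    print(name, tags)
```

Plain pattern matching like this only finds what the patterns spell out. Spotting people or organisations that were not anticipated in advance is closer to what NER is for.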
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
What's missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo