Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Data doesn't grow in tables: dealing with large sets of documents
"We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs."
– An investigative reporter
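The reporter's request boils down to two operations: keyword search across many documents, and tagging the hits. The sketch below shows the idea behind it, an inverted index that maps each word to the documents containing it, plus a simple tag store. The function names (`build_index`, `search`, `tag`) and the sample documents are hypothetical and for illustration only; this is an in-memory toy, not how any of the tools in this deck are implemented.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercased word tokens: a crude stand-in for a real search analyzer.
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Build an inverted index: token -> set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, *keywords):
    """Return the IDs of documents containing every keyword (AND query)."""
    hits = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*hits) if hits else set()

def tag(tags, doc_ids, label):
    """Attach a label to each matching document."""
    for doc_id in doc_ids:
        tags[doc_id].add(label)

# Hypothetical documents; real ones would come from OCR or text extraction.
docs = {
    "memo1.txt": "Payment to the offshore company was approved.",
    "memo2.txt": "Meeting notes, no payment discussed.",
    "memo3.txt": "Offshore accounts listed below.",
}
index = build_index(docs)
tags = defaultdict(set)
tag(tags, search(index, "offshore"), "offshore")
print(sorted(search(index, "offshore", "payment")))  # ['memo1.txt']
print(dict(tags))
```

At 40 GB this would not fit comfortably in memory; a search server such as Elasticsearch builds the same kind of index on disk and adds ranking and phrase queries on top.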
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular expressions
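Regular expressions are often the first tool reached for when pulling structured facts out of text, and they also show why NER exists. The sketch below extracts email addresses and ISO dates reliably, but its "name" pattern is only a capitalisation heuristic, a stand-in for a real NER model. The `PATTERNS` table and `extract` function are hypothetical, written for this example.

```python
import re

# Hypothetical patterns for illustration; real documents need far more care.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    # Crude name heuristic: two to four capitalised words in a row.
    # A real NER model uses context, not just capitalisation.
    "name": re.compile(r"\b(?:[A-Z][a-z]+\s+){1,3}[A-Z][a-z]+\b"),
}

def extract(text):
    """Return {label: [matches]} for every pattern."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}

print(extract("Transfer approved by Maria Lopez on 2013-05-02; contact m.lopez@example.com."))
```

The name heuristic finds "Maria Lopez" here, but it would just as happily flag a company such as "Offshore Holdings Ltd" as a person, which is the kind of distinction NER is meant to make.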
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise: http://tabula.nerdpower.org
Docs in a cloud: http://documentcloud.org
Clustering, tagging, mining: http://overview.ap.org
Let them eat PDF: https://github.com/CrowData
All the visuals: Jigsaw
Spoken word magic: http://sayit.mysociety.org/
What's missing?
• Easy-to-use Elasticsearch
• Commercial-grade OCR
• Configurable pipelines
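One way to read "configurable pipelines" is a set of named processing steps that can be chained from a plain configuration list instead of being hard-coded. The sketch below assumes that interpretation; the stage names, the `stage` registration decorator, `run_pipeline`, and the keyword watch list are all hypothetical. In a real setup, OCR or text extraction would be the first stage.

```python
import re
from typing import Callable, Dict, List

Doc = Dict[str, object]
Stage = Callable[[Doc], Doc]

STAGES: Dict[str, Stage] = {}

def stage(name: str):
    """Register a processing step under a name usable in configuration."""
    def register(fn: Stage) -> Stage:
        STAGES[name] = fn
        return fn
    return register

@stage("normalize_whitespace")
def normalize_whitespace(doc: Doc) -> Doc:
    # Collapse runs of spaces and newlines (common after OCR); updates doc in place.
    doc["text"] = " ".join(str(doc["text"]).split())
    return doc

@stage("find_emails")
def find_emails(doc: Doc) -> Doc:
    doc["emails"] = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", str(doc["text"]))
    return doc

@stage("tag_keywords")
def tag_keywords(doc: Doc) -> Doc:
    # Hypothetical watch list; in practice it would come from configuration too.
    keywords = {"offshore", "invoice"}
    words = set(re.findall(r"\w+", str(doc["text"]).lower()))
    doc["tags"] = sorted(keywords & words)
    return doc

def run_pipeline(doc: Doc, config: List[str]) -> Doc:
    """Apply the configured stages in order."""
    for name in config:
        doc = STAGES[name](doc)
    return doc

config = ["normalize_whitespace", "find_emails", "tag_keywords"]
print(run_pipeline({"text": "Offshore  invoice\nsent to a.b@example.org"}, config))
```

Because stages are looked up by name, a newsroom could reorder or drop steps by editing the list, without touching the code.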
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo