Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Transcript
Data doesn’t grow in tables
Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
– An investigative reporter
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
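The keyword search and tagging that the reporter asks for can be sketched with regular expressions, one of the terms listed above. This is a minimal illustration in Python, not a tool from the talk: the keyword list, the tag names, and the `tag_document` and `tag_corpus` helpers are all hypothetical, and it assumes the documents are already plain text (scanned documents would need OCR first).

```python
import re

# Hypothetical keywords and tag names, chosen only for illustration.
KEYWORDS = {
    "offshore": re.compile(r"\boffshore\b", re.IGNORECASE),
    "invoice": re.compile(r"\binvoice(?:s)?\b", re.IGNORECASE),
}


def tag_document(text):
    """Return the sorted list of tags whose pattern occurs in the text."""
    return sorted(tag for tag, pattern in KEYWORDS.items() if pattern.search(text))


def tag_corpus(docs):
    """Map each document name to its tags, skipping documents with no match."""
    tagged = {}
    for name, text in docs.items():
        tags = tag_document(text)
        if tags:
            tagged[name] = tags
    return tagged


if __name__ == "__main__":
    # Toy corpus standing in for text extracted from real documents.
    corpus = {
        "memo.txt": "Two invoices were sent to an offshore firm.",
        "note.txt": "Lunch at noon.",
    }
    print(tag_corpus(corpus))
```

A loop like this reads every document on every query, so it only suits small collections; at the 40 GB scale mentioned in the quote, a search index is the more realistic choice.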
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise: http://tabula.nerdpower.org
Docs in a cloud: http://documentcloud.org
Clustering, tagging, mining: http://overview.ap.org
Let them eat PDF: https://github.com/CrowData
All the visuals: Jigsaw
Spoken word magic: http://sayit.mysociety.org/
What's missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo