Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
280
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
430
Getting started with OCCRP Data
pudo
0
1.6k
#nr16: Recherche-Tools
pudo
1
120
data.occrp.org
pudo
0
170
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
250
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
300
Dr. Freezefile
pudo
2
440
Intro presentation for Naivasha
pudo
1
180
Other Decks in Technology
See All in Technology
大規模な組織におけるAI Agent活用の促進と課題
lycorptech_jp
PRO
5
6.8k
LLM活用の壁を超える:リクルートR&Dの戦略と打ち手
recruitengineers
PRO
1
170
Vertex AI Agent Engine で学ぶ「記憶」の設計
tkikuchi
0
110
LY Tableauでの Tableau x AIの実践 (at Tableau Now! - 2026-02-26)
yoshitakaarakawa
0
960
社内ワークショップで終わらせない 業務改善AIエージェント開発
lycorptech_jp
PRO
1
400
オンプレとGoogle Cloudを安全に繋ぐための、セキュア通信の勘所
waiwai2111
3
1k
バクラクのSREにおけるAgentic AIへの挑戦/Our Journey with Agentic AI
taddy_919
1
480
【Developers Summit 2026】Memory Is All You Need:コンテキストの「最適化」から「継続性」へ ~RAGを進化させるメモリエンジニアリングの最前線~
shisyu_gaku
5
830
俺の失敗を乗り越えろ!メーカーの開発現場での失敗談と乗り越え方 ~ゆるゆるチームリーダー編~
spiddle
0
400
作るべきものと向き合う - ecspresso 8年間の開発史から学ぶ技術選定 / 技術選定con findy 2026
fujiwara3
6
1.6k
WBCの解説は生成AIにやらせよう - 生成AIで野球解説者AI Agentを実現する / Baseball Commentator AI Agent for Gemini
shinyorke
PRO
0
300
バニラVisaギフトカードを棄てるのは結構大変
meow_noisy
0
160
Featured
See All Featured
Utilizing Notion as your number one productivity tool
mfonobong
3
240
AI Search: Where Are We & What Can We Do About It?
aleyda
0
7.1k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.4k
What does AI have to do with Human Rights?
axbom
PRO
0
2k
Measuring Dark Social's Impact On Conversion and Attribution
stephenakadiri
1
140
Six Lessons from altMBA
skipperchong
29
4.2k
Rails Girls Zürich Keynote
gr2m
96
14k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
199
72k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Optimising Largest Contentful Paint
csswizardry
37
3.6k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
35
2.4k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None