Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Technology
Transcript
Data doesn’t grow in tables: Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs.”
– An investigative reporter
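The reporter's request, keyword search plus tagging, can be sketched as a brute-force scan. This is a minimal sketch, assuming the documents have already been turned into plain text (for scanned pages that means OCR first); `load_texts` and `tag_documents` are hypothetical helper names and the sample memos are invented. At the 40 GB scale in the quote, rescanning every file for each query gets slow, which is why a full-text search index is the usual next step.

```python
import re
from pathlib import Path


def load_texts(folder):
    """Read every .txt file under `folder` (assumes text was already extracted)."""
    return {
        p.name: p.read_text(encoding="utf-8", errors="replace")
        for p in Path(folder).rglob("*.txt")
    }


def tag_documents(docs, keywords):
    """Return {doc_id: set of keywords found}, matching whole words, ignoring case."""
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    tags = {}
    for doc_id, text in docs.items():
        found = {kw for kw, pattern in patterns.items() if pattern.search(text)}
        if found:  # only keep documents that matched at least one keyword
            tags[doc_id] = found
    return tags


# Hypothetical sample documents standing in for load_texts("some/folder").
docs = {
    "memo1.txt": "Payment routed through Acme Holdings in Cyprus.",
    "memo2.txt": "Minutes of the board meeting.",
    "memo3.txt": "ACME HOLDINGS invoice, paid to an offshore account.",
}
tags = tag_documents(docs, ["Acme Holdings", "offshore"])
print(tags)
```

The tags are plain sets per document, so they can be saved alongside the files and filtered later, which covers the "tag the docs" part of the request in its simplest form.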
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
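Of the terms above, regular expressions are the easiest to try directly: they pull pattern-shaped items such as e-mail addresses or currency amounts out of raw text. That makes them a cheap complement to NER, which needs a trained model to recognise names of people, companies or places. A minimal sketch; the two patterns are illustrative assumptions, deliberately loose, and will both miss and over-match in real documents.

```python
import re

# Deliberately loose patterns (assumptions): good enough to surface leads,
# not to validate data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
AMOUNT = re.compile(r"(?:USD|EUR|\$|€)\s?\d{1,3}(?:,\d{3})*(?:\.\d+)?")


def extract_entities(text):
    """Pull pattern-shaped 'entities' out of free text with regular expressions."""
    return {
        "emails": EMAIL.findall(text),
        "amounts": AMOUNT.findall(text),
    }


text = (
    "Wire USD 1,250,000 to the account; confirm with j.doe@example.com. "
    "A second transfer of €40,000.50 followed."
)
print(extract_entities(text))
```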
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise: http://tabula.nerdpower.org
Docs in a cloud: http://documentcloud.org
Clustering, tagging, mining: http://overview.ap.org
Let them eat PDF: https://github.com/CrowData
All the visuals: Jigsaw
Spoken word magic: http://sayit.mysociety.org/
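Clustering tools like Overview help by grouping documents that use similar words. One common basis for such grouping is TF-IDF weighting compared by cosine similarity; this is a minimal, standard-library-only illustration of that idea, not Overview's implementation. The function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each doc id to a {term: tf * idf} dict; terms in every doc get weight 0."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)  # document frequency
    return {
        doc_id: {term: tf * math.log(n / df[term]) for term, tf in c.items()}
        for doc_id, c in counts.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(docs, doc_id):
    """Return (other_id, score) for the document closest to `doc_id`."""
    vecs = tfidf_vectors(docs)
    others = [(o, cosine(vecs[doc_id], vecs[o])) for o in vecs if o != doc_id]
    return max(others, key=lambda pair: pair[1])


docs = {
    "a": "Shell company registered in Panama pays consultancy fees",
    "b": "Consultancy fees paid by a shell company in Panama",
    "c": "Minutes of the school board meeting on the budget",
}
print(most_similar(docs, "a"))
```

Grouping a whole collection would repeat this comparison pairwise and merge close documents into clusters; the pairwise step is the core idea.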
What's missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
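One possible reading of "configurable pipelines", sketched here as an assumption rather than the authors' design: processing steps live in a registry, and a config file decides which steps run, in what order, and with which options, so a reporter can change the pipeline without editing code. The stage names, the JSON layout and every function are hypothetical; a real pipeline would also need OCR and search-indexing stages, which are omitted.

```python
import json
import re

# Registry mapping stage names (as used in the config) to functions.
STAGES = {}


def stage(name):
    """Decorator that registers a processing step under `name`."""
    def register(func):
        STAGES[name] = func
        return func
    return register


@stage("normalize_whitespace")
def normalize_whitespace(doc):
    doc["text"] = re.sub(r"\s+", " ", doc["text"]).strip()
    return doc


@stage("find_emails")
def find_emails(doc):
    doc["emails"] = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", doc["text"])
    return doc


@stage("flag_keywords")
def flag_keywords(doc, keywords=()):
    lowered = doc["text"].lower()
    doc["tags"] = sorted(kw for kw in keywords if kw.lower() in lowered)
    return doc


def run_pipeline(doc, config_json):
    """Run the stages listed in a JSON config, in order, with per-stage options."""
    config = json.loads(config_json)
    for step in config["stages"]:
        doc = STAGES[step["name"]](doc, **step.get("options", {}))
    return doc


CONFIG = """
{"stages": [
  {"name": "normalize_whitespace"},
  {"name": "find_emails"},
  {"name": "flag_keywords", "options": {"keywords": ["offshore", "invoice"]}}
]}
"""

result = run_pipeline(
    {"id": "doc-1", "text": "  Offshore   invoice\n sent to a.b@example.org "}, CONFIG
)
print(result)
```

Keeping the step list in data rather than code is the design choice here: adding or reordering a step means editing the JSON, while the registry guards against typos by failing loudly on unknown stage names.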
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo