Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
290
2
Share
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
430
Getting started with OCCRP Data
pudo
0
1.7k
#nr16: Recherche-Tools
pudo
1
120
data.occrp.org
pudo
0
180
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
250
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
300
Dr. Freezefile
pudo
2
450
Intro presentation for Naivasha
pudo
1
180
Other Decks in Technology
See All in Technology
2026-04-02 IBM Bobオンボーディング入門
yutanonaka
0
260
機能・非機能の学びを一つに!Agent Skillsで月間レポート作成始めてみた / Unifying Bug & Infra Insights — Building Monthly Quality Reports with Agent Skills
bun913
5
3.9k
Zero Data Loss Autonomous Recovery Service サービス概要
oracle4engineer
PRO
4
14k
システムは「動く」だけでは 足りない - 非機能要件・分散システム・トレードオフの基礎
nwiizo
25
7.8k
さくらのAI Engineから始める クラウドネイティブ意識
melonps
0
130
今年60歳のおっさんCBになる
kentapapa
1
350
【Findy FDE登壇_2026_04_14】— 現場課題を本気で解いてたら、FDEになってた話
miyatakoji
0
850
Bluesky Meetup in Tokyo vol.4 - 2023to2026
shinoharata
0
140
Autonomous Database - Dedicated 技術詳細 / adb-d_technical_detail_jp
oracle4engineer
PRO
5
13k
ASTのGitHub CopilotとCopilot CLIの現在地をお話しします/How AST Operates GitHub Copilot and Copilot CLI
aeonpeople
1
210
申請待ちゼロへ!AWS × Entra IDで実現した「権限付与」のセルフサービス化
mhrtech
1
270
BIツール「Omni」の紹介 @Snowflake中部UG
sagara
0
260
Featured
See All Featured
The Cult of Friendly URLs
andyhume
79
6.8k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.2k
Making Projects Easy
brettharned
120
6.6k
Balancing Empowerment & Direction
lara
5
1k
Accessibility Awareness
sabderemane
0
94
Claude Code のすすめ
schroneko
67
220k
jQuery: Nuts, Bolts and Bling
dougneiner
66
8.4k
Building a Scalable Design System with Sketch
lauravandoore
463
34k
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
1
180
How to Build an AI Search Optimization Roadmap - Criteria and Steps to Take #SEOIRL
aleyda
1
2k
Scaling GitHub
holman
464
140k
Why Our Code Smells
bkeepers
PRO
340
58k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None