Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
260
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
400
Getting started with OCCRP Data
pudo
0
1.5k
#nr16: Recherche-Tools
pudo
1
96
data.occrp.org
pudo
0
150
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
240
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
280
Dr. Freezefile
pudo
2
390
Intro presentation for Naivasha
pudo
1
160
Other Decks in Technology
See All in Technology
関東Kaggler会LT: 人狼コンペとLLM量子化について
nejumi
3
580
RECRUIT TECH CONFERENCE 2025 プレイベント【高橋】
recruitengineers
PRO
0
160
プロダクトエンジニア構想を立ち上げ、プロダクト志向な組織への成長を続けている話 / grow into a product-oriented organization
hiro_torii
1
200
現場の種を事業の芽にする - エンジニア主導のイノベーションを事業戦略に装着する方法 -
kzkmaeda
2
2.1k
ビジネスモデリング道場 目的と背景
masuda220
PRO
9
520
速くて安いWebサイトを作る
nishiharatsubasa
10
13k
Swiftの “private” を テストする / Testing Swift "private"
yutailang0119
0
130
Developers Summit 2025 浅野卓也(13-B-7 LegalOn Technologies)
legalontechnologies
PRO
0
720
アジャイル開発とスクラム
araihara
0
170
N=1から解き明かすAWS ソリューションアーキテクトの魅力
kiiwami
0
130
管理者しか知らないOutlookの裏側のAIを覗く#AzureTravelers
hirotomotaguchi
2
400
君も受託系GISエンジニアにならないか
sudataka
2
430
Featured
See All Featured
Navigating Team Friction
lara
183
15k
Learning to Love Humans: Emotional Interface Design
aarron
273
40k
The Power of CSS Pseudo Elements
geoffreycrofte
75
5.5k
The Cost Of JavaScript in 2023
addyosmani
47
7.3k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
133
33k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
30
4.6k
Practical Orchestrator
shlominoach
186
10k
The Pragmatic Product Professional
lauravandoore
32
6.4k
A Philosophy of Restraint
colly
203
16k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
330
21k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
Unsuck your backbone
ammeep
669
57k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None