Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Technology
Transcript
Data doesn't grow in tables: Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs.” – An investigative reporter
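The reporter's request comes down to two operations: search each document for a list of keywords, and attach the keywords found as tags. The sketch below shows that idea in plain Python, assuming the documents are already available as text (for example after OCR). `tag_documents` and the sample file names are hypothetical and are not taken from any of the tools in this talk.

```python
import re
from typing import Dict, List, Set


def tag_documents(docs: Dict[str, str], keywords: List[str]) -> Dict[str, Set[str]]:
    """Return, for each document id, the set of keywords it mentions.

    Matching is case-insensitive and respects word boundaries, so the
    keyword "bank" does not match inside "embankment". Keywords are
    assumed to start and end with word characters (letters/digits).
    """
    # Compile one pattern per keyword; re.escape guards against keywords
    # that contain regex metacharacters such as "." or "+".
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    tags: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        # A document gets every keyword whose pattern occurs at least once.
        tags[doc_id] = {kw for kw, pat in patterns.items() if pat.search(text)}
    return tags


if __name__ == "__main__":
    # Hypothetical sample documents for illustration.
    docs = {
        "memo1.txt": "Payment routed through an offshore bank in 2012.",
        "memo2.txt": "Construction of the river embankment was delayed.",
    }
    print(tag_documents(docs, ["bank", "offshore"]))
```

A linear scan like this is fine for a few thousand files. At 40 GB you would normally build a search index once and query it, which is what a search engine such as Elasticsearch does.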
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
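Regular expressions are the simplest technique on this list. They find strings with a predictable shape, such as email addresses or dates, but they cannot recognise arbitrary person or company names, which is the job of NER models. The patterns below are deliberately simplified, hypothetical examples. Real-world formats are messier.

```python
import re
from typing import List

# Simplified, illustrative patterns; they will miss some valid addresses
# and dates and are not a substitute for a proper parser or NER model.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# ISO-style dates such as 2014-07-16
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract(pattern: re.Pattern, text: str) -> List[str]:
    """Return all non-overlapping matches of pattern in text, in order."""
    # Both patterns above use only non-capturing groups, so findall
    # returns the full matched strings.
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "Contact j.doe@example.org or press@correctiv.org. Filed 2014-07-16."
    print(extract(EMAIL_RE, sample))  # ['j.doe@example.org', 'press@correctiv.org']
    print(extract(DATE_RE, sample))   # ['2014-07-16']
```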
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise: http://tabula.nerdpower.org
Docs in a cloud: http://documentcloud.org
Clustering, tagging, mining: http://overview.ap.org
Let them eat PDF: https://github.com/CrowData
All the visuals: Jigsaw
Spoken word magic: http://sayit.mysociety.org/
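Clustering tools group documents by how similar their wording is. One common way to measure that similarity is TF-IDF weighting combined with cosine similarity. The sketch below uses that technique, though it is not necessarily what any of the tools listed above does internally. The function names are hypothetical, and the smoothed IDF formula is one of several standard variants.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for every document."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors: Dict[str, Dict[str, float]] = {}
    for doc_id, toks in tokens.items():
        counts = Counter(toks)
        total = len(toks) or 1
        vectors[doc_id] = {
            # Term frequency times smoothed IDF; rare terms weigh more.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = {
        "a": "Offshore company registered in Cyprus transfers funds.",
        "b": "Funds transferred by an offshore company in Cyprus.",
        "c": "Minutes of the city council meeting on road repairs.",
    }
    vecs = tfidf_vectors(docs)
    print(round(cosine(vecs["a"], vecs["b"]), 2), round(cosine(vecs["a"], vecs["c"]), 2))
```

A clustering step would build on these pairwise scores, for example by grouping documents whose similarity exceeds a chosen threshold.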
What's missing?
• Easy-to-use Elasticsearch
• Commercial-grade OCR
• Configurable pipelines
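A "configurable pipeline" can be as simple as a set of named processing stages plus a list that says which stages to run and in what order. That list could come from a config file, so a reporter can change it without touching the code. This is a minimal sketch under that assumption. The stage names, the `stage` decorator and `run_pipeline` are all hypothetical. An OCR stage (for example one wrapping an external OCR engine) could be registered the same way but is not implemented here.

```python
import re
from typing import Callable, Dict, List

# A document travels through the pipeline as a plain dict, e.g.
# {"id": "memo1.txt", "text": "..."}; each stage returns an updated copy.
Doc = Dict[str, object]
Stage = Callable[[Doc], Doc]

STAGES: Dict[str, Stage] = {}


def stage(name: str) -> Callable[[Stage], Stage]:
    """Decorator that registers a function under a name usable in configs."""
    def register(fn: Stage) -> Stage:
        STAGES[name] = fn
        return fn
    return register


@stage("normalize_whitespace")
def normalize_whitespace(doc: Doc) -> Doc:
    # OCR output often contains ragged line breaks and runs of spaces.
    return {**doc, "text": " ".join(str(doc["text"]).split())}


@stage("extract_emails")
def extract_emails(doc: Doc) -> Doc:
    # Simplified email pattern, for illustration only.
    emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", str(doc["text"]))
    return {**doc, "emails": emails}


@stage("word_count")
def word_count(doc: Doc) -> Doc:
    return {**doc, "words": len(str(doc["text"]).split())}


def run_pipeline(config: List[str], docs: List[Doc]) -> List[Doc]:
    """Apply the named stages, in config order, to every document."""
    # Fail early on typos in the config rather than halfway through a batch.
    unknown = [name for name in config if name not in STAGES]
    if unknown:
        raise KeyError(f"unknown pipeline stages: {unknown}")
    results = []
    for doc in docs:
        for name in config:
            doc = STAGES[name](doc)
        results.append(doc)
    return results


if __name__ == "__main__":
    config = ["normalize_whitespace", "extract_emails", "word_count"]
    raw = [{"id": "scan-001", "text": "Contact:\n  j.doe@example.org\n\nre: invoice"}]
    print(run_pipeline(config, raw))
```

Keeping every stage a pure function from document to document makes stages easy to reorder, test on their own, and rerun on a single file.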
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo