Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Transcript
Data doesn’t grow in tables
Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
– An investigative reporter
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
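As a rough illustration of the regular-expression entry in the lingo, and of the reporter's request to search documents for keywords and tag them, here is a minimal sketch in Python. It is not taken from the talk: the function name `tag_documents`, the sample file names, and the sample keywords are all hypothetical, and it assumes the documents have already been converted to plain text (for scans, that would mean running OCR first).

```python
import re


def tag_documents(docs, keywords):
    """Return {document name: sorted list of keywords found in its text}.

    `docs` maps a (hypothetical) document name to its plain text.
    Matching is case-insensitive and restricted to whole words.
    """
    # One compiled pattern per keyword; \b keeps "tax" from matching "taxi".
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    tags = {}
    for name, text in docs.items():
        found = sorted(kw for kw, pat in patterns.items() if pat.search(text))
        if found:  # only documents with at least one hit get tagged
            tags[name] = found
    return tags


# Made-up sample data, for illustration only.
docs = {
    "memo.txt": "The offshore account was opened in 2009.",
    "letter.txt": "Nothing of note; the taxi was late.",
    "contract.txt": "Payment routed via an Offshore shell company; tax due.",
}

print(tag_documents(docs, ["offshore", "tax"]))
```

A loop like this works for a handful of files. For tens of gigabytes, a search index is the more realistic route.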
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
What's missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo