Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Transcript
Data doesn’t grow in tables
Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
– An investigative reporter
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
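The reporter's request (search documents for keywords, then tag them) can be roughly illustrated with regular expressions alone. The following is a minimal sketch, not something taken from the talk: it assumes the documents have already been converted to plain text (for example by OCR), and the tag names, patterns, and the functions `tag_document` and `tag_corpus` are hypothetical choices made for illustration.

```python
import re

# Hypothetical tags and patterns, chosen only for illustration.
TAG_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "amount": re.compile(r"(?:USD|EUR)\s?\d[\d,]*(?:\.\d+)?"),
    "offshore": re.compile(r"\boffshore\b", re.IGNORECASE),
}


def tag_document(text):
    """Return {tag: [matched strings]} for every pattern that hits in text."""
    hits = {}
    for tag, pattern in TAG_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[tag] = found
    return hits


def tag_corpus(docs):
    """Tag a {name: text} mapping; documents without any hit are left out."""
    tagged = {}
    for name, text in docs.items():
        hits = tag_document(text)
        if hits:
            tagged[name] = hits
    return tagged


if __name__ == "__main__":
    sample = {
        "a.txt": "Contact j.doe@example.org about the offshore firm.",
        "b.txt": "Nothing to see here.",
    }
    for name, hits in tag_corpus(sample).items():
        print(name, hits)
```

Pattern matching of this kind only finds strings with a predictable shape. Recognising names of people or companies is what NER is for, and a 40 GB collection would probably need a search index rather than a scan over every file.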
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise: http://tabula.nerdpower.org
Docs in a cloud: http://documentcloud.org
Clustering, tagging, mining: http://overview.ap.org
Let them eat PDF: https://github.com/CrowData
All the visuals: Jigsaw
Spoken word magic: http://sayit.mysociety.org/
What's missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo