Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
270
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
410
Getting started with OCCRP Data
pudo
0
1.5k
#nr16: Recherche-Tools
pudo
1
100
data.occrp.org
pudo
0
150
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
240
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
280
Dr. Freezefile
pudo
2
400
Intro presentation for Naivasha
pudo
1
160
Other Decks in Technology
See All in Technology
20250719_JAWS_kobe
takuyay0ne
1
160
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
5
39k
SRE with AI:実践から学ぶ、運用課題解決と未来への展望
yoshiiryo1
1
680
[SRE NEXT 2025] すみずみまで暖かく照らすあなたの太陽でありたい
carnappopper
2
880
Step Functions First - サーバーレスアーキテクチャの新しいパラダイム
taikis
1
260
複数のGemini CLIが同時開発する狂気 - Jujutsuが実現するAIエージェント協調の新世界
gunta
11
3k
「現場で活躍するAIエージェント」を実現するチームと開発プロセス
tkikuchi1002
6
960
組織内、組織間の資産保護に必要なアイデンティティ基盤と関連技術の最新動向
fujie
0
500
Microsoft Defender XDRで疲弊しないためのインシデント対応
sophiakunii
3
400
AWS Well-Architected から考えるオブザーバビリティの勘所 / Considering the Essentials of Observability from AWS Well-Architected
sms_tech
1
840
ゼロから始めるSREの事業貢献 - 生成AI時代のSRE成長戦略と実践 / Starting SRE from Day One
shinyorke
PRO
0
220
Digitization部 紹介資料
sansan33
PRO
1
4.6k
Featured
See All Featured
The Cost Of JavaScript in 2023
addyosmani
51
8.6k
A designer walks into a library…
pauljervisheath
207
24k
Git: the NoSQL Database
bkeepers
PRO
431
65k
Agile that works and the tools we love
rasmusluckow
329
21k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
GitHub's CSS Performance
jonrohan
1031
460k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
130
19k
BBQ
matthewcrist
89
9.7k
Speed Design
sergeychernyshev
32
1k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
2.9k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
A better future with KSS
kneath
238
17k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None