Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
280
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
420
Getting started with OCCRP Data
pudo
0
1.6k
#nr16: Recherche-Tools
pudo
1
110
data.occrp.org
pudo
0
160
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
250
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
290
Dr. Freezefile
pudo
2
420
Intro presentation for Naivasha
pudo
1
170
Other Decks in Technology
See All in Technology
AWSを使う上で最低限知っておきたいセキュリティ研修を社内で実施した話 ~みんなでやるセキュリティ~
maimyyym
2
260
AWS Trainium3 をちょっと身近に感じたい
bigmuramura
1
140
30分であなたをOmniのファンにしてみせます~分析画面のクリック操作をそのままコード化できるAI-ReadyなBIツール~
sagara
0
120
コミューンのデータ分析AIエージェント「Community Sage」の紹介
fufufukakaka
0
470
Overture Maps Foundationの3年を振り返る
moritoru
0
170
打 造 A I 驅 動 的 G i t H u b ⾃ 動 化 ⼯ 作 流 程
appleboy
0
270
ガバメントクラウド利用システムのライフサイクルについて
techniczna
0
190
Reinforcement Fine-tuning 基礎〜実践まで
ch6noota
0
170
RAG/Agent開発のアップデートまとめ
taka0709
0
150
手動から自動へ、そしてその先へ
moritamasami
0
300
会社紹介資料 / Sansan Company Profile
sansan33
PRO
11
390k
A Compass of Thought: Guiding the Future of Test Automation ( #jassttokai25 , #jassttokai )
teyamagu
PRO
1
250
Featured
See All Featured
Art, The Web, and Tiny UX
lynnandtonic
303
21k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
Building a Scalable Design System with Sketch
lauravandoore
463
34k
The World Runs on Bad Software
bkeepers
PRO
72
12k
Build The Right Thing And Hit Your Dates
maggiecrowley
38
3k
How GitHub (no longer) Works
holman
316
140k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.3k
Stop Working from a Prison Cell
hatefulcrawdad
273
21k
Measuring & Analyzing Core Web Vitals
bluesmoon
9
700
Scaling GitHub
holman
464
140k
The Illustrated Children's Guide to Kubernetes
chrisshort
51
51k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
7.9k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None