Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
openthebox.be - smart publications
Search
Niek Bartholomeus
October 02, 2019
Technology
190
0
Share
openthebox.be - smart publications
Extracting deep insights from boring documents: a real-life story
Niek Bartholomeus
October 02, 2019
More Decks by Niek Bartholomeus
See All by Niek Bartholomeus
openthebox.be
niekbartho
1
2.7k
From idea to production with NLP, Scala and Spark
niekbartho
3
510
Going DevOps with BMC
niekbartho
0
220
Orchestration in meatspace
niekbartho
4
2.1k
Self-organization vs. global optimization - a comparison between traditional and modern organizations
niekbartho
2
510
DevOps for Dinosaurs
niekbartho
12
3.1k
Other Decks in Technology
See All in Technology
OpenID Connectによるサービス間連携
takesection
0
160
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
50k
ルールやカスタム機能、どう使う?理想の出力を引き出すために今知りたいIBM Bob 5つの機能
muehara
1
310
MIERUNE JCT 発表資料「宇宙から伊能忠敬ごっこ」
syuchimu
0
140
AIプラットフォームを運用し続けるための可観測性
tanimuyk
4
1.1k
Platform Engineering as a Product: Criteria for Improvement and Multi-Tenant Design
kumorn5s
0
490
美味しいスイスチーズを作ろう🧀🐭
taigamikami
1
230
AI-DLCを活用した高品質・安全なAI駆動開発実践 / AI Driven Development with AI-DLC
yoshidashingo
0
110
製造業のクラウド活用最適解〜AI,DXを加速するデータ基盤の作り方〜
hamadakoji
0
330
Oracle AI Database@AWS:サービス概要のご紹介
oracle4engineer
PRO
4
2.8k
マーケットプレイス版Oracle WebCenter Content For OCI
oracle4engineer
PRO
5
1.8k
OCI Oracle AI Database Services新機能アップデート(2026/03-2026/05)
oracle4engineer
PRO
0
170
Featured
See All Featured
Testing 201, or: Great Expectations
jmmastey
46
8.2k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
32
2.9k
The Director’s Chair: Orchestrating AI for Truly Effective Learning
tmiket
1
180
DevOps and Value Stream Thinking: Enabling flow, efficiency and business value
helenjbeal
1
210
Building the Perfect Custom Keyboard
takai
2
780
Being A Developer After 40
akosma
91
590k
Mind Mapping
helmedeiros
PRO
1
230
Leo the Paperboy
mayatellez
7
1.8k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
The AI Search Optimization Roadmap by Aleyda Solis
aleyda
1
5.9k
How Fast Is Fast Enough? [PerfNow 2025]
tammyeverts
3
600
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
1
230
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me Niek Bartholomeus @niekbartho • Background as a software developer
• Switched to data science and natural language processing in 2016 • Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data KBO NBB Belgian Official Gazette http://kbopub.economie.fgov.be/kbopub https://cri.nbb.be/bc9/web/catalog
http://www.ejustice.just.fgov.be/ tsv/tsvn.htm
knowledge graph Visualization Analytics Machine learning Knowledge graph Structured data
Unstructured data KBO NBB Belgian Official Gazette
Unstructured data - pipeline
Unstructured data - pipeline steps 1] OCR 2] NER 4]
Entity linking 3] Relation extraction
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER Pre-processing rules: [“1.Jan”,
“Janssens”] 1.Jan Janssens [“Marktstraat”, “54,8450”, “Bredene”] Marktstraat 54,8450 Bredene
Unstructured data - pipeline steps 2] NER Post-processing rules: +
= General rules Legal rules Historic probabilities Faulty publication Context Improved publication
Unstructured data - pipeline steps 2] NER Organization Person Inheritance:
Notary Owner Representative Proxy holder Administrator Author : “is a” relationship Base labels Subclass labels
Unstructured data - pipeline steps 2] NER Gentstraat 69 Niek
Roger Camiel Bartholomeus Sub entity extraction: First name: Niek Middle names: Roger, Camiel Last name: Bartholomeus 9170 Sint-Pauwels Street: Gentstraat Number: 69 Zip code: 9170 City: Sint-Pauwels
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Niek
Bartholomeus N. Bartholomeus Bartholomeus } Niek Roger Camiel Bartholomeus Deduplication: 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Link
with knowledge graph: Gentstraat 69 9170 Sint-Pauwels 4] Entity linking
openthebox.be
openthebox.be Bigger picture
openthebox.be http://wpmlabs.com/ Academia Industry https://www.filter-concept.com/ +
openthebox.be https://opensenselabs.com