openthebox.be - smart publications
Niek Bartholomeus
October 02, 2019
Extracting deep insights from boring documents: a real-life story
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me
Niek Bartholomeus @niekbartho
• Background as a software developer
• Switched to data science and natural language processing in 2016
• Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data
• KBO: http://kbopub.economie.fgov.be/kbopub
• NBB: https://cri.nbb.be/bc9/web/catalog
• Belgian Official Gazette: http://www.ejustice.just.fgov.be/tsv/tsvn.htm
Knowledge graph
• Sources: KBO, NBB, Belgian Official Gazette
• Inputs: structured data, unstructured data
• Uses: visualization, analytics, machine learning
Unstructured data - pipeline
Unstructured data - pipeline steps
1] OCR
2] NER
3] Relation extraction
4] Entity linking
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER
Pre-processing rules:
• Text: 1.Jan Janssens, tokens: [“1.Jan”, “Janssens”]
• Text: Marktstraat 54,8450 Bredene, tokens: [“Marktstraat”, “54,8450”, “Bredene”]
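The two examples show tokens in which a list number is glued to a first name ("1.Jan") and a house number is glued to a zip code ("54,8450"). A minimal sketch of how such pre-processing rules could be written follows. The two regular expressions and the function names are assumptions for illustration; the slides do not show the actual rule set.

```python
import re

# Hypothetical rule 1: an enumerator such as "1." glued to a following word.
# "1.Jan" becomes "1. Jan".
ENUM_GLUE = re.compile(r"(?<!\S)(\d+\.)(?=[^\W\d_])")

# Hypothetical rule 2: a comma glued between a house number and a
# four-digit (Belgian) zip code. "54,8450" becomes "54 , 8450".
COMMA_GLUE = re.compile(r"(?<=\d),(?=\d{4}\b)")


def preprocess(text: str) -> str:
    """Insert the whitespace that the source publication left out."""
    text = ENUM_GLUE.sub(r"\1 ", text)
    text = COMMA_GLUE.sub(" , ", text)
    return text


def tokenize(text: str) -> list[str]:
    """Whitespace tokenizer applied after the pre-processing rules."""
    return preprocess(text).split()


print(tokenize("1.Jan Janssens"))
print(tokenize("Marktstraat 54,8450 Bredene"))
```

With these rules the first name and the zip code each become a token of their own, which is what a downstream NER model would need. The comma rule would also fire on a number such as "1,2345", so a real rule set would need more context than this sketch uses.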
Unstructured data - pipeline steps 2] NER
Post-processing rules:
Faulty publication + Context = Improved publication
Context:
• General rules
• Legal rules
• Historic probabilities
Unstructured data - pipeline steps 2] NER
Inheritance:
• Base labels: Organization, Person
• Subclass labels: Notary, Owner, Representative, Proxy holder, Administrator, Author
• Arrow: “is a” relationship
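The "is a" relationship between subclass labels and base labels can be sketched as a small label hierarchy. The parent assigned to each subclass below is an assumption: the transcript lists the labels but does not preserve which base label each subclass points to.

```python
# Hypothetical parent assignments, for illustration only.
PARENT = {
    "Notary": "Person",
    "Author": "Person",
    "Proxy holder": "Person",
    "Representative": "Person",
    "Administrator": "Person",
    "Owner": "Organization",
}


def is_a(label: str, base: str) -> bool:
    """Return True if `label` equals `base` or inherits from it."""
    current = label
    while current is not None:
        if current == base:
            return True
        current = PARENT.get(current)  # None once a base label is passed
    return False


print(is_a("Notary", "Person"))
print(is_a("Notary", "Organization"))
```

A rule written against a base label such as Person would then also apply to every mention tagged with one of its subclass labels.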
Unstructured data - pipeline steps 2] NER
Sub entity extraction:
• Niek Roger Camiel Bartholomeus
  First name: Niek
  Middle names: Roger, Camiel
  Last name: Bartholomeus
• Gentstraat 69 9170 Sint-Pauwels
  Street: Gentstraat
  Number: 69
  Zip code: 9170
  City: Sint-Pauwels
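The sub entity extraction shown above can be approximated with a few lines of rule-based parsing. This is a sketch under strong assumptions, not the method used in the talk: it assumes a single-token last name (so a name like "Van den Berg" would be split wrongly) and a "street number zip city" address layout with a four-digit zip code. All names in the code are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PersonName(NamedTuple):
    first: str
    middle: tuple
    last: str


class Address(NamedTuple):
    street: str
    number: str
    zip_code: str
    city: str


# Assumed layout: "<street> <number> <4-digit zip> <city>", optional comma.
ADDRESS = re.compile(
    r"^(?P<street>.+?)\s+(?P<number>\d+\w*)\s*,?\s*"
    r"(?P<zip>\d{4})\s+(?P<city>.+)$"
)


def split_name(full: str) -> PersonName:
    """First token = first name, last token = last name, rest = middle names."""
    parts = full.split()
    if len(parts) < 2:
        raise ValueError(f"need at least first and last name: {full!r}")
    return PersonName(parts[0], tuple(parts[1:-1]), parts[-1])


def split_address(text: str) -> Optional[Address]:
    """Return the address parts, or None if the layout is not recognised."""
    m = ADDRESS.match(text.strip())
    if m is None:
        return None
    return Address(m["street"], m["number"], m["zip"], m["city"])


print(split_name("Niek Roger Camiel Bartholomeus"))
print(split_address("Gentstraat 69 9170 Sint-Pauwels"))
```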
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps 4] Entity linking
Deduplication:
• Niek Roger Camiel Bartholomeus
• Niek Bartholomeus
• N. Bartholomeus
• Bartholomeus
} Niek Roger Camiel Bartholomeus
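The deduplication example maps four name variants to the single most complete form. One simple way to sketch this is a compatibility check between a mention and the longest mention in the same document. The matching rule below (same last name, given names matched in order, initials allowed) is an assumption for illustration; the slides do not describe the actual algorithm.

```python
def compatible(mention: str, full: str) -> bool:
    """True if `mention` could be a shortened form of `full`."""
    m, f = mention.lower().split(), full.lower().split()
    if not m or not f or m[-1] != f[-1]:  # last names must match
        return False
    given_m, given_f = m[:-1], f[:-1]
    if len(given_m) > len(given_f):
        return False
    for short, long_ in zip(given_m, given_f):
        if short.endswith("."):  # initial such as "n."
            if not long_.startswith(short[:-1]):
                return False
        elif short != long_:
            return False
    return True


def deduplicate(mentions: list[str]) -> dict[str, str]:
    """Map each mention to the most complete compatible mention."""
    canonical = max(mentions, key=lambda s: len(s.split()))
    return {m: canonical if compatible(m, canonical) else m for m in mentions}


mentions = [
    "Niek Roger Camiel Bartholomeus",
    "Niek Bartholomeus",
    "N. Bartholomeus",
    "Bartholomeus",
]
for mention, target in deduplicate(mentions).items():
    print(f"{mention} -> {target}")
```

Mentions that are not compatible with the canonical form are left mapped to themselves. A bare last name is the riskiest match, since two different people with the same family name can appear in one publication.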
Unstructured data - pipeline steps 4] Entity linking
Link with knowledge graph:
• Niek Roger Camiel Bartholomeus
• Gentstraat 69 9170 Sint-Pauwels
openthebox.be
openthebox.be Bigger picture
openthebox.be Academia + Industry
http://wpmlabs.com/
https://www.filter-concept.com/
openthebox.be https://opensenselabs.com