Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
openthebox.be - smart publications
Search
Niek Bartholomeus
October 02, 2019
Technology
0
180
openthebox.be - smart publications
Extracting deep insights from boring documents: a real-life story
Niek Bartholomeus
October 02, 2019
Tweet
Share
More Decks by Niek Bartholomeus
See All by Niek Bartholomeus
openthebox.be
niekbartho
1
2.6k
From idea to production with NLP, Scala and Spark
niekbartho
3
500
Going DevOps with BMC
niekbartho
0
210
Orchestration in meatspace
niekbartho
4
2k
Self-organization vs. global optimization - a comparison between traditional and modern organizations
niekbartho
2
490
DevOps for Dinosaurs
niekbartho
12
3k
Other Decks in Technology
See All in Technology
Keycloak を使った SSO で CockroachDB にログインする / CockroachDB SSO with Keycloak
kota2and3kan
0
120
[2026-03-07]あの日諦めたスクラムの答えを僕達はまだ探している。〜守ることと、諦めることと、それでも前に進むチームの話〜
tosite
0
230
Claude Code Skills 勉強会 (DevelersIO向けに調整済み) / claude code skills for devio
masahirokawahara
1
20k
Yahoo!ショッピングのレコメンデーション・システムにおけるML実践の一例
lycorptech_jp
PRO
1
210
「Blue Team Labs Online」入門 - みんなで挑むログ解析バトル
v_avenger
0
170
2026-03-11 JAWS-UG 茨城 #12 改めてALBを便利に使う
masasuzu
2
380
[JAWS DAYS 2026]私の AWS DevOps Agent 推しポイント
furuton
0
150
Oracle Cloud Infrastructure IaaS 新機能アップデート 2025/12 - 2026/2
oracle4engineer
PRO
0
130
Everything Claude Code を眺める
oikon48
2
2.7k
JAWSDAYS2026 [C02] 楽しく学ぼう!AWSとは?AWSの歴史 入門
hiragahh
0
150
The_Evolution_of_Bits_AI_SRE.pdf
nulabinc
PRO
0
200
AIエージェント、 社内展開の前に知っておきたいこと
oracle4engineer
PRO
2
120
Featured
See All Featured
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
Stop Working from a Prison Cell
hatefulcrawdad
274
21k
30 Presentation Tips
portentint
PRO
1
250
Accessibility Awareness
sabderemane
0
80
Raft: Consensus for Rubyists
vanstee
141
7.4k
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
250
The Anti-SEO Checklist Checklist. Pubcon Cyber Week
ryanjones
0
92
The Cult of Friendly URLs
andyhume
79
6.8k
Evolving SEO for Evolving Search Engines
ryanjones
0
150
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
8k
Bioeconomy Workshop: Dr. Julius Ecuru, Opportunities for a Bioeconomy in West Africa
akademiya2063
PRO
1
70
The Organizational Zoo: Understanding Human Behavior Agility Through Metaphoric Constructive Conversations (based on the works of Arthur Shelley, Ph.D)
kimpetersen
PRO
0
270
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me Niek Bartholomeus @niekbartho • Background as a software developer
• Switched to data science and natural language processing in 2016 • Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data KBO NBB Belgian Official Gazette http://kbopub.economie.fgov.be/kbopub https://cri.nbb.be/bc9/web/catalog
http://www.ejustice.just.fgov.be/ tsv/tsvn.htm
knowledge graph Visualization Analytics Machine learning Knowledge graph Structured data
Unstructured data KBO NBB Belgian Official Gazette
Unstructured data - pipeline
Unstructured data - pipeline steps 1] OCR 2] NER 4]
Entity linking 3] Relation extraction
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER Pre-processing rules: [“1.Jan”,
“Janssens”] 1.Jan Janssens [“Marktstraat”, “54,8450”, “Bredene”] Marktstraat 54,8450 Bredene
Unstructured data - pipeline steps 2] NER Post-processing rules: +
= General rules Legal rules Historic probabilities Faulty publication Context Improved publication
Unstructured data - pipeline steps 2] NER Organization Person Inheritance:
Notary Owner Representative Proxy holder Administrator Author : “is a” relationship Base labels Subclass labels
Unstructured data - pipeline steps 2] NER Gentstraat 69 Niek
Roger Camiel Bartholomeus Sub entity extraction: First name: Niek Middle names: Roger, Camiel Last name: Bartholomeus 9170 Sint-Pauwels Street: Gentstraat Number: 69 Zip code: 9170 City: Sint-Pauwels
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Niek
Bartholomeus N. Bartholomeus Bartholomeus } Niek Roger Camiel Bartholomeus Deduplication: 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Link
with knowledge graph: Gentstraat 69 9170 Sint-Pauwels 4] Entity linking
openthebox.be
openthebox.be Bigger picture
openthebox.be http://wpmlabs.com/ Academia Industry https://www.filter-concept.com/ +
openthebox.be https://opensenselabs.com