Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
openthebox.be - smart publications
Search
Niek Bartholomeus
October 02, 2019
Technology
200
0
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
openthebox.be - smart publications
Extracting deep insights from boring documents: a real-life story
Niek Bartholomeus
October 02, 2019
More Decks by Niek Bartholomeus
See All by Niek Bartholomeus
openthebox.be
niekbartho
1
2.7k
From idea to production with NLP, Scala and Spark
niekbartho
3
510
Going DevOps with BMC
niekbartho
0
220
Orchestration in meatspace
niekbartho
4
2.1k
Self-organization vs. global optimization - a comparison between traditional and modern organizations
niekbartho
2
510
DevOps for Dinosaurs
niekbartho
12
3.1k
Other Decks in Technology
See All in Technology
そこにあるから地図ができる~位置を示す"モノ"を愉しむ~ - Interface 2026年6月号GPS特集オフ会 / interface_202606_GPS_offline
sakaik
1
110
iOS アプリの「これって不具合ですか?」を AI に調べてもらう
miichan
0
150
FPGAの開発コンペでZephyrを使ってみた
iotengineer22
0
210
[チョークトーク資料]AWS DevOps Agent を使いこなす / AWS Dev Ops Agent Chalk Talk AWS Summit Japan 2026
kinunori
4
800
現場のトークンマネジメント
dak2
1
200
千葉での単身赴任からAWSをやり続け、千葉に戻ってきた話
yama3133
1
120
ご挨拶「10周年を迎える共創ラボのこれまでとこれから」
iotcomjpadmin
0
150
LayerX コーポレートエンジニアリング室におけるサプライチェーンセキュリティへの取り組み / Supply Chain Security at LayerX Corporate Engineering
yuyatakeyama
3
860
從開發到部署全都交給 AI:實作 AI 驅動的自動化流程
appleboy
0
180
AIエージェントとPhysical AIが拓く製造業の変革(ハノーバーメッセリキャップ)
iotcomjpadmin
0
160
從觀望到全公司落地:AI Agentic Coding 導入實戰 — 流程整合與安全治理
appleboy
0
150
[AWS Summit Japan 2026]迷っているあなたへ_小さな一歩が、やがて自分を助けてくれる
sh_fk2
2
430
Featured
See All Featured
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
35k
AI Search: Where Are We & What Can We Do About It?
aleyda
0
7.6k
Building Flexible Design Systems
yeseniaperezcruz
330
40k
Code Review Best Practice
trishagee
74
20k
A better future with KSS
kneath
240
18k
What does AI have to do with Human Rights?
axbom
PRO
1
2.2k
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
118
120k
A brief & incomplete history of UX Design for the World Wide Web: 1989–2019
jct
2
400
Test your architecture with Archunit
thirion
1
2.3k
The Power of CSS Pseudo Elements
geoffreycrofte
82
6.3k
Have SEOs Ruined the Internet? - User Awareness of SEO in 2025
akashhashmi
0
370
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me Niek Bartholomeus @niekbartho • Background as a software developer
• Switched to data science and natural language processing in 2016 • Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data KBO NBB Belgian Official Gazette http://kbopub.economie.fgov.be/kbopub https://cri.nbb.be/bc9/web/catalog
http://www.ejustice.just.fgov.be/ tsv/tsvn.htm
knowledge graph Visualization Analytics Machine learning Knowledge graph Structured data
Unstructured data KBO NBB Belgian Official Gazette
Unstructured data - pipeline
Unstructured data - pipeline steps 1] OCR 2] NER 4]
Entity linking 3] Relation extraction
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER Pre-processing rules: [“1.Jan”,
“Janssens”] 1.Jan Janssens [“Marktstraat”, “54,8450”, “Bredene”] Marktstraat 54,8450 Bredene
Unstructured data - pipeline steps 2] NER Post-processing rules: +
= General rules Legal rules Historic probabilities Faulty publication Context Improved publication
Unstructured data - pipeline steps 2] NER Organization Person Inheritance:
Notary Owner Representative Proxy holder Administrator Author : “is a” relationship Base labels Subclass labels
Unstructured data - pipeline steps 2] NER Gentstraat 69 Niek
Roger Camiel Bartholomeus Sub entity extraction: First name: Niek Middle names: Roger, Camiel Last name: Bartholomeus 9170 Sint-Pauwels Street: Gentstraat Number: 69 Zip code: 9170 City: Sint-Pauwels
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Niek
Bartholomeus N. Bartholomeus Bartholomeus } Niek Roger Camiel Bartholomeus Deduplication: 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Link
with knowledge graph: Gentstraat 69 9170 Sint-Pauwels 4] Entity linking
openthebox.be
openthebox.be Bigger picture
openthebox.be http://wpmlabs.com/ Academia Industry https://www.filter-concept.com/ +
openthebox.be https://opensenselabs.com