openthebox.be - smart publications
Niek Bartholomeus
October 02, 2019
Technology
Extracting deep insights from boring documents: a real-life story
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me: Niek Bartholomeus (@niekbartho)
• Background as a software developer
• Switched to data science and natural language processing in 2016
• Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data
• KBO: http://kbopub.economie.fgov.be/kbopub
• NBB: https://cri.nbb.be/bc9/web/catalog
• Belgian Official Gazette: http://www.ejustice.just.fgov.be/tsv/tsvn.htm
[Diagram. Labels: knowledge graph; visualization, analytics, machine learning; structured data, unstructured data; sources: KBO, NBB, Belgian Official Gazette]
Unstructured data - pipeline
Unstructured data - pipeline steps 1] OCR 2] NER 3] Relation extraction 4] Entity linking
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER
Pre-processing rules:
• [“1.Jan”, “Janssens”] 1.Jan Janssens
• [“Marktstraat”, “54,8450”, “Bredene”] Marktstraat 54,8450 Bredene
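The slide shows only the problematic tokens, not the rules that repair them. As a minimal sketch, assuming the goal is to split tokens that were fused together (a list number glued to a first name, a house number glued to a zip code), two hypothetical regex rules could look like this. The rule names and patterns are illustrative assumptions, not the rules used by openthebox.be.

```python
import re

# Hypothetical rule 1: a list number fused to a word, e.g. "1.Jan" -> "1.", "Jan".
ENUM_PREFIX = re.compile(r"^(\d+\.)([^\W\d_].*)$")
# Hypothetical rule 2: a house number fused to a 4-digit Belgian zip code,
# e.g. "54,8450" -> "54", ",", "8450".
NUMBER_ZIP = re.compile(r"^(\d+[A-Za-z]?),(\d{4})$")


def split_fused_tokens(tokens):
    """Return a new token list in which fused tokens are split apart."""
    out = []
    for tok in tokens:
        m = ENUM_PREFIX.match(tok)
        if m:
            out.extend([m.group(1), m.group(2)])
            continue
        m = NUMBER_ZIP.match(tok)
        if m:
            out.extend([m.group(1), ",", m.group(2)])
            continue
        out.append(tok)  # no rule applies: keep the token unchanged
    return out


print(split_fused_tokens(["1.Jan", "Janssens"]))
print(split_fused_tokens(["Marktstraat", "54,8450", "Bredene"]))
```

Running the rules before NER means the tagger sees "Jan" and "8450" as separate tokens, which is presumably what makes them recognizable as a first name and a zip code.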
Unstructured data - pipeline steps 2] NER
Post-processing rules: faulty publication + context = improved publication
Also listed on the slide: general rules, legal rules, historic probabilities
Unstructured data - pipeline steps 2] NER
Inheritance (“is a” relationship):
• Base labels: Organization, Person
• Subclass labels: Notary, Owner, Representative, Proxy holder, Administrator, Author
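The "is a" relationship between subclass labels and base labels can be sketched as a small parent map. The slide names the labels but does not say which base label each subclass extends, so the parent assignments below are placeholders chosen for illustration only, and the helper names are hypothetical.

```python
# ASSUMPTION: the slide does not show which base label each subclass extends.
# Every subclass is mapped to "Person" here purely as a placeholder.
PARENT = {
    "Notary": "Person",
    "Owner": "Person",
    "Representative": "Person",
    "Proxy holder": "Person",
    "Administrator": "Person",
    "Author": "Person",
}
BASE_LABELS = {"Organization", "Person"}


def ancestors(label):
    """Return the chain of parent labels, nearest parent first."""
    chain = []
    while label in PARENT:
        label = PARENT[label]
        chain.append(label)
    return chain


def is_a(label, other):
    """True if `label` equals `other` or inherits from it."""
    return label == other or other in ancestors(label)


print(is_a("Notary", "Person"))        # True
print(is_a("Notary", "Organization"))  # False
```

With such a map, an entity tagged with a subclass label can still be treated as its base label by any rule or model that only knows the base labels.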
Unstructured data - pipeline steps 2] NER
Sub entity extraction:
• Niek Roger Camiel Bartholomeus: First name: Niek; Middle names: Roger, Camiel; Last name: Bartholomeus
• Gentstraat 69 9170 Sint-Pauwels: Street: Gentstraat; Number: 69; Zip code: 9170; City: Sint-Pauwels
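A minimal sketch of sub entity extraction for the two examples on the slide follows. It assumes a single-word last name and a "street number zip city" address layout with a 4-digit zip code; real publications will contain cases that break both assumptions (multi-word surnames, box numbers). The function names and the regex are hypothetical, not taken from the talk.

```python
import re

# ASSUMPTION: address layout is "<street> <number>[,] <4-digit zip> <city>".
ADDRESS = re.compile(
    r"^(?P<street>.+?)\s+(?P<number>\d+\w*)\s*,?\s*(?P<zip>\d{4})\s+(?P<city>.+)$"
)


def split_person(name):
    """Split a full name; assumes the last token is the (single-word) last name."""
    parts = name.split()
    if not parts:
        raise ValueError("empty name")
    return {
        "first_name": parts[0],
        "middle_names": parts[1:-1],
        "last_name": parts[-1],
    }


def split_address(address):
    """Return street, number, zip and city, or None if the layout does not match."""
    m = ADDRESS.match(address.strip())
    return m.groupdict() if m else None


print(split_person("Niek Roger Camiel Bartholomeus"))
print(split_address("Gentstraat 69 9170 Sint-Pauwels"))
```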
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps 4] Entity linking
Deduplication: the mentions "Niek Roger Camiel Bartholomeus", "Niek Bartholomeus", "N. Bartholomeus" and "Bartholomeus" are merged into "Niek Roger Camiel Bartholomeus"
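The deduplication example can be sketched as a name-compatibility check: a mention is treated as a variant of a full name when the last names agree and every given name or initial in the mention appears, in order, among the given names of the full name. This is an illustrative heuristic under those assumptions, not the matching logic described in the talk, and the function names are hypothetical.

```python
def is_variant(mention, full_name):
    """True if `mention` could be a shortened form of `full_name`."""
    m, f = mention.split(), full_name.split()
    if not m or not f or m[-1].lower() != f[-1].lower():
        return False
    # Walk the given names of the full name once, so order is respected.
    given = iter(g.lower() for g in f[:-1])
    for tok in (t.lower() for t in m[:-1]):
        if tok.endswith("."):
            # Initial such as "N." matches the next given name starting with "n".
            ok = any(g.startswith(tok[:-1]) for g in given)
        else:
            ok = any(g == tok for g in given)
        if not ok:
            return False
    return True


def deduplicate(mentions):
    """Map each compatible mention to the longest mention in the list."""
    canonical = max(mentions, key=lambda s: len(s.split()))
    return {m: canonical for m in mentions if is_variant(m, canonical)}


mentions = [
    "Niek Roger Camiel Bartholomeus",
    "Niek Bartholomeus",
    "N. Bartholomeus",
    "Bartholomeus",
]
print(deduplicate(mentions))
```

A bare last name matches anyone who shares it, so a heuristic like this is only reasonable within a single publication, where the surrounding context limits the candidates.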
Unstructured data - pipeline steps 4] Entity linking
Link with knowledge graph: Niek Roger Camiel Bartholomeus, Gentstraat 69 9170 Sint-Pauwels
openthebox.be
openthebox.be Bigger picture
openthebox.be Academia + Industry (image sources: http://wpmlabs.com/, https://www.filter-concept.com/)
openthebox.be https://opensenselabs.com