Niek Bartholomeus
October 02, 2019
Technology
openthebox.be - smart publications
Extracting deep insights from boring documents: a real-life story
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me
Niek Bartholomeus (@niekbartho)
• Background as a software developer
• Switched to data science and natural language processing in 2016
• Founded openthebox.be in 2017
openthebox.be
openthebox.be
Open data:
• KBO: http://kbopub.economie.fgov.be/kbopub
• NBB: https://cri.nbb.be/bc9/web/catalog
• Belgian Official Gazette: http://www.ejustice.just.fgov.be/tsv/tsvn.htm
Knowledge graph
Diagram labels:
• Sources: KBO, NBB, Belgian Official Gazette
• Data types: structured data, unstructured data
• Centre: knowledge graph
• Uses: visualization, analytics, machine learning
Unstructured data - pipeline
Unstructured data - pipeline steps
1] OCR
2] NER
3] Relation extraction
4] Entity linking
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER
Pre-processing rules:
• Text: 1.Jan Janssens
  Tokens: [“1.Jan”, “Janssens”]
• Text: Marktstraat 54,8450 Bredene
  Tokens: [“Marktstraat”, “54,8450”, “Bredene”]
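The two examples on this slide show tokens that are glued together in the published text ("1.Jan", "54,8450"). A minimal sketch of how rule-based pre-processing could separate them before NER is shown below. The two regular expressions and the function names are assumptions made for illustration; the slides do not list the actual rules.

```python
import re

# Hypothetical rules (not taken from the talk): each entry is
# (compiled pattern, replacement) and is applied in order.
RULES = [
    # "1.Jan" -> "1. Jan": list numbering glued to a capitalised word.
    (re.compile(r"\b(\d+)\.(?=[A-Z])"), r"\1. "),
    # "54,8450" -> "54 , 8450": house number glued to a 4-digit zip code.
    (re.compile(r"\b(\d{1,4}),(?=\d{4}\b)"), r"\1 , "),
]


def preprocess(text: str) -> str:
    """Apply every rule to the raw text before tokenization."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization of the pre-processed text."""
    return preprocess(text).split()


print(tokenize("1.Jan Janssens"))
print(tokenize("Marktstraat 54,8450 Bredene"))
```

With these rules the name and the zip code become separate tokens, so a tagger sees "Jan" and "8450" on their own.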
Unstructured data - pipeline steps 2] NER
Post-processing rules:
Faulty publication + Context = Improved publication
Context: general rules, legal rules, historic probabilities
Unstructured data - pipeline steps 2] NER
Inheritance:
• Base labels: Organization, Person
• Subclass labels: Notary, Owner, Representative, Proxy holder, Administrator, Author
• Arrow: “is a” relationship
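The label inheritance above can be sketched as a parent lookup, so that a rule written for a base label also applies to its subclass labels. The parent assignments below are assumptions: the transcript names the labels but does not say which base label each subclass belongs to.

```python
# Hypothetical parent assignments, chosen only for illustration.
PARENT = {
    "Notary": "Person",
    "Owner": "Person",
    "Representative": "Person",
    "Proxy holder": "Person",
    "Administrator": "Person",
    "Author": "Person",
}


def ancestors(label: str) -> list[str]:
    """Follow the "is a" links from a label up to its base label."""
    chain = []
    while label in PARENT:
        label = PARENT[label]
        chain.append(label)
    return chain


def is_a(label: str, other: str) -> bool:
    """True if `label` equals `other` or inherits from it."""
    return label == other or other in ancestors(label)


print(is_a("Notary", "Person"))
print(is_a("Notary", "Organization"))
```

A dictionary keeps the hierarchy easy to edit; a subclass with more than one base label would need a list of parents instead.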
Unstructured data - pipeline steps 2] NER
Sub entity extraction:
Niek Roger Camiel Bartholomeus
• First name: Niek
• Middle names: Roger, Camiel
• Last name: Bartholomeus
Gentstraat 69 9170 Sint-Pauwels
• Street: Gentstraat
• Number: 69
• Zip code: 9170
• City: Sint-Pauwels
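The sub entity extraction shown on this slide can be approximated with a few lines of parsing. This is a simplified sketch under stated assumptions, not the method used in the talk: the address pattern expects "street number zip city" with a 4-digit zip code, and the name splitter treats the first token as the first name and the last token as the last name, which breaks on multi-word surnames.

```python
import re
from typing import Optional

# Assumed layout: <street> <number> <4-digit zip> <city>.
ADDRESS = re.compile(
    r"^(?P<street>.+?)\s+(?P<number>\d+\w*)\s+(?P<zip_code>\d{4})\s+(?P<city>.+)$"
)


def split_address(text: str) -> Optional[dict]:
    """Return street, number, zip_code and city, or None if no match."""
    match = ADDRESS.match(text.strip())
    return match.groupdict() if match else None


def split_name(text: str) -> Optional[dict]:
    """Naive split: first token, middle tokens, last token."""
    parts = text.split()
    if len(parts) < 2:
        return None
    return {
        "first_name": parts[0],
        "middle_names": parts[1:-1],
        "last_name": parts[-1],
    }


print(split_name("Niek Roger Camiel Bartholomeus"))
print(split_address("Gentstraat 69 9170 Sint-Pauwels"))
```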
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps 4] Entity linking
Deduplication:
• Niek Roger Camiel Bartholomeus
• Niek Bartholomeus
• N. Bartholomeus
• Bartholomeus
All four mentions resolve to: Niek Roger Camiel Bartholomeus
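One way to sketch the deduplication above is to take the longest mention as the canonical form and check whether each shorter mention is compatible with it. The matching rule below (same last name, given names or initials appearing in order) is an assumption chosen to reproduce the slide's example; the talk does not describe the actual algorithm.

```python
def token_matches(token: str, name: str) -> bool:
    """An initial such as "N." matches any name starting with that letter."""
    if token.endswith("."):
        return name.lower().startswith(token[:-1].lower())
    return token.lower() == name.lower()


def compatible(mention: str, full: str) -> bool:
    """True if `mention` could be a shortened form of `full`."""
    m, f = mention.split(), full.split()
    if not m or not f or m[-1].lower() != f[-1].lower():
        return False
    given, i = f[:-1], 0
    for token in m[:-1]:
        # Given names must appear in the same order as in the full name.
        while i < len(given) and not token_matches(token, given[i]):
            i += 1
        if i == len(given):
            return False
        i += 1
    return True


def deduplicate(mentions: list[str]) -> dict[str, str]:
    """Map each mention to the longest mention it is compatible with."""
    canonical = max(mentions, key=lambda s: len(s.split()))
    return {m: canonical if compatible(m, canonical) else m for m in mentions}


mentions = [
    "Niek Roger Camiel Bartholomeus",
    "Niek Bartholomeus",
    "N. Bartholomeus",
    "Bartholomeus",
]
print(deduplicate(mentions))
```

A bare last name such as "Bartholomeus" matches any person with that surname, so in practice this rule would only be safe within a single publication.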
Unstructured data - pipeline steps 4] Entity linking
Link with knowledge graph:
Niek Roger Camiel Bartholomeus
Gentstraat 69 9170 Sint-Pauwels
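Linking the deduplicated person to the knowledge graph could, in its simplest form, be a lookup on a normalized (name, address) key. The node structure, the identifier "person:1" and the function names below are hypothetical; the slides do not describe how the graph is stored or queried.

```python
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase, drop commas and collapse whitespace."""
    return " ".join(text.lower().replace(",", " ").split())


def build_index(nodes: list[dict]) -> dict[tuple[str, str], str]:
    """Index graph nodes by their normalized name and address."""
    return {(normalize(n["name"]), normalize(n["address"])): n["id"] for n in nodes}


def link(name: str, address: str, index: dict) -> Optional[str]:
    """Return the id of the matching node, or None if there is none."""
    return index.get((normalize(name), normalize(address)))


# Hypothetical graph content, for illustration only.
index = build_index([
    {
        "id": "person:1",
        "name": "Niek Roger Camiel Bartholomeus",
        "address": "Gentstraat 69, 9170 Sint-Pauwels",
    }
])
print(link("Niek Roger Camiel Bartholomeus", "Gentstraat 69 9170 Sint-Pauwels", index))
```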
openthebox.be
openthebox.be Bigger picture
openthebox.be Academia + Industry http://wpmlabs.com/ https://www.filter-concept.com/
openthebox.be https://opensenselabs.com