Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A Natural Language Pipeline
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
ddqz
July 06, 2019
Technology
540
0
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
A Natural Language Pipeline
Presentation from the spaCy IRL 2019 conference.
ddqz
July 06, 2019
Other Decks in Technology
See All in Technology
自宅LLMの話
jacopen
1
690
現場のトークンマネジメント
dak2
1
150
PostgreSQL 19 新機能概要 OSC Hokkaido 2026
nori_shinoda
0
190
生成 AI 実践ガイド (概略版) AIガバナンス編
asei
0
140
秘密度ラベル初心者が第1歩でつまづかないための「設計・運用」ポイント
seafay
PRO
1
360
SONiC Scale-Up Working Group から探る Scale-UpやUltraEthernet機能の実装方法
ebiken
PRO
2
420
自分が詳しくない領域でAIを使う #プロヒス2026
konifar
18
6.3k
就職⽀援サービスにおけるキャリアアドバイザーのシフトスケジューリング
recruitengineers
PRO
1
150
人材育成分科会.pdf
_awache
4
300
サイバーエージェントにおけるAI推進戦略と変革への取り組み
shotatsuge
0
250
ザ・データベース、MySQL ~ OSC 2026 Sendai ~
sakaik
0
150
Claude Codeをどのように キャッチアップしているか
oikon48
13
8.6k
Featured
See All Featured
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
123
22k
Introduction to Domain-Driven Design and Collaborative software design
baasie
1
860
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
2
1.5k
Noah Learner - AI + Me: how we built a GSC Bulk Export data pipeline
techseoconnect
PRO
0
200
A better future with KSS
kneath
240
18k
Evolving SEO for Evolving Search Engines
ryanjones
0
220
コードの90%をAIが書く世界で何が待っているのか / What awaits us in a world where 90% of the code is written by AI
rkaga
62
44k
Are puppies a ranking factor?
jonoalderson
1
3.6k
Exploring anti-patterns in Rails
aemeredith
3
420
Rails Girls Zürich Keynote
gr2m
96
14k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
55k
New Earth Scene 8
popppiees
3
2.3k
Transcript
A Natural Language Pipeline
More Input
Knowledge” “A compendium of human...
Library
Physical archives became digital records, encoded with metadata
The internet promised rich dynamic experiences
The internet promised rich dynamic experiences but served us banner
ads
Advertising has and continues to fuel a substantial portion of
the innovation on the internet
What would The Economist look like if it were founded
in 2012?
User
First
Experience
“There’s a reason that tech companies are topping the lists
of most valuable companies and brands. Every company is a tech company.” Maggie Chan Jones
Every story, at its core, is a business story
Language
None
None
Stage -> Stenographer -> Editors -> spaCy -> Data Store
<-> Backend <- Slack <- Users Proto-Pipeline
Over eight hours we created data from the content of
the event, building the model in real-time
The model evolved over time
This was the experiment that would evolve into SiO 2
Silicon, a key element in everything from glass to microchips,
is at the core of global business
Oxygen, the journalistic voice Quartz breathes into the global business
news cycle
Entities are linguistic anchors, defined by context and around which
context can be inferred
Standard Entities PERSON FACILITY ORG PRODUCT GPE EVENT... Additional Entities
TECHNOLOGY PROCESS NATURE MEDIA CONSTRUCT
70K articles 1.4M blocks of text 85K labeled sentences
Entities
This spaCy model made rich analysis for any given text
easy to do on the fly
Stored analysis of a large corpus is a vital resource
The language graph...
Graph
The language graph is a mutable map of the language
model
Any new content is analyzed and then mapped onto the
language graph
Changes made to the graph can then be incorporated into
the next model iteration
The language graph becomes a primary resource for extracting training
data
Snapshots of time can be extracted from the language graph
Context can be derived by looking at the relationships in
the language graph
Elon Musk
Jeff Bezos
Mark Zuckerberg
Context
SiO 2 is a living Natural Language Pipeline of networked
algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
The Pipeline(s) Quartz Corpus -> Training Sentences -> spaCy Content
-> spaCy -> Language Graph Language Graph -> Training Data -> Statistical Models / Classifiers Language Graph -> Training Sentences -> spaCy Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
Thank you