Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A Natural Language Pipeline
Search
ddqz
July 06, 2019
Technology
0
520
A Natural Language Pipeline
Presentation from the spaCy IRL 2019 conference.
ddqz
July 06, 2019
Tweet
Share
Other Decks in Technology
See All in Technology
Strands AgentsとNova 2 SonicでS2Sを実践してみた
yama3133
1
980
Fashion×AI「似合う」を届けるためのWEARのAI戦略
zozotech
PRO
2
990
AgentCore BrowserとClaude Codeスキルを活用した 『初手AI』を実現する業務自動化AIエージェント基盤
ruzia
4
140
AWS re:Invent 2025~初参加の成果と学び~
kubomasataka
0
150
会社紹介資料 / Sansan Company Profile
sansan33
PRO
11
390k
寫了幾年 Code,然後呢?軟體工程師必須重新認識的 DevOps
cheng_wei_chen
1
1.5k
業務のトイルをバスターせよ 〜AI時代の生存戦略〜
staka121
PRO
2
230
ExpoのインダストリーブースでみたAWSが見せる製造業の未来
hamadakoji
0
170
re:Invent2025 3つの Frontier Agents を紹介 / introducing-3-frontier-agents
tomoki10
0
300
Amazon Bedrock Knowledge Bases × メタデータ活用で実現する検証可能な RAG 設計
tomoaki25
6
1.4k
.NET 10の概要
tomokusaba
0
120
プロンプトやエージェントを自動的に作る方法
shibuiwilliam
15
15k
Featured
See All Featured
HU Berlin: Industrial-Strength Natural Language Processing with spaCy and Prodigy
inesmontani
PRO
0
92
Thoughts on Productivity
jonyablonski
73
5k
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
87
Balancing Empowerment & Direction
lara
5
810
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
37
6.2k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
Designing for humans not robots
tammielis
254
26k
Designing for Timeless Needs
cassininazir
0
86
Redefining SEO in the New Era of Traffic Generation
szymonslowik
1
160
[SF Ruby Conf 2025] Rails X
palkan
0
550
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
196
70k
30 Presentation Tips
portentint
PRO
1
170
Transcript
A Natural Language Pipeline
More Input
Knowledge” “A compendium of human...
Library
Physical archives became digital records, encoded with metadata
The internet promised rich dynamic experiences
The internet promised rich dynamic experiences but served us banner
ads
Advertising has and continues to fuel a substantial portion of
the innovation on the internet
What would The Economist look like if it were founded
in 2012?
User
First
Experience
“There’s a reason that tech companies are topping the lists
of most valuable companies and brands. Every company is a tech company.” Maggie Chan Jones
Every story, at its core, is a business story
Language
None
None
Stage -> Stenographer -> Editors -> spaCy -> Data Store
<-> Backend <- Slack <- Users Proto-Pipeline
Over eight hours we created data from the content of
the event, building the model in real-time
The model evolved over time
This was the experiment that would evolve into SiO 2
Silicon, a key element in everything from glass to microchips,
is at the core of global business
Oxygen, the journalistic voice Quartz breathes into the global business
news cycle
Entities are linguistic anchors, defined by context and around which
context can be inferred
Standard Entities PERSON FACILITY ORG PRODUCT GPE EVENT... Additional Entities
TECHNOLOGY PROCESS NATURE MEDIA CONSTRUCT
70K articles 1.4M blocks of text 85K labeled sentences
Entities
This spaCy model made rich analysis for any given text
easy to do on the fly
Stored analysis of a large corpus is a vital resource
The language graph...
Graph
The language graph is a mutable map of the language
model
Any new content is analyzed and then mapped onto the
language graph
Changes made to the graph can then be incorporated into
the next model iteration
The language graph becomes a primary resource for extracting training
data
Snapshots of time can be extracted from the language graph
Context can be derived by looking at the relationships in
the language graph
Elon Musk
Jeff Bezos
Mark Zuckerberg
Context
SiO 2 is a living Natural Language Pipeline of networked
algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
The Pipeline(s) Quartz Corpus -> Training Sentences -> spaCy Content
-> spaCy -> Language Graph Language Graph -> Training Data -> Statistical Models / Classifiers Language Graph -> Training Sentences -> spaCy Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
Thank you