Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A Natural Language Pipeline
Search
ddqz
July 06, 2019
Technology
0
520
A Natural Language Pipeline
Presentation from the spaCy IRL 2019 conference.
ddqz
July 06, 2019
Tweet
Share
Other Decks in Technology
See All in Technology
AWS Devops Agent ~ 自動調査とSlack統合をやってみた! ~
kubomasataka
3
330
Databricks Free Edition講座 データサイエンス編
taka_aki
0
280
usermode linux without MMU - fosdem2026 kernel devroom
thehajime
0
190
いよいよ仕事を奪われそうな波が来たぜ
kazzpapa3
3
350
ZOZOにおけるAI活用の現在 ~開発組織全体での取り組みと試行錯誤~
zozotech
PRO
4
3.8k
Deno・Bunの標準機能やElysiaJSを使ったWebSocketサーバー実装 / ラーメン屋を貸し切ってLT会! IoTLT 2026新年会
you
PRO
0
220
学生・新卒・ジュニアから目指すSRE
hiroyaonoe
1
390
最速で価値を出すための プロダクトエンジニアのツッコミ術
kaacun
1
520
2026年、サーバーレスの現在地 -「制約と戦う技術」から「当たり前の実行基盤」へ- /serverless2026
slsops
2
160
SMTP完全に理解した ✉️
yamatai1212
0
180
クレジットカード決済基盤を支えるSRE - 厳格な監査とSRE運用の両立 (SRE Kaigi 2026)
capytan
6
1.8k
CDK対応したAWS DevOps Agentを試そう_20260201
masakiokuda
1
130
Featured
See All Featured
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.8k
Leo the Paperboy
mayatellez
4
1.4k
Leveraging LLMs for student feedback in introductory data science courses - posit::conf(2025)
minecr
0
130
The AI Search Optimization Roadmap by Aleyda Solis
aleyda
1
5.2k
We Are The Robots
honzajavorek
0
160
The Invisible Side of Design
smashingmag
302
51k
The Mindset for Success: Future Career Progression
greggifford
PRO
0
230
The Spectacular Lies of Maps
axbom
PRO
1
500
WENDY [Excerpt]
tessaabrams
9
36k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
37
6.3k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
11
820
Navigating the Design Leadership Dip - Product Design Week Design Leaders+ Conference 2024
apolaine
0
170
Transcript
A Natural Language Pipeline
More Input
Knowledge” “A compendium of human...
Library
Physical archives became digital records, encoded with metadata
The internet promised rich dynamic experiences
The internet promised rich dynamic experiences but served us banner
ads
Advertising has and continues to fuel a substantial portion of
the innovation on the internet
What would The Economist look like if it were founded
in 2012?
User
First
Experience
“There’s a reason that tech companies are topping the lists
of most valuable companies and brands. Every company is a tech company.” Maggie Chan Jones
Every story, at its core, is a business story
Language
None
None
Stage -> Stenographer -> Editors -> spaCy -> Data Store
<-> Backend <- Slack <- Users Proto-Pipeline
Over eight hours we created data from the content of
the event, building the model in real-time
The model evolved over time
This was the experiment that would evolve into SiO 2
Silicon, a key element in everything from glass to microchips,
is at the core of global business
Oxygen, the journalistic voice Quartz breathes into the global business
news cycle
Entities are linguistic anchors, defined by context and around which
context can be inferred
Standard Entities PERSON FACILITY ORG PRODUCT GPE EVENT... Additional Entities
TECHNOLOGY PROCESS NATURE MEDIA CONSTRUCT
70K articles 1.4M blocks of text 85K labeled sentences
Entities
This spaCy model made rich analysis for any given text
easy to do on the fly
Stored analysis of a large corpus is a vital resource
The language graph...
Graph
The language graph is a mutable map of the language
model
Any new content is analyzed and then mapped onto the
language graph
Changes made to the graph can then be incorporated into
the next model iteration
The language graph becomes a primary resource for extracting training
data
Snapshots of time can be extracted from the language graph
Context can be derived by looking at the relationships in
the language graph
Elon Musk
Jeff Bezos
Mark Zuckerberg
Context
SiO 2 is a living Natural Language Pipeline of networked
algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
The Pipeline(s) Quartz Corpus -> Training Sentences -> spaCy Content
-> spaCy -> Language Graph Language Graph -> Training Data -> Statistical Models / Classifiers Language Graph -> Training Sentences -> spaCy Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
Thank you