A Natural Language Pipeline
ddqz
July 06, 2019
Presentation from the spaCy IRL 2019 conference.
Transcript
A Natural Language Pipeline
More Input
“A compendium of human... Knowledge”
Library
Physical archives became digital records, encoded with metadata
The internet promised rich dynamic experiences
The internet promised rich dynamic experiences but served us banner ads
Advertising has fueled, and continues to fuel, a substantial portion of the innovation on the internet
What would The Economist look like if it were founded in 2012?
User
First
Experience
“There’s a reason that tech companies are topping the lists of most valuable companies and brands. Every company is a tech company.”
Maggie Chan Jones
Every story, at its core, is a business story
Language
Proto-Pipeline
Stage -> Stenographer -> Editors -> spaCy -> Data Store <-> Backend <- Slack <- Users
Over eight hours we created data from the content of the event, building the model in real time
The model evolved over time
This was the experiment that would evolve into SiO2
Silicon, a key element in everything from glass to microchips, is at the core of global business
Oxygen, the journalistic voice Quartz breathes into the global business news cycle
Entities are linguistic anchors, defined by context and around which context can be inferred
Standard Entities: PERSON, FACILITY, ORG, PRODUCT, GPE, EVENT...
Additional Entities: TECHNOLOGY, PROCESS, NATURE, MEDIA, CONSTRUCT
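As a rough illustration of what a labeled sentence with these entity types could look like, here is a minimal sketch using the character-offset annotation shape, `(text, {"entities": [(start, end, label)]})`, that spaCy has accepted for NER training. The label sets come from the slide; the `annotate` helper and the example sentence are hypothetical and not from the talk.

```python
# Label sets as listed on the slide. The standard list is truncated there ("..."),
# so this set is not complete.
STANDARD_LABELS = {"PERSON", "FACILITY", "ORG", "PRODUCT", "GPE", "EVENT"}
ADDITIONAL_LABELS = {"TECHNOLOGY", "PROCESS", "NATURE", "MEDIA", "CONSTRUCT"}
ALL_LABELS = STANDARD_LABELS | ADDITIONAL_LABELS


def annotate(text, phrases):
    """Build a (text, {"entities": [...]}) pair from (phrase, label) hints.

    Hypothetical helper: each span covers the first occurrence of the phrase,
    expressed as character offsets into the text.
    """
    entities = []
    for phrase, label in phrases:
        if label not in ALL_LABELS:
            raise ValueError(f"unknown label: {label}")
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not in text: {phrase!r}")
        entities.append((start, start + len(phrase), label))
    return text, {"entities": sorted(entities)}


# Invented example sentence, not taken from the Quartz corpus.
example = annotate(
    "Tesla is betting on lithium-ion batteries in Nevada.",
    [("Tesla", "ORG"), ("lithium-ion batteries", "TECHNOLOGY"), ("Nevada", "GPE")],
)
```

Each span can be checked by slicing the text with its offsets, which is a cheap way to catch misaligned annotations before training.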
70K articles, 1.4M blocks of text, 85K labeled sentences
Entities
This spaCy model made rich analysis of any given text easy to do on the fly
Stored analysis of a large corpus is a vital resource
The language graph...
Graph
The language graph is a mutable map of the language model
Any new content is analyzed and then mapped onto the language graph
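The slides do not say how the language graph is stored. One way to picture the mapping step is a co-occurrence graph: entities are nodes, and an edge counts how often two entities appear in the same block of text. The `LanguageGraph` class and its methods below are hypothetical names for this sketch, and the entity lists stand in for the output of the spaCy model.

```python
from collections import defaultdict
from itertools import combinations


class LanguageGraph:
    """Toy co-occurrence graph; an assumed structure, not the talk's design."""

    def __init__(self):
        self.labels = {}               # entity text -> entity label
        self.edges = defaultdict(int)  # (entity_a, entity_b) -> co-occurrence count

    def add_analysis(self, entities):
        """Map one analyzed block of text, given as (text, label) pairs, onto the graph."""
        for text, label in entities:
            self.labels[text] = label
        # Sort the names so each undirected edge has a single canonical key.
        names = sorted({text for text, _ in entities})
        for a, b in combinations(names, 2):
            self.edges[(a, b)] += 1


graph = LanguageGraph()
# Invented analysis results for two blocks of text.
graph.add_analysis([("Jeff Bezos", "PERSON"), ("Amazon", "ORG")])
graph.add_analysis([("Amazon", "ORG"), ("Jeff Bezos", "PERSON"), ("Seattle", "GPE")])
```

Because the graph is a plain mutable structure, each newly analyzed block only increments counts and never requires rebuilding the whole map.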
Changes made to the graph can then be incorporated into the next model iteration
The language graph becomes a primary resource for extracting training data
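To make the feedback loop concrete, the sketch below assumes the graph keeps, for every entity, its current label and the sentences in which it was seen. Correcting a label in the graph then changes the training data exported for the next model iteration. The function `export_training_data`, the data layout, and the example sentence are all assumptions for illustration.

```python
def export_training_data(mentions, labels):
    """Turn graph contents into (sentence, {"entities": [...]}) training pairs.

    mentions: list of (sentence, entity_text) pairs stored in the graph.
    labels:   dict of entity_text -> current label in the graph.
    """
    by_sentence = {}
    for sentence, entity in mentions:
        start = sentence.find(entity)
        if start == -1 or entity not in labels:
            continue  # skip mentions that can no longer be aligned or labeled
        span = (start, start + len(entity), labels[entity])
        by_sentence.setdefault(sentence, []).append(span)
    return [(s, {"entities": sorted(spans)}) for s, spans in by_sentence.items()]


# Invented graph contents.
mentions = [
    ("Quartz covers blockchain.", "Quartz"),
    ("Quartz covers blockchain.", "blockchain"),
]
labels = {"Quartz": "ORG", "blockchain": "PRODUCT"}

# An editor corrects a label in the graph...
labels["blockchain"] = "TECHNOLOGY"
# ...and the next export reflects the change.
training_data = export_training_data(mentions, labels)
```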
Snapshots in time can be extracted from the language graph
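One possible reading of a snapshot, assuming every co-occurrence in the graph is stored with the publication date of the text it came from: filter the observations to a date range and rebuild the edge counts for that window. The `snapshot` function and the sample observations are hypothetical.

```python
from datetime import date

# Invented observations: (publication date, entity a, entity b).
observations = [
    (date(2018, 3, 1), "Facebook", "Mark Zuckerberg"),
    (date(2018, 4, 10), "Facebook", "Cambridge Analytica"),
    (date(2019, 1, 15), "Facebook", "Mark Zuckerberg"),
    (date(2019, 2, 2), "Amazon", "Jeff Bezos"),
]


def snapshot(observations, start, end):
    """Return edge counts for observations dated within [start, end]."""
    counts = {}
    for published, a, b in observations:
        if start <= published <= end:
            key = tuple(sorted((a, b)))  # canonical key for an undirected edge
            counts[key] = counts.get(key, 0) + 1
    return counts


graph_2018 = snapshot(observations, date(2018, 1, 1), date(2018, 12, 31))
```

Comparing snapshots from different windows would show how the relationships around an entity shift over time.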
Context can be derived by looking at the relationships in the language graph
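A simple way to read context off a graph, assuming weighted edges between entities, is to rank an entity's neighbors by edge weight. The `context` function and the edge weights below are invented for illustration; the talk does not describe its actual method.

```python
def context(edges, entity, top_n=3):
    """Return the top_n neighbors of an entity, strongest relationship first."""
    neighbors = {}
    for (a, b), weight in edges.items():
        if entity == a:
            neighbors[b] = neighbors.get(b, 0) + weight
        elif entity == b:
            neighbors[a] = neighbors.get(a, 0) + weight
    # Sort by descending weight, then by name so ties are ordered predictably.
    return sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Invented edge weights between entities.
edges = {
    ("Elon Musk", "Tesla"): 5,
    ("Elon Musk", "SpaceX"): 3,
    ("Amazon", "Jeff Bezos"): 4,
    ("Blue Origin", "Jeff Bezos"): 2,
    ("Blue Origin", "SpaceX"): 1,
    ("Facebook", "Mark Zuckerberg"): 6,
}

musk_context = context(edges, "Elon Musk")
```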
Elon Musk
Jeff Bezos
Mark Zuckerberg
Context
SiO2 is a living Natural Language Pipeline of networked algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
The Pipeline(s)
Quartz Corpus -> Training Sentences -> spaCy
Content -> spaCy -> Language Graph
Language Graph -> Training Data -> Statistical Models / Classifiers
Language Graph -> Training Sentences -> spaCy
Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
Thank you