Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A Natural Language Pipeline
Search
ddqz
July 06, 2019
Technology
0
500
A Natural Language Pipeline
Presentation from the spaCy IRL 2019 conference.
ddqz
July 06, 2019
Tweet
Share
Other Decks in Technology
See All in Technology
Codeful Serverless / 一人運用でもやり抜く力
_kensh
7
360
おやつは300円まで!の最適化を模索してみた
techtekt
PRO
0
290
RSCの時代にReactとフレームワークの境界を探る
uhyo
10
3.3k
スマートファクトリーの第一歩 〜AWSマネージドサービスで 実現する予知保全と生成AI活用まで
ganota
1
190
AIのグローバルトレンド2025 #scrummikawa / global ai trend
kyonmm
PRO
1
260
Django's GeneratedField by example - DjangoCon US 2025
pauloxnet
0
110
オブザーバビリティが広げる AIOps の世界 / The World of AIOps Expanded by Observability
aoto
PRO
0
330
Flutterでキャッチしないエラーはどこに行く
taiju59
0
220
なぜスクラムはこうなったのか?歴史が教えてくれたこと/Shall we explore the roots of Scrum
sanogemaru
5
1.5k
DDD集約とサービスコンテキスト境界との関係性
pandayumi
2
270
20250910_障害注入から効率的復旧へ_カオスエンジニアリング_生成AIで考えるAWS障害対応.pdf
sh_fk2
3
190
Autonomous Database - Dedicated 技術詳細 / adb-d_technical_detail_jp
oracle4engineer
PRO
4
10k
Featured
See All Featured
Building Adaptive Systems
keathley
43
2.7k
Building a Modern Day E-commerce SEO Strategy
aleyda
43
7.6k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
30
9.6k
Become a Pro
speakerdeck
PRO
29
5.5k
Testing 201, or: Great Expectations
jmmastey
45
7.6k
Typedesign – Prime Four
hannesfritz
42
2.8k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.1k
Raft: Consensus for Rubyists
vanstee
140
7.1k
Reflections from 52 weeks, 52 projects
jeffersonlam
352
21k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
A designer walks into a library…
pauljervisheath
207
24k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.4k
Transcript
A Natural Language Pipeline
More Input
Knowledge” “A compendium of human...
Library
Physical archives became digital records, encoded with metadata
The internet promised rich dynamic experiences
The internet promised rich dynamic experiences but served us banner
ads
Advertising has and continues to fuel a substantial portion of
the innovation on the internet
What would The Economist look like if it were founded
in 2012?
User
First
Experience
“There’s a reason that tech companies are topping the lists
of most valuable companies and brands. Every company is a tech company.” Maggie Chan Jones
Every story, at its core, is a business story
Language
None
None
Stage -> Stenographer -> Editors -> spaCy -> Data Store
<-> Backend <- Slack <- Users Proto-Pipeline
Over eight hours we created data from the content of
the event, building the model in real-time
The model evolved over time
This was the experiment that would evolve into SiO 2
Silicon, a key element in everything from glass to microchips,
is at the core of global business
Oxygen, the journalistic voice Quartz breathes into the global business
news cycle
Entities are linguistic anchors, defined by context and around which
context can be inferred
Standard Entities PERSON FACILITY ORG PRODUCT GPE EVENT... Additional Entities
TECHNOLOGY PROCESS NATURE MEDIA CONSTRUCT
70K articles 1.4M blocks of text 85K labeled sentences
Entities
This spaCy model made rich analysis for any given text
easy to do on the fly
Stored analysis of a large corpus is a vital resource
The language graph...
Graph
The language graph is a mutable map of the language
model
Any new content is analyzed and then mapped onto the
language graph
Changes made to the graph can then be incorporated into
the next model iteration
The language graph becomes a primary resource for extracting training
data
Snapshots of time can be extracted from the language graph
Context can be derived by looking at the relationships in
the language graph
Elon Musk
Jeff Bezos
Mark Zuckerberg
Context
SiO 2 is a living Natural Language Pipeline of networked
algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
The Pipeline(s) Quartz Corpus -> Training Sentences -> spaCy Content
-> spaCy -> Language Graph Language Graph -> Training Data -> Statistical Models / Classifiers Language Graph -> Training Sentences -> spaCy Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
Thank you