Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A Natural Language Pipeline
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
ddqz
July 06, 2019
Technology
530
0
Share
A Natural Language Pipeline
Presentation from the spaCy IRL 2019 conference.
ddqz
July 06, 2019
Other Decks in Technology
See All in Technology
OCI技術資料 : ロード・バランサ 概要 - FLB・NLB共通
ocise
4
27k
TUNA Camp 2026 京都Stage ヒューリスティックアルゴリズム入門
terryu16
0
660
Databricks Appsで実現する社内向けAIアプリ開発の効率化
r_miura
0
230
CloudFrontのHost Header転送設定でパケットの中身はどう変わるのか?
nagisa53
1
240
JSTQB Expert Levelシラバス「テストマネジメント」日本語版のご紹介
ymty
0
110
Datadog で実現するセキュリティ対策 ~オブザーバビリティとセキュリティを 一緒にやると何がいいのか~
a2ush
0
190
LLMに何を任せ、何を任せないか
cap120
11
6.9k
Babylon.js Japan Activities (2026/4)
limes2018
0
140
OPENLOGI Company Profile
hr01
0
83k
QA組織のAI戦略とAIテスト設計システムAITASの実践
sansantech
PRO
1
310
AI時代のシステム開発者の仕事_20260328
sengtor
0
320
Even G2 クイックスタートガイド(日本語版)
vrshinobi1
0
190
Featured
See All Featured
A brief & incomplete history of UX Design for the World Wide Web: 1989–2019
jct
1
340
Faster Mobile Websites
deanohume
310
31k
Bioeconomy Workshop: Dr. Julius Ecuru, Opportunities for a Bioeconomy in West Africa
akademiya2063
PRO
1
81
SEOcharity - Dark patterns in SEO and UX: How to avoid them and build a more ethical web
sarafernandez
0
160
Mind Mapping
helmedeiros
PRO
1
140
Why Our Code Smells
bkeepers
PRO
340
58k
Applied NLP in the Age of Generative AI
inesmontani
PRO
4
2.2k
Principles of Awesome APIs and How to Build Them.
keavy
128
17k
Unlocking the hidden potential of vector embeddings in international SEO
frankvandijk
0
230
AI: The stuff that nobody shows you
jnunemaker
PRO
4
500
How to Build an AI Search Optimization Roadmap - Criteria and Steps to Take #SEOIRL
aleyda
1
2k
Technical Leadership for Architectural Decision Making
baasie
3
300
Transcript
A Natural Language Pipeline
More Input
Knowledge” “A compendium of human...
Library
Physical archives became digital records, encoded with metadata
The internet promised rich dynamic experiences
The internet promised rich dynamic experiences but served us banner
ads
Advertising has and continues to fuel a substantial portion of
the innovation on the internet
What would The Economist look like if it were founded
in 2012?
User
First
Experience
“There’s a reason that tech companies are topping the lists
of most valuable companies and brands. Every company is a tech company.” Maggie Chan Jones
Every story, at its core, is a business story
Language
None
None
Stage -> Stenographer -> Editors -> spaCy -> Data Store
<-> Backend <- Slack <- Users Proto-Pipeline
Over eight hours we created data from the content of
the event, building the model in real-time
The model evolved over time
This was the experiment that would evolve into SiO 2
Silicon, a key element in everything from glass to microchips,
is at the core of global business
Oxygen, the journalistic voice Quartz breathes into the global business
news cycle
Entities are linguistic anchors, defined by context and around which
context can be inferred
Standard Entities PERSON FACILITY ORG PRODUCT GPE EVENT... Additional Entities
TECHNOLOGY PROCESS NATURE MEDIA CONSTRUCT
70K articles 1.4M blocks of text 85K labeled sentences
Entities
This spaCy model made rich analysis for any given text
easy to do on the fly
Stored analysis of a large corpus is a vital resource
The language graph...
Graph
The language graph is a mutable map of the language
model
Any new content is analyzed and then mapped onto the
language graph
Changes made to the graph can then be incorporated into
the next model iteration
The language graph becomes a primary resource for extracting training
data
Snapshots of time can be extracted from the language graph
Context can be derived by looking at the relationships in
the language graph
Elon Musk
Jeff Bezos
Mark Zuckerberg
Context
SiO 2 is a living Natural Language Pipeline of networked
algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
The Pipeline(s) Quartz Corpus -> Training Sentences -> spaCy Content
-> spaCy -> Language Graph Language Graph -> Training Data -> Statistical Models / Classifiers Language Graph -> Training Sentences -> spaCy Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
Thank you