Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A Natural Language Pipeline
Search
ddqz
July 06, 2019
Technology
0
530
A Natural Language Pipeline
Presentation from the spaCy IRL 2019 conference.
ddqz
July 06, 2019
Tweet
Share
Other Decks in Technology
See All in Technology
Master Dataグループ紹介資料
sansan33
PRO
1
4.4k
A Gentle Introduction to Transformers
keio_smilab
PRO
1
660
【SLO】"多様な期待値" と向き合ってみた
z63d
2
310
開発組織の課題解決を加速するための権限委譲 -する側、される側としての向き合い方-
daitasu
5
270
組織のSREを推進するためのPlatform EngineeringとEKS / Platform Engineering and EKS to drive SRE in your organization
chmikata
0
180
DevOpsエージェントで実現する!! AWS Well-Architected(W-A) を実現するシステム設計 / 20260307 Masaki Okuda
shift_evolve
PRO
2
120
LLM活用の壁を超える:リクルートR&Dの戦略と打ち手
recruitengineers
PRO
1
250
Oracle Database@Google Cloud:サービス概要のご紹介
oracle4engineer
PRO
5
1.1k
Digitization部 紹介資料
sansan33
PRO
1
7k
ブラックボックス観測に基づくAI支援のプロトコルのリバースエンジニアリングと再現~AIを用いたリバースエンジニアリング~ @ SECCON 14 電脳会議 / Reverse Engineering and Reproduction of an AI-Assisted Protocol Based on Black-Box Observation @ SECCON 14 DENNO-KAIGI
chibiegg
0
150
Startups on Rails: 2026 at RubyConf Thailand
irinanazarova
0
140
Evolution of Claude Code & How to use features
oikon48
1
370
Featured
See All Featured
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
1.1k
So, you think you're a good person
axbom
PRO
2
1.9k
The Spectacular Lies of Maps
axbom
PRO
1
580
Highjacked: Video Game Concept Design
rkendrick25
PRO
1
310
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
199
73k
Kristin Tynski - Automating Marketing Tasks With AI
techseoconnect
PRO
0
180
B2B Lead Gen: Tactics, Traps & Triumph
marketingsoph
0
68
Stop Working from a Prison Cell
hatefulcrawdad
274
21k
The Cult of Friendly URLs
andyhume
79
6.8k
How to make the Groovebox
asonas
2
2k
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
370
Color Theory Basics | Prateek | Gurzu
gurzu
0
230
Transcript
A Natural Language Pipeline
More Input
Knowledge” “A compendium of human...
Library
Physical archives became digital records, encoded with metadata
The internet promised rich dynamic experiences
The internet promised rich dynamic experiences but served us banner
ads
Advertising has and continues to fuel a substantial portion of
the innovation on the internet
What would The Economist look like if it were founded
in 2012?
User
First
Experience
“There’s a reason that tech companies are topping the lists
of most valuable companies and brands. Every company is a tech company.” Maggie Chan Jones
Every story, at its core, is a business story
Language
None
None
Stage -> Stenographer -> Editors -> spaCy -> Data Store
<-> Backend <- Slack <- Users Proto-Pipeline
Over eight hours we created data from the content of
the event, building the model in real-time
The model evolved over time
This was the experiment that would evolve into SiO 2
Silicon, a key element in everything from glass to microchips,
is at the core of global business
Oxygen, the journalistic voice Quartz breathes into the global business
news cycle
Entities are linguistic anchors, defined by context and around which
context can be inferred
Standard Entities PERSON FACILITY ORG PRODUCT GPE EVENT... Additional Entities
TECHNOLOGY PROCESS NATURE MEDIA CONSTRUCT
70K articles 1.4M blocks of text 85K labeled sentences
Entities
This spaCy model made rich analysis for any given text
easy to do on the fly
Stored analysis of a large corpus is a vital resource
The language graph...
Graph
The language graph is a mutable map of the language
model
Any new content is analyzed and then mapped onto the
language graph
Changes made to the graph can then be incorporated into
the next model iteration
The language graph becomes a primary resource for extracting training
data
Snapshots of time can be extracted from the language graph
Context can be derived by looking at the relationships in
the language graph
Elon Musk
Jeff Bezos
Mark Zuckerberg
Context
SiO 2 is a living Natural Language Pipeline of networked
algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
The Pipeline(s) Quartz Corpus -> Training Sentences -> spaCy Content
-> spaCy -> Language Graph Language Graph -> Training Data -> Statistical Models / Classifiers Language Graph -> Training Sentences -> spaCy Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
Thank you