A Natural Language Pipeline
ddqz
July 06, 2019
Technology
Presentation from the spaCy IRL 2019 conference.
Transcript
A Natural Language Pipeline
More Input
“A compendium of human... Knowledge”
Library
Physical archives became digital records, encoded with metadata
The internet promised rich dynamic experiences
The internet promised rich dynamic experiences but served us banner ads
Advertising has fueled, and continues to fuel, a substantial portion of the innovation on the internet
What would The Economist look like if it were founded in 2012?
User
First
Experience
“There’s a reason that tech companies are topping the lists of most valuable companies and brands. Every company is a tech company.”
Maggie Chan Jones
Every story, at its core, is a business story
Language
Proto-Pipeline
Stage -> Stenographer -> Editors -> spaCy -> Data Store <-> Backend <- Slack <- Users
Over eight hours we created data from the content of the event, building the model in real time
The model evolved over time
This was the experiment that would evolve into SiO2
Silicon, a key element in everything from glass to microchips, is at the core of global business
Oxygen, the journalistic voice Quartz breathes into the global business news cycle
Entities are linguistic anchors, defined by context and around which context can be inferred
Standard Entities: PERSON, FACILITY, ORG, PRODUCT, GPE, EVENT...
Additional Entities: TECHNOLOGY, PROCESS, NATURE, MEDIA, CONSTRUCT
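The slides do not show how sentences were labeled. As a rough sketch only, the snippet below builds character-offset annotations in the `(text, {"entities": [(start, end, label)]})` layout that spaCy v2 accepted as NER training data. The label names come from the slide; the `annotate` helper and the example sentence are hypothetical.

```python
from typing import Dict, List, Tuple

# Label names as listed on the slide (the standard list is truncated there).
STANDARD_LABELS = {"PERSON", "FACILITY", "ORG", "PRODUCT", "GPE", "EVENT"}
ADDITIONAL_LABELS = {"TECHNOLOGY", "PROCESS", "NATURE", "MEDIA", "CONSTRUCT"}
ALL_LABELS = STANDARD_LABELS | ADDITIONAL_LABELS

Annotation = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]


def annotate(text: str, spans: List[Tuple[str, str]]) -> Annotation:
    """Turn (phrase, label) pairs into character-offset entity annotations.

    Hypothetical helper: it labels the first occurrence of each phrase only.
    """
    entities = []
    for phrase, label in spans:
        if label not in ALL_LABELS:
            raise ValueError(f"unknown label: {label}")
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found: {phrase!r}")
        entities.append((start, start + len(phrase), label))
    return text, {"entities": sorted(entities)}


# Hypothetical sentence, not taken from the Quartz corpus.
example = annotate(
    "Tesla is betting on battery recycling.",
    [("Tesla", "ORG"), ("battery recycling", "PROCESS")],
)
```

Keeping the additional labels in their own set makes it easy to see which annotations depend on the custom model rather than on a stock one.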
70K articles, 1.4M blocks of text, 85K labeled sentences
Entities
This spaCy model made rich analysis for any given text easy to do on the fly
Stored analysis of a large corpus is a vital resource
The language graph...
Graph
The language graph is a mutable map of the language model
Any new content is analyzed and then mapped onto the language graph
Changes made to the graph can then be incorporated into the next model iteration
The language graph becomes a primary resource for extracting training data
Snapshots of time can be extracted from the language graph
Context can be derived by looking at the relationships in the language graph
Elon Musk
Jeff Bezos
Mark Zuckerberg
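The talk does not describe how the language graph is stored. One way to picture the ideas above (mapping new content onto the graph, taking snapshots in time, reading context from relationships) is a small co-occurrence graph. Everything in this sketch is an assumption: the `LanguageGraph` class, its methods, and the toy entity lists are illustrative, not the Quartz implementation.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


class LanguageGraph:
    """Toy co-occurrence graph: entities are nodes, shared documents are edges."""

    def __init__(self):
        # (entity_a, entity_b) -> dates of the documents in which both appeared
        self.edges = defaultdict(list)

    def add_document(self, entities, published):
        """Map the entities found in one analyzed document onto the graph."""
        for a, b in combinations(sorted(set(entities)), 2):
            self.edges[(a, b)].append(published)

    def neighbors(self, entity, since=None, until=None):
        """Count co-occurring entities, optionally within a time window."""
        counts = defaultdict(int)
        for (a, b), dates in self.edges.items():
            if entity not in (a, b):
                continue
            other = b if entity == a else a
            hits = sum(
                1
                for d in dates
                if (since is None or d >= since) and (until is None or d <= until)
            )
            if hits:
                counts[other] += hits
        return dict(counts)


# Toy entity lists standing in for spaCy output on three hypothetical articles.
graph = LanguageGraph()
graph.add_document(["Elon Musk", "Tesla"], date(2019, 1, 10))
graph.add_document(["Elon Musk", "SpaceX"], date(2019, 3, 2))
graph.add_document(["Elon Musk", "Tesla"], date(2019, 5, 20))

context = graph.neighbors("Elon Musk")
snapshot = graph.neighbors("Elon Musk", until=date(2019, 2, 1))
```

Here `context` holds the entities related to a person across all documents, while `snapshot` restricts the same query to a time window.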
Context
SiO2 is a living Natural Language Pipeline of networked algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
The Pipeline(s)
Quartz Corpus -> Training Sentences -> spaCy
Content -> spaCy -> Language Graph
Language Graph -> Training Data -> Statistical Models / Classifiers
Language Graph -> Training Sentences -> spaCy
Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
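The last chain (unseen content, through spaCy, to pre-processed text or vectors, to a classifier) can be read as a simple composition of stages. The sketch below only shows that shape. Each stage is a deliberately crude stand-in and every name in it is hypothetical: `analyze` is a whitespace tokenizer where the real system uses spaCy, and `classify` is a keyword rule where the real system uses trained models.

```python
from collections import Counter
from typing import Callable, Dict, List


def analyze(text: str) -> List[str]:
    """Stand-in for the spaCy step: lowercase tokens with punctuation stripped."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


def vectorize(tokens: List[str]) -> Dict[str, int]:
    """Stand-in for the vector step: a bag-of-words count."""
    return dict(Counter(tokens))


def classify(vector: Dict[str, int]) -> str:
    """Stand-in for a statistical classifier: a hypothetical keyword rule."""
    tech_terms = {"chip", "chips", "software"}
    return "TECH" if any(term in vector for term in tech_terms) else "OTHER"


def run_pipeline(stages: List[Callable], item):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        item = stage(item)
    return item


stages = [analyze, vectorize, classify]
label = run_pipeline(stages, "Chip makers race to add capacity.")
```

Because each stage is just a callable, any one of them can be swapped for a real component without changing how the chain is run.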
Thank you