A Natural Language Pipeline
ddqz
July 06, 2019
Presentation from the spaCy IRL 2019 conference.
Transcript
A Natural Language Pipeline
More Input
“A compendium of human... Knowledge”
Library
Physical archives became digital records, encoded with metadata
The internet promised rich dynamic experiences
The internet promised rich dynamic experiences but served us banner ads
Advertising has fueled, and continues to fuel, a substantial portion of the innovation on the internet
What would The Economist look like if it were founded in 2012?
User
First
Experience
“There’s a reason that tech companies are topping the lists of most valuable companies and brands. Every company is a tech company.”
Maggie Chan Jones
Every story, at its core, is a business story
Language
Proto-Pipeline
Stage -> Stenographer -> Editors -> spaCy -> Data Store <-> Backend <- Slack <- Users
Over eight hours we created data from the content of the event, building the model in real time
The model evolved over time
This was the experiment that would evolve into SiO2
Silicon, a key element in everything from glass to microchips, is at the core of global business
Oxygen, the journalistic voice Quartz breathes into the global business news cycle
Entities are linguistic anchors, defined by context and around which context can be inferred
Standard Entities: PERSON, FACILITY, ORG, PRODUCT, GPE, EVENT...
Additional Entities: TECHNOLOGY, PROCESS, NATURE, MEDIA, CONSTRUCT
70K articles, 1.4M blocks of text, 85K labeled sentences
Entities
This spaCy model made rich, on-the-fly analysis of any given text easy
Stored analysis of a large corpus is a vital resource
The language graph...
Graph
The language graph is a mutable map of the language model
Any new content is analyzed and then mapped onto the language graph
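A minimal sketch of that mapping step, assuming the analysis yields (text, label) entity pairs and that the graph stores entities as nodes with co-occurrence counts as edges. The `LanguageGraph` class, its methods, and the sample data are hypothetical illustrations, not the structure used in SiO2.

```python
from collections import defaultdict
from itertools import combinations


class LanguageGraph:
    """Toy in-memory graph: entity nodes plus weighted co-occurrence edges."""

    def __init__(self):
        self.labels = {}               # entity text -> entity label
        self.edges = defaultdict(int)  # frozenset({a, b}) -> co-occurrence count

    def add_analysis(self, entities):
        """Map one analyzed block of text onto the graph.

        `entities` is a list of (text, label) pairs, standing in for the
        entities an NLP model found in that block (an assumption).
        """
        for text, label in entities:
            self.labels[text] = label
        names = sorted({text for text, _ in entities})
        # Every pair of entities seen in the same block strengthens an edge.
        for a, b in combinations(names, 2):
            self.edges[frozenset((a, b))] += 1


# Invented sample analyses, for illustration only.
graph = LanguageGraph()
graph.add_analysis([("Elon Musk", "PERSON"), ("Tesla", "ORG")])
graph.add_analysis(
    [("Elon Musk", "PERSON"), ("Tesla", "ORG"), ("batteries", "TECHNOLOGY")]
)
```

Because the graph is just mutable state, a label can be corrected in place (for example `graph.labels["batteries"] = "PRODUCT"`) without re-running the analysis.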
Changes made to the graph can then be incorporated into the next model iteration
The language graph becomes a primary resource for extracting training data
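One way such an extraction could look, as a sketch: assume each entity mention in the graph keeps its source sentence, its character span, and the current (possibly hand-corrected) label. The record layout, the `to_training_examples` helper, and the sample labels are assumptions. The output shape mirrors the `(text, {"entities": [(start, end, label)]})` tuples commonly used to train spaCy's entity recognizer.

```python
from collections import defaultdict

# Hypothetical mention records pulled from the graph; labels are illustrative.
mentions = [
    {"sentence": "Silicon is at the core of global business.",
     "start": 0, "end": 7, "label": "NATURE"},
    {"sentence": "Quartz covers global business news.",
     "start": 0, "end": 6, "label": "ORG"},
]


def to_training_examples(mentions):
    """Group mentions by sentence into (text, {"entities": [...]}) tuples."""
    spans = defaultdict(list)
    for mention in mentions:
        spans[mention["sentence"]].append(
            (mention["start"], mention["end"], mention["label"])
        )
    # Sort spans so each example lists its entities in document order.
    return [(text, {"entities": sorted(ents)}) for text, ents in spans.items()]


examples = to_training_examples(mentions)
```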
Snapshots in time can be extracted from the language graph
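A snapshot can be sketched as a date filter, assuming every relationship in the graph records when it was observed. The record layout, the `snapshot` helper, and the dates are invented for illustration.

```python
from datetime import date

# Hypothetical edge records: (entity_a, entity_b, date the link was observed).
observations = [
    ("Elon Musk", "Jeff Bezos", date(2018, 3, 1)),
    ("Elon Musk", "Mark Zuckerberg", date(2018, 9, 15)),
    ("Jeff Bezos", "Mark Zuckerberg", date(2019, 2, 10)),
]


def snapshot(observations, start, end):
    """Return the observations whose date falls within [start, end]."""
    return [obs for obs in observations if start <= obs[2] <= end]


# The slice of the graph as it looked during 2018.
graph_2018 = snapshot(observations, date(2018, 1, 1), date(2018, 12, 31))
```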
Context can be derived by looking at the relationships in the language graph
Elon Musk
Jeff Bezos
Mark Zuckerberg
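As a rough sketch of deriving context from relationships, assume the graph exposes weighted edges between an entity and the terms it co-occurs with. An entity's strongest neighbors then serve as its context, and overlapping neighbors hint at what two entities have in common. The edge weights and the `context` and `shared_context` helpers are hypothetical.

```python
from collections import Counter

# Hypothetical weighted edges: (entity, neighbor) -> co-occurrence count.
edges = {
    ("Elon Musk", "electric vehicles"): 12,
    ("Elon Musk", "rockets"): 9,
    ("Jeff Bezos", "rockets"): 7,
    ("Jeff Bezos", "e-commerce"): 15,
    ("Mark Zuckerberg", "social media"): 20,
}


def context(entity, edges, top_n=2):
    """Return the entity's strongest neighbors as a rough 'context'."""
    weights = Counter()
    for (a, b), weight in edges.items():
        if a == entity:
            weights[b] += weight
        elif b == entity:
            weights[a] += weight
    # most_common(None) returns every neighbor, strongest first.
    return [name for name, _ in weights.most_common(top_n)]


def shared_context(first, second, edges):
    """Return the neighbors that two entities have in common."""
    first_all = set(context(first, edges, top_n=None))
    second_all = set(context(second, edges, top_n=None))
    return sorted(first_all & second_all)
```

With this sample data, `context("Elon Musk", edges)` lists "electric vehicles" before "rockets", and `shared_context("Elon Musk", "Jeff Bezos", edges)` contains only "rockets".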
Context
SiO2 is a living Natural Language Pipeline of networked algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
The Pipeline(s)
Quartz Corpus -> Training Sentences -> spaCy
Content -> spaCy -> Language Graph
Language Graph -> Training Data -> Statistical Models / Classifiers
Language Graph -> Training Sentences -> spaCy
Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
Thank you