Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A Natural Language Pipeline
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
ddqz
July 06, 2019
Technology
540
0
Share
A Natural Language Pipeline
Presentation from the spaCy IRL 2019 conference.
ddqz
July 06, 2019
Other Decks in Technology
See All in Technology
ハーネスエンジニアリングの概要と設計思想
sergicalsix
9
6.4k
AI와 협업하는 조직으로의 여정
arawn
0
560
260422_Sansan_Tech_Talk__関西_vol.3_データ活用のリアル__矢田__.pdf
sansantech
PRO
0
140
生成AIが変える SaaS の競争原理と弁護士ドットコムのプロダクト戦略
bengo4com
1
2.9k
Good Enough Types: Heuristic Type Inference for Ruby
riseshia
1
360
AIが自律的に働く時代へ Amazon Quick で実現するAIエージェント紹介
koheiyoshikawa
0
150
AgentCore Managed Harness を使ってみよう
yakumo
2
270
AIが書いたコードを信じられない問題 〜レビュー負荷を下げるために変えたこと〜 / The AI Code Trust Gap: Reducing the Review Burden
bitkey
PRO
8
1.4k
データ定義の混乱と戦う 〜 管理会計と財務会計 〜
wonohe
0
160
はじめての MagicPod生成AI機能 機能紹介から活用方法まで
magicpod
0
120
基盤を育てる 外部SaaS連携の運用
gamonges_dresscode
1
120
ServiceNow Knowledge 26 の歩き方
manarobot
0
250
Featured
See All Featured
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
110
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
130
B2B Lead Gen: Tactics, Traps & Triumph
marketingsoph
0
110
Raft: Consensus for Rubyists
vanstee
141
7.4k
How to Grow Your eCommerce with AI & Automation
katarinadahlin
PRO
1
170
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
Balancing Empowerment & Direction
lara
6
1.1k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.3k
Gemini Prompt Engineering: Practical Techniques for Tangible AI Outcomes
mfonobong
2
380
A better future with KSS
kneath
240
18k
SEOcharity - Dark patterns in SEO and UX: How to avoid them and build a more ethical web
sarafernandez
0
170
Facilitating Awesome Meetings
lara
57
6.8k
Transcript
A Natural Language Pipeline
More Input
Knowledge” “A compendium of human...
Library
Physical archives became digital records, encoded with metadata
The internet promised rich dynamic experiences
The internet promised rich dynamic experiences but served us banner
ads
Advertising has and continues to fuel a substantial portion of
the innovation on the internet
What would The Economist look like if it were founded
in 2012?
User
First
Experience
“There’s a reason that tech companies are topping the lists
of most valuable companies and brands. Every company is a tech company.” Maggie Chan Jones
Every story, at its core, is a business story
Language
None
None
Stage -> Stenographer -> Editors -> spaCy -> Data Store
<-> Backend <- Slack <- Users Proto-Pipeline
Over eight hours we created data from the content of
the event, building the model in real-time
The model evolved over time
This was the experiment that would evolve into SiO 2
Silicon, a key element in everything from glass to microchips,
is at the core of global business
Oxygen, the journalistic voice Quartz breathes into the global business
news cycle
Entities are linguistic anchors, defined by context and around which
context can be inferred
Standard Entities PERSON FACILITY ORG PRODUCT GPE EVENT... Additional Entities
TECHNOLOGY PROCESS NATURE MEDIA CONSTRUCT
70K articles 1.4M blocks of text 85K labeled sentences
Entities
This spaCy model made rich analysis for any given text
easy to do on the fly
Stored analysis of a large corpus is a vital resource
The language graph...
Graph
The language graph is a mutable map of the language
model
Any new content is analyzed and then mapped onto the
language graph
Changes made to the graph can then be incorporated into
the next model iteration
The language graph becomes a primary resource for extracting training
data
Snapshots of time can be extracted from the language graph
Context can be derived by looking at the relationships in
the language graph
Elon Musk
Jeff Bezos
Mark Zuckerberg
Context
SiO 2 is a living Natural Language Pipeline of networked
algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
The Pipeline(s) Quartz Corpus -> Training Sentences -> spaCy Content
-> spaCy -> Language Graph Language Graph -> Training Data -> Statistical Models / Classifiers Language Graph -> Training Sentences -> spaCy Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
Thank you