A Natural Language Pipeline

More Input

Knowledge: “A compendium of human...”

Library

Physical archives became digital records, encoded with metadata


The internet promised rich, dynamic experiences but served us banner ads

Advertising has fueled, and continues to fuel, a substantial portion of the innovation on the internet

What would The Economist look like if it were founded
in 2012?

Experience

“There’s a reason that tech companies are topping the lists of most valuable companies and brands. Every company is a tech company.”
Maggie Chan Jones

Every story, at its core, is a business story

Language

Proto-Pipeline: Stage -> Stenographer -> Editors -> spaCy -> Data Store <-> Backend <- Slack <- Users
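The forward half of the proto-pipeline can be read as a chain of stages, each consuming the previous stage's output. A minimal sketch, assuming hypothetical stage functions (`stenographer`, `editor`, `tag_entities`) and an in-memory list standing in for the data store; a fixed phrase lookup stands in for the spaCy model, and the Backend/Slack/Users leg is omitted.

```python
from typing import Dict, List

# Hypothetical stand-in for the spaCy model: a fixed phrase -> label lookup.
KNOWN_ENTITIES: Dict[str, str] = {"Quartz": "ORG", "Elon Musk": "PERSON"}

def stenographer(speech: str) -> str:
    # Transcribed speech arrives as raw text; this sketch only trims whitespace.
    return speech.strip()

def editor(text: str) -> str:
    # Editors clean up the transcript; here that just collapses repeated spaces.
    return " ".join(text.split())

def tag_entities(text: str) -> Dict[str, object]:
    # Record every known phrase found in the text, together with its label.
    ents = [(phrase, label) for phrase, label in KNOWN_ENTITIES.items() if phrase in text]
    return {"text": text, "ents": ents}

def run_pipeline(speech: str, store: List[Dict[str, object]]) -> Dict[str, object]:
    # Stage -> Stenographer -> Editors -> tagger -> Data Store
    record = tag_entities(editor(stenographer(speech)))
    store.append(record)
    return record

data_store: List[Dict[str, object]] = []
result = run_pipeline("  Elon Musk   spoke with   Quartz  ", data_store)
print(result["ents"])  # [('Quartz', 'ORG'), ('Elon Musk', 'PERSON')]
```

Keeping each stage a plain function makes it easy to swap one out (for example, replacing the lookup with a real model) without touching the others.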

Over eight hours, we created data from the content of the event, building the model in real time

The model evolved over time

This was the experiment that would evolve into SiO₂

Silicon, a key element in everything from glass to microchips,
is at the core of global business

Oxygen, the journalistic voice Quartz breathes into the global business
news cycle

Entities are linguistic anchors, defined by context and around which context can be inferred

Standard entities: PERSON, FACILITY, ORG, PRODUCT, GPE, EVENT...
Additional entities: TECHNOLOGY, PROCESS, NATURE, MEDIA, CONSTRUCT
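Whether a label is standard or custom, an entity ends up as a character span plus a label. spaCy's statistical NER learns these spans from annotated examples; as a simpler illustration of the output shape, here is a rule-based gazetteer sketch. It is loosely in the spirit of spaCy's EntityRuler but does not use spaCy's API, and the phrase list is hypothetical.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)

# Hypothetical gazetteer mixing standard and additional labels from the slide.
GAZETTEER = {
    "Amazon": "ORG",
    "Jeff Bezos": "PERSON",
    "machine learning": "TECHNOLOGY",
    "climate change": "NATURE",
}

def tag(text: str) -> List[Span]:
    spans: List[Span] = []
    # Try longer phrases first so they win over shorter overlapping phrases.
    for phrase in sorted(GAZETTEER, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            start, end = m.span()
            # Keep the match only if it does not overlap a span already claimed.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, GAZETTEER[phrase]))
    return sorted(spans)

text = "Jeff Bezos says Amazon is betting on machine learning."
for start, end, label in tag(text):
    print(text[start:end], label)
```

A lookup like this only finds phrases it already knows; a trained model is what lets the pipeline recognize new TECHNOLOGY or PERSON mentions from context.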

70K articles, 1.4M blocks of text, 85K labeled sentences

Entities

This spaCy model made rich analysis of any given text easy to do on the fly

Stored analysis of a large corpus is a vital resource
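One practical reading of "stored analysis" is: analyze each text once and reuse the stored result afterwards. A minimal sketch, assuming a hypothetical `AnalysisStore` keyed by a SHA-256 hash of the text, with `fake_analyze` standing in for running the spaCy model.

```python
import hashlib
import json
from typing import Callable, Dict, List

class AnalysisStore:
    """Hypothetical store that keeps one analysis per distinct text."""

    def __init__(self, analyze: Callable[[str], Dict]) -> None:
        self.analyze = analyze
        self.records: Dict[str, str] = {}  # sha256 hex digest -> JSON-encoded analysis

    def get(self, text: str) -> Dict:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.records:
            # Only run the (expensive) analysis the first time this text is seen.
            self.records[key] = json.dumps(self.analyze(text))
        return json.loads(self.records[key])

calls: List[str] = []

def fake_analyze(text: str) -> Dict:
    # Stand-in for the spaCy model; records each call so reuse is visible.
    calls.append(text)
    return {"tokens": len(text.split())}

store = AnalysisStore(fake_analyze)
store.get("Every company is a tech company.")
store.get("Every company is a tech company.")
print(len(calls))  # 1
```

Storing results as JSON keeps them independent of the in-memory objects the model produced, so they can be written out and queried across the whole corpus later.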


The language graph is a mutable map of the language
model

Any new content is analyzed and then mapped onto the
language graph
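A mutable map that new content is mapped onto can be sketched as entity nodes linked to the documents that mention them. This is a minimal sketch under assumptions: the `LanguageGraph` class, its methods, and the alias-merging operation are hypothetical illustrations, not Quartz's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (surface text, label)

class LanguageGraph:
    """Hypothetical minimal graph linking entity nodes to the documents that mention them."""

    def __init__(self) -> None:
        self.mentions: Dict[Entity, Set[str]] = defaultdict(set)

    def map_document(self, doc_id: str, ents: List[Entity]) -> None:
        # Entities come from the upstream analysis; each one attaches to an existing or new node.
        for ent in ents:
            self.mentions[ent].add(doc_id)

    def merge(self, alias: Entity, canonical: Entity) -> None:
        # The graph is mutable: fold an alias node into its canonical node.
        self.mentions[canonical] |= self.mentions.pop(alias, set())

    def documents(self, ent: Entity) -> Set[str]:
        # Return a copy so callers cannot mutate the graph by accident.
        return set(self.mentions.get(ent, set()))

graph = LanguageGraph()
graph.map_document("a1", [("Jeff Bezos", "PERSON"), ("Amazon", "ORG")])
graph.map_document("a2", [("Bezos", "PERSON"), ("Blue Origin", "ORG")])
graph.merge(("Bezos", "PERSON"), ("Jeff Bezos", "PERSON"))
print(sorted(graph.documents(("Jeff Bezos", "PERSON"))))  # ['a1', 'a2']
```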

Changes made to the graph can then be incorporated into
the next model iteration

The language graph becomes a primary resource for extracting training
data
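To see how an edit to the graph can flow into the next model iteration, consider labeled sentences stored alongside the graph and exported as training data. A minimal sketch, assuming a hypothetical `SentenceStore` with a label-override table; the export uses the `(text, {"entities": [(start, end, label)]})` layout that appears in spaCy's training examples.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)

class SentenceStore:
    """Hypothetical store of labeled sentences kept alongside the language graph."""

    def __init__(self) -> None:
        self.sentences: List[Tuple[str, List[Span]]] = []
        self.label_overrides: Dict[str, str] = {}  # entity text -> corrected label

    def add(self, text: str, spans: List[Span]) -> None:
        self.sentences.append((text, spans))

    def relabel(self, entity_text: str, new_label: str) -> None:
        # An edit made on the graph: every mention of this entity gets the new label.
        self.label_overrides[entity_text] = new_label

    def training_data(self) -> List[Tuple[str, Dict[str, List[Span]]]]:
        # Export each sentence with overrides applied, ready for the next training run.
        out = []
        for text, spans in self.sentences:
            ents = [(s, e, self.label_overrides.get(text[s:e], label)) for s, e, label in spans]
            out.append((text, {"entities": ents}))
        return out

store = SentenceStore()
store.add("Quartz covers Bitcoin.", [(0, 6, "ORG"), (14, 21, "PRODUCT")])
store.relabel("Bitcoin", "TECHNOLOGY")
print(store.training_data())
# [('Quartz covers Bitcoin.', {'entities': [(0, 6, 'ORG'), (14, 21, 'TECHNOLOGY')]})]
```

Because the correction lives in the store rather than in each sentence, one edit updates every training example that mentions the entity.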

Snapshots in time can be extracted from the language graph
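If every mention in the graph carries its publication date, a snapshot is simply the subgraph of mentions that fall within a time window. A minimal sketch, assuming a hypothetical flat mention log of `(date, entity text, label)` tuples and a `snapshot` helper that counts mentions in a half-open date range.

```python
from collections import Counter
from datetime import date
from typing import List, Tuple

Mention = Tuple[date, str, str]  # (publication date, entity text, label)

def snapshot(mentions: List[Mention], start: date, end: date) -> Counter:
    """Count entity mentions published within [start, end)."""
    return Counter((text, label) for day, text, label in mentions if start <= day < end)

# Hypothetical mention log.
mentions = [
    (date(2017, 1, 5), "Elon Musk", "PERSON"),
    (date(2017, 3, 2), "Elon Musk", "PERSON"),
    (date(2017, 3, 9), "Tesla", "ORG"),
    (date(2018, 2, 1), "Mark Zuckerberg", "PERSON"),
]
q1_2017 = snapshot(mentions, date(2017, 1, 1), date(2017, 4, 1))
print(q1_2017.most_common(1))  # [(('Elon Musk', 'PERSON'), 2)]
```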

Context can be derived by looking at the relationships in
the language graph
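One simple way to derive context from relationships is to rank an entity's neighbors by how often they appear in the same sentence. A minimal sketch, assuming sentence-level co-occurrence as the relationship (a hypothetical choice; the deck does not specify which relationships SiO₂ uses) and a hypothetical `context` helper.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List

def build_cooccurrence(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each pair of entities appears in the same sentence."""
    graph: Dict[str, Counter] = defaultdict(Counter)
    for ents in sentences:
        for a, b in combinations(sorted(set(ents)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def context(graph: Dict[str, Counter], entity: str, k: int = 2) -> List[str]:
    """Return the k entities most strongly connected to `entity`."""
    return [name for name, _ in graph.get(entity, Counter()).most_common(k)]

# Hypothetical entity lists, one per sentence.
sentences = [
    ["Elon Musk", "Tesla"],
    ["Elon Musk", "SpaceX", "Tesla"],
    ["Elon Musk", "SpaceX"],
    ["Elon Musk", "Tesla", "SolarCity"],
]
graph = build_cooccurrence(sentences)
print(context(graph, "Elon Musk"))  # ['Tesla', 'SpaceX']
```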

Elon Musk

Jeff Bezos

Mark Zuckerberg

Context

SiO₂ is a living Natural Language Pipeline of networked algorithms trained on the Quartz corpus to understand the linguistic patterns of global business news

The Pipeline(s)
Quartz Corpus -> Training Sentences -> spaCy
Content -> spaCy -> Language Graph
Language Graph -> Training Data -> Statistical Models / Classifiers
Language Graph -> Training Sentences -> spaCy
Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
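The last pipeline turns unseen content into vectors that downstream statistical models or classifiers can consume. One very simple vectorization, offered as a hypothetical illustration rather than the features SiO₂ actually uses, is a fixed-length count of entity labels per article. The label list below covers only the labels named on the entities slide (the standard list there is truncated).

```python
from collections import Counter
from typing import List, Tuple

# Labels named on the entities slide; the slide's standard list is partial.
LABELS = ["PERSON", "FACILITY", "ORG", "PRODUCT", "GPE", "EVENT",
          "TECHNOLOGY", "PROCESS", "NATURE", "MEDIA", "CONSTRUCT"]

def entity_vector(ents: List[Tuple[str, str]]) -> List[int]:
    """Turn an article's (text, label) entities into a fixed-length vector of label counts."""
    counts = Counter(label for _, label in ents)
    # Every article maps to the same dimensions, in LABELS order.
    return [counts[label] for label in LABELS]

vec = entity_vector([
    ("Tesla", "ORG"),
    ("Elon Musk", "PERSON"),
    ("SpaceX", "ORG"),
    ("battery", "TECHNOLOGY"),
])
print(vec)  # [1, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0]
```

A fixed dimension order is what makes these vectors usable as classifier input: position 2 always means ORG, regardless of the article.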

Thank you
