A Natural Language Pipeline

ddqz
July 06, 2019

Presentation from the spaCy IRL 2019 conference.

Transcript

  1. “There’s a reason that tech companies are topping the lists of most valuable companies and brands. Every company is a tech company.” Maggie Chan Jones
  2. Proto-Pipeline: Stage -> Stenographer -> Editors -> spaCy -> Data Store <-> Backend <- Slack <- Users
  3. Over eight hours we created data from the content of the event, building the model in real time
  4. SiO2 is a living Natural Language Pipeline of networked algorithms trained on the corpus of Quartz to understand the linguistic patterns of global business news
  5. The Pipeline(s)

    Quartz Corpus -> Training Sentences -> spaCy
    Content -> spaCy -> Language Graph
    Language Graph -> Training Data -> Statistical Models / Classifiers
    Language Graph -> Training Sentences -> spaCy
    Unseen Content -> spaCy -> Pre-Processed Text / Vectors -> Statistical Models / Classifiers
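
The slides do not say how the Language Graph is built, so the following is only a rough sketch of what a Content -> spaCy -> Language Graph stage could look like. It assumes the graph is a weighted co-occurrence graph over tokens, which is a guess and not something the deck confirms. A regular-expression tokenizer stands in for the spaCy step so that the block runs with the standard library alone; in a real setup that function would be replaced by a spaCy pipeline. The function names and the sample sentences are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Stand-in for the spaCy step (an assumption): a real pipeline would
# tokenize with spaCy instead of this regular expression.
TOKEN_RE = re.compile(r"[A-Za-z]+")


def tokenize(sentence):
    """Return the lowercase alphabetic tokens of a sentence."""
    return [match.group(0).lower() for match in TOKEN_RE.finditer(sentence)]


def build_language_graph(sentences):
    """Count how often each unordered pair of distinct tokens shares a sentence.

    Edges are keyed by an alphabetically sorted (token_a, token_b) tuple,
    and the edge weight is the number of sentences containing both tokens.
    """
    edges = Counter()
    for sentence in sentences:
        unique_tokens = sorted(set(tokenize(sentence)))
        for token_a, token_b in combinations(unique_tokens, 2):
            edges[(token_a, token_b)] += 1
    return edges


def neighbors(edges, token):
    """Return (neighbor, weight) pairs for a token, heaviest edges first."""
    linked = Counter()
    for (token_a, token_b), weight in edges.items():
        if token_a == token:
            linked[token_b] += weight
        elif token_b == token:
            linked[token_a] += weight
    return linked.most_common()


# Made-up sample content, used only to exercise the sketch.
content = [
    "Markets rallied as trade talks resumed.",
    "Trade talks stalled and markets fell.",
]
graph = build_language_graph(content)
trade_neighbors = dict(neighbors(graph, "trade"))
```

Under this assumption, the edge weights could serve as the Training Data that the third pipeline feeds into statistical models or classifiers, although the deck does not describe that step in any detail.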