Slide 1

Slide 1 text

spaCy and Explosion: past, present & future
Matthew Honnibal, Ines Montani
Explosion

Slide 2

Slide 2 text

1982 - 2014 Before spaCy

Slide 3

Slide 3 text

1982 - 2014 Before spaCy • Some time before 2005

Slide 4

Slide 4 text

1982 - 2014 Before spaCy • Some time before 2005 • PhD 2005-2010 (Sydney)

Slide 5

Slide 5 text

1982 - 2014 Before spaCy • Some time before 2005 • PhD 2005-2010 (Sydney) • ACL 2009 (Singapore)

Slide 6

Slide 6 text

July 2014 First commit on spaCy

Slide 7

Slide 7 text

Early 2015 spaCy is first released

Slide 8

Slide 8 text

2015 First collaborations • demos & visualizers like displaCy • first concepts of a modern approach to NLP annotation tools

Slide 9

Slide 9 text

2015 First collaborations • demos & visualizers like displaCy • first concepts of a modern approach to NLP annotation tools • “Baskets” concept

Slide 10

Slide 10 text

2015 First collaborations • demos & visualizers like displaCy • first concepts of a modern approach to NLP annotation tools • “Baskets” concept • binary annotation tool concept

Slide 11

Slide 11 text

Early 2016 First highlights: sense2vec

Slide 12

Slide 12 text

Early 2016 First highlights: German model • first non-English model • non-projective dependencies • developed by Wolfgang Seeker

Slide 13

Slide 13 text

Late 2016 Explosion • new company for AI developer tools • bootstrapped through consulting for the first 6 months • funded through software sales since 2017 • 100% independent and profitable

Slide 14

Slide 14 text

Late 2016 Explosion • new company for AI developer tools • bootstrapped through consulting for the first 6 months • funded through software sales since 2017 • 100% independent and profitable
Our bets about NLP • NLP won’t just be a cloud API • number of developers will increase • annotation is better in-house

Slide 15

Slide 15 text

2017 spaCy v2.0 • shift to deep learning • smaller and updatable models • custom pipeline components • custom extension attributes • built-in text classification • built-in displaCy visualizers • many other improvements
Thinc, spaCy’s machine learning library

Slide 16

Slide 16 text

July 2017 neuralcoref by Hugging Face • community package for coreference resolution with spaCy

Slide 17

Slide 17 text

Late 2017 Prodigy • first commercial product • modern annotation tool • fully scriptable in Python

Slide 18

Slide 18 text

Late 2017 Prodigy • first commercial product • modern annotation tool • fully scriptable in Python

Slide 19

Slide 19 text

Late 2017 Prodigy • first commercial product • modern annotation tool • fully scriptable in Python
2,000+ users • 250+ companies incl.

Slide 20

Slide 20 text

July 2018 10,000 stars on GitHub

Slide 21

Slide 21 text

Early 2019 spaCy v2.1 • transfer learning and pretraining • 2-3 times faster tokenization • enhanced match pattern API • built-in rule-based NER • many other improvements

Slide 22

Slide 22 text

Early 2019 spaCy v2.1 • transfer learning and pretraining • 2-3 times faster tokenization • enhanced match pattern API • built-in rule-based NER • many other improvements
Transfer learning • better models with less data – huge win! • how to adapt for spaCy without bigger (and slower) models? • spacy pretrain is a pretty cool compromise

Slide 23

Slide 23 text

April 2019 Free interactive online course • course.spacy.io

Slide 24

Slide 24 text

So what’s next for spaCy?

Slide 25

Slide 25 text

July 2019 Explosion team: Matthew Honnibal, Ines Montani

Slide 26

Slide 26 text

July 2019 Explosion team: Matthew Honnibal, Ines Montani, Justin DuJardin

Slide 27

Slide 27 text

July 2019 Explosion team: Matthew Honnibal, Ines Montani, Justin DuJardin, Sebastián Ramírez, Guadalupe Romero, Giannis Daras, Sofie Van Landeghem

Slide 28

Slide 28 text

Today spaCy IRL • 2 sold-out corporate training days • 200+ conference attendees

Slide 29

Slide 29 text

What’s next? spaCy v3.0 • morphological features • entity linking • non-entity span tagging • static analysis of processing pipeline and its components

Slide 30

Slide 30 text

What’s next? spaCy v3.0 • morphological features • entity linking • non-entity span tagging • static analysis of processing pipeline and its components
Vision for spaCy • focus on data structures and pipeline • build support for new tasks even if we don’t have a model • make sure it’s easy to BYO model • keep shipping good defaults

Slide 31

Slide 31 text

What’s next? spaCy v3.0 • morphological features • entity linking • non-entity span tagging • static analysis of processing pipeline and its components
Vision for spaCy • focus on data structures and pipeline • build support for new tasks even if we don’t have a model • make sure it’s easy to BYO model • keep shipping good defaults
What’s out-of-scope? • anything generative: summarization, machine translation, etc. • multi-modal: audio, video, etc. • research assistance: plenty of good frameworks for developing novel techniques

Slide 32

Slide 32 text

What’s next? spaCy ecosystem in your cloud • whole systems, not just libraries • programmable, extensible cluster • running under your control • automated setup, good defaults • full data privacy – we don’t want your data!

Slide 33

Slide 33 text

What’s next? spaCy ecosystem in your cloud • whole systems, not just libraries • programmable, extensible cluster • running under your control • automated setup, good defaults • full data privacy – we don’t want your data!
Processing with Dask

Slide 34

Slide 34 text

What’s next? spaCy ecosystem in your cloud • whole systems, not just libraries • programmable, extensible cluster • running under your control • automated setup, good defaults • full data privacy – we don’t want your data!
Processing with Dask • Prodigy Scale

Slide 35

Slide 35 text

10,000+ commits • 300+ contributors • 60+ extension packages • 13,500+ GitHub stars • 2,800+ closed issues • 80+ releases

Slide 36

Slide 36 text

Thank you!