spaCy and Explosion: past, present & future

Ines Montani

July 06, 2019

Transcript

  1. spaCy and Explosion: past, present & future • Matthew Honnibal & Ines Montani, Explosion

  2. 1982 - 2014 Before spaCy

  3. 1982 - 2014 Before spaCy • Some time before 2005

  4. 1982 - 2014 Before spaCy • Some time before 2005 • PhD 2005-2010 (Sydney)

  5. 1982 - 2014 Before spaCy • Some time before 2005 • PhD 2005-2010 (Sydney) • ACL 2009 (Singapore)

  6. July 2014 First commit on spaCy

  7. Early 2015 spaCy is first released

  8. 2015 First collaborations • demos & visualizers like displaCy • first concepts of a modern approach to NLP annotation tools

  9. 2015 First collaborations • demos & visualizers like displaCy • first concepts of a modern approach to NLP annotation tools • “Baskets” concept

  10. 2015 First collaborations • demos & visualizers like displaCy • first concepts of a modern approach to NLP annotation tools • “Baskets” concept • Binary annotation tool concept

  11. Early 2016 First highlights: sense2vec

  12. Early 2016 First highlights: German model • first non-English model • non-projective dependencies • developed by Wolfgang Seeker

  13. Late 2016 Explosion • new company for AI developer tools • bootstrapped through consulting for the first 6 months • funded through software sales since 2017 • 100% independent and profitable

  14. Late 2016 Explosion • new company for AI developer tools • bootstrapped through consulting for the first 6 months • funded through software sales since 2017 • 100% independent and profitable

    Our bets about NLP • NLP won’t just be a cloud API • number of developers will increase • annotation is better in-house
  15. 2017 spaCy v2.0 • shift to deep learning • smaller and updatable models • custom pipeline components • custom extension attributes • built-in text classification • built-in displaCy visualizers • many other improvements

    Thinc, spaCy’s machine learning library
  16. July 2017 neuralcoref by Hugging Face • Community package for coreference resolution with spaCy

  17. Late 2017 Prodigy • first commercial product • modern annotation tool • fully scriptable in Python

  18. Late 2017 Prodigy • first commercial product • modern annotation tool • fully scriptable in Python

  19. Late 2017 Prodigy • first commercial product • modern annotation tool • fully scriptable in Python

    2,000+ users • 250+ companies incl.
  20. July 2018 10,000 stars on GitHub

  21. Early 2019 spaCy v2.1 • transfer learning and pretraining • 2-3 times faster tokenization • enhanced match pattern API • built-in rule-based NER • many other improvements

  22. Early 2019 spaCy v2.1 • transfer learning and pretraining • 2-3 times faster tokenization • enhanced match pattern API • built-in rule-based NER • many other improvements

    Transfer learning • better models with less data – huge win! • how to adapt for spaCy without bigger (and slower) models? • spacy pretrain is a pretty cool compromise
  23. April 2019 Free interactive online course course.spacy.io

  24. So what’s next for spaCy?

  25. July 2019 Explosion team • Matthew Honnibal • Ines Montani

  26. July 2019 Explosion team • Matthew Honnibal • Ines Montani • Justin DuJardin

  27. July 2019 Explosion team • Matthew Honnibal • Ines Montani • Sebastián Ramírez • Guadalupe Romero • Giannis Daras • Justin DuJardin • Sofie Van Landeghem

  28. Today spaCy IRL • 2 sold-out corporate training days • 200+ conference attendees

  29. What’s next? spaCy v3.0 • morphological features • entity linking • non-entity span tagging • static analysis of processing pipeline and its components

  30. What’s next? spaCy v3.0 • morphological features • entity linking • non-entity span tagging • static analysis of processing pipeline and its components

    Vision for spaCy • focus on data structures and pipeline • build support for new tasks even if we don’t have a model • make sure it’s easy to BYO model • keep shipping good defaults
  31. What’s next? spaCy v3.0 • morphological features • entity linking • non-entity span tagging • static analysis of processing pipeline and its components

    Vision for spaCy • focus on data structures and pipeline • build support for new tasks even if we don’t have a model • make sure it’s easy to BYO model • keep shipping good defaults

    What’s out-of-scope? • anything generative: summarization, machine translation, etc. • multi-modal: audio, video, etc. • research assistance: plenty of good frameworks for developing novel techniques
  32. What’s next? spaCy ecosystem in your cloud • whole systems, not just libraries • programmable, extensible cluster • running under your control • automated setup, good defaults • full data privacy – we don’t want your data!

  33. What’s next? spaCy ecosystem in your cloud • whole systems, not just libraries • programmable, extensible cluster • running under your control • automated setup, good defaults • full data privacy – we don’t want your data!

    Processing with Dask
  34. What’s next? spaCy ecosystem in your cloud • whole systems, not just libraries • programmable, extensible cluster • running under your control • automated setup, good defaults • full data privacy – we don’t want your data!

    Prodigy Scale • Processing with Dask
  35. 10,000+ commits • 300+ contributors • 60+ extension packages • 13,500+ GitHub stars • 2,800+ closed issues • 80+ releases

  36. Thank you!