spaCy and Explosion: past, present & future

Ines Montani

July 06, 2019

Transcript

  1. spaCy and Explosion: past, present & future
    Matthew Honnibal
    Ines Montani
    Explosion

  2. 1982 - 2014
    Before spaCy

  3. 1982 - 2014
    Before spaCy
    Some time before 2005

  4. 1982 - 2014
    Before spaCy
    PhD 2005-2010 (Sydney)
    Some time before 2005

  5. 1982 - 2014
    Before spaCy
    PhD 2005-2010 (Sydney)
    ACL 2009 (Singapore)
    Some time before 2005

  6. July 2014
    First commit on spaCy

  7. Early 2015
    spaCy is first released

  8. 2015
    First collaborations
    • demos & visualizers like displaCy
    • first concepts of a modern
    approach to NLP annotation tools

  9. 2015
    First collaborations
    “Baskets” concept
    • demos & visualizers like displaCy
    • first concepts of a modern
    approach to NLP annotation tools

  10. 2015
    First collaborations
    “Baskets” concept
    Binary annotation tool concept
    • demos & visualizers like displaCy
    • first concepts of a modern
    approach to NLP annotation tools

  11. Early 2016
    First highlights: sense2vec

  12. Early 2016
    First highlights: German model
    • first non-English model
    • non-projective dependencies
    • developed by Wolfgang Seeker

  13. Late 2016
    Explosion
    • new company for AI developer tools
    • bootstrapped through consulting for the first 6 months
    • funded through software sales since 2017
    • 100% independent and profitable

  14. Late 2016
    Explosion
    • new company for AI developer tools
    • bootstrapped through consulting for the first 6 months
    • funded through software sales since 2017
    • 100% independent and profitable
    Our bets about NLP
    • NLP won’t just be a cloud API
    • number of developers will increase
    • annotation is better in-house

  15. 2017
    spaCy v2.0
    • shift to deep learning
    • smaller and updatable models
    • custom pipeline components
    • custom extension attributes
    • built-in text classification
    • built-in displaCy visualizers
    • many other improvements
    Thinc, spaCy’s machine learning library

  16. July 2017
    neuralcoref by Hugging Face
    Community package for coreference resolution with spaCy

  17. Late 2017
    Prodigy
    • first commercial product
    • modern annotation tool
    • fully scriptable in Python

  18. Late 2017
    Prodigy
    • first commercial product
    • modern annotation tool
    • fully scriptable in Python

  19. Late 2017
    Prodigy
    • first commercial product
    • modern annotation tool
    • fully scriptable in Python
    2,000+ users
    250+ companies

  20. July 2018
    10,000 stars on GitHub

  21. Early 2019
    spaCy v2.1
    • transfer learning and pretraining
    • 2-3 times faster tokenization
    • enhanced match pattern API
    • built-in rule-based NER
    • many other improvements

  22. Early 2019
    spaCy v2.1
    • transfer learning and pretraining
    • 2-3 times faster tokenization
    • enhanced match pattern API
    • built-in rule-based NER
    • many other improvements
    Transfer learning
    • better models with less data –
    huge win!
    • how to adapt for spaCy without
    bigger (and slower) models?
    • spacy pretrain is a pretty cool
    compromise

  23. April 2019
    Free interactive online course
    course.spacy.io

  24. So what’s next for spaCy?

  25. July 2019
    Explosion team
    Matthew Honnibal
    Ines Montani

  26. July 2019
    Explosion team
    Matthew Honnibal
    Ines Montani
    Justin DuJardin

  27. July 2019
    Explosion team
    Matthew Honnibal
    Ines Montani
    Justin DuJardin
    Sofie Van Landeghem
    Sebastián Ramírez
    Guadalupe Romero
    Giannis Daras

  28. Today
    spaCy IRL
    • 2 sold-out corporate training days
    • 200+ conference attendees

  29. What’s next?
    spaCy v3.0
    • morphological features
    • entity linking
    • non-entity span tagging
    • static analysis of processing
    pipeline and its components

  30. What’s next?
    spaCy v3.0
    • morphological features
    • entity linking
    • non-entity span tagging
    • static analysis of processing
    pipeline and its components
    Vision for spaCy
    • focus on data structures and pipeline
    • build support for new tasks even if we
    don’t have a model
    • make sure it’s easy to BYO model
    • keep shipping good defaults

  31. What’s next?
    spaCy v3.0
    • morphological features
    • entity linking
    • non-entity span tagging
    • static analysis of processing
    pipeline and its components
    Vision for spaCy
    • focus on data structures and pipeline
    • build support for new tasks even if we
    don’t have a model
    • make sure it’s easy to BYO model
    • keep shipping good defaults
    What’s out-of-scope?
    • anything generative: summarization, machine
    translation, etc.
    • multi-modal: audio, video, etc.
    • research assistance: plenty of good
    frameworks for developing novel techniques

  32. What’s next?
    spaCy ecosystem in your cloud
    • whole systems, not just libraries
    • programmable, extensible cluster
    • running under your control
    • automated setup, good defaults
    • full data privacy – we don’t want
    your data!

  33. What’s next?
    spaCy ecosystem in your cloud
    • whole systems, not just libraries
    • programmable, extensible cluster
    • running under your control
    • automated setup, good defaults
    • full data privacy – we don’t want
    your data!
    Processing with Dask

  34. What’s next?
    spaCy ecosystem in your cloud
    • whole systems, not just libraries
    • programmable, extensible cluster
    • running under your control
    • automated setup, good defaults
    • full data privacy – we don’t want
    your data!
    Prodigy Scale
    Processing with Dask

  35. 10,000+ commits
    300+ contributors
    60+ extension packages
    13,500+ GitHub stars
    2,800+ closed issues
    80+ releases

  36. Thank you!
