• bootstrapped through consulting for the first 6 months
• funded through software sales since 2017
• 100% independent and profitable

Our bets about NLP
• NLP won’t just be a cloud API
• number of developers will increase
• annotation is better in-house
• 2-3 times faster tokenization
• enhanced match pattern API
• built-in rule-based NER
• many other improvements

Transfer learning
• better models with less data – huge win!
• how to adapt for spaCy without bigger (and slower) models?
• spacy pretrain is a pretty cool compromise
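To make the "match pattern" and "rule-based NER" bullets above more concrete, here is a minimal standard-library sketch of the general idea: token-level patterns mapped to entity labels, with the longest match winning. It is not spaCy's API. The pattern format, the `tokenize` and `find_entities` helpers, and the `RULES` table are all hypothetical names made up for this illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical pattern format (an assumption, not a library API):
# each pattern is a list of per-token specs; "LOWER" must equal the
# lowercased token text.
Pattern = List[Dict[str, str]]


def tokenize(text: str) -> List[str]:
    """Naive whitespace tokenizer; a real pipeline would do much more."""
    return text.split()


def find_entities(
    tokens: List[str], rules: Dict[str, List[Pattern]]
) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans, preferring the longest match at each position."""
    spans = []
    i = 0
    while i < len(tokens):
        best = None
        for label, patterns in rules.items():
            for pattern in patterns:
                end = i + len(pattern)
                if end > len(tokens):
                    continue
                matched = all(
                    tokens[i + k].lower() == spec["LOWER"]
                    for k, spec in enumerate(pattern)
                )
                if matched and (best is None or end > best[1]):
                    best = (i, end, label)
        if best is not None:
            spans.append(best)
            i = best[1]  # continue after the matched span, so spans never overlap
        else:
            i += 1
    return spans


# Illustrative rules only; the labels and phrases are assumptions.
RULES = {
    "PRODUCT": [
        [{"LOWER": "spacy"}],
        [{"LOWER": "prodigy"}],
        [{"LOWER": "prodigy"}, {"LOWER": "scale"}],
    ],
}

tokens = tokenize("We use spaCy with Prodigy Scale")
entities = [(" ".join(tokens[s:e]), label) for s, e, label in find_entities(tokens, RULES)]
print(entities)
```

Rules like these need no training data, which is why they pair well with a statistical model: the rules cover the predictable cases and the model covers the rest.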
data structures and pipeline
• build support for new tasks even if we don’t have a model
• make sure it’s easy to BYO model
• keep shipping good defaults

• morphological features
• entity linking
• non-entity span tagging
• static analysis of processing pipeline and its components
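One way to picture "static analysis of processing pipeline and its components": each component declares which annotations it needs and which it sets, and a checker walks the pipeline in order without running anything. The sketch below is only an assumption about how such a check could look. The `Component` class, the `analyze_pipeline` function, and the attribute names such as `doc.ents` are hypothetical and do not describe an existing API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Hypothetical pipeline component that declares what it reads and writes."""
    name: str
    requires: List[str] = field(default_factory=list)
    assigns: List[str] = field(default_factory=list)


def analyze_pipeline(pipeline: List[Component]) -> List[str]:
    """Report unmet requirements by inspecting declarations only (nothing is executed)."""
    available = set()
    problems = []
    for component in pipeline:
        for attr in component.requires:
            if attr not in available:
                problems.append(
                    f"'{component.name}' requires '{attr}', "
                    "which no earlier component assigns"
                )
        # Whatever this component sets is visible to every later component.
        available.update(component.assigns)
    return problems


# The entity linker is placed before the NER component on purpose.
pipeline = [
    Component("tagger", assigns=["token.tag"]),
    Component("entity_linker", requires=["doc.ents"], assigns=["token.ent_kb_id"]),
    Component("ner", assigns=["doc.ents"]),
]
problems = analyze_pipeline(pipeline)
for problem in problems:
    print(problem)

# Swapping the last two components resolves the problem.
fixed = [pipeline[0], pipeline[2], pipeline[1]]
fixed_problems = analyze_pipeline(fixed)
```

A check like this catches ordering mistakes when the pipeline is assembled, before any text is processed.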
What’s out-of-scope?
• anything generative: summarization, machine translation, etc.
• multi-modal: audio, video, etc.
• research assistance: plenty of good frameworks for developing novel techniques
systems, not just libraries
• programmable, extensible cluster
• running under your control
• automated setup, good defaults
• full data privacy – we don’t want your data!
Prodigy Scale
Processing with Dask