Slide 1

spaCy meets Transformers
Matthew Honnibal, Explosion

Slide 2

Matthew Honnibal, co-founder: PhD in Computer Science in 2009. 10 years publishing research on state-of-the-art natural language understanding systems. Left academia in 2014 to develop spaCy.

Ines Montani, co-founder: Programmer and front-end developer with a degree in media science and linguistics. Has been working on spaCy since its first release. Lead developer of Prodigy.

Slide 3

• 100k+ users worldwide
• 15k stars on GitHub
• 400 contributors
• 60+ extension packages

https://spacy.io

Slide 4

https://prodi.gy

Slide 5

• 2500+ users, including 250+ companies
• 1200+ forum members

https://prodi.gy

Slide 6

ELMo ULMFiT BERT

Slide 7

ELMo ULMFiT BERT

Slide 8

ELMo ULMFiT BERT

Slide 9

No content

Slide 10

No content

Slide 11

Tokenization alignment
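spaCy's linguistic tokenization rarely matches a transformer's subword (wordpiece) tokenization, so token indices must be mapped between the two. A minimal sketch of the idea (a hypothetical helper, not the library's implementation), aligning words to BERT-style "##"-prefixed pieces:

```python
def align(words, pieces):
    """Map each word index to the indices of its subword pieces,
    assuming the pieces concatenate back to the words exactly
    (BERT-style '##' continuation markers)."""
    alignment = []
    p = 0
    for word in words:
        built, idxs = "", []
        while built != word:
            # strip the '##' continuation prefix before concatenating
            built += pieces[p].lstrip("#")
            idxs.append(p)
            p += 1
        alignment.append(idxs)
    return alignment

words = ["transformers", "are", "powerful"]
pieces = ["transform", "##ers", "are", "power", "##ful"]
print(align(words, pieces))  # [[0, 1], [2], [3, 4]]
```

Real alignment code (as in spacy-transformers) also has to handle mismatched boundaries, lowercasing, and special tokens, which this sketch ignores.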

Slide 12

Fine-tuning

Slide 13

spaCy’s NLP pipeline

Slide 14

No content

Slide 15

Processing pipeline
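In spaCy, processing is a pipeline of components applied to a Doc in order. A minimal illustration using a blank English pipeline (a pretrained model such as en_core_web_sm would add tagger, parser, and NER components to the list):

```python
import spacy

# blank English pipeline: just the tokenizer, no trained components
nlp = spacy.blank("en")
print(nlp.pipe_names)  # empty list: no pipeline components added yet

# calling the pipeline on text produces a Doc of tokens
doc = nlp("spaCy processes text as a pipeline of components.")
print([token.text for token in doc])
```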

Slide 16

Processing pipeline with shared representations

Slide 17

Processing pipeline without shared representations
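The contrast between the two pipeline designs can be shown with a toy sketch (hypothetical, not spaCy's actual code): with shared representations, one expensive encoding pass feeds every component; without sharing, each component re-encodes the text itself.

```python
calls = {"n": 0}

def encode(text):
    """Stand-in for an expensive encoder, e.g. a transformer."""
    calls["n"] += 1
    return [ord(c) for c in text]

def tagger(vectors):
    return len(vectors)  # toy task head

def parser(vectors):
    return sum(vectors) % 100  # toy task head

text = "shared representations"

# Without sharing: every component encodes the text itself.
calls["n"] = 0
tagger(encode(text))
parser(encode(text))
print(calls["n"])  # 2 encoder passes

# With sharing: encode once, pass the representation to each head.
calls["n"] = 0
shared = encode(text)
tagger(shared)
parser(shared)
print(calls["n"])  # 1 encoder pass
```

With a transformer as the encoder, that saved pass dominates the runtime, which is why sharing matters so much for speed.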

Slide 18

Modular architecture
• Functions should be small and self-contained
• Avoid state and side-effects
• Lots of systems from fewer parts

Speed and accuracy
• Small functions make you repeat work
• Without state, models lose information
• ML models aren’t really interchangeable anyway

Slide 19

No content

Slide 20

What you can do with transformers

Slide 21

Transformers: Pros
• Easy network design
• Great accuracy
• Need few annotated examples

Transformers: Cons
• Slow / expensive
• Need large batches
• Bleeding edge

Slide 22

github.com/explosion/spacy-transformers

Slide 23

• pip install spacy-transformers
• Supports textcat, aligned tokenization, custom models
• Coming soon: NER, tagging, dependency parsing
• Coming soon: RPC for the transformer components
• Coming soon: Transformers support in Prodigy

Conclusion

Slide 24

Thank you!

Explosion · explosion.ai
Follow us on Twitter: @honnibal, @explosion_ai