Slide 1

Slide 1 text

Ines Montani Explosion A practical guide to human-in-the-loop distillation

Slide 2

Slide 2 text

Open-source library for industrial-strength natural language processing spacy.io SPACY 250m+ downloads

Slide 3

Slide 3 text

Open-source library for industrial-strength natural language processing spacy.io ChatGPT can write spaCy code! SPACY 250m+ downloads

Slide 4

Slide 4 text

Modern scriptable annotation tool for machine learning developers PRODIGY 900+ companies prodigy.ai 10k+ users

Slide 5

Slide 5 text

Modern scriptable annotation tool for machine learning developers PRODIGY 900+ companies prodigy.ai Alex Smith Developer Kim Miller Analyst GPT-4 API 10k+ users

Slide 6

Slide 6 text

BACK TO OUR ROOTS explosion.ai/blog/back-to-our-roots We’re back to running Explosion as a smaller, independent-minded and self-sufficient company. Ines Montani Founder Matthew Honnibal Founder

Slide 7

Slide 7 text

BACK TO OUR ROOTS explosion.ai/blog/back-to-our-roots We’re back to running Explosion as a smaller, independent-minded and self-sufficient company. Consulting open source developer tools Ines Montani Founder Matthew Honnibal Founder

Slide 8

Slide 8 text

SOFTWARE IN Industry

Slide 9

Slide 9 text

modular SOFTWARE IN Industry

Slide 10

Slide 10 text

modular transparent SOFTWARE IN Industry

Slide 11

Slide 11 text

modular transparent explainable SOFTWARE IN Industry

Slide 12

Slide 12 text

modular transparent explainable data-private SOFTWARE IN Industry

Slide 13

Slide 13 text

modular transparent explainable data-private reliable SOFTWARE IN Industry

Slide 14

Slide 14 text

modular transparent explainable data-private reliable affordable SOFTWARE IN Industry

Slide 15

Slide 15 text

black-box models modular transparent explainable data-private reliable affordable SOFTWARE IN Industry

Slide 16

Slide 16 text

third-party APIs black-box models modular transparent explainable data-private reliable affordable SOFTWARE IN Industry

Slide 17

Slide 17 text

Exceeds expectations kinda meh, really Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 18

Slide 18 text

Exceeds expectations kinda meh, really find mentions of products Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 19

Slide 19 text

Exceeds expectations kinda meh, really find mentions of products link mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 20

Slide 20 text

Exceeds expectations kinda meh, really extract sentiment for different attributes Battery Camera Performance Design camera battery design battery camera find mentions of products link mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 21

Slide 21 text

add results to database Exceeds expectations kinda meh, really extract sentiment for different attributes Battery Camera Performance Design camera battery design battery camera find mentions of products link mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 24

Slide 24 text

large generative model

Slide 25

Slide 25 text

in-context learning Falcon MIXTRAL GPT-4 large generative model

Slide 26

Slide 26 text

distilled task-specific model in-context learning Falcon MIXTRAL GPT-4 large generative model

Slide 27

Slide 27 text

distilled task-specific model transfer learning ELECTRA T5 in-context learning Falcon MIXTRAL GPT-4 large generative model

Slide 28

Slide 28 text

distilled task-specific model transfer learning ELECTRA T5 in-context learning Falcon MIXTRAL GPT-4 BERT-base is still very competitive! large generative model

Slide 29

Slide 29 text

📖 text 🔮 model raw output ⚙ parser task output 💬 template prompt WORKflow in-context learning explosion.ai/blog/human-in-the-loop-distillation

Slide 30

Slide 30 text

📖 text 🔮 model raw output ⚙ parser task output 💬 template prompt WORKflow in-context learning ⚗ distillation 🎯 annotation task dataset task-specific model transfer learning explosion.ai/blog/human-in-the-loop-distillation
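The in-context learning half of this workflow (text, template, prompt, model, raw output, parser, task output) can be sketched in a few lines of Python. This is a minimal illustration only: `TEMPLATE`, `fake_model` and `parse_output` are hypothetical names, and `fake_model` returns a canned answer instead of calling a real generative model.

```python
import json
from typing import Callable, List

# Hypothetical prompt template; the wording is an assumption for illustration.
TEMPLATE = (
    "Extract all product mentions from the text below.\n"
    "Answer with a JSON list of strings and nothing else.\n\n"
    "Text: {text}"
)


def render_prompt(text: str) -> str:
    """Template step: turn the input text into a prompt."""
    return TEMPLATE.format(text=text)


def fake_model(prompt: str) -> str:
    """Stand-in for a large generative model. Returns a canned raw output
    with some chatty framing, as generative models often produce."""
    return 'Sure! Here are the mentions: ["nebula", "iphone 13"]'


def parse_output(raw: str) -> List[str]:
    """Parser step: pull the JSON list out of the raw output and fall back
    to an empty result if nothing parseable is found."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        items = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return []
    return [item for item in items if isinstance(item, str)]


def run_task(text: str, model: Callable[[str], str] = fake_model) -> List[str]:
    """Full chain: text -> prompt -> raw output -> task output."""
    return parse_output(model(render_prompt(text)))


mentions = run_task("never had to carry a powerbank with my old iphone 13")
print(mentions)
```

Because the task output is plain structured data, the same records can be corrected by a human and collected into a dataset for the distilled, task-specific model.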

Slide 31

Slide 31 text

CLOSE THE GAP BETWEEN prototype AND production

Slide 32

Slide 32 text

CLOSE THE GAP BETWEEN prototype AND production standardize inputs and outputs

Slide 33

Slide 33 text

CLOSE THE GAP BETWEEN prototype AND production standardize inputs and outputs start with evaluation

Slide 34

Slide 34 text

CLOSE THE GAP BETWEEN prototype AND production standardize inputs and outputs start with evaluation assess utility, not just accuracy explosion.ai/blog/applied-nlp-thinking

Slide 35

Slide 35 text

CLOSE THE GAP BETWEEN prototype AND production standardize inputs and outputs start with evaluation work on data iteratively assess utility, not just accuracy explosion.ai/blog/applied-nlp-thinking

Slide 36

Slide 36 text

CLOSE THE GAP BETWEEN prototype AND production standardize inputs and outputs start with evaluation work on data iteratively assess utility, not just accuracy explosion.ai/blog/applied-nlp-thinking consider structure and ambiguity of natural language
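"Start with evaluation" usually means fixing a small set of gold annotations first and scoring every approach against it. A minimal sketch of span-level precision, recall and F-score follows; the `(start, end, label)` span format and the example offsets are assumptions made up for illustration, not data from the talk.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label); assumed format


def prf(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall and F-score over sets of spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return {"p": precision, "r": recall, "f": f_score}


# Illustrative, made-up annotations for a single text.
gold = {(4, 10, "PRODUCT"), (96, 105, "PRODUCT")}
predicted = {(4, 10, "PRODUCT"), (50, 59, "PRODUCT")}

scores = prf(gold, predicted)
print(scores)
```

Running the same function on the prompted baseline and on each later model gives directly comparable numbers, although a score like this measures accuracy only and says nothing about utility.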

Slide 37

Slide 37 text

processing pipeline prototype

Slide 38

Slide 38 text

processing pipeline prototype github.com/explosion/spacy-llm prompt model & transform output to structured data structured machine-facing Doc object

Slide 39

Slide 39 text

processing pipeline prototype processing pipeline in production structured machine-facing Doc object github.com/explosion/spacy-llm prompt model & transform output to structured data structured machine-facing Doc object

Slide 40

Slide 40 text

human IN THE LOOP

Slide 41

Slide 41 text

continuous evaluation baseline human IN THE LOOP

Slide 42

Slide 42 text

continuous evaluation baseline prompting human IN THE LOOP

Slide 44

Slide 44 text

continuous evaluation baseline prompting transfer learning human IN THE LOOP

Slide 45

Slide 45 text

continuous evaluation baseline prompting transfer learning human IN THE LOOP distilled model

Slide 46

Slide 46 text

kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 47

Slide 47 text

Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance null Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 48

Slide 48 text

prodigy.ai Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance null Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 49

Slide 49 text

prodigy.ai Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance null Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark! selection by generative model GPT-4 API

Slide 50

Slide 50 text

prodigy.ai Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance null Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark! selection by generative model GPT-4 API can be faster, not slower!
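The structured record on these slides can be written down as a small data class, with the generative model pre-selecting values and a human confirming or correcting them. This is a hedged sketch: the field names follow the slide, but the `relevant=True` flag and the sentiment strings are my reading of the review text (the slide shows them graphically), and `apply_corrections` is a hypothetical helper rather than part of Prodigy.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class ReviewRecord:
    relevant: bool
    mention: str
    catalog_id: str
    # None means the attribute is not discussed in the review ("null" on the slide).
    battery: Optional[str] = None
    camera: Optional[str] = None
    performance: Optional[str] = None
    design: Optional[str] = None


# Values a generative model might pre-select (assumed for illustration).
suggested = ReviewRecord(
    relevant=True,
    mention="nebula",
    catalog_id="P3204-W2130",
    battery="negative",
    camera="negative",
    design="positive",
)


def apply_corrections(record: ReviewRecord, corrections: Dict[str, object]) -> ReviewRecord:
    """Return a copy of the record with the annotator's corrections applied.
    Accepting a suggestion unchanged is the empty-corrections case."""
    return replace(record, **corrections)


accepted = apply_corrections(suggested, {})
corrected = apply_corrections(suggested, {"design": "neutral"})
print(json.dumps(asdict(corrected), indent=2))
```

Reviewing a pre-filled record mostly takes a single accept decision, which is one way to read the slide's point that a model in the loop can make annotation faster, not slower.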

Slide 51

Slide 51 text

CASE STUDY #1 400mb model size 2k+ words/second 8hr data dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts

Slide 52

Slide 52 text

CASE STUDY #1 400mb model size 2k+ words/second 8hr data dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation

Slide 53

Slide 53 text

CASE STUDY #1 400mb model size 2k+ words/second 8hr data dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation • beat few-shot LLM baseline of 0.74 with task-specific model

Slide 54

Slide 54 text

CASE STUDY #1 400mb model size 2k+ words/second 8hr data dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation • beat few-shot LLM baseline of 0.74 with task-specific model • 20× inference time speedup

Slide 55

Slide 55 text

• S&P Global: real-time commodities trading insights by extracting structured attributes explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score

Slide 56

Slide 56 text

• S&P Global: real-time commodities trading insights by extracting structured attributes • high-security environment explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score

Slide 57

Slide 57 text

• S&P Global: real-time commodities trading insights by extracting structured attributes • high-security environment • used LLM during annotation explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score

Slide 58

Slide 58 text

• S&P Global: real-time commodities trading insights by extracting structured attributes • high-security environment • used LLM during annotation • 10× data development speedup with humans and model in the loop explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score

Slide 59

Slide 59 text

• S&P Global: real-time commodities trading insights by extracting structured attributes • high-security environment • used LLM during annotation • 10× data development speedup with humans and model in the loop • 8 market pipelines in production explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score

Slide 61

Slide 61 text

THINK OF IT AS A refactoring PROCESS

Slide 62

Slide 62 text

THINK OF IT AS A refactoring PROCESS break down larger problems

Slide 63

Slide 63 text

THINK OF IT AS A refactoring PROCESS break down larger problems make problem easier

Slide 64

Slide 64 text

THINK OF IT AS A refactoring PROCESS factor out business logic break down larger problems make problem easier

Slide 65

Slide 65 text

THINK OF IT AS A refactoring PROCESS factor out business logic break down larger problems reassess dependencies make problem easier

Slide 66

Slide 66 text

THINK OF IT AS A refactoring PROCESS factor out business logic break down larger problems reassess dependencies choose the best techniques make problem easier

Slide 67

Slide 67 text

MAKE PROBLEM easier

Slide 68

Slide 68 text

MAKE PROBLEM easier less operational complexity means less can go wrong

Slide 69

Slide 69 text

MAKE PROBLEM easier less operational complexity means less can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎

Slide 70

Slide 70 text

MAKE PROBLEM easier less operational complexity means less can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research

Slide 71

Slide 71 text

MAKE PROBLEM easier less operational complexity means less can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge

Slide 72

Slide 72 text

MAKE PROBLEM easier less operational complexity means less can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations

Slide 73

Slide 73 text

MAKE PROBLEM easier less operational complexity means less can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel

Slide 74

Slide 74 text

🛠 application MAKE PROBLEM easier less operational complexity means less can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel

Slide 75

Slide 75 text

🛠 application MAKE PROBLEM easier less operational complexity means less can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel • learn from commons of knowledge

Slide 76

Slide 76 text

🛠 application MAKE PROBLEM easier less operational complexity means less can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel • learn from commons of knowledge • align evaluation to project goals

Slide 77

Slide 77 text

🛠 application MAKE PROBLEM easier less operational complexity means less can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel • learn from commons of knowledge • align evaluation to project goals • do whatever works

Slide 78

Slide 78 text

FACTOR OUT business LOGIC SpacePhone Nebula Released: June 2024 P3204-W2130 kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 79

Slide 79 text

FACTOR OUT business LOGIC result = business_logic(classification(text)) SpacePhone Nebula Released: June 2024 P3204-W2130 kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 80

Slide 80 text

FACTOR OUT business LOGIC result = business_logic(classification(text)) SpacePhone Nebula Released: June 2024 P3204-W2130 kinda meh, really products model phone comparison the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Slide 81

Slide 81 text

FACTOR OUT business LOGIC result = business_logic(classification(text)) latest model catalog reference touchscreen worse than SpacePhone Nebula Released: June 2024 P3204-W2130 kinda meh, really products model phone comparison the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
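The `result = business_logic(classification(text))` split can be sketched as follows. Everything here is an assumption for illustration: `classification` is a keyword stub standing in for a trained model that only predicts general-purpose labels, and `CATALOG` is a hypothetical one-entry catalog built from the product card on the slide.

```python
from typing import Dict, List

# Hypothetical catalog; in practice this would live in a database that can
# change without retraining the model.
CATALOG = {
    "nebula": {"id": "P3204-W2130", "name": "SpacePhone Nebula", "released": (2024, 6)},
}

# Surface forms the stub "model" recognizes (assumed for illustration).
KNOWN_MENTIONS = ("nebula", "iphone 13")


def classification(text: str) -> Dict[str, object]:
    """Stand-in for the model: general predictions only, no catalog knowledge."""
    lowered = text.lower()
    return {
        "products": [m for m in KNOWN_MENTIONS if m in lowered],
        "comparison": "my old" in lowered,
    }


def business_logic(prediction: Dict[str, object]) -> Dict[str, object]:
    """Product-specific rules: link mentions to the catalog and flag the
    latest model. None of this needs to be learned from data."""
    latest = max(entry["released"] for entry in CATALOG.values())
    linked: List[Dict[str, object]] = []
    for mention in prediction["products"]:
        entry = CATALOG.get(mention)
        if entry is None:
            continue  # mentioned product is not in our catalog
        linked.append(
            {
                "mention": mention,
                "catalog_id": entry["id"],
                "latest_model": entry["released"] == latest,
            }
        )
    return {"products": linked, "has_comparison": prediction["comparison"]}


# Shortened excerpt of the review shown on the slide.
text = "the nebula surely looks nice and all, never had to carry a powerbank with my old iphone 13"
result = business_logic(classification(text))
print(result)
```

When a new product is released, only `CATALOG` changes. The model's task ("is this a product mention?", "is this a comparison?") stays the same.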

Slide 82

Slide 82 text

CASE STUDY #3 1 year of support tickets 6× speedup explosion.ai/blog/gitlab-support-insights • GitLab: extract actionable insights from support tickets and usage questions

Slide 83

Slide 83 text

CASE STUDY #3 1 year of support tickets 6× speedup explosion.ai/blog/gitlab-support-insights • GitLab: extract actionable insights from support tickets and usage questions • high-security environment

Slide 84

Slide 84 text

CASE STUDY #3 1 year of support tickets 6× speedup explosion.ai/blog/gitlab-support-insights • GitLab: extract actionable insights from support tickets and usage questions • high-security environment • easy to adapt to new scenarios and business questions

Slide 85

Slide 85 text

CASE STUDY #3 1 year of support tickets 6× speedup explosion.ai/blog/gitlab-support-insights • GitLab: extract actionable insights from support tickets and usage questions • high-security environment • easy to adapt to new scenarios and business questions • separated general-purpose features from product-specific logic

Slide 87

Slide 87 text

REALITY IS NOT AN end-to-end PREDICTION PROBLEM explosion.ai/blog/human-in-the-loop-distillation

Slide 88

Slide 88 text

REALITY IS NOT AN end-to-end PREDICTION PROBLEM Human-in-the-loop distillation is a refactoring process. explosion.ai/blog/human-in-the-loop-distillation

Slide 89

Slide 89 text

REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. explosion.ai/blog/human-in-the-loop-distillation

Slide 90

Slide 90 text

REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. Less operational complexity means less can go wrong. explosion.ai/blog/human-in-the-loop-distillation

Slide 91

Slide 91 text

REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. Less operational complexity means less can go wrong. Expect surprises from the data, and plan for change. explosion.ai/blog/human-in-the-loop-distillation

Slide 92

Slide 92 text

REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. Less operational complexity means less can go wrong. Expect surprises from the data, and plan for change. There’s no need to compromise on development best practices or privacy. explosion.ai/blog/human-in-the-loop-distillation

Slide 93

Slide 93 text

Explosion: explosion.ai · spaCy: spacy.io · Prodigy: prodigy.ai · Twitter: @_inesmontani · Mastodon: @[email protected] · Bluesky: @inesmontani.bsky.social · LinkedIn