Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation

As the field of natural language processing advances and new ideas develop, we’re seeing more and more ways to use compute efficiently, producing AI systems that are cheaper to run and easier to control. Large Language Models (LLMs) have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, I'll show some practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

I'll share some real-world case studies and approaches for using large generative models at development time instead of runtime, curating their structured predictions with an efficient human-in-the-loop workflow, and distilling task-specific components as small as 6 MB that run cheaply, privately and reliably, and that you can compose into larger NLP systems.

If you’re trying to build a system that does a particular thing, you don’t need to transform your request into arbitrary language and call into the largest model that understands arbitrary language the best. The people developing those models are telling that story, but the rest of us aren’t obliged to believe them.
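
The workflow described in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not code from the talk: `llm_suggest` is a hypothetical stand-in for a prompted large model, `human_review` stands in for an annotation step, and the word-count classifier is a toy placeholder for a real task-specific model.

```python
from collections import Counter, defaultdict


def llm_suggest(text):
    """Hypothetical stand-in for a large model used at development time."""
    return "negative" if "dark" in text or "meh" in text else "positive"


def human_review(text, suggestion, corrections):
    """Accept the model's suggestion unless a reviewer recorded a correction."""
    return corrections.get(text, suggestion)


def train(examples):
    """Toy 'distilled' model: per-label word counts from the curated data."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict(model, text):
    """Pick the label whose training vocabulary overlaps most with the text."""
    tokens = text.lower().split()
    return max(sorted(model), key=lambda label: sum(model[label][t] for t in tokens))


texts = [
    "the camera quality is amazing",
    "battery life is incredible",
    "night mode does not work and pics are too dark",
    "i need a powerbank all the time",
]
# The reviewer fixes the one example the large model got wrong.
corrections = {"i need a powerbank all the time": "negative"}
dataset = [(t, human_review(t, llm_suggest(t), corrections)) for t in texts]

# Only this small model is needed at runtime; the large model is not called again.
model = train(dataset)
print(predict(model, "my pics are too dark"))
```

The point of the sketch is the division of labor: the large model only proposes labels while the data is being created, a person curates them, and the component that runs in production is small and trained on the curated result.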

_________________________________________________

▪️ Blog post: https://explosion.ai/blog/human-in-the-loop-distillation
▪️ Case Study #1: https://speakerdeck.com/inesmontani/workshop-half-hour-of-labeling-power-can-we-beat-gpt
▪️ Case Study #2: https://explosion.ai/blog/sp-global-commodities
▪️ Case Study #3: https://explosion.ai/blog/gitlab-support-insights

Ines Montani

September 26, 2024

Resources

A practical guide to human-in-the-loop distillation

https://explosion.ai/blog/human-in-the-loop-distillation

Blog post version of this talk, presenting practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

https://explosion.ai/blog/sp-global-commodities

A case study on S&P Global’s efficient information extraction pipelines for real-time commodities trading insights in a high-security environment using human-in-the-loop distillation.

Half hour of labeling power: Can we beat GPT?

https://speakerdeck.com/inesmontani/workshop-half-hour-of-labeling-power-can-we-beat-gpt

A case study using LLMs to create data and beating the few-shot baseline with a distilled task-specific model for extracting dishes, ingredients and equipment from r/cooking Reddit posts.

Applied NLP Thinking: How to Translate Problems into Solutions

https://explosion.ai/blog/applied-nlp-thinking

This blog post discusses some of the biggest challenges for applied NLP and translating business problems into machine learning solutions, including the distinction between utility and accuracy.

How GitLab uses spaCy to analyze support tickets and empower their community

https://explosion.ai/blog/gitlab-support-insights

A case study on GitLab’s large-scale NLP pipelines for extracting actionable insights from support tickets and usage questions.

Using LLMs for structured data in spaCy

https://spacy.io/usage/large-language-models

The spacy-llm package integrates LLMs into spaCy pipelines, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP tasks.

Using LLMs for human-in-the-loop distillation in Prodigy

https://prodi.gy/docs/large-language-models

Prodigy comes with preconfigured workflows for using LLMs to speed up and automate annotation and create datasets for distilling large generative models into more accurate, smaller, faster and fully private task-specific components.
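
Both tools listed above rely on the same basic step: a prompt template goes in, free text comes out, and a parser turns it into structured data that can be checked. The sketch below shows that step in plain Python. It does not use the spacy-llm or Prodigy APIs; the function names, the `LABEL: span1, span2` response format and the example labels are all assumptions made for illustration.

```python
def build_prompt(text, labels):
    """Fill a simple prompt template (hypothetical format)."""
    return (
        "Extract entities from the text. "
        "Answer with one line per label in the form LABEL: span1, span2.\n"
        f"Labels: {', '.join(labels)}\n"
        f"Text: {text}"
    )


def parse_response(raw, text, labels):
    """Convert raw model output into (start, end, label) character spans.

    Lines with unknown labels, chatter without a colon, and phrases that do
    not occur in the text are dropped instead of being trusted.
    """
    spans = []
    lowered = text.lower()
    for line in raw.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip().upper()
        if not sep or label not in labels:
            continue
        for phrase in rest.split(","):
            phrase = phrase.strip()
            start = lowered.find(phrase.lower()) if phrase else -1
            if start >= 0:
                spans.append((start, start + len(phrase), label))
    return sorted(spans)


labels = ["PRODUCT", "ACCESSORY"]
text = "never had to carry a powerbank with my old iphone 13"
prompt = build_prompt(text, labels)

# A made-up raw response, including a span that is not in the text,
# an unknown label and a line of chatter.
raw = "PRODUCT: iPhone 13, SpacePhone Nebula\nMOOD: grumpy\nThanks for asking!"
spans = parse_response(raw, text, labels)
print([(text[start:end], label) for start, end, label in spans])
```

Because the parser only keeps spans it can locate in the input, the structured output can be shown to an annotator for correction or compared against a gold-standard evaluation set.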

Transcript

  1. Modern scriptable annotation tool for machine learning developers PRODIGY 900+

    companies prodigy.ai Alex Smith Developer Kim Miller Analyst GPT-4 API 10k+ users
  2. BACK TO OUR ROOTS explosion.ai/blog/back-to-our-roots We’re back to running Explosion

    as a smaller, independent-minded and self-sufficient company. Ines Montani Founder Matthew Honnibal Founder
  3. BACK TO OUR ROOTS explosion.ai/blog/back-to-our-roots We’re back to running Explosion

    as a smaller, independent-minded and self-sufficient company. Consulting open source developer tools Ines Montani Founder Matthew Honnibal Founder
  4. Exceeds expectations kinda meh, really Just got the SpacePhone Nebula

    and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  5. Exceeds expectations kinda meh, really find mentions of products Just

    got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  6. Exceeds expectations kinda meh, really find mentions of products link

    mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  7. Exceeds expectations kinda meh, really extract sentiment for different

    attributes Battery Camera Performance Design camera battery design battery camera find mentions of products link mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  8. add results to database Exceeds expectations kinda meh, really extract

    sentiment for different attributes Battery Camera Performance Design camera battery design battery camera find mentions of products link mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  9. add results to database Exceeds expectations kinda meh, really extract

    sentiment for different attributes Battery Camera Performance Design camera battery design battery camera find mentions of products link mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  10. add results to database Exceeds expectations kinda meh, really extract

    sentiment for different attributes Battery Camera Performance Design camera battery design battery camera find mentions of products link mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  11. distilled task-specific model transfer learning ELECTRA T5 in-context learning Falcon

    MIXTRAL GPT-4 BERT-base is still very competitive! large generative model
  12. 📖 text 🔮 model raw output ⚙ parser task output

    💬 template prompt WORKflow in-context learning explosion.ai/blog/human-in-the-loop-distillation
  13. 📖 text 🔮 model raw output ⚙ parser task output

    💬 template prompt WORKflow in-context learning ⚗ distillation 🎯 annotation task dataset task-specific model transfer learning explosion.ai/blog/human-in-the-loop-distillation
  14. CLOSE THE GAP BETWEEN prototype AND production standardize inputs and

    outputs start with evaluation assess utility, not just accuracy explosion.ai/blog/applied-nlp-thinking
  15. CLOSE THE GAP BETWEEN prototype AND production standardize inputs and

    outputs start with evaluation work on data iteratively assess utility, not just accuracy explosion.ai/blog/applied-nlp-thinking
  16. CLOSE THE GAP BETWEEN prototype AND production standardize inputs and

    outputs start with evaluation work on data iteratively assess utility, not just accuracy explosion.ai/blog/applied-nlp-thinking consider structure and ambiguity of natural language
  17. processing pipeline prototype processing pipeline in production structured machine-facing Doc

    object github.com/explosion/spacy-llm prompt model & transform output to structured data structured machine-facing Doc object
  18. kinda meh, really the nebula surely looks nice and all

    but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  19. Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance null

    Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  20. prodigy.ai Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance

    null Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  21. prodigy.ai Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance

    null Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark! selection by generative model GPT-4 API
  22. prodigy.ai Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance

    null Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark! selection by generative model GPT-4 API can be faster, not slower!
  23. CASE STUDY #1 400mb model size 2k+ words/second 8hr data

    dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts
  24. CASE STUDY #1 400mb model size 2k+ words/second 8hr data

    dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation
  25. CASE STUDY #1 400mb model size 2k+ words/second 8hr data

    dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation • beat few-shot LLM baseline of 0.74 with task-specific model
  26. CASE STUDY #1 400mb model size 2k+ words/second 8hr data

    dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation • beat few-shot LLM baseline of 0.74 with task-specific model • 20× inference time speedup
  27. • S&P Global: real-time commodities trading insights by extracting structured

    attributes explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  28. • S&P Global: real-time commodities trading insights by extracting structured

    attributes • high-security environment explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  29. • S&P Global: real-time commodities trading insights by extracting structured

    attributes • high-security environment • used LLM during annotation explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  30. • S&P Global: real-time commodities trading insights by extracting structured

    attributes • high-security environment • used LLM during annotation • 10× data development speedup with humans and model in the loop explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  31. • S&P Global: real-time commodities trading insights by extracting structured

    attributes • high-security environment • used LLM during annotation • 10× data development speedup with humans and model in the loop • 8 market pipelines in production explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  32. • S&P Global: real-time commodities trading insights by extracting structured

    attributes • high-security environment • used LLM during annotation • 10× data development speedup with humans and model in the loop • 8 market pipelines in production explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  33. THINK OF IT AS A refactoring PROCESS factor out business

    logic break down larger problems make problem easier
  34. THINK OF IT AS A refactoring PROCESS factor out business

    logic break down larger problems reassess dependencies make problem easier
  35. THINK OF IT AS A refactoring PROCESS factor out business

    logic break down larger problems reassess dependencies choose the best techniques make problem easier
  36. MAKE PROBLEM easier less operational complexity means less can go

    wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎
  37. MAKE PROBLEM easier less operational complexity means less can go

    wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research
  38. MAKE PROBLEM easier less operational complexity means less can go

    wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge
  39. MAKE PROBLEM easier less operational complexity means less can go

    wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations
  40. MAKE PROBLEM easier less operational complexity means less can go

    wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel
  41. 🛠 application MAKE PROBLEM easier less operational complexity means less

    can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel
  42. 🛠 application MAKE PROBLEM easier less operational complexity means less

    can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel • learn from commons of knowledge
  43. 🛠 application MAKE PROBLEM easier less operational complexity means less

    can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel • learn from commons of knowledge • align evaluation to project goals
  44. 🛠 application MAKE PROBLEM easier less operational complexity means less

    can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel • learn from commons of knowledge • align evaluation to project goals • do whatever works
  45. FACTOR OUT business LOGIC SpacePhone Nebula Released: June 2024 P3204-W2130

    kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  46. FACTOR OUT business LOGIC result = business_logic(classification(text)) SpacePhone Nebula Released:

    June 2024 P3204-W2130 kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  47. FACTOR OUT business LOGIC result = business_logic(classification(text)) SpacePhone Nebula Released:

    June 2024 P3204-W2130 kinda meh, really products model phone comparison the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  48. FACTOR OUT business LOGIC result = business_logic(classification(text)) latest model catalog

    reference touchscreen worse than SpacePhone Nebula Released: June 2024 P3204-W2130 kinda meh, really products model phone comparison the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  49. CASE STUDY #3 1 year of support tickets 6× speedup

    explosion.ai/blog/gitlab-support-insights • GitLab: extract actionable insights from support tickets and usage questions
  50. CASE STUDY #3 1 year of support tickets 6× speedup

    explosion.ai/blog/gitlab-support-insights • GitLab: extract actionable insights from support tickets and usage questions • high-security environment
  51. CASE STUDY #3 1 year of support tickets 6× speedup

    explosion.ai/blog/gitlab-support-insights • GitLab: extract actionable insights from support tickets and usage questions • high-security environment • easy to adapt to new scenarios and business questions
  52. CASE STUDY #3 1 year of support tickets 6× speedup

    explosion.ai/blog/gitlab-support-insights • GitLab: extract actionable insights from support tickets and usage questions • high-security environment • easy to adapt to new scenarios and business questions • separated general-purpose features from product-specific logic
  53. CASE STUDY #3 1 year of support tickets 6× speedup

    explosion.ai/blog/gitlab-support-insights • GitLab: extract actionable insights from support tickets and usage questions • high-security environment • easy to adapt to new scenarios and business questions • separated general-purpose features from product-specific logic
  54. REALITY IS NOT AN end-to-end PREDICTION PROBLEM Human-in-the-loop distillation

    is a refactoring process. explosion.ai/blog/human-in-the-loop-distillation
  55. REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the

    right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. explosion.ai/blog/human-in-the-loop-distillation
  56. REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the

    right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. Less operational complexity means less can go wrong. explosion.ai/blog/human-in-the-loop-distillation
  57. REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the

    right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. Less operational complexity means less can go wrong. Expect surprises from the data, and plan for change. explosion.ai/blog/human-in-the-loop-distillation
  58. REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the

    right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. Less operational complexity means less can go wrong. Expect surprises from the data, and plan for change. There’s no need to compromise on development best practices or privacy. explosion.ai/blog/human-in-the-loop-distillation
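
The expression `result = business_logic(classification(text))` from slides 46 to 48 can be made concrete with a small sketch. Only the expression itself and the catalog entry (SpacePhone Nebula, June 2024, P3204-W2130) come from the slides. The function bodies, the dictionary layout and the keyword matching are assumptions for illustration, and the review text is abridged. In a real system, `classification` would be a trained component rather than a keyword lookup.

```python
# Product-specific knowledge lives in data, not in the model.
CATALOG = {
    "nebula": {
        "name": "SpacePhone Nebula",
        "id": "P3204-W2130",
        "released": "June 2024",
    }
}


def classification(text):
    """General-purpose step: which products are mentioned, and is it a comparison?

    Stand-in for a trained model; it knows nothing about the catalog.
    """
    lowered = text.lower()
    products = [m for m in ("nebula", "iphone 13") if m in lowered]
    return {"products": products, "comparison": len(products) > 1}


def business_logic(prediction):
    """Product-specific step: resolve mentions against the current catalog."""
    own = [CATALOG[m]["id"] for m in prediction["products"] if m in CATALOG]
    other = [m for m in prediction["products"] if m not in CATALOG]
    return {
        "catalog_ids": own,
        "compared_with": other if prediction["comparison"] else [],
    }


text = (
    "the nebula surely looks nice and all but never had to carry "
    "a powerbank with my old iphone 13"
)
result = business_logic(classification(text))
print(result)
```

With this split, adding a product to the catalog or changing what counts as a relevant comparison only touches `CATALOG` or `business_logic`; the classification component does not need to be retrained.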