Applied NLP with LLMs: Beyond Black-Box Monoliths

Large Language Models (LLMs) have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, I'll show some practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

_________________________________________________

▪️ Case Study #1: https://speakerdeck.com/inesmontani/workshop-half-hour-of-labeling-power-can-we-beat-gpt
▪️ Case Study #2: https://explosion.ai/blog/sp-global-commodities
▪️ Case Study #3: https://explosion.ai/blog/gitlab-support-insights

Ines Montani

October 09, 2024

Resources

A practical guide to human-in-the-loop distillation

https://explosion.ai/blog/human-in-the-loop-distillation

This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.
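As a rough illustration of the idea, the loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not the workflow from the blog post: `llm_label` is a hypothetical stand-in for a real LLM call, `corrections` plays the role of the human reviewer, and the "distilled" model is a simple word-count classifier rather than a trained spaCy component.

```python
from collections import Counter, defaultdict


def llm_label(text: str) -> str:
    """Hypothetical stand-in for an LLM call: a crude keyword rule."""
    lowered = text.lower()
    return "COOKING" if "recipe" in lowered or "bake" in lowered else "OTHER"


def review(text: str, suggestion: str, corrections: dict) -> str:
    """Human-in-the-loop step: keep the suggestion unless a reviewer fixed it."""
    return corrections.get(text, suggestion)


def train(examples):
    """'Distill' the reviewed labels into per-label word counts."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict(model, text: str) -> str:
    """Pick the label whose training vocabulary overlaps most with the text."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    return max(scores, key=scores.get)


texts = [
    "Share your best bread recipe",
    "How long should I bake salmon",
    "My laptop will not boot",
    "Simmer the sauce for ten minutes",
]
# Assumed reviewer input: the stand-in labeller gets the last text wrong.
corrections = {"Simmer the sauce for ten minutes": "COOKING"}

examples = [(t, review(t, llm_label(t), corrections)) for t in texts]
model = train(examples)

# The small model now runs without calling the labeller at all.
print(predict(model, "simmer the onions"))
print(predict(model, "laptop will not charge"))
```

The point of the sketch is the shape of the loop: the large model only proposes labels, a person corrects them, and the component that ships is the small one trained on the corrected data.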

Applied NLP Thinking: How to Translate Problems into Solutions

https://explosion.ai/blog/applied-nlp-thinking

This blog post discusses some of the biggest challenges for applied NLP and translating business problems into machine learning solutions, including the distinction between utility and accuracy.

Half hour of labeling power: Can we beat GPT?

https://speakerdeck.com/inesmontani/workshop-half-hour-of-labeling-power-can-we-beat-gpt

A case study on using LLMs to create data and beat the few-shot baseline with a distilled task-specific model that extracts dishes, ingredients and equipment from r/cooking Reddit posts.
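Comparing a distilled model against a few-shot baseline requires a shared metric. The sketch below assumes the comparison uses an exact-match, entity-level F-score; the label names and the example spans are made up for illustration and are not data from the case study.

```python
def precision_recall_f(gold: set, pred: set):
    """Exact-match scores over (start, end, label) span tuples."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical annotations for one post: token offsets plus a label.
gold = {
    (0, 2, "DISH"),
    (5, 6, "INGREDIENT"),
    (9, 10, "EQUIPMENT"),
    (12, 13, "INGREDIENT"),
}
# Hypothetical predictions: two correct, one with the wrong label, one missed.
pred = {
    (0, 2, "DISH"),
    (5, 6, "INGREDIENT"),
    (9, 10, "INGREDIENT"),
}

p, r, f = precision_recall_f(gold, pred)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Scoring both systems with the same function on the same held-out data is what makes a statement like "beat the baseline" meaningful.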

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

https://explosion.ai/blog/sp-global-commodities

A case study on S&P Global’s efficient information extraction pipelines for real-time commodities trading insights in a high-security environment using human-in-the-loop distillation.

How GitLab uses spaCy to analyze support tickets and empower their community

https://explosion.ai/blog/gitlab-support-insights

A case study on GitLab’s large-scale NLP pipelines for extracting actionable insights from support tickets and usage questions.

Using LLMs for human-in-the-loop distillation in Prodigy

https://prodi.gy/docs/large-language-models

Prodigy comes with preconfigured workflows for using LLMs to speed up and automate annotation and create datasets for distilling large generative models into more accurate, smaller, faster and fully private task-specific components.

Transcript

  1. spaCy: open-source library for industrial-strength natural language processing (spacy.io). 270m+ downloads. ChatGPT can write spaCy code!
  2. Prodigy: modern scriptable annotation tool for machine learning developers (prodigy.ai). 900+ companies, 10k+ users.
  3. Prodigy: modern scriptable annotation tool for machine learning developers (prodigy.ai). 900+ companies, 10k+ users. Alex Smith (Developer), Kim Miller (Analyst), GPT-4 API.
  4. LLM (Falcon, MIXTRAL, GPT-4): good contextual results, easy to use & configure, fast prototyping. ⚠ transparency ⚠ efficiency
  5. LLM (Falcon, MIXTRAL, GPT-4): good contextual results, easy to use & configure, fast prototyping. ⚠ data privacy ⚠ transparency ⚠ efficiency
  6. Close the gap between prototype & production.
  7. Close the gap between prototype & production. How to avoid the prototype plateau?
  8. Close the gap between prototype & production. How to avoid the prototype plateau? 📝 standardize inputs and outputs
  9. Close the gap between prototype & production. How to avoid the prototype plateau? 📝 standardize inputs and outputs 📈 start with evaluation
  10. Close the gap between prototype & production. How to avoid the prototype plateau? 📝 standardize inputs and outputs 📈 start with evaluation 🔮 assess utility, not just accuracy (explosion.ai/blog/applied-nlp-thinking)
  11. Close the gap between prototype & production. How to avoid the prototype plateau? 📝 standardize inputs and outputs 📈 start with evaluation 🔮 assess utility, not just accuracy (explosion.ai/blog/applied-nlp-thinking) 🛠 work on data iteratively
  12. Close the gap between prototype & production. How to avoid the prototype plateau? 📝 standardize inputs and outputs 📈 start with evaluation 🔮 assess utility, not just accuracy (explosion.ai/blog/applied-nlp-thinking) 💬 consider structure and ambiguity of natural language 🛠 work on data iteratively
  13. Prototype: 💬 prompt + 📖 text → LLM (GPT-4 API) → task-specific output. Prompt model & transform output to structured data (github.com/explosion/spacy-llm).
  14. Prototype: 💬 prompt + 📖 text → LLM (GPT-4 API) → task-specific output. Prompt model & transform output to structured data (github.com/explosion/spacy-llm). Production: 📖 text → task-specific output.
  15. Prototype: 💬 prompt + 📖 text → LLM (GPT-4 API) → task-specific output. Prompt model & transform output to structured data (github.com/explosion/spacy-llm). Production: 📖 text → distilled task-specific components → task-specific output.
  16. Prototype: 💬 prompt + 📖 text → LLM (GPT-4 API) → task-specific output. Prompt model & transform output to structured data (github.com/explosion/spacy-llm). Production: 📖 text → distilled task-specific components → task-specific output. ✅ modular
  17. Prototype: 💬 prompt + 📖 text → LLM (GPT-4 API) → task-specific output. Prompt model & transform output to structured data (github.com/explosion/spacy-llm). Production: 📖 text → distilled task-specific components → task-specific output. ✅ small & fast ✅ modular
  18. Prototype: 💬 prompt + 📖 text → LLM (GPT-4 API) → task-specific output. Prompt model & transform output to structured data (github.com/explosion/spacy-llm). Production: 📖 text → distilled task-specific components → task-specific output. ✅ data-private ✅ small & fast ✅ modular
  19. Case Study: PyData NYC (spacy.fyi/pydata-nyc). 400mb model size, 2k+ words/second, 8hr data dev time. • extracting dishes, ingredients and equipment from r/cooking Reddit posts
  20. Case Study: PyData NYC (spacy.fyi/pydata-nyc). 400mb model size, 2k+ words/second, 8hr data dev time. • extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation
  21. Case Study: PyData NYC (spacy.fyi/pydata-nyc). 400mb model size, 2k+ words/second, 8hr data dev time. • extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation • 20× inference time speedup
  22. Case Study: PyData NYC (spacy.fyi/pydata-nyc). 400mb model size, 2k+ words/second, 8hr data dev time. • extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation • 20× inference time speedup • beat few-shot LLM baseline of 0.74 with task-specific model
  23. Case Study: PyData NYC (spacy.fyi/pydata-nyc). 400mb model size, 2k+ words/second, 8hr data dev time. • extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation • 20× inference time speedup • beat few-shot LLM baseline of 0.74 with task-specific model
  24. Case Study: S&P Global (explosion.ai/blog/sp-global-commodities). 99% F-score, 6mb model size, 16k+ words/second. • real-time commodities trading insights by extracting structured attributes
  25. Case Study: S&P Global (explosion.ai/blog/sp-global-commodities). 99% F-score, 6mb model size, 16k+ words/second. • real-time commodities trading insights by extracting structured attributes • high-security environment
  26. Case Study: S&P Global (explosion.ai/blog/sp-global-commodities). 99% F-score, 6mb model size, 16k+ words/second. • real-time commodities trading insights by extracting structured attributes • high-security environment • used LLM during annotation
  27. Case Study: S&P Global (explosion.ai/blog/sp-global-commodities). 99% F-score, 6mb model size, 16k+ words/second. • real-time commodities trading insights by extracting structured attributes • high-security environment • used LLM during annotation • 10× data development speedup with humans and model in the loop
  28. Case Study: S&P Global (explosion.ai/blog/sp-global-commodities). 99% F-score, 6mb model size, 16k+ words/second. • real-time commodities trading insights by extracting structured attributes • high-security environment • used LLM during annotation • 10× data development speedup with humans and model in the loop • 8 market pipelines in production
  29. Case Study: S&P Global (explosion.ai/blog/sp-global-commodities). 99% F-score, 6mb model size, 16k+ words/second. • real-time commodities trading insights by extracting structured attributes • high-security environment • used LLM during annotation • 10× data development speedup with humans and model in the loop • 8 market pipelines in production
  30. break down larger problems; make problem easier; factor out business logic; reassess dependencies; choose the best techniques; iterate on code and data
  31. Case Study: GitLab (explosion.ai/blog/gitlab-support-insights). 1 year of support tickets, 6× speedup. • extract actionable insights from support tickets and usage questions
  32. Case Study: GitLab (explosion.ai/blog/gitlab-support-insights). 1 year of support tickets, 6× speedup. • extract actionable insights from support tickets and usage questions • high-security environment
  33. Case Study: GitLab (explosion.ai/blog/gitlab-support-insights). 1 year of support tickets, 6× speedup. • extract actionable insights from support tickets and usage questions • high-security environment • easy to adapt to new scenarios and business questions
  34. Case Study: GitLab (explosion.ai/blog/gitlab-support-insights). 1 year of support tickets, 6× speedup. • extract actionable insights from support tickets and usage questions • high-security environment • easy to adapt to new scenarios and business questions • separated general-purpose features from product-specific logic
  35. Case Study: GitLab (explosion.ai/blog/gitlab-support-insights). 1 year of support tickets, 6× speedup. • extract actionable insights from support tickets and usage questions • high-security environment • easy to adapt to new scenarios and business questions • separated general-purpose features from product-specific logic
  36. Summary: Applied NLP & Gen AI. Reason and refactor. The key to success lies in your data and may surprise you!
  37. Summary: Applied NLP & Gen AI. Reason and refactor. The key to success lies in your data and may surprise you! Iterate. The right tooling and mindset gets you past the “prototype plateau”.
  38. Summary: Applied NLP & Gen AI. Reason and refactor. The key to success lies in your data and may surprise you! Iterate. The right tooling and mindset gets you past the “prototype plateau”. Stay ambitious. Don’t compromise on best practices, efficiency and privacy.
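
The "standardize inputs and outputs" advice and the prototype-to-production swap on slides 13 to 18 can be sketched as a shared interface. Everything below is illustrative: `fake_llm_reply` is a hypothetical stand-in for a real API call, `LEXICON` stands in for a trained component, and none of these names come from spaCy or spacy-llm.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str


class Extractor(Protocol):
    """Shared contract: plain text in, structured spans out."""

    def __call__(self, text: str) -> List[Span]: ...


def fake_llm_reply(text: str) -> str:
    """Hypothetical stand-in for an LLM response in a 'phrase|LABEL' format."""
    known = ("garlic", "butter")
    return "\n".join(f"{w}|INGREDIENT" for w in known if w in text)


def llm_extractor(text: str) -> List[Span]:
    """Prototype: parse the free-text reply into structured spans."""
    spans = []
    for line in fake_llm_reply(text).splitlines():
        phrase, label = line.split("|")
        start = text.find(phrase)
        if start >= 0:
            spans.append(Span(start, start + len(phrase), label))
    return spans


# Assumed stand-in for a small distilled model: a lookup table.
LEXICON = {"garlic": "INGREDIENT", "butter": "INGREDIENT"}


def distilled_extractor(text: str) -> List[Span]:
    """Production: same output type, no LLM call."""
    spans = []
    for phrase, label in LEXICON.items():
        start = text.find(phrase)
        if start >= 0:
            spans.append(Span(start, start + len(phrase), label))
    return spans


def run(extractor: Extractor, texts: List[str]) -> List[List[Span]]:
    """Downstream code depends only on the interface, not the backend."""
    return [extractor(t) for t in texts]


docs = ["Fry the garlic in butter"]
print(run(llm_extractor, docs))
print(run(distilled_extractor, docs))
```

Because both backends return the same `Span` objects, the caller in `run` does not change when the LLM-backed prototype is replaced by the smaller component.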