Turing Fest
July 22, 2024

Iain Mackie: Distilling Small Language Models for Enterprise

Malted AI is on a mission to shrink AI models to solve the hardest business problems. Iain will talk through the challenges and lessons of building an AI startup in Scotland and raising £7m from Venture Capital. He will also discuss deploying Small Language Models (SLMs) in an enterprise environment and how knowledge distillation can help build more focused AI solutions at scale.

Transcript

  1. Distilling small language models for enterprise

    Turing Fest 2024. © Malted AI 2024. Confidential.
  2. Agenda

    Winning Amazon Alexa Prize • Founding Malted AI • Case study: complex medical questions
  3. Winning Alexa Prize TaskBot Challenge

    The foundations of Malted AI began with solving real-world AI problems • 12-month competition against 125 top AI teams from around the globe • 100,000s of users interacting with and rating the TaskBot’s performance • Production setting requiring low latency, 100% uptime, and data security • Applied distillation to solve domain-specific problems

    University of Glasgow
  4. Real-world task assistance

    An AI system supports a user in accomplishing complex multi-step tasks
  5. Solving real-world tasks

    A flexible system is required to manage factually grounded and dynamic conversations
  6. Problem #1 - System

    Neither traditional intent-based systems nor prompting LLMs are sufficient for complex real-world tasks

    Intent-based conversations: Natural ✖, Flexible ✖, Controlled experience ✔, Low-latency ✔, Self host ✔
    Large language models (LLMs): Natural ✔, Flexible ✔, Controlled experience ✖, Low-latency ✖, Self host ✖
  7. Network of small language models (SLMs)

    Developed a system using multiple SLMs and external tools to create complex conversational flows • Each SLM is focused on a specific task • SLMs are orders of magnitude smaller and more efficient than general LLMs

    Natural ✔, Flexible ✔, Controlled experience ✔, Low-latency ✔, Self host ✔, Scalable ✔
  8. Graphs to model complex tasks

    Use SLMs offline to parse and augment 100,000s of rich task data drawn from online sources

    Traditional linear flows compared with complex graph-based flows (conditional task steps, ingredient knowledge)
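
A minimal sketch of how a graph-based flow with a conditional step might differ from a linear list of steps. The `Step` class, the `next_step` and `walk` helpers, and the recipe itself are hypothetical illustrations, not the representation used in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One node in a task graph (hypothetical structure)."""
    text: str
    # Maps a condition label to the id of the next step; "default" is the fallback edge.
    edges: dict = field(default_factory=dict)


# Hypothetical recipe with one conditional branch on ingredient knowledge.
TASK_GRAPH = {
    "start": Step("Gather the ingredients.", {"no_milk": "substitute", "default": "mix"}),
    "substitute": Step("Use oat milk instead of milk.", {"default": "mix"}),
    "mix": Step("Whisk everything into a batter.", {"default": "cook"}),
    "cook": Step("Fry the batter until golden."),  # no outgoing edges: end of task
}


def next_step(graph, current, condition="default"):
    """Follow the edge for `condition`, falling back to the default edge."""
    edges = graph[current].edges
    return edges.get(condition, edges.get("default"))


def walk(graph, start, conditions):
    """Return the step ids visited, given a dict of step id -> observed condition."""
    path, current = [], start
    while current is not None:
        path.append(current)
        current = next_step(graph, current, conditions.get(current, "default"))
    return path


linear_path = walk(TASK_GRAPH, "start", {})
branched_path = walk(TASK_GRAPH, "start", {"start": "no_milk"})
```

With no conditions the walk follows the linear path; reporting `no_milk` at the first step routes through the substitution node before rejoining the main flow.
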
  9. SLM to manage conversations

    An SLM (Neural Decision Parser) flexibly manages the conversation actions by calling component SLMs or external tools based on the user request and task state
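
The routing idea on this slide can be sketched as a dispatcher: one decision function picks an action from the user request and the task state, and the chosen component handles the turn. Everything named here (`decide_action`, `handle_turn`, the three components) is hypothetical, and the decision function is a keyword rule standing in for a trained SLM.

```python
from typing import Callable, Dict


def search_tool(utterance: str, state: dict) -> str:
    """Hypothetical external tool: look up a task for the user."""
    return f"Searching for: {utterance}"


def next_step_component(utterance: str, state: dict) -> str:
    """Hypothetical component: advance the task by one step."""
    state["step"] += 1
    return f"Moving to step {state['step']}"


def qa_component(utterance: str, state: dict) -> str:
    """Hypothetical component SLM: answer a question about the current step."""
    return f"Answering a question about step {state['step']}"


COMPONENTS: Dict[str, Callable[[str, dict], str]] = {
    "search": search_tool,
    "next": next_step_component,
    "qa": qa_component,
}


def decide_action(utterance: str, state: dict) -> str:
    """Stand-in for the decision SLM: a keyword rule, NOT a trained model."""
    if state.get("step") is None:  # no task selected yet
        return "search"
    if "next" in utterance.lower():
        return "next"
    return "qa"


def handle_turn(utterance: str, state: dict) -> str:
    """Route one user turn to the component chosen by the decision function."""
    action = decide_action(utterance, state)
    return COMPONENTS[action](utterance, state)


state = {"step": 1}
reply = handle_turn("next please", state)
```

Keeping the decision separate from the components means each piece can be swapped or retrained on its own, which is the property the slide attributes to a network of focused SLMs.
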
  10. Problem #2 - Data

    We have the flexible system and input data, but how do we train our SLMs? “In machine learning you need three things: data, data, data.”
  11. Distillation

    High-quality synthetic data to train specialised small language models (SLMs)

    [Diagram: Making scalable, high-quality data. Traditional ML solutions (domain experts, manual data, ML model) compared with distillation technology (teacher system, high-quality synthetic data, student SLM) on three criteria: high-quality, scalable, cost-effective.]

    [Chart: SLM performance vs. cost dynamics. Axes: model efficiency (inverse cost) and task performance, with the technological frontier marked.]
  12. Example: Neural Decision Parser

    Neural decision parser trained offline using high-quality synthetic data from the teacher system

    Teacher system: raw task data and seed annotations feed LLM(s) (100bn+ parameters) that produce synthetic input, then LLM(s) (100bn+ parameters) that produce synthetic output; domain experts confirm quality. The result is high-quality synthetic data.
    Student SLM: efficient SLM (0.7bn parameters).
    Runtime: 2-5 minutes (teacher system) vs. 0.5 seconds (student SLM).
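
A rough sketch of the data flow described on this slide: a teacher writes a synthetic input from raw task data and a seed annotation, a teacher labels it with a synthetic output, and a quality check decides what is kept. All three functions are hypothetical placeholders (simple string rules instead of 100bn+ parameter LLMs and human experts), and fine-tuning the student SLM on the resulting dataset is left out.

```python
from dataclasses import dataclass


@dataclass
class Example:
    synthetic_input: str
    synthetic_output: str


def teacher_generate_input(raw_task: str, seed: str) -> str:
    """Placeholder for a large-LLM call that writes a synthetic user request."""
    return f"{seed} {raw_task}?"


def teacher_generate_output(synthetic_input: str) -> str:
    """Placeholder for a large-LLM call that labels the request with an action."""
    return "search" if synthetic_input.lower().startswith("how do i") else "qa"


def expert_approves(example: Example) -> bool:
    """Placeholder for the domain-expert quality check (here: a trivial rule)."""
    return len(example.synthetic_input) > 10


def build_training_set(raw_tasks, seeds):
    """Run the teacher over every (task, seed) pair and keep approved examples."""
    dataset = []
    for task in raw_tasks:
        for seed in seeds:
            text = teacher_generate_input(task, seed)
            example = Example(text, teacher_generate_output(text))
            if expert_approves(example):
                dataset.append(example)
    return dataset


data = build_training_set(
    ["bake bread", "fix a tap"],
    ["How do I", "What is needed to"],
)
```

The slow teacher only runs offline to build `data`; at run time only the small student trained on it would be called, which is where the runtime gap on the slide comes from.
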
  13. Continuously improving system

    Regularly re-running distillation pipelines to improve SLMs based on user feedback and usage

    [Diagram: Input, Retrieval, Student SLM, Student SLM, Output. Each student SLM is paired with a teacher system; user feedback and usage provide the continuous improvement signal.]
  14. 2022/23 Alexa Prize performance

    The winning formula was a network of specialised SLMs and a scalable way to create high-quality training data
  15. Malted AI

    State of the World > There’s currently too much focus on artificial general intelligence (AGI) and not enough focus on solving concrete problems

    Problem > Existing Large Language Models (LLMs) are overly general and lack specificity. Enterprises are struggling to extract value from this technology

    Solution > Malted AI partners with enterprises to distil their own specialised Small Language Models (SLMs) to solve domain-specific problems that others can’t

    Vision > We are building a world-leading automatic distillation platform that solves the highest-value enterprise problems
  16. Building a leading AI company (in Scotland)

    Winners of Amazon Alexa Prize: • 1st place in multi-million dollar AI competition (125 global teams) • Applied distillation to solve domain-specific problems • Deployed smaller, focused models at scale for 100,000s of users

    Raised £7m from Venture Capital: • Raised £7,000,000 from top-tier VCs, Hoxton Ventures and Creator Fund • Participation from technology founders, domain experts, and corporate CEOs • Funding aims to build Europe’s next AI success story

    Successfully deployed AI: • Paid pilots to deploy business-critical AI in the legal and asset management sectors • Proved our distillation technology and partnership model can solve the hardest enterprise problems • Achieved KPIs: cost reduction: 20x smaller; performance: +60% increase; data security: on your VPC

    Malted AI is on a mission to shrink AI models to solve the hardest problems for enterprise
  17. Malted AI team

    Assembling leading technical and commercial experts to tackle the hardest applied AI problems

    Team areas shown on the slide: Commercial, Machine learning, Platform

    Iain Mackie (Co-Founder & CEO) • Rob Barker (Commercial Advisor) • Laura Bernal (Growth & Talent) • Sandy Nairn (Non-Exec Director) • Carlos Gemmell (Co-Founder & CTO) • Jeff Dalton (Scientific Advisor) • Andrew Yates (Senior Researcher) • Antreas Antoniou (Machine Learning) • Federico Rossetto (Co-Founder & Chief Eng.) • Richard Jones (Senior Engineer) • Paul Owoicho (Machine Learning) • Jim Croft (Senior DevOps) • Rachel Young (Head of Finance) • Alessando Speggiorin (Full-Stack Engineer) • Marina Potsi (Machine Learning)
  18. Case Study: Complex medical questions

    Automatically answer domain-specific questions that are grounded in factual knowledge

    [Diagram label: Citations]
  19. Retrieval augmented generation (RAG)

    RAG is a method that grounds text generation in documents to improve factual accuracy and reduce hallucinations

    Traditional RAG setup: Input, Vector DB, LLM (1.7tr parameters), Output
    Drawbacks on medical questions: Domain specificity ✖, Consistency ✖, Efficient ✖, Low latency ✖
  20. Distilled RAG

    Malted AI distributes the load of complex question answering over multiple distilled small language models (SLMs) • Each distilled model is focused on a specific task • SLMs are orders of magnitude smaller and more efficient than general LLMs • We update SLMs as new advanced models are released

    Benefits of Distilled RAG (multi-step SLM pipelines): Domain specificity ✔, Consistency ✔, Efficient ✔, Low latency ✔
  21. High-quality RAG training data

    Multi-stage teacher system that creates 100,000+ synthetic questions, mappings to relevant documents, and cited answers; domain experts confirm quality

    Stage 1: teacher ensemble turns a document into questions
    Stage 2: teacher ensemble maps a question to relevant documents (Doc A ✅, Doc B ✅, Doc C ❌)
    Stage 3: teacher ensemble produces Summary A and Summary B
    Stage 4: teacher ensemble produces a cited answer
    Runtime, as listed on the slide: 5 min, 2 min, 30 seconds, 20 seconds
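
The slide describes a teacher system that produces synthetic questions, mappings to relevant documents, and cited answers. A minimal sketch of that staged flow follows, assuming four stages run in sequence. Every function is a hypothetical placeholder (string rules instead of a teacher ensemble), and the toy corpus is deliberately non-medical so that it makes no domain claims.

```python
def generate_question(doc: str) -> str:
    """Stage 1 placeholder: write a question answerable from the document."""
    return f"What does the source say about {doc.split()[0].lower()}?"


def is_relevant(question: str, doc: str) -> bool:
    """Stage 2 placeholder: judge whether a document helps answer the question."""
    topic = question.rstrip("?").split()[-1]
    return topic in doc.lower()


def summarise(question: str, doc: str) -> str:
    """Stage 3 placeholder: keep only the first sentence of the document."""
    return doc.split(".")[0] + "."


def cited_answer(summaries: dict) -> str:
    """Stage 4 placeholder: join summaries, tagging each with its document id."""
    return " ".join(f"{text} [{doc_id}]" for doc_id, text in summaries.items())


def build_example(seed_id: str, corpus: dict) -> dict:
    """Run all four stages for one seed document and return a training record."""
    question = generate_question(corpus[seed_id])
    relevant = {d: t for d, t in corpus.items() if is_relevant(question, t)}
    summaries = {d: summarise(question, t) for d, t in relevant.items()}
    return {
        "question": question,
        "relevant_docs": sorted(relevant),
        "answer": cited_answer(summaries),
    }


# Toy corpus (hypothetical): two documents on the topic, one off topic.
CORPUS = {
    "A": "Kettles boil water. They are usually electric.",
    "B": "Most kettles switch off automatically. This saves energy.",
    "C": "Toasters brown bread. They use heating elements.",
}
example = build_example("A", CORPUS)
```

Each stage's output could serve as training data for a separate student SLM (question generation, relevance ranking, summarisation, answer writing), which is one way to read the multi-step pipeline on the previous slide.
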
  22. Ranking performance

    High-quality synthetic data (teacher system) can greatly improve the search performance of a student SLM

    Distillation: ~8 minutes vs. ~1 second
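
As a quick check on the figures, the arithmetic below assumes that the four stage runtimes listed for the teacher system (5 min, 2 min, 30 seconds, 20 seconds) run one after another and that "~8 minutes" is their total, with "~1 second" being the student SLM. Neither assumption is stated explicitly on the slides.

```python
# Stage runtimes as listed on the teacher-system slide, in seconds.
teacher_stage_seconds = [5 * 60, 2 * 60, 30, 20]

# Assumption: the stages run sequentially, so the total is their sum.
teacher_total = sum(teacher_stage_seconds)

# Assumption: "~1 second" is the student SLM's runtime for the same query.
student_seconds = 1

speedup = teacher_total / student_seconds
print(f"Teacher: {teacher_total} s (~{teacher_total / 60:.1f} min), speed-up: ~{speedup:.0f}x")
# prints "Teacher: 470 s (~7.8 min), speed-up: ~470x"
```

Under those assumptions the student is a few hundred times faster than the teacher pipeline it was distilled from.
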
  23. Conclusion

    Smaller can be better • Scalable high-quality data is essential • Scotland is well placed for AI innovation