

Iain Mackie: Distilling Small Language Models for Enterprise

Malted AI is on a mission to shrink AI models to solve the hardest business problems. Iain will talk through the challenges and lessons of building an AI startup in Scotland and raising £7m from Venture Capital. He will also discuss deploying Small Language Models (SLMs) in an enterprise environment and how knowledge distillation can help build more focused AI solutions at scale.

Turing Fest

July 22, 2024

Transcript

  1. © Malted AI 2024, Confidential. Distilling small language models for enterprise. Turing Fest 2024
  2. Agenda: winning the Amazon Alexa Prize; founding Malted AI; case study: complex medical questions
  3. Winning the Alexa Prize TaskBot Challenge (University of Glasgow). The foundations of Malted AI began with solving real-world AI problems:
     • A 12-month competition against 125 top AI teams from around the globe
     • 100,000s of users interacting with and rating the TaskBot’s performance
     • A production setting requiring low latency, 100% uptime, and data security
     • Distillation applied to solve domain-specific problems
  4. Real-world task assistance: an AI system helps a user accomplish complex, multi-step tasks.
  5. Solving real-world tasks: a flexible system is required to manage factually grounded, dynamic conversations.
  6. Problem #1 – System: neither traditional intent-based systems nor prompted LLMs are sufficient for complex real-world tasks.
     • Intent-based conversations: natural ✖, flexible ✖, controlled experience ✔, low-latency ✔, self-host ✔
     • Large language models (LLMs): natural ✔, flexible ✔, controlled experience ✖, low-latency ✖, self-host ✖
  7. Network of small language models (SLMs): we developed a system using multiple SLMs and external tools to create complex conversational flows (natural ✔, flexible ✔, controlled experience ✔, low-latency ✔, self-host ✔, scalable ✔). Each SLM is focused on a specific task, and SLMs are orders of magnitude smaller and more efficient than general LLMs.
  8. Graphs to model complex tasks: we use SLMs offline to parse and augment 100,000s of rich task records from online sources, replacing traditional linear flows with complex graph-based flows that capture conditional task steps and ingredient knowledge.
  9. SLM to manage conversations: an SLM (the Neural Decision Parser) flexibly manages conversation actions, calling component SLMs or external tools based on the user request and task state.
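The dispatch pattern the slide describes can be sketched in a few lines of Python. This is a toy illustration, not Malted AI's implementation: a hand-written rule function stands in for the trained Neural Decision Parser SLM, and all component names are hypothetical.

```python
# Minimal sketch of a decision-parser dispatch loop. A rule-based
# function stands in for the trained Neural Decision Parser SLM;
# all component and tool names are hypothetical.

def decision_parser(utterance: str, task_state: dict) -> tuple:
    """Map a user request plus task state to a conversational action."""
    text = utterance.lower()
    if "next" in text:                    # user wants the next task step
        return ("advance_step", task_state["step"] + 1)
    if text.startswith(("what", "how")):  # factual question: use a tool
        return ("call_tool", "knowledge_lookup")
    return ("respond", "clarify")         # fall back to a clarifying reply

def dispatch(action: tuple, components: dict) -> str:
    """Route the chosen action to a component SLM or external tool."""
    kind, arg = action
    return components[kind](arg)

components = {
    "advance_step": lambda step: f"Moving to step {step}",
    "call_tool":    lambda tool: f"Calling tool: {tool}",
    "respond":      lambda mode: f"Responding in {mode} mode",
}

state = {"step": 2}
print(dispatch(decision_parser("next step please", state), components))
```

The point of the pattern is that the parser only chooses actions; the work is done by whichever small model or tool the action routes to.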
  10. Problem #2 – Data: we have the flexible system and the input data, but how do we train our SLMs? “In machine learning you need three things: data, data, data.”
  11. Distillation: high-quality synthetic data to train specialised small language models (SLMs). A teacher system of LLMs and domain experts makes data that is high-quality, scalable, and cost-effective, where traditional ML solutions built on manual data cannot be all three; the resulting student SLMs push the technological frontier of task performance versus model efficiency (inverse cost).
  12. Example: Neural Decision Parser. The neural decision parser is trained offline on high-quality synthetic data from the teacher system: LLMs (100bn+ parameters) turn seed annotations and raw task data into synthetic inputs and outputs, and domain experts confirm quality. The resulting student SLM is efficient (0.7bn parameters), with a runtime of 0.5 seconds versus the teacher’s 2-5 minutes.
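The teacher-to-student loop above can be illustrated with a deliberately tiny sketch. Everything here is a stand-in: a keyword labeller plays the 100bn+-parameter teacher, a keyword-vote table plays the student SLM, and the seed prompts and labels are invented for illustration.

```python
# Toy sketch of knowledge distillation via synthetic data. A keyword
# labeller stands in for the expensive teacher LLM, and a keyword-vote
# table stands in for the small student model.

def teacher(prompt: str) -> str:
    """Stand-in for the teacher system: labels a user intent."""
    if "weather" in prompt:
        return "weather_query"
    if "recipe" in prompt or "cook" in prompt:
        return "cooking_task"
    return "other"

def generate_synthetic_data(seed_prompts, augment):
    """Expand seed annotations into (input, teacher output) pairs."""
    return [(v, teacher(v)) for p in seed_prompts for v in augment(p)]

def train_student(dataset):
    """'Train' a tiny student: count which words vote for which label."""
    table = {}
    for text, label in dataset:
        for word in text.split():
            table.setdefault(word, {}).setdefault(label, 0)
            table[word][label] += 1

    def student(prompt: str) -> str:
        votes = {}
        for word in prompt.split():
            for label, n in table.get(word, {}).items():
                votes[label] = votes.get(label, 0) + n
        return max(votes, key=votes.get) if votes else "other"

    return student

seeds = ["weather today", "easy recipe ideas"]
augment = lambda p: [p, p + " please", "tell me " + p]
student = train_student(generate_synthetic_data(seeds, augment))
print(student("weather please"))  # the cheap student mimics the teacher
```

The shape matches the slide: the teacher runs once, offline, to label an augmented dataset; the student learned from those labels is what serves requests at runtime.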
  13. Continuously improving system: we regularly re-run the distillation pipelines, feeding user feedback and usage back to the teacher system as a continuous-improvement signal for the student SLMs that handle input, retrieval, and output.
  14. 2022/23 Alexa Prize performance: the winning formula was a network of specialised SLMs and a scalable way to create high-quality training data.
  15. Malted AI state of the world:
     • State of the world: there is currently too much focus on artificial general intelligence (AGI) and not enough focus on solving concrete problems.
     • Problem: existing Large Language Models (LLMs) are overly general and lack specificity; enterprises are struggling to extract value from this technology.
     • Solution: Malted AI partners with enterprises to distil their own specialised Small Language Models (SLMs) to solve domain-specific problems that others can’t.
     • Vision: we are building a world-leading automatic distillation platform that solves the highest-value enterprise problems.
  16. Building a leading AI company (in Scotland). Malted AI is on a mission to shrink AI models to solve the hardest problems for enterprise.
     • Winners of the Amazon Alexa Prize: 1st place in a multi-million dollar AI competition (125 global teams); applied distillation to solve domain-specific problems; deployed smaller, focused models at scale for 100,000s of users.
     • Raised £7m from Venture Capital: £7,000,000 from top-tier VCs Hoxton Ventures and Creator Fund, with participation from technology founders, domain experts, and corporate CEOs; the funding aims to build Europe’s next AI success story.
     • Successfully deployed AI: paid pilots of business-critical AI in the legal and asset management sectors proved our distillation technology and partnership model can solve the hardest enterprise problems. KPIs achieved: cost reduction (20x smaller), performance (+60% increase), data security (on your VPC).
  17. Malted AI team: assembling leading technical and commercial experts, across commercial, machine learning, and platform roles, to tackle the hardest applied AI problems: Iain Mackie (Co-Founder & CEO), Carlos Gemmell (Co-Founder & CTO), Federico Rossetto (Co-Founder & Chief Eng.), Rob Barker (Commercial Advisor), Laura Bernal (Growth & Talent), Sandy Nairn (Non-Exec Director), Jeff Dalton (Scientific Advisor), Andrew Yates (Senior Researcher), Antreas Antoniou (Machine Learning), Paul Owoicho (Machine Learning), Marina Potsi (Machine Learning), Richard Jones (Senior Engineer), Jim Croft (Senior DevOps), Alessandro Speggiorin (Full-Stack Engineer), Rachel Young (Head of Finance).
  18. Case study: complex medical questions. Automatically answer domain-specific questions that are grounded in factual knowledge, with citations.
  19. Retrieval-augmented generation (RAG): RAG is a method that grounds text generation in documents to improve factual accuracy and reduce hallucinations. The traditional RAG setup feeds the input through a vector DB into an LLM (1.7tr parameters); its drawbacks on medical questions: domain specificity ✖, consistency ✖, efficiency ✖, low latency ✖.
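The traditional retrieve-then-generate loop the slide contrasts against can be sketched minimally. This is an illustration only: word overlap stands in for vector search, a template stands in for the LLM, and the toy document store is invented.

```python
# Minimal sketch of a traditional RAG loop: retrieve documents from a
# store, then condition generation on them. Word overlap stands in for
# a vector DB; a string template stands in for the LLM.

def retrieve(query: str, store: list, k: int = 1) -> list:
    """Stand-in for a vector DB: rank documents by word overlap."""
    q = set(query.lower().split())
    ranked = sorted(store,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate(query: str, context: list) -> str:
    """Stand-in for the LLM: answer grounded in retrieved context."""
    return f"Based on: {' | '.join(context)} -> answer for '{query}'"

store = ["ibuprofen treats inflammation", "insulin regulates blood sugar"]
docs = retrieve("what regulates blood sugar", store)
print(generate("what regulates blood sugar", docs))
```

The drawbacks listed on the slide come from the single giant generator at the end of this loop, which is exactly the component the next slide replaces with a pipeline of small models.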
  20. Distilled RAG: Malted AI distributes the load of complex question answering over multi-step pipelines of distilled small language models (SLMs). Each distilled model is focused on a specific task; SLMs are orders of magnitude smaller and more efficient than general LLMs; and we update the SLMs as new advanced models are released. Benefits of Distilled RAG: domain specificity ✔, consistency ✔, efficiency ✔, low latency ✔.
  21. High-quality RAG training data: a multi-stage teacher system creates 100,000+ synthetic questions, mappings to relevant documents, and cited answers. The teacher ensemble generates questions from documents, judges which documents are relevant (e.g. Doc A ✅, Doc B ✅, Doc C ❌), summarises the relevant documents, and composes a cited answer (stage runtimes: 5 min, 2 min, 30 seconds, 20 seconds); domain experts confirm quality.
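The staged pipeline above can be sketched as plain function composition. Again a toy: keyword overlap stands in for the teacher ensemble's relevance judgments, first-sentence extraction for its summaries, and the two-document corpus is invented.

```python
# Sketch of a multi-stage teacher pipeline that produces cited-answer
# training data: relevance filtering, per-document summaries, then a
# cited answer. Each stage is a cheap stand-in for a teacher ensemble.

def filter_docs(question: str, docs: list) -> list:
    """Relevance stage: keep docs sharing a keyword with the question."""
    q_words = set(question.lower().split())
    return [d for d in docs if q_words & set(d.lower().split())]

def summarise(doc: str) -> str:
    """Summary stage: toy summary, the document's first sentence."""
    return doc.split(".")[0].strip()

def cited_answer(question: str, summaries: list) -> str:
    """Answer stage: compose an answer with numbered citations."""
    cites = " ".join(f"[{i + 1}] {s}." for i, s in enumerate(summaries))
    return f"Q: {question} A: {cites}"

def teacher_pipeline(question: str, corpus: list) -> str:
    relevant = filter_docs(question, corpus)      # relevance judgments
    summaries = [summarise(d) for d in relevant]  # per-document summaries
    return cited_answer(question, summaries)      # final cited answer

corpus = [
    "aspirin reduces fever. it also thins blood.",
    "bandages cover wounds.",
]
print(teacher_pipeline("does aspirin reduce fever", corpus))
```

Running the pipeline offline over many questions yields exactly the kind of (question, relevant documents, cited answer) triples the slide says are used to train the student SLMs.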
  22. Ranking performance: high-quality synthetic data from the teacher system can greatly improve the search performance of a student SLM, with distillation cutting runtime from ~8 minutes to ~1 second.
  23. Conclusion: smaller can be better; scalable, high-quality data is essential; and Scotland is well placed for AI innovation.