Sash Zats
September 21, 2025

Squeezing the most out of Foundational Models on-device LLM

In the summer of 2025, Apple released OS updates shipping with on-device LLMs. While quite limited, you can still get a lot of mileage out of them. This talk walks through several patterns that mitigate many of their shortcomings:
1. Short context window → Recompact chat history to create the illusion of an infinite chat (see the first sketch after this list).
2. Routing → Build your own multimodal model without waiting for Apple to ship one.
3. RAG → Ground the model in your private knowledge.
4. Majority voting → Improve answer quality by letting a judge LLM pick the best of several candidates (see the second sketch after this list).
5. Memory → Preserve user information across sessions, letting the LLM read and write memories.
6. Semantic caching → Save cycles on generating expensive content (see the third sketch after this list).
7. Agentic setup → Use Apple Foundation Models to build a Perplexity-like agent that searches the internet for you.
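
For pattern 1, a minimal sketch assuming the FoundationModels framework's `LanguageModelSession` API; the `RecompactingChat` type, its prompts, and the idea of treating a generation failure as a context-window overflow are illustrative assumptions, not the talk's actual implementation:

```swift
import FoundationModels

/// Sketch of "infinite chat": keep a lightweight history and, when a request
/// fails (assumed here to mean the context window was exceeded), summarize
/// that history and restart the session seeded with the summary.
@MainActor
final class RecompactingChat {
    private var session = LanguageModelSession(instructions: "You are a helpful assistant.")
    private var history: [String] = []

    func send(_ prompt: String) async throws -> String {
        history.append("User: \(prompt)")
        do {
            let answer = try await session.respond(to: prompt).content
            history.append("Assistant: \(answer)")
            return answer
        } catch {
            // Assumption: the failure means we ran out of context.
            // Recompact: summarize the history with a throwaway session…
            let summary = try await LanguageModelSession().respond(
                to: "Summarize this conversation in a few sentences:\n" + history.joined(separator: "\n")
            ).content
            // …then start over with the summary baked into the instructions.
            session = LanguageModelSession(
                instructions: "You are a helpful assistant. Earlier conversation summary: \(summary)"
            )
            history = ["Summary: \(summary)", "User: \(prompt)"]
            let answer = try await session.respond(to: prompt).content
            history.append("Assistant: \(answer)")
            return answer
        }
    }
}
```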
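
For pattern 4, a sketch of majority voting with a judge, again assuming `LanguageModelSession`; the candidate count, judge instructions, and number-parsing fallback are placeholders:

```swift
import Foundation
import FoundationModels

/// Generate several candidate answers, then let a separate judge session pick the best one.
func bestAnswer(to prompt: String, candidates count: Int = 3) async throws -> String {
    var answers: [String] = []
    for _ in 0..<count {
        // Fresh session per sample so candidates don't influence each other.
        answers.append(try await LanguageModelSession().respond(to: prompt).content)
    }

    let numbered = answers.enumerated()
        .map { "Answer \($0.offset + 1): \($0.element)" }
        .joined(separator: "\n\n")

    let judge = LanguageModelSession(
        instructions: "You judge answers. Reply with only the number of the best answer."
    )
    let verdict = try await judge.respond(
        to: "Question: \(prompt)\n\n\(numbered)\n\nWhich answer is best?"
    ).content

    // Fall back to the first candidate if the verdict can't be parsed as a number.
    let index = Int(verdict.trimmingCharacters(in: .whitespacesAndNewlines)).map { $0 - 1 } ?? 0
    return answers.indices.contains(index) ? answers[index] : answers[0]
}
```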
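
For pattern 6, a semantic-caching sketch that pairs `LanguageModelSession` with the NaturalLanguage framework's `NLEmbedding`; the 0.2 cosine-distance threshold and the in-memory cache are arbitrary choices for illustration:

```swift
import FoundationModels
import NaturalLanguage

/// Reuse an earlier response when a new prompt is semantically close to one
/// we have already answered, instead of generating it again.
@MainActor
final class SemanticCache {
    private var cache: [(prompt: String, response: String)] = []
    private let embedding = NLEmbedding.sentenceEmbedding(for: .english)

    func respond(to prompt: String) async throws -> String {
        // Cosine *distance*: smaller means more similar. 0.2 is a placeholder threshold.
        if let embedding,
           let hit = cache.first(where: {
               embedding.distance(between: prompt, and: $0.prompt, distanceType: .cosine) < 0.2
           }) {
            return hit.response
        }
        let response = try await LanguageModelSession().respond(to: prompt).content
        cache.append((prompt: prompt, response: response))
        return response
    }
}
```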

Bonus:
How to set up evals using the Swift testing framework to catch sudden quality degradation when Apple updates Foundation Models (sketched below).
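
A minimal eval sketch, assuming the Swift Testing framework (`import Testing`) rather than XCTest, and that the on-device model is available in the test environment; the prompts and expectations are illustrative:

```swift
import Foundation
import FoundationModels
import Testing

@Suite("Foundation Models evals")
struct ModelEvals {
    // A fixed prompt whose answer must mention "Paris"; a model update that
    // breaks this shows up as a failing test.
    @Test func capitalOfFrance() async throws {
        let answer = try await LanguageModelSession()
            .respond(to: "What is the capital of France? Answer in one word.").content
        #expect(answer.localizedCaseInsensitiveContains("Paris"))
    }

    // Guard against regressions in instruction following (length constraint).
    @Test func staysShortWhenAskedTo() async throws {
        let session = LanguageModelSession(instructions: "Always answer in one sentence.")
        let answer = try await session.respond(to: "Explain what RAG is.").content
        #expect(answer.count < 300)
    }
}
```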

Source code for the companion app: https://github.com/zats/LLMPatterns
