In the summer of 2025, Apple released OS updates shipping with on-device LLMs. While quite limited, you can still get a lot of mileage out of them. This talk walks through several patterns that mitigate many of their shortcomings:
1. Short context window → Recompact chat history to create the illusion of an infinite chat.
2. Routing → Build your own multimodal model without waiting for Apple to ship one.
3. RAG → Ground the model in your private knowledge.
4. Majority voting → Improve answer quality by generating several candidates and picking the best with a judge LLM.
5. Memory → Preserve user information across sessions by letting the LLM read and write memories.
6. Semantic caching → Save cycles by reusing expensive generated content for similar requests.
7. Agentic setup → Use Apple Foundation Models to build a Perplexity-like agent that searches the internet for you.
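As a taste of pattern 1, here is a hedged sketch of recompaction, assuming the FoundationModels `LanguageModelSession` API; the `summarize` helper and the exact summarization prompt are illustrative, not from the talk:

```swift
import FoundationModels

// Sketch of pattern 1 (recompaction). When the context window fills up,
// condense the transcript into a summary and seed a fresh session with it,
// creating the illusion of an unbounded chat.
final class InfiniteChat {
    private var session = LanguageModelSession()

    func ask(_ prompt: String) async throws -> String {
        do {
            return try await session.respond(to: prompt).content
        } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
            // Context is full: recompact and retry once in a new session.
            let summary = try await summarize(session.transcript)
            session = LanguageModelSession(
                instructions: "Summary of the conversation so far: \(summary)"
            )
            return try await session.respond(to: prompt).content
        }
    }

    // Hypothetical helper: condense the old transcript with a short-lived session.
    private func summarize(_ transcript: Transcript) async throws -> String {
        let helper = LanguageModelSession()
        return try await helper.respond(
            to: "Summarize this conversation in a few sentences:\n\(transcript)"
        ).content
    }
}
```

The same catch-and-recover shape works for the other patterns: intercept a failure (or a routing decision) around `respond(to:)` and swap in a different session or prompt.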
Bonus:
How to set up evals using the Swift Testing framework, catching sudden quality regressions when Apple updates Foundation Models.
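A minimal sketch of what such an eval might look like, assuming the Swift Testing framework (`@Test`, `#expect`) and the FoundationModels API; the prompt and pass criteria here are illustrative assumptions, not the talk's actual evals:

```swift
import Testing
import FoundationModels

// Crude eval: fails loudly if an OS update regresses instruction following.
@Test func modelStillFollowsFormattingInstructions() async throws {
    let session = LanguageModelSession(
        instructions: "Answer with a single word."
    )
    let answer = try await session
        .respond(to: "What is the capital of France?")
        .content

    // Check both the format constraint and the factual content.
    #expect(answer.split(separator: " ").count <= 2)
    #expect(answer.localizedCaseInsensitiveContains("Paris"))
}
```

Because the model runs on-device, these tests double as a regression suite you can rerun on every new OS beta.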
Source code for the companion app: https://github.com/zats/LLMPatterns