The Present & Future of AI in Mobile Software

LEEDS MOBILE · MAY '26 · COVER · 01/20

LEEDS MOBILE MEETUP · MAY 2026
The Present and Future of AI in Mobile Software
Six months of change, the new builder's loop, and where this is all heading.
$ ./start

AGENDA · 02/20

/AGENDA
Entering Plan Mode.

~/meetup · cat agenda.md
mobile-dev@meetup $ cat agenda.md
01 The last six months: what just changed
02 How teams are actually working: two variants
03 Mobile-specific bottlenecks: at both edges
04 Where it's going: models, silicon, on-device
$ ./start

§ 01 · THE LAST SIX MONTHS · 03/20

CHAPTER ONE
01 The last six months.
What changed since Nov '25, and how fast it actually moved.

§ 01 · THE LAST SIX MONTHS · 04/20

WHAT CHANGED
Coding agents became good enough.
The most opinionated voices in software went from skeptical to agent-first in months.

Andrej Karpathy (@karpathy) · Jan '26 · X
Went from 80% manual + autocomplete in November to 80% agent coding in a few weeks. Biggest workflow change in two decades of programming.

David Heinemeier Hansson (@dhh) · Jan '26 · X
AI agents really came alive for me. The most exciting thing we've made computers do since we connected them to the internet.

Linus Torvalds (@torvalds) · May '26 · LKML
The patch volume is wild. AI-assisted code is the new normal, even for the kernel.

§ 01 · THE LAST SIX MONTHS · 05/20

THE LEAP
10×
Capability per dollar, in one year.
3× from algorithmic improvement · 2× from hardware · the rest is price competition.
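The slide attributes a 10× gain to three sources but only puts numbers on two of them. If the factors are taken to compound multiplicatively (an assumption; the slide does not say how they combine), the share left for price competition falls out by division. A minimal sketch, with `residual_gain` as a hypothetical helper name:

```python
# Assumption: total gain = algorithmic * hardware * pricing (factors multiply).
TOTAL_GAIN = 10.0        # capability per dollar over one year (from the slide)
ALGORITHMIC_GAIN = 3.0   # from the slide
HARDWARE_GAIN = 2.0      # from the slide


def residual_gain(total: float, *known_factors: float) -> float:
    """Return the factor left over after dividing out the known factors."""
    remaining = total
    for factor in known_factors:
        remaining /= factor
    return remaining


pricing_gain = residual_gain(TOTAL_GAIN, ALGORITHMIC_GAIN, HARDWARE_GAIN)
print(f"price competition: about {pricing_gain:.2f}x")
```

Under that assumption, price competition contributes roughly 1.67×, the smallest of the three factors.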

§ 02 · HOW THEY WORK NOW · 06/20

CHAPTER TWO
02 How do mobile teams work now?
Early-adopter mobile teams aren't using AI like autocomplete anymore.

§ 02 · HOW THEY WORK NOW · 07/20

/POLL · RAISE A HAND
How are you using AI to build apps?
// pick the one closest to your daily flow

A  I'm not.
B  Glorified auto-complete.
     const add = (a, b) => {
       return a + b;    ⇥ TAB TO ACCEPT
     }
C  In-IDE agent.
D  Terminal agent.
     $ claude
     → reading 12 files…
     → drafting diff (+182 / −34)
     $

§ 02 · HOW THEY WORK NOW · 08/20

TWO WAYS TO ACTUALLY USE THEM
Same tools. Codebase picks the loop.
Risk tolerance and existing constraints split early-adopter teams into two camps.

01 · CONSERVATIVE
Human-in-the-loop. Agent drafts. You review every diff. Tests gate everything.
FIT   Legacy code, regulated industries, mature products.
LOOP  spec → agent → review → tests → merge
BET   Humans are the source of truth; agents are a force multiplier, not an authority.

02 · AGGRESSIVE
Agent-on-agent. Agents review agents. Ship to canary. Fix forward.
FIT   Greenfield, internal tools, prototype velocity.
LOOP  spec → agent → agent → canary → measure → patch
BET   Next year's model will be good enough to fix our tech debt.

§ 02 · HOW THEY WORK NOW · 09/20

VARIANT 01 · IN ACTION
The conservative loop, on a real branch.

claude-code · feature/onboarding-paywall
mobile-dev@meetup $ add the new paywall variant to onboarding, hook up the experiment
→ reading OnboardingFlow.swift, ExperimentClient.kt, 9 more files
→ drafting changes across 4 files (+182 / −34)
→ running xcodebuild test… ✓ 84 passed
→ running ./gradlew testDebugUnitTest… ✓ 312 passed
→ diff ready. opening for review.
note · experiment client needed a default case, so one was added. flag this if you'd rather throw.
$ /review

§ 02 · HOW THEY WORK NOW · 10/20

/RESULTS · PRODUCTION
Results at Olio.
// five months in · one 9-engineer team · agent-led day-to-day

>2×    throughput per engineer
−30%   AWS spend · infra refactors + perf wins
100%   frontend engineers now full-stack (React Native devs → Rails PRs)

✓ Lowest API response times in company history.
✓ Significant reduction in Sentry errors across the stack.
✓ Major refactors taken on with confidence, not deferred.
✓ Scalable end-to-end test infrastructure built; coverage up across the apps.

§ 03 · MOBILE BOTTLENECKS · 11/20

CHAPTER THREE
03 Mobile bottlenecks.
Native app tooling for agents still hasn't caught up.

§ 03 · MOBILE BOTTLENECKS · 12/20

THE BOTTLENECK MOVED
Building got faster, scoping and validating… not yet.
Just because you can build it, doesn't mean you should. Or that it works everywhere!

SCOPE → BUILD → REVIEW

SCOPE · What & how.
· PM skills don't provide context and history.
· Agents aren't doing user research for us.
· Design can be accelerated, but UX insight is lacking.
· System design trade-offs need judgement.

BUILD · Diffs.
· Agents write the code.
· Fixes in minutes, not hours. Features in hours, not days.
· Developers optimise the process by observing what went wrong.

REVIEW · Did it work?
· Right thing built?
· Anything regress?
· Will it ship cleanly on every device?
· Agents testing apps are still clumsy and slow.

§ 03 · MOBILE BOTTLENECKS · 13/20

$ STATUS --MOBILE
Universal challenges, harder on mobile.

mobile-ai · edges
[ scope  ]  platform conventions   ⟶ tribal knowledge, sparse in training data
[ scope  ]  cross-platform parity  ⟶ double the design spec, double the API surface
[ scope  ]  "which native API?"    ⟶ multiple valid frameworks per task
[ review ]  ui readability         ⟶ screenshots or accessibility trees < DOM
[ review ]  real-device UI tests   ⟶ flaky, retries eat minutes
[ review ]  store cycles           ⟶ patches take hours to days, fix forward is high risk
→ six chokepoints · three at scope, three at review · the middle is fine.
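To make "flaky, retries eat minutes" concrete, the expected cost of a retry policy can be estimated from a flake rate and a run time. This is only a sketch: it assumes each run flakes independently with the same probability, and the numbers below (6-minute run, 20% flake rate, 3 attempts) are hypothetical, not measurements from the talk.

```python
def expected_attempts(flake_rate: float, max_attempts: int) -> float:
    """Expected number of runs when retrying until a pass or the attempt cap.

    Assumes every run fails spuriously with independent probability flake_rate.
    """
    if not 0.0 <= flake_rate < 1.0:
        raise ValueError("flake_rate must be in [0, 1)")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Attempt k+1 only happens if the previous k runs all flaked.
    return sum(flake_rate ** k for k in range(max_attempts))


# Hypothetical figures for illustration only.
RUN_MINUTES = 6.0
FLAKE_RATE = 0.2
MAX_ATTEMPTS = 3

attempts = expected_attempts(FLAKE_RATE, MAX_ATTEMPTS)
print(f"expected attempts: {attempts:.2f}")
print(f"expected minutes:  {attempts * RUN_MINUTES:.2f}")
```

The overhead applies to every agent iteration that waits on a device run, which is why it matters more in an agent loop than in a nightly job.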

§ 04 · WHERE IT'S GOING · 14/20

CHAPTER FOUR
04 Where it's going.
Models, weights, silicon. Trends heading towards your device?

§ 04 · WHERE IT'S GOING · 15/20

The Pareto frontier is flattening.
[Chart: Arena Elo (1200 to 1500) against USD per million tokens ($0.10 to $100, cheaper to pricier). Legend: Jan 2026, May 2026. Models plotted: Llama 3 8B, Gemma 3 4B, Gemma 3 12B, Gemma 3 27B, Gemini 3 Flash, Gemini 3 Pro.]

FRONTIER · JAN 2026
Four months ago: Gemma 3 family at the cheap end; Gemini 3 had just landed (Nov/Dec '25), pushing the ceiling to 1486 Elo.

§ 04 · WHERE IT'S GOING · 16/20

The Pareto frontier is flattening.
[Chart: Arena Elo (1200 to 1500) against USD per million tokens ($0.10 to $100, cheaper to pricier). Legend: Jan 2026, May 2026 (today). Models plotted: Llama 3 8B, DeepSeek V4 Flash, Gemma 4 31B, DeepSeek V4 Pro, Gemini 3 Flash, Gemini 3.5 Flash, Claude Opus 4.7.]

FRONTIER · TODAY
Four months later: DeepSeek V4 fills the $0.20–$1 gap; Gemma 4 31B and Opus 4.7 extend the rest. $0.20/M today buys what $1/M did in January.
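For readers unfamiliar with the term, a model sits on the Pareto frontier of this chart if no other model is both cheaper and higher-rated. A minimal sketch of that selection follows; the `Model` type and `pareto_frontier` function are illustrative names, and the data points are placeholders, not values read from the slide's chart.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    usd_per_mtok: float
    elo: float


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep models that no other model beats on both price and Elo."""
    frontier: list[Model] = []
    best_elo = float("-inf")
    # Walk from cheapest to priciest; on a price tie the stronger model goes first.
    for model in sorted(models, key=lambda m: (m.usd_per_mtok, -m.elo)):
        if model.elo > best_elo:  # stronger than everything cheaper
            frontier.append(model)
            best_elo = model.elo
    return frontier


# Placeholder points, NOT taken from the chart.
candidates = [
    Model("model-a", 0.10, 1250),
    Model("model-b", 0.50, 1240),   # pricier and weaker than model-a
    Model("model-c", 1.00, 1400),
    Model("model-d", 10.00, 1480),
    Model("model-e", 20.00, 1450),  # pricier and weaker than model-d
]

frontier_names = [m.name for m in pareto_frontier(candidates)]
print(frontier_names)
```

A "flattening" frontier, in these terms, means the Elo gained per extra dollar shrinks as cheaper models close in on the most expensive ones.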

§ 04 · WHERE IT'S GOING · 17/20

OPEN VS PROPRIETARY · INTELLIGENCE INDEX
Open weights rapidly catching up to the frontier.
// artificialanalysis.ai · frontier models · may 2026

GPT-5.5            60
Claude Opus 4.7    57
Gemini 3.1 Pro     57
Kimi K2.6          54
MiMo V2.5 Pro      54
Claude Opus 4.6    53
DeepSeek V4 Pro    52
GLM-5.1            51
MiniMax M2.7       50

legend: proprietary · open-weights
‹ intelligence index, 0 → 60 ›

§ 04 · WHERE IT'S GOING · 18/20

CLOUD SILICON · GEN BY GEN
Cost per token follows the silicon down.
// bar = FP8 PFLOPS per chip · $/Mtok indicative · ironwood anchored to Google's published $0.02

'24 H1   TPU Trillium · v6     GOOGLE      $0.050 /Mtok
'24 Q4   AWS Trainium 2        AWS         $0.080 /Mtok
'25 Q2   TPU Ironwood · v7     GOOGLE      $0.020 /Mtok
'26 Q1   Microsoft Maia 200    MICROSOFT   $0.015 /Mtok
'26 H2   AWS Trainium 3        AWS         $0.040 /Mtok
'26 Q4   TPU Zebrafish · v8i   GOOGLE      $0.012 /Mtok

legend: google · tpu, aws · trainium, microsoft · maia
‹ cost: vendor claims + estimates ›
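The per-generation drop can be read off the slide's own $/Mtok figures. The sketch below only divides consecutive entries for the two vendors that list more than one chip; the figures are the slide's indicative numbers (vendor claims plus estimates), and `generation_drops` is a hypothetical helper name.

```python
# $/Mtok as listed on the slide, oldest generation first.
COST_PER_MTOK: dict[str, list[tuple[str, float]]] = {
    "google": [("Trillium v6", 0.050), ("Ironwood v7", 0.020), ("Zebrafish v8i", 0.012)],
    "aws": [("Trainium 2", 0.080), ("Trainium 3", 0.040)],
}


def generation_drops(chips: list[tuple[str, float]]) -> list[tuple[str, str, float]]:
    """Return (older, newer, older_cost / newer_cost) for each consecutive pair."""
    return [
        (old_name, new_name, old_cost / new_cost)
        for (old_name, old_cost), (new_name, new_cost) in zip(chips, chips[1:])
    ]


for vendor, chips in COST_PER_MTOK.items():
    for older, newer, ratio in generation_drops(chips):
        print(f"{vendor}: {older} -> {newer}: {ratio:.2f}x cheaper per token")
```

Microsoft's Maia 200 is left out because the slide lists only one generation for it, so there is no pair to compare.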

§ 04 · WHERE IT'S GOING · 19/20

ON-DEVICE · OPEN QUESTIONS
How much really goes on-device?
More questions than answers. The pros and cons depend on what your app actually does.

? Will every app ship its own model?
  No. The app-size, battery, and update headaches only make sense for a few categories.
? Will the built-in models get good?
  Yes, but will they work well enough for your use case?
? Will we need more RAM than devices have today?
  Likely, for anything close to today's frontier performance. What does that mean for the long tail of low-end devices?
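The RAM question can be sized with back-of-envelope arithmetic: weight memory is roughly parameter count times bytes per weight. The sketch below assumes the "4B", "12B" and "31B" suffixes in the model names on the earlier slides are parameter counts in billions, and it counts weights only, ignoring the KV cache, activations, and runtime overhead. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: name suffixes are parameter counts in billions.
for label, params in [("4B", 4), ("12B", 12), ("31B", 31)]:
    sizes = ", ".join(
        f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB" for bits in (16, 8, 4)
    )
    print(f"{label:>3}  {sizes}")
```

Even at 4 bits per weight, a 31B-parameter model needs about 15.5 GB for weights alone, before the app and the operating system take their share.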

THANKS · 20/20

THE END?
"Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning."
Winston Churchill, 1942

/questions · thank you
$ open ./organiser-slides → leeds-mobile.github.io/organiser-slides/#12
