Swift at Scale: Where Performance Really Comes From

iOSConf SG 2026 https://www.iosconf.sg/

Yuta Saito

January 22, 2026

Transcript

  1. Swift at Scale: Where Performance Really Comes From
    Yuta Saito, Compiler Toolchain Engineer, Goodnotes
    iOSConf SG 2026
  2. About me
    • Yuta Saito / @kateinoigakukun
    • Compiler Toolchain Engineer at Goodnotes
    • Swift & Ruby language committer
  3. Agenda
    1. Background
    2. Case study
    3. Cost model
    4. Compiler limitations
    5. Data layout redesign
    6. Takeaways
  4. About Goodnotes
    • AI note-taking & whiteboard app that empowers you to capture and express your ideas effortlessly
    • 10+ years under development
    • Full of cool features
  5. About Goodnotes and Swift
    • A large shared Swift codebase
    • Runs under different runtime constraints
      • iOS / macOS devices
      • Web browsers (WebAssembly) on Android / Windows / etc.
  6. Observation: copy cost in an interaction hot path
    • We profiled an interaction hot path
    • A consistent hotspot: copying a value
  7. Hot region context: aggregation of per-component updates
    • Many components produce partial updates
    • A central loop aggregates them for application
    [Diagram: Event → Components A, B, C → Partial Updates A, B, C → Aggregator → Final Result]
  8. Minimal sketch: aggregation boundary
    struct Update {
        /* ... */
        mutating func aggregate(with other: Update) { /* ... */ }
    }
    var update = Update()
    for c in components {
        let u: Update = c.handleEvent()
        update.aggregate(with: u)
    }
    return update
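To make the slide's sketch concrete, here is a minimal runnable version of the aggregation loop. `Component`, `ReRenderComponent`, `TitleComponent`, the `title` field and `aggregateUpdates(from:)` are hypothetical stand-ins; the talk does not show the real component types. Each partial update is returned by value and merged into one accumulator, which is why the size and copy cost of `Update` matter on this path.

```swift
// Hypothetical stand-in for the talk's `Update`; only two fields for brevity.
struct Update {
    var needsReRender = false
    var title: String? = nil

    // Merge another partial update into this one ("last write wins").
    mutating func aggregate(with other: Update) {
        if other.needsReRender { needsReRender = true }
        if let title = other.title { self.title = title }
    }
}

// Hypothetical component interface: each component reacts to an event
// by returning a partial update.
protocol Component {
    func handleEvent() -> Update
}

struct ReRenderComponent: Component {
    func handleEvent() -> Update { Update(needsReRender: true) }
}

struct TitleComponent: Component {
    let title: String
    func handleEvent() -> Update { Update(title: title) }
}

// The aggregation boundary: every partial update is returned by value and
// merged into a single accumulator.
func aggregateUpdates(from components: [any Component]) -> Update {
    var update = Update()
    for c in components {
        let u: Update = c.handleEvent()
        update.aggregate(with: u)
    }
    return update
}

let result = aggregateUpdates(from: [
    ReRenderComponent(),
    TitleComponent(title: "A"),
    TitleComponent(title: "B"),
])
print(result.needsReRender, result.title ?? "nil")
```

Because the last writer wins, the result has `needsReRender == true` and `title == "B"`.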
  9. Value type characteristics
    • 58 stored properties
    • MemoryLayout<Update>.size = 6,441 bytes
    • 50/58 members were of non-trivial types (e.g., class, String, Array)
    struct Update {
        var needsReRender: Bool
        var fooUpdate: FooUpdate?
        var barUpdate: BarUpdate?
        // ... dozens more optional fields ...
        var zooUpdate: ZooUpdate?
    }
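The 6,441-byte figure comes from `MemoryLayout<Update>.size`. A scaled-down sketch, using hypothetical payload types rather than the real ones, shows why a type like this grows: an `Optional` payload occupies at least the payload's full inline size even when it is `nil`, so every optional field adds to the cost of each copy whether it is set or not. Exact byte counts are platform-dependent, so the code only prints them.

```swift
// Hypothetical payload types; the talk does not show the real ones.
final class FooUpdate {}
struct BarUpdate {
    var names: [String]
    var label: String
}

// One flag and one optional field.
struct NarrowUpdate {
    var needsReRender: Bool
    var fooUpdate: FooUpdate?
}

// The same plus two more optional payloads stored inline.
struct WideUpdate {
    var needsReRender: Bool
    var fooUpdate: FooUpdate?
    var barUpdate: BarUpdate?
    var bazUpdate: BarUpdate?
}

// An Optional is never smaller than its wrapped type, so every inline
// optional field costs space even when it holds nil.
print("BarUpdate:   ", MemoryLayout<BarUpdate>.size, "bytes")
print("NarrowUpdate:", MemoryLayout<NarrowUpdate>.size, "bytes")
print("WideUpdate:  ", MemoryLayout<WideUpdate>.size, "bytes")
```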
  10. Usage pattern: sparse writes, aggregated semantics
    • Most updates set a small subset of fields (often none)
    • Aggregation semantics were mostly "last write wins"
    struct Update {
        // ...
        // Merge with another partial update
        mutating func aggregate(with other: Update) {
            if other.needsReRender { self.needsReRender = true }
            if let fooUpdate = other.fooUpdate { self.fooUpdate = fooUpdate }
            if let barUpdate = other.barUpdate { self.barUpdate = barUpdate }
            // repeat for other fields...
        }
    }
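The "repeat for other fields" step can be written once with key paths. The helper `takeIfSet(_:from:)` is a hypothetical refactoring, not code from the talk, and the payload types are placeholders. It keeps the same "last write wins" semantics: a non-nil field in the incoming update overwrites the accumulated one, and a nil field leaves it untouched.

```swift
// Hypothetical payload types standing in for FooUpdate / BarUpdate.
struct FooUpdate: Equatable { var value: Int }
struct BarUpdate: Equatable { var text: String }

struct Update {
    var needsReRender = false
    var fooUpdate: FooUpdate? = nil
    var barUpdate: BarUpdate? = nil

    // "Last write wins" for one optional field: a non-nil value in `other`
    // overwrites ours; nil in `other` leaves ours untouched.
    private mutating func takeIfSet<T>(_ field: WritableKeyPath<Update, T?>, from other: Update) {
        if let value = other[keyPath: field] {
            self[keyPath: field] = value
        }
    }

    mutating func aggregate(with other: Update) {
        if other.needsReRender { needsReRender = true }
        takeIfSet(\Update.fooUpdate, from: other)
        takeIfSet(\Update.barUpdate, from: other)
    }
}

var merged = Update(fooUpdate: FooUpdate(value: 1))
merged.aggregate(with: Update(barUpdate: BarUpdate(text: "x")))
merged.aggregate(with: Update(fooUpdate: FooUpdate(value: 2)))
print(merged)
```

This only shortens the source: every field is still stored inline, so the copy cost discussed in this talk is unchanged.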
  11. Trivial vs non-trivial in Swift
    [Diagram: trivial types (POD / bitwise-copyable): Int 8 bytes, Bool 1 byte, Double 8 bytes. Non-trivial types (example: reference-counted): an 8-byte reference / pointer to a heap object holding a reference count and data]
    • Trivial (BitwiseCopyable): can be moved/copied with direct memory operations (effectively memcpy); no special destroy needed
    • Non-trivial: requires runtime work for copy/destroy (reference counting, CoW checks, custom destructors)
    • Any non-trivial member makes an aggregate non-trivial
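Whether a type is trivial can be checked at run time with `_isPOD(_:)`, an underscored standard-library function that is not documented public API, so it is best treated as a debugging aid. This sketch, with hypothetical types, shows the last bullet in action: a single `String` or class reference is enough to make a whole struct non-trivial.

```swift
final class Box {}

// Hypothetical example types.
struct Point {            // only trivial members
    var x: Double
    var y: Double
    var visible: Bool
}

struct Labeled {          // String is non-trivial
    var point: Point
    var name: String
}

struct Boxed {            // a class reference is non-trivial
    var count: Int
    var box: Box
}

// `_isPOD` returns true when values of the type can be copied and destroyed
// with plain memory operations (no retain/release or other runtime work).
func describe<T>(_ type: T.Type) -> String {
    let kind = _isPOD(type) ? "trivial" : "non-trivial"
    return "\(type): \(kind), \(MemoryLayout<T>.size) bytes"
}

print(describe(Int.self))
print(describe(Point.self))
print(describe(Labeled.self))
print(describe(Boxed.self))
```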
  12. Copy for trivial types
    struct Foo { var a, b: Int }
    func pairFoo() -> (Foo, Foo) {
        let v = makeFoo()
        return (v, v)  // Copy happens here
    }
    pairFoo:
        bl   _makeFoo   ; Get value (returns in x0, x1)
        mov  x2, x0     ; Copy first field
        mov  x3, x1     ; Copy second field
        ret             ; Done - just register moves
  13. Copy for non-trivial types
    class Box {}
    struct Foo { var a, b: Box }
    func pairFoo() -> (Foo, Foo) {
        let v = makeFoo()
        return (v, v)  // Copy happens here
    }
    pairFoo:
        bl   _makeFoo        ; Get value (returns in x0, x1)
        mov  x19, x0         ; Save first Box reference
        mov  x20, x1         ; Save second Box reference
        bl   _swift_retain   ; Retain first Box (x0 already set)
        mov  x0, x20         ; Load second Box for retain
        bl   _swift_retain   ; Retain second Box
        mov  x0, x19         ; Set up return tuple
        mov  x1, x20
        mov  x2, x19
        mov  x3, x20
        ret                  ; Done - but with runtime calls
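The extra `swift_retain` calls can also be observed from Swift code. In this sketch, `Box` and `Foo` mirror the slide and `copyChangesUniqueness()` is hypothetical: `isKnownUniquelyReferenced(_:)` reports `true` for a freshly created `Box` and `false` while a copy of the enclosing struct is alive, because copying `Foo` retained the reference. `withExtendedLifetime(_:_:)` keeps the copy alive until after the check.

```swift
final class Box {}

struct Foo {
    var a: Box
    var b: Box
}

func copyChangesUniqueness() -> (before: Bool, duringCopy: Bool) {
    var v = Foo(a: Box(), b: Box())
    // Only `v` refers to this Box so far.
    let before = isKnownUniquelyReferenced(&v.a)
    // Copying Foo copies both references: one extra retain per Box.
    let copy = v
    let during = isKnownUniquelyReferenced(&v.a)
    // Keep `copy` (and its references) alive until after the check above.
    withExtendedLifetime(copy) {}
    return (before, during)
}

let uniqueness = copyChangesUniqueness()
print("unique before copy:", uniqueness.before)
print("unique while copy is alive:", uniqueness.duringCopy)
```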
  14. Practical rule: optimization needs provable facts
    The compiler can optimize aggressively when it can prove that:
    • Definitions and data flow are visible
    • Aliasing / escape is controlled
    • Call targets are statically known
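The third condition, statically known call targets, can be illustrated with generics versus existentials. In this sketch (hypothetical `HasArea`, `Square` and `Circle` types), both functions compute the same sum, but only the generic one gives the optimizer a concrete type to specialize for when it is called with `[Square]`. Whether specialization and inlining actually happen depends on the optimization level and on the function body being visible at the call site.

```swift
protocol HasArea {
    func area() -> Double
}

struct Square: HasArea {
    var side: Double
    func area() -> Double { side * side }
}

struct Circle: HasArea {
    var radius: Double
    func area() -> Double { Double.pi * radius * radius }
}

// Existential: each element may have a different dynamic type, so `area()`
// is dispatched at run time through the element's witness table.
func totalAreaDynamic(_ shapes: [any HasArea]) -> Double {
    shapes.reduce(0) { $0 + $1.area() }
}

// Generic: where `T` is known at the call site (e.g. `Square`), the
// optimizer may specialize this function, making `area()` a statically
// known call that can then be inlined.
func totalAreaSpecialized<T: HasArea>(_ shapes: [T]) -> Double {
    shapes.reduce(0) { $0 + $1.area() }
}

let squares = [Square(side: 2), Square(side: 3)]
print(totalAreaSpecialized(squares))
print(totalAreaDynamic([Square(side: 1), Circle(radius: 1)]))
```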
  15. What compilers do well
    • Inlining and specialization
    • Local ARC cleanup
    • Scalar replacement / memcpy lowering
    • Common local simplifications (CSE, DCE, folding)
  16. Why compilers rarely auto-optimize data layout
    Layout optimization typically needs:
    • Whole-program visibility over use sites
    • Often, runtime hot/cold evidence (profile-guided)
    • Safely rewriting all field accesses
    • ABI/resilience constraints complicate public types
  17. Goal: make cost proportional to usage
    • Keep common-case updates small
    • Represent rare updates only when they occur
    • Preserve existing aggregation semantics ("last write wins")
  18. Key idea: hot/cold split
    • Hot: frequently checked in the pipeline → keep inline
    • Cold: rarely set, often only iterated → allocate on the side
    [Diagram: the inline part holds the hot fields (needsReRender, flags / counters) and a pointer / reference that points to cold storage on the heap, allocated only if needed]
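One way to realize the diagram's "pointer / reference to cold storage" is an optional reference to an immutable class instance. This is a swapped-in variant for illustration, not the talk's design (which, on the next slide, uses an array of cold updates); `ColdStorage`, `SplitUpdate`, `InlineUpdate` and the payload types are hypothetical. The cold fields are `let` because copies of `SplitUpdate` share the same object, so mutating it in place would leak changes between copies.

```swift
// Hypothetical cold payloads standing in for rarely-set fields.
struct FooUpdate { var ids: [Int] }
struct BarUpdate { var text: String }
struct BazUpdate { var names: [String]; var note: String }

// Cold storage: immutable, heap-allocated, created only when a cold field is set.
final class ColdStorage {
    let fooUpdate: FooUpdate?
    let barUpdate: BarUpdate?
    let bazUpdate: BazUpdate?
    init(fooUpdate: FooUpdate? = nil, barUpdate: BarUpdate? = nil, bazUpdate: BazUpdate? = nil) {
        self.fooUpdate = fooUpdate
        self.barUpdate = barUpdate
        self.bazUpdate = bazUpdate
    }
}

// Hot fields inline plus one optional reference to cold storage.
struct SplitUpdate {
    var needsReRender = false
    var counter = 0
    var cold: ColdStorage? = nil   // nil in the common case: nothing allocated
}

// The same information with every field stored inline, for comparison.
struct InlineUpdate {
    var needsReRender = false
    var counter = 0
    var fooUpdate: FooUpdate? = nil
    var barUpdate: BarUpdate? = nil
    var bazUpdate: BazUpdate? = nil
}

let common = SplitUpdate(needsReRender: true)
let rare = SplitUpdate(cold: ColdStorage(barUpdate: BarUpdate(text: "rare")))
print("SplitUpdate:", MemoryLayout<SplitUpdate>.size, "bytes")
print("InlineUpdate:", MemoryLayout<InlineUpdate>.size, "bytes")
print(rare.cold?.barUpdate?.text ?? "none")
```

`SplitUpdate` is still non-trivial, but a copy touches one optional reference instead of every cold field, and its inline size stays fixed as cold fields are added.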
  19. Hot small storage + cold dynamic storage
    protocol ColdUpdate {
        // ...
    }
    struct Update {
        // hot fields
        var needsReRender: Bool
        // cold fields
        var cold: [any ColdUpdate]
    }
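A runnable version of the slide's type, with an empty `ColdUpdate` protocol (its body is elided in the talk) and hypothetical `FooUpdate` / `BarUpdate` payloads. The inline part is a `Bool` plus one array reference, so `MemoryLayout<Update>.size` stays the same no matter how many cold update types exist or how many entries a given update carries.

```swift
// The protocol body is elided in the talk; empty here for illustration.
protocol ColdUpdate {}

// Hypothetical cold payloads.
struct FooUpdate: ColdUpdate { var value: Int }
struct BarUpdate: ColdUpdate { var text: String }

struct Update {
    // hot fields
    var needsReRender = false
    // cold fields: empty in the common case
    var cold: [any ColdUpdate] = []
}

let common = Update(needsReRender: true)         // typical update: no cold data
let rare = Update(cold: [FooUpdate(value: 1)])   // rare update: one cold entry

// Fixed inline size: a Bool plus one array reference.
print("MemoryLayout<Update>.size =", MemoryLayout<Update>.size)
print("cold entries:", common.cold.count, rare.cold.count)
```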
  20. Aggregation under "last write wins"
    • Hot fields: merge normally
    • Cold fields:
      • Just concatenate arrays
      • Ensure winners occur later
      • Apply by taking the last occurrence per cold update type
    [Diagram: Update 1 (supplemental: []) and Update 2 (supplemental: []) combine into Result (1+2) (supplemental: []) with zero array allocation; adding Update 3 (supplemental: [B, C]) is a fast concat, giving the Final Aggregated Update (supplemental: [B, C])]
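Under this representation, aggregation needs no per-field logic for cold data. A minimal sketch, assuming the `Update` shape from slide 19 and hypothetical payload types: hot fields merge as before, and cold entries are appended in arrival order, so entries from later updates always come after earlier ones, which is what "ensure winners occur later" relies on.

```swift
protocol ColdUpdate {}

// Hypothetical cold payloads.
struct FooUpdate: ColdUpdate { var value: Int }
struct BarUpdate: ColdUpdate { var text: String }

struct Update {
    var needsReRender = false
    var cold: [any ColdUpdate] = []

    mutating func aggregate(with other: Update) {
        // Hot field: merge normally.
        if other.needsReRender { needsReRender = true }
        // Cold fields: concatenate, preserving arrival order.
        cold.append(contentsOf: other.cold)
    }
}

var total = Update()
total.aggregate(with: Update())                     // nothing cold to merge
total.aggregate(with: Update(needsReRender: true))
total.aggregate(with: Update(cold: [BarUpdate(text: "B"), FooUpdate(value: 3)]))
print(total.needsReRender, total.cold.count)
```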
  21. Example: applying updates with "last write wins"
    func apply(_ update: Update) {
        // Handle hot fields
        if update.needsReRender { render() }
        // Handle cold fields: take last occurrence per type
        var lastSeen: [ObjectIdentifier: any ColdUpdate] = [:]
        for coldUpdate in update.cold {
            lastSeen[ObjectIdentifier(type(of: coldUpdate))] = coldUpdate
        }
        // Apply only the last occurrence of each type
        // (Dictionary iteration order is unspecified, so different types are applied in no particular order)
        for coldUpdate in lastSeen.values {
            coldUpdate.accept(visitor)  // `visitor`: a ColdUpdateVisitor assumed to be in scope
        }
    }
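The slide's `lastSeen` dictionary picks the right winner per type, but iterating `Dictionary.values` visits types in an unspecified order. If application order across types matters, a variant that keeps the winners in the order they were last written could look like this; `lastOccurrencePerType(_:)` is a hypothetical helper, and it uses the same `ObjectIdentifier(type(of:))` key as the slide.

```swift
protocol ColdUpdate {}

// Hypothetical cold payloads.
struct FooUpdate: ColdUpdate { var value: Int }
struct BarUpdate: ColdUpdate { var text: String }

// Keeps only the last entry of each dynamic type, ordered by the position
// of that last entry in the input.
func lastOccurrencePerType(_ updates: [any ColdUpdate]) -> [any ColdUpdate] {
    var seenTypes = Set<ObjectIdentifier>()
    var winners: [any ColdUpdate] = []
    // Walk backwards: the first time a type is seen is its last occurrence.
    for update in updates.reversed() {
        let key = ObjectIdentifier(type(of: update))
        if seenTypes.insert(key).inserted {
            winners.append(update)
        }
    }
    // Restore forward order.
    return Array(winners.reversed())
}

let applied = lastOccurrencePerType([
    FooUpdate(value: 1),
    BarUpdate(text: "A"),
    FooUpdate(value: 2),
])
for update in applied { print(update) }
```

Scanning backwards with a `Set` keeps this a single linear pass, like the dictionary version.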
  22.
    protocol ColdUpdateVisitor {
        func visitFoo(_ u: FooUpdate)
        func visitBar(_ u: BarUpdate)
        // ...
    }
    protocol ColdUpdate {
        func accept(_ v: some ColdUpdateVisitor)
    }
    extension FooUpdate: ColdUpdate {
        func accept(_ v: some ColdUpdateVisitor) { v.visitFoo(self) }
    }
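Completing the slide's visitor into a runnable sketch: the `FooUpdate` / `BarUpdate` payloads and `LoggingVisitor` (which records what it sees instead of applying updates) are hypothetical. Calling `accept(_:)` on an `any ColdUpdate` value dispatches on its dynamic type, and the concrete type then calls the matching `visit` method, so applying cold updates needs no type casts. The `some` parameter syntax requires Swift 5.7 or later.

```swift
// Hypothetical cold payloads.
struct FooUpdate { var value: Int }
struct BarUpdate { var text: String }

protocol ColdUpdateVisitor {
    func visitFoo(_ u: FooUpdate)
    func visitBar(_ u: BarUpdate)
}

protocol ColdUpdate {
    func accept(_ v: some ColdUpdateVisitor)
}

extension FooUpdate: ColdUpdate {
    func accept(_ v: some ColdUpdateVisitor) { v.visitFoo(self) }
}

extension BarUpdate: ColdUpdate {
    func accept(_ v: some ColdUpdateVisitor) { v.visitBar(self) }
}

// Hypothetical visitor that logs visits; a real one would apply each update.
final class LoggingVisitor: ColdUpdateVisitor {
    var log: [String] = []
    func visitFoo(_ u: FooUpdate) { log.append("foo \(u.value)") }
    func visitBar(_ u: BarUpdate) { log.append("bar \(u.text)") }
}

let visitor = LoggingVisitor()
let updates: [any ColdUpdate] = [BarUpdate(text: "B"), FooUpdate(value: 3)]
for update in updates {
    update.accept(visitor)
}
print(visitor.log)
```

One consequence of this design: adding a new `visit` requirement to `ColdUpdateVisitor` makes every visitor that lacks it fail to compile, which flags the places that must handle a new cold update type.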
  23. Evaluation
    Metric                  Before        After       Change
    Type size               6,441 bytes   347 bytes   18x reduction
    CPU time (microbench)   942 ms        543 ms      42% less CPU time
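The ratios in the evaluation can be checked from the reported numbers. The 42% figure is the reduction in CPU time, which corresponds to a speedup of roughly 1.7x:

```latex
\frac{6441\ \text{bytes}}{347\ \text{bytes}} \approx 18.6, \qquad
\frac{942\ \text{ms} - 543\ \text{ms}}{942\ \text{ms}} \approx 0.424, \qquad
\frac{942\ \text{ms}}{543\ \text{ms}} \approx 1.73
```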
  24. Interpretation
    • The improvement comes from changing what moves through the frequently executed path
    • Not from micro-optimizing the aggregation loop
    • Smaller, less expensive data in the common case
  25. Checklist: data layout-related hotspots
    • Types that keep growing without bound
    • Many non-trivial members
    • Many optionals with sparse usage
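The last checklist item can be measured rather than guessed. A hypothetical diagnostic helper using `Mirror` counts how many of a value's optional stored properties are actually set; run on sampled updates, a consistently low ratio points to a hot/cold split candidate. Reflection is slow, so this belongs in logging or tests, not in the hot path itself.

```swift
// Hypothetical helper: counts how many stored Optional properties of
// `value` are non-nil, using runtime reflection.
func optionalFieldOccupancy(of value: Any) -> (nonNil: Int, total: Int) {
    var nonNil = 0
    var total = 0
    for child in Mirror(reflecting: value).children {
        let childMirror = Mirror(reflecting: child.value)
        // Only Optional-typed stored properties are counted.
        guard childMirror.displayStyle == .optional else { continue }
        total += 1
        // `.some(x)` reflects with one child; `.none` has no children.
        if !childMirror.children.isEmpty { nonNil += 1 }
    }
    return (nonNil, total)
}

// Hypothetical update type with sparse optional fields.
struct SampleUpdate {
    var needsReRender = false
    var fooUpdate: Int? = nil
    var barUpdate: String? = nil
    var zooUpdate: [Int]? = nil
}

let occupancy = optionalFieldOccupancy(of: SampleUpdate(barUpdate: "changed"))
print("\(occupancy.nonNil) of \(occupancy.total) optional fields set")
```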
  26. Conclusion
    • Align data layout with actual usage patterns
    • Make cost proportional to usage
    • Keep hot representations small
    • Isolate cold complexity behind indirection