Talking Trash: The Evolution of Garbage Collection on Android

Romain Guy
October 27, 2018

Learn how the Android garbage collector has evolved from Dalvik to ART. This talk explains why you should worry more about good code than avoiding allocations.

Transcript

  1. Talking Trash
    @chethaase & @romainguy

  2. “Garbage in, garbage out”

  3. “Garbage in, garbage out”
    But how fast?

  4. Modern Android
    Development
    Chet Haase

    @chethaase
    Romain Guy

    @romainguy

  5. Modern Android
    Development
    Chet Haase

    @chethaase
    Romain Guy

    @romainguy

  6. Modern Android
    Development
    Chet Haase

    @chethaase
    Romain Guy

    @romainguy
    Blah blah blah
    memory blah blah...

  7. Dalvik
    Optimized for size

    JIT optimizations not as powerful

    Allocation/collection slow

    Heap fragmentation

  8. So:
    Avoid allocation whenever possible
    For example: enums
    Primitive types are cool
    (autoboxing is not)

  9. ART
    Optimized for performance

    JIT + AOT

    Faster allocation/collection

    Heap defragmentation

    Large object heap

  10. ART
    Optimized for performance

    JIT + AOT

    Faster allocation/collection

    Heap defragmentation

    Large object heap
    So:

  11. ART
    Optimized for performance

    JIT + AOT

    Faster allocation/collection

    Heap defragmentation

    Large object heap
    So:
    Allocate as necessary
    (yes, even enums)
    Use appropriate types
    But:
    phones are still constrained
    batteries are, too
    be aware of inner-loop bottlenecks

  12. Memories
    int foo = 5;
    MyObject thing = new MyObject();

  13. Memories
    Stack
    int foo = 5;
    MyObject thing = new MyObject();

  14. Memories
    Stack Registers
    int foo = 5;
    MyObject thing = new MyObject();

  15. Memories
    Stack Registers
    int foo = 5;
    MyObject thing = new MyObject();
    foo

  16. Memories
    Stack Registers
    int foo = 5;
    MyObject thing = new MyObject();
    foo

  17. Memories
    Stack Registers
    int foo = 5;
    MyObject thing = new MyObject();
    Heap
    foo

  18. Memories
    Stack Registers
    int foo = 5;
    MyObject thing = new MyObject();
    Heap
    foo
    thing

  19. Manual Garbage Collection
    MyObject* thing = new MyObject;
    // code using thing...
    delete thing;
    // code using thing
    C++

  20. Manual Garbage Collection
    MyObject* thing = new MyObject;
    // code using thing...
    delete thing;
    // code using thing
    Leak!
    C++

  21. Manual Garbage Collection
    MyObject* thing = new MyObject;
    // code using thing...
    delete thing;
    // code using thing
    C++

  22. Manual Garbage Collection
    MyObject* thing = new MyObject;
    // code using thing...
    delete thing;
    // code using thing
    C++

  23. Manual Garbage Collection
    MyObject* thing = new MyObject;
    // code using thing...
    delete thing;
    // code using thing
    Crash!
    C++

  24. Manual Garbage Collection
    MyObject* thing = new MyObject;
    // code using thing...
    delete thing;
    // code using thing
    Crash!
    C++
    (maybe)

  25. Automatic Garbage Collection
    MyObject thing = new MyObject();
    // code using thing…
    Java

  26. Automatic Garbage Collection
    MyObject thing = new MyObject();
    // code using thing…
    No leak!
    Java
    // eventually freed

  27. Automatic Garbage Collection
    MyObject thing = new MyObject();
    // code using thing…
    No leak!
    Java
    // eventually freed
    // code using thing…

  28. Automatic Garbage Collection
    MyObject thing = new MyObject();
    // code using thing…
    No leak!
    No crash!
    Java
    // eventually freed
    // code using thing…

  29. Runtime GC Concerns

  30. Runtime GC Concerns
    How long does allocation take?

  31. Runtime GC Concerns
    How long does allocation take?

    How long does collection take?

  32. Runtime GC Concerns
    How long does allocation take?

    How long does collection take?
    What impact does this have across all threads?

  33. Runtime GC Concerns
    How long does allocation take?

    How long does collection take?
    What impact does this have across all threads?
    When do collections happen?

  34. Runtime GC Concerns
    How long does allocation take?

    How long does collection take?
    What impact does this have across all threads?
    When do collections happen?

    How efficient is heap usage?

  35. Dalvik GC
    ?
    Allocation

  36. Dalvik GC
    ?
    Allocation

  37. Dalvik GC
    ?
    Allocation

  38. Dalvik GC
    Allocation

  39. Dalvik GC
    Collection

  40. Dalvik GC
    Collection
    Mark root set (pause)

  41. Dalvik GC
    Collection
    Mark root set (pause)
    Mark reachable I (concurrent)

  42. Dalvik GC
    Collection
    Mark root set (pause)
    Mark reachable I (concurrent)

  43. Dalvik GC
    Collection
    Mark root set (pause)
    Mark reachable I (concurrent)
    Mark reachable II (pause)

  44. Dalvik GC
    Collection
    Mark root set (pause)
    Mark reachable I (concurrent)
    Mark reachable II (pause)

  45. Dalvik GC
    Collection
    Mark root set (pause)
    Mark reachable I (concurrent)
    Mark reachable II (pause)
    Collect (concurrent)
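
The four phases above can be sketched as a toy mark-and-sweep collector. This is an illustration under simplifying assumptions, not Dalvik's implementation: `ToyHeap` and `Node` are hypothetical names, everything runs on a single thread, and the two marking passes are folded into one `mark()` call, so nothing here is actually concurrent or paused.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy heap: illustrates mark-and-sweep, not Dalvik's actual code.
class ToyHeap {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    final List<Node> objects = new ArrayList<>(); // every object still allocated
    final List<Node> roots = new ArrayList<>();   // stand-in for stacks, statics, etc.

    Node allocate(String name) {
        Node n = new Node(name);
        objects.add(n);
        return n;
    }

    // Mark phases: start from the root set, then follow references.
    void mark() {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    // Collect phase: drop every unmarked object, reset marks for the next cycle.
    int sweep() {
        int freed = 0;
        for (Iterator<Node> it = objects.iterator(); it.hasNext();) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    int collect() {
        mark();
        return sweep();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Node a = heap.allocate("a");
        Node b = heap.allocate("b");
        heap.allocate("garbage");   // never referenced from a root
        a.refs.add(b);
        heap.roots.add(a);
        // prints "freed 1, live 2"
        System.out.println("freed " + heap.collect() + ", live " + heap.objects.size());
    }
}
```

The pauses on the slide correspond to the points where a real collector has to stop application threads so that the root set and the object graph do not change underneath it; the toy version sidesteps that by never running alongside other code.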

  46. Dalvik GC
    ?
    Allocation, Take II

  47. Dalvik GC
    ?
    Allocation, Take II

  48. Dalvik GC
    ?
    Allocation, Take II

  49. Collection: GC_FOR_ALLOC
    Dalvik GC
    ?

  50. Collection: GC_FOR_ALLOC
    Dalvik GC
    ?

  51. Allocation
    Dalvik GC
    ?

  52. Allocation
    Dalvik GC

  53. Dalvik GC
    ?
    Allocation, Take III

  54. Dalvik GC
    ?
    Allocation, Take III

  55. Dalvik GC
    ?
    Grow the Heap

  56. Dalvik GC
    ?
    Out of Memory Error
    Grow the Heap
    or…
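
The sequence these slides build up (allocate, fall back to a GC_FOR_ALLOC collection, retry, then grow the heap or give up) can be sketched as follows. `ToyAllocator` is a hypothetical model that only tracks byte counts; it ignores fragmentation and is not how Dalvik is written.

```java
// Hypothetical model of the allocation slow path; byte counts only, no real memory.
class ToyAllocator {
    enum Result { ALLOCATED, ALLOCATED_AFTER_GC, ALLOCATED_AFTER_GROW, OUT_OF_MEMORY }

    private final long maxHeapSize; // hard upper limit for the heap
    private long heapSize;          // current heap size, may grow up to the limit
    private long used;              // live bytes plus not-yet-collected garbage
    private long garbage;           // bytes a collection would reclaim

    ToyAllocator(long initialHeapSize, long maxHeapSize) {
        this.heapSize = initialHeapSize;
        this.maxHeapSize = maxHeapSize;
    }

    // Pretend that `bytes` of earlier allocations became unreachable.
    void dropReferences(long bytes) {
        garbage = Math.min(used, garbage + bytes);
    }

    Result allocate(long bytes) {
        // First attempt: there is already room.
        if (tryAllocate(bytes)) return Result.ALLOCATED;

        // No room: collect synchronously (the GC_FOR_ALLOC case) and retry.
        used -= garbage;
        garbage = 0;
        if (tryAllocate(bytes)) return Result.ALLOCATED_AFTER_GC;

        // Still no room: grow the heap if the limit allows it.
        if (used + bytes <= maxHeapSize) {
            heapSize = used + bytes;
            used += bytes;
            return Result.ALLOCATED_AFTER_GROW;
        }

        // Nothing left to try.
        return Result.OUT_OF_MEMORY;
    }

    private boolean tryAllocate(long bytes) {
        if (used + bytes > heapSize) return false;
        used += bytes;
        return true;
    }

    public static void main(String[] args) {
        ToyAllocator heap = new ToyAllocator(100, 200);
        System.out.println(heap.allocate(80));  // ALLOCATED
        heap.dropReferences(50);
        System.out.println(heap.allocate(60));  // ALLOCATED_AFTER_GC
        System.out.println(heap.allocate(50));  // ALLOCATED_AFTER_GROW
        System.out.println(heap.allocate(100)); // OUT_OF_MEMORY
    }
}
```

Every step after the first one happens on the allocating thread, which is why a GC_FOR_ALLOC shows up as a pause in the code that asked for memory.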

  57. Fragmentation! (Kitkat)
    Heap

  58. Fragmentation! (Kitkat)
    Heap

  59. Fragmentation! (Kitkat)
    Heap

  60. Fragmentation! (Kitkat)
    Heap

  61. Fragmentation! (Kitkat)
    Heap

  62. Fragmentation! (Kitkat)
    Heap

  63. Fragmentation! (Kitkat)
    Heap
    ?

  64. Fragmentation! (Kitkat)
    Heap
    ?
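
A minimal sketch of why fragmentation hurts a non-moving heap: what matters is the largest contiguous free run, not the total free space. The model below is hypothetical (one array element per fixed-size heap slot) and `compact` only illustrates what a moving collector could achieve.

```java
import java.util.Arrays;

// Hypothetical model: each array element is one fixed-size heap slot.
class FragmentedHeap {
    // Total number of free slots (true = in use, false = free).
    static int totalFree(boolean[] inUse) {
        int free = 0;
        for (boolean used : inUse) {
            if (!used) free++;
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] inUse) {
        int best = 0;
        int run = 0;
        for (boolean used : inUse) {
            run = used ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // A non-moving heap can only satisfy a request from one contiguous run.
    static boolean canAllocate(boolean[] inUse, int slots) {
        return largestFreeRun(inUse) >= slots;
    }

    // What a moving collector could do: slide the live slots together.
    static boolean[] compact(boolean[] inUse) {
        boolean[] compacted = new boolean[inUse.length];
        int live = inUse.length - totalFree(inUse);
        Arrays.fill(compacted, 0, live, true);
        return compacted;
    }

    public static void main(String[] args) {
        boolean[] heap = {true, false, true, false, true, false, true, false};
        System.out.println("free slots: " + totalFree(heap));                          // 4
        System.out.println("fits 2 slots: " + canAllocate(heap, 2));                   // false
        System.out.println("fits after compaction: " + canAllocate(compact(heap), 2)); // true
    }
}
```

Half of this heap is free, yet a two-slot request fails until the live slots are moved together.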

  65. D/dalvikvm: GC_FOR_ALLOC freed 2K, 50% free 197613K/392536K, paused 6ms, total 6ms
    I/dalvikvm-heap: Forcing collection of SoftReferences for 2000012-byte allocation
    D/dalvikvm: GC_BEFORE_OOM freed 0K, 50% free 197613K/392536K, paused 6ms, total 6ms
    E/dalvikvm-heap: Out of memory on a 2000012-byte allocation.
    Sad Logcat

  66. D/dalvikvm: GC_FOR_ALLOC freed 2K, 50% free 197613K/392536K, paused 6ms, total 6ms
    I/dalvikvm-heap: Forcing collection of SoftReferences for 2000012-byte allocation
    D/dalvikvm: GC_BEFORE_OOM freed 0K, 50% free 197613K/392536K, paused 6ms, total 6ms
    E/dalvikvm-heap: Out of memory on a 2000012-byte allocation.
    Sad Logcat

  67. D/dalvikvm: GC_FOR_ALLOC freed 2K, 50% free 197613K/392536K, paused 6ms, total 6ms
    I/dalvikvm-heap: Forcing collection of SoftReferences for 2000012-byte allocation
    D/dalvikvm: GC_BEFORE_OOM freed 0K, 50% free 197613K/392536K, paused 6ms, total 6ms
    E/dalvikvm-heap: Out of memory on a 2000012-byte allocation.
    Sad Logcat

  68. D/dalvikvm: GC_FOR_ALLOC freed 2K, 50% free 197613K/392536K, paused 6ms, total 6ms
    I/dalvikvm-heap: Forcing collection of SoftReferences for 2000012-byte allocation
    D/dalvikvm: GC_BEFORE_OOM freed 0K, 50% free 197613K/392536K, paused 6ms, total 6ms
    E/dalvikvm-heap: Out of memory on a 2000012-byte allocation.
    Sad Logcat

  69. D/dalvikvm: GC_FOR_ALLOC freed 2K, 50% free 197613K/392536K, paused 6ms, total 6ms
    I/dalvikvm-heap: Forcing collection of SoftReferences for 2000012-byte allocation
    D/dalvikvm: GC_BEFORE_OOM freed 0K, 50% free 197613K/392536K, paused 6ms, total 6ms
    E/dalvikvm-heap: Out of memory on a 2000012-byte allocation.
    Sad Logcat
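
To read these lines programmatically, a small parser is enough. The sketch below is a hypothetical helper, not an Android API. It assumes the two numbers after `free` are the used and total heap size in kilobytes, which is a reading of the log format rather than something the slide states, and it only handles the single-pause form shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the Dalvik log format shown on the slide.
class DalvikGcLine {
    private static final Pattern FORMAT = Pattern.compile(
            "(GC_\\w+) freed <?(\\d+)K, (\\d+)% free (\\d+)K/(\\d+)K, "
            + "paused (\\d+)ms, total (\\d+)ms");

    final String reason;
    final int freedKb;      // "<1K" is stored as 1, i.e. an upper bound
    final int percentFree;
    final int usedKb;       // assumption: first number is the used heap
    final int heapKb;       // assumption: second number is the total heap
    final int pausedMs;
    final int totalMs;

    private DalvikGcLine(Matcher m) {
        reason = m.group(1);
        freedKb = Integer.parseInt(m.group(2));
        percentFree = Integer.parseInt(m.group(3));
        usedKb = Integer.parseInt(m.group(4));
        heapKb = Integer.parseInt(m.group(5));
        pausedMs = Integer.parseInt(m.group(6));
        totalMs = Integer.parseInt(m.group(7));
    }

    // Returns null for lines that are not GC summaries.
    static DalvikGcLine parse(String line) {
        Matcher m = FORMAT.matcher(line);
        return m.find() ? new DalvikGcLine(m) : null;
    }

    int freeKb() {
        return heapKb - usedKb;
    }

    public static void main(String[] args) {
        DalvikGcLine gc = parse("D/dalvikvm: GC_FOR_ALLOC freed 2K, 50% free "
                + "197613K/392536K, paused 6ms, total 6ms");
        // prints "GC_FOR_ALLOC left 194923K free after a 6ms pause"
        System.out.println(gc.reason + " left " + gc.freeKb() + "K free after a "
                + gc.pausedMs + "ms pause");
    }
}
```

Under that reading, roughly 194923K is free while a 2000012-byte request (about 1953K) still fails, which is the fragmentation problem from the previous slides showing up in the log.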

  70. ART (Lollipop)
    Faster allocation!

    Faster collection!

    Faster runtime!

  71. ART Allocation

  72. ART Allocation
    RosAlloc

  73. ART Allocation
    RosAlloc
    Replacement for dlmalloc

  74. ART Allocation
    RosAlloc
    Replacement for dlmalloc
    Thread-local allocations

  75. ART Allocation
    RosAlloc
    Replacement for dlmalloc
    Thread-local allocations
    Grouped small allocations, page-aligned large allocations

  76. ART Allocation
    RosAlloc
    Replacement for dlmalloc
    Thread-local allocations
    Grouped small allocations, page-aligned large allocations
    Finer-grained locks

  77. ART Allocation
    RosAlloc
    Replacement for dlmalloc
    Thread-local allocations
    Grouped small allocations, page-aligned large allocations
    Finer-grained locks
    4-5x faster than Dalvik!

  78. ART Allocation
    RosAlloc
    Replacement for dlmalloc
    Thread-local allocations
    Grouped small allocations, page-aligned large allocations
    Finer-grained locks
    4-5x faster than Dalvik!

  79. ART Allocation

  80. ART Allocation
    Large object space

  81. ART Allocation
    Large object space

  82. ART Allocation
    Large object space

  83. ART Allocation
    Large object space

  84. ART Allocation
    Large object space
    In Dalvik

  85. ART Allocation
    Large object space
    ?
    In Dalvik

  86. ART Allocation
    Large object space
    In ART

  87. ART Allocation
    Large object space
    In ART

  88. ART Allocation
    Large object space
    Moving collector!
    In ART

  89. ART Allocation
    Large object space
    Moving collector!
    No more fragmentation!
    In ART

  90. ART Allocation
    Large object space
    Moving collector!
    No more fragmentation!
    In ART
    *
    * Eventually

  91. ART Allocation
    Large object space
    Moving collector!
    No more fragmentation!
    In ART
    *
    * Eventually

  92. Fragmentation! (L+)
    Heap

  93. Fragmentation! (L+)
    Heap

  94. Fragmentation! (L+)
    Heap

  95. Fragmentation! (L+)
    Heap
    ?

  96. Fragmentation! (L+)
    Heap ?

  97. Dalvik GC
    Mark root set (pause)
    Mark reachable I (concurrent)
    Mark reachable II (pause)
    Collect (concurrent)

  98. ~10ms
    Dalvik GC
    Mark root set (pause)
    Mark reachable I (concurrent)
    Mark reachable II (pause)
    Collect (concurrent)

  99. ~10ms
    Mark root set (pause)
    Mark reachable I (concurrent)
    Mark reachable II (pause)
    Collect (concurrent)
    ART GC

  100. Mark root set (pause)
    Mark reachable I (concurrent)
    Mark reachable II (pause)
    Collect (concurrent)
    ART GC

  101. Mark root set (concurrent)
    Mark reachable I (concurrent)
    Mark reachable II (pause)
    Collect (concurrent)
    ART GC

  102. Mark root set (concurrent)
    Mark reachable I (concurrent)
    Mark reachable II (pause)
    Collect (concurrent)
    ART GC
    Faster!

  103. ~3ms
    Mark root set (concurrent)
    Mark reachable I (concurrent)
    Mark reachable II (pause)
    Collect (concurrent)
    ART GC
    Faster!

  104. ART Collection

  105. ART Collection
    Minor GC

  106. ART Collection
    Minor GC
    Fast collection of “young generation”

  107. ART Collection
    Minor GC
    Fast collection of “young generation”
    Temporary objects less expensive

  108. ART Collection
    Minor GC
    Fast collection of “young generation”
    Temporary objects less expensive

    Large object heap

  109. ART Collection
    Minor GC
    Fast collection of “young generation”
    Temporary objects less expensive

    Large object heap
    Less fragmentation

  110. ART Collection
    Minor GC
    Fast collection of “young generation”
    Temporary objects less expensive

    Large object heap
    Less fragmentation
    Less heap resizing

  111. ART Collection
    Minor GC
    Fast collection of “young generation”
    Temporary objects less expensive

    Large object heap
    Less fragmentation
    Less heap resizing
    Fewer GC_FOR_ALLOC pauses

  112. ART Collection
    Minor GC
    Fast collection of “young generation”
    Temporary objects less expensive

    Large object heap
    Less fragmentation
    Less heap resizing
    Fewer GC_FOR_ALLOC pauses

    Faster runtime

  113. ART in Marshmallow
    Optimizing compiler
    Allocation optimizations

  114. ART in Nougat

  115. ART in Nougat
    More inlining and optimizations

  116. ART in Nougat
    More inlining and optimizations

  117. ART in Nougat
    More inlining and optimizations
    Allocation

  118. ART in Nougat
    More inlining and optimizations
    Allocation
    Rewritten in assembly

  119. ART in Nougat
    More inlining and optimizations
    Allocation
    Rewritten in assembly
    10x faster than Dalvik (Kitkat)

  120. ART in Oreo
    Concurrent heap compaction

  121. ART in Oreo
    Concurrent heap compaction
    Defragmentation in foreground!

  122. ART in Oreo
    Concurrent heap compaction
    Defragmentation in foreground!
    Less heap resizing, GC_FOR_ALLOC

  123. ART in Oreo
    Concurrent heap compaction
    Defragmentation in foreground!
    Less heap resizing, GC_FOR_ALLOC

  124. ART in Oreo
    Concurrent heap compaction
    Defragmentation in foreground!
    Less heap resizing, GC_FOR_ALLOC
    Device-wide memory savings

  125. ART in Oreo
    Concurrent heap compaction
    Defragmentation in foreground!
    Less heap resizing, GC_FOR_ALLOC
    Device-wide memory savings
    System and Google Play Services

  126. ART in Oreo
    Concurrent heap compaction
    Defragmentation in foreground!
    Less heap resizing, GC_FOR_ALLOC
    Device-wide memory savings
    System and Google Play Services
    Smaller heaps for all

  127. ART in Oreo
    Concurrent heap compaction
    Defragmentation in foreground!
    Less heap resizing, GC_FOR_ALLOC
    Device-wide memory savings
    System and Google Play Services
    Smaller heaps for all

  128. Concurrent Compaction
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region
    ...
    ...

  129. Concurrent Compaction
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region
    ...
    ...
    Compaction Phase

  130. Concurrent Compaction
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region
    ...
    ...
    Compaction Phase

  131. Concurrent Compaction
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region
    ...
    ...
    Compaction Phase

  132. Concurrent Compaction
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region
    ...
    ...
    Compaction Phase

  133. ART in Oreo
    Thread-local bump allocator
    70% faster allocations than Nougat
    18x faster than Dalvik (Kitkat)

  134. Concurrent Compaction — Allocation
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region All-thread Heap
    ...
    ...

  135. Concurrent Compaction — Allocation
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region All-thread Heap
    ...
    ...

  136. Concurrent Compaction — Allocation
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region All-thread Heap
    ...
    ...
    T1 Region
    Free Pointer

  137. Concurrent Compaction — Allocation
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region All-thread Heap
    ...
    ...
    T1 Region
    Free Pointer

  138. Concurrent Compaction — Allocation
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region All-thread Heap
    ...
    ...
    T1 Region
    Free Pointer

  139. Concurrent Compaction — Allocation
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region All-thread Heap
    ...
    ...
    T1 Region
    Free Pointer

  140. Concurrent Compaction — Allocation
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region All-thread Heap
    ...
    ...
    T1 Region
    Free Pointer

  141. Concurrent Compaction — Allocation
    Heap ...
    T0 Region T1 Region T2 Region T3 Region Tn Region All-thread Heap
    ...
    ...
    T1 Region
    Free Pointer
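
The "free pointer" in these diagrams is the core of a bump allocator: allocating means advancing one offset inside the thread's own region. The sketch below is hypothetical (the class name, the 1024-byte region size and the `-1` failure value are all assumptions) and leaves out alignment and what happens when a region fills up.

```java
// Hypothetical sketch of a bump-pointer region; sizes are arbitrary.
class BumpRegion {
    private final int capacity;
    private int freePointer; // offset of the first unused byte

    BumpRegion(int capacity) {
        this.capacity = capacity;
    }

    // Returns the offset of the new object, or -1 when it does not fit.
    int allocate(int size) {
        if (size <= 0 || size > capacity - freePointer) return -1;
        int start = freePointer;
        freePointer += size; // the whole allocation is one addition
        return start;
    }

    int remaining() {
        return capacity - freePointer;
    }

    // One region per thread, so the allocation path needs no lock.
    static final ThreadLocal<BumpRegion> THREAD_REGION =
            ThreadLocal.withInitial(() -> new BumpRegion(1024));

    public static void main(String[] args) {
        BumpRegion region = THREAD_REGION.get();
        System.out.println(region.allocate(16)); // 0
        System.out.println(region.allocate(24)); // 16
        System.out.println(region.remaining());  // 984
    }
}
```

Because nothing is ever freed individually, this only works together with a collector that can evacuate live objects and reclaim a region wholesale, which is what the compaction slides describe.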

  142. Allocation Improvements

  143. ART in O+
    Young generation collections gone in O

  144. ART in O+
    Young generation collections gone in O
    Enabled in AOSP

  145. ART in O+
    Young generation collections gone in O
    Enabled in AOSP
    Watch for that future release…

  146. Object Pools

  147. Object Pools
    Conventional wisdom

  148. Object Pools
    Conventional wisdom
    Reusing objects is faster (saves on allocation/collection time)


  149. Object Pools
    Conventional wisdom
    Reusing objects is faster (saves on allocation/collection time)

    Actual wisdom

  150. Object Pools
    Conventional wisdom
    Reusing objects is faster (saves on allocation/collection time)

    Actual wisdom
    As of Oreo, synchronized object pools are generally slower
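
For context, this is roughly what a synchronized object pool looks like. `ToySynchronizedPool` is a hypothetical class written for this illustration, not a platform API, and the sketch measures nothing; it only shows that every acquire and release goes through a lock, while a plain `new` on a thread-local bump allocator does not.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical pool, written only to show where the locking cost comes from.
class ToySynchronizedPool<T> {
    private final ArrayDeque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    ToySynchronizedPool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    // Every acquire takes the pool's lock, even when the pool is empty.
    synchronized T acquire() {
        T instance = idle.poll();
        return instance != null ? instance : factory.get();
    }

    // Every release takes the lock again; surplus instances are left to the GC.
    synchronized boolean release(T instance) {
        if (idle.size() >= maxIdle) return false;
        idle.push(instance);
        return true;
    }

    public static void main(String[] args) {
        ToySynchronizedPool<float[]> pool =
                new ToySynchronizedPool<>(() -> new float[3], 2);
        float[] pooled = pool.acquire();  // lock, then allocate: the pool is empty
        pool.release(pooled);             // lock again to hand it back
        float[] direct = new float[3];    // the alternative: just allocate
        System.out.println(pool.acquire() == pooled); // true, the instance is reused
        System.out.println(direct.length);            // 3
    }
}
```

Whether a pool still pays off depends on how expensive the pooled object is to construct; the slide's point is about allocation and collection cost alone.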

  151. Soooo… What Now?
    Creating garbage is okay
    (and so is collecting it)

    Use the types and objects you need
    Even enums

    GC is still overhead
    But not as critical to avoid as it was in Dalvik
    Make the right choices for your architecture
    Avoid overhead in critical sections when possible

  152. Jank Test Autoboxing

  153. Jank Test Autoboxing
    private Float[] mHolder = new Float[100_000];

  154. Jank Test Autoboxing
    private Float[] mHolder = new Float[100_000];
    public void run() {
        long startTime = System.currentTimeMillis();
        float f = 0f;
        for (int i = 0; i < mHolder.length; ++i, f += 1.0f) {
            mHolder[i] = f;
        }
        System.out.println("Alloc time = " +
                (System.currentTimeMillis() - startTime));
    }

  155. Jank Test Autoboxing
    private Float[] mHolder = new Float[100_000];
    public void run() {
        long startTime = System.currentTimeMillis();
        float f = 0f;
        for (int i = 0; i < mHolder.length; ++i, f += 1.0f) {
            mHolder[i] = f;
        }
        System.out.println("Alloc time = " +
                (System.currentTimeMillis() - startTime));
    }
    I/System.out: Alloc time = 28
    D/dalvikvm: GC_FOR_ALLOC freed 2047K, 1% free 337371K/339492K, paused 10ms, total 10ms
    I/System.out: Alloc time = 29

  156. Jank Test Autoboxing
    private Float[] mHolder = new Float[100_000];
    public void run() {
        long startTime = System.currentTimeMillis();
        float f = 0f;
        for (int i = 0; i < mHolder.length; ++i, f += 1.0f) {
            mHolder[i] = f;
        }
        System.out.println("Alloc time = " +
                (System.currentTimeMillis() - startTime));
    }
    I/System.out: Alloc time = 28
    D/dalvikvm: GC_FOR_ALLOC freed 2047K, 1% free 337371K/339492K, paused 10ms, total 10ms
    I/System.out: Alloc time = 29
    I/System.out: Alloc time = 3
    I/System.out: Alloc time = 2
    I/System.out: Alloc time = 4

  157. Jank Test Bitmaps

  158. Jank Test Bitmaps
    public void run() {
        long startTime = System.currentTimeMillis();
        mBitmap = Bitmap.createBitmap(1_000, 1_000,
                Bitmap.Config.ARGB_8888);
        System.out.println("Alloc time = " +
                (System.currentTimeMillis() - startTime));
    }

  159. Jank Test Bitmaps
    public void run() {
        long startTime = System.currentTimeMillis();
        mBitmap = Bitmap.createBitmap(1_000, 1_000,
                Bitmap.Config.ARGB_8888);
        System.out.println("Alloc time = " +
                (System.currentTimeMillis() - startTime));
    }
    I/System.out: Alloc time = 16
    D/dalvikvm: GC_FOR_ALLOC freed 3907K, 2% free 341280K/347244K, paused 7ms, total 7ms
    I/dalvikvm-heap: Grow heap (frag case) to 337.165MB for 4000012-byte allocation
    D/dalvikvm: GC_FOR_ALLOC freed <1K, 1% free 345186K/347244K, paused 7ms, total 7ms

  160. Jank Test Bitmaps
    public void run() {
        long startTime = System.currentTimeMillis();
        mBitmap = Bitmap.createBitmap(1_000, 1_000,
                Bitmap.Config.ARGB_8888);
        System.out.println("Alloc time = " +
                (System.currentTimeMillis() - startTime));
    }
    I/System.out: Alloc time = 16
    D/dalvikvm: GC_FOR_ALLOC freed 3907K, 2% free 341280K/347244K, paused 7ms, total 7ms
    I/dalvikvm-heap: Grow heap (frag case) to 337.165MB for 4000012-byte allocation
    D/dalvikvm: GC_FOR_ALLOC freed <1K, 1% free 345186K/347244K, paused 7ms, total 7ms
    I/System.out: Alloc time = 1
    I/System.out: Alloc time = 0
    I/System.out: Alloc time = 0
    I/System.out: Alloc time = 1

  161. data class Float3(val x: Float, val y: Float, val z: Float)
    fun Tonemap_ACES(x: Float3): Float3 {
        val a = 2.51f
        val b = 0.03f
        val c = 2.43f
        val d = 0.59f
        val e = 0.14f
        return (x * (a * x + b)) / (x * (c * x + d) + e)
    }

  162. inline operator fun Float.plus(v: Float3) =
        Float3(this + v.x, this + v.y, this + v.z)
    inline operator fun Float.times(v: Float3) =
        Float3(this * v.x, this * v.y, this * v.z)

  163. Android O, 1 tile = 00.1~00.5s

  164. Android O, 1 tile = 00.1~00.5s
    Android K, 1 tile = 40.0~50.0s

  165. 10-23 14:40:04.997 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 4ms, total 4ms
    10-23 14:40:05.067 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.147 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.177 3885-3907/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:05.207 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 3ms, total 3ms
    10-23 14:40:05.277 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.307 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5125K/5912K, paused 3ms, total 3ms
    10-23 14:40:05.357 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 713K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:05.397 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 3ms, total 3ms
    10-23 14:40:05.447 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.517 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.607 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.677 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.707 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 4ms, total 4ms
    10-23 14:40:05.767 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.837 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.867 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.897 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.957 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:05.997 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:06.037 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:06.107 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:06.157 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 3ms, total 3ms
    10-23 14:40:06.227 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 4ms, total 4ms
    10-23 14:40:06.267 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 3ms, total 3ms
    10-23 14:40:06.337 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:06.367 3885-3907/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:06.437 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:06.527 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:06.597 3885-3907/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 4ms, total 4ms
    10-23 14:40:06.617 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 4ms, total 4ms
    10-23 14:40:06.697 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:06.717 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:06.767 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:06.817 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 713K, 14% free 5124K/5912K, paused 3ms, total 3ms
    10-23 14:40:06.857 3885-3907/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 711K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:06.937 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:06.957 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.027 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.117 3885-3907/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.157 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.197 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:07.257 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.307 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 3ms, total 3ms
    10-23 14:40:07.357 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.397 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 3ms, total 3ms
    10-23 14:40:07.447 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.477 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.557 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.617 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.647 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.677 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:07.707 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.727 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:07.767 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.797 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:07.847 3885-3907/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:07.927 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5125K/5912K, paused 5ms, total 5ms
    10-23 14:40:08.057 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 713K, 14% free 5124K/5912K, paused 4ms, total 4ms
    10-23 14:40:08.077 3885-3907/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:08.107 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:08.147 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 4ms, total 4ms
    10-23 14:40:08.207 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:08.267 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 713K, 14% free 5124K/5912K, paused 4ms, total 4ms
    10-23 14:40:08.287 3885-3907/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 711K, 14% free 5124K/5912K, paused 5ms, total 5ms
    10-23 14:40:08.337 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 4ms, total 4ms

  166. 10-23 14:40:24.847 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 3ms, total 3ms
    10-23 14:40:24.917 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
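
Each of these log lines follows the same shape, so the cost of the collections can be tallied mechanically. The following is a minimal sketch, assuming only the line format visible in the log above; `GcEvent` and `parseGcLine` are hypothetical names introduced for illustration and are not part of any Android API.

```kotlin
// Hypothetical record of one GC_FOR_ALLOC line: KiB freed and pause in milliseconds.
data class GcEvent(val freedKb: Int, val pausedMs: Int)

// Matches the "freed <n>K, ... paused <n>ms" portion of a dalvikvm log line.
private val gcPattern = Regex("""GC_FOR_ALLOC freed (\d+)K, .*paused (\d+)ms""")

// Returns null for lines that are not GC_FOR_ALLOC messages.
fun parseGcLine(line: String): GcEvent? {
    val match = gcPattern.find(line) ?: return null
    val (freed, paused) = match.destructured
    return GcEvent(freed.toInt(), paused.toInt())
}

fun main() {
    val log = listOf(
        "10-23 14:40:24.847 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 3ms, total 3ms",
        "10-23 14:40:24.917 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms"
    )
    val events = log.mapNotNull(::parseGcLine)
    val freed = events.sumOf { it.freedKb }
    val paused = events.sumOf { it.pausedMs }
    println("collections=${events.size} freed=${freed}K paused=${paused}ms")
}
```

For the two lines on this slide, the sketch reports 2 collections, 1424K freed and 5 ms paused.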

  167. Benchmarks, GC and Caches

  168. Kryo 385 (Pixel 3), “Gold” cores
    Core Core Core Core

  169. Kryo 385 (Pixel 3), “Gold” cores
    Core Core Core Core
    L1 L1 L1 L1 (4x 32 KiB)

  170. Kryo 385 (Pixel 3), “Gold” cores
    Core Core Core Core
    L1 L1 L1 L1 (4x 32 KiB)
    L2 L2 L2 L2 (4x 256 KiB)

  171. Kryo 385 (Pixel 3), “Gold” cores
    Core Core Core Core
    L1 L1 L1 L1 (4x 32 KiB)
    L2 L2 L2 L2 (4x 256 KiB)
    L3 (1x 2 MiB)
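
To make the sizes on this slide concrete, here is a rough sketch that reports the smallest cache level able to hold a working set of a given size. The thresholds are the figures from the slide; `Level` and `smallestLevelHolding` are hypothetical names, and the model deliberately ignores associativity, sharing between cores and anything else competing for the cache.

```kotlin
// Hypothetical labels for the levels shown on the slide.
enum class Level { L1, L2, L3, RAM }

const val KIB = 1024

// Sizes as listed on the slide: 32 KiB L1 and 256 KiB L2 per core, 2 MiB shared L3.
// A simplification: real hit rates also depend on associativity and other traffic.
fun smallestLevelHolding(workingSetBytes: Int): Level = when {
    workingSetBytes <= 32 * KIB -> Level.L1
    workingSetBytes <= 256 * KIB -> Level.L2
    workingSetBytes <= 2 * 1024 * KIB -> Level.L3
    else -> Level.RAM
}

fun main() {
    // 16 floats of 4 bytes each fit comfortably in L1.
    println(smallestLevelHolding(16 * 4))
    // A 1 MiB buffer no longer fits in L1 or L2 under this model.
    println(smallestLevelHolding(1024 * KIB))
}
```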

  172. private val data = FloatArray(16)
    // …
    val a = data[0]
    L1
    L2
    L3
    RAM
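
A read that misses in every cache level brings in a whole cache line, not a single float. The sketch below works through that arithmetic under two stated assumptions: a 64-byte line (the figure that appears on the slides that follow) and an array whose first element happens to be line-aligned, with the object header ignored. `floatsPerLine` and `linesTouched` are hypothetical helpers.

```kotlin
const val CACHE_LINE_BYTES = 64 // assumption: line size, taken from the "64 bytes" label in the deck
const val FLOAT_BYTES = 4       // a Kotlin Float on the JVM/ART is 32 bits

// Number of floats that fit in one cache line.
fun floatsPerLine(): Int = CACHE_LINE_BYTES / FLOAT_BYTES

// Distinct cache lines touched when reading `count` consecutive floats starting at
// element `start`, assuming element 0 sits at the start of a line (header ignored).
fun linesTouched(start: Int, count: Int): Int {
    require(start >= 0 && count > 0) { "start must be >= 0 and count > 0" }
    val first = start / floatsPerLine()
    val last = (start + count - 1) / floatsPerLine()
    return last - first + 1
}

fun main() {
    println(floatsPerLine())     // 16
    println(linesTouched(0, 16)) // 1
    println(linesTouched(8, 16)) // 2
}
```

Under these assumptions the payload of `FloatArray(16)` is exactly one line, so after the first read of `data[0]` the remaining fifteen elements are already in L1.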

  177. val m = ArrayList<FloatArray>(n)
    64 bytes

  178. val m = ArrayList<FloatArray>(n)
    m[0] = FloatArray(4)
    m[1] = FloatArray(4)
    m[2] = FloatArray(4)
    // …
    m[n - 1] = FloatArray(4)
    64 bytes

  179. val m = ArrayList<FloatArray>(n)
    m[0] = FloatArray(4)
    m[1] = FloatArray(4)
    m[2] = FloatArray(4)
    // …
    m[n - 1] = FloatArray(4)
    RAM
    64 bytes

  180. m[0] FloatArray(4)
    m[1] FloatArray(4)
    m[2] FloatArray(4)
    …
    m[n - 1] FloatArray(4)
    val m = ArrayList<FloatArray>(n)
    for (i in 0 until m.size - 3) {
        val a = m[i]
        val b = m[i + 1]
        val c = m[i + 2]
        val d = m[i + 3]
        computeStuff(a, b, c, d)
    }
    L1
    64 bytes
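
The loop on this slide can be tried as written once the list is actually populated. The following is a sketch under assumptions: `computeStuff` is a hypothetical stand-in (the slides do not show its body) and `buildRows` and `slidingSum` are names introduced here. Allocating the rows back to back makes it likely, though not guaranteed, that neighbouring list entries are also neighbours in memory.

```kotlin
// Hypothetical stand-in for the slide's computeStuff(): sums all sixteen floats.
fun computeStuff(a: FloatArray, b: FloatArray, c: FloatArray, d: FloatArray): Float =
    a.sum() + b.sum() + c.sum() + d.sum()

// Builds n four-float rows one after another; row i is filled with the value i.
fun buildRows(n: Int): ArrayList<FloatArray> {
    val m = ArrayList<FloatArray>(n)
    for (i in 0 until n) m.add(FloatArray(4) { i.toFloat() })
    return m
}

// Same sliding window as the slide: each step reads four consecutive rows.
fun slidingSum(m: List<FloatArray>): Float {
    var total = 0f
    for (i in 0 until m.size - 3) {
        total += computeStuff(m[i], m[i + 1], m[i + 2], m[i + 3])
    }
    return total
}

fun main() {
    println(slidingSum(buildRows(1_000)))
}
```

Because each window overlaps the previous one by three rows, most of the data needed for a step was already fetched by the step before it, provided the rows sit close together.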

  186. val m = ArrayList<FloatArray>(n)
    m[0] = FloatArray(4)
    m[1] = FloatArray(4)
    m[2] = FloatArray(4)
    // …
    m[n - 1] = FloatArray(4)
    RAM
    64 bytes

  187. m[0] FloatArray(4)
    m[1] FloatArray(4)
    m[2] FloatArray(4)
    …
    m[n - 1] FloatArray(4)
    val m = ArrayList<FloatArray>(n)
    for (i in 0 until m.size - 3) {
        val a = m[i]
        val b = m[i + 1]
        val c = m[i + 2]
        val d = m[i + 3]
        computeStuff(a, b, c, d)
    }
    L1
    64 bytes

  196. [Bar chart] Relative computation times (Pixel 3)
    Bars: No thrash, L1 thrash, L2 thrash
    Axis: 0.0 to 6.0

  197. On some workloads, the work of
    the GC will affect performance

  198. You might be benchmarking
    perfect memory access patterns
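
One way to check whether a benchmark only ever sees an ideal layout is to run the same work over the same objects in a scrambled order. The sketch below is an illustration under assumptions, not the benchmark used in the talk: `allocateRows` and `sumRows` are hypothetical names, shuffling the list is only a rough stand-in for a heap where neighbouring entries are far apart, and the timings depend on the device, the runtime and JIT warm-up.

```kotlin
import kotlin.random.Random
import kotlin.system.measureNanoTime

// Allocates n small arrays back to back, so list order matches allocation order.
fun allocateRows(n: Int): List<FloatArray> = List(n) { i -> FloatArray(4) { i.toFloat() } }

// Touches every float once, in list order.
fun sumRows(rows: List<FloatArray>): Float {
    var total = 0f
    for (row in rows) for (v in row) total += v
    return total
}

fun main() {
    val inOrder = allocateRows(1_000_000)
    // Same objects, visited in a random order.
    val scattered = inOrder.shuffled(Random(42))

    var sink = 0f // keeps results live so the work is not optimized away
    repeat(5) { sink += sumRows(inOrder) + sumRows(scattered) } // warm-up

    val inOrderNs = measureNanoTime { sink += sumRows(inOrder) }
    val scatteredNs = measureNanoTime { sink += sumRows(scattered) }
    println("in order: ${inOrderNs / 1_000} us, scattered: ${scatteredNs / 1_000} us (sink=$sink)")
}
```

If the two timings differ noticeably, the in-order figure alone would have overstated how the code behaves once the heap no longer matches allocation order.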
