Talking Trash: The Evolution of Garbage Collection on Android

Romain Guy
October 27, 2018

Learn how the Android garbage collector has evolved from Dalvik to ART. This talk explains why you should worry more about good code than avoiding allocations.

Transcript

  1. Talking Trash @chethaase & @romainguy

  3. “Garbage in, garbage out” But how fast?

  4. Modern Android Development Chet Haase @chethaase Romain Guy @romainguy

  6. Modern Android Development Chet Haase @chethaase Romain Guy @romainguy Blah blah blah memory blah blah...

  7. Dalvik: Optimized for size. JIT optimizations not as powerful. Allocation/collection slow. Heap fragmentation.
  8. So: Avoid allocation whenever possible (for example: enums). Primitive types are cool (autoboxing is not).
  11. ART: Optimized for performance. JIT + AOT. Faster allocation/collection. Heap defragmentation. Large object heap. So: Allocate as necessary (yes, even enums). Use appropriate types. But: phones are still constrained, batteries are, too. Be aware of inner-loop bottlenecks.
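The "even enums" advice can be made concrete with a small comparison. The sketch below is an illustration only: the class, constant, and method names are hypothetical and do not come from the talk. It shows the int-constant style that Dalvik-era guidance favoured next to the enum style that the ART slides say is fine to use.

```java
// Hypothetical example: none of these names come from the talk.
final class EnumVsIntDemo {
    // Dalvik-era style: plain int constants. No objects, but no type safety either.
    static final int MODE_FAST = 0;
    static final int MODE_ACCURATE = 1;

    // Enum style: type-safe. The two instances are created once,
    // when the enum class is initialized, not on every use.
    enum Mode { FAST, ACCURATE }

    static String describe(int mode) {
        switch (mode) {
            case MODE_FAST: return "fast";
            case MODE_ACCURATE: return "accurate";
            default: throw new IllegalArgumentException("unknown mode " + mode);
        }
    }

    static String describe(Mode mode) {
        switch (mode) {
            case FAST: return "fast";
            case ACCURATE: return "accurate";
            default: throw new AssertionError(mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(MODE_FAST));
        System.out.println(describe(Mode.ACCURATE));
    }
}
```

The int overload accepts any integer and can only reject bad values at run time, while the enum overload rejects them at compile time.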
  18. Memories: int foo = 5; MyObject thing = new MyObject(); Stack / Registers: foo. Heap: thing.
  24. Manual Garbage Collection (C++)

    MyObject* thing = new MyObject;
    // code using thing...
    delete thing;
    // code using thing

    Omit the delete: Leak! Use thing after the delete: Crash! (maybe)
  28. Automatic Garbage Collection (Java)

    MyObject thing = new MyObject();
    // code using thing…
    // eventually freed
    // code using thing…

    No leak! No crash!
  34. Runtime GC Concerns: How long does allocation take? How long does collection take? What impact does this have across all threads? When do collections happen? How efficient is heap usage?
  38. Dalvik GC: Allocation

  45. Dalvik GC: Collection. Mark root set (pause). Mark reachable I (concurrent). Mark reachable II (pause). Collect (concurrent).
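The mark and collect phases listed above follow the general mark-and-sweep idea. The sketch below is a toy, single-threaded version over an explicit object graph. It is a teaching aid under stated assumptions, not Dalvik's or ART's implementation: the `Obj` type, the `heap` list, and the method names are all hypothetical, and the pause/concurrent distinction is not modelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy mark-and-sweep over an explicit object graph. Hypothetical names;
// this is not how Dalvik or ART implement collection.
final class ToyMarkSweep {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: flag everything reachable from the root set.
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;      // already visited
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    // Sweep phase: remove unmarked objects from the heap and clear the marks
    // of the survivors. Returns the names of the collected objects.
    static List<String> sweep(List<Obj> heap) {
        List<String> freed = new ArrayList<>();
        heap.removeIf(o -> {
            if (o.marked) { o.marked = false; return false; }
            freed.add(o.name);
            return true;
        });
        return freed;
    }

    static List<String> demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);                   // a -> b, reachable from the root
        c.refs.add(d);                   // c and d reference each other,
        d.refs.add(c);                   // but no root points at them
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Arrays.asList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("collected: " + demo());
    }
}
```

The unreachable cycle between `c` and `d` is collected because reachability is decided from the roots, not from reference counts.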
  46. Dalvik GC: Allocation, Take II

  49. Dalvik GC: Collection: GC_FOR_ALLOC

  52. Dalvik GC: Allocation

  53. Dalvik GC: Allocation, Take III

  56. Dalvik GC: Grow the Heap or… Out of Memory Error
  57. Fragmentation! (Kitkat) Heap

  65. Sad Logcat

    D/dalvikvm: GC_FOR_ALLOC freed 2K, 50% free 197613K/392536K, paused 6ms, total 6ms
    I/dalvikvm-heap: Forcing collection of SoftReferences for 2000012-byte allocation
    D/dalvikvm: GC_BEFORE_OOM freed 0K, 50% free 197613K/392536K, paused 6ms, total 6ms
    E/dalvikvm-heap: Out of memory on a 2000012-byte allocation.
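The log above reports a heap that is 50% free and still cannot satisfy one allocation. A toy model helps show how that can happen in a heap that never moves objects. The sketch below is an assumption-laden illustration: the heap is modelled as an array of equal slots, and the class and method names are hypothetical. Real allocators track free lists rather than booleans.

```java
// Toy model of a non-moving heap as an array of fixed-size slots.
// Hypothetical sketch; real allocators do not work on boolean arrays.
final class FragmentationDemo {
    // Total number of free slots.
    static int freeSlots(boolean[] used) {
        int free = 0;
        for (boolean u : used) {
            if (!u) free++;
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // Without moving objects, a request must fit in one contiguous run.
    static boolean canAllocate(boolean[] used, int slots) {
        return largestFreeRun(used) >= slots;
    }

    // Every other slot is live: half the heap is free, but only in 1-slot holes.
    static boolean[] checkerboard(int size) {
        boolean[] used = new boolean[size];
        for (int i = 0; i < size; i += 2) {
            used[i] = true;
        }
        return used;
    }

    public static void main(String[] args) {
        boolean[] heap = checkerboard(16);
        System.out.println("free slots: " + freeSlots(heap));
        System.out.println("largest free run: " + largestFreeRun(heap));
        System.out.println("can allocate 2 slots: " + canAllocate(heap, 2));
    }
}
```

Half of the slots are free, yet a two-slot request fails because no two free slots are adjacent.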
  70. ART (Lollipop): Faster allocation! Faster collection! Faster runtime!

  77. ART Allocation: RosAlloc. Replacement for dlmalloc. Thread-local allocations. Grouped small allocations, page-aligned large allocations. Finer-grained locks. 4-5x faster than Dalvik!
  84. ART Allocation: Large object space. In Dalvik

  90. ART Allocation: Large object space. In ART. Moving collector! No more fragmentation!* (* Eventually)
  92. Fragmentation! (L+) Heap

  98. Dalvik GC: Mark root set (pause). Mark reachable I (concurrent). Mark reachable II (pause). Collect (concurrent). ~10ms

  103. ART GC: Mark root set (concurrent). Mark reachable I (concurrent). Mark reachable II (pause). Collect (concurrent). Faster! ~3ms
  112. ART Collection. Minor GC: fast collection of “young generation”, temporary objects less expensive. Large object heap: less fragmentation, less heap resizing, fewer GC_FOR_ALLOC pauses. Faster runtime.
  113. ART in Marshmallow: Optimizing compiler. Allocation optimizations.

  119. ART in Nougat: More inlining and optimizations. Allocation: rewritten in assembly, 10x faster than Dalvik (Kitkat).
  127. ART in Oreo: Concurrent heap compaction. Defragmentation in foreground! Less heap resizing, GC_FOR_ALLOC. Device-wide memory savings. System and Google Play Services. Smaller heaps for all.
  130. Concurrent Compaction. Heap: T0 Region, T1 Region, T2 Region, T3 Region, ..., Tn Region. Compaction Phase
  134. ART in Oreo: Thread-local bump allocator. 70% faster allocations than Nougat. 18x faster than Dalvik (Kitkat).
  137. Concurrent Compaction: Allocation. Heap: T0 Region, T1 Region, T2 Region, T3 Region, ..., Tn Region, All-thread Heap. T1 Region: Free Pointer
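The per-thread region with a free pointer describes bump allocation: handing out memory is a bounds check plus a pointer increment. The sketch below is a minimal model of that idea, assuming each thread owns its region so no lock is needed. The class name `BumpRegion` and its methods are hypothetical and are not ART's code.

```java
// Toy bump-pointer allocator over one region, assuming each thread owns its
// own region so no locking is needed. Hypothetical names; not ART's code.
final class BumpRegion {
    private final int capacity;
    private int freePointer = 0;         // offset of the next free byte

    BumpRegion(int capacity) {
        this.capacity = capacity;
    }

    // Returns the offset of the new block, or -1 if it does not fit.
    int allocate(int size) {
        if (size < 0 || size > capacity - freePointer) {
            return -1;
        }
        int offset = freePointer;
        freePointer += size;             // allocation is just a pointer bump
        return offset;
    }

    int remaining() {
        return capacity - freePointer;
    }

    public static void main(String[] args) {
        BumpRegion region = new BumpRegion(64);
        System.out.println(region.allocate(16));
        System.out.println(region.allocate(16));
        System.out.println(region.allocate(48));
    }
}
```

A scheme like this cannot reuse individual freed blocks on its own, which is why it pairs naturally with a collector that compacts or evacuates whole regions.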
  143. Allocation Improvements

  147. ART in O+: Young generation collections gone in O. Enabled in AOSP. Watch for that future release…
  152. Object Pools. Conventional wisdom: Reusing objects is faster (saves on allocation/collection time). Actual wisdom: As of Oreo, synchronized object pools are generally slower.
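To make the trade-off concrete, here is what a minimal synchronized pool might look like. This is a sketch under assumptions: `PointPool` and `Point` are hypothetical names, and no timing claim is made by the code itself. The point is only that every acquire and release goes through a lock, whereas plain allocation does not.

```java
import java.util.ArrayDeque;

// Minimal synchronized object pool, shown only to make the trade-off concrete.
// Hypothetical sketch: every acquire and release takes the pool's lock.
final class PointPool {
    static final class Point {
        float x;
        float y;
    }

    private final ArrayDeque<Point> free = new ArrayDeque<>();

    synchronized Point acquire() {
        Point p = free.poll();           // null when the pool is empty
        return p != null ? p : new Point();
    }

    synchronized void release(Point p) {
        p.x = 0f;                        // reset state before reuse
        p.y = 0f;
        free.push(p);
    }

    public static void main(String[] args) {
        PointPool pool = new PointPool();
        Point pooled = pool.acquire();
        pool.release(pooled);
        System.out.println(pool.acquire() == pooled);

        Point plain = new Point();       // the alternative: just allocate
        System.out.println(plain.x);
    }
}
```

Whether a pool like this helps or hurts on a given device is something to measure rather than assume.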
  153. Soooo… What Now? Creating garbage is okay (and so is collecting it). Use the types and objects you need. Even enums. GC is still overhead, but not as critical to avoid as it was in Dalvik. Make the right choices for your architecture. Avoid overhead in critical sections when possible.
  158. Jank Test: Autoboxing

    private Float[] mHolder = new Float[100_000];

    public void run() {
        long startTime = System.currentTimeMillis();
        float f = 0f;
        for (int i = 0; i < mHolder.length; ++i, f += 1.0f) {
            mHolder[i] = f;
        }
        System.out.println("Alloc time = " + (System.currentTimeMillis() - startTime));
    }

    I/System.out: Alloc time = 28
    D/dalvikvm: GC_FOR_ALLOC freed 2047K, 1% free 337371K/339492K, paused 10ms, total 10ms
    I/System.out: Alloc time = 29
    I/System.out: Alloc time = 3
    I/System.out: Alloc time = 2
    I/System.out: Alloc time = 4
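For comparison with the autoboxing test, the sketch below runs the same loop twice, once storing into a `Float[]` and once into a `float[]`. The class and method names are hypothetical, and the timings are printed but deliberately not interpreted, since they depend on the runtime and device.

```java
// The jank-test loop, once with boxed Float and once with primitive float.
// Hypothetical sketch; timings are printed but not asserted.
final class BoxingDemo {
    // Each store boxes the float, typically allocating one Float per element.
    static Float[] fillBoxed(int n) {
        Float[] holder = new Float[n];
        float f = 0f;
        for (int i = 0; i < n; ++i, f += 1.0f) {
            holder[i] = f;               // implicit Float.valueOf(f)
        }
        return holder;
    }

    // One array allocation in total; the values are stored inline.
    static float[] fillPrimitive(int n) {
        float[] holder = new float[n];
        float f = 0f;
        for (int i = 0; i < n; ++i, f += 1.0f) {
            holder[i] = f;
        }
        return holder;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        fillBoxed(100_000);
        long boxedNs = System.nanoTime() - start;

        start = System.nanoTime();
        fillPrimitive(100_000);
        long primitiveNs = System.nanoTime() - start;

        System.out.println("boxed ns = " + boxedNs + ", primitive ns = " + primitiveNs);
    }
}
```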
  162. Jank Test: Bitmaps

    public void run() {
        long startTime = System.currentTimeMillis();
        mBitmap = Bitmap.createBitmap(1_000, 1_000, Bitmap.Config.ARGB_8888);
        System.out.println("Alloc time = " + (System.currentTimeMillis() - startTime));
    }

    I/System.out: Alloc time = 16
    D/dalvikvm: GC_FOR_ALLOC freed 3907K, 2% free 341280K/347244K, paused 7ms, total 7ms
    I/dalvikvm-heap: Grow heap (frag case) to 337.165MB for 4000012-byte allocation
    D/dalvikvm: GC_FOR_ALLOC freed <1K, 1% free 345186K/347244K, paused 7ms, total 7ms
    I/System.out: Alloc time = 1
    I/System.out: Alloc time = 0
    I/System.out: Alloc time = 0
    I/System.out: Alloc time = 1
  164. data class Float3(val x: Float, val y: Float, val z: Float)

    fun Tonemap_ACES(x: Float3): Float3 {
        val a = 2.51f
        val b = 0.03f
        val c = 2.43f
        val d = 0.59f
        val e = 0.14f
        return (x * (a * x + b)) / (x * (c * x + d) + e)
    }
  165. inline operator fun Float.plus(v: Float3) = Float3(this + v.x, this + v.y, this + v.z)

    inline operator fun Float.times(v: Float3) = Float3(this * v.x, this * v.y, this * v.z)
  170. Android O, 1 tile = 00.1~00.5s. Android K, 1 tile = 40.0~50.0s
  171. 10-23 14:40:04.997 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 4ms, total 4ms

    10-23 14:40:05.067 3885-3908/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    10-23 14:40:05.147 3885-3901/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
    [the same GC_FOR_ALLOC message repeats for every entry in between, freeing 711K to 713K with pauses of 2ms to 5ms each]
    10-23 14:40:08.337 3885-3902/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 4ms, total 4ms
  172. 10-23 14:40:24.847 3885-3909/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 3ms, total 3ms

    10-23 14:40:24.917 3885-3903/com.google.ray_trasher D/dalvikvm: GC_FOR_ALLOC freed 712K, 14% free 5124K/5912K, paused 2ms, total 2ms
  174. Benchmarks, GC and Caches

  178. Kryo 385 (Pixel 3) “Gold” cores: 4 cores. L1: 4x 32 KiB. L2: 4x 256 KiB. L3: 1x 2 MiB.
  179. L1, L2, L3, RAM: private val data = FloatArray(16) // … val a = data[0]
  184. 64 bytes

  187. val m = ArrayList<FloatArray>(n); m[0] = FloatArray(4), m[1] = FloatArray(4), m[2] = FloatArray(4), ..., m[n] = FloatArray(4). RAM. 64 bytes

  188. m[0] FloatArray(4), m[1] FloatArray(4), m[2] FloatArray(4), ..., m[n] FloatArray(4). L1. 64 bytes

    val m = ArrayList<FloatArray>(n)
    for (i in 0 until m.size - 3) {
        val a = m[i]
        val b = m[i + 1]
        val c = m[i + 2]
        val d = m[i + 3]
        computeStuff(a, b, c, d)
    }
  204. [Chart: Relative computation times (Pixel 3). Bars: No thrash, L1 thrash, L2 thrash. Axis from 0.0 to 6.0]

  205. On some workloads, the work of the GC will affect performance
  206. You might be benchmarking perfect memory access patterns
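One way to avoid measuring only the ideal layout is to vary the order in which small arrays are visited. The sketch below assumes that arrays allocated back to back tend to sit near each other, so walking them in allocation order is cache-friendly, while a shuffled order approximates a heap where list neighbours are far apart. That is an assumption about memory layout, not a description of the speakers' benchmark, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: same data, same work, two visiting orders.
// Assumption: allocation order roughly matches memory order.
final class AccessPatternDemo {
    static List<float[]> build(int n) {
        List<float[]> m = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            m.add(new float[] { i, i, i, i });
        }
        return m;
    }

    // Touch every element of every array, in list order.
    static double sum(List<float[]> m) {
        double total = 0;
        for (float[] v : m) {
            for (float x : v) {
                total += x;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<float[]> ordered = build(100_000);
        List<float[]> shuffled = new ArrayList<>(ordered);
        Collections.shuffle(shuffled, new Random(42)); // fixed seed: repeatable order

        long t0 = System.nanoTime();
        double a = sum(ordered);
        long t1 = System.nanoTime();
        double b = sum(shuffled);
        long t2 = System.nanoTime();

        System.out.println("same result: " + (a == b));
        System.out.println("ordered ns = " + (t1 - t0) + ", shuffled ns = " + (t2 - t1));
    }
}
```

A single timed pass like this is not a rigorous benchmark. It only shows how the visiting order can be varied while the computed result stays the same.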

  207. Questions?