Dissecting Andy: The Dalvik VM under the microscope

Takipi
October 22, 2013


An in-depth look at the Dalvik VM -- the state-of-the-art mobile virtual machine at the core of the Android platform.

Key points:
* The major differences between Dalvik and other prominent VMs such as HotSpot.
* How Dalvik was designed for optimal performance under strict memory, energy consumption and processing power constraints.
* How the Dalvik VM is built into the Android OS.
* Important things every developer should know when writing for the Android platform.

Transcript

  1. None
  2. About me • Niv Steingarten (pronounced Neev) • Co-Founder at

    Takipi
  3. Previously • Led development of AutoCAD WS (10M users on

    Android & iOS) • Principal Eng. at VisualTao (acquired by Autodesk Inc., 2009) • Researcher in elite IDF tech unit
  4. Agenda • Introduction • Dalvik is a unique VM •

    Life on a mobile device • Development tips
  5. Introduction

  6. Dalvik (the VM) • The VM which powers Android •

    Optimized for mobile devices • Very different from HotSpot • Integrated into the OS
  7. Dalvík (the village)

  8. General Android Architecture

  9. Agenda • Introduction • Dalvik is a unique VM •

    Life on a mobile device • Development tips
  10. Dalvik is a unique VM

  11. What makes Dalvik different? • Optimized for mobile devices ◦

    Memory constraints ▪ Small RAM ▪ No swap file! ◦ CPU constraints ◦ Storage constraints
  12. What makes Dalvik different? • Optimized for mobile devices ◦

    Memory constraints ◦ CPU constraints ▪ Relatively slow processor ▪ Small cache ◦ Storage constraints
  13. What makes Dalvik different? • Optimized for mobile devices ◦

    Memory constraints ◦ CPU constraints ◦ Storage constraints ▪ Small internal storage ▪ External storage not always available
  14. What makes Dalvik different? • Optimized for mobile devices ◦

    Memory constraints ◦ CPU constraints ◦ Storage constraints ◦ Power constraints ▪ CPU drains battery
  15. What makes Dalvik different? • Designed to be able to

    run efficiently on an extremely wide variety of hardware specs
  16. What makes Dalvik different? • Designed to be able to

    run efficiently on an extremely wide variety of hardware specs ◦ RAM: 32 MB ~ 2 GB ◦ CPU: 200 MHz ~ 2 GHz multi-core ◦ Storage: 32 MB ~ 100s of GBs
  17. What makes Dalvik different? • Host all application processes

  18. What makes Dalvik different? • Host all application processes ◦

    Thus, it is integrated into the OS ◦ Device runs many VMs concurrently ◦ Even internal applications ◦ Apps must be responsive
  19. Dalvik is not a JVM! (details to follow...)

  20. Agenda • Introduction • Dalvik is a unique VM •

    Life on a mobile device • Development tips
  21. Detour

  22. What's in a JVM?

  23. What's in a JVM? • Runtime libraries • Garbage collector

    • Class loading mechanism • Java bytecode interpreter ◦ Optionally: JIT compiler, multiple GCs, debugger...
  24. Class life (.jar) Java / Scala / Clojure Compiler .class

    .class .class
  25. Class life Java / Scala / Clojure Compiler .class .class

    .class Class Loader Bytecode Interpreter
  26. Class life Java / Scala / Clojure Compiler .class .class

    .class Class Loader Bytecode Interpreter JIT Compiler
  27. Class file • Constant pool ◦ String literals ◦ Number

    constants ◦ Identifiers • Method code • Fields • More...
  28. Java bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } }
  29. Java bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } ILOAD 0 ILOAD 1 IF_ICMPGE L1 ILOAD 0 IRETURN L1: ILOAD 1 IRETURN
  30. Java bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } ILOAD 0 ILOAD 1 IF_ICMPGE L1 ILOAD 0 IRETURN L1: ILOAD 1 IRETURN *a
  31. Java bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } ILOAD 0 ILOAD 1 IF_ICMPGE L1 ILOAD 0 IRETURN L1: ILOAD 1 IRETURN *b *a
  32. Java bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } ILOAD 0 ILOAD 1 IF_ICMPGE L1 ILOAD 0 IRETURN L1: ILOAD 1 IRETURN *b *a
  33. Java bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } ILOAD 0 ILOAD 1 IF_ICMPGE L1 ILOAD 0 IRETURN L1: ILOAD 1 IRETURN
  34. Java bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } ILOAD 0 ILOAD 1 IF_ICMPGE L1 ILOAD 0 IRETURN L1: ILOAD 1 IRETURN *b
  35. Java bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } ILOAD 0 ILOAD 1 IF_ICMPGE L1 ILOAD 0 IRETURN L1: ILOAD 1 IRETURN *b Return value
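
The stack walkthrough on the slides above can be replayed in plain Java. This is only a toy sketch: `StackMinSketch` is a hypothetical name, and a `Deque` stands in for the operand stack, which is not how a real JVM executes bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy replay of the min() bytecode on an explicit operand stack (not a real JVM). */
class StackMinSketch {
    static int min(int a, int b) {
        int[] locals = {a, b};               // local variable slots 0 and 1
        Deque<Integer> stack = new ArrayDeque<>();

        stack.push(locals[0]);               // ILOAD 0
        stack.push(locals[1]);               // ILOAD 1
        int right = stack.pop();             // IF_ICMPGE pops both operands...
        int left = stack.pop();
        if (left >= right) {                 // ...and jumps to L1 when a >= b
            stack.push(locals[1]);           // L1: ILOAD 1
            return stack.pop();              // IRETURN
        }
        stack.push(locals[0]);               // ILOAD 0
        return stack.pop();                  // IRETURN
    }

    public static void main(String[] args) {
        System.out.println(min(3, 7));
        System.out.println(min(9, 2));
    }
}
```

Every value passes through the stack before it is compared or returned, which is why the listing needs separate load instructions.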
  36. Back to Dalvik

  37. What's in Dalvik?

  38. What's in Dalvik? • Runtime libraries • Garbage

    collector • Class loading mechanism • Java bytecode interpreter ◦ Also: JIT compiler, debugger support
  39. What's in Dalvik? • Runtime libraries ✓ •

    Garbage collector • Class loading mechanism • Java bytecode interpreter ◦ Also: JIT compiler, debugger support
  40. What's in Dalvik? • Runtime libraries ✓ •

    Garbage collector ✓ • Class loading mechanism • Java bytecode interpreter ◦ Also: JIT compiler, debugger support
  41. Class life Java / Scala / Clojure Compiler .class .class

    .class
  42. Class life Java / Scala / Clojure Compiler .class .class

    .class dx classes.dex (.apk)
  43. Class life Java / Scala / Clojure Compiler .class .class

    .class dx classes.dex Optimizer odex Cache
  44. Dalvik bytecode • Typical JVMs are stack machines. • Dalvik

    is a register machine.
  45. Dalvik bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } }
  46. Dalvik bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } 0000: if-ge v0, v1, 0003 0002: return v0 0003: move v0, v1 0004: goto 0002
  47. Dalvik bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } 0000: if-ge v0, v1, 0003 0002: return v0 0003: move v0, v1 0004: goto 0002
  48. Dalvik bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } 0000: if-ge v0, v1, 0003 0002: return v0 0003: move v0, v1 0004: goto 0002
  49. Dalvik bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } 0000: if-ge v0, v1, 0003 0002: return v0 0003: move v0, v1 0004: goto 0002
  50. Dalvik bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } 0000: if-ge v0, v1, 0003 0002: return v0 0003: move v0, v1 0004: goto 0002
  51. Dalvik bytecode public static int min(int a, int b) {

    if (a < b) { return a; } else { return b; } } 0000: if-ge v0, v1, 0003 0002: return v0 0003: move v0, v1 0004: goto 0002
  52. Stack vs. Registers 0000: if-ge v0, v1, 0003 0002: return

    v0 0003: move v0, v1 0004: goto 0002 ILOAD 0 ILOAD 1 IF_ICMPGE L1 ILOAD 0 IRETURN L1: ILOAD 1 IRETURN
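
For comparison, the register listing can be replayed the same way. Again a toy sketch under assumptions: `RegisterMinSketch` is a hypothetical name, the `int[] v` array stands in for the registers v0 and v1, and the `pc` values simply reuse the addresses printed on the slide.

```java
/** Toy replay of the Dalvik listing using registers and a program counter. */
class RegisterMinSketch {
    static int min(int a, int b) {
        int[] v = {a, b};                    // arguments arrive in registers v0 and v1
        int pc = 0;                          // addresses follow the slide's listing
        while (true) {
            switch (pc) {
                case 0:                      // 0000: if-ge v0, v1, 0003
                    pc = (v[0] >= v[1]) ? 3 : 2;
                    break;
                case 2:                      // 0002: return v0
                    return v[0];
                case 3:                      // 0003: move v0, v1
                    v[0] = v[1];
                    pc = 4;
                    break;
                case 4:                      // 0004: goto 0002
                    pc = 2;
                    break;
                default:
                    throw new IllegalStateException("bad pc " + pc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(min(3, 7));
        System.out.println(min(9, 2));
    }
}
```

No operand stack appears anywhere: each instruction names the registers it reads and writes, so the listing has fewer instructions but each one carries more operand information.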
  53. Register (Dalvik) advantages • Smaller memory footprint ◦ Does not

    use auxiliary stack structure
  54. Register (Dalvik) advantages • Smaller memory footprint ◦ Does not

    use auxiliary stack structure • Code is shorter ◦ ~43% fewer opcodes (bytecode “lines”)
  55. Stack (JVM) advantages • Compilation is simpler and faster

  56. Stack (JVM) advantages • Compilation is simpler and faster •

    Overall code size is smaller ◦ ~30% smaller than equiv. register bytecode
  57. Stack (JVM) advantages • Compilation is simpler and faster •

    Overall code size is smaller ◦ ~30% smaller than equiv. register bytecode ▪ 205 JVM opcodes → 1 byte ▪ 276 Dalvik opcodes → 2 bytes + specialized opcodes + register specification
  58. Dalvik JIT

  59. Dalvik JIT • Introduced in Froyo (Android 2.2)

  60. Dalvik JIT • Introduced in Froyo (Android 2.2) • Trace-based

    compilation
  61. Trace-based JIT advantages

  62. Trace-based JIT advantages • Optimization takes effect faster

  63. Trace-based JIT advantages • Optimization takes effect faster • Low

    memory usage
  64. Trace-based JIT advantages • Optimization takes effect faster • Low

    memory usage ◦ Compiling traces requires less RAM ◦ Resulting code is more granular ◦ “Luggage” code is never compiled
  65. The DEX file

  66. The DEX file .class Constant Pool Java bytecode classes.dex Unified

    Constant Pools Dalvik bytecode Dalvik bytecode .class Constant Pool Java bytecode .class Constant Pool Java bytecode
  67. The DEX file • Constants take ~60% of class file

    size
  68. The DEX file • Constants take ~60% of class file

    size • One big constant pool for all classes • This big constant pool is divided into sections for even better space efficiency
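
The idea of one shared constant pool can be illustrated with a small deduplication sketch. It is not the DEX layout itself: `SharedPoolSketch`, `merge` and the sample constants are hypothetical, and a real DEX file splits its pool into typed sections rather than using a single map.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration of merging per-class constant pools into one shared pool. */
class SharedPoolSketch {
    /** Assigns each distinct constant a single index shared by all classes. */
    static Map<String, Integer> merge(List<List<String>> perClassPools) {
        Map<String, Integer> shared = new LinkedHashMap<>();
        for (List<String> pool : perClassPools) {
            for (String constant : pool) {
                // Only the first occurrence gets a slot; repeats reuse it.
                shared.putIfAbsent(constant, shared.size());
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        List<List<String>> pools = Arrays.asList(
                Arrays.asList("java/lang/String", "toString", "hello"),
                Arrays.asList("java/lang/String", "toString", "world"));
        int separate = pools.get(0).size() + pools.get(1).size();
        System.out.println(separate + " entries in separate pools, "
                + merge(pools).size() + " in the shared pool");
    }
}
```

Constants that several classes repeat, such as type and method names, are stored once, which is where the space saving comes from.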
  69. The DEX file System libraries Classes: 21.4 MB JAR: 10.7

    MB (50%) Browser app Classes: 470.3 KB JAR: 232.1 KB (49%)
  70. The DEX file System libraries Classes: 21.4 MB JAR: 10.7

    MB (50%) DEX: 10.3 MB (48%) Browser app Classes: 470.3 KB JAR: 232.1 KB (49%) DEX: 209.2 KB (44%)
  71. Reminder: Dalvik class life .class .class .class dx classes.dex Optimizer

    odex Cache
  72. The ODEX file • When an app is installed, its

    DEX file is preprocessed.
  73. The ODEX file • When an app is installed, its

    DEX file is preprocessed. ◦ Word alignment, padding, endianness ◦ Bytecode verification
  74. The ODEX file • When an app is installed, its

    DEX file is preprocessed. ◦ Word alignment, padding, endianness ◦ Bytecode verification ◦ Native library call inlining ◦ Method references → vtable indices ◦ Field references → internal byte offsets
  75. The ODEX file • When an app is installed, its

    DEX file is preprocessed. ◦ Word alignment, padding, endianness ◦ Bytecode verification ◦ Native library call inlining ◦ Method references → vtable indices ◦ Field references → internal byte offsets ▪ More: Dead code removal, integral type coalescing...
  76. The ODEX file • It is then cached to internal

    storage. • From now on, it can be quickly loaded into a VM's memory with minimum overhead.
  77. The Zygote Premises:

  78. The Zygote Premises: • Most apps use the same core

    libraries • Starting a VM instance is costly • RAM is scarce
  79. The Zygote The Zygote is a special VM process.

  80. The Zygote The Zygote is a special VM process. •

    Born shortly after Android boots up • A "warmed up" VM ◦ System libraries are loaded and initialized • Ready to be forked on demand
  81. The Zygote Runtime library DEX Shared Heap (loaded libs, lib

    structures) The Zygote
  82. The Zygote The Zygote Runtime library DEX Shared Heap (loaded

    libs, lib structures) Application DEX Angry Birds Application Heap
  83. The Zygote The Zygote Runtime library DEX Shared Heap (loaded

    libs, lib structures) Application DEX Angry Birds Application Heap Application DEX Chrome Application Heap
  84. The Zygote The Zygote Runtime library DEX Shared Heap (loaded

    libs, lib structures) Application DEX Angry Birds Application Heap Application DEX Chrome Application Heap
  85. The Zygote The Zygote Runtime library DEX Shared Heap (loaded

    libs, lib structures) Application DEX Angry Birds Application Heap Application DEX Chrome Application Heap
  86. The Zygote The Zygote Runtime library DEX Shared Heap (loaded

    libs, lib structures) Application DEX Angry Birds Application Heap Application DEX Chrome Application Heap
  87. The Zygote • Quick VM startup ◦ Improved app startup

    time • Preloaded and initialized libraries • Sharing of memory across VMs • Apps are segregated
  88. The Zygote • Quick VM startup • Preloaded and initialized

    libraries ◦ Improved overall app responsiveness • Sharing of memory across VMs • Apps are segregated
  89. The Zygote • Quick VM startup • Preloaded and initialized

    libraries • Sharing of memory across VMs ◦ Smaller VM memory footprint • Apps are segregated
  90. The Zygote • Quick VM startup • Preloaded and initialized

    libraries • Sharing of memory across VMs • Apps are segregated ◦ Utilize Linux kernel security model
  91. Summary Conserve RAM by...

  92. Summary Conserve RAM by... • Using register-based bytecode • Using

    trace-based JIT compilation • Merging class files into a single .dex file • Sharing memory between processes • Mapping loaded .dex bytecode to files
  93. Summary Conserve CPU (and battery) by...

  94. Summary Conserve CPU (and battery) by... • Using register-based bytecode

    • Optimizing .dex files ◦ Perform platform optimization ◦ Optimize during installation, instead of at runtime • Forking the Zygote, reducing startup overhead
  95. Agenda • Introduction • Dalvik is a unique VM •

    Life on a mobile device • Development tips
  96. Development tips

  97. Don’t grind water • Prefer signaling mechanisms over polling.

  98. Don’t grind water • Prefer signaling mechanisms over polling. •

    Try to do minimum work when there's no user, network or sensor input.
  99. Don’t grind water • Prefer signaling mechanisms over polling. •

    Try to do minimum work when there's no user, network or sensor input. • Monitor the state of the battery ◦ Lengthen polling cycles if necessary ◦ Turn off background services
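
One way to read "signaling over polling" in plain Java is a blocking queue: the consumer sleeps inside `take()` until a producer hands it an event, instead of waking up on a timer to check. A minimal sketch, where `SignalNotPollSketch` and the `"sensor-update"` event are hypothetical:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Consumer blocks until signaled instead of waking up periodically to check. */
class SignalNotPollSketch {
    static String awaitEvent(BlockingQueue<String> events) throws InterruptedException {
        // take() parks the thread; no CPU is used until an element arrives.
        return events.take();
    }

    /** Runs one producer and one consumer and returns the delivered event. */
    static String demo() {
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> events.add("sensor-update"));
        producer.start();
        try {
            String event = awaitEvent(events);
            producer.join();
            return event;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The polling alternative would loop over a check and a `Thread.sleep(...)`, burning wake-ups even when nothing has changed.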
  100. Efficient looping List<Item> list = new ArrayList<Item>(); ... for (Item

    item : list) { ... }
  101. Efficient looping List<Item> list = new ArrayList<Item>(); ... for (Item

    item : list) { ... } List<Item> list = new ArrayList<Item>(); ... int size = list.size(); for (int i = 0; i < size; ++i) { ... } 3x faster!
  102. Efficient looping Item[] array = new Item[...]; ... for (int

    i = 0; i < array.length; ++i) { ... }
  103. Efficient looping Item[] array = new Item[...]; ... for (int

    i = 0; i < array.length; ++i) { ... } Item[] array = new Item[...]; ... for (Item item : array) { ... } JIT can't yet optimize this!
  104. Efficient looping Using an Iterator is still the best (and

    sometimes only) way to iterate over non-ArrayList collections.
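
The loop shapes from these slides, filled in as compilable methods. The class and method names are hypothetical, and the sketch makes no speed measurement of its own; the performance figures are the slides' claims about Dalvik.

```java
import java.util.ArrayList;
import java.util.List;

/** The two loop shapes discussed on the slides, as complete methods. */
class LoopingSketch {
    /** Counted loop over an ArrayList: no Iterator object is allocated. */
    static int totalLengthIndexed(List<String> list) {
        int total = 0;
        int size = list.size();              // read size() once, outside the condition
        for (int i = 0; i < size; ++i) {
            total += list.get(i).length();
        }
        return total;
    }

    /** Enhanced for over a plain array: compiles to index access, no Iterator. */
    static int totalLengthArray(String[] array) {
        int total = 0;
        for (String item : array) {
            total += item.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("dalvik");
        list.add("vm");
        System.out.println(totalLengthIndexed(list));
        System.out.println(totalLengthArray(new String[] {"dalvik", "vm"}));
    }
}
```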
  105. Garbage collections are bad!

  106. Garbage collections are bad! • They cause performance hiccups •

    They are heavy on the CPU • They drain the battery
  107. Garbage collections are bad! • They cause performance hiccups •

    They are heavy on the CPU • They drain the battery Avoid short-lived allocations
  108. Avoid short-lived allocations • Dalvik’s GC is not generational •

    Not optimized for stack allocations as in HotSpot
  109. Avoid short-lived allocations • Try to avoid boxing and unboxing

    ◦ int → Integer → int, etc.
  110. Avoid short-lived allocations • Try to avoid boxing and unboxing

    ◦ int → Integer → int, etc. • If you need to aggregate, consider passing the aggregator as an argument. ◦ List ◦ Set ◦ StringBuilder
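
A minimal sketch of "passing the aggregator as an argument", using `StringBuilder`. The names `AggregatorSketch`, `appendLabel` and `describe` are hypothetical; the point is that the helper writes into the caller's builder rather than returning a fresh `String` on every call.

```java
/** Helper appends into a caller-owned StringBuilder instead of allocating per call. */
class AggregatorSketch {
    /** Writes "name=count;" into the builder supplied by the caller. */
    static void appendLabel(StringBuilder out, String name, int count) {
        out.append(name).append('=').append(count).append(';');
    }

    static String describe(String[] names, int[] counts) {
        StringBuilder out = new StringBuilder();   // one builder shared by all calls
        for (int i = 0; i < names.length; ++i) {
            appendLabel(out, names[i], counts[i]); // int stays primitive, no boxing
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new String[] {"a", "b"}, new int[] {1, 2}));
    }
}
```

The same shape works for a `List` or `Set`: the caller creates the collection once and each helper adds to it.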
  111. RAM is scarce!

  112. RAM is scarce! • Caching wisely can reduce allocations ◦

    Recycle views ◦ Recycle bitmaps
  113. RAM is scarce! • Caching wisely can reduce allocations ◦

    Recycle views ◦ Recycle bitmaps • ...but be wary of large caches
  114. RAM is scarce! • Caching wisely can reduce allocations ◦

    Recycle views ◦ Recycle bitmaps • ...but be wary of large caches • Persist whatever you can
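
One common way to keep a cache from growing without bound is a size-capped LRU map. The sketch below uses only `java.util.LinkedHashMap`; `BoundedCacheSketch` is a hypothetical name and the cap of 2 is arbitrary. On Android, `android.util.LruCache` serves a similar purpose.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU cache with a hard cap on the number of entries. */
class BoundedCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCacheSketch(int maxEntries) {
        super(16, 0.75f, true);              // true = keep entries in access order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;          // evict the least recently used entry
    }

    public static void main(String[] args) {
        BoundedCacheSketch<String, Integer> cache = new BoundedCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");                      // "a" is now the most recently used
        cache.put("c", 3);                   // pushes out "b"
        System.out.println(cache.keySet());
    }
}
```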
  115. RAM is scarce!

  116. RAM is scarce! • Use streams instead of in-memory buffers

    ◦ Decode files directly from file streams ◦ Deserialize structures directly from HTTP streams
  117. RAM is scarce! • Use streams instead of in-memory buffers

    ◦ Decode files directly from file streams ◦ Deserialize structures directly from HTTP streams • For example ◦ BitmapFactory.decodeStream(InputStream is) ◦ MyProtoBuffer.parseFrom(InputStream is)
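
The same principle in plain `java.io`, as a sketch: process the input in fixed-size chunks so memory use does not grow with the input. `StreamingSketch` and `countLines` are hypothetical names, and the `ByteArrayInputStream` in `demo()` only stands in for a file or HTTP stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Processes a stream chunk by chunk instead of loading it into one big buffer. */
class StreamingSketch {
    /** Counts '\n' bytes using a fixed 4 KB buffer, whatever the input size. */
    static int countLines(InputStream in) throws IOException {
        byte[] chunk = new byte[4096];
        int lines = 0;
        int read;
        while ((read = in.read(chunk)) != -1) {
            for (int i = 0; i < read; ++i) {
                if (chunk[i] == '\n') {
                    lines++;
                }
            }
        }
        return lines;
    }

    /** Demo input; a real caller would pass a file or network stream. */
    static int demo() {
        try (InputStream in = new ByteArrayInputStream("a\nb\nc\n".getBytes())) {
            return countLines(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```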
  118. Thanks! Now go and be awesome. niv@takipi.com @takipid www.takipi.com