Slide 1

Slide 1 text

Android Benchmarking and other stories
Iury Souza, Enrique López-Mañas

Slide 2

Slide 2 text

• Mobile stuff @ Klarna
• Currently building a shopping browser
• Loves building tools
@iurysza

Slide 3

Slide 3 text

@eenriquelopez
• Android Freelancer
• Kotlin Weekly maintainer (kotlinweekly.net)
• Kotlin, Android
• Running, finances

Slide 4

Slide 4 text

Introduction
Benchmarking is the practice of comparing business processes and performance metrics to industry bests and best practices from other companies. Dimensions typically measured are quality, time, and cost.

Slide 5

Slide 5 text

Introduction
Benchmarking is a way to test the performance of your application. You can run benchmarks regularly to help analyze and debug performance problems, and to ensure that recent changes don't introduce regressions.

Slide 6

Slide 6 text

Introduction
In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls.

Slide 7

Slide 7 text

Android Benchmarking
• Microbenchmark
• Macrobenchmark
• Jetpack Benchmark
• JankStats

Slide 8

Slide 8 text

Android Profiling
• Android Profiler (since Android Studio 3.0)
• Replaces the Android Monitor tools
• CPU, Memory, Network and Energy profilers
• Profileable apps
• Useful for identifying performance bottlenecks

Slide 9

Slide 9 text

Android Profiling

Slide 10

Slide 10 text

Android Profiling - Memory

Slide 11

Slide 11 text

Android Profiling - Memory allocation

Slide 12

Slide 12 text

Android Profiling - Energy

Slide 13

Slide 13 text

Microbenchmark
• Quickly benchmark your Android native code (Kotlin or Java) from within Android Studio.
• Recommendation: profile your code before writing a benchmark
• Useful for CPU work that is run many times in your app
• Examples: RecyclerView scrolling with one item shown at a time, data conversions/processing.

Slide 14

Slide 14 text

Microbenchmark
• Add dependency:

dependencies {
    androidTestImplementation 'androidx.benchmark:benchmark-junit4:1.1.0-beta03'
}
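
The benchmark module also needs the benchmark instrumentation runner configured; here is a minimal sketch of the module's Gradle config in Kotlin DSL (plugin blocks omitted, not shown in the deck):

android {
    defaultConfig {
        // AndroidBenchmarkRunner checks device state (debuggable build, clocks, etc.)
        // and warns when results are likely to be unstable
        testInstrumentationRunner = "androidx.benchmark.junit4.AndroidBenchmarkRunner"
    }
}

dependencies {
    androidTestImplementation("androidx.benchmark:benchmark-junit4:1.1.0-beta03")
}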

Slide 15

Slide 15 text

Microbenchmark
• Add benchmark:

@RunWith(AndroidJUnit4::class)
class SampleBenchmark {
    @get:Rule
    val benchmarkRule = BenchmarkRule()

    @Test
    fun benchmarkSomeWork() {
        benchmarkRule.measureRepeated {
            doSomeWork()
        }
    }
}

Slide 18

Slide 18 text

Microbenchmark

Slide 19

Slide 19 text

Microbenchmark

// using random with the same seed, so that it generates the same data every run
private val random = Random(0)

// create the array once and just copy it in benchmarks
private val unsorted = IntArray(10_000) { random.nextInt() }

@Test
fun benchmark_quickSort() {
    // creating the variable outside of measureRepeated to be able to assert after done
    var listToSort = intArrayOf()

    benchmarkRule.measureRepeated {
        // copy the array with timing disabled to measure only the algorithm itself
        listToSort = runWithTimingDisabled { unsorted.copyOf() }

        // sort the array in place and measure how long it takes
        SortingAlgorithms.quickSort(listToSort)
    }

    // assert only once not to add overhead to the benchmarks
    assertTrue(listToSort.isSorted)
}

Slide 23

Slide 23 text

Microbenchmark
• Run benchmark:

./gradlew benchmark:connectedCheck
./gradlew benchmark:connectedCheck -P android.testInstrumentationRunnerArguments.class=com.example.benchmark.SampleBenchmark#benchmarkSomeWork

Slide 24

Slide 24 text

Microbenchmark
• Results

Slide 25

Slide 25 text

Macrobenchmark
• Testing larger use cases of the app
• Application startup, complex UI manipulations, running animations

Slide 26

Slide 26 text

Macrobenchmark
• Mark the target app as “profileable” by adding <profileable android:shell="true" /> inside the <application> element of its AndroidManifest.xml, which lets the shell (and thus Macrobenchmark) profile a release build.

Slide 27

Slide 27 text

Macrobenchmark
• Configure benchmark build type:

buildTypes {
    release {
        minifyEnabled true
        shrinkResources true
        proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
    }
    benchmark {
        initWith buildTypes.release
        signingConfig signingConfigs.debug
    }
}

Slide 28

Slide 28 text

Macrobenchmark

Slide 29

Slide 29 text

Macrobenchmark

@LargeTest
@RunWith(AndroidJUnit4::class)
class SampleStartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startup() = benchmarkRule.measureRepeated(
        packageName = TARGET_PACKAGE,
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        setupBlock = {
            // Press home button before each run to ensure the starting activity isn't visible.
            pressHome()
        }
    ) {
        // starts default launch activity
        startActivityAndWait()
    }
}

Slide 34

Slide 34 text

Macrobenchmark
• StartupTimingMetric
• FrameTimingMetric
• TraceSectionMetric (experimental)
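
The deck's startup example above uses StartupTimingMetric; for completeness, here is a minimal sketch of a FrameTimingMetric benchmark that measures frame timings while flinging a list. TARGET_PACKAGE and the "recycler" resource id are assumptions, not from the deck:

import androidx.benchmark.macro.FrameTimingMetric
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

private const val TARGET_PACKAGE = "com.example.app" // assumed package name

@RunWith(AndroidJUnit4::class)
class FrameTimingBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun scrollList() = benchmarkRule.measureRepeated(
        packageName = TARGET_PACKAGE,
        metrics = listOf(FrameTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.COLD,
        setupBlock = { startActivityAndWait() }
    ) {
        // "recycler" is an assumed resource id for the scrolling list
        val list = device.findObject(By.res(packageName, "recycler"))
        // keep the gesture away from the edges so system navigation doesn't swallow it
        list.setGestureMargin(device.displayWidth / 5)
        list.fling(Direction.DOWN)
        device.waitForIdle()
    }
}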

Slide 35

Slide 35 text

Macrobenchmark
• Show results:

Slide 36

Slide 36 text

Macrobenchmark

{
  "context": {
    "build": {
      "brand": "google",
      "device": "blueline",
      "fingerprint": "google/blueline/blueline:12/SP1A.210812.015/7679548:user/release-keys",
      "model": "Pixel 3",
      "version": { "sdk": 31 }
    },
    "cpuCoreCount": 8,
    "cpuLocked": false,
    "cpuMaxFreqHz": 2803200000,
    "memTotalBytes": 3753299968,
    "sustainedPerformanceModeEnabled": false
  },
  "benchmarks": [
    {
      "name": "startup",
      "params": {},
      "className": "com.example.macrobenchmark.startup.SampleStartupBenchmark",
      "totalRunTimeNs": 4975598256,
      "metrics": {
        "timeToInitialDisplayMs": {
          "minimum": 347.881076,
          "maximum": 347.881076,
          "median": 347.881076,
          "runs": [ 347.881076 ]
        }
      },
      "sampledMetrics": {},
      "warmupIterations": 0,
      "repeatIterations": 3,
      "thermalThrottleSleepSeconds": 0
    }
  ]
}

Slide 37

Slide 37 text

JankStats
• New library (released 9th February 2022)
• Built on top of the Android platform
• In-app jank measurement
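
JankStats ships as the androidx.metrics:metrics-performance artifact; a minimal sketch of the dependency declaration in Kotlin DSL (the version shown is the initial alpha from February 2022 and has likely moved on since):

dependencies {
    implementation("androidx.metrics:metrics-performance:1.0.0-alpha01")
}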

Slide 38

Slide 38 text

JankStats

class JankLoggingActivity : AppCompatActivity() {

    private lateinit var jankStats: JankStats

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        // metrics state holder can be retrieved regardless of JankStats initialization
        val metricsStateHolder = PerformanceMetricsState.getForHierarchy(binding.root)

        // initialize JankStats for current window
        jankStats = JankStats.createAndTrack(
            window,
            Dispatchers.Default.asExecutor(),
            jankFrameListener,
        )

        // add activity name as state
        metricsStateHolder.state?.addState("Activity", javaClass.simpleName)
        // ...
    }
}

Slide 40

Slide 40 text

JankStats Reporting

private val jankFrameListener = JankStats.OnFrameListener { frameData ->
    // A real app could do something more interesting, like writing the info
    // to local storage and reporting it later.
    Log.v("JankStatsSample", frameData.toString())
}

Slide 41

Slide 41 text

JankStats Aggregating

override fun onResume() {
    super.onResume()
    jankStatsAggregator.jankStats.isTrackingEnabled = true
}

override fun onPause() {
    super.onPause()
    // Before disabling tracking, issue the report with (optionally) specified reason.
    jankStatsAggregator.issueJankReport("Activity paused")
    jankStatsAggregator.jankStats.isTrackingEnabled = false
}
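
jankStatsAggregator above comes from the JankStats sample code, not from the library itself; here is a rough sketch of what such a wrapper could look like, with names and behavior assumed from the usage shown:

import android.util.Log
import android.view.Window
import androidx.metrics.performance.JankStats
import java.util.concurrent.Executor

// Not library API: counts frames between reports and logs a summary on demand.
class JankStatsAggregator(window: Window, executor: Executor) {
    private var totalFrames = 0
    private var jankyFrames = 0

    val jankStats: JankStats = JankStats.createAndTrack(
        window,
        executor,
        JankStats.OnFrameListener { frameData ->
            totalFrames++
            if (frameData.isJank) jankyFrames++
        }
    )

    fun issueJankReport(reason: String) {
        Log.v("JankStatsAggregator", "$reason: $jankyFrames of $totalFrames frames were janky")
        totalFrames = 0
        jankyFrames = 0
    }
}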

Slide 42

Slide 42 text

JankStats Aggregating

class FrameData(
    /**
     * The time at which this frame began (in nanoseconds)
     */
    val frameStartNanos: Long,

    /**
     * The duration of this frame (in nanoseconds)
     */
    val frameDurationNanos: Long,

    /**
     * Whether this frame was determined to be janky, meaning that its
     * duration exceeds the duration determined by the system to indicate jank
     * (@see [JankStats.jankHeuristicMultiplier])
     */
    val isJank: Boolean,

    /**
     * The UI/app state during this frame. This is the information set by the app, or by
     * other library code, that can be used later, during analysis, to determine what
     * UI state was current when jank occurred.
     *
     * @see PerformanceMetricsState.addState
     */
    val states: List<StateInfo>
)

Slide 43

Slide 43 text

Detecting Regressions in CI

Slide 46

Slide 46 text

Detecting Regressions in CI
- CI (Continuous Integration): a software-engineering practice of merging developer code into a main code base frequently.
- Regression: a return to a former or less developed state; in our case, performance degradation.

Slide 47

Slide 47 text

Detecting Regressions in CI
Why would we need this?

Slide 48

Slide 48 text

Detecting Regressions in CI
A typical regression scenario usually goes like this:
- You're working on something
- Another team (usually QA) warns you about a critical performance issue
- You switch context and start digging into the codebase, not sure where to look
- Pain
- Manual profiling, benchmarking, etc.

Slide 49

Slide 49 text

Detecting Regressions in CI
- Monitoring performance is much easier than profiling.
- Catch problems before they hit users.
- Running benchmarks manually is repetitive and error-prone.
- The output is just a number.
- Ideally, we should automate this process.

Slide 50

Slide 50 text

Detecting Regressions in CI
Let machines do what they're best at!
(source: Kurzgesagt)

Slide 51

Slide 51 text

Detecting Regressions in CI
Example: Monitoring degradation in app start-up time

Slide 52

Slide 52 text

Detecting Regressions in CI
Example: Identifying degradation in app start-up time
Solution: Use Macrobenchmark's StartupTimingMetric

Slide 53

Slide 53 text

Detecting Regressions in CI

Slide 58

Slide 58 text

Detecting Regressions in CI
When to run?
- Every build (beware of resource cost)
- Or maybe every release
Where to run?
- Real devices yield more reliable results
- Firebase Test Lab (FTL)
What to store?
- The performance metric (time in ms)
- The corresponding build number or commit hash
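
One lightweight way to store this (a sketch, not from the talk; the report location, CSV name, and GIT_COMMIT variable are all assumptions): pull the median out of the benchmark JSON shown earlier and append it, keyed by commit, to a CSV that accumulates the time series.

import java.io.File

fun main() {
    // read the macrobenchmark JSON report (assumed to have been pulled from the device)
    val report = File("benchmarkData.json").readText()
    // grab the first "median" value, e.g. timeToInitialDisplayMs in the report above
    val median = Regex("\"median\":\\s*([0-9.]+)")
        .find(report)?.groupValues?.get(1)
        ?: error("no median found in report")
    // CI systems usually expose the commit hash as an environment variable
    val commit = System.getenv("GIT_COMMIT") ?: "unknown"
    File("startup-metrics.csv").appendText("$commit,$median\n")
}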

Slide 59

Slide 59 text

Detecting Regressions in CI
Ok, setup finished. Now what? 🧐

Slide 61

Slide 61 text

Detecting Regressions in CI
- Now comes the detection part.
- There are multiple possible approaches.

Slide 63

Slide 63 text

Detecting Regressions in CI
Compare with the previous result: Don't do this.

Slide 64

Slide 64 text

Detecting Regressions in CI
Why won't a naive approach work?
- Benchmarking values can vary a lot.
- Lots of things can change between runs.

Slide 65

Slide 65 text

Detecting Regressions in CI
Use a threshold value?
- Compare against a manually defined percentage threshold
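
To make this concrete, the threshold check itself is a one-liner (illustrative only; the following slides explain why it falls short):

// flags a regression when the latest result is more than thresholdPct
// percent slower than the previous one
fun isRegression(previousMs: Double, latestMs: Double, thresholdPct: Double = 5.0): Boolean =
    (latestMs - previousMs) / previousMs * 100.0 > thresholdPct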

Slide 69

Slide 69 text

Detecting Regressions in CI
Problems with naive approaches:
- Values are inconsistent between benchmarks
- It may trigger false alerts
- It may miss real regressions

Slide 70

Slide 70 text

Detecting Regressions in CI
We can do better.

Slide 71

Slide 71 text

Detecting Regressions in CI
Constraints:
- Handle temporary instability
- Avoid manual tuning per benchmark
- We want accuracy!

Slide 72

Slide 72 text

Detecting Regressions in CI
Now comes the detection part.

Slide 73

Slide 73 text

Detecting Regressions in CI
Now comes the detection math part.

Slide 74

Slide 74 text

Detecting Regressions in CI
Now comes the detection math part.
We need more context to make a decision.

Slide 76

Slide 76 text

Detecting Regressions in CI
Step fitting algorithm
A statistical approach for detecting jumps or steps in a time series.

Slide 77

Slide 77 text

Detecting Regressions in CI
Step fitting algorithm
- Main objective: increase confidence in detecting regressions
- The sliding window helps you make context-aware decisions
- Use the window width and threshold to fine-tune detection confidence (see the sketch below)
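
The deck doesn't show code for this, so here is a minimal Kotlin sketch of the idea, assuming step fitting here means comparing the means of two adjacent windows of width builds and flagging a step when they differ by more than a threshold:

import kotlin.math.abs

// slide over the series of benchmark results; for each point, compare the mean
// of the `width` builds before it with the mean of the `width` builds from it
// onwards, and flag a step (regression or improvement) when the means differ
// by more than `thresholdMs`
fun detectSteps(resultsMs: List<Double>, width: Int, thresholdMs: Double): List<Int> {
    val steps = mutableListOf<Int>()
    for (i in width..resultsMs.size - width) {
        val before = resultsMs.subList(i - width, i).average()
        val after = resultsMs.subList(i, i + width).average()
        if (abs(after - before) > thresholdMs) steps.add(i)
    }
    return steps
}

fun main() {
    // startup times in ms: stable around 350, then a roughly 40 ms regression
    val startupMs = listOf(348.0, 352.0, 347.0, 351.0, 349.0, 391.0, 388.0, 393.0, 390.0)
    println(detectSteps(startupMs, width = 3, thresholdMs = 20.0)) // prints [4, 5, 6]
}

Several neighboring indices fire while the jump sits inside the window; in practice you would evaluate only the newest build against the last 2×width results, as the recap below describes, and alert once.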

Slide 78

Slide 78 text

Detecting Regressions in CI

Slide 84

Slide 84 text

Recap
Detecting Regressions in CI:
- Automate regression detection on key points of your app
- Use step-fitting instead of naive approaches
- It helps you catch issues before they hit users
- When a new build result is ready, check its benchmark values inside a window of 2×width builds
- If there's a regression or improvement, fire an alert to investigate the performance of the last width builds

Slide 85

Slide 85 text

Recap
Jetpack Benchmark:
• Micro and macro benchmarks
• Instrumentation tests as benchmarks

Slide 86

Slide 86 text

Recap
Profiling:
• Memory, Energy, Network, CPU.
• Profile to identify bottlenecks and implement your benchmarks.

Slide 87

Slide 87 text

Resources
Great article on fighting regressions by Chris Craik: https://bit.ly/3kaidug
Benchmarking official docs: https://bit.ly/3rQRaZ6
JankStats: https://developer.android.com/topic/performance/jankstats

Slide 88

Slide 88 text

Your feedback! bit.ly/benchmarkFeedback

Slide 89

Slide 89 text

Thank you! @iurysza @eenriquelopez