Android Benchmarking and other stories

AndroidMakers 2022

Iury Souza
Enrique López-Mañas

May 15, 2022

Transcript

  1. Android Benchmarking and other stories Iury Souza Enrique López-Mañas

  2. • Mobile stuff @ Klarna • Currently building a shopping

    browser • Loves building tools @iurysza
  3. @eenriquelopez • Android Freelancer • Kotlin Weekly maintainer (kotlinweekly.net) •

    Kotlin, Android • Running, finances.
  4. Introduction Benchmarking is the practice of comparing business processes and

    performance metrics to industry bests and best practices from other companies. Dimensions typically measured are quality, time and cost.
  5. Introduction Benchmarking is a way to test the performance of

    your application. You can regularly run benchmarks to help analyze and debug performance problems and ensure that you don't introduce regressions in recent changes.
  6. Introduction In software engineering, profiling ("program profiling", "software

    profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls.
  7. Android Benchmarking • Microbenchmark • Macrobenchmark • Jetpack Benchmark •

    JankStats
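
    For reference, the dependencies for these libraries look roughly like the sketch below. Only the microbenchmark coordinate and version appear later in this deck; the macrobenchmark and JankStats coordinates and versions are assumptions from around the time of the talk, so check the official docs for current releases.

    // build.gradle.kts sketch; versions are assumptions, see the docs.
    dependencies {
        // Microbenchmark, run as instrumented tests (version from slide 14)
        androidTestImplementation("androidx.benchmark:benchmark-junit4:1.1.0-beta03")
        // Macrobenchmark, typically in its own test module
        implementation("androidx.benchmark:benchmark-macro-junit4:1.1.0-beta03")
        // JankStats
        implementation("androidx.metrics:metrics-performance:1.0.0-alpha01")
    }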
  8. Android Profiling • Android Profiler (since Android Studio

    3.0) • Replaces the Android Monitor tools • CPU, Memory, Network and Energy profilers • Profileable apps • Useful for identifying performance bottlenecks
  9. Android Profiling

  10. Android Profiling - Memory

  11. Android Profiling - Memory allocation

  12. Android Profiling - Energy

  13. Microbenchmark • Quickly benchmark your Android native code (Kotlin or

    Java) from within Android Studio. • Recommendation: profile your code before writing a benchmark • Useful for CPU work that is run many times in your app • Examples: RecyclerView scrolling with one item shown at a time, data conversions/processing.
  14. Microbenchmark • Add dependency:

    dependencies {
        androidTestImplementation 'androidx.benchmark:benchmark-junit4:1.1.0-beta03'
    }
  15. Microbenchmark • Add benchmark:

    @RunWith(AndroidJUnit4::class)
    class SampleBenchmark {
        @get:Rule
        val benchmarkRule = BenchmarkRule()

        @Test
        fun benchmarkSomeWork() {
            benchmarkRule.measureRepeated {
                doSomeWork()
            }
        }
    }
  18. Microbenchmark

  19. Microbenchmark

    // Use Random with a fixed seed, so it generates the same data on every run.
    private val random = Random(0)

    // Create the array once and just copy it in benchmarks.
    private val unsorted = IntArray(10_000) { random.nextInt() }

    @Test
    fun benchmark_quickSort() {
        // Declared outside measureRepeated so we can assert on it when done.
        var listToSort = intArrayOf()

        benchmarkRule.measureRepeated {
            // Copy the array with timing disabled to measure only the algorithm itself.
            listToSort = runWithTimingDisabled { unsorted.copyOf() }
            // Sort the array in place and measure how long it takes.
            SortingAlgorithms.quickSort(listToSort)
        }

        // Assert only once so we don't add overhead to the benchmark.
        assertTrue(listToSort.isSorted)
    }
  23. Microbenchmark • Run benchmark:

    ./gradlew benchmark:connectedCheck
    ./gradlew benchmark:connectedCheck -P android.testInstrumentationRunnerArguments.class=com.example.benchmark.SampleBenchmark#benchmarkSomeWork
  24. Microbenchmark • Results

  25. Macrobenchmark • Testing larger use cases of the app •

    Application startup, complex UI manipulations, running animations
  26. Macrobenchmark • Mark the app as "profileable":

    <!-- enable profiling by macrobenchmark -->
    <profileable
        android:shell="true"
        tools:targetApi="q" />
  27. Macrobenchmark • Configure benchmark build type:

    buildTypes {
        release {
            minifyEnabled true
            shrinkResources true
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
        }
        benchmark {
            initWith buildTypes.release
            signingConfig signingConfigs.debug
        }
    }
  28. Macrobenchmark

  29. Macrobenchmark

    @LargeTest
    @RunWith(AndroidJUnit4::class)
    class SampleStartupBenchmark {
        @get:Rule
        val benchmarkRule = MacrobenchmarkRule()

        @Test
        fun startup() = benchmarkRule.measureRepeated(
            packageName = TARGET_PACKAGE,
            metrics = listOf(StartupTimingMetric()),
            iterations = 5,
            setupBlock = {
                // Press the home button before each run to ensure the starting activity isn't visible.
                pressHome()
            }
        ) {
            // Starts the default launch activity.
            startActivityAndWait()
        }
    }

  34. Macrobenchmark • StartupTimingMetric • FrameTimingMetric • TraceSectionMetric (experimental)
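
    FrameTimingMetric drops into the same measureRepeated call as the startup sample above; the sketch below measures frame timing while flinging a list. The "recycler" resource id and the fling interaction are hypothetical, and the By/Direction helpers come from androidx.test.uiautomator.

    @Test
    fun scrollFrameTiming() = benchmarkRule.measureRepeated(
        packageName = TARGET_PACKAGE,
        metrics = listOf(FrameTimingMetric()),  // frame durations instead of startup time
        iterations = 5,
        setupBlock = {
            pressHome()
            startActivityAndWait()
        }
    ) {
        // Hypothetical list id: fling it so there are frames to measure.
        val list = device.findObject(By.res(packageName, "recycler"))
        list.fling(Direction.DOWN)
        device.waitForIdle()
    }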

  35. Macrobenchmark • Show results: 


  36. Macrobenchmark

    {
      "context": {
        "build": {
          "brand": "google",
          "device": "blueline",
          "fingerprint": "google/blueline/blueline:12/SP1A.210812.015/7679548:user/release-keys",
          "model": "Pixel 3",
          "version": { "sdk": 31 }
        },
        "cpuCoreCount": 8,
        "cpuLocked": false,
        "cpuMaxFreqHz": 2803200000,
        "memTotalBytes": 3753299968,
        "sustainedPerformanceModeEnabled": false
      },
      "benchmarks": [
        {
          "name": "startup",
          "params": {},
          "className": "com.example.macrobenchmark.startup.SampleStartupBenchmark",
          "totalRunTimeNs": 4975598256,
          "metrics": {
            "timeToInitialDisplayMs": {
              "minimum": 347.881076,
              "maximum": 347.881076,
              "median": 347.881076,
              "runs": [ 347.881076 ]
            }
          },
          "sampledMetrics": {},
          "warmupIterations": 0,
          "repeatIterations": 3,
          "thermalThrottleSleepSeconds": 0
        }
      ]
    }
  37. JankStats • New framework (9 February 2022) • Built on top of Android • In-app benchmarking
  38. JankStats

    class JankLoggingActivity : AppCompatActivity() {
        private lateinit var jankStats: JankStats

        override fun onCreate(savedInstanceState: Bundle?) {
            super.onCreate(savedInstanceState)
            // The metrics state holder can be retrieved regardless of JankStats initialization.
            val metricsStateHolder = PerformanceMetricsState.getForHierarchy(binding.root)
            // Initialize JankStats for the current window.
            jankStats = JankStats.createAndTrack(
                window,
                Dispatchers.Default.asExecutor(),
                jankFrameListener,
            )
            // Add the activity name as state.
            metricsStateHolder.state?.addState("Activity", javaClass.simpleName)
            // ...
        }
    }
  40. JankStats Reporting

    private val jankFrameListener = JankStats.OnFrameListener { frameData ->
        // A real app could do something more interesting, like writing the info
        // to local storage and reporting it later.
        Log.v("JankStatsSample", frameData.toString())
    }
  41. JankStats Aggregating

    override fun onResume() {
        super.onResume()
        jankStatsAggregator.jankStats.isTrackingEnabled = true
    }

    override fun onPause() {
        super.onPause()
        // Before disabling tracking, issue the report with an (optionally) specified reason.
        jankStatsAggregator.issueJankReport("Activity paused")
        jankStatsAggregator.jankStats.isTrackingEnabled = false
    }
  42. JankStats Aggregating

    class FrameData(
        /** The time at which this frame began (in nanoseconds). */
        val frameStartNanos: Long,
        /** The duration of this frame (in nanoseconds). */
        val frameDurationNanos: Long,
        /**
         * Whether this frame was determined to be janky, meaning that its duration
         * exceeds the duration determined by the system to indicate jank
         * (@see [JankStats.jankHeuristicMultiplier]).
         */
        val isJank: Boolean,
        /**
         * The UI/app state during this frame. This is the information set by the app,
         * or by other library code, that can be used later, during analysis, to
         * determine what UI state was current when jank occurred.
         *
         * @see PerformanceMetricsState.addState
         */
        val states: List<StateInfo>
    )
  43. Detecting Regressions in CI

  46. Detecting Regressions in CI - CI (Continuous Integration): a software-engineering

    practice of merging developer code into a main code base frequently. - Regression: noun, a return to a former or less developed state; here, performance degradation.
  47. Detecting Regressions in CI Why would we need this?

  48. Detecting Regressions in CI A typical regression scenario usually goes

    like this: - You're working on something - Another team (usually QA) warns you about a critical performance issue - You switch context and start digging into the codebase, not sure where to look - Pain - Manual profiling, benchmarking, etc.
  49. Detecting Regressions in CI - Monitoring performance is much easier

    than profiling. - Catch problems before they hit users. - Running benchmarks manually is repetitive and error-prone. - The output is just a number. - Ideally, we should automate this process.
  50. Detecting Regressions in CI Let machines do what they're best

    at! (image source: Kurzgesagt)
  52. Detecting Regressions in CI Example: Identifying degradation in app start-up

    time. Solution: use Macrobenchmark's StartupTimingMetric.
  53. Detecting Regressions in CI

  58. Detecting Regressions in CI

    When to run?
    - Every build (beware of resource cost)
    - Or maybe every release

    Where to run?
    - Real devices yield more reliable results
    - Firebase Test Lab (FTL)

    What to store?
    - The performance metric (time in ms)
    - The corresponding build number or commit hash
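
    One way to persist those two values between CI runs is a simple append-only history file keyed by commit. This is a minimal sketch; the file name and CSV layout are illustrative, not part of the talk.

    import java.io.File

    data class BenchmarkRecord(
        val commitHash: String,              // or a build number
        val timeToInitialDisplayMs: Double,  // median from the benchmark JSON
    )

    // Append one row per CI run so later builds can be compared against history.
    fun appendRecord(record: BenchmarkRecord, history: File = File("benchmark-history.csv")) {
        history.appendText("${record.commitHash},${record.timeToInitialDisplayMs}\n")
    }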
  59. Detecting Regressions in CI Ok, setup finished. Now what? 🧐

  61. Detecting Regressions in CI - Now comes the detection part.

    - There are multiple possible approaches
  63. Detecting Regressions in CI Compare with the previous result?

    Don't do this.
  64. Detecting Regressions in CI Why won't a naive approach work?

    - Benchmarking values can vary a lot. - Lots of things can change between runs.
  65. Detecting Regressions in CI Use a threshold value? - Compare

    against a manually defined percentage threshold
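
    A minimal sketch of that threshold check; the 5% default is an arbitrary placeholder, not a value recommended in the talk.

    // Naive threshold check: flag a regression when the new value exceeds the
    // previous one by more than a fixed percentage.
    fun isRegression(previousMs: Double, currentMs: Double, threshold: Double = 0.05): Boolean =
        (currentMs - previousMs) / previousMs > threshold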
  69. Detecting Regressions in CI Problems with naive approaches - Values

    are inconsistent between benchmarks - It may trigger false alerts - It may miss real regressions
  70. Detecting Regressions in CI We can do better

  71. Detecting Regressions in CI Constraints: - Handle temporary instability -

    Avoid manual tuning per benchmark - We want accuracy!
  74. Detecting Regressions in CI Now comes the detection math part.

    We need more context to make a decision.
  76. Detecting Regressions in CI Step fitting algorithm: a statistical approach

    for detecting jumps or steps in a time series.
  77. Detecting Regressions in CI Step fitting algorithm

    - Main objective: increase confidence in detecting regressions
    - The sliding window helps you make context-aware decisions
    - Use the window width and threshold to fine-tune the confidence of a regression (see the sketch below)
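
    A sketch of the step-fitting idea, assuming an ordered list of benchmark results, one per build. For each candidate build it compares the mean of the width results before it against the mean of the width results after it, normalized by the spread of the window. The parameter names and the normalization are illustrative; Chris Craik's article in the resources covers the real math.

    // Returns the indices where the series appears to step up or down.
    fun findSteps(results: List<Double>, width: Int, threshold: Double): List<Int> {
        val steps = mutableListOf<Int>()
        for (i in width..results.size - width) {
            val before = results.subList(i - width, i)
            val after = results.subList(i, i + width)
            val jump = kotlin.math.abs(after.average() - before.average())
            // Normalize by the standard deviation of the whole window so noisy
            // benchmarks need a bigger jump before they count as a step.
            val window = before + after
            val mean = window.average()
            val sd = kotlin.math.sqrt(window.sumOf { (it - mean) * (it - mean) } / window.size)
            if (jump / maxOf(sd, 1e-9) > threshold) steps += i
        }
        return steps
    }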
  78. Detecting Regressions in CI

  84. Recap Detecting Regressions in CI: - Automate regression detection at

    key points of your app - Use step-fitting instead of naive approaches - It helps you catch issues before they hit users - When a new build's result is ready, check its benchmark values inside the 2×width window - If there's a regression or improvement, fire an alert to investigate the performance of the last width builds
  85. Recap Jetpack Benchmark: • Micro and macro benchmarks • Instrumentation tests as benchmarks

  86. Recap Profiling: • Memory, Energy, Network, CPU. • Profile to identify bottlenecks and implement your benchmarks.
  87. Resources

    Great article on fighting regressions by Chris Craik:
    https://bit.ly/3kaidug

    Benchmarking official docs:
    https://bit.ly/3rQRaZ6

    JankStats:
    https://developer.android.com/topic/performance/jankstats
  88. Your feedback! bit.ly/benchmarkFeedback

  89. Thank you! @iurysza @eenriquelopez