Experiments for your Android Builds driven by Gradle Profiler

Experiments for Android Builds driven by Gradle Profiler
Ubiratan Soares
October 2020

https://n26.com/en/careers

A problem like a big Android project and a really slow build…

“You can’t improve what you can’t measure” Someone, somewhen

https://www.youtube.com/watch?v=hBkIKfzd7Ms

Measuring builds

Gradle Profiler features
• Tooling API
• Cold/warm builds
• Daemon control
• Benchmarking
• Profiling
• Multiple build systems
• Multiple profilers
• Scenario definitions
• Incremental builds evaluation
• Etc.

Installing with SDKMAN!
https://github.com/gradle/gradle-profiler/releases
sdk install gradleprofiler <version>

Installing with Homebrew
brew install gradle-profiler

Benchmarking gradle-profiler

Running Scenarios gradle-profiler

Evaluating Measurements

Build   Benchmark #01   Benchmark #02
1       4               4.4
2       5               5.1
3       5.1             5.2
4       4.4             6.2
5       3.9             3.4
6       4.2             6.2
7       4.6             4.6
8       4.5             4.5
9       4.4             3.3
10      4               4.4

[Chart: build times per build for Benchmark #01 and Benchmark #02]

[Chart: Benchmark #01 vs Benchmark #02]

Benchmark   Mean   Standard Deviation
#01         4.46   0.41
#02         4.77   1.04
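A minimal Python sketch of how such summary figures can be derived from the raw measurements, using only the standard library. It assumes the ten per-build values of Benchmark #01 and #02 listed in the table and uses the sample standard deviation (n − 1). The per-build values appear rounded, so the computed means (about 4.41 and 4.73) come out slightly lower than the slide's summary; `summarize` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Per-build measurements from the table (the slide does not state the unit).
benchmark_01 = [4.0, 5.0, 5.1, 4.4, 3.9, 4.2, 4.6, 4.5, 4.4, 4.0]
benchmark_02 = [4.4, 5.1, 5.2, 6.2, 3.4, 6.2, 4.6, 4.5, 3.3, 4.4]


def summarize(samples: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation), both rounded to two decimals."""
    return round(mean(samples), 2), round(stdev(samples), 2)


for name, samples in [("#01", benchmark_01), ("#02", benchmark_02)]:
    m, s = summarize(samples)
    print(f"Benchmark {name}: mean={m}, standard deviation={s}")
```

Both benchmarks end up with close means, while the spread of #02 is more than twice that of #01, which is exactly the information a comparison of averages alone hides.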

https://towardsdatascience.com/why-averages-are-often-wrong-1ff08e409a5b


When Build Engineering meets Data Science

[Diagram: Statistical inference. Population (mean µ) → sampling → exploratory data analysis → refined data (sample mean X̄) → probability analysis → inferred parameter]

Alice (Android Tech Lead) and Bob (Android Engineer)

“I KILLED SO MANY ANNOTATIONS ON MY PR THAT NOW I’M SURE app:assembleDebug IS RUNNING FASTER!!!!”
“DID YOU RUN BENCHMARKS FOR IT WITH GRADLE PROFILER???”

“YES I DID, BEFORE AND AFTER MY CHANGES.”
“BUT I’M NOT SURE HOW TO DEMONSTRATE THE IMPROVEMENTS…”
“I CAN HELP WITH THAT!”

Statistical Hypothesis
• Null hypothesis (H0)
• Alternative hypothesis (HA)
(both stated about the population mean µ)
“I MISSED THAT CLASS, I GUESS…”

“THE NULL HYPOTHESIS (H0) IS THE STATUS QUO: WHAT WE HAVE RIGHT NOW ON OUR TRUNK BRANCH, IF YOU PREFER. THE ALTERNATIVE HYPOTHESIS (HA) IS WHAT YOU WANT TO DEMONSTRATE.”

Statistical significance
[Diagram: probability analysis of two samples, P(sample 1) = 99.99% and P(sample 2) = 88.88%]
alpha = significance level = 0.05 = 1 − 0.95
“PROBABLY I MISSED THAT CLASS TOO…”

“YOUR MODIFICATIONS WILL MEAN REAL IMPROVEMENTS IF I EXECUTE 100 RUNS OF app:assembleDebug AND SEE AN EXECUTION FASTER THAN 5000 ms IN AT LEAST 95 OF THEM.”

Law of Large Numbers
Sample size?
• ≥ 30: Normal distribution
• < 30: Student's t-distribution
“SUPER EASY!”
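The rule of thumb above can be illustrated by comparing one-sided 95% critical values of the normal and Student's t-distributions. This is a minimal sketch using only the Python standard library: `statistics.NormalDist` provides the normal quantile, while `t_pdf`, `t_cdf` and `t_critical` are hypothetical helpers (the standard library has no t-distribution) that approximate the CDF with Simpson's rule and invert it by bisection.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    # lgamma keeps the normalising constant finite even for large df.
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2_000) -> float:
    """Approximate CDF: Simpson's rule over [0, |x|] plus symmetry around 0."""
    b = abs(x)
    h = b / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence: float, df: int) -> float:
    """One-sided critical value t such that t_cdf(t, df) == confidence (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_critical = NormalDist().inv_cdf(0.95)  # about 1.645
for n in (10, 15, 25, 30, 100):
    print(f"n={n:>3}  t={t_critical(0.95, n - 1):.3f}  normal={z_critical:.3f}")
```

For the 15 measured builds used in the examples (df = 14), the t critical value is about 1.761 versus 1.645 for the normal distribution. The gap shrinks as the sample grows, which is why small benchmark samples call for the t-distribution.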

“GIVEN THAT app:assembleDebug IS QUITE SLOW, WE WILL CONSIDER BENCHMARKS RUNNING SOMEWHERE BETWEEN 15 AND 25 MEASURED BUILDS, AND THEREFORE THE STUDENT'S T-DISTRIBUTION WILL MODEL OUR PROBABILITY CURVE.”

p-value
[Diagram: probability analysis of a sample; the critical value (e.g. Z or t) marks the tail of the curve]
p-value (area): the probability of a Type I error (a false positive)

“WITH THE DATA YOU PROVIDED, THE PROBABILITY MODEL TELLS ME THAT FOR EVERY 100 RUNS OF app:assembleDebug, 3 OF THEM WILL ACTUALLY BE FALSE POSITIVES.”

Critical value and tail analysis
• Left-tailed: µA < µ0
• Right-tailed: µA > µ0
• Two-tailed: µA ≠ µ0
“I DO NEED SOME COFFEE. DO YOU?”

HA: µA < µ0
“FASTER BUILDS IMPLY THAT THE OBSERVED MEAN AFTER YOUR MODIFICATIONS IS SMALLER THAN THE VALUE WE HAD BEFORE. SO WE WANT A LEFT-TAILED ANALYSIS.”
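In symbols, a sketch of the left-tailed paired setup, assuming the n measured builds of each benchmark are paired by run index: x_i^{(0)} is the i-th build time of the status quo, x_i^{(A)} the i-th build time after the modifications, \bar{d} the mean of the differences and s_d their sample standard deviation.

```latex
\begin{aligned}
d_i &= x_i^{(A)} - x_i^{(0)}, \qquad i = 1, \dots, n \\
H_0 &: \mu_A \ge \mu_0 \;\; (\mu_d \ge 0), \qquad H_A : \mu_A < \mu_0 \;\; (\mu_d < 0) \\
t &= \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \mathrm{df} = n - 1 \\
p &= P\left(T_{n-1} \le t\right), \qquad \text{reject } H_0 \text{ when } p < \alpha
\end{aligned}
```

A negative t means the builds got faster on average; the more negative it is relative to the spread of the differences, the smaller the left-tail area p.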

“BUT WE DON’T KNOW THE ACTUAL AVERAGE VALUE OF app:assembleDebug FOR ALL POSSIBLE BUILDS…”
“WE WILL COMPARE THE SAMPLES CAPTURED FOR IT AT TWO DIFFERENT MOMENTS, UNDER SLIGHTLY DIFFERENT CONDITIONS.”
“THIS IS CALLED A PAIRED TEST.”

“PLEASE TELL ME THAT I WON’T HAVE TO DO ALL THE CALCULATIONS BY MYSELF…”
“YES, YOU WILL.”

“JUST JOKING. THERE ARE TOOLS WE CAN USE.”

https://www.statskingdom.com/160MeanT2pair.html

A framework to drive experiments for your Gradle builds

Benchmark #01 (status quo) and Benchmark #02 (modifications), both running the same Gradle task, feed a left-tailed paired t-test. Compare the resulting p-value with alpha (0.05):
• p-value is BIGGER: evidence that the build has improved is statistically WEAK
• p-value is SMALLER: evidence that the build has improved is statistically STRONG
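A minimal, standard-library-only Python sketch of this decision flow, assuming the measured builds of both benchmarks are paired by run index and alpha is 0.05. The helper names are hypothetical, and the t-distribution CDF is approximated numerically because the standard library does not provide one; with SciPy available, `scipy.stats.ttest_rel(after, before, alternative="less")` computes the same left-tailed paired test.

```python
import math
from statistics import mean, stdev


def t_cdf(x: float, df: int, steps: int = 10_000) -> float:
    """Approximate CDF of Student's t (Simpson's rule on [0, |x|], then symmetry)."""
    coeff = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u: float) -> float:
        return coeff * (1 + u * u / df) ** (-(df + 1) / 2)

    b = abs(x)
    h = b / steps  # `steps` must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def left_tailed_paired_t_test(before: list[float], after: list[float]) -> tuple[float, float]:
    """(t, p-value) for H0: mean(after - before) >= 0 against HA: mean(after - before) < 0."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two paired samples of equal length (at least 2 builds each)")
    diffs = [a - b for a, b in zip(after, before)]  # negative means faster after the change
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; the t statistic is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, t_cdf(t, len(diffs) - 1)  # left tail: P(T <= t)


def evaluate(before: list[float], after: list[float], alpha: float = 0.05) -> str:
    """Compare the p-value with alpha: smaller means STRONG evidence, otherwise WEAK."""
    t, p = left_tailed_paired_t_test(before, after)
    strength = "STRONG" if p < alpha else "WEAK"
    return f"t={t:.3f}, p-value={p:.4f}: evidence that the build has improved is statistically {strength}"


# Illustration only: the two benchmarks from the earlier table, treating #01 as "before".
benchmark_01 = [4.0, 5.0, 5.1, 4.4, 3.9, 4.2, 4.6, 4.5, 4.4, 4.0]
benchmark_02 = [4.4, 5.1, 5.2, 6.2, 3.4, 6.2, 4.6, 4.5, 3.3, 4.4]
print(evaluate(benchmark_01, benchmark_02))
```

With those two benchmarks the mean difference is positive (the "after" runs are slower on average), so the p-value lands far above 0.05 and the verdict is WEAK; a consistently faster second benchmark would push t well below zero and the p-value under alpha.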

Examples

Running environment
• 2018 MacBook Pro
• Intel Core i7 (6 cores)
• 16 GB RAM

.gradle/gradle.properties
    kotlin.parallel.tasks.in.project=true
    kapt.use.worker.api=true
    kapt.include.compile.classpath=true
    kapt.incremental.apt=false
    org.gradle.workers.max=6

Example #01
Target: https://github.com/JakeWharton/SdkSearch

Scenario
    build {
        title = "Assemble Debug APK"
        tasks = ["mobile:assembleDebug"]
        daemon = warm
        cleanup-tasks = ["clean"]
    }

Execution
• 4 warmed-up builds
• 15 measured builds (samples)

Hypothesis
• H0: No meaningful build improvements from building with newer JDKs
• HA: JDK11 delivers faster Gradle builds than JDK8

#1 Building with different JDKs
• p-value > alpha
• HA has not been accepted
• No STRONG statistical evidence of faster builds with JDK11

Example #02
Target: https://github.com/google/iosched

Scenario
    build {
        title = "Assemble Debug APK"
        tasks = ["mobile:assembleDebug"]
        daemon = warm
        cleanup-tasks = ["clean"]
    }

Execution
• 4 warmed-up builds
• 15 measured builds (samples)

Hypothesis
• H0: Bumps deliver no meaningful build improvements (current: AGP 3.4.1 and Gradle 5.1.1)
• HA: Bumping to AGP 3.5.3 and Gradle 6.2 delivers faster builds

Final Remarks

Call to action!
• Play around with Gradle Profiler
• Design your experiment
• Run it!!!
• Make your decisions based on data generated and analysed in your particular context, not on tweets or subreddits

Ubiratan Soares
Brazilian Computer Scientist
Senior Software Engineer @ N26
GDE for Android and Kotlin
@ubiratanfsoares
ubiratansoares.dev

THANKS
