Experiments for your Android Builds driven by Gradle Profiler

Slide 1

Slide 1 text

Experiments for Android Builds driven by Gradle Profiler
Ubiratan Soares
October 2020

Slide 2

Slide 2 text

https://n26.com/en/careers

Slide 3

Slide 3 text

A problem like a big Android project and a really slow build …

Slide 4

Slide 4 text

“You can’t improve what you can’t measure” Someone, somewhen

Slide 5

Slide 5 text

https://www.youtube.com/watch?v=hBkIKfzd7Ms

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Measuring builds

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Gradle Profiler Features
• Tooling API
• Cold/warm builds
• Daemon control
• Benchmarking
• Profiling
• Multiple build systems
• Multiple profilers
• Scenarios definition
• Incremental builds evaluation
• Etc.

Slide 11

Slide 11 text

Installing with SDKMAN!
https://github.com/gradle/gradle-profiler/releases
    sdk install gradleprofiler

Slide 12

Slide 12 text

Installing with Homebrew
    brew install gradle-profiler

Slide 13

Slide 13 text

Benchmarking gradle-profiler

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Running Scenarios gradle-profiler
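A minimal sketch of how a scenario run could be assembled from a script. It only builds and prints the command line, so nothing is executed. The flags used (`--benchmark`, `--project-dir`, `--scenario-file`, `--warmups`, `--iterations`) are documented Gradle Profiler options; the scenario file name `build.scenarios` and the helper `profiler_command` are hypothetical, and the 4 warm-ups and 15 measured builds mirror the examples later in this deck.

```python
import shlex


def profiler_command(project_dir, scenario_file, scenarios, warmups=4, iterations=15):
    """Assemble a gradle-profiler benchmark invocation as an argv list."""
    return [
        "gradle-profiler",
        "--benchmark",
        "--project-dir", project_dir,
        "--scenario-file", scenario_file,  # hypothetical file name, see below
        "--warmups", str(warmups),         # builds discarded before measuring
        "--iterations", str(iterations),   # measured builds (the samples)
        *scenarios,                        # scenario names defined in the file
    ]


command = profiler_command(".", "build.scenarios", ["build"])
print(shlex.join(command))  # shell-quoted form, ready to paste into a terminal
```

Passing the list to `subprocess.run(command, check=True)` would launch the benchmark, assuming `gradle-profiler` is on the PATH.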

Slide 16

Slide 16 text

Demo

Slide 17

Slide 17 text

Evaluating Measurements

Slide 18

Slide 18 text

Build   Benchmark #01   Benchmark #02
1       4               4.4
2       5               5.1
3       5.1             5.2
4       4.4             6.2
5       3.9             3.4
6       4.2             6.2
7       4.6             4.6
8       4.5             4.5
9       4.4             3.3
10      4               4.4

Slide 19

Slide 19 text

Benchmark   Mean   Standard Deviation
#01         4.46   0.41
#02         4.77   1.04
[Chart: build times for benchmarks #01 and #02, axis from 3 to 7]
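A small sketch of how such a summary can be computed with Python's standard library, using the build times from the table on slide 18. The table shows rounded values with no unit, so the output may not reproduce the figures on this slide exactly. The point is that two benchmarks with similar means can have very different spreads.

```python
from statistics import mean, stdev

# Build times copied from the slide 18 table (unit not stated on the slide).
benchmark_01 = [4, 5, 5.1, 4.4, 3.9, 4.2, 4.6, 4.5, 4.4, 4]
benchmark_02 = [4.4, 5.1, 5.2, 6.2, 3.4, 6.2, 4.6, 4.5, 3.3, 4.4]

for name, sample in (("#01", benchmark_01), ("#02", benchmark_02)):
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(
        f"{name}: mean={mean(sample):.2f} stdev={stdev(sample):.2f} "
        f"min={min(sample)} max={max(sample)}"
    )
```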

Slide 20

Slide 20 text

https://towardsdatascience.com/why-averages-are-often-wrong-1ff08e409a5b

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Benchmark   Mean   Standard Deviation
#01         4.46   0.41
#02         4.77   1.04
[Chart: axis from 3 to 7]

Slide 23

Slide 23 text

When Build Engineering meets Data Science

Slide 24

Slide 24 text

Statistical Inference
Population (exploratory data, mean µ) → Sampling → Refined data (mean X) → Probability analysis → Inferred parameter

Slide 25

Slide 25 text

Alice: Android Tech Lead
Bob: Android Engineer

Slide 26

Slide 26 text

“I killed so many annotations on my PR that now I’m sure app:assembleDebug is running faster!”
“Did you run benchmarks for it with Gradle Profiler?”

Slide 27

Slide 27 text

“Yes, I did, before and after my changes.”
“But I’m not sure how to demonstrate the improvements…”
“I can help with that!”

Slide 28

Slide 28 text

Statistical Hypothesis
• Null hypothesis (H0)
• Alternative hypothesis (HA)
Population Mean (µ)
“I missed that class, I guess…”

Slide 29

Slide 29 text

The null hypothesis (H0) is the status quo: what we have right now in our trunk branch, if you prefer.
The alternative hypothesis (HA) is what you want to demonstrate.

Slide 30

Slide 30 text

Statistical significance
Sample 1 → Probability analysis → P(sample 1) = 99.99%
Sample 2 → Probability analysis → P(sample 2) = 88.88%
alpha = significance level = 0.05 = 1 - 0.95
“Probably I missed that class too…”

Slide 31

Slide 31 text

“Your modifications will mean real improvements if I execute 100 runs of app:assembleDebug and see an execution faster than 5000 ms for at least 95 of them.”

Slide 32

Slide 32 text

Law of Large Numbers
Sample size?
• >= 30: Normal distribution
• < 30: Student's t-distribution
“Super easy!”

Slide 33

Slide 33 text

“Given that app:assembleDebug is quite slow, we will consider benchmarks running between 15 and 25 measured builds; therefore, Student's t-distribution will model our probability curve.”

Slide 34

Slide 34 text

p-value
Sample → Probability analysis
• Critical value (e.g., Z or t)
• P-value (area): the probability of a Type I error

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

“With the data you provided, the probability model tells me that out of every 100 runs of app:assembleDebug, 3 of them will actually be false positives.”

Slide 37

Slide 37 text

Critical value and tail analysis
• Left-tailed: µA < µ0
• Two-tailed: µA != µ0
• Right-tailed: µA > µ0
“I do need some coffee. Do you?”

Slide 38

Slide 38 text

µA <= µ0
“Faster builds imply that the observed mean after your modifications is smaller than the value we had before. So we want a left-tailed analysis.”

Slide 39

Slide 39 text

“But we don’t know the actual average value of app:assembleDebug for all possible builds…”
“We will compare the samples for it, captured at two different moments under slightly different conditions.”
“This is called a paired test.”
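A minimal sketch of the statistic behind a paired test, using only Python's standard library. Each measured build after the change is paired with the build at the same position before the change, and the test works on the per-pair differences. The timings below are hypothetical values in milliseconds, and the helper name `paired_t_statistic` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for two paired samples of equal length."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("paired samples need equal length and at least 2 runs")
    # One difference per pair; negative values mean the build got faster.
    differences = [a - b for a, b in zip(after, before)]
    # Standard error of the mean difference (assumes the differences vary).
    standard_error = stdev(differences) / sqrt(len(differences))
    return mean(differences) / standard_error, len(differences) - 1


# Hypothetical timings in milliseconds, not taken from the talk.
before = [5210, 5105, 5340, 5290, 5180, 5400]
after = [4980, 5010, 5120, 5150, 5005, 5230]

t, df = paired_t_statistic(before, after)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

A strongly negative t supports a left-tailed alternative (faster builds); how negative it must be depends on the degrees of freedom and the chosen alpha.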

Slide 40

Slide 40 text

“Please tell me that I won’t have to do all the calculations by myself…”
“Yes, you will.”

Slide 41

Slide 41 text

“Just joking. There are tools we can use.” …

Slide 42

Slide 42 text

https://www.statskingdom.com/160MeanT2pair.html

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

A framework to drive experiments for your Gradle builds

Slide 45

Slide 45 text

Gradle task
• Benchmark #01 (status quo)
• Benchmark #02 (modifications)
Left-tailed paired t-test with alpha (0.05) → p-value
Compare p-value and alpha
• p-value is BIGGER: evidence that the build has improved is statistically WEAK
• p-value is SMALLER: evidence that the build has improved is statistically STRONG
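The last step of this framework can be sketched with the standard library alone. The code below assumes a t statistic and its degrees of freedom are already available from a paired t-test, approximates the left-tailed p-value by integrating the Student's t density with Simpson's rule, and compares it with alpha. The function names are illustrative; a statistics package or the online calculator linked earlier gives the same number with less code.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coefficient = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)


def left_tailed_p_value(t, df, steps=10_000):
    """Approximate P(T <= t) with Simpson's rule on [0, |t|] plus symmetry."""
    upper = abs(t)
    if upper == 0:
        return 0.5
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - area if t < 0 else 0.5 + area


def evidence(t, df, alpha=0.05):
    """Label the evidence for faster builds as STRONG or WEAK."""
    p_value = left_tailed_p_value(t, df)
    return ("STRONG" if p_value < alpha else "WEAK"), p_value


# Hypothetical result of a paired t-test over 15 measured builds (df = 14).
label, p_value = evidence(t=-2.4, df=14)
print(f"p-value = {p_value:.4f}, evidence is {label}")
```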

Slide 46

Slide 46 text

Examples

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Running Environment
• 2018 MacBook Pro
• Intel Core i7 (6 cores)
• 16 GB RAM
.gradle/gradle.properties
    kotlin.parallel.tasks.in.project=true
    kapt.use.worker.api=true
    kapt.include.compile.classpath=true
    kapt.incremental.apt=false
    org.gradle.workers.max=6

Slide 49

Slide 49 text

Example #01
Target
    https://github.com/JakeWharton/SdkSearch
Scenario
    build {
        title = "Assemble Debug APK"
        tasks = "mobile:assembleDebug"
        daemon = warm
        cleanup-tasks = ["clean"]
    }
Execution
• 4 warmed-up builds
• 15 measured builds (samples)
Hypothesis
• H0: No meaningful build improvements building with newer JDKs
• Ha: JDK11 delivers faster Gradle builds than JDK8
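Gradle Profiler writes its benchmark measurements to a CSV file in its output directory. A sketch of pulling the measured builds out of such a file as a sample follows. The layout assumed here (one row per build, labelled "warm-up build #N" or "measured build #N", with one column per scenario) and the timings are illustrative assumptions, so check them against the file your version actually produces.

```python
import csv
import io

# Assumed layout with hypothetical timings in milliseconds.
SAMPLE_CSV = """\
scenario,Assemble Debug APK
tasks,mobile:assembleDebug
warm-up build #1,41250
warm-up build #2,23810
measured build #1,21540
measured build #2,20975
measured build #3,21310
"""


def measured_builds(csv_text, column=1):
    """Collect the timings of measured builds for one scenario column."""
    samples = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Warm-up builds and metadata rows are skipped on purpose.
        if row and row[0].startswith("measured build"):
            samples.append(float(row[column]))
    return samples


print(measured_builds(SAMPLE_CSV))
```

Running it once per benchmark (status quo and modifications) yields the two samples to compare.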

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

#1 Building with different JDKs
• P-value > alpha

Slide 54

Slide 54 text

#1 Building with different JDKs
• P-value > alpha
• Ha has not been accepted

Slide 55

Slide 55 text

#1 Building with different JDKs
• P-value > alpha
• Ha has not been accepted
• No STRONG statistical evidence of faster builds with JDK11

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

Example #02
Target
    https://github.com/google/iosched
Scenario
    build {
        title = "Assemble Debug APK"
        tasks = "mobile:assembleDebug"
        daemon = warm
        cleanup-tasks = ["clean"]
    }
Execution
• 4 warmed-up builds
• 15 measured builds (samples)
Hypothesis
• H0: Bumps deliver no meaningful build improvements (current = AGP 3.4.1 and Gradle 5.1.1)
• Ha: Bumps to AGP 3.5.3 and Gradle 6.2 deliver faster builds

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

Final Remarks

Slide 66

Slide 66 text

Call to action!
• Play around with Gradle Profiler
• Design your experiment
• Run it!
• Make your decisions based on data generated and analysed in your particular context, not on tweets or subreddits

Slide 67

Slide 67 text

Ubiratan Soares
Brazilian Computer Scientist
Senior Software Engineer @ N26
GDE for Android and Kotlin
@ubiratanfsoares
ubiratansoares.dev

Slide 68

Slide 68 text

THANKS