Experiments for your Android Builds driven by Gradle Profiler

Companion slides for my talk about applying statistics to improve Gradle builds

Presented at

- N26 Barcelona | Android Meetup (February / 2020)
- GDG-SP Android Meetup #78 (March / 2020)
- Droidcon EMEA Online (October / 2020)
- Android Summit Online (October / 2020)

Ubiratan Soares

October 08, 2020

Transcript

  1. Experiments for Android Builds driven by Gradle Profiler • Ubiratan Soares • October / 2020
  2. https://n26.com/en/careers

  3. A problem like a big Android project and a really slow build …
  4. “You can’t improve what you can’t measure” Someone, somewhen

  5. https://www.youtube.com/watch?v=hBkIKfzd7Ms

  6. None
  7. Measuring builds

  8. None
  9. None
  10. Gradle Profiler Features • Tooling API • Cold/warm builds • Daemon control • Benchmarking • Profiling • Multiple build systems • Multiple profilers • Scenarios definition • Incremental builds evaluation • Etc
  11. Installing with SDKMAN! https://github.com/gradle/gradle-profiler/releases sdk install gradleprofiler <version>

  12. Installing with Homebrew: brew install gradle-profiler

  13. Benchmarking gradle-profiler

  14. None
  15. Running Scenarios gradle-profiler

  16. Demo

  17. Evaluating Measurements

  18. Build: Benchmark #01 / Benchmark #02 • 1: 4 / 4.4 • 2: 5 / 5.1 • 3: 5.1 / 5.2 • 4: 4.4 / 6.2 • 5: 3.9 / 3.4 • 6: 4.2 / 6.2 • 7: 4.6 / 4.6 • 8: 4.5 / 4.5 • 9: 4.4 / 3.3 • 10: 4 / 4.4
  19. Benchmark: Mean / Standard Deviation • #01: 4.46 / 0.41 • #02: 4.77 / 1.04
  20. https://towardsdatascience.com/why-averages-are-often-wrong-1ff08e409a5b

  21. None
  22. Benchmark: Mean / Standard Deviation • #01: 4.46 / 0.41 • #02: 4.77 / 1.04
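The idea behind the summary above, that a mean on its own hides how noisy a benchmark is, can be sketched with Python's `statistics` module. The two samples below are hypothetical build times in seconds, not figures from the talk; they were picked so that both share the same mean while differing widely in spread.

```python
from statistics import mean, stdev

# Hypothetical build times in seconds (illustrative only, not from the slides).
steady_benchmark = [4.4, 4.5, 4.6, 4.5, 4.5]
noisy_benchmark = [3.3, 6.2, 4.4, 5.1, 3.5]


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of build times."""
    return mean(samples), stdev(samples)


for name, samples in (("steady", steady_benchmark), ("noisy", noisy_benchmark)):
    sample_mean, sample_stdev = summarize(samples)
    print(f"{name}: mean={sample_mean:.2f}s stdev={sample_stdev:.2f}s")
```

Both benchmarks report the same mean here, so only the standard deviation reveals that the second one is far less predictable.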
  23. When Build Engineering meets Data Science

  24. Statistical Inference • Population • Mean (µ) • Sampling • Exploratory Data • Refined Data • Mean (X) • Probability Analysis • Inferred Parameter
  25. Alice Bob Android Tech Lead Android Engineer

  26. “I killed so many annotations on my PR that now I’m sure app:assembleDebug is running faster!” “Did you run benchmarks for it with Gradle Profiler?”
  27. “Yes I did, before and after my changes.” “But I’m not sure how to demonstrate the improvements …” “I can help with that!”
  28. Statistical Hypothesis • Null hypothesis (H0) • Alternative hypothesis (HA) • Population Mean (µ) “I missed that class, I guess …”
  29. The null hypothesis (H0) is the status quo: what we have right now in our trunk branch, if you prefer. The alternative hypothesis (HA) is what you want to demonstrate.
  30. Statistical significance • alpha = significance level = 0.05 = 1 - 0.95 • Sample 1, Sample 2 → Probability Analysis → P(sample 1) = 99.99%, P(sample 2) = 88.88% “Probably I missed that class too …”
  31. “Your modifications will mean real improvements if I execute 100 runs of app:assembleDebug and I see an execution faster than 5000ms for at least 95 of them.”
  32. Law of Large Numbers • Sample size? • >= 30: Normal distribution • < 30: Student's t-distribution “Super easy!”
  33. “Given that app:assembleDebug is quite slow, we will consider benchmarks running between 15 and 25 measured builds; therefore, Student's t-distribution will model our probability curve.”
  34. p-value • Sample → Probability Analysis → Critical value (e.g., Z or t) • P-value (area): the probability of a Type I error
  35. None
  36. “With the data you provided, the probability model tells me that, out of every 100 runs of app:assembleDebug, 3 of them will actually be false positives.”
  37. Critical value and tail analysis • Left-tailed: µA < µ0 • Two-tailed: µA != µ0 • Right-tailed: µA > µ0 “I do need some coffee. Do you?”
  38. µA <= µ0 “Faster builds imply that the observed mean after your modifications is smaller than the value we had before. So we want a left-tailed analysis.”
  39. “But we don’t know the actual average value of app:assembleDebug for all possible builds …” “We will compare the samples for it captured at two different moments under slightly different conditions.” “This is called a paired test.”
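As a rough illustration of the paired test mentioned above, the sketch below computes the t statistic from the per-build differences between two benchmark runs. The function name and the build times (in seconds) are hypothetical; a negative statistic means the builds measured after the change were faster on average.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for paired build times.

    Differences are taken as after - before, so faster builds after the
    change push the statistic below zero (the left tail).
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    # Assumes the differences are not all identical (stdev would be zero).
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical measurements in seconds, for illustration only.
before_change = [5.2, 5.0, 5.4, 5.1, 5.3]
after_change = [4.9, 4.8, 5.0, 5.0, 5.1]

t_stat, degrees_of_freedom = paired_t_statistic(before_change, after_change)
print(f"t = {t_stat:.3f} with {degrees_of_freedom} degrees of freedom")
```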
  40. “Please tell me that I won’t have to do all the calculations by myself …” “Yes, you will.”
  41. “Just joking. There are tools we can use.” …

  42. https://www.statskingdom.com/160MeanT2pair.html

  43. None
  44. A framework to drive experiments for your Gradle builds

  45. Gradle task → Benchmark #01 (status quo) and Benchmark #02 (modifications) → Left-tailed paired t-test → p-value → Compare p-value and alpha (0.05) • p-value is BIGGER: evidence that the build has improved is statistically WEAK • p-value is SMALLER: evidence that the build has improved is statistically STRONG
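The last step of this framework, turning a t statistic into a left-tail p-value and comparing it with alpha, can be sketched with the standard library alone. This is an assumption-laden illustration rather than the tooling used in the talk: the p-value is approximated by integrating the Student's t density with Simpson's rule, and the t statistic below is a hypothetical value for 15 measured builds (14 degrees of freedom).

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coefficient = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)


def left_tailed_p_value(t_stat, df, steps=2000):
    """Approximate P(T <= t_stat) with Simpson's rule (steps must be even)."""
    h = t_stat / steps  # negative when t_stat < 0, so the integral is signed
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The distribution is symmetric, so half of the area lies left of zero.
    return 0.5 + total * h / 3


def evidence(p_value, alpha=0.05):
    """Label the evidence for faster builds as in the framework above."""
    return "strong" if p_value < alpha else "weak"


# Hypothetical t statistic from a paired comparison of 15 measured builds.
t_stat, df = -2.5, 14
p_value = left_tailed_p_value(t_stat, df)
print(f"p-value = {p_value:.4f}, evidence is statistically {evidence(p_value)}")
```

In practice a statistics library or an online calculator would replace the hand-rolled integration; the sketch only makes the comparison against alpha explicit.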
  46. Examples

  47. None
  48. Running Environment • 2018 MacBook Pro • Intel Core i7 (6 cores) • 16GB RAM • .gradle/gradle.properties: kotlin.parallel.tasks.in.project=true, kapt.use.worker.api=true, kapt.include.compile.classpath=true, kapt.incremental.apt=false, org.gradle.workers.max=6
  49. Example #01 • Target: https://github.com/JakeWharton/SdkSearch • Scenario: build { title = "Assemble Debug APK", tasks = ["mobile:assembleDebug"], daemon = warm, cleanup-tasks = ["clean"] } • Execution: 4 warmed-up builds, 15 measured builds (samples) • Hypothesis: H0: no meaningful build improvements when building with newer JDKs; Ha: JDK11 delivers faster Gradle builds than JDK8
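A benchmark with the execution settings above might be launched roughly as follows. The sketch only assembles the command line and does not run it; `--benchmark`, `--project-dir`, `--scenario-file`, `--warmups` and `--iterations` are gradle-profiler options, while the project directory and the scenario file name are assumptions made for illustration.

```python
import shlex


def gradle_profiler_command(project_dir, scenario_file, scenario,
                            warmups=4, iterations=15):
    """Build (but do not execute) a gradle-profiler benchmark invocation."""
    return [
        "gradle-profiler",
        "--benchmark",
        "--project-dir", project_dir,
        "--scenario-file", scenario_file,
        "--warmups", str(warmups),        # warmed-up builds, not measured
        "--iterations", str(iterations),  # measured builds (the samples)
        scenario,                         # scenario name from the file
    ]


# Hypothetical checkout directory and scenario file name.
command = gradle_profiler_command("SdkSearch", "benchmark.scenarios", "build")
print(" ".join(shlex.quote(part) for part in command))
```

Passing the resulting list to `subprocess.run` would execute the benchmark, once for the status quo and once for the modified build.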
  50. None
  51. None
  52. None
  53. #1 Building with different JDKs • P-value > alpha

  54. #1 Building with different JDKs • P-value > alpha • Ha has not been accepted
  55. #1 Building with different JDKs • P-value > alpha • Ha has not been accepted • No STRONG statistical evidence of faster builds with JDK11
  56. None
  57. None
  58. None
  59. Example #02 • Target: https://github.com/google/iosched • Scenario: build { title = "Assemble Debug APK", tasks = ["mobile:assembleDebug"], daemon = warm, cleanup-tasks = ["clean"] } • Execution: 4 warmed-up builds, 15 measured builds (samples) • Hypothesis: H0: bumps deliver no meaningful build improvements (current = AGP 3.4.1 and Gradle 5.1.1); Ha: bumps to AGP 3.5.3 and Gradle 6.2 deliver faster builds
  60. None
  61. None
  62. None
  63. None
  64. None
  65. Final Remarks

  66. Call to action! • Play around with Gradle Profiler • Design your experiment • Run it! • Make your decisions based on data generated and analysed in your particular context, not on tweets or subreddits
  67. Ubiratan Soares • Brazilian Computer Scientist • Senior Software Engineer @ N26 • GDE for Android and Kotlin • @ubiratanfsoares • ubiratansoares.dev
  68. THANKS