Experiments for your Android Builds driven by Gradle Profiler


Companion slides for my talk about applying statistics to improve Gradle builds

Presented at

- N26 Barcelona | Android Meetup (February / 2020)
- GDG-SP Android Meetup #78 (March / 2020)


Ubiratan Soares

February 20, 2020

Transcript

  1. EXPERIMENTS FOR ANDROID BUILDS driven by Gradle Profiler Ubiratan Soares

    February / 2020
  2. A problem like a big Android project and a really

    slow build …
  3. ✓ More modules ✓ Dagger reflect ✓ Gradle Enterprise Buck!

    Buck! Buck! Can we have i9 Macs?
  4. A problem like speculative statements

  5. “The problem is Gradle, it is just too slow and

    can’t scale for a multi-hundred-module build”
  6. “We enabled remote build caching, but I feel that our

    build did not improve that much …”
  7. “How do you know that such a Dagger setup actually is

    faster than mine? Do you have any data to prove it? Otherwise, I prefer my way”
  8. You can’t improve what you can’t measure

  9. https://www.youtube.com/watch?v=hBkIKfzd7Ms

  10. None
  11. Scientific results are validated by consistent and repeated verifications (1)

    applied on real-world data and (2) driven by an accepted and battle-tested methodology. Not by vague statements nor by personal tastes
  12. Define Scenario Profile Scenario Identify Bottleneck Fix Bottleneck Verify fix

    Proposed methodology to figure out and solve bottlenecks
  13. Define Scenario Profile Scenario Identify Bottleneck Fix Bottleneck Verify fix

  14. Measuring builds

  15. None
  16. None
  17. Gradle Profiler Features Tooling API Cold/warm builds Daemon control Benchmarking

    Profiling Multiple build systems Multiple profilers Scenario definitions Incremental builds evaluation Etc
  18. Benchmarking

  19. None
  20. With Scenarios
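A scenario file describes what Gradle Profiler should run. A minimal sketch mirroring the scenario syntax shown later in the deck; the file name and scenario name here are hypothetical:

```
# performance.scenarios (hypothetical file name)
assemble_debug {
    title = "Assemble Debug APK"
    tasks = ["assembleDebug"]
    daemon = warm
    cleanup-tasks = ["clean"]
}
```

The scenario can then be benchmarked with something like `gradle-profiler --benchmark --scenario-file performance.scenarios assemble_debug`.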

  21. Benchmark Results (report dated 19/02/2020)

  22. None
  23. Benchmark HTML report (benchmark.html)

  24. Benchmark HTML report (benchmark.html); CSV also available
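Since the benchmark results are also exported as CSV, the measured build times can be loaded for analysis. A minimal stdlib-only sketch; the exact CSV layout varies by Gradle Profiler version, so the `column` and `skip_rows` defaults are assumptions you must adjust to your actual report:

```python
import csv

def load_build_times(path, column=1, skip_rows=1):
    """Load measured build times (ms) from a benchmark CSV.

    `column` and `skip_rows` are assumptions about the layout:
    adjust them to match the header rows and the column holding
    your scenario's timings in your Gradle Profiler version.
    """
    times = []
    with open(path, newline="") as handle:
        for i, row in enumerate(csv.reader(handle)):
            if i < skip_rows or not row:
                continue
            try:
                times.append(float(row[column]))
            except (IndexError, ValueError):
                continue  # skip non-numeric rows (e.g. extra headers)
    return times
```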

  25. Evaluating Measurements

  26. Statistical Inference: Population (mean µ) → Sampling → Exploratory / refined data (mean X̄) → Probability Analysis → Inferred parameter
  27. Pre-conditions • Samples are assigned randomly • Samples are obtained

    independently • Data follows a normal distribution
  28. Statistical Hypothesis • Population mean (µ) • H0 : µ = 70 • HA : µ > 70

    • Null hypothesis (H0) : the status quo • Alternative hypothesis (HA) : something we suspect actually happens
  29. Hypothesis testing Population Sample Data Mean (µ) Sampling Refinement and

    Validation Mean (X̄) Probability Analysis “If I pick another sample, what chance do I have to get the same results?”
  30. Statistical significance P(sample 1) = 99.99% P(sample 2) = 88.88%

    Sample 1 Probability Analysis Sample 2 alpha = 0.05 = 1 - 0.95 “Given that I have to live with the probability of observing the sample, the success rate of such an observation must be 95% or greater”.
  31. p-value Sample Probability Analysis Calculated score (e.g., Z or t)

    P-value (area) The probability of not observing this sample again
  32. None
  33. Get a good sample Define H0 and HA Figure out

    the shape of the test (tails) Calculate the statistical score Compute the p-value Compare with the significance level: smaller → reject H0 and accept HA; greater → can’t reject H0
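The steps above can be sketched in code. A minimal sketch using only the Python standard library; the critical value shown is a standard t-table entry for the sample sizes used later in the deck (two samples of 15) and is an assumption you must swap for your own degrees of freedom:

```python
import math
import statistics

def t_score(before, after):
    """Welch's t statistic for the difference of means (before - after)."""
    se = math.sqrt(
        statistics.variance(before) / len(before)
        + statistics.variance(after) / len(after)
    )
    return (statistics.mean(before) - statistics.mean(after)) / se

def improves_build(before, after, t_critical=1.701):
    """Left-tailed test: H0 says the means match, HA says `after` is lower.

    1.701 is the one-tailed critical t value for alpha = 0.05 at
    ~28 degrees of freedom (two samples of 15); look up the value
    matching your own sample sizes.
    """
    # A t score beyond the critical value means p-value < alpha:
    # reject H0 and accept HA.
    return t_score(before, after) > t_critical
```

With two lists of measured build times, `improves_build(before, after)` returning `True` means there is statistical evidence that the change made the build faster.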
  34. A framework to A/B test your builds

  35. Let's do a statistical test for the difference of means

    provided as the result of two independent benchmarks and figure out if we have evidence strong enough to demonstrate that some change actually improves our build time
  36. Define the improvement Measure the status quo Apply changes Measure

    again Compare measurements Let's do a statistical test for the difference of means calculated after two benchmarks and figure out if we have evidence strong enough to demonstrate that some change actually improves our build time
  37. Define the improvement Measure the status quo Apply changes Measure

    again Compare measurements H0 : the mean after applying the improvement is statistically the same versus HA : the mean after applying the improvement is statistically lower than before
  38. Define the improvement Measure the status quo Apply changes Measure

    again Compare measurements Sample extraction (Gradle Profiler)
  39. Define the improvement Measure the status quo Apply changes Measure

    again Compare measurements Small samples (n < 30) or big samples (n ≥ 30)? Small samples
  40. Define the improvement Measure the status quo Apply changes Measure

    again Compare measurements Given the small sample size, the t-statistic will be used in a left-tailed test with significance level alpha = 0.05
  41. None
  42. None
  43. None
  44. Experimenting in the wild

  45. Running Environment • 2019 MacBook Pro • Intel Core i7 (6 Cores)

    • 16GB RAM
  46. Building with different JDKs

  47. #1 Building with different JDKs Target : SdkSearch Hypothesis :

    JDK 11 is faster than JDK 8 https://github.com/JakeWharton/SdkSearch H0 : Amazon Corretto 8 HA : GraalVM 11
  48. Scenario Execution build { title = "Run full build" tasks

    = "build" daemon = warm cleanup-tasks = ["clean"] } • 4 warmed-up builds • 15 measured builds (sample) #1 Building with different JDKs
  49. None
  50. None
  51. #1 Building with different JDKs Results • p-value ≫ alpha

    • HA has not been accepted • No statistical evidence of faster builds with GraalVM 11
  52. Bumping Gradle and AGP

  53. Target : I/O Sched Hypothesis : newer versions are faster

    https://github.com/google/iosched H0 : AGP 3.4.1 and Gradle 5.1.1 HA : AGP 3.5.3 and Gradle 6.2 #2 Bumping Gradle and AGP
  54. Scenario Execution build { title = "Assemble Debug APK" tasks

    = "mobile:assembleDebug" daemon = warm cleanup-tasks = ["clean"] } • 4 warmed-up builds • 15 measured builds (sample) #2 Bumping Gradle and AGP
  55. None
  56. None
  57. None
  58. Results • p-value ≪ alpha • HA has been accepted

    • We have REALLY STRONG statistical evidence that such bumps promote faster builds #2 Bumping Gradle and AGP
  59. New pipeline for resources provided by AGP 3.6.x

  60. Target : Plaid Hypothesis : the new pipeline is faster https://github.com/android/plaid

    H0 : AGP 3.6.0 HA : AGP 3.6.0 with android.namespacedRClass = true android.enableAppCompileTimeRClass = true #3 AGP 3.6.x new pipeline for resources
  61. Scenario Execution build { title = "Assemble Debug APK" tasks

    = "app:assembleDebug" daemon = warm cleanup-tasks = ["clean"] } • 5 warmed-up builds • 20 measured builds (sample) #3 AGP 3.6.x new pipeline for resources
  62. LIVE DEMO

  63. #3 AGP 3.6.x new pipeline for resources Results • p-value

    ≫ alpha • HA has not been accepted • No statistical evidence of faster builds with the new AGP 3.6.x flags
  64. Final Remarks

  65. • Play around with Gradle Profiler • Design your experiment •

    Run it !!! • Make your decisions based on data, not on personal opinions Call to action!
  66. “It doesn't matter how beautiful your theory is, it doesn't

    matter how smart you are. If it doesn't agree with experiment, it's wrong” - Richard Feynman
  67. UBIRATAN SOARES Brazilian Computer Scientist Senior Software Engineer @ N26

    GDE for Android and Kotlin @ubiratanfsoares ubiratansoares.dev
  68. https://speakerdeck.com/ubiratansoares

  69. Thanks