Experiments for your Android Builds driven by Gradle Profiler

Companion slides for my talk on applying statistics to improve Gradle builds

Presented at

- N26 Barcelona | Android Meetup (February / 2020)
- GDG-SP Android Meetup #78 (March / 2020)
- Droidcon EMEA Online (October / 2020)
- Android Summit Online (October / 2020)

Ubiratan Soares

October 08, 2020

Transcript

  1. EXPERIMENTS FOR
    ANDROID BUILDS
    driven by Gradle Profiler
    Ubiratan Soares
    October / 2020

  2. https://n26.com/en/careers

  3. A problem like a big
    Android project and a
    really slow build …

  4. “You can’t improve
    what you can’t measure”
    Someone, somewhen

  5. https://www.youtube.com/watch?v=hBkIKfzd7Ms

  6. Measuring builds

  7. Gradle Profiler Features
    Tooling API
    Cold/warm builds
    Daemon control
    Benchmarking
    Profiling
    Multiple build systems
    Multiple profilers
    Scenarios definition
    Incremental builds evaluation
    Etc.

  8. Installing with SDKMAN!
    https://github.com/gradle/gradle-profiler/releases
    sdk install gradleprofiler

  9. Installing with Homebrew
    brew install gradle-profiler

  10. Benchmarking
    gradle-profiler

  11. Running Scenarios
    gradle-profiler
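
The command lines shown on the benchmarking and scenario slides did not survive in this transcript. As a rough sketch of what such invocations might look like, the Python helper below assembles a `gradle-profiler` command. The `--benchmark`, `--project-dir`, `--scenario-file`, `--warmups` and `--iterations` options are documented in the Gradle Profiler README; the helper name, the `performance.scenarios` file name and the scenario name `build` are hypothetical and only serve the illustration.

```python
import shlex


def profiler_command(project_dir, targets, scenario_file=None,
                     warmups=None, iterations=None):
    """Assemble a gradle-profiler benchmark command as a list of arguments.

    `targets` holds Gradle task names, or scenario names when
    `scenario_file` is given. All values used below are illustrative.
    """
    command = ["gradle-profiler", "--benchmark", "--project-dir", project_dir]
    if scenario_file is not None:
        command += ["--scenario-file", scenario_file]
    if warmups is not None:
        command += ["--warmups", str(warmups)]
    if iterations is not None:
        command += ["--iterations", str(iterations)]
    return command + list(targets)


# Benchmark a Gradle task directly (hypothetical task name).
direct = profiler_command(".", ["assembleDebug"])

# Benchmark a named scenario with explicit warm-up and measured build counts.
scenario = profiler_command(".", ["build"],
                            scenario_file="performance.scenarios",
                            warmups=4, iterations=15)

print(shlex.join(direct))
print(shlex.join(scenario))
```

The lists can be passed to `subprocess.run` as they are; printing them with `shlex.join` just shows the equivalent shell command.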

  12. Evaluating
    Measurements

  13. Build  Benchmark #01  Benchmark #02
      1      4              4.4
      2      5              5.1
      3      5.1            5.2
      4      4.4            6.2
      5      3.9            3.4
      6      4.2            6.2
      7      4.6            4.6
      8      4.5            4.5
      9      4.4            3.3
      10     4              4.4
      (Plot of both benchmarks; axis from 3 to 7)

  14. Benchmark  Mean  Standard Deviation
      #01        4.46  0.41
      #02        4.77  1.04
      (Plot of benchmarks #01 and #02; axis from 3 to 7)
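
As a minimal sketch of how such a summary can be produced, the snippet below computes the mean and the sample standard deviation with Python's standard library, using the measurements from the table on slide 13. The function name `summarize` is made up for this example. Because the table appears to show rounded values, the results may not match the slide's figures exactly.

```python
from statistics import mean, stdev

# Measurements copied from the table on slide 13.
benchmark_01 = [4, 5, 5.1, 4.4, 3.9, 4.2, 4.6, 4.5, 4.4, 4]
benchmark_02 = [4.4, 5.1, 5.2, 6.2, 3.4, 6.2, 4.6, 4.5, 3.3, 4.4]


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of build times."""
    return mean(samples), stdev(samples)


for name, samples in (("#01", benchmark_01), ("#02", benchmark_02)):
    avg, spread = summarize(samples)
    print(f"{name}: mean={avg:.2f} stdev={spread:.2f}")
```

Two benchmarks with similar means can still have quite different spreads, which is why the mean alone is a weak basis for comparing builds.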

  15. https://towardsdatascience.com/why-averages-are-often-wrong-1ff08e409a5b

  16. Benchmark  Mean  Standard Deviation
      #01        4.46  0.41
      #02        4.77  1.04
      (Plot; axis from 3 to 7)

  17. When Build
    Engineering meets
    Data Science

  18. Statistical Inference
    Population
    Exploratory data
    Mean (µ)
    Sampling
    Refined data
    Mean (X̄)
    Probability analysis
    Inferred parameter

  19. Alice: Android Tech Lead
    Bob: Android Engineer

  20. “I killed so many annotations in my PR
    that now I’m sure app:assembleDebug is
    running faster!!!!”

    “Did you run benchmarks for it
    with Gradle Profiler???”

  21. “Yes I did, before and after my changes.
    But I’m not sure how to demonstrate
    the improvements …”

    “I can help with that!”

  22. Statistical Hypothesis
    • Null hypothesis (H0)
    • Alternative hypothesis (HA)

    “I missed that class, I guess …”

    Population
    Mean (µ)

  23. Null hypothesis (H0) is the status quo: what we
    have right now in our trunk branch, if you prefer.
    Alternative hypothesis (HA) is what you want
    to demonstrate.

  24. Statistical significance
    Sample 1, Sample 2
    Probability analysis
    P(sample 1) = 99.99%
    P(sample 2) = 88.88%
    alpha = significance level = 0.05 = 1 - 0.95

    “Probably I missed that class too …”

  25. “Your modifications will mean real
    improvements if I execute 100 runs of
    app:assembleDebug and see an execution
    faster than 5000 ms for at least 95 of them.”

  26. Law of Large Numbers
    Sample size?
    >= 30: Normal distribution
    < 30: Student's t-distribution

    “Super easy!”

  27. “Given that app:assembleDebug is
    quite slow, we will consider
    benchmarks running between
    15 and 25 measured builds;
    therefore, Student's t-distribution
    will model our probability curve.”

  28. p-value
    Sample
    Probability analysis
    Critical value (e.g., Z or t)
    P-value (area)
    The probability of a Type I error
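
To make the "area" idea concrete, here is a minimal sketch that computes the left-tail area under Student's t-distribution using only Python's standard library. It integrates the density with Simpson's rule, which is an approach chosen for this illustration and not necessarily how any statistics tool does it; the function names are made up.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def left_tail_p_value(t, df, steps=2000):
    """Area under the t density to the left of `t`.

    The density is symmetric around 0, so the area is 0.5 plus the signed
    integral from 0 to t, approximated with Simpson's rule (`steps` must
    be even).
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


# 10 paired builds give 9 degrees of freedom.
print(left_tail_p_value(-1.833, 9))
```

With 9 degrees of freedom, a t statistic of -1.833 leaves roughly 5% of the area to its left, which is why 1.833 appears in t tables as the one-tailed critical value for alpha = 0.05.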



  29. “With the data you provided, the
    probability model tells me that out of every 100
    runs of app:assembleDebug, 3 of them
    will actually be false positives.”

  30. Critical value and tail analysis
    Left-tailed: µA < µ0
    Double-tailed: µA ≠ µ0
    Right-tailed: µA > µ0

    “I do need some coffee. Do you?”

  31. µA <= µ0

    “Faster builds imply that the
    observed mean after your
    modifications is smaller than
    the value we had before, so
    we want a left-tailed analysis.”

  32. “But we don’t know the actual
    average value of app:assembleDebug
    for all possible builds …”

    “We will compare the samples for it
    captured at two different moments
    under slightly different conditions.
    This is called a paired test.”

  33. “Please tell me that I won’t have to
    do all the calculations by myself …”

    “Yes, you will.”

  34. “Just joking. There are tools
    we can use.”


  35. https://www.statskingdom.com/160MeanT2pair.html

  36. A framework to
    drive experiments for
    your Gradle builds

  37. Gradle task
    Benchmark #01 (status quo)
    Benchmark #02 (modifications)
    Left-tailed paired t-test with alpha (0.05)
    p-value
    Compare p-value and alpha
    p-value is BIGGER: evidence that build has improved is
    statistically WEAK
    p-value is SMALLER: evidence that build has improved is
    statistically STRONG
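
The decision flow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculator linked on slide 35: it computes the paired t statistic and compares it with a one-tailed critical value from a standard t table, which is equivalent to comparing the p-value with alpha. The function names and the `CRITICAL_T_005` table are made up for this example, and the slide 13 measurements are reused purely as sample data (the deck does not state that those two columns are a before/after pair).

```python
from math import sqrt
from statistics import mean, stdev

# One-tailed critical values of Student's t for alpha = 0.05, keyed by
# degrees of freedom (from a standard t table; only two entries listed).
CRITICAL_T_005 = {9: 1.833, 14: 1.761}


def paired_t_statistic(before, after):
    """t statistic computed on the paired differences (after - before)."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def improved(before, after):
    """Left-tailed decision at alpha = 0.05.

    Returns True when t falls below the negative critical value, i.e. when
    the evidence that builds got faster is statistically strong.
    """
    t = paired_t_statistic(before, after)
    return t < -CRITICAL_T_005[len(before) - 1]


status_quo = [4, 5, 5.1, 4.4, 3.9, 4.2, 4.6, 4.5, 4.4, 4]
modified = [4.4, 5.1, 5.2, 6.2, 3.4, 6.2, 4.6, 4.5, 3.3, 4.4]

print(paired_t_statistic(status_quo, modified))
print(improved(status_quo, modified))
```

For this sample data the mean difference is positive (the second column is slower on average), so the statistic cannot land in the left tail and the evidence of improvement is weak.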

  38. Running Environment
    • 2018 MacBook Pro
    • Intel Core i7 (6 cores)
    • 16GB RAM
    .gradle/gradle.properties:
    kotlin.parallel.tasks.in.project=true
    kapt.use.worker.api=true
    kapt.include.compile.classpath=true
    kapt.incremental.apt=false
    org.gradle.workers.max=6

  39. Example #01
    Target
    https://github.com/JakeWharton/SdkSearch
    Scenario
    build {
        title = "Assemble Debug APK"
        tasks = ["mobile:assembleDebug"]
        daemon = warm
        cleanup-tasks = ["clean"]
    }
    Execution
    • 4 warmed-up builds
    • 15 measured builds (samples)
    Hypothesis
    H0: No meaningful build improvements when building with newer JDKs
    Ha: JDK11 delivers faster Gradle builds than JDK8

  40. #1 Building with different JDKs
    • P-value > alpha

  41. #1 Building with different JDKs
    • P-value > alpha
    • Ha has not been accepted

  42. #1 Building with different JDKs
    • P-value > alpha
    • Ha has not been accepted
    • No STRONG statistical evidence of
    faster builds with JDK11

  43. Example #02
    Target
    https://github.com/google/iosched
    Scenario
    build {
        title = "Assemble Debug APK"
        tasks = ["mobile:assembleDebug"]
        daemon = warm
        cleanup-tasks = ["clean"]
    }
    Execution
    • 4 warmed-up builds
    • 15 measured builds (samples)
    Hypothesis
    Current = AGP 3.4.1 and Gradle 5.1.1
    H0: Bumps deliver no meaningful build improvements
    Ha: Bumps to AGP 3.5.3 and Gradle 6.2 deliver faster builds

  44. Final Remarks

  45. Call to action!
    • Play around with Gradle Profiler
    • Design your experiment
    • Run it!!!
    • Make your decisions based on
    data generated and analysed in
    your particular context, not on
    tweets or subreddits

  46. UBIRATAN
    SOARES
    Brazilian Computer Scientist
    Senior Software Engineer @ N26
    GDE for Android and Kotlin
    @ubiratanfsoares
    ubiratansoares.dev
