
Chasing Performance Issues Methodically

Amanjeet Singh
September 04, 2022


Solving performance problems at scale has always been tricky, and there is a lot of confusion about how to address them on Android. In this talk, we will look at the nature of performance problems, how to select tooling for them, and how to build a generic approach that applies to most of them.


Transcript

  1. Chasing Performance Issues Methodically
    @droid_singh
    @amanjeetsingh150
    Amanjeet Singh



  19. What is Performance?
    Story 📘
    Pattern 1: Scalable ≠ High Performant Codebase
    • Not upgrading libraries
    • Partial quality gates
    • Not scoping Dagger dependencies properly (@Singleton)
    • Selecting configs according to business metrics only

    Pattern 2: We get to know about issues from customer tickets or Android Vitals

    Pattern 3: Prioritising performance issues against other feature work is an issue
    • How are performance issues linked with business?
    • Time to solve one performance issue? 🤷

  20. Platform Team of Company X



  28. Mission I: Reducing Cold Start
    • Drafting OKRs for reducing cold start
    • Draft I
      • Uplifting the app quality of the X consumer app (O)
      • Key Result I: Reduce cold start by 60%
      • Key Result II: Reduce wake locks by 30%
      • Key Result III: Reduce frame drops by 50%
    😖
    • No observability
    • No quality gate
    • Every fix might not have a measurable effect on performance
    Pattern 4: Drafting OKRs for platform teams is difficult


  35. First few attempts to fix cold start ⏰
    1. Looking online for blogs: "Reducing cold starts at company X by 80%"
    2. Playing with tools like profilers locally to identify bottlenecks on app start and fixing them
    3. Randomly changing something according to experience
    Theoretically ✅ 🚀


  39. First few attempts to fix cold start
    Anti-methodologies for performance analysis
    • Blame-Someone-Else Anti-Method
    • Streetlight Anti-Method
    • Random Change Anti-Method


  43. What are we doing wrong?
    • Are we chasing the right metric for cold start?
    • Is there a cold start case we are not considering?
    • Maybe there were improvements, but they were neutralised by changes from other developers 🤔


  54. Sit back and Strategize! 🤔
    Step 1: Identify proper metrics and create observability
    • Cold start app launch
    • Different spans for app launch
      • Google (ContentProvider → first screen drawn)
      • User experienced (diagram markers: onCreate end, onStart)
    • Send app launch events with the following attributes:
      • First screen name
      • Total time
      • User ID
    • Tool selection: Firebase Analytics


  61. Sit back and Strategize! 🤔
    // Start
    AppLaunchTracker.startTracking()

    // End
    AppLaunchTracker.stopTracking(firstScreenName)

    // Track the event (custom event and parameter names)
    val bundle = Bundle().apply {
        putString("user_id", userId)
        putString("first_screen_drawn", firstScreenName)
        putString("total_duration", totalTime)
    }
    val firebaseAnalytics = FirebaseAnalytics.getInstance(this)
    firebaseAnalytics.logEvent("app_launch", bundle)

    Query the data and create intelligence
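    The slides reference an AppLaunchTracker but never show its internals. Below is a minimal sketch of what such a tracker could look like, assuming the start mark is taken as early as possible on a cold start (for example from a ContentProvider that initialises before Application.onCreate()) and the stop mark is taken when the first screen draws its first frame. Only the names AppLaunchTracker, startTracking and stopTracking come from the slides; everything else is illustrative.

    import android.app.Activity
    import android.os.SystemClock
    import android.view.View
    import android.view.ViewTreeObserver

    object AppLaunchTracker {

        private var startUptimeMs = 0L
        private var tracking = false

        // Call as early as possible on a cold start, e.g. from a ContentProvider's onCreate()
        fun startTracking() {
            startUptimeMs = SystemClock.uptimeMillis()
            tracking = true
        }

        // Call once the first screen has drawn; returns the launch duration in milliseconds,
        // which would be attached to the app_launch analytics event together with firstScreenName
        fun stopTracking(firstScreenName: String): Long? {
            if (!tracking) return null
            tracking = false
            return SystemClock.uptimeMillis() - startUptimeMs
        }

        // Approximates "first screen drawn" by stopping the tracker just before the
        // first frame of the given Activity is drawn
        fun stopOnFirstDraw(activity: Activity) {
            val content = activity.findViewById<View>(android.R.id.content)
            content.viewTreeObserver.addOnPreDrawListener(object : ViewTreeObserver.OnPreDrawListener {
                override fun onPreDraw(): Boolean {
                    content.viewTreeObserver.removeOnPreDrawListener(this)
                    stopTracking(activity::class.java.simpleName)
                    return true
                }
            })
        }
    }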


  68. Tracker
    Attributes for app launch
    Insights from data:
    • The splash screen is not the only contributor to first_screen_drawn
    • The app icon click is not the only trigger
    • Campaigns
    [Chart: percentage distribution of app launches by first screen name — Survey Screen, Splash, Home Screen]


  78. Sit back and Strategize! 🤔
    Step 2: Stop the bleeding by creating a baseline
    • Creating observability on the PRs
    • The ideal tool for this:
      • Observability on tentative regressions
      • Detecting outliers and noise
      • Surfacing regressions mapped to developers and teams
      • Infra to run the tests
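    The deck does not name the tool that was eventually picked for the PR quality gate. As one illustration of how per-change cold-start numbers can be produced on CI, here is a minimal Jetpack Macrobenchmark test; the package name is a placeholder, and the regression detection, outlier filtering and developer mapping described above would still need to be built around the numbers it emits.

    import androidx.benchmark.macro.StartupMode
    import androidx.benchmark.macro.StartupTimingMetric
    import androidx.benchmark.macro.junit4.MacrobenchmarkRule
    import androidx.test.ext.junit.runners.AndroidJUnit4
    import org.junit.Rule
    import org.junit.Test
    import org.junit.runner.RunWith

    @RunWith(AndroidJUnit4::class)
    class ColdStartBenchmark {

        @get:Rule
        val benchmarkRule = MacrobenchmarkRule()

        @Test
        fun coldStart() = benchmarkRule.measureRepeated(
            packageName = "com.app.id",              // placeholder package name
            metrics = listOf(StartupTimingMetric()), // time to initial display
            iterations = 5,
            startupMode = StartupMode.COLD
        ) {
            pressHome()
            startActivityAndWait()
        }
    }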

  82. Sit back and Strategize! 🤔
    Common factors of noise and outliers
    • Network (🎹 orchestration of the test with mitm)
    • Remote Configuration (/data/data/com.app.id/files/frc__firebase_activate.json)
    • Debug Builds
    • App/Device based noise


  86. Sit back and Strategize! 🤔
    Why not debug builds?
    • Account for ProGuard and DexGuard effects
    • Don't account for debug artifacts
    • Take a build configuration as close to release builds as possible
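    One way to satisfy the points above without measuring raw debug builds is a dedicated build type that inherits from the release configuration but stays installable on test devices. This Gradle Kotlin DSL snippet is a sketch, not taken from the deck:

    android {
        buildTypes {
            create("benchmark") {
                // Inherit minification so ProGuard/DexGuard effects are part of the measurement
                initWith(buildTypes.getByName("release"))
                // Sign with the debug key so CI and local devices can install the build
                signingConfig = signingConfigs.getByName("debug")
                // Use release variants of library modules
                matchingFallbacks += listOf("release")
            }
        }
    }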



  93. Sit back and Strategize! 🤔
    App/Device based noise
    • Random GC triggers
    • System dialogs from crashes/ANRs
    • Device configuration, like CPU frequency
    • CPU/runtime optimisations in general
    • …and the list never ends
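    Because this list is effectively never-ending, one option is to record some of the noise next to each measurement and filter outliers afterwards, instead of trying to eliminate every source. A small sketch, not from the deck, using the platform's ART runtime stats:

    import android.os.Debug

    // Captured before and after a measured run; a large delta in GC count or GC time
    // marks the run as noisy so it can be excluded from the baseline
    data class RuntimeNoiseSnapshot(val gcCount: Long, val gcTimeMs: Long) {
        companion object {
            fun capture() = RuntimeNoiseSnapshot(
                gcCount = Debug.getRuntimeStat("art.gc.gc-count")?.toLongOrNull() ?: 0L,
                gcTimeMs = Debug.getRuntimeStat("art.gc.gc-time")?.toLongOrNull() ?: 0L
            )
        }
    }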


  95. Sit back and Strategize! 🤔
    Step 2: Stop the bleeding by creating a baseline
    • Creating observability on the PRs
    • The ideal tool for this:
      • Observability on tentative regressions
      • Detecting outliers and noise
      • Surfacing regressions mapped to developers and teams
      • Infra to run the tests
    Tool Selection


  99. Tracker
    Attributes for app launch
    Insights from data:
    • The splash screen is not the only contributor to first_screen_drawn
    • The app icon click is not the only trigger
    • Campaigns
    Eyes on the changes entering while you fix
    Alerts for PRs on Slack and email
    Flame graphs, diff, VCS metadata


  101. Sit back and Strategize! 🤔
    Production analytics
    Quality gate on baseline branch




  108. Sit back and Strategize! 🤔
    Step 3: Extract impacted sessions and collect production traces from the impacted users
    • Definition of an impacted session for app launch
      • Google Android Vitals: >= 5 seconds
      • Discussion with product stakeholders
    • Query the users having more than X percent of their sessions impacted
    • Tool selection: Firebase Realtime Database, Firebase User Properties and the Debug API to the rescue
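    A worked example of that definition, assuming the launch duration measured by the tracker is available in milliseconds: the 5-second threshold mirrors the Android Vitals bar for cold starts, and the X percent cut-off is whatever was agreed with product stakeholders.

    // Illustrative helpers, not from the deck
    const val IMPACTED_LAUNCH_THRESHOLD_MS = 5_000L // Android Vitals: cold start >= 5 s

    fun isImpactedSession(launchDurationMs: Long): Boolean =
        launchDurationMs >= IMPACTED_LAUNCH_THRESHOLD_MS

    // A user is impacted when more than `thresholdPercent` of their sessions are impacted
    fun isImpactedUser(launchDurationsMs: List<Long>, thresholdPercent: Double): Boolean {
        if (launchDurationsMs.isEmpty()) return false
        val impacted = launchDurationsMs.count { isImpactedSession(it) }
        return impacted * 100.0 / launchDurationsMs.size > thresholdPercent
    }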


  110. Sit back and Strategize! 🤔
    Step 3: Extract impacted sessions and collect production traces from the impacted users
    Segment impacted users with Firebase → Upload traces from the impacted users


  112. Sit back and Strategize! 🤔
    1. Create Firebase User Properties on the Console
    profile_app_startup
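    Registering the property on the Console only declares it; on the client it is a single call. A sketch, assuming the decision that this user should be profiled has already been made (for example via the Realtime Database flag on the next slides), and the function name is illustrative:

    import android.content.Context
    import com.google.firebase.analytics.FirebaseAnalytics

    // Mark this user so Remote Config conditions can target them
    fun markUserForStartupProfiling(context: Context) {
        FirebaseAnalytics.getInstance(context).setUserProperty("profile_app_startup", "true")
    }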


  117. Sit back and Strategize! 🤔
    2. Firebase Realtime Database
    users
      user-id-1
        user_property_a: true
      user-id-2
        user_property_a: true
        profile_app_startup: true
        user_property_b: true
    Dynamically enable app start-up profiling for these users (one possible wiring is sketched below)
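    The deck does not show how the flag travels from the database into the app. One plausible wiring, assuming the analysis pipeline writes profile_app_startup under the impacted user's node, is to read that node on start and mirror it into the Analytics user property that the Remote Config condition targets:

    import android.content.Context
    import com.google.firebase.analytics.FirebaseAnalytics
    import com.google.firebase.database.DataSnapshot
    import com.google.firebase.database.DatabaseError
    import com.google.firebase.database.FirebaseDatabase
    import com.google.firebase.database.ValueEventListener

    fun syncProfilingFlag(context: Context, userId: String) {
        FirebaseDatabase.getInstance()
            .getReference("users/$userId/profile_app_startup")
            .addListenerForSingleValueEvent(object : ValueEventListener {
                override fun onDataChange(snapshot: DataSnapshot) {
                    if (snapshot.getValue(Boolean::class.java) == true) {
                        // Mirror the flag into the user property targeted by Remote Config
                        FirebaseAnalytics.getInstance(context)
                            .setUserProperty("profile_app_startup", "true")
                    }
                }

                override fun onCancelled(error: DatabaseError) {
                    // Ignore: profiling simply stays disabled for this session
                }
            })
    }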


  121. Sit back and Strategize! 🤔
    3. Enable via Remote Config
    • Create a parameter enable_trace_app_launch
    • Create a condition on the user property profile_app_startup
    • Enable the parameter for that condition and default it to false
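    On the client, the parameter is then fetched and read like any other Remote Config flag. A minimal sketch of what the isEnabled(...) wrapper used on the next slides could sit on top of; the function name is illustrative:

    import com.google.firebase.remoteconfig.FirebaseRemoteConfig

    fun setUpLaunchTracing() {
        val remoteConfig = FirebaseRemoteConfig.getInstance()

        // The default keeps tracing off unless the profile_app_startup condition matches
        remoteConfig.setDefaultsAsync(mapOf("enable_trace_app_launch" to false))
        remoteConfig.fetchAndActivate().addOnCompleteListener {
            val traceAppLaunch = remoteConfig.getBoolean("enable_trace_app_launch")
            // Hand traceAppLaunch to the AppLaunchTracker / RemoteConfig wrapper shown below
        }
    }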



  125. Sit back and Strategize! 🤔
    4. Debug API
    • Fetch performance traces for the impacted users from production

    class AppLaunchTracker(private val remoteConfig: RemoteConfig) {

        fun startTracking() {
            ...
            if (remoteConfig.isEnabled("enable_trace_app_launch")) {
                // Sampling-based method tracing keeps the overhead manageable in production
                Debug.startMethodTracingSampling(
                    File(context.cacheDir, "app_launch.trace").absolutePath,
                    maxBufferSize,
                    samplingIntervalUs
                )
            }
        }
    }

  126. Sit back and Strategize! 🤔
    4. Debug API
    • Fetch performance traces for the impacted users from production

    class AppLaunchTracker(private val remoteConfig: RemoteConfig) {

        fun stopTracking() {
            ...
            if (remoteConfig.isEnabled("enable_trace_app_launch")) {
                Debug.stopMethodTracing()
            }
        }
    }


  128. Sit back and Strategize! 🤔
    4. Debug API
    • Fetch performance traces for the impacted users from production
    • Upload the traces from a periodic WorkManager job (sketched below)
      • Firebase Storage
      • Multipart API upload
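    A sketch of the upload path, assuming the traces written by Debug.startMethodTracingSampling land in cacheDir and are pushed to Firebase Storage by a periodic worker. Class and path names are illustrative, and the multipart-API alternative mentioned above would simply replace the Storage call.

    import android.content.Context
    import android.net.Uri
    import androidx.work.PeriodicWorkRequestBuilder
    import androidx.work.WorkManager
    import androidx.work.Worker
    import androidx.work.WorkerParameters
    import com.google.android.gms.tasks.Tasks
    import com.google.firebase.storage.FirebaseStorage
    import java.util.concurrent.TimeUnit

    class TraceUploadWorker(context: Context, params: WorkerParameters) : Worker(context, params) {

        override fun doWork(): Result {
            val storage = FirebaseStorage.getInstance().reference
            applicationContext.cacheDir
                .listFiles { file -> file.name.endsWith(".trace") }
                ?.forEach { trace ->
                    // A blocking upload is fine here: Workers run on a background thread
                    Tasks.await(storage.child("traces/${trace.name}").putFile(Uri.fromFile(trace)))
                    trace.delete()
                }
            return Result.success()
        }
    }

    // Schedule once, e.g. from Application.onCreate()
    fun scheduleTraceUploads(context: Context) {
        WorkManager.getInstance(context).enqueue(
            PeriodicWorkRequestBuilder<TraceUploadWorker>(12, TimeUnit.HOURS).build()
        )
    }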


  130. Tracker
    Attributes for app launch
    Insights from data:
    • The splash screen is not the only contributor to first_screen_drawn
    • The app icon click is not the only trigger
    • Campaigns
    Eyes on the changes entering while you fix
    Alerts for PRs on Slack and email
    Flame graphs, diff, VCS metadata
    Traces from production

  131. Sit back and Strategize! 🤔
    Flame graphs


  136. Sit back and Strategize! 🤔
    What we achieved with flame graphs
    • Studying issues that are consistent across the impacted flame graphs
    • Creating a priority list of the issues that need to be fixed, based on:
      • The performance gain they will provide
      • The effort of the fix:
        • Refactor
        • Upgrading a library
        • Executing on a background thread


  140. Step 4: Let's ship the fixes
    Some of the issues the team found
    • DexGuard class encryption increasing initialisation time
    • Consistently expensive initialisation of some SDKs
    • Unwanted dependencies getting injected through Dagger
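    For the Dagger finding, deferring construction is often enough to take an expensive dependency off the start-up path. A small illustration, not from the deck, using dagger.Lazy; AnalyticsSdk stands in for whichever SDK wrapper was expensive to initialise:

    import dagger.Lazy
    import javax.inject.Inject

    // Stand-in for whichever SDK wrapper was expensive to construct
    class AnalyticsSdk @Inject constructor() {
        fun track(event: String) { /* expensive SDK call */ }
    }

    class HomeViewModel @Inject constructor(
        // Lazy<T> defers building the expensive dependency until first use, so it no
        // longer runs while HomeViewModel is injected on the cold-start path
        private val analytics: Lazy<AnalyticsSdk>
    ) {
        fun onUserAction() {
            analytics.get().track("user_action")
        }
    }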

  141. Partayyy!! 🎉


  151. Conclusion
    • Creating observability is the most important part; the fix itself can be a one-line change
    • Let's revisit our patterns:
      • Pattern 1: Scalable ≠ High Performant Codebase
      • Pattern 2: We get to know about issues from customer tickets or Android Vitals
      • Pattern 3: Prioritising performance issues against other feature work is an issue
      • Pattern 4: Drafting OKRs for platform teams is difficult
    Proposal:
    • KR 1: Bring observability to metrics x, y, z, a, b, c (first screen, app launch time, user ID)
    • KR 2: Reach 70% accuracy on attempts to fix cold start
    • KR 3: Create a quality gate and bring X% confidence in detecting regressions

  152. Fin
    🙋
    @droid_singh
    @amanjeetsingh150
    Amanjeet Singh
