
Chasing Performance Issues Methodically

Amanjeet Singh
September 04, 2022


Solving performance problems at scale has always been tricky, and there is a lot of confusion about how to address them on Android. In this talk, we will look at the nature of performance problems, how to select tooling for them, and how to build a generic approach that applies to most of them.


Transcript

  1. Chasing Performance Issues Methodically
    @droid_singh
    @amanjeetsingh150
    Amanjeet Singh



  19. What is Performance?
    Story 📘
    Pattern 1: Scalable ≠ High Performant Codebase
    • Not upgrading libraries
    • Partial quality gates
    • Not scoping Dagger dependencies properly (@Singleton)
    • Selecting configs according to business metrics only

    Pattern 2: We get to know about issues from customer tickets or Android Vitals

    Pattern 3: Prioritising performance issues against other feature work is an issue
    • How are performance issues linked with business?
    • Time to solve one performance issue? 🤷

  20. Platform Team of Company X



  28. Mission I: Reducing Cold Start
    • Drafting OKRs for reducing cold start
    • Draft I
      • Uplifting the app quality of the X consumer app (O)
      • Key Result I: Reduce cold start by 60%
      • Key Result II: Reduce wake locks by 30%
      • Key Result III: Reduce frame drops by 50%
    😖
    • No observability
    • No quality gate
    • Every fix might not have a measurable effect on performance
    Pattern 4: Drafting OKRs for platform teams is difficult


  35. First few attempts to fix cold start ⏰
    1. Looking online for blogs: "Reducing cold starts at company X by 80%"
    2. Playing with tools like profilers locally to identify bottlenecks on app start and fixing them
    3. Randomly changing something according to experience
    Theoretically ✅ 🚀


  39. First few attempts to fix cold start
    Anti-methodologies for performance analysis
    • Blame-Someone-Else Anti-Method
    • Streetlight Anti-Method
    • Random Change Anti-Method


  43. What are we doing wrong?
    • Are we chasing the right metric for cold start?
    • Is there a cold start case we are not considering?
    • Maybe there were improvements, but they were neutralised by changes from other developers 🤔


  54. Sit back and Strategize! 🤔
    Step 1: Identify proper metrics and create observability
    • Cold start app launch
    • Different spans for app launch
      • Google (ContentProvider → first screen drawn)
      • User experienced (diagram markers: onCreate end, onStart)
    • Send app launch events with the following attributes:
      • First screen name
      • Total time
      • User ID
    • Tool selection: Firebase Analytics


  61. Sit back and Strategize! 🤔
    // Start
    AppLaunchTracker.startTracking()

    // End
    AppLaunchTracker.stopTracking(firstScreenName)

    // Track the event (custom event and parameter names)
    val bundle = Bundle().apply {
        putString("user_id", userId)
        putString("first_screen_drawn", firstScreenName)
        putString("total_duration", totalTime)
    }
    val firebaseAnalytics = FirebaseAnalytics.getInstance(this)
    firebaseAnalytics.logEvent("app_launch", bundle)

    Query the data and create intelligence
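    The slides reference an AppLaunchTracker but never show its internals. Below is a minimal sketch of what such a tracker could look like, assuming the start mark is taken as early as possible on a cold start (for example from a ContentProvider that initialises before Application.onCreate()) and the stop mark is taken when the first screen draws its first frame. Only the names AppLaunchTracker, startTracking and stopTracking come from the slides; everything else is illustrative.

    import android.app.Activity
    import android.os.SystemClock
    import android.view.View
    import android.view.ViewTreeObserver

    object AppLaunchTracker {

        private var startUptimeMs = 0L
        private var tracking = false

        // Call as early as possible on a cold start, e.g. from a ContentProvider's onCreate()
        fun startTracking() {
            startUptimeMs = SystemClock.uptimeMillis()
            tracking = true
        }

        // Call once the first screen has drawn; returns the launch duration in milliseconds,
        // which would be attached to the app_launch analytics event together with firstScreenName
        fun stopTracking(firstScreenName: String): Long? {
            if (!tracking) return null
            tracking = false
            return SystemClock.uptimeMillis() - startUptimeMs
        }

        // Approximates "first screen drawn" by stopping the tracker just before the
        // first frame of the given Activity is drawn
        fun stopOnFirstDraw(activity: Activity) {
            val content = activity.findViewById<View>(android.R.id.content)
            content.viewTreeObserver.addOnPreDrawListener(object : ViewTreeObserver.OnPreDrawListener {
                override fun onPreDraw(): Boolean {
                    content.viewTreeObserver.removeOnPreDrawListener(this)
                    stopTracking(activity::class.java.simpleName)
                    return true
                }
            })
        }
    }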


  68. Tracker
    Attributes for app launch
    Insights from data:
    • The splash screen is not the only contributor to first_screen_drawn
    • The app icon click is not the only trigger
    • Campaigns
    [Chart: percentage distribution of app launches by first screen name — Survey Screen, Splash, Home Screen]


  78. Sit back and Strategize! 🤔
    Step 2: Stop the bleeding by creating a baseline
    • Creating observability on the PRs
    • The ideal tool for this:
      • Observability on tentative regressions
      • Detecting outliers and noise
      • Surfacing regressions mapped to developers and teams
      • Infra to run the tests
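    The deck does not name the tool that was eventually picked for the PR quality gate. As one illustration of how per-change cold-start numbers can be produced on CI, here is a minimal Jetpack Macrobenchmark test; the package name is a placeholder, and the regression detection, outlier filtering and developer mapping described above would still need to be built around the numbers it emits.

    import androidx.benchmark.macro.StartupMode
    import androidx.benchmark.macro.StartupTimingMetric
    import androidx.benchmark.macro.junit4.MacrobenchmarkRule
    import androidx.test.ext.junit.runners.AndroidJUnit4
    import org.junit.Rule
    import org.junit.Test
    import org.junit.runner.RunWith

    @RunWith(AndroidJUnit4::class)
    class ColdStartBenchmark {

        @get:Rule
        val benchmarkRule = MacrobenchmarkRule()

        @Test
        fun coldStart() = benchmarkRule.measureRepeated(
            packageName = "com.app.id",              // placeholder package name
            metrics = listOf(StartupTimingMetric()), // time to initial display
            iterations = 5,
            startupMode = StartupMode.COLD
        ) {
            pressHome()
            startActivityAndWait()
        }
    }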

  82. Sit back and Strategize! 🤔
    Common factors of noise and outliers
    • Network (🎹 orchestration of the test with mitm)
    • Remote Configuration (/data/data/com.app.id/files/frc__firebase_activate.json)
    • Debug Builds
    • App/Device based noise


  86. Sit back and Strategize! 🤔
    Why not debug builds?
    • Account for ProGuard and DexGuard effects
    • Don't account for debug artifacts
    • Take a build configuration as close to release builds as possible
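    One way to satisfy the points above without measuring raw debug builds is a dedicated build type that inherits from the release configuration but stays installable on test devices. This Gradle Kotlin DSL snippet is a sketch, not taken from the deck:

    android {
        buildTypes {
            create("benchmark") {
                // Inherit minification so ProGuard/DexGuard effects are part of the measurement
                initWith(buildTypes.getByName("release"))
                // Sign with the debug key so CI and local devices can install the build
                signingConfig = signingConfigs.getByName("debug")
                // Use release variants of library modules
                matchingFallbacks += listOf("release")
            }
        }
    }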



  93. Sit back and Strategize! 🤔
    App/Device based noise
    • Random GC triggers
    • System dialogs from crashes/ANRs
    • Device configuration, like CPU frequency
    • CPU/runtime optimisations in general
    • …and the list never ends
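    Because this list is effectively never-ending, one option is to record some of the noise next to each measurement and filter outliers afterwards, instead of trying to eliminate every source. A small sketch, not from the deck, using the platform's ART runtime stats:

    import android.os.Debug

    // Captured before and after a measured run; a large delta in GC count or GC time
    // marks the run as noisy so it can be excluded from the baseline
    data class RuntimeNoiseSnapshot(val gcCount: Long, val gcTimeMs: Long) {
        companion object {
            fun capture() = RuntimeNoiseSnapshot(
                gcCount = Debug.getRuntimeStat("art.gc.gc-count")?.toLongOrNull() ?: 0L,
                gcTimeMs = Debug.getRuntimeStat("art.gc.gc-time")?.toLongOrNull() ?: 0L
            )
        }
    }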


  95. Sit back and Strategize! 🤔
    Step 2: Stop the bleeding by creating a baseline
    • Creating observability on the PRs
    • The ideal tool for this:
      • Observability on tentative regressions
      • Detecting outliers and noise
      • Surfacing regressions mapped to developers and teams
      • Infra to run the tests
    Tool Selection


  99. Tracker
    Attributes for app launch
    Insights from data:
    • The splash screen is not the only contributor to first_screen_drawn
    • The app icon click is not the only trigger
    • Campaigns
    Eyes on the changes entering while you fix
    Alerts for PRs on Slack and email
    Flame graphs, diff, VCS metadata


  101. Sit back and Strategize! 🤔
    Production analytics
    Quality gate on baseline branch




  108. Sit back and Strategize! 🤔
    Step 3: Extract impacted sessions and collect production traces from the impacted users
    • Definition of an impacted session for app launch
      • Google Android Vitals: >= 5 seconds
      • Discussion with product stakeholders
    • Query the users having more than X percent of their sessions impacted
    • Tool selection: Firebase Realtime Database, Firebase User Properties and the Debug API to the rescue
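    A worked example of that definition, assuming the launch duration measured by the tracker is available in milliseconds: the 5-second threshold mirrors the Android Vitals bar for cold starts, and the X percent cut-off is whatever was agreed with product stakeholders.

    // Illustrative helpers, not from the deck
    const val IMPACTED_LAUNCH_THRESHOLD_MS = 5_000L // Android Vitals: cold start >= 5 s

    fun isImpactedSession(launchDurationMs: Long): Boolean =
        launchDurationMs >= IMPACTED_LAUNCH_THRESHOLD_MS

    // A user is impacted when more than `thresholdPercent` of their sessions are impacted
    fun isImpactedUser(launchDurationsMs: List<Long>, thresholdPercent: Double): Boolean {
        if (launchDurationsMs.isEmpty()) return false
        val impacted = launchDurationsMs.count { isImpactedSession(it) }
        return impacted * 100.0 / launchDurationsMs.size > thresholdPercent
    }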


  110. Sit back and Strategize! 🤔
    Step 3: Extract impacted sessions and collect production traces from the impacted users
    Segment impacted users with Firebase → Upload traces from the impacted users


  112. Sit back and Strategize! 🤔
    1. Create Firebase User Properties on the Console
    profile_app_startup
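    Registering the property on the Console only declares it; on the client it is a single call. A sketch, assuming the decision that this user should be profiled has already been made (for example via the Realtime Database flag on the next slides), and the function name is illustrative:

    import android.content.Context
    import com.google.firebase.analytics.FirebaseAnalytics

    // Mark this user so Remote Config conditions can target them
    fun markUserForStartupProfiling(context: Context) {
        FirebaseAnalytics.getInstance(context).setUserProperty("profile_app_startup", "true")
    }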


  117. Sit back and Strategize! 🤔
    2. Firebase Realtime Database
    users
      user-id-1
        user_property_a: true
      user-id-2
        user_property_a: true
        profile_app_startup: true
        user_property_b: true
    Dynamically enable app start-up profiling for these users (one possible wiring is sketched below)
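    The deck does not show how the flag travels from the database into the app. One plausible wiring, assuming the analysis pipeline writes profile_app_startup under the impacted user's node, is to read that node on start and mirror it into the Analytics user property that the Remote Config condition targets:

    import android.content.Context
    import com.google.firebase.analytics.FirebaseAnalytics
    import com.google.firebase.database.DataSnapshot
    import com.google.firebase.database.DatabaseError
    import com.google.firebase.database.FirebaseDatabase
    import com.google.firebase.database.ValueEventListener

    fun syncProfilingFlag(context: Context, userId: String) {
        FirebaseDatabase.getInstance()
            .getReference("users/$userId/profile_app_startup")
            .addListenerForSingleValueEvent(object : ValueEventListener {
                override fun onDataChange(snapshot: DataSnapshot) {
                    if (snapshot.getValue(Boolean::class.java) == true) {
                        // Mirror the flag into the user property targeted by Remote Config
                        FirebaseAnalytics.getInstance(context)
                            .setUserProperty("profile_app_startup", "true")
                    }
                }

                override fun onCancelled(error: DatabaseError) {
                    // Ignore: profiling simply stays disabled for this session
                }
            })
    }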


  121. Sit back and Strategize! 🤔
    3. Enable via Remote Config
    • Create a parameter enable_trace_app_launch
    • Create a condition on the user property profile_app_startup
    • Enable the parameter for that condition and default it to false
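    On the client, the parameter is then fetched and read like any other Remote Config flag. A minimal sketch of what the isEnabled(...) wrapper used on the next slides could sit on top of; the function name is illustrative:

    import com.google.firebase.remoteconfig.FirebaseRemoteConfig

    fun setUpLaunchTracing() {
        val remoteConfig = FirebaseRemoteConfig.getInstance()

        // The default keeps tracing off unless the profile_app_startup condition matches
        remoteConfig.setDefaultsAsync(mapOf("enable_trace_app_launch" to false))
        remoteConfig.fetchAndActivate().addOnCompleteListener {
            val traceAppLaunch = remoteConfig.getBoolean("enable_trace_app_launch")
            // Hand traceAppLaunch to the AppLaunchTracker / RemoteConfig wrapper shown below
        }
    }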



  125. Sit back and Strategize! 🤔
    4. Debug API
    • Fetch performance traces for the impacted users from production

    class AppLaunchTracker(private val remoteConfig: RemoteConfig) {

        fun startTracking() {
            ...
            if (remoteConfig.isEnabled("enable_trace_app_launch")) {
                // Sampling-based method tracing keeps the overhead manageable in production
                Debug.startMethodTracingSampling(
                    File(context.cacheDir, "app_launch.trace").absolutePath,
                    maxBufferSize,
                    samplingIntervalUs
                )
            }
        }
    }

  126. Sit back and Strategize! 🤔
    4. Debug API
    • Fetch performance traces for the impacted users from production

    class AppLaunchTracker(private val remoteConfig: RemoteConfig) {

        fun stopTracking() {
            ...
            if (remoteConfig.isEnabled("enable_trace_app_launch")) {
                Debug.stopMethodTracing()
            }
        }
    }


  128. Sit back and Strategize! 🤔
    4. Debug API
    • Fetch performance traces for the impacted users from production
    • Upload the traces from a periodic WorkManager job (sketched below)
      • Firebase Storage
      • Multipart API upload
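    A sketch of the upload path, assuming the traces written by Debug.startMethodTracingSampling land in cacheDir and are pushed to Firebase Storage by a periodic worker. Class and path names are illustrative, and the multipart-API alternative mentioned above would simply replace the Storage call.

    import android.content.Context
    import android.net.Uri
    import androidx.work.PeriodicWorkRequestBuilder
    import androidx.work.WorkManager
    import androidx.work.Worker
    import androidx.work.WorkerParameters
    import com.google.android.gms.tasks.Tasks
    import com.google.firebase.storage.FirebaseStorage
    import java.util.concurrent.TimeUnit

    class TraceUploadWorker(context: Context, params: WorkerParameters) : Worker(context, params) {

        override fun doWork(): Result {
            val storage = FirebaseStorage.getInstance().reference
            applicationContext.cacheDir
                .listFiles { file -> file.name.endsWith(".trace") }
                ?.forEach { trace ->
                    // A blocking upload is fine here: Workers run on a background thread
                    Tasks.await(storage.child("traces/${trace.name}").putFile(Uri.fromFile(trace)))
                    trace.delete()
                }
            return Result.success()
        }
    }

    // Schedule once, e.g. from Application.onCreate()
    fun scheduleTraceUploads(context: Context) {
        WorkManager.getInstance(context).enqueue(
            PeriodicWorkRequestBuilder<TraceUploadWorker>(12, TimeUnit.HOURS).build()
        )
    }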


  130. Tracker
    Attributes for app launch
    Insights from data:
    • The splash screen is not the only contributor to first_screen_drawn
    • The app icon click is not the only trigger
    • Campaigns
    Eyes on the changes entering while you fix
    Alerts for PRs on Slack and email
    Flame graphs, diff, VCS metadata
    Traces from production

  131. Sit back and Strategize! 🤔
    Flame graphs


  136. Sit back and Strategize! 🤔
    What we achieved with flame graphs
    • Studying issues that are consistent across the impacted flame graphs
    • Creating a priority list of the issues that need to be fixed, based on:
      • The performance gain they will provide
      • The effort of the fix:
        • Refactor
        • Upgrading a library
        • Executing on a background thread


  140. Step 4: Let's ship the fixes
    Some of the issues the team found
    • DexGuard class encryption increasing initialisation time
    • Consistently expensive initialisation of some SDKs
    • Unwanted dependencies getting injected through Dagger
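    For the Dagger finding, deferring construction is often enough to take an expensive dependency off the start-up path. A small illustration, not from the deck, using dagger.Lazy; AnalyticsSdk stands in for whichever SDK wrapper was expensive to initialise:

    import dagger.Lazy
    import javax.inject.Inject

    // Stand-in for whichever SDK wrapper was expensive to construct
    class AnalyticsSdk @Inject constructor() {
        fun track(event: String) { /* expensive SDK call */ }
    }

    class HomeViewModel @Inject constructor(
        // Lazy<T> defers building the expensive dependency until first use, so it no
        // longer runs while HomeViewModel is injected on the cold-start path
        private val analytics: Lazy<AnalyticsSdk>
    ) {
        fun onUserAction() {
            analytics.get().track("user_action")
        }
    }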

  141. Partayyy!! 🎉


  151. Conclusion
    • Creating observability is the most important part; the fix itself can be a one-line change
    • Let's revisit our patterns:
      • Pattern 1: Scalable ≠ High Performant Codebase
      • Pattern 2: We get to know about issues from customer tickets or Android Vitals
      • Pattern 3: Prioritising performance issues against other feature work is an issue
      • Pattern 4: Drafting OKRs for platform teams is difficult
    Proposal:
    • KR 1: Bring observability to metrics x, y, z, a, b, c (first screen, app launch time, user ID)
    • KR 2: Reach 70% accuracy on attempts to fix cold start
    • KR 3: Create a quality gate and bring X% confidence in detecting regressions

  152. Fin
    🙋
    @droid_singh
    @amanjeetsingh150
    Amanjeet Singh
