[Valera Zakharov] Mobile performance testing at Slack

Presentation from GDG DevFest Ukraine 2018 - the biggest community-driven Google tech conference in the CEE.

Learn more at: https://devfest.gdg.org.ua

There comes a point in a company's evolution when the rush to build all the features as fast as possible subsides and the company realizes that performance should be prioritized too. The CEO publishes a document that says "a Slack client must be as fast as fuck" and the engineering team sets out to fix all the performance bottlenecks. But how does an engineer validate that their improvements actually work? More importantly, how does the team prevent future performance regressions?

Over a year ago, we asked these questions and decided to build a performance testing pipeline that would continuously validate every code change for performance impact. In this talk, I will introduce the basic building blocks of this pipeline and share the lessons learned from building and maintaining this infrastructure.

Google Developers Group Lviv

October 12, 2018

Transcript

  1. How to Build a Performance Testing Pipeline from Scratch (Valera Zakharov)
  2. None
  3. None
  4. None
  5. "A slack client should be fast as fuck!”

  6. Let’s fix some perf bugs

  7. Trends

  8. Alerts

  9. Pre-merge Alerts

  10. Naive Approach: measure the dev version value (execution time, frame metrics, resource usage, anything that can be measured), compare against the baseline (latest master), and alert if they are different.
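
As a rough sketch of the naive check in Kotlin (names are hypothetical, not Slack's actual code):

    import kotlin.math.abs

    // Naive approach: alert whenever the dev build's value differs from the
    // latest-master baseline by more than some tolerance. The metric can be
    // anything measurable: execution time, frame metrics, resource usage, ...
    fun shouldAlertNaive(devValue: Double, masterBaseline: Double, tolerance: Double = 0.0): Boolean =
        abs(devValue - masterBaseline) > tolerance
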
  11. Problem: results are variable, and sometimes VERY variable

  12. Stats to the Rescue: compare dev build values against master build values
  13. Mann-Whitney U Test: p-value → confidence

  14. Statistical Approach: collect a set of N values from the dev version, test against the data set from master, alert if confidence > threshold
  15. Statistical Approach: collect a set of N values from the dev version, test against the data set from master, alert if diff confidence > threshold. WE CONTROL THESE: N and the threshold.
  16. Statistical Approach: higher number of values = better stats (but more device time); higher alert threshold = lower false alert rate (but lower chance of valid detection)
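
A minimal Kotlin sketch of this statistical approach, assuming the Apache Commons Math implementation of the Mann-Whitney U test (the library choice, names, and default threshold are illustrative, not prescribed by the talk):

    import org.apache.commons.math3.stat.inference.MannWhitneyUTest

    // Compare N measurements from the dev (PR) build against measurements from
    // master and alert only when the confidence that the two distributions
    // differ clears a threshold. N and the threshold are the knobs we control.
    fun shouldAlert(
        devValues: DoubleArray,        // N values collected from the dev version
        masterValues: DoubleArray,     // values collected from master builds
        confidenceThreshold: Double = 0.99
    ): Boolean {
        val pValue = MannWhitneyUTest().mannWhitneyUTest(devValues, masterValues)
        val confidence = 1.0 - pValue
        return confidence > confidenceThreshold
    }
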
  17. For whom?

  18. Trust = Valid detections / Noise

  19. Trust = Valid detections / Noise

  20. Pipeline overview: open PR / merge to master → trigger perf run → PerfTest Job (run tests + gather data) → perf data → Backend → Trends + Alert
  21. PerfTest Job run tests + gather data

  22. PerfTest Job, naive approach: Build Node (runner) → test metrics → Build Node (aggregate metrics) → backend
  23. PerfTest Job, naive approach: Build Node (runner) → test metrics, test metrics, test metrics → Build Node (aggregate metrics) → backend
  24. PerfTest Job, naive-ish approach: Build Node (runner) + device provider (get / release) → test metrics, test metrics, test metrics → Build Node (aggregate metrics) → backend
  25. None
  26. Do you have the resources to build this? https://code.fb.com/android/the-mobile-device-lab-at-the-prineville-data-center

  27. PerfTest Job, cloud version: Build Node → cloud device provider (get / release) → run tests + gather data → test metrics → aggregate metrics → backend
  28. Cloud version: stability, scalability, control. PerfTest Job: run tests + gather data.
  29. Pipeline overview: open PR / merge to master → trigger perf run → PerfTest Job (run tests + gather data) → perf data → Backend → Trends + Alert
  30. None
  31. Instrumented Application / Instrumentation Test:

      EventTracker.startPerfTracking(Beacon.CHANNEL_SYNC)
      // code that does channel sync
      EventTracker.endPerfTracking(Beacon.CHANNEL_SYNC)

      persist_rtm_start,44
      process_rtm_start,19
      ms_time_to_connect,703
      channel_sync,381
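
Slack's EventTracker and Beacon types are internal, so the following is a purely hypothetical Kotlin sketch of what such beacon-style timing could look like; every name and detail is an assumption, not the real implementation:

    import android.os.SystemClock

    enum class Beacon(val metricName: String) {
        CHANNEL_SYNC("channel_sync"),
        MS_TIME_TO_CONNECT("ms_time_to_connect")
    }

    object EventTracker {
        private val startTimes = mutableMapOf<Beacon, Long>()
        private val results = mutableListOf<Pair<String, Long>>()

        fun startPerfTracking(beacon: Beacon) {
            startTimes[beacon] = SystemClock.elapsedRealtime()
        }

        fun endPerfTracking(beacon: Beacon) {
            val start = startTimes.remove(beacon) ?: return
            // Recorded as "channel_sync,381"-style lines that the perf run gathers.
            results.add(beacon.metricName to (SystemClock.elapsedRealtime() - start))
        }
    }
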
  32. Focus on the client: the network is highly unstable and variable, and backend regressions should not block client developers. Use Record & Replay: github.com/airbnb/okreplay
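
To illustrate the replay half of record & replay, here is a sketch of the underlying idea as a plain OkHttp interceptor (this is not okreplay's actual API): responses are served from a pre-recorded "tape" so perf tests never depend on the live, variable network.

    import okhttp3.Interceptor
    import okhttp3.MediaType.Companion.toMediaType
    import okhttp3.Protocol
    import okhttp3.Response
    import okhttp3.ResponseBody.Companion.toResponseBody

    // Replay-only interceptor: look up the response body by URL on a recorded
    // tape; fall through to the real network only if nothing was recorded.
    class ReplayInterceptor(private val tape: Map<String, String>) : Interceptor {
        override fun intercept(chain: Interceptor.Chain): Response {
            val request = chain.request()
            val recordedBody = tape[request.url.toString()]
                ?: return chain.proceed(request)
            return Response.Builder()
                .request(request)
                .protocol(Protocol.HTTP_1_1)
                .code(200)
                .message("OK")
                .body(recordedBody.toResponseBody("application/json".toMediaType()))
                .build()
        }
    }
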
  33. Keep it real: we want to catch regressions that represent the real world. Preserve the prod object graph and run against a release-like config (@LargeTest).
  34. Keep it real What about small unit perf tests?

  35. Make it stable: perf tests will be executed a lot and the stability bar is very high. Don't compromise on flakiness; use IdlingResource.
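
A small sketch of the IdlingResource idea using Espresso's stock CountingIdlingResource; the object name and call sites are assumptions:

    import androidx.test.espresso.IdlingRegistry
    import androidx.test.espresso.idling.CountingIdlingResource

    // Expose background work (e.g. channel sync) to Espresso so tests wait for
    // it deterministically instead of sleeping, which adds noise and flakiness.
    object ChannelSyncIdler {
        val resource = CountingIdlingResource("channel_sync")

        fun begin() = resource.increment()   // call when the tracked work starts
        fun end() = resource.decrement()     // call when the tracked work finishes
    }

    // In test setup:
    // IdlingRegistry.getInstance().register(ChannelSyncIdler.resource)
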
  36. Keep it working

  37. Pipeline overview: open PR / merge to master → trigger perf run → PerfTest Job (run tests + gather data) → perf data → Backend → Trends + Alert
  38. Backend: createBuild | completeBuild API; perf data → check sanity → store & analyze data
  39. Backend Perf Data:

      {
        "build_info": {
          "platform": "android",
          "author_slack_id": "W1234567",
          "branch_name": "master",
          "build_cause": "Fixed sort order for starred unreads. (#9838)",
          "id": 8668,
          "jenkins_build_number": "9287",
          "author_name": "Kevin Lai",
          "job_name": "android-master-perf"
        },
        "tests": [
          {
            "status": "complete",
            "name": "com.Slack.ui.perf.SignInPerfTest#firstSignin_medium",
            "metric_results": [
              {"name": "inflate_flannel_start", "value": 263},
              {"name": "quickswitcher_show", "value": 30},
              {"name": "inflate_flannel_start", "value": 314},
              {"name": "quickswitcher_show", "value": 45}
            ]
          }
        ]
      }
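
The payload above maps naturally onto a few data classes; this is just a readability sketch, not Slack's actual schema code:

    data class MetricResult(val name: String, val value: Long)

    data class TestResult(
        val status: String,
        val name: String,
        val metricResults: List<MetricResult>
    )

    data class BuildInfo(
        val platform: String,
        val authorSlackId: String,
        val branchName: String,
        val buildCause: String,
        val id: Long,
        val jenkinsBuildNumber: String,
        val authorName: String,
        val jobName: String
    )

    data class PerfData(val buildInfo: BuildInfo, val tests: List<TestResult>)
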
  40. Backend Stack: new shiny tech is great… but use whatever stack you have in house
  41. Pipeline overview: open PR / merge to master → trigger perf run → PerfTest Job (run tests + gather data) → perf data → Backend → Trends + Alert
  42. " Trends

  43. "

  44. "

  45. "

  46. "

  47. Pipeline overview: open PR / merge to master → trigger perf run → PerfTest Job (run tests + gather data) → perf data → Backend → Trends + Alert
  48. ! Alert

  49. !

  50. !

  51. !

  52. !

  53. !

  54. !

  55. More on debugging: pre-merge alerting is great for experimenting. Detailed trace info would be nice; https://github.com/facebookincubator/profilo looks promising.
  56. !

  57. !

  58. Trust

  59. Pipeline overview: open PR / merge to master → trigger perf run → PerfTest Job (run tests + gather data) → perf data → Backend → Trends + Alert
  60. None
  61. None
  62. Thank you! Questions? @valera_zakharov