
Monitoring user experience of Flutter apps with SLI/SLO
Presented at FlutterNinjas (https://flutterninjas.dev/) in June 2024.

SLI/SLO is a familiar term in SRE, but it is rarely used in mobile app development. In my product the failure rate was high, and I needed a mechanism to detect and resolve problems as early as possible. So I adapted SLI/SLO to fit mobile apps and built a mechanism for detecting poor user experience. This mechanism allows immediate detection of the following.
- Failure rate (cases where a certain number of users fail to use a feature due to an error or other occurrence)
- Cancellation rate (cases where a certain number of users cancel for some reason when using a feature)
- Suspension rate (cases where a certain number of users kill the app for some reason when using a feature)

This monitoring is now embedded in more than 40 features of our app.
In this session, we will discuss the following.
- What is SLI/SLO
- The difference between SLI/SLO in general and SLI/SLO for user experience detection
- How to measure cases where a user stops using a feature or crashes in the middle of a feature
- How to set the sampling rate
- How to create a user experience alerting infrastructure
- How to combat noise alerts when creating them
- How to embed the measurement infrastructure with as little impact on the main code as possible
- How to involve members of the business in detecting and preventing poor user experience

Takuma Osada

June 13, 2024

Transcript

  1. FlutterNinjas. Takuma Osada, Lead Mobile App Engineer @ WinTicket, CyberAgent, Inc. Monitoring user experience of Flutter apps with SLI/SLO
  2. Introduction: Takuma Osada, Lead Mobile App Engineer / WinTicket, CyberAgent Inc • GitHub ID: ostk0069 • X ID: ostk0069
  3. Presentation slides are available in Japanese. I posted a Japanese version of these slides on X (Twitter) a moment ago. X (Twitter) ID: @ostk0069
  4. Today’s Goal: Understand SLI/SLO of user experience & expand the world of monitoring in Flutter apps 💪
  5. Today’s Goal: Understand SLI/SLO of user experience & expand the world of monitoring in Flutter apps 💪 ⚠ This session is not about Flutter/Dart itself. However, quality control and monitoring are among the most exciting aspects of developing Flutter apps.
  6. Today’s Goal: Understand SLI/SLO of user experience & expand the world of monitoring in Flutter apps 💪 ⚠ I hope you will enjoy the session with that in mind. 🙌
  7. AGENDA
    1. What is “user experience SLI/SLO”
    2. Why “user experience SLI/SLO” is needed
    3. How to create “user experience SLI/SLO”
    4. Impressions from actual operation of “user experience SLI/SLO”
  8. 1. What is “user experience SLI/SLO”
    1. What is general SLI/SLO
    2. What is “user experience SLI/SLO”
  9. 1.1. What is general SLI/SLO: Have you ever heard the term “SLI/SLO”? 🙋
  10. SLI (Service Level Indicator): quantitatively measurable indicators of user satisfaction, e.g. response time, error rate, and downtime. https://cloud.google.com/architecture/framework/reliability/slo-components
  11. SLI (Service Level Indicator): quantitatively measurable indicators of user satisfaction, e.g. response time, error rate, and downtime. SLO (Service Level Objective): goals to be achieved with an SLI, e.g. “99.9% uptime” or “average response time of less than X milliseconds”. https://cloud.google.com/architecture/framework/reliability/slo-components
  12. SLI (Service Level Indicator): quantitatively measurable indicators of user satisfaction, e.g. response time, error rate, and downtime. SLO (Service Level Objective): goals to be achieved with an SLI, e.g. “99.9% uptime” or “average response time of less than X milliseconds”. NOT easy to understand 😅 https://cloud.google.com/architecture/framework/reliability/slo-components
  13. SLI (Service Level Indicator): quantitatively measurable indicators of user satisfaction, e.g. response time, error rate, and downtime. SLO (Service Level Objective): goals to be achieved with an SLI, e.g. “99.9% uptime” or “average response time of less than X milliseconds”. Let’s think about it with an example familiar to Flutter developers 📱 https://cloud.google.com/architecture/framework/reliability/slo-components
  14. Crash Free User Rate: the percentage of users NOT experiencing crashes. A Crash Free Session Rate, measured per session, is also available. https://docs.newrelic.com/docs/mobile-monitoring/mobile-monitoring-ui/mobile-app-pages/release-versions-page/#drill-down
  15. SLI: quantitatively measurable indicators of user satisfaction, e.g. response time, error rate, and downtime. SLO: goals to be achieved with an SLI, e.g. “99.9% uptime” or “average response time of less than X milliseconds”. https://cloud.google.com/architecture/framework/reliability/slo-components
  16. SLI: Crash Free User Rate (quantitatively measurable indicators of user satisfaction, e.g. response time, error rate, and downtime). SLO: goals to be achieved with an SLI, e.g. “99.9% uptime” or “average response time of less than X milliseconds”. https://cloud.google.com/architecture/framework/reliability/slo-components
  17. SLI: Crash Free User Rate (quantitatively measurable indicators of user satisfaction, e.g. response time, error rate, and downtime). SLO: Maintain 99% on a 24h basis (goals to be achieved with an SLI, e.g. “99.9% uptime” or “average response time of less than X milliseconds”). https://cloud.google.com/architecture/framework/reliability/slo-components
  18. SLI: Crash Free User Rate (quantitatively measurable indicators of user satisfaction, e.g. response time, error rate, and downtime). SLO: Maintain 99% on a 24h basis (goals to be achieved with an SLI, e.g. “99.9% uptime” or “average response time of less than X milliseconds”). Target values can be adjusted depending on how strictly the team or business wants to maintain them. https://cloud.google.com/architecture/framework/reliability/slo-components
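The SLI/SLO pairing on the slides above can be sketched in a few lines. This is a minimal illustration in Python (used here as a language-neutral sketch, not Flutter/Dart code); the function names and the user counts are hypothetical, and only the 99% target on a 24h basis comes from the slide.

```python
def crash_free_user_rate(total_users: int, crashed_users: int) -> float:
    """SLI: fraction of users who did not experience a crash in the window."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return 1 - crashed_users / total_users


def meets_slo(sli: float, target: float = 0.99) -> bool:
    """SLO: the SLI must stay at or above the target (99% on the slide)."""
    return sli >= target


# Hypothetical counts for one 24-hour window, chosen only for illustration.
rate = crash_free_user_rate(total_users=20_000, crashed_users=150)
print(f"{rate:.2%}", meets_slo(rate))
```

The SLI is the measured value and the SLO is the comparison against a target, so adjusting the target changes only `meets_slo`, not the measurement.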
  19. What should be measured as SLI?
  20. What should be measured as SLI? ↓ CUJ
  21. CUJ (Critical User Journey): A user journey is a series of tasks that a user performs to achieve a goal. A CUJ is a sequence of tasks that is common to achieving many goals, or that achieves a very important goal. https://sre.google/workbook/implementing-slos/#modeling-user-journeys
  22. CUJ (Critical User Journey): A user journey is a series of tasks that a user performs to achieve a goal. A CUJ is a sequence of tasks that is common to achieving many goals, or that achieves a very important goal. Example in EC (e-commerce) services: find products, add items to cart, complete purchase. These are most likely to be listed as CUJs. https://sre.google/workbook/implementing-slos/#modeling-user-journeys
  23. CUJ (Critical User Journey): A user journey is a series of tasks that a user performs to achieve a goal. A CUJ is a sequence of tasks that is common to achieving many goals, or that achieves a very important goal. Example in EC (e-commerce) services: find products, add items to cart, complete purchase. These are most likely to be listed as CUJs. Ultimately, SLOs need to focus on improving the user experience. Therefore, create SLOs in terms of user-centered actions. https://sre.google/workbook/implementing-slos/#modeling-user-journeys
  24. 1.2. What is “user experience SLI/SLO”: Let’s look at the difference between general SLI/SLO and “user experience SLI/SLO”. * “User experience SLI/SLO” is my own term, not a common SRE term.
  25. When I looked into SLI/SLO, all the examples I found were measured in units of API requests. I could not find a clear explanation of why, but I assume the reasons are as follows. • The SRE domain was born from back-end and infrastructure work • API requests are measurable regardless of the platform the user is on
  26. “User experience SLI/SLO” stands for “measuring a series of function flows from start to end”.
  27. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code → Success & navigate to home. * This is a function used in the development environment.
  28. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code → Success & navigate to home. Start measurement when the button is tapped (= Login with SMS). End measurement with Success when login succeeds (= Login Success). * This is a function used in the development environment.
  29. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code (= wrong code error). End measurement with Error when any error is caused by the system or the user. * This is a function used in the development environment.
  30. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code. When the user cancels the use of a function, measure as “Cancel”. In the case of a crash or app kill, measure as “Interruption”. * This is a function used in the development environment.
  31. Advantages: • Cancellations and interruptions can also be measured, which widens the measurement range. • With Flutter (a single-codebase solution), the same measurement applies to both iOS and Android.
  32. Advantages: • Cancellations and interruptions can also be measured, which widens the measurement range. • With Flutter (a single-codebase solution), the same measurement applies to both iOS and Android. Disadvantages: • As the measurement range increases, complexity increases. • User behavior introduces noise into the data.
  33. General SLI/SLO vs. “User experience SLI/SLO”:
    • Measurement unit: API requests (SLI/SLO) / series of function flows (“User experience SLI/SLO”)
    • Measurement domain: back-end / client
    • Measure system-induced errors: ✅ / ✅
    • Measure user-induced errors: ❌ / ✅
    • Ease of embedding measurements: ✅ / ❌
  34. 2. Why “user experience SLI/SLO” is needed
    1. Aim to improve system availability
    2. Aim to improve MTTR (mean time to repair)
  35. 2.1. Aim to improve system availability: I will explain why we created “user experience SLI/SLO”.
  36. System Availability
  37. What is System Availability?
    System Availability (%) = Working Hours / (Stop Time + Working Hours)
    Stop Time (h) = MTTR × 365 days × 24 hours / MTBF
    MTTR (mean time to repair) = Total Stop Time / Number of Failures
    MTBF (mean time between failures) = Total Working Hours / Number of Failures
  38. Situation in 2022: • Change Failure Rate: 7.5% (3/40) • MTTR (mean time to repair): 4.0 days • System Availability: 96.8%
  39. Situation in 2022: • Change Failure Rate: 7.5% (3/40) • MTTR (mean time to repair): 4.0 days • System Availability: 96.8% → Goal for 2023: • Change Failure Rate: 5% • MTTR: 3.0 days • System Availability: 98.4%
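The availability figures on these slides can be reproduced from the formulas on the "What is System Availability?" slide. The following is a minimal sketch in Python (a language-neutral illustration, not Flutter/Dart code). It rests on two assumptions that are not stated in the talk: the working time is taken as the full 365-day year, and the 2023 goal of a 5% change failure rate is read as 2 failures out of 40 releases. The function name is hypothetical.

```python
def availability(mttr_days: float, failures: int, period_days: float = 365.0) -> float:
    """System availability as a fraction, following the slide's formulas.

    Assumption: working time equals the whole period (365 days).
    """
    if failures == 0:
        return 1.0
    working = period_days
    mtbf = working / failures                 # mean time between failures
    stop = mttr_days * period_days / mtbf     # total stop time in days
    return working / (stop + working)


# 2022: 3 failures, MTTR 4.0 days. 2023 goal: 2 failures (assumed), MTTR 3.0 days.
print(round(availability(4.0, 3) * 100, 1))  # 96.8
print(round(availability(3.0, 2) * 100, 1))  # 98.4
```

Days are used instead of hours because the units cancel in the ratio. Under these assumptions, both lowering the number of failures and lowering MTTR shrink the stop time, which is why the talk targets both.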
  40. To achieve the goals …
  41. To achieve the goals … 📈 Reduce the failure rate: • Create a system to detect problems before release • Review the existing testing regime
  42. To achieve the goals … 📈 Reduce the failure rate: • Create a system to detect problems before release • Review the existing testing regime 🕐 Lower MTTR: • Detect defects early after release • Review the existing monitoring system
  43. To achieve the goals … 📈 Reduce the failure rate (skipped in this session): • Create a system to detect problems before release • Review the existing testing regime 🕐 Lower MTTR: • Detect defects early after release • Review the existing monitoring system
  44. To achieve the goals … 📈 Reduce the failure rate (skipped in this session): • Create a system to detect problems before release • Review the existing testing regime 🕐 Lower MTTR: • Detect defects early after release • Review the existing monitoring system. The end-to-end testing strategy is covered in my FlutterKaigi 2023 presentation, available on YouTube (Japanese only 🙏). https://www.youtube.com/watch?v=VHhZlTDfwIQ&ab_channel=FlutterKaigi
  45. 2.2. Aim to improve MTTR (mean time to repair): My team can detect defects in the following ways: • Alerts from the error tracking tool (Error / Crash / ANR) • Inquiries from users • Messages from co-workers • Ego-searching on social media
  46. FlutterNinjas Properties of each detection method 4 9 Detection Speed

    Detection Range Escalation Detection Area Inquiry Error Tracking Tool Message from co-workers Ego-search Known Unknown Fast Slow - Mid Mid All Slow Slow Mid Wide ✅ ✅ ✅ ❌ Wide All All 2 . 2 . Aim to improve MTTR(mean time to repair)
  47. FlutterNinjas Properties of each detection method 5 0 Detection Speed

    Detection Range Escalation Detection Area Inquiry Error Tracking Tool Message from co-workers Ego-search Known Unknown Fast Slow - Mid Mid All Slow Slow Mid Wide ✅ ✅ ✅ ❌ Wide All All Error tracking tool can detect fast, but can only pick up “known unknowns”. 2 . 2 . Aim to improve MTTR(mean time to repair)
  48. FlutterNinjas Properties of each detection method 5 1 Detection Speed

    Detection Range Escalation Detection Area Inquiry Error Tracking Tool Message from co-workers Ego-search Known Unknown Fast Slow - Mid Mid All Slow Slow Mid Wide ✅ ✅ ✅ ❌ Wide All All The others can detect unknown unknowns, but the detection speed is not fast enough. 2 . 2 . Aim to improve MTTR(mean time to repair)
  49. “User experience SLI/SLO” could be a way out of the problem 🧐
  50. Failures in the WINTICKET app that were not detected by the error tracking tool (excerpt): • Live video fails to load and an error occurs • When using WebView on Android 14, a blank screen is displayed when returning from the background • Some users get an error when charging by credit card • The app freezes for some users during SMS authentication
  51. Failures in the WINTICKET app that were not detected by the error tracking tool (excerpt): • Live video fails to load and an error occurs • When using WebView on Android 14, a blank screen is displayed when returning from the background • Some users get an error when charging by credit card • The app freezes for some users during SMS authentication. “User experience SLI/SLO” can detect some of these failures, though not all.
  52. FlutterNinjas All All All Known Unknown Properties of each detection

    method 5 5 ✅ ✅ ✅ ❌ User Experience SLI/SLO ✅ All ※ Only in measurement function SLI/SLO can detect unknown unknown problems within the measurement function. Escalation is also possible by creating an alert function. Detection Speed Detection Range Escalation Detection Area Inquiry Error Tracking Tool Message from co-workers Ego-search 2 . 2 . Aim to improve MTTR(mean time to repair) Fast Slow - Mid Slow Slow Fast Mid Mid Wide Wide Mid
  53. Two approaches to reducing MTTR
  54. Two approaches to reducing MTTR: 🚨 Alert function • Reduce MTTD (mean time to detect) by creating alerts • Create alerts focused on the latest version to detect release-related defects
  55. Two approaches to reducing MTTR: 🚨 Alert function • Reduce MTTD (mean time to detect) by creating alerts • Create alerts focused on the latest version to detect release-related defects ⛰ High observability • Provide more information to reduce MTTI (mean time to investigate) and make analysis easier and more accurate.
  56. Let's take a look at how “user experience SLI/SLO” is made 🚀
  57. 3. How to create “user experience SLI/SLO”
    1. Input knowledge
    2. Define CUJ and align with business teams
    3. Select tools and embed measurements
    4. Create alerts
    5. Achieve high level of observability
    6. Embed SLI/SLO in teams and business
  58. 3.1. Input knowledge: Neither I nor my team was familiar with SLI/SLO, so we held a group reading session (book club) in parallel with our research. https://www.oreilly.com/library/view/observability-engineering/9781492076438/
  59. 3.1. Input knowledge: Neither I nor my team was familiar with SLI/SLO, so we held a group reading session (book club) in parallel with our research. It was of course a good opportunity to gain knowledge, but it also gave us a shared vocabulary: we could say things like “That thing in this book, right?” or “Is the such-and-such from that book any good?” https://www.oreilly.com/library/view/observability-engineering/9781492076438/
  60. 3.2. Define CUJ and align with business teams: Review CUJs with the business team and prioritize. 👤 Auth 🚴 Betting 🏦 Transfer Payoff 💰 Charge 🎬 Video 🎬 Broadcast 📈 Analysis Bettings 🎉 Campaign 📊 View Predictions ✉ Message 🔧 Settings 📄 Confirm History 🔍 Search 📅 Schedules
  61. 3.2. Define CUJ and align with business teams: Review CUJs with the business team and prioritize. Since the purpose of users' use of the service is “to enjoy gambling”, we have focused on the following functions. 👤 Auth 🚴 Betting 🏦 Transfer Payoff 💰 Charge 🎬 Video 🎬 Broadcast 📈 Analysis Bettings 🎉 Campaign 📊 View Predictions ✉ Message 🔧 Settings 📄 Confirm History 🔍 Search 📅 Schedules
  62. 3.3. Select tools and embed measurements: 🧠 Many tools exist to measure and alert on SLI/SLO: Sentry, BigQuery, New Relic, Datadog, etc.
  63. 🧠 Many tools exist to measure and alert on SLI/SLO: Sentry, BigQuery, New Relic, Datadog, etc. My team selected Sentry.
  64. Why Sentry? ✅ We already used Sentry for error tracking, so we were familiar with it ✅ Error tracking and SLI/SLO measurement can be completed in one tool ✅ SDK development is very active, so continued maintenance can be expected
  65. Once the measurement tool is decided, let’s start embedding measurements ⛏
  66. Tips for embedding:
    • Introduce a mechanism to measure interruptions in a less complicated way
    • Apply a sampling rate to cut costs
    • Use AsyncMap to avoid using async/await when embedding in UI
    • Use Dio Interceptor to leave API request breadcrumbs
    • Use AutoRouteObserver to leave screen transition breadcrumbs
    • Attach device information, app version, etc. for high observability
    * Provided that this does not violate the privacy policy
  67. Tips for embedding:
    • Introduce a mechanism to measure interruptions in a less complicated way
    • Apply a sampling rate to cut costs
    • Use AsyncMap to avoid using async/await when embedding in UI
    • Use Dio Interceptor to leave API request breadcrumbs
    • Use AutoRouteObserver to leave screen transition breadcrumbs
    • Attach device information, app version, etc. for high observability
    * Provided that this does not violate the privacy policy
    In the interest of time, I will only introduce one 🙇
  68. Introduce a mechanism to measure interruptions in a less complicated way
  69. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code → Success & navigate to home. * This is a function used in the development environment.
  70. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code → Success & navigate to home. Start measurement when the button is tapped. End measurement with Success when login succeeds. * This is a function used in the development environment.
  71. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code. End measurement with Error when any error is caused by the system or the user. * This is a function used in the development environment.
  72. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code. When the user cancels the use of a function, measure as “Cancel”. In the case of a crash or app kill, measure as “Interruption”. * This is a function used in the development environment.
  73. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code. When the user cancels the use of a function, measure as “Cancel”. In the case of an app kill, measure as “Interruption”. How should interruptions be measured? * This is a function used in the development environment.
  74. Flutter's lifecycle cannot detect an app kill
  75. Flutter's lifecycle cannot detect an app kill. Even if it could be detected, there is no guarantee that log transmission works properly when the app crashes due to a screen freeze or memory leak.
  76. Flutter's lifecycle cannot detect an app kill. Even if it could be detected, there is no guarantee that log transmission works properly when the app crashes due to a screen freeze or memory leak. How should interruptions be measured?
  77. Use a local database to measure interruptions
  78. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code → Success & navigate to home. Start measurement when the button is tapped. End measurement with Success when login succeeds. * This is a function used in the development environment.
  79. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code → Success & navigate to home. Start measurement when the button is tapped and save the measurement start data in the local database. End measurement with Success when login succeeds and delete the data from the local database. * This is a function used in the development environment.
  80. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code. End measurement with Error when any error is caused by the system or the user, and delete the data from the local database. * This is a function used in the development environment.
  81. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code. When the user cancels the use of a function, measure as “Cancel” and delete the data from the local database. * This is a function used in the development environment.
  82. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code → App crash 💥 / app kill → Re-launch app. Start measurement when the button is tapped and save the measurement start data in the local database. * This is a function used in the development environment.
  83. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code → App crash 💥 / app kill → Re-launch app. Start measurement when the button is tapped and save the measurement start data in the local database. Data remains in the local database only when the app crashed or was killed. * This is a function used in the development environment.
  84. SMS login example in WINTICKET: Login top → Enter phone number → Enter verification code → App crash 💥 / app kill → Re-launch app. Start measurement when the button is tapped and save the measurement start data in the local database. Data remains in the local database only when the app crashed or was killed. On re-launch, measure as Interruption and delete the data from the local database. * This is a function used in the development environment.
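The local-database flow walked through above can be sketched as follows. This is a minimal illustration in Python with SQLite (a language-neutral sketch, not the Flutter implementation from the talk). The class name, table name, and the `sent` list that stands in for the monitoring backend are all hypothetical; in the talk the results are sent to Sentry.

```python
import sqlite3


class FlowTracker:
    """Tracks measured flows; all names here are hypothetical."""

    def __init__(self, db: sqlite3.Connection, sent: list):
        self.db = db
        self.sent = sent  # stand-in for the monitoring backend
        db.execute("CREATE TABLE IF NOT EXISTS pending (flow TEXT PRIMARY KEY)")

    def start(self, flow: str) -> None:
        # Persist a start marker as soon as the measurement begins.
        self.db.execute("INSERT OR REPLACE INTO pending VALUES (?)", (flow,))
        self.db.commit()

    def finish(self, flow: str, status: str) -> None:
        # status is "success", "error", "cancel", or "interruption".
        self.sent.append((flow, status))
        self.db.execute("DELETE FROM pending WHERE flow = ?", (flow,))
        self.db.commit()

    def report_interruptions(self) -> None:
        # Call on app launch: a leftover marker means the previous run
        # crashed or was killed in the middle of the flow.
        rows = self.db.execute("SELECT flow FROM pending").fetchall()
        for (flow,) in rows:
            self.finish(flow, "interruption")


db = sqlite3.connect(":memory:")
sent = []
run1 = FlowTracker(db, sent)
run1.start("sms_login")
run1.finish("sms_login", "success")
run1.start("sms_login")  # the app is killed here, so finish() never runs

run2 = FlowTracker(db, sent)  # simulated re-launch over the same database
run2.report_interruptions()
print(sent)
```

Because the marker is deleted on every normal ending (success, error, cancel), anything still present at launch can only come from a crash or kill, which is what lets the interruption be reported without detecting the kill itself.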
  85. Points of concern: • Transactions are not sent for users who leave the app and never start it up again • Using a local database
  86. Points of concern: • Data is not sent for users who leave the app and never start it up again → It is not necessary to measure every user's data as long as failures can be detected. Priority was given to a policy that guarantees transmission. • Using a local database → Shared Preference and Secure Preference are a single table, so conflicts may occur if the feature side also wants to update local data.
  87. 3.4. Create alerts: There are two main types of alerts used in SLI/SLO: • Error Budget Alert • Burn Rate Alert
  88. Error Budget Alert: Error budget alerts are threshold-based and notify you when a certain percentage of the SLO's error budget has been consumed. For example, an alert would be raised if 75% of the error budget is consumed within the target period, and a warning if 50% is consumed. https://docs.datadoghq.com/en/service_management/service_level_objectives/error_budget/
  89. Error Budget Alert: Error budget alerts are threshold-based and notify you when a certain percentage of the SLO's error budget has been consumed. For example, an alert would be raised if 75% of the error budget is consumed within the target period, and a warning if 50% is consumed. Error Budget? https://docs.datadoghq.com/en/service_management/service_level_objectives/error_budget/
  90. Error Budget: A measure of how much loss of service reliability is acceptable. If the service level objective (SLO) is to maintain a “99.99%” successful response rate, the error budget is to keep the error response rate below “0.01%”. https://docs.datadoghq.com/en/service_management/service_level_objectives/error_budget/
  91. Error Budget Alert: Google login SLO (function name). Alerts fire when 80% or 100% of the error budget is consumed. * This is not a Sentry alert feature; I built it myself on top of Sentry measurement data. https://docs.datadoghq.com/en/service_management/service_level_objectives/error_budget/
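The error budget check described above can be sketched as a small calculation. This is a minimal illustration in Python (language-neutral, not the author's Sentry-based implementation). The 80% and 100% thresholds come from the slide; the function names, the 95% SLO target, the transaction counts, and the mapping of 80% to "warning" and 100% to "critical" are assumptions made for the example.

```python
from typing import Optional


def error_budget_consumed(total: int, failed: int, slo_target: float) -> float:
    """Fraction of the error budget used so far; 1.0 means fully consumed."""
    allowed_failures = total * (1 - slo_target)
    if allowed_failures <= 0:
        raise ValueError("no error budget: check total and slo_target")
    return failed / allowed_failures


def alert_level(consumed: float, warning: float = 0.8, critical: float = 1.0) -> Optional[str]:
    """Map budget consumption to an alert level (thresholds from the slide)."""
    if consumed >= critical:
        return "critical"
    if consumed >= warning:
        return "warning"
    return None


# Hypothetical window: 10,000 transactions with a 95% success SLO,
# so roughly 500 failures are allowed.
print(alert_level(error_budget_consumed(10_000, 100, 0.95)))  # None
print(alert_level(error_budget_consumed(10_000, 450, 0.95)))  # warning
print(alert_level(error_budget_consumed(10_000, 600, 0.95)))  # critical
```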
  92. Burn Rate Alert: A notification is sent when the consumption rate of an error budget exceeds a specified threshold and stays above it for a specified period of time. For example, for a 7-day SLO target, an alert could be set up to fire if a burn rate of 16.8 or higher is measured over both the past hour and the past 5 minutes. https://docs.datadoghq.com/en/service_management/service_level_objectives/burn_rate/
  93. Burn Rate Alert: A notification is sent when the consumption rate of an error budget exceeds a specified threshold and stays above it for a specified period of time. For example, for a 7-day SLO target, an alert could be set up to fire if a burn rate of 16.8 or higher is measured over both the past hour and the past 5 minutes.
    Burn rate = (7 days × 24 hours × 10% of error budget consumed) / (1 hour × 100%) = (7 × 24 × 0.1) / 1 = 16.8
    https://docs.datadoghq.com/en/service_management/service_level_objectives/burn_rate/
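The 16.8 figure can be checked with a short calculation. This is a minimal sketch in Python (a language-neutral illustration); the function names are hypothetical, and the second function uses the common definition of burn rate as the observed error rate divided by the error rate the SLO allows, which is an assumption beyond what the slide states.

```python
def burn_rate_threshold(slo_window_hours: float,
                        alert_window_hours: float,
                        budget_fraction: float) -> float:
    """Burn rate at which `budget_fraction` of the error budget is
    consumed within `alert_window_hours`."""
    return slo_window_hours * budget_fraction / alert_window_hours


def burn_rate(observed_error_rate: float, slo_target: float) -> float:
    """Observed error rate relative to the error rate the SLO allows."""
    return observed_error_rate / (1 - slo_target)


# 7-day SLO window, 10% of the budget consumed within 1 hour.
threshold = burn_rate_threshold(7 * 24, 1, 0.10)
print(round(threshold, 1))  # 16.8

# Hypothetical: 2% errors against a 99% SLO burns the budget about twice
# as fast as allowed, which is far below the threshold above.
print(round(burn_rate(0.02, 0.99), 2))  # 2.0
```

A burn rate of 1 means the budget would be used up exactly at the end of the SLO window, so 16.8 corresponds to exhausting a 7-day budget in about 10 hours.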
  94. Burn Rate Alert: Burn rate alerts can make you aware of many defects immediately. However, the following problems exist with “user experience SLI/SLO”: ❌ User-induced errors and cancellations are highly variable ❌ A single user's behavior has a large impact on alerts at times when sessions are few, such as late at night or early in the morning https://docs.datadoghq.com/en/service_management/service_level_objectives/burn_rate/
  95. Burn Rate Alert: Burn rate alerts can make you aware of many defects. However, the following problems exist: ❌ User-induced errors and cancellations are highly variable → Adjust the error budget to allow for upward fluctuation ❌ A single user's behavior has a large impact on alerts at times when sessions are few, such as late at night or early in the morning → Set a minimum number of errors required to alert https://docs.datadoghq.com/en/service_management/service_level_objectives/burn_rate/
  96. Burn Rate Alert: Facebook login (function name). Alerts if 20% or 30% is exceeded on an hourly basis. Warning requires 10 transactions; Critical requires 15 or more. * This is not a Sentry alert feature; I built it myself on top of Sentry measurement data. https://docs.datadoghq.com/en/service_management/service_level_objectives/burn_rate/
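The noise countermeasure above, a rate threshold combined with a minimum count, can be sketched like this. It is a minimal illustration in Python (language-neutral, not the author's implementation). The slide does not spell out the details, so the following are assumptions: 20% maps to Warning and 30% to Critical, the percentages are hourly failure rates, and the required counts (10 and 15) refer to non-successful transactions. The function name is hypothetical.

```python
from typing import Optional


def hourly_alert(total: int, failed: int) -> Optional[str]:
    """Return an alert level for one hour of transactions, or None.

    Assumed rule: the failure rate must exceed the threshold AND enough
    failed transactions must exist, so one user retrying late at night
    cannot trigger an alert alone.
    """
    if total <= 0:
        return None
    rate = failed / total
    if rate > 0.30 and failed >= 15:
        return "critical"
    if rate > 0.20 and failed >= 10:
        return "warning"
    return None


print(hourly_alert(total=3, failed=2))     # None: high rate, too few samples
print(hourly_alert(total=40, failed=10))   # warning
print(hourly_alert(total=40, failed=16))   # critical
print(hourly_alert(total=200, failed=16))  # None: many failures, low rate
```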
  97. 3.5. Achieve high level of observability: After an alert fires, assess the situation, including whether it is a false positive. To support this, attach additional data in a manner that does not violate privacy policies.
  98. 3.5. Achieve high level of observability: After an alert fires, assess the situation, including whether it is a false positive. To support this, attach additional data in a manner that does not violate privacy policies. • User's device name and OS version • Device performance (High, Medium, Low) • App version • Which screen was last displayed • Connection status • etc …
  99. 3.6. Embed SLI/SLO in teams and business: Service quality is very important for the entire business
  100. I want the business team to know that engineers are working on quality
  101. Can't we use the SLI/SLO mechanism to address quality across the entire business?
  102. Creating a DASHBOARD function
  103. Creating a DASHBOARD function • Objective: Anyone can see defects at any time • Means: A dashboard that is easily accessible • Axis of improvement: Make the dashboard the starting point for understanding the status of a failure.
  104. Dashboard: • Group (Login) • Function names (SMS, Google …) • Description • Target values representing normality per function • Success rates of functions by time window (1h, 4h, 24h, 7d)
  105. 3.4. Create alerts 📊 Dashboard function: Displays transaction counts per 10 minutes. Success transaction count: 24; Cancellation transaction count: 0; Error transaction count: 1
  106. 3.4. Create alerts: If the graph looks like this … a failure has likely been occurring since this point in time
  107. 👍 What it provides: ✅ A complete picture of the problem, which speeds up the initial response toward resolution ✅ A source of information for double-checking failures ✅ Confirmation that a failure has been resolved 🙅 What it does not provide
  108. 👍 What it provides: ✅ A complete picture of the problem, which speeds up the initial response toward resolution ✅ A source of information for double-checking failures ✅ Confirmation that a failure has been resolved 🙅 What it does not provide: ❌ A way to completely eliminate failures by itself ❌ Information to be trusted 100% on its own when making business decisions ❌ Detailed data per user, per version, etc.
  109. It has become established as a tool for business members to immediately check user impact when they are notified of an incident.
  110. 4. Impressions from actual operation of “user experience SLI/SLO”
    1. How much has MTTR improved?
    2. Impression
  111. 4.1. How much has MTTR improved? The following patterns of failure exist: • Degradation (regression) caused by internal code • Failure/maintenance of external services (Firebase, payment and banking services)
  112. 4.1. How much has MTTR improved? The following patterns of failure exist: • Degradation (regression) caused by internal code → No regressions within the measured range since operation began in February 2024. • Failure/maintenance of external services (Firebase, payment and banking services) → The time needed to confirm the scope of impact was shortened, and we could identify events that had previously gone unnoticed.
  113. 4.2. Impression: SLI/SLO requires a great deal of specialized knowledge, and there are many areas that I have omitted from my explanation due to time constraints. There are still many things I don't understand about SLI/SLO, and I will continue to learn more. If you are interested, I would be happy to talk with you later.
  114. 4.2. One more thing … A sample repository is available.