Slide 1

FlutterNinjas
Monitoring user experience of Flutter apps with SLI/SLO
Takuma Osada, Lead Mobile App Engineer @ WinTicket, CyberAgent, Inc.

Slide 2

Introduction
Takuma Osada
Lead Mobile App Engineer / WinTicket, CyberAgent, Inc.
• GitHub ID: ostk0069
• X ID: ostk0069

Slide 3

Presentation slides are available in Japanese.
I posted a Japanese version of these slides on X (Twitter) just before this talk.
X (Twitter) ID: @ostk0069

Slide 4

Today’s Goal
Understand SLI/SLO of user experience & expand the world of monitoring in Flutter apps 💪

Slide 5

Today’s Goal
Understand SLI/SLO of user experience & expand the world of monitoring in Flutter apps 💪
⚠ This session is not about Flutter/Dart itself. However, quality control and monitoring are among the most exciting aspects of developing Flutter apps.

Slide 6

Today’s Goal
Understand SLI/SLO of user experience & expand the world of monitoring in Flutter apps 💪
⚠ I hope you will enjoy the session with that in mind. 🙌

Slide 7

AGENDA
1. What is “user experience SLI/SLO”
2. Why “user experience SLI/SLO” is needed
3. How to create “user experience SLI/SLO”
4. Impressions from actually operating “user experience SLI/SLO”

Slide 8

1. What is “user experience SLI/SLO”
  1. What is general SLI/SLO
  2. What is “user experience SLI/SLO”

Slide 9

1.1. What is general SLI/SLO

Slide 10

1.1. What is general SLI/SLO
Have you ever heard the term “SLI/SLO”? 🙋

Slide 11

1.1. What is general SLI/SLO
🙋 Who knows what SLI/SLO is?

Slide 12

1.1. What is general SLI/SLO
SLI (Service Level Indicator): Quantitatively measurable indicators of user satisfaction, e.g., response time, error rate, and downtime.
https://cloud.google.com/architecture/framework/reliability/slo-components

Slide 13

1.1. What is general SLI/SLO
SLI (Service Level Indicator): Quantitatively measurable indicators of user satisfaction, e.g., response time, error rate, and downtime.
SLO (Service Level Objective): Targets to be achieved for an SLI, e.g., “99.9% uptime” or “average response time of less than X milliseconds”.
https://cloud.google.com/architecture/framework/reliability/slo-components

Slide 14

1.1. What is general SLI/SLO
SLI (Service Level Indicator): Quantitatively measurable indicators of user satisfaction, e.g., response time, error rate, and downtime.
SLO (Service Level Objective): Targets to be achieved for an SLI, e.g., “99.9% uptime” or “average response time of less than X milliseconds”.
https://cloud.google.com/architecture/framework/reliability/slo-components
NOT easy to understand 😅

Slide 15

1.1. What is general SLI/SLO
SLI (Service Level Indicator): Quantitatively measurable indicators of user satisfaction, e.g., response time, error rate, and downtime.
SLO (Service Level Objective): Targets to be achieved for an SLI, e.g., “99.9% uptime” or “average response time of less than X milliseconds”.
https://cloud.google.com/architecture/framework/reliability/slo-components
Let’s think about it with an example familiar to Flutter developers 📱

Slide 16

1.1. What is general SLI/SLO
Think about it using the “Crash Free User Rate”

Slide 17

1.1. What is general SLI/SLO
Crash Free User Rate: Percentage of users who do NOT experience crashes. A Crash Free Session Rate, measured per session, is also available.
https://docs.newrelic.com/docs/mobile-monitoring/mobile-monitoring-ui/mobile-app-pages/release-versions-page/#drill-down

Slide 18

1.1. What is general SLI/SLO
SLI: Quantitatively measurable indicators of user satisfaction, e.g., response time, error rate, and downtime.
SLO: Targets to be achieved for an SLI, e.g., “99.9% uptime” or “average response time of less than X milliseconds”.
https://cloud.google.com/architecture/framework/reliability/slo-components

Slide 19

1.1. What is general SLI/SLO
SLI: Quantitatively measurable indicators of user satisfaction, e.g., response time, error rate, and downtime.
→ Crash Free User Rate
SLO: Targets to be achieved for an SLI, e.g., “99.9% uptime” or “average response time of less than X milliseconds”.
https://cloud.google.com/architecture/framework/reliability/slo-components

Slide 20

1.1. What is general SLI/SLO
SLI: Quantitatively measurable indicators of user satisfaction, e.g., response time, error rate, and downtime.
→ Crash Free User Rate
SLO: Targets to be achieved for an SLI, e.g., “99.9% uptime” or “average response time of less than X milliseconds”.
→ Maintain 99% on a 24h basis
https://cloud.google.com/architecture/framework/reliability/slo-components

Slide 21

1.1. What is general SLI/SLO
SLI: Quantitatively measurable indicators of user satisfaction, e.g., response time, error rate, and downtime.
→ Crash Free User Rate
SLO: Targets to be achieved for an SLI, e.g., “99.9% uptime” or “average response time of less than X milliseconds”.
→ Maintain 99% on a 24h basis
Target values can be adjusted to the level the team or business wants to maintain.
https://cloud.google.com/architecture/framework/reliability/slo-components
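The Crash Free User Rate example separates the SLI from the SLO. The minimal Python sketch below shows that split (Python is only for illustration). It computes the SLI over one window and checks it against the 99% target. `SessionRecord`, `crash_free_user_rate` and `meets_slo` are hypothetical names. In practice an error tracking tool reports this rate for you.

```python
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """One app session as an error tracking tool might report it (hypothetical shape)."""
    user_id: str
    crashed: bool


def crash_free_user_rate(sessions: list[SessionRecord]) -> float:
    """SLI: share of distinct users in the window who had no crashed session."""
    users = {s.user_id for s in sessions}
    if not users:
        return 1.0  # no traffic in the window: nothing violated the objective
    crashed_users = {s.user_id for s in sessions if s.crashed}
    return (len(users) - len(crashed_users)) / len(users)


def meets_slo(sli: float, target: float = 0.99) -> bool:
    """SLO: the SLI measured over the window (e.g. the last 24h) must stay >= target."""
    return sli >= target


# 200 users in the last 24h; one of them crashed once.
sessions = [SessionRecord(f"user-{i}", crashed=False) for i in range(200)]
sessions.append(SessionRecord("user-0", crashed=True))
rate = crash_free_user_rate(sessions)
print(f"{rate:.1%}", meets_slo(rate))
```

The SLO is then just a comparison against the SLI. Tightening or loosening the target changes only `target`, not how the SLI is measured.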

Slide 22

1.1. What is general SLI/SLO
What should be measured as an SLI?

Slide 23

1.1. What is general SLI/SLO
What should be measured as an SLI?
↓
CUJ

Slide 24

1.1. What is general SLI/SLO
CUJ (Critical User Journey): A user journey is a series of tasks that a user performs to achieve a goal. A CUJ is a sequence of tasks that is common to achieving many goals, or that achieves a very important goal.
https://sre.google/workbook/implementing-slos/#modeling-user-journeys

Slide 25

1.1. What is general SLI/SLO
CUJ (Critical User Journey): A user journey is a series of tasks that a user performs to achieve a goal. A CUJ is a sequence of tasks that is common to achieving many goals, or that achieves a very important goal.
Example in e-commerce services: find products, add items to the cart, complete the purchase. These are the most likely candidates for CUJs.
https://sre.google/workbook/implementing-slos/#modeling-user-journeys

Slide 26

1.1. What is general SLI/SLO
CUJ (Critical User Journey): A user journey is a series of tasks that a user performs to achieve a goal. A CUJ is a sequence of tasks that is common to achieving many goals, or that achieves a very important goal.
Example in e-commerce services: find products, add items to the cart, complete the purchase. These are the most likely candidates for CUJs.
Ultimately, SLOs need to focus on improving the user experience. Therefore, create SLOs in terms of user-centered actions.
https://sre.google/workbook/implementing-slos/#modeling-user-journeys

Slide 27

1.2. What is “user experience SLI/SLO”
Let’s look at the differences between general SLI/SLO and “user experience SLI/SLO”.
* “User experience SLI/SLO” is my own term, not a common SRE term.

Slide 28

1.2. What is “user experience SLI/SLO”
When I looked into SLI/SLO, every example I found measured it in units of API requests. I could not find a clear reason for this, but I assume it is the following:
• The SRE domain is a discipline born from back-end and infrastructure engineering
• It can be measured regardless of the platform the user is on

Slide 29

1.2. What is “user experience SLI/SLO”
“User experience SLI/SLO” stands for “measuring a series of function flows from start to end”.

Slide 30

1.2. What is “user experience SLI/SLO”
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code → Success & navigate to home
* This is a function used in the development environment.

Slide 31

1.2. What is “user experience SLI/SLO”
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code → Success & navigate to home
• Start measurement when the button is tapped (= Login with SMS)
• End measurement with Success when login succeeds (= Login Success)
* This is a function used in the development environment.

Slide 32

1.2. What is “user experience SLI/SLO”
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code
• End measurement with Error when any error is caused by the system or the user (= Wrong code error)
* This is a function used in the development environment.

Slide 33

1.2. What is “user experience SLI/SLO”
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code
• When the user cancels the flow, record it as “Cancel”.
• When the app crashes or is killed, record it as “Interruption”.
* This is a function used in the development environment.
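A measured flow therefore ends in exactly one of four outcomes. The Python sketch below models them (an illustration, not the WINTICKET or Sentry implementation). `Outcome`, `FlowMeasurement` and `summarize` are hypothetical names. Interruption appears in the enum but is never set during the flow, because a crash or kill gives the app no chance to record it at that moment.

```python
from collections import Counter
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"            # e.g. login succeeded, navigated to home
    ERROR = "error"                # system- or user-induced, e.g. wrong verification code
    CANCEL = "cancel"              # the user abandoned the flow
    INTERRUPTION = "interruption"  # crash or app kill; only recordable on a later launch


class FlowMeasurement:
    """One measured run of a user journey such as SMS login (hypothetical helper)."""

    def __init__(self, name: str) -> None:
        self.name = name  # created when the user taps the entry button
        self.outcome: Optional[Outcome] = None

    def finish(self, outcome: Outcome) -> None:
        # Each run ends exactly once; a second end call would double-count it.
        if self.outcome is not None:
            raise RuntimeError(f"{self.name} already finished as {self.outcome.value}")
        self.outcome = outcome


def summarize(runs: list[FlowMeasurement]) -> Counter:
    """Count finished runs per outcome; unfinished runs are still in flight."""
    return Counter(r.outcome for r in runs if r.outcome is not None)


runs = [FlowMeasurement("login_sms") for _ in range(4)]
runs[0].finish(Outcome.SUCCESS)
runs[1].finish(Outcome.ERROR)   # wrong code entered
runs[2].finish(Outcome.CANCEL)  # user backed out
# runs[3] never finishes: the app was killed mid-flow
print(summarize(runs))
```

The fourth run stays unfinished in memory and is lost with the process. That gap is what a persisted record is needed for.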

Slide 34

1.2. What is “user experience SLI/SLO”
Advantages
• Measuring cancellations and interruptions broadens what can be measured.
• With Flutter (a single-codebase solution), it applies to both iOS and Android.

Slide 35

1.2. What is “user experience SLI/SLO”
Advantages
• Measuring cancellations and interruptions broadens what can be measured.
• With Flutter (a single-codebase solution), it applies to both iOS and Android.
Disadvantages
• As the measurement range widens, complexity increases.
• User behavior introduces noise into the data.

Slide 36

1.2. What is “user experience SLI/SLO”
                               | SLI/SLO      | “User experience SLI/SLO”
Measurement unit               | API requests | Series of function flows
Measure system-induced errors  | ✅           | ✅
Measure user-induced errors    | ❌           | ✅
Measurement domain             | Back-end     | Client
Ease of embedding measurements | ✅           | ❌

Slide 37

2. Why “user experience SLI/SLO” is needed
  1. Aim to improve system availability
  2. Aim to improve MTTR (mean time to repair)

Slide 38

2.1. Aim to improve system availability
I will explain why we created “user experience SLI/SLO”.

Slide 39

2.1. Aim to improve system availability
System Availability

Slide 40

2.1. Aim to improve system availability
What is System Availability?
System Availability (%) = Working Hours / (Stop Time + Working Hours)
Stop Time (h) = MTTR* × 365 days × 24 hours / MTBF*
* MTTR (mean time to repair) = Total Stop Time / Number of Failures
* MTBF (mean time between failures) = Total Working Hours / Number of Failures

Slide 41

2.1. Aim to improve system availability
Situation in 2022
• Change Failure Rate: 7.5% (3/40)
• MTTR*: 4.0 days
• System Availability: 96.8%
* MTTR: mean time to repair

Slide 42

2.1. Aim to improve system availability
Situation in 2022
• Change Failure Rate: 7.5% (3/40)
• MTTR*: 4.0 days
• System Availability: 96.8%
→ Goal for 2023
• Change Failure Rate: 5%
• MTTR*: 3.0 days
• System Availability: 98.4%
* MTTR: mean time to repair
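The 2022 and 2023 figures can be reproduced by plugging them into the System Availability formulas. The Python sketch below does this under three stated assumptions:
• "Working Hours" is the whole year (365 × 24 = 8,760 h), since that reproduces the 96.8% and 98.4% on the slides.
• The 3/40 change failure rate means 3 failures in the year.
• The 2023 goal assumes the same 40 releases, so 5% gives 2 failures.

`system_availability` is a hypothetical helper.

```python
HOURS_PER_YEAR = 365 * 24  # "Working Hours" taken as the whole year (assumption)


def system_availability(mttr_hours: float, failures_per_year: int) -> float:
    """System Availability = Working Hours / (Stop Time + Working Hours)."""
    # MTBF = total working hours / number of failures
    mtbf_hours = HOURS_PER_YEAR / failures_per_year
    # Stop Time = MTTR x 365 days x 24 hours / MTBF
    stop_time_hours = mttr_hours * HOURS_PER_YEAR / mtbf_hours
    return HOURS_PER_YEAR / (stop_time_hours + HOURS_PER_YEAR)


# 2022: 3 failed releases out of 40, MTTR 4.0 days
print(f"2022: {system_availability(4.0 * 24, 3):.1%}")
# 2023 goal: 5% of an assumed 40 releases = 2 failures, MTTR 3.0 days
print(f"2023: {system_availability(3.0 * 24, round(0.05 * 40)):.1%}")
```

Under these assumptions, Stop Time reduces to MTTR × number of failures (288 h in 2022, 144 h for the goal). Cutting either the failure count or MTTR therefore moves availability by the same mechanism.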

Slide 43

2.1. Aim to improve system availability
To achieve the goals…

Slide 44

2.1. Aim to improve system availability
To achieve the goals…
📈 Reduce the failure rate
• Create a system to detect problems before release
• Review the existing testing regime

Slide 45

2.1. Aim to improve system availability
To achieve the goals…
📈 Reduce the failure rate
• Create a system to detect problems before release
• Review the existing testing regime
🕐 Lower MTTR
• Detect defects early after release
• Review the existing monitoring system

Slide 46

2.1. Aim to improve system availability
To achieve the goals…
📈 Reduce the failure rate (skipped in this session)
• Create a system to detect problems before release
• Review the existing testing regime
🕐 Lower MTTR
• Detect defects early after release
• Review the existing monitoring system

Slide 47

2.1. Aim to improve system availability
To achieve the goals…
📈 Reduce the failure rate (skipped in this session)
• Create a system to detect problems before release
• Review the existing testing regime
🕐 Lower MTTR
• Detect defects early after release
• Review the existing monitoring system
Our End-to-End testing strategy is available on YouTube: presentation at FlutterKaigi 2023 (Japanese only 🙏)
https://www.youtube.com/watch?v=VHhZlTDfwIQ&ab_channel=FlutterKaigi

Slide 48

2.2. Aim to improve MTTR (mean time to repair)
My team detects defects in the following ways:
• Alerts from the error tracking tool (Error / Crash / ANR)
• Inquiries from users
• Messages from co-workers
• Ego-searching on SNS

Slide 49

2.2. Aim to improve MTTR (mean time to repair)
Properties of each detection method
Method                  | Detection Speed | Detection Range | Escalation | Detection Area
Inquiry                 | Slow            | All             | ✅         | Wide
Error Tracking Tool     | Fast            | Known unknowns  | ✅         | Mid
Message from co-workers | Slow-Mid        | All             | ✅         | Mid
Ego-search              | Slow            | All             | ❌         | Wide

Slide 50

2.2. Aim to improve MTTR (mean time to repair)
Properties of each detection method (same table as Slide 49)
The error tracking tool detects problems fast, but only picks up “known unknowns”.

Slide 51

2.2. Aim to improve MTTR (mean time to repair)
Properties of each detection method (same table as Slide 49)
The other methods can detect “unknown unknowns”, but they do not detect them fast enough.

Slide 52

2.2. Aim to improve MTTR (mean time to repair)
“User experience SLI/SLO” could be a way out of this problem 🧐

Slide 53

2.2. Aim to improve MTTR (mean time to repair)
Here are actual failures in the WINTICKET app that the error tracking tool did not detect (excerpt):
• Live video does not load and an error occurs
• With WebView on Android 14, the screen is blank when the app returns from the background
• Some users get an error when charging with a credit card
• The app freezes for some users during SMS authentication

Slide 54

2.2. Aim to improve MTTR (mean time to repair)
Here are actual failures in the WINTICKET app that the error tracking tool did not detect (excerpt):
• Live video does not load and an error occurs
• With WebView on Android 14, the screen is blank when the app returns from the background
• Some users get an error when charging with a credit card
• The app freezes for some users during SMS authentication
“User experience SLI/SLO” can detect some, though not all, of these failures.

Slide 55

2.2. Aim to improve MTTR (mean time to repair)
Properties of each detection method
Method                  | Detection Speed | Detection Range | Escalation | Detection Area
Inquiry                 | Slow            | All             | ✅         | Wide
Error Tracking Tool     | Fast            | Known unknowns  | ✅         | Mid
Message from co-workers | Slow-Mid        | All             | ✅         | Mid
Ego-search              | Slow            | All             | ❌         | Wide
User Experience SLI/SLO | Fast            | All*            | ✅         | Mid
* Only within measured functions
SLI/SLO can detect unknown unknowns within the measured functions. Escalation is also possible by building an alert function.

Slide 56

2.2. Aim to improve MTTR (mean time to repair)
Two approaches to reduce MTTR

Slide 57

2.2. Aim to improve MTTR (mean time to repair)
Two approaches to reduce MTTR
🚨 Alert function
• Reduce MTTD (mean time to detect) by creating alerts
• Create alerts focused on the latest version to detect release-related defects

Slide 58

2.2. Aim to improve MTTR (mean time to repair)
Two approaches to reduce MTTR
🚨 Alert function
• Reduce MTTD (mean time to detect) by creating alerts
• Create alerts focused on the latest version to detect release-related defects
⛰ High observability
• Provide more information to reduce MTTI (mean time to investigate) and make analysis easier and more accurate

Slide 59

2.2. Aim to improve MTTR (mean time to repair)
Let’s take a look at how “user experience SLI/SLO” is made 🚀

Slide 60

3. How to create “user experience SLI/SLO”
  1. Input knowledge
  2. Define CUJ and align with business teams
  3. Select tools and embed measurements
  4. Create alerts
  5. Achieve high level of observability
  6. Embed SLI/SLO in teams and business

Slide 61

3.1. Input knowledge
Neither my team nor I were familiar with SLI/SLO, so we held a reading circle in parallel with our research.
https://www.oreilly.com/library/view/observability-engineering/9781492076438/

Slide 62

3.1. Input knowledge
Neither my team nor I were familiar with SLI/SLO, so we held a reading circle in parallel with our research.
It was, of course, a good opportunity to gain knowledge. It was also very valuable that we could then talk to each other in shorthand like “That thing from the book, right?” or “Would the XX from that book work here?”
https://www.oreilly.com/library/view/observability-engineering/9781492076438/

Slide 63

3.2. Define CUJ and align with business teams
Review CUJs with the business team and prioritize them
👤 Auth 🚴 Betting 🏦 Transfer Payoff 💰 Charge 🎬 Video 🎬 Broadcast 📈 Analysis Bettings 🎉 Campaign 📊 View Predictions ✉ Message 🔧 Settings 📄 Confirm History 🔍 Search 📅 Schedules

Slide 64

3.2. Define CUJ and align with business teams
Review CUJs with the business team and prioritize them
Since users use the service “to enjoy gambling”, we focused on the following functions.
👤 Auth 🚴 Betting 🏦 Transfer Payoff 💰 Charge 🎬 Video 🎬 Broadcast 📈 Analysis Bettings 🎉 Campaign 📊 View Predictions ✉ Message 🔧 Settings 📄 Confirm History 🔍 Search 📅 Schedules

Slide 65

3.3. Select tools and embed measurements
Many tools exist to measure and alert on SLI/SLO: Sentry, BigQuery, New Relic, Datadog, etc.

Slide 66

3.3. Select tools and embed measurements
Many tools exist to measure and alert on SLI/SLO: Sentry, BigQuery, New Relic, Datadog, etc.
My team selected Sentry.

Slide 67

3.3. Select tools and embed measurements
Why Sentry?
✅ We already used Sentry for error tracking, so we were familiar with it
✅ Error tracking and SLI/SLO measurement can be done in one tool
✅ SDK development is very active, so we can expect good maintenance going forward

Slide 68

3.3. Select tools and embed measurements
Once the measurement tool is decided, let’s start embedding measurements ⛏

Slide 69

3.3. Select tools and embed measurements
Tips for embedding:
• Introduce a mechanism to measure interruptions in a less complicated way
• Apply a sampling rate to cut costs
• Use AsyncMap to avoid async/await when embedding measurements in the UI
• Use a Dio Interceptor to leave API request breadcrumbs
• Use AutoRouteObserver to leave screen transition breadcrumbs
• Attach device information, app version, etc. for high observability*
* Assuming this does not infringe the user privacy policy
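For the sampling-rate tip, one workable approach is to make the keep/drop decision deterministic per flow run. That way the start, the end and any later interruption report for the same run all agree. The Python sketch below hashes a run id. The `should_sample` name and the id format are hypothetical. Sentry's SDKs also provide their own sampling options such as `tracesSampleRate`, which may be all you need.

```python
import hashlib


def should_sample(flow_id: str, rate: float) -> bool:
    """Decide once per flow run whether to measure it.

    Hashing the id (instead of calling random()) gives the same answer at
    start, at end, and on the next launch when an interruption is reported,
    so a run is never half-recorded.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the bucket lies in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate


kept = sum(should_sample(f"login_sms-{i}", 0.25) for i in range(10_000))
print(kept)
```

With a 0.25 rate, roughly a quarter of the runs are kept. Any run id that is stable for the lifetime of one flow works as the hash input.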

Slide 70

3.3. Select tools and embed measurements
Tips for embedding:
• Introduce a mechanism to measure interruptions in a less complicated way
• Apply a sampling rate to cut costs
• Use AsyncMap to avoid async/await when embedding measurements in the UI
• Use a Dio Interceptor to leave API request breadcrumbs
• Use AutoRouteObserver to leave screen transition breadcrumbs
• Attach device information, app version, etc. for high observability*
* Assuming this does not infringe the user privacy policy
In the interest of time, I will introduce only one of these 🙇

Slide 71

3.3. Select tools and embed measurements
Introduce a mechanism to measure interruptions in a less complicated way

Slide 72

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code → Success & navigate to home
* This is a function used in the development environment.

Slide 73

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code → Success & navigate to home
• Start measurement when the button is tapped
• End measurement with Success when login succeeds
* This is a function used in the development environment.

Slide 74

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code
• End measurement with Error when any error is caused by the system or the user
* This is a function used in the development environment.

Slide 75

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code
• When the user cancels the flow, record it as “Cancel”.
• When the app crashes or is killed, record it as “Interruption”.
* This is a function used in the development environment.

Slide 76

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code
• When the user cancels the flow, record it as “Cancel”.
• When the app is killed, record it as “Interruption”.
How should interruptions be measured?
* This is a function used in the development environment.

Slide 77

3.3. Select tools and embed measurements
Flutter’s lifecycle cannot detect an app kill

Slide 78

3.3. Select tools and embed measurements
Flutter’s lifecycle cannot detect an app kill
Even if it could, there is no guarantee that log transmission still works when the app crashes because of a screen freeze or a memory leak.

Slide 79

3.3. Select tools and embed measurements
Flutter’s lifecycle cannot detect an app kill
Even if it could, there is no guarantee that log transmission still works when the app crashes because of a screen freeze or a memory leak.
How should interruptions be measured?

Slide 80

3.3. Select tools and embed measurements
Use a local database to measure interruptions

Slide 81

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code → Success & navigate to home
• Start measurement when the button is tapped
• End measurement with Success when login succeeds
* This is a function used in the development environment.

Slide 82

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code → Success & navigate to home
• Start measurement when the button is tapped → save the measurement start data in the local database
• End measurement with Success when login succeeds → delete the data from the local database
* This is a function used in the development environment.

Slide 83

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code
• End measurement with Error when any error is caused by the system or the user → delete the data from the local database
* This is a function used in the development environment.

Slide 84

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code
• When the user cancels the flow, record it as “Cancel” → delete the data from the local database
* This is a function used in the development environment.

Slide 85

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code
• Start measurement when the button is tapped → save the measurement start data in the local database
💥 App crash / app kill → re-launch the app
* This is a function used in the development environment.

Slide 86

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code
• Start measurement when the button is tapped → save the measurement start data in the local database
💥 App crash / app kill → re-launch the app
→ Data remains in the local database only when the app crashed or was killed.
* This is a function used in the development environment.

Slide 87

3.3. Select tools and embed measurements
SMS login example in WINTICKET
Login top → Enter phone number → Enter verification code
• Start measurement when the button is tapped → save the measurement start data in the local database
💥 App crash / app kill → re-launch the app
→ Data remains in the local database only when the app crashed or was killed.
→ Record it as Interruption and delete it from the local database.
* This is a function used in the development environment.
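The local-database mechanism reduces to three operations:
• Insert a row when a flow starts.
• Delete it when the flow ends with Success, Error or Cancel.
• On the next launch, report any leftover row as Interruption.

The Python sketch below uses the standard `sqlite3` module to show that logic. It is not the WINTICKET code; in a Flutter app, a package such as sqflite or drift would play this role. The table layout and the `PendingFlowStore` class are hypothetical.

```python
import sqlite3
import time


class PendingFlowStore:
    """Local table of flows that have started but not finished (hypothetical schema)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pending_flows ("
            " flow_id TEXT PRIMARY KEY, name TEXT NOT NULL, started_at REAL NOT NULL)"
        )
        self.conn.commit()

    def start(self, flow_id: str, name: str) -> None:
        # Persist immediately so that a later crash or kill leaves a trace.
        self.conn.execute(
            "INSERT OR REPLACE INTO pending_flows VALUES (?, ?, ?)",
            (flow_id, name, time.time()),
        )
        self.conn.commit()

    def finish(self, flow_id: str) -> None:
        # Success, Error and Cancel are reported normally; the row is no longer needed.
        self.conn.execute("DELETE FROM pending_flows WHERE flow_id = ?", (flow_id,))
        self.conn.commit()

    def collect_interruptions(self) -> list[tuple[str, str]]:
        # Call once on app launch: any leftover row means the previous run
        # crashed or was killed mid-flow. Report these as Interruption.
        rows = self.conn.execute(
            "SELECT flow_id, name FROM pending_flows ORDER BY started_at"
        ).fetchall()
        self.conn.execute("DELETE FROM pending_flows")
        self.conn.commit()
        return rows


conn = sqlite3.connect(":memory:")  # a real app would use a file that survives restarts
store = PendingFlowStore(conn)
store.start("run-1", "login_sms")
store.start("run-2", "login_google")
store.finish("run-1")  # ended with Success/Error/Cancel
# ...process dies here; on the next launch:
print(PendingFlowStore(conn).collect_interruptions())
```

A dedicated table avoids the conflict risk of sharing a single key-value store with feature code. The trade-off is that users who never relaunch the app are never reported.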

Slide 88

3.3. Select tools and embed measurements
Points of concern:
• Transactions are not sent for users who leave the app without launching it again
• Using a local database

Slide 89

3.3. Select tools and embed measurements
Points of concern:
• Data is not sent for users who leave the app without launching it again
→ We don't need every user's data as long as failures can be detected, so we prioritized an approach that guarantees the data is eventually sent.
• Using a local database
→ Shared Preference and Secure Preference are a single table, so conflicts may occur if feature code also wants to update local data.

Slide 90

3.4. Create alerts
🚨 How alerts work

Slide 91

3.4. Create alerts
There are two main types of alerts used with SLI/SLO:
• Error Budget Alert
• Burn Rate Alert

Slide 92

3.4. Create alerts
Error Budget Alert
Error budget alerts are threshold-based and notify you when a certain percentage of the SLO's error budget has been consumed. For example, you could set an alert when 75% of the error budget is consumed in the period of interest, and a warning when 50% is consumed.
https://docs.datadoghq.com/en/service_management/service_level_objectives/error_budget/

Slide 93

3.4. Create alerts
Error Budget Alert
Error budget alerts are threshold-based and notify you when a certain percentage of the SLO's error budget has been consumed. For example, you could set an alert when 75% of the error budget is consumed in the period of interest, and a warning when 50% is consumed.
Error Budget?
https://docs.datadoghq.com/en/service_management/service_level_objectives/error_budget/

Slide 94

3.4. Create alerts
Error Budget: A measure of how much loss of service reliability is acceptable. If the service level objective (SLO) is to maintain a 99.99% successful request rate, the error budget is to keep the error rate below 0.01%.
https://docs.datadoghq.com/en/service_management/service_level_objectives/error_budget/

Slide 95

3.4. Create alerts
Error Budget Alert
Google login SLO (function name): alerts fire when 80% or 100% of the error budget is consumed.
* This is not a Sentry alert feature; I built it myself on top of Sentry measurement data.
https://docs.datadoghq.com/en/service_management/service_level_objectives/error_budget/
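The error budget alert boils down to two steps. First compute how much of the budget (1 − SLO target, in events) is used. Then compare that fraction with the thresholds. The Python sketch below uses the 80% / 100% thresholds from the Google login example. Treating 80% as a warning and 100% as critical is an assumption. The function names are hypothetical and are not a Sentry or Datadog API.

```python
from typing import Optional


def error_budget_consumed(total: int, bad: int, slo_target: float) -> float:
    """Share of the error budget used so far in the SLO period (1.0 = all of it)."""
    budget = total * (1.0 - slo_target)  # number of bad transactions the SLO allows
    if budget == 0:
        return 0.0 if bad == 0 else float("inf")
    return bad / budget


def budget_alert(consumed: float, warn_at: float = 0.8, critical_at: float = 1.0) -> Optional[str]:
    # Thresholds mirror the 80% / 100% example; the severity names are assumptions.
    if consumed >= critical_at:
        return "critical"
    if consumed >= warn_at:
        return "warning"
    return None


# SLO 99% over the period: 10,000 login attempts allow about 100 failures.
for failures in (50, 85, 120):
    used = error_budget_consumed(10_000, failures, 0.99)
    print(failures, f"{used:.0%}", budget_alert(used))
```

Because the budget scales with traffic, the same number of failures uses less of the budget on a busy day than on a quiet one.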

Slide 96

3.4. Create alerts
Burn Rate Alert
Notifies you when the consumption rate of an error budget exceeds a specified threshold and stays above it for a specified period of time. For example, for an SLO with a 7-day target, you could alert when a burn rate of 16.8 or higher is measured over both the past hour and the past 5 minutes.
https://docs.datadoghq.com/en/service_management/service_level_objectives/burn_rate/

Slide 97

3.4. Create alerts
Burn Rate Alert
Notifies you when the consumption rate of an error budget exceeds a specified threshold and stays above it for a specified period of time. For example, for an SLO with a 7-day target, you could alert when a burn rate of 16.8 or higher is measured over both the past hour and the past 5 minutes.
Burn rate = (7 days × 24 hours × 10% of error budget consumed) / (1 hour × 100%) = 7 × 24 × 0.1 / 1 = 16.8
https://docs.datadoghq.com/en/service_management/service_level_objectives/burn_rate/
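The 16.8 figure is the burn rate at which 10% of a 7-day error budget is gone within one hour. Burn rate itself is the observed error rate divided by the allowed error rate (1 − SLO target). The Python sketch below reproduces the threshold and applies the two-window check (past hour and past 5 minutes). Function names are hypothetical; this is not Datadog's or Sentry's API.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than 'evenly over the SLO period' the budget is burning."""
    return error_rate / (1.0 - slo_target)


def threshold_for(budget_share: float, window_hours: float, period_hours: float) -> float:
    """Burn rate at which `budget_share` of the budget is used up within `window_hours`."""
    return budget_share * period_hours / window_hours


THRESHOLD = threshold_for(0.10, 1, 7 * 24)  # 10% of a 7-day budget in 1 hour


def should_alert(error_rate_1h: float, error_rate_5m: float, slo_target: float) -> bool:
    # The 1h window shows the burn is significant; the 5m window shows it is
    # still happening now, so the alert clears quickly once the issue stops.
    return (burn_rate(error_rate_1h, slo_target) >= THRESHOLD
            and burn_rate(error_rate_5m, slo_target) >= THRESHOLD)


print(THRESHOLD)
print(should_alert(0.20, 0.25, 0.99))  # 99% SLO: 16.8x means an error rate of about 16.8%
```

With a 99% SLO, a burn rate of 16.8 corresponds to roughly a 16.8% error rate. That is a level user-induced errors can reach on their own, which is the problem raised on the following slides.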

Slide 98

3.4. Create alerts
Burn Rate Alert
Burn rate alerts can make you aware of many defects immediately. However, with “user experience SLI/SLO” they have the following problems:
❌ User-induced errors and cancellations fluctuate widely
❌ When there are few sessions, such as late at night or early in the morning, a single user's behavior has a large impact on alerts
https://docs.datadoghq.com/en/service_management/service_level_objectives/burn_rate/

Slide 99

3.4. Create alerts
Burn Rate Alert
Burn rate alerts can make you aware of many defects. However, the following problems exist:
❌ User-induced errors and cancellations fluctuate widely
→ Adjust the error budget to allow for upward swings
❌ When there are few sessions, such as late at night or early in the morning, a single user's behavior has a large impact on alerts
→ Set a minimum number of errors required to alert
https://docs.datadoghq.com/en/service_management/service_level_objectives/burn_rate/

Slide 100

3.4. Create alerts
Burn Rate Alert
Facebook login (function name): alerts when 20% or 30% is exceeded on an hourly basis. A warning requires 10 transactions; a critical alert requires 15 or more.
* This is not a Sentry alert feature; I built it myself on top of Sentry measurement data.
https://docs.datadoghq.com/en/service_management/service_level_objectives/burn_rate/
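The Facebook login alert pairs a percentage threshold with a minimum count, so that a few failures at 3 a.m. do not page anyone. The Python sketch below shows that gating under assumptions not stated on the slide:
• The 20% / 30% thresholds are read as the error rate over the past hour.
• The minimum counts (10 for warning, 15 for critical) are applied to error transactions, following the "set the number of errors" idea. The slide does not say whether they count errors or all transactions.

`hourly_alert` is a hypothetical name.

```python
from typing import Optional


def hourly_alert(transactions: int, errors: int) -> Optional[str]:
    """Rate threshold plus minimum error count for one hourly window (assumed semantics)."""
    if transactions == 0:
        return None
    error_rate = errors / transactions
    if error_rate > 0.30 and errors >= 15:
        return "critical"
    if error_rate > 0.20 and errors >= 10:
        return "warning"
    return None


print(hourly_alert(3, 2))      # late night: 67% error rate but only 2 errors
print(hourly_alert(150, 40))   # about 27% with enough errors
print(hourly_alert(150, 60))   # 40% with enough errors
```

The count gate trades a little detection speed at low-traffic hours for far fewer false positives caused by a single user.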

Slide 101

3.5. Achieve high level of observability
After an alert fires, assess the situation, including whether it is a false positive. To support this, attach additional data in a way that does not violate privacy policies.

Slide 102

Slide 102 text

3.5. Achieve a high level of observability
After an alert is detected, assess the situation, including whether it is a false positive. To support this, add data in a manner that does not violate privacy policies:
• User's device name and OS version
• Device performance (High, Medium, Low)
• App version
• Which screen was last displayed
• Connection status
• etc.
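One way to keep added context within privacy policy is an explicit allowlist, so that anything not listed (user IDs, email addresses) is dropped before it is attached to an event. This is a minimal sketch; the field names, `ALLOWED_KEYS`, and `build_context` are hypothetical, and the real set of permitted fields depends on your own policy.

```python
from typing import Any

# Hypothetical allowlist of fields assumed to be safe under the privacy policy.
ALLOWED_KEYS = frozenset({
    "device_model", "os_version", "device_tier",
    "app_version", "last_screen", "connection",
})
DEVICE_TIERS = ("High", "Medium", "Low")


def build_context(raw: dict[str, Any]) -> dict[str, Any]:
    """Keep only allowlisted fields; everything else (e.g. user IDs, email) is dropped."""
    context = {k: v for k, v in raw.items() if k in ALLOWED_KEYS}
    # Only accept the three performance tiers shown on the slide.
    if context.get("device_tier") not in DEVICE_TIERS:
        context.pop("device_tier", None)
    return context
```

An allowlist fails safe: a new field added to the raw data is excluded until someone deliberately decides it may be collected.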

Slide 103

Slide 103 text

3.6. Embed SLI/SLO in teams and business
Service quality is very important for the entire business

Slide 104

Slide 104 text

3.6. Embed SLI/SLO in teams and business
I want the business team to know that engineers are working on quality

Slide 105

Slide 105 text

3.6. Embed SLI/SLO in teams and business
Can't we use the SLI/SLO mechanism to address quality across the entire business?

Slide 106

Slide 106 text

3.6. Embed SLI/SLO in teams and business
Creating a DASHBOARD feature

Slide 107

Slide 107 text

3.6. Embed SLI/SLO in teams and business
Creating a DASHBOARD feature
• Objective: anyone can see defects at any time
• Means: a dashboard that is always close at hand
• Axis of improvement: make it the starting point for understanding the status of a failure

Slide 108

Slide 108 text

3.6. Embed SLI/SLO in teams and business

Slide 109

Slide 109 text

3.6. Embed SLI/SLO in teams and business
Login group
SMS, Google … (function names)
Success rate of each function by time window (1h, 4h, 24h, 7d)
Description: target values representing normal operation for each function
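The dashboard columns (1h, 4h, 24h, 7d) amount to computing the same success rate over several trailing windows. The sketch below assumes each transaction has a function name, a status of `"success"`, `"cancelled"`, or `"error"`, and a finish time, and that cancellations count in the denominator; `Transaction` and `success_rates` are hypothetical names, not the talk's code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Transaction:
    function: str         # e.g. "SMS" or "Google" in the Login group
    status: str           # assumed values: "success", "cancelled", "error"
    finished_at: datetime


# Trailing windows matching the dashboard columns.
WINDOWS = {
    "1h": timedelta(hours=1),
    "4h": timedelta(hours=4),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
}


def success_rates(txs: list[Transaction], function: str,
                  now: datetime) -> dict[str, Optional[float]]:
    """Success rate of one function per trailing window; None when the window has no data."""
    rates: dict[str, Optional[float]] = {}
    for label, span in WINDOWS.items():
        window = [t for t in txs
                  if t.function == function and now - span <= t.finished_at <= now]
        successes = sum(1 for t in window if t.status == "success")
        rates[label] = successes / len(window) if window else None
    return rates
```

Returning `None` for an empty window, rather than 0% or 100%, keeps a quiet function from looking either broken or perfectly healthy next to its target value.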

Slide 110

Slide 110 text

3.4. Create alerts
Green: success rate
Gray: cancellation rate
Red: error rate

Slide 111

Slide 111 text

3.4. Create alerts
📊 Dashboard feature
Transaction counts are displayed per 10 minutes:
Success transactions: 24
Cancelled transactions: 0
Error transactions: 1
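The per-10-minute counts shown in the graph can be produced by flooring each timestamp to the start of its bucket and counting statuses. This is a minimal sketch; `bucket_counts` and the `(timestamp, status)` input shape are assumptions, and the flooring only works for bucket sizes that divide 60 minutes.

```python
from collections import Counter, defaultdict
from datetime import datetime


def bucket_counts(events: list[tuple[datetime, str]],
                  minutes: int = 10) -> dict[datetime, Counter]:
    """Count statuses per fixed `minutes` bucket (assumes `minutes` divides 60)."""
    buckets: dict[datetime, Counter] = defaultdict(Counter)
    for ts, status in events:
        # Floor the timestamp to the start of its bucket, e.g. 12:07 -> 12:00.
        start = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
        buckets[start][status] += 1
    return dict(sorted(buckets.items()))
```

Because each bucket is a `Counter`, a status that never occurred (such as 0 cancellations) simply reads as 0 instead of raising a `KeyError` when the graph is drawn.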

Slide 112

Slide 112 text

3.4. Create alerts
If the graph looks like this … the failure has likely been occurring since this point in time

Slide 113

Slide 113 text

📊 Dashboard feature
Failures are likely occurring for users who meet certain conditions
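Spotting that failures concentrate among users who meet certain conditions amounts to breaking the error rate down by one attribute at a time. This sketch assumes each transaction row carries attributes such as `os_version` plus a `status` field; `error_rate_by` and `min_total` are hypothetical names, not part of any library.

```python
from collections import defaultdict


def error_rate_by(rows: list[dict[str, str]], key: str,
                  min_total: int = 1) -> dict[str, float]:
    """Error rate for each value of `key` (e.g. "os_version"), worst first."""
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for row in rows:
        value = row.get(key, "unknown")
        totals[value] += 1
        if row["status"] == "error":
            errors[value] += 1
    # Skip segments with too little traffic to say anything meaningful.
    rates = {v: errors[v] / totals[v] for v in totals if totals[v] >= min_total}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))
```

Running this for each attribute collected earlier (device, OS version, app version, connection status) and comparing the top entries is a quick way to form a hypothesis before digging into individual events.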

Slide 114

Slide 114 text

3.6. Embed SLI/SLO in teams and business
👍 What it provides
✅ A complete picture of the problem, which speeds up the initial response
✅ A source of information for double-checking failures
✅ Confirmation that a failure has been resolved
🙅 What it does not provide

Slide 115

Slide 115 text

3.6. Embed SLI/SLO in teams and business
👍 What it provides
✅ A complete picture of the problem, which speeds up the initial response
✅ A source of information for double-checking failures
✅ Confirmation that a failure has been resolved
🙅 What it does not provide
❌ A way to eliminate failures completely on its own
❌ Information to be trusted 100% as the sole basis for business decisions
❌ Detailed data per user, per version, etc.

Slide 116

Slide 116 text

3.6. Embed SLI/SLO in teams and business

Slide 117

Slide 117 text

3.6. Embed SLI/SLO in teams and business
The dashboard has become established as the tool business members use to check user impact immediately when they are notified of an incident.

Slide 118

Slide 118 text

4. Impressions from actual operation of “user experience SLI/SLO”
1. How much has MTTR improved?
2. Impressions

Slide 119

Slide 119 text

4.1. How much has MTTR improved?
The following patterns of failure exist:
• Regressions caused by our own code
• Failures/maintenance of external services (Firebase, payment and banking services)

Slide 120

Slide 120 text

4.1. How much has MTTR improved?
The following patterns of failure exist:
• Regressions caused by our own code
→ No regressions within the measured scope since operation began in February 2024.
• Failures/maintenance of external services (Firebase, payment and banking services)
→ Shortened the time needed to confirm the scope of impact, which let us identify events that had previously gone unnoticed.

Slide 121

Slide 121 text

4.2. Impressions
SLI/SLO requires a great deal of specialized knowledge, and due to time constraints I have left many areas out of this explanation. There are still many things about SLI/SLO I don't understand, and I will keep learning. If you are interested, I would be happy to talk with you later.

Slide 122

Slide 122 text

4.2. One more thing …
A sample repository is available.

Slide 123

Slide 123 text

Thank you for your attention 🙌