Slide 1

Slide 1 text

The Power of Conversation aircall.io
Efficient Android app Monitoring
Julien Salvi - Android GDE | Android @ Aircall
droidcon Italy 2024 @JulienSalvi

Slide 2

Slide 2 text

Bonjour !
Julien Salvi
Lead Android Engineer @ Aircall
Android GDE
“Android dev in shorts!”
@JulienSalvi

Slide 3

Slide 3 text

Summary
📣 Introduction & context
🔮 Efficient monitoring & observability
✌ Wrap-up!

Slide 4

Slide 4 text

Intro to monitoring & observability

Slide 5

Slide 5 text

The story of an app in production 📗

Slide 6

Slide 6 text

Happy app was happy 😃 but suddenly… 💥

Slide 7

Slide 7 text

What happened? Happy app was happy 😃 but suddenly…
● 💥 A crash
● 🥶 An ANR (App Not Responding)
● 🐌 Slow startup times
● 🖼 Dropped frames
● 🐛 Unexpected state in the app
● 🛜 Network failures, slow calls, timeouts…
● 🪫 Battery drain issues
● 🔓 Security vulnerabilities
● 📱 OS- or device-specific issues
● …

Slide 8

Slide 8 text

Happy app was happy 😃 but suddenly it is not 🥲
This leads to… an angry user

Slide 9

Slide 9 text

No monitoring… an angry user: “Cannot save a new contact with French numbers”

Slide 10

Slide 10 text

So, why monitor?

Slide 11

Slide 11 text

From multiple angry users… …to a single angry user, because monitoring allows a quick reaction 🏃

Slide 12

Slide 12 text

From an angry user… …to an angry user, but for a much shorter time 😅

Slide 13

Slide 13 text

From an angry user… …to a happy user, because we caught and fixed the issue soon enough 🫡

Slide 14

Slide 14 text

Monitoring ≠ Fixing issues

Slide 15

Slide 15 text

Monitoring
1. Debugging and Stability: real-time monitoring helps quickly identify crashes, ANRs, and performance bottlenecks, ensuring a stable and reliable app experience.
2. Performance Metrics: provides critical insights into resource usage, frame rates, and network performance, empowering developers to optimize code and enhance app responsiveness.
3. Proactive Maintenance: enables early detection of issues in production, reducing downtime and improving user trust through timely updates and fixes.
“Measure all the (relevant) things!”

Slide 16

Slide 16 text

Observability
1. Comprehensive Insights: observability combines metrics, logs, and traces to provide a complete picture of your app’s behavior in production or other environments.
2. Faster Debugging: helps identify the "why" behind issues like crashes, ANRs, or slow network calls by correlating data from different sources.
3. Proactive Monitoring: enables early detection of anomalies and performance degradation, empowering developers to resolve problems before they impact users.
“Observe all the (important) things!”

Slide 17

Slide 17 text

OK! 🤔 But… what’s the difference between monitoring & observability?

Slide 18

Slide 18 text

🤓 In short: monitoring is a tool, while observability is a capability

Slide 19

Slide 19 text

Now let’s deep dive into efficient monitoring for Android 🚀

Slide 20

Slide 20 text

Efficient monitoring & observability 🔮 Omniscience for what matters

Slide 21

Slide 21 text

Disclaimer
● Based on my personal experience
● Challenge and adapt to fit your needs
● Taking Firebase and Datadog as examples

Slide 22

Slide 22 text

The best of Firebase

Slide 23

Slide 23 text

The best of Firebase: Custom Crashlytics
● Crashlytics is easy to use and has been around for a long time… a simple plugin and let’s go!
● You can log more than crashes in Crashlytics!
● You can log a message or an exception as a non-fatal issue on the platform
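In the Firebase Android SDK, breadcrumb messages go through FirebaseCrashlytics.getInstance().log(...) and non-fatal issues go through FirebaseCrashlytics.getInstance().recordException(...). The sketch below keeps those calls behind a hypothetical ErrorReporter interface and uses an in-memory fake so that it runs without Firebase. ContactSaver and its "+" prefix check are made up for illustration and echo the French-number bug from earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over the crash reporting SDK. On Android, a production
// implementation could delegate log() to FirebaseCrashlytics.getInstance().log(...)
// and recordNonFatal() to FirebaseCrashlytics.getInstance().recordException(...).
interface ErrorReporter {
    void log(String message);

    void recordNonFatal(Throwable throwable);
}

// In-memory fake so the sketch runs without Firebase.
class InMemoryErrorReporter implements ErrorReporter {
    final List<String> breadcrumbs = new ArrayList<>();
    final List<Throwable> nonFatals = new ArrayList<>();

    @Override
    public void log(String message) {
        breadcrumbs.add(message);
    }

    @Override
    public void recordNonFatal(Throwable throwable) {
        nonFatals.add(throwable);
    }
}

// Hypothetical feature code: the failure is caught and reported as a non-fatal,
// so it shows up in the crash dashboard while the app keeps running.
class ContactSaver {
    private final ErrorReporter reporter;

    ContactSaver(ErrorReporter reporter) {
        this.reporter = reporter;
    }

    boolean save(String phoneNumber) {
        // Breadcrumb only: the phone number itself is not logged.
        reporter.log("ContactSaver: saving contact");
        try {
            if (!phoneNumber.startsWith("+")) {
                throw new IllegalArgumentException("Phone number must start with '+'");
            }
            return true; // persisting the contact is out of scope for this sketch
        } catch (IllegalArgumentException e) {
            reporter.recordNonFatal(e);
            return false;
        }
    }
}
```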

Slide 24

Slide 24 text


Slide 25

Slide 25 text

The best of Firebase: Crashlytics alerting on Slack
● Default alerting with Crashlytics is by email… and it’s quite easy to miss an email 😅
● If you work daily with Slack, Crashlytics has a nice integration to get alerts in a dedicated channel, so you can be notified in real time
● Alerting can be customized: new crashes, new ANRs, regressions, velocity alerts…

Slide 26

Slide 26 text

The best of Firebase: Crashlytics alerting on Slack
https://rhymezxcode.medium.com/how-to-get-firebase-crashlytics-bug-reports-directly-on-a-slack-channel-in-real-time-ee7a516070f7

Slide 27

Slide 27 text

The best of Firebase: Firebase Performance
● Real-Time Performance Monitoring: automatically tracks key metrics like app startup time, screen rendering speed, and network request latency.
● Customizable Traces and Metrics: allows developers to define custom traces and log specific performance data (e.g., feature-specific response times).
● Proactive Issue Detection: highlights metrics, anomalies, and performance degradations in real time.
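Firebase Performance custom traces are created with FirebasePerformance.getInstance().newTrace(name) and driven with start(), incrementMetric(), putAttribute() and stop(). The stand-in below is a hypothetical SimpleTrace that mimics that shape in plain Java, so you can see what a custom trace records: a duration, counters and attributes. The injected clock is only there to make the sketch testable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a performance trace: records a duration plus custom
// counters and attributes, similar in spirit to a Firebase Performance custom trace.
class SimpleTrace {
    private final String name;
    private final LongSupplier clockMillis; // injectable clock, e.g. System::currentTimeMillis
    private final Map<String, Long> metrics = new HashMap<>();
    private final Map<String, String> attributes = new HashMap<>();
    private long startedAt = -1;
    private long durationMillis = -1;

    SimpleTrace(String name, LongSupplier clockMillis) {
        this.name = name;
        this.clockMillis = clockMillis;
    }

    void start() {
        startedAt = clockMillis.getAsLong();
    }

    // Counters accumulate across calls, e.g. items loaded while the trace runs.
    void incrementMetric(String metric, long by) {
        metrics.merge(metric, by, Long::sum);
    }

    // Attributes describe the context, e.g. where the data came from.
    void putAttribute(String key, String value) {
        attributes.put(key, value);
    }

    void stop() {
        if (startedAt < 0) {
            throw new IllegalStateException("Trace " + name + " was never started");
        }
        durationMillis = clockMillis.getAsLong() - startedAt;
    }

    long durationMillis() {
        return durationMillis;
    }

    long metric(String metric) {
        return metrics.getOrDefault(metric, 0L);
    }

    String attribute(String key) {
        return attributes.get(key);
    }
}
```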

Slide 28

Slide 28 text


Slide 29

Slide 29 text


Slide 30

Slide 30 text

The best of Firebase: Firebase Performance demo!

Slide 31

Slide 31 text

Critical path logging

Slide 32

Slide 32 text

Critical path logging: the golden rules
● Focus on User Impact: identify the processes that are most visible to users, such as app launch, navigation, and feature usage. Prioritize paths where failures or delays would lead to frustration, like checkout or login flows.
● Target High-Risk Areas: pinpoint components prone to failures, such as network-dependent features, external API calls, or database operations.
● Monitor Business-Critical Functions: identify paths tied to key app goals, like revenue generation or user retention (e.g., notifications, content loading).

Slide 33

Slide 33 text

Critical path logging: pinpoint the failures
● Identify Key Processes: pinpoint the critical paths in your app that directly impact the user experience, such as app startup, navigation, and API calls.
● Log Key Events and Metrics: use logs to capture critical events like network errors and unexpected else branches, along with contextual information like timestamps and user actions…
● Prioritize Error and Latency Logs: log failures with stack traces and meaningful error messages for quicker debugging.
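Here is a minimal sketch of these ideas that assumes nothing beyond the JDK. A hypothetical CriticalPathLogger runs each step of a critical path. On success it logs a timestamp and the latency. On failure it logs the error with its stack trace and then rethrows. The path and step names used here (outgoing_call, fetch_token, connect) are only illustrative.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: runs one step of a critical path and records its outcome.
// Success: timestamp + latency. Failure: timestamp + latency + error + stack trace.
class CriticalPathLogger {
    final List<String> entries = new ArrayList<>();
    private final String path;
    private final LongSupplier clockMillis; // injectable clock, e.g. System::currentTimeMillis

    CriticalPathLogger(String path, LongSupplier clockMillis) {
        this.path = path;
        this.clockMillis = clockMillis;
    }

    <T> T step(String stepName, Supplier<T> action) {
        long start = clockMillis.getAsLong();
        String prefix = path + "/" + stepName;
        try {
            T result = action.get();
            long latencyMs = clockMillis.getAsLong() - start;
            entries.add("INFO " + prefix + " ok at=" + start + " latencyMs=" + latencyMs);
            return result;
        } catch (RuntimeException e) {
            long latencyMs = clockMillis.getAsLong() - start;
            entries.add("ERROR " + prefix + " failed at=" + start + " latencyMs=" + latencyMs
                    + " error=" + e + System.lineSeparator() + stackTraceOf(e));
            throw e; // logging must not swallow the failure
        }
    }

    private static String stackTraceOf(Throwable t) {
        StringWriter out = new StringWriter();
        t.printStackTrace(new PrintWriter(out));
        return out.toString();
    }
}
```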

Slide 34

Slide 34 text

Critical path logging
Example: call flow

Slide 35

Slide 35 text

Critical path logging
Example: call flow
Identify all the main points of failure in the flow

Slide 36

Slide 36 text

Critical path logging: points of failure

Slide 37

Slide 37 text

Critical path logging: the art of logging
● Event name structure: naming the events sent to the logging service is very important. The semantics are key to identifying the nature of the log (e.g. TWILIO | CORE | ERROR). Using emoji can also be visually helpful!
● Log severity: every log that is dispatched must have a severity matching its importance. An exception cannot be logged as VERBOSE or INFO.
● Meaningful payloads: send relevant information in the event payload so you can investigate the logs efficiently when an issue occurs (e.g. add the location of a stack trace).
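The three rules can be encoded in a small structured event type. In this hypothetical sketch, names follow the DOMAIN | COMPONENT | LEVEL pattern from the slide, and the severity enum is modeled on Android's log levels. Exceptions are rejected below WARN, and the payload automatically receives the exception type and the top stack frame as its location. None of this is a specific logging library's API.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Severity levels modeled on Android's log levels, from least to most severe.
enum Severity { VERBOSE, DEBUG, INFO, WARN, ERROR }

// Hypothetical structured log event: "DOMAIN | COMPONENT | LEVEL" name,
// explicit severity, and a key/value payload for investigations.
final class LogEvent {
    final String name;
    final Severity severity;
    final Map<String, String> payload;

    private LogEvent(String name, Severity severity, Map<String, String> payload) {
        this.name = name;
        this.severity = severity;
        this.payload = Collections.unmodifiableMap(payload);
    }

    static LogEvent of(String domain, String component, Severity severity, Map<String, String> payload) {
        String name = domain.toUpperCase(Locale.ROOT) + " | "
                + component.toUpperCase(Locale.ROOT) + " | " + severity.name();
        return new LogEvent(name, severity, new TreeMap<>(payload));
    }

    // Exceptions must never be logged as VERBOSE, DEBUG or INFO.
    static LogEvent ofException(String domain, String component, Severity severity,
                                Throwable error, Map<String, String> payload) {
        if (severity.compareTo(Severity.WARN) < 0) {
            throw new IllegalArgumentException("Exceptions must be logged as WARN or ERROR, not " + severity);
        }
        Map<String, String> enriched = new TreeMap<>(payload);
        enriched.put("exception", error.getClass().getName());
        enriched.put("message", String.valueOf(error.getMessage()));
        StackTraceElement[] frames = error.getStackTrace();
        if (frames.length > 0) {
            // Where the exception was created: class, method and line.
            StackTraceElement top = frames[0];
            enriched.put("location", top.getClassName() + "." + top.getMethodName() + ":" + top.getLineNumber());
        }
        return of(domain, component, severity, enriched);
    }
}
```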

Slide 38

Slide 38 text

demo!

Slide 39

Slide 39 text


Slide 40

Slide 40 text

Network monitoring

Slide 41

Slide 41 text

Network monitoring: HTTP interceptors
● Network requests are often the source of many failures on the app side
● We have to make sure we know what happened on the device in case of failure
● If you are using OkHttp, take advantage of Interceptors to monitor the requests
In that case, what should we effectively monitor?
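A real OkHttp interceptor implements okhttp3.Interceptor and calls chain.proceed(chain.request()) inside intercept(). To keep the example runnable without OkHttp, the sketch below uses simplified, hypothetical stand-in types (NetRequest, NetResponse, NetChain) that follow the same flow. It times the call, logs the status code with a level derived from it, and logs the error type before rethrowing when the call fails.

```java
import java.io.IOException;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Simplified, hypothetical stand-ins (NOT OkHttp classes) mirroring an interceptor chain.
final class NetRequest {
    final String method;
    final String path;

    NetRequest(String method, String path) {
        this.method = method;
        this.path = path;
    }
}

final class NetResponse {
    final int code;

    NetResponse(int code) {
        this.code = code;
    }
}

// "The rest of the chain": in OkHttp, this role is played by chain.proceed(request).
interface NetChain {
    NetResponse proceed(NetRequest request) throws IOException;
}

// Times every request, logs the outcome, and logs the error type on failure.
final class MonitoringInterceptor {
    private final Consumer<String> log;
    private final LongSupplier clockMillis;

    MonitoringInterceptor(Consumer<String> log, LongSupplier clockMillis) {
        this.log = log;
        this.clockMillis = clockMillis;
    }

    NetResponse intercept(NetRequest request, NetChain chain) throws IOException {
        String call = request.method + " " + request.path;
        long start = clockMillis.getAsLong();
        try {
            NetResponse response = chain.proceed(request);
            long tookMs = clockMillis.getAsLong() - start;
            String level = response.code >= 500 ? "ERROR" : response.code >= 400 ? "WARN" : "INFO";
            log.accept(level + " NETWORK | " + call + " -> " + response.code + " in " + tookMs + "ms");
            return response;
        } catch (IOException e) {
            long tookMs = clockMillis.getAsLong() - start;
            log.accept("ERROR NETWORK | " + call + " failed after " + tookMs + "ms: " + e.getClass().getSimpleName());
            throw e; // monitoring must not change the app's behavior
        }
    }
}
```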

Slide 42

Slide 42 text

Network monitoring: HTTP interceptors
● There are a few elements you should monitor while observing the network traffic in your app:
○ DO log the path, method, headers and body of the request
○ DO log the status code, body, headers, response time and error type of the response
○ DO NOT log sensitive information (tokens, credentials…), redact it instead!
Let’s see how we did it 🚀
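For the redaction rule, OkHttp's HttpLoggingInterceptor already offers redactHeader(name). If you log headers yourself, a minimal sketch could look like the one below. HeaderRedactor and the list of sensitive headers are assumptions. Header names are compared case-insensitively because HTTP header names are case-insensitive.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: returns a copy of the headers that is safe to log,
// masking the values of sensitive headers (tokens, cookies, API keys...).
final class HeaderRedactor {
    static final String REDACTED = "<redacted>";

    private final Set<String> sensitiveNames = new HashSet<>(); // stored lowercase

    HeaderRedactor(Set<String> names) {
        for (String name : names) {
            // HTTP header names are case-insensitive, so compare lowercased names.
            sensitiveNames.add(name.toLowerCase(Locale.ROOT));
        }
    }

    Map<String, String> redact(Map<String, String> headers) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            boolean hide = sensitiveNames.contains(header.getKey().toLowerCase(Locale.ROOT));
            safe.put(header.getKey(), hide ? REDACTED : header.getValue());
        }
        return safe; // the original map is left untouched
    }
}
```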

Slide 43

Slide 43 text

KSP codegen at your service 🫡

Slide 44

Slide 44 text


Slide 45

Slide 45 text


Slide 46

Slide 46 text


Slide 47

Slide 47 text


Slide 48

Slide 48 text

Dashboards & observability

Slide 49

Slide 49 text

Dashboards: observability services
● When it comes to efficient monitoring, observability tools are a must-have, but the cost can increase fast 💸
● Many solutions exist on the market: Datadog, Sentry, Embrace, Bugsnag…
● These services offer great observability tools: dashboards to monitor and visualize the data, and alerts triggered from it

Slide 50

Slide 50 text

Dashboards: best practices for effective observability
● Identify the Key Metrics: focus on essential metrics like crash rates, ANRs, API response times or user engagement. Prioritize metrics that directly impact user experience and business goals.
● Keep It Simple: use intuitive visualizations like graphs, heatmaps, and tables for easy interpretation.
● Use Contextual Filters: enable filtering by dimensions like device type, OS version, app version or geographic region.

Slide 51

Slide 51 text

Dashboards: configuration for optimal monitoring
● Set Alerts and Thresholds: configure alerts for critical metrics (e.g., crash-free rate < 99%, API latency > 500ms).
● Regularly Review and Refine: periodically assess dashboards with your team to ensure relevance and clarity. Update metrics and visualizations as app features or user needs evolve.
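To make the example thresholds concrete, here is a hypothetical rule evaluator. The crash-free users rate is 1 - (users with a crash / active users). "API latency" is assumed here to mean the 95th percentile, computed with the nearest-rank method using integer arithmetic. In practice the observability service evaluates such monitors for you; the sketch only shows the arithmetic behind them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical alert rules mirroring the example thresholds:
// crash-free users rate < 99% and API latency > 500 ms (p95 assumed).
final class AlertRules {
    static final double MIN_CRASH_FREE_RATE = 0.99;
    static final long MAX_P95_LATENCY_MS = 500;

    // Share of active users who did not experience a crash.
    static double crashFreeRate(long activeUsers, long usersWithCrash) {
        if (activeUsers <= 0) {
            throw new IllegalArgumentException("activeUsers must be > 0");
        }
        return 1.0 - (double) usersWithCrash / activeUsers;
    }

    // Nearest-rank percentile, with integer math to avoid rounding surprises.
    static long percentile(long[] samples, int p) {
        if (samples.length == 0 || p < 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 <= p <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (p * sorted.length + 99) / 100; // ceil(p * n / 100), 1-based
        return sorted[Math.max(rank, 1) - 1];
    }

    static List<String> evaluate(long activeUsers, long usersWithCrash, long[] latenciesMs) {
        List<String> alerts = new ArrayList<>();
        double rate = crashFreeRate(activeUsers, usersWithCrash);
        if (rate < MIN_CRASH_FREE_RATE) {
            alerts.add(String.format(Locale.ROOT, "Crash-free users %.2f%% < 99%%", rate * 100));
        }
        long p95 = percentile(latenciesMs, 95);
        if (p95 > MAX_P95_LATENCY_MS) {
            alerts.add("API p95 latency " + p95 + "ms > " + MAX_P95_LATENCY_MS + "ms");
        }
        return alerts;
    }
}
```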

Slide 52

Slide 52 text

Dashboards

Slide 53

Slide 53 text

Dashboards demo!

Slide 54

Slide 54 text

Dev build monitoring

Slide 55

Slide 55 text

Dev build monitoring: preemptive monitoring
● One of the most effective ways to avoid issues in production is to… anticipate them 😅
● OK, anticipating everything is impossible, but… 😅 we can apply some monitoring concepts to catch things early enough (before they reach production)
What should we effectively monitor in dev mode?

Slide 56

Slide 56 text

Dev build monitoring: preemptive monitoring
● Monitoring your dev builds:
○ DO monitor crashes and warning/error logs in dev mode
○ DO get reports from your unit tests, UI tests and screenshot tests to pinpoint failures
○ DO monitor the network requests while using the app or developing new features (e.g. Chucker)
○ DO run testing sessions before shipping an application to production

Slide 57

Slide 57 text

Dev build monitoring: UI test reporting

Slide 58

Slide 58 text

Dev build monitoring: Chucker
https://github.com/ChuckerTeam/chucker
● Monitoring your network requests also applies when you are developing new features on dev builds.
● Take advantage of the OkHttp Interceptor to monitor your REST requests or GraphQL queries/mutations.
● ⚙ Use the Chucker tool in dev mode to monitor your requests
● 👀 Check out Nicola Corti’s talk about Chucker 4.0 at droidcon Berlin

Slide 59

Slide 59 text

demo!

Slide 60

Slide 60 text

Wrap-up time! ✌ The key takeaways

Slide 61

Slide 61 text

Key takeaways
Monitoring
● Don’t log the entire app, log wisely and effectively
● Pinpoint the critical paths of your app
● Log all the critical paths
● Identify the main points of failure
● Alerting helps you react quickly
● Performance monitoring matters
Observability
● Assess your needs
● Assess the cost of each tool 💸
● Create efficient & relevant dashboards
● Define your alert thresholds and iterate
● Better to alert too soon than never!

Slide 62

Slide 62 text

Grazie ! Happy Smart Monitoring 🤓
Julien Salvi - Android GDE | Android @ Aircall
droidcon Italy 2024 @JulienSalvi