From Chaos to Control: Using Observability to Unveil the Mystery of the Butler (管家)

Speaker: Gary Hu, Tristan Wu
Event: LINE TECH FRESH Graduation Sharing Session

LINE Developers Taiwan

June 19, 2024

Transcript

  1. Tristan
    Education
    • B.B.A. in Finance @ NTU
    Experience
    • 2023 - 2024 | TECH FRESH @ LINE Taiwan
    • 2022 - 2023 | Software Engineer Intern @ Junyiacademy
    • 2022 | Backend Trainee @ AppWorks School
  2. CONTENT
    01 Introduction to Observability
    02 Three Pillars of Observability
    03 Case Study: LINE INVOICE
    04 Conclusion
  3. Logs
    1. Immutable, timestamped records of discrete events
    2. Record the necessary information for each request
    Source: https://grafana.com/products/cloud/logs/
  4. Logs Format
    • Unstructured: plain text
    • Structured: JSON format
    • Binary
      ◦ MySQL binlogs
      ◦ systemd journal logs
    Source: https://www.percona.com/blog/binlog-encryption-percona-server-mysql/
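
To make the structured format on the Logs Format slide concrete, here is a minimal sketch that emits one JSON object per log event using only the Python standard library. The field names `request_id`, `user_id`, and `status` are hypothetical examples of "necessary info for each request"; the deck does not say which fields or which logging library the team actually uses.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical per-request fields; a real service would choose its own.
REQUEST_FIELDS = ("request_id", "user_id", "status")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (structured log)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # Timestamped record of a discrete event, in UTC.
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy over request fields that were passed via `extra=`.
        for key in REQUEST_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("invoice-demo")
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    log.info("invoice fetched", extra={"request_id": "req-123", "status": 200})
```

Because every line is valid JSON, a log backend can index the fields directly instead of parsing free text with regular expressions.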
  7. Metrics: Supported Data Types (Counter, Gauge, Histogram, Summary)
    Counter
    • Only increases, never decreases
    • Application: number of HTTP requests
  8. Metrics: Supported Data Types (Counter, Gauge, Histogram, Summary)
    Gauge
    • Can increase or decrease at any time
    • Application: number of concurrent requests
  9. Metrics: Supported Data Types (Counter, Gauge, Histogram, Summary)
    Histogram
    • Counts observations in buckets; the bucket counts only increase, never decrease
    • Application: request durations
  10. Metrics: Supported Data Types (Counter, Gauge, Histogram, Summary)
    Summary
    • Provides precise sampling of observations, reported as quantiles
    • Application: request durations
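
The four data types on the Metrics slides can be compared side by side with a toy in-memory sketch. This is an illustration of the semantics only: the class and method names below are made up for this example and are not the API of any real metrics client, and the deck does not state which metrics system the team uses.

```python
import math
from bisect import bisect_left
from typing import List, Sequence


class Counter:
    """Only increases, never decreases (e.g. number of HTTP requests)."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("a counter cannot decrease")
        self.value += amount


class Gauge:
    """Can go up or down at any time (e.g. concurrent requests)."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


class Histogram:
    """Counts observations per bucket (e.g. request durations)."""

    def __init__(self, upper_bounds: Sequence[float]) -> None:
        self.upper_bounds = sorted(upper_bounds)
        # One extra slot for values above the largest bound (+Inf bucket).
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.count = 0
        self.total = 0.0

    def observe(self, value: float) -> None:
        # Index of the first bucket whose upper bound is >= value.
        self.bucket_counts[bisect_left(self.upper_bounds, value)] += 1
        self.count += 1
        self.total += value

    def cumulative_counts(self) -> List[int]:
        running, out = 0, []
        for c in self.bucket_counts:
            running += c
            out.append(running)
        return out


class Summary:
    """Keeps raw observations and reports exact quantiles over them."""

    def __init__(self) -> None:
        self._values: List[float] = []

    def observe(self, value: float) -> None:
        self._values.append(value)

    def quantile(self, q: float) -> float:
        if not 0.0 <= q <= 1.0 or not self._values:
            raise ValueError("need 0 <= q <= 1 and at least one observation")
        ordered = sorted(self._values)
        rank = max(1, math.ceil(q * len(ordered)))  # nearest-rank method
        return ordered[rank - 1]
```

The trade-off the sketch makes visible: a histogram stores only a few bucket counters, so quantiles derived from it are estimates, while this summary keeps every observation and can answer exactly at the cost of memory.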
  11. Traces
    • Record and visualize the complete path of a request through the system
    • Identify the specific points where a request slows down or fails
    Source: https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/traces/
            https://www.oreilly.com/library/view/distributed-systems-observability/9781492033431/ch04.html
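
As a rough illustration of how a trace records the path of a request, the sketch below times nested units of work ("spans") and remembers each span's parent. `Tracer` and `Span` here are hypothetical stand-ins written for this example, not the API of a real tracing library, and the span names are invented.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]  # name of the enclosing span, None for the root
    start: float
    end: float = 0.0

    @property
    def duration(self) -> float:
        return self.end - self.start


class Tracer:
    """Collects finished spans for a single request (single-threaded toy)."""

    def __init__(self) -> None:
        self.spans: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        parent = self._stack[-1].name if self._stack else None
        current = Span(name=name, parent=parent, start=time.perf_counter())
        self._stack.append(current)
        try:
            yield current
        finally:
            # Close the span even if the wrapped code raised.
            current.end = time.perf_counter()
            self._stack.pop()
            self.spans.append(current)


if __name__ == "__main__":
    tracer = Tracer()
    with tracer.span("handle_request"):
        with tracer.span("query_database"):
            time.sleep(0.01)
    for s in tracer.spans:
        print(f"{s.name} (parent={s.parent}): {s.duration:.4f}s")
```

Inner spans finish first, so they appear before their parents in `tracer.spans`; a visualizer would use the parent links and timestamps to draw the familiar waterfall view.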
  12. Gary Hu
    Education
    • M.S. in Computer Science @ NTU
    • B.B.A. in Information Management @ NTU
    Experience
    • 2023 - 2024 | TECH FRESH @ LINE Taiwan
    • 2022 - 2023 | Software Engineer Intern @ KKCompany
    • 2022 | Research Assistant @ Academia Sinica
  13. Case 1: Mystery Behind the Blank Screen
    Scenario: Thousands of users are accessing our system simultaneously.
    Problem: Users are met with blank screens and error messages.
    Challenges: We need to investigate the error and identify its cause.
  14. Case 1: Mystery Behind the Blank Screen
    Steps
    1. Centralized Log Collection
    2. Log Search
    3. Identify Log Locations
    4. Impact Tracking
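
The Case 1 steps can be sketched over a batch of already-centralized JSON log lines: search for matching errors, group them by where they were logged, and count the users affected. The field names (`level`, `logger`, `message`, `user_id`) are assumptions made for this example; the deck does not show the team's log schema or search tooling.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List, Set


def search_errors(lines: Iterable[str], keyword: str) -> List[Dict]:
    """Log search: keep ERROR entries whose message contains `keyword`."""
    hits = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip unstructured lines in this sketch
        if not isinstance(entry, dict):
            continue
        if entry.get("level") == "ERROR" and keyword in entry.get("message", ""):
            hits.append(entry)
    return hits


def count_by_location(hits: Iterable[Dict]) -> Counter:
    """Identify log locations: how many hits each logger produced."""
    return Counter(h.get("logger", "unknown") for h in hits)


def affected_users(hits: Iterable[Dict]) -> Set[str]:
    """Impact tracking: distinct users who hit the error."""
    return {h["user_id"] for h in hits if "user_id" in h}
```

In practice these three steps are usually queries in a log platform rather than application code, but the logic is the same: filter, group by source, then measure who was affected.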
  15. Case 2: Peak Traffic Monitoring
    Scenario: Thousands of users are accessing our system simultaneously.
    Problem
    1. The server cannot handle all requests
    2. Timeouts and poor user experience
    Challenges: Identify bottlenecks and optimize server performance.
  16. Case 2: Peak Traffic Monitoring
    Steps
    1. Collect metrics from all services
    2. Visualize metrics to understand system behavior
    3. Monitor traffic volume and response time
  17. Case 2: Peak Traffic Monitoring
    Steps
    4. Collect CPU and memory usage
    5. Identify issues
    6. Address inappropriate configurations
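
For step 3 of Case 2, traffic volume and response time are typically derived from raw counter samples rather than read directly. The sketch below shows that arithmetic under simple assumptions: samples are `(timestamp_seconds, counter_value)` pairs, the counter never resets within the window, and the function names are invented for this example.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative counter value)
Sample = Tuple[float, float]


def per_second_rate(samples: List[Sample]) -> float:
    """Average increase per second between the first and last sample.

    Assumes the counter did not reset inside the window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


def mean_response_time(duration_sum_delta: float, request_count_delta: float) -> float:
    """Mean latency in a window: added duration divided by added requests."""
    if request_count_delta <= 0:
        raise ValueError("no requests in the window")
    return duration_sum_delta / request_count_delta


if __name__ == "__main__":
    request_samples = [(0.0, 100.0), (30.0, 400.0), (60.0, 1000.0)]
    print("requests/sec:", per_second_rate(request_samples))
    print("mean latency (s):", mean_response_time(45.0, 900.0))
```

Plotting these two derived series next to CPU and memory usage is what lets the later steps tie a latency spike to a resource limit or a misconfiguration.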
  18. Case 3: Mystery of the 5-Minute Workflow
    Scenario: Many workflows are executed daily to fetch user invoices.
    Problem: We discovered that numerous workflows take over 5 minutes to complete.
    Challenges: Identify the cause of the delays and optimize workflow performance.
  19. Case 3: Mystery of the 5-Minute Workflow
    Steps
    1. Collect traces
    2. Visualize traces
    3. Analyze each span
  20. Case 3: Mystery of the 5-Minute Workflow
    Findings
    • Fetching invoices from the government takes 27 seconds.
    • Storing the invoices in the database, however, takes 54 seconds.
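
The span analysis behind these findings amounts to ranking spans by duration. The sketch below uses the two durations from the slide (27 s and 54 s); the span names are hypothetical, and the share is computed only over these two spans because the deck does not list the rest of the trace.

```python
from typing import Dict, Tuple


def slowest_span(durations: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest span and its share of the listed spans' total time."""
    if not durations:
        raise ValueError("no spans to analyze")
    total = sum(durations.values())
    name = max(durations, key=lambda n: durations[n])
    return name, durations[name] / total


if __name__ == "__main__":
    # Durations in seconds, taken from the Findings slide.
    spans = {
        "fetch_invoices_from_government": 27.0,
        "store_invoices_in_database": 54.0,
    }
    name, share = slowest_span(spans)
    print(f"slowest: {name} ({share:.0%} of the listed spans)")
```

Storing takes twice as long as fetching (54 s versus 27 s), which points the optimization effort at the database write path rather than the external call.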