Service monitoring with Prometheus, Grafana, Zipkin, Elasticsearch, and Kibana at the LINE Shop team

Slides from a talk given at LINE Developer Meetup #52, held at LINE Fukuoka on 2019/4/17.

LINE Developers

April 17, 2019

Transcript

  1. Service monitoring with Prometheus/Grafana/Zipkin/Elasticsearch/Kibana at the LINE Shop team
     2019/04/17 LINE Developer Meetup in Fukuoka #52 (https://line.connpass.com/event/126705/)
     LINE Fukuoka Corp., Development 1 Dept, Manabu Matsuzaki
  2. About me
     • Manabu Matsuzaki (@matsumana)
     • LINE Fukuoka Corp, Development 1 Dept
     • SRE/Server side Engineer
     • https://github.com/matsumana
  3. Agenda
     • Introduction to the LINE Shop service
     • Introduction to Armeria
       • Integration with Prometheus
       • Integration with Zipkin
     • How do we monitor our services with …
       • Prometheus/Grafana
       • Zipkin
       • Elasticsearch/Kibana
  4. 1. Introduction to the LINE Shop service

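  5. What is LINE Shop?
     • A platform for selling and using content such as stickers, emoji, and themes in LINE services
     • Sticker Shop and Theme Shop inside the LINE app
     • LINE STORE on the web (https://store.line.me/)
     • About 4.9 million LINE sticker sets on sale (as of 2019/04)
     • An average of 433 million stickers sent per day (as of 2019/04)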
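  6. Recent news related to LINE Shop
     • [LINE Stickers] Make stickers by adding any text you like to popular characters! "Custom Stickers", your own personal stickers that can be created in as little as one minute, are now available
     • https://linecorp.com/ja/pr/news/ja/2019/2664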

  7. LINE Shop architecture
     • LINE DEVELOPER DAY 2018 poster session: "The strongest microservices I could come up with: with the LINE Shop case study"
     • https://twitter.com/LINE_DEV/status/1073068507707789313
  8. 2. Introduction to Armeria (our apps are built on top of Armeria)
  9. Armeria is an open-source asynchronous HTTP/2 RPC/REST client/server library built on top of Java 8, Netty, Thrift and gRPC. Its primary goal is to help engineers build high-performance asynchronous microservices that use HTTP/2 as a session layer protocol.
     https://line.github.io/armeria/
  10. Is there any open-source project using Armeria to refer?
      • https://github.com/line/armeria/issues/1709
      • OSS
        • OpenZipkin Server
        • Curiostack
      • Services
        • LINE
        • Slack
        • Kakao Pay
        • Infostellar
  11. Related features for monitoring
      • Collect metrics with Micrometer and Prometheus
      • Distributed tracing with Zipkin
  12. Metrics collected by Armeria

  13. Armeria server's metrics
      • Requests
        • total (success, fail)
        • latency for each API (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
        • request size
        • response size
      • Connection from client
      • Active requests
      • Pending requests
      • Logback (trace, debug, info, warn, error)
      • etc
  14. Armeria client's metrics
      • Requests
        • total (success, fail)
        • latency for each API (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
      • Circuit breaker
      • etc
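The latency quantiles listed on these slides can be illustrated with a small batch computation. The following is a minimal sketch using the nearest-rank method over an in-memory array; it is not how Armeria or Micrometer compute quantiles internally (they maintain streaming summaries), and the class name `LatencyQuantiles` and the sample values are made up for illustration.

```java
import java.util.Arrays;

/** Illustrative only: nearest-rank quantiles over a batch of latency samples. */
final class LatencyQuantiles {

    /** Returns the q-quantile (0 <= q <= 1) of the samples using the nearest-rank method. */
    static double quantile(double[] samplesMillis, double q) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        double[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        // Nearest rank: the smallest value with at least q * n samples at or below it.
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical latencies in milliseconds; one slow outlier dominates the tail.
        double[] latencies = {12, 15, 11, 240, 14, 13, 18, 16, 17, 19};
        for (double q : new double[] {0.5, 0.9, 1.0}) {
            System.out.println("quantile " + q + " = " + quantile(latencies, q) + " ms");
        }
    }
}
```

A single slow request barely moves the median but fully determines the 1.0 quantile, which is one reason to chart several quantiles side by side.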
  15. Zipkin integration

  16. Sample code from zipkin-armeria-example
      https://github.com/openzipkin-contrib/zipkin-armeria-example/blob/master/src/main/java/armeria/Frontend.java

          public final class Frontend {
              public static void main(String[] args) {
                  final Tracing tracing = TracingFactory.create("frontend");

                  // Outgoing calls to the backend are traced by the client decorator.
                  final HttpClient backendClient =
                          new HttpClientBuilder("http://localhost:9000/")
                                  .decorator(HttpTracingClient.newDecorator(tracing, "backend"))
                                  .build();

                  // Incoming requests are traced by the server decorator.
                  final Server server =
                          new ServerBuilder()
                                  .http(8081)
                                  .service("/", (ctx, res) -> backendClient.get("/api"))
                                  .decorator(HttpTracingService.newDecorator(tracing))
                                  .decorator(LoggingService.newDecorator())
                                  .build();

                  server.start().join();
              }
          }
  17. Sample code from zipkin-armeria-example
      https://github.com/openzipkin-contrib/zipkin-armeria-example/blob/master/src/main/java/armeria/Backend.java

          public final class Backend {
              public static void main(String[] args) {
                  final Tracing tracing = TracingFactory.create("backend");

                  // Incoming requests are traced by the server decorator.
                  final Server server =
                          new ServerBuilder()
                                  .http(9000)
                                  .service("/api", (ctx, res) -> HttpResponse.of(new Date().toString()))
                                  .decorator(HttpTracingService.newDecorator(tracing))
                                  .decorator(LoggingService.newDecorator())
                                  .build();

                  server.start().join();
              }
          }
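For the frontend and backend spans to end up in the same trace, the trace context has to travel with the HTTP request. Zipkin's B3 propagation format does this with the `X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId` and `X-B3-Sampled` headers. The sketch below illustrates the idea in plain Java; it is not the Brave or Armeria implementation, and `B3PropagationSketch`, `SpanContext` and the helper methods are hypothetical names. Reusing the trace ID as the root span ID is a simplification chosen for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: how a trace context can travel between services as B3 headers. */
final class B3PropagationSketch {

    /** Hypothetical holder for the identifiers that make up one span's context. */
    static final class SpanContext {
        final String traceId;
        final String spanId;
        final String parentSpanId; // null for the root span

        SpanContext(String traceId, String spanId, String parentSpanId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
        }
    }

    /** Generates a 64-bit identifier as 16 lower-case hex characters. */
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    /** Starts a new trace; in this sketch the root span reuses the trace ID as its span ID. */
    static SpanContext newRoot() {
        String id = newId();
        return new SpanContext(id, id, null);
    }

    /** A child span keeps the trace ID, gets a fresh span ID, and records its parent. */
    static SpanContext newChild(SpanContext parent) {
        return new SpanContext(parent.traceId, newId(), parent.spanId);
    }

    /** Client side: write the context into outgoing request headers. */
    static Map<String, String> inject(SpanContext ctx) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", ctx.traceId);
        headers.put("X-B3-SpanId", ctx.spanId);
        if (ctx.parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", ctx.parentSpanId);
        }
        headers.put("X-B3-Sampled", "1"); // this sketch always samples
        return headers;
    }

    /** Server side: read the context back from incoming request headers. */
    static SpanContext extract(Map<String, String> headers) {
        return new SpanContext(headers.get("X-B3-TraceId"),
                               headers.get("X-B3-SpanId"),
                               headers.get("X-B3-ParentSpanId"));
    }
}
```

In the sample code above, the tracing decorators take care of this kind of injection and extraction, so application code never touches the headers directly.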
  18. 3. How do we monitor our services with Prometheus/Grafana/Zipkin/Elasticsearch/Kibana

  19. • Monitor metrics of OS/middleware/applications
        • Prometheus + Grafana
      • Investigate which microservices are getting slow or failing
        • Zipkin
      • Check logs
        • Elasticsearch + Kibana
      • Reporting (preliminary figures)
        • Elasticsearch + Kibana
  20. Monitoring with Prometheus/Grafana

  21. Overview

  22. Visualize with Grafana (OS metrics by node_exporter)
      • Load average
      • CPU usage (system, user, I/O wait)
      • Context switches
      • Memory usage (memory, slab, swap)
      • Disk usage
      • Network traffic (inbound, outbound)
      • etc
  23. Visualize with Grafana (JVM metrics)
      • GC
        • Pause time (Young, Old)
        • Pause count (Young, Old)
      • Memory (heap)
        • used, committed, max (Eden, Survivor, Old)
      • Memory (non-heap)
        • used, committed, max (Metaspace, Code cache)
      • Thread
        • Thread count, Daemon thread count
      • ClassLoader
        • Loaded classes count, Unloaded classes count
      • etc
  24. Visualize with Grafana (Armeria metrics)
      • Requests
        • total (success, fail)
        • latency for each API (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
        • request size
        • response size
      • Connection from client
      • Active requests
      • Pending requests
      • Logback (trace, debug, info, warn, error)
      • etc
  25. Example chart (Requests/sec)

  26. Example chart (Errors/sec)
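Requests/sec and Errors/sec charts are typically derived from monotonically increasing counters by taking the increase between two samples and dividing by the elapsed time. Below is a minimal sketch of that arithmetic, assuming only two samples and a simple reset rule; PromQL's `rate()` works over a whole range of samples and also extrapolates, so treat this as an illustration rather than an equivalent. The class name `CounterRate` and the sample numbers are hypothetical.

```java
/** Illustrative only: per-second rate between two samples of an increasing counter. */
final class CounterRate {

    /**
     * @param previous       counter value at the earlier scrape
     * @param current        counter value at the later scrape
     * @param elapsedSeconds time between the two scrapes
     */
    static double perSecond(double previous, double current, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        // A counter that went down has been reset (for example by a process restart),
        // so everything counted since the reset is treated as the increase.
        double increase = current >= previous ? current - previous : current;
        return increase / elapsedSeconds;
    }

    public static void main(String[] args) {
        System.out.println(perSecond(1_000, 1_600, 15)); // 600 requests over 15 s
        System.out.println(perSecond(1_600, 30, 15));    // counter reset between scrapes
    }
}
```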

  27. Example chart (latency)

  28. Visualize with Grafana (Cache metrics)
      • Local cache (Caffeine), Redis for cache storage
      • Operation count (get, put)
      • Hit rate
      • size
      • load latency (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
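The hit rate on this slide is the share of cache lookups served from the cache. A minimal sketch of the calculation from hit and miss counters follows; the class name `CacheHitRate` is hypothetical, and returning 1.0 when there have been no lookups mirrors the convention of Caffeine's `CacheStats.hitRate()`.

```java
/** Illustrative only: hit rate derived from hit and miss counters. */
final class CacheHitRate {

    /** Returns hits / (hits + misses), or 1.0 when there have been no lookups yet. */
    static double hitRate(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 1.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        System.out.println(hitRate(90, 10)); // 90 hits out of 100 lookups
    }
}
```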
  29. Visualize with Grafana (Redis client)
      • Command count
        • GET
        • HGET
        • HMGET
        • SET
        • HSET
        • HMSET
        • ZRANGE
        • etc
  30. Visualize with Grafana (MongoDB client)
      • Requests
        • total (success, fail)
        • latency for each request (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
  31. Visualize with Grafana (Elasticsearch client)
      • Requests
        • total (success, fail)
        • latency for each request (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
  32. Monitoring with Zipkin

  33. Overview

  34. Example graph

  35. Monitoring with Elasticsearch/Kibana

  36. Collected data
      • Application logs (via Logback)
      • User operation logs
        • Product search
          • Search keyword
        • Product browsing
          • product, type, country, gender, age
        • Product purchase event (preliminary figures)
          • product, type, country, gender, age
          • sales (compared to yesterday, last week, last month)
      • Elasticsearch slow logs (collected with Fluentd)
      • etc
  37. Overview (application logs & user operation logs)

  38. Overview (Elasticsearch slow logs)

  39. Thank you