Slide 1

Slide 1 text

Service Monitoring with Prometheus/Grafana/Zipkin/Elasticsearch/Kibana on the LINE Shop Team
2019/04/17 LINE Developer Meetup in Fukuoka #52 (https://line.connpass.com/event/126705/)
LINE Fukuoka Corp, Development 1 Dept, Manabu Matsuzaki

Slide 2

Slide 2 text

About me
Manabu Matsuzaki (@matsumana)
LINE Fukuoka Corp, Development 1 Dept
SRE/Server side Engineer
https://github.com/matsumana

Slide 3

Slide 3 text

Agenda
• Introduction to the LINE Shop service
• Introduction to Armeria
  • Integration with Prometheus
  • Integration with Zipkin
• How do we monitor our services with …
  • Prometheus/Grafana
  • Zipkin
  • Elasticsearch/Kibana

Slide 4

Slide 4 text

1. Introduction to the LINE Shop service

Slide 5

Slide 5 text

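What is LINE Shop?
• A platform for selling and using content such as stickers, emoji, and themes in the LINE service
• The sticker shop and theme shop inside the LINE app
• LINE STORE on the web (https://store.line.me/)
• About 4.9 million sets of LINE stickers on sale (as of 2019/04)
• An average of 433 million stickers sent per day (as of 2019/04)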

Slide 6

Slide 6 text

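Recent news related to LINE Shop
• [LINE Stickers] Make stickers by adding your own text to popular characters! "Custom Stickers", personal stickers that can be created in as little as one minute, are now available
• https://linecorp.com/ja/pr/news/ja/2019/2664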

Slide 7

Slide 7 text

LINE Shop architecture
• LINE DEVELOPER DAY 2018 poster session: "The Ultimate Microservices I Came Up With: With a Case Study from LINE Shop"
• https://twitter.com/LINE_DEV/status/1073068507707789313

Slide 8

Slide 8 text

2. Introduction to Armeria (Our apps built on top of Armeria)

Slide 9

Slide 9 text

Armeria is an open-source asynchronous HTTP/2 RPC/REST client/server library built on top of Java 8, Netty, Thrift and gRPC. Its primary goal is to help engineers build high-performance asynchronous microservices that use HTTP/2 as a session layer protocol. https://line.github.io/armeria/

Slide 10

Slide 10 text

Is there any open-source project using Armeria to refer?
• https://github.com/line/armeria/issues/1709
• OSS
  • OpenZipkin Server
  • Curiostack
• Services
  • LINE
  • Slack
  • Kakao Pay
  • Infostellar

Slide 11

Slide 11 text

Related features for monitoring
• Collect metrics with Micrometer and Prometheus
• Distributed tracing with Zipkin
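Prometheus collects metrics by scraping an HTTP endpoint that returns plain text in its exposition format. The sketch below hand-renders one counter only to show the shape of that text; in practice a library such as Micrometer generates it. The class name, metric name, and label are hypothetical and not taken from the talk.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrative only: renders one counter sample in the Prometheus text exposition format. */
final class ExpositionSketch {
    static String renderCounter(String name, String help, Map<String, String> labels, double value) {
        // Sort labels so the output is deterministic.
        // Assumption: label values need no escaping (no quotes, backslashes, or newlines).
        String labelText = new TreeMap<>(labels).entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        String series = labelText.isEmpty() ? name : name + "{" + labelText + "}";
        return "# HELP " + name + " " + help + "\n"
                + "# TYPE " + name + " counter\n"
                + series + " " + value + "\n";
    }
}
```

For example, `ExpositionSketch.renderCounter("http_requests_total", "Total requests.", Map.of("result", "success"), 42)` yields a HELP line, a TYPE line, and one sample line carrying the `result="success"` label.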

Slide 12

Slide 12 text

Metrics collected by Armeria

Slide 13

Slide 13 text

Armeria server's metrics
• Requests
  • total (success, fail)
  • latency for each API (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
  • request size
  • response size
• Connection from client
• Active requests
• Pending requests
• Logback (trace, debug, info, warn, error)
• etc
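To make the quantile labels concrete: a latency quantile of 0.99 is the value that 99% of observed requests did not exceed. The sketch below computes an exact nearest-rank quantile over an in-memory array. It is an assumption-laden illustration; metrics libraries typically estimate quantiles from histograms or streaming summaries instead of sorting every sample, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: exact nearest-rank quantile over a small set of latency samples. */
final class QuantileSketch {
    static double quantile(double[] samples, double q) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // 1-based rank of the smallest sample that covers a fraction q of the data.
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With ten samples where nine are under 100 ms and one is 1000 ms, the 0.5 quantile stays low while 0.99 and 1.0 both report the outlier, which is why the upper quantiles are charted separately from the median.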

Slide 14

Slide 14 text

Armeria client's metrics
• Requests
  • total (success, fail)
  • latency for each API (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
• Circuit breaker
• etc

Slide 15

Slide 15 text

Zipkin integration

Slide 16

Slide 16 text

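Sample code from zipkin-armeria-example
https://github.com/openzipkin-contrib/zipkin-armeria-example/blob/master/src/main/java/armeria/Frontend.java

import brave.Tracing;
import com.linecorp.armeria.client.HttpClient;
import com.linecorp.armeria.client.HttpClientBuilder;
import com.linecorp.armeria.client.tracing.HttpTracingClient;
import com.linecorp.armeria.server.Server;
import com.linecorp.armeria.server.ServerBuilder;
import com.linecorp.armeria.server.logging.LoggingService;
import com.linecorp.armeria.server.tracing.HttpTracingService;

// TracingFactory is a helper class defined in the same example repository.
public final class Frontend {
  public static void main(String[] args) {
    final Tracing tracing = TracingFactory.create("frontend");

    // Client for the backend; the decorator traces outgoing requests.
    final HttpClient backendClient =
        new HttpClientBuilder("http://localhost:9000/")
            .decorator(HttpTracingClient.newDecorator(tracing, "backend"))
            .build();

    // Server on port 8081 that forwards "/" to the backend's "/api".
    final Server server =
        new ServerBuilder()
            .http(8081)
            .service("/", (ctx, res) -> backendClient.get("/api"))
            .decorator(HttpTracingService.newDecorator(tracing))
            .decorator(LoggingService.newDecorator())
            .build();

    server.start().join();
  }
}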

Slide 17

Slide 17 text

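Sample code from zipkin-armeria-example
https://github.com/openzipkin-contrib/zipkin-armeria-example/blob/master/src/main/java/armeria/Backend.java

import java.util.Date;

import brave.Tracing;
import com.linecorp.armeria.common.HttpResponse;
import com.linecorp.armeria.server.Server;
import com.linecorp.armeria.server.ServerBuilder;
import com.linecorp.armeria.server.logging.LoggingService;
import com.linecorp.armeria.server.tracing.HttpTracingService;

// TracingFactory is a helper class defined in the same example repository.
public final class Backend {
  public static void main(String[] args) {
    final Tracing tracing = TracingFactory.create("backend");

    // Server on port 9000 that returns the current date from "/api".
    final Server server =
        new ServerBuilder()
            .http(9000)
            .service("/api", (ctx, res) -> HttpResponse.of(new Date().toString()))
            .decorator(HttpTracingService.newDecorator(tracing))
            .decorator(LoggingService.newDecorator())
            .build();

    server.start().join();
  }
}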
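The Frontend and Backend samples rely on the tracing decorators to carry trace context from one service to the next. Zipkin's B3 propagation does this with HTTP headers such as X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, and X-B3-Sampled. The sketch below is a simplified illustration of how headers for an outgoing call could be derived from an incoming request. It is not the Brave or Armeria implementation; the class and method names are hypothetical, and header lookup is assumed to be case-sensitive for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: derives B3 headers for a downstream call from an incoming request. */
final class B3PropagationSketch {
    /** Returns a random 64-bit ID as 16 lower-case hex characters. */
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    static Map<String, String> childHeaders(Map<String, String> incoming) {
        Map<String, String> out = new LinkedHashMap<>();
        // Reuse the caller's trace ID so every hop lands in the same trace;
        // start a new trace when the request arrived without one.
        out.put("X-B3-TraceId", incoming.getOrDefault("X-B3-TraceId", newId()));
        // Each outgoing call gets its own span ID.
        out.put("X-B3-SpanId", newId());
        // The span that handled the incoming request becomes the parent.
        String parent = incoming.get("X-B3-SpanId");
        if (parent != null) {
            out.put("X-B3-ParentSpanId", parent);
        }
        // Pass the sampling decision along unchanged.
        String sampled = incoming.get("X-B3-Sampled");
        if (sampled != null) {
            out.put("X-B3-Sampled", sampled);
        }
        return out;
    }
}
```

Because the trace ID is copied and only the span ID changes, Zipkin can later assemble the spans reported by both services into one tree.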

Slide 18

Slide 18 text

3. How do we monitor our services with Prometheus/Grafana/Zipkin/Elasticsearch/Kibana

Slide 19

Slide 19 text

• Monitor OS, middleware, and application metrics
  • Prometheus + Grafana
• Investigate which microservices are slow or failing
  • Zipkin
• Check logs
  • Elasticsearch + Kibana
• Reporting (preliminary figures)
  • Elasticsearch + Kibana

Slide 20

Slide 20 text

Monitoring with Prometheus/Grafana

Slide 21

Slide 21 text

Overview

Slide 22

Slide 22 text

Visualize with Grafana (OS metrics by node_exporter)
• Load average
• CPU usage (system, user, I/O wait)
• Context switches
• Memory usage (memory, slab, swap)
• Disk usage
• Network traffic (inbound, outbound)
• etc

Slide 23

Slide 23 text

Visualize with Grafana (JVM metrics)
• GC
  • Pause time (Young, Old)
  • Pause count (Young, Old)
• Memory (heap)
  • used, committed, max (Eden, Survivor, Old)
• Memory (non-heap)
  • used, committed, max (Metaspace, Code cache)
• Thread
  • Thread count, Daemon thread count
• ClassLoader
  • Loaded classes count, Unloaded classes count
• etc

Slide 24

Slide 24 text

Visualize with Grafana (Armeria metrics)
• Requests
  • total (success, fail)
  • latency for each API (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
  • request size
  • response size
• Connection from client
• Active requests
• Pending requests
• Logback (trace, debug, info, warn, error)
• etc

Slide 25

Slide 25 text

Example chart (Requests/sec)

Slide 26

Slide 26 text

Example chart (Errors/sec)
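Charts such as Requests/sec and Errors/sec are derived from counters that only ever increase, so the plotted value is the increase between samples divided by the elapsed time. The sketch below shows that arithmetic for two samples, including the case where the counter restarts from zero. It is a simplification under stated assumptions: the Prometheus rate() function works over all samples in a time window and also extrapolates to the window edges, and the class name here is hypothetical.

```java
/** Illustrative only: per-second rate between two samples of an increasing counter. */
final class RateSketch {
    static double perSecond(double earlierValue, long earlierMillis,
                            double laterValue, long laterMillis) {
        if (laterMillis <= earlierMillis) {
            throw new IllegalArgumentException("samples must be in time order");
        }
        // A lower later value means the process restarted and the counter began
        // again from zero, so the whole later value counts as the increase.
        double increase = laterValue >= earlierValue ? laterValue - earlierValue : laterValue;
        double elapsedSeconds = (laterMillis - earlierMillis) / 1000.0;
        return increase / elapsedSeconds;
    }
}
```

For a counter that moves from 1000 to 1600 over 60 seconds, `perSecond(1000, 0, 1600, 60_000)` gives 10 requests per second.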

Slide 27

Slide 27 text

Example chart (latency)

Slide 28

Slide 28 text

Visualize with Grafana (Cache metrics)
• Local cache (Caffeine), Redis for cache storage
  • Operation count (get, put)
  • Hit rate
  • size
  • load latency (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)
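The hit rate on this dashboard is the share of cache lookups answered without loading from the backing store. A minimal sketch of the calculation from hit and miss counters follows; the class name is hypothetical, and returning 1.0 for an unused cache mirrors the convention of Caffeine's CacheStats.hitRate().

```java
/** Illustrative only: cache hit rate from hit and miss counters. */
final class CacheHitRateSketch {
    static double hitRate(long hits, long misses) {
        long total = hits + misses;
        // With no lookups yet there is nothing to divide by; report 1.0.
        return total == 0 ? 1.0 : (double) hits / total;
    }
}
```

A falling hit rate alongside rising load latency usually points at the backing store and not at the cache itself.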

Slide 29

Slide 29 text

Visualize with Grafana (Redis client)
• Command count
  • GET
  • HGET
  • HMGET
  • SET
  • HSET
  • HMSET
  • ZRANGE
  • etc

Slide 30

Slide 30 text

Visualize with Grafana (MongoDB client)
• Requests
  • total (success, fail)
  • latency for each request (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)

Slide 31

Slide 31 text

Visualize with Grafana (Elasticsearch client)
• Requests
  • total (success, fail)
  • latency for each request (quantile: 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.999, 1.0)

Slide 32

Slide 32 text

Monitoring with Zipkin

Slide 33

Slide 33 text

Overview

Slide 34

Slide 34 text

Example graph

Slide 35

Slide 35 text

Monitoring with Elasticsearch/Kibana

Slide 36

Slide 36 text

Collected data
• Application logs (via Logback)
• User operation logs
  • Product search
    • Search keyword
  • Product browsing
    • product, type, country, gender, age
  • Product purchase event (preliminary figures)
    • product, type, country, gender, age
    • sales (compared to yesterday, last week, last month)
• Elasticsearch slow logs (collected with Fluentd)
• etc

Slide 37

Slide 37 text

Overview (application logs & user operation logs)

Slide 38

Slide 38 text

Overview (Elasticsearch slow logs)

Slide 39

Slide 39 text

Thank you