
Two Troubleshooting Cases on LINE's B2B Platform

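LINE has a wide range of B2B products, such as LINE Official Account and advertising across the many services built around the LINE app, together with the platforms that support them. These platforms integrate with many internal and external systems and handle large volumes of traffic and data.

This talk presents, in a lighthearted way, several problems that came up while operating these B2B platforms and how they were troubleshot.

Some of these problems probably occur only in our environment, but I hope the troubleshooting process itself is a useful reference.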

Speaker: Ryosuke Hasebe (長谷部 良輔)

These slides were presented at JJUG CCC 2022 Spring.
https://fortee.jp/jjug-ccc-2022-spring/proposal/730d46e2-a295-45c2-abfa-bb7bf13ad7c9

LINE Developers

June 19, 2022

Transcript

  1. Speaker: Ryosuke Hasebe
    LINE (2013/9). OA Dev 2 / OA SRE.
    LINE Login (OAuth 2/OIDC), LINE Profile+, LINE Notify, and other LINE services.
    Java/Kotlin, (Reactive Streams / Kotlin Coroutines), K8s.
    GitHub: be-hase, Twitter: be_hasee
  2. Agenda
    1. About LINE's B2B Platform
    2. Case 1: Slow latency issue after updating to Lettuce v6
    3. Case 2: Direct buffer OOME issue due to bad usage of Spring WebClient
  3. LINE B2B: CRM, API, BOT, LINE Talk Head View
  4. LINE: CPU 4,500 core, (request/sec) 10, Memory 14 TB
    ※ As of September 2021
  5. Tech stack
    Kotlin/Java, Spring Boot, Armeria, gRPC/Thrift
    MySQL, HBase, Redis, Kafka, Elasticsearch, Centraldogma, nginx, fluentd
    Verda (OpenStack based Private Cloud) VM/PM, Kubernetes
    Prometheus, Grafana, IU, Kibana, IMON
    GHE, Jenkins, Drone, Circle CI, Ansible, ArgoCD
  6. Lettuce v4.5.0 -> v6.0.0: 99.9 percentile latency (1 sec)
    Lettuce = Redis client library for Java (spring-data-redis)
    Kafka Consumer; 96 Redis Cluster; 1 3K commands/sec; HGETALL
  7. Workaround: v5 (v5.3.5)
    v4 -> v6; v5.3.5 -> v6.0.0; v6
  8. Lettuce 5.3 EOL 😨
    > 5.3.x is EOL (end-of-life) as of June 2021.
    https://github.com/lettuce-io/lettuce-core/wiki/Lettuce-Versions
    EOL; Spring4Shell; Lettuce v6.1.6
  9. client-side (Lettuce) / server-side (Redis)
    (Lettuce version: client-side) Redis server-side latency: SLOWLOG
    client-side (= Java application = Lettuce)
  10. Stop The World (STW)
    GC STW: 99.9 percentile latency; Lettuce; GC (STW) or
    GC time: Micrometer. GC: HeapDump, Eclipse Memory Analyzer.
    GC, STW: Unified JVM Logging safepoint log
        [2022-03-14T17:30:16.483+0900][192775.478][info ][safepoint] Total time for which application threads were stopped: 0.xxx seconds, Stopping threads took: 0.xxx seconds
    safepoint log: https://krzysztofslusarski.github.io/2020/11/13/stw.html
  11. JVM, Redis; v5.3.5, v6.0.0; Try & Error; Local
  12. Kafka Consumer, Consumer Group, Lettuce 6.1.16
    Lettuce v5.3.15 / Lettuce v6.1.16
  13. RESP3
    Lettuce v6 supports RESP3, which arrived with Redis v6.
    https://github.com/antirez/RESP3/blob/master/spec.md
    Against a Redis server without RESP3, Lettuce v6 falls back to RESP2.
        ClusterClientOptions.builder()
            // Use RESP2 only
            .protocolVersion(ProtocolVersion.RESP2)
            .build()
  14. Lettuce Cluster Topology Refresh (CTR)
    Redis Cluster: key (slot), client-side
    Lettuce Cluster Topology Refresh (CTR): CLUSTER NODES, once every 60 seconds
    MOVED redirection
    https://github.com/lettuce-io/lettuce-core/issues/339
  15. Lettuce Cluster Topology Refresh (CTR)
    Redis Cluster: key (slot), client-side
    Lettuce Cluster Topology Refresh (CTR): CLUSTER NODES, once every 60 seconds
    MOVED redirection
    https://github.com/lettuce-io/lettuce-core/issues/339
        e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 127.0.0.1:30001@31001 master - 0 0 1 connected 0-5460
        67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 127.0.0.1:30002@31002 master - 0 1426238316232 2 connected 5461-10922
        292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 127.0.0.1:30003@31003 master - 0 1426238318243 3 connected 10923-16383
        07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1:30004@31004 slave e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 0 1426238317239 4 connected
        6ec23923021cf3ffec47632106199cb7f496ce01 127.0.0.1:30005@31005 slave 67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 0 1426238316232 5 connected
        824fe116063bc5fcf9f4ffd895bc17aee7731ac3 127.0.0.1:30006@31006 slave 292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 0 1426238317741 6 connected
  16. Lettuce Cluster Topology Refresh (CTR)
    Redis Cluster: key (slot), client-side
    Lettuce Cluster Topology Refresh (CTR): CLUSTER NODES, once every 60 seconds
    MOVED redirection
    https://github.com/lettuce-io/lettuce-core/issues/339
  17. flamegraph
    Once every 60 seconds (CTR): flamegraph
    node count n: O(n^2); 96 nodes: 1 sec
    Lettuce v6.0.0: event loop (← EpollEventLoop.run)
  18. Event-loop: CPU x 2
    1 event loop, 1sec (CTR): Redis Command I/O
    (latency 99.9 percentile 1sec)
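The stall described above, where one long task on an event-loop thread delays every I/O task queued behind it, can be reproduced with plain JDK classes. This is a minimal sketch under assumptions: a single-threaded `ExecutorService` stands in for a Netty event loop, `Thread.sleep` stands in for the expensive topology-refresh work, and `EventLoopStallDemo` and its methods are hypothetical names, not Lettuce code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class EventLoopStallDemo {

    /**
     * Queues one blocking task and then one trivial task on a single-threaded
     * executor. Returns the time (ms) between submitting the trivial task and
     * the moment it actually ran.
     */
    static long delayBehindBlockingTaskMillis(long blockMillis) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for the heavy computation occupying the event loop.
            loop.execute(() -> sleepQuietly(blockMillis));

            long submittedAt = System.nanoTime();
            // Stand-in for a cheap command that only needs the loop for an instant.
            Callable<Long> quickTask = System::nanoTime;
            long ranAt = loop.submit(quickTask).get(blockMillis + 5_000, TimeUnit.MILLISECONDS);
            return TimeUnit.NANOSECONDS.toMillis(ranAt - submittedAt);
        } catch (Exception e) {
            throw new IllegalStateException("demo task failed", e);
        } finally {
            loop.shutdownNow();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long waited = delayBehindBlockingTaskMillis(1_000);
        System.out.println("quick task waited about " + waited + " ms");
    }
}
```

The trivial task itself takes almost no time, yet its observed latency is roughly the length of the blocking task, which is the shape of a tail-latency spike that appears only when the heavy task happens to run.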
  19. Lettuce Issue & PR
    Issue: https://github.com/lettuce-io/lettuce-core/issues/2045
    PR: https://github.com/lettuce-io/lettuce-core/pull/2048
    6.1.8: https://github.com/lettuce-io/lettuce-core/releases/tag/6.1.8.RELEASE
  20. Other Solution
    CTR: dynamic refresh source
    > CLUSTER NODES: dynamic refresh source, Initial Seed Nodes
    frequent; Cluster; Initial Seed Nodes down; CTR; dynamic refresh source
        import io.lettuce.core.RedisURI;
        import io.lettuce.core.cluster.RedisClusterClient;
        import java.util.Arrays;

        // Refresh sources can be limited to the nodes specified here (Initial Seed Nodes)
        RedisURI node1 = RedisURI.create("node1", 6379);
        RedisURI node2 = RedisURI.create("node2", 6379);
        RedisClusterClient clusterClient = RedisClusterClient.create(Arrays.asList(node1, node2));
  21. Out of Memory Error (OOME)
    Frequent OOME; CSV
    spec (20 core, 64 GB Mem, -Xmx24g)
  22. OutOfMemoryError: Direct buffer memory
    Direct buffer memory (native <-> )
        Caused by: java.lang.OutOfMemoryError: Direct buffer memory
            at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
            at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
            at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
            at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:645)
            at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:621)
    ※ From Java 13 onward, the error message is clearer and more helpful.
    https://bugs.openjdk.java.net/browse/JDK-8048192
  23. Spring WebClient
    Lettuce; Spring WebClient; frequent; CSV; Mono<byte[]>; Reactor; Flux<DataBuffer>
    Spring WebClient: Spring Boot 2.1, 256 KB
        webClient.get()
            .uri(uri)
            .retrieve()
            .bodyToMono(byte[].class) // suspicious
            .block();

        WebClient.builder()
            .codecs(configurer -> configurer.defaultCodecs()
                .maxInMemorySize(-1)) // had been set to unlimited
            .build();
  24. OOME
    -XX:MaxDirectMemorySize=1g
    seq 4 | xargs -P 4 -I{} curl localhost:8080/mono
    4 parallel requests, 4 * 300MB: OOME
        Caused by: java.lang.OutOfMemoryError: Direct buffer memory
            at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
            at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
            at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
            at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:648)
            at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:623)
            at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:202)
            at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:186)
            at io.netty.buffer.PoolArena.allocate(PoolArena.java:136)
            at io.netty.buffer.PoolArena.allocate(PoolArena.java:126)
            at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:394)
            at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
            at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
  25. OOME (jcmd PID GC.run)
    watch curl localhost:8080/mem
        name=direct, count=76, memoryUsed=1008MB, totalCapacity=1008MB
        name=direct, count=76, memoryUsed=1008MB, totalCapacity=1008MB
        …
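Output in the shape shown above (`name=direct, count=…, memoryUsed=…MB, totalCapacity=…MB`) can be produced from the JDK's `BufferPoolMXBean`. The slides do not show how the `/mem` endpoint was implemented, so the following is only a sketch of one way to read the same numbers; `DirectBufferStats` and its methods are hypothetical names. Direct memory that a library obtains without going through `ByteBuffer.allocateDirect` would not be counted by this pool.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class DirectBufferStats {
    private static final long MB = 1024L * 1024L;

    /** Returns the JVM's "direct" buffer pool, which tracks ByteBuffer.allocateDirect usage. */
    static BufferPoolMXBean directPool() {
        return ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class).stream()
                .filter(pool -> "direct".equals(pool.getName()))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no direct buffer pool found"));
    }

    /** Formats the pool in the same style as the output on the slide. */
    static String describe(BufferPoolMXBean pool) {
        return String.format("name=%s, count=%d, memoryUsed=%dMB, totalCapacity=%dMB",
                pool.getName(), pool.getCount(),
                pool.getMemoryUsed() / MB, pool.getTotalCapacity() / MB);
    }

    public static void main(String[] args) {
        System.out.println(describe(directPool()));
        // Allocate 16 MB off-heap and observe the pool grow.
        ByteBuffer buffer = ByteBuffer.allocateDirect((int) (16 * MB));
        System.out.println(describe(directPool()));
        buffer.put(0, (byte) 1); // keep the buffer reachable until after the second reading
    }
}
```

Polling a reading like this while replaying the requests shows whether direct memory is released after a GC or stays pinned near the `-XX:MaxDirectMemorySize` limit.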
  26. Flux<DataBuffer>
    seq 4 | xargs -P 4 -I{} curl localhost:8080/flux
    curl localhost:8080/mem
        name=direct, count=17, memoryUsed=80MB, totalCapacity=80MB
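The difference between collecting a whole body (`Mono<byte[]>`) and consuming it piece by piece (`Flux<DataBuffer>`) comes down to how many bytes are held at once. The sketch below is a plain `java.io` analogy, not WebClient or Reactor code: it processes a CSV body through a small fixed-size buffer, so memory use stays bounded by the buffer size regardless of the body size. `ChunkedCsvReader` and `countRows` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ChunkedCsvReader {

    /**
     * Counts newline-terminated rows while holding at most bufferSize bytes
     * of the body in memory at any time.
     */
    static long countRows(InputStream body, int bufferSize) {
        byte[] chunk = new byte[bufferSize];
        long rows = 0;
        try {
            int read;
            while ((read = body.read(chunk)) != -1) {
                // Each chunk is processed and then overwritten by the next read.
                for (int i = 0; i < read; i++) {
                    if (chunk[i] == '\n') {
                        rows++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[] csv = "id,name\n1,alice\n2,bob\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countRows(new ByteArrayInputStream(csv), 8)); // prints 3
    }
}
```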
  27. WebClient (reactor-netty): I/O threads = CPU cores = 20
    Frequent 2GB CSV: 20 * 2GB = 40GB
    24GB = -Xmx or -XX:MaxDirectMemorySize
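The arithmetic on this slide can be written as a small worst-case check. This sketch assumes the reading that up to 20 downloads of 2GB each can be buffered at the same time and that the limit is 24GB; `DownloadMemoryEstimate` and its methods are hypothetical names used only for illustration.

```java
final class DownloadMemoryEstimate {
    static final long GB = 1024L * 1024L * 1024L;

    /** Worst case: every concurrent download buffers its whole body at once. */
    static long worstCaseBytes(int concurrentDownloads, long bytesPerDownload) {
        return Math.multiplyExact((long) concurrentDownloads, bytesPerDownload);
    }

    /** True when the worst case still fits under the given memory limit. */
    static boolean fitsWithin(long limitBytes, int concurrentDownloads, long bytesPerDownload) {
        return worstCaseBytes(concurrentDownloads, bytesPerDownload) <= limitBytes;
    }

    public static void main(String[] args) {
        long worstCase = worstCaseBytes(20, 2 * GB);
        System.out.println("worst case: " + worstCase / GB + " GB");
        System.out.println("fits in 24 GB: " + fitsWithin(24 * GB, 20, 2 * GB));
    }
}
```

With these numbers the worst case is 40GB against a 24GB limit, so fully buffering each body cannot be made safe by tuning the limit alone.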
  28. !!

  29. We're Hiring!!
    LINE B2B: SRE (https://linecorp.com/ja/career/position/3112) / (https://linecorp.com/ja/career/position/2316)