Getting Started Performance Troubleshoot

伊藤ちひろ Chihiro Ito パフォーマンスのトラブルシュート入門 Getting started Performance Troubleshooting 1

Source: Insert source data here Insert source data here Insert
source data here OPTIONAL SECTION MARKER OR TITLE 伊藤ちひろ (Chihiro Ito) OpenJDK Committer (JFR, SA) Application Services Solution Architect Twitter : @chiroito 自己紹介 Proﬁle 2

本資料の対象者 Target of this presentation • 性能問題が起きてる人 People having performance
problems • 性能問題が起きてるけど原因が分からない人 People who are having performance problems but don't know why • 性能問題が起きてる気がするので勘で対応している人 People who feel like having a performance problem and is dealing with it on a hunch • 性能問題が起きてるか分からない人 People who don't know if they're having performance problems • 3

性能問題とは何か？ What is performance problem? 4

よくある疑問 common questions アプリが遅い気がする？ I feel like the app is
slow? 性能が目標値に達してない･･･ My app isn't meeting its performance targets. 5

よくある対処方法 Common ways to deal 良く分からないけどこれかな？ Maybe it's this cause.
いくつか試したら改善した気がする I tried a few things and it improved. 6

あるべき対処方法 The ideal way to deal with a problem アプリからDBへの接続が枯渇した
Lack of connection from the app to the DB DBはまだ接続を受け入れられるなアプリからの接続を増やそう。 DB can still accept connections. Get more connections from the app. 7

正しいパフォーマンスチューニングの流れ The Right Performance Tuning Flow 1. アプリが何をしているか理解 Understanding what
the app is doing 2. アプリの事実を計測 Measuring the facts of the app 3. 計測結果を分析 Analyze the measurement results 4. 根拠のある改善策を適用 Apply evidence-based remedies 8

アプリが何をしてるのか Understanding what the app is doing 9

アプリが何をしているか - Webアプリの場合 Understanding what the app is doing -
For web apps リクエスト Request 10 レスポンス Response アプリケーション Application 異常時はこの期間が延びる In abnormal times, this period is extended. 処理 Processing

アプリが何をしているか - Webアプリの詳細 Understanding what the app is doing -
More about the web app 11 アプリケーション Application データベース Database API API 異常時はどこかの期間が延びる In abnormal times, some period of time is extended.

注意：アプリの下も把握しなければならない Note: You have to ﬁgure out what's going on
underneath the app as well • アプリケーションを動かす基盤を含めて把握が必要 You need to understand the infra that the application works ◦ フレームワーク Framework ◦ ランタイム（アプリケーションサーバなど） Runtime (include Application Server, etc) ◦ オペレーティングシステム Operating System 12 F/W Runtime OS スタック Stack アプリ App

アプリの事実を計測 Measuring the facts of the app 13

アプリの事実を計測とは Measuring the facts of the app 14 ログ・メトリクス Log,
Metric 記録 Record アプリの振る舞い App Behavior

どのようにアプリの事実を計測するか How to measure the facts of your app •
分析に必要な全ての事実を計測 Measure all the facts needed for analysis ◦ 何の情報を計測するか What information is being measured ◦ いつ情報を計測するか When to measure the information ◦ どこへ情報を記録するか Where to record the information 15

何の情報を計測するか What information is being measured • 各処理の視点 Perspectives on
each process ◦ 処理のどこが遅くなっているかの事実が分かる Being able to see the facts about what part of the process is slowing down • システム全体の視点 Perspective in whole system ◦ なぜ遅くなっているのか原因が分かる Being able to see why it's slowing down 16

何を計測するか - 各処理の視点 What information is being measured - Perspectives
on each process A. HTTPをリクエストする（クライアント） Request HTTP (client) B. HTTPリクエストを受ける Receiving HTTP requests C. ネットワーク操作（接続、問い合せ、切断など） Network operations (connect, query, disconnect, etc.) D. スレッド操作（処理中、待機、スリープなど） Thread operations (in process, wait, sleep, etc.) E. ファイル操作（読み込み、書み込みなど） File operations (read, write, etc.) 17 A B B C D E リクエスト Request レスポンス Response

何を計測するか - システム全体の視点 What information is being measured - Perspective
in whole system • リソースの全体量、使用量、残量や待ち数を記録 Record the overall amount of resources, usage, remaining and number of waits • バックグラウンドで処理されるものを記録 Record what is processed in the background • OSやプロセスの情報を記録 Record OS and process information • 毎秒～10秒毎のように定期的に記録 Record it periodically, like every second or 10 seconds 18 残量 Remain 使用量 Usage リソースプール Resource Pool

何を計測するか：具体例 What information is being measured • HTTPやDBなどの処理期間 Processing time
for HTTP, DB, etc • HTTPなどの処理数とそれらの受付待ち数 Number of HTTP, etc. processes and their waiting lists • プールの使用量（コネクションやスレッドなど） Pool usage (connections, threads, etc.) • ランタイムのバックグラウンド処理の統計 Run-time background processing statistics • CPU使用率、IO時間、OSのメモリ使用量など CPU usage, IO time, OS memory usage, etc 19 各処理の視点 Perspectives on each process システムの視点 Perspective in whole system

いつ情報を計測するか When to measure the information • 問題が起きた期間 The period
of time when the problem occurred • 何も問題が起きていない期間 A period of time in which nothing has gone wrong 20 正常期間 Normal Period 異常期間 Abnormal Period 記録期間 Recording period 正常期間 Normal Period

どこへ情報を記録するか Where to record the information • 何度でも確認できるところ Where you
can check it again and again ◦ ファイル File ◦ ミドルウェア/サービス（Prometheus/Elasticsearch/Datadog） Middleware or Services (Prometheus/Elasticsearch/Datadog) • ダメ：すぐに消えてしまうようなところ（Shell） No Good : Where it's going to disappear quickly (Shell) 21

計測結果を分析 Analyze the measurement results 22

計測結果を分析 Analyze the measurement results • 問題の根拠を見つける Find the reasons
for the problem ◦ 計測結果を可視化 Visualization of measurement results ◦ 正常期間と異常期間を比較する Comparing normal and abnormal periods ◦ 情報が足りなかったら計測へ戻る Return to measurement when the acquired information is insuﬃcient 23

計測した情報を可視化 Visualization of measurement results • 各処理の視点 Perspectives on each
process 24 • システム全体の視点 Perspective in whole system DB処理 API呼出時間 time 開始 Start 終了 End 3:24:10 3:24:20 3:24:30 使用中 usage 割当待ち waiting lists リソース状況 Resource Status 3:24:23 3:24:34

正常期間と異常期間を比較 Comparing normal and abnormal periods • 正常期間 Normal period
25 • 異常期間 Abnormal Period DB API 呼出 3:24:10 3:24:20 3:24:30 使用中 usage 割当待ち waiting lists D B API 呼出 1:01:10 1:01:20 1:00:30 使用中 usage コネクションプール Connection Pool 処理 Processing 3:24:23 3:24:34 1:01:12 1:01:17

ここまでのまとめ Summary • アプリがしてることを内部まで正しく理解する Get what the app is doing
right down to the insides • 必要な情報は全て記録する record all the information we need • 可視化して正常期間と異常期間を比較する Visualize and compare normal and abnormal periods • 根拠のある改善を行う Making Evidence-Based Improvements 26

Java の場合 Case of Java 27

アプリが何をしてるのか：Javaの構成 Understanding what the app is doing - Java stack
• アプリケーション Application • アプリケーションサーバ（フレームワーク含む） Application Server (Inc. Framework, etc) • Java 仮想マシン Java Virtual Machine • オペレーティングシステム Operating System 28 Windows Linux Mac Java VM Quarkus Wildﬂy Spring App App App

何を計測するか：Java What information is being measured : Java stack •
HTTPやDBなどの処理期間 Processing time for HTTP, DB, etc • HTTPなどの処理数とそれらの受付待ち数 Number of HTTP, etc. processes and their waiting lists • プールの使用量（コネクションやスレッドなど） Pool usage (connections, threads, etc.) • ガベージコレクション、クラスロード、コンパイル Garbage Collection, Loading class, Compile, etc. • CPU使用率、IO時間、OSのメモリ使用量など CPU usage, IO time, OS memory usage, etc 29 各処理の視点 Perspectives on each process システムの視点 Perspective in whole system 大抵のFWが提供 Most frameworks provide

アプリの事実を計測：Java 標準のツール How do information be measured : Java standard
tools • Uniﬁed Logging • MBean / Java Management Extention (JMX) • Jcmd / Jstat • Eclipse MicroProﬁle Metrics / Health • JDK Flight Recorder 30

Unified Logging • JVMのイベントをログとして出力するJVMのログフレームワーク A JVM logging framework that logs
JVM events ◦ GC、クラスロード、JITコンパイルなどの実行時に出力 Output during GC, class loading, JIT compilation, etc when these are processing • 以前は取りたいログ毎にVM引数が存在 Previously, there were VM arguments for each log you wanted to get 31 java -Xlog:gc=trace:file=gctrace.txt:uptimemillis,pid:filecount=5,filesize=1M App [5ms][34164] MarkStackSize: 4096k MarkStackSizeMax: 16384k [21ms][34164] ConcGCThreads: 2 offset 16 [21ms][34164] ParallelGCThreads: 8 [22ms][34164] Initialize mark stack with 4096 chunks, maximum 16384

MBean / JMX • アプリ/FW/JVMがメトリクスの公開などの管理に使うフレームワーク A framework that apps/FW/JVMs use
to manage metrics exposure, etc ◦ JVMはGC、クラスロード、JITコンパイルなどの統計を提供 JVM provides statistics on GC, class loads, JIT compilation, etc. ◦ フレームワークはリソース量、処理待ち数などを提供 App framework provides such as amount of resources, processing waits, etc. 32 @MXBean public interface Metrics {} public class MetricsImpl implements Metrics {}

Eclipse MicroProﬁle Metrics/Health • FWがアプリケーションやFWのメトリクスを公開 FW exposes application and FW
metrics ◦ 生存確認、APIの呼び出し回数、処理時間など Health check, number of API calls, processing time, etc • XXX 33 curl http://xxxxx/metrics/application { "org.acme.microproﬁle.metrics.PrimeNumberChecker.checksTimer" : { "p99": 217.231273, "p999": 217.231273, "min": 0.58961, "mean": 112.15909190834819, (略) "stddev": 108.2721053982776, } }

JDK Flight Recorder • アプリ/FW/JVMのメトリクス・イベントを記録 Record app/FW/JVM metrics and events
◦ JVMの主要な分析に必要なほぼ全てのメトリクスを記録 Record nearly all the metrics needed for key JVM analyses ◦ アプリ独自のメトリクスも作れる。対応FWもある You can also create app-speciﬁc metrics. Some FW support JFR 34 Recording r = new Recording(); r.start(); r.stop(); r.dump() java -XX:StartFlightRecording App

アプリの状況を計測：昔ながらの収集例 How do information be measured : example of old-fashioned
collecting 35 ログ Log JMX JStat 収集アプリ Collector app アプリ App 記録 Log 記録 Log

アプリの状況を計測：今どきの収集例 How do information be measured : Example of the
latest collections 36 ログ Log MicroProﬁle Metrics/Health アプリ App JFR Prometheus

計測結果を分析：ツール Analyze the measurement results : Tools • スプレッドシート（Excelなど） Spread
sheet (ex. Excel) • Prometheus + Grafana • Elasticsearch + Kibana • JDK Mission Control 37

JDK Mission Control 38

Java のまとめ • Java でも上から下まで情報を記録する必要がある • Java には様々なツールが用意されていて幅広く記録できる • Java
には分析ツールも含まれている 39

ケーススタディ Case study 40

ケーススタディ Case Study 41 アプリケーション Application • 1回目のリクエストだけ遅い、2回目からは早くなる Only the
ﬁrst request is slow, the second one is faster. 遅い Slow 速い Fast 1回目のリクエスト First time 2回目のリクエスト First time

原因は？ What is the cause? 42 Just In Time コンパイルされてないコードは激遅なんだよ！
Just In Time Any code that isn't compiled is very slow! 起動直後はスレッドプールが準備できてないからだろ。 That's because the thread pool isn't ready right after startup. コネクションプールが用意できてないからに決まってるだろ It's because they don't have a connection pool available.

デモ環境 Demo Environment • DBアクセスと他のAPIへアクセスするアプリケーション Applications that access the DB
and other APIs • SpringBoot 2.3.4 on OpenJDK 14 SpringBoot 2.3.4 on OpenJDK 14 • 全てのイベントを記録するようにJFRを設定 Set JFR to record all events • HTTPとJDBCを記録するように独自イベントを作成 Create your own events to record HTTP and JDBC 43

デモで実行した処理の流れ The ﬂow of the process performed in the demo
44 JFR開始 Start JFR JFR終了 Stop JFR リクエスト Request 操作 Operations 1回目(1.2s) 2回目(16ms) 3回目(12ms) 4回目(13ms) 5回目(15ms)

各リクエストの時間 Processing time for each request 45 1回目のリクエスト First request
2回目以降のリクエスト Second and subsequent requests 処理期間は10-15ms Processing time is 10-15ms 処理期間は800msだが、400ms差がある The processing time is 800ms, but there is a 400ms diﬀerence

2～5回目のリクエストの処理の内訳 A breakdown of the processing of the second through
ﬁfth requests 46 遅延している処理は記録されていない Delayed processes are not recorded. DBの処理期間は短い DB processing time is short.

最初のリクエストの処理の内訳 A breakdown of the processing of the ﬁrst request
47 DBの処理期間は短い DB processing time is short. クラスロードが長期間を占める Class load takes up a long period of time.

他のスレッド The other thread. 48 スレッドは処理を受ける前にある Threads are there before
it gets processed. JITは処理され続けている JIT continues to be processed. 処理スレッド processing thread 処理前のFWの処理が400ms掛っている It is taking 400ms to process the FW before processing

結論 Conclusion • 処理開始時点にはスレッドは用意ができている At the start of the process,
the thread is ready to go • コネクションプールでは遅延していない No delays in connection pools • JIT されていなくても処理は速い Even if it's not JIT'd, it's still fast • 処理に必要なクラスのロードに時間が掛かっている It's taking a long time to load the classes needed for processing 49

CONFIDENTIAL Designator linkedin.com/company/red-hat youtube.com/user/RedHatVideos facebook.com/redhatinc twitter.com/RedHat Thank you 50

Getting Started Performance Troubleshoot

Getting Started Performance Troubleshoot

More Decks by Chihiro Ito

Other Decks in Technology

Featured

Transcript