in microservices Tomoyuki KOYAMA, Takayuki KUSHIDA Graduate School of Bionics, Computer and Media Sciences Tokyo University of Technology March 20, 2023 1 CIoT 2023 @Lisbon
and download for end users • Record access history into logs on receiving accesses • End users • Access the web service for technical reports download • Monitoring server • Send HTTP requests to Doktor* • Detect error response counts increase ⇒ Send an alert to system administrator • System administrator • Analyze root cause of error response by log search • Log server • Receive and store logs from microservices • Provide log search for system administrator 2 web service (Doktor) log server system administrators HTTP request HTTP response End users monitoring server send alert log search logs m onitor Error response counts increases Microservice Logs are utilized for root cause analysis. Error * https://github.com/cdsl-research/doktor-v2
search on Doktor • An application bug in “front” by developer • Conditional error occurance on “front” • An error occurs on search keyword “recall” • Error occurance depends on HTTP response body structure from “fulltext” • System administrator • Find log messages corresponding to root cause of error response • Take time to diagnose condition that error occurs ⇒ Not enough information for system administrator to obtain root cause from logs user A front fulltext search {"found": [ {"title": "xx"}, {"title": "yy"}, ]} log server HTTP response body strucutre is dynamically changed by HTTP requests. Keyword “recall” Keyword “IoT” web service (Doktor) {"found": [ ]} Microservice HTTP request HTTP response logs bugs system administrator log search Error user B [2023-01-28T03:22:39.272Z] "GET /fulltext/e0ac896 ..." 200 ... Example log
analysis or fault localization. • MicroRCA [Wu et al. 2020] • Proposed graph based anomaly propagation model to detect performance issues • Included “total sent bytes” as an metric for fault localization ⇒ ”total sent bytes” is different in HTTP requests even if URL endpoint is the same among requests. • CloudRanger [Wang et al. 2018] • Utilized response time and throughput as anomaly detection metrics ⇒This approach cannot detect HTTP response body structure changes. 4
item counts per depth from HTTP response body • Provide observability for HTTP response body structure changes (e.g. item counts) • Support system administrator in finding log messages corresponding to errors • RBSI is added to the end of each log message. 5 Goal: System administrator can obtain information for root cause analysis from log messages. [2022-11-27T08:07:21...] "POST /search ..." 200 … 1, 2 [2022-11-27T08:07:24...] "POST /search ..." 500 … 1, 0 [2022-11-27T08:07:25...] "POST /search ..." 200 … 1, 8 Example: RBSI in log messages { ”fulltexts”: [ {“paper_id”: “xxx”, ”page_n”: 1, …}, {“paper_id”: “yyy”, ”page_n”: 3, …} ] } RBSI HTTP response body in JSON format RBSI 1,2
service consists of microservices . • Web service adopts service mesh (Istio). • Proxy servers for service mesh • Intercept HTTP requests/responses • Generate logs when requests is received • Add RBSA to proxy servers • System administrator finds log messages related to errors. • RBSI represents HTTP response body structure changes. • RBSI allows system administrator to reduce the time to analyze root cause of errors through log messages. 7 web service app1 microservice 1 app2 microservice 2 [2022-11-27T08:07:21...] "POST /search ..." 200 … 1, 2 end user add generate log message RBSA proxy server RBSI system administrator watch
on web application by complex SQL queries • Change item orders in HTTP response body • RBSI doesn’t represent list item order changes • The same RBSI values: “Normal response” ↔ “Item order changes” • [Solution] Use hash algorithm (e.g. SHA-1) to represent the order of items • Use short hash value such as Git commit ID to reduce log message length {"found": [ {"publish_year": 2020}, {"publish_year": 2022}, {"publish_year": 2019}, ]} {"found": [ {"publish_year": 2022}, {"publish_year": 2020}, {"publish_year": 2019}, ]} ◆ Normal response ◆ Item order changes (fault) 1,3 1,3 HTTP response body (JSON format) 6aaad70 e179be7 SHA-1 (short) SHA-1 (short) RBSI RBSI
cause analysis from log messages. Issue • System administrator takes time to diagnose a condition that error occurs by logs. ⇒ Values in log messages are insufficient for diagnosis. Proposed method • Propose identifier "RBSI" for diagnostics with log messages • RBSI represent HTTP response body structure changes Experimental results • Average response time: proposed log format increases 7% than “default log format” • Log message length: proposed log format increases 0.02% than ”default log format” 12