Slide 1

Slide 1 text

Distributed Log Search based on Time Series Access and Service Relations
Tomoyuki Koyama, Takayuki Kushida
Tokyo University of Technology
AINA-2022 / April 15, 2022

Slide 2

Slide 2 text

Introduction
• A log is a message that records an event within software.
  • Supports analysis of past processing procedures
  • Helps administrators find errors in software
• Log Management
  • The total volume of logs grows as the number of log messages increases.
  • Millions of logs need to be retrieved in a short time.

Figure: software on a server writes logs, and the administrator searches them. Example logs:
March 27, 2022 10:24:13 Started app
March 27, 2022 10:25:40 Communicated node1
March 27, 2022 10:25:40 Stored file1
March 27, 2022 10:35:00 Stopped app

Slide 3

Slide 3 text

Introduction
• Distributed tracing is realized by logs.
  • It improves the traceability of transactions on microservices.
  • Each transaction is assigned a request identifier, the 'Request ID'.
  • Microservices store log messages with the identifier in log files.
  • The administrator finds the messages for root cause analysis.

EXAMPLE: Log message
[2021-10-22T00:27:09.383Z] "GET /paper/0416f705-df88-4d5f-82e8-095d4bd89e37/download HTTP/1.1" 200 - via_upstream - "-" 0 736954 134 133 "-" "Python/3.9 aiohttp/3.7.4.post0" "11c0553b-e1cd-9044-b4ce-49576dcbae6c" "paper-app.paper:4000" "10.42.2.65:8000" inbound|8000|| 127.0.0.6:37351 10.42.2.65:8000 10.42.2.64:44452 outbound_.4000_._.paper-app.paper.svc.cluster.local default

Figure: steps (1) to (4). Labeled elements: Users, HTTP Request (transaction), Request ID, Microservice A, Microservice B, Request ID=11c0553…, Log files, Log messages with Request ID, Administrator, Find log messages with Request ID.
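The lookup step "find log messages with Request ID" can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Request ID is the only UUID-shaped value enclosed directly in double quotes, as in the example message on this slide. The names `extract_request_id` and `find_by_request_id` are hypothetical, the first sample line is abridged from the example, and the second sample line is made up.

```python
import re

# Assumption: the Request ID is a UUID enclosed directly in double quotes.
# The UUID inside the URL path is not matched because it is not quoted on both sides.
REQUEST_ID_RE = re.compile(
    r'"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"'
)


def extract_request_id(line):
    """Return the first quoted UUID in the line, or None if there is none."""
    match = REQUEST_ID_RE.search(line)
    return match.group(1) if match else None


def find_by_request_id(lines, request_id):
    """Return the log lines whose extracted Request ID equals request_id."""
    return [line for line in lines if extract_request_id(line) == request_id]


SAMPLE_LOGS = [
    # Abridged from the example log message on this slide.
    '[2021-10-22T00:27:09.383Z] "GET /paper/0416f705-df88-4d5f-82e8-095d4bd89e37/download HTTP/1.1" 200 '
    '"Python/3.9 aiohttp/3.7.4.post0" "11c0553b-e1cd-9044-b4ce-49576dcbae6c" "paper-app.paper:4000"',
    # Hypothetical second message with a made-up Request ID.
    '[2021-10-22T00:27:10.000Z] "GET /health HTTP/1.1" 200 "curl/7.0" '
    '"aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee" "paper-app.paper:4000"',
]

if __name__ == "__main__":
    for hit in find_by_request_id(SAMPLE_LOGS, "11c0553b-e1cd-9044-b4ce-49576dcbae6c"):
        print(hit)
```

Scanning every line this way is the simple baseline; its cost grows with the number of log messages that have to be read.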

Slide 4

Slide 4 text

Introduction
• The Scatter-Gather pattern enables horizontal scalability.
  • A method for large-scale data processing
  • Scatter: The root node splits a task into several sub-tasks and scatters them to leaf nodes.
  • Gather: Each leaf node returns the result of its sub-task to the root node.
• Prerequisite
  • Applies the Scatter-Gather pattern to log search for distributed tracing

Figure: the admin sends a request to the root node, which scatters to and gathers from the leaf nodes.
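The Scatter and Gather steps can be illustrated with a small in-process sketch. It is only an analogy under stated assumptions: each "leaf node" is modeled as a Python list of log messages and the "root node" as a function that uses a thread pool, whereas the real system distributes sub-tasks over the network. The names `search_leaf` and `scatter_gather_search` and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_leaf(leaf_logs, request_id):
    """Sub-task for one leaf node: scan that leaf's logs for the Request ID."""
    return [msg for msg in leaf_logs if request_id in msg]


def scatter_gather_search(leaves, request_id):
    """Root node: scatter one sub-task per leaf, then gather and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(leaves))) as pool:
        # Scatter: submit one sub-task per leaf node.
        futures = [pool.submit(search_leaf, logs, request_id) for logs in leaves]
        # Gather: collect each leaf's partial result in submission order.
        results = []
        for future in futures:
            results.extend(future.result())
    return results


# Hypothetical data: three leaf nodes, each holding a few log messages.
LEAVES = [
    ["req-1 Started app", "req-2 Started app"],
    ["req-1 Stored file1"],
    ["req-3 Stopped app"],
]

if __name__ == "__main__":
    print(scatter_gather_search(LEAVES, "req-1"))
```

Because every leaf is queried on every search, the work per search grows with the total volume of logs, even though it is spread across nodes.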

Slide 5

Slide 5 text

Issue – Log Search
◆ A simple search method accesses all logs in parallel.
  • As the volume of logs accessed during a search increases, search response time increases.
  • As the total volume of logs increases, search response time increases.
◆ Reducing search response time is useful for troubleshooting.
  • A short response time reduces the total time needed to repair troubles.
Needs: A method for reducing the volume of logs accessed during a search

Figure: search response time and the corresponding volume of accessed logs.

Slide 6

Slide 6 text

Proposed Method
• Proposes a fast log search method for distributed tracing
  • Reduces the amount of log data accessed during a search
  • Focuses on time-series access patterns of log data and service relations

Figure: system overview with a Store Phase and a Search Phase. Labeled elements: Microservices (A, B, C), Logs, Clustering by datetime & microservice, Blocks, Service Discovery (Istio), Service Relations (A, B, C), Placement Rule, Moving blocks by placement rule, Leaf nodes, Root node, Block List, Admin, Search Query.
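"Clustering by datetime & microservice" could look roughly like the sketch below. The slides do not give the algorithm, so this is an assumption-laden illustration: records are grouped per microservice, sorted by timestamp, and cut into blocks holding a fixed number of messages (the experiments size blocks in MB instead). `Block`, `build_blocks`, and the assignment of the sample messages to services A and B are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Block:
    service: str      # microservice that wrote the messages
    begin: datetime   # timestamp of the first message in the block
    end: datetime     # timestamp of the last message in the block
    messages: list


def build_blocks(records, messages_per_block):
    """Group (service, timestamp, message) records into per-service, time-ordered blocks."""
    by_service = defaultdict(list)
    for service, ts, message in records:
        by_service[service].append((ts, message))

    blocks = []
    for service in sorted(by_service):
        # Sort by timestamp only, so ties never compare message text.
        entries = sorted(by_service[service], key=lambda entry: entry[0])
        for i in range(0, len(entries), messages_per_block):
            chunk = entries[i:i + messages_per_block]
            blocks.append(
                Block(service, chunk[0][0], chunk[-1][0], [m for _, m in chunk])
            )
    return blocks


# Hypothetical records; the service labels are assumptions for illustration.
RECORDS = [
    ("A", datetime(2022, 3, 27, 10, 24, 13), "Started app"),
    ("B", datetime(2022, 3, 27, 10, 25, 40), "Communicated node1"),
    ("A", datetime(2022, 3, 27, 10, 25, 40), "Stored file1"),
    ("A", datetime(2022, 3, 27, 10, 35, 0), "Stopped app"),
]

if __name__ == "__main__":
    for block in build_blocks(RECORDS, messages_per_block=2):
        print(block.service, block.begin, block.end, block.messages)
```

Recording each block's service and time range gives the root node a block list it can consult before deciding which blocks to read.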

Slide 7

Slide 7 text

Proposed Method – Reduction of search target logs
Service relations correspond to the chronological order among logs.
• Example: Service A sends a request to Service B ((1) Request, (2) Response).
• Service B writes a log message (Log: received) after Service A writes a log message (Log: sent). = Time-series access patterns
• Logs are clustered by datetime & microservice.
• Reduces the number of accessed blocks in the Search Phase

Figure: blocks of Microservice A and Microservice B along a datetime axis. Within the search target period, some blocks are accessed and others are left unaccessed.

Slide 8

Slide 8 text

Experimental Method
• Measures search response time from when the search request is sent until the search response is received
• Creates 14 VMs on a hypervisor
  • CPU: 1 [core], RAM: 1 [GB], Storage: 30 [GB]
  • 1 root node, 13 leaf nodes
• Stores production logs (paper search website) on leaf nodes
• Expands the volume of logs: 1,600 (collected from production microservices) → 8,065,000 [messages]

Search Query:
request_id=xxx bs=8, s_dt_begin="2021-12-13T10:21:50", s_dt_end="2023-02-13T10:21:50"

Slide 9

Slide 9 text

Experimental Results
• Compares search response time between the proposed method and the all-parallel method as the search target period expands
• The proposed method's response time is up to 52% shorter than that of the all-parallel method.

Figure: two plots of response time [sec] (0 to 3) against search target period [month] (3 to 24), one for the proposed method and one for the all-parallel method, each with series for Block Size = 4, 8, 16, and 32 [MB]. Lower is better.

Slide 10

Slide 10 text

Discussion
◆ Block size
• The proposed method uses a fixed block size.
  • The number of log messages per block is homogeneous.
• The file size that can be read and written simultaneously depends on the disk I/O performance of each leaf node.
  • Block size has to be calculated from disk I/O performance.
  • One option is the iostat command, which reports I/O statistics.

Slide 11

Slide 11 text

Thank you for listening