Distributed Log Search Based on Time Series Access and Service Relations

Tomoyuki Koyama, Takayuki Kushida
Tokyo University of Technology
AINA-2022 / April 15, 2022

Introduction
• A log is a message that records events within software.
• Logs support the analysis of past processing procedures.
• Logs help administrators find errors in software.
• Log management
  • The total volume of logs grows as more log messages are recorded.
  • Millions of log messages need to be retrieved in a short time.
Figure: software writes logs, and the administrator searches them. Example logs:
March 27, 2022 10:24:13 Started app
March 27, 2022 10:25:40 Communicated node1
March 27, 2022 10:25:40 Stored file1
March 27, 2022 10:35:00 Stopped app
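Each example log line pairs a timestamp with an event. The minimal Python sketch below parses such lines and selects the messages that fall inside a time range. It assumes the exact layout shown: an English month name, a 24-hour clock, and the message after the time. `LogRecord`, `parse_line` and `search` are hypothetical names. A linear scan like this one becomes slow once millions of messages have to be searched.

```python
from datetime import datetime
from typing import List, NamedTuple

class LogRecord(NamedTuple):
    timestamp: datetime
    message: str

def parse_line(line: str) -> LogRecord:
    # Assumed layout: "<Month> <day>, <year> <HH:MM:SS> <message>"
    parts = line.split(" ", 4)
    if len(parts) < 5:
        raise ValueError(f"unexpected log line: {line!r}")
    ts = datetime.strptime(" ".join(parts[:4]), "%B %d, %Y %H:%M:%S")
    return LogRecord(ts, parts[4])

def search(records: List[LogRecord], begin: datetime, end: datetime) -> List[str]:
    # Linear scan over every record: cost grows with the total volume of logs
    return [r.message for r in records if begin <= r.timestamp <= end]

LINES = [
    "March 27, 2022 10:24:13 Started app",
    "March 27, 2022 10:25:40 Communicated node1",
    "March 27, 2022 10:25:40 Stored file1",
    "March 27, 2022 10:35:00 Stopped app",
]
```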

Introduction
• Distributed tracing is realized by logs.
• It improves transaction traceability in microservices.
  • Each transaction is assigned a request identifier, "Request ID".
  • Microservices store log messages containing that identifier in log files.
  • The administrator finds these messages for root cause analysis.
EXAMPLE: Log message
[2021-10-22T00:27:09.383Z] "GET /paper/0416f705-df88-4d5f-82e8-095d4bd89e37/download HTTP/1.1" 200 - via_upstream - "-" 0 736954 134 133 "-" "Python/3.9 aiohttp/3.7.4.post0" "11c0553b-e1cd-9044-b4ce-49576dcbae6c" "paper-app.paper:4000" "10.42.2.65:8000" inbound|8000|| 127.0.0.6:37351 10.42.2.65:8000 10.42.2.64:44452 outbound_.4000_._.paper-app.paper.svc.cluster.local default
Figure: users send an HTTP request (transaction) carrying Request ID=11c0553… to Microservice A and Microservice B. Both services write log messages with that Request ID to their log files, and the administrator finds the log messages by Request ID.
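In the example access-log line, the request ID is one of the quoted fields. The minimal Python sketch below pulls that ID out of a line and filters log lines by it. It assumes the ID is the only quoted field whose whole content is a lowercase UUID. This holds for the example line, where the UUID in the request path is not quoted on its own, but other log formats may differ. `request_id` and `find_by_request_id` are hypothetical helpers.

```python
import re
from typing import Iterable, List, Optional

# Assumption: the request ID is the only quoted field that is exactly a UUID
REQUEST_ID_RE = re.compile(
    r'"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"'
)

def request_id(line: str) -> Optional[str]:
    """Return the request ID in an access-log line, or None if absent."""
    m = REQUEST_ID_RE.search(line)
    return m.group(1) if m else None

def find_by_request_id(lines: Iterable[str], rid: str) -> List[str]:
    """Keep only the lines that belong to one transaction."""
    return [line for line in lines if request_id(line) == rid]

SAMPLE = (
    '[2021-10-22T00:27:09.383Z] "GET /paper/0416f705-df88-4d5f-82e8-095d4bd89e37/download HTTP/1.1" '
    '200 - via_upstream - "-" 0 736954 134 133 "-" "Python/3.9 aiohttp/3.7.4.post0" '
    '"11c0553b-e1cd-9044-b4ce-49576dcbae6c" "paper-app.paper:4000" "10.42.2.65:8000" '
    'inbound|8000|| 127.0.0.6:37351 10.42.2.65:8000 10.42.2.64:44452 '
    'outbound_.4000_._.paper-app.paper.svc.cluster.local default'
)
```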

Introduction
• The Scatter-Gather pattern enables horizontal scalability.
  • A method for large-scale data processing
  • Scatter: the root node splits a task into several sub-tasks and scatters them to the leaf nodes.
  • Gather: the leaf nodes return the results of their sub-tasks to the root node.
• Prerequisite
  • This work applies the Scatter-Gather pattern to log search for distributed tracing.
Figure: the admin sends a search to the root node, which scatters sub-tasks to the leaf nodes and gathers their results.
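The Scatter-Gather steps can be sketched in Python with a thread pool standing in for the network. Each hypothetical "leaf" is an in-memory list of log messages. The root submits the same sub-task to every leaf (scatter) and merges the partial results (gather). Because every leaf is searched, this sketch corresponds to the simple all-parallel search, not to the proposed method. `leaf_search` and `root_search` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def leaf_search(messages: List[str], request_id: str) -> List[str]:
    """Sub-task executed on one leaf node: scan its local log messages."""
    return [m for m in messages if request_id in m]

def root_search(leaves: Dict[str, List[str]], request_id: str) -> List[str]:
    """Root node: scatter one sub-task per leaf, then gather the results."""
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(leaves))) as pool:
        # Scatter: submit the same search to every leaf in parallel
        futures = [pool.submit(leaf_search, msgs, request_id) for msgs in leaves.values()]
        # Gather: merge each leaf's partial result in submission order
        for future in futures:
            results.extend(future.result())
    return results

leaves = {
    "leaf1": ["req=11c0553b GET /paper", "req=ffff0000 GET /"],
    "leaf2": ["req=11c0553b 200 OK"],
}
print(root_search(leaves, "req=11c0553b"))  # ['req=11c0553b GET /paper', 'req=11c0553b 200 OK']
```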

Issue – Log Search
◆ A simple search method accesses all logs in parallel.
• As the volume of logs accessed per search increases, search response time increases.
• Because this method accesses all logs, the accessed volume equals the total volume, so search response time also increases as the total volume of logs increases.
Figure: search response time vs. volume of accessed logs.
◆ Reducing search response time is useful for troubleshooting: a short response time reduces the total time needed to repair problems.
Needs: a method that reduces the volume of logs accessed per search.

Proposed Method
• Proposes a fast log search method for distributed tracing.
• Reduces the amount of log data accessed per search.
• Focuses on the time-series access patterns of log data and on service relations.
Figure: in the Store Phase, logs from microservices A, B and C are clustered into blocks by datetime and microservice, and the blocks are moved to leaf nodes by a placement rule. Service relations among A, B and C are obtained through service discovery (Istio). In the Search Phase, the admin sends a search query to the root node, which searches the leaf nodes; a block list records the stored blocks.
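The Store Phase clustering can be illustrated with the minimal Python sketch below. It groups (service, timestamp, message) records by microservice, orders each service's records by time, and cuts them into fixed-size blocks whose time range is kept in a block list. Several parts are assumptions, not the deck's implementation. Block size is counted in messages here, whereas the deck measures it in MB. The placement rule is a hypothetical round-robin because the deck does not spell its rule out. `Block`, `build_blocks` and `place_round_robin` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

@dataclass
class Block:
    service: str          # microservice that wrote the logs
    t_begin: datetime     # earliest timestamp in the block
    t_end: datetime       # latest timestamp in the block
    messages: List[str]
    leaf: str = ""        # leaf node the block is placed on

def build_blocks(logs: List[Tuple[str, datetime, str]], block_len: int) -> List[Block]:
    """Cluster records by microservice, then by datetime into fixed-size blocks."""
    per_service: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for service, ts, msg in logs:
        per_service[service].append((ts, msg))
    blocks: List[Block] = []
    for service, recs in per_service.items():
        recs.sort(key=lambda r: r[0])  # chronological order within the service
        for i in range(0, len(recs), block_len):
            chunk = recs[i:i + block_len]
            blocks.append(Block(service, chunk[0][0], chunk[-1][0], [m for _, m in chunk]))
    return blocks

def place_round_robin(blocks: List[Block], leaves: List[str]) -> List[Block]:
    """Hypothetical placement rule: spread blocks over leaves in turn."""
    for i, block in enumerate(blocks):
        block.leaf = leaves[i % len(leaves)]
    return blocks
```

The resulting list of `(service, t_begin, t_end, leaf)` entries plays the role of a block list: a search can consult it to decide which blocks to read without touching the blocks themselves.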

Proposed Method – Reduction of search target logs
• Service relations correspond to the chronological order among logs (= time-series access patterns).
• Example: service A sends a request to service B ((1) Request, (2) Response).
  • Service A writes a log message ("sent"), and service B writes its log message ("received") after service A.
• Logs are clustered into blocks by datetime and microservice.
• This reduces the number of blocks accessed in the Search Phase.
Figure: along the datetime axis, the blocks of microservice A and of microservice B within the search target period are divided into accessed and unaccessed blocks.
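One way to turn "service B writes its log after service A" into fewer block reads is sketched below in Python. This is an interpretation under stated assumptions, not the authors' algorithm. Services are visited in call order (for example `["A", "B"]` for A → B). Once the upstream service's matching block is found, downstream blocks that end before that block begins can be skipped, because the downstream log cannot be older. The sketch also assumes each service writes one log message per request ID, so the scan of a service stops at its first matching block. `BlockMeta` and `search_chain` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

@dataclass
class BlockMeta:
    service: str
    t_begin: datetime
    t_end: datetime
    messages: List[str]

def overlaps(b: BlockMeta, begin: datetime, end: datetime) -> bool:
    return b.t_end >= begin and b.t_begin <= end

def search_chain(blocks: List[BlockMeta], chain: List[str], request_id: str,
                 begin: datetime, end: datetime) -> Tuple[List[str], int]:
    """Follow the call chain; return (matching messages, number of blocks accessed)."""
    found: List[str] = []
    accessed = 0
    lower = begin  # lower time bound, tightened after each upstream hit
    for service in chain:
        candidates = sorted(
            (b for b in blocks if b.service == service and overlaps(b, lower, end)),
            key=lambda b: b.t_begin,
        )
        hit: Optional[BlockMeta] = None
        for b in candidates:
            accessed += 1
            matches = [m for m in b.messages if request_id in m]
            if matches:
                found.extend(matches)
                hit = b
                break  # assumption: one log message per service per request
        if hit is None:
            break  # not found in this service; stop following the chain
        # Downstream logs are written no earlier than this upstream block begins
        lower = max(lower, hit.t_begin)
    return found, accessed
```

The design choice is deliberately conservative: the bound uses the block's start time rather than the exact message time, so no downstream log that could match is ever skipped.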

Experimental Method
• Measures search response time from when a search request is sent until the search response is received.
• Creates 14 VMs on a hypervisor.
  • Each VM: CPU: 1 [core], RAM: 1 [GB], Storage: 30 [GB]
  • 1 root node, 13 leaf nodes
• Stores production logs, collected from the microservices of a paper search website, on the leaf nodes.
• Increases the volume of logs: 1,600 → 8,065,000 [messages]
Search query example: request_id=xxx bs=8, s_dt_begin="2021-12-13T10:21:50", s_dt_end="2023-02-13T10:21:50"
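The example search query packs its parameters into one string. The minimal Python sketch below reads such a string, computes the length of the search target period in months, and times a call in the way the experiment defines response time (from request sent to response received). Several points are assumptions. The meaning of `bs` as the block size in MB is inferred from the "Block Size" legend in the results. `s_dt_begin` and `s_dt_end` are taken to bound the search target period. The deck does not give the query's syntax rules. `parse_query`, `period_months` and `timed` are hypothetical helpers.

```python
import re
import time
from datetime import datetime
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")

# key=value pairs separated by spaces and/or commas; values may be double-quoted
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')

def parse_query(q: str) -> Dict[str, str]:
    return {key: value.strip('"') for key, value in PAIR_RE.findall(q)}

def period_months(query: Dict[str, str]) -> int:
    """Calendar-month difference between s_dt_begin and s_dt_end."""
    begin = datetime.fromisoformat(query["s_dt_begin"])
    end = datetime.fromisoformat(query["s_dt_end"])
    return (end.year - begin.year) * 12 + (end.month - begin.month)

def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

QUERY = 'request_id=xxx bs=8, s_dt_begin="2021-12-13T10:21:50", s_dt_end="2023-02-13T10:21:50"'
```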

Experimental Results
• Compares search response time between the proposed method and the all-parallel method as the search target period expands.
• The proposed method's response time is up to 52% shorter than that of the all-parallel method.
Figure: two panels, "Proposed method" and "All parallel method", plotting response time [sec] against search target period [month] (3 to 24) for Block Size = 4, 8, 16 and 32 [MB]; lower is better.
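The "52% shorter" figure is a relative reduction: 1 − t_proposed / t_all_parallel. "Up to" presumably means the largest value across the measured search periods and block sizes. The short Python sketch below makes that arithmetic explicit. The timings in the checks are hypothetical and were not read from the plots.

```python
from typing import Iterable, Tuple

def reduction(t_all_parallel: float, t_proposed: float) -> float:
    """Relative reduction: 0.52 means the proposed method is 52% shorter."""
    if t_all_parallel <= 0:
        raise ValueError("baseline response time must be positive")
    return 1.0 - t_proposed / t_all_parallel

def max_reduction(pairs: Iterable[Tuple[float, float]]) -> float:
    """Largest reduction over (all-parallel, proposed) response-time pairs."""
    return max(reduction(t_all, t_prop) for t_all, t_prop in pairs)
```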

Discussion
◆ Block size
• The proposed method uses a fixed block size.
  • The number of log messages per block is homogeneous.
• The file size that can be read and written at once depends on the disk I/O performance of each leaf node.
  • The block size should therefore be calculated from disk I/O performance.
  • One method is to use the iostat command, which reports I/O performance.
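One possible way to derive the block size from disk I/O performance is sketched below in Python. This is a sketch under assumptions, not the deck's procedure. A leaf's sequential read throughput in MB/s (for example a figure obtained with iostat) is multiplied by a hypothetical per-block read-time budget. The result picks the largest block size that fits, taken from the sizes evaluated in the experiment (4, 8, 16 and 32 MB). `choose_block_size_mb` and `budget_s` are illustrative names.

```python
from typing import Sequence

def choose_block_size_mb(read_mb_per_s: float, budget_s: float,
                         candidates: Sequence[int] = (4, 8, 16, 32)) -> int:
    """Largest candidate block size a leaf can read within budget_s seconds."""
    if read_mb_per_s <= 0 or budget_s <= 0:
        raise ValueError("throughput and budget must be positive")
    readable_mb = read_mb_per_s * budget_s  # MB the disk can read within the budget
    fitting = [c for c in sorted(candidates) if c <= readable_mb]
    # Fall back to the smallest size when even that exceeds the budget
    return fitting[-1] if fitting else min(candidates)
```

Under this rule, a leaf with faster disks gets larger blocks, so fewer blocks are needed to cover the same search target period.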

Thank you for listening
