Distributed Log Search Based on Time Series Access and Service Relations

Tomoyuki KOYAMA

April 17, 2022

Transcript

  1. Distributed Log Search based on Time Series Access and Service Relations

    Tomoyuki Koyama, Takayuki Kushida, Tokyo University of Technology
    AINA-2022 / April 15, 2022
  2. Introduction

    • A log is a message that records events within software.
      • Supports analysis of past processing procedures
      • Helps administrators find errors in software
    • Log Management
      • The total volume of logs keeps growing as more log messages are recorded.
      • Millions of log messages need to be retrieved in a short time.
    [Figure: the software writes logs, which the administrator searches. Example log messages:
    March 27, 2022 10:24:13 Started app
    March 27, 2022 10:25:40 Communicated node1
    March 27, 2022 10:25:40 Stored file1
    March 27, 2022 10:35:00 Stopped app]
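The example log messages above each start with a timestamp followed by an event. A minimal Python sketch of a time-range search over such lines, assuming every line starts with a timestamp in exactly the `March 27, 2022 10:24:13` layout; `parse_line` and `search_period` are hypothetical helper names, not part of the authors' system:

```python
from datetime import datetime

# Assumed layout of every line: "<Month> <day>, <year> <HH:MM:SS> <event>"
TIMESTAMP_FORMAT = "%B %d, %Y %H:%M:%S"

def parse_line(line):
    """Split one log line into (timestamp, event message)."""
    month, day, year, clock, event = line.split(" ", 4)
    ts = datetime.strptime(f"{month} {day} {year} {clock}", TIMESTAMP_FORMAT)
    return ts, event

def search_period(lines, begin, end):
    """Return the events whose timestamp lies within [begin, end]."""
    hits = []
    for line in lines:
        ts, event = parse_line(line)
        if begin <= ts <= end:
            hits.append(event)
    return hits

logs = [
    "March 27, 2022 10:24:13 Started app",
    "March 27, 2022 10:25:40 Communicated node1",
    "March 27, 2022 10:25:40 Stored file1",
    "March 27, 2022 10:35:00 Stopped app",
]
```

For example, `search_period(logs, datetime(2022, 3, 27, 10, 25), datetime(2022, 3, 27, 10, 30))` returns `["Communicated node1", "Stored file1"]`. This linear scan reads every line, which is exactly the cost that grows with the total log volume.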
  3. Introduction

    • Distributed tracing is realized with logs.
      • It improves the traceability of transactions across microservices.
      • Each transaction is assigned a request identifier, the "Request ID".
      • Microservices write log messages containing the identifier to log files.
      • The administrator finds these messages to perform root cause analysis.
    Example log message:
    [2021-10-22T00:27:09.383Z] "GET /paper/0416f705-df88-4d5f-82e8-095d4bd89e37/download HTTP/1.1" 200 - via_upstream - "-" 0 736954 134 133 "-" "Python/3.9 aiohttp/3.7.4.post0" "11c0553b-e1cd-9044-b4ce-49576dcbae6c" "paper-app.paper:4000" "10.42.2.65:8000" inbound|8000|| 127.0.0.6:37351 10.42.2.65:8000 10.42.2.64:44452 outbound_.4000_._.paper-app.paper.svc.cluster.local default
    [Figure: users send an HTTP request (transaction) that is assigned a Request ID (Request ID=11c0553…); Microservice A and Microservice B write log messages with that Request ID to log files; the administrator finds the log messages with the Request ID.]
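The example log message is an Istio (Envoy) access-log line whose Request ID is the value of the `x-request-id` header. A minimal Python sketch of finding the log messages of one transaction, assuming Istio's default access-log format, in which `x-request-id` is the fifth double-quoted field; `request_id` and `find_by_request_id` are hypothetical names:

```python
import re
from typing import Iterable, List, Optional

# Matches every double-quoted field of an access-log line.
QUOTED_FIELD = re.compile(r'"([^"]*)"')

# Assumption: Istio's default access-log layout, whose quoted fields are
# request line, upstream failure reason, X-Forwarded-For, User-Agent,
# X-Request-ID, authority, upstream host (so index 4 is the Request ID).
REQUEST_ID_INDEX = 4

def request_id(line: str) -> Optional[str]:
    """Extract the Request ID from one access-log line, or None if absent."""
    fields = QUOTED_FIELD.findall(line)
    return fields[REQUEST_ID_INDEX] if len(fields) > REQUEST_ID_INDEX else None

def find_by_request_id(lines: Iterable[str], rid: str) -> List[str]:
    """Collect the log messages that belong to one transaction."""
    return [line for line in lines if request_id(line) == rid]
```

Matching on the parsed field rather than on a substring avoids false hits when a UUID appears elsewhere in the line, such as the paper ID in the request path.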
  4. Introduction

    • The Scatter-Gather Pattern enables horizontal scalability.
      • A method for large-scale data processing
      • Scatter: the root node splits a task into several sub-tasks and scatters them to the leaf nodes.
      • Gather: the leaf nodes return the results of their sub-tasks to the root node.
    • Prerequisite: the Scatter-Gather Pattern is applied to log search for distributed tracing.
    [Figure: the admin sends a query to the root node, which scatters sub-tasks to the leaf nodes and gathers their results.]
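A minimal Python sketch of the Scatter-Gather Pattern applied to log search. The leaf nodes are simulated as in-memory lists and threads stand in for network calls; both are assumptions for illustration, and the function names are hypothetical. Because every leaf is queried, this corresponds to the simple all-parallel search.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def leaf_search(leaf_logs: List[str], request_id: str) -> List[str]:
    """Sub-task run on one leaf node: scan only its local log messages."""
    return [msg for msg in leaf_logs if request_id in msg]

def root_search(leaves: List[List[str]], request_id: str) -> List[str]:
    """Root node: scatter one sub-task per leaf, then gather the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(leaves))) as pool:
        # Scatter: submit the same query to every leaf node in parallel.
        futures = [pool.submit(leaf_search, logs, request_id) for logs in leaves]
        # Gather: merge the partial results in leaf order.
        results: List[str] = []
        for future in futures:
            results.extend(future.result())
    return results
```

Adding leaf nodes spreads the scan across more workers, which is the horizontal scalability the pattern offers; the total amount of data scanned, however, stays the same.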
  5. Issue – Log Search

    ◆ A simple search method accesses all logs in parallel.
      • As the volume of logs accessed per search increases, search response time increases.
      • As the total volume of logs increases, search response time increases.
    [Figure: search response time vs. volume of accessed logs.]
    ◆ Reducing search response time helps troubleshooting: a short response time reduces the total time needed to repair faults.
    Need: a method that reduces the volume of logs accessed during a search.
  6. Proposed Method

    • Proposes a fast log search method for distributed tracing
      • Reduces the amount of log data accessed during a search
      • Focuses on the time-series access patterns of log data and on service relations
    [Figure: in the Store Phase, logs of microservices A, B, and C are clustered into blocks by datetime and microservice, and the blocks are moved to leaf nodes by a placement rule; service discovery (Istio) provides the service relations among A, B, and C. In the Search Phase, the admin sends a search query to the root node, which uses the block list.]
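A minimal Python sketch of a Store Phase along the lines of this slide: log messages are clustered by microservice and datetime into fixed-size blocks, and each block is recorded in a block list together with its time range and leaf node. Two assumptions: the block size is counted in messages rather than MB, and the round-robin placement rule is a hypothetical stand-in, since the slides do not specify the actual rule.

```python
from dataclasses import dataclass, field
from datetime import datetime
from itertools import cycle
from typing import Dict, Iterable, List, Tuple

@dataclass
class Block:
    service: str        # microservice that wrote the messages
    begin: datetime     # timestamp of the first message in the block
    end: datetime       # timestamp of the last message in the block
    leaf: int           # leaf node that stores the block
    messages: List[str] = field(default_factory=list)

def build_block_list(logs: Iterable[Tuple[str, datetime, str]],
                     block_size: int, n_leaves: int) -> List[Block]:
    """logs: (service, timestamp, message) tuples; block_size in messages (assumption)."""
    by_service: Dict[str, List[Tuple[datetime, str]]] = {}
    for service, ts, msg in logs:
        by_service.setdefault(service, []).append((ts, msg))
    placement = cycle(range(n_leaves))  # hypothetical round-robin placement rule
    block_list: List[Block] = []
    for service in sorted(by_service):
        # Cluster by datetime: order each service's messages chronologically, then cut.
        entries = sorted(by_service[service], key=lambda e: e[0])
        for i in range(0, len(entries), block_size):
            chunk = entries[i:i + block_size]
            block_list.append(Block(service, chunk[0][0], chunk[-1][0],
                                    next(placement), [m for _, m in chunk]))
    return block_list
```

Keeping each block's begin and end timestamps in the block list lets the root node decide which blocks a query needs without opening them.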
  7. Proposed Method – Reduction of search target logs

    • Service relations correspond to the chronological order among logs.
      • Example: Service A sends a request to Service B ((1) Request, (2) Response), so Service B writes its log message ("received") after Service A writes its log message ("sent"). This ordering is the time-series access pattern.
    • Logs are clustered into blocks by datetime and microservice, so only some of the blocks of Microservice A and Microservice B within the search target period need to be accessed.
    • This reduces the number of blocks accessed in the Search Phase.
    [Figure: blocks of Microservice A and Microservice B along the datetime axis, with accessed and unaccessed blocks marked within the search target period.]
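One possible reading of this slide, sketched in Python under stated assumptions: once the log message of the upstream Service A is found at time t, the downstream Service B must have written its message for the same request at or after t, so B's blocks that end before t can be skipped. The `Block` record, the helper names, and this exact pruning rule are illustrative assumptions, not the authors' implementation.

```python
from datetime import datetime
from typing import List, NamedTuple

class Block(NamedTuple):
    service: str
    begin: datetime  # first timestamp in the block
    end: datetime    # last timestamp in the block

def blocks_in_period(blocks: List[Block], service: str,
                     start: datetime, stop: datetime) -> List[Block]:
    """Blocks of one microservice that overlap the period [start, stop]."""
    return [b for b in blocks
            if b.service == service and b.begin <= stop and b.end >= start]

def downstream_blocks(blocks: List[Block], downstream: str,
                      upstream_ts: datetime, stop: datetime) -> List[Block]:
    """Assumed pruning rule: the downstream service logs after the upstream one,
    so only its blocks overlapping [upstream_ts, stop] need to be accessed."""
    return blocks_in_period(blocks, downstream, upstream_ts, stop)
```

The later the upstream timestamp falls within the search target period, the more downstream blocks are skipped, which is why the saving grows with longer search periods.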
  8. Experimental Method

    • Measures search response time, from when a search request is sent until the search response is received
    • Creates 14 VMs on a hypervisor
      • Each VM: CPU 1 [core], RAM 1 [GB], storage 30 [GB]
      • 1 root node and 13 leaf nodes
    • Stores production logs, collected from the microservices of a paper search website, on the leaf nodes
    • Scales up the log volume: 1,600 → 8,065,000 [messages]
    Example search query:
    request_id=xxx bs=8, s_dt_begin="2021-12-13T10:21:50", s_dt_end="2023-02-13T10:21:50"
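The response-time metric above, from sending a search request until the response is received, can be measured with a small harness like this Python sketch. The function name `measure_response_time`, the repeat count, and taking the median are assumptions, not details from the slides.

```python
import statistics
import time
from typing import Any, Callable

def measure_response_time(search: Callable[[Any], Any], query: Any,
                          repeats: int = 5) -> float:
    """Median wall-clock seconds from sending `query` until `search` returns."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()            # search request sent
        search(query)                          # blocks until the response is received
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

`time.perf_counter` is a monotonic, high-resolution clock, so it is unaffected by wall-clock adjustments during a run. The median dampens outliers such as cold-cache first runs.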
  9. Experimental Results

    • Compares the search response time of the proposed method with that of the all-parallel method as the search target period expands
    • The proposed method reduces response time by up to 52% compared with the all-parallel method.
    [Figure: response time [sec] vs. search target period [month], 3 to 24 months; one panel for the proposed method (proposal-04/08/16/32) and one for the all-parallel method (all-parallel-04/08/16/32), with block sizes of 4, 8, 16, and 32 [MB]; lower is better.]
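Assuming "up to 52% shorter" is measured relative to the all-parallel response time under the same block size and search target period, the figure is the maximum of (T_all - T_proposed) / T_all over all settings. A small Python helper for that computation (hypothetical name; it takes paired measurements, which are not reproduced here):

```python
from typing import Sequence

def max_reduction_percent(all_parallel: Sequence[float],
                          proposed: Sequence[float]) -> float:
    """Largest relative reduction, in percent, over paired measurements
    (same block size and search target period at each index)."""
    if len(all_parallel) != len(proposed) or not all_parallel:
        raise ValueError("need equally long, non-empty measurement lists")
    return max((a - p) / a * 100.0 for a, p in zip(all_parallel, proposed))
```

A 52% reduction means the proposed method needed 48% of the all-parallel response time in its best setting.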
  10. Discussion

    ◆ Block size
    • The proposed method uses a fixed block size.
    • The number of log messages per block is therefore roughly uniform.
    • The file size that can be read and written at once depends on the disk I/O performance of each leaf node.
    • Hence, the block size has to be calculated from disk I/O performance.
    • One way to measure it is the iostat command, which reports I/O performance.
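As a hypothetical illustration of deriving the block size from disk I/O performance: given a read throughput measured on a leaf node (for example with iostat), pick the largest of the evaluated block sizes that can be read within a chosen time budget. The selection rule, the function name, and the `target_read_s` budget are assumptions, not part of the proposed method.

```python
from typing import Sequence

# Block sizes evaluated in the experimental results, in MB.
CANDIDATE_BLOCK_SIZES_MB = (4, 8, 16, 32)

def choose_block_size(read_mb_per_s: float, target_read_s: float,
                      candidates: Sequence[int] = CANDIDATE_BLOCK_SIZES_MB) -> int:
    """Largest candidate block readable within target_read_s at the measured
    throughput; falls back to the smallest candidate if none fits."""
    budget_mb = read_mb_per_s * target_read_s
    fitting = [size for size in candidates if size <= budget_mb]
    return max(fitting) if fitting else min(candidates)
```

For instance, a leaf node reading 100 MB/s with a 0.1 s budget would get 8 MB blocks under this rule. Because the rule runs per leaf node, leaves with faster disks could use larger blocks.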