Slide 1

Slide 1 text

Falcon: A Practical Log-Based Analysis Tool for Distributed Systems Francisco Neves, Nuno Machado and José Pereira HASLab, INESC TEC and University of Minho Braga, Portugal 28 June, 2018 Luxembourg City, Luxembourg IEEE/IFIP International Conference on Dependable Systems and Networks

Slide 2

Slide 2 text

Introduction ● Developers still debug runtime executions using log data ● Log-based analysis is a daunting task in distributed systems ○ Sheer number of log entries ○ Multiple and heterogeneous log files scattered across several nodes ○ Physical clock skew and log messages produced asynchronously 2

Slide 3

Slide 3 text

Case Study - ZooKeeper Leader Election (in Theory) 3

Slide 4

Slide 4 text

Case Study - ZooKeeper Leader Election (in Practice) 4

Slide 5

Slide 5 text

5 2018-06-12 19:25:33,939 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@672] - Sending Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 2 (recipient), 3 (myid), 0x0 (n.peerEpoch) Case Study - ZooKeeper Leader Election (in Practice) 2018-06-12 19:25:33,938 [myid:2] - DEBUG [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElectio n@789] - Updating proposal: 3 (newleader), 0x0 (newzxid), 2 (oldleader), 0x0 (oldzxid)

Slide 6

Slide 6 text

6 Case Study - ZooKeeper Leader Election (in Practice)

Slide 7

Slide 7 text

Causality Lamport defines a relationship called happens-before “a → b” is read as “a happens before b” 1. if a and b are events in the same process and a occurs before b, then a → b 2. if a is the event of a message being sent by one process and b is the event of the message being received by another process, then a → b 3. “happens-before” is transitive, meaning if a → b and b → c, then a → c 7

Slide 8

Slide 8 text

Tracing tools for debugging 8 Commodity tracing tools ● strace ● ltrace ● ptrace ● perf ● eBPF ● DTrace ● LTTng State-Of-The-Art tracing tools ● Pinpoint [DSN’02] ● X-Trace [NSDI’07] ● Google Dapper ● Twitter Zipkin ● OpenTracing ● Pivot Tracing [SOSP’15] ● Canopy [SOSP’17]  Source/protocol-dependent  Manual instrumentation  Causality  Source/protocol-agnostic  Out-of-the-box instrumentation  Lack of Causality

Slide 9

Slide 9 text

Falcon: A practical log-based analysis tool 9  Source/protocol-agnostic  Out-of-the-box instrumentation  Causality ● Combines events from distinct tracing tools into a causally coherent space-time diagram of the distributed execution.

Slide 10

Slide 10 text

3. Generate the space-time diagram Falcon — Overview 10 1. Parse the output of different tracing tools 2. Compute logical time according to causality

Slide 11

Slide 11 text

3. Generate the space-time diagram Falcon — Overview 11 1. Parse the output of different tracing tools 2. Compute logical time according to causality

Slide 12

Slide 12 text

1. Parse the output of different tracing tools ● Current drivers support: strace, log4j 12 ○ CONNECT (process, src, dst) ○ ACCEPT (process, src, dst) ○ SND (process, src, dst, msg) ○ RCV (process, src, dst, msg) ○ START (process) ○ END (process) ○ FORK (parent, child) ○ JOIN (parent, child) ○ LOG (process, msg) Low-level Events User-level Events ● Falcon uses drivers for translating the output of tracing tools into events with a common format:

Slide 13

Slide 13 text

1. Parse the output of different tracing tools ● Example: for strace, the driver translates system calls into Falcon events: 13 System Call Falcon event(s) fork() FORK + START wait() JOIN + END exit() END connect() CONNECT accept() ACCEPT write() / sendmsg() / sendto() / ... SND read() / recvmsg() / ... RCV

Slide 14

Slide 14 text

3. Generate the space-time diagram Falcon — Overview 14 2. Compute logical time according to causality 1. Parse the output of different tracing tools

Slide 15

Slide 15 text

2. Compute logical time according to causality ● Cast logical time assignment as a constraint solving problem ○ Each event’s logical timestamp is represented as a symbolic integer variable ○ Symbolic variable values must satisfy intra-process and inter-process causality constraints ● Solving the model with an off-the-shelf solver (e.g. Z3) outputs a trace of logically-timestamped events that satisfy causality 15

Slide 16

Slide 16 text

● Happens-before model to obtain event logical timestamps: 2. Compute logical time according to causality 16 Φ HB = φ inter  φ intra  GOAL

Slide 17

Slide 17 text

● Happens-before model to obtain event logical timestamps: 2. Compute logical time according to causality 17 Φ HB = φ inter  φ intra  GOAL Inter-process causality constraints ○ SNDp,m < RCVq,m ○ CONNECTp < ACCEPTq ○ FORKp,q < STARTq ○ ENDq < JOINp,q

Slide 18

Slide 18 text

● Happens-before model to obtain event logical timestamps: 2. Compute logical time according to causality 18 Φ HB = φ inter  φ intra  GOAL Intra-process causality constraints ○ Γ p is the event trace of process p ○ ci and cj are symbolic variables for logical timestamps of events i and j ○  i, j  Γ p : (i < j  ci < cj )

Slide 19

Slide 19 text

● Happens-before model to obtain event logical timestamps: 2. Compute logical time according to causality 19 Φ HB = φ inter  φ intra  GOAL Soft constraint that allows steering the constraint solving procedure:  to follow the original timestamp order  to minimize the logical clock values (i.e. to expose concurrency)

Slide 20

Slide 20 text

3. Generate the space-time diagram Falcon — Overview 20 2. Compute logical time according to causality 1. Parse the output of different tracing tools

Slide 21

Slide 21 text

3. Generate the space-time diagram 21

Slide 22

Slide 22 text

3. Generate the space-time diagram 22 Logical Timestamps

Slide 23

Slide 23 text

3. Generate the space-time diagram 23 Thread Timeline

Slide 24

Slide 24 text

3. Generate the space-time diagram 24 Thread Identifier

Slide 25

Slide 25 text

3. Generate the space-time diagram 25 Thread Event

Slide 26

Slide 26 text

3. Generate the space-time diagram 26 Threads under the same Process

Slide 27

Slide 27 text

3. Generate the space-time diagram 27 Happens-Before relationship between two events of distinct threads/processes

Slide 28

Slide 28 text

Case Study — Apache Zookeeper ● Experimental setting: ○ Two Zookeeper instances executing the Leader Election algorithm ○ Tracing tools: ptrace-based tool and log4j  Our ptrace-tool is quite similar to strace but outputs Falcon events  The log4j configuration includes a new appender that formats log messages according to Falcon’s event specification ● Research Questions: ○ Space-time diagram usefulness ○ Tracing overhead ○ Constraint solving performance 28

Slide 29

Slide 29 text

Case Study — Space-Time Diagram Usefulness (1/3) 29

Slide 30

Slide 30 text

Case Study — Space-Time Diagram Usefulness (1/3) 30 Zookeeper uses DataOutputStream class. The writeInt method of this class calls the write syscall per byte

Slide 31

Slide 31 text

Case Study — Space-Time Diagram Usefulness (2/3) 31

Slide 32

Slide 32 text

Case Study — Space-Time Diagram Usefulness (3/3) 32

Slide 33

Slide 33 text

Case Study — Tracing Overhead ● A Zookeeper client performing 10k iterations of four operations: ○ Check whether a znode exists ○ Create a new znode ○ Check again whether the znode exists ○ Delete the created znode 33 Duration per iteration (ms)

Slide 34

Slide 34 text

● Logs collected with two distinct log levels enabled: ○ INFO - 342 entries ○ INFO + DEBUG - 568 entries Case Study — Constraint Solving Performance 34

Slide 35

Slide 35 text

Future Work ● An eBPF-based tracer, which is expected to be much more efficient than ptrace-based tracing tools ● An interactive space-time diagram ○ Integration with ShiViz 35

Slide 36

Slide 36 text

Conclusions ● Understanding distributed system behavior requires the ability to reason about the causality of events in different processes ● Falcon eases the analysis of distributed system executions without requiring manual instrumentation: ○ Infers causally-related activity atop commodity tracing tools ○ Generates a space-time diagram ○ Incurs tolerable overhead 36

Slide 37

Slide 37 text

Thank you for your attention! Try Falcon now: https://github.com/fntneves/falcon 37