Minimizing Faulty Executions of Distributed Systems

Talk given at NSDI '16, RICON, Microsoft Research, Google, Salesforce, Cornell, UW, UC Berkeley, USC.

Colin Scott

March 21, 2016

Transcript

  1. Minimizing Faulty Executions of Distributed Systems. Colin Scott, Aurojit Panda, Vjekoslav Brajkovic, George Necula, Arvind Krishnamurthy, Scott Shenker
  2. [Figure: a software developer looking at a distributed system of many nodes (Node1 through Node12, …), marked with a question mark]
  3. 49% of developers’ time spent on debugging! [1] Debugging splits into understanding how the bug is triggered and fixing the problematic code. [1] LaToza, Venolia, DeLine, ICSE ’06
  4. Why Minimization? Smaller event traces are easier to understand. (G. A. Miller. The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review ’56.)
  5. Outline: Introduction; Background: Computational Model, Fuzz Testing w/ DEMi; Minimization; Evaluation; Conclusion
  10. Computational Model. Distributed system: a collection of N processes (a, b, c, d, e in the figure). Each process p:
    - Has unbounded memory
    - Starts in a known initial state
    - Changes states deterministically
  11. Computational Model. The network maintains a buffer of sent but not yet delivered messages. [Figure: processes a, b, c, d, e; one pending message with dst: d]
  13. Computational Model. Message deliveries occur one at a time:
    - The destination enters a new state according to its old state & the message
    - The destination sends a finite set of messages to other processes*
    *May include timer messages to be delivered to itself later
    [Figure: the pending message with dst: d is delivered; afterwards a timer with dst: d and a message with dst: a are pending]
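The delivery step above can be sketched in a few lines of Python. This is only an illustration of the model, not DEMi's implementation: the `Network` class, the `counter` handler, and the integer process state are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Message:
    src: str
    dst: str
    payload: str


# A handler maps (old state, message) -> (new state, messages to send).
Handler = Callable[[int, Message], Tuple[int, List[Message]]]


@dataclass
class Network:
    handlers: Dict[str, Handler]
    states: Dict[str, int]
    # Buffer of sent but not yet delivered messages.
    pending: List[Message] = field(default_factory=list)

    def deliver(self, msg: Message) -> None:
        """Deliver one pending message; the destination steps deterministically."""
        self.pending.remove(msg)
        new_state, outgoing = self.handlers[msg.dst](self.states[msg.dst], msg)
        self.states[msg.dst] = new_state
        self.pending.extend(outgoing)


def counter(state: int, msg: Message) -> Tuple[int, List[Message]]:
    # Hypothetical process logic: count deliveries; on "ping", set a timer
    # for itself and send a "pong" to process "a".
    if msg.payload == "ping":
        return state + 1, [Message(msg.dst, msg.dst, "timer"),
                           Message(msg.dst, "a", "pong")]
    return state + 1, []


net = Network(handlers={p: counter for p in "abcde"},
              states={p: 0 for p in "abcde"})
net.pending.append(Message("ext", "d", "ping"))
net.deliver(net.pending[0])
```

After the single delivery, process d has stepped once and the buffer holds a timer for d and a message for a, mirroring the figure.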
  18. Computational Model. Steps may also be external:
    - External message is sent
    - Process is created
    - Process crash-recovers
    [Figure: pending timer with dst: d and message with dst: a; an external message with dst: e is then added]
  21. Computational Model. A schedule τ is a sequence of events (either external or internal message deliveries) that can be applied in turn starting from the initial configuration. [Figure: events e1 i1 i2 i3 i4 e2, labeled as a process start, message deliveries, and an external message]
  22. Invariant Checking. An invariant is a predicate P over the state of all processes. A faulty execution is one that ends in an invariant violation. [Figure: P evaluated over the states of processes a, b, c, d, e (✔ or ✗); the schedule e1 i1 i2 i3 i4 e2 ends in ✗]
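As a rough illustration of the definition above, an invariant can be written as a plain predicate over the global state, and a schedule counts as faulty if the predicate fails once all of its events have been applied. The "at most one leader" invariant and the `elect` events below are hypothetical stand-ins, not taken from the talk.

```python
from typing import Callable, Dict, List

State = Dict[str, str]            # process name -> role (hypothetical state)
Event = Callable[[State], None]   # an event mutates the global state in place


def at_most_one_leader(state: State) -> bool:
    """Invariant P: a predicate over the state of all processes."""
    return sum(1 for role in state.values() if role == "leader") <= 1


def is_faulty(schedule: List[Event], initial: State,
              invariant: Callable[[State], bool]) -> bool:
    """Apply the schedule from the initial state; faulty if it ends in a violation."""
    state = dict(initial)
    for event in schedule:
        event(state)
    return not invariant(state)


def elect(process: str) -> Event:
    def event(state: State) -> None:
        state[process] = "leader"
    return event


initial = {"a": "follower", "b": "follower", "c": "follower"}
```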
  24. Formal Problem Statement. Given: schedule τ that results in violation of P. Find: locally minimal reproducing sequence τ’:
    - τ’ violates P, |τ’| ≤ |τ|
    - τ’ contains a subsequence of the external events of τ
    - if we remove any external event e from τ’, ¬∃ τ’’ containing the same external events minus e, s.t. τ’’ violates P
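The problem statement above can also be written out symbolically. The notation is an assumption made for this restatement: ext(τ) stands for the subsequence of external events of τ, and ⊑ for "is a subsequence of"; neither symbol appears on the slide.

```latex
\begin{align*}
&\textbf{Given: } \tau \text{ such that } \tau \text{ violates } P \\
&\textbf{Find: } \tau' \text{ such that} \\
&\quad (1)\ \tau' \text{ violates } P \\
&\quad (2)\ |\tau'| \le |\tau| \\
&\quad (3)\ \mathrm{ext}(\tau') \sqsubseteq \mathrm{ext}(\tau) \\
&\quad (4)\ \forall e \in \mathrm{ext}(\tau'):\ \neg\exists\, \tau''.\
  \mathrm{ext}(\tau'') = \mathrm{ext}(\tau') \setminus \{e\}
  \ \wedge\ \tau'' \text{ violates } P
\end{align*}
```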
  26. Fuzz Testing with DEMi. [Figure, built up over slides 26 to 38: three nodes, each drawn as a stack of App, RPC lib, OS, and AspectJ. Pending events appear in turn (first msg dst: b; later timer dst: b and msg dst: a), and the recorded schedule grows from a message delivery to a message delivery followed by a crash recovery]
  40. Running Example: Raft Consensus. [Figure, built up over slides 40 to 46: processes a, b, c, d and a series of client requests; ACK messages appear on slide 42 and commit messages on slide 45]
  47. Minimization. Given: τ = e1 i1 i2 i4 e2 … en im ✗. Straightforward approach: enumerate all schedules τ’ with |τ’| ≤ |τ|, pick the shortest sequence that reproduces ✗. [Figure: τ within the schedule space]
  49. Observation #1: many schedules are commutative. If i2 ↛ i3, i3 ↛ i2 (neither event happens-before the other), and dst(i2) ≠ dst(i3), then delivering i2 at step n and i3 at step n+1 reaches the same state at step n+2 as delivering i3 and then i2.
  55. Observation #1: many schedules are commutative. Adopt DPOR: Dynamic Partial Order Reduction. (C. Flanagan, P. Godefroid, “Dynamic Partial-Order Reduction for Model Checking Software”, POPL ’05)
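The commutativity that DPOR exploits can be seen in a toy model. The sketch below is not DPOR itself; it only shows that two independent deliveries to different destinations reach the same state in either order, while two deliveries to the same destination may not. The per-process logs and the `run` helper are hypothetical.

```python
from typing import Dict, List, Tuple

Delivery = Tuple[str, str]  # (destination process, payload)


def run(schedule: List[Delivery]) -> Dict[str, List[str]]:
    """Apply deliveries in order; each process appends the payload to its own log."""
    logs: Dict[str, List[str]] = {"a": [], "b": []}
    for dst, payload in schedule:
        logs[dst].append(payload)
    return logs


i2: Delivery = ("a", "x")
i3: Delivery = ("b", "y")        # dst(i2) != dst(i3): the two orders are equivalent
same_dst: Delivery = ("a", "z")  # same destination as i2: order is observable

commutes = run([i2, i3]) == run([i3, i2])
conflicts = run([i2, same_dst]) != run([same_dst, i2])
```

A search that already explored `[i2, i3]` can therefore skip `[i3, i2]`, which is the pruning the slide refers to.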
  56. Each event affects a small subset of the receiver’s variables. The invariant is defined over a small subset of the processes’ variables. The initial execution contains events that don’t affect the invariant. [Figure: process states such as {x=1,y=2}, {x=1,y=3}, {x=5,y=5}, {x=4,y=1}, {x=-1,y=-2}, {x=-1,y=-1}, {x=2,y=2}]
  58. Observation #2: selectively mask original events. τ: e1 i1 i2 i4 e2 … en im ✗. External events of τ (ext): e1 e2 e3 e4 e5 … en. Apply Delta Debugging [1] to ext: the first subsequence tried, sub1, is e4 e5 … en (e1, e2, e3 are masked). [1] A. Zeller, R. Hildebrandt, “Simplifying and Isolating Failure-Inducing Input”, IEEE ’02
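For readers unfamiliar with delta debugging, here is a compact sketch of a ddmin-style loop over a list of external events. It is a simplified rendering of the general technique, not DEMi's code; the `reproduces` oracle is a hypothetical stand-in for "replay this subsequence and check the invariant".

```python
from typing import Callable, List, Sequence


def ddmin(events: Sequence[str],
          reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `events` until removing any single event stops reproducing.

    The empty sequence is never tested.
    """
    events = list(events)
    assert reproduces(events)
    n = 2  # granularity: number of chunks to split into
    while len(events) >= 2:
        chunk = max(1, len(events) // n)
        subsets = [events[i:i + chunk] for i in range(0, len(events), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [e for j, s in enumerate(subsets) if j != i for e in s]
            if reproduces(subset):
                events, n, reduced = subset, 2, True
                break
            if reproduces(complement):
                events, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(events):
                break                      # finest granularity reached
            n = min(len(events), n * 2)    # retry with smaller chunks
    return events


# Hypothetical oracle: the violation needs both e2 and e5 to be present.
externals = ["e1", "e2", "e3", "e4", "e5", "e6"]
minimal = ddmin(externals, lambda sub: "e2" in sub and "e5" in sub)
```

In this toy run `minimal` ends up as `["e2", "e5"]`; in a real setting each oracle call is an expensive replay, which is why the number of subsequences tried matters.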
  64. Observation #2: selectively mask original events. Replay sub1 (e4 e5 … en) against the internal events of τ:
        foreach i in τ: if i is pending: deliver i  # ignore unexpected
    [Figure: the replay of sub1 delivers i1, i4, …, im (i2 is skipped) and still ends in ✗, so e1, e2, e3 are dropped and ext becomes e4 e5 … en]
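The replay loop above can be made concrete with a small sketch. It assumes a deliberately simple model that is not from the talk: externals are recognized by an `e` name prefix, and each internal delivery has exactly one `cause` event that must have been applied before its message is pending. Messages outside τ never come up in this toy, which corresponds to the "ignore unexpected" comment.

```python
from typing import Dict, List, Set


def replay(schedule: List[str], kept_externals: Set[str],
           cause: Dict[str, str]) -> List[str]:
    """Replay `schedule`, skipping masked externals and internals that never became pending."""
    applied: List[str] = []
    done: Set[str] = set()
    for event in schedule:
        if event.startswith("e"):          # external event (naming assumption)
            if event not in kept_externals:
                continue                    # masked by the current subsequence
        elif cause[event] not in done:     # internal message delivery
            continue                        # its message was never sent, so it is not pending
        applied.append(event)
        done.add(event)
    return applied


# Hypothetical causality: i1 is sent in response to e4, i2 to e1, i4 to i1.
schedule = ["e1", "e4", "i1", "i2", "i4", "e5"]
cause = {"i1": "e4", "i2": "e1", "i4": "i1"}
replayed = replay(schedule, {"e4", "e5"}, cause)
```

With e1 masked, i2 never becomes pending and is skipped, so `replayed` is `["e4", "i1", "i4", "e5"]`.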
  73. Observation #2: selectively mask original events. Next subsequence, sub2: e5 … en. Its replay (i1 i4 … im) does not trigger the violation (✔). Explore backtrack points until (i) ✗ or (ii) time budget for sub2 expired.
  78. Observation #3: some contents should be masked. [Figure: processes a, b, c, d, e; message with dst: d]
    Original message: type:t seq:3 src:a dst:d replicate:[1,2]
    Replay: type:t seq:5 src:a dst:d replicate:[1,2]
  80. Observation #3: some contents should be masked.
    - Phase 1: choose initial schedule. Match messages by user-defined “fingerprint”
    - Phase 2: prioritize backtrack points. Match messages by type only; backtrack whenever multiple pending messages match by type
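A fingerprint of this kind might look as follows. This is a sketch under assumptions: messages are modeled as plain dicts, and `seq` is taken to be the only field that varies between the original run and a replay, as in the example message on the previous slide. The function names are hypothetical.

```python
from typing import Any, Dict, Tuple

Msg = Dict[str, Any]

# Assumption: only the sequence number differs between original and replay.
MASKED_FIELDS = {"seq"}


def fingerprint(msg: Msg) -> Tuple[Tuple[str, str], ...]:
    """Phase 1 matching: compare every field except the masked ones."""
    return tuple(sorted((k, repr(v)) for k, v in msg.items()
                        if k not in MASKED_FIELDS))


def type_only(msg: Msg) -> str:
    """Phase 2 matching: compare message types only."""
    return msg["type"]


original = {"type": "t", "seq": 3, "src": "a", "dst": "d", "replicate": [1, 2]}
replayed = {"type": "t", "seq": 5, "src": "a", "dst": "d", "replicate": [1, 2]}
other = {"type": "t", "seq": 6, "src": "b", "dst": "d", "replicate": [1]}
```

Here `original` and `replayed` differ as raw messages but share a fingerprint, while `other` matches them only under the looser type-only rule.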
  81. Observation #4: shrink external message contents. [Figure: processes a, b, c, d, e; external messages of the form type:bootstrap peers:[a,b,c,d,e]]
  84. Goal: find minimal schedule that produces violation. Approach: prioritize schedule space exploration.
    - Observation #1: many schedules are commutative
    - Observation #2: selectively mask original events
    - Observation #3: some contents should be masked
    - Observation #4: shrink external message contents
    - Minimize internal events after externals minimized
  86. How well does DEMi work? [Bar chart: Total Events (axis 0 to 3000) per Case Study (raft-45, raft-46, raft-56, raft-58a, raft-58b, raft-42, raft-66, spark-2294, spark-3150, spark-9256). Bar labels as extracted, Initial Execution: 300, 600, 1000, 400, 1710, 1500, 2850, 2380, 1250, 2160; After Minimization: 11, 14, 40, 77, 180, 40, 226, 82, 35, 23. Callouts: “Found w/ Fuzz Testing!” and “80% - 97% Reduction!”]
  89. How well does DEMi work? [Bar chart: Total Events (axis 0 to 300) per Case Study (raft-45, raft-46, raft-56, raft-58a, raft-58b, raft-42, raft-66, spark-2294, spark-3150, spark-9256). Bar labels as extracted, After Minimization: 11, 14, 40, 77, 180, 40, 226, 82, 35, 23; Smallest Manual Trace: 11, 11, 25, 29, 39, 28, 51, 21, 23, 22. Callout: “Factor of 1x - 5x from hand-crafted”]
  91. How quickly does DEMi work? [Bar chart: Runtime in Seconds (axis 0 to 4000) of the overall minimization per Case Study (raft-45, raft-46, raft-56, raft-58a, raft-58b, raft-42, raft-66, spark-2294, spark-3150, spark-9256). Bar labels as extracted: 210, 245, 427, 348, 10676, 69, 43482, 2132, 282, 170. The three off-scale bars are annotated 43482 (~12 hours), 10676 (~3 hours), and 2132 (~35 minutes). Callout: “<10 minutes except 3 cases”]
  93. See the paper for…
    - How we handle non-determinism
    - Handling multithreaded processes
    - Supporting other RPC libraries
    - Sketch for minimizing production traces
    - More in-depth evaluation
    - Related work
    - …
  94. Past Work.
    - Internet Troubleshooting: NSDI ’10, SIGCOMM ’12
    - SDN Troubleshooting: HotSDN ’13, PODC ’13, SIGCOMM ’14
    - Middleboxes & Mobile Devices: SIGCOMM ’12, NSDI ’15
    - CAP for Networks: HotSDN ’13
  95. Tools for distributed systems lag sequential tools by ~10 years [1]. [Figure: SE, PL, and Dst’Sys & Networking] [1] Parkinson, VSTTE ’10
  96. Tools for distributed systems lag sequential tools by ~10 years [1]. [Figure: topics placed among SE, PL, and Dst’Sys & Networking:]
    - Stable-Multithreading & PCT for distributed systems
    - Routing convergence tradeoffs
    - Testing & debugging async code (mobile, JS)
    - Infer JS defer tags
    - SAT/SMT solvers need systems techniques
    - Test verified distributed systems
    - Program properties for minimization
    - ACID for SDN
    - Synthesizing coordination
    - HCI for configuration hell
    [1] Parkinson, VSTTE ’10
  97. Conclusion.
    - Open source tool: github.com/NetSys/demi
    - Read our paper! eecs.berkeley.edu/~rcs/research/nsdi_draft.pdf
    - Optimistic that these techniques can be successfully applied more broadly
    Thanks for your time! Contact me! [email protected]
  98. Attributions. Inspiration for slide design: Jay Lorch’s IronFleet slides. Graphic icons from thenounproject.org: logfile: mantisshrimpdesign; magnifying glass: Ricardo Moreira; disk: Anton Outkine; hook: Seb Cornelius; bug report: Lemon Liu; devil: Mourad Mokrane; Putin: Remi Mercier
  99. Production Traces. Model: feed partially ordered log into single machine DEMi. Require:
    - Partial ordering of all message deliveries
    - All crash-recoveries logged to disk
  100. Related Work.
    Thread Schedule Minimization
    • Isolating Failure-Inducing Thread Schedules. SIGSOFT ’02.
    • A Trace Simplification Technique for Effective Debugging of Concurrent Programs. FSE ’10.
    Program Flow Analysis
    • Enabling Tracing of Long-Running Multithreaded Programs via Dynamic Execution Reduction. ISSTA ’07.
    • Toward Generating Reducible Replay Logs. PLDI ’11.
    Best-Effort Replay of Field Failures
    • A Technique for Enabling and Supporting Debugging of Field Failures. ICSE ’07.
    • Triage: Diagnosing Production Run Failures at the User’s Site. SOSP ’07.
  101. Dealing With Threads. If you’re lucky: threads are largely independent (Spark). If you’re unlucky, key insight: a write to shared memory is equivalent to a message delivery. Approach:
    • interpose on virtual memory, thread scheduler
    • pause a thread whenever it writes to shared memory / disk
    Cf. “Enabling Tracing Of Long-Running Multithreaded Programs Via Dynamic Execution Reduction”, ISSTA ’07
  102. Dealing With Non-Determinism. Interpose on:
    - Timers
    - Random number generators
    - Unordered hash values
    - ID allocation
    Stop-gap: replay each schedule multiple times
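The stop-gap of replaying each schedule several times can be sketched as a thin wrapper around a single replay. The names `reproduces` and `run_once` and the default of five attempts are assumptions for illustration; the slide does not say how many replays are used.

```python
from typing import Callable


def reproduces(run_once: Callable[[], bool], attempts: int = 5) -> bool:
    """Treat a schedule as reproducing if any one of several replays hits the violation."""
    return any(run_once() for _ in range(attempts))


# Hypothetical flaky replay: the violation only shows up on the third attempt.
outcomes = iter([False, False, True, False, False])
flaky_result = reproduces(lambda: next(outcomes))
```

Because `any` short-circuits, the wrapper stops replaying as soon as one attempt shows the violation.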
  103. Integrating with other RPC libs. [Figure: three nodes, each drawn as a stack of App, RPC lib, and OS, alongside DEMi and a JVM]