Recon at PLDI 2011

A talk I gave at PLDI 2011 about Recon, a new technique and tool we made for debugging concurrent programs.

Brandon Lucia

May 28, 2012

Transcript

  1. Recon: Finding and Understanding Concurrency Errors with Reconstructed Execution Fragments

    Brandon Lucia, Benjamin P. Wood, Luis Ceze
    University of Washington, Department of Computer Science and Engineering
    Monday, May 28, 2012
  2. Concurrency Errors

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj; Q.nq(myObj)
    Thread 3: o = Q.dq; o->use()
    Initially, Ready is false and SObj is an invalid pointer.
    Intended invariant: if Ready is true, SObj is a valid pointer.
  3. Concurrency Errors

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj; Q.nq(myObj)
    Thread 3: o = Q.dq; o->use()
    Initially, Ready is false and SObj is an invalid pointer.
    Buggy behavior: Ready can be true when SObj is invalid.
  4. Concurrency Errors

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj; Q.nq(myObj)
    Thread 3 (several days later...): o = Q.dq; o->use()
    Initially, Ready is false and SObj is an invalid pointer.
    Buggy behavior: Ready can be true when SObj is invalid.
  5. Concurrency Errors

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj; Q.nq(myObj)
    Thread 3 (several days later...): o = Q.dq; o->use()
    The symptom of the bug appears much later than its cause, and in a different thread!
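
The ordering problem on these slides can be replayed deterministically. The sketch below is only an illustration under stated assumptions: the step names and the `replay` helper are hypothetical, `None` stands in for an invalid pointer, and the three threads are modeled as a hand-written schedule rather than real threads.

```python
# Deterministic replay of the slide's three threads. Step names and the
# replay() helper are hypothetical; None stands in for an invalid pointer.
INVALID = None

T1_READY = "T1: Ready = true"
T1_ALLOC = "T1: SObj = new p()"
T2_READ = "T2: if Ready == true: myObj = SObj"
T2_NQ = "T2: Q.nq(myObj)"
T3_USE = "T3: o = Q.dq; o->use()"


def replay(schedule):
    """Run the steps in the given order and return any symptoms observed."""
    ready, sobj, my_obj, took_branch = False, INVALID, INVALID, False
    queue, symptoms = [], []
    for step in schedule:
        if step == T1_READY:
            ready = True
        elif step == T1_ALLOC:
            sobj = "p"  # any valid object
        elif step == T2_READ:
            if ready:
                my_obj, took_branch = sobj, True
        elif step == T2_NQ:
            if took_branch:
                queue.append(my_obj)
        elif step == T3_USE:
            # The symptom shows up here, far from the cause in Thread 1.
            if queue and queue.pop(0) is INVALID:
                symptoms.append("Thread 3 used an invalid pointer")
        else:
            raise ValueError(f"unknown step: {step}")
    return symptoms


# Thread 2 runs between Thread 1's two writes, so the invariant is broken.
buggy = [T1_READY, T2_READ, T1_ALLOC, T2_NQ, T3_USE]
# Thread 1 finishes both writes before Thread 2 looks at Ready.
benign = [T1_READY, T1_ALLOC, T2_READ, T2_NQ, T3_USE]

print(replay(buggy))
print(replay(benign))
```

Only the first schedule reports a symptom, and it is reported by the Thread 3 step even though the cause lies in the order of Thread 1's writes.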
  6. Tools to Make Bugs Happen

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj; Q.nq(myObj)
    Thread 3 (several days later...): o = Q.dq; o->use()
    Bug-exposing test tools report buggy execution schedules.
    Too much information!
  7. Tools to Show What Happened

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj; Q.nq(myObj)
    Thread 3: o = Q.dq; o->use()
    Bug isolation tools guide programmers to a code point or two.
    Too little information!
  8. Tools to Show What Happened

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj; Q.nq(myObj)
    Thread 3 (several days later...): o = Q.dq; o->use()
    Bug isolation tools guide programmers to a code point or two.
    Too little information!
  9. Programmers need focused information about bugs’ causes.

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj; Q.nq(myObj)
    Thread 3: o = Q.dq; o->use()
  10. Programmers need focused information about bugs’ causes.

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj; Q.nq(myObj)
    Thread 3 (several days later...): o = Q.dq; o->use()
  11. Focusing on the Right Stuff

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj; Q.nq(myObj)
    Thread 3 (several days later...): o = Q.dq; o->use()
  12. Focusing on the Right Stuff

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj
    Thread 3: (empty)
  13. Focusing on the Right Stuff

    Ready = true
    if Ready == true
    SObj = new p()
    myObj = SObj
  15. Focusing on the Right Stuff

    Ready = true
    if Ready == true
    SObj = new p()
    myObj = SObj
    Label: communication
  16. Execution Reconstructions

    Ready = true
    if Ready == true
    SObj = new p()
    myObj = SObj
    Labels: communication, Reconstruction
    A reconstruction is a focused subset of the execution schedule around the root cause of a bug.
  17. Recon Workflow

    "Crashes when I run it!"
    Collect communication graphs from many executions.
  18. Recon Workflow

    "Crashes when I run it!"
    Collect communication graphs from many executions.
    Label graphs as buggy or non-buggy.
  19. Recon Workflow

    "Crashes when I run it!"
    Collect communication graphs from many executions.
    Label graphs as buggy or non-buggy.
    Build and aggregate reconstructions.
  20. Recon Workflow

    "Crashes when I run it!"
    Collect communication graphs from many executions.
    Label graphs as buggy or non-buggy.
    Build and aggregate reconstructions.
    Rank reconstructions and report them (1, 2, 3).
  21. Communication Graphs

    Thread 1: Ready = true; SObj = new p()
    Thread 2: if Ready == true: myObj = SObj
    Nodes are static instructions.
    Edges are inter-thread communication via shared memory.
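
A communication graph of this kind can be sketched from a memory-access trace by remembering the last writer of each location. This is a minimal illustration, not Recon's implementation: the `Access` record, the `communication_graph` function, and the trace format are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical trace record: op is "R" (read) or "W" (write).
Access = namedtuple("Access", "thread instr op addr")


def communication_graph(trace):
    """Return edges (source instr, sink instr) for inter-thread communication."""
    last_writer = {}  # addr -> (thread, instr) of the most recent write
    edges = set()
    for access in trace:
        source = last_writer.get(access.addr)
        # An edge exists when the location was last written by another thread.
        if source is not None and source[0] != access.thread:
            edges.add((source[1], access.instr))
        if access.op == "W":
            last_writer[access.addr] = (access.thread, access.instr)
    return edges


# The buggy interleaving from the slides: Thread 2 reads SObj before it is set.
trace = [
    Access(1, "Ready = true", "W", "Ready"),
    Access(2, "if Ready == true", "R", "Ready"),
    Access(2, "myObj = SObj", "R", "SObj"),
    Access(1, "SObj = new p()", "W", "SObj"),
]
edges = communication_graph(trace)
print(edges)
```

In this trace the only edge runs from `Ready = true` to `if Ready == true`; the read of `SObj` has no remote writer yet, which is exactly the buggy situation.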
  22. Context-Aware Communication Graphs [MICRO ’09]

    Ready = true
    if Ready == true
    SObj = new p()
    myObj = SObj
    Context entries shown: Rem Wr, Rem Wr, Loc Rd, Loc Wr, Rem Rd, Rem Rd (Rem/Loc = remote/local, Rd/Wr = read/write)
    Communication context is a short history of recent communication events.
    Nodes are instances of instructions within their context.
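
One way to picture context-aware nodes is to keep a short per-thread history of communication events and attach a snapshot of it to every instruction instance. The sketch below is a simplification under assumptions: the event names follow the slide's Loc/Rem Rd/Wr labels, while the update rule, the `context_len` default of 5, and the function name are illustrative choices, not details taken from the talk.

```python
from collections import defaultdict, deque


def context_aware_edges(trace, context_len=5):
    """trace holds (thread, instr, op, addr) tuples; op is "R" or "W".

    Returns a list of edges whose endpoints are (instr, context) nodes.
    """
    last_writer = {}  # addr -> (thread, node) of the most recent write
    context = defaultdict(lambda: deque(maxlen=context_len))
    edges = []
    for thread, instr, op, addr in trace:
        # A node is an instruction paired with the context it executed in.
        node = (instr, tuple(context[thread]))
        source = last_writer.get(addr)
        if source is not None and source[0] != thread:
            edges.append((source[1], node))
            kind = "Rd" if op == "R" else "Wr"
            context[thread].append("Loc" + kind)     # this thread communicated
            context[source[0]].append("Rem" + kind)  # the other thread sees it as remote
        if op == "W":
            last_writer[addr] = (thread, node)
    return edges


trace = [
    (1, "Ready = true", "W", "Ready"),
    (2, "if Ready == true", "R", "Ready"),
    (1, "SObj = new p()", "W", "SObj"),
    (2, "myObj = SObj", "R", "SObj"),
]
edges = context_aware_edges(trace)
for source, sink in edges:
    print(source, "->", sink)
```

The same static instruction executed under a different recent history becomes a different node, which is what lets the graph distinguish otherwise identical edges.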
  23. Timestamped Context-Aware Communication Graphs

    Ready = true
    if Ready == true
    SObj = new p()
    myObj = SObj
    Context entries shown: Rem Wr, Rem Wr, Loc Rd, Loc Wr, Rem Rd, Rem Rd
    Timestamps shown: T=5, T=7, T=15, T=16
    Timestamps encode ordering of non-communicating nodes.
  24. Reconstructions

    A reconstruction is built around a single communication event from a single execution.
    Labels: Source, Sink
  25. Reconstructions

    A reconstruction is a time-ordered sequence of memory operations.
    Labels: Time, Memory Operation
  26. Reconstructions

    The regions of a reconstruction are computed using graph timestamps.
    Regions: Prefix, Body, Suffix
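
The three regions can be sketched as timestamp ranges around the source and sink of one communication edge. This is an assumption-laden illustration: the `window` bound on the prefix and suffix, the function name, and the `(timestamp, name)` node format are hypothetical and not taken from the slides.

```python
def build_reconstruction(nodes, source_ts, sink_ts, window=10):
    """Split timestamped nodes into prefix, body, and suffix around one edge.

    nodes is a list of (timestamp, name) pairs; window is a hypothetical
    bound on how far the prefix and suffix extend.
    """
    ordered = sorted(nodes)  # time-ordered sequence of memory operations
    return {
        "prefix": [n for t, n in ordered if source_ts - window <= t < source_ts],
        "body": [n for t, n in ordered if source_ts < t < sink_ts],
        "suffix": [n for t, n in ordered if sink_ts < t <= sink_ts + window],
    }


nodes = [(5, "a"), (7, "source"), (15, "b"), (16, "sink"), (18, "d"), (40, "c")]
recon = build_reconstruction(nodes, source_ts=7, sink_ts=16)
print(recon)
```

Here `a` falls in the prefix, `b` in the body, and `d` in the suffix, while `c` is too far from the sink to be included.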
  27. Multiple Buggy Executions

    Behavior differs across runs. Same edge, different reconstructions.
    Idea: focus on common behavior by combining multiple executions.
  28. Multiple Buggy Executions

    Labels: Buggy Execution #1, Buggy Execution #2
    Behavior differs across runs. Same edge, different reconstructions.
    Idea: focus on common behavior by combining multiple executions.
  29. Aggregate Reconstructions

    Buggy Execution #1 + Buggy Execution #2 = Aggregate Reconstruction
    Node labels: 50%, 50%, 50%, 50%, 100%
  30. Aggregate Reconstructions

    Node labels: 50%, 50%, 50%, 50%, 100%
    Aggregation focuses on typical behavior and deemphasizes rare behavior.
  31. Aggregate Reconstructions

    Aggregate Reconstruction node labels: 50%, 50%, 50%, 50%, 100%
    Aggregation focuses on typical behavior and deemphasizes rare behavior.
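
The percentages on these slides can be read as the fraction of buggy executions in which a node appears in a given region. A minimal sketch of that idea follows; the `aggregate` function and the dictionary layout are assumptions made for the example.

```python
from collections import Counter

REGIONS = ("prefix", "body", "suffix")


def aggregate(reconstructions):
    """Combine reconstructions of the same edge into per-node frequencies."""
    total = len(reconstructions)
    result = {}
    for region in REGIONS:
        counts = Counter()
        for recon in reconstructions:
            counts.update(set(recon[region]))  # count each node once per run
        result[region] = {node: n / total for node, n in counts.items()}
    return result


# Two hypothetical buggy executions that agree only on the body.
run1 = {"prefix": ["p1"], "body": ["b"], "suffix": ["s1"]}
run2 = {"prefix": ["p2"], "body": ["b"], "suffix": ["s2"]}
agg = aggregate([run1, run2])
print(agg)
```

With these two runs, four nodes end up at 50% and the shared body node at 100%, mirroring the labels on the slide.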
  32. Recon Workflow

    "Crashes when I run it!"
    Collect communication graphs from many executions.
    Label graphs as buggy or non-buggy.
    Build and aggregate reconstructions.
    Rank reconstructions and report them (1, 2, 3).
  34. Ranking Reconstructions

    Node labels: 50%, 50%, 50%, 50%, 100%
    Each reconstruction is described by a vector of numeric features, [B C R].
    Statistical inference on feature vectors ranks reconstructions.
    Label: BUG!
  35. Feature Definitions

    B: Buggy Frequency Ratio
    C: Context Variation Ratio
    R: Reconstruction Consistency
  36. Buggy Frequency Ratio

    Graph labels: buggy, buggy, non, non
    Buggy Frequency Ratio is large if the edge occurs often in buggy runs and rarely in non-buggy runs.
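
A ratio with this behavior can be sketched as the fraction of buggy runs containing the edge divided by the fraction of non-buggy runs containing it. The exact formula used by Recon is not given on the slide, so the function below, including the floor that avoids division by zero, is an assumption for illustration only.

```python
def buggy_frequency_ratio(buggy_with_edge, buggy_total,
                          nonbuggy_with_edge, nonbuggy_total):
    """Large when the edge is common in buggy runs and rare in non-buggy runs."""
    buggy_fraction = buggy_with_edge / buggy_total
    nonbuggy_fraction = nonbuggy_with_edge / nonbuggy_total
    # Hypothetical floor so that an edge never seen in non-buggy runs
    # yields a large finite value instead of a division by zero.
    floor = 1.0 / (nonbuggy_total + 1)
    return buggy_fraction / max(nonbuggy_fraction, floor)


suspicious = buggy_frequency_ratio(24, 25, 1, 25)   # mostly in buggy runs
ordinary = buggy_frequency_ratio(12, 25, 12, 25)    # equally common in both
print(suspicious, ordinary)
```

An edge that appears equally often in both kinds of runs scores about 1, while one concentrated in buggy runs scores much higher.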
  38. Reconstruction Consistency

    Node labels: 50%, 50%, 50%, 50%, 100%
    Reconstruction Consistency is high if the behavior is typical in buggy executions.
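
One plausible reading of this feature is the average of the per-node percentages in an aggregate reconstruction. The slide does not state the formula, so the averaging below, the function name, and the input layout are assumptions.

```python
def reconstruction_consistency(aggregate):
    """Mean node frequency across all regions of an aggregate reconstruction.

    aggregate maps a region name to {node: fraction of buggy runs}.
    """
    fractions = [f for region in aggregate.values() for f in region.values()]
    return sum(fractions) / len(fractions) if fractions else 0.0


# Hypothetical aggregate matching the slide's 50%/100% labels.
varied = {
    "prefix": {"p1": 0.5, "p2": 0.5},
    "body": {"b": 1.0},
    "suffix": {"s1": 0.5, "s2": 0.5},
}
# An aggregate in which every buggy run behaved the same way.
typical = {"prefix": {"p": 1.0}, "body": {"b": 1.0}, "suffix": {"s": 1.0}}

print(reconstruction_consistency(varied))
print(reconstruction_consistency(typical))
```

Under this reading, an aggregate whose nodes all appear in every buggy run scores 1.0, and disagreement between runs pulls the score down.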
  39. Ranking by Feature

    Labels: "A BUG!", "Not a BUG!"
    By design, higher feature values mean more likely buggy.
    A reconstruction’s rank is a linear combination of its features.
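
Ranking by a linear combination of features can be sketched as a weighted sum followed by a sort. The equal weights, the feature values, and the edge names below are hypothetical; the slide does not say which weights Recon uses.

```python
def rank_reconstructions(features, weights=(1.0, 1.0, 1.0)):
    """Order reconstructions by a weighted sum of their (B, C, R) features.

    features maps a reconstruction name to its (B, C, R) vector;
    the default equal weights are an assumption for illustration.
    """
    def score(vector):
        return sum(w * x for w, x in zip(weights, vector))

    # Higher feature values mean more likely buggy, so sort descending.
    return sorted(features, key=lambda name: score(features[name]), reverse=True)


features = {
    "edge_a": (1.0, 1.0, 0.6),   # looks like ordinary communication
    "edge_b": (24.0, 2.0, 0.9),  # frequent in buggy runs, consistent
}
ranking = rank_reconstructions(features)
print(ranking)
```

The reconstruction with the larger feature values is reported first.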
  40. Evaluating Recon’s Precision

    [Bar chart. Axis: Rank of Bug’s Reconstruction (0 to 10). Benchmarks, grouped as C/C++ and Java: logandswp, circlist, textreflow, jsstrlen, apache, mysql, pbzip2, aget, stringbuffer, vector, weblech. Series: 25 Buggy Runs. Using 25 non-buggy runs.]
  41. Evaluating Recon’s Precision

    [Bar chart. Axis: Rank of Bug’s Reconstruction (0 to 10). Benchmarks, grouped as C/C++ and Java: logandswp, circlist, textreflow, jsstrlen, apache, mysql, pbzip2, aget, stringbuffer, vector, weblech. Series: 25 Buggy Runs, 15 Buggy Runs. Using 25 non-buggy runs.]
  42. Evaluating Recon’s Precision

    [Bar chart. Axis: Rank of Bug’s Reconstruction (0 to 10). Benchmarks, grouped as C/C++ and Java: logandswp, circlist, textreflow, jsstrlen, apache, mysql, pbzip2, aget, stringbuffer, vector, weblech. Series: 25 Buggy Runs, 15 Buggy Runs, 5 Buggy Runs. Using 25 non-buggy runs. The chart also carries a value label of 34.]
  43. Evaluating Recon’s Performance

    [Bar chart. Axis: Slowdown (x), 0 to 25, with one bar labeled 79x beyond the axis range. Benchmarks, grouped as C/C++ and Java: apache, mysql, pbzip2, aget, PARSEC, weblech, DaCapo, Java Grande.]
  44. Evaluating Recon’s Performance

    [Bar chart. Axis: Slowdown (x), 0 to 25, with one bar labeled 79x beyond the axis range. Benchmarks, grouped as C/C++ and Java: apache, mysql, pbzip2, aget, PARSEC, weblech, DaCapo, Java Grande. Total Graph Collection Time labels: 27:32, 07:08, 1:51:56, 59:41, 13:36.]
  45. Recon

    Ready = true
    if Ready == true
    SObj = new p()
    myObj = SObj
    Label: BUG!
    Recon reconstructs execution fragments to help programmers understand their bugs.
    Recon uses statistical inference to identify reconstructions useful to understanding bugs.
  46. Reconstructions Signal-to-Noise

    [Bar chart. Axis: Fraction of Reconstruction Relevant to Bug (0 to 100), annotated "Awesome!" and "Lame!". Benchmarks, grouped as C/C++ and Java: logandswp, circlist, textreflow, jsstrlen, apache, mysql, pbzip2, aget, stringbuffer, vector, weblech.]