ColorSafe at ISCA 2010

A talk I gave at ISCA 2010 on ColorSafe, a new technique and computer architecture for automatically avoiding failures due to complex concurrent programming errors.

Brandon Lucia

May 28, 2012

Transcript

  1. ColorSafe: Architectural Support for Debugging and Avoiding Multi-variable Atomicity Violations

    Brandon Lucia, Luis Ceze and Karin Strauss Monday, May 28, 2012
  2. Concurrency Bugs: Scourge of the multi-threaded world

    Hard to find and fix, and cause bad things to happen! Caused by unforeseen thread interactions. (Figure: a multithreaded program with threads T1, T2, T3.)
  10. Dealing with Concurrency Bugs

    Panels: Bug Detection and Bug Avoidance, each showing threads T1 and T2.
    We can use architecture support for precise bug detection. Reusing the same support for avoidance makes it useful for the system’s lifetime.
  11. Atomicity Violations

    Program: T1 runs ctr = ctr+1; and T2 runs ctr = ctr+1;
    Initially: ctr = 0; Result: ctr = 2;
  12. Atomicity Violations

    Program: T1 runs ctr = ctr+1; and T2 runs ctr = ctr+1;
    Initially: ctr = 0; Result: ctr = 2;
    Execution:
      T1: t = ctr;
      T2: t = ctr; t = t + 1; ctr = t;
      T1: t = t + 1; ctr = t;
    Result: ctr = 1; Lost Update!
  13. Atomicity Violations

    Program: T1 runs ctr = ctr+1; and T2 runs ctr = ctr+1;
    Initially: ctr = 0; Result: ctr = 2;
    Execution:
      T1: t = ctr;
      T2: t = ctr; t = t + 1; ctr = t;
      T1: t = t + 1; ctr = t;
    This should be atomic, but it is not.
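
A minimal sketch of the lost update above, replaying the three-step expansion of ctr = ctr+1 under a fixed schedule instead of using real threads. The `run` helper, the step names, and the two schedules are illustrative assumptions, not part of the talk.

```python
def run(schedule):
    """Replay a schedule of (thread, step) pairs against a shared counter."""
    shared = {"ctr": 0}
    temps = {"T1": 0, "T2": 0}  # each thread's private temporary t

    def load(thread):   # t = ctr
        temps[thread] = shared["ctr"]

    def add(thread):    # t = t + 1
        temps[thread] += 1

    def store(thread):  # ctr = t
        shared["ctr"] = temps[thread]

    steps = {"load": load, "add": add, "store": store}
    for thread, step in schedule:
        steps[step](thread)
    return shared["ctr"]


# T1 finishes its increment before T2 starts.
serial = [("T1", "load"), ("T1", "add"), ("T1", "store"),
          ("T2", "load"), ("T2", "add"), ("T2", "store")]

# T2's whole increment lands between T1's read and T1's write.
racy = [("T1", "load"),
        ("T2", "load"), ("T2", "add"), ("T2", "store"),
        ("T1", "add"), ("T1", "store")]

print(run(serial))  # 2
print(run(racy))    # 1
```

In the racy schedule T1 overwrites T2's update with a value computed from a stale read, which is the lost update on the slide.
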
  15. Atomicity Violations

    Execution:
      T1: t = ctr;
      T2: t = ctr; t = t + 1; ctr = t;
      T1: t = t + 1; ctr = t;
    Unserializable Interleavings (figure labels): Wr Wr Rd Wr Rd Wr Rd Rd Wr Rd Wr Wr
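
A sketch of how a three-access interleaving can be classified. It assumes the usual single-variable classification from prior atomicity-violation work: two accesses by one thread with one remote access between them, of which four of the eight read/write combinations are unserializable. The transcript does not preserve how the slide grouped its labels, so the table below is an assumption rather than a copy of the figure.

```python
from itertools import product

# (first local access, interleaved remote access, second local access)
UNSERIALIZABLE = {
    ("Rd", "Wr", "Rd"),  # the two local reads see different values
    ("Wr", "Wr", "Rd"),  # the local read misses the local write
    ("Wr", "Rd", "Wr"),  # the remote read sees an intermediate value
    ("Rd", "Wr", "Wr"),  # the remote write is lost
}


def is_serializable(first_local, remote, second_local):
    """True if the interleaving is equivalent to some serial order."""
    return (first_local, remote, second_local) not in UNSERIALIZABLE


for pattern in product(("Rd", "Wr"), repeat=3):
    verdict = "serializable" if is_serializable(*pattern) else "UNSERIALIZABLE"
    print(" ".join(pattern), "->", verdict)
```

The lost update on the earlier slides is the Rd, Wr, Wr case: T1 reads ctr, T2 writes it, then T1 writes it.
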
  16. Multi-Variable Bugs

    Program:
      T1: str = “BUGS”; len = 4
      T2: s = str; l = len
    Initially: str = “”, len = 0
    Result: s = “BUGS” and l = 4, or s = “” and l = 0
  18. Multi-Variable Bugs

    Program:
      T1: str = “BUGS”; len = 4
      T2: s = str; l = len
    Initially: str = “”, len = 0
    Result: s = “BUGS” and l = 4, or s = “” and l = 0
    Execution: T1: str = “BUGS”; T2: s = str; T2: l = len; T1: len = 4
    Result: s = “BUGS” and l = 0
  19. Multi-Variable Bugs

    Program:
      T1: str = “BUGS”; len = 4
      T2: s = str; l = len
    Initially: str = “”, len = 0
    Execution: T1: str = “BUGS”; T2: s = str; T2: l = len; T1: len = 4
    Result: s = “BUGS” and l = 0
    Missing atomicity constraint. But single-variable access interleavings are serializable! We need a new analysis to find these bugs.
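
A small sketch that enumerates every interleaving of the two threads above and collects the results that neither serial order can produce. The helper names (`interleavings`, `outcomes`) and the dictionary standing in for memory are assumptions made for illustration.

```python
def interleavings(a, b):
    """Yield every merge of a and b that preserves each sequence's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def t1_write_str(mem): mem["str"] = "BUGS"
def t1_write_len(mem): mem["len"] = 4
def t2_read_str(mem): mem["s"] = mem["str"]
def t2_read_len(mem): mem["l"] = mem["len"]


T1 = [t1_write_str, t1_write_len]
T2 = [t2_read_str, t2_read_len]

# Results allowed if each thread's pair of accesses were atomic.
ATOMIC_RESULTS = {("BUGS", 4), ("", 0)}


def outcomes():
    """Run every interleaving from the initial state and collect (s, l)."""
    results = set()
    for schedule in interleavings(T1, T2):
        mem = {"str": "", "len": 0, "s": None, "l": None}
        for op in schedule:
            op(mem)
        results.add((mem["s"], mem["l"]))
    return results


print(sorted(outcomes() - ATOMIC_RESULTS))
```

Besides the s = “BUGS”, l = 0 result shown on the slide, the enumeration also reaches the mirror case s = “”, l = 4. In every schedule each variable on its own is written once and read once, so a per-variable check has nothing to flag.
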
  22. ColorSafe

    Example accesses: str = “ISCA”; len = 4; s = str; l = len
    Key Idea: Assign “colors” to related data and analyze serializability of accesses to colors.
    str and len are related, so they both map to blue.
  23. ColorSafe

    Key Idea: Assign “colors” to related data and analyze serializability of accesses to colors.
    str and len are related, so they both map to blue.
    In the “color-space”, these accesses (Write, Write, Read, Read) are unserializable. ColorSafe’s analysis finds multi-variable atomicity violations.
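
A sketch of the key idea, run once per variable and once per color on the same trace. The trace format, the `find_violations` helper, and the `COLOR_OF` mapping are assumptions for illustration. The unserializable-pattern table is the usual three-access classification and is also an assumption here.

```python
UNSERIALIZABLE = {
    ("Rd", "Wr", "Rd"), ("Wr", "Wr", "Rd"),
    ("Wr", "Rd", "Wr"), ("Rd", "Wr", "Wr"),
}

COLOR_OF = {"str": "blue", "len": "blue"}  # assumed: related data share a color


def find_violations(trace, unit_of):
    """trace: list of (thread, "Rd"/"Wr", variable) in execution order.

    unit_of maps a variable to the unit being analyzed (itself, or its color).
    Reports (unit, pattern) when a remote access to the same unit falls
    between two consecutive local accesses in an unserializable pattern.
    """
    found = []
    for i, (thread, first, var) in enumerate(trace):
        unit = unit_of(var)
        # Locate this thread's next access to the same unit.
        for j in range(i + 1, len(trace)):
            later_thread, second, later_var = trace[j]
            if later_thread == thread and unit_of(later_var) == unit:
                for other, kind, other_var in trace[i + 1:j]:
                    if (other != thread and unit_of(other_var) == unit
                            and (first, kind, second) in UNSERIALIZABLE):
                        found.append((unit, (first, kind, second)))
                        break
                break
    return found


trace = [("T1", "Wr", "str"), ("T2", "Rd", "str"),
         ("T2", "Rd", "len"), ("T1", "Wr", "len")]

per_variable = find_violations(trace, unit_of=lambda v: v)
per_color = find_violations(trace, unit_of=lambda v: COLOR_OF[v])
print(per_variable)  # []
print(per_color)     # [('blue', ('Wr', 'Rd', 'Wr'))]
```

Per variable, no thread touches the same variable twice, so nothing is reported. Once both variables map to blue, T1's two writes bracket a remote read of blue, which is an unserializable Wr, Rd, Wr pattern.
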
  24. Avoiding Bugs

    ColorSafe prevents unserializable interleavings by dynamically making sections of execution atomic and isolated. (Figure: interleaved Wr, Rd, Rd, Wr accesses marked Unserializable!)
  25. Avoiding Bugs

    ColorSafe prevents unserializable interleavings by dynamically making sections of execution atomic and isolated. (Figure: with a section marked Atomic!, the Wr, Rd, Rd, Wr accesses become Serializable.)
  26. Contributions of ColorSafe

    Color-space serializability analysis to find single- and multi-variable bugs.
    Bug detection and avoidance using the same hardware architectural support.
    Debugging Mode vs. Deployment Mode: Trading precision for proactive bug avoidance.
  27. Debugging vs. Deployment

    Debugging Mode: More Precise.
    Deployment Mode: Less Precise, but Proactive.
    (Figure: for each mode, the Rd/Wr access streams of Thread 1 and Thread 2 with “Serializable?” checks.)
  31. Architectural Support for ColorSafe

  36. Architectural Support

    Color Meta-Data Support (figure: addresses 0xA, 0x11, 0x12)
    Access History Support (figure: Rd Rd Wr Rd Rd)
    Remote Color Access Exchange Support (figure: Wr)
    Dynamic Atomicity Support (figure: Wr Wr)
  37. Color Meta-Data Support

    In-Memory Table Maps Memory Addresses to Color Meta-Data (figure: addresses 0xA, 0xB, 0xC, 0x10, 0x11, 0x12).
    Granularity of meta-data is determined by size of colored regions.
    ColorSafe needs general meta-data support, and can rely on mechanisms like MMP [ASPLOS ‘02] or Loki [OSDI ‘08].
    Color Lookaside Buffer caches address-to-color translations (figure: entries 0xA, 0x11, 0x12).
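
A software sketch of the two structures on this slide: a table from memory regions to colors, and a small cache in front of it. The class names, the fixed region size, the four-entry capacity, and the LRU replacement policy are all assumptions, since the slide does not specify them.

```python
from collections import OrderedDict


class ColorTable:
    """In-memory table: one color per fixed-size region (assumed layout)."""

    def __init__(self, region_size):
        self.region_size = region_size
        self.colors = {}  # region index -> color

    def set_color(self, addr, color):
        self.colors[addr // self.region_size] = color

    def lookup(self, addr):
        return self.colors.get(addr // self.region_size)  # None = uncolored


class ColorLookasideBuffer:
    """Small cache of region-to-color translations (LRU is an assumption)."""

    def __init__(self, table, entries=4):
        self.table = table
        self.entries = entries
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def color_of(self, addr):
        region = addr // self.table.region_size
        if region in self.cache:
            self.cache.move_to_end(region)  # mark as most recently used
            self.hits += 1
            return self.cache[region]
        self.misses += 1
        color = self.table.lookup(addr)     # fall back to the in-memory table
        self.cache[region] = color
        if len(self.cache) > self.entries:
            self.cache.popitem(last=False)  # evict the least recently used
        return color


table = ColorTable(region_size=8)
table.set_color(0x10, "blue")
table.set_color(0x18, "blue")
clb = ColorLookasideBuffer(table)
print(clb.color_of(0x10), clb.color_of(0x14), clb.color_of(0x18))
print(clb.hits, clb.misses)  # 1 2
```

A larger `region_size` shrinks the table but forces neighboring data to share a color, which is the granularity trade-off the slide mentions.
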
  40. Color Access History

    Each processor needs local and remote access histories.
    Read and Write sets are maintained separately.
    History quantized into epochs by hash encoding many accesses into a signature.
    (Figure: Local and Remote histories over Time, each with Read and Write sets.)
  43. Color Access History

    History Buffer (figure: Local and Remote, Read and Write signatures over Time, with one Epoch highlighted).
    Signature File: FIFO of Bloom-filter-based hardware signatures.
    History Item: The set of all four signatures for an epoch.
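
A software sketch of the history buffer: a Bloom-filter signature, a history item holding four of them, and a FIFO of history items. The signature width, the two hash functions derived from SHA-256, and ending an epoch after a fixed number of accesses are assumptions chosen for illustration. The hardware parameters are not given in the transcript.

```python
import hashlib
from collections import deque

SIG_BITS = 64  # assumed signature width


def _bit_positions(color, k=2):
    """Derive k bit positions for a color from one SHA-256 digest."""
    digest = hashlib.sha256(str(color).encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % SIG_BITS
            for i in range(k)]


class Signature:
    """Bloom filter over colors: may give false positives, never false negatives."""

    def __init__(self):
        self.bits = 0

    def insert(self, color):
        for pos in _bit_positions(color):
            self.bits |= 1 << pos

    def may_contain(self, color):
        return all((self.bits >> pos) & 1 for pos in _bit_positions(color))


class HistoryItem:
    """The four signatures for one epoch: local/remote x read/write."""

    def __init__(self):
        self.sigs = {(who, kind): Signature()
                     for who in ("local", "remote")
                     for kind in ("Rd", "Wr")}


class HistoryBuffer:
    """FIFO of history items; the oldest epoch is dropped when full."""

    def __init__(self, depth=4, epoch_len=3):
        self.items = deque([HistoryItem()], maxlen=depth)
        self.epoch_len = epoch_len  # accesses per epoch (assumed policy)
        self.count = 0

    def record(self, who, kind, color):
        if self.count == self.epoch_len:
            self.items.append(HistoryItem())  # start a new epoch
            self.count = 0
        self.items[-1].sigs[(who, kind)].insert(color)
        self.count += 1


hb = HistoryBuffer(depth=2, epoch_len=2)
for access in [("local", "Rd", "blue"), ("remote", "Wr", "blue"),
               ("local", "Wr", "red"), ("local", "Rd", "red"),
               ("remote", "Rd", "green")]:
    hb.record(*access)
print(len(hb.items))  # 2
```

Five accesses at two per epoch need three epochs, so with a depth of two the first epoch has already been evicted.
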
  45. Detecting Atomicity Violations

    Processor 1’s History Buffer (figure: Local and Remote, Read and Write signatures).
    Intersection is just bitwise AND of signatures.
    Checks performed at... Debugging mode: every instruction. Deployment mode: end of epochs.
    (Figure: the intersection of Local Write, Local Write, and Remote Read signatures shows an unserializable access to a color.)
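
A sketch of the intersection check using plain integers as signatures. To keep the result exact, each color gets its own bit, which is an assumption: hashed Bloom-filter signatures can alias and report extra colors. The `unserializable_colors` helper and the way the three operands are chosen are also assumptions about how the check could be organized.

```python
COLOR_BIT = {"blue": 0, "red": 1, "green": 2}  # assumed one-bit-per-color encoding

# (earlier local access, remote access, later local access)
PATTERNS = [("Rd", "Wr", "Rd"), ("Wr", "Wr", "Rd"),
            ("Wr", "Rd", "Wr"), ("Rd", "Wr", "Wr")]


def signature(*colors):
    """Build a signature with one bit set per color."""
    bits = 0
    for color in colors:
        bits |= 1 << COLOR_BIT[color]
    return bits


def unserializable_colors(earlier_local, remote, later_local):
    """Each argument maps "Rd"/"Wr" to a signature.

    Returns a signature of colors touched in an unserializable pattern,
    computed with nothing but bitwise AND and OR.
    """
    suspects = 0
    for first, middle, last in PATTERNS:
        suspects |= earlier_local[first] & remote[middle] & later_local[last]
    return suspects


earlier_local = {"Rd": 0, "Wr": signature("blue")}
remote = {"Rd": signature("blue", "red"), "Wr": 0}
later_local = {"Rd": signature("green"), "Wr": signature("blue")}

result = unserializable_colors(earlier_local, remote, later_local)
print(result == signature("blue"))  # True
```

Here only blue appears in a local write, a remote read, and a later local write, so it is the only color reported.
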
  48. Avoiding Atomicity Violations

    Hazard Color Set holds set of suspicious colors. Implemented as a signature.
    On accesses to colors in the Hazard Color Set, a processor starts an Ephemeral Transaction.
    Ephemeral transactions prevent interleaving of subsequent instructions, preventing atomicity violations.
    (Figure: an unserializable access to a color feeds the Hazard Color Set; access stream Wr str, Wr len, Rd Foo, Rd Bar, Rd Baz, Rd Foo.)
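
A toy model of the avoidance step. It only captures the scheduling effect: an access to a hazard color makes that thread's next few accesses run without interleaving. Conflict detection, rollback, and re-execution are left out, and the round-robin scheduler, the `et_length` parameter, and the Python set standing in for the signature are all assumptions.

```python
def execute(threads, color_of, hazard_colors, et_length=2):
    """Interleave threads one access per turn, round-robin.

    threads: dict of thread name -> list of (kind, variable) accesses.
    An access whose color is in hazard_colors opens an ephemeral
    transaction: that thread runs et_length accesses back to back.
    Returns the resulting trace of (thread, kind, variable).
    """
    pending = {name: list(ops) for name, ops in threads.items()}
    order = list(threads)
    trace = []
    turn = 0
    while any(pending.values()):
        name = order[turn % len(order)]
        turn += 1
        ops = pending[name]
        if not ops:
            continue
        burst = 1
        if color_of.get(ops[0][1]) in hazard_colors:
            burst = et_length  # start an ephemeral transaction
        for _ in range(min(burst, len(ops))):
            trace.append((name,) + ops.pop(0))
    return trace


threads = {"T1": [("Wr", "str"), ("Wr", "len")],
           "T2": [("Rd", "str"), ("Rd", "len")]}
color_of = {"str": "blue", "len": "blue"}

unprotected = execute(threads, color_of, hazard_colors=set())
protected = execute(threads, color_of, hazard_colors={"blue"})
print([t for t, _, _ in unprotected])  # ['T1', 'T2', 'T1', 'T2']
print([t for t, _, _ in protected])    # ['T1', 'T1', 'T2', 'T2']
```

With blue in the hazard set, T1's writes to str and len are no longer split by T2's reads, so the color-space interleaving becomes serializable.
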
  49. Coloring Data

    Manual Coloring: Requires annotations. Based on bug report + application knowledge. More precise. Primarily useful for bug detection.
    Automatic Coloring: Fully Automatic (e.g., at malloc calls). Preemptively infers likely data relationships. Less precise. Primarily useful for bug avoidance.
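
A sketch of automatic coloring at allocation time, assuming the simplest policy: every allocation gets a fresh color, so all bytes of one allocation share it. The `ColoringAllocator` class, its bump-pointer address assignment, and the integer colors are hypothetical stand-ins for a real malloc wrapper.

```python
import itertools


class ColoringAllocator:
    """Toy bump allocator that assigns one new color per allocation."""

    def __init__(self, base=0x1000):
        self.next_addr = base
        self.next_color = itertools.count(1)
        self.regions = []  # (start, end, color)

    def malloc(self, size):
        start = self.next_addr
        self.next_addr += size
        self.regions.append((start, start + size, next(self.next_color)))
        return start

    def color_of(self, addr):
        for start, end, color in self.regions:
            if start <= addr < end:
                return color
        return None  # address was never allocated, so it has no color


heap = ColoringAllocator()
record = heap.malloc(16)  # e.g. a struct holding both a string pointer and a length
other = heap.malloc(8)

print(heap.color_of(record) == heap.color_of(record + 8))  # True
print(heap.color_of(record) == heap.color_of(other))       # False
```

Fields that live in one allocation are treated as related without any annotation. Related data in separate allocations are missed, and unrelated fields in one allocation are grouped, which matches the “less precise” label on the slide.
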
  50. Evaluating ColorSafe

  51. Experimental Methodology

    Modeled color table, history buffer, serializability analysis logic, ephemeral transactions.
    Experimented with manual and malloc coloring.
    ColorSafe simulator built using Pin.
    Used bug kernel benchmarks as well as full applications (AGet, Apache, MySQL).
  52. Bug Detection Results

    (Bar chart: # of Reports (w/ FPs) per Benchmark for Apache, AGet, and MySQL, comparing Debugging Mode with Debugging Mode + Post Processing; axis from 0 to 40, with one bar labeled 677.)
    Relatively few code locations reported to developers.
    Using simple invariant-based processing, dramatic reduction in false positives.
  53. Bug Avoidance Results

    (Bar chart: Percent of dynamic atomicity violations avoided, axis from 0 to 100, for bug kernels extracted from Mozilla (nsTextFrame, macNetDriver, jsStringLength, jsInterpreter, msgPane) and for full applications (Apache, AGet, MySQL).)
  54. ColorSafe

    New analysis that finds single- and multi-variable bugs.
    Finds and avoids bugs in real software.
    Hardware support with precise or proactive detection.
    Bug avoidance justifies hardware support and enables us to cope with inevitably broken multi-threaded software.
  55. Anything Questionable?

  56. Performance

    (Bar chart: % Dyn. Instr. Re-Executed Due to Useless ET Conflicts, axis from 0 to 4.0, for Apache, AGet, MySQL, and Mean.)