Conflict Exceptions at ISCA 2010

Brandon Lucia
May 28, 2012

A talk I gave at ISCA 2010 on Conflict Exceptions, a new technique and computer architecture for treating data-races as fail-stop exceptions.

Transcript

  1. Conflict Exceptions: Simplifying Concurrent Language Semantics with Precise Hardware Exceptions for Data-Races

    Brandon Lucia, Luis Ceze, Karin Strauss, Shaz Qadeer and Hans-J. Boehm. Monday, May 28, 2012
  2. Data-Races are Trouble

    Complicated language specifications. Usually incorrect, and difficult to debug. Negative impact on system reliability.
  3. What If...

    Fail-Stop Semantics for Data-Races: when a data-race occurs, throw an exception. Semantics are clear and simple. Better data-race debugging. Safety: races can’t cause problems.
  6. Prior Work

    Happens-Before [Elmas’07, Flanagan’09]: performance ✗, precision ✓. Approx. Methods [Savage’97, Zhou’07, Yu’05]: performance ✓, precision ✗. Conflict Exceptions [ISCA ’10]: performance ✓, precision ✓.
  12. Conflict Exceptions

    [Diagram: Thread 1 and Thread 2 executing Acquire/Release pairs on locks K, L and M, with the accesses Rd Y, Wr X, Rd T, Wr T, Rd Y, Wr Y, Rd X and Wr Y between them. Labels: Synchronization-Free Regions; Conflict!; Exception Delivered Here; Undetected Race.]

    Precisely detect only races that can affect consistency. The guarantee: Exception thrown? There was a data-race. Exception-free? Sequential consistency. Ignoring unimportant races is key to performance.
  13. Language Level Benefits

    Reordering in SFRs is legal. Granularity independence. Exception-free executions are SC.

    [Diagram labels: Acquire(K), Rd Y, Wr X, Release(K); Acquire(K), Wr64_Low X, Wr64_Hi X, Release(K); two regions each with Acq(K), Rd X, Wr X, Rel(K), marked ✓.]
  14. Language Level Benefits

    Programming is the same. Racy programs are well-behaved. Race semantics are simpler.

    [Diagram labels: pthread_lock(K), Rd Y, Wr X, pthread_unlock(K); Wr Q, Wr Z; Acq(K), Rd X, Wr X; Acq(L), Rd X, marked !.]
  15. Debugging and Reliability

    Concurrent, conflicting SFRs throw exceptions. All races have some exceptional schedule. Exception handling: log + recover. Damage control: shut down buggy module.

    [Diagram labels: Acq(K), Rd X, Wr X; Acq(L), Rd X, marked !.]
  17. Hardware/Software Interface

    New instructions: BeginRegion and EndRegion. Synchronization operations are singleton regions. Exceptions are thrown precisely before the conflicting instruction.

    [Diagram: the instructions Rd Y, Wr X, Rd T, Wr T, Acquire(K) and Release(K), bracketed by BeginRegion/EndRegion pairs.]
  18. Access Monitoring

    Byte-granular access information is required. Each N-byte cache line carries N-bit access bits: Local Read, Local Write, Remote Read, Remote Write, plus a line-level Supplied bit. Exception test: compare appropriate local and remote bits.
  19. Coherence Support

    Read coherence actions: CPU 1 and CPU 2 exchange a Read Request and a Read Reply (labels: Local Write Bits, Remote Write Bits). Write coherence actions: CPU 1 and CPU 2 exchange a Write/Invalidate and an Invalidate Ack (labels: Local Write Bits, Local Read Bits).
  21. Ending a Region

    For all supplied lines: an End-Of-Region message (address, Local Write Bits, Local Read Bits) passes between CPU 1 and CPU 2; the receiver clears the remote bits specified in the EOR message; an End-Of-Region Ack is returned.
  22. Putting It Together (slides 22 to 41 are successive animation builds of one diagram)

    [Diagram: CPU 1’s cache and CPU 2’s cache, each showing LR, LW, RR and RW access bits over columns labeled A, B, C, D, plus a Sup bit. Instructions shown across CPU 1’s code and CPU 2’s code: Wr A, Rd C, Wr C, three BeginRegion and one EndRegion.]

    Labels added across the builds, in order: Rd Req, Rd Reply, EoR, Invalidate, Inv Ack, Exception!
  42. Out-Of-Cache Operation

    Per-thread local table tracks evicted accessed addresses. Per-process global table stores evicted lines’ access bits. EoR messages for regions with evictions are expensive.

    [Diagram: CPU 1 and CPU 2 each hold a Local Table Ptr and a Global Table Ptr; main memory holds Local Table 1, Local Table 2 and the Global Table.]
  43. Evaluation

    Protocol verified with Zing model checker. Simulator built using SESC and Pin. Evaluated using PARSEC, MySQL and Apache.
  44. Overheads

    [Bar chart: % traffic overhead (axis 0 to 8) for x264, ferret, canneal, facesim, vips, dedup, fluidanimate and the mean.] ~5% traffic overhead on average.
  45. Performance Impact

    [Bar chart: % in-memory access bit lookups (axis 0 to 1.5) for streamcluster, canneal, x264, freqmine, Apache, blackscholes, MySQL and the mean.] Costly access bit lookups are very infrequent: 1.5% in the worst case.
  46. Conflict Exceptions

    Simplified language specifications. Easier to debug data races. Limit damage caused by race bugs. When a data-race occurs, throw an exception.
  47. Also In The Paper!

    Programming model suitability analysis. More in-depth performance characterization. Formal proof that exception-free executions are SC. Further protocol implementation details.
  49. Memory Overhead

    [Bar chart: % memory overhead (axis 0 to 30) for ferret, facesim, x264, canneal, fluidanimate, dedup, blackscholes and the mean.]
  50. Suitability

    [Bar chart: % lines of code with exceptions (axis 0 to 0.10) for Apache, ferret, bodytrack, facesim, x264, fluidanimate, blackscholes and the mean.]
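
As a rough illustration of the Access Monitoring, Coherence Support and Ending a Region slides, the sketch below models the four per-byte access-bit vectors of a single cache line shared by two cores. It is a simplified software model, not the hardware protocol itself. The names `LineBits`, `access`, `end_of_region`, `ConflictException` and the 8-byte `LINE_BYTES` are all hypothetical, and the model assumes that every access exchanges coherence messages with the other core, which real hardware would only do on misses and upgrades.

```python
from dataclasses import dataclass

LINE_BYTES = 8  # hypothetical line size N, kept small for readability


@dataclass
class LineBits:
    """One core's view of a line: four N-bit vectors, one bit per byte."""
    local_read: int = 0
    local_write: int = 0
    remote_read: int = 0
    remote_write: int = 0


class ConflictException(Exception):
    """Raised before an access that conflicts with the other core's open region."""


def byte_mask(offset: int, size: int) -> int:
    """Bit mask covering bytes [offset, offset + size) of the line."""
    if offset < 0 or size <= 0 or offset + size > LINE_BYTES:
        raise ValueError("access falls outside the line")
    return ((1 << size) - 1) << offset


def access(me: LineBits, other: LineBits, offset: int, size: int,
           is_write: bool) -> None:
    """Perform one access by `me`, raising before it if it conflicts."""
    # Coherence step (assumed to happen on every access in this model):
    # a read reply carries the other core's local write bits; an
    # invalidate ack carries its local read and local write bits.
    me.remote_write |= other.local_write
    if is_write:
        me.remote_read |= other.local_read

    # Exception test: a read conflicts with remote writes, a write
    # conflicts with remote reads or remote writes, byte by byte.
    mask = byte_mask(offset, size)
    conflicting = me.remote_write | (me.remote_read if is_write else 0)
    if mask & conflicting:
        raise ConflictException(f"conflict on byte mask {bin(mask)}")

    # No conflict: record the access in the local bits.
    if is_write:
        me.local_write |= mask
    else:
        me.local_read |= mask


def end_of_region(ender: LineBits, other: LineBits) -> None:
    """End `ender`'s region: `other` clears the remote bits named in the
    end-of-region message, then `ender` resets its own local bits."""
    other.remote_read &= ~ender.local_read
    other.remote_write &= ~ender.local_write
    ender.local_read = 0
    ender.local_write = 0
```

In this model, a write to byte 0 by one core followed by a read of byte 4 by the other raises nothing, because the bits are tracked per byte. A read of byte 0 by the other core raises `ConflictException` until the writer calls `end_of_region`. With more than two cores the clearing step would need to track which core set each remote bit, which this sketch leaves out.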