
Data Provenance Tracking for Concurrent Programs

Brandon Lucia
February 12, 2015

A talk on Last Writer Slicing and Communication Traps that I gave at CGO 2015.

Transcript

  1. Data Provenance Tracking for Concurrent Programs

    Brandon Lucia | Carnegie Mellon University, Dept. of ECE. Work done in cooperation with Luis Ceze @ University of Washington, Dept. of CSE.
  2. Multithreaded software is difficult to write

    Need to think about many threads instead of one.
  3. Threads interact via shared memory

    Reasoning about concurrent shared accesses is hard. [Diagram: Memory]
  4. Behavior can change from one execution to the next

    [Diagram: Memory]
  5. Some behavior is bad, like a crash or hang

    Key problem: understanding why bad things happen. [Diagram: Memory]
  6. Typically debug with a core dump

    [Diagram: ptr = NULL. This thread crashed at “assert(ptr != NULL)”.]
  7. Typically debug with a core dump

    Core dump tells us what happened, not why.
    [Diagram: ptr = NULL. This thread crashed at “assert(ptr != NULL)”. WHY!?!]
  8. Our work answers the “why?” question

    Last Writer Slices (LWS) record each value’s provenance.
    [Diagram: ptr = NULL. This thread crashed at “assert(ptr != NULL)”. LWS: this thread set x=NULL right here.]
  9. Bonus: provenance reveals communication

    Communication Traps (CTraps): custom communication handlers.
    [Diagram: LWS: this thread wrote x here… Check LWS… Communication! Reader != last writer.]

    CT_Handler(…){
      build_c_graph();
      check_atomicity();
      coop_bug_iso();
    }
  10. LWS and CTraps

    LWS: tracks data provenance at runtime with low overhead.
    CTraps: executes application-specific handlers when threads communicate.
    [Diagram labels: Multi-threaded Execution; Memory; informs of writes; informs of communication]
    Debugging: programmer examines provenance via LWS.
    Analysis: arbitrary concurrency analyses via CTraps.
    Efficiency: overheads low enough for production use.
  11. append()

    Shared variables: len (length of string), str (string buffer).

    append():
      len = len+1
      realloc(str,len)
      str[len-1] = ‘a’
  12. append()

    Shared variables: len (length of string), str (string buffer). State after each step:

      (start)             len = 0, str = []
      len = len+1         len = 1, str = []
      realloc(str,len)    len = 1, str = [_]
      str[len-1] = ‘a’    len = 1, str = [a]
  13. append()

    Shared variables: len (length of string), str (string buffer).

    append():
      len = len+1
      realloc(str,len)
      str[len-1] = ‘a’    Crash: str[len-1] out of bounds
  14. append()

    Shared variables: len (length of string), str (string buffer).

    append():
      len = len+1
      realloc(str,len)
      str[len-1] = ‘a’    Crash: str[len-1] out of bounds

    Programmer: “This must be wrong”
  15. append(), append(), erase()

    Shared variables: len (length of string), str (string buffer).

    append():              append():              erase():
      len = len+1            len = len+1            len = len -1
      realloc(str,len)       realloc(str,len)
      str[len-1] = ‘a’       str[len-1] = ‘a’

    Programmer: “One of these must be wrong”
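The slides do not say which interleaving caused the crash. As an illustration only, the sketch below replays one candidate deterministically in a single thread: erase() runs between the realloc and the store of append(). The step functions (`append_grow_len`, `append_realloc`, `append_store`), `reset`, and the two `replay_*` functions are hypothetical names introduced here, and the out-of-bounds access is reported as a return value instead of actually crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Shared variables from the slides. */
static size_t len = 0;   /* length of string */
static char *str = NULL; /* string buffer */

/* append() split into its three steps so an interleaving can be replayed. */
static void append_grow_len(void) { len = len + 1; }

static int append_realloc(void) {
    char *p = realloc(str, len);
    if (p == NULL) return -1;
    str = p;
    return 0;
}

/* Returns 0 on success, -1 where the real program would index out of bounds. */
static int append_store(char c) {
    if (len == 0) return -1; /* str[len-1] has no valid index when len is 0 */
    str[len - 1] = c;
    return 0;
}

static void erase(void) { len = len - 1; }

static void reset(void) { free(str); str = NULL; len = 0; }

/* One append() running alone: the store succeeds. */
int replay_serial(void) {
    reset();
    append_grow_len();
    if (append_realloc() != 0) return -2;
    return append_store('a');
}

/* erase() from a second thread lands between realloc and the store. */
int replay_interleaved(void) {
    reset();
    append_grow_len();
    if (append_realloc() != 0) return -2;
    erase();
    return append_store('a');
}
```

Calling `replay_serial()` returns 0, while `replay_interleaved()` returns -1: the store in append() is the line that fails, but the write that made `len` wrong came from erase(), which a core dump alone would not show.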
  16. Last Writer Slices

    Last Writer Slices track data provenance: the thread & code point that last wrote len.
    [Diagram: append(): len = len+1; realloc(str,len); str[len-1] = ‘a’. Shared variables: len (length of string), str (string buffer).]
  17. Last Writer Table

    [Diagram: threads T1 and T2 each run append(): len = len+1; realloc(str,len); str[len-1] = ‘a’. Operations carry code point labels A to F. Legend: X = Read Operation, Y = Write Operation.]
    Last Writer Table columns: Var | Thread | Code Pt. [Diagram label: Last Writer Slice]
  18. Last Writer Table

    [Same T1/T2 append() diagram as slide 17.]
    Last Writer Table: Var = len, Thread = T1, Code Pt. = B. Update.
  19. Last Writer Table

    [Same T1/T2 append() diagram as slide 17.]
    Last Writer Table: Var = len, Thread = T2, Code Pt. = B. Update.
  20. Last Writer Table

    [Same T1/T2 append() diagram as slide 17.]
    Last Writer Table: Var = len, Thread = T2, Code Pt. = B. Crash.
  21. Last Writer Table

    [Same T1/T2 append() diagram as slide 17.]
    Last Writer Table: Var = len, Thread = T2, Code Pt. = B. Breakpoint.
  22. Last Writer Table

    [Same T1/T2 append() diagram as slide 17.]
    Last Writer Table: Var = len, Thread = T2, Code Pt. = B.
    Reads are free for LWS.
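The table walkthrough above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the talk's implementation: the direct-mapped layout, the slot count, the hash, and the names `lwt_entry`, `lwt_update`, and `lwt_lookup` are all hypothetical. Colliding addresses simply overwrite each other here, and, following slide 27, the table adds no synchronization of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LWT_SLOTS 1024 /* hypothetical table size */

typedef struct {
    const void *addr; /* variable that was written (NULL = empty slot) */
    int thread;       /* thread that last wrote it */
    char code_pt;     /* code point of that write, e.g. 'B' */
} lwt_entry;

static lwt_entry lwt[LWT_SLOTS];

static size_t lwt_slot(const void *addr) {
    return (size_t)(((uintptr_t)addr >> 2) % LWT_SLOTS);
}

/* Called on every instrumented write: overwrite the entry for addr. */
void lwt_update(const void *addr, int thread, char code_pt) {
    lwt_entry *e = &lwt[lwt_slot(addr)];
    e->addr = addr;
    e->thread = thread;
    e->code_pt = code_pt;
}

/* Called by a debugger or analysis; ordinary reads never touch the table. */
const lwt_entry *lwt_lookup(const void *addr) {
    const lwt_entry *e = &lwt[lwt_slot(addr)];
    return e->addr == addr ? e : NULL;
}
```

Replaying slides 18 to 20 with this sketch means calling `lwt_update(&len, 1, 'B')`, then `lwt_update(&len, 2, 'B')`; a lookup at the crash then reports thread 2 and code point B. Only writes pay for the update, which matches "Reads are free for LWS".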
  23. CTraps

    [Same T1/T2 append() diagram as slide 17. Last Writer Table: Var = len, Thread = T2, Code Pt. = B.]
    Key idea: Different thread in the LWT? Threads are communicating.
  24. CTraps allows communication handlers

    Handlers implement arbitrary communication analysis [Lucia, MICRO ’09; PLDI ’11; Shi, OOPSLA ’10; Gao, SC ’07].
    [Diagram: append(): len = len+1; realloc(str,len); str[len-1] = ‘a’]

    CT_Handler(current_code_pt,
               current_thread,
               LWS_code_pt,
               LWS_thread,
               mem_addr){
      add_comm_graph_edge(current_code_pt,
                          LWS_code_pt);
    }
  25. CTraps allows communication handlers

    Handlers implement arbitrary communication analysis [Lucia, MICRO ’09; PLDI ’11; Shi, OOPSLA ’10; Gao, SC ’07].
    [Diagram: append(): len = len+1; realloc(str,len); str[len-1] = ‘a’]

    CT_Handler(current_code_pt,
               current_thread,
               LWS_code_pt,
               LWS_thread,
               mem_addr){
      add_comm_graph_edge(current_code_pt,
                          LWS_code_pt);
    }

    Table shown with the handler:
      E  B  230
      A  B  1024
      D  B  950
      C  F  2000
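The handler on the slide can be fleshed out into compilable C along the following lines. The parameter names come from the slide; everything else is an assumption made for illustration: the parameter types, the fixed set of six code points, the count matrix used as the graph, and the helper `comm_edge_count`. The slide's table looks like per-edge counts, which is why this sketch counts how often each (reader, last writer) pair is seen.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CODE_PTS 6 /* code points 'A'..'F', as labeled in the slides */

/* comm_graph[r][w]: times code point r accessed data last written at w. */
static unsigned comm_graph[NUM_CODE_PTS][NUM_CODE_PTS];

static int code_pt_index(char code_pt) {
    return (code_pt >= 'A' && code_pt < 'A' + NUM_CODE_PTS) ? code_pt - 'A' : -1;
}

void add_comm_graph_edge(char current_code_pt, char lws_code_pt) {
    int r = code_pt_index(current_code_pt);
    int w = code_pt_index(lws_code_pt);
    if (r >= 0 && w >= 0) comm_graph[r][w]++;
}

/* Same parameter list as the slide; the types are assumed. */
void CT_Handler(char current_code_pt, int current_thread,
                char LWS_code_pt, int LWS_thread, const void *mem_addr) {
    (void)current_thread; /* unused by this particular analysis */
    (void)LWS_thread;
    (void)mem_addr;
    add_comm_graph_edge(current_code_pt, LWS_code_pt);
}

/* Query helper for inspecting the collected graph. */
unsigned comm_edge_count(char reader_code_pt, char writer_code_pt) {
    int r = code_pt_index(reader_code_pt);
    int w = code_pt_index(writer_code_pt);
    return (r >= 0 && w >= 0) ? comm_graph[r][w] : 0;
}
```

After two calls such as `CT_Handler('E', 1, 'B', 2, NULL)`, `comm_edge_count('E', 'B')` returns 2. A different analysis would keep the same handler signature and replace only the body.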
  26. Last Writer Slicing & CTraps Implementation

    LWS & CTraps Compiler:
      @ Write: update LWT; call CTraps Handler
      @ Read: call CTraps Handler
      Link program to runtimes
    LWS Runtime Library:
      LWT
      Expose communication to CTraps
      Send LWS + core dump to debugger (GDB)
    CTraps Runtime Library:
      Maintain list of CT_Handlers
      Call CT_handlers on communication, e.g. c_graph()
    Last Writer Table (Code Pt. | Var | Thread):
      B  len  T2
      A  str  T17
      C  foo  T9
      D  bar  T6
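To make the division of labor concrete, here is a rough sketch of what the compiler-inserted calls and the two runtime libraries might look like when collapsed into one file. None of these names (`hook_read`, `hook_write`, `ct_register`, `count_handler`) come from the talk. The table layout and the choice to check for communication before updating the table on a write are also assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*ct_handler_fn)(int cur_code_pt, int cur_thread,
                              int lws_code_pt, int lws_thread,
                              const void *mem_addr);

#define MAX_HANDLERS 8
#define LWT_SLOTS 256

typedef struct { const void *addr; int thread; int code_pt; } lwt_entry;

static ct_handler_fn handlers[MAX_HANDLERS]; /* list of CT_Handlers */
static int num_handlers;
static lwt_entry lwt[LWT_SLOTS];             /* Last Writer Table */

int ct_register(ct_handler_fn h) {
    if (num_handlers == MAX_HANDLERS) return -1;
    handlers[num_handlers++] = h;
    return 0;
}

static lwt_entry *slot_for(const void *addr) {
    return &lwt[((uintptr_t)addr >> 2) % LWT_SLOTS];
}

/* Communication: the accessing thread differs from the last writer. */
static void check_comm(const void *addr, int thread, int code_pt) {
    lwt_entry *e = slot_for(addr);
    if (e->addr != addr || e->thread == thread) return;
    for (int i = 0; i < num_handlers; i++)
        handlers[i](code_pt, thread, e->code_pt, e->thread, addr);
}

/* Inserted at each read: only the communication check. */
void hook_read(const void *addr, int thread, int code_pt) {
    check_comm(addr, thread, code_pt);
}

/* Inserted at each write: communication check, then LWT update. */
void hook_write(const void *addr, int thread, int code_pt) {
    lwt_entry *e = slot_for(addr);
    check_comm(addr, thread, code_pt);
    e->addr = addr;
    e->thread = thread;
    e->code_pt = code_pt;
}

/* Example handler: counts communication events. */
int comm_events;
void count_handler(int cur_code_pt, int cur_thread,
                   int lws_code_pt, int lws_thread, const void *mem_addr) {
    (void)cur_code_pt; (void)cur_thread;
    (void)lws_code_pt; (void)lws_thread; (void)mem_addr;
    comm_events++;
}
```

With `count_handler` registered, a write by thread 1 followed by a read from thread 1 triggers nothing, while a subsequent read from thread 2 fires the handler once. Code points are plain integers here; a real compiler pass would presumably derive them from source locations.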
  27. Caveat: LWS Implementation & Data-races

    LWT adds no synchronization. Correct for DRF programs (may be incorrect for racy programs).
    [Diagram: len = len+1; realloc(str,len); str[len-1] = ‘a’, instrumented with Update_LWT(len) and Update_LWT(str).]
    [Diagram: one thread runs len = len+1; Update_LWT(len); Release(). Another runs Lock(); len = len+1; Update_LWT(len). The two are ordered.]
    Program synchronization keeps LWT consistent.
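A small pthreads sketch of the data-race-free case: the table update sits inside the program's own critical section, so it inherits the ordering the lock already provides and needs no synchronization of its own. The one-entry table `lwt_len_thread`, the iteration count, and the function names are assumptions for illustration. Removing the lock would make both `len` and the table entry racy, which is the case the slide warns about.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERS 100000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int len;            /* shared variable, protected by lock */
static int lwt_len_thread; /* one-entry LWT for len: a plain int, no atomics */

static void *worker(void *arg) {
    int tid = *(int *)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&lock);   /* Lock() */
        len = len + 1;
        lwt_len_thread = tid;        /* Update_LWT(len) */
        pthread_mutex_unlock(&lock); /* Release() */
    }
    return NULL;
}

/* Runs two writer threads; returns the final len, or -1 on thread errors. */
int run_two_writers(void) {
    int t1 = 1, t2 = 2;
    pthread_t a, b;
    len = 0;
    lwt_len_thread = 0;
    if (pthread_create(&a, NULL, worker, &t1) != 0) return -1;
    if (pthread_create(&b, NULL, worker, &t2) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return len;
}

int last_writer_of_len(void) { return lwt_len_thread; }
```

Because the write to `len` and its table update happen under the same lock, the table entry always names the thread whose increment is currently visible. Build with `-pthread` on toolchains that require it.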
  28. Evaluating LWS and CTraps

    LWS helps with Debugging.
    CTraps enables useful Analysis.
    LWS & CTraps have Efficiency sufficient for production.
  29. Debugging with LWS

    [Table of bugs. Surviving program labels: JDK 1.4, Transmission. Bug types, in order: Atomicity Violation, Ordering Error, Ordering Error, Ordering Error, Atomicity Violation, Atomicity Violation, Atomicity Violation.]
  30. Comparison Point: Bad Value Origin Tracking [Bond et al., OOPSLA ’07]

    Cleverly implemented using value ‘piggybacking’.
    [Diagram: foo(), code point A: ptr = NULL; update if value unusable (table labels: len, A). foo(), code point B: if( x == 100){ … }; check (table labels: x, B). Message: “Undefined value originating at used in conditional”.]
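The slide only names value ‘piggybacking’. As a rough sketch of the general idea, and not Bond et al.'s implementation, an unusable value such as a null reference can carry the ID of the code site that produced it in bits that would otherwise be unused. The `ref_t` type, the `NULL_RANGE` threshold, and all function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: every value below NULL_RANGE counts as "null",
 * and the value itself records the code site that created it. */
#define NULL_RANGE 4096u

typedef uintptr_t ref_t;

/* Create a null that remembers its origin (site 0 = origin unknown). */
static ref_t make_null(unsigned origin_site) {
    return (ref_t)(origin_site % NULL_RANGE);
}

/* Null checks become a range check instead of a comparison with zero. */
static int is_null(ref_t r) {
    return r < NULL_RANGE;
}

/* When a null is misused, the origin can be read back out of the value. */
static unsigned null_origin(ref_t r) {
    return (unsigned)r;
}
```

For instance, `make_null(42)` yields a value for which `is_null` is true and `null_origin` returns 42. Because the origin travels inside the value, no side table is needed, but only unusable values can carry one.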
  31. LWS tracks all values, BVOT only unusable ones

    [Table: the seven bugs from slide 29 (surviving program labels: JDK 1.4, Transmission; bug types: Atomicity Violation, Ordering Error, Ordering Error, Ordering Error, Atomicity Violation, Atomicity Violation, Atomicity Violation), each with an LWS entry and a BVOT entry. One entry is marked “*”.]
  32. Failing Execution vs. Non-Failing Execution

    init(), code point A: lock = new lock()
    update(), code point B: acquire(lock);
    Failing Execution: Crash: lock not initialized.
    Non-Failing Execution: OK: lock initialized.
    [Last Writer Table labels: lock, A; lock, T1, A]
  33. Failing Execution vs. Non-Failing Execution

    init(), code point A: lock = new lock()
    update(), code point B: acquire(lock);
    Failing Execution: Crash: lock not initialized.
    Non-Failing Execution: OK: lock initialized.
    [Last Writer Table labels: lock, A; lock, T1, A]
    Breakpoint.
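The ordering error can be replayed deterministically to show what a last-writer entry would add at the failure point. This is a single-threaded sketch under assumptions: `toy_lock`, `replay`, and the convention that 0 means "never written" are introduced here, and the crash is reported as a return value. The contents of the table in each execution are this sketch's reading of the slide, not something the transcript states.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int held; } toy_lock;

static toy_lock lock_storage;
static toy_lock *lock;       /* shared variable */
static int lock_last_writer; /* LWT entry for `lock`: thread id, 0 = never written */

/* Code point A: lock = new lock() */
static void init(int thread) {
    lock_storage.held = 0;
    lock = &lock_storage;
    lock_last_writer = thread; /* table update at the write */
}

/* Code point B: acquire(lock). Returns -1 where the real program would crash. */
static int update(void) {
    if (lock == NULL) return -1;
    lock->held = 1; /* acquire */
    lock->held = 0; /* release */
    return 0;
}

/* Replays one execution; reports the last writer of `lock` as seen at B. */
int replay(int init_first, int *writer_seen_at_b) {
    int result;
    lock = NULL;
    lock_last_writer = 0;
    if (init_first) init(1);
    *writer_seen_at_b = lock_last_writer;
    result = update();
    if (!init_first) init(1);
    return result;
}
```

In the non-failing order, `replay(1, &w)` returns 0 with `w == 1`: thread 1 wrote `lock` before it was used. In the failing order, `replay(0, &w)` returns -1 with `w == 0`, so the table shows that no thread had written `lock` yet, which points at a missing ordering instead of a bad write.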
  34. CTraps Supports Useful Analyses

    Communication Graph Collection [Lucia, MICRO ’09; PLDI ’11; Shi, OOPSLA ’10; Gao, SC ’07]: ~50 LoC for handlers.
    CCI-Prev [Jin, OOPSLA ’10]: ~10 LoC for handlers.
  35. LWS has overhead low enough for production use

    [Chart: slowdown (y-axis, 0 to 4) per benchmark. First group: MySQL, Apache, memcached, LevelDB, AMean, GMean. Second group: blackscholes, dedup, canneal, streamcluster, x264, fluidanimate, ferret, vips, swaptions, AMean, GMean. Labeled values: 50% overhead, 9%, 49%, 10% overhead.]
  36. CTraps has practical overhead that scales with analysis complexity

    [Chart: slowdown (y-axis, 0 to 25) per benchmark for three configurations: Empty Handler, CCI-Prev, CGraph. First group: Apache-httpd, MySQL, memcached, LevelDB, GMean. Second group: blackscholes, dedup, canneal, x264, vips, ferret, fluidanimate, swaptions, streamcluster, GMean. Labeled values: 50%, 14%, 120%, 56%, 150%, 485%, 774%.]
  37. Systems should track data provenance information

    LWS helps with Debugging.
    CTraps enables useful Analysis.
    LWS & CTraps have Efficiency sufficient for production.
  38. Code

    LWS helps with Debugging.
    CTraps enables useful Analysis.
    LWS & CTraps have Efficiency sufficient for production.
    https://github.com/blucia0a/CTraps-gcc
    https://gcc.gnu.org/wiki/plugins
  39. Data Provenance Tracking for Concurrent Programs

    Brandon Lucia | Carnegie Mellon University, Dept. of ECE. Work done in cooperation with Luis Ceze @ University of Washington, Dept. of CSE.