
Go race detector under the hood

When developing software, race conditions have always been a programmer's nemesis: bugs of this kind are notoriously hard to track down. Go added race detector support as far back as Go 1.1, making it much easier to detect data races and, in turn, reduce the number of race conditions that slip through.
This talk covers the difference between a data race and a race condition, and how Go builds its race detector on top of the C/C++ ThreadSanitizer.

Kakashi Liu

June 02, 2021

Transcript

  1. References • Examples mostly come from the following slides •

    go run -race Under the Hood • Golang-race-detection-neven-miculinic-krakensystems
  2. Data races • When 2+ threads concurrently access a shared

    memory location and at least one of the accesses is a write. • Usually it’s a bug
  3. Data races ≠ Race conditions • Race conditions are timing

    errors that depend on thread interleavings and lock ordering • Data races are specifically about “data variables” https://www.cse.iitk.ac.in/users/swarnendu/courses/spring2021-cs636/concurrency-bugs.pdf
  4. Data races ≠ Race conditions (Race Condition vs. Data Race)

     transfer(amount, account_from, account_to) {
         atomic { bal = account_from.balance; }
         if (bal < amount) return NOPE;
         atomic { account_to.balance += amount; }
         atomic { account_from.balance -= amount; }
         return YEP;
     }

     Every individual access is atomic, so there is no data race, yet the balance check and the withdrawal can interleave with another transfer, so a race condition remains.
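To make the distinction concrete in Go (this sketch is mine, not from the slides; the account type and helper names are invented for illustration): every access to balance below is protected by a mutex, so there is no data race, yet the check-then-act sequence in transfer can still interleave with another transfer, so the race condition remains.

    package main

    import "sync"

    // account is a made-up type for illustration; it is not from the slides.
    type account struct {
        mu      sync.Mutex
        balance int
    }

    func (a *account) get() int {
        a.mu.Lock()
        defer a.mu.Unlock()
        return a.balance
    }

    func (a *account) add(n int) {
        a.mu.Lock()
        defer a.mu.Unlock()
        a.balance += n
    }

    // transfer has no data race (every access to balance is locked), but it
    // still has a race condition: two concurrent transfers can both pass the
    // balance check before either withdrawal runs, overdrawing the account.
    func transfer(amount int, from, to *account) bool {
        if from.get() < amount {
            return false
        }
        to.add(amount)
        from.add(-amount)
        return true
    }

    func main() {
        a, b := &account{balance: 100}, &account{balance: 0}
        var wg sync.WaitGroup
        wg.Add(2)
        go func() { defer wg.Done(); transfer(100, a, b) }()
        go func() { defer wg.Done(); transfer(100, a, b) }()
        wg.Wait() // both transfers may succeed, leaving a.balance at -100
    }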
  5. Simple example

     var count = 0 // shared variable

     func incrementCount() {
         if count == 0 {
             count++
         }
     }

     func main() {
         go incrementCount()
         go incrementCount()
     }

     https://speakerdeck.com/kavya719/go-run-race-under-the-hood
  6. Simple example (same code as slide 5)

     Sequential interleaving: G1R, G1W, G2R, G2W → count = 1

     https://speakerdeck.com/kavya719/go-run-race-under-the-hood
  7. Simple example (same code as slide 5)

     Sequential interleaving: G1R, G1W, G2R, G2W → count = 1
     Concurrent interleavings in which both goroutines read count == 0 before either writes (e.g. G1R, G2R, G1W, G2W) → count = 2

     https://speakerdeck.com/kavya719/go-run-race-under-the-hood
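For readers who want to reproduce the example, here is a runnable version (my own transcription of the slide's code; the sync.WaitGroup is added only so main waits for both goroutines):

    package main

    import "sync"

    var count = 0 // shared variable

    func incrementCount() {
        if count == 0 { // racy read
            count++ // racy write
        }
    }

    func main() {
        var wg sync.WaitGroup
        wg.Add(2)
        go func() { defer wg.Done(); incrementCount() }()
        go func() { defer wg.Done(); incrementCount() }()
        wg.Wait()
    }

Running this with go run -race should normally report the conflicting read and write of count and point at the goroutines created in main.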
  8. Go race detector to the rescue • Go v1.1 (2013)

    • Based on the LLVM ThreadSanitizer race detector • The race detector is integrated into the Go toolchain ◦ go test -race ◦ go build -race ◦ go install -race ◦ go run -race https://talks.golang.org/2015/dynamic-tools.slide
  9. Race detection design choices • Static analysis • Dynamic analysis

    ◦ On-the-fly ◦ Post-mortem • Dynamic race detection ◦ Happens-before ◦ Lockset based ◦ Hybrid models https://speakerdeck.com/godays/golang-race-detection-neven-miculinic-krakensystems?slide=32
  10. Concurrency in Go • Goroutine is a lightweight thread managed

    by the Go runtime • Go memory model (simplified version) ◦ Happens before ◦ Within a single goroutine ▪ Reads and writes are ordered ▪ Reads and writes may be reordered only when the reordering does not change the behavior observed within that goroutine ◦ Across multiple goroutines ▪ Access to shared variables must be synchronized
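As a small illustration of the last point (my own example, not from the slides): a channel can provide the synchronization that establishes a happens-before edge between goroutines, so the write to msg is guaranteed to be visible to the reader.

    package main

    import "fmt"

    var msg string

    func main() {
        done := make(chan struct{})
        go func() {
            msg = "hello" // (1) write the shared variable
            close(done)   // (2) closing the channel happens before (3) completes
        }()
        <-done            // (3) receive observes the close
        fmt.Println(msg)  // (4) guaranteed to print "hello": (1) < (2) < (3) < (4)
    }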
  11. // shared variable
      var count = 0

      func incrementCount() {
          mu.Lock()
          if count == 0 {
              count++
          }
          mu.Unlock()
      }

      func main() {
          go incrementCount()
          go incrementCount()
      }

      (diagram: G1 executes Lock, R count, W count, Unlock and G2 executes Lock, R count, Unlock; events A and B are on G1's timeline, C and D on G2's)
  12. (same diagram as slide 11) A < B (same goroutine), B < C (lock/unlock), A < D (transitivity)
  13. (same diagram as slide 11) A < B (same goroutine), B < C (lock/unlock), A < D (transitivity) → A and D are not concurrent
  14. (same unsynchronized code as slide 5) G1 executes R count (A) then W count (B); G2 executes R count (C) then W count (D). A < B (same goroutine), C < D (same goroutine), but A ? C and C ? A
  15. (same unsynchronized code as slide 5) A < B (same goroutine), C < D (same goroutine), but A ? C and C ? A → the accesses are concurrent
  16. How can we detect happens-before? Lamport, L. (1978). "Time, clocks,

    and the ordering of events in a distributed system" (PDF).
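The idea behind vector clocks can be sketched in a few lines of Go (a toy model, not the runtime's actual data structures): each goroutine carries a vector of logical clocks, event A happens before event B iff A's clock is pointwise ≤ B's, and two events whose clocks are incomparable are concurrent.

    package main

    import "fmt"

    // VC is a toy vector clock: one logical clock slot per goroutine.
    type VC []uint64

    // HappensBefore reports whether an event with clock a is ordered before
    // an event with clock b, i.e. a[i] <= b[i] for every slot i.
    func (a VC) HappensBefore(b VC) bool {
        for i := range a {
            if a[i] > b[i] {
                return false
            }
        }
        return true
    }

    // Concurrent: neither event is ordered before the other.
    func Concurrent(a, b VC) bool {
        return !a.HappensBefore(b) && !b.HappensBefore(a)
    }

    // Join merges two vector clocks slot by slot (what a lock acquire does).
    func Join(a, b VC) VC {
        out := make(VC, len(a))
        for i := range a {
            out[i] = a[i]
            if b[i] > out[i] {
                out[i] = b[i]
            }
        }
        return out
    }

    func main() {
        // The comparisons from slides 19 and 21:
        fmt.Println(VC{3, 0}.HappensBefore(VC{4, 2})) // true: A < D, no race
        fmt.Println(Concurrent(VC{1, 0}, VC{0, 2}))   // true: unordered, concurrent
    }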
  17. (same diagram as slide 11) Both goroutines start at vector clock (0, 0); G1's clock advances (1, 0), (2, 0), (3, 0), (4, 0) across its operations, with events A and B on its timeline
  18. (same diagram) After taking the lock, G2's clock advances (4, 1), (4, 2), (4, 3) across its operations, with events C and D on its timeline
  19. (same diagram) A < D because (3, 0) < (4, 2) → no race
  20. (diagram: a three-goroutine example of vector clocks with entries t1, t2, t3 advancing on G1, G2 and G3)
  21. (same unsynchronized diagram as slide 14) G1's clock advances (1, 0), (2, 0); G2's advances (0, 1), (0, 2). Is A < D? (1, 0) and (0, 2) are incomparable → concurrent
  22. Algorithms • DJIT+ ◦ No false positives ◦ Might not detect a data race, depending on the observed execution • FastTrack ◦ Further optimizes the time and space complexity of DJIT+ • ThreadSanitizer v2 ◦ Similar to FastTrack
  23. DJIT+ • Each thread keeps a vector clock Ct • Each thread has its own clock entry that is incremented at lock synchronization operations with release semantics • Each lock has a vector clock • Each shared variable x has two vector clocks, Rx and Wx • Check each memory access against all previous accesses
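A rough sketch of the DJIT+ write rule in Go (my own simplification of the bullets above: the read clock Rx and the per-thread entries of Wx are omitted): on every write the detector checks whether the previous write is ordered before the writer's current clock, unlock copies the thread's clock into the lock and increments it, and lock joins the lock's clock into the thread.

    package main

    import "fmt"

    type VC []uint64

    // leq reports a ≼ b: every entry of a is <= the matching entry of b.
    func leq(a, b VC) bool {
        for i := range a {
            if a[i] > b[i] {
                return false
            }
        }
        return true
    }

    // join is the pointwise max, applied when a lock is acquired.
    func join(dst, src VC) {
        for i := range dst {
            if src[i] > dst[i] {
                dst[i] = src[i]
            }
        }
    }

    // state holds Ct per thread, Lm per lock, and Wx for the single variable x.
    type state struct {
        C  map[int]VC    // thread id -> Ct
        L  map[string]VC // lock name -> Lm
        Wx VC            // clock of the last write to x
    }

    // write checks W(x) by thread t: it is a race unless the previous write is
    // ordered before t's current clock; then it records this write in Wx.
    func (s *state) write(t int) bool {
        race := !leq(s.Wx, s.C[t])
        copy(s.Wx, s.C[t])
        return race
    }

    func (s *state) unlock(t int, m string) { copy(s.L[m], s.C[t]); s.C[t][t]++ }
    func (s *state) lock(t int, m string)   { join(s.C[t], s.L[m]) }

    func main() {
        // Replay the example from slides 24-30 (thread ids 0 and 1 stand for G1 and G2).
        s := &state{
            C:  map[int]VC{0: {4, 0}, 1: {0, 8}},
            L:  map[string]VC{"m": {0, 0}},
            Wx: VC{0, 0},
        }
        fmt.Println(s.write(0)) // false: first write, Wx becomes (4,0)
        s.unlock(0, "m")        // Lm := (4,0), C1 := (5,0)
        s.lock(1, "m")          // C2 := join((0,8), (4,0)) = (4,8)
        fmt.Println(s.write(1)) // false: (4,0) ≼ (4,8), no race (slide 30)
    }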
  24. DJIT+ example, initial state: C1 = (4, 0), C2 = (0, 8), Lm = (0, 0), Wx = (0, 0) • Ct: clock entry for a given thread id • Lm: vector clock for each lock m (mutex) • Wx: clock of the last write to x by thread t
  25. DJIT+ example: G1 performs W(x) → update the Wx vector clock to G1's current clock (4, 0)
  26. DJIT+ example: G2 runs with C2 = (0, 8); Lm is still (0, 0) and Wx = (4, 0)
  27. DJIT+ example: G1 performs Unlock(m) → update the Lm vector clock to (4, 0) and advance C1 to (5, 0)
  28. DJIT+ example: after the Unlock, C1 = (5, 0), C2 = (0, 8), Lm = (4, 0), Wx = (4, 0)
  29. DJIT+ example: G2 performs Lock(m) → join the vector clocks (0, 8) and (4, 0), so C2 becomes (4, 8)
  30. DJIT+ example: G2 performs W(x); Wx = (4, 0) ≼ C2 = (4, 8) → no race
  31. DJIT+ example: after G2's write, Wx is updated to (4, 8)
  32. DJIT+ example without the Unlock/Lock hand-off: C2 stays (0, 8), so at G2's W(x) the previous write Wx = (4, 0) and C2 = (0, 8) are incomparable → race
  33. FastTrack: same example, but Wx is stored as a single epoch (clock@tid) instead of a full vector clock, going 0@0 → 4@0 after G1's write and 8@1 after G2's write
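The FastTrack optimization on this slide can be illustrated with a short sketch (mine, not the runtime's code): the last write to x is stored as a single epoch clock@tid, as in the 4@0 and 8@1 values above, and the write check compares just that one epoch against the current thread's vector clock.

    package main

    import "fmt"

    // epoch records the last write to x as one (clock, thread) pair instead of
    // a full vector clock: the "clock@tid" notation on the slide.
    type epoch struct {
        clock uint64
        tid   int
    }

    type VC []uint64

    // writeCheck reports a race if the last write's epoch is not covered by
    // the writing thread's current vector clock, then records the new write.
    func writeCheck(last *epoch, t int, Ct VC) bool {
        race := last.clock > Ct[last.tid] // previous write not ordered before us
        *last = epoch{clock: Ct[t], tid: t}
        return race
    }

    func main() {
        // Same example as the DJIT+ slides; tid 0 is G1, tid 1 is G2.
        wx := epoch{clock: 4, tid: 0}             // last write is 4@0
        fmt.Println(writeCheck(&wx, 1, VC{4, 8})) // false: 4 <= 4 after the lock hand-off, no race

        wx = epoch{clock: 4, tid: 0}
        fmt.Println(writeCheck(&wx, 1, VC{0, 8})) // true: without the hand-off, 4 > 0, race
    }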
  34. Quiz

      var count = 0
      var mu sync.Mutex

      func incrementCount1() { // G1
          mu.Lock()
          count += 1
          mu.Unlock()
      }

      func incrementCount2() { // G2
          mu.Lock()
          mu.Unlock()
          count += 1
      }
  35. ThreadSanitizer • Detects data races and deadlocks • Compile-time instrumentation

    (LLVM, GCC) ◦ Intercepts all reads/writes • Run-time library ◦ Intercepts all synchronization and thread management ◦ Handles reads/writes ◦ Runtime slowdown 2x-20x ◦ Memory overhead 5x-10x
  36. Compiler instrumentation

      var count = 0

      func incrementCount() {
          raceread()
          if count == 0 {
              racewrite()
              count++
          }
          racefuncexit()
      }

      func main() {
          go incrementCount()
          go incrementCount()
      }
  37. Goroutine creation & synchronization events

      // runtime/proc.go
      func newproc1() {
          if raceenabled {
              // callerpc is the address of the go statement
              // that created this newg
              newg.racectx = racegostart(callerpc)
          }
      }

      // sync/mutex.go
      func (m *Mutex) Lock() {
          if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
              if race.Enabled {
                  race.Acquire(unsafe.Pointer(m))
              }
              return
          }
          m.lockSlow()
      }
  38. Calling the C++ race detection library (TSAN): the same instrumented code as slide 36; raceread(), racewrite() and racefuncexit() call into the TSAN runtime
  39. go incrementCount() → newproc1():

      func newproc1() {
          if raceenabled {
              newg.racectx = racegostart(callerpc)
          }
      }

      struct ThreadState {
          int tid
          u64 epoch
          ThreadClock clock
      }

      race.Acquire() / race.Release() / race.ReleaseMerge()
          establish happens-before relations by updating vector clocks
      race.read() / race.write()
          1. check for a data race with previous accesses
          2. store information about this access for future comparisons
  40. Shadow state • Stores information about memory accesses • Application memory (0x7fxxxxxxxxxx) maps to shadow memory (0x1fxxxxxxxxxx) • A shadow value represents one memory access (read or write) from one thread • For each 8 bytes of user memory we maintain 4 shadow values (it’s tunable)

      struct Shadow {
          u64 tid    // thread id
          u64 epoch  // thread’s clock time
          u64 addr   // memory access address
          u64 write  // read/write
      }
  41. Example: G1 performs a Write to addr x → new shadow value: tid = 1, epoch = 10, addr = 0:2, is_write = 1 (diagram also shows G1's vector clock)
  42. Example: G2 performs a Read of addr x → shadow values are now (tid = 1, epoch = 10, addr = 0:2, is_write = 1) and (tid = 2, epoch = 20, addr = 4:8, is_write = 0) (diagram also shows G1's and G2's vector clocks)
  43. Race detection: G3 performs a Read of addr x[0:4] → new shadow value: tid = 3, epoch = 40, addr = 0:4, is_write = 0. Compare <current thread VC, new shadow state> vs <existing shadow states>: (tid = 1, epoch = 10, addr = 0:2, is_write = 1) and (tid = 2, epoch = 20, addr = 4:8, is_write = 0)
  44. Race detection checklist for that access: [ ] do the access locations overlap? [ ] is any of these accesses a write? [ ] are the tids different? [ ] unordered by happens-before?
  45. Race detection checklist: [X] do the access locations overlap? [X] is any of these accesses a write? [ ] are the tids different? [ ] unordered by happens-before?
  46. Race detection checklist: [X] do the access locations overlap? [X] is any of these accesses a write? [X] are the tids different? [ ] unordered by happens-before?
  47. Race detection checklist: the first three checks pass; test happens-before by comparing the old access against the current thread's vector clock: (10, ?, ?) < (5, 0, 40)?
  48. Race detection checklist: all four checks hold: (10, ?, ?) is not ≤ (5, 0, 40), so the accesses are unordered by happens-before → Race
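Putting slides 40-48 together, the per-access check can be sketched as follows (my own rendering: field names follow the Shadow struct from slide 40, tids are 0-indexed, and the happens-before test is reduced to comparing the old access's epoch against the current thread's vector clock, which is what the (10, ?, ?) < (5, 0, 40) comparison above does):

    package main

    import "fmt"

    // shadow mirrors the Shadow struct from slide 40: one previous access to
    // part of an 8-byte word.
    type shadow struct {
        tid     int
        epoch   uint64
        lo, hi  int // byte range within the 8-byte word, e.g. 0:2 or 4:8
        isWrite bool
    }

    type VC []uint64

    // races walks the checklist from slides 44-48 for a new access cur by a
    // thread whose vector clock is vc, against one existing shadow value old.
    func races(old, cur shadow, vc VC) bool {
        if old.hi <= cur.lo || cur.hi <= old.lo { // do the access locations overlap?
            return false
        }
        if !old.isWrite && !cur.isWrite { // is at least one of the accesses a write?
            return false
        }
        if old.tid == cur.tid { // are the tids different?
            return false
        }
        // Unordered by happens-before? The old access is ordered before the new
        // one iff its epoch is <= the current thread's clock entry for old.tid.
        return old.epoch > vc[old.tid]
    }

    func main() {
        // Slides 43-48: G3 (tid 2) reads x[0:4] with vector clock (5, 0, 40).
        cur := shadow{tid: 2, epoch: 40, lo: 0, hi: 4, isWrite: false}

        // G1's old write (tid 0, epoch 10, addr 0:2) overlaps, is a write, is a
        // different thread, and 10 > 5, so it is unordered: a race.
        oldW := shadow{tid: 0, epoch: 10, lo: 0, hi: 2, isWrite: true}
        fmt.Println(races(oldW, cur, VC{5, 0, 40})) // true

        // G2's old read (tid 1, epoch 20, addr 4:8) does not overlap x[0:4].
        oldR := shadow{tid: 1, epoch: 20, lo: 4, hi: 8, isWrite: false}
        fmt.Println(races(oldR, cur, VC{5, 0, 40})) // false
    }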
  49. Recap • A data race occurs when 2+ threads concurrently access a shared memory location and at least one of the accesses is a write. • The Go race detector can help you find these bugs. • The race detector can be slow and memory-hungry. • The algorithm is based on dynamically modelling the happens-before relation ◦ No false positives ◦ False negatives are possible