Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Atomic programming

Felix Chern
January 05, 2017
240

Atomic programming

Felix Chern

January 05, 2017
Tweet

Transcript

  1. About Me • Felix Chern • Google: Cloud networking •

    OpenX: Big data team tech lead • SupplyFrame: Built Hadoop pipeline • http://idryman.org
  2. You might think atomic is… yet another data type that

    is concurrent friendly Reference count, program counter; how hard can it be?
  3. Problems with C11/C++11 atomic API • Memory (re)ordering • Subtle

    memory model • Compiler bugs • Performance penalties
  4. 1

  5. int a; if (b > 0) { a = 1;

    } else { a = 0; } int a; a = 0; if (b > 0) { a = 1; } Can be optimized as
  6. non atomic store x non atomic store y Atomic store

    a Atomic load a non atomic load x non atomic load y Time T1 T2 Store on x and y “happens before” load on x and y
  7. non atomic store x non atomic store y Atomic store

    a Time T1 memory order release: A store operation. No (reads or) writes in the current thread can be reordered after this store. // a init to 1 x = 1; y = 2; a.store(0, memory_order_release);
  8. non atomic store x non atomic store y Atomic store

    a Time T1 memory order release: A store operation. No (reads or) writes in the current thread can be reordered after this store.
  9. Atomic load a non atomic load x non atomic load

    y Time T2 while (a.load (memory_order_acquire)) { /* spin lock */ } // read x, y memory order acquire: A load operation. No reads (or writes) in the current thread can be reordered before this load.
  10. Atomic load a non atomic load x non atomic load

    y Time T2 memory order acquire: A load operation. No reads (or writes) in the current thread can be reordered before this load.
  11. non atomic store x non atomic store y Atomic store

    a Atomic load a non atomic load x non atomic load y Time T1 T2 Store on x and y “happens before” load on x and y
  12. non atomic store x Atomic store a Atomic test and

    load a Time T1 T2 Atomic test and load a non atomic store x Atomic store a critical section critical section
  13. non atomic store x Atomic store a T1 Atomic test

    and load a using namespace std; atomic_flag a = ATOMIC_FLAG_INIT; while (a.test_and_set( memory_order_acquire )) ; // spin lock x++; a.clear(memory_order_release); Alternatively: • atomic_fetch_and/atomic_fetch_or • atomic_exchange/atomic_store • atomic_compare_exchange_weak/strong
  14. Memory model • memory_order_relaxed: there are no synchronization or ordering

    constraints, only atomicity is required of this operation • memory_order_acquire: A load operation. No reads (or writes) in the current thread can be reordered before this load. • memory_order_release: A store operation. No (reads or) writes in the current thread can be reordered after this store. • memory_order_acq_rel: A read-modify-write operation. This is both acquire and release. • memory_order_seq_cst: Any operation with this memory order is both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order
  15. 2

  16. non atomic store x Atomic store a ok? Atomic test

    and load a while (a.test_and_set( memory_order_acquire )) ; // spin lock x++; a.clear(memory_order_release); memory order acquire: A load operation. No reads in the current thread can be reordered before this load. ok?
  17. non atomic store x Atomic store a ok? Atomic test

    and load a while (a.test_and_set( memory_order_acquire )) ; // spin lock x++; a.clear(memory_order_release); memory order acquire: A load operation. No reads or writes in the current thread can be reordered before this load. ok? Added in Oct, 2016
  18. Reference count // inlinable void retain(Object* obj) { obj->refcnt.fetch_add(1, memory_order_acquire);

    } memory order acquire: A load operation. No reads or writes in the current thread can be reordered before this load. Both a load and a store!
  19. atomic checklist • Check your code on gcc.godbolt.org • Get

    familiar with the assembly of the targeted platform • Unit test for data race
 What seems correct on godbolt may not be the case on your platform :(
  20. 3

  21. https://godbolt.org/g/9Bb6yt • No system call • No thread fence (?)

    • Only xchg, mov • Should be fast, right?
  22. Why? • xchg src, dest
 If one of the operands

    is a memory address, then the operation has an implicit LOCK prefix, that is, the exchange operation is atomic. • LOCK
 Causes the processor's LOCK# signal to be asserted during execution of the accompanying instruction. In a multiprocessor environment, the LOCK# signal insures that the processor has exclusive use of any shared memory while the signal is asserted.
  23. 4

  24. • lock.load(memory_order_relaxed) • *reinterpret_cast<int*>(lock) // C++
 *(int*)lock // C •

    *reinterpret_cast<volatile int*>(lock) // C++
 *(volatile int*)lock // C
  25. Turns out.. • On x86 • *(volatile int*) lock is

    same as
 lock.load(memory_order_relaxed) • memory order relaxed, acquire, acq_rel, seq_cst doesn’t matter here • But only on X86! • Unless you compile and test, you know nothing!
  26. • Acquire-Release semantic is subtle • Need to double check

    compiled result • Performance penalty can be huge! (20x times slower) • Usually slower than pthread mutex • LOCK# synchronizes all memory access.
  27. 5

  28. Build new logic with atomic • Build concurrent data structures/algorithms


    boost, Facebook folly, golang, etc. • Also make use of thread local variables • Create new semantic pthread doesn’t provide
  29. 6

  30. A mini concurrent queue • max_size init to 3 •

    enqueue:
 acquire read lock
 if (size < max_size)
 enqueue object into the queue
 release read lock, return
 else
 release read lock and acquire write lock
 max_size = max_size * 2
 release write lock, acquire read lock
 enqueue object
 release read lock Not thread safe
  31. A mini concurrent queue • Problem: when releasing the read

    lock, the state is not safe • Two threads increasing the size (ok) • One thread increasing the size, the other does the opposite (bad) • One thread free the resource, the other inserting object (crash!)
  32. Introducing punch card • check_in (like acquire read lock) •

    check_out (like release read lock) • book_critical
 exclude new check in
 if already booked, failed to book • enter_critical
 wait until all check_in checked out then enter • exit_critical
  33. punch card write • check_in • if (!book_critical)
 check_out and

    return • while (!enter_critical) {}
 // spin until others check out • // do critical stuff, like switch state • exit_critical • check_out
  34. State A State B State C check in check out

    check in check out temporal state excluding check in critical section
  35. https://github.com/dryman/atomic_patterns/blob/master/op_atomic.h • check_in: pcard += 1 (when >= 0) •

    check_out: pcard -= 1 • book_critical: Turn MSB to 1 (when > 0)
 i.e. pcard = INT_MIN + pcard • enter_critical: Spin until pcard == INT_MIN + 1
 pcard = INT_MIN • exit_critical: pcard = 1
  36. 7

  37. Atomic applications • Databases • RDBMS • NoSQL • Memory

    manager, allocator, garbage collector • Concurrent programming framework/language • Software transactional memory (STM) • Golang, Erlang, NodeJS(?)
  38. 8

  39. Beyond atomic • Thread local variable (C11/C++11)
 static __thread int

    x; • GCC transactional memory (experimental)
 __transaction_atomic { if (a > b) b++; } • Hardware transactional memory
 -mrtm (Restricted Transactional Memory)
 #include <immintrin.h> if ((status = _xbegin ()) == _XBEGIN_STARTED) { ... transaction code... _xend (); } else { ... non transactional fallback path... }
  40. References • CPP Atomic Operations Library • Preshing on programming

    blog posts on atomic • The Art of Multiprocessor Programming • C++ Concurrency In Action • GCC Transactional Memory • GCC X86 hard ware transaction reference • X86 references: http://x86.renejeschke.de • Compile code online: http://gcc.godbolt.org • https://github.com/dryman/atomic_patterns
  41. OPIC
 Object Persistence In C • https://github.com/dryman/opic • Bring data

    structures to big data • O(1) deserialization (mmap) • Target for high throughput (big data), but also low latency applications (this is why I entered atomic programming) • Version 3 is still under development
 (branch OPIC-33)