Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Atomic programming

Sponsored · Your Podcast. Everywhere. Effortlessly. Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
Avatar for Felix Chern Felix Chern
January 05, 2017
300

Atomic programming

Avatar for Felix Chern

Felix Chern

January 05, 2017
Tweet

Transcript

  1. About Me • Felix Chern • Google: Cloud networking •

    OpenX: Big data team tech lead • SupplyFrame: Built Hadoop pipeline • http://idryman.org
  2. You might think atomic is… yet another data type that

    is concurrent friendly Reference count, program counter; how hard can it be?
  3. Problems with C11/C++11 atomic API • Memory (re)ordering • Subtle

    memory model • Compiler bugs • Performance penalties
  4. 1

  5. int a; if (b > 0) { a = 1;

    } else { a = 0; } int a; a = 0; if (b > 0) { a = 1; } Can be optimized as
  6. non atomic store x non atomic store y Atomic store

    a Atomic load a non atomic load x non atomic load y Time T1 T2 Store on x and y “happens before” load on x and y
  7. non atomic store x non atomic store y Atomic store

    a Time T1 memory order release: A store operation. No (reads or) writes in the current thread can be reordered after this store. // a init to 1 x = 1; y = 2; a.store(0, memory_order_release);
  8. non atomic store x non atomic store y Atomic store

    a Time T1 memory order release: A store operation. No (reads or) writes in the current thread can be reordered after this store.
  9. Atomic load a non atomic load x non atomic load

    y Time T2 while (a.load (memory_order_acquire)) { /* spin lock */ } // read x, y memory order acquire: A load operation. No reads (or writes) in the current thread can be reordered before this load.
  10. Atomic load a non atomic load x non atomic load

    y Time T2 memory order acquire: A load operation. No reads (or writes) in the current thread can be reordered before this load.
  11. non atomic store x non atomic store y Atomic store

    a Atomic load a non atomic load x non atomic load y Time T1 T2 Store on x and y “happens before” load on x and y
  12. non atomic store x Atomic store a Atomic test and

    load a Time T1 T2 Atomic test and load a non atomic store x Atomic store a critical section critical section
  13. non atomic store x Atomic store a T1 Atomic test

    and load a using namespace std; atomic_flag a = ATOMIC_FLAG_INIT; while (a.test_and_set( memory_order_acquire )) ; // spin lock x++; a.clear(memory_order_release); Alternatively: • atomic_fetch_and/atomic_fetch_or • atomic_exchange/atomic_store • atomic_compare_exchange_weak/strong
  14. Memory model • memory_order_relaxed: there are no synchronization or ordering

    constraints, only atomicity is required of this operation • memory_order_acquire: A load operation. No reads (or writes) in the current thread can be reordered before this load. • memory_order_release: A store operation. No (reads or) writes in the current thread can be reordered after this store. • memory_order_acq_rel: A read-modify-write operation. This is both acquire and release. • memory_order_seq_cst: Any operation with this memory order is both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order
  15. 2

  16. non atomic store x Atomic store a ok? Atomic test

    and load a while (a.test_and_set( memory_order_acquire )) ; // spin lock x++; a.clear(memory_order_release); memory order acquire: A load operation. No reads in the current thread can be reordered before this load. ok?
  17. non atomic store x Atomic store a ok? Atomic test

    and load a while (a.test_and_set( memory_order_acquire )) ; // spin lock x++; a.clear(memory_order_release); memory order acquire: A load operation. No reads or writes in the current thread can be reordered before this load. ok? Added in Oct, 2016
  18. Reference count // inlinable void retain(Object* obj) { obj->refcnt.fetch_add(1, memory_order_acquire);

    } memory order acquire: A load operation. No reads or writes in the current thread can be reordered before this load. Both a load and a store!
  19. atomic checklist • Check your code on gcc.godbolt.org • Get

    familiar with the assembly of the targeted platform • Unit test for data race
 What seems correct on godbolt may not be the case on your platform :(
  20. 3

  21. https://godbolt.org/g/9Bb6yt • No system call • No thread fence (?)

    • Only xchg, mov • Should be fast, right?
  22. Why? • xchg src, dest
 If one of the operands

    is a memory address, then the operation has an implicit LOCK prefix, that is, the exchange operation is atomic. • LOCK
 Causes the processor's LOCK# signal to be asserted during execution of the accompanying instruction. In a multiprocessor environment, the LOCK# signal insures that the processor has exclusive use of any shared memory while the signal is asserted.
  23. 4

  24. • lock.load(memory_order_relaxed) • *reinterpret_cast<int*>(lock) // C++
 *(int*)lock // C •

    *reinterpret_cast<volatile int*>(lock) // C++
 *(volatile int*)lock // C
  25. Turns out.. • On x86 • *(volatile int*) lock is

    same as
 lock.load(memory_order_relaxed) • memory order relaxed, acquire, acq_rel, seq_cst doesn’t matter here • But only on X86! • Unless you compile and test, you know nothing!
  26. • Acquire-Release semantic is subtle • Need to double check

    compiled result • Performance penalty can be huge! (20x times slower) • Usually slower than pthread mutex • LOCK# synchronizes all memory access.
  27. 5

  28. Build new logic with atomic • Build concurrent data structures/algorithms


    boost, Facebook folly, golang, etc. • Also make use of thread local variables • Create new semantic pthread doesn’t provide
  29. 6

  30. A mini concurrent queue • max_size init to 3 •

    enqueue:
 acquire read lock
 if (size < max_size)
 enqueue object into the queue
 release read lock, return
 else
 release read lock and acquire write lock
 max_size = max_size * 2
 release write lock, acquire read lock
 enqueue object
 release read lock Not thread safe
  31. A mini concurrent queue • Problem: when releasing the read

    lock, the state is not safe • Two threads increasing the size (ok) • One thread increasing the size, the other does the opposite (bad) • One thread free the resource, the other inserting object (crash!)
  32. Introducing punch card • check_in (like acquire read lock) •

    check_out (like release read lock) • book_critical
 exclude new check in
 if already booked, failed to book • enter_critical
 wait until all check_in checked out then enter • exit_critical
  33. punch card write • check_in • if (!book_critical)
 check_out and

    return • while (!enter_critical) {}
 // spin until others check out • // do critical stuff, like switch state • exit_critical • check_out
  34. State A State B State C check in check out

    check in check out temporal state excluding check in critical section
  35. https://github.com/dryman/atomic_patterns/blob/master/op_atomic.h • check_in: pcard += 1 (when >= 0) •

    check_out: pcard -= 1 • book_critical: Turn MSB to 1 (when > 0)
 i.e. pcard = INT_MIN + pcard • enter_critical: Spin until pcard == INT_MIN + 1
 pcard = INT_MIN • exit_critical: pcard = 1
  36. 7

  37. Atomic applications • Databases • RDBMS • NoSQL • Memory

    manager, allocator, garbage collector • Concurrent programming framework/language • Software transactional memory (STM) • Golang, Erlang, NodeJS(?)
  38. 8

  39. Beyond atomic • Thread local variable (C11/C++11)
 static __thread int

    x; • GCC transactional memory (experimental)
 __transaction_atomic { if (a > b) b++; } • Hardware transactional memory
 -mrtm (Restricted Transactional Memory)
 #include <immintrin.h> if ((status = _xbegin ()) == _XBEGIN_STARTED) { ... transaction code... _xend (); } else { ... non transactional fallback path... }
  40. References • CPP Atomic Operations Library • Preshing on programming

    blog posts on atomic • The Art of Multiprocessor Programming • C++ Concurrency In Action • GCC Transactional Memory • GCC X86 hard ware transaction reference • X86 references: http://x86.renejeschke.de • Compile code online: http://gcc.godbolt.org • https://github.com/dryman/atomic_patterns
  41. OPIC
 Object Persistence In C • https://github.com/dryman/opic • Bring data

    structures to big data • O(1) deserialization (mmap) • Target for high throughput (big data), but also low latency applications (this is why I entered atomic programming) • Version 3 is still under development
 (branch OPIC-33)