“Understanding the mechanisms that
provide coherency guarantees on
multiprocessors is a prerequisite to
understanding contention on such
systems.”
Slide 9
MESIF Cache Coherency Protocol State Machine
Understanding how processors
optimize memory access is
crucial.
Understanding how cache traffic
propagates between processors
is crucial.
Marek Fiser
http://www.texample.net/tikz/examples/mesif/
CC BY 2.5
Slide 10
“[T]he cache line is the granularity
at which coherency is maintained on
a cache-coherent multiprocessor
system. This is also the unit of
contention.”
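A minimal sketch of what "unit of contention" means for data layout. It assumes a 64-byte cache line (typical on x86-64, but an assumption here), and the struct names are hypothetical. Two unrelated counters placed side by side will usually share a line, so writers on different cores contend with each other; aligning each counter to its own line removes that false sharing at the cost of extra memory.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. Check the value for your target. */
#define CACHE_LINE_SIZE 64

/* Both counters will usually land on the same cache line, so a write to
 * `a` on one core invalidates the line holding `b` on every other core. */
struct counters_shared {
    atomic_ulong a;
    atomic_ulong b;
};

/* Each counter starts on its own cache line, so updates to `a` and `b`
 * no longer compete for ownership of the same line. */
struct counters_padded {
    alignas(CACHE_LINE_SIZE) atomic_ulong a;
    alignas(CACHE_LINE_SIZE) atomic_ulong b;
};

/* Compile-time check that the padded layout really separates the fields. */
static_assert(offsetof(struct counters_padded, b) -
              offsetof(struct counters_padded, a) >= CACHE_LINE_SIZE,
              "counters must not share a cache line");
```

The padded layout trades memory for isolation: each counter now occupies a full line even though it only needs a few bytes.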
Atomic Primitives
• They’re not cheap, even without contention
• Languages with strong memory-ordering guarantees may increase their cost
• Compiler-provided builtins may increase their cost (e.g. by implying full barriers)
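A hedged illustration of where that cost can come from, using C11 `<stdatomic.h>`. The function names are hypothetical. The default C11 operations are sequentially consistent, which is the strongest ordering; the `_explicit` variants let the caller ask for only the ordering the algorithm needs. Whether the weaker ordering is actually cheaper depends on the target architecture.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Default C11 atomics are sequentially consistent (strongest ordering). */
static inline unsigned long counter_inc_seq_cst(atomic_ulong *c)
{
    assert(c != NULL);
    return atomic_fetch_add(c, 1); /* returns the previous value */
}

/* Relaxed ordering: the increment is still atomic, but it imposes no
 * ordering on surrounding loads and stores. */
static inline unsigned long counter_inc_relaxed(atomic_ulong *c)
{
    assert(c != NULL);
    return atomic_fetch_add_explicit(c, 1, memory_order_relaxed);
}

/* Compare-and-swap loop: raise *target to `value` if it is currently lower. */
static inline void atomic_store_max(atomic_ulong *target, unsigned long value)
{
    assert(target != NULL);
    unsigned long seen = atomic_load_explicit(target, memory_order_relaxed);
    while (seen < value &&
           !atomic_compare_exchange_weak_explicit(target, &seen, value,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* A failed CAS reloads `seen` with the current value; retry. */
    }
}
```

Even the relaxed variants are read-modify-write operations that need exclusive ownership of the cache line, which is why they are not free in the uncontended case.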
NUMA Topologies
• Non-Uniform Memory Access:
memory placement matters
• Accessing remote memory is slower than accessing local memory
• Locking on remote memory can cause starvation and livelocks
Slide 27
NUMA Topologies
• Fair locks help with starvation,
but are sensitive to preemption
• “Cohort locks” are NUMA-friendly
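One common fair lock is the ticket lock, sketched below with C11 atomics. This is an illustrative sketch with hypothetical names, not the implementation from the talk, and it is not a cohort lock. Threads are served strictly in arrival order, which prevents starvation. The preemption sensitivity is visible in the spin loop: if the thread holding the next ticket is descheduled, every thread behind it waits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct ticket_lock {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint serving; /* ticket currently allowed into the critical section */
};

static inline void ticket_lock_init(struct ticket_lock *l)
{
    assert(l != NULL);
    atomic_init(&l->next, 0);
    atomic_init(&l->serving, 0);
}

static inline void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Taking a ticket fixes this thread's position in the FIFO order. */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);

    /* Spin until our ticket is served. If an earlier ticket holder is
     * preempted, all later tickets are stuck here until it runs again. */
    while (atomic_load_explicit(&l->serving, memory_order_acquire) != me) {
        /* busy-wait */
    }
}

static inline void ticket_lock_release(struct ticket_lock *l)
{
    /* Only the lock holder writes `serving`, so a plain load is enough. */
    unsigned int cur = atomic_load_explicit(&l->serving,
                                            memory_order_relaxed);
    atomic_store_explicit(&l->serving, cur + 1, memory_order_release);
}
```

All waiters spin on the same `serving` word, so every release invalidates that cache line on every waiting core; on a NUMA machine that traffic crosses nodes.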
“Contention [is] a product of the
effective arrival rate of requests to a
shared resource, [which] leads to a
queuing effect that directly impacts
the responsiveness of a system.”
Slide 39
Awful Person
Slide 40
Locks Create Queues
• Threads become queue
elements
• Composability
• Fairness
Slide 41
Locks Create Queues
• Completion guarantees
• Latency of the critical section impacts system throughput
Slide 42
Little’s law: L = λW
• Applies to stable systems
• Capacity (L)
• Throughput (λ)
• Latency (W)
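A small worked example of L = λW, written as two hypothetical helper functions. The numbers are illustrative assumptions, not measurements. A mutex admits one thread at a time, so L = 1 inside the critical section; if the critical section takes W = 2 µs, throughput is bounded by λ = L / W = 500,000 acquisitions per second, regardless of how many threads are waiting.

```c
#include <assert.h>

/* Little's law: L = lambda * W.
 * arrival_rate is lambda (requests per second), latency is W (seconds). */
static inline double littles_law_occupancy(double arrival_rate, double latency)
{
    assert(arrival_rate >= 0.0 && latency >= 0.0);
    return arrival_rate * latency;
}

/* Rearranged for throughput: lambda = L / W. */
static inline double littles_law_throughput(double occupancy, double latency)
{
    assert(occupancy >= 0.0 && latency > 0.0);
    return occupancy / latency;
}
```

For instance, `littles_law_throughput(1.0, 2e-6)` gives roughly 500,000 per second, and halving the critical-section latency doubles that bound.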
Safe Memory Reclamation
• Decouples object liveness from visibility
• Unmanaged languages need additional help
Slide 50
Safe Memory Reclamation
• Refcounts are a common mechanism
• Difficult to use without locks
• Perform poorly with many long-lived, frequently accessed objects
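A minimal atomic reference count in C11, to make the bullets above concrete. The `struct object` type and function names are hypothetical. The sketch shows both limitations: `object_acquire` is only safe if the caller already holds a reference (a reader that just found the pointer in a shared structure can race with the final release), and every acquire and release writes to the object's cache line, so a popular object becomes a point of contention.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct object {
    atomic_uint refs; /* number of outstanding references */
    int payload;
};

/* Creates an object with one reference owned by the caller. */
struct object *object_create(int payload)
{
    struct object *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    atomic_init(&o->refs, 1);
    o->payload = payload;
    return o;
}

/* Precondition: the caller already holds a reference. Without that, the
 * object may be freed between reading the pointer and incrementing. */
void object_acquire(struct object *o)
{
    unsigned int prev = atomic_fetch_add_explicit(&o->refs, 1,
                                                  memory_order_relaxed);
    assert(prev > 0);
    (void)prev;
}

/* Drops one reference. Returns true if this call freed the object. */
bool object_release(struct object *o)
{
    unsigned int prev = atomic_fetch_sub_explicit(&o->refs, 1,
                                                  memory_order_acq_rel);
    assert(prev > 0);
    if (prev == 1) {
        free(o);
        return true;
    }
    return false;
}
```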