The tricky truth about parallel execution and modern hardware
Dirkjan Bussink
@dbussink
Slide 4
Slide 4 text
Causality
Slide 5
Slide 5 text
a = 1
b = ""
Slide 6
Slide 6 text
CPU1        CPU2
a = 1       x = b
b = ""      y = a

Initial values:
a = 0
b = 0
Slide 7
Slide 7 text
CPU1        CPU2
a = 1       x = b
b = ""      y = a

x = ""
y = 1
Slide 8
Slide 8 text
CPU1        CPU2
a = 1       x = b
b = ""      y = a

x = 0
y = 0
Slide 9
Slide 9 text
CPU1        CPU2
a = 1       x = b
b = ""      y = a

x = 0
y = 1
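The three outcomes on the preceding slides can be checked by brute force. The sketch below is an illustration, not part of the original slides. It assumes both variables start at 0 and that each CPU executes its two statements strictly in program order, then merges the two programs in every possible way. The helper names `interleavings` and `outcomes` are made up for this example.

```ruby
# Every merge of two instruction lists that keeps each list's own order.
def interleavings(left, right)
  return [right] if left.empty?
  return [left] if right.empty?

  interleavings(left.drop(1), right).map { |rest| [left.first, *rest] } +
    interleavings(left, right.drop(1)).map { |rest| [right.first, *rest] }
end

# The two programs from the slides, written as steps on a shared hash.
CPU1 = [
  ->(m) { m[:a] = 1 },
  ->(m) { m[:b] = "" }
].freeze
CPU2 = [
  ->(m) { m[:x] = m[:b] },
  ->(m) { m[:y] = m[:a] }
].freeze

# Run every interleaving from the assumed initial state a = 0, b = 0
# and collect the distinct (x, y) pairs.
def outcomes(cpu1, cpu2)
  interleavings(cpu1, cpu2).map do |schedule|
    memory = { a: 0, b: 0 }
    schedule.each { |step| step.call(memory) }
    [memory[:x], memory[:y]]
  end.uniq
end

OUTCOMES = outcomes(CPU1, CPU2)
OUTCOMES.each { |x, y| puts "x = #{x.inspect}, y = #{y.inspect}" }
```

Under these assumptions only three (x, y) pairs appear, and the pair x = "", y = 0 is not among them.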
Slide 10
Slide 10 text
?
Slide 11
Slide 11 text
x = ""
y = 0
Slide 12
Slide 12 text
Wat?
Slide 13
Slide 13 text
Compiler optimization
Slide 14
Slide 14 text
a = 1
b = ""
Slide 15
Slide 15 text
b = ""
a = 1
Slide 16
Slide 16 text
CPU1        CPU2
b = ""      x = b
a = 1       y = a

x = ""
y = 0
Slide 17
Slide 17 text
Out-of-order execution
Slide 18
Slide 18 text
8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations
Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A
Slide 19
Slide 19 text
Ordering 1:
store a, 1
store b, ""
x = load b
y = load a

x == ""
y == 1

Ordering 2:
x = load b
y = load a
store a, 1
store b, ""

x == 0
y == 0

Ordering 3:
store a, 1
x = load b
y = load a
store b, ""

x == 0
y == 1
Slide 20
Slide 20 text
Not all architectures are created equal
Slide 21
Slide 21 text
ARMv7
Slide 22
Slide 22 text
store b, ""
x = load b
y = load a
store a, 1

x == ""
y == 0
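To see why this ordering yields x == "" and y == 0, it can help to replay it step by step. The sketch below is only an illustration: `replay` is a hypothetical helper, and the initial values a = 0, b = 0 are assumed from the earlier slides.

```ruby
# Replay one fixed global order of loads and stores against shared memory.
def replay(schedule)
  memory = { a: 0, b: 0 } # assumed initial values
  registers = {}
  schedule.each do |op, target, operand|
    case op
    when :store then memory[target] = operand
    when :load  then registers[target] = memory.fetch(operand)
    else raise ArgumentError, "unknown op #{op.inspect}"
    end
  end
  registers
end

# The ordering from the slide: the store to b has moved ahead of the
# store to a, and both loads run in between.
REORDERED = [
  [:store, :b, ""],
  [:load,  :x, :b],
  [:load,  :y, :a],
  [:store, :a, 1]
].freeze

RESULT = replay(REORDERED)
puts RESULT.inspect
```

The load of b already sees "", while the load of a still sees the initial 0, because the store to a has not happened yet in this order.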
8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations
Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A
Slide 33
Slide 33 text
CPU1        CPU2
a = 1       b = ""
x = b       y = a

x = 0
y = 0
Slide 34
Slide 34 text
Fixing it!
Slide 35
Slide 35 text
Memory barriers
Slide 36
Slide 36 text
__asm__ __volatile__ ("mfence" ::: "memory");
Slide 37
Slide 37 text
sfence
lfence
mfence
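A toy model can show what a full fence buys for the x = 0, y = 0 case. The sketch below is an assumption-laden illustration, not how any real CPU is implemented: it assumes CPU1 runs `a = 1; x = b`, CPU2 runs `b = ""; y = a`, both variables start at 0, and each CPU has a private store buffer whose contents reach memory at some later point. The `fence:` flag stands in for an `mfence` between the store and the load by forcing the buffer to drain first. All names are hypothetical.

```ruby
# Every merge of two event lists that keeps each list's own order.
def interleavings(left, right)
  return [right] if left.empty?
  return [left] if right.empty?

  interleavings(left.drop(1), right).map { |rest| [left.first, *rest] } +
    interleavings(left, right.drop(1)).map { |rest| [right.first, *rest] }
end

# Without a fence the buffered store may drain before or after the load.
def cpu_orders(cpu, fence:)
  flush_first = [[cpu, :store], [cpu, :flush], [cpu, :load]]
  load_first  = [[cpu, :store], [cpu, :load], [cpu, :flush]]
  fence ? [flush_first] : [flush_first, load_first]
end

PROGRAM = {
  cpu1: { store: [:a, 1],  load: [:x, :b] },
  cpu2: { store: [:b, ""], load: [:y, :a] }
}.freeze

def run(schedule)
  memory = { a: 0, b: 0 }
  buffers = { cpu1: {}, cpu2: {} }
  results = {}
  schedule.each do |cpu, action|
    case action
    when :store
      var, value = PROGRAM[cpu][:store]
      buffers[cpu][var] = value          # visible only to this CPU for now
    when :flush
      memory.merge!(buffers[cpu])        # store becomes globally visible
      buffers[cpu].clear
    when :load
      reg, var = PROGRAM[cpu][:load]
      results[reg] = buffers[cpu].fetch(var) { memory[var] }
    end
  end
  [results[:x], results[:y]]
end

def outcomes(fence:)
  cpu_orders(:cpu1, fence: fence).product(cpu_orders(:cpu2, fence: fence))
    .flat_map { |one, two| interleavings(one, two) }
    .map { |schedule| run(schedule) }
    .uniq
end

WITHOUT_FENCE = outcomes(fence: false)
WITH_FENCE = outcomes(fence: true)
puts "without fence: #{WITHOUT_FENCE.inspect}"
puts "with fence:    #{WITH_FENCE.inspect}"
```

In this model the pair [0, 0] is reachable only when the load is allowed to run ahead of the buffered store; forcing the drain first removes it.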
Slide 38
Slide 38 text
Double-checked locking
Slide 39
Slide 39 text
class Foo
  @mutex = Mutex.new

  def initialize
    @bar = "Bar"
  end

  def self.instance
    unless @instance
      @mutex.synchronize do
        unless @instance
          @instance = Foo.new
        end
      end
    end
    @instance
  end
end
Slide 40
Slide 40 text
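class Foo
  @mutex = Mutex.new

  def initialize
    @bar = "Bar"
  end

  def self.instance
    unless @instance
      @mutex.synchronize do
        # Second check, as on the previous slide: another thread may have
        # created the instance while this one waited for the lock.
        unless @instance
          instance = Foo.new
          # Insert compiler barrier
          @instance = instance
        end
      end
    end
    @instance
  end
end

Explicit synchronization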
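One way to read "explicit synchronization" in plain Ruby is to drop the unlocked first check and always take the mutex. The sketch below is an assumption about what that could look like, not the author's code: it trades the lock-free fast path for code that relies only on `Mutex#synchronize` for ordering.

```ruby
class Foo
  @mutex = Mutex.new
  @instance = nil

  class << self
    # Every caller takes the lock, so the instance is only ever read or
    # published while holding the mutex.
    def instance
      @mutex.synchronize { @instance ||= new }
    end
  end

  def initialize
    @bar = "Bar"
  end
end

# Ask for the singleton from several threads at once.
INSTANCES = Array.new(8) { Thread.new { Foo.instance } }.map(&:value)
puts INSTANCES.map(&:object_id).uniq.size
```

The cost is a lock acquisition on every call, which is the overhead double-checked locking tries to avoid.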
Slide 41
Slide 41 text
Ruby?
Slide 42
Slide 42 text
False sharing
Slide 43
Slide 43 text
CPU Cache
Slide 44
Slide 44 text
class Foo
  attr_accessor :a
end

f = Foo.new

# CPU1
i = 0
while i < 100000
  f.a = 0
  i += 1
end

# CPU2
i = 0
while i < 100000
  f.a = 1
  i += 1
end

class Foo
  attr_accessor :a
  attr_accessor :b
end

f = Foo.new

# CPU1
i = 0
while i < 100000
  f.b = 0
  i += 1
end

# CPU2
i = 0
while i < 100000
  f.a = 1
  i += 1
end
Slide 47
Slide 47 text
Cache lines
Slide 48
Slide 48 text
class Foo
  attr_accessor :a
  ...
  attr_accessor :k
end

f = Foo.new

# CPU1
i = 0
while i < 100000
  f.k = 0
  i += 1
end

# CPU2
i = 0
while i < 100000
  f.a = 1
  i += 1
end
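A little arithmetic suggests why writing to k instead of b helps. The sketch below rests on assumptions that are not in the slides: 64-byte cache lines, one 8-byte slot per instance variable, and slots stored contiguously in declaration order from a to k. Real object layouts differ between Ruby implementations, and `cache_line_of` is a hypothetical helper.

```ruby
CACHE_LINE_BYTES = 64 # assumption: a common cache line size
SLOT_BYTES = 8        # assumption: one 8-byte reference per instance variable

# Cache line index of a slot, given where the first slot starts.
def cache_line_of(slot_index, first_slot_offset: 0)
  (first_slot_offset + slot_index * SLOT_BYTES) / CACHE_LINE_BYTES
end

SLOTS = ("a".."k").to_a
LINES = SLOTS.each_with_index.map { |name, index| [name, cache_line_of(index)] }.to_h

LINES.each { |name, line| puts "#{name}: cache line #{line}" }
```

Under these assumptions a and k sit 80 bytes apart, which is more than one cache line, so they cannot share a line wherever the object happens to start. Neighbours such as a and b usually do.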
Slide 49
Slide 49 text
                    real
single thread        4.252730
actual sharing      19.963792
false sharing       19.803237
no false sharing     4.617507
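Measurements like these could be gathered with a small harness along the following lines. This is a sketch under assumptions, not the benchmark behind the numbers above: `hammer` and `time_pair` are hypothetical helpers, and the timing uses `Benchmark.realtime` from the standard library. On an interpreter with a global lock, such as CRuby, the two threads do not write in parallel, so the slowdown is not expected to show up there.

```ruby
require "benchmark"

class Foo
  attr_accessor :a, :b
end

ITERATIONS = 100_000

# Write the same value to one attribute over and over.
def hammer(object, setter, value)
  i = 0
  while i < ITERATIONS
    object.public_send(setter, value)
    i += 1
  end
end

# Time two threads writing concurrently through the given setters.
def time_pair(object, first_setter, second_setter)
  Benchmark.realtime do
    [
      Thread.new { hammer(object, first_setter, 0) },
      Thread.new { hammer(object, second_setter, 1) }
    ].each(&:join)
  end
end

SHARED = Foo.new
ACTUAL_SHARING = time_pair(SHARED, :a=, :a=) # both threads write a
FALSE_SHARING = time_pair(SHARED, :b=, :a=)  # neighbouring attributes

puts format("actual sharing %.6f", ACTUAL_SHARING)
puts format("false sharing  %.6f", FALSE_SHARING)
```

The "no false sharing" case would need a class with enough attributes to push the two written slots onto different cache lines, as in the a to k example.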