The tricky truth about parallel execution and modern hardware

Concurrency and parallelism in Ruby will only become more important. Machines are multi-core, and these days parallelization is often the way to speed things up.

At the hardware level, this parallel world is not always a nice and simple place to live. As Ruby implementations get faster and hardware becomes more parallel, these details will matter to you as a Ruby developer too.

Want to know what the pitfalls of double-checked locking are? No idea what out-of-order execution means? How CPU cache effects can lead to obscure crashes? What this thing called a memory barrier is? How false sharing can cause performance issues?

Come listen if you want to know about the nitty-gritty details that can affect your Ruby application in the future.

Dirkjan Bussink

November 09, 2013

Transcript

  1. 2.
  2. 3.
  3. 6.

    CPU1: a = 1; b = ""
    CPU2: x = b; y = a

    a = 0, b = 0
  4. 7.

    CPU1: a = 1; b = ""
    CPU2: x = b; y = a

    x = "", y = 1
  5. 8.

    CPU1: a = 1; b = ""
    CPU2: x = b; y = a

    x = 0, y = 0
  6. 9.

    CPU1: a = 1; b = ""
    CPU2: x = b; y = a

    x = 0, y = 1
  7. 10.

    ?

  8. 12.
  9. 16.

    CPU1: a = 1; b = ""
    CPU2: x = b; y = a

    x = "", y = 0
  10. 18.

    8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations

    Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A
  11. 19.

    store a, 1; store b, ""; x = load b; y = load a  =>  x == "", y == 1

    x = load b; y = load a; store a, 1; store b, ""  =>  x == 0, y == 0

    store a, 1; x = load b; y = load a; store b, ""  =>  x == 0, y == 1
  12. 21.
  13. 22.

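    store a, 1; store b, ""; x = load b; y = load a  =>  x == "", y == 1

    x = load b; y = load a; store a, 1; store b, ""  =>  x == 0, y == 0

    store a, 1; x = load b; y = load a; store b, ""  =>  x == 0, y == 1

    store b, ""; x = load b; y = load a; store a, 1  =>  x == "", y == 0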
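The orderings on the slides above can be checked mechanically. The sketch below is not from the talk. It assumes CPU1 runs the two stores and CPU2 runs the two loads, enumerates every interleaving that keeps each CPU's own program order, and collects the resulting (x, y) pairs. The names `interleavings`, `run`, and `outcomes` are made up for illustration.

```ruby
# Each CPU's program as [operation, variable, value-or-register] steps in program order.
CPU1 = [[:store, :a, 1], [:store, :b, ""]].freeze
CPU2 = [[:load, :b, :x], [:load, :a, :y]].freeze

# Yield every merge of two sequences that keeps each sequence's own order.
def interleavings(left, right, prefix = [], &block)
  return block.call(prefix + left + right) if left.empty? || right.empty?

  interleavings(left.drop(1), right, prefix + [left.first], &block)
  interleavings(left, right.drop(1), prefix + [right.first], &block)
end

# Run one ordering against memory that starts with a = 0, b = 0.
def run(steps)
  memory = { a: 0, b: 0 }
  registers = {}
  steps.each do |op, var, arg|
    if op == :store
      memory[var] = arg
    else
      registers[arg] = memory[var]
    end
  end
  [registers[:x], registers[:y]]
end

# All distinct [x, y] results over every interleaving of the two programs.
def outcomes(cpu1, cpu2)
  results = []
  interleavings(cpu1, cpu2) { |steps| results << run(steps) }
  results.uniq
end

in_order = outcomes(CPU1, CPU2)
# Swapping CPU1's two stores models the reordered case.
reordered = outcomes(CPU1.reverse, CPU2)

puts "in program order: #{in_order.inspect}"
puts "stores swapped:   #{reordered.inspect}"
```

Under these assumptions, the pair x == "", y == 0 shows up only in the second list, which is the case where the two stores become visible in the opposite order.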
  14. 25.

    L1: 4 cycles
    L2: 10 cycles
    L3: 40 - 75 cycles
    RAM: 60ns - 100ns, hundreds of cycles
  15. 28.

    CPU1: a = 1; x = b
    CPU2: b = ""; y = a

    a = 0, b = 0
  16. 29.

    CPU1: a = 1; x = b
    CPU2: b = ""; y = a

    x = 0, y = 1
  17. 30.

    CPU1: a = 1; x = b
    CPU2: b = ""; y = a

    x = "", y = 0
  18. 31.

    CPU1: a = 1; x = b
    CPU2: b = ""; y = a

    x = "", y = 1
  19. 32.

    8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations

    Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A
  20. 33.

    CPU1: a = 1; x = b
    CPU2: b = ""; y = a

    x = 0, y = 0
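Section 8.2.3.4 cited above describes loads being reordered with earlier stores. One common way to picture this is a per-core store buffer. The sketch below is a toy model, not taken from the talk and not how any real CPU is implemented. `Core` and `run_store_buffer_example` are hypothetical names, and the model assumes CPU1 runs a = 1; x = b while CPU2 runs b = ""; y = a.

```ruby
# A toy core: stores sit in a private buffer and reach shared memory only on drain.
class Core
  def initialize(memory)
    @memory = memory
    @store_buffer = {}
  end

  def store(var, value)
    @store_buffer[var] = value
  end

  # A load sees this core's own pending store, otherwise shared memory.
  def load(var)
    @store_buffer.fetch(var) { @memory[var] }
  end

  # Make all pending stores visible to the other core.
  def drain
    @memory.merge!(@store_buffer)
    @store_buffer.clear
  end
end

def run_store_buffer_example
  memory = { a: 0, b: 0 }
  cpu1 = Core.new(memory)
  cpu2 = Core.new(memory)

  cpu1.store(:a, 1)   # still sitting in CPU1's buffer
  cpu2.store(:b, "")  # still sitting in CPU2's buffer
  x = cpu1.load(:b)   # CPU2's store is not visible yet
  y = cpu2.load(:a)   # CPU1's store is not visible yet
  cpu1.drain
  cpu2.drain

  [x, y, memory]
end

x, y, memory = run_store_buffer_example
puts "x = #{x.inspect}, y = #{y.inspect}"
puts "memory after draining: #{memory.inspect}"
```

In this model both loads complete before either store is visible to the other core, so both read the initial 0 even though each CPU issued its store first.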
  21. 39.

    class Foo
      @mutex = Mutex.new

      def initialize
        @bar = "Bar"
      end

      def self.instance
        unless @instance
          @mutex.synchronize do
            unless @instance
              @instance = Foo.new
            end
          end
        end
        @instance
      end
    end
  22. 40.

    class Foo
      @mutex = Mutex.new

      def initialize
        @bar = "Bar"
      end

      def self.instance
        unless @instance
          @mutex.synchronize do
            instance = Foo.new
            # Insert compiler barrier
            @instance = instance
          end
        end
        @instance
      end
    end

    Explicit synchronization
  23. 41.
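For comparison with the double-checked locking slides above, here is a minimal sketch of the plainest alternative: take the lock on every call, so `@instance` is never read outside the mutex. This is not presented in the talk as the fix. It trades the unsynchronized fast path for a lock acquisition on every call to `Foo.instance`.

```ruby
class Foo
  # Created once, when the class body is evaluated.
  @mutex = Mutex.new

  def initialize
    @bar = "Bar"
  end

  # Every caller goes through the mutex, so no thread can observe
  # @instance while another thread is still assigning it.
  def self.instance
    @mutex.synchronize do
      @instance ||= new
    end
  end
end

# Eight threads race to create the singleton; all should get the same object.
instances = 8.times.map { Thread.new { Foo.instance } }.map(&:value)
puts instances.uniq.size
```

Whether the extra locking matters depends on how hot the call site is; the point of the sketch is only that correctness here does not depend on how stores are ordered.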
  24. 43.
  25. 44.

    class Foo
      attr_accessor :a
    end

    f = Foo.new

    # CPU1
    i = 0
    while i < 100000
      f.a = 0
      i += 1
    end

    # CPU2
    i = 0
    while i < 100000
      f.a = 1
      i += 1
    end
  26. 45.

    L1: 4 cycles
    L2: 10 cycles
    L3: 40 - 75 cycles
    RAM: 60ns - 100ns, hundreds of cycles

    Shared across cores
  27. 46.

    class Foo
      attr_accessor :a
      attr_accessor :b
    end

    f = Foo.new

    # CPU1
    i = 0
    while i < 100000
      f.b = 0
      i += 1
    end

    # CPU2
    i = 0
    while i < 100000
      f.a = 1
      i += 1
    end
  28. 48.

    class Foo
      attr_accessor :a
      ...
      attr_accessor :k
    end

    f = Foo.new

    # CPU1
    i = 0
    while i < 100000
      f.k = 0
      i += 1
    end

    # CPU2
    i = 0
    while i < 100000
      f.a = 1
      i += 1
    end
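The three variants above write to the same field, to neighbouring fields, and to fields `a` and `k` that sit eleven slots apart. A rough way to see why the distance matters is to work out which cache line each slot would land on. The sketch below rests on assumptions that are not from the talk: 64-byte cache lines, 8-byte instance-variable slots, and slots packed back to back starting on a line boundary. Real Ruby implementations lay objects out differently, and `cache_line_of` is a hypothetical helper.

```ruby
CACHE_LINE_BYTES = 64 # assumed cache line size
SLOT_BYTES = 8        # assumed size of one instance-variable slot

# Hypothetical layout: slot 0 starts on a line boundary, slots are contiguous.
def cache_line_of(slot_index)
  (slot_index * SLOT_BYTES) / CACHE_LINE_BYTES
end

# Map field names "a".."k" to slot indexes 0..10.
slots = ("a".."k").each_with_index.to_h

%w[a b k].each do |name|
  puts "#{name}: slot #{slots[name]}, cache line #{cache_line_of(slots[name])}"
end

puts "a and b share a line: #{cache_line_of(slots['a']) == cache_line_of(slots['b'])}"
puts "a and k share a line: #{cache_line_of(slots['a']) == cache_line_of(slots['k'])}"
```

Under these assumptions `a` and `b` fall on the same line, so two cores writing them keep invalidating each other's copy, while `a` and `k` fall on different lines and do not contend.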
  29. 50.
  30. 52.
  31. 56.
  32. 57.

    Fin