The tricky truth about parallel execution and modern hardware

Concurrency and parallelism in Ruby will only become more important. Machines are multi-core, and parallelization is often the way to speed things up these days.

At a hardware level, this parallel world is not always a nice and simple place to live. As Ruby implementations get faster and hardware more parallel, these details will matter for you as a Ruby developer too.

Want to know what the pitfalls of double-checked locking are? No idea what out-of-order execution means? How CPU cache effects can lead to obscure crashes? What this thing called a memory barrier is? How false sharing can cause performance issues?

Come listen if you want to know about the nitty-gritty details that can affect your Ruby application in the future.

Dirkjan Bussink

November 09, 2013

Transcript

  1. CPU1: a = 1; b = ""
    CPU2: x = b; y = a
    Initial values: a = 0, b = 0
  2. CPU1: a = 1; b = ""
    CPU2: x = b; y = a
    Result: x = "", y = 1
  3. CPU1: a = 1; b = ""
    CPU2: x = b; y = a
    Result: x = 0, y = 0
  4. CPU1: a = 1; b = ""
    CPU2: x = b; y = a
    Result: x = 0, y = 1
  5. ?
  6. CPU1: a = 1; b = ""
    CPU2: x = b; y = a
    Result: x = "", y = 0
  7. 8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations
    (Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A)
  8. store a, 1; store b, ""; x = load b; y = load a  =>  x == "", y == 1
    x = load b; y = load a; store a, 1; store b, ""  =>  x == 0, y == 0
    store a, 1; x = load b; y = load a; store b, ""  =>  x == 0, y == 1
  9. store b, ""; x = load b; y = load a; store a, 1  =>  x == "", y == 0
  10. L1: 4 cycles
    L2: 10 cycles
    L3: 40 - 75 cycles
    RAM: 60ns - 100ns (hundreds of cycles)
  11. CPU1: a = 1; x = b
    CPU2: b = ""; y = a
    Initial values: a = 0, b = 0
  12. CPU1: a = 1; x = b
    CPU2: b = ""; y = a
    Result: x = 0, y = 1
  13. CPU1: a = 1; x = b
    CPU2: b = ""; y = a
    Result: x = "", y = 0
  14. CPU1: a = 1; x = b
    CPU2: b = ""; y = a
    Result: x = "", y = 1
  15. 8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations
    (Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A)
  16. CPU1: a = 1; x = b
    CPU2: b = ""; y = a
    Result: x = 0, y = 0
  17. class Foo
        @mutex = Mutex.new

        def initialize
          @bar = "Bar"
        end

        def self.instance
          unless @instance
            @mutex.synchronize do
              unless @instance
                @instance = Foo.new
              end
            end
          end
          @instance
        end
      end
  18. class Foo
        @mutex = Mutex.new

        def initialize
          @bar = "Bar"
        end

        def self.instance
          unless @instance
            @mutex.synchronize do
              instance = Foo.new
              # Insert compiler barrier
              @instance = instance
            end
          end
          @instance
        end
      end

      Explicit synchronization
  19. class Foo
        attr_accessor :a
      end

      f = Foo.new

      # CPU1
      i = 0
      while i < 100000
        f.a = 0
        i += 1
      end

      # CPU2
      i = 0
      while i < 100000
        f.a = 1
        i += 1
      end
  20. L1: 4 cycles
    L2: 10 cycles
    L3: 40 - 75 cycles
    RAM: 60ns - 100ns (hundreds of cycles)
    Shared across cores
  21. class Foo
        attr_accessor :a
        attr_accessor :b
      end

      f = Foo.new

      # CPU1
      i = 0
      while i < 100000
        f.b = 0
        i += 1
      end

      # CPU2
      i = 0
      while i < 100000
        f.a = 1
        i += 1
      end
  22. class Foo
        attr_accessor :a
        ...
        attr_accessor :k
      end

      f = Foo.new

      # CPU1
      i = 0
      while i < 100000
        f.k = 0
        i += 1
      end

      # CPU2
      i = 0
      while i < 100000
        f.a = 1
        i += 1
      end
  23. Fin
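
Slides 1 to 16 ask which values of x and y two CPUs can observe. The sketch below is one way to check the "expected" answers: it enumerates every interleaving that keeps each CPU's instructions in program order (a sequentially consistent execution) and collects the results. The constant and method names are illustrative, not from the talk, and the model deliberately ignores the reordering that the slides go on to describe.

```ruby
# Each instruction is a lambda that reads or writes a shared state hash.
# First program (slides 1-9): CPU1 stores a then b, CPU2 loads b then a.
FIRST_CPU1  = [->(s) { s[:a] = 1 },      ->(s) { s[:b] = "" }].freeze
FIRST_CPU2  = [->(s) { s[:x] = s[:b] },  ->(s) { s[:y] = s[:a] }].freeze

# Second program (slides 11-16): each CPU does one store, then one load.
SECOND_CPU1 = [->(s) { s[:a] = 1 },      ->(s) { s[:x] = s[:b] }].freeze
SECOND_CPU2 = [->(s) { s[:b] = "" },     ->(s) { s[:y] = s[:a] }].freeze

# Yields every merge of the two streams that preserves each stream's order.
def interleavings(left, right, prefix = [], &block)
  if left.empty? && right.empty?
    yield prefix
    return
  end
  interleavings(left.drop(1), right, prefix + [left.first], &block) unless left.empty?
  interleavings(left, right.drop(1), prefix + [right.first], &block) unless right.empty?
end

# Runs every interleaving from the initial state a = 0, b = 0 and returns
# the distinct [x, y] pairs that were observed.
def outcomes(cpu1, cpu2)
  results = []
  interleavings(cpu1, cpu2) do |schedule|
    state = { a: 0, b: 0, x: nil, y: nil }
    schedule.each { |instruction| instruction.call(state) }
    results << [state[:x], state[:y]]
  end
  results.uniq
end

p outcomes(FIRST_CPU1, FIRST_CPU2)
p outcomes(SECOND_CPU1, SECOND_CPU2)
```

In this model the first program never yields x == "" with y == 0, and the second never yields x == 0 with y == 0. Those are exactly the results shown on slides 6, 9 and 16, which is why they need reordering to explain them.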
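
Slides 17 and 18 show double-checked locking, where the first `unless @instance` check runs without holding the mutex. A conservative alternative, sketched here under the assumption that `Mutex#synchronize` gives the usual acquire/release ordering of a lock, is to take the lock on every call. `SafeFoo` is a hypothetical name chosen to mirror `Foo` from the slides, and this is not presented as the fix the talk recommends.

```ruby
# Hypothetical variant of Foo that never reads @instance outside the lock.
class SafeFoo
  @mutex = Mutex.new
  @instance = nil

  attr_reader :bar

  def initialize
    @bar = "Bar"
  end

  # Every caller takes the mutex, so a thread can only see @instance after
  # the thread that created it has left the synchronized block.
  def self.instance
    @mutex.synchronize do
      @instance ||= new
    end
  end
end

# All threads should receive the same, fully initialized object.
instances = Array.new(8) { Thread.new { SafeFoo.instance } }.map(&:value)
puts instances.map(&:object_id).uniq.size
puts instances.first.bar
```

The trade-off is that every call pays for the lock, which is the cost double-checked locking tries to avoid.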
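
Slides 19 to 22 have two CPUs writing to the same attribute, to neighbouring attributes, and to attributes `a` and `k`, which are declared far apart. A rough way to reason about this is to compute which cache line each attribute would land on. The numbers below are assumptions, not from the slides: 64-byte cache lines, 8 bytes per instance variable slot, and slots laid out contiguously from a line-aligned address with no object header. Real layouts depend on the Ruby implementation.

```ruby
# Assumed hardware and layout parameters (illustrative only).
CACHE_LINE_BYTES = 64
SLOT_BYTES = 8

# Slot index of each attribute :a through :k, in declaration order.
SLOTS = ("a".."k").map(&:to_sym).each_with_index.to_h

# Index of the cache line that holds the slot at the given position.
def cache_line_of(slot_index)
  (slot_index * SLOT_BYTES) / CACHE_LINE_BYTES
end

# True when both attributes sit on the same cache line, so a write to one
# from one core invalidates the other core's cached copy of the line.
def same_cache_line?(attr_a, attr_b)
  cache_line_of(SLOTS.fetch(attr_a)) == cache_line_of(SLOTS.fetch(attr_b))
end

puts same_cache_line?(:a, :a) # slide 19: both CPUs write f.a
puts same_cache_line?(:a, :b) # slide 21: f.a and f.b
puts same_cache_line?(:a, :k) # slide 22: f.a and f.k
```

Under these assumptions `a` and `b` share a line even though the two loops never touch the same attribute, which is the false sharing case, while `k` sits 80 bytes in and falls on the next line.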