LMAX Architecture with Disruptors: 6M Transactions per Second

How to achieve very high throughput for Java-based SEDA architectures without queues.

Stephan Schmidt

April 20, 2012

Transcript

  1. 3

  2. brands4friends: No. 1 shopping club in Germany, an eBay company
    > 360k daily visitors
    > 4.5M users
    20.04.12, WJAX 2011
  3. 6

  4. 7

  5. Development at brands4friends
    Team: Java and web developers, data warehouse developers
    Process: Scrum since 2009, Kanban for DWH since 2012
  6. LMAX - The London Multi-Asset Exchange
    "We aim to build the highest performance financial exchange in the world"
  7. Stuff that did not work for various reasons
    1. RDBMS
    2. Actors
    3. SEDA
    4. J2EE
    …
    [Diagram: Service / Transaction Processor as a pipeline of stages (Receive, Unmarshal, Replicate, Journal, Business Logic, Marshal, Send) with a queue between each pair of stages]
  8. [Diagram: Linked List Queue (Size, Add, Remove, chain of Nodes) and Array Queue (Add, Remove, Cache Lines)]
  9. Queue as a data structure: Problems with Queues
    1. Reading (take) and writing (add) are both write accesses => write contention
    2. Write contention is solved with locks
       1. Other solutions include deques
    3. Locks lead to context switches to the kernel
       1. Context switches lead to CPU cache misses etc.
       2. The kernel might use the opportunity to do other stuff as well
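The first two points can be illustrated with a minimal sketch of a lock-based array queue. The class `LockedArrayQueue` and its fields are hypothetical names chosen for illustration, not code from the talk or from LMAX. Note that `take()` only "reads" from the caller's point of view; internally it writes `head` and `size`, so producers and consumers contend on the same lock and the same fields.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative bounded queue (hypothetical, not LMAX code). */
final class LockedArrayQueue {
    private final long[] items;
    private final ReentrantLock lock = new ReentrantLock();
    private int head; // written by take()
    private int tail; // written by add()
    private int size; // written by BOTH add() and take() => write contention

    LockedArrayQueue(int capacity) {
        items = new long[capacity];
    }

    /** Appends a value; returns false when the queue is full. */
    boolean add(long value) {
        lock.lock(); // producers and consumers compete for this one lock
        try {
            if (size == items.length) {
                return false;
            }
            items[tail] = value;
            tail = (tail + 1) % items.length;
            size++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Removes the oldest value; returns null when the queue is empty. */
    Long take() {
        lock.lock();
        try {
            if (size == 0) {
                return null;
            }
            long value = items[head];
            head = (head + 1) % items.length; // a "read" that mutates shared state
            size--;
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```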
  10. Lock costs according to the LMAX paper

    Method                               Time in ms
    Single thread                               300
    Single thread with lock                  10,000
    Two threads with lock                   224,000
    Single thread with CAS                    5,700
    Two threads with CAS                     30,000
    Single thread with volatile write         4,700

    CAS = "Compare And Swap" (AtomicReference etc. in Java) => no context switch; memory read/write barrier
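The single-threaded rows of this comparison can be sketched in Java as below. This is a rough illustration, not the benchmark from the LMAX paper: the class and method names are made up, only the single-threaded variants are shown, and the timings it prints depend heavily on the JIT and the hardware (the plain loop in particular may be optimized away), so they should not be compared with the numbers above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: four ways to increment a counter n times. */
final class CounterCostSketch {
    private static volatile long volatileCounter;

    /** Plain local increment, no synchronization at all. */
    static long plain(long n) {
        long c = 0;
        for (long i = 0; i < n; i++) {
            c++;
        }
        return c;
    }

    /** Every increment takes and releases a lock. */
    static long withLock(long n) {
        ReentrantLock lock = new ReentrantLock();
        long c = 0;
        for (long i = 0; i < n; i++) {
            lock.lock();
            try {
                c++;
            } finally {
                lock.unlock();
            }
        }
        return c;
    }

    /** Every increment is an explicit compare-and-swap retry loop. */
    static long withCas(long n) {
        AtomicLong c = new AtomicLong();
        for (long i = 0; i < n; i++) {
            long current;
            do {
                current = c.get();
            } while (!c.compareAndSet(current, current + 1));
        }
        return c.get();
    }

    /** Every iteration performs one volatile write. */
    static long withVolatileWrite(long n) {
        volatileCounter = 0;
        for (long i = 0; i < n; i++) {
            volatileCounter = i + 1;
        }
        return volatileCounter;
    }

    public static void main(String[] args) {
        long n = 50_000_000L; // assumption: any large iteration count will do
        long t0 = System.nanoTime();
        plain(n);
        long t1 = System.nanoTime();
        withLock(n);
        long t2 = System.nanoTime();
        withCas(n);
        long t3 = System.nanoTime();
        withVolatileWrite(n);
        long t4 = System.nanoTime();
        System.out.println("plain ms:    " + (t1 - t0) / 1_000_000);
        System.out.println("lock ms:     " + (t2 - t1) / 1_000_000);
        System.out.println("CAS ms:      " + (t3 - t2) / 1_000_000);
        System.out.println("volatile ms: " + (t4 - t3) / 1_000_000);
    }
}
```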
  11. Pre-Allocation of Buckets
    [Diagram: ring buffer with 2^5 = 32 slots (0 to 31), a Publisher and an Event Processor]
    • No (less) GC problems
    • Objects are near each other in memory => cache friendly
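Pre-allocation can be sketched as follows. `PreallocatedRing` and `Event` are hypothetical names, not the Disruptor API. All slots are created once up front and reused, so no garbage is produced per event; because the size is a power of two, a sequence number maps to a slot with a cheap bit mask. Allocating the events back to back makes it likely, though the JVM does not guarantee it, that they sit close together in memory.

```java
/** Hypothetical sketch of a pre-allocated ring of reusable event objects. */
final class PreallocatedRing {
    /** Mutable event that is overwritten in place instead of re-allocated. */
    static final class Event {
        long value;
    }

    private final Event[] slots;
    private final int mask;

    PreallocatedRing(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new Event[size];
        for (int i = 0; i < size; i++) {
            slots[i] = new Event(); // allocate every bucket once, up front
        }
        mask = size - 1;
    }

    /** Maps an ever-increasing sequence number onto a slot (sequence mod size). */
    Event slotFor(long sequence) {
        return slots[(int) (sequence & mask)];
    }

    public static void main(String[] args) {
        PreallocatedRing ring = new PreallocatedRing(32); // 2^5 slots as on the slide
        ring.slotFor(31).value = 42;
        // Sequence 63 wraps around to the same slot as sequence 31.
        System.out.println(ring.slotFor(63).value); // prints 42
    }
}
```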
  12. Coordination
    [Diagram: ring buffer with 2^5 = 32 slots (0 to 31), a Publisher and an Event Processor]
    Claim Strategy
      1. Claim
      2. Write
      3. Make public by advancing sequence
    Wait Strategy
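The claim / write / publish steps and a wait strategy can be sketched for the simplest case: one publisher, one event processor, and busy spinning as the wait strategy. This is an illustration under those assumptions; all names are invented and the real Disruptor API looks different. `Thread.onSpinWait()` requires Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical single-producer, single-consumer sketch of claim/write/publish. */
final class ClaimWritePublishSketch {
    private final long[] slots;
    private final int mask;
    private final AtomicLong published = new AtomicLong(-1); // last sequence visible to the consumer
    private final AtomicLong consumed = new AtomicLong(-1);  // last sequence the consumer finished
    private long nextToClaim = 0;                            // touched by the producer thread only

    ClaimWritePublishSketch(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new long[size];
        mask = size - 1;
    }

    /** 1. Claim: take the next sequence, waiting until its slot has been consumed. */
    long claim() {
        long seq = nextToClaim++;
        while (seq - consumed.get() > slots.length) {
            Thread.onSpinWait();
        }
        return seq;
    }

    /** 2. Write into the claimed slot; not yet visible to the consumer. */
    void write(long seq, long value) {
        slots[(int) (seq & mask)] = value;
    }

    /** 3. Make public by advancing the sequence (a volatile write). */
    void publish(long seq) {
        published.set(seq);
    }

    /** Wait strategy (busy spin): returns the highest published sequence >= seq. */
    long waitFor(long seq) {
        long available;
        while ((available = published.get()) < seq) {
            Thread.onSpinWait();
        }
        return available;
    }

    long read(long seq) {
        return slots[(int) (seq & mask)];
    }

    void markConsumed(long seq) {
        consumed.set(seq);
    }

    /** Publishes 1..count from a second thread and sums the values on the caller's thread. */
    static long demo(int count) {
        ClaimWritePublishSketch ring = new ClaimWritePublishSketch(8);
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                long seq = ring.claim();
                ring.write(seq, i);
                ring.publish(seq);
            }
        });
        producer.start();
        long sum = 0;
        long next = 0;
        while (next < count) {
            long available = ring.waitFor(next); // may return a whole batch
            for (; next <= available; next++) {
                sum += ring.read(next);
            }
            ring.markConsumed(available);
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(demo(1000)); // prints 500500
    }
}
```

The only shared mutable state is the two sequence counters, each written by exactly one thread, so neither side needs a lock.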
  13. [Diagram: Service / Transaction Processor as a pipeline of stages (Receive, Unmarshal, Replicate, Journal, Business Logic, Marshal, Send) with a queue between each pair of stages; label: Latency]
  14. [Diagram components: Receiver, Input Disruptor, Journaler (File System), Replicator (HA Node), Un-Marshaller, Business Logic Handler, Output Disruptor, Marshaller, Publisher]
    Each stage can have multiple threads
  15. [Diagram: ring buffer with sequence positions 31, 24, 19 and 18 for the Receiver, Journaler, Replicator, Un-Marshaller and Business Logic Handler]
    Receiver writes on 31.
    Journaler and Replicator read on 24 and can move up the sequence to 30.
    Un-Marshaller can move beyond Journaler and Replicator up to 30.
    Business Logic Handler needs to stay behind all others.
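The gating rule on this slide comes down to taking a minimum: a handler may advance up to the lowest sequence among everything it depends on. The sketch below uses invented names, and it assumes that the Un-Marshaller is the handler at position 19 (the slide text does not say which handler sits at 19 and which at 18).

```java
/** Hypothetical sketch of sequence gating between dependent handlers. */
final class GatingSketch {
    /** Highest sequence a handler may process: the minimum of all sequences it depends on. */
    static long highestAvailable(long... upstreamSequences) {
        long min = Long.MAX_VALUE;
        for (long sequence : upstreamSequences) {
            min = Math.min(min, sequence);
        }
        return min;
    }

    public static void main(String[] args) {
        long receiverPublished = 30; // Receiver is still writing 31, so 30 is the last public slot
        long journaler = 24;
        long replicator = 24;
        long unMarshaller = 19;      // assumption, see the note above

        // Journaler, Replicator and Un-Marshaller depend only on the Receiver.
        System.out.println(highestAvailable(receiverPublished)); // prints 30

        // The Business Logic Handler must stay behind all three.
        System.out.println(highestAvailable(journaler, replicator, unMarshaller)); // prints 19
    }
}
```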
  16. LMAX Low Level Ideas
    1. Simple code
    2. Everything in memory
    3. Single threaded per CPU for business logic
    4. Business logic has no I/O; I/O is done somewhere else
    5. Scheduler "knows" dependencies of handlers
  17. 6M TPS? How did LMAX do it?
    10K+ TPS: if you don't do anything stupid (3 billion instructions on a modern CPU)
    x 10 => 100K+ TPS: clean, organized code; standard libraries
    x 10 => 1000K+ TPS: custom, cache-friendly collections; performance testing; controlled GC; very well modeled domain
  18. Images CC from Flickr: nimboo, imjustcreative, gremionis, justonlysteve, John_Scone,
    Matthias Wicke, irisgodd3ss, TunnelBug, alandd, seasonal wanderer, raulbarraltamayo, Gilmoth, Dunechaser, graftedno1
  19. Sources
    "Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads", Martin Thompson, Dave Farley, Michael Barker, Patricia Gee, Andrew Stewart, 2011
    "The LMAX Architecture", Martin Fowler, 2011, http://martinfowler.com/articles/lmax.html
    "How to do 100K+ TPS at less than 1ms latency", Martin Thompson, Michael Barker, 2010