LMAX Architecture with Disruptors: 6M Transactions per Second

The LMAX Architecture with Disruptors: 6M Transactions per Second
Stephan Schmidt, Vice CTO, brands4friends

Me
Stephan Schmidt, Vice CTO, brands4friends
@codemonkeyism
www.codemonkeyism.com

brands4friends
No. 1 shopping club in Germany
> 360k daily visitors
> 4.5M users
eBay company
20.04.12, WJAX 2011

Development at brands4friends
Team: Java and web developers, data warehouse developers
Process: Scrum since 2009, Kanban for DWH since 2012

LMAX - The London Multi-Asset Exchange
"We aim to build the highest performance financial exchange in the world"

High Performance Transaction Processing

Service / Transaction Processor
Receive, Unmarshal, Replicate, Journal, Business Logic, Marshal, Send

[Diagram: processing stages connected by queues]

[Chart: GHz, CPU cores]

Actors? SEDA?

Stuff that did not work for various reasons
1. RDBMS
2. Actors
3. SEDA
4. J2EE
…

LMAX Architecture


[Diagram: linked list queue (size, nodes, add, remove) compared with array queue (cache lines, add, remove)]

Queue as a data structure: Problems with Queues
1. Reading (take) and writing (add) are both write accesses => write contention
2. Write contention is solved with locks
   1. Other solutions include deques
3. Locks lead to context switches to the kernel
   1. Context switches lead to CPU cache misses etc.
   2. The kernel might use the opportunity to do other stuff as well

Lock Costs according to the LMAX Paper

Method                            Time in ms
Single thread                            300
Single thread with lock               10,000
Two threads with lock                224,000
Single thread with CAS                 5,700
Two threads with CAS                  30,000
Single thread / volatile write         4,700

CAS = "Compare And Swap" (AtomicReference etc. in Java) => no context switch
Memory read/write barrier
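The variants in the table can be sketched as a small Java program. This is a hypothetical micro-benchmark, assuming the measured operation is incrementing a 64-bit counter in a loop; the class name CounterCosts, the method names and the iteration count are made up for illustration, and the timings it prints will not match the table. A serious measurement would use a harness such as JMH, because the JIT compiler can optimize simple loops away.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Hypothetical micro-benchmark sketch; not the code behind the numbers above.
final class CounterCosts {
    // Assumption: a small iteration count so the sketch finishes quickly.
    static final long ITERATIONS = 5_000_000L;

    static volatile long volatileCounter;

    // Baseline: one thread, plain local variable.
    static long singleThread() {
        long counter = 0;
        for (long i = 0; i < ITERATIONS; i++) counter++;
        return counter;
    }

    // One thread, but every increment takes and releases a lock.
    static long singleThreadWithLock() {
        ReentrantLock lock = new ReentrantLock();
        long counter = 0;
        for (long i = 0; i < ITERATIONS; i++) {
            lock.lock();
            try { counter++; } finally { lock.unlock(); }
        }
        return counter;
    }

    // One thread, increment via an explicit compare-and-swap retry loop.
    static long singleThreadWithCas() {
        AtomicLong counter = new AtomicLong();
        for (long i = 0; i < ITERATIONS; i++) {
            long current;
            do {
                current = counter.get();
            } while (!counter.compareAndSet(current, current + 1));
        }
        return counter.get();
    }

    // Two threads share one AtomicLong; each does half of the increments.
    static long twoThreadsWithCas() {
        AtomicLong counter = new AtomicLong();
        Runnable half = () -> {
            for (long i = 0; i < ITERATIONS / 2; i++) counter.incrementAndGet();
        };
        Thread a = new Thread(half);
        Thread b = new Thread(half);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    // One thread, every store goes to a volatile field (write barrier, no lock).
    static long singleThreadVolatileWrite() {
        for (long i = 0; i < ITERATIONS; i++) volatileCounter = i + 1;
        return volatileCounter;
    }

    static void time(String label, LongSupplier run) {
        long start = System.nanoTime();
        long result = run.getAsLong();
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + ms + " ms (counter = " + result + ")");
    }

    public static void main(String[] args) {
        time("Single thread", CounterCosts::singleThread);
        time("Single thread with lock", CounterCosts::singleThreadWithLock);
        time("Single thread with CAS", CounterCosts::singleThreadWithCas);
        time("Two threads with CAS", CounterCosts::twoThreadsWithCas);
        time("Single thread / volatile write", CounterCosts::singleThreadVolatileWrite);
    }
}
```

Every variant ends with the same counter value; only the coordination mechanism around the increment differs.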

LMAX Data Structure – Ring Buffer
[Diagram: ring buffer with publisher and event processor]

Pre-Allocation of Buckets
[Diagram: ring buffer with 2^5 slots, numbered 0 to 31, with publisher and event processor]
•  No (fewer) GC problems
•  Objects are near each other in memory => cache friendly
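Pre-allocation can be sketched in a few lines of Java. This is a minimal illustration, not the Disruptor's own RingBuffer class: the names PreallocatedRing, Event and slotFor are hypothetical. It assumes a power-of-two size (such as the 2^5 above), so that an ever-growing sequence number maps to a slot with a cheap bit mask.

```java
// Minimal sketch of a pre-allocated ring; all names here are hypothetical.
final class PreallocatedRing {
    // Mutable event that is reused for the lifetime of the ring.
    static final class Event {
        long value;
    }

    private final Event[] slots;
    private final int mask;

    PreallocatedRing(int size) {
        if (Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new Event[size];
        // Allocate every slot once, up front: nothing is allocated while
        // events flow, so the garbage collector has little to do.
        for (int i = 0; i < size; i++) {
            slots[i] = new Event();
        }
        mask = size - 1;
    }

    // Maps an ever-increasing sequence number onto a slot; for a
    // power-of-two size, "sequence & mask" equals "sequence % size".
    Event slotFor(long sequence) {
        return slots[(int) (sequence & mask)];
    }

    int size() {
        return slots.length;
    }
}
```

With a ring of 32 slots, sequence 32 lands on the same Event object as sequence 0: the slot is overwritten and reused instead of being garbage collected.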

Coordination
[Diagram: ring buffer with 2^5 slots, numbered 0 to 31, with publisher and event processor]
Claim Strategy
1. Claim
2. Write
3. Make public by advancing the sequence
Wait Strategy
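The three steps and a wait strategy can be sketched with plain java.util.concurrent classes. This is a simplified illustration for exactly one publisher thread and one event processor thread, with busy spinning as the wait strategy; the class SingleProducerRing and its methods are hypothetical and are not the Disruptor library's API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch for ONE publisher thread and ONE event processor thread.
// All names are hypothetical; this is not the Disruptor library's API.
final class SingleProducerRing {
    private final long[] values;
    private final int mask;
    // Last sequence made visible to the event processor.
    private final AtomicLong published = new AtomicLong(-1);
    // Last sequence the event processor has finished reading.
    private final AtomicLong consumed = new AtomicLong(-1);
    // Only the single publisher thread touches this field.
    private long nextToClaim = 0;

    SingleProducerRing(int size) {
        if (Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        values = new long[size];
        mask = size - 1;
    }

    void publish(long value) {
        long sequence = nextToClaim++;                  // 1. claim
        // Do not overwrite a slot the event processor has not read yet.
        while (sequence - values.length > consumed.get()) {
            Thread.onSpinWait();
        }
        values[(int) (sequence & mask)] = value;        // 2. write
        published.set(sequence);                        // 3. make public
    }

    long take() {
        long sequence = consumed.get() + 1;
        // Wait strategy: busy spin until the publisher advances the sequence.
        while (published.get() < sequence) {
            Thread.onSpinWait();
        }
        long value = values[(int) (sequence & mask)];
        consumed.set(sequence);
        return value;
    }

    // Demo: publish 1..count on a second thread, sum them on this thread.
    static long sumThroughRing(int count) {
        SingleProducerRing ring = new SingleProducerRing(8);
        Thread publisher = new Thread(() -> {
            for (int i = 1; i <= count; i++) ring.publish(i);
        });
        publisher.start();
        long sum = 0;
        for (int i = 0; i < count; i++) sum += ring.take();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumThroughRing(1000));
    }
}
```

No lock is taken anywhere: the only coordination is two sequence counters, each written by exactly one thread and read by the other.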

[Diagram: processing stages connected by queues, annotated with latency]

[Diagram: Receive Message, Journal, Replicate, Unmarshal, Business Logic, Marshal, Send; two data structures]

LMAX Architecture
[Diagram: Input Disruptor, Business Logic Handler, three Output Disruptors]

[Diagram: Input Disruptor (Receiver, Journaler, Replicator, Un-Marshaller), Business Logic Handler, Output Disruptor (Publisher, Marshaller), HA Node, File System]
Each stage can have multiple threads

[Diagram: ring buffer positions 31, 24, 19 and 18 with Receiver, Journaler, Replicator, Un-Marshaller and Business Logic Handler]
Receiver writes on 31.
Journaler and Replicator read on 24 and can move up the sequence to 30.
Business Logic Handler needs to stay behind all others.
Un-Marshaller can move beyond Journaler and Replicator up to 30.
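The rule that a handler stays behind the handlers it depends on comes down to taking a minimum over their sequences. A minimal sketch, with a hypothetical class SequenceGating and illustrative sequence values: the Un-Marshaller position of 19 is an assumption, since the slide text does not say which handler sits at which number.

```java
// Sketch of sequence gating; class and method names are hypothetical.
final class SequenceGating {
    // Highest sequence a handler may process: it must not pass any of the
    // sequences it depends on, so it is limited by the slowest of them.
    static long highestAvailable(long... upstreamSequences) {
        long min = Long.MAX_VALUE;
        for (long sequence : upstreamSequences) {
            min = Math.min(min, sequence);
        }
        return min;
    }

    public static void main(String[] args) {
        long lastPublished = 30; // Receiver is writing 31, so 30 is the last public slot
        long journaler = 24;
        long replicator = 24;
        long unMarshaller = 19;  // assumption for illustration

        // Journaler, Replicator and Un-Marshaller depend only on the Receiver.
        System.out.println(highestAvailable(lastPublished));
        // The Business Logic Handler depends on all three of them.
        System.out.println(highestAvailable(journaler, replicator, unMarshaller));
    }
}
```

Under these assumed values, the first-stage handlers may each advance to 30 independently, while the Business Logic Handler may only go as far as the slowest of them.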

Java API

[Diagrams: four arrangements of one publisher P1 with consumers C1, C2, C3, C4; one arrangement of publishers P1, P2 with consumer C1]

Demo

LMAX Low Level Ideas
1. Simple code
2. Everything in memory
3. Single threaded per CPU for business logic
4. Business logic has no I/O, I/O is done somewhere else
5. Scheduler "knows" dependencies of handlers

6M TPS? How did LMAX do it?
3 billion instructions on a modern CPU
10K+ TPS: if you don't do anything stupid
x 10
100K+ TPS: clean, organized code; standard libraries
x 10
1000K+ TPS: custom, cache-friendly collections; performance testing; controlled GC; very well modeled domain

We’re looking for very good developers

Thanks! @codemonkeyism

Images CC from Flickr: nimboo, imjustcreative, gremionis, justonlysteve, John_Scone, Matthias Wicke, irisgodd3ss, TunnelBug, alandd, seasonal wanderer, raulbarraltamayo, Gilmoth, Dunechaser, graftedno1

Sources
"Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads", Martin Thompson, Dave Farley, Michael Barker, Patricia Gee, Andrew Stewart, 2011
"The LMAX Architecture", Martin Fowler, 2011, http://martinfowler.com/articles/lmax.html
"How to do 100K+ TPS at less than 1ms latency", Martin Thompson, Michael Barker, 2010
