Slide 1

Slide 1 text

NSQ
realtime distributed message processing at scale
https://github.com/bitly/nsq
May 15th 2013 - NYC Data Engineering Meetup
@imsnakes & @jehiah (infrastructure @bitly)
Thursday, May 16, 13

Slide 2

Slide 2 text

THE WAY OF THE BITLY

Slide 3

Slide 3 text

PHILOSOPHY
• service-oriented
• avoid SPOFs
• perform work asynchronously
• de-couple services and use messaging in between
• dependencies suck (make it easy to deploy)
• use HTTP and JSON

Slide 4

Slide 4 text

DATA FLOW
[Diagram: App]
❶ incoming request

Slide 5

Slide 5 text

DATA FLOW
[Diagram: App]
❶ incoming request
❷ sync persist data

Slide 6

Slide 6 text

DATA FLOW
[Diagram: App]
❶ incoming request
❷ sync persist data
❸ send response

Slide 7

Slide 7 text

DATA FLOW
[Diagram: App]
❶ incoming request
❷ sync persist data
❸ send response
❹ async queue message

Slide 8

Slide 8 text

DATA FLOW
[Diagram: App with steps ❶ ❷ ❸ ❹]
❹ async queue message (NSQ responsibilities)
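The four steps of the data flow can be sketched in a few lines. This is only an illustration under stated assumptions: `datastore`, `outbox`, `handle_request`, and `worker` are hypothetical names, a dict stands in for the datastore, and an in-process `queue.Queue` stands in for the message queue that NSQ would provide.

```python
import queue
import threading

# Hypothetical stand-ins: a dict as the datastore, a Queue as the message queue.
datastore = {}
outbox = queue.Queue()
processed = []


def handle_request(key, value):
    """Steps 1-4: take the request, persist, build the response, queue a message."""
    datastore[key] = value                       # step 2: sync persist data
    response = {"status": "ok", "key": key}      # step 3: response for the caller
    outbox.put({"event": "saved", "key": key})   # step 4: async queue message
    return response  # the queued work itself happens later, off the request path


def worker():
    """Drains queued messages in the background until it sees the None sentinel."""
    while True:
        message = outbox.get()
        if message is None:
            break
        processed.append(message)
```

The request handler only enqueues; anything slow (analytics, archiving, and so on) runs in the worker, so it cannot add latency to the response.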

Slide 9

Slide 9 text

MESSAGING PATTERNS

Slide 10

Slide 10 text

PUBSUB / MULTICAST
[Diagram: Producer sends m1 to PS; m1 is delivered to both ConsumerA and ConsumerB]
messages duplicated to multiple consumers
de-couple independent stream operations

Slide 11

Slide 11 text

DISTRIBUTION
[Diagram: Producer sends m1 and m2 to Q; two ConsumerA instances each receive one of them]
messages load balanced among a homogeneous group of consumers
horizontal scalability

Slide 12

Slide 12 text

DISTRIBUTION
[Diagram: Producer sends m1 and m2 to Q; two ConsumerA instances each receive one of them]
fault tolerance: in the face of consumer failure, other consumers (try to) pick up the slack

Slide 13

Slide 13 text

QUEUEING
[Diagram: Producer sends m1, m2, m3 to Q; both ConsumerA instances are marked X]
if consumers cannot keep up with producers, the queue is able to hold onto messages so they can be processed later

Slide 14

Slide 14 text

TYPICAL (TERRIBLE) ARCHITECTURE
[Diagram: Host A (API, queue, queuereader)]
• no delivery guarantee
• SPOFs
• inefficient
• complicated setup
• hard-coded config

Slide 15

Slide 15 text

TYPICAL (TERRIBLE) ARCHITECTURE
[Diagram: Host A (API, queue, queuereader); Host B (pubsub / multicast)]
• no delivery guarantee
• SPOFs
• inefficient
• complicated setup
• hard-coded config

Slide 16

Slide 16 text

TYPICAL (TERRIBLE) ARCHITECTURE
[Diagram: Host A (API, queue, queuereader); Host B (pubsub / multicast); Host C (queue, queuereader); relay]
• no delivery guarantee
• SPOFs
• inefficient
• complicated setup
• hard-coded config

Slide 17

Slide 17 text

TYPICAL (TERRIBLE) ARCHITECTURE
[Diagram: Host A (API, queue, queuereader); Host B (pubsub / multicast); Host C (queue, queuereader); relay; annotations: SPOF, SPOF, COMPLEX]
• no delivery guarantee
• SPOFs
• inefficient
• complicated setup
• hard-coded config

Slide 18

Slide 18 text

DESIGNING A SOLUTION

Slide 19

Slide 19 text

GOALS
• provide a straightforward upgrade path
• greatly simplify configuration requirements
• promote topologies that enable high-availability and eliminate SPOFs
• address the need for stronger message delivery guarantees
• bound the memory footprint of a single process
• improve efficiency
• data format and programming language agnostic

Slide 20

Slide 20 text

I WANT IT ALL

Slide 23

Slide 23 text

TOPICS AND CHANNELS
combine multicast, distribution, and queueing
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
[Diagram: nsqd with topic “clicks”]

Slide 24

Slide 24 text

TOPICS AND CHANNELS
combine multicast, distribution, and queueing
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
[Diagram: nsqd with topic “clicks”; channel “metrics”]

Slide 25

Slide 25 text

TOPICS AND CHANNELS
combine multicast, distribution, and queueing
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
[Diagram: nsqd with topic “clicks”; channels “metrics”, “spam_analysis”]

Slide 26

Slide 26 text

TOPICS AND CHANNELS
combine multicast, distribution, and queueing
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
[Diagram: nsqd with topic “clicks”; channels “metrics”, “spam_analysis”, “archive”]

Slide 27

Slide 27 text

TOPICS AND CHANNELS
combine multicast, distribution, and queueing
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
[Diagram: nsqd with topic “clicks”; channels “metrics”, “spam_analysis”, “archive”; consumers on separate hosts]

Slide 30

Slide 30 text

TOPICS AND CHANNELS
combine multicast, distribution, and queueing
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
[Diagram: nsqd with topic “clicks”; channels “metrics”, “spam_analysis”, “archive”; consumers A, A, A on separate hosts]

Slide 34

Slide 34 text

TOPICS AND CHANNELS
combine multicast, distribution, and queueing
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
[Diagram: nsqd with topic “clicks”; channels “metrics”, “spam_analysis”, “archive”; consumers A, A, A and B, B, B on separate hosts]
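The way topics and channels combine multicast and distribution can be modelled in memory. This is a toy sketch, not nsqd's implementation: the `Topic` and `Channel` classes are hypothetical, and round-robin delivery is an assumed simplification of however nsqd actually picks a consumer.

```python
from collections import deque
from itertools import cycle


class Channel:
    """An independent queue for one topic; its consumers share the load."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.consumers = []  # callables that accept one message

    def subscribe(self, handler):
        self.consumers.append(handler)

    def deliver_all(self):
        """Give each queued message to exactly one consumer (round-robin)."""
        if not self.consumers:
            return  # nobody is listening: messages simply stay queued
        turn = cycle(self.consumers)
        while self.queue:
            next(turn)(self.queue.popleft())


class Topic:
    """A stream of messages; every channel receives its own copy."""

    def __init__(self, name):
        self.name = name
        self.channels = {}

    def channel(self, name):
        # Channels are created at runtime, on first use.
        if name not in self.channels:
            self.channels[name] = Channel(name)
        return self.channels[name]

    def publish(self, message):
        # Multicast: copy the message into every channel's queue.
        for channel in self.channels.values():
            channel.queue.append(message)
```

With three handlers subscribed to “metrics” and one to “archive”, each message published to “clicks” reaches one of the three “metrics” handlers and also the “archive” handler, while a channel with no subscribers just accumulates messages.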

Slide 38

Slide 38 text

QUEUES
• topics and channels are independent queues
• queues have arbitrary high water marks (after which messages are transparently written to and read from disk, bounding memory footprint)
• supports channel-independent degradation and recovery
[Diagram labels: buffer, this channel, high water mark, persisted messages]
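A high water mark that bounds memory can be sketched as a queue that spills to a file once its in-memory buffer is full. This is a minimal illustration, not nsqd's on-disk format: `SpillQueue` and its length-prefixed record layout are assumptions, and the sketch makes no attempt to preserve strict ordering once messages overflow.

```python
import struct
import tempfile
from collections import deque


class SpillQueue:
    """Keeps at most `high_water_mark` messages in memory; the rest go to disk."""

    def __init__(self, high_water_mark):
        self.high_water_mark = high_water_mark
        self.memory = deque()
        self.disk = tempfile.TemporaryFile()  # binary read/write scratch file
        self.read_pos = 0   # offset of the next unread record on disk
        self.on_disk = 0    # number of unread records on disk

    def put(self, body: bytes):
        if len(self.memory) < self.high_water_mark:
            self.memory.append(body)
            return
        # Past the high water mark: append a length-prefixed record to the file.
        self.disk.seek(0, 2)
        self.disk.write(struct.pack(">I", len(body)) + body)
        self.on_disk += 1

    def get(self):
        if self.memory:
            return self.memory.popleft()
        if self.on_disk:
            self.disk.seek(self.read_pos)
            (size,) = struct.unpack(">I", self.disk.read(4))
            body = self.disk.read(size)
            self.read_pos = self.disk.tell()
            self.on_disk -= 1
            return body
        return None  # queue is empty
```

Callers never see where a message was stored, which is the sense in which the spill is transparent; memory use is capped by the high water mark regardless of how far consumers fall behind.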

Slide 39

Slide 39 text

DISCOVERY
remove the need for publishers and consumers to know about each other
[Diagram: producer, nsqd, and two nsqlookupd instances]

Slide 40

Slide 40 text

DISCOVERY
remove the need for publishers and consumers to know about each other
[Diagram: producer, nsqd, and two nsqlookupd instances]
❶ publish msg (specifying topic)

Slide 41

Slide 41 text

DISCOVERY
remove the need for publishers and consumers to know about each other
[Diagram: producer, nsqd, and two nsqlookupd instances]
❶ publish msg (specifying topic)
❷ IDENTIFY (persistent TCP connections)

Slide 42

Slide 42 text

DISCOVERY
remove the need for publishers and consumers to know about each other
[Diagram: producer, nsqd, and two nsqlookupd instances]
❶ publish msg (specifying topic)
❷ IDENTIFY (persistent TCP connections)
❸ REGISTER (topic/channel)

Slide 43

Slide 43 text

DISCOVERY (CLIENT)
remove the need for publishers and consumers to know about each other
[Diagram: consumer and two nsqlookupd instances]

Slide 44

Slide 44 text

DISCOVERY (CLIENT)
remove the need for publishers and consumers to know about each other
[Diagram: consumer and two nsqlookupd instances]
❶ regularly poll for topic producers (HTTP requests)

Slide 45

Slide 45 text

DISCOVERY (CLIENT)
remove the need for publishers and consumers to know about each other
[Diagram: consumer and two nsqlookupd instances]
❶ regularly poll for topic producers (HTTP requests)
❷ connect to all producers
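Registration and polling can be simulated without any network. The sketch below is an in-memory stand-in, not the nsqlookupd API: `Lookupd`, `register`, `lookup`, and `discover` are hypothetical names, and the addresses are made up for illustration.

```python
class Lookupd:
    """In-memory stand-in for one nsqlookupd instance (hypothetical class)."""

    def __init__(self):
        self.registrations = {}  # topic -> set of nsqd addresses

    def register(self, topic, nsqd_address):
        # What an nsqd would do when a topic is first published to it.
        self.registrations.setdefault(topic, set()).add(nsqd_address)

    def lookup(self, topic):
        return sorted(self.registrations.get(topic, ()))


def discover(lookupds, topic):
    """Poll every lookupd and return the union of producers for a topic.

    Instances are independent, so one being unreachable or out of date
    does not stop the consumer from finding producers via the others.
    """
    producers = set()
    for lookupd in lookupds:
        try:
            producers.update(lookupd.lookup(topic))
        except ConnectionError:
            continue  # skip this instance until the next poll
    return sorted(producers)
```

A consumer would call something like `discover` on a timer and open a connection to every address it has not yet connected to, so neither side needs hard-coded knowledge of the other.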

Slide 46

Slide 46 text

ELIMINATE ALL THE SPOF
• easily enable distributed and decentralized topologies
• no brokers
• consumers connect to all producers
• messages are pushed to consumers
• nsqlookupd instances are independent and require no coordination (run a few for HA)

Slide 47

Slide 47 text

ELIMINATE ALL THE SPOF
• easily enable distributed and decentralized topologies
• no brokers
• consumers connect to all producers
• messages are pushed to consumers
• nsqlookupd instances are independent and require no coordination (run a few for HA)
[Diagram: three nsqd instances]

Slide 48

Slide 48 text

ELIMINATE ALL THE SPOF
• easily enable distributed and decentralized topologies
• no brokers
• consumers connect to all producers
• messages are pushed to consumers
• nsqlookupd instances are independent and require no coordination (run a few for HA)
[Diagram: three nsqd instances; one consumer]

Slide 50

Slide 50 text

ELIMINATE ALL THE SPOF
• easily enable distributed and decentralized topologies
• no brokers
• consumers connect to all producers
• messages are pushed to consumers
• nsqlookupd instances are independent and require no coordination (run a few for HA)
[Diagram: three nsqd instances; two consumers]

Slide 52

Slide 52 text

[Diagram: three hosts, each running an API alongside nsqd; two consumers; two nsqlookupd instances]

Slide 53

Slide 53 text

[Diagram: three hosts, each running an API alongside nsqd; two consumers; two nsqlookupd instances]
Steps shown: PUBLISH

Slide 54

Slide 54 text

[Diagram: three hosts, each running an API alongside nsqd; two consumers; two nsqlookupd instances]
Steps shown: PUBLISH, REGISTER

Slide 55

Slide 55 text

[Diagram: three hosts, each running an API alongside nsqd; two consumers; two nsqlookupd instances]
Steps shown: PUBLISH, REGISTER, DISCOVER

Slide 56

Slide 56 text

[Diagram: three hosts, each running an API alongside nsqd; two consumers; two nsqlookupd instances]
Steps shown: PUBLISH, REGISTER, DISCOVER, SUBSCRIBE

Slide 57

Slide 57 text

MESSAGE GUARANTEES
• messages are delivered at least once
• handling is guaranteed by the protocol:
  • nsqd sends a message and stores it temporarily
  • client replies FIN (finish) or REQ (re-queue)
  • if the client does not reply, the message is automatically re-queued
• any single nsqd instance failure can result in message loss (can be mitigated)
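The FIN / REQ / timeout cycle above is what produces at-least-once delivery, and it can be sketched as a small state machine. This is a simulation under assumptions, not nsqd's code: `ChannelSim`, its method names, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all hypothetical.

```python
import itertools
from collections import deque


class ChannelSim:
    """Tracks in-flight messages and re-queues those that are not finished."""

    def __init__(self, msg_timeout):
        self.msg_timeout = msg_timeout
        self.queue = deque()      # (msg_id, body) waiting to be sent
        self.in_flight = {}       # msg_id -> (body, deadline)
        self._ids = itertools.count(1)

    def publish(self, body):
        self.queue.append((next(self._ids), body))

    def send(self, now):
        """Send the next message and hold on to it until the client replies."""
        if not self.queue:
            return None
        msg_id, body = self.queue.popleft()
        self.in_flight[msg_id] = (body, now + self.msg_timeout)
        return msg_id, body

    def fin(self, msg_id):
        """Client finished the message: forget it."""
        del self.in_flight[msg_id]

    def req(self, msg_id):
        """Client asked for a re-queue: put it back for another attempt."""
        body, _deadline = self.in_flight.pop(msg_id)
        self.queue.append((msg_id, body))

    def expire(self, now):
        """No reply before the deadline: re-queue automatically."""
        overdue = [m for m, (_b, deadline) in self.in_flight.items() if deadline <= now]
        for msg_id in overdue:
            self.req(msg_id)
```

A message leaves the system only on FIN, so a slow or crashed client causes a redelivery rather than a loss; the cost is that handlers may see the same message more than once and should tolerate that.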

Slide 58

Slide 58 text

CLIENT BEHAVIOR
• messages are pushed to clients (no polling!)
• clients manage flow via “RDY state”
• clients can perform 3 actions on a message:
  • finish
  • re-queue (optionally defer by a duration of time)
  • touch
• back off, i.e. slow down the rate of processing
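Flow control via RDY state can be pictured as a per-connection credit counter: the server pushes only while the client has credit left. The sketch below is a simplified model with hypothetical names (`Connection`, `set_rdy`, `push`), not the wire protocol itself.

```python
from collections import deque


class Connection:
    """One client connection as seen by the server."""

    def __init__(self):
        self.rdy = 0          # a new connection starts with no credit
        self.received = []

    def set_rdy(self, count):
        # The client announces how many messages it is prepared to handle.
        self.rdy = count


def push(queue, conn):
    """Push messages until the RDY count is used up or the queue is empty."""
    sent = 0
    while queue and conn.rdy > 0:
        conn.received.append(queue.popleft())
        conn.rdy -= 1   # each pushed message consumes one unit of credit
        sent += 1
    return sent
```

Backing off then amounts to the client setting a lower RDY count (or zero) for a while, which slows or pauses delivery without the server needing to know why.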

Slide 59

Slide 59 text

ONE MORE THING
• #ephemeral channels - runtime introspection
  • no backup beyond channel high water mark
  • automatically go away when last client disconnects
• server side channel pausing
  • administratively stop the flow of messages from a channel to its clients
  • no message loss (queue backs up)
  • really $#%^ing awesome for operations

Slide 60

Slide 60 text

OTHER SOLUTIONS
we knew you were going to ask about this...
• ZeroMQ - it’s a library, not a platform
• RabbitMQ, ActiveMQ - promote brokered topology (and AMQP’s original authors abandoned it to build ZeroMQ)
• Kafka - heavyweight, complex, designed for a different use case
• beanstalk, kestrel - just a better queue

Slide 61

Slide 61 text

IN PRODUCTION

Slide 62

Slide 62 text

TOOLING
• nsqadmin - web interface to administer and introspect an NSQ cluster at runtime (and empty, pause, or delete topics/channels)
• nsq_to_http - utility that helps transport an aggregate stream over HTTP
• nsq_to_file - utility that safely persists an aggregated stream to disk
• nsq_stat - iostat-like utility for a topic/channel
• nsq_tail - tail-like utility for a topic/channel

Slide 63

Slide 63 text

EXAMPLE CLIENTS
• Go Client - https://gist.github.com/4039222
• Synchronous Python Client - https://gist.github.com/3925081
• Async Python Client - https://gist.github.com/3925092

Slide 64

Slide 64 text

DEMO

Slide 65

Slide 65 text

Thanks
@imsnakes & @jehiah
https://github.com/bitly/nsq
shoutouts to @danielhfrank, @ploxiln, and @mccutchen