Slide 1

Slide 1 text

NSQ
realtime distributed message processing at scale
https://github.com/bitly/nsq
November 8th 2012 - NYC Golang Meetup
@imsnakes & @jehiah (infrastructure @bitly)

Slide 2

Slide 2 text

THE WAY OF THE BITLY

Slide 3

Slide 3 text

PHILOSOPHY
•service-oriented
•perform work asynchronously
•queue messages (locally)
•workers process messages (aka “queuereader”)
•scale # of workers (and backend) based on ability to handle message volume
•dependencies suck (make it easy to deploy)
•use HTTP and JSON

Slide 4

Slide 4 text

App ❶ DATA FLOW incoming request

Slide 5

Slide 5 text

App ❶ ❷ DATA FLOW incoming request sync persist data

Slide 6

Slide 6 text

App ❶ ❸ ❷ DATA FLOW incoming request sync persist data send response

Slide 7

Slide 7 text

App ❶ ❹ ❸ ❷ DATA FLOW incoming request sync persist data send response async queue message
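
The four steps on this slide can be sketched as a single HTTP handler. This is only an illustration of the flow, not bitly's code: the `persist` helper, the map used as a datastore, the `/shorten?v=` URL, and the buffered Go channel standing in for the local queue are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// persist stands in for the synchronous datastore write (hypothetical).
func persist(db map[string]bool, v string) { db[v] = true }

// newHandler wires the four steps from the slide into one HTTP handler.
func newHandler(db map[string]bool, queue chan<- string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) { // 1: incoming request
		v := r.URL.Query().Get("v")
		persist(db, v)      // 2: persist data synchronously
		fmt.Fprint(w, "OK") // 3: send response
		queue <- v          // 4: queue a message for asynchronous processing
	}
}

// demo drives one request through the handler and reports what happened.
func demo() (body string, persisted bool, queued string) {
	db := make(map[string]bool)
	queue := make(chan string, 1) // buffered so the handler does not block here
	rec := httptest.NewRecorder()
	req := httptest.NewRequest("GET", "/shorten?v=example", nil)
	newHandler(db, queue)(rec, req)
	return rec.Body.String(), db["example"], <-queue
}

func main() {
	body, persisted, queued := demo()
	fmt.Println(body, persisted, queued)
}
```

In a real deployment step 4 would publish to a queue process on the same host rather than to an in-process channel, and the datastore would need to be safe for concurrent requests.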

Slide 8

Slide 8 text

App ❶ ❹ ❸ ❷ DATA FLOW async queue message NSQ responsibilities

Slide 9

Slide 9 text

WHY QUEUE?
•try to avoid SPOFs
•queue(s) and worker(s) are silo’d
•in failure scenarios:
  •queues provide buffering
  •workers exponentially back off
  •messages are retried
•aka no data loss because things break
[diagram: NSQ QUEUE API consumer]

Slide 10

Slide 10 text

WHY QUEUE?
•try to avoid SPOFs
•queue(s) and worker(s) are silo’d
•in failure scenarios:
  •queues provide buffering
  •workers exponentially back off
  •messages are retried
•aka no data loss because things break
[diagram: NSQ QUEUE API consumer X]

Slide 11

Slide 11 text

WHY QUEUE?
•try to avoid SPOFs
•queue(s) and worker(s) are silo’d
•in failure scenarios:
  •queues provide buffering
  •workers exponentially back off
  •messages are retried
•aka no data loss because things break
[diagram: NSQ QUEUE API consumer X message backlog]
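
A minimal sketch of the "workers exponentially back off, messages are retried" behaviour. The function names (`backoff`, `process`, `flaky`), the doubling schedule, and the cap are assumptions chosen for illustration. The slides do not say which schedule bitly's queuereaders use.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// backoff returns the delay before retry number attempt (0-based):
// base, 2*base, 4*base, ... capped at max.
func backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

// process retries handle until it succeeds or attempts run out,
// sleeping with exponential backoff between tries.
// It returns the number of calls made and the last error.
func process(msg string, handle func(string) error, attempts int, base, max time.Duration) (int, error) {
	var err error
	for i := 0; i < attempts; i++ {
		if err = handle(msg); err == nil {
			return i + 1, nil
		}
		if i < attempts-1 {
			time.Sleep(backoff(i, base, max))
		}
	}
	return attempts, err
}

// flaky returns a handler that fails its first n calls,
// simulating a backend that is temporarily broken.
func flaky(n int) func(string) error {
	calls := 0
	return func(string) error {
		calls++
		if calls <= n {
			return errors.New("backend unavailable")
		}
		return nil
	}
}

func main() {
	tries, err := process("msg", flaky(2), 5, 10*time.Millisecond, time.Second)
	fmt.Println(tries, err)
}
```

While the worker is sleeping, the queue in front of it keeps accepting messages, which is the buffering the slide refers to.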

Slide 12

Slide 12 text

TYPICAL (OLD) ARCHITECTURE Host A API simplequeue queuereader

Slide 13

Slide 13 text

TYPICAL (OLD) ARCHITECTURE Host A API simplequeue queuereader Host B pubsub

Slide 14

Slide 14 text

TYPICAL (OLD) ARCHITECTURE Host A API simplequeue queuereader Host B pubsub Host C simplequeue queuereader ps_to_http

Slide 15

Slide 15 text

TYPICAL (OLD) ARCHITECTURE Host A API simplequeue queuereader Host B pubsub Host C simplequeue queuereader ps_to_http SPOF SPOF COMPLEX

Slide 16

Slide 16 text

THE BAD
•no message guarantees and clients are responsible for re-queueing
•bottleneck and SPOF transport issue (reconnects, distribution, fault tolerance, etc.)
•inefficiency - we copy streams multiple times to multiple systems
•complicated service setup repeated for each stream
•hard-coded knowledge of queue addresses in queuereaders
•lack of internal performance metrics (clients, rate, etc.)

Slide 17

Slide 17 text

DESIGNING A SOLUTION

Slide 18

Slide 18 text

GOALS
•provide a straightforward upgrade path
•greatly simplify configuration requirements
•provide easy topology solutions that enable high-availability and eliminate SPOFs
•address the need for stronger message delivery guarantees
•bound the memory footprint of a single process
•improve efficiency
•out of the box library support for Go and Python

Slide 19

Slide 19 text

SIMPLIFY CONFIGURATION
•a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
•a channel is an independent queue for a topic (a topic can have multiple channels)
•consumers discover producers by querying nsqlookupd (a discovery service for topics)
•topics and channels are created at runtime (just start publishing/subscribing)
[diagram: nsqd “clicks” Topics]

Slide 20

Slide 20 text

SIMPLIFY CONFIGURATION • a topic is a distinct stream of messages (a single nsqd instance can have multiple topics) • a channel is an independent queue for a topic (a topic can have multiple channels) • consumers discover producers by querying nsqlookupd (a discovery service for topics) • topics and channels are created at runtime (just start publishing/subscribing) nsqd “metrics” Channels “clicks” Topics

Slide 21

Slide 21 text

SIMPLIFY CONFIGURATION • a topic is a distinct stream of messages (a single nsqd instance can have multiple topics) • a channel is an independent queue for a topic (a topic can have multiple channels) • consumers discover producers by querying nsqlookupd (a discovery service for topics) • topics and channels are created at runtime (just start publishing/subscribing) nsqd “metrics” Channels “clicks” Topics “spam_analysis”

Slide 22

Slide 22 text

SIMPLIFY CONFIGURATION • a topic is a distinct stream of messages (a single nsqd instance can have multiple topics) • a channel is an independent queue for a topic (a topic can have multiple channels) • consumers discover producers by querying nsqlookupd (a discovery service for topics) • topics and channels are created at runtime (just start publishing/subscribing) nsqd “metrics” Channels “clicks” Topics “spam_analysis” “archive”

Slide 23

Slide 23 text

separate hosts SIMPLIFY CONFIGURATION • a topic is a distinct stream of messages (a single nsqd instance can have multiple topics) • a channel is an independent queue for a topic (a topic can have multiple channels) • consumers discover producers by querying nsqlookupd (a discovery service for topics) • topics and channels are created at runtime (just start publishing/subscribing) nsqd “metrics” Channels “clicks” Topics “spam_analysis” “archive” Consumers

Slide 26

Slide 26 text

separate hosts SIMPLIFY CONFIGURATION • a topic is a distinct stream of messages (a single nsqd instance can have multiple topics) • a channel is an independent queue for a topic (a topic can have multiple channels) • consumers discover producers by querying nsqlookupd (a discovery service for topics) • topics and channels are created at runtime (just start publishing/subscribing) nsqd “metrics” Channels “clicks” Topics “spam_analysis” “archive” Consumers A A A

Slide 30

Slide 30 text

separate hosts SIMPLIFY CONFIGURATION • a topic is a distinct stream of messages (a single nsqd instance can have multiple topics) • a channel is an independent queue for a topic (a topic can have multiple channels) • consumers discover producers by querying nsqlookupd (a discovery service for topics) • topics and channels are created at runtime (just start publishing/subscribing) nsqd “metrics” Channels “clicks” Topics “spam_analysis” “archive” Consumers A A A B B B
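
The topic/channel relationship can be modelled in a few lines of Go. This is a toy in-process model, not nsqd's implementation: every channel of a topic receives its own copy of each message, and consumers subscribed to the same channel share that channel's messages. The type and function names, and the buffer size of 16, are assumptions for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// topic is a stream of messages that is copied to every channel.
type topic struct {
	mu       sync.Mutex
	channels map[string]chan string
}

func newTopic() *topic {
	return &topic{channels: make(map[string]chan string)}
}

// channel returns the named channel, creating it on first use
// (mirroring "channels are created at runtime").
func (t *topic) channel(name string) chan string {
	t.mu.Lock()
	defer t.mu.Unlock()
	ch, ok := t.channels[name]
	if !ok {
		ch = make(chan string, 16) // arbitrary buffer size for the sketch
		t.channels[name] = ch
	}
	return ch
}

// publish copies msg to every channel of the topic.
func (t *topic) publish(msg string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, ch := range t.channels {
		ch <- msg
	}
}

// run publishes n messages (n must not exceed the buffer size) to a topic
// with channels "metrics" (three consumers) and "archive" (one consumer),
// and returns how many messages each channel's consumers handled in total.
func run(n int) (metrics, archive int) {
	t := newTopic()
	m, a := t.channel("metrics"), t.channel("archive")
	for i := 0; i < n; i++ {
		t.publish(fmt.Sprintf("click-%d", i))
	}
	close(m)
	close(a)

	var wg sync.WaitGroup
	var mu sync.Mutex
	consume := func(ch <-chan string, total *int) {
		defer wg.Done()
		for range ch {
			mu.Lock()
			*total++
			mu.Unlock()
		}
	}
	wg.Add(4)
	for i := 0; i < 3; i++ {
		go consume(m, &metrics) // three consumers share one channel
	}
	go consume(a, &archive)
	wg.Wait()
	return metrics, archive
}

func main() {
	metrics, archive := run(6)
	fmt.Println("metrics:", metrics, "archive:", archive)
}
```

Each channel sees all six messages. The three "metrics" consumers split them between themselves, while the single "archive" consumer handles every one.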

Slide 34

Slide 34 text

DISCOVERY remove the need for publishers and consumers to know about each other nsqlookupd nsqd producer nsqlookupd

Slide 35

Slide 35 text

DISCOVERY remove the need for publishers and consumers to know about each other nsqlookupd nsqd ❶ publish msg (specifying topic) producer nsqlookupd

Slide 36

Slide 36 text

DISCOVERY remove the need for publishers and consumers to know about each other nsqlookupd nsqd ❶ publish msg (specifying topic) producer ➋ IDENTIFY persistent TCP connections nsqlookupd

Slide 37

Slide 37 text

DISCOVERY remove the need for publishers and consumers to know about each other nsqlookupd nsqd ❶ publish msg (specifying topic) producer ➋ IDENTIFY persistent TCP connections nsqlookupd ➌ REGISTER (topic/channel)

Slide 38

Slide 38 text

DISCOVERY (CLIENT) remove the need for publishers and consumers to know about each other nsqlookupd nsqlookupd consumer

Slide 39

Slide 39 text

DISCOVERY (CLIENT) remove the need for publishers and consumers to know about each other nsqlookupd nsqlookupd consumer ➊ regularly poll for topic producers HTTP requests

Slide 40

Slide 40 text

DISCOVERY (CLIENT) remove the need for publishers and consumers to know about each other nsqlookupd nsqlookupd consumer ➊ regularly poll for topic producers ➋ connect to all producers HTTP requests
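
A hedged sketch of the consumer side of discovery: parse the JSON that an nsqlookupd lookup request returns, then merge the producer lists from several independent nsqlookupd instances. The JSON shape used here (`producers`, `broadcast_address`, `tcp_port`) is an assumption modelled on the `/lookup?topic=...` response of later NSQ releases, and the 2012 format may have differed. The HTTP request itself is left out so the example runs without a network.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strconv"
)

// lookupResponse holds the subset of fields this sketch reads.
// The field names are an assumption about the nsqlookupd response.
type lookupResponse struct {
	Producers []struct {
		BroadcastAddress string `json:"broadcast_address"`
		TCPPort          int    `json:"tcp_port"`
	} `json:"producers"`
}

// producerAddrs extracts "host:port" addresses from one lookup response body.
func producerAddrs(body []byte) ([]string, error) {
	var resp lookupResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	addrs := make([]string, 0, len(resp.Producers))
	for _, p := range resp.Producers {
		addrs = append(addrs, net.JoinHostPort(p.BroadcastAddress, strconv.Itoa(p.TCPPort)))
	}
	return addrs, nil
}

// mergeAddrs unions the answers of several nsqlookupd instances,
// dropping duplicates and keeping first-seen order.
func mergeAddrs(lists ...[]string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, list := range lists {
		for _, addr := range list {
			if !seen[addr] {
				seen[addr] = true
				out = append(out, addr)
			}
		}
	}
	return out
}

func main() {
	// Sample bodies standing in for the answers of two nsqlookupd instances.
	a, _ := producerAddrs([]byte(`{"producers":[{"broadcast_address":"10.0.0.1","tcp_port":4150}]}`))
	b, _ := producerAddrs([]byte(`{"producers":[{"broadcast_address":"10.0.0.1","tcp_port":4150},{"broadcast_address":"10.0.0.2","tcp_port":4150}]}`))
	fmt.Println(mergeAddrs(a, b)) // the consumer would connect to each address
}
```

Because the consumer takes the union of all answers, the nsqlookupd instances never need to agree with each other, which is why they can run without coordination.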

Slide 41

Slide 41 text

ELIMINATE ALL THE SPOF
•easily enable distributed and decentralized topologies
•no brokers
•consumers connect to all producers
•messages are pushed to consumers
•nsqlookupd instances are independent and require no coordination (run a few for HA)

Slide 42

Slide 42 text

ELIMINATE ALL THE SPOF nsqd nsqd nsqd •easily enable distributed and decentralized topologies •no brokers •consumers connect to all producers •messages are pushed to consumers •nsqlookupd instances are independent and require no coordination (run a few for HA)

Slide 43

Slide 43 text

ELIMINATE ALL THE SPOF nsqd nsqd nsqd consumer •easily enable distributed and decentralized topologies •no brokers •consumers connect to all producers •messages are pushed to consumers •nsqlookupd instances are independent and require no coordination (run a few for HA)

Slide 45

Slide 45 text

ELIMINATE ALL THE SPOF nsqd nsqd nsqd consumer consumer •easily enable distributed and decentralized topologies •no brokers •consumers connect to all producers •messages are pushed to consumers •nsqlookupd instances are independent and require no coordination (run a few for HA)

Slide 47

Slide 47 text

MESSAGE GUARANTEES
•messages are delivered at least once
•handling is guaranteed by the protocol:
  •nsqd sends a message and stores it temporarily
  •client replies FIN (finish) or REQ (re-queue)
  •if client does not reply message is automatically re-queued
•any single nsqd instance failure can result in message loss (can be mitigated)
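
The at-least-once bookkeeping described above can be sketched as a small state machine. This is an illustrative model, not nsqd's code: the `channel` and `pending` types, the integer message IDs, and the logical clock (plain `int64` ticks instead of wall-clock time) are all assumptions that keep the example deterministic.

```go
package main

import "fmt"

// pending is a message that has been sent but not yet answered.
type pending struct {
	body     string
	deadline int64 // logical tick after which the message is re-queued
}

// channel tracks queued and in-flight messages for one consumer group.
type channel struct {
	queue    []string
	inFlight map[int]pending
	nextID   int
	timeout  int64
}

func newChannel(timeout int64) *channel {
	return &channel{inFlight: make(map[int]pending), timeout: timeout}
}

func (c *channel) put(body string) { c.queue = append(c.queue, body) }

// send hands the next message to a client and stores it temporarily.
func (c *channel) send(now int64) (id int, body string, ok bool) {
	if len(c.queue) == 0 {
		return 0, "", false
	}
	body = c.queue[0]
	c.queue = c.queue[1:]
	c.nextID++
	c.inFlight[c.nextID] = pending{body: body, deadline: now + c.timeout}
	return c.nextID, body, true
}

// fin marks a message as successfully handled (client replied FIN).
func (c *channel) fin(id int) { delete(c.inFlight, id) }

// req puts a message back on the queue (client replied REQ).
func (c *channel) req(id int) {
	if p, ok := c.inFlight[id]; ok {
		delete(c.inFlight, id)
		c.queue = append(c.queue, p.body)
	}
}

// expire re-queues every in-flight message whose deadline has passed
// and returns how many were re-queued.
func (c *channel) expire(now int64) int {
	n := 0
	for id, p := range c.inFlight {
		if now >= p.deadline {
			delete(c.inFlight, id)
			c.queue = append(c.queue, p.body)
			n++
		}
	}
	return n
}

func main() {
	c := newChannel(10)
	c.put("a")
	c.put("b")
	id, _, _ := c.send(0)
	c.fin(id) // "a" is done
	c.send(0) // "b" is sent but the client never replies
	fmt.Println("re-queued after timeout:", c.expire(10))
}
```

A message leaves the system only on FIN. Every other outcome puts it back on the queue, which is why a consumer may see the same message more than once.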

Slide 48

Slide 48 text

EFFICIENCY get RDY nsqd consumer

Slide 49

Slide 49 text

EFFICIENCY get RDY nsqd consumer ➊ connect

Slide 50

Slide 50 text

EFFICIENCY get RDY nsqd consumer ➊ connect ➋ SUB (subscribe)

Slide 51

Slide 51 text

EFFICIENCY get RDY nsqd consumer ➊ connect ➋ SUB (subscribe) ➌ RDY 2

Slide 52

Slide 52 text

EFFICIENCY get RDY nsqd consumer ➊ connect ➋ SUB (subscribe) Msg ➌ RDY 2

Slide 53

Slide 53 text

EFFICIENCY get RDY nsqd consumer ➊ connect ➋ SUB (subscribe) Msg ➌ RDY 2 Msg

Slide 54

Slide 54 text

EFFICIENCY get RDY nsqd consumer ➊ connect ➋ SUB (subscribe) Msg ➌ RDY 2 ➍ FIN (success) ➍ REQ (fail) Msg
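
One way to picture RDY is as a limit on how many unanswered messages the server may push to a connection. The sketch below models that reading: nothing is delivered before the client sends RDY, at most RDY messages are outstanding, and each FIN or REQ frees a slot. Treating RDY as a cap on in-flight messages is an assumption about the semantics, and the `conn` type and `deliver` function are hypothetical.

```go
package main

import "fmt"

// conn is the server's view of one subscribed client.
type conn struct {
	rdy       int      // last RDY count announced by the client
	inFlight  int      // messages pushed but not yet answered
	delivered []string // everything pushed so far, for inspection
}

// setRDY records the client's RDY command.
func (c *conn) setRDY(n int) { c.rdy = n }

// finish records a FIN or REQ from the client, freeing one slot.
func (c *conn) finish() {
	if c.inFlight > 0 {
		c.inFlight--
	}
}

// deliver pushes messages while the client has free RDY slots
// and returns the messages that still have to wait.
func deliver(c *conn, queue []string) []string {
	for len(queue) > 0 && c.inFlight < c.rdy {
		c.delivered = append(c.delivered, queue[0])
		c.inFlight++
		queue = queue[1:]
	}
	return queue
}

func main() {
	c := &conn{}
	queue := []string{"m1", "m2", "m3"}
	queue = deliver(c, queue) // no RDY yet: nothing is pushed
	c.setRDY(2)
	queue = deliver(c, queue) // two messages pushed, one waits
	c.finish()                // client answers one message
	queue = deliver(c, queue) // the freed slot lets the last one through
	fmt.Println(c.delivered, len(queue))
}
```

Because the server pushes only into slots the client has advertised, the client controls its own load without polling.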

Slide 55

Slide 55 text

QUEUES
•topics and channels are independent queues
•queues have arbitrary high water marks (after which messages transparently read/write to disk, bounding memory footprint)
•supports channel-independent degradation and recovery
•10 lines of Go
[diagram: buffer this channel high water mark persisted messages]
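
The high water mark idea fits in roughly the "10 lines of Go" the slide mentions: a buffered Go channel holds messages in memory, and a `select` with a `default` case diverts anything beyond its capacity to a backend. In this sketch the backend is just a slice standing in for the disk-backed queue, and the `queue` type and its methods are hypothetical names.

```go
package main

import "fmt"

// queue keeps up to highWaterMark messages in memory and
// overflows the rest to a backend (a slice standing in for disk).
type queue struct {
	mem     chan string
	backend []string
}

func newQueue(highWaterMark int) *queue {
	return &queue{mem: make(chan string, highWaterMark)}
}

// put stores m in memory if there is room, otherwise in the backend.
func (q *queue) put(m string) {
	select {
	case q.mem <- m:
	default: // memory buffer is full: persist instead of blocking
		q.backend = append(q.backend, m)
	}
}

// get returns a message from memory first, then from the backend.
func (q *queue) get() (string, bool) {
	select {
	case m := <-q.mem:
		return m, true
	default:
	}
	if len(q.backend) > 0 {
		m := q.backend[0]
		q.backend = q.backend[1:]
		return m, true
	}
	return "", false
}

func main() {
	q := newQueue(2)
	for _, m := range []string{"a", "b", "c", "d"} {
		q.put(m)
	}
	fmt.Println("in memory:", len(q.mem), "overflowed:", len(q.backend))
}
```

Memory use is bounded by the channel capacity no matter how far a consumer falls behind. This sketch is meant for a single goroutine and does not preserve strict ordering once messages have overflowed.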

Slide 56

Slide 56 text

NSQ ARCHITECTURE NSQ NSQD API consumer NSQ NSQD API NSQ NSQD API consumer nsqlookupd nsqlookupd

Slide 57

Slide 57 text

NSQ ARCHITECTURE NSQ NSQD API consumer NSQ NSQD API NSQ NSQD API consumer nsqlookupd nsqlookupd PUBLISH

Slide 58

Slide 58 text

NSQ ARCHITECTURE NSQ NSQD API consumer NSQ NSQD API NSQ NSQD API consumer nsqlookupd nsqlookupd PUBLISH REGISTER

Slide 59

Slide 59 text

NSQ ARCHITECTURE NSQ NSQD API consumer NSQ NSQD API NSQ NSQD API consumer nsqlookupd nsqlookupd PUBLISH REGISTER DISCOVER

Slide 60

Slide 60 text

NSQ ARCHITECTURE NSQ NSQD API consumer NSQ NSQD API NSQ NSQD API consumer nsqlookupd nsqlookupd PUBLISH REGISTER DISCOVER SUBSCRIBE

Slide 61

Slide 61 text

TOOLING
•nsqadmin provides a web interface to administer and introspect an NSQ cluster at runtime
  •empty, pause, delete channels
•nsq_to_http - utility that helps transport an aggregate stream over HTTP
•nsq_to_file - utility that safely persists an aggregated stream to disk

Slide 62

Slide 62 text

ONE MORE THING
•#ephemeral channels - runtime introspection
  •no backup beyond channel high water mark
  •automatically go away when last client disconnects
•server side channel pausing
  •administratively stop the flow of messages from a channel to its clients
  •no message loss (queue backs up)
•really $#%^ing awesome for operations

Slide 63

Slide 63 text

IMPLEMENTATION

Slide 64

Slide 64 text

GO LESSONS LEARNED
•don’t be afraid of the sync package
•goroutines are cheap, not free
•watch your allocations (string() is costly, re-use buffers)
•use anonymous structs for arbitrary JSON
•no built-in per-request HTTP timeouts
•synchronizing goroutine exit is hard - log each cleanup step in long-running goroutines
•select skips nil channels
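
Two of these lessons are easy to show with the standard library alone. The sketch below decodes JSON into an anonymous struct, and uses the fact that `select` never chooses a nil channel to stop reading from a channel once it is closed. The JSON field names and the `decodeTopic` and `merge` helpers are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeTopic reads two fields from a JSON document using an
// anonymous struct, so no named type is needed for a one-off shape.
func decodeTopic(data []byte) (string, int, error) {
	var v struct {
		Topic string `json:"topic"`
		Depth int    `json:"depth"`
	}
	err := json.Unmarshal(data, &v)
	return v.Topic, v.Depth, err
}

// merge collects values from a and b until both are closed.
// A closed channel is set to nil so that select skips its case
// instead of spinning on zero values.
func merge(a, b <-chan int) []int {
	var out []int
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil
				continue
			}
			out = append(out, v)
		case v, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			out = append(out, v)
		}
	}
	return out
}

func main() {
	topic, depth, err := decodeTopic([]byte(`{"topic":"clicks","depth":3}`))
	fmt.Println(topic, depth, err)

	a := make(chan int, 2)
	b := make(chan int, 1)
	a <- 1
	a <- 2
	b <- 3
	close(a)
	close(b)
	fmt.Println(len(merge(a, b)), "values merged")
}
```

The same nil-channel trick can be used to disable a send or receive case temporarily, for example to stop delivering from a queue while leaving the rest of the loop running.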

Slide 65

Slide 65 text

IMPORT PATH
github.com/bitly/nsq/nsq
fork github.com/mreiferson/nsq/nsq
•our git workflow involves hardcore forking action
•re-writing import paths sucks
•$ go tool install_as --import-path=github.com/bitly/nsq/nsq
•https://github.com/mreiferson/go-install-as
•also - relative imports are OK

Slide 66

Slide 66 text

CLIENTS
•Go Client - https://gist.github.com/4039222
•Synchronous Python Client - https://gist.github.com/3925081
•Async Python Client - https://gist.github.com/3925092

Slide 67

Slide 67 text

DEMO

Slide 68

Slide 68 text

Thanks @imsnakes & @jehiah https://github.com/bitly/nsq shoutouts to @danielhfrank, @ploxiln, and @mccutchen