Slide 1

Slide 1 text

LESSONS LEARNED OPTIMIZING NSQ
a realtime distributed messaging platform
https://github.com/bitly/nsq
March 6th 2014 - NY Times Code Weekly
Matt Reiferson @imsnakes (CTO at Torando Labs)

Slide 2

Slide 2 text

OVERVIEW
• the one where we talk about why we’re talking about this
• the one where we talk about NSQ
• the one where we talk about Go
• the one where we talk about making NSQ “Go” fast

Slide 3

Slide 3 text

SERVICE ORIENTED ARCHITECTURE

Slide 4

Slide 4 text

PHILOSOPHY
• many single-purposed actors
• do one thing and do it well
• de-couple and separate based on responsibility
• avoid SPOFs
• provide flexibility in deployment and topology
• perform work asynchronously
• communicate via messaging - how?

Slide 5

Slide 5 text

MICRO-SERVICE BLUES
• lots of services == lots of moving parts
• how do all interested services access data?
• how do we maintain loose coupling?
• how do we keep it simple?
provide a unifying distributed system to receive and disseminate event data

Slide 6

Slide 6 text

A Distributed Log OR? everybody talk to everybody! looks like fun!

Slide 7

Slide 7 text

MESSAGING PATTERNS

Slide 8

Slide 8 text

PUBSUB
messages duplicated to multiple consumers
de-couple independent stream operations
Diagram labels: Producer, PS, ConsumerA, ConsumerB, m1

Slide 9

Slide 9 text

DISTRIBUTION
messages load balanced among a homogeneous group of consumers
horizontal scalability
Diagram labels: Producer, Q, ConsumerA (x2), m1, m2

Slide 10

Slide 10 text

DISTRIBUTION
fault tolerance: in the face of consumer failure, other consumers (try to) pick up the slack
Diagram labels: Producer, Q, ConsumerA (x2), m1, m2

Slide 11

Slide 11 text

QUEUEING
if consumers cannot keep up with producers, the queue is able to hold onto messages so they can be processed later
Diagram labels: Producer, Q, ConsumerA (x2), m1, m2, m3

Slide 13

Slide 13 text

TOPICS AND CHANNELS
combine multicast, distribution, and queueing
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
Diagram labels: nsqd; Topics: “clicks”; Channels: “metrics”, “spam_analysis”, “archive”; Consumers: A A A (separate hosts)

Slide 17

Slide 17 text

TOPICS AND CHANNELS
combine multicast, distribution, and queueing
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
Diagram labels: nsqd; Topics: “clicks”; Channels: “metrics”, “spam_analysis”, “archive”; Consumers: A A A and B B B (separate hosts)
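The topic/channel model on this slide can be sketched in a few lines of Go. This is only an illustration of the idea (every channel gets a copy of each message, and consumers on the same channel compete for messages); the `Topic` type and the `Run` helper are hypothetical and are not nsqd's actual internals.

```go
package main

import (
	"fmt"
	"sync"
)

// Topic copies each published message into every channel (pubsub).
// Each channel is a queue shared by its consumers (distribution).
// Hypothetical type for illustration; not how nsqd is implemented.
type Topic struct {
	channels map[string]chan string
}

// NewTopic creates one buffered queue per channel name.
func NewTopic(depth int, names ...string) *Topic {
	t := &Topic{channels: make(map[string]chan string)}
	for _, n := range names {
		t.channels[n] = make(chan string, depth)
	}
	return t
}

// Publish places a copy of msg on every channel's queue.
func (t *Topic) Publish(msg string) {
	for _, ch := range t.channels {
		ch <- msg
	}
}

// Close tells consumers that no more messages will arrive.
func (t *Topic) Close() {
	for _, ch := range t.channels {
		close(ch)
	}
}

// Run publishes msgs, attaches `consumers` goroutines to each channel,
// and returns how many messages each channel's consumers handled in total.
func Run(msgs []string, consumers int, names ...string) map[string]int {
	t := NewTopic(len(msgs), names...)
	counts := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for name, ch := range t.channels {
		for i := 0; i < consumers; i++ {
			wg.Add(1)
			go func(name string, ch <-chan string) {
				defer wg.Done()
				// Consumers on the same channel compete for messages.
				for range ch {
					mu.Lock()
					counts[name]++
					mu.Unlock()
				}
			}(name, ch)
		}
	}
	for _, m := range msgs {
		t.Publish(m)
	}
	t.Close()
	wg.Wait()
	return counts
}

func main() {
	counts := Run([]string{"m1", "m2", "m3"}, 2, "metrics", "spam_analysis", "archive")
	fmt.Println(counts)
}
```

Each channel should end up with a count equal to the number of published messages, regardless of how many consumers share it.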

Slide 21

Slide 21 text

QUEUES
• topics and channels are independent queues
• queues have arbitrary high water marks (after which messages transparently read/write to disk, bounding memory footprint)
• supports channel-independent degradation and recovery
Diagram labels: buffer, this channel, high water mark, persisted messages
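A minimal sketch of the high water mark idea, assuming a buffered Go channel as the in-memory queue and a plain slice standing in for the disk-backed overflow. The `Queue` type and its methods are hypothetical names for illustration; the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// Queue keeps up to highWater messages in memory and spills the rest.
// The overflow slice stands in for a disk-backed queue (an assumption
// made to keep this sketch self-contained).
type Queue struct {
	mem      chan string
	overflow []string
}

// NewQueue creates a queue whose memory buffer holds highWater messages.
func NewQueue(highWater int) *Queue {
	return &Queue{mem: make(chan string, highWater)}
}

// Put tries the in-memory buffer first and falls back to the overflow
// store once the high water mark has been reached.
func (q *Queue) Put(msg string) {
	select {
	case q.mem <- msg:
	default:
		q.overflow = append(q.overflow, msg)
	}
}

// Get drains memory before overflow, so ordering across the two stores
// is not preserved once the queue has spilled.
func (q *Queue) Get() (string, bool) {
	select {
	case m := <-q.mem:
		return m, true
	default:
	}
	if len(q.overflow) > 0 {
		m := q.overflow[0]
		q.overflow = q.overflow[1:]
		return m, true
	}
	return "", false
}

func main() {
	q := NewQueue(2)
	for _, m := range []string{"m1", "m2", "m3"} {
		q.Put(m)
	}
	fmt.Println("in memory:", len(q.mem), "spilled:", len(q.overflow))
}
```

The select with a default case is what bounds the memory footprint: a full buffer never blocks the publisher, it just redirects the write.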

Slide 22

Slide 22 text

TYPICAL NSQ CLUSTER
• easily enable distributed and decentralized topologies
• no centralized broker
• consumers connect to all producers
• messages are pushed to consumers
• nsqlookupd instances are independent and require no coordination (run a few for HA)
Diagram labels: NSQ / NSQD / API (x3), consumer (x2), nsqlookupd (x2)

Slide 26

Slide 26 text

TYPICAL NSQ CLUSTER
• easily enable distributed and decentralized topologies
• no centralized broker
• consumers connect to all producers
• messages are pushed to consumers
• nsqlookupd instances are independent and require no coordination (run a few for HA)
Diagram labels: NSQ / NSQD / API (x3), consumer (x2), nsqlookupd (x2)
Diagram steps: PUBLISH, REGISTER, DISCOVER, SUBSCRIBE
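Because nsqlookupd instances do not coordinate, a consumer has to merge what each one reports. The sketch below shows that merge only, under stated assumptions: `lookupFunc` is a hypothetical stand-in for one lookup query (no real HTTP call is made), and `static` and `failing` are helper fakes.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// lookupFunc is a hypothetical stand-in for querying one nsqlookupd
// instance; it returns the producer addresses known for a topic.
type lookupFunc func(topic string) ([]string, error)

// Discover asks every lookup instance and returns the union of producers.
// Instances are independent, so one failing must not block the others.
func Discover(topic string, lookups []lookupFunc) []string {
	seen := make(map[string]struct{})
	for _, lookup := range lookups {
		addrs, err := lookup(topic)
		if err != nil {
			continue // tolerate a lookup instance being unavailable
		}
		for _, a := range addrs {
			seen[a] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for a := range seen {
		out = append(out, a)
	}
	sort.Strings(out)
	return out
}

// static fakes a lookup instance that always reports addrs.
func static(addrs ...string) lookupFunc {
	return func(string) ([]string, error) { return addrs, nil }
}

// failing fakes a lookup instance that is down.
func failing() lookupFunc {
	return func(string) ([]string, error) { return nil, errors.New("unreachable") }
}

func main() {
	producers := Discover("clicks", []lookupFunc{
		static("host1:4150", "host2:4150"),
		failing(),
		static("host2:4150", "host3:4150"),
	})
	fmt.Println(producers)
}
```

The host names and ports in `main` are made up for the example.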

Slide 27

Slide 27 text

(NON) GUARANTEES
• messages are delivered at least once
• messages are not durable (by default)
• messages received are un-ordered
• consumers eventually find all topic producers
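At-least-once delivery means a handler can see the same message twice, so consumers usually need to be idempotent. One possible approach, sketched here with a hypothetical `Deduper` type, is to remember a bounded window of recently seen message IDs. The window size and the string ID are assumptions for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Deduper remembers the most recent max message IDs so that a handler
// can skip redelivered messages. Hypothetical helper for illustration.
type Deduper struct {
	mu    sync.Mutex
	max   int
	seen  map[string]struct{}
	order []string // IDs in arrival order, oldest first
}

// NewDeduper creates a window of at least one ID.
func NewDeduper(max int) *Deduper {
	if max < 1 {
		max = 1
	}
	return &Deduper{max: max, seen: make(map[string]struct{})}
}

// FirstTime reports whether id has not been seen within the window.
// The window is bounded, so very old duplicates can slip through.
func (d *Deduper) FirstTime(id string) bool {
	d.mu.Lock()
	_, dup := d.seen[id]
	if !dup {
		if len(d.order) >= d.max {
			// Evict the oldest ID to keep memory bounded.
			delete(d.seen, d.order[0])
			d.order = d.order[1:]
		}
		d.seen[id] = struct{}{}
		d.order = append(d.order, id)
	}
	d.mu.Unlock()
	return !dup
}

func main() {
	d := NewDeduper(1000)
	for _, id := range []string{"a", "b", "a"} {
		if d.FirstTime(id) {
			fmt.Println("processing", id)
		} else {
			fmt.Println("skipping duplicate", id)
		}
	}
}
```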

Slide 28

Slide 28 text

CLIENT BEHAVIOR
• messages are pushed to clients (no polling!)
• clients manage flow via “RDY state”
• clients FIN or REQ
• client libraries manage backoff
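The interplay of RDY, FIN, and REQ can be modeled with a toy server, assuming the simplified rule that the server pushes only while the number of in-flight messages is below the client's RDY count. The `Server` type is hypothetical, messages are identified by their body for brevity, and this is not the NSQ wire protocol.

```go
package main

import "fmt"

// Server is a toy model of one client connection as seen by the server.
// Simplifying assumption: messages are pushed only while the in-flight
// count is below the client's RDY count.
type Server struct {
	queue    []string
	inFlight map[string]bool
	rdy      int
}

// NewServer creates a server with msgs already queued.
func NewServer(msgs ...string) *Server {
	return &Server{
		queue:    append([]string(nil), msgs...),
		inFlight: make(map[string]bool),
	}
}

// RDY records how many messages the client is willing to have in flight.
func (s *Server) RDY(n int) { s.rdy = n }

// Push delivers queued messages until the in-flight count reaches RDY.
func (s *Server) Push() []string {
	var sent []string
	for len(s.queue) > 0 && len(s.inFlight) < s.rdy {
		m := s.queue[0]
		s.queue = s.queue[1:]
		s.inFlight[m] = true
		sent = append(sent, m)
	}
	return sent
}

// FIN marks a message as successfully processed.
func (s *Server) FIN(m string) { delete(s.inFlight, m) }

// REQ returns an in-flight message to the queue for redelivery.
func (s *Server) REQ(m string) {
	if s.inFlight[m] {
		delete(s.inFlight, m)
		s.queue = append(s.queue, m)
	}
}

func main() {
	s := NewServer("m1", "m2", "m3")
	s.RDY(2)
	fmt.Println("pushed:", s.Push())
	s.FIN("m1")
	s.REQ("m2")
	fmt.Println("pushed:", s.Push())
}
```

Setting RDY to 0 in this model stops delivery entirely, which is one way a client library could implement backoff.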

Slide 29

Slide 29 text

ONE MORE THING
• #ephemeral channels
  • no queue backup / automatically go away
• server side topic and channel pausing
  • administratively stop message flow
  • no message loss (queue backs up)
• TLS / Snappy
• great tooling out of the box
  • nsqadmin / nsq_to_file / nsq_to_nsq
  • nsq_to_http / nsq_tail / nsq_stat

Slide 30

Slide 30 text

IN PRODUCTION

Slide 31

Slide 31 text

EXAMPLE CLIENTS Go Python

Slide 32

Slide 32 text

LET’S TALK ABOUT GO

Slide 33

Slide 33 text

THOUSAND FOOT VIEW
• compiled (fast), statically typed, garbage collected
• concurrency is built-in
  • goroutines (lightweight userland threads)
  • “don’t communicate by sharing memory, share memory by communicating”
• small (easy to learn)
• statically linked (no external dependencies)
• good C interoperability
• vast standard library (HTTP server, network IO, RPC, JSON, buffered IO, etc.)
• … it keeps getting better / faster with each new release

Slide 34

Slide 34 text

GC TENSION
Latency: % of time spent in GC
Throughput: % of time doing useful work
Footprint: size of heap
pick one, maybe two, never all three

Slide 35

Slide 35 text

THE GOLDEN RULE Produce Less Garbage!

Slide 36

Slide 36 text

UNDERSTANDING THE GC
• first understand how the GC is behaving
  • under real workloads (production?)
  • publish metrics to statsd (graphed in nsqadmin)
• then understand where garbage is generated
  • go test -benchmem (profile allocations)
  • go build -gcflags -m (output escape analysis)
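As a starting point for "understand how the GC is behaving", the runtime already exposes counters that could be sampled and shipped to a metrics system. The sketch below only reads them with `runtime.ReadMemStats`; the `GCSnapshot` type and the `Churn` helper are hypothetical names, and publishing to statsd is left out.

```go
package main

import (
	"fmt"
	"runtime"
)

// GCSnapshot holds a few runtime counters worth graphing over time.
type GCSnapshot struct {
	NumGC        uint32 // completed GC cycles
	PauseTotalNs uint64 // cumulative stop-the-world pause time
	HeapAlloc    uint64 // bytes of allocated heap objects
	Mallocs      uint64 // cumulative count of heap allocations
}

// ReadGC samples the current runtime memory statistics.
func ReadGC() GCSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return GCSnapshot{
		NumGC:        m.NumGC,
		PauseTotalNs: m.PauseTotalNs,
		HeapAlloc:    m.HeapAlloc,
		Mallocs:      m.Mallocs,
	}
}

// sink is package-level so the allocations in Churn escape to the heap.
var sink []byte

// Churn allocates n short-lived buffers, forces a collection, and
// returns snapshots taken before and after.
func Churn(n int) (before, after GCSnapshot) {
	before = ReadGC()
	for i := 0; i < n; i++ {
		sink = make([]byte, 1024)
	}
	runtime.GC()
	after = ReadGC()
	return before, after
}

func main() {
	before, after := Churn(10000)
	fmt.Println("GC cycles:", after.NumGC-before.NumGC)
	fmt.Println("allocations:", after.Mallocs-before.Mallocs)
	fmt.Println("pause ns:", after.PauseTotalNs-before.PauseTotalNs)
}
```

The deltas between two snapshots are usually more useful than the absolute values, since the counters are cumulative.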

Slide 37

Slide 37 text

REDUCING GC PRESSURE
• avoid []byte to string conversions
• re-use buffers or objects (use sync.Pool in Go 1.3)
• pre-allocate slices: make([]byte, 0, 1024)
• explicitly specify number/size of items on the wire
• leave nothing unbounded
• apply sane limits to configurable dials (message size, # of messages)
• avoid boxing (use of interface{}) or unnecessary wrapper types
• avoid the use of defer in hot code paths (it allocates)
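Several of these points can be combined in one small sketch: a size-prefixed frame (size stated explicitly on the wire), a hard limit on message size, and a `sync.Pool` of pre-allocated buffers. The frame layout, the `maxBodySize` value, and the function names are assumptions for illustration and are not NSQ's protocol.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"sync"
)

// maxBodySize is a hypothetical limit; nothing on the wire is unbounded.
const maxBodySize = 1024

var errTooLarge = errors.New("body exceeds maxBodySize")

// bufPool hands out buffers pre-allocated to the largest possible frame,
// so appending a frame never grows the slice.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 4+maxBodySize)
		return &b
	},
}

// AppendFrame appends a 4-byte big-endian size followed by body to dst.
func AppendFrame(dst, body []byte) ([]byte, error) {
	if len(body) > maxBodySize {
		return dst, errTooLarge
	}
	var size [4]byte
	binary.BigEndian.PutUint32(size[:], uint32(len(body)))
	dst = append(dst, size[:]...)
	return append(dst, body...), nil
}

// WriteFrame encodes body into a pooled buffer and writes it to w.
func WriteFrame(w io.Writer, body []byte) error {
	bp := bufPool.Get().(*[]byte)
	buf, err := AppendFrame((*bp)[:0], body)
	if err == nil {
		_, err = w.Write(buf)
	}
	*bp = buf[:0] // reset length, keep capacity for the next caller
	bufPool.Put(bp)
	return err
}

func main() {
	var out bytes.Buffer
	if err := WriteFrame(&out, []byte("hello")); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("bytes written:", out.Len())
}
```

Storing a pointer to the slice in the pool, rather than the slice itself, avoids an extra allocation each time a buffer is returned.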

Slide 38

Slide 38 text

OTHER LESSONS LEARNED
• don’t be afraid of the sync package
• channels are overkill for primitives
• goroutines are cheap, not free (~4k per)
• use worker pools rather than “goroutine per X”
• synchronizing goroutine exit and cleanup is hard
• all IO must have timeouts (guarantee progress)
• wrap IO with bufio.Reader/Writer to reduce context switches (syscalls)
• select skips nil channels
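Three of these lessons fit in one short sketch: a fixed worker pool instead of a goroutine per job, `sync/atomic` for a simple counter instead of a channel, and setting a closed channel to nil so that `select` stops considering it. `ProcessAll` and `SumBoth` are hypothetical names for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ProcessAll feeds jobs to a fixed pool of workers and returns how many
// jobs were handled. The WaitGroup makes goroutine exit explicit.
func ProcessAll(jobs []int, workers int) int64 {
	if workers < 1 {
		workers = 1
	}
	var processed int64
	var wg sync.WaitGroup
	in := make(chan int)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range in {
				// An atomic add is enough for a counter; no channel needed.
				atomic.AddInt64(&processed, 1)
			}
		}()
	}
	for _, j := range jobs {
		in <- j
	}
	close(in) // lets every worker's range loop end
	wg.Wait()
	return atomic.LoadInt64(&processed)
}

// SumBoth drains two channels. A closed channel is set to nil, and a
// receive from a nil channel never becomes ready, so select skips it.
func SumBoth(a, b <-chan int) int {
	total := 0
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil
				continue
			}
			total += v
		case v, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			total += v
		}
	}
	return total
}

func main() {
	fmt.Println("processed:", ProcessAll([]int{1, 2, 3, 4, 5}, 2))

	a := make(chan int, 2)
	b := make(chan int, 1)
	a <- 1
	a <- 2
	b <- 3
	close(a)
	close(b)
	fmt.Println("sum:", SumBoth(a, b))
}
```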

Slide 39

Slide 39 text

WHEN IN DOUBT PPROF
• built-in!
  • net/http/pprof
  • go tool pprof
• available for CPU, heap, goroutine based profiling
• all long-running daemons should expose telemetry/debug endpoints
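Exposing the profiling endpoints takes a single blank import of `net/http/pprof`, which registers handlers on `http.DefaultServeMux`. The sketch below checks the endpoints in-process with `httptest` so that it runs without opening a socket; the `DebugStatus` helper and the listen address in the comment are assumptions for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

// DebugStatus sends an in-process GET to the default mux and returns
// the HTTP status code. Hypothetical helper used only for this sketch.
func DebugStatus(path string) int {
	req := httptest.NewRequest("GET", path, nil)
	rec := httptest.NewRecorder()
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("index:", DebugStatus("/debug/pprof/"))
	fmt.Println("goroutine:", DebugStatus("/debug/pprof/goroutine"))

	// A long-running daemon would instead serve the default mux on a
	// private address (the address here is an arbitrary example):
	//   go http.ListenAndServe("127.0.0.1:6060", nil)
}
```

With a daemon listening on such an address, a heap profile could then be inspected with something like `go tool pprof http://127.0.0.1:6060/debug/pprof/heap`.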

Slide 40

Slide 40 text

DRUMROLL PLEASE…

Slide 41

Slide 41 text

Thanks! Any questions?
@imsnakes
https://github.com/bitly/nsq
shoutout to @jehiah (co-author of NSQ)