Lessons Learned Optimizing NSQ

LESSONS LEARNED OPTIMIZING NSQ
a realtime distributed messaging platform
https://github.com/bitly/nsq
March 6th 2014, NY Times Code Weekly
Matt Reiferson @imsnakes (CTO at Torando Labs)

OVERVIEW
•the one where we talk about why we’re talking about this
•the one where we talk about NSQ
•the one where we talk about Go
•the one where we talk about making NSQ “Go” fast

SERVICE ORIENTED ARCHITECTURE

PHILOSOPHY
•many single-purpose actors
•do one thing and do it well
•de-couple and separate based on responsibility
•avoid SPOFs (single points of failure)
•provide flexibility in deployment and topology
•perform work asynchronously
•communicate via messaging - how?

MICRO-SERVICE BLUES
•lots of services == lots of moving parts
•how do all interested services access data?
•how do we maintain loose coupling?
•how do we keep it simple?
provide a unifying distributed system to receive and disseminate event data

A DISTRIBUTED LOG? OR…
everybody talks to everybody! (looks like fun!)

MESSAGING PATTERNS

PUBSUB
[diagram: Producer publishes m1; a copy of m1 goes to ConsumerA and another to ConsumerB]
•messages duplicated to multiple consumers
•de-couple independent stream operations
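As a small in-process analogy of the pub/sub pattern above, the sketch below copies every message to each subscriber's own Go channel, so ConsumerA and ConsumerB each see the full stream. The `broadcast` and `collect` helpers are hypothetical illustrations, not NSQ code.

```go
package main

import "fmt"

// broadcast copies every message from in to each subscriber channel, so
// every subscriber sees the full stream. Once in is closed and drained,
// it closes all subscriber channels.
func broadcast(in <-chan string, subs []chan string) {
	for msg := range in {
		for _, s := range subs {
			s <- msg // a full subscriber buffer would block everyone here
		}
	}
	for _, s := range subs {
		close(s)
	}
}

// collect drains a subscriber channel into a slice until it is closed.
func collect(c <-chan string) []string {
	var out []string
	for m := range c {
		out = append(out, m)
	}
	return out
}

func main() {
	in := make(chan string)
	consumerA := make(chan string, 10)
	consumerB := make(chan string, 10)
	go broadcast(in, []chan string{consumerA, consumerB})

	in <- "m1"
	close(in)

	fmt.Println("ConsumerA:", collect(consumerA)) // ConsumerA: [m1]
	fmt.Println("ConsumerB:", collect(consumerB)) // ConsumerB: [m1]
}
```

The sketch makes the cost of fan-out visible: one slow subscriber stalls the loop for everyone, which is why independent per-subscriber queues matter.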

DISTRIBUTION
[diagram: Producer sends m1 and m2 through a queue; one ConsumerA instance receives m1, another receives m2]
•messages load balanced among a homogeneous group of consumers
•horizontal scalability
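Go channels give a small-scale analogue of distribution: when several goroutines receive from the same channel, each value goes to exactly one of them. The worker-pool sketch below (`distribute` is a hypothetical helper) load-balances messages across a homogeneous group of consumers. It is an analogy only, not how nsqd distributes messages over the network.

```go
package main

import (
	"fmt"
	"sync"
)

// distribute starts n identical consumers that all read from one shared
// queue. Each message is received by exactly one consumer, so adding
// consumers adds capacity. It returns how many messages each one handled.
func distribute(msgs []string, n int) []int {
	queue := make(chan string)
	counts := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for range queue {
				counts[id]++ // each goroutine writes only its own slot
			}
		}(i)
	}
	for _, m := range msgs {
		queue <- m
	}
	close(queue)
	wg.Wait() // counts is safe to read after all consumers exit
	return counts
}

func main() {
	counts := distribute([]string{"m1", "m2", "m3", "m4"}, 2)
	total := 0
	for _, c := range counts {
		total += c
	}
	// The split between consumers varies from run to run; the total does not.
	fmt.Println("per-consumer:", counts, "total:", total)
}
```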

DISTRIBUTION
[diagram: Producer, a queue holding m1 and m2, and two ConsumerA instances]
•fault tolerance: in the face of consumer failure, other consumers (try to) pick up the slack

QUEUEING
[diagram: Producer, a queue holding m1, m2, and m3, and two ConsumerA instances]
•if consumers cannot keep up with producers, the queue is able to hold onto messages so they can be processed later

TOPICS AND CHANNELS
•a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
•a channel is an independent queue for a topic (a topic can have multiple channels)
•consumers discover producers by querying nsqlookupd (a discovery service for topics)
•topics and channels are created at runtime (just start publishing/subscribing)
[diagram: an nsqd with topic “clicks” and channels “metrics”, “spam_analysis”, and “archive”; consumer groups A and B, three instances each on separate hosts, subscribe to channels]
•combine multicast, distribution, and queueing
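To make the topic/channel relationship concrete, here is a minimal in-memory sketch: a topic copies each published message into every one of its channels (multicast), and each channel is a queue that a group of consumers would share (distribution). The `Topic` and `Channel` types are hypothetical and much simpler than nsqd's real implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Channel is an independent queue attached to a topic (hypothetical type).
type Channel struct {
	name  string
	queue chan string
}

// Topic fans each published message out to all of its channels.
type Topic struct {
	mu       sync.Mutex
	name     string
	channels map[string]*Channel
}

func NewTopic(name string) *Topic {
	return &Topic{name: name, channels: make(map[string]*Channel)}
}

// GetChannel creates the channel on first use, echoing the
// "created at runtime" behavior described above.
func (t *Topic) GetChannel(name string) *Channel {
	t.mu.Lock()
	defer t.mu.Unlock()
	c, ok := t.channels[name]
	if !ok {
		// Buffer size is arbitrary; a full buffer would block Publish.
		c = &Channel{name: name, queue: make(chan string, 100)}
		t.channels[name] = c
	}
	return c
}

// Publish copies msg into every channel. Consumers sharing one channel
// would each receive a different subset of its messages.
func (t *Topic) Publish(msg string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, c := range t.channels {
		c.queue <- msg
	}
}

func main() {
	clicks := NewTopic("clicks")
	archive := clicks.GetChannel("archive")
	spam := clicks.GetChannel("spam_analysis")
	clicks.Publish("m1")
	fmt.Println(<-archive.queue, <-spam.queue) // m1 m1
}
```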

QUEUES
•topics and channels are independent queues
•queues have configurable high water marks (beyond which messages are transparently written to and read from disk, bounding the memory footprint)
•supports channel-independent degradation and recovery
[diagram: a channel's in-memory buffer filling up to its high water mark, with further messages persisted to disk]
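One way to sketch the high-water-mark behavior in Go is a buffered channel whose capacity is the high water mark, plus a `select` with a `default` case that spills to a backend once the buffer is full. The `backend` interface and `sliceBackend` type are hypothetical stand-ins for on-disk storage; a real backend would append to files.

```go
package main

import "fmt"

// backend stands in for disk storage (hypothetical interface).
type backend interface {
	Put(msg string) error
}

// sliceBackend keeps "persisted" messages in a slice for this demo.
type sliceBackend struct{ msgs []string }

func (b *sliceBackend) Put(msg string) error {
	b.msgs = append(b.msgs, msg)
	return nil
}

// queue holds up to highWaterMark messages in memory and spills the rest,
// so memory use stays bounded no matter how far consumers fall behind.
type queue struct {
	memory   chan string
	overflow backend
}

func newQueue(highWaterMark int, b backend) *queue {
	return &queue{memory: make(chan string, highWaterMark), overflow: b}
}

// put tries the in-memory buffer first; the default case runs only when
// the buffer is full, sending the message to the backend instead.
func (q *queue) put(msg string) error {
	select {
	case q.memory <- msg:
		return nil
	default:
		return q.overflow.Put(msg)
	}
}

func main() {
	disk := &sliceBackend{}
	q := newQueue(2, disk)
	for _, m := range []string{"m1", "m2", "m3", "m4"} {
		if err := q.put(m); err != nil {
			fmt.Println("put failed:", err)
		}
	}
	fmt.Println("in memory:", len(q.memory), "on disk:", disk.msgs) // in memory: 2 on disk: [m3 m4]
}
```

Because each channel would own its own `queue`, one slow channel spilling to disk does not affect its siblings, which is the channel-independent degradation the slide mentions.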

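TYPICAL NSQ CLUSTER
[diagram: three hosts, each running an nsqd next to an application API, plus consumers and two nsqlookupd instances; producers PUBLISH to nsqd, nsqd REGISTERs with nsqlookupd, consumers DISCOVER via nsqlookupd and SUBSCRIBE to nsqd]
•easily enable distributed and decentralized topologies
•no centralized broker
•consumers connect to all producers
•messages are pushed to consumers
•nsqlookupd instances are independent and require no coordination (run a few for HA)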
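The REGISTER/DISCOVER flow can be sketched with a toy in-memory registry standing in for nsqlookupd. Because lookup instances do not coordinate, a consumer asks all of them and takes the union of their answers. The `registry` type and host names are hypothetical; only port 4150 (nsqd's default TCP port) comes from NSQ itself.

```go
package main

import (
	"fmt"
	"sort"
)

// registry is a toy stand-in for one nsqlookupd: topic -> set of nsqd
// addresses that registered it (hypothetical, in-memory).
type registry map[string]map[string]bool

// register records that the nsqd at addr has the given topic.
func (r registry) register(topic, addr string) {
	if r[topic] == nil {
		r[topic] = make(map[string]bool)
	}
	r[topic][addr] = true
}

// discover queries every registry and unions the results, so a lookup
// instance that is stale or missing data only narrows the answer.
func discover(topic string, lookups ...registry) []string {
	seen := make(map[string]bool)
	for _, r := range lookups {
		for addr := range r[topic] {
			seen[addr] = true
		}
	}
	addrs := make([]string, 0, len(seen))
	for a := range seen {
		addrs = append(addrs, a)
	}
	sort.Strings(addrs)
	return addrs
}

func main() {
	lookupA, lookupB := registry{}, registry{}
	lookupA.register("clicks", "nsqd-1:4150")
	lookupB.register("clicks", "nsqd-1:4150")
	// lookupA has not heard from nsqd-2 (say it was just restarted);
	// querying both still finds every producer.
	lookupB.register("clicks", "nsqd-2:4150")
	// A consumer would then SUBSCRIBE to every address returned.
	fmt.Println(discover("clicks", lookupA, lookupB)) // [nsqd-1:4150 nsqd-2:4150]
}
```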

(NON) GUARANTEES
•messages are delivered at least once
•messages are not durable (by default)
•messages received are un-ordered
•consumers eventually find all topic producers
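At-least-once delivery means a handler may see the same message twice, so consumers often need to be idempotent. A minimal sketch, assuming each message carries an ID that stays the same across redeliveries (an assumption of this example): the hypothetical `dedup` wrapper skips IDs it has already processed. Its `seen` map grows without limit and it is not safe for concurrent use, so a real consumer would need to bound or expire entries and add locking.

```go
package main

import "fmt"

// message is a hypothetical message shape with a stable ID.
type message struct {
	ID   string
	Body string
}

// dedup wraps a handler so redelivered messages are processed once.
type dedup struct {
	seen   map[string]bool // unbounded here; bound it in real use
	handle func(message)
}

func newDedup(handle func(message)) *dedup {
	return &dedup{seen: make(map[string]bool), handle: handle}
}

// Process runs the handler unless this ID has already been handled.
// The ID is marked only after handle returns.
func (d *dedup) Process(m message) {
	if d.seen[m.ID] {
		return // duplicate delivery
	}
	d.handle(m)
	d.seen[m.ID] = true
}

func main() {
	count := 0
	d := newDedup(func(m message) { count++ })
	deliveries := []message{{ID: "a", Body: "x"}, {ID: "b", Body: "y"}, {ID: "a", Body: "x"}}
	for _, m := range deliveries {
		d.Process(m)
	}
	fmt.Println("handled:", count) // handled: 2
}
```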

CLIENT BEHAVIOR
•messages are pushed to clients (no polling!)
•clients manage flow via “RDY state” (how many messages they are ready to have in flight)
•clients FIN (finish) or REQ (re-queue) each message
•client libraries manage backoff

ONE MORE THING
•#ephemeral channels (channel names ending in “#ephemeral”)
  •no queue backup / automatically go away
•server-side topic and channel pausing
  •administratively stop message flow
  •no message loss (queue backs up)
•TLS / Snappy (compression)
•great tooling out of the box
  •nsqadmin / nsq_to_file / nsq_to_nsq
  •nsq_to_http / nsq_tail / nsq_stat

IN PRODUCTION

EXAMPLE CLIENTS
[slides: example client code in Go and Python]

LET’S TALK ABOUT GO

THOUSAND FOOT VIEW
•compiled (fast), statically typed, garbage collected
•concurrency is built-in
  •goroutines (lightweight userland threads)
  •“don’t communicate by sharing memory, share memory by communicating”
•small (easy to learn)
•statically linked (no external dependencies)
•good C interoperability
•vast standard library (HTTP server, network IO, RPC, JSON, buffered IO, etc.)
•… it keeps getting better / faster with each new release
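A small pipeline shows "share memory by communicating" in practice: each stage runs in its own goroutine and hands values to the next stage over a channel, so no value is ever touched by two goroutines at once and no locks are needed. The `generate` and `square` functions are illustrative names, not from the talk.

```go
package main

import "fmt"

// generate emits 1..n on a channel, closing it when done.
func generate(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- i
		}
	}()
	return out
}

// square receives each value and sends its square downstream. Ownership
// of each value moves with the message, so stages share no state.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- v * v
		}
	}()
	return out
}

func main() {
	sum := 0
	for v := range square(generate(4)) {
		sum += v
	}
	fmt.Println(sum) // 30 (1 + 4 + 9 + 16)
}
```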

GC TENSION
•Latency: % of time spent in GC
•Throughput: % of time doing useful work
•Footprint: size of heap
pick one, maybe two, never all three

THE GOLDEN RULE
Produce Less Garbage!

UNDERSTANDING THE GC
•first understand how the GC is behaving
  •under real workloads (production?)
  •publish metrics to statsd (graphed in nsqadmin)
•then understand where garbage is generated
  •go test -bench . -benchmem (report allocations per benchmark operation)
  •go build -gcflags -m (output escape analysis)

REDUCING GC PRESSURE
•avoid []byte to string conversions (they generally allocate and copy)
•re-use buffers or objects (use sync.Pool in Go 1.3)
•pre-allocate slices: make([]byte, 0, 1024)
•explicitly specify the number/size of items on the wire
•leave nothing unbounded
•apply sane limits to configurable dials (message size, # of messages)
•avoid boxing (use of interface{}) or unnecessary wrapper types
•avoid the use of defer in hot code paths (it allocates)

OTHER LESSONS LEARNED
•don’t be afraid of the sync package
•channels are overkill for primitives
•goroutines are cheap, not free (~4KB each)
•use worker pools rather than “goroutine per X”
•synchronizing goroutine exit and cleanup is hard
•all IO must have timeouts (guarantee progress)
•wrap IO with bufio.Reader/Writer to reduce context switches (syscalls)
•select skips nil channels (a case on a nil channel is never chosen)
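The "select skips nil channels" trick is a clean way to pause a loop: receiving from a nil channel blocks forever, so setting the receive channel to nil disables that `select` case while the others keep working, and messages simply wait in the source queue. The `pump` function and its pause protocol are hypothetical, loosely echoing the topic/channel pausing described earlier.

```go
package main

import "fmt"

// pump forwards messages from src to out until quit is closed. A value on
// pause toggles delivery: while paused, recv is nil, so select never picks
// that case and messages stay queued in src.
func pump(src <-chan string, pause <-chan bool, out chan<- string, quit <-chan struct{}) {
	recv := src
	for {
		select {
		case p := <-pause:
			if p {
				recv = nil // disable the receive case
			} else {
				recv = src // re-enable it
			}
		case msg := <-recv:
			out <- msg
		case <-quit:
			return
		}
	}
}

func main() {
	src := make(chan string, 10)
	pause := make(chan bool)
	out := make(chan string)
	quit := make(chan struct{})
	go pump(src, pause, out, quit)

	pause <- true
	src <- "m1" // stays queued while paused
	fmt.Println("queued while paused:", len(src)) // queued while paused: 1

	pause <- false
	fmt.Println("delivered:", <-out) // delivered: m1
	close(quit)
}
```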

WHEN IN DOUBT, PPROF
•built-in!
•net/http/pprof
•go tool pprof
•available for CPU, heap, and goroutine based profiling
•all long-running daemons should expose telemetry/debug endpoints

DRUMROLL PLEASE…

Thanks! Any questions?
@imsnakes
https://github.com/bitly/nsq
shoutout to @jehiah (co-author of NSQ)
