Lessons Learned Optimizing NSQ

NYTimes Code Weekly

Matt Reiferson

March 06, 2014

Transcript

  1. LESSONS LEARNED OPTIMIZING NSQ
    a realtime distributed messaging platform
    https://github.com/bitly/nsq
    March 6th 2014 - NY Times Code Weekly
    Matt Reiferson @imsnakes (CTO at Torando Labs)
  2. OVERVIEW
    •the one where we talk about why we’re talking about this
    •the one where we talk about NSQ
    •the one where we talk about Go
    •the one where we talk about making NSQ “Go” fast
  3. SERVICE ORIENTED ARCHITECTURE

  4. PHILOSOPHY
    •many single-purposed actors
    •do one thing and do it well
    •de-couple and separate based on responsibility
    •avoid SPOFs
    •provide flexibility in deployment and topology
    •perform work asynchronously
    •communicate via messaging - how?
  5. MICRO-SERVICE BLUES
    •lots of services == lots of moving parts
    •how do all interested services access data?
    •how do we maintain loose coupling?
    •how do we keep it simple?
    provide a unifying distributed system to receive and disseminate event data
  6. A Distributed Log OR? everybody talk to everybody! looks like fun!
  7. MESSAGING PATTERNS

  8. PUBSUB
    •messages duplicated to multiple consumers
    •de-couple independent stream operations
    (diagram: Producer, PS, ConsumerA, ConsumerB; message m1 shown three times)
  9. DISTRIBUTION
    •messages load balanced among a homogeneous group of consumers
    •horizontal scalability
    (diagram: Producer, Q, two ConsumerA instances; messages m1 and m2)
  10. DISTRIBUTION
    •fault tolerance
    •in face of consumer failure, other consumers (try to) pick up the slack
    (diagram: Producer, Q, two ConsumerA instances; messages m1 and m2)
  11. QUEUEING
    •if consumers cannot keep up with producers, the queue is able to hold onto messages so they can be processed later
    (diagram: Producer, Q, two ConsumerA instances; messages m1, m2, m3)
  12. None
  13. TOPICS AND CHANNELS
    • a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
    • a channel is an independent queue for a topic (a topic can have multiple channels)
    • consumers discover producers by querying nsqlookupd (a discovery service for topics)
    • topics and channels are created at runtime (just start publishing/subscribing)
    combine multicast, distribution, and queueing
    (diagram: nsqd; topic “clicks”; channels “metrics”, “spam_analysis”, “archive”; consumers A A A; label “separate hosts”)
  17. TOPICS AND CHANNELS
    • a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
    • a channel is an independent queue for a topic (a topic can have multiple channels)
    • consumers discover producers by querying nsqlookupd (a discovery service for topics)
    • topics and channels are created at runtime (just start publishing/subscribing)
    combine multicast, distribution, and queueing
    (diagram: nsqd; topic “clicks”; channels “metrics”, “spam_analysis”, “archive”; consumers A A A and B B B; label “separate hosts”)
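
    The topic/channel model on this slide can be sketched in a few lines of Go. This is a toy in-memory model, not nsqd's code: the Topic type, its methods, and the round-robin hand-off are all assumptions made for illustration (real consumers receive messages according to their RDY state, not strict rotation).

```go
package main

import "fmt"

// Topic is a hypothetical in-memory stand-in for an nsqd topic: every
// published message is copied to each of its channels.
type Topic struct {
	channels map[string]chan string
}

// NewTopic creates a topic with one buffered queue per channel name.
func NewTopic(channelNames ...string) *Topic {
	t := &Topic{channels: make(map[string]chan string)}
	for _, name := range channelNames {
		t.channels[name] = make(chan string, 16)
	}
	return t
}

// Publish copies msg to every channel (the multicast step).
// It blocks if a channel's 16-slot buffer is full; fine for a sketch.
func (t *Topic) Publish(msg string) {
	for _, ch := range t.channels {
		ch <- msg
	}
}

// Drain empties one channel, handing messages round-robin to n consumers
// (the distribution step), and returns how many each consumer received.
func (t *Topic) Drain(channel string, n int) []int {
	counts := make([]int, n)
	ch := t.channels[channel]
	for i := 0; len(ch) > 0; i++ {
		<-ch
		counts[i%n]++
	}
	return counts
}

func main() {
	topic := NewTopic("metrics", "archive")
	for i := 0; i < 6; i++ {
		topic.Publish(fmt.Sprintf("click-%d", i))
	}
	fmt.Println(topic.Drain("metrics", 3)) // [2 2 2]
	fmt.Println(topic.Drain("archive", 2)) // [3 3]
}
```

    Both channels see all six messages, while the consumers within a channel split them between themselves.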
  21. QUEUES
    •topics and channels are independent queues
    •queues have arbitrary high water marks (after which messages transparently read/write to disk, bounding memory footprint)
    •supports channel-independent degradation and recovery
    (diagram labels: buffer this channel / high water mark / persisted messages)
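
    A minimal sketch of the high water mark idea, assuming a buffered Go channel as the in-memory queue and a plain slice standing in for the on-disk backend. The Queue type and its methods are hypothetical, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// Queue keeps up to cap(mem) messages in memory and spills the rest to a
// backend. The disk slice is a stand-in for a real on-disk queue.
type Queue struct {
	mem  chan string
	disk []string
}

// NewQueue creates a queue whose in-memory buffer holds highWaterMark messages.
func NewQueue(highWaterMark int) *Queue {
	return &Queue{mem: make(chan string, highWaterMark)}
}

// Put tries the in-memory buffer first and overflows to the backend
// without blocking once the high water mark is reached.
func (q *Queue) Put(msg string) {
	select {
	case q.mem <- msg:
	default:
		q.disk = append(q.disk, msg)
	}
}

// Get prefers in-memory messages, then falls back to persisted ones.
// Ordering across the two stores is not preserved.
func (q *Queue) Get() (string, bool) {
	select {
	case msg := <-q.mem:
		return msg, true
	default:
	}
	if len(q.disk) > 0 {
		msg := q.disk[0]
		q.disk = q.disk[1:]
		return msg, true
	}
	return "", false
}

func main() {
	q := NewQueue(2)
	for i := 0; i < 5; i++ {
		q.Put(fmt.Sprintf("m%d", i))
	}
	fmt.Println(len(q.mem), len(q.disk)) // 2 3
}
```

    Memory use is bounded by the buffer capacity no matter how far behind the consumers fall.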
  22. TYPICAL NSQ CLUSTER
    •easily enable distributed and decentralized topologies
    •no centralized broker
    •consumers connect to all producers
    •messages are pushed to consumers
    •nsqlookupd instances are independent and require no coordination (run a few for HA)
    (diagram: three API/nsqd nodes, two consumers, two nsqlookupd instances)
  26. TYPICAL NSQ CLUSTER
    •easily enable distributed and decentralized topologies
    •no centralized broker
    •consumers connect to all producers
    •messages are pushed to consumers
    •nsqlookupd instances are independent and require no coordination (run a few for HA)
    (diagram: three API/nsqd nodes, two consumers, two nsqlookupd instances; flows labeled PUBLISH, REGISTER, DISCOVER, SUBSCRIBE)
  27. (NON) GUARANTEES
    •messages are delivered at least once
    •messages are not durable (by default)
    •messages received are un-ordered
    •consumers eventually find all topic producers
  28. CLIENT BEHAVIOR
    •messages are pushed to clients (no polling!)
    •clients manage flow via “RDY state”
    •clients FIN or REQ
    •client libraries manage backoff
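
    For readers who have not seen the wire protocol, the commands named on this slide are plain newline-terminated text lines. The sketch below only formats them; the helper names and the message ID are made up, and a real client would use a library such as go-nsq rather than build commands by hand.

```go
package main

import "fmt"

// subCmd subscribes the connection to a topic/channel pair.
func subCmd(topic, channel string) string {
	return fmt.Sprintf("SUB %s %s\n", topic, channel)
}

// rdyCmd tells nsqd how many messages the client is prepared to receive.
func rdyCmd(count int) string {
	return fmt.Sprintf("RDY %d\n", count)
}

// finCmd marks a message as successfully processed.
func finCmd(messageID string) string {
	return fmt.Sprintf("FIN %s\n", messageID)
}

// reqCmd asks nsqd to re-queue a message after timeoutMs milliseconds.
func reqCmd(messageID string, timeoutMs int) string {
	return fmt.Sprintf("REQ %s %d\n", messageID, timeoutMs)
}

func main() {
	id := "0123456789abcdef" // hypothetical 16-character message ID
	fmt.Printf("%q\n", subCmd("clicks", "metrics"))
	fmt.Printf("%q\n", rdyCmd(2))
	fmt.Printf("%q\n", finCmd(id))
	fmt.Printf("%q\n", reqCmd(id, 5000))
}
```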
  29. ONE MORE THING
    •#ephemeral channels
      •no queue backup / automatically go away
    •server side topic and channel pausing
      •administratively stop message flow
      •no message loss (queue backs up)
    •TLS / Snappy
    •great tooling out of the box
      •nsqadmin / nsq_to_file / nsq_to_nsq
      •nsq_to_http / nsq_tail / nsq_stat
  30. IN PRODUCTION

  31. EXAMPLE CLIENTS Go Python

  32. LET’S TALK ABOUT GO

  33. THOUSAND FOOT VIEW
    •compiled (fast), statically typed, garbage collected
    •concurrency is built-in
    •goroutines (lightweight userland threads)
    •“don’t communicate by sharing memory, share memory by communicating”
    •small (easy to learn)
    •statically linked (no external dependencies)
    •good C interoperability
    •vast standard library (HTTP server, network IO, RPC, JSON, buffered IO, etc.)
    •… it keeps getting better / faster with each new release
  34. GC TENSION
    •Latency: % of time spent in GC
    •Throughput: % of time doing useful work
    •Footprint: size of heap
    pick one, maybe two, never all three
  35. THE GOLDEN RULE Produce Less Garbage!

  36. UNDERSTANDING THE GC
    •first understand how the GC is behaving
      •under real workloads (production?)
      •publish metrics to statsd (graphed in nsqadmin)
    •then understand where garbage is generated
      •go test -benchmem (profile allocations)
      •go build -gcflags -m (output escape analysis)
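
    One way to get GC behaviour into statsd is to snapshot runtime.MemStats periodically and emit each field as a gauge. The sketch below is an assumption about how that could look, not nsqd's implementation; the metric names and the "nsqd.mem" prefix are invented, and the lines are printed rather than sent over UDP.

```go
package main

import (
	"fmt"
	"runtime"
)

// gauge formats one statsd gauge line: <name>:<value>|g
func gauge(name string, value uint64) string {
	return fmt.Sprintf("%s:%d|g", name, value)
}

// gcGauges snapshots runtime memory statistics as statsd gauge lines.
// The metric names are made up for this sketch.
func gcGauges(prefix string) []string {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return []string{
		gauge(prefix+".heap_alloc_bytes", m.HeapAlloc),
		gauge(prefix+".heap_objects", m.HeapObjects),
		gauge(prefix+".gc_runs", uint64(m.NumGC)),
		gauge(prefix+".gc_pause_total_ns", m.PauseTotalNs),
	}
}

func main() {
	for _, line := range gcGauges("nsqd.mem") {
		// A real daemon would write each line to its statsd UDP socket.
		fmt.Println(line)
	}
}
```

    Graphing these over time under production load shows how often the collector runs and how long it pauses, which is the first step the slide asks for.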
  37. REDUCING GC PRESSURE
    •avoid []byte to string conversions
    •re-use buffers or objects (use sync.Pool in Go 1.3)
    •pre-allocate slices make([]byte, 0, 1024)
    •explicitly specify number/size of items on the wire
    •leave nothing unbounded
    •apply sane limits to configurable dials (message size, # of messages)
    •avoid boxing (use of interface{}) or unnecessary wrapper types
    •avoid the use of defer in hot code paths (it allocates)
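
    Several of these bullets combine naturally when reading from the network: a size prefix on the wire, an upper bound on that size, a pooled buffer, and a pre-allocated slice. The sketch below assumes a 4-byte big-endian length prefix and a 1024-byte limit; maxFrameSize, readFrame, frameLen, and newFrame are illustrative names, not NSQ functions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"sync"
)

const maxFrameSize = 1024 // hypothetical limit; nothing on the wire is unbounded

var errFrameTooLarge = errors.New("frame exceeds maxFrameSize")

// bufPool re-uses buffers across reads instead of allocating one per frame.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// readFrame reads a 4-byte big-endian size followed by that many bytes
// into buf, rejecting any frame larger than maxFrameSize.
func readFrame(r io.Reader, buf *bytes.Buffer) error {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return err
	}
	size := binary.BigEndian.Uint32(hdr[:])
	if size > maxFrameSize {
		return errFrameTooLarge
	}
	buf.Reset()
	_, err := io.CopyN(buf, r, int64(size))
	return err
}

// frameLen reads one frame with a pooled buffer and returns its body length.
func frameLen(r io.Reader) (int, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	err := readFrame(r, buf)
	n := 0
	if err == nil {
		n = buf.Len()
	}
	bufPool.Put(buf) // explicit Put instead of defer, following the slide's advice
	return n, err
}

// newFrame encodes body with its size prefix. The slice is pre-allocated
// because the final length is known up front.
func newFrame(body []byte) io.Reader {
	frame := make([]byte, 4, 4+len(body))
	binary.BigEndian.PutUint32(frame, uint32(len(body)))
	return bytes.NewReader(append(frame, body...))
}

func main() {
	fmt.Println(frameLen(newFrame([]byte("hello"))))
	fmt.Println(frameLen(newFrame(make([]byte, maxFrameSize+1))))
}
```

    The first call succeeds with a length of 5; the second is rejected before any body bytes are read, so an oversized frame never causes an allocation.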
  38. OTHER LESSONS LEARNED
    •don’t be afraid of the sync package
    •channels are overkill for primitives
    •goroutines are cheap not free (~4k per)
    •use worker pools rather than “goroutine per X”
    •synchronizing goroutine exit and cleanup is hard
    •all IO must have timeouts (guarantee progress)
    •wrap IO with bufio.Reader/Writer to reduce context switches (syscalls)
    •select skips nil channels
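
    A small sketch of three of these points together: a fixed worker pool instead of a goroutine per job, a sync.WaitGroup to synchronize exit, and sync/atomic for a simple counter rather than a channel. The process function and its arguments are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// process runs a fixed pool of workers over jobs and returns how many
// jobs were handled. It requires workers > 0 when jobs is non-empty,
// otherwise the send on queue would block forever.
func process(jobs []string, workers int) int64 {
	var handled int64
	var wg sync.WaitGroup
	queue := make(chan string)

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				// An atomic add is enough for a counter; no channel needed.
				atomic.AddInt64(&handled, 1)
			}
		}()
	}

	for _, j := range jobs {
		queue <- j
	}
	close(queue) // lets every worker's range loop finish
	wg.Wait()    // all workers have exited before we read the result
	return atomic.LoadInt64(&handled)
}

func main() {
	jobs := make([]string, 100)
	fmt.Println(process(jobs, 4)) // 100
}
```

    The number of goroutines stays at four regardless of how many jobs arrive, and closing the queue gives every worker a clear signal to exit.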
  39. WHEN IN DOUBT PPROF
    •built-in!
      • net/http/pprof
      • go tool pprof
    •available for CPU, heap, goroutine based profiling
    •all long-running daemons should expose telemetry/debug endpoints
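
    Exposing the pprof endpoints takes only a few lines. The sketch below registers the standard handlers on a dedicated mux and probes it in-process with httptest so it runs without opening a port; newDebugMux, probe, and the localhost:6060 address in the comment are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux registers the standard pprof handlers on a dedicated mux,
// so the debug endpoints can be served on a separate, private address.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	return mux
}

// probe issues an in-process GET against the mux and returns the status code.
func probe(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux()
	fmt.Println(probe(mux, "/debug/pprof/"))
	fmt.Println(probe(mux, "/debug/pprof/heap"))
	// In a long-running daemon, serve it instead, for example:
	//   go http.ListenAndServe("localhost:6060", mux)
}
```

    With the mux served on that address, a heap profile could be pulled with: go tool pprof http://localhost:6060/debug/pprof/heap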
  40. DRUMROLL PLEASE…

  41. Thanks! Any questions?
    @imsnakes
    https://github.com/bitly/nsq
    shoutout to @jehiah (co-author of NSQ)