Global Scaling at the New York Times using RabbitMQ

Slides from Michael Laing's OSCON 2014 talk.

Transcript

  1. A Global Mesh with a Memory
     Message-based: WebSocket, AMQP, SockJS
     If in doubt:
       • Resend
       • Reconnect
       • Reread
     Idempotent:
       • Replicating
       • Racy
       • Resolving
     Classes of service:
       • Gold: replicate/race/resolve
       • Silver: prioritize
       • Bronze: queueable
     Millions of users
     Event-driven: async using libev
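
The "if in doubt, resend" and "idempotent" points above imply that a receiver must tolerate duplicate copies of the same message, for example when gold-class replicas race each other. A minimal sketch of one way to do that, assuming each message carries a UUID; the class name `IdempotentReceiver` and its methods are hypothetical and not taken from the talk:

```python
import uuid


class IdempotentReceiver:
    """Hypothetical receiver that delivers each message UUID at most once."""

    def __init__(self):
        self._seen = set()      # UUIDs already delivered (unbounded in this sketch)
        self.delivered = []     # bodies handed to the application, in arrival order

    def receive(self, message_id: uuid.UUID, body: bytes) -> bool:
        """Return True if the message was delivered, False if it was a duplicate."""
        if message_id in self._seen:
            return False        # a resent or replicated copy lost the race
        self._seen.add(message_id)
        self.delivered.append(body)
        return True
```

A production version would need to expire old UUIDs; the unbounded set here only keeps the sketch short.
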
  2. Message: an event with data
     § Envelope: routing while in motion & locating when at rest
     § Metadata
     § Body (opaque to us; may be absent)
  3. Message: an event with data

                RabbitMQ               WebSocket            S3 / CloudFront   Cassandra
     Envelope   Routing Key            Gateway Connection   UUID              “Path” & UUID
     Metadata   Headers: Map / Array   JSON                 HTTP Headers      JSON
     Body       Blob                   Blob                 Blob              Blob
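
The three-part message described above (envelope, metadata, opaque body) can be sketched as a small data structure. This is an illustration only: the class and field names are assumptions, and the "priority" metadata key in the usage note is hypothetical.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Envelope:
    path: str                 # used for routing in motion and locating at rest
    message_id: uuid.UUID     # uniquely identifies the message


@dataclass
class Message:
    envelope: Envelope
    metadata: dict = field(default_factory=dict)
    body: Optional[bytes] = None   # opaque blob; may be absent

    def metadata_json(self) -> str:
        """Serialize metadata as JSON, the form used for WebSocket and Cassandra."""
        return json.dumps(self.metadata, sort_keys=True)
```

For example, `Message(Envelope("feeds.breaking-news.56", some_uuid), {"priority": "gold"})` yields a body-less message whose metadata serializes to a JSON object.
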
  4. Publish
     (Diagram: Message, Core, Cassandra, S3 / CloudFront, Gateway and Device; labels: Init, AMQP, CQL, WebSocket, HTTP sync)
  5. Dismiss
     (Diagram: Message, Core, Cassandra, Gateway and Device; labels: Init, AMQP, CQL, WebSocket)
  6. Connect
     (Diagram: Core, Gateway, Device, Message, Cassandra and S3 / CloudFront nodes, with counts labeled “several”, “dozens” and “millions”)
  7. Envelope: 2 forms of addressing
     § “Path”:
       1) Routing a message to a user
       2) Finding a message for a user
     (Diagram: Message, nyt⨍aбrik)
  8. Envelope: 2 forms of addressing
     § “Path”:
       1) Routing a message to a user
       2) Finding a message for a user
     § “Postoffice”: Routing a message internally in the nyt⨍aбrik
     (Diagram: Message, nyt⨍aбrik, Core, Gateway)
  9. The Path hierarchy
     § Path elements are text (UTF-8, but “.” is reserved); the 1st element is the “category”
       “category”: “feeds”, “2nd element”: “breaking-news”, “3rd element”: “0012345”
  10. The Path hierarchy
      § Path elements are text (UTF-8, but “.” is reserved); the 1st element is the “category”
        “category”: “feeds”, “2nd element”: “breaking-news”, “3rd element”: “0012345”
      § The elements are joined by “.” for routing
        “path”: “feeds.breaking-news.0012345”
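
Joining path elements while enforcing the reserved “.” can be sketched as follows. The function name `join_path` is hypothetical; only the rule (elements are text, “.” is reserved, elements are joined by “.”) comes from the slide.

```python
def join_path(*elements: str) -> str:
    """Join path elements with "." after checking that none contains the separator."""
    if not elements:
        raise ValueError("a path needs at least a category element")
    for element in elements:
        if not element:
            raise ValueError("path elements must be non-empty")
        if "." in element:
            raise ValueError(f'"." is reserved and cannot appear in {element!r}')
    return ".".join(elements)


path = join_path("feeds", "breaking-news", "0012345")
```

Here `path` is the string `feeds.breaking-news.0012345`, with “feeds” as the category.
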
  11. Deeper into the Path hierarchy
      § For persistence, the path denotes a sorted “folder” containing messages in reverse datetime order (using the timestamp from the version 1 UUID uniquely identifying each message)
        “feeds.breaking-news.56”/bd1961f5-1062-11e4-a630-406c8f1838fa
        “feeds.breaking-news.56”/b94e8b45-1062-11e4-900d-406c8f1838fa
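
The reverse datetime ordering can be reproduced from the two UUIDs shown, since a version 1 UUID embeds a 60-bit timestamp counted in 100 ns steps from 1582-10-15. A minimal sketch using Python's standard `uuid` module; the helper names are hypothetical:

```python
import uuid
from datetime import datetime, timezone

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns units.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def uuid1_datetime(u: uuid.UUID) -> datetime:
    """Convert the timestamp embedded in a version 1 UUID to a UTC datetime."""
    return datetime.fromtimestamp((u.time - GREGORIAN_TO_UNIX_100NS) / 1e7, tz=timezone.utc)


def newest_first(ids):
    """Sort version 1 UUIDs in reverse datetime order, as in the path "folder"."""
    return sorted(ids, key=lambda u: u.time, reverse=True)


folder = [
    uuid.UUID("b94e8b45-1062-11e4-900d-406c8f1838fa"),
    uuid.UUID("bd1961f5-1062-11e4-a630-406c8f1838fa"),
]
ordered = newest_first(folder)   # the "bd19..." message comes first
```
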
  12. Deeper into the Path hierarchy
      § For persistence, the path denotes a sorted “folder” containing messages in reverse datetime order (using the timestamp from the version 1 UUID uniquely identifying each message)
        “feeds.breaking-news.56”/bd1961f5-1062-11e4-a630-406c8f1838fa
        “feeds.breaking-news.56”/b94e8b45-1062-11e4-900d-406c8f1838fa
      § Subscribing to a path is done by “binding”, typically with wildcards: “*” matches any one element, “#” matches any sequence of elements
        All breaking-news messages: “feeds.breaking-news.#”
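
The wildcard rules can be illustrated with a small matcher. This is a plain recursive sketch, not the trie-based matching a broker would use, and it assumes “#” may also match zero elements, as in AMQP topic exchanges; the function name `topic_matches` is hypothetical.

```python
def topic_matches(pattern: str, path: str) -> bool:
    """Return True if a "."-separated path matches a binding pattern.

    "*" matches exactly one element; "#" matches any sequence of elements
    (assumed here to include the empty sequence).
    """
    p = pattern.split(".")
    k = path.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)            # pattern used up: path must be too
        if p[i] == "#":
            # Let "#" absorb 0..n of the remaining elements.
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False                  # path used up but pattern is not
        return (p[i] == "*" or p[i] == k[j]) and match(i + 1, j + 1)

    return match(0, 0)
```

With this sketch, “feeds.breaking-news.#” and “feeds.*.56” both match “feeds.breaking-news.56”, while “feeds.*” does not.
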
  13. More on subscribing & retrieving
      § Retrieving from persistent storage can be done by path, e.g. the “latest” breaking-news messages for item 56:
        “feeds.breaking-news.56”
  14. More on subscribing & retrieving
      § Retrieving from persistent storage can be done by path, e.g. the “latest” breaking-news messages for item 56:
        “feeds.breaking-news.56”
      § But retrieval can also be done using trailing wildcards:
        “feeds.breaking-news.#” will return the “latest” breaking-news messages for all “current” items
  15. More on subscribing & retrieving
      § Retrieving from persistent storage can be done by path, e.g. the “latest” breaking-news messages for item 56:
        “feeds.breaking-news.56”
      § But retrieval can also be done using trailing wildcards:
        “feeds.breaking-news.#” will return the “latest” breaking-news messages for all “current” items
      § The Cassandra data store is designed to return hierarchical queries with a single request and in the desired order
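
Retrieval by exact path and by trailing wildcard can be mimicked with an in-memory stand-in. This sketch does not reflect the actual Cassandra schema, which the slides do not show; the class `FolderStore`, its methods and the `limit` parameter are hypothetical.

```python
import uuid
from collections import defaultdict


class FolderStore:
    """Hypothetical in-memory store: one newest-first "folder" of UUIDs per path."""

    def __init__(self):
        self._folders = defaultdict(list)

    def put(self, path: str, message_id: uuid.UUID) -> None:
        folder = self._folders[path]
        folder.append(message_id)
        folder.sort(key=lambda u: u.time, reverse=True)   # reverse datetime order

    def retrieve(self, query: str, limit: int = 1) -> dict:
        """Return the latest `limit` messages per matching path.

        A query ending in ".#" matches every stored path below that prefix;
        any other query is treated as an exact path.
        """
        if query.endswith(".#"):
            prefix = query[:-1]                           # keep the trailing "."
            paths = sorted(p for p in self._folders if p.startswith(prefix))
        else:
            paths = [query] if query in self._folders else []
        return {p: self._folders[p][:limit] for p in paths}
```

Calling `retrieve("feeds.breaking-news.#")` returns the latest message for each stored item under that prefix, while `retrieve("feeds.breaking-news.56")` returns only item 56.
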
  16. A notable simplification:
      § Paths for subscribing to messages and paths for retrieving persisted messages, including the use of wildcards, are the same, e.g.:
  17. A notable simplification:
      § Paths for subscribing to messages and paths for retrieving persisted messages, including the use of wildcards, are the same, e.g.:
        When a user logs in she is “subscribed” using her ID; messages “published” to her will be received while “persisted” messages and subscription preferences are retrieved (a few tens of milliseconds)
  18. A notable simplification:
      § Paths for subscribing to messages and paths for retrieving persisted messages, including the use of wildcards, are the same, e.g.:
        When a user logs in she is “subscribed” using her ID; messages “published” to her will be received while “persisted” messages and subscription preferences are retrieved (a few tens of milliseconds)
        Once subscription preferences arrive, she will be “subscribed” to them and any corresponding “persisted” messages retrieved
        The same paths are used for subscription and retrieval
  19. Special Paths for individual routing
      § Our subscribers (millions of them) have numeric IDs; using those IDs directly for routing, specifically for the “binding” function, would be inefficient
        “id.prefs.09067832” (namespace of 3rd element is too large)
  20. Special Paths for individual routing
      § Our subscribers (millions of them) have numeric IDs; using those IDs directly for routing, specifically for the “binding” function, would be inefficient
        “id.prefs.09067832” (namespace of 3rd element is too large)
      § Instead we convert the ID to base62 elements and take advantage of the Patricia trie search structures built into RabbitMQ and our gateway
        “id.prefs.c.2.x.M” (equivalent to the above, used for routing)
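
The base62 conversion can be sketched as follows. The slide does not state the digit alphabet; the ordering 0-9, A-Z, a-z is an assumption, chosen because it reproduces the slide's example (9067832 = 38·62³ + 2·62² + 59·62 + 22, giving digits “c”, “2”, “x”, “M”). The function names are hypothetical.

```python
import string
from typing import List

# Assumed alphabet: digits, then upper case, then lower case.
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_elements(subscriber_id: int) -> List[str]:
    """Split a non-negative integer ID into base62 digits, most significant first."""
    if subscriber_id < 0:
        raise ValueError("subscriber IDs must be non-negative")
    if subscriber_id == 0:
        return [BASE62[0]]
    digits = []
    while subscriber_id:
        subscriber_id, remainder = divmod(subscriber_id, 62)
        digits.append(BASE62[remainder])
    return digits[::-1]


def routing_path(category: str, second: str, subscriber_id: str) -> str:
    """Build a routing path whose trailing elements are the base62 digits of the ID."""
    return ".".join([category, second, *base62_elements(int(subscriber_id))])


path = routing_path("id", "prefs", "09067832")   # "id.prefs.c.2.x.M"
```

Each trailing element now comes from a namespace of only 62 values, which is what makes one-element-per-digit routing friendlier to a trie than a single element with millions of possible values.
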
  21. Postoffice addressing
      § The “postoffice” is a logical “bus” that connects all the services in all the nyt⨍aбrik instances globally
      (Diagram, logical view: Gateway and Core services attached to the postoffice)
  22. Postoffice addressing
      § The “postoffice” is a logical “bus” that connects all the services in all the nyt⨍aбrik instances globally
      § It is physically segmented and the segments are connected using RabbitMQ “federation”
      (Diagram, logical view: Gateway and Core services attached to the postoffice)
  23. Postoffice address elements
      § Each nyt⨍aбrik service has 3 basic uniquifying elements:
        “region”: “us-west-2”, “instance”: “i-123”, “pid”: “12”
  24. Postoffice address elements
      § Each nyt⨍aбrik service has 3 basic uniquifying elements:
        “region”: “us-west-2”, “instance”: “i-123”, “pid”: “12”
      § And some additional qualifiers:
        “product”: “search”, “service”: “route”
  25. Postoffice routing key
      § Each routing key has a “from” address embedded in it:
        “region”: “us-west-2”, “instance”: “i-123”, “pid”: “12”, “product”: “search”, “service”: “resolve”
  26. Postoffice routing key
      § Each routing key has a “from” address embedded in it:
        “region”: “us-west-2”, “instance”: “i-123”, “pid”: “12”, “product”: “search”, “service”: “resolve”
      § And a “to” address:
        “region”: “us-west-2”, “instance”: “-”, “pid”: “-”, “product”: “search”, “service”: “route”
        (the “-” means “any”)
  27. Postoffice routing key
      § Each routing key has a “from” address embedded in it:
        “region”: “us-west-2”, “instance”: “i-123”, “pid”: “12”, “product”: “search”, “service”: “resolve”
      § And a “to” address:
        “region”: “us-west-2”, “instance”: “-”, “pid”: “-”, “product”: “search”, “service”: “route”
        (the “-” means “any”)
      § And an “action”:
        “action”: “route”
  28. Postoffice routing key detail
      § And they are put together as an ordered sequence like this:
        <action>.<from address>.<to address>
  29. Postoffice routing key detail
      § And they are put together as an ordered sequence like this:
        <action>.<from address>.<to address>
        “route.\
         us-west-2.search.resolve.i-123.12.\
         us-west-2.search.route.-.-”
  30. Postoffice routing key detail
      § And they are put together as an ordered sequence like this:
        <action>.<from address>.<to address>
        “route.\
         us-west-2.search.resolve.i-123.12.\
         us-west-2.search.route.-.-”
      § Meaning: This is a request for a “route” action from a specific invocation of the “search” product “resolve” service addressed to any “search” product “route” service in region “us-west-2”
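
Assembling such a routing key can be sketched as below. The element order inside an address (region, product, service, instance, pid) is read off the example key; the `Address` class and `routing_key` function are hypothetical names, not the common methods the talk refers to.

```python
from dataclasses import dataclass

ANY = "-"   # placeholder meaning "any" in a "to" address


@dataclass(frozen=True)
class Address:
    region: str
    product: str
    service: str
    instance: str = ANY
    pid: str = ANY

    def elements(self):
        """Address elements in the order they appear inside a routing key."""
        return [self.region, self.product, self.service, self.instance, self.pid]


def routing_key(action: str, sender: Address, recipient: Address) -> str:
    """Build <action>.<from address>.<to address>."""
    return ".".join([action, *sender.elements(), *recipient.elements()])


key = routing_key(
    "route",
    Address("us-west-2", "search", "resolve", "i-123", "12"),
    Address("us-west-2", "search", "route"),          # any instance, any pid
)
```

Here `key` is `route.us-west-2.search.resolve.i-123.12.us-west-2.search.route.-.-`, the example key written on one line.
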
  31. Postoffice binding
      § Each service invocation “binds” (subscribes) to the postoffice using its unique address to get messages specifically directed to it, e.g. asynchronous RPC responses
        <any action>.<any address>.<my address>
        “*.\
         *.*.*.*.*.\
         us-west-2.search.route.i-123.12”
  32. Postoffice binding for services
      § Each service invocation also “binds” to the postoffice using addresses that will select messages appropriate for its service
        <my action>.<my domain>.<my service>
        “route.\
         us-west-2.*.*.*.*.\
         *.*.route.*.*”
  33. Postoffice binding for services
      § Each service invocation also “binds” to the postoffice using addresses that will select messages appropriate for its service
        <my action>.<my domain>.<my service>
        “route.\
         us-west-2.*.*.*.*.\
         *.*.route.*.*”
      § All this address manipulation is handled by common methods in the nyt⨍aбrik
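
The two kinds of binding can be checked against a routing key with a small sketch. The helper names are hypothetical, and the matcher handles only the “*” wildcard (one element each), which is all these 11-element patterns use.

```python
def personal_binding(region, product, service, instance, pid):
    """<any action>.<any address>.<my address>"""
    return ".".join(["*"] * 6 + [region, product, service, instance, pid])


def service_binding(action, region, service):
    """<my action>.<my domain>.<my service>, following the slide's pattern."""
    return ".".join([action, region] + ["*"] * 6 + [service, "*", "*"])


def star_matches(pattern: str, key: str) -> bool:
    """Match element by element; "*" matches any single element."""
    p, k = pattern.split("."), key.split(".")
    return len(p) == len(k) and all(a == "*" or a == b for a, b in zip(p, k))


mine = personal_binding("us-west-2", "search", "route", "i-123", "12")
shared = service_binding("route", "us-west-2", "route")

# A request addressed to any "route" service instance ("-" is a literal element).
request = "route.us-west-2.search.resolve.i-123.12.us-west-2.search.route.-.-"
# A hypothetical reply addressed to this specific invocation.
reply = "reply.us-west-2.search.resolve.i-999.7.us-west-2.search.route.i-123.12"
```

Under this sketch the request matches only the shared service binding, and the reply matches only the personal binding. The “reply” action and the “i-999” sender are made up for the example.
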
  34. Routing in the Core
      § For load balancing on entry to the nyt⨍aбrik Core
      (Diagram: Message, Core “or” Core)
  35. Routing in the Core
      § For replication of important (gold service) messages
      (Diagram: Message, Core “and” Core)
  36. Routing in the Core
      § For distribution to all consumers
      (Diagram: Core, Core, Gateway, Device, Gateway, Device)
  37. Problems with Core instances
      § Complex connectivity: N(N-1) federation + clustering + …
      § Many services: input, process, resolve, reject, cache_push, …
  38. Problems with Core instances
      § Complex connectivity: N(N-1) federation + clustering + …
      § Many services: input, process, resolve, reject, cache_push, …
      § Hence, problematic to manage
  39. Problems with Core instances
      § Complex connectivity: N(N-1) federation + clustering + …
      § Many services: input, process, resolve, reject, cache_push, …
      § Hence, problematic to manage
      § And difficult to autoscale
  40. Possible solution: refactor and simplify
      § A new component, the Rabbit Router, to focus on connectivity and routing
  41. Possible solution: refactor and simplify
      § A new component, the Rabbit Router, to focus on connectivity and routing
      § A New Core, with a focus on services
  42. Possible solution: refactor and simplify
      § A new component, the Rabbit Router, to focus on connectivity and routing
      § A New Core, with a focus on services
      § Everything connected to a Rabbit Router
  43. Possible solution: refactor and simplify
      § A new component, the Rabbit Router, to focus on connectivity and routing
      § A New Core, with a focus on services
      § Everything connected to a Rabbit Router
      § The “bus” becomes a “star”
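
The connectivity argument can be made concrete with a little arithmetic. The N(N-1) figure for directed federation links comes from the slides; the star count below is an assumption (a single router, with one link in each direction per instance), since the slides do not give a number for it.

```python
def mesh_links(n: int) -> int:
    """Directed federation links when every instance federates with every other."""
    return n * (n - 1)


def star_links(n: int, links_per_instance: int = 2) -> int:
    """Links when each instance connects only to one central router (assumed topology)."""
    return n * links_per_instance


for n in (3, 6, 12):
    print(f"{n} instances: mesh={mesh_links(n)} star={star_links(n)}")
```

The mesh count grows quadratically with the number of instances while the star count grows linearly, which is one way to read why the star is easier to manage and autoscale.
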