Global Scaling at The New York Times

Slides from Michael Laing's EBU Devcon 2014 talk.

Transcript

  1. Why?
     [Figure residue: “vs” the “one ring” Black Hole (temporary); scale: 1, 2, 5, 10, 100 PetaBytes; labels: NYTimes, ?, ?]
  2. A Global Mesh with a Memory
     Message-based: WebSocket, AMQP, SockJS
     If in doubt:
     • Resend
     • Reconnect
     • Reread
     Idempotent:
     • Replicating
     • Racy
     • Resolving
     Classes of service:
     • Gold: replicate/race/resolve
     • Silver: prioritize
     • Bronze: queueable
     Millions of users
     Event-driven: async using libev
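The "resend if in doubt" rule is only safe when receivers are idempotent. A minimal sketch of that idea, assuming each message carries a unique ID and that duplicates (resends, replicated gold-class copies) can simply be dropped; the class and method names are hypothetical and not taken from the nyt⨍aбrik code:

```python
import uuid


class IdempotentReceiver:
    """Applies each message at most once, keyed by its ID (hypothetical helper)."""

    def __init__(self):
        self.seen = set()      # IDs of messages already applied
        self.delivered = []    # bodies handed to the application, in arrival order

    def receive(self, message_id, body):
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.seen:
            return False
        self.seen.add(message_id)
        self.delivered.append(body)
        return True


receiver = IdempotentReceiver()
msg_id = uuid.uuid1()
first = receiver.receive(msg_id, b"breaking news")
resent = receiver.receive(msg_id, b"breaking news")  # a resend or a replicated copy
```

With this in place a sender can resend freely: the second copy is recognized and ignored.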
  5. Adding Some Current Context…
     • Reactive Programming: “…automatically propagate changes through the data flow…” (wikipedia.org/wiki/Reactive_programming)
     • Microservices: “…small, lightweight services, where each performs a single function…arranged in independently deployable groups and communicate with each other via a well defined interface…” (davidmorgantini.blogspot.ch/2013/08/micro-services-what-are-micro-services.html)
     • Etc.
  7. Some Not-So-Current Applicable Context…
     • SEDA - Staged Event-Driven Architecture: “…decomposes a complex, event-driven application into a set of stages connected by queues…” (http://en.wikipedia.org/wiki/Staged_event-driven_architecture)
     • Message Bus (Message-Oriented Middleware): “…relies on asynchronous message-passing, as opposed to a request-response architecture…” (http://en.wikipedia.org/wiki/Message-oriented_middleware)
  8. Message: an event with data
     • Properties: Routing while in motion & Locating when at rest
     • Metadata
     • Body (opaque to us)
     [Diagram: Message = Properties, Metadata, Body (may be absent)]
  9. Message: an event with data
                  RabbitMQ               WebSocket            S3 / CloudFront   Cassandra
     Properties   Routing Key            Gateway Connection   UUID              “Path” & UUID
     Metadata     Headers: Map / Array   JSON                 HTTP Headers      JSON
     Body         Blob                   Blob                 Blob              Blob
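The three-part message described above could be modeled roughly as follows. This is a sketch only: the field names, and the choice to keep the path and UUID together as the "properties", are assumptions made for illustration rather than the actual nyt⨍aбrik types.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    # Properties: route the message while in motion, locate it when at rest.
    path: str
    message_id: uuid.UUID = field(default_factory=uuid.uuid1)
    # Metadata: small key/value data (headers or JSON, depending on transport).
    metadata: dict = field(default_factory=dict)
    # Body: an opaque blob; it may be absent.
    body: Optional[bytes] = None


msg = Message(path="feeds.breaking-news.56",
              metadata={"content-type": "text/plain"},
              body=b"opaque payload")
empty = Message(path="feeds.breaking-news.56")
```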
  10. Publish
     [Diagram: Message, Core, Cassandra, S3 / CloudFront, Gateway, Device; links labeled Init, AMQP, CQL, WebSocket, HTTP, sync]
  11. Dismiss
     [Diagram: Message, Core, Cassandra, Gateway, Device; links labeled Init, AMQP, CQL, WebSocket]
  12. Connect
     [Diagram: many Core, Gateway, Cassandra, S3 / CloudFront, Device and Message nodes; count labels: several, dozens, millions]
  14. Properties – 2 forms of addressing
     • “Path”: 1) Routing a message to a user 2) Finding a message for a user
     • “Postoffice”: Routing a message internally in the nyt⨍aбrik
     [Diagram: Message, nyt⨍aбrik, Core, Gateway, Core, Gateway]
  16. The Path hierarchy
     • Path elements are text (UTF-8, but “.” is reserved) – the 1st element is the “category”
       “category”: “feeds”, “2nd element”: “breaking-news”, “3rd element”: “0012345”
     • The elements are joined by “.” for routing
       “path”: “feeds.breaking-news.0012345”
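A small sketch of joining path elements, assuming the only restrictions are the ones stated on the slide (elements are text and "." is reserved). The function names are hypothetical, and rejecting empty elements is an extra assumption:

```python
def join_path(*elements):
    """Join path elements with "."; reject empty elements and any containing "."."""
    for element in elements:
        if not element or "." in element:
            raise ValueError(f"invalid path element: {element!r}")
    return ".".join(elements)


def category(path):
    """Return the first element of a path, which the slide calls the category."""
    return path.split(".", 1)[0]


path = join_path("feeds", "breaking-news", "0012345")
```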
  18. Deeper into the Path hierarchy
     • For persistence, the path denotes a sorted “folder” containing messages in reverse datetime order (using the timestamp from the version 1 UUID uniquely identifying each message)
       “feeds.breaking-news.56”/bd1961f5-1062-11e4-a630-406c8f1838fa
       “feeds.breaking-news.56”/b94e8b45-1062-11e4-900d-406c8f1838fa
     • Subscribing to a path is done by “binding”, typically with wildcards: “*” matches any one element, “#” matches any sequence of elements
       All breaking-news messages: “feeds.breaking-news.#”
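The wildcard rules can be sketched as a small matcher. This is an illustration only, not RabbitMQ's implementation (which uses a trie), and it assumes "#" may also match an empty sequence, as it does in AMQP topic exchanges:

```python
def matches(binding, path):
    """Return True if a dot-separated path matches a binding pattern."""
    return _match(binding.split("."), path.split("."))


def _match(pattern, elements):
    if not pattern:
        return not elements
    head, rest = pattern[0], pattern[1:]
    if head == "#":
        # "#" may absorb zero or more elements; try every possible split.
        return any(_match(rest, elements[i:]) for i in range(len(elements) + 1))
    if not elements:
        return False
    if head == "*" or head == elements[0]:
        # "*" matches exactly one element; a literal must match exactly.
        return _match(rest, elements[1:])
    return False


all_breaking = matches("feeds.breaking-news.#", "feeds.breaking-news.56")
one_level = matches("feeds.*", "feeds.breaking-news.56")
```

Here `all_breaking` is true, while `one_level` is false because "*" covers only a single element.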
  21. More on subscribing & retrieving
     • Retrieving from persistent storage can be done by path, e.g. the “latest” breaking-news messages for item 56:
       “feeds.breaking-news.56”
     • But retrieval can also be done using trailing wildcards:
       “feeds.breaking-news.#” will return the “latest” breaking-news messages for all “current” items
     • The Cassandra data store is designed to return hierarchical queries with a single request and in the desired order
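The retrieval behavior can be imitated in memory to make the ordering concrete. This sketch is not the Cassandra schema: `FolderStore` is a hypothetical stand-in, and merging all matching folders into one newest-first list is an assumption about what a trailing "#" returns. It relies on the 60-bit timestamp that Python exposes as `UUID.time` for version 1 UUIDs.

```python
import uuid
from collections import defaultdict


class FolderStore:
    """In-memory stand-in for path-keyed 'folders' of messages (illustrative)."""

    def __init__(self):
        self.folders = defaultdict(list)  # path -> list of (uuid, body)

    def put(self, path, message_id, body):
        self.folders[path].append((message_id, body))

    def latest(self, path, limit=10):
        """Newest-first messages for an exact path, or for a path ending in '.#'."""
        if path.endswith(".#"):
            prefix = path[:-1]  # drop "#", keep the trailing "."
            rows = [row for p, msgs in self.folders.items()
                    if p.startswith(prefix) for row in msgs]
        else:
            rows = list(self.folders.get(path, []))
        # Version 1 UUIDs embed a timestamp; sort on it, newest first.
        rows.sort(key=lambda row: row[0].time, reverse=True)
        return rows[:limit]


store = FolderStore()
newer = uuid.UUID("bd1961f5-1062-11e4-a630-406c8f1838fa")
older = uuid.UUID("b94e8b45-1062-11e4-900d-406c8f1838fa")
store.put("feeds.breaking-news.56", older, b"first")
store.put("feeds.breaking-news.56", newer, b"second")
ids = [message_id for message_id, _ in store.latest("feeds.breaking-news.56")]
```

The two UUIDs are the ones from the slide; sorting on their embedded timestamps reproduces the order in which the slide lists them.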
  24. A notable simplification:
     • Paths for subscribing to messages and paths for retrieving persisted messages, including the use of wildcards, are the same, e.g.:
       When a user logs in she is “subscribed” using her ID; messages “published” to her will be received while “persisted” messages and subscription preferences are retrieved (a few 10’s of milliseconds)
       Once subscription preferences arrive, she will be “subscribed” to them and any corresponding “persisted” messages retrieved
       The same paths are used for subscription and retrieval
  26. Special Paths for individual routing
     • Our subscribers (millions of them) have numeric IDs – using those IDs directly for routing, specifically for the “binding” function, would be inefficient
       “id.prefs.09067832” (namespace of 3rd element is too large)
     • Instead we convert the ID to base62 elements and take advantage of the patricia trie search structures built into RabbitMQ and our gateway
       “id.prefs.c.2.x.M” (equivalent to the above, used for routing)
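The conversion can be sketched as below. The slide does not state the digit alphabet; the ordering used here (digits, then uppercase, then lowercase) is an assumption chosen because it reproduces the slide's example, and the function names are hypothetical.

```python
# Assumed digit order: 0-9, then A-Z, then a-z.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_elements(number):
    """Return the base62 digits of a non-negative integer, most significant first."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return [ALPHABET[0]]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return digits[::-1]


def subscriber_path(prefix, subscriber_id):
    """Build a routing path with one path element per base62 digit of the ID."""
    return ".".join([prefix, *base62_elements(int(subscriber_id))])


routing_path = subscriber_path("id.prefs", "09067832")
```

Each path element now draws from only 62 possible values, so a trie keyed on elements stays shallow and narrow instead of holding millions of siblings at one level.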
  28. Postoffice addressing
     • The “postoffice” is a logical “bus” that connects all the services in all the nyt⨍aбrik instances globally
     • It is physically segmented and the segments are connected using RabbitMQ “shovels” and “federation”
     [Diagram: postoffice logical view connecting Gateway, Core, Gateway, Gateway, Core, Gateway]
  30. Postoffice address elements
     • Each nyt⨍aбrik service has 3 basic uniquifying elements:
       “region”: “us-west-2”, “instance”: “i-123”, “pid”: “12”
     • And some additional qualifiers:
       “product”: “search”, “service”: “route”
  33. Postoffice routing key
     • Each routing key has a “from” address embedded in it:
       “region”: “us-west-2”, “instance”: “i-123”, “pid”: “12”, “product”: “search”, “service”: “resolve”
     • And a “to” address:
       “region”: “us-west-2”, “instance”: “-”, “pid”: “-”, “product”: “search”, “service”: “route”
       (the “-” means “any”)
     • And an “action”:
       “action”: “route”
  36. Postoffice routing key detail
     • And they are put together as an ordered sequence like this:
       <action>.<from address>.<to address>
       “route.\
        us-west-2.search.resolve.i-123.12.\
        us-west-2.search.route.-.-”
     • Meaning: This is a request for a “route” action from a specific invocation of the “search” product “resolve” service addressed to any “search” product “route” service in region “us-west-2”
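Assembling such a key might look like the sketch below. The element order inside an address (region, product, service, instance, pid) is read off the slide's example; the `Address` class and `routing_key` function are hypothetical names, not the common methods the talk refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    region: str
    product: str
    service: str
    instance: str = "-"   # "-" means "any"
    pid: str = "-"        # "-" means "any"

    def key(self):
        """Render the address in the element order used by the slide's example."""
        return ".".join([self.region, self.product, self.service,
                         self.instance, self.pid])


def routing_key(action, sender, recipient):
    """Build <action>.<from address>.<to address>."""
    return ".".join([action, sender.key(), recipient.key()])


sender = Address("us-west-2", "search", "resolve", "i-123", "12")
recipient = Address("us-west-2", "search", "route")
key = routing_key("route", sender, recipient)
```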
  37. Postoffice binding
     • Each service invocation “binds” (subscribes) to the postoffice using its unique address to get messages specifically directed to it, e.g. asynchronous RPC responses
       <any action>.<any address>.<my address>
       “*.\
        *.*.*.*.*.\
        us-west-2.search.route.i-123.12”
  39. Postoffice binding for services
     • Each service invocation also “binds” to the postoffice using addresses that will select messages appropriate for its service
       <my action>.<my domain>.<my service>
       “route.\
        us-west-2.*.*.*.*.\
        *.*.route.*.*”
     • All this address manipulation is handled by common methods in the nyt⨍aбrik
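The two kinds of binding can be generated and checked with a few lines. This sketch handles only the "*" wildcard, which is all these bindings use; the function names are hypothetical, and the routing key in the usage lines is the example key from the routing key slide.

```python
def direct_binding(region, product, service, instance, pid):
    """<any action>.<any address>.<my address>"""
    return ".".join(["*"] + ["*"] * 5 + [region, product, service, instance, pid])


def service_binding(action, region, service):
    """<my action>.<my domain>.<my service>, following the slide's example."""
    sender = [region, "*", "*", "*", "*"]
    recipient = ["*", "*", service, "*", "*"]
    return ".".join([action] + sender + recipient)


def star_match(binding, key):
    """Element-wise match where "*" in the binding accepts any single element."""
    pattern, elements = binding.split("."), key.split(".")
    return len(pattern) == len(elements) and all(
        p in ("*", e) for p, e in zip(pattern, elements))


mine = direct_binding("us-west-2", "search", "route", "i-123", "12")
shared = service_binding("route", "us-west-2", "route")
request = "route.us-west-2.search.resolve.i-123.12.us-west-2.search.route.-.-"
picked_up_by_service = star_match(shared, request)
picked_up_directly = star_match(mine, request)
```

A request addressed to "any" route service matches the shared service binding but not the per-invocation binding, which only receives messages naming that exact instance and pid.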
  40. Routing in the Core
     • For load balancing on entry to the nyt⨍aбrik Core
     [Diagram: Message, Core “or” Core]
  41. Routing in the Core
     • For replication of important (gold service) messages
     [Diagram: Message, Core “and” Core]
  42. Routing in the Core
     • For distribution to all consumers
     [Diagram: Core, Core, Gateway, Device, Gateway, Device]
  44. Stepping Back – snippets from Dijkstra
     • “Summarizing: as a slow-witted human being I have a very small head” from “Notes on Structured Programming” (EWD249)
     • from ‘What Led to “Notes on Structured Programming”’ (EWD1308)
  47. Rough Costs (all in the Amazon cloud)
     • Averaging ~50 small-ish instances in production
       50 x $.13 / hr x 30 x 24 = $4680 / month
     • Other costs < $300 / month
     • Too much – targeting half that within a few months
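The arithmetic on the slide can be checked in a few lines. Integer cents are used to avoid floating-point rounding; treating "other costs" as an upper bound of $300 comes from the slide's "< $300", and the variable names are illustrative.

```python
instances = 50
rate_cents_per_hour = 13      # $0.13 per instance-hour
hours_per_month = 30 * 24     # the slide's 30-day month

instance_cost_dollars = instances * rate_cents_per_hour * hours_per_month // 100
other_costs_upper_bound = 300
total_upper_bound = instance_cost_dollars + other_costs_upper_bound
```

The instance cost comes to $4680 per month, matching the slide, so the total stays under $4980 per month.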
  49. The risky/hard part: the Gateway
     • Prototypes worked well
     • But interaction with the production live site was too intensive
     • Smarter gateway / smarter live site integration will allow better scaling
  51. How about these goals?
     • Responsive: Yes
       least latency routing, fast cache, etc.
     • Resilient: Yes
       active/active/… across multiple independent regions and zones
     • Scalable: Getting there
       algorithm is good, scaling up is fine, working on browser interaction, new automation tools (ansible) are being staged into production
  52. A Proposal:
     Because nyt⨍aбrik is more of a process than a product:
     • Replay “Building a Messaging Fabric” as a series of blog posts
     • Post the code on GitHub as OSS
     • Take community contributions using other languages, message brokers, persistence stores, cloud providers, etc.
     • Let me know if there is interest!