
Dude, Where Are My Messages? Message Analytics at Netflix

Elastic Co
February 19, 2016

Netflix messages millions of customers a day across many channels (email, push notifications, text, voice calls and more) via its messaging platform, a distributed system made up of a series of separate applications. Learn how Netflix uses Elasticsearch to achieve higher message deliverability and operational excellence.

Transcript

  1. Dude, Where Are My Messages?

     Devika Chawla, Director of Engineering
     George Abraham, Software Engineer
  2. Messaging Platform

     [Diagram. Event sources: Customer Service, Billing, Account, PR, Partner, Marketing. Platform components: Event Consumer, Algorithms, Scheduler, Message Processor, In-App Message Service, Device Token Service, Feedback Processor, Subscriber Service, Top Titles, Video Metadata, AB Test Service, Video Ranker, Message & Event Metadata Service. Channels: Push Notification (APNS/Apple, GCM/Google), Email (Amazon SES), SMS/Voice (Twilio), In-App Message.]
  3. Questions to be answered in real time

     1. What is the status of the price-change email?
     2. Where is a customer's password-request message?
     3. Were the OITNB (Orange Is the New Black) push notifications delivered?
     4. What is the global distribution of phone verification?
  4. Ability to answer questions in real time

     • Leveraging messaging operational data:
       – Customer ID
       – Country
       – Message Type
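Slicing the operational data by customer ID, country and message type works best when those fields are indexed as exact values. A minimal sketch, assuming Elasticsearch 1.x (the deck lists ES 1.5.2) and hypothetical names throughout (the `messages-*` pattern, the `lifecycle` type and every field name), builds an index template that maps these dimensions as `not_analyzed` strings:

```python
import json


def lifecycle_template(pattern: str = "messages-*", doc_type: str = "lifecycle") -> dict:
    """Build an Elasticsearch 1.x index template for lifecycle records.

    The pattern, type name and field names are hypothetical placeholders.
    """
    # not_analyzed strings keep values such as "US" or "PASSWORD_RESET" intact,
    # so term filters and terms aggregations match them exactly.
    exact = {"type": "string", "index": "not_analyzed"}
    return {
        "template": pattern,
        "mappings": {
            doc_type: {
                "properties": {
                    "customerId": exact,
                    "country": exact,
                    "messageType": exact,
                    "guid": exact,
                    "component": exact,
                    "status": exact,
                    "timestamp": {"type": "date"},
                }
            }
        },
    }


if __name__ == "__main__":
    # Body for the index-template API: PUT /_template/<template-name>
    print(json.dumps(lifecycle_template(), indent=2))
```

Registering the mapping as a template rather than on a single index means every new time-based index matching the pattern picks up the same field definitions.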
  5. A Familiar Story…

     1. Log parsing
     2. Try to leverage existing solutions
     3. Build custom solutions
     4. Elasticsearch to the rescue

     Distributed grep
     • Specific patterns to get a "feel" for issues
     • Not enough confidence, since it is basically low-tech sampling
     Atlas: Netflix's monitoring system
     • Great for trends and rates
     • Not meant for tracing messages
     Relational DB
     • Could aggregate a limited set of dimensions
     • Still couldn't trace individual messages
  6. #netflixeverywhere

     I. Customer growth
     II. More types of messages
     III. Addition of channels
     IV. Platform growth to accommodate innovation
  7. Messaging Platform

     [Diagram: the platform from slide 2, with the metadata service labeled Message & Event Management Service.]
  8. Messaging Platform Evolution

     [Diagram: the same components as slide 2.]
  9. Clusters of application nodes

     [Diagram: each platform component from slide 2 shown as a cluster of application nodes.]
  10. Across AWS Regions

     [Diagram: the platform in us-east-1, us-west-2 and eu-west-1, connected to APNS (Apple), GCM (Google), Amazon SES and Twilio.]
  11. Message Lifecycle

     I. A way to trace an event through the platform
     II. Each component beacons records to Elasticsearch as it processes an event
     III. GUIDs identify the entire lifecycle
     IV. Complete visibility into the platform
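The beaconing scheme can be sketched as follows: one GUID is generated per event, every component writes a small record tagged with that GUID, and records are shipped in the Elasticsearch `_bulk` format. The record fields, the index name and the `lifecycle` type are hypothetical; the deck does not show Netflix's record schema or client code.

```python
import json
import uuid
from datetime import datetime, timezone


def new_lifecycle_id() -> str:
    """Create the GUID that every component reuses for one event's lifecycle."""
    return str(uuid.uuid4())


def beacon_record(guid: str, component: str, status: str, customer_id: str,
                  country: str, message_type: str) -> dict:
    """Build one lifecycle record; all field names are hypothetical."""
    return {
        "guid": guid,
        "component": component,  # e.g. "EC" or "MP", as on the Query By Customer ID slide
        "status": status,        # e.g. "Started", "Completed", "Processed"
        "customerId": customer_id,
        "country": country,
        "messageType": message_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def to_bulk_body(records: list, index: str, doc_type: str = "lifecycle") -> str:
    """Serialize records as a newline-delimited Elasticsearch _bulk request body."""
    lines = []
    for rec in records:
        # Each document is preceded by an action line naming its target index and type.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(rec))
    return "\n".join(lines) + "\n"  # the _bulk API expects a trailing newline


if __name__ == "__main__":
    guid = new_lifecycle_id()
    records = [
        beacon_record(guid, "EC", "Started", "12345", "US", "PRICE_CHANGE_EMAIL"),
        beacon_record(guid, "EC", "Completed", "12345", "US", "PRICE_CHANGE_EMAIL"),
    ]
    print(to_bulk_body(records, "messages-2016.02.19"), end="")
```

Sharing a single GUID across components is what lets one query later return every stage an event passed through, regardless of which application emitted the record.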
  12. [Diagram: as an event moves through the platform, components beacon "Started", "Done" and "Processed" records to Elasticsearch.]
  13. Query By Customer ID

     [Lifecycle records returned for one customer:]
     • EC Started
     • MP Started
     • MP Completed
     • Feedback #1: Twilio status = queued (the message was queued to be sent out by Twilio)
     • Feedback #2: Twilio status = sent (the message was accepted by the nearest upstream carrier)
     • Feedback #3: Twilio status = delivered (the carrier has acknowledged that the message was delivered to the handset)
     • EC Completed
     • EC Processed
     • EC Processed
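A customer-ID lookup like this one amounts to a filter on the customer field, sorted by time. This sketch builds such a search body in Elasticsearch 1.x syntax; the field names are assumptions, and the Twilio status descriptions are copied from the slide.

```python
import json

# Meanings of the Twilio statuses shown on the slide, for labelling feedback records.
TWILIO_STATUS = {
    "queued": "The message was queued to be sent out by Twilio",
    "sent": "The message was accepted by the nearest upstream carrier",
    "delivered": "The carrier has acknowledged that the message was delivered to the handset",
}


def customer_lifecycle_query(customer_id: str, size: int = 100) -> dict:
    """Search body for one customer's lifecycle records, oldest first.

    Uses the Elasticsearch 1.x "filtered" query (removed in 5.0). The field
    names "customerId" and "timestamp" are hypothetical and assumed to be
    mapped as an exact-value string and a date, respectively.
    """
    return {
        "size": size,
        "query": {
            "filtered": {
                # A filter, not a scored query: only matching records are needed.
                "filter": {"term": {"customerId": customer_id}}
            }
        },
        "sort": [{"timestamp": {"order": "asc"}}],
    }


if __name__ == "__main__":
    # Body for: GET /<index-pattern>/_search
    print(json.dumps(customer_lifecycle_query("12345"), indent=2))
```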
  14. Ability to Investigate

     • Easily extended as more components or stages are added
     • Numerous insights into the lifecycle of a message
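Because every record carries its GUID, the lifecycle view can be rebuilt without knowing in advance which components exist, which is one reason the approach extends easily. A minimal sketch, assuming records shaped like `{"guid", "component", "status", "timestamp"}` (hypothetical field names):

```python
from collections import defaultdict


def build_timelines(records):
    """Group lifecycle records by GUID, each group ordered by timestamp.

    Component and status names come from the data instead of being hard-coded,
    so records from a newly added component appear without code changes.
    Timestamps are assumed to be ISO-8601 strings in one fixed format and
    time zone, which sort correctly as plain strings.
    """
    timelines = defaultdict(list)
    for rec in records:
        timelines[rec["guid"]].append(rec)
    for recs in timelines.values():
        recs.sort(key=lambda r: r["timestamp"])
    return dict(timelines)


def stage_path(timeline):
    """Render a timeline as 'component:status' steps."""
    return [f'{r["component"]}:{r["status"]}' for r in timeline]


if __name__ == "__main__":
    sample = [  # illustrative records, not real data
        {"guid": "a", "component": "MP", "status": "Started", "timestamp": "2016-02-19T10:00:02Z"},
        {"guid": "a", "component": "EC", "status": "Started", "timestamp": "2016-02-19T10:00:00Z"},
        {"guid": "b", "component": "EC", "status": "Started", "timestamp": "2016-02-19T10:00:01Z"},
    ]
    for guid, timeline in build_timelines(sample).items():
        print(guid, " -> ".join(stage_path(timeline)))
```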
  15. Monitoring a New Arrival Title

     [Dashboard screenshot. Panels: query bar, filters to drill down, histogram showing count, breakdown by status, breakdown of unsent messages, country heat map.]
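The dashboard panels map naturally onto Elasticsearch aggregations. A sketch, assuming Elasticsearch 1.x syntax and hypothetical field names (`titleId`, `status`, `country`, `timestamp`): one aggregation-only search returns the status breakdown, per-country counts for the heat map and a count histogram, while drill-down filters are appended to the query filter.

```python
import json


def title_dashboard_query(title_id, drill_down=None, interval="1h"):
    """Aggregation-only search body for monitoring one title's messages.

    Elasticsearch 1.x syntax; "titleId", "status", "country" and
    "timestamp" are hypothetical field names. drill_down is a list of
    extra filter clauses, e.g. [{"term": {"country": "BR"}}].
    """
    filters = [{"term": {"titleId": title_id}}] + list(drill_down or [])
    return {
        "size": 0,  # only aggregations are needed, not the matching documents
        "query": {"filtered": {"filter": {"bool": {"must": filters}}}},
        "aggs": {
            "by_status": {"terms": {"field": "status"}},
            "by_country": {"terms": {"field": "country", "size": 250}},  # heat-map input
            "over_time": {
                "date_histogram": {"field": "timestamp", "interval": interval}
            },
        },
    }


if __name__ == "__main__":
    # "TITLE_ID" is a placeholder, not a real title identifier.
    body = title_dashboard_query("TITLE_ID", drill_down=[{"term": {"country": "BR"}}])
    print(json.dumps(body, indent=2))
```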
  16. Reporting Tab

     [Screenshot. Labels: Query Builder; metrics on various dimensions in the context of this message; time period.]
  17. Query and Metrics
  18. Elasticsearch Data
  19. Backend

     [Diagram: master, tribe and data nodes.]
     us-east-1: 6 (r3.xlarge), 6 (m3.xlarge), 66 (i2.xlarge)
  20. [Diagram: Elasticsearch clusters per AWS region.]

     us-east-1: 6 (r3.xlarge), 3 (m3.xlarge), 66 (i2.xlarge)
     us-west-2: 3 (m3.xlarge), 24 (i2.xlarge)
     eu-west-1: 3 (m3.xlarge), 24 (i2.xlarge)
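The backend diagram shows tribe nodes alongside master and data nodes. Assuming the tribe nodes federate the three regional clusters into one searchable view (the deck does not spell this out), their settings would look roughly like the output of this sketch. The cluster names are hypothetical, and the discovery settings needed to reach other regions are deliberately left out.

```python
# Hypothetical cluster names, one Elasticsearch cluster per AWS region.
REGIONAL_CLUSTERS = {
    "us_east_1": "messaging-us-east-1",
    "us_west_2": "messaging-us-west-2",
    "eu_west_1": "messaging-eu-west-1",
}


def tribe_node_settings(clusters):
    """Render elasticsearch.yml lines for an Elasticsearch 1.x tribe node.

    Each tribe.<name>.cluster.name entry connects the tribe node to that
    cluster; discovery settings for reaching other regions (unicast hosts,
    EC2 discovery, etc.) are omitted from this sketch.
    """
    lines = [f"tribe.{name}.cluster.name: {cluster}"
             for name, cluster in sorted(clusters.items())]
    # Keep the federated view read-only: writes go to the regional clusters.
    lines.append("tribe.blocks.write: true")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(tribe_node_settings(REGIONAL_CLUSTERS), end="")
```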
  21. ES Backend Details

     1. ES version 1.5.2
     2. Kibana 4.0.1
     3. Time-based rotating daily indices, 14-day retention
     4. Clusters are sized so that data nodes have about 40% free space
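The retention and sizing rules can be made concrete. This sketch assumes a hypothetical `messages-YYYY.MM.DD` naming scheme, one replica per shard and an illustrative 100 GB/day of lifecycle records (the deck gives none of these). It lists the daily indices to drop under a 14-day retention and estimates the data-node disk needed to keep about 40% free: capacity ≈ daily volume × 14 days × (1 + replicas) / (1 − 0.40).

```python
from datetime import date, timedelta


def daily_index(day: date, prefix: str = "messages") -> str:
    """Name of the daily index for a given day, e.g. messages-2016.02.19."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(existing, today: date, retention_days: int = 14,
                    prefix: str = "messages"):
    """Daily indices outside the retention window (today counts as day 1)."""
    keep = {daily_index(today - timedelta(days=i), prefix) for i in range(retention_days)}
    return sorted(n for n in existing if n.startswith(prefix + "-") and n not in keep)


def raw_capacity_gb(daily_gb: float, retention_days: int = 14, replicas: int = 1,
                    free_fraction: float = 0.40) -> float:
    """Data-node disk needed so that about free_fraction of it stays free."""
    stored = daily_gb * retention_days * (1 + replicas)  # primaries plus replicas
    return stored / (1 - free_fraction)


if __name__ == "__main__":
    existing = ["messages-2016.02.05", "messages-2016.02.06", "messages-2016.02.19"]
    print(expired_indices(existing, date(2016, 2, 19)))  # ['messages-2016.02.05']
    print(round(raw_capacity_gb(100)))  # 4667
```

Dropping a whole daily index is far cheaper than deleting individual documents, which is the usual reason to pair time-based indices with a fixed retention window.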