Searching and Alerting at Naver

Elastic Co
December 16, 2015

Transcript

  1. Searching and Alerting for application logs with Elasticsearch at Naver
     2015/12/16
     Jaeik Lee | Seungjin Lee
  2. Agenda
     • Introduction to our system
     • How we use Elasticsearch
     • Real-time alert using percolator
  3. Application/Crash Logs in Naver
     • Various platforms & formats
     • Common requirements for log handling
  4. Limitation of Previous System NELO
     • Components: Applications, Log collect & aggregate, Log Search & Management
     • Merge bottleneck
     • Slow search & limited query
  5. What we need
     • Full Text Search
       • Unstructured query
     • Real-time
       • Fast search and alerting so developers can handle system faults quickly
     • Scalable
       • As the number of logs increases
     • Schema Free
       • Handle various types of logs
  6. Scale
     • 8 clusters (7 in production, 1 in stage)
     • 229 nodes (152 data nodes)
     • 1.5 billion incoming logs per day (size: 2 TB)
     • Total documents: 105 billion (size: 160 TB)
  7. Index Model
     • 1 index per day -> index lifecycle management based on day
     • Type per project -> mapping variance per project
     • Various retention times according to the instances (1M, 3M, 2Y, 5Y)
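The day-based lifecycle on this slide can be sketched as follows. This is a minimal illustration, not NELO2's actual code: the index name pattern, the `RETENTION_DAYS` mapping of the slide's labels to day counts, and both function names are assumptions.

```python
from datetime import date, timedelta

# Hypothetical mapping of the slide's retention labels to day counts.
RETENTION_DAYS = {"1M": 30, "3M": 90, "2Y": 730, "5Y": 1825}


def daily_index_name(prefix: str, day: date) -> str:
    """Build the name of the index that holds one day's logs."""
    return f"{prefix}-{day:%Y-%m-%d}"


def is_expired(index_day: date, today: date, retention: str) -> bool:
    """Return True when a daily index is older than its retention period."""
    return (today - index_day) > timedelta(days=RETENTION_DAYS[retention])
```

With one index per day, expiring old data amounts to dropping whole indices for which `is_expired` is true, rather than deleting individual documents.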
  8. Indexing with River (Previous)
     • Elasticsearch Kafka River plugin
       • Consumes Kafka topics and indexes to Elasticsearch
     • Problems
       • Performance
       • Unstable
       • Difficult to debug
       • Deployment dependency
  9. Indexing with Storm (Current)
     • Guarantees log processing (at least once, exactly once semantics)
     • Easy to scale out according to the amount of logs
  10. Routing Basics
      • Shard = hash(routing) % number of primary shards
      • Routing
        • Default routing: document id
        • Routing parameter: user decides the routing value
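The routing formula above can be illustrated with a small sketch. Note that `zlib.crc32` is only a stand-in hash chosen for this example; Elasticsearch uses its own internal hash function, so the shard numbers produced here will not match a real cluster. The function name `shard_for` is also an assumption.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard as hash(routing) % number of primary shards.

    crc32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards
```

Because the result depends on the number of primary shards, that number cannot change after index creation without re-routing every document.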
  11. Custom Routing
      • Use custom routing both in index & search
      • Small project: store in only one shard (custom routing: project name)
      • Big project: distribute logs over all shards (default routing)
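A sketch of how the routing value might be chosen per project, under stated assumptions: the set of big projects and the function name `routing_key` are hypothetical, and the slides do not say how NELO2 classifies a project as small or big.

```python
def routing_key(project: str, doc_id: str, big_projects: set) -> str:
    """Return the value that determines which shard a log lands on.

    Small projects route by project name, so all of their logs share
    one shard. Big projects fall back to the document id, which is
    the default routing and spreads logs over all shards.
    """
    if project in big_projects:
        return doc_id
    return project
```

The same key has to be supplied at search time for small projects, so that the query only touches the single shard holding that project's logs.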
  12. Topology of a Cluster
      • Master nodes (node.master: true): membership management, metadata
      • Data nodes (node.data: true): data store & processing
      • Client nodes (node.master: false, node.data: false): load balancer
      • Diagram labels: Search, Index
  13. Layering for cold & hot data
      • Recent 1 week of data on SSD
      • Node attribute based
        • box_type: SSD|HDD
      • Diagram labels: Search, Hot Data, Index, Warm Data
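The attribute-based layering can be sketched as an index settings body that pins an index to nodes by their `box_type`. The key `index.routing.allocation.require.box_type` follows Elasticsearch's shard allocation filtering on a custom node attribute; the function name, the `HOT_DAYS` constant, and the idea of recomputing the setting from the index date are assumptions, since the slides do not say how indices are moved between tiers.

```python
from datetime import date

HOT_DAYS = 7  # "recent 1 week" from the slide


def allocation_settings(index_day: date, today: date) -> dict:
    """Build index settings that place a daily index on SSD or HDD nodes."""
    age_days = (today - index_day).days
    box_type = "SSD" if age_days < HOT_DAYS else "HDD"
    return {"index.routing.allocation.require.box_type": box_type}
```

Applying the returned body through the index settings API would make the cluster relocate the index's shards to nodes whose `box_type` attribute matches.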
  14. What we are improving
      • Index structure
        • Balancing shard distribution
        • Isolating small projects from big projects
      • Mapping
        • Multi-fields: remove complexity of analyzed/not analyzed fields
        • Supporting numeric types
      • Monitoring dashboard
        • Watching key metrics for clusters in one place
  15. Realtime notification in NELO2
      • About what?
        • User-specific condition including Elasticsearch query, threshold and interval
        • If logs matching ${query} come ${threshold} times within ${interval}, notify me!
      • When?
        • Immediately after a condition matches, within a second
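The "${threshold} times within ${interval}" condition can be sketched as a sliding-window counter. This is an illustration only: the class name and the in-memory `deque` are assumptions, and the slides do not describe how NELO2 actually keeps these counts.

```python
from collections import deque


class ThresholdRule:
    """Fire when `threshold` matches arrive within `interval_seconds`."""

    def __init__(self, threshold: int, interval_seconds: float):
        self.threshold = threshold
        self.interval = interval_seconds
        self.hits = deque()  # timestamps of recent matches, oldest first

    def record_match(self, timestamp: float) -> bool:
        """Record one matching log; return True if the rule should notify."""
        self.hits.append(timestamp)
        # Drop matches that have fallen out of the interval window.
        while self.hits and timestamp - self.hits[0] >= self.interval:
            self.hits.popleft()
        if len(self.hits) >= self.threshold:
            self.hits.clear()  # start counting afresh after notifying
            return True
        return False
```

Each log that matches a rule's query calls `record_match` with the log's timestamp, so the decision to notify is made as soon as the threshold is crossed.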
  16. Realtime notification in NELO2
      • Architecture stack
        • Elasticsearch, especially the Percolator
        • Apache Storm, Redis, and Apache Kafka
      • Data load
        • 1.5 billion logs per day against 2,000+ user-defined rules
  17. Elasticsearch query as an alert query
      • Supports a variety of alert queries
        • projectName:"Elasticon" AND (body:/.*securityfailexception|.*sessionfailexception/)
        • (project:"Elasticon" OR project:"Logstash") AND body:"exception" NOT source:"session-request"
      • Consistent query syntax both in search and alerts
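A sketch of how such an alert query could be registered with the percolator and how an incoming log could be checked against the registered queries. It assumes the percolator API as it existed in Elasticsearch 1.x/2.x (queries indexed under the `.percolator` type, logs sent to `_percolate`); the deck does not state the version, and the function names, index name, and rule id are hypothetical. The functions only build request paths and bodies and send nothing.

```python
def percolator_registration(index: str, rule_id: str, alert_query: str):
    """Build the path and body that register an alert query."""
    path = f"/{index}/.percolator/{rule_id}"
    # query_string lets the alert use the same syntax as a search.
    body = {"query": {"query_string": {"query": alert_query}}}
    return path, body


def percolate_request(index: str, doc_type: str, log: dict):
    """Build the path and body that match one log against stored queries."""
    path = f"/{index}/{doc_type}/_percolate"
    body = {"doc": log}
    return path, body
```

For example, `percolator_registration("alerts", "rule-1", 'project:"Elasticon" AND body:"exception"')` yields the path `/alerts/.percolator/rule-1` and a `query_string` body carrying the alert text unchanged.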
  18. Optimizing percolator performance
      • Routing
      • Diagram labels: Shard (x6), percolate
  19. Optimizing percolator performance
      • Load balancing
      • Diagram labels: Shard (x6), percolate
  20. Realtime cache invalidation
      • API server sends a message to Kafka when an alert rule is updated
      • Notification server, which listens to the corresponding topic in Kafka, invalidates the outdated cache immediately
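The invalidation flow can be sketched as a rule cache that drops an entry when an update message arrives. This is a minimal sketch: the class, the `rule_id` message field, and the `load_rule` callback are assumptions, and the Kafka consumer that would deliver the message is left out.

```python
class RuleCache:
    """Cache alert rules and drop entries when an update message arrives."""

    def __init__(self, load_rule):
        self._load_rule = load_rule  # callback that fetches a rule by id
        self._cache = {}

    def get(self, rule_id):
        """Return the cached rule, loading it on a cache miss."""
        if rule_id not in self._cache:
            self._cache[rule_id] = self._load_rule(rule_id)
        return self._cache[rule_id]

    def on_rule_updated(self, message: dict) -> None:
        """Handle one update message by invalidating the stale entry."""
        self._cache.pop(message["rule_id"], None)
```

The next `get` after an invalidation reloads the rule, so the notification server picks up the change without waiting for a cache expiry.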