SEEKing Truth Among 10 Billion Logs

Elastic Co
December 10, 2015

SEEK Limited is a global leader in employment, education and volunteer marketplaces across 17 countries, hosting 100 million job seeker profiles and over 3 million job opportunities at any given time. With websites attracting over 375 million visits per month, SEEK needed to build a mechanism to centralise logs from multiple sources, simplify search across all of them, and generate timely results that could be presented visually.

Christopher Phan | Elastic{ON} Tour Melbourne | December 10, 2015


Transcript

  1. Introduction
    Christopher Phan
      • Recently graduated from Swinburne University
      • Studied a Bachelor of Information Technology
      • Currently working at SEEK in the Security Operations Team
    SEEK Limited
      • Global leader in employment, education and volunteer marketplaces spanning 17 countries
      • SEEK hosts 100 million job seeker profiles and has over 3 million job opportunities available globally at any given time
      • SEEK websites attract over 375 million visits per month
  2. How Has SEEK Used Elasticsearch?
    What did we need to achieve?
      • Give stakeholders the ability to search and correlate results from multiple log sources, in order to promote continuous delivery and proactive security and to improve the end-user experience.
    What did we need to do?
      • Build a mechanism to centralise logs from multiple sources and simplify search across all of them.
      • Generate timely results and present them visually.
  3. Problems Before Elasticsearch
      • Log files stored in multiple locations
      • Data logged in large flat files (>300 MB per file)
      • Lack of tools to effectively search and correlate data
      • Unable to search all required log sources to get a complete picture
      • Time intensive (searching was slow and manual)
      • Limited retention periods (data was often deleted before a search could be performed)
  4. Our Journey with Elasticsearch
    Stages: Elastic POC, Evaluation, Production Implementation, Onwards
    Elastic POC
      • Conducted a POC using a physical environment
      • Single-node cluster on Windows Server 2008
      • Elasticsearch v0.90
      • 10-day retention
      • 3 log sources
    Evaluation
      • Discovered key learnings and challenges: platform reliability, limited scalability, performance issues, security concerns
      • Evolved requirements for production: horizontal scalability, a self-maintained solution, 1-year log retention, multiple log sources, integrated logging and searching tools, and a real-time visual interface for end users
    Production Implementation
      • Built a distributed cluster
      • Migrated hosting to the cloud for scalability and availability
    Onwards
      • Integrate data with other visualisation platforms
  5. Elasticsearch POC
    We wanted to focus the POC on proving that we could correlate searches and results for three different teams.
    Security: proactive security
      • Track firewall violations against weblog patterns
      • Monitor DDoS and brute-force attempts
    DevOps: operational monitoring leading to continuous deployment
      • Track deployment success
      • Webserver health checks
    Fraud: providing meaningful information to end users in real time
      • Determine the impact of fraudulent activity
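The first security goal, tracking firewall violations against weblog patterns, comes down to joining two log sources on a shared key. Below is a minimal in-memory sketch. It assumes both sources have already been parsed into records that carry a source IP and a timestamp. The `FirewallEvent` and `WebRequest` types, their field names and the 60-second window are hypothetical, not SEEK's schema. In the POC itself this correlation would be done with Elasticsearch queries. The sketch only shows the matching logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FirewallEvent:
    src_ip: str
    timestamp: datetime
    rule: str


@dataclass
class WebRequest:
    src_ip: str
    timestamp: datetime
    path: str


def correlate(fw_events, web_requests, window=timedelta(seconds=60)):
    """Pair each firewall violation with web requests from the same source IP
    that happened within `window` before or after it."""
    # Index web requests by source IP so each firewall event only scans its own IP.
    by_ip = defaultdict(list)
    for req in web_requests:
        by_ip[req.src_ip].append(req)

    matches = []
    for ev in fw_events:
        for req in by_ip.get(ev.src_ip, []):
            if abs(req.timestamp - ev.timestamp) <= window:
                matches.append((ev, req))
    return matches


# Usage with made-up documentation-range IPs.
t0 = datetime(2015, 12, 10, 9, 0, 0)
fw = [FirewallEvent("203.0.113.7", t0, "port-scan")]
web = [
    WebRequest("203.0.113.7", t0 + timedelta(seconds=30), "/login"),
    WebRequest("203.0.113.7", t0 + timedelta(minutes=10), "/jobs"),
    WebRequest("198.51.100.2", t0, "/"),
]
for ev, req in correlate(fw, web):
    print(ev.rule, req.path)  # prints "port-scan /login"
```

Widening `window` trades precision for recall. A 15-minute window would also pick up the `/jobs` request in the example above.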
  6. Key Learnings from Our POC
    Platform reliability
      • Java memory leak (CircuitBreakerException)
      • Elasticsearch.conf not read
      • System required manual recovery
    Limited scalability
      • Hardware constraints due to a single physical node
    Performance issues
      • Searches could take minutes to complete
    Security concerns
      • Authentication not granular enough
  8. Addressing Challenges Identified from the POC
    Data management
      • Growth of the log repository to 30 TB
      • Using Curator to manage data
      • Hot-warm-cold model (rules for retention and archiving):
          • Close indices older than 15 days
          • Delete indices older than 30 days
          • Daily S3 snapshots
          • Move snapshots from S3 to Glacier after 60 days
    Security concerns around user access
      • Move from Apache to Shield
      • Active Directory group permissions
      • Index + alias = permissions
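The retention rules above reduce to a function of index age. Below is a sketch in plain Python. It assumes daily indices named `logstash-YYYY.MM.DD`, which is a common convention but not one the slides confirm. It also reads "> 15 days" as strictly older than 15 days. Curator applies these actions itself, so the code only illustrates the decision logic. All function names are hypothetical.

```python
from datetime import date, datetime

# Thresholds from the slide ("Close > 15 days", "Delete > 30 days", "S3 to Glacier > 60 days").
CLOSE_AFTER_DAYS = 15
DELETE_AFTER_DAYS = 30
GLACIER_AFTER_DAYS = 60


def index_age_days(index_name: str, today: date, prefix: str = "logstash-") -> int:
    """Age in days of a daily index named like 'logstash-2015.12.10' (assumed naming scheme)."""
    if not index_name.startswith(prefix):
        raise ValueError(f"unexpected index name: {index_name}")
    day = datetime.strptime(index_name[len(prefix):], "%Y.%m.%d").date()
    return (today - day).days


def index_action(age_days: int) -> str:
    """Lifecycle step for an index of the given age: keep open, close, or delete."""
    if age_days > DELETE_AFTER_DAYS:
        return "delete"
    if age_days > CLOSE_AFTER_DAYS:
        return "close"
    return "open"


def snapshot_tier(age_days: int) -> str:
    """Storage tier for a daily snapshot of the given age."""
    return "glacier" if age_days > GLACIER_AFTER_DAYS else "s3"


today = date(2015, 12, 10)
for name in ["logstash-2015.12.09", "logstash-2015.11.20", "logstash-2015.11.01"]:
    age = index_age_days(name, today)
    print(name, age, index_action(age))
# logstash-2015.12.09 1 open
# logstash-2015.11.20 20 close
# logstash-2015.11.01 39 delete
```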
  9. Addressing Challenges Identified from the POC (cont.)
    Maintaining 99% uptime
      • Separate Marvel cluster
          • Isolate Marvel logs
      • Automated Watcher alerts
          • 0 logs returned
          • Field data > 90%
          • Email alerts
      • Cron job curls
          • Webhook notifications
          • Alert emails
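The two alert conditions on this slide can be written as plain checks. The sketch below assumes that the recent log count and per-node field data usage have already been fetched, for example by the cron job. It also assumes that "Field data > 90%" means usage relative to a per-node limit. `NodeStats` and `check_alerts` are hypothetical names, and the code does not use Watcher's own watch syntax.

```python
from dataclasses import dataclass

FIELDDATA_ALERT_RATIO = 0.90  # "Field data > 90%" from the slide


@dataclass
class NodeStats:
    name: str
    fielddata_bytes: int        # current field data usage on the node
    fielddata_limit_bytes: int  # assumed: the limit the 90% is measured against


def check_alerts(recent_log_count: int, nodes: list) -> list:
    """Return alert messages for the two Watcher conditions listed on the slide."""
    alerts = []
    # "0 logs returned": nothing arrived in the last check window.
    if recent_log_count == 0:
        alerts.append("No logs returned in the last check window")
    # "Field data > 90%": a node is close to its field data limit.
    for node in nodes:
        ratio = node.fielddata_bytes / node.fielddata_limit_bytes
        if ratio > FIELDDATA_ALERT_RATIO:
            alerts.append(f"{node.name}: field data at {ratio:.0%} of limit")
    return alerts


nodes = [NodeStats("data-1", 950, 1000), NodeStats("data-2", 400, 1000)]
for message in check_alerts(0, nodes):
    print(message)
```

Sending the email or webhook notification would then mean passing the returned messages to whichever notifier the cron job uses.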
  10. Building to Scale
    Cluster
      • Elastic Load Balancer
      • 2 client nodes
      • 3 master nodes (TCP)
      • 7 data nodes (native TCP)
      • S3/Glacier
    Users: Security, DevOps, Product, Fraud, Internal Systems, Data Analytics
    Shippers (TCP, UDP, HTTP)
      • Syslog: firewall and DC logs
      • River and SeriLog: database logs
      • nxLog: webserver and Windows event logs
    Log sources: Web Application Firewall, web server logs, DB errors, application logs, internal firewall, domain controller, Windows event logs, CloudTrail logs
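Three dedicated master nodes is the usual minimum because of quorum. A master can only be elected by a strict majority of master-eligible nodes, so a network partition cannot produce two masters. In Elasticsearch 1.x and 2.x this majority was set with `discovery.zen.minimum_master_nodes`. Later versions manage the voting configuration automatically. The slide does not state SEEK's setting, so the sketch below only computes the standard majority formula. The function names are hypothetical.

```python
def master_quorum(master_eligible: int) -> int:
    """Smallest strict majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority can still elect a master."""
    return master_eligible - master_quorum(master_eligible)


for n in range(1, 6):
    print(f"{n} masters: quorum {master_quorum(n)}, tolerates {tolerated_master_failures(n)} failure(s)")
```

Three masters tolerate one failure. A fourth master raises the quorum to three without tolerating any more failures, which is why odd counts are the usual choice.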
  11. Use Case 1 - Investigating an Infected Device
    BEFORE
      • Manual data gathering
      • Anti-virus logs
      • Firewall logs
      • Forensics on the machine
    AFTER
      • Live internal network monitoring
      • Chain firewall logs, Windows event logs and domain controller logs
      • Flag connections to blacklisted IPs and URLs
      • Track events at the user/host level
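Flagging blacklisted connections and tracking them per user and host can be sketched as follows. The sketch assumes that firewall, Windows event and domain controller data have already been chained, so that each connection record carries a user, a host, a destination IP and, optionally, a URL. The field names, blacklist entries and function names are hypothetical. CIDR matching uses Python's standard `ipaddress` module. Domain matching is exact, so subdomains are not handled.

```python
import ipaddress
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical blacklists; in practice these would come from a threat-intelligence feed.
BLACKLISTED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLACKLISTED_DOMAINS = {"malware.example"}


def is_blacklisted(dest_ip: str, url: str = "") -> bool:
    """True if the destination IP is in a blacklisted network or the URL's host is blacklisted."""
    ip = ipaddress.ip_address(dest_ip)
    if any(ip in net for net in BLACKLISTED_NETWORKS):
        return True
    host = urlparse(url).hostname if url else None
    return host in BLACKLISTED_DOMAINS


def flag_by_host(connections):
    """Group blacklisted connections by (user, host) so each device can be investigated."""
    flagged = defaultdict(list)
    for conn in connections:
        if is_blacklisted(conn["dest_ip"], conn.get("url", "")):
            flagged[(conn["user"], conn["host"])].append(conn)
    return dict(flagged)


connections = [
    {"user": "jsmith", "host": "WS-042", "dest_ip": "203.0.113.15"},
    {"user": "jsmith", "host": "WS-042", "dest_ip": "198.51.100.9", "url": "http://malware.example/payload"},
    {"user": "adoe", "host": "WS-107", "dest_ip": "198.51.100.9", "url": "https://jobs.example/search"},
]
for (user, host), hits in flag_by_host(connections).items():
    print(user, host, len(hits))  # prints "jsmith WS-042 2"
```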
  12. Use Case 2 - Measuring Customer Response
      • Monitor for decreases in candidate applications
      • Determine the cause of registration drop-off
      • Live tracking of blue-green testing
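Monitoring for a decrease in candidate applications needs a baseline to compare against. One simple approach is to alert when the current count falls well below the recent mean. The sketch below uses two assumptions that the slides do not state: the baseline is the count for the same hour on previous days, and the alert threshold is a 30% drop. `application_drop` is a hypothetical name.

```python
from statistics import mean
from typing import Optional


def application_drop(history, current, threshold=0.30) -> Optional[float]:
    """Fractional drop of `current` below the mean of `history`, or None if under `threshold`."""
    baseline = mean(history)
    if baseline == 0:
        return None
    drop = (baseline - current) / baseline
    return drop if drop >= threshold else None


# Applications received in the same hour on the previous five days (made-up numbers).
history = [1200, 1150, 1250, 1180, 1220]
print(application_drop(history, 780))   # about 0.35, so this would alert
print(application_drop(history, 1100))  # None, within normal variation
```

The same comparison could be run separately for the blue and green environments during a deployment.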
  13. Use Case 3 - Scraping
    BEFORE
      • Time- and volume-based rules
      • Blocking cloud services
      • High false-positive count
      • Difficult to track new behaviour
    AFTER
      • Monitor specific endpoints
      • Track sources without users
      • Count unique URLs per source
      • Analyze cookie + referrer + user-agent patterns
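The AFTER signals can be combined into one heuristic. A source that requests many distinct URLs, never authenticates, and rarely sends cookies or a referrer looks more like a scraper than a person. The sketch below scores already-parsed web log records that way. The field names, the 500-URL threshold and the 50% "bare request" ratio are illustrative assumptions, not SEEK's rules. User-agent analysis is left out for brevity.

```python
from collections import defaultdict


def scraper_candidates(requests, min_unique_urls=500, bare_ratio=0.5):
    """Return {source: unique_url_count} for sources that look like scrapers.

    A source is flagged when it never authenticates, requests at least
    `min_unique_urls` distinct URLs, and more than `bare_ratio` of its
    requests carry neither a cookie nor a referrer.
    """
    stats = defaultdict(lambda: {"urls": set(), "users": set(), "bare": 0, "total": 0})
    for r in requests:
        s = stats[r["source"]]
        s["urls"].add(r["url"])
        s["total"] += 1
        if r.get("user_id"):
            s["users"].add(r["user_id"])
        if not r.get("cookie") and not r.get("referrer"):
            s["bare"] += 1

    flagged = {}
    for source, s in stats.items():
        if (not s["users"]
                and len(s["urls"]) >= min_unique_urls
                and s["bare"] / s["total"] > bare_ratio):
            flagged[source] = len(s["urls"])
    return flagged


# A made-up bot walking job ads versus a logged-in user revisiting a few pages.
bot = [{"source": "198.51.100.20", "url": f"/job/{i}"} for i in range(600)]
human = [
    {"source": "192.0.2.5", "url": f"/job/{i % 20}", "user_id": "u1",
     "cookie": "sid=abc", "referrer": "https://jobs.example/"}
    for i in range(600)
]
print(scraper_candidates(bot + human))  # {'198.51.100.20': 600}
```

Unlike a pure time-and-volume rule, a fast logged-in user or a shared cloud IP with real sessions is not flagged by this heuristic.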
  14. What Elasticsearch Means to SEEK
    Cluster
      • 5,000 docs indexed per second
      • ~850 shards
      • >100 'live' indexes
    Storage
      • 3-5 billion docs searchable
      • 7-10 billion docs on disk
      • ~100 billion retrievable documents
      • 1-year retention period
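A quick back-of-the-envelope calculation relates these figures. The slide does not say whether 5,000 docs/s is a peak or an average rate. Sustained around the clock, that rate would produce about 158 billion documents a year, more than the ~100 billion retrievable, so the long-run average is presumably lower. The snippet also spreads ~850 shards over the 7 data nodes from the "Building to Scale" slide. It assumes the 850 counts every shard copy and that shards are distributed evenly.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
DAYS_PER_YEAR = 365

docs_per_second = 5_000                  # "5000 docs indexed per second"
retrievable_docs = 100_000_000_000       # "~100 billion retrievable documents"
shards = 850                             # "~850 shards"
data_nodes = 7                           # "7 Data Nodes" on the Building to Scale slide

# If 5,000 docs/s were sustained all day, every day:
docs_per_day = docs_per_second * SECONDS_PER_DAY        # 432,000,000
docs_per_year = docs_per_day * DAYS_PER_YEAR            # 157,680,000,000

# Average rate implied by ~100 billion documents over a 1-year retention period:
implied_avg_rate = retrievable_docs / (DAYS_PER_YEAR * SECONDS_PER_DAY)  # about 3,171 docs/s

# Shards per data node if spread evenly (assumes 850 counts all shard copies):
shards_per_node = shards / data_nodes                   # about 121.4

print(f"{docs_per_day:,} docs/day, {docs_per_year:,} docs/year")
print(f"implied average rate: {implied_avg_rate:,.0f} docs/s")
print(f"shards per data node: {shards_per_node:.1f}")
```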
  15. Conclusion
    Security
      • Watch and identify scrapers
      • Monitor for DDoS/brute force
      • Malicious behavior analysis
      • Incident response timeline
    DevOps
      • Web farm health monitoring
      • Volume monitoring for error logs
      • Track the effects of phased deployments
    Fraud
      • Monitor fraudulent user activity
      • Find identifiers of fraudulent users
    Product
      • Measure customer response times
      • Track application status codes
      • Identify the cause of user drop-off
    Internal Systems
      • Link firewall logs to Windows event logs
      • Monitor domain controller events
    Data Analytics
      • Combine big data visualisation with drill-down capability
      • Fluctuations in expected behavior
  16. Thank you for listening
    Christopher Phan
    Email: [email protected]
    LinkedIn: https://au.linkedin.com/in/christopher-phan-6a500051