Elasticsearch as a Service at eBay

Elastic Co
March 08, 2017

With many business-critical text search and analytics use cases that rely on Elasticsearch, eBay has created a custom 'Elasticsearch as a Service' platform. Learn how each Elasticsearch cluster is sized, provisioned, configured, maintained, auto-scaled, and decommissioned.

Sudeep Kumar | Engineer | eBay

Transcript

  1. Pronto: “SaaS platform that provides fully managed Elasticsearch clusters consistent with methods & systems adopted within eBay”
     Use-cases (chart values in slide order: 22%, 36%, 32%, 10%): Analytics, Text Search, Monitoring/Logging, Document store
  2. Fact Sheet
     • 35+ managed production clusters
     • Manages 1200+ VM nodes
     • 2.5 million metrics collected daily
     • 1200 active alerts configured
     • 2 million indexing requests/min
     • Multi data center deployments
     • Runs on a cloud-based environment
     • Auto-remediation & flex-up capabilities
     • API driven cluster management
     • Secured access to clusters
  3. ‘ES-AAS’ Goals
     • Automated deployment
     • Efficient resource usage/tracking
     • Low time to detect (TTD) and time to resolve (TTR) for issues
     • Lights out management
     • Security
     • Unified monitoring & alerting
     • High availability
  4. Cluster Lifecycle
     Stages: Prep, Deploy, Onboard, Manage, Decom
     Activities, in slide order:
     • Capacity estimation
     • Instance preparation
     • Cluster provisioning
     • Configuration mgmt
     • Load balancing
     • Firewall rules
     • Cluster topology
     • SLA definition
     • Client integration
     • Security access policies
     • Cluster remediation
     • Monitoring
     • Alerting
     • Logging
     • Free-up resources
     • Cleanup topology
  5. Cluster Sizing & Prep: Understanding infrastructure
     • Benchmark
     • Buffer-up
     • Create sizing calculator tools
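
The benchmark, buffer-up, and sizing calculator steps can be illustrated with a small sketch. This is not eBay's calculator: the input fields, the formula, and every name below are assumptions chosen for illustration, and the overhead ratio is something you would measure in your own benchmark.

```python
import math
from dataclasses import dataclass


@dataclass
class SizingInput:
    # All fields are hypothetical inputs, not values from the talk.
    daily_ingest_gb: float          # raw data indexed per day
    retention_days: int             # how long indices are kept
    replicas: int                   # replica copies per primary shard
    index_overhead: float           # on-disk size / raw size, from a benchmark
    buffer: float                   # "buffer-up" headroom, e.g. 0.3 for 30%
    usable_disk_per_node_gb: float  # disk you are willing to fill per node


def estimate_data_nodes(s: SizingInput) -> int:
    """Estimate the data-node count from disk footprint alone."""
    primary_gb = s.daily_ingest_gb * s.retention_days * s.index_overhead
    total_gb = primary_gb * (1 + s.replicas)       # primaries plus replicas
    with_buffer_gb = total_gb * (1 + s.buffer)     # add headroom
    return max(1, math.ceil(with_buffer_gb / s.usable_disk_per_node_gb))


example = SizingInput(
    daily_ingest_gb=100, retention_days=30, replicas=1,
    index_overhead=1.1, buffer=0.3, usable_disk_per_node_gb=1000,
)
print(estimate_data_nodes(example))
```

A disk-only estimate like this is a starting point; CPU, heap, and query load from the benchmark would normally adjust the result.
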
  6. Deployment: Automation and Configuration management
     • Foreman/Puppet based deployment
     • Use hot-warm architecture for time based indices
     • Provide index management
     • Leverage load balancers
     • Define data ingestion and search access pattern
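
A hot-warm layout for time based indices is commonly built on Elasticsearch shard allocation filtering: nodes carry a custom attribute, and older indices are re-tagged so their shards move to warm nodes. The sketch below only plans those moves. The attribute name `box_type`, the `logs-YYYY.MM.DD` naming scheme, and the seven-day hot window are assumptions, not details from the slides.

```python
from datetime import date, datetime, timedelta

HOT_DAYS = 7  # assumed hot window, in days


def index_date(index_name: str) -> date:
    """Parse the trailing YYYY.MM.DD of a daily index name."""
    return datetime.strptime(index_name[-10:], "%Y.%m.%d").date()


def plan_warm_moves(indices, today, hot_days=HOT_DAYS):
    """Return {index: settings body} for indices older than the hot window."""
    cutoff = today - timedelta(days=hot_days)
    # Assumes nodes were started with a custom attribute named box_type.
    body = {"index.routing.allocation.require.box_type": "warm"}
    return {name: body for name in indices if index_date(name) < cutoff}


moves = plan_warm_moves(
    ["logs-2017.03.01", "logs-2017.03.07"], today=date(2017, 3, 8), hot_days=3
)
print(moves)
```

Each entry in the plan would then be applied with the update index settings API (`PUT /<index>/_settings`) by whatever index management job runs on the platform.
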
  7. Cluster Security: Security plugin & firewall rules
     • Authentication
     • Authorization
     • Explicit IP whitelisting
     • Super consumers
     • Set IP filter rules
     • Additional OS hardening
     Figure: Sample Access Policy
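
The idea behind explicit IP whitelisting can be shown in a few lines. The slide's sample access policy is not reproduced in the transcript, so the policy shape below (cluster name mapped to allowed CIDR ranges) and all names in it are hypothetical.

```python
import ipaddress

# Hypothetical policy: cluster name -> CIDR ranges allowed to connect.
ACCESS_POLICY = {
    "search-cluster": ["10.1.0.0/16", "10.9.8.0/24"],
}


def is_allowed(cluster: str, client_ip: str, policy=ACCESS_POLICY) -> bool:
    """Allow a client only if its address falls inside a whitelisted range."""
    addr = ipaddress.ip_address(client_ip)
    ranges = policy.get(cluster, [])  # unknown clusters deny by default
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


print(is_allowed("search-cluster", "10.1.2.3"))
print(is_allowed("search-cluster", "192.168.1.1"))
```

Denying by default when a cluster has no entry keeps a missing policy from silently opening access.
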
  8. Cluster Maintenance: Push based Monitoring
     • Metrics are collected via custom plugin
     • 70+ metrics collected every minute from each node
     • Metrics pushed onto a backend TSDB store
     • Metrics monitored by internal dashboards / Grafana
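
In a push based setup, each node packages its own readings and sends them to the time-series store, rather than waiting to be polled. The sketch below covers only the packaging step. The slides do not name the TSDB or its wire format, so the datapoint shape (metric, timestamp, value, tags) and the `es.` prefix are assumptions.

```python
import time


def build_datapoints(node, cluster, stats, ts=None):
    """Turn one node's metric readings into tagged datapoints for a TSDB."""
    ts = int(time.time()) if ts is None else ts
    return [
        {
            "metric": f"es.{name}",   # assumed naming prefix
            "timestamp": ts,
            "value": value,
            "tags": {"cluster": cluster, "node": node},
        }
        for name, value in sorted(stats.items())
    ]


points = build_datapoints(
    node="node-1",
    cluster="search-cluster",
    stats={"jvm.heap_used_percent": 61, "fs.free_gb": 120},
    ts=1488931200,
)
print(points)
```

A collector would run this once a minute per node and post the resulting batch to the store; tagging by cluster and node is what lets one dashboard serve every cluster.
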
  9. Cluster Maintenance: Alerting
     • API driven alert creation
     • Threshold based
     • Gap alerts
     Types: Warning, Error
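
The two alert kinds can be sketched as follows, assuming a threshold alert compares a value against warning and error levels, and a gap alert fires when a metric stops arriving. That reading of "gap alert", the function names, and the default silence window are assumptions.

```python
def threshold_state(value, warning, error):
    """Classify a reading against warning and error thresholds."""
    if value >= error:
        return "error"
    if value >= warning:
        return "warning"
    return "ok"


def has_gap(last_seen_ts, now_ts, max_silence_s=180):
    """Report a gap when no datapoint has arrived within the allowed window."""
    return (now_ts - last_seen_ts) > max_silence_s


print(threshold_state(72, warning=70, error=85))
print(has_gap(last_seen_ts=1000, now_ts=1300))
```

Gap checks complement thresholds in a push based system: a node that has died sends no bad values, only silence.
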
  10. Cluster Maintenance: Lights Off Management
      • Rule based engine
      • Allows complex rule expressions
      • Intelligent auto remediation
      • Provides audit logging / alerting
      • Allows retry & rollback to last known state
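
A rule based remediation loop with audit logging, retry, and rollback might look like the sketch below. It is an illustration of the listed properties, not the platform's engine: the `Rule` shape, the state dictionary, and the flex-up rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Rule:
    name: str
    condition: Callable[[State], bool]  # may combine several checks
    action: Callable[[State], State]    # returns remediated state, may raise
    max_retries: int = 2


def apply_rules(state: State, rules: List[Rule], audit: List[str]) -> State:
    """Run each matching rule, retrying on failure and rolling back if all tries fail."""
    for rule in rules:
        if not rule.condition(state):
            continue
        last_known = dict(state)  # snapshot to roll back to
        for attempt in range(1, rule.max_retries + 2):
            try:
                state = rule.action(dict(last_known))
                audit.append(f"{rule.name}: succeeded on attempt {attempt}")
                break
            except Exception as exc:
                audit.append(f"{rule.name}: attempt {attempt} failed: {exc}")
        else:
            # Every attempt failed: restore the snapshot.
            state = last_known
            audit.append(f"{rule.name}: rolled back to last known state")
    return state


# Hypothetical flex-up rule: add a node when disk is nearly full
# and the cluster is still below a size cap.
flex_up = Rule(
    name="flex-up",
    condition=lambda s: s["disk_used_pct"] > 80 and s["nodes"] < 10,
    action=lambda s: {**s, "nodes": s["nodes"] + 1},
)

audit_log: List[str] = []
result = apply_rules({"disk_used_pct": 85, "nodes": 3}, [flex_up], audit_log)
print(result, audit_log)
```

Each action works on a copy of the snapshot, so a half-finished attempt never leaks into the state that later rules see.
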
  11. Future for ‘ES-AAS’ platform
      • Kubernetes adoption
      • Data consistency across data centers
      • A good backup and recovery story