Elasticsearch as a Service at eBay

Elastic Co
March 08, 2017

With countless business-critical text search and analytics use cases that rely on Elasticsearch, eBay has created a custom 'Elasticsearch as a Service' platform. Learn about the sizing, provisioning, configuring, maintaining, auto-scaling, and decommissioning states for every Elasticsearch cluster.

Sudeep Kumar | Engineer | eBay

Transcript

  1. Pronto
     “SaaS platform that provides fully managed Elasticsearch clusters consistent with methods & systems adopted within eBay”
     Use-cases (chart): Analytics, Text Search, Monitoring/Logging, Document store. Values shown: 22%, 36%, 32%, 10%.
  2. Fact Sheet
     • 35+ managed production clusters
     • Manages 1200+ VM nodes
     • 2.5 million metrics collected daily
     • 1200 active alerts configured
     • 2 million indexing requests/min
     • Multi data center deployments
     • On cloud based environment
     • Auto-remediation & flex-up capabilities
     • API driven cluster management
     • Secured access to clusters
  3. ‘ES-AAS’ Goals
     • Automated deployment
     • Efficient resource usage/tracking
     • Low TTD and TTR issues
     • Lights out management
     • Security
     • Unified monitoring & alerting
     • High availability
  4. Cluster Lifecycle
     Stages: Prep, Deploy, Onboard, Manage, Decom
     • Capacity estimation
     • Instance preparation
     • Cluster provisioning
     • Configuration mgmt
     • Load balancing
     • Firewall rules
     • Cluster topology
     • SLA definition
     • Client integration
     • Security access policies
     • Cluster remediation
     • Monitoring
     • Alerting
     • Logging
     • Free-up resources
     • Cleanup topology
  5. Cluster Sizing & Prep: Understanding infrastructure
     • Benchmark
     • Buffer-up
     • Create sizing calculator tools
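The deck does not show the sizing calculator itself. As a rough illustration of what such a tool might compute, the sketch below estimates a data-node count from storage needs alone. The function name, its inputs, and the 30% default buffer are all hypothetical; a real calculator would also fold in benchmark results for CPU, memory, and query load.

```python
import math


def estimate_data_nodes(daily_ingest_gb, retention_days, replicas,
                        usable_disk_per_node_gb, buffer_fraction=0.3):
    """Estimate the number of data nodes from storage needs (hypothetical model)."""
    if usable_disk_per_node_gb <= 0:
        raise ValueError("usable_disk_per_node_gb must be positive")
    # Primary data retained over the whole retention window.
    primary_gb = daily_ingest_gb * retention_days
    # Each replica stores one more full copy of the primary data.
    total_gb = primary_gb * (1 + replicas)
    # "Buffer-up": add headroom on top of the measured requirement.
    buffered_gb = total_gb * (1 + buffer_fraction)
    # Round up, since a fraction of a node cannot be provisioned.
    return math.ceil(buffered_gb / usable_disk_per_node_gb)
```

For example, 100 GB/day kept for 30 days with one replica, 1000 GB of usable disk per node, and a 30% buffer works out to 8 data nodes under this model.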
  6. Deployment: Automation and Configuration management
     • Foreman/Puppet based deployment
     • Use hot-warm architecture for time based indices
     • Provide index management
     • Leverage load balancers
     • Define data ingestion and search access pattern
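One common way to run a hot-warm layout is to tag nodes with a custom attribute and move older time-based indices with Elasticsearch's `index.routing.allocation.require.<attribute>` index setting. The sketch below only builds the settings bodies; it makes no network calls. The attribute name `box_type`, the `logs-YYYY.MM.DD` naming pattern, and the 7-day hot window are assumptions for illustration, not details from the deck.

```python
from datetime import date, datetime


def warm_migration_plan(index_names, today, hot_days=7, prefix="logs-"):
    """Return {index_name: settings_body} for daily indices older than hot_days."""
    plan = {}
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of the time-based indices we manage
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # skip names without a parseable date suffix
        if (today - day).days > hot_days:
            # Assumed node attribute "box_type"; requires nodes tagged hot/warm.
            plan[name] = {"index.routing.allocation.require.box_type": "warm"}
    return plan
```

Each body in the returned plan could then be sent to that index's `_settings` endpoint by whatever index management job the platform runs.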
  7. Cluster Security: Security plugin & firewall rules
     • Authentication
     • Authorization
     • Explicit IP Whitelisting
     • Super consumers
     • Set IP filter rules
     • Additional OS hardening
     Figure: Sample Access Policy
  8. Cluster Maintenance: Push based Monitoring
     • Metrics are collected via custom plugin
     • 70+ metrics collected every minute from each node
     • Metrics pushed onto a backend TSDB store
     • Metrics monitored by internal dashboards / Grafana
  9. Cluster Maintenance: Alerting
     • API driven alert creation
     • Threshold based
     • Gap alerts
     Figure labels: Warning, Error, Types
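The two alert kinds named on this slide can be sketched as two small checks. This assumes a "gap alert" means that no data point has arrived for a metric within an expected interval, which is an interpretation and not something the deck states. The function names and severity strings are hypothetical; the Warning and Error levels mirror the labels on the slide.

```python
def evaluate_threshold(value, warning, error):
    """Classify a metric value against warning and error thresholds."""
    if value >= error:
        return "ERROR"
    if value >= warning:
        return "WARNING"
    return "OK"


def evaluate_gap(last_seen_ts, now_ts, max_gap_seconds):
    """Flag a metric whose latest data point is older than the allowed gap."""
    # Assumed meaning of "gap alert": the metric stream has gone silent.
    if now_ts - last_seen_ts > max_gap_seconds:
        return "ERROR"
    return "OK"
```

With metrics collected every minute, a gap limit of a few minutes would tolerate a single missed push while still catching a node that has stopped reporting.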
  10. Cluster Maintenance: Lights Off Management
     • Rule based engine
     • Allows complex rule expressions
     • Intelligent auto-remediation
     • Provides audit logging / alerting
     • Allows retry & rollback to last known state
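The deck does not describe how the rule engine is built. The sketch below is one possible shape for the behaviour listed above: a rule pairs a condition over current metrics with a remediation action, failed actions are retried, and the state is restored from a snapshot if every attempt fails. The `remediate` function, the rule tuple layout, and the audit record format are all hypothetical.

```python
import copy


def remediate(state, rules, metrics, max_retries=2):
    """Apply the first matching rule; restore the last known state on failure.

    rules is a list of (name, condition, action) tuples, where condition takes
    the metrics dict and action mutates the state dict in place.
    """
    audit = []
    for name, condition, action in rules:
        if not condition(metrics):
            continue
        snapshot = copy.deepcopy(state)  # last known state before remediation
        for attempt in range(1, max_retries + 2):  # first try plus retries
            try:
                action(state)
                audit.append((name, attempt, "applied"))
                return state, audit
            except Exception as exc:
                audit.append((name, attempt, f"failed: {exc}"))
                # Undo any partial change so the next attempt starts clean.
                state.clear()
                state.update(copy.deepcopy(snapshot))
        audit.append((name, None, "rolled back"))
        return state, audit
    return state, audit  # no rule matched
```

A flex-up rule in this shape might use a condition such as `lambda m: m["disk_used_pct"] > 85` and an action that raises a node count in the cluster state. The audit list stands in for the audit logging the slide mentions.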
  11. Future for ‘ES-AAS’ platform
     • Kubernetes adoption
     • Data consistency across data centers
     • A good backup and recovery story