arRESTful Development: How Netflix Uses Elasticsearch to Better Understand Their Data

Sagar Loke & Homajeet Cheema (Senior Software Engineers)

CC-BY-ND 4.0

Who are we
Cloud Database Engineering @ Netflix
• Cassandra
• RDS
• Elasticsearch
• Dynomite – Netflix OSS
• Priam, Raigad – Netflix OSS
Netflix OSS - http://netflix.github.io

Summary
• Why Elasticsearch
• How we use Elasticsearch
• How we run Elasticsearch
• Raigad

Why Elasticsearch
• Quick retrieval
• Full text search
• Distributed system
• Sharding, replication
• Cluster scale up or down fairly easy
• Flexible schema

Who uses Elasticsearch @ Netflix
Events generated by user activity
• Customer service
• Playback
• Signups/User Logins/Referrer URLs
Service usage
• Security

ES ecosystem @ Netflix
• Suro Data Pipeline - Netflix OSS
• Handles backpressure
• Retries
• Transport Client
• REST
• Logstash
• Kibana
Netflix OSS - http://netflix.github.io

How we run Elasticsearch
Deployment
• AWS AMI
• Jenkins Job
• Python Scripts
• Raigad

How we run Elasticsearch
Configuration
• Asgard – Netflix OSS
• Archaius – Netflix OSS
• Eureka – Netflix OSS
Monitoring, Alerting, Dashboard
• Servo – Netflix OSS
Netflix OSS - http://netflix.github.io

A Typical Cluster
• Dedicated master nodes
• Dedicated data nodes
• Search nodes
• Zone aware replication
• At least 1 replica
• Instance replacement
• Zone outages

[Figure: A Typical Cluster]

An Example Cluster
Deployed in one AWS region
• More than 3 billion documents (event logs) indexed per day
• More than 5 TB per day
• Indexes stored for 5 days
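
The figures on this slide can be turned into a rough sizing estimate. The sketch below treats the slide's "more than" values as lower bounds and assumes 1 TB = 10^12 bytes; the slide does not say whether the 5 TB per day includes replicas, so the retained total is only for whatever that figure covers. The function name `capacity_estimate` is illustrative.

```python
# Back-of-the-envelope sizing from the slide's lower bounds.
DOCS_PER_DAY = 3_000_000_000    # "more than 3 billion documents" per day
BYTES_PER_DAY = 5 * 10**12      # "more than 5 TB" per day (assuming 1 TB = 10^12 bytes)
RETENTION_DAYS = 5              # indexes are stored for 5 days
SECONDS_PER_DAY = 24 * 60 * 60


def capacity_estimate(docs_per_day, bytes_per_day, retention_days):
    """Derive average rates and retained totals from daily volumes."""
    return {
        "docs_per_second": docs_per_day / SECONDS_PER_DAY,
        "avg_doc_bytes": bytes_per_day / docs_per_day,
        "retained_docs": docs_per_day * retention_days,
        "retained_bytes": bytes_per_day * retention_days,
    }


estimate = capacity_estimate(DOCS_PER_DAY, BYTES_PER_DAY, RETENTION_DAYS)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

On these assumptions the cluster averages roughly 35 thousand documents per second, about 1.7 KB per document, and holds at least 15 billion documents (25 TB) at any time.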

[Figure: ES Deployment Growth]

Raigad
An Elasticsearch Sidecar

Raigad – Motivation
• Helps to automate ES deployments, upgrades
• Node Discovery and Tracking
• Automatic Index Management
• Scheduled Backup and Restore
• Geared towards running in AWS Environment

Raigad – How it runs
• Elasticsearch Side Car installed on every ES instance
• Tunes elasticsearch.yml file based on configuration parameters
• Overwrites existing yml file with new parameters
• Updates Security Groups
• Bootstraps ES process
• Gathers information about peers and passes on to ES process during bootstrap

Raigad – Auto ES Deployments
• Tunes the elasticsearch.yml file based on configuration parameters
• Single-region deployments
  node.rack_id : us-east-1c / us-east-1d / us-east-1e {Availability Zone}
• Multi-region deployments
  node.rack_id : us-east-1 {Region Name}
  network.publish_host: 54.123.456.789
• Currently follows dedicated Master-Data-Search deployment based on ASG Names
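
The single-region versus multi-region choice above can be sketched as a small function that picks the values for `node.rack_id` and `network.publish_host`. This is an illustration of the rule on the slide, not Raigad's code: the function names are hypothetical, the region is derived by dropping the trailing zone letter (an assumption about AWS zone naming), and `203.0.113.10` is a documentation-only address standing in for a node's public IP.

```python
def placement_settings(availability_zone, multi_region=False, public_ip=None):
    """Return elasticsearch.yml entries for rack awareness (illustrative only)."""
    if not multi_region:
        # Single region: the rack is the availability zone, e.g. "us-east-1c".
        return {"node.rack_id": availability_zone}
    if public_ip is None:
        raise ValueError("multi-region nodes need a publish address")
    # Multi region: the rack is the region name, e.g. "us-east-1".
    # Assumption: the zone name is the region name plus one trailing letter.
    region = availability_zone[:-1]
    return {"node.rack_id": region, "network.publish_host": public_ip}


def to_yaml_lines(settings):
    """Render flat key/value settings as simple YAML lines."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


single = placement_settings("us-east-1c")
multi = placement_settings("us-east-1c", multi_region=True, public_ip="203.0.113.10")
print(to_yaml_lines(single))
print(to_yaml_lines(multi))
```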

Raigad – Node Discovery and Tracking
• Sample implementation using Cassandra
• C* keeps track of metadata information of ES Clusters
• ES instance reads C* to discover other nodes during bootstrap
• Storing metadata in C* helps in Multi-Region deployments

[Figure: Raigad – Metadata in C* cluster]

Raigad – Auto Index Management
• Provides configuration properties for Auto Index Management
• Based on specific index date suffix (YYYYMMDD), old indices are cleaned and new indices are created
• Index Manager job can be scheduled or invoked through REST call
• Scheduled job runs only on Master node
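
The date-suffix rule above can be illustrated with a small planning function. This is a sketch under stated assumptions, not Raigad's Index Manager: the name `plan_index_changes`, the `events` prefix, the retention semantics (delete anything older than `retention_days` before today) and the pre-creation of tomorrow's index are all hypothetical choices made for the example.

```python
from datetime import date, datetime, timedelta


def plan_index_changes(existing, prefix, today, retention_days, precreate_days=1):
    """Decide which date-suffixed indices to delete and which to create."""
    cutoff = today - timedelta(days=retention_days)
    to_delete = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not managed by this rule
        suffix = name[len(prefix):]
        if len(suffix) != 8 or not suffix.isdigit():
            continue  # suffix is not in YYYYMMDD form
        try:
            day = datetime.strptime(suffix, "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a real calendar date
        if day < cutoff:
            to_delete.append(name)
    # Indices that should exist for today and the next precreate_days days.
    wanted = [
        prefix + (today + timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range(precreate_days + 1)
    ]
    to_create = [name for name in wanted if name not in existing]
    return sorted(to_delete), to_create


existing = ["events20150303", "events20150304", "events20150305",
            "events20150310", "other20150101"]
to_delete, to_create = plan_index_changes(
    existing, prefix="events", today=date(2015, 3, 10), retention_days=5)
print("delete:", to_delete)
print("create:", to_create)
```

With a 5-day retention and today set to 2015-03-10, the two indices dated before 2015-03-05 are marked for deletion and the index for 2015-03-11 is marked for creation; the index with a different prefix is left alone.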

[Figure: Raigad – Running Index Manager; panels: Before Running Index Manager, After Running Index Manager]

Raigad – Configuration Parameters
• By default, uses Dynamic Properties in Archaius (https://github.com/Netflix/archaius)
• Supports configuration parameters through properties file / System Properties
• Based on configuration parameters, update following:
  – Single/Multi-region deployment
  – Tuning ES yml file
  – Tribe Node setup
  – Security Group settings
  – Backup / Restore properties
  – Frequency of Snapshot backup (daily / hourly etc)

Raigad – Running in AWS
• Automatic updates to Security Groups when new nodes are added or removed
• Supports IAM Credentials
• Scheduled Snapshot Backup to S3 -- uses elasticsearch-cloud-aws plugin
• Publish ES Metrics to Servo - Centralized Monitoring System

Raigad – Miscellaneous
• Tribe Node Setup
  – Requires Source Clusters running on different TCP Ports
  – Tested for Single Region Tribe Cluster
• REST API Support
  – Start ES Process
  – Stop ES Process
  – Run Index Manager
  – Get Peer information
  – Run Snapshot Backup / Restore

Lessons Learned … Tuning JVM
• Assign approximately (Available RAM/2) for ES Process
• Following JVM settings worked well for us:
  – CMS Collector
  – Young Gen = min(500MB * num_cores, 1/4 * heap size)
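
The two sizing rules above are easy to work through in code. The sketch below applies them as written on the slide; the function names are illustrative, 1 MB is taken as 2^20 bytes, and setting the initial heap equal to the maximum heap is an assumption of this example rather than something the slide states. The flags used (`-Xms`, `-Xmx`, `-Xmn`, `-XX:+UseConcMarkSweepGC`) are standard HotSpot options.

```python
MB = 1024 * 1024
GB = 1024 * MB


def jvm_sizing(available_ram_bytes, num_cores):
    """Apply the slide's rules: heap = RAM / 2, young gen = min(500MB * cores, heap / 4)."""
    heap = available_ram_bytes // 2
    young_gen = min(500 * MB * num_cores, heap // 4)
    return heap, young_gen


def jvm_flags(available_ram_bytes, num_cores):
    """Render the sizing as HotSpot flags (initial heap = max heap is an assumption)."""
    heap, young_gen = jvm_sizing(available_ram_bytes, num_cores)
    heap_mb, young_mb = heap // MB, young_gen // MB
    return [
        f"-Xms{heap_mb}m",
        f"-Xmx{heap_mb}m",
        f"-Xmn{young_mb}m",
        "-XX:+UseConcMarkSweepGC",
    ]


# 30 GB of RAM and 8 cores: the heap/4 term is the smaller one.
print(" ".join(jvm_flags(30 * GB, 8)))
# 64 GB of RAM and 4 cores: the per-core term is the smaller one.
print(" ".join(jvm_flags(64 * GB, 4)))
```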

Lessons Learned … Write Heavy Workloads
• refresh.interval = Disabled
• replication factor = Reduce
• schema changes = selectively index fields
• queue size = Unbounded queue for bulk indexing (Check heap usage)
• number of shards = Increase
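
Two of the settings above, the refresh interval and the replica count, are per-index settings that can be changed on a live index. The sketch below only builds the JSON bodies one might send to an index's `_settings` endpoint; it makes no network calls. The function names are hypothetical, the replica counts are chosen by the caller (the slide says only "reduce"), and `"1s"` is used as the steady-state refresh interval because it is Elasticsearch's default.

```python
import json


def write_heavy_settings(replicas):
    """Settings body for a heavy indexing phase: refresh disabled, fewer replicas."""
    return {
        "index": {
            "refresh_interval": "-1",      # -1 disables periodic refresh
            "number_of_replicas": replicas,
        }
    }


def steady_state_settings(replicas=1, refresh_interval="1s"):
    """Settings body to restore once the heavy indexing phase is over."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


load_body = json.dumps(write_heavy_settings(replicas=0), sort_keys=True)
restore_body = json.dumps(steady_state_settings(), sort_keys=True)
print(load_body)
print(restore_body)
```

Running with zero replicas during a load trades durability for indexing speed, so whether that is acceptable depends on how easily the data can be re-indexed.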

Lessons Learned
• Dedicated master nodes
• Queue to regulate indexing load for heavy write applications
• Set High file descriptor limit
• Ideally ES Clients and Servers should have same JVM Versions
• Do NOT run ES Cluster with Mixed JVM Versions

Thank you.
We are hiring!! Apply here: [email protected]
Homajeet Cheema (www.linkedin.com/in/homajeetcheema)
Sagar Loke (@sagar_loke) (www.linkedin.com/in/sagarloke)

This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit: http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to: Creative Commons, PO Box 1866, Mountain View, CA 94042, USA
