Slide 1

arRESTful Development: How Netflix Uses Elasticsearch to Better Understand Their Data
Sagar Loke & Homajeet Cheema (Senior Software Engineers)

Slide 2

CC-BY-ND 4.0
Who are we: Cloud Database Engineering @ Netflix
• Cassandra
• RDS
• Elasticsearch
• Dynomite – Netflix OSS
• Priam, Raigad – Netflix OSS
Netflix OSS - http://netflix.github.io

Slide 3

Summary
• Why Elasticsearch
• How we use Elasticsearch
• How we run Elasticsearch
• Raigad

Slide 4

Why Elasticsearch
• Quick retrieval
• Full-text search
• Distributed system
  – Sharding, replication
  – Scaling a cluster up or down is fairly easy
• Flexible schema
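
To make "Sharding" concrete: Elasticsearch routes each document to a primary shard with shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document id. The sketch below is only an illustration under that rule: it uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch actually uses, and the helper names are hypothetical. Because the result depends on the primary shard count, changing that count would send existing documents to different shards, which is why it is fixed when an index is created.

```python
import zlib
from collections import Counter


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Return the primary shard for a routing value (the doc id by default).

    Elasticsearch hashes with Murmur3; zlib.crc32 is only a stand-in here.
    """
    if num_primary_shards <= 0:
        raise ValueError("num_primary_shards must be positive")
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def shard_distribution(doc_ids, num_primary_shards: int) -> Counter:
    """Count how many documents land on each primary shard."""
    return Counter(pick_shard(doc_id, num_primary_shards) for doc_id in doc_ids)


# Usage: the same id always maps to the same shard; ids spread across all shards.
ids = [f"event-{i}" for i in range(10_000)]
print(sorted(shard_distribution(ids, 5).items()))
```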

Slide 5

Who uses Elasticsearch @ Netflix
Events generated by user activity
• Customer service
• Playback
• Signups / User Logins / Referrer URLs
Service usage
• Security

Slide 6

ES ecosystem @ Netflix
• Suro Data Pipeline – Netflix OSS
  – Handles backpressure
  – Retries
• Transport Client
• REST
• Logstash
• Kibana
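
Of the client paths listed above, REST is the simplest to sketch. Below is a minimal, hypothetical Python helper that builds a body for the Elasticsearch _bulk endpoint (newline-delimited JSON: one action line plus one source line per document, ending in a newline) and retries a failed send with exponential backoff, loosely mirroring the retry behaviour the slide attributes to Suro. It is not Suro's implementation. The send callable is an assumption: any function that POSTs the body to http://<host>:9200/_bulk and raises on failure.

```python
import json
import time


def build_bulk_body(index, docs, doc_type=None):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type  # pre-7.x clusters expect a mapping type
        lines.append(json.dumps({"index": meta}))  # action/metadata line
        lines.append(json.dumps(doc))  # document source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def send_with_retries(send, body, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(body); on an exception, wait with exponential backoff and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(body)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
```

A production client should also inspect the errors flag in the bulk response, because individual items can fail even when the HTTP request itself succeeds.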

Slide 7

How we run Elasticsearch: Deployment
• AWS AMI
• Jenkins Job
• Python Scripts
• Raigad

Slide 8

How we run Elasticsearch
Configuration
• Asgard – Netflix OSS
• Archaius – Netflix OSS
• Eureka – Netflix OSS
Monitoring, Alerting, Dashboard
• Servo – Netflix OSS

Slide 9

A Typical Cluster
• Dedicated master nodes
• Dedicated data nodes
• Search nodes
• Zone-aware replication
• At least 1 replica, to survive:
  – Instance replacement
  – Zone outages
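
Zone-aware replication only pays off if each shard's copies really are spread across zones. In the Elasticsearch versions of that era this is typically enabled with shard allocation awareness (cluster.routing.allocation.awareness.attributes: rack_id, paired with the node.rack_id attribute Raigad sets on each node). The sketch below, with hypothetical helper names and made-up placements, checks a shard layout and lists the shards that would lose every copy if one zone went down.

```python
from collections import defaultdict


def zones_by_shard(copies):
    """Group (shard_id, zone) pairs, one per primary or replica copy, by shard."""
    grouped = defaultdict(set)
    for shard_id, zone in copies:
        grouped[shard_id].add(zone)
    return grouped


def is_zone_aware(copies):
    """True if every shard has copies in at least two availability zones."""
    return all(len(zones) >= 2 for zones in zones_by_shard(copies).values())


def shards_lost_in_outage(copies, failed_zone):
    """Shards whose every copy sits in failed_zone (unavailable if that zone fails)."""
    return sorted(
        shard_id
        for shard_id, zones in zones_by_shard(copies).items()
        if zones == {failed_zone}
    )


# Hypothetical layouts: one primary and one replica per shard.
aware = [(0, "us-east-1c"), (0, "us-east-1d"), (1, "us-east-1d"), (1, "us-east-1e")]
naive = [(0, "us-east-1c"), (0, "us-east-1c"), (1, "us-east-1d"), (1, "us-east-1e")]
print(is_zone_aware(aware), shards_lost_in_outage(naive, "us-east-1c"))
```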

Slide 10

A Typical Cluster (figure)

Slide 11

An Example Cluster
Deployed in one AWS region
• More than 3 billion documents (event logs) indexed per day
• More than 5 TB of data indexed per day
• Indices retained for 5 days
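
A quick back-of-the-envelope check of what these figures imply. The helper is hypothetical, and it assumes the 5 TB/day figure counts primary data only and that the cluster keeps one replica (the "at least 1 replica" rule from the typical-cluster slide); the slide states neither for this cluster. Because the slide says "more than", the results are lower bounds, and peak ingest will exceed the daily average.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # decimal terabyte


def capacity_estimate(docs_per_day, bytes_per_day, retention_days, replicas):
    """Rough averages and retained storage for a cluster with daily indices.

    Assumes bytes_per_day counts primary data only; each replica adds one full copy.
    """
    copies = 1 + replicas
    primary_retained = bytes_per_day * retention_days
    return {
        "avg_docs_per_sec": docs_per_day / SECONDS_PER_DAY,
        "avg_doc_bytes": bytes_per_day / docs_per_day,
        "primary_bytes_retained": primary_retained,
        "total_bytes_retained": primary_retained * copies,
    }


est = capacity_estimate(3_000_000_000, 5 * TB, retention_days=5, replicas=1)
print(round(est["avg_docs_per_sec"]), round(est["avg_doc_bytes"]))
print(est["primary_bytes_retained"] // TB, est["total_bytes_retained"] // TB)
```

For the slide's numbers this gives roughly 34,700 documents per second on average, about 1.7 KB per document, 25 TB of primary data retained, and 50 TB once one replica is counted.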

Slide 12

ES Deployment Growth (figure)

Slide 13

Raigad: An Elasticsearch Sidecar

Slide 14

Raigad – Motivation
• Helps automate ES deployments and upgrades
• Node Discovery and Tracking
• Automatic Index Management
• Scheduled Backup and Restore
• Geared towards running in the AWS environment

Slide 15

Raigad – How it runs
• Elasticsearch sidecar installed on every ES instance
• Tunes the elasticsearch.yml file based on configuration parameters
  – Overwrites the existing yml file with the new parameters
• Updates Security Groups
• Bootstraps the ES process
• Gathers information about peers and passes it to the ES process during bootstrap

Slide 16

Raigad – Auto ES Deployments
• Tunes the elasticsearch.yml file based on configuration parameters
• Single-region deployments
  – node.rack_id : us-east-1c / us-east-1d / us-east-1e {Availability Zone}
• Multi-region deployments
  – node.rack_id : us-east-1 {Region Name}
  – network.publish_host : {Public IP of the instance}
• Currently follows a dedicated Master-Data-Search deployment based on ASG (Auto Scaling Group) names
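
A minimal sketch of the per-node settings this slide describes. It assumes a hypothetical naming convention in which the ASG name ends in _master, _data or _search, and derives the region by dropping the trailing letter of a standard AZ name (us-east-1c becomes us-east-1). The node.master / node.data flags are the Elasticsearch 1.x/2.x way to express these roles; the function names are illustrative, not Raigad's API, and 203.0.113.10 is a documentation-range address.

```python
ROLE_SETTINGS = {
    "master": {"node.master": True, "node.data": False},
    "data": {"node.master": False, "node.data": True},
    "search": {"node.master": False, "node.data": False},  # client node: routes requests only
}


def role_from_asg(asg_name):
    """Hypothetical convention: the ASG name ends in _master, _data or _search."""
    role = asg_name.rsplit("_", 1)[-1]
    if role not in ROLE_SETTINGS:
        raise ValueError(f"cannot infer ES role from ASG name {asg_name!r}")
    return role


def build_node_settings(asg_name, deployment, availability_zone, public_ip=None):
    """Return elasticsearch.yml key/values for one node."""
    settings = dict(ROLE_SETTINGS[role_from_asg(asg_name)])
    if deployment == "single-region":
        settings["node.rack_id"] = availability_zone  # e.g. us-east-1c
    elif deployment == "multi-region":
        if public_ip is None:
            raise ValueError("multi-region nodes need a public IP to publish")
        settings["node.rack_id"] = availability_zone[:-1]  # us-east-1c -> us-east-1
        settings["network.publish_host"] = public_ip
    else:
        raise ValueError(f"unknown deployment type {deployment!r}")
    return settings


def render_yml(settings):
    """Render flat settings as simple 'key: value' lines (booleans lower-cased)."""
    lines = []
    for key in sorted(settings):
        value = settings[key]
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


print(render_yml(build_node_settings("es_logs_data", "multi-region", "us-east-1c", "203.0.113.10")))
```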

Slide 17

Raigad – Node Discovery and Tracking
• Sample implementation using Cassandra (C*)
• C* keeps track of ES cluster metadata
• Each ES instance reads C* during bootstrap to discover other nodes
• Storing metadata in C* helps in multi-region deployments
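
The discovery step can be sketched as a query over cluster metadata. The row layout below is a hypothetical stand-in for whatever table Raigad keeps in Cassandra (an in-memory list replaces the C* query), and the output feeds the Elasticsearch 1.x/2.x discovery.zen.ping.unicast.hosts setting. For multi-region clusters the sketch hands out public addresses, matching the public network.publish_host used in multi-region deployments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceRow:
    """One row of a hypothetical ES-cluster metadata table kept in Cassandra."""
    cluster_name: str
    instance_id: str
    private_ip: str
    public_ip: str
    region: str
    alive: bool = True


def discovery_hosts(rows, cluster_name, self_instance_id, multi_region):
    """Addresses of live peers in the same cluster, excluding this instance."""
    peers = [
        r for r in rows
        if r.cluster_name == cluster_name and r.alive and r.instance_id != self_instance_id
    ]
    # Across regions, nodes must reach each other on public addresses.
    return sorted(r.public_ip if multi_region else r.private_ip for r in peers)


def unicast_setting(hosts):
    """elasticsearch.yml entry consumed by zen discovery in ES 1.x/2.x."""
    return {"discovery.zen.ping.unicast.hosts": list(hosts)}


rows = [
    InstanceRow("es_logs", "i-1", "10.0.0.1", "203.0.113.1", "us-east-1"),
    InstanceRow("es_logs", "i-2", "10.0.1.2", "203.0.113.2", "us-east-1"),
    InstanceRow("es_logs", "i-3", "10.1.0.3", "198.51.100.3", "eu-west-1"),
    InstanceRow("es_logs", "i-4", "10.0.2.4", "203.0.113.4", "us-east-1", alive=False),
    InstanceRow("es_metrics", "i-5", "10.0.3.5", "203.0.113.5", "us-east-1"),
]
print(unicast_setting(discovery_hosts(rows, "es_logs", "i-1", multi_region=True)))
```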

Slide 18

Raigad – Metadata in C* cluster (figure)

Slide 19

Raigad – Auto Index Management
• Provides configuration properties for automatic index management
• Based on a date suffix in the index name (YYYYMMDD), old indices are deleted and new indices are created
• The Index Manager job can be scheduled or invoked through a REST call
• The scheduled job runs only on the master node
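
A minimal sketch of the date-suffix logic, assuming index names of the form <prefix>YYYYMMDD, a retention window counted in whole days including today, and pre-creation of tomorrow's index. The function names and the events_ prefix are hypothetical; Raigad's actual property names are not shown here. The resulting names would then be removed with DELETE /<index> and created with PUT /<index>.

```python
from datetime import date, datetime, timedelta


def index_name(prefix, day):
    """e.g. index_name("events_", date(2015, 8, 10)) gives "events_20150810"."""
    return f"{prefix}{day:%Y%m%d}"


def plan_index_changes(existing, prefix, today, retention_days, days_ahead=1):
    """Return (to_delete, to_create) for indices named <prefix>YYYYMMDD.

    Keeps `retention_days` days including today, and makes sure indices exist
    for today and the next `days_ahead` days.
    """
    oldest_kept = today - timedelta(days=retention_days - 1)
    to_delete = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not managed under this prefix
        try:
            day = datetime.strptime(name[len(prefix):], "%Y%m%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if day < oldest_kept:
            to_delete.append(name)
    wanted = {index_name(prefix, today + timedelta(days=i)) for i in range(days_ahead + 1)}
    to_create = sorted(wanted - set(existing))
    return sorted(to_delete), to_create


existing = [f"events_201508{d:02d}" for d in range(4, 11)] + [".kibana"]
print(plan_index_changes(existing, "events_", date(2015, 8, 10), retention_days=5))
```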

Slide 20

Raigad – Running Index Manager …
(figure: Before Running Index Manager / After Running Index Manager)

Slide 21

Raigad – Configuration Parameters
• By default, uses Dynamic Properties in Archaius (https://github.com/Netflix/archaius)
• Also supports configuration parameters through a properties file / System Properties
• Configuration parameters control the following:
  – Single/Multi-region deployment
  – Tuning the ES yml file
  – Tribe Node setup
  – Security Group settings
  – Backup / Restore properties
  – Frequency of snapshot backup (daily, hourly, etc.)
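
One way to picture Archaius dynamic properties with a properties file or System Properties as alternatives is a layered lookup. The sketch below is not Archaius: it is a hypothetical Python stand-in in which sources are consulted in priority order (putting system properties before the file is an assumption), and keeping references to the underlying dicts means an updated value is visible on the next read, loosely like a dynamic property. The raigad.* keys are invented for illustration.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and #/! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


class LayeredConfig:
    """Look a key up in several sources, highest priority first.

    Sources are kept by reference, so a value changed in a source is seen on
    the next read, loosely like a dynamic property.
    """

    def __init__(self, *sources):
        self.sources = sources

    def get(self, key, default=None):
        for source in self.sources:
            if key in source:
                return source[key]
        return default

    def get_bool(self, key, default=False):
        value = self.get(key)
        return default if value is None else str(value).strip().lower() == "true"


file_props = parse_properties("""
# hypothetical raigad.properties
raigad.es.multi_region=false
raigad.backup.frequency=daily
""")
system_props = {"raigad.backup.frequency": "hourly"}  # e.g. from -D flags
config = LayeredConfig(system_props, file_props)
print(config.get("raigad.backup.frequency"))
```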

Slide 22

Raigad – Running in AWS
• Automatic updates to Security Groups when nodes are added or removed
• Supports IAM credentials
• Scheduled snapshot backup to S3 (uses the elasticsearch-cloud-aws plugin)
• Publishes ES metrics to Servo (centralized monitoring system)
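
The security-group bookkeeping reduces to a set difference between the addresses currently authorized and the addresses of live nodes. In the sketch below the helper name is hypothetical, it handles IPv4 only, and it treats only /32 entries as managed so that a broader CIDR (an office range, say) is never revoked; that policy is an assumption. The resulting lists would be applied with the EC2 AuthorizeSecurityGroupIngress and RevokeSecurityGroupIngress actions on the ES ports (9200 for HTTP and 9300 for the transport protocol by default).

```python
import ipaddress


def security_group_changes(authorized_cidrs, node_ips):
    """Return (to_authorize, to_revoke) lists of IPv4 /32 CIDRs.

    Only /32 entries are treated as managed, so broader ranges are left alone.
    """
    desired = {f"{ipaddress.IPv4Address(ip)}/32" for ip in node_ips}  # validates each IP
    managed = {cidr for cidr in authorized_cidrs if cidr.endswith("/32")}
    return sorted(desired - managed), sorted(managed - desired)


authorized = ["203.0.113.10/32", "203.0.113.11/32", "198.51.100.0/24"]
live_nodes = ["203.0.113.11", "203.0.113.12"]
print(security_group_changes(authorized, live_nodes))
```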

Slide 23

Raigad – Miscellaneous
• Tribe Node Setup
  – Requires source clusters running on different TCP ports
  – Tested for single-region tribe clusters
• REST API Support
  – Start ES process
  – Stop ES process
  – Run Index Manager
  – Get peer information
  – Run snapshot backup / restore
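
A sketch of generating tribe-node settings while enforcing the distinct-port requirement above. It assumes the Elasticsearch 1.x tribe convention, where tribe.<name>.cluster.name names each source cluster and other settings under the tribe.<name>. prefix are passed to that tribe's internal client node; the helper name, tribe names and addresses are hypothetical.

```python
def tribe_settings(sources):
    """Build tribe-node settings from (tribe_name, cluster_name, host, tcp_port) tuples.

    Raises ValueError if two source clusters share a transport port.
    """
    settings = {}
    seen_ports = {}
    for tribe_name, cluster_name, host, port in sources:
        if port in seen_ports:
            raise ValueError(f"{tribe_name} and {seen_ports[port]} both use TCP port {port}")
        seen_ports[port] = tribe_name
        prefix = f"tribe.{tribe_name}."
        settings[prefix + "cluster.name"] = cluster_name
        settings[prefix + "discovery.zen.ping.unicast.hosts"] = [f"{host}:{port}"]
    return settings


print(tribe_settings([
    ("t1", "es_logs", "10.0.0.5", 9300),
    ("t2", "es_metrics", "10.0.0.6", 9301),
]))
```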

Slide 24

Lessons Learned … Tuning JVM
• Assign approximately (Available RAM / 2) to the ES process
• The following JVM settings worked well for us:
  – CMS Collector
  – Young Gen = min(500MB * num_cores, 1/4 * heap size)
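
The sizing rule above as arithmetic, assuming the half-of-RAM figure refers to the JVM heap and that -Xms and -Xmx are set equal (a common Elasticsearch recommendation, not stated on the slide). -XX:+UseParNewGC and -XX:+UseConcMarkSweepGC select CMS with the ParNew young collector on the Java 7/8 JVMs of that era; both were later deprecated and removed. The helper is hypothetical.

```python
def jvm_heap_flags(total_ram_mb, num_cores):
    """JVM flags following the slide's rules (sizes in MB)."""
    heap_mb = total_ram_mb // 2  # about half of RAM for the ES process
    young_mb = min(500 * num_cores, heap_mb // 4)  # min(500MB * cores, heap / 4)
    return [
        f"-Xms{heap_mb}m",  # initial heap equal to max heap avoids resizing
        f"-Xmx{heap_mb}m",
        f"-Xmn{young_mb}m",  # young generation size
        "-XX:+UseParNewGC",  # parallel young-generation collector paired with CMS
        "-XX:+UseConcMarkSweepGC",  # CMS for the old generation
    ]


print(" ".join(jvm_heap_flags(total_ram_mb=30_720, num_cores=8)))
```

With 30 GB of RAM and 8 cores this yields a 15360 MB heap and a 3840 MB young generation: heap / 4 is the binding limit, since 500 MB × 8 cores would be 4000 MB.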

Slide 25

Lessons Learned … Write-Heavy Workloads
• Refresh interval (index.refresh_interval) = Disabled
• Replication factor = Reduce
• Schema changes = Index fields selectively
• Queue size = Unbounded queue for bulk indexing (check heap usage)
• Number of shards = Increase
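
Several of these items map onto standard index settings: index.refresh_interval set to "-1" disables refresh, number_of_replicas can be lowered during a load and raised afterwards, and the primary shard count is chosen when the index is created. The sketch below only builds the JSON bodies (for PUT /<index> and PUT /<index>/_settings); the helper names are hypothetical, and the post-load values of 1s and one replica are assumptions chosen to match Elasticsearch's defaults.

```python
import json


def create_index_body(num_shards, num_replicas):
    """Body for PUT /<index>; the primary shard count is fixed at creation."""
    return {"settings": {"number_of_shards": num_shards, "number_of_replicas": num_replicas}}


def bulk_load_settings(replicas_during_load=0):
    """Body for PUT /<index>/_settings before a heavy load: no refresh, fewer replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": replicas_during_load}}


def after_load_settings(refresh_interval="1s", replicas=1):
    """Restore refresh and replicas once the load finishes (values are assumptions)."""
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}


print(json.dumps(create_index_body(num_shards=10, num_replicas=0)))
print(json.dumps(bulk_load_settings()))
print(json.dumps(after_load_settings()))
```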

Slide 26

Lessons Learned
• Dedicated master nodes
• A queue to regulate indexing load for write-heavy applications
• Set a high file descriptor limit
• Ideally, ES clients and servers should run the same JVM version
• Do NOT run an ES cluster with mixed JVM versions
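
The queue idea can be sketched with a bounded in-process queue: the producer blocks when it is full (backpressure), and a consumer drains it in batches of bounded size, each of which would become one bulk request. This is a hypothetical single-process illustration, not the Suro pipeline the deck credits with handling backpressure; index_batch stands in for whatever sends a batch to Elasticsearch.

```python
import queue
import threading


def drain_batch(q, max_batch, timeout=0.1):
    """Take up to max_batch items, waiting at most `timeout` seconds for the first."""
    batch = []
    try:
        batch.append(q.get(timeout=timeout))
    except queue.Empty:
        return batch
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def run_indexer(events, index_batch, max_queued=1000, max_batch=100):
    """Feed events through a bounded queue and hand them to index_batch in batches."""
    q = queue.Queue(maxsize=max_queued)
    done = threading.Event()

    def produce():
        for event in events:
            q.put(event)  # blocks while the queue is full: backpressure
        done.set()

    producer = threading.Thread(target=produce, daemon=True)
    producer.start()
    # Stop only once the producer has finished and everything queued was consumed.
    while not (done.is_set() and q.empty()):
        batch = drain_batch(q, max_batch)
        if batch:
            index_batch(batch)  # e.g. one bulk request per batch
    producer.join()


batches = []
run_indexer(range(250), batches.append, max_queued=10, max_batch=50)
print([len(b) for b in batches])
```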

Slide 27

Thank you.
We are hiring!! Apply here: [email protected]
Homajeet Cheema (www.linkedin.com/in/homajeetcheema)
Sagar Loke (@sagar_loke) (www.linkedin.com/in/sagarloke)

Slide 28

This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit: http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to: Creative Commons, PO Box 1866, Mountain View, CA 94042, USA