Elastic{ON} 2018 - Nativo ELK to Elastic Stack: A Production Journey of 3+ Years

Elastic Co
March 01, 2018

Transcript

  1. Nativo ELK to Elastic Stack: A Production Journey of 3+ Years • Rich Horace, Director of DevOps (@richhorace) • 2/28/2018
  2. About Me • Director of DevOps at Nativo • Last 10+ Years in the Online Advertising Space and Early-Stage Startups • Large-Scale Infrastructure, Both Linux and Windows • Working with Log Aggregation Systems since 2008 • Elastic LA Meetup Organizer since March 2015
  3. Agenda: Production Journey
     1. Nativo Backstory
     2. ELK Stack at Nativo
     3. Building a Robust Elastic Stack
     4. Future Enhancements
  4. Nativo: Early Days • Startup in LA • Launched as Postrelease in 2010 • Hired First Salesperson in 2012 • Rebranded to Nativo in 2013
  5. Nativo: Fast Forward to Now • Customer Base within the comScore Top 200 • Approaching 200 Employees
  6. Nativo Ad Executions
     • Native Video: In-feed player supporting click-to-play, scroll-to-play, and auto-play execution types
     • Native Article: Non-interruptive native ad that clicks through to a custom landing page hosted on your site
     • Native Display: In-feed ad to an external advertiser landing page or publisher-designated URL
  7. Nativo Engineering: Early Days • Joined March 2014, Employee #19 • Six Engineers • No Visibility into System Performance • Inadequate Monitoring
  8. Nativo Engineering: 4 Years Later • Over 30 Engineers • Elastic Stack • Mission-Critical System • Over 1000 Custom Performance Metrics • Ingesting Data from 30 Applications • 150 Million Docs and 225GB per Day • Will Not Deploy to Production without Elastic Stack
  9. Oden Cohen, CTO: "The velocity achieved at Nativo would not be possible without the metrics that the Elastic Stack provides."
  10. Nativo Tech Stack: Fast Moving Platform [chart: team growth from 2014 to 2018 across the Ad Server, Web, Data, and DevOps teams]
  11. Nativo Tech Stack: Fast Moving Platform [same chart, annotated "opt-in"]
  12. In the Beginning: ELK Stack • Implemented May 2014 • Three AWS Regions • Elasticsearch 1.3 / Logstash 1.4 / Kibana 3.0 • Logstash Deployed to All EC2 Instances • Distributed Log Enrichment • Standardized Application Logs as JSON [diagram: Shippers → Indexers]
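      The deck doesn't include a sample log line; a minimal sketch of what a JSON-standardized application log might look like (all field names here are hypothetical):

      {
        "timestamp": "2014-05-12T18:31:02Z",
        "level": "INFO",
        "app": "adserver",
        "message": "ad request served",
        "duration_ms": 42
      }

      Emitting logs as JSON lets the file input's json codec parse them directly, with no grok patterns to maintain.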
  13. Logstash Configuration: Manageability Improvement. Config files carry numeric prefixes so they load in a predictable order: • 100 – Input Files • 500 – Filter Files • 900 – Output Files
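      Logstash concatenates the files in its config directory in lexical order, so the numeric prefixes guarantee inputs load before filters and filters before outputs. A sketch of a shipper's directory under this convention (the app name is hypothetical):

      /etc/logstash/conf.d/
        100-Input-adserver.conf   # inputs first
        500-filter-ec2.conf       # enrichment filters next
        900-output.conf           # outputs last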
  14. 100-Input-{ec2.APP}.conf (Shipper Input Config)

      input {
        file {
          path           => [ "/path/{ec2.APP}.json" ]
          sincedb_path   => "/path/sincedb-{ec2.APP}"
          start_position => "beginning"
          codec          => "json"
          type           => "{ec2.APP}"
          tags           => ["logstash"]
        }
      }
  15. 500-filter-ec2.conf (Shipper Filter Config)

      filter {
        mutate {
          add_field => {
            "ec2.Name"         => "{{ec2_tag_Name}}"
            "ec2.Cluster"      => "{{ec2_tag_Cluster}}"
            "ec2.App"          => "{{ec2_tag_App}}"
            "ec2.Environment"  => "{{ec2_tag_Env}}"
            "ec2.NativoTeam"   => "{{ec2_tag_Team}}"
            "ec2.InstanceType" => "{{ec2_inst_type}}"
            "ec2.Region"       => "{{ec2_region}}"
          }
        }
      }
  16. 900-output.conf (Shipper Output Config)

      output {
        redis {
          host          => {{shipper_redis.hosts}}
          shuffle_hosts => true
          data_type     => "list"
          batch         => "true"
          batch_events  => "250"
          key           => "logstash"
        }
      }
  17. 100-input-redis.conf (Indexer Input Config)

      input {
        redis {
          batch_count => 1000
          codec       => "json"
          data_type   => "list"
          host        => "localhost"
          key         => "logstash"
        }
      }
  18. 500-filter-redis.conf (Indexer Filter Config) • No Filtering on Indexers • Log Enrichment Distributed on Shippers • Eliminates the Bottleneck on Indexers • Eliminates Config Management on Indexers
  19. ELK Stack, October 2015 • Elasticsearch 1.7 / Logstash 1.4 / Kibana 4.4 • Single Daily Index • 30M Docs per Day • Data Nodes: 3 x M3.2xlarge / 1TB EBS [diagram: Shippers → Indexers]
  20. Things Just Got Tricky: Be Patient • Hold Out for Dot Support in Field Names (the enrichment fields like ec2.Name contain dots, which Elasticsearch 2.x rejected) • Need a New Indexing Strategy • Elastic Stack 2.0 Products in Transition • Elastic Stack 5.0 Release Coming in Q4
  21. ELK Stack, January 2017 • Four AWS Regions • Elasticsearch 1.7.1 / Logstash 1.4.2 / Kibana 4.4 • Single Daily Index • 100-150M Docs per Day • 50GB-80GB per Day • Data Nodes: 6 x R3.2xlarge / 750GB EBS
  22. Index Strategy: Abandon the Single Daily Index • Indices per App • Metrics vs Logging: logstash-[AppName]-YYYY.MM.DD and metricbeat-[AppName]-YYYY.MM.DD • More Flexibility • Per-App Retention (one way to implement this is sketched below) • Re-process Raw Logs • Utilize the Reindex API
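      The deck names per-app retention as a benefit but doesn't show the mechanism; a common approach with per-app index names is an Elasticsearch Curator action per app. A minimal sketch, assuming Curator 5.x and a hypothetical 30-day window for a daily-indexed app called APP:

      actions:
        1:
          action: delete_indices
          description: "Delete APP log indices older than 30 days"
          options:
            ignore_empty_list: True
          filters:
            - filtertype: pattern
              kind: prefix
              value: logstash-APP-
            - filtertype: age
              source: name
              direction: older
              timestring: '%Y.%m.%d'
              unit: days
              unit_count: 30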
  23. My New Favorite API: Backfill with the Reindex API

      curl -XPOST "ES_STACK:9200/_reindex?wait_for_completion=true" -d '
      {
        "source": {
          "remote": { "host": "http://ELK_STACK:9200" },
          "index": "logstash-2017.05.01",
          "query": { "match": { "type": "APP" } }
        },
        "dest": { "index": "logstash-APP-2017.05.01" }
      }'
  24. Building a More Robust Stack

      Configuration         | ELK Stack                      | Elastic Stack
      ----------------------|--------------------------------|------------------------------------
      EC2 Instance          | R3.2xl, 750GB EBS / 61GB RAM   | i3.2xl, 1950GB Ephemeral / 61GB RAM
      Node Clustering       | Combined Master/Data           | Separate Masters / Data / Clients
      Monitoring            | Kopf                           | Elastic Monitoring
      Basic System Metrics  | Ganglia                        | Metricbeat
      Load Balancing        | Direct Calls                   | AWS ELB
      Index                 | Single Daily Index (3 Shards)  | Custom Index per App (5 Shards)
      Hot/Warm              | N/A                            | Configured
  25. Nativo Elastic Stack 5.0
      • Redis Messaging Queue Nodes (X)
      • Logstash
      • Elasticsearch + X-Pack: Master Nodes (3), Ingest Nodes (6), Data Nodes - Hot (6), Data Nodes - Warm (6)
      • Kibana + X-Pack: Instances (2)
  26. Too Many Shards • Producing 360 Shards per Day • Daily Indices Ranged: 300MB to 60GB • <1GB per Day Would Be a Weekly Index • >1GB per Day Would Be a Daily Index • Index Pattern, First Iteration: logstash-APP-%{+YYYY-ww}
  27. Too Many Shards (same bullets) • Second Iteration, Tagging the Weekly Pattern in the Name: logstash-APP-ww-%{+YYYY-ww}
  28. Too Many Shards (same bullets) • Third Iteration: logstash-APP-mm-%{+YYYY-mm}
  29. 100-Input-{ec2.APP}.conf (Shipper Input Config, updated to tag each event with its index pattern)

      input {
        file {
          path           => [ "/path/{ec2.APP}.json" ]
          sincedb_path   => "/path/sincedb-{ec2.APP}"
          start_position => "beginning"
          codec          => "json"
          add_field      => { "ec2.IndxPat" => "{{ec2_IndxPat}}" }
          type           => "{ec2.APP}"
          tags           => ["logstash"]
        }
      }
  30. 900-logstash-output.conf (Indexer Output Config)

      output {
        if [ec2.IndxPat] == "weekly" {
          elasticsearch {
            flush_size      => 4000
            hosts           => ["http://es-elb-dns:9200"]
            manage_template => false
            index           => "logstash-%{ec2.App}-ww-%{+YYYY.ww}"
          }
        }
        ...
  31. 900-logstash-output.conf (Indexer Output Config, continued)

      output {
        ...
        else {
          elasticsearch {
            flush_size      => 4000
            hosts           => ["http://es-elb-dns:9200"]
            manage_template => false
            index           => "logstash-%{ec2.App}-%{+YYYY.MM.dd}"
          }
        }
      }
  32. 901-metricbeat-output.conf (Indexer Output Config)

      output {
        if [ec2.IndxPat] == "weekly" {
          elasticsearch {
            flush_size      => 4000
            hosts           => ["http://es-elb-dns:9200"]
            manage_template => false
            index           => "metricbeat-%{ec2.App}-ww-%{+YYYY.ww}"
            document_type   => "metricsets"
          }
        }
        ...
  33. 901-metricbeat-output.conf (Indexer Output Config, continued)

      output {
        ...
        else {
          elasticsearch {
            flush_size      => 4000
            hosts           => ["http://es-elb-dns:9200"]
            manage_template => false
            index           => "metricbeat-%{ec2.App}-%{+YYYY.MM.dd}"
            document_type   => "metricsets"
          }
        }
      }
  34. elasticsearch.yml (Hot Data Nodes Configuration)

      cluster.name: "my_cluster"
      node.name: "tag_name"
      node.master: false
      node.data: true
      node.ingest: true
      # Set node to hot or warm
      node.attr.box_type: "hot"
  35. elasticsearch.yml (Warm Data Nodes Configuration)

      cluster.name: "my_cluster"
      node.name: "tag_name"
      node.master: false
      node.data: true
      node.ingest: false
      # Set node to hot or warm
      node.attr.box_type: "warm"
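      The box_type attribute by itself doesn't place any data; indices are pinned to a tier with shard-allocation filtering. A sketch of the index-side settings that usually accompany this node config (the template name and index name below are hypothetical, not from the deck):

      # New indices start on the hot nodes (applied via an index template):
      curl -XPUT "ES_STACK:9200/_template/hot_allocation" -d '
      {
        "template": "logstash-*",
        "settings": { "index.routing.allocation.require.box_type": "hot" }
      }'

      # Later, migrate an aged index to the warm nodes:
      curl -XPUT "ES_STACK:9200/logstash-APP-2017.05.01/_settings" -d '
      { "index.routing.allocation.require.box_type": "warm" }'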
  36. Elastic Keys to Success at Nativo: 1. Distributing Log Enrichment 2. Standardizing on JSON 3. Being Patient
  37. Pain Points: What I'd Like to See Improved • Kibana Dashboard Management • Machine Learning Panels/Dashboards • Internal Elasticsearch Performance Metrics • Forcemerge • Reindex Performance
  38. Elastic Stack 6.x: Preparing Changes • Upgrade Cluster to ≥ 5.6.x • Prepare for a Single Type per Index • Train the Team for the Removal of _all (see the query sketch below) • Cross-Cluster Search (Hot/Warm) • New Features: Canvas and SQL
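      With _all removed in 6.x, searches that leaned on it should name explicit fields instead. A hedged sketch of the query-side change the team would train for (the index pattern and message field are hypothetical):

      # 5.x habit: query_string searched _all by default
      curl -XGET "ES_STACK:9200/logstash-APP-*/_search" -d '
      { "query": { "query_string": { "query": "error" } } }'

      # 6.x: point the query at a concrete field
      curl -XGET "ES_STACK:9200/logstash-APP-*/_search" -d '
      { "query": { "query_string": { "query": "error", "default_field": "message" } } }'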
  39. Add Monitoring Cluster
      • Production: Redis Messaging Queue Nodes (X), Logstash Instances (2), Kibana + X-Pack, Elasticsearch + X-Pack: Master Nodes (3), Ingest Nodes (6), Data Nodes - Hot (6), Data Nodes - Warm (6)
      • Monitoring: Elasticsearch + X-Pack: Master Nodes (3), Ingest Nodes (3), Data Nodes (3)
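      The deck doesn't show the wiring between the two clusters; with X-Pack this is typically an HTTP exporter in elasticsearch.yml on the production nodes (the monitoring hostname below is hypothetical):

      xpack.monitoring.exporters:
        to_monitoring:
          type: http
          host: ["http://monitoring-es:9200"]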
  40. X-Pack: It's Time • UI for Alerting • Evaluate APM • Move Away from Threshold Monitoring
  41. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/. Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third-party marks and brands are the property of their respective holders. Please attribute Elastic with a link to elastic.co.