Elastic{ON} 2018 - Nativo ELK to Elastic Stack: A Production Journey of 3+ Years

Elastic Co
March 01, 2018

Transcript

  1. Nativo ELK to Elastic Stack: A Production Journey of 3+ Years • Rich Horace, Director of DevOps (@richhorace) • 2/28/2018
  2. About Me • Director of DevOps at Nativo • Last 10+ Years in the Online Advertising Space and Early-Stage Startups • Large-Scale Infrastructure, Both Linux and Windows • Working with Log Aggregation Systems since 2008 • Elastic LA Meetup Organizer since March 2015
  3. Agenda: Production Journey
     1. Nativo Backstory
     2. ELK Stack at Nativo
     3. Building a Robust Elastic Stack
     4. Future Enhancements
  4. Nativo: Early Days • Startup in LA • Launched as Postrelease in 2010 • Hired First Salesperson in 2012 • Rebranded to Nativo in 2013
  5. Nativo: Fast Forward to Now • Customer Base within the comScore Top 200 • Approaching 200 Employees
  6. Nativo Ad Executions
     • Native Video: In-feed player supporting click-to-play, scroll-to-play, and auto-play execution types
     • Native Article: Non-interruptive native ad that clicks through to a custom landing page hosted on your site
     • Native Display: In-feed ad to an external advertiser landing page or publisher-designated URL
  7. Nativo Engineering: Early Days • Joined March 2014, Employee #19 • Six Engineers • No Visibility into System Performance • Inadequate Monitoring
  8. Nativo Engineering: 4 Years Later • Over 30 Engineers • Elastic Stack • Mission-Critical System • Over 1000 Custom Performance Metrics • Ingesting Data from 30 Applications • 150 Million Docs and 225GB per Day • Will Not Deploy to Production without Elastic Stack
  9. Oden Cohen, CTO: "The velocity achieved at Nativo would not be possible without the metrics that the Elastic Stack provides."
  10. Nativo Tech Stack: Fast Moving Platform [chart: team growth from 2014 to 2018 across the Ad Server, Web, Data, and DevOps teams]
  11. Nativo Tech Stack: Fast Moving Platform [same chart, annotated "opt-in"]
  12. In the Beginning: ELK Stack • Implemented May 2014 • Three AWS Regions • Elasticsearch 1.3 / Logstash 1.4 / Kibana 3.0 • Logstash Deployed to All EC2 Instances • Distributed Log Enrichment • Standardized Application Logs as JSON [diagram: Shippers → Indexers]
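      The deck doesn't include a sample log line; a minimal sketch of what a JSON-standardized application log might look like (all field names here are hypothetical):

      {
        "timestamp": "2014-05-12T18:31:02Z",
        "level": "INFO",
        "app": "adserver",
        "message": "ad request served",
        "duration_ms": 42
      }

      Emitting logs as JSON lets the file input's json codec parse them directly, with no grok patterns to maintain.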
  13. Logstash Configuration: Manageability Improvement. Config files carry numeric prefixes so they load in a predictable order: • 100 – Input Files • 500 – Filter Files • 900 – Output Files
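      Logstash concatenates the files in its config directory in lexical order, so the numeric prefixes guarantee inputs load before filters and filters before outputs. A sketch of a shipper's directory under this convention (the app name is hypothetical):

      /etc/logstash/conf.d/
        100-Input-adserver.conf   # inputs first
        500-filter-ec2.conf       # enrichment filters next
        900-output.conf           # outputs last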
  14. 100-Input-{ec2.APP}.conf (Shipper Input Config)

      input {
        file {
          path           => [ "/path/{ec2.APP}.json" ]
          sincedb_path   => "/path/sincedb-{ec2.APP}"
          start_position => "beginning"
          codec          => "json"
          type           => "{ec2.APP}"
          tags           => ["logstash"]
        }
      }
  15. 500-filter-ec2.conf (Shipper Filter Config)

      filter {
        mutate {
          add_field => {
            "ec2.Name"         => "{{ec2_tag_Name}}"
            "ec2.Cluster"      => "{{ec2_tag_Cluster}}"
            "ec2.App"          => "{{ec2_tag_App}}"
            "ec2.Environment"  => "{{ec2_tag_Env}}"
            "ec2.NativoTeam"   => "{{ec2_tag_Team}}"
            "ec2.InstanceType" => "{{ec2_inst_type}}"
            "ec2.Region"       => "{{ec2_region}}"
          }
        }
      }
  16. 900-output.conf (Shipper Output Config)

      output {
        redis {
          host          => {{shipper_redis.hosts}}
          shuffle_hosts => true
          data_type     => "list"
          batch         => "true"
          batch_events  => "250"
          key           => "logstash"
        }
      }
  17. 100-input-redis.conf (Indexer Input Config)

      input {
        redis {
          batch_count => 1000
          codec       => "json"
          data_type   => "list"
          host        => "localhost"
          key         => "logstash"
        }
      }
  18. 500-filter-redis.conf (Indexer Filter Config) • No Filtering on Indexers • Log Enrichment Distributed on Shippers • Eliminates the Bottleneck on Indexers • Eliminates Config Management on Indexers
  19. ELK Stack, October 2015 • Elasticsearch 1.7 / Logstash 1.4 / Kibana 4.4 • Single Daily Index • 30M Docs per Day • Data Nodes: 3 x M3.2xlarge / 1TB EBS [diagram: Shippers → Indexers]
  20. Things Just Got Tricky: Be Patient • Hold Out for Dot Support in Field Names (the enrichment fields like ec2.Name contain dots, which Elasticsearch 2.x rejected) • Need a New Indexing Strategy • Elastic Stack 2.0 Products in Transition • Elastic Stack 5.0 Release Coming in Q4
  21. ELK Stack, January 2017 • Four AWS Regions • Elasticsearch 1.7.1 / Logstash 1.4.2 / Kibana 4.4 • Single Daily Index • 100-150M Docs per Day • 50GB-80GB per Day • Data Nodes: 6 x R3.2xlarge / 750GB EBS
  22. Index Strategy: Abandon the Single Daily Index • Indices per App • Metrics vs Logging: logstash-[AppName]-YYYY.MM.DD and metricbeat-[AppName]-YYYY.MM.DD • More Flexibility • Per-App Retention (one way to implement this is sketched below) • Re-process Raw Logs • Utilize the Reindex API
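      The deck names per-app retention as a benefit but doesn't show the mechanism; a common approach with per-app index names is an Elasticsearch Curator action per app. A minimal sketch, assuming Curator 5.x and a hypothetical 30-day window for a daily-indexed app called APP:

      actions:
        1:
          action: delete_indices
          description: "Delete APP log indices older than 30 days"
          options:
            ignore_empty_list: True
          filters:
            - filtertype: pattern
              kind: prefix
              value: logstash-APP-
            - filtertype: age
              source: name
              direction: older
              timestring: '%Y.%m.%d'
              unit: days
              unit_count: 30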
  23. My New Favorite API: Backfill with the Reindex API

      curl -XPOST "ES_STACK:9200/_reindex?wait_for_completion=true" -d '
      {
        "source": {
          "remote": { "host": "http://ELK_STACK:9200" },
          "index": "logstash-2017.05.01",
          "query": { "match": { "type": "APP" } }
        },
        "dest": { "index": "logstash-APP-2017.05.01" }
      }'
  24. Building a More Robust Stack

      Configuration         | ELK Stack                      | Elastic Stack
      ----------------------|--------------------------------|------------------------------------
      EC2 Instance          | R3.2xl, 750GB EBS / 61GB RAM   | i3.2xl, 1950GB Ephemeral / 61GB RAM
      Node Clustering       | Combined Master/Data           | Separate Masters / Data / Clients
      Monitoring            | Kopf                           | Elastic Monitoring
      Basic System Metrics  | Ganglia                        | Metricbeat
      Load Balancing        | Direct Calls                   | AWS ELB
      Index                 | Single Daily Index (3 Shards)  | Custom Index per App (5 Shards)
      Hot/Warm              | N/A                            | Configured
  25. Nativo Elastic Stack 5.0
      • Redis Messaging Queue Nodes (X)
      • Logstash
      • Elasticsearch + X-Pack: Master Nodes (3), Ingest Nodes (6), Data Nodes - Hot (6), Data Nodes - Warm (6)
      • Kibana + X-Pack: Instances (2)
  26. Too Many Shards • Producing 360 Shards per Day • Daily Indices Ranged: 300MB to 60GB • <1GB per Day Would Be a Weekly Index • >1GB per Day Would Be a Daily Index • Index Pattern, First Iteration: logstash-APP-%{+YYYY-ww}
  27. Too Many Shards (same bullets) • Second Iteration, Tagging the Weekly Pattern in the Name: logstash-APP-ww-%{+YYYY-ww}
  28. Too Many Shards (same bullets) • Third Iteration: logstash-APP-mm-%{+YYYY-mm}
  29. 100-Input-{ec2.APP}.conf (Shipper Input Config, updated to tag each event with its index pattern)

      input {
        file {
          path           => [ "/path/{ec2.APP}.json" ]
          sincedb_path   => "/path/sincedb-{ec2.APP}"
          start_position => "beginning"
          codec          => "json"
          add_field      => { "ec2.IndxPat" => "{{ec2_IndxPat}}" }
          type           => "{ec2.APP}"
          tags           => ["logstash"]
        }
      }
  30. 900-logstash-output.conf (Indexer Output Config)

      output {
        if [ec2.IndxPat] == "weekly" {
          elasticsearch {
            flush_size      => 4000
            hosts           => ["http://es-elb-dns:9200"]
            manage_template => false
            index           => "logstash-%{ec2.App}-ww-%{+YYYY.ww}"
          }
        }
        ...
  31. 900-logstash-output.conf (Indexer Output Config, continued)

      output {
        ...
        else {
          elasticsearch {
            flush_size      => 4000
            hosts           => ["http://es-elb-dns:9200"]
            manage_template => false
            index           => "logstash-%{ec2.App}-%{+YYYY.MM.dd}"
          }
        }
      }
  32. 901-metricbeat-output.conf (Indexer Output Config)

      output {
        if [ec2.IndxPat] == "weekly" {
          elasticsearch {
            flush_size      => 4000
            hosts           => ["http://es-elb-dns:9200"]
            manage_template => false
            index           => "metricbeat-%{ec2.App}-ww-%{+YYYY.ww}"
            document_type   => "metricsets"
          }
        }
        ...
  33. 901-metricbeat-output.conf (Indexer Output Config, continued)

      output {
        ...
        else {
          elasticsearch {
            flush_size      => 4000
            hosts           => ["http://es-elb-dns:9200"]
            manage_template => false
            index           => "metricbeat-%{ec2.App}-%{+YYYY.MM.dd}"
            document_type   => "metricsets"
          }
        }
      }
  34. elasticsearch.yml (Hot Data Nodes Configuration)

      cluster.name: "my_cluster"
      node.name: "tag_name"
      node.master: false
      node.data: true
      node.ingest: true
      # Set node to hot or warm
      node.attr.box_type: "hot"
  35. elasticsearch.yml (Warm Data Nodes Configuration)

      cluster.name: "my_cluster"
      node.name: "tag_name"
      node.master: false
      node.data: true
      node.ingest: false
      # Set node to hot or warm
      node.attr.box_type: "warm"
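      The box_type attribute by itself doesn't place any data; indices are pinned to a tier with shard-allocation filtering. A sketch of the index-side settings that usually accompany this node config (the template name and index name below are hypothetical, not from the deck):

      # New indices start on the hot nodes (applied via an index template):
      curl -XPUT "ES_STACK:9200/_template/hot_allocation" -d '
      {
        "template": "logstash-*",
        "settings": { "index.routing.allocation.require.box_type": "hot" }
      }'

      # Later, migrate an aged index to the warm nodes:
      curl -XPUT "ES_STACK:9200/logstash-APP-2017.05.01/_settings" -d '
      { "index.routing.allocation.require.box_type": "warm" }'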
  36. Elastic Keys to Success at Nativo: 1. Distributing Log Enrichment 2. Standardizing on JSON 3. Being Patient
  37. Pain Points: What I'd Like to See Improved • Kibana Dashboard Management • Machine Learning Panels/Dashboards • Internal Elasticsearch Performance Metrics • Forcemerge • Reindex Performance
  38. Elastic Stack 6.x: Preparing Changes • Upgrade Cluster to ≥ 5.6.x • Prepare for a Single Type per Index • Train the Team for the Removal of _all (see the query sketch below) • Cross-Cluster Search (Hot/Warm) • New Features: Canvas and SQL
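      With _all removed in 6.x, searches that leaned on it should name explicit fields instead. A hedged sketch of the query-side change the team would train for (the index pattern and message field are hypothetical):

      # 5.x habit: query_string searched _all by default
      curl -XGET "ES_STACK:9200/logstash-APP-*/_search" -d '
      { "query": { "query_string": { "query": "error" } } }'

      # 6.x: point the query at a concrete field
      curl -XGET "ES_STACK:9200/logstash-APP-*/_search" -d '
      { "query": { "query_string": { "query": "error", "default_field": "message" } } }'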
  39. Add Monitoring Cluster
      • Production: Redis Messaging Queue Nodes (X), Logstash Instances (2), Kibana + X-Pack, Elasticsearch + X-Pack: Master Nodes (3), Ingest Nodes (6), Data Nodes - Hot (6), Data Nodes - Warm (6)
      • Monitoring: Elasticsearch + X-Pack: Master Nodes (3), Ingest Nodes (3), Data Nodes (3)
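      The deck doesn't show the wiring between the two clusters; with X-Pack this is typically an HTTP exporter in elasticsearch.yml on the production nodes (the monitoring hostname below is hypothetical):

      xpack.monitoring.exporters:
        to_monitoring:
          type: http
          host: ["http://monitoring-es:9200"]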
  40. X-Pack: It's Time • UI for Alerting • Evaluate APM • Move Away from Threshold Monitoring
  41. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/. Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third-party marks and brands are the property of their respective holders. Please attribute Elastic with a link to elastic.co.