Elastic{ON} 2018 - Log Aggregation for Traffic Control CDN

Elastic Co
March 01, 2018

Transcript

  1. CDN Team and Deployment overview
    • Team
      - 6 engineers, split evenly between Dev and Ops
    • CDN deployment
      - Hundreds of physical servers in 13 datacenters
      - Hundreds of containers (LXC and Docker)
      - Apache Traffic Server and Traffic Control
      - Elasticsearch for access logs
  2. Apache Traffic Control
    • Set of components that can be used to build, monitor, configure, and provision a large-scale content delivery network (CDN)
    • http://trafficcontrol.apache.org/
    • Components:
      - DNS/HTTP client steering to the closest/best available cache
      - Implements the CDN health protocol and presents states to Traffic Router
      - Acquires CDN-wide statistics and stores the information in InfluxDB
      - An API-driven configuration management and configuration file generation system
      - A client-facing UI used to manage and operate a CDN
  3. We have a logging platform - 2015
    • Supported by the enterprise
    • Minimal amount of access logs (<100GB)
    • Functional
      - Search and visualize logs
      - Provides reports and dashboards
      - Triggers alarms
  4. We have a logging problem - 2016
    • Explosion in CDN usage
      - Electronic Program Guides
      - Images and poster art
      - Increasing IP video delivery
    • Experiencing slowdown in reports
    • Retention time reduction
    • Filtering events, losing visibility
    • Getting too expensive…
  5. Elastic proof of concept - early 2016
    • Sources: Production and Lab Edge/Mid Traffic Server logs
    • Filebeat
      - File Input
      - Filters
      - Add Tags
      - Beat Output
    • Logstash Indexers (N+1)
      - Beat Input
      - Filtering (KV)
      - Redis TCP Localhost Output
    • Redis (2)
      - TCP Input/Output
      - Buffers events
      - Processed data
    • Logstash Indexers (N+1)
      - Redis Input
      - Elasticsearch Output
    • Elasticsearch (Data 3..N)
      - HTTP API
      - Clustered
      - Replicated
  6. Our current logging pipeline
    • Source: Production Edge/Mid Traffic Server logs
    • Filebeat
      - File Input
      - Filters
      - Add Tags
      - Kafka Output
    • Source: Traffic Router access logs
    • Filebeat
      - File Input
      - Add Tags
      - Kafka Output
    • Kafka (3)
      - Multiple Topics
      - Retention Policies
      - Replicas
    • Logstash Indexers (N+1)
      - Kafka Input
      - KV Filter
      - Elasticsearch Output
    • Kafka Stream Aggregator (N+1)
      - Downsample data
      - Elasticsearch Output
    • Elasticsearch (Data 3..N)
      - HTTP API
      - Clustered
      - Replicated
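The "Downsample data" step of the stream aggregator can be illustrated with a minimal sketch. It is not the team's aggregator: the field names (`ts`, `shn`, `pssc`, `b`) are borrowed from the sample Traffic Server log line elsewhere in the deck, and the 60-second bucket size and the function name `downsample` are assumptions made for illustration.

```python
from collections import defaultdict


def downsample(events, bucket_seconds=60):
    """Roll per-request events up into one summary per (bucket, host, status).

    Each event is a dict with hypothetical keys: ts (epoch seconds),
    shn (server host name), pssc (response status) and b (bytes sent).
    """
    totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for event in events:
        # Floor the epoch timestamp to the start of its time bucket.
        bucket_start = int(event["ts"] // bucket_seconds) * bucket_seconds
        key = (bucket_start, event["shn"], event["pssc"])
        totals[key]["requests"] += 1
        totals[key]["bytes"] += event["b"]
    # One document per key, ordered by bucket start, then host, then status.
    return [
        {"bucket_start": start, "shn": host, "pssc": status, **counts}
        for (start, host, status), counts in sorted(totals.items())
    ]
```

Each returned dictionary stands in for one document that would be indexed in place of many raw events, which is where the storage saving would come from.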
  7. Elasticsearch Deployment
    • Ansible playbooks with Docker
      - Elasticsearch
      - Logstash
      - Curator

        - name: Create elasticsearch container
          docker_container:
            name: "{{ inventory_hostname }}"
            image: "docker.elastic.co/elasticsearch/elasticsearch:6.1.2"
            env:
              xpack.monitoring.enabled: "{{ xpack_monitor_enabled | lower }}"
              bootstrap.memory_lock: "true"
              path.data: "{{ item['es_disks'] | join(',') }}"
            restart_policy: unless-stopped
            cpu_set: "{{ item.cpu_set | default([]) }}"
            …
            ports:
              - 9200:9200
              - 9300:9300
  8. Elastic Nodes deployment
    • Early on…
      - 5 physical servers (192GB)
      - Single node
      - 6 x 1.8TB 10k SAS (RAID0)
    • Then…
      - 10 physical servers (192GB)
      - 3 nodes per server
      - 3 x 2TB SSDs
    • Now…
      - 10 physical servers (192GB)
      - 1 node (96GB tmpfs) - HOT
      - 1 node (SSDs) - WARM
  9. Increasing indexing performance
    • Using Hot/Warm architecture
    • tmpfs (RAM)
    • NVMe (10us writes/500K IOPS)
    • Hourly indices
    • Curator to move indices to SSD nodes
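The combination of hourly indices and a scheduled move from hot to warm nodes can be sketched as follows. This is an illustration under stated assumptions, not the deck's Curator configuration: the index prefix, the `%Y.%m.%d.%H` naming pattern, the two-hour hot window, and the node attribute name `box_type` are all hypothetical. Only the `index.routing.allocation.require.<attribute>` setting family is standard Elasticsearch.

```python
from datetime import datetime, timedelta, timezone

# Assumed naming pattern: one index per UTC hour.
INDEX_FORMAT = "%Y.%m.%d.%H"

# Index settings that pin an index to nodes tagged box_type=warm.
# The attribute name "box_type" is an assumption; any custom node
# attribute can be used with index.routing.allocation.require.*.
WARM_SETTINGS = {"index.routing.allocation.require.box_type": "warm"}


def hourly_index_name(prefix, epoch_seconds):
    """Name of the hourly index that an event timestamp falls into."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{prefix}-{moment.strftime(INDEX_FORMAT)}"


def index_hour(prefix, name):
    """Parse the UTC hour back out of an index name."""
    stamp = name[len(prefix) + 1:]
    return datetime.strptime(stamp, INDEX_FORMAT).replace(tzinfo=timezone.utc)


def indices_to_relocate(prefix, names, now, max_hot_hours=2):
    """Indices whose hour started before the cutoff and should leave hot nodes."""
    cutoff = now - timedelta(hours=max_hot_hours)
    return sorted(n for n in names if index_hour(prefix, n) < cutoff)
```

A scheduler would apply `WARM_SETTINGS` to every name returned by `indices_to_relocate`, after which Elasticsearch relocates the shards. Small hourly indices keep each relocation short, and the hot tier only ever holds a few hours of data, which is what makes a RAM-backed tmpfs tier feasible.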
  10. Applying to production
    • This was across 10 physical hosts (older E5-26xx)
    • Limited by Logstash (6 instances)
  11. Improving pipeline efficiency
    • JSON instead of RAW KV logs
      - Lower Logstash indexers CPU usage
      - Better filtering capabilities?
      - Enable other consumers
      - Increase Filebeat CPU usage at Edge
    • Sample RAW KV log line:
      1519769996.088 chi=174.79.69.70 phn=edge01.rd.at.cox.net shn=example.com url=http://cdn.example.ott.cox.net/test cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=52505 sssc=200 sscl=52505 cfsc=FIN pfsc=FIN crc=TCP_MISS phr=DIRECT uas="NING/1.0" range="-"
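To make the RAW KV versus JSON trade-off concrete, here is a minimal sketch of the conversion that would otherwise run as a KV filter on the Logstash indexers. It is not the team's implementation: the function name `kv_line_to_json`, the `timestamp` key, and the choice of which fields are cast to integers are assumptions for illustration.

```python
import json
import re

# key=value pairs where the value is either double-quoted or a bare token.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')

# Assumption: these fields are numeric and worth indexing as integers.
INT_FIELDS = {"pssc", "ttms", "b", "sssc", "sscl"}

SAMPLE_LINE = (
    '1519769996.088 chi=174.79.69.70 phn=edge01.rd.at.cox.net shn=example.com '
    'url=http://cdn.example.ott.cox.net/test cqhm=GET cqhv=HTTP/1.1 pssc=200 '
    'ttms=0 b=52505 sssc=200 sscl=52505 cfsc=FIN pfsc=FIN crc=TCP_MISS '
    'phr=DIRECT uas="NING/1.0" range="-"'
)


def kv_line_to_json(line):
    """Convert one access log line (epoch timestamp, then key=value pairs) to JSON."""
    timestamp, _, rest = line.partition(" ")
    doc = {"timestamp": float(timestamp)}
    for match in PAIR.finditer(rest):
        key = match.group(1)
        # Group 2 is set for quoted values, group 3 for bare ones.
        value = match.group(2) if match.group(2) is not None else match.group(3)
        doc[key] = int(value) if key in INT_FIELDS and value.isdigit() else value
    return json.dumps(doc)


if __name__ == "__main__":
    print(kv_line_to_json(SAMPLE_LINE))
```

If the edge emits JSON directly, this parsing work no longer happens on the indexers, and any other Kafka consumer can read the events without reimplementing the KV grammar. The cost is the serialization work moving to the edge, which matches the higher Filebeat CPU usage noted on the slide.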