
Elastic Madison & Milwaukee Meetup - 4/7/16

http://www.meetup.com/Madison-Milwaukee-Elastic-Fantastics/events/229464001/

TDS Telecom - Looking to implement the Elastic Stack

To start the meetup, our generous hosts and power users, TDS Telecom, will say hi and share how TDS Telecom is looking to use the Elastic Stack to provide logging solutions in a number of areas of the business.

Matt Gruett is the manager of the IT Risk Management & Security team at TDS Telecom. He and his team manage customer-facing and internal enterprise security for TDS Telecom, the wireline and cable service provider division of parent company Telephone and Data Systems, Inc. (NYSE: TDS). While at TDS Telecom, Matt has worked as an architect and network security engineer, creating security solutions for TDS Telecom’s residential and commercial customers in both the telephone and cable business units. Before TDS Telecom, he worked in security and system administration/engineering in a number of other industry sectors, including printing, entertainment and manufacturing, at companies including Big Idea Productions and Rockwell Automation. He has also done freelance writing, including a weekly column for the technology blog network LockerGnome.com.

Tony Bichanich has been a system administrator/engineer in the Madison area for over 18 years, including the last 12 months on the TDS Telecom security team, where he applies his big data expertise and experience to the security needs of enterprise business networks and of service provider customers' carrier networks. This is Tony's second stint at TDS Telecom; he previously worked for nearly 10 years on the systems team that managed servers within the TDS Telecom ISP. Before joining the TDS Telecom security team, he worked at a number of Madison-area start-ups in varied industries, including social media, cloud storage and Internet retail.

Elastic Co

April 07, 2016

Transcript

  1. TDS Telecom - www.tdstelecom.com: Agenda
    • Who is TDS Telecom
    • A mature log lifecycle
    • The pieces in the Elastic Stack
    • Details on our implementation
    • Next Steps
    • Thinking about data in your company
  2. Who is TDS Telecom?
    • Founded in 1969
    • Operates in 32 states as a phone and cable service provider of data, TV and voice for residential and commercial customers
    • Operates subsidiaries including TDS Metrocom, TDS Cable, Bend Broadband and OneNeck IT Solutions
    • Part of Telephone and Data Systems, Inc. (NYSE: TDS), the parent company of TDS Telecom and of sister companies including US Cellular
  3. Fiber Gigabit Quiz
    ISP percentage breakdown across the top 10 states with the highest number of residential gigabit municipalities (number of municipalities in parentheses):
    • Comcast - 38% (35)
    • TDS - 30% (27)
    • AT&T - 11% (10)
    • Grande - 5% (5)
    • C-Spire - 4% (4)
    • CenturyLink - 3% (3)
    • Suddenlink - 3% (3)
    • Cox, Frontier and Google - 1% each
    Source: George Winston, “The Gigabit Map,” Broadcasting & Cable, July 27, 2015, pp. 18-21.
  4. TDS Telecom Security Team
    • Responsible for all IT security for TDS Telecom
    • Responsible for network security device management
    • All enterprise firewalls, B2B VPNs, user access VPNs and IDS/IPSes
    • All service provider firewalls, B2B VPNs and IDS/IPSes
  5. TDS Telecom Security Team
    • Team inherited all network security device management from two other teams within the organization
    • Two different syslog systems
    • One of the IDS systems put alert logs in a MySQL database
    • i.e., lots of log data in lots of disparate places
  6. TDS Telecom Security Team
    • Enterprise and ISP combined generate 255 million log events per day, each ~0.4 KB in size, from ~100 firewalls
    • After augmentation (GeoIP), each log record grows by about 40%
    • Daily ingest volume of 140 GB/day after augmentation
    • Not counting 45 firewalls that currently log into a separate element management system (EMS)
    • Not counting any IDS logs, since IDS is being moved to a new platform
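The ingest figures above can be checked with a quick back-of-the-envelope calculation. This sketch assumes "~0.4K" means about 0.4 KB per event and uses decimal units (1 GB = 1,000,000 KB); the function name is illustrative.

```python
# Back-of-the-envelope check of the daily ingest numbers on the slide.
# Assumption: "~0.4K" means ~0.4 KB per event; decimal units (1 GB = 10**6 KB).

EVENTS_PER_DAY = 255_000_000   # log events per day (from the slide)
KB_PER_EVENT = 0.4             # approximate raw event size (from the slide)
GEOIP_GROWTH = 0.40            # ~40% growth after GeoIP augmentation (from the slide)


def daily_ingest_gb(events: int, kb_per_event: float, growth: float) -> tuple[float, float]:
    """Return (raw GB/day, augmented GB/day)."""
    raw_gb = events * kb_per_event / 1_000_000
    return raw_gb, raw_gb * (1 + growth)


if __name__ == "__main__":
    raw, augmented = daily_ingest_gb(EVENTS_PER_DAY, KB_PER_EVENT, GEOIP_GROWTH)
    print(f"raw: {raw:.1f} GB/day, augmented: {augmented:.1f} GB/day")
```

Roughly 102 GB/day raw grows to roughly 143 GB/day after augmentation, which is consistent with the quoted 140 GB/day.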
  7. Healthy Log Lifecycle
    • Life cycle of logs
      - Collect/Transmit
      - Normalize
      - Store
      - Analyze
      - Archive
      - Delete
  8. Collect/Transmit
    • Old
      - No remote aggregation
      - No guaranteed reliability
    • New
      - Multiple collection methods
      - Encryption in transmission
      - Local queueing if network is down
  9. Normalize
    • Old
      - ?
    • New
      - Aggregate multiple lines together
      - Split lines apart
      - Add tags
      - Normalize fields
      - Create new tags from log entry data
      - Obfuscate
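Aggregating multiple lines into one event is what Logstash's multiline codec (and the older multiline filter) does, driven by its `pattern`, `negate` and `what` settings. A common setup treats any line that starts with whitespace as a continuation of the previous event (`pattern => "^\s"`, `what => "previous"`). The Python sketch below models only that one rule; it is an assumption for illustration, not necessarily the rule TDS Telecom uses, and the sample lines are made up.

```python
import re
from typing import Iterable, Iterator

# Assumed rule: a line that starts with whitespace continues the previous event.
CONTINUATION = re.compile(r"^\s")


def aggregate_multiline(lines: Iterable[str]) -> Iterator[str]:
    """Join continuation lines onto the event that precedes them."""
    current: list[str] = []
    for line in lines:
        if CONTINUATION.match(line) and current:
            current.append(line)           # belongs to the event being built
        else:
            if current:
                yield "\n".join(current)   # emit the finished event
            current = [line]               # start a new event
    if current:
        yield "\n".join(current)           # flush the last event


if __name__ == "__main__":
    raw = [
        "2016-04-07 10:00:00 ERROR request failed",
        "    at com.example.Handler.run(Handler.java:42)",
        "2016-04-07 10:00:01 INFO recovered",
    ]
    for event in aggregate_multiline(raw):
        print(repr(event))
```

Splitting lines apart is the inverse operation (one raw record becomes several events), which Logstash handles with its split filter.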
  10. Store
    • Old
      - Store in flat files
      - Rotate files, compress old files
      - cat works great for the beginning of the day
      - tac works great for the latest data from the past few minutes
      - The middle of the log file is a headache
      - Compressed files need to be completely unpacked first
      - cat data-logs.log | grep -i error | head -n 2000 | tail -n 200 | awk '{ print $2 "-" $4 }' | sort -n | awk 'BEGIN { FS = "-" } { print $2 "-" $1 }'
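To show how much logic hides in a one-line flat-file query, here is a Python sketch of the report that pipeline is trying to produce: among the first 2,000 lines containing "error" (any case), take the last 200, pull out whitespace-separated fields 2 and 4, sort numerically on field 2, and emit them as "field4-field2". The function name and parameters are illustrative. Ties are kept in input order here, whereas GNU `sort -n` breaks ties by comparing whole lines.

```python
import re
from typing import Iterable

# Leading number, roughly what `sort -n` looks at; lines without one sort as 0.
_NUMBER = re.compile(r"\s*(-?\d+(?:\.\d+)?)")


def _leading_number(text: str) -> float:
    m = _NUMBER.match(text)
    return float(m.group(1)) if m else 0.0


def error_report(lines: Iterable[str], first: int = 2000, last: int = 200) -> list[str]:
    matches = [ln for ln in lines if "error" in ln.lower()]   # grep -i error
    window = matches[:first][-last:]                          # head -n 2000 | tail -n 200
    pairs = []
    for ln in window:
        fields = ln.split()                                   # awk's default whitespace split
        f2 = fields[1] if len(fields) > 1 else ""             # $2
        f4 = fields[3] if len(fields) > 3 else ""             # $4
        pairs.append((f2, f4))
    pairs.sort(key=lambda p: _leading_number(p[0]))           # sort -n on field 2
    return [f"{f4}-{f2}" for f2, f4 in pairs]                 # final "field4-field2" output


if __name__ == "__main__":
    sample = ["error 3 a b", "ERROR 1 c d", "info 9 e f", "Error 2 g h"]
    print(error_report(sample))
```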
  11. Store
    • New
      - Store in an indexed system
      - Instantly select based on time range
      - Uses implicit schemas (fields are mapped dynamically as data arrives) rather than explicit, predefined schemas
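Selecting by time range is cheap when indices are created per day: a query only has to touch the indices whose day overlaps the range. This sketch assumes Logstash's default daily index naming (`logstash-YYYY.MM.dd`) with dates in UTC; the function name is illustrative.

```python
from datetime import datetime, timedelta


def indices_for_range(start: datetime, end: datetime, prefix: str = "logstash-") -> list[str]:
    """Return the daily index names that can hold events from start to end (inclusive).

    Assumes one index per UTC day named <prefix>YYYY.MM.DD.
    """
    if end < start:
        raise ValueError("end must not be before start")
    names = []
    day = start.date()
    while day <= end.date():
        names.append(f"{prefix}{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return names


if __name__ == "__main__":
    # A two-hour window that straddles midnight UTC touches two daily indices.
    print(indices_for_range(datetime(2016, 4, 6, 23, 0), datetime(2016, 4, 7, 1, 0)))
```

Elasticsearch accepts a comma-separated list of index names in a search URL (for example `GET /logstash-2016.04.06,logstash-2016.04.07/_search`), so the result can be joined with commas and used directly.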
  12. Analyze
    • Old
      - swatch
      - Scripted jobs run from cron (cat, grep, head, tail, awk, sort, uniq)
      - E-mail a text file with filtered log output
    • New
      - Stored views
      - E-mail reports
      - Text/trap on alert conditions
      - Visualizations
  13. Archive
    • Old
      - Compress the file once a day
      - Kills disk I/O during the compression
      - Risks losing data as files are rotated
      - Expensive on CPU and disk to search through old compressed log files
    • New
      - Native compression means no separate decompression step before searching recent or old logs
      - No need to switch search context between current and archived logs
  14. Delete
    • Old
      - Delete jobs need to be built into log rotation jobs
    • New
      - Drop an index to delete old data, if indexing by date
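With date-named indices, retention becomes a question of which index names are older than the cutoff. This sketch assumes the default `logstash-YYYY.MM.dd` naming; the 30-day retention period and the function name are hypothetical, since the slides do not state a retention policy.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names: list[str], today: date, keep_days: int,
                    prefix: str = "logstash-") -> list[str]:
    """Return daily indices whose date is older than today - keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue                      # not a daily index we manage (e.g. .kibana)
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue                      # suffix is not a date; leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logstash-2016.03.01", "logstash-2016.04.06", ".kibana"]
    print(expired_indices(names, today=date(2016, 4, 7), keep_days=30))  # hypothetical 30-day policy
```

Each returned name could then be removed with a delete-index request (`DELETE /logstash-2016.03.01`) or with a tool such as Elasticsearch Curator, instead of rewriting rotation scripts.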
  15. About Elasticsearch
    • Elasticsearch (search storage)
      - Written by Shay Banon to replace Compass
      - Compass was written to do full indexing of his wife’s recipes
      - RESTful API with Apache Lucene query syntax
      - First version released in February 2010
      - Uses JSON over HTTP
      - Written in Java
  16. About Logstash
    • Logstash (shipper and indexer)
      - Open source, Apache license
      - Written by Jordan Sissel
        ✦ Sysadmin (Loggly, Heroku, DreamHost)
        ✦ Hate-driven development
      - Written in JRuby (mostly)
        ✦ Runs on a JVM
      - Plugins written in Ruby
  17. About Kibana
    • Kibana (web interface)
      - Written by Rashid Khan
        ✦ Infrastructure Engineer at Village Voice Media
        ✦ Similar situation as Jordan Sissel
      - Didn’t like logstash-web
  18. Elasticsearch Details
    • Elasticsearch as a company
      - Elasticsearch BV co-founded in 2012 by Shay Banon
      - Series C funding of $70 million in June 2014
      - Hired Jordan Sissel (Logstash)
      - Hired Rashid Khan (Kibana)
      - Brings all parts of the Elastic Stack together
      - Has the most momentum behind it and the best scaling examples in terms of volume and velocity
  19. Elasticsearch Details
    • Elasticsearch has native support for distributed operations and cluster building
    • Indexes are split into shards, and the shards are distributed across multiple nodes
    • Default index naming (from Logstash) is time series, i.e., one index per day
    • An index is split into 5 primary shards by default (the default in Elasticsearch versions current in 2016)
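Elasticsearch decides which primary shard a document lives on with `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. Because the modulus is the primary shard count, that count could not be changed on an existing index in Elasticsearch releases of this era without reindexing. The sketch below shows the formula; it uses CRC32 purely as a stand-in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers will not match a real cluster.

```python
import zlib
from collections import Counter

NUMBER_OF_PRIMARY_SHARDS = 5   # the default mentioned on the slide


def shard_for(routing: str, primaries: int = NUMBER_OF_PRIMARY_SHARDS) -> int:
    """shard = hash(routing) % number_of_primary_shards.

    CRC32 is a stand-in here; Elasticsearch hashes the routing value with Murmur3.
    """
    return zlib.crc32(routing.encode("utf-8")) % primaries


if __name__ == "__main__":
    # Documents with distinct ids spread across all primary shards.
    counts = Counter(shard_for(f"doc-{i}") for i in range(10_000))
    print(sorted(counts.items()))
```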
  20. Elasticsearch Details
    • Can replicate indexes and shards
      - Index replication with 2 replicas
        ✦ P1, R1, R2
      - Index and shard replication with 1 replica
        ✦ P1A, P1B, R1A, R1B
      - Index and shard replication with 2 replicas
        ✦ P1A, P1B, R1A, R1B, R2A, R2B
    • P1 and R1 automatically go onto different nodes
  21. Elasticsearch Details
    Index and shard replication with 2 replicas
    [Diagram: shard copies P1A, P1B, R1A, R1B, R2A and R2B distributed across Node 1, Node 2 and Node 3]
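The placement rule behind these slides is that Elasticsearch never puts two copies of the same shard (a primary and its replica, or two replicas) on the same node. The round-robin sketch below respects that rule for the 2-primary, 2-replica, 3-node case; it is a simplified illustration, not Elasticsearch's actual allocation algorithm, and it reuses the slide's naming (P1A is the primary of shard A, R1A and R2A its replicas). Where Elasticsearch would simply leave extra replicas unassigned if there are too few nodes, this sketch raises an error.

```python
from collections import defaultdict


def allocate(primaries: int, replicas: int, nodes: list[str]) -> dict[str, list[str]]:
    """Spread shard copies round-robin so no node holds two copies of one shard."""
    if replicas + 1 > len(nodes):
        raise ValueError("need at least replicas + 1 nodes to keep copies apart")
    placement: dict[str, list[str]] = defaultdict(list)
    for s in range(primaries):
        shard = chr(ord("A") + s)                      # shard letters A, B, ... (up to 26)
        copies = [f"P1{shard}"] + [f"R{r}{shard}" for r in range(1, replicas + 1)]
        for offset, copy in enumerate(copies):
            # Offsets 0..replicas are distinct modulo len(nodes), so copies land apart.
            placement[nodes[(s + offset) % len(nodes)]].append(copy)
    return dict(placement)


if __name__ == "__main__":
    for node, copies in allocate(2, 2, ["Node 1", "Node 2", "Node 3"]).items():
        print(node, copies)
```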
  22. Elasticsearch Details
    • Different types of nodes
      - Master nodes
      - Client nodes
      - Data nodes
      - Tribe nodes
  23. Logstash Details
    • Logstash (shipper and indexer)
      - Concept of inputs, filters, outputs and codecs
      - Inputs: redis, amqp, zeromq, eventlog, file, twitter
      - Filters: geoip, checksum, multiline, split, anonymize
      - Outputs: elasticsearch, email, pagerduty, nagios
      - Codecs: netflow, graphite, json, line
      - 40+ inputs, 20 codecs, 40+ filters, 50+ outputs
      - grok as a parser with a regex library (also by Sissel)
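The input, codec, filter and output concepts can be modeled as a simple chain. This toy Python sketch is only a conceptual model (real Logstash runs pipelines with queues, batches and worker threads); `drop_debug` and `tag_firewall` are hypothetical stand-ins for filter plugins, and `json.loads` plays the role of a json codec.

```python
import json
from typing import Callable, Iterable, Optional

Event = dict
Filter = Callable[[Event], Optional[Event]]   # a filter returns None to drop the event


def run_pipeline(raw_lines: Iterable[str],
                 decode: Callable[[str], Event],
                 filters: list[Filter],
                 outputs: list[Callable[[Event], None]]) -> None:
    """Toy model of one pipeline: input -> codec -> filters -> outputs."""
    for line in raw_lines:                 # input: where raw data comes from
        event = decode(line)               # codec: raw data -> structured event
        for apply_filter in filters:       # filters run in configuration order
            event = apply_filter(event)
            if event is None:
                break                      # this event was dropped
        else:
            for emit in outputs:           # every output sees every surviving event
                emit(event)


def drop_debug(event: Event) -> Optional[Event]:
    return None if "debug" in event.get("message", "") else event


def tag_firewall(event: Event) -> Event:
    return {**event, "tags": event.get("tags", []) + ["firewall"]}


if __name__ == "__main__":
    lines = ['{"message": "deny tcp"}', '{"message": "debug noise"}']
    run_pipeline(lines, json.loads, [drop_debug, tag_firewall], [print])
```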
  24. Logstash Details
    • Example Logstash config:
        input {
          file {
            type => "syslog"
            path => ["/es/netscreen.log"]
          }
        }
        filter { … }
        output {
          elasticsearch {
            hosts => ["192.168.1.11"]   # Logstash 2.x option; Logstash 1.x used host => "..."
          }
        }
  25. Logstash Details
    • Example Logstash config:
        filter {
          if [type] == "syslog" {
            if "icmp" in [message] {
              grok {
                # TDSNETSCREENICMP and TDSNETSCREEN are custom patterns; if their file is not
                # in grok's default pattern path, point patterns_dir at its directory
                match => { "message" => ["%{TDSNETSCREENICMP}"] }
                add_tag => ["grokked", "firewall", "netscreen"]
              }
            } else {
              grok {
                match => { "message" => ["%{TDSNETSCREEN}"] }
              }
            }
          }
        }
  26. Logstash Details - Grok
    • Example grok patterns (/opt/logstash/patterns/grok-patterns):
        # Networking
        MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
        CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
        WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
        COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
        IPV4 (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
        IP (?:%{IPV6}|%{IPV4})
        HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
        HOST %{HOSTNAME}
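Grok works by recursively replacing each `%{NAME}` reference with the regex it names, and turning `%{NAME:field}` into a named capture whose value becomes an event field. The Python sketch below reimplements just that expansion step for a few of the patterns shown above (IPV6/IP are omitted for brevity, and type suffixes such as `%{INT:duration:int}` are not handled). `expand` and `grok` are illustrative names, not Logstash APIs.

```python
import re

# A few library patterns copied from the grok-patterns excerpt above.
PATTERNS = {
    "CISCOMAC": r"(?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})",
    "WINDOWSMAC": r"(?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})",
    "COMMONMAC": r"(?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})",
    "MAC": r"(?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})",
    "IPV4": (r"(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
             r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
             r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.]"
             r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])"),
}

# Matches %{NAME} or %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern: str) -> str:
    """Recursively replace grok references with plain regex."""
    def substitute(m: re.Match) -> str:
        name, field = m.group(1), m.group(2)
        body = expand(PATTERNS[name])             # patterns may reference other patterns
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(substitute, pattern)


def grok(pattern: str, text: str) -> dict:
    """Return the named fields captured from text, or {} if nothing matches."""
    m = re.search(expand(pattern), text)
    return m.groupdict() if m else {}


if __name__ == "__main__":
    print(grok("src=%{IPV4:src_ip} dst=%{IPV4:dst_ip}", "action=Permit src=10.0.0.5 dst=192.168.1.21"))
    print(grok("%{MAC:mac}", "client 00:1a:2b:3c:4d:5e joined"))
```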
  27. Logstash Details - Grok
    • Example grok patterns:
        # Years?
        YEAR (?>\d\d){1,2}
        HOUR (?:2[0123]|[01]?[0-9])
        MINUTE (?:[0-5][0-9])
        # '60' is a leap second in most time standards and thus is valid.
        SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
        TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
        # datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
        DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
        DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
        ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
        ISO8601_SECOND (?:%{SECOND}|60)
        TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
        DATE %{DATE_US}|%{DATE_EU}
        DATESTAMP %{DATE}[- ]%{TIME}
  28. Logstash Details - Grok
    • Example custom pattern:
        TDSNETSCREENICMP %{TIMESTAMP_ISO8601} %{IPORHOST:device} %{DATA:devicename}: NetScreen device_id=%{DATA:device_id} %{DATA}: start_time=%{QUOTEDSTRING:start_time} duration=%{INT:duration} policy_id=%{INT:policy_id} service=%{DATA:service} proto=%{INT:proto} src zone=%{DATA:src_zone} dst zone=%{DATA:dst_zone} action=%{WORD:action} sent=%{INT:sent} rcvd=%{INT:rcvd} src=%{IPORHOST:src_ip} dst=%{IPORHOST:dst_ip} icmp type=%{INT:icmp_type} icmp code=%{INT:icmp_code} src-xlated ip=%{IPORHOST:src_xlated_ip} dst-xlated ip=%{IPORHOST:dst_xlated_ip} session_id=%{INT:session_id} reason=%{GREEDYDATA:reason}
  29. Logstash Details
    • Log record:
        Sep 2 15:30:14 192.168.1.21 <133>tdsfirewall: NetScreen device_id=tdsfirewall [Root]system-notification-00257(traffic): start_time="2015-09-02 15:30:11" duration=3 policy_id=38001 service=icmp proto=1 src zone=enterprise dst zone=co-mgmt action=Permit
    • Grok parser match:
        TDSNETSCREENICMP %{TIMESTAMP_ISO8601} %{IPORHOST:device} %{DATA:devicename}: NetScreen device_id=%{DATA:device_id} %{DATA}: start_time=%{QUOTEDSTRING:start_time} duration=%{INT:duration} policy_id=%{INT:policy_id} service=%{DATA:service} proto=%{INT:proto} src zone=%{DATA:src_zone} dst zone=%{DATA:dst_zone}
    • JSON tagged record:
        { "duration":"3","policy_id":"38001","service":"icmp","proto":"1" }
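The step from log record to JSON record can be sketched with an ordinary regex that has named groups. The regex below is a hand-written, simplified stand-in for the middle of TDSNETSCREENICMP (INT becomes `[+-]?\d+`, DATA becomes a non-greedy `.*?`, and the quotes are kept out of `start_time`, unlike QUOTEDSTRING); `FIELDS` and `to_json` are illustrative names. As on the slide, captured values stay strings, because grok converts types only when a suffix such as `%{INT:duration:int}` is used. On a miss, the sketch mimics grok's default `_grokparsefailure` tag.

```python
import json
import re

# Simplified stand-in for part of TDSNETSCREENICMP, written for this sketch.
FIELDS = re.compile(
    r'start_time="(?P<start_time>[^"]*)" '
    r"duration=(?P<duration>[+-]?\d+) "
    r"policy_id=(?P<policy_id>[+-]?\d+) "
    r"service=(?P<service>.*?) "
    r"proto=(?P<proto>[+-]?\d+)"
)

# The log record from the slide.
RECORD = ('Sep 2 15:30:14 192.168.1.21 <133>tdsfirewall: NetScreen device_id=tdsfirewall '
          '[Root]system-notification-00257(traffic): start_time="2015-09-02 15:30:11" '
          'duration=3 policy_id=38001 service=icmp proto=1 src zone=enterprise '
          'dst zone=co-mgmt action=Permit')


def to_json(record: str, keep=("duration", "policy_id", "service", "proto")) -> str:
    m = FIELDS.search(record)
    if m is None:
        return json.dumps({"tags": ["_grokparsefailure"]})   # grok's default failure tag
    return json.dumps({k: m.group(k) for k in keep})          # values remain strings


if __name__ == "__main__":
    print(to_json(RECORD))
```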
  30. Logstash Details
    • Example GeoIP Logstash config:
        filter {
          geoip {
            database => "/opt/logstash/vendor/geoip/GeoIPCity.dat"
            source => "src_ip"
            target => "SourceGeo"
          }
          geoip {
            database => "/opt/logstash/vendor/geoip/GeoIPCity.dat"
            source => "dst_ip"
            target => "DestinationGeo"
          }
        }
  31. Kibana Details
    • Kibana
    • Support for histograms, charts (area, bar, line, pie) and maps
    • Build visualizations of data with individual charts
    • Put all the charts and other metrics together into a dashboard
    • Better to see it than describe it
  32. The Details
    Server function   Type
    Logstash          Virtual
    Kafka             Virtual
    Master node       Virtual
    Client node       Virtual
    Data node         Physical
    Kibana            Virtual
  33. The Details
    • Logstash producers
      - 8 GB RAM; 4 cores; 1 TB of disk
    • Kafka
      - 8 GB RAM; 4 cores; 50 GB of disk
    • Logstash consumers
      - 8 GB RAM; 8 cores; 20 GB of disk
  34. The Details
    • Data nodes
    • HP DL380 G9 servers
    • 2 E5-2620 v3 2.4 GHz CPUs (6 cores per CPU x 2 threads per core via Hyper-Threading x 2 CPUs = 24 threads per data node)
    • 64 GB RAM
    • Twelve 10K RPM 300 GB disks per data node in RAID 10, giving each node 1.8 TB usable, striped across 6 mirrored pairs
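The thread count and usable capacity on this slide follow from simple arithmetic: threads multiply across sockets, cores and hyper-threads, and RAID 10 mirrors disks in pairs and stripes across the pairs, so usable space is half the raw space. A small sketch (function names are illustrative; capacities in decimal units, as on the slide):

```python
def threads_per_node(cpus: int, cores_per_cpu: int, threads_per_core: int) -> int:
    """Hardware threads visible to the OS (and the JVM) on one data node."""
    return cpus * cores_per_cpu * threads_per_core


def raid10_layout(disks: int, disk_gb: int) -> tuple[int, float]:
    """Return (mirrored pairs, usable TB) for a RAID 10 array.

    Each pair stores one disk's worth of data, so usable = pairs * disk size.
    """
    if disks < 4 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    pairs = disks // 2
    return pairs, pairs * disk_gb / 1000


if __name__ == "__main__":
    print(threads_per_node(cpus=2, cores_per_cpu=6, threads_per_core=2))   # 24
    print(raid10_layout(disks=12, disk_gb=300))                            # (6, 1.8)
```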
  35. The Details
    • Master nodes
      - 8 GB RAM; 2 cores; 100 GB of disk
    • Client/Kibana nodes
      - 8 GB RAM; 8 cores; 20 GB of disk
    • Marvel nodes
      - 8 GB RAM; 2 cores; 100 GB of disk
  36. Support Extras
    • Resources to deal with issues
    • Marvel - Cluster management
    • Shield - Authorization for data in Elasticsearch
    • Watcher - Monitoring of events
  37. The Details
    • Experimenting with whether enterprise and ISP data go into the same cluster, with permissions on the indexes, or into different clusters
    • Different clusters mean additional master and client nodes, but the existing data nodes could be split between the two clusters
  38. Alerting Based on Metadata
    • IDS always writes logs with a connection hash for any HTTP traffic
    • IDS writes logs with connection hashes when certain SQL injection indicators are detected
    [Diagram: Evil host, Firewall, Web site, Tap and IDS; the IDS produces a connection hash for the HTTP POST and a connection hash for the SQL inject traffic]
  39. Alerting Based on Metadata
    • HTTP POST log gets GeoIP information added and gets stored in the database
    • SQL injection log gets stored in the database; no need to duplicate the GeoIP info
    [Diagram: the connection hash for the HTTP POST and the connection hash for the SQL inject traffic, each passing through log enrichment]
  40. Alerting Based on Metadata
    • Watcher continuously queries for SQL injection logs
    • When a match occurs, it queries for all traffic with that connection hash
    • If the logic matches (e.g., X occurrences, a foreign country), send an alert
    [Diagram: Watcher, Security On-Call, Firewall, Connection hash]
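The correlation on slides 38 through 40 joins two kinds of record on the connection hash: only the HTTP POST records carry GeoIP data, and the SQL-injection records say which hashes are suspicious. The sketch below models just that decision logic over in-memory records; it does not show Watcher's own watch syntax. The threshold `MIN_POSTS`, the `HOME_COUNTRIES` set and the field names `type` and `connection_hash` are hypothetical, and `SourceGeo.country_code2` assumes the geoip `target` from slide 30.

```python
from collections import Counter

# Hypothetical values; the slide leaves "X" and the exact rules open.
MIN_POSTS = 3             # "X amount of times"
HOME_COUNTRIES = {"US"}   # anything else counts as a foreign country


def hashes_to_alert(records: list[dict]) -> list[str]:
    """Return connection hashes that should page security on-call."""
    # 1. Hashes the IDS flagged with SQL injection indicators.
    flagged = {r["connection_hash"] for r in records if r["type"] == "sql_inject"}
    # 2. All HTTP POST traffic sharing one of those hashes (these carry GeoIP data).
    posts = [r for r in records
             if r["type"] == "http_post" and r["connection_hash"] in flagged]
    counts = Counter(r["connection_hash"] for r in posts)
    # 3. A hash is "foreign" if any POST came from outside HOME_COUNTRIES;
    #    a missing country code is treated as foreign (a conservative choice).
    foreign = {r["connection_hash"] for r in posts
               if r.get("SourceGeo", {}).get("country_code2") not in HOME_COUNTRIES}
    return sorted(h for h, n in counts.items() if n >= MIN_POSTS and h in foreign)
```

Keeping GeoIP data only on the POST records, as the slides describe, works because the hash lets the alert logic borrow location data from the enriched record.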
  41. Augmenting with GeoIP
    • GeoIP
      - City
      - Continent
      - Country
      - IP
      - Latitude
      - Longitude
      - Region
      - Timezone
    • https://www.elastic.co/guide/en/logstash/current/plugins-filters-geoip.html
  42. Next Steps
    • Create your own custom GeoIP tables by downloading the CSV files from MaxMind
    • Overlay (or replace) them with your own public or private IP address mapping, then compile the result into the database format
    • https://github.com/maxmind for scripts to do this
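The overlay idea comes down to two rules: your own entries replace the vendor's for the same network, and a lookup picks the most specific (longest-prefix) network that contains the address. This Python sketch shows both rules on a hypothetical, simplified CSV layout with made-up rows (MaxMind's real CSV downloads split network blocks and locations into separate files), and it leaves out compiling the result into the binary database format.

```python
import csv
import io
import ipaddress

# Hypothetical, simplified vendor table with made-up rows.
VENDOR_CSV = """network,city,country
203.0.113.0/24,Springfield,US
10.0.0.0/8,,
"""

# Hypothetical in-house overlay: private space mapped to our own site.
OVERLAY_CSV = """network,city,country
10.20.0.0/16,Madison,US
"""


def load(text: str) -> dict:
    """Parse CSV text into {ip_network: {"city": ..., "country": ...}}."""
    return {
        ipaddress.ip_network(row["network"]): {"city": row["city"], "country": row["country"]}
        for row in csv.DictReader(io.StringIO(text))
    }


def merge(vendor: dict, overlay: dict) -> dict:
    """Overlay entries replace vendor entries for the same network and add new ones."""
    return {**vendor, **overlay}


def lookup(table: dict, ip: str):
    """Return the entry for the longest-prefix network containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    table = merge(load(VENDOR_CSV), load(OVERLAY_CSV))
    print(lookup(table, "10.20.5.9"))   # the more specific overlay entry wins
```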
  43. Next Steps
    • Finish integration of the remaining firewalls (70% of the total 255 million records are currently being sent into Elasticsearch)
    • Enhance our dashboards
    • Start exposing data to other support and security groups for self-reporting
    • Play around with estab (https://github.com/miku/estab)
  44. Next Steps
    • Application monitoring with Beats
      - http://demo.elastic.co/packetbeat
    • Database monitoring with Beats
    • Other areas in the company where logs are parked
    • Start using Elasticsearch to graph network performance
      - https://insight.gloriad.org/insight/
  45. Next Steps
    • We know this is not the final design; the idea is to move fast and start using it, then pivot as we see how we need to adjust (fail fast)
    • Perhaps put app data from Beats in the same cluster (different indexes) to quickly prove the use case, then launch a separate cluster with VM-only data nodes to handle application traffic
    • Get the application into the hands of the support staff, then see how we need to adjust course
  46. Next Steps
    • Use Elasticsearch as a search engine
    • Use Elasticsearch for forensic investigations
    • Use the Elastic Stack to start monitoring social feeds (marketing use; also a threat intelligence source)
    • Start visually mapping the geography we already put together for abuse
    • Add other Logstash augmentation fields, such as connection hash
    • Create a “junk” cluster?
  47. Exciting New Features
    • Graph
    • Reindexing in ES
    • Watcher GUI
    • Shield GUI
    • Stand up an internal SaaS version of the Elastic Stack, called Elastic Cloud Enterprise, making internal ordering of an Elastic Stack instance as easy as ordering from https://www.elastic.co/cloud (formerly Found)
  48. Disruptive Aspect
    • Looked at the Elastic Stack to solve a logging problem
    • Realized there are lots of other business uses
    • Not sure whether it is a transformative, game-changing tool, but confident that it will be a key part of many aspects of the business at TDS Telecom
      - Internal OS, app, network & security monitoring
      - Internal fraud monitoring
      - Facilitating exposure of data to customers
  49. Tips
    • Think creatively about areas in your company where you have data today, and imagine what it would look like if it were indexed and searchable
    • Think about location data
      - What would it look like to map that data visually?
      - Can you add geographic context to data you have?
    • The barrier of data being too large, too expensive or too labor-intensive to index has been removed
    • Start small but think big
  50. Credits and the Future
    • Thanks to the network and server teams at TDS Telecom (especially Mark, Frits, Jerry and Peter)
    • Thanks to Thad for helping to get this rolling
    • Ideas for the next meetup? Presenters?