DevOps Data Storage Presented by Brad Lhotsky

Brad Lhotsky • Systems and Security at Craigslist • Infrastructure Monitoring & Security at • Recovering • Perl Programmer • Linux/BSD Systems Admin • Network Security Specialist • PostgreSQL Administrator • ElasticSearch Janitor • DNS Voyeur • OSSEC Core Team Member

Expectations ‣Common Data Types ‣Using your Data ‣Features of Data Stores ‣Popular Data Stores

Types of DevOps-y Data

Administrative & Meta-Data ‣ Inventory ‣ Hardware ‣ Software ‣ Builds or Roles ‣ Services ‣ Users and Groups ‣ Employee/Contractor ‣ Managers ‣ ACLs

Monitoring ‣ State ‣ OK / NOT OK ‣ UP / DOWN ‣ Package Version ‣ Time Series ‣ Counter ‣ Rate ‣ Statistical Summaries
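The difference between a counter and a rate can be sketched in a few lines. This is a minimal illustration, not any particular tool's implementation; the `(timestamp, value)` sample layout and the reset handling are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_to_rate(samples: List[Sample]) -> List[Tuple[float, Optional[float]]]:
    """Turn ever-increasing counter samples into per-second rates."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or v1 < v0:
            # Clock went backwards or the counter reset (e.g. a restart):
            # there is no trustworthy rate for this interval.
            rates.append((t1, None))
        else:
            rates.append((t1, (v1 - v0) / dt))
    return rates


if __name__ == "__main__":
    print(counter_to_rate([(0, 100), (10, 150), (20, 250), (30, 5)]))
```

Storing the raw counter and deriving the rate at read time keeps the original data intact; storing only the rate loses the ability to detect resets later.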

Events ‣ State Changes ‣ Package Updated ‣ Service Stopped ‣ System Events ‣ Syslog Message ‣ SNMP Traps ‣ Application Events ‣ Access ‣ Errors ‣ Traces

Deving and Oping Your Data

Monitoring and Metrics

“Metrics are your culture.” - Nicole Forsgren, "How Metrics Shape Your Culture", Monitorama PDX 2016

"How Metrics Shape Your Culture" • You can't improve what you don't measure • Always measure things that matter • Things measured are things managed • Metrics can be gamed • Metrics inform incentives • Not everything that can be counted counts • Hard to measure doesn't mean it isn't worth measuring

All of your monitoring is probably wrong... and that's probably O.K.

Alerting ‣ Disrupting People's Lives at 95% Disk Full ‣ Thresholds -> Change Detection ‣ State Change Thresholds
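One way to read "Thresholds -> Change Detection" is to alert on confirmed state changes rather than on every reading that crosses a line. The sketch below assumes a stream of state strings such as "OK" / "NOT OK" and a hypothetical `confirm` parameter (how many consecutive readings are needed before a change counts); neither comes from the talk.

```python
from typing import Iterable, Iterator, Optional, Tuple


def state_changes(observations: Iterable[str],
                  confirm: int = 3) -> Iterator[Tuple[int, str, str]]:
    """Yield (index, old_state, new_state) once a new state has been
    seen `confirm` times in a row, ignoring shorter flaps."""
    current: Optional[str] = None
    candidate: Optional[str] = None
    streak = 0
    for i, state in enumerate(observations):
        if current is None:
            current = state          # first reading sets the baseline
            continue
        if state == current:
            candidate, streak = None, 0   # flap ended, forget it
            continue
        if state == candidate:
            streak += 1
        else:
            candidate, streak = state, 1
        if streak >= confirm:
            yield (i, current, state)
            current, candidate, streak = state, None, 0
```

With `confirm=3`, a single "NOT OK" blip between two "OK" readings produces no alert, while a sustained change produces exactly one.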

Automation ‣ Can a Machine read my data? ‣ Autoscale ‣ Trend detection ‣ Service Level Roll Ups in Alerting ‣ Reporting

Capacity Planning ‣ Predicting System Stress Levels ‣ Make Intelligent Projections ‣ Test those predictions
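A simple projection can be made with a straight-line fit, then checked against what actually happens. The sketch below is only one possible approach, assuming daily disk-usage percentages as input; real growth is often not linear, which is why the last bullet matters.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (day number, percent of disk used)


def linear_fit(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares line through the points: (slope, intercept)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until(points: List[Point], limit: float) -> Optional[float]:
    """Days after the last sample until the fitted line reaches `limit`.
    Returns None when usage is flat or shrinking."""
    slope, intercept = linear_fit(points)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - points[-1][0]
```

To test the prediction, record the projected date and compare it with the measured value when that date arrives; a large miss suggests the linear assumption does not hold for that system.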

Attractive Features

Open and Extensible ‣ Integrations with other projects ‣ Open API ‣ Good Documentation ‣ Modular / Plugin Structure ‣ Community

Reliability vs. Performance

Retention ‣ How easy is it to age data off? ‣ What regulations or laws apply to the data? ‣ Expectations from: ‣ Customers ‣ Employees ‣ Managers ‣ Peers ‣ Legal
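Aging data off is easiest when the store partitions by time, because whole partitions can be dropped. The sketch below assumes date-stamped names in the `name-YYYY.MM.DD` style that appears later in this deck; the function name and the 30-day window are hypothetical, and the actual delete call is deliberately left out.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_indexes(names: Iterable[str], today: date, keep_days: int) -> List[str]:
    """Return the date-stamped names whose day is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        suffix = name.rsplit("-", 1)[-1]
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # not date-stamped: never touch it automatically
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

Keeping the selection logic separate from the deletion makes it easy to review (or log) what would be removed before anything is actually dropped.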

Privacy and Security ‣ What do you keep on your users in your ops data? ‣ Who might come calling for it? ‣ How comfortable are you handing it over to Trump? ‣ Anyone hear about MongoDB? ‣ Can the store provide security?

Places People Stick DevOps Datas

MySQL ‣ Large Community ‣ Forks and Oracle ‣ Performance First ‣ SQL Interface ‣ Limited Data Types ‣ Web > BI ‣ Suitable for Administrative Data

PostgreSQL ‣ Large Community ‣ Reliability First ‣ Open and Extensible ‣ PGXN ‣ CitusData ‣ GreenPlum ‣ EnterpriseDB ‣ Native Support for IP Addresses ‣ Extensible Data Types ‣ Suitable for Administrative Data

Graphite ‣ Large Community ‣ Interchangeable Components, à la Microservices ‣ Simple API ‣ Rampant Open Source Adoption ‣ Scalable ‣ Compatibility ‣ Grafana, Statsd, Riemann, Bosun, Cabot, Seyren ‣ etc., etc. ‣ Suitable for Time Series Data ‣ Smallest Resolution: seconds
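Assuming this slide describes Graphite, the "Simple API" is largely its plaintext protocol: one `path value timestamp` line per data point, sent to the Carbon listener (TCP port 2003 by default). The sketch below is illustrative only; the host name is a placeholder and the helper names are hypothetical.

```python
import socket
from typing import Iterable


def format_metric(path: str, value: float, timestamp: float) -> str:
    """Build one plaintext-protocol line. The timestamp is truncated to
    whole seconds, matching the one-second resolution noted above."""
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines: Iterable[str], host: str = "localhost", port: int = 2003) -> None:
    """Send pre-formatted lines to a Carbon plaintext listener.
    Host and port here are assumptions; adjust for a real deployment."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Example (not executed here, since it needs a running listener):
# send_metrics([format_metric("security.logging.indexer.node1.total", 42, time.time())])
```

Because the format is just text over a socket, almost any language or shell script can emit metrics, which goes a long way toward explaining the long compatibility list.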

Metrics: Wildcards
security.logging.indexer.*.total

Combining Metrics
sumSeries(security.logging.indexer.*.total)
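What a wildcard plus `sumSeries` does can be modelled in plain Python. This is a simplified model for intuition, not Graphite's code: it assumes series are already aligned lists of values (with `None` for missing points) and that `*` matches exactly one dot-separated path segment.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

Series = List[Optional[float]]


def matches(pattern: str, name: str) -> bool:
    """Match a metric name against a pattern one path segment at a time,
    so that '*' never spans a dot."""
    p_parts, n_parts = pattern.split("."), name.split(".")
    return len(p_parts) == len(n_parts) and all(
        fnmatchcase(n, p) for n, p in zip(n_parts, p_parts)
    )


def sum_series(series: Dict[str, Series], pattern: str) -> Series:
    """Point-by-point sum of every series whose name matches the pattern.
    Missing points are skipped; a column with no data stays None."""
    matched = [values for name, values in series.items() if matches(pattern, name)]
    combined: Series = []
    for column in zip(*matched):
        present = [v for v in column if v is not None]
        combined.append(sum(present) if present else None)
    return combined
```

The practical upshot is that a sensible naming hierarchy is part of the query language: anything put in its own path segment can later be wildcarded and rolled up.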

Comparing Metrics
alias(sumSeries(security.logging.indexer.*.total), "Today")
alias(timeShift(sumSeries(security.logging.indexer.*.total), "7d"), "Last Week")
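Targets like these are usually handed to Graphite's render endpoint, which accepts repeated `target` parameters plus `from` and `format`. The sketch below only builds the URL; the base address is a placeholder and the helper name is hypothetical.

```python
from typing import List
from urllib.parse import urlencode


def render_url(base: str, targets: List[str], since: str = "-24h", fmt: str = "json") -> str:
    """Build a render URL with one `target` parameter per expression."""
    params = [("target", t) for t in targets] + [("from", since), ("format", fmt)]
    return f"{base.rstrip('/')}/render?{urlencode(params)}"


TODAY = 'alias(sumSeries(security.logging.indexer.*.total), "Today")'
LAST_WEEK = ('alias(timeShift(sumSeries(security.logging.indexer.*.total), "7d"), '
             '"Last Week")')

if __name__ == "__main__":
    # graphite.example.com is a stand-in, not a real host.
    print(render_url("http://graphite.example.com", [TODAY, LAST_WEEK]))
```

Letting `urlencode` handle the quoting matters here, since the expressions contain quotes, commas, parentheses and asterisks.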

Advanced Tricks
alias(alpha(color(areaBetween(holtWintersConfidenceBands(maxSeries(*.jvm.mem.heap_used_bytes))),"gray"),0.1),"Hot Winter Confidence Bands")
color(alias(maxSeries(*.jvm.mem.heap_used_bytes),"Max Heap Size"),"red")

OpenTSDB ‣ Metrics 2.0 ‣ Hadoop / HBase backed ‣ SQL-like Language ‣ Zero Data Loss ‣ Compatibility ‣ Carbon, Grafana, Statsd, Riemann, Bosun ‣ Suitable for Time Series Data ‣ Smallest Resolution: milliseconds

InfluxDB ‣ Metrics 2.0 ‣ SQL-like Language ‣ Zero Data Loss ‣ Compatibility ‣ Carbon, Grafana, Statsd, Riemann, Bosun ‣ Suitable for Time Series Data ‣ Smallest Resolution: nanoseconds

ElasticSearch ‣ Well Documented API ‣ Many Open Source Integrations ‣ Lucene backed text search ‣ Scalable ‣ "Jepsen ElasticSearch" re:CAP

Web Attacks Scanners

Slow Pages

App::ElasticSearch::Utilities
Search Stuff!
$ --base access and crit:404 \
    --show src_ip,src_ip_country,file --size 2
= Querying Indexes: lhr4-access-2015.06.03,ams4-access-2015.06.03
@timestamp src_ip src_ip_country file
2015-06-03T11:39:27+0200 GB /B1D671CF-E532-4481-99AA-19F420D90332/netdefender/hui/ndhui.css
2015-06-03T11:39:26+0200 ES /hotel/es/
# Search Parameters:
# {"query_string":{"query":" AND crit:404"}}
# Displaying 3 of (CENSORED) in 0 seconds.
# Indexes (2 of 4) searched: ams4-access-2015.06.03,lhr4-access-2015.06.03

Aggregate Stuff
$ --base access --days 1 \
    --top src_ip --size 3
= Querying Indexes: ams4-access-2015.06.03,lhr4-access-2015.06.03
count src_ip
(CENSORED) (CENSORED) (CENSORED)
# Search Parameters:
# {"query_string":{"query":""}}
# Displaying 3 of (CENSORED) in 5 seconds.
# Indexes (2 of 2) searched: ams4-access-2015.06.03,lhr4-access-2015.06.03
#
# Totals across batch
# count src_ip
(CENSORED) (CENSORED) (CENSORED)

Find Pages Viewed by Most Countries
$ --base access --days 1 \
    --top file --by cardinality:src_ip_country --size 3
= Querying Indexes: lhr4-access-2015.06.03,ams4-access-2015.06.03
cardinality:src_ip_country count file
239 (CENSORED) /
236 (CENSORED) /rt_data/city_bookings
234 (CENSORED) /wishlist/get
# Search Parameters:
# {"query_string":{"query":""}}
# Displaying 3 of (CENSORED) in 21 seconds.
# Indexes (2 of 2) searched: ams4-access-2015.06.03,lhr4-access-2015.06.03
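In Elasticsearch's own query DSL, "top files ordered by how many distinct countries requested them" can be expressed as a `terms` aggregation ordered by a `cardinality` sub-aggregation. The sketch below builds such a request body; it is an assumption about how this could be written by hand, not a claim about what the command-line tool sends, and the aggregation names `top` and `distinct` are arbitrary.

```python
import json
from typing import Any, Dict


def top_by_cardinality(term_field: str, distinct_field: str, size: int = 3) -> Dict[str, Any]:
    """Request body: bucket by `term_field`, rank buckets by the approximate
    number of distinct `distinct_field` values inside each bucket."""
    return {
        "size": 0,  # only aggregation results are wanted, no individual hits
        "aggs": {
            "top": {
                "terms": {
                    "field": term_field,
                    "size": size,
                    "order": {"distinct": "desc"},
                },
                "aggs": {
                    "distinct": {"cardinality": {"field": distinct_field}},
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(top_by_cardinality("file", "src_ip_country"), indent=2))
```

Cardinality counts in Elasticsearch are approximate, so figures like the ones on this slide are best treated as estimates rather than exact tallies.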

Pipeline Queries
$ --base access --days 1 \
    --top src_ip --by sum:attack_score --size 3 \
    --data-file top_attackers.dat
$ --base access --days 1 \
    src_ip:top_attackers.dat[-1] --size 3 \
    --show attack_score,src_ip,crit,dst,method,resource \
    --sort attack_score:desc

Pipeline Queries
= Querying Indexes: lhr4-access-2015.06.03,ams4-access-2015.06.03
@timestamp attack_score src_ip crit dst method resource
2015-06-03T04:20:59+0200 340 404 GET /plus/search.php?keyword=as&typeArr[111%3D@`%5C'`)+/*!50000And*/+(/*!50000SeLECT*/+1+/*!50000frOM*/+(/*!50000SeLECT*/+/*!50000Count(*)*/,concat(floor(rand(0)*2),(substring((/*!50000SeLECT*/+CONCAT(0x40,userid,0x7c,substring(pwd,4,16))+from+`%23@__admin`+limit+0,1),1,62)))a+/*!50000fRom*/+information_schema.tables+/*!50000gROUP*/+by+a)b)%23@`%5C'`+]=a
2015-06-03T00:50:43+0200 340 404 GET /plus/search.php?keyword=as&typeArr[111%3D@`%5C'`)+/*!50000And*/+(/*!50000SeLECT*/+1+/*!50000frOM*/+(/*!50000SeLECT*/+/*!50000Count(*)*/,concat(floor(rand(0)*2),(substring((/*!50000SeLECT*/+CONCAT(0x40,userid,0x7c,substring(pwd,4,16))+from+`%23@__admin`+limit+0,1),1,62)))a+/*!50000fRom*/+information_schema.tables+/*!50000gROUP*/+by+a)b)%23@`%5C'`+]=a
2015-06-03T05:18:19+0200 340 404 GET /plus/search.php?keyword=as&typeArr[111%3D@`%5C'`)+/*!50000And*/+(/*!50000SeLECT*/+1+/*!50000frOM*/+(/*!50000SeLECT*/+/*!50000Count(*)*/,concat(floor(rand(0)*2),(substring((/*!50000SeLECT*/+CONCAT(0x40,userid,0x7c,substring(pwd,4,16))+from+`%23@__admin`+limit+0,1),1,62)))a+/*!50000fRom*/+information_schema.tables+/*!50000gROUP*/+by+a)b)%23@`%5C'`+]=a
# Search Parameters:
# {"terms":{"src_ip":["","",""]}}
# {"query_string":{"query":""}}
# Displaying 3 of 793 in 0 seconds.
# Indexes (2 of 2) searched: ams4-access-2015.06.03,lhr4-access-2015.06.03

Data Types
               Meta-Data  State  Time Series  Events
Graphite       No         No     Yes          Kinda
InfluxDB       Kinda      Yes    Yes          Kinda
OpenTSDB       Kinda      Yes    Yes          Kinda
MySQL          Yes        Yes    No           Kinda
PostgreSQL     Yes        Yes    No           Kinda
ElasticSearch  No         Kinda  Kinda        Yes

Features of Your Data
               Interval           Cardinality              Data Type    Aging
Graphite       Fixed, Regular     Low                      Numeric      Roll up
InfluxDB       Fixed (best), Any  High                     Any          Configurable
OpenTSDB       Any                High                     Numeric      n/a
MySQL          Any                Keys: Low, Values: High  Structured*  None
PostgreSQL     Any                Keys: Low, Values: High  Structured*  None
ElasticSearch  Any                Keys: Low, Values: High  Any          None
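"Roll up" aging means old data is kept at a coarser resolution instead of being deleted. A minimal sketch of the idea, assuming averaging as the aggregation and a fixed number of fine-grained points per coarse point (both are assumptions for illustration):

```python
from typing import List, Optional


def roll_up(points: List[Optional[float]], factor: int) -> List[Optional[float]]:
    """Downsample by averaging each run of `factor` points.
    Missing points (None) are ignored; an all-missing run stays None."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    rolled: List[Optional[float]] = []
    for start in range(0, len(points), factor):
        chunk = [p for p in points[start:start + factor] if p is not None]
        rolled.append(sum(chunk) / len(chunk) if chunk else None)
    return rolled
```

The cost of this kind of aging is visible in a tiny case: `roll_up([0, 0, 90], 3)` yields `[30.0]`, so a short spike is flattened once the data ages into the coarser archive.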

Features of the Store
               Security  Scalability  Performance  Reliability
Graphite       Low       High         High*        Medium
InfluxDB       Low       High         Medium       High
OpenTSDB       Low       High         Low          High
MySQL          Medium    Medium       Medium*      High*
PostgreSQL     High      Medium       Medium*      High
ElasticSearch  Low       High         High         Low

Thank you!