Slide 1

Slide 1 text

DevOps Data Storage Presented by Brad Lhotsky

Slide 2

Slide 2 text

Brad Lhotsky • Systems and Security at Craigslist • Infrastructure Monitoring & Security at Booking.com • Recovering • Perl Programmer • Linux/BSD Systems Admin • Network Security Specialist • PostgreSQL Administrator • ElasticSearch Janitor • DNS Voyeur • OSSEC Core Team Member https://github.com/reyjrar https://twitter.com/reyjrar

Slide 3

Slide 3 text

Expectations ‣Common Data Types ‣Using your Data ‣Features of Data Stores ‣Popular Data Stores

Slide 4

Slide 4 text

Types of DevOps-y Data

Slide 5

Slide 5 text

Administrative & Meta-Data ‣ Inventory ‣ Hardware ‣ Software ‣ Builds or Roles ‣ Services ‣ Users and Groups ‣ Employee/Contractor ‣ Managers ‣ ACLs

Slide 6

Slide 6 text

Monitoring ‣ State ‣ OK / NOT OK ‣ UP / DOWN ‣ Package Version ‣ Time Series ‣ Counter ‣ Rate ‣ Statistical Summaries
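The difference between a counter and a rate can be shown with a small sketch. It assumes samples arrive as (timestamp in seconds, counter value) pairs and that a drop in the counter means a reset; the function name and the sample values are made up for illustration.

```python
def counter_to_rates(samples):
    """Convert (timestamp_seconds, counter_value) samples into per-second rates.

    A drop in the counter is treated as a reset (for example a service
    restart), so that interval is skipped instead of producing a negative rate.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or v1 < v0:
            continue  # out-of-order sample or counter reset
        rates.append((t1, (v1 - v0) / elapsed))
    return rates


# Hypothetical samples: the counter resets between t=20 and t=30.
samples = [(0, 100), (10, 150), (20, 250), (30, 20), (40, 60)]
print(counter_to_rates(samples))
```

Storing the raw counter keeps the original data; the rate is derived and can always be recomputed.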

Slide 7

Slide 7 text

Events ‣ State Changes ‣ Package Updated ‣ Service Stopped ‣ System Events ‣ Syslog Message ‣ SNMP Traps ‣ Application Events ‣ Access ‣ Errors ‣ Traces

Slide 8

Slide 8 text

Deving and Oping Your Data

Slide 9

Slide 9 text

Monitoring and Metrics

Slide 10

Slide 10 text

“Metrics are your culture.” - Nicole Forsgren, "How Metrics Shape Your Culture", Monitorama PDX 2016

Slide 11

Slide 11 text

"How Metrics Shape Your Culture" • You can't improve what you don't measure • Always measure things that matter • Things measured are things managed • Metrics can be gamed • Metrics inform incentives • Not everything that can be counted counts • Hard to measure doesn't mean it isn't worth measuring

Slide 12

Slide 12 text

All of your monitoring is probably wrong. ... and that's probably O.K.

Slide 13

Slide 13 text

Alerting ‣ Disrupting People's Lives at 95% Disk Full ‣ Thresholds -> Change Detection ‣ State Change Thresholds
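One way to read "Thresholds -> Change Detection" is to notify only when the state flips, not on every check that is over the line. The sketch below assumes a single disk-usage threshold of 95% and two states named after the OK / NOT OK states from the Monitoring slide; the function name and readings are hypothetical.

```python
def state_changes(readings, threshold=95.0):
    """Yield (index, old_state, new_state) only when the state flips."""
    previous = None
    for i, value in enumerate(readings):
        state = "NOT OK" if value >= threshold else "OK"
        if previous is not None and state != previous:
            yield (i, previous, state)
        previous = state


# Hypothetical disk usage percentages from consecutive checks.
disk_pct = [90.0, 94.0, 96.0, 97.0, 98.0, 93.0, 92.0]
for change in state_changes(disk_pct):
    print(change)
```

Three consecutive readings are over the threshold here, but only two notifications are produced: one when the disk goes NOT OK and one when it recovers.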

Slide 14

Slide 14 text

Automation ‣ Can a Machine read my data? ‣ Autoscale ‣ Trend detection ‣ Service Level Roll Ups in Alerting ‣ Reporting

Slide 15

Slide 15 text

Capacity Planning ‣ Predicting System Stress Levels ‣ Make Intelligent Projections ‣ Test those predictions
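A minimal sketch of "make a projection, then test it": fit a straight line to past usage with ordinary least squares, project when a limit is reached, and compare the prediction with a value observed later. The linear model, the function names, and the numbers are assumptions for illustration; real growth is often not linear.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until(points, limit):
    """Days after the last sample until the fitted line reaches `limit`."""
    slope, intercept = fit_line(points)
    if slope <= 0:
        return None  # usage is flat or shrinking; no projected crossing
    return (limit - intercept) / slope - points[-1][0]


# Hypothetical (day, percent used) samples.
history = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
print(days_until(history, 100.0))

# Testing the prediction: compare the projected value with a later observation.
slope, intercept = fit_line(history)
predicted_day_5 = slope * 5 + intercept
observed_day_5 = 61.0  # hypothetical later measurement
print(abs(predicted_day_5 - observed_day_5))
```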

Slide 16

Slide 16 text

Exploration

Slide 17

Slide 17 text

Attractive Features

Slide 18

Slide 18 text

Open and Extensible ‣ Integrations with other projects ‣ Open API ‣ Good Documentation ‣ Modular / Plugin Structure ‣ Community

Slide 19

Slide 19 text

Reliability vs. Performance

Slide 20

Slide 20 text

Retention ‣ How easy is it to age data off? ‣ What regulations or laws apply to the data? ‣ Expectations from: ‣ Customers ‣ Employees ‣ Managers ‣ Peers ‣ Legal
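When data lives in one index per day, aging it off can be as simple as dropping whole indexes. The sketch below assumes index names end in a YYYY.MM.DD date, like the ams4-access-2015.06.03 names in the es-search.pl output later in this deck; the function name, the retention period, and the extra index names are hypothetical. It only selects names and does not delete anything.

```python
from datetime import date, datetime, timedelta


def expired_indexes(names, today, keep_days):
    """Return index names whose date suffix is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        suffix = name.rsplit("-", 1)[-1]
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # not a dated index; leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = [
    "ams4-access-2015.06.03",
    "lhr4-access-2015.06.03",
    "lhr4-access-2015.06.08",  # hypothetical newer index
    ".kibana",                 # hypothetical undated index
]
print(expired_indexes(names, today=date(2015, 6, 10), keep_days=5))
```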

Slide 21

Slide 21 text

Privacy and Security ‣ What do you keep on your users in your ops data? ‣ Who might come calling for it? ‣ How comfortable are you handing it over to Trump? ‣ Anyone hear about MongoDB? ‣ Can the store provide security?

Slide 22

Slide 22 text

Places People Stick DevOps Datas

Slide 23

Slide 23 text

MySQL ‣ Large Community ‣ Forks and Oracle ‣ Performance First ‣ SQL Interface ‣ Limited Data Types ‣ Web > BI ‣ Suitable for Administrative Data

Slide 24

Slide 24 text

PostgreSQL ‣ Large Community ‣ Reliability First ‣ Open and Extensible ‣ PGXN ‣ CitusData ‣ GreenPlum ‣ EnterpriseDB ‣ Native Support for IP Addresses ‣ Extensible Data Types ‣ Suitable for Administrative Data

Slide 25

Slide 25 text

Graphite ‣ Large Community ‣ Interchangeable Components, à la Microservices ‣ Simple API ‣ Rampant Open Source Adoption ‣ Scalable ‣ Compatibility ‣ Grafana, Statsd, Riemann, Bosun, Cabot, Seyren ‣ etc., etc. ‣ Suitable for Time Series Data ‣ Smallest Resolution: seconds

Slide 26

Slide 26 text

security.logging.indexer.*.total Metrics: Wildcards

Slide 27

Slide 27 text

sumSeries(security.logging.indexer.*.total) Combining Metrics
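What the wildcard and sumSeries do can be sketched in a few lines. This is an illustration of the idea, not Graphite's implementation: it assumes each series is a plain list of numbers with no missing points, matches `*` within a single dot-separated segment only, and does not handle Graphite's `{a,b}` alternation. The host names and values are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern, name):
    """Match a dotted metric name segment by segment; '*' never crosses a dot."""
    p_parts = pattern.split(".")
    n_parts = name.split(".")
    return len(p_parts) == len(n_parts) and all(
        fnmatchcase(n, p) for p, n in zip(p_parts, n_parts)
    )


def sum_series(pattern, series):
    """Add up, point by point, every series whose name matches the pattern."""
    selected = [values for name, values in series.items() if matches(pattern, name)]
    return [sum(column) for column in zip(*selected)]


# Hypothetical metrics; only the first two match the pattern.
series = {
    "security.logging.indexer.host1.total": [1, 2, 3],
    "security.logging.indexer.host2.total": [10, 20, 30],
    "security.logging.indexer.host1.errors": [5, 5, 5],
    "security.logging.indexer.host1.queue.total": [7, 7, 7],
}
print(sum_series("security.logging.indexer.*.total", series))
```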

Slide 28

Slide 28 text

alias(sumSeries(security.logging.indexer.*.total),"Today")
alias(timeShift(sumSeries(security.logging.indexer.*.total),"7d"),"Last Week")
Comparing Metrics

Slide 29

Slide 29 text

alias(alpha(color(areaBetween(holtWintersConfidenceBands(maxSeries(general.es.*.jvm.mem.heap_used_bytes))),"gray"),0.1),"Holt-Winters Confidence Bands")
color(alias(maxSeries(general.es.*.jvm.mem.heap_used_bytes),"Max Heap Size"),"red")
Advanced Tricks

Slide 30

Slide 30 text

OpenTSDB ‣ Metrics 2.0 ‣ Hadoop / HBase backed ‣ SQL-like Language ‣ Zero Data Loss ‣ Compatibility ‣ Carbon, Grafana, Statsd, Riemann, Bosun ‣ Suitable for Time Series Data ‣ Smallest Resolution: milliseconds

Slide 31

Slide 31 text

InfluxDB ‣ Metrics 2.0 ‣ SQL-like Language ‣ Zero Data Loss ‣ Compatibility ‣ Carbon, Grafana, Statsd, Riemann, Bosun ‣ Suitable for Time Series Data ‣ Smallest Resolution: nanoseconds

Slide 32

Slide 32 text

ElasticSearch ‣ Well Documented API ‣ Many Open Source Integrations ‣ Lucene backed text search ‣ Scalable ‣ "Jepsen ElasticSearch" re:CAP

Slide 33

Slide 33 text

Web Attacks Scanners

Slide 34

Slide 34 text

Slow Pages

Slide 35

Slide 35 text

App::ElasticSearch::Utilities
Search Stuff!
https://github.com/reyjrar/es-utils

$ es-search.pl --base access dst:www.booking.com and crit:404 \
    --show src_ip,src_ip_country,file --size 2
= Querying Indexes: lhr4-access-2015.06.03,ams4-access-2015.06.03
@timestamp src_ip src_ip_country file
2015-06-03T11:39:27+0200 217.36.201.217 GB /B1D671CF-E532-4481-99AA-19F420D90332/netdefender/hui/ndhui.css
2015-06-03T11:39:26+0200 92.56.217.84 ES /hotel/es/null.es.html
# Search Parameters:
# {"query_string":{"query":"dst:www.booking.com AND crit:404"}}
# Displaying 3 of (CENSORED) in 0 seconds.
# Indexes (2 of 4) searched: ams4-access-2015.06.03,lhr4-access-2015.06.03

Slide 36

Slide 36 text

Aggregate Stuff
https://github.com/reyjrar/es-utils

$ es-search.pl --base access --days 1 dst:www.booking.com \
    --top src_ip --size 3
= Querying Indexes: ams4-access-2015.06.03,lhr4-access-2015.06.03
count src_ip
(CENSORED) 66.249.92.71
(CENSORED) 66.249.92.59
(CENSORED) 66.249.92.65
# Search Parameters:
# {"query_string":{"query":"dst:www.booking.com"}}
# Displaying 3 of (CENSORED) in 5 seconds.
# Indexes (2 of 2) searched: ams4-access-2015.06.03,lhr4-access-2015.06.03
#
# Totals across batch
#
count src_ip
(CENSORED) 66.249.92.71
(CENSORED) 66.249.92.59
(CENSORED) 66.249.92.65

Slide 37

Slide 37 text

Find Pages Viewed by Most Countries
https://github.com/reyjrar/es-utils

$ es-search.pl --base access --days 1 dst:www.booking.com \
    --top file --by cardinality:src_ip_country --size 3
= Querying Indexes: lhr4-access-2015.06.03,ams4-access-2015.06.03
cardinality:src_ip_country count file
239 (CENSORED) /
236 (CENSORED) /rt_data/city_bookings
234 (CENSORED) /wishlist/get
# Search Parameters:
# {"query_string":{"query":"dst:www.booking.com"}}
# Displaying 3 of (CENSORED) in 21 seconds.
# Indexes (2 of 2) searched: ams4-access-2015.06.03,lhr4-access-2015.06.03
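For readers who want to see the shape of such a request, here is a sketch of an ElasticSearch search body that asks for the top values of one field ordered by the number of distinct values of another. It uses the standard terms and cardinality aggregations, but it is an assumption about how one could express this, not necessarily the request es-search.pl sends; the function name and the aggregation names "top" and "distinct" are made up.

```python
import json


def top_by_cardinality(query, field, distinct_field, size):
    """Build a search body: top `field` values ordered by distinct `distinct_field`."""
    return {
        "size": 0,  # only aggregation results are wanted, no individual hits
        "query": {"query_string": {"query": query}},
        "aggs": {
            "top": {
                "terms": {
                    "field": field,
                    "size": size,
                    "order": {"distinct": "desc"},
                },
                "aggs": {
                    "distinct": {"cardinality": {"field": distinct_field}}
                },
            }
        },
    }


body = top_by_cardinality("dst:www.booking.com", "file", "src_ip_country", 3)
print(json.dumps(body, indent=2))
```

Cardinality counts in ElasticSearch are approximate, which is usually acceptable for this kind of exploration.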

Slide 38

Slide 38 text

Pipeline Queries
https://github.com/reyjrar/es-utils

$ es-search.pl --base access --days 1 dst:www.booking.com \
    --top src_ip --by sum:attack_score --size 3 \
    --data-file top_attackers.dat

$ es-search.pl --base access --days 1 dst:www.booking.com \
    src_ip:top_attackers.dat[-1] --size 3 \
    --show attack_score,src_ip,crit,dst,method,resource \
    --sort attack_score:desc

Slide 39

Slide 39 text

Pipeline Queries
https://github.com/reyjrar/es-utils

= Querying Indexes: lhr4-access-2015.06.03,ams4-access-2015.06.03
@timestamp attack_score src_ip crit dst method resource
2015-06-03T04:20:59+0200 340 107.150.42.90 404 www.booking.com GET /plus/search.php?keyword=as&typeArr[111%3D@`%5C'`)+/*!50000And*/+(/*!50000SeLECT*/+1+/*!50000frOM*/+(/*!50000SeLECT*/+/*!50000Count(*)*/,concat(floor(rand(0)*2),(substring((/*!50000SeLECT*/+CONCAT(0x40,userid,0x7c,substring(pwd,4,16))+from+`%23@__admin`+limit+0,1),1,62)))a+/*!50000fRom*/+information_schema.tables+/*!50000gROUP*/+by+a)b)%23@`%5C'`+]=a
2015-06-03T00:50:43+0200 340 107.150.42.90 404 www.booking.com GET /plus/search.php?keyword=as&typeArr[111%3D@`%5C'`)+/*!50000And*/+(/*!50000SeLECT*/+1+/*!50000frOM*/+(/*!50000SeLECT*/+/*!50000Count(*)*/,concat(floor(rand(0)*2),(substring((/*!50000SeLECT*/+CONCAT(0x40,userid,0x7c,substring(pwd,4,16))+from+`%23@__admin`+limit+0,1),1,62)))a+/*!50000fRom*/+information_schema.tables+/*!50000gROUP*/+by+a)b)%23@`%5C'`+]=a
2015-06-03T05:18:19+0200 340 107.150.42.90 404 www.booking.com GET /plus/search.php?keyword=as&typeArr[111%3D@`%5C'`)+/*!50000And*/+(/*!50000SeLECT*/+1+/*!50000frOM*/+(/*!50000SeLECT*/+/*!50000Count(*)*/,concat(floor(rand(0)*2),(substring((/*!50000SeLECT*/+CONCAT(0x40,userid,0x7c,substring(pwd,4,16))+from+`%23@__admin`+limit+0,1),1,62)))a+/*!50000fRom*/+information_schema.tables+/*!50000gROUP*/+by+a)b)%23@`%5C'`+]=a
# Search Parameters:
# {"terms":{"src_ip":["107.150.42.90","37.59.7.157","74.84.138.120"]}}
# {"query_string":{"query":"dst:www.booking.com"}}
# Displaying 3 of 793 in 0 seconds.
# Indexes (2 of 2) searched: ams4-access-2015.06.03,lhr4-access-2015.06.03
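The two-step pipeline can be sketched as: read values from the output of the first query, then feed them into a terms filter next to the original query string, matching the two Search Parameters lines above. The sketch assumes `[-1]` selects the last whitespace-separated column of the data file and that the two clauses are combined in a bool query; both are assumptions, as are the function names and the score column in the sample data.

```python
import json


def last_column(lines):
    """Take the last whitespace-separated field of each non-empty, non-comment line."""
    values = []
    for line in lines:
        fields = line.split()
        if fields and not line.lstrip().startswith("#"):
            values.append(fields[-1])
    return values


def pipeline_query(query, field, values):
    """Combine a terms filter on `field` with a query_string clause."""
    return {
        "bool": {
            "must": [
                {"terms": {field: values}},
                {"query_string": {"query": query}},
            ]
        }
    }


# Hypothetical contents of top_attackers.dat: a made-up score, then the source IP.
data_file = [
    "# sum:attack_score src_ip",
    "1020 107.150.42.90",
    "400 37.59.7.157",
    "380 74.84.138.120",
]
ips = last_column(data_file)
print(json.dumps(pipeline_query("dst:www.booking.com", "src_ip", ips)))
```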

Slide 40

Slide 40 text

Recap

Slide 41

Slide 41 text

Data Types

               Meta-Data  State  Time Series  Events
Graphite       No         No     Yes          Kinda
InfluxDB       Kinda      Yes    Yes          Kinda
OpenTSDB       Kinda      Yes    Yes          Kinda
MySQL          Yes        Yes    No           Kinda
PostgreSQL     Yes        Yes    No           Kinda
ElasticSearch  No         Kinda  Kinda        Yes

Slide 42

Slide 42 text

Features of Your Data

               Interval           Cardinality              Data Type    Aging
Graphite       Fixed, Regular     Low                      Numeric      Roll up
InfluxDB       Fixed (best), Any  High                     Any          Configurable
OpenTSDB       Any                High                     Numeric      n/a
MySQL          Any                Keys: Low, Values: High  Structured*  None
PostgreSQL     Any                Keys: Low, Values: High  Structured*  None
ElasticSearch  Any                Keys: Low, Values: High  Any          None

Slide 43

Slide 43 text

Features of the Store

               Security  Scalability  Performance  Reliability
Graphite       Low       High         High*        Medium
InfluxDB       Low       High         Medium       High
OpenTSDB       Low       High         Low          High
MySQL          Medium    Medium       Medium*      High*
PostgreSQL     High      Medium       Medium*      High
ElasticSearch  Low       High         High         Low

Slide 44

Slide 44 text

Thank you! [email protected] https://twitter.com/reyjrar https://github.com/reyjrar https://speakerdeck.com/reyjrar https://www.craigslist.org/about/craigslist_is_hiring