
Analytics in AWS - Boston AWS, June 2013


In this presentation (first delivered at the Boston AWS meetup on June 10th, 2013), Stackdriver's Patrick Eaton describes the role of analytics in the Stackdriver Intelligent Monitoring service. He highlights three techniques that Stackdriver uses and shares a bit about the tools and algorithms behind the analysis.






  1. Stackdriver at a Glance
     Stackdriver's intelligent monitoring service helps SaaS providers spend more time on Dev and less on Ops
     • Founded by cloud/infrastructure industry veterans (Microsoft, VMware, EMC, Endeca, Red Hat) with deep systems and DevOps expertise
     • Team of 15, based in Downtown Boston
     • Public beta underway -- see the web site
  2. Analytics at Stackdriver
     Goal: Deliver more value for less effort
     Examples:
     • Monitor and report on infrastructure in terms that make sense to your team
     • Detect how pieces of the infrastructure relate
     • Display lots of data at one time
     • Suggest policies to alert you of possible problems
     • Identify unusual application behavior
     This talk: three broad analytics techniques and examples
  3. Technique: Mine the Metadata
     • AWS provides heaps of metadata about your infrastructure
     • Metadata is accessible via Amazon's APIs
     • Use it! Simple analysis of the metadata can produce big wins
  4. Using Metadata - Identify App Groups
     • We make suggestions based on instance names, tags, security groups, etc.
     • Customer gets more relevant monitoring
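The grouping heuristic above can be sketched in a few lines. This is a minimal illustration, not Stackdriver's actual code: the dict-based instance representation, the `role` tag, and the digit-stripping rule are all assumptions standing in for metadata already fetched from the AWS APIs.

```python
from collections import defaultdict

def suggest_app_groups(instances):
    """Suggest application groups from instance metadata: prefer an
    explicit 'role' tag, else fall back to the Name tag with trailing
    digits/dashes stripped (web-1, web-2 -> 'web')."""
    groups = defaultdict(list)
    for inst in instances:
        tags = inst.get("tags", {})
        key = tags.get("role") or tags.get("Name", "unknown").rstrip("0123456789-")
        groups[key].append(inst["id"])
    return dict(groups)

instances = [
    {"id": "i-1", "tags": {"Name": "web-1"}},
    {"id": "i-2", "tags": {"Name": "web-2"}},
    {"id": "i-3", "tags": {"role": "db"}},
]
# suggest_app_groups(instances) -> {'web': ['i-1', 'i-2'], 'db': ['i-3']}
```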
  5. Using Metadata - Find Relationships
     • We make explicit the relationships in the customer's infrastructure:
       ◦ load balancers and backing instances
       ◦ instances and security groups
       ◦ volumes and their host
       ◦ snapshots and their volumes
     • Customer sees an architectural view that matches their mental model
  6. Technique - Pre-compute Summarizations
     • Modern UIs are data rich
     • Data summarization is key for effective presentation and UI performance
     • Identify commonly-used summarizations and pre-compute them (outside the critical path)
  7. Summarizing - Roll-up Across Time
     We summarize data for a resource across time
     • Compute roll-ups for various functions: avg, max, percentile, etc.
     • Customer can see historical trends quickly
     [Diagram: 30-minute roll-up across time, a 30x reduction in data points]
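The time roll-up can be sketched as bucketing timestamped points into fixed windows and aggregating each bucket. A minimal in-process sketch, assuming Unix-second timestamps and a window size in seconds; the function name and output shape are illustrative, not the deck's actual implementation:

```python
def rollup_time(points, window):
    """Bucket (timestamp, value) points into fixed windows of `window`
    seconds and compute avg and max per bucket."""
    buckets = {}
    for ts, val in points:
        buckets.setdefault(ts - ts % window, []).append(val)
    return {start: {"avg": sum(vals) / len(vals), "max": max(vals)}
            for start, vals in sorted(buckets.items())}

points = [(0, 1.0), (60, 3.0), (1800, 5.0)]
# 30-minute (1800 s) roll-up:
# rollup_time(points, 1800) -> {0: {'avg': 2.0, 'max': 3.0}, 1800: {'avg': 5.0, 'max': 5.0}}
```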
  8. Summarizing - Roll-up Across Resources
     We summarize behavior of multiple resources
     • Compute roll-ups for various functions: avg, max, percentile, etc.
     • Customer can view cluster performance at a glance
     [Diagram: roll-up across 15 resources]
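The cross-resource roll-up is the same idea along the other axis: aggregate the values of many resources at each point in time. A minimal sketch, assuming the per-resource series are already aligned index-by-index (in practice they would first be aligned by timestamp); names and output shape are illustrative:

```python
def rollup_resources(series_by_resource):
    """Combine per-resource series (aligned by index) into one
    cluster-level series of avg/max per time step."""
    combined = []
    for values in zip(*series_by_resource.values()):
        combined.append({"avg": sum(values) / len(values), "max": max(values)})
    return combined

cluster = {"i-1": [1.0, 2.0], "i-2": [3.0, 6.0]}
# rollup_resources(cluster) -> [{'avg': 2.0, 'max': 3.0}, {'avg': 4.0, 'max': 6.0}]
```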
  9. Tools for Summarization
     Tools:
     • Hadoop and AWS Elastic MapReduce (EMR)
     • Python and mrjob (https://github.com/Yelp/mrjob)
     • Data from S3 archives; data to Cassandra
     Usage:
     • EMR clusters of 14 c1.mediums
     • Start jobs every 6 hours
  10. mrjob Word Count Example

      from mrjob.job import MRJob
      import re

      WORD_RE = re.compile(r"[\w']+")

      class MRWordFreqCount(MRJob):

          def mapper(self, _, line):
              for word in WORD_RE.findall(line):
                  yield (word.lower(), 1)

          def combiner(self, word, counts):
              yield (word, sum(counts))

          def reducer(self, word, counts):
              yield (word, sum(counts))

      if __name__ == '__main__':
          MRWordFreqCount.run()

      • Define map() and reduce() functions in standard Python
      • Python generators pass data back to Hadoop
      • mrjob handles set-up and configuration of the job

      $ python mrjob_wc.py readme.txt > counts            # local
      $ python mrjob_wc.py readme.txt -r emr > counts     # EMR
      $ python mrjob_wc.py readme.txt -r hadoop > counts  # Hadoop
  11. Summarization Algorithm
      • Phase 0
        ◦ Map
          ▪ Read archives from S3
          ▪ Route data point to a time bucket for aggregation
            • key = resource::metric::time::granularity
        ◦ Reduce
          ▪ Aggregate data into a single data point
          ▪ We compute 8 functions (min, max, avg, med...)
      • Phase 1
        ◦ Map
          ▪ Route aggregated point to a data series
            • key = resource::metric::granularity::aggr_fn
        ◦ Reduce
          ▪ Write series of data to Cassandra
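The two phases above can be sketched as plain Python functions using the slide's key scheme. This is an in-process sketch of the map/reduce logic only (in production it runs on Hadoop/EMR and writes to Cassandra); the function names, the 5-minute granularity, and computing only three of the eight aggregation functions are assumptions for brevity:

```python
from collections import defaultdict

GRANULARITY = 300  # assumed 5-minute buckets, in seconds

def phase0_map(resource, metric, ts, value):
    # Route a raw point to a time bucket: key = resource::metric::time::granularity
    bucket = ts - ts % GRANULARITY
    return "%s::%s::%d::%d" % (resource, metric, bucket, GRANULARITY), value

def phase0_reduce(key, values):
    # Aggregate a bucket with several functions (the slide lists 8; 3 shown here)
    return key, {"min": min(values), "max": max(values),
                 "avg": sum(values) / len(values)}

def phase1_map(key, aggregates):
    # Re-key each aggregate onto its own series: resource::metric::granularity::aggr_fn
    resource, metric, bucket, gran = key.split("::")
    for fn, value in aggregates.items():
        yield "%s::%s::%s::%s" % (resource, metric, gran, fn), (int(bucket), value)

def summarize(points):
    """Run both phases in-process on (resource, metric, ts, value) tuples."""
    buckets = defaultdict(list)
    for p in points:
        key, value = phase0_map(*p)
        buckets[key].append(value)
    series = defaultdict(list)
    for key, values in buckets.items():
        _, aggs = phase0_reduce(key, values)
        for skey, point in phase1_map(key, aggs):
            series[skey].append(point)   # phase-1 reduce would write these to Cassandra
    return dict(series)
```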
  12. Summarization Algorithm Illustrated
      [Diagram: raw data points at times t, t+δ, t+2δ, ... flow through Map-0 and Reduce-0 into 5-minute buckets (min, max, avg, ...), then through Map-1 into 15-minute series]
  13. Technique - Analyze the Data
      Find trends, anomalies, oddities, correlations, relationships, incident signatures, best practice violations, etc. in the data
      We analyze data from the customer environment and highlight discoveries
      The customer learns about potential problems before they happen
  14. Analyzing - Suggest Alert Policies
      The product supports threshold-based alerts. We analyze historical data and suggest policies to catch future issues.
      Approach: analyze data series looking for typically consistent performance with statistically significant outliers
      • Use mrjob in local mode -- easy programming model for smaller data sets
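One way to read "typically consistent performance with statistically significant outliers" is: if a series is mostly steady but occasionally spikes several standard deviations above its mean, suggest an alert threshold just below the spike level. A minimal sketch of that idea; the 3-sigma cutoff, the 5% outlier-rarity limit, and the function name are assumptions, not Stackdriver's actual algorithm:

```python
import statistics

def suggest_threshold(values, sigmas=3.0, max_outlier_frac=0.05):
    """Suggest an upper alert threshold for a series that is typically
    consistent but shows rare, statistically significant spikes.
    Returns None when no sensible policy can be suggested."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return None  # perfectly flat series: nothing to alert on
    cutoff = mean + sigmas * stdev
    outliers = [v for v in values if v > cutoff]
    # Only suggest a policy when spikes exist but are rare,
    # i.e. the series is "typically consistent"
    if outliers and len(outliers) / len(values) <= max_outlier_frac:
        return cutoff
    return None

series = [10.0] * 99 + [100.0]   # steady load with one significant spike
# suggest_threshold(series) returns a threshold between the steady
# level (10.0) and the spike (100.0)
```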
  15. Analyzing - Identify Anomalies
      A more sophisticated version of the previous technique
      • Currently using R for much of this analysis
      • Using rpy2 to translate to/from Python (http://rpy.sourceforge.net/rpy2.html)
  16. Example of rpy2

      import rpy2.robjects as robjects

      # Define an R function that decomposes a time series into
      # seasonal, trend, and random components
      robjects.r('''
      find_decomposition <- function(data_series) {
          series_ts <- ts(data_series, start=c(1, 1), frequency=12)
          decompose(series_ts, type="additive")
      }
      ''')

      def decompose(series):
          # Call the R function from Python; rpy2 converts the result
          decompose_fn = robjects.r['find_decomposition']
          return decompose_fn(robjects.FloatVector(series))