Slide 1

Slide 1 text

The Path to Intelligent Operation with NetApp OnCommand Insight
Keren Dagan, Senior Product Manager
Francisco Rosa, Senior Software Engineer
NetApp
March 1st, 2018
@kerendg, @majorouch

Slide 2

Slide 2 text

Agenda
1 Introduction to NetApp OnCommand Insight
2 Evolving into Elasticsearch
3 Evolving into Machine Learning
4 Lessons learned

Slide 3

Slide 3 text

NetApp - OnCommand Insight Introduction

Slide 4

Slide 4 text

NetApp Inc.
Next Gen Data Center - Digital Transformation
• 25 years
• $5.5 Billion Revenue
• 10,000+ employees
• 110+ offices
Fastest growing All-Flash Array vendor
Recognized as a Leader in multiple Gartner Magic Quadrants
• General-Purpose Disk Arrays
• Solid-State Arrays
• Integrated Systems
We help our customers to:
• Drive value with data
• Gain Insight, Access, and Control
NetApp Cloud Central

Slide 5

Slide 5 text

NetApp OnCommand Insight
Hybrid Cloud Infrastructure Analytics
• Manage growth and complexity
• Troubleshoot issues
• Identify and monitor cost
Assets Under Management
• Capacity: 11EB+
• Switch Ports: 5M+
• Systems (VMs and Servers): 10M+
• Storage Systems: 35K+
Top Companies Rely on OnCommand Insight
• The top 10 Fortune 500 companies
• 6 of the top 10 US Retailers
• 8 of the top 10 Banks
• 5 of the top 10 Insurance companies
• 7 of the top 10 Tech and Service Providers

Slide 6

Slide 6 text

OnCommand® Insight
Hybrid Infrastructure Management
Consistent insights across multi-vendor, hybrid infrastructure
[Diagram labels: Private Cloud; Fiber Channel Switches; On-Premise; Public Cloud; Inventory - Resources; Performance - KPI; Topology and Relationships]
Intelligent Operations
• Discover and monitor resources, their relationships and dependencies
• Proactive alerting and fast troubleshooting with advanced analytics
Business Insights
• Resource optimization
• Cost alignment and show back
• Forecast performance and capacity planning
• Enables business workflows such as billing, cost, change management and automation
Ecosystem Integration
• Open API provides access to discovered and monitored data

Slide 7

Slide 7 text

Resources in OnCommand Insight
Resource Landing Page
• Customized view for each resource type
• A 360 degree view of the resource including metrics, topology and business context
• Expert view with charts, and advanced analytics section
• Quick navigation to related resources’ landing pages
[Screenshot callouts: Resource Type and Name; Summary Information; Topology; Business Context Data; Expert View; Analytics Section; Related Resources]

Slide 8

Slide 8 text

The Applications Infrastructure Stack Topologies – OnPrem, Private and Public Cloud!
[Diagram: three application stacks: App – running OnPrem; App – running on AWS (or Azure); App – running on OpenStack. Labeled components: Switch, Storage, volume, VM, Hypervisor, KVM, AWS Instance (EC2), EBS Volume, S3 Buckets]

Slide 9

Slide 9 text

The Path to Intelligent Operation Evolution to Elasticsearch

Slide 10

Slide 10 text

Life is good in ol’ 2014!

Slide 11

Slide 11 text

OnCommand Insight in 2014
[Diagram labels: OCI Engine; Cassandra; MySQL; Lucene]

Slide 12

Slide 12 text

Ralph Waldo Emerson, PM: “I need to group timeseries by virtual machine attributes”

Slide 13

Slide 13 text

We introduce Elasticsearch! (2016)
[Diagram labels, before: OCI Engine; Cassandra; Lucene; MySQL. After: OCI Engine; Elasticsearch; MySQL]

Slide 14

Slide 14 text

We introduce Elasticsearch!

Slide 15

Slide 15 text

PM: “I need to add business context to my data myself”

Slide 16

Slide 16 text

Indices evolve…the big split

Slide 17

Slide 17 text

Indices evolve…the big split
[Diagram labels: OCI Engine; MySQL; Elasticsearch (Single Node); OCI Elasticsearch plugin]
• Index join search
• Custom aggregations: weighted average
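The "weighted average" aggregation named on this slide can be illustrated outside Elasticsearch. The following is a minimal sketch in Python, not the OCI plugin's code; the choice of latency as the value and IOPS as the weight, and the sample numbers, are assumptions made only for illustration.

```python
from typing import Iterable, Tuple


def weighted_average(samples: Iterable[Tuple[float, float]]) -> float:
    """Return sum(value * weight) / sum(weight) over (value, weight) pairs."""
    total = 0.0
    weight_sum = 0.0
    for value, weight in samples:
        total += value * weight
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("total weight is zero")
    return total / weight_sum


# Hypothetical (latency_ms, iops) pairs for two volumes.
volumes = [(2.0, 900.0), (10.0, 100.0)]

plain_mean = sum(value for value, _ in volumes) / len(volumes)
weighted = weighted_average(volumes)
print(plain_mean, weighted)
```

With these numbers the busy volume dominates the weighted result, which is why a plain mean can misrepresent what most I/O actually experiences.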

Slide 18

Slide 18 text

PM: “Disks are not infinite, I need to age out data”

Slide 19

Slide 19 text

Indices evolve…and we split some more
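The slides do not say how the indices were split to support aging out data. One common approach is one index per time period, so that old data is removed by deleting whole indices. The sketch below assumes that approach; the `timeseries-YYYY-MM-DD` naming and the retention period are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def expired_indices(index_names: List[str], today: date,
                    retention_days: int,
                    prefix: str = "timeseries-") -> List[str]:
    """Return the daily indices whose date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not a time-based index, never aged out here
        try:
            day = date.fromisoformat(name[len(prefix):])
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired


# Hypothetical index names.
names = ["timeseries-2017-01-01", "timeseries-2017-01-10", "attributes"]
to_delete = expired_indices(names, today=date(2017, 1, 12), retention_days=7)
print(to_delete)
```

Deleting an entire index is generally much cheaper than deleting individual documents from a large one, which is the usual motivation for this layout.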

Slide 20

Slide 20 text

PM: “I would like to push my own objects into the system”

Slide 21

Slide 21 text

And we embrace the dynamic world! (2017)
• Attributes indexes
• Timeseries indexes

Slide 22

Slide 22 text

And we embrace the dynamic world!
[Diagram labels: metricbeat; Kubernetes master; Kubernetes nodes; Logstash with OCI output plugin; OCI Engine; Elasticsearch with OCI Elasticsearch plugin (Single Node, Nodes (X))]

Slide 23

Slide 23 text

And we embrace the dynamic world!
• Built-in object types
• Your own object types

Slide 24

Slide 24 text

The Path to Intelligent Operation Evolving into Machine Learning - Anomaly Detection

Slide 25

Slide 25 text

Anomaly Detection
• Systems tend to converge over time to a rhythmic, predictable, cyclical pattern of operation.
• This pattern is not simple, and the cycles can span hours, days, weeks, and months.
• A static threshold works when the user knows what is “bad”; otherwise it creates noise.
• ML is good at detecting when the pattern has changed.
• Prelert (now Elastic ML) and Elastic.
• OnCommand Insight implementation of Elastic ML for Anomaly Detection
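The contrast between a static threshold and a learned pattern can be sketched in a few lines of Python. This is not Elastic ML's model, only a simple per-slot baseline (mean and standard deviation for each position in the cycle). The function name, the 4-sample cycle, the threshold of 95, and the data are all assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def seasonal_anomalies(history: Sequence[float], recent: Sequence[float],
                       period: int, z_limit: float = 3.0) -> List[int]:
    """Return positions in `recent` that break the cyclical pattern in `history`.

    Assumes both series start at a cycle boundary, so recent[i] belongs
    to slot i % period.
    """
    flagged = []
    for i, value in enumerate(recent):
        slot_values = history[i % period::period]  # same slot in every past cycle
        baseline = mean(slot_values)
        spread = max(pstdev(slot_values), 1e-9)    # floor avoids division by zero
        if abs(value - baseline) / spread > z_limit:
            flagged.append(i)
    return flagged


# Six past cycles of a 4-sample rhythm (10, 50, 90, 50) with small jitter.
history = [base + (cycle % 3 - 1) for cycle in range(6) for base in (10, 50, 90, 50)]
# The newest cycle misses its usual peak: 50 where about 90 is expected.
recent = [10, 50, 50, 50]

static_alerts = [i for i, v in enumerate(recent) if v > 95]  # static threshold
pattern_alerts = seasonal_anomalies(history, recent, period=4)
print(static_alerts, pattern_alerts)
```

The missing peak never crosses the static threshold, yet it is the one point that deviates from the learned rhythm.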

Slide 26

Slide 26 text

OnCommand Insight Anomaly Detection Engine!
• OCI Data sources
  • Discovering infrastructure resources
  • Collecting key performance metrics (Latency, IOPS, Utilization)
• OnCommand Insight Server
  • Computes the service path and relationships
  • Realizes all the Application resources
  • Packages and sends the data to the Elastic ML (job)
• Anomaly Detection Engine – Elastic ML
  • Learns and models normal behavior and detects anomalies
• OnCommand Insight UI
  • Presents the Application anomaly score, with anomalous resources

Slide 27

Slide 27 text

Anomaly Detection Results
Application Landing Page
• Forensic view – Application infrastructure resource stack
• Overall Anomaly Score and Time
• Highlight anomalous resources – 1, 2, or 3 blue bars indicate the significance of the Anomaly
• Call out resources for further investigation
• Application Anomaly Score chart
[Screenshot callouts: Application Anomaly Score and the Time of Anomaly; Add to Expert View; # of resources; Forensic View; Anomalous resource; Application Anomaly Score at this time]

Slide 28

Slide 28 text

Anomaly Types - Pattern Anomaly
Detecting a Change in Behavior
[Chart labels: Pattern - Rhythm; Break in the pattern]

Slide 29

Slide 29 text

Anomaly Types – Point and Change Point Anomaly
Detecting a Change in Behavior
[Chart labels: Crash; Change Point; Point Anomaly]

Slide 30

Slide 30 text

Summary – The Path to Intelligent Operation
OnCommand Insight’s customers appreciate:
• The scale of our solution
• The data richness and the data quality
• Understanding the path, the relationships, and enriching the data with business context
• The powerful search and the flexible visualization of the data with the topologies
• Analytics and Machine learning for proactive alerting
OCI technology helps our customers become aware of issues before they become problems, preventing an approaching outage in their environment.
Another leap forward in the path to Intelligent Operation

Slide 31

Slide 31 text

The Path to Intelligent Operation Lessons Learned

Slide 32

Slide 32 text

Lessons Learned - Elastic
• Embedding works for us!
• Elasticsearch works really well with timeseries and heavy analysis
• Elasticsearch can do a lot, but it can be further extended with plugins
• Plugins are good, but… documentation is scarce and the code is tied to the Elasticsearch version
• Elasticsearch is better at a smaller number of large indexes than a larger number of smaller indexes
• Rollover API can be your friend
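For the Rollover API point, the request itself is small: a POST to `/<alias>/_rollover` with a `conditions` object. The sketch below only builds that request in Python and does not send it. The alias name and the condition values are hypothetical; the slides do not say which conditions OCI uses.

```python
import json
from typing import Tuple


def rollover_request(alias: str, max_age: str, max_docs: int) -> Tuple[str, str, str]:
    """Build (method, path, JSON body) for an Elasticsearch rollover call.

    The write alias moves to a new index once any condition is met, which
    keeps the index count low while still bounding the size of each index.
    """
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", f"/{alias}/_rollover", json.dumps(body, sort_keys=True)


# Hypothetical alias and limits.
method, path, body = rollover_request("oci-timeseries-write", "7d", 50_000_000)
print(method, path, body)
```

Rolling over on size or age, rather than creating an index per fixed small interval, is one way to stay on the "fewer, larger indexes" side of the trade-off noted above.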

Slide 33

Slide 33 text

Lessons Learned – Anomaly Detection
Applying Domain Expertise! The math might be right, but this is not always enough.
Excluding Anomalies Below the Thresholds
• A change in very small numbers (0.005 – 0.5) is mathematically significant.
• Yet whether it becomes an interesting anomaly is very case specific.
Dormant Resources
• Resources that do very little work – mostly inactive
• A sudden, even subtle, change in performance can generate an anomaly
• In most cases this is not a critical resource to alert on
These resources are not excluded from the learning, only from the results
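The idea of excluding anomalies from the results but not from the learning can be sketched as a filter applied after detection. The record shape, the metric name, and the floor value below are hypothetical and are not taken from OnCommand Insight.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Anomaly:
    resource: str
    metric: str
    actual: float   # observed value at the time of the anomaly
    score: float    # anomaly score reported by the detection engine


def reportable(anomalies: List[Anomaly], floors: Dict[str, float]) -> List[Anomaly]:
    """Drop anomalies whose observed value is below the per-metric floor.

    Filtering happens only on the output; the detection engine still
    receives and learns from every resource's data.
    """
    return [a for a in anomalies if a.actual >= floors.get(a.metric, 0.0)]


# Hypothetical floor: latency changes below 1 ms are not worth an alert.
floors = {"latency_ms": 1.0}
found = [
    Anomaly("vm-a", "latency_ms", actual=0.5, score=92.0),   # tiny absolute value
    Anomaly("vm-b", "latency_ms", actual=35.0, score=88.0),
]
shown = reportable(found, floors)
print([a.resource for a in shown])
```

The first anomaly has the higher score but is suppressed because its absolute value is too small to matter operationally.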

Slide 34

Slide 34 text

More Questions? Visit us at the AMA