Elastic Co
November 03, 2017

Leveraging Elasticsearch as the foundation of a Digital Experience Analytics Platform - Elastic New York City user group

In this session, we will discuss our experiences in creating a highly performant, scalable, real-time data ingestion and analytics processing platform. We will review our architecture, usage, lessons learned, and recommended best practices for working with Elastic technology. CA’s Analytics Platform follows a Lambda Architecture built on a combination of open source technologies such as Kafka, Elasticsearch, Apache Spark, and HDFS. It was built to ingest large amounts of application trace files, metrics, session navigation details, and other UX data in real time from mobile devices and web applications, giving developers and business stakeholders a holistic view of customer experience.

Bryan Whitmarsh is a Sr. Principal Product Manager in the Agile Operations business unit at CA Technologies. Bryan is primarily focused on external-facing Product Management activities. As a CA Champion, he is very active in the CA Communities as well as various public communities. He lives in Idaho and loves the outdoors.

https://www.meetup.com/New-York-City-Elastic-Fantastics/events/244062413/

Transcript

  1. Leveraging Elasticsearch as the Foundation of a Digital Experience Insights Platform

    Bryan Whitmarsh, Sr Principal Product Manager, CA Technologies, Idaho, November 2017
  2. © 2017 CA. ALL RIGHTS RESERVED. Breakdown of CA’s Analytics Engine (Lambda Based)

    [Diagram: Node/Image Allocation Percentage, Docker on OpenShift on AWS. Percentages shown: 2, 2, 2, 2, 31, 28, 2, 2, 2, 2, 20, 5; the component labels were not captured in the transcript.]
  3. Agenda

    • What is Elastic
    • Why Elastic for analytics
    • Why partner with CA for your analytics initiatives
    • Breakdown of CA’s analytics engine
    • Lambda architecture
    • CA’s lessons learned / best practices
  4. Our Problem

    • Lots of data: 2 to 5 terabytes ingested per day (customer experience, app, infra, network, ...)
    • Multiple products we need insights from
    • Multiple data repositories
    • Highly available, highly scalable, multi-tenant SaaS offering
    • Ability to provide insights via analytics
      – Historical
      – Real-time
      – Anomaly detection
      – Predictive
      – Prescriptive
      – Advanced learning: baselining to Skynet’s T-5000
    • Royalty-free distribution
    • …
  5. Why Elastic for Analytics?

    Why leverage the ELK Stack as part of your analytics strategy?
    • Easy to scale
    • *Open-source (great community/support/documentation)
    • Good data aggregation capabilities
    • Good search/query capabilities
    • High performance
    • Flexible schema
    • Easy visualization
    • Analytics in a box (Collect-Index-Visualize | Logstash-Elasticsearch-Kibana)
    • Market momentum
  6. CA Analytics Sweet Spot

    [Chart, source: Gartner. Axes: value and difficulty; stages: hindsight, insight, foresight.]
    • Descriptive analytics: What happened?
    • Diagnostic analytics: Why did it happen?
    • Predictive analytics: What will happen?
    • Prescriptive analytics: How can we make it happen?
    • Real-time analytics: What is happening right now?
    CA Analytics is enabling technology for building analytics-driven applications.
  7. Lambda Architecture

    “Lambda Architecture”* describes a system consisting of three layers: batch processing, speed (or real-time) processing, and a serving layer for responding to queries. This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data. The two view outputs may be joined before presentation. The rise of lambda architecture is correlated with the growth of big data, real-time analytics, and the drive to mitigate the latencies of map-reduce.
    *credit Nathan Marz
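The three layers described on the Lambda Architecture slide can be illustrated with a minimal Python sketch. It is not CA's implementation: the event names and counts are made up, and the "join" is assumed to be a simple per-key sum of a batch view and a speed view.

```python
from collections import Counter


def merge_views(batch_view: dict, speed_view: dict) -> dict:
    """Serving-layer query: join the batch view with the real-time view.

    Assumption: both views hold event counts keyed by event name, so the
    join is a per-key sum.
    """
    merged = Counter(batch_view)
    merged.update(speed_view)  # Counter.update adds counts key by key
    return dict(merged)


# Batch layer: accurate counts recomputed periodically over all history.
batch_view = {"app_launch": 1000, "checkout": 40}
# Speed layer: counts for events that arrived after the last batch run.
speed_view = {"checkout": 3, "crash": 1}

print(merge_views(batch_view, speed_view))
```

When the next batch run absorbs the recent events, the speed view is reset, which is how the design keeps query latency low without giving up the accuracy of the batch results.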
  8. Breakdown of CA’s Analytics Engine (Lambda Based)

    [Diagram build step: Node/Image Allocation Percentage; value shown: 2%]
  9. Breakdown of CA’s Analytics Engine (Lambda Based)

    [Diagram build step: Node/Image Allocation Percentage; value shown: 5%]
  10. Breakdown of CA’s Analytics Engine (Lambda Based)

    [Diagram build step: Node/Image Allocation Percentage; value shown: 2%]
  11. Breakdown of CA’s Analytics Engine (Lambda Based)

    [Diagram build step: Node/Image Allocation Percentage; values shown: 2%, 20%]
  12. Breakdown of CA’s Analytics Engine (Lambda Based)

    [Diagram build step: Node/Image Allocation Percentage; values shown: 2%, 2%, 2%]
  13. Pluggable Data Science

    Provide:
    – Data to subscribe
    – Data to produce
    – Lambda Function to run
    – Start Time & Schedule
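The four inputs listed on the Pluggable Data Science slide could be modeled as a small job description. This is a purely hypothetical sketch: the `AnalyticsJob` class, its field names, the topic names, and the fixed-interval schedule are all assumptions for illustration, not CA's actual interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class AnalyticsJob:
    """Hypothetical registration record for a pluggable analytics function."""
    subscribe_to: str            # data to subscribe to (made-up topic name)
    produce_to: str              # data to produce (made-up topic name)
    function: Callable           # the "lambda function" to run on each batch
    start_time: datetime         # first scheduled run
    interval: timedelta          # assumed schedule: a fixed repeat interval

    def next_run(self, now: datetime) -> datetime:
        """Return the first scheduled run at or after `now`."""
        if now <= self.start_time:
            return self.start_time
        elapsed = now - self.start_time
        periods = -(-elapsed // self.interval)  # ceiling division on timedeltas
        return self.start_time + periods * self.interval


job = AnalyticsJob(
    subscribe_to="metrics.raw",
    produce_to="metrics.baseline",
    function=lambda values: sum(values) / len(values),
    start_time=datetime(2017, 11, 3, 0, 0),
    interval=timedelta(hours=1),
)
print(job.next_run(datetime(2017, 11, 3, 0, 30)))
```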
  14. Breakdown of CA’s Analytics Engine (Lambda Based)

    [Diagram build step: Node/Image Allocation Percentage; values shown: 31%, 28%]
  15. Breakdown of CA’s Analytics Engine (Lambda Based)

    [Diagram build step: Node/Image Allocation Percentage; value shown: 2%]
  16. Breakdown of CA’s Analytics Engine (Lambda Based)

    [Complete diagram: Node/Image Allocation Percentage, Docker on OpenShift on AWS. Percentages shown: 2, 2, 2, 2, 31, 28, 2, 2, 2, 2, 20, 5; the component labels were not captured in the transcript.]
  17. CA’s Elastic Lessons Learned / Best Practices

    Remember two very important Ps
    • Take time to Plan!
      – Making sure you have your schema right from the start will save you from a lot of pain!
      – Don’t try to reuse a schema from another solution
    • Be sure to Prototype (expect to spend $$$ in testing)
      – Don’t assume you know the effect of doing things like adding shards, or what adding an analyzed field will do, without prototyping/testing it first
  18. CA’s Elastic Lessons Learned / Best Practices

    One shard size doesn’t fit most!
    • The right balance of shards per index is VERY important and depends GREATLY on your desired use case
    • Example
      – Large indexes: more shards (5)
      – Small indexes: fewer shards (1)
    • Leverage Rollover Index to prevent your indexes from getting too large
      – For CA, we roll over once an index reaches ~300GB (shard > 30GB)
    High Ingestion Rates <-----------------------------------> Quick Search Times
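The rollover rule on the shard-sizing slide can be expressed as a small check. This is a sketch under assumptions: the slide's "~300GB (shard > 30GB)" is read here as two separate thresholds, sizes are assumed to be primary store sizes, and the function name is made up. How the sizes are obtained and how the rollover itself is triggered are left out.

```python
GB = 1024 ** 3


def should_roll_over(index_bytes: int, primary_shards: int,
                     max_index_bytes: int = 300 * GB,
                     max_shard_bytes: int = 30 * GB) -> bool:
    """Return True once the index, or its average primary shard, is too large.

    Assumption: thresholds taken from the slide (~300GB per index,
    30GB per shard); tune both by prototyping against your own data.
    """
    avg_shard_bytes = index_bytes / primary_shards
    return index_bytes >= max_index_bytes or avg_shard_bytes > max_shard_bytes


print(should_roll_over(100 * GB, primary_shards=5))  # 20GB per shard
print(should_roll_over(200 * GB, primary_shards=5))  # 40GB per shard
```

The defaults only mirror CA's numbers; the slide's own point is that the right values depend on whether a workload favors ingestion rate or search speed.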
  19. CA’s Elastic Lessons Learned / Best Practices

    Don’t Split your Brains!
    • Set discovery.zen.minimum_master_nodes = (number of master-eligible nodes / 2, rounded down) + 1
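The quorum value from the split-brain slide is simple integer arithmetic. The sketch below assumes the count refers to master-eligible nodes; the function name is illustrative.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum for discovery.zen.minimum_master_nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, "master-eligible nodes -> quorum", minimum_master_nodes(nodes))
```

With two master-eligible nodes the quorum is also two, so losing either node leaves the cluster unable to elect a master; an odd count such as three tolerates one failure.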
  20. CA’s Elastic Lessons Learned / Best Practices

    Avoid shooting yourself in the foot!
    • ES allows you to easily delete all the indexes
    • Set action.destructive_requires_name=true
  21. CA’s Elastic Lessons Learned / Best Practices

    Don’t spin with disks!
    • Solid State Drives highly recommended
    • No really, SSD required!
    • Avoid NAS or SAN storage
    • Avoid RAID 5 or mirroring (use RAID 0; ES takes care of replication)
    • Be careful with EC2 EBS
  22. CA’s Elastic Lessons Learned / Best Practices

    “Swapping” can be bad
    • For ES memory, you want to avoid swapping at all costs
    • Disable memory swapping through OS-level settings, or set the following in the ES config: bootstrap.mlockall: true (renamed bootstrap.memory_lock in Elasticsearch 5.0)
  23. CA’s Lessons Learned / Best Practices

    Monitoring is Important
    • Use multiple types of monitoring to detect issues before they become critical
      – AWS CloudWatch
      – Log file monitoring
      – Application performance monitoring
      – Infrastructure monitoring
      – DB monitoring
      – Synthetic monitoring
      – Elastic monitoring
      – …
  24. CA’s Elastic Lessons Learned / Best Practices

    Multi-tenant Best Practice (without X-Pack)
    • Colorize the data
      – For CA, all incoming data is assigned a Tenant ID ++; for ES, Tenant ID = Alias
    • You need to control access to ES
      – For CA, we built an Authentication/Security component
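One way to realize "Tenant ID = Alias" from the multi-tenant slide is a filtered alias per tenant. The sketch below only builds the JSON body for Elasticsearch's `POST /_aliases` endpoint; the `tenant_id` field name, the `tenant-` alias prefix, and the index name are assumptions, and the talk does not say whether CA's aliases are filtered this way.

```python
import json


def tenant_alias_actions(index: str, tenant_id: str) -> dict:
    """Build a POST /_aliases body that adds a filtered alias for one tenant.

    Assumption: every document carries a keyword field named "tenant_id".
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": f"tenant-{tenant_id}",
                    # Searches through the alias only see this tenant's documents.
                    "filter": {"term": {"tenant_id": tenant_id}},
                }
            }
        ]
    }


print(json.dumps(tenant_alias_actions("ux-events-000001", "acme"), indent=2))
```

An alias filter scopes what a query sees but is not access control by itself, which matches the slide's second point: something in front of ES still has to decide which alias a caller may use.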
  25. CA’s Elastic Lessons Learned / Best Practices

    Kibana searches can cause problems without limits
    • Limit the level of aggregations (custom plugin)
    • Limit heap usage by lowering indices.breaker.total.limit from its default of 70% (ES 5.5.3 required); see: https://github.com/elastic/elasticsearch/issues/21942
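The idea of limiting the level of aggregations can be illustrated by measuring how deeply a search body nests its `aggs`. This is a hypothetical sketch of the kind of check such a guard might perform, not CA's custom plugin; the function names and the limit of 2 are assumptions.

```python
def aggregation_depth(body: dict) -> int:
    """Return the deepest nesting level of "aggs"/"aggregations" in a search body."""
    aggs = body.get("aggs") or body.get("aggregations") or {}
    if not aggs:
        return 0
    # Each value is one aggregation definition that may carry its own sub-aggs.
    child_depths = [aggregation_depth(a) for a in aggs.values() if isinstance(a, dict)]
    return 1 + max(child_depths, default=0)


def check_aggregation_depth(body: dict, max_depth: int = 2) -> int:
    """Raise ValueError when a search nests aggregations deeper than max_depth."""
    depth = aggregation_depth(body)
    if depth > max_depth:
        raise ValueError(f"aggregation depth {depth} exceeds limit {max_depth}")
    return depth


search = {
    "size": 0,
    "aggs": {
        "by_tenant": {
            "terms": {"field": "tenant_id"},
            "aggs": {"avg_latency": {"avg": {"field": "latency_ms"}}},
        }
    },
}
print(check_aggregation_depth(search))
```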
  26. CA’s Elastic Lessons Learned / Best Practices

    Ingestion Spikes Can Hurt You
    • During ingestion “spikes”, ES nodes can fall behind and return bulk rejections
    • Default queue_size of the bulk threadpool = 50
      – Increase queue_size to ~500, or
      – Add a small sleep on this particular error, or
      – You might have another issue…
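The "small sleep on this particular error" option from the ingestion-spike slide can be sketched as a retry loop. Assumptions: `send_bulk` is a hypothetical callable that sends a batch and returns only the documents Elasticsearch rejected with status 429 (bulk queue full), and the sleep grows exponentially, which is a design choice added here rather than something the slide specifies.

```python
import time
from typing import Callable


def index_with_retry(send_bulk: Callable[[list], list], docs: list,
                     max_attempts: int = 5, base_delay: float = 0.05,
                     sleep: Callable[[float], None] = time.sleep) -> list:
    """Send docs, re-sending only the rejected ones after a short sleep.

    Returns the documents still rejected after max_attempts (empty on success).
    """
    pending = list(docs)
    for attempt in range(max_attempts):
        pending = send_bulk(pending)  # hypothetical: returns the 429-rejected docs
        if not pending:
            return []
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # back off before the next try
    return pending


# Usage with a stand-in sender that rejects part of the first batch only.
calls = []

def flaky_sender(batch: list) -> list:
    calls.append(list(batch))
    return batch[1:] if len(calls) == 1 else []

print(index_with_retry(flaky_sender, ["a", "b", "c"], sleep=lambda s: None))
```

If rejections continue after every retry, that points to the slide's last bullet: the cluster may be undersized or misconfigured rather than briefly overloaded.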
  27. CA’s Elastic Lessons Learned / Best Practices

    Limit your dynamic “Fields” of Play
    There’s a 1000-field ES limit per index for a reason…
    • Multi-tenant fields: 5 customers each creating 100 fields = 500 additional fields per index
    • Normalize dynamic field creation
      – custom.custom_field_name
      – custom.custom_string_value
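The normalization on the dynamic-fields slide can be sketched as a document transform: instead of one mapped field per customer-defined name, every custom attribute is stored as a name/value pair under `custom`. The `custom_field_name` and `custom_string_value` keys come from the slide; the function, the `known_fields` parameter, and the sample document are assumptions.

```python
def normalize_custom_fields(doc: dict, known_fields: set) -> dict:
    """Move unknown top-level fields into a list of name/value pairs.

    However many distinct custom names tenants invent, the index mapping only
    ever contains custom.custom_field_name and custom.custom_string_value.
    """
    normalized = {k: v for k, v in doc.items() if k in known_fields}
    custom = [
        {"custom_field_name": name, "custom_string_value": str(value)}
        for name, value in doc.items()
        if name not in known_fields
    ]
    if custom:
        normalized["custom"] = custom
    return normalized


event = {"tenant_id": "acme", "latency_ms": 120, "cart_color": "red", "ab_group": 7}
print(normalize_custom_fields(event, known_fields={"tenant_id", "latency_ms"}))
```

To match a specific name with its value in one query, `custom` would likely need to be mapped as a `nested` type, since Elasticsearch otherwise flattens arrays of objects.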
  28. CA’s Elastic Lessons Learned / Best Practices

    Remember two very important Ps
    • Take time to Plan!
      – Making sure you have your schema right from the start will save you from a lot of pain!
      – Don’t try to reuse a schema from another solution
    • Be sure to Prototype (expect to spend $$$ in testing)
      – Don’t assume you know the effect of doing things like adding shards, or what adding an analyzed field will do, without prototyping/testing it first
  29. Why partner with CA for your Analytics Initiatives

    Why CA Technologies?
  30. Why partner with CA for your Analytics Initiatives

    Because we wear cool T-Shirts and have light up yo-yos!
  31. Why partner with CA for your Analytics Initiatives

    [Diagram: Open Unified Operational Data Lake (Elasticsearch)]
    • Data held: logs and traces; metrics & alarms; topology
    • Capabilities: unified visibility & reporting; app-to-infra correlation; ML-powered predictive insights
    • Data sources, via open RESTful APIs:
      – AXA/End User (Mobile, Web, IoT)
      – Business KPIs (SFDC, Social, …)
      – APM: transactions & metrics, topology
      – UIM: metrics, alerts, logs, topology
      – Network: fault, perf, logs
      – Custom data sources
  32. Try/Buy CA App Experience Analytics Today!

    SEE WHAT ANALYTICS CAN DO FOR YOU!
    • Free 30-day hosted trial
    • Quickly get started: http://ca.com/trydxi