Build Intelligent Microservices via Machine Learning and Big Data Analytics

Kai Waehner
October 20, 2016

I gave two sessions at the O’Reilly Software Architecture Conference in London in October 2016. It was the first #OReillySACon in London: a very well-organized conference with plenty of great speakers and sessions. I can really recommend this conference and its siblings in other cities such as San Francisco or New York if you want to learn about good software architectures, new concepts, best practices and technologies. Besides microservices, some of the hot topics this year were DevOps, serverless architectures and big data analytics.
Intelligent Microservices by Leveraging Big Data Analytics

One of the two sessions was an updated slide deck on how to apply machine learning and big data analytics to real-time event processing. In this update I also covered the relation to microservices, i.e. how to leverage microservice concepts such as 12 Factor Apps, containers (e.g. Docker), cloud platforms (e.g. Kubernetes, Cloud Foundry), or DevOps to build agile, intelligent microservices.
Abstract: How to Apply Machine Learning to Microservices

Digital transformation is advancing, driven by mobile, cloud and the Internet of Things. Disruptive business models leverage big data analytics and machine learning.

"Big Data" is currently heavily hyped. Large amounts of historical data are stored in Hadoop or other platforms. Business intelligence tools and statistical computing are used to draw new knowledge from this data and to find patterns in it, for example for promotions, cross-selling or fraud detection. The key challenge is how to carry these findings over from historical data into new transactions in real time to make customers happy, increase revenue or prevent fraud. "Fast Data" via stream processing is the solution: it embeds patterns, which were obtained from analyzing historical data, into future transactions in real time.

This session uses several real-world success stories to explain the concepts behind stream processing and its relation to Hadoop and other big data platforms. Using different real-world case studies, it discusses how patterns and statistical models built with R, Spark MLlib, H2O, and other technologies can be integrated into real-time processing. The session also points out why a microservices architecture helps meet the agility requirements of these kinds of projects.

A brief overview of available open source frameworks and commercial products shows possible options for the implementation of stream processing, such as Apache Storm, Apache Flink, Spark Streaming, IBM InfoSphere Streams, or TIBCO StreamBase.

A live demo shows how to implement stream processing, how to integrate machine learning, and how human operations can be enabled in addition to the automatic processing via a Web UI and push events.
How to Build Intelligent Microservices - Slide Deck from O'Reilly Software Architecture Conference.

Keywords: Big Data, Fast Data, Machine Learning, Analytics, Analytic Model, Stream Processing, Event Processing, Streaming Analytics, Real Time, Hadoop, Spark, MLlib, Streaming, R, TERR, TIBCO, Spotfire, StreamBase, Live Datamart, H2O, Predictive Analytics, Data Discovery, Insights, Patterns

Transcript

  1. Kai Wähner Technology Evangelist [email protected] LinkedIn @KaiWaehner www.kai-waehner.de O’Reilly Software

    Architecture Conference 2016 (London, UK) How to apply big data analytics and machine learning to real-time processing of microservice events
  2. © Copyright 2000-2016 TIBCO Software Inc. Key Take-Aways

    • Insights are hidden in Historical Data on Big Data Platforms
    • Machine Learning and Big Data Analytics find these Insights by building Analytics Models
    • Event Processing uses these Models (without Redevelopment) to take Action in Real Time
  3. © Copyright 2000-2016 TIBCO Software Inc. Agenda 1) Machine Learning

    and Big Data Analytics 2) Building an Analytic Model 3) Real Time Processing 4) Live Demo 5) Intelligent Microservices
  4. © Copyright 2000-2016 TIBCO Software Inc. Agenda 1) Machine Learning

    and Big Data Analytics 2) Building an Analytic Model 3) Real Time Processing 4) Live Demo 5) Intelligent Microservices
  5. Real World Examples of Machine Learning Spam Detection Search Results

    + Product Recommendation Picture Detection (Friends, Locations, Products) Machine Learning is already present in daily life… Now, every enterprise is beginning to leverage it! The Next Disruption: Google Beats Go Champion
  6. © Copyright 2000-2016 TIBCO Software Inc. Example: Decision Tree –

    Titanic Survival Rate family size Wikipedia
  7. Decision Tree – Product Pass / Fail by Equipment Sensor

    Readings Bad Product Good Product Step 8 Temperature < 122 C >= 122 C Step 2 Recipe A B Step 11 Pressure TV Color Display Problem
  8. Ensemble Tree Algorithms

    • Random Forest, Gradient Boosting Machine (GBM)
    • Method: average many simple trees
    • Sample the data: fit a simple tree
    • Re-sample the data, up-weighting the observations that weren’t fitted well in the previous model
    • Continue adding trees until fit is good
    • Save all the trees and average them
    • Better fit + prediction than single trees
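    The ensemble idea on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the talk: it is a simplified gradient-boosting variant in which each new one-split tree ("stump") is fitted to the residuals of the ensemble so far, which stands in for the up-weighting of badly fitted observations. All function names and the `learning_rate` parameter are assumptions made for this example.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree ("stump") on a single numeric feature.

    Needs at least two distinct x values so both sides of a split are non-empty.
    """
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_boosted(xs, ys, n_trees=50, learning_rate=0.5):
    """Keep adding stumps, each one fitted to what the ensemble still gets wrong."""
    base = mean(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    """Combine all saved stumps into one prediction."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return mean((y - predict_boosted(model, x)) ** 2 for x, y in zip(xs, ys))
```

    On a curved toy data set such as `xs = list(range(10))`, `ys = [x * x for x in xs]`, comparing `fit_boosted(xs, ys, n_trees=1, learning_rate=1.0)` with the 50-stump default shows the "better fit than single trees" point on the training data.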
  9. © Copyright 2000-2016 TIBCO Software Inc. Analytics Maturity Model Immediate

    Long-Term Competitive Advantage Value to the Organization A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases Self-service Dashboards Event Processing Advanced Analytics Measure Diagnose Predict Optimize Alert Automate Analytics Maturity Visual Analytics Event Processing Analytics
  10. © Copyright 2000-2016 TIBCO Software Inc. Analytics Maturity Model Immediate

    Long-Term Competitive Advantage Value to the Organization Visual Analytics Event Processing Advanced Analytics Measure Diagnose Predict Optimize Alert Automate Analytics Maturity A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases Analytics
  11. © Copyright 2000-2016 TIBCO Software Inc. Analytics Maturity Model Immediate

    Long-Term Competitive Advantage Value to the Organization Self-service Dashboards Event Processing Advanced Analytics Measure Diagnose Predict Optimize Alert Automate Analytics Maturity A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases Visual Analytics Event Processing Analytics
  12. © Copyright 2000-2016 TIBCO Software Inc. The first task in

    a new analytics projects is to define a Business Case!
  13. © Copyright 2000-2016 TIBCO Software Inc. From a Business Case

    to Proactive Actions Model Present Data Wrangling Signals Dashboards SAP Historian Production Well Filter Enrich Merge Shape Explore Clean Assemble Data Business Case Increase Productivity Grow Revenue Completions Visualize GeoLocation Production Value Theses Reduce Risk G&G Equipment Decision, Action Prediction Action Develop Model Pressure Temperature Production Interrupt Drill Bit Movement Equipment Failure
  14. © Copyright 2000-2016 TIBCO Software Inc. Agenda 1) Machine Learning

    and Big Data Analytics 2) Building an Analytic Model 3) Real Time Processing 4) Live Demo 5) Intelligent Microservices
  15. © Copyright 2000-2016 TIBCO Software Inc. Analytics Maturity Model Immediate

    Long-Term Competitive Advantage Value to the Organization A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases Self-service Dashboards Event Processing Advanced Analytics Measure Diagnose Predict Optimize Alert Automate Analytics Maturity Visual Analytics Event Processing Analytics
  16. © Copyright 2000-2016 TIBCO Software Inc. Variety of Data in

    Enterprises Custom GUI-driven data access via SDK Siebel eBusiness Local data sources Access Excel STDF Drag-and-drop MySQL SQL Server Oracle Information Services (join, transform, reusable, parameterized, dynamic query for in-memory use) Databases JDBC/ODBC Hadoop SFDC PostgreSQL Teradata Netezza Etc. XML RDBMS Flat Files Spread- sheets Web Services Oracle E-Business RDBMS RDBMS RDBMS SAP BW SAP R/3 D A T A F A B R I C Salesforce ODBC OLE DB SqlClient Direct connection Oracle TeradataAster MS SSAS Teradata Direct Query (dynamically query and retrieve data for visualization and analysis) Databases MySQL Etc. OBIEE Netezza Hadoop
  17. Data Munging - Transformations

    Order-level table (one row per order):

        cust_id  dept  sku    dollar  gift   date
    1   104      C     12003    2.40  FALSE  2016-10-17
    2   105      A     12005   62.85  FALSE  2016-10-17
    3   102      C     12007   69.23  TRUE   2016-10-17
    4   104      B     12004    9.33  FALSE  2016-10-18
    5   105      C     12010   14.16  TRUE   2016-10-18
    6   101      B     12003   90.43  FALSE  2016-10-19
    7   103      C     12005   90.97  FALSE  2016-10-19
    n   …        …     …       …      …      …

    Customer-level table (one row per customer):

        cust_id  A      B      C      total   # orders  first_date  last_date
    1   100      21.76  23.67   0.00   45.43  2         2016-10-19  2016-10-20
    2   101       0.01  74.65   0.00   74.66  3         2016-10-19  2016-10-20
    3   102       0.00  60.92  50.29  111.21  6         2016-10-17  2016-10-20
    4   103       0.00   0.00  52.30   52.30  2         2016-10-19  2016-10-20
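    The transformation on the Data Munging slide, from one row per order to one row per customer, can be sketched as a simple aggregation. The snippet below uses only the seven order rows visible on the slide, so its result is not expected to match the slide's customer-level table, which is built from the full (elided) data set. The function name and the dictionary layout are assumptions for illustration.

```python
# The seven order rows shown on the slide: (cust_id, dept, sku, dollar, gift, date)
ORDERS = [
    (104, "C", 12003, 2.40, False, "2016-10-17"),
    (105, "A", 12005, 62.85, False, "2016-10-17"),
    (102, "C", 12007, 69.23, True, "2016-10-17"),
    (104, "B", 12004, 9.33, False, "2016-10-18"),
    (105, "C", 12010, 14.16, True, "2016-10-18"),
    (101, "B", 12003, 90.43, False, "2016-10-19"),
    (103, "C", 12005, 90.97, False, "2016-10-19"),
]


def summarize_by_customer(orders):
    """Pivot order rows into one summary row per customer."""
    summary = {}
    for cust_id, dept, _sku, dollar, _gift, date in orders:
        row = summary.setdefault(
            cust_id,
            {"A": 0.0, "B": 0.0, "C": 0.0, "total": 0.0,
             "orders": 0, "first_date": date, "last_date": date},
        )
        row[dept] += dollar          # spend per department
        row["total"] += dollar       # overall spend
        row["orders"] += 1
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings
        row["first_date"] = min(row["first_date"], date)
        row["last_date"] = max(row["last_date"], date)
    return summary
```

    Calling `summarize_by_customer(ORDERS)[104]` gives customer 104's spend per department, the order count and the first and last order date, which is the shape of feature table an analytic model is typically trained on.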
  18. Exploratory Data Analysis

    Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to
    1. maximize insight into a data set
    2. uncover underlying structure
    3. extract important variables
    4. detect outliers and anomalies
    5. test underlying assumptions
    6. develop parsimonious models
    7. determine optimal factor settings
  19. “The greatest value of a picture is when it forces

    us to notice what we never expected to see” John W. Tukey, 1977 © Copyright 2000-2016 TIBCO Software Inc. Exploratory Data Analysis
  20. Visual Analytics - Interactive Brush-Linked

    … and “Inline Data Wrangling” → Ad-hoc data preparation instead of just ETL
  21. © Copyright 2000-2016 TIBCO Software Inc. Analytics Maturity Model Immediate

    Long-Term Competitive Advantage Value to the Organization Visual Analytics Event Processing Advanced Analytics Measure Diagnose Predict Optimize Alert Automate Analytics Maturity A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases Analytics
  22. © Copyright 2000-2016 TIBCO Software Inc. Which picture represents a

    model? A model is a simplification of the truth that helps you with decision making.
  23. Employees who write longer emails earn higher salaries! © Copyright

    2000-2016 TIBCO Software Inc. Model Building
  24. © Copyright 2000-2016 TIBCO Software Inc. Model Validation How is

    the IQ of a kid related to the IQ of his / her mum?
  25. © Copyright 2000-2016 TIBCO Software Inc. “…as a next-generation data

    discovery capability that automatically finds and explains insights from advanced analytics to business users or citizen data scientists” Smart Data Discovery (for the Business User) Leverage Machine Learning without the help of a Data Scientist
  26. R Language • Built for data scientists • Very active

    community © Copyright 2000-2016 TIBCO Software Inc.
  27. R with Revolution Analytics (now Microsoft) © Copyright 2000-2016 TIBCO

    Software Inc. Open Source GPL License (including its restrictions) http://www.revolutionanalytics.com/webinars/introducing-revolution-r-open-enhanced-open-source-r-distribution- revolution-analytics
  28. TERR - TIBCO’s Enterprise Runtime for R

    TIBCO has rewritten R as a Commercial Compute Engine
    • Latest statistics scripting engine: S → S-PLUS® → R → TERR
    • Runs R code including CRAN packages

    Engine internals rebuilt from scratch at low-level
    • Redesigned data objects, memory management
    • High performance + Big Data

    TERR is licensed from TIBCO
    • TERR installs (free) with Spotfire Analyst / Desktop + other TIBCO products
    • Spotfire Server can manage all TERR / R scripts, artifacts for reuse
    • Standalone Developer Edition
    • Supported by TIBCO
    • No GPL license issues
  29. Which R to use? © Copyright 2000-2016 TIBCO Software Inc.

    http://www.forbes.com/sites/danwoods/2016/01/27/microsofts-revolution-analytics-acquisition-is-the-wrong- way-to-embrace-r/
  30. Apache Spark

    General Data-processing Framework → However, focus is especially on Analytics (at least these days)
  31. Apache Spark MLlib

    Spark ML is Spark’s machine learning library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering and collaborative filtering.

    General Data-processing Framework → However, focus is especially on Analytics (at least these days)
  32. © Copyright 2000-2016 TIBCO Software Inc. H2O.ai An Extensible Open

    Source Platform for Analytics • Best of Breed Open Source Technology • Easy-to-use Web UI and Familiar Interfaces • Data Agnostic Support for all Common Database and File Types • Massively Scalable Big Data Analysis • Real-time Data Scoring (“Nanofast Scoring Engine”) http://www.h2o.ai/
  33. TIBCO Spotfire with R / TERR Integration © Copyright 2000-2016

    TIBCO Software Inc. Let the business user leverage Analytic Models (created by the Data Scientist) to find insights! Example: Customer Churn with Random Forest Algorithm • ‘refresh model’ button lives a ‘random forest algorithm’ • requires no a priori assumptions at all, it just always works • The business user doesn’t need to know what random forest is to be empowered by it Select variables for the model
  34. TIBCO Spotfire with H2O Integration © Copyright 2000-2016 TIBCO Software

    Inc. Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)
  35. TIBCO Spotfire with H2O Integration © Copyright 2000-2016 TIBCO Software

    Inc. Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)
  36. © Copyright 2000-2016 TIBCO Software Inc. SaaS Machine Learning •

    Managed SaaS service for building ML models and generating predictions • Integrated into the corresponding cloud ecosystem • Easy to use, but limited feature set and potential latency issues if combined with external data or applications http://docs.aws.amazon.com/machine-learning/latest/dg/tutoria
  37. PMML (Predictive Model Markup Language)

    • XML-based de facto standard to represent predictive analytic models
    • Developed by the Data Mining Group (DMG)
    • Easily share models between PMML compliant applications (e.g. between model creation and deployment for operations)
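    To make the "share models between applications" point concrete, here is a minimal sketch of scoring a record against a PMML-style decision tree with only the Python standard library. The XML is a heavily simplified, hypothetical fragment: it uses the PMML element names `TreeModel`, `Node`, `True` and `SimplePredicate`, but omits the XML namespace, `DataDictionary` and `MiningSchema` that a real PMML file carries. The temperature split and the good/bad labels are assumptions for illustration, and only numeric `SimplePredicate` comparisons are handled.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical, simplified PMML-like fragment (no namespace, no data dictionary).
PMML_TEXT = """
<PMML version="4.2">
  <TreeModel functionName="classification">
    <Node score="good">
      <True/>
      <Node score="bad">
        <SimplePredicate field="temperature" operator="greaterOrEqual" value="122"/>
      </Node>
      <Node score="good">
        <SimplePredicate field="temperature" operator="lessThan" value="122"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

# Numeric comparison operators as named in PMML SimplePredicate
OPERATORS = {
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
    "equal": operator.eq,
}


def load_tree(pmml_text):
    """Return the root Node element of the first TreeModel."""
    return ET.fromstring(pmml_text).find("TreeModel/Node")


def matches(node, record):
    """Check whether a record satisfies the predicate attached to a node."""
    if node.find("True") is not None:
        return True
    predicate = node.find("SimplePredicate")
    if predicate is None:
        return False
    compare = OPERATORS[predicate.get("operator")]
    return compare(float(record[predicate.get("field")]), float(predicate.get("value")))


def score(node, record):
    """Walk down the tree; the deepest matching node provides the score."""
    for child in node.findall("Node"):
        if matches(child, record):
            return score(child, record)
    return node.get("score")
```

    With this fragment, `score(load_tree(PMML_TEXT), {"temperature": 130})` follows the `greaterOrEqual` branch. The model lives entirely in the XML, so the scoring code does not change when the data scientist exports a new tree.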
  38. © Copyright 2000-2016 TIBCO Software Inc. Agenda 1) Machine Learning

    and Big Data Analytics 2) Building an Analytic Model 3) Real Time Processing 4) Live Demo 5) Intelligent Microservices
  39. © Copyright 2000-2016 TIBCO Software Inc. Analytics Maturity Model Immediate

    Long-Term Competitive Advantage Value to the Organization Self-service Dashboards Event Processing Advanced Analytics Measure Diagnose Predict Optimize Alert Automate Analytics Maturity A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases Visual Analytics Event Processing Analytics
  40. © Copyright 2000-2016 TIBCO Software Inc. Real Time Streaming Analytics

    time 1 2 3 4 5 6 7 8 9 Event Streams • Continuous Queries • Sliding Windows • Filter • Aggregation • Correlation • …
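    A continuous query over a sliding window, two of the concepts listed on this slide, can be sketched as a generator that keeps only the most recent events in memory. This is an illustrative toy, not how any of the products named in the talk implement it; the function name, window size and threshold are assumptions.

```python
from collections import deque


def windowed_alerts(events, size, threshold):
    """Continuous query: yield (event_number, window_average) whenever the
    average over the last `size` events exceeds `threshold`."""
    window = deque(maxlen=size)  # the oldest event drops out automatically
    for number, value in enumerate(events, start=1):
        window.append(value)
        average = sum(window) / len(window)
        if average > threshold:
            yield number, average


# The nine events from the slide's timeline, using the event number as the value
alerts = list(windowed_alerts(range(1, 10), size=3, threshold=5))
```

    Because the function is a generator, it also works on an unbounded stream: each incoming event updates the window and may immediately produce an alert, without storing the full history.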
  41. © Copyright 2000-2016 TIBCO Software Inc. Operational Intelligence and Human

    Interaction Actions by Operations Human decisions in real time informed by up to date information 65 Automated action based on models of history combined with live context and business rules Machine-to-Machine Automation
  42. © Copyright 2000-2016 TIBCO Software Inc. Alternatives for Streaming Analytics

    (no complete list!) Azure Microsoft Stream Analytics CLOSED SOURCE OPEN SOURCE FRAMEWORK PRODUCT
  43. © Copyright 2000-2016 TIBCO Software Inc. What Kind of Streaming

    Analytics do you need? Visual IDE (Dev, Test, Debug) Simulation (Feed Testing, Test Generation) Live UI (monitoring, proactive interaction) Maturity (24/7 support, consulting) Integration (out-of-the-box: ESB, MDM, etc.) Library (Java, .NET, Python) Query Language (often similar to SQL) Scalability (horizontal and vertical, fail over) Connectivity (technologies, markets, products) Operators (Filter, Sort, Aggregate) Time to Market Streaming Frameworks Streaming Products Slow Fast Streaming Concepts
  44. © Copyright 2000-2016 TIBCO Software Inc. Comparison of Stream Processing

    Frameworks and Products Slide Deck from JavaOne 2015: http://www.kai-waehner.de/blog/2015/10/25/ comparison-of-stream-processing-frameworks-and-products/ Updated slide deck coming in November 2016 (Big Data Spain, Madrid)
  45. © Copyright 2000-2016 TIBCO Software Inc. Visual Coding for Streaming

    Analytics • Streaming Operators • Connectivity • Visual Development • Testing & Simulation • Mature Tooling / Support • Middleware Integration
  46. © Copyright 2000-2016 TIBCO Software Inc. Live Visual Analytics UI

    Dynamic aggregation Live visualization Ad-hoc continuous query Alerts Action
  47. © Copyright 2000-2016 TIBCO Software Inc. How to apply analytic

    models to real time processing without redevelopment? Stream Processing H2O.ai Open Source R TERR Spark ML MATLAB SAS PMML
  48. Closed Loop → Automatically Re-Compute (and Improve) the Analytic Model

    • Compute your performance metric
    • Spot not good enough performance
    • Re-compute model
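    The closed loop on this slide (compute a performance metric, spot insufficient performance, re-compute the model) can be sketched as follows. The "model" here is deliberately trivial: a single threshold placed midway between the mean sensor value of good and bad parts. The training rule, the accuracy metric and the `min_accuracy` parameter are all assumptions chosen to keep the example short.

```python
from statistics import mean


def train(samples):
    """samples: list of (value, is_bad). Returns a threshold midway between
    the class means. Each batch must contain both good and bad samples."""
    good = [value for value, is_bad in samples if not is_bad]
    bad = [value for value, is_bad in samples if is_bad]
    return (mean(good) + mean(bad)) / 2


def accuracy(threshold, samples):
    """Performance metric: share of samples classified correctly."""
    hits = sum((value >= threshold) == is_bad for value, is_bad in samples)
    return hits / len(samples)


def closed_loop(threshold, batches, min_accuracy=0.9):
    """Score each labeled batch; re-compute the model when accuracy drops."""
    history = []
    for batch in batches:
        observed = accuracy(threshold, batch)
        retrained = observed < min_accuracy
        if retrained:
            threshold = train(batch)
        history.append((observed, retrained))
    return threshold, history


initial = train([(100, False), (110, False), (130, True), (140, True)])
batches = [
    [(105, False), (112, False), (128, True), (135, True)],  # same behavior
    [(125, False), (130, False), (150, True), (160, True)],  # process has drifted
]
final_threshold, history = closed_loop(initial, batches)
```

    In this toy run the first batch is classified correctly, the drifted second batch falls below `min_accuracy`, and the threshold is re-computed from the newer data.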
  49. © Copyright 2000-2016 TIBCO Software Inc. Agenda 1) Machine Learning

    and Big Data Analytics 2) Building an Analytic Model 3) Real Time Processing 4) Live Demo 5) Intelligent Microservices
  50. • Reactive – Run to failure • Preventive – Scheduled

    service (reliability) • Condition-based – Monitor condition (sensors) • Predictive – Predict failures • Proactive – Deploy automatic actions Evolution of Equipment Maintenance Strategies
  51. Scenario: Predictive Scrapping of Parts in an Assembly Line Goal:

    Scrap parts as early as possible automatically to reduce costs in a manufacturing process. Question: When to scrap a part in Station 1 instead of doing re-work or sending it to Station 2? Station 1 Station 2 Cost Before 9€ 7€ 13€ Total Cost 29€ (or more) Scrap? Scrap?
  52. Fast Data Architecture for Predictive Maintenance Operational Analytics Operations Live

    UI CSV Batch JSON Real Time XML Real Time Streaming Analytics Action Aggregate Rules Analytics Correlate Live Datamart Continuous query processing Alerts Manual action, escalation HISTORICAL ANALYSIS Data Scientists Flume HDFS Spotfire R / TERR HDFS Hadoop (Cloudera) StreamBase TIBCO Fast Data Platform H2O Oracle RDBMS Avro Parquet … PMML Internal Data
  53. TIBCO Spotfire with H2O Integration Data Discovery / Data Mining

    (“Are parts that repeat a station more likely scrap parts?”)
  54. TIBCO Live Datamart Operational Intelligence (“Monitor the manufacturing process and

    change rules in real time!”) Live Datamart Desktop Client
  55. TIBCO Accelerator for Apache Spark

    The TIBCO Accelerator for Spark is a TIBCO engineered, light-weight open-source fast-start for systems to stream data into Spark, discover patterns in Spark with Spotfire, and operationalize the insights on Big Data.

    FUNCTIONAL COMPONENTS
    1. Fast Data Preparation for IoT. Dozens of enterprise and IoT data preparation adapters: MQTT, Databases; inbound creation of HDFS, Parquet, HBase, Avro…
    2. Spotfire Model Discovery Template. Use Spotfire to explore Spark data lake, create predictive model, train in H2O, and deploy to Streaming Analytics.
    3. Operationalize Predictive Models. Zookeeper deployment to StreamBase nodes living in Spark cluster via H2O, PMML, TERR models
    4. Streaming Analytics for Automation. Automate action based on predictive models: make offers to customers, stop fraudulent transactions, alert.
    5. Monitor & Retrain Model. Monitor behavior of model, retrain when necessary.
    6. Drag & Drop for Business Solution Developers. Code-free development environment for work with H2O, HDFS, Avro, TERR
  56. © Copyright 2000-2016 TIBCO Software Inc. Agenda 1) Machine Learning

    and Big Data Analytics 2) Building an Analytic Model 3) Real Time Processing 4) Live Demo 5) Intelligent Microservices
  57. © Copyright 2000-2016 TIBCO Software Inc. Evolving Demands from the

    Business AGILITY & SPEED REDUCED CYCLE TIMES WEB SCALE LOWER COST FAIL FAST
  58. 12 Factor Apps for Cloud Native Microservices

    • Codebase: One codebase tracked in revision control, many deploys.
    • Dependencies: Explicitly declare and isolate dependencies.
    • Config: Store config in the environment.
    • Backing Services: Treat backing services as attached resources.
    • Build, Release, Run: Strictly separate build and run stages.
    • Processes: Execute the app as one or more stateless processes.
    • Port Binding: Export services via port binding.
    • Concurrency: Scale out via the process model.
    • Disposability: Maximize robustness with fast startup and graceful shutdown.
    • Dev / Prod Parity: Keep dev, staging, and prod as similar as possible.
    • Logs: Treat logs as event streams.
    • Admin Processes: Run admin/mgmt tasks as one-off processes.

    https://12factor.net/
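    As an illustration of the "Config: store config in the environment" factor, a scoring microservice could read its settings like this. It is a minimal sketch: the variable names `MODEL_PATH`, `PORT` and `SCORE_THRESHOLD`, their defaults and the `ServiceConfig` class are hypothetical and not taken from the talk.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    model_path: str         # where the exported analytic model lives
    port: int               # port the service binds to ("Port Binding" factor)
    score_threshold: float  # score above which the service takes action


def load_config(env=None):
    """Build the config from environment variables (hypothetical names).

    `env` defaults to the process environment; passing a plain dict makes
    the function easy to test without touching os.environ.
    """
    env = os.environ if env is None else env
    return ServiceConfig(
        model_path=env["MODEL_PATH"],  # required: fail fast if missing
        port=int(env.get("PORT", "8080")),
        score_threshold=float(env.get("SCORE_THRESHOLD", "0.5")),
    )
```

    The same container image can then run in dev, staging and prod with different models or thresholds, which also supports the Dev / Prod Parity factor.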
  59. Why Containers?

    Containers enable:
    • Lightweight deployment
    • Automation
    • Better resource utilization
    • Scaling up and down quickly
    • Platform agnostic deployment
    • Innovation and Fail Fast Concepts
    • Standardization?
      • The Open Container Initiative (OCI)
      • Docker Fork Discussions (!!!)

    http://www.slideshare.net/andersjanmyr/docker-the-future-of-devops
  60. © Copyright 2000-2016 TIBCO Software Inc. DevOps Elements – Culture

    and Technology! Process Tools Automation Culture Continuous Integration/ Continuous Development APIs Microservices Frequent releases Collaboration
  61. © Copyright 2000-2016 TIBCO Software Inc. Develop fast. Fail fast.

    Change fast. Visual Analytics + Visual Coding + DevOps = Agile Intelligent Microservices
  62. © Copyright 2000-2016 TIBCO Software Inc. Real Time Streaming Analytics

    time 1 2 3 4 5 6 7 8 9 Event Streams Apply your intelligent (micro)service to any event. Microservice event. Application event. Legacy event. IoT event. You name it.
  63. Key Take-Aways

    • Insights are hidden in Historical Data on Big Data Platforms
    • Machine Learning and Big Data Analytics find these Insights by building Analytics Models
    • Event Processing uses these Models (without Redevelopment) to take Action in Real Time