
Build your Machine Learning Scenario for your SAP HANA Cloud application from Python

https://groups.community.sap.com/t5/devtoberfest/build-your-machine-learning-scenario-for-your-sap-hana-cloud/ec-p/9071#M45

Learn how the Python Machine Learning client for SAP HANA can be used to build classification, regression, or time series forecasting scenarios using the Predictive Analysis Library (PAL) or the Automated Predictive Library (APL).

See how the latest Machine Learning AutoML capabilities can help to build even better models in less time.

Vitaliy Rudnytskiy

October 05, 2022

Transcript

  1. Build your Machine Learning Scenario for your SAP HANA Cloud application from Python

    Public. October 5th, 2022. Christoph Morgen, SAP SE
  2. Disclaimer

    The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. Except for your obligation to protect confidential information, this presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or any related document, or to develop or release any functionality mentioned therein. This presentation, or any related document and SAP's strategy and possible future developments, products and/or platforms directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information in this presentation is not a commitment, promise or legal obligation to deliver any material, code or functionality. This presentation is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This presentation is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this presentation, except if such damages were caused by SAP’s intentional or gross negligence. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
  3. Please watch the recording and leave your questions at:

    https://groups.community.sap.com/t5/devtoberfest/build-your-machine-learning-scenario-for-your-sap-hana-cloud/ec-p/9071#M45
  4. Agenda

    • SAP HANA Machine Learning
      ‒ Machine Learning function libraries
      ‒ Machine Learning clients for Data Scientists
      ‒ Demo
    • Data Science to Development Handshake
      ‒ ML Python to SQL Code Generation
      ‒ Demo
    • Summary and Outlook
    • Q&A
  5. Augmenting Applications with SAP HANA Machine Learning

    Embedding AI into Applications running on SAP HANA

    § Leverage native Machine Learning out of the box
      – Automated and trending expert machine learning algorithms for embedded use and processing with in-memory performance
      – Native interfaces for Data Scientists in R and Python
      – Further enrich with document store, spatial and graph processing, search and text processing capabilities
    § Get SAP HANA Cloud advantages on top
      – Real-time federation, data lake and scalability
    § Build comprehensive multi-model SAP Applications
      – Available in all application development scenarios

    Documentation: SAP HANA Machine Learning Overview

    [Diagram labels: SAP Application; SAP HANA Cloud; Advanced analytical processing; Graph; Machine Learning; Search; Spatial; Doc Store; Text processing; specialized function libraries; Predictive Analysis Library; Automated Predictive Library; Python / R machine learning client; SQL database and developer client]
  6. SAP HANA Cloud | Embedded Machine Learning: Typical Scenarios Addressed

    Classic Machine Learning Scenarios

    • Predicting customer behavior like churn, fraud or buying behavior (classification)
    • Predicting car prices, based on model characteristics and market trends (regression)
    • Enabling marketers to develop targeted marketing programs by grouping customers (clustering)
    • Provide personalized product recommendations by analyzing product associations, individual purchase history and external factors (recommender system)
    • Forecasting future sales, demand, cost, etc. based on historic time related data (time series forecasting)
    • Analyzing shopping baskets to suggest product placements or additional purchases to a customer (association analysis)
    • Detecting anomalies in financial transactions for fraud analysis, or in machine sensor data for predictive maintenance (outlier detection)
    • In a given social network, you seek to infer which new interactions among its members are likely to occur in the near future (link analysis / prediction)
  7. Approach | Develop a Machine Learning Application

    • Build: Understand business problem and build machine learning models in Jupyter Notebook
    • Generate: Generate design-time artefacts for machine learning scenario via hana-ml library
    • Develop: Import design-time artefacts into CAP project and configure database module
    • Consume: Consume machine learning models and integrate prediction into business app

    Roles: Data Scientist, App Developer
  8. SAP HANA Cloud | Automated Predictive Library (APL)

    Native In-Database Automated Predictive Analytics: Automated Analytics Engine in SAP HANA Cloud

    § Automated Predictive Library (APL)
      • Addresses ML scenarios like Classification, Regression or Time Series Forecasting (and more)
      • Automated analysis covers steps from variable selection, data preparation, variable encoding, missing value handling, outlier handling, binning and banding, model testing and best model selection
    § Automation is the key to broad and fast adoption
      • The APL provides simple SQL procedures for developers to create, train, apply, deploy predictive models
      • Quick and easy to leverage for non-expert Data Scientists
      • Python machine learning client support for APL
      • High productivity and fast time to value

    [Diagram: Automated Predictive Library (APL) in SAP HANA Cloud: Classification, Regression, Cluster analysis, Time series forecasting, Association analysis, Recommendation, Link analysis]

    Learn how to get started with APL and SAP HANA Cloud and documentation.
  9. Using Automated Predictive Library (APL) in SAP HANA Cloud

    Data Science made easy for Application Developers

    § Benefits
      • For easy consumption and fast adoption, SAP HANA Cloud APL provides simple procedure functions, sample data sets and sample SQL scripts to application developers
      • All results in the training and prediction phase can be queried for visualization and explanation

    [Diagram: Machine Learning Tasks Automated by APL: Prepare Data, Build and Train Models, Deploy and Predict. Call CREATE_MODEL_AND_TRAIN (), Call APPLY_MODEL ()]
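The train-then-apply flow on this slide has a counterpart in the Python machine learning client. The following is a minimal sketch, assuming the `GradientBoostingBinaryClassifier` class from the `hana_ml.algorithms.apl` package; the helper names `build_apl_classifier` and `train_and_apply` and all column names are hypothetical, and the sketch does not rely on which SQL procedures the client calls internally.

```python
from typing import Any


def build_apl_classifier() -> Any:
    """Create an APL classifier; no algorithm or parameters are chosen by hand."""
    # Lazy import: the hana-ml package is only needed when this is called.
    from hana_ml.algorithms.apl.gradient_boosting_classification import (
        GradientBoostingBinaryClassifier,
    )
    return GradientBoostingBinaryClassifier()


def train_and_apply(model: Any, train_df: Any, new_df: Any,
                    key: str, label: str) -> Any:
    """Train on train_df, then apply the trained model to new_df.

    train_df and new_df are HANA dataframes; key and label are column names
    (hypothetical here).
    """
    model.fit(train_df, key=key, label=label)  # training step
    return model.predict(new_df)               # apply step
```

With a live connection this could be used as `train_and_apply(build_apl_classifier(), train_df, new_df, key="ID", label="CHURN")`, where the table contents and column names are placeholders.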
  10. SAP HANA Cloud | Predictive Analysis Library (PAL)

    Native In-Database Machine Learning

    § SAP HANA Cloud embeds multiple machine learning libraries, designed and optimized for massive parallel in-memory processing
    § Predictive Analysis Library (PAL)
      • Addresses ML scenarios like Classification, Regression or Time Series Forecasting (and more)
      • Expert algorithm library, with over 100 classic and trending machine learning algorithms
      • Algorithm-/model pipeline support, AutoML for classification / regression / time series (best pipeline and model parameters)
      • Segmented modeling, like segmented / massive Forecasting
      • Parallel and real-time transaction performance inference
      • Explainability for interpretability of model predictions
    § Easy to develop and simple to embed with applications
      • Simple SQL interface and Python / R machine learning client
      • Supports both expert data scientists and developer personas

    [Diagram: Predictive Analysis Library (PAL) in SAP HANA Cloud: Classification, Regression, Cluster analysis, Time series forecasting, Association analysis, Recommender System, Link prediction, Outlier detection]

    Learn how to get started with PAL and SAP HANA Cloud, see PAL documentation.
  11. SAP HANA Cloud | Predictive Analysis Library (PAL)

    Algorithm overview by category

    Classification Analysis
    § Decision Tree Analysis (CART, C4.5, CHAID), Logistic Regression, Support Vector Machine, K-Nearest Neighbor, Naïve Bayes, Confusion Matrix, AUC, Online multi-class Logistic Regression*
    § Multilayer Perceptron (back propagation Neural Network)
    § Random Decision Trees, Hybrid Gradient Boosting Tree (HGBT)#, Continuous HGBT*
    § Unified Classification# incl. explainability, segmented (massive) classification

    Regression
    § Multiple Linear Regression, Online Linear Regression*
    § Polynomial-, Exponential-, Bi-Variate Geometric-, Bi-Variate Natural Logarithmic- Regression
    § Generalized Linear Model (GLM)
    § Cox Proportional Hazards Model
    § Random Decision Trees, Hybrid Gradient Boosting Tree (HGBT)#, Continuous HGBT*
    § Unified Regression* incl. explainability, segmented (massive) regression

    Pipeline and AutoML
    § Pipeline-models, -fit and -predict
    § AutoML incl. data preprocessing, classification, regression, time series forecasting

    Association Analysis
    § Apriori, Apriori Lite, FP-Growth
    § K-Optimal Rule Discovery (KORD), Sequential Pattern Mining

    Link Prediction
    § Link Prediction (Common Neighbors, Jaccard’s Coefficient, Adamic/Adar, Katzβ), PageRank

    Recommender Systems
    § Factorized Polynomial Regression Models, Alternating least squares, Field-aware Factorization Machines (FFM)

    Text Processing
    § Conditional Random Field, Latent Dirichlet Allocation
    § TF-IDF*, term analysis*, text classification*, get related terms / documents*, get relevant terms / documents*, get suggested terms*

    Data Preprocessing
    § Sampling, Partitioning, SMOTE, TomekLink, SMOTETomek#
    § Binning / Discretize, Missing Value Handling, Scaling, Feature Selection*
    § Isolation Forest*

    Statistical & Multivariate Analysis
    § Univariate Analysis (Data Summary, Mean, Median, Variance, Stand. Deviation, Kurtosis, Skewness, ..)
    § Kernel Density Estimation, Entropy
    § Correlation Function (with confidence)
    § Multivariate Analysis (Covariance Matrix, Pearson Correlations Matrix), Condition Index
    § Principal Component Analysis (PCA) / PCA Projection, TSNE, Categorical PCA
    § Linear Discriminant Analysis
    § Multidimensional scaling, Factor Analysis
    § Chi-squared Tests: Quality of Fit, Test of Independence, ANOVA, F-test (equal variance test)
    § One-sample Median Test, T Test, Wilcoxon Signed Rank Test, Kolmogorov-Smirnov Test*
    § Inter-Quartile Range, Variance Test, Grubbs Outlier Test, Anomaly Detection (KMeans)
    § Random Distribution Sampling, Markov Chain Monte Carlo (MCMC)#
    § Distribution Fitting, Cumulative Distribution Function, Distribution Quantile

    Misc. Functions
    § Kaplan-Meier Survival Analysis, Weighted Scores Table, ABC Analysis, Tree model visualization#

    Cluster Analysis
    § K-Means, Accelerated K-Means, K-Medoids, K-Medians, Geo- / DBSCAN, Agglomerate Hierarchical Clustering*, Slight Silhouette, Cluster Assignment
    § Kohonen Self-Organizing Maps, Affinity Propagation, Gaussian Mixture Model
    § Segmented (massive) Unified Clustering#, Spectral clustering*

    Time Series Analysis
    § Single-, Double-, Triple-, Brown-, Auto Exponential Smoothing, Unified Exponential Smoothing (incl. massive segmentation)*
    § Auto-ARIMA, Online ARIMA*, Vector-ARIMA*, ARIMA_EXPLAIN*
    § Additive Model Analysis#, GARCH*, BSTS*
    § Croston, Croston TSB*, Linear Regression with damped trend and seasonal adjust, Intermittent Time Series Forecast*
    § Fast Dynamic Time Warping#, DTW*, Hierarchical Forecasting
    § FFT, Discrete Wavelet / Wavelet Packet Transform*, Periodogram*
    § White Noise-, Trend-, Stationary-*, Seasonality- Test, Change Point Detection, Bayesian Change Point Detection*, Outlier Detection*, TS Imputation*, Forecast Accuracy Measures
    § LSTM*, Attention*, LTSF*
    § Segmented (massive) Forecasting*

    SAP HANA Predictive Analysis Library documentation

    Legend: #SAP HANA 2 SPS05 & HANA Cloud | *SAP HANA 2 SPS06 & HANA Cloud | *New in SAP HANA Cloud | As of SAP HANA Cloud 2022 Q3 (CE2022.30)
  12. SAP HANA Cloud | Using Predictive Analysis Library (PAL)

    In-database machine learning made easy using SQL: SAP Applications embedding SAP HANA Cloud ML

    § Simple developer SQL interface to Predictive Analysis Library / AFLs
      – SAP HANA Cloud Database Explorer with simple SQL call interface to PAL / AFL procedures in _SYS_AFL schema
    § Application Development and Application Integration
      – ABAP application integration via ABAP Managed Database Procedures, standardized in S/4HANA with Intelligent Scenario Lifecycle Management
      – Business Application Studio allows SAP HANA ML/AFL procedure embedding within HDI design-time database procedures of BTP Cloud Application projects
      – Fiori-/CAP-applications allow consumption of SAP HANA ML/AFL procedures via advanced SAP HANA native objects like table functions of HDI projects
  13. SAP HANA Cloud | Python/R Machine Learning Clients

    Leveraging SAP HANA’s data science capabilities

    § Allow scripting in Python or R, while instructing remote processing of data and advanced analytics in SAP HANA Cloud
    § Use the HANA dataframe object as virtual data reference for data preprocessing, transformation and analysis, including exploratory data analysis (EDA) visualizations
    § Leverage the Predictive Analysis Library (PAL) in Python / R, allowing the expert Data Scientists a simple conversion from standard Python-packages to HANA embedded ML models and their operationalization
    § Automated Predictive Library (APL) functions exposing SAP HANA’s AutoML and non-expert predictive functions in Python
    § Model storage and ML model performance reports
    § Leverage SAP HANA Spatial and Graph capabilities in Python

    [Diagram: Data Scientist using R or Python; Python / R machine learning client]

    Learn how to get started with PAL and SAP HANA Cloud, APL and SAP HANA Cloud; see Python samples. Python machine learning client documentation here. R machine learning client documentation here.
  14. SAP HANA Cloud | Python Machine Learning Client – The Dataframe

    The HANA dataframe

    § This module represents a database query as a dataframe in Python. Operations are designed to not bring data back from the database, unless explicitly collected.
    § Dataframe creation
      • Using a ConnectionContext against a table or using a SQL query
      • Has attributes Name and Columns
    § Methods against a dataframe
      • Explore your data set using a simple describe call
      • Various methods available: agg, bin, corr, count (replace nrows), empty, hasna, is_numeric, rename_columns (multi column rename), union, pivot, save as view, join with select, …
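The lazy behavior described on this slide can be sketched as follows. This is a minimal illustration, assuming the `ConnectionContext` class and the `table`, `describe` and `collect` methods of the hana-ml dataframe module; the helper names `connect` and `summarize`, and any host, credential, schema and table values passed to them, are hypothetical.

```python
from typing import Any


def connect(address: str, port: int, user: str, password: str) -> Any:
    """Open a connection to SAP HANA Cloud (all argument values are placeholders)."""
    # Lazy import: the hana-ml package is only needed when this is called.
    from hana_ml.dataframe import ConnectionContext
    return ConnectionContext(address=address, port=port, user=user,
                             password=password, encrypt=True)


def summarize(conn: Any, table: str, schema: str) -> Any:
    """Return summary statistics for a table; only the small result is fetched."""
    hdf = conn.table(table, schema=schema)  # a query reference, no rows transferred
    stats = hdf.describe()                  # still a HANA dataframe, i.e. a SQL query
    return stats.collect()                  # runs in the database, returns pandas data
```

Nothing is read from the database until `collect()` is called, which is why `describe()` on a large table only moves the summary rows to the Python side.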
  15. SAP HANA Cloud | Python Machine Learning Client – Data Exploration

    Exploratory data analysis visualizations

    § This module’s functions are designed to delegate the data analysis logic into SAP HANA Cloud, and only provide back the result set required for visual display
      • Bar- and pie-plots, distribution plots, box-whisker plots, scatter heat-map like plots, correlation plots, …, including capabilities like implicit binning of column values
  16. SAP HANA Cloud | Python Machine Learning Client – Classification Example

    Classification scenario example

    § Leveraging PAL unified classification procedure
      ‒ Single interface for Decision trees, Hybrid gradient boosting tree, Logistic regression, Multi-class logistic regression, Naïve Bayes, Random decision trees, Support Vector Machine
      ‒ Includes optimal model parameter selection, fit / score / predict functions, additional debrief statistics and metric data for model value plot, …

    [Screenshots: Fit model; Predict incl. explainability; Score and evaluate model]

    For examples see /SAP-samples/hana-ml-sample/Python.
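The fit / score / predict sequence named on this slide might look like the sketch below. It assumes the `UnifiedClassification` class from `hana_ml.algorithms.pal.unified_classification` with `func="HybridGradientBoostingTree"`; the helper names `build_classifier` and `fit_score_predict` and all dataframe and column names are hypothetical, and the slide's own code is not reproduced here.

```python
from typing import Any, Tuple


def build_classifier(func: str = "HybridGradientBoostingTree") -> Any:
    """Create a unified classifier; func selects the underlying PAL algorithm."""
    # Lazy import: the hana-ml package is only needed when this is called.
    from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
    return UnifiedClassification(func=func)


def fit_score_predict(model: Any, train_df: Any, test_df: Any, new_df: Any,
                      key: str, label: str) -> Tuple[Any, Any]:
    """Fit on train_df, evaluate on test_df, then predict for new_df.

    All three inputs are HANA dataframes; key and label are column names.
    """
    model.fit(data=train_df, key=key, label=label)
    score_results = model.score(data=test_df, key=key, label=label)
    predictions = model.predict(data=new_df, key=key)
    return score_results, predictions
```

Because the interface is unified, switching to another algorithm should only require a different `func` value, while the fit / score / predict calls stay the same.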
  17. SAP HANA Cloud | Embedded ML – Value Add

    Unique benefits and differentiation of SAP HANA Cloud embedded ML

    § State of the art algorithms for classic ML scenarios
      • Classification, regression, forecasting, clustering, …
      • Automated ML functions (APL) as well as expert ML functions (PAL) incl. trending functions like Random Decision Trees, Gradient Boosting, …
    § Executes co-located with data and database transaction
      • Benefits from HANA in-memory processing and performance
      • Supports scenarios like massive, parallel segmented forecasting
      • Fastest ML inference within transaction processing
    § Simple architecture
      • No extra service or machine required
      • Apply in multi-model context, in combination with spatial, graph, text analytics
    § Multi-role and user interface
      • SQL for database developers
      • Python / R machine learning client for Data Scientist
      • Integrate into SAP Applications via ABAP/HANA-SQL capabilities
  18. Data Science to Development Handshake │ ML Python to SQL Code Generation

    Build ML scenarios by Data Scientists in Python
    • Leveraging the Python ML client for HANA
    • Generate PAL SQL code
      ‒ Python methods for all PAL functions
    • Generate HDI design-time project files
      ‒ Based on "final" fit / predict Python calls
      ‒ Generated SQL is captured and merged into a HANA ML HDI project template
        • Incl. HDI-required synonyms, role grants etc.
      ‒ Share as Git repository
    → Simple hand-over of ML scenario artefacts to BAS developer
  19. Data Science to Development Handshake │ ML Python to SQL Code Generation

    Build ML scenarios by Data Scientists in Python
    • Leveraging the Python ML client for HANA
    • Generate HDI design-time project files
      ‒ Based on "final" fit / predict Python calls
        • Enable SQL trace
        • Execute ML scenario fit / predict
      ‒ Generated SQL is captured and merged into a HANA ML HDI project template
        • Use artifact.HanaGenerator method
    • HDI artifacts
      - base procedure (algorithm configuration)
      - consumption procedure (input data binding / application integration)
      + Synonyms for PAL procedures*, role grants* and user provided service-reference for _SYS_AFL schema access*

    *required to embed HANA ML in HDI-XSA/BAS applications
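The three steps on this slide (enable SQL trace, execute fit / predict, generate artifacts) can be sketched as below. This assumes the `sql_tracer` attribute of the hana-ml connection and the `HanaGenerator` class in `hana_ml.artifacts.generators.hana`, as available around the time of this talk; newer hana-ml releases may offer different generator classes. The helper names `run_traced` and `generate_hdi_project`, the version string, and all argument values are hypothetical.

```python
from typing import Any, Callable


def run_traced(conn: Any, scenario: Callable[[Any], Any]) -> Any:
    """Enable SQL tracing on the connection, then run the fit / predict scenario."""
    # The tracer records the SQL that hana-ml generates, for later merging
    # into the HDI project template.
    conn.sql_tracer.enable_sql_trace(True)
    conn.sql_tracer.enable_trace_history(True)
    return scenario(conn)


def generate_hdi_project(conn: Any, project_name: str, output_dir: str,
                         grant_service: str) -> None:
    """Write HDI design-time files from the traced SQL (values are placeholders)."""
    # Lazy import: the hana-ml package is only needed when this is called.
    from hana_ml.artifacts.generators.hana import HanaGenerator

    generator = HanaGenerator(project_name=project_name, version="1",
                              grant_service=grant_service,
                              connection_context=conn, outputdir=output_dir)
    generator.generate_artifacts()
```

The `scenario` callable would hold the "final" fit and predict calls; tracing has to be switched on before they run, otherwise there is no SQL to merge into the project.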
  20. Data Science to Development Handshake │ ML Python to SQL Code Generation

    Build ML scenarios by Data Scientists in Python
    • Generated HDI design-time project files
      ‒ Filesystem / Git repository artifacts

    Business Application Studio
    • Import / clone project from Git
    • Build MTA project
    • Deploy MTA archive

    User provided service-reference*

    * UPS user requires privilege roles to grant use of _SYS_AFL functions like AFL__SYS_AFL_AFLPAL_EXECUTE_WITH_GRANT_OPTION
  21. Reference Blog | Prediction of Fuel Prices in Germany

    Please check our blog posts:
    • SAP Data & Analytics Showcase – Develop a Machine Learning Application on SAP Business Technology Platform
    • SAP BTP Data & Analytics Showcase – Overall Integration Demo
  22. App Development Flow │ Configure and Develop CAP Application in BAS

    [Diagram labels: Build; Generate; Develop; Consume; Jupyter Notebook; GitHub; Business Application Studio; CAP application; SAP Business Technology Platform; SAP HANA Cloud; HDI - A; HDI - B; Training data for ML; User Provided Service]

    Design-time database artefacts
    • Synonyms
    • Roles / Grants
    • Procedures
    • Table Structures

    *HDI A and HDI B belong to the same HANA Cloud instance
  23. App Development Flow │ Consume ML Models in CAP Application

    • Define functions under service of Node.js application
    • Check functions under metadata of OData service
    • Implement a Node.js script to call HANA procedures (training & prediction)

    *Please find more details under this GitHub repo of sap-samples.
  24. SAP HANA Cloud | Predictive Analysis Library (PAL) – Key Capabilities

    Native In-Database Machine Learning with SAP HANA Cloud Predictive Analysis Library – Key capabilities

    § Addresses all key scenarios like Classification, Regression or Time Series Forecasting (and more)
      • All major machine learning scenarios on structured data can be addressed within the database
      • Algorithms fast and optimized for in-database execution
    § Over 100 classic and trending algorithms
      • Random decision trees and gradient boosting decision trees outperform in most classification and regression use cases
    § High-performance parallel mass prediction, real-time transactional speed prediction
      • Multi-node fastest big data predictions as well as real-time transactional prediction in milliseconds
    § Segmented ML model development and prediction
      • Supported with all PAL algorithms and scenarios
      • Like segmented time series forecasting (forecast segmented by store, product, etc.)
    § Automated cross validation, hyper parameter selection and AutoML framework
      • Pipeline models and AutoML framework
      • Model development support and automation, higher productivity and faster results with best possible and stable models
    § Easy to develop and simple to embed within applications
      • Supports both expert data scientists and developer personas
      • Simple SQL interface and Python and R ML clients
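The AutoML framework mentioned on this slide is exposed in the Python client as well. The sketch below assumes the `AutomaticClassification` class from `hana_ml.algorithms.pal.auto_ml` and its `generations` and `population_size` parameters; the helper names `build_automl` and `search_and_predict`, the small search budget, and all column names are hypothetical choices for illustration.

```python
from typing import Any


def build_automl(generations: int = 2, population_size: int = 5) -> Any:
    """Create an AutoML classifier with a deliberately small search budget."""
    # Lazy import: the hana-ml package is only needed when this is called.
    from hana_ml.algorithms.pal.auto_ml import AutomaticClassification
    return AutomaticClassification(generations=generations,
                                   population_size=population_size)


def search_and_predict(automl: Any, train_df: Any, new_df: Any,
                       key: str, label: str) -> Any:
    """Let AutoML search for a pipeline on train_df, then apply it to new_df."""
    automl.fit(data=train_df, key=key, label=label)  # pipeline search runs in HANA
    return automl.predict(data=new_df, key=key)
```

A larger number of generations or a larger population widens the search at the cost of longer run time, so small values like the ones above are mainly useful for a first trial.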
  25. SAP Roadmap Explorer – HANA ML / Predictive Analysis Library (PAL)

    § PAL with SAP HANA Cloud: link
    § Roadmap | SAP HANA Machine Learning
    § With SAP HANA platform: link
    § Complete list of 70 PAL enhancements with SAP HANA Cloud: link
    § SAP HANA 2 SPS07 planned enhancements
  26. SAP HANA | Embedded Machine Learning - Further Information

    SAP HANA Machine Learning product information overview page and documentation
    § SAP HANA Predictive Analysis Library (PAL) cloud documentation / on-premise documentation
    § Automated Predictive Library (APL) for SAP HANA documentation
    § Python machine learning client documentation here, @PyPI https://pypi.org/project/hana-ml/
    § R machine learning client documentation here
    § R / Python SAP HANA Machine Learning client install instructions
    § SAP HANA ML samples @github https://github.com/SAP-samples/hana-ml-samples
    § Intelligence out of the Box - Native Machine Learning in SAP HANA Cloud | SAP Community Call https://www.youtube.com/watch?v=bFv4n3smzQw

    Getting Started - Tutorials and Blogs with resource collection
    § https://blogs.sap.com/2020/08/03/getting-started-with-sap-hana-cloud-vi-machine-learning/
    § https://blogs.sap.com/2021/05/27/sap-hana-machine-learning-resources/
  27. BTP App Dev | Embedding HANA Machine Learning - Further information

    Business Application Studio / WebIDE Development
    § Collaborative Database Development in SAP HANA Cloud, SAP HANA Database | Tutorials for SAP Developers
    § TechEd 2021 SAP-samples/teched2021-DAT260: SAP TechEd session DAT260 (github.com)
    § https://blogs.sap.com/2020/12/21/modeling-in-business-application-studio-compared-to-sap-web-ide/
    § https://blogs.sap.com/2019/11/13/faq-modeling-in-web-ide/
    § Combine CAP with SAP HANA Cloud to Create Full-Stack Applications | Tutorials for SAP Developers
    § Capire - Multitenancy CAP applications, The hidden life of ServiceManager handled containers | SAP Blogs
    § Advanced HANA capabilities with CAP applications ./SAP-samples/cloud-cap-samples/tree/advanced-HANA-sample

    HANA Machine Learning XSA / BAS Application Integration
    § TechEd 2018 DAT364 Template: Developing smart applications using SAP HANA In-Database Machine Learning cmog/TechEd2018_DAT364 (github.com)
    § SAP Data & Analytics Showcase – Develop a HANA Machine Learning Application on SAP BTP and SAP BTP Data & Analytics Showcase – Overall Integration Demo
  28. Open new career opportunities

    Join the community of people with skills for the future

    Check learning.sap.com/teched to benefit like other certified experts:
    • 61% get promotions [1]
    • 91% increase confidence in abilities [1]
    • >71% increase problem-solving skills [2]

    Expand your conference experience: become an SAP solution expert, now as easy as 1, 2, 3 in one place (FREE SAP TechEd offer)
    § Connect with experts, share your knowledge, expand your network, and collaborate with peers in SAP Community
    § Network with other participants in the group for SAP TechEd and join the SAP Learning Groups to get your learning questions answered
    § Follow expert-led learning journeys and live sessions for various development roles to upskill and prepare for certification
    § Benefit from the event-exclusive certification offer

    [1] "Pearson VUE’s Latest 'Value of IT Certification' Study Highlights Benefits of IT Certification in Challenging Times," Pearson Education Inc., May 25, 2021.
    [2] Chuck Cooper, Why Get IT Certified? The Value of IT Certification: An IT Certification White Paper, IT Certification Council, March 2021.
  29. Thank you.

    Contact information: Christoph Morgen, SAP SE, Walldorf, [email protected]

    © 2022 SAP SE or an SAP affiliate company. All rights reserved. See Legal Notice on www.sap.com/legal-notice for use terms, disclaimers, disclosures, or restrictions related to SAP Materials for general audiences.
  30. How to use SAP HANA efficiently as an analytics platform

    § Advanced analytics, machine learning, and predictive analytics
    § Practical examples for using PAL, APL, and more
    § For all SAP HANA deployment models

    Data Science mit SAP HANA: 406 pages, hardcover, available from the end of October 2022. Book | E-book | Bundle. ISBN 978-3-8362-9033-3. Pre-order now at www.sap-press.de/5539
  31. © 2020 SAP SE or an SAP affiliate company. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platforms, directions, and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, and they should not be relied upon in making purchasing decisions.

    SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies. See www.sap.com/copyright for additional trademark information and notices. www.sap.com/contactsap