Machine Learning engines embedded in SAP Data Warehouse Cloud

SAP Data Warehouse Cloud comes with the embedded analytical engines of SAP HANA Cloud.

Learn how to activate and use that functionality from your preferred Python environment.

Vitaliy Rudnytskiy

October 06, 2022

Transcript

  1. Agenda

    Introduction
    ML engines in SAP Data Warehouse Cloud
      § Automated Predictive Library
      § Predictive Analysis Library
      § Engines for geospatial, graph, text
    Demo
      § Training an ML model in SAP Data Warehouse Cloud from Python
    Setup
      § Configuration SAP Data Warehouse Cloud
      § Python environment
    BW Bridge
      § Brief demo
    Q & A
  2. SAP Data Warehouse Cloud

    The unified data and analytics service that delivers tangible ROI
      § Accelerate outcomes without complexity
      § Connect data with business context
      § Unlock data insights with integrity
  3. SAP Data & Analytics Capabilities Today: Covering the entire lifecycle of data-to-value

    Architecture diagram, labels by group:
      § Data sources: SAP Applications, Non SAP Applications, SAP & Non SAP 3rd Party, Data Lakes, Streaming Data, Unstructured Data
      § Database (SAP HANA Cloud): Multi-tier Data Storage, Smart Multi-Model, Data Federation, Replication, App Development
      § Business Semantics & Data Warehousing (SAP Data Warehouse Cloud): Self-Service Modelling & Preparation, Data Spaces, Business Semantics, Data Marketplace
      § SAP Data Intelligence Cloud: Data Cataloging, Data Quality & Transformation, Orchestration & Processing
      § Analytics & Planning (SAP Analytics Cloud): Business Intelligence, Enterprise Planning, Augmented Analytics
      § Business Content: Total Spend Analytics, Workforce Planning & Analysis, People Analytics, Customer Insights, Financial Planning & Analysis
      § Cross-cutting: Lifecycle Management, Machine Learning, Security & Authorization, Data Privacy & Protection
      § Users: LOB users, Business users / Analysts, Data Modeler / Engineer, Developers & experts, Data Scientists
      § Unified SAP Data
  5. SAP Data Warehouse Cloud / SAP HANA Cloud: Embedded Machine Learning and Advanced Analytics

    Data Science enrichment in SAP DWC, without data extraction.
    Diagram labels: SAP Analytics Cloud; SAP Data Warehouse Cloud (Self-Service Modeling & Preparation, Data Spaces, Business Semantics, Data Marketplace); structured dataset; SAP HANA Cloud.
    Machine Learning and Advanced Analytics:
      § Automated Predictive Library (APL)
      § Predictive Analysis Library (PAL)
      § (Geo-)Spatial
      § Graph
      § Text Mining
      § ... and many others
  6. SAP Data Warehouse Cloud / SAP HANA Cloud: Typical Scenarios Addressed with embedded Machine Learning

    Machine Learning Categories
      § Predicting customer behavior like churn, fraud or buying behavior (classification)
      § Predicting car prices, based on model characteristics and market trends (regression)
      § Enabling marketers to develop targeted marketing programs by grouping customers (clustering)
      § Providing personalized product recommendations by analyzing product associations, individual purchase history and external factors (recommender system)
      § Forecasting future sales, demand, cost, etc. based on historic time-related data (time series forecasting)
      § Analyzing shopping baskets to suggest product placements or additional purchases to a customer (association analysis)
      § Detecting anomalies in financial transactions for fraud analysis, or in machine sensor data for predictive maintenance (outlier detection)
      § Inferring which new interactions among the members of a given social network are likely to occur in the near future (link analysis / prediction)
  7. SAP Data Warehouse Cloud / SAP HANA Cloud: Automated or hand-crafted Machine Learning

    Automated Predictive Library (APL)
      • Framework that scales the use of Machine Learning
      • Covers steps from variable selection, data preparation, variable encoding, missing value handling, outlier handling, binning and banding, model testing and best model selection
      • Proprietary framework, with global and local explainability

    Predictive Analysis Library (PAL)
      • Expert algorithm library, with over 100 classic and trending machine learning algorithms
      • Individual algorithms with full control for Data Scientists
      • Requires manual insight, e.g. for parameterization
      • Highly reproducible
      • AutoML based on PAL in development (currently in beta)

    Use cases: Classification, Regression, Cluster analysis, Time series forecasting, Association analysis, Recommender System, Link prediction, Outlier detection
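As a rough illustration of the automated versus hand-crafted split, the sketch below selects a classifier from either library through the hana_ml Python client. The helper name `make_classifier` and the choice of Hybrid Gradient Boosting Tree on the PAL side are assumptions made for illustration, not something the slides prescribe, and constructor defaults may differ between hana_ml versions.

```python
def make_classifier(engine: str):
    """Return an untrained hana_ml classifier for the chosen engine.

    'apl' -> automated: APL takes care of encoding, missing values and so on.
    'pal' -> hand-crafted: the algorithm is picked explicitly by the data scientist.
    """
    engine = engine.lower()
    if engine == "apl":
        # Imported lazily so the sketch loads even where hana_ml is not installed.
        from hana_ml.algorithms.apl.gradient_boosting_classification import (
            GradientBoostingBinaryClassifier,
        )
        return GradientBoostingBinaryClassifier()
    if engine == "pal":
        from hana_ml.algorithms.pal.unified_classification import (
            UnifiedClassification,
        )
        # Illustrative algorithm choice; further parameters would be tuned by hand.
        return UnifiedClassification(func="HybridGradientBoostingTree")
    raise ValueError(f"unknown engine {engine!r}; expected 'apl' or 'pal'")
```

Both estimators are trained with `fit` on a HANA dataframe, so the training itself runs inside the database rather than in the Python process.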
  8. SAP Data Warehouse Cloud / SAP HANA Cloud: Predictive Analysis Library (PAL)

    Classification Analysis
      § Decision Tree Analysis (CART, C4.5, CHAID), Logistic Regression, Support Vector Machine, K-Nearest Neighbor, Naïve Bayes, Confusion Matrix, AUC, Online multi-class Logistic Regression*
      § Multilayer Perceptron (back propagation Neural Network)
      § Random Decision Trees, Hybrid Gradient Boosting Tree (HGBT)#, Continuous HGBT*
      § Unified Classification# incl. explainability, segmented (massive) classification

    Regression
      § Multiple Linear Regression, Online Linear Regression*
      § Polynomial-, Exponential-, Bi-Variate Geometric-, Bi-Variate Natural Logarithmic- Regression
      § Generalized Linear Model (GLM)
      § Cox Proportional Hazards Model
      § Random Decision Trees, Hybrid Gradient Boosting Tree (HGBT)#, Continuous HGBT*
      § Unified Regression* incl. explainability, segmented (massive) regression

    Pipeline and AutoML
      § Pipeline-models, -fit and -predict
      § AutoML incl. data preprocessing, classification, regression, time series forecasting

    Association Analysis
      § Apriori, Apriori Lite, FP-Growth
      § K-Optimal Rule Discovery (KORD), Sequential Pattern Mining

    Link Prediction
      § Link Prediction (Common Neighbors, Jaccard’s Coefficient, Adamic/Adar, Katzβ), PageRank

    Recommender Systems
      § Factorized Polynomial Regression Models, Alternating least squares, Field-aware Factorization Machines (FFM)

    Text Processing
      § Conditional Random Field, Latent Dirichlet Allocation
      § TF-IDF*, term analysis*, text classification*, get related terms / documents*, get relevant terms / documents*, get suggested terms*

    Data Preprocessing
      § Sampling, Partitioning, SMOTE, TomekLink, SMOTETomek#
      § Binning / Discretize, Missing Value Handling, Scaling, Feature Selection*
      § Isolation Forest*

    Statistical & Multivariate Analysis
      § Univariate Analysis (Data Summary, Mean, Median, Variance, Standard Deviation, Kurtosis, Skewness, ..)
      § Kernel Density Estimation, Entropy
      § Correlation Function (with confidence)
      § Multivariate Analysis (Covariance Matrix, Pearson Correlations Matrix), Condition Index
      § Principal Component Analysis (PCA) / PCA Projection, TSNE, Categorical PCA
      § Linear Discriminant Analysis
      § Multidimensional scaling, Factor Analysis
      § Chi-squared Tests: Quality of Fit, Test of Independence, ANOVA, F-test (equal variance test)
      § One-sample Median Test, T Test, Wilcoxon Signed Rank Test, Kolmogorov-Smirnov Test*
      § Inter-Quartile Range, Variance Test, Grubbs Outlier Test, Anomaly Detection (KMeans)
      § Random Distribution Sampling, Markov Chain Monte Carlo (MCMC)#
      § Distribution Fitting, Cumulative Distribution Function, Distribution Quantile

    Misc. Functions
      § Kaplan-Meier Survival Analysis, Weighted Scores Table, ABC Analysis, Tree model visualization#

    Cluster Analysis
      § K-Means, Accelerated K-Means, K-Medoids, K-Medians, Geo- / DBSCAN, Agglomerate Hierarchical Clustering*, Slight Silhouette, Cluster Assignment
      § Kohonen Self-Organizing Maps, Affinity Propagation, Gaussian Mixture Model
      § Segmented (massive) Unified Clustering#, Spectral clustering*

    Time Series Analysis
      § Single-, Double-, Triple-, Brown-, Auto Exponential Smoothing, Unified Exponential Smoothing (incl. massive segmentation)*
      § Auto-ARIMA, Online ARIMA*, Vector-ARIMA*, ARIMA_EXPLAIN*
      § Additive Model Analysis#, GARCH*, BSTS*
      § Croston, Croston TSB*, Linear Regression with damped trend and seasonal adjust, Intermittent Time Series Forecast*
      § Fast Dynamic Time Warping#, DTW*, Hierarchical Forecasting
      § FFT, Discrete Wavelet / Wavelet Packet Transform*, Periodogram*
      § White Noise-, Trend-, Stationary-*, Seasonality- Test, Change Point Detection, Bayesian Change Point Detection*, Outlier Detection*, TS Imputation*, Forecast Accuracy Measures
      § LSTM*, Attention*, LTSF*
      § Segmented (massive) Forecasting*

    Legend: #SAP HANA 2 SPS05 & HANA Cloud | *SAP HANA 2 SPS06 & HANA Cloud | *New in SAP HANA Cloud | As of SAP HANA Cloud 2022 Q3 (CE2022.30)
    Reference: SAP HANA Predictive Analysis Library documentation
  9. SAP HANA Machine Learning: Python/R Machine Learning interfaces for Data Scientists

    Leveraging SAP HANA’s data science capabilities
      § Allow scripting in Python or R, while instructing remote processing of data and advanced analytics in SAP HANA Cloud
      § Use the HANA dataframe object as a virtual data reference for data preprocessing, transformation and analysis, including exploratory data analysis (EDA) visualizations
      § Leverage the Predictive Analysis Library (PAL) in Python / R, allowing expert Data Scientists a simple conversion from standard Python packages to HANA embedded ML models and their operationalization
      § Automated Predictive Library (APL) functions exposing SAP HANA’s AutoML and non-expert predictive functions in Python
      § Model storage and ML model performance reports
      § Leverage SAP HANA Spatial and Graph capabilities in Python

    Diagram labels: Data Scientist using R or Python; Python / R machine learning client
    To get started with PAL and SAP HANA Cloud, or with APL and SAP HANA Cloud, see the Python samples. Documentation is available for both the Python machine learning client and the R machine learning client.
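The idea of the HANA dataframe as a virtual data reference can be sketched as follows. This is a minimal example, assuming `conn` is an open `hana_ml.dataframe.ConnectionContext`; the helper name `preview_table` is made up for illustration.

```python
def preview_table(conn, table, schema=None, rows=5):
    """Fetch only the first rows of a table; all other data stays in SAP HANA.

    `conn` is expected to behave like a hana_ml.dataframe.ConnectionContext.
    """
    hdf = conn.table(table, schema=schema)  # virtual reference, no data moved yet
    print(hdf.select_statement)             # the SQL that the HANA dataframe stands for
    return hdf.head(rows).collect()         # collect() transfers the result to the client
```

With a real connection, `collect()` returns a pandas DataFrame, so only the handful of previewed rows ever leaves the database.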
  10. SAP HANA Machine Learning: Python/R Machine Learning interfaces for Data Scientists

    § Native SAP HANA database client* for ODBC / JDBC / Python / … (see SAP Note 2939501)
      • Documentation: https://help.sap.com/viewer/product/SAP_HANA_CLIENT/latest
      • Available for developers and data scientists from tools.hana.ondemand.com/#hanatools
      • Expanded client distribution channels for Python client: https://pypi.org/project/hdbcli/
    § Native Python machine learning client for SAP HANA Cloud
      • Exposing SAP HANA data as HANA dataframe in Python
      • Remote use of SAP HANA’s machine learning, spatial and graph functions in Python
      • Available with SAP HANA Client download + expanded distribution via PyPI: https://pypi.org/project/hana-ml/

    *Support for SAP HANA Cloud and SAP HANA Platform
  11. Machine Learning in SAP Data Warehouse Cloud: Cloud Setup

    1) SAP DWC with 3 virtual CPUs or more
    2) Ticket to activate the script server
    3) Create database user
  12. Cloud setup 1 / 3: 3 virtual CPUs or more

    Blog: Configure the Size of Your SAP Data Warehouse Cloud Tenant
    https://blogs.sap.com/2022/02/18/configure-the-size-of-your-sap-data-warehouse-cloud-tenant/
  13. Cloud setup 2 / 3: Ticket to activate the script server

    SAP Note 2994416: Enablement of APL and PAL in DWC
    https://launchpad.support.sap.com/#/notes/2994416
    No additional APL/PAL installation required
  14. Machine Learning in SAP Data Warehouse Cloud: Client Setup (Python)

    1) Install hana_ml package
    2) Add external IPv4 to allow list
    3) Store credentials securely
  15. Client setup 1 / 3: Install Python Machine Learning Client for SAP HANA (hana_ml package)

    The Python Package Index (PyPI): https://pypi.org/project/hana-ml/
    Documentation: https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/latest/en-US/index.html
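The package installs from PyPI with `pip install hana-ml`. A small check such as the sketch below can confirm from within Python that the package is visible to the active environment; the helper name `installed_version` is an assumption for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("hana-ml")
    if version is None:
        print("hana-ml is not installed in this environment")
    else:
        print(f"hana-ml {version} is installed")
```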
  16. Client setup 2 / 3: Add external IPv4 address to allow list

    • Get your external IPv4 address, e.g. from a site like https://www.showmyip.com/
    • 192.168.0.0 to 192.168.255.255 is a private range, not an external IP address
    • Add the address to the allow list in SAP Data Warehouse Cloud in System → Configuration
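Before adding an address to the allow list, it can help to rule out private addresses like the 192.168.x.x range mentioned above. The following sketch uses only the Python standard library; the function name `is_external_ipv4` is an assumption for illustration.

```python
import ipaddress


def is_external_ipv4(address: str) -> bool:
    """Return True if `address` is a globally routable IPv4 address."""
    try:
        ip = ipaddress.ip_address(address.strip())
    except ValueError:
        return False  # not a valid IP address at all
    # Private ranges (192.168.0.0/16, 10.0.0.0/8, ...) and loopback are not global.
    return ip.version == 4 and ip.is_global


if __name__ == "__main__":
    for candidate in ("192.168.1.20", "8.8.8.8"):
        print(candidate, is_external_ipv4(candidate))
```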
  17. Client setup 3 / 3: Store credentials securely (optional, but recommended)

    1) Test the connection from Python to SAP Data Warehouse Cloud with hardcoded credentials
    2) Store the credentials securely in the Secure User Store of the SAP HANA Client
    3) Log on with these securely stored credentials
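The three steps might look roughly like this with the hana_ml client. This is a sketch under assumptions: the key name `MYDWC`, the helper names and port 443 are illustrative, and the exact connection properties should be checked against the hana_ml documentation for your version.

```python
def connection_kwargs(userkey=None, address=None, user=None, password=None, port=443):
    """Build keyword arguments for hana_ml.dataframe.ConnectionContext.

    Step 1 uses address/user/password directly (for a first test only).
    Step 2 is done once on the command line with the SAP HANA client:
        hdbuserstore SET MYDWC <host:port> <user> <password>
    Step 3 then needs nothing but the key name (MYDWC is a made-up example).
    """
    common = {"encrypt": "true"}  # passed through to the SAP HANA client
    if userkey is not None:
        return {"userkey": userkey, **common}
    if None in (address, user, password):
        raise ValueError("provide either userkey or address, user and password")
    return {"address": address, "port": port, "user": user,
            "password": password, **common}


def connect(**kwargs):
    """Open a connection; hana_ml is imported lazily so the sketch loads without it."""
    from hana_ml import dataframe
    return dataframe.ConnectionContext(**connection_kwargs(**kwargs))
```

Calling `connect(userkey="MYDWC")` keeps the password out of notebooks and scripts, which is the point of the recommendation.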
  18. SAP Data Warehouse Cloud, SAP BW bridge: Key Capabilities & Value Proposition

    What is the SAP BW bridge?
    An SAP Data Warehouse Cloud feature that provides a path to the public cloud for SAP BW NetWeaver & SAP BW/4HANA customers. It offers SAP BW capabilities directly in SAP Data Warehouse Cloud:
      § Connectivity & Business Content providing proven SAP BW-based data integration (Extractors) from SAP ECC and SAP S/4HANA
      § Enterprise-ready staging layers of SAP BW for managing data loading with partitioning, monitoring, error handling
      § Tool-supported move of SAP BW-based integration and staging (for details see SAP Note 3141688)

    What are the key value propositions?
      § Reuse for business continuity: Leverage SAP BW data structures, transformations, customizations, and skills, quickly extending your SAP BW investments to the public cloud
      § Connect with confidence: Integrate on-premises SAP Business Suite data with familiar connectivity and semantic richness, retaining instant access while expanding your analytics depth
      § Innovate with cloud agility: Empower your business to rapidly innovate on BW data with an open, unified data & analytics cloud service, scaling innovation and efficiency in the cloud
  19. Want to know more? Additional content around Machine Learning with SAP Data Warehouse Cloud

    § Build your Machine Learning Scenario for your SAP HANA Cloud application from Python: https://www.youtube.com/watch?v=CX38-95uBtc
    § Why you should know more about SAP Data Warehouse Cloud: https://www.youtube.com/watch?v=7beZXTEBXJA
    § Innovate your IT landscape with SAP Data Warehouse Cloud, SAP BW Bridge: https://www.youtube.com/watch?v=Kb19xQvCMDg

    Hands-on: To try out the Machine Learning example (classification) yourself, email [email protected] to get the Jupyter Notebooks and data
  20. Book: Data Science mit SAP HANA

    Advanced analytics, machine learning and predictive analysis
      § Practical examples of using APL, PAL, Geospatial, Graph and Text
      § For all deployment options of SAP HANA: SAP Data Warehouse Cloud, SAP HANA Cloud, SAP HANA

    Data Science mit SAP HANA, 400 pages, hardcover, available from the end of October 2022
    Book | E-Book | Bundle, ISBN 978-3-8362-9033-3
    Pre-order now at www.sap-press.de/5539
  21. Open new career opportunities: Join the community of people with skills for the future

    Check learning.sap.com/teched to benefit like other certified experts:
      § 61% get promotions (1)
      § 91% increase confidence in abilities (1)
      § >71% increase problem-solving skills (2)

    Expand your conference experience. Become an SAP solution expert, now as easy as 1, 2, 3 in one place (free SAP TechEd offer):
      § Connect with experts, share your knowledge, expand your network, and collaborate with peers in SAP Community
      § Network with other participants in the group for SAP TechEd and join the SAP Learning Groups to get your learning questions answered
      § Follow expert-led learning journeys and live sessions for various development roles to upskill and prepare for certification
      § Benefit from the event-exclusive certification offer

    Sources:
    (1) “Pearson VUE’s Latest ‘Value of IT Certification’ Study Highlights Benefits of IT Certification in Challenging Times,” Pearson Education Inc., May 25, 2021.
    (2) Chuck Cooper, Why Get IT Certified? The Value of IT Certification: An IT Certification White Paper, IT Certification Council, March 2021.
  22. Thank you.

    Contact information: Andreas Forster, [email protected]
    © 2022 SAP SE or an SAP affiliate company. All rights reserved. See Legal Notice on www.sap.com/legal-notice for use terms, disclaimers, disclosures, or restrictions related to SAP Materials for general audiences.