Slide 1

Slide 1 text

Oracle Machine Learning Office Hours
Oracle Machine Learning for R: an Introduction
With Mark Hornick, Senior Director, Product Management, Data Science and Machine Learning (@MarkHornick)
Marcos Arancibia, Product Manager, Data Science and Big Data (@MarcosArancibia)
oracle.com/machine-learning
Copyright © 2020, Oracle and/or its affiliates. All rights reserved

Slide 2

Slide 2 text

Today’s Agenda
• Upcoming session
• Today’s speaker: Mark Hornick – Oracle Machine Learning for R: an Introduction
• Q&A

Slide 3

Slide 3 text

Next Session
November 5, 2020: Oracle Machine Learning Office Hours, 8AM US Pacific
Machine Learning 102 – Clustering
Join us for this special series “Oracle Machine Learning Office Hours – Machine Learning 102”, where we will go through the main steps of solving a business problem from beginning to end, using the different components available in Oracle Machine Learning: programming languages and interfaces, including Notebooks with SQL, UI, and languages like R and Python. This sixth session in the series will cover Clustering 102. We will learn more about clustering methods on multiple dimensions, how to compare clustering techniques, and how to use dimensionality reduction to extract only the most meaningful attributes from datasets with many attributes (or derived attributes).

Slide 4

Slide 4 text

For more information… oracle.com/machine-learning

Slide 5

Slide 5 text

https://www.oracle.com/cloud/free/

Slide 6

Slide 6 text

Today’s Session: Oracle Machine Learning for R: an Introduction
Oracle Machine Learning for R (OML4R) enables you to work with database tables and views using familiar R syntax and functions. For scalable and performant data exploration, data preparation, and machine learning, leverage Oracle Database as a high performance compute engine:
• Build machine learning models using parallelized in-database algorithms with R formula-based specification
• Invoke user-defined R functions from SQL for deployment in applications and dashboards, where R engines are dynamically spawned and controlled by Oracle Database
• Take advantage of running your R functions in a data-parallel and task-parallel manner

Slide 7

Slide 7 text

Oracle Machine Learning for R
An Introduction
Mark Hornick, Senior Director
Oracle Machine Learning Product Management
October 2020

Slide 8

Slide 8 text

Agenda
• ML pain points
• Oracle Machine Learning
• Introducing OML4R
• Demo
• Q&A

Slide 9

Slide 9 text

Sample of common enterprise machine learning pain points
• “It takes too long to get my data or to get the ‘right’ data”
• “I can’t analyze or mine all of my data – it has to be sampled”
• “Putting open source models and results into production takes too long and is ad hoc and complex”
• “Our company is concerned about data security, backup and recovery”
• “We need to build and score with 100s or 1000s of models fast to meet business objectives”

Slide 10

Slide 10 text

Oracle Machine Learning
• OML4SQL: SQL API
• OML4R: R API
• OML4Py*: Python API
• OML4Spark: R API on Big Data
• OML Notebooks: with Apache Zeppelin on Autonomous Database
• Oracle Data Miner: Oracle SQL Developer extension
• OML AutoML UI*: Code-free AutoML interface on Autonomous Database
• OML Services*: Model Deployment and Management, Cognitive Text
* Coming soon

Slide 11

Slide 11 text

Oracle Machine Learning interfaces to Oracle Database
[Diagram with three layers: Data Management Platform, Oracle Machine Learning Component, Tool]
• Data Management Platform: Autonomous Database, Oracle Database, Database Cloud Service
• Oracle Machine Learning Component: OML Notebooks, OML4SQL, Oracle Data Miner, OML4R, OML4Py*
• Tool: Apache Zeppelin (OML4SQL, OML4Py*, OML4R*), SQL Developer, SQL*Plus, R client, RStudio, Python client, Jupyter Notebooks
* Coming soon

Slide 12

Slide 12 text

Oracle Machine Learning for SQL
Empower SQL users with immediate access to ML included with Oracle Database and Oracle Autonomous Database
In-database, parallelized, distributed algorithms
• No extracting data to separate ML engine
• Fast and scalable
• Batch and real-time scoring
• Explanatory prediction details
ML models as first class database objects
• Access control via permissions
• Audit user actions
• Export / import models across databases
Supports R and Python interfaces
Leverage ML across Oracle stack
[Diagram: SQL Interfaces (SQL*Plus, SQLDeveloper, …) and OML Notebooks connect to Oracle Autonomous Database and Oracle Database with OML]

Slide 13

Slide 13 text

Traditional analytics and data source interaction
• Access latency
• Paradigm shift: R/Python → Data Access Language → R/Python
• Memory limitation – data size, in-memory processing
• Single threaded
• Issues for backup, recovery, security
• Ad hoc production deployment
[Diagram: data moves between a Data Source or Flat Files and the analytics tool through extract / export, read, export, and load steps, using data source connectivity packages or reading/writing files with built-in tool capabilities; Deployment is an ad hoc cron job]

Slide 14

Slide 14 text

Oracle Machine Learning for R
Empower data scientists with open source environments
• Oracle Database as HPC environment
• In-database parallelized and distributed machine learning algorithms
• Manage scripts and objects in Oracle Database
• Integrate results into applications and dashboards via SQL
• Use Oracle R Distribution or open source R
[Diagram: OML4R, SQL Interface, Database Server Machine]

Slide 15

Slide 15 text

Oracle Machine Learning for R
Empower data scientists with open source environments
Transparency layer
• Leverage proxy objects so data remain in database
• Overload native functions translating functionality to SQL
• Use familiar R syntax on database data
Parallel, distributed algorithms
• Scalability and performance
• Exposes in-database algorithms available from OML4SQL
Embedded execution
• Manage and invoke R scripts in Oracle Database
• Data-parallel, task-parallel, and non-parallel execution
• Use open source packages to augment functionality
[Diagram: OML4R, SQL Interface, Database Server Machine]
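The transparency-layer idea (a proxy object that keeps data in the database while overloaded R functions are translated to SQL) can be illustrated with a toy sketch in base R. This is not OML4R code and does not reflect how OML4R is implemented: the class toy_proxy and the helper to_sql are hypothetical names invented for illustration, and the operator translation covers only the `==` and `&` used here.

```r
# Toy proxy object: stores a table name and a pending filter, never the rows.
# Conceptual illustration only; toy_proxy and to_sql are hypothetical names.
toy_proxy <- function(table, where = NULL) {
  structure(list(table = table, where = where), class = "toy_proxy")
}

# Render the SQL text a database would be asked to run for this proxy.
to_sql <- function(x, select = "*") {
  sql <- paste("SELECT", select, "FROM", x$table)
  if (!is.null(x$where)) sql <- paste(sql, "WHERE", x$where)
  sql
}

# Overload the base R generic subset(): capture the unevaluated R condition
# and translate its operators into a SQL predicate.
subset.toy_proxy <- function(x, subset, ...) {
  cond <- paste(deparse(substitute(subset)), collapse = " ")
  cond <- gsub("==", "=", cond, fixed = TRUE)
  cond <- gsub("&", "AND", cond, fixed = TRUE)
  toy_proxy(x$table, where = cond)
}

# Familiar R syntax on the proxy; only SQL text is produced, no data is moved.
usage <- toy_proxy("CUST_USAGE_DATA")
heavy <- subset(usage, Consumption > 100 & CUST_ID == 7)
sql_text <- to_sql(heavy)
print(sql_text)
```

The point of the sketch is the division of labor: the R session holds only metadata, and the work implied by the R expression is expressed as SQL for the database to execute.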

Slide 16

Slide 16 text

Example using OML4R
Proxy objects
[Diagram: Proxy data.frame inherits from data.frame]

Slide 17

Slide 17 text

Mapping between R and Oracle Database Data Types

SQL – ROracle Read | R | SQL – ROracle Write
varchar2, char, clob, rowid | character | varchar2(4000)
number, float, binary_float, binary_double | numeric | if(ora.number==T) number else binary_double
integer | integer | integer
(none) | logical | integer
date, timestamp | POSIXct | timestamp
(none) | Date | timestamp
interval day to second | difftime | interval day to second
raw, blob, bfile | ‘list’ of ‘raw’ vectors | raw(2000)
(none) | factor (and other types) | character
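To make the R column of this mapping concrete, the following base R sketch builds a small local data.frame and reports the class of each column. It makes no database calls; the data.frame local_df and its values are made up for illustration. Per the mapping above, these R classes would be written as varchar2(4000), number or binary_double, integer, integer, timestamp, timestamp, interval day to second, and character, respectively.

```r
# Hypothetical local data with one column per R type named in the mapping.
local_df <- data.frame(
  name    = c("a", "b"),                      # character
  reading = c(1.5, 2.5),                      # numeric
  count   = c(1L, 2L),                        # integer
  flag    = c(TRUE, FALSE),                   # logical
  seen_at = as.POSIXct(c("2020-10-01 08:00:00", "2020-10-02 09:30:00"),
                       tz = "UTC"),           # POSIXct
  day     = as.Date(c("2020-10-01", "2020-10-02")),  # Date
  stringsAsFactors = FALSE
)
# Subtracting two POSIXct values yields a difftime.
local_df$elapsed <- local_df$seen_at - local_df$seen_at[1]
local_df$level   <- factor(c("low", "high"))  # factor

# First class of each column, as R reports it.
r_types <- vapply(local_df, function(col) class(col)[1], character(1))
print(r_types)
```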

Slide 18

Slide 18 text

OML4R Algorithms
Classification
• Decision Tree
• Logistic Regression
• Naïve Bayes
• Support Vector Machine
• RandomForest
Regression
• Linear Model
• Generalized Linear Model
• Multi-Layer Neural Networks
• Stepwise Linear Regression
• Support Vector Machine
Attribute Importance
• Minimum Description Length
Clustering
• Hierarchical k-Means
• Orthogonal Partitioning
• Expectation Maximization
Feature Extraction
• Nonnegative Matrix Factorization
• Principal Component Analysis
• Singular Value Decomposition
• Explicit Semantic Analysis
Market Basket Analysis
• Apriori – Association Rules
Anomaly Detection
• 1 Class Support Vector Machine
Time Series
• Single Exponential Smoothing
• Double Exponential Smoothing
…plus open source R packages for algorithms in combination with embedded R data- and task-parallel execution
Supports automatic data preparation, partitioned model ensembles, integrated text mining

Slide 19

Slide 19 text

Scalable Data Analysis – Model Building
Smart meter scenario
[Diagram: Data in Oracle Database is partitioned by customer (c1, c2, …, ci, …, cn). An R script f(dat,args,…) { build model } from the R Script Repository is invoked once per partition, producing Model c1, Model c2, …, Model ci, …, Model cn, which are stored in the R Datastore.]

Slide 20

Slide 20 text

Build models and store in database, partition on CUST_ID

    ore.groupApply(
      CUST_USAGE_DATA,            # data to partition
      CUST_USAGE_DATA$CUST_ID,    # partitioning column: one function call per customer
      function(dat, ds.name) {
        cust_id <- dat$CUST_ID[1]
        mod <- lm(Consumption ~ . -CUST_ID, dat)
        # Drop large components to keep the stored model small
        mod$effects <- mod$residuals <- mod$fitted.values <- NULL
        name <- paste("mod", cust_id, sep="")
        assign(name, mod)
        ds.name1 <- paste(ds.name, ".", cust_id, sep="")
        # Save the model under its per-customer name in a per-customer datastore
        ore.save(list=name, name=ds.name1, overwrite=TRUE)
        TRUE
      },
      ds.name="myDatastore", ore.connect=TRUE, parallel=TRUE
    )
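For readers without a database connection, the same partition-then-model pattern can be tried locally with base R's split() and lapply(). This is only a local analogue under stated assumptions: it does not use OML4R, it runs serially in one R session rather than in database-spawned R engines, and the data.frame usage (with its Temp and Hour columns) is synthetic stand-in data invented here, not the CUST_USAGE_DATA table from the slide.

```r
set.seed(1)

# Synthetic stand-in for CUST_USAGE_DATA (values are made up).
usage <- data.frame(
  CUST_ID = rep(1:3, each = 20),
  Temp    = runif(60, 0, 30),
  Hour    = rep(0:19, times = 3)
)
usage$Consumption <- 5 + 0.3 * usage$Temp + 0.1 * usage$Hour +
  rnorm(60, sd = 0.5)

# Same model-building body as the slide's function, minus the datastore save.
build_model <- function(dat) {
  mod <- lm(Consumption ~ . - CUST_ID, data = dat)
  # Drop large components, as on the slide, to keep each model small.
  mod$effects <- mod$residuals <- mod$fitted.values <- NULL
  mod
}

# Partition by customer and fit one model per partition.
models <- lapply(split(usage, usage$CUST_ID), build_model)
names(models) <- paste0("mod", names(models))

# Score new observations for one customer with that customer's model.
new_obs <- data.frame(CUST_ID = 2L, Temp = c(10, 25), Hour = c(8, 18))
pred <- predict(models[["mod2"]], newdata = new_obs)
```

In the slide's version, ore.groupApply plays the role of split() plus lapply(), and ore.save replaces keeping the models in a local list.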

Slide 21

Slide 21 text

Demo of Oracle Machine Learning for R

Slide 22

Slide 22 text

Mission and Vision
Promote the R language and lead initiatives in support of the R community
The R Consortium is committed to helping evolve the R language by identifying, developing, and implementing infrastructure projects. The R Consortium works with and provides support to the R Foundation and key organizations developing, maintaining, distributing, and using R software.
https://r-consortium.org

Slide 23

Slide 23 text

Membership
“We see R as the future of statistical analysis because of its flexibility and the strong active community behind it.” – Alun Bedding, Director of Biostatistics at Genentech
Become a member: membership@r-consortium.org

Slide 24

Slide 24 text

Why Oracle for Machine Learning with R?
• Empower data scientists and R users with powerful in-database ML from R
• Eliminate costly data movement and latency
• Scale R for data exploration, data preparation, and ML algorithms
• In-database algorithms supporting: regression, classification, time series, association rules, attribute importance, clustering, feature extraction, anomaly detection
• Automatic algorithm-specific data preparation, partition models, integrated text mining
• Ease of ML model and R script deployment with data-parallel and task-parallel support
• Leverage existing backup, recovery, and security mechanisms and protocols of Oracle Database
Oracle Database and Oracle Autonomous Database: that’s where most enterprise data lives – bring the algorithms to the data!
Oracle integrates ML across the Oracle Stack and the Enterprise

Slide 25

Slide 25 text

Q & A

Slide 26

Slide 26 text

Thank You
Marcos Arancibia | marcos.arancibia@oracle.com
Mark Hornick | mark.hornick@oracle.com
Oracle Machine Learning Product Management