Slide 1

Slide 1 text

Is It Corked? Wine Machine Learning Predictions with OAC
Francesco Tisiot - @Ftisiot, Analytics Tech Lead - Rittman Mead
Charlie Berger - @CharlieDataMine, Sr Director Product Management - Oracle

Slide 2

Slide 2 text

Francesco Tisiot, Analytics Tech Lead
• Verona, Italy
• Over 10 Years in Analytics
• Oracle ACE Director
• ITOUG Board President
• http://ritt.md/ftisiot
• [email protected]
• @FTisiot

Slide 3

Slide 3 text

Charlie Berger Sr. Director of Product Management Machine Learning, AI, Cognitive Analytics @CharlieDataMine

Slide 4

Slide 4 text

Data Engineering Analytics Data Science www.rittmanmead.com [email protected] @rittmanmead

Slide 5

Slide 5 text

Agenda •OAC •Data Scientist •Become a Data Scientist

Slide 6

Slide 6 text

Oracle Analytics Cloud
• Platform Services (PaaS)
• Delivered entirely in the cloud:
  •No infrastructure footprint
  •Flexibility
  •Simplified, metered licensing
• Several options to suit your needs:
  •BYOL
  •Functionality bundled into 2 editions
    •Professional
    •Enterprise

Slide 7

Slide 7 text

Functions
OAC supports every type of analytics: Classic, Modern

Slide 8

Slide 8 text

Augmented Analytics
• Data Enrichment Suggestions
• Explain
• One-Click Advanced Analytics
• Advanced Machine Learning
• Natural Language Processing

Slide 9

Slide 9 text

OAC and Data Science

Slide 10

Slide 10 text

What are the Drivers for My Sales?
• Basic Operations: “Based on my Experience I can Guess…”
• Augmented Analytics: “Statistically Significant Drivers for Sales Are…”

Slide 11

Slide 11 text

Basic Operations Is this Client going to accept the Offer? YES/NO 50% 70% Basic ML Model

Slide 12

Slide 12 text

Before Starting…. Define the Problem!

Slide 13

Slide 13 text

Problem Definition: Predicting Wine Quality

Slide 14

Slide 14 text

Rule Based
• Italy or France -> Good
• Rest of the World -> Bad
• Price >= 10 Euros -> Good
• Price < 10 Euros -> Bad
• Price > 30 & Production Zone = Veneto & …. -> 6.5
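
The hand-written rules on this slide can be sketched in Python. This is only an illustration: the function names `rule_by_country` and `rule_by_price` and the sample wine are assumptions, not something shown in the talk. The point is that two simple rules already disagree on the same bottle, which is why rule-based scoring gets hard to maintain.

```python
def rule_by_country(country: str) -> str:
    """Slide rule: Italy or France -> Good, rest of the world -> Bad."""
    return "Good" if country in {"Italy", "France"} else "Bad"


def rule_by_price(price_eur: float) -> str:
    """Slide rule: price >= 10 Euros -> Good, otherwise Bad."""
    return "Good" if price_eur >= 10 else "Bad"


# Hypothetical wine: an 8 Euro bottle from Italy.
country, price_eur = "Italy", 8.0
print(rule_by_country(country))  # the country rule's verdict
print(rule_by_price(price_eur))  # the price rule's verdict, which conflicts
```

Resolving such conflicts needs yet more rules (as in the last, multi-condition rule on the slide), whereas a trained model learns the combination from examples.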

Slide 15

Slide 15 text

TEP
• Task: Estimate Wine Good/Bad
• Experience: Corpus of Wines Descriptions with Ratings
• Performance: Accuracy

Slide 16

Slide 16 text

Accuracy
Real Value vs Predicted Value (Good / Bad)
Accuracy = Correct Predictions / (Correct Predictions + Wrong Predictions)
Icons made by Smashicons from www.flaticon.com
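
As a minimal sketch of the accuracy measure on this slide, the snippet below counts how often the predicted label matches the real one. The function name `accuracy` and the five sample labels are made up for illustration.

```python
def accuracy(actual, predicted):
    """Share of predictions that match the real value."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Illustrative labels only, not taken from the wine dataset.
actual = ["Good", "Bad", "Good", "Bad", "Good"]
predicted = ["Good", "Bad", "Bad", "Bad", "Good"]
print(accuracy(actual, predicted))  # 4 of 5 predictions match -> 0.8
```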

Slide 17

Slide 17 text

Dataset https://www.kaggle.com/zynicide/wine-reviews

Slide 18

Slide 18 text

The Data

Slide 19

Slide 19 text

Bad Good

Slide 20

Slide 20 text

Become a Data Scientist with OAC
Connect -> Clean -> Transform & Enrich -> Analyse -> Train & Evaluate -> Predict

Slide 21

Slide 21 text

Connection Options in OAC Pre-Defined Data Models External Data Sources

Slide 22

Slide 22 text

Cleaning What?
• Missing Values: N/A
• Wrong Values: Mark <> MArk
• Irrelevant Observations: City “Rome”
• Labelling Columns: Col1 -> Name
• Handling Outliers: Role: CIO, Salary: 500 K$
• Aggregation: # of Clicks
• Feature Scaling: 0-200k -> 0-1
• Train/Test Set Split: Train: 80%, Test: 20%
OAC options shown: CASE … WHEN…, UPPER, FILTER, COLUMN RENAME, FILTER, KPI/(MAX-MIN), FILTER?, COUNT, Automated, Automated, Automated
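
Two of the cleaning steps on this slide, feature scaling (0-200k to 0-1) and the 80%/20% train/test split, can be sketched as follows. The function names, the fixed seed and the sample salaries are assumptions for illustration; the slide marks some steps as automated in OAC, so this only shows what happens conceptually. The slide abbreviates scaling as KPI/(MAX-MIN); the sketch also subtracts the minimum, which gives the same result when the minimum is 0.

```python
import random


def min_max_scale(values):
    """Rescale numbers to the 0-1 range: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then cut off the test share."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


salaries = [0, 50_000, 100_000, 200_000]  # made-up values in the 0-200k range
print(min_max_scale(salaries))

train, test = train_test_split(range(10))
print(len(train), len(test))  # 8 rows for training, 2 for testing
```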

Slide 23

Slide 23 text

Feature Engineering
• Location -> ZIP Code
• 2 Locations -> Distance
• Name -> Sex
• Day/Month/Year -> Date
• Data Flow
• Additional Data Sources?
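
Two of the feature-engineering ideas on this slide, "Day/Month/Year -> Date" and "2 Locations -> Distance", can be sketched in Python. The helper names `to_date` and `haversine_km`, the city coordinates and the choice of the haversine great-circle formula are assumptions for illustration; the talk itself does this inside an OAC Data Flow.

```python
import math
from datetime import date


def to_date(day: int, month: int, year: int) -> date:
    """Combine three separate columns into one date feature."""
    return date(year, month, day)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


print(to_date(29, 1, 2020))

# Approximate coordinates for Milan and Rome.
milan = (45.4642, 9.1900)
rome = (41.9028, 12.4964)
print(round(haversine_km(*milan, *rome)))  # distance in km, straight line
```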

Slide 24

Slide 24 text

Data Preparation Recommendations

Slide 25

Slide 25 text

Spatial Enrichment Oracle Spatial Studio http://ritt.md/spatial-studio

Slide 26

Slide 26 text

Data Overview

Slide 27

Slide 27 text

Analyse - Explain

Slide 28

Slide 28 text

Explain - Key Drivers

Slide 29

Slide 29 text

Train - What Problem are we Trying to Solve?
• Supervised: “I want to predict the value of Y, here are some examples” (Classification, Regression)
• Unsupervised: “Here is a dataset, make sense out of it!” (Clustering)
https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
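
The difference between the two families can be shown with a toy sketch: the supervised function receives examples with labels, the unsupervised one only receives raw numbers. Both functions (a 1-nearest-neighbour classifier and a one-dimensional k-means with k=2) and the prices are illustrative assumptions, not the algorithms or data used in the talk.

```python
def nearest_neighbour_predict(examples, x):
    """Supervised: examples are (feature, label) pairs.

    Predict the label of the example whose feature is closest to x.
    """
    _, label = min(examples, key=lambda ex: abs(ex[0] - x))
    return label


def two_means(values, iterations=10):
    """Unsupervised: split unlabelled numbers into two groups (1-D k-means, k=2)."""
    c_low, c_high = min(values), max(values)  # start from the extremes
    low, high = list(values), []
    for _ in range(iterations):
        low = [v for v in values if abs(v - c_low) <= abs(v - c_high)]
        high = [v for v in values if abs(v - c_low) > abs(v - c_high)]
        if not high:  # all values identical, nothing to separate
            break
        c_low, c_high = sum(low) / len(low), sum(high) / len(high)
    return low, high


# Made-up wine prices in Euros.
labelled = [(5, "Bad"), (8, "Bad"), (25, "Good"), (40, "Good")]
print(nearest_neighbour_predict(labelled, 30))  # closest example is the 25 Euro wine

unlabelled = [5, 8, 25, 40]
print(two_means(unlabelled))  # two price groups, no labels involved
```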

Slide 30

Slide 30 text

Model Training - Easy Models

Slide 31

Slide 31 text

DataFlow Train Model

Slide 32

Slide 32 text

Which Model - Parameters To Pick?

Slide 33

Slide 33 text

Select, Try, Save, Change, Try, Save …..

Slide 34

Slide 34 text

Compare

Slide 35

Slide 35 text

Compare - Classification

Slide 36

Slide 36 text

Use On the Fly or with a Dataflow

Slide 37

Slide 37 text

Demo

Slide 38

Slide 38 text

Congratulations! …You are now a Data Scientist!

Slide 39

Slide 39 text

ML Production Deployment Data Scientist ML -> Data Oracle Machine Learning

Slide 40

Slide 40 text

Oracle Machine Learning
• OML4SQL: Oracle Advanced Analytics, SQL API
• OML4Py*: Python API
• OML4R: Oracle R Enterprise, R API
• OML Notebooks: with Apache Zeppelin on Autonomous Database
• OML4Spark: Oracle R Advanced Analytics for Hadoop
• Oracle Data Miner: Oracle SQL Developer extension
• OML Microservices*: Supporting Oracle Applications (Image, Text, Scoring, Deployment, Model Management)
* Coming soon
Copyright © 2019 Oracle and/or its affiliates.

Slide 41

Slide 41 text

Oracle Machine Learning Algorithms
• CLASSIFICATION: Naïve Bayes, Logistic Regression (GLM), Decision Tree, Random Forest, Neural Network, Support Vector Machine, Explicit Semantic Analysis
• CLUSTERING: Hierarchical K-Means, Hierarchical O-Cluster, Expectation Maximization (EM)
• ANOMALY DETECTION: One-Class SVM
• TIME SERIES: Forecasting - Exponential Smoothing. Includes popular models, e.g. Holt-Winters with trends, seasonality, irregularity, missing data
• REGRESSION: Linear Model, Generalized Linear Model, Support Vector Machine (SVM), Stepwise Linear regression, Neural Network, LASSO
• ATTRIBUTE IMPORTANCE: Minimum Description Length, Principal Comp Analysis (PCA), Unsupervised Pair-wise KL Div, CUR decomposition for row & AI
• ASSOCIATION RULES: A priori/market basket
• PREDICTIVE QUERIES: Predict, cluster, detect, features
• SQL ANALYTICS: SQL Windows, SQL Patterns, SQL Aggregates
• FEATURE EXTRACTION: Principal Comp Analysis (PCA), Non-negative Matrix Factorization, Singular Value Decomposition (SVD), Explicit Semantic Analysis (ESA)
• TEXT MINING SUPPORT: Algorithms support text, Tokenization and theme extraction, Explicit Semantic Analysis (ESA) for document similarity
• STATISTICAL FUNCTIONS: Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
• R PACKAGES: Third-party R Packages through Embedded Execution, Spark MLlib algorithm integration
• MODEL DEPLOYMENT: SQL - 1st Class Objects, Oracle RESTful API (ORDS), OML Microservices (for Apps)
Includes support for Partitioned Models, Transactional, Unstructured, Geo-spatial, Graph data, etc.

Slide 42

Slide 42 text

Oracle Machine Learning
Machine Learning Notebook for Autonomous Data Warehouse Cloud
Key Features:
• Collaborative UI for data scientists
• Packaged with Autonomous Data Warehouse Cloud
• Easy access to shared notebooks, templates, permissions, scheduler, etc.
• SQL ML algorithms API
• Supports deployment of ML analytics

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

www.analyticsanddatasummit.org/techcasts

Slide 45

Slide 45 text

Become a Data Scientist with OAC http://ritt.md/OAC-datascience

Slide 46

Slide 46 text

ML in Action with OAC http://ritt.md/OAC-ML-Video

Slide 47

Slide 47 text

https://www.rittmanmead.com/insight-lab/ Insights Lab

Slide 48

Slide 48 text

Tech Days 2020: Milan 29th Jan, Rome 31st Jan

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

Is It Corked? Wine Machine Learning Predictions with OAC
Francesco Tisiot - @Ftisiot, Analytics Tech Lead - Rittman Mead
Charlie Berger - @CharlieDataMine, Sr Director Product Management - Oracle