Is It Corked? Wine Machine Learning Predictions with OAC

FTisiot
December 03, 2019

Transcript

  1. Is It Corked? Wine Machine Learning Predictions with OAC
    Francesco Tisiot - @Ftisiot
    Analytics Tech Lead - Rittman Mead
    Charlie Berger - @CharlieDataMine
    Sr Director Product Management - Oracle

  2. Francesco Tisiot
    Analytics Tech Lead
    Verona, Italy
    Over 10 Years in Analytics
    Oracle ACE Director
    ITOUG Board President
    http://ritt.md/ftisiot
    [email protected]
    @FTisiot

  3. Charlie Berger
    Sr. Director of Product Management
    Machine Learning, AI, Cognitive Analytics
    @CharlieDataMine

  4. Data Engineering, Analytics, Data Science
    www.rittmanmead.com
    [email protected]
    @rittmanmead

  5. Agenda
    • OAC
    • Data Scientist
    • Become a Data Scientist

  6. Oracle Analytics Cloud
    • Platform Services (PaaS)
    • Delivered entirely in the cloud:
      • No infrastructure footprint
      • Flexibility
      • Simplified, metered licensing
    • Several options to suit your needs:
      • BYOL
      • Functionality bundled into 2 editions:
        • Professional
        • Enterprise

  7. Functions
    OAC supports every type of analytics
    Classic, Modern

  8. Augmented Analytics
    Data Enrichment
    Suggestions
    Explain
    One-Click Advanced Analytics
    Advanced Machine Learning
    Natural Language Processing

  9. OAC and Data Science

  10. Basic Operations
    What are the Drivers for My Sales?
    Based on my Experience I can Guess…
    Statistically Significant Drivers for Sales Are …
    Augmented Analytics

  11. Basic Operations
    Is this Client going to accept the Offer?
    YES/NO
    50%
    70%
    Basic ML Model

  12. Before Starting… Define the Problem!

  13. Problem Definition:
    Predicting Wine Quality

  14. Rule Based
    Italy or France -> Good
    Rest of the World -> Bad
    Price >= 10 Euros -> Good
    Price < 10 Euros -> Bad
    Price > 30 & Production Zone = Veneto & …. -> 6.5
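
The first two pairs of rules on this slide can be written directly as code. The sketch below is only an illustration: the `Wine` class, its field names, and the three sample wines are assumptions made for the example, not part of the deck or its dataset.

```python
from dataclasses import dataclass


@dataclass
class Wine:
    # Hypothetical record layout, chosen only for this illustration.
    country: str
    price_eur: float


def rule_by_country(wine: Wine) -> str:
    """Slide rule: Italy or France -> Good, rest of the world -> Bad."""
    return "Good" if wine.country in {"Italy", "France"} else "Bad"


def rule_by_price(wine: Wine) -> str:
    """Slide rule: price >= 10 euros -> Good, price < 10 euros -> Bad."""
    return "Good" if wine.price_eur >= 10 else "Bad"


# Made-up sample wines, not taken from the dataset.
wines = [Wine("Italy", 8.0), Wine("Chile", 25.0), Wine("France", 12.0)]

country_labels = [rule_by_country(w) for w in wines]
price_labels = [rule_by_price(w) for w in wines]

print(country_labels)
print(price_labels)
```

On these sample wines the two rules disagree for the first two bottles, which hints at why hand-written rules become hard to maintain as conditions pile up.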

  15. Task, Experience, Performance (TEP)
    Task: Estimate Wine Good/Bad
    Experience: Corpus of Wine Descriptions with Ratings
    Performance: Accuracy

  16. Accuracy
    Confusion matrix: Real Value (Good, Bad) vs. Predicted Value (Good, Bad)
    Accuracy = (Good predicted as Good + Bad predicted as Bad) / (all predictions)
    Icons made by Smashicons from www.flaticon.com
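
Accuracy, as used on this slide, is the share of wines whose predicted label matches the real one. A minimal sketch follows; the function name and the five sample labels are assumptions made up for illustration.

```python
def accuracy(real, predicted):
    """Fraction of predictions that match the real value."""
    if len(real) != len(predicted):
        raise ValueError("real and predicted must have the same length")
    if not real:
        raise ValueError("at least one observation is required")
    correct = sum(r == p for r, p in zip(real, predicted))
    return correct / len(real)


# Made-up labels: 4 of the 5 predictions match the real value.
real = ["Good", "Bad", "Good", "Bad", "Good"]
predicted = ["Good", "Bad", "Bad", "Bad", "Good"]

score = accuracy(real, predicted)
print(score)
```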

  17. Dataset
    https://www.kaggle.com/zynicide/wine-reviews

  18. Become a Data Scientist with OAC
    Connect -> Clean -> Transform & Enrich -> Analyse -> Train & Evaluate -> Predict

  19. Connection Options in OAC
    Pre-Defined Data Models
    External Data Sources

  20. Cleaning What?
    Feature Scaling: 0-200k -> 0-1
    Train/Test Set Split: Train: 80%, Test: 20%
    Labelling Columns: Col1 -> Name
    Irrelevant Observations: City “Rome”
    Wrong Values: Mark <> MArk
    Missing Values: N/A
    Handling Outliers: Role: CIO, Salary: 500 K$
    Aggregation: # of Clicks
    Tools shown: CASE … WHEN…, UPPER, FILTER, COLUMN RENAME, FILTER, KPI/(MAX-MIN), FILTER?, COUNT, Automated (x3)
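
Two of the cleaning steps on this slide, Feature Scaling and the 80%/20% Train/Test Set Split, can be sketched in a few lines. This is not how OAC implements them; the function names, the seed, and the sample values are assumptions. The scaling uses the common min-max form, (value - MIN) / (MAX - MIN), which matches the slide's KPI/(MAX-MIN) whenever the minimum is 0.

```python
import random


def min_max_scale(values):
    """Rescale values to the 0-1 range using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly, then cut them into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up values in the 0-200k range mentioned on the slide.
salaries = [0, 50_000, 100_000, 200_000]
scaled = min_max_scale(salaries)

train, test = train_test_split(range(10))
print(scaled)
print(len(train), len(test))
```

Shuffling before the cut keeps the split from depending on the original row order; fixing the seed makes the split repeatable.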

  21. Feature Engineering
    Location -> ZIP Code
    2 Locations -> Distance
    Name -> Sex
    Day/Month/Year -> Date
    Data Flow
    Additional Data Sources?
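
Two of the transformations listed on this slide, Day/Month/Year -> Date and 2 Locations -> Distance, can be sketched with the standard library alone. The function names are hypothetical, the city coordinates are approximate values chosen for illustration, and the distance uses the haversine great-circle formula, which is one common choice rather than anything the deck prescribes.

```python
import math
from datetime import date


def to_date(day, month, year):
    """Combine separate day, month and year columns into one date."""
    return date(year, month, day)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


review_date = to_date(3, 12, 2019)

# Approximate coordinates, used only as sample input.
verona = (45.4384, 10.9916)
rome = (41.9028, 12.4964)
distance = haversine_km(*verona, *rome)

print(review_date.isoformat())
print(round(distance))
```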

  22. Data Preparation Recommendations

  23. Spatial Enrichment
    Oracle Spatial Studio
    http://ritt.md/spatial-studio

  24. Data Overview

  25. Analyse - Explain

  26. Explain - Key Drivers

  27. Train - What Problem are we Trying to Solve?
    Supervised: “I want to predict the value of Y, here are some examples”
      Classification
      Regression
    Unsupervised: “Here is a dataset, make sense out of it!”
      Clustering
    https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d

  28. Model Training - Easy Models

  29. DataFlow Train Model

  30. Which Model - Parameters To Pick?

  31. Select, Try, Save, Change, Try, Save …..

  32. Compare - Classification

  33. Use On the Fly or with a Dataflow

  34. Congratulations!
    …You are now a Data Scientist!

  35. ML Production Deployment
    Data Scientist
    ML -> Data
    Oracle Machine Learning

  36. Oracle Machine Learning
    OML Microservices*: Supporting Oracle Applications (Image, Text, Scoring, Deployment, Model Management)
    OML4SQL: Oracle Advanced Analytics, SQL API
    OML4Py*: Python API
    OML4R: Oracle R Enterprise, R API
    OML Notebooks: with Apache Zeppelin on Autonomous Database
    OML4Spark: Oracle R Advanced Analytics for Hadoop
    Oracle Data Miner: Oracle SQL Developer extension
    * Coming soon
    Copyright © 2019 Oracle and/or its affiliates.

  37. Oracle Machine Learning Algorithms
    CLASSIFICATION
      Naïve Bayes
      Logistic Regression (GLM)
      Decision Tree
      Random Forest
      Neural Network
      Support Vector Machine
      Explicit Semantic Analysis
    CLUSTERING
      Hierarchical K-Means
      Hierarchical O-Cluster
      Expectation Maximization (EM)
    ANOMALY DETECTION
      One-Class SVM
    TIME SERIES
      Forecasting - Exponential Smoothing
      Includes popular models, e.g. Holt-Winters with trends, seasonality, irregularity, missing data
    REGRESSION
      Linear Model
      Generalized Linear Model
      Support Vector Machine (SVM)
      Stepwise Linear Regression
      Neural Network
      LASSO
    ATTRIBUTE IMPORTANCE
      Minimum Description Length
      Principal Component Analysis (PCA)
      Unsupervised Pair-wise KL Divergence
      CUR decomposition for row & AI
    ASSOCIATION RULES
      Apriori / market basket
    PREDICTIVE QUERIES
      Predict, cluster, detect, features
    SQL ANALYTICS
      SQL Windows
      SQL Patterns
      SQL Aggregates
    FEATURE EXTRACTION
      Principal Component Analysis (PCA)
      Non-negative Matrix Factorization
      Singular Value Decomposition (SVD)
      Explicit Semantic Analysis (ESA)
    TEXT MINING SUPPORT
      Algorithms support text
      Tokenization and theme extraction
      Explicit Semantic Analysis (ESA) for document similarity
    STATISTICAL FUNCTIONS
      Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
    R PACKAGES
      Third-party R Packages through Embedded Execution
      Spark MLlib algorithm integration
    MODEL DEPLOYMENT
      SQL - 1st Class Objects
      Oracle RESTful API (ORDS)
      OML Microservices (for Apps)
    • Includes support for Partitioned Models, Transactional, Unstructured, Geo-spatial, Graph data, etc.

  38. Oracle Machine Learning
    Machine Learning Notebook for Autonomous Data Warehouse Cloud
    Key Features:
    Collaborative UI for data scientists
    Packaged with Autonomous Data Warehouse Cloud
    Easy access to shared notebooks, templates, permissions, scheduler, etc.
    SQL ML algorithms API
    Supports deployment of ML analytics

  39. www.analyticsanddatasummit.org/techcasts

  40. Become a Data Scientist with OAC
    http://ritt.md/OAC-datascience

  41. ML in Action with OAC
    http://ritt.md/OAC-ML-Video

  42. https://www.rittmanmead.com/insight-lab/
    Insights Lab

  43. Tech Days 2020
    Milan 29th Jan, Rome 31st Jan

  44. Is It Corked? Wine Machine Learning Predictions with OAC
    Francesco Tisiot - @Ftisiot
    Analytics Tech Lead - Rittman Mead
    Charlie Berger - @CharlieDataMine
    Sr Director Product Management - Oracle
