Is it Corked? Wine Machine Learning Predictions with OAC

FTisiot
March 14, 2019


Transcript

  2. Is It Corked? Wine Machine Learning Predictions with OAC
    Francesco Tisiot, BI Tech Lead at Rittman Mead
    info@rittmanmead.com, www.rittmanmead.com, @rittmanmead
  3. Francesco Tisiot, BI Tech Lead at Rittman Mead
    Oracle ACE, 10 years in analytics, based in Verona, Italy
    francesco.tisiot@rittmanmead.com, @FTisiot, http://ritt.md/ftisiot
  4. Rittman Mead: specialised in data visualisation, predictive analytics, enterprise reporting and data engineering.
    We enable the business, the consumers, the data providers and IT to work towards a common goal, delivering innovative and cost-effective solutions based on our core values of thought leadership, hard work and honesty. We work across multiple verticals on projects that range from mature, large-scale implementations to proofs of concept, and can provide skills in development, architecture, delivery, training and support.
  5. Do You Like Alfredo’s Fettuccine?

  6. Who is Alfredo?

  7. Agenda • Tooling • Data Science Steps • Demo

  8. Tooling

  9. Oracle Analytics Cloud
    • Oracle’s complete suite of Platform Services (PaaS) for unified analytics in the cloud
    • Delivered entirely in the cloud:
      ‣ No infrastructure footprint
      ‣ Flexibility to scale up or down based on your immediate needs
      ‣ Simplified, metered licensing
    • Several options to suit your needs:
      ‣ Oracle or customer/partner managed services
      ‣ Functionality bundled into 3 editions
  10. Functions
    • OAC supports every type of analytics workload across your organisation
    • Classic enterprise BI:
      ‣ Analysis & dashboarding
      ‣ Published reporting
      ‣ Enterprise Performance Management
    • Modern departmental/personal discovery:
      ‣ Extended data mashup & modelling
      ‣ Data preparation, exploration & visualisation
      ‣ Data science & machine learning
  11. Classic Enterprise BI
    • Similar user experience to OBIEE 12c
      ‣ Centrally maintained & governed
      ‣ Semantic model remains key
    • Interactive Dashboards
      ‣ Ideal for KPI measurement & monitoring
      ‣ Guided navigation paths
    • BI Publisher
      ‣ Highly formatted, burst outputs
    • Action Framework
      ‣ Navigation actions
      ‣ Scheduled agents
  12. Modern Data Discovery
    • Data Preparation
      ‣ Acquire data from multiple connections
      ‣ Apply data enrichments prior to analysis
      ‣ Define repeatable preparation flows
    • Data Visualisation
      ‣ Create visual insights rapidly
      ‣ Construct narrated storyboards
      ‣ Share findings
    • Machine Learning
      ‣ Build & train ML models
      ‣ Apply models to new data sets
  13. OAC and Data Science

  14. Basic Operations: Augmented Analytics
    “What are the drivers for my sales?”
    • Based on my experience, I can guess…
    • Statistically significant drivers for sales are…
  15. Basic Operations: Basic ML Model
    “Is this client going to accept the offer?” YES/NO, 50%, 70%
  16. Before Starting… Define the Problem!

  17. Problem Definition: Predicting Wine Quality

  18. Rule Based
    • Italy or France -> Good; rest of the world -> Bad
    • Price >= 10 Euros -> Good; Price < 10 Euros -> Bad
    • Price > 30 & Production Zone = Veneto & …. -> 6.5
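The first two rule sets on slide 18 can be written as plain Python functions. This is a minimal sketch, assuming each wine is a dict with hypothetical `country` and `price` keys; the third rule is elided on the slide (`…`), so it is not reproduced.

```python
def rule_by_country(wine: dict) -> str:
    """Rule set 1: Italian or French wines are Good, the rest of the world is Bad."""
    return "Good" if wine["country"] in {"Italy", "France"} else "Bad"


def rule_by_price(wine: dict) -> str:
    """Rule set 2: wines costing at least 10 euros are Good, cheaper ones are Bad."""
    return "Good" if wine["price"] >= 10 else "Bad"


# Hypothetical example records (not taken from the dataset)
wines = [
    {"country": "Italy", "price": 8.0},
    {"country": "Chile", "price": 25.0},
]

print([rule_by_country(w) for w in wines])  # ['Good', 'Bad']
print([rule_by_price(w) for w in wines])    # ['Bad', 'Good']
```

The two rule sets disagree on both example wines, and every new exception needs another hand-written rule, which is the weakness that motivates learning the rules from labelled data instead.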
  19. TEP: Task, Experience, Performance
    • Task: estimate whether a wine is Good or Bad
    • Experience: a corpus of wine descriptions with ratings
    • Performance: accuracy
  20. Accuracy
    Compare each real value (Good/Bad) with the predicted value (Good/Bad):
    Accuracy = correct predictions / (correct predictions + wrong predictions)
    Icons made by Smashicons from www.flaticon.com
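The accuracy definition on slide 20 fits in a few lines of Python. This sketch assumes the real and predicted labels are two equal-length lists of "Good"/"Bad" strings; the function name `accuracy` is illustrative.

```python
def accuracy(real: list, predicted: list) -> float:
    """Share of predictions whose label matches the real label."""
    if len(real) != len(predicted):
        raise ValueError("real and predicted must have the same length")
    correct = sum(r == p for r, p in zip(real, predicted))
    return correct / len(real)


real      = ["Good", "Bad", "Bad", "Good"]
predicted = ["Good", "Bad", "Good", "Bad"]
print(accuracy(real, predicted))  # 0.5

# From aggregate counts: 896 correct and 502 wrong predictions
print(round(896 / (896 + 502) * 100, 2))  # 64.09
```

A 50% score is what a coin flip would achieve on a balanced Good/Bad split, which makes it a useful baseline when reading model accuracies.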
  21. Dataset https://www.kaggle.com/zynicide/wine-reviews

  22. The Data

  23. Bad vs. Good
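Splitting wines into Bad and Good means deriving a binary target from the numeric rating. This pandas sketch assumes a `points` column and a cut-off of 90; both are assumptions for illustration, since the slides do not say where the Good/Bad boundary was drawn, and the rows are made up.

```python
import pandas as pd

# Hypothetical rows shaped like a wine-reviews table (not real reviews)
wines = pd.DataFrame({
    "country": ["Italy", "France", "US", "Chile"],
    "points": [92, 86, 90, 84],
})

GOOD_THRESHOLD = 90  # assumed cut-off; the talk does not state the real one

# Binary target: Good if the rating reaches the threshold, Bad otherwise
wines["quality"] = (wines["points"] >= GOOD_THRESHOLD).map({True: "Good", False: "Bad"})

print(wines["quality"].value_counts())  # class balance between Good and Bad
```

Checking the class balance right away matters: if one class dominates, a model can score a high accuracy by always predicting it.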

  24. Become a Data Scientist with OAC
    Connect, Clean, Transform & Enrich, Analyse, Train & Evaluate, Predict
  25. Connect: Pre-Defined Data Models, Data Sources

  26. Clean
    • Missing Values: N/A
    • Wrong Values: Mark <> MArk
    • Irrelevant Observations: City “Rome”
    • Handling Outliers: Role: CIO, Salary: 500 K$
    • Train/Test Set Split: Train 80%, Test 20%
    • Labelling Columns: Col1 -> Name
    • Feature Scaling: 0-200k -> 0-1
    • Aggregation: # Of Clicks
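Several of the cleaning steps on slide 26 map directly onto pandas and scikit-learn calls. This is a minimal sketch on a made-up table: the column names, the "Verona only" scope that makes "Rome" irrelevant, and the salary values are all hypothetical. Outlier handling and aggregation are left out.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw extract showing the problems listed on the slide
raw = pd.DataFrame({
    "Col1":   ["Mark", "MArk", "Anna", "Luca", "Sara", None],
    "city":   ["Verona", "Verona", "Rome", "Verona", "Verona", "Verona"],
    "salary": [45000, 52000, 61000, 200000, 38000, 40000],
})

clean = raw.rename(columns={"Col1": "name"})              # labelling columns: Col1 -> name
clean = clean.dropna(subset=["name"])                      # missing values: drop rows with N/A
clean = clean.assign(name=clean["name"].str.capitalize())  # wrong values: "MArk" -> "Mark"
clean = clean[clean["city"] != "Rome"]                     # irrelevant observations (assumed scope: Verona)

# Train/test split: 80% / 20%
train, test = train_test_split(clean, test_size=0.2, random_state=0)

# Feature scaling to 0-1, fitted on the training rows only
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train[["salary"]])
test_scaled = scaler.transform(test[["salary"]])
print(train_scaled.ravel(), test_scaled.ravel())
```

Fitting the scaler on the training rows only keeps information from the test set out of the preparation step; test values outside the training range can therefore fall slightly outside 0-1.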
  27. Why Remove an Outlier?
    Years Experience | Salary
    -----------------|--------
    1                | 30.000
    2                | 32.000
    3                | 35.000
    4                | 35.500
    5                | 36.000
    6                | 40.000
    7                | 50.000
    8                | 70.000
    9                | 90.000
    10               | 500.000
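The effect of the 500.000 salary on slide 27 can be measured by fitting a straight line with and without it. This sketch uses the slide's values and flags outliers with the 1.5 × IQR rule, a common convention assumed here rather than one the talk names.

```python
from statistics import quantiles

years = list(range(1, 11))
salary = [30000, 32000, 35000, 35500, 36000, 40000, 50000, 70000, 90000, 500000]


def slope(x, y):
    """Least-squares slope of y on x (salary increase per extra year)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# 1.5 * IQR rule: values beyond the fences are treated as outliers
q1, _, q3 = quantiles(salary, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
keep = [lower <= s <= upper for s in salary]

x_kept = [x for x, k in zip(years, keep) if k]
y_kept = [y for y, k in zip(salary, keep) if k]

print(f"slope with outlier:    {slope(years, salary):,.0f} per year")
print(f"slope without outlier: {slope(x_kept, y_kept):,.0f} per year")
```

Only the 500.000 row is flagged, and removing it drops the fitted slope from roughly 29,400 to 6,475 per year: a single extreme observation more than quadruples the estimated effect of experience.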
  28. Enrich - Feature Engineering
    • Location -> ZIP Code
    • 2 Locations -> Distance
    • Name -> Sex
    • Day/Month/Year -> Date
    • Data Flow: Additional Data Sources?
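Two of the enrichments on slide 28, "2 Locations -> Distance" and "Day/Month/Year -> Date", can be sketched in plain Python. In OAC these would be Data Flow steps; here the haversine great-circle formula is an assumed choice for the distance, and the coordinates are approximate.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two locations, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def to_date(day, month, year):
    """Combine separate Day/Month/Year columns into a single date."""
    return date(year, month, day)


# Approximate coordinates of Verona and Rome
print(round(distance_km(45.44, 10.99, 41.90, 12.50)))
print(to_date(14, 3, 2019))  # 2019-03-14
```

Both derived columns carry information the raw fields hide from a model: a single numeric distance is easier to learn from than four coordinates, and a real date enables weekday or season features.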
  29. Data Preparation Recommendations

  30. Analyse - Data Overview

  31. Analyse - Explain

  32. Explain - Key Drivers

  33. Train - What Problem are we Trying to Solve?
    • Supervised: “I want to predict the value of Y, here are some examples”
      ‣ Classification
      ‣ Regression
    • Unsupervised: “Here is a dataset, make sense out of it!”
      ‣ Clustering
    Source: https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
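The supervised/unsupervised distinction on slide 33 can be shown with scikit-learn, the library the later custom-model slides use. The prices and labels below are hypothetical; logistic regression and k-means are simply two representative algorithms, not the ones OAC picks.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy feature: price in euros (hypothetical values)
X = [[5.0], [8.0], [9.0], [25.0], [30.0], [40.0]]

# Supervised classification: example labels for Y are provided
y = ["Bad", "Bad", "Bad", "Good", "Good", "Good"]
clf = LogisticRegression().fit(X, y)
print(clf.predict([[7.0], [35.0]]))

# Unsupervised clustering: no labels, just ask for two groups
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```

The classifier returns the label names it was taught, while k-means only returns arbitrary group numbers; deciding what each cluster means is left to the analyst. A regression model would instead predict a number such as the rating itself.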
  34. Easy Models

  35. DataFlow Train Model

  36. Which Model? Which Parameters to Pick?

  37. Select, Try, Save, Change, Try, Save…
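The select-try-save loop of slide 37 is done through the OAC UI; a scripted equivalent is a loop over candidate models that records each test score. This sketch uses synthetic data from `make_classification` as a stand-in for the prepared wine data, and the three candidates and their parameters are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared wine data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)

# Candidate models and parameter settings to try
candidates = {
    "logistic_C1": LogisticRegression(C=1.0, max_iter=1000),
    "tree_depth3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "forest_100": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in candidates.items():
    model.fit(train_X, train_y)                                    # try
    results[name] = accuracy_score(test_y, model.predict(test_X))  # save the score

# Compare: best accuracy first
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2%}")
```

Scores from a single train/test split shift when the split changes, so close results (such as two models a fraction of a percent apart) are better compared with cross-validation before declaring a winner.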

  38. Compare

  39. Compare

  40. There is No Single Truth… 896/(502+896) = 64.09%, 866/(471+866) = 64.77%

  41. Predict - Use On the Fly

  42. Predict - Step of a Data Flow

  43. Demo

  44. Conclusions
    • Accuracy: 73% > 63% > 50%
    • Existing skillset
    • Trial & error
    • Visual -> UI driven
    • Data cleaning & transformation
    • Model creation & evaluation
  45. Example https://towardsdatascience.com/wine-ratings-prediction-using-machine-learning-ce259832b321

  46. Custom ML Models
    • Model_train.xml
    • Model_test.xml
    • Components: Parameter Definition, Python Parameter Parsing, Data Cleaning, Model Definition & Training, Model Storage, Statistics Calculation
    • Model definition & training, e.g.:
        svr = SVR(kernel=kernel, gamma=0.01, C=5)
        SVR_Model = svr.fit(train_X, train_y)
    • Oracle library: https://www.oracle.com/solutions/business-analytics/data-visualization/library-overview.html
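The SVR snippet on slide 46 is regression: it would predict a continuous value such as a rating rather than Good/Bad. A self-contained run needs data and a `kernel` value; both are hypothetical here (a synthetic price-like feature, a noisy linear rating, and `"rbf"`), while `gamma=0.01` and `C=5` are taken from the slide.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(5, 60, size=(200, 1))                 # hypothetical feature, e.g. price
y = 84 + 0.1 * X[:, 0] + rng.normal(0, 1, size=200)  # hypothetical rating

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)

kernel = "rbf"  # assumed; in the OAC script this would come from a user parameter
svr = SVR(kernel=kernel, gamma=0.01, C=5)
SVR_Model = svr.fit(train_X, train_y)  # fit() returns the fitted estimator itself

print(SVR_Model.score(test_X, test_y))  # R^2 on the test set
```

Because it is a regressor, the usual quality measure is R² or an error metric rather than accuracy.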
  47. Custom ML - Parameter Definition
        <options>
          <option>
            <name>target</name>
            <displayName>Target</displayName>
            <value></value>
            <required>true</required>
            <description>target, the target(label) to learn/predict</description>
            <type>column</type>
            <ui-config></ui-config>
          </option>
          <option>
            <name>description</name>
            <displayName>Description</displayName>
            <value></value>
            <required>true</required>
            <description>The descriptive field to be parsed</description>
            <type>column</type>
            <ui-config></ui-config>
          </option>
        </options>
  48. Custom ML - Script
    • <scriptcontent> </scriptcontent>
    • Import Libraries
    • def obi_create_model(…)
  49. Parameter Parsing
        target = args['target']
        description = args['description']
        test_size = float(args['split'])
  50. Train-Test Split
        train_X, test_X, train_y, test_y = train_test_split(
            X, y, test_size=test_size, random_state=0
        )
  51. Model Creation
        rfc = RandomForestClassifier()
        rfc.fit(train_X, train_y)
        predictions = rfc.predict(test_X)
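Slides 49 to 51 show parameter parsing, the split and the model fit as fragments. This runnable sketch joins them on a tiny made-up corpus. The `args` dict stands in for OAC's user parameters, and the TF-IDF step that turns the free-text description into numeric features is an assumption: the slides do not show how the description column is vectorised.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical mini corpus standing in for the wine descriptions and labels
descriptions = [
    "rich layered complex long finish", "elegant structured cellar worthy",
    "vibrant concentrated dark fruit", "complex velvety long finish",
    "thin diluted short finish", "simple sour watery",
    "flat dull short", "harsh thin bitter",
    "elegant complex dark fruit", "watery simple bitter",
]
labels = ["Good"] * 4 + ["Bad"] * 4 + ["Good", "Bad"]

args = {"split": "0.2"}           # hypothetical stand-in for the OAC user parameters
test_size = float(args["split"])

train_text, test_text, train_y, test_y = train_test_split(
    descriptions, labels, test_size=test_size, random_state=0
)

# Assumed step: free text -> TF-IDF features, fitted on training text only
vectorizer = TfidfVectorizer()
train_X = vectorizer.fit_transform(train_text)
test_X = vectorizer.transform(test_text)

rfc = RandomForestClassifier(random_state=0)
rfc.fit(train_X, train_y)
predictions = rfc.predict(test_X)
print(accuracy_score(test_y, predictions))
```

With ten rows the printed accuracy is meaningless; the point is the order of the steps, in particular splitting before fitting the vectoriser so the test descriptions do not shape the vocabulary.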

  52. Model Storage
        pickleobj = {'WineModel': rfc}
        d = base64.b64encode(pickle.dumps(pickleobj)).decode('utf-8')
        required_mappings = {'description': 'true'}
        model = Model()
        model.set_data(__name__, 'pickle', d, target)
        model.set_required_attr(required_mappings)
        model.set_class_tag('BinaryClassification')
        target_col = y
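The storage step on slide 52 serialises the fitted model with `pickle` and wraps the bytes in base64 text, which the slide then hands to OAC through `model.set_data(...)`. The OAC `Model` class is not reproduced here; this sketch shows only the standard-library round trip on a small stand-in classifier.

```python
import base64
import pickle

from sklearn.ensemble import RandomForestClassifier

# Small stand-in model (hypothetical data)
rfc = RandomForestClassifier(n_estimators=10, random_state=0)
rfc.fit([[0.0], [1.0], [2.0], [3.0]], ["Bad", "Bad", "Good", "Good"])

# Serialise: model -> pickle bytes -> base64 text
pickleobj = {"WineModel": rfc}
d = base64.b64encode(pickle.dumps(pickleobj)).decode("utf-8")

# Deserialise at prediction time: base64 text -> bytes -> model
restored = pickle.loads(base64.b64decode(d))["WineModel"]
print(restored.predict([[0.5], [2.5]]))
```

Base64 turns arbitrary bytes into plain text that can be stored alongside the model metadata. Unpickling runs arbitrary code, so stored models should only ever be loaded from trusted sources.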
  53. Calculating Confusion Matrix
        # predict test_data with model
        predictions_proba = rfc.predict_proba(test_X)
        predictions = lrutils.get_classicification(predictions_proba, rfc.classes_, threshod, positiveValue, targetMappings)
        # confusion matrix dataset and statistics
        confusion_df, statistics_df = datasetutils.prepare_output_datasets(test_y, predictions, target, "classification", targetMappings)
        confusion_mappings = None
        confusion_ds = ModelDataset("Confusion Matrix", confusion_df, confusion_mappings)
        model.add_output_dataset(confusion_ds)
        # statistics dataset
        statistics_mappings = None
        statistics_ds = ModelDataset("Statistics", statistics_df, statistics_mappings)
        model.add_output_dataset(statistics_ds)
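Slide 53 relies on OAC helpers (`lrutils`, `datasetutils`, `ModelDataset`) that are not reproduced here. The underlying idea can be sketched with NumPy and scikit-learn: turn class probabilities into labels with a threshold, then count matches in a confusion matrix. The assumption that a wine is Good when P(Good) ≥ threshold is mine, not a documented description of what `get_classicification` does, and the probabilities are made up.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

classes = np.array(["Bad", "Good"])   # like rfc.classes_ (sorted labels)
proba = np.array([                    # like rfc.predict_proba(test_X); hypothetical values
    [0.80, 0.20],
    [0.30, 0.70],
    [0.55, 0.45],
    [0.10, 0.90],
])
test_y = np.array(["Bad", "Good", "Good", "Good"])

positive = "Good"
threshold = 0.5                        # hypothetical cut-off on P(positive)
pos_idx = int(np.flatnonzero(classes == positive)[0])
negative = classes[1 - pos_idx]
predictions = np.where(proba[:, pos_idx] >= threshold, positive, negative)

# Rows = real value, columns = predicted value, in the order [Good, Bad]
cm = confusion_matrix(test_y, predictions, labels=[positive, negative])
accuracy = np.trace(cm) / cm.sum()     # diagonal = correct predictions
print(cm)
print(f"accuracy: {accuracy:.2%}")
```

Lowering the threshold would label the third wine (P(Good) = 0.45) as Good and fix that miss, at the cost of more false Goods on other data; the threshold trades one error type for the other.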
  54. Result

  55. Limitations • Libraries • Testing • Performance
