Is it Corked? Wine Machine Learning Predictions with OAC

FTisiot
March 14, 2019

Transcript

  1. Specialised in data visualisation, predictive analytics, enterprise reporting and data engineering. Enabling the business, the consumers, the data providers and IT to work towards a common goal, delivering innovative and cost-effective solutions based on our core values of thought leadership, hard work and honesty. We work across multiple verticals on projects that range from mature, large scale implementations to proofs of concept and can provide skills in development, architecture, delivery, training and support.
    www.rittmanmead.com [email protected] @rittmanmead
  2. Oracle Analytics Cloud
    • Oracle's complete suite of Platform Services (PaaS) for unified analytics in the cloud
    • Delivered entirely in the cloud:
      ‣ No infrastructure footprint
      ‣ Flexibility to scale up or down based on your immediate needs
      ‣ Simplified, metered licensing
    • Several options to suit your needs:
      ‣ Oracle or customer/partner managed services
      ‣ Functionality bundled into 3 editions
  3. Functions
    • OAC supports every type of analytics workload across your organisation
    • Classic enterprise BI:
      ‣ Analysis & dashboarding
      ‣ Published reporting
      ‣ Enterprise Performance Management
    • Modern departmental/personal discovery:
      ‣ Extended data mashup & modelling
      ‣ Data preparation, exploration & visualisation
      ‣ Data science & machine learning
  4. Classic Enterprise BI
    • Similar User Experience to OBIEE 12c
      ‣ Centrally maintained & governed
      ‣ Semantic model remains key
    • Interactive Dashboards
      ‣ Ideal for KPI measurement & monitoring
      ‣ Guided navigation paths
    • BI Publisher
      ‣ Highly formatted, burst outputs
    • Action Framework
      ‣ Navigation actions
      ‣ Scheduled agents
  5. Modern Data Discovery
    • Data Preparation
      ‣ Acquire data from multiple connections
      ‣ Apply data enrichments prior to analysis
      ‣ Define repeatable preparation flows
    • Data Visualisation
      ‣ Create visual insights rapidly
      ‣ Construct narrated storyboards
      ‣ Share findings
    • Machine Learning
      ‣ Build & train ML models
      ‣ Apply models to new data sets
  6. What are the Drivers for My Sales?
    • Basic Operations: "Based on my experience I can guess…"
    • Augmented Analytics: "Statistically significant drivers for sales are…"
  7. Rule Based
    • Italy or France -> Good
    • Rest of the World -> Bad
    • Price >= 10 Euros -> Good
    • Price < 10 Euros -> Bad
    • Price > 30 & Production Zone = Veneto & … -> 6.5
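Hand-written rules like these translate directly into code, which makes it explicit that every rule is a manually chosen condition. The following is a minimal Python sketch, not the author's implementation: the function names are hypothetical, only the four fully stated rules are encoded (the Veneto rule is elided on the slide), and combining the country and price rules so that a wine is Good only when both say Good is an assumption, since the slide does not say how the rules interact.

```python
# Minimal sketch of the hand-written wine rules (hypothetical names).
# Only the four fully stated rules are encoded; the elided Veneto rule is omitted.

GOOD_COUNTRIES = {"Italy", "France"}


def rule_by_country(country: str) -> str:
    """Italy or France -> Good, rest of the world -> Bad."""
    return "Good" if country in GOOD_COUNTRIES else "Bad"


def rule_by_price(price_eur: float) -> str:
    """Price >= 10 euros -> Good, price < 10 euros -> Bad."""
    return "Good" if price_eur >= 10 else "Bad"


def combined_rule(country: str, price_eur: float) -> str:
    """Assumption: a wine is Good only when both rules label it Good."""
    both_good = rule_by_country(country) == "Good" and rule_by_price(price_eur) == "Good"
    return "Good" if both_good else "Bad"


if __name__ == "__main__":
    print(combined_rule("Italy", 12.0))   # Good
    print(combined_rule("Spain", 25.0))   # Bad
    print(combined_rule("France", 8.5))   # Bad
```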
  8. Become a Data Scientist with OAC
    • Connect
    • Clean
    • Analyse
    • Train & Evaluate
    • Predict
    • Transform & Enrich
  9. Clean
    • Missing Values: N/A
    • Wrong Values: Mark <> MArk
    • Irrelevant Observations: City "Rome"
    • Handling Outliers: Role: CIO, Salary: 500 K$
    • Train/Test Set Split: Train: 80%, Test: 20%
    • Labelling Columns: Col1 -> Name
    • Feature Scaling: 0-200k -> 0-1
    • Aggregation: # of clicks
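Several of these cleaning steps are simple enough to sketch in plain Python. The snippet below is a hypothetical illustration, not what OAC does internally: the sample rows, the `clean`, `scale` and `train_test_split` helpers, and the choice to drop rows containing `N/A` (rather than impute a value) are all assumptions. It covers missing values, wrong values (`MArk` normalised to `Mark`), labelling columns (`Col1` renamed to `Name`), feature scaling from 0-200k onto 0-1, and an 80/20 train/test split.

```python
import random

# Hypothetical raw rows: a missing value ("N/A"), two spellings of the same name,
# and an unlabelled column "Col1" that should be called "Name".
RAW_ROWS = [
    {"Col1": "Mark", "Salary": 45_000},
    {"Col1": "MArk", "Salary": 52_000},
    {"Col1": "Anna", "Salary": "N/A"},
    {"Col1": "Luca", "Salary": 120_000},
    {"Col1": "Sara", "Salary": 0},
]


def clean(rows):
    """Drop rows with missing values, rename Col1 -> Name, fix inconsistent casing."""
    cleaned = []
    for row in rows:
        if "N/A" in row.values():  # missing value: drop the whole row
            continue
        cleaned.append({
            "Name": row["Col1"].capitalize(),  # "MArk" -> "Mark"
            "Salary": row["Salary"],
        })
    return cleaned


def scale(values, lo, hi):
    """Min-max feature scaling: map the range [lo, hi] onto [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows, then keep train_fraction of them for training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = clean(RAW_ROWS)
scaled = scale([r["Salary"] for r in rows], lo=0, hi=200_000)
train, test = train_test_split(rows)
print([r["Name"] for r in rows])  # ['Mark', 'Mark', 'Luca', 'Sara']
print(scaled)                     # [0.225, 0.26, 0.6, 0.0]
print(len(train), len(test))      # 3 1
```

Seeding the shuffle keeps the split reproducible between runs, which matters when comparing models trained on the same data.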
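  10. Why Remove an Outlier?

    | Years of experience | Salary  |
    |--------------------:|--------:|
    | 1                   | 30,000  |
    | 2                   | 32,000  |
    | 3                   | 35,000  |
    | 4                   | 35,500  |
    | 5                   | 36,000  |
    | 6                   | 40,000  |
    | 7                   | 50,000  |
    | 8                   | 70,000  |
    | 9                   | 90,000  |
    | 10                  | 500,000 |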
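To make the point of the table concrete, the sketch below flags outliers with Tukey's fences (values more than 1.5 × IQR below the first quartile or above the third) and compares the mean salary with and without them. The slide does not say which outlier rule the author used; the IQR rule and the `iqr_outliers` helper are assumptions chosen for illustration. On this data only the 500,000 salary is flagged, and dropping it moves the mean from 91,850 to 46,500.

```python
import statistics

# Years of experience -> salary, as listed on the slide.
SALARIES = {1: 30_000, 2: 32_000, 3: 35_000, 4: 35_500, 5: 36_000,
            6: 40_000, 7: 50_000, 8: 70_000, 9: 90_000, 10: 500_000}


def iqr_outliers(data, k=1.5):
    """Return the entries whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(list(data.values()), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {x: y for x, y in data.items() if y < low or y > high}


outliers = iqr_outliers(SALARIES)
kept = {x: y for x, y in SALARIES.items() if x not in outliers}

print(outliers)                            # {10: 500000}
print(statistics.mean(SALARIES.values()))  # 91850
print(statistics.mean(kept.values()))      # 46500
```

A single extreme value roughly doubles the average here, so any model fitted to the raw data would be pulled towards one unrepresentative observation.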
  11. Enrich - Feature Engineering
    • Location -> ZIP Code
    • 2 Locations -> Distance
    • Name -> Sex
    • Day/Month/Year -> Date
    • Data Flow
    • Additional Data Sources?
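Two of these derivations can be sketched with the Python standard library: turning two locations into a distance, and combining separate day, month and year columns into a single date. This is an illustrative sketch, not OAC's Data Flow implementation; it assumes locations are given as latitude/longitude in degrees and uses the haversine great-circle formula with a mean Earth radius of 6,371 km. The helper names and the sample coordinates are hypothetical.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)


def distance_km(lat1, lon1, lat2, lon2):
    """2 locations -> distance: haversine great-circle distance in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def to_date(day, month, year):
    """Day/Month/Year columns -> a single date feature."""
    return date(year, month, day)


# Hypothetical inputs: approximate coordinates of Rome and Verona.
print(round(distance_km(41.90, 12.50, 45.44, 10.99)), "km")
print(to_date(14, 3, 2019).isoformat())  # 2019-03-14
```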
  12. Train - What Problem are we Trying to Solve?
    • Supervised: "I want to predict the value of Y, here are some examples"
      ‣ Classification
      ‣ Regression
    • Unsupervised: "Here is a dataset, make sense out of it!"
      ‣ Clustering
    • Source: https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
  13. Conclusions
    • 73% > 63% > 50%
    • Existing Skillset
    • Trial & Error
    • Visual -> UI Driven
    • Data Cleaning & Transformation
    • Model Creation & Evaluation
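  14. Custom ML Models
    • Model_train.xml
      ‣ Parameter Definition
      ‣ Python Parameter Parsing
      ‣ Data Cleaning
      ‣ Model Storage
      ‣ Statistics Calculation
      ‣ Model Definition & Training:

        from sklearn.svm import SVR

        # kernel, train_X and train_y are presumably set by the earlier
        # parameter-parsing and data-cleaning steps (not shown on the slide)
        svr = SVR(kernel=kernel, gamma=0.01, C=5)
        SVR_Model = svr.fit(train_X, train_y)  # fit() returns the fitted estimator

    • Model_test.xml
    • https://www.oracle.com/solutions/business-analytics/data-visualization/library-overview.html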
  15. Custom ML - Parameter Definition

        <options>
          <option>
            <name>target</name>
            <displayName>Target</displayName>
            <value></value>
            <required>true</required>
            <description>target, the target(label) to learn/predict</description>
            <type>column</type>
            <ui-config></ui-config>
          </option>
          <option>
            <name>description</name>
            <displayName>Description</displayName>
            <value></value>
            <required>true</required>
            <description>The descriptive field to be parsed</description>
            <type>column</type>
            <ui-config></ui-config>
          </option>
        </options>
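The sketch below reads the option block above with Python's standard `xml.etree.ElementTree`, to show what it declares: two required parameters, `target` and `description`, both of type `column`. It is not how OAC itself processes the file; the `parse_options` helper and the shape of the dictionary it returns are assumptions, and the empty `<ui-config>` elements are omitted.

```python
import xml.etree.ElementTree as ET

# Option block from the slide (empty <ui-config> elements omitted).
OPTIONS_XML = """
<options>
  <option>
    <name>target</name>
    <displayName>Target</displayName>
    <value></value>
    <required>true</required>
    <description>target, the target(label) to learn/predict</description>
    <type>column</type>
  </option>
  <option>
    <name>description</name>
    <displayName>Description</displayName>
    <value></value>
    <required>true</required>
    <description>The descriptive field to be parsed</description>
    <type>column</type>
  </option>
</options>
"""


def parse_options(xml_text):
    """Return {option name: settings} for every <option> element."""
    root = ET.fromstring(xml_text.strip())
    options = {}
    for opt in root.findall("option"):
        options[opt.findtext("name")] = {
            "display_name": opt.findtext("displayName"),
            "required": opt.findtext("required") == "true",
            "type": opt.findtext("type"),
            "value": opt.findtext("value") or None,  # empty <value> -> None
        }
    return options


for name, settings in parse_options(OPTIONS_XML).items():
    print(name, settings["type"], settings["required"])
# target column True
# description column True
```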
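  16. Model Storage

        import base64
        import pickle

        # rfc (the trained model), target and y are defined earlier in the script.
        # Model is presumably supplied by the OAC custom-script framework;
        # its import is not shown on the slide.
        pickleobj = {'WineModel': rfc}
        d = base64.b64encode(pickle.dumps(pickleobj)).decode('utf-8')
        required_mappings = {'description': 'true'}
        model = Model()
        model.set_data(__name__, 'pickle', d, target)
        model.set_required_attr(required_mappings)
        model.set_class_tag('BinaryClassification')
        target_col = y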
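The storage step relies only on the standard `pickle` and `base64` modules, so the encoding can be checked in isolation from OAC. The sketch below wraps the model in a dictionary under the `WineModel` key, as the slide does, and adds the reverse step that turns the string back into the object. The helper names and the placeholder model are assumptions, and the OAC `Model` class is deliberately left out.

```python
import base64
import pickle


def serialise_model(model_obj, key="WineModel"):
    """Pickle {key: model}, then base64-encode the bytes into a UTF-8 string."""
    payload = {key: model_obj}
    return base64.b64encode(pickle.dumps(payload)).decode("utf-8")


def deserialise_model(encoded, key="WineModel"):
    """Reverse step: base64-decode, unpickle, and return the stored model."""
    payload = pickle.loads(base64.b64decode(encoded))
    return payload[key]


# Placeholder standing in for the trained model (rfc on the slide).
placeholder_model = {"name": "placeholder", "threshold": 0.5}

encoded = serialise_model(placeholder_model)
print(type(encoded).__name__)                           # str
print(deserialise_model(encoded) == placeholder_model)  # True
```

Base64 turns the binary pickle into plain text, which is presumably why the slide encodes it before passing it to `model.set_data`.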
  17. Calculating the Confusion Matrix

        # predict test_data with model
        predictions_proba = rfc.predict_proba(test_X)
        predictions = lrutils.get_classicification(predictions_proba, rfc.classes_, threshod, positiveValue, targetMappings)

        # confusion matrix dataset and statistics
        confusion_df, statistics_df = datasetutils.prepare_output_datasets(test_y, predictions, target, "classification", targetMappings)
        confusion_mappings = None
        confusion_ds = ModelDataset("Confusion Matrix", confusion_df, confusion_mappings)
        model.add_output_dataset(confusion_ds)

        # statistics dataset
        statistics_mappings = None
        statistics_ds = ModelDataset("Statistics", statistics_df, statistics_mappings)
        model.add_output_dataset(statistics_ds)
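The internals of the OAC helpers `lrutils.get_classicification` and `datasetutils.prepare_output_datasets` are not shown on the slide. The plain-Python sketch below illustrates the underlying idea under stated assumptions: positive-class probabilities are turned into labels with a threshold, actual and predicted labels are tallied into a confusion matrix, and accuracy, precision and recall are derived from it. The `Good`/`Bad` labels, the sample data and all helper names are hypothetical.

```python
from collections import Counter


def classify(probabilities, threshold=0.5, positive="Good", negative="Bad"):
    """Turn positive-class probabilities into labels using a cut-off threshold."""
    return [positive if p >= threshold else negative for p in probabilities]


def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) label pair."""
    return Counter(zip(actual, predicted))


def statistics_from(cm, positive="Good", negative="Bad"):
    """Derive accuracy, precision and recall from the confusion-matrix counts."""
    tp = cm[(positive, positive)]
    tn = cm[(negative, negative)]
    fp = cm[(negative, positive)]
    fn = cm[(positive, negative)]
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test labels and predicted probabilities of "Good".
actual = ["Good", "Good", "Bad", "Bad", "Good", "Bad"]
probas = [0.9, 0.4, 0.2, 0.7, 0.8, 0.1]

predicted = classify(probas, threshold=0.5)
cm = confusion_matrix(actual, predicted)
stats = statistics_from(cm)

# TP, FP, FN, TN
print(cm[("Good", "Good")], cm[("Bad", "Good")], cm[("Good", "Bad")], cm[("Bad", "Bad")])  # 2 1 1 2
print({k: round(v, 3) for k, v in stats.items()})
# {'accuracy': 0.667, 'precision': 0.667, 'recall': 0.667}
```

Raising the threshold trades recall for precision, which is why the slide passes it explicitly rather than relying on a fixed 0.5.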