Slide 1

Slide 1 text

objectcomputing.com © 2021, Object Computing, Inc. (OCI). All rights reserved. No part of these notes may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior, written permission of Object Computing, Inc. (OCI) Groovy and Data Science Presented by Dr Paul King © 2021 Object Computing, Inc. (OCI). All rights reserved.

Slide 2

Slide 2 text

Dr Paul King OCI Groovy Lead V.P. and PMC Chair Apache Groovy Author: https://www.manning.com/books/groovy-in-action-second-edition Slides: https://speakerdeck.com/paulk/groovy-data-science Examples repo: https://github.com/paulk-asert/groovy-data-science Twitter: @paulk_asert

Slide 3

Slide 3 text

Apache Groovy Programming Language • Multi-faceted language • Imperative/OO & functional (+ other styles via libraries) • Dynamic & static • Aligned closely with Java language • Very extensible • 17+ years since inception • ~1B downloads (partial count) • ~500 contributors • 200+ releases • https://www.youtube.com/watch?v=eIGOG-F9ZTw&feature=youtu.be

Slide 4

Slide 4 text

Friends of Apache Groovy Open Collective

Slide 5

Slide 5 text

What is Groovy? It’s like a super version of Java: • Supports most Java syntax but allows simpler syntax for many constructs • Supports all Java libraries but provides many extensions and its own productivity libraries • Has both a static and dynamic nature • Extensible language and tooling

Slide 6

Slide 6 text

Groovy has a low learning curve, particularly for Java developers

Slide 7

Slide 7 text

Groovy is Java's friend Seamless integration • IDEs provide cross-language compile, navigation, and refactoring • Arbitrarily mix source language • Drop-in replace any class • Overloaded methods • Syntax alignment • Shared data types Java Groovy

Slide 8

Slide 8 text

Java-like or concise shortcuts

import java.util.List;
import java.util.ArrayList;

class Main {
    private List keepShorterThan(List strings, int length) {
        List result = new ArrayList();
        for (int i = 0; i < strings.size(); i++) {
            String s = (String) strings.get(i);
            if (s.length() < length) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List names = new ArrayList();
        names.add("Ted"); names.add("Fred"); names.add("Jed"); names.add("Ned");
        System.out.println(names);
        Main m = new Main();
        List shortNames = m.keepShorterThan(names, 4);
        System.out.println(shortNames.size());
        for (int i = 0; i < shortNames.size(); i++) {
            String s = (String) shortNames.get(i);
            System.out.println(s);
        }
    }
}

names = ["Ted", "Fred", "Jed", "Ned"]
println names
shortNames = names.findAll{ it.size() < 4 }
println shortNames.size()
shortNames.each{ println it }

Slide 9

Slide 9 text

Concise syntax including DSL friendly

import java.util.List;
import java.util.ArrayList;

class Main {
    private List keepShorterThan(List strings, int length) {
        List result = new ArrayList();
        for (int i = 0; i < strings.size(); i++) {
            String s = (String) strings.get(i);
            if (s.length() < length) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List names = new ArrayList();
        names.add("Ted"); names.add("Fred"); names.add("Jed"); names.add("Ned");
        System.out.println(names);
        Main m = new Main();
        List shortNames = m.keepShorterThan(names, 4);
        System.out.println(shortNames.size());
        for (int i = 0; i < shortNames.size(); i++) {
            String s = (String) shortNames.get(i);
            System.out.println(s);
        }
    }
}

names = ["Ted", "Fred", "Jed", "Ned"]
println names
shortNames = names.findAll{ it.size() < 4 }
println shortNames.size()
shortNames.each{ println it }

given the names "Ted", "Fred", "Jed" and "Ned"
display all the names
display the number of names having size less than 4
display the names having size less than 4

Slide 10

Slide 10 text

Metaprogramming
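The slide itself carries no code in these notes; as a hedged aside (not from the deck), runtime metaprogramming in Groovy can add behaviour to existing classes via the metaClass, or in a scoped way via a category:

// add a method to java.lang.String at runtime via its metaClass
String.metaClass.shout = { -> delegate.toUpperCase() + '!' }
assert 'groovy'.shout() == 'GROOVY!'

// a category offers the same kind of extension, but scoped to a block
class StringExtras {
    static String whisper(String self) { self.toLowerCase() }
}
use(StringExtras) {
    assert 'LOUD'.whisper() == 'loud'
}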

Slide 11

Slide 11 text

Matrix manipulation: Java vs Groovy
• Same example
• Same library

Array2DRowRealMatrix{{15.1379501385,40.488531856},{21.4354570637,59.5951246537}}

import org.apache.commons.math3.linear.*;

public class MatrixMain {
    public static void main(String[] args) {
        double[][] matrixData = { {1d,2d,3d}, {2d,5d,3d} };
        RealMatrix m = MatrixUtils.createRealMatrix(matrixData);
        double[][] matrixData2 = { {1d,2d}, {2d,5d}, {1d, 7d} };
        RealMatrix n = new Array2DRowRealMatrix(matrixData2);
        RealMatrix o = m.multiply(n);
        // Invert p, using LU decomposition
        RealMatrix oInverse = new LUDecomposition(o).getSolver().getInverse();
        RealMatrix p = oInverse.scalarAdd(1d).scalarMultiply(2d);
        RealMatrix q = o.add(p.power(2));
        System.out.println(q);
    }
}

Thanks to operator overloading and extensible tooling
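The Groovy half of the comparison is not captured in these notes; a hedged sketch of what it might look like, relying only on Groovy's standard operator-to-method mapping (* calls multiply(), ** calls power()) against the same Commons Math library. Methods such as add and scalarAdd are left as calls here since their names do not map to operators without an extension module:

import org.apache.commons.math3.linear.*

double[][] matrixData = [[1d, 2d, 3d], [2d, 5d, 3d]]
double[][] matrixData2 = [[1d, 2d], [2d, 5d], [1d, 7d]]
def m = MatrixUtils.createRealMatrix(matrixData)
def n = new Array2DRowRealMatrix(matrixData2)
def o = m * n                                                     // * maps to multiply()
def p = new LUDecomposition(o).solver.inverse.scalarAdd(1d).scalarMultiply(2d)
def q = o.add(p ** 2)                                             // ** maps to power()
println q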

Slide 12

Slide 12 text

groovyConsole showing @Grab, power asserts and natural language processing
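As a hedged illustration of the kind of script the console screenshot shows (the natural language processing part is omitted here), @Grab fetches a dependency at script level and a failing power assert prints every sub-expression value:

@Grab('org.apache.commons:commons-math3:3.6.1')
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics

def stats = new DescriptiveStatistics()
[2, 4, 4, 4, 5, 5, 7, 9].each { stats.addValue(it) }
assert stats.mean == 5    // a power assert would show stats, stats.mean and 5 if this failed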

Slide 13

Slide 13 text

REPL for Groovy (groovysh)
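A hedged example of what a short groovysh session looks like (illustrative transcript, not from the slides), reusing the names example from earlier:

$ groovysh
groovy:000> names = ['Ted', 'Fred', 'Jed', 'Ned']
===> [Ted, Fred, Jed, Ned]
groovy:000> names.findAll { it.size() < 4 }
===> [Ted, Jed, Ned]
groovy:000> :exit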

Slide 14

Slide 14 text

Command chains and extensible type system Text file? No, it’s a statically typed Groovy program which won’t compile if the animal names aren’t valid. A constraint programming solver is used to find the solution. cranes have 2 legs tortoises have 4 legs there are 7 animals there are 20 legs display solution Cranes 4 Tortoises 3

Slide 15

Slide 15 text

Command chains and extensible type system cranes have 2 legs tortoises have 4 legs millipedes have 1000 legs there are 8 animals there are 1020 legs display solution Cranes 4 Tortoises 3 Millipedes 1

Slide 16

Slide 16 text

IDE support Intellij IDEA, Apache NetBeans, Eclipse, Groovy Web Console

Slide 17

Slide 17 text

Build tools Gradle, Apache Maven, Apache Ant

// Gradle build script
plugins {
    id 'groovy'
}
repositories {
    jcenter()
}
dependencies {
    implementation 'org.codehaus.groovy:groovy-all:2.5.7'
    testImplementation 'org.spockframework:spock-core:1.2-groovy-2.5'
}

// Maven polyglot (Groovy) POM
project {
    modelVersion '4.0.0'
    groupId 'org.exampledriven'
    artifactId 'maven-polyglot-groovy-example-parent'
    version '1.0-SNAPSHOT'
    packaging 'pom'
    modules {
        module 'maven-polyglot-submodule'
        module 'maven-polyglot-spring'
    }
}

// Ant via Groovy's AntBuilder
ant.sequential {
    echo("inside sequential")
    def myDir = "target/AntTest/"
    mkdir(dir: myDir)
    copy(todir: myDir) {
        fileset(dir: "src/test") {
            include(name: "**/*.groovy")
        }
    }
    echo("done")
}

Slide 18

Slide 18 text

Groovy frameworks plus entire Java ecosystem

Slide 19

Slide 19 text

Data Science Process Research Goals Obtain Data Data Preparation Data Exploration Visualization Data Modeling Data ingestion Data storage Data processing platforms Modeling algorithms Math libraries Graphics processing Integration Deployment

Slide 20

Slide 20 text

Source: https://medium.com/activewizards-machine-learning-company/comparison-of-top-data-science-libraries-for-python-r-and-scala-infographic-574069949267 Python/Scala/R Libraries/Tools/Frameworks for Data Science

Slide 21

Slide 21 text

Groovy/Java Libraries/Tools/Frameworks for Data Science Source: https://medium.com/activewizards-machine-learning-company/comparison-of-top-data-science-libraries-for-python-r-and-scala-infographic-574069949267 Machine Learning DeepLearning4J Apache Spark ML Apache MXNet Visualization GroovyFX JFreeChart Tablesaw Mathematics and Engineering GPars Apache Commons Math Smile Weka Groovylab Apache Spark Apache Ignite Apache Beam Data Manipulation and Analysis Tablesaw Apache Commons CSV OpenCSV

Slide 22

Slide 22 text

Obtain data
Identify sources
• Internal/External (data providers)
• Databases, files, events, IoT
• Unstructured/structured
• Text, XML, JSON, YAML, CSV/spreadsheets, audio, images, video, web services
• Apache Tika
• JExcel
• Apache POI
Clarify ownership
Check quality

def jsonSlurper = new JsonSlurper()
def object = jsonSlurper.parseText('{ "myList": [4, 8, 15, 16, 23, 42] }')
assert object instanceof Map
assert object.myList instanceof List
assert object.myList == [4, 8, 15, 16, 23, 42]

def response = new XmlSlurper().parseText(books)
def authorResult = response.value.books.book[0].author
assert authorResult.text() == 'Miguel de Cervantes'

def qry = 'SELECT * FROM Author'
assert sql.rows(qry, 1, 3)*.firstname == ['Dierk', 'Paul', 'Guillaume']
assert sql.rows(qry, 4, 3)*.firstname == ['Hamlet', 'Cedric', 'Erik']
assert sql.rows(qry, 7, 3)*.firstname == ['Jon']

Slide 23

Slide 23 text

Data preparation Collections: • Java: lists, maps, sets • Groovy: literal notation, GDK methods, GPath • Libraries • Google Guava https://github.com/google/guava • Apache Common Collections https://commons.apache.org/collections/ DataFrames: • Joinery https://cardillo.github.io/joinery/ A data frame implementation in the spirit of Pandas or R data frames with show/plot • Tablesaw https://jtablesaw.github.io/tablesaw/ Java dataframe and visualization library • Apache Spark DataFrames https://spark.apache.org/docs/latest/sql-programming- guide.html Equivalent to a table in a relational database or a data frame in R/Python • Paleo https://github.com/netzwerg/paleo Immutable Java 8 data frames with typed columns (including primitives)

Slide 24

Slide 24 text

Data exploration - DEX

Slide 25

Slide 25 text

Data exploration - Weka More info: https://www.youtube.com/watch?v=7quZv6WCTQc

Slide 26

Slide 26 text

Visualization • Open-Source plotting libraries: GRAL, JFreeChart, Xchart, JMathPlot, Jzy3d, JavaFX, GroovyFX • Types: Scatter plot, Line plot, Area plot, Pie plot, Donut plot, Horizontal bars, Vertical bars, Bubble plot, Radar plot, Box plot, Raster plot, Contour plot, Gauge plot, Step plot, Gantt plot, Histogram • Features: Multiple axes, Logarithmic axes, Line smoothing, Data filtering/aggregation

Slide 27

Slide 27 text

Notebooks supporting Groovy • Jupyter/beakerx • Apache Zeppelin • GroovyLab • Seco

Slide 28

Slide 28 text

Notebooks supporting Groovy • Jupyter/beakerx • Apache Zeppelin • GroovyLab • Seco

Slide 29

Slide 29 text

Data preparation IO: • Java in-built • Groovy enhancements • Libraries • Apache Commons IO https://commons.apache.org/io/ • AOL Cyclops-React https://github.com/aol/cyclops-react Web scraping: • GEB Batch processing/ETL: • Spring Batch http://projects.spring.io/spring-batch/ • Apache Camel • Apache Gobblin

Slide 30

Slide 30 text

Math libraries Math and stats libraries • Java: log, exp, other basic methods. Libraries: • Apache Commons Math http://commons.apache.org/math/ statistics, optimization, and linear algebra • Apache Mahout http://mahout.apache.org/ linear algebra, plus distributed linear algebra and machine learning • JBlas http://jblas.org/ optimized and very fast linear algebra package that uses the BLAS library Also, many of the machine learning libraries come with some math functionality, often linear algebra, stats, and optimization.
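As a hedged taste of the first of these (assuming commons-math3 is on the classpath; not code from the deck), fitting a simple regression takes only a few lines:

import org.apache.commons.math3.stat.regression.SimpleRegression

def reg = new SimpleRegression()
[[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8]].each { x, y -> reg.addData(x as double, y as double) }
printf 'slope=%.2f intercept=%.2f R2=%.2f%n', reg.slope, reg.intercept, reg.RSquare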

Slide 31

Slide 31 text

Data storage
Database:
• Java's JDBC
• Groovy JDBC enhancements and Datasets
• Other NOSQL
• Apache Cassandra
• MongoDB
• Apache CouchDB
• Neo4j

// MongoDB via GMongo
import com.gmongo.GMongo

def db = new GMongo().getDB('athletes')
db.athletes.drop()
db.athletes << [first: 'Paul', last: 'Tergat', dob: '1969-06-17',
                runs: [[distance: 42195, time: 2 * 60 * 60 + 4 * 60 + 55,
                        venue: 'Berlin', when: '2003-09-28']]]

// Neo4j traversal API
println "World rows following $marathon3.venue $marathon3.when:"
def t = new Traversal()
for (Path p in t.description().breadthFirst()
                 .relationships(MyRelationshipTypes.supercedes)
                 .evaluator(Evaluators.fromDepth(1))
                 .uniqueness(Uniqueness.NONE)
                 .traverse(marathon3)) {
    def newRow = p.endNode()
    println "$newRow.venue $newRow.when"
}

// graph queries (Gremlin/Blueprints) over Neo4j
Graph g = new Neo4jGraph(graphDb)
def pretty = { it.collect { "$it.venue $it.when" }.join(', ') }
def results = []
g.V('venue', 'London').fill(results)
println 'London world rows: ' + pretty(results)
results = []
g.V('venue', 'London').in('supercedes').fill(results)
println 'World rows after London: ' + pretty(results)

Slide 32

Slide 32 text

Machine learning and data mining Machine learning and data mining libraries: • Weka http://www.cs.waikato.ac.nz/ml/weka/ • JavaML http://java-ml.sourceforge.net/ old and reliable ML library but not actively updated • Smile http://haifengl.github.io/smile/ ML library under active development • JSAT https://github.com/EdwardRaff/JSAT contains many machine learning algorithms • H2O http://www.h2o.ai/ distributed Java ML framework also supports Scala, R and Python • Apache Mahout http://mahout.apache.org/ distributed linear algebra framework • Apache Spark ml and mllib https://github.com/apache/spark/tree/master/examples/src/main/java/org/apache/spark/examples

Slide 33

Slide 33 text

Neural networks and text processing Neural networks: • Encog http://www.heatonresearch.com/encog/ • DeepLearning4j http://deeplearning4j.org/ natively supports Keras models Text processing • Java: StringTokenizer, the java.text package, regular expressions • Groovy: Regex enhancements, templates, String enhancements • Libraries: • Apache Lucene https://lucene.apache.org/ • Stanford CoreNLP http://stanfordnlp.github.io/CoreNLP/ • Apache OpenNLP https://opennlp.apache.org/ • LingPipe http://alias-i.com/lingpipe/ • GATE https://gate.ac.uk/ • MALLET http://mallet.cs.umass.edu/ • Smile http://haifengl.github.io/smile/

Slide 34

Slide 34 text

Scaling up Microservice frameworks: • Micronaut Distributed data processing: • Apache Hadoop • Apache Spark • Apache Flink • Apache Samza • Apache Kafka • Apache Storm • Apache Apex • Apache Beam • Apache Ignite • Apache Nifi

Slide 35

Slide 35 text

Scaling up: Apache Ignite Ignite™ is a memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads delivering in-memory speeds at petabyte scale

Slide 36

Slide 36 text

Scaling up: Apache Spark Source: https://spark.apache.org/ and https://renovacloud.com/en/an-introduction-to-and-evaluation-of-apache-spark-for-big-data-architectures/ Apache Spark™ is a unified analytics engine for large-scale data processing

Slide 37

Slide 37 text

Scaling Up: GPars
• A Concurrency & Parallelism Framework for Groovy and Java
• Fork/join, parallel arrays, map/reduce, actors, agents, STM, CSP, dataflow, asynchronous functions
Data Parallelism: Fork/Join, Map/Reduce – fixed coordination (for collections)
Actors – explicit coordination
Safe Agents – delegated coordination
Dataflow – implicit coordination

// actors: explicit coordination
def decryptor = actor {
    loop {
        react { String message -> reply message.reverse() }
    }
}
def console = actor {
    decryptor << 'lellarap si yvoorG'
    react { println 'Decrypted message: ' + it }
}
console.join()

// dataflow: implicit coordination
final def df = new Dataflows()
task { df.z = df.x + df.y }
task { df.x = 10 }
task { df.y = 5 }
assert df.z == 15

// parallel collections
def nums = [1, 2, 3, 4]
def result = nums.collectParallel{ it * 2 }
assert result == [2, 4, 6, 8]

// agents: delegated coordination
def agent = new Agent([])
agent { it << 'Dave' }
agent { it << 'Joe' }
assert agent.val.size() == 2

// STM (software transactional memory)
class Account {
    private amount = newTxnInteger(0)
    void transfer(final int a) {
        atomic { amount.increment(a) }
    }
    int getCurrentAmount() {
        atomicWithInt { amount.get() }
    }
}

// asynchronous functions
def future = { it * 2 }.callAsync(3)
assert 6 == future.get()

Slide 38

Slide 38 text

Scaling up: Apache Beam Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines Source: https://beam.apache.org/get-started/beam-overview/

Slide 39

Slide 39 text

Computer science algorithms Decision problems Search problems Counting problems Optimization problems • Paradigms: brute force, divide & conquer, search & enumerate, randomized, complexity reduction, recursive, back tracking, graph • Optimization approaches: linear programming, dynamic programming, greedy, heuristics

Slide 40

Slide 40 text

Data science algorithms Data Mining Statistics Machine Learning Optimization • Analytics: descriptive, predictive, prescriptive • Analysis: anomaly detection, classification, regression, clustering, association, optimization, dimension reduction • Data relationship: linear, non-linear • Assumptions: parametric, non-parametric • Strategy: supervised, unsupervised, reinforcement • Combining: ensemble, boosting

Slide 41

Slide 41 text

Data Science Algorithms Source: Jason Brownlee, https://machinelearningmastery.com/master-machine-learning-algorithms/

Slide 42

Slide 42 text

Data Science Algorithms Source: Jason Brownlee, https://machinelearningmastery.com/master-machine-learning-algorithms/

Slide 43

Slide 43 text

Candles and COVID-19

Slide 44

Slide 44 text

Candles and COVID-19

Slide 45

Slide 45 text

Working with tables and plots

List traces(String filename, String lineColor, String markerColor) {
    def url = getClass().classLoader.getResource(filename)
    def table = new XlsxReader().read(builder(url).build())
    table.addColumns(DateColumn.create('YearMonth',
        table.column('Date').collect { LocalDate.of(it.year, it.month, 15) }))
    def janFirst2017 = LocalDateTime.of(2017, JANUARY, 1, 0, 0)
    Function from2017 = r -> r.dateTimeColumn('Date').isAfter(janFirst2017)
    Function top3 = r -> r.intColumn('CandleID').isLessThanOrEqualTo(3)
    def byMonth = table.sortAscendingOn('Date')
        .where(and(from2017, top3))
        .summarize('Rating', mean).by('YearMonth')
    def byDate = table.sortAscendingOn('Date')
        .where(and(from2017, top3))
        .summarize('Rating', mean).by('Date')
    def averaged = ScatterTrace.builder(byMonth.dateColumn('YearMonth'), byMonth.nCol('Mean [Rating]'))
        .mode(ScatterTrace.Mode.LINE)
        .line(Line.builder().width(5).color(lineColor).shape(Line.Shape.SPLINE).smoothing(1.3).build())
        .build()
    def scatter = ScatterTrace.builder(byDate.dateTimeColumn('Date'), byDate.nCol('Mean [Rating]'))
        .marker(Marker.builder().opacity(0.3).color(markerColor).build())
        .build()
    [averaged, scatter]
}

Slide 46

Slide 46 text

Working with tables and plots

def (sAverage, sScatter) = traces('Scented_all.xlsx', 'seablue', 'lightskyblue')
def (uAverage, uScatter) = traces('Unscented_all.xlsx', 'seagreen', 'lightgreen')
Plot.show(new Figure(layout('scented'), sAverage, sScatter, line))
Plot.show(new Figure(layout('unscented'), uAverage, uScatter, line))

Slide 47

Slide 47 text

Working with tables and plots

Slide 48

Slide 48 text

Regression Overview Regression: • Relationship between variables Types: • Linear OLSR • Logistic • Polynomial • Stepwise • Ridge • Lasso • ElasticNet Aspects: • Under/over fitting • Outliers Applications: • House price prediction • Market predictions • Ratings prediction • Weather forecast

Slide 49

Slide 49 text

Linear regression case study: House price predictions See also: https://nicolai92.github.io/posts/pentaho-weka-prediction Dataset: https://www.kaggle.com/harlfoxem/housesalesprediction/

Slide 50

Slide 50 text

House price predictions – view data graphically

import org.apache.commons.math3.random.EmpiricalDistribution
import static groovyx.javafx.GroovyFX.start
import static org.apache.commons.csv.CSVFormat.RFC4180 as CSV

def file = 'kc_house_data.csv' as File
def csv = CSV.withFirstRecordAsHeader().parse(new FileReader(file))
def all = csv.collect { it.bedrooms.toInteger() }
def dist = new EmpiricalDistribution(all.max()).tap{ load(all as double[]) }
def bins = dist.binStats.withIndex().collectMany { v, i -> [i.toString(), v.n] }

start {
    stage(title: 'Number of bedrooms histogram', show: true, width: 800, height: 600) {
        scene {
            barChart(title: 'Bedroom count', barGap: 0, categoryGap: 2) {
                series(name: 'Number of properties', data: bins)
            }
        }
    }
}

Slide 51

Slide 51 text

House price predictions – view data graphically

Slide 52

Slide 52 text

House price predictions – view stats

import org.apache.commons.math3.stat.descriptive.SummaryStatistics
import static org.apache.commons.csv.CSVFormat.RFC4180 as CSV

def file = 'kc_house_data.csv' as File
def csv = CSV.withFirstRecordAsHeader().parse(new FileReader(file))
def all = csv.collect { it.bedrooms.toInteger() }
def stats = new SummaryStatistics()
all.each{ stats.addValue(it as double) }
println stats.summary

n: 21613
min: 0.0
max: 33.0
mean: 3.370841623097218
std dev: 0.930061831147451
variance: 0.8650150097573497
sum: 72854.0

Slide 53

Slide 53 text

House price predictions – investigate outliers

import static org.apache.commons.csv.CSVFormat.RFC4180 as CSV

def file = 'kc_house_data.csv' as File
def csv = CSV.withFirstRecordAsHeader().parse(new FileReader(file))
csv.findAll{ it.bedrooms.toInteger() > 10 }.each{ println it.toMap() as TreeMap }

[bathrooms:3, bedrooms:11, condition:3, date:20140821T000000, floors:2, grade:7, id:1773100755, lat:47.556, long:-122.363, price:520000, sqft_above:2400, sqft_basement:600, sqft_living:3000, sqft_living15:1420, sqft_lot:4960, sqft_lot15:4960, view:0, waterfront:0, yr_built:1918, yr_renovated:1999, zipcode:98106]
[bathrooms:1.75, bedrooms:33, condition:5, date:20140625T000000, floors:1, grade:7, id:2402100895, lat:47.6878, long:-122.331, price:640000, sqft_above:1040, sqft_basement:580, sqft_living:1620, sqft_living15:1330, sqft_lot:6000, sqft_lot15:4700, view:0, waterfront:0, yr_built:1947, yr_renovated:0, zipcode:98103]

Slide 54

Slide 54 text

House price predictions – investigate outliers
Some libraries optionally support CSV -> richly-typed domain classes

import com.opencsv.bean.*
import groovy.transform.ToString

@ToString(includeNames = true)
class House {
    @CsvBindByName
    Integer bedrooms
    @CsvBindByName
    String bathrooms
    @CsvBindByName(column = 'sqft_lot')
    Integer area_lot
}

def file = 'kc_house_data.csv' as File
def builder = new CsvToBeanBuilder(new FileReader(file))
def rows = builder.withType(House).build().parse()
rows.findAll{ it.bedrooms > 10 }.each{ println it }

House(bedrooms:11, bathrooms:3, area_lot:4960)
House(bedrooms:33, bathrooms:1.75, area_lot:6000)

Slide 55

Slide 55 text

House prices – remove outlier and view bedrooms

import org.apache.commons.math3.random.EmpiricalDistribution
import static groovyx.javafx.GroovyFX.start
import static org.apache.commons.csv.CSVFormat.RFC4180 as CSV

def csv = CSV.withFirstRecordAsHeader().parse(new FileReader('kc_house_data.csv'))
def all = csv.collect { it.bedrooms.toInteger() }.findAll{ it < 30 }
def dist = new EmpiricalDistribution(all.max()).tap{ load(all as double[]) }
def bins = dist.binStats.withIndex().collectMany { v, i -> [i.toString(), v.n] }

start {
    stage(title: 'Number of bedrooms histogram', show: true, width: 800, height: 600) {
        scene {
            barChart(title: 'Bedroom count', barGap: 0, categoryGap: 2) {
                series(name: 'Number of properties', data: bins)
            }
        }
    }
}

Slide 56

Slide 56 text

House price predictions – remove outlier and view price

import org.apache.commons.math3.random.EmpiricalDistribution
import org.apache.commons.math3.stat.descriptive.SummaryStatistics
import static groovyx.javafx.GroovyFX.start
import static org.apache.commons.csv.CSVFormat.RFC4180 as CSV

def file = 'kc_house_data.csv' as File
def csv = CSV.withFirstRecordAsHeader().parse(new FileReader(file))
def all = csv.findAll { it.bedrooms.toInteger() < 30 }.collect { it.price.toDouble() }
def info = new SummaryStatistics(); all.each(info::addValue)
def head = "Price percentile (min=\$$info.min, mean=\$${info.mean as int}, max=\$$info.max)"
def dist = new EmpiricalDistribution(100).tap{ load(all as double[]) }
def bins = dist.binStats.withIndex().collectMany { v, i -> [i.toString(), v.n] }

start {
    stage(title: 'Price histogram', show: true, width: 800, height: 600) {
        scene {
            barChart(title: head, barGap: 0, categoryGap: 0) {
                series(name: 'Number of properties', data: bins)
            }
        }
    }
}

Slide 57

Slide 57 text

House price predictions – explore with tablesaw

Slide 58

Slide 58 text

House price predictions – explore with tablesaw

import tech.tablesaw.api.*
import tech.tablesaw.plotly.Plot
import tech.tablesaw.plotly.api.*
import static tech.tablesaw.aggregate.AggregateFunctions.*

def rows = Table.read().csv('kc_house_data.csv')
println rows.shape()
println rows.structure()
println rows.column("bedrooms").summary().print()
println rows.where(rows.column("bedrooms").isGreaterThan(10))
def cleaned = rows.dropWhere(rows.column("bedrooms").isGreaterThan(30))
println cleaned.shape()
println cleaned.summarize("price", mean, min, max).by("bedrooms")
Plot.show(ScatterPlot.create("Price x bathrooms x grade", cleaned, "bathrooms", "price", 'grade'))

Slide 59

Slide 59 text

House price predictions – explore with tablesaw import tech.tablesaw.api.* import tech.tablesaw.plotly.Plot import tech.tablesaw.plotly.api.* import static tech.tablesaw.aggregate.AggregateFunctions.* def rows = Table.read().csv('kc_house_data.csv') println rows.shape() println rows.structure() println rows.column("bedrooms").summary().print() println rows.where(rows.column("bedrooms").isGreaterThan(10)) def cleaned = rows.dropWhere(rows.column("bedrooms").isGreaterThan(30)) println cleaned.shape() println cleaned.summarize("price", mean, min, max).by("bedrooms") Plot.show(ScatterPlot.create("Price x bathrooms x grade", cleaned, "bathrooms", "price", 'grade')) 21613 rows X 21 cols

Slide 60

Slide 60 text

House price predictions – explore with tablesaw

import tech.tablesaw.api.*
import tech.tablesaw.plotly.Plot
import tech.tablesaw.plotly.api.*
import static tech.tablesaw.aggregate.AggregateFunctions.*

def rows = Table.read().csv('kc_house_data.csv')
println rows.shape()
println rows.structure()
println rows.column("bedrooms").summary().print()
println rows.where(rows.column("bedrooms").isGreaterThan(10))
def cleaned = rows.dropWhere(rows.column("bedrooms").isGreaterThan(30))
println cleaned.shape()
println cleaned.summarize("price", mean, min, max).by("bedrooms")
Plot.show(ScatterPlot.create("Price x bathrooms x grade", cleaned, "bathrooms", "price", 'grade'))

Structure of kc_house_data.csv
Index | Column Name   | Column Type |
    0 | id            | LONG        |
    1 | date          | STRING      |
    2 | price         | DOUBLE      |
    3 | bedrooms      | INTEGER     |
    4 | bathrooms     | DOUBLE      |
    5 | sqft_living   | INTEGER     |
    6 | sqft_lot      | INTEGER     |
    7 | floors        | DOUBLE      |
    8 | waterfront    | INTEGER     |
    9 | view          | INTEGER     |
  ... | ...           | ...         |
   11 | grade         | INTEGER     |
   12 | sqft_above    | INTEGER     |
   13 | sqft_basement | INTEGER     |
   14 | yr_built      | INTEGER     |
   15 | yr_renovated  | INTEGER     |
   16 | zipcode       | INTEGER     |
   17 | lat           | DOUBLE      |
   18 | long          | DOUBLE      |
   19 | sqft_living15 | INTEGER     |
   20 | sqft_lot15    | INTEGER     |

Slide 61

Slide 61 text

House price predictions – explore with tablesaw

import tech.tablesaw.api.*
import tech.tablesaw.plotly.Plot
import tech.tablesaw.plotly.api.*
import static tech.tablesaw.aggregate.AggregateFunctions.*

def rows = Table.read().csv('kc_house_data.csv')
println rows.shape()
println rows.structure()
println rows.column("bedrooms").summary().print()
println rows.where(rows.column("bedrooms").isGreaterThan(10))
def cleaned = rows.dropWhere(rows.column("bedrooms").isGreaterThan(30))
println cleaned.shape()
println cleaned.summarize("price", mean, min, max).by("bedrooms")
Plot.show(ScatterPlot.create("Price x bathrooms x grade", cleaned, "bathrooms", "price", 'grade'))

Column: bedrooms
Measure  | Value              |
n        | 21613.0            |
sum      | 72854.0            |
Mean     | 3.370841623097218  |
Min      | 0.0                |
Max      | 33.0               |
Range    | 33.0               |
Variance | 0.8650150097573497 |
Std. Dev | 0.930061831147451  |

Slide 62

Slide 62 text

House price predictions – explore with tablesaw

import tech.tablesaw.api.*
import tech.tablesaw.plotly.Plot
import tech.tablesaw.plotly.api.*
import static tech.tablesaw.aggregate.AggregateFunctions.*

def rows = Table.read().csv('kc_house_data.csv')
println rows.shape()
println rows.structure()
println rows.column("bedrooms").summary().print()
println rows.where(rows.column("bedrooms").isGreaterThan(10))
def cleaned = rows.dropWhere(rows.column("bedrooms").isGreaterThan(30))
println cleaned.shape()
println cleaned.summarize("price", mean, min, max).by("bedrooms")
Plot.show(ScatterPlot.create("Price x bathrooms x grade", cleaned, "bathrooms", "price", 'grade'))

kc_house_data.csv
id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
1773100755 | 20140821 | 520000.0 | 11 | 3.0 | 3000 | 4960 | 2.0 | 0 | 0 | 3 | 7 | 2400 | 600 | 1918 | 1999 | 98106 | 47.556 | -122.363 | 1420 | 4960 |
2402100895 | 20140625 | 640000.0 | 33 | 1.75 | 1620 | 6000 | 1.0 | 0 | 0 | 5 | 7 | 1040 | 580 | 1947 | 0 | 98103 | 47.687 | -122.331 | 1330 | 4700 |

Slide 63

Slide 63 text

House price predictions – explore with tablesaw import tech.tablesaw.api.* import tech.tablesaw.plotly.Plot import tech.tablesaw.plotly.api.* import static tech.tablesaw.aggregate.AggregateFunctions.* def rows = Table.read().csv('kc_house_data.csv') println rows.shape() println rows.structure() println rows.column("bedrooms").summary().print() println rows.where(rows.column("bedrooms").isGreaterThan(10)) def cleaned = rows.dropWhere(rows.column("bedrooms").isGreaterThan(30)) println cleaned.shape() println cleaned.summarize("price", mean, min, max).by("bedrooms") Plot.show(ScatterPlot.create("Price x bathrooms x grade", cleaned, "bathrooms", "price", 'grade')) 21612 rows X 21 cols

Slide 64

Slide 64 text

House price predictions – explore with tablesaw

import tech.tablesaw.api.*
import tech.tablesaw.plotly.Plot
import tech.tablesaw.plotly.api.*
import static tech.tablesaw.aggregate.AggregateFunctions.*

def rows = Table.read().csv('kc_house_data.csv')
println rows.shape()
println rows.structure()
println rows.column("bedrooms").summary().print()
println rows.where(rows.column("bedrooms").isGreaterThan(10))
def cleaned = rows.dropWhere(rows.column("bedrooms").isGreaterThan(30))
println cleaned.shape()
println cleaned.summarize("price", mean, min, max).by("bedrooms")
Plot.show(ScatterPlot.create("Price x bathrooms x grade", cleaned, "bathrooms", "price", 'grade'))

kc_house_data.csv summary
bedrooms | Mean [price]       | Min [price] | Max [price] |
       0 | 409503.8461538461  | 139950.0    | 1295650.0   |
       1 | 317642.8844221105  | 75000.0     | 1247000.0   |
       2 | 401372.68188405805 | 78000.0     | 3278000.0   |
       3 | 466232.0784812695  | 82000.0     | 3800000.0   |
       4 | 635419.5042138912  | 100000.0    | 4489000.0   |
       5 | 786599.8288569644  | 133000.0    | 7062500.0   |
       6 | 825520.6360294118  | 175000.0    | 7700000.0   |
       7 | 951184.6578947367  | 280000.0    | 3200000.0   |
       8 | 1105076.923076923  | 340000.0    | 3300000.0   |
       9 | 893999.8333333334  | 450000.0    | 1400000.0   |
      10 | 819333.3333333334  | 650000.0    | 1148000.0   |
      11 | 520000.0           | 520000.0    | 520000.0    |

Slide 65

Slide 65 text

House price predictions – explore with tablesaw import tech.tablesaw.api.* import tech.tablesaw.plotly.Plot import tech.tablesaw.plotly.api.* import static tech.tablesaw.aggregate.AggregateFunctions.* def rows = Table.read().csv('kc_house_data.csv') println rows.shape() println rows.structure() println rows.column("bedrooms").summary().print() println rows.where(rows.column("bedrooms").isGreaterThan(10)) def cleaned = rows.dropWhere(rows.column("bedrooms").isGreaterThan(30)) println cleaned.shape() println cleaned.summarize("price", mean, min, max).by("bedrooms") Plot.show(ScatterPlot.create("Price x bathrooms x grade", cleaned, "bathrooms", "price", 'grade'))

Slide 66

Slide 66 text

House price predictions – explore with tablesaw

cleaned.addColumns(
    StringColumn.create("waterfrontDesc", cleaned.column("waterfront").collect{ it ? 'waterfront' : 'interior' }),
    DoubleColumn.create("scaledGrade", cleaned.column("grade").collect{ it * 2 }),
    DoubleColumn.create("scaledPrice", cleaned.column("price").collect{ it / 100000 })
)

Plot.show(BubblePlot.create("Price vs living area and grade (bubble size)",
    cleaned, "sqft_living", "price", "scaledGrade", "waterfrontDesc"))

Plot.show(Scatter3DPlot.create("Grade, living space, bathrooms and price (bubble size)",
    cleaned, "sqft_living", "bathrooms", "grade", "scaledPrice", "waterfrontDesc"))

Slide 67

Slide 67 text

House price predictions – explore with jupyter/beakerx

Slide 68

Slide 68 text

Regression with Least Squares Find line of “best fit”:

Slide 69

Slide 69 text

Regression with Least Squares Find line of “best fit”: • Find intercept and slope of line which minimizes the sum of the square of the residual errors Residual errors y x

Slide 70

Slide 70 text

Regression with Least Squares
Find line of "best fit":
• Find intercept and slope of line which minimizes the sum of the square of the residual errors
• Line is represented by: y = m x + b
• For points (x0, y0)..(xn, yn) the errors will be: err0 = y0 − (m x0 + b) .. errn = yn − (m xn + b), which are then squared
• To minimise, we find where the derivative is zero
• Solving the math gives:
  m = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
  b = (Σy Σx² − Σx Σxy) / (n Σx² − (Σx)²)
(diagram: residual errors err0..errn between the points and the fitted line, which crosses the y axis at (0, b))
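As a hedged sketch (not code from the deck), the closed-form solution above is only a few lines of plain Groovy:

def xs = [1d, 2d, 3d, 4d, 5d]
def ys = [2.1d, 3.9d, 6.2d, 7.8d, 10.1d]
def n = xs.size()
def sumX = xs.sum()
def sumY = ys.sum()
def sumXY = (0..<n).sum { xs[it] * ys[it] }
def sumX2 = xs.sum { it * it }
def m = (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX)       // slope
def b = (sumY * sumX2 - sumX * sumXY) / (n * sumX2 - sumX * sumX)   // intercept
println "y = $m * x + $b"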

Slide 71

Slide 71 text

Regression with Gradient Descent Find line of “best fit”: • Find intercept and slope of line which minimizes the sum of the square of the residual errors • Line is represented by: y = m x + b and we might have a similar loss function • Instead of solving for the derivative of loss function, we take repeated steps in the opposite direction of the gradient, taking smaller steps as we approach zero • A learning rate influences step size • Amenable to streaming & can involve random or mini-batch subsets of data for efficiency Loss function (residual error) Parameter being optimized (could be multiple dimensions)
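A hedged minimal sketch of the same line fit done with gradient descent (fixed learning rate and iteration count, whole-batch gradients; illustrative only, not the deck's code):

def xs = [1d, 2d, 3d, 4d, 5d]
def ys = [2.1d, 3.9d, 6.2d, 7.8d, 10.1d]
def m = 0d, b = 0d
def rate = 0.01
1000.times {
    def gradM = 0d, gradB = 0d
    (0..<xs.size()).each { i ->
        def err = (m * xs[i] + b) - ys[i]        // residual for point i
        gradM += 2 * err * xs[i] / xs.size()     // d(MSE)/dm
        gradB += 2 * err / xs.size()             // d(MSE)/db
    }
    m -= rate * gradM                            // step opposite the gradient
    b -= rate * gradB
}
println "y = $m * x + $b"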

Slide 72

Slide 72 text

House price predictions – linear regression

import org.apache.commons.math3.stat.regression.SimpleRegression
import static groovyx.javafx.GroovyFX.start
import static org.apache.commons.csv.CSVFormat.RFC4180 as CSV

def feature = 'bedrooms'
def nonOutliers = feature == 'bedrooms' ? { it[0] < 30 } : { true }
def csv = CSV.withFirstRecordAsHeader().parse(new FileReader('kc_house_data.csv'))
def all = csv.collect { [it[feature].toDouble(), it.price.toDouble()] }.findAll(nonOutliers)
def reg = new SimpleRegression().tap{ addData(all as double[][]) }
def (min, max) = all.transpose().with{ [it[0].min(), it[0].max()] }
def predicted = [[min, reg.predict(min)], [max, reg.predict(max)]]

start {
    stage(title: 'Price vs Number of bedrooms', show: true, width: 800, height: 600) {
        scene {
            scatterChart { series(name: 'Actual', data: all) }
            lineChart { series(name: 'Predicted', data: predicted) }
        }
    }
}

Slide 73

Slide 73 text

House price predictions – linear regression • Can be repeated for other features

Slide 74

Slide 74 text

House price predictions – linear regression

Slide 75

Slide 75 text

House price predictions – evaluating regression

@Canonical @FXBindable
class Feature {
    String name
    double rr, rmseTrain, rmseTest, mean
}

def features = [new Feature('bedrooms'), new Feature('bathrooms'),
                new Feature('grade'), new Feature('sqft_living')]
def idxs = 0..<features.size()
def priceIdx = features.size()
def csv = CSV.withFirstRecordAsHeader().parse(new FileReader('kc_house_data.csv'))
def all = csv.collect { row ->
    [*idxs.collect{ row[features[it].name].toDouble() }, row.price.toDouble()]
}.findAll{ it[0] < 30 }
def (train, test) = all.chop(all.size() * 0.8 as int, -1)
def trainT = train.transpose()
def regs = idxs.collect { idx ->
    new SimpleRegression().tap{ addData([trainT[idx], trainT[priceIdx]].transpose() as double[][]) }
}
def residuals = idxs.collect{ idx -> test.collect{ regs[idx].predict(it[idx]) - it[priceIdx] } }

Slide 76

Slide 76 text

House price predictions – evaluating regression

idxs.each { idx ->
    features[idx].rr = regs[idx].RSquare
    features[idx].rmseTrain = Math.sqrt(regs[idx].meanSquareError)
    features[idx].rmseTest = Math.sqrt(StatUtils.sumSq(residuals[idx] as double[]) / (test.size() - 1))
    features[idx].mean = StatUtils.mean(residuals[idx] as double[])
}

start {
    stage(title: "Error statistics for feature", visible: true) {
        scene(fill: groovyblue, width: 800, height: 200) {
            stackPane(padding: 10) {
                tableView(items: features) {
                    tableColumn(text: "Feature", property: 'name')
                    tableColumn(text: "R²", property: 'rr')
                    tableColumn(text: "RMSE (train)", property: 'rmseTrain')
                    tableColumn(text: "RMSE (test)", property: 'rmseTest')
                    tableColumn(text: "Residuals Mean", property: 'mean')
                }
            }
        }
    }
}

Slide 77

Slide 77 text

House price predictions – evaluating regression

def maxError = [residuals[graphIdx].min(), residuals[graphIdx].max()].max{ Math.abs(it) }
residuals[graphIdx] << maxError * -1 // make graph even around origin
maxError = Math.abs(maxError)
def step = maxError.toInteger() / 50
def dist = new EmpiricalDistribution(100).tap{ load(residuals[graphIdx] as double[]) }
def ndist = new NormalDistribution(0, dist.sampleStats.standardDeviation)
def bins = dist.binStats.indexed().collect { i, v ->
    [v.n ? v.mean.toInteger().toString() : (-maxError + i * step).toInteger().toString(), v.n]
}
def nbins = dist.binStats.indexed().collect { i, v ->
    def x = v.n ? v.mean.toInteger() : (-maxError + i * step).toInteger()
    [x.toString(), ndist.probability(x, x + step)]
}
def scale = dist.binStats.max{ it.n }.n / nbins.max{ it[1] }[1]
nbins = nbins.collect{ [it[0], it[1] * scale] }

start {
    stage(title: "Error histogram for ${features[graphIdx].name}", show: true, width: 800, height: 600) {
        scene {
            barChart(title: 'Error percentile', barGap: 0, categoryGap: 0) {
                series(name: 'Error in prediction', data: bins)
                series(name: 'Normal distribution', data: nbins)
            }
        }
    }
}

Slide 78

Slide 78 text

House price predictions – evaluating regression

Slide 79

Slide 79 text

House price predictions – multi linear regression

def features = [
    'bedrooms', 'bathrooms', 'sqft_living', 'sqft_living15', 'lat',
    'sqft_above', 'grade', 'view', 'waterfront', 'floors'
]
def idxs = 0..<features.size()
def priceIdx = features.size()
def csv = CSV.withFirstRecordAsHeader().parse(new FileReader('kc_house_data.csv'))
def all = csv.collect { row ->
    [*idxs.collect{ row[features[it]].toDouble() }, row.price.toDouble()]
}.findAll{ it[0] < 30 }
def (train, test) = all.chop(all.size() * 0.8 as int, -1)
def trainT = train.transpose()
def price = trainT[priceIdx]
def reg = new OLSMultipleLinearRegression().tap{
    newSampleData(price as double[], trainT[0..<priceIdx].transpose() as double[][])
}
def params = reg.estimateRegressionParameters()
def predicted = test.collect { data -> params[0] + (1..priceIdx).sum{ params[it] * data[it - 1] } }
def residuals = test.indexed().collect { i, data -> predicted[i] - data[priceIdx] }
def rr = reg.calculateRSquared()
def rmseTrain = Math.sqrt(reg.calculateResidualSumOfSquares())
def rmseTest = Math.sqrt(StatUtils.sumSq(residuals as double[]) / (test.size() - 1))
def mean = StatUtils.mean(residuals as double[])
println "$rr $rmseTrain $rmseTest $mean"

[-3.1635392225013282E7, -27484.961699434185, -5236.246584477492, 191.507648719199, 17.94379836055797, 657803.1115560957, 4.369533695386711, 74850.68999978402, 66459.03393303146, 604896.8569593901, -27203.796342244277]
0.6536012563023335 215379.66626496855 212609.34498021202 1886.6015077985683

R2: https://en.wikipedia.org/wiki/Coefficient_of_determination
rmse: https://en.wikipedia.org/wiki/Root-mean-square_deviation

Slide 80

Slide 80 text

House price predictions – multi linear regression

Slide 81

Slide 81 text

House price predictions – multi linear regression

def file = getClass().classLoader.getResource('kc_house_data.csv').file as File
def loader = new CSVLoader(file: file)
def model = new LinearRegression()
def allInstances = loader.dataSet
def priceIndex = 2
allInstances.classIndex = priceIndex
// remove "id" and "date" columns
def rm = new Remove(attributeIndices: '1,2', inputFormat: allInstances)
def instances = Filter.useFilter(allInstances, rm)
model.buildClassifier(instances)
println model
def actual = instances.collect{ it.value(0).toDouble() }
def predicted = instances.collect{ model.classifyInstance(it) }
def chart = new XYChartBuilder()…

Linear Regression Model

price =
    -35766.5414 * bedrooms +
     41144.2785 * bathrooms +
        82.8088 * sqft_living +
         0.1286 * sqft_lot +
      6689.5501 * floors +
    582960.4584 * waterfront +
     52870.9424 * view +
     26385.6491 * condition +
     95890.4452 * grade +
        98.4193 * sqft_above +
        67.2917 * sqft_basement +
     -2620.2232 * yr_built +
        19.8126 * yr_renovated +
      -582.4199 * zipcode +
    602748.2265 * lat +
   -214729.8283 * long +
        21.6814 * sqft_living15 +
        -0.3826 * sqft_lot15 +
   6690324.6024

Slide 82

Slide 82 text

House price predictions – multi linear regression def file = getClass().classLoader.getResource('kc_house_data.csv').file as File def loader = new CSVLoader(file: file) def model = new LinearRegression() def allInstances = loader.dataSet def priceIndex = 2 allInstances.classIndex = priceIndex // remove "id" and "date" columns def rm = new Remove(attributeIndices: '1,2', inputFormat: allInstances) def instances = Filter.useFilter(allInstances, rm) model.buildClassifier(instances) println model def actual = instances.collect{ it.value(0).toDouble() } def predicted = instances.collect{ model.classifyInstance(it) } def chart = new XYChartBuilder()… Linear Regression Model price = -35766.5414 * bedrooms + 41144.2785 * bathrooms + 82.8088 * sqft_living + 0.1286 * sqft_lot + 6689.5501 * floors + 582960.4584 * waterfront + 52870.9424 * view + 26385.6491 * condition + 95890.4452 * grade + 98.4193 * sqft_above + 67.2917 * sqft_basement + -2620.2232 * yr_built + 19.8126 * yr_renovated + -582.4199 * zipcode + 602748.2265 * lat + -214729.8283 * long + 21.6814 * sqft_living15 + -0.3826 * sqft_lot15 + 6690324.6024

Slide 83

Slide 83 text

House price predictions – multi linear regression def file = getClass().classLoader.getResource('kc_house_data.csv').file as File def loader = new CSVLoader(file: file) def model = new SGD() model.options = ['-F', '4', '-N'] as String[] // Huber loss, unscaled def allInstances = loader.dataSet def priceIndex = 2 allInstances.classIndex = priceIndex // remove "id", "date", 'zip', 'lat', 'long' columns def rm = new Remove(attributeIndices: '1,2,17,18,19', inputFormat: allInstances) def instances = Filter.useFilter(allInstances, rm) model.buildClassifier(instances) println model def actual = instances.collect{ it.value(0).toDouble() } def predicted = instances.collect{ model.classifyInstance(it) } def chart = new XYChartBuilder()… Loss function: Huber loss (robust regression) price = -9.6832 bedrooms + 0.4953 bathrooms + 112.6971 sqft_living + 0.8189 sqft_lot + 5.5 floors + 0.7041 waterfront + 11.3372 view + 7.8443 condition + 23.9548 grade + 37.5499 sqft_above + 75.1472 sqft_basement + -7.8827 yr_built + 63.7169 yr_renovated + 103.4893 sqft_living15 + -1.3449 sqft_lot15 + 0.2788

Slide 84

Slide 84 text

House price predictions – multi linear regression def file = getClass().classLoader.getResource('kc_house_data.csv').file as File def loader = new CSVLoader(file: file) def model = new SGD() model.options = ['-F', '4', '-N'] as String[] // Huber loss, unscaled def allInstances = loader.dataSet def priceIndex = 2 allInstances.classIndex = priceIndex // remove "id", "date", 'zip', 'lat', 'long' columns def rm = new Remove(attributeIndices: '1,2,17,18,19', inputFormat: allInstances) def instances = Filter.useFilter(allInstances, rm) model.buildClassifier(instances) println model def actual = instances.collect{ it.value(0).toDouble() } def predicted = instances.collect{ model.classifyInstance(it) } def chart = new XYChartBuilder()… Loss function: Huber loss (robust regression) price = -9.6832 bedrooms + 0.4953 bathrooms + 112.6971 sqft_living + 0.8189 sqft_lot + 5.5 floors + 0.7041 waterfront + 11.3372 view + 7.8443 condition + 23.9548 grade + 37.5499 sqft_above + 75.1472 sqft_basement + -7.8827 yr_built + 63.7169 yr_renovated + 103.4893 sqft_living15 + -1.3449 sqft_lot15 + 0.2788

Slide 85

Slide 85 text

House price predictions – explore with Weka

Slide 86

Slide 86 text

House price predictions – explore with Weka

Slide 87

Slide 87 text

House price predictions – analyze with Weka

Slide 88

Slide 88 text

House price predictions – analyze with Weka

Slide 89

Slide 89 text

House price predictions – analyze with Weka

Slide 90

Slide 90 text

House price predictions – scaling regression import … def rows = Table.read().csv('kc_house_data.csv') rows = rows.dropWhere(rows.column("bedrooms").isGreaterThan(30)) String[] features = [ 'price', 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_living15', 'lat', 'sqft_above', 'grade', 'view', 'waterfront', 'floors' ] def data = rows.as().doubleMatrix(features) // configure to all run on local machine but could be a cluster (can be hidden in XML) def cfg = new IgniteConfiguration( peerClassLoadingEnabled: true, discoverySpi: new TcpDiscoverySpi( ipFinder: new TcpDiscoveryMulticastIpFinder( addresses: ['127.0.0.1:47500..47509'] ) ) ) static pretty(mdl, features) { def sign = { val -> val < 0 ? '- ' : '+ ' } def valIdx = { idx, val -> sprintf '%.2f*%s', val, features[idx+1] } def valIdxSign = { idx, val -> sign(val) + valIdx(idx, Math.abs(val)) } def valSign = { val -> sign(val) + sprintf('%.2f', Math.abs(val)) } def (w, i) = [mdl.weights, mdl.intercept] [valIdx(0, w.get(0)), *(1..

Slide 91

Slide 91 text

House price predictions – scaling regression … Ignition.start(cfg).withCloseable { ignite -> println ">>> Ignite grid started for data: ${data.size()} rows X ${data[0].size()} cols" def dataCache = new SandboxMLCache(ignite).fillCacheWith(data) def trainer = new LSQRTrainer().withEnvironmentBuilder(defaultBuilder().withRNGSeed(0)) def vectorizer = new DoubleArrayVectorizer().labeled(Vectorizer.LabelCoordinate.FIRST) def split = new TrainTestDatasetSplitter().split(0.8) def mdl = trainer.fit(ignite, dataCache, split.trainFilter, vectorizer) def rr = Evaluator.evaluate(dataCache, split.testFilter, mdl, vectorizer, new RM().withMetric { it.r2() }) def rmse = Evaluator.evaluate(dataCache, split.testFilter, mdl, vectorizer, new RM()) dataCache.destroy() println ">>> Model: " + pretty(mdl, features) println ">>> R^2 : " + rr println ">>> RMSE : " + rmse } Candidates for distributed processing

Slide 92

Slide 92 text

House price predictions – scaling regression … Ignition.start(cfg).withCloseable { ignite -> println ">>> Ignite grid started for data: ${data.size()} rows X ${data[0].size()} cols" def dataCache = new SandboxMLCache(ignite).fillCacheWith(data) def trainer = new LSQRTrainer().withEnvironmentBuilder(defaultBuilder().withRNGSeed(0)) def vectorizer = new DoubleArrayVectorizer().labeled(Vectorizer.LabelCoordinate.FIRST) def split = new TrainTestDatasetSplitter().split(0.8) def mdl = trainer.fit(ignite, dataCache, split.trainFilter, vectorizer) def rr = Evaluator.evaluate(dataCache, split.testFilter, mdl, vectorizer, new RM().withMetric { it.r2() }) def rmse = Evaluator.evaluate(dataCache, split.testFilter, mdl, vectorizer, new RM()) dataCache.destroy() println ">>> Model: " + pretty(mdl, features) println ">>> R^2 : " + rr println ">>> RMSE : " + rmse } [15:02:09] Ignite node started OK (id=63b2d47d) >>> Ignite grid started for data: 21612 rows X 11 cols >>> Model: -33081.12*bedrooms - 16205.93*bathrooms + 225.40*sqft_living + 6.77*sqft_living15 - 8766.52*lat - 38.59*sqft_above + 96251.64*grade + 62303.58*view + 572116.79*waterfront - 11620.55*floors - 48504.18 >>> R^2 : 0.607023233473466 >>> RMSE : 228035.85816493785 [15:02:10] Ignite node stopped OK [uptime=00:00:01.465] See also: https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/ml/regression/linear/BostonHousePricesPredictionExample.java

Slide 93

Slide 93 text

House prices – scaling regression import … def spark = builder().config('spark.master', 'local[8]').appName('HousePrices').orCreate def file = '/path/to/kc_house_data.csv' int k = 5 Dataset ds = spark.read().format('csv') .options('header': 'true', 'inferSchema': 'true').load(file) double[] splits = [80, 20] def (training, test) = ds.randomSplit(splits) String[] colNames = ds.columns().toList().minus(['id', 'date', 'price']) def assembler = new VectorAssembler(inputCols: colNames, outputCol: 'features') Dataset dataset = assembler.transform(training) def lr = new LinearRegression(labelCol: 'price', maxIter: 10) def model = lr.fit(dataset) println 'Coefficients:' println model.coefficients().values()[1..-1].collect{ sprintf '%.2f', it }.join(', ') def testSummary = model.evaluate(assembler.transform(test)) printf 'RMSE: %.2f%n', testSummary.rootMeanSquaredError printf 'r2: %.2f%n', testSummary.r2 spark.stop() Coefficients: 41979.78, 80853.89, 0.15, 5412.83, 564343.22, 53834.10, 24817.09, 93195.29, -80662.68, -80694.28, -2713.58, 19.02, -628.67, 594468.23, -228397.19, 21.23, -0.42 RMSE: 187242.12 r2: 0.70

Slide 94

Slide 94 text

Clustering Overview Clustering: • Grouping similar items Algorithm families: • Hierarchical • Partitioning k-means, x-means • Density-based • Graph-based Aspects: • Disjoint vs overlapping • Preset cluster number • Dimensionality reduction PCA • Nominal feature support Applications: • Market segmentation • Recommendation engines • Search result grouping • Social network analysis • Medical imaging

Slide 95

Slide 95 text

Clustering https://commons.apache.org/proper/commons-math/userguide/ml.html

Slide 96

Slide 96 text

Clustering case study: Whiskey flavor profiles • 86 scotch whiskies • 12 flavor categories Pictures: https://prasant.net/clustering-scotch-whisky-grouping-distilleries-by-k-means-clustering-81f2ecde069c https://www.r-bloggers.com/where-the-whisky-flavor-profile-data-came-from/ https://www.centerspace.net/clustering-analysis-part-iv-non-negative-matrix-factorization/

Slide 97

Slide 97 text

Whiskey – exploring with Dex

Slide 98

Slide 98 text

import … def table = Table.read().csv('whiskey.csv') table = table.removeColumns(0) def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] def panel = new PlotPanel( *[0.. def color = new Color(72 + (first * 16), 72 + (second * 16), 200 - (first * 4) - (second * 4)) ScatterPlot.plot(table.as().doubleMatrix(cols[first], cols[second]), '#' as char, color) } ) new SwingBuilder().edt { frame(title: 'Frame', size: [800, 600], show: true, defaultCloseOperation: DISPOSE) { widget(panel) } } Whiskey: How to visualize? Feature combinations?

Slide 99

Slide 99 text

Hmm… not particularly enlightening Whiskey: How to visualize? Feature combinations?

Slide 100

Slide 100 text

Another dimension? Whiskey: How to visualize? Feature combinations?

Slide 101

Slide 101 text

import … def table = Table.read().csv('whiskey.csv') table = table.removeColumns(0) def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] Color[] colors = [CYAN, PINK, MAGENTA, ORANGE, GREEN, BLUE, RED, YELLOW] def panel = new PlotPanel( *[cols, cols].combinations().collect { first, second -> Histogram3D.plot(table.as().doubleMatrix(first, second), 4, colors) } ) new SwingBuilder().edt { frame(title: 'Frame', size: [800, 600], show: true, defaultCloseOperation: DISPOSE) { widget(panel) } } Whiskey: How to visualize? Feature combinations?

Slide 102

Slide 102 text

import … def table = Table.read().csv('whiskey.csv') table = table.removeColumns(0) def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] Color[] colors = [CYAN, PINK, MAGENTA, ORANGE, GREEN, BLUE, RED, YELLOW] def panel = new PlotPanel( *[cols, cols].combinations().collect { first, second -> Histogram3D.plot(table.as().doubleMatrix(first, second), 4, colors) } ) new SwingBuilder().edt { frame(title: 'Frame', size: [800, 600], show: true, defaultCloseOperation: DISPOSE) { widget(panel) } } Whiskey: How to visualize? Feature combinations?

Slide 103

Slide 103 text

import … def table = Table.read().csv('whiskey.csv') table = table.removeColumns(0) def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] Color[] colors = [CYAN, PINK, MAGENTA, ORANGE, GREEN, BLUE, RED, YELLOW] def panel = new PlotPanel( *[cols, cols].combinations().collect { first, second -> Histogram3D.plot(table.as().doubleMatrix(first, second), 4, colors) } ) SwingUtil.show(size: [1200, 900], panel) Whiskey: How to visualize? Feature combinations?

Slide 104

Slide 104 text

Pretty… but only marginally more useful Whiskey: How to visualize? Feature combinations?

Slide 105

Slide 105 text

Clustering with KMeans Step 1: • Guess k cluster centroids at random

Slide 106

Slide 106 text

Clustering with KMeans Step 1: • Guess k cluster centroids

Slide 107

Slide 107 text

Clustering with KMeans Step 1: • Guess k cluster centroids Step 2: • Assign points to closest centroid

Slide 108

Slide 108 text

Clustering with KMeans Step 1: • Guess k cluster centroids Step 2: • Assign points to closest centroid

Slide 109

Slide 109 text

Clustering with KMeans Step 1: • Guess k cluster centroids Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points

Slide 110

Slide 110 text

Clustering with KMeans Step 1: • Guess k cluster centroids Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points

Slide 111

Slide 111 text

Clustering with KMeans Step 1: • Guess k cluster centroids Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points

Slide 112

Slide 112 text

Clustering with KMeans Step 1: • Guess k cluster centroids Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points

Slide 113

Slide 113 text

Clustering with KMeans Step 1: • Guess k cluster centroids Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points Repeat steps 2 and 3 until stable or some limit reached

Slide 114

Slide 114 text

Clustering with KMeans Step 1: • Guess k cluster centroids Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points Repeat steps 2 and 3 until stable or some limit reached

Slide 115

Slide 115 text

Clustering with KMeans Step 1: • Guess k cluster centroids Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points Repeat steps 2 and 3 until stable or some limit reached

Slide 116

Slide 116 text

Clustering with KMeans Step 1: • Guess k cluster centroids Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points Repeat steps 2 and 3 until stable or some limit reached
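A hedged plain-Groovy sketch of the loop just described (naive initial centroids, Euclidean distance, a fixed number of iterations; illustrative only, before handing the real work to a library):

def distSq = { a, b -> (0..<a.size()).sum { (a[it] - b[it]) * (a[it] - b[it]) } }

List points = [[1d, 1d], [1.5d, 2d], [1d, 0.5d], [8d, 8d], [9d, 11d], [9d, 9d]]
int k = 2
def centroids = points.take(k)                         // step 1: initial guess (ideally k random points)
10.times {
    def clusters = points.groupBy { p ->               // step 2: assign each point to its closest centroid
        centroids.min { c -> distSq(p, c) }
    }
    centroids = clusters.values().collect { cluster -> // step 3: new centroid = mean of the assigned points
        (0..<cluster[0].size()).collect { i -> cluster.sum { it[i] } / cluster.size() }
    }
}
println centroids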

Slide 117

Slide 117 text

import … def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] def numClusters = 5 def loader = new CSVLoader(file: 'whiskey.csv') def clusterer = new SimpleKMeans(numClusters: numClusters, preserveInstancesOrder: true) def instances = loader.dataSet instances.deleteAttributeAt(0) // remove RowID clusterer.buildClusterer(instances) println ' ' + cols.join(', ') def dataset = new DefaultCategoryDataset() clusterer.clusterCentroids.eachWithIndex{ Instance ctrd, num -> print "Cluster ${num+1}: " println ((1..cols.size()).collect{ sprintf '%.3f', ctrd.value(it) }.join(', ')) (1..cols.size()).each { idx -> dataset.addValue(ctrd.value(idx), "Cluster ${num+1}", cols[idx-1]) } } def clusters = (0.. clusters[cnum] << instances.get(idx).stringValue(0) } clusters.each { k, v -> println "Cluster ${k+1}:" println v.join(', ') } def plot = new SpiderWebPlot(dataset: dataset) def chart = new JFreeChart('Whiskey clusters', plot) SwingUtil.show(new ChartPanel(chart)) Whiskey – clustering with radar plot and weka Read CSV and create clusterer Display whiskies assigned to each cluster Graph cluster centroids

Slide 118

Slide 118 text

import … def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] def numClusters = 5 def loader = new CSVLoader(file: 'whiskey.csv') def clusterer = new SimpleKMeans(numClusters: numClusters, preserveInstancesOrder: true) def instances = loader.dataSet instances.deleteAttributeAt(0) // remove RowID clusterer.buildClusterer(instances) println ' ' + cols.join(', ') def dataset = new DefaultCategoryDataset() clusterer.clusterCentroids.eachWithIndex{ Instance ctrd, num -> print "Cluster ${num+1}: " println ((1..cols.size()).collect{ sprintf '%.3f', ctrd.value(it) }.join(', ')) (1..cols.size()).each { idx -> dataset.addValue(ctrd.value(idx), "Cluster ${num+1}", cols[idx-1]) } } def clusters = (0.. clusters[cnum] << instances.get(idx).stringValue(0) } clusters.each { k, v -> println "Cluster ${k+1}:" println v.join(', ') } def plot = new SpiderWebPlot(dataset: dataset) def chart = new JFreeChart('Whiskey clusters', plot) SwingUtil.show(new ChartPanel(chart)) Whiskey – clustering with radar plot and weka Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity, Floral Cluster 1: 3.800, 1.600, 3.600, 3.600, 0.600, 0.200, 1.600, 0.600, 1.000, 1.400, 1.200, 0.000 Cluster 2: 2.773, 2.409, 1.545, 0.045, 0.000, 1.818, 1.591, 2.000, 2.091, 2.136, 2.136, 1.591 Cluster 3: 1.773, 2.455, 1.318, 0.636, 0.000, 0.636, 1.000, 0.409, 1.636, 1.364, 1.591, 1.591 Cluster 4: 1.500, 2.233, 1.267, 0.267, 0.000, 1.533, 1.400, 0.700, 1.000, 1.900, 1.900, 2.133 Cluster 5: 2.000, 2.143, 1.857, 0.857, 1.000, 0.857, 1.714, 1.000, 1.286, 2.000, 1.429, 1.714 Cluster 1: Ardbeg, Clynelish, Lagavulin, Laphroig, Talisker Cluster 2: Aberfeldy, Aberlour, Ardmore, Auchroisk, Balmenach, BenNevis, Benrinnes, Benromach, BlairAthol, Dailuaine, Dalmore, Edradour, Glendronach, Glendullan, Glenfarclas, Glenrothes, Glenturret, Longmorn, Macallan, Mortlach, RoyalLochnagar, Strathisla Cluster 3: ArranIsleOf, Aultmore, Balblair, Cardhu, Craigganmore, Dufftown, GlenGrant, GlenKeith, GlenScotia, GlenSpey, Glenfiddich, Glenmorangie, Isle of Jura, Mannochmore, Miltonduff, Oban, Speyside, Springbank, Strathmill, Tamnavulin, Teaninich, Tomore Cluster 4: AnCnoc, Auchentoshan, Belvenie, Benriach, Bladnoch, Bowmore, Bruichladdich, Bunnahabhain, Dalwhinnie, Deanston, GlenElgin, GlenGarioch, GlenMoray, GlenOrd, Glenallachie, Glengoyne, Glenkinchie, Glenlivet, Glenlossie, Highland Park, Inchgower, Knochando, Linkwood, Loch Lomond, Scapa, Speyburn, Tamdhu, Tobermory, Tomatin, Tomintoul Cluster 5: Caol Ila, Craigallechie, GlenDeveronMacduff, OldFettercairn, OldPulteney, RoyalBrackla, Tullibardine

Slide 119

Slide 119 text

import … def rows = CSV.withFirstRecordAsHeader().parse(new FileReader('whiskey.csv')) def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] def clusterer = new KMeansPlusPlusClusterer(5) def data = rows.collect{ row -> new DoublePoint(cols.collect{ col -> row[col] } as int[]) } def centroids = clusterer.cluster(data) println cols.join(', ') + ', Medoid' def dataset = new DefaultCategoryDataset() centroids.eachWithIndex{ ctrd, num -> def cpt = ctrd.center.point def closest = ctrd.points.min{ pt -> sumSq((0.. cols.collect{ row[it] as double } == closest.point }?.Distillery println cpt.collect{ sprintf '%.3f', it }.join(', ') + ", $medoid" cpt.eachWithIndex { val, idx -> dataset.addValue(val, "Cluster ${num+1}", cols[idx]) } } def plot = new SpiderWebPlot(dataset: dataset) def chart = new JFreeChart('Whiskey clusters', plot) SwingUtil.show(new ChartPanel(chart)) Whiskey – clustering with radar plot and medoids Same again with Apache Commons Math but also calculate medoids

Slide 120

Slide 120 text

import … def rows = CSV.withFirstRecordAsHeader().parse(new FileReader('whiskey.csv')) def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] def clusterer = new KMeansPlusPlusClusterer(5) def data = rows.collect{ row -> new DoublePoint(cols.collect{ col -> row[col] } as int[]) } def centroids = clusterer.cluster(data) println cols.join(', ') + ', Medoid' def dataset = new DefaultCategoryDataset() centroids.eachWithIndex{ ctrd, num -> def cpt = ctrd.center.point def closest = ctrd.points.min{ pt -> sumSq((0.. cols.collect{ row[it] as double } == closest.point }?.Distillery println cpt.collect{ sprintf '%.3f', it }.join(', ') + ", $medoid" cpt.eachWithIndex { val, idx -> dataset.addValue(val, "Cluster ${num+1}", cols[idx]) } } def plot = new SpiderWebPlot(dataset: dataset) def chart = new JFreeChart('Whiskey clusters', plot) SwingUtil.show(new ChartPanel(chart)) Whiskey – clustering with radar plot and medoids Libraries: Apache Commons Math and JFreeChart Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity, Floral, Medoid 2.000, 2.533, 1.267, 0.267, 0.200, 1.067, 1.667, 0.933, 0.267, 1.733, 1.800, 1.733, GlenOrd 2.789, 2.474, 1.474, 0.053, 0.000, 1.895, 1.632, 2.211, 2.105, 2.105, 2.211, 1.737, Aberfeldy 2.909, 1.545, 2.909, 2.727, 0.455, 0.455, 1.455, 0.545, 1.545, 1.455, 1.182, 0.545, Clynelish 1.333, 2.333, 0.944, 0.111, 0.000, 1.000, 0.444, 0.444, 1.500, 1.944, 1.778, 1.778, Aultmore 1.696, 2.304, 1.565, 0.435, 0.087, 1.391, 1.696, 0.609, 1.652, 1.652, 1.783, 2.130, Benromach

Slide 121

Slide 121 text

Dimensionality reduction

Slide 122

Slide 122 text

Whiskey – clustering with PCA plot import … def rows = Table.read().csv('whiskey.csv') def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] def data = table.as().doubleMatrix(*cols) def pca = new PCA(data) pca.projection = 2 def projected = pca.project(data) def clusterer = new KMeans(data, 5) table = table.addColumns( DoubleColumn.create("PCA1", (0..

Slide 123

Slide 123 text

Whiskey – clustering with 2-4D PCA plot import … def rows = Table.read().csv('whiskey.csv') def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] def data = rows.as().doubleMatrix(*cols) def pca = new PCA(data) def dims = 4 // can be 2, 3 or 4 pca.projection = dims def projected = pca.project(data) def adj = [1, 1, 1, 5] def clusterer = new KMeans(data, 5) def labels = clusterer.clusterLabel.collect{ "Cluster " + (it+1) } rows = rows.addColumns( *(0.. DoubleColumn.create("PCA${idx+1}", (0..

Slide 124

Slide 124 text

Whiskey – clustering and visualizing centroids … def data = table.as().doubleMatrix(*cols) def pca = new PCA(data) pca.projection = 3 def projected = pca.project(data) def clusterer = new KMeans(data, 5) def labels = clusterer.clusterLabel.collect { "Cluster " + (it + 1) } table = table.addColumns( *(0..<3).collect { idx -> DoubleColumn.create("PCA${idx+1}", (0.. toAdd[0].setString("Cluster", "Cluster " + (idx+1)) (1..3).each { toAdd[0].setDouble("PCA" + it, centroids[idx][it-1]) } toAdd[0].setDouble("Centroid", 50) table.append(toAdd) } def title = "Clusters x Principal Components w/ centroids" Plot.show(Scatter3DPlot.create(title, table, *(1..3).collect { "PCA$it" }, "Centroid", "Cluster"))

Slide 125

Slide 125 text

Whiskey – Screeplot import … def rows = Table.read().csv('whiskey.csv') def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] def data = table.as().doubleMatrix(*cols) def pca = new PCA(data) pca.projection = 2 def plots = [PlotCanvas.screeplot(pca)] def projected = pca.project(data) table = table.addColumns( *(1..2).collect { idx -> DoubleColumn.create("PCA$idx", (0.. def clusterer = new KMeans(data, k) double[][] components = table.as().doubleMatrix('PCA1', 'PCA2') plots << ScatterPlot.plot(components, clusterer.clusterLabel, symbols[0..

Slide 126

Slide 126 text

import … def rows = Table.read().csv('whiskey.csv') def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] def data = table.as().doubleMatrix(*cols) def pca = new PCA(data) pca.projection = 2 def plots = [PlotCanvas.screeplot(pca)] def projected = pca.project(data) table = table.addColumns( *(1..2).collect { idx -> DoubleColumn.create("PCA$idx", (0.. def clusterer = new KMeans(data, k) double[][] components = table.as().doubleMatrix('PCA1', 'PCA2') plots << ScatterPlot.plot(components, clusterer.clusterLabel, symbols[0..

Slide 127

Slide 127 text

Whiskey – XMeans clustering with 2-4D PCA plot import … def rows = Table.read().csv('whiskey.csv') def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"] def data = rows.as().doubleMatrix(*cols) def pca = new PCA(data) def dims = 4 // can be 2, 3 or 4 pca.projection = dims def projected = pca.project(data) def adj = [1, 1, 1, 5] def kmax = 10 def clusterer = new XMeans(data, kmax) def labels = clusterer.clusterLabel.collect { "Cluster " + (it + 1) } rows = rows.addColumns( *(0.. DoubleColumn.create("PCA${idx+1}", (0..

Slide 128

Slide 128 text

Whiskey – SOM with Heatmap import … def file = new File(getClass().classLoader.getResource('whiskey.csv').file) def table = Read.csv(file.toPath(), CSV.withFirstRecordAsHeader()) String[] cols = ['Body', 'Sweetness', 'Smoky', 'Medicinal', 'Tobacco', 'Honey', 'Spicy', 'Winey', 'Nutty', 'Malty', 'Fruity', 'Floral'] def data = table.select(cols).toArray() def distilleries = table.column('Distillery').toStringArray() //def (ncols, nrows) = [3, 3] //def (ncols, nrows) = [7, 4] def (ncols, nrows) = [7, 6] //def (ncols, nrows) = [8, 6] def lattice = SOM.lattice(nrows, ncols, data) int epochs = 100 def model = new SOM(lattice, TimeFunction.constant(0.1), Neighborhood.Gaussian(1, data.length * epochs / 8.0)) for (int i = 0; i < epochs; i++) { for (int j : MathEx.permutate(data.length)) { model.update(data[j]) } } …

Slide 129

Slide 129 text

Whiskey – SOM with Heatmap … def groups = data.toList().indices.groupBy { idx -> group(model, data[idx]) } def names = groups.collectEntries { k, v -> [k, distilleries[v].join(', ')] } private group(SOM model, double[] row) { double[][][] neurons = model.neurons() [0.. def (i, j) = pair MathEx.distance(neurons[i][j], row) } } names.toSorted{ e1, e2 -> e1.key[0] <=> e2.key[0] ?: e1.key[1] <=> e2.key[1] }.each { k, v -> println "Cluster ${k[0]},${k[1]}: $v" } def tooltip = { i, j -> names[[i, j]] ?: '' } as Hexmap.Tooltip new Hexmap(model.umatrix(), Palette.jet(256), tooltip) .canvas().tap { setAxisLabels('', '') }.window()

Slide 130

Slide 130 text

Whiskey – SOM with Heatmap Cluster 0,0: Auchentoshan, Glengoyne Cluster 0,1: AnCnoc Cluster 0,2: Inchgower, Tomintoul Cluster 0,3: ArranIsleOf, Speyburn Cluster 0,4: GlenElgin, Linkwood, RoyalBrackla Cluster 0,5: Glenfarclas, Glenlivet, Glenturret Cluster 0,6: Aberlour, Strathisla Cluster 1,0: Bladnoch, Bunnahabhain, Glenfiddich Cluster 1,1: Cardhu, Glenallachie Cluster 1,2: Glenkinchie, Glenlossie, Loch Lomond Cluster 1,3: Benriach, Dalwhinnie Cluster 1,4: GlenOrd Cluster 1,5: Craigallechie, GlenMoray, Longmorn Cluster 1,6: Edradour, Knochando Cluster 2,0: GlenSpey, Miltonduff Cluster 2,1: Tamnavulin Cluster 2,2: Mannochmore Cluster 2,3: Aultmore, GlenGrant, Speyside, Tamdhu, Tobermory Cluster 2,4: Deanston, Scapa Cluster 2,5: Benromach Cluster 2,6: Belvenie, BenNevis, Benrinnes Cluster 3,0: GlenKeith, Tullibardine Cluster 3,1: Balblair, Glenmorangie, Strathmill Cluster 3,3: Tomore Cluster 3,4: Ardmore, OldFettercairn Cluster 3,5: Aberfeldy, BlairAthol Cluster 3,6: Glendullan, RoyalLochnagar Cluster 4,0: Craigganmore, Dufftown Cluster 4,1: OldPulteney Cluster 4,2: Oban Cluster 4,3: Bruichladdich, GlenScotia, Isle of Jura, Springbank Cluster 4,4: GlenDeveronMacduff, Tomatin Cluster 4,5: Auchroisk, Glenrothes Cluster 4,6: Balmenach, Glendronach, Macallan Cluster 5,0: GlenGarioch, Teaninich Cluster 5,1: Caol Ila, Clynelish, Talisker Cluster 5,2: Ardbeg, Lagavulin, Laphroig Cluster 5,4: Bowmore, Highland Park Cluster 5,5: Dalmore Cluster 5,6: Dailuaine, Mortlach

Slide 131

Slide 131 text

Whiskey – SOM with Heatmap Cluster 0,0: Auchentoshan, Glengoyne Cluster 0,1: AnCnoc Cluster 0,2: Inchgower, Tomintoul Cluster 0,3: ArranIsleOf, Speyburn Cluster 0,4: GlenElgin, Linkwood, RoyalBrackla Cluster 0,5: Glenfarclas, Glenlivet, Glenturret Cluster 0,6: Aberlour, Strathisla Cluster 1,0: Bladnoch, Bunnahabhain, Glenfiddich Cluster 1,1: Cardhu, Glenallachie Cluster 1,2: Glenkinchie, Glenlossie, Loch Lomond Cluster 1,3: Benriach, Dalwhinnie Cluster 1,4: GlenOrd Cluster 1,5: Craigallechie, GlenMoray, Longmorn Cluster 1,6: Edradour, Knochando Cluster 2,0: GlenSpey, Miltonduff Cluster 2,1: Tamnavulin Cluster 2,2: Mannochmore Cluster 2,3: Aultmore, GlenGrant, Speyside, Tamdhu, Tobermory Cluster 2,4: Deanston, Scapa Cluster 2,5: Benromach Cluster 2,6: Belvenie, BenNevis, Benrinnes Cluster 3,0: GlenKeith, Tullibardine Cluster 3,1: Balblair, Glenmorangie, Strathmill Cluster 3,3: Tomore Cluster 3,4: Ardmore, OldFettercairn Cluster 3,5: Aberfeldy, BlairAthol Cluster 3,6: Glendullan, RoyalLochnagar Cluster 4,0: Craigganmore, Dufftown Cluster 4,1: OldPulteney Cluster 4,2: Oban Cluster 4,3: Bruichladdich, GlenScotia, Isle of Jura, Springbank Cluster 4,4: GlenDeveronMacduff, Tomatin Cluster 4,5: Auchroisk, Glenrothes Cluster 4,6: Balmenach, Glendronach, Macallan Cluster 5,0: GlenGarioch, Teaninich Cluster 5,1: Caol Ila, Clynelish, Talisker Cluster 5,2: Ardbeg, Lagavulin, Laphroig Cluster 5,4: Bowmore, Highland Park Cluster 5,5: Dalmore Cluster 5,6: Dailuaine, Mortlach

Slide 132

Slide 132 text

Whiskey – Hierarchical clustering with Dendrogram import … def file = new File(getClass().classLoader.getResource('whiskey.csv').file) def table = Read.csv(file.toPath(), CSV.withFirstRecordAsHeader()) String[] cols = ['Body', 'Sweetness', 'Smoky', 'Medicinal', 'Tobacco', 'Honey', 'Spicy', 'Winey', 'Nutty', 'Malty', 'Fruity', 'Floral'] def data = table.select(cols).toArray() def distilleries = table.column('Distillery').toStringArray() def ninetyDeg = 1.57 // radians def FOREST_GREEN = new Color(0X808000) def clusters = HierarchicalClustering.fit(CompleteLinkage.of(data)) //println clusters.tree //println clusters.height def partitions = clusters.partition(4) // little trick to work out cluster colors def colorMap = new LinkedHashSet(partitions.toList()).toList().reverse().indexed().collectEntries { k, v -> [v, Palette.COLORS[k]] } Font font = new Font("BitStream Vera Sans", Font.PLAIN, 12) …

Slide 133

Slide 133 text

Whiskey – Hierarchical clustering with Dendrogram … def dendrogram = new Dendrogram(clusters.tree, clusters.height, FOREST_GREEN).canvas().tap { title = 'Whiskey Dendrogram' setAxisLabels('Distilleries', 'Similarity') def lb = lowerBounds setBound([lb[0] - 1, lb[1] - 20] as double[], upperBounds) distilleries.eachWithIndex { String label, int i -> add(new Label(label, [i, -1] as double[], 0, 0, ninetyDeg, font, colorMap[partitions[i]])) } }.panel() def pca = PCA.fit(data) pca.projection = 2 def projected = pca.project(data) char mark = '#' def scatter = ScatterPlot.of(projected, partitions, mark).canvas().tap { title = 'Clustered by dendrogram partitions' setAxisLabels('PCA1', 'PCA2') }.panel() new PlotGrid(dendrogram, scatter).window()

Slide 134

Slide 134 text

Whiskey – Hierarchical clustering with Dendrogram … def dendrogram = new Dendrogram(clusters.tree, clusters.height, FOREST_GREEN).canvas().tap { title = 'Whiskey Dendrogram' setAxisLabels('Distilleries', 'Similarity') def lb = lowerBounds setBound([lb[0] - 1, lb[1] - 20] as double[], upperBounds) distilleries.eachWithIndex { String label, int i -> add(new Label(label, [i, -1] as double[], 0, 0, ninetyDeg, font, colorMap[partitions[i]])) } }.panel() def pca = PCA.fit(data) pca.projection = 2 def projected = pca.project(data) char mark = '#' def scatter = ScatterPlot.of(projected, partitions, mark).canvas().tap { title = 'Clustered by dendrogram partitions' setAxisLabels('PCA1', 'PCA2') }.panel() new PlotGrid(dendrogram, scatter).window()

Slide 135

Slide 135 text

Whiskey – Exploring Weka clustering algorithms
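A minimal scripted sketch of such an exploration is below, assuming the standard Weka clusterer and evaluation classes (SimpleKMeans, EM, ClusterEvaluation) and — unlike the earlier SimpleKMeans example — dropping the Distillery column so only the numeric flavour features are clustered.

import weka.clusterers.ClusterEvaluation
import weka.clusterers.EM
import weka.clusterers.SimpleKMeans
import weka.core.Instances
import weka.core.converters.CSVLoader

def loader = new CSVLoader(file: new File('whiskey.csv'))
def instances = loader.dataSet
instances.deleteAttributeAt(0) // remove RowID (as in the earlier example)
instances.deleteAttributeAt(0) // drop Distillery so only flavour features remain

[new SimpleKMeans(numClusters: 5), new EM(numClusters: 5)].each { clusterer ->
    clusterer.buildClusterer(new Instances(instances)) // copy so each run starts fresh
    def eval = new ClusterEvaluation(clusterer: clusterer)
    eval.evaluateClusterer(new Instances(instances))
    println "${clusterer.class.simpleName}: ${eval.numClusters} clusters"
}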

Slide 136

Slide 136 text

Whiskey flavors – scaling clustering import … def rows = Table.read().csv('whiskey.csv') String[] features = [ 'Distillery', 'Body', 'Sweetness', 'Smoky', 'Medicinal', 'Tobacco', 'Honey', 'Spicy', 'Winey', 'Nutty', 'Malty', 'Fruity', 'Floral' ] def data = rows.as().doubleMatrix(features) def cfg = new IgniteConfiguration(…) Ignition.start(cfg).withCloseable { ignite -> println ">>> Ignite grid started for data: ${data.size()} rows X ${data[0].size()} cols" def trainer = new KMeansTrainer().withDistance(new EuclideanDistance()).withAmountOfClusters(5) def dataCache = new SandboxMLCache(ignite).fillCacheWith(data) def vectorizer = new DoubleArrayVectorizer().labeled(Vectorizer.LabelCoordinate.FIRST) def mdl = trainer.fit(ignite, dataCache, vectorizer) println ">>> KMeans centroids" println features[1..-1].join(', ‘) mdl.centers.each { c -> println c.all().collect{ sprintf '%.4f', it.get() }.join(', ') } dataCache.destroy() } [17:18:07] Ignite node started OK (id=ea92fc20) >>> Ignite grid started for data: 86 rows X 13 cols >>> KMeans centroids Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity, Floral 2.9091, 1.5455, 2.9091, 2.7273, 0.4545, 0.4545, 1.4545, 0.5455, 1.5455, 1.4545, 1.1818, 0.5455 1.8095, 2.4762, 1.5714, 0.4286, 0.1429, 1.5238, 1.7143, 0.8571, 0.7143, 1.6667, 1.6190, 2.0476 2.5217, 2.3913, 1.4783, 0.0435, 0.0435, 1.6087, 1.4783, 1.9565, 2.0435, 2.0435, 1.9565, 1.3913 1.3704, 2.3704, 1.0000, 0.2593, 0.0370, 0.7778, 0.9259, 0.4074, 1.5185, 1.7407, 2.0000, 2.1111 3.2500, 2.2500, 1.5000, 0.0000, 0.0000, 3.0000, 2.0000, 1.0000, 1.5000, 2.5000, 2.2500, 2.0000 [17:18:07] Ignite node stopped OK [uptime=00:00:00.368]

Slide 137

Slide 137 text

Whiskey flavors – scaling clustering import … def spark = builder().config('spark.master', 'local[8]').appName('Whiskey').orCreate def file = '/path/to/whiskey.csv' int k = 5 Dataset rows = spark.read().format('com.databricks.spark.csv') .options('header': 'true', 'inferSchema': 'true').load(file) String[] colNames = rows.columns().toList().minus(['RowID', 'Distillery']) def assembler = new VectorAssembler(inputCols: colNames, outputCol: 'features') Dataset dataset = assembler.transform(rows) def clusterer = new KMeans(k: k, seed: 1L) def model = clusterer.fit(dataset) println 'Cluster centers:' model.clusterCenters().each{ println it.values().collect{ sprintf '%.2f', it }.join(', ') } spark.stop() Cluster centers: 1.73, 2.35, 1.58, 0.81, 0.19, 1.15, 1.42, 0.81, 1.23, 1.77, 1.23, 1.31 2.00, 1.00, 3.00, 0.00, 0.00, 0.00, 3.00, 1.00, 0.00, 2.00, 2.00, 2.00 2.86, 2.38, 1.52, 0.05, 0.00, 1.95, 1.76, 2.05, 1.81, 2.05, 2.19, 1.71 1.53, 2.38, 1.06, 0.16, 0.03, 1.09, 1.00, 0.50, 1.53, 1.75, 2.13, 2.28 3.67, 1.50, 3.67, 3.33, 0.67, 0.17, 1.67, 0.50, 1.17, 1.33, 1.17, 0.17

Slide 138

Slide 138 text

Whiskey flavors – scaling clustering class Point { double[] pts } class TaggedPoint extends Point { int cluster } @Canonical(includeSuperProperties=true) class TaggedPointCounter extends TaggedPoint { long count TaggedPointCounter plus(TaggedPointCounter other) { new TaggedPointCounter((0..

Slide 139

Slide 139 text

Whiskey flavors – scaling clustering class SelectNearestCentroid implements FunctionDescriptor.ExtendedSerializableFunction { Iterable centroids @Override void open(ExecutionContext executionCtx) { centroids = executionCtx.getBroadcast("centroids") } @Override TaggedPointCounter apply(Point point) { def minDistance = Double.POSITIVE_INFINITY def nearestCentroidId = -1 for (centroid in centroids) { def distance = sqrt((0..

Slide 140

Slide 140 text

Whiskey flavors – scaling clustering def url = WhiskeyRheem.classLoader.getResource('whiskey.csv').file def rheemContext = new RheemContext(configuration) .withPlugin(Java.basicPlugin()) .withPlugin(Spark.basicPlugin()) def planBuilder = new JavaPlanBuilder(rheemContext) .withJobName("KMeans ($url, k=$k, iterations=$iterations)") def points = new File(url).readLines()[1..-1].collect{ new Point(pts: it.split(",")[2..-1]*.toDouble()) } def pointsBuilder = planBuilder.loadCollection(points) def random = new Random() double[][] initPts = (1..k).collect{ (0..

Slide 141

Slide 141 text

Whiskey flavors – scaling clustering def url = WhiskeyRheem.classLoader.getResource('whiskey.csv').file def rheemContext = new RheemContext(configuration) .withPlugin(Java.basicPlugin()) .withPlugin(Spark.basicPlugin()) def planBuilder = new JavaPlanBuilder(rheemContext) .withJobName("KMeans ($url, k=$k, iterations=$iterations)") def points = new File(url).readLines()[1..-1].collect{ new Point(pts: it.split(",")[2..-1]*.toDouble()) } def pointsBuilder = planBuilder.loadCollection(points) def random = new Random() double[][] initPts = (1..k).collect{ (0..

Slide 142

Slide 142 text

Clustering lab – repeat some of the examples with beer https://rstudio-pubs-static.s3.amazonaws.com/560654_2b44eef198f24a0bba45e002ae86d237.html

Slide 143

Slide 143 text

Clustering lab – repeat some of the examples with beer

Slide 144

Slide 144 text

Classification Overview Classification: • Predicting the class of some data Algorithms: • Logistic Regression • Naïve Bayes • Stochastic Gradient Descent • K-Nearest Neighbours • Decision Tree • Random Forest • Support Vector Machine Aspects: • Over/underfitting • Ensemble • Confusion Applications: • Image recognition • Speech recognition • Medical diagnosis
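As a concrete illustration of the "Confusion" aspect listed above, a minimal sketch in plain Groovy (no libraries; the actual and predicted labels are made-up, not output from any model in these slides):

def species = ['setosa', 'versicolor', 'virginica']
def actual    = [0, 0, 1, 1, 1, 2, 2, 2, 2, 1] // true class index per sample
def predicted = [0, 0, 1, 2, 1, 2, 2, 1, 2, 1] // classifier's prediction per sample

def confusion = new int[species.size()][species.size()]
actual.indices.each { i -> confusion[actual[i]][predicted[i]]++ }

println 'Confusion matrix (rows = actual, cols = predicted):'
confusion.eachWithIndex { row, i -> println "${species[i].padRight(12)} ${row.toList()}" }

def correct = actual.indices.count { actual[it] == predicted[it] }
printf 'Accuracy: %.2f%n', correct / actual.size()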

Slide 145

Slide 145 text

Case Study: classification of Iris flowers Introduced by British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis 150 samples, 50 each of three species of Iris: • setosa • versicolor • virginica Four features measured for each sample: • sepal length • sepal width • petal length • petal width https://en.wikipedia.org/wiki/Iris_flower_data_set https://archive.ics.uci.edu/ml/datasets/Iris setosa versicolor virginica sepal petal
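A minimal plain-Groovy sketch to peek at the data before modelling; it assumes an iris_data.csv with a header row, four numeric feature columns and the species label in the last column, matching the file used in the Weka examples that follow:

def lines = new File('iris_data.csv').readLines()*.split(',')
def header = lines.head()
def samples = lines.tail()
println "Samples: ${samples.size()}, features: ${header.size() - 1}"
samples.countBy { it[-1] }.each { name, count -> println "$name: $count" } // expect 50 of each species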

Slide 146

Slide 146 text

Iris flower data – explore with Weka

Slide 147

Slide 147 text

Iris flower data – explore with Weka (cont’d)

Slide 148

Slide 148 text

Iris flower data – explore with Weka (cont’d)

Slide 149

Slide 149 text

Iris flower data – explore with Weka (cont’d)

Slide 150

Slide 150 text

Iris flower data – explore with Weka (cont’d)

Slide 151

Slide 151 text

Iris flower data – explore with Weka (cont’d)

Slide 152

Slide 152 text

Iris flower data – Weka Naïve Bayes def file = getClass().classLoader.getResource('iris_data.csv').file as File def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'] def loader = new CSVLoader(file: file) def model = new NaiveBayes() def allInstances = loader.dataSet allInstances.classIndex = 4 model.buildClassifier(allInstances) double[] actual = allInstances.collect{ it.value(4) } double[] predicted = allInstances.collect{ model.classifyInstance(it) } double[] petalW = allInstances.collect{ it.value(2) } double[] petalL = allInstances.collect{ it.value(3) } def indices = actual.indices

Slide 153

Slide 153 text

Iris flower data – Weka Naïve Bayes def file = getClass().classLoader.getResource('iris_data.csv').file as File def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'] def loader = new CSVLoader(file: file) def model = new NaiveBayes() def allInstances = loader.dataSet allInstances.classIndex = 4 model.buildClassifier(allInstances) double[] actual = allInstances.collect{ it.value(4) } double[] predicted = allInstances.collect{ model.classifyInstance(it) } double[] petalW = allInstances.collect{ it.value(0) } double[] petalL = allInstances.collect{ it.value(1) } def indices = actual.indices def chart = new XYChartBuilder().width(900).height(450). title("Species").xAxisTitle("Petal length").yAxisTitle("Petal width").build() species.eachWithIndex{ String name, int i -> def groups = indices.findAll{ predicted[it] == i }.groupBy{ actual[it] == i } Collection found = groups[true] ?: [] Collection errors = groups[false] ?: [] println "$name: ${found.size()} correct, ${errors.size()} incorrect" chart.addSeries("$name correct", petalW[found], petalL[found]).with { XYSeriesRenderStyle = Scatter } if (errors) { chart.addSeries("$name incorrect", petalW[errors], petalL[errors]).with { XYSeriesRenderStyle = Scatter } } } new SwingWrapper(chart).displayChart()

Slide 154

Slide 154 text

def chart = new XYChartBuilder().width(900).height(450). title("Species").xAxisTitle("Petal length").yAxisTitle("Petal width").build() species.eachWithIndex{ String name, int i -> def groups = indices.findAll{ predicted[it] == i }.groupBy{ actual[it] == i } Collection found = groups[true] ?: [] Collection errors = groups[false] ?: [] println "$name: ${found.size()} correct, ${errors.size()} incorrect" chart.addSeries("$name correct", petalW[found], petalL[found]).with { XYSeriesRenderStyle = Scatter } if (errors) { chart.addSeries("$name incorrect", petalW[errors], petalL[errors]).with { XYSeriesRenderStyle = Scatter } } } new SwingWrapper(chart).displayChart() Iris flower data – Weka Naïve Bayes def file = getClass().classLoader.getResource('iris_data.csv').file as File def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'] def loader = new CSVLoader(file: file) def model = new NaiveBayes() def allInstances = loader.dataSet allInstances.classIndex = 4 model.buildClassifier(allInstances) double[] actual = allInstances.collect{ it.value(4) } double[] predicted = allInstances.collect{ model.classifyInstance(it) } double[] petalW = allInstances.collect{ it.value(0) } double[] petalL = allInstances.collect{ it.value(1) } def indices = actual.indices Iris-setosa: 50 correct, 0 incorrect Iris-versicolor: 48 correct, 4 incorrect Iris-virginica: 46 correct, 2 incorrect

Slide 155

Slide 155 text

Iris flower data – Weka Logistic Regression def file = getClass().classLoader.getResource('iris_data.csv').file as File def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'] def loader = new CSVLoader(file: file) def model = new SimpleLogistic() def allInstances = loader.dataSet allInstances.classIndex = 4 model.buildClassifier(allInstances) double[] actual = allInstances.collect{ it.value(4) } double[] predicted = allInstances.collect{ model.classifyInstance(it) } double[] petalW = allInstances.collect{ it.value(2) } double[] petalL = allInstances.collect{ it.value(3) } def indices = actual.indices

Slide 156

Slide 156 text

Iris flower data – Weka Logistic Regression def file = getClass().classLoader.getResource('iris_data.csv').file as File def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'] def loader = new CSVLoader(file: file) def model = new SimpleLogistic() def allInstances = loader.dataSet allInstances.classIndex = 4 model.buildClassifier(allInstances) double[] actual = allInstances.collect{ it.value(4) } double[] predicted = allInstances.collect{ model.classifyInstance(it) } double[] petalW = allInstances.collect{ it.value(0) } double[] petalL = allInstances.collect{ it.value(1) } def indices = actual.indices def chart = new XYChartBuilder().width(900).height(450). title("Species").xAxisTitle("Petal length").yAxisTitle("Petal width").build() species.eachWithIndex{ String name, int i -> def groups = indices.findAll{ predicted[it] == i }.groupBy{ actual[it] == i } Collection found = groups[true] ?: [] Collection errors = groups[false] ?: [] println "$name: ${found.size()} correct, ${errors.size()} incorrect" chart.addSeries("$name correct", petalW[found], petalL[found]).with { XYSeriesRenderStyle = Scatter } if (errors) { chart.addSeries("$name incorrect", petalW[errors], petalL[errors]).with { XYSeriesRenderStyle = Scatter } } } new SwingWrapper(chart).displayChart() Iris-setosa: 50 correct, 0 incorrect Iris-versicolor: 49 correct, 1 incorrect Iris-virginica: 49 correct, 1 incorrect

Slide 157

Slide 157 text

Iris flower data – Weka J48 Decision Tree def file = getClass().classLoader.getResource('iris_data.csv').file as File def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'] def loader = new CSVLoader(file: file) def model = new J48() def allInstances = loader.dataSet allInstances.classIndex = 4 model.buildClassifier(allInstances) double[] actual = allInstances.collect{ it.value(4) } double[] predicted = allInstances.collect{ model.classifyInstance(it) } double[] petalW = allInstances.collect{ it.value(0) } double[] petalL = allInstances.collect{ it.value(1) } def indices = actual.indices def chart = new XYChartBuilder().width(900).height(450). title("Species").xAxisTitle("Petal length").yAxisTitle("Petal width").build() species.eachWithIndex{ String name, int i -> def groups = indices.findAll{ predicted[it] == i }.groupBy{ actual[it] == i } Collection found = groups[true] ?: [] Collection errors = groups[false] ?: [] println "$name: ${found.size()} correct, ${errors.size()} incorrect" chart.addSeries("$name correct", petalW[found], petalL[found]).with { XYSeriesRenderStyle = Scatter } if (errors) { chart.addSeries("$name incorrect", petalW[errors], petalL[errors]).with { XYSeriesRenderStyle = Scatter } } } new SwingWrapper(chart).displayChart() Petal width <= 0.6: Iris-setosa (50.0) Petal width > 0.6 | Petal width <= 1.7 | | Petal length <= 4.9: Iris-versicolor (48.0/1.0) | | Petal length > 4.9 | | | Petal width <= 1.5: Iris-virginica (3.0) | | | Petal width > 1.5: Iris-versicolor (3.0/1.0) | Petal width > 1.7: Iris-virginica (46.0/1.0) Number of Leaves : 5 Size of the tree : 9 Iris-setosa: 50 correct, 0 incorrect Iris-versicolor: 49 correct, 2 incorrect Iris-virginica: 48 correct, 1 incorrect

Slide 158

Slide 158 text

Iris flower data – wekaDeeplearning4j WekaPackageManager.loadPackages(true) def file = getClass().classLoader.getResource('iris_data.csv').file as File def loader = new CSVLoader(file: file) def data = loader.dataSet data.classIndex = 4 def options = Utils.splitOptions("-S 1 -numEpochs 10 -layer \"weka.dl4j.layers.OutputLayer -activation weka.dl4j.activations.ActivationSoftmax \ -lossFn weka.dl4j.lossfunctions.LossMCXENT\"") AbstractClassifier myClassifier = Utils.forName(AbstractClassifier, "weka.classifiers.functions.Dl4jMlpClassifier", options) // Stratify and split Random rand = new Random(0) Instances randData = new Instances(data) randData.randomize(rand) randData.stratify(3) Instances train = randData.trainCV(3, 0) Instances test = randData.testCV(3, 0) // Build the classifier on the training data myClassifier.buildClassifier(train) // Evaluate the model on test data Evaluation eval = new Evaluation(test) eval.evaluateModel(myClassifier, test) println eval.toSummaryString() println eval.toMatrixString() Image: https://www.neuraldesigner.com/learning/examples/iris-flowers-classification

Slide 159

Slide 159 text

Iris flower data – wekaDeeplearning4j WekaPackageManager.loadPackages(true) def file = getClass().classLoader.getResource('iris_data.csv').file as File def loader = new CSVLoader(file: file) def data = loader.dataSet data.classIndex = 4 def options = Utils.splitOptions("-S 1 -numEpochs 10 -layer \"weka.dl4j.layers.OutputLayer -activation weka.dl4j.activations.ActivationSoftmax \ -lossFn weka.dl4j.lossfunctions.LossMCXENT\"") AbstractClassifier myClassifier = Utils.forName(AbstractClassifier, "weka.classifiers.functions.Dl4jMlpClassifier", options) // Stratify and split Random rand = new Random(0) Instances randData = new Instances(data) randData.randomize(rand) randData.stratify(3) Instances train = randData.trainCV(3, 0) Instances test = randData.testCV(3, 0) // Build the classifier on the training data myClassifier.buildClassifier(train) // Evaluate the model on test data Evaluation eval = new Evaluation(test) eval.evaluateModel(myClassifier, test) println eval.toSummaryString() println eval.toMatrixString() [main] INFO org.deeplearning4j.nn.graph.ComputationGraph - Starting ComputationGraph with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE] Training Dl4jMlpClassifier...: [] ETA: 00:00:00[INFO ] 00:03:31.035 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [1/10] took 00:00:00.670 Training Dl4jMlpClassifier...: [====== ] ETA: 00:00:06[INFO ] 00:03:31.152 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [2/10] took 00:00:00.113 Training Dl4jMlpClassifier...: [============ ] ETA: 00:00:03[INFO ] 00:03:31.244 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [3/10] took 00:00:00.090 Training Dl4jMlpClassifier...: [================== ] ETA: 00:00:02[INFO ] 00:03:31.325 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [4/10] took 00:00:00.079 Training Dl4jMlpClassifier...: [======================== ] ETA: 00:00:01[INFO ] 00:03:31.470 [main] weka.dl4j.listener.EpochListener - Epoch [5/10] Train Set: Loss: 0.510342 [INFO ] 00:03:31.470 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [5/10] took 00:00:00.144 Training Dl4jMlpClassifier...: [============================== ] ETA: 00:00:01[INFO ] 00:03:31.546 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [6/10] took 00:00:00.073 Training Dl4jMlpClassifier...: [==================================== ] ETA: 00:00:00[INFO ] 00:03:31.611 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [7/10] took 00:00:00.063 Training Dl4jMlpClassifier...: [========================================== ] ETA: 00:00:00[INFO ] 00:03:31.714 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [8/10] took 00:00:00.101 Training Dl4jMlpClassifier...: [================================================ ] ETA: 00:00:00[INFO ] 00:03:31.790 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [9/10] took 00:00:00.074 Training Dl4jMlpClassifier...: [====================================================== ] ETA: 00:00:00[INFO ] 00:03:31.882 [main] weka.dl4j.listener.EpochListener - Epoch [10/10] Train Set: Loss: 0.286469 …

Slide 160

Slide 160 text

Iris flower data – wekaDeeplearning4j WekaPackageManager.loadPackages(true) def file = getClass().classLoader.getResource('iris_data.csv').file as File def loader = new CSVLoader(file: file) def data = loader.dataSet data.classIndex = 4 def options = Utils.splitOptions("-S 1 -numEpochs 10 -layer \"weka.dl4j.layers.OutputLayer -activation weka.dl4j.activations.ActivationSoftmax \ -lossFn weka.dl4j.lossfunctions.LossMCXENT\"") AbstractClassifier myClassifier = Utils.forName(AbstractClassifier, "weka.classifiers.functions.Dl4jMlpClassifier", options) // Stratify and split Random rand = new Random(0) Instances randData = new Instances(data) randData.randomize(rand) randData.stratify(3) Instances train = randData.trainCV(3, 0) Instances test = randData.testCV(3, 0) // Build the classifier on the training data myClassifier.buildClassifier(train) // Evaluate the model on test data Evaluation eval = new Evaluation(test) eval.evaluateModel(myClassifier, test) println eval.toSummaryString() println eval.toMatrixString() … [INFO ] 00:03:31.883 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [10/10] took 00:00:00.091 Training Dl4jMlpClassifier...: [============================================================] ETA: 00:00:00 Done! Correctly Classified Instances 40 80 % Incorrectly Classified Instances 10 20 % Kappa statistic 0.701 Mean absolute error 0.2542 Root mean squared error 0.3188 Relative absolute error 57.2154 % Root relative squared error 67.6336 % Total Number of Instances 50 === Confusion Matrix === a b c <-- classified as 17 0 0 | a = Iris-setosa 0 9 8 | b = Iris-versicolor 0 2 14 | c = Iris-virginica BUILD SUCCESSFUL in 22s

Slide 161

Slide 161 text

Classify language Apache OpenNLP parts: • sentence detector, tokenizer, name finder, document categorizer, part-of-speech tagger, chunker, parser, coreference resolution import opennlp.tools.langdetect.* def base = 'http://apache.forsale.plus/opennlp/models' def url = "$base/langdetect/1.8.3/langdetect-183.bin" def model = new LanguageDetectorModel(new URL(url)) def detector = new LanguageDetectorME(model) [ 'spa': 'Bienvenido a Madrid', 'fra': 'Bienvenue à Paris', 'dan': 'Velkommen til København' ].each { k, v -> assert detector.predictLanguage(v).lang == k } Supported language detection training models: • MaxEnt • Perceptron • Naïve Bayes

Slide 162

Slide 162 text

Classify language – entities String[] sentences = [ "A commit by Daniel Sun on December 6, 2020 improved Groovy 4's language integrated query.", "A commit by Daniel on Sun., December 6, 2020 improved Groovy 4's language integrated query.", 'The Groovy in Action book by Dierk Koenig et. al. is a bargain at $50, or indeed any price.', 'The conference wrapped up yesterday at 5:30 p.m. in Copenhagen, Denmark.', 'I saw Ms. May Smith waving to June Jones.', 'The parcel was passed from May to June.' ] // use a helper to cache models def helper = new ResourceHelper('http://opennlp.sourceforge.net/models-1.5') def modelNames = ['person', 'money', 'date', 'time', 'location'] def finders = modelNames.collect{ new NameFinderME(new TokenNameFinderModel(helper.load("en-ner-$it"))) } def tokenizer = SimpleTokenizer.INSTANCE …

Slide 163

Slide 163 text

Classify language – entities sentences.each { sentence -> String[] tokens = tokenizer.tokenize(sentence) Span[] tokenSpans = tokenizer.tokenizePos(sentence) def entityText = [:] def entityPos = [:] finders.indices.each {fi -> // could be made smarter by looking at probabilities and overlapping spans Span[] spans = finders[fi].find(tokens) spans.each{span -> def se = span.start..

Slide 164

Slide 164 text

Classify language – entities sentences.each { sentence -> String[] tokens = tokenizer.tokenize(sentence) Span[] tokenSpans = tokenizer.tokenizePos(sentence) def entityText = [:] def entityPos = [:] finders.indices.each {fi -> // could be made smarter by looking at probabilities and overlapping spans Span[] spans = finders[fi].find(tokens) spans.each{span -> def se = span.start..

Slide 165

Slide 165 text

Classify language – parts of speech def sentences = [ 'Paul has two sisters, Maree and Christine.', 'His bark was much worse than his bite', 'Turn on the lights to the main bedroom', "Light 'em all up", 'Make it dark downstairs' ] def tokenizer = SimpleTokenizer.INSTANCE sentences.each { String[] tokens = tokenizer.tokenize(it) def model = new POSModel(helper.load('en-pos-maxent')) def posTagger = new POSTaggerME(model) String[] tags = posTagger.tag(tokens) println tokens.indices.collect{tags[it] == tokens[it] ? tags[it] : "${tags[it]}(${tokens[it]})" }.join(' ') } NNP(Paul) VBZ(has) CD(two) NNS(sisters) , NNP(Maree) CC(and) NNP(Christine) . PRP$(His) NN(bark) VBD(was) RB(much) JJR(worse) IN(than) PRP$(his) NN(bite) VB(Turn) IN(on) DT(the) NNS(lights) TO(to) DT(the) JJ(main) NN(bedroom) NN(Light) POS(') NN(em) DT(all) IN(up) VB(Make) PRP(it) JJ(dark) NN(downstairs)

Slide 166

Slide 166 text

Classify language – sentence detection def helper = new ResourceHelper('http://opennlp.sourceforge.net/models-1.5') def model = new SentenceModel(helper.load('en-sent')) def detector = new SentenceDetectorME(model) def sentences = detector.sentDetect(text) assert text.count('.') == 28 assert sentences.size() == 4 println sentences.join('\n\n')

Slide 167

Slide 167 text

Classify language – sentence detection def helper = new ResourceHelper('http://opennlp.sourceforge.net/models-1.5') def model = new SentenceModel(helper.load('en-sent')) def detector = new SentenceDetectorME(model) def sentences = detector.sentDetect(text) assert text.count('.') == 28 assert sentences.size() == 4 println sentences.join('\n\n') The most referenced scientific paper of all time is "Protein measurement with the Folin phenol reagent" by Lowry, O. H., Rosebrough, N. J., Farr, A. L. & Randall, R. J. and was published in the J. BioChem. in 1951. It describes a method for measuring the amount of protein (even as small as 0.2 γ, were γ is the specific weight) in solutions and has been cited over 300,000 times and can be found here: https://www.jbc.org/content/193/1/265.full.pdf. Dr. Lowry completed two doctoral degrees under an M.D.-Ph.D. program from the University of Chicago before moving to Harvard under A. Baird Hastings. He was also the H.O.D of Pharmacology at Washington University in St. Louis for 29 years. The most referenced scientific paper of all time is "Protein measurement with the Folin phenol reagent" by Lowry, O. H., Rosebrough, N. J., Farr, A. L. & Randall, R. J. and was published in the J. BioChem. in 1951. It describes a method for measuring the amount of protein (even as small as 0.2 ?, were ? is the specific weight) in solutions and has been cited over 300,000 times and can be found here: https://www.jbc.org/content/193/1/265.full.pdf. Dr. Lowry completed two doctoral degrees under an M.D.-Ph.D. program from the University of Chicago before moving to Harvard under A. Baird Hastings. He was also the H.O.D of Pharmacology at Washington University in St. Louis for 29 years.

Slide 168

Slide 168 text

Classify language Apache NLPCraft (incubating) start(LightSwitchModel) def cli = new NCTestClientBuilder().newBuilder().build() cli.open("nlpcraft.lightswitch.ex.java") println cli.ask('Turn on the lights in the master bedroom') println cli.ask("Light 'em all up") println cli.ask('Make it dark downstairs') // expecting no match if (cli) { cli.close() } stop() Lights are [on] in [master bedroom]. Lights are [on] in [entire house]. No matching intent found.

Slide 169

Slide 169 text

Classify language Apache NLPCraft (incubating) @NCIntentRef("ls") @NCIntentSample([ "Turn the lights off in the entire house.", "Switch on the illumination in the master bedroom closet.", "Get the lights on.", "Lights up in the kitchen.", "Please, put the light out in the upstairs bedroom.", "Set the lights on in the entire house.", … "Kill off all the lights now!", "No lights in the bedroom, please.", "Light up the garage, please!", "Kill the illumination now!" ]) NCResult onMatch( @NCIntentTerm("act") NCToken actTok, @NCIntentTerm("loc") List locToks) { String status = actTok.id == "ls:on" ? "on" : "off" String locations = locToks ? locToks.collect{ it.meta("nlpcraft:nlp:origtext") }.join(", ") : "entire house" // Add HomeKit, Arduino or other integration here. // By default - just return a descriptive action string. NCResult.text("Lights are [$status] in [${locations.toLowerCase()}].") } Lights are [on] in [master bedroom]. Lights are [on] in [entire house]. No matching intent found.

Slide 170

Slide 170 text

Classification with Neural networks For >1 layers (deep) model caters for non- linear relationships See: https://towardsdatascience.com/applied-deep-learning-part-1-artificial-neural-networks-d7834f67a4f6
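A minimal sketch (plain Groovy, made-up weights and inputs) of the forward pass through one hidden layer; the sigmoid applied after each weighted sum is the non-linearity that lets models with more than one layer capture non-linear relationships:

def sigmoid = { double x -> 1 / (1 + Math.exp(-x)) }

double[] input = [0.5, 0.2, 0.8] // one sample with three features
double[][] hiddenWeights = [[0.1, -0.4, 0.3], [0.7, 0.2, -0.6]] // two hidden neurons

def hiddenOutputs = hiddenWeights.collect { w ->
    sigmoid((0..<w.length).sum { i -> w[i] * input[i] }) // weighted sum, then non-linear squash
}
println "Hidden layer activations: $hiddenOutputs"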

Slide 171

Slide 171 text

Classification with Neural networks Case Study: MNIST • Handwritten digits • 60,000 training examples • 10,000 test examples • Size-normalized (28x28 pixels) and centered • http://yann.lecun.com/exdb/mnist/
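The examples that follow read these files with a MnistReader helper. As a minimal sketch of what is involved, the IDX label file is simply a 4-byte magic number (2049 for labels), a 4-byte count, then one unsigned byte per label; the path below is a placeholder:

import java.util.zip.GZIPInputStream

new GZIPInputStream(new FileInputStream('/path/to/train-labels-idx1-ubyte.gz')).withStream { stream ->
    def dis = new DataInputStream(stream)
    assert dis.readInt() == 2049 // magic number identifying an IDX label file
    int count = dis.readInt()
    def labels = (1..count).collect { dis.readUnsignedByte() }
    println "Read ${labels.size()} labels, first ten: ${labels.take(10)}"
}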

Slide 172

Slide 172 text

MNIST with Apache Commons Math class MnistNeuralNet { private static int inodes = 784 private static int hnodes = 200 private static int onodes = 10 private static double learning_rate = 0.1 private static RealMatrix inputHidden private static RealMatrix hiddenOutput static initRandom() { inputHidden = initRandom(createRealMatrix(hnodes, inodes), inodes**-0.5 as double) hiddenOutput = initRandom(createRealMatrix(onodes, hnodes), hnodes**-0.5 as double) } static double[] normalize(Integer[] img) { img.collect { it / 255.0d * 0.999d + 0.001d } as double[] } static RealMatrix createTarget(int label) { createRealMatrix(onodes, 1).tap { (0..

Slide 173

Slide 173 text

MNIST with Apache Commons Math … static RealMatrix initRandom(RealMatrix matrix, double desiredStandardDeviation) { scalar(matrix, it -> new Random().nextGaussian() * desiredStandardDeviation) } static void train(RealMatrix inputs, RealMatrix targets) { // forward RealMatrix hiddenInputs = inputHidden * inputs RealMatrix hiddenOutputs = scalarSigmoid(hiddenInputs) RealMatrix finalInputs = hiddenOutput * hiddenOutputs RealMatrix finalOutputs = scalarSigmoid(finalInputs) // back RealMatrix outputErrors = targets.subtract(finalOutputs) RealMatrix t1 = multiplyElements(outputErrors, finalOutputs) RealMatrix t2 = multiplyElements(t1, scalar(finalOutputs, it -> 1.0d - it)) RealMatrix t3 = t2 * hiddenOutputs.transpose() hiddenOutput = hiddenOutput.add(scalar(t3, it -> learning_rate * it)) RealMatrix hiddenErrors = hiddenOutput.transpose() * outputErrors t1 = multiplyElements(hiddenErrors, hiddenOutputs) t2 = multiplyElements(t1, scalar(hiddenOutputs, it -> 1.0d - it)) t3 = t2 * inputs.transpose() inputHidden = inputHidden.add(scalar(t3, it -> learning_rate * it)) } static void save(String filename) { new FileOutputStream(filename).withObjectOutputStream { oos -> oos.writeObject(inputHidden.data) oos.writeObject(hiddenOutput.data) } } } Inspired by: https://golb.hplar.ch/2018/12/simple-neural-network.html MnistNeuralNet class supports training and prediction cont’d

Slide 174

Slide 174 text

MNIST with Apache Commons Math static RealMatrix scalar(RealMatrix matrix, Function function) { int numRows = matrix.rowDimension int numCols = matrix.columnDimension RealMatrix result = createRealMatrix(numRows, numCols) for (r in 0..= 0 && yRot >= 0 && xRot <= 27 && yRot <= 27) { result[y][x] = img[yRot][xRot] } } } return result } … Inspired by: https://golb.hplar.ch/2018/12/simple-neural-network.html Utility methods

Slide 175

Slide 175 text

MNIST with Apache Commons Math … static RealMatrix multiplyElements(RealMatrix matrixA, RealMatrix matrixB) { // elementWise multiplication has same compatibility requirements as addition checkAdditionCompatible(matrixA, matrixB) int numRows = matrixA.rowDimension int numCols = matrixA.columnDimension RealMatrix product = createRealMatrix(numRows, numCols) for (r in 0.. }, visit: { r, c, double v -> 1 / (1 + Math.exp(-v)) }, end : { -> 0d }] as RealMatrixChangingVisitor RealMatrix result = matrix.copy() result.walkInRowOrder(sigmoid) result } Inspired by: https://golb.hplar.ch/2018/12/simple-neural-network.html Utility methods cont’d

Slide 176

Slide 176 text

MNIST with Apache Commons Math int epochs = 10 initRandom() int[] labels = MnistReader.getLabels(Paths.get("/path/to/train-labels-idx1-ubyte.gz")) List images = MnistReader.getImages(Paths.get("/path/to/train-images-idx3-ubyte.gz")) println 'Processing images' def scaledImages = images.collect { image -> normalize(image.flatten() as Integer[]) } as double[][] def scaledImagesRot10 = images.collect { image -> normalize(rotate(image, 10).flatten() as Integer[]) } as double[][] def scaledImagesRot350 = images.collect { image -> normalize(rotate(image, -10).flatten() as Integer[]) } as double[][] epochs.times { e -> println "Running epoch: ${e + 1}" for (i in 0.. testImages = MnistReader.getImages(Paths.get("/path/to/t10k-images-idx3-ubyte.gz")) int correct = 0 for (i in 0..

Slide 177

Slide 177 text

MNIST with DeepLearning4J int rngSeed = 123 int batchSize = 125 int numEpochs = 15 int numInputs = 28 * 28 int hiddenLayerSize = 1000 int numOutputs = 10 def (trainSet, testSet) = [true, false].collect { new MnistDataSetIterator(batchSize, it, rngSeed) } def log = LoggerFactory.getLogger(getClass()) def conf = new NeuralNetConfiguration.Builder() .seed(rngSeed) //include the random seed for reproducibility // use stochastic gradient descent as an optimization algorithm .updater(new Nesterovs(0.006, 0.9)) .l2(1e-4) .list() // input layer: .layer(new DenseLayer.Builder() .nIn(numInputs) .nOut(hiddenLayerSize) .activation(Activation.RELU) .weightInit(WeightInit.XAVIER) .build()) // hidden layer: .layer(new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD) .nIn(hiddenLayerSize) .nOut(numOutputs) .activation(Activation.SOFTMAX) .weightInit(WeightInit.XAVIER) .build()) .build() … … log.info("Creating model ...") def model = new MultiLayerNetwork(conf) model.init() log.info("Training model ...") model.listeners = new ScoreIterationListener(100) model.fit(trainSet, numEpochs) // model.save('mlp_model.dat' as File) log.info("Evaluating model...") def eval = model.evaluate(testSet) log.info(eval.stats()) ======== Evaluation Metrics ======== # of classes: 10 Accuracy: 0.9728 Precision: 0.9728 Recall: 0.9726 F1 Score: 0.9726 Based on: https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j- examples/src/main/java/org/deeplearning4j/examples/feedforward/mnist/MLPMnistSingleLayerExample.java 0 1 2 9 … … … 10 28 * 28 = 784 1000

Slide 178

Slide 178 text

… MNIST with DeepLearning4J Based on: https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j- examples/src/main/java/org/deeplearning4j/examples/feedforward/mnist/MLPMnistSingleLayerExample.java 0 1 2 9 … … … 10 28 * 28 = 784 500 int rngSeed = 123 int batchSize = 64 int numEpochs = 15 int numInputs = 28 * 28 int numOutputs = 10 double rate = 0.0015 def (trainSet, testSet) = [true, false].collect { new MnistDataSetIterator(batchSize, it, rngSeed) } def log = LoggerFactory.getLogger(getClass()) def conf = new NeuralNetConfiguration.Builder() .seed(rngSeed) //include the random seed for reproducibility // use stochastic gradient descent as an optimization algorithm .updater(new Nadam()) .l2(rate * 0.005) // regularized .list() // first input layer: .layer(new DenseLayer.Builder() .nIn(numInputs) .nOut(500) .build()) // second input layer: .layer(new DenseLayer.Builder() .nIn(500) .nOut(100) .build()) // hidden layer: .layer(new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD) .nOut(numOutputs) .activation(Activation.SOFTMAX) .build()) .build() … log.info("Creating model ...") def model = new MultiLayerNetwork(conf) model.init() log.info("Training model ...") model.listeners = new ScoreIterationListener(100) model.fit(trainSet, numEpochs) log.info("Evaluating model...") def eval = model.evaluate(testSet) log.info(eval.stats()) ======== Evaluation Metrics ======== # of classes: 10 Accuracy: 0.9820 Precision: 0.9820 Recall: 0.9818 F1 Score: 0.9819 100

Slide 179

Slide 179 text

MNIST GUI with GroovyFX import javafx.embed.swing.SwingFXUtils import javafx.scene.canvas.GraphicsContext import javafx.scene.image.Image import javafx.scene.image.PixelFormat import javafx.scene.paint.Color import javax.imageio.ImageIO static clear(GraphicsContext g, size) { g.fill = Color.WHITE; g.fillRect(0, 0, size, size); g.fill = Color.BLACK } static int[] imageToArray(Image img) { int w = img.width int h = img.height int[] buf = new int[h * w] img.pixelReader.getPixels(0, 0, w, h, PixelFormat.intArgbInstance, buf, 0, w) buf } static Image snapshot(canvas) { def baos = new ByteArrayOutputStream() ImageIO.write(SwingFXUtils.fromFXImage(canvas.snapshot(null, null), null), "png", baos) new Image(new ByteArrayInputStream(baos.toByteArray()), 28, 28, true, true) } static displayResult(List items, Integer predict) { items.indexed().collect{ idx, val -> def marker = (idx == predict && items[predict] > 0.5) ? ' **' : '' "$idx ${val ? sprintf('%.4f', val) + marker : '?'}" }.join('\n') } A few more utility methods

Slide 180

Slide 180 text

MNIST GUI with GroovyFX class MnistInfer { private static RealMatrix inputHidden private static RealMatrix hiddenOutput static void load(Class klass) { def weights = klass.classLoader.getResourceAsStream('weights') try (ObjectInputStream ois = new ObjectInputStream(weights)) { inputHidden = MatrixUtils.createRealMatrix((double[][]) ois.readObject()) hiddenOutput = MatrixUtils.createRealMatrix((double[][]) ois.readObject()) } } static RealMatrix query(double[] inputArray) { RealMatrix inputs = createColumnRealMatrix(inputArray) RealMatrix hiddenInputs = inputHidden * inputs RealMatrix hiddenOutputs = scalarSigmoid(hiddenInputs) RealMatrix finalInputs = hiddenOutput * hiddenOutputs return scalarSigmoid(finalInputs) } static double[] normalize(int[] img) { double[] result = new double[img.length] for (int i = 0; i < img.length; i++) { int red = (img[i] >> 16) & 0xff int green = (img[i] >> 8) & 0xff int blue = img[i] & 0xff result[i] = 1 - ((red + green + blue) / 765.0 * 999 + 1) / 1000 } return result } } Predictor class

Slide 181

Slide 181 text

MNIST GUI with GroovyFX … start { stage(title: 'MNIST', visible: true) { scene(id: "scene", fill: WHITE) { borderPane { top { canvas(id: "canvas", width: size, height: size) def g = canvas.graphicsContext2D clear(g, size) canvas.onMouseDragged { e -> g.fillOval e.x - pen, e.y - pen, pen * 2, pen * 2 } } center { hbox(alignment: 'Center') { button('Clear', onAction: { clear(canvas.graphicsContext2D, size) out.text = displayResult([null] * 10, null) }) button('Predict', onAction: { def result = Mnist.query(Mnist.scale(imageToArray(snapshot(canvas)))) def predictLabel = Mnist.maxIndex(result) out.text = displayResult(result.data.collect{ it[0] }, predictLabel) }) } } bottom { textArea(displayResult([null] * 10, null), id: 'out', editable: false, prefColumnCount: 16) } } } } }

Slide 182

Slide 182 text

Apache MXNET Source: https://mxnet.apache.org/versions/master/faq/why_mxnet.html

Slide 183

Slide 183 text

Apache MXNET Object detection import groovy.swing.SwingBuilder import javax.imageio.ImageIO import org.apache.mxnet.infer.javaapi.ObjectDetector import org.apache.mxnet.javaapi.* import static javax.swing.WindowConstants.DISPOSE_ON_CLOSE static void downloadUrl(String srcDirUrl, String destDir, String filename) { def destination = new File(destDir, filename) if (!destination.parentFile.exists()) { destination.parentFile.mkdirs() } if (!destination.exists()) { destination.bytes = new URL(srcDirUrl + filename).bytes } } static downloadModelImage() { String baseDir = "${System.getProperty('java.io.tmpdir')}/resnetssd/" def imageName = 'dog-ssd.jpg' def modelName = 'resnet50_ssd_model' String imgURL = "https://s3.amazonaws.com/model-server/inputs/" println "Downloading image to ${baseDir}..." downloadUrl(imgURL, baseDir, imageName) println "Downloading model files to ${baseDir}... (may take a while)" String modelURL = "https://s3.amazonaws.com/model-server/models/resnet50_ssd/" downloadUrl(modelURL, baseDir, "$modelName-symbol.json") downloadUrl(modelURL, baseDir, "$modelName-0000.params") downloadUrl(modelURL, baseDir, 'synset.txt') [baseDir + imageName, baseDir + modelName] } … Utility methods for downloading test image and pre-trained model Test image

Slide 184

Slide 184 text

Apache MXNET Object detection … static detectObjects(String modelPath, String imagePath, inputShape) { def context = [Context.cpu()] def inputDescriptor = new DataDesc("data", inputShape, DType.Float32(), "NCHW") def objDet = new ObjectDetector(modelPath, [inputDescriptor], context, 0) return objDet.imageObjectDetect(ObjectDetector.loadImageFromFile(imagePath), 3) } def (imagePath, modelPath) = downloadModelImage() def image = ImageIO.read(imagePath as File) def (w, h) = image.with{ [it.width, it.height] } def count = 1 // batch size def channels = 3 // for RGB def inputShape = new Shape(count, channels, w * 0.8 as int, h) def results = detectObjects(modelPath, imagePath, inputShape).sum() def boxes = results.collect {[ xmin: w * it.XMin as int, ymin: h * it.YMin as int, xmax: w * it.XMax as int, ymax: h * it.YMax as int ]} def names = results.collect{ it.className + sprintf(' %.3f', it.probability) } (0..

Slide 185

Slide 185 text

Bonus Material

Slide 186

Slide 186 text

Scaling up Data Science If your big data framework doesn’t have the algorithm you need? Preprocessing Partition based dataset Linear regression K-Means clustering Genetic algorithms Multilayer perceptron Decision trees k-NN classification k-NN regression SVM binary classification SVM multi-class Classification Model cross validation Logistic regression Random forest Gradient boosting Model updating ANN (Approximate Nearest Neighbor) Deep learning Summary statistics Correlations Stratified sampling Hypothesis testing Random data generation Classification and regression: support Vector machines logistic regression Linear regression Decision trees Naive Bayes classification Alternating least squares (ALS) K-Means clustering Latent Dirichlet allocation (LDA) Singular value decomposition (SVD) Principal component analysis (PCA) Feature extraction Transformation functions Optimization algorithms Re-formulate your algorithm using map/filter/reduce
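A minimal sketch (plain Groovy, made-up numbers) of the map/filter/reduce re-formulation: each chunk is mapped to a partial (sum, count) pair and the pairs are reduced into a global mean, the same shape the house-price example below uses for its models and stats:

def values = (1..100000).collect { it as double }
def partials = values.collate(10000).collect { chunk -> [chunk.sum(), chunk.size()] } // map each chunk to (sum, count)
def (totalSum, totalCount) = partials.inject([0d, 0]) { acc, p -> [acc[0] + p[0], acc[1] + p[1]] } // reduce the pairs
println "mean = ${totalSum / totalCount}"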

Slide 187

Slide 187 text

House pricing revisited – Initial algorithm Data → Split → Training / Test → Fit (Training) → Model → Predict and Evaluate (Test) → Stats
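A minimal sketch (plain Groovy, made-up rows) of the Split step in this picture: shuffle the rows, keep 80% for training and the rest for evaluation:

def rows = (1..100).collect { [it, it * 2.5 + 3] } // made-up (feature, price) rows
Collections.shuffle(rows, new Random(42)) // reproducible shuffle
def cut = (rows.size() * 0.8) as int
def (train, test) = [rows[0..<cut], rows[cut..-1]]
println "train: ${train.size()} rows, test: ${test.size()} rows"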

Slide 188

Slide 188 text

House pricing revisited – Rejigging our algorithm Before: Data → Split → Training / Test → Fit → Model → Predict and Evaluate → Stats After: Data → Split → Chunk1 … ChunkN → Fit each chunk → Model1 … ModelN → Aggregate → Model, then Data → Split → Chunk1 … ChunkN → Predict and Evaluate each chunk against Model → Stats1 … StatsN → Aggregate → Stats
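A minimal sketch (plain Groovy, made-up numbers) of the "Aggregate" step for the models: each chunk's fit yields [intercept, coefficient, ...] and the combined model here is the element-wise mean, similar in spirit to the MeanDoubleArrayCols combine used in the Apache Beam version below:

def chunkModels = [
    [1000.0, 2.0, 3.0],
    [1100.0, 2.2, 2.8],
    [900.0, 1.8, 3.2]
] // each row: [intercept, coeff1, coeff2] from fitting one chunk
def combined = chunkModels.transpose()*.average() // element-wise mean across chunks
println "Aggregated model (intercept + coefficients): $combined"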

Slide 189

Slide 189 text

House pricing revisited - GPars import org.apache.commons.math3.stat.StatUtils import smile.math.Math import smile.regression.OLS import tech.tablesaw.api.* import groovyx.gpars.GParsPool import static java.lang.Math.sqrt def features = [ 'price', 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_living15', 'lat', 'sqft_above', 'grade', 'view', 'waterfront', 'floors' ] def file = getClass().classLoader.getResource('kc_house_data.csv').file def table = Table.read().csv(file) table = table.dropWhere(table.column("bedrooms").isGreaterThan(30)) def data = table.as().doubleMatrix(*features) … • Read CSV in normal way • Split data into chunks • Fit each chunk and aggregate resulting models • Split data into chunks • Evaluate each chunk against model and aggregate resulting stats

Slide 190

Slide 190 text

House pricing revisited - GPars … GParsPool.withPool { def trainChunkSize = 3000 def numTrainChunks = 8 def models = (0.. def predicted = chunk.collect { row -> intercept + dot(row[1..-1] as double[], coefficients) } def residuals = chunk.toList().indexed().collect { idx, row -> predicted[idx] - row[0] } def rmse = sqrt(sumSq(residuals as double[]) / chunk.size()) [rmse, residuals.average(), chunk.size()] } def evalChunkSize = 2000 def results = data.collate(evalChunkSize).collectParallel(stats) println 'RMSE: ' + sqrt(results.collect{ it[0] * it[0] * it[2] }.sum() / data.size()) println 'mean: ' + results.collect{ it[1] * it[2] }.sum() / data.size() } • Read CSV in normal way • Split data into chunks • Fit each chunk and aggregate resulting models • Split data into chunks • Evaluate each chunk against model and aggregate resulting stats

Slide 191

Slide 191 text

House pricing revisited - GPars … GParsPool.withPool { def trainChunkSize = 3000 def numTrainChunks = 8 def models = (0.. def predicted = chunk.collect { row -> intercept + dot(row[1..-1] as double[], coefficients) } def residuals = chunk.toList().indexed().collect { idx, row -> predicted[idx] - row[0] } def rmse = sqrt(sumSq(residuals as double[]) / chunk.size()) [rmse, residuals.average(), chunk.size()] } def evalChunkSize = 2000 def results = data.collate(evalChunkSize).collectParallel(stats) println 'RMSE: ' + sqrt(results.collect{ it[0] * it[0] * it[2] }.sum() / data.size()) println 'mean: ' + results.collect{ it[1] * it[2] }.sum() / data.size() } Intercept: -3.16904445043026E7 Coefficients: -21131.141578491588, -5431.675525641348, 179.47932136012156, 11.273983523229091, 657752.8815669084, -10.658550821995767, 86876.08541623666, 61708.58759419169, 611717.9878928157, -21819.967894981055 RMSE: 215185.8177879586 mean: -2151.974648800721

Slide 192

Slide 192 text

… GParsPool.withPool { def trainChunkSize = 3000 def numTrainChunks = 8 def models = (0.. def predicted = chunk.collect { row -> intercept + dot(row[1..-1] as double[], coefficients) } def residuals = chunk.toList().indexed().collect { idx, row -> predicted[idx] - row[0] } def rmse = sqrt(sumSq(residuals as double[]) / chunk.size()) [rmse, residuals.average(), chunk.size()] } def evalChunkSize = 2000 def results = data.collate(evalChunkSize).collectParallel(stats) println 'RMSE: ' + sqrt(results.collect{ it[0] * it[0] * it[2] }.sum() / data.size()) println 'mean: ' + results.collect{ it[1] * it[2] }.sum() / data.size() } House pricing revisited - GPars Data Chunk1 Model1 Stats shuffle/take results … Chunk2 Model2 ChunkN ModelN OLS#intercept/OLS#coefficients Model … … ChunkN Stats1 Stats2 StatsN … Chunk2 Chunk1 results.collect transpose/sum model collate stats

Slide 193

Slide 193 text

House pricing revisited – Apache Beam … class MeanDoubleArrayCols implements SerializableFunction, double[]> { @Override double[] apply(Iterable input) { double[] sum = null def count = 0 for (double[] next : input) { if (sum == null) { sum = new double[next.size()] (0.., double[]> { @Override double[] apply(Iterable input) { double[] sum = null for (double[] next : input) { if (sum == null) { sum = new double[next.size()] (0..
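The generic type parameters and loop bodies of the two combiner classes above were eaten by the slide export. A sketch of what an element-wise mean combiner like `MeanDoubleArrayCols` could look like; everything beyond the variable names visible above is an assumption:

import org.apache.beam.sdk.transforms.SerializableFunction

// averages a collection of double[] column-wise, e.g. the per-chunk [intercept, *coefficients] models
class MeanDoubleArrayCols implements SerializableFunction<Iterable<double[]>, double[]> {
    @Override
    double[] apply(Iterable<double[]> input) {
        double[] sum = null
        def count = 0
        for (double[] next : input) {
            if (sum == null) sum = new double[next.size()]
            (0..<next.size()).each { sum[it] += next[it] }
            count++
        }
        (0..<sum.size()).each { sum[it] /= count }
        sum
    }
}

`AggregateModelStats` would presumably combine the per-chunk [rmse, mean, count] triples in a similar way, weighting by count, matching the aggregation formulas used in the GPars version.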

Slide 194

Slide 194 text

House pricing revisited – Apache Beam … static buildPipeline(Pipeline p) { def features = [ 'price', 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_living15', 'lat', 'sqft_above', 'grade', 'view', 'waterfront', 'floors' ] def readCsvChunks = new DoFn() { @ProcessElement void processElement(@Element String path, OutputReceiver receiver) throws IOException { def chunkSize = 6000 def table = Table.read().csv(path) table = table.dropWhere(table.column("bedrooms").isGreaterThan(30)) def idxs = 0..() { @ProcessElement void processElement(@Element double[][] rows, OutputReceiver receiver) throws IOException { double[] model = new OLS(rows.collect{ it[1..-1] } as double[][], rows.collect{ it[0] } as double[]).with{ [it.intercept(), *it.coefficients()] } receiver.output(model) } } def evalModel = { double[][] chunk, double[] model -> double intercept = model[0] double[] coefficients = model[1..-1] def predicted = chunk.collect { row -> intercept + dot(row[1..-1] as double[], coefficients) } def residuals = chunk.toList().indexed().collect { idx, row -> predicted[idx] - row[0] } def rmse = sqrt(sumSq(residuals as double[]) / chunk.size()) def mean = residuals.average() [rmse, mean, chunk.size()] as double[] } … Map filename to chunks Map chunk to model Map chunk to stats
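The body of `readCsvChunks` is cut off above after `def idxs = 0..`. One plausible continuation, mirroring the chunking used in the GPars version; the shuffle and the conversion of rows to double[][] are assumptions:

// inside readCsvChunks.processElement, after reading the CSV and dropping outlier rows
def data = table.as().doubleMatrix(*features)            // same conversion as the earlier slides
def idxs = (0..<data.length).toList()
Collections.shuffle(idxs)
idxs.collate(chunkSize).each { chunkIdxs ->
    receiver.output(chunkIdxs.collect { data[it] } as double[][])
}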

Slide 195

Slide 195 text

House pricing revisited – Apache Beam … def model2out = new DoFn() { @ProcessElement void processElement(@Element double[] ds, OutputReceiver out) { out.output("intercept: ${ds[0]}, coeffs: ${ds[1..-1].join(', ')}".toString()) } } def stats2out = new DoFn() { @ProcessElement void processElement(@Element double[] ds, OutputReceiver out) { out.output("rmse: ${ds[0]}, mean: ${ds[1]}, count: ${ds[2]}".toString()) } } var csvChunks = p .apply(Create.of('/path/to/kc_house_data.csv')) .apply('Create chunks', ParDo.of(readCsvChunks)) var model = csvChunks .apply('Fit chunks', ParDo.of(fitModel)) .apply(Combine.globally(new MeanDoubleArrayCols())) var modelView = model .apply(View.asSingleton()) csvChunks .apply(ParDo.of(new EvaluateModel(modelView, evalModel)).withSideInputs(modelView)) .apply(Combine.globally(new AggregateModelStats())) .apply('Log stats', ParDo.of(stats2out)).apply(Log.ofElements()) model .apply('Log model', ParDo.of(model2out)).apply(Log.ofElements()) } def pipeline = Pipeline.create() buildPipeline(pipeline) pipeline.run().waitUntilFinish() INFO: intercept: -3.2504584735458568E7, coeffs: -27862.329468295735, -2269.8423856962077, 198.1365232723505, 2.9357512250793327, 675276.3494316386, -2.323389992226333, 83241.42291377622, 66606.19682072607, 607326.5819472861, -30354.844181689274 INFO: rmse: 214702.83588071115, mean: -1191.2433423574066, count: 21612.0 Map model to output Map stats to output Create and run pipeline Build pipeline

Slide 196

Slide 196 text

… var csvChunks = p | Create.of('/path/to/kc_house_data.csv') | 'Create chunks' >> ParDo.of(readCsvChunks) var model = csvChunks | 'Fit chunks' >> ParDo.of(fitModel) | Combine.globally(new MeanDoubleArrayCols()) var modelView = model | View.asSingleton() csvChunks | ParDo.of(new EvaluateModel(modelView, evalModel)).withSideInputs(modelView) | Combine.globally(new AggregateModelStats()) | 'Log stats' >> ParDo.of(stats2out) | Log.ofElements() model | 'Log model' >> ParDo.of(model2out) | Log.ofElements() } PCollection.metaClass.or = { List arg -> delegate.apply(*arg) } PCollection.metaClass.or = { PTransform arg -> delegate.apply(arg) } String.metaClass.rightShift = { PTransform arg -> [delegate, arg] } Pipeline.metaClass.or = { PTransform arg -> delegate.apply(arg) } def pipeline = Pipeline.create() buildPipeline(pipeline) pipeline.run().waitUntilFinish() House pricing revisited – Apache Beam INFO: intercept: -3.2504584735458568E7, coeffs: -27862.329468295735, -2269.8423856962077, 198.1365232723505, 2.9357512250793327, 675276.3494316386, -2.323389992226333, 83241.42291377622, 66606.19682072607, 607326.5819472861, -30354.844181689274 INFO: rmse: 214702.83588071115, mean: -1191.2433423574066, count: 21612.0 Create and run pipeline Some optional metaprogramming for Python-like coding style

Slide 197

Slide 197 text

House pricing revisited – Apache Beam … var csvChunks = p | Create.of('/path/to/kc_house_data.csv') | 'Create chunks' >> ParDo.of(readCsvChunks) var model = csvChunks | 'Fit chunks' >> ParDo.of(fitModel) | Combine.globally(new MeanDoubleArrayCols()) var modelView = model | View.asSingleton() csvChunks | ParDo.of(new EvaluateModel(modelView, evalModel)).withSideInputs(modelView) | Combine.globally(new AggregateModelStats()) | 'Log stats' >> ParDo.of(stats2out) | Log.ofElements() model | 'Log model' >> ParDo.of(model2out) | Log.ofElements() } PCollection.metaClass.or = { List arg -> delegate.apply(*arg) } PCollection.metaClass.or = { PTransform arg -> delegate.apply(arg) } String.metaClass.rightShift = { PTransform arg -> [delegate, arg] } Pipeline.metaClass.or = { PTransform arg -> delegate.apply(arg) } def pipeline = Pipeline.create() buildPipeline(pipeline) pipeline.run().waitUntilFinish() INFO: intercept: -3.2504584735458568E7, coeffs: -27862.329468295735, -2269.8423856962077, 198.1365232723505, 2.9357512250793327, 675276.3494316386, -2.323389992226333, 83241.42291377622, 66606.19682072607, 607326.5819472861, -30354.844181689274 INFO: rmse: 214702.83588071115, mean: -1191.2433423574066, count: 21612.0 Data Chunk1 Model1 Stats readCsvChunks csvChunks … Chunk2 Model2 ChunkN ModelN fitModel Model … … ChunkN Stats1 Stats2 StatsN … Chunk2 Chunk1 AggregateModelStats MeanDoubleArrayCols csvChunks model modelView sideInput evalModel

Slide 198

Slide 198 text

Optimization Overview
Optimization: • Finding (optimal) solutions
Approaches: • Dynamic programming • Linear programming • Non-linear programming • Constraint programming • Integer programming
Aspects: • Combinatorial optimization • Continuous optimization
Applications: • Timetabling/scheduling • Vehicle routing • Packing problems • Trading systems • Logistics • Language detection
CP = Constraint Programming, CIP = Constraint Integer Programming, MINLP = Mixed Integer Non-linear Programming, MIP = Mixed Integer Linear Programming, IP = Integer Linear Programming, SAT = Satisfiability, LP = Linear Programming

Slide 199

Slide 199 text

Constraint Programming Picture: http://osullivan.ucc.ie/AAAI_OSullivan.pdf

Slide 200

Slide 200 text

Constraint Programming Typical domains: • Boolean (SAT problem) • integer • rational • interval (scheduling problems) • linear (approaches to non-linear problems do exist) • finite sets • mixed (2 or more of above) Picture: http://osullivan.ucc.ie/AAAI_OSullivan.pdf

Slide 201

Slide 201 text

Cryptarithmetic S E N D + M O R E = M O N E Y Replace the letters with decimal digits to make a valid arithmetic sum

Slide 202

Slide 202 text

Cryptarithmetic: Brute force def solutions(): # letters = ('s', 'e', 'n', 'd', 'm', 'o', 'r', 'y') all_solutions = list() for s in range(1, 10): for e in range(0, 10): for n in range(0, 10): for d in range(0, 10): for m in range(1, 10): for o in range(0, 10): for r in range(0, 10): for y in range(0, 10): if len({s, e, n, d, m, o, r, y}) == 8: send = 1000 * s + 100 * e + 10 * n + d more = 1000 * m + 100 * o + 10 * r + e money = 10000 * m + 1000 * o + 100 * n + 10 * e + y if send + more == money: all_solutions.append((send, more, money)) return all_solutions print(solutions()) S E N D + M O R E = M O N E Y [(9567, 1085, 10652)]

Slide 203

Slide 203 text

Cryptarithmetic: Brute force def solutions() { // letters = ['s', 'e', 'n', 'd', 'm', 'o', 'r', 'y'] def all_solutions = [] for (s in 1..<10) for (e in 0..9) for (n in 0..9) for (d in 0..9) for (m in 1..9) for (o in 0..9) for (r in 0..9) for (y in 0..9) if ([s, e, n, d, m, o, r, y].toSet().size() == 8) { def send = 1000 * s + 100 * e + 10 * n + d def more = 1000 * m + 100 * o + 10 * r + e def money = 10000 * m + 1000 * o + 100 * n + 10 * e + y if (send + more == money) all_solutions.add([send, more, money]) } return all_solutions } print(solutions()) S E N D + M O R E = M O N E Y [[9567, 1085, 10652]]

Slide 204

Slide 204 text

Cryptarithmetic: Brute force from itertools import permutations def solution2(): letters = ('s', 'e', 'n', 'd', 'm', 'o', 'r', 'y') digits = range(10) for perm in permutations(digits, len(letters)): sol = dict(zip(letters, perm)) if sol['s'] == 0 or sol['m'] == 0: continue send = 1000 * sol['s'] + 100 * sol['e'] + 10 * sol['n'] + sol['d'] more = 1000 * sol['m'] + 100 * sol['o'] + 10 * sol['r'] + sol['e'] money = 10000 * sol['m'] + 1000 * sol['o'] + 100 * sol['n'] + 10 * sol['e'] + sol['y'] if send + more == money: return send, more, money print(solution2()) S E N D + M O R E = M O N E Y (9567, 1085, 10652)

Slide 205

Slide 205 text

Cryptarithmetic: Brute force def solution2() { def digits = 0..9 for (p in digits.permutations()) { if (p[-1] < p[-2]) continue def (s, e, n, d, m, o, r, y) = p if (s == 0 || m == 0) continue def send = 1000 * s + 100 * e + 10 * n + d def more = 1000 * m + 100 * o + 10 * r + e def money = 10000 * m + 1000 * o + 100 * n + 10 * e + y if (send + more == money) return [send, more, money] } } print(solution2()) S E N D + M O R E = M O N E Y [9567, 1085, 10652]

Slide 206

Slide 206 text

from csp import Constraint, CSP from typing import Dict, List, Optional class SendMoreMoneyConstraint(Constraint[str, int]): def __init__(self, letters: List[str]) -> None: super().__init__(letters) self.letters: List[str] = letters def satisfied(self, assignment: Dict[str, int]) -> bool: # not a solution if duplicate values if len(set(assignment.values())) < len(assignment): return False # if all vars assigned, check if correct if len(assignment) == len(self.letters): s: int = assignment["S"] e: int = assignment["E"] n: int = assignment["N"] d: int = assignment["D"] m: int = assignment["M"] o: int = assignment["O"] r: int = assignment["R"] y: int = assignment["Y"] send: int = s * 1000 + e * 100 + n * 10 + d more: int = m * 1000 + o * 100 + r * 10 + e money: int = m * 10000 + o * 1000 + n * 100 + e * 10 + y return send + more == money return True … Cryptarithmetic: Constraint programming … letters = ["S", "E", "N", "D", "M", "O", "R", "Y"] possible_digits = {} for letter in letters: possible_digits[letter] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] possible_digits["M"] = [1] # can't start with 0 constraint csp: CSP[str, int] = CSP(letters, possible_digits) csp.add_constraint(SendMoreMoneyConstraint(letters)) solution: Optional[Dict[str, int]] = csp.backtracking_search() if solution is None: print("No solution found!") else: print(solution) S E N D + M O R E = M O N E Y {'S': 9, 'E': 5, 'N': 6, 'D': 7, 'M': 1, 'O': 0, 'R': 8, 'Y': 2} Adapted from: https://freecontent.manning.com/constraint-satisfaction-problems-in-python/

Slide 207

Slide 207 text

@Grab('org.choco-solver:choco-solver:4.10.0') import org.chocosolver.solver.Model import org.chocosolver.solver.variables.IntVar def model = new Model("SEND+MORE=MONEY") def S = model.intVar("S", 1, 9) def E = model.intVar("E", 0, 9) def N = model.intVar("N", 0, 9) def D = model.intVar("D", 0, 9) def M = model.intVar("M", 1, 9) def O = model.intVar("O", 0, 9) def R = model.intVar("R", 0, 9) def Y = model.intVar("Y", 0, 9) model.allDifferent(S, E, N, D, M, O, R, Y).post() … Cryptarithmetic: Constraint programming … IntVar[] ALL = [ S, E, N, D, M, O, R, E, M, O, N, E, Y] int[] COEFFS = [ 1000, 100, 10, 1, 1000, 100, 10, 1, -10000, -1000, -100, -10, -1] model.scalar(ALL, COEFFS, "=", 0).post() model.solver.findSolution() S E N D + M O R E = M O N E Y Solution: S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2,

Slide 208

Slide 208 text

Dietary restrictions
Minimise cost of diet given:
• Must be at least 300 calories
• Not more than 10 grams of protein
• Not less than 10 grams of carbohydrates
• Not less than 8 grams of fat
• At least 0.5 units of fish
• No more than 1 unit of milk

                    Bread   Milk   Cheese   Potato   Fish   Yogurt
Cost                  2.0    3.5      8.0      1.5   11.0      1.0
Protein (g)           4.0    8.0      7.0      1.3    8.0      9.2
Fat (g)               1.0    5.0      9.0      0.1    7.0      1.0
Carbohydrates (g)    15.0   11.7      0.4     22.6    0.0     17.0
Calories               90    120      106       97    130      180
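Putting the table and the restrictions together, the problem can be stated as a linear program. A sketch, where b, m, c, p, f, y denote units of bread, milk, cheese, potato, fish and yogurt (the solver slides that follow all encode the same model):

$$
\begin{aligned}
\min\quad & 2.0b + 3.5m + 8.0c + 1.5p + 11.0f + 1.0y && \text{(cost)}\\
\text{s.t.}\quad & 90b + 120m + 106c + 97p + 130f + 180y \ge 300 && \text{(calories)}\\
& 4.0b + 8.0m + 7.0c + 1.3p + 8.0f + 9.2y \le 10 && \text{(protein)}\\
& 15.0b + 11.7m + 0.4c + 22.6p + 17.0y \ge 10 && \text{(carbohydrates)}\\
& 1.0b + 5.0m + 9.0c + 0.1p + 7.0f + 1.0y \ge 8 && \text{(fat)}\\
& f \ge 0.5,\quad 0 \le m \le 1,\quad b, c, p, y \ge 0
\end{aligned}
$$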

Slide 209

Slide 209 text

Dietary restrictions (Excel)

Slide 210

Slide 210 text

Dietary restrictions (Excel)

Slide 211

Slide 211 text

Dietary restrictions (Google sheets)

Slide 212

Slide 212 text

Diet problem (ojalgo) def model = new ExpressionsBasedModel() def bread = model.addVariable("Bread").lower(0) def milk = model.addVariable("Milk").lower(0).upper(1) def cheese = model.addVariable("Cheese").lower(0) def potato = model.addVariable("Potato").lower(0) def fish = model.addVariable("Fish").lower(0.5) def yogurt = model.addVariable("Yogurt").lower(0) def cost = model.addExpression("Cost") cost.set(bread, 2.0).set(milk, 3.5).set(cheese, 8.0).set(potato, 1.5).set(fish, 11.0).set(yogurt, 1.0) def protein = model.addExpression("Protein").upper(10) protein.set(bread, 4.0).set(milk, 8.0).set(cheese, 7.0).set(potato, 1.3).set(fish, 8.0).set(yogurt, 9.2) def fat = model.addExpression("Fat").lower(8) fat.set(bread, 1.0).set(milk, 5.0).set(cheese, 9.0).set(potato, 0.1).set(fish, 7.0).set(yogurt, 1.0) def carbs = model.addExpression("Carbohydrates").lower(10) carbs.set(bread, 15.0).set(milk, 11.7).set(cheese, 0.4).set(potato, 22.6).set(fish, 0.0).set(yogurt, 17.0) def calories = model.addExpression("Calories").lower(300) calories.set(bread, 90).set(milk, 120).set(cheese, 106).set(potato, 97).set(fish, 130).set(yogurt, 180) def result = model.minimise() // for a variation, see: // https://www.ojalgo.org/2019/05/the-diet-problem/ … OPTIMAL 0.0 @ { 0.0, 0.0, 0.40813898143741034, 1.8538791051880057, 0.5916230366492151, 0.0 } ############################################ 0 <= Bread: 0 0 <= Milk: 0 <= 1 0 <= Cheese: 0.408139 0 <= Potato: 1.853879 0.5 <= Fish: 0.591623 0 <= Yogurt: 0 10 <= Carbohydrates: 42.060923 8 <= Fat: 8.0 Cost: 12.553784 Protein: 10.0 <= 10 300 <= Calories: 300.0 ############################################

Slide 213

Slide 213 text

Diet problem (Choco) def model = new Model("Diet problem") def unbounded = 100000 // scale quantities by 10, coefficients by 10, products by 100 def bread = model.intVar("Bread", 0, unbounded) def milk = model.intVar("Milk", 0, 10) def cheese = model.intVar("Cheese", 0, unbounded) def potato = model.intVar("Potato", 0, unbounded) def fish = model.intVar("Fish", 5, unbounded) def yogurt = model.intVar("Yogurt", 0, unbounded) IntVar[] all = [bread, milk, cheese, potato, fish, yogurt] def cost = model.intVar("Cost", 0, unbounded) model.scalar(all, [20, 35, 80, 15, 110, 10] as int[], "=", cost).post() def protein = model.intVar("Protein", 0, 1000) model.scalar(all, [40, 80, 70, 13, 80, 92] as int[], "=", protein).post() def fat = model.intVar("Fat", 800, unbounded) model.scalar(all, [10, 50, 90, 1, 70, 10] as int[], "=", fat).post() def carbs = model.intVar("Carbohydrates", 1000, unbounded) model.scalar(all, [150, 117, 4, 226, 0, 170] as int[], "=", carbs).post() def calories = model.intVar("Calories", 30000, unbounded) model.scalar(all, [900, 1200, 1060, 970, 1300, 1800] as int[], "=", calories).post() model.setObjective(Model.MINIMIZE, cost) def found = model.solver.findSolution() if (found) { all.each { println "$it.name: ${it.value / 10}" } [carbs, fat, protein, calories, cost].each { println "$it.name: ${it.value / 100}" } } else { println "No solution" } Bread: 0 Milk: 0 Cheese: 0.5 Potato: 1.9 Fish: 0.5 Yogurt: 0 Carbohydrates: 43.14 Fat: 8.19 Protein: 9.97 Calories: 302.3 Cost: 12.35 Choco supports RealVar but doesn’t have as rich a set of possible constraints for such vars.

Slide 214

Slide 214 text

Diet problem (Choco) def model = new Model("Diet problem") def unbounded = 1000.0d def precision = 0.00001d // scale quantities by 10, coefficients by 10, products by 100 def bread = model.realVar("Bread", 0.0, unbounded, precision) def milk = model.realVar("Milk", 0.0, 1.0, precision) def cheese = model.realVar("Cheese", 0.0, unbounded, precision) def potato = model.realVar("Potato", 0.0, unbounded, precision) def fish = model.realVar("Fish", 0.5, unbounded, precision) def yogurt = model.realVar("Yogurt", 0.0, unbounded, precision) RealVar[] all = [bread, milk, cheese, potato, fish, yogurt] def scalarIbex = { coeffs, var -> def (a, b, c, d, e, f) = coeffs model.realIbexGenericConstraint("$a*{0}+$b*{1}+$c*{2}+$d*{3}+$e*{4}+$f*{5}={6}", [*all, var] as RealVar[]).post(); } def cost = model.realVar("Cost", 0.0, unbounded, precision) scalarIbex([2.0, 3.5, 8.0, 1.5, 11.0, 1.0], cost) def protein = model.realVar("Protein", 0.0, 10.0, precision) scalarIbex([4.0, 8.0, 7.0, 1.3, 8.0, 9.2], protein) def fat = model.realVar("Fat", 8.0, unbounded, precision) scalarIbex([1.0, 5.0, 9.0, 0.1, 7.0, 1.0], fat) def carbs = model.realVar("Carbohydrates", 10.0, unbounded, precision) scalarIbex([15.0, 11.7, 0.4, 22.6, 0.0, 17.0], carbs) def calories = model.realVar("Calories", 300, unbounded, precision) scalarIbex([90, 120, 106, 97, 130, 180], calories) model.setObjective(Model.MINIMIZE, cost) def found = model.solver.findSolution() Bread: 0.025131 .. 0.025137 Milk: 0.000009 .. 0.000010 Cheese: 0.428571 .. 0.428571 Potato: 1.848118 .. 1.848124 Fish: 0.561836 .. 0.561836 Yogurt: 0.000007 .. 0.000010 Carbohydrates: 42.316203 .. 42.316211 Fat: 8.000000 .. 8.000005 Protein: 9.997920 .. 9.997926 Calories: 300.000000 .. 300.000008 Cost: 12.431241 .. 12.431245 Choco does have a plugin (via JNI) for the Ibex C++ constraint processing library which does handle real numbers. def pretty = { var -> def bounds = found.getRealBounds(var) printf "%s: %.6f .. %.6f%n", var.name, *bounds } if (found) { all.each { pretty(it) } [carbs, fat, protein, calories, cost].each { pretty(it) } } else { println "No solution" }

Slide 215

Slide 215 text

Diet problem (Apache Commons Math) import org.apache.commons.math3.optim.linear.* import org.apache.commons.math3.optim.nonlinear.scalar.GoalType import static org.apache.commons.math3.optim.linear.Relationship.* def cost = new LinearObjectiveFunction([2.0, 3.5, 8.0, 1.5, 11.0, 1.0] as double[], 0) static scalar(coeffs, rel, val) { new LinearConstraint(coeffs as double[], rel, val) } def bread_min = scalar([1, 0, 0, 0, 0, 0], GEQ, 0) def milk_min = scalar([0, 1, 0, 0, 0, 0], GEQ, 0) def milk_max = scalar([0, 1, 0, 0, 0, 0], LEQ, 1) def cheese_min = scalar([0, 0, 1, 0, 0, 0], GEQ, 0) def potato_min = scalar([0, 0, 0, 1, 0, 0], GEQ, 0) def fish_min = scalar([0, 0, 0, 0, 1, 0], GEQ, 0.5) def yogurt_min = scalar([0, 0, 0, 0, 0, 1], GEQ, 0) def protein = scalar([4.0, 8.0, 7.0, 1.3, 8.0, 9.2], LEQ, 10) def fat = scalar([1.0, 5.0, 9.0, 0.1, 7.0, 1.0], GEQ, 8) def carbs = scalar([15.0, 11.7, 0.4, 22.6, 0.0, 17.0], GEQ, 10) def calories = scalar([90, 120, 106, 97, 130, 180], GEQ, 300) LinearConstraintSet constraints = [bread_min, milk_min, milk_max, fish_min, cheese_min, potato_min, yogurt_min, protein, fat, carbs, calories] def solution = new SimplexSolver().optimize(cost, constraints, GoalType.MINIMIZE) if (solution != null) { printf "Opt: %.2f%n", solution.value println solution.point.collect{ sprintf '%.2f', it }.join(', ') } Opt: 12.08 -0.00, 0.05, 0.45, 1.87, 0.50, 0.00

Slide 216

Slide 216 text

Diet problem (Google OrTools) import … Loader.loadNativeLibraries() static addConstraint(MPSolver solver, List vars, MPVariable comp, List coeffs) { solver.makeConstraint(0, 0).tap {constraint -> constraint.setCoefficient(comp, -1) vars.indices.each { i -> constraint.setCoefficient(vars[i], coeffs[i]) } } } MPSolver.createSolver("SCIP").with {solver -> def bread = makeNumVar(0, infinity(),'bread') def milk = makeNumVar(0, 1, 'milk') def cheese = makeNumVar(0, infinity(), 'cheese') def potato = makeNumVar(0, infinity(), 'potato') def fish = makeNumVar(0.5, infinity(), 'fish') def yogurt = makeNumVar(0, infinity(), 'yogurt') def food = [bread, milk, cheese, potato, fish, yogurt] …

Slide 217

Slide 217 text

Diet problem (Google OrTools) … def cost = makeNumVar(0, infinity(),'Cost') addConstraint(solver, food, cost, [2.0, 3.5, 8.0, 1.5, 11.0, 1.0]) def protein = makeNumVar(0, 10,'Protein') addConstraint(solver, food, protein, [4.0, 8.0, 7.0, 1.3, 8.0, 9.2]) def fat = makeNumVar(8, infinity(),'Fat') addConstraint(solver, food, fat, [1.0, 5.0, 9.0, 0.1, 7.0, 1.0]) def carbs = makeNumVar(10, infinity(),'Carbs') addConstraint(solver, food, carbs, [15.0, 11.7, 0.4, 22.6, 0.0, 17.0]) def cals = makeNumVar(300, infinity(),'Calories') addConstraint(solver, food, cals, [90, 120, 106, 97, 130, 180]) def components = [protein, fat, carbs, cals] objective().with {objective -> objective.setCoefficient(cost, 1) objective.setMinimization() def result = solve() println result if (result == OPTIMAL) { println "Solution: " + objective.value() println "Foods: ${food.collect{ "${it.name()} ${it.solutionValue()}" }}" println "Components: ${components.collect{ "${it.name()} ${it.solutionValue()}" }}" println "Iterations: ${iterations()}, Wall time: ${wallTime()}ms" } else { System.err.println "The problem does not have an optimal solution!" } } }

Slide 218

Slide 218 text

Diet problem (Google OrTools) … def cost = makeNumVar(0, infinity(),'Cost') addConstraint(solver, food, cost, [2.0, 3.5, 8.0, 1.5, 11.0, 1.0]) def protein = makeNumVar(0, 10,'Protein') addConstraint(solver, food, protein, [4.0, 8.0, 7.0, 1.3, 8.0, 9.2]) def fat = makeNumVar(8, infinity(),'Fat') addConstraint(solver, food, fat, [1.0, 5.0, 9.0, 0.1, 7.0, 1.0]) def carbs = makeNumVar(10, infinity(),'Carbs') addConstraint(solver, food, carbs, [15.0, 11.7, 0.4, 22.6, 0.0, 17.0]) def cals = makeNumVar(300, infinity(),'Calories') addConstraint(solver, food, cals, [90, 120, 106, 97, 130, 180]) def components = [protein, fat, carbs, cals] objective().with {objective -> objective.setCoefficient(cost, 1) objective.setMinimization() def result = solve() println result if (result == OPTIMAL) { println "Solution: " + objective.value() println "Foods: ${food.collect{ "${it.name()} ${it.solutionValue()}" }}" println "Components: ${components.collect{ "${it.name()} ${it.solutionValue()}" }}" println "Iterations: ${iterations()}, Wall time: ${wallTime()}ms" } else { System.err.println "The problem does not have an optimal solution!" } } } OPTIMAL Solution: 12.081337881108173 Foods: [bread 0.0, milk 0.05359877488514538, cheese 0.44949881665042474, potato 1.8651677572045107, fish 0.5, yogurt 0.0] Components: [Protein 10.0, Fat 8.0, Carbs 42.95969650563832, Calories 300.0] Iterations: 4, Wall time: 89ms

Slide 219

Slide 219 text

Groovy Community Information • groovy.apache.org • groovy-lang.org • github.com/apache/groovy • groovycommunity.com • users@groovy.apache.org • dev@groovy.apache.org • @ApacheGroovy • objectcomputing.com/training/catalog/groovy-programming/

Slide 220

Slide 220 text

CONNECT WITH US +1 (314) 579-0066 • @objectcomputing • objectcomputing.com THANK YOU Find me on Twitter @paulk_asert © 2021 Object Computing, Inc. (OCI). All rights reserved.

Slide 221

Slide 221 text

References
• https://www.analyticsvidhya.com/blog/2017/02/lintroductory-guide-on-linear-programming-explained-in-simple-english/
• Computational Mixed-Integer Programming https://www.birs.ca/cmo-workshops/2018/18w5208/files/GleixnerAmbros.pdf
• Integer Linear Programming: Graphical Introduction https://www.youtube.com/watch?v=RhHhy-8sz-4
• Integer Linear Programming: Binary Constraints https://www.youtube.com/watch?v=-3my1TkyFiM https://www.youtube.com/watch?v=B3biWsBLeCw https://www.youtube.com/watch?v=MO8uQnIch6I
• Linear Programming: Alternate solutions, Infeasibility, Unboundedness, & Redundancy https://www.youtube.com/watch?v=eMA0LWsRQQ