Slide 1

Slide 1 text

Classifying Iris flowers with Groovy, Deep Learning, and GraalVM
Dr Paul King, VP Apache Groovy & Distinguished Engineer, Object Computing
Twitter/X | Mastodon: @ApacheGroovy | @[email protected]
Apache Groovy: https://groovy.apache.org/ https://groovy-lang.org/
Repo: https://github.com/paulk-asert/groovy-data-science
Slides: https://speakerdeck.com/paulk/groovy-iris

Slide 2

Slide 2 text

● Apache Groovy ● Classification Overview ● Iris Classification ● Iris Classification with Neural networks

Slide 3

Slide 3 text

Why use Groovy in 2024? It’s like a super version of Java: • Simpler scripting: more powerful yet more concise • Extension methods: 2000+ enhancements to Java classes for a great out-of-the-box experience (batteries included) • Flexible typing: from dynamic duck-typing (terse code) to extensible stronger-than-Java static typing (better checking) • Improved OO & functional features: from traits (more powerful and flexible OO designs) to tail recursion and memoizing/partial application of pure functions • AST transforms: 10s of lines instead of 100s/1000s of lines • Java features earlier: recent features on older JDKs

Slide 4

Slide 4 text

Hello world script

// Java (JDK23 with preview enabled)
import java.util.List;
import java.util.function.Predicate;

void main() {
    var pets = List.of("bird", "cat", "dog");
    Predicate<String> stringSize3 = s -> s.length() == 3;
    System.out.println(pets.stream().filter(stringSize3).toList());
}

// Groovy (JDK8+)
var pets = ["bird", "cat", "dog"]
var size3 = { it.size() == 3 }
println pets.findAll(size3)

Both print: [cat, dog]

Slide 5

Slide 5 text

Hello world test

import org.junit.Test;
import java.util.List;
import java.util.function.Predicate;
import static org.junit.Assert.assertEquals;

public class HelloTest {
    @Test
    public void testSize() {
        var pets = List.of("bird", "cat", "dog");
        var nums = List.of(List.of(1, 2), List.of(3, 4, 5), List.of(6));
        Predicate<String> stringSize3 = s -> s.length() == 3;
        Predicate<List<Integer>> listSize3 = s -> s.size() == 3;
        assertEquals(pets.stream().filter(stringSize3).count(), 2L);
        assertEquals(nums.stream().filter(listSize3).count(), 1L);
    }
}

Slide 6

Slide 6 text

Hello world test

import org.junit.Test;
import java.util.List;
import java.util.function.Predicate;
import static org.junit.Assert.assertEquals;

public class HelloTest {
    @Test
    public void testSize() {
        var pets = List.of("bird", "cat", "dog");
        var nums = List.of(List.of(1, 2), List.of(3, 4, 5), List.of(6));
        Predicate<String> stringSize3 = s -> s.length() == 3;
        Predicate<List<Integer>> listSize3 = s -> s.size() == 3;
        assertEquals(pets.stream().filter(stringSize3).count(), 2L);
        assertEquals(nums.stream().filter(listSize3).count(), 1L);
    }
}

Slide 7

Slide 7 text

Hello world test

import org.junit.Test;
import java.util.List;
import java.util.function.Predicate;
import static org.junit.Assert.assertEquals;

public class HelloTest {
    @Test
    public void testSize() {
        var pets = List.of("bird", "cat", "dog");
        var nums = List.of(List.of(1, 2), List.of(3, 4, 5), List.of(6));
        Predicate<String> stringSize3 = s -> s.length() == 3;
        Predicate<List<Integer>> listSize3 = s -> s.size() == 3;
        assertEquals(pets.stream().filter(stringSize3).count(), 2L);
        assertEquals(nums.stream().filter(listSize3).count(), 1L);
    }
}

var pets = ["bird", "cat", "dog"]
var nums = [1..2, 3..5, 6..6]
var size3 = { it.size() == 3 }
assert pets.count(size3) == 2 && nums.count(size3) == 1

Slide 8

Slide 8 text

Operator Overloading jshell> import org.apache.commons.math3.linear.MatrixUtils jshell> double[][] d1 = { {10d, 0d}, {0d, 10d}} d1 ==> double[2][] { double[2] { 10.0, 0.0 }, double[2] { 0.0, 10.0 } } jshell> var m1 = MatrixUtils.createRealMatrix(d1) m1 ==> Array2DRowRealMatrix{{10.0,0.0},{0.0,10.0}} jshell> double[][] d2 = { {-1d, 1d}, {1d, -1d}} d2 ==> double[2][] { double[2] { -1.0, 1.0 }, double[2] { 1.0, -1.0 } } jshell> var m2 = MatrixUtils.createRealMatrix(d2) m2 ==> Array2DRowRealMatrix{{-1.0,1.0},{1.0,-1.0}} jshell> System.out.println(m1.multiply(m2.power(2))) Array2DRowRealMatrix{{20.0,-20.0},{-20.0,20.0}}

Slide 9

Slide 9 text

Operator Overloading (plus other features) jshell> import org.apache.commons.math3.linear.MatrixUtils jshell> double[][] d1 = { {10d, 0d}, {0d, 10d}} d1 ==> double[2][] { double[2] { 10.0, 0.0 }, double[2] { 0.0, 10.0 } } jshell> var m1 = MatrixUtils.createRealMatrix(d1) m1 ==> Array2DRowRealMatrix{{10.0,0.0},{0.0,10.0}} jshell> double[][] d2 = { {-1d, 1d}, {1d, -1d}} d2 ==> double[2][] { double[2] { -1.0, 1.0 }, double[2] { 1.0, -1.0 } } jshell> var m2 = MatrixUtils.createRealMatrix(d2) m2 ==> Array2DRowRealMatrix{{-1.0,1.0},{1.0,-1.0}} jshell> System.out.println(m1.multiply(m2.power(2))) Array2DRowRealMatrix{{20.0,-20.0},{-20.0,20.0}}

Slide 10

Slide 10 text

Primitive array extension methods

IntComparator maxAbs = (i, j) -> i.abs() <=> j.abs()
nums.max()
nums.max(maxAbs)

Comparator<Integer> maxAbs = Comparator.comparingInt(Math::abs)
nums.intStream().max().getAsInt()
nums.stream().max(maxAbs).get()

public class JavaStreamsMax {
    private static Comparator<Integer> comparator = Comparator.comparingInt(Math::abs);
    public static int max(int[] nums) {
        return Arrays.stream(nums).max().getAsInt();
    }
    public static int maxAbs(int[] nums) {
        return Arrays.stream(nums).boxed().max(comparator).get();
    }
}

int[] numbers = {10, 20, 15, 30, 5};

Slide 11

Slide 11 text

Better Primitive array extension methods

IntComparator maxAbs = (i, j) -> i.abs() <=> j.abs()
nums.max()
nums.max(maxAbs)

Comparator<Integer> maxAbs = Comparator.comparingInt(Math::abs)
nums.intStream().max().getAsInt()
nums.stream().max(maxAbs).get()

public class JavaStreamsMax {
    private static Comparator<Integer> comparator = Comparator.comparingInt(Math::abs);
    public static int max(int[] nums) {
        return Arrays.stream(nums).max().getAsInt();
    }
    public static int maxAbs(int[] nums) {
        return Arrays.stream(nums).boxed().max(comparator).get();
    }
}

int[] numbers = {10, 20, 15, 30, 5};

Slide 12

Slide 12 text

AST Transformations // imports not shown public class Book { private String $to$string; private int $hash$code; private final List authors; private final String title; private final Date publicationDate; private static final java.util.Comparator this$TitleComparator; private static final java.util.Comparator this$PublicationDateComparator; public Book(List authors, String title, Date publicationDate) { if (authors == null) { this.authors = null; } else { if (authors instanceof Cloneable) { List authorsCopy = (List) ((ArrayList>) authors).clone(); this.authors = (List) (authorsCopy instanceof SortedSet ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof SortedMap ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof Set ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof Map ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof List ? DefaultGroovyMethods.asImmutable(authorsCopy) : DefaultGroovyMethods.asImmutable(authorsCopy)); } else { this.authors = (List) (authors instanceof SortedSet ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof SortedMap ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof Set ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof Map ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof List ? 
DefaultGroovyMethods.asImmutable(authors) : DefaultGroovyMethods.asImmutable(authors)); } } this.title = title; if (publicationDate == null) { this.publicationDate = null; } else { this.publicationDate = (Date) publicationDate.clone(); } } public Book(Map args) { if ( args == null) { args = new HashMap(); } ImmutableASTTransformation.checkPropNames(this, args); if (args.containsKey("authors")) { if ( args.get("authors") == null) { this .authors = null; } else { if (args.get("authors") instanceof Cloneable) { List authorsCopy = (List) ((ArrayList>) args.get("authors")).clone(); this.authors = (List) (authorsCopy instanceof SortedSet ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof SortedMap ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof Set ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof Map ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof List ? DefaultGroovyMethods.asImmutable(authorsCopy) : DefaultGroovyMethods.asImmutable(authorsCopy)); } else { List authors = (List) args.get("authors"); this.authors = (List) (authors instanceof SortedSet ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof SortedMap ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof Set ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof Map ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof List ? 
DefaultGroovyMethods.asImmutable(authors) : DefaultGroovyMethods.asImmutable(authors)); } } } else { this .authors = null; } if (args.containsKey("title")) {this .title = (String) args.get("title"); } else { this .title = null;} if (args.containsKey("publicationDate")) { if (args.get("publicationDate") == null) { this.publicationDate = null; } else { this.publicationDate = (Date) ((Date) args.get("publicationDate")).clone(); } } else {this.publicationDate = null; } } … public Book() { this (new HashMap()); } public int compareTo(Book other) { if (this == other) { return 0; } Integer value = 0 value = this .title <=> other .title if ( value != 0) { return value } value = this .publicationDate <=> other .publicationDate if ( value != 0) { return value } return 0 } public static Comparator comparatorByTitle() { return this$TitleComparator; } public static Comparator comparatorByPublicationDate() { return this$PublicationDateComparator; } public String toString() { StringBuilder _result = new StringBuilder(); boolean $toStringFirst= true; _result.append("Book("); if ($toStringFirst) { $toStringFirst = false; } else { _result.append(", "); } _result.append(InvokerHelper.toString(this.getAuthors())); if ($toStringFirst) { $toStringFirst = false; } else { _result.append(", "); } _result.append(InvokerHelper.toString(this.getTitle())); if ($toStringFirst) { $toStringFirst = false; } else { _result.append(", "); } _result.append(InvokerHelper.toString(this.getPublicationDate())); _result.append(")"); if ($to$string == null) { $to$string = _result.toString(); } return $to$string; } public int hashCode() { if ( $hash$code == 0) { int _result = HashCodeHelper.initHash(); if (!(this.getAuthors().equals(this))) { _result = HashCodeHelper.updateHash(_result, this.getAuthors()); } if (!(this.getTitle().equals(this))) { _result = HashCodeHelper.updateHash(_result, this.getTitle()); } if (!(this.getPublicationDate().equals(this))) { _result = HashCodeHelper.updateHash(_result, 
this.getPublicationDate()); } $hash$code = (int) _result; } return $hash$code; } public boolean canEqual(Object other) { return other instanceof Book; } … public boolean equals(Object other) { if ( other == null) { return false; } if (this == other) { return true; } if (!( other instanceof Book)) { return false; } Book otherTyped = (Book) other; if (!(otherTyped.canEqual( this ))) { return false; } if (!(this.getAuthors() == otherTyped.getAuthors())) { return false; } if (!(this.getTitle().equals(otherTyped.getTitle()))) { return false; } if (!(this.getPublicationDate().equals(otherTyped.getPublicationDate()))) { return false; } return true; } public final Book copyWith(Map map) { if (map == null || map.size() == 0) { return this; } Boolean dirty = false; HashMap construct = new HashMap(); if (map.containsKey("authors")) { Object newValue = map.get("authors"); Object oldValue = this.getAuthors(); if (newValue != oldValue) { oldValue = newValue; dirty = true; } construct.put("authors", oldValue); } else { construct.put("authors", this.getAuthors()); } if (map.containsKey("title")) { Object newValue = map.get("title"); Object oldValue = this.getTitle(); if (newValue != oldValue) { oldValue = newValue; dirty = true; } construct.put("title", oldValue); } else { construct.put("title", this.getTitle()); } if (map.containsKey("publicationDate")) { Object newValue = map.get("publicationDate"); Object oldValue = this.getPublicationDate(); if (newValue != oldValue) { oldValue = newValue; dirty = true; } construct.put("publicationDate", oldValue); } else { construct.put("publicationDate", this.getPublicationDate()); } return dirty == true ? 
new Book(construct) : this; } public void writeExternal(ObjectOutput out) throws IOException { out.writeObject(authors); out.writeObject(title); out.writeObject(publicationDate); } public void readExternal(ObjectInput oin) throws IOException, ClassNotFoundException { authors = (List) oin.readObject(); title = (String) oin.readObject(); publicationDate = (Date) oin.readObject(); } … static { this$TitleComparator = new Book$TitleComparator(); this$PublicationDateComparator = new Book$PublicationDateComparator(); } public String getAuthors(int index) { return authors.get(index); } public List getAuthors() { return authors; } public final String getTitle() { return title; } public final Date getPublicationDate() { if (publicationDate == null) { return publicationDate; } else { return (Date) publicationDate.clone(); } } public int compare(java.lang.Object param0, java.lang.Object param1) { return -1; } private static class Book$TitleComparator extends AbstractComparator { public Book$TitleComparator() { } public int compare(Book arg0, Book arg1) { if (arg0 == arg1) { return 0; } if (arg0 != null && arg1 == null) { return -1; } if (arg0 == null && arg1 != null) { return 1; } return arg0.title <=> arg1.title; } public int compare(java.lang.Object param0, java.lang.Object param1) { return -1; } } private static class Book$PublicationDateComparator extends AbstractComparator { public Book$PublicationDateComparator() { } public int compare(Book arg0, Book arg1) { if ( arg0 == arg1 ) { return 0; } if ( arg0 != null && arg1 == null) { return -1; } if ( arg0 == null && arg1 != null) { return 1; } return arg0 .publicationDate <=> arg1 .publicationDate; } public int compare(java.lang.Object param0, java.lang.Object param1) { return -1; } } } @Immutable(copyWith = true) @Sortable(excludes = 'authors') @AutoExternalize class Book { @IndexedProperty List authors String title Date publicationDate } 10 lines of Groovy or 600 lines of Java

Slide 13

Slide 13 text

● Apache Groovy ● Classification Overview ● Iris Classification ● Iris Classification with Neural networks

Slide 14

Slide 14 text

Classification Overview Classification: • Predicting the class of some data Algorithms: • Logistic Regression, Naïve Bayes, Stochastic Gradient Descent, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine Aspects: • Over/underfitting • Ensemble methods • Confusion matrices Applications: • Image/speech recognition • Spam filtering • Medical diagnosis • Fraud detection • Customer behaviour prediction
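The "confusion matrices" aspect above tallies predictions against true labels; the Iris examples later in the deck print one. A minimal sketch of how such a matrix is built (hypothetical class and method names, not from the deck's code):

```java
import java.util.Arrays;

// Builds a confusion matrix: rows index the true class, columns the predicted class.
public class ConfusionDemo {
    static int[][] confusion(int[] actual, int[] predicted, int numClasses) {
        int[][] m = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            m[actual[i]][predicted[i]]++; // one tally per (truth, prediction) pair
        }
        return m;
    }

    public static void main(String[] args) {
        int[] actual    = {0, 0, 1, 1, 2, 2}; // true labels
        int[] predicted = {0, 0, 1, 2, 2, 2}; // model output
        System.out.println(Arrays.deepToString(confusion(actual, predicted, 3)));
        // → [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
    }
}
```

Diagonal entries are correct predictions; off-diagonal entries show which classes get confused with which.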

Slide 15

Slide 15 text

Classification Overview Classification: • Predicting the class of some data Algorithms: • Logistic Regression, Naïve Bayes, Stochastic Gradient Descent, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine Aspects: • Over/underfitting • Ensemble methods • Confusion matrices Applications: • Image/speech recognition • Spam filtering • Medical diagnosis • Fraud detection • Customer behaviour prediction

Slide 16

Slide 16 text

● Apache Groovy ● Classification Overview ● Iris Classification ● Iris Classification with Neural networks

Slide 17

Slide 17 text

Case Study: classification of Iris flowers British statistician & biologist Ronald Fisher used the dataset in his 1936 paper “The use of multiple measurements in taxonomic problems” as an example of linear discriminant analysis. 150 samples, 50 each of three species of Iris: • setosa • versicolor • virginica Four features measured for each sample: • sepal length • sepal width • petal length • petal width https://en.wikipedia.org/wiki/Iris_flower_data_set https://archive.ics.uci.edu/ml/datasets/Iris

Slide 18

Slide 18 text

Case Study: classification of Iris flowers British statistician & biologist Ronald Fisher used the dataset in his 1936 paper “The use of multiple measurements in taxonomic problems” as an example of linear discriminant analysis. 150 samples, 50 each of three species of Iris: • setosa • versicolor • virginica Four features measured for each sample: • sepal length • sepal width • petal length • petal width https://en.wikipedia.org/wiki/Iris_flower_data_set https://archive.ics.uci.edu/ml/datasets/Iris

Slide 19

Slide 19 text

Iris flower data – More information Blog: https://groovy.apache.org/blog/classifying-iris-flowers-with-deep Repo: https://github.com/paulk-asert/groovy-data-science

Slide 20

Slide 20 text

Iris flower data – Jupyter

Sepal length,Sepal width,Petal length,Petal width,Class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
...
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
...
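Each row above is four numeric features followed by the class label. A minimal Java sketch of parsing one such row (the `parseRow` helper and `Sample` record are hypothetical, not part of the deck's code):

```java
// Parses one CSV row of the Iris dataset into features + label.
public class IrisCsv {
    record Sample(double[] features, String label) {}

    static Sample parseRow(String row) {
        String[] parts = row.split(",");
        double[] features = new double[4]; // sepal l/w, petal l/w
        for (int i = 0; i < 4; i++) {
            features[i] = Double.parseDouble(parts[i]);
        }
        return new Sample(features, parts[4]); // last column is the class
    }

    public static void main(String[] args) {
        Sample s = parseRow("5.1,3.5,1.4,0.2,Iris-setosa");
        System.out.println(s.label() + " petal width " + s.features()[3]);
        // → Iris-setosa petal width 0.2
    }
}
```

The libraries used later (Weka's CSVLoader, Tablesaw) do this parsing for you, including the header row.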

Slide 21

Slide 21 text

Iris flower data – explore with Weka

Slide 22

Slide 22 text

Iris flower data – explore with Weka (cont’d)

Slide 23

Slide 23 text

Iris flower data – Weka Naïve Bayes

def file = getClass().classLoader.getResource('iris_data.csv').file as File
def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
def loader = new CSVLoader(file: file)
def model = new NaiveBayes()
def allInstances = loader.dataSet
allInstances.classIndex = 4
model.buildClassifier(allInstances)

double[] actual = allInstances*.value(4)
double[] predicted = allInstances.collect(model::classifyInstance)
double[] petalL = allInstances*.value(2)
double[] petalW = allInstances*.value(3)
def indices = actual.indices

Slide 24

Slide 24 text

Iris flower data – Weka Naïve Bayes

def file = getClass().classLoader.getResource('iris_data.csv').file as File
def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
def loader = new CSVLoader(file: file)
def model = new NaiveBayes()
def allInstances = loader.dataSet
allInstances.classIndex = 4
model.buildClassifier(allInstances)

double[] actual = allInstances*.value(4)
double[] predicted = allInstances.collect(model::classifyInstance)
double[] petalL = allInstances*.value(2)
double[] petalW = allInstances*.value(3)
def indices = actual.indices

def chart = new XYChartBuilder().width(900).height(450).
    title("Species").xAxisTitle("Petal length").yAxisTitle("Petal width").build()
species.eachWithIndex{ String name, int i ->
    def groups = indices.findAll{ predicted[it] == i }.groupBy{ actual[it] == i }
    Collection found = groups[true] ?: []
    Collection errors = groups[false] ?: []
    println "$name: ${found.size()} correct, ${errors.size()} incorrect"
    chart.addSeries("$name correct", petalL[found], petalW[found]).with {
        XYSeriesRenderStyle = Scatter
    }
    if (errors) {
        chart.addSeries("$name incorrect", petalL[errors], petalW[errors]).with {
            XYSeriesRenderStyle = Scatter
        }
    }
}
new SwingWrapper(chart).displayChart()

Slide 25

Slide 25 text

Iris flower data – Weka Naïve Bayes

def file = getClass().classLoader.getResource('iris_data.csv').file as File
def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
def loader = new CSVLoader(file: file)
def model = new NaiveBayes()
def allInstances = loader.dataSet
allInstances.classIndex = 4
model.buildClassifier(allInstances)

double[] actual = allInstances*.value(4)
double[] predicted = allInstances.collect(model::classifyInstance)
double[] petalL = allInstances*.value(2)
double[] petalW = allInstances*.value(3)
def indices = actual.indices

def chart = new XYChartBuilder().width(900).height(450).
    title("Species").xAxisTitle("Petal length").yAxisTitle("Petal width").build()
species.eachWithIndex{ String name, int i ->
    def groups = indices.findAll{ predicted[it] == i }.groupBy{ actual[it] == i }
    Collection found = groups[true] ?: []
    Collection errors = groups[false] ?: []
    println "$name: ${found.size()} correct, ${errors.size()} incorrect"
    chart.addSeries("$name correct", petalL[found], petalW[found]).with {
        XYSeriesRenderStyle = Scatter
    }
    if (errors) {
        chart.addSeries("$name incorrect", petalL[errors], petalW[errors]).with {
            XYSeriesRenderStyle = Scatter
        }
    }
}
new SwingWrapper(chart).displayChart()

Iris-setosa: 50 correct, 0 incorrect
Iris-versicolor: 48 correct, 4 incorrect
Iris-virginica: 46 correct, 2 incorrect

Slide 26

Slide 26 text

Iris flower data – Weka Logistic Regression

def file = getClass().classLoader.getResource('iris_data.csv').file as File
def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
def loader = new CSVLoader(file: file)
def model = new SimpleLogistic()
def allInstances = loader.dataSet
allInstances.classIndex = 4
model.buildClassifier(allInstances)

double[] actual = allInstances*.value(4)
double[] predicted = allInstances.collect(model::classifyInstance)
double[] petalL = allInstances*.value(2)
double[] petalW = allInstances*.value(3)
def indices = actual.indices

def chart = new XYChartBuilder().width(900).height(450).
    title("Species").xAxisTitle("Petal length").yAxisTitle("Petal width").build()
species.eachWithIndex{ String name, int i ->
    def groups = indices.findAll{ predicted[it] == i }.groupBy{ actual[it] == i }
    Collection found = groups[true] ?: []
    Collection errors = groups[false] ?: []
    println "$name: ${found.size()} correct, ${errors.size()} incorrect"
    chart.addSeries("$name correct", petalL[found], petalW[found]).with {
        XYSeriesRenderStyle = Scatter
    }
    if (errors) {
        chart.addSeries("$name incorrect", petalL[errors], petalW[errors]).with {
            XYSeriesRenderStyle = Scatter
        }
    }
}
new SwingWrapper(chart).displayChart()

Iris-setosa: 50 correct, 0 incorrect
Iris-versicolor: 49 correct, 1 incorrect
Iris-virginica: 49 correct, 1 incorrect

Slide 27

Slide 27 text

Iris flower data – Weka J48 Decision Tree

def file = getClass().classLoader.getResource('iris_data.csv').file as File
def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
def loader = new CSVLoader(file: file)
def model = new J48()
def allInstances = loader.dataSet
allInstances.classIndex = 4
model.buildClassifier(allInstances)

double[] actual = allInstances.collect{ it.value(4) }
double[] predicted = allInstances.collect{ model.classifyInstance(it) }
double[] petalL = allInstances.collect{ it.value(2) }
double[] petalW = allInstances.collect{ it.value(3) }
def indices = actual.indices

def chart = new XYChartBuilder().width(900).height(450).
    title("Species").xAxisTitle("Petal length").yAxisTitle("Petal width").build()
species.eachWithIndex{ String name, int i ->
    def groups = indices.findAll{ predicted[it] == i }.groupBy{ actual[it] == i }
    Collection found = groups[true] ?: []
    Collection errors = groups[false] ?: []
    println "$name: ${found.size()} correct, ${errors.size()} incorrect"
    chart.addSeries("$name correct", petalL[found], petalW[found]).with {
        XYSeriesRenderStyle = Scatter
    }
    if (errors) {
        chart.addSeries("$name incorrect", petalL[errors], petalW[errors]).with {
            XYSeriesRenderStyle = Scatter
        }
    }
}
new SwingWrapper(chart).displayChart()

Petal width <= 0.6: Iris-setosa (50.0)
Petal width > 0.6
|   Petal width <= 1.7
|   |   Petal length <= 4.9: Iris-versicolor (48.0/1.0)
|   |   Petal length > 4.9
|   |   |   Petal width <= 1.5: Iris-virginica (3.0)
|   |   |   Petal width > 1.5: Iris-versicolor (3.0/1.0)
|   Petal width > 1.7: Iris-virginica (46.0/1.0)

Number of Leaves : 5
Size of the tree : 9

Iris-setosa: 50 correct, 0 incorrect
Iris-versicolor: 49 correct, 2 incorrect
Iris-virginica: 48 correct, 1 incorrect

Slide 28

Slide 28 text

Iris flower data – Smile KNN def features = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width'] def species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'] def file = getClass().classLoader.getResource('iris_data.csv').file Table table = Table.read().csv(file) def helper = new TablesawUtil(file) … (0.. def (actual, predicted) = [classNames[idx], species[predictedClass]] actual == predicted ? predicted : "$predicted/$actual".toString() })) def title = 'Petal width vs length with predicted[/actual] class' helper.show(ScatterPlot.create(title, table, 'Petal width', 'Petal length', 'Result'), 'KNNClassification') // use cross validation to get accuracy CrossValidation.classification(10, featureCols, classes, (x, y) -> KNN.fit(x, y, 3)).with { printf 'Accuracy: %.2f%% +/- %.2f\n', 100 * avg.accuracy, 100 * sd.accuracy }
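The `CrossValidation.classification` call above reports accuracy as a mean plus or minus a standard deviation across the 10 folds. A minimal Java sketch (hypothetical helper names, using the sample standard deviation) of how those two statistics come out of per-fold accuracies:

```java
// Mean and sample standard deviation of per-fold accuracies,
// as typically reported for k-fold cross-validation results.
public class CvStats {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    static double sd(double[] xs) {
        double m = mean(xs), sumSq = 0;
        for (double x : xs) sumSq += (x - m) * (x - m);
        return Math.sqrt(sumSq / (xs.length - 1)); // n-1: sample standard deviation
    }

    public static void main(String[] args) {
        // hypothetical accuracies from 10 folds of 15 samples each
        double[] acc = {0.93, 1.0, 0.93, 1.0, 0.87, 1.0, 0.93, 1.0, 0.93, 1.0};
        System.out.printf("Accuracy: %.2f%% +/- %.2f%n", 100 * mean(acc), 100 * sd(acc));
    }
}
```

Reporting the spread alongside the mean shows how sensitive the model is to which 15 flowers land in the held-out fold.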

Slide 29

Slide 29 text

KNN Classification • Calculate distance to other data points • Find closest K points • Regression: Average values • Classification: Label with majority class
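The steps above translate almost directly into code: compute distances, take the K nearest training points, and vote. A minimal Java sketch (hypothetical names, Euclidean distance; not the Smile implementation):

```java
import java.util.*;

// Minimal K-nearest-neighbours classifier.
public class Knn {
    // Euclidean distance between two feature vectors
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // X = training features, y = training labels, q = query point, k = neighbours
    static int classify(double[][] X, int[] y, double[] q, int k) {
        Integer[] idx = new Integer[X.length];
        for (int i = 0; i < X.length; i++) idx[i] = i;
        // sort training indices by distance to the query point
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist(X[i], q)));
        // majority vote over the k nearest labels
        Map<Integer, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++) votes.merge(y[idx[i]], 1, Integer::sum);
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        double[][] X = {{1, 1}, {1, 2}, {5, 5}, {6, 5}};
        int[] y = {0, 0, 1, 1};
        System.out.println(classify(X, y, new double[]{1.5, 1.5}, 3)); // → 0
    }
}
```

For regression the vote would be replaced by averaging the k nearest target values, as the slide notes.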

Slide 30

Slide 30 text

KNN Classification • Calculate distance to other data points • Find closest K points • Regression: Average values • Classification: Label with majority class

Slide 31

Slide 31 text

KNN Classification • Calculate distance to other data points • Find closest K points • Regression: Average values • Classification: Label with majority class

Slide 32

Slide 32 text

Iris flower data – Smile KNN

Feature stats by species:

iris_data.csv summary
Class           | Mean [Sepal length] | Min [Sepal length] | Max [Sepal length] |
----------------------------------------------------------------------------------
Iris-setosa     | 5.006               | 4.3                | 5.8                |
Iris-versicolor | 5.936               | 4.9                | 7                  |
Iris-virginica  | 6.588               | 4.9                | 7.9                |

iris_data.csv summary
Class           | Mean [Sepal width]  | Min [Sepal width]  | Max [Sepal width]  |
----------------------------------------------------------------------------------
Iris-setosa     | 3.418               | 2.3                | 4.4                |
Iris-versicolor | 2.77                | 2                  | 3.4                |
Iris-virginica  | 2.974               | 2.2                | 3.8                |

iris_data.csv summary
Class           | Mean [Petal length] | Min [Petal length] | Max [Petal length] |
----------------------------------------------------------------------------------
Iris-setosa     | 1.464               | 1                  | 1.9                |
Iris-versicolor | 4.26                | 3                  | 5.1                |
Iris-virginica  | 5.552               | 4.5                | 6.9                |

iris_data.csv summary
Class           | Mean [Petal width]  | Min [Petal width]  | Max [Petal width]  |
----------------------------------------------------------------------------------
Iris-setosa     | 0.244               | 0.1                | 0.6                |
Iris-versicolor | 1.326               | 1                  | 1.8                |
Iris-virginica  | 2.026               | 1.4                | 2.5                |

Confusion matrix: ROW=truth and COL=predicted
class 0 | 50 |  0 |  0 |
class 1 |  0 | 47 |  3 |
class 2 |  0 |  3 | 47 |
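From the confusion matrix above (rows = truth, columns = predicted), overall accuracy is (50 + 47 + 47) / 150 = 96%, and per-class precision and recall follow from column and row sums. A minimal Java sketch (hypothetical helper names) of those calculations:

```java
// Accuracy, recall, and precision from a confusion matrix
// where rows index the true class and columns the predicted class.
public class IrisMetrics {
    static double accuracy(int[][] m) {
        int correct = 0, total = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j]; // diagonal = correct predictions
            }
        return (double) correct / total;
    }

    static double recall(int[][] m, int c) {     // fraction of true class c found
        int rowSum = 0;
        for (int v : m[c]) rowSum += v;
        return (double) m[c][c] / rowSum;
    }

    static double precision(int[][] m, int c) {  // fraction of predicted c that are correct
        int colSum = 0;
        for (int[] row : m) colSum += row[c];
        return (double) m[c][c] / colSum;
    }

    public static void main(String[] args) {
        int[][] m = {{50, 0, 0}, {0, 47, 3}, {0, 3, 47}}; // matrix from the slide
        System.out.printf("accuracy=%.2f recall(versicolor)=%.2f precision(versicolor)=%.2f%n",
                accuracy(m), recall(m, 1), precision(m, 1));
        // → accuracy=0.96 recall(versicolor)=0.94 precision(versicolor)=0.94
    }
}
```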

Slide 33

Slide 33 text

● Apache Groovy ● Classification Overview ● Iris Classification ● Iris Classification with Neural networks

Slide 34

Slide 34 text

Neural Networks

Slide 35

Slide 35 text

Neural Networks

Slide 36

Slide 36 text

Iris flower data – wekaDeeplearning4j

WekaPackageManager.loadPackages(true)
def file = getClass().classLoader.getResource('iris_data.csv').file as File
def loader = new CSVLoader(file: file)
def data = loader.dataSet
data.classIndex = 4
def options = Utils.splitOptions("-S 1 -numEpochs 10 -layer \"weka.dl4j.layers.OutputLayer -activation weka.dl4j.activations.ActivationSoftmax -lossFn weka.dl4j.lossfunctions.LossMCXENT\"")
AbstractClassifier myClassifier = Utils.forName(AbstractClassifier,
    "weka.classifiers.functions.Dl4jMlpClassifier", options)

// Stratify and split
Random rand = new Random(0)
Instances randData = new Instances(data)
randData.randomize(rand)
randData.stratify(3)
Instances train = randData.trainCV(3, 0)
Instances test = randData.testCV(3, 0)

// Build the classifier on the training data
myClassifier.buildClassifier(train)

// Evaluate the model on test data
Evaluation eval = new Evaluation(test)
eval.evaluateModel(myClassifier, test)
println eval.toSummaryString()
println eval.toMatrixString()

Slide 37

Slide 37 text

WekaPackageManager.loadPackages(true) def file = getClass().classLoader.getResource('iris_data.csv').file as File def loader = new CSVLoader(file: file) def data = loader.dataSet data.classIndex = 4 def options = Utils.splitOptions("-S 1 -numEpochs 10 -layer \"weka.dl4j.layers.OutputLayer -activation weka.dl4j.activations.ActivationSoftmax \ -lossFn weka.dl4j.lossfunctions.LossMCXENT\"") AbstractClassifier myClassifier = Utils.forName(AbstractClassifier, "weka.classifiers.functions.Dl4jMlpClassifier", options) // Stratify and split Random rand = new Random(0) Instances randData = new Instances(data) randData.randomize(rand) randData.stratify(3) Instances train = randData.trainCV(3, 0) Instances test = randData.testCV(3, 0) // Build the classifier on the training data myClassifier.buildClassifier(train) // Evaluate the model on test data Evaluation eval = new Evaluation(test) eval.evaluateModel(myClassifier, test) println eval.toSummaryString() println eval.toMatrixString() Iris flower data – wekaDeeplearning4j [main] INFO org.deeplearning4j.nn.graph.ComputationGraph - Starting ComputationGraph with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE] Training Dl4jMlpClassifier...: [] ETA: 00:00:00[INFO ] 00:03:31.035 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [1/10] took 00:00:00.670 Training Dl4jMlpClassifier...: [====== ] ETA: 00:00:06[INFO ] 00:03:31.152 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [2/10] took 00:00:00.113 Training Dl4jMlpClassifier...: [============ ] ETA: 00:00:03[INFO ] 00:03:31.244 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [3/10] took 00:00:00.090 Training Dl4jMlpClassifier...: [================== ] ETA: 00:00:02[INFO ] 00:03:31.325 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [4/10] took 00:00:00.079 Training Dl4jMlpClassifier...: [======================== ] ETA: 00:00:01[INFO ] 00:03:31.470 [main] weka.dl4j.listener.EpochListener - Epoch 
[5/10] Train Set: Loss: 0.510342 [INFO ] 00:03:31.470 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [5/10] took 00:00:00.144 Training Dl4jMlpClassifier...: [============================== ] ETA: 00:00:01[INFO ] 00:03:31.546 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [6/10] took 00:00:00.073 Training Dl4jMlpClassifier...: [==================================== ] ETA: 00:00:00[INFO ] 00:03:31.611 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [7/10] took 00:00:00.063 Training Dl4jMlpClassifier...: [========================================== ] ETA: 00:00:00[INFO ] 00:03:31.714 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [8/10] took 00:00:00.101 Training Dl4jMlpClassifier...: [================================================ ] ETA: 00:00:00[INFO ] 00:03:31.790 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [9/10] took 00:00:00.074 Training Dl4jMlpClassifier...: [====================================================== ] ETA: 00:00:00[INFO ] 00:03:31.882 [main] weka.dl4j.listener.EpochListener - Epoch [10/10] Train Set: Loss: 0.286469 …

Slide 38

Slide 38 text

WekaPackageManager.loadPackages(true) def file = getClass().classLoader.getResource('iris_data.csv').file as File def loader = new CSVLoader(file: file) def data = loader.dataSet data.classIndex = 4 def options = Utils.splitOptions("-S 1 -numEpochs 10 -layer \"weka.dl4j.layers.OutputLayer -activation weka.dl4j.activations.ActivationSoftmax \ -lossFn weka.dl4j.lossfunctions.LossMCXENT\"") AbstractClassifier myClassifier = Utils.forName(AbstractClassifier, "weka.classifiers.functions.Dl4jMlpClassifier", options) // Stratify and split Random rand = new Random(0) Instances randData = new Instances(data) randData.randomize(rand) randData.stratify(3) Instances train = randData.trainCV(3, 0) Instances test = randData.testCV(3, 0) // Build the classifier on the training data myClassifier.buildClassifier(train) // Evaluate the model on test data Evaluation eval = new Evaluation(test) eval.evaluateModel(myClassifier, test) println eval.toSummaryString() println eval.toMatrixString() Iris flower data – wekaDeeplearning4j … [INFO ] 00:03:31.883 [main] weka.classifiers.functions.Dl4jMlpClassifier - Epoch [10/10] took 00:00:00.091 Training Dl4jMlpClassifier...: [============================================================] ETA: 00:00:00 Done! Correctly Classified Instances 40 80 % Incorrectly Classified Instances 10 20 % Kappa statistic 0.701 Mean absolute error 0.2542 Root mean squared error 0.3188 Relative absolute error 57.2154 % Root relative squared error 67.6336 % Total Number of Instances 50 === Confusion Matrix === a b c <-- classified as 17 0 0 | a = Iris-setosa 0 9 8 | b = Iris-versicolor 0 2 14 | c = Iris-virginica BUILD SUCCESSFUL in 22s

Slide 39

Slide 39 text

Iris flower data – Deep Netts

String[] cols = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width']
String[] species = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
int numInputs = cols.size()
int numOutputs = species.size()

// Deep Netts readCsv assumes normalized data, so we roll our own
var dataSet = new TabularDataSet(numInputs, numOutputs).tap { columnNames = cols + species }
var data = getClass().classLoader.getResource('iris_data.csv').readLines()*.split(',')
data[1..-1].each {
    dataSet.add(new TabularDataSet.Item(it[0..3]*.toFloat() as float[], oneHotEncode(it[4], species)))
}
scaleMax(dataSet)
def (train, test) = dataSet.split(0.7, 0.3) // 70/30% split
…
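The script above calls a `oneHotEncode` helper to turn each species label into the three-element target vector the network trains against. A plausible implementation, sketched in Java (the helper's name and signature come from the slide's call site; this body is an assumption, not the repo's actual code):

```java
import java.util.Arrays;

public class OneHot {
    // Returns a vector of zeros with 1.0f at the index of the matching label,
    // e.g. "Iris-versicolor" over three species -> [0.0, 1.0, 0.0].
    static float[] oneHotEncode(String value, String[] labels) {
        float[] encoded = new float[labels.length];
        for (int i = 0; i < labels.length; i++) {
            if (labels[i].equals(value)) encoded[i] = 1.0f;
        }
        return encoded;
    }

    public static void main(String[] args) {
        String[] species = {"Iris-setosa", "Iris-versicolor", "Iris-virginica"};
        System.out.println(Arrays.toString(oneHotEncode("Iris-versicolor", species)));
        // prints [0.0, 1.0, 0.0]
    }
}
```

One column per class (rather than a single 0/1/2 code) is what lets the softmax output layer and cross-entropy loss on the next slide treat the three species symmetrically.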

Slide 40

Slide 40 text

Iris flower data – Deep Netts

…
var neuralNet = FeedForwardNetwork.builder()
    .addInputLayer(numInputs)
    .addFullyConnectedLayer(5, ActivationType.TANH)
    .addOutputLayer(numOutputs, ActivationType.SOFTMAX)
    .lossFunction(LossType.CROSS_ENTROPY)
    .randomSeed(456)
    .build()

neuralNet.trainer.with {
    maxError = 0.04f
    learningRate = 0.01f
    momentum = 0.9f
    optimizer = OptimizerType.MOMENTUM
}
neuralNet.train(train)

new ClassifierEvaluator().with {
    println "CLASSIFIER EVALUATION METRICS\n${evaluate(neuralNet, test)}"
    println "CONFUSION MATRIX\n$confusionMatrix"
}
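The output layer above uses softmax so the three raw output scores can be read as class probabilities that sum to 1, which is what the cross-entropy loss expects. An illustrative Java sketch of the softmax function itself (not Deep Netts' internal implementation):

```java
import java.util.Arrays;

public class Softmax {
    // softmax(z)_i = exp(z_i) / sum_j exp(z_j).
    // Subtracting the max first is the standard trick to avoid overflow in exp().
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) max = Math.max(max, v);
        double[] out = new double[z.length];
        double sum = 0;
        for (int i = 0; i < z.length; i++) {
            out[i] = Math.exp(z[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        // Three hypothetical raw scores, one per species:
        double[] p = softmax(new double[]{2.0, 1.0, 0.1});
        System.out.println(Arrays.toString(p));
        // the largest score gets the largest probability; the values sum to 1
    }
}
```

The predicted species is simply the index with the highest probability, which is how the confusion matrix on the next slide is tallied.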

Slide 41

Slide 41 text

Iris flower data – Deep Netts

…
var neuralNet = FeedForwardNetwork.builder()
    .addInputLayer(numInputs)
    .addFullyConnectedLayer(5, ActivationType.TANH)
    .addOutputLayer(numOutputs, ActivationType.SOFTMAX)
    .lossFunction(LossType.CROSS_ENTROPY)
    .randomSeed(456)
    .build()

neuralNet.trainer.with {
    maxError = 0.04f
    learningRate = 0.01f
    momentum = 0.9f
    optimizer = OptimizerType.MOMENTUM
}
neuralNet.train(train)

new ClassifierEvaluator().with {
    println "CLASSIFIER EVALUATION METRICS\n${evaluate(neuralNet, test)}"
    println "CONFUSION MATRIX\n$confusionMatrix"
}

$ time groovy -cp "build/lib/*" Iris.groovy
16:49:27.089 [main] INFO deepnetts.core.DeepNetts - ------------------------------------------------------------------
16:49:27.091 [main] INFO deepnetts.core.DeepNetts - TRAINING NEURAL NETWORK
16:49:27.091 [main] INFO deepnetts.core.DeepNetts - ------------------------------------------------------------------
16:49:27.100 [main] INFO deepnetts.core.DeepNetts - Epoch:1, Time:6ms, TrainError:0.8584314, TrainErrorChange:0.858431
16:49:27.103 [main] INFO deepnetts.core.DeepNetts - Epoch:2, Time:3ms, TrainError:0.52278274, TrainErrorChange:-0.3356
...
16:49:27.911 [main] INFO deepnetts.core.DeepNetts - Epoch:3031, Time:0ms, TrainError:0.029988592, TrainErrorChange:-0.
TRAINING COMPLETED
16:49:27.911 [main] INFO deepnetts.core.DeepNetts - Total Training Time: 820ms
16:49:27.911 [main] INFO deepnetts.core.DeepNetts - ------------------------------------------------------------------

CLASSIFIER EVALUATION METRICS
Accuracy: 0.95681506 (How often is classifier correct in total)
Precision: 0.974359 (How often is classifier correct when it gives positive prediction)
F1Score: 0.974359 (Harmonic average (balance) of precision and recall)
Recall: 0.974359 (When it is actually positive class, how often does it give positive prediction)

CONFUSION MATRIX
                  none  Iris-setosa  Iris-versicolor  Iris-virginica
none                 0            0                0               0
Iris-setosa          0           14                0               0
Iris-versicolor      0            0               18               1
Iris-virginica       0            0                0              12

real    0m3.160s
user    0m10.156s
sys     0m0.483s

Slide 42

Slide 42 text

Iris flower data – Deep Netts with GraalVM

groovyc -cp "build/lib/*" --compile-static iris.groovy
java -agentlib:native-image-agent=config-output-dir=conf/ -cp ".:build/lib/*" iris

Slide 43

Slide 43 text

Iris flower data – Deep Netts with GraalVM

native-image --report-unsupported-elements-at-runtime \
    --initialize-at-run-time=groovy.grape.GrapeIvy,deepnetts.net.weights.RandomWeights \
    --initialize-at-build-time \
    --no-fallback \
    -H:ConfigurationFileDirectories=conf/ \
    -cp ".:build/lib/*" \
    -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN \
    iris

Slide 44

Slide 44 text

Iris flower data – Deep Netts with GraalVM

$ time ./iris
CLASSIFIER EVALUATION METRICS
Accuracy: 0.93460923 (How often is classifier correct in total)
Precision: 0.96491224 (How often is classifier correct when it gives positive prediction)
F1Score: 0.96491224 (Harmonic average (balance) of precision and recall)
Recall: 0.96491224 (When it is actually positive class, how often does it give positive prediction)

CONFUSION MATRIX
                  none  Iris-setosa  Iris-versicolor  Iris-virginica
none                 0            0                0               0
Iris-setosa          0           21                0               0
Iris-versicolor      0            0               20               2
Iris-virginica       0            0                0              17

real    0m0.131s
user    0m0.096s
sys     0m0.029s

Slide 45

Slide 45 text

Questions? Twitter/X | Mastodon : Apache Groovy: Repo: Slides: @ApacheGroovy | @[email protected] https://groovy.apache.org/ https://groovy-lang.org/ https://github.com/paulk-asert/groovy-data-science https://speakerdeck.com/paulk/groovy-iris