
Whiskey Groovy Ignite


This talk looks at using Apache Groovy with Apache Ignite's distributed K-Means clustering algorithm to cluster Whiskey profiles. Ignite helps you scale your machine learning applications. Groovy simplifies your data science code.

paulking

May 31, 2023

Transcript

  1. June 6, 2023 Whiskey Clustering with Apache Groovy &

    Apache Ignite Paul King VP AT APACHE GROOVY, PRINCIPAL SOFTWARE ENGINEER AT UNITY FOUNDATION
  2. Dr Paul King Unity Foundation Groovy Lead V.P. Apache Groovy

    Author: https://www.manning.com/books/groovy-in-action-second-edition
    Slides: https://speakerdeck.com/paulk/whiskey-groovy-ignite (this talk)
            https://speakerdeck.com/paulk/groovy-data-science (larger talk)
    Examples repo: https://github.com/paulk-asert/groovy-data-science
    Twitter: @paulk_asert, Mastodon: @[email protected]
  3. 3 Whiskey Clustering with Apache Groovy & Apache Ignite •

    Apache Groovy • Apache Ignite • Data Science • Whiskey Clustering & Visualization • Scaling Whiskey Clustering
  4. Apache Groovy Programming Language • Multi-faceted extensible language • Imperative/OO

    & functional • Dynamic & static • Aligned closely with Java • 19+ years since inception • ~2.5B downloads (partial count) • ~500 contributors • 200+ releases • https://www.youtube.com/watch?v=eIGOG-F9ZTw&feature=youtu.be
  5. What is Groovy? It’s like a super version of Java:

    • Supports most Java syntax but allows simpler syntax for many constructs • Supports all Java libraries but provides many extensions and its own productivity libraries • Has both a static and dynamic nature • Extensible language and tooling Java Groovy
  6. Why use Groovy in 2023? It’s still like a super

    version of Java: • Simpler scripting • Metaprogramming: runtime, compile-time, extension methods, AST transforms • Language features: power assert, powerful switch, traits, closures • Static and dynamic nature • Productivity libraries for common tasks • Extensibility: language, tooling, type checker Java Groovy Let’s look at just two features that reduce boilerplate code
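
    One of the features listed above, the power assert, takes only a couple of lines to show. A minimal sketch (the names and values are illustrative, not from the talk's repo): when an assertion fails, Groovy prints the value of every sub-expression, so the statement documents itself.

    def names = ['Ted', 'Fred', 'Jed', 'Ned']
    assert names.findAll { it.size() < 4 }.size() == 3   // passes; on failure Groovy would
                                                          // print each sub-expression's value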
  7. Simpler scripting: Java7+

    import java.util.List;
    import java.util.ArrayList;

    class Main {
        private List keepShorterThan(List strings, int length) {
            List result = new ArrayList();
            for (int i = 0; i < strings.size(); i++) {
                String s = (String) strings.get(i);
                if (s.length() < length) {
                    result.add(s);
                }
            }
            return result;
        }

        public static void main(String[] args) {
            List names = new ArrayList();
            names.add("Ted");
            names.add("Fred");
            names.add("Jed");
            names.add("Ned");
            System.out.println(names);
            Main m = new Main();
            List shortNames = m.keepShorterThan(names, 4);
            System.out.println(shortNames.size());
            for (int i = 0; i < shortNames.size(); i++) {
                String s = (String) shortNames.get(i);
                System.out.println(s);
            }
        }
    }
  8. Simpler scripting: Java21+ (with JEP 445 & preview)

    (Java 7+ version from the previous slide shown alongside for comparison.)

    import java.util.List;

    void main() {
        var names = List.of("Ted", "Fred", "Jed", "Ned");
        System.out.println(names);
        var shortNames = names.stream().filter(n -> n.length() < 4).toList();
        System.out.println(shortNames.size());
        shortNames.forEach(System.out::println);
    }
  9. Simpler scripting: JDK5+/Groovy 1+

    (Java 7+ and Java 21+ versions from the previous slides shown alongside for comparison.)

    names = ["Ted", "Fred", "Jed", "Ned"]
    println names
    shortNames = names.findAll{ it.size() < 4 }
    println shortNames.size()
    shortNames.each{ println it }
  10. Simpler scripting: DSL/command chain support

    (Java and Groovy versions from the previous slides shown alongside for comparison.)

    given the names "Ted", "Fred", "Jed" and "Ned"
    display all the names
    display the number of names having size less than 4
    display the names having size less than 4
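
    As a minimal sketch of how such command chains work (the identifiers below are illustrative, not the DSL behind the slide), an unparenthesised chain like `show names longerThan 3` parses as `show(names).longerThan(3)`:

    class Shower {
        List list
        void longerThan(int n) { println list.findAll { it.size() > n } }
    }

    def show(List list) { new Shower(list: list) }   // script method returning the chain's receiver

    def names = ['Ted', 'Fred', 'Jed', 'Ned']
    show names longerThan 3                          // prints [Fred]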
  11. Scripting for Data Science • Same example • Same library

    import org.apache.commons.math3.linear.*;

    public class MatrixMain {
        public static void main(String[] args) {
            double[][] matrixData = { {1d,2d,3d}, {2d,5d,3d} };
            RealMatrix m = MatrixUtils.createRealMatrix(matrixData);
            double[][] matrixData2 = { {1d,2d}, {2d,5d}, {1d,7d} };
            RealMatrix n = new Array2DRowRealMatrix(matrixData2);
            RealMatrix o = m.multiply(n);
            // Invert o, using LU decomposition
            RealMatrix oInverse = new LUDecomposition(o).getSolver().getInverse();
            RealMatrix p = oInverse.scalarAdd(1d).scalarMultiply(2d);
            RealMatrix q = o.add(p.power(2));
            System.out.println(q);
        }
    }

    Output:
    Array2DRowRealMatrix{{15.1379501385,40.488531856},{21.4354570637,59.5951246537}}

    Thanks to operator overloading and extensible tooling
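
    The Groovy rewrite of the same example is not reproduced on this slide, but a minimal sketch looks roughly like the following (same Commons Math library; Groovy's * and ** operators dispatch to RealMatrix.multiply() and power(), while scalarAdd/scalarMultiply and add are called as ordinary methods):

    import org.apache.commons.math3.linear.*

    def m = new Array2DRowRealMatrix([[1d, 2d, 3d], [2d, 5d, 3d]] as double[][])
    def n = new Array2DRowRealMatrix([[1d, 2d], [2d, 5d], [1d, 7d]] as double[][])
    def o = m * n                                       // * calls RealMatrix.multiply()
    def oInverse = new LUDecomposition(o).solver.inverse
    def p = oInverse.scalarAdd(1d).scalarMultiply(2d)
    def q = o.add(p**2)                                 // ** calls RealMatrix.power()
    println q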
  12. Metaprogramming: AST Transforms

    • Writing a JavaBean Person class, Java 7-15

    public final class Person {
        private final String first;
        private final String last;

        public Person(String first, String last) {
            this.first = first;
            this.last = last;
        }

        public String getFirst() { return first; }
        public String getLast() { return last; }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((first == null) ? 0 : first.hashCode());
            result = prime * result + ((last == null) ? 0 : last.hashCode());
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null) return false;
            if (getClass() != obj.getClass()) return false;
            Person other = (Person) obj;
            if (first == null) {
                if (other.first != null) return false;
            } else if (!first.equals(other.first)) return false;
            if (last == null) {
                if (other.last != null) return false;
            } else if (!last.equals(other.last)) return false;
            return true;
        }

        @Override
        public String toString() {
            return "Person(first:" + first + ", last:" + last + ")";
        }
    }
  13. Metaprogramming: AST Transforms

    (Java Person class from the previous slide shown alongside for comparison.)

    • Groovy equivalent (JDK 7-15)

    @Immutable
    class Person {
        String first, last
    }
  14. Metaprogramming: AST Transforms

    (Java Person class from the previous slides shown alongside for comparison.)

    • Java Record (JDK16+) / Groovy Record (JDK 8+)

    @Immutable
    class Person {
        String first, last
    }

    record Person(String first, String last) { }
  15. Groovy Records: differences to Java

                                                     Java Record    Groovy Emulated Record   Groovy Native Record
    JDK version                                      16+            8+                       16+
    Serialization                                    Record spec    Traditional              Record spec
    Recognized by                                    Java, Groovy   Groovy                   Java, Groovy
    Standard features (accessors, tuple
      constructor, toString/equals/hashCode)         ✓              ✓                        ✓
    Optional enhancements (toMap, toList, size,
      getAt, components, copyWith,
      named-arg constructor)                         –              ✓                        ✓
    Customisable via coding                          ✓              ✓                        ✓
    Customisable via AST transforms (declarative)    –              ✓                        ✓
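
    As a minimal sketch of the optional enhancements listed above (assumes Groovy 4+; copyWith is opt-in via @RecordOptions):

    import groovy.transform.RecordOptions

    @RecordOptions(copyWith = true)
    record Person(String first, String last) { }

    def p = new Person('Grace', 'Hopper')
    assert p.toList() == ['Grace', 'Hopper']                    // toList/toMap come with Groovy records
    assert p.copyWith(first: 'Admiral').first() == 'Admiral'    // copyWith takes named arguments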
  16. Metaprogramming // imports not shown public class Book { private

    String $to$string; private int $hash$code; private final List<String> authors; private final String title; private final Date publicationDate; private static final java.util.Comparator this$TitleComparator; private static final java.util.Comparator this$PublicationDateComparator; public Book(List<String> authors, String title, Date publicationDate) { if (authors == null) { this.authors = null; } else { if (authors instanceof Cloneable) { List<String> authorsCopy = (List<String>) ((ArrayList<?>) authors).clone(); this.authors = (List<String>) (authorsCopy instanceof SortedSet ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof SortedMap ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof Set ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof Map ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof List ? DefaultGroovyMethods.asImmutable(authorsCopy) : DefaultGroovyMethods.asImmutable(authorsCopy)); } else { this.authors = (List<String>) (authors instanceof SortedSet ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof SortedMap ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof Set ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof Map ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof List ? DefaultGroovyMethods.asImmutable(authors) : DefaultGroovyMethods.asImmutable(authors)); } } this.title= title; if (publicationDate== null) { this.publicationDate= null; } else { this.publicationDate= (Date) publicationDate.clone(); } } public Book(Map args) { if ( args == null) { args = new HashMap(); } ImmutableASTTransformation.checkPropNames(this, args); if (args.containsKey("authors")) { if ( args.get("authors") == null) { this .authors = null; } else { if (args.get("authors") instanceof Cloneable) { List<String> authorsCopy = (List<String>) ((ArrayList<?>) args.get("authors")).clone(); this.authors = (List<String>) (authorsCopy instanceof SortedSet ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof SortedMap ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof Set ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof Map ? DefaultGroovyMethods.asImmutable(authorsCopy) : authorsCopy instanceof List ? DefaultGroovyMethods.asImmutable(authorsCopy) : DefaultGroovyMethods.asImmutable(authorsCopy)); } else { List<String> authors = (List<String>) args.get("authors"); this.authors = (List<String>) (authors instanceof SortedSet ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof SortedMap ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof Set ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof Map ? DefaultGroovyMethods.asImmutable(authors) : authors instanceof List ? 
DefaultGroovyMethods.asImmutable(authors) : DefaultGroovyMethods.asImmutable(authors)); } } } else { this .authors = null; } if (args.containsKey("title")) {this .title = (String) args.get("title"); } else { this .title = null;} if (args.containsKey("publicationDate")) { if (args.get("publicationDate") == null) { this.publicationDate = null; } else { this.publicationDate = (Date) ((Date) args.get("publicationDate")).clone(); } } else {this.publicationDate = null; } } … public Book() { this (new HashMap()); } public int compareTo(Book other) { if (this == other) { return 0; } Integer value = 0 value = this .title <=> other .title if ( value != 0) { return value } value = this .publicationDate <=> other .publicationDate if ( value != 0) { return value } return 0 } public static Comparator comparatorByTitle() { return this$TitleComparator; } public static Comparator comparatorByPublicationDate() { return this$PublicationDateComparator; } public String toString() { StringBuilder _result = new StringBuilder(); boolean $toStringFirst= true; _result.append("Book("); if ($toStringFirst) { $toStringFirst = false; } else { _result.append(", "); } _result.append(InvokerHelper.toString(this.getAuthors())); if ($toStringFirst) { $toStringFirst = false; } else { _result.append(", "); } _result.append(InvokerHelper.toString(this.getTitle())); if ($toStringFirst) { $toStringFirst = false; } else { _result.append(", "); } _result.append(InvokerHelper.toString(this.getPublicationDate())); _result.append(")"); if ($to$string == null) { $to$string = _result.toString(); } return $to$string; } public int hashCode() { if ( $hash$code == 0) { int _result = HashCodeHelper.initHash(); if (!(this.getAuthors().equals(this))) { _result = HashCodeHelper.updateHash(_result, this.getAuthors()); } if (!(this.getTitle().equals(this))) { _result = HashCodeHelper.updateHash(_result, this.getTitle()); } if (!(this.getPublicationDate().equals(this))) { _result = HashCodeHelper.updateHash(_result, this.getPublicationDate()); } $hash$code = (int) _result; } return $hash$code; } public boolean canEqual(Object other) { return other instanceof Book; } … public boolean equals(Object other) { if ( other == null) { return false; } if (this == other) { return true; } if (!( other instanceof Book)) { return false; } Book otherTyped = (Book) other; if (!(otherTyped.canEqual( this ))) { return false; } if (!(this.getAuthors() == otherTyped.getAuthors())) { return false; } if (!(this.getTitle().equals(otherTyped.getTitle()))) { return false; } if (!(this.getPublicationDate().equals(otherTyped.getPublicationDate()))) { return false; } return true; } public final Book copyWith(Map map) { if (map == null || map.size() == 0) { return this; } Boolean dirty = false; HashMap construct = new HashMap(); if (map.containsKey("authors")) { Object newValue = map.get("authors"); Object oldValue = this.getAuthors(); if (newValue != oldValue) { oldValue= newValue; dirty = true; } construct.put("authors", oldValue); } else { construct.put("authors", this.getAuthors()); } if (map.containsKey("title")) { Object newValue = map.get("title"); Object oldValue = this.getTitle(); if (newValue != oldValue) { oldValue= newValue; dirty = true; } construct.put("title", oldValue); } else { construct.put("title", this.getTitle()); } if (map.containsKey("publicationDate")) { Object newValue = map.get("publicationDate"); Object oldValue = this.getPublicationDate(); if (newValue != oldValue) { oldValue= newValue; dirty = true; } construct.put("publicationDate", oldValue); 
} else { construct.put("publicationDate", this.getPublicationDate()); } return dirty == true ? new Book(construct) : this; } public void writeExternal(ObjectOutputout) throws IOException { out.writeObject(authors); out.writeObject(title); out.writeObject(publicationDate); } public void readExternal(ObjectInputoin) throws IOException, ClassNotFoundException{ authors = (List) oin.readObject(); title = (String) oin.readObject(); publicationDate= (Date) oin.readObject(); } … static { this$TitleComparator = new Book$TitleComparator(); this$PublicationDateComparator = new Book$PublicationDateComparator(); } public String getAuthors(int index) { return authors.get(index); } public List<String> getAuthors() { return authors; } public final String getTitle() { return title; } public final Date getPublicationDate() { if (publicationDate== null) { return publicationDate; } else { return (Date) publicationDate.clone(); } } public int compare(java.lang.Objectparam0, java.lang.Objectparam1) { return -1; } private static class Book$TitleComparator extends AbstractComparator<Book> { public Book$TitleComparator() { } public int compare(Book arg0, Book arg1) { if (arg0 == arg1) { return 0; } if (arg0 != null && arg1 == null) { return -1; } if (arg0 == null && arg1 != null) { return 1; } return arg0.title <=> arg1.title; } public int compare(java.lang.Objectparam0, java.lang.Objectparam1) { return -1; } } private static class Book$PublicationDateComparator extends AbstractComparator<Book> { public Book$PublicationDateComparator() { } public int compare(Book arg0, Book arg1) { if ( arg0 == arg1 ) { return 0; } if ( arg0 != null && arg1 == null) { return -1; } if ( arg0 == null && arg1 != null) { return 1; } return arg0 .publicationDate <=> arg1 .publicationDate; } public int compare(java.lang.Objectparam0, java.lang.Objectparam1) { return -1; } } } @Immutable(copyWith = true) @Sortable(excludes = 'authors') @AutoExternalize class Book { @IndexedProperty List<String> authors String title Date publicationDate }
  17. AST Transformations: Groovy 2.4, Groovy 2.5, Groovy 3.0, Groovy 4.0

    @NonSealed @RecordBase @Sealed @PlatformLog @GQ @Final @RecordType @POJO @Pure @Contracted @Ensures @Invariant @Requires @ClassInvariant @ContractElement @Postcondition @Precondition (Improved in 2.5)
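
    As a minimal sketch of one of the newer transforms listed above, @PlatformLog (assumes Groovy 4+) injects a JDK System.Logger named log and rewrites level-named calls on it:

    import groovy.util.logging.PlatformLog

    @PlatformLog
    class Greeter {
        void greet(String name) {
            log.info "Hello, $name"      // log is the injected java.lang.System.Logger
        }
    }

    new Greeter().greet('Groovy')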
  18. Whiskey Clustering with Apache Groovy & Apache Ignite

    • Apache Groovy • Apache Ignite • Data Science • Whiskey Clustering & Visualization • Scaling Whiskey Clustering
  19. Scaling up machine learning: Apache Ignite Apache Ignite is a

    distributed database for high-performance computing with in-memory speed. In simple terms, it makes a cluster (or grid) of nodes appear like an in-memory cache.

    Ignite can be used as:
    • an in-memory cache with special features like SQL querying and transactional properties
    • an in-memory data-grid with advanced read-through & write-through capabilities on top of one or more distributed databases
    • an ultra-fast and horizontally scalable in-memory database
    • a high-performance computing engine for custom or built-in tasks including machine learning

    It is mostly this last capability that we will use. Ignite's Machine Learning API has purpose-built, cluster-aware machine learning and deep learning algorithms for Classification, Regression, Clustering, and Recommendation, among others. We'll mostly use the distributed K-means Clustering algorithm from its library.
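
    Before the machine learning examples later in the deck, here is a minimal sketch of the first bullet, Ignite as a plain in-memory cache driven from Groovy (assumes the ignite-core dependency; the cache name and values are illustrative):

    import org.apache.ignite.Ignition

    Ignition.start().withCloseable { ignite ->
        def cache = ignite.getOrCreateCache('whiskeyNotes')   // key/value cache spread across the grid
        cache.put('Talisker', 'smoky, medicinal')
        assert cache.get('Talisker') == 'smoky, medicinal'
    }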
  20. Whiskey Clustering with Apache Groovy & Apache Ignite

    • Apache Groovy • Apache Ignite • Data Science • Whiskey Clustering & Visualization • Scaling Whiskey Clustering
  21. Data Science Process

    Research Goals → Obtain Data → Data Preparation → Data Exploration → Visualization → Data Modeling

    Supporting concerns: data ingestion, data storage, data processing platforms, modeling algorithms,
    math libraries, graphics processing, integration, deployment
  22. Data science algorithms Data Mining Statistics Machine Learning Optimization •

    Analytics: descriptive, predictive, prescriptive • Analysis: anomaly detection, classification, regression, clustering, association, optimization, dimension reduction • Data relationship: linear, non-linear • Assumptions: parametric, non-parametric • Strategy: supervised, unsupervised, reinforcement • Combining: ensemble, boosting
  23. Whiskey Clustering with Apache Groovy & Apache Ignite

    • Apache Groovy • Apache Ignite • Data Science • Whiskey Clustering & Visualization • Scaling Whiskey Clustering
  24. Clustering Overview Clustering: • Grouping similar items Algorithm families: •

    Hierarchical • Partitioning k-means, x-means • Density-based • Graph-based Aspects: • Disjoint vs overlapping • Preset cluster number • Dimensionality reduction PCA • Nominal feature support Applications: • Market segmentation • Recommendation engines • Search result grouping • Social network analysis • Medical imaging
  25. Clustering Overview Clustering: • Grouping similar items Algorithm families: •

    Hierarchical • Partitioning k-means, x-means • Density-based • Graph-based Aspects: • Disjoint vs overlapping • Preset cluster number • Dimensionality reduction PCA • Nominal feature support Applications: • Market segmentation • Recommendation engines • Search result grouping • Social network analysis • Medical imaging
  26. Clustering with KMeans Step 1: • Guess k cluster centroids

    Step 2: • Assign points to closest centroid
  27. Clustering with KMeans Step 1: • Guess k cluster centroids

    Step 2: • Assign points to closest centroid
  28. Clustering with KMeans Step 1: • Guess k cluster centroids

    Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points
  29. Clustering with KMeans Step 1: • Guess k cluster centroids

    Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points
  30. Clustering with KMeans Step 1: • Guess k cluster centroids

    Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points
  31. Clustering with KMeans Step 1: • Guess k cluster centroids

    Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points
  32. Clustering with KMeans Step 1: • Guess k cluster centroids

    Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points Repeat steps 2 and 3 until stable or some limit reached
  33. Clustering with KMeans Step 1: • Guess k cluster centroids

    Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points Repeat steps 2 and 3 until stable or some limit reached
  34. Clustering with KMeans Step 1: • Guess k cluster centroids

    Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points Repeat steps 2 and 3 until stable or some limit reached
  35. Clustering with KMeans Step 1: • Guess k cluster centroids

    Step 2: • Assign points to closest centroid Step 3: • Calculate new centroids based on selected points Repeat steps 2 and 3 until stable or some limit reached
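
    The three steps above are the whole algorithm. A minimal, self-contained Groovy sketch of that loop (random 2-D points, Euclidean distance, fixed iteration count) is shown below; the following slides delegate the same work to Weka, Commons Math, Smile and Ignite rather than hand-rolling it.

    def k = 3
    def rnd = new Random(42)
    def points = (1..60).collect { [rnd.nextDouble() * 4, rnd.nextDouble() * 4] }
    def dist = { a, b -> Math.sqrt((0..<a.size()).sum { (a[it] - b[it]) ** 2 }) }

    def centroids = points.take(k)                       // step 1: guess k centroids
    def assignments = [:]
    10.times {                                           // repeat steps 2 and 3 a fixed number of times
        assignments = points.groupBy { p ->              // step 2: assign each point to its closest centroid
            centroids.indexOf(centroids.min { c -> dist(p, c) })
        }
        centroids = (0..<k).collect { c ->               // step 3: recompute each centroid from its points
            def members = assignments[c] ?: [centroids[c]]
            (0..<2).collect { d -> members*.getAt(d).sum() / members.size() }
        }
    }
    assignments.each { c, members -> println "Cluster ${c + 1}: ${members.size()} points" }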
  36. Clustering case study: Whiskey flavor profiles • 86 scotch whiskies

    • 12 flavor categories Pictures: https://prasant.net/clustering-scotch-whisky-grouping-distilleries-by-k-means-clustering-81f2ecde069c https://www.r-bloggers.com/where-the-whisky-flavor-profile-data-came-from/ https://www.centerspace.net/clustering-analysis-part-iv-non-negative-matrix-factorization/
  37. Clustering case study: Whiskey flavor profiles

    RowID,Distillery,Body,Sweetness,Smoky,Medicinal,Tobacco,Honey,Spicy,Winey,Nutty,Malty,Fruity,Floral
    …
    34,GlenElgin,2,3,1,0,0,2,1,1,1,1,2,3
    35,GlenGarioch,2,1,3,0,0,0,3,1,0,2,2,2
    36,GlenGrant,1,2,0,0,0,1,0,1,2,1,2,1
    37,GlenKeith,2,3,1,0,0,1,2,1,2,1,2,1
    38,GlenMoray,1,2,1,0,0,1,2,1,2,2,2,4
    39,GlenOrd,3,2,1,0,0,1,2,1,1,2,2,2
    40,GlenScotia,2,2,2,2,0,1,0,1,2,2,1,1
    41,GlenSpey,1,3,1,0,0,0,1,1,1,2,0,2
    42,Glenallachie,1,3,1,0,0,1,1,0,1,2,2,2
    …
  38. Whiskey – clustering with radar plot and Weka

    import …
    def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey",
                "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"]
    def numClusters = 5
    def loader = new CSVLoader(file: 'whiskey.csv')
    def clusterer = new SimpleKMeans(numClusters: numClusters, preserveInstancesOrder: true)
    def instances = loader.dataSet
    instances.deleteAttributeAt(0) // remove RowID
    clusterer.buildClusterer(instances)

    println ' ' + cols.join(', ')
    def dataset = new DefaultCategoryDataset()
    clusterer.clusterCentroids.eachWithIndex{ Instance ctrd, num ->
        print "Cluster ${num+1}: "
        println((1..cols.size()).collect{ sprintf '%.3f', ctrd.value(it) }.join(', '))
        (1..cols.size()).each { idx ->
            dataset.addValue(ctrd.value(idx), "Cluster ${num+1}", cols[idx-1])
        }
    }

    def clusters = (0..<numClusters).collectEntries{ [it, []] }
    clusterer.assignments.eachWithIndex { cnum, idx ->
        clusters[cnum] << instances.get(idx).stringValue(0)
    }
    clusters.each { k, v ->
        println "Cluster ${k+1}:"
        println v.join(', ')
    }

    def plot = new SpiderWebPlot(dataset: dataset)
    def chart = new JFreeChart('Whiskey clusters', plot)
    SwingUtil.show(new ChartPanel(chart))

    Output:
    Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity, Floral
    Cluster 1: 3.800, 1.600, 3.600, 3.600, 0.600, 0.200, 1.600, 0.600, 1.000, 1.400, 1.200, 0.000
    Cluster 2: 2.773, 2.409, 1.545, 0.045, 0.000, 1.818, 1.591, 2.000, 2.091, 2.136, 2.136, 1.591
    Cluster 3: 1.773, 2.455, 1.318, 0.636, 0.000, 0.636, 1.000, 0.409, 1.636, 1.364, 1.591, 1.591
    Cluster 4: 1.500, 2.233, 1.267, 0.267, 0.000, 1.533, 1.400, 0.700, 1.000, 1.900, 1.900, 2.133
    Cluster 5: 2.000, 2.143, 1.857, 0.857, 1.000, 0.857, 1.714, 1.000, 1.286, 2.000, 1.429, 1.714
    Cluster 1: Ardbeg, Clynelish, Lagavulin, Laphroig, Talisker
    Cluster 2: Aberfeldy, Aberlour, Ardmore, Auchroisk, Balmenach, BenNevis, Benrinnes, Benromach, BlairAthol, Dailuaine, Dalmore, Edradour, Glendronach, Glendullan, Glenfarclas, Glenrothes, Glenturret, Longmorn, Macallan, Mortlach, RoyalLochnagar, Strathisla
    Cluster 3: ArranIsleOf, Aultmore, Balblair, Cardhu, Craigganmore, Dufftown, GlenGrant, GlenKeith, GlenScotia, GlenSpey, Glenfiddich, Glenmorangie, Isle of Jura, Mannochmore, Miltonduff, Oban, Speyside, Springbank, Strathmill, Tamnavulin, Teaninich, Tomore
    Cluster 4: AnCnoc, Auchentoshan, Belvenie, Benriach, Bladnoch, Bowmore, Bruichladdich, Bunnahabhain, Dalwhinnie, Deanston, GlenElgin, GlenGarioch, GlenMoray, GlenOrd, Glenallachie, Glengoyne, Glenkinchie, Glenlivet, Glenlossie, Highland Park, Inchgower, Knochando, Linkwood, Loch Lomond, Scapa, Speyburn, Tamdhu, Tobermory, Tomatin, Tomintoul
    Cluster 5: Caol Ila, Craigallechie, GlenDeveronMacduff, OldFettercairn, OldPulteney, RoyalBrackla, Tullibardine
  39. Whiskey – clustering with radar plot and medoids

    Libraries: Apache Commons Math and JFreeChart

    import …
    def rows = CSV.withFirstRecordAsHeader().parse(new FileReader('whiskey.csv'))
    def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey",
                "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"]
    def clusterer = new KMeansPlusPlusClusterer(5)
    def data = rows.collect{ row -> new DoublePoint(cols.collect{ col -> row[col] } as int[]) }
    def centroids = clusterer.cluster(data)

    println cols.join(', ') + ', Medoid'
    def dataset = new DefaultCategoryDataset()
    centroids.eachWithIndex{ ctrd, num ->
        def cpt = ctrd.center.point
        def closest = ctrd.points.min{ pt ->
            sumSq((0..<cpt.size()).collect{ cpt[it] - pt.point[it] } as double[])
        }
        def medoid = rows.find{ row -> cols.collect{ row[it] as double } == closest.point }?.Distillery
        println cpt.collect{ sprintf '%.3f', it }.join(', ') + ", $medoid"
        cpt.eachWithIndex { val, idx ->
            dataset.addValue(val, "Cluster ${num+1}", cols[idx])
        }
    }

    def plot = new SpiderWebPlot(dataset: dataset)
    def chart = new JFreeChart('Whiskey clusters', plot)
    SwingUtil.show(new ChartPanel(chart))

    Output:
    Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity, Floral, Medoid
    2.000, 2.533, 1.267, 0.267, 0.200, 1.067, 1.667, 0.933, 0.267, 1.733, 1.800, 1.733, GlenOrd
    2.789, 2.474, 1.474, 0.053, 0.000, 1.895, 1.632, 2.211, 2.105, 2.105, 2.211, 1.737, Aberfeldy
    2.909, 1.545, 2.909, 2.727, 0.455, 0.455, 1.455, 0.545, 1.545, 1.455, 1.182, 0.545, Clynelish
    1.333, 2.333, 0.944, 0.111, 0.000, 1.000, 0.444, 0.444, 1.500, 1.944, 1.778, 1.778, Aultmore
    1.696, 2.304, 1.565, 0.435, 0.087, 1.391, 1.696, 0.609, 1.652, 1.652, 1.783, 2.130, Benromach
  40. Whiskey – Screeplot

    import …
    def table = Table.read().csv('whiskey.csv')
    def cols = ["Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey",
                "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"]
    def data = table.as().doubleMatrix(*cols)

    def pca = new PCA(data)
    pca.projection = 2
    def plots = [PlotCanvas.screeplot(pca)]
    def projected = pca.project(data)

    table = table.addColumns(
        *(1..2).collect { idx ->
            DoubleColumn.create("PCA$idx", (0..<data.size()).collect { projected[it][idx - 1] })
        }
    )

    def colors = [RED, BLUE, GREEN, ORANGE, MAGENTA, GRAY]
    def symbols = ['*', 'Q', '#', 'Q', '*', '#']
    (2..6).each { k ->
        def clusterer = new KMeans(data, k)
        double[][] components = table.as().doubleMatrix('PCA1', 'PCA2')
        plots << ScatterPlot.plot(components, clusterer.clusterLabel,
                symbols[0..<k] as char[], colors[0..<k] as Color[])
    }
    SwingUtil.show(size: [1200, 900], new PlotPanel(*plots))
  41. Whiskey – clustering and visualizing centroids

    …
    def data = table.as().doubleMatrix(*cols)
    def pca = new PCA(data)
    pca.projection = 3
    def projected = pca.project(data)

    def clusterer = new KMeans(data, 5)
    def labels = clusterer.clusterLabel.collect { "Cluster " + (it + 1) }

    table = table.addColumns(
        *(0..<3).collect { idx ->
            DoubleColumn.create("PCA${idx+1}", (0..<data.size()).collect{ projected[it][idx] })},
        StringColumn.create("Cluster", labels),
        DoubleColumn.create("Centroid", [10] * labels.size())
    )

    def centroids = pca.project(clusterer.centroids())
    def toAdd = table.emptyCopy(1)
    (0..<centroids.size()).each { idx ->
        toAdd[0].setString("Cluster", "Cluster " + (idx+1))
        (1..3).each { toAdd[0].setDouble("PCA" + it, centroids[idx][it-1]) }
        toAdd[0].setDouble("Centroid", 50)
        table.append(toAdd)
    }

    def title = "Clusters x Principal Components w/ centroids"
    Plot.show(Scatter3DPlot.create(title, table,
            *(1..3).collect { "PCA$it" }, "Centroid", "Cluster"))
  42. Whiskey – Hierarchical clustering with Dendrogram

    …
    def dendrogram = new Dendrogram(clusters.tree, clusters.height, FOREST_GREEN).canvas().tap {
        title = 'Whiskey Dendrogram'
        setAxisLabels('Distilleries', 'Similarity')
        def lb = lowerBounds
        setBound([lb[0] - 1, lb[1] - 20] as double[], upperBounds)
        distilleries.eachWithIndex { String label, int i ->
            add(new Label(label, [i, -1] as double[], 0, 0, ninetyDeg, font, colorMap[partitions[i]]))
        }
    }.panel()

    def pca = PCA.fit(data)
    pca.projection = 2
    def projected = pca.project(data)
    char mark = '#'
    def scatter = ScatterPlot.of(projected, partitions, mark).canvas().tap {
        title = 'Clustered by dendrogram partitions'
        setAxisLabels('PCA1', 'PCA2')
    }.panel()

    new PlotGrid(dendrogram, scatter).window()
  43. Whiskey Clustering with Apache Groovy & Apache Ignite

    • Apache Groovy • Apache Ignite • Data Science • Whiskey Clustering & Visualization • Scaling Whiskey Clustering
  44. Clustering case study: Whiskey flavor profiles • 86 scotch whiskies

    • 12 flavor categories
    • Apache Ignite has special capabilities for reading data into the cache
    • In a cluster environment, use IgniteDataStreamer or IgniteCache.loadCache() to load data from files, stream sources, database sources, etc. (a small sketch follows this slide)
    • For our little example, we have a small CSV file and a single node, so we'll just read our data using Apache Commons CSV
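
    A minimal sketch of the IgniteDataStreamer alternative mentioned above (the cache name and stand-in data are illustrative; on a real multi-node cluster the streamer batches and routes the puts for you):

    import org.apache.ignite.Ignition

    Ignition.start().withCloseable { ignite ->
        def cache = ignite.getOrCreateCache('whiskeyFeatures')
        def data = [[2, 3, 1], [1, 2, 0]] as double[][]          // stand-in for the CSV rows
        ignite.dataStreamer(cache.name).withCloseable { streamer ->
            data.indices.each { i -> streamer.addData(i, data[i]) }
        }                                                        // closing the streamer flushes the batch
        println "Loaded ${cache.size()} rows"
    }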
  45. Clustering case study: Whiskey flavor profiles • 86 scotch whiskies

    • 12 flavor categories • Let’s select the regions of interest
  46. Clustering case study: Whiskey flavor profiles • Read CSV rows

    • Slice out segments of interest

    (Figure: how the row/column indexes map onto the distilleries, data and features slices.)

    var file = getClass().classLoader.getResource('whiskey.csv').file as File
    var rows = file.withReader { r -> RFC4180.parse(r).records*.toList() }
    var data = rows[1..-1].collect{ it[2..-1]*.toDouble() } as double[][]
    var distilleries = rows[1..-1]*.get(1)
    var features = rows[0][2..-1]
  47. Clustering case study: Whiskey flavor profiles • Set up configuration

    & define some helper variables

    // configure to all run on local machine but could be a cluster (can be hidden in XML)
    var cfg = new IgniteConfiguration(
        peerClassLoadingEnabled: true,
        discoverySpi: new TcpDiscoverySpi(
            ipFinder: new TcpDiscoveryMulticastIpFinder(
                addresses: ['127.0.0.1:47500..47509']
            )
        )
    )

    var pretty = this.&sprintf.curry('%.4f')
    var dist = new EuclideanDistance() // or ManhattanDistance
    var vectorizer = new DoubleArrayVectorizer()
  48. Whiskey flavors – scaling clustering

    Ignition.start(cfg).withCloseable { ignite ->
        println ">>> Ignite grid started for data: ${data.size()} rows X ${data[0].size()} cols"

        var dataCache = ignite.createCache(new CacheConfiguration<Integer, double[]>(
            name: "TEST_${UUID.randomUUID()}",
            affinity: new RendezvousAffinityFunction(false, 10)))
        data.indices.each { int i -> dataCache.put(i, data[i]) }

        var trainer = new KMeansTrainer().withDistance(dist).withAmountOfClusters(5)
        var mdl = trainer.fit(ignite, dataCache, vectorizer)

        println ">>> KMeans centroids:\n${features.join(', ')}"
        var centroids = mdl.centers*.all()
        var cols = centroids.collect{ it*.get() }
        cols.each { c -> println c.collect(pretty).join(', ') }

        dataCache.destroy()
    }
  49. Whiskey flavors – scaling clustering

    (Same code as the previous slide, now shown with its output.)

    [11:48:48] [Ignite startup banner]
    [11:48:48] ver. 2.15.0#20230425-sha1:f98f7f35
    [11:48:48] 2023 Copyright(C) Apache Software Foundation
    …
    >>> Ignite grid started for data: 86 rows X 12 cols
    >>> KMeans centroids:
    Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity, Floral
    1.5000, 2.5000, 1.0000, 0.1818, 0.0455, 0.7727, 0.8182, 0.3636, 1.6818, 1.5909, 2.0455, 1.8182
    2.4400, 2.3600, 1.4400, 0.0800, 0.0400, 1.8000, 1.6800, 1.6000, 1.9200, 2.2400, 2.0800, 1.7200
    2.9091, 1.5455, 2.9091, 2.7273, 0.4545, 0.4545, 1.4545, 0.5455, 1.5455, 1.4545, 1.1818, 0.5455
    1.6000, 2.3200, 1.4800, 0.4400, 0.1200, 1.3600, 1.6000, 0.7600, 0.6800, 1.7600, 1.5600, 2.1600
    4.0000, 2.6667, 1.6667, 0.0000, 0.0000, 2.0000, 1.0000, 3.6667, 2.3333, 1.3333, 2.0000, 1.0000
  50. Whiskey flavors – scaling clustering

    …
    var clusters = [:].withDefault{ [] }
    dataCache.query(new ScanQuery()).withCloseable { observations ->
        observations.each { observation ->
            def (k, v) = observation.with{ [getKey(), getValue()] }
            int prediction = mdl.predict(vectorizer.extractFeatures(k, v))
            clusters[prediction] += distilleries[k]
        }
    }
    clusters.sort{ e -> e.key }.each{ k, v ->
        println "Cluster ${k+1}: ${v.join(', ')}"
    }
    …

    …
    Cluster 1: AnCnoc, Auchentoshan, Aultmore, BenNevis, Benriach, Bunnahabhain, Cardhu, Craigallechie, Dalwhinnie, Edradour, GlenElgin, GlenGrant, GlenMoray, GlenOrd, Glengoyne, Glenlossie, Glenmorangie, Knochando, Longmorn, Mannochmore, Scapa, Speyside, Strathmill, Tamdhu, Tobermory
    Cluster 2: Aberlour, Belvenie, Benrinnes, Deanston, Glendullan, Glenlivet, Strathisla
    Cluster 3: ArranIsleOf, Balblair, Bladnoch, Craigganmore, Dufftown, GlenDeveronMacduff, GlenGarioch, GlenSpey, Glenallachie, Glenfiddich, Glenkinchie, Inchgower, Linkwood, Loch Lomond, Miltonduff, RoyalBrackla, Speyburn, Tamnavulin, Teaninich, Tullibardine
    Cluster 4: Aberfeldy, Ardmore, Auchroisk, Balmenach, Benromach, BlairAthol, Bowmore, Bruichladdich, Dailuaine, Dalmore, GlenKeith, GlenScotia, Glendronach, Glenfarclas, Glenrothes, Glenturret, Highland Park, Macallan, Mortlach, OldFettercairn, RoyalLochnagar, Springbank, Tomatin, Tomintoul, Tomore
    Cluster 5: Ardbeg, Caol Ila, Clynelish, Isle of Jura, Lagavulin, Laphroig, Oban, OldPulteney, Talisker
    …
  51. Whiskey flavors – scaling clustering

    (Same code and output as the previous slide.)
  52. Scaling clustering: K-means k=3 Euclidean

    var dist = new EuclideanDistance()
    …
    Ignition.start(cfg).withCloseable { ignite ->
        println ">>> Ignite grid started for data: ${data.size()} rows X ${data[0].size()} cols"

        var dataCache = ignite.createCache(new CacheConfiguration<Integer, double[]>(
            name: "TEST_${UUID.randomUUID()}",
            affinity: new RendezvousAffinityFunction(false, 10)))
        data.indices.each { int i -> dataCache.put(i, data[i]) }

        var trainer = new KMeansTrainer().withDistance(dist).withAmountOfClusters(3)
        var mdl = trainer.fit(ignite, dataCache, vectorizer)

        println ">>> KMeans centroids:\n${features.join(', ')}"
        var centroids = mdl.centers*.all()
        var cols = centroids.collect{ it*.get() }
        cols.each { c -> println c.collect(pretty).join(', ') }

        dataCache.destroy()
    }
  53. Scaling clustering: K-means k=3 Euclidean

    (Same code as the previous slide, now shown with its output.)

    …
    Cluster 1: Ardbeg, Clynelish, Lagavulin, Laphroig, Talisker, Caol Ila
    Distinguishing features: Body=3..4, Sweetness=1..2, Smoky=3..4, Medicinal=2..4, Honey=0..1, Winey=0..2, Nutty=1..2, Malty=1..2, Fruity=0..2, Floral=0..1
    Cluster 2: Ardmore, ArranIsleOf, Balblair, Balmenach, BlairAthol, Bowmore, Bruichladdich, Dailuaine, Dalmore, GlenDeveronMacduff, GlenGarioch, GlenScotia, GlenSpey, Glendronach, Glenrothes, Highland Park, Isle of Jura, Loch Lomond, Mortlach, Oban, OldFettercairn, OldPulteney, Springbank, Teaninich, Tomatin, Tomore
    Distinguishing features: Sweetness=1..3, Smoky=1..3, Medicinal=0..2, Honey=0..2, Floral=0..2
    Cluster 3: Aberfeldy, Aberlour, AnCnoc, Auchentoshan, Auchroisk, Aultmore, Belvenie, BenNevis, Benriach, Benrinnes, Benromach, Bladnoch, Bunnahabhain, Cardhu, Craigallechie, Craigganmore, Dalwhinnie, Deanston, Dufftown, Edradour, GlenElgin, GlenGrant, GlenKeith, GlenMoray, GlenOrd, Glenallachie, Glendullan, Glenfarclas, Glenfiddich, Glengoyne, Glenkinchie, Glenlivet, Glenlossie, Glenmorangie, Glenturret, Inchgower, Knochando, Linkwood, Longmorn, Macallan, Mannochmore, Miltonduff, RoyalBrackla, RoyalLochnagar, Scapa, Speyburn, Speyside, Strathisla, Strathmill, Tamdhu, Tamnavulin, Tobermory, Tomintoul, Tullibardine
    Distinguishing features: Smoky=0..2, Medicinal=0..1, Malty=1..3, Fruity=1..3
    …
  54. Scaling clustering: K-means k=3 Manhattan

    (Figure: Manhattan distance illustration — 3 + 4 = 7.)

    var dist = new ManhattanDistance()
    …
    Ignition.start(cfg).withCloseable { ignite ->
        println ">>> Ignite grid started for data: ${data.size()} rows X ${data[0].size()} cols"

        var dataCache = ignite.createCache(new CacheConfiguration<Integer, double[]>(
            name: "TEST_${UUID.randomUUID()}",
            affinity: new RendezvousAffinityFunction(false, 10)))
        data.indices.each { int i -> dataCache.put(i, data[i]) }

        var trainer = new KMeansTrainer().withDistance(dist).withAmountOfClusters(3)
        var mdl = trainer.fit(ignite, dataCache, vectorizer)

        println ">>> KMeans centroids:\n${features.join(', ')}"
        var centroids = mdl.centers*.all()
        var cols = centroids.collect{ it*.get() }
        cols.each { c -> println c.collect(pretty).join(', ') }

        dataCache.destroy()
    }
  55. Scaling clustering: K-means k=3 Manhattan

    (Same code as the previous slide, now shown with its output.)

    …
    Cluster 1: Aberfeldy, Aberlour, AnCnoc, Ardmore, Auchroisk, Balmenach, Belvenie, BenNevis, Benrinnes, Benromach, BlairAthol, Bowmore, Bruichladdich, Craigallechie, Dailuaine, Dalmore, Deanston, Edradour, Glendronach, Glendullan, Glenfarclas, Glenlivet, Glenturret, Knochando, Macallan, Mortlach, OldFettercairn, RoyalLochnagar, Scapa, Strathisla, Tomatin, Tomintoul
    Distinguishing features: Smoky=1..3, Medicinal=0..2
    Cluster 2: Ardbeg, Caol Ila, Clynelish, GlenScotia, Highland Park, Isle of Jura, Lagavulin, Laphroig, Oban, OldPulteney, Springbank, Talisker
    Distinguishing features: Body=2..4, Sweetness=1..2, Smoky=2..4, Honey=0..2, Winey=0..2, Nutty=1..2, Malty=1..2, Fruity=0..2, Floral=0..2
    Cluster 3: ArranIsleOf, Auchentoshan, Aultmore, Balblair, Benriach, Bladnoch, Bunnahabhain, Cardhu, Craigganmore, Dalwhinnie, Dufftown, GlenDeveronMacduff, GlenElgin, GlenGarioch, GlenGrant, GlenKeith, GlenMoray, GlenOrd, GlenSpey, Glenallachie, Glenfiddich, Glengoyne, Glenkinchie, Glenlossie, Glenmorangie, Glenrothes, Inchgower, Linkwood, Loch Lomond, Longmorn, Mannochmore, Miltonduff, RoyalBrackla, Speyburn, Speyside, Strathmill, Tamdhu, Tamnavulin, Teaninich, Tobermory, Tomore, Tullibardine
    Distinguishing features: Medicinal=0..1, Honey=0..2, Winey=0..2
    …
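
    The Euclidean vs Manhattan comparison in the last few slides comes down to how the distance between two flavor vectors is computed. A small plain-Groovy illustration (the scores below are made up, not taken from the dataset):

    def euclidean = { a, b -> Math.sqrt((0..<a.size()).sum { (a[it] - b[it]) ** 2 }) }
    def manhattan = { a, b -> (0..<a.size()).sum { Math.abs(a[it] - b[it]) } }

    def talisker = [4d, 2d, 3d]   // e.g. Body, Sweetness, Smoky
    def oban     = [2d, 2d, 2d]
    printf 'Euclidean: %.3f, Manhattan: %.3f%n',
            euclidean(talisker, oban), manhattan(talisker, oban)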
  56. Scaling clustering: Gaussian max clusters 5

    Ignition.start(cfg).withCloseable { ignite ->
        println ">>> Ignite grid started for data: ${data.size()} rows X ${data[0].size()} cols"

        var dataCache = ignite.createCache(new CacheConfiguration<Integer, double[]>(
            name: "TEST_${UUID.randomUUID()}",
            affinity: new RendezvousAffinityFunction(false, 10)))
        data.indices.each { int i -> dataCache.put(i, data[i]) }

        var trainer = new GmmTrainer().withMaxCountOfClusters(5)
        var mdl = trainer.fit(ignite, dataCache, vectorizer)
        …
        dataCache.destroy()
    }

    Image source: wikipedia
  57. Scaling clustering: Gaussian max clusters 5

    (Same code as the previous slide, now shown with its output.)

    …
    Cluster 1: Aberfeldy, Aberlour, AnCnoc, Ardmore, ArranIsleOf, Auchentoshan, Auchroisk, Aultmore, Balmenach, Belvenie, BenNevis, Benriach, Benrinnes, Benromach, Bladnoch, BlairAthol, Bunnahabhain, Cardhu, Craigallechie, Craigganmore, Dailuaine, Dalwhinnie, Deanston, Dufftown, Edradour, GlenDeveronMacduff, GlenElgin, GlenGrant, GlenKeith, GlenMoray, GlenOrd, GlenSpey, Glenallachie, Glendronach, Glendullan, Glenfarclas, Glenfiddich, Glengoyne, Glenkinchie, Glenlivet, Glenlossie, Glenrothes, Glenturret, Inchgower, Knochando, Linkwood, Loch Lomond, Longmorn, Macallan, Mannochmore, Miltonduff, Mortlach, OldFettercairn, RoyalLochnagar, Speyburn, Speyside, Strathisla, Tamdhu, Tamnavulin, Tobermory, Tomatin, Tomintoul, Tomore, Tullibardine
    Distinguishing features: Smoky=0..2, Medicinal=0..1
    Cluster 2: Ardbeg, Balblair, Bowmore, Bruichladdich, Caol Ila, Clynelish, Dalmore, GlenGarioch, GlenScotia, Glenmorangie, Highland Park, Isle of Jura, Lagavulin, Laphroig, Oban, OldPulteney, RoyalBrackla, Scapa, Springbank, Strathmill, Talisker, Teaninich
    Distinguishing features: Sweetness=1..3, Honey=0..2, Winey=0..2, Nutty=0..2, Malty=0..2, Floral=0..2
    …

    Image source: wikipedia
  58. THANK YOU

    Twitter: @paulk_asert
    Mastodon: @[email protected]
    Apache Groovy: https://groovy.apache.org/ https://groovy-lang.org/
    Apache Ignite: https://ignite.apache.org/
    Repo: https://github.com/paulk-asert/groovy-data-science

    © 2023 Unity Foundation. All rights reserved.