wordle game

Building a wordle-like game with string metrics, phonetics, and semantic
similarity using machine learning and deep neural networks Dr Paul King, VP Apache Groovy & Distinguished Engineer, Object Computing Apache Groovy: Repo: Related blog Slides: Twitter/X | Mastodon | Bluesky: https://groovy.apache.org/ https://groovy-lang.org/ https://github.com/paulk-asert/groovy-string-similarity https://groovy.apache.org/blog/groovy-text-similarity https://speakerdeck.com/paulk/wordle-game @ApacheGroovy | @[email protected] | @groovy.apache.org

Credits

History of text similarity algorithms Simple String Metrics: • Edit
Distance Algorithms: Levenshtein, Jaro-Winkler distance; useful for identifying typographical errors or minor variations • Character/Token-based Methods: Jaccard similarity and N-gram similarity; shared characters or sequences of characters/words Phonetic Algorithms: • Soundex, Metaphone, and Double Metaphone; aim to find similar-sounding words even with different spellings Statistical and Lexical Methods (Pre-Deep Learning): • TF-IDF (Term Frequency-Inverse Document Frequency): Assigns weights to words based on their frequency, allowing for vector representation and cosine similarity calculation between documents • Lexical Databases: Use resources like WordNet to measure semantic similarity through relationships (synonyms, hypernyms) between words Deep Learning Engines (Word Embeddings and Contextualized Embeddings): • Static Word Embeddings: Word2Vec and GloVe represent words as dense vectors in a continuous space, where semantically similar words are located close to each other, allowing the capture of semantic relationships beyond simple string matching • Contextualized Embeddings: ELMo and BERT use word embeddings that vary based on the word's context within a sentence, significantly improving the capture of nuanced meaning Large Language Models (LLMs): • Transformer Architecture: BERT, GPT, and their successors; capture long-range dependencies and complex semantic relationships with unprecedented accuracy • Generative Capabilities and Semantic Understanding: LLMs can generate human-like text, answer complex questions, and perform zero-shot and few-shot learning, demonstrating a deep understanding of context and meaning that far surpasses previous methods

Agenda 1. String metrics • Libraries, common algorithms 2. Phonetic
algorithms • Libraries, common algorithms 3. Deep learning • DeepLearning4J, DeepJavaLibrary 4. LLMs • Ollama A tiny bit of theory A look at available libraries A look at how easy to code on the JVM (in Groovy)

The Game Possible letters: a b c d e f
g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 1): aftershock LongestCommonSubsequence 0 Levenshtein Distance: 10, Insert: 0, Delete: 3, Substitute: 7 Jaccard 0% JaroWinkler PREFIX 0% / SUFFIX 0% Phonetic Metaphone=AFTRXK 47% / Soundex=A136 0% Meaning AnglE 45% / Use 21% / ConceptNet 2% / GloVe -4% / fastText 19%

What is Apache Groovy? It’s like a super version of
Java: • Simpler scripting: more powerful yet more concise • Extension methods: 2000+ enhancements to Java classes for a great out-of-the box experience (batteries included) • Flexible typing: from dynamic duck-typing (terse code) to extensible stronger-than-Java static typing (better checking) • Improved OO & functional features: from traits (more powerful and flexible OO designs) to tail recursion and memorizing/partial application of pure functions • AST transforms: 10s of lines instead of 100/1000s of lines • Java features earlier: recent features on older JDK

Agenda 1. String metrics • Libraries o info.debatty:java-string-similarity:2.0.0 o org.apache.commons:commons-text:1.14.0
• Algorithms o Levenshtein o Jaccard o Hamming o LongestCommonSubsequence o JaroWinkler 2. Phonetic algorithms 3. Deep learning 4. LLMs JDK Debatty Groovy Commons Text

String metrics: Levenshtein • The minimum number of "edits" (inserts,
deletes, or substitutions) required to convert from one word to another K I T T E N A G R O U N D S I T T I N G G R O U N D S (2 subs + 1 add) (1 del + 1 add) D = 3 D = 2 • Damerau–Levenshtein distance is a variant that allows transposition of two adjacent letters to count as a single edit

String metrics: Levenshtein • The minimum number of "edits" (inserts,
deletes, or substitutions) required to convert from one word to another • Sim = (N – Dist) / N K I T T E N A G R O U N D S I T T I N G G R O U N D S (2 subs + 1 add) (1 del + 1 add) D = 3 D = 2 S = (7 – 3) / 7 = 0.57 S = (7 – 2) / 7 = 0.71

String metrics: Levenshtein • Damerau–Levenshtein is used by the Groovy
compiler 'foo'.toUpper() No signature of method: java.lang.String.toUpper() is applicable for argument types: () values: [] Possible solutions: toUpperCase(), toURI(), toURL(), toURI(), toURL(), toSet() 'foo'.touppercase() No signature of method: java.lang.String.touppercase() is applicable for argument types: () values: [] Possible solutions: toUpperCase(), toUpperCase(java.util.Locale), toLowerCase(), toLowerCase(java.util.Locale)

String metrics: Jaccard • Divide the number of common elements
(the intersection) by the total number of unique elements in both sets (the union) K I T T E N A G R O U N D S I T T I N G G R O U N D S (inter=ITN, union=ITNKSG) (inter=GROUND, union=GROUNDSA) S = 3/6 = 0.5 S = 6 / 8 = 0.75

String metrics: Jaccard • Divide the number of common elements
(the intersection) by the total number of unique elements in both sets (the union) K I T T E N A G R O U N D S I T T I N G G R O U N D S (inter=ITN, union=ITNKSG) (inter=GROUND, union=GROUNDSA) S = 3/6 = 0.5 S = 6 / 8 = 0.75 • Can also be k-length letter sequences • Can also be words in a sentence, sentences in a document

String metrics: LongestCommonSubsequence • length of longest common subsequence •
A subsequence is obtained by deleting zero or more elements without changing the order of the remaining elements K I T T E N A G R O U N D S I T T I N G G R O U N D S D = 4 D = 6

String metrics: Jaro-Winkler • Edit distance measure • Gives a
“prefix boost” to letters at the start of a word • SHIPPING and SHIPMENT more likely to be related than SHIPMENT and BASEMENT

String metrics: our own WordleLike metric • Each correct letter
(Green) scores 2 point • Each letter in the wrong spot (Yellow) scores 1 point • Incorrect letters (Gray) scores 0 points • For words of differing size, slide smallest word along largest and select smallest distance (max similarity) with padding on either end scoring 0

String metrics: similarity algorithms var simAlgs = [ NormalizedLevenshtein: new
NormalizedLevenshtein()::similarity, 'Jaccard (debatty k=1)': new Jaccard(1)::similarity, 'Jaccard (debatty k=2)': new Jaccard(2)::similarity, 'Jaccard (debatty k=3)': new Jaccard()::similarity, 'Jaccard (commons text k=1)': new JaccardSimilarity()::apply, 'JaroWinkler (debatty)': new JaroWinkler()::similarity, 'JaroWinkler (commons text)': new JaroWinklerSimilarity()::apply, RatcliffObershelp: new RatcliffObershelp()::similarity, SorensenDice: new SorensenDice()::similarity, Cosine: new Cosine()::similarity, WordleLike: this::wordleSimilaritySliding ]

String metrics: similarity algorithms var pairs = [ ['cat', 'hat'],
['cat', 'kitten'], ['cat', 'dog'], ['bear', 'bare'], ['bear', 'bean'], ['pair', 'pear'], ['there', 'their'], ['sort', 'sought'], ['cow', 'bull'], ['cow', 'cowbell'], ['winners', 'grinners'], ['knows', 'nose'], ['ground', 'aground'], ['grounds', 'aground'], ['peeler', 'repeal'], ['hippo', 'hippopotamus'], ['superstar', 'supersonic'], ['partnership', 'leadership'], ['elton john', 'john elton'], ['elton john', 'nhoj notle'], ['my name is Yoda', 'Yoda my name is'], ['the cat sat on the mat', 'the fox jumped over the dog'], ['poodles are cute', 'dachshunds are delightful'] ]

String metrics: similarity algorithms results pairs.each { one, two ->
var results = simAlgs.collectValues { alg -> alg(one, two) } // display results ... } there VS their JaroWinkler (commons text) 0.91 ██████████████████ JaroWinkler (debatty) 0.91 ██████████████████ Jaccard (debatty k=1) 0.80 ████████████████ Jaccard (commons text k=1) 0.80 ████████████████ RatcliffObershelp 0.80 ████████████████ WordleLike 0.70 ██████████████ NormalizedLevenshtein 0.60 ████████████ Cosine 0.33 ██████ Jaccard (debatty k=2) 0.33 ██████ SorensenDice 0.33 ██████ Jaccard (debatty k=3) 0.20 ████

String metrics: distance algorithms var distAlgs = [ NormalizedLevenshtein: new
NormalizedLevenshtein()::distance, 'WeightedLevenshtein (t is near r)': new WeightedLevenshtein({ char c1, char c2 -> c1 == 't' && c2 == 'r' ? 0.5 : 1.0 })::distance, Damerau: new Damerau()::distance, OptimalStringAlignment: new OptimalStringAlignment()::distance, MetricLCS: new MetricLCS()::distance, 'NGram(2)': new NGram(2)::distance, 'NGram(4)': new NGram(4)::distance, QGram: new QGram(2)::distance, CosineDistance: new CosineDistance()::apply, HammingDistance: new HammingDistance()::apply, JaccardDistance: new JaccardDistance()::apply, JaroWinklerDistance: new JaroWinklerDistance()::apply, LevenshteinDetailedDistance: { a, b -> LevenshteinDetailedDistance.defaultInstance.apply(a, b).toString() }, LevenshteinDistance: LevenshteinDistance.defaultInstance::apply, 'LongestCommonSubsequenceDistance (commons text)': new LongestCommonSubsequenceDistance()::apply, 'LongestCommonSubsequence (commons text)': new org.apache.commons.text.similarity.LongestCommonSubsequence()::apply, ]

String metrics: similarity algorithms var phrases = [ 'The sky
is blue', 'The blue sky', 'The blue cat', 'The sea is blue', 'Blue skies following me', 'My ferrari is red', 'Apples are red', 'I read a book', 'The wind blew', 'Numbers are odd or even', 'Red noses', 'Read knows', 'Hippopotamus', ]

String metrics: distance algorithms results • Query: The blue car
NormalizedLevenshtein: The blue cat (0.08), The blue sky (0.25), Th WeightedLevenshtein (t is near r): The blue cat (0.50), The blue sky (3.00), Th Damerau: The blue cat (1.00), The blue sky (3.00), Th OptimalStringAlignment: The blue cat (1.00), The blue sky (3.00), Th LongestCommonSubsequence (debatty): The blue cat (2.00), The blue sky (6.00), Th MetricLCS: The blue cat (0.08), The blue sky (0.25), Th NGram(2): The blue cat (0.04), The blue sky (0.21), Th NGram(4): The blue cat (0.02), The blue sky (0.13), Th QGram: The blue cat (2.00), The blue sky (6.00), Th CosineDistance: The blue sky (0.33), The blue cat (0.33), Th HammingDistance: The blue cat (1), The blue sky (3), Hi JaccardDistance: The blue cat (0.18), The sea is blue (0.33), Th JaroWinklerDistance: The blue cat (0.03), The blue sky (0.10), Th LevenshteinDistance: The blue cat (1), The blue sky (3), Th LongestCommonSubsequenceDistance (commons text): The blue cat (2), The blue sky (6), Th

String metrics: distance algorithms results • Query: Red roses NormalizedLevenshtein:
Red noses (0.11), Read knows (0.50), Apple WeightedLevenshtein (t is near r): Red noses (1.00), Read knows (5.00), The b Damerau: Red noses (1.00), Read knows (5.00), The b OptimalStringAlignment: Red noses (1.00), Read knows (5.00), The b MetricLCS: Red noses (0.11), Read knows (0.40), The b NGram(2): Red noses (0.11), Read knows (0.55), Apple NGram(4): Red noses (0.11), Read knows (0.53), Apple QGram: Red noses (4.00), Read knows (13.00), Apple CosineDistance: Red noses (0.50), The sky is blue (1.00), The b HammingDistance: Red noses (1), The sky is blue (-), The b JaccardDistance: Red noses (0.25), Read knows (0.45), Apple JaroWinklerDistance: Red noses (0.04), Read knows (0.20), The s LevenshteinDistance: Red noses (1), Read knows (5), The b LongestCommonSubsequenceDistance (commons text): Red noses (2), Read knows (7), The b

String metrics: similarity algorithms discussion For Jaccard = 1, we
know that we have correctly guessed all the letters! For Jaccard = 0, we can rule out all letters from our guess! bear VS bare Jaccard (debatty k=1) 1.00 ████████████████████ cow VS bull Jaccard (debatty k=1) 0.00 ▏

String metrics: similarity algorithms discussion Consider comparing a social media
account @elton_john with email [email protected] elton john VS john elton Jaccard (debatty k=1) 1.00 ████████████████████ Jaccard (debatty k=2) 0.80 ████████████████ Jaccard (debatty k=3) 0.45 █████████ elton john VS nhoj notle Jaccard (debatty k=1) 1.00 ████████████████████ Jaccard (debatty k=2) 0.00 ▏ Jaccard (debatty k=3) 0.00 ▏

String metrics: similarity algorithms discussion superstar VS supersonic Jaccard (debatty
k=1) 0.45 █████████ JaroWinkler (debatty) 0.90 ██████████████████ partnership VS leadership Jaccard (debatty k=1) 0.64 ████████████ JaroWinkler (debatty) 0.73 ██████████████

String metrics: similarity algorithms discussion superstar VS supersonic Jaccard (debatty
k=1) 0.45 █████████ JaroWinkler (debatty) 0.90 ██████████████████ partnership VS leadership Jaccard (debatty k=1) 0.64 ████████████ JaroWinkler (debatty) 0.73 ██████████████ Rev. JaroWinkler (debatty) 0.90 ██████████████████

String metrics: distance algorithms discussion • For bat vs tab,
the Jaccard similarity of 100% tells us that the guess contains exactly all the (distinct) letters of the hidden word. The LCS value of 1 tells us that no two letters appear in the correct order. The only way this is possible is if the letters appear in reverse order. • For back vs buck, the LCS of 3 combined with the Jaccard of 3/5 tells us that the three shared letters also appear in the same order in both the guess and hidden words. bat vs tab: LongestCommonSubsequence (1), Jaccard (3/3) back vs buck: LongestCommonSubsequence (3) Jaccard (3/5)

String metrics: decisions Settle on the following string metrics •
LongestCommonSubsequence and Levenshtein distances • Jaccard, JaroWinkler (fwd/rev) similarity measures Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 1): aftershock LongestCommonSubsequence 0 Levenshtein Distance: 10, Insert: 0, Delete: 3, Substitute: 7 Jaccard 0% JaroWinkler PREFIX 0% / SUFFIX 0% Phonetic Metaphone=AFTRXK 47% / Soundex=A136 0% Meaning AnglE 45% / Use 21% / ConceptNet 2% / GloVe -4% / fastText 19%

Agenda 1. String metrics 2. Phonetic algorithms • Libraries o
org.openrefine:main:3.9.3 o commons-codec:commons-codec:1.19.0 • Algorithms o Soundex and variants o Metaphone and variants o Miscellaneous 3. Deep learning 4. LLMs JDK Open Refine Groovy Commons Codec

Phonetic algorithms: Soundex The correct value can be found as
follows: • Retain the first letter of the name and drop all other occurrences of a, e, i, o, u, y, h, w. • Replace consonants with digits as follows (after the first letter): ▪ b, f, p, v → 1 ▪ c, g, j, k, q, s, x, z → 2 ▪ d, t → 3 ▪ l → 4 ▪ m, n → 5 ▪ r → 6 • If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter; also two letters with the same number separated by 'h', 'w' or 'y' are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter. • If there are too few letters in the word to assign three numbers, append zeros until there are three numbers. If there are four or more numbers, retain only the first three. Source: https://en.wikipedia.org/wiki/Soundex

Phonetic algorithms: soundex var result = { a, b ->
["$a|$b", a == b] } var soundex = [ Soundex: { a, b -> new Soundex().with{result(soundex(a), soundex(b)) }}, RefinedSoundex: { a, b -> new RefinedSoundex().with{ result(encode(a), encode(b)) }}, DaitchMokotoffSoundex: { a, b -> new DaitchMokotoffSoundex().with{ result(encode(a), encode(b)) }}, ] def runAndDisplay(algs) { var results = [pairs, algs].combinations().collect { pair, namedAlg -> namedAlg.value(pair) } display(algs, pairs, results) println() } runAndDisplay(soundex)

Phonetic algorithms: soundex Pair Soundex RefinedSoundex DaitchMokotoffSoundex cat|hat C300|H300 C306|H06
430000|530000 bear|bare B600|B600 B109|B1090 790000|790000 pair|pare P600|P600 P109|P1090 790000|790000 there|their T600|T600 T6090|T609 390000|390000 sort|sought S630|S230 S3096|S30406 493000|453000 cow|bull C000|B400 C30|B107 470000|780000 winning|grinning W552|G655 W08084|G4908084 766500|596650 knows|nose K520|N200 K3803|N8030 567400|640000 ground|aground G653|A265 G49086|A049086 596300|059630 peeler|repeal P460|R140 P10709|R90107 789000|978000 hippo|hippopotamus H100|H113 H010|H0101060803 570000|577364

Phonetics: decisions Settle on the following phonetic measures • Soundex
SoundexDiff • Metaphone(5) (Metaphone5LCS + Metaphone5Lev) / 2 Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 1): aftershock LongestCommonSubsequence 0 Levenshtein Distance: 10, Insert: 0, Delete: 3, Substitute: 7 Jaccard 0% JaroWinkler PREFIX 0% / SUFFIX 0% Phonetic Metaphone=AFTRXK 47% / Soundex=A136 0% Meaning AnglE 45% / Use 21% / ConceptNet 2% / GloVe -4% / fastText 19%

Agenda 1. String metrics 2. Phonetic algorithms 3. Deep learning
• DeepLearning4J: Word2Vec (fasttext, GloVe, ConceptNet) • DeepJavaLibrary: PyTorch (AnglE), TensorFlow (Use) 4. LLMs

DeepLearning4J Architecture https://medium.com/@abhinav.neo/deeplearning4j-comprehensive-overview-a6ab2d3b3cdc JDK ND4J Word2vec Groovy Models

DeepLearning4J: Word2vec (fasttext) var modelName = 'fasttext-wiki-news-subwords-300.bin' var path =
Paths.get(Dl4jWord2Vec.classLoader.getResource(modelName).toURI()).toFile() var model = WordVectorSerializer.readWord2VecModel(path) String[] words = ['bull', 'calf', 'bovine', 'cattle', 'livestock', 'horse'] var target = 'cow' var result = words .collectEntries { w -> [w, model.similarity(target, w)] } .sort { -it.value } .collectValues('%4.2f'::formatted) var nearest = model.wordsNearest(target, 4) println "Word2Vec(${modelName - '.bin'}) similarity to $target:\n$result" println "Nearest words in vocab: $nearest" Word2Vec(fasttext-wiki-news-subwords-300) similarity to cow: [bovine:0.72, cattle:0.70, calf:0.67, bull:0.67, livestock:0.61, horse:0.60] Nearest words in vocab: [cows, goat, pig, bovine]

DeepLearning4J: Word2vec (GloVe) Word2Vec(glove-wiki-gigaword-300) similarity to cow: [bovine:0.67, cattle:0.62, livestock:0.47,
calf:0.44, horse:0.42, bull:0.38] Nearest words in vocab: [cows, mad, bovine, cattle] var modelName = 'glove-wiki-gigaword-300.bin' var path = Paths.get(Dl4jWord2Vec.classLoader.getResource(modelName).toURI()).toFile() var model = WordVectorSerializer.readWord2VecModel(path) String[] words = ['bull', 'calf', 'bovine', 'cattle', 'livestock', 'horse'] var target = 'cow' var result = words .collectEntries { w -> [w, model.similarity(target, w)] } .sort { -it.value } .collectValues('%4.2f'::formatted) var nearest = model.wordsNearest(target, 4) println "Word2Vec(${modelName - '.bin'}) similarity to $target:\n$result" println "Nearest words in vocab: $nearest"

DeepLearning4J: Word2vec (ConceptNet) Word2Vec(conceptnet-numberbatch-17-06-300) similarity to /c/en/cow: [/c/en/bovine:0.77, /c/en/cattle:0.77, /c/en/livestock:0.63,
/c/en/bull:0.54, /c/en/calf:0.53, /c/en/horse:0.50] Nearest words in vocab: [/c/ast/vaca, /c/be/карова, /c/ur/یاگ, /c/gv/booa] var modelName = 'conceptnet-numberbatch-17-06-300.bin' var path = Paths.get(Dl4jWord2Vec.classLoader.getResource(modelName).toURI()).toFile() var model = WordVectorSerializer.readWord2VecModel(path) String[] words = ['bull', 'calf', 'bovine', 'cattle', 'livestock', 'horse'] var target = '/c/en/cow' var result = words .collectEntries { w -> ["/c/en/$w", model.similarity(target, "/c/en/$w")] } .sort { -it.value } .collectValues('%4.2f'::formatted) var nearest = model.wordsNearest(target, 4) println "Word2Vec(${modelName - '.bin'}) similarity to $target:\n$result" println "Nearest words in vocab: $nearest"

DeepLearning4J: Word2vec (ConceptNet) Algorithm conceptnet /c/fr/vache █████████ /c/de/kuh █████████ /c/en/cow
/c/en/bovine ███████ /c/fr/bovin ███████ /c/en/bull █████ /c/fr/taureau █████████ /c/en/cow █████ /c/en/bull /c/fr/vache █████ /c/de/kuh █████ /c/fr/bovin █████ /c/de/kuh █████ /c/en/cow █████ /c/en/calf /c/fr/vache █████ /c/en/bovine █████ /c/fr/bovin █████ /c/en/cow █████████ /c/fr/vache █████████ /c/de/kuh /c/fr/bovin ███████ /c/en/bovine ███████ /c/en/calf █████ /c/en/cat ████████ /c/de/katze ████████ /c/en/kitten /c/en/bull ██ /c/en/cow █ /c/de/kuh █ /c/de/katze █████████ /c/en/kitten ████████ /c/en/cat /c/en/bull ██ /c/en/cow ██ /c/fr/taureau █ /c/en/cat █████████ /c/en/kitten ████████ /c/de/katze /c/en/bull ██ /c/de/kuh ██ /c/fr/taureau ██ /c/fr/bovin █████████ /c/en/cow ███████ /c/en/bovine /c/de/kuh ███████ /c/fr/vache ███████ /c/en/calf █████ /c/en/bovine █████████ /c/fr/vache ███████ /c/fr/bovin /c/de/kuh ███████ /c/en/cow ███████ /c/fr/taureau █████ /c/en/cow █████████ /c/de/kuh █████████ /c/fr/vache /c/fr/bovin ███████ /c/en/bovine ███████ /c/fr/taureau █████ /c/en/bull █████████ /c/fr/bovin █████ /c/fr/taureau /c/fr/vache █████ /c/en/cow █████ /c/de/kuh █████

DeepLearning4J: GloVe embeddings embedding for cow: [0.07429451495409012, 0.023825954645872116, -0.0350576713681221, 2.1239052875898778E-4,
0.027686676010489464, -0.027611061930656433, -0.0049623711965978146, 0.020939502865076065, -0.0936388373374939, -0.09413033723831177, -0.009138057008385658, -0.08698037266731262, - 0.07676944881677628, 0.044016581028699875, -0.048027075827121735, 0.10814525932073593, -0.009916309267282486, -0.05028389021754265, 0.018505286425352097, 0.05889671668410301, -0.03597813844680786, 0.0516405925154686, -0.03679244965314865, -0.047173500061035156, -0.09804922342300415, - 6.682900129817426E-4, 0.012187808752059937, -0.09750392287969589, 0.04911185801029205, 0.07368378341197968, -0.10344114154577255, 0.013207592070102692, -0.03336796909570694, 0.005758799146860838, 0.038930028676986694, 0.05426966771483421, 0.010626943781971931, 0.050721582025289536, - 0.0624113492667675, 0.04338112473487854, -0.018740855157375336, 0.06490082293748856, -0.008538954891264439, 0.03787141665816307, -0.06195039302110672, 8.1086769932881E-4, -0.054607026278972626, -0.039783600717782974, -0.029420001432299614, -0.016933368518948555, 0.00632838299497962, 0.07660804688930511, 0.04056010767817497, -0.003799356520175934, 0.05560746416449547, -0.03656560555100441, -0.006996847689151764, 0.05116654932498932, -0.13194067776203156, -0.04505192115902901, -0.06055587902665138, 0.07293926924467087, 0.08760564774274826, -0.006492699962109327, - 0.0177287794649601, -0.04065753519535065, -0.06176135689020157, -0.07669529318809509, -0.020943865180015564, 0.03818842023611069, -0.012433120049536228, 0.0608685202896595, -0.03171171247959137, 0.00201731757260859, -0.03855776786804199, -0.025409504771232605, 0.0310166385024786, 0.031528495252132416, -0.041938625276088715, 0.05411843582987785, -0.00957560632377863, -0.015519952401518822, -0.03166954219341278, 0.04504174366593361, 9.512787073617801E-5, -0.07255392521619797, 0.02083916775882244, 0.007762593682855368, -0.07990310341119766, -0.043434929102659225, 0.012822828255593777, -0.07173960655927658, -0.006847217679023743, 0.012675960548222065, 0.004888356197625399, 0.021932674571871758, -0.0279295165091753, 0.03529469296336174, -0.053144168108701706, 0.07567303627729416, -0.012852783314883709, 0.06768257915973663, -0.02611912041902542, - 0.052912961691617966, -0.06802284717559814, 0.01664981245994568, -0.016127780079841614, 0.029520336538553238, -0.0335279256105423, 0.1059306189417839, 0.012355615384876728, 0.03399470075964928, -0.1510261595249176, -9.826152818277478E-4, -0.005242437124252319, -0.011691221967339516, 0.07984057813882828, 0.07278657704591751, 0.03992610424757004, -0.11073070764541626, -0.049758948385715485, -0.09695862233638763, 0.04452552646398544, 0.06255386024713516, -0.03672265261411667, -3.065165365114808E-4, -0.0628068745136261, 0.07184285670518875, -0.08992498368024826, - 0.06390329450368881, 0.05206083878874779, -0.017519384622573853, 0.028018217533826828, 0.0608743354678154, 0.04115048795938492, -0.06981142610311508, 0.007086422294378281, 0.05566999688744545, -0.015992546454072, -0.00886060856282711, -2.7509272331371903E-4, -0.00922835897654295, 0.05679403990507126, -0.09790526330471039, -0.025711964815855026, -8.405465050600469E-4, 0.06877608597278595, -0.01868559792637825, 4.7151690523605794E-5, 0.02318759076297283, -0.09374935179948807, 0.09811320155858994, 0.026716770604252815, 0.16898469626903534, 0.007427852600812912, 0.001734925084747374, -0.09127732366323471, -0.0038229133933782578, -0.011299625039100647, -0.024474497884511948, 0.07970825582742691, 0.02700468711555004, 0.01567990705370903, -0.039283379912376404, 0.0627894252538681, 0.06899275630712509, 0.036423102021217346, -0.0647161528468132, 0.05964268371462822, -0.029324030503630638, -0.001510116271674633, 0.08532265573740005, 0.07373322546482086, 0.01941702701151371, -0.035389214754104614, -0.12569518387317657, -0.03024449571967125, -0.0735761821269989, 0.07549998909235, -0.0493023507297039, 0.049524832516908646, 0.07462169975042343, -0.0520055815577507, -0.08973449468612671, -0.03268307447433472, -0.07651934772729874, 0.015804963186383247, -0.17355068027973175, 0.08981302380561829, -0.08659502118825912, -0.06004839017987251, 0.045986928045749664, 0.05171184614300728, -0.01465619821101427, - 0.009460583329200745, -0.01918582059442997, 0.023878304287791252, -0.11462487280368805, -0.06036102771759033, 0.08193598687648773, 0.19623512029647827, -0.030802881345152855, 0.03031283989548683, 0.06249714270234108, -0.019354499876499176, 0.021198337897658348, -0.009988724254071712, 0.038271304219961166, -0.0647408738732338, 0.06784253567457199, -0.003619480412453413, -0.0446869321167469, -0.04519297182559967, -0.0029555235523730516, -0.057227373123168945, -0.06987105309963226, 0.04924708977341652, -0.09523402154445648, 0.03631985932588577, 0.04091491922736168, - 0.06854197382926941, 0.035464830696582794, -0.005609460175037384, -0.11687877774238586, 0.09606578946113586, 0.012312863953411579, 0.014134891331195831, 0.05814783647656441, 0.035082392394542694, 0.11499422043561935, 0.021144535392522812, 0.023494413122534752, -0.06059078127145767, - 0.022893857210874557, 0.05192851275205612, 0.047249116003513336, 0.08505073934793472, -0.051842719316482544, 0.019861992448568344, 0.007900591008365154, -0.05993641912937164, -0.08620385825634003, -0.01973111927509308, -0.08422042429447174, -0.1121383085846901, 0.002175381639972329, 6.996557349339128E-4, 0.026100218296051025, 0.060164716094732285, -0.005709067918360233, -0.055083978921175, -0.05358913168311119, -0.04064881056547165, 0.020432010293006897, 0.12265750020742416, 3.3725687535479665E-4, -0.02103692851960659, 0.05909738689661026, 0.012789092026650906, 0.03054840862751007, 0.0362209789454937, -0.05833396688103676, 0.03937208279967308, -0.036468181759119034, -0.03857085481286049, -0.01459948718547821, 0.04797909036278725, -0.054114073514938354, -0.013187524862587452, -0.046613659709692, -0.04100216180086136, 0.001268701278604567, 0.005481932777911425, -0.07730020582675934, -0.024480314925312996, -0.012503791600465775, -0.22214776277542114, -0.023876849561929703, -0.010199573822319508, -0.04424051567912102, -0.0897839367389679, -0.0045578316785395145, 0.0023194861132651567, 0.044324856251478195, - 0.07002373784780502, 0.07825412601232529, -0.041089411824941635, -0.023241395130753517, -0.05828307196497917, -0.07625468820333481, 0.08812331408262253, -0.015530130825936794, 0.0797417014837265, -0.002141936682164669, -0.04111703857779503, -0.05709795281291008, 0.06065039709210396, 0.014140126295387745, 0.11645126342773438, -0.05367056280374527] embedding for bull: [0.09320878982543945, -0.009668268263339996, 0.008591377176344395, -0.06309974193572998, 0.07069239020347595, 0.045465726405382156, 0.03622706979513168, 0.010717049241065979, 0.017912106588482857, -0.04634058475494385, 0.10788150131702423, -0.01882454752922058, 0.03744165226817131, 0.026148127391934395, -0.001020491006784141, -0.018561488017439842, 0.03144990652799606, -0.011073307134211063, -0.008244889788329601, -0.012233025394380093, -0.042158689349889755, 0.06323953717947006, 0.07939140498638153, -0.05575361102819443, -0.026175184175372124, - 0.08781832456588745, -0.03510117530822754, -0.026648690924048424, 0.020521657541394234, -0.08548235148191452, -0.041566431522369385, -0.011681952513754368, -0.10885106772184372, -0.006574537139385939, -0.1329413205385208, 0.03247959539294243, -0.062315069139003754, 0.041238732635974884, -0.001573547488078475, 0.08481192588806152, -0.014629573561251163, 0.11227084696292877, -0.032040663063526154, 0.00404781149700284, -0.013646331615746021, -0.021270249038934708, -0.0375589020550251, -0.0335378497838974, 0.031705450266599655, -0.022248832508921623, -0.015048515051603317, -0.0027815173380076885, -0.08697953820228577, 0.019460400566458702, 0.09935085475444794, 0.07698929309844971, -0.053632598370313644, 0.07277131080627441, -0.07270066440105438, 0.053301893174648285, -0.03485465049743652, -0.05525755509734154, 0.11171767115592957, 0.04697493463754654, 0.003389410674571991, -0.04832931607961655, -0.11554481089115143, -0.1065947636961937, 0.03154611214995384, -0.025886571034789085, 0.027734000235795975, 0.0843474343419075, -0.1351194530725479, 0.020402904599905014, 0.019144728779792786, -0.08682921528816223, 0.06382127106189728, - 0.004714780021458864, 0.015096615999937057, -0.013954486697912216, 0.09511033445596695, -0.0039228955283761024, -0.02256149612367153, 0.006959355436265469, 0.04581446573138237, -0.004678552970290184, -0.0940355509519577, 0.05377540364861488, 0.041904646903276443, 0.04053974524140358, 0.09436625987291336, 0.029029758647084236, 0.04867805913090706, 0.006980700884014368, -0.011696684174239635, 0.07300130277872086, 0.03625713288784027, 0.014255276881158352, -0.10305473953485489, -0.009785367175936699, -0.0011472856858745217, 0.05186032876372337, -0.014200109988451004, - 0.04129134491086006, -0.07683897018432617, -0.015982000157237053, -0.04235711321234703, 0.08608213812112808, -0.02864794433116913, 0.045023780316114426, -0.015246936120092869, 0.056634485721588135, -0.1178627461194992, 0.0021924141328781843, -0.010222798213362694, -0.014611384831368923, -0.04671488329768181, 0.026026370003819466, -0.05071339011192322, -0.07264655083417892, -0.046489402651786804, -0.0636814758181572, 0.03081856295466423, -0.027257487177848816, 0.05512527376413345, -0.07141392678022385, -0.04833833500742912, 0.05689153075218201, 0.007027450483292341, - 0.1459304541349411, 0.08448723703622818, 0.004497417714446783, 9.082772303372622E-4, 0.0302052590996027, 0.015128184109926224, 0.0327727235853672, 0.02050211653113365, -0.06044960021972656, 0.06566870212554932, -0.02490648627281189, 0.0068804374895989895, 0.037619031965732574, 0.017942169681191444, -0.03612335026264191, 0.12464667111635208, -0.0038239851128309965, 0.008693143725395203, -0.0037718242965638638, -0.0767202228307724, 0.0339602492749691, 0.04125376045703888, 0.19643791019916534, -0.02349047362804413, 0.07547556608915329, -0.0016505112871527672, - 0.04623987153172493, 0.01774224452674389, -0.03273363783955574, -0.012820624746382236, 0.014815217815339565, 0.019050026312470436, 0.02173774503171444, -0.016787715256214142, -0.05344920605421066, -0.13362827897071838, 0.027502508834004402, 0.016916988417506218, -0.0285261869430542, - 0.11792587488889694, -0.05384755879640579, 0.0336761437356472, 0.033552881330251694, 0.02680051513016224, 0.027394277974963188, 0.018862126395106316, -0.10487210750579834, 0.013714425265789032, 0.01794968731701374, 0.004329510498791933, 0.030705824494361877, 0.0835026428103447, 0.0773545652627945, -0.11404310911893845, 8.906447910703719E-4, 0.016798237338662148, -0.09427906572818756, -0.013683009892702103, -0.0018998918822035193, 0.0010528998682275414, -0.03332289308309555, -0.009884278289973736, -0.015786584466695786, 0.049949761480093, -0.010391456075012684, 0.027664853259921074, 0.01632021926343441, -0.021933160722255707, -0.0036723124794662, 0.01598650962114334, 0.021375473588705063, 0.28960609436035156, -0.07048344612121582, -0.042113594710826874, 0.06054580584168434, 0.008702313527464867, 0.05922900140285492, -0.11870453506708145, 0.02402861975133419, 0.01601056009531021, 0.005450445227324963, 0.013784775510430336, -0.043248508125543594, -0.07989496737718582, -0.14013712108135223, 0.008387694135308266, 0.03915078938007355, -0.0017083844868466258, -0.005894038826227188, -0.05920645594596863, -2.959345583803952E-4, -0.10340798646211624, 0.03105456568300724, -0.05343567579984665, -0.10598447173833847, 0.06501932442188263, -0.03507111221551895, -0.0389358326792717, -0.12544938921928406, -0.009064133279025555, 0.05799638107419014, -0.06509748846292496, -0.007732450030744076, 0.021156009286642075, - 0.03117632307112217, 0.017128940671682358, 0.07756952196359634, -0.04955292120575905, -0.026146624237298965, 0.02781367115676403, 0.06696145236492157, -0.09744329750537872, 0.02594519592821598, -0.007050148211419582, -0.09188296645879745, -0.09880369156599045, -0.03329583257436752, 0.005433008540421724, -0.054952409118413925, 0.036837369203567505, -0.06231807544827461, -0.09657897055149078, 0.01711992174386978, -0.053679198026657104, -0.021361947059631348, 0.031329650431871414, -0.03575656935572624, -0.0375513881444931, 0.006907044444233179, 0.05475999787449837, 0.01989482343196869, 0.015134196728467941, 0.016350284218788147, 0.012737046927213669, 0.044715628027915955, 0.0331229642033577, 0.0021181562915444374, -0.07594005763530731, 0.0054292501881718636, -0.04753562808036804, -0.04474418982863426, -0.07444287091493607, -0.08441809564828873, - 0.017993278801441193, -0.06268185377120972, 0.051952023059129715, -0.03499895706772804, -0.10678417980670929, -0.03229169547557831, -0.09383111447095871, 0.0120070930570364, 0.020212000235915184, 0.06248342990875244, 0.06621286273002625, 0.12101344764232635, -0.056159473955631256, 0.07345527410507202, -0.03602263331413269, -0.05208129808306694, 0.0706668347120285, -0.06283517926931381, -0.04050817713141441, 0.03997153416275978, -0.03571898862719536, -0.0034241345711052418, 0.001014643581584096, -0.03999258205294609, 0.03398129343986511, 0.02644125185906887, 0.07958530634641647, 0.028787745162844658]

Dimensionality reduction 3D

DeepLearning4J: GloVe embeddings projected

DeepJavaLibrary • Engine-Agnostic Architecture • MXNet, PyTorch, TensorFlow, ONNX RuntimeXGBoost,
LightGBM • Core API & Modules • Loading and managing deep learning engines • Working with n-dimensional arrays and memory • Translating between Java objects and NDArrays • Setting up inference pipelines • Building neural network blocks • Collecting metrics during training or evaluation • Pre-Trained Models (Zoo) • GluonCV, HuggingFace, TorchHub, Keras

DeepJavaLibrary: PyTorch (AnglE) JDK PyTorch DJL Groovy AnglE https://huggingface.co/WhereIsAI/UAE-Large-V1 Universal
AnglE Embedding Angle-optimized text Embeddings

DeepJavaLibrary: PyTorch (AnglE) var samplePhrases = [ 'bull', 'bovine', 'kitten',
'hay', 'The sky is blue', 'The sea is blue', 'The grass is green', 'One two three', 'Bulls consume hay', 'Bovines convert grass to milk', 'Dogs play in the grass', 'Bulls trample grass', 'Dachshunds are delightful', 'I like cats and dogs'] var queries = [ 'cow', 'cat', 'dog', 'grass', 'Cows eat grass', 'Poodles are cute', 'The water is turquoise’] var modelName = 'UAE-Large-V1.zip' var path = Paths.get(DjlPytorchAngle.classLoader.getResource(modelName).toURI()) var criteria = Criteria.builder() .setTypes(String, float[]) .optModelPath(path) .optTranslatorFactory(new DeferredTranslatorFactory()) .optProgress(new ProgressBar()) .build()

DeepJavaLibrary: PyTorch (AnglE) var model = criteria.loadModel() var predictor =
model.newPredictor() var sampleEmbeddings = samplePhrases.collect(predictor::predict) queries.each { query -> println "\n $query" var queryEmbedding = predictor.predict(query) sampleEmbeddings .collect { cosineSimilarity(it, queryEmbedding) } .withIndex() .sort { -it.v1 } .take(5) .each { printf '%s (%4.2f)%n', samplePhrases[it.v2], it.v1 } }

DeepJavaLibrary: PyTorch (AnglE) cow bovine (0.72) bull (0.57) Bulls consume
hay (0.46) hay (0.45) kitten (0.44) cat kitten (0.75) I like cats and dogs (0.39) bull (0.35) hay (0.31) bovine (0.26) dog kitten (0.54) Dogs play in the grass (0.45) bull (0.39) I like cats and dogs (0.37) hay (0.35) Poodles are cute Dachshunds are delightful (0.54) I like cats and dogs (0.42) Dogs play in the grass (0.27) Bulls consume hay (0.19) bovine (0.16) The water is turquoise The sea is blue (0.56) The grass is green (0.39) The sky is blue (0.38) kitten (0.17) One two three (0.17) grass The grass is green (0.61) Bulls trample grass (0.56) Dogs play in the grass (0.52) hay (0.51) Bulls consume hay (0.47) Cows eat grass Bovines convert grass to milk (0.60) Bulls trample grass (0.58) Dogs play in the grass (0.56) Bulls consume hay (0.53) bovine (0.44)

DeepJavaLibrary: TensorFlow (Use) JDK TensorFlow DJL Groovy Use https://www.kaggle.com/models/google/universal-sentence-encoder

DeepJavaLibrary: TensorFlow (Use) var criteria = Criteria.builder() .optApplication(Application.NLP.TEXT_EMBEDDING) .setTypes(String[], float[][])
.optModelUrls(modelUrl) .optTranslator(new UseTranslator()) .optEngine("TensorFlow") .optProgress(new ProgressBar()) .build()

DeepJavaLibrary: TensorFlow (Use) cow bovine (0.72) bull (0.57) Bulls consume
hay (0.46) hay (0.45) kitten (0.44) cat kitten (0.75) I like cats and dogs (0.39) bull (0.35) hay (0.31) bovine (0.26) dog kitten (0.54) Dogs play in the grass (0.45) bull (0.39) I like cats and dogs (0.37) hay (0.35) Poodles are cute Dachshunds are delightful (0.54) I like cats and dogs (0.42) Dogs play in the grass (0.27) Bulls consume hay (0.19) bovine (0.16) The water is turquoise The sea is blue (0.56) The grass is green (0.39) The sky is blue (0.38) kitten (0.17) One two three (0.17) grass The grass is green (0.61) Bulls trample grass (0.56) Dogs play in the grass (0.52) hay (0.51) Bulls consume hay (0.47) Cows eat grass Bovines convert grass to milk (0.60) Bulls trample grass (0.58) Dogs play in the grass (0.56) Bulls consume hay (0.53) bovine (0.44)

Deep learning: decisions Settle on the following semantic measures •
All metrics! Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 1): aftershock LongestCommonSubsequence 0 Levenshtein Distance: 10, Insert: 0, Delete: 3, Substitute: 7 Jaccard 0% JaroWinkler PREFIX 0% / SUFFIX 0% Phonetic Metaphone=AFTRXK 47% / Soundex=A136 0% Meaning AnglE 45% / Use 21% / ConceptNet 2% / GloVe -4% / fastText 19%

Ending the Game Hopefully: Hints (discussed later): If more than
30 turns: Congratulations, you guessed correctly! You seem to be having trouble, here are one or more clues: XXX, YYY, ZZZ Sorry, you took too many turns! The hidden word was XXX

Playing the Game: Round 1, Turn 1 Let’s start with
aftershock. It’s a long word with distinct letters and common vowels and consonants. • We can actually rule out all the letters A, F, T, E, R, S, H, O, C, and K. The game automatically does this for us if we ever receive a Jaccard score of 0%, or for a Jaccard score of 100%, it keeps those letters and discards all others. We’ll see that the "Possible letters" line changes. • Because we deleted 3 letters, we know that the hidden word has 7 letters. • Even though no letter is correct, the Metaphone score isn’t 0, so we need to be on the lookout for other consonants that transform into the same groups. E.g. Q and G can transform to K, D can transform to T Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 1): aftershock LongestCommonSubsequence 0 Levenshtein Distance: 10, Insert: 0, Delete: 3, Substitute: 7 Jaccard 0% JaroWinkler PREFIX 0% / SUFFIX 0% Phonetic Metaphone=AFTRXK 47% / Soundex=A136 0% Meaning AnglE 45% / Use 21% / ConceptNet 2% / GloVe -4% / fastText 19%

Playing the Game: Round 1, Turn 2 Possible letters: b
d g i j l m n p q u v w x y z Guess the hidden word (turn 2): fruit LongestCommonSubsequence 2 Levenshtein Distance: 6, Insert: 2, Delete: 0, Substitute: 4 Jaccard 22% JaroWinkler PREFIX 56% / SUFFIX 45% Phonetic Metaphone=FRT 39% / Soundex=F630 0% Meaning AnglE 64% / Use 41% / ConceptNet 37% / GloVe 31% / fastText 44% In terms of vowels, unless it’s a word like 'rhythm', U and I are our likely candidates. Let’s burn a turn to confirm that hunch. We’ll pick a word containing those two vowels plus a mixture of consonants from aftershock - we don’t want information from other consonants to blur what we might learn about the vowels. • Since LCS is 2, both U and I are in the answer in that order, but there could be duplicates. • Jaccard of 22% is 2 / 9. We know that F, R, and T aren’t in the hidden word, so the 7-letter hidden word has 6 distinct letters, i.e. it has one duplicate. • The semantic meaning scores jumped up just a little, so the hidden word has some relationship to fruit.

Playing the Game: Round 1, Turn 2 A common prefix
is ing and all those letters are still possible. Some possibilities are jumping, dumping, guiding, bugging, bumping and mugging. But, we also know there is exactly one duplicate letter, so we could try judging, pulling, budding, buzzing, bulging, piquing, pumping, mulling, numbing, and pudding (among others). Since we know there is some semantic relationship with fruit, two of these stand out. Budding is something that a fruit tree would need to do to later produce fruit. Pudding is a kind of food. Possible letters: b d g i j l m n p q u v w x y z Guess the hidden word (turn 2): fruit LongestCommonSubsequence 2 Levenshtein Distance: 6, Insert: 2, Delete: 0, Substitute: 4 Jaccard 22% JaroWinkler PREFIX 56% / SUFFIX 45% Phonetic Metaphone=FRT 39% / Soundex=F630 0% Meaning AnglE 64% / Use 41% / ConceptNet 37% / GloVe 31% / fastText 44%

d g i j l m n p q u v w x y z Guess the hidden word (turn 3): budding LongestCommonSubsequence 6 Levenshtein Distance: 1, Insert: 0, Delete: 0, Substitute: 1 Jaccard 71% JaroWinkler PREFIX 90% / SUFFIX 96% Phonetic Metaphone=BTNK 79% / Soundex=B352 75% Meaning AnglE 52% / Use 35% / ConceptNet 2% / GloVe 4% / fastText 25% It’s a 50/50 guess between budding and pudding. Let’s try the first. • We have 6 letters right in a row and 5 of the 6 distinct letters. Also, Metaphone and Soundex scores are high, and JaroWinkler says the front part of our guess is close and the back half is very close. Our other guess of pudding sounds right.

d g i j l m n p q u v w x y z Guess the hidden word (turn 4): pudding LongestCommonSubsequence 7 Levenshtein Distance: 0, Insert: 0, Delete: 0, Substitute: 0 Jaccard 100% JaroWinkler PREFIX 100% / SUFFIX 100% Phonetic Metaphone=PTNK 100% / Soundex=P352 100% Meaning AnglE 100% / Use 100% / ConceptNet 100% / GloVe 100% / fastText 100% Congratulations, you guessed correctly! • Let’s try it!

Playing the Game: Round 2, Turn 1 Possible letters: a
b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 1): bail LongestCommonSubsequence 1 Levenshtein Distance: 7, Insert: 4, Delete: 0, Substitute: 3 Jaccard 22% JaroWinkler PREFIX 42% / SUFFIX 46% Phonetic Metaphone=BL 38% / Soundex=B400 25% Meaning AnglE 46% / Use 40% / ConceptNet 0% / GloVe 0% / fastText 31% • Since LCS is 1, the letters shared with the hidden word are in the reverse order. • There were 4 inserts and 0 deletes which means the hidden word has 8 letters. • Jaccard of 22% is 2 / 9. Therefore, there are 2 letters in the hidden word that are in bail and five that are not. There are 7 unique letters in the hidden word. It has one duplicate.

b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 2): leg LongestCommonSubsequence 2 Levenshtein Distance: 6, Insert: 5, Delete: 0, Substitute: 1 Jaccard 25% JaroWinkler PREFIX 47% / SUFFIX 0% Phonetic Metaphone=LK 38% / Soundex=L200 0% Meaning AnglE 50% / Use 18% / ConceptNet 11% / GloVe 13% / fastText 37% • Jaccard of 2 / 8 tells us two of the letters in 'leg' appear in the hidden word. • LCS of 2 tells us that they appear in the same order as in the hidden word. • JaroWinkler has a high prefix score of 47%, but a suffix score of 0%. This suggests that the two correct letters are near the beginning of the word. • Metaphone has picked up some similarity with the encoding LK, suggesting the hidden word has some group of consonants encoded to either an 'L' or 'K'.

b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 3): languish LongestCommonSubsequence 2 Levenshtein Distance: 8, Insert: 0, Delete: 0, Substitute: 8 Jaccard 15% JaroWinkler PREFIX 50% / SUFFIX 50% Phonetic Metaphone=LNKX 34% / Soundex=L522 0% Meaning AnglE 46% / Use 12% / ConceptNet -11% / GloVe -4% / fastText 25% • Let’s try a word with 'L' and 'G' near the start: • 8 substitutions means none of the letters are in the same spot as 'languish'.

Playing the Game: Round 2, Turn 4 • Let’s try
a word with 'L' and 'E' near the start, bringing at most two letters from languish: • Jaccard tells us we have 4 distinct letters shared with the hidden word and yet we have a LCS of 5. The duplicate 'E' must be correct and the order of all correct letters must match the hidden word. • Only 4 substitutions means 8-4=4 letters are in the correct position. • JaroWinkler slightly favours the prefix over the suffix suggesting that the incorrect letters are probably closer to the end. • The phonetic metrics have increased. For example, 'languish' encodes to LNKX and scored only 34% whereas election, which encodes to ELKXN, scores 50%. Both metrics strongly suggest the hidden word starts with E. Possible letters: a b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 4): election LongestCommonSubsequence 5 Levenshtein Distance: 4, Insert: 0, Delete: 0, Substitute: 4 Jaccard 40% JaroWinkler PREFIX 83% / SUFFIX 75% Phonetic Metaphone=ELKXN 50% / Soundex=E423 75% Meaning AnglE 47% / Use 13% / ConceptNet -5% / GloVe -7% / fastText 26%

b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 5): elevator LongestCommonSubsequence 8 Levenshtein Distance: 0, Insert: 0, Delete: 0, Substitute: 0 Jaccard 100% JaroWinkler PREFIX 100% / SUFFIX 100% Phonetic Metaphone=ELFTR 100% / Soundex=E413 100% Meaning AnglE 100% / Use 100% / ConceptNet 100% / GloVe 100% / fastText 100% Congratulations, you guessed correctly! • From the LCS of 2 with leg, either 'L','E' is in the hidden word or 'E','G'. Trying 'L','E':

b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 1): aftershock LongestCommonSubsequence 3 Levenshtein Distance: 8, Insert: 0, Delete: 4, Substitute: 4 Jaccard 50% JaroWinkler PREFIX 61% / SUFFIX 49% Phonetic Metaphone=AFTRXK 33% / Soundex=A136 25% Meaning AnglE 44% / Use 11% / ConceptNet -7% / GloVe 1% / fastText 15% • we deleted 4 letters, so the hidden word has 6 letters • Jaccard of 50% is either 5/10 or 6/12. If the latter, we’d have all the letters, so there can’t be 2 additional letters in the hidden word, so it’s 5/10. That means we need to pick 5 letters from aftershock, duplicate one of them, and we’ll have all the letters • phonetic clues suggest it probably doesn’t start with A

b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 2): coarse LongestCommonSubsequence 3 Levenshtein Distance: 4, Insert: 0, Delete: 0, Substitute: 4 Jaccard 57% JaroWinkler PREFIX 67% / SUFFIX 67% Phonetic Metaphone=KRS 74% / Soundex=C620 75% Meaning AnglE 51% / Use 12% / ConceptNet 5% / GloVe 23% / fastText 26% • In aftershock, F, H, and K, are probably least common. Let’s pick a 6-letter word from the remaining 7 letters that abides by our LCS clue. We know this can’t be right because we aren’t duplicating a letter yet, but we just want to narrow down the possibilities. • We now have 4 of the 5 distinct letters (we should discard 2) • Phonetics indicates we are close but not very close yet, from the Metaphone value of KRS we should drop one and keep two.

b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 3): roasts LongestCommonSubsequence 3 Levenshtein Distance: 6, Insert: 0, Delete: 0, Substitute: 6 Jaccard 67% JaroWinkler PREFIX 56% / SUFFIX 56% Phonetic Metaphone=RSTS 61% / Soundex=R232 25% Meaning AnglE 54% / Use 25% / ConceptNet 18% / GloVe 18% / fastText 31% • Let’s assume C and E are wrong and bring in the other common letter, T. We need to find a word that matches the LCS conditions from previous guesses, and we’ll duplicate one letter, S. • Phonetics dropped, so maybe S wasn’t the correct letter to bring in, we want the K (from letter C) and R from the previous guess. • Also, the semantic meaning has bumped up to warm (from cold for previous guesses). Maybe the hidden word is related to roasts.

b c d e f g h i j k l m n o p q r s t u v w x y z Guess the hidden word (turn 4): carrot LongestCommonSubsequence 6 Levenshtein Distance: 0, Insert: 0, Delete: 0, Substitute: 0 Jaccard 100% JaroWinkler PREFIX 100% / SUFFIX 100% Phonetic Metaphone=KRT 100% / Soundex=C630 100% Meaning AnglE 100% / Use 100% / ConceptNet 100% / GloVe 100% / fastText 100% Congratulations, you guessed correctly! • Let’s try to word starting with C, related to roasts.

Cheat/hint mode: Part 1 Guess the hidden word (turn 3):
hint: what is a 6 letter word a little related to roasts and sounding a little like coarse Hints would have been for Game 3. After round 8: root_vegetable, daucus After round 16: diced, cauliflower, cucumber After round 24: celery, onion, sticks, zucchini

Agenda 1. String metrics 2. Phonetic algorithms 3. Deep learning
4. LLMs • Ollama

We activated a secondary cheat/hint mode • We used Ollama
to have a local LLM engine o Tried Mistal, Llama3, Qwen3 models o The latter proved most useful

Cheat/hint mode: Part 2 if (guess.startsWith('hint:')) { println new OllamaAPI().generate(
'qwen3', guess - 'hint:', null).response count++ continue }

hint: what is a seven letter word, with 6 unique letters, containing no letters from the word aftershock, and slightly related to fruit The seven-letter word you're looking for is likely **"budding"** or **"pudding"**, both of which meet the criteria: 1. **Seven letters**: Both words are exactly seven letters long. 2. **Six unique letters**: - **Budding**: Letters are B, U, D, I, N, G (six unique letters). - **Pudding**: Letters are P, U, D, I, N, G (six unique letters). 3. **No letters from "aftershock"**: - "Budding" and "Pudding" share no letters with "aftershock" (A, F, T, E, R, S, H, O, C, K). 4. **Slightly related to fruit**: - **Budding** refers to the growth of buds on plants, which is directly tied to fruit production. - **Pudding** is a dessert, but it might be considered a stretch for a "slightly related" connection to fruit (e.g., fruit-based puddings). **Final Answer**: The word is likely **"budding"** (more directly related to fruit growth) or **"pudding"** (as a dessert with potential fruit ties). Both are valid based on the criteria. Guess the hidden word (turn 4): pudding

hint: what is a 6 letter word a little related to roasts and sounding a little like coarse The 6-letter word you're looking for is "course". Relation to roasts: In a meal context, a "course" refers to a sequence of dishes (e.g., appetizer, main course, dessert), which can include roasted items. Sound like "coarse": "Course" and "coarse" are homophones (same pronunciation). This fits the clue perfectly! Guess the hidden word (turn 5): carrot

Building a wordle-like game with string metrics, phonetics, and semantic
similarity using machine learning and deep neural networks Dr Paul King, VP Apache Groovy & Distinguished Engineer, Object Computing Apache Groovy: Repo: Related blog Slides: Twitter/X | Mastodon | Bluesky: https://groovy.apache.org/ https://groovy-lang.org/ https://github.com/paulk-asert/groovy-string-similarity https://groovy.apache.org/blog/groovy-text-similarity https://speakerdeck.com/paulk/wordle-game @ApacheGroovy | @[email protected] | @groovy.apache.org Questions?

wordle game

wordle game

More Decks by paulking

Other Decks in Programming

Featured

Transcript