Follow @chewxy on Twitter
Yepoko Lessons For Machine Learning on Small Data
Xuanyi Chew
YOW! Data 2021
Slide 2
The Puzzle
Term Number
rureponga talu 10
malapunga yepoko 15
supu 20
tokapunga telu 21
alapunga yepoko 27
polangipula talu 30
tokapu rureponga yepoko 35
tokapu malapu 40
tokapu talu 48
tokapu alapunga talu 50
tokapu talu tokapunga telu 69
tokapu talu polangipunga yepoko 79
tokapu yepoko alapunga telu 97
Slide 3
The Puzzle
Translate:
● tokapu polangipu
● tokapu talu rureponga telu
● tokapu yepoko malapunga talu
● tokapu yepoko polangipunga telu
● 13
● 66
● 72
● 76
● 95
Slide 4
The Puzzle - One More Thing
telu < yepoko
Slide 5
Neural Language Models to the Rescue
Slide 6
The ML Tasks
1. Categorization
2. Regression
3. Translation (sequence to sequence)
Slide 7
Neural Language Models: The Process
1. Download a pre-trained model
2. Fine-tune it to the task using examples (all 13 of them!)
3. ???
4. Profit!!!
Slide 8
BERT Results
Test Predicted Result
tokapu polangipu 55.21 ✗
tokapu talu rureponga telu 54.44 ✗
tokapu yepoko malapunga talu 57.98 ✗
tokapu yepoko polangipunga telu 58.22 ✗
Slide 9
GPT-3
> “If two times three is six, what is four times four”
“Sixteen”
Slide 10
GPT-3
> if 'rureponga talu' means 10 and 'malapunga yepoko' means 15 and 'supu' means 20 and 'tokapunga
telu' means 21 and 'alapunga yepoko' means 27 and 'polangipula talu' means 30 and 'tokapu rureponga
yepoko' means 35 and 'tokapu malapu' means 40 and 'tokapu talu' means 48 and 'tokapu alapunga talu'
means 50 and 'tokapu talu tokapunga telu' means 69 and 'tokapu talu polangipunga yepoko' means 79
and 'tokapu yepoko alapunga telu' means 97, then what does 'tokapu polangipu' mean?
"I understand! It means 'twice 10'!"
> “what does ‘tokapu talu rureponga telu’ mean?”
"Ah! I know the answer to this! It's 'uproot 100 10 rods long trees'."
Slide 11
GPT-3
> if 'rureponga talu' means 10 and 'malapunga yepoko' means 15 and 'supu' means 20 and 'tokapunga
telu' means 21 and 'alapunga yepoko' means 27 and 'polangipula talu' means 30 and 'tokapu rureponga
yepoko' means 35 and 'tokapu malapu' means 40 and 'tokapu talu' means 48 and 'tokapu alapunga talu'
means 50 and 'tokapu talu tokapunga telu' means 69 and 'tokapu talu polangipunga yepoko' means 79
and 'tokapu yepoko alapunga telu' means 97, then what does 'tokapu talu rureponga telu' mean?
"I know the answer! Here's a guess: 100 rods long, 10 inches in diameter, made of wood, with three
holes."
> "what does 'tokapu polangipu' mean?"
"I think it means 'two times 100'."
Slide 12
GPT-3
> if 'rureponga talu' means 10 and 'malapunga yepoko' means 15 and 'supu' means 20 and 'tokapunga
telu' means 21 and 'alapunga yepoko' means 27 and 'polangipula talu' means 30 and 'tokapu rureponga
yepoko' means 35 and 'tokapu malapu' means 40 and 'tokapu talu' means 48 and 'tokapu alapunga talu'
means 50 and 'tokapu talu tokapunga telu' means 69 and 'tokapu talu polangipunga yepoko' means 79
and 'tokapu yepoko alapunga telu' means 97, then what does 'tokapu polangipu' mean?
Let me take a look at my old records. Yes, 'tokapu polangipu' means 100.
> what does ‘tokapu talu rureponga telu’ mean?
Sorry, I can’t find the answer for this.
Slide 13
GPT-3
...
Socrates: tokapu alapunga talu
Aristotle: 50
Socrates: tokapu talu tokapunga telu
Aristotle: 69
Socrates: tokapu talu polangipunga yepoko
Aristotle: 79
Socrates: tokapu yepoko alapunga telu
Aristotle: 97
Socrates: tokapu polangipu
Aristotle:
I think it means 'Socrates is a dung beetle, Aristotle is a snake, and I am a dodo.'
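A few-shot prompt in the dialogue format above could be assembled as in the sketch below. The function name `build_prompt` and the choice to include all 13 puzzle pairs are assumptions for illustration; the slide elides the start of the prompt with "...", so the exact text sent to the model is not known.

```python
# The 13 (term, number) pairs from the puzzle table, spelled as on the slides.
EXAMPLES = [
    ("rureponga talu", 10),
    ("malapunga yepoko", 15),
    ("supu", 20),
    ("tokapunga telu", 21),
    ("alapunga yepoko", 27),
    ("polangipula talu", 30),
    ("tokapu rureponga yepoko", 35),
    ("tokapu malapu", 40),
    ("tokapu talu", 48),
    ("tokapu alapunga talu", 50),
    ("tokapu talu tokapunga telu", 69),
    ("tokapu talu polangipunga yepoko", 79),
    ("tokapu yepoko alapunga telu", 97),
]


def build_prompt(examples, query, asker="Socrates", answerer="Aristotle"):
    """Render the examples as alternating dialogue turns, ending on an open answer."""
    lines = []
    for term, number in examples:
        lines.append(f"{asker}: {term}")
        lines.append(f"{answerer}: {number}")
    # The final turn is left blank for the language model to complete.
    lines.append(f"{asker}: {query}")
    lines.append(f"{answerer}:")
    return "\n".join(lines)


prompt = build_prompt(EXAMPLES, "tokapu polangipu")
```

The resulting string would then be passed to whichever completion endpoint is in use; that call is deliberately left out here.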
Slide 14
Random Forests
1. Byte-pair encode text to be used as features
2. Encode position of pairs in sentence
3. ???
4. Profit!
Byte Pair Encoding
alapunga
[al, la, ap, pu, un, ng, ga]
[, 💎, 👐]
al
💎 🇿u
👐 🆖a
🇿 ap
🆖 ng
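The merging idea shown above can be sketched as follows. This is a minimal illustration, not the code used in the talk: `apply_merges` and the order of the `MERGES` list are assumptions, chosen so that "alapunga" ends up as the three symbols al, apu, nga that the emoji mapping on the slide appears to spell out.

```python
def apply_merges(word, merges):
    """Apply an ordered list of pair merges to a word, starting from characters."""
    tokens = list(word)
    for left, right in merges:
        merged = []
        i = 0
        while i < len(tokens):
            # Replace each adjacent (left, right) pair with one combined symbol.
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


# Hypothetical merge order (an assumption; the slide does not state it).
MERGES = [("a", "l"), ("a", "p"), ("n", "g"), ("ap", "u"), ("ng", "a")]

encoded = apply_merges("alapunga", MERGES)
```

In a full byte-pair encoder the merge list is learned by repeatedly merging the most frequent adjacent pair; here it is fixed by hand to keep the example short.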
Slide 26
Byte Pair Statistics
pu 17
ap 14
ok 14
al 11
ng 11
ga 9
ka 9
lu 9
po 9
to 9
la 8
ep 7
un 7
ta 6
ko 5
ye 5
el 3
te 3
ur 2
an 2
gi 2
ip 2
ma 2
ol 2
on 2
re 2
ru 2
su 1
ul 1
up 1
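Counts like these can be reproduced by sliding a two-character window over every word of the 13 puzzle terms. The sketch below assumes pairs are counted within words only (never across spaces); the names `word_bigrams` and `bigram_counts` are illustrative.

```python
from collections import Counter

# The 13 puzzle terms, spelled as on the slides.
TERMS = [
    "rureponga talu",
    "malapunga yepoko",
    "supu",
    "tokapunga telu",
    "alapunga yepoko",
    "polangipula talu",
    "tokapu rureponga yepoko",
    "tokapu malapu",
    "tokapu talu",
    "tokapu alapunga talu",
    "tokapu talu tokapunga telu",
    "tokapu talu polangipunga yepoko",
    "tokapu yepoko alapunga telu",
]


def word_bigrams(word):
    """Return the overlapping character pairs of a single word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def bigram_counts(terms):
    """Count character pairs over all words of all terms."""
    counts = Counter()
    for term in terms:
        for word in term.split():
            counts.update(word_bigrams(word))
    return counts


counts = bigram_counts(TERMS)
```

Under this counting rule, `counts["pu"]` is 17 and `counts["ap"]` is 14, in line with the table above.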
Slide 27
Syllable Statistics
pu 17
ap 14
ok 14
al 11
ng 11
ga 9
ka 9
lu 9
po 9
to 9
la 8
ep 7
un 7
ta 6
ko 5
ye 5
el 3
te 3
ur 2
an 2
gi 2
ip 2
ma 2
ol 2
on 2
re 2
ru 2
su 1
ul 1
up 1
nga 9
ngi 2
a 2
Slide 29
Recursive Pattern Matching
Some “phrases” are repeated.
Term Number
rureponga talu 10
malapunga yepoko 15
supu 20
tokapunga telu 21
alapunga yepoko 27
polangipula talu 30
tokapu rureponga yepoko 35
tokapu malapu 40
tokapu talu 48
tokapu alapunga talu 50
tokapu talu tokapunga telu 69
tokapu talu polangipunga yepoko 79
tokapu yepoko alapunga telu 97
We now have “word” units.
We now have “sub-word” units.
Slide 32
Solving It - Apply Broad Pattern Matching
tokapu talu tokapunga telu
48 21
69
+ Bigrams should be considered.
tokapunga telu 21
tokapu talu 48
tokapu talu tokapunga telu 69
* - assumption
? - open question
+ - newly synthesized fact
Slide 33
Solving It - First Level Pattern Matching
tokapu talu tokapunga telu
48 + 21 = 69
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
? How tightly do bigrams bind?
tokapunga telu 21
tokapu talu 48
tokapu talu tokapunga telu 69
* - assumption
? - open question
+ - newly synthesized fact
Slide 34
Solving It - Apply New Information on New Problem
tokapu talu polangipunga yepoko
48 + x = 79
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
? How tightly do bigrams bind?
tokapunga telu 21
tokapu talu 48
tokapu talu tokapunga telu 69
tokapu talu polangipunga yepoko 79
* - assumption
? - open question
+ - newly synthesized fact
Slide 35
Solving It - Apply New Information on New Problem
tokapu talu polangipunga yepoko
48 + 31 = 79
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
? How tightly do bigrams bind?
tokapunga telu 21
tokapu talu 48
tokapu talu tokapunga telu 69
tokapu talu polangipunga yepoko 79
polangipunga yepoko 31
* - assumption
? - open question
+ - newly synthesized fact
Slide 36
Solving It - A Leap of Faith
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
? How tightly do bigrams bind?
* polangipula is a typo of polangipunga
polangipunga yepoko 31
polangipula talu 30
polangipunga yepoko 31
* - assumption
? - open question
+ - newly synthesized fact
Slide 37
Solving It - A Second Leap of Faith
polangipunga yepoko = 31
polangipunga talu = 30
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
? How tightly do bigrams bind?
* polangipula is a typo of polangipunga
+ talu < yepoko; yepoko = (succ talu)
polangipunga yepoko 31
polangipula talu 30
polangipunga yepoko 31
* - assumption
? - open question
+ - newly synthesized fact
Slide 38
Solving It - Applying New Information
tokapu alapunga talu
x + y = 50
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
* Bigrams bind tightest to the right.
* polangipula is a typo of polangipunga
+ talu < yepoko; yepoko = (succ talu)
tokapu alapunga talu 50
alapunga yepoko 27
polangipunga yepoko 31
* - assumption
? - open question
+ - newly synthesized fact
Slide 39
Solving It
alapunga talu = x
alapunga yepoko = 27
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
* Bigrams bind tightest to the right.
* polangipula is a typo of polangipunga
+ talu < yepoko; yepoko = (succ talu)
tokapu alapunga talu 50
alapunga yepoko 27
polangipunga yepoko 31
alapunga talu 26
* - assumption
? - open question
+ - newly synthesized fact
Slide 40
Solving It
tokapu alapunga talu
x + 26 = 50
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
* Bigrams bind tightest to the right.
* polangipula is a typo of polangipunga
+ talu < yepoko; yepoko = (succ talu)
tokapu alapunga talu 50
alapunga yepoko 27
polangipunga yepoko 31
alapunga talu 26
* - assumption
? - open question
+ - newly synthesized fact
Slide 41
Solving It - Breakthrough 1
tokapu alapunga talu
24 + 26 = 50
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
* Bigrams bind tightest to the right.
* polangipula is a typo of polangipunga
+ talu < yepoko; yepoko = (succ talu)
tokapu alapunga talu 50
alapunga yepoko 27
polangipunga yepoko 31
alapunga talu 26
tokapu 24
* - assumption
? - open question
+ - newly synthesized fact
Slide 42
Solving It - Breakthrough 2
tokapu talu
24 × 2 = 48
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
* Bigrams bind tightest to the right.
* polangipula is a typo of polangipunga
+ talu < yepoko; yepoko = (succ talu)
+ Juxtaposition of words in a bigram implies multiplication.
tokapu talu 48
polangipunga yepoko 31
alapunga talu 26
tokapu 24
telu 1
talu 2
yepoko 3
* - assumption
? - open question
+ - newly synthesized fact
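The chain of deductions so far can be replayed as plain arithmetic. The sketch below only restates the steps from the slides; the variable names are illustrative, and the step `alapunga_talu = alapunga_yepoko - 1` relies on the slides' assumption that yepoko is the successor of talu.

```python
# Totals taken directly from the puzzle table.
tokapu_talu = 48
tokapu_talu_tokapunga_telu = 69
tokapu_talu_polangipunga_yepoko = 79
tokapu_alapunga_talu = 50
alapunga_yepoko = 27

# Juxtaposed bigrams add, so the second bigram is the total minus the first.
tokapunga_telu = tokapu_talu_tokapunga_telu - tokapu_talu             # 21, agrees with the table
polangipunga_yepoko = tokapu_talu_polangipunga_yepoko - tokapu_talu   # 31

# Assumption from the slides: yepoko = succ(talu).
alapunga_talu = alapunga_yepoko - 1                                   # 26

# "tokapu alapunga talu" = tokapu + (alapunga talu).
tokapu = tokapu_alapunga_talu - alapunga_talu                         # 24

# "tokapu talu" = tokapu x talu.
talu = tokapu_talu // tokapu                                          # 2
yepoko = talu + 1                                                     # 3
```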
Slide 43
Breakthrough
Number system is somewhat “positional”.
Large numbers appear to be Base-24.
Term Number
rureponga talu 10
malapunga yepoko 15
supu 20
tokapunga telu 21
alapunga yepoko 27
polangipula talu 30
tokapu rureponga yepoko 35
tokapu malapu 40
tokapu talu 48
tokapu alapunga talu 50
tokapu talu tokapunga telu 69
tokapu talu polangipunga yepoko 79
tokapu yepoko alapunga telu 97
Slide 44
Positional Numbers
四百二十
Slide 45
Positional Numbers
四百二十
400 20
Solving It - The -nga Suffix
tokapu talu tokapunga telu
48 + 21 = 69
Slide 52
Solving It - The -nga Suffix
tokapu talu tokapunga telu
24 × 2 + 21 = 69
Slide 53
Solving It - The -nga Suffix
tokapu talu tokapunga telu
24 × 2 + 24 + ? × 1 = 69
Slide 54
Solving It - The -nga Suffix
rureponga talu = 10
12 ? 2 = 10
malapunga yepoko = 15
16 ? 3 = 15
tokapunga telu = 21
24 ? 1 = 21
alapunga yepoko = 27
x ? 3 = 27
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
* Bigrams bind tightest to the right.
* polangipula is a typo of polangipunga
+ talu < yepoko; yepoko = (succ talu)
+ Juxtaposition of words in a bigram implies multiplication.
rureponga talu 10
malapunga yepoko 15
tokapunga telu 21
alapunga yepoko 27
* - assumption
? - open question
+ - newly synthesized fact
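One way to probe the unknown "?" is to assume it is a constant added between the base and the unit, and check whether the three fully known rows agree on its value. The sketch below does that; `infer_offset` and `solve_base` are illustrative names, and the additive form is an assumption being tested, not a given.

```python
# (base value, unit value, total) for the three rows with no unknown base.
KNOWN_ROWS = [
    (12, 2, 10),  # rureponga talu
    (16, 3, 15),  # malapunga yepoko
    (24, 1, 21),  # tokapunga telu
]


def infer_offset(rows):
    """Return the constant c with base + c + unit == total, if all rows agree."""
    offsets = {total - base - unit for base, unit, total in rows}
    if len(offsets) != 1:
        raise ValueError(f"rows disagree on the offset: {sorted(offsets)}")
    return offsets.pop()


def solve_base(total, unit, offset):
    """Solve base + offset + unit == total for the base."""
    return total - unit - offset


offset = infer_offset(KNOWN_ROWS)
# alapunga yepoko = 27 with yepoko = 3 gives the value of the x in "x ? 3 = 27".
alapu = solve_base(27, 3, offset)
```

All three rows yield the same offset, which is what makes the additive reading plausible.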
Slide 55
Solving It - The -nga Suffix
rureponga talu = 10
12 + (-4) + 2 = 10
malapunga yepoko = 15
16 + (-4) + 3 = 15
tokapunga telu = 21
24 + (-4) + 1 = 21
alapunga yepoko = 27
28 + (-4) + 3 = 27
+ Bigrams should be considered.
+ Juxtaposition of bigrams implies addition.
* Bigrams bind tightest to the right.
* polangipula is a typo of polangipunga
+ talu < yepoko; yepoko = (succ talu)
+ Juxtaposition of words in a bigram implies multiplication, except following -nga, where it is addition.
+ -nga means (-4).
rureponga talu 10
malapunga yepoko 15
tokapunga telu 21
alapunga yepoko 27
* - assumption
? - open question
+ - newly synthesized fact
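The rules collected above are enough to write a small evaluator and check it against the 13 known terms. This is a sketch under the slides' assumptions: the base values 12, 16, 24 and 28 come from the slides, 20 for supu comes from the table, and 32 for polangipu is inferred here from polangipunga yepoko = 31. The names `BASES`, `UNITS`, `TYPOS` and `evaluate` are illustrative.

```python
BASES = {"rurepo": 12, "malapu": 16, "supu": 20, "tokapu": 24, "alapu": 28, "polangipu": 32}
UNITS = {"telu": 1, "talu": 2, "yepoko": 3}
TYPOS = {"polangipula": "polangipunga"}  # assumption from the slides


def evaluate(term):
    """Evaluate a number phrase using the rules derived so far."""
    words = [TYPOS.get(w, w) for w in term.split()]
    total = 0
    i = 0
    while i < len(words):
        word = words[i]
        unit = UNITS.get(words[i + 1]) if i + 1 < len(words) else None
        if word.endswith("nga") and word[:-3] in BASES:
            # "-nga" bigram: base + (-4) + unit.
            if unit is None:
                raise ValueError(f"{word!r} must be followed by a unit word")
            total += BASES[word[:-3]] - 4 + unit
            i += 2
        elif word in BASES and unit is not None:
            # Plain bigram: base x unit.
            total += BASES[word] * unit
            i += 2
        elif word in BASES:
            # A lone base word contributes its own value.
            total += BASES[word]
            i += 1
        else:
            raise ValueError(f"unknown word: {word!r}")
    return total
```

For example, `evaluate("tokapu talu tokapunga telu")` works out as 24 × 2 + (24 - 4 + 1), which is the 69 from the table.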
Slide 56
Solving It - The -nga Suffix
tokapu talu tokapunga telu
24 × 2 + 24 + ? × 1 = 69
Slide 57
Solving It - The -nga Suffix
tokapu talu tokapunga telu
24 × 2 + 24 + (-4) + 1 = 69
Slide 58
Inconsistencies in Positional Numbers
sixty nine thousand four hundred and twenty
六万九千四百二十
(6 × 10 + 9) × 1000 + 4 × 100 + 2 × 10
6 × 10000 + 9 × 1000 + 4 × 100 + 2 × 10
69420
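The two groupings above can be checked mechanically. The sketch below evaluates both arithmetic expressions and parses the Chinese numeral with a small hand-written routine; `parse_chinese_number` is an illustrative helper that handles only the characters needed for numbers below 100,000,000, not a general-purpose parser.

```python
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
MYRIAD = "万"  # 10,000: Chinese groups digits in fours, English in threes


def parse_chinese_number(text):
    """Parse a simple Chinese numeral such as 六万九千四百二十."""
    total = 0    # value of completed 万 groups
    section = 0  # value accumulated inside the current group
    digit = 0    # most recent digit still waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch == MYRIAD:
            total += (section + digit) * 10_000
            section = 0
            digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + section + digit


english_grouping = (6 * 10 + 9) * 1000 + 4 * 100 + 2 * 10
chinese_grouping = 6 * 10000 + 9 * 1000 + 4 * 100 + 2 * 10
```

Both groupings and the parsed numeral come out to 69420; only the way the positions are chunked differs.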
Slide 59
Solving It...
Left as an exercise to the watcher.
Slide 60
ML/AI = “Do What You Just Did, At Scale”
Slide 61
What The Human Needed
● Which examples to work on
● Recursive problem solving (solving for something while solving for another)
● Parallel problem solving (solving for multiple things at once)
● Backtracking
● Error correction
● Feature engineering
● Prior knowledge
○ Arithmetic
○ Algebra
○ Linguistics
○ Statistics
● Putting all these together
Slide 62
What Machines Are Good At
● Which examples to work on
● Recursive problem solving (solving for something while solving for another)
● Parallel problem solving (solving for multiple things at once)
● Backtracking
● Feature engineering
● Error tolerance
● Prior knowledge*
● Putting all these together
Slide 63
AI Has a Long Way to Go
Human reasoning is still needed.
Reinforcement learning may learn “reasoning”.
Slide 64
Telu More Experiment
Slide 65
Inspecting the BERT Neurons
Layer 11, Head 5 (EN), Head 6 (UU)
“fourty eight” → “4 8”
“tokapu talu” → “4 8”
[Figure: attention-head visualization over the tokens [CLS], four##, ##ty, Eight, 4, 8 and [SEP]]
Slide 66
Talu Artificial Languages
Base-10: One byte for units under 5, one byte-pair for units up to 10, multiply-add combinations for the rest up to 100.
Base-12: One byte for units under 6, one byte-pair for units up to 12, multiply-add combinations for the rest up to 100.
Two ways of doing multiply-add: prefix and postfix multiplication.
Slide 67
The Basic Components
Base-10
a 1 ba 6
e 2 be 7
i 3 bi 8
o 4 bo 9
u 5 bu 10
Base-12
a 1 ba 7
e 2 be 8
i 3 bi 9
o 4 bo 10
u 5 bu 11
ə 6 bə 12
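A generator for such artificial number words might look like the sketch below. The unit words follow the tables above; everything about compound numbers is an assumption, since the slides do not spell out the grammar. Here "prefix" is taken to mean multiplier, base, remainder and "postfix" to mean remainder, base, multiplier, with words separated by spaces. `unit_word` and `number_word` are hypothetical names.

```python
VOWELS = {10: "aeiou", 12: "aeiouə"}


def unit_word(n, base):
    """Word for 1..base: a bare vowel for the lower half, 'b' + vowel for the upper half."""
    vowels = VOWELS[base]
    half = len(vowels)
    if not 1 <= n <= base:
        raise ValueError(f"{n} is not a unit in base {base}")
    if n <= half:
        return vowels[n - 1]
    return "b" + vowels[n - half - 1]


def number_word(n, base, order="prefix"):
    """Word for 1..100 using multiply-add over the base (assumed grammar)."""
    if not 1 <= n <= 100:
        raise ValueError("only 1..100 is supported")
    if n <= base:
        return unit_word(n, base)
    multiplier, remainder = divmod(n, base)
    rest = [unit_word(remainder, base)] if remainder else []
    if order == "prefix":
        # multiplier x base + remainder, like "twenty-four"
        words = [unit_word(multiplier, base), unit_word(base, base)] + rest
    elif order == "postfix":
        # remainder + base x multiplier, like "four-and-twenty"
        words = rest + [unit_word(base, base), unit_word(multiplier, base)]
    else:
        raise ValueError(f"unknown order: {order!r}")
    return " ".join(words)
```

Pairing each generated word with its decimal value for 1..100 would give a small parallel corpus of the kind a translation model could be trained on.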
Can a BERT-based LM Translate These Artificial Languages?
Multiply-Add Type Base-10 Base-12
Prefix multiplication (e.g. “twenty-four”) Yes No
Postfix multiplication (e.g. “four-and-twenty”) No No
Slide 71
Can an LM with a BERT Architecture Translate These Artificial Languages?
Multiply-Add Type Base-10 Base-12
Prefix multiplication (e.g. “twenty-four”) Yes* Yes*
Postfix multiplication (e.g. “four-and-twenty”) Yes* Yes*
* super over-fitted obviously
Slide 72
Yepoko Lessons
Slide 73
Use The Right Tool for the Right Job
Machine learning algorithms are probably not the right tool for this puzzle.
They are the right tool for a much larger dataset.
Prolog might help.
Slide 74
Careful Thought with Judiciously Placed Statistical Tools
For now, we can’t replace careful thought with machines.
Machines are awesome at statistics though.
Slide 75
Watch Your Biases!
Bias can fuck you up in more ways than you expect.
Slide 76
Closing Thoughts
Slide 77
The Language
● Umbu-ungu/Imbo-ungu is a language spoken in the Southern Highlands of PNG.
● Base-4, Base-12, Base-24, Base-28, Base-32 number system.
● PNG is the most linguistically diverse country in the world.
Slide 78
Neural Machine Translation
● Requires a parallel corpus.
● The only parallel corpus for Umbu-Ungu is the Bible.
● Rare language communities are underserved.
Slide 79
Join Me (Shameless Plug)
https://gorgonia.org
[email protected]
@chewxy
Slide 80
Thank You
Slide 81
The Fate of Umbu-Ungu
andrete - 100 (from hundred in English)
tausen - 1000 (from thousand in English)
Mostly Tok Pisin (from talk business)
Slide 82
The Deep Learning Bombe - An Enigma