Man Bites Dog: A Journey of LLMs and What Comes Next

Xuanyi Chew
Live from Gophercon Singapore
The Concurrent Times. Friday, 3 November 2023
MAN BITES DOG!
In a bizarre twist of events, a linguist studying word order bit a dog during a conference on large language models. Dr. Noam Lingo, while presenting his research on how word order affects perception, used the classic example "Man bites dog" versus "Dog bites man". To demonstrate, he playfully "bit" a robotic dog powered by Gorgonia, a Go programming language library for machine learning. The robotic dog, named "GPT-K9", reportedly reacted to the bite with an error message: "Syntax error: unexpected bite." GPT-K9 was last seen entering a blue police box with the doctor.

hand : Noun
pen : Noun
book : Noun
read : Verb
write : Verb
speak : Verb

define : Verb. Show or tell what a word means.
example : Noun. A thing to show to help define what a word means.

many: More than one.
order: Many things in a line.

sentence: Many words in order.
valid: Good by the rules.
grammar: A set of rules that defines if a sentence is valid or not.
vocabulary: A set of words where you know what a word means.

Language = Grammar + Vocabulary

S ::= NP VP VP
NP ::= { n | n ∈ Nouns }
VP ::= { v | v ∈ Verbs }

🜣 🜟 ꢯ lén katj jìng

Rules of The Lien Language
S ::= VP NP NP
NP ::= ꢯ | 🜣
VP ::= 🜟

Is This Sentence Valid?
S ::= VP NP NP
NP ::= ꢯ | 🜣
VP ::= 🜟
🜣 🜟 ꢯ ✖
🜣 ꢯ 🜟 ✖
🜟 🜣 ꢯ ✔
ꢯ 🜣 🜟 ✖
🜟 ꢯ 🜣 ✔
ꢯ 🜟 🜣 ✖
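
The validity check for the rule S ::= VP NP NP can be written down mechanically. The following is a minimal sketch, not something from the talk: the function name `Valid` and the map-based vocabulary are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Vocabulary of the toy language: NP ::= ꢯ | 🜣 and VP ::= 🜟.
var (
	nouns = map[string]bool{"ꢯ": true, "🜣": true}
	verbs = map[string]bool{"🜟": true}
)

// Valid reports whether words form a sentence under S ::= VP NP NP.
// (Hypothetical helper, written for this example.)
func Valid(words []string) bool {
	return len(words) == 3 && verbs[words[0]] && nouns[words[1]] && nouns[words[2]]
}

func main() {
	sentences := []string{
		"🜣 🜟 ꢯ", "🜣 ꢯ 🜟", "🜟 🜣 ꢯ",
		"ꢯ 🜣 🜟", "🜟 ꢯ 🜣", "ꢯ 🜟 🜣",
	}
	for _, s := range sentences {
		mark := "✖"
		if Valid(strings.Fields(s)) {
			mark = "✔"
		}
		fmt.Println(s, mark)
	}
}
```

Only the two sentences that start with the verb 🜟 pass, which matches the marks on the slide.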

syllable: A sound that the mouth makes.

Recap
Vocabulary = set of words
Grammar = set of rules
Language = Grammar + Vocabulary = the set of all sentences generated by a given grammar and vocabulary
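
To make "the set of all sentences generated by a given grammar and vocabulary" concrete, here is a small sketch that enumerates the language of the toy rule S ::= VP NP NP from earlier. The function name `Sentences` is an assumption for illustration, and the rule is hard-coded rather than parsed.

```go
package main

import "fmt"

// Sentences enumerates every sentence generated by S ::= VP NP NP
// for the given verbs and nouns. (Hypothetical helper; the rule is
// hard-coded as three nested loops.)
func Sentences(verbs, nouns []string) []string {
	var out []string
	for _, v := range verbs {
		for _, n1 := range nouns {
			for _, n2 := range nouns {
				out = append(out, v+" "+n1+" "+n2)
			}
		}
	}
	return out
}

func main() {
	// Vocabulary: VP ::= 🜟, NP ::= ꢯ | 🜣
	for _, s := range Sentences([]string{"🜟"}, []string{"ꢯ", "🜣"}) {
		fmt.Println(s)
	}
}
```

With one verb and two nouns, the language has exactly four sentences.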

On Word Order
Verb-Subject-Object: 🜟 🜣 ꢯ
Subject-Verb-Object: man bites dog
Subject-Object-Verb: 男が犬を噛む

Murrinhpatha Has Free Word Order
ku were bangamlele kardu
kardu bangamlele ku were

Sublanguages
Language A is a sublanguage of Language B if every sentence in A is contained in the set of all sentences in B.
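
For finite languages, the sublanguage relation is just a subset test. The sketch below assumes a language can be represented as a finite set of sentence strings; the names `Language` and `IsSublanguage` are made up for this example.

```go
package main

import "fmt"

// Language is a finite set of sentences (an assumption for this sketch;
// real languages are usually infinite).
type Language map[string]struct{}

// IsSublanguage reports whether every sentence of a is also in b.
func IsSublanguage(a, b Language) bool {
	for s := range a {
		if _, ok := b[s]; !ok {
			return false
		}
	}
	return true
}

func main() {
	a := Language{"man bites dog": {}}
	b := Language{"man bites dog": {}, "dog bites man": {}}
	fmt.Println(IsSublanguage(a, b)) // true
	fmt.Println(IsSublanguage(b, a)) // false
}
```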

Examples of Sublanguages
Written English ≺ English
Written English ≺ Internet Text
Go Programming Language ≺ Internet Text

Gold’s Theorem  Speaker Listener

Gold’s Theorem
Teacher, Learner
S ::= VP NP NP
NP ::= ꢯ | 🜣
VP ::= 🜟 | ⩹
🜟 🜣 ꢯ
⩹ 🜣 ∅
…
L1, L2, …, Ln, L∞, …, Lk
🜟 🜣 ꢯ

Gold’s Theorem
L1, L2, …, Ln, L∞, Lk, …
Listener, Speaker

Gold’s Theorem
En, Zh, …, JS, L∞, Go, …
Speaker: func I[T any](x T) T { return x }
Listener: Go ≺ L∞

Memorization
1. 🜟 ꢯ 🜣
2. 🜟 🜣 ꢯ
3. ⩹ ꢯ ∅
4. ⩹ 🜣 ∅

All Possible Languages
Sentences:
1. 🜟 ꢯ 🜣
2. 🜟 🜣 ꢯ
3. ⩹ ꢯ ∅
4. ⩹ 🜣 ∅
Languages:
1. (empty)
2. s1
3. s2
4. s1 s2
5. s3
6. s1 s3
7. s2 s3
8. s1 s2 s3
9. s4
10. s1 s4
11. s2 s4
12. s1 s2 s4
13. s3 s4
14. s1 s3 s4
15. s2 s3 s4
16. s1 s2 s3 s4
Ah Lien: 🜟 ꢯ 🜣
Kiasu: L6!
Chewxy: L2 | L4 | L6 | L8 | L10 | L12 | L14 | L16
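
The sixteen candidate languages are the subsets of the four sentences, and hearing one sentence only rules out the languages that lack it. The sketch below reproduces that count; the function names `Languages` and `Consistent` are assumptions for illustration, and the numbering follows the slide (language k holds sentence i when bit i of k-1 is set).

```go
package main

import "fmt"

// Languages returns every subset of sentences. Index k-1 holds language k,
// which contains sentence i exactly when bit i of k-1 is set.
func Languages(sentences []string) [][]string {
	n := 1 << len(sentences)
	langs := make([][]string, n)
	for mask := 0; mask < n; mask++ {
		for i, s := range sentences {
			if mask&(1<<i) != 0 {
				langs[mask] = append(langs[mask], s)
			}
		}
	}
	return langs
}

// Consistent returns the 1-based numbers of all languages containing heard.
func Consistent(langs [][]string, heard string) []int {
	var out []int
	for k, lang := range langs {
		for _, s := range lang {
			if s == heard {
				out = append(out, k+1)
				break
			}
		}
	}
	return out
}

func main() {
	sentences := []string{"🜟 ꢯ 🜣", "🜟 🜣 ꢯ", "⩹ ꢯ ∅", "⩹ 🜣 ∅"}
	langs := Languages(sentences)
	fmt.Println(len(langs))                      // 16
	fmt.Println(Consistent(langs, sentences[0])) // [2 4 6 8 10 12 14 16]
}
```

After a single sentence, eight languages remain equally consistent with the evidence, so naming any one of them is a guess.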

Statistical Learning 101  In order to generalize, a good learner
cannot memorize!

Large Language Models 

Neural Networks  Y = σ(W’X + b)

Neural Networks
Y = σ(W’X + b)
func NN[T, U any](x T) U

Simple Linear Regression  y = mx + c

Neural Networks
Y = σ(W’X + b)
σ = I, W = m, b = c
σ = I, W = [m, c], b = 0
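
One way to read the annotations on Y = σ(W’X + b): with the identity as activation, weights [m, c], zero bias, and the input extended to [x, 1], a single neuron computes the line y = mx + c. The sketch below checks this numerically. The [x, 1] input and the names `neuron` and `Line` are assumptions made for this example.

```go
package main

import "fmt"

// identity is the activation σ = I.
func identity(z float64) float64 { return z }

// neuron computes σ(W'X + b) for a single output.
func neuron(sigma func(float64) float64, w, x []float64, b float64) float64 {
	z := b
	for i := range w {
		z += w[i] * x[i]
	}
	return sigma(z)
}

// Line evaluates y = mx + c as a neuron with W = [m, c], X = [x, 1], b = 0.
// (Extending the input with a constant 1 is an assumption of this sketch.)
func Line(m, c, x float64) float64 {
	return neuron(identity, []float64{m, c}, []float64{x, 1}, 0)
}

func main() {
	m, c, x := 2.0, 3.0, 5.0
	fmt.Println(Line(m, c, x), m*x+c) // 13 13
}
```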

Neural Networks
Y = σ₁(W₁’X + b₁), fed into a second layer: σ₂(W₂’Y + b₂)

Deep Neural Networks
Y = σ₂(W₂’σ₁(W₁’X + b₁) + b₂)
[0.3m, 0.6c] [0.7m, 0.4c]
Note: “spreading out” of values of m and c is shown for illustration purposes only
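
The two-layer formula Y = σ₂(W₂’σ₁(W₁’X + b₁) + b₂) is function composition: the output of one layer is the input of the next. The sketch below uses plain slices rather than any tensor library. The weights, the choice of ReLU for σ₁ and identity for σ₂, and the names `Dense` and `Forward` are all assumptions picked for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Dense computes σ(W'X + b). Row j of wT holds the weights that feed
// output j, so wT is W already transposed.
func Dense(sigma func(float64) float64, wT [][]float64, x, b []float64) []float64 {
	y := make([]float64, len(wT))
	for j, row := range wT {
		z := b[j]
		for i, w := range row {
			z += w * x[i]
		}
		y[j] = sigma(z)
	}
	return y
}

func relu(z float64) float64     { return math.Max(0, z) }
func identity(z float64) float64 { return z }

// Forward computes Y = σ2(W2'σ1(W1'X + b1) + b2) with made-up weights.
func Forward(x []float64) []float64 {
	w1T := [][]float64{{1, -1}, {0.5, 0.5}}
	b1 := []float64{0, 0}
	w2T := [][]float64{{1, 2}}
	b2 := []float64{0.5}
	h := Dense(relu, w1T, x, b1)       // first layer: σ1(W1'X + b1)
	return Dense(identity, w2T, h, b2) // second layer: σ2(W2'h + b2)
}

func main() {
	fmt.Println(Forward([]float64{2, 1})) // [4.5]
}
```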

Training LLMs
1. Pretraining
2. Supervised finetuning
3. Alignment

Pretraining
LLM
“hello” “gophercon” “singapore”
“hello gophercon singapore”
“hello gophercon au”
…
“<html> <body> … </html>”
“func add(x, y int) int {...”

Supervised Finetuning
LLM
“q” “:” “answer” … “A” “:” …
“input: summarize the following paragraph … output: … ”
“Q: Answer the following yes/no question by reasoning step-by-step. … A: …”

Alignment
LLM
Prompt
a. …
b. …
c. …
1. b
2. a
3. c

What Do LLMs Learn?  LLMs learn the grammar of a
language and knowledge embedded in language

Where Is Knowledge Found?
Claim: LLMs have a world model.
Counterclaim: A bag of words with gradient boosting has a world model.
Language Models Represent Space and Time: https://arxiv.org/abs/2310.02207
Bag of words with gradient boosting shows a world model: https://twitter.com/ArthurB/status/1711475192461971860

How To Test If LLMs Have Knowledge Outside of Language?
1. Take something written in plain natural language.
2. Replace nouns and verbs with consistent random strings.
3. Use it as a prompt to the LLM.
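
Step 2 of the procedure above can be sketched as a small substitution function. This is an illustration under stated assumptions, not the script used for the talk: the name `Obfuscate`, the fixed seed, the same-length random strings, and the naive handling of a trailing "s" for plurals are all choices made for this example, and the list of words to replace is supplied by hand.

```go
package main

import (
	"fmt"
	"math/rand"
	"regexp"
	"strings"
)

var wordRE = regexp.MustCompile(`[A-Za-z]+`)

// Obfuscate replaces each listed word in text with a random lowercase string
// of the same length. A word always maps to the same string, and a trailing
// "s" is kept so that plurals stay recognizable.
func Obfuscate(text string, words []string, seed int64) (string, map[string]string) {
	rng := rand.New(rand.NewSource(seed))
	mapping := make(map[string]string, len(words))
	for _, w := range words {
		b := make([]byte, len(w))
		for i := range b {
			b[i] = byte('a' + rng.Intn(26))
		}
		mapping[strings.ToLower(w)] = string(b)
	}
	out := wordRE.ReplaceAllStringFunc(text, func(tok string) string {
		low := strings.ToLower(tok)
		if r, ok := mapping[low]; ok {
			return r
		}
		// Naive plural handling: strip one trailing "s" and retry.
		if stem := strings.TrimSuffix(low, "s"); stem != low {
			if r, ok := mapping[stem]; ok {
				return r + "s"
			}
		}
		return tok
	})
	return out, mapping
}

func main() {
	text := "A point is that which has no parts. The extremities of a line are points."
	out, mapping := Obfuscate(text, []string{"point", "part", "line"}, 42)
	fmt.Println(out)
	fmt.Println(mapping)
}
```

The returned mapping makes it possible to translate the model's answer back into plain words afterwards.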

How Do We Know LLMs Don’t Understand Arithmetic?
I tried 22 times before I gave up! 22 wrong answers later, I gave up.

Why Do LLMs Perform So Well?
LLMs memorize the world!
Quantifying Memorization Across Neural Language Models: https://arxiv.org/abs/2202.07646

Can LLMs Extract Knowledge From Structure?
Definitions
1. A point is that which has no parts.
2. A line is a breadthless length.
3. The extremities of a line are points.
4. A straight or right line is that which lies evenly between its extremities.
5. A surface is that which has length and breadth only.
6. A plane angle is the inclination of two lines to one another, in a plane, which meet together, but are not in the same direction.
7. When one straight line landing on another straight line makes adjacent angles equal, each of these angles is called a right angle, and each of these lines is said to be perpendicular to the other.
8. A figure is a surface enclosed on all sides by a line, or lines.
9. A circle is a plane figure, bounded by one continuous line, called its circumference; and having a certain point within it (the center), from which all straight lines drawn to its circumference are equal.
10. The distance of a length is its magnitude. The distance of a breadth is its magnitude.
Postulates
1. A straight line may be drawn from any one point to any other point.
2. A finite straight line may be produced to any length in a straight line.
3. A circle may be described with any center at any distance from that center.
4. All right angles are equal to each other.
Definitions
1. A adthc is that which has no zvrts.
2. A jiqi is a lpmvqdxless dacveq.
3. The extremities of a jiqi are adthcs.
4. A egezhlbj or hebra jiqi is that which lies evenly between its extremities.
5. A wbkkgie is that which has dacveq and lpmvqdx only.
6. A bmknx mduhm is the inclination of two jiqis to one another, in a bmknx, which meet together, but are not in the same direction.
7. When one egezhlbj jiqi landing on another egezhlbj jiqi makes adjacent mduhms equal, each of these mduhms is called a hebra mduhm, and each of these jiqis is said to be perpendicular to the other.
8. A mvjkcm is a wbkkgie enclosed on all sides by a jiqi, or jiqis.
9. A xazrvj is a bmknx mvjkcm, bounded by one continuous jiqi, called its kmzohvxrcmjpq; and having a certain adthc within it (the oemhbe), from which all egezhlbj jiqis drawn to its kmzohvxrcmjpq are equal.
10. The ijhwkmlg of a dacveq is its magnitude. The ijhwkmlg of a breadth is its magnitude.
Postulates
1. A egezhlbj jiqi may be drawn from any one adthc to any other adthc.
2. A finite egezhlbj jiqi may be produced to any dacveq in a egezhlbj jiqi.
3. A xazrvj may be described with any oemhbe at any ijhwkmlg from that oemhbe.
4. All hebra mduhms are equal to each other.

Results
ChatGPT: If a jiqi is drawn intersecting two other jiqis and the sum of the internal mduhms on one side is less than two hebra mduhms, then the two jiqis will meet on that side if extended indefinitely.
Bard: If a jiqi falling on two jiqis makes the adjacent mduhms taken together less than two hebra mduhms, then the two jiqis, if produced indefinitely, meet on that side on which the sum of mduhms is less than two hebra mduhms.

Results
ChatGPT: If a line is drawn intersecting two other lines and the sum of the internal angles on one side is less than two right angles, then the two lines will meet on that side if extended indefinitely.
Bard: If a line falling on two lines makes the adjacent angles taken together less than two right angles, then the two lines, if produced indefinitely, meet on that side on which the sum of angles is less than two right angles.

Replacing “Postulate”
ChatGPT: Based on the definitions and the statements provided, it appears we are working with a set of geometric axioms or principles, albeit with unfamiliar terminology. This is reminiscent of Euclid's axioms. Taking inspiration from the nature of the previous statements and the definitions provided, a fifth statement could be: All jiqis drawn from the same oemhbe to the kmzohvxrcmjpq of a xazrvj are egezhlbj.
Bard: Fifth statement: All adthcs are equidistant from the oemhbe of a xazrvj. This statement can be inferred from the following definitions and statements: (...snipped…)

No Parallel Postulate
Parallel Postulate (Modern): Given a straight line and a point that is not on the line, there is only one line through the point that does not intersect the given line.
Fifth Postulate (Euclid): If two straight lines meet a third straight line so as to make the two interior angles on the same side less than two right angles, these two straight lines will meet if they be produced on the side on which the angles are less than two right angles.

Why Do LLMs Generalize So Well? 

Types of Generalization
• Interpolative generalization
• Extrapolative generalization
• Abstractive generalization
• Inductive generalization
• Abductive generalization (???)
Learning in High Dimension Always Amounts to Extrapolation: https://arxiv.org/abs/2110.09485

A monad is just a monoid in the category of endofunctors. What’s the problem?
neural network, convolution, group, Hopf algebra
One day I will figure out how to put a manim animation on Google Slides so joke slides won’t be necessary to accompany a rather dry explanation.

How Do I Think About LLMs 

Key Takeaways
1. Due to Gold’s theorem, task-specific LLMs are going to perform better, so finetune, finetune, finetune.
2. LLMs memorize structures of language (smaller intrinsic dimension). Don’t confuse that for generalization.
3. Generalization due to memorization of structure is the poor man’s generalization.

Problems With The Current Approach 

Enthralled  • The common person cannot train their own LLMs.
• Corporations dictate what is acceptable and what is not.

Models Are Too Big  Courtesy Michael C Frank

Models Are Too Big
Overparameterized Models
+ Interpolative generalization.
+ Extrapolative generalization.
• Reasoning.
– Minority samples fare way worse.

What Next? 

Deep Learning in Go https://gorgonia.org

Gorgonia Family
gorgonia.org/gorgonia
gorgonia.org/tensor
gorgonia.org/cu
gorgonia.org/golgi

Why Go?
1. Go imparts a good amount of mechanical sympathy on the programmer.
2. Good concurrency story.
3. Good tooling story.
4. Good cross-platform development.
5. Good syntax.

What Is Go Not Great At?
1. Allowing programmers to express higher level thought.
2. FFI.

DEMO (screenshot backup) 

The Next Versions of Gorgonia
The tensor package is genericized

    func Example_basics() {
        // Create a (2, 2)-Matrix of integers
        a := New(WithShape(2, 2), WithBacking([]int{1, 2, 3, 4}))
        fmt.Printf("a:\n%v\n", a)

        // Create a (2, 3, 4)-tensor of float32s
        b := New(WithBacking(Range(Float32, 0, 24)), WithShape(2, 3, 4))
        fmt.Printf("b:\n%1.1f", b)
        …
    }

The Next Versions of Gorgonia
The tensor package is genericized

    func Example_basics() {
        // Create a (2, 2)-Matrix of integers
        a := New[int](WithShape(2, 2), WithBacking([]int{1, 2, 3, 4}))
        fmt.Printf("a:\n%v\n", a)

        // Create a (2, 3, 4)-tensor of float32s
        b := New[float32](WithBacking(gutils.Range[float32](0, 24)), WithShape(2, 3, 4))
        fmt.Printf("b:\n%1.1f", b)
        …
    }

The Next Version of Gorgonia
New ways of defining computation graphs:
• Forwards-mode differentiation
• Backwards-mode differentiation
• Symbolic differentiation
• Reactive mode
• Hybrid of any of the above
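
As a reminder of what forwards-mode differentiation means in general, here is a textbook dual-number sketch. It is not Gorgonia's API and says nothing about how the next version implements it; the `Dual` type and every function name below are assumptions made for this example.

```go
package main

import "fmt"

// Dual carries a value together with its derivative with respect to the input.
type Dual struct{ Val, Der float64 }

// Add and Mul propagate derivatives with the sum and product rules.
func Add(a, b Dual) Dual { return Dual{a.Val + b.Val, a.Der + b.Der} }
func Mul(a, b Dual) Dual { return Dual{a.Val * b.Val, a.Der*b.Val + a.Val*b.Der} }

// Var marks x as the variable being differentiated; Const is a constant.
func Var(x float64) Dual   { return Dual{x, 1} }
func Const(c float64) Dual { return Dual{c, 0} }

// F computes f(x) = x*x + 3x; the derivative comes along in the same pass.
func F(x Dual) Dual { return Add(Mul(x, x), Mul(Const(3), x)) }

func main() {
	y := F(Var(2))
	fmt.Println(y.Val, y.Der) // 10 7
}
```

The derivative is computed alongside the value in a single forward pass, which is the defining trait of forwards mode; backwards mode instead records the computation and walks it in reverse.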

Generics Adds Expressivity, Subtracts Readability

    type hypothesis[DT any] interface {
        …
    }

    type Hypothesis[DT any, T hypothesis[DT]] interface {
        hypothesis[DT]
        Restart() T
        Propose() (prop T, logprob float64)
    }

    type Chain[DT any, T Hypothesis[DT, T]] struct {
        …
    }

Generics Adds Expressivity, Subtracts Readability

    type Fodor[DTin, DTout any, T G[DTin, DTout, T], U M[DTin, DTout, T]] struct {
        …
    }

The Ask 

The End 
