

Xuanyi
November 03, 2023

Man Bites Dog: A Journey of LLMs and What Comes Next

Presented at GopherConSG 2023

LLMs like GPT-4 have surged in popularity, raising questions about their capabilities and general intelligence. The tooling around them is mostly written in Python, seemingly leaving Gophers out in the cold. That couldn't be further from the truth. In this talk I'll explore where Go can go in an LLM-filled future.



Transcript

  1. The Concurrent Times. Friday, 3 November 2023

    MAN BITES DOG!

    In a bizarre twist of events, a linguist studying word order bit a dog during a conference on large language models. Dr. Noam Lingo, while presenting his research on how word order affects perception, used the classic example "Man bites dog" versus "Dog bites man". To demonstrate, he playfully "bit" a robotic dog powered by Gorgonia, a Go programming language library for machine learning. The robotic dog, named "GPT-K9", reportedly reacted to the bite with an error message: "Syntax error: unexpected bite." GPT-K9 was last seen entering a blue police box with the doctor.

    Xuanyi Chew
    Live from Gophercon Singapore
  2. hand : Noun
    pen : Noun
    book : Noun
    read : Verb
    write : Verb
    speak : Verb
  8. define : Verb
    Show or tell what a word means

    example : Noun
    A thing to show to help define what a word means
  10. sentence: Many words in order.
    valid: Good by the rules.
    grammar: A set of rules that defines if a sentence is valid or not.
    vocabulary: A set of words where you know what a word means.
  14. S ::= NP VP VP
    NP ::= { n | n ∈ Nouns }
    VP ::= { v | v ∈ Verbs }
  17. Rules of The Lien Language

    S ::= VP NP NP
    NP ::= ꢯ | 🜣
    VP ::= 🜟
  23. Is This Sentence Valid?

    S ::= VP NP NP
    NP ::= ꢯ | 🜣
    VP ::= 🜟

    🜣 🜟 ꢯ ✖
    🜣 ꢯ 🜟 ✖
    🜟 🜣 ꢯ ✔
    ꢯ 🜣 🜟 ✖
    🜟 ꢯ 🜣 ✔
    ꢯ 🜟 🜣 ✖
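The validity check on this slide can be sketched in Go. This is only an illustration of the rule S ::= VP NP NP with the Lien vocabulary; the function name `IsValid` and the space-separated sentence format are assumptions made for the example, not something from the talk.

```go
package main

import (
	"fmt"
	"strings"
)

// Vocabulary of the Lien language from the slide: one verb and two nouns.
var (
	verbs = map[string]bool{"🜟": true}
	nouns = map[string]bool{"ꢯ": true, "🜣": true}
)

// IsValid reports whether a space-separated sentence matches S ::= VP NP NP.
// (Hypothetical helper for this sketch.)
func IsValid(sentence string) bool {
	words := strings.Fields(sentence)
	if len(words) != 3 {
		return false
	}
	return verbs[words[0]] && nouns[words[1]] && nouns[words[2]]
}

func main() {
	sentences := []string{"🜣 🜟 ꢯ", "🜣 ꢯ 🜟", "🜟 🜣 ꢯ", "ꢯ 🜣 🜟", "🜟 ꢯ 🜣", "ꢯ 🜟 🜣"}
	for _, s := range sentences {
		fmt.Printf("%s valid=%v\n", s, IsValid(s))
	}
}
```

Only the sentences that start with the verb and continue with two nouns pass, which matches the ticks and crosses on the slide.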
  24. Recap

    Vocabulary = set of words
    Grammar = set of rules
    Language = Grammar + Vocabulary = The set of all sentences generated by a given grammar and vocab
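To make "the set of all sentences generated by a given grammar and vocab" concrete, here is a small sketch that enumerates the whole Lien language from the rule S ::= VP NP NP. The function name `Generate` is an assumption for illustration; because the rule has no recursion, the language is finite and can simply be listed.

```go
package main

import "fmt"

// Generate returns every sentence produced by S ::= VP NP NP
// for the given verb and noun vocabularies.
// (Hypothetical helper; the rule is hard-coded for this sketch.)
func Generate(verbs, nouns []string) []string {
	var sentences []string
	for _, v := range verbs {
		for _, n1 := range nouns {
			for _, n2 := range nouns {
				sentences = append(sentences, v+" "+n1+" "+n2)
			}
		}
	}
	return sentences
}

func main() {
	lang := Generate([]string{"🜟"}, []string{"ꢯ", "🜣"})
	fmt.Println(len(lang), "sentences:")
	for _, s := range lang {
		fmt.Println(s)
	}
}
```

With one verb and two nouns the rule yields four sentences, including the two where the same noun appears twice.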
  25. Sublanguages

    Language A is a sublanguage of Language B if every sentence in A is also contained in the set of all sentences in B.
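The definition above is plain set containment, so it can be sketched directly. The `Language` type and its methods are hypothetical names for this example, and modelling a language as a finite set of sentences is a simplifying assumption (natural and programming languages are not finite).

```go
package main

import "fmt"

// Language is modelled as a finite set of sentences (an assumption of this sketch).
type Language map[string]struct{}

// NewLanguage builds a Language from the given sentences.
func NewLanguage(sentences ...string) Language {
	l := Language{}
	for _, s := range sentences {
		l[s] = struct{}{}
	}
	return l
}

// IsSublanguageOf reports whether every sentence of a is also in b.
func (a Language) IsSublanguageOf(b Language) bool {
	for s := range a {
		if _, ok := b[s]; !ok {
			return false
		}
	}
	return true
}

func main() {
	small := NewLanguage("🜟 ꢯ 🜣")
	big := NewLanguage("🜟 ꢯ 🜣", "🜟 🜣 ꢯ")
	fmt.Println(small.IsSublanguageOf(big))
	fmt.Println(big.IsSublanguageOf(small))
}
```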
  26. Examples of Sublanguages

    Written English ≺ English
    Written English ≺ Internet Text
    Go Programming Language ≺ Internet Text
  29. Gold’s Theorem

    Teacher:
    S ::= VP NP NP
    NP ::= ꢯ | 🜣
    VP ::= 🜟 | ⩹
    🜟 🜣 ꢯ
    ⩹ 🜣 ∅
    …

    Learner:
    L1, L2, …, Ln, L∞, …, Lk

    🜟 🜣 ꢯ
  30. Gold’s Theorem

    Listener: L1, L2, …, Ln, L∞, Lk, …
    Speaker
  34. Gold’s Theorem

    Listener: En, Zh, …, JS, L∞, Go, …
    Go ≺ L∞
    Speaker: func I[T any](x T) T { return x }
  39. All Possible Languages

    Sentences:
    1. 🜟 ꢯ 🜣
    2. 🜟 🜣 ꢯ
    3. ⩹ ꢯ ∅
    4. ⩹ 🜣 ∅

    Languages:
    1. (empty)   2. s1   3. s2   4. s1 s2
    5. s3   6. s1 s3   7. s2 s3   8. s1 s2 s3
    9. s4   10. s1 s4   11. s2 s4   12. s1 s2 s4
    13. s3 s4   14. s1 s3 s4   15. s2 s3 s4   16. s1 s2 s3 s4

    Ah Lien, Kiasu, Chewxy
    🜟 ꢯ 🜣
    L6!
    L2 | L4 | L6 | L8 | L10 | L12 | L14 | L16
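The list of sixteen languages is the power set of the four sentences, and hearing one sentence only rules out the languages that do not contain it. The sketch below reproduces that elimination step; the function name `ConsistentLanguages` and the bitmask encoding are assumptions chosen so that the numbering matches the slide (language k contains sentence i when bit i-1 of k-1 is set).

```go
package main

import "fmt"

// ConsistentLanguages enumerates all 2^n candidate languages over n sentences
// and returns the 1-based numbers of those that contain the observed sentence.
// observed is a 0-based sentence index (0 means s1).
// Language k holds sentence index i when bit i of k-1 is set, which
// reproduces the slide's numbering: 1 = empty, 2 = {s1}, 3 = {s2}, 4 = {s1, s2}, ...
func ConsistentLanguages(n, observed int) []int {
	var out []int
	for k := 1; k <= 1<<n; k++ {
		if (k-1)&(1<<observed) != 0 {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	// The learner hears s1 (index 0) out of four possible sentences.
	fmt.Println(ConsistentLanguages(4, 0))
}
```

After a single positive example, eight of the sixteen candidates remain, which is the L2 | L4 | … | L16 list on the slide; picking exactly one of them (such as L6) requires more than that one example can justify.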
  40. Neural Networks

    $Y = \sigma_1(W_1' X + b_1)$
  43. Deep Neural Networks

    $Y = \sigma_2(W_2' \sigma_1(W_1' X + b_1) + b_2)$

    [0.3m, 0.6c] [0.7m, 0.4c]

    Note: “spreading out” of values of m and c is shown for illustration purposes only
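The two-layer formula can be evaluated with plain slices. This is a minimal sketch, not Gorgonia code: the sigmoid activation, the helper names `layer` and `Forward`, and the weights in `main` are all assumptions picked for illustration, since the slide leaves σ and the parameters unspecified.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid is used for both σ1 and σ2 here; the slide does not say which activation is meant.
func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// layer computes σ(W'x + b). W has one row per input and one column per
// output unit, so output j is σ(Σ_i W[i][j]*x[i] + b[j]).
func layer(W [][]float64, x, b []float64) []float64 {
	out := make([]float64, len(b))
	for j := range out {
		sum := b[j]
		for i := range x {
			sum += W[i][j] * x[i]
		}
		out[j] = sigmoid(sum)
	}
	return out
}

// Forward computes Y = σ2(W2' σ1(W1'X + b1) + b2) for a single input vector x.
func Forward(W1 [][]float64, b1 []float64, W2 [][]float64, b2 []float64, x []float64) []float64 {
	hidden := layer(W1, x, b1)
	return layer(W2, hidden, b2)
}

func main() {
	// Arbitrary illustrative parameters: 2 inputs, 2 hidden units, 1 output.
	W1 := [][]float64{{0.5, -0.5}, {0.25, 0.75}}
	b1 := []float64{0, 0.1}
	W2 := [][]float64{{1}, {-1}}
	b2 := []float64{0.2}
	fmt.Println(Forward(W1, b1, W2, b2, []float64{1, 2}))
}
```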
  44. Pretraining

    LLM: “hello” “gophercon” “singapore”
    “hello gophercon singapore” “hello gophercon au” … “<html> <body> … </html>” “func add(x, y int) int {...”
  45. Supervised Finetuning

    LLM: “q” “:” “answer” … “A” “:” …
    “input: summarize the following paragraph … output: … ”
    “Q: Answer the following yes/no question by reasoning step-by-step. … A: …”
  46. What Do LLMs Learn?

    LLMs learn the grammar of a language and knowledge embedded in language
  47. Where Is Knowledge Found?

    Claim: LLMs have a world model.
    Counterclaim: A bag of words with gradient boosting has a world model.

    Language Models Represent Space and Time https://arxiv.org/abs/2310.02207
    Bag of words with gradient boosting shows a world model https://twitter.com/ArthurB/status/1711475192461971860
  48. How To Test If LLMs Have Knowledge Outside of Language?

    1. Take something written in plain natural language.
    2. Replace nouns and verbs with consistent random strings.
    3. Use it as a prompt to the LLM.
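Step 2 of the procedure above can be sketched as follows. This is an assumed implementation, not the script used for the talk: the `Obfuscator` type, the seeded `math/rand` source, and matching on whole whitespace-separated tokens are all choices made for this example, and it does not handle plurals or attached punctuation.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// Obfuscator maps each target word to one random lowercase string and
// reuses that string for every occurrence, so the replacement is consistent.
type Obfuscator struct {
	rng     *rand.Rand
	mapping map[string]string
}

// NewObfuscator assigns a unique random string of the same length to each target word.
func NewObfuscator(seed int64, targets []string) *Obfuscator {
	o := &Obfuscator{rng: rand.New(rand.NewSource(seed)), mapping: map[string]string{}}
	used := map[string]bool{}
	for _, t := range targets {
		used[t] = true // never hand out a real target word as a replacement
	}
	for _, t := range targets {
		if _, done := o.mapping[t]; done || t == "" {
			continue
		}
		w := o.randomWord(len(t))
		for used[w] {
			w = o.randomWord(len(t))
		}
		used[w] = true
		o.mapping[t] = w
	}
	return o
}

// randomWord returns n random letters from a to z.
func (o *Obfuscator) randomWord(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = byte('a' + o.rng.Intn(26))
	}
	return string(b)
}

// Replace rewrites every whitespace-separated token that is a target word.
func (o *Obfuscator) Replace(text string) string {
	words := strings.Fields(text)
	for i, w := range words {
		if r, ok := o.mapping[w]; ok {
			words[i] = r
		}
	}
	return strings.Join(words, " ")
}

func main() {
	o := NewObfuscator(42, []string{"point", "line"})
	fmt.Println(o.Replace("a line may be drawn from any one point to any other point"))
}
```

Function words such as "a" and "from" are left untouched, so the structure of the text survives while the content words lose any meaning they had in the training data.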
  49. How Do We Know LLMs Don’t Understand Arithmetic?

    I tried 22 times before I gave up!
    22 wrong answers later I give up
  50. Why Do LLMs Perform So Well?

    LLMs memorize the world!

    Quantifying Memorization Across Neural Language Models https://arxiv.org/abs/2202.07646
  51. Can LLMs Extract Knowledge From Structure?

    Definitions
      1. A point is that which has no parts.
      2. A line is a breadthless length.
      3. The extremities of a line are points.
      4. A straight or right line is that which lies evenly between its extremities.
      5. A surface is that which has length and breadth only.
      6. A plane angle is the inclination of two lines to one another, in a plane, which meet together, but are not in the same direction.
      7. When one straight line landing on another straight line makes adjacent angles equal, each of these angles is called a right angle, and each of these lines is said to be perpendicular to the other.
      8. A figure is a surface enclosed on all sides by a line, or lines.
      9. A circle is a plane figure, bounded by one continuous line, called its circumference; and having a certain point within it (the center), from which all straight lines drawn to its circumference are equal.
      10. The distance of a length is its magnitude. The distance of a breadth is its magnitude.
    Postulates
      1. A straight line may be drawn from any one point to any other point.
      2. A finite straight line may be produced to any length in a straight line.
      3. A circle may be described with any center at any distance from that center.
      4. All right angles are equal to each other.

    Definitions
      1. A adthc is that which has no zvrts.
      2. A jiqi is a lpmvqdxless dacveq.
      3. The extremities of a jiqi are adthcs.
      4. A egezhlbj or hebra jiqi is that which lies evenly between its extremities.
      5. A wbkkgie is that which has dacveq and lpmvqdx only.
      6. A bmknx mduhm is the inclination of two jiqis to one another, in a bmknx, which meet together, but are not in the same direction.
      7. When one egezhlbj jiqi landing on another egezhlbj jiqi makes adjacent mduhms equal, each of these mduhms is called a hebra mduhm, and each of these jiqis is said to be perpendicular to the other.
      8. A mvjkcm is a wbkkgie enclosed on all sides by a jiqi, or jiqis.
      9. A xazrvj is a bmknx mvjkcm, bounded by one continuous jiqi, called its kmzohvxrcmjpq; and having a certain adthc within it (the oemhbe), from which all egezhlbj jiqis drawn to its kmzohvxrcmjpq are equal.
      10. The ijhwkmlg of a dacveq is its magnitude. The ijhwkmlg of a breadth is its magnitude.
    Postulates
      1. A egezhlbj jiqi may be drawn from any one adthc to any other adthc.
      2. A finite egezhlbj jiqi may be produced to any dacveq in a egezhlbj jiqi.
      3. A xazrvj may be described with any oemhbe at any ijhwkmlg from that oemhbe.
      4. All hebra mduhms are equal to each other.
  53. Results

    ChatGPT: If a jiqi is drawn intersecting two other jiqis and the sum of the internal mduhms on one side is less than two hebra mduhms, then the two jiqis will meet on that side if extended indefinitely.

    Bard: If a jiqi falling on two jiqis makes the adjacent mduhms taken together less than two hebra mduhms, then the two jiqis, if produced indefinitely, meet on that side on which the sum of mduhms is less than two hebra mduhms.
  54. Results

    ChatGPT: If a line is drawn intersecting two other lines and the sum of the internal angles on one side is less than two right angles, then the two lines will meet on that side if extended indefinitely.

    Bard: If a line falling on two lines makes the adjacent angles taken together less than two right angles, then the two lines, if produced indefinitely, meet on that side on which the sum of angles is less than two right angles.
  55. Replacing “Postulate”

    ChatGPT: Based on the definitions and the statements provided, it appears we are working with a set of geometric axioms or principles, albeit with unfamiliar terminology. This is reminiscent of Euclid's axioms. Taking inspiration from the nature of the previous statements and the definitions provided, a fifth statement could be: All jiqis drawn from the same oemhbe to the kmzohvxrcmjpq of a xazrvj are egezhlbj.

    Bard: Fifth statement: All adthcs are equidistant from the oemhbe of a xazrvj. This statement can be inferred from the following definitions and statements: (...snipped…)
  56. No Parallel Postulate

    Parallel Postulate (Modern): Given a straight line and a point that is not on the line, there is only one line through the point that does not intersect with the given line.

    Fifth Postulate (Euclid): If two straight lines meet a third straight line so as to make the two interior angles on the same side less than two right angles, these two straight lines will meet if they be produced on the side on which the angles are less than two right angles.
  57. Types of Generalization

    • Interpolative generalization
    • Extrapolative generalization
    • Abstractive generalization
    • Inductive generalization
    • Abductive generalization (???)

    Learning in High Dimension Always Amounts to Extrapolation https://arxiv.org/abs/2110.09485
  59. A monad is just a monoid in the category of endofunctors. What’s the problem?

    neural network, convolution, group, Hopf algebra

    One day I will figure out how to put a manim animation on Google Slides so joke slides won’t be necessary to accompany a rather dry explanation
  60. Key Takeaways

    1. Due to Gold’s theorem, task-specific LLMs are going to perform better, so finetune, finetune, finetune.
    2. LLMs memorize structures of language (smaller intrinsic dimension). Don’t confuse that for generalization.
    3. Generalization due to memorization of structure is the poor man’s generalization.
  61. Enthralled

    • The common person cannot train their own LLMs.
    • Corporations dictate what is acceptable and what is not.
  62. Models Are Too Big

    Overparameterized Models
    + Interpolative generalization.
    + Extrapolative generalization.
    • Reasoning.
    – Minority samples fare way worse.
  63. Why Go?

    1. Go imparts a good amount of mechanical sympathy to the programmer.
    2. Good concurrency story.
    3. Good tooling story.
    4. Good cross-platform development.
    5. Good syntax.
  64. What Is Go Not Great At?

    1. Allowing programmers to express higher level thought.
    2. FFI.
  65. The Next Versions of Gorgonia

    The tensor package is genericized

    func Example_basics() {
        // Create a (2, 2)-Matrix of integers
        a := New(WithShape(2, 2), WithBacking([]int{1, 2, 3, 4}))
        fmt.Printf("a:\n%v\n", a)

        // Create a (2, 3, 4)-tensor of float32s
        b := New(WithBacking(Range(Float32, 0, 24)), WithShape(2, 3, 4))
        fmt.Printf("b:\n%1.1f", b)
        …
    }
  66. The Next Versions of Gorgonia

    The tensor package is genericized

    func Example_basics() {
        // Create a (2, 2)-Matrix of integers
        a := New[int](WithShape(2, 2), WithBacking([]int{1, 2, 3, 4}))
        fmt.Printf("a:\n%v\n", a)

        // Create a (2, 3, 4)-tensor of float32s
        b := New[float32](WithBacking(gutils.Range[float32](0, 24)), WithShape(2, 3, 4))
        fmt.Printf("b:\n%1.1f", b)
        …
    }
  67. The Next Version of Gorgonia

    New ways of defining computation graphs:
    • Forwards-mode differentiation
    • Backwards-mode differentiation
    • Symbolic differentiation
    • Reactive mode
    • Hybrid of any of the above
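As a reminder of what forwards-mode differentiation means, here is a textbook dual-number sketch. It is not Gorgonia's API: the `Dual` type, `Var`, `Const`, and the example function `f` are hypothetical names introduced only for this illustration.

```go
package main

import "fmt"

// Dual carries a value together with its derivative with respect to the input.
type Dual struct{ Val, Der float64 }

// Var marks x as the variable being differentiated (dx/dx = 1).
func Var(x float64) Dual { return Dual{Val: x, Der: 1} }

// Const wraps a constant (derivative 0).
func Const(c float64) Dual { return Dual{Val: c, Der: 0} }

// Add applies the sum rule.
func (a Dual) Add(b Dual) Dual { return Dual{Val: a.Val + b.Val, Der: a.Der + b.Der} }

// Mul applies the product rule.
func (a Dual) Mul(b Dual) Dual {
	return Dual{Val: a.Val * b.Val, Der: a.Der*b.Val + a.Val*b.Der}
}

// f computes x*x + 3x; its derivative is 2x + 3.
func f(x Dual) Dual { return x.Mul(x).Add(Const(3).Mul(x)) }

func main() {
	y := f(Var(2))
	fmt.Println(y.Val, y.Der) // prints "10 7"
}
```

The derivative is carried forward alongside the value in a single pass, in contrast to backwards-mode, which records the computation first and then propagates gradients from the output back to the inputs.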
  68. Generics Adds Expressivity, Subtracts Readability

    type hypothesis[DT any] interface {
        …
    }

    type Hypothesis[DT any, T hypothesis[DT]] interface {
        hypothesis[DT]
        Restart() T
        Propose() (prop T, logprob float64)
    }

    type Chain[DT any, T Hypothesis[DT, T]] struct {
        …
    }
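The constraint `T Hypothesis[DT, T]` refers to T inside its own bound. A reduced, self-contained sketch of that pattern is shown below; `Proposer`, `Run`, and `Counter` are hypothetical names made up for this example and are much simpler than the types on the slide.

```go
package main

import "fmt"

// Proposer is a simplified stand-in for the slide's Hypothesis interface:
// a value that can propose a successor of its own concrete type.
type Proposer[T any] interface {
	Propose() (next T, logprob float64)
}

// Run calls Propose n times and sums the log-probabilities.
// The constraint T Proposer[T] mentions T itself, which is what lets
// Propose return the concrete type rather than an interface value.
func Run[T Proposer[T]](start T, n int) (T, float64) {
	cur, total := start, 0.0
	for i := 0; i < n; i++ {
		next, lp := cur.Propose()
		cur, total = next, total+lp
	}
	return cur, total
}

// Counter is a toy hypothesis: each proposal increments it with log-probability -1.
type Counter int

// Propose returns the next counter value.
func (c Counter) Propose() (Counter, float64) { return c + 1, -1 }

func main() {
	final, lp := Run(Counter(0), 3)
	fmt.Println(final, lp) // prints "3 -3"
}
```

The call site stays short because T is inferred from the argument, but the declaration already shows the trade-off in the slide title: the signature is more expressive and noticeably harder to read.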
  69. Generics Adds Expressivity, Subtracts Readability

    type Fodor[DTin, DTout any, T G[DTin, DTout, T], U M[DTin, DTout, T]] struct{ … }