Slide 1

Slide 1 text

[email protected] https://itakigawa.github.io/

Slide 2

Slide 2 text

2 = ( ) ⾒ ⾒

Slide 3

Slide 3 text

3 ( )
• AlphaGeometry (Nature, 2024): solves olympiad geometry problems at close to the level of a human gold medallist
• AlphaCode2 (2023) & AlphaCode (Science, 2022): competitive programming (Codeforces); AlphaCode2 is estimated to perform better than 85% of human competitors
• FunSearch (Nature, 2023): discovered new constructions in extremal combinatorics (the Cap set problem)
• AlphaDev (Nature, 2023): discovered faster sorting routines for sequences of 3-5 elements, up to 70% faster for short sequences and about 1.7% faster for sequences of more than 250,000 elements; merged into LLVM's libc++
• AlphaTensor (Nature, 2022): first improvement in over 50 years; multiplies 4×4 matrices in modulo-2 arithmetic with 47 multiplications, beating Strassen's 49

Slide 4

Slide 4 text

4 ( ) Nature News https://bit.ly/3UQcgpu Stanford University’s 2024 AI Index https://aiindex.stanford.edu/report/

Slide 5

Slide 5 text

5 ( ) Measuring GitHub Copilot’s Impact on Productivity. DOI:10.1145/3633453. Case study asks Copilot users about its impact on their productivity, and seeks to find their perceptions mirrored in user data. BY ALBERT ZIEGLER, EIRINI KALLIAMVAKOU, X. ALICE LI, ANDREW RICE, DEVON RIFKIN, SHAWN SIMISTER, GANESH SITTAMPALAM, AND EDWARD AFTANDILIAN. Communications of the ACM, 67(3), March 2024.
key insights: AI pair-programming tools such as GitHub Copilot have a big impact on developer productivity. This holds for developers of all skill levels, with junior developers seeing the largest gains. The reported benefits of receiving AI suggestions while coding span the full range of typically investigated aspects of productivity, such as task time, product quality, cognitive load, enjoyment, and learning. Perceived productivity gains are reflected in objective measurements of developer activity. While suggestion correctness is important, the driving factor for these improvements appears to be not correctness as such, but whether the suggestions are useful as a starting point for further development.
CODE-COMPLETION SYSTEMS OFFERING suggestions to a developer in their integrated development environment (IDE) have become the most frequently used kind of programmer assistance.1 When generating whole snippets of code, they typically use a large language model (LLM) to predict what the user might type next (the completion) from the context of what they are working on at the moment (the prompt).2 This system allows for completions at any position in the code, often spanning multiple lines at once. Potential benefits of generating large sections of code automatically are huge, but evaluating these systems is challenging. Offline evaluation, where the system is shown a partial snippet of code and then asked to complete it, is difficult not least because for longer completions there are many acceptable alternatives and no straightforward mechanism for labeling them automatically.5 An additional step taken by some researchers3,21,29 is to use online evaluation and track the frequency of real users accepting suggestions, assuming that the more contributions a system makes to the developer’s code, the higher its benefit. The validity of this assumption is not obvious when considering issues such as whether two short completions are more valuable than one long one, or whether reviewing suggestions can be detrimental to programming flow. Code completion in IDEs using language models was first proposed in Hindle et al.,9 and today neural synthesis tools such as GitHub Copilot, CodeWhisperer, and TabNine suggest code snippets within an IDE with the explicitly stated intention to increase a user’s productivity. Developer productivity has many aspects, and a recent study has shown that tools like these are helpful in ways that are only partially reflected by measures such as completion times for standardized tasks.23,a Alternatively, we can leverage the developers themselves as expert assessors of their own productivity. This meshes well with current thinking in software engineering research suggesting measuring productivity on multiple dimensions and using self-reported data.6 Thus, we focus on studying perceived productivity. Here, we investigate whether usage measurements of developer interactions with GitHub Copilot can predict perceived productivity as reported by developers. We analyze 2,631 sur-
a Nevertheless, such completion times are greatly reduced in many settings, often by more than half.16
• (explain) • (brushes) • A B • GitHub Copilot ( ) Comm. ACM. 67(3) https://doi.org/10.1145/3633453 • GitHub Next / Copilot Labs https://githubnext.com/

Slide 6

Slide 6 text

6 ChatGPT ( ) ? https://arxiv.org/abs/2311.17035 IQ ? https://bit.ly/44qxNYQ

Slide 7

Slide 7 text

7 LLM ( ) https://bit.ly/44Og5z3 ( ) https://arxiv.org/abs/2309.12288

Slide 8

Slide 8 text

8 (Deductive) (Inductive) (Hypothetico-Deductive) / Retroductive (Abductive) (System 1 + System 2)

Slide 9

Slide 9 text

9 • Q. ( ) ( ) vs ( ) • • • ( , RAG, ReACT, LangChain, ) • ( ) • × ( ) •

Slide 10

Slide 10 text

10 • ( ) • ( ) • ( ) • ( ) • BHK ( ) ( ) ? (2015)

Slide 11

Slide 11 text

11 (2023 ) ACM A.M. Turing Award Honors Avi Wigderson for Foundational Contributions to the Theory of Computation https://awards.acm.org/about/2023-turing For Turing Award winner, everything is computation and some problems are unsolvable https://bit.ly/3xZzJLP

Slide 12

Slide 12 text

12 = / / • (Church, Kleene, Turing) • /μ (Gödel-Herbrand) • (Church) • (Turing) • • ( ) vs. (2021) (2016)

Slide 13

Slide 13 text

13 A. ( ) B. ( ) ( ) • 𝜃 ( ) ℱ = { f_θ ; θ ∈ Θ } • 𝜃

Slide 14

Slide 14 text

14 • • (Memorization) Overfitting ( ) [figure: training data points (x, y)]

Slide 15

Slide 15 text

15 1 • B A 1. 2. • (Cybenko, Hornik, etc) 3 1 1 1 3 • ( ) (Vapnik & Chervonenkis, etc) • (Computable Analysis)

Slide 16

Slide 16 text

16 2 • or (Universal Prediction): • ( ) n (2^-n) • • 1960 • ( ) ⾒ https://www.lesswrong.com/tag/solomonoff-induction R. J. Solomonoff, Machine Learning — Past and Future (2009). Dartmouth Artificial Intelligence Conference: The Next Fifty Years (AI@50), Dartmouth, 2006. https://bit.ly/3UvpMgG
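
For reference, the 2^-n weighting above corresponds to Solomonoff's universal prior; the formula below is the standard textbook formulation, not text recovered from the slide:

    M(x) = \sum_{p \,:\, U(p)\ \text{begins with}\ x} 2^{-|p|}

where U is a universal prefix Turing machine, p ranges over programs, and |p| is the program length in bits; shorter programs (simpler hypotheses) receive exponentially larger weight, and prediction uses the conditional M(x_{n+1} | x_1 ... x_n).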

Slide 17

Slide 17 text

17 • Q. ( ) ( ) vs ( ) • • • ( , RAG, ReACT, LangChain, ) • ( ) • × ( ) •

Slide 18

Slide 18 text

18 ⚠ • ( ) • • (Melanie Mitchell) • AI hype is built on high test scores. Those tests are flawed. https://bit.ly/44m7NxX • AI now beats humans at basic tasks — new benchmarks are needed, says major report https://www.nature.com/articles/d41586-024-01087-4 • ( “ ” )

Slide 19

Slide 19 text

19 (LLMs) 頻 ( ) GPT-4, Claude 3, Gemini 1.5, Mistral, Llama 3, Command, DBRX, PPLX, Grok, https://artificialanalysis.ai/

Slide 20

Slide 20 text

Me > ChatGPT4 > 1. 1944 2. 1966 3. 1971 4. 1971 5. 1981 6. 1990 7. 2007 ü : : ü ü ü ( )

Slide 21

Slide 21 text

Me > ⾒ Web ChatGPT4 > Ichigaku Takigawa ICReDD ⾒ ⾒ d

Slide 22

Slide 22 text

Claude 3 Opus > • • • • • • AC0 PARITY MAJORITY • TC0 W(S5) • • Transformers as recognizers of formal languages: A survey on expressivity. L Strobl, W Merrill, G Weiss, D Chiang, D Angluin https://arxiv.org/abs/2311.00208

Slide 23

Slide 23 text

Me > URL Web 10 https://www.lifeiscomputation.com/transformers-are-not-turing-complete/ ChatGPT4 > Web 1. : : 2. : : 3. : : 4. : : 5. : : 6. : : 7. : : 8. : : 9. : : 10. : : [ ] ü ü

Slide 24

Slide 24 text

24 • ( ) • • WWW • • (?)

Slide 25

Slide 25 text

25 Microsoft • OpenAI GPT AI Word, Excel, PowerPoint, Outlook, Teams, Bing/Edge, Azure, GitHub, VSCode, • GitHub (2018) VSCode GitHub Copilot / Copilot Workspace • 3 (440 ) 3 Apple 2 (🇬🇧🇫🇷🇮🇹 GDP ) • 頻 Windows Microsoft 365 Copilot • Word PowerPoint • Excel • Web

Slide 26

Slide 26 text

26 • • • ⾒ • • • • • 1cm 2 •

Slide 27

Slide 27 text

27 1. ( ) ( ) Embedding ( ) 2. ( ) • Vision-Language-Action (VLA) “computer” → [-0.005, 0.014, -0.007 , ..., -0.015]

Slide 28

Slide 28 text

28 OpenAI embedding models • text-embedding-ada-002 • text-embedding-3-small • text-embedding-3-large API KEY OK
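
A minimal sketch of calling the embedding models listed above, assuming the openai Python package (v1 client) and an OPENAI_API_KEY environment variable; only the model names come from the slide, the rest is illustrative:

    # Minimal sketch: get an embedding vector for a word or sentence.
    # Assumes `pip install openai` (v1 client) and OPENAI_API_KEY set in the environment.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.embeddings.create(
        model="text-embedding-3-large",   # or "text-embedding-3-small", "text-embedding-ada-002"
        input="computer",
    )
    vec = resp.data[0].embedding          # a list of floats, e.g. [-0.005, 0.014, ...]
    print(len(vec), vec[:5])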

Slide 29

Slide 29 text

29 ⾒ ≑ ⾒ • ( ): ⾒ ⾒ ( ) ⾒ ⾒ • : (= ) Wikipedia 2014 + Gigaword 5 40 ( ) OpenAI embeddings (large) 10 ※ GloVe: Global Vectors for Word Representation https://nlp.stanford.edu/projects/glove/
king – man + woman: king 0.690340, woman 0.569618, queen 0.521278, königin 0.479320, queene 0.477531, koningin 0.468316, women 0.460638, konig 0.459005, queenie 0.458508, queeny 0.455897
france – paris + london: london 0.682005, france 0.643014, england 0.563394, london-based 0.554125, longdon 0.539819, londen 0.532294, london-born 0.527847, londons 0.512141, londin 0.507284, britain 0.494870
computation: computation 1.000000, computations 0.863177, computing 0.773753, computational 0.728108, computationally 0.710972, computes 0.702328, computability 0.678299, calculation 0.647723, compute 0.637001, computable 0.612179
impeccable: impeccable 1.000000, impeccably 0.852758, unimpeachable 0.769426, irreproachable 0.725752, immaculate 0.716498, immaculately 0.665226, flawless 0.656289, faultless 0.633298, unblemished 0.628034, spotless 0.622993
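
The analogy lists above can be reproduced with a few lines of code; a minimal sketch, assuming the pre-trained glove.6B.300d.txt file from the GloVe page above has been downloaded (file name and top-k are illustrative):

    import numpy as np

    # Load GloVe vectors (word -> numpy array); assumes glove.6B.300d.txt is available locally.
    emb = {}
    with open("glove.6B.300d.txt", encoding="utf-8") as f:
        for line in f:
            word, *vals = line.rstrip().split(" ")
            emb[word] = np.asarray(vals, dtype=np.float32)

    def nearest(vec, k=10):
        # Rank the vocabulary by cosine similarity to the query vector.
        words = list(emb)
        mat = np.stack([emb[w] for w in words])
        sims = mat @ vec / (np.linalg.norm(mat, axis=1) * np.linalg.norm(vec) + 1e-9)
        order = np.argsort(-sims)[:k]
        return [(words[i], float(sims[i])) for i in order]

    query = emb["king"] - emb["man"] + emb["woman"]
    for w, s in nearest(query):
        print(f"{w}\t{s:.6f}")   # "queen" should rank near the top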

Slide 30

Slide 30 text

30 • ( ) • , , , , , , , , , (Character Level) • , , , , (Word Level) • ⾒ (Subword Level) ( + ) • Character/Byte Level (BPE) ( RePair) Byte fallback
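
To see subword (BPE) tokenization in action, a small sketch using the Llama 3 tokenizer that appears later in these slides (Hugging Face transformers; access to the gated meta-llama checkpoint is assumed, and any BPE tokenizer shows the same behaviour):

    from transformers import AutoTokenizer

    # Assumes access to the gated meta-llama repository.
    tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

    for text in ["computation", "impeccable", "535+61"]:
        ids = tok.encode(text, add_special_tokens=False)
        print(text, "->", tok.convert_ids_to_tokens(ids))
    # Frequent words map to single tokens; rare strings split into smaller subword
    # pieces, and unseen byte sequences fall back to byte-level tokens instead of
    # producing an unknown token.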

Slide 31

Slide 31 text

31 1/2 • Causal LM = Next Token Prediction ( ) • : n : : I'm fine, thank you (9.99e-01) for (1.09e-04) goodness (7.48e-05) • Meta-Llama-3-8B-Instruct ⾒ 頻8192 4096 128256 • n = 8192 ( / ) • vocab size = 128256 ( ) • embedding dim = 4096 ( ) • n : I'm fine, thank for asking. Just a little tired from the trip. I'll be okay
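
The next-token probabilities quoted above ("you" 9.99e-01, "for" 1.09e-04, ...) can be inspected directly; a minimal sketch assuming the same Meta-Llama-3-8B-Instruct checkpoint and enough GPU memory (exact numbers depend on the environment):

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
    tok = AutoTokenizer.from_pretrained(model_id)
    lm = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

    prompt = "I'm fine, thank"
    inputs = tok(prompt, return_tensors="pt").to(lm.device)
    with torch.no_grad():
        logits = lm(**inputs).logits[0, -1]            # scores for the next token only
    probs = torch.softmax(logits.float(), dim=-1)       # distribution over 128256 tokens
    top = torch.topk(probs, 3)
    for p, i in zip(top.values, top.indices):
        print(repr(tok.decode(int(i))), f"{p.item():.2e}")   # e.g. ' you' 9.99e-01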

Slide 32

Slide 32 text

32 2/2 = Transformer (Llama3 ) ( 8192) (4096) Transformer Decoder ❌ ❌ ❌ ❌ / (4096) (embedding) Everything is a remix https://www.everythingisaremix.info/

Slide 33

Slide 33 text

33 2/2 = Transformer (Llama3 ) ( 8192) (4096) Transformer Decoder … (128256) Transformer Decoder × 32: RMSNorm → Multihead Attention → + (residual), RMSNorm → MLP → + (residual); final RMSNorm → Linear. SiLU-gated MLP: y = w2(silu(w1(x)) * w3(x)). Scaled Dot Product Attention: o_i = Σ_j α_{i,j} v_j, with α_{i,j} = softmax(q_i · k_j / √(dim of k_j)). RoPE
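
A toy numerical version of the two formulas on this slide: scaled dot-product attention o_i = Σ_j α_{i,j} v_j, and the SiLU-gated MLP y = w2(silu(w1(x)) * w3(x)). Shapes are tiny and illustrative; RoPE, the causal mask, multi-head splitting and RMSNorm are omitted:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    n, d, d_ff = 5, 8, 16          # toy sizes: 5 tokens, model dim 8, MLP dim 16
    x = torch.randn(n, d)

    # --- Scaled dot-product attention (single head) ---
    wq, wk, wv = (torch.randn(d, d) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    alpha = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)   # α_{i,j}
    o = alpha @ v                                             # o_i = Σ_j α_{i,j} v_j

    # --- SiLU-gated MLP (SwiGLU): y = w2(silu(w1(x)) * w3(x)) ---
    w1, w3 = torch.randn(d, d_ff), torch.randn(d, d_ff)
    w2 = torch.randn(d_ff, d)
    y = (F.silu(x @ w1) * (x @ w3)) @ w2

    print(o.shape, y.shape)   # torch.Size([5, 8]) torch.Size([5, 8])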

Slide 34

Slide 34 text

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
llama3_lm = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(llama3_lm); print(llama3_lm.model.config)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
)

Slide 35

Slide 35 text

35 ( ) • ( ) Embedding ( ) • Llama3 15 ( ) • • ( ) 頻 • memorization

Slide 36

Slide 36 text

36 • ( ) ) GPT-4 ChatGPT • Instruction Tuning ( Fine Tuning) ( ) Fine Tuning • Human Feedback (RLHF) • • • Fine Tuning https://bit.ly/4bloMTk

Slide 37

Slide 37 text

37 AI ( ) 1950 1960 1970 1980 1990 2000 2010 2020 2 3 ( ) , , Level-1 Level-2 × Level-3 AI(GOFAI) !?

Slide 38

Slide 38 text

38 Level-1 • (Llama3 15 ) ( etc) • • (Fine Tuning) • RAG ( ) ReACT( ) ( VectorDB ) ) Python • ( ) ( Claude3 200K, Gemini 1.5 Pro 128K)
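
A bare-bones sketch of the RAG idea on this slide: embed documents, retrieve the most similar one for a question, and paste it into the prompt. The in-memory "vector DB", the document strings, and the model names are placeholders, not a specific LangChain or VectorDB API:

    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY in the environment

    docs = [
        "GitHub Copilot suggests code completions inside the IDE.",
        "FunSearch searches for programs and scores them with an evaluate function.",
    ]

    def embed(texts):
        r = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in r.data])

    doc_vecs = embed(docs)                       # toy in-memory "vector DB"

    question = "What does GitHub Copilot do?"
    q = embed([question])[0]
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = docs[int(np.argmax(scores))]       # retrieve the most similar document

    answer = client.chat.completions.create(
        model="gpt-4o-mini",                     # placeholder model name
        messages=[{"role": "user",
                   "content": f"Answer using this context:\n{context}\n\nQ: {question}"}],
    )
    print(answer.choices[0].message.content)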

Slide 39

Slide 39 text

39 Level-1 LangChain LLM https://udemy.benesse.co.jp/development/system/langchain.html

Slide 40

Slide 40 text

40 Level-2 ( ) Tree of Thoughts/ToT (Yao et al. 2023, Long 2023) • • ⾒ •
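
A schematic sketch of the Tree-of-Thoughts loop: propose several candidate "thoughts", evaluate them, keep the most promising, and repeat. propose() and score() are placeholders standing in for LLM calls:

    import random

    def propose(state, n=3):
        # Placeholder for an LLM call that proposes n candidate next "thoughts".
        return [state + f" -> step{random.randint(0, 99)}" for _ in range(n)]

    def score(state):
        # Placeholder for an LLM (or programmatic) evaluation of how promising a state is.
        return random.random()

    def tree_of_thoughts(root, depth=3, beam=2):
        frontier = [root]
        for _ in range(depth):
            candidates = [s for state in frontier for s in propose(state)]
            candidates.sort(key=score, reverse=True)   # evaluate and rank thoughts
            frontier = candidates[:beam]               # keep only the best (beam search)
        return frontier[0]

    print(tree_of_thoughts("problem"))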

Slide 41

Slide 41 text

41 AlphaCode2 • AlphaCode2 (2023) & AlphaCode (Science, 2022) (Codeforces) 85% • https://goo.gle/AlphaCode2 https://bit.ly/3UvYg2T https://bit.ly/3JLXhGD

Slide 42

Slide 42 text

42 AlphaCode2 ü Gemini Pro CodeContests Fine Tuning ü Tune Tuning ü 100 ü 95% ü ü Fine Tuning Gemini Pro 10
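
A toy sketch of the sample → filter-by-execution → cluster-and-submit pipeline summarized here; llm_sample, passes_public_tests, and behavior_signature are placeholders for what AlphaCode2 does at far larger sample counts:

    from collections import defaultdict

    def llm_sample(problem, n):
        # Placeholder: in AlphaCode2 this is a fine-tuned code model sampling many programs.
        return [f"# candidate {i}\nprint(sum(map(int, input().split())))" for i in range(n)]

    def passes_public_tests(code, tests):
        # Placeholder: run the program on the public examples and compare outputs.
        return hash(code) % 20 == 0   # toy stand-in; most samples fail the examples

    def behavior_signature(code, probe_inputs):
        # Placeholder: AlphaCode-style clustering groups programs by identical outputs on probe inputs.
        return hash(code) % 5

    candidates = llm_sample("A+B problem", n=1000)
    survivors = [c for c in candidates if passes_public_tests(c, tests=None)]

    clusters = defaultdict(list)
    for c in survivors:
        clusters[behavior_signature(c, probe_inputs=None)].append(c)

    # Submit one representative from each of the largest clusters (up to 10 submissions).
    submissions = [cs[0] for _, cs in sorted(clusters.items(), key=lambda kv: -len(kv[1]))][:10]
    print(len(candidates), len(survivors), len(submissions))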

Slide 43

Slide 43 text

43 AlphaGeometry • AlphaGeometry (Nature, 2024) • Nature 625, 476-482 (2024). https://doi.org/10.1038/s41586-023-06747-5 https://bit.ly/4dkSKc1

Slide 44

Slide 44 text

44 AlphaGeometry ü ü ü LLM ü ü

Slide 45

Slide 45 text

45 AlphaGeometry

Slide 46

Slide 46 text

46 AlphaGeometry 10

Slide 47

Slide 47 text

47 AlphaGeometry

Slide 48

Slide 48 text

48 FunSearch • FunSearch (Nature, 2023): new constructions in extremal combinatorics (the Cap set problem) • Nature 625, 468-475 (2024). https://doi.org/10.1038/s41586-023-06924-6 https://bit.ly/3WprEdx

Slide 49

Slide 49 text

49 FunSearch ü LLM ü FunSearch Fun ( ) ü Google LLM PaLM 2 Codey 蓄 Fine Tuning ü ü ü k ⾒

Slide 50

Slide 50 text

50 FunSearch evaluate
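
A toy version of the FunSearch loop from slides 49-50: an LLM proposes modified programs, a hand-written evaluate() function scores them, and improved programs go back into the pool that seeds the next prompt. llm_mutate is a stand-in for the real PaLM 2 / Codey calls, and the scoring objective is purely illustrative:

    import random
    import re

    def evaluate(program_src):
        # Problem-specific scorer written by a human (the key ingredient in FunSearch).
        # Toy objective: reward programs whose priority() returns a large value for n=10.
        scope = {}
        try:
            exec(program_src, scope)
            return scope["priority"](10)
        except Exception:
            return float("-inf")

    def llm_mutate(parent_src):
        # Placeholder for the LLM: here we just perturb the constant the parent returns.
        return re.sub(r"return \d+", f"return {random.randint(1, 100)}", parent_src)

    pool = ["def priority(n):\n    return 1\n"]
    for _ in range(200):
        parent = max(pool, key=evaluate)   # best-so-far program becomes the prompt
        child = llm_mutate(parent)         # LLM proposes a new candidate
        if evaluate(child) > evaluate(parent):
            pool.append(child)             # keep improvements, discard the rest

    best = max(pool, key=evaluate)
    print(best, "score:", evaluate(best))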

Slide 51

Slide 51 text

51 • Q. ( ) ( ) vs ( ) • • • ( , RAG, ReACT, LangChain, ) • ( ) • × ( ) •

Slide 52

Slide 52 text

52 Level-3 × • Q. ( ) • • or • • • / RAG • ReACT • •

Slide 53

Slide 53 text

53 • keras https://keras.io/examples/nlp/addition_rnn/ • "535+61" "596" • LSTM (1997) Folklore Sequence to Sequence Learning with Neural Networks (2014) https://arxiv.org/abs/1409.3215 Learning to Execute (2014) https://arxiv.org/abs/1410.4615 • +/- Reversed Karpathy's minGPT demo https://github.com/karpathy/minGPT/blob/master/projects/adder/adder.py • (2 +2 ) • Onehot +1 LSTM 99% (Transformer ) • memorization ( data leakage )
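
A small data-generation sketch for the reversed-digit addition experiment referenced above (keras addition_rnn, minGPT adder); the fixed-width padding and digit reversal follow the usual setup, the details are illustrative:

    import random

    def make_example(max_digits=3, reverse=True):
        a = random.randint(0, 10**max_digits - 1)
        b = random.randint(0, 10**max_digits - 1)
        q = f"{a}+{b}".ljust(2 * max_digits + 1)     # fixed-width input, e.g. "535+61 "
        ans = str(a + b).ljust(max_digits + 1)        # fixed-width target, e.g. "596 "
        if reverse:
            ans = ans[::-1]   # emitting the least-significant digit first makes carries "local"
        return q, ans

    random.seed(0)
    for _ in range(3):
        print(make_example())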

Slide 54

Slide 54 text

54 CS • • • ( ) ( ) • • • ( ) • ( ) • CS

Slide 55

Slide 55 text

55 1 • 1cm 2 ( ) / • ( ) , ( ) • • ( ) 2 ? https://bit.ly/4b9mcjz ( ) • 1000 1001 / ( )

Slide 56

Slide 56 text

56 2 • ( ) • ( ) ( or ) ( ) ( ) • • ? • ⾒ ?

Slide 57

Slide 57 text

57 ⾒ • ⾒ ( ) • ⾒ ⾒ ⾒ • • Underdetermined/Underspecified 頻 • ( ) • • ? ( ), ( )

Slide 58

Slide 58 text

58 Level-3 × • : Thinking slow (System 2) and fast (System 1) • • ⾒ • ⾒System 2( ) OK • • System2 • × https://arxiv.org/html/2401.14953v1 • LLM

Slide 59

Slide 59 text

59 1. • Transformer Stateless • 2. • Transformer Memoryless ( ?) • ( ) 3. ( Simulate Feedback) • • • / Optional

Slide 60

Slide 60 text

60 / Feedback • / • Self-Refine https://arxiv.org/abs/2303.17651 • Looped Transformer https://arxiv.org/abs/2301.13196 • The ConceptARC Benchmark https://arxiv.org/abs/2305.07141 • (e.g. ) • Go-Explore https://doi.org/10.1038/s41586-020-03157-9 Atari57 Montezuma's Revenge and Pitfall • Dreamer V3 https://arxiv.org/abs/2301.04104 • MuZero https://doi.org/10.1038/s41586-020-03051-4 , https://www.repre.org/repre/vol45/special/nobuhara/

Slide 61

Slide 61 text

61 • ( × ) AlphaCode2, AlphaGeometry, FunSearch • NN ( etc) • Mixture of Experts (MoE) System 2 System 1 • • Tokenization

Slide 62

Slide 62 text

62 Q. ( ) A. , • ( ) • ( ) By Design ( ) • ⾒ ( Transformer )

Slide 63

Slide 63 text

• ( ), . ( ) X- informatics , 2023 2 18 • ( ), ChatGPT . 2023, , 2023 8 25 • ( ), AI ChatGPT . AI , 2023 9 14 • (NTT CS ), Neuro-Symbolic AI . 2024 3 27 17 AFSA • ( ), . 18 AFSA 2024 4 24 • Private communications: , ( ), (NII) https://afsa.jp/

Slide 64

Slide 64 text

64 • • Great • May the ML Force be with you… PDF https://itakigawa.github.io/data/comp2024.pdf https://bit.ly/4bnMX3m

Slide 65

Slide 65 text

• . (2), , 93(12), 2023. https://bit.ly/3UNCvN8 • . , 77(10), 2023. https://bit.ly/3Wrjwtf • . , 64(12), 2023 https://bit.ly/3UMjAm4 • , , 49(1), 2022. https://bit.ly/3Quy3Aj • ⾒ , 70(3), 2022. https://bit.ly/3QyNOXa • (FPAI). , 34(5), 2019. https://bit.ly/3Qz8fTK • , . , 2018. • , vs. . , 2021. • , ? . , 2015. • , 11 20 , , 2007. • , . , 2016. • , . , 2012. • , . , 1989. • W G , : . , 2005. • , 1912~1951. , 2003. • , . , 2014. • , ⾒ . , 2016. • , & ? , 2014.

Slide 66

Slide 66 text

QA + • Q. tokenization Byte-level subword-level Char/Byte-level Byte fallback Byte • Q. System 2 System 1 System 1 1000 1001 ( ) • Q. ⾒ • 1 Transformer • 2 ( ) ( ) ( ) ( ) ( ) ( ) • 3