
In Search of the Middle Ground between Induction and Deduction: Statistical Machine Learning for Symbols and Discrete Structures

itakigawa
May 09, 2024

In Search of the Middle Ground between Induction and Deduction: Statistical Machine Learning for Symbols and Discrete Structures

IEICE Technical Committee on Computation (COMP), Invited Talk
May 8, 2024 (Wed) @ Rakuyu Kaikan, Kyoto University
Video https://youtu.be/T-grdDu3FIo


Transcript

  1. 3 (Examples) • AlphaGeometry (Nature, 2024) • AlphaCode2 (2023)

    & AlphaCode (Science, 2022): competitive programming (Codeforces), AlphaCode2 estimated to rank above 85% of participants • FunSearch (Nature, 2023): discovered new constructions for the Cap set problem in extremal combinatorics • AlphaDev (Nature, 2023): sorting routines for 3-5 elements up to 70% faster, and about 1.7% faster for sequences of >250k elements, merged into the LLVM C++ standard library • AlphaTensor (Nature, 2022): for 4×4 matrix multiplication in modulo-2 arithmetic, a 47-multiplication algorithm beating the 49 of the Strassen-based record that had stood for 50 years
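As background for the Strassen comparison above, here is a minimal sketch of the classical 7-multiplication Strassen step for multiplying two 2×2 matrices (the standard textbook construction, not AlphaTensor's 47-multiplication scheme for 4×4 matrices mod 2):

    import numpy as np

    def strassen_2x2(A, B):
        # Strassen (1969): 7 multiplications instead of the naive 8
        (a11, a12), (a21, a22) = A
        (b11, b12), (b21, b22) = B
        m1 = (a11 + a22) * (b11 + b22)
        m2 = (a21 + a22) * b11
        m3 = a11 * (b12 - b22)
        m4 = a22 * (b21 - b11)
        m5 = (a11 + a12) * b22
        m6 = (a21 - a11) * (b11 + b12)
        m7 = (a12 - a22) * (b21 + b22)
        return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                         [m2 + m4,           m1 - m2 + m3 + m6]])

    A, B = np.random.rand(2, 2), np.random.rand(2, 2)
    print(np.allclose(strassen_2x2(A, B), A @ B))   # True

Applied recursively to 2×2 blocks this gives an O(n^2.81) matrix multiplication; AlphaTensor searches for analogous low-rank decompositions automatically.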
  2. 5 ( ) [The slide reproduces the opening page of "Measuring GitHub Copilot's Impact on Productivity" by Albert Ziegler, Eirini Kalliamvakou, X. Alice Li, Andrew Rice, Devon Rifkin, Shawn Simister, Ganesh Sittampalam, and Edward Aftandilian, Communications of the ACM 67(3), March 2024. Key insights from the article: AI pair-programming tools such as GitHub Copilot have a big impact on developer productivity; this holds for developers of all skill levels, with junior developers seeing the largest gains; the reported benefits span task time, product quality, cognitive load, enjoyment, and learning; perceived productivity gains are reflected in objective measurements of developer activity; and the driving factor appears to be not suggestion correctness as such, but whether the suggestions are useful as a starting point for further development.]

    • • • (explain) • (brushes) • • A B • • GitHub Copilot ( ) Comm. ACM 67(3) https://doi.org/10.1145/3633453 GitHub Next / Copilot Labs https://githubnext.com/
  3. 9 • Q. ( ) ( ) vs ( )

    • • • ( , RAG, ReACT, LangChain, ) • ( ) • × ( ) •
  4. 10 • ( ) • ( ) • ( )

    • ( ) • BHK ( ) ( ) ? (2015)
  5. 11 (2023 ) ACM A.M. Turing Award Honors Avi Wigderson

    for Foundational Contributions to the Theory of Computation https://awards.acm.org/about/2023-turing For Turing Award winner, everything is computation and some problems are unsolvable https://bit.ly/3xZzJLP
  6. 12 = / / • (Church, Kleene, Turing) • general recursive / μ-recursive functions

    (Gödel-Herbrand) • λ-calculus (Church) • Turing machines (Turing) • • ( ) vs. (2021) (2016)
  7. 13 A. ( ) B. ( ) ( ) •

    a family of functions with parameter θ ( ): ℱ = {f_θ ; θ ∈ Θ} • θ
  8. 14 • • (Memorization) Overfit ( )

    x → y   x → y
  9. 15 1 • B A 1. 2. • universal approximation theorems (Cybenko, Hornik,

    etc) 3 1 1 1 3 • statistical learning theory ( ) (Vapnik & Chervonenkis, etc) • (Computable Analysis)
  10. 16 2 • Solomonoff induction, or Universal Prediction: • ( ) programs of length n

    weighted by 2^(-n) • • 1960s • ( ) https://www.lesswrong.com/tag/solomonoff-induction R. J. Solomonoff, Machine Learning — Past and Future (2009). Dartmouth Artificial Intelligence Conference: The Next Fifty Years (AI@50), Dartmouth, 2006. https://bit.ly/3UvpMgG
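For reference, the universal prior behind the 2^(-n) weighting above is usually written as follows (a standard formulation, not taken from the slide), with U a universal prefix Turing machine and ℓ(p) the bit length of program p:

    M(x) = \sum_{p \,:\, U(p) \text{ starts with } x} 2^{-\ell(p)}

so every program that reproduces the observed prefix x contributes weight two to the minus its length, and shorter (simpler) programs dominate the prediction.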
  11. 17 • Q. ( ) ( ) vs ( )

    • • • ( , RAG, ReACT, LangChain, ) • ( ) • × ( ) •
  12. 18 ⚠ • ( ) • • (Melanie Mitchell) •

    AI hype is built on high test scores. Those tests are flawed. https://bit.ly/44m7NxX • AI now beats humans at basic tasks — new benchmarks are needed, says major report https://www.nature.com/articles/d41586-024-01087-4 • ( “ ” )
  13. 19 (LLMs) 頻 ( ) GPT-4, Claude 3, Gemini 1.5,

    Mistral, Llama 3, Command, DBRX, PPLX, Grok, https://artificialanalysis.ai/
  14. Me > ChatGPT4 > 1. 1944 2. 1966 3. 1971

    4. 1971 5. 1981 6. 1990 7. 2007 ✓ : : ✓ ✓ ✓ ( )
  15. Claude 3 Opus > • • • • • •

    AC0 PARITY MAJORITY • TC0 W(S5) • • Transformers as recognizers of formal languages: A survey on expressivity. L Strobl, W Merrill, G Weiss, D Chiang, D Angluin https://arxiv.org/abs/2311.00208
  16. Me > URL Web 10 https://www.lifeiscomputation.com/transformers-are-not-turing-complete/ ChatGPT4 > Web 1.

    : : 2. : : 3. : : 4. : : 5. : : 6. : : 7. : : 8. : : 9. : : 10. : : [ ] ✓ ✓
  17. 25 Microsoft • OpenAI GPT AI Word, Excel, PowerPoint, Outlook,

    Teams, Bing/Edge, Azure, GitHub, VSCode, • GitHub (2018) VSCode GitHub Copilot / Copilot Workspace • 3 (440 ) 3 Apple 2 (🇬🇧🇫🇷🇮🇹 GDP ) • 頻 Windows Microsoft 365 Copilot • Word PowerPoint • Excel • Web
  18. 27 1. ( ) ( ) Embedding ( ) 2.

    ( ) • Vision-Language-Action (VLA) “computer” → [-0.005, 0.014, -0.007 , ..., -0.015]
  19. 29 ⾒ ≑ ⾒ • ( ): ⾒ ⾒ (

    ) ⾒ ⾒ • : (= ) Wikipedia 2014 + Gigaword 5 40 ( ) OpenAI embeddings (large) 10 ※ GloVe: Global Vectors for Word Representation https://nlp.stanford.edu/projects/glove/
    Nearest neighbors (cosine similarity):
    king – man + woman: king 0.690340, woman 0.569618, queen 0.521278, königin 0.479320, queene 0.477531, koningin 0.468316, women 0.460638, konig 0.459005, queenie 0.458508, queeny 0.455897
    france – paris + london: london 0.682005, france 0.643014, england 0.563394, london-based 0.554125, longdon 0.539819, londen 0.532294, london-born 0.527847, londons 0.512141, londin 0.507284, britain 0.494870
    computation: computation 1.000000, computations 0.863177, computing 0.773753, computational 0.728108, computationally 0.710972, computes 0.702328, computability 0.678299, calculation 0.647723, compute 0.637001, computable 0.612179
    impeccable: impeccable 1.000000, impeccably 0.852758, unimpeachable 0.769426, irreproachable 0.725752, immaculate 0.716498, immaculately 0.665226, flawless 0.656289, faultless 0.633298, unblemished 0.628034, spotless 0.622993
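A minimal sketch of how nearest-neighbor lists like the ones above can be reproduced, assuming the GloVe 6B vectors from the project page have been downloaded (the file name and the gensim usage are assumptions, not from the slide):

    # Load GloVe vectors (word2vec text format without header) with gensim >= 4.0
    from gensim.models import KeyedVectors

    kv = KeyedVectors.load_word2vec_format("glove.6B.300d.txt", binary=False, no_header=True)

    # Compose the query vector and rank all words by cosine similarity
    query = kv["king"] - kv["man"] + kv["woman"]
    for word, score in kv.similar_by_vector(query, topn=10):
        print(f"{word}\t{score:.6f}")

Unlike gensim's most_similar, similar_by_vector does not exclude the query words themselves, which matches the lists above (king and woman appear before queen).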
  20. 30 • ( ) • , , , , ,

    , , , , (Character Level) • , , , , (Word Level) • ⾒ (Subword Level) ( + ) • Character/Byte Level (BPE) ( RePair) Byte fallback
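A minimal sketch of subword (BPE) tokenization as described above, using the GPT-2 byte-level BPE tokenizer purely for illustration (the slide does not name this particular tokenizer):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    for text in ["computation", "uncomputability", "京都大学"]:
        ids = tok(text)["input_ids"]
        print(text, "->", tok.convert_ids_to_tokens(ids))

Frequent words tend to stay whole, rarer words split into several subword pieces, and non-ASCII text falls back to byte-level pieces, so nothing is ever out of vocabulary.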
  21. 31 1/2 • Causal LM = Next Token Prediction (

    ) • : n : : I'm fine, thank → you (9.99e-01) for (1.09e-04) goodness (7.48e-05) • Meta-Llama-3-8B-Instruct: context length 8192, embedding dim 4096, vocab size 128256 • n = 8192 ( / ) • vocab size = 128256 ( ) • embedding dim = 4096 ( ) • n : I'm fine, thank you for asking. Just a little tired from the trip. I'll be okay
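To make the next-token-prediction picture above concrete, here is a minimal sketch that scores candidate next tokens for a prompt; gpt2 is used only so the snippet runs without gated weights (the slide's own numbers come from Meta-Llama-3-8B-Instruct):

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("I'm fine, thank", return_tensors="pt")
    with torch.no_grad():
        logits = lm(**inputs).logits               # shape: (1, seq_len, vocab_size)
    probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token
    top = probs.topk(5)
    for p, idx in zip(top.values, top.indices):
        print(repr(tok.decode(int(idx))), f"{p.item():.2e}")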
  22. 32 2/2 = Transformer (Llama3 ) ( 8192) (4096) Transformer

    Decoder ❌ ❌ ❌ ❌ / (4096) (embedding) Everything is a remix https://www.everythingisaremix.info/
  23. 33 2/2 = Transformer (Llama3 ) (context 8192) (hidden dim 4096) Transformer

    Decoder … (vocab 128256) Transformer Decoder ×32: RMSNorm → Multihead Attention → + (residual), RMSNorm → MLPs → + (residual); final RMSNorm → Linear. SiLU-gated MLP: y = w2(silu(w1(x)) * w3(x)). Scaled Dot Product Attention (masked positions ❌): o_i = Σ_j α_{i,j} v_j, with α_{i,j} = Softmax_j(q_i · k_j / √(dim of k_j)); the outputs o_1…o_n pass through a Linear projection, and positions enter via RoPE.
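A minimal sketch of the SiLU-gated MLP y = w2(silu(w1(x)) * w3(x)) described above, with the Llama-3 8B sizes (4096 → 14336 → 4096) taken from the model printout on the next slide; this is an illustrative re-implementation, not the Meta code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GatedMLP(nn.Module):
        def __init__(self, dim=4096, hidden=14336):
            super().__init__()
            self.w1 = nn.Linear(dim, hidden, bias=False)  # gate_proj
            self.w3 = nn.Linear(dim, hidden, bias=False)  # up_proj
            self.w2 = nn.Linear(hidden, dim, bias=False)  # down_proj

        def forward(self, x):
            return self.w2(F.silu(self.w1(x)) * self.w3(x))

    x = torch.randn(1, 8, 4096)    # (batch, sequence length, hidden dim)
    print(GatedMLP()(x).shape)     # torch.Size([1, 8, 4096])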
  24. # Load Meta-Llama-3-8B-Instruct and inspect its architecture
    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    llama3_lm = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    print(llama3_lm); print(llama3_lm.model.config)

    LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
              (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
              (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
              (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
              (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm()
            (post_attention_layernorm): LlamaRMSNorm()
          )
        )
        (norm): LlamaRMSNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
    )
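A hedged usage sketch continuing from the loading code above: format a chat prompt and let the model generate a continuation (the prompt text and sampling settings are illustrative, not from the slide):

    messages = [{"role": "user", "content": "How are you?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(llama3_lm.device)

    out = llama3_lm.generate(input_ids, max_new_tokens=32, do_sample=True, temperature=0.7)
    print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))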
  25. 35 ( ) • ( ) Embedding ( ) •

    Llama3: about 15 trillion tokens of training data ( ) • • ( ) • memorization
  26. 36 • ( ) ) GPT-4 ChatGPT • Instruction Tuning

    ( Fine Tuning) ( ) Fine Tuning • Human Feedback (RLHF) • • • Fine Tuning https://bit.ly/4bloMTk
  27. 37 AI ( ) 1950 1960 1970 1980 1990 2000

    2010 2020 2 3 ( ) , , Level-1 Level-2 × Level-3 AI(GOFAI) !?
  28. 38 Level-1 • (Llama3 15 ) ( etc) • •

    (Fine Tuning) • RAG ( ) ReACT( ) ( VectorDB ) ) Python • ( ) ( Claude3 200K, Gemini 1.5 Pro 128K)
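A minimal sketch of the retrieval step in the RAG pipeline mentioned above; TF-IDF and the toy documents stand in for an embedding model and a vector DB purely for illustration (none of these choices come from the slide):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "AlphaGeometry solves olympiad geometry problems.",
        "GloVe learns word vectors from co-occurrence statistics.",
        "RMSNorm is a normalization layer used in Llama models.",
    ]
    question = "Which normalization does Llama use?"

    vec = TfidfVectorizer().fit(docs + [question])
    scores = cosine_similarity(vec.transform([question]), vec.transform(docs))[0]
    context = docs[scores.argmax()]   # top-1 retrieved passage

    prompt = f"Answer using the context.\nContext: {context}\nQuestion: {question}"
    print(prompt)   # this augmented prompt would then be sent to the LLM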
  29. 40 Level-2 ( ) Tree of Thoughts/ToT (Yao et al.

    2023, Long 2023) • • ⾒ •
  30. 41 AlphaCode2 • AlphaCode2 (2023) & AlphaCode (Science, 2022) (Codeforces)

    85% • https://goo.gle/AlphaCode2 https://bit.ly/3UvYg2T https://bit.ly/3JLXhGD
  31. 42 AlphaCode2 ✓ Gemini Pro CodeContests Fine Tuning ✓ Tune

    Tuning ✓ 100 ✓ 95% ✓ ✓ Fine Tuning Gemini Pro 10
  32. 43 AlphaGeometry • AlphaGeometry (Nature, 2024) • Nature 625,

    476–482 (2024). https://doi.org/10.1038/s41586-023-06747-5 https://bit.ly/4dkSKc1
  33. 48 FunSearch • FunSearch (Nature, 2023) ⾒ (extremal combinatorics) Cap

    set • Nature 625, 468–475 (2024). https://doi.org/10.1038/s41586-023-06924-6 https://bit.ly/3WprEdx
  34. 49 FunSearch ✓ LLM ✓ FunSearch Fun ( ) ✓

    Google LLM PaLM 2 Codey 蓄 Fine Tuning ✓ ✓ ✓ k ⾒
  35. 51 • Q. ( ) ( ) vs ( )

    • • • ( , RAG, ReACT, LangChain, ) • ( ) • × ( ) •
  36. 52 Level-3 × • Q. ( ) • • or

    • • • / RAG • ReACT • •
  37. 53 • keras https://keras.io/examples/nlp/addition_rnn/ • "535+61" → "596" • LSTM (1997)

    Folklore Sequence to Sequence Learning with Neural Networks (2014) https://arxiv.org/abs/1409.3215 Learning to Execute (2014) https://arxiv.org/abs/1410.4615 • +/- Reversed Karpathy's minGPT demo https://github.com/karpathy/minGPT/blob/master/projects/adder/adder.py • (2-digit + 2-digit) • Onehot +1 LSTM 99% (Transformer ) • memorization ( data leakage )
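A minimal sketch of the addition-as-sequence-learning setup referenced above, in the spirit of the keras addition_rnn example and Karpathy's minGPT adder; details such as padding length and which side gets reversed are illustrative assumptions:

    import random

    def make_example(max_digits=3, reverse=True):
        a, b = (random.randint(0, 10**max_digits - 1) for _ in range(2))
        query = f"{a}+{b}".ljust(2 * max_digits + 1)   # e.g. "535+61 "
        answer = str(a + b).ljust(max_digits + 1)      # e.g. "596 "
        if reverse:                                    # reversed strings are easier to
            query, answer = query[::-1], answer[::-1]  # learn: low-order digits line up
        return query, answer

    random.seed(0)
    for _ in range(3):
        print(make_example())

A small LSTM or Transformer is then trained to map the query string to the answer string, character by character.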
  38. 54 CS • • • ( ) ( ) •

    • • ( ) • ( ) • CS
  39. 55 1 • 1cm 2 ( ) / • (

    ) , ( ) • • ( ) 2 ? https://bit.ly/4b9mcjz ( ) • 1000 1001 / ( )
  40. 56 2 • ( ) • ( ) ( or

    ) ( ) ( ) • • ? • ⾒ ?
  41. 57 ⾒ • ⾒ ( ) • ⾒ ⾒ ⾒

    • • Underdetermined/Underspecified 頻 • ( ) • • ? ( ), ( )
  42. 58 Level-3 × • : Thinking slow (System 2) and

    fast (System 1) • • ⾒ • ⾒System 2( ) OK • • System2 • × https://arxiv.org/html/2401.14953v1 • LLM
  43. 59 1. • Transformer Stateless • 2. • Transformer Memoryless

    ( ?) • ( ) 3. ( Simulate Feedback) • • • / Optional
  44. 60 / Feedback • / • Self-Refine https://arxiv.org/abs/2303.17651 • Looped

    Transformer https://arxiv.org/abs/2301.13196 • The ConceptARC Benchmark https://arxiv.org/abs/2305.07141 • (e.g. ) • Go-Explore https://doi.org/10.1038/s41586-020-03157-9 Atari57 Montezuma's Revenge and Pitfall • Dreamer V3 https://arxiv.org/abs/2301.04104 • MuZero https://doi.org/10.1038/s41586-020-03051-4 , https://www.repre.org/repre/vol45/special/nobuhara/
  45. 61 • ( × ) AlphaCode2, AlphaGeometry, FunSearch • NN

    ( etc) • Mixture of Experts (MoE) System 2 System 1 • • Tokenization
  46. 62 Q. ( ) A. , • ( ) •

    ( ) By Design ( ) • ⾒ ( Transformer )
  47. • ( ), . ( ) X- informatics , 2023

    2 18 • ( ), ChatGPT . 2023, , 2023 8 25 • ( ), AI ChatGPT . AI , 2023 9 14 • (NTT CS ), Neuro-Symbolic AI . 2024 3 27 17 AFSA • ( ), . 18 AFSA 2024 4 24 • Private communications: , ( ), (NII) https://afsa.jp/
  48. 64 • • Great • May the ML Force be

    with you… PDF https://itakigawa.github.io/data/comp2024.pdf https://bit.ly/4bnMX3m
  49. • . (2), , 93(12), 2023. https://bit.ly/3UNCvN8 • . ,

    77(10), 2023. https://bit.ly/3Wrjwtf • . , 64(12), 2023 https://bit.ly/3UMjAm4 • , , 49(1), 2022. https://bit.ly/3Quy3Aj • ⾒ , 70(3), 2022. https://bit.ly/3QyNOXa • (FPAI). , 34(5), 2019. https://bit.ly/3Qz8fTK • , . , 2018. • , vs. . , 2021. • , ? . , 2015. • , 11 20 , , 2007. • , . , 2016. • , . , 2012. • , . , 1989. • W G , : . , 2005. • , 1912~1951. , 2003. • , . , 2014. • , ⾒ . , 2016. • , & ? , 2014.
  50. QA + • Q. tokenization Byte-level subword-level Char/Byte-level Byte fallback

    Byte • Q. System 2 System 1 System 1 1000 1001 ( ) • Q. ⾒ • 1 Transformer • 2 ( ) ( ) ( ) ( ) ( ) ( ) • 3