Slide 1

Slide 1 text

[email protected] — Lecture 15, January 26, 2024

Slide 2

Slide 2 text

Slido: https://app.sli.do/event/d13KtHT3VXxFbmLvxPSQjn / Slides (PDF): https://itakigawa.github.io/data/autograd.pdf

Slide 3

Slide 3 text

About the speaker (bullet text not captured in extraction): https://itakigawa.github.io/

Slide 4

Slide 4 text

Outline: 1. Machine learning 2. Loss functions and gradient descent 3. Automatic differentiation (backpropagation)

Slide 5

Slide 5 text

§ A look back at 2023, the year image-generation AI evolved at breakneck speed: https://ascii.jp/elem/000/004/174/4174570/

Slide 6

Slide 6 text

§ Generative AI in the news (slide text not captured): https://www.itmedia.co.jp/news/articles/2310/05/news179.html

Slide 7

Slide 7 text

Generative AI in the news, continued (slide text not captured): https://www.itmedia.co.jp/news/articles/2310/05/news179.html

Slide 8

Slide 8 text

Adobe Photoshop's Generative Fill: https://www.adobe.com/jp/products/photoshop/generative-fill.html

Slide 9

Slide 9 text

§ ChatGPT: released November 2022; said to have reached 100 million users in about 2 months § An AI boom followed § In 2023, OpenAI's ChatGPT shook up GAFAM and the wider IT industry

Slide 10

Slide 10 text

ChatGPT overview (numbered list 1-6; most text not captured). Surviving fragments mention: DIY with OpenAI ChatGPT, GPTs, DALL-E, and the free vs. paid ChatGPT plans.

Slide 11

Slide 11 text

§ Microsoft has partnered with OpenAI and integrates ChatGPT and DALL-E AI into its products
§ Riding the AI wave, Microsoft's market capitalization reached about 3 trillion dollars (roughly 440 trillion yen), vying with Apple for the world's top 2 (comparable to the GDP of 🇬🇧🇫🇷🇮🇹)
§ Bing (and Edge): ChatGPT-based chat and web search
§ GitHub Copilot (AI code completion)
§ Windows 11 / Microsoft 365 Copilot and Copilot Pro: Microsoft Copilot built into the OS and Office

Slide 12

Slide 12 text

§ Generative AI built into Windows 11 / Microsoft 365 Copilot
§ AI assistance inside the MS Office apps (Word, Excel, PowerPoint, Outlook, Teams)
§ Per-app examples for Word, PowerPoint, Excel, and the web (bullet details not captured)

Slide 13

Slide 13 text


Slide 14

Slide 14 text

Related AI lecture materials (December 2023): a two-part lecture series on ChatGPT and generative AI, http://hdl.handle.net/2433/286548 ; part 1 by a speaker from Preferred Networks (other bullet details not captured).

Slide 15

Slide 15 text

1. Machine learning

Slide 16

Slide 16 text


Slide 17

Slide 17 text

Computer, program, input, output: the program my_function takes inputs x1, x2 and returns output y.

Slide 18

Slide 18 text

Machine learning as programming from data (a Software 2.0 view): e.g. machine translation, "J'aime la musique" → "I love music".

Slide 19

Slide 19 text

Example: learn a function relating weight (g) and length (cm), i.e. predict weight (g) from length (cm), or length (cm) from weight (g).

Slide 20

Slide 20 text

The learned function is a predictor: from length (cm) to weight (g), or the other way around.

Slide 21

Slide 21 text

Classification: learn a function whose output is a probability P(y = red) between 0 and 1, e.g. with Random Forest, Gaussian Process, or Logistic Regression.

Slide 22

Slide 22 text

A model is a function with tunable parameters p1, p2, p3, p4, …: Random Forest, Neural Network, SVR, Kernel Ridge, and so on.

Slide 23

Slide 23 text


Slide 24

Slide 24 text

Many model families, each with its own parametrization: Decision Tree, Random Forest, GBDT, Nearest Neighbor, Logistic Regression, SVM, Gaussian Process, Neural Network.

Slide 25

Slide 25 text

2. Loss functions and gradient descent

Slide 26

Slide 26 text

Parameters vs. hyperparameters: each method (Random Forest, GBDT, Nearest Neighbor, SVM, Gaussian Process, Neural Network) also has hyperparameters q1, q2, q3, q4, …

Slide 27

Slide 27 text

Parameters vs. hyperparameters: parameters p1, p2, p3, p4, … are learned from the data; hyperparameters q1, q2, q3, q4, … are chosen by the user.

Slide 28

Slide 28 text

Learning = optimization:
§ A loss ℓ(f(x), y) measures the mismatch between the model output f(x) and the target y
§ Squared error (MSE) for regression; cross entropy (Cross Entropy) for classification
§ Training = minimizing the total loss (optimization): Minimize over θ the objective L(θ) = Σᵢ₌₁ⁿ ℓ(f(xᵢ; θ), yᵢ) + Ω(θ), where Ω(θ) is a regularization term.

Slide 29

Slide 29 text

Logistic regression:
Data: (x1, y1), (x2, y2), ⋯, (xn, yn), each yi either 0 or 1.
Model: f(x) = σ(ax + b) = 1/(1 + e^(−(ax+b))) = P(y = 1 | x).
Loss (cross entropy) = (−1) × the log-likelihood:
L(a, b) = −log [ Π over i with yi=1 of P(y = 1 | xi) × Π over i with yi=0 of P(y = 0 | xi) ]
        = −Σᵢ₌₁ⁿ [ yi log f(xi) + (1 − yi) log(1 − f(xi)) ]
(minimizing this loss is the same as maximizing the likelihood of the data).
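The cross-entropy loss above can be computed directly. A minimal sketch; the data points and the parameter values (a, b) below are hypothetical, not from the slides:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def cross_entropy_loss(a, b, xs, ys):
    """Negative log-likelihood of the logistic model f(x) = sigmoid(a*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        f = sigmoid(a * x + b)  # P(y = 1 | x)
        total -= y * math.log(f) + (1 - y) * math.log(1 - f)
    return total

xs = [0.5, 1.5, 2.5]  # hypothetical inputs
ys = [0, 0, 1]        # hypothetical 0/1 labels
print(cross_entropy_loss(1.0, -2.0, xs, ys))
```

Training logistic regression means searching for the (a, b) that makes this number as small as possible.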

Slide 30

Slide 30 text

One input, one output, one logistic unit (Linear → Sigmoid): f(x) = σ(ax + b) = 1/(1 + e^(−(ax+b))), where σ(x) = 1/(1 + e^(−x)). Given data (x1, y1), (x2, y2), (x3, y3), minimize
L(a, b) = −Σᵢ₌₁³ [ yi log σ(a xi + b) + (1 − yi) log(1 − σ(a xi + b)) ].

Slide 31

Slide 31 text

Stacking two units (1 input → 1 hidden → 1 output, variant 1): z = σ(a1 x + b1), then ŷ = σ(a2 z + b2). With the same three data points, minimize
L(a1, a2, b1, b2) = −Σᵢ₌₁³ [ yi log f(xi) + (1 − yi) log(1 − f(xi)) ], where f(x) = σ(a2 σ(a1 x + b1) + b2).

Slide 32

Slide 32 text

A wider version (1 input → 2 hidden units → 1 output, variant 2): u1 = a1 x + b1 and u2 = a2 x + b2 (Linear), z1 = σ(u1) and z2 = σ(u2) (Sigmoid), v = a3 z1 + b3 z2 + c3 (Linear), ŷ = σ(v) (Sigmoid). Minimize L(a1, a2, a3, b1, b2, b3, c3).
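A forward pass of this two-hidden-unit network is just a few lines; the weight values used below are arbitrary placeholders:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def forward(x, a1, b1, a2, b2, a3, b3, c3):
    """Forward pass of the 1-input network with two hidden sigmoid units."""
    z1 = sigmoid(a1 * x + b1)   # hidden unit 1: Linear -> Sigmoid
    z2 = sigmoid(a2 * x + b2)   # hidden unit 2: Linear -> Sigmoid
    v = a3 * z1 + b3 * z2 + c3  # output linear layer
    return sigmoid(v)           # output in (0, 1), read as P(y = 1 | x)

out = forward(0.5, 1.0, 0.0, -1.0, 0.0, 2.0, -2.0, 0.1)
print(out)
```

Training then consists of adjusting the seven parameters so that these outputs match the labels.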

Slide 33

Slide 33 text

Model sizes (number of parameters):
§ Image models (CNN): ResNet50 26M, AlexNet 61M, ResNeXt101 84M, VGG19 143M
§ Language models (Transformer): LLaMA 65B, Chinchilla 70B, GPT-3 175B, Gopher 280B, PaLM 540B

Slide 34

Slide 34 text

Minimizing L(θ1, θ2, ⋯) over the parameters θ1, θ2, ⋯:
1. Initialize θ1, θ2, ⋯ (e.g. randomly)
2. Compute L(θ1, θ2, ⋯) and update θ1, θ2, ⋯ to decrease it
3. Repeat (until convergence)

Slide 35

Slide 35 text

Partial derivative: for f(x, y) at a point (a, b), fix y = b and differentiate along x:
f_x(a, b) = lim as h → 0 of [f(a + h, b) − f(a, b)] / h,
the slope of f at x = a along the x direction.
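The limit definition suggests a quick numerical check: approximate the partial derivative with a small finite h. The function f below is an arbitrary example, not from the slides:

```python
def numerical_partial(f, a, b, h=1e-6):
    """Finite-difference approximation of the partial derivative of f in x at (a, b)."""
    return (f(a + h, b) - f(a, b)) / h

f = lambda x, y: x**2 * y              # example function: exact df/dx = 2*x*y
print(numerical_partial(f, 2.0, 3.0))  # close to 2*2*3 = 12
```

This numerical approach needs one extra function evaluation per input variable, one reason deep learning relies on automatic differentiation instead.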

Slide 36

Slide 36 text

The gradient (Gradient): the vector collecting all the partial derivatives, i.e. the direction of steepest increase of the function at a point.

Slide 37

Slide 37 text


Slide 38

Slide 38 text


Slide 39

Slide 39 text

Gradient descent: starting from some point 𝒙, repeatedly move against the gradient of f(𝒙), scaled by the learning rate (step size) α:
(x1, x2, ⋯, xd) ← (x1, x2, ⋯, xd) − α (∂f/∂x1, ∂f/∂x2, ⋯, ∂f/∂xd)
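The update rule above as a dependency-free sketch; the quadratic example function and the settings lr = 0.1, 100 steps are illustrative choices:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeat x <- x - lr * grad(x), the gradient-descent update."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# minimize f(x1, x2) = (x1 - 1)^2 + (x2 + 2)^2, whose gradient is:
grad_f = lambda x: [2 * (x[0] - 1), 2 * (x[1] + 2)]
print(gradient_descent(grad_f, [0.0, 0.0]))  # approaches the minimizer [1, -2]
```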

Slide 40

Slide 40 text


Slide 41

Slide 41 text

Notes on choosing the learning rate lr (bullet text not captured).

Slide 42

Slide 42 text

Gradient-descent variants (improved update rules):
§ Momentum § AdaGrad § RMSProp § Adam
https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c

Slide 43

Slide 43 text

Back to the network u1 = a1 x + b1, u2 = a2 x + b2, z1 = σ(u1), z2 = σ(u2), v = a3 z1 + b3 z2 + c3, ŷ = σ(v): gradient descent needs the partial derivatives ∂y/∂a1, ∂y/∂b1, ∂y/∂c3, ⋯ with respect to every parameter.

Slide 44

Slide 44 text

3. Automatic differentiation (backpropagation)

Slide 45

Slide 45 text

The chain rule (Chain Rule):
§ One variable: if x → u = f(x) → y = g(u), then dy/dx = (dy/du)(du/dx).
§ Several variables: if x1 = f1(t), x2 = f2(t), ⋯ are all seen as functions of t, and z = g(x1, x2, ⋯), then ∂z/∂t = (∂z/∂x1)(∂x1/∂t) + (∂z/∂x2)(∂x2/∂t) + ⋯

Slide 46

Slide 46 text

Example: y = e^(−2x)/x. Put a = e^(−x) and b = 1/x, so that y = a²b. By the chain rule,
∂y/∂x = (∂y/∂a)(∂a/∂x) + (∂y/∂b)(∂b/∂x) = 2ab(−e^(−x)) + a²(−1/x²)
      = 2 e^(−x) (1/x) (−e^(−x)) + (e^(−x))² (−1/x²) = −e^(−2x)(2x + 1)/x²,
the same result as differentiating y = e^(−2x)/x directly (by the product or quotient rule).
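The two routes can be checked against each other numerically; x = 1.5 below is an arbitrary test point:

```python
import math

x = 1.5                # any positive test point
a = math.exp(-x)       # a = e^(-x)
b = 1.0 / x            # b = 1/x
# chain rule: dy/dx = (dy/da)(da/dx) + (dy/db)(db/dx) with y = a^2 * b
dydx_chain = 2 * a * b * (-math.exp(-x)) + a**2 * (-1.0 / x**2)
# closed form from the slide: -e^(-2x) (2x + 1) / x^2
dydx_closed = -math.exp(-2 * x) * (2 * x + 1) / x**2
print(dydx_chain, dydx_closed)  # the two values agree
```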

Slide 47

Slide 47 text

Exercise: compute ∂y/∂x through the two-hidden-unit network (x → u1, u2 → z1, z2 → v → y) built from Linear and Sigmoid blocks.

Slide 48

Slide 48 text

Through that network, the chain rule gives
dy/dx = (du1/dx)(dz1/du1)(dv/dz1)(dy/dv) + (du2/dx)(dz2/du2)(dv/dz2)(dy/dv):
one product of local derivatives per path from x to y, summed over paths.

Slide 49

Slide 49 text

Primitive operations and their local derivatives:
One-input functions f: v = 3√u (dv/du = 3/(2√u)); v = log u (dv/du = 1/u); v = 1/u (dv/du = −1/u²).
Two-input functions g: v = ab (∂v/∂a = b, ∂v/∂b = a); v = a² + b (∂v/∂a = 2a, ∂v/∂b = 1).
Composing them: y = g1(g2(f3(x), f2(x)), f1(x)) = 3√x (log x + 1/x²).

Slide 50

Slide 50 text

Decompose y = 3√x (log x + 1/x²) into primitives: z2 = f1(x) = 3√x, z3 = f2(x) = log x, z4 = f3(x) = 1/x, z1 = g2(z4, z3) = z4² + z3, y = g1(z1, z2) = z1 z2. Then
dy/dx = (dz4/dx)(∂z1/∂z4)(∂y/∂z1) + (dz3/dx)(∂z1/∂z3)(∂y/∂z1) + (dz2/dx)(∂y/∂z2)
      = 3√x (1/x − 2/x³) + (log x + 1/x²) · 3/(2√x).

Slide 51

Slide 51 text

Evaluating the derivative at one point, say x = 1.2:
• run the graph forward to compute the value of every node (z2 = 3√x, z3 = log x, z4 = 1/x, z1 = z4² + z3, y = z1 z2)
• then run backward to obtain dy/dx at x = 1.2.
This is (reverse-mode) automatic differentiation.

Slide 52

Slide 52 text

§ Working symbolically (by hand or computer algebra) gives dy/dx = 3√x (1/x − 2/x³) + (log x + 1/x²) · 3/(2√x); substituting x = 1.2 yields ∂y/∂x ≈ 0.14.
§ For models with millions of parameters this symbolic route is impractical, which is what automatic differentiation avoids.
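Both the function value and the hand-derived formula can be evaluated at x = 1.2 to reproduce the numbers used throughout these slides:

```python
import math

x = 1.2
y = 3 * math.sqrt(x) * (math.log(x) + 1 / x**2)

# dy/dx assembled exactly as on the slide
dydx = 3 * math.sqrt(x) * (1 / x - 2 / x**3) \
     + (math.log(x) + 1 / x**2) * 3 / (2 * math.sqrt(x))

print(y, dydx)  # roughly 2.88 and 0.14
```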

Slide 53

Slide 53 text

Forward: starting from x = 1.2, compute z2 = 3√x = 3.29, z3 = log x = 0.18, z4 = 1/x = 0.83 (each node stores its value in data).

Slide 54

Slide 54 text

Forward: then z1 = z4² + z3 = 0.88.

Slide 55

Slide 55 text

Forward: finally y = z1 z2 = 2.88.

Slide 56

Slide 56 text

Backward: each node also stores a grad; start at the output with y.grad = ∂y/∂y = 1.00.

Slide 57

Slide 57 text

Backward: ∂y/∂z1 = z2 = 3.29 and ∂y/∂z2 = z1 = 0.88, so z1.grad = 3.29 and z2.grad = 0.88.

Slide 58

Slide 58 text

Backward: ∂z1/∂z3 = 1 and ∂z1/∂z4 = 2 z4 = 1.67, so ∂y/∂z3 = (∂y/∂z1)(∂z1/∂z3) and ∂y/∂z4 = (∂y/∂z1)(∂z1/∂z4).

Slide 59

Slide 59 text

Backward: z3.grad = 3.29 × 1 = 3.29 and z4.grad = 3.29 × 1.67 = 5.48.

Slide 60

Slide 60 text

Backward: with ∂z2/∂x = 3/(2√x) = 1.37, assemble
∂y/∂x = (∂y/∂z2)(∂z2/∂x) + (∂y/∂z1)(∂z1/∂z3)(∂z3/∂x) + (∂y/∂z1)(∂z1/∂z4)(∂z4/∂x).

Slide 61

Slide 61 text

Backward: ∂z3/∂x = 1/x = 0.83.

Slide 62

Slide 62 text

Backward: ∂z4/∂x = −1/x² = −0.69.

Slide 63

Slide 63 text

Backward: combining the three paths gives x.grad = ∂y/∂x ≈ 0.14.

Slide 64

Slide 64 text

Summary: one forward sweep fills every node's data (x: 1.2, z3: 0.18, z2: 3.29, z4: 0.83, z1: 0.88, y: 2.88); one backward sweep from y fills every node's grad (∂y/∂x = 0.14, ∂y/∂z3 = 3.29, ∂y/∂z2 = 0.88, ∂y/∂z4 = 5.48, ∂y/∂z1 = 3.29, ∂y/∂y = 1.0). A single forward + backward pass yields the derivative of y with respect to every node (reverse-mode automatic differentiation).
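The data/grad bookkeeping described above can be sketched as a tiny micrograd-style autograd engine. This is an illustrative reimplementation, not the lecture's own code; the class and helper names (Value, vsqrt, vlog, vinv) are mine:

```python
import math

class Value:
    """A scalar that remembers how it was computed, so a backward
    sweep can fill in grad = dy/d(this node) for every node."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():  # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():  # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # visit nodes in topological order, then propagate grads output-first
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0  # dy/dy = 1
        for v in reversed(order):
            v._backward()

def vsqrt(x):
    out = Value(math.sqrt(x.data), (x,))
    def _backward():  # d(sqrt x)/dx = 1/(2 sqrt x)
        x.grad += out.grad / (2 * math.sqrt(x.data))
    out._backward = _backward
    return out

def vlog(x):
    out = Value(math.log(x.data), (x,))
    def _backward():  # d(log x)/dx = 1/x
        x.grad += out.grad / x.data
    out._backward = _backward
    return out

def vinv(x):
    out = Value(1.0 / x.data, (x,))
    def _backward():  # d(1/x)/dx = -1/x^2
        x.grad += -out.grad / x.data**2
    out._backward = _backward
    return out

# the running example: y = 3*sqrt(x) * (log(x) + (1/x)^2) at x = 1.2
x = Value(1.2)
z2 = Value(3.0) * vsqrt(x)
z3 = vlog(x)
z4 = vinv(x)
z1 = z4 * z4 + z3
y = z1 * z2
y.backward()
print(y.data, x.grad)  # about 2.88 and 0.14, matching the slides
```

PyTorch's autograd performs the same bookkeeping, only vectorized over tensors.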

Slide 65

Slide 65 text

Each operation knows its own forward and backward rule. For z2 = 3√x:
Forward: z2.data ← 3 * sqrt(x.data)
Backward: since ∂z2/∂x = 3/(2√x) and ∂y/∂x = (∂y/∂z2) × (∂z2/∂x),
x.grad ← 3/(2*sqrt(x.data)) * z2.grad

Slide 66

Slide 66 text

With numbers filled in: forward gives z2.data = 3.29 from x.data = 1.2; backward, using z2.grad = ∂y/∂z2 = 0.88, contributes x.grad ← 3/(2*sqrt(x.data)) * z2.grad = 1.20.

Slide 67

Slide 67 text

Implementation, step 1 (Forward): evaluate the graph z2 = 3√x, z3 = log x, z4 = 1/x, z1 = z4² + z3, y = z1 z2, storing each node's value (data) and the operations connecting the nodes.

Slide 68

Slide 68 text

Each edge of the graph carries its local derivative, to be applied in the backward sweep: ∂z2/∂x = 3/(2√x), ∂z3/∂x = 1/x, ∂z4/∂x = −1/x², ∂z1/∂z3 = 1, ∂z1/∂z4 = 2 z4, ∂y/∂z2 = z1, ∂y/∂z1 = z2.

Slide 69

Slide 69 text

The backward sweep then runs these per-edge updates (note that the three contributions to x.grad accumulate by addition):
z2.grad ← z1.data * y.grad
z1.grad ← z2.data * y.grad
z3.grad ← 1.0 * z1.grad
z4.grad ← (2*z4.data) * z1.grad
x.grad ← 3/(2*sqrt(x.data)) * z2.grad
x.grad ← 1/x.data * z3.grad
x.grad ← (-1/x.data**2) * z4.grad

Slide 70

Slide 70 text

Step 2 (Backward, initialization): set every node's grad to 0.

Slide 71

Slide 71 text

Step 3 (Backward): set the output's grad to 1, i.e. y.grad = 1.0, since ∂y/∂y = 1.

Slide 72

Slide 72 text

Step 4 (Backward, from y's grad): z2.grad ← z1.data * y.grad = 0.88; z1.grad ← z2.data * y.grad = 3.29.

Slide 73

Slide 73 text

Step 5 (Backward, from z1's grad): z3.grad ← 1.0 * z1.grad = 3.29 and z4.grad ← (2*z4.data) * z1.grad = 5.48; these are ∂y/∂z3 = (∂y/∂z1)(∂z1/∂z3) and ∂y/∂z4 = (∂y/∂z1)(∂z1/∂z4).

Slide 74

Slide 74 text

Step 6 (Backward, from z2's grad): x.grad += 3/(2*sqrt(x.data)) * z2.grad adds 1.20 (x.grad: 0.0 → 1.20).

Slide 75

Slide 75 text

Continuing from z3's grad: x.grad += 1/x.data * z3.grad adds 2.74 (x.grad: 1.20 → 3.94).

Slide 76

Slide 76 text

And from z4's grad: x.grad += (-1/x.data**2) * z4.grad adds −3.80 (x.grad: 3.94 → 0.14).

Slide 77

Slide 77 text

Done: every node's grad now holds ∂y/∂(that node); in particular x.grad = ∂y/∂x = 0.14, exactly the value we wanted.

Slide 78

Slide 78 text

The same computation in PyTorch: the forward pass builds the graph for z2 = 3√x, z3 = log x, z4 = 1/x, z1 = z4² + z3, y = z1 z2, and backward fills in the grads.

Slide 79

Slide 79 text

PyTorch code and output:

import torch
torch.set_printoptions(2)

x = torch.tensor(1.2, requires_grad=True)
z2 = 3 * torch.sqrt(x)
z3 = torch.log(x)
z4 = 1 / x
z1 = z4**2 + z3
y = z1 * z2

y.retain_grad(); z1.retain_grad(); z2.retain_grad(); z3.retain_grad(); z4.retain_grad()
y.backward()

print(x.data, z2.data, z3.data, z4.data, z1.data, y.data)
print(x.grad, z2.grad, z3.grad, z4.grad, z1.grad, y.grad)

Output:
tensor(1.20) tensor(3.29) tensor(0.18) tensor(0.83) tensor(0.88) tensor(2.88)
tensor(0.14) tensor(0.88) tensor(3.29) tensor(5.48) tensor(3.29) tensor(1.)

(retain_grad() keeps the grad of non-leaf tensors, which PyTorch otherwise discards.)

Slide 80

Slide 80 text

Exercise: minimize y = x² + 2x + 3 = (x + 1)² + 2 by gradient descent starting from x = 2.0:
• compute y from x (Forward)
• compute the gradient (Backward)
• update x using x.grad
• repeat; x should approach the minimizer x = −1, where y = 2.
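A sketch of this exercise using the analytic derivative dy/dx = 2x + 2 in place of autograd; lr = 0.1 and 100 iterations are arbitrary but sufficient choices:

```python
x = 2.0                # starting point from the exercise
lr = 0.1               # learning rate
for _ in range(100):
    grad = 2 * x + 2   # dy/dx for y = x^2 + 2x + 3
    x -= lr * grad     # gradient-descent update
y = x**2 + 2 * x + 3
print(x, y)  # converges toward x = -1, y = 2
```

With PyTorch, grad would instead come from y.backward() followed by reading x.grad.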

Slide 81

Slide 81 text

Exercise: fit the curve y = x³ + 3.2x² + 1.3x − 2 from 20 sample points with x in (−1.5, 1.5).

Slide 82

Slide 82 text

The training loop is the same three steps: forward pass (Forward), backward pass (Backward), then update the parameters via their grad (x.grad).

Slide 83

Slide 83 text

Training in PyTorch: MSE = mean squared error (the regression loss), SGD = stochastic gradient descent (the optimizer).
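In the lecture this is done with PyTorch's MSELoss and SGD; the same idea in dependency-free form, fitting a line to noiseless synthetic data. The model, data, lr, and epoch count below are all illustrative choices:

```python
import random

def train_sgd(data, lr=0.05, epochs=200, seed=0):
    """Fit y = a*x + b by stochastic gradient descent on the MSE loss."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:           # one sample at a time = "stochastic"
            err = (a * x + b) - y   # prediction error on this sample
            a -= lr * 2 * err * x   # gradient of (err^2) w.r.t. a
            b -= lr * 2 * err       # gradient of (err^2) w.r.t. b
    return a, b

# synthetic data from y = 3x - 1 with no noise
data = [(x / 10, 3 * (x / 10) - 1) for x in range(-10, 11)]
a, b = train_sgd(data)
print(a, b)  # close to the true values 3 and -1
```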

Slide 84

Slide 84 text

Practical notes on training (bullet text mostly not captured): learning rate (lr) settings, …, using a GPU.

Slide 85

Slide 85 text

micrograd:
• https://github.com/karpathy/micrograd : a tiny autograd engine in about 94 lines of Python
• https://github.com/karpathy/micrograd/blob/master/micrograd/engine.py
• by Andrej Karpathy, a founding member of OpenAI (later director of AI at Tesla; returned to OpenAI in 2023)
• video walkthrough: https://youtu.be/VMj-3S1tku0?si=91ZWzaA4ECidua4g

Slide 86

Slide 86 text

§ Even y = 3√x can itself be split into primitives: z = √x, then y = 3z.
§ PyTorch supplies the primitive math operations, each with its forward and backward rule: https://pytorch.org/docs/stable/torch.html#math-operations

Slide 87

Slide 87 text

Machine learning as programming from data (the Software 2.0 view), revisited: "J'aime la musique" → "I love music".

Slide 88

Slide 88 text

Q & A

Slide 89

Slide 89 text

Recap: 1. Machine learning 2. Loss functions and gradient descent 3. Automatic differentiation (backpropagation). Questions via Slido: https://app.sli.do/event/d13KtHT3VXxFbmLvxPSQjn