Slide 9
Stacking methods, one by one → MNIST
# 2-layer MLP: 784 → 128 (ReLU) → 10
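# Forward (trailing arguments appear to be the product dimensions m, k, n)
z1 = GPU.matmul(w1, x, 128, 784, 1)               # (128x784) . (784x1) -> 128x1
h = GPU.relu(GPU.add(z1, b1))                     # hidden activations, 128x1
o = GPU.add(GPU.matmul(w2, h, 10, 128, 1), b2)    # logits, 10x1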
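# Backward (grad_o = dLoss/do and the ReLU mask are not shown on this slide)
grad_w2 = GPU.matmul_nt(grad_o, h, 10, 1, 128)        # grad_o . h^T   -> 10x128
grad_h = GPU.matmul_tn(w2, grad_o, 128, 10, 1)        # w2^T . grad_o  -> 128x1
grad_h_pre = GPU.mul(grad_h, mask)                    # zero where the ReLU input was <= 0
grad_w1 = GPU.matmul_nt(grad_h_pre, x, 128, 1, 784)   # grad_h_pre . x^T -> 128x784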
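# SGD update (for a single sample the bias gradients equal grad_h_pre and grad_o)
w1 = GPU.sub(w1, GPU.scale(grad_w1, LR))
w2 = GPU.sub(w2, GPU.scale(grad_w2, LR))
b1 = GPU.sub(b1, GPU.scale(grad_h_pre, LR))
b2 = GPU.sub(b2, GPU.scale(grad_o, LR))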
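
For checking shapes and gradients on the CPU, the same training step can be sketched in NumPy. This is only an illustrative reference, not the slide's GPU code: the function names (`init_params`, `forward`, `backward`, `sgd_step`), the learning rate value, the weight initialization, and the softmax cross-entropy loss used to produce `grad_o` are all assumptions, since the slide does not show them.

```python
import numpy as np

LR = 0.1  # assumed value; the slide does not state the learning rate


def init_params(rng):
    # Assumed initialization: small Gaussian weights, zero biases.
    w1 = rng.normal(0.0, 0.01, size=(128, 784))
    b1 = np.zeros((128, 1))
    w2 = rng.normal(0.0, 0.01, size=(10, 128))
    b2 = np.zeros((10, 1))
    return w1, b1, w2, b2


def forward(params, x):
    w1, b1, w2, b2 = params
    pre = w1 @ x + b1           # z1 + b1, shape (128, 1)
    h = np.maximum(pre, 0.0)    # ReLU
    o = w2 @ h + b2             # logits, shape (10, 1)
    return pre, h, o


def softmax(o):
    e = np.exp(o - o.max())     # shift by the max for numerical stability
    return e / e.sum()


def loss(params, x, label):
    # Assumed loss: softmax cross-entropy for one sample.
    _, _, o = forward(params, x)
    return float(-np.log(softmax(o)[label, 0]))


def backward(params, x, label):
    w1, b1, w2, b2 = params
    pre, h, o = forward(params, x)
    grad_o = softmax(o)
    grad_o[label, 0] -= 1.0             # dLoss/do for softmax cross-entropy
    grad_w2 = grad_o @ h.T              # mirrors matmul_nt(grad_o, h, 10, 1, 128)
    grad_h = w2.T @ grad_o              # mirrors matmul_tn(w2, grad_o, 128, 10, 1)
    mask = (pre > 0).astype(pre.dtype)  # 1 where the ReLU input was positive
    grad_h_pre = grad_h * mask
    grad_w1 = grad_h_pre @ x.T          # shape (128, 784)
    # Order matches params: (w1, b1, w2, b2); bias grads hold for one sample.
    return grad_w1, grad_h_pre, grad_w2, grad_o


def sgd_step(params, x, label, lr=LR):
    grads = backward(params, x, label)
    return tuple(p - lr * g for p, g in zip(params, grads))
```

Calling `sgd_step(init_params(np.random.default_rng(0)), x, label)` with `x` of shape `(784, 1)` returns the updated `(w1, b1, w2, b2)`. Comparing these arrays against the GPU results is one way to validate each kernel in isolation.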