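loss
$E(\boldsymbol{\delta}) = \mathrm{CE}(f(\boldsymbol{x}), \boldsymbol{y})$, $\quad \nabla_{\boldsymbol{\delta}} E(\boldsymbol{\delta}) = \dfrac{\partial E}{\partial \boldsymbol{\delta}}$
2. Introduce two types of moving averages
$\boldsymbol{m}_t = \rho_1 \boldsymbol{m}_{t-1} + (1 - \rho_1)\, \nabla_{\boldsymbol{\delta}} E(\boldsymbol{\delta}_t)$
$\boldsymbol{v}_t = \rho_2 \boldsymbol{v}_{t-1} + (1 - \rho_2)\, \big(\nabla_{\boldsymbol{\delta}} E(\boldsymbol{\delta}_t)\big)^2$
3. Obtain perturbation update $\Delta\boldsymbol{\delta}_t$
$\hat{\boldsymbol{m}}_t = \dfrac{\boldsymbol{m}_t}{1 - \rho_1^t}$, $\quad \hat{\boldsymbol{v}}_t = \dfrac{\boldsymbol{v}_t}{1 - \rho_2^t}$
$\Delta\boldsymbol{\delta}_t = \eta\, \dfrac{\hat{\boldsymbol{m}}_t}{\sqrt{\hat{\boldsymbol{v}}_t} + \epsilon}$
4. Update perturbation using $\Delta\boldsymbol{\delta}_t$
$\boldsymbol{\delta}_{t+1} = \Pi_{\|\boldsymbol{\delta}\| \le \varepsilon}\!\left(\boldsymbol{\delta}_t + \dfrac{\Delta\boldsymbol{\delta}_t}{\|\Delta\boldsymbol{\delta}_t\|_F}\right)$
Algorithm of MAT: using moments to update the perturbation $\boldsymbol{\delta}$, unlike VILLA [Gan+, NeurIPS'20]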
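Steps 2 to 4 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `mat_perturbation_step`, the default hyperparameters, and the choice to project onto a Frobenius-norm ball of radius `radius` are assumptions made here. The gradient of the loss with respect to the perturbation (step 1) is assumed to be computed elsewhere and passed in as `grad`.

```python
import numpy as np

def mat_perturbation_step(delta, grad, m, v, t,
                          rho1=0.9, rho2=0.999, eta=1e-3,
                          eps=1e-8, radius=1.0):
    """One moment-based perturbation update (hypothetical helper).

    delta : current perturbation
    grad  : gradient of the loss w.r.t. delta at this step
    m, v  : first and second moment estimates from the previous step
    t     : 1-based step index, used for bias correction
    """
    # Step 2: moving averages of the gradient and the squared gradient.
    m = rho1 * m + (1.0 - rho1) * grad
    v = rho2 * v + (1.0 - rho2) * grad ** 2

    # Step 3: bias-corrected moments and the raw update.
    m_hat = m / (1.0 - rho1 ** t)
    v_hat = v / (1.0 - rho2 ** t)
    step = eta * m_hat / (np.sqrt(v_hat) + eps)

    # Step 4: add the norm-normalized update, then project onto the ball.
    step_norm = np.linalg.norm(step)
    if step_norm > 0:
        delta = delta + step / step_norm
    delta_norm = np.linalg.norm(delta)
    if delta_norm > radius:  # assumed projection: rescale onto the ball
        delta = delta * (radius / delta_norm)
    return delta, m, v


# Usage with a dummy gradient; in practice grad comes from backpropagation.
delta0 = np.zeros((2, 3))
moment0 = np.zeros((2, 3))
delta1, m1, v1 = mat_perturbation_step(
    delta0, np.ones((2, 3)), moment0, moment0, t=1, radius=0.5)
```

The moments `m` and `v` are returned so that the caller can carry them across successive steps, incrementing `t` each time.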