▪ Long episodes and generalization require compression
▪ Distributed representations for efficient memory access and robustness
▪ Implicit operation is often implied by the above
initialization and gradient descent schedule
1. For a known family define a writing loss Loss
2. Write into the memory by minimizing the writing loss (repeat K times)
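The writing step above can be illustrated with a minimal sketch. The slide does not specify the family or the loss, so everything here is an assumption chosen for illustration: the memory is a linear associative matrix, the writing loss is the squared reconstruction error over (key, value) pairs, and the names `read`, `writing_loss`, `write`, `init` and `schedule` are hypothetical. The initialization and the per-step learning rates are passed in as plain arguments; in a meta-learned setting they would be the quantities being optimized.

```python
def read(memory, key):
    """Read a value: the matrix-vector product memory @ key."""
    return [sum(m * k for m, k in zip(row, key)) for row in memory]


def writing_loss(memory, pairs):
    """Assumed writing loss: squared read-out error summed over all pairs."""
    total = 0.0
    for key, value in pairs:
        out = read(memory, key)
        total += sum((o - v) ** 2 for o, v in zip(out, value))
    return total


def write(pairs, init, schedule):
    """Write by gradient descent: one step per entry of `schedule` (K steps)."""
    memory = [row[:] for row in init]  # copy so the initialization is untouched
    for lr in schedule:
        # Gradient of the loss w.r.t. the memory: sum of 2 * (M k - v) k^T.
        grad = [[0.0] * len(row) for row in memory]
        for key, value in pairs:
            out = read(memory, key)
            for i, (o, v) in enumerate(zip(out, value)):
                for j, k in enumerate(key):
                    grad[i][j] += 2.0 * (o - v) * k
        memory = [[m - lr * g for m, g in zip(mrow, grow)]
                  for mrow, grow in zip(memory, grad)]
    return memory


# Two (key, value) pairs to store; keys are orthonormal for simplicity.
pairs = [([1.0, 0.0], [0.5, -1.0]), ([0.0, 1.0], [2.0, 0.25])]
init = [[0.0, 0.0], [0.0, 0.0]]   # assumed zero initialization
schedule = [0.25] * 10            # K = 10 steps, constant step size
memory = write(pairs, init, schedule)
```

With orthonormal keys and this step size, each gradient step halves the remaining error, so a handful of steps is enough for `read(memory, key)` to approximate the stored value. A different family, loss, or schedule would change that behaviour.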
