: One-hot vector set representing each token of the instruction
  We tokenize the instruction with WordPiece [Wu 16] and convert it into a token sequence, e.g., “Pick up the empty bottle on the shelf.” → [“Pick”, “up”, “the”, “empty”, “bottle”, “on”, “the”, “shelf”, “.”]
• 𝑥_pos : One-hot vector set representing the position of each token in the instruction
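The two inputs above (one one-hot vector per token, and one one-hot vector per token position) can be illustrated with a minimal sketch. It makes several assumptions: a simple word/punctuation splitter stands in for WordPiece (real WordPiece may split rare words into subword pieces such as "##ing"), the vocabulary is built from this single sentence, and the names `x_token`, `x_pos`, `max_len`, and the helper functions are hypothetical, not taken from the original model.

```python
import re


def simple_tokenize(text):
    # Hypothetical stand-in for WordPiece: split into words and punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def one_hot(index, size):
    # Vector of length `size` with a single 1 at `index`.
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_instruction(text, vocab, max_len):
    tokens = simple_tokenize(text)
    if len(tokens) > max_len:
        raise ValueError("instruction longer than max_len")
    # One one-hot vector per token, indexed by vocabulary ID.
    x_token = [one_hot(vocab[t], len(vocab)) for t in tokens]
    # One one-hot vector per token, indexed by position in the sequence.
    x_pos = [one_hot(i, max_len) for i in range(len(tokens))]
    return tokens, x_token, x_pos


instruction = "Pick up the empty bottle on the shelf."

# Toy vocabulary: assign IDs in order of first appearance (assumption).
vocab = {}
for tok in simple_tokenize(instruction):
    vocab.setdefault(tok, len(vocab))

tokens, x_token, x_pos = encode_instruction(instruction, vocab, max_len=16)
```

Both occurrences of “the” share the same token vector but receive different position vectors, which is why a separate position encoding is needed to preserve word order.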