Slide 1

Slide 1 text

Neural Scene Representation and Rendering*
Sricharan Chiruvolu
*This work was done by S. M. Ali Eslami, Danilo J. Rezende, Frederic Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil Rabinowitz, Helen King, Chloe Hillier, Matt Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis.

Slide 2

Slide 2 text

Background: Scene Understanding

Slide 3

Slide 3 text

Representing Scenes

Slide 4

Slide 4 text

Understanding Scenes
• Categorise the dominant object
• Classify the scene type
• Detect object bounding boxes
• Label pixels into categories

Slide 5

Slide 5 text

Understanding Scenes
Song et al. - SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite

Slide 6

Slide 6 text

Learning Generatively
B. M. Lake et al. - Human-level concept learning through probabilistic program induction

Slide 7

Slide 7 text

Generative Models
• Discriminative -> learn P(y | x).
• Generative -> learn P(x | y), i.e. model the features of each class y (e.g. malignant vs benign). Also learns the class prior P(y).
Slide credit: Andrew Ng, Stanford OpenClassroom
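To make the link between the two explicit (a standard identity, not part of the original slide), a generative model recovers the posterior used for classification from P(x | y) and the class prior P(y) via Bayes' rule:

```latex
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{\sum_{y'} P(x \mid y')\, P(y')}
\qquad\Rightarrow\qquad
\hat{y} = \arg\max_{y} \; P(x \mid y)\, P(y)
```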

Slide 8

Slide 8 text

Neural Scene Representations

Slide 9

Slide 9 text

Neural Scene Representation and Rendering
S. M. A. Eslami et al. - Neural Scene Representation and Rendering

Slide 10

Slide 10 text

Neural Scene Representation and Rendering (Video)

Slide 11

Slide 11 text

Generative Query Networks

Slide 12

Slide 12 text

Generative Query Network

Slide 13

Slide 13 text

Representation Network Architecture
• Pyramid: learnt fastest across the experiment datasets (more later).
• Pool: likely to exhibit view-invariant, factorised and compositional characteristics (used in the analysis); a sketch follows below.
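As a rough illustration of the "Pool" variant, here is a minimal PyTorch-style sketch: a small convolutional stack over each (image, viewpoint) pair, spatially average-pooled and summed over the context views to give (r). The layer sizes and the way the viewpoint is injected are my assumptions, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn

class PoolRepresentation(nn.Module):
    """Hedged sketch of a pooled GQN representation network (sizes are illustrative)."""
    def __init__(self, r_dim=256, v_dim=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3 + v_dim, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, r_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # spatial average -> view-invariant summary

    def forward(self, images, viewpoints):
        # images: (B, K, 3, H, W), viewpoints: (B, K, v_dim) for K context views
        B, K, C, H, W = images.shape
        v = viewpoints.view(B * K, -1, 1, 1).expand(-1, -1, H, W)   # broadcast viewpoint spatially
        x = torch.cat([images.view(B * K, C, H, W), v], dim=1)
        phi = self.pool(self.conv(x)).view(B, K, -1)                # one vector per context view
        return phi.sum(dim=1)                                       # aggregate by summation -> r
```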

Slide 14

Slide 14 text

Generation Network Architecture
• Given a query viewpoint (Vq) and a scene representation (r), it defines the distribution from which images can be sampled.
• One possible network applies a sequence of computational cores that take (Vq) and (r) as input.
• Each core is a skip-connection convolutional LSTM network (see the sketch below).
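A minimal sketch of one such core, assuming a convolutional LSTM whose input combines (Vq), (r) and a latent sample z, and whose output is accumulated into a canvas u via a transposed-convolution skip connection. The names, channel sizes and kernels are illustrative assumptions; a full generator would stack several cores (with untied weights, which the optimisation slide reports helps) and decode the image from the final canvas.

```python
import torch
import torch.nn as nn

class GenerationCore(nn.Module):
    """Hedged sketch of one generation core: a conv-LSTM step plus a skip connection into a canvas."""
    def __init__(self, v_dim=7, r_dim=256, z_dim=64, h_dim=128):
        super().__init__()
        in_ch = v_dim + r_dim + z_dim + h_dim
        self.gates = nn.Conv2d(in_ch, 4 * h_dim, kernel_size=5, padding=2)
        self.upsample = nn.ConvTranspose2d(h_dim, h_dim, kernel_size=4, stride=4)

    def forward(self, v_q, r, z, h, c, u):
        # v_q: (B, v_dim), r: (B, r_dim) or spatial, z/h/c: (B, *, H, W), u: canvas at 4x resolution
        H, W = h.shape[-2:]
        v = v_q.view(v_q.size(0), -1, 1, 1).expand(-1, -1, H, W)
        if r.dim() == 2:  # pooled representation: broadcast spatially
            r = r.view(r.size(0), -1, 1, 1).expand(-1, -1, H, W)
        i, f, o, g = torch.chunk(self.gates(torch.cat([v, r, z, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)   # LSTM cell update
        h = torch.sigmoid(o) * torch.tanh(c)
        u = u + self.upsample(h)                                      # skip connection into the shared canvas
        return h, c, u
```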

Slide 15

Slide 15 text

Optimisation [reconstruction likelihood + regularisation]
• Deeper models achieve higher likelihood; not sharing the weights of the cores improves performance.
[Figure: effect of (g)'s architecture on model performance]
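Written out, the objective is a conditional variational lower bound (the notation below paraphrases the paper's setup rather than quoting it): a reconstruction term for the query image x^q plus a KL "regularisation" term between the inference and prior distributions over the latents z:

```latex
\mathcal{L} \;=\;
\mathbb{E}_{q_\phi(z \mid x^{q}, v^{q}, r)}\!\big[\log p_\theta(x^{q} \mid v^{q}, r, z)\big]
\;-\;
\mathrm{KL}\!\big[\, q_\phi(z \mid x^{q}, v^{q}, r) \,\big\|\, \pi_\theta(z \mid v^{q}, r) \,\big]
```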

Slide 16

Slide 16 text

Experiments and Use-cases

Slide 17

Slide 17 text

Scene Algebra
• Suggests compositionality of shapes, colours and positions.
• Can perform arithmetic in (r); see the small illustration below.
• Samples are then drawn from (g), conditioned on the new (r).
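A toy illustration of the idea. The representation vectors here are random placeholders; in practice they would be produced by the representation network (f):

```python
import torch

# Hypothetical, pre-computed scene representations (random placeholders here);
# in the paper these would come from the representation network f.
r_red_sphere = torch.randn(256)
r_red_cube   = torch.randn(256)
r_blue_cube  = torch.randn(256)

# "Scene algebra": (red sphere) - (red cube) + (blue cube) ~ a blue sphere.
r_edited = r_red_sphere - r_red_cube + r_blue_cube

# Samples would then be drawn from the generation network g, conditioned on r_edited.
```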

Slide 18

Slide 18 text

Rooms with Multiple Objects
• (g) is capable of predicting images from arbitrary viewpoints.
• Implies (f) captures the identities, counts, positions and colours of the objects, the position of the light, and the colours of the walls and floor.

Slide 19

Slide 19 text

Control of a Robotic Arm
• A 9-joint robotic arm (Jaco arm) and a target object in a randomised room.
• RL task: the hand must reach the target and remain close to it. Reward: a decreasing function of the distance.

Slide 20

Slide 20 text

Control of a Robotic Arm
• Two networks:
  • Pre-train a GQN on scenes with the Jaco arm.
  • Use (f) to train an RL agent (a sketch follows below).
• (r) has much lower dimensionality than the input images.
• Substantially more robust and data-efficient policy learning: roughly 4 times fewer interactions than standard methods.
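A minimal sketch of that two-stage setup, assuming a pre-trained representation network such as the PoolRepresentation sketched earlier: the GQN encoder is frozen and its compact output (r), rather than raw pixels, feeds a small policy head trained by RL. Class names and sizes are my assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class PolicyOnGQN(nn.Module):
    """Hedged sketch: frozen GQN representation network + small RL policy head."""
    def __init__(self, repr_net: nn.Module, r_dim=256, n_actions=9):
        super().__init__()
        self.repr_net = repr_net.eval()            # pre-trained GQN encoder, kept frozen
        for p in self.repr_net.parameters():
            p.requires_grad_(False)
        self.policy = nn.Sequential(
            nn.Linear(r_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),             # e.g. one action value per joint
        )

    def forward(self, images, viewpoints):
        with torch.no_grad():
            r = self.repr_net(images, viewpoints)  # low-dimensional scene summary r
        return self.policy(r)
```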

Slide 21

Slide 21 text

Maze Environments (Partially Observed)
• 7x7 grid mazes generated with the OpenGL-based DeepMind Lab game engine.
• (g) is capable of predicting the top-down view from only a handful of first-person observations.

Slide 22

Slide 22 text

Shepard-Metzler Environment
• Randomly generated shapes (similar to 3D Tetris pieces).
• (g) could infer the shape even from a single image.
• Capable of re-rendering from any viewpoint with high (indistinguishable) accuracy.
• Under heavy occlusion, (g) generated one of the many shapes consistent with the observed portion of the image.

Slide 23

Slide 23 text

Shepard-Metzler Environment

Slide 24

Slide 24 text

Comparisons and Restrictions

Slide 25

Slide 25 text

SfM vs GQN
• SfM and other multiple-view geometry techniques -> point clouds, meshes, collections of pre-defined primitives… (see the 3D Scanning lecture).
• GQN learns a representational space that can express the presence of textures, parts, objects, lights and scenes at a suitably high level of abstraction.
• GQN enables task-specific fine-tuning of the representation itself.

Slide 26

Slide 26 text

GQN vs Other Learning-Based Methods
• Other neural approaches (auto-encoders, etc.) focus on regularities in colours and patches in image space, but fail to reach a high-level representation.
• GQN can account for uncertainty in scenes with heavy occlusion.
• GQN is not specific to a particular choice of generation architecture.

Slide 27

Slide 27 text

Current Restrictions
• The resulting representations are not directly interpretable (unlike explicit geometric representations).
• Experiments were run on synthetic environments only:
  • A need for controlled analysis.
  • Limited availability of suitable real-world datasets.
• Full scene understanding involves more than just the 3D scene.

Slide 28

Slide 28 text

Future Work
• GQN-based SLAM -> keep track of the agent's location
• Applications in AR/VR -> perspective rendering
• Autonomous driving -> predictive driving
• Modelling dynamic scenes
• …

Slide 29

Slide 29 text

Conclusion
• A single architecture to perceive, interpret and represent synthetic scenes without human labelling.
• Representations adapt to capture details of the environment.
• No problem-specific engineering of generators.
• Paves the way towards fully unsupervised scene understanding, planning and behaviour.

Slide 30

Slide 30 text

More…
• DeepMind Blog - June 2018
• Science - Vol. 360, Issue 6394, pp. 1204-1210
• Open Access Version
• Datasets used in the Experiments
• Related Video
• Detailed pseudo-code is provided as Supplementary Materials.
• DeepMind has filed a U.K. patent application (GP-201495-00-PCT) related to this work.

Slide 31

Slide 31 text

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

Extra

Slide 34

Slide 34 text

Generative Models
Credit: Goodfellow, 2016

Slide 35

Slide 35 text


Slide 36

Slide 36 text

Generation Network Architecture
A. Sequence of computational cores
B. Skip-connection pathways (LSTM based)