Work done by S. M. Ali Eslami, Danilo J. Rezende, Frederic Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil Rabinowitz, Helen King, Chloe Hillier, Matt Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis.
Models
• Discriminative -> learn P(y | x)
• Generative -> learn P(x | y), e.g. learn the features of each class when y = malignant or benign. Also learns the class prior P(y).
• The two are linked by Bayes' rule (below).
Slide credit: Andrew Ng, Stanford OpenClassroom
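This is how a generative model is turned into a classifier: combine the learned likelihood P(x | y) with the class prior P(y) via Bayes' rule:

```latex
P(y \mid x) \;=\; \frac{P(x \mid y)\,P(y)}{P(x)} \;\propto\; P(x \mid y)\,P(y)
```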
Network Architecture
• Given the query viewpoint (Vq) and the scene representation (r), the generator (g) defines the distribution from which images can be sampled.
• One possible network applies a sequence of computational cores that take (Vq) and (r) as input.
• Each core is a convolutional LSTM with a skip connection to the output canvas (see the sketch below).
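A minimal sketch of such a core, assuming PyTorch; `GeneratorCore`, `unroll` and all layer sizes are illustrative, not the paper's exact hyperparameters:

```python
import torch
import torch.nn as nn

class GeneratorCore(nn.Module):
    """One computational core: a convolutional LSTM cell. Its input x is the
    query viewpoint Vq (broadcast over space) concatenated channel-wise with
    the representation r (and, in the full model, a latent sample z)."""

    def __init__(self, in_ch: int, hid_ch: int = 128, k: int = 5):
        super().__init__()
        # A single convolution computes all four LSTM gates at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

def unroll(cores, upsample, x, h, c, u):
    """Apply the sequence of cores; each adds an upsampled contribution to a
    shared output canvas u (the skip connection), which is later decoded
    into the parameters of the image distribution."""
    for core in cores:
        h, c = core(x, h, c)
        u = u + upsample(h)
    return u
```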
Effect of (g) on Model Performance
• Training objective: reconstruction likelihood + regularisation (the variational bound shown below).
• Deeper models achieve higher likelihood, and not sharing weights across cores improves performance.
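In the paper's notation (x^q: target image at query viewpoint v^q, r: scene representation, z: latent variables), the objective has the standard conditional variational lower bound form, with the two terms above visible directly:

```latex
\log p_\theta\!\left(x^{q} \mid v^{q}, r\right)
\;\ge\;
\underbrace{\mathbb{E}_{q_\phi\left(z \mid x^{q},\, v^{q},\, r\right)}
  \left[\log p_\theta\!\left(x^{q} \mid z, v^{q}, r\right)\right]}_{\text{reconstruction likelihood}}
\;-\;
\underbrace{\mathrm{KL}\!\left(q_\phi\left(z \mid x^{q}, v^{q}, r\right)
  \,\middle\|\, p_\theta\!\left(z \mid v^{q}, r\right)\right)}_{\text{regularisation}}
```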
Algebra
• Suggests compositionality of shapes, colours and positions.
• Arithmetic can be performed directly in representation space (r).
• Samples are then drawn from (g), conditioned on the new (r) (see the sketch below).
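A sketch of this "scene algebra", assuming trained networks f (representation) and g (generator) with the interfaces shown; the handles and `g.sample` are hypothetical, not the paper's API:

```python
# Hypothetical handles to a trained GQN: f maps (image, viewpoint) -> r,
# g.sample(r, v_q) renders an image of the scene from query viewpoint v_q.
def scene_algebra(f, g, img_red_sphere, img_red_cube, img_blue_cube, v, v_q):
    """Swap an attribute by vector arithmetic in representation space:
    r(red sphere) - r(red cube) + r(blue cube) ~ r(blue sphere)."""
    r = f(img_red_sphere, v) - f(img_red_cube, v) + f(img_blue_cube, v)
    return g.sample(r, v_q)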
Scenes with Multiple Objects
• (g) is capable of predicting images from arbitrary viewpoints.
• This implies that (f) captures object identities, counts, positions and colours, as well as the position of the light and the colours of the walls and floor.
Control of a Robotic Arm
• A 9-joint robotic arm (Jaco) and a target object in a randomised room.
• RL task: the hand must reach the target and remain close to it. Reward: a decreasing function of the distance to the target (one possible form is sketched below).
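The paper only states that the reward decreases with distance; one illustrative choice satisfying that description:

```python
import math

def reward(dist: float, scale: float = 1.0) -> float:
    """Illustrative reward: largest when the hand is at the target,
    decaying smoothly with hand-target distance."""
    return math.exp(-dist / scale)
```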
Control of a Robotic Arm (cont.)
• Two networks, used in two stages:
  • Pre-train GQN on scenes containing the Jaco arm.
  • Use the frozen representation network (f) to train an RL agent (sketched below).
• (r) has much lower dimensionality than the input images.
• Policy learning is substantially more robust and data-efficient: ~4 times fewer interactions with the environment than standard methods.
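A sketch of the second stage, assuming PyTorch; the pretrained representation network f is frozen and only a small policy head is trained by the RL algorithm (`PolicyHead` and all sizes are illustrative):

```python
import torch
import torch.nn as nn

class PolicyHead(nn.Module):
    """Small trainable head mapping the compact representation r to actions."""
    def __init__(self, r_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(r_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, r):
        return self.net(r)

def make_agent(f: nn.Module, r_dim: int, n_actions: int):
    for p in f.parameters():
        p.requires_grad_(False)      # (f) stays fixed during RL
    policy = PolicyHead(r_dim, n_actions)
    # Only the policy head's (far fewer) parameters are optimised.
    optimiser = torch.optim.Adam(policy.parameters(), lr=1e-4)
    return policy, optimiser
```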
Maze Environments (Partially Observed)
• 7x7 grid mazes generated with the OpenGL-based DeepMind Lab game engine.
• (g) is capable of predicting the top-down view from only a handful of first-person observations (aggregated as in the sketch below).
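The scene representation is built by summing the per-observation codes, which makes (r) invariant to the order and number of first-person observations; a two-line sketch, with f the (assumed pretrained) representation network:

```python
def aggregate(f, observations):
    """Element-wise sum of codes f(image, viewpoint) over all observations."""
    return sum(f(img, v) for img, v in observations)
```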
Shepard-Metzler Objects Environment
• Randomly generated shapes (similar to 3D Tetris pieces).
• (g) could infer the object's shape even from a single image.
• Capable of re-rendering from any viewpoint with high (near-indistinguishable) accuracy.
• Under heavy occlusion, (g) generated one of the many shapes consistent with the observed portion of the image (see the sampling sketch below).
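Because (g) is generative, this uncertainty can be probed simply by drawing several samples; each render differs in the unobserved part of the shape (`g.sample` is the same hypothetical interface as in the algebra sketch above):

```python
def plausible_completions(g, r, v_q, n: int = 5):
    """Draw n renders from the query viewpoint; under heavy occlusion they
    show different shapes, each consistent with the observed portion."""
    return [g.sample(r, v_q) for _ in range(n)]
```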
SfM vs GQN
• Structure-from-motion and other multiple-view-geometry techniques produce point clouds, meshes, collections of pre-defined primitives, ... (3D Scanning Lecture)
• GQN learns a representational space that can express the presence of textures, parts, objects, lights and scenes at a suitably high level of abstraction.
• GQN enables task-specific fine-tuning of the representation itself.
GQN vs Other Learning-based Methods
• Other neural approaches (autoencoders, etc.) focus on regularities of colours and patches in image space but fail to achieve high-level representations.
• GQN can account for uncertainty in scenes with heavy occlusion.
• GQN is not tied to a particular choice of generation architecture.
Restrictions
• The resulting representations are not directly interpretable.
• Experiments were limited to synthetic environments, because of:
  • the need for controlled analysis;
  • the limited availability of suitable real-world datasets.
• Total scene understanding involves more than just 3D geometry.
Conclusions
• A single architecture to perceive, interpret and represent synthetic scenes without human labelling.
• Representations adapt to capture the details of the environment.
• No problem-specific engineering of generators.
• Paves the way towards fully unsupervised scene understanding, planning and behaviour.
Resources
• DeepMind Blog, June 2018
• Science, Vol. 360, Issue 6394, pp. 1204-1210
• Open Access Version
• Datasets used in the experiments
• Related video
• Detailed pseudo-code is provided in the Supplementary Materials.
• DeepMind has filed a U.K. patent application (GP-201495-00-PCT) related to this work.