Generative Query Networks

Sricharan
December 06, 2018

TU Munich, 3D Vision WS18/19 Master Seminar Presentation

Transcript

  1. Neural Scene Representation and Rendering*

    Sricharan Chiruvolu

    *This work was done by S. M. Ali Eslami, Danilo J. Rezende, Frederic Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil Rabinowitz, Helen King, Chloe Hillier, Matt Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis.
  2. Understanding Scenes

    • Categorise the dominant object
    • Classify the scene type
    • Detect object bounding boxes
    • Label pixels into categories
  3. Understanding Scenes

    Song et al. - SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
  4. Learning Generatively

    B. M. Lake et al. - Human-level concept learning through probabilistic program induction
  5. Generative Models

    • Discriminative -> Learn P(y | x)
    • Generative -> Learn P(x | y), e.g. learn what the features x look like when y = malignant and when y = benign. Also learns the class prior P(y).

    Slide credit: Andrew Ng, Stanford OpenClassroom
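To make the distinction concrete, here is a minimal Python sketch of how a generative model can still be used for classification: it combines the learned P(x | y) with the class prior P(y) through Bayes' rule to recover P(y | x). The `posterior` helper and all probability values are made up for illustration and do not come from the slides.

```python
def posterior(likelihoods, priors):
    """Return P(y | x) for every class y using Bayes' rule.

    likelihoods: dict mapping class -> P(x | y) for one observed x
    priors:      dict mapping class -> P(y)
    """
    # Joint probability P(x, y) = P(x | y) * P(y)
    joint = {y: likelihoods[y] * priors[y] for y in priors}
    # Evidence P(x) is the sum of the joint over all classes
    evidence = sum(joint.values())
    return {y: p / evidence for y, p in joint.items()}


# Illustrative numbers only (not from the slides)
likelihoods = {"malignant": 0.6, "benign": 0.1}
priors = {"malignant": 0.2, "benign": 0.8}
post = posterior(likelihoods, priors)
```

A discriminative model would skip the two intermediate quantities and fit P(y | x) directly.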
  6. Neural Scene Representation and Rendering

    S. M. A. Eslami et al. - Neural Scene Representation and Rendering
  7. Representation Network Architecture

    • Pyramid: learnt fastest across experiment datasets (more later)
    • Pool: likely exhibits view-invariant, factorised and compositional characteristics (used in analysis)
  8. Generation Network Architecture

    • Given a query viewpoint (Vq) and a representation (r), the generation network (g) defines the distribution from which images can be sampled.
    • One possible network applies a sequence of computational cores that take (Vq) and (r) as input.
    • Each core is a skip-connection convolutional LSTM network.
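The control flow described on this slide can be sketched in a few lines of plain Python. This is only a structural illustration under strong assumptions: `make_core` and `generate` are hypothetical names, each "core" is a trivial arithmetic stand-in for a convolutional LSTM, and the sketch is deterministic, whereas the actual generator defines a distribution to sample from. What it shows is that every core receives both (Vq) and (r), and that each core writes into a shared canvas through a skip pathway.

```python
def make_core(gain):
    """Build a toy core: a stand-in for one skip-connection conv LSTM core."""
    def core(hidden, v_q, r):
        # Every core is conditioned on both the query viewpoint and the representation.
        conditioning = v_q + r  # list concatenation
        drive = sum(conditioning) / len(conditioning)
        return [gain * (h + drive) for h in hidden]
    return core


def generate(cores, v_q, r, size=4):
    """Run the cores in sequence and accumulate their outputs on a canvas."""
    hidden = [0.0] * size
    canvas = [0.0] * size
    for core in cores:
        hidden = core(hidden, v_q, r)
        # Skip pathway: each core adds its state directly to the shared canvas.
        canvas = [c + h for c, h in zip(canvas, hidden)]
    return canvas


# Three separate core objects, i.e. no object is reused between steps
cores = [make_core(0.5) for _ in range(3)]
canvas = generate(cores, v_q=[0.0, 1.0], r=[2.0, 1.0])
```

Adding more entries to `cores` corresponds to a deeper generator in this toy setting.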
  9. Optimisation

    • Objective: reconstruction likelihood + regularisation
    • Deeper models achieve higher likelihood, and not sharing weights between cores improves performance.
    • Figure: effect of (g) on model performance
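The two terms of the objective can be written in the usual variational form. The following is a sketch in assumed notation, since the slide only names the terms: z is a latent variable, x^q is the image at the query viewpoint v^q, π is a prior conditioned on (v^q, r), and q is an approximate posterior that additionally sees x^q.

```latex
\mathcal{L}
  = \underbrace{-\,\mathbb{E}_{z \sim q}\!\left[\ln g\!\left(x^{q} \mid z, v^{q}, r\right)\right]}_{\text{reconstruction likelihood}}
  + \underbrace{\mathrm{KL}\!\left(q\!\left(z \mid x^{q}, v^{q}, r\right) \,\middle\|\, \pi\!\left(z \mid v^{q}, r\right)\right)}_{\text{regularisation}}
```

Minimising the first term rewards accurate reconstructions of the query image, while the KL term keeps the posterior close to what the prior can predict from (v^q, r) alone.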
  10. Scene Algebra

    • Suggests compositionality of shapes, colours and positions.
    • Can perform arithmetic in (r).
    • Samples are then drawn from (g), conditioned on the new (r).
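As a toy illustration of what "arithmetic in (r)" means, the sketch below adds and subtracts hand-written vectors. The four-slot layout is a hypothetical, perfectly factorised encoding chosen for readability; a learned (r) is a much larger vector with no such hand-designed layout.

```python
def add(a, b):
    """Element-wise sum of two representation vectors."""
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two representation vectors."""
    return [x - y for x, y in zip(a, b)]


# Hypothetical factorised layout: [is_sphere, is_triangle, is_red, is_blue]
r_blue_sphere = [1.0, 0.0, 0.0, 1.0]
r_red_sphere = [1.0, 0.0, 1.0, 0.0]
r_red_triangle = [0.0, 1.0, 1.0, 0.0]

# blue sphere - red sphere + red triangle
r_new = add(sub(r_blue_sphere, r_red_sphere), r_red_triangle)
# In this toy layout r_new is [0.0, 1.0, 0.0, 1.0], i.e. a blue triangle
```

In the setting of the slide, the new vector would then be handed to (g) together with a query viewpoint to draw image samples.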
  11. Rooms with Multiple Objects

    • (g) is capable of predicting images from arbitrary viewpoints.
    • Implies (f) captures identities, counts, positions, colours, position of light and colours of walls and floor.
  12. Control of Robotic Arm

    • 9-joint robotic arm (Jaco arm) and a target object in a randomised room.
    • RL task: the hand must reach the target and remain close to it.
    • Reward: a decreasing function of the distance between hand and target.
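The slide does not give the exact form of the reward, only that it decreases with distance. A minimal sketch, assuming an exponential decay and 3D positions for the hand and the target (both are illustrative choices, and `reward` is a hypothetical helper):

```python
import math


def reward(hand, target):
    """Toy reward: 1.0 when the hand is on the target, decaying towards 0 with distance."""
    distance = math.dist(hand, target)  # Euclidean distance (Python 3.8+)
    return math.exp(-distance)


target = (0.5, 0.2, 0.1)
near = reward((0.5, 0.2, 0.2), target)
far = reward((0.0, 0.0, 1.0), target)
```

Any other monotonically decreasing function of the distance would fit the slide's description equally well.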
  13. Control of Robotic Arm

    • Two networks:
      • Pre-train GQN on scenes with the Jaco arm
      • Use (f) to train an RL agent
    • (r) has much lower dimensionality than the input images
    • Substantially more robust and data-efficient policy learning
    • ~4 times fewer interactions than standard methods
  14. Maze Environments (Partially Observed)

    • 7x7 grid mazes generated with the OpenGL-based DeepMind Lab game engine.
    • (g) is capable of predicting the top-down view from only a handful of first-person observations.
  15. Shepard-Metzler Environment

    • Randomly generated shapes (similar to 3D Tetris pieces).
    • (g) could infer the shape even from a single image.
    • Capable of re-rendering from any viewpoint with high accuracy (nearly indistinguishable from the ground truth).
    • Under high occlusion, (g) generated one of the many shapes that are consistent with the observed portion of the image.
  16. SFM vs GQN

    • SFM and other multiple view geometry techniques -> point clouds, mesh clouds, collections of pre-defined primitives… (3D Scanning Lecture)
    • GQN learns a representational space that can express the presence of textures, parts, objects, lights and scenes at a suitably high level of abstraction.
    • GQN enables task-specific fine-tuning of the representation itself.
  17. GQN vs Other Learning-based Methods

    • Other neural approaches (auto-encoders etc.) focus on regularities in colours and patches in the image space, but fail to achieve a high-level representation.
    • GQN can account for uncertainty in scenes with high occlusions.
    • GQN is not specific to a particular choice of generation architecture.
  18. Current Restrictions

    • Resulting representations are no longer interpretable.
    • Experimented on synthetic environments:
      • A need for controlled analysis
      • Limited availability of suitable real datasets
    • Total scene understanding involves more than just the 3D scene.
  19. Future Work

    • GQN-based SLAM -> Keep track of the agent’s location
    • Applications in AR/VR -> Perspective rendering
    • Autonomous driving -> Predictive driving
    • Modelling dynamic scenes
    • …
  20. Conclusion

    • A single architecture to perceive, interpret and represent synthetic scenes without human labelling.
    • Representations adapt to capture details of the environment.
    • No problem-specific engineering of generators.
    • Paves the way towards fully unsupervised scene understanding, planning and behaviour.
  21. More…

    • DeepMind Blog - June, 2018
    • Science - Vol. 360, Issue 6394, pp. 1204-1210
    • Open Access Version
    • Datasets used in Experiments
    • Related Video
    • Detailed pseudo-code is provided as Supplementary Materials.
    • DeepMind has filed a U.K. patent application (GP-201495-00-PCT) related to this work.
  22. Generation Network Architecture

    A. Sequence of computational cores
    B. Skip-connection pathways (LSTM based)