Generative Query Networks

Sricharan
December 06, 2018

TU Munich, 3D Vision WS18/19 Master Seminar Presentation

Transcript

  1. Neural Scene Representation and Rendering* Sricharan Chiruvolu

    *This work was done by S. M. Ali Eslami, Danilo J. Rezende, Frederic Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil Rabinowitz, Helen King, Chloe Hillier, Matt Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis.
  2. Background: Scene Understanding

  3. Representing Scenes
  4. Understanding Scenes

    • Categorise the dominant object
    • Classify the scene type
    • Detect object bounding boxes
    • Label pixels into categories
  5. Understanding Scenes

    Song et al. - SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
  6. Learning Generatively

    B. M. Lake et al. - Human-level concept learning through probabilistic program induction
  7. Generative Models

    • Discriminative -> learn P(y | x) directly.
    • Generative -> learn P(x | y), e.g. what the features x look like when y = malignant and when y = benign. Also learns the class prior P(y).
    Slide credit: Andrew Ng, Stanford OpenClassroom
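
The two model families on this slide are linked by Bayes' rule: once a generative model has learned P(x | y) and the class prior P(y), it can recover the posterior that a discriminative model learns directly. Written out for the two classes in the slide's example:

```latex
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{\sum_{y'} P(x \mid y')\, P(y')},
\qquad y, y' \in \{\text{malignant}, \text{benign}\}
```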
  8. Neural Scene Representations

  9. Neural Scene Representation and Rendering

    S. M. A. Eslami et al. - Neural Scene Representation and Rendering
  10. Neural Scene Representation and Rendering (Video)
  11. Generative Query Networks

  12. Generative Query Network
  13. Representation Network Architecture

    • Pyramid: learnt fastest across the experiment datasets (more later).
    • Pool: likely to exhibit view-invariant, factorised and compositional characteristics (used in analysis).
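
As a rough illustration of how a representation network turns several observations into one scene representation (r), the sketch below encodes each (image, viewpoint) pair and sums the results element-wise, which is the aggregation described in the GQN paper. The `toy_encoder` function is a hypothetical stand-in and is not the Pyramid or Pool architecture; the observation values are made up for the example.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Observation = Tuple[Sequence[float], Sequence[float]]  # (image pixels, viewpoint)


def toy_encoder(image: Sequence[float], viewpoint: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the representation network (f).

    A real GQN uses a convolutional network here; this toy version only
    scales the viewpoint by the mean pixel value.
    """
    mean_pixel = sum(image) / len(image)
    return [mean_pixel * v for v in viewpoint]


def aggregate(observations: Sequence[Observation]) -> Vector:
    """Sum per-observation encodings element-wise to obtain (r)."""
    per_view = [toy_encoder(image, viewpoint) for image, viewpoint in observations]
    return [sum(components) for components in zip(*per_view)]


# Two made-up observations of the same scene.
observations = [
    ([0.0, 1.0], [1.0, 0.0, 2.0]),
    ([1.0, 1.0], [0.0, 1.0, 1.0]),
]
r = aggregate(observations)
r_reversed = aggregate(list(reversed(observations)))
```

Because the aggregation is a sum, the order in which observations arrive does not change (r), and any number of observations can be folded into a representation of fixed size.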
  14. Generation Network Architecture

    • Given a query viewpoint (Vq) and a representation (r), the generation network (g) defines the distribution from which images can be sampled.
    • One possible network applies a sequence of computational cores that take (Vq) and (r) as input.
    • Each core is a skip-connection convolutional LSTM network.
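
The control flow described on this slide can be sketched structurally as follows: a sequence of cores each receives (Vq) and (r), and a skip pathway accumulates every core's output. This is only a sketch under heavy assumptions: `toy_core` is a hypothetical scalar update, not a convolutional LSTM, the per-core weights are arbitrary, and the latent sampling that makes the real generator a distribution is left out.

```python
import math
from typing import List, Sequence


def toy_core(hidden: List[float], v_q: Sequence[float], r: Sequence[float],
             weight: float) -> List[float]:
    """Hypothetical stand-in for one skip-connection convolutional LSTM core."""
    context = sum(v_q) + sum(r)  # every core sees the query viewpoint and r
    return [math.tanh(weight * (h + context)) for h in hidden]


def generate(v_q: Sequence[float], r: Sequence[float],
             core_weights: Sequence[float], size: int = 4) -> List[float]:
    """Apply the cores in sequence and accumulate their outputs on a canvas."""
    hidden = [0.0] * size
    canvas = [0.0] * size  # skip pathway: each core adds its output here
    for weight in core_weights:  # one weight per core, i.e. no weight sharing
        hidden = toy_core(hidden, v_q, r, weight)
        canvas = [c + h for c, h in zip(canvas, hidden)]
    return canvas


# Made-up query viewpoint and representation, three cores.
canvas = generate(v_q=[0.5, -0.5], r=[1.0], core_weights=[1.0, 0.5, 0.25])
```

Giving each core its own weight mirrors the unshared-weights configuration; passing the same weight for every entry of `core_weights` would correspond to sharing weights across cores.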
  15. Optimisation

    • Objective: reconstruction likelihood + regularisation.
    • Deeper models achieve higher likelihood; not sharing weights between cores improves performance.
    Effect of (g) on model performance
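
The "reconstruction likelihood + regularisation" objective can be written in the usual variational form. The notation below is an assumption for illustration and does not appear on the slide: x^q is the image at the query viewpoint v^q (written Vq on the slides), z is a latent variable, q is an approximate posterior and π is a prior conditioned on the query viewpoint and (r).

```latex
\mathcal{F} =
\underbrace{\mathbb{E}_{z \sim q(z \mid x^q, v^q, r)}
  \big[ -\log g(x^q \mid z, v^q, r) \big]}_{\text{reconstruction likelihood}}
+ \underbrace{\mathrm{KL}\big[\, q(z \mid x^q, v^q, r) \,\big\|\, \pi(z \mid v^q, r) \,\big]}_{\text{regularisation}}
```

Minimising the first term pushes generated images towards the observed query image; the second term keeps the posterior close to the prior.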
  16. Experiments and Use-cases

  17. Scene Algebra

    • Suggests compositionality of shapes, colours and positions.
    • Arithmetic can be performed on (r).
    • Samples are then drawn from (g), conditioned on the new (r).
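
A minimal sketch of the arithmetic on this slide, assuming an idealised, perfectly factorised code in which each dimension stands for one colour or one shape. Real GQN representations are learned and are not one-hot; the vectors and the `combine` helper below are hypothetical and chosen only to make the algebra visible.

```python
from typing import List, Sequence


def combine(r_a: Sequence[float], r_b: Sequence[float],
            r_c: Sequence[float]) -> List[float]:
    """Compute r_a - r_b + r_c element-wise."""
    return [a - b + c for a, b, c in zip(r_a, r_b, r_c)]


# Hypothetical layout: [colour=blue, colour=red, shape=sphere, shape=triangle]
blue_sphere = [1.0, 0.0, 1.0, 0.0]
red_sphere = [0.0, 1.0, 1.0, 0.0]
red_triangle = [0.0, 1.0, 0.0, 1.0]
blue_triangle = [1.0, 0.0, 0.0, 1.0]

# Remove "red sphere", add "red triangle": the colour stays blue, the shape changes.
r_new = combine(blue_sphere, red_sphere, red_triangle)
```

In this toy code `r_new` equals `blue_triangle`; with a learned representation the new (r) would instead be passed to (g) to sample images of the combined scene.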
  18. Rooms with multiple objects

    • (g) is capable of predicting images from arbitrary viewpoints.
    • Implies (f) captures identities, counts, positions, colours, position of light and colours of walls and floor.
  19. Control of Robotic Arm

    • 9-joint robotic arm (Jaco arm) and a target object in a randomised room.
    • RL task: the hand must reach the target and remain close to it.
    • Reward: a decreasing function of the distance between hand and target.
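
The slide only states that the reward decreases with distance. One possible shape, shown purely as an assumption, is an exponential decay of the Euclidean distance between hand and target; the function name, the `scale` parameter and the coordinates are hypothetical.

```python
import math
from typing import Sequence


def reach_reward(hand: Sequence[float], target: Sequence[float],
                 scale: float = 1.0) -> float:
    """Assumed reward shape: 1.0 at the target, decaying towards 0 with distance."""
    distance = math.dist(hand, target)  # Euclidean distance (Python 3.8+)
    return math.exp(-distance / scale)


target = (0.0, 0.0, 0.0)
reward_near = reach_reward((0.1, 0.0, 0.0), target)
reward_far = reach_reward((1.0, 1.0, 0.0), target)
```

Because the reward is highest at zero distance and paid at every step, an agent maximises return by reaching the target and then staying close to it.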
  20. Control of Robotic Arm

    • Two networks:
      • Pre-train GQN on scenes with the Jaco arm
      • Use (f) to train an RL agent
    • (r) has much lower dimensionality than the input images
    • Substantially more robust and data-efficient policy learning
    • ~4 times fewer interactions than standard methods
  21. Maze Environments (Partially observed)

    • 7x7 grid mazes generated with the OpenGL-based DeepMind Lab game engine.
    • (g) is capable of predicting the top-down view from only a handful of first-person observations.
  22. Shepard-Metzler Environment

    • Randomly generated shapes (similar to 3D Tetris pieces).
    • (g) could infer the shape even from a single image.
    • Capable of re-rendering from any viewpoint with high (indistinguishable) levels of accuracy.
    • Under high occlusion, (g) generated one of the many shapes that are consistent with the observed portion of the image.
  23. Shepard-Metzler Environment
  24. Comparisons and Restrictions

  25. SFM vs GQN

    • SFM and other multiple view geometry techniques -> point clouds, mesh clouds, collections of pre-defined primitives… (3D Scanning Lecture)
    • GQN learns a representational space; it can express the presence of textures, parts, objects, lights and scenes at a suitably high level of abstraction.
    • GQN enables task-specific fine-tuning of the representation itself.
  26. GQN vs Other Learning-based Methods

    • Other neural approaches (auto-encoders etc.) focus on regularities in colours and patches in image space, but fail to achieve a high-level representation.
    • GQN can account for uncertainty in scenes with high occlusion.
    • GQN is not specific to a particular choice of generation architecture.
  27. Current Restrictions

    • Resulting representations are no longer interpretable.
    • Experimented on synthetic environments:
      • A need for controlled analysis
      • Limited availability of suitable real datasets
    • Total scene understanding involves more than just the 3D scene.
  28. Future Work

    • GQN-based SLAM -> Keep track of agent’s location
    • Applications in AR/VR -> Perspective rendering
    • Autonomous driving -> Predictive driving
    • Modelling dynamic scenes
    • …
  29. Conclusion

    • A single architecture to perceive, interpret and represent synthetic scenes without human labelling.
    • Representations adapt to capture details of the environment.
    • No problem-specific engineering of generators.
    • Paves the way towards fully unsupervised scene understanding, planning and behaviour.
  30. More…

    • DeepMind Blog - June, 2018
    • Science - Vol. 360, Issue 6394, pp. 1204-1210
    • Open Access Version
    • Datasets used in Experiments
    • Related Video
    • Detailed pseudo-code is provided as Supplementary Materials.
    • DeepMind has filed a U.K. patent application (GP-201495-00-PCT) related to this work.
  31. Fin. (s.chiruvolu@tum.de)

  33. Extra

  34. Generative Models

    Credit: Goodfellow, 2016
  36. Generation Network Architecture

    A. Sequence of computational cores
    B. Skip-connection pathways (LSTM based)