
Generative Query Networks

TU Munich, 3D Vision WS18/19 Master Seminar Presentation

Sricharan
December 06, 2018

Transcript

  1. Neural Scene Representation and Rendering*
    Sricharan Chiruvolu
    *This work was done by S. M. Ali Eslami, Danilo J. Rezende, Frederic Besse, Fabio Viola, Ari S. Morcos, Marta Garnelo, Avraham Ruderman, Andrei A. Rusu, Ivo Danihelka, Karol Gregor, David P. Reichert, Lars Buesing, Theophane Weber, Oriol Vinyals, Dan Rosenbaum, Neil Rabinowitz, Helen King, Chloe Hillier, Matt Botvinick, Daan Wierstra, Koray Kavukcuoglu and Demis Hassabis.

  2. Background
    Scene Understanding

  3. Representing Scenes

  4. Understanding Scenes
    • Categorise the dominant object
    • Classify the scene type
    • Detect object bounding boxes
    • Label pixels into categories

  5. Understanding Scenes
    Song et al. - SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite

  6. Learning Generatively
    BM Lake et al. - Human-level concept learning through probabilistic program induction

  7. Generative Models
    • Discriminative -> learn P(y | x).
    • Generative -> learn P(x | y), e.g. what the features look like when y = malignant or benign. Also learns the class prior P(y) (see the identity below).
    Slide credit: Andrew Ng, Stanford OpenClassroom
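
    For reference (a standard identity, not from the slide), the two quantities are linked by Bayes' rule, which is how a generative model is turned into a classifier:

    ```latex
    % Bayes' rule: a generative model P(x | y) together with the class prior P(y)
    % yields the posterior P(y | x) that a discriminative model would learn directly.
    P(y \mid x) \;=\; \frac{P(x \mid y)\, P(y)}{\sum_{y'} P(x \mid y')\, P(y')}
    ```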

  8. Neural Scene Representations

  9. Neural Scene Representation and Rendering
    SMA Eslami et al. - Neural Scene Representation and Rendering

  10. Neural Scene Representation and Rendering (Video)

  11. Generative Query Networks

  12. Generative Query Network

  13. Representation Network Architecture
    • Pyramid: learnt fastest across the experiment datasets (more later).
    • Pool: likely exhibits view-invariant, factorised and compositional characteristics (used in the analysis).
    A rough sketch of how per-view representations are aggregated follows below.
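
    As an illustration of the representation network's role (a minimal sketch, not the paper's architecture: the encoder, weight dict and tensor shapes below are assumptions), each observation and its viewpoint are encoded by a shared network and the per-view codes are summed into a single scene representation r, which keeps r independent of the number and order of views:

    ```python
    # Minimal sketch of GQN-style scene aggregation (assumed encoder and shapes).
    import numpy as np

    def encode_view(image, viewpoint, weights):
        """Stand-in for the representation network applied to one (image, viewpoint) pair.

        image:     (H, W, 3) array, one observation of the scene.
        viewpoint: (7,) array encoding camera position and orientation (assumed format).
        weights:   hypothetical encoder parameters: {"img": (3, D), "view": (7, D)}.
        """
        feats = np.tanh(image.mean(axis=(0, 1)) @ weights["img"])  # crude image summary
        return feats + viewpoint @ weights["view"]                  # condition on the camera pose

    def scene_representation(images, viewpoints, weights):
        """Sum the per-view codes into one scene representation r (order-invariant)."""
        return sum(encode_view(x, v, weights) for x, v in zip(images, viewpoints))
    ```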

  14. Generation Network Architecture
    • Given a query viewpoint (Vq) and the scene representation (r), it defines the distribution from which images can be sampled.
    • One possible network applies a sequence of computational cores that take (Vq) and (r) as input.
    • Each core is a convolutional LSTM with skip-connection pathways (a rough sketch of the core loop follows).
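
    A schematic of the unrolled generator (a sketch only: core_step, decode_canvas, the state size and the number of cores are assumptions, and the real cores are convolutional LSTMs rather than dense updates):

    ```python
    # Schematic of the generator's sequence of computational cores with a
    # skip-connection "canvas" that accumulates their outputs (assumed helpers and shapes).
    import numpy as np

    def generate_image(r, v_q, core_step, decode_canvas, num_cores=12, state_dim=128, rng=None):
        """Run num_cores cores conditioned on (r, v_q); decode the accumulated canvas into an image."""
        rng = rng or np.random.default_rng()
        h = np.zeros(state_dim)                  # recurrent state shared across the cores
        u = np.zeros(state_dim)                  # canvas accumulated via skip connections
        for _ in range(num_cores):
            z = rng.standard_normal(state_dim)   # latent sample for this step (prior at generation time)
            h, du = core_step(h, r, v_q, z)      # one core: update the state, emit a canvas increment
            u = u + du                           # skip connection: add the increment to the canvas
        return decode_canvas(u)                  # map the canvas to the predicted/sampled image
    ```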

  15. Optimisation
    Objective: [reconstruction likelihood + regularisation] (written out below).
    Deeper models achieve higher likelihood; not sharing the weights of the cores improves performance.
    (Figure: effect of (g) on model performance.)
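
    In the standard variational form (the symbols below are the usual ELBO notation, not taken from the slide), the model maximises the expected reconstruction likelihood of the query image minus a KL term that regularises the inference distribution towards the conditional prior over the latents:

    ```latex
    % Variational objective of the form "reconstruction likelihood + regularisation".
    \mathcal{F} \;=\;
    \mathbb{E}_{q(z \mid x^{q}, v^{q}, r)}\!\left[ \log p\!\left(x^{q} \mid v^{q}, r, z\right) \right]
    \;-\; \mathrm{KL}\!\left[\, q(z \mid x^{q}, v^{q}, r) \;\big\|\; \pi(z \mid v^{q}, r) \,\right]
    ```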

  16. Experiments and Use-cases

  17. Scene Algebra
    • Suggests compositionality of shapes, colours and positions.
    • Arithmetic can be performed directly in representation space (r).
    • Samples are then drawn from (g), conditioned on the new (r); a toy illustration follows.
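
    A toy illustration of what such representation arithmetic looks like (the scene names, encoder and renderer below are hypothetical; the paper performs analogous arithmetic on learned GQN representations):

    ```python
    # Toy "scene algebra": swap an attribute by vector arithmetic in representation space,
    # then re-render from (g). All scenes and helper functions here are hypothetical.
    import numpy as np

    def recolour_by_algebra(encode, generate, views):
        r_red_sphere = encode(views["red sphere"])   # scene containing a red sphere
        r_red_cube   = encode(views["red cube"])     # same colour, different shape
        r_blue_cube  = encode(views["blue cube"])    # same shape, different colour

        # "red sphere" - "red cube" + "blue cube"  ≈  "blue sphere"
        r_new = r_red_sphere - r_red_cube + r_blue_cube
        query_viewpoint = np.array([0.0, 0.0, 2.0, 1.0, 0.0, 1.0, 0.0])  # assumed pose encoding
        return generate(r_new, query_viewpoint)      # sample an image of the implied scene
    ```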

  18. Rooms with multiple objects
    • (g) is capable of predicting images from arbitrary viewpoints.
    • Implies (f) captures the identities, counts, positions and colours of objects, the position of the light, and the colours of the walls and floor.

  19. Control of Robotic Arm
    • A 9-joint robotic arm (Jaco) and a target object in a randomised room.
    • RL task: the hand must reach the target and remain close to it. Reward: a decreasing function of the distance (illustrated below).
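
    One hypothetical shaping of such a reward (the slide only states "a decreasing function of the distance", not this exact form):

    ```python
    # Hypothetical reward that decreases with the hand-target distance.
    import numpy as np

    def reach_reward(hand_position, target_position, scale=1.0):
        """Return a reward in (0, 1] that is largest when the hand is at the target."""
        distance = np.linalg.norm(np.asarray(hand_position) - np.asarray(target_position))
        return float(np.exp(-scale * distance))      # strictly decreasing in the distance
    ```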

  20. Control of Robotic Arm
    • Two networks:
      • Pre-train the GQN on scenes containing the Jaco arm.
      • Use (f) to train an RL agent.
    • (r) has a much lower dimensionality than the input images.
    • Substantially more robust and data-efficient policy learning: roughly 4x fewer interactions than standard methods (see the sketch below).
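
    A sketch of that two-stage setup (the environment API and function names below are assumptions; the point is that the frozen representation network supplies compact observations to the agent):

    ```python
    # Two-stage pipeline: a pre-trained, frozen GQN representation network provides
    # low-dimensional observations r for standard policy learning (hypothetical API).
    def train_policy_on_gqn_features(env, gqn_encoder, policy, agent_update, episodes=1000):
        for _ in range(episodes):
            image, viewpoint = env.reset()
            done = False
            while not done:
                r = gqn_encoder(image, viewpoint)           # frozen representation network (f)
                action = policy(r)                          # the policy sees r, not raw pixels
                (image, viewpoint), reward, done = env.step(action)
                agent_update(r, action, reward)             # any standard RL update rule
    ```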

  21. Maze Environments (Partially observed)
    • 7x7 grid mazes generated with the OpenGL-based DeepMind Lab game engine.
    • (g) is capable of predicting the top-down view from only a handful of first-person observations.

  22. Shepard-Metzler Environment
    • Randomly generated shapes (similar to 3D Tetris pieces).
    • (g) could infer the shape even from a single image.
    • Capable of re-rendering from any viewpoint with high (indistinguishable) levels of accuracy.
    • Under heavy occlusion, (g) generated one of the many shapes consistent with the observed portion of the image.

  23. Shepard-Metzler Environment

  24. Comparisons and Restrictions

  25. SFM vs GQN
    • SFM and other multiple-view-geometry techniques produce point clouds, meshes, collections of pre-defined primitives, … (see the 3D Scanning lecture).
    • GQN learns a representational space; it can express the presence of textures, parts, objects, lights and scenes at a suitably high level of abstraction.
    • GQN enables task-specific fine-tuning of the representation itself.

  26. GQN vs Other Learning-based Methods
    • Other neural approaches (auto-encoders, etc.) focus on regularities in colours and patches in image space, but fail to reach a high-level representation.
    • GQN can account for uncertainty in scenes with heavy occlusion.
    • GQN is not tied to a particular choice of generation architecture.

  27. Current Restrictions
    • The resulting representations are not readily interpretable.
    • Experiments were run on synthetic environments only:
      • a need for controlled analysis;
      • limited availability of suitable real datasets.
    • Total scene understanding involves more than just the 3D scene.

  28. Future Work
    • GQN-based SLAM -> keep track of the agent's location
    • Applications in AR/VR -> perspective rendering
    • Autonomous driving -> predictive driving
    • Modelling dynamic scenes
    • …

  29. Conclusion
    • A single architecture to perceive, interpret and represent synthetic scenes without human labelling.
    • Representations adapt to capture the details of the environment.
    • No problem-specific engineering of generators.
    • Paves the way towards fully unsupervised scene understanding, planning and behaviour.

  30. More…
    • DeepMind Blog - June, 2018
    • Science - Vol. 360, Issue 6394, pp. 1204-1210
    • Open Access Version
    • Datasets used in Experiments
    • Related Video
    • Detailed pseudo-code is provided as Supplementary Materials.
    • DeepMind has filed a U.K. patent application (GP-201495-00-PCT) related to this work.

  33. Extra

  34. Generative Models
    Credit: Goodfellow, 2016


  36. Generation Network Architecture
    A. Sequence of computational cores
    B. Skip-connection pathways (LSTM based)