Slide 1

Slide 1 text

Beyond Face Rotation

Slide 2

Slide 2 text

MOMA (Museum of Machine Arts) - Sourabh Bajaj

Slide 3

Slide 3 text

Hello
• I’m Sourabh

Slide 4

Slide 4 text

Hello
• I’m Sourabh
• I’m a software engineer

Slide 5

Slide 5 text

Hello
• I’m Sourabh
• I’m a software engineer
• I tweet @sb2nov

Slide 6

Slide 6 text

Hello
• I’m Sourabh
• I’m a software engineer
• I tweet @sb2nov
• I eat gummy bears

Slide 7

Slide 7 text

Generative Neural Networks
The network learns to map input to output by seeing many examples.
Input: Random Noise, Topic, Text, Image, Video, Music, Audio, 3D, Actions, Other
Generation / Translation Network
Output: Text, Image, Video, Music, Audio, 3D, Actions, Other

Slide 8

Slide 8 text

Object Detection and Recognition (ImageNet)
googleresearch.blogspot.com/2014/09/building-deeper-understanding-of-images.html (Szegedy et al., GoogLeNet)
Live:
• VGG
• YOLO
• YOLO v2
• LeCun
Concurrence, Localization, Occlusion, Out of context, Counting, Tracking

Slide 9

Slide 9 text

Automatic Colorization
• Larsson et al., people.cs.uchicago.edu/~larsson/colorization
• Iizuka et al., hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en
• Web interface: demos.algorithmia.com/colorize-photos (richzhang.github.io/colorization)
[Figure panels: Ground Truth, Input, Output]

Slide 10

Slide 10 text

Denoising, Super-Resolution and Inpainting
• Denoising auto-encoders (e.g. Chollet)
• Recursive convolutional net for super-resolution, Kim et al.
  • Code: github.com/alexjc/neural-enhance
  • Deepsense.ai blog post
  • letsenhance.io
• Context encoders for inpainting, Pathak et al.
  • Globally and Locally Consistent Image Completion
  • Adobe DeepFill
  • Contextual Attention - demo

Slide 11

Slide 11 text

Inceptionism: Deep Dreams
Mordvintsev et al., googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html (GoogLeNet)
Pareidolia: Perceiving a familiar pattern where none exists.
Johnny 5

Slide 12

Slide 12 text

Deep Dreams
• Code: github.com/google/deepdream (see also: bat-country)
• Online TensorFlow example
• Reddit: reddit.com/r/deepdream
• Web interface:
  • dreamdeeply.com
  • deepdreamgenerator.com
  • psychic-vr-lab.com/deepdream
  • deepdream.akkez.ru
• Captions: www.cs.toronto.edu/~rkiros/inceptionism_captions.html
• Video:
  • Fear and Loathing in Las Vegas
  • Forest Trail

Slide 13

Slide 13 text

Artistic Style Transfer
Gatys et al., arxiv.org/abs/1508.06576 (VGG)
• Features from object recognition! (*)
• Web/Mobile interfaces:
  • dreamscopeapp.com, app, paintbrush (app, video)
  • ostagram.ru
  • deepart.io (app, video, fixed colors)
  • photopaint.us
  • instapainting.com/assets
  • aristo (app, video)
  • prisma-ai.com (app, video)
  • lucid (app, video)
  • deepdreamgenerator.com
  • deeparteffects.com (app)
• Kogan (Alice video, Cubist Mirror)
• 2001: Odyssey
• Video: 1 2 3
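The slide only points to Gatys et al. and notes that the features come from an object-recognition network (VGG). As a rough sketch of the idea in that paper, style can be summarised by the correlations between feature channels (a Gram matrix), which discards where in the image each activation occurred. The toy activations below are hypothetical, hand-written stand-ins for real VGG feature maps, and the loss normalisation is simplified compared with the paper.

```python
def gram_matrix(features):
    """features: list of C channels, each a flat list of N activations.

    Returns the C x C matrix G with G[i][j] = sum_k F[i][k] * F[j][k].
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def style_loss(gen_features, style_features):
    """Mean squared difference between the two Gram matrices.

    Assumption: plain mean over entries; the paper uses a different
    normalisation constant and sums over several network layers.
    """
    g = gram_matrix(gen_features)
    s = gram_matrix(style_features)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)


# Hypothetical toy activations: 2 channels, 4 spatial positions each.
style = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
# Same activations with the spatial positions reversed in every channel.
rearranged = [[4.0, 3.0, 2.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
# Genuinely different channel statistics.
other = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]]

loss_rearranged = style_loss(rearranged, style)
loss_other = style_loss(other, style)
```

Here `loss_rearranged` is zero because moving activations around spatially leaves the channel correlations unchanged, while `loss_other` is positive. That is one way to see why this kind of statistic captures texture and style rather than layout.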

Slide 14

Slide 14 text

Style Transfer
• Li and Wand, (code) arxiv.org/abs/1601.04589
• Luan et al., (code) arxiv.org/abs/1703.07511

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Neural Doodle: Semantic Style Transfer
Champandard, arxiv.org/abs/1603.01768
• Online demo: dmitryulyanov.github.io/feed-forward-neural-doodle
• Code: github.com/alexjc/neural-doodle
• Blog: nucl.ai/blog/neural-doodles
• See also: Wentz, github.com/awentzonline/image-analogies

Slide 17

Slide 17 text

Image to Image Translation With Conditional Adversarial Networks (PatchGAN)
Isola et al., phillipi.github.io/pix2pix
Interactive: affinelayer.com/pixsrv
Guide: ml4a.github.io/guides/Pix2Pix
fotogenerator.npocloud.nl

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Unsupervised Image to Image Translation (DiscoGAN/CycleGAN/DualGAN)
• Kim et al., arxiv.org/abs/1703.05192
• Zhu et al., junyanz.github.io/CycleGAN
• Yi et al., arxiv.org/abs/1704.02510
Face-off

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (StackGAN)
Zhang et al., arxiv.org/abs/1612.03242
Code: github.com/hanzhanggit/StackGAN
Cha et al., arxiv.org/abs/1708.09321
StackGAN++: Zhang et al., arxiv.org/abs/1710.10916
Code: github.com/hanzhanggit/StackGAN-v2

Slide 22

Slide 22 text

AttnGAN
Xu et al., arxiv.org/abs/1711.10485 [supp]

Slide 23

Slide 23 text

AttnGAN: real results (github.com/taoxugit/AttnGAN)
• The sun is setting into the sea
• A man riding a horse
• The cat is sitting on the couch

Slide 24

Slide 24 text

Video Prediction and Generation
• Deep multi-scale video prediction beyond mean square error (Mathieu et al.)
• Generating Videos with Scene Dynamics (Vondrick et al.)
• Learning to Generate Long-term Future via Hierarchical Prediction (Villegas et al.)
• Attentive Semantic Video Generation using Captions (Marwah et al.)
• Video Generation from Text (Li et al.)
• Visual to Sound: Generating Natural Sound for Videos in the Wild (Zhou et al.)
• Imagine This! Scripts to Compositions to Videos (Gupta et al.)

Slide 25

Slide 25 text

Word Embeddings (Word2Vec)
Mikolov et al., arxiv.org/abs/1301.3781
Semantically: King + (Woman - Man) = Queen
Algebraically: King - Man + Woman = Queen
[Figure: King, Queen, Man and Woman plotted as points on X and Y axes]
Web demos:
• rare-technologies.com/word2vec-tutorial
• bionlp-www.utu.fi/wv_demo
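The King - Man + Woman = Queen relation can be tried out on a toy scale. The sketch below uses hypothetical 2-D vectors chosen by hand so that the arithmetic works out; real Word2Vec embeddings are learned from text and have far more dimensions, and the `analogy` helper is an illustrative name, not part of any library.

```python
import math

# Hypothetical hand-picked 2-D "embeddings", for illustration only.
EMBEDDINGS = {
    "king": (0.9, 0.8),
    "queen": (0.9, 0.2),
    "man": (0.1, 0.8),
    "woman": (0.1, 0.2),
    "apple": (0.5, -0.9),
}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word closest to vec(a) - vec(b) + vec(c).

    The three query words are excluded from the candidates, as is
    common practice when evaluating word analogies.
    """
    target = tuple(x - y + z for x, y, z in
                   zip(embeddings[a], embeddings[b], embeddings[c]))
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


result = analogy("king", "man", "woman")
print(result)
```

With these toy vectors the nearest remaining word to King - Man + Woman is "queen". On real embeddings the match is approximate, which is why the lookup uses nearest-neighbour search by cosine similarity rather than exact equality.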

Slide 26

Slide 26 text

Word Embeddings (Word2Vec)

Slide 27

Slide 27 text

Sentence Embeddings (sequence-to-sequence encoder-decoder LSTM)
Sutskever et al., arxiv.org/abs/1409.3215

Slide 28

Slide 28 text

Continuous Sentence Representation with Variational Autoencoders
• Bowman et al., Generating Sentences from a Continuous Space (1511.06349)
• Semeniuta et al., A Hybrid Convolutional Variational Autoencoder for Text Generation (1702.02390)
• Web demo: robinsloan.com/voyages-in-sentence-space

Slide 29

Slide 29 text

Thanks