Slide 1

Slide 1 text

Pixelor: A Competitive Sketching AI Agent. So you think you can sketch? *Ayan Kumar Bhunia1, *Ayan Das1,2, *Umar Riaz Muhammad1, Yongxin Yang1,2, Timothy M. Hospedales3,1, Tao Xiang1,2, Yulia Gryaditskaya1,2, Yi-Zhe Song1,2 (*Equal Contribution) 1 SketchX, CVSSP, University of Surrey, UK 2 iFlyTek-Surrey Joint Research Centre on Artificial Intelligence, UK 3 University of Edinburgh

Slide 2

Slide 2 text

The Game of Pictionary. Rules of the game: • Two players draw a given concept • Judges (players) try to guess the concept • The first player whose sketch is recognized by a judge wins

Slide 3

Slide 3 text

Pixelary: A pixel-synchronized Pictionary. Problem with the traditional setting: • Machines are physically/computationally faster, so the game may be unfair to humans. Pixel synchronization: • We restrict the game to be played in a pixel-synchronized way • Human and AI draw the same amount of ink at the same time
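The synchronization rule above can be sketched as a toy game loop: both players reveal the same ink budget per step until each sketch is complete. The function name, units, and parameters are illustrative assumptions, not the actual game implementation.

```python
def pixel_synchronized(human_len, ai_len, ink_per_step):
    """Toy model of the pixel-synchronization rule: at each step both
    players reveal the same amount of ink until their sketches finish.
    (Illustrative only -- names and units are assumptions.)"""
    human_done = ai_done = 0.0
    while human_done < human_len or ai_done < ai_len:
        human_done = min(human_len, human_done + ink_per_step)
        ai_done = min(ai_len, ai_done + ink_per_step)
        yield human_done, ai_done

# A 10-pixel human sketch vs. a 6-pixel AI sketch, 4 pixels per step:
print(list(pixel_synchronized(10.0, 6.0, 4.0)))
# [(4.0, 4.0), (8.0, 6.0), (10.0, 6.0)]
```

Because the per-step budget is shared, neither player can win simply by drawing faster; only the drawing strategy matters.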

Slide 4

Slide 4 text

Pixelor: A Pictionary-playing agent. (Figure: Human vs. Pixelor drawing side by side.)

Slide 5

Slide 5 text

How to win? • Draw the concept with the minimum amount of ink • Draw the salient parts of the concept first. Reordering: bad strategies take more ink to achieve recognizability; good strategies take less ink to achieve recognizability.

Slide 6

Slide 6 text

Searching for an optimal strategy is a viable problem. (Figure: casually drawn sketches vs. sketches drawn with an optimal strategy.)

Slide 7

Slide 7 text

A key assumption • We simplify the problem in terms of strokes • We consider strokes to be the units of a sketch

Slide 8

Slide 8 text

Method overview. A sorting module re-orders the data according to a better strategy; the re-ordered data is then used to train a generative model, called "Pixelor". (Diagram: Dataset → Sorting Module (+ Loss) → Re-ordered Dataset → Generative Model; training on the re-ordered data, novel data generation at inference.)

Slide 9

Slide 9 text

Part 1: Stroke Sorting

Slide 10

Slide 10 text

QuickDraw Dataset • For our experiments, we use "Quick Draw!", the largest free-hand doodle sketching dataset. • We use a subset of 20 categories, with ~70k samples per category.

Slide 11

Slide 11 text

Stroke Sorting: Exhaustive search (extremely slow!) Individual strokes … combinatorially large search space. A sketch with 'N' strokes requires at most 'N!' classifier evaluations to determine the optimal strategy.
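The brute-force baseline can be sketched as follows. The scoring function here is a hypothetical surrogate (it rewards placing high-relevance strokes early); in the actual setting the score would come from classifier evaluations on cumulative sketches.

```python
from itertools import permutations

def early_score(order, stroke_relevance):
    # Hypothetical surrogate for early recognizability: high-relevance
    # strokes are worth more the earlier they are drawn.
    n = len(order)
    return sum(stroke_relevance[s] * (n - i) for i, s in enumerate(order))

def best_order_exhaustive(stroke_relevance):
    """Score all N! stroke orders and keep the best one.
    Intractable beyond a handful of strokes."""
    strokes = range(len(stroke_relevance))
    return max(permutations(strokes),
               key=lambda o: early_score(o, stroke_relevance))

# Three strokes with relevance 0.1, 0.9, 0.5: the best order draws the
# most recognizable stroke first.
print(best_order_exhaustive([0.1, 0.9, 0.5]))  # (1, 2, 0)
```

Even with such a cheap surrogate, the N! search space makes this approach unusable for typical sketches, which motivates the differentiable sorter that follows.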

Slide 12

Slide 12 text

Stroke Sorting: Neural sorting is our solution • We use NeuralSort[1], a differentiable sorter that re-orders strokes to increase early recognizability in a computationally tractable way. [1] Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic Optimization of Sorting Networks via Continuous Relaxations, ICLR 2019

Slide 13

Slide 13 text

Stroke Sorting: The concept of neural sorting • Traditional sorting is a well-known algorithm that permutes a given set of objects according to some predicate • But it is non-differentiable, and hence unsuitable for deep learning applications • NeuralSort[1] is a differentiable relaxation of the traditional sorting operator. [1] Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic Optimization of Sorting Networks via Continuous Relaxations, ICLR 2019
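For reference, the core NeuralSort relaxation fits in a few lines of NumPy. This follows Grover et al.'s formulation; the temperature value and the toy scores are illustrative.

```python
import numpy as np

def neural_sort(scores, tau=1.0):
    """NeuralSort relaxation (Grover et al., ICLR 2019): returns a
    row-stochastic (n, n) matrix that approaches the permutation matrix
    sorting `scores` in decreasing order as the temperature tau -> 0."""
    s = np.asarray(scores, dtype=float).reshape(-1, 1)
    n = s.shape[0]
    A = np.abs(s - s.T)                      # pairwise |s_i - s_j|
    B = (A @ np.ones((n, 1))).T              # row sums of A
    k = np.arange(1, n + 1).reshape(-1, 1)   # target ranks 1..n
    C = (n + 1 - 2 * k) @ s.T                # rank-dependent scaling
    logits = (C - B) / tau
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # row-wise softmax

# At a low temperature, the argmax of each row recovers the hard sort:
P = neural_sort([0.1, 0.9, 0.5], tau=0.01)
print(P.argmax(axis=1))  # [1 2 0] -- indices in decreasing score order
```

Because every operation is differentiable (the softmax replaces the hard argmax), gradients flow from a loss on the sorted output back into the network that produced the scores.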

Slide 14

Slide 14 text

Stroke Sorting: The full architecture. Input: vectorized sketches (QuickDraw[1]). [1] https://quickdraw.withgoogle.com/

Slide 15

Slide 15 text

Stroke Sorting: The full architecture. A pre-trained CNN feature extractor (Sketch-a-Net[2]). [2] Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, and Timothy M Hospedales. Sketch-a-Net, IJCV 2017

Slide 16

Slide 16 text

Stroke Sorting: The full architecture. Computing relevance scores for each stroke.

Slide 17

Slide 17 text

Stroke Sorting: The full architecture. Sort the scores with NeuralSort[3] — differentiable! [3] Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic Optimization of Sorting Networks via Continuous Relaxations, ICLR 2019

Slide 18

Slide 18 text

Stroke Sorting: The full architecture. Cumulative sketches are scored with a Sketch-a-Net[2] classifier.
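The cumulative evaluation can be illustrated as below. The stand-in classifier is hypothetical; the real pipeline rasterizes each prefix and scores it with Sketch-a-Net.

```python
def cumulative_recognizability(ordered_strokes, classifier_score):
    """Score every cumulative sketch (the first k strokes, k = 1..N).
    The sorter is trained so these scores rise as early as possible.
    `classifier_score` stands in for a correct-class classifier score."""
    prefixes = [ordered_strokes[:k] for k in range(1, len(ordered_strokes) + 1)]
    return [classifier_score(p) for p in prefixes]

# Toy stand-in classifier: confidence grows with total ink drawn so far.
score = lambda strokes: min(1.0, sum(len(s) for s in strokes) / 10.0)
strokes = [[0] * 6, [0] * 3, [0] * 2]  # three strokes of length 6, 3, 2
print(cumulative_recognizability(strokes, score))  # [0.6, 0.9, 1.0]
```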

Slide 19

Slide 19 text

Sketch-a-Net[1]: The sketch classifier • Sketch-a-Net[1] is a popular CNN-based classifier designed specifically for rasterized sketches • It has shown promising performance on TU-Berlin[3] and also QuickDraw[2] • Pre-final-layer activations are used as features. [1] Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, and Timothy M Hospedales. Sketch-a-Net, IJCV 2017 [2] https://quickdraw.withgoogle.com/ [3] http://cybertron.cg.tu-berlin.de/eitz/projects/classifysketch/

Slide 20

Slide 20 text

Quantitative Results (Neural Sorting) • ERE (Early Recognition Efficacy) is the area under the early-recognizability curves. ERE: 0.715 vs. ERE: 0.612 • We also evaluated two different ordering strategies — exhaustive search and greedy search; refer to the paper for details. (Plot: correct-class score vs. sketch completion rate, averaged over 20 categories.)
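The ERE metric is a straightforward area computation; a minimal sketch using the trapezoid rule, with hypothetical toy curves:

```python
def early_recognition_efficacy(completion, class_score):
    """ERE: area under the early-recognizability curve (trapezoid rule).
    `completion` holds sketch completion rates in [0, 1]; `class_score`
    the classifier's correct-class score at each completion point."""
    area = 0.0
    for i in range(1, len(completion)):
        area += 0.5 * (class_score[i] + class_score[i - 1]) * \
                (completion[i] - completion[i - 1])
    return area

# Toy curves (hypothetical numbers): a strategy that becomes
# recognizable early encloses more area than one recognized late.
x = [0.0, 0.25, 0.5, 0.75, 1.0]
early = [0.0, 0.7, 0.9, 1.0, 1.0]   # recognizable from ~25% completion
late  = [0.0, 0.0, 0.1, 0.5, 1.0]   # recognizable only near the end
print(round(early_recognition_efficacy(x, early), 3))  # 0.775
print(round(early_recognition_efficacy(x, late), 3))   # 0.275
```

A curve that saturates early encloses more area, which is why a higher ERE corresponds to a better drawing strategy.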

Slide 21

Slide 21 text

Part 2: Sketch generation

Slide 22

Slide 22 text

SketchRNN[1] • Based on an RNN encoder-decoder architecture • Trained with log-likelihood (for reconstruction) and KL divergence (to match a prior) • Drawbacks: cannot handle long sequences; fails to encode the necessary contextual information. (Diagram: sequential input → Encoder (RNN) → latent space → Decoder (RNN) → sequential output.) [1] David Ha and Douglas Eck. A Neural Representation of Sketch Drawings, ICLR 2018

Slide 23

Slide 23 text

Multimodality of optimal strategies • We found the traditional SketchRNN model sub-optimal for our application ❑ Experiments show that drawing strategies leading to early recognition have multiple modes ❑ Log-likelihood + KL-based training handles multi-modal data poorly. (Figure: left, normal drawings; right, drawings with a good strategy, showing two distinct optimal strategies over sketch completion.)

Slide 24

Slide 24 text

Our Seq2Seq WAE (namely "Pixelor") • We use a Transformer as the encoder to alleviate RNNs' long-term-dependency problem and to better capture contextual information • We use the Wasserstein distance (instead of log-likelihood) to leverage the presence of multiple modes in optimal drawing strategies • We use MMD (Maximum Mean Discrepancy) instead of KL divergence to match the latent distribution to a prior
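As an illustration of the MMD term, here is a minimal biased estimator with an RBF kernel. The kernel bandwidth, sample shapes, and toy data are assumptions for illustration; the model's actual kernel choice may differ.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian kernel between two batches of latent codes.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd(x, y, sigma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy between
    encoded latents x and prior samples y: small when the two sample
    sets look alike, large when they differ."""
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
prior   = rng.standard_normal((256, 8))          # N(0, I) prior draws
matched = rng.standard_normal((256, 8))          # latents near the prior
shifted = rng.standard_normal((256, 8)) + 3.0    # latents far from it
assert mmd(shifted, prior) > mmd(matched, prior)
```

Minimizing this quantity pulls the encoded latent distribution toward the prior without requiring the per-sample density that a KL term needs.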

Slide 25

Slide 25 text

Quantitative Results (Seq2Seq-WAE) • We quantitatively show that our Seq2Seq-WAE handles the multi-modal nature of the data better than the traditional KL-based SketchRNN. ❑ ERE (Early Recognition Efficacy) is a measure of the early recognizability of a strategy [higher is better] ❑ FID (Fréchet Inception Distance) is a measure of the quality of generated samples [lower is better]. Numbers are averaged over all categories; for category-wise numbers, please refer to the supplemental material.

Slide 26

Slide 26 text

The Human Study

Slide 27

Slide 27 text

The setup • In the "Drawing" phase, participants had to draw a given concept with a good strategy • In the "Judging" phase, one sketch is shown at a time, which could be from: Pixelor; human participants (new dataset); or QuickDraw. (Screenshots: the drawing interface and the judging interface.)

Slide 28

Slide 28 text

New dataset (SlowSketch). We collected the sketches drawn by humans during our study; we call it the "SlowSketch" dataset. ❑ 12 participants ❑ ~1700 new sketches ❑ 7 attempts per participant. (Figure: the SlowSketch dataset, with QuickDraw shown for comparison.)

Slide 29

Slide 29 text

AI judging: Results • SlowSketch sketches are recognized by the AI at a 33% completion rate • Pixelor sketches are recognized at a 17% completion rate. This plot covers only sketches that are recognized correctly at 100% completion.

Slide 30

Slide 30 text

Human judging: Data preparation • Human participants were presented with one sketch at a time. They could: ❑ Unveil the sketch incrementally, in 5% intervals ❑ Put a label at each interval • We record each label and its corresponding completion %

Slide 31

Slide 31 text

Human judging: Results • Human sketches are recognized at a 33% completion rate; Pixelor sketches at 31% (lower is better). (Plot: sketch completion rate at which the correct guess is made, Human vs. Pixelor.)

Slide 32

Slide 32 text

Final Remarks

Slide 33

Slide 33 text

Failure cases • Since our solution is based on a generative model, it is prone to generation failures • Examining the human-study results, we noticed that Pixelor loses the game against humans when the generative model fails to generate plausible sketches • We notice that Pixelor often fails to generate repetitive patterns (e.g., guitar strings, the patches on a cow's body)

Slide 34

Slide 34 text

More than just a game • This work wraps its technical contributions in a fun game, but more applications are possible: ❑ Assisting in child development ❑ Improving human drawing ability ❑ Studying the human cognitive process • We also noticed human participants developing better Pictionary skills over time

Slide 35

Slide 35 text

Pixelor: A Competitive Sketching AI Agent. So you think you can sketch? *Ayan Kumar Bhunia1, *Ayan Das1,2, *Umar Riaz Muhammad1, Yongxin Yang1,2, Timothy M. Hospedales3,1, Tao Xiang1,2, Yulia Gryaditskaya1,2, Yi-Zhe Song1,2 (*Equal Contribution) 1 SketchX, CVSSP, University of Surrey, UK 2 iFlyTek-Surrey Joint Research Centre on Artificial Intelligence, UK 3 University of Edinburgh http://sketchx.ai/pixelor