
Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?

Ayan Das
May 13, 2021


We present the first competitive drawing agent Pixelor that exhibits human-level performance at a Pictionary-like sketching game, where the participant whose sketch is recognized first is the winner. Our AI agent can autonomously sketch a given visual concept, and achieve a recognizable rendition as quickly as, or faster than, a human competitor. The key to victory for the agent is to learn the optimal stroke sequencing strategies that generate the most recognizable and distinguishable strokes first. Training Pixelor is done in two steps. First, we infer the optimal stroke order that maximizes early recognizability of human training sketches. Second, this order is used to supervise the training of a sequence-to-sequence stroke generator. Our key technical contributions are a tractable search of the exponential space of orderings using neural sorting; and an improved Seq2Seq Wasserstein (S2S-WAE) generator that uses an optimal-transport loss to accommodate the multi-modal nature of the optimal stroke distribution. Our analysis shows that Pixelor is better than the human players of the Quick, Draw! game, under both AI and human judging of early recognition. To analyze the impact of human competitors' strategies, we conducted a further human study in which participants were given unlimited thinking time and trained in early recognizability through feedback from an AI judge. The study shows that humans do gradually improve their strategies with training, but overall Pixelor still matches human performance. We will release the code and the dataset, optimized for the task of early recognition, upon acceptance.


Transcript

  1. Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?
     *Ayan Kumar Bhunia¹, *Ayan Das¹,², *Umar Riaz Muhammad¹, Yongxin Yang¹,², Timothy M. Hospedales³,¹, Tao Xiang¹,², Yulia Gryaditskaya¹,², Yi-Zhe Song¹,² (*Equal Contribution)
     ¹ SketchX, CVSSP, University of Surrey, UK; ² iFlyTek-Surrey Joint Research Centre on Artificial Intelligence, UK; ³ University of Edinburgh
  2. The Game of Pictionary
     Rules of the game:
     • Two players draw a given concept
     • Judges (other players) try to guess the concept
     • The first player whose sketch is recognized by a judge wins
  3. Pixelary: A pixel-synchronized Pictionary
     Problem with the traditional setting:
     • Machines are physically/computationally faster, so the game may be unfair to humans
     Pixel-synchronization:
     • We restrict the game to be played in a pixel-synchronized way
     • Human and AI draw the same amount of ink at the same time
  4. How to win?
     • Draw the concept with the minimum amount of ink
     • Draw the salient parts of the concept first
     Bad strategies take more ink to achieve recognizability; good strategies take less ink. Re-ordering the strokes can turn a bad strategy into a good one.
  5. Searching for the optimal strategy is a viable problem
     • Casually drawn sketches vs. sketches with an optimal strategy
  6. A key assumption
     • We simplify the problem in terms of strokes
     • We consider strokes to be the units of a sketch
  7. Method overview
     Sorting Module → Re-ordered Dataset → Generative Model → Novel Data Generation
     • Inference: a sorting module re-orders the data according to a better strategy
     • Training: the re-ordered data is used to train a generative model, called "Pixelor"
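The two-stage pipeline above can be sketched as follows. This is a minimal illustration only: `sorting_module` and the generator's `fit` interface are hypothetical stand-ins, not the paper's actual API.

```python
def train_pixelor(dataset, sorting_module, generative_model):
    """Two-stage training sketch:
    (1) re-order each sketch's strokes with the sorting module,
    (2) fit the generative model on the re-ordered data."""
    reordered = [sorting_module(sketch) for sketch in dataset]  # stage 1
    generative_model.fit(reordered)                             # stage 2
    return generative_model
```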
  8. QuickDraw Dataset
     • For our experiments, we use "Quick, Draw!", the largest free-hand doodle sketching dataset
     • We use a subset of 20 categories, with ~70k samples per category
  9. Stroke Sorting: Exhaustive search (extremely slow!)
     • The individual strokes form a combinatorially large search space
     • A sketch with N strokes requires up to N! classifier evaluations to determine the optimal strategy
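To make the N! cost concrete, here is a brute-force search sketch. The `recognizability` scorer is a hypothetical placeholder for "how early does a classifier recognize this ordering"; the toy scorer below simply rewards drawing the salient stroke (id 0) first.

```python
import itertools

def exhaustive_best_order(strokes, recognizability):
    """Brute-force search over all N! stroke orderings; feasible only
    for tiny N, which motivates the differentiable sorter instead."""
    best_order, best_score = None, float("-inf")
    for order in itertools.permutations(strokes):
        score = recognizability(order)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score

# Toy scorer: the earlier the "salient" stroke (id 0) appears, the better.
toy_score = lambda order: -order.index(0)
order, score = exhaustive_best_order([2, 0, 1], toy_score)  # order starts with 0
```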
  10. Stroke Sorting: Neural Sorting is our solution
     • We use NeuralSort[1], a differentiable sorter that re-orders strokes to increase early recognizability in a computationally tractable way
     [1] Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic Optimization of Sorting Networks via Continuous Relaxations. ICLR 2019
  11. Stroke Sorting: The concept of Neural Sorting
     • Traditional sorting is a well-known operation that permutes a given set of objects according to some predicate
     • But it is non-differentiable, and hence unsuitable for deep learning applications
     • NeuralSort[1] is a differentiable relaxation of the traditional sorting operator
     [1] Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic Optimization of Sorting Networks via Continuous Relaxations. ICLR 2019
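The idea behind NeuralSort can be sketched in a few lines of NumPy, following the relaxation in Grover et al. (2019): instead of a hard permutation, it produces a row-stochastic matrix that approaches the exact descending-sort permutation as the temperature tau goes to zero. This is a standalone illustration of the operator, not the paper's training code.

```python
import numpy as np

def neural_sort(scores, tau=1.0):
    """Continuous relaxation of the sort operator (Grover et al., 2019).
    Returns a row-stochastic matrix P_hat whose i-th row softly selects
    the i-th largest score; hard sort is recovered as tau -> 0."""
    s = np.asarray(scores, dtype=float).reshape(-1, 1)    # (n, 1)
    n = s.shape[0]
    A = np.abs(s - s.T)                                   # pairwise |s_i - s_j|
    B = A @ np.ones((n, 1))                               # row sums of A
    k = (n + 1 - 2 * (np.arange(n) + 1)).reshape(-1, 1)   # n + 1 - 2i, i = 1..n
    logits = (k @ s.T - B.T) / tau                        # (n, n) relaxation logits
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)               # row-wise softmax

P = neural_sort([0.1, 2.0, 0.7], tau=0.01)
# At low temperature the row argmaxes give the descending order: 1, 2, 0
```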
  12. Stroke Sorting: The full architecture
     • A pre-trained CNN feature extractor (Sketch-a-Net[2]) embeds each stroke
     [2] Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, and Timothy M. Hospedales. Sketch-a-Net. IJCV 2017
  13. Stroke Sorting: The full architecture
     • Relevance scores are computed for each stroke
  14. Stroke Sorting: The full architecture
     • The scores are sorted with NeuralSort[3] — differentiably!
     [3] Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic Optimization of Sorting Networks via Continuous Relaxations. ICLR 2019
  15. Stroke Sorting: The full architecture
     • Cumulative (partial) sketches are scored by a Sketch-a-Net[2] classifier
     [2] Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, and Timothy M. Hospedales. Sketch-a-Net. IJCV 2017
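The architecture's final step, building cumulative sketches and scoring each prefix, can be sketched as follows. The `classify` callback is a hypothetical stand-in for the Sketch-a-Net loss; in the actual pipeline each prefix would be rasterized before classification.

```python
def cumulative_sketches(ordered_strokes):
    """Yield each partial sketch (prefix of strokes) in drawing order."""
    prefix = []
    for stroke in ordered_strokes:
        prefix.append(stroke)
        yield list(prefix)

def early_recognition_loss(ordered_strokes, classify):
    """Sum of per-prefix classification losses: an ordering whose early
    prefixes are already recognizable incurs a lower total loss."""
    return sum(classify(prefix) for prefix in cumulative_sketches(ordered_strokes))
```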
  16. Sketch-a-Net[1]: The sketch classifier
     • Sketch-a-Net[1] is a popular CNN-based classifier designed specifically for rasterized sketches
     • It has shown promising performance on TU-Berlin[3] and also on QuickDraw[2]
     • Pre-final-layer activations are used as features
     [1] Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, and Timothy M. Hospedales. Sketch-a-Net. IJCV 2017
     [2] https://quickdraw.withgoogle.com/
     [3] http://cybertron.cg.tu-berlin.de/eitz/projects/classifysketch/
  17. Quantitative Results (Neural Sorting)
     • ERE (Early Recognition Efficacy) is the area under the early-recognizability curve
     • We also evaluated two other ordering strategies, exhaustive search and greedy search; refer to the paper for details
     [Plot: correct-class score vs. sketch completion rate, averaged over 20 categories; the two curves achieve ERE 0.715 and 0.612]
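Since ERE is defined as an area under a curve, it can be computed with the trapezoidal rule. The two recognition curves below are made-up illustrations (not the paper's data): a "good" strategy that becomes recognizable by roughly 50% completion versus a "bad" one that is only recognizable near the end.

```python
import numpy as np

def early_recognition_efficacy(completion, score):
    """ERE: area under the (sketch completion rate, correct-class score)
    curve, via the trapezoidal rule. Earlier recognizability => higher ERE."""
    completion, score = np.asarray(completion), np.asarray(score)
    widths = np.diff(completion)
    return float(np.sum(widths * (score[1:] + score[:-1]) / 2.0))

x = np.linspace(0.0, 1.0, 11)           # completion rates 0%..100%
good = np.clip(2.0 * x, 0.0, 1.0)       # fully recognizable by ~50% completion
bad = x ** 2                            # only recognizable near the end
# ERE(good) = 0.75, clearly above ERE(bad) ≈ 0.34
```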
  18. SketchRNN[1]
     • Based on an RNN encoder-decoder architecture
     • Trained with log-likelihood (for reconstruction) and KL divergence (to match the prior)
     • Drawbacks:
       ◦ Cannot handle long sequences
       ◦ Fails to encode the necessary contextual information
     [Diagram: sequential input → RNN encoder → latent space → RNN decoder → sequential output]
     [1] David Ha and Douglas Eck. A Neural Representation of Sketch Drawings. ICLR 2018
  19. Multimodality of optimal strategies
     • We found the traditional SketchRNN model sub-optimal for our application
     • Experiments show that drawing strategies leading to early recognition have multiple modes
     • Log-likelihood + KL-based training is not well suited to multi-modal data
     [Figure: left, normal drawings; right, drawings with a good strategy, showing two distinct optimal strategies at the same sketch completion]
  20. Our Seq2Seq WAE (namely "Pixelor")
     • We use a Transformer encoder to alleviate the RNN's long-term dependency problem and to better capture contextual information
     • We use the Wasserstein distance (instead of log-likelihood) to leverage the presence of multiple modes in optimal drawing strategies
     • We use MMD (Maximum Mean Discrepancy) instead of KL divergence to match the latent distribution to a prior
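The MMD term mentioned above can be illustrated with the standard (biased) RBF-kernel estimator. The bandwidth, sample shapes, and the choice of a Gaussian prior below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between two sample sets under an
    RBF kernel; used (in place of a KL term) to pull the encoder's
    latent codes toward samples from the prior."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
prior = rng.standard_normal((256, 8))       # samples from an N(0, I) prior
close = rng.standard_normal((256, 8))       # latents that already match the prior
far = rng.standard_normal((256, 8)) + 3.0   # shifted latents: large MMD penalty
```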
  21. Quantitative Results (Seq2Seq-WAE)
     • We quantitatively show that our Seq2Seq-WAE handles the multi-modal nature of the data better than the traditional KL-based SketchRNN
     • ERE (Early Recognition Efficacy) measures the early recognizability of a strategy [higher is better]
     • FID (Fréchet Inception Distance) measures the quality of generated samples [lower is better]
     • Results are averaged over all categories; for category-wise numbers, please refer to the supplemental material
  22. The setup
     • In the "Drawing" phase, participants had to draw a given concept with a good strategy
     • In the "Judging" phase, one sketch is shown at a time; it could come from Pixelor, from human participants (our new dataset), or from QuickDraw
     [Screenshots: the drawing interface and the judging interface]
  23. New dataset (SlowSketch)
     We collected the sketches drawn by humans during our study; we call this the "SlowSketch" dataset
     • 12 participants
     • ~1700 new sketches
     • 7 attempts per participant
     [Figure: SlowSketch samples, with QuickDraw shown for comparison]
  24. AI judging: Results
     • SlowSketch sketches are recognized by the AI judge at a 33% completion rate
     • Pixelor sketches are recognized at a 17% completion rate
     • This plot considers only sketches that are recognized correctly at 100% completion
  25. Human judging: Data preparation
     • Human participants were presented with one sketch at a time; they could
       ◦ unveil the sketch incrementally at 5% intervals
       ◦ assign a label at each interval
     • We record each label and its corresponding completion percentage
  26. Human judging: Results
     • Human sketches are recognized at a 33% completion rate; Pixelor sketches at 31% (lower is better)
     [Plot: sketch completion rate at the moment of the correct guess, human vs. Pixelor]
  27. Failure cases
     • Since our solution is based on generative models, it is prone to generation failures
     • Studying the human-study results, we noticed that Pixelor loses the game against humans when the generative model fails to generate plausible sketches
     • In particular, Pixelor often fails to generate repetitive patterns (e.g. guitar strings, cow body patches)
  28. More than just a game
     • This work presents its technical contributions wrapped in a fun game, but more applications are possible:
       ◦ Assisting in child development
       ◦ Improving human drawing ability
       ◦ Studying human cognitive processes
     • We also noticed human participants developing better Pictionary skills over time
  29. Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?
     *Ayan Kumar Bhunia¹, *Ayan Das¹,², *Umar Riaz Muhammad¹, Yongxin Yang¹,², Timothy M. Hospedales³,¹, Tao Xiang¹,², Yulia Gryaditskaya¹,², Yi-Zhe Song¹,² (*Equal Contribution)
     ¹ SketchX, CVSSP, University of Surrey, UK; ² iFlyTek-Surrey Joint Research Centre on Artificial Intelligence, UK; ³ University of Edinburgh
     http://sketchx.ai/pixelor