
Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?

Ayan Das
May 13, 2021


We present the first competitive drawing agent Pixelor that exhibits human-level performance at a Pictionary-like sketching game, where the participant whose sketch is recognized first is the winner. Our AI agent can autonomously sketch a given visual concept, and achieve a recognizable rendition as quickly as, or faster than, a human competitor. The key to victory for the agent is to learn the optimal stroke sequencing strategies that generate the most recognizable and distinguishable strokes first. Training Pixelor is done in two steps. First, we infer the optimal stroke order that maximizes early recognizability of human training sketches. Second, this order is used to supervise the training of a sequence-to-sequence stroke generator. Our key technical contributions are a tractable search of the exponential space of orderings using neural sorting; and an improved Seq2Seq Wasserstein (S2S-WAE) generator that uses an optimal-transport loss to accommodate the multi-modal nature of the optimal stroke distribution. Our analysis shows that Pixelor is better than the human players of the Quick, Draw! game, under both AI and human judging of early recognition. To analyze the impact of human competitors' strategies, we conducted a further human study in which participants were given unlimited thinking time and trained in early recognizability through feedback from an AI judge. The study shows that humans do gradually improve their strategies with training, but overall Pixelor still matches human performance. We will release the code and the dataset, optimized for the task of early recognition, upon acceptance.


Transcript

  1. Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?
     *Ayan Kumar Bhunia¹, *Ayan Das¹,², *Umar Riaz Muhammad¹, Yongxin Yang¹,², Timothy M. Hospedales³,¹, Tao Xiang¹,², Yulia Gryaditskaya¹,², Yi-Zhe Song¹,² (*Equal Contribution)
     ¹ SketchX, CVSSP, University of Surrey, UK; ² iFlyTek-Surrey Joint Research Centre on Artificial Intelligence, UK; ³ University of Edinburgh
  2. The Game of Pictionary
     Rules of the game:
     • Two players draw a given concept
     • Judges (other players) try to guess the concept
     • The first player whose sketch is recognized by a judge wins
  3. Pixelary: A pixel-synchronized Pictionary
     Problem with the traditional setting:
     • Machines are physically/computationally faster, so the game may be unfair to humans
     Pixel-synchronization:
     • We restrict the game to be played in a pixel-synchronized way
     • Human and AI draw the same amount of ink at the same time
  4. How to win?
     • Draw the concept with the minimum amount of ink
     • Draw the salient parts of the concept first
     Bad strategies take more ink to achieve recognizability; good strategies take less ink. Re-ordering the strokes can turn a bad strategy into a good one.
  5. Searching for the optimal strategy is a viable problem
     • Casually drawn sketches vs. sketches with an optimal strategy
  6. A key assumption
     • We simplify the problem in terms of strokes
     • We consider strokes to be the units of a sketch
  7. Method overview
     Sorting Module → Re-ordered Dataset → Generative Model → Novel Data Generation
     • Inference: a sorting module re-orders the data according to a better strategy
     • Training: the re-ordered data is used to train a generative model, called "Pixelor"
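The two-stage pipeline above can be sketched as follows. This is a minimal illustration only: `sorting_module` and the generator's `fit` interface are hypothetical stand-ins, not the paper's actual API.

```python
def train_pixelor(dataset, sorting_module, generative_model):
    """Two-stage training sketch:
    (1) re-order each sketch's strokes with the sorting module,
    (2) fit the generative model on the re-ordered data."""
    reordered = [sorting_module(sketch) for sketch in dataset]  # stage 1
    generative_model.fit(reordered)                             # stage 2
    return generative_model
```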
  8. QuickDraw Dataset
     • For our experiments, we use "Quick, Draw!", the largest free-hand doodle sketching dataset
     • We use a subset of 20 categories, with ~70k samples per category
  9. Stroke Sorting: Exhaustive search (extremely slow!)
     • The individual strokes form a combinatorially large search space
     • A sketch with N strokes requires up to N! classifier evaluations to determine the optimal strategy
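To make the N! cost concrete, here is a brute-force search sketch. The `recognizability` scorer is a hypothetical placeholder for "how early does a classifier recognize this ordering"; the toy scorer below simply rewards drawing the salient stroke (id 0) first.

```python
import itertools

def exhaustive_best_order(strokes, recognizability):
    """Brute-force search over all N! stroke orderings; feasible only
    for tiny N, which motivates the differentiable sorter instead."""
    best_order, best_score = None, float("-inf")
    for order in itertools.permutations(strokes):
        score = recognizability(order)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score

# Toy scorer: the earlier the "salient" stroke (id 0) appears, the better.
toy_score = lambda order: -order.index(0)
order, score = exhaustive_best_order([2, 0, 1], toy_score)  # order starts with 0
```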
  10. Stroke Sorting: Neural Sorting is our solution
     • We use NeuralSort[1], a differentiable sorter that re-orders strokes to increase early recognizability in a computationally tractable way
     [1] Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic Optimization of Sorting Networks via Continuous Relaxations. ICLR 2019
  11. Stroke Sorting: The concept of Neural Sorting
     • Traditional sorting is a well-known operation that permutes a given set of objects according to some predicate
     • But it is non-differentiable, and hence unsuitable for deep learning applications
     • NeuralSort[1] is a differentiable relaxation of the traditional sorting operator
     [1] Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic Optimization of Sorting Networks via Continuous Relaxations. ICLR 2019
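The idea behind NeuralSort can be sketched in a few lines of NumPy, following the relaxation in Grover et al. (2019): instead of a hard permutation, it produces a row-stochastic matrix that approaches the exact descending-sort permutation as the temperature tau goes to zero. This is a standalone illustration of the operator, not the paper's training code.

```python
import numpy as np

def neural_sort(scores, tau=1.0):
    """Continuous relaxation of the sort operator (Grover et al., 2019).
    Returns a row-stochastic matrix P_hat whose i-th row softly selects
    the i-th largest score; hard sort is recovered as tau -> 0."""
    s = np.asarray(scores, dtype=float).reshape(-1, 1)    # (n, 1)
    n = s.shape[0]
    A = np.abs(s - s.T)                                   # pairwise |s_i - s_j|
    B = A @ np.ones((n, 1))                               # row sums of A
    k = (n + 1 - 2 * (np.arange(n) + 1)).reshape(-1, 1)   # n + 1 - 2i, i = 1..n
    logits = (k @ s.T - B.T) / tau                        # (n, n) relaxation logits
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)               # row-wise softmax

P = neural_sort([0.1, 2.0, 0.7], tau=0.01)
# At low temperature the row argmaxes give the descending order: 1, 2, 0
```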
  12. Stroke Sorting: The full architecture
     • A pre-trained CNN feature extractor (Sketch-a-Net[2]) embeds each stroke
     [2] Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, and Timothy M. Hospedales. Sketch-a-Net. IJCV 2017
  13. Stroke Sorting: The full architecture
     • Relevance scores are computed for each stroke
  14. Stroke Sorting: The full architecture
     • The scores are sorted with NeuralSort[3] — differentiably!
     [3] Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic Optimization of Sorting Networks via Continuous Relaxations. ICLR 2019
  15. Stroke Sorting: The full architecture
     • Cumulative (partial) sketches are scored by a Sketch-a-Net[2] classifier
     [2] Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, and Timothy M. Hospedales. Sketch-a-Net. IJCV 2017
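The architecture's final step, building cumulative sketches and scoring each prefix, can be sketched as follows. The `classify` callback is a hypothetical stand-in for the Sketch-a-Net loss; in the actual pipeline each prefix would be rasterized before classification.

```python
def cumulative_sketches(ordered_strokes):
    """Yield each partial sketch (prefix of strokes) in drawing order."""
    prefix = []
    for stroke in ordered_strokes:
        prefix.append(stroke)
        yield list(prefix)

def early_recognition_loss(ordered_strokes, classify):
    """Sum of per-prefix classification losses: an ordering whose early
    prefixes are already recognizable incurs a lower total loss."""
    return sum(classify(prefix) for prefix in cumulative_sketches(ordered_strokes))
```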
  16. Sketch-a-Net[1]: The sketch classifier
     • Sketch-a-Net[1] is a popular CNN-based classifier designed specifically for rasterized sketches
     • It has shown promising performance on TU-Berlin[3] and also on QuickDraw[2]
     • Pre-final-layer activations are used as features
     [1] Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, and Timothy M. Hospedales. Sketch-a-Net. IJCV 2017
     [2] https://quickdraw.withgoogle.com/
     [3] http://cybertron.cg.tu-berlin.de/eitz/projects/classifysketch/
  17. Quantitative Results (Neural Sorting)
     • ERE (Early Recognition Efficacy) is the area under the early-recognizability curve
     • We also evaluated two other ordering strategies, exhaustive search and greedy search; refer to the paper for details
     [Plot: correct-class score vs. sketch completion rate, averaged over 20 categories; the two curves achieve ERE 0.715 and 0.612]
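Since ERE is defined as an area under a curve, it can be computed with the trapezoidal rule. The two recognition curves below are made-up illustrations (not the paper's data): a "good" strategy that becomes recognizable by roughly 50% completion versus a "bad" one that is only recognizable near the end.

```python
import numpy as np

def early_recognition_efficacy(completion, score):
    """ERE: area under the (sketch completion rate, correct-class score)
    curve, via the trapezoidal rule. Earlier recognizability => higher ERE."""
    completion, score = np.asarray(completion), np.asarray(score)
    widths = np.diff(completion)
    return float(np.sum(widths * (score[1:] + score[:-1]) / 2.0))

x = np.linspace(0.0, 1.0, 11)           # completion rates 0%..100%
good = np.clip(2.0 * x, 0.0, 1.0)       # fully recognizable by ~50% completion
bad = x ** 2                            # only recognizable near the end
# ERE(good) = 0.75, clearly above ERE(bad) ≈ 0.34
```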
  18. SketchRNN[1]
     • Based on an RNN encoder-decoder architecture
     • Trained with log-likelihood (for reconstruction) and KL divergence (to match the prior)
     • Drawbacks:
       ◦ Cannot handle long sequences
       ◦ Fails to encode the necessary contextual information
     [Diagram: sequential input → RNN encoder → latent space → RNN decoder → sequential output]
     [1] David Ha and Douglas Eck. A Neural Representation of Sketch Drawings. ICLR 2018
  19. Multimodality of optimal strategies
     • We found the traditional SketchRNN model sub-optimal for our application
     • Experiments show that drawing strategies leading to early recognition have multiple modes
     • Log-likelihood + KL-based training is not well suited to multi-modal data
     [Figure: left, normal drawings; right, drawings with a good strategy, showing two distinct optimal strategies at the same sketch completion]
  20. Our Seq2Seq WAE (namely "Pixelor")
     • We use a Transformer encoder to alleviate the RNN's long-term dependency problem and to better capture contextual information
     • We use the Wasserstein distance (instead of log-likelihood) to leverage the presence of multiple modes in optimal drawing strategies
     • We use MMD (Maximum Mean Discrepancy) instead of KL divergence to match the latent distribution to a prior
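The MMD term mentioned above can be illustrated with the standard (biased) RBF-kernel estimator. The bandwidth, sample shapes, and the choice of a Gaussian prior below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between two sample sets under an
    RBF kernel; used (in place of a KL term) to pull the encoder's
    latent codes toward samples from the prior."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
prior = rng.standard_normal((256, 8))       # samples from an N(0, I) prior
close = rng.standard_normal((256, 8))       # latents that already match the prior
far = rng.standard_normal((256, 8)) + 3.0   # shifted latents: large MMD penalty
```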
  21. Quantitative Results (Seq2Seq-WAE)
     • We quantitatively show that our Seq2Seq-WAE handles the multi-modal nature of the data better than the traditional KL-based SketchRNN
     • ERE (Early Recognition Efficacy) measures the early recognizability of a strategy [higher is better]
     • FID (Fréchet Inception Distance) measures the quality of generated samples [lower is better]
     • Results are averaged over all categories; for category-wise numbers, please refer to the supplemental material
  22. The setup
     • In the "Drawing" phase, participants had to draw a given concept with a good strategy
     • In the "Judging" phase, one sketch is shown at a time; it could come from Pixelor, from human participants (our new dataset), or from QuickDraw
     [Screenshots: the drawing interface and the judging interface]
  23. New dataset (SlowSketch)
     We collected the sketches drawn by humans during our study; we call this the "SlowSketch" dataset
     • 12 participants
     • ~1700 new sketches
     • 7 attempts per participant
     [Figure: SlowSketch samples, with QuickDraw shown for comparison]
  24. AI judging: Results
     • SlowSketch sketches are recognized by the AI judge at a 33% completion rate
     • Pixelor sketches are recognized at a 17% completion rate
     • This plot considers only sketches that are recognized correctly at 100% completion
  25. Human judging: Data preparation
     • Human participants were presented with one sketch at a time; they could
       ◦ unveil the sketch incrementally at 5% intervals
       ◦ assign a label at each interval
     • We record each label and its corresponding completion percentage
  26. Human judging: Results
     • Human sketches are recognized at a 33% completion rate; Pixelor sketches at 31% (lower is better)
     [Plot: sketch completion rate at the moment of the correct guess, human vs. Pixelor]
  27. Failure cases
     • Since our solution is based on generative models, it is prone to generation failures
     • Studying the human-study results, we noticed that Pixelor loses the game against humans when the generative model fails to generate plausible sketches
     • In particular, Pixelor often fails to generate repetitive patterns (e.g. guitar strings, cow body patches)
  28. More than just a game
     • This work presents its technical contributions wrapped in a fun game, but more applications are possible:
       ◦ Assisting in child development
       ◦ Improving human drawing ability
       ◦ Studying human cognitive processes
     • We also noticed human participants developing better Pictionary skills over time
  29. Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?
     *Ayan Kumar Bhunia¹, *Ayan Das¹,², *Umar Riaz Muhammad¹, Yongxin Yang¹,², Timothy M. Hospedales³,¹, Tao Xiang¹,², Yulia Gryaditskaya¹,², Yi-Zhe Song¹,² (*Equal Contribution)
     ¹ SketchX, CVSSP, University of Surrey, UK; ² iFlyTek-Surrey Joint Research Centre on Artificial Intelligence, UK; ³ University of Edinburgh
     http://sketchx.ai/pixelor