
ChiroDiff: Modelling chirographic data with Diffusion Models

Ayan Das
April 07, 2023

Generative modelling over continuous-time geometric constructs, a.k.a. chirographic data such as handwriting, sketches, drawings etc., has so far been accomplished through autoregressive distributions. Such strictly-ordered discrete factorization, however, falls short of capturing key properties of chirographic data -- it fails to build a holistic understanding of the temporal concept due to one-way visibility (causality). Consequently, temporal data has been modelled as discrete token sequences of fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model-class, namely Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data that specifically addresses these flaws. Our model, named ChiroDiff, being non-autoregressive, learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rates to a good extent. Moreover, we show that many important downstream utilities (e.g. conditional sampling, creative mixing) can be flexibly implemented using ChiroDiff. We further show that some unique use-cases like stochastic vectorization, de-noising/healing, and abstraction are also possible with this model-class. We perform quantitative and qualitative evaluation of our framework on relevant datasets and find it to be better than or on par with competing approaches.

Transcript

  1. International Conference on Learning Representations (ICLR) 2023.
     ChiroDiff: Modelling chirographic data with Diffusion Models.
     Ayan Das 1,2, Yongxin Yang 1,3, Timothy Hospedales 1,4,5, Tao Xiang 1,2, Yi-Zhe Song 1,2.
     1 SketchX Lab, University of Surrey; 2 iFlyTek-Surrey Joint Research Centre on AI; 3 Queen Mary University of London; 4 University of Edinburgh; 5 Samsung AI Center Cambridge
  2. Raster vs Vector for sparse structures: Graphics/Vision models
     mostly deal with grid-based raster images!! Raster is a generic
     representation (non-optimised for sparse structures); vector is a
     specialized representation (optimised for sparsity).
  3. Chirographic Data: Handwriting, Sketches etc. Generative modelling
     and manipulation. Examples: English digits[1] (simple), Chinese
     characters[2] (complex compositional structure), sketches[3]
     (freehand, noisy).
     [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.
     [2] KanjiVG dataset: https://kanjivg.tagaini.net/
     [3] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
  4. Popular auto-regressive generative models: one segment/point at a
     time (input to output), or control points instead of segments[1].
     They learn "drawing dynamics"[1, 2], $p(s_i \mid s_{i-1}; \theta)$,
     rather than "holistic concepts", $p(s_0, s_1, \cdots; \theta)$.
     [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
     [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.
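To make the contrast concrete, here is a minimal Python sketch (with hypothetical `step_model` and `joint_model` objects, not code from the paper) of the two sampling regimes: the autoregressive loop only ever sees the past, while the joint sampler emits the whole sequence with two-way context.

```python
# Autoregressive: p(s_0, ..., s_N) = prod_i p(s_i | s_{i-1}); each point
# is sampled with one-way (causal) visibility only.
def sample_autoregressive(step_model, s0, n_steps):
    seq = [s0]
    for _ in range(n_steps):
        seq.append(step_model.sample_next(seq[-1]))  # sees only the past
    return seq

# Non-autoregressive (ChiroDiff-style): the joint p(s_0, ..., s_N) is
# sampled in one shot, so every point is generated with full context.
def sample_joint(joint_model, n_points):
    return joint_model.sample(n_points)  # whole sequence at once
```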
  5. Some newer approaches: continuous-time model[1] of chirographic
     data. Learns the holistic concept as a vector field. Drawbacks:
     over-smoothening, and training difficulty of the underlying tools.
     [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.
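For intuition, a minimal sketch of the continuous-time idea (not SketchODE's actual implementation): the drawing is the trajectory of an ODE whose learned vector field `f` carries the holistic concept, here integrated with plain Euler steps.

```python
import torch

# Decode a drawing by integrating ds/dt = f(s, t; theta) from a start
# point s0; `f` is a hypothetical learned vector-field network.
def decode_drawing(f, s0, n_steps=100, t1=1.0):
    dt = t1 / n_steps
    s, path = s0, [s0]
    for i in range(n_steps):
        s = s + dt * f(s, i * dt)   # one Euler step along the field
        path.append(s)
    return torch.stack(path)        # (n_steps + 1, 2) pen positions
```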
  6. "ChiroDiff" is our solution: model the chirographic sequence in a
     non-autoregressive manner, $p(s_0, s_1, \cdots; \theta)$. Diffusion
     Models allow us to realise this. Learns holistic concepts, not
     dynamics. No over-smoothening, still discrete. Much easier to
     train, as with any Diffusion Model. Allows variable length and
     length conditioning.
  7. Our framework: standard noising, non-autoregressive sequence
     de-noiser. The reverse process can modify any part of the sequence
     at any time, unlike auto-regressive models.
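The "standard noising" on this slide is the usual DDPM forward process applied to the whole sequence at once. Below is a minimal PyTorch sketch of that training step, assuming sequences of shape (batch, n_points, 3), e.g. (dx, dy, pen), and a hypothetical `denoiser` network; it mirrors the generic DDPM recipe, not the paper's exact code.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # standard linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative alpha-bar_t

def training_step(denoiser, x0):
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))                    # random timestep per sample
    a = alpha_bar[t].view(b, 1, 1)
    eps = torch.randn_like(x0)                       # Gaussian noise
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps     # q(x_t | x_0), one shot
    return F.mse_loss(denoiser(x_t, t), eps)         # predict the added noise
```

Note that the whole sequence is corrupted and denoised jointly; there is no left-to-right factorization anywhere in the loop.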
  8. Reverse Generative Process: Bi-RNN[1] or Transformer Encoder[2]
     (w/ PE) as the learnable denoiser.
     Image sources: [1] https://d2l.ai/chapter_recurrent-modern/bi-rnn.html [2] https://jalammar.github.io/illustrated-transformer/
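A minimal PyTorch sketch of such a denoiser follows, here the Transformer-Encoder variant with sinusoidal embeddings for both point positions and the diffusion timestep. The architecture and hyper-parameters are illustrative placeholders, not the paper's.

```python
import math
import torch
import torch.nn as nn

class SeqDenoiser(nn.Module):
    def __init__(self, point_dim=3, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.inp = nn.Linear(point_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # no causal mask
        self.out = nn.Linear(d_model, point_dim)
        self.d_model = d_model

    def sinusoidal(self, pos):                     # works for positions or t
        i = torch.arange(self.d_model // 2, device=pos.device)
        freq = torch.exp(-math.log(10000.0) * i / (self.d_model // 2))
        ang = pos.float()[..., None] * freq
        return torch.cat([ang.sin(), ang.cos()], dim=-1)

    def forward(self, x_t, t):
        n = x_t.shape[1]
        pe = self.sinusoidal(torch.arange(n, device=x_t.device))  # positional
        te = self.sinusoidal(t)[:, None, :]                       # timestep
        h = self.inp(x_t) + pe[None] + te
        return self.out(self.encoder(h))           # predicted noise, same shape
```

Because the encoder attends in both directions, every denoising step conditions each point on the entire sequence, which is exactly what the autoregressive models above cannot do.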
  9. Properties of our Model (1): implicit conditioning and healing.
     Different degrees of correlation; healing badly drawn sketches.
  10. Properties of our Model (2): stochastic recreation, semantic
      interpolation. Inferring drawing topology given perceptive input
      (ink-clouds). Interpolation between samples (with the
      deterministic DDIM latent space).
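The interpolation relies on DDIM's deterministic (eta = 0) reverse process: each Gaussian latent x_T maps to exactly one sample, so mixing two latents and decoding traces a path between the two sequences. A minimal sketch, reusing `denoiser` and `alpha_bar` from the training sketch above, with `ts` a decreasing list of timesteps ending at 0:

```python
import torch

@torch.no_grad()
def ddim_sample(denoiser, x_T, ts):
    x = x_T
    for t, t_prev in zip(ts[:-1], ts[1:]):
        a, a_prev = alpha_bar[t], alpha_bar[t_prev]
        eps = denoiser(x, torch.full((x.shape[0],), t, dtype=torch.long))
        x0_hat = (x - (1 - a).sqrt() * eps) / a.sqrt()          # predicted clean seq
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps  # eta = 0: no noise
    return x

# Decoding convex mixtures of two latents yields a semantic interpolation
# between the two corresponding sequences.
def interpolate(denoiser, xT_a, xT_b, ts, n=8):
    ws = torch.linspace(0, 1, n)
    return [ddim_sample(denoiser, (1 - w) * xT_a + w * xT_b, ts) for w in ws]
```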
  11. Properties of our Model (3): tweaking the reverse process
      variance gives a controlled level of abstraction.
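A minimal sketch of the knob being tweaked: one ancestral DDPM reverse step with a `var_scale` factor on the posterior variance (names follow the earlier sketches; the paper's exact variance schedule is not reproduced here). Intuitively, shrinking the variance suppresses fine stochastic detail and yields more abstract samples.

```python
import torch

@torch.no_grad()
def reverse_step(denoiser, x_t, t, var_scale=1.0):
    beta, a_bar = betas[t], alpha_bar[t]
    a_bar_prev = alpha_bar[t - 1] if t > 0 else torch.tensor(1.0)
    eps = denoiser(x_t, torch.full((x_t.shape[0],), t, dtype=torch.long))
    # Posterior mean of p(x_{t-1} | x_t) under the eps-prediction model.
    mean = (x_t - beta / (1 - a_bar).sqrt() * eps) / (1 - beta).sqrt()
    if t == 0:
        return mean
    var = beta * (1 - a_bar_prev) / (1 - a_bar)   # posterior variance beta-tilde_t
    return mean + var_scale * var.sqrt() * torch.randn_like(x_t)
```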
  12. Thank you. Read the paper or visit our website for more
      information: https://ayandas.me/chirodiff/