
ChiroDiff: Modelling chirographic data with Diffusion Models

Ayan Das
April 07, 2023


Generative modelling of continuous-time geometric constructs, a.k.a. chirographic data such as handwriting, sketches and drawings, has so far been accomplished through autoregressive distributions. Such strictly-ordered discrete factorization, however, falls short of capturing key properties of chirographic data: it fails to build a holistic understanding of the temporal concept due to one-way visibility (causality). Consequently, temporal data has been modelled as discrete token sequences at a fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model class, Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data that specifically addresses these flaws. Our model, named ChiroDiff, is non-autoregressive: it learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rates to a large extent. Moreover, we show that many important downstream utilities (e.g. conditional sampling, creative mixing) can be implemented flexibly with ChiroDiff. We further show that some unique use-cases, such as stochastic vectorization, de-noising/healing and abstraction, are also possible with this model class. We evaluate our framework quantitatively and qualitatively on relevant datasets and find it to be better than or on par with competing approaches.


Transcript

  1. International Conference on Learning Representations (ICLR) 2023
    ChiroDiff: Modelling chirographic data with Diffusion Models
    Ayan Das 1,2, Yongxin Yang 1,3, Timothy Hospedales 1,4,5, Tao Xiang 1,2, Yi-Zhe Song 1,2


    1 SketchX Lab, University of Surrey; 2 iFlyTek-Surrey Joint Research Centre on AI;
    3 Queen Mary University of London; 4 University of Edinburgh; 5 Samsung AI Center Cambridge


  2. Raster vs Vector for sparse structures
    Graphics/Vision models mostly deal with grid-based raster images!


  3. Raster vs Vector for sparse structures
    Graphics/Vision models mostly deal with grid-based raster images!
    Generic Representation (Non-optimised for sparse structures)


  4. Raster vs Vector for sparse structures
    Graphics/Vision models mostly deal with grid-based raster images!
    Generic Representation (Non-optimised for sparse structures)
    Specialized Representation (Optimised for sparsity)


  5. Raster vs Vector for sparse structures
    Graphics/Vision models mostly deal with grid-based raster images!
    Generic Representation (Non-optimised for sparse structures)
    Specialized Representation (Optimised for sparsity)


  6. Chirographic Data: Handwriting, Sketches etc.
    Generative Modelling and manipulation
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.
    [2] KanjiVG dataset: https://kanjivg.tagaini.net/
    [3] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.



  7. Chirographic Data: Handwriting, Sketches etc.
    Generative Modelling and manipulation
    English Digits[1] (Simple)
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.
    [2] KanjiVG dataset: https://kanjivg.tagaini.net/
    [3] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.



  8. Chirographic Data: Handwriting, Sketches etc.
    Generative Modelling and manipulation
    English Digits[1] (Simple)
    Chinese Characters[2] (Complex compositional structure)
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.
    [2] KanjiVG dataset: https://kanjivg.tagaini.net/
    [3] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.



  9. Chirographic Data: Handwriting, Sketches etc.
    Generative Modelling and manipulation
    English Digits[1] (Simple)
    Chinese Characters[2] (Complex compositional structure)
    Sketches[3] (Freehand, Noisy)
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.
    [2] KanjiVG dataset: https://kanjivg.tagaini.net/
    [3] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.



  10. Popular auto-regressive generative models
    One segment/point at a time
    [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
    [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.


  11. Popular auto-regressive generative models
    One segment/point at a time
    Input
    Output
    [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
    [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.



  12. Popular auto-regressive generative models
    One segment/point at a time
    Input
    Output
    Control Points instead of Segments[2]
    [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
    [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.



  13. Popular auto-regressive generative models
    One segment/point at a time
    Input
    Output
    Control Points instead of Segments[2]
    p(s_i | s_{i-1}; θ)
    Learning “drawing dynamics”[1, 2]
    [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
    [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.

  14. Popular auto-regressive generative models
    One segment/point at a time
    Input
    Output
    Control Points instead of Segments[2]
    p(s_i | s_{i-1}; θ)
    Learning “drawing dynamics”[1, 2]
    p(s_0, s_1, ⋯; θ)
    Learning “holistic concepts”
    [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
    [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.
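    The two factorizations imply very different sampling loops. Below is a minimal illustrative sketch (next_point_model and denoiser are hypothetical stand-ins, not the paper's actual modules): an autoregressive sampler can never revise a point once emitted, whereas a joint sampler refines the whole sequence at every step.

```python
import torch

# Hypothetical stand-ins for trained networks (illustrative only).
def next_point_model(prev):          # models p(s_i | s_{i-1}; θ)
    return prev + 0.1 * torch.randn_like(prev)

def denoiser(seq, t):                # would predict a cleaner sequence
    return 0.9 * seq                 # placeholder refinement

# Autoregressive: emit one point at a time; already-emitted points are
# frozen, so the model only ever sees the past (one-way visibility).
def sample_autoregressive(n_points, dim=2):
    points = [torch.zeros(dim)]
    for _ in range(n_points - 1):
        points.append(next_point_model(points[-1]))
    return torch.stack(points)

# Non-autoregressive (diffusion-style): start from noise over the WHOLE
# sequence and refine every point jointly, modelling p(s_0, s_1, ⋯; θ).
def sample_joint(n_points, n_steps=10, dim=2):
    seq = torch.randn(n_points, dim)
    for t in reversed(range(n_steps)):
        seq = denoiser(seq, t)       # all positions updated at once
    return seq
```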


  15. Some newer approaches
    Continuous-time Model[1] of chirographic data
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.


  16. Some newer approaches
    Continuous-time Model[1] of chirographic data
    Learns holistic concept as Vector Field
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.


  17. Some newer approaches
    Continuous-time Model[1] of chirographic data
    Learns holistic concept as Vector Field
    Over-smoothening
    Training difficulty of underlying tools
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.

  18. “ChiroDiff” is our solution
    Model chirographic sequences in a non-autoregressive manner


  19. “ChiroDiff” is our solution
    Model chirographic sequences in a non-autoregressive manner
    p(s_0, s_1, ⋯; θ)
    Learns holistic concepts, not dynamics


  20. “ChiroDiff” is our solution
    Model chirographic sequences in a non-autoregressive manner
    p(s_0, s_1, ⋯; θ)
    Diffusion Models allow us to realise this
    Learns holistic concepts, not dynamics


  21. “ChiroDiff” is our solution
    Model chirographic sequences in a non-autoregressive manner
    p(s_0, s_1, ⋯; θ)
    Diffusion Models allow us to realise this
    Learns holistic concepts, not dynamics
    No over-smoothening, still discrete
    Much easier to train, as with any Diffusion Model
    Allows variable length and length conditioning


  22. Our framework
    Standard noising, non-autoregressive sequence De-noiser


  23. Our framework
    Standard noising, non-autoregressive sequence De-noiser
    The reverse process can modify any part of the sequence at any time, unlike auto-regressive models
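    Concretely, the "standard noising" is the usual DDPM forward process applied to a whole point sequence at once. A minimal training sketch follows (schedule values are common DDPM defaults, not necessarily the paper's exact settings; denoiser is any sequence-to-sequence network):

```python
import torch
import torch.nn.functional as F

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # a common linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bar[t].view(-1, 1, 1)       # broadcast over (B, N, D)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

def training_loss(denoiser, x0):
    """Epsilon-prediction loss on a batch of point sequences x0 of shape
    (B, N, D), e.g. D = 3 for (dx, dy, pen-state)."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return F.mse_loss(denoiser(x_t, t), noise)
```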


  24. Reverse Generative Process
    Bi-RNN or Transformer Encoder (w/ PE) as learnable Denoiser
    Image sources:
    [1] https://d2l.ai/chapter_recurrent-modern/bi-rnn.html
    [2] https://jalammar.github.io/illustrated-transformer/


  25. Reverse Generative Process
    Bi-RNN or Transformer Encoder (w/ PE) as learnable Denoiser
    Image sources:
    [1] https://d2l.ai/chapter_recurrent-modern/bi-rnn.html
    [2] https://jalammar.github.io/illustrated-transformer/


  26. Reverse Generative Process
    Bi-RNN or Transformer Encoder (w/ PE) as learnable Denoiser
    Image sources:
    [1] https://d2l.ai/chapter_recurrent-modern/bi-rnn.html
    [2] https://jalammar.github.io/illustrated-transformer/
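    A sketch of such a denoiser in PyTorch (layer sizes and the learned positional embedding are illustrative assumptions, not the paper's exact configuration). Either body gives every output position visibility over the entire input sequence, which is what the autoregressive models above lack:

```python
import torch
import torch.nn as nn

class SeqDenoiser(nn.Module):
    """Bi-RNN or Transformer-encoder denoiser over point sequences."""
    def __init__(self, dim=3, hidden=128, max_len=512,
                 n_steps=1000, use_transformer=False):
        super().__init__()
        self.inp = nn.Linear(dim, hidden)
        self.t_embed = nn.Embedding(n_steps, hidden)  # diffusion-step embedding
        self.pos = nn.Embedding(max_len, hidden)      # PE for the Transformer
        if use_transformer:
            layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
            self.body = nn.TransformerEncoder(layer, num_layers=4)
        else:
            self.body = nn.GRU(hidden, hidden // 2, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.out = nn.Linear(hidden, dim)

    def forward(self, x_t, t):
        B, N, _ = x_t.shape
        h = self.inp(x_t) + self.t_embed(t).unsqueeze(1)  # inject step t
        if isinstance(self.body, nn.TransformerEncoder):
            h = h + self.pos(torch.arange(N, device=x_t.device))
            h = self.body(h)
        else:                     # Bi-RNN already sees both directions
            h, _ = self.body(h)
        return self.out(h)        # predicted noise, same shape as x_t
```

    With denoiser = SeqDenoiser(), the training_loss sketch above applies unchanged.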


  27. Unconditional Generation
    High-quality generations


  28. Unconditional Generation
    High-quality generations


  29. Properties of our Model (1)
    Implicit conditioning and healing


  30. Properties of our Model (1)
    Implicit conditioning and healing
    Different degrees of correlation


  31. Properties of our Model (1)
    Implicit conditioning and healing
    Different degrees of correlation
    Healing badly drawn sketches
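    The slides do not spell out the healing mechanism; one standard diffusion technique matching the description is partial noising followed by denoising (SDEdit-style), sketched here under that assumption. reverse_step stands for any learned reverse update, such as the one sketched under slide 36 below:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def heal(denoiser, reverse_step, bad_sketch, t_start=300):
    """Diffuse a badly drawn sketch part-way into the chain, then run the
    learned reverse process back to t = 0. A larger t_start destroys more
    of the input, trading faithfulness for more aggressive healing."""
    ab = alpha_bar[t_start]
    x = ab.sqrt() * bad_sketch + (1.0 - ab).sqrt() * torch.randn_like(bad_sketch)
    for t in reversed(range(t_start)):
        x = reverse_step(denoiser, x, t)   # one learned denoising step
    return x
```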


  32. Properties of our Model (2)
    Stochastic recreation, Semantic interpolation


  33. Properties of our Model (2)
    Stochastic recreation, Semantic interpolation
    Inferring drawing topology given perceptive input (ink-clouds)


  34. Properties of our Model (2)
    Stochastic recreation, Semantic interpolation
    Inferring drawing topology given perceptive input (ink-clouds)
    Interpolation between samples (with deterministic DDIM latent space)
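    A deterministic DDIM sampler (eta = 0) maps each noise latent x_T to exactly one sketch, so interpolating latents interpolates samples. A sketch, reusing the noise schedule assumed earlier (schedule and step list are illustrative choices):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def ddim_sample(denoiser, x_T, step_list):
    """Deterministic (eta = 0) DDIM sampling along a decreasing step list,
    e.g. step_list = list(range(T - 1, -1, -50)) + [0]."""
    x = x_T
    for t, t_prev in zip(step_list[:-1], step_list[1:]):
        eps = denoiser(x, torch.full((x.shape[0],), t, dtype=torch.long))
        ab_t, ab_prev = alpha_bar[t], alpha_bar[t_prev]
        x0_hat = (x - (1.0 - ab_t).sqrt() * eps) / ab_t.sqrt()
        x = ab_prev.sqrt() * x0_hat + (1.0 - ab_prev).sqrt() * eps
    return x

def slerp(z1, z2, lam):
    """Spherical interpolation between two Gaussian latents z1, z2."""
    omega = torch.acos(torch.clamp(
        (z1 * z2).sum() / (z1.norm() * z2.norm()), -1.0, 1.0))
    return (torch.sin((1 - lam) * omega) * z1
            + torch.sin(lam * omega) * z2) / torch.sin(omega)

# Usage: blend two noise latents, then decode the blend deterministically:
#   mid = ddim_sample(denoiser, slerp(z1, z2, 0.5), step_list)
```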


  35. Properties of our Model (3)
    Tweaking the reverse process variance


  36. Properties of our Model (3)
    Tweaking the reverse process variance
    Controlled level of abstraction
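    One ancestral DDPM step with a tweakable reverse variance. Scaling sigma_t (var_scale is my illustrative knob; the paper's exact scaling may differ) controls how much stochasticity, and hence abstraction, enters each step:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def reverse_step(denoiser, x_t, t, var_scale=1.0):
    """One reverse (denoising) step with a scaled variance. var_scale = 1
    is standard DDPM sampling; other values trade fidelity for abstraction."""
    eps = denoiser(x_t, torch.full((x_t.shape[0],), t, dtype=torch.long))
    mean = (x_t - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
    sigma = betas[t].sqrt() * var_scale        # tweaked reverse-process sigma_t
    z = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + sigma * z
```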


  37. Thank you.
    Read the paper or visit our website for more information:
    https://ayandas.me/chirodiff/
