
ChiroDiff: Modelling chirographic data with Diffusion Models

Ayan Das
April 07, 2023


Generative modelling of continuous-time geometric constructs, a.k.a. chirographic data such as handwriting, sketches and drawings, has so far been accomplished through autoregressive distributions. Such strictly-ordered discrete factorization, however, falls short of capturing key properties of chirographic data: it fails to build a holistic understanding of the temporal concept due to one-way visibility (causality). Consequently, temporal data has been modelled as discrete token sequences at a fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model class, Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data that specifically addresses these flaws. Our model, named ChiroDiff, is non-autoregressive: it learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rates to a large extent. Moreover, we show that many important downstream utilities (e.g. conditional sampling, creative mixing) can be implemented flexibly with ChiroDiff. We further show that some unique use-cases, such as stochastic vectorization, de-noising/healing and abstraction, are also possible with this model class. We evaluate our framework quantitatively and qualitatively on relevant datasets and find it to be better than or on par with competing approaches.


Transcript

  1. International Conference on Learning Representations (ICLR) 2023
    ChiroDiff: Modelling chirographic data with Diffusion Models
    Ayan Das 1,2, Yongxin Yang 1,3, Timothy Hospedales 1,4,5, Tao Xiang 1,2, Yi-Zhe Song 1,2


    1 SketchX Lab, University of Surrey; 2 iFlyTek-Surrey Joint Research Centre on AI;
    3 Queen Mary University of London; 4 University of Edinburgh; 5 Samsung AI Center Cambridge


  2. Raster vs Vector for sparse structures
    Graphics/Vision models mostly deal with grid-based raster images!


  3. Raster vs Vector for sparse structures
    Graphics/Vision models mostly deal with grid-based raster images!
    Generic Representation (Non-optimised for sparse structures)


  4. Raster vs Vector for sparse structures
    Graphics/Vision models mostly deal with grid-based raster images!
    Generic Representation (Non-optimised for sparse structures)
    Specialized Representation (Optimised for sparsity)


  5. Raster vs Vector for sparse structures
    Graphics/Vision models mostly deal with grid-based raster images!
    Generic Representation (Non-optimised for sparse structures)
    Specialized Representation (Optimised for sparsity)


  6. Chirographic Data: Handwriting, Sketches etc.
    Generative Modelling and manipulation
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.
    [2] KanjiVG dataset: https://kanjivg.tagaini.net/
    [3] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.



  7. Chirographic Data: Handwriting, Sketches etc.
    Generative Modelling and manipulation
    English Digits[1] (Simple)
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.
    [2] KanjiVG dataset: https://kanjivg.tagaini.net/
    [3] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.



  8. Chirographic Data: Handwriting, Sketches etc.
    Generative Modelling and manipulation
    English Digits[1] (Simple)
    Chinese Characters[2] (Complex compositional structure)
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.
    [2] KanjiVG dataset: https://kanjivg.tagaini.net/
    [3] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.



  9. Chirographic Data: Handwriting, Sketches etc.
    Generative Modelling and manipulation
    English Digits[1] (Simple)
    Chinese Characters[2] (Complex compositional structure)
    Sketches[3] (Freehand, Noisy)
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.
    [2] KanjiVG dataset: https://kanjivg.tagaini.net/
    [3] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.



  10. Popular auto-regressive generative models
    One segment/point at a time
    [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
    [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.


  11. Popular auto-regressive generative models
    One segment/point at a time
    Input
    Output
    [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
    [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.



  12. Popular auto-regressive generative models
    One segment/point at a time
    Input
    Output
    Control Points instead of Segments[2]
    [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
    [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.



  13. Popular auto-regressive generative models
    One segment/point at a time
    Input
    Output
    Control Points instead of Segments[2]
    p(s_i | s_{i-1}; θ)
    Learning “drawing dynamics”[1, 2]
    [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
    [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.

  14. Popular auto-regressive generative models
    One segment/point at a time
    Input
    Output
    Control Points instead of Segments[2]
    p(s_i | s_{i-1}; θ)
    Learning “drawing dynamics”[1, 2]
    p(s_0, s_1, ⋯; θ)
    Learning “holistic concepts”
    [1] D. Ha and D. Eck. A neural representation of sketch drawings. In ICLR, 2018.
    [2] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. BezierSketch: A generative model for scalable vector sketches. In ECCV, 2020.
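    The two factorizations imply very different sampling loops. Below is a minimal illustrative sketch (next_point_model and denoiser are hypothetical stand-ins, not the paper's actual modules): an autoregressive sampler can never revise a point once emitted, whereas a joint sampler refines the whole sequence at every step.

```python
import torch

# Hypothetical stand-ins for trained networks (illustrative only).
def next_point_model(prev):          # models p(s_i | s_{i-1}; θ)
    return prev + 0.1 * torch.randn_like(prev)

def denoiser(seq, t):                # would predict a cleaner sequence
    return 0.9 * seq                 # placeholder refinement

# Autoregressive: emit one point at a time; already-emitted points are
# frozen, so the model only ever sees the past (one-way visibility).
def sample_autoregressive(n_points, dim=2):
    points = [torch.zeros(dim)]
    for _ in range(n_points - 1):
        points.append(next_point_model(points[-1]))
    return torch.stack(points)

# Non-autoregressive (diffusion-style): start from noise over the WHOLE
# sequence and refine every point jointly, modelling p(s_0, s_1, ⋯; θ).
def sample_joint(n_points, n_steps=10, dim=2):
    seq = torch.randn(n_points, dim)
    for t in reversed(range(n_steps)):
        seq = denoiser(seq, t)       # all positions updated at once
    return seq
```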


  15. Some newer approaches
    Continuous-time Model[1] of chirographic data
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.


  16. Some newer approaches
    Continuous-time Model[1] of chirographic data
    Learns holistic concept as Vector Field
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.


  17. Some newer approaches
    Continuous-time Model[1] of chirographic data
    Learns holistic concept as Vector Field
    Over-smoothening
    Training difficulty of underlying tools
    [1] A. Das, Y. Yang, T. M. Hospedales, T. Xiang, and Y. Z. Song. SketchODE: Learning neural sketch representation in continuous time. In ICLR, 2022.

  18. “ChiroDiff” is our solution
    Model chirographic sequences in a non-autoregressive manner


  19. “ChiroDiff” is our solution
    Model chirographic sequences in a non-autoregressive manner
    p(s_0, s_1, ⋯; θ)
    Learns holistic concepts, not dynamics


  20. “ChiroDiff” is our solution
    Model chirographic sequences in a non-autoregressive manner
    p(s_0, s_1, ⋯; θ)
    Diffusion Models allow us to realise this
    Learns holistic concepts, not dynamics


  21. “ChiroDiff” is our solution
    Model chirographic sequences in a non-autoregressive manner
    p(s_0, s_1, ⋯; θ)
    Diffusion Models allow us to realise this
    Learns holistic concepts, not dynamics
    No over-smoothening, still discrete
    Much easier to train, as with any Diffusion Model
    Allows variable length and length conditioning


  22. Our framework
    Standard noising, non-autoregressive sequence De-noiser


  23. Our framework
    Standard noising, non-autoregressive sequence De-noiser
    The reverse process can modify any part of the sequence at any time, unlike auto-regressive models
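    Concretely, the "standard noising" is the usual DDPM forward process applied to a whole point sequence at once. A minimal training sketch follows (schedule values are common DDPM defaults, not necessarily the paper's exact settings; denoiser is any sequence-to-sequence network):

```python
import torch
import torch.nn.functional as F

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # a common linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bar[t].view(-1, 1, 1)       # broadcast over (B, N, D)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

def training_loss(denoiser, x0):
    """Epsilon-prediction loss on a batch of point sequences x0 of shape
    (B, N, D), e.g. D = 3 for (dx, dy, pen-state)."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return F.mse_loss(denoiser(x_t, t), noise)
```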


  24. Reverse Generative Process
    Bi-RNN or Transformer Encoder (w/ PE) as learnable Denoiser
    Image sources:
    [1] https://d2l.ai/chapter_recurrent-modern/bi-rnn.html
    [2] https://jalammar.github.io/illustrated-transformer/


  25. Reverse Generative Process
    Bi-RNN or Transformer Encoder (w/ PE) as learnable Denoiser
    Image sources:
    [1] https://d2l.ai/chapter_recurrent-modern/bi-rnn.html
    [2] https://jalammar.github.io/illustrated-transformer/


  26. Reverse Generative Process
    Bi-RNN or Transformer Encoder (w/ PE) as learnable Denoiser
    Image sources:
    [1] https://d2l.ai/chapter_recurrent-modern/bi-rnn.html
    [2] https://jalammar.github.io/illustrated-transformer/
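    A sketch of such a denoiser in PyTorch (layer sizes and the learned positional embedding are illustrative assumptions, not the paper's exact configuration). Either body gives every output position visibility over the entire input sequence, which is what the autoregressive models above lack:

```python
import torch
import torch.nn as nn

class SeqDenoiser(nn.Module):
    """Bi-RNN or Transformer-encoder denoiser over point sequences."""
    def __init__(self, dim=3, hidden=128, max_len=512,
                 n_steps=1000, use_transformer=False):
        super().__init__()
        self.inp = nn.Linear(dim, hidden)
        self.t_embed = nn.Embedding(n_steps, hidden)  # diffusion-step embedding
        self.pos = nn.Embedding(max_len, hidden)      # PE for the Transformer
        if use_transformer:
            layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
            self.body = nn.TransformerEncoder(layer, num_layers=4)
        else:
            self.body = nn.GRU(hidden, hidden // 2, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.out = nn.Linear(hidden, dim)

    def forward(self, x_t, t):
        B, N, _ = x_t.shape
        h = self.inp(x_t) + self.t_embed(t).unsqueeze(1)  # inject step t
        if isinstance(self.body, nn.TransformerEncoder):
            h = h + self.pos(torch.arange(N, device=x_t.device))
            h = self.body(h)
        else:                     # Bi-RNN already sees both directions
            h, _ = self.body(h)
        return self.out(h)        # predicted noise, same shape as x_t
```

    With denoiser = SeqDenoiser(), the training_loss sketch above applies unchanged.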


  27. Unconditional Generation
    High-quality generations


  28. Unconditional Generation
    High-quality generations


  29. Properties of our Model (1)
    Implicit conditioning and healing


  30. Properties of our Model (1)
    Implicit conditioning and healing
    Different degrees of correlation


  31. Properties of our Model (1)
    Implicit conditioning and healing
    Different degrees of correlation
    Healing badly drawn sketches
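    The slides do not spell out the healing mechanism; one standard diffusion technique matching the description is partial noising followed by denoising (SDEdit-style), sketched here under that assumption. reverse_step stands for any learned reverse update, such as the one sketched under slide 36 below:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def heal(denoiser, reverse_step, bad_sketch, t_start=300):
    """Diffuse a badly drawn sketch part-way into the chain, then run the
    learned reverse process back to t = 0. A larger t_start destroys more
    of the input, trading faithfulness for more aggressive healing."""
    ab = alpha_bar[t_start]
    x = ab.sqrt() * bad_sketch + (1.0 - ab).sqrt() * torch.randn_like(bad_sketch)
    for t in reversed(range(t_start)):
        x = reverse_step(denoiser, x, t)   # one learned denoising step
    return x
```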


  32. Properties of our Model (2)
    Stochastic recreation, Semantic interpolation


  33. Properties of our Model (2)
    Stochastic recreation, Semantic interpolation
    Inferring drawing topology given perceptive input (ink-clouds)


  34. Properties of our Model (2)
    Stochastic recreation, Semantic interpolation
    Inferring drawing topology given perceptive input (ink-clouds)
    Interpolation between samples (with deterministic DDIM latent space)
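    A deterministic DDIM sampler (eta = 0) maps each noise latent x_T to exactly one sketch, so interpolating latents interpolates samples. A sketch, reusing the noise schedule assumed earlier (schedule and step list are illustrative choices):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def ddim_sample(denoiser, x_T, step_list):
    """Deterministic (eta = 0) DDIM sampling along a decreasing step list,
    e.g. step_list = list(range(T - 1, -1, -50)) + [0]."""
    x = x_T
    for t, t_prev in zip(step_list[:-1], step_list[1:]):
        eps = denoiser(x, torch.full((x.shape[0],), t, dtype=torch.long))
        ab_t, ab_prev = alpha_bar[t], alpha_bar[t_prev]
        x0_hat = (x - (1.0 - ab_t).sqrt() * eps) / ab_t.sqrt()
        x = ab_prev.sqrt() * x0_hat + (1.0 - ab_prev).sqrt() * eps
    return x

def slerp(z1, z2, lam):
    """Spherical interpolation between two Gaussian latents z1, z2."""
    omega = torch.acos(torch.clamp(
        (z1 * z2).sum() / (z1.norm() * z2.norm()), -1.0, 1.0))
    return (torch.sin((1 - lam) * omega) * z1
            + torch.sin(lam * omega) * z2) / torch.sin(omega)

# Usage: blend two noise latents, then decode the blend deterministically:
#   mid = ddim_sample(denoiser, slerp(z1, z2, 0.5), step_list)
```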


  35. Properties of our Model (3)
    Tweaking the reverse process variance


  36. Properties of our Model (3)
    Tweaking the reverse process variance
    Controlled level of abstraction
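    One ancestral DDPM step with a tweakable reverse variance. Scaling sigma_t (var_scale is my illustrative knob; the paper's exact scaling may differ) controls how much stochasticity, and hence abstraction, enters each step:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def reverse_step(denoiser, x_t, t, var_scale=1.0):
    """One reverse (denoising) step with a scaled variance. var_scale = 1
    is standard DDPM sampling; other values trade fidelity for abstraction."""
    eps = denoiser(x_t, torch.full((x_t.shape[0],), t, dtype=torch.long))
    mean = (x_t - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
    sigma = betas[t].sqrt() * var_scale        # tweaked reverse-process sigma_t
    z = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + sigma * z
```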


  37. Thank you.
    Read the paper or visit our website for more information:
    https://ayandas.me/chirodiff/
