SketchODE: Learning neural sketch representation in continuous time

Ayan Das
March 24, 2022

Learning meaningful representations for chirographic drawing data such as sketches, handwriting, and flowcharts is a gateway for understanding and emulating human creative expression. Despite their inherently continuous-time nature, existing works have treated such data as discrete-time sequences, disregarding their true nature. In this work, we model such data as continuous-time functions and learn compact representations by virtue of Neural Ordinary Differential Equations. To this end, we introduce the first continuous-time Seq2Seq model and demonstrate some remarkable properties that set it apart from traditional discrete-time analogues. We also provide solutions for some practical challenges for such models, including introducing a family of parameterized ODE dynamics & continuous-time data augmentation particularly suitable for the task. Our models are validated on several datasets including VectorMNIST, DiDi and Quick, Draw!.

Transcript

  1. SketchODE: Learning neural sketch representation in continuous time
     Ayan Das [1,2], Yongxin Yang [1,3], Timothy Hospedales [1,3], Tao Xiang [1,2], Yi-Zhe Song [1,2]
     [1] SketchX, CVSSP, University of Surrey, UK  [2] iFlyTek-Surrey Joint Research Centre on AI  [3] University of Edinburgh, UK
     Accepted as a poster @ ICLR '22
  2–5. Chirographic Data: Handwriting, Sketches etc.
     • Usually represented as a sequence of discrete points (see the code sketch below)
     • This disregards their true nature, which is continuous
     • Example datasets: QuickDraw and VectorMNIST*
     * VectorMNIST (newly introduced) is a vectorized version of MNIST; see https://ayandas.me/sketchode
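For concreteness, here is a minimal sketch (not from the paper; the function name and toy data are illustrative assumptions) of the discrete-point representation the slide refers to, in the spirit of the stroke-3 format (dx, dy, pen-lift) used by QuickDraw-style datasets:

import numpy as np

def to_stroke3(strokes):
    """Convert a list of strokes (each an (N, 2) array of absolute x/y points)
    into one (M, 3) array of (dx, dy, pen_lift) rows."""
    rows, prev = [], np.zeros(2)
    for stroke in strokes:
        for i, p in enumerate(stroke):
            lift = 1.0 if i == len(stroke) - 1 else 0.0  # pen lifts after the stroke's last point
            rows.append([p[0] - prev[0], p[1] - prev[1], lift])
            prev = p
    return np.asarray(rows, dtype=np.float32)

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]], float)  # a toy stroke
dot = np.array([[0.5, 0.5], [0.6, 0.5]], float)                     # another toy stroke
print(to_stroke3([square, dot]).shape)  # (7, 3): a discrete sequence, not a continuous function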
  6–8. Representing continuous-time strokes
     • Previous approaches: Bezier curves [2], differential geometry [1], etc.
     • Both pair a continuous per-stroke representation with autoregressive generation over strokes:
       Bezier-curve-based stroke + autoregressive generation [2]; differential-geometry-based stroke + autoregressive generation [1]
     • (Slide images taken from [1] and [2]; a Bezier evaluation sketch follows below)
     [1] Emre Aksan, Thomas Deselaers, Andrea Tagliasacchi, and Otmar Hilliges. CoSE: Compositional stroke embeddings. NeurIPS, 2020.
     [2] Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang, and Yi-Zhe Song. BezierSketch: A generative model for scalable vector sketches. ECCV, 2020.
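To make the "Bezier curve based stroke" idea concrete, here is a generic cubic Bezier evaluation (a hedged illustration only, not BezierSketch's actual model): a stroke becomes a continuous curve B(t) determined by a handful of control points rather than many discrete samples.

import numpy as np

def cubic_bezier(P, t):
    """Evaluate a cubic Bezier curve with control points P of shape (4, 2) at times t of shape (T,)."""
    t = t[:, None]
    return ((1 - t) ** 3 * P[0] + 3 * (1 - t) ** 2 * t * P[1]
            + 3 * (1 - t) * t ** 2 * P[2] + t ** 3 * P[3])

P = np.array([[0.0, 0.0], [0.2, 1.0], [0.8, 1.0], [1.0, 0.0]])  # four control points
stroke = cubic_bezier(P, np.linspace(0, 1, 100))                # a smooth stroke at any resolution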
  9–14. Entire chirographic structure as one function
     • Treat a chirographic structure (including pen-up events) as one continuous-time function s(t)
     • Model its derivative with a Neural ODE [1]
     • Model: ds(t)/dt = f_θ(s(t), t)
     • Solution/Inference: s(t) = s(0) + ∫₀ᵗ f_θ(s(τ), τ) dτ, evaluated with a numerical ODE solver (a minimal code sketch follows below)
     [1] Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural Ordinary Differential Equations. NeurIPS, 2018.
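A minimal code sketch of this idea, assuming the torchdiffeq library and a toy 3-dimensional state (x, y, pen bit); the network sizes and state layout are illustrative assumptions, not the paper's exact architecture:

import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class Dynamics(nn.Module):
    """f_theta(s, t): the learned time derivative ds/dt."""
    def __init__(self, state_dim=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + 1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, state_dim))

    def forward(self, t, s):
        t = t.expand(s.shape[:-1] + (1,))  # broadcast the scalar time onto the batch
        return self.net(torch.cat([s, t], dim=-1))

f = Dynamics()
s0 = torch.zeros(8, 3)              # a batch of initial states s(0)
t = torch.linspace(0.0, 1.0, 100)   # query times in [0, 1]
# Solution / inference: s(t) = s(0) + integral of f_theta(s(tau), tau) from 0 to t
traj = odeint(f, s0, t)             # (100, 8, 3): the trajectory s(t) at every query time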
  15–18. Learn latent representations for continuous-time functions
     • Use a Neural CDE [2] to encode the data and a Neural ODE [1] to decode it: an autoencoder setup (a rough code sketch follows below)
     • Variants shown on the slides: Augmented ODE and second-order dynamics
     [1] Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural Ordinary Differential Equations. NeurIPS, 2018.
     [2] Patrick Kidger, James Morrill, James Foster, and Terry J. Lyons. Neural controlled differential equations for irregular time series. NeurIPS, 2020.
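A rough, self-contained sketch of this autoencoder, again assuming torchdiffeq: a Neural CDE encoder reads the observed path X(t) (here via a simple piecewise-linear interpolation and a fixed-step solver) and its final hidden state becomes the latent code; a Neural ODE decoder conditioned on that code rolls out the reconstruction. All dimensions, names and solver settings are illustrative assumptions; the paper's actual architecture (including its Augmented/second-order variants) differs in detail.

import torch
import torch.nn as nn
from torchdiffeq import odeint

class CDEFunc(nn.Module):
    """g_phi(z): maps the hidden state z to a matrix that multiplies dX/dt (the CDE vector field)."""
    def __init__(self, hidden, in_dim):
        super().__init__()
        self.in_dim = in_dim
        self.net = nn.Sequential(nn.Linear(hidden, 64), nn.Tanh(),
                                 nn.Linear(64, hidden * in_dim))

    def forward(self, z):
        return self.net(z).view(*z.shape[:-1], -1, self.in_dim)  # (..., hidden, in_dim)

class Encoder(nn.Module):
    """Neural CDE encoder: dz/dt = g_phi(z) dX/dt, with X a linear interpolation of the samples."""
    def __init__(self, in_dim=3, hidden=32):
        super().__init__()
        self.func = CDEFunc(hidden, in_dim)
        self.hidden = hidden

    def forward(self, x, t):  # x: (T, B, in_dim) samples, t: (T,) increasing times in [0, 1]
        def field(s, z):  # s is the current time, z the hidden state
            i = torch.clamp((s * (len(t) - 1)).long(), max=len(t) - 2)
            dXdt = (x[i + 1] - x[i]) / (t[i + 1] - t[i])  # piecewise-constant dX/dt
            return torch.einsum('bhd,bd->bh', self.func(z), dXdt)
        z0 = torch.zeros(x.shape[1], self.hidden)
        z = odeint(field, z0, t, method='rk4', options={'step_size': 0.01})
        return z[-1]  # latent code = final CDE state

class Decoder(nn.Module):
    """Neural ODE decoder: the latent code conditions the dynamics of the reconstructed s(t)."""
    def __init__(self, out_dim=3, latent=32):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(out_dim + latent, 64), nn.Tanh(),
                               nn.Linear(64, out_dim))
        self.read0 = nn.Linear(latent, out_dim)  # latent -> initial state s(0)

    def forward(self, z, t):
        s0 = self.read0(z)
        return odeint(lambda time, s: self.f(torch.cat([s, z], dim=-1)), s0, t)

enc, dec = Encoder(), Decoder()
x = torch.randn(50, 4, 3)            # (T=50, batch=4, dims: x, y, pen state)
t = torch.linspace(0.0, 1.0, 50)
z = enc(x, t)                        # latent codes, shape (4, 32)
recon = dec(z, t)                    # reconstructed s(t), shape (50, 4, 3)
loss = ((recon - x) ** 2).mean()     # trained end-to-end as an autoencoder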
  19–23. Full-sequence & Multi-stroke format
     • The exact format of s(t) is important
     • Either represent pen-ups as straight lines together with a pen-state bit (full-sequence format; sketched in code below)
     • Or represent the drawing as a sequence of individual strokes (multi-stroke format), where the final state of the previous stroke is mapped to the initial state of the next stroke at the pen-up "events" (check the paper for more details)
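An illustrative sketch of the first option, the "full-sequence" format (the function name, the number of gap points and the pen-bit convention are assumptions, not the paper's exact recipe): all strokes are concatenated into one path where pen-up gaps become straight connecting segments, and a pen-state bit marks which parts are actually drawn.

import numpy as np

def full_sequence(strokes, points_per_gap=5):
    """strokes: list of (N_i, 2) arrays. Returns (M, 3) rows of (x, y, pen_down)."""
    rows = []
    for k, stroke in enumerate(strokes):
        rows += [[x, y, 1.0] for x, y in stroke]           # pen on paper
        if k + 1 < len(strokes):                           # straight "travel" segment to the next stroke
            a, b = stroke[-1], strokes[k + 1][0]
            for u in np.linspace(0, 1, points_per_gap + 2)[1:-1]:
                rows.append([*((1 - u) * a + u * b), 0.0])  # pen lifted
    return np.asarray(rows, dtype=np.float32)

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]], float)
dot = np.array([[0.5, 0.5], [0.6, 0.5]], float)
seq = full_sequence([square, dot])  # one (x, y, pen) sequence; interpolate it over [0, 1] to obtain s(t)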
  24. Training/Implementation tricks
     • Sin/Cos activations for the dynamics functions, which help capture high-frequency temporal changes in the trajectory
     • Continuous noise augmentation, which is more intuitive in the continuous-time case
     • (Both tricks are sketched in code below)
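Two hedged sketches of these tricks (the exact formulations in the paper may differ): a dynamics MLP whose hidden units use sin/cos activations with a learnable frequency scale, and a continuous-time augmentation that perturbs a trajectory with a smooth random function of t rather than per-point jitter.

import torch
import torch.nn as nn

class SinCosMLP(nn.Module):
    """MLP whose hidden units are split into sin/cos halves; w is a learnable frequency scale."""
    def __init__(self, dim_in, dim_out, hidden=64):
        super().__init__()
        self.l1 = nn.Linear(dim_in, hidden)
        self.l2 = nn.Linear(hidden, dim_out)
        self.w = nn.Parameter(torch.ones(1))  # global frequency scale of the periodic activations

    def forward(self, x):
        h = self.w * self.l1(x)
        half = h.shape[-1] // 2
        h = torch.cat([torch.sin(h[..., :half]), torch.cos(h[..., half:])], dim=-1)
        return self.l2(h)

def smooth_noise(t, scale=0.02, n_waves=4):
    """A smooth random perturbation e(t): a sum of a few random low-frequency sinusoids."""
    two_pi = 6.2832
    freq = torch.rand(n_waves, 1) * 3.0          # low frequencies only
    phase = torch.rand(n_waves, 1) * two_pi
    amp = torch.randn(n_waves, 2) * scale        # one 2-D amplitude per wave, for (x, y)
    waves = torch.sin(two_pi * freq * t[None, :] + phase)   # (n_waves, T)
    return (waves[..., None] * amp[:, None, :]).sum(0)      # (T, 2): add to the (x, y) of s(t)

t = torch.linspace(0, 1, 100)
augmentation = smooth_noise(t)  # a smooth offset over t, so the augmented curve stays continuous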
  25–26. Reconstruction & Generation
     • Faithful reconstruction, just like a deterministic RNN-RNN autoencoder
     • One-shot generation by injecting noise into the latent space (sketched in code below)
     • RNN-RNN models break continuity due to autoregression
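A hedged sketch of one-shot generation, assuming the enc and dec modules (and x, t) from the autoencoder sketch under slides 15–18; the noise scale is an arbitrary choice. A latent code is perturbed with Gaussian noise and decoded in a single ODE solve, so there is no autoregression to break continuity.

import torch

with torch.no_grad():
    z = enc(x, t)                           # latent codes of some reference sketches (from the earlier sketch)
    z_new = z + 0.1 * torch.randn_like(z)   # inject noise into the latent space
    sample = dec(z_new, t)                  # one-shot generation of a novel, continuous s(t)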
  27–29. Inherently smooth latent space
     • Due to the continuous nature of the latent-to-decoder mapping, SketchODE enjoys inherent continuity
     • One-shot interpolation: the slides compare SketchODE and RNN-RNN latent interpolations qualitatively (sketched in code below)
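A hedged sketch of the one-shot interpolation, again reusing the enc and dec modules assumed in the earlier autoencoder sketch (xa and xb are hypothetical input sketches): because the latent-to-trajectory map is a single continuous ODE solve, intermediate codes decode to smoothly varying sketches.

import torch

with torch.no_grad():
    za, zb = enc(xa, t), enc(xb, t)         # latent codes of two sketches xa, xb
    frames = []
    for alpha in torch.linspace(0, 1, 8):
        z = (1 - alpha) * za + alpha * zb   # linear interpolation in latent space
        frames.append(dec(z, t))            # each frame is a complete sketch s(t), decoded in one shot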
  30–32. Abstraction effect by squeezing frequency
     • We noticed a peculiar property: an "abstraction effect"
     • It arises thanks to the periodic nature of the activations and their frequencies
     • Decreasing the frequency content produces progressively more abstract sketches (slide caption: "Decreasing frequency content")
     • Refer to the paper for more details (a rough code sketch follows below)
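A hedged sketch of how the frequency could be "squeezed" at inference time, assuming a decoder whose dynamics use the SinCosMLP from the slide-24 sketch: scaling down its learned frequency parameter removes high-frequency detail from the decoded trajectory, which reads as a more abstract drawing. This in-place scaling is an illustrative assumption, not the paper's exact procedure.

import torch

def squeeze_frequency(dynamics, factor):
    """Scale the frequency parameter of every SinCosMLP-style module by factor (0 < factor <= 1)."""
    with torch.no_grad():
        for m in dynamics.modules():
            if hasattr(m, 'w'):   # the learnable frequency scale of SinCosMLP
                m.w.mul_(factor)

# Usage (hypothetical): squeeze_frequency(decoder, 0.5) before decoding yields a coarser, more abstract sketch.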