Udon
June 16, 2024

[CVPR24 Oral] Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

These are the slides for the following research paper from the University of Tokyo and CyberAgent AI Lab, presented at CVPR 2024.

Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa
CVPR, 2024.

Project Website: https://udonda.github.io/RALF/
GitHub Repository: https://github.com/CyberAgentAILab/RALF
Paper: https://arxiv.org/abs/2311.13602

Abstract
Content-aware graphic layout generation aims to automatically arrange visual elements along with a given content, such as an e-commerce product image. In this paper, we argue that the current layout generation approaches suffer from the limited training data for the high-dimensional layout structure. We show that a simple retrieval augmentation can significantly improve the generation quality. Our model, which is named Retrieval-Augmented Layout Transformer (RALF), retrieves nearest neighbor layout examples based on an input image and feeds these results into an autoregressive generator. Our model can apply retrieval augmentation to various controllable generation tasks and yield high-quality layouts within a unified architecture. Our extensive experiments show that RALF successfully generates content-aware layouts in both constrained and unconstrained settings and significantly outperforms the baselines.
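
As a rough illustration of the retrieval step described in the abstract, the sketch below ranks a database of image-layout pairs by the similarity of their image features to a query image feature and returns the layouts of the top K matches. It is not the authors' implementation: the function names, the `(category, x, y, w, h)` layout format, and the use of cosine similarity are all assumptions made for this example, and the feature vectors are assumed to come from some image encoder that is not specified here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout format: a list of (category, x, y, w, h) elements.
Layout = List[Tuple[str, float, float, float, float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_layouts(
    query_feature: Sequence[float],
    database: List[Tuple[Sequence[float], Layout]],
    k: int = 16,
) -> List[Layout]:
    """Return the layouts paired with the k images most similar to the query.

    `database` holds (image_feature, layout) pairs. Similarity is computed
    between image features only; the paired layout is returned as-is.
    Cosine similarity is an assumption of this sketch.
    """
    ranked = sorted(
        database,
        key=lambda pair: cosine_similarity(query_feature, pair[0]),
        reverse=True,
    )
    return [layout for _, layout in ranked[:k]]
```

The retrieved layouts would then be passed to the generator as references; how they are encoded and fused is outside the scope of this sketch.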

Transcript

  1. RALF: Retrieval-Augmented Layout Transformer. 1) Retrieve nearest neighbor layouts based on the input image; 2) use them as a reference to augment the generation process.
  2. Challenges: Controllability to User-Specified Constraints. Category → Size + Position: “Logo, Text x2, Underlay”. Relationship: “Logo top on Text”.
  3. Contributions. Data Scarcity & Training Efficiency: retrieval augmentation effectively addresses the data scarcity problem. Content-Layout Harmonization: propose RALF. Controllability to User-Specified Constraints: show RALF outperforms the baselines on unconditional & conditional tasks.
  4. Preliminaries: Representation of layout. $Z = (\mathrm{bos}, c_1, x_1, y_1, w_1, h_1, c_2, x_2, y_2, w_2, h_2, \ldots, \mathrm{eos})$, with elements sorted by raster scan order.
  5. Preliminaries: Tokenization of layout. Quantize bbox: $b_i$, $B$ bins (in the experiments, $B = 128$). Autoregressive modeling ($I$: image, $S$: saliency map).
  6. Overview of RALF: Fuses the features of retrieved layouts with the image feature using cross-attention.
  7. Layout Retrieval: A challenge lies in the absence of a joint embedding for image–layout retrieval, unlike CLIP for image–text retrieval. (Figure: image–text retrieval uses a joint embedding (CLIP); image–layout retrieval has no joint embedding.)
  8. Layout Retrieval: We hypothesize that given an image–layout pair $(\tilde{I}, \tilde{L})$, $\tilde{L}$ is more likely to be useful when $\tilde{I}$ is similar to $I$. Retrieve images using similarity, then use the paired layout. (Figure: query image $I$ with its GT layout, and retrieved pairs $(\tilde{I}_1, \tilde{L}_1)$, $(\tilde{I}_2, \tilde{L}_2)$.)
  9. Analysis: How does the choice of K affect the output? Compare K=1 with K=16. With K=1: similar results. (Figure: reference, retrieved example(s), Output 1, Output 2 for K=1.)
  10. Analysis: How does the choice of K affect the output? Compare K=1 with K=16. With K=16: diverse and plausible results. (Figure: reference and outputs for K=1 and K=16.)
  11. Qualitative Results. PKU Dataset (Logo, Text, Underlay) and CGL Dataset (Embellishment, Logo, Text, Underlay). Methods compared: LayoutDM, CGL-GAN, DS-GAN, Autoreg Baseline, RALF (Ours).
  12. Quantitative Results: Unconstrained generation results on PKU dataset [Hsu+ CVPR23]. Train:Test:Val = 7,734:1,000:1,000. Content: overlap between the salient object and the layout.
  13. Quantitative Results: Using just the top-1 retrieved layout is the worst in content metrics. “Retrieval-augmented” generation is important.
  14. Conclusion. Retrieval augmentation effectively addresses the data scarcity problem. Propose RALF: Retrieval-Augmented Layout Transformer (retrieval augmentation + autoregressive Transformer). Show that RALF successfully generates high-quality layouts, significantly outperforming baselines. Thank you! Contact: [email protected] @udoooom
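
The layout representation and tokenization from slides 4 and 5 can be sketched roughly as follows: each element's bounding box is quantized into B bins, elements are sorted in raster scan order, and the result is flattened into one sequence between `bos` and `eos`. This is a minimal sketch under stated assumptions rather than the authors' code: the function names are hypothetical, coordinates are assumed to be normalized to [0, 1], "raster scan order" is assumed to mean sorting by y and then x, and categories are kept as plain strings instead of vocabulary ids.

```python
from typing import List, Tuple, Union

# Hypothetical element format: (category, x, y, w, h), coordinates in [0, 1].
Element = Tuple[str, float, float, float, float]
Token = Union[str, int]


def quantize(value: float, num_bins: int = 128) -> int:
    """Map a normalized value in [0, 1] to a bin index in [0, num_bins - 1]."""
    clipped = min(max(value, 0.0), 1.0)
    # The top edge (1.0) would land in bin `num_bins`, so clamp it to the last bin.
    return min(int(clipped * num_bins), num_bins - 1)


def tokenize_layout(elements: List[Element], num_bins: int = 128) -> List[Token]:
    """Flatten a layout into (bos, c1, x1, y1, w1, h1, c2, ..., eos)."""
    # Assumed raster scan order: top to bottom (y), then left to right (x).
    ordered = sorted(elements, key=lambda e: (e[2], e[1]))
    tokens: List[Token] = ["bos"]
    for category, x, y, w, h in ordered:
        tokens.append(category)
        tokens.extend(quantize(v, num_bins) for v in (x, y, w, h))
    tokens.append("eos")
    return tokens
```

For example, `tokenize_layout([("text", 0.5, 0.5, 0.25, 0.1), ("logo", 0.0, 0.0, 1.0, 0.25)])` places the logo first because it has the smaller y coordinate. An autoregressive model would then predict this sequence one token at a time.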