SSII2023 [OS1] Graphic Design and Multimodal Processing

Kota Yamaguchi (CyberAgent)

Transcript

  1. Graphic Design and Multimodal Processing
    2023.6.14
    Kota Yamaguchi (CyberAgent)

  2. CyberAgent AI Lab
    ● Machine learning
    ● Computer vision
    ● Computer graphics
    ● Natural language processing
    ● Speech signal processing
    ● HCI / Robotics
    ● Econometrics

  3. Agenda
    1. Modalities of graphic design
    2. Recent work

  4. Modalities of graphic design
    01

  5. Graphic design
    ● Slideshows, social media posts, posters, video ads, web pages

  6. Graphic design is not an image in the usual sense
    Vector graphic: what the designer works with
    Rendering
    Raster image: what appears on the display

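  7. Raster and vector formats
    Raster images
    ● JPEG, PNG, WebP
    ● Fixed-resolution pixels, pixel art
    ● Images in the usual sense
    Vector graphics
    ● PDF, PPTX, Photoshop
    ● Resolution-independent drawing instructions
    ● Documents in the usual sense
    [Figure: the word "Typography" as sample text]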

  8. Elements of commercial graphic design
    ● Many modalities, close to tabular data
    Copy
    Landing page
    Campaign
    Graphic
    Platform
    Display device
    Content / Context
    Medium
    *1 Audience

  9. Data structure of vector graphics
    Document: (Canvas, Layers)
    Canvas: Width, Height, Category, …
    Layer: Type, Position, Size, Appearance, Text, Pixels, …
    [Figure: two example documents, each a Canvas holding one Image layer and several Text layers]
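
    Sketch (not from the slides): one way to write the document structure above as Python dataclasses. The attribute names come from the slide; the class layout, field types, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Layer:
    """One element placed on the canvas (field types are assumed)."""
    type: str                                   # e.g. "image" or "text"
    position: Tuple[float, float]               # (left, top)
    size: Tuple[float, float]                   # (width, height)
    appearance: Dict[str, object] = field(default_factory=dict)
    text: Optional[str] = None                  # set for text layers only
    pixels: Optional[bytes] = None              # set for image layers only


@dataclass
class Canvas:
    """Document-level attributes."""
    width: int
    height: int
    category: str


@dataclass
class Document:
    """A document is a canvas plus an ordered list of layers."""
    canvas: Canvas
    layers: List[Layer] = field(default_factory=list)


# Hypothetical sample document: one background image and one text layer.
doc = Document(
    canvas=Canvas(width=800, height=600, category="poster"),
    layers=[
        Layer(type="image", position=(0, 0), size=(800, 600)),
        Layer(type="text", position=(150, 30), size=(200, 90),
              appearance={"font": "Arial"}, text="Happy\nHolidays!"),
    ],
)
```

    Each layer carries attributes of different kinds (categorical type, numeric geometry, text, pixels), which is what makes the data multimodal.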

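  10. Layout generation
    ● A layout is a sequence of layer tuples (type, left, top, width, height)
    ● Layout generation reduces to a multimodal sequence generation problem
    [Figure: a Generator emits t1, x1, y1, w1, h1 (Layer 1) and t2, x2, y2, w2, h2 (Layer 2), which are placed on the Canvas]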
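
    Sketch (not from the slides): the sequence view of a layout, written as plain Python. The function names `flatten` and `unflatten` and the normalized coordinates in the sample are assumptions for illustration.

```python
from typing import List, Tuple

# (type, left, top, width, height), as on the slide
LayerTuple = Tuple[str, float, float, float, float]


def flatten(layout: List[LayerTuple]) -> list:
    """Concatenate layer tuples into one sequence t1, x1, y1, w1, h1, t2, ..."""
    sequence = []
    for layer in layout:
        sequence.extend(layer)
    return sequence


def unflatten(sequence: list) -> List[LayerTuple]:
    """Regroup a flat sequence into one 5-tuple per layer."""
    if len(sequence) % 5 != 0:
        raise ValueError("sequence length must be a multiple of 5")
    return [tuple(sequence[i:i + 5]) for i in range(0, len(sequence), 5)]


# Hypothetical two-layer layout with coordinates normalized to [0, 1].
layout = [("image", 0.0, 0.0, 1.0, 1.0), ("text", 0.2, 0.1, 0.6, 0.15)]
seq = flatten(layout)
```

    Once a layout is a flat sequence that mixes categorical and numeric entries, a sequence model can generate it one entry at a time or all at once.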

  11. Inverse rendering of text
    ● Multi-task, multimodal prediction
    Resolution: [1699, 1280]
    Location: [247, 1130, 748, 1280]
    Text: "WANT"
    Font: Barlow Semi Condensed ExtraBold
    Fill:
      RGB: [44, 34, 41]
    Border:
      Visible: True
      RGB: [217, 91, 97]
      Width: 2
    Shadow:
      Visible: False
    Background:
    [Figure label: inverse rendering]

  12. Recent work
    02
    N. Inoue et al., LayoutDM: Discrete Diffusion Model for Controllable Layout Generation, CVPR 2023
    N. Inoue et al., Towards Flexible Multi-modal Document Models, CVPR 2023

  13. LayoutDM [Inoue+]: layout generation
    ● Layout generation with a discrete diffusion model

  14. Multimodal discrete representation for layouts
    ● Sequences of (type, left, top, width, height) are discretized independently for each modality
    ● A discrete diffusion generative model based on D3PM [J Austin 21] is applied
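
    Sketch (not from the slides): per-attribute discretization followed by one noising step. The bin count, the type vocabulary, and the use of an absorbing [MASK] state (one of the transition types described in D3PM) are assumptions for illustration; the actual LayoutDM settings and transition design may differ.

```python
import random

NUM_BINS = 32                 # assumed number of bins per coordinate
TYPES = ["text", "image"]     # assumed element-type vocabulary
MASK = "[MASK]"               # absorbing token


def quantize(value: float, num_bins: int = NUM_BINS) -> int:
    """Map a normalized coordinate in [0, 1] to a bin index in [0, num_bins - 1]."""
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * num_bins), num_bins - 1)


def encode(layer):
    """Discretize each attribute with its own vocabulary."""
    t, left, top, width, height = layer
    return [TYPES.index(t), quantize(left), quantize(top),
            quantize(width), quantize(height)]


def corrupt(tokens, mask_prob, rng):
    """Forward-process sketch: independently replace each token with MASK."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


tokens = encode(("text", 0.2, 0.1, 0.6, 0.15))
noisy = corrupt(tokens, mask_prob=0.5, rng=random.Random(0))
```

    A denoising model would then be trained to recover the original tokens from `noisy`; that part is omitted here.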

  15. Various layout generation tasks

  16. FlexDM [Inoue+]: modeling the designer's editing process
    ● An encoder model that handles a wide variety of multimodal, multi-task processing
    FlexDM: Layout generation, Texts filling, Font & color styling, Images filling, Element filling

    Masked element
    type: Text
    pos: (150, 30)
    size: (200, 90)
    text: Happy\nHolidays!
    image: [NULL]
    font: [MASK]
    color: [MASK]

    Completed element
    type: Text
    pos: (150, 30)
    size: (200, 90)
    text: Happy\nHolidays!
    image: -
    font: Arial
    color: (210,220,100)

  17. Multi-task use of a masked autoencoder
    ● A multimodal BERT-like model → handles diverse tasks by switching the mask pattern
    Design tasks = Masking patterns
    [Figure: masking patterns for two tasks, font & color prediction and element filling, on an example design ("BEST IN TOWN!", "CAR WASH", "Full service"). Each pattern is a grid of fields (Type, Position, Img-emb., Text-emb., Color / font) by elements 1 to 5, with cells marked context, [NULL], or [MASK].]
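
    Sketch (not from the slides): design tasks expressed as masking patterns over an element-by-field grid. The field keys, helper names, and sample elements are hypothetical, and leaving [NULL] slots unmasked during attribute prediction is an assumption.

```python
FIELDS = ("type", "position", "img_emb", "text_emb", "color_font")
MASK, NULL = "[MASK]", "[NULL]"


def mask_fields(elements, fields):
    """Attribute prediction: hide the chosen fields in every element.

    Slots that are [NULL] (field not applicable) are left as they are.
    """
    return [
        {k: MASK if (k in fields and v != NULL) else v for k, v in el.items()}
        for el in elements
    ]


def mask_element(elements, index):
    """Element filling: hide every field of one element."""
    return [
        {k: MASK for k in el} if i == index else dict(el)
        for i, el in enumerate(elements)
    ]


# Each task is just a different masking function applied to the same input.
TASKS = {
    "font_color_prediction": lambda els: mask_fields(els, {"color_font"}),
    "element_filling": lambda els: mask_element(els, index=len(els) - 1),
}

# Hypothetical two-element document; embeddings are stand-in strings.
elements = [
    {"type": "image", "position": (0, 0), "img_emb": "e_img",
     "text_emb": NULL, "color_font": NULL},
    {"type": "text", "position": (150, 30), "img_emb": NULL,
     "text_emb": "e_txt", "color_font": "Arial"},
]
masked_style = TASKS["font_color_prediction"](elements)
masked_last = TASKS["element_filling"](elements)
```

    The same model can serve every task because only the positions of [MASK] change between them.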

  18. Output examples
    [Figure: input/output pairs for ATTR prediction, TXT prediction, IMG prediction, POS prediction, and Element filling; some outputs are shown both as bounding boxes (bbox.) and as images (img.)]

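  19. Graphic design and multimodal processing
    ● Graphic design is multimodal structured data: images, text, geometric arrangement, styling attributes, and more
    ● Exploiting the sequence structure makes it possible to formulate a variety of tasks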