Slide 1

Slide 1 text

Graphic Design and Multimodal Processing, 2023.6.14, Kota Yamaguchi (CyberAgent)

Slide 2

Slide 2 text

CyberAgent AI Lab
● Machine learning
● Computer vision
● Computer graphics
● Natural language processing
● Speech signal processing
● HCI / robotics
● Econometrics

Slide 3

Slide 3 text

Agenda
1. Modalities of graphic design
2. Recent work

Slide 4

Slide 4 text

Modalities of graphic design 01

Slide 5

Slide 5 text

Graphic design
● Slideshows, social media posts, posters, video ads, web pages

Slide 6

Slide 6 text

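Graphic design is not an image in the usual sense
Vector graphic → Rendering → Raster image
Vector graphic: what the designer works with
Raster image: what appears on the display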

Slide 7

Slide 7 text

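Raster and vector formats
Raster image
● JPEG, PNG, WebP
● Fixed-resolution pixels, pixel art
● An "image" in the usual sense
Vector graphic
● PDF, PPTX, Photoshop
● Resolution-independent drawing instructions
● A "document" in the usual sense
[Figure: the sample word "Typography", shown three times]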

Slide 8

Slide 8 text

Elements of commercial graphic design
● Many modalities; close to tabular data
[Figure labels: copy, landing page, campaign, graphic, platform, display device, content, context, medium *1, audience]

Slide 9

Slide 9 text

Data structure of vector graphics
[Figure: a canvas containing one Image layer and four Text layers]
Document, canvas: Width, Height, Category, …
Layer: Type, Position, Size, Appearance, Text, Pixels, …
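The canvas-and-layers structure on this slide can be sketched as plain Python dataclasses. This is only an illustration: the class names, field types, and sample values are assumptions, and only the attribute names (Width, Height, Category, Type, Position, Size, Appearance, Text, Pixels) come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Layer:
    """One element on the canvas (field types are illustrative)."""
    type: str                                        # e.g. "image" or "text"
    position: Tuple[float, float]                    # (left, top)
    size: Tuple[float, float]                        # (width, height)
    appearance: dict = field(default_factory=dict)   # font, color, ...
    text: Optional[str] = None                       # set for text layers
    pixels: Optional[bytes] = None                   # raster payload for image layers


@dataclass
class Canvas:
    """Document-level attributes plus an ordered list of layers."""
    width: int
    height: int
    category: str
    layers: List[Layer] = field(default_factory=list)


# Hypothetical document: one background image and one text layer.
doc = Canvas(
    width=1200,
    height=628,
    category="banner",
    layers=[
        Layer(type="image", position=(0, 0), size=(1200, 628)),
        Layer(type="text", position=(150, 30), size=(200, 90),
              appearance={"font": "Arial"}, text="Happy\nHolidays!"),
    ],
)
```

Each layer mixes categorical, numeric, textual, and pixel-valued fields, which is why the document behaves more like a small table of heterogeneous columns than like a single image.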

Slide 10

Slide 10 text

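Layout generation
● A layout is a sequence of layer tuples (type, left, top, width, height)
● Layout generation reduces to a multimodal sequence generation problem
[Figure: a canvas with Layer 1 and Layer 2, a Generator, and the sequence t1 x1 y1 w1 h1 t2 x2 y2 w2 h2 …]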
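A minimal sketch of the sequence view described here: each layer is a (type, left, top, width, height) tuple, and the layout is flattened into t1 x1 y1 w1 h1 t2 x2 y2 w2 h2 … The function names and the example values are hypothetical and not taken from the talk.

```python
from typing import List, Tuple

# (type, left, top, width, height)
LayerTuple = Tuple[str, float, float, float, float]


def flatten_layout(layers: List[LayerTuple]) -> list:
    """Concatenate layer tuples into one flat sequence t1 x1 y1 w1 h1 t2 ..."""
    seq = []
    for t, x, y, w, h in layers:
        seq.extend([t, x, y, w, h])
    return seq


def unflatten_layout(seq: list) -> List[LayerTuple]:
    """Group a flat sequence back into 5-element layer tuples."""
    if len(seq) % 5 != 0:
        raise ValueError("sequence length must be a multiple of 5")
    return [tuple(seq[i:i + 5]) for i in range(0, len(seq), 5)]


layout = [("image", 0.0, 0.0, 1.0, 1.0), ("text", 0.1, 0.2, 0.5, 0.1)]
sequence = flatten_layout(layout)
```

A sequence generator then only has to emit one value at a time, alternating between a categorical type and four geometric values.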

Slide 11

Slide 11 text

Inverse rendering of text
● Multi-task, multimodal prediction
[Figure: inverse rendering of a text image into the attributes below]
Resolution: [1699, 1280]
Location: [247, 1130, 748, 1280]
Text: "WANT"
Font: Barlow Semi Condensed ExtraBold
Fill:
  RGB: [44, 34, 41]
Border:
  Visible: True
  RGB: [217, 91, 97]
  Width: 2
Shadow:
  Visible: False
Background:

Slide 12

Slide 12 text

Recent work 02
N. Inoue et al., LayoutDM: Discrete Diffusion Model for Controllable Layout Generation, CVPR 2023
N. Inoue et al., Towards Flexible Multi-modal Document Models, CVPR 2023

Slide 13

Slide 13 text

LayoutDM [Inoue+]: layout generation
● Layout generation with a discrete diffusion model

Slide 14

Slide 14 text

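Multimodal discrete representation for layouts
● The (type, left, top, width, height) sequence data is discretized independently for each modality
● Applies the discrete diffusion generative model D3PM [J Austin 21]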
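One way to picture per-modality discretization is sketched below: the type gets its own categorical vocabulary, and each geometric attribute is binned separately. Uniform binning over [0, 1], the bin count of 32, and the two-entry type vocabulary are all assumptions made for illustration; the slide does not state how LayoutDM quantizes its values.

```python
from typing import Tuple

TYPE_VOCAB = ["text", "image"]  # hypothetical type vocabulary


def quantize(value: float, num_bins: int = 32) -> int:
    """Map a coordinate in [0, 1] to a bin index in [0, num_bins - 1]."""
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * num_bins), num_bins - 1)


def dequantize(index: int, num_bins: int = 32) -> float:
    """Return the center of the given bin."""
    return (index + 0.5) / num_bins


def encode_layer(layer: Tuple[str, float, float, float, float],
                 num_bins: int = 32) -> Tuple[int, int, int, int, int]:
    """Turn (type, left, top, width, height) into five discrete tokens."""
    t, x, y, w, h = layer
    return (TYPE_VOCAB.index(t),) + tuple(quantize(v, num_bins) for v in (x, y, w, h))


tokens = encode_layer(("image", 0.0, 0.0, 1.0, 1.0))
```

Once every attribute is a token from a finite vocabulary, a discrete diffusion model such as D3PM can corrupt and denoise those tokens directly.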

Slide 15

Slide 15 text

Various layout generation tasks

Slide 16

Slide 16 text

FlexDM [Inoue+]: modeling the designer's editing process
● An encoder model that handles a wide variety of multimodal, multi-task processing
[Figure: FlexDM with the tasks Layout generation, Texts filling, Font & color styling, Images filling, Element filling, …]
[Figure: an example element with type: Text, pos: (150, 30), size: (200, 90), text: Happy\nHolidays!; on one side of FlexDM its font and color fields are [MASK] and its image field is [NULL], on the other side font: Arial, color: (210,220,100), image: -]

Slide 17

Slide 17 text

Multi-task use of a masked autoencoder
● A multimodal, BERT-like model → handles diverse tasks by switching the mask
[Figure: Design tasks = Masking patterns. Example poster text: "BEST IN TOWN!", "CAR WASH", "Full service". Two grids, one for Font & color prediction and one for Element filling, with rows Type, Position, Img-emb., Text-emb., Color / font and columns 1 to 5; each cell is marked context, [NULL], or [MASK]]
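The idea that a design task is just a masking pattern can be sketched as follows. This is a toy illustration and not the FlexDM implementation: the task names, the field grouping in `TASK_FIELDS`, and the choice to mask every field of an element for element filling are assumptions; only the [MASK] and [NULL] tokens and the field names come from the slides.

```python
MASK, NULL = "[MASK]", "[NULL]"

# Hypothetical mapping from a task name to the fields it hides.
TASK_FIELDS = {
    "layout": ("pos", "size"),
    "text_filling": ("text",),
    "image_filling": ("image",),
    "font_color": ("font", "color"),
}


def mask_for_task(elements, task):
    """Hide the task's target fields in every element; [NULL] fields stay as-is."""
    targets = TASK_FIELDS[task]
    return [
        {k: (MASK if k in targets and v != NULL else v) for k, v in el.items()}
        for el in elements
    ]


def mask_element(elements, index):
    """Element filling (assumed pattern): hide every field of one element."""
    return [
        {k: MASK for k in el} if i == index else dict(el)
        for i, el in enumerate(elements)
    ]


element = {
    "type": "Text", "pos": (150, 30), "size": (200, 90),
    "text": "Happy\nHolidays!", "image": NULL,
    "font": "Arial", "color": (210, 220, 100),
}
styled_input = mask_for_task([element], "font_color")
```

The same model weights serve every task; only the choice of which cells are replaced by [MASK] changes between them.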

Slide 18

Slide 18 text

Output examples
[Figure: input/output pairs for ATTR prediction, TXT prediction, IMG prediction, POS prediction, and Element filling; some outputs are shown both as bounding boxes (bbox.) and as images (img.)]

Slide 19

Slide 19 text

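Graphic design and multimodal processing
● Graphic design is multimodal structured data: images, text, geometric layout, styling attributes, and more
● Exploiting its sequence structure makes it possible to formulate a variety of tasks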