Slide 1

Structuring Latent Spaces for Stylized Response Generation
[Presenter name] (ML Engineer, Pingpong)

Slide 2

Introduction

Slide 3

Motivation

• Stylized response generation
  • Generate a response that fits the query in a conversation, in the desired style
  • polite, professional, friendly, …
• Problems
  1. Lack of parallel data
    • The fundamental problem of text style transfer -> can existing unsupervised approaches solve it?
  2. Lack of non-parallel dialogue data
    • Only dialogue data or non-parallel style corpora exist (news, novels, blogs, etc.)
    ➡ Existing approaches end up either less style-specific or less context-relevant

Slide 4

Previous Work

• S2S + LM (Niu and Bansal, 2018)
  • Train a seq2seq model on the dialogue data
  • Train a language model on the style data
  • Predict the next token from a weighted sum of the two models' probabilities
  ➡ The forced bias lowers relevance ↓
• Multi-task learning (Luan et al., 2017)
  • Train a seq2seq model on the dialogue data
  • Train an autoencoder on the style data
  • Map both datasets onto the same latent space
  ➡ The clusters remain separated, so style intensity ↓
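For reference, a minimal sketch of the S2S+LM token-level fusion described above, assuming two hypothetical modules that already produce next-token logits over a shared vocabulary; the interpolation weight alpha is illustrative, not a value from the paper.

import torch
import torch.nn.functional as F

def fused_next_token(seq2seq_logits, style_lm_logits, alpha=0.3):
    """Greedily pick the next token from a weighted sum of the dialogue
    seq2seq distribution and the style LM distribution (the S2S+LM scheme)."""
    p_conv = F.softmax(seq2seq_logits, dim=-1)    # relevance-oriented distribution
    p_style = F.softmax(style_lm_logits, dim=-1)  # style-oriented distribution
    p_fused = (1.0 - alpha) * p_conv + alpha * p_style
    # a larger alpha biases generation toward the style LM -- the forced bias
    # that the slide points to as the cause of lower relevance
    return int(torch.argmax(p_fused).item())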

Slide 5

Contributions

• Forms a shared latent space between the dialogue and style data
  • Semantically similar stylized sentences are placed close together
• Extends SPACEFUSION (Gao et al., 2019) to non-parallel data
• Outperforms the baselines on both automatic and human evaluation

Slide 6

The Proposed Method: STYLEFUSION

Slide 7

Recap: SPACEFUSION

• Improves the relevance and diversity of responses through joint optimization
  • Forms a shared latent space for relevance and diversity
  • Aligns them in the same space so that both properties can be controlled
  • Implemented by adding the corresponding regularization terms to the loss

Slide 8

STYLEFUSION: Model Architecture

• {seq2seq, autoencoder} encoder + (parameter-shared) decoder
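A minimal PyTorch sketch of this two-encoder / shared-decoder layout; the GRU cells, layer sizes, and the way z is fed to the decoder are assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class StyleFusionModel(nn.Module):
    """Two encoders (a seq2seq encoder for contexts and an autoencoder encoder
    for responses / style sentences) mapping into one latent space,
    plus a single parameter-shared decoder."""

    def __init__(self, vocab_size, emb_dim=300, hid_dim=512, z_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.enc_s2s = nn.GRU(emb_dim, hid_dim, batch_first=True)  # encodes context x
        self.enc_ae = nn.GRU(emb_dim, hid_dim, batch_first=True)   # encodes y or s
        self.to_z_s2s = nn.Linear(hid_dim, z_dim)
        self.to_z_ae = nn.Linear(hid_dim, z_dim)
        # shared decoder: conditions every step on the latent z
        self.decoder = nn.GRU(emb_dim + z_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def encode_s2s(self, x_tok):
        _, h = self.enc_s2s(self.embed(x_tok))
        return self.to_z_s2s(h[-1])        # z_S2S(x)

    def encode_ae(self, tok):
        _, h = self.enc_ae(self.embed(tok))
        return self.to_z_ae(h[-1])         # z_AE(y) or z_AE(s)

    def decode_logits(self, z, dec_in):
        emb = self.embed(dec_in)                                # (batch, len, emb)
        z_rep = z.unsqueeze(1).expand(-1, emb.size(1), -1)      # broadcast z over time
        out, _ = self.decoder(torch.cat([emb, z_rep], dim=-1))
        return self.out(out)                                    # (batch, len, vocab)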

Slide 9

STYLEFUSION: Fusion Objective

• Terms that pull the different latent spaces closer together
• Cross-space distance
  • d_conv = (1/n) ∑_{i∈batch} d_E(z_S2S(x_i), z_AE(y_i))
  • d_style = (1/2) d_cross^NN({z_S2S(x_i)}, {z_AE(s_i)}) + (1/2) d_cross^NN({z_AE(s_i)}, {z_S2S(x_i)})
  • d_cross^NN({a_i}, {b_i}) = (1/n) ∑_{i∈batch} d_E(a_i, b_{NN of a_i})
• Same-space distance
  • d_spread-out = min[ d_same^NN({z_AE(y_i)}), d_same^NN({z_AE(s_i)}), d_same^NN({z_S2S(x_i)}) ]
  • d_same^NN({a_i}) = (1/n) ∑_{i∈batch} d_E(a_i, a_{NN of a_i})
• ℒ_fuse,{conv,style} = d_{conv,style} − d_spread-out
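A sketch of these fusion terms, assuming d_E is Euclidean distance and that nearest neighbours are taken within the minibatch; z_s2s_x, z_ae_y, z_ae_s stand for batches of z_S2S(x_i), z_AE(y_i), z_AE(s_i).

import torch

def d_conv(z_s2s_x, z_ae_y):
    # mean Euclidean distance between the paired points z_S2S(x_i) and z_AE(y_i)
    return (z_s2s_x - z_ae_y).norm(dim=-1).mean()

def d_cross_nn(a, b):
    # for each a_i, distance to its nearest neighbour among {b_j}, averaged over the batch
    return torch.cdist(a, b).min(dim=1).values.mean()

def d_same_nn(a):
    # average nearest-neighbour distance within one set (self-distances excluded)
    d = torch.cdist(a, a)
    d.fill_diagonal_(float("inf"))
    return d.min(dim=1).values.mean()

def fusion_losses(z_s2s_x, z_ae_y, z_ae_s):
    dconv = d_conv(z_s2s_x, z_ae_y)
    dstyle = 0.5 * d_cross_nn(z_s2s_x, z_ae_s) + 0.5 * d_cross_nn(z_ae_s, z_s2s_x)
    dspread = torch.min(torch.stack([d_same_nn(z_ae_y), d_same_nn(z_ae_s), d_same_nn(z_s2s_x)]))
    # pull the spaces together while keeping points spread out within each space
    return dconv - dspread, dstyle - dspread   # L_fuse,conv and L_fuse,style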

Slide 10

STYLEFUSION: Smoothness Objective

• Terms that make the change in meaning smooth as a vector moves through the space
• Smoothing between prediction and target
  • ℒ_smooth,conv = −(1/|y|) log p(y | z_conv)
  • z_conv = (1 − u) z_AE(y) + u z_S2S(x) + ϵ
• Smoothing between the non-stylized response and a random stylized sentence
  • ℒ_smooth,style = −(1 − u)(1/|y|) log p(y | z_style) − u (1/|s|) log p(s | z_style)
  • z_style = (1 − u) z_AE(y) + u z_AE(s) + ϵ
• u ∼ U(0,1), ϵ ∼ N(0, σ²I)
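A sketch of the smoothness terms, reusing the hypothetical StyleFusionModel from the architecture slide; padding is ignored and the per-example weighting is simplified, so treat it as an illustration of the interpolation rather than the authors' implementation.

import torch
import torch.nn.functional as F

def smoothness_losses(model, x_tok, y_tok, s_tok, sigma=0.1):
    """x_tok, y_tok, s_tok: (batch, len) token ids for context x, response y,
    and a randomly drawn stylized sentence s (BOS/EOS already added)."""
    z_s2s_x = model.encode_s2s(x_tok)   # z_S2S(x)
    z_ae_y = model.encode_ae(y_tok)     # z_AE(y)
    z_ae_s = model.encode_ae(s_tok)     # z_AE(s)

    u = torch.rand(z_ae_y.size(0), 1, device=z_ae_y.device)  # u ~ U(0, 1)
    eps = sigma * torch.randn_like(z_ae_y)                    # eps ~ N(0, sigma^2 I)

    def nll(z, tgt):
        # -(1/|t|) log p(t | z), per example, with teacher forcing
        logits = model.decode_logits(z, tgt[:, :-1])          # (batch, len-1, vocab)
        ce = F.cross_entropy(logits.transpose(1, 2), tgt[:, 1:], reduction="none")
        return ce.mean(dim=1)                                  # (batch,)

    # smoothing between prediction and target
    z_conv = (1 - u) * z_ae_y + u * z_s2s_x + eps
    loss_conv = nll(z_conv, y_tok).mean()

    # smoothing between the non-stylized response and a random stylized sentence
    z_style = (1 - u) * z_ae_y + u * z_ae_s + eps
    w = u.squeeze(1)
    loss_style = ((1 - w) * nll(z_style, y_tok) + w * nll(z_style, s_tok)).mean()

    return loss_conv, loss_style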

Slide 11

STYLEFUSION: Training Objective

• ℒ = −(1/|y|) log p(y | z_S2S) + ℒ_conv + ℒ_style
• In general, D_style ≪ D_conv
  • Pretrain on D_conv to prevent overfitting
  • Data augmentation: randomly mask s_i ∈ D_style with P(mask) ∝ (freq)^−1
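A sketch of the frequency-based masking augmentation, assuming it operates per token of each s_i ∈ D_style; the mask token, the scale factor, and the clipping at probability 1.0 are assumptions.

import random
from collections import Counter

def inverse_freq_masking(style_sentences, mask_token="<mask>", scale=1.0, seed=0):
    """Mask each token of the style corpus with probability proportional to
    1 / its corpus frequency, i.e. P(mask) ∝ (freq)^-1, so rare tokens are
    masked more often than frequent ones. Sentences are lists of token strings."""
    rng = random.Random(seed)
    freq = Counter(tok for sent in style_sentences for tok in sent)
    augmented = []
    for sent in style_sentences:
        augmented.append([mask_token if rng.random() < min(1.0, scale / freq[tok]) else tok
                          for tok in sent])
    return augmented

# usage sketch: enlarge the small style corpus before training
# D_style_augmented = D_style + inverse_freq_masking(D_style)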

Slide 12

STYLEFUSION: Inference

• z = z_S2S(x) + r: sample around z_S2S(x)
  • Normalize so the result is not affected by the dimensionality of z: ρ = |r| / (σ l)
• Rerank with a weighted sum using a pretrained style classifier's probability
  • score(h_i) = (1 − λ) P(h_i | z_S2S(x)) + λ P_style(h_i)
  • neural: 2-layer GRU
  • ngram: logistic regression using multi-hot features (n = 1, 2, 3, 4)
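A sketch of the inference step as read off this slide: sample candidate latents around z_S2S(x) at a radius fixed by ρ, decode them (not shown), then rerank the decoded hypotheses with the style classifier; the σ·l normalization is taken literally from the slide and all helper signatures are assumptions.

import torch

def sample_latents(z_s2s_x, rho, sigma=0.1, n_samples=10):
    """Draw candidate latents z = z_S2S(x) + r around the predicted point,
    with the radius fixed so that rho = |r| / (sigma * l), l = latent dim."""
    l = z_s2s_x.size(-1)
    r = torch.randn(n_samples, l)
    r = r / r.norm(dim=-1, keepdim=True) * (rho * sigma * l)   # set |r| from rho
    return z_s2s_x.unsqueeze(0) + r                            # (n_samples, l)

def rerank(hypotheses, p_conv, p_style, lam=0.5):
    """score(h_i) = (1 - lambda) * P(h_i | z_S2S(x)) + lambda * P_style(h_i),
    where P_style comes from the pretrained classifier (2-layer GRU or n-gram LR)."""
    scores = [(1 - lam) * pc + lam * ps for pc, ps in zip(p_conv, p_style)]
    best = max(range(len(hypotheses)), key=scores.__getitem__)
    return hypotheses[best], scores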

Slide 13

Experiments

Slide 14

Experimental Setup

• Datasets
  • Reddit: dialogue, 10M context-response pairs
  • arXiv, Holmes: 1M and 38K sentences, respectively
  • D_test: filtered from Reddit with the pretrained style classifier
• Baselines
  • MTask (Luan et al., 2017)
  • S2S+LM (Niu and Bansal, 2018)
  • Retrieval, Rand, Human

Slide 15

Results

• Generation examples

Slide 16

Results

• Change in style intensity and relevance as ρ varies

Slide 17

Results

• Change in style intensity and relevance as ρ varies

Slide 18

Results: Automatic Evaluation

• Rand: style intensity ↑, but relevance ↓
• S2S+LM: BLEU ↓
• MTask: style intensity ↓, diversity ↓

Slide 19

Results: Human Evaluation & Visualization

• MTask: style intensity ↓
• S2S+LM: relevance ↓
• (Figure: visualization of the latent space, showing z_S2S(x), z_AE(s), and z_AE(y))

Slide 20

Thank You! Any Questions?