Survey on DANN

knshnb

May 20, 2020

Transcript

  1. TL;DR
     ✴ "Domain-Adversarial Training of Neural Networks"
     ✴ One of the most common methods for deep domain adaptation
     ✴ Finds a representation that is
       ‣ discriminative for the original task
       ‣ indiscriminate between the domains, in an adversarial way
     ✴ Applicable to arbitrary neural architectures
  2. Problem Setting: Unsupervised Domain Adaptation
     ✴ Classification
       ‣ $X$: input space
       ‣ $Y = \{0, 1, \dots, L-1\}$: label space
     ✴ Two different distributions over $X \times Y$
       ‣ $\mathcal{D}_S$: source domain
       ‣ $\mathcal{D}_T$: target domain
  3. Problem Setting: Unsupervised Domain Adaptation
     ✴ Unsupervised (= no labels from $\mathcal{D}_T$)
       ‣ $S = \{(x_i, y_i) \sim \mathcal{D}_S\}_{i=1}^{n}$
       ‣ $T = \{x_i \sim \mathcal{D}_T^X\}_{i=n+1}^{N}$
       ‣ $N = n + n'$ examples in total
     ✴ Minimize the target risk (see the sketch below):
       ‣ $R_{\mathcal{D}_T}(h) = \Pr_{(x,y) \sim \mathcal{D}_T}[h(x) \neq y]$
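As a concrete reading of the target risk, here is a minimal NumPy sketch; `h`, `x_target`, and `y_target` are hypothetical names, and since the setting is unsupervised, target labels are unavailable during training, so this quantity can only be measured for evaluation:

```python
import numpy as np

# Empirical version of R_{D_T}(h): the fraction of target examples
# that hypothesis h misclassifies. Hypothetical helper for intuition;
# target labels are unknown at training time in this setting.
def empirical_target_risk(h, x_target, y_target):
    predictions = np.array([h(x) for x in x_target])
    return float(np.mean(predictions != np.asarray(y_target)))
```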
  4. H-divergence [Ben-David+ NIPS 2006]
     ✴ Discrepancy measure
     ✴ Given $\mathcal{D}_S^X$ and $\mathcal{D}_T^X$ over $X$, and a hypothesis class $\mathcal{H}$,
       ‣ $d_{\mathcal{H}}(\mathcal{D}_S^X, \mathcal{D}_T^X) = 2 \sup_{h \in \mathcal{H}} \left| \Pr_{x \sim \mathcal{D}_S^X}[h(x) = 1] - \Pr_{x \sim \mathcal{D}_T^X}[h(x) = 1] \right|$
       ‣ "How distinguishable the two domains are by $\mathcal{H}$"
     ✴ Empirical H-divergence (see the sketch below)
       ‣ $\hat{d}_{\mathcal{H}}(S, T) = 2 \left( 1 - \min_{h \in \mathcal{H}} \left[ \frac{1}{n} \sum_{i=1}^{n} I[h(x_i) = 0] + \frac{1}{n'} \sum_{i=n+1}^{N} I[h(x_i) = 1] \right] \right)$
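To make the empirical H-divergence concrete, a minimal NumPy sketch, assuming the minimum over $\mathcal{H}$ has already been approximated by training a binary domain classifier `h` (convention: `h` should output 1 on source, 0 on target); `h`, `x_source`, and `x_target` are hypothetical names:

```python
import numpy as np

# A sketch of the empirical H-divergence, assuming `h` is a domain
# classifier already trained to approximate the min over H.
def empirical_h_divergence(h, x_source, x_target):
    err_src = np.mean([h(x) == 0 for x in x_source])  # (1/n)  sum_i I[h(x_i) = 0]
    err_tgt = np.mean([h(x) == 1 for x in x_target])  # (1/n') sum_i I[h(x_i) = 1]
    # Indistinguishable domains -> both errors near 1/2 -> divergence near 0;
    # perfectly separable domains -> both errors near 0 -> divergence near 2.
    return 2.0 * (1.0 - (err_src + err_tgt))
```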
  5. Target Risk Bound
     ✴ The target risk is upper-bounded using the empirical H-divergence
     ✴ With probability $1 - \delta$, for every $h \in \mathcal{H}$:
       ‣ $R_{\mathcal{D}_T}(h) \leq R_S(h) + \hat{d}_{\mathcal{H}}(S, T) + \mathrm{complexity}(\mathcal{H}) + \dots$
  6. Target Risk Bound
     ✴ The target risk is upper-bounded using the empirical H-divergence
     ✴ With probability $1 - \delta$, for every $h \in \mathcal{H}$:
       ‣ $R_{\mathcal{D}_T}(h) \leq R_S(h) + \hat{d}_{\mathcal{H}}(S, T) + \mathrm{complexity}(\mathcal{H}) + \dots$
     ✴ What we can control
       ‣ Source risk $R_S(h)$
         • Ordinary supervised classification
       ‣ Empirical H-divergence $\hat{d}_{\mathcal{H}}(S, T)$
         • Find a feature representation where the two domains are indistinguishable by $\mathcal{H}$
  7. Idea
     ✴ Train three components at the same time ($D$: dimension of the feature representation; a PyTorch sketch follows below)
       ‣ Feature extractor
         • $G_f(\cdot\,; \theta_f) : X \to \mathbb{R}^D$
       ‣ Label predictor
         • $G_y(\cdot\,; \theta_y) : \mathbb{R}^D \to [0,1]^L$
         • Loss $L_y : [0,1]^L \times \{0, 1, \dots, L-1\} \to \mathbb{R}$
       ‣ Domain classifier
         • $G_d(\cdot\,; \theta_d) : \mathbb{R}^D \to [0,1]$
         • Loss $L_d : [0,1] \times \{0, 1\} \to \mathbb{R}$
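A minimal PyTorch sketch of the three components; the input size `IN_DIM` and the layer widths are illustrative assumptions, not the paper's exact architectures:

```python
import torch.nn as nn

IN_DIM, D, L = 784, 256, 10   # assumed sizes: input dim, feature dim, #classes

feature_extractor = nn.Sequential(   # G_f(.; theta_f): X -> R^D
    nn.Linear(IN_DIM, D), nn.ReLU())
label_predictor = nn.Sequential(     # G_y(.; theta_y): R^D -> [0,1]^L
    nn.Linear(D, L))                 # outputs logits; softmax lives in the loss
domain_classifier = nn.Sequential(   # G_d(.; theta_d): R^D -> [0,1]
    nn.Linear(D, 1))                 # outputs a logit; sigmoid lives in the loss
```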
  8. Architecture
     ✴ Two loss functions (sketched in code below)
       ‣ $L_y^i(\theta_f, \theta_y) = L_y(G_y(G_f(x_i; \theta_f); \theta_y), y_i)$
       ‣ $L_d^i(\theta_f, \theta_d) = L_d(G_d(G_f(x_i; \theta_f); \theta_d), d_i)$
         • $d_i$: domain label, indicating source or target
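Reusing the modules sketched above, the two losses in code; cross-entropy for $L_y$ and binary cross-entropy for $L_d$ are standard choices consistent with the paper, with `x`, `y`, `d` denoting a batch of inputs, class labels, and (float) domain labels:

```python
import torch.nn.functional as F

def prediction_loss(x, y):
    # L_y(G_y(G_f(x)), y): classification loss on labelled source data
    return F.cross_entropy(label_predictor(feature_extractor(x)), y)

def domain_loss(x, d):
    # L_d(G_d(G_f(x)), d): how well the features reveal the domain
    logits = domain_classifier(feature_extractor(x)).squeeze(-1)
    return F.binary_cross_entropy_with_logits(logits, d)
```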
  9. Simultaneous Optimization
     ✴ Optimize the prediction loss and the domain loss simultaneously:
       ‣ $E(\theta_f, \theta_y, \theta_d) = \underbrace{\frac{1}{n} \sum_{i=1}^{n} L_y^i(\theta_f, \theta_y)}_{\text{source risk}} - \lambda \underbrace{\left( \frac{1}{n} \sum_{i=1}^{n} L_d^i(\theta_f, \theta_d) + \frac{1}{n'} \sum_{i=n+1}^{N} L_d^i(\theta_f, \theta_d) \right)}_{\text{H-divergence}}$
     ✴ We want to (code sketch below)
       ‣ minimize $E$ with respect to $\theta_f$ and $\theta_y$
       ‣ maximize $E$ with respect to $\theta_d$
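Putting the pieces together, a sketch of $E$ for one source batch `(xs, ys)` and one target batch `xt`, with `lam` as the trade-off weight $\lambda$ and the assumed convention that source has domain label 0 and target has domain label 1:

```python
import torch

def objective(xs, ys, xt, lam):
    loss_y = prediction_loss(xs, ys)                    # source-risk term
    loss_d = (domain_loss(xs, torch.zeros(len(xs)))     # H-divergence term
              + domain_loss(xt, torch.ones(len(xt))))
    # Minimized w.r.t. (theta_f, theta_y), maximized w.r.t. theta_d.
    return loss_y - lam * loss_d
```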
  10. Gradient Reversal Layer
     ✴ Gradient updates
       ‣ $\theta_f \leftarrow \theta_f - \mu \left( \frac{\partial L_y^i}{\partial \theta_f} - \lambda \frac{\partial L_d^i}{\partial \theta_f} \right)$
       ‣ $\theta_y \leftarrow \theta_y - \mu \frac{\partial L_y^i}{\partial \theta_y}$
       ‣ $\theta_d \leftarrow \theta_d - \mu \lambda \frac{\partial L_d^i}{\partial \theta_d}$
     ✴ Gradient Reversal Layer (implementation sketch below):
       ‣ forward: identity
       ‣ backward: multiply by $-1$
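A minimal PyTorch sketch of the gradient reversal layer via `torch.autograd.Function`; with it placed between $G_f$ and $G_d$, one ordinary SGD step on the combined loss realizes all three updates above. (Some implementations fold $-\lambda$ into this layer; here it multiplies by $-1$ and $\lambda$ is applied in the objective, matching the slide.)

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)      # forward: identity

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output      # backward: multiply incoming gradient by -1

def grad_reverse(x):
    return GradReverse.apply(x)

# Usage sketch: reverse the gradient on the path into the domain classifier,
# so a descent step on the domain loss w.r.t. theta_d is an ascent step
# w.r.t. theta_f.
# features = feature_extractor(x)
# domain_logits = domain_classifier(grad_reverse(features))
```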
  11. Summary
     ✴ Domain adaptation by minimizing the source risk and the H-divergence at the same time
     ✴ Applicable to arbitrary neural network architectures
     ✴ Significant improvements over previous methods on various tasks (image classification, person re-identification)