Face Recognition @ ECCV2022

Slide 1

Slide 1 text

2023.02.09 Takumi Karasawa, DeNA Co., Ltd. + Mobility Technologies Co., Ltd. Face Recognition@ECCV'22

Slide 2

Slide 2 text

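Introduction
Catching up on face recognition research at ECCV 2022.
Karasawa (唐澤)
- Joined DeNA as a new graduate in 2019 → MoT
- Fourth year on the DRIVE CHART CV team (face authentication, in-cabin camera, outward-facing camera)
- Plays tennis once or twice a week 🎾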

Slide 3

Slide 3 text

Previous slides: https://speakerdeck.com/takarasawa_/face-recognition-and-arcface-papers

Slide 4

Slide 4 text

Contents
01 | Face Recognition (FR)
02 | Face Recognition@ECCV'22

Slide 5

Slide 5 text

01 Face Recognition (FR)

Slide 6

Slide 6 text

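Face Recognition (FR)
Face recognition generally refers to the following two tasks:
- Face Identification (1:N): identify which person a face image shows
- Face Verification (1:1): decide whether two face images show the same person
From a training standpoint, there is (currently) no real methodological difference between the two.
open/closed-set: the setting in which classes unseen during training do / do not appear at inference time.
Figure: differences between face recognition settings (from SphereFace*)
*SphereFace: Deep hypersphere embedding for face recognition. [W. Liu+, CVPR'17]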
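As a rough illustration of the two tasks, the sketch below treats both as comparisons of face feature vectors by cosine similarity. The function names, the 0.5 threshold, and the toy 3-dimensional features are assumptions made for this example, not values from the slides.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(feat_a, feat_b, threshold=0.5):
    """Face Verification (1:1): same person if similarity clears a threshold."""
    return cosine_similarity(feat_a, feat_b) >= threshold


def identify(probe, gallery):
    """Face Identification (1:N): return the enrolled name most similar to the probe."""
    return max(gallery, key=lambda name: cosine_similarity(probe, gallery[name]))


# Toy gallery of enrolled features (hypothetical values).
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
probe = [0.9, 0.1, 0.0]

print(identify(probe, gallery))
print(verify(probe, gallery["alice"]), verify(probe, gallery["bob"]))
```

Both tasks share the same feature extractor, which is why the training method does not need to differ. In an open-set setting, identification would additionally need a rejection threshold for probes that match nobody in the gallery.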

Slide 7

Slide 7 text

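Face Recognition / Metric Learning
Recent methods do not use facial landmarks when learning face features, and many of them can be treated as general metric learning methods.
In face recognition, however, it is customary to normalize (align) the face with facial landmarks as a preprocessing step.
https://github.com/deepinsight/insightface
↑ The eye positions are roughly aligned.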

Slide 8

Slide 8 text

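Margin-based metric learning methods, represented by ArcFace
Roughly speaking: "methods that achieve metric learning by training in a classification framework, with a margin penalty applied to the prediction for the GT class"
→ the model has to classify correctly while beating the other classes by the margin
- In margin-based methods, "closeness" means cosine similarity
  - As a distance, this is (1 - cos)
- SphereFace, CosFace, and ArcFace differ in how the margin penalty is applied
  - Beyond that, only minor details differ, such as whether features are normalized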

Slide 9

Slide 9 text

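ArcFace model structure
Assuming the commonly used settings:
- image size after preprocessing: (112, 112)
- face feature dimension: 512
Figure: Pre-processed Image (112 × 112 × 3) → Feature Extractor (CNN) → Feature Map (4 × 4 × N-dim; N = 512/1280/..., backbone-dependent) → GAP (N-dim; N-dim × 16 if Flatten is used instead) → Neck (BN → Dropout → FC → BN) → Face Feature (512), normalized as x_i / ||x_i|| → ArcFace Head (FC, N-class)
- Everything up to the Face Feature is the face feature extractor; the feature map is 4 × 4 at the end of the CNN.
- Inserting this neck is fairly common (though it may be used mostly out of convention).
- The ArcFace Head is the main part of the method.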

Slide 10

Slide 10 text

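ArcFace Head (the main part of the method)
Figure: Face Feature (512), normalized as x_i / ||x_i|| → FC weight W (512 × N-class), each W_j normalized as W_j / ||W_j|| → cosine similarity cos θ_j → margin on the GT class: cos θ_{y_i} → cos(θ_{y_i} + m) → Scale & Softmax → prob → cross-entropy loss against the GT y_i
- The FC layer has no bias (b = 0) → through training, each weight vector effectively becomes the representative vector of its class.
- The weights are normalized too, so the prediction is expressed entirely by the angle between the feature and the weights.
- Because both are normalized, this inner product is the same as computing the cosine similarity between the feature and each class weight.
- The margin penalty is added only to the similarity for the ground-truth label (hyperparameter 1).
- The logits are too small as they are, so they are scaled (hyperparameter 2).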
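The head described above can be sketched in plain Python as follows. This is a minimal illustration rather than the reference implementation: the function names and the toy 2-dimensional inputs are made up for the example, and the defaults m = 0.5 and s = 64 are the values commonly quoted for ArcFace, assumed here.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(feature, class_weights, label, margin=0.5, scale=64.0):
    """Cosine logits with an additive angular margin on the GT class only."""
    f = l2_normalize(feature)
    logits = []
    for j, w in enumerate(class_weights):
        # No bias term: the logit is the cosine between feature and class weight.
        cos_theta = sum(a * b for a, b in zip(f, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if j == label:
            # Hyperparameter 1: margin m added to the angle of the GT class.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        # Hyperparameter 2: scale s, since raw cosines lie in [-1, 1].
        logits.append(scale * cos_theta)
    return logits


def cross_entropy(logits, label):
    """Softmax cross-entropy, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - (logits[label] - m)


# Toy example: two classes, feature somewhat closer to class 0.
example_feature = [1.0, 0.8]
example_weights = [[1.0, 0.0], [0.0, 1.0]]
loss_with_margin = cross_entropy(
    arcface_logits(example_feature, example_weights, label=0), 0)
loss_without_margin = cross_entropy(
    arcface_logits(example_feature, example_weights, label=0, margin=0.0), 0)
print(loss_with_margin, loss_without_margin)
```

In this toy case the sample is classified correctly without the margin but not with it, so the margin version produces a much larger loss. That is the "must win by a margin" behaviour. The sketch ignores the case θ + m > π, which practical implementations handle separately.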

Slide 11

Slide 11 text

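ArcFace loss
Loss Function
Starting from the ordinary softmax loss:
- normalize & b = 0 → cosine
- scale
- write the positive and negative terms separately
- add the margin
General formula covering the different ways SphereFace, CosFace, and ArcFace apply the margin:
m_1: SphereFace, m_2: ArcFace, m_3: CosFace
Figure: differences in margin at the class decision boundary on the θ axes (from ArcFace*)
*ArcFace: Additive angular margin loss for deep face recognition. [J. Deng+, CVPR'19]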
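For reference, the derivation steps listed above can be written out as follows. The formulas follow the form given in the ArcFace paper; the notation (x_i for the feature, W_j for the class weight, y_i for the GT label, n for the number of classes) is assumed here because the slide equations did not survive extraction.

```latex
% Ordinary softmax loss
L_{\mathrm{softmax}} = -\log \frac{e^{W_{y_i}^{\top} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{\top} x_i + b_j}}

% Normalize x_i and W_j, set b = 0, scale by s, separate positive and negative terms
L_{\mathrm{cos}} = -\log \frac{e^{s \cos\theta_{y_i}}}{e^{s \cos\theta_{y_i}} + \sum_{j \neq y_i} e^{s \cos\theta_j}}

% General margin form: m_1 (SphereFace), m_2 (ArcFace), m_3 (CosFace)
L = -\log \frac{e^{s\,(\cos(m_1 \theta_{y_i} + m_2) - m_3)}}{e^{s\,(\cos(m_1 \theta_{y_i} + m_2) - m_3)} + \sum_{j \neq y_i} e^{s \cos\theta_j}}
```

Setting (m_1, m_2, m_3) = (1, m, 0) gives the ArcFace loss, (1, 0, m) gives CosFace, and (m, 0, 0) gives SphereFace.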

Slide 12

Slide 12 text

02 Face Recognition@ECCV'22
*Figures and tables are taken from the papers being introduced unless otherwise noted.

Slide 13

Slide 13 text

Face Recognition papers@ECCV'22
Seven papers contain "Face Recognition" in the title:
- Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain
- Unsupervised and Semi-supervised Bias Benchmarking in Face Recognition
- Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition
- CoupleFace: Relation Matters for Face Recognition Distillation
- Controllable and Guided Face Synthesis for Unconstrained Face Recognition
- BoundaryFace: A mining framework with noise label self-correction for Face Recognition
- Towards Robust Face Recognition with Comprehensive Search
Topics: privacy concerns specific to face recognition, bias benchmarking of face recognition models, low resolution (LR), lightweight models, robustness, label noise

Slide 14

Slide 14 text

Face Recognition papers@ECCV'22
Seven papers contain "Face Recognition" in the title. This deck covers the four that deal with model improvement (tagged in brackets):
- Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain
- Unsupervised and Semi-supervised Bias Benchmarking in Face Recognition
- Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition [LR, distillation]
- CoupleFace: Relation Matters for Face Recognition Distillation [lightweight, distillation]
- Controllable and Guided Face Synthesis for Unconstrained Face Recognition
- BoundaryFace: A mining framework with noise label self-correction for Face Recognition [label noise]
- Towards Robust Face Recognition with Comprehensive Search [comprehensive search]

Slide 15

Slide 15 text

Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition [LR, distillation]
Proposes an attention-based distillation method (Attention Similarity Knowledge Distillation, A-SKD).
Figure: overall architecture. With A-SKD, the student is trained so that its attention maps, which are intermediate outputs, move closer to the teacher's.
Loss function: a distillation loss is added to the ordinary ArcFace loss.

Slide 16

Slide 16 text

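key1. Distillation Approach for LR images
The idea is to take distillation, which is normally used to train lightweight models, and use it to train a model for low-resolution images.
ref. [M. Zhu+, Low-resolution Visual Recognition via Deep Feature Distillation. ICASSP 2019]
Figure:
- Standard distillation for training a lightweight model: teacher = large model, student = small model, with a distillation loss between them
- Distillation for training a low-resolution image model: the teacher receives the HR image, the student receives the LR image, with a distillation loss between them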

Slide 17

Slide 17 text

key2. Attention module: CBAM
Convolutional Block Attention Module (CBAM)
- channel attention & spatial attention
- can be dropped in as a replacement for an ordinary conv block
[S. Woo+, CBAM: Convolutional Block Attention Module, ECCV 2018]

Slide 18

Slide 18 text

proposal. Attention Similarity Knowledge Distillation (A-SKD)
- A distillation loss that makes CBAM's channel attention and spatial attention similar between teacher and student
  - loss: cosine distance
- Trained in addition to the ArcFace loss
- (A logit distillation loss can also be used together.)
Figure: overall architecture (* only one block is shown), with the spatial-attention and channel-attention maps
Loss function (λ_distill = 5)
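A minimal sketch of how such an attention-similarity loss could be computed is shown below. The slide only states that the loss is a cosine distance on the channel and spatial attention maps, added to the ArcFace loss with λ_distill = 5. The data layout (one dict of flattened maps per block), the averaging of the two distances, and the summation over blocks are assumptions for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two flattened attention maps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def askd_distill_loss(teacher_blocks, student_blocks):
    """Sum over CBAM blocks of the mean channel/spatial attention distance.

    Each block is a dict {"channel": [...], "spatial": [...]} of flattened
    attention values (hypothetical layout).
    """
    total = 0.0
    for t, s in zip(teacher_blocks, student_blocks):
        channel_term = cosine_distance(t["channel"], s["channel"])
        spatial_term = cosine_distance(t["spatial"], s["spatial"])
        total += 0.5 * (channel_term + spatial_term)
    return total


def total_loss(arcface_loss, distill_loss, lambda_distill=5.0):
    """ArcFace loss plus the weighted attention distillation loss."""
    return arcface_loss + lambda_distill * distill_loss


# Toy attention maps for a single block (made-up numbers).
teacher = [{"channel": [0.9, 0.1, 0.5], "spatial": [0.2, 0.8, 0.7, 0.1]}]
student = [{"channel": [0.6, 0.3, 0.5], "spatial": [0.3, 0.6, 0.7, 0.2]}]
print(total_loss(arcface_loss=1.2, distill_loss=askd_distill_loss(teacher, student)))
```

Here the teacher maps would come from the HR-image network and the student maps from the LR-image network, following the distillation setup in key1.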

Slide 19

Slide 19 text

result

Slide 20

Slide 20 text

vs Attention Transfer
[S. Zagoruyko+, Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR 2017]
Figure: comparison with Activation-based Attention Transfer and Gradient-based Attention Transfer (32x32). Attention falls on the eyes, nose, and mouth.

Slide 21

Slide 21 text

object classification / face detection task

Slide 22

Slide 22 text

CoupleFace: Relation Matters for Face Recognition Distillation [lightweight, distillation]
Proposes CoupleFace, which adds a distillation method based on the relations between samples (Mutual Relation Distillation, MRD) to the conventional distillation method based on feature consistency (Feature Consistency Distillation, FCD).
Figure: CoupleFace overall architecture

Slide 23

Slide 23 text

key1. Feature Consistency Distillation (FCD)
An approach that directly minimizes the L2 distance between the features extracted by the teacher model and the student model
→ aligns the feature spaces of the teacher and student models
[X. Wang+, Exclusivity-consistency regularized knowledge distillation for face recognition, ECCV 2020]
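The FCD idea can be sketched in a few lines. This is an illustrative version that averages the Euclidean distance over a batch. Whether the paper uses the squared distance or normalizes the features first is not stated on the slide, so those choices are assumptions.

```python
import math


def fcd_loss(teacher_feats, student_feats):
    """Mean L2 distance between paired teacher and student features."""
    distances = [
        math.sqrt(sum((t - s) ** 2 for t, s in zip(t_feat, s_feat)))
        for t_feat, s_feat in zip(teacher_feats, student_feats)
    ]
    return sum(distances) / len(distances)


# Toy batch of two samples with 2-dimensional features (made-up values).
teacher_batch = [[1.0, 0.0], [0.0, 1.0]]
student_batch = [[0.8, 0.1], [0.2, 0.9]]
print(fcd_loss(teacher_batch, student_batch))
```

Because each student feature is pulled toward the teacher feature of the same image, the loss constrains samples one at a time and says nothing directly about how two different samples relate to each other.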

Slide 24

Slide 24 text

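proposal. Mutual Relation Distillation (MRD)
Mutual Relation Distillation (MRD)
Arguing that FCD does not sufficiently account for the relations between samples, MRD trains the student so that the similarity between any pair of samples is about the same as for the teacher.
* In practice, however, training optimizes R(f_i^s, f_j^t) rather than R(f_i^s, f_j^s).
Instead of using all pairs, pairs that make training more effective are selected
→ Informative Mutual Relation Mining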

Slide 25

Slide 25 text

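proposal. Informative Mutual Relation Mining
Informative Mutual Relation Mining
Before training (done only once):
1. Using the teacher model, build a representative vector r_m for each identity from the entire training set (the mean of the features extracted from the images belonging to that identity).
2. Compute the similarity between identities using the representative vectors, and for each identity build the informative prototype set H_m, the top-k identities that qualify as hard negatives.
During training: refer to H_m, fetch the corresponding features from the feature bank, and compute the mutual relations.
- The memory bank holds one feature sample per identity.
- E: feature bank, updated with a memory-updating strategy
- student mutual relation: R(f_i^s, f_j^t); teacher mutual relation: R(f_i^t, f_j^t)
- The figure shows k = 4, but the experiments use k = 100.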
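The two pre-training steps and the training-time lookup could be sketched as follows. Several details are assumptions for illustration: the relation R is taken to be cosine similarity, the student/teacher gap is measured with an absolute difference, and the identity prototypes stand in for the feature bank. All function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def identity_prototypes(teacher_features, labels):
    """Step 1: r_m = mean teacher feature over all images of identity m."""
    grouped = defaultdict(list)
    for feat, label in zip(teacher_features, labels):
        grouped[label].append(feat)
    return {
        m: [sum(col) / len(feats) for col in zip(*feats)]
        for m, feats in grouped.items()
    }


def informative_prototype_sets(prototypes, k):
    """Step 2: H_m = the k other identities most similar to identity m."""
    sets = {}
    for m, r_m in prototypes.items():
        others = [n for n in prototypes if n != m]
        others.sort(key=lambda n: cosine_similarity(r_m, prototypes[n]), reverse=True)
        sets[m] = others[:k]
    return sets


def mutual_relation_gap(student_feat, teacher_feat, label, proto_sets, feature_bank):
    """Training time: mean |R(f_i^s, f_j^t) - R(f_i^t, f_j^t)| over j in H_label."""
    gaps = [
        abs(cosine_similarity(student_feat, feature_bank[j])
            - cosine_similarity(teacher_feat, feature_bank[j]))
        for j in proto_sets[label]
    ]
    return sum(gaps) / len(gaps)


# Toy data: four teacher features from three identities (made-up values).
teacher_features = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.6], [0.0, 1.0]]
labels = [0, 0, 1, 2]
prototypes = identity_prototypes(teacher_features, labels)
proto_sets = informative_prototype_sets(prototypes, k=1)
feature_bank = prototypes  # stand-in: one teacher feature per identity
print(proto_sets)
```

Restricting the comparison to H_m keeps the cost at k relations per sample instead of one per identity, and focuses training on the identities the teacher itself finds easiest to confuse.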

Slide 26

Slide 26 text

proposal. CoupleFace
CoupleFace = FCD + MRD
Figure: CoupleFace overall architecture
Losses shown on the slide:
- Feature Consistency Distillation (FCD) loss
- Relation-Aware Distillation (RAD) loss
- ArcFace loss
- total loss
Hyperparameters: α = 1.0; β = 0 (CoupleFace), then β = 0.01 after 100k iterations (CoupleFace+); q = 0.03

Slide 27

Slide 27 text

result
- teacher model: ResNet50
- student model: MobileNet v2

Slide 28

Slide 28 text

BoundaryFace: A mining framework with noise label self-correction for Face Recognition [label noise]
Proposes BoundaryFace, which, for noisy datasets, first corrects noisy labels and then performs hard sample mining effectively.
Figure: label noise self-correction (nearest negative class match) and hard sample mining

Slide 29

Slide 29 text

key1. hard label mining for FR
MV-ArcFace (AAAI'20), CurricularFace (CVPR'20)
→ Both assume that the training dataset is clean, with no label noise.
(Reference) previous slides: https://speakerdeck.com/takarasawa_/face-recognition-and-arcface-papers

Slide 30

Slide 30 text

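key2. label noise in FR
Label noise is generally divided into two types:
- closed-set noise (label flip): data that actually belongs to a different class (user) in the dataset
- open-set noise (outlier): data that actually belongs to none of the classes (users)
For closed-set noise, a correct class exists → the sample can become clean data.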

Slide 31

Slide 31 text

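proposal. label self-correction (BoundaryF1)
The authors report observing that "if the model is being trained properly, closed-set noise samples tend to lie inside the decision boundary of their true label", and so they correct labels during training before computing the loss.
↑ The condition is the decision boundary including the margin.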
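One possible reading of this rule is sketched below. The exact condition is not given on the slide, so the check used here (relabel to the nearest negative class k when cos(θ_k + m) exceeds cos θ_y, meaning the sample sits inside class k's boundary even after the margin is applied) is an assumption, as are the function name and the margin value of 0.5.

```python
import math


def self_correct_label(cosines, label, margin=0.5):
    """Return a possibly corrected label for one sample.

    cosines[j] is the cosine similarity between the sample's feature and the
    weight of class j. The relabeling condition is an assumed formalization.
    """
    # Nearest negative class: the non-label class with the highest cosine.
    nearest_cos, nearest_class = max(
        (c, j) for j, c in enumerate(cosines) if j != label
    )
    theta = math.acos(max(-1.0, min(1.0, nearest_cos)))
    # Cosine to the nearest negative class after adding the margin to its angle.
    penalized = math.cos(min(theta + margin, math.pi))
    if penalized > cosines[label]:
        return nearest_class  # treated as closed-set noise: fix the label
    return label


# Toy cases (made-up cosines over three classes, annotated label = 0).
print(self_correct_label([0.10, 0.95, 0.20], label=0))  # far closer to class 1
print(self_correct_label([0.40, 0.60, 0.00], label=0))  # hard, but not relabeled
print(self_correct_label([0.80, 0.50, 0.10], label=0))  # easy sample
```

Including the margin makes the rule conservative: a sample that is only slightly closer to another class stays a hard sample instead of being relabeled.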

Slide 32

Slide 32 text

proposal. hard sample mining
BoundaryFace = label self-correction + hard sample mining
With self-correction in place, hard sample mining is introduced that directly increases the loss for data lying inside the margin.
Figure: relative to the negative class nearest to the labeled class, a sample is treated as an easy sample, a hard sample, or closed-set noise (label self-correction).

Slide 33

Slide 33 text

proposal. BoundaryFace
BoundaryFace = label self-correction + hard sample mining
- noise label self-correction: labels of samples that meet the condition are corrected
- hard sample mining

Slide 34

Slide 34 text

result
- clean dataset: CASIA WebFace
- closed-set noise: labels are randomly swapped
- open-set noise: samples are randomly replaced with data from the MegaFace dataset
* BoundaryF1: label self-correction only

Slide 35

Slide 35 text

Towards Robust Face Recognition with Comprehensive Search
To optimize the combination of data cleaning, loss function, and backbone, the paper defines a search space and performs a comprehensive search using reinforcement learning.

Slide 36

Slide 36 text

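Search space 1: data cleaning (ref. slide 30)
The handling of each label-noise type is generalized, and the data cleaning strategy is defined by two threshold hyperparameters:
- closed-set noise (label flip) → inter-class merging: whether the class center similarity is greater than τ_inter
- open-set noise (outlier) → intra-class filtering: discriminability is defined by the formula on the slide, and the test is whether it is greater than τ_intra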

Slide 37

Slide 37 text

Search space 2: loss function (ref. slide 11)
Defined using the hyperparameters of the generalized margin-based formula:
- m_1, m_2, m_3, s_p, s_n

Slide 38

Slide 38 text

Search space 3: backbone
The backbone space is defined by two factors, depth and width, following the idea of EfficientNet. MobileNet is used as the base.
[M. Tan+, EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML 2019]

Slide 39

Slide 39 text

result
- training dataset: MS1MV2
- validation dataset: MegaFace verification benchmark
- accuracy score used for the search:
  - TAR@FAR at 10^-3, 10^-4, 10^-5, weighted by 0.5, 0.25, 0.25 respectively
- search process:
  - around 1,000 samples to converge
  - around 37 GPU days (NVIDIA A100, FP16 training)
- baseline: ArcFace, MobileNet
Notes:
- Models are constrained to roughly the same size (FLOPs).
- Among the single-factor searches, loss search gives the largest gain.
- The margin found by the search is smaller than one might expect.
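The search score is a simple weighted sum, which can be written as below. The weights come from the slide; the function name and the TAR values in the example are made up for illustration and are not results from the paper.

```python
# Weights from the slide: TAR@FAR at 1e-3, 1e-4, 1e-5.
SCORE_WEIGHTS = {"1e-3": 0.5, "1e-4": 0.25, "1e-5": 0.25}


def search_score(tar_at_far):
    """Weighted accuracy score used as the reward signal for the search."""
    return sum(weight * tar_at_far[far] for far, weight in SCORE_WEIGHTS.items())


# Hypothetical TAR values for one sampled configuration.
example_tar = {"1e-3": 0.98, "1e-4": 0.96, "1e-5": 0.92}
print(search_score(example_tar))
```

Half of the weight goes to the loosest operating point (FAR = 10^-3), with the two stricter points sharing the rest.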

Slide 40

Slide 40 text

Optimization process
Red dotted line: baseline accuracy

Slide 41

Slide 41 text

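Best matches analysis
Difficulty with respect to data and loss is defined by the formulas on the slide, and the relationship between them is plotted.
The claim is that larger models appear to learn better under more difficult conditions.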

Slide 42

Slide 42 text

Face Recognition papers@ECCV'22
Seven papers contain "Face Recognition" in the title:
- Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain
- Unsupervised and Semi-supervised Bias Benchmarking in Face Recognition
- Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition
- CoupleFace: Relation Matters for Face Recognition Distillation
- Controllable and Guided Face Synthesis for Unconstrained Face Recognition
- BoundaryFace: A mining framework with noise label self-correction for Face Recognition
- Towards Robust Face Recognition with Comprehensive Search
<loss, backbone memo> ArcFace is still the default loss. Per-paper notes as they appear on the slide:
- loss: ArcFace, backbone: ResNet
- loss: Sub-center ArcFace, backbone: ResNet
- loss: ArcFace, backbone: MobileNet, ResNet
- loss: ArcFace, backbone: ResNet
- loss: ArcFace, backbone: ResNet
- loss: ArcFace, backbone: MobileNet
- loss: ArcFace, backbone: ResNet with CBAM
Topics: privacy concerns specific to face recognition, bias benchmarking of face recognition models, low resolution (LR), lightweight models, robustness, label noise