Face Recognition @ ECCV2022

2023.02.09 Takumi Karasawa, DeNA Co., Ltd. + Mobility Technologies Co., Ltd. Face Recognition@ECCV’22

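2 Introduction
A catch-up on face recognition research at ECCV 2022.
Karasawa
- Joined DeNA as a new graduate in 2019 → MoT
- Fourth year on the DRIVE CHART CV team (face recognition, in-cabin camera, outward-facing camera)
- Plays tennis once or twice a week 🎾 🎾 🎾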

3 Previous deck: https://speakerdeck.com/takarasawa_/face-recognition-and-arcface-papers

4 Contents 01｜Face Recognition (FR) 02｜Face Recognition@ECCV’22

5 01 Face Recognition (FR)

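6 Face Recognition (FR)
Face recognition generally refers to the following two tasks:
- Face Identification (1:N): identify which person a face image belongs to
- Face Verification (1:1): decide whether face images show the same person
From a training perspective, the two currently do not call for noticeably different methods.
open/closed-set: the setting in which classes unseen during training do / do not appear at inference time.
Figure: differences between face recognition settings (from SphereFace*)
*SphereFace: Deep hypersphere embedding for face recognition. [W. Liu+, CVPR’17]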
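The two tasks above can be sketched on top of the same face features. This is a minimal illustration in plain Python, assuming features have already been extracted; the function names, the toy 3-dimensional features, and the threshold of 0.5 are hypothetical and not taken from the deck.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(feat_a, feat_b, threshold=0.5):
    """Face Verification (1:1): are the two features the same person?"""
    return cosine_similarity(feat_a, feat_b) >= threshold

def identify(probe, gallery, threshold=0.5):
    """Face Identification (1:N): best-matching gallery identity.

    Returns None when no gallery entry is similar enough, which is how an
    open-set setting (unknown people at inference time) can be handled.
    """
    best_name, best_sim = None, -1.0
    for name, feat in gallery.items():
        sim = cosine_similarity(probe, feat)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None

# Toy gallery with hypothetical 3-dim features (real features are e.g. 512-dim).
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
probe = [0.9, 0.1, 0.0]
print(identify(probe, gallery), verify(probe, gallery["bob"]))
```

Both functions share the same similarity measure, which matches the remark that the two tasks do not need different training methods.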

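7 Face Recognition / Metric Learning
Most recent methods do not use facial landmarks when learning face features, so they can be treated as general metric learning methods.
In face recognition, however, it is customary to normalize the face with facial landmarks as a preprocessing step.
https://github.com/deepinsight/insightface
↑ The eye positions are roughly aligned.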

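8 Margin-based metric learning methods, represented by ArcFace
Roughly speaking: methods that achieve metric learning by training in a classification framework after applying a margin penalty to the prediction for the ground-truth (GT) class.
→ The model has to be correct while beating the other classes by at least the margin.
- In margin-based methods, "closeness" is measured by cosine similarity
  ▪ as a distance, this is (1 - cos)
- SphereFace, CosFace, and ArcFace differ in how the margin penalty is applied
  ▪ plus finer points such as whether the features are normalized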

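9 ArcFace model structure
Assuming the commonly used settings:
- image size after preprocessing: (112, 112)
- face feature dimension: 512
Diagram: Pre-processed Image (112 × 112 × 3) → Feature Extractor (CNN) → Feature Map (4 × 4 × N-dim; N = 512/1280/.., backbone-dependent) → Neck (GAP, BN, Dropout, FC, BN) → Face Feature (512), normalized as x_i / ‖x_i‖ → ArcFace Head (FC, N-class)
- Face feature extraction: everything up to the 512-dim face feature.
- The feature map is 4 × 4 at this stage; with Flatten instead of GAP the neck input is N-dim × 16.
- Inserting this neck is fairly common (though it may simply be convention).
- The ArcFace head is the main part of the method.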

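10 ArcFace Head (the main part of the method)
Diagram: Face Feature (512) × FC Weight W (512 × N-class) → Cosine Similarity cos θ_{y_i} → margin cos(θ_{y_i} + m) → Scale & SoftMax → prob → Cross-entropy loss against GT y_i
- Feature and weights are both normalized (the weights as W_j / ‖W_j‖), so this inner product is the same as computing the cosine similarity between the feature and each class weight.
- The margin penalty is added only to the similarity of the ground-truth label (hyperparameter 1).
- The logits are too small as they are, so they are scaled (hyperparameter 2).
- The FC layer has no bias (b = 0).
→ Through training, each weight vector effectively becomes the representative vector of its class.
→ The prediction is expressed entirely by the angle between the feature and the weights.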
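The head described above can be sketched for a single sample as follows. This is a plain-Python illustration rather than the deck's or insightface's implementation; the defaults margin=0.5 and scale=64 are the values commonly quoted for ArcFace and are assumptions here, and the case where θ + m exceeds π (which real implementations handle explicitly) is ignored.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def arcface_logits(feature, class_weights, label, margin=0.5, scale=64.0):
    """Scaled cosine logits with the angular margin added to the GT class only."""
    f = l2_normalize(feature)
    logits = []
    for j, w in enumerate(class_weights):
        # No bias, both sides normalized: the inner product is cos(theta_j).
        cos_theta = sum(a * b for a, b in zip(f, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if j == label:
            cos_theta = math.cos(math.acos(cos_theta) + margin)  # hyperparameter 1
        logits.append(scale * cos_theta)  # hyperparameter 2
    return logits

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]

# Two classes whose weight vectors act as class representatives (toy values).
weights = [[1.0, 0.0], [0.0, 1.0]]
feature = [1.0, 1.0]  # equally close to both classes
loss_with_margin = cross_entropy(arcface_logits(feature, weights, label=0), 0)
loss_without_margin = cross_entropy(arcface_logits(feature, weights, 0, margin=0.0), 0)
print(loss_with_margin > loss_without_margin)
```

With the margin, a sample sitting exactly on the plain decision boundary is still penalized heavily, which is the "win by a margin" behavior the earlier slide describes.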

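11 ArcFace loss
Loss Function, derived step by step from the ordinary softmax loss:
- normalize & b = 0 → cosine
- scale
- write the positive and negative terms separately
- apply the margin
General form covering the different ways SphereFace, CosFace, and ArcFace apply the margin: m_1: SphereFace, m_2: ArcFace, m_3: CosFace
Figure: differences in the margin at the class boundary on the θ axes (from ArcFace*)
*ArcFace: Additive angular margin loss for deep face recognition. [J. Deng+, CVPR’19]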
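The derivation steps listed on this slide can be written out as below. The slide's own equations are not part of the transcript, so this is a reconstruction in the notation of the ArcFace paper's combined-margin form; N (batch size), C (number of classes), and the per-sample angle θ_j between x_i and W_j are notation assumed here.

```latex
\begin{align}
% ordinary softmax loss
L &= -\frac{1}{N}\sum_{i=1}^{N}
     \log\frac{e^{W_{y_i}^{\top}x_i + b_{y_i}}}{\sum_{j=1}^{C} e^{W_j^{\top}x_i + b_j}} \\
% normalize x_i and W_j, set b = 0: W_j^T x_i = cos(theta_j); then scale by s
L &= -\frac{1}{N}\sum_{i=1}^{N}
     \log\frac{e^{s\cos\theta_{y_i}}}{\sum_{j=1}^{C} e^{s\cos\theta_j}} \\
% write the positive (GT) and negative terms separately
L &= -\frac{1}{N}\sum_{i=1}^{N}
     \log\frac{e^{s\cos\theta_{y_i}}}
              {e^{s\cos\theta_{y_i}} + \sum_{j\neq y_i} e^{s\cos\theta_j}} \\
% apply the margins to the positive term only
L &= -\frac{1}{N}\sum_{i=1}^{N}
     \log\frac{e^{s\left(\cos(m_1\theta_{y_i} + m_2) - m_3\right)}}
              {e^{s\left(\cos(m_1\theta_{y_i} + m_2) - m_3\right)} + \sum_{j\neq y_i} e^{s\cos\theta_j}}
\end{align}
```

Setting (m_1, m_2, m_3) = (m, 0, 0) gives the SphereFace-style multiplicative angular margin, (1, m, 0) the ArcFace additive angular margin, and (1, 0, m) the CosFace additive cosine margin.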

12 02 Face Recognition@ECCV’22 *Figures and tables are taken from the papers being introduced unless otherwise noted.

13 Face Recognition papers@ECCV’22
Seven papers contain "Face Recognition" in the title:
- Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain
- Unsupervised and Semi-supervised Bias Benchmarking in Face Recognition
- Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition
- CoupleFace: Relation Matters for Face Recognition Distillation
- Controllable and Guided Face Synthesis for Unconstrained Face Recognition
- BoundaryFace: A mining framework with noise label self-correction for Face Recognition
- Towards Robust Face Recognition with Comprehensive Search
Topics: privacy concerns specific to face recognition, bias benchmarking of face recognition models, low resolution (LR), lightweight models, robustness, label noise.

14 Face Recognition papers@ECCV’22
This deck introduces the four model-improvement papers (tagged below):
- Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain
- Unsupervised and Semi-supervised Bias Benchmarking in Face Recognition
- Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition [LR, distillation]
- CoupleFace: Relation Matters for Face Recognition Distillation [lightweight, distillation]
- Controllable and Guided Face Synthesis for Unconstrained Face Recognition
- BoundaryFace: A mining framework with noise label self-correction for Face Recognition [label noise]
- Towards Robust Face Recognition with Comprehensive Search [comprehensive search]

15 Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition
[LR, distillation]
Proposes an attention-based distillation method, Attention Similarity Knowledge Distillation (A-SKD).
- With A-SKD, the student is trained so that its attention maps (intermediate outputs) move closer to the teacher's.
- A distillation loss is added to the usual ArcFace loss.
Figures: overall overview; loss function

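16 key1. Distillation Approach for LR images
The idea: distillation, generally used for training lightweight models, is applied to training a model for low-resolution images.
ref. [M. Zhu+, Low-resolution Visual Recognition via Deep Feature Distillation. ICASSP 2019]
Diagram:
- Typical distillation for training a lightweight model: teacher = large model, student = small model, connected by a distillation loss
- Distillation for training a low-resolution image model: teacher takes the HR image, student takes the LR image, connected by a distillation loss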

17 key2. Attention module: CBAM
Convolutional Block Attention Module (CBAM)
- channel attention & spatial attention
- can be adopted by swapping it in for an ordinary conv block
[S. Woo+, CBAM: Convolutional Block Attention Module, ECCV 2018]

18 proposal. Attention Similarity Knowledge Distillation (A-SKD)
- A distillation loss makes the student's CBAM channel attention and spatial attention similar to the teacher's
  ▪ loss: cosine distance
- trained by adding it to the ArcFace loss
- (a distillation loss on the logits can also be used together)
Loss function (λ_distill = 5)
Figure: overall overview, showing spatial-attention and channel-attention (*only one block is drawn)
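The loss described on this slide can be sketched as follows. It is a plain-Python illustration under assumptions: attention maps are given as flattened lists, the two attention types are averaged with equal weight, and blocks are averaged uniformly; only the cosine distance and λ_distill = 5 come from the slide.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two flattened attention maps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def attention_distill_loss(teacher_blocks, student_blocks):
    """Mean cosine distance over blocks.

    Each block is a (channel_attention, spatial_attention) pair; the equal
    weighting of the two attention types is an assumption of this sketch.
    """
    total = 0.0
    for (t_ch, t_sp), (s_ch, s_sp) in zip(teacher_blocks, student_blocks):
        total += 0.5 * (cosine_distance(t_ch, s_ch) + cosine_distance(t_sp, s_sp))
    return total / len(teacher_blocks)

def total_loss(arcface_loss, distill_loss, lambda_distill=5.0):
    """ArcFace loss plus the weighted attention distillation loss."""
    return arcface_loss + lambda_distill * distill_loss

# Toy example: one block, teacher (HR input) vs student (LR input) attention.
teacher = [([0.9, 0.1, 0.5], [0.2, 0.8, 0.8, 0.2])]
student = [([0.8, 0.2, 0.5], [0.3, 0.7, 0.6, 0.4])]
print(total_loss(arcface_loss=1.0, distill_loss=attention_distill_loss(teacher, student)))
```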

19 result

20 vs Attention Transfer
[S. Zagoruyko+, Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR 2017]
- Activation-based Attention Transfer
- Gradient-based Attention Transfer
Figure (32x32): attention falls on the eyes, nose, and mouth.

21 object classification / face detection task

22 CoupleFace: Relation Matters for Face Recognition Distillation
[lightweight, distillation]
Proposes CoupleFace, which adds a distillation method based on relations between samples (Mutual Relation Distillation, MRD) to the conventional distillation method based on feature consistency (Feature Consistency Distillation, FCD).
Figure: CoupleFace overall overview

23 key1. Feature Consistency Distillation (FCD)
An approach that directly minimizes the L2 distance between the features extracted by the teacher model and by the student model.
→ aligns the feature spaces of the teacher and student models
[X. Wang+, Exclusivity-consistency regularized knowledge distillation for face recognition, ECCV 2020]
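As a rough illustration of FCD, the sketch below averages the squared L2 distance between teacher and student features over a batch. L2-normalizing the features first and using the squared distance are assumptions of this sketch; the slide only states that the L2 distance is minimized.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def fcd_loss(teacher_features, student_features):
    """Mean squared L2 distance between paired teacher/student features."""
    total = 0.0
    for t, s in zip(teacher_features, student_features):
        t_n, s_n = l2_normalize(t), l2_normalize(s)
        total += sum((a - b) ** 2 for a, b in zip(t_n, s_n))
    return total / len(teacher_features)

# Toy batch of two samples with 2-dim features.
teacher_batch = [[1.0, 0.0], [0.0, 2.0]]
student_batch = [[0.8, 0.6], [0.0, 1.0]]
print(fcd_loss(teacher_batch, student_batch))
```

Because the loss compares features sample by sample, it says nothing about how samples relate to each other, which is the gap the next slide's MRD targets.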

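24 proposal. Mutual Relation Distillation (MRD)
FCD is argued not to take the relations between samples sufficiently into account, so MRD trains the student so that the similarity between any pair of samples is about the same as for the teacher.
*In the actual training, however, R(f_i^s, f_j^t) is optimized rather than R(f_i^s, f_j^s).
Pairs are not taken over all combinations; they are chosen so that training becomes more effective → Informative Mutual Relation Mining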

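25 proposal. Informative Mutual Relation Mining
Before training (done only once):
1. Using the teacher model, build a representative vector r_m for each identity from all training data (the mean of the features extracted from the images belonging to the same identity).
2. Compute the similarity between identities from the representative vectors and, for each identity, build the informative prototype set H_m of the top-k identities that can be regarded as hard negatives.
During training: look up H_m, fetch the corresponding features from the feature bank, and compute the mutual relations.
- The memory bank holds one feature sample per identity.
E: feature bank, memory-updating strategy
- student mutual relation: R(f_i^s, f_j^t)
- teacher mutual relation: R(f_i^t, f_j^t)
The figure shows k = 4, but the experiments use k = 100.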
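The two pre-training steps and the training-time lookup above can be sketched as follows. This is a plain-Python illustration with hypothetical function names; using cosine similarity for R and a mean squared difference between student and teacher relations are assumptions of this sketch, and the memory-updating strategy for the feature bank is left out.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_prototypes(teacher_features_by_identity):
    """Step 1: r_m = mean teacher feature over the images of identity m."""
    prototypes = {}
    for identity, feats in teacher_features_by_identity.items():
        dim = len(feats[0])
        prototypes[identity] = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    return prototypes

def mine_informative_sets(prototypes, k):
    """Step 2: H_m = the k identities most similar to m (hard negatives)."""
    hard_sets = {}
    for m, r_m in prototypes.items():
        scored = [(cosine_similarity(r_m, r_n), n) for n, r_n in prototypes.items() if n != m]
        scored.sort(reverse=True)
        hard_sets[m] = [n for _, n in scored[:k]]
    return hard_sets

def mutual_relation_loss(f_student, f_teacher, identity, hard_sets, feature_bank):
    """Training time: compare R(f_i^s, f_j^t) with R(f_i^t, f_j^t) for j in H_m."""
    diffs = []
    for n in hard_sets[identity]:
        bank_feature = feature_bank[n]  # teacher-side feature f_j^t
        r_student = cosine_similarity(f_student, bank_feature)
        r_teacher = cosine_similarity(f_teacher, bank_feature)
        diffs.append((r_student - r_teacher) ** 2)
    return sum(diffs) / len(diffs)

# Toy data: three identities with 2-dim teacher features.
teacher_feats = {"a": [[1.0, 0.0], [1.0, 0.0]], "b": [[0.9, 0.1]], "c": [[0.0, 1.0]]}
prototypes = build_prototypes(teacher_feats)
hard_sets = mine_informative_sets(prototypes, k=1)
print(hard_sets)
```

In the toy data, identities "a" and "b" are near-duplicates, so each ends up in the other's informative set, while the clearly different "c" is never selected as a hard negative for "a".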

26 proposal. CoupleFace
CoupleFace = FCD + MRD
Losses shown on the slide: ArcFace loss, Feature Consistency Distillation (FCD) loss, Relation-Aware Distillation (RAD) loss, and the total loss.
- α = 1.0
- β = 0 (CoupleFace); β = 0.01 after 100k iterations (CoupleFace+)
- q = 0.03
Figure: CoupleFace overall overview

27 result
- teacher model: ResNet50
- student model: MobileNetV2

28 BoundaryFace: A mining framework with noise label self-correction for Face Recognition
[label noise]
Proposes BoundaryFace, which first corrects noisy labels in a noisy dataset and then performs hard sample mining effectively.
Figure labels: label noise self-correction; nearest negative class match; hard sample mining

29 key1. hard sample mining for FR
MV-ArcFace (AAAI’20), CurricularFace (CVPR’20)
→ These assume that the training dataset is clean, with no label noise.
(Reference) previous deck: https://speakerdeck.com/takarasawa_/face-recognition-and-arcface-papers

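30 key2. label noise in FR
Label noise is generally divided into two types:
- closed-set noise (label flip): data that actually belongs to a user of another class
- open-set noise (outlier): data that actually belongs to no user of any class
For closed-set noise a correct class exists → the sample can become clean data.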

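31 proposal. label self-correction (BoundaryF1)
Based on the observation that, if the model is being trained properly, closed-set noise samples tend to lie inside the decision boundary of their true label, the label is corrected during training before the loss is computed.
↑ The condition uses the decision boundary including the margin.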
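The correction rule above can be sketched for one sample as follows. The exact condition is an assumption of this sketch: the label is switched to the nearest negative class k when cos(θ_k + m) > cos(θ_y), that is, when the sample would be classified as k even with the margin penalty applied to k. The function name and the margin default of 0.5 are hypothetical.

```python
import math

def self_correct_label(cos_thetas, label, margin=0.5):
    """Return the (possibly corrected) label for one sample.

    cos_thetas[j] is the cosine similarity between the sample's feature and
    the weight vector of class j. The condition used here is an assumed form
    of the margin-inclusive decision boundary.
    """
    negatives = [j for j in range(len(cos_thetas)) if j != label]
    nearest = max(negatives, key=lambda j: cos_thetas[j])  # nearest negative class
    theta = math.acos(max(-1.0, min(1.0, cos_thetas[nearest])))
    penalized = math.cos(min(theta + margin, math.pi))  # keep the angle within [0, pi]
    if penalized > cos_thetas[label]:
        return nearest  # treated as closed-set noise: relabel
    return label

# Sample labeled as class 0 but sitting very close to class 1: relabeled.
print(self_correct_label([0.1, 0.95, 0.0], label=0))
# Sample near the boundary between class 0 and class 1: label kept.
print(self_correct_label([0.6, 0.7, 0.0], label=0))
```

The second case shows why the margin matters: the sample is slightly closer to class 1, but not by enough to clear the margin-inclusive boundary, so it stays a (hard) sample of class 0 rather than being relabeled.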

32 proposal. hard sample mining
BoundaryFace = label self-correction + hard sample mining
With self-correction in place, hard sample mining is introduced that directly increases the loss for data lying inside the margin.
Figure labels: the negative class nearest to the labeled class; treated as an easy sample; treated as a hard sample; treated as closed-set noise (label self-correction)

33 proposal. BoundaryFace
BoundaryFace = label self-correction + hard sample mining
- noise label self-correction: labels of samples that meet the condition are corrected
- hard sample mining

34 result
- clean dataset: CASIA-WebFace
- closed-set noise: labels are swapped at random
- open-set noise: samples are replaced at random with data from the MegaFace dataset
*BoundaryF1: label self-correction only

35 Towards Robust Face Recognition with Comprehensive Search
For optimizing the combination of data cleaning, loss function, and backbone: defines the search space and performs a comprehensive search using reinforcement learning.

36 Search space 1: data cleaning
The handling of each label-noise type is generalized, and the data cleaning strategy is defined by two threshold hyperparameters. (ref. slide 30)
- closed-set noise (label flip) → inter-class merging: whether the class center similarity is greater than τ_inter
- open-set noise (outlier) → intra-class filtering: whether the discriminability, as defined on the slide, is greater than τ_intra
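The inter-class merging half of this search space can be sketched as follows: classes whose centers are more similar than τ_inter are merged into one identity. The function name, the union-find grouping, and the threshold value are assumptions of this sketch; intra-class filtering is omitted because the discriminability definition is not part of the transcript.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def merge_classes(class_centers, tau_inter):
    """Map each class id to a merged class id.

    Two classes are merged when the cosine similarity of their centers
    exceeds tau_inter; merges are transitive (union-find).
    """
    ids = list(class_centers)
    parent = {c: c for c in ids}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if cosine_similarity(class_centers[a], class_centers[b]) > tau_inter:
                parent[find(b)] = find(a)
    return {c: find(c) for c in ids}

# Toy centers: id0 and id1 are almost identical, id2 is a different person.
centers = {"id0": [1.0, 0.0], "id1": [0.99, 0.05], "id2": [0.0, 1.0]}
print(merge_classes(centers, tau_inter=0.9))
```

A lower τ_inter merges more aggressively, which is why the threshold is treated as a searchable hyperparameter rather than a fixed constant.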

37 Search space 2: loss function
Defined with the hyperparameters of the generalized margin-based formula (ref. slide 11):
- m_1, m_2, m_3, s_p, s_n

38 Search space 3: backbone
Following the idea of EfficientNet, the backbone is defined by two factors, depth and width, with MobileNet as the base.
[M. Tan+, EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML 2019]

39 result
- training dataset: MS1MV2
- validation dataset: MegaFace verification benchmark
- accuracy score used for the search:
  ▪ TAR@FAR at 10^-3, 10^-4, 10^-5, weighted by 0.5, 0.25, 0.25 respectively
- search process:
  ▪ around 1,000 samples to converge
  ▪ around 37 GPU days (NVIDIA A100, FP16 training)
- baseline: ArcFace, MobileNet
Notes:
- Models are constrained to roughly the same size (FLOPs).
- Among the single-component searches, the loss search gives the largest gain.
- The searched margin is surprisingly small.
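The weighted accuracy score used as the search signal can be written as a one-line computation. The weights come from the slide; the function name and the TAR values in the example are hypothetical.

```python
def search_score(tar_at_far):
    """Weighted sum of TAR at FAR = 1e-3, 1e-4, 1e-5 (weights 0.5, 0.25, 0.25)."""
    weights = {"1e-3": 0.5, "1e-4": 0.25, "1e-5": 0.25}
    return sum(weight * tar_at_far[far] for far, weight in weights.items())

# Hypothetical TAR values for one sampled configuration.
example_tars = {"1e-3": 0.98, "1e-4": 0.96, "1e-5": 0.92}
print(search_score(example_tars))  # 0.5*0.98 + 0.25*0.96 + 0.25*0.92
```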

40 Optimization process
Red dotted line: baseline accuracy

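41 Best matches analysis
The difficulty of the data and of the loss is defined as on the slide, and the relationship between them is plotted.
Claim: larger models appear to learn better under more difficult conditions.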

42 Face Recognition papers@ECCV’22
<loss, backbone notes> ArcFace is still the standard choice.
- Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain
- Unsupervised and Semi-supervised Bias Benchmarking in Face Recognition
- Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition
- CoupleFace: Relation Matters for Face Recognition Distillation
- Controllable and Guided Face Synthesis for Unconstrained Face Recognition
- BoundaryFace: A mining framework with noise label self-correction for Face Recognition
- Towards Robust Face Recognition with Comprehensive Search
Notes, in the order captured from the slide:
- loss: ArcFace, backbone: ResNet
- loss: Sub-center ArcFace, backbone: ResNet
- loss: ArcFace, backbone: MobileNet, ResNet
- loss: ArcFace, backbone: ResNet
- loss: ArcFace, backbone: ResNet
- loss: ArcFace, backbone: MobileNet
- loss: ArcFace, backbone: ResNet with CBAM
Topics: privacy concerns specific to face recognition, bias benchmarking of face recognition models, low resolution (LR), lightweight models, robustness, label noise.
