– The baseline's performance is unfairly underrated, preventing a fair comparison
  • e.g., no data augmentation is applied
– Domain shift in the data is not considered
  • Even the novel (unseen) classes are sampled from the same dataset for evaluation

Meta-Learning Models Taxonomy (Oriol Vinyals, NIPS '17; adapted from Finn '17)
• Model based: Santoro et al. '16, Duan et al. '17, Wang et al. '17, Munkhdalai & Yu '17, Mishra et al. '17
• Metric based: Koch '15, Vinyals et al. '16, Snell et al. '17, Shyam et al. '17, Sung et al. '17
• Optimization based: Schmidhuber '87, '92, Bengio et al. '90, '92, Hochreiter et al. '01, Li & Malik '16, Andrychowicz et al. '16, Ravi & Larochelle '17, Finn et al. '17
Figure source: Vinyals, Oriol. NIPS 2017 Meta-Learning Symposium.
, and learn a classifier C(·|W_b)
  • Base class: classes distinct from the ones we want to classify few-shot (assumed to come with abundant labeled data)
– Fine-tuning stage
  • Freeze f_θ and train a classifier C(·|W_n) on novel-class data
  • Novel class: the classes we actually want to classify few-shot (assumed to have only a few labeled examples)

Published as a conference paper at ICLR 2019
[Figure 1: Baseline and Baseline++ few-shot classification methods. Training stage: a feature extractor plus a classifier (linear layer + softmax for Baseline; cosine distance + softmax for Baseline++) is trained on base-class data (many). Fine-tuning stage: the feature extractor is fixed and a new classifier is trained on novel-class data (few).]
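As a concrete illustration, Baseline++ replaces the linear classification layer with the cosine similarity between the extracted feature and one learnable weight vector per class. A minimal NumPy sketch; the function name and the scale factor are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def cosine_classifier_logits(features, weights, scale=10.0):
    """Baseline++-style classifier head: cosine similarity between each
    feature vector and one learnable weight vector per class (both
    L2-normalized), optionally scaled before the softmax."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return scale * f @ w.T  # shape: (num_examples, num_classes)

# toy check: a feature aligned with the class-0 weight scores highest for class 0
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
W = np.array([[2.0, 0.0], [0.0, 3.0]])  # magnitudes cancel after normalization
logits = cosine_classifier_logits(feats, W)
```

During fine-tuning only `W` (for the novel classes) would be trained, with the feature extractor producing `features` kept frozen.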
set (labeled data from K classes, N examples each) as a cue, classify the query set (unlabeled data) into one of the K classes
– Meta-training stage
  • Sample support and query sets from the base classes so as to match the meta-testing situation
  • Train the feature extractor f_θ so that the sampled query set is classified correctly with reference to the support set

[Figure 2: Meta-learning few-shot classification algorithms. The meta-learning classifier M(·|S) is conditioned on the support set S. In the meta-training stage, the support and query sets are sampled from the base classes; the methods differ in how queries are compared against the support set (MatchingNet: cosine distance; ProtoNet: Euclidean distance to class means; RelationNet: relation module; MAML: gradient-based adaptation).]
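The episodic sampling used in meta-training can be sketched as follows; `sample_episode` is a hypothetical helper, assuming base-class data is stored as a dict from class label to a list of examples:

```python
import random

def sample_episode(labeled_data, n_way, k_shot, n_query, rng=random):
    """Sample one N-way K-shot episode (support + query) from base-class data.
    labeled_data: dict {class_label: [examples]}.
    Episode labels are re-indexed 0..n_way-1, as in standard episodic training."""
    classes = rng.sample(sorted(labeled_data), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        examples = rng.sample(labeled_data[cls], k_shot + n_query)
        support += [(x, episode_label) for x in examples[:k_shot]]
        query += [(x, episode_label) for x in examples[k_shot:]]
    return support, query

# toy data: 10 base classes with 20 (dummy) examples each
data = {c: list(range(c * 100, c * 100 + 20)) for c in range(10)}
support, query = sample_episode(data, n_way=5, k_shot=1, n_query=3)
```

Meta-testing uses the same sampling, but drawing the classes from the novel set instead of the base set.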
× K-sample support and query sets are each passed through f_θ for feature extraction
  • Minimize the cross-entropy loss based on cosine distance
– ProtoNet [Snell+, NIPS2017]
  • The N-class × K-sample support and query sets are each passed through f_θ for feature extraction
  • Average the support-set feature vectors per class to obtain N prototypes (vectors)
  • Minimize the cross-entropy loss based on the Euclidean distance between query vectors and the prototypes
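The ProtoNet episode loss described above (class-mean prototypes, softmax over negative squared Euclidean distances) can be sketched in NumPy; the helper name is illustrative:

```python
import numpy as np

def prototypical_loss(support_feats, support_labels, query_feats, query_labels):
    """ProtoNet episode loss: build one prototype per class as the mean of
    its support features, then cross-entropy over the softmax of negative
    squared Euclidean distances from each query to the prototypes."""
    classes = np.unique(support_labels)
    protos = np.stack([support_feats[support_labels == c].mean(axis=0)
                       for c in classes])                               # (N, D)
    # squared Euclidean distance from every query to every prototype
    d2 = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)  # (Q, N)
    logits = -d2
    log_p = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    return -log_p[np.arange(len(query_labels)), query_labels].mean()

# toy check: a query close to the class-0 prototype yields a near-zero loss
sf = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
sl = np.array([0, 0, 1, 1])
loss = prototypical_loss(sf, sl, np.array([[0.0, 1.0]]), np.array([0]))
```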
• Minimize the cross-entropy loss based on the score of a relation module parameterized by a neural network
– MAML [Finn+, ICML2017]
  • Learn initial model parameters such that, after fine-tuning on the support set (a few labeled examples), the prediction error on the query set is small
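MAML's bi-level structure (an inner gradient step on the support loss, an outer update of the initialization differentiated through that step) can be shown on a deliberately tiny scalar problem. This toy setup is only an illustration of the update rule, not the paper's experimental protocol:

```python
import random

def maml_scalar(tasks, meta_lr=0.05, inner_lr=0.1, steps=200, rng=random):
    """Minimal full MAML on a scalar parameter theta with per-task loss
    (theta - t)**2. Each task is a (support_target, query_target) pair."""
    theta = 0.0
    for _ in range(steps):
        t_support, t_query = rng.choice(tasks)
        # inner loop: one gradient step on the support loss
        theta_adapted = theta - inner_lr * 2.0 * (theta - t_support)
        # outer loop: gradient of the query loss w.r.t. the INITIAL theta,
        # differentiating through the inner step (d theta'/d theta = 1 - 2*inner_lr)
        grad = 2.0 * (theta_adapted - t_query) * (1.0 - 2.0 * inner_lr)
        theta -= meta_lr * grad
    return theta

# two near-identical tasks whose targets cluster around 5: the learned
# initialization should settle near that cluster
tasks = [(4.9, 5.1), (5.1, 4.9)]
theta0 = maml_scalar(tasks)
```

In the paper's setting, theta is the full set of network weights and each task is one sampled support/query episode; only the loss and the dimensionality change, not the structure of the two loops.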
4-layer CNN (Conv-4)
– Settings per method
  • Baseline vs. Baseline*: with vs. without data augmentation
  • ProtoNet vs. ProtoNet#: meta-trained 5-way vs. 30-way (1-shot) / 20-way (5-shot)
• Observations
  – The Baseline improves with data augmentation, so its reported values are underestimated
  – Once Baseline++ is included in the comparison, it is competitive with the SOTA methods

Table 1: Validating the re-implementation of all few-shot methods on mini-ImageNet with a Conv-4 backbone. Values are the mean over 600 randomly generated test episodes with 95% confidence intervals. The reproduced results do not fall behind the published results by more than 2%; the slight discrepancy is attributed to different random seeds and minor implementation differences. "Baseline*" denotes results without data augmentation during training; "ProtoNet#" performs 30-way classification in 1-shot and 20-way in 5-shot during meta-training.

| Method                             | 1-shot (Reported) | 1-shot (Ours) | 5-shot (Reported) | 5-shot (Ours) |
|------------------------------------|-------------------|---------------|-------------------|---------------|
| Baseline                           | -                 | 42.11 ± 0.71  | -                 | 62.53 ± 0.69  |
| Baseline*                          | 41.08 ± 0.70      | 36.35 ± 0.64  | 51.04 ± 0.65      | 54.50 ± 0.66  |
| MatchingNet (Vinyals et al., 2016) | 43.56 ± 0.84      | 48.14 ± 0.78  | 55.31 ± 0.73      | 63.48 ± 0.66  |
| ProtoNet                           | -                 | 44.42 ± 0.84  | -                 | 64.24 ± 0.72  |
| ProtoNet# (Snell et al., 2017)     | 49.42 ± 0.78      | 47.74 ± 0.84  | 68.20 ± 0.66      | 66.68 ± 0.68  |
| MAML (Finn et al., 2017)           | 48.07 ± 1.75      | 46.47 ± 0.82  | 63.15 ± 0.91      | 62.71 ± 0.71  |
| RelationNet (Sung et al., 2018)    | 50.44 ± 0.82      | 49.31 ± 0.85  | 65.32 ± 0.70      | 66.60 ± 0.69  |

Table 2: Few-shot classification results on mini-ImageNet and CUB. Baseline++ consistently improves the Baseline by a large margin and is competitive with the state-of-the-art meta-learning methods. All experiments use 5-way classification with a Conv-4 backbone and data augmentation.

| Method                             | CUB 1-shot   | CUB 5-shot   | mini-ImageNet 1-shot | mini-ImageNet 5-shot |
|------------------------------------|--------------|--------------|----------------------|----------------------|
| Baseline                           | 47.12 ± 0.74 | 64.16 ± 0.71 | 42.11 ± 0.71         | 62.53 ± 0.69         |
| Baseline++                         | 60.53 ± 0.83 | 79.34 ± 0.61 | 48.24 ± 0.75         | 66.43 ± 0.63         |
| MatchingNet (Vinyals et al., 2016) | 61.16 ± 0.89 | 72.86 ± 0.70 | 48.14 ± 0.78         | 63.48 ± 0.66         |
| ProtoNet (Snell et al., 2017)      | 51.31 ± 0.91 | 70.77 ± 0.69 | 44.42 ± 0.84         | 64.24 ± 0.72         |
| MAML (Finn et al., 2017)           | 55.92 ± 0.95 | 72.09 ± 0.76 | 46.47 ± 0.82         | 62.71 ± 0.71         |
| RelationNet (Sung et al., 2018)    | 62.45 ± 0.98 | 76.11 ± 0.69 | 49.31 ± 0.85         | 66.60 ± 0.69         |
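The "±" values above are 95% confidence intervals of the mean over 600 test episodes. A sketch of one common way to compute such an interval, assuming the normal approximation with z = 1.96:

```python
import math

def mean_and_ci95(accuracies):
    """Mean episode accuracy and the half-width of a 95% confidence
    interval of the mean, via the normal approximation (z = 1.96)."""
    n = len(accuracies)
    mean = sum(accuracies) / n
    var = sum((a - mean) ** 2 for a in accuracies) / (n - 1)  # sample variance
    half_width = 1.96 * math.sqrt(var / n)
    return mean, half_width

# toy check with a constant-spread sample of 600 episode accuracies
accs = [50.0, 60.0] * 300
m, hw = mean_and_ci95(accs)
```

This also shows why episode counts matter: the interval shrinks with the square root of the number of episodes, so comparing methods evaluated on very different episode counts is unreliable.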
(meta-testing)
– Feature extractor f_θ: ResNet-18
• Observations
  – The Baseline outperforms all of the meta-learning methods
  – As the difference between domains grows, the meta-learning methods become relatively less effective

Table 3: 5-shot accuracy under the cross-domain scenario (mini-ImageNet → CUB) with a ResNet-18 backbone. The Baseline outperforms all other methods.

| Method      | mini-ImageNet → CUB |
|-------------|---------------------|
| Baseline    | 65.57 ± 0.70        |
| Baseline++  | 62.04 ± 0.76        |
| MatchingNet | 53.07 ± 0.74        |
| ProtoNet    | 62.02 ± 0.70        |
| MAML        | 51.34 ± 0.72        |
| RelationNet | 57.71 ± 0.73        |

[Figure 4: 5-shot accuracy on CUB, mini-ImageNet, and mini-ImageNet → CUB (domain difference from small to large) with a ResNet-18 backbone. The Baseline model performs relatively well as the domain difference grows.]
Wang, Jia-Bin Huang. A Closer Look at Few-shot Classification. In ICLR 2019.
• Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In CVPR 2018.
• Junlin Hu, Jiwen Lu, and Yap-Peng Tan. Deep transfer metric learning. In CVPR 2015.
• Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In NIPS 2016.
• Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In NIPS 2017.
• Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. Learning to compare: Relation network for few-shot learning. In CVPR 2018.
• Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML 2017.