小規模データセットに対する機械学習のアプローチ

 小規模データセットに対する機械学習のアプローチ

1クラスあたりの画像枚数が少ないデータセットに対する機械学習の手法をサーベイしました。Data Engineering & Data Analysis Workshop #7 @CyberAgentでの発表。

C4a38de0b6c49494319a621a981d2684?s=128

Masaki Samejima

December 14, 2018
Tweet

Transcript

  1. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Masaki Samejima Solutions Architect, Amazon Web Services Japan. 2018.12.14 㼭鋉垷歗⫷ر٦ةإحزח 㼎ׅ׷堣唒㷕统ך،فٗ٦ث Data Engineering & Data Analysis WS#7
  2. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. 荈䊹稱➜ 늦䃊 姻埠 ،وبٝ ؐؑـ ؟٦ؽأ آٍػٝ 堣唒㷕统أُ٦ءّٝ،٦ؗذؙز 㥨ֹזAWS؟٦ؽأ Amazon SageMaker, AWS Cloud9 馯㄂ ٖٖؔؔSageMaker example Deep Learning Document ך缺鏬 https://s3.amazonaws.com/ja.gluon.mxnet.io/index.html https://github.com/harusametime/sagemaker-notebooks/
  3. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. 歗⫷ر٦ةإحزח㼎ׅ׷劍䖉 • 歗⫷ח㼎ׅ׷堣唒㷕统כծקר Deep Learning ח獳遤 • 钠陎׃׋ְぐؙٓأחծ⼧ⴓז歗⫷ָ֮׷ךָ椚䟝 0 2000 4000 6000 8000 0 1 2 3 4 5 6 7 8 9 MNIST Food-101 0 200 400 600 800 1000 1200 Sushi Ramen Edamame Takoyaki Gyoza …
  4. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. 植㹋ךر٦ةإحز 0 50 100 150 200 250 300 350 1 2 3 4 5 6 7 8 9 1011121314151617181920212223 0 2 4 6 8 10 12 14 16 18 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Labeled Faces in the Wild 겣歗⫷ر٦ةإحز Landmark Recognition Challenge ٓٝسو٦ؙ钠陎 https://www.kaggle.com/c/landmark-recognition-challenge https://www.kaggle.com/jessicali9530/lfw-dataset 卐侧 卐侧 歗⫷ID 歗⫷ID
  5. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. ➂嵲䨌遭 • ر٦ةָ搀ֽ׸ל꧊׭׸לְְׄׯזְ • Web Crawling ח״׷歗⫷ ꧊הCrowdsourcing (Amazon Mechanical Turk, etc.) ח״׷،ظذ٦ءّٝ⡲噟 • ֲֿ׃ג ImageNet ָ钰欰
  6. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Amazon SageMaker Ground Truth • ر٦ةחٓكٕ (Ground Truth) ׾➰♷ׅ׷،ظذ٦ءّٝ׾佄䴂 • ⟃♴ך4ةأؙחכذٝفٖ٦زָ欽䠐ׁ׸גֶ׶ծ荈⡲׮〳腉 • ٓكٕ׾➰♷ׅ׷ٙ٦ؕ٦כծAmazon Mechanical Turkծ 㢩鿇كٝتծفٓ؎ك٦زזث٦يך3אַ׵鼅ץ׷
  7. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. ➂嵲䨌遭ךꣲ歲 ➂㆞׾䫎Ⰵ׃ג׮鍑寸׃זְ㜥さָ֮׷ 1➂ך겣׾钠陎ׅ׷ךח侧涰卐׮ 乆䕦ׅ׷ךכءأذي涸ח灶笼 겣歗⫷钠陎 https://ai.googleblog.com/2018/03/google-landmarks-new-dataset-and.html ٓٝسو٦ؙ钠陎 קה׿ו濼׵׸גְזְٓٝسو٦ؙ ך歗⫷׾꧊׭׵׸זְ https://link.springer.com/chapter/10.1007/978-3-319-25958-1_8
  8. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. ➙傈ך鑧겗 㼭鋉垷ז歗⫷ر٦ةإحزח׮ Deep Learning ׾黝欽׃׋ְ 黝欽דֹ׷ה⡦ֲָ׸׃ְ 1. ر٦ةָזֻג镘׭גְ׷➂׾佸ִ׷〳腉䚍֮׶կ 2. Deep Learning ׾黝欽ׅ׷הծ葺ְ礵䏝ָד׷ַ׮׃׸ זְկ 3. ⟎חر٦ة׾꧊׭׵׸׷ה׃ג׮ծ׉ך⸤⸂׾㣐ֹֻ⵴ 幾דֹ׷ַ׮׃׸זְկ
  9. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. 㼭鋉垷歗⫷ر٦ةإحزח㼎ׅ׷،فٗ٦ث Data Augmentation ד歗⫷׾㟓װׅ 㼭鋉垷ז歗⫷ ח䓼ְٌرٕ ׾⡲׷
  10. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. 㼭鋉垷歗⫷ر٦ةإحزח㼎ׅ׷،فٗ٦ث Data Augmentation ד歗⫷׾㟓װׅ 㼭鋉垷ז歗⫷ ח䓼ְٌرٕ ׾⡲׷
  11. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Data Augmentation ׮הך歗⫷׾⸇䊨׃גծⵃ欽〳腉ז歗⫷׾㟓װׅ倯岀 ⯋歗⫷ 䊩〸⿾鯄 ♳♴⿾鯄 㔐鯄 [⽃秪ז⸇䊨]
  12. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Data Augmentation GAN ⯋歗⫷ 欰䧭歗⫷ ⸇䊨٥嫰鯰׾粸׶鵤׃ծ ⯋歗⫷ה⼒ⴽָאַזְ ״ֲז歗⫷׾荈⹛欰䧭 嫰鯰 A. Antoniou, et al., Data Augmentation Generative Adversarial Networks, arXiv:1711.04340, 2017
  13. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Data Augmentation GAN - Result • 垥彊涸ז⸇䊨הGANח״׷⸇䊨ך嫰鯰 • 歗⫷卐侧ָ㼰זְקוGANך⸬卓ָ״ֻ植׸גְ׷
  14. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. AutoAugment • 圫ղז⸇䊨倯岀ָ֮׷זַדծ歗⫷ח״׏ג黝ⴖז⸇䊨倯岀כ麩ֲכ׆ • 歗⫷ַ׵荈⹛ד黝ⴖז⸇䊨倯岀׾ⴻ倖׃ג黝欽 Ekin D. Cubuk, et al., AutoAugment: Learning Augmentation Policies from Data, arXiv:1805.09501, 2018
  15. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. AutoAugment 然桦ⴓ䋒ח䖞׏ג ر٦ة׾欰䧭 • 㷕统 • 鐰⣣ 䓼⻉㷕统ד黝ⴖז⸇䊨倯岀׾㷕统،ٕ؞ٔؤيכPPO 㜠ꂹ
  16. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. AutoAugment - Result Cifer-10דך穠卓 (Error rate [%] ) Autoaugment ׾ⵃ欽ׅ׷ֿהדerror rateָ鯪幾
  17. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. 㼭鋉垷歗⫷ر٦ةإحزח㼎ׅ׷،فٗ٦ث Data Augmentation ד歗⫷׾㟓װׅ 㼭鋉垷ז歗⫷ ח䓼ְٌرٕ ׾⡲׷
  18. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Augmentation vs Deep Learning Model • DAGANך穠卓׾鋅׷הծ然ַחDAGANכ剣⸬ • ׃ַ׃ծDeep Learning ךٌرٕך鼅ן倯ךקֲָꅾ銲ח׫ִ׷ DAGANד34% 礵䏝何㊣ DAGANד0.5% 礵䏝何㊣ ٌرٕד嚊י礵䏝 ָ寸ת׶ծ Augmentationד 礵䏝ָ㼰׃何㊣
  19. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. 㼰ꆀر٦ةإحزח㼎ׅ׷Deep Learning ך倯ꆙ • Ⰻؙٓأח㼎ׅ׷ⴓ겲ٌرٕכ⡲׵זְ⡲׸זְ • 傀濼ך歗⫷ךֲ׍⡂גְ׷׮ך׾⿫罋ח׃גⴓ겲ׅ׷ (k-NNك٦أה׮ㄎל׸׷ 㢳ؙٓأⴓ겲 ٌرٕ
  20. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. 稱➜׃׋ְⰻ㺁 • FaceNet: 겣歗⫷钠陎 F. Schroff, et al., FaceNet : A Unified Embedding for Face Recognition and Clustering, CVPR 2015 • Matching Network: 㼰侧歗⫷钠陎 (A few shot learning) O. Vinyals, et al., Matching Networks for One Shot Learning, arXiv:1606.04080 • AnoGAN: 歗⫷殯䌢嗚濼 T. Schlegl, Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery, IPMI2017
  21. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. FaceNet F. Schroff, et al., FaceNet : A Unified Embedding for Face Recognition and Clustering, CVPR 2015 • 1➂֮׋׶ך겣歗⫷卐侧ָ㼰זְ㜥さך겣钠陎 • 겣歗⫷׾嫰鯰׃ג • ず♧➂暟ך겣כז׷ץֻ鵚ֻח • ➭➂ך겣כז׷ץֻ黅ֻח ז׷״ֲח瑞꟦חꂁ縧ׅ׷կ ず♧➂暟→鵚ְ ➭➂→黅ְ
  22. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. FaceNet, 瑞꟦חꂁ縧הכ • ꂁ縧ׅ׷׋׭ך䏟垥ָ䗳銲 • 䏟垥׾鎘皾ׅ׷鿇ⴓ׾אֻ׷ 䏟垥 䏟垥鎘皾鿇ⴓ
  23. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. FaceNet, ⽃秪ז䏟垥鎘皾 • ぐؾؙإٕך⦼׾《׶⳿׃ג䏟垥ח׃ג׃תֲ • 2如⯋邌爙חׅ׷ז׵ծ׉׸׾如⯋㖇簭ׅ׷ … • 胜兝װ⯔٥䕦ך䕦갟׾湫䱸「ֽג׃תֲ • ず♧➂暟ך暴䗙ծ➭➂הך䊴殯׾ֲתֻ⿾僥׃׋䏟垥ח׃׋ְ 铬겗
  24. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. FaceNet, Triplet Loss ك٦أ歗⫷ծず♧➂暟ך歗⫷ծ➭➂ך歗⫷ך3אך䏟垥׾鎘皾 䏟垥鎘皾欽 طحزٙ٦ؙ Inception ك٦أ ず♧ ➭➂ 剑ⴱכ➭➂ךקֲָ鵚ְַ׮ D(➭➂) D(ず♧) ず♧➂暟׾鵚ֻ׃׋ְ D(➭➂) < D(ず♧) Triplet Loss = | D(ず♧) – D(➭➂) | → min.
  25. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. FaceNet - Result • Labeled Faces in the Wild dataset • 겣钠陎ך礵䏝ָ99.63%חⵋ麦 • Youtube Faces DB • FaceNetד95.12% • 2014䎃ךDeepFaceָ91.4%
  26. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Matching Network ך嚊銲 • 㷕统ر٦ةך♧鿇Support setה嫰鯰׃גծ⡂גְ׷歗⫷ךٓ كٕ׾⳿⸂ׅ׷את׶Matching) • ⡂גְ׷䏝さְ׾ Deep Learning ד㷕统ׅ׷ Support set ⴓ겲㼎韋 ⴓ겲㼎韋כ Ӎ ח⡂גְ׷ O. Vinyals, et al., Matching Networks for One Shot Learning, arXiv:1606.04080
  27. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Matching Network ך嚊銲 侧䒭דءٝفٕח剅ֻה Support set ⴓ겲㼎韋 ⴓ겲㼎韋כ Ӎ ח⡂גְ׷ ! " ("$ , &$ ) ! & ! & = ) $ *("$ , ! ")&$ *("$ , ! ") ٓكٕ&$חأ؝،* "$ , ! " ׾ꅾ׫➰ֽ׃ג駈ׅ
  28. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Matching Network ך㷕统 Support set Batch Full dataset Support setהBatch׾ٓٝتيד䒷ְגBatchָ姻׃ֻⴻ㹀ׁ׸׷״ֲחׅ׷
  29. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Matching Network ך㷕统 Support set Batch Full dataset 侧䒭ד剅ֻה ! = argmax ( )*~, )-~*,/~* [ 1 2,3 ∈/ log 7( (9|;, <)]
  30. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Matching Networkך穠卓 • 傀㶷ךطحزٙ٦ؙSiamese Net ה嫰ץג礵䏝ָ葺ְ • Siamese Net ׮ת׋ծ2אךر٦ة׾嫰鯰׃זָ׵㷕统ׅ׷ٌرٕ • Fine-tuning כ䗳׆׃׮ֲתְַֻזַ׏׋ (♴ך؛٦أדכoverfitting)
  31. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. AnoGAN: GAN׾ⵃ欽׃׋歗⫷殯䌢嗚濼 • 氫䝕ך嗚⳿זוך殯䌢ر٦ةכ姻䌢ر٦ةח嫰ץ׷ה㼰זְ • 殯䌢ה姻䌢׾ⴓֽ׷ٌرٕ׾⡲׷ךכꨇ׃ְ • 姻䌢ر٦ةך׫׾㷕统׃גծ劢濼ךر٦ةָ姻䌢ر٦ةַ׵ ו׸ֻ׵ְַֽꨄ׸גְ׷ַ׾鋅׷ T. Schlegl, Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery, IPMI2017
  32. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. 歗⫷嫰鯰ח״׷殯䌢嗚濼،فٗ٦ث: GAN 姻䌢禸ך ر٦ة׾欰䧭ׅ׷ طحزٙ٦ؙ 姻䌢禸ך歗⫷ה 劢濼ך歗⫷׾嫰鯰 ׃גⴻⴽׅ׷طحزٙ٦ؙ
  33. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. AnoGAN - Result • 嫰鯰䩛岀׮㛇劤涸חכ姻䌢禸׾㷕统׃גծ劢濼歗⫷ה嫰 鯰ׅ׷䩛岀 • 姻䌢禸ך㷕统ָCAE (Conditional AutoEncoder)׌׏׋׶ծ GAN׌׏׋׶ׅ׷ָծ䏟垥ך״ֲז׮ך׾䖤׷ךָ湡涸
  34. © 2018, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. תה׭ • ת׆כ知⽃זAugmentation׾ׅ׷ • Ⰻؙٓأ׾ⴓ겲ׅ׷ٌرٕ׾⡲׷ךדכזֻծر٦ة嫰鯰 דؙٓأ׾寸׭׷״ֲחׅ׷կ • 嫰鯰ך׋׭חر٦ةך瑞꟦ꂁ縧׾寸׭׷ٌرٕ׾㷕统ׅ׷կ • ٗأ׾䊨㣗ׅ׷倯岀 (Triplet loss) • ر٦ة׾嫰鯰ׅ׷ٌرٕ׾׉ךתת㷕统 (Matching network) • GANךDiscriminatorד嫰鯰ׅ׷ (AnoGAN) • ⡭⸂ָ֮׸לAugmentation׾갹䓸׷կ