
入門 Automated Machine Learning (Introduction to Automated Machine Learning)

Machine learning is now used in a wide range of domains, but building machine learning models requires highly trained engineers. In recent years, however, Automated Machine Learning (AutoML), a set of techniques for automating the machine learning process, has been attracting attention. By automating the process, AutoML improves productivity and makes it possible to build machine learning models suited to a task without the help of specialized engineers. This deck traces the path from conventional machine learning to AutoML and discusses the possibilities and challenges of AutoML.

Hiroki Nakayama

December 21, 2018


Transcript

  1. Table of contents
     The typical machine learning process: here we share an understanding of what is done in a typical machine learning workflow.
     What is AutoML?: here we explain what AutoML is, starting from the background that led to it.
     What AutoML does: here we introduce what has traditionally been done in each step of the machine learning process and what AutoML does instead.
     AutoML software: here we introduce software and services for doing AutoML.
     The future of AutoML: here we describe the future of AutoML.
     Summary: finally, we summarize the material covered.
  2. Data cleaning
     The process of removing or correcting incomplete, inaccurate, or irrelevant parts of the data.
     Example with tabular data:

     Gender   Age  City     Income
     Male     18   Chicago  $53,000
     Female   25   Cicago   $27,000      <- misspelling
     Female        Chicago  $89,000      <- missing value
     Male     54   Tokyo    ¥5,000,000   <- different unit
  3. Why data cleaning is needed
     In general, models perform better when the data has been cleaned.
     (Same table as above, with a missing value, a misspelling, and inconsistent units.)
     A model cannot learn correctly from incomplete or inaccurate data.
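To make the cleaning steps above concrete, here is a minimal pandas sketch that fixes the three problems in the example table; the imputation and currency-conversion rules are illustrative assumptions, not part of the deck:

import pandas as pd
import numpy as np

# The toy table from the slide: one missing age, one misspelled city, one income in a different currency.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [18, 25, np.nan, 54],
    "City": ["Chicago", "Cicago", "Chicago", "Tokyo"],
    "Income": [53_000.0, 27_000.0, 89_000.0, 5_000_000.0],
    "Currency": ["USD", "USD", "USD", "JPY"],
})

df["City"] = df["City"].replace({"Cicago": "Chicago"})   # fix the misspelling
df["Age"] = df["Age"].fillna(df["Age"].median())         # impute the missing value
jpy = df["Currency"] == "JPY"
df.loc[jpy, "Income"] = df.loc[jpy, "Income"] / 110      # convert to one unit (assumed JPY->USD rate)
df["Currency"] = "USD"

print(df)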
  4. Why model selection is needed
     Source: Data-driven Advice for Applying Machine Learning to Bioinformatics Problems
     Logistic Regression vs. RF: Logistic Regression wins on 10% of the datasets.
  5. Why model selection is needed
     Source: Data-driven Advice for Applying Machine Learning to Bioinformatics Problems
     Even the weakest model sometimes beats the strongest one, so a variety of models needs to be considered.
  6. Hyperparameter tuning
     The process of adjusting a model's hyperparameters to their optimal values.
     → Hyperparameters are parameters set before training starts.
     For logistic regression and random forest, the scikit-learn defaults are:

     RandomForestClassifier(n_estimators='warn', criterion='gini', max_depth=None,
         min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0,
         max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0,
         min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None,
         random_state=None, verbose=0, warm_start=False, class_weight=None)

     LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True,
         intercept_scaling=1, class_weight=None, random_state=None, solver='warn',
         max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None)
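As a small illustration of the point that hyperparameters are fixed before fit() is called, here is a minimal sketch; the particular values (C=0.5, max_iter=200) are arbitrary choices for the example, not recommendations:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters are chosen up front; they are not learned from the data during fit().
clf = LogisticRegression(C=0.5, penalty="l2", solver="lbfgs", max_iter=200)
clf.fit(X, y)
print(clf.score(X, y))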
  7. Table of contents (repeat of slide 1)
  8. Background: why productivity gains are needed
     Source: Study Shows That the Number of Data Scientists Has Doubled in 4 Years
     2x: the number of data scientists is growing rapidly.
  9. Background: why productivity gains are needed
     Source: McKinsey Global Institute, "Big data: The next frontier for innovation, competition, and productivity"
     The number of data scientists is growing rapidly, but it is not keeping up with demand.
  10. Table of contents (repeat of slide 1)
  11. Hyperparameter tuning (same content as slide 6: adjusting a model's hyperparameters, which are set before training, illustrated with the default RandomForestClassifier and LogisticRegression signatures)
  12. The problem with manual tuning
     Tuning hyperparameters by hand is not realistic.
     For example, with 5 hyperparameters and 3 candidate values to test for each, there are 3^5 = 243 combinations.
     [Figure: Parameter1 through Parameter5, each with candidate values 1, 2, 3, …]
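A quick sketch of counting those combinations, with made-up parameter names:

import itertools

# Hypothetical search space: 5 hyperparameters, 3 candidate values each.
grid = {f"param{i}": [1, 2, 3] for i in range(1, 6)}

combinations = list(itertools.product(*grid.values()))
print(len(combinations))  # 3 ** 5 = 243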
  13. Grid search
     A method that prepares several candidate values for each hyperparameter and tries every combination.
     For example, with two parameters C and γ and candidate values C ∈ {10, 100} and γ ∈ {0.1, 0.2, 0.5}, all 2 × 3 = 6 combinations are tried:
     (10, 0.1) (10, 0.2) (10, 0.5) (100, 0.1) (100, 0.2) (100, 0.5)
     Source: Random Search for Hyper-Parameter Optimization
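A minimal scikit-learn sketch of that 2 × 3 grid, assuming an SVM whose C and gamma play the role of the C and γ above:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every combination of the candidate values is evaluated with cross-validation.
param_grid = {"C": [10, 100], "gamma": [0.1, 0.2, 0.5]}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)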
  14. Random search
     A method that specifies a distribution for each parameter and samples values from it.
     For example, where grid search gave γ a fixed set of values such as γ ∈ {0.1, 0.2, 0.5}, random search gives it a probability distribution such as an exponential distribution and samples values from it. This is effective when a small number of hyperparameters have a large influence on performance.
     Source: Random Search for Hyper-Parameter Optimization
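The corresponding scikit-learn sketch, again assuming an SVM and using an exponential distribution for gamma as in the example above:

from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# gamma is sampled from an exponential distribution instead of a fixed candidate list.
param_distributions = {"C": [10, 100], "gamma": expon(scale=0.1)}
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)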
  15. The challenge of neural architecture search
     The main challenge of neural architecture search is its computational cost, because every generated network has to be trained before its quality can be evaluated.
     Neural Architecture Search with Reinforcement Learning: 800 GPUs for 28 days.
     Learning Transferable Architectures for Scalable Image Recognition: 500 GPUs for 4 days.
     This is not realistic for ordinary researchers and developers to use.
  16. The featuretools approach
     featuretools generates new features with a method called Deep Feature Synthesis (DFS). DFS aggregates and transforms data using functions called primitives. Examples of primitives are functions that take the mean or the maximum of a column, and user-defined functions can also be used as primitives.
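A minimal DFS sketch along the lines of the featuretools quickstart; the demo entity set and the keyword names (e.g. target_entity, which later releases renamed to target_dataframe_name) reflect my assumptions about the 0.x API current when this deck was written:

import featuretools as ft

# Small demo entity set shipped with featuretools (customers, sessions, transactions).
es = ft.demo.load_mock_customer(return_entityset=True)

# DFS stacks aggregation and transform primitives to build features for the target entity.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="customers",
    agg_primitives=["mean", "max"],   # aggregation primitives, as in the slide's example
    trans_primitives=["month"],
    max_depth=2,
)
print(feature_defs[:5])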
  17. Table of contents (repeat of slide 1)
  18. AutoML software
     Auto-Keras: a library with a scikit-learn-like interface for neural architecture search.
     auto-sklearn: a library with a scikit-learn-like interface for model selection and hyperparameter tuning.
     optuna: a library for hyperparameter tuning; supports Bayesian-optimization-based methods.
     TPOT: a library with a scikit-learn-like interface for model selection and hyperparameter tuning.
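As one concrete example from the list above, a minimal optuna sketch that tunes a single LogisticRegression hyperparameter; the search range and trial count are illustrative, and newer optuna releases prefer trial.suggest_float(..., log=True) over suggest_loguniform:

import optuna
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def objective(trial):
    # optuna proposes a value of C; cross-validation accuracy is the score to maximize.
    C = trial.suggest_loguniform("C", 1e-3, 1e3)
    clf = LogisticRegression(C=C, solver="lbfgs", max_iter=500)
    return cross_val_score(clf, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)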
  19. Table of contents (repeat of slide 1)
  20. References: Feature engineering
     • Why Automated Feature Engineering Will Change the Way You Do Machine Learning
     • Deep Feature Synthesis: How Automated Feature Engineering Works
     • Automated Feature Engineering
     References: Neural architecture search
     • Neural Architecture Search with Reinforcement Learning
     • Learning Transferable Architectures for Scalable Image Recognition
     • Efficient Neural Architecture Search via Parameter Sharing
     • Everything you need to know about AutoML and Neural Architecture Search
     • Understanding AutoML and Neural Architecture Search
     • An Opinionated Introduction to AutoML and Neural Architecture Search
     • What do machine learning practitioners actually do?
  21. References: Hyperparameter tuning
     • Random Search for Hyper-Parameter Optimization
     • A Conceptual Explanation of Bayesian Hyperparameter Optimization for Machine Learning
     • 機械学習モデルのハイパパラメータ最適化 (Hyperparameter optimization of machine learning models)
     • 機械学習のためのベイズ最適化入門 (Introduction to Bayesian optimization for machine learning)