Slide 1

Slide 1 text

Introduction to Automated Machine Learning, 19 Dec 2018

Slide 2

Slide 2 text

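About Me
Hiroki Nakayama (@Hironsan), TIS Inc., Strategic Technology Center
Since joining the company, I have worked on research in machine learning and natural language processing and on prototyping systems that use them. I currently develop software with the goal of letting anyone easily create data for natural language processing.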

Slide 3

Slide 3 text

Open Source Software (1/2)
anaGo: a Python library for named entity recognition and part-of-speech tagging
https://github.com/Hironsan/anago

Slide 4

Slide 4 text

Open Source Software (2/2)
doccano: a labeling tool for text data used in machine learning
https://github.com/chakki-works/doccano

Slide 5

Slide 5 text

Book
Japanese translation of Deep Learning with Keras

Slide 6

Slide 6 text

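Table of contents
• The typical machine learning process: a shared understanding of what happens in a typical machine learning process.
• What is AutoML?: what AutoML is, starting from the background that gave rise to it.
• What AutoML does: for each step of the process, what was done conventionally and what AutoML does.
• AutoML software: software and services for doing AutoML.
• The future of AutoML: where AutoML is heading.
• Summary: a recap of the talk.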

Slide 7

Slide 7 text

The typical machine learning process
Source: Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Slide 8

Slide 8 text

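Data cleaning
The process of removing or correcting incomplete, inaccurate, or irrelevant parts of the data.
For text data:
• Removing URLs
• Removing hashtags
• Normalizing numbers (0000年00月00日)
• Normalizing character types (→ mkdocs)
• Masking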
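The text-cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not code from the talk. The regular expressions, the `clean_text` name, and the use of NFKC normalization for the character-type step are assumptions made for the example.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
DIGIT_RE = re.compile(r"\d")


def clean_text(text: str) -> str:
    """Apply a few common text-cleaning steps (illustrative only)."""
    # Character-type normalization: NFKC folds full-width letters to ASCII
    # (assumed here to be what the slide's "→ mkdocs" example refers to).
    text = unicodedata.normalize("NFKC", text)
    text = URL_RE.sub("", text)       # remove URLs
    text = HASHTAG_RE.sub("", text)   # remove hashtags
    text = DIGIT_RE.sub("0", text)    # normalize every digit to 0
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


cleaned = clean_text("call 555-1234 now https://example.com #ad")
print(cleaned)
```

Replacing every digit with 0 keeps the shape of a number or date while discarding its value, which is often all a text model needs.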

Slide 9

Slide 9 text

Data cleaning
The process of removing or correcting incomplete, inaccurate, or irrelevant parts of the data.
For tabular data:

Gender  Age  City     Income
Male    18   Chicago  $53,000
Female  25   Cicago   $27,000      (misspelling)
Female       Chicago  $89,000      (missing value)
Male    54   Tokyo    ¥5,000,000   (different unit)
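As a rough sketch of how the three problems in this table might be repaired, the following plain-Python example fills the missing age, corrects the misspelled city, and converts incomes to one unit. The lookup table `CITY_FIXES`, the median fill strategy, and the exchange rate `JPY_PER_USD` are assumptions chosen for illustration, not recommendations from the talk.

```python
from statistics import median

rows = [
    {"gender": "Male", "age": 18, "city": "Chicago", "income": "$53,000"},
    {"gender": "Female", "age": 25, "city": "Cicago", "income": "$27,000"},
    {"gender": "Female", "age": None, "city": "Chicago", "income": "$89,000"},
    {"gender": "Male", "age": 54, "city": "Tokyo", "income": "¥5,000,000"},
]

CITY_FIXES = {"Cicago": "Chicago"}  # hypothetical table of known misspellings
JPY_PER_USD = 110.0                 # assumed exchange rate, illustration only


def income_in_usd(raw: str) -> float:
    """Parse '$53,000' or '¥5,000,000' and express the amount in USD."""
    amount = float(raw[1:].replace(",", ""))
    return amount / JPY_PER_USD if raw[0] == "¥" else amount


# Fill missing ages with the median of the known ages.
median_age = median(r["age"] for r in rows if r["age"] is not None)

cleaned = [
    {
        "gender": r["gender"],
        "age": r["age"] if r["age"] is not None else median_age,
        "city": CITY_FIXES.get(r["city"], r["city"]),
        "income_usd": income_in_usd(r["income"]),
    }
    for r in rows
]

for row in cleaned:
    print(row)
```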

Slide 10

Slide 10 text

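Why data cleaning is needed
In general, a model performs better when the data has been cleaned.

Gender  Age  City     Income
Male    18   Chicago  $53,000
Female  25   Cicago   $27,000      (misspelling)
Female       Chicago  $89,000      (missing value)
Male    54   Tokyo    ¥5,000,000   (different unit)

A model cannot learn correctly from incomplete or inaccurate data.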

Slide 11

Slide 11 text

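Feature engineering
The process of creating features (measurable variables of the data being analyzed) in order to improve the performance of a machine learning algorithm.
[Figure: passenger information from the Titanic]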

Slide 12

Slide 12 text

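Why feature engineering is needed
Obtaining good features through feature engineering improves the performance of a machine learning algorithm.
[Figure: passenger information from the Titanic]
• Extract and use the title (Mr, Mrs, Sir, etc.)
• Was the passenger in the upper or the lower part of the ship?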
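The two hand-made features above could be derived roughly as follows. This sketch assumes names are written as "Surname, Title. Given names" and that the first letter of a cabin code identifies the deck. Both are assumptions about the data layout, and the sample values are invented.

```python
import re
from typing import Optional

# Matches the title between the comma and the first period,
# e.g. "Smith, Mr. John" -> "Mr" (assumed name format).
TITLE_RE = re.compile(r",\s*([^.]+)\.")


def extract_title(name: str) -> Optional[str]:
    """Return the title (Mr, Mrs, Sir, ...) or None if none is found."""
    match = TITLE_RE.search(name)
    return match.group(1).strip() if match else None


def extract_deck(cabin: str) -> Optional[str]:
    """Return the deck letter of a cabin code such as 'C85' (assumed layout)."""
    return cabin[0] if cabin else None


print(extract_title("Smith, Mr. John"))   # Mr
print(extract_deck("C85"))                # C
```

The deck letter is only a proxy for "upper or lower part of the ship"; how decks map to height would have to come from the ship's plan.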

Slide 13

Slide 13 text

Model selection
The process of choosing the machine learning algorithm to train on the data.
Source: excerpt from the sklearn "Classifier comparison"

Slide 14

Slide 14 text

Why model selection is needed
Source: Data-driven Advice for Applying Machine Learning to Bioinformatics Problems

Slide 15

Slide 15 text

Why model selection is needed
Logistic Regression vs RF: Logistic Regression wins on 10% of the datasets.
Source: Data-driven Advice for Applying Machine Learning to Bioinformatics Problems

Slide 16

Slide 16 text

Why model selection is needed
Even the weakest model sometimes beats the strongest one, so a variety of models needs to be considered.
Source: Data-driven Advice for Applying Machine Learning to Bioinformatics Problems

Slide 17

Slide 17 text

Hyperparameter tuning
The process of adjusting a model's hyperparameters to their optimal values.
→ Hyperparameters are the parameters that are set before training.

For logistic regression:
LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='warn', max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None)

For random forest:
RandomForestClassifier(n_estimators='warn', criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None)

Slide 18

Slide 18 text

Why hyperparameter tuning is needed
On average, tuning improves accuracy by about 3 to 5%.
Source: Data-driven Advice for Applying Machine Learning to Bioinformatics Problems

Slide 19

Slide 19 text

Table of contents

Slide 20

Slide 20 text

AutoML as a hot topic

Slide 21

Slide 21 text

So what is AutoML, really?

Slide 22

Slide 22 text

What is AutoML?
AutoML (Automated Machine Learning) is technology for automating the machine learning process.
Source: Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Slide 23

Slide 23 text

Goals of AutoML
• Improving engineers' productivity
• Democratizing machine learning

Slide 24

Slide 24 text

Why productivity needs to improve
The number of data scientists is growing rapidly (2x).
Source: Study Shows That the Number of Data Scientists Has Doubled in 4 Years

Slide 25

Slide 25 text

Why productivity needs to improve
The number is growing rapidly, but it is not keeping up with demand.
Source: McKinsey Global Institute, "Big data: The next frontier for innovation, competition, and productivity"

Slide 26

Slide 26 text

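Democratizing machine learning
Enabling anyone to create value with machine learning.
→ "Anyone" here mainly means people who are not machine learning engineers or data scientists.
The tools are often GUI-based so that anyone can use them:
• Azure Machine Learning
• Google Cloud AutoML
• DataRobot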

Slide 27

Slide 27 text

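Background to the democratization of machine learning
One factor is the commoditization of algorithms.
• Algorithms are now published as open source
• Implementing algorithms has become easy (e.g. PyTorch)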

Slide 28

Slide 28 text

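Background to the democratization of machine learning
• Once algorithms are commoditized, it is hard to compete on differences in algorithm performance
• Building services on top of publicly available algorithms has become more important
• However, such algorithms cannot be used without a programming language such as Python, and understanding their internals requires advanced mathematics
• Hiring people who can handle these things is hard
• To broaden adoption, machine learning has to be made easy to use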

Slide 29

Slide 29 text

Table of contents

Slide 30

Slide 30 text

What AutoML does
• Hyperparameter tuning: what it involves, compared with the conventional approach
• Model selection: what it involves, compared with the conventional approach
• Feature engineering: what it involves

Slide 31

Slide 31 text

Hyperparameter tuning
The process of adjusting a model's hyperparameters to their optimal values.
→ Hyperparameters are the parameters that are set before training.

For logistic regression:
LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='warn', max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None)

For random forest:
RandomForestClassifier(n_estimators='warn', criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None)

Slide 32

Slide 32 text

Why hyperparameter tuning is needed
Source: Data-driven Advice for Applying Machine Learning to Bioinformatics Problems

Slide 33

Slide 33 text

Problems with manual tuning
Tuning hyperparameters by hand is not realistic. For example, with 5 hyperparameters and 3 values to test for each, there are 3^5 (= 243) combinations.
[Figure: tree of candidate values 1, 2, 3 for Parameter1 through Parameter5]

Slide 34

Slide 34 text

Problems with manual tuning
Tuning hyperparameters by hand is not realistic. For example, with 5 hyperparameters and 3 values to test for each, there are 3^5 (= 243) combinations.
Source: Deep Residual Learning for Image Recognition
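The size of that search space is easy to check by enumeration: 5 hyperparameters with 3 candidate values each give 3^5 = 243 combinations. The candidate values 1, 2, 3 below mirror the slide's figure and are placeholders.

```python
from itertools import product

values_per_parameter = [1, 2, 3]  # placeholder candidate values
n_parameters = 5

# Every way of picking one value for each of the 5 hyperparameters.
combinations = list(product(values_per_parameter, repeat=n_parameters))

print(len(combinations))  # 243
```

Adding a sixth hyperparameter would triple the count again, which is why manual tuning stops being practical so quickly.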

Slide 35

Slide 35 text

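Automating hyperparameter tuning
AutoML automates hyperparameter tuning.
→ Automation makes tuning more efficient
→ It removes the bias that comes from parameters people choose by intuition
Specifically, methods such as the following are used:
• Random search
• Grid search
• Bayesian optimization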

Slide 36

Slide 36 text

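Grid search
A method that prepares several candidate values for each hyperparameter and tries every combination. For example, given two parameters C and γ with candidate values C ∈ {10, 100} and γ ∈ {0.1, 0.2, 0.5}, grid search tries 2 × 3 = 6 combinations.
All combinations: (10, 0.1) (10, 0.2) (10, 0.5) (100, 0.1) (100, 0.2) (100, 0.5)
Source: Random Search for Hyper-Parameter Optimization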
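A minimal sketch of grid search over the C and γ candidates from this slide. The `validation_score` function is a hypothetical stand-in for "train a model with these values and score it on validation data", so the winning pair below says nothing about any real model.

```python
from itertools import product

C_values = [10, 100]
gamma_values = [0.1, 0.2, 0.5]


def validation_score(C: float, gamma: float) -> float:
    """Hypothetical scoring function; a real one would train and evaluate a model."""
    return -abs(C - 100) / 100 - abs(gamma - 0.2)


# Grid search: evaluate every combination of candidate values.
grid = list(product(C_values, gamma_values))
best_C, best_gamma = max(grid, key=lambda params: validation_score(*params))

print(len(grid))            # 6 combinations, i.e. 2 x 3
print(best_C, best_gamma)
```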

Slide 37

Slide 37 text

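Random search
A method that specifies a distribution for each parameter and samples values from it. For example, where grid search gives γ a fixed set of values such as γ ∈ {0.1, 0.2, 0.5}, random search gives it a probability distribution, such as an exponential distribution, and samples values from that. It is effective when a small number of hyperparameters have a large effect on performance.
Source: Random Search for Hyper-Parameter Optimization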
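Random search can be sketched the same way: instead of a fixed list, γ is drawn from an exponential distribution. The mean of 0.2, the 20 trials, the seed, and the `validation_score` function are all assumptions made for the illustration.

```python
import random


def validation_score(gamma: float) -> float:
    """Hypothetical scoring function; a real one would train and evaluate a model."""
    return -abs(gamma - 0.2)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_trials = 20

# Sample gamma from an exponential distribution with mean 0.2 (rate = 1 / 0.2).
samples = [rng.expovariate(1 / 0.2) for _ in range(n_trials)]
best_gamma = max(samples, key=validation_score)

print(best_gamma)
```

Unlike the grid, the sampled values are not restricted to a handful of points, so a parameter that matters a lot gets probed at many distinct values.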

Slide 38

Slide 38 text

Challenges of grid search and random search
One problem with grid search and random search is that they tend to spend time on unpromising hyperparameters. The reason is that neither method makes use of the results obtained so far.
Source: Random Search for Hyper-Parameter Optimization

Slide 39

Slide 39 text

Challenges of grid search and random search
Using the results of earlier trials should make the search more efficient.
Source: A Conceptual Explanation of Bayesian Hyperparameter Optimization for Machine Learning

Slide 40

Slide 40 text

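Tuning with Bayesian optimization
• Hyperparameter tuning with Bayesian optimization uses earlier results to choose which hyperparameters to try next
• This lets the search concentrate on promising regions
  → closer to how a human would search
• It is known to find relatively good hyperparameters for machine learning models, including deep learning models
OSS for hyperparameter tuning with Bayesian optimization:
• Optuna
• Hyperopt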
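The sketch below illustrates only the core idea of the first bullet: choosing the next trial from the results so far. It is deliberately not Bayesian optimization itself. A real implementation, as in Optuna or Hyperopt, models the objective probabilistically, whereas this toy simply samples near the best value seen so far. The objective, search range, and step size are assumptions.

```python
import random


def validation_score(gamma: float) -> float:
    """Hypothetical scoring function; a real one would train and evaluate a model."""
    return -(gamma - 0.2) ** 2


rng = random.Random(0)
history = []  # (gamma, score) for every trial so far

for trial in range(30):
    if trial < 5:
        # Start with a few purely random trials.
        gamma = rng.uniform(0.0, 1.0)
    else:
        # Use earlier results: sample close to the best gamma found so far.
        best_so_far, _ = max(history, key=lambda t: t[1])
        gamma = min(1.0, max(0.0, rng.gauss(best_so_far, 0.05)))
    history.append((gamma, validation_score(gamma)))

best_gamma, best_score = max(history, key=lambda t: t[1])
print(best_gamma, best_score)
```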

Slide 41

Slide 41 text

Model selection
The process of choosing the machine learning algorithm to train on the data.
Source: excerpt from the sklearn "Classifier comparison"

Slide 42

Slide 42 text

Why model selection is needed
Source: Data-driven Advice for Applying Machine Learning to Bioinformatics Problems

Slide 43

Slide 43 text

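Problems with manual model selection
In real projects, it is hard to claim that many machine learning algorithms are actually considered.
→ One cause is human bias
"GTB gives good results every time, so I can just use that." ← bias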

Slide 44

Slide 44 text

How to reduce bias
One effective way to reduce human bias is to set up a mechanism that decides the model mechanically from the characteristics of the dataset.
Source: Choosing the right estimator

Slide 45

Slide 45 text

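Automating model selection
AutoML chooses the machine learning algorithm automatically.
→ Human bias can be eliminated
→ A variety of models can be considered
Model selection cannot be separated from hyperparameter tuning, so hyperparameter tuning is performed at the same time.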
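Searching over models and their hyperparameters together can be sketched with two toy regressors written in plain Python. The data, the two model families, and the candidate values of `k` are invented for the example. Tools such as TPOT search far larger spaces with more sophisticated strategies.

```python
from statistics import mean

# Toy (x, y) data, invented for illustration.
train = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2)]
valid = [(0.5, 0.5), (2.5, 2.5), (3.5, 3.5)]


def fit_mean(data):
    """Baseline model: always predict the mean target (no hyperparameters)."""
    average = mean(y for _, y in data)
    return lambda x: average


def fit_knn(data, k):
    """k-nearest-neighbour regressor on one feature; k is its hyperparameter."""
    def predict(x):
        nearest = sorted(data, key=lambda point: abs(point[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def validation_error(predict):
    """Mean squared error on the validation set."""
    return mean((predict(x) - y) ** 2 for x, y in valid)


# The search space pairs each model with its own hyperparameter candidates.
search_space = [
    ("mean", fit_mean, [{}]),
    ("knn", fit_knn, [{"k": 1}, {"k": 2}, {"k": 5}]),
]

results = []
for name, fit, param_grid in search_space:
    for params in param_grid:
        model = fit(train, **params)
        results.append((validation_error(model), name, params))

best_error, best_name, best_params = min(results, key=lambda r: r[0])
print(best_name, best_params, best_error)
```

Because every (model, hyperparameters) pair is scored by the same validation error, no model is favoured by habit.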

Slide 46

Slide 46 text

Demo
Model selection and hyperparameter tuning with TPOT

Slide 47

Slide 47 text

Neural architecture search

Slide 48

Slide 48 text

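What is neural architecture search?
Neural architecture search is technology that automates the structural design of neural networks.
→ A neural network generates network architectures, which are then trained while their hyperparameters are tuned
Source: Neural Architecture Search with Reinforcement Learning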

Slide 49

Slide 49 text

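Why automate the design?
• Designing a neural network architecture requires advanced expertise and is very difficult
• Building a good architecture takes trial and error, which costs both time and money
  • High engineer salaries
  • Usage fees for GPU machines
  • Long training times
• As a result, only a small number of researchers and engineers can make use of it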

Slide 50

Slide 50 text

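Challenges of neural architecture search
One challenge of neural architecture search is its heavy computational cost.
→ Every time a network is generated, it has to be trained and its results checked
• Neural Architecture Search with Reinforcement Learning: 28 days on 800 GPUs
• Learning Transferable Architectures for Scalable Image Recognition: 4 days on 500 GPUs
This is not realistic for ordinary researchers and developers.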
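To put the two figures on a common scale, they can be converted to GPU-days (number of GPUs × days). The GPU counts and durations are the ones quoted on the slide. The conversion is simple arithmetic added here for comparison.

```python
# (number of GPUs, days) as quoted on the slide.
reported_costs = {
    "Neural Architecture Search with Reinforcement Learning": (800, 28),
    "Learning Transferable Architectures for Scalable Image Recognition": (500, 4),
}

gpu_days = {title: gpus * days for title, (gpus, days) in reported_costs.items()}

for title, cost in gpu_days.items():
    print(f"{title}: {cost} GPU-days")
```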

Slide 51

Slide 51 text

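Reducing the computational cost
One technique used to speed things up is transfer learning. Efficient Neural Architecture Search via Parameter Sharing does not train all weights from scratch. Instead, it reuses weights from already-trained models, sharing parameters across candidate networks. As a result, training time is brought down to under half a day on 1 GPU.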

Slide 52

Slide 52 text

Demo
Building a model with Google AutoML Vision, without writing any code

Slide 53

Slide 53 text

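Feature engineering
The process of creating features (measurable variables of the data being analyzed) in order to improve the performance of a machine learning algorithm.
[Figure: passenger information from the Titanic]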

Slide 54

Slide 54 text

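Why feature engineering is needed
Obtaining good features through feature engineering improves the performance of a machine learning algorithm.
[Figure: passenger information from the Titanic]
• Extract and use the title (Mr, Mrs, Sir, etc.)
• Was the passenger in the upper or the lower part of the ship?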

Slide 55

Slide 55 text

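Challenges of manual feature engineering
• Extract and use the title (Mr, Mrs, Sir, etc.)
• Was the passenger in the upper or the lower part of the ship?
Problems with feature engineering done by humans:
→ It is hard to come up with good features
→ Validating them takes time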

Slide 56

Slide 56 text

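Challenges of manual feature engineering
• Extract and use the title (Mr, Mrs, Sir, etc.)
• Was the passenger in the upper or the lower part of the ship?
Coming up with a feature is not the end of the job.
Problems with feature engineering done by humans:
→ It is hard to come up with good features
→ Validating them takes time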

Slide 57

Slide 57 text

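Feature engineering with AutoML
By automating feature engineering, AutoML reduces the two problems above:
→ It is hard to come up with good features
→ Validating them takes time
The following slides explain how feature engineering is done in AutoML.
• Automation in DataRobot
• Automation with featuretools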

Slide 58

Slide 58 text

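The DataRobot approach
DataRobot automates feature engineering by building an expert system:
1. Generate features
2. Know which models need feature engineering
3. Know which kinds of feature engineering work for each model
4. Compare models systematically to find the best combination of feature engineering and model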

Slide 59

Slide 59 text

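The DataRobot approach
DataRobot carries out these operations using a model blueprint. According to the source article, a model blueprint appears to be a sequence of steps such as preprocessing, feature engineering, training, and tuning.
[Figure: example of a model blueprint]
Source: Automated Feature Engineering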

Slide 60

Slide 60 text

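The DataRobot approach
DataRobot carries out these operations using a model blueprint. According to the source article, a model blueprint appears to be a sequence of steps such as preprocessing, feature engineering, training, and tuning.
[Figure: example of a model blueprint]
Source: Automated Feature Engineering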

Slide 61

Slide 61 text

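The featuretools approach
The DataRobot approach is a good one, but it takes a lot of hand-crafting and is hard to imitate, so featuretools is introduced here as well. featuretools is an open-source tool, written in Python, for automating feature engineering. With featuretools, features can be generated automatically.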

Slide 62

Slide 62 text

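The featuretools approach
featuretools generates new features with a method called Deep Feature Synthesis (DFS). DFS aggregates and transforms data using functions called primitives. Examples of primitives include functions that take the mean or the maximum of a column. User-defined functions can also be used as primitives.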

Slide 63

Slide 63 text

The featuretools approach
[Figure: a primitive]
Source: Deep Feature Synthesis: How Automated Feature Engineering Works

Slide 64

Slide 64 text

The featuretools approach
[Figure: applying primitives in two stages]
Source: Deep Feature Synthesis: How Automated Feature Engineering Works
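The idea of primitives, and of stacking them in two stages, can be illustrated in plain Python. This sketch does not use the featuretools API. The `aggregate` helper, the table layout, and the sample values are hypothetical and only mimic what DFS does conceptually.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child table: one row per transaction.
transactions = [
    {"customer_id": 1, "session_id": "a", "amount": 10.0},
    {"customer_id": 1, "session_id": "a", "amount": 30.0},
    {"customer_id": 1, "session_id": "b", "amount": 50.0},
    {"customer_id": 2, "session_id": "c", "amount": 5.0},
]


def aggregate(rows, key, column, primitive):
    """Group rows by `key` and apply an aggregation primitive to `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {group: primitive(values) for group, values in groups.items()}


# One stage: a single primitive, MAX(amount) per customer.
max_amount = aggregate(transactions, "customer_id", "amount", max)

# Two stages: SUM(amount) per session, then MEAN of those sums per customer.
session_total = aggregate(transactions, "session_id", "amount", sum)
session_rows = [
    {"customer_id": customer, "total": session_total[session]}
    for customer, session in sorted(
        {(t["customer_id"], t["session_id"]) for t in transactions}
    )
]
mean_session_total = aggregate(session_rows, "customer_id", "total", mean)

print(max_amount)          # {1: 50.0, 2: 5.0}
print(mean_session_total)  # {1: 45.0, 2: 5.0}
```

Any function that maps a list of values to one value can be passed as `primitive`, which mirrors the point that user-defined functions can serve as primitives.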

Slide 65

Slide 65 text

Demo
Feature generation with featuretools

Slide 66

Slide 66 text

Table of contents

Slide 67

Slide 67 text

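AutoML software
• Auto-Keras: a library for neural architecture search with a scikit-learn-like interface
• auto-sklearn: a library for model selection and hyperparameter tuning with a scikit-learn-like interface
• optuna: a library for hyperparameter tuning that supports Bayesian optimization methods
• TPOT: a library for model selection and hyperparameter tuning with a scikit-learn-like interface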

Slide 68

Slide 68 text

AutoML services
• DataRobot
• Google Cloud AutoML
• Azure Machine Learning

Slide 69

Slide 69 text

Table of contents

Slide 70

Slide 70 text

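The future of AutoML
• Data cleaning will also become possible
  For example, converting unstructured data such as text into tabular data that is immediately ready for analysis.
• It will scale to large datasets
  At present, computation takes quite a long time even on datasets with few samples. In the future it should become usable on so-called big data as well.
• Its performance will surpass humans
  Even today it matches human performance on some datasets. In the future it should be able to produce features and network architectures that humans would not think of.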

Slide 71

Slide 71 text

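Summary
• Machine learning involves many processes and takes a lot of effort
• There is a growing need to automate each process, both to improve productivity and to make machine learning usable by anyone
• AutoML is the technology for doing so, and both open-source software and commercial services are available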

Slide 72

Slide 72 text

References

Feature engineering
• Why Automated Feature Engineering Will Change the Way You Do Machine Learning
• Deep Feature Synthesis: How Automated Feature Engineering Works
• Automated Feature Engineering

Neural architecture search
• Neural Architecture Search with Reinforcement Learning
• Learning Transferable Architectures for Scalable Image Recognition
• Efficient Neural Architecture Search via Parameter Sharing
• Everything you need to know about AutoML and Neural Architecture Search
• Understanding AutoML and Neural Architecture Search
• An Opinionated Introduction to AutoML and Neural Architecture Search
• What do machine learning practitioners actually do?

Slide 73

Slide 73 text

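References

Hyperparameter tuning
• Random Search for Hyper-Parameter Optimization
• A Conceptual Explanation of Bayesian Hyperparameter Optimization for Machine Learning
• 機械学習モデルのハイパパラメータ最適化 (Hyperparameter Optimization for Machine Learning Models)
• 機械学習のためのベイズ最適化入門 (Introduction to Bayesian Optimization for Machine Learning)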