of games against Pachi23 and 12% against a slightly weaker program, Fuego24.

Reinforcement learning of value networks

The final stage of the training pipeline focuses on position evaluation, estimating a value function vθ(s) that predicts the outcome of the game from position s.

Searching with policy and value networks

Each edge (s, a) of the search tree stores an action value Q(s, a), a visit count N(s, a), and a prior probability P(s, a). The tree is traversed by simulation (that is, descending the tree in complete games without backup), starting from the root state. At each time step t of each simulation, an action a_t is selected from state s_t so as to maximize the action value plus a bonus u(s, a) ∝ P(s, a) / (1 + N(s, a)), which is proportional to the prior probability but decays with repeated visits to encourage exploration.

Figure 3 | Monte Carlo tree search in AlphaGo. a, Each simulation traverses the tree by selecting the edge with maximum action value Q, plus a bonus u(P) that depends on a stored prior probability P for that edge. b, The leaf node may be expanded; the new node is processed once by the policy network pσ and the output probabilities are stored as prior probabilities P for each action. c, At the end of a simulation, the leaf node is evaluated in two ways: using the value network vθ; and by running a rollout to the end of the game with the fast rollout policy pπ, then computing the winner with function r. d, Action values Q are updated to track the mean value of all evaluations r(·) and vθ(·) in the subtree below that action.
(Panel titles: a, Selection; b, Expansion; c, Evaluation; d, Backup.)

Source: http://www.nature.com/nature/journal/v529/n7587/abs/nature16961.html?lang=en
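To make the stored edge statistics and the selection rule concrete, here is a minimal Python sketch, assuming the simple proportional form u(s, a) ∝ P(s, a) / (1 + N(s, a)) given in the text; the exploration constant c_exploration and the data-structure layout are illustrative choices, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    """Statistics stored for one edge (s, a) of the search tree."""
    P: float        # prior probability from the policy network
    N: int = 0      # visit count
    W: float = 0.0  # sum of all leaf evaluations backed up through this edge

    @property
    def Q(self) -> float:
        # Mean action value; zero for an edge that has never been visited.
        return self.W / self.N if self.N > 0 else 0.0

def select_action(edges, c_exploration=5.0):
    """Return argmax_a [Q(s, a) + u(s, a)], with u(s, a) proportional to
    P(s, a) / (1 + N(s, a)): large for promising, rarely visited edges."""
    def score(edge):
        u = c_exploration * edge.P / (1 + edge.N)
        return edge.Q + u
    return max(edges, key=lambda a: score(edges[a]))
```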
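Putting the four phases of Figure 3 together, the following sketch shows one simulation under stated assumptions: policy_net, value_net, rollout_policy, and the game interface (next_state, is_terminal, winner) are hypothetical placeholders; the mixing weight lam between rollout outcome and value-network prediction is illustrative; and the sign flips needed for alternating players are omitted for brevity. It reuses Edge and select_action from the sketch above.

```python
class Node:
    """A search-tree node: a game state plus its outgoing edges and children."""
    def __init__(self, state):
        self.state = state
        self.edges = {}     # action -> Edge
        self.children = {}  # action -> Node

def rollout_to_end(state, rollout_policy, game):
    """Play the position out with the fast rollout policy and score the winner
    (the function r in the figure), e.g. +1 for a win and -1 for a loss."""
    while not game.is_terminal(state):
        state = game.next_state(state, rollout_policy(state))
    return game.winner(state)

def simulate(root, policy_net, value_net, rollout_policy, game, lam=0.5):
    """One MCTS simulation: selection, expansion, evaluation, backup."""
    node, path = root, []

    # a, Selection: descend by choosing the edge maximising Q + u(P).
    while node.children:
        a = select_action(node.edges)
        path.append(node.edges[a])
        node = node.children[a]

    # b, Expansion: process the leaf once with the policy network and store
    #    its output probabilities as priors P for each legal action.
    for a, p in policy_net(node.state).items():
        node.edges[a] = Edge(P=p)
        node.children[a] = Node(game.next_state(node.state, a))

    # c, Evaluation: combine the value network's prediction with the outcome
    #    of a fast rollout (the mixing weight lam is illustrative).
    v = value_net(node.state)
    z = rollout_to_end(node.state, rollout_policy, game)
    leaf_value = (1 - lam) * v + lam * z

    # d, Backup: update visit counts and value sums along the traversed path,
    #    so each Q tracks the mean of all evaluations below that edge.
    for edge in path:
        edge.N += 1
        edge.W += leaf_value
```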