
Introduction to Machine Learning by Running Python, Part 1: Understanding Machine Learning (Pythonで動かして学ぶ機械学習入門 第一回 機械学習の理解)

yoppe
September 09, 2016

Transcript

1. About the lecturer
• Yohei Kikuta
• PhD (Science)
• Currently doing data analysis work at a consulting firm
• Areas of expertise
  - Theoretical aspects of machine learning
  - Recommendation algorithms
  - Image analysis (Deep Learning)
• Contact: feel free to get in touch about anything
  Email : [email protected]
  Facebook : https://www.facebook.com/yohei.kikuta.3
  Linkedin : https://jp.linkedin.com/in/yohei-kikuta-983b29117
2. The power of machine learning
Source: https://arxiv.org/abs/1112.6209
[Slide shows excerpts from "Building high-level features using large-scale unsupervised learning": Figure 3 (top: top 48 stimuli of the best neuron from the test set; bottom: the optimal stimulus according to numerical constraint optimization), the invariance of the best feature under translation, scaling, and out-of-plane rotation, and Figure 6 (visualization of the cat face neuron and the human body neuron).]
3. The power of machine learning
Source: https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf
[Slide shows excerpts from the DeepFace paper: Figure 2, outline of the DeepFace architecture, a front-end of a single convolution-pooling-convolution filtering on the rectified input followed by three locally-connected layers and two fully-connected layers, with more than 120 million parameters, over 95% of which come from the local and fully connected layers; surrounding text on the locally-connected layers, the cross-entropy loss minimized by SGD with backpropagation, the sparsity of the top-layer features, and the verification metric on LFW; and Figure 3, the ROC curves on the LFW dataset.]
4. The power of machine learning
Source: https://arxiv.org/abs/1512.02595
[Slide shows an excerpt from the Deep Speech 2 paper: read speech with a high signal-to-noise ratio is arguably the easiest large-vocabulary continuous speech recognition task; the system is benchmarked on two WSJ test sets (LDC94S13B and LDC93S6B) and on the LibriSpeech corpus built from LibriVox audio books. DS2 outperforms humans on 3 of the 4 test sets and is competitive on the fourth.]
Table 13: Comparison of WER for two speech systems and human-level performance on read speech.
  Test set                 DS1     DS2     Human
  WSJ eval'92              4.94    3.60    5.03
  WSJ eval'93              6.94    4.98    8.08
  LibriSpeech test-clean   7.89    5.33    5.83
  LibriSpeech test-other   21.74   13.25   12.69
5. The power of machine learning
Source: http://www.nature.com/nature/journal/v529/n7587/abs/nature16961.html?lang=en
[Slide shows an excerpt from the AlphaGo paper: the policy network won 11% of games against Pachi and 12% against the slightly weaker Fuego; each edge (s, a) of the search tree stores an action value Q(s, a), a visit count N(s, a), and a prior probability P(s, a). Figure 3, Monte Carlo tree search in AlphaGo: a, Selection: each simulation traverses the tree by selecting the edge with maximum action value Q plus a bonus u(P) that depends on the stored prior probability P. b, Expansion: the leaf node may be expanded; the new node is processed once by the policy network p_sigma and the output probabilities are stored as prior probabilities P. c, Evaluation: the leaf node is evaluated with the value network v_theta and by running a rollout to the end of the game with the fast rollout policy p_pi, computing the winner with function r. d, Backup: action values Q are updated to track the mean of all evaluations r(·) and v_theta(·) in the subtree below that action.]
6. Why is machine learning useful?
• Huge amounts of data
  Data are growing exponentially
• On-demand compute resources
  Exactly the resources you need can be provisioned when you need them, for as long as you need them
• Rich libraries
  Growth of languages, Python first among them, with rich machine learning libraries
  Rise of web services for sharing OSS, such as GitHub
[Slide also shows Table 1.1 of the companion text, "Major cloud server services", noting that each company offers a broad range of services from storage to managed machine learning:
  Company     Cloud service                  URL
  Amazon      Amazon Web Service (AWS)       https://aws.amazon.com/jp/
  Google      Google Cloud Platform (GCP)    https://cloud.google.com/
  IBM         SoftLayer                      http://www.softlayer.com/jp/
  Microsoft   Azure                          https://azure.microsoft.com/ja-jp/
The surrounding text adds that machine learning algorithms are shared worldwide on GitHub (https://github.com/) and elsewhere, so anyone with some IT and machine learning literacy can use the state of the art, and high-performing methods are now available even to individuals without large-scale systems.]
7. Tasks in machine learning
• Regression analysis: express the quantity you want to explain in terms of other quantities
• Time-series analysis: describe how a quantity of interest changes over time
• Classification: decide which class a given data point belongs to
• Automatic operation: automate game playing and robot control without explicit programming
• Discovery of specific patterns: find particular patterns in large amounts of data
• Clustering: divide data into classes according to some rule
• Optimization: compute the maximum or minimum of a target quantity subject to constraints
* This is not an exhaustive list of machine learning tasks.
8. Accuracy metrics
Evaluating a task requires a reference metric, and how you choose it matters.
However, there are a great many metrics, and the way of thinking also differs between supervised and unsupervised learning, which makes things complicated.
Here we only go through a few representative examples for supervised learning (a small scikit-learn sketch follows below).
• Metrics for regression analysis
  Root Mean Square Error (RMSE), Median Absolute Deviation (MAD), Max-Error, …
• Metrics for classification
  Area Under the Receiver Operator Characteristic curve (AUROC), (multi-class) log loss, normalized Discounted Cumulative Gain (nDCG), …
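Each of the metrics above is available as a library routine. The following is a minimal sketch (not part of the deck), assuming scikit-learn and NumPy are installed, of how a few of them could be computed on invented toy data:

```python
# A minimal sketch of computing a few of the metrics above with scikit-learn
# and NumPy; the toy arrays are invented for illustration.
import numpy as np
from sklearn.metrics import (mean_squared_error, median_absolute_error,
                             roc_auc_score, log_loss)

# Regression: observed targets vs. model predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
rmse    = np.sqrt(mean_squared_error(y_true, y_pred))  # Root Mean Square Error
mad     = median_absolute_error(y_true, y_pred)        # median absolute deviation
max_err = np.max(np.abs(y_true - y_pred))              # Max-Error

# Binary classification: true labels vs. predicted probabilities of class 1
labels = np.array([1, 0, 1, 1, 0])
proba  = np.array([0.9, 0.3, 0.6, 0.8, 0.2])
auroc = roc_auc_score(labels, proba)  # area under the ROC curve
ll    = log_loss(labels, proba)       # (binary) log loss

print(rmse, mad, max_err, auroc, ll)
```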
9. Accuracy metrics
Root Mean Square Error (RMSE)
A metric frequently used for regression problems. It is the root of the mean squared difference between the observed values of the target variable and the model predictions, and smaller values mean a better fit:

  RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \bigl(f(x_i) - y_i\bigr)^2}

This is the most basic metric and rests on the idea that a model is good when its prediction at each data point is close to the observed value. The square root is only conventional, so the version without it is also used as a metric, and the Mean Absolute Error, which sums absolute values instead of squares, is common as well. The Median Absolute Error is based on the median rather than the mean; again, smaller values indicate a better fit.
[The slide reproduces the companion text's discussion of model evaluation, illustrated with a regression plot of target variable vs. explanatory variable; it notes that in the earlier cat-classification example the appropriate metric changes depending on whether you want the top-10 most confident predictions to be correct at all costs or as many images as possible to be labelled correctly.]
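As a concrete illustration of the formula, here is a small sketch assuming only NumPy; the observed values and predictions are invented:

```python
# RMSE computed directly from the definition above, assuming only NumPy.
import numpy as np

y      = np.array([10.0, 12.0,  9.5, 14.0, 11.0])  # observed target values y_i
y_pred = np.array([ 9.0, 12.5, 10.0, 13.0, 11.5])  # model predictions f(x_i)

diff  = y_pred - y
rmse  = np.sqrt(np.mean(diff ** 2))   # Root Mean Square Error
mae   = np.mean(np.abs(diff))         # Mean Absolute Error (no squaring)
medae = np.median(np.abs(diff))       # Median Absolute Error

print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  MedAE={medae:.3f}")
```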
10. Accuracy metrics
Area Under the Receiver Operator Characteristic curve (AUROC): the area under the receiver operating characteristic curve
A metric used for binary classification. It takes values in [0.5, 1], and the closer it is to 1, the better the predictive performance. Its definition takes a few steps. First, introduce the confusion matrix that compares the model predictions {1, 0} with the observed values {1, 0}; each cell holds the number of data points that fall into it (for example, TP is the number of points for which the model predicts 1 and the observed value is also 1). Note that the model predictions change with the threshold: if the model outputs a confidence in [0%, 100%], predicting 1 for outputs of 50% or more gives different results from predicting 1 only for 80% or more.

                            (observed) = 1         (observed) = 0
  (model prediction) = 1    True Positive (TP)     False Positive (FP)
  (model prediction) = 0    False Negative (FN)    True Negative (TN)

Using the cells of this confusion matrix, sensitivity and specificity are defined as

  (sensitivity) = TP / (TP + FN),   (specificity) = TN / (TN + FP)

Sensitivity is the fraction of data points whose observed value is 1 that the model correctly predicts as 1; specificity is the fraction of data points whose observed value is 0 that the model correctly predicts as 0. The receiver operator characteristic (ROC) curve is drawn by plotting sensitivity on the vertical axis against (1 - specificity) on the horizontal axis while varying the confidence threshold of the model, and AUROC is the area under this curve.
[The slide also shows a worked example table with IDs, true answers, scores, and the predictions obtained at two different thresholds.]
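To make the definitions concrete, here is a small sketch assuming scikit-learn; the labels and scores are invented:

```python
# Confusion matrix, sensitivity, specificity, and AUROC as defined above.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                          # observed values
scores = np.array([0.92, 0.40, 0.65, 0.81, 0.35, 0.55, 0.48, 0.10])  # model confidence for class 1

threshold = 0.5                           # predictions depend on this threshold
y_pred = (scores >= threshold).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)              # fraction of actual 1s predicted as 1
specificity = tn / (tn + fp)              # fraction of actual 0s predicted as 0

# AUROC sweeps over all thresholds, so it is computed from the raw scores
auroc = roc_auc_score(y_true, scores)

print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}  AUROC={auroc:.2f}")
```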
11. How to build a model
What matters is to keep improving through repeated trial and error:
  Select a model
  Create variables (features)
  Run the training
  Measure the performance
Preprocessing and feature creation are the most time-consuming steps in machine learning. They also directly affect model accuracy, so preprocessing and feature creation are essentially important. A good approach is to try a range of standard techniques (one possible shape of this loop is sketched below).
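As one possible concretization of the four steps (a sketch assuming scikit-learn and its bundled iris dataset, not the lecturer's actual code):

```python
# One pass through the model-building loop on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Create / preprocess features: here we simply standardize the raw columns
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Select a model
model = LogisticRegression(max_iter=200)

# Run the training
model.fit(X_train_s, y_train)

# Measure performance, then go back and iterate on the model and the features
print("accuracy:", accuracy_score(y_test, model.predict(X_test_s)))
```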
12. The ugly duckling theorem
[Figure 1.17 of the companion text: an illustration of the ugly duckling theorem. For N ducks, we examine the patterns in which features can split them into two groups. Both pattern 1 and pattern 2 are just single cases among the many possible splits, and unless we give special meaning to the colour being black, the black duck is no more distinguished than any of the others.]
→ Feature selection and feature creation are essentially important in model building!
13. How to build a model
What matters is to keep improving through repeated trial and error:
  Select a model
  Create variables (features)
  Run the training
  Measure the performance
To build a model while avoiding overfitting, the standard practice is to evaluate the metric with cross-validation (a minimal sketch follows below).
[The slide reproduces the corresponding passage of the companion text: there are many more metrics beyond these, to be picked up as you continue studying. As the most standard way of evaluating a model, the metric is computed on test data; but to make the most of the data at hand, cross-validation is often used, in which the data are split into N equal parts, a model trained on N-1 of them is evaluated on the remaining part, and this procedure is repeated. Figure 1.22 illustrates this: the dataset is divided into N blocks, N-1 blocks are used for training and the remaining block for testing, with the test block rotated over all N blocks, so training makes maximum use of the data and improves generalization performance.]
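A minimal sketch of this evaluation, assuming scikit-learn and its bundled iris dataset; cross_val_score handles the N-fold splitting, training, and scoring described above:

```python
# N-fold cross-validation on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Split the data into N=5 blocks; each block serves once as the held-out part
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```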
14. Resources on machine learning
• Learning on the web
  Coursera (https://www.coursera.org/): Andrew Ng's machine learning course is famous
• Datasets
  UCI dataset (http://archive.ics.uci.edu/ml/datasets.html): a rich collection of datasets
• Useful repositories on GitHub
  For example https://github.com/jakevdp/sklearn_tutorial, among others
• Data analysis competitions
  Kaggle (https://www.kaggle.com/): a wide variety of competitions where you can test your skills
• Collecting papers
  arXiv (https://arxiv.org/): state-of-the-art papers are uploaded every day