Slide 9
The Power of Machine Learning
The previous state of the art, based solely on supervised learning of convolutional networks, won 11% of games against Pachi [23] and 12% against a slightly weaker program, Fuego [24].
Reinforcement learning of value networks
The final stage of the training pipeline focuses on position evaluation: estimating a value function that predicts the outcome of the game from a given position.
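The excerpt stops before describing how that value function is trained. As a rough illustration only, here is a minimal PyTorch sketch, assuming the value network is fit by mean-squared-error regression from board positions to game outcomes z in {-1, +1}; ValueNet, the layer sizes, and train_step are hypothetical names, not the paper's implementation.

import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Tiny stand-in for a convolutional value network v_theta."""
    def __init__(self, board_size: int = 19):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * board_size * board_size, 1),
            nn.Tanh(),  # squash to [-1, 1], matching outcomes z in {-1, +1}
        )

    def forward(self, boards: torch.Tensor) -> torch.Tensor:
        # boards: (batch, 1, board_size, board_size) stone planes
        return self.body(boards).squeeze(-1)

def train_step(net, opt, boards, outcomes):
    """One regression step: pull v_theta(s) toward the game outcome z."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(boards), outcomes)
    loss.backward()
    opt.step()
    return loss.item()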
Each edge (s, a) of the search tree stores an action value Q(s, a), a visit count N(s, a), and a prior probability P(s, a). The tree is traversed by simulation (that is, descending the tree in complete games without backup), starting from the root state. At each time step t of each simulation, an action a_t is selected from state s_t so as to maximize the action value plus the bonus: a_t = argmax_a [Q(s_t, a) + u(s_t, a)].
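In code, this selection step reduces to an argmax over the stored edge statistics. A minimal Python sketch, assuming the common choice u = c * P / (1 + N), a bonus proportional to the prior but decaying with visits; Edge, select_action, and the constant c are illustrative, not the paper's implementation.

from dataclasses import dataclass

@dataclass
class Edge:
    Q: float = 0.0  # mean action value
    N: int = 0      # visit count
    P: float = 0.0  # prior probability from the policy network

def select_action(edges: dict, c: float = 5.0):
    """a_t = argmax_a [Q(s_t, a) + u(s_t, a)] over the edges of s_t.

    u is taken here as c * P / (1 + N): proportional to the stored prior
    but decaying with repeated visits, so high-prior, rarely tried moves
    still get explored. The exact form of u and the value of c are assumed.
    """
    return max(edges, key=lambda a: edges[a].Q + c * edges[a].P / (1 + edges[a].N))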
Figure 3 | Monte Carlo tree search in AlphaGo. a, Each simulation traverses the tree by selecting the edge with maximum action value Q, plus a bonus u(P) that depends on a stored prior probability P for that edge. b, The leaf node may be expanded; the new node is processed once by the policy network pσ and the output probabilities are stored as prior probabilities P for each action. c, At the end of a simulation, the leaf node is evaluated in two ways: using the value network vθ; and by running a rollout to the end of the game with the fast rollout policy pπ, then computing the winner with function r. d, Action values Q are updated to track the mean value of all evaluations r(·) and vθ(·) in the subtree below that action.
[Figure 3 diagram: four panels showing a, Selection; b, Expansion; c, Evaluation; d, Backup]
Source: http://www.nature.com/nature/journal/v529/n7587/abs/nature16961.html?lang=en
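Read together, panels a–d amount to a single simulation loop. The sketch below (Python, reusing Edge and select_action from the earlier snippet) shows one plausible shape for it; Node, policy_net, value_net, rollout_policy, the game interface (apply / rollout / winner), and the equal-weight mix of the two leaf evaluations are assumptions for illustration, not the paper's implementation.

from dataclasses import dataclass, field

@dataclass
class Node:
    state: object
    edges: dict = field(default_factory=dict)      # action -> Edge
    children: dict = field(default_factory=dict)   # action -> Node

def simulate(root, policy_net, value_net, rollout_policy, game, c=5.0):
    """One MCTS simulation: selection, expansion, evaluation, backup."""
    node, path = root, []

    # a, Selection: follow argmax of Q + u(P) while nodes are expanded.
    while node.edges:
        action = select_action(node.edges, c)
        path.append((node, action))
        node = node.children[action]

    # b, Expansion: run the policy network once on the leaf; its output
    # probabilities become the prior P of each new edge.
    for action, prior in policy_net(node.state).items():
        node.edges[action] = Edge(P=prior)
        node.children[action] = Node(state=game.apply(node.state, action))

    # c, Evaluation: value network vθ, plus a fast rollout to the end of
    # the game scored by the winner function r.
    v = value_net(node.state)
    z = game.winner(game.rollout(node.state, rollout_policy))  # r(.)
    leaf_value = 0.5 * (v + z)  # equal-weight mix of the two evaluations (assumed)

    # d, Backup: each edge on the path tracks the running mean of the
    # evaluations in the subtree below it.
    for parent, action in path:
        e = parent.edges[action]
        e.N += 1
        e.Q += (leaf_value - e.Q) / e.N  # incremental mean update
    return leaf_value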