
Learning Natural Language Processing & Machine Learning with cookpad / internship2016

j.harashima
September 05, 2016

Transcript

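  1. Today's timetable

       Natural language processing (lecture & hands-on) / Break / Machine learning (lecture) / Break / Lunch / Machine learning (hands-on) / Break / Presentation of results / Break / Interviews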
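  2. Morphological analysis: identifying word boundaries, parts of speech, and base forms
    太郎は京都に行った ("Taro went to Kyoto") → 太郎 / は / 京都 / に / 行った
    名詞 (noun) / 助詞 (particle) / 名詞 (noun) / 助詞 (particle) / 動詞 (verb, base form 行く)

    The analyzer consults its dictionary and selects the most plausible word sequence, i.e., the one with the lowest total cost.
    Dictionary entries, each with an occurrence cost: 太郎 (noun), 京都 (noun), 京 (noun), 都 (noun), …
    Besides the occurrence cost of each word, the connection cost between adjacent words is also taken into account.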
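The lowest-total-cost search described above can be sketched as a Viterbi-style dynamic program over a word lattice. This is a minimal sketch under stated assumptions: the dictionary entries, all cost values, and the function names are hypothetical (the slide's actual costs did not survive), and connection costs here depend only on the two parts of speech, which is a simplification of what real analyzers such as MeCab do.

```python
# Hypothetical dictionary: surface form -> (part of speech, occurrence cost).
DICTIONARY = {
    "太郎": ("名詞", 100),
    "は": ("助詞", 50),
    "京都": ("名詞", 100),
    "京": ("名詞", 300),
    "都": ("名詞", 300),
    "に": ("助詞", 50),
    "行った": ("動詞", 150),
}

# Hypothetical connection costs between the parts of speech of adjacent words.
CONNECTION = {
    ("BOS", "名詞"): 0,
    ("名詞", "助詞"): 0,
    ("助詞", "名詞"): 0,
    ("助詞", "動詞"): 0,
    ("名詞", "名詞"): 200,
    ("動詞", "EOS"): 0,
}
DEFAULT_CONNECTION = 500  # used for any pair not listed above


def connection_cost(left, right):
    return CONNECTION.get((left, right), DEFAULT_CONNECTION)


def analyze(sentence):
    """Return the (surface, POS) sequence with the lowest total cost, or None."""
    n = len(sentence)
    # best[i][pos]: cheapest (cost, path) covering sentence[:i] whose last word has POS pos
    best = [{} for _ in range(n + 1)]
    best[0]["BOS"] = (0, [])
    for start in range(n):
        for prev_pos, (cost, path) in best[start].items():
            for end in range(start + 1, n + 1):
                word = sentence[start:end]
                if word not in DICTIONARY:
                    continue
                pos, occurrence = DICTIONARY[word]
                total = cost + connection_cost(prev_pos, pos) + occurrence
                if pos not in best[end] or total < best[end][pos][0]:
                    best[end][pos] = (total, path + [(word, pos)])
    # Close every complete path with the end-of-sentence connection cost.
    finals = [
        (cost + connection_cost(pos, "EOS"), path)
        for pos, (cost, path) in best[n].items()
    ]
    return min(finals)[1] if finals else None


print(analyze("太郎は京都に行った"))
# [('太郎', '名詞'), ('は', '助詞'), ('京都', '名詞'), ('に', '助詞'), ('行った', '動詞')]
```

With these made-up costs, splitting 京都 into 京 + 都 costs 300 + 200 + 300 instead of 100, so the single word wins.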
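  3. Named entity recognition: recognizing the named entities (e.g., person names, place names) in a sentence
    太郎 / は / 京都 / に / 行った (名詞 / 助詞 / 名詞 / 助詞 / 動詞, base form 行く) → 太郎 = person name, 京都 = place name

    Prepare training data, and learn which word sequences can become which kinds of named entity.
    Training data: 次郎は京都に行った。("Jiro went to Kyoto.") 花子は大阪に行った。("Hanako went to Osaka.") with 次郎 and 花子 labeled as person names and 京都 and 大阪 labeled as place names → named entity recognizer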
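To make "learn from training data" concrete, here is a deliberately tiny count-based tagger. It is a stand-in for illustration, not the method on the slide and not what production recognizers use: each label is scored by how often the word itself, and the word that follows it, carried that label in the hypothetical training data. The names `train` and `tag` are illustrative, and "O" marks tokens outside any named entity.

```python
from collections import Counter

# Hypothetical training data built from the slide's example sentences:
# (token, label) pairs, where "O" marks tokens outside any named entity.
TRAINING = [
    [("次郎", "人名"), ("は", "O"), ("京都", "地名"), ("に", "O"), ("行った", "O")],
    [("花子", "人名"), ("は", "O"), ("大阪", "地名"), ("に", "O"), ("行った", "O")],
]


def train(sentences):
    """Count how often each word, and each following word, co-occurs with a label."""
    word_counts = Counter()
    next_counts = Counter()
    for sentence in sentences:
        for i, (word, label) in enumerate(sentence):
            next_word = sentence[i + 1][0] if i + 1 < len(sentence) else "EOS"
            word_counts[word, label] += 1
            next_counts[next_word, label] += 1
    labels = sorted({label for sentence in sentences for _, label in sentence})
    return word_counts, next_counts, labels


def tag(tokens, model):
    """Give each token the label with the highest combined count ("O" if none)."""
    word_counts, next_counts, labels = model
    tagged = []
    for i, word in enumerate(tokens):
        next_word = tokens[i + 1] if i + 1 < len(tokens) else "EOS"
        scores = {
            label: word_counts[word, label] + next_counts[next_word, label]
            for label in labels
        }
        best = max(labels, key=lambda label: scores[label])
        tagged.append((word, best if scores[best] > 0 else "O"))
    return tagged


model = train(TRAINING)
print(tag(["太郎", "は", "京都", "に", "行った"], model))
# [('太郎', '人名'), ('は', 'O'), ('京都', '地名'), ('に', 'O'), ('行った', 'O')]
```

太郎 never appears in the training data, yet it is tagged as a person name because the following word, は, followed person names there.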
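  4. Named entity recognition in cookpad: sale information (posted by stores)
    ・"たまねぎ…個 …円" (onions, per count, price) → たまねぎ / …個: たまねぎ is recognized as a product name and registered as 「たまねぎ」
    ・"たまねぎドレッシング …円" (onion dressing, price) → たまねぎ ドレッシング is recognized as a single product name and registered as 「たまねぎドレッシング」

    Online: search the stores near the user for the product name 「たまねぎ」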
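The payoff of registering recognized product names, rather than raw post text, shows up at search time: a search for 「たまねぎ」 should find onions, not onion dressing. A minimal sketch, assuming the product names have already been extracted by a recognizer; the store names and the `register` and `substring_search` helpers are hypothetical, and the "near the user" filter is left out.

```python
from collections import defaultdict

# Hypothetical sale posts; "product" is the product name that a
# named entity recognizer extracted from the post text.
POSTS = [
    {"store": "Store A", "text": "たまねぎ1個", "product": "たまねぎ"},
    {"store": "Store B", "text": "たまねぎドレッシング", "product": "たまねぎドレッシング"},
]


def register(posts):
    """Offline: index stores by the exact product name extracted from each post."""
    index = defaultdict(list)
    for post in posts:
        index[post["product"]].append(post["store"])
    return index


def substring_search(posts, query):
    """Naive alternative without recognition: match the query anywhere in the text."""
    return [post["store"] for post in posts if query in post["text"]]


index = register(POSTS)
print(index["たまねぎ"])                    # ['Store A']
print(substring_search(POSTS, "たまねぎ"))  # ['Store A', 'Store B']
```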
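  5. Syntactic parsing: uncovering the structure of a sentence (for Japanese, the bunsetsu (phrase) boundaries and which bunsetsu each one modifies)
    太郎 / は / 京都 / に / 行った (名詞 / 助詞 / 名詞 / 助詞 / 動詞, base form 行く)
    Identify the bunsetsu boundaries: 太郎は / 京都に / 行った
    Identify the head of each bunsetsu: 太郎は → 行った, 京都に → 行った

    Decided using the following information:
    ・A bunsetsu modifies a bunsetsu that comes after it
    ・A bunsetsu modifies exactly one other bunsetsu
    ・Dependencies do not cross one another
    ・A bunsetsu tends to modify the nearest bunsetsu
    ・…
    A bunsetsu consists of one or more content words (e.g., nouns) and zero or more function words (e.g., particles).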
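The first three rules above are hard constraints, so every structure that satisfies them can be enumerated; a parser then chooses among the candidates using soft preferences such as the nearest-attachment tendency. A minimal sketch of the enumeration only, with the hypothetical helper names `crosses` and `candidate_structures`: `heads[i]` is the index of the bunsetsu that bunsetsu `i` modifies, and the final bunsetsu has no head.

```python
from itertools import product


def crosses(heads):
    """True if any two dependencies cross (i < j < heads[i] < heads[j])."""
    for i in range(len(heads)):
        for j in range(i + 1, len(heads)):
            if i < j < heads[i] < heads[j]:
                return True
    return False


def candidate_structures(n):
    """All head assignments for n bunsetsu in which each non-final bunsetsu
    modifies exactly one later bunsetsu and no two dependencies cross."""
    choices = [range(i + 1, n) for i in range(n - 1)]
    return [heads for heads in product(*choices) if not crosses(heads)]


bunsetsu = ["太郎は", "京都に", "行った"]
for heads in candidate_structures(len(bunsetsu)):
    print([f"{bunsetsu[i]} -> {bunsetsu[h]}" for i, h in enumerate(heads)])
# ['太郎は -> 京都に', '京都に -> 行った']
# ['太郎は -> 行った', '京都に -> 行った']
```

Note that the nearest-attachment preference alone would pick the first candidate, while the correct structure is the second; this is why a real parser weighs several clues (and usually learns how to weigh them) rather than relying on one rule.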
  6. লུরԠղੳ রԠࢺ จষதͷಉҰදݱΛಉఆ͢Δॲཧ ઌߦࢺ রԠղੳ লུղੳʢθϩরԠղੳʣ ޲͔ͬͨ ൴͸ ਗ਼ਫࣉʹ ·ͣ

    ଠ࿠͸ ژ౎ʹ ߦͬͨ θϩ୅໊ࢺ ઌߦࢺ ޲͔ͬͨ ൴͸ ਗ਼ਫࣉʹ ·ͣ ଠ࿠͸ ژ౎ʹ ߦͬͨ θϩ୅໊ࢺʢলུ͞ΕͨরԠࢺʣʹؾ෇͘ͱ͜Ζ͔Β ελʔτ ҎԼͷ৘ใ͔Βಉఆ ɾরԠࢺͱઌߦࢺͷੑผ͸ಉ͡ ɾઌߦࢺ͸ʮ͸ʯΛ൐͍΍͍͢ ɾরԠࢺͱઌߦࢺ͸ͦΕ΄Ͳ཭Εͳ͍ ɾ
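The clues listed above can be combined into a score for each antecedent candidate. The sketch below is purely illustrative: the `Mention` record, the weights, and the bunsetsu positions are hypothetical, and the zero pronoun is assumed to be male, as its overt counterpart 彼 ("he") is in the slide's first example. Systems in practice typically learn such weights from annotated data rather than setting them by hand.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    position: int         # index of the bunsetsu in the text
    gender: str           # "male", "female", or "unknown"
    has_wa: bool = False  # whether it carries the topic particle 「は」


def score(candidate, anaphor):
    """Hypothetical hand-set weights for the three clues on the slide."""
    s = 0.0
    if candidate.gender == anaphor.gender:
        s += 2.0  # same gender
    if candidate.has_wa:
        s += 1.0  # antecedents tend to carry 「は」
    s -= 0.1 * (anaphor.position - candidate.position)  # prefer nearby candidates
    return s


def resolve(candidates, anaphor):
    """Pick the highest-scoring candidate that precedes the anaphor."""
    preceding = [c for c in candidates if c.position < anaphor.position]
    return max(preceding, key=lambda c: score(c, anaphor))


# 太郎は 京都に 行った。φ まず 清水寺に 向かった。
candidates = [
    Mention("太郎", 0, "male", has_wa=True),
    Mention("京都", 1, "unknown"),
]
zero_pronoun = Mention("φ", 3, "male")
print(resolve(candidates, zero_pronoun).text)  # 太郎
```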
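  7. Search engine: retrieving the resources (e.g., documents) relevant to a query
    Offline (indexing): build an inverted index, i.e., a mapping from words to documents
      Document 1: 太郎は京都に行った → morphological analysis → 太郎 / は / 京都 / に / 行った
      Document 2: 次郎は京都に行った → morphological analysis → 次郎 / は / 京都 / に / 行った
      Inverted index: 太郎 {1}, 次郎 {2}, 京都 {1, 2}

    Online: look the query up in the inverted index
      Query 「太郎」 → results: Document 1
      Query 「京都」 → results: Documents 1 and 2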
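The offline/online split above maps directly onto a dictionary of sets. A minimal sketch, taking the slide's two documents as already segmented by morphological analysis; `build_index` and `search` are illustrative names, and a real engine would also handle multi-word queries and ranking.

```python
from collections import defaultdict

# The slide's two documents, already split into words by morphological analysis.
documents = {
    1: ["太郎", "は", "京都", "に", "行った"],
    2: ["次郎", "は", "京都", "に", "行った"],
}


def build_index(docs):
    """Offline: map each word to the set of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[word].add(doc_id)
    return index


def search(index, query):
    """Online: look the query word up in the inverted index."""
    return sorted(index.get(query, set()))


index = build_index(documents)
print(search(index, "太郎"))  # [1]
print(search(index, "京都"))  # [1, 2]
```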
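  8. Machine translation: having a computer replace a text with an equivalent text in another language
    Japanese: 太郎は京都に行った / English: Taro went to Kyoto
    ・Run morphological analysis and syntactic parsing, and represent the input sentence in some linguistic unit (e.g., words, word sequences): 太郎は / 京都に / 行った
    ・For each unit, choose translation candidates from a bilingual dictionary (built in advance): Taro / to Kyoto / went, leave
    ・Combine the candidates to generate multiple sentences, and output the most natural one (natural sentence generation)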
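The generate-and-select step can be sketched as follows: take each unit's candidates from a bilingual dictionary, try every combination and word order, and keep the sentence a language model likes best. The dictionary entries and the bigram counts standing in for a language model are hypothetical, summing counts is a crude stand-in for a real language model score, and real systems prune this search instead of enumerating it exhaustively.

```python
from itertools import permutations, product

# Hypothetical bilingual dictionary: Japanese unit -> English candidates.
BILINGUAL = {
    "太郎は": ["Taro"],
    "京都に": ["to Kyoto"],
    "行った": ["went", "left"],
}

# Hypothetical bigram counts from an English corpus (a stand-in language model).
BIGRAMS = {
    ("<s>", "Taro"): 5, ("Taro", "went"): 4, ("went", "to"): 9,
    ("to", "Kyoto"): 6, ("Kyoto", "</s>"): 5, ("Taro", "left"): 2,
}


def fluency(sentence):
    """Sum of bigram counts over the sentence, with start/end markers."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(BIGRAMS.get(pair, 0) for pair in zip(words, words[1:]))


def translate(units):
    """Try every candidate combination in every order; return the most fluent."""
    candidates = []
    for choice in product(*(BILINGUAL[unit] for unit in units)):
        for order in permutations(choice):
            candidates.append(" ".join(order))
    return max(candidates, key=fluency)


print(translate(["太郎は", "京都に", "行った"]))  # Taro went to Kyoto
```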
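  9. Morphological analyzers
    ・JUMAN: costs tuned by hand
    ・ChaSen: costs tuned by machine learning (generative model)
    ・MeCab: costs tuned by machine learning (discriminative model)
    ・KyTea: makes building training data easier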
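  10. MeCab
    ・The current de facto standard
      ・because it is accurate and easy to use
    ・Usage
      ・just feed text to STDIN
    $ mecab
    太郎は京都に行った[Enter]
    太郎	名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
    は	助詞,係助詞,*,*,*,*,は,ハ,ワ
    京都	名詞,固有名詞,地域,一般,*,*,京都,キョウト,キョート
    に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
    行っ	動詞,自立,*,*,五段・カ行促音便,連用タ接続,行く,イッ,イッ
    た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
    EOS
    The output gives word boundaries, parts of speech, and base forms.
    Simple named entities are recognized as well (here 人名 "person name" for 太郎 and 地域 "region" for 京都).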
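MeCab's default output with the IPADIC dictionary, as above, is one word per line: the surface form, a tab, then comma-separated features, with the part of speech first and the base form seventh. A minimal parser for that text follows; `parse_mecab_output` is an illustrative name and the field positions assume IPADIC's format. With the mecab-python3 binding, the same text could come from `MeCab.Tagger().parse(...)` instead of a pasted string.

```python
# Output copied from the slide (IPADIC format: surface<TAB>features).
MECAB_OUTPUT = """太郎\t名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は\t助詞,係助詞,*,*,*,*,は,ハ,ワ
京都\t名詞,固有名詞,地域,一般,*,*,京都,キョウト,キョート
に\t助詞,格助詞,一般,*,*,*,に,ニ,ニ
行っ\t動詞,自立,*,*,五段・カ行促音便,連用タ接続,行く,イッ,イッ
た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
EOS"""


def parse_mecab_output(text):
    """Split MeCab/IPADIC output into (surface, features) pairs, stopping at EOS."""
    words = []
    for line in text.splitlines():
        if line == "EOS":
            break
        surface, features = line.split("\t")
        words.append((surface, features.split(",")))
    return words


words = parse_mecab_output(MECAB_OUTPUT)
# Feature 0 is the part of speech, feature 6 the base form.
print([(surface, f[0], f[6]) for surface, f in words])
# Proper nouns (名詞,固有名詞) with their subcategory.
proper_nouns = [(surface, f[2]) for surface, f in words if f[:2] == ["名詞", "固有名詞"]]
print(proper_nouns)  # [('太郎', '人名'), ('京都', '地域')]
```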
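  11. [Hands-on] Let's analyze the distribution of words
    Count how often each word appears
    ・using whatever method you like!
    Rank / Word / Frequency
    … / ？ / …
    … / ？ / …
    … / ！ / …
    … / 簡単 ("easy") / …
    Which words appear in recipes … or more times?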
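One way to do the counting, among many: split each recipe title into words and tally them with `collections.Counter`. The titles below are hypothetical and already split by spaces; with real data they would first go through a morphological analyzer such as MeCab, and the threshold value is an arbitrary choice.

```python
from collections import Counter

# Hypothetical recipe titles, already split into words.
titles = [
    "簡単 ！ たまねぎ の スープ",
    "簡単 鶏 の 照り焼き",
    "たまねぎ ドレッシング ！",
]

counts = Counter(word for title in titles for word in title.split())

# Rank / word / frequency, most frequent first.
for rank, (word, freq) in enumerate(counts.most_common(3), start=1):
    print(rank, word, freq)

# Words that appear at least `threshold` times.
threshold = 2
frequent = sorted(word for word, count in counts.items() if count >= threshold)
print(frequent)
```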
  12. Today's timetable (same as slide 1)
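  13. e.g., handwritten character recognition
    Learning: learn from the given data to build a model ("round shapes are 0", "a single vertical stroke is 1")
    Prediction: apply the model to data it was not given ("it looks round, so it is 0") → ?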
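  14. Linear regression
    ・The most basic regression model
    ・Find the weights $w_0, w_1, \ldots, w_N$ that best represent the input data:
      $y(w, x) = w_0 + w_1 x_1 + \cdots + w_N x_N$
    ・Which $w$ best represents the white dots? → the one that minimizes the sum of squared errors (the solid line $y = w_0 + w_1 x_1$)
    ※ In this example the input data is one-dimensional ($x_1$ only).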
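For the one-dimensional case on the slide, the line minimizing the sum of squared errors has a closed form: $w_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $w_0 = \bar{y} - w_1 \bar{x}$. A minimal sketch with made-up points (the slide's data is not recoverable); `fit_line` and `squared_error` are illustrative names.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w0 + w1 * x1 for one-dimensional input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


def squared_error(w0, w1, xs, ys):
    """Sum of squared differences between the data and the line."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data points (the "white dots").
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
w0, w1 = fit_line(xs, ys)
print(w0, w1)  # approximately 0.15 and 1.94
```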
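  15. Deep learning: major news
    Overseas
    ・Google's cat recognition
    ・Deep learning wins an image recognition competition by a wide margin
    ・Facebook opens a research lab in NY
    ・Baidu research lab
    ・AlphaGo decisively beats a professional Go player
    ・Google acquires DeepMind
    ・Facebook opens a research lab in Paris
    ・Google acquires DNNresearch
    ・Google open-sources TensorFlow
    Japan
    ・Dwango research lab
    ・Recruit research lab, AIST research center
    ・CyberAgent research lab
    ・LINE research lab
    ・Endowed chair at the University of Tokyo, research lab at the University of Electro-Communications
    ・PFN and DeNA form a joint venture
    ・PFI releases Chainer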
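  16. Perceptron
    e.g., classify an input $x_0, x_1, \ldots, x_N$ into category A or B
    ・Feed $x$ from the nodes on the left into the node on the right
    ・Multiply by $w_0, w_1, \ldots, w_N$ as the values pass along the edges
    ・Compute the function $f\left(\sum_n w_n x_n\right)$ (below, $f$ is a step function):
      $$y = f\left(\sum_n w_n x_n\right) = \begin{cases} 1 & \left(\sum_n w_n x_n \ge 0\right) \\ -1 & \left(\sum_n w_n x_n < 0\right) \end{cases}$$
    ・Classify as A if $y$ is 1 and as B if $y$ is -1
    ※ $w$ is adjusted to values that classify the training data well (this is what learning means for a perceptron)
    [Figure: two-layer perceptron; inputs $x_0 \ldots x_N$ connect through weights $w_0 \ldots w_N$ to a single output $y$]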
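The "adjust $w$" step can be made concrete with the classic perceptron learning rule: whenever a training example is misclassified, add $y_i \mathbf{x}_i$ to $w$. The sketch uses the step function above (1 → A, -1 → B), fixes $x_0 = 1$ so that $w_0$ acts as a bias (a common convention, assumed here), and trains on a hypothetical, linearly separable toy dataset; `train_perceptron` and the epoch limit are illustrative choices.

```python
def predict(w, x):
    """Step function from the slide: 1 if sum_n w_n x_n >= 0, else -1."""
    activation = sum(wn * xn for wn, xn in zip(w, x))
    return 1 if activation >= 0 else -1


def train_perceptron(data, epochs=100):
    """data: list of (x, y) with y in {1, -1}; x[0] is fixed to 1 (bias)."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if predict(w, x) != y:
                # Perceptron rule: move w toward the correct side for this example.
                w = [wn + y * xn for wn, xn in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w


# Hypothetical toy data: category A (y = 1) when x1 + x2 is large.
data = [
    ([1, 0, 0], -1),
    ([1, 0, 1], -1),
    ([1, 1, 0], -1),
    ([1, 2, 2], 1),
    ([1, 2, 3], 1),
    ([1, 3, 2], 1),
]
w = train_perceptron(data)
print(w, [predict(w, x) for x, _ in data])
```

Because the toy data is linearly separable, the perceptron convergence theorem guarantees the loop stops after finitely many updates.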
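  17. Neural networks
    ・A general term for models that, like the perceptron, are represented by nodes and the edges connecting them
    ・Adding more layers
      ・lets the model solve more complex problems
      ・but makes learning harder → deep learning is what overcame this
    [Figure: a two-layer perceptron (inputs $x_0 \ldots x_N$, weights $w_0 \ldots w_N$, output $y$) next to a three-layer perceptron (inputs $x_0 \ldots x_N$, hidden units $h_0 \ldots h_L$, outputs $y_0 \ldots y_M$, weights $w_{0,0} \ldots w_{N,L}$ and $w'_{0,0} \ldots w'_{L,M}$)]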
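How values flow through the three-layer perceptron in the figure can be sketched as a forward pass: the inputs feed the hidden units $h_0 \ldots h_L$ through weights $w_{n,l}$, and the hidden units feed the outputs $y_0 \ldots y_M$ through weights $w'_{l,m}$. The sigmoid activation and all weight values are assumptions for illustration; a smooth activation, rather than the step function, is what makes gradient-based training of multi-layer networks possible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights):
    """weights[n][k] connects input node n to output node k."""
    return [
        sigmoid(sum(inputs[n] * weights[n][k] for n in range(len(inputs))))
        for k in range(len(weights[0]))
    ]


def forward(x, w, w_prime):
    h = layer(x, w)           # input layer -> hidden layer (h_0 ... h_L)
    return layer(h, w_prime)  # hidden layer -> output layer (y_0 ... y_M)


# Hypothetical weights: 3 inputs, 2 hidden units, 2 outputs.
w = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
w_prime = [[1.0, -1.0], [-1.0, 1.0]]
outputs = forward([1.0, 0.0, 1.0], w, w_prime)
print(outputs)
```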
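  18. Sequence to sequence
    ・A model commonly used for machine translation
    ・The source language (e.g., Japanese) is fed into the lower network
      ・$x_0, x_1, \ldots, x_T$ are (vectors representing) the words of the source-language sentence
    ・The target language (e.g., English) is output from the upper network
      ・$y_0, y_1, \ldots, y_{T'}$ are (vectors representing) the words of the target-language sentence
    ・Unlike earlier methods, it needs no sophisticated analysis (e.g., syntactic parsing, ellipsis and anaphora resolution)
    [Figure: the lower network reads $x_0 \ldots x_T$ into hidden states $h^1_0 \ldots h^1_T$; the upper network, with hidden states $h^2_0 \ldots h^2_{T'}$, emits $y_0 \ldots y_{T'}$]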
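  19. Sequence to sequence in cookpad (research in progress)
    ・Translate Japanese cookpad recipes into English so that users outside Japan can read them
    ・Build a Japanese-to-English translation model from … recipes of training data
    Training data: recipes (Japanese) paired with recipes (English) → learning → model (attention model, Bahdanau et al.)
    Output of the actual translation model:
      Japanese: 蓋またはアルミホイルで落とし蓋をして中火で蒸し焼きにします。
      English: cover with a lid or aluminum foil and steam the chicken on medium heat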
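An attention model computes, at each decoding step, a weighted average of the lower network's hidden states, with weights given by a softmax over relevance scores. Bahdanau et al. score relevance with a small feed-forward (additive) network; the sketch below swaps in a plain dot product for brevity, and all vectors are made up.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(decoder_state, encoder_states):
    """Return (weights, context vector) for one decoding step.

    Relevance is a dot product here, a simplification of the additive
    scoring network used by Bahdanau et al."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(wt * h[k] for wt, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


# Hypothetical 2-dimensional hidden states for a 3-word source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attention(decoder_state, encoder_states)
print([round(wt, 3) for wt in weights], [round(c, 3) for c in context])
```

Source words whose states align with the current decoder state receive most of the weight, so the context vector is dominated by them.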
  20. Today's timetable (same as slide 1)
  21. SVM in scikit-learn
    >>> from sklearn import svm
    >>> X = [[0, 0], [1, 1]]
    >>> y = [0, 1]
    >>> clf = svm.SVC()
    >>> clf.fit(X, y)
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
        decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
        max_iter=-1, probability=False, random_state=None, shrinking=True,
        tol=0.001, verbose=False)
    X: feature vectors (e.g., recipes represented by word frequencies)
    y: labels (e.g., 0 = not noodles, 1 = noodles)
    http://scikit-learn.org/stable/modules/svm.html#svm
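  22. Hint
    For example:
    ・Run morphological analysis on each recipe's text (e.g., its title)
    ・Assign an ID number to each content word (e.g., each noun)
    ・Represent each recipe as a feature vector (e.g., recipe A = [ … ])
      ↑ the value at position n is the number of times word n appears in the recipe
    ・Give the feature vectors and labels to an SVM and train it
    Of course, any other approach is fine too!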
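The numbering and counting steps of the hint can be sketched in a few lines. The recipes and labels are hypothetical and assumed to be already reduced to their nouns by morphological analysis; `build_vocabulary` and `to_vector` are illustrative names, and scikit-learn's `CountVectorizer` offers the same idea ready-made. The resulting `X` has the list-of-lists shape that `svm.SVC().fit` accepts.

```python
# Hypothetical recipe titles, already reduced to content words (nouns)
# by morphological analysis, plus labels (1 = noodles, 0 = not noodles).
recipes = [
    ["うどん", "ねぎ", "うどん"],
    ["カレー", "たまねぎ"],
    ["そば", "ねぎ"],
]
labels = [1, 0, 1]


def build_vocabulary(docs):
    """Assign an ID number to each distinct word, in order of first appearance."""
    vocab = {}
    for words in docs:
        for word in words:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_vector(words, vocab):
    """Position i holds how many times word number i appears."""
    vector = [0] * len(vocab)
    for word in words:
        if word in vocab:  # ignore words not seen when building the vocabulary
            vector[vocab[word]] += 1
    return vector


vocab = build_vocabulary(recipes)
X = [to_vector(words, vocab) for words in recipes]
print(vocab)
print(X)
# X and labels can now be passed to svm.SVC().fit(X, labels).
```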
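  23. Evaluating the classification results
    ・This time, train the model on half of the data and evaluate it on the other half
    # sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split

    X = ...  # feature vectors
    y = ...  # labels
    # Hold out half of the data for evaluation (fixed seed for reproducibility)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
    svm = SVC()
    svm.fit(X_train, y_train)
    score = svm.score(X_test, y_test)  # mean accuracy on the held-out half
    print(score)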