
Doing Machine Learning for Myself


Lightning-talk (LT) slides from "Recap of TensorFlow Dev Summit 2018 in 信州", held by GDG信州 on 2018/4/13.


thinkAmi

April 13, 2018

Transcript

  1. Doing Machine Learning for Myself • @thinkAmi • Recap of TensorFlow Dev Summit 2018 in 信州, LT

  2. About me • thinkAmi ◦ Hatena, Twitter, GitHub • Hobby: touring around and eating apples ◦ about … varieties • Work: Python ◦ DjangoCongress • GDG信州, NSEG • Geek Lab Nagano (ギークラボ長野) http://ringo-tabetter.herokuapp.com/hc/total

  3. Doing machine learning for myself ◦ A story of trial and error to get the results I wanted • I will publish the slides later

  4. Question: What day is …?

  5. Answer: IPA's Information Technology Engineers Examination

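  6. Apparently, the morning section of the advanced-level exams contains the same questions as the morning section of the Applied Information Technology Engineer Examination held on the same day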

  7. Since I had the chance, I looked into it with machine learning: that is what this LT is about

  8. Question • Who has done machine learning, especially natural language processing? • Who has written Python?

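  9. How I went about it (1) Obtain the past exam questions (2) OCR them (3) Preprocess the OCR output to clean up the data (4) Natural language processing with machine learning (5) Display the training results • Source code (Python): https://github.com/thinkAmi/ipa_issues_model_by_doc2vec
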
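  10. Obtaining the past exam questions • From here: ◦ IPA (Information-technology Promotion Agency, Japan): past exam questions ▪ https://www.jitec.ipa.go.jp/1_04hanni_sukiru/_index_mondai.html • The PDFs do not seem to have been OCR-processed ◦ Painful
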
  11. OCR • In Python, pyocr + Tesseract looks workable • But this is GDG信州 ◦ GDG = Google Developer Groups • So I have to use Google!

  12. Google offerings that look capable of OCR • Google Cloud Vision API • OCR in Google Drive ◦ This time I use the latter

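  13. Can the Google Drive API do OCR? • There is a parameter for it ◦ ocrLanguage on POST https://www.googleapis.com/upload/drive/v3/files ▪ https://developers.google.com/drive/v3/reference/files/create • I wrote it in Python ◦ The OCR worked ◦ google_drive_uploader.py • Roughly like this:

```python
from googleapiclient.http import MediaFileUpload

# f, MIME_TYPE, directory_id and service (an authorized Drive v3 client)
# are presumably defined elsewhere in google_drive_uploader.py.
media_body = MediaFileUpload(f, mimetype=MIME_TYPE, resumable=True)
body = {'name': f.name, 'mimeType': MIME_TYPE, 'parents': [directory_id]}
# ocrLanguage='ja': language hint for OCR while Drive imports the file
r = service.files().create(
    body=body, media_body=media_body, ocrLanguage='ja').execute()
```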
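
Under the hood, the client-library call above is an upload to the files.create endpoint with `ocrLanguage` as a query parameter. The sketch below only assembles such a request with the standard library. It assumes a multipart upload (the call above uses `resumable=True`, which makes the client send `uploadType=resumable` instead) and assumes `MIME_TYPE` is the Google Docs type `application/vnd.google-apps.document`, which is what asks Drive to convert, and therefore OCR, the uploaded PDF. `build_ocr_upload_request`, the file name and the folder ID are hypothetical; authentication and sending the file bytes are left out.

```python
from urllib.parse import urlencode

# Upload endpoint quoted on the slide (Drive API v3, files.create).
UPLOAD_URL = "https://www.googleapis.com/upload/drive/v3/files"
# Assumption: converting to a Google Doc is what triggers OCR on import.
GOOGLE_DOC_MIME = "application/vnd.google-apps.document"


def build_ocr_upload_request(file_name, folder_id, ocr_language="ja"):
    """Return (url, metadata) for a multipart files.create upload with OCR.

    Hypothetical helper: it only builds the request; OAuth and the
    multipart body with the file bytes are left to a client library.
    """
    query = urlencode({"uploadType": "multipart", "ocrLanguage": ocr_language})
    metadata = {
        "name": file_name,
        "mimeType": GOOGLE_DOC_MIME,  # target type: convert to a Google Doc
        "parents": [folder_id],       # Drive folder to store the result in
    }
    return f"{UPLOAD_URL}?{query}", metadata


url, metadata = build_ocr_upload_request("exam.pdf", "FOLDER_ID")
print(url)
print(metadata)
```
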
  14. OCR example (left: original, right: Google Drive OCR)

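  15. On OCR accuracy • This time I do not worry about OCR accuracy ◦ The same question should be turned into text with the same accuracy every time ▪ If one copy is misread, the other copy should be misread in the same way • As long as the OCR runs at all, that is good enough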

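  16. Preprocessing the data • Looking at the OCR output, I noticed it was awkward to work with as-is ◦ Lots of line breaks ◦ Only one question label (「問…」) was OCR'd as something else ▪ even though the other questions look fine • That said, I did not put much effort into this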
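
A minimal sketch of the kind of light cleanup described above, assuming Japanese text, where wrapped lines can simply be joined without inserting spaces. `clean_ocr_text` is a hypothetical helper, not the author's code, and it makes no attempt to repair the misread question label.

```python
import re


def clean_ocr_text(raw):
    """Hypothetical cleanup for OCR output of Japanese exam questions.

    - removes line breaks (Japanese needs no space between wrapped lines)
    - collapses runs of spaces, tabs and full-width spaces into one space
    - strips leading and trailing whitespace
    """
    text = raw.replace("\r\n", "\n")
    text = re.sub(r"\n+", "", text)
    text = re.sub(r"[ \t\u3000]+", " ", text)
    return text.strip()


sample = "問1\n\nある関数の\n説明として\n適切なものはどれか。\n"
print(clean_ocr_text(sample))
```
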
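  17. Natural language processing with machine learning • There are many approaches to NLP with machine learning • This time I chose Doc2Vec ◦ because it looked like it would make finding similar texts easy • What is Doc2Vec? ◦ It turns documents into vectors ▪ Once vectorized, documents are easy for a computer to work with ▪ e.g. for classification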
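
To make "easy for a computer to work with" concrete: once each document is a vector, comparing two documents is plain arithmetic. Gensim's similarity helpers report cosine similarity; this sketch computes it by hand on made-up three-dimensional vectors (real Doc2Vec vectors are much longer, typically around a hundred dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "document vectors": doc_a and doc_b point in similar directions.
doc_a = [0.9, 0.1, 0.0]
doc_b = [0.8, 0.2, 0.1]
doc_c = [0.0, 0.1, 0.9]

print(round(cosine_similarity(doc_a, doc_b), 3))
print(round(cosine_similarity(doc_a, doc_c), 3))
```

Cosine similarity ignores vector length and looks only at direction, which is why it is the usual choice for comparing document embeddings.
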
  18. Doc2Vec models • TensorFlow ◦ There was an implementation along those lines ▪ https://github.com/PacktPublishing/TensorFlow-Machine-Learning-Cookbook/blob/master/Chapter…/doc2vec.py ◦ But I lack the knowledge to implement it myself

  19. Doc2Vec models • Gensim ◦ "Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community." ▪ https://github.com/RaRe-Technologies/gensim ◦ It provides a Doc2Vec model ▪ https://radimrehurek.com/gensim/models/doc2vec.html

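  20. What Doc2Vec needs • The documents must first be morphologically analyzed ◦ Morphological analysis: splitting a document into its smallest meaningful units ◦ お待ちしております ("we look forward to seeing you") → お待ち / し / て / おり / ます (Wikipedia) • Libraries ◦ MeCab, janome, JUMAN++, etc. ▪ This time I used JUMAN++, which I already had installed • Installing JUMAN++ ◦ On a Mac, install it with Homebrew ▪ https://chezou.hatenablog.com/entry/… • To use it from Python, install PyKNP ◦ http://nlp.ist.i.kyoto-u.ac.jp/index.php?PyKNP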
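
To show what "splitting into the smallest meaningful units" produces, here is a toy greedy longest-match segmenter run on the Wikipedia example. This is a deliberately simplified technique swapped in for illustration: JUMAN++ and MeCab score whole candidate segmentations rather than always taking the longest dictionary word. The mini-dictionary and `longest_match_tokenize` are hypothetical.

```python
def longest_match_tokenize(text, dictionary):
    """Greedy longest-match segmentation against a word list (toy example)."""
    max_len = max(len(w) for w in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Unknown characters fall through to one-character tokens.
            if piece in dictionary or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary; "お" and "待ち" compete with "お待ち".
words = {"お", "待ち", "お待ち", "し", "て", "おり", "ます"}
print(" / ".join(longest_match_tokenize("お待ちしております", words)))
```

Because the longest entry wins, "お待ち" is kept as one token instead of "お" + "待ち", matching the segmentation on the slide.
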
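  21. Morphological analysis with JUMAN++ and preparing the training data • doc2vec_runner.py

```python
from gensim.models.doc2vec import TaggedDocument
from pyknp import Jumanpp  # class name as used in the talk (2018-era pyknp)


def morphological_analysis(doc):
    # Split the text into morphemes with JUMAN++ and keep each surface form (midasi)
    r = Jumanpp().analysis(doc)
    return [mrph.midasi for mrph in r.mrph_list()]


def get_trainings(issue_type):
    # One TaggedDocument per question file, tagged with the file name (stem)
    trainings = []
    paths = get_file_paths(issue_type)  # helper not shown on the slide
    for p in paths:
        doc = read_file(p)  # helper not shown on the slide
        words = morphological_analysis(doc)
        trainings.append(TaggedDocument(words=words, tags=[p.stem]))
    return trainings
```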
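
A stdlib-only sketch of the same training-data shape, runnable without gensim or JUMAN++. It assumes, as the code above suggests, one text file per question with the file stem as the tag. The local `TaggedDocument` mirrors gensim's (a namedtuple with `words` and `tags`), `str.split` stands in for the JUMAN++ call, and `build_trainings` and the file names are hypothetical.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

# Mirrors gensim's TaggedDocument (a namedtuple with `words` and `tags`),
# defined locally so this sketch runs without gensim installed.
TaggedDocument = namedtuple("TaggedDocument", "words tags")


def build_trainings(paths, tokenize):
    """One TaggedDocument per file, tagged with the file name minus its extension."""
    trainings = []
    for p in paths:
        doc = p.read_text(encoding="utf-8")
        trainings.append(TaggedDocument(words=tokenize(doc), tags=[p.stem]))
    return trainings


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files: one already-OCR'd question per text file.
    for name, body in [("q01", "a b c"), ("q02", "b c d")]:
        (Path(tmp) / f"{name}.txt").write_text(body, encoding="utf-8")
    paths = sorted(Path(tmp).glob("*.txt"))
    # str.split stands in for morphological analysis.
    trainings = build_trainings(paths, str.split)

for t in trainings:
    print(t)
```

Using the file stem as the tag is what later lets a similarity lookup answer "which past question is this one closest to?" by name.
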
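  22. Training with gensim's Doc2Vec • doc2vec_runner.py

```python
from gensim.models.doc2vec import Doc2Vec

# all_trainings (list of TaggedDocument) and EPOCHS are defined elsewhere
model = Doc2Vec(
    documents=all_trainings,
    dm=1,           # use the PV-DM model
    min_count=1,    # ignore words whose total frequency is lower than this
    workers=4,      # number of worker threads
    epochs=EPOCHS,  # number of epochs <= this turned out to be the important one
)
```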
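
The `min_count` comment is easiest to see on a toy corpus. This sketch mimics only the vocabulary pruning: gensim ignores words whose total frequency across the corpus is lower than `min_count`, so `min_count=1`, as above, keeps every word. `apply_min_count` is a hypothetical helper and the tokens are made up.

```python
from collections import Counter


def apply_min_count(documents, min_count):
    """Drop words whose total corpus frequency is below min_count."""
    freq = Counter(word for doc in documents for word in doc)
    return [[w for w in doc if freq[w] >= min_count] for doc in documents]


docs = [["IPA", "試験", "午前"], ["試験", "午前", "応用"], ["午前", "高度"]]
print(apply_min_count(docs, 1))  # nothing removed
print(apply_min_count(docs, 2))  # words seen only once are removed
```

With a small corpus such as one exam's worth of questions, a higher `min_count` can throw away many rare but informative terms, which is one reason to keep it low.
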
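  23. Training with Doc2Vec: how the number of epochs matters • Number of epochs ◦ How many times to train over the prepared training data • What matters is training for an appropriate number of epochs ◦ https://stackoverflow.com/questions/…/what-are-doc2vec-training-iterations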
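
Why the epoch count matters can be illustrated, by analogy, with the simplest possible SGD loop: each epoch is one pass over the same training data, and too few passes leave the parameters far from where they should be. This toy fits a single slope rather than Doc2Vec vectors, and it only shows the under-training side of choosing an "appropriate" number; `train_slope` and the data are hypothetical.

```python
def train_slope(data, epochs, lr=0.01):
    """Fit y = w * x with plain SGD; one epoch = one pass over every sample."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            error = w * x - y
            w -= lr * error * x  # gradient of 0.5 * error**2 with respect to w
    return w


# Toy data generated from y = 2x.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
for epochs in (1, 5, 50):
    print(epochs, round(train_slope(data, epochs), 4))
```

After one epoch the slope is still far from 2; after fifty it is very close. Too many epochs on a small corpus can instead overfit, which this toy cannot show.
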
  24. Training with Doc2Vec: how the number of epochs matters • Difference in results (left: … epochs, right: … epochs)

  25. Displaying the training results • similarity_runner.py ◦ Output the single most similar question: similarity

  26. Displaying the training results • similarity_runner.py ◦ Top-n most similar questions: most_similar
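
Conceptually, `similarity` compares two document vectors and `most_similar` ranks all other documents by that score. A hand-rolled sketch of the ranking using cosine similarity and `heapq.nlargest`, on made-up vectors with hypothetical question tags; gensim's own implementation works on normalized vectors in bulk and differs in detail.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(target, vectors, topn=3):
    """Return the topn (tag, similarity) pairs closest to `target`, best first."""
    scores = ((tag, cosine(vectors[target], vec))
              for tag, vec in vectors.items() if tag != target)
    return heapq.nlargest(topn, scores, key=lambda pair: pair[1])


# Made-up document vectors keyed by hypothetical question tags.
vectors = {
    "q01": [0.9, 0.1, 0.0],
    "q02": [0.8, 0.2, 0.1],
    "q03": [0.1, 0.9, 0.2],
    "q04": [0.0, 0.1, 0.9],
}
print(most_similar("q01", vectors, topn=1))  # the single best match
print(most_similar("q01", vectors, topn=2))  # a top-n list
```
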

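  27. Summary • The accuracy was surprisingly decent ◦ I got the data I wanted • By combining tools, it looks like you can do machine learning for yourself ◦ It is fun • Building this was so much fun that I have not studied for the exam...
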
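  28. Announcements • NSEG ◦ … (Sat.) from …: study session on OSS licenses ▪ Speaker: Yutaka Kachi (可知豊), technical writer / IT writer ▪ Venue: Geek Lab Nagano (ギークラボ長野) ▪ https://nseg.connpass.com/event/… • IoTLT ◦ … (Sat.) from …: IoTLT長野 vol. …, the Nagano edition of the IoT-only study session ▪ Venue: Geek Lab Nagano ▪ Part 1, exhibition: https://iotlt.connpass.com/event/… ▪ Part 2, LT session: https://iotlt.connpass.com/event/…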