Slides for my LT (lightning talk) at GDG Shinshu's "Recap of TensorFlow Dev Summit 2018 in Shinshu", held on 2018/4/13.
I Tried Machine Learning for My Own Purposes
@thinkAmi
Recap of TensorFlow Dev Summit 2018 in Shinshu LT
About me
● thinkAmi
  ○ Hatena / Twitter / GitHub
● Hobby: going around eating apples
  ○ Around … varieties
● Work: Python
  ○ DjangoCongress
● GDG Shinshu, NSEG
● Geek Lab (ギークラボ)
http://ringo-tabetter.herokuapp.com/hc/total
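I tried machine learning for my own purposes
  ○ I went through a lot of trial and error to get the results I wanted
● The slides will be published afterwards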
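Question: what day is it?
Answer: the IPA information processing exams
The morning section of the advanced exams reportedly contains the same questions as the morning section of the Applied Information Technology Engineer exam held in the same session
Since I had the chance, I checked this with machine learning. That is what this LT is about.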
Questions
● Who has done machine learning, especially natural language processing?
● Who has written Python?
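How I went about it
● Get the past exam questions
● Run OCR
● Preprocess the OCR results to clean up the data
● Natural language processing with machine learning
● Display the training results
Source code (Python) is here: https://github.com/thinkAmi…doc2vec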
Getting the past exam questions
● From here
  ○ IPA (Information-technology Promotion Agency, Japan): past exam questions
    ■ https://…mondai.html
● The PDFs do not appear to have been OCR-processed
  ○ Painful
OCR
● In Python, pyocr + Tesseract looks like it would work
● That said, this is GDG Shinshu
  ○ GDG = Google Developers Groups
● So I have to use Google!
Google options that look capable of OCR
● Google Cloud Vision API
● Google Drive's OCR
  ○ This is the one I use this time
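Can the Google Drive API do OCR?
● There is a parameter for it
  ○ ocrLanguage on POST https://www.googleapis.com/upload/drive/v3/files
    ■ https://developers.google.com/drive/v3/reference/files/create
● I tried writing it in Python
  ○ OCR worked
  ○ …_uploader.py
● It looks like this

    from googleapiclient.http import MediaFileUpload

    # Upload the PDF; ocrLanguage asks Drive to OCR it as Japanese.
    media_body = MediaFileUpload(f, mimetype=MIME_TYPE, resumable=True)
    body = {
        'name': f.name,
        'mimeType': MIME_TYPE,
        'parents': [...],  # parent folder ID(s)
    }
    r = service.files().create(
        body=body,
        media_body=media_body,
        ocrLanguage='ja',
    ).execute()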
OCR example
Left: original. Right: Google Drive OCR.
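About OCR accuracy
● This time I am not going to be picky about OCR accuracy
  ○ If two questions are the same, they should be converted to text with the same accuracy
    ■ If one copy is misread, the other copy should be misread in the same way
● For now, it is enough that OCR runs at all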
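Preprocessing the data
● Looking at the data after OCR, I noticed it was hard to work with as-is
  ○ Lots of line breaks
  ○ Only 「…」 is OCR'd as 「…」
    ■ Even though the others look fine
● That said, I do not put much effort into this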
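A light-touch cleanup of this kind might look like the following sketch. It assumes the OCR result is already a plain string. The function name clean_ocr_text, the joiner argument, and the sample replacement pair are made up for illustration and are not taken from the talk's repository.

```python
import re


def clean_ocr_text(text, replacements=None, joiner=" "):
    """Apply simple string fixes, then collapse OCR line breaks.

    `replacements` is a hypothetical mapping from a misrecognized string
    to its intended form; the real pairs depend on the OCR output.
    `joiner` replaces every run of whitespace (newlines included).
    """
    for wrong, right in (replacements or {}).items():
        text = text.replace(wrong, right)
    return re.sub(r"\s+", joiner, text.strip())


# Invented sample: "Ql" stands in for a misread "Q1".
sample = "Ql  Which of the\nfollowing is\n\ncorrect?\n"
print(clean_ocr_text(sample, {"Ql": "Q1"}))
```

For Japanese text, which has no spaces between words, passing joiner="" is probably the more natural choice.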
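Natural language processing with machine learning
● There are many approaches to NLP with machine learning
● This time I chose Doc2Vec
  ○ It looked like it would make it easy to find similarity between texts
● What is Doc2Vec?
  ○ It turns a text into a vector
    ■ Once vectorized, texts become easier for a computer to handle
    ■ Classification and so on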
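To make "easier for a computer to handle" concrete: once two texts are vectors, their similarity can be computed with ordinary arithmetic. The sketch below uses cosine similarity, a common choice for comparing document vectors. The three vectors are invented stand-ins, not output from a trained Doc2Vec model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for document vectors.
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]  # same direction as doc_a
doc_c = [0.0, 0.0, 3.0]  # orthogonal to doc_a

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

Vectors pointing the same way score close to 1, unrelated ones close to 0.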
Doc2Vec models
● TensorFlow
  ○ There was an implementation along those lines
    ■ https://github.com/PacktPublishing/TensorFlow-Machine-Learning-Cookbook/blob/master/Chapter…/doc2vec.py
  ○ I do not know enough to implement it on my own
Doc2Vec models
● Gensim
  ○ "Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community."
    ■ https://github.com/RaRe-Technologies/gensim
  ○ A Doc2Vec model is provided
    ■ https://radimrehurek.com/gensim/models/doc2vec.html
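What Doc2Vec needs
● The documents have to go through morphological analysis
  ○ Morphological analysis: breaking a document down into the smallest units that carry meaning
  ○ お待ちしております → お待ち / し / て / おり / ます (Wikipedia)
● Libraries
  ○ MeCab, janome, JUMAN++, and others
    ■ This time I used JUMAN++, which I already had installed
● Installing JUMAN++
  ○ On a Mac, install it with Homebrew
    ■ https://chezou.hatenablog.com/entry/…
● To use it from Python, install PyKNP
  ○ http://nlp.ist.i.kyoto-u.ac.jp/index.php?PyKNP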
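Morphological analysis with JUMAN++ and preparing the training data
● doc2vec_runner.py

    from gensim.models.doc2vec import TaggedDocument
    from pyknp import Jumanpp


    def morphological_analysis(doc):
        # Split the document into morphemes and return their surface forms.
        r = Jumanpp().analysis(doc)
        return [mrph.midasi for mrph in r.mrph_list()]


    def get_trainings(issue_type):
        # Build one TaggedDocument per file, tagged with the file name stem.
        trainings = []
        paths = get_file_paths(issue_type)
        for p in paths:
            doc = read_file(p)
            words = morphological_analysis(doc)
            trainings.append(TaggedDocument(words=words, tags=[p.stem]))
        return trainings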
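Training with gensim's Doc2Vec
● doc2vec_runner.py

    from gensim.models.doc2vec import Doc2Vec

    model = Doc2Vec(
        documents=all_trainings,
        dm=1,           # use the PV-DM model
        min_count=1,    # ignore words that occur fewer times than this
        workers=4,      # number of worker threads
        epochs=EPOCHS,  # number of epochs <= this turned out to matter
    )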
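Training with Doc2Vec: the effect of epochs
● Epoch
  ○ How many times the model is trained over the prepared training data
● What matters is training for an appropriate number of epochs
  ○ https://stackoverflow.com/questions/…/what-are-doc2vec-training-iterations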
Training with Doc2Vec: the effect of epochs
● Difference in results
Left: epochs …. Right: epochs ….
Displaying the training results
● …
  ○ Show the single most similar item (similarity)
Displaying the training results
● …
  ○ The top n most similar items (…)
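The "top n most similar" lookup can be illustrated without a trained model. The sketch below ranks a handful of invented vectors by cosine similarity to a query vector. The function name top_n_similar and the tags are hypothetical. This is not gensim's implementation, only the idea behind such a lookup.

```python
import math


def top_n_similar(target, vectors, n=3):
    """Rank the tags in `vectors` by cosine similarity to `target`."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(tag, cosine(target, vec)) for tag, vec in vectors.items()]
    # Highest similarity first, then keep only the first n entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Made-up 2-dimensional vectors keyed by hypothetical question tags.
vectors = {
    "exam_a_q1": [0.9, 0.1],
    "exam_a_q2": [0.1, 0.9],
    "exam_a_q3": [0.7, 0.7],
}
query = [1.0, 0.0]
for tag, score in top_n_similar(query, vectors, n=2):
    print(tag, round(score, 3))
```

Taking n=1 gives the single most similar item, and a larger n gives the ranked list.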
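Summary
● The accuracy turned out better than I expected
  ○ I got the data I wanted
● By combining tools, machine learning for your own purposes looks doable
  ○ Fun
● Building this was so much fun that I have not studied for the exam at all...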
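Announcements
● NSEG
  ○ "OSS License" study session
    ■ Speaker: Yutaka Kachi, technical writer / IT writer
    ■ Venue: Geek Lab (ギークラボ)
    ■ https://nseg.connpass.com/event/…
● IoTLT
  ○ The … edition of the IoT-themed study session IoTLT, vol. …
    ■ Venue: Geek Lab (ギークラボ)
    ■ Part 1, exhibits: https://iotlt.connpass.com/event/…
    ■ Part 2, LT session: https://iotlt.connpass.com/event/…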