A Story About Trying Machine Learning for Myself

Lightning talk (LT) slides from GDG Shinshu's "Recap of TensorFlow Dev Summit 2018 in Shinshu", held on April 13, 2018.

thinkAmi

April 13, 2018

Transcript

  1. Self-introduction
    • thinkAmi
      ◦ Hatena / Twitter / GitHub
    • Hobby: eating my way through apple varieties
      ◦ About … varieties so far
    • Work: Python
      ◦ DjangoCongress
    • GDG Shinshu, NSEG
    • Geek Lab Nagano
    http://ringo-tabetter.herokuapp.com/hc/total
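  2. The process I followed
    1. Obtain past exam questions
    2. Run OCR on them
    3. Preprocess the OCR output to clean up the data
    4. Apply natural language processing with machine learning
    5. Display the training results
    Source code (Python): https://github.com/thinkAmi/ipa_issues_model_by_doc2vec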
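The preprocessing step in the workflow above (cleaning up OCR output before training) might look roughly like the following. This is a minimal sketch, not the author's code: the function name `clean_ocr_text` and the specific rules (NFKC normalization, removing whitespace between Japanese characters, dropping empty lines) are assumptions chosen for illustration.

```python
import re
import unicodedata

# Hypothetical helper: the rules below are illustrative assumptions,
# not the preprocessing used in the original talk.
_JA = r"\u3040-\u30ff\u4e00-\u9fff"  # hiragana, katakana, common kanji
_GAP_BETWEEN_JA = re.compile(rf"(?<=[{_JA}])[ \t]+(?=[{_JA}])")


def clean_ocr_text(raw):
    """Normalize OCR output so it is easier to tokenize."""
    # Fold full-width ASCII, full-width spaces, and half-width katakana
    # into their canonical forms.
    text = unicodedata.normalize("NFKC", raw)
    cleaned_lines = []
    for line in text.splitlines():
        # Remove spaces or tabs that sit between two Japanese characters.
        line = _GAP_BETWEEN_JA.sub("", line).strip()
        if line:  # drop lines that are empty after cleanup
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)
```

Spaces between ASCII words are left alone, so English terms embedded in the text keep their word boundaries.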
  3. Can the Google Drive API do OCR?
    • There is a parameter for it
      ◦ ocrLanguage on POST https://www.googleapis.com/upload/drive/v3/files
        ▪ https://developers.google.com/drive/v3/reference/files/create
    • Tried writing it in Python
      ◦ OCR worked
      ◦ google_drive_uploader.py
    • It looks like this:

        media_body = MediaFileUpload(f, mimetype=MIME_TYPE, resumable=True)
        body = {
            'name': f.name,
            'mimeType': MIME_TYPE,
            'parents': [directory_id],
        }
        r = service.files().create(
            body=body,
            media_body=media_body,
            ocrLanguage='ja',
        ).execute()
  4. The Doc2Vec model
    • Gensim
      ◦ "Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community."
        ▪ https://github.com/RaRe-Technologies/gensim
      ◦ It provides a Doc2Vec model
        ▪ https://radimrehurek.com/gensim/models/doc2vec.html
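  5. What Doc2Vec needs
    • Documents must first go through morphological analysis
      ◦ Morphological analysis: splitting a document into the smallest units that carry meaning
      ◦ お待ちしております → お待ち し て おり ます (Wikipedia)
    • Libraries
      ◦ MeCab, janome, JUMAN++, and others
        ▪ This time I used JUMAN++, which I already had on hand
    • Installing JUMAN++
      ◦ On a Mac, install it with Homebrew
        ▪ https://chezou.hatenablog.com/entry/…
    • To use it from Python, install PyKNP
      ◦ http://nlp.ist.i.kyoto-u.ac.jp/index.php?PyKNP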
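To make the idea of segmentation concrete, here is a toy longest-match splitter. It is not how JUMAN++, MeCab, or janome work internally, and it is not part of the original talk: the tiny `TOY_DICTIONARY` is an assumption built only to reproduce the Wikipedia example above.

```python
from typing import List

# Toy illustration only: a greedy longest-match splitter over a
# hand-made dictionary. Real morphological analyzers also score
# competing segmentations and attach part-of-speech information.
TOY_DICTIONARY = {"お待ち", "し", "て", "おり", "ます"}
MAX_WORD_LEN = max(len(word) for word in TOY_DICTIONARY)


def toy_segment(text: str) -> List[str]:
    """Split text into dictionary words, preferring the longest match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in TOY_DICTIONARY:
                tokens.append(candidate)
                i += size
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens
```

Calling `toy_segment("お待ちしております")` splits the sentence into the same five units shown in the slide.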
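  6. Morphological analysis with JUMAN++ and preparing the training data
    • doc2vec_runner.py

        from gensim.models.doc2vec import TaggedDocument
        from pyknp import Jumanpp

        def morphological_analysis(doc):
            # Return the surface form (midasi) of every morpheme
            r = Jumanpp().analysis(doc)
            return [mrph.midasi for mrph in r.mrph_list()]

        # get_file_paths() and read_file() are helpers not shown on the slide
        def get_trainings(issue_type):
            trainings = []
            paths = get_file_paths(issue_type)
            for p in paths:
                doc = read_file(p)
                words = morphological_analysis(doc)
                # Tag each document with its file name (without extension)
                trainings.append(TaggedDocument(words=words, tags=[p.stem]))
            return trainings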
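The shape of the training data built by `get_trainings` can be tried without JUMAN++ or gensim installed. The sketch below makes two assumptions that differ from the slide: a whitespace split stands in for morphological analysis, and a local namedtuple with `words` and `tags` fields stands in for gensim's `TaggedDocument`, which has the same two fields. The file names and contents are made up.

```python
from collections import namedtuple
from pathlib import Path

# Stand-in for gensim.models.doc2vec.TaggedDocument (same two fields).
TaggedDocument = namedtuple("TaggedDocument", "words tags")


def fake_analysis(doc):
    # Assumption: a whitespace split replaces JUMAN++ for this demo.
    return doc.split()


def build_trainings(docs_by_path):
    """Turn {path: text} into one tagged document per file."""
    trainings = []
    for path, doc in docs_by_path.items():
        words = fake_analysis(doc)
        # As on the slide, the file stem becomes the document's tag.
        trainings.append(TaggedDocument(words=words, tags=[Path(path).stem]))
    return trainings


samples = {  # hypothetical file names and contents
    "exams/2017_spring.txt": "network security basics",
    "exams/2017_autumn.txt": "database design basics",
}
trainings = build_trainings(samples)
```

Each entry pairs a token list with a one-element tag list, which is the input format Doc2Vec expects for its `documents` argument.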
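  7. Training with gensim's Doc2Vec
    • doc2vec_runner.py

        from gensim.models.doc2vec import Doc2Vec

        model = Doc2Vec(
            documents=all_trainings,
            dm=1,           # use the PV-DM model
            min_count=1,    # ignore words that occur fewer times than this
            workers=4,      # number of worker threads
            epochs=EPOCHS,  # number of epochs <= this turned out to be the important one
        )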
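  8. Announcements (CM)
    • NSEG
      ◦ … (Sat) …: "OSS Licenses" study session
        ▪ Speaker: Yutaka Kachi (可知豊), technical writer / IT writer
        ▪ Venue: Geek Lab Nagano
        ▪ https://nseg.connpass.com/event/…
    • IoTLT
      ◦ … (Sat) …: the Nagano edition of the IoT-only study session, IoTLT Nagano vol. …
        ▪ Venue: Geek Lab Nagano
        ▪ Part … (exhibition): https://iotlt.connpass.com/event/…
        ▪ Part … (LT session): https://iotlt.connpass.com/event/…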