
A Story About Doing Machine Learning for Myself


Lightning talk (LT) slides from GDG Shinshu's "Recap of TensorFlow Dev Summit 2018 in 信州", held on 2018/4/13.

thinkAmi

April 13, 2018


Transcript

  1. A Story About Doing Machine Learning for Myself
    @thinkAmi
    Recap of TensorFlow Dev Summit 2018 in 信州 (LT)

  2. About me
    ● thinkAmi
    ○ Hatena / Twitter / GitHub
    ● Hobby: eating my way through apple varieties
    ○ About […] varieties
    ● Work: Python
    ○ DjangoCongress
    ● GDG Shinshu, NSEG
    ● Geek Lab Nagano
    http://ringo-tabetter.herokuapp.com/hc/total

  3. A Story About Doing Machine Learning for Myself
    ○ A story of trial and error to get the results I wanted
    ● The slides will be published afterwards

  4. Question

    What day is […]?

  5. Answer
    IPA's Information Technology Engineers Examination

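  6. Apparently, the morning section of the advanced exams uses the same questions as the morning section of the Applied Information Technology Engineer Examination held on the same day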

  7. So, while I was at it, I checked this with machine learning, and that is this LT

  8. Questions
    ● Who has done machine learning, especially natural language processing?
    ● Who has written Python?

  9. How I went about it
    Get the past exam papers
    Run OCR
    Preprocess the OCR output to clean up the data
    Natural language processing with machine learning
    Show the training results
    Source code (Python):
    https://github.com/thinkAmi/[…]doc2vec

  10. Getting the past exam papers
    ● From here
    ○ IPA (Information-technology Promotion Agency, Japan): past exam questions
    […]mondai.html
    ● PDFs that do not appear to have been OCR-processed
    ○ Painful

  11. OCR
    ● In Python, pyocr with Tesseract looks like it would work
    ● But this is GDG Shinshu
    ○ GDG = Google Developers Groups
    ● So I have to use Google!

  12. Google services that look usable for OCR
    ● Google Cloud Vision API
    ● Google Drive's OCR
    ○ This is the one I use here

  13. Can the Google Drive API do OCR?
    ● There is a parameter for it
    ○ ocrLanguage on POST https://www.googleapis.com/upload/drive/v3/files
    ■ https://developers.google.com/drive/v3/reference/files/create
    ● I wrote it in Python
    ○ OCR worked
    […]uploader.py
    ● Roughly like this
    media_body = MediaFileUpload(f, mimetype=MIME_TYPE, resumable=True)

    body = {'name': f.name, 'mimeType': MIME_TYPE, 'parents': [...]}
    r = service.files().create(
        body=body, media_body=media_body, ocrLanguage='ja',
    ).execute()

  14. OCR example (left: original; right: Google Drive OCR)

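  15. On OCR accuracy
    ● I am not fussing over OCR accuracy this time
    ○ The same question should be converted to text with the same accuracy
    ■ If one copy is misrecognized, the other copy should be misrecognized in the same way
    ● For now, it is enough that OCR runs at all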

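  16. Preprocessing the data
    ● Looking at the OCR output, I realized it is hard to work with as-is
    ○ Lots of line breaks
    ○ Only 「問[…]」 was OCR'd as 「問[…]」
    ■ Even though the other questions look fine
    ● That said, I am not putting much effort into this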
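
The slide does not show the cleanup code itself. As a minimal sketch of the "lots of line breaks" problem, assuming the OCR result is Japanese text where whitespace carries no meaning, the lines can simply be joined back together. The function name and the sample string are made up for illustration, and dropping every space would not suit text that mixes in English phrases.

```python
import re


def normalize_ocr_text(text: str) -> str:
    """Join OCR output into one line and drop stray whitespace."""
    # Japanese has no spaces between words, so lines are joined
    # with an empty string rather than a space.
    lines = [line.strip() for line in text.splitlines()]
    joined = "".join(line for line in lines if line)
    # Remove any whitespace left inside lines (half-width, full-width, tabs).
    return re.sub(r"\s+", "", joined)


# Made-up sample imitating OCR output with scattered line breaks.
sample = "問1 次の記述\n\nのうち,\n 正しいもの　はどれか。\n"
print(normalize_ocr_text(sample))
```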

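  17. Natural language processing with machine learning
    ● There are many approaches to NLP with machine learning
    ● This time I chose Doc2Vec
    ○ It looked like an easy way to find similarities between texts
    ● What is Doc2Vec?
    ○ It turns texts into vectors
    ■ Once vectorized, text is easier for a computer to handle
    ■ Classification and so on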
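
The point that vectorized text is easier for a computer to handle can be illustrated with something far simpler than Doc2Vec. The sketch below is a plain bag-of-words count, not Doc2Vec and not code from the talk; the token lists and function name are hypothetical. Doc2Vec learns dense vectors instead of counting words, but the outcome is similar in spirit: each document becomes a fixed-length list of numbers.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical, already-tokenized documents.
doc_a = ["sort", "the", "data"]
doc_b = ["compare", "the", "data", "with", "the", "data"]

# Shared vocabulary: ['compare', 'data', 'sort', 'the', 'with']
vocabulary = sorted(set(doc_a) | set(doc_b))

vec_a = bag_of_words(doc_a, vocabulary)  # [0, 1, 1, 1, 0]
vec_b = bag_of_words(doc_b, vocabulary)  # [1, 2, 0, 2, 1]
print(vocabulary, vec_a, vec_b)
```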

  18. Doc2Vec models
    ● TensorFlow
    ○ There is an implementation along these lines
    ■ https://github.com/PacktPublishing/TensorFlow-Machine-Learning-Cookbook/blob/master/Chapter[…]/doc2vec.py
    ○ I do not know enough to implement it myself

  19. Doc2Vec models
    ● Gensim
    ○ "Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community."
    ■ https://github.com/RaRe-Technologies/gensim
    ○ It provides a Doc2Vec model
    ■ https://radimrehurek.com/gensim/models/doc2vec.html

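  20. To run Doc2Vec
    ● The documents need morphological analysis first
    ○ Morphological analysis: breaking a document down into the smallest units that carry meaning
    ○ お待ちしております → お待ち / し / て / おり / ます (Wikipedia)

    ● Libraries
    ○ MeCab, janome, JUMAN++ and others
    ■ This time I used JUMAN++, which I already had on hand
    ● Installing JUMAN++
    ○ On a Mac, install with Homebrew
    ■ https://chezou.hatenablog.com/entry/[…]
    ● To use it from Python, install PyKNP
    ○ http://nlp.ist.i.kyoto-u.ac.jp/index.php?PyKNP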

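  21. Morphological analysis with JUMAN++ and preparing the training data
    ● doc2vec_runner.py
    from gensim.models.doc2vec import TaggedDocument
    from pyknp import Jumanpp

    def morphological_analysis(doc):
        # Split the text into morphemes and keep each surface form (midasi)
        r = Jumanpp().analysis(doc)
        return [mrph.midasi for mrph in r.mrph_list()]

    def get_trainings(issue_type):
        # get_file_paths() and read_file() are helpers not shown on this slide
        trainings = []
        paths = get_file_paths(issue_type)
        for p in paths:
            doc = read_file(p)
            words = morphological_analysis(doc)
            # Tag each document with its file name, without the extension
            trainings.append(TaggedDocument(words=words, tags=[p.stem]))
        return trainings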
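
The slide calls `get_file_paths()` and `read_file()` without showing them. The following is only a guess at what such helpers might look like, assuming one OCR'd text file per question stored under a directory named after the exam type; `BASE_DIR`, the `base_dir` parameter and the directory layout are all hypothetical. Returning `pathlib.Path` objects matches the slide's use of `p.stem` as the document tag.

```python
from pathlib import Path

# Hypothetical layout: texts/<issue_type>/<question>.txt
BASE_DIR = Path("texts")


def get_file_paths(issue_type, base_dir=BASE_DIR):
    """Return the text files for one exam type, sorted for a stable order."""
    return sorted((Path(base_dir) / issue_type).glob("*.txt"))


def read_file(path):
    """Read one OCR result as UTF-8 text."""
    return Path(path).read_text(encoding="utf-8")
```

With this layout, `get_file_paths("ap")` would return paths such as `texts/ap/q01.txt`, whose stem `q01` becomes the tag of the corresponding `TaggedDocument`.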

  22. Training with gensim's Doc2Vec
    ● doc2vec_runner.py
    model = Doc2Vec(
        documents=all_trainings,
        dm=1,           # use the PV-DM model
        min_count=1,    # ignore words that occur fewer times than this
        workers=4,      # number of worker threads
        epochs=EPOCHS,  # number of epochs <= this turned out to be important
    )

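  23. Training with Doc2Vec: the effect of the epoch count
    ● Number of epochs (epoch)
    ○ How many times to train over the prepared training data
    ● What matters is training for an appropriate number of epochs
    ○ https://stackoverflow.com/questions/[…]/what-are-doc2vec-training-iterations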

  24. Training with Doc2Vec: the effect of the epoch count
    ● Difference in results (left: […] epochs; right: […] epochs)

  25. Showing the training results
    ● […]
    ○ Output the single most similar document (similarity)

  26. Showing the training results
    ● […]
    ○ The top n most similar documents ([…])
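
The code on these two slides did not survive extraction. gensim offers built-in similarity queries on a trained model, so the sketch below is not the talk's code; it only illustrates the idea behind "most similar" and "top n": score every document vector against a query vector with cosine similarity, then sort. The vectors are made-up numbers rather than model output, and the function names and tags are hypothetical. Zero vectors are not handled.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_n_similar(query, vectors, n=3):
    """Return the n (tag, score) pairs most similar to the query, best first."""
    scores = [(tag, cosine_similarity(query, vec)) for tag, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up document vectors keyed by tag.
vectors = {
    "ap_q01": [1.0, 0.0, 0.0],
    "ap_q02": [0.9, 0.1, 0.0],
    "ap_q03": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.0]

result = top_n_similar(query, vectors, n=2)
print(result)
```

Taking the first element of the result gives the single most similar document; a larger `n` gives the ranked list.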

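  27. Summary
    ● Accuracy turned out surprisingly decent
    ○ I got the data I wanted
    ● By combining tools, machine learning for my own purposes looks doable
    ○ Fun
    ● Building this was so much fun that I have not prepared for the exam...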

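  28. Announcements
    ● NSEG
    ○ […] (Sat)
    From […]: "OSS License" study session
    ■ Speaker: Yutaka Kachi (可知豊), technical writer / IT writer
    ■ Venue: Geek Lab Nagano
    ■ https://nseg.connpass.com/event/[…]
    ● IoTLT
    ○ […] (Sat)
    From […]: the Nagano edition of the IoT-only study session, IoTLT Nagano vol. […]
    ■ Venue: Geek Lab Nagano
    ■ Part […] (exhibition): https://iotlt.connpass.com/event/[…]
    ■ Part […] (LT session): https://iotlt.connpass.com/event/[…]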