
Exploring automated customer-support (CS) replies with a recurrent neural network


I wanted a system that could reply to customer-support questions automatically, so I built a prototype.

Yasunori Tanaka

March 31, 2018


Transcript

  1. About Me • Tanaka Yasunori • @yanak174 •
     https://blog.codingecho.com • Affiliation: enish, inc. • Work: server-side engineer for smartphone game apps
  2. Contents 1. The current state of CS (customer support) 2. What we want to do 3. Dataset
     4. Preprocessing the question text 5. Training an LSTM 6. Summary
  3. The current state of CS • Questions fall into two groups: 1. questions that many users ask, and
     2. questions about problems that relatively few users run into • Questions that many users ask
     (hereafter, FAQ) are usually answered following a more or less fixed format
     • In smartphone games, typical FAQ topics include billing problems and lost accounts
     • Answering these FAQs requires collecting a standardized set of required details from the user
  4. Preprocessing the question text — remove symbols, e-mail addresses, digits, and similar characters

     import re

     filtered_text = []
     text = ["お時間を頂戴しております。version 1.2.3 ----------------------------------------"]

     for t in text:
         result = re.compile('-+').sub('', t)
         result = re.compile('[0-9]+').sub('0', result)
         result = re.compile(r'\s+').sub('', result)
         # ... several more substitutions like these are chained here
         # A question can end up as an empty string, so skip those lines
         if len(result) > 0:
             filtered_text.append(result)
             print("text:%s" % result)  # text:お時間を頂戴しております。
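The chained regex substitutions on the slide can be collected into one reusable helper. A minimal sketch (the `clean_text` name and the pattern list are my own; the further substitutions the slide says are chained on are omitted here):

```python
import re

# Precompiled patterns for the three substitutions shown on the slide
_PATTERNS = [
    (re.compile(r'-+'), ''),       # runs of dashes -> removed
    (re.compile(r'[0-9]+'), '0'),  # runs of digits -> a single 0
    (re.compile(r'\s+'), ''),      # whitespace -> removed
]

def clean_text(text):
    """Apply each substitution in order and return the cleaned string."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(clean_text("version 1.2.3 ----------"))  # version0.0.0
```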
  5. Create samples and labels

     labels = []
     samples = []
     threshold = 700  # cap each class to keep the dataset balanced
     cnt1 = 0
     cnt2 = 0
     cnt3 = 0
     for i, row in enumerate(filtered_samples):
         if 'Account' in row[2]:
             if cnt2 < threshold:
                 cnt2 += 1
                 labels.append(2)
                 samples.append(row[0])
         elif 'Payment' in row[2]:
             if cnt3 < threshold:
                 cnt3 += 1
                 labels.append(3)
                 samples.append(row[0])
         else:
             if cnt1 < threshold:
                 cnt1 += 1
                 labels.append(1)
                 samples.append(row[0])
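For illustration, the per-class counters and the per-class cap above can be folded into one dictionary. A sketch assuming, as on the slide, that `row[0]` holds the question text and `row[2]` the category string (`build_dataset` is a hypothetical name):

```python
def build_dataset(rows, threshold=700):
    """Label each row (2=Account, 3=Payment, 1=other) and cap each class at `threshold`."""
    labels, samples = [], []
    counts = {1: 0, 2: 0, 3: 0}  # samples kept so far, per label
    for row in rows:
        if 'Account' in row[2]:
            label = 2
        elif 'Payment' in row[2]:
            label = 3
        else:
            label = 1
        if counts[label] < threshold:
            counts[label] += 1
            labels.append(label)
            samples.append(row[0])
    return samples, labels
```

Usage would then be a single call, e.g. `samples, labels = build_dataset(filtered_samples)`.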
  6. Word segmentation (wakati-gaki) with MeCab

     import MeCab

     def tokenize(text):
         wakati = MeCab.Tagger("-Owakati")
         wakati.parse("")
         words = wakati.parse(text)
         # MeCab appends a trailing newline; drop it
         if words[-1] == u"\n":
             words = words[:-1]
         return words

     texts = [tokenize(a) for a in samples]

     長らくお時間を頂戴しております → 長らく お 時間 を 頂戴 し て おり ます
     ("We apologize for the long wait" split into space-separated words)
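The slide stops at whitespace-separated token strings, but the Embedding layer on the next slide needs fixed-length integer sequences (hence `maxlen`). In Keras this is usually done with `Tokenizer` and `pad_sequences`; below is a dependency-free sketch of the same idea (the function name and the left-padding scheme are my own simplification):

```python
def texts_to_sequences(texts, maxlen):
    """Map words to integer IDs (1-based; 0 is reserved for padding)
    and left-pad/truncate every sequence to `maxlen`."""
    word_index = {}
    sequences = []
    for text in texts:
        seq = []
        for word in text.split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
            seq.append(word_index[word])
        seq = seq[-maxlen:]  # keep at most the last maxlen tokens
        sequences.append([0] * (maxlen - len(seq)) + seq)
    return sequences, word_index
```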
  7. Training an LSTM

     from keras.models import Sequential
     from keras.layers import Dense, Embedding, LSTM

     model = Sequential()
     model.add(Embedding(15000, 100, input_length=maxlen))
     model.add(LSTM(32))
     # softmax so the four class probabilities sum to 1,
     # matching categorical_crossentropy
     model.add(Dense(4, activation='softmax'))
     model.compile(optimizer='adam',
                   loss='categorical_crossentropy',
                   metrics=['acc'])
     model.summary()

     The word embeddings are learned jointly with the classifier.
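One step the slide leaves implicit: `categorical_crossentropy` with a 4-unit output layer expects one-hot label vectors, not the integer labels 1–3 built earlier. Keras provides `keras.utils.to_categorical` for this; here is a stdlib-only equivalent for illustration:

```python
def to_one_hot(labels, num_classes=4):
    """Turn integer labels into one-hot rows of length num_classes."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)]
            for label in labels]

# e.g. label 2 ("Account") -> [0.0, 0.0, 1.0, 0.0]
```

Training would then look something like `model.fit(padded_sequences, to_one_hot(labels))`, where `padded_sequences` is an assumed name for the integer-encoded questions.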