
Text Classification with MeCab and Keras

masa-ita
February 23, 2019

Transcript

  1. Text Classification with MeCab and Keras     2019/2/23 @Python … in …

  2. 

  3. …
  4. … N-Gram …

    … e.g. MeCab
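Japanese text has no spaces between words, so it has to be split before words can be counted; the slide above names N-grams and morphological analysis with MeCab. As a minimal sketch (not taken from the deck), character N-grams need no dictionary at all: the text is cut into overlapping windows of `n` characters. The function name `char_ngrams` is hypothetical.

```python
from typing import List


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Split text into overlapping character n-grams; no dictionary needed."""
    # Drop whitespace so that spaces never become part of an n-gram
    chars = "".join(text.split())
    if len(chars) < n:
        # Text shorter than n: return it whole (or nothing if it is empty)
        return [chars] if chars else []
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


print(char_ngrams("テキスト分類", 2))  # ['テキ', 'キス', 'スト', 'ト分', '分類']
```

The cost of skipping the dictionary shows in the output: fragments such as ト分 carry no meaning, which is what dictionary-based segmentation with MeCab avoids.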
  5. …
  6.    

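  7. livedoor news corpus • Built from articles on "livedoor News", operated by NHN Japan, that are published under a Creative Commons license

    • HTML tags removed as far as possible • https://www.rondhuit.com/download.html#ldcc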
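The corpus is distributed as an archive from the URL above. The loader below is a sketch that assumes the extracted archive has one directory per category and one article per `.txt` file, with the URL on line 1, the timestamp on line 2, the title on line 3 and the body after that, plus a `LICENSE.txt` in each category; check this against your own copy. `load_livedoor` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Tuple


def load_livedoor(root: str) -> List[Tuple[str, str, str]]:
    """Return (category, title, body) tuples from an extracted livedoor corpus.

    Assumed layout: root/<category>/*.txt, one article per file, with the URL on
    line 1, the timestamp on line 2, the title on line 3 and the body after it.
    """
    samples = []
    # Each sub-directory is one category; plain files at the top level are skipped
    for category_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for path in sorted(category_dir.glob("*.txt")):
            if path.name == "LICENSE.txt":
                continue  # license notice, not an article
            lines = path.read_text(encoding="utf-8").splitlines()
            if len(lines) < 3:
                continue  # too short to be an article in the assumed format
            title = lines[2].strip()
            body = "\n".join(lines[3:]).strip()
            samples.append((category_dir.name, title, body))
    return samples
```

If the archive extracts to a `text` directory, `load_livedoor("text")` would yield tuples whose category names can serve directly as classification labels.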
  8. … • livedoor …

  9. MeCab

  10. … • MeCab: an open-source morphological analysis engine developed through a joint research unit project of the Kyoto University Graduate School of Informatics and NTT Communication Science Laboratories …

    … • … Google Inc. • …
  11. Installing MeCab • MeCab itself is written in C++ … •

    Windows: • Install from https://taku910.github.io/mecab/#download (32-bit build) • For 64-bit, install from https://github.com/ikegami-yukino/mecab/releases/tag/v0.996 • Mac: • Install mecab and mecab-ipadic with Homebrew • Ubuntu: • Install mecab and mecab-ipadic with apt
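Once MeCab and IPAdic are installed, running `mecab` on a sentence prints one token per line: the surface form, a tab, then comma-separated features (part of speech, three POS subcategories, conjugation type, conjugation form, base form, reading, pronunciation), and finally an `EOS` line. From Python, the mecab-python3 binding's `MeCab.Tagger().parse(text)` returns the same text. The sketch below parses that format and joins base forms with spaces so that a whitespace-splitting tokenizer can consume them; the function names and the choice to keep only nouns, verbs and adjectives are illustrative assumptions, not the deck's method. The sample is MeCab's well-known すもももももももものうち example.

```python
from typing import List, Tuple

# MeCab's default output with IPAdic for "すもももももももものうち":
# surface form, TAB, comma-separated features, and a closing "EOS" line.
SAMPLE = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ\n"
    "の\t助詞,連体化,*,*,*,*,の,ノ,ノ\n"
    "うち\t名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ\n"
    "EOS\n"
)


def parse_mecab(output: str) -> List[Tuple[str, str, str]]:
    """Return (surface, part_of_speech, base_form) for each token line."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # sentence terminator or blank line
        surface, _, feature_str = line.partition("\t")
        features = feature_str.split(",")
        # Field 7 (index 6) is the base form; unknown words may have "*" there
        base = features[6] if len(features) > 6 and features[6] != "*" else surface
        tokens.append((surface, features[0], base))
    return tokens


def to_wakati(output: str, keep_pos=("名詞", "動詞", "形容詞")) -> str:
    """Join the base forms of content words (nouns, verbs, adjectives) with spaces."""
    return " ".join(base for _, pos, base in parse_mecab(output) if pos in keep_pos)


print(to_wakati(SAMPLE))  # すもも もも もも うち
```

Using base forms rather than surface forms means conjugated verbs and adjectives collapse onto one vocabulary entry, which keeps the word index smaller.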
  12. Keras 

  13. keras.preprocessing.text.Tokenizer • … • …

     … • fit … tokenize …
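Keras's `Tokenizer` splits on spaces by default, which is why Japanese text is normally segmented with MeCab first. `fit_on_texts` builds `word_index`, giving index 1 to the most frequent word and never assigning 0, and `texts_to_sequences` maps each text to a list of those indices. The pure-Python class below is a hypothetical stand-in that mimics only that behavior; unlike the real class, it does not filter punctuation or lowercase.

```python
from collections import Counter
from typing import Dict, List, Optional


class SimpleTokenizer:
    """Hypothetical stand-in for keras.preprocessing.text.Tokenizer (not the Keras class)."""

    def __init__(self, num_words: Optional[int] = None):
        self.num_words = num_words  # if set, keep only indices < num_words
        self.word_index: Dict[str, int] = {}

    def fit_on_texts(self, texts: List[str]) -> None:
        counts = Counter()
        for text in texts:
            counts.update(text.split())  # whitespace-separated words
        # Most frequent word -> 1; ties keep first-seen order; 0 stays free for padding
        ranked = [word for word, _ in counts.most_common()]
        self.word_index = {word: i + 1 for i, word in enumerate(ranked)}

    def texts_to_sequences(self, texts: List[str]) -> List[List[int]]:
        sequences = []
        for text in texts:
            seq = []
            for word in text.split():
                i = self.word_index.get(word)
                # Unknown words are dropped, as are indices beyond num_words
                if i is not None and (self.num_words is None or i < self.num_words):
                    seq.append(i)
            sequences.append(seq)
        return sequences


tok = SimpleTokenizer()
tok.fit_on_texts(["猫 が 好き", "犬 が 好き", "猫 が 鳴く"])
print(tok.word_index)  # {'が': 1, '猫': 2, '好き': 3, '犬': 4, '鳴く': 5}
print(tok.texts_to_sequences(["猫 が 鳴く", "鳥 が 好き"]))  # [[2, 1, 5], [1, 3]]
```

With `num_words=4`, only indices 1 to 3 survive, mirroring how Keras keeps the `num_words - 1` most frequent words.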
  14. keras.preprocessing.sequence.pad_sequences • … • …

      • …
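Sequences of word indices come out with different lengths, while a model's input needs a fixed shape. `pad_sequences` handles this, and by default it pads and truncates at the front (`padding='pre'`, `truncating='pre'`) using the value 0. The sketch reproduces that logic in plain Python; `pad` is a hypothetical name, and the Keras function returns a NumPy array rather than lists.

```python
from typing import List, Optional


def pad(sequences: List[List[int]], maxlen: Optional[int] = None,
        padding: str = "pre", truncating: str = "pre",
        value: int = 0) -> List[List[int]]:
    """Make every sequence exactly maxlen long (default: the longest one)."""
    if maxlen is None:
        maxlen = max((len(s) for s in sequences), default=0)
    out = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' drops tokens from the start, 'post' from the end
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # 'pre' puts the padding before the tokens, 'post' after them
        out.append(fill + seq if padding == "pre" else seq + fill)
    return out


print(pad([[1, 2, 3], [4, 5]]))            # [[1, 2, 3], [0, 4, 5]]
print(pad([[1, 2, 3, 4], [5]], maxlen=3))  # [[2, 3, 4], [0, 0, 5]]
```

Because index 0 is never assigned to a word, the padding value cannot be confused with a real token.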
  15. 

  16. BoW: Bag of Words • …

    … • … • … • TF-IDF: Term Frequency-Inverse Document Frequency …
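Bag of Words and TF-IDF, both named on this slide, can be compared directly on word-index sequences. The sketch builds binary, count and TF-IDF vectors. The TF-IDF variant here is the textbook raw count × log(N / df), which is an assumption: Keras's `Tokenizer.texts_to_matrix` (modes `'binary'`, `'count'`, `'tfidf'`, `'freq'`) uses a differently smoothed formula. `bow_matrix` is a hypothetical name.

```python
import math
from typing import List


def bow_matrix(docs: List[List[int]], vocab_size: int,
               mode: str = "count") -> List[List[float]]:
    """Turn word-index sequences into Bag-of-Words vectors.

    mode: 'binary' (1 if the word occurs), 'count' (number of occurrences),
    or 'tfidf' (count * log(N / df), textbook form).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each index appears
    df = [0] * vocab_size
    for doc in docs:
        for i in set(doc):
            df[i] += 1
    rows = []
    for doc in docs:
        row = [0.0] * vocab_size
        for i in doc:
            row[i] += 1.0  # word order is discarded: only counts remain
        if mode == "binary":
            row = [1.0 if c > 0 else 0.0 for c in row]
        elif mode == "tfidf":
            row = [c * math.log(n_docs / df[i]) if c > 0 else 0.0
                   for i, c in enumerate(row)]
        elif mode != "count":
            raise ValueError(f"unknown mode: {mode}")
        rows.append(row)
    return rows


docs = [[1, 1, 2], [2, 3]]  # index 0 is unused (reserved for padding)
print(bow_matrix(docs, 4, "count"))   # [[0.0, 2.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
print(bow_matrix(docs, 4, "binary"))  # [[0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
```

Word 2 appears in both documents, so its TF-IDF weight is log(2/2) = 0: words common to every document are treated as uninformative.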
  17. Word Embedding • … 10,000 … 20,000 …

    … • … Word Embedding … • Google … Word2vec … • … • Word2vec …
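The slide mentions figures of 10,000 and 20,000, presumably vocabulary sizes: a one-hot vector over a 10,000-word vocabulary has 10,000 dimensions and a single 1. Multiplying that vector by a V × D embedding matrix just selects one row, so an embedding layer can be implemented as a table lookup. The sketch checks this equivalence with a random matrix; the sizes and seed are illustrative, and in a real model the rows are learned (or can be initialized from pre-trained Word2vec vectors).

```python
import random
from typing import List


def one_hot(index: int, vocab_size: int) -> List[float]:
    """Sparse representation: all zeros except a 1 at the word's index."""
    v = [0.0] * vocab_size
    v[index] = 1.0
    return v


def matmul_vec(vec: List[float], matrix: List[List[float]]) -> List[float]:
    """Row vector (length V) times matrix (V x D)."""
    dim = len(matrix[0])
    return [sum(vec[v] * matrix[v][d] for v in range(len(vec))) for d in range(dim)]


vocab_size, embed_dim = 10_000, 8  # illustrative sizes
rng = random.Random(0)
# Embedding matrix: one dense D-dimensional row per word index
E = [[rng.uniform(-0.05, 0.05) for _ in range(embed_dim)] for _ in range(vocab_size)]

word_id = 42
dense_via_one_hot = matmul_vec(one_hot(word_id, vocab_size), E)  # 10,000-dim input
dense_via_lookup = E[word_id]  # what an embedding layer does: a row lookup
print(dense_via_one_hot == dense_via_lookup)  # True
```

The lookup skips 10,000 × 8 multiplications by zero, which is why embedding layers take word indices rather than one-hot vectors as input.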
  18. RNN: Recurrent Neural Network • … • …

    … • … RNN … LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit)
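The core of an RNN is a hidden state carried from one time step to the next. The sketch implements the plain recurrence h_t = tanh(W x_t + U h_{t-1} + b) in pure Python and returns the last state, which is conceptually what Keras's `SimpleRNN` layer computes with its default `tanh` activation; LSTM and GRU replace this update with gated versions. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix (rows = output units) times vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def simple_rnn(xs: List[Vector], W: Matrix, U: Matrix, b: Vector) -> Vector:
    """Apply h_t = tanh(W x_t + U h_{t-1} + b) over the sequence; return the final h."""
    h = [0.0] * len(b)  # initial hidden state
    for x in xs:
        wx, uh = matvec(W, x), matvec(U, h)
        h = [math.tanh(a + c + bias) for a, c, bias in zip(wx, uh, b)]
    return h


W, U, b = [[1.0]], [[0.5]], [0.0]
# Same inputs in a different order give a different final state
print(simple_rnn([[1.0], [0.0]], W, U, b) != simple_rnn([[0.0], [1.0]], W, U, b))  # True
```

Unlike Bag of Words, the result depends on word order, which is the property the RNN-based models on the following slides rely on.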
  19. 

  20. BoW + DNN
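The layer sizes of the BoW + DNN model are not recoverable from this transcript. As an assumption, the sketch shows the forward pass of a small fully connected network: a BoW vector goes through a dense layer with ReLU, then a dense layer with softmax that outputs one probability per category. In Keras this would correspond to stacking `Dense` layers; the names and toy weights here are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Fully connected layer: W has one row of weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def bow_dnn(bow: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """BoW vector -> Dense + ReLU (hidden layer) -> Dense + softmax (class probabilities)."""
    hidden = relu(dense(bow, W1, b1))
    return softmax(dense(hidden, W2, b2))


# Toy example: 3-word vocabulary, 2 hidden units, 2 categories
probs = bow_dnn([1.0, 0.0, 2.0],
                W1=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], b1=[0.0, 0.0],
                W2=[[1.0, 0.0], [0.0, 1.0]], b2=[0.0, 0.0])
```

The predicted category is the index of the largest probability; training adjusts `W1`, `b1`, `W2` and `b2` to make the correct one large.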

    
  21. Word Embedding + GlobalAveragePooling1D
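In this model each word index is mapped to a dense vector, and the vectors are averaged over the time axis, which is what `GlobalAveragePooling1D` does, yielding one fixed-size vector per document regardless of its length. If the padding zeros were averaged in, short documents would be diluted; in Keras, `Embedding(..., mask_zero=True)` produces a mask that recent versions of `GlobalAveragePooling1D` take into account. The sketch imitates this with a `mask_zero` flag on a hypothetical helper, not the Keras API.

```python
from typing import List

Vector = List[float]


def embed_and_average(seq: List[int], E: List[Vector], mask_zero: bool = True) -> Vector:
    """Look up each index in E, then average over time (GlobalAveragePooling1D).

    With mask_zero=True, index 0 (the padding value) is left out of the average.
    """
    rows = [E[i] for i in seq if not (mask_zero and i == 0)]
    dim = len(E[0])
    if not rows:
        return [0.0] * dim  # nothing but padding
    return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]


E = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0]]  # toy 2-dim embeddings; row 0 = padding
print(embed_and_average([0, 0, 1, 2], E))                   # [2.0, 3.0]
print(embed_and_average([0, 0, 1, 2], E, mask_zero=False))  # [5.5, 6.0]
```

Averaging discards word order just as Bag of Words does, but it works on dense learned vectors instead of sparse counts.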

  22. Word Embedding + RNN (LSTM) + DNN
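An LSTM adds a cell state c and three gates to the plain RNN recurrence: the forget gate f decides how much of the old cell state to keep, the input gate i how much of the new candidate g to write, and the output gate o how much of the cell to expose as the hidden state h. The sketch implements these textbook equations for a single layer in pure Python. Keras's `LSTM` layer computes the same recurrence, though its default gate activation has differed between versions; the dict-of-`(W, U, b)` parameter layout is an assumption chosen for readability.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]  # gate name -> (W, U, b)


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """W x + U h + b, computed one output unit (row) at a time."""
    return [sum(w * xi for w, xi in zip(wr, x)) + sum(u * hi for u, hi in zip(ur, h)) + bi
            for wr, ur, bi in zip(W, U, b)]


def lstm(xs: List[Vector], params: Params) -> Tuple[Vector, Vector]:
    """Run an LSTM over xs and return the final (h, c)."""
    units = len(params["i"][2])
    h, c = [0.0] * units, [0.0] * units
    for x in xs:
        i = [sigmoid(a) for a in affine(*params["i"], x, h)]    # input gate
        f = [sigmoid(a) for a in affine(*params["f"], x, h)]    # forget gate
        o = [sigmoid(a) for a in affine(*params["o"], x, h)]    # output gate
        g = [math.tanh(a) for a in affine(*params["g"], x, h)]  # candidate values
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]  # update cell
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]               # expose output
    return h, c


# Toy parameters: 1 input feature, 1 unit, identical weights for every gate
p = {gate: ([[1.0]], [[0.0]], [0.0]) for gate in "ifog"}
h, c = lstm([[1.0]], p)
```

In the Keras model on this slide, the final `h` (one vector per document) would then be fed to the dense classification layers.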

  23. 

  24. … BoW + DNN • … 0.5 … BoW + … DNN … • … GlobalAveragePooling1D …

    … • LSTM … • … LSTM …
  25. • NLP … • … Sequence-to-Sequence … Attention … OpenAI … Google …

    … Transformer … ELMo from the Allen Institute, BERT from Google, GPT-2 from OpenAI …