Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MeCabとKerasを使ったテキスト分類
Search
masa-ita
February 23, 2019
Technology
1
500
MeCabとKerasを使ったテキスト分類
masa-ita
February 23, 2019
Tweet
Share
More Decks by masa-ita
See All by masa-ita
Ollamaを使ったLocal Language Model活用法
itagakim
1
160
Run Instant NeRF on Docker
itagakim
1
2.3k
3D Clustering and Metric Learning
itagakim
0
350
Cloud TPUの使い方〜BigBirdの日本語学習済みモデルを作る〜
itagakim
0
680
多言語学習済みモデルmT5とは?
itagakim
1
700
AWSのGPUを安く使ってTensorFlowモデルを訓練する方法
itagakim
0
370
最近の自然言語処理モデルの動向
itagakim
1
570
ディープラーニングで芸術はできるか?〜生成系ネットワークの進展〜
itagakim
0
340
AWSとTerraform初心者がやってみたこと
itagakim
1
480
Other Decks in Technology
See All in Technology
Introdução a Service Mesh usando o Istio
aeciopires
1
280
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
5
43k
CNCFの視点で捉えるPlatform Engineering - 最新動向と展望 / Platform Engineering from the CNCF Perspective
hhiroshell
0
130
AIフル活用で挑む!空間アプリ開発のリアル
taat
0
140
難しいセキュリティ用語をわかりやすくしてみた
yuta3110
0
360
Dylib Hijacking on macOS: Dead or Alive?
patrickwardle
0
440
ハノーファーメッセ2025で見た生成AI活用ユースケース.pdf
hamadakoji
0
390
「最速」で Gemini CLI を使いこなそう! 〜Cloud Shell/Cloud Run の活用〜 / The Fastest Way to Master the Gemini CLI — with Cloud Shell and Cloud Run
aoto
PRO
0
150
会社を支える Pythonという言語戦略 ~なぜPythonを主要言語にしているのか?~
curekoshimizu
1
500
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
2.8k
Linux カーネルが支えるコンテナの仕組み / LF Japan Community Days 2025 Osaka
tenforward
1
110
OpenTelemetry が拡げる Gemini CLI の可観測性
phaya72
2
1.7k
Featured
See All Featured
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
115
20k
Making the Leap to Tech Lead
cromwellryan
135
9.6k
Producing Creativity
orderedlist
PRO
347
40k
Facilitating Awesome Meetings
lara
57
6.6k
Code Review Best Practice
trishagee
72
19k
Site-Speed That Sticks
csswizardry
13
920
Designing for Performance
lara
610
69k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
23
1.5k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
How to train your dragon (web standard)
notwaldorf
97
6.3k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
359
30k
Typedesign – Prime Four
hannesfritz
42
2.8k
Transcript
MeCabKeras 2019/2/23 @Python in
3F-*"% Q:<+/M@3F-*8L )9 3F O8L$?.
IDP6S E<6S >16S KFREG6S /M6S C4-*"% 3F-*8L)9 <JNF '0=A#&H ! 5 72; B, ("%
!!$A<7> 7>-=N-Gram .C(2 !$,@ 7>A<A1
0 # $?/<"A<85 3B!$, %&<*'9)+:. %&<*'D46 =;C2E6 0 Ex. MeCab
'!, ",*+$J8 AOIQH=
FORBFO"( E9 RLRB20N16AOIQ H= RLAAG>U &$ CV .@W73 RL?K MS 16E -D16/5:TH= /5:T;=46 )%#+P 46<
livedoor NHN Japan58+- 42 livedoor $' ) #%&* (!*
=. $'1,79 :6;HTML"/<30 https://www.rondhuit.com/download.html#ldcc
livedoor
MeCab
MeCab HN7GSMGegi−69PKPLW`8:%/0-$ &25iGQoegI _@eg1-*,.4'",BC? !.5)(
fdkRm 5'5 V;T[nUJaGoogle Inc. ^p\Ffh]cX +.3-5#><jl = Y ,"5DAbEZ O
MeCab MeCab C++ '& # !*(
Windows %$ https://taku910.github.io/mecab/#download #"+) 32 64 , https://github.com/ikegami-yukino/mecab/releases/tag/v0.996 #"+) Mac %$ Homebrew mecab, mecab-ipadic #!+) Ubuntu %$ apt mecab, mecab-ipadic #!+)
Keras
keras.preprocessing.text.Tokenizer /-.2 /- !%"(8$&5 * #31)76 0)% +4
', fit &5tokenize !%0) %
keras.preprocessing.sequence.pad_sequences ! ( " # $'%
&
BoW: Bag of Words # %EC* G DEC?
- J;/ F<+EC,8=@1/0&%) 58 ()! '"%*$* ,8I209&%) 58 /1 TF-IDF: Term Frequency Inverse Document Frequency EHI2 ><,8 EC:67B4A .1&% )3
Word Embedding a]!.$*2C<@ fTY=!UD :9RPJG5 a]J ?Z10,000 20,000K6
Ni '3&, &.$*2 7<a]![RP7dJ`RPe.$*2 F S< Word Embeddinga]gO Google A; Xb!LWord2vec^V \B W^Ec!80)2H_!LRP IM Word2vec&#(-%1/Qh@Ec!8 )"-1 +4%0)27> Ec!8<@
RNN: Recurrent Neural Network *-H,+.=8 G "!%AB !*DF
@162 ,'/5?)/ G#$&!:(8 RNN> C;79304E LSTMLong Short Term MemoryGRU Gated Recurrent Unit<
BoW DNN
Word EmbeddingGlobalAveragePooling1D
Word EmbeddingRNNLSTM DNN
BoWDNN 0.5E #9("%$)CBoW+/ DNN4: * DBG6GlobalAveragePooling1D1 !$=2F
A LSTM7H2F,- <4: ' ; 7I ?3>8)CLSTM 4: & @:4
NLP,B8?=4-1$!&)%+"C5>@.A 7EFDQ&A-1Sequence-to-Sequence($* Attention :($*.A;3 OpenAIGoogle
Transformer '#Allen Institute 2.ELMo Google G5($*3BERTOpenAI .6GPT-204 <($* 9/