Slide 4
Slide 4 text
◆Segmentation/Tokenization
- The (almost) necessary first step of NLP,
which segments a sentence into tokens
◆Word
- Human-understandable unit
- Processing unit of traditional NLP
- Mandatory unit for linguistic analysis (e.g., parsing and PAS)
- Useful information as a feature or an intermediate unit of
subword for application-oriented tasks (e.g., NER and MT)
4
Char ニ,ュ,ー,ラ,ル,ネ,ッ,ト,ワ,ー,ク,に,よ,る,自,然,言,語,処,理
Subword ニュー,ラル,ネット,ワーク,による,自然,言語,処理
Word ニューラル,ネットワーク,に,よる,自然,言語,処理
ニューラルネットワークによる自然言語処理 ‘Natural language processing
based on neural networks’
Background (1/4)