Neural Named Entity Recognition

Makoto Hiramatsu <@himkt>, University of Tsukuba
Neural Named Entity Recognition
DLHacks 12/13
https://bit.ly/2HNHngs

Self-introduction
Makoto Hiramatsu (himkt)
• Master's student @ Graduate School of Informatics, University of Tsukuba
• Research Engineer @ Cookpad Inc.
• HQ @ nlpaper.challenge (https://nlpaper-challenge.connpass.com/)
• ex-Researcher @ Retrieva Inc.
Interests: Natural Language Processing, Named Entity Recognition
@himkt or @himakotsu, @himkt
For more detail: https://himkt.github.io

Bibliographic information
Neural Architectures for Named Entity Recognition
In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL2016)
Guillaume Lample et al. (Carnegie Mellon University; now at Facebook AI Research)
Authors' implementation: https://github.com/glample/tagger

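Why this paper
• Frequently used as a baseline for Named Entity Recognition (NER)
• The foundation of neural NER since 2016 (although the picture seems to have started changing with the arrival of ELMo and BERT)
• (I wanted some external pressure to release my own research tool)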

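Task description: named entity recognition
• Extract named entities (NE), such as person names and organization names, from text
• What counts as a named entity is defined flexibly, according to the information you want
• Useful for information extraction, search engine indexing, and so on
https://explosion.ai/demos/displacy-ent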

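Task description: named entity recognition
• Often formulated as a sequence labeling problem
  • Words are discrete symbols and a sentence is a sequence of words: this formulation fits many NLP tasks
• When solving NER as sequence labeling:
  • Use a sequence tagging scheme: BIOES (BIO or IOB)
  • (B: Begin, I: Inside, O: Outside, E: End, S: Single)

George      B-PER
Washington  E-PER
visited     O
England     S-LOC

PERSON: George Washington, LOCATION: England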
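As a minimal sketch of how the tags in this example map back to entities, the function below walks a BIOES-tagged sentence and collects entity spans. The name `bioes_to_entities` is made up for this illustration and is not part of pyner or the authors' code; malformed spans are simply dropped, which is one possible policy among several.

```python
def bioes_to_entities(tokens, tags):
    """Collect (type, text) pairs from BIOES tags; malformed spans are dropped."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            # Single-token entity
            entities.append((etype, token))
            current, current_type = [], None
        elif prefix == "B":
            # Open a new multi-token entity
            current, current_type = [token], etype
        elif prefix in ("I", "E") and current and etype == current_type:
            current.append(token)
            if prefix == "E":
                # Close the entity at its last token
                entities.append((etype, " ".join(current)))
                current, current_type = [], None
        else:
            # "O" or an inconsistent tag discards any open span
            current, current_type = [], None
    return entities


tokens = ["George", "Washington", "visited", "England"]
tags = ["B-PER", "E-PER", "O", "S-LOC"]
print(bioes_to_entities(tokens, tags))
# -> [('PER', 'George Washington'), ('LOC', 'England')]
```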

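Sequence tagging schemes
• IOB (or IOB1): used in the CoNLL 2003 dataset
  • Every NE word gets the prefix I; when NEs of the same type are adjacent, the first word of the second and later NEs gets the prefix B (in NER the B prefix hardly ever appears)
• BIO (or IOB2)
  • The first word of every NE gets the prefix B, whether or not NEs are adjacent
• BIOES: an extension of BIO (used in this experiment)
  • The last word of an NE gets the prefix E
  • If the NE is a single word, that word gets the prefix S instead

IOB     BIO     BIOES
I-PER   B-PER   B-PER
I-PER   I-PER   I-PER
I-PER   I-PER   E-PER
B-PER   B-PER   S-PER
O       O       O
I-LOC   B-LOC   S-LOC
O       O       O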
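The relationship between the three schemes can be sketched as two small conversions, IOB1 to BIO and BIO to BIOES. This is an illustrative sketch, not the conversion code used in pyner or in the authors' implementation; the function names and the seven-tag input sequence are chosen here only for the example.

```python
def iob1_to_bio(tags):
    """IOB1 -> BIO (IOB2): every entity-initial token gets the B- prefix."""
    bio = []
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            prev = tags[i - 1] if i > 0 else "O"
            # An I- tag starts an entity unless the previous token has the same type.
            if prev == "O" or prev[2:] != tag[2:]:
                tag = "B-" + tag[2:]
        bio.append(tag)
    return bio


def bio_to_bioes(tags):
    """BIO -> BIOES: entity-final tokens get E-, single-token entities get S-."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, etype = tag[0], tag[2:]
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + etype  # does the same entity go on?
        if prefix == "B":
            bioes.append(("B-" if continues else "S-") + etype)
        else:  # prefix == "I"
            bioes.append(("I-" if continues else "E-") + etype)
    return bioes


iob1 = ["I-PER", "I-PER", "I-PER", "B-PER", "O", "I-LOC", "O"]
bio = iob1_to_bio(iob1)
print(bio)                # ['B-PER', 'I-PER', 'I-PER', 'B-PER', 'O', 'B-LOC', 'O']
print(bio_to_bioes(bio))  # ['B-PER', 'I-PER', 'E-PER', 'S-PER', 'O', 'S-LOC', 'O']
```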

Dataset
• CoNLL 2003 dataset
  • News articles annotated with four kinds of NE tags: person, organization, location, and miscellaneous
  • English: Reuters Corpus (train, dev: 10 days of articles; test: 1 day of articles?)
  • German: ECI Multilingual Text Corpus (train, dev, test: 1 week of articles)
• Getting started with CoNLL 2003
  1. Obtain the Reuters corpus
     • http://trec.nist.gov/data/reuters/reuters.html
  2. Obtain the NE annotation data
     • https://www.clips.uantwerpen.be/conll2003/ner/
  3. Generate the NE-tagged corpus
     • The generation script is included in the data obtained in step 2

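Paper: Neural Architectures for Named Entity Recognition
• Encodes word-level and character-level distributed representations with Bi-LSTMs
• Predicts the tag sequence with a conditional random field (CRF), which takes tag transitions into account
• Performs well on CoNLL2003 (a benchmark task commonly used for NER)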
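To illustrate what "taking transitions into account" means at prediction time, here is a small Viterbi decoder over per-token scores and tag-to-tag transition scores. It is a simplified sketch: the scores are invented for the example, start and end transitions are omitted, and in the actual model the per-token scores would come from the Bi-LSTM.

```python
def viterbi(emissions, transitions):
    """Return the best tag path and its score.

    emissions[t][y]:   score of tag y at position t
    transitions[p][y]: score of moving from tag p to tag y
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for y in range(n_tags):
            # Best previous tag for arriving at tag y
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][y])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][y] + emit[y])
        score = new_score
        backpointers.append(pointers)
    best = max(range(n_tags), key=lambda y: score[y])
    path = [best]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], score[best]


tags = ["O", "B-PER", "E-PER"]
# Invented scores: a large penalty forbids O -> E-PER, B-PER -> O, and so on.
transitions = [
    [0.0, 0.0, -10.0],    # from O
    [-10.0, -10.0, 0.0],  # from B-PER (must be followed by E-PER)
    [0.0, 0.0, -10.0],    # from E-PER
]
emissions = [
    [0.1, 1.0, 0.0],  # position 0 prefers B-PER
    [1.0, 0.0, 0.8],  # position 1 prefers O on its own
    [1.0, 0.0, 0.0],  # position 2 prefers O
]
path, score = viterbi(emissions, transitions)
print([tags[i] for i in path])  # ['B-PER', 'E-PER', 'O']
```

Picking the best tag independently at each position would give B-PER, O, O, which is not a valid BIOES sequence; the transition scores steer the decoder to B-PER, E-PER, O.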

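Composition of word features
• Concatenate the following features
  • The distributed representation of the word
  • The hidden states obtained by feeding the characters' distributed representations into a Bi-LSTM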

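Two types of Bi-LSTM
• Word Bi-LSTM: the Bi-LSTM inside the LSTM-CRF
  • Concatenates the hidden states of the forward and backward LSTMs at every time step (chainer: the third output of NStepLSTM)
• Character Bi-LSTM: the Bi-LSTM inside the word encoder
  • Concatenates the final hidden states of the forward and backward LSTMs (chainer: the first output of NStepLSTM)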
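The difference between "hidden states at every time step" and "final hidden states" can be shown with a toy example. To stay dependency-free, the sketch below replaces the LSTM with a scalar tanh recurrence; this substitution, and the names `run_rnn` and `bidirectional`, are assumptions made for illustration only and are not Chainer APIs.

```python
import math


def run_rnn(inputs, weight=0.5):
    """Toy scalar recurrence h_t = tanh(weight * h_{t-1} + x_t), standing in for an LSTM."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(weight * h + x)
        states.append(h)
    return states


def bidirectional(inputs):
    forward = run_rnn(inputs)
    # Run over the reversed sequence, then re-align to the original order.
    backward = run_rnn(inputs[::-1])[::-1]
    # Word-level use: one (forward, backward) pair per time step.
    per_step = list(zip(forward, backward))
    # Character-level use: only the last state of each direction.
    # The backward pass finishes at position 0 of the original sequence.
    final = (forward[-1], backward[0])
    return per_step, final


per_step, final = bidirectional([0.5, -1.0, 2.0])
print(len(per_step))  # one pair per input position
print(final)          # a single pair summarizing the whole sequence
```

In the tagger, the per-step pairs feed the CRF (one vector per word), while the final pair becomes the fixed-size character-based representation of a single word.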

Composition of word features
State of the art as of NAACL 2016
• Now: TagLM [3]: 91.9, ELMo [4]: 92.2, BERT: 92.8, ...
Lample et al.'s methods (the bottom two were not covered in this talk)

</introduction><implementation>

himkt/pyner

Libraries I relied on
• Network definition: chainer/chainer
• Handling pretrained distributed representations: RaRe-Technologies/gensim
• Model evaluation: chakki-works/seqeval
https://github.com/chainer/chainer
https://github.com/RaRe-Technologies/gensim
https://github.com/chakki-works/seqeval
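NER models are usually scored at the entity level rather than per token, which is the kind of score seqeval reports. The sketch below shows the idea in plain Python without calling seqeval; `bio_spans` and `entity_f1` are hypothetical helper names, and the handling of malformed tags is deliberately simplified (an I- tag with no open entity is ignored).

```python
def bio_spans(tags):
    """Return a set of (type, start, end) spans from BIO (IOB2) tags; end is exclusive."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # the sentinel flushes the last span
        inside = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not inside:
            if start is not None:
                spans.add((etype, start, i))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def entity_f1(gold_tags, pred_tags):
    """Precision, recall and F1 over exactly matching entity spans."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-LOC"]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Three of the four tokens are tagged correctly here, yet only one of the two entities matches exactly, so the entity-level score is 0.5.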

Modules required when using Trainer
Trainer, Updater, Iterator, Extension, Optimizer, Dataset, Model
see also: https://qiita.com/mitmul/items/1e35fba085eb07a92560#trainerを使ってみよう

Reusable Chainer modules
Trainer, Updater, Extension, Optimizer, Dataset, Model, Iterator

Modules provided by pyner
• SequenceLabelingDataset
• DatasetTransformer
• NamedEntityEvaluator
• (LearningRateDecayer)
Components in the diagram: Transformer, Trainer, Updater, Extension, Optimizer, Dataset, Model, Iterator

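Experiment: settings
• train, dev, test: follow the CoNLL2003 split
• Parameters: follow the paper
  • Word / character distributed representations: 100 / 50
  • Word-level / character-level BiLSTM dimensions: 2x100 / 2x25
  • Gradient clipping: 5.0
  • Dropout ratio: 0.5
• Zero normalization (explained later if time permits): enabled
• Pretrained distributed representations: Skip-N-Gram [5] used as initial values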

Experiment: conditions compared
• Word distributed representations: random or pretrained
• Character features: used or not used
• Dropout: enabled or disabled

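Experiment: words (random initialization)

Experiment: words (random initialization) + characters

Experiment: words (pretrained)

Experiment: words (pretrained) + characters

Experiment: words (pretrained) + dropout

Experiment: words (pretrained) + characters + dropout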

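Summary
• Neural Architectures for Named Entity Recognition
  • A popular method for the named entity recognition task
  • High-performance extraction with a BiLSTM-CRF
  • Takes character-based features into account
• Released an implementation (PyNER)
  • Implemented with Chainer
  • Provides components for using Trainer in sequence labeling tasks
  • Confirmed that the performance reported in the paper is reproduced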

himkt/pyner

Gradient clipping
• An RNN repeats forward/backward along the time axis
• As the sequence length T grows, gradients can become extremely large or extremely small (exploding / vanishing gradients)
• Gradient clipping: impose an upper bound on the gradient
• Apparently there are two kinds of clipping: element-wise and norm-based
  • https://www.slideshare.net/tsubosaka/deeplearning-60419659
• The authors' implementation calls Theano's grad_clip, which is element-wise
  • https://github.com/glample/tagger/blob/1c9618889fb89500cc5e70c45c27859b89d44449/optimization.py#L26
  • https://github.com/Theano/theano/blob/d395439aec5a6ddde8ef5c266fd976412a5c5695/theano/gradient.py#L2187
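The two kinds of clipping behave differently, which a few lines of plain Python can show. This is a sketch over a flat list of gradient values, not the Theano or Chainer implementation; the function names are chosen for this example.

```python
import math


def clip_elementwise(grads, threshold):
    """Clamp each gradient component to [-threshold, threshold] independently."""
    return [max(-threshold, min(threshold, g)) for g in grads]


def clip_by_norm(grads, threshold):
    """Rescale the whole gradient vector if its L2 norm exceeds the threshold."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= threshold:
        return list(grads)
    scale = threshold / norm
    return [g * scale for g in grads]


grads = [3.0, -12.0, 4.0]  # L2 norm is 13
print(clip_elementwise(grads, 5.0))  # [3.0, -5.0, 4.0]
print(clip_by_norm(grads, 5.0))      # every component scaled by 5/13
```

Element-wise clipping changes the direction of the gradient because only the large components are cut, whereas norm clipping keeps the direction and shrinks the length.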

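Zero normalization
• Convert every digit that appears in the text to 0
  • Numbers appear very often in the CoNLL2003 dataset
  • Numbers with the same digit pattern collapse into one symbol
  • Vocabulary size: 246,679 -> 210,023
https://github.com/glample/tagger/blob/master/utils.py#L82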
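Zero normalization is essentially a one-line regular-expression substitution. The sketch below is written for this transcript; the small word list is invented to show how the vocabulary shrinks.

```python
import re


def zero_digits(text):
    """Replace every digit character with '0'."""
    return re.sub(r"\d", "0", text)


print(zero_digits("Dec. 13, 2018"))  # Dec. 00, 0000

# Invented word list: distinct numbers with the same shape share one symbol.
words = ["1995", "2003", "12", "34", "Tokyo"]
print(len(set(words)))                         # 5
print(len({zero_digits(w) for w in words}))    # 3
```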

By the way: what happens if processing steps are removed or added?

One more thing

Parameter initialization (paper)

Parameter initialization (authors' implementation)
https://github.com/glample/tagger/blob/master/utils.py#L44
Xavier initialization (=Glorot Normal) [7]

Initial values of the CRF weights
• Authors' implementation: https://github.com/glample/tagger/blob/master/model.py#L281
• Chainer: https://github.com/chainer/chainer/blob/master/chainer/links/loss/crf1d.py (until 2018/12; fixed in https://github.com/chainer/chainer/pull/5807)

Parameter initialization methods
• A variety of initialization methods are in use (mostly Uniform-based)
• [-sqrt(6/(in+out)), sqrt(6/(in+out))] (Xavier initialization [7])
  • Lample+ [1]: https://github.com/glample/tagger/blob/master/utils.py#L52
• [-sqrt(3/dim), sqrt(3/dim)] (LeCun initialization [8])
  • Ma+ [2]: https://github.com/XuezheMax/NeuroNLP/blob/master/sequence_labeling.py#L161
  • Liu+ [6]: https://github.com/LiyuanLucasLiu/LM-LSTM-CRF/blob/master/model/utils.py#L793
    • (the paper cites Glorot)
• word2vec/word2vec: Uniform [-(0.5/dim), (0.5/dim)]
  • https://github.com/tmikolov/word2vec/blob/master/word2vec.c#L365
• facebookresearch/fastText: Uniform [-(1/dim), (1/dim)]
  • https://github.com/facebookresearch/fastText/blob/master/src/fasttext.cc#L734
• Chainer: Xavier initialization (= Glorot Normal; not Uniform!) is the default
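To compare the uniform ranges listed above, the sketch below computes each bound from the formula on the slide and draws a few samples with the standard library. The function names and the dimension 100 are choices made for this example; none of this is code from the linked repositories.

```python
import math
import random


def xavier_uniform_bound(fan_in, fan_out):
    """Bound b for Uniform[-b, b] with b = sqrt(6 / (in + out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def lecun_uniform_bound(dim):
    """Bound b for Uniform[-b, b] with b = sqrt(3 / dim)."""
    return math.sqrt(3.0 / dim)


def word2vec_bound(dim):
    return 0.5 / dim


def fasttext_bound(dim):
    return 1.0 / dim


def sample_uniform(bound, n, seed=0):
    """Draw n values from Uniform[-bound, bound] with a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-bound, bound) for _ in range(n)]


dim = 100
for name, bound in [
    ("xavier (100x100)", xavier_uniform_bound(dim, dim)),
    ("lecun", lecun_uniform_bound(dim)),
    ("word2vec", word2vec_bound(dim)),
    ("fastText", fasttext_bound(dim)),
]:
    print(f"{name:18s} bound={bound:.4f}")
```

For a square matrix (in = out = dim) the Xavier and LeCun bounds coincide, since sqrt(6/(2*dim)) = sqrt(3/dim); the word2vec and fastText ranges are far narrower at the same dimension.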

Bonus: initializing all parameters as in the paper

One more thing

Recent cool optimizer: AdaBound [9]
• Puts upper and lower bounds on the learning rate of Adam
• As training progresses, it approaches SGD (w/ Momentum)
(chainer: https://github.com/chainer/chainer/pull/6388)
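The bounding idea can be sketched for a single scalar step size. The bound functions below follow the form I recall from the AdaBound reference implementation; treat their exact shape, and the values `final_lr=0.1` and `gamma=1e-3`, as assumptions for illustration rather than a faithful reproduction of the optimizer.

```python
def adabound_bounds(step, final_lr=0.1, gamma=1e-3):
    """Lower and upper bounds on the per-parameter step size at a given step (step >= 1).

    Assumed form: both bounds converge to final_lr as step grows.
    """
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper


def clipped_step_size(adam_step_size, step, final_lr=0.1, gamma=1e-3):
    """Clamp an Adam-style step size into the current bounds."""
    lower, upper = adabound_bounds(step, final_lr, gamma)
    return min(max(adam_step_size, lower), upper)


for step in (1, 1000, 1000000):
    lower, upper = adabound_bounds(step)
    print(step, round(lower, 6), round(upper, 6))
```

Early in training the interval is very wide, so the update behaves like Adam; later the interval closes in on `final_lr`, so every parameter ends up with nearly the same step size, as in SGD.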

[1] Neural Architectures for Named Entity Recognition. Lample et al. NAACL2016.
[2] End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. Ma et al. ACL2016.
[3] Semi-supervised sequence tagging with bidirectional language models. Peters et al. ACL2017.
[4] Deep contextualized word representations. Peters et al. NAACL2018.
[5] Not all contexts are created equal: Better word representations with variable attention. Ling et al. EMNLP2015.
[6] Empower Sequence Labeling with Task-Aware Neural Language Model. Liu et al. AAAI2018.
[7] Understanding the difficulty of training deep feedforward neural networks. Glorot et al. AISTATS2010.
[8] Efficient BackProp. LeCun et al. Neural Networks: Tricks of the Trade.
[9] Adaptive Gradient Methods with Dynamic Bound of Learning Rate. Luo et al. ICLR2019.
