Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation

Alexander Panchenko, Stefano Faralli, Simone Paolo Ponzetto, and Chris Biemann
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 72-78, Valencia, Spain, April 4, 2017.
Literature review, May 29, 2018
福嶋 真也, Natural Language Processing Laboratory, Nagaoka University of Technology
Note: example sentences, figures, and tables are taken from the paper.

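2 Abstract
• Introduces a new method for unsupervised knowledge-based WSD
• Uses two types of networks as resources
  ・one built from a corpus using distributional semantics
  ・one built manually
• Evaluated on two benchmarks; using lexical resources only, it achieves performance comparable to state-of-the-art unsupervised knowledge-based WSD systems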

3 Introduction
• Various approaches have been applied to the WSD task
  ・graph-based distributional approach
  ・word sense embeddings
  ・combination of both
  ・hybrid approach
• This paper investigates the usefulness of hybrid word sense representations

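4 Introduction
• Introduces a new unsupervised knowledge-based method (WSD based on the Hybrid Aligned Resource (HAR))
• HAR uses sparse lexical representations
  → this keeps the sense representations human-readable and easy to use for WSD
• Linking symbolic distributional sense representations with lexical resources improves the sense representations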

5 Related Work
• Combined distributional information and lexical resources
• Word embeddings
• Skip-gram model
• Sense embeddings, etc.

6 Unsupervised Knowledge-based WSD using Hybrid Aligned Resource
• Construction of HAR
  ・Corpus-based part: PCZ ID, Related Terms, Hypernyms, Context Clues
  ・Knowledge-based part: WordNet ID
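To make the layout of a HAR entry concrete, the fields listed above can be sketched as a small record type. This is only an illustration: the class name `HAREntry`, the use of term-to-weight dictionaries, and all example values (including the placeholder WordNet identifier) are assumptions, not the format used by the authors.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HAREntry:
    """One sense in the hybrid resource (hypothetical layout)."""

    # Corpus-based part
    pcz_id: str                                   # identifier of the induced sense
    related_terms: Dict[str, float] = field(default_factory=dict)
    hypernyms: Dict[str, float] = field(default_factory=dict)
    context_clues: Dict[str, float] = field(default_factory=dict)
    # Knowledge-based part
    wordnet_id: Optional[str] = None              # linked WordNet sense, if any

    def is_linked(self) -> bool:
        """True when the induced sense is aligned to a WordNet sense."""
        return self.wordnet_id is not None


# Made-up example values, for illustration only.
entry = HAREntry(
    pcz_id="mouse:0",
    related_terms={"keyboard": 0.8, "joystick": 0.6},
    hypernyms={"device": 0.9},
    context_clues={"click": 0.7},
    wordnet_id="wordnet-id-placeholder",
)
print(entry.is_linked())
```

Keeping each feature group as a sparse term-to-weight map mirrors the point made earlier in the slides that sparse representations stay readable.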

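7 Unsupervised Knowledge-based WSD using Hybrid Aligned Resource
• PCZ construction method (Faralli et al., 2016)
  ・Build a distributional thesaurus
  ・Induce word senses
  ・Label word senses with hypernyms
  ・Disambiguate related terms and hypernyms
  ・Retrieve context clues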

8 HAR Dataset
• News (100 million sentences)
  ・Gigaword (Parker et al., 2011)
  ・LCC (Richter et al., 2006)
  Average number of senses: 2.3 per word
• Wikipedia (35 million sentences)
  Average number of senses: 1.8 per word

9 Experimental conditions
• WordNet
• WordNet+Related(news)
• WordNet+Related(news)+Context(news)
• WordNet+Related(news)+Context(wiki)
Note: in the last two conditions, Context is limited to at most 5,000 context clues per word sense.
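The four conditions above differ only in which feature groups are merged into a sense's representation. The sketch below shows one way this could look, with the 5,000-clue cap applied to the context clues. The slides do not state how a sense is scored against a context, so the cosine similarity over sparse bag-of-words vectors used here is an assumption, as are all function names, sense labels, and weights.

```python
import math
from typing import Dict, Iterable, Optional

SparseVec = Dict[str, float]
MAX_CONTEXT_CLUES = 5000  # per-sense cap stated on the slide


def build_representation(
    wordnet_terms: SparseVec,
    related: Optional[SparseVec] = None,
    context: Optional[SparseVec] = None,
    max_context: int = MAX_CONTEXT_CLUES,
) -> SparseVec:
    """Merge the feature groups of one sense into a single sparse vector."""
    rep = dict(wordnet_terms)
    for term, weight in (related or {}).items():
        rep[term] = rep.get(term, 0.0) + weight
    # Keep only the highest-weighted context clues, up to the cap.
    top_clues = sorted((context or {}).items(), key=lambda kv: kv[1], reverse=True)
    for term, weight in top_clues[:max_context]:
        rep[term] = rep.get(term, 0.0) + weight
    return rep


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context_tokens: Iterable[str], senses: Dict[str, SparseVec]) -> str:
    """Return the sense whose representation is most similar to the context."""
    ctx: SparseVec = {}
    for tok in context_tokens:
        ctx[tok] = ctx.get(tok, 0.0) + 1.0
    return max(senses, key=lambda sense_id: cosine(ctx, senses[sense_id]))


# Made-up senses and weights, corresponding to the WordNet+Related+Context setting.
senses = {
    "mouse#animal": build_representation(
        {"rodent": 1.0, "animal": 1.0}, related={"rat": 0.9}, context={"cheese": 0.5}
    ),
    "mouse#device": build_representation(
        {"device": 1.0, "computer": 1.0}, related={"keyboard": 0.8}, context={"click": 0.7}
    ),
}
best = disambiguate(["click", "the", "mouse", "button", "keyboard"], senses)
print(best)
```

Passing only `wordnet_terms` corresponds to the WordNet condition, and adding `related` corresponds to WordNet+Related(news).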

10 Experimental conditions WordNet+Related(news)

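11 Evaluation
• Investigates the impact of HAR on WSD
• Datasets used for evaluation
  ・Senseval-3 (Mihalcea et al., 2004)
  ・SemEval-2007 Task 17 (Pradhan et al., 2007)
  The former has both coarse-grained and fine-grained annotations; the latter has fine-grained annotations only
• The official task's evaluator is used in all experiments to compute recall, precision, and F-score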
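For reference, the three measures can be sketched as follows, using the usual WSD convention that precision is computed over the instances the system answered and recall over all gold instances. This is not the official task evaluator; the function name, data layout, and example labels are assumptions made for illustration.

```python
from typing import Dict, Optional, Set, Tuple


def score(
    gold: Dict[str, Set[str]], predicted: Dict[str, Optional[str]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for sense predictions.

    gold maps an instance id to its set of acceptable senses;
    predicted maps an instance id to one sense, or None if unanswered.
    """
    attempted = [i for i, sense in predicted.items() if sense is not None and i in gold]
    correct = sum(1 for i in attempted if predicted[i] in gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Made-up example: 4 gold instances, 3 answered, 2 of them correct.
gold = {"d1": {"s1"}, "d2": {"s2"}, "d3": {"s1", "s3"}, "d4": {"s2"}}
predicted = {"d1": "s1", "d2": "s1", "d3": "s3", "d4": None}
p, r, f = score(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Because unanswered instances lower recall but not precision, a system that abstains on hard cases can show a gap between the two.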

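12 Results
• Effect of corpus-based features
  ・On both datasets, the F-score improves over the original (WordNet) representation
  → extending the sense representations yields a large advantage
  ・On Senseval-3, Context did not produce good results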

13 Results
• Comparison with SoTA
  ・KnowNet (Cuadros and Rigau, 2008)
  ・BabelNet (Navigli and Ponzetto, 2012)
  ・WN+XWN (Cuadros and Rigau, 2007)
  ・NASARI (Camacho-Collados et al., 2015a)
  The originally reported scores are used in general; NASARI, however, was evaluated with newly obtained sense representations

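14 Results
• Comparison with SoTA
  ・On Senseval-3, it outperformed the other hybrid models
  ・On SemEval-2007, it could not surpass BabelNet
  → the multilingual approach may be what helps; the machine translation it relies on itself uses high-coverage resources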

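15 Conclusion
• Using HAR, the sense representations were successfully enriched
• Experiments on two datasets showed strong performance with a model based on lexical resources only
• Comparison with other hybrid models that use corpus-based features demonstrated SoTA performance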
