A Word-Complexity Lexicon and A Neural Readability Ranking Model for Lexical Simplification

These are the slides presented at the EMNLP 2018 paper-reading session held in our laboratory.

onizuka laboratory

December 18, 2018

Transcript

EMNLP 2018: A Word-Complexity Lexicon and A Neural Readability Ranking Model for Lexical Simplification
2018/12/18, M1
• 2 • 15,000-word lexicon • SimplePPDB++
Lexical simplification pipeline • Complex sentence: The cat perched on the mat. • Complex Word Identification: perched • Substitution Generation: perched: rested, sat • Substitution Ranking: #1 sat, #2 rested • Simplified sentence: The cat sat on the mat.
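The three stages on this slide (complex word identification, substitution generation, substitution ranking) can be sketched as a toy pipeline. This is not the paper's system: `COMPLEXITY`, `SUBSTITUTES`, and `THRESHOLD` are hypothetical values chosen only so that the slide's example sentence reproduces. A real system would replace each stage with a learned model or a lexicon.

```python
# Hypothetical toy resources (made-up values, for illustration only).
COMPLEXITY = {"perched": 4.2, "rested": 2.3, "sat": 1.4}  # higher = more complex
SUBSTITUTES = {"perched": ["rested", "sat"]}
THRESHOLD = 3.0  # hypothetical cut-off above which a word counts as complex


def identify_complex_words(tokens):
    """Complex Word Identification: indices of tokens scored above THRESHOLD."""
    return [i for i, t in enumerate(tokens) if COMPLEXITY.get(t.lower(), 0.0) > THRESHOLD]


def generate_substitutions(word):
    """Substitution Generation: look up candidate replacements."""
    return SUBSTITUTES.get(word.lower(), [])


def rank_substitutions(candidates):
    """Substitution Ranking: simplest candidate first (unknown words go last)."""
    return sorted(candidates, key=lambda w: COMPLEXITY.get(w, float("inf")))


def simplify(sentence):
    # Detach a final period so it is not glued to the last token.
    body, end = (sentence[:-1], ".") if sentence.endswith(".") else (sentence, "")
    tokens = body.split()
    for i in identify_complex_words(tokens):
        ranked = rank_substitutions(generate_substitutions(tokens[i]))
        if ranked:
            tokens[i] = ranked[0]
    return " ".join(tokens) + end


print(simplify("The cat perched on the mat."))
```

With these toy scores, "perched" is flagged, its candidates are ranked as sat, rested, and the printed sentence is "The cat sat on the mat."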
• foolishness vs. folly • Google Ngram Corpus: foolishness • PPDB • 21% • 14%
Building the lexicon • 15,000 words from the Google Ngram Corpus • 11 annotators • 6-point scale • about 1,000 words per 2-2.5 h • each word rated by 5-7 annotators
Annotation quality • Outlier threshold: 2 (3% of ratings) • Agreement: 0.55 → 0.64 • Share within ≦0.5: 47%, ≦1.0: 78%, ≦1.5: 93%
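The slide reports a threshold of 2, a figure of 3%, agreement rising from 0.55 to 0.64, and 47% / 78% / 93% within 0.5 / 1.0 / 1.5. The sketch below shows how such figures could be computed. It rests on two assumptions that are not confirmed by the slide: (1) a rating is dropped when it is more than 2 away from the mean of the *other* ratings for the same word, and (2) the percentages count individual ratings by their distance from the word's mean. The ratings are made-up toy data on the 1-6 scale.

```python
from statistics import mean


def filter_outliers(ratings, max_dev=2.0):
    """Assumed filter: drop a rating that is more than max_dev away
    from the mean of the other ratings for the same word."""
    kept = []
    for i, r in enumerate(ratings):
        others = ratings[:i] + ratings[i + 1:]
        if not others or abs(r - mean(others)) <= max_dev:
            kept.append(r)
    return kept


def agreement_profile(lexicon, thresholds=(0.5, 1.0, 1.5)):
    """Fraction of individual ratings lying within each threshold of their word's mean."""
    deviations = []
    for ratings in lexicon.values():
        m = mean(ratings)
        deviations.extend(abs(r - m) for r in ratings)
    return {t: sum(d <= t for d in deviations) / len(deviations) for t in thresholds}


# Toy ratings (hypothetical), filtered before the profile is computed.
lexicon = {
    "sat": filter_outliers([1, 1, 2, 6]),
    "folly": filter_outliers([4, 5, 5, 4]),
}
print(lexicon)
print(agreement_profile(lexicon))
```

Here the rating 6 for "sat" is dropped because the mean of the other three ratings is 4/3, so it deviates by more than 2.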
2
Substitution ranking • Dataset: SemEval 2012 • Candidates • Training: 30 targets, 300 candidate lists • Test: 171 targets, 1,710 candidate lists • Example: TEXT "When you think about it, that’s pretty terrible." Target: terrible; Candidates: bad, awful, deplorable
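Ranking the candidates for a target such as terrible can be framed pairwise. A model scores each ordered pair of candidates, and the pairwise scores are then summed into a single ranking. The sketch assumes this pairwise formulation. `pairwise_score` and the `TOY_COMPLEXITY` values are hypothetical stand-ins for a trained model.

```python
from itertools import permutations

# Hypothetical per-word complexity scores standing in for a learned model.
TOY_COMPLEXITY = {"bad": 1.2, "awful": 2.0, "deplorable": 4.5}


def pairwise_score(a, b):
    """Positive when a is judged simpler than b (toy stand-in for a pairwise model)."""
    return TOY_COMPLEXITY[b] - TOY_COMPLEXITY[a]


def rank_candidates(candidates):
    """Sum each candidate's pairwise scores against all others; simplest first."""
    totals = {c: 0.0 for c in candidates}
    for a, b in permutations(candidates, 2):
        totals[a] += pairwise_score(a, b)
    return sorted(candidates, key=lambda c: totals[c], reverse=True)


print(rank_candidates(["deplorable", "awful", "bad"]))
```

This toy scorer is derived from a single per-word score, so summing only reproduces that score's order. Aggregation becomes useful when a learned pairwise model produces preferences that are not perfectly consistent with each other.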
Results (P@1) • all features • + binning • + WC (15,000-word lexicon)
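The results list "binning" as one component. The slide does not say how it works. As an assumption, the sketch shows one common variant, Gaussian soft binning. A scalar feature value (for example, a word's 1-6 complexity rating) is turned into a vector of Gaussian responses around fixed bin centres, so nearby values get similar vectors. `make_centers`, `gaussian_bins`, and the choice of `sigma` are illustrative and hypothetical.

```python
import math


def make_centers(lo, hi, k):
    """k evenly spaced bin centres from lo to hi (inclusive)."""
    if k == 1:
        return [lo]
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


def gaussian_bins(x, centers, sigma):
    """Map scalar x to one Gaussian response per bin centre, each in (0, 1]."""
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]


centers = make_centers(1.0, 6.0, 6)  # [1.0, 2.0, ..., 6.0]
vec = gaussian_bins(3.0, centers, sigma=1.0)
print([round(v, 3) for v in vec])
```

The bin whose centre equals the input responds with 1.0, and the responses fall off symmetrically on either side.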
• PPDB • Ranking model • PPDB • 10B
SimplePPDB++
• Target, Candidate • 100 targets with candidates • 2 • Candidate • SimplePPDB++
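Evaluation metrics • n targets • Candidates: paraphrases (PPs) • MAP over the candidates • P@1: precision of the top-1 candidate
Complex Word Identification (CWI) • Datasets: SemEval 2016, CWIG3G2 • WC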
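The two ranking metrics named here can be computed as below. This uses the standard definitions, which are assumed rather than confirmed by the slide. P@1 checks whether the top-ranked candidate is in the gold set. Average precision averages precision@k over the ranks where a gold candidate appears and divides by the number of gold candidates. MAP is the mean over instances. The helper names and toy instances are hypothetical.

```python
def precision_at_1(ranked, gold):
    """1.0 if the top-ranked candidate is a gold candidate, else 0.0."""
    return 1.0 if ranked and ranked[0] in gold else 0.0


def average_precision(ranked, gold):
    """Mean of precision@k over the ranks k that hold a gold candidate,
    normalised by the total number of gold candidates."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in gold:
            hits += 1
            total += hits / k
    return total / len(gold) if gold else 0.0


def mean_over(instances, metric):
    """Average a per-instance metric over (ranked, gold) pairs, e.g. MAP."""
    scores = [metric(ranked, gold) for ranked, gold in instances]
    return sum(scores) / len(scores) if scores else 0.0


# Toy instances (hypothetical): ranked candidates and gold "good" candidates.
instances = [(["sat", "rested"], {"sat"}), (["rested", "sat"], {"sat"})]
print(mean_over(instances, precision_at_1))     # P@1 over the toy instances
print(mean_over(instances, average_precision))  # MAP over the toy instances
```

Because AP divides by the size of the gold set, it also penalises gold candidates that never appear in the ranked list.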
Summary • SOTA • 15,000-word lexicon • CWI • SimplePPDB++