Neural Machine Translation with Byte-Level Subwords
Scatter Lab Inc.
May 15, 2020
Transcript
Byte-level BPE: Neural Machine Translation with Byte-level Subwords
Presenter: ML Research Scientist, Pingpong
Overview
• “Neural Machine Translation with Byte-level Subwords”
• Changhan Wang, Kyunghyun Cho, and Jiatao Gu (Facebook AI Research)
• AAAI 2020 (arXiv 2019)
1. Introduction
Neural Machine Translation with Byte-level Subwords (Wang et al., 2019)
Byte-Pair Encoding (BPE)
• Repeatedly merge the most frequent pair of adjacent characters.

    vocab = all_unique_characters
    while len(vocab) < max_vocab_size:
        pair = get_max_pair(corpus)
        corpus = merge_vocab(corpus, pair)
        vocab.append(pair)
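The loop on this slide can be made concrete with a minimal Python sketch. The helper names get_max_pair and merge_vocab come from the slide; their bodies, the learn_bpe wrapper, and the choice to treat each input string as one pre-split word are assumptions made for illustration, not the implementation used in the paper.

```python
from collections import Counter


def get_max_pair(corpus):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    counts = Counter()
    for word in corpus:
        counts.update(zip(word, word[1:]))
    if not counts:
        return None
    return max(counts, key=counts.get)  # ties go to the pair seen first


def merge_vocab(corpus, pair):
    """Replace every occurrence of `pair` in each word with the merged symbol."""
    merged = pair[0] + pair[1]
    new_corpus = []
    for word in corpus:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(word[i])
                i += 1
        new_corpus.append(out)
    return new_corpus


def learn_bpe(words, max_vocab_size):
    """Start from single characters and merge until the vocab is full."""
    corpus = [list(w) for w in words]
    vocab = sorted({ch for w in corpus for ch in w})
    while len(vocab) < max_vocab_size:
        pair = get_max_pair(corpus)
        if pair is None:  # nothing left to merge
            break
        corpus = merge_vocab(corpus, pair)
        vocab.append(pair[0] + pair[1])
    return vocab, corpus


vocab, corpus = learn_bpe(["abab", "ab"], max_vocab_size=3)
print(vocab, corpus)
```

With the toy input above, the only merge performed is a + b, so the vocabulary ends with the symbol ab.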
Character? Byte?
• Character (a, b, c, 가, 나, …)
  • "BPE" usually means character-level BPE.
  • Representing text as a sequence of characters is the natural choice.
• Byte (E3, 81, AE, …)
  • Compactness: 256 tokens are enough to represent anything.
  • Usable regardless of language.
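To see the difference between the two views, the short sketch below prints the UTF-8 bytes of a few characters using only Python's built-in str.encode. The helper name to_hex_bytes is made up for this example. The bytes E3 81 AE from the slide are the encoding of the single character の.

```python
def to_hex_bytes(text):
    """Return the UTF-8 bytes of `text` as upper-case hex strings."""
    return [f"{b:02X}" for b in text.encode("utf-8")]


for ch in ["a", "の", "가"]:
    # One character can become one to four bytes, each in the range 0..255.
    print(ch, to_hex_bytes(ch))
```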
Limitations of Character-level BPE
• Characters can take up too many slots in the vocabulary.
  • Rare characters from noisy text
  • Character-rich languages (such as CJK languages)
• Poorly suited to handling multiple languages.
  • Bilingual and multilingual settings
  • Covering 150 languages takes about 138K Unicode characters.
  • In contrast, UTF-8 covers them with only 248 of the 256 possible byte values.
2. Byte-level BPE
Byte-level BPE (BBPE)
• Encode Unicode characters as UTF-8.
  • 1 Unicode character = 1 to 4 bytes
• Train BPE on the encoded sequence of bytes.
• Final vocab: UTF-8 byte set + variable-length byte n-grams added by BPE
Example (가나다라마바사):
Byte sequence: EA B0 80 EB 82 98 EB 8B A4 EB 9D BC EB A7 88 EB B0 94 EC 82 AC
Byte set: EA, B0, 80, EB, 82, 98, 8B, A4, 9D, BC, A7, 88, 94, EC, AC
Variable-length n-gram bytes: EA B0, EB 82 98, A4 EB, …
Note: a single character can be split across tokens.
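The byte sequence on this slide can be reproduced with a few lines of Python. The sketch below assumes the example text is 가나다라마바사, which is what the listed bytes decode to. The helper char_index_per_byte is a made-up name. The code builds the byte sequence and the byte set, then checks which characters the n-gram A4 EB touches.

```python
def char_index_per_byte(text):
    """For every UTF-8 byte of `text`, give the index of the character it belongs to."""
    owners = []
    for i, ch in enumerate(text):
        owners.extend([i] * len(ch.encode("utf-8")))
    return owners


text = "가나다라마바사"
hex_seq = [f"{b:02X}" for b in text.encode("utf-8")]
byte_set = sorted(set(hex_seq))
owners = char_index_per_byte(text)

print(" ".join(hex_seq))
print(len(byte_set), "distinct bytes")
# Bytes 8 and 9 are A4 EB: the last byte of 다 followed by the first byte of 라.
print(hex_seq[8:10], owners[8:10])
```

A byte n-gram such as A4 EB therefore does not correspond to any whole character, which is why decoding needs extra care.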
Contextualization
• The paper argues that byte-level embeddings need contextualization before they enter the main model.
• A simple CNN or GRU layer is applied first, and its output is fed to the main model, a Transformer.
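As a rough illustration of what the CNN variant does, here is a dependency-free sketch of a depth-wise 1D convolution over a sequence of token embeddings. Everything in it is an assumption made for illustration: the function name, the zero padding, and the hand-set kernels. A real model would use a learned layer from a deep learning framework.

```python
def depthwise_conv1d(embeddings, kernels):
    """Depth-wise 1D convolution with zero padding.

    embeddings: T vectors of dimension D (one per byte-level token).
    kernels: D kernels of odd width K, one per embedding channel.
    """
    width = len(kernels[0])
    half = width // 2
    steps, dim = len(embeddings), len(kernels)
    output = []
    for t in range(steps):
        row = []
        for d in range(dim):
            total = 0.0
            for j in range(width):
                src = t + j - half
                if 0 <= src < steps:  # positions outside the sequence count as zero
                    total += kernels[d][j] * embeddings[src][d]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 2-dimensional embeddings for three byte-level tokens.
embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hand-set kernels: channel 0 sums its neighbours, channel 1 is left unchanged.
kernels = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]
contextualized = depthwise_conv1d(embeddings, kernels)
print(contextualized)
```

Each output vector now mixes information from neighbouring tokens, so a token holding only part of a character sees the bytes around it before the Transformer does.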
Decoding
• Every sentence can be represented as a byte sequence, but not every byte sequence can be unambiguously restored (decoded) into a sentence.
• Ex) Generation, Translation
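Decoding
• Empirically, trained models are reported to rarely output invalid byte sequences.
  • Almost none appeared in the paper's experiments, and they were rare even on a large real-world test set of 165K examples.
• Slightly under-trained models tend to repeat redundant bytes.
• The goal is to recover as many Unicode characters as possible from such error patterns in linear time.
  • A dynamic-programming-based algorithm is proposed.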
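Decoding: algorithm
• Let $\{B\}_{k=1}^{N}$ be a given byte sequence.
• Let $f(k)$ be the maximum number of characters that can be recovered from its first $k$ bytes.
• $f(k)$ can be computed by dynamic programming as follows (recurrence as given in Wang et al., 2019):
  $f(k) = \max_{t=1,2,3,4} \{ f(k-t) + g(k-t+1, k) \}$
• $g(i, j) = 1$ if $\{B\}_{k=i}^{j}$ is a valid character, otherwise 0.
• Recording the choice made at each $f(k)$ and backtracking through those choices yields the recovered characters.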
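A minimal Python sketch of this dynamic program follows. It assumes that "valid character" means a chunk of 1 to 4 bytes that Python's strict UTF-8 decoder turns into exactly one character. The function names is_valid_char and recover are made up for illustration and are not taken from the paper's code.

```python
def is_valid_char(chunk: bytes) -> bool:
    """g(i, j): True if the chunk decodes to exactly one Unicode character."""
    try:
        return len(chunk.decode("utf-8")) == 1
    except UnicodeDecodeError:
        return False


def recover(data: bytes) -> str:
    """Recover as many characters as possible from a possibly invalid byte sequence."""
    n = len(data)
    f = [0] * (n + 1)        # f[k]: max characters recoverable from data[:k]
    back = [None] * (n + 1)  # back[k]: (t, g) chosen at position k
    for k in range(1, n + 1):
        for t in range(1, min(4, k) + 1):
            g = 1 if is_valid_char(data[k - t:k]) else 0
            candidate = f[k - t] + g
            if back[k] is None or candidate > f[k]:
                f[k] = candidate
                back[k] = (t, g)
    # Backtrack from the end, keeping only the chunks that formed a character.
    chars = []
    k = n
    while k > 0:
        t, g = back[k]
        if g:
            chars.append(data[k - t:k].decode("utf-8"))
        k -= t
    return "".join(reversed(chars))


# 가, then a stray repeated lead byte EB, then 나.
noisy = b"\xea\xb0\x80" + b"\xeb" + b"\xeb\x82\x98"
print(recover(noisy))
```

Each position tries at most four chunk lengths, so the running time is linear in the number of bytes. The stray byte is dropped and the two surrounding characters are kept.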
3. Experiments
Experimental Setting
• Dataset
  • Bilingual: En-De, Ja-En, Si-En
  • Multilingual: Many-to-English (X-En) → TED Talk Corpus, parallel data for 59 languages
• BPE & BBPE: learned with SentencePiece on the source + target sentences
Experimental Setting
• Model and Learning
  • Uses the Transformer
  • Largely follows the settings of Vaswani et al., 2017
• Inference and Evaluation
  • Beam size: 4 for En-De, 5 for the others
  • We calculate case-sensitive tokenized BLEU (Papineni et al. 2002) as the metrics using sacreBLEU (Post 2018).
Results: Qualitative Comparison: BPE vs. BBPE
• Symbol Frequency Distribution
  • BBPE symbols are distributed much more evenly.
  • There is almost no long tail; rare words are instead represented by subwords.
Results: Qualitative Comparison: BPE vs. BBPE
• Ratio of BBPE tokens with partial characters
  • For Japanese and the multilingual setting, the ratio of partial characters is substantial.
  • Character set: Japanese (8K), Multilingual (11K)
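One way to measure such a ratio is sketched below. It assumes that a token "has a partial character" when its bytes do not decode as complete UTF-8 on their own; the paper's exact definition may differ. The vocabulary and function names are made up for illustration.

```python
def has_partial_character(token: bytes) -> bool:
    """True if the token's bytes are not a sequence of complete UTF-8 characters."""
    try:
        token.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


def partial_ratio(vocab):
    """Fraction of tokens in `vocab` that contain a partial character."""
    return sum(has_partial_character(tok) for tok in vocab) / len(vocab)


# Hypothetical BBPE tokens: a whole character, two fragments, and plain ASCII.
toy_vocab = [b"\xea\xb0\x80", b"\xea\xb0", b"\xa4\xeb", b"ab"]
print(partial_ratio(toy_vocab))
```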
Results: Qualitative Comparison: BPE vs. BBPE
• Cross-lingual Sharing
  • How much do a language's symbols overlap with the X-En symbols?
  • Tested on Ar, He, Ru, Ko, and It.
  • Overall, BBPE symbols overlap considerably more.
  • Model side: benefits from parameter sharing
  • Vocab side: benefits from universal modeling
Results: Qualitative Comparison: BPE vs. BBPE
• Impact on Sequence Length
  • Because BBPE works with finer units, one might expect much longer sequences, but they are not dramatically longer.
Results: Importance of Contextualization
• Three settings are compared on X-En:
  • none
  • 1-layer CNN
  • 1-layer Bi-GRU
• The more fine-grained the vocab, the larger the effect.
Results: BBPE on Noisy Character Sets
• The En-De dataset contains some non-Latin alphabets.
  • As a result, the character set is as large as 3.4K.
• BPE has to include the whole character set, so these characters waste many vocab slots.
• BBPE 2K and 4K give results similar to BPE 32K,
  • while saving a lot in parameter count.
Results: BBPE on Character-Rich Languages
• Chinese and Japanese have character sets of more than 50K characters.
• The Ja-En dataset has a character set of about 8K in total, and the top 2.4K characters cover 99%.
• With this in mind, the BBPE vocabulary size is set to 4K.
• Performance is comparable to BPE.
Results: BBPE on Many-to-En Translation
• BBPE does well.
• Char/Byte does even better (?)
Results: BBPE on Many-to-En Translation
• Impact on Sequence Length
  • Source and target lengths differ a lot, which makes attention harder.
  • Could that be why (B)BPE performance drops?
Results: BBPE on Many-to-En Translation
• Even so, BBPE overall seems to strike a good balance between performance and speed.
Results: Transfer Learning on Unseen Characters
• BBPE includes every UTF-8 byte, so an OOV problem cannot occur.
• Transfer is therefore possible even between two languages whose character sets do not overlap at all.
• Fine-tuning a model pre-trained on X-En for Si-En shows that the transfer works well.
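The OOV argument can be checked directly. The sketch below compares a made-up character vocabulary (lower-case Latin letters and space) with the full set of 256 byte values on a Sinhala word. Both vocabularies and both helper names are assumptions for illustration only.

```python
def char_oov(text, char_vocab):
    """Characters of `text` that are missing from a character-level vocabulary."""
    return [ch for ch in text if ch not in char_vocab]


def byte_oov(text, byte_vocab):
    """UTF-8 bytes of `text` that are missing from a byte-level vocabulary."""
    return [b for b in text.encode("utf-8") if b not in byte_vocab]


char_vocab = set("abcdefghijklmnopqrstuvwxyz ")  # hypothetical vocabulary from Latin-script data
byte_vocab = set(range(256))                     # every possible byte value

word = "සිංහල"  # "Sinhala" written in Sinhala script
print(len(char_oov(word, char_vocab)), "characters are OOV")
print(len(byte_oov(word, byte_vocab)), "bytes are OOV")
```

Every character of the unseen script is OOV for the character vocabulary, while no byte is ever OOV, so a model pre-trained on other languages can at least read the input.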
4. Conclusion
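Contributions
• Proposes BBPE, which builds a byte-level subword vocabulary.
• Compared with character-based methods, it keeps performance while making the vocabulary much smaller.
  • In multilingual settings it sometimes performs even better.
• There is no OOV problem at all.
• It can transfer across diverse languages; the approach is very generic and brings gains in performance and training acceleration.
• Sequences are also shorter than with character-based methods, which allows faster training and inference.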
Future Work
• Address the drop in performance when source and target lengths differ greatly.
• Evaluate in One-to-Many and Many-to-Many settings as well.
Thank you ✌
If you have further questions or anything you are curious about, feel free to reach out via the contacts below.
(ML Research Scientist, Pingpong)
Email.
[email protected]
Facebook. @roomylee Linked in. @roomylee