Fast kNN Neural Machine Translation Using Subset Search
Hiroyuki Deguchi
March 22, 2024
8th AAMT Seminar (第8回AAMTセミナー)
AAMT Young Researchers' Translation Workshop (AAMT若手翻訳研究会)
Best Award (最優秀賞)
Transcript
[Slide text not recovered. Works cited:]
Guiding Neural Machine Translation with Retrieved Translation Pieces (Zhang+, NAACL2018)
Search Engine Guided Neural Machine Translation (Gu+, AAAI2018)
Nearest Neighbor Machine Translation (Khandelwal+, ICLR2021)
A framework for a mechanical translation between Japanese and English by analogy principle (Nagao, 1984)
[Slide: $k$NN-MT (Khandelwal+, ICLR2021), with source sentence $\boldsymbol{x}$ and target sentence $\boldsymbol{y}$. Slide text not recovered.]
Nearest Neighbor Machine Translation (Khandelwal+, ICLR2021)
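[Slide: $k$NN-MT (Khandelwal+, ICLR2021)]
$\boldsymbol{k}_i \in \mathbb{R}^D$, $f(\boldsymbol{x}, \boldsymbol{y}_{<t}) \in \mathbb{R}^D$
$$p_{k\mathrm{NN}}(y_t \mid \boldsymbol{x}, \boldsymbol{y}_{<t}) \propto \sum_{i=1}^{k} \mathbb{1}_{y_t = v_i} \exp\left(\frac{-\lVert \boldsymbol{k}_i - f(\boldsymbol{x}, \boldsymbol{y}_{<t}) \rVert_2^2}{\tau}\right)$$
Nearest Neighbor Machine Translation (Khandelwal+, ICLR2021)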
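The $p_{k\mathrm{NN}}$ formula on this slide can be illustrated with a small sketch. It is not the author's implementation: the function name `knn_probs`, the toy vectors, and the use of plain Python lists are assumptions made for illustration, and the $k$ neighbors are assumed to have been retrieved already (at least one of them). Each neighbor contributes a weight $\exp(-\lVert \boldsymbol{k}_i - f \rVert_2^2 / \tau)$ to its stored token $v_i$, and the weights are then normalized.

```python
import math
from collections import defaultdict


def knn_probs(query, keys, values, tau):
    """Turn k retrieved (key, value) neighbors into a distribution over tokens.

    query  : f(x, y_<t), a list of D floats
    keys   : the k retrieved key vectors k_i, each a list of D floats (k >= 1)
    values : the k target tokens v_i stored alongside those keys
    tau    : temperature, assumed > 0
    """
    # Squared L2 distance between each key and the query.
    sq_dists = [sum((a - b) ** 2 for a, b in zip(key, query)) for key in keys]
    # Shift by the smallest distance for numerical stability;
    # the shift cancels when the weights are normalized.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / tau) for d in sq_dists]
    # Indicator term: neighbors that store the same token pool their weights.
    scores = defaultdict(float)
    for token, w in zip(values, weights):
        scores[token] += w
    total = sum(scores.values())
    return {token: s / total for token, s in scores.items()}


# Toy example (hypothetical data): three neighbors in a 2-dimensional space.
query = [0.0, 0.0]
keys = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
values = ["cat", "dog", "cat"]
probs = knn_probs(query, keys, values, tau=1.0)
print(probs)
```

In this toy case two of the three neighbors store "cat", one of them at distance zero, so "cat" receives most of the probability mass. A larger `tau` flattens the distribution, and a smaller one concentrates it on the closest neighbors.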
[Slide text not recovered. Settings shown: $\lambda = 0.5$, $k = 16$. Works cited:]
Chunk-based Nearest Neighbor Machine Translation (Martins+, EMNLP2022)
Fast Nearest Neighbor Machine Translation (Meng+, ACL Findings 2022)
[Slide text not recovered. Work cited:]
Reconfigurable Inverted Index (Matsui+, ACMMM2018)
[Slide text not recovered. Settings shown: $\lambda = 0.5$, $k = 16$, $n = 56$]