Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
サブセット探索を用いた高速なkNNニューラル機械翻訳
Search
Hiroyuki Deguchi
March 22, 2024
Research
0
140
サブセット探索を用いた高速なkNNニューラル機械翻訳
第8回AAMTセミナー
AAMT若手翻訳研究会
最優秀賞
Hiroyuki Deguchi
March 22, 2024
Tweet
Share
More Decks by Hiroyuki Deguchi
See All by Hiroyuki Deguchi
20250226 NLP colloquium: "SoftMatcha: 10億単語規模コーパス検索のための柔らかくも高速なパターンマッチャー"
de9uch1
0
570
20240820: Minimum Bayes Risk Decoding for High-Quality Text Generation Beyond High-Probability Text
de9uch1
0
280
20240226_AAMT-Japio
de9uch1
0
160
Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM’s Translation Capability
de9uch1
0
130
Paper Reading: Sampling-Based Approximations to Minimum Bayes Risk Decoding for Neural Machine Translation
de9uch1
0
180
My Research Environmental Setup
de9uch1
0
300
Nearest Neighbor Machine Translation
de9uch1
0
250
Paper Reading - Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation
de9uch1
0
280
paper reading - Tree Transformer
de9uch1
0
250
Other Decks in Research
See All in Research
20250605_新交通システム推進議連_熊本都市圏「車1割削減、渋滞半減、公共交通2倍」から考える地方都市交通政策
trafficbrain
0
850
J-RAGBench: 日本語RAGにおける Generator評価ベンチマークの構築
koki_itai
0
660
Combinatorial Search with Generators
kei18
0
930
長期・短期メモリを活用したエージェントの個別最適化
isidaitc
0
170
Stealing LUKS Keys via TPM and UUID Spoofing in 10 Minutes - BSides 2025
anykeyshik
0
140
snlp2025_prevent_llm_spikes
takase
0
360
[論文紹介] Intuitive Fine-Tuning
ryou0634
0
120
RHO-1: Not All Tokens Are What You Need
sansan_randd
1
190
20250725-bet-ai-day
cipepser
2
470
【輪講資料】Moshi: a speech-text foundation model for real-time dialogue
hpprc
3
740
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
kurita
1
260
[CV勉強会@関東 CVPR2025] VLM自動運転model S4-Driver
shinkyoto
2
520
Featured
See All Featured
Context Engineering - Making Every Token Count
addyosmani
5
210
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Why You Should Never Use an ORM
jnunemaker
PRO
59
9.6k
Designing Experiences People Love
moore
142
24k
Fireside Chat
paigeccino
40
3.7k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
Git: the NoSQL Database
bkeepers
PRO
431
66k
Testing 201, or: Great Expectations
jmmastey
45
7.7k
Imperfection Machines: The Place of Print at Facebook
scottboms
269
13k
Art, The Web, and Tiny UX
lynnandtonic
303
21k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
120k
Statistics for Hackers
jakevdp
799
220k
Transcript
𝒌
◼ ⚫ ⚫ ◼ ⚫ (Zhang+, NAACL2018; Gu+, AAAI2018; Khandelwal+,
ICLR2021) ▶ (Nagao, 1984) ▶ ⚫ 𝑘 (Khandelwal+, ICLR2021) ▶ ▶ ▶ Guiding Neural Machine Translation with Retrieved Translation Pieces (Zhang+, NAACL2018) Search Engine Guided Neural Machine Translation (Gu+, AAAI2018) Nearest Neighbor Machine Translation (Khandelwal+, ICLR2021) A framework for a mechanical translation between Japanese and English by analogy principle (Nagao, 1984)
◼ ◼ ⚫ ⚫
𝒌 (Khandelwal+, ICLR2021) ◼ ⚫ ⚫ ⚫ ◼ ⚫ ▶
⚫ ▶ ≈ Nearest Neighbor Machine Translation (Khandelwal+, ICLR2021) 𝒙 𝒚
𝒌 (Khandelwal+, ICLR2021) 𝒌𝑖 ∈ ℝ𝐷 𝑓 𝒙, 𝒚<𝑡 ∈
ℝ𝐷 Nearest Neighbor Machine Translation (Khandelwal+, ICLR2021) ◼ 𝑘 ◼ ⚫ ⚫ 𝑝𝑘NN 𝑦𝑡 𝒙, 𝒚<𝑡 ∝ 𝑖=1 𝑘 𝟙𝑦𝑡=𝑣𝑖 exp − 𝒌𝑖 − 𝑓 𝒙, 𝒚<𝑡 2 2 𝜏 ◼ 𝑘
𝒌 ◼ (Martins+, EMNLP2022) ◼ (Meng+, ACLFindings2022) ⚫ 𝑘 𝑘
𝜆 = 0.5 𝑘 = 16 Chunk-based Nearest Neighbor Machine Translation (Martins+, EMNLP2022) Fast Nearest Neighbor Machine Translation (Meng+, ACL Findings2022)
𝒌 ◼ 𝑘 ◼ ⚫ 𝑘 (Matsui+, ACMMM2018) ⚫ 𝑘
𝑘 𝑘 Reconfigurable Inverted Index (Matsui+, ACMMM2018) 𝒌
◼ ⚫ 𝑘 ⚫ 𝑘 ◼ ◼ 𝑘
𝑛 𝑘 1 1 1 1 1 1 1 1
1
𝑛 𝑘 1 1 1 1 1 1 1 1
1
𝑛 𝑘 1 1 1 1 1 1 1 1
1
⚫ ⚫ ⚫ ⚫ ⚫ 𝑘 𝜆 = 0.5 𝑘
= 16 𝑛 = 56
𝑘 𝑘 ◼ 𝑘 ⚫ ▶ ⚫ ▶
◼ 𝑘 𝒌 𝒌
◼ ⚫ 𝑘
𝒌 𝒌 ◼ ⚫ ⚫ ◼ 𝑘 ⚫ ⚫ ◼
⚫
⚫ ⚫ ▶ ⚫ ▶