FY2018 Retrieva Internship Report (2018年度レトリバインターン参加報告)
Satoru Katsumata
December 10, 2023
Slides reporting on my participation in the FY2018 summer internship at Retrieva.
These are the materials I presented in my research lab.
Transcript
I Interned at Retrieva over the Summer Vacation
Komachi Lab, M1 (first-year master's student), Satoru Katsumata
1. What is Retrieva? 2. The environment at Retrieva 3. What I worked on 4. Impressions

What is Retrieva?: a (rough) company overview
▸ A company that researches, develops, sells, and deploys software based on natural language processing
▸ [Background] Spun out of PFI; now in its third year
▸ [Location] A 5-minute walk from JR Iidabashi Station; rents one floor of a building
▸ If you want more details, please ask me individually afterwards
▸ They apparently hold pizza parties regularly, so do come along if you are interested
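What is the Retrieva internship?
▸ Recruits for two tracks: research and development
▸ This time there were 2 interns in research and 2 in development
▸ The research interns were 1 in natural language processing and 1 in speech recognition
▸ Duration: 2 months (August and September)
▸ This year's application deadline was 5/10; the flow was document screening → coding assignment → interview → offer notification
▸ I am very weak at coding, so I was really hoping to be picked up on the research side (so I was happy to be accepted)
Katsumata: research (natural language processing). Reports by the other interns can probably be found online.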
The environment at Retrieva
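The environment at Retrieva: benefits and working environment (mainly what I benefited from as an intern)
▸ Permission for remote work and the like is easy to get ← a huge help during the typhoon
▸ The chairs are very good (Contessa) ← the importance of a headrest
▸ [Food and drink] A boxed lunch is provided every Wednesday ← for in-house seminars and the like. Office Glico is installed, and drinks are available in PET bottles
▸ It is a paid internship (extremely important). Transportation costs are covered, and interns from far away are provided with a place to live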
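The environment at Retrieva: working hours
▸ Basically 10:00 - 18:00
▸ Lunch is early, around 11:30 (to avoid the crowds)
▸ [Morning] Check the training runs left running overnight and decide what to do that day; [Afternoon] Work on what was decided in the morning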
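The environment at Retrieva: events
▸ Welcome party: pizza and the like eaten in the office, self-introductions
▸ Pizza party: held regularly; you talk with people from outside the company, play board games, and get to eat pizza
▸ Board game tournament: board games from lunchtime until past early evening; we played all sorts of games, and I personally found Camel Up fun
Lots of chances to play board games, and you get to eat pizza!
(Photo caption: memorable even though I did not play it)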
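The environment at Retrieva: food (a world of difference compared with Tokyo Metropolitan University)
▸ Chinese → 芊品香, X'IAN
▸ Hawaiian → ALOHA TABLE
▸ Grilled fish → 越後屋亀丸
▸ Mentaiko → やまや
▸ Indian curry → ひつじや
▸ Fish, fried food → うお座
▸ Udon → 雅楽 (said to be the udon of udon)
(Map: locations of these restaurants around Iidabashi Station)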
What I worked on at Retrieva
※ For details, please refer to the separate material.
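What I worked on: investigating methods for automatic proofreading support
▸ Overview
[Input] A sentence written by a native Japanese speaker that contains (or may contain) errors
[Output] The sentence with those errors detected (corrected)
[Available data] Post-correction sentences that contain no errors (in units of documents)
[What I did] Trained a language model of correct sentences on the post-correction text → computed the generation probability of each word in the input sentence and treated a word as an error if its probability fell below a threshold
Example
Correct (output): 来来来世は鳥になって空を自由に飛びたい。
Input: the same sentence with the character 飛 written incorrectly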
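A rough picture of a language model
▸ Based on the training data, it estimates the probability that a given word appears
Example: to estimate the occurrence probability of the word $w_i$ → estimate it from the words that have appeared so far (up to the $(i-1)$-th):
$P(w_i \mid w_0, \ldots, w_{i-1})$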
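The conditional probability above is the per-word factor of the usual chain-rule decomposition of a sentence probability. The following is a sketch in standard notation, not something shown on the slide: the sentence length $n$ is an assumption introduced here, and the truncated history corresponds to the 5-gram KenLM setting listed in the experimental setup.

```latex
P(w_0, \ldots, w_n) = \prod_{i=0}^{n} P(w_i \mid w_0, \ldots, w_{i-1}),
\qquad
P(w_i \mid w_0, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-4}, \ldots, w_{i-1}) \quad \text{(5-gram)}
```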
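Detecting erroneous words with a language model
▸ If the generation probability of a word given by the language model is lower than the threshold → treat it as an erroneous word
▸ Example: with the threshold set to 0.1, $w_i$ is flagged when $P(w_i \mid w_0, \ldots, w_{i-1}) < 0.1$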
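The thresholding rule can be illustrated with a toy model. This is a minimal sketch only: it swaps in an add-one smoothed bigram model for the 5-gram and LSTM models used in the internship, and the function names, the toy corpus, and the erroneous token 跳び are all hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences, with a <s> start marker."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word, vocab_size):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def detect_errors(tokens, unigrams, bigrams, threshold=0.1):
    """Return indices of tokens whose conditional probability is below the threshold."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens
    flagged = []
    for i, (prev, word) in enumerate(zip(padded, padded[1:])):
        if bigram_prob(unigrams, bigrams, prev, word, vocab_size) < threshold:
            flagged.append(i)
    return flagged


# Toy "corrected" corpus: the same tokenized clause repeated 20 times (hypothetical data).
corpus = [["空", "を", "自由", "に", "飛び", "たい"]] * 20
unigrams, bigrams = train_bigram(corpus)

print(detect_errors(["空", "を", "自由", "に", "飛び", "たい"], unigrams, bigrams))  # []
print(detect_errors(["空", "を", "自由", "に", "跳び", "たい"], unigrams, bigrams))  # [4]
```

With real data the threshold trades detections against false alarms, so it would normally be tuned on held-out text rather than fixed at 0.1.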
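The automatic evaluation measure used this time
▸ Compare the differences within each sentence pair: input vs. correction result, and input vs. reference
Example
Reference: 私の所属している大学の名称が変わります。
Input: the reference sentence with 名称 miswritten as 名勝 and 変 miswritten
Correction result: 名勝 is fixed to 名称, 所属 is wrongly rewritten, and the error at 変 is left as is
<❌> The correction replaces 所属 with a different word: False Positive
<⭕> The correction replaces 名勝 with 名称: True Positive
<❌> The correction does not restore 変: False Negative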
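One way to compute these counts is to turn each sentence pair into a set of edits and compare the two sets. This is a minimal sketch under assumptions not stated on the slide: sentences are pre-tokenized, only same-length substitutions are handled, and the tokens 諸属 and 代わり are hypothetical stand-ins for the system's wrong rewrite and the input's error.

```python
def edits(source, target):
    """Set of (position, new_token) substitutions that turn source into target."""
    if len(source) != len(target):
        raise ValueError("this sketch only handles equal-length token sequences")
    return {(i, t) for i, (s, t) in enumerate(zip(source, target)) if s != t}


def score(source, system, reference):
    """Return (TP, FP, FN) by comparing system edits with reference edits."""
    system_edits = edits(source, system)
    gold_edits = edits(source, reference)
    tp = len(system_edits & gold_edits)   # edits the system got right
    fp = len(system_edits - gold_edits)   # edits the system should not have made
    fn = len(gold_edits - system_edits)   # errors the system failed to fix
    return tp, fp, fn


source = ["私", "の", "所属", "して", "いる", "大学", "の", "名勝", "が", "代わり", "ます"]
system = ["私", "の", "諸属", "して", "いる", "大学", "の", "名称", "が", "代わり", "ます"]
reference = ["私", "の", "所属", "して", "いる", "大学", "の", "名称", "が", "変わり", "ます"]

print(score(source, system, reference))  # (1, 1, 1)
```

Under this definition a wrong replacement at a position that did need fixing counts as both a false positive and a false negative, which is a design choice of the sketch rather than something confirmed by the slides.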
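Experimental setup
▸ Training data: a certain dataset (233,873 sentences)
▸ Correction target: data separate from the training data but reasonably close to it (12 errors in total)
▸ Word segmentation: MeCab (UniDic, IPADIC), plus character-level segmentation (Neural only)
▸ Language models:
- N-gram → KenLM (5-gram)
- Neural → unidirectional and bidirectional LSTM (words with frequency 1 in the training data are replaced with <unk>)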
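The `<unk>` replacement used for the neural models can be sketched as follows. This is an illustrative helper, not the code used in the internship: the function name `replace_rare` and the `min_count` parameter are assumptions, with `min_count=2` corresponding to replacing words that occur exactly once.

```python
from collections import Counter


def replace_rare(sentences, min_count=2, unk="<unk>"):
    """Replace tokens seen fewer than min_count times in the corpus with unk."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return [
        [token if counts[token] >= min_count else unk for token in sentence]
        for sentence in sentences
    ]


corpus = [["大学", "の", "名称"], ["大学", "の", "所属"]]
print(replace_rare(corpus))
# [['大学', 'の', '<unk>'], ['大学', 'の', '<unk>']]
```

Mapping singletons to one shared token keeps the vocabulary small and gives the model a learned representation for unseen words at test time.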
Experimental results (N-gram language model): quantitative evaluation
▸ Word segmentation: UniDic, IPADIC
[Confusion matrices (TP / TN / FP / FN) for the detection results and the correction results, one pair each for UniDic and IPADIC]
Correction was attempted only with the N-gram model.
Experimental results (Neural language model): quantitative evaluation
▸ Word segmentation: UniDic, IPADIC
[Confusion matrices (TP / TN / FP / FN) for the detection results of the unidirectional and bidirectional models, one pair each for UniDic and IPADIC]
Experimental results (Neural language model): quantitative evaluation
▸ Word segmentation: character level
[Confusion matrices (TP / TN / FP / FN) for the detection results of the unidirectional and bidirectional character-level models]
With the neural approach, getting detections right seems to be quite hard...?
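Summary of what I worked on
▸ Investigated methods that use a language model to detect errors for the automatic proofreading support task
▸ With N-gram, getting TPs meant producing a lot of FPs (in Microsoft Word terms, like seeing those underlines show up all over the place)
▸ With Neural, the feeling was: huh, this does not work as well as N-gram?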
Impressions of the internship
Impressions: looking back on these two months
▸ Overall summary
Why I wanted to intern at Retrieva in the first place:
1. I wanted to know what it is like to put natural language processing knowledge to use in society
2. I wanted to spend the whole summer vacation on it and get good feedback
→ These were two months in which I strongly felt the difference between research and R&D!
▸ Some extra impressions
- This was my first internship ever, but I did not run into the kind of trouble you might expect from that
- [Impression] A free-spirited company: "I'd like to visit a company that actually does this" → "Sure, go ahead"
- Everyone there is impressive: going outside the lab, for an internship or otherwise, brings new stimulation
If you want to know how natural language processing is actually useful in society, I think the Retrieva internship is a pretty good choice. For master's students in particular, I think it helps in deciding whether to continue to a PhD or get a job.
Other things that may be useful
▸ Other interns from my cohort have posted participation reports
- http://www.creativ.xyz/retrieva-intern-840
- https://nomoto-eriko.hatenablog.com/entry/2018/10/04/125940
▸ Other interns from my cohort have published their results slides
- https://speakerdeck.com/nomotoeriko/retoribaintancheng-guo-bao-gao
- https://speakerdeck.com/kajyuuen/zhuan-men-yong-yu-chou-chu-shou-fa-falseyan-jiu-to-chou-chu-apurikesiyonfalsekai-fa