FY2018 Retrieva Internship Participation Report
Satoru Katsumata
December 10, 2023
Slides reporting on my participation in Retrieva's summer 2018 internship.
This material was presented to my lab.
Transcript
I interned at Retrieva over summer vacation
Komachi Lab, M1 (first-year master's student), Satoru Katsumata
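1. What is Retrieva? 2. Retrieva's environment 3. What I worked on 4. Impressions

What is Retrieva?: a (rough) company overview
▸ A company that researches, develops, sells, and deploys software built on natural language processing
▸ [Background] Spun out of PFI; now in its 3rd year
▸ [Location] A 5-minute walk from JR Iidabashi Station, renting one floor of a building
▸ If you want the details, please ask me individually afterwards…
▸ They apparently hold pizza parties regularly, so do come along if you are interested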
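What is the Retrieva internship?
▸ Recruits for two tracks, research and development
▸ This time: 2 people in research, 2 in development
▸ Research: 1 person in natural language processing, 1 in speech recognition
▸ Period: 2 months, August and September
▸ This year's application deadline was 5/10; the flow was document screening → coding assignment → interview → offer notification
▸ My coding is really weak, so I was desperately hoping to be picked up on the research side (so I am happy I got in)
Katsumata: research (natural language processing). The other interns can probably be found online if you search.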
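Retrieva's environment

Retrieva's environment: benefits and workplace (mainly the things I benefited from as an intern)
▸ Permission for things like remote work is easy to get ← a huge help during the typhoon
▸ The chairs are quite good (Contessa) ← the importance of a headrest
▸ [Food and drink] A boxed lunch is provided every Wednesday ← for in-house seminars and the like; Office Glico is installed, and drinks are handed out in PET bottles
▸ It is a paid internship (super important). Transportation costs are covered too, and interns from far away have housing arranged for them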
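Retrieva's environment: working hours and so on
▸ Basically 10:00 - 18:00
▸ Lunch is early, around 11:30 (to beat the crowds)
▸ [Morning] Check on training runs that had been left running and decide what to do today
[Afternoon] Work on what was decided in the morning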
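Retrieva's environment: events and the like
▸ Welcome party: pizza and such eaten in the office, self-introductions
▸ Pizza party: held regularly; you talk with people from outside the company, play board games, and get to eat pizza
▸ Board game tournament: board games from lunchtime until past early evening; we played all sorts, and I personally enjoyed Camel Up
Lots of chances to play board games, and you get to eat pizza!
↑ Left an impression even though I did not play it ↑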
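Retrieva's environment: food (a world apart from what is available around 首都大, Tokyo Metropolitan University)
▸ Chinese → 䠧香, X’IAN
▸ Hawaiian → ALOHA TABLE
▸ Grilled fish → 越後屋亀丸
▸ Mentaiko → やまや
▸ Indian curry → ひつじや
▸ Fish, fried food → うお座
▸ Udon → 雅楽 (apparently the udon of udon)
[Figure labels: Chinese, Hawaiian, grilled fish, mentaiko (やまや), Indian (curry), fish and fried food, udon, Iidabashi Station]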
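What I worked on at Retrieva
* For details, please refer to the separate material…

What I worked on: investigating methods for automatic proofreading support
▸ Overview
[Input] Sentences written by native Japanese speakers that contain (or may contain) errors
[Output] The same sentences with those errors detected (corrected)
[Available data] Post-correction sentences (containing no errors), in document units
[What I did] Trained a language model of correct sentences on the post-correction text → computed the generation probability of each word in an input sentence and treated the word as an error when the probability fell below a threshold
Example
Reference (output): 来来来世は鳥になって空を自由に飛びたい。 ("Three lifetimes from now, I want to become a bird and fly freely through the sky.")
Input: the same sentence with an incorrect character in place of 飛 (that character is lost in this transcript)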
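A rough picture of a language model
▸ Using the training data, it estimates the probability that a given word occurs
Example: to obtain the occurrence probability of word $w_i$ in the sequence $w_0, w_1, \dots, w_{i-1}, w_i$
→ compute it from the words that have appeared so far (up to the $(i-1)$-th):
$P(w_i \mid w_0, \dots, w_{i-1})$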
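For reference, the conditional probability on this slide is the per-word factor in the standard chain-rule decomposition of a sentence probability. An n-gram model (such as a 5-gram) approximates it by truncating the history to the previous $n-1$ words. This is textbook background rather than something stated on the slide; $N$ (index of the last word) and $n$ (n-gram order) are notation introduced here.

```latex
P(w_0, \dots, w_N) = \prod_{i=0}^{N} P(w_i \mid w_0, \dots, w_{i-1}),
\qquad
P(w_i \mid w_0, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})
```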
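Detecting erroneous words with a language model
▸ If the generation probability of a word under the language model is below the threshold → treat it as an erroneous word
▸ Example, with the threshold set to 0.1: $P(w_i \mid w_0, \dots, w_{i-1}) < 0.1$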
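The thresholding rule can be sketched with a toy model. This is a minimal illustration, not the implementation used in the internship: it swaps in an add-one smoothed bigram model for the 5-gram and LSTM models, and the corpus, the function names (`train_bigram`, `bigram_prob`, `detect_errors`), the misspelled token 跳び, and the threshold 0.15 are all assumptions made up for the example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences, with a <s> start marker."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed P(word | prev). The smoothing choice is an assumption."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def detect_errors(tokens, unigrams, bigrams, threshold=0.1):
    """Return the indices of tokens whose probability is below the threshold."""
    padded = ["<s>"] + tokens
    return [
        i
        for i, (prev, word) in enumerate(zip(padded, padded[1:]))
        if bigram_prob(prev, word, unigrams, bigrams) < threshold
    ]


# Toy "corrected" corpus (hypothetical), already split into words.
corpus = [["空", "を", "飛び", "たい"], ["空", "を", "見", "たい"]]
unigrams, bigrams = train_bigram(corpus)

# 跳び is an invented wrong-kanji variant of 飛び; it is flagged at index 2.
flagged = detect_errors(["空", "を", "跳び", "たい"], unigrams, bigrams, threshold=0.15)
print(flagged)
```

On such a tiny corpus the probabilities are coarse, so the threshold had to be picked by hand. The slides report the same tension on real data, where getting true positives came with many false positives.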
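Automatic evaluation metric used here
▸ Compare the differences within each sentence pair: input vs. correction result, and input vs. reference
Example ([?] marks a character lost in this transcript)
Reference: 私の所属している大学の名称が変わります。
Input: 私の所属している大学の名勝が[?]わります。
Correction result: 私の諸[?]している大学の名称が[?]わります。
❌ The corrector replaces `所属` with `諸[?]`: False Positive
⭕ The corrector replaces `名勝` with `名称`: True Positive
❌ The corrector does not replace `[?]` with `変`: False Negative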
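The comparison of the two sentence pairs can be sketched as follows. This is an illustrative reading of the slide, not the evaluation script that was actually used. It assumes substitution-only edits on equal-length token sequences, and the names `substitutions` and `count_edits` and the English toy tokens are hypothetical.

```python
def substitutions(source, target):
    """Map each position where two equal-length token sequences differ to the target token."""
    if len(source) != len(target):
        raise ValueError("this sketch only handles substitution edits")
    return {i: t for i, (s, t) in enumerate(zip(source, target)) if s != t}


def count_edits(source, hypothesis, reference):
    """Count TP/FP/FN by comparing system edits against reference edits."""
    system_edits = substitutions(source, hypothesis)
    gold_edits = substitutions(source, reference)
    # TP: the system changed a position to exactly the reference token.
    tp = sum(1 for i, t in system_edits.items() if gold_edits.get(i) == t)
    # FP: every other change the system made.
    fp = len(system_edits) - tp
    # FN: reference edits the system did not reproduce.
    fn = sum(1 for i, t in gold_edits.items() if system_edits.get(i) != t)
    return {"TP": tp, "FP": fp, "FN": fn}


# Toy data mirroring the slide's three cases: a wrong change, a correct fix, a miss.
source = ["affiliation", "nmae", "chnages"]
reference = ["affiliation", "name", "changes"]
hypothesis = ["afiliation", "name", "chnages"]

counts = count_edits(source, hypothesis, reference)
print(counts)
```

One design choice to be aware of: if the system edits the right position but picks the wrong replacement, this sketch counts it as both an FP and an FN. Whether the original metric did the same is not stated in the slides.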
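Experimental setup
▸ Training data: a certain dataset (233,873 sentences)
▸ Correction target: data separate from the training data but reasonably close to it (12 errors in total)
▸ Word segmentation: MeCab (UniDic, IPADIC); character-level (Neural only)
▸ Language models:
- N-gram → KenLM (5-gram)
- Neural → unidirectional and bidirectional LSTM (words with frequency 1 in the training data are replaced with <unk>)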
Experimental results (N-gram language model): quantitative evaluation
▸ Word segmentation: UniDic, IPADIC
[Tables: confusion matrices (TP, TN, FP, FN) for the detection results and the correction results, one pair each for UniDic and IPADIC; the cell values are missing from this transcript]
Only the N-gram model was also tried for correction
Experimental results (Neural language model): quantitative evaluation
▸ Word segmentation: UniDic, IPADIC
[Tables: confusion matrices (TP, TN, FP, FN) for the detection results of the unidirectional and bidirectional models, for UniDic and IPADIC; the cell values are missing from this transcript]

Experimental results (Neural language model): quantitative evaluation
▸ Word segmentation: character-level
[Tables: confusion matrices (TP, TN, FP, FN) for the detection results of the unidirectional and bidirectional models, character-level; the cell values are missing from this transcript]
With the Neural approach, getting detections right looks pretty hard…?
Summary of the work
▸ For the automatic proofreading support task, I investigated detecting errors with a language model
▸ With N-gram, producing TPs came with a flood of FPs (in Word terms, like a huge number of underlines showing up)
▸ With Neural: wait, this does not work as well as N-gram?
Internship impressions

Impressions: looking back on these two months
▸ Overall summary
Why I wanted to intern at Retrieva in the first place:
1. I wanted to know what it is like to put natural language processing knowledge to use in society
2. I wanted to get good feedback over the summer vacation
→ It was two months in which I strongly felt the difference between research and R&D!
▸ Additional impressions
- It was my first internship ever, but I did not run into the difficulties that tend to come with that
- [Impression] A free-spirited company: "I would like to visit a company that is actually using […]" → "Sure!"
- Nothing but impressive people: getting out, for an internship or otherwise, brings fresh stimulation
If you want to know how natural language processing is useful in society, I think the Retrieva internship is quite good. Especially for master's students, I think it helps in deciding between continuing your studies and getting a job.
Other things that may be useful
▸ Fellow interns from the same cohort have posted participation reports
- http://www.creativ.xyz/retrieva-intern-840
- https://nomoto-eriko.hatenablog.com/entry/2018/10/04/125940
▸ Fellow interns from the same cohort have published slides reporting their results
- https://speakerdeck.com/nomotoeriko/retoribaintancheng-guo-bao-gao
- https://speakerdeck.com/kajyuuen/zhuan-men-yong-yu-chou-chu-shou-fa-falseyan-jiu-to-chou-chu-apurikesiyonfalsekai-fa