Report on the FY2018 Retrieva Internship
Satoru Katsumata
December 10, 2023
Slides reporting on my participation in Retrieva's summer 2018 internship.
These are the materials I presented in my lab.
Transcript
I spent my summer vacation interning at Retrieva
Komachi Lab, first-year master's student, Satoru Katsumata
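Outline: 1. What is Retrieva? 2. Retrieva's environment 3. What I worked on 4. Impressions
What is Retrieva?: a (rough) company overview
▸ A company that researches, develops, sells, and deploys software built on natural language processing
▸ [Background] Spun out of PFI; now in its third year
▸ [Location] A five-minute walk from JR Iidabashi Station; the company rents one floor of a building
▸ If you want more detail, please ask me individually later…
▸ They apparently hold pizza parties regularly, so come along if you are curious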
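What is the Retrieva internship?
▸ Recruits for two tracks: research and development
▸ This time there were two interns in research and two in development
▸ The research interns were one in natural language processing and one in speech recognition
▸ Duration: two months, August and September
▸ This year's application deadline was 5/10; the process was document screening → coding assignment → interview → offer
▸ I am hopeless at coding, so I was really hoping to be picked up on the research side (which is why I was happy to be accepted)
Katsumata was in research (natural language processing). Reports from the other interns can probably be found online if you search.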
Retrieva's environment
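Retrieva's environment: benefits and workplace (mostly what I benefited from as an intern)
▸ Permission for remote work and the like is easy to get; this helped enormously during the typhoon
▸ The chairs are very good (Contessa); I learned how much a headrest matters
▸ [Food and drink] A boxed lunch is provided every Wednesday, for things like the in-house seminar. Office Glico is installed, and drinks are handed out in plastic bottles
▸ It is a paid internship (extremely important). Commuting costs are covered too, and interns from far away get a place to live arranged for them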
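Retrieva's environment: working hours and such
▸ Basically 10:00 to 18:00
▸ Lunch is early, around 11:30 (to avoid the crowds)
▸ [Morning] Check on training runs that had been left running, and decide what to do that day
[Afternoon] Work on what was decided in the morning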
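Retrieva's environment: events and so on
▸ Welcome party: pizza in the office, self-introductions, and so on
▸ Pizza party: held regularly; you talk with people from outside the company, play board games, and eat pizza
▸ Board game tournament: board games from lunchtime until past early evening; we played all sorts. I personally enjoyed Camel Up
Lots of chances to play board games, and you get to eat pizza!
(One game left an impression on me even though I did not play it.)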
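Retrieva's environment: food (a world of difference compared with Tokyo Metropolitan University)
▸ Chinese → 䠧香, X'IAN
▸ Hawaiian → ALOHA TABLE
▸ Grilled fish → Echigoya Kamemaru
▸ Mentaiko → Yamaya
▸ Indian curry → Hitsujiya
▸ Fish, fried food → Uoza
▸ Udon → Garaku (said to be the udon of udon)
[Figure: the restaurants above, labeled by category, around Iidabashi Station]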
What I worked on at Retrieva
Note: please see the separate materials for details…
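What I worked on: exploring methods for automatic proofreading support
▸ Overview
[Input] Sentences written by native Japanese speakers that contain (or may contain) errors
[Output] The input with those errors detected (and corrected)
[Available data] Corrected sentences, i.e. sentences that contain no errors (at the document level)
[What I did] Trained a language model of correct sentences on the corrected text → computed the occurrence probability of each word in an input sentence and treated a word as an error when its probability fell below a threshold
Example
Gold (output): 来来来世は鳥になって空を自由に飛びたい。 ("In my next-next-next life I want to become a bird and fly freely through the sky.")
Input: the same sentence with the 飛 of 飛びたい written as a wrong character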
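A rough picture of a language model
▸ Based on the training data, it estimates the probability that a given word occurs
Example: to estimate the occurrence probability of the word $w_i$, condition on the words that have appeared so far (up to position $i-1$)
$w_0 \; w_1 \; \ldots \; w_{i-1} \; w_i$
$P(w_i \mid w_0, \ldots, w_{i-1})$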
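As a rough illustration of the picture above, here is a minimal count-based sketch. It is not the model used in the internship: it assumes the history is truncated to the single previous word (a bigram model) and uses add-k smoothing, and the function name `train_bigram_lm` and the toy corpus are made up for this example.

```python
from collections import Counter


def train_bigram_lm(sentences, k=1.0):
    """Estimate P(word | prev) from tokenized sentences with add-k smoothing.

    Assumption: the full history w_0..w_{i-1} is approximated by w_{i-1} only.
    """
    context_counts = Counter()  # how often each word appears as a context
    bigram_counts = Counter()   # how often each (prev, word) pair appears
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(padded)
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab_size = len(vocab)

    def prob(prev, word):
        # Smoothed relative frequency of `word` following `prev`.
        return (bigram_counts[(prev, word)] + k) / (context_counts[prev] + k * vocab_size)

    return prob


# Hypothetical toy corpus, already split into words.
corpus = [["空", "を", "飛ぶ"], ["空", "を", "見る"], ["鳥", "が", "飛ぶ"]]
prob = train_bigram_lm(corpus)
print(prob("空", "を"))  # pair seen in the corpus
print(prob("空", "が"))  # pair never seen, so only the smoothing mass remains
```

A pair that occurs in the training data receives a higher probability than one that never occurs, which is the property the detection step relies on.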
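Detecting erroneous words with a language model
▸ If the occurrence probability that the language model assigns to a word is below the threshold → treat it as an erroneous word
▸ Example: with the threshold set to 0.1, the word $w_i$ is flagged when
$P(w_i \mid w_0, \ldots, w_{i-1}) < 0.1$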
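The thresholding rule above can be sketched in a few lines. This is only an illustration under stated assumptions: `detect_errors` and `toy_cond_prob` are hypothetical names, and the probabilities in the toy model are invented rather than taken from any trained model.

```python
def detect_errors(tokens, cond_prob, threshold=0.1):
    """Return the positions i where P(w_i | w_0..w_{i-1}) is below the threshold.

    `cond_prob(history, word)` is any callable that returns a probability.
    """
    flagged = []
    for i, word in enumerate(tokens):
        history = tokens[:i]
        if cond_prob(history, word) < threshold:
            flagged.append(i)
    return flagged


def toy_cond_prob(history, word):
    """Hypothetical stand-in for a language model; it ignores the history."""
    known = {"空": 0.4, "を": 0.6, "飛ぶ": 0.3}
    return known.get(word, 0.01)  # unseen words get a very low probability


tokens = ["空", "を", "跳ぶ"]  # 跳ぶ stands in for a miswritten 飛ぶ
print(detect_errors(tokens, toy_cond_prob))  # [2]
```

Raising the threshold flags more words, which tends to catch more real errors at the cost of more false alarms.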
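The automatic evaluation measure used here
▸ Compare the differences within two sentence pairs: input vs. correction result, and input vs. gold
Example
Gold: 私の所属している大学の名称が変わります。 ("The name of the university I belong to is changing.")
Input: the gold sentence with 名称 miswritten as 名勝 and the 変 of 変わります written as a wrong character
Correction result: 名勝 is fixed to 名称, the wrong character for 変 is left as it is, and the originally correct 所属 is changed to a wrong word
<❌> The correction replaced 所属 with a wrong word: False Positive
<⭕> The correction replaced 名勝 with 名称: True Positive
<❌> The correction did not replace the wrong character with 変: False Negative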
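The comparison described above can be sketched as set operations over edits. This is a simplified illustration, not the evaluation script used in the internship: it assumes the three sentences are tokenized to the same length so that every edit is a one-for-one substitution, and the wrong tokens 代わり and 諸族 are hypothetical stand-ins.

```python
def count_edits(source, hypothesis, reference):
    """Count TP, FP, FN by comparing source->hypothesis and source->reference edits.

    Assumption: all three token lists are aligned position by position.
    """
    if not (len(source) == len(hypothesis) == len(reference)):
        raise ValueError("token sequences must have equal length")
    # An edit is (position, new token) wherever the output differs from the source.
    system_edits = {(i, h) for i, (s, h) in enumerate(zip(source, hypothesis)) if s != h}
    gold_edits = {(i, r) for i, (s, r) in enumerate(zip(source, reference)) if s != r}
    tp = len(system_edits & gold_edits)  # edits the system got right
    fp = len(system_edits - gold_edits)  # edits the system should not have made
    fn = len(gold_edits - system_edits)  # needed edits the system missed
    return tp, fp, fn


# Reduced to the three tokens that matter in the example above.
source = ["所属", "名勝", "代わり"]
hypothesis = ["諸族", "名称", "代わり"]
reference = ["所属", "名称", "変わり"]

tp, fp, fn = count_edits(source, hypothesis, reference)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(tp, fp, fn, precision, recall)  # 1 1 1 0.5 0.5
```

Under this definition a substitution at the right position with the wrong replacement counts as both a false positive and a false negative.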
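Experimental setup
▸ Training data: a certain dataset (233,873 sentences)
▸ Correction target: data separate from the training data but reasonably close to it (12 errors in total)
▸ Word segmentation: MeCab (UniDic, IPADIC); character-level (Neural only)
▸ Language models:
- N-gram → KenLM (5-gram)
- Neural → unidirectional and bidirectional LSTM (words with frequency 1 in the training data are replaced with <unk>)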
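The <unk> preprocessing mentioned for the neural models can be sketched as follows. This is a minimal illustration assuming the corpus is already tokenized; the function name `replace_singletons` and the toy sentences are made up for this example.

```python
from collections import Counter


def replace_singletons(sentences, unk="<unk>"):
    """Replace every token that occurs exactly once in the corpus with `unk`."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return [
        [token if counts[token] > 1 else unk for token in sentence]
        for sentence in sentences
    ]


# Hypothetical toy corpus, already split into words.
corpus = [["空", "を", "飛ぶ"], ["空", "を", "見る"]]
print(replace_singletons(corpus))
# [['空', 'を', '<unk>'], ['空', 'を', '<unk>']]
```

Mapping rare words to a shared token gives the model training examples for unknown words, so it can assign a probability to words it never saw at test time.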
Experimental results (N-gram language model): quantitative evaluation
▸ Word segmentation: UniDic, IPADIC
[Tables: confusion matrices (TP / TN / FP / FN) for the detection results and the correction results, one pair each for UniDic and IPADIC; the cell values did not survive in the transcript]
Correction was also tried, but only with the N-gram model
Experimental results (Neural language model): quantitative evaluation
▸ Word segmentation: UniDic, IPADIC
[Tables: confusion matrices (TP / TN / FP / FN) for the detection results of the unidirectional and bidirectional models, one pair each for UniDic and IPADIC; the cell values did not survive in the transcript]
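Experimental results (Neural language model): quantitative evaluation
▸ Word segmentation: character-level
[Tables: confusion matrices (TP / TN / FP / FN) for the detection results of the unidirectional and bidirectional character-level models; the cell values did not survive in the transcript]
With the neural methods, getting the detections right seems to be quite hard…?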
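Summary of what I worked on
▸ For the task of automatic proofreading support, I explored methods that use a language model to detect errors
▸ With the N-gram model, producing true positives came with a large number of false positives (in Word terms, it felt like squiggly underlines showing up all over the place)
▸ With the neural models, the impression was "huh, this does not work as well as N-gram?"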
Impressions of the internship
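Impressions: looking back on these two months
▸ Overall summary
Why I decided to intern at Retrieva in the first place
1. I wanted to know what it is like to put natural language processing knowledge to use in society
2. I wanted to use the whole summer vacation and get good feedback
→ It was two months in which I strongly felt the difference between research and R&D!
▸ Some extra impressions
- This was my first internship of any kind, but I did not run into the difficulties that usually come with that
- [Impression] A company with a lot of freedom: "I would like to visit a company that actually does this work" → "Sure, go ahead"
- Everyone there is impressive: going outside the lab, for example on an internship, brings fresh stimulation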
If you want to know how natural language processing is actually useful in society, I think the Retrieva internship is a pretty good choice.
Especially for master's students, I think it helps you decide between going on to a PhD and getting a job.
Other material that may be useful
▸ Fellow interns from the same cohort have posted participation reports
- http://www.creativ.xyz/retrieva-intern-840
- https://nomoto-eriko.hatenablog.com/entry/2018/10/04/125940
▸ Fellow interns from the same cohort have published their results slides
- https://speakerdeck.com/nomotoeriko/retoribaintancheng-guo-bao-gao
- https://speakerdeck.com/kajyuuen/zhuan-men-yong-yu-chou-chu-shou-fa-falseyan-jiu-to-chou-chu-apurikesiyonfalsekai-fa