FY2018 Retrieva Internship Participation Report
Satoru Katsumata
December 10, 2023
These slides report on my participation in Retrieva's summer 2018 internship.
I presented this material in my lab.
Transcript
I interned at Retrieva over the summer vacation
Komachi Lab, first-year master's student (M1), Satoru Katsumata
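1. What is Retrieva? 2. Retrieva's environment 3. What I worked on 4. Impressions

What is Retrieva?: (rough) company overview
▸ A company that researches, develops, sells, and deploys software built on natural language processing
▸ [Background] Spun out of PFI; now in its 3rd year
▸ [Location] A 5-minute walk from JR Iidabashi Station; rents one floor of a building
▸ If you want more details, please ask me individually afterwards…
▸ They apparently hold pizza parties and the like regularly, so do go if you are interested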
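
What is the Retrieva internship?
▸ Recruits for research positions and development positions
▸ This time there were 2 research interns and 2 development interns
▸ Research: 1 person in natural language processing and 1 person in speech recognition
▸ Period: 2 months, August and September
▸ This year's application deadline was 5/10; the flow was document screening → coding assignment → interview → offer notification
▸ My coding is very weak, so I was desperately hoping to be picked up on the research side (so I am happy I was accepted)
Katsumata: research (natural language processing). The other interns can probably be found by searching online.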

Retrieva's environment
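
Retrieva's environment: benefits and workplace (mainly what I benefited from as an intern)
▸ Permission for remote work and the like is easy to get ← a huge help during the typhoon
▸ The chairs are very good (Contessa) ← the importance of a headrest
▸ [Food and drink] Boxed lunches are provided every Wednesday ← because of in-house seminars and the like. Office Glico is installed, and drinks are available in PET bottles
▸ It is a paid internship (hugely important). Transportation costs are also covered, and interns from far away had a place to live arranged for them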
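
Retrieva's environment: working hours and so on
▸ Basically 10:00 - 18:00
▸ Lunch is early, around 11:30 (to avoid the crowds)
▸ [Morning] Check the training runs that were left running, and decide what to do today
  [Afternoon] Work on what was decided in the morning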
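
Retrieva's environment: events and the like
▸ Welcome party: eating pizza and so on in the office, self-introductions
▸ Pizza party: held regularly; you chat with people from outside the company, play board games, and get to eat pizza
▸ Board game tournament: board games from noon until past early evening; we played all sorts. I personally enjoyed Camel Up
There are many chances to play board games, and you get to eat pizza!
[Photo caption: memorable even though it is a game I did not play]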

Retrieva's environment: food (worlds apart from Tokyo Metropolitan University)
▸ Chinese → […]香, X'IAN
▸ Hawaiian → ALOHA TABLE
▸ Grilled fish → 越後屋亀丸
▸ Mentaiko → やまや
▸ Indian curry → ひつじや
▸ Fish and fried food → うお座
▸ Udon → 雅楽 (apparently the udon among udon)
[Map: the restaurants above, located around Iidabashi Station]

What I worked on at Retrieva
※ For details, please refer to the separate material…
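
What I worked on: investigating methods for automatic proofreading support
▸ Overview
  [Input] Sentences written by native Japanese speakers that contain (or may contain) errors
  [Output] The input with its errors detected (corrected)
  [Available data] Corrected text (containing no errors), at the document level
  [What I did] Trained a language model of correct sentences on the corrected text → computed the occurrence probability of each word in the input sentence, and treated a word as an error if its probability was below a threshold
Example
  Reference (output): 来来来世は鳥になって空を自由に飛びたい。
  Input: 来来来世は鳥になって空を自由に[…]びたい。 (the kanji before びたい is a wrong character)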
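
A rough picture of a language model
▸ Based on the training data, it estimates the occurrence probability of a word
  Example: to obtain the occurrence probability of the word $w_i$
  → compute it from the words that have appeared so far (up to the $(i-1)$-th): $P(w_i \mid w_0, \dots, w_{i-1})$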
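
As a rough illustration of estimating a word's probability from training data, here is a minimal sketch of a count-based model. It is not the model used in the internship: it conditions only on the previous word (a bigram simplification of the full history $w_0, \dots, w_{i-1}$), uses add-one smoothing, and the toy corpus and all function names are hypothetical.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Collect context and bigram counts from tokenized sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens)  # <s> marks the sentence start
        vocab.update(tokens)
        context_counts.update(padded[:-1])  # every token that has a successor
        bigram_counts.update(zip(padded, padded[1:]))
    return context_counts, bigram_counts, vocab

def bigram_prob(prev, word, context_counts, bigram_counts, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

# Hypothetical toy corpus standing in for the corrected training text.
corpus = [["the", "bird", "flies"], ["the", "bird", "sings"], ["the", "cat", "sleeps"]]
context_counts, bigram_counts, vocab = train_bigram_lm(corpus)

p_seen = bigram_prob("the", "bird", context_counts, bigram_counts, vocab)
p_unseen = bigram_prob("the", "sleeps", context_counts, bigram_counts, vocab)
print(p_seen, p_unseen)  # a seen continuation scores higher than an unseen one
```

A word that never followed its context in the training data still gets a small non-zero probability, which is what makes a fixed threshold usable later.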
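
Detecting erroneous words with a language model
▸ If the occurrence probability of a word obtained from the language model is below the threshold → treat it as an erroneous word
▸ Example: with the threshold set to 0.1, $w_i$ is flagged when $P(w_i \mid w_0, \dots, w_{i-1}) < 0.1$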
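
The thresholding rule can be sketched as follows. This is only an illustration under assumptions: the per-word probabilities are hypothetical values that would in practice come from a language model, and `detect_errors` is a name made up for this example.

```python
def detect_errors(tokens, probs, threshold=0.1):
    """Return (index, token) pairs whose probability is below the threshold."""
    if len(tokens) != len(probs):
        raise ValueError("tokens and probs must have the same length")
    return [(i, tok) for i, (tok, p) in enumerate(zip(tokens, probs)) if p < threshold]

# Hypothetical tokens and language-model probabilities for one input sentence.
tokens = ["I", "want", "to", "fry", "freely"]
probs = [0.40, 0.35, 0.60, 0.02, 0.15]

flagged = detect_errors(tokens, probs, threshold=0.1)
print(flagged)  # [(3, 'fry')]
```

Raising the threshold flags more words, which tends to trade false negatives for false positives.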
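
The automatic evaluation metric used this time
▸ Compare the differences within each sentence pair: input vs. correction result, and input vs. reference
Example
  Reference: 私の所属している大学の名称が変わります。
  Input: 私の所属している大学の名勝が[…]わります。
  Correction result: 私の諸[…]している大学の名称が[…]わります。
  ❌ The system replaces `所属` with `諸[…]`: False Positive
  ⭕ The system replaces `名勝` with `名称`: True Positive
  ❌ The system does not replace `[…]` with `変`: False Negative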
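
A minimal sketch of this kind of counting, assuming the input, the system output, and the reference are already tokenized and aligned position by position (an assumption made for this example; the slides do not say how the alignment was done). A wrong edit at a position that did need fixing is counted here as a false positive, which is also an assumption.

```python
from collections import Counter

def score_corrections(source, system, reference):
    """Classify each aligned token position as TP, FP, FN, or TN."""
    if not (len(source) == len(system) == len(reference)):
        raise ValueError("all three token lists must have the same length")
    counts = Counter()
    for src, hyp, ref in zip(source, system, reference):
        changed = hyp != src  # the system edited this token
        needed = ref != src   # the reference says an edit was needed
        if changed and needed and hyp == ref:
            counts["TP"] += 1
        elif changed:
            counts["FP"] += 1  # unneeded edit, or an edit to the wrong word
        elif needed:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

# Hypothetical English stand-in for the example on the slide.
reference = ["my", "university", "name", "will", "change"]
source = ["my", "university", "nane", "will", "chanje"]
system = ["my", "universe", "name", "will", "chanje"]

counts = score_corrections(source, system, reference)
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])
print(dict(counts), precision, recall)
```

Here one needed fix is made, one unneeded edit is introduced, and one error is missed, mirroring the three cases in the example.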
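
Experimental setup
▸ Training data: a certain dataset (233,873 sentences)
▸ Correction target: data separate from the training data but reasonably close to it (12 errors in total)
▸ Tokenization: MeCab (UniDic, IPADIC), plus character-level (Neural only)
▸ Language models:
  - N-gram → KenLM (5-gram)
  - Neural → unidirectional and bidirectional LSTM (words with frequency 1 in the training data are replaced with <unk>)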
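
The `<unk>` preprocessing for the neural models could look like the sketch below. The function name, the `min_count` parameter, and the toy data are assumptions for illustration; only the rule itself (words seen once in the training data become `<unk>`) comes from the slide.

```python
from collections import Counter

def replace_rare_words(sentences, min_count=2, unk="<unk>"):
    """Replace tokens seen fewer than min_count times in the corpus with unk."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return [[tok if counts[tok] >= min_count else unk for tok in sent]
            for sent in sentences]

# Hypothetical tokenized training sentences.
train = [["a", "b", "a"], ["b", "c"]]
processed = replace_rare_words(train)
print(processed)  # [['a', 'b', 'a'], ['b', '<unk>']]
```

With `min_count=2`, exactly the words that occur once are mapped to `<unk>`, so the model learns a probability for unknown words that it can use on unseen input.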

Experimental results (N-gram language model): quantitative evaluation
▸ Tokenization: UniDic, IPADIC
[Tables: confusion matrices (TP, TN, FP, FN) for the detection results and the correction results, for UniDic and IPADIC; the cell values are not present in the transcript]
Correction was also tried, for the N-gram model only.

Experimental results (Neural language model): quantitative evaluation
▸ Tokenization: UniDic, IPADIC
[Tables: confusion matrices (TP, TN, FP, FN) for the detection results of the unidirectional and bidirectional models, for UniDic and IPADIC; the cell values are not present in the transcript]
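
Experimental results (Neural language model): quantitative evaluation
▸ Tokenization: character-level
[Tables: confusion matrices (TP, TN, FP, FN) for the detection results of the unidirectional and bidirectional character-level models; the cell values are not present in the transcript]
With the neural methods, getting detections right seems quite hard…?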
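
Summary of what I worked on
▸ For the automatic proofreading support task, I investigated methods that detect errors using language models
▸ With N-grams, producing TPs came at the cost of a large number of FPs (in Word terms, it felt like getting a huge number of squiggly underlines)
▸ With neural models, the feeling was "huh, this is not working as well as N-grams?"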

Impressions of the internship
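
Impressions: looking back on these two months
▸ Overall summary
  Why I wanted to intern at Retrieva in the first place:
  1. I wanted to know what it is like to put natural language processing knowledge to use in society
  2. I wanted to use the whole summer vacation to get good feedback
  → These were two months in which I strongly felt the difference between research and R&D!
▸ Additional impressions
  - This was my first internship ever, but I did not run into the difficulties typical of a first internship
  - [Impression] A flexible company: "I would like to visit a company that actually uses it" → "Sure!"
  - Everyone there is impressive: getting outside, for example through an internship, brings fresh stimulation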
If you want to know how natural language processing is put to use in society, I think the Retrieva internship is quite a good choice.
For master's students in particular, I think it helps in deciding between going on to further study and getting a job.

Other things that may be useful
▸ Other interns from my cohort have also posted participation reports
  - http://www.creativ.xyz/retrieva-intern-840
  - https://nomoto-eriko.hatenablog.com/entry/2018/10/04/125940
▸ Other interns from my cohort have published slides reporting their results
  - https://speakerdeck.com/nomotoeriko/retoribaintancheng-guo-bao-gao
  - https://speakerdeck.com/kajyuuen/zhuan-men-yong-yu-chou-chu-shou-fa-falseyan-jiu-to-chou-chu-apurikesiyonfalsekai-fa