FY2018 Retrieva Internship Report
Satoru Katsumata
December 10, 2023
These slides report on my participation in Retrieva's summer 2018 internship.
They were originally presented at my research lab.
Transcript
I Interned at Retrieva over Summer Vacation
Komachi Lab, M1 (first-year master's student), Satoru Katsumata
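1. What is Retrieva? 2. The environment at Retrieva 3. What I worked on 4. Impressions
What is Retrieva?: a (rough) company overview
▸ A company that researches, develops, sells, and deploys software built on natural language processing
▸ [Background] Spun out of PFI; now in its third year
▸ [Location] A 5-minute walk from JR Iidabashi Station; it rents one floor of a building
▸ If you want more detail, please ask me individually afterwards…
▸ They apparently hold regular pizza parties, so do drop by if you are curious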
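What is the Retrieva internship?
▸ Openings in research and in development
▸ This time there were 2 research interns and 2 development interns
▸ The research interns were 1 in natural language processing and 1 in speech recognition
▸ Duration: two months, August and September
▸ This year's application deadline was 5/10; the process was document screening → coding assignment → interview → offer notification
▸ I am a very weak coder, so I was desperately hoping to be picked up on the research side (so I was happy to be accepted)
Katsumata was in research (natural language processing). You can probably find the other interns online if you search…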
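The Environment at Retrieva
The environment at Retrieva: benefits and working environment (mainly what I benefited from as an intern)
▸ Permission for remote work and the like is easy to get ← a huge help during the typhoon
▸ The chairs are very good (Contessa) ← the importance of a headrest
▸ [Food and drink] A boxed lunch is provided every Wednesday ← for in-house seminars and similar events
  Office Glico is installed, and drinks are available in plastic bottles
▸ It is a paid internship (extremely important)
  Transportation costs are covered too, and interns from far away are provided with a place to live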
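The environment at Retrieva: working hours and such
▸ Basically 10:00 - 18:00
▸ Lunch is early, around 11:30 (to avoid the crowds)
▸ [Morning] Check on training runs that had been left running, and decide what to do that day
  [Afternoon] Work on what was decided in the morning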
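The environment at Retrieva: events and the like
▸ Welcome party: eating pizza and such in the office, self-introductions
▸ Pizza party: held regularly; you talk with people from outside the company, play board games, and get to eat pizza
▸ Board game tournament: board games from lunchtime until past early evening; we played all sorts, and I personally enjoyed Camel Up
There are plenty of chances to play board games, and you get to eat pizza!
↑ Memorable even though I was not playing around ↑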
The environment at Retrieva: food (worlds apart from Tokyo Metropolitan University)
▸ Chinese → […], X'IAN
▸ Hawaiian → ALOHA TABLE
▸ Grilled fish → 越後屋亀丸
▸ Mentaiko → やまや
▸ Indian curry → ひつじや
▸ Fish, fried food → うお座
▸ Udon → 雅楽 (apparently the udon of udon)
[Map: the restaurants above plotted around Iidabashi Station]
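What I Worked On at Retrieva
* For details, please refer to the separate materials…
What I worked on: investigating methods for automatic proofreading support
▸ Overview
  [Input] Sentences written by native Japanese speakers that contain (or may contain) errors
  [Output] The input with those errors detected (corrected)
  [Available data] Corrected sentences (containing no errors), in document units
  [What I did] Trained a language model of correct sentences on the corrected text
  → computed the generation probability of each word in an input sentence and treated a word as an error when its probability fell below a threshold
Example
  Gold (output): 来来来世は鳥になって空を自由に飛びたい。
  Input: 来来来世は鳥になって空を自由に[…]びたい。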
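A rough picture of language models
▸ Using the training data, a language model estimates the probability that a given word appears
  Example: to estimate the probability of the $i$-th word $w_i$
  → it is computed from the words that have appeared so far (up to the $(i-1)$-th)
  $w_0\ w_1\ \ldots\ w_{i-1}\ w_i \qquad P(w_i \mid w_0, \ldots, w_{i-1})$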
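As a rough illustration of estimating a word's probability from training data, here is a minimal sketch. It conditions only on the single preceding word (a bigram approximation with add-one smoothing), which is an assumption made to keep the example short; the slides condition on the full history. The function names and the toy corpus are hypothetical.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Count contexts and bigrams over tokenized sentences (lists of words)."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for words in sentences:
        padded = ["<s>"] + list(words)  # <s> marks the sentence start
        vocab.update(words)
        context_counts.update(padded[:-1])
        bigram_counts.update(zip(padded, padded[1:]))
    return context_counts, bigram_counts, vocab


def word_probability(prev, word, context_counts, bigram_counts, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


# Toy corpus, already segmented into words (illustrative only)
corpus = [["空", "を", "飛ぶ"], ["空", "を", "見る"]]
context_counts, bigram_counts, vocab = train_bigram_lm(corpus)

p_seen = word_probability("空", "を", context_counts, bigram_counts, vocab)
p_unseen = word_probability("空", "見る", context_counts, bigram_counts, vocab)
```

A word that followed the context in the training data receives a higher probability than one that never did, which is the property the error detection relies on.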
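Detecting erroneous words with a language model
▸ If a word's generation probability under the language model is below the threshold → treat it as an erroneous word
▸ Example: with the threshold set to 0.1
  $w_0\ w_1\ \ldots\ w_{i-1}\ w_i \qquad P(w_i \mid w_0, \ldots, w_{i-1}) < 0.1$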
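The thresholding rule can be sketched as follows. The probability function here is a hypothetical stand-in (a fixed lookup table) rather than a trained model, and `detect_errors`, `toy_prob`, and the probability values are assumptions for illustration only.

```python
def detect_errors(words, prob_fn, threshold=0.1):
    """Return the indices of words whose probability given the history is below threshold."""
    flagged = []
    for i, word in enumerate(words):
        history = words[:i]  # w_0 ... w_{i-1}
        if prob_fn(history, word) < threshold:
            flagged.append(i)
    return flagged


# Hypothetical stand-in for a trained language model: fixed toy probabilities
TOY_PROBS = {"空": 0.4, "を": 0.6, "飛ぶ": 0.3, "跳ぶ": 0.02}


def toy_prob(history, word):
    """Ignore the history and look the word up; unknown words get probability 0."""
    return TOY_PROBS.get(word, 0.0)


flagged = detect_errors(["空", "を", "跳ぶ"], toy_prob, threshold=0.1)
clean = detect_errors(["空", "を", "飛ぶ"], toy_prob, threshold=0.1)
```

Raising the threshold flags more words, so it trades additional true positives against additional false positives.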
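The automatic evaluation measure used here
▸ For the pairs (input, correction result) and (input, gold), compare the differences within each sentence pair
Example
  Gold: 私の所属している大学の名称が変わります。
  Input: 私の所属している大学の名勝が[…]わります。
  Correction result: 私の諸[…]している大学の名称が[…]わります。
  ❌ The correction replaces 所属 with 諸[…]: False Positive
  ⭕ The correction replaces 名勝 with 名称: True Positive
  ❌ The correction does not replace […] with 変: False Negative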
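A minimal sketch of this comparison, assuming the three sentences are already tokenized and differ only by one-to-one word substitutions (so positions line up). The function names and the English toy sentences are hypothetical; they mirror the structure of the slide's example rather than reproduce it.

```python
def substitutions(source, target):
    """Map position -> replacement word where two equal-length token lists differ."""
    if len(source) != len(target):
        raise ValueError("this sketch only handles one-to-one substitutions")
    return {i: t for i, (s, t) in enumerate(zip(source, target)) if s != t}


def score_corrections(source, system, gold):
    """Count TP, FP, FN by comparing system edits against gold edits."""
    system_edits = substitutions(source, system)
    gold_edits = substitutions(source, gold)
    # A system edit is a true positive only if gold makes the same replacement
    tp = sum(1 for i, word in system_edits.items() if gold_edits.get(i) == word)
    fp = len(system_edits) - tp  # edits the gold does not make
    fn = len(gold_edits) - tp    # gold edits the system missed
    return tp, fp, fn


source = "the nane of my university will chanje".split()
system = "teh name of my university will chanje".split()
gold = "the name of my university will change".split()

tp, fp, fn = score_corrections(source, system, gold)
```

Here the system wrongly rewrites one correct word, fixes one error, and misses one error, giving one count in each of the three categories.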
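Experimental setup
▸ Training data: a certain dataset (233,873 sentences)
▸ Correction target: data separate from the training data but reasonably close to it (12 errors in total)
▸ Word segmentation: MeCab (UniDic, IPADIC); character-level (Neural only)
▸ Language models:
  - N-gram → KenLM (5-gram)
  - Neural → unidirectional and bidirectional LSTM (words with frequency 1 in the training data are replaced with <unk>)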
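The `<unk>` replacement mentioned for the neural models can be sketched like this. It assumes the corpus is already segmented into word lists; the function name and toy corpus are hypothetical, and the actual preprocessing used in the experiments is not shown in the slides.

```python
from collections import Counter


def replace_singletons(sentences, unk="<unk>"):
    """Replace every word that occurs exactly once in the corpus with the unk token."""
    counts = Counter(word for words in sentences for word in words)
    return [[w if counts[w] > 1 else unk for w in words] for words in sentences]


# Toy corpus, already segmented into words (illustrative only)
corpus = [["空", "を", "飛ぶ"], ["空", "を", "見る"]]
processed = replace_singletons(corpus)
```

Mapping rare words to a single token keeps the vocabulary small and gives the model a probability estimate for words it has not seen at test time.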
Experimental results (N-gram language model): quantitative evaluation
▸ Word segmentation: UniDic, IPADIC
[Confusion matrices (TP / TN / FP / FN) for the detection results and the correction results, one each for UniDic and IPADIC; the cell values are not present in the transcript]
Correction was also attempted, with the N-gram model only
Experimental results (Neural language model): quantitative evaluation
▸ Word segmentation: UniDic, IPADIC
[Confusion matrices (TP / TN / FP / FN) for the detection results of the unidirectional and bidirectional models, one each for UniDic and IPADIC; the cell values are not present in the transcript]
Experimental results (Neural language model): quantitative evaluation
▸ Word segmentation: character-level
[Confusion matrices (TP / TN / FP / FN) for the detection results of the unidirectional and bidirectional character-level models; the cell values are not present in the transcript]
With the neural approach, getting detections right seems to be quite hard…?
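Summary of what I worked on
▸ For the task of automatic proofreading support, I investigated methods that detect errors with a language model
▸ With N-grams, producing TPs came at the cost of a large number of FPs
  (in Microsoft Word terms, like an awful lot of squiggly underlines showing up)
▸ With the neural models, the feeling was: hm, this is not working as well as the N-gram model?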
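Impressions of the Internship
Impressions: looking back on these two months
▸ Overall summary
  Why I wanted to intern at Retrieva in the first place
  1. I wanted to know what it is like to put natural language processing expertise to use in society
  2. I wanted to spend the whole summer vacation on it and get good feedback
  → These were two months in which I strongly felt the difference between research and R&D!
▸ Some additional impressions
  - This was my first internship ever, but I did not run into any of the difficulties that usually come with that
  - [Impression] A company with a lot of freedom: "I'd like to visit a company that actually uses […]" → "Sure, go ahead"
  - Surrounded by nothing but impressive people: going outside, for example on an internship, brings fresh stimulation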
If you want to know how natural language processing is actually useful in society, I think the Retrieva internship is a pretty good choice.
For master's students in particular, I think it helps in deciding between going on to a doctoral program and getting a job.
Other things that may be useful
▸ Other interns from my cohort have posted reports on their participation
  - http://www.creativ.xyz/retrieva-intern-840
  - https://nomoto-eriko.hatenablog.com/entry/2018/10/04/125940
▸ Other interns from my cohort have published their results slides
  - https://speakerdeck.com/nomotoeriko/retoribaintancheng-guo-bao-gao
  - https://speakerdeck.com/kajyuuen/zhuan-men-yong-yu-chou-chu-shou-fa-falseyan-jiu-to-chou-chu-apurikesiyonfalsekai-fa