Yano
May 03, 2023

[Reading Group Slides] From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers

Slides prepared for a reading group in our lab.


Transcript

  1. From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers

    Anne Lauscher, Vinit Ravishankar, Ivan Vulic, and Goran Glavas, EMNLP 2020
    Presented on May 2 by Yano (M2), Takeda-Sasano Lab
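  2. From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers

    Overview
    • Investigates the limits of the cross-lingual transfer ability of multilingual models
      1. How do language similarity and pretraining corpus size affect cross-lingual transfer?
      2. Is there a relationship between the downstream task and cross-lingual transfer?
      3. Can cross-lingual transfer performance be predicted?
      4. Is few-shot transfer worth its cost?
    • Zero-shot inference is difficult for languages distant from the source language and for low-resource languages
    • Transfer performance on low-level tasks depends heavily on structural language similarity
    • Transfer performance on high-level language understanding (LU) tasks depends on pretraining corpus size
    • The harder zero-shot transfer is, the more few-shot data improves performance
    • Few-shot transfer is highly cost-effective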
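  3. Cross-lingual Transfer

    ✦ Supervised learning
      • Learns from a sufficiently large dataset as the supervision signal
      ✖ Only a handful of languages have abundant datasets
      • Each task needs its own annotation…
    ➡ Goal: use labeled data in high-resource languages to run inference in low-resource languages
    ✦ Cross-lingual transfer learning
      • Acquires the ability to solve a task across languages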
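  4. Cross-lingual Transfer

    ✦ Cross-Lingual Word Embeddings (CLWE)
      • Word embeddings shared across languages
      • Figure (from the CLWE paper): a cross-lingual word embedding space containing English and Italian words
    ✦ Massively Multilingual Transformer networks (MMTs)
      • Language models pretrained on multilingual corpora
      • mBERT, XLM-R, …
      • Capable of cross-lingual transfer
        • Performance improves even for languages not provided during fine-tuning
        • Example: train on task A in English -> run inference on task A in Italian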
  5. Massively Multilingual Transformers

    ✦ multilingual BERT (mBERT)
      • Pretrained with MLM and NSP
        • Masked Language Modeling: predict randomly masked tokens in a sentence
        • Next Sentence Prediction: predict whether two sentences are adjacent
      • Trained on Wikipedia in 104 languages
    ✦ XLM-RoBERTa (XLM-R)
      • Pretrained with MLM only
      • Trained on CommonCrawl in 100 languages
        • Web text
        • Larger than mBERT
      • Figure: from the XLM-R paper
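
The MLM objective on this slide can be sketched in a few lines. This is only an illustration under simplifying assumptions: whitespace tokens instead of subwords, a flat masking probability, and a hypothetical helper named `mask_tokens`. It is not the preprocessing code of mBERT or XLM-R.

```python
import random

MASK = "[MASK]"  # placeholder symbol; the real mask token depends on the tokenizer


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace tokens with MASK.

    Returns (masked, labels): labels[i] holds the original token where
    position i was masked, and None elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model has to recover this token
        else:
            masked.append(tok)
            labels.append(None)  # position is not a prediction target
    return masked, labels


tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens, mask_prob=0.5, seed=1)
print(masked)
print(labels)
```

The training loss would then be computed only at positions where `labels` is not `None`.
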
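  6. Massively Multilingual Transformers

    ✓ Capable of cross-lingual transfer
      • Hypothesis: multilingual pretraining makes the representation spaces of the languages similar, which makes cross-lingual transfer possible [1]
    ✦ Zero-shot
      • Inference on a class (language) not provided during fine-tuning
      • Example: train on task A in English -> run inference on task A in Italian
    ✦ Few-shot
      • Fine-tuning (also) uses a small amount of labeled target-language data
      • Example: train on task A in English plus a small amount of Italian -> run inference on task A in Italian

    [1] Emerging Cross-lingual Structure in Pretrained Language Models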
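  7. Q1: How do language similarity and pretraining corpus size affect cross-lingual transfer?

    Section: Analysis of the zero-shot cross-lingual transfer ability of MMTs
    • Cross-lingual word embeddings (CLWE) perform well only between similar languages and for languages with sufficient corpora
      • CLWE assume that the word embedding spaces of the languages are similar
      • Example: languages similar to English
        • German, Scandinavian languages, French, Spanish
        • They belong to the same language family (Indo-European)
    ➡ How well do MMTs transfer across languages?
    ✓ Investigate cross-lingual transfer performance across multiple languages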
  8. Q2: Is there a relationship between the downstream task and cross-lingual transfer?

    Low-level tasks
      • POS tagging
      • Dependency parsing
      • Named entity recognition
    High-level (language understanding) tasks
      • Natural language inference
      • Question answering
    ✓ Measure zero-shot cross-lingual transfer on multiple tasks
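  9. Q2: Is there a relationship between the downstream task and cross-lingual transfer?

    ✦ POS tagging
      • Labels each word in a text with its part of speech
    ✦ Dependency parsing
      • Parses a sentence based on the dependency relations within it
    ✦ Named entity recognition (NER)
      • Extracts named entities (person names, organization names, place names, times, …) from text
    ✦ Natural language inference (NLI)
      • Predicts the inferential relation (entailment / neutral / contradiction) that holds between two texts
    ✦ Question answering (QA)
      • Searches a document for the text that answers a question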
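  10. Q4: Is few-shot transfer worth its cost?

    • How cost-effective is few-shot transfer?
    ✓ Compare zero-shot and few-shot performance
    ✓ Illustrate the annotation cost of few-shot transfer
    ✦ Zero-shot
      • Inference on a class (language) not provided during fine-tuning
      • Example: train on task A in English -> run inference on task A in Italian
    ✦ Few-shot (+ performance, + cost)
      • Fine-tuning (also) uses a small amount of labeled target-language data
      • Example: train on task A in English plus a small amount of Italian -> run inference on task A in Italian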
  11. Main languages covered in the paper

    Language family: a group of languages with a common origin
    • Languages that may share vocabulary, grammar, and phonological systems, and so have similar characteristics
    • Indo-European: EN (English), RU (Russian), HI (Hindi), IT (Italian), SV (Swedish), ES (Spanish), EL (Greek), DE (German), FR (French), BG (Bulgarian), UR (Urdu), …
    • Afro-Asiatic: AR (Arabic), HE (Hebrew), …
    • Other families: TR (Turkish): Oghuz group; ZH (Chinese): Sino-Tibetan; EU (Basque): Basque; FI (Finnish): Uralic; JA (Japanese): Japonic; KO (Korean): Koreanic; VI (Vietnamese): Austroasiatic; TH (Thai): Tai-Kadai; SW (Swahili): Niger-Congo; …
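  12. Experiments and analysis of zero-shot transfer

    ✓ Zero-shot inference on multiple tasks across multiple languages
    • Answers Q1 and Q2
      • Q1: how performance relates to language similarity and pretraining data size
      • Q2: how performance relates to the task
    • Task: target languages (ISO 639-1 codes)
      • POS (part-of-speech tagging): ZH, TR, RU, AR, HI, EU, FI, HE, IT, JA, KO, SV
      • DEP (dependency parsing): ZH, TR, RU, AR, HI, EU, FI, HE, IT, JA, KO, SV
      • NER (named entity recognition): ZH, TR, RU, AR, HI, EU, FI, HE, IT, JA, KO, SV
      • NLI (natural language inference): ZH, TR, RU, AR, HI, VI, TH, ES, EL, DE, FR, BG, SW, UR
      • QA (question answering): ZH, TR, RU, AR, HI, VI, TH, ES, EL, DE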
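
The task-to-language table on this slide can be kept as a small lookup structure, which makes it easy to see which languages are evaluated on every task. The names `TASK_LANGUAGES`, `shared_languages`, and `tasks_for` are hypothetical and only serve this sketch.

```python
# Target languages per task (ISO 639-1 codes), as listed on the slide.
SYNTACTIC = ["ZH", "TR", "RU", "AR", "HI", "EU", "FI", "HE", "IT", "JA", "KO", "SV"]

TASK_LANGUAGES = {
    "POS": list(SYNTACTIC),
    "DEP": list(SYNTACTIC),
    "NER": list(SYNTACTIC),
    # FR = French, one of the languages covered by XNLI
    "NLI": ["ZH", "TR", "RU", "AR", "HI", "VI", "TH", "ES", "EL", "DE",
            "FR", "BG", "SW", "UR"],
    "QA": ["ZH", "TR", "RU", "AR", "HI", "VI", "TH", "ES", "EL", "DE"],
}


def shared_languages(task_languages):
    """Languages that appear as a target in every task, sorted by code."""
    sets = [set(langs) for langs in task_languages.values()]
    return sorted(set.intersection(*sets))


def tasks_for(language, task_languages):
    """Tasks on which the given language is evaluated, in table order."""
    return [task for task, langs in task_languages.items() if language in langs]


print(shared_languages(TASK_LANGUAGES))  # ['AR', 'HI', 'RU', 'TR', 'ZH']
print(tasks_for("JA", TASK_LANGUAGES))   # ['POS', 'DEP', 'NER']
```

Only five target languages are shared by all five tasks, so comparisons across task levels rest mostly on those.
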
  13. Experiments and analysis of zero-shot transfer

    ✓ Zero-shot inference on multiple tasks across multiple languages
    • Task: dataset
      • POS (part-of-speech tagging): Universal Dependencies
      • DEP (dependency parsing): Universal Dependencies
      • NER (named entity recognition): WikiANN NER dataset [2]
      • NLI (natural language inference): XNLI [3]
      • QA (question answering): XQuAD [4]

    [2] Massively Multilingual Transfer for NER
    [3] XNLI: Evaluating Cross-lingual Sentence Representations
    [4] On the Cross-lingual Transferability of Monolingual Representations
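  14. Experiments and analysis of zero-shot transfer

    ✓ Zero-shot inference on multiple tasks across multiple languages
    • Task: evaluation metric
      • POS (part-of-speech tagging): Accuracy
      • DEP (dependency parsing): Unlabeled Attachment Score (a token counts as correct if its head is correct)
      • NER (named entity recognition): Accuracy
      • NLI (natural language inference): Accuracy
      • QA (question answering): Exact Match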
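
The three kinds of metric listed on this slide are simple to compute. The sketch below is a simplified illustration, not the evaluation scripts used in the paper: in particular, the `exact_match` normalization here (lowercasing and collapsing whitespace) is an assumption and is lighter than what official QA scorers apply.

```python
def accuracy(gold, pred):
    """Share of positions where the prediction equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def uas(gold_heads, pred_heads):
    """Unlabeled attachment score: share of tokens whose predicted head
    index matches the gold head index (the relation label is ignored)."""
    return accuracy(gold_heads, pred_heads)


def exact_match(gold_answers, pred_answers):
    """Share of questions whose predicted answer string equals the gold
    answer after a light normalization (assumed for this sketch)."""
    def normalize(text):
        return " ".join(text.lower().split())
    return accuracy([normalize(g) for g in gold_answers],
                    [normalize(p) for p in pred_answers])


# POS tagging: three of four tags are correct
print(accuracy(["NOUN", "VERB", "DET", "NOUN"], ["NOUN", "VERB", "ADJ", "NOUN"]))
# Dependency parsing: heads given as token indices, 0 = root
print(uas([2, 0, 2], [2, 0, 1]))
# QA: one of two answers matches after normalization
print(exact_match(["New York", "1990"], ["new  york", "1991"]))
```
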
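  15. Experiments and analysis of zero-shot transfer

    ✓ Language similarity is estimated from the following lang2vec features
      • Syntax: syntactic information, such as the relative order of subject and verb
      • Phonology: phonological features, such as the ratio of consonants to vowels
      • Inventory: presence or absence of phonological natural classes (sounds produced with a particular articulation)
      • Language families: the language family
      • Geography: distance on the globe
    ➡ Language similarity is estimated as the cosine similarity of these feature vectors
    ✓ The pretraining corpus size is standardized and used as the feature SIZE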
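
The two feature computations on this slide can be sketched as follows. The vectors and corpus sizes are toy values invented for illustration, not real lang2vec features or corpus statistics, and z-score standardization is an assumption because the slide does not say which normalization is used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_u * norm_v)


def standardize(values):
    """Z-score standardization: subtract the mean, divide by the
    (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


# Toy binary "syntax" vectors for a source and a target language
source_vec = [1, 0, 1, 1, 0]
target_vec = [1, 0, 0, 1, 0]
print(cosine_similarity(source_vec, target_vec))

# Toy corpus sizes (arbitrary units) turned into a SIZE feature
print(standardize([100.0, 10.0, 1.0]))
```
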
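  16. Experiments and analysis of few-shot transfer

    • Q4: Is few-shot transfer worth its cost?
    ➡ Examine how cross-lingual transfer performance changes with few-shot data
      • Train on English plus k target-language examples
    ✦ Zero-shot
      • Inference on a class (language) not provided during fine-tuning
      • Example: train on task A in English -> run inference on task A in Italian
    ✦ Few-shot (+ performance, + cost)
      • Fine-tuning (also) uses a small amount of labeled target-language data
      • Example: train on task A in English plus a small amount of Italian -> run inference on task A in Italian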
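
The difference between the two settings comes down to what goes into the fine-tuning data. The sketch below uses a hypothetical helper, `build_training_set`, and simply concatenates the k target-language examples with the English data. Whether the paper mixes the examples or uses them in a separate fine-tuning stage is not stated on the slide, so the concatenation is an assumption.

```python
import random


def build_training_set(source_data, target_data, k, seed=0):
    """Return fine-tuning data for zero-shot (k == 0) or few-shot (k > 0).

    source_data: labeled examples in the source language (English here)
    target_data: labeled examples in the target language
    k:           number of target-language examples to add
    """
    rng = random.Random(seed)
    shots = rng.sample(target_data, k) if k > 0 else []
    return list(source_data) + shots


english = [("a good movie", "pos"), ("a dull movie", "neg"), ("great cast", "pos")]
italian = [("un bel film", "pos"), ("un film noioso", "neg"), ("ottimo cast", "pos")]

zero_shot = build_training_set(english, italian, k=0)
few_shot = build_training_set(english, italian, k=2)
print(len(zero_shot), len(few_shot))  # 3 5
```

In both cases evaluation is run on held-out target-language data; only the training set changes.
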
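  17. Experiments and analysis of few-shot transfer

    Direct Target Language Few-Shot Fine-Tuning: skip task training on English and train on the few-shot target-language data only
    • On high-level tasks, omitting training on the large English dataset gives low performance
      • Both task training in the source language and training in the target language are needed
    • On low-level tasks, training on English can be omitted
      • Simpler low-level tasks can be learned even from a small amount of data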
  18. Q2: Is there a relationship between the downstream task and cross-lingual transfer?

    Low-level tasks
      • POS tagging
      • Dependency parsing
      • Named entity recognition
    High-level (language understanding) tasks
      • Natural language inference
      • Question answering
    ✓ In both the zero-shot and few-shot settings, the trend in cross-lingual transfer performance depends on the level of the task
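  19. Q3: Can cross-lingual transfer performance be predicted?

    • Can performance on a given task and language be predicted from features such as language similarity and pretraining corpus size?
      ※ Language similarity
        • Language features are vectorized with lang2vec, and their cosine similarity is used
        • The features are extracted from databases that collect properties of languages
    ✓ If transfer results exist for a task, a simple regression can roughly predict transfer performance for a new language
    ✓ The features that matter follow a pattern per task
      • Low-level tasks: depend on language similarity
      • High-level tasks: depend on language similarity and pretraining corpus size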
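
To make the idea of predicting transfer performance concrete, here is a minimal least-squares sketch with a single feature (language similarity). It is not the regression model from the paper, which the slide describes only as "simple". The helper names are hypothetical and the data points are synthetic values made up for illustration, not results from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted score for a language with feature value x."""
    return slope * x + intercept


# Synthetic (similarity to the source language, task score) pairs for
# languages where transfer results are already available.
similarities = [0.2, 0.5, 0.8]
scores = [40.0, 55.0, 70.0]

slope, intercept = fit_line(similarities, scores)

# Rough estimate for a new language with similarity 0.6
print(predict(0.6, slope, intercept))
```

For high-level tasks, the slide suggests that a second feature (the standardized corpus size SIZE) would be needed as well, which turns this into a multiple regression.
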
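  20. Zero-shot cross-lingual transfer with larger multilingual models

    ✓ The paper was published in 2020, so the situation may be different today
    • Models used in the paper's experiments
      • mBERT (base): 110M parameters
      • XLM-R (base): 270M parameters
    • Large multilingual models released since then
      • mT5-XXL [5]: 13B parameters
      • BLOOMZ [6]: 176B parameters

    [5] mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
    [6] Crosslingual Generalization through Multitask Finetuning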