AI Latest Papers Reading Group: 2022 Summary

ai.labo.ocu
December 07, 2022

Transcript

  1. Osaka Metropolitan University, Daiju Ueda. AI Latest Papers Reading Group: 2022 year in review

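  2. 2022 summary: AI Latest Papers Reading Group
    ・Main
      ConvNeXt (presented in February): an easy-to-use, recent high-performance model
      GLIDE (presented in January): text-to-image generation
      Imagic (presented in November): fine-grained edits
      AudioLM (presented in October): speech generation model
      Socratic Models (presented in May): general-purpose AI (AGI)
    ・Others
      Wav2Vec 2 (presented in July): NeuroAI / Brain-inspired AI (the field that explores the relationship between AI and the human brain)
      Algorithmic Imprint (presented in July): ethics for AI creators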
  3. 2022 summary: Text, GLIDE, Imagic, AGI, CV, ConvNeXt, AudioLM, Speech, Socratic model, Diffusion
    A year of challenges toward artificial general intelligence (AGI) built on natural language processing.
  4. Summary up to 2022 (timeline)
    2017: Self-Attention
    2018: BERT
    2019: GPT2
    2020: DETR, ViT, GPT3, DDPM, wav2vec 2
    2021: CLIP, w2v-BERT, BigSSL, SwinT, ADM
  5. 2022 summary: Text, GLIDE, Imagic, AGI, CV, AudioLM, Speech, Socratic model, Diffusion, ConvNeXt
    A year of challenges toward artificial general intelligence (AGI) built on natural language processing.
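  6. ConvNeXt: CNN x SwinTransformer. State of the art among image classification models

  7. ConvNeXt: CNN x SwinTransformer. Summary of the design changes: the base is ResNet

  8. ConvNeXt: to begin with.

  9. ConvNeXt: bring the number of block repetitions in each stage closer to SwinT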

  10. ConvNeXt: 4×4 non-overlapping convolution. Patchifying the convolution
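A minimal NumPy sketch of the patchify idea on slide 10: a 4×4 convolution with stride 4 is the same as cutting the image into non-overlapping 4×4 patches and applying one shared linear projection to each. The function name, the 224×224×3 input, and the 96 output channels are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def patchify_stem(image, weight, bias):
    """4x4, stride-4 convolution written as 'cut into patches, then project'.

    image:  (H, W, C_in), with H and W divisible by 4
    weight: (4, 4, C_in, C_out)
    bias:   (C_out,)
    returns (H // 4, W // 4, C_out)
    """
    p = 4
    h, w, c_in = image.shape
    assert h % p == 0 and w % p == 0
    # (H/4, 4, W/4, 4, C_in) -> (H/4, W/4, 4, 4, C_in): one 4x4 patch per output position
    patches = image.reshape(h // p, p, w // p, p, c_in).transpose(0, 2, 1, 3, 4)
    # Flatten each patch and apply the same linear projection everywhere
    flat = patches.reshape(h // p, w // p, p * p * c_in)
    return flat @ weight.reshape(p * p * c_in, -1) + bias

rng = np.random.default_rng(0)
x = rng.normal(size=(224, 224, 3))      # assumed input size
w = rng.normal(size=(4, 4, 3, 96))      # assumed channel width
b = np.zeros(96)
out = patchify_stem(x, w, b)
print(out.shape)
```

Because the patches do not overlap, each output position depends on exactly one 4×4 region of the input, which is what makes this stem resemble the patch embedding of a vision Transformer.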

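  11. ConvNeXt: after introducing depthwise convolution, widen the network

  12. ConvNeXt: introduce the inverted bottleneck (Narrow→Wide→Narrow) structure. Transformers use an expansion ratio of 4. Note: MobileNet uses an expansion ratio of 6. Total compute goes down, although the Conv operations increase. In SwinT, this corresponds to the part indicated on the slide

  13. ConvNeXt: move the depthwise convolution. Note: this is done so that a large kernel size can be used in the depthwise convolution. Performance temporarily degrades because the Conv compute decreases. In SwinT, this corresponds to the part indicated on the slide: the MSA block comes before the FFN

  14. ConvNeXt: mimic the SwinTransformer kernel size (7). Increase the kernel size of the depthwise convolution step by step. ↓ Performance saturates at 7 (same as SwinT)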

  15. ConvNeXt: adopt the finer design details of SwinT or ViT. ReLU→GELU; fewer normalization layers; BN→LN; separate downsampling layers
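The block that results from slides 11 to 15 can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: biases, the learnable LayerNorm scale and shift, and any training-time regularization are omitted, GELU uses the tanh approximation, and all function and variable names are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # Normalize over the channel axis (last axis); learnable scale/shift omitted
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def depthwise_conv(x, kernel):
    """x: (H, W, C), kernel: (k, k, C). Each channel is filtered on its own, zero 'same' padding."""
    k = kernel.shape[0]
    pad = k // 2
    h, w, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + w, :] * kernel[i, j, :]
    return out

def convnext_block(x, dw_kernel, w_expand, w_project):
    y = depthwise_conv(x, dw_kernel)   # 7x7 depthwise conv at the top of the block
    y = layer_norm(y)                  # a single LN in place of BN
    y = gelu(y @ w_expand)             # 1x1 conv, C -> 4C (Narrow -> Wide), one GELU
    y = y @ w_project                  # 1x1 conv, 4C -> C (Wide -> Narrow)
    return x + y                       # residual connection

rng = np.random.default_rng(0)
c = 32                                  # assumed channel count
x = rng.normal(size=(14, 14, c))
out = convnext_block(
    x,
    dw_kernel=rng.normal(size=(7, 7, c)) * 0.02,
    w_expand=rng.normal(size=(c, 4 * c)) * 0.02,
    w_project=rng.normal(size=(4 * c, c)) * 0.02,
)
print(out.shape)
```

The 1×1 convolutions are written as matrix multiplications over the channel axis, which makes the Narrow→Wide→Narrow shape of the inverted bottleneck easy to see.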

  16. ConvNeXt: results

  17. ConvNeXt: by making ResNet more like SwinTransformer, state-of-the-art results were achieved with a CNN alone.

  18. None
  19. 2022 summary: Text, AGI, CV, ConvNeXt, AudioLM, Speech, Socratic model, Diffusion, GLIDE, Imagic
    A year of challenges toward artificial general intelligence (AGI) built on natural language processing.
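  20. GLIDE: the foundational model for Stable Diffusion

  21. Diffusion model: a generative model

  22. History of diffusion models: DDPM, ADM, GLIDE; CLIP ↓. Stabilization and higher resolution; handling language

  23. History of diffusion models: DDPM, ADM, GLIDE; CLIP ↓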

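  24. DDPM: the beginning of diffusion models. Diagram: Image + Noise and time-step information → DNN → inferred noise; the squared error between the noise and the inferred noise is minimized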
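The training signal on slide 24 can be written as a short NumPy sketch. It assumes the linear noise schedule commonly used with DDPM (1000 steps, beta from 1e-4 to 0.02); `denoiser` is a hypothetical stand-in for the DNN, and the data are random placeholders rather than images.

```python
import numpy as np

def ddpm_loss(denoiser, x0, alphas_cumprod, rng):
    """One Monte Carlo sample of the noise-prediction loss for a batch x0 of shape (B, D)."""
    batch = x0.shape[0]
    num_steps = len(alphas_cumprod)
    t = rng.integers(0, num_steps, size=batch)      # random time step per example
    noise = rng.normal(size=x0.shape)               # the noise the network must recover
    a_bar = alphas_cumprod[t][:, None]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise   # image + noise
    predicted = denoiser(x_t, t)                    # DNN sees the noisy image and the time step
    return np.mean((noise - predicted) ** 2)        # squared error to be minimized

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)               # assumed linear schedule
alphas_cumprod = np.cumprod(1.0 - betas)
x0 = rng.normal(size=(64, 16))                      # placeholder "images"
zero_denoiser = lambda x_t, t: np.zeros_like(x_t)   # placeholder, not a trained model
loss = ddpm_loss(zero_denoiser, x0, alphas_cumprod, rng)
print(loss)
```

With the placeholder that always predicts zero, the loss is simply the average squared noise; a trained network would drive it lower by actually inferring the noise from the noisy input.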
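  25. History of diffusion models: DDPM, ADM, GLIDE; CLIP ↓

  26. ADM: splitting the model into two made high-resolution generation possible. Diagram labels: Base, Upsampler, classification, high resolution, classifier guidance (CNN)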
  27. GLIDE = CLIP x Diffusion model. History of diffusion models: DDPM, ADM, GLIDE; CLIP ↓
  28. CLIP: bridging images and text

  29. CLIP: bridging images and text. A model that transforms images and text into features that can be compared directly. ViT: Image; Transformer: Text; cosine similarity
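The comparison step on slide 29 reduces to a cosine similarity between feature vectors. The sketch below uses tiny hand-written vectors in place of real encoder outputs; the function name and the numbers are illustrative assumptions.

```python
import numpy as np

def cosine_similarity_matrix(image_features, text_features):
    """Rows: images, columns: texts. Features are L2-normalized before the dot product."""
    img = image_features / np.linalg.norm(image_features, axis=1, keepdims=True)
    txt = text_features / np.linalg.norm(text_features, axis=1, keepdims=True)
    return img @ txt.T

# Toy embeddings standing in for the ViT (image) and Transformer (text) outputs
image_features = np.array([[1.0, 0.0, 0.0],
                           [0.0, 2.0, 0.0]])
text_features = np.array([[0.0, 3.0, 0.0],
                          [4.0, 0.0, 0.0],
                          [0.0, 0.0, 1.0]])
sim = cosine_similarity_matrix(image_features, text_features)
best_text = sim.argmax(axis=1)   # index of the closest text for each image
print(best_text)
```

Normalizing first means only the direction of each feature vector matters, so images and texts of very different magnitudes can still be matched.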

  30. ADM. Diagram labels: Base, Upsampler, classification, high resolution, classifier guidance (CNN)

  31. GLIDE = ADM-based, with the CNN replaced by CLIP. Diagram labels: Base, Upsampler, classification, high resolution, classifier guidance (CLIP)
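For reference, the difference between slides 30 and 31 can be stated as two guidance formulas, written here in the notation commonly used for guided diffusion rather than taken from the slides. In both, $\mu_\theta$ and $\Sigma_\theta$ are the mean and covariance predicted by the diffusion model at step $t$ and $s$ is the guidance scale. With a classifier $p_\phi$ and class label $y$ the mean is shifted along the classifier gradient; with CLIP, the classifier is replaced by the dot product of the CLIP image encoder $f$ and text encoder $g$ for a caption $c$.

```latex
% Classifier guidance (ADM): shift the mean along the classifier gradient
\hat{\mu}_\theta(x_t \mid y) = \mu_\theta(x_t \mid y)
  + s \, \Sigma_\theta(x_t \mid y) \, \nabla_{x_t} \log p_\phi(y \mid x_t)

% CLIP guidance (GLIDE): replace the classifier with image-text similarity
\hat{\mu}_\theta(x_t \mid c) = \mu_\theta(x_t \mid c)
  + s \, \Sigma_\theta(x_t \mid c) \, \nabla_{x_t} \bigl( f(x_t) \cdot g(c) \bigr)
```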
  32. Imagic: a technique for improving Stable Diffusion

  33. Imagic Overview

  34. None
  35. 2022 summary: Text, GLIDE, Imagic, AGI, CV, ConvNeXt, Speech, Socratic model, Diffusion, AudioLM
    A year of challenges toward artificial general intelligence (AGI) built on natural language processing.
  36. AudioLM: a generative model for audio

  37. AudioLM = w2v-BERT x SoundStream. Overview
    ・There is a one-to-many relationship between text and audio.
    ・Audio carries far more data than text.

  38. SoundStream: quantizes audio
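As a rough illustration of what quantizing audio means, the sketch below applies residual vector quantization: each frame is replaced by the index of its nearest code vector, and later codebooks quantize whatever error is left. The slide does not describe the quantizer, so the choice of residual vector quantization is an assumption here, and the codebooks and frames are hand-made toy values.

```python
import numpy as np

def residual_vector_quantize(frames, codebooks):
    """frames: (N, D) floats; codebooks: list of (K, D) arrays.

    Returns (codes, reconstruction): codes has shape (N, len(codebooks)).
    """
    residual = frames.astype(float).copy()
    reconstruction = np.zeros_like(residual)
    codes = []
    for codebook in codebooks:
        # Squared distance from every residual to every code vector
        dist = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        chosen = codebook[idx]
        reconstruction += chosen     # accumulate the quantized approximation
        residual -= chosen           # the next codebook quantizes what is left
        codes.append(idx)
    return np.stack(codes, axis=1), reconstruction

frames = np.array([[1.0, 1.0], [0.0, 0.0]])             # toy "audio frames"
codebooks = [np.array([[1.0, 0.0], [0.0, 0.0]]),        # toy codebooks
             np.array([[0.0, 1.0], [0.0, 0.0]])]
codes, recon = residual_vector_quantize(frames, codebooks)
print(codes)
```

The output is a short sequence of integers per frame, which is the kind of discrete token a language-model-style generator can work with.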

  39. w2v-BERT: a combination of contrastive learning and masked language modeling

  40. None
  41. 2022 summary: Text, GLIDE, Imagic, AGI, CV, ConvNeXt, AudioLM, Speech, Diffusion, Socratic model
    A year of challenges toward artificial general intelligence (AGI) built on natural language processing.
  42. Socratic models: a (quasi-?) artificial general intelligence model built by combining existing pretrained models

  43. Socratic models Overview Language is an intermediate representation

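  44. Socratic models: Overview. Existing VLMs (Visual Language Models), LMs (Large Language Models), and ALMs (Audio Language Models) hold a structured dialogue with one another. Video search, caption generation, video Q&A (unseen tasks), and future action prediction are then treated as new participants in this dialogue space.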
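A toy sketch of the structured dialogue described on slide 44, with language as the shared intermediate representation. Every function here is a hypothetical stub: no real VLM, ALM, or LM is called, and the prompt layout is an assumption made for illustration.

```python
def vlm_describe(frame):
    # Hypothetical stand-in for a visual language model: turns detections into text
    return "I see " + ", ".join(frame["objects"]) + "."

def alm_describe(clip):
    # Hypothetical stand-in for an audio language model
    return "I hear " + ", ".join(clip["sounds"]) + "."

def build_prompt(frame, clip, question):
    # Each model contributes plain text, so the models can be combined without retraining
    return "\n".join([
        "Visual model: " + vlm_describe(frame),
        "Audio model: " + alm_describe(clip),
        "Question: " + question,
        "Answer:",
    ])

def socratic_answer(frame, clip, question, lm):
    # `lm` is any callable that maps a text prompt to a text completion
    return lm(build_prompt(frame, clip, question))

frame = {"objects": ["a kettle", "a mug"]}
clip = {"sounds": ["water boiling"]}
echo_lm = lambda prompt: "[LM received]\n" + prompt   # placeholder, not a real model
answer = socratic_answer(frame, clip, "What is the person likely to do next?", echo_lm)
print(answer)
```

Swapping `echo_lm` for a real language model is the only change needed to turn the prompt into an actual answer, which is the appeal of composing models through text.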
  45. Socratic models: Example 1: basics

  46. Socratic models: Example 2: applications

  47. Socratic models: what is Socratic dialogue?

  48. None
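  49. Others: NeuroAI (1): exploring the correspondence between brain function and language models

  50. Others: NeuroAI (1): overview. Train Wav2Vec 2, then build a mapping W that predicts the fMRI BOLD signal from its outputs, and validate the results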
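One common way to build a mapping like the W on slide 50 is a regularized linear regression from model activations to voxel responses, scored by correlation on held-out data. The sketch below assumes ridge regression and uses synthetic data throughout; the shapes, the regularization strength, and the function names are illustrative and are not taken from the slides.

```python
import numpy as np

def fit_ridge(features, bold, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    d = features.shape[1]
    gram = features.T @ features + alpha * np.eye(d)
    return np.linalg.solve(gram, features.T @ bold)

def voxelwise_correlation(pred, target):
    """Pearson correlation between prediction and target, one value per voxel."""
    pred_c = pred - pred.mean(axis=0)
    targ_c = target - target.mean(axis=0)
    num = (pred_c * targ_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (targ_c ** 2).sum(axis=0))
    return num / den

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))     # synthetic stand-in for model activations
w_true = rng.normal(size=(8, 5))         # synthetic ground-truth mapping
bold = features @ w_true + 0.1 * rng.normal(size=(200, 5))   # synthetic "BOLD" signal

W = fit_ridge(features[:150], bold[:150])                    # fit on the first 150 samples
scores = voxelwise_correlation(features[150:] @ W, bold[150:])   # validate on the rest
print(scores.round(3))
```

Fitting and scoring on separate samples matters here: a high correlation on held-out data is what supports the claim that the model's representations carry information about the brain signal.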

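  51. Others: NeuroAI (1): a representation of averaged brain activity.

  52. Others: NeuroAI (1): the depth of the model's layers corresponded to brain regions.

  53. Others: NeuroAI (2): generating language from brain waves

  54. Others: NeuroAI (2): model training sessions
    50 sessions over 81 weeks
    Isolated-word task and sentence task
    Target words or sentences were presented visually to the participant as text on a screen, and the participant attempted to produce them.
    In the isolated-word task, individual words were produced from a set of 50 English words.
    In the sentence task, word sequences were produced from English sentences built from the 50-word set.

  55. Others: NeuroAI (2): model results. 75% accuracy for sentences; 93% accuracy for words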

  56. None
  57. Others: AI Ethics

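  58. Algorithmic Imprint
    When an algorithm causes harm, a common and reasonable countermeasure is to stop using it so that the harmful effects do not spread further. Stopping it, however, does not make the problems of fairness, accountability, transparency, and ethics go away.
    → The effects of a harmful algorithm persist long after the algorithm is removed (the algorithmic imprint).
    Example: the controversy over algorithmic grading of the GCE exams, a UK-based secondary school leaving certificate examination (2020)
    ▪ What kind of exam is it?
    ・An internationally recognized exam administered in more than 160 countries (many of them former British colonies)
    ・A Level grades are consequential and play an essential role in university admission
    ▪ Background
    ・Because of the COVID-19 pandemic, Ofqual, the UK-based quasi-governmental body that oversees the GCE exams, cancelled in-person exams
    ・In place of the exams, grades were produced by an algorithm from students' past school results and teacher assessments
    → As a result, protests broke out around the world and the algorithm was withdrawn
      Teachers: they had not kept records of past student assessments in the first place
      Students: they had not taken their school grades seriously (the exam is everything, so many students cram in the final 30 to 60 days)
    ・The algorithm was withdrawn, but students were not re-evaluated. In other words, the grading method changed, yet the results were still heavily shaped by the algorithm (the algorithmic imprint)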
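  59. Algorithmic Imprint
    ▪ Designing algorithms with the algorithmic imprint in mind
    Imprint-aware design thinking can make the algorithm development process fairer and better informed by sociotechnical considerations.
    (1) The algorithm's effects: an algorithm continues to affect stakeholders even after it is removed. Developers and operators must do more than remove the algorithm; they must remedy the harm it caused, and accountability continues to be required.
    (2) Accountability in algorithm design: developers should make harms more recognizable to the people affected by the algorithmic imprint.
    (3) Reinforce with AI ethics governance: technical interventions alone cannot reduce harm. Imprint-aware algorithm design should be complemented by appropriate AI ethics governance.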
  60. None
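  61. Looking ahead to 2023: do "what only our lab can do." Continue holding the study group. Put more weight on medicine. Tutorials and hands-on sessions on models for medical imaging research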