Slide 1

Slide 1 text

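Supercomputers Are Coming Down to Embedded Systems!
Key Points of SIMD/Vector Processing for a New Era of High-Performance Embedded Systems
Susumu Yamazaki, University of Kitakyushu
© 2021 Susumu Yamazaki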

Slide 2

Slide 2 text

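Background
• Parallel computing technology cultivated on supercomputers has been coming down to embedded systems for a long time; this did not start just now
  • The original PlayStation, released in 1994 (pictured)
  • The PlayStation 2, released in 2000
Image: ©MarcelBuehner, Creative Commons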

Slide 3

Slide 3 text

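Background
• March 2021: announcement of Armv9, the next-generation Arm architecture
  • It adopts the SVE/SVE2 vector instructions, the family used in the supercomputer Fugaku
• RISC-V
  • IoT boards carrying the D1 chip, which includes the vector extension, are commercially available
  • June 2021: Intel announced that it will build high-performance RISC-V chips, including RISC-V chips with the vector extension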

Slide 4

Slide 4 text

Background
• SIMD (Single Instruction Multiple Data) became widespread earlier than vector instructions
  • The current, widely deployed Armv8 architecture includes SIMD
  • GPUs are also basically SIMD architectures
  • IoT devices with a GPU, such as Arm Mali or NVIDIA Jetson, are common

Slide 5

Slide 5 text

How can we make use of the SIMD/vector processing features built into high-end embedded systems?

Slide 6

Slide 6 text

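Making use of the SIMD/vector processing features built into high-end embedded systems
• The usual answer
  • Use BLAS/LAPACK, the linear algebra libraries that grew out of technology cultivated over many years on supercomputers, together with the OSS libraries built on top of them
• The reason
  • BLAS/LAPACK have been tuned with every bit of human ingenuity to exploit SIMD/vector processing features to the limit
  • They are a monumental achievement: it is not easy to develop anything that surpasses them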

Slide 7

Slide 7 text

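Making use of the SIMD/vector processing features built into high-end embedded systems
• Concerns and requests that embedded engineers are likely to have when using libraries such as BLAS/LAPACK
  • They will surely cost a lot of memory and storage
  • I cannot use them with confidence unless I understand the principles and the mechanism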

Slide 8

Slide 8 text

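Making use of the SIMD/vector processing features built into high-end embedded systems
• This session therefore covers the following
  • The SIMD (Single Instruction Multiple Data) way of thinking, which is the foundation for using SIMD/vector processing
  • Memory layouts that make the most of it
  • Key points for using SIMD and vector instructions
  • Hints for SIMD/vector programming, read from a very small part of the BLAS API
• What this session does not cover
  • How to use BLAS/LAPACK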

Slide 9

Slide 9 text

I am still in the middle of my research and have not reached the level of a master
Please understand that I may change my views in the future
(The "masters" I have seen were far, far more impressive!)

Slide 10

Slide 10 text

Now, on to the main topic

Slide 11

Slide 11 text

The SIMD way of thinking

Slide 12

Slide 12 text

Classification of parallel processing
• SISD (Single Instruction Single Data): not parallel
• SIMD (Single Instruction Multiple Data)
• MISD (Multiple Instruction Single Data)
• MIMD (Multiple Instruction Multiple Data)

Slide 13

Slide 13 text

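MIMD
• MIMD (Multiple Instruction Multiple Data): parallel processing on compute units that can each behave differently
  • By analogy, it is like hunting dogs on a hunt: each one moves on its own while cooperating with the others
  • In a multi-core CPU, each core can act independently
• With MIMD it is harder to reach a high degree of parallelism than with SIMD
  • Commercially available CPUs top out at a few dozen to a hundred-odd cores
  • At the research stage there are chips with a thousand or more cores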

Slide 14

Slide 14 text

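SIMD
• SIMD (Single Instruction Multiple Data): parallel processing on compute units that all perform the same operation
  • By analogy, it is like many people planting rice at the same time
  • Used by CPU SIMD instructions, vector instructions, and GPUs
• With SIMD it is easy to raise the degree of parallelism; GPUs with a parallelism of 1000 or more are commercially available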

Slide 15

Slide 15 text

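SIMD instructions on CPUs
• Operations are performed on SIMD registers of a fixed bit width
  • Intel SSE: 128 bits
  • Intel AVX / AVX2: 256 bits
  • Intel AVX-512: 512 bits
  • ARM NEON: 128 bits (on ARMv8)
• A SIMD register is partitioned into lanes
  • Integers: {signed / unsigned} x {8 / 16 / 32 / 64 bits}
  • Floating point: half precision (16 bits, ARM NEON only) / single precision (32 bits) / double precision (64 bits)
• Advantage: easy to implement
  • For integer add/subtract instructions, for example, it is enough to keep the adder's carry/borrow from propagating across the lane boundaries
• Disadvantages:
  • Unlike a clock-frequency increase, performance does not improve unless SIMD instructions are actually used, so either assembly programming or a compiler that can generate SIMD code is essential
  • Machine instructions are not compatible across SIMD register widths, so changing the width requires reprogramming or recompiling
  • Recent Clang and GCC turn on auto-vectorization as part of normal optimized builds (the optimization level at which it is enabled depends on the compiler and version) and generate SIMD instructions for loops automatically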
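
The remark that lane-wise integer addition only needs the carry to be blocked at lane boundaries can be imitated in plain C. The following is a minimal sketch, not code from the talk: it treats a 32-bit word as four unsigned 8-bit lanes, and the function name `add_u8x4` is a hypothetical one chosen for illustration.

```c
#include <stdint.h>

/* Add four unsigned 8-bit lanes packed into one 32-bit word.
 * Each lane wraps around on its own (mod 256); no carry crosses a lane
 * boundary. `add_u8x4` is an illustrative name, not a library function. */
uint32_t add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t low7 = 0x7f7f7f7fu; /* lower 7 bits of every lane */
    const uint32_t msb  = 0x80808080u; /* top bit of every lane */

    /* Adding only the lower 7 bits cannot carry out of a lane:
     * the carry, if any, lands in bit 7 of the same lane. */
    uint32_t sum = (a & low7) + (b & low7);

    /* Fold in the top bits with XOR, which discards the carry that an
     * ordinary adder would pass on to the neighbouring lane. */
    return sum ^ ((a ^ b) & msb);
}
```

An ordinary 32-bit addition of the same two words would let the overflow of one lane leak into the next; a SIMD adder avoids that in hardware in much the same spirit.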

Slide 16

Slide 16 text

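Vector instructions
• In the classification of parallel processing, vector instructions also fall under SIMD
• The disadvantage of SIMD instructions:
  • Machine instructions are not compatible across SIMD register widths, so changing the width requires reprogramming or recompiling
• Vector instructions, in contrast:
  • The vector register width differs from CPU to CPU, but there is a dedicated instruction that sets parameters such as the loop count to match it
  • As a result, no reprogramming or recompiling is needed even when the vector register width changes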
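
The idea of a dedicated instruction that adapts the loop to the hardware vector width can be modelled in portable C. This is only a sketch under stated assumptions: `set_vl` is a hypothetical stand-in for such an instruction, the hardware width is passed in as the parameter `hw_lanes` (assumed to be at least 1), and the inner loop stands in for what a single vector instruction would do.

```c
#include <stddef.h>

/* Hypothetical stand-in for a "set vector length" instruction:
 * returns how many elements the next vector operation will process. */
static size_t set_vl(size_t remaining, size_t hw_lanes)
{
    return remaining < hw_lanes ? remaining : hw_lanes;
}

/* dst[i] = a[i] + b[i] for n elements, written without assuming a width.
 * hw_lanes models the number of float lanes of the CPU at hand. */
void vec_add(float *dst, const float *a, const float *b,
             size_t n, size_t hw_lanes)
{
    for (size_t i = 0; i < n; ) {
        size_t vl = set_vl(n - i, hw_lanes); /* also handles the tail */
        for (size_t k = 0; k < vl; k++)      /* models one vector add */
            dst[i + k] = a[i + k] + b[i + k];
        i += vl;
    }
}
```

The same function body gives the same result whether `hw_lanes` is 4 or 8, which is the property the slide attributes to vector instructions: only the value returned by the length-setting step changes.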

Slide 17

Slide 17 text

About vector instructions

Slide 18

Slide 18 text

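About vector instructions
• I was able to obtain an IoT board with a CPU that supports the RISC-V vector extension 🎉
• I succeeded in building a cross-compilation environment that compiles vector instructions 🎉
• I also succeeded in running sample assembly code that uses vector instructions 🎉
• However, I did not manage to set up a benchmarking environment in time 😭
• So I have not been able to do anything beyond confirming that it works (my apologies) 😭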

Slide 19

Slide 19 text

For that reason

Slide 20

Slide 20 text

From here on, the talk focuses on SIMD instructions. Thank you for your understanding 😞

Slide 21

Slide 21 text

By the way

Slide 22

Slide 22 text

How much do we actually gain by using SIMD?

Slide 23

Slide 23 text

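I wrote a sample program
• https://github.com/zacky1972/simd_sample
• The subject is a monochrome (grayscale) image filter
  • 8-bit unsigned integer → 32-bit unsigned integer → 32-bit floating point
    → {R, G, B} <= 0.299 * r + 0.587 * g + 0.114 * b
    → round to the nearest integer → 32-bit unsigned integer → 8-bit unsigned integer
• Written in C (two variants, described later)
  • SIMD code is generated for the loop by auto-vectorization
• Written with intrinsics (one way of writing SIMD instructions in C)
  • Intel Core i5 / AVX2
  • ARMv8 / NEON
• Compared with C plus auto-vectorization, writing the SIMD instructions by hand and optimizing them gives a 2x to 5x speedup
  • Moreover, the hand-written SIMD code feels as if it still has room for optimization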
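
The conversion chain on this slide can be written as straightforward C, which is the kind of loop an auto-vectorizing compiler gets to work with. The sketch below is not taken from the simd_sample repository; the names `mono_pixel` and `mono_aos` are assumptions made for illustration, and rounding is done by adding 0.5 and truncating.

```c
#include <stddef.h>
#include <stdint.h>

/* One pixel: 8-bit -> 32-bit integer -> float -> weighted sum
 * -> round -> 32-bit integer -> 8-bit. Illustrative helper name. */
uint8_t mono_pixel(uint8_t r, uint8_t g, uint8_t b)
{
    uint32_t r32 = r, g32 = g, b32 = b;           /* widen to 32 bits */
    float y = 0.299f * (float)r32 + 0.587f * (float)g32
            + 0.114f * (float)b32;                /* luminance */
    uint32_t y32 = (uint32_t)(y + 0.5f);          /* round half up */
    return (uint8_t)y32;                          /* at most 255 */
}

/* n interleaved RGB pixels (r, g, b, r, g, b, ...), converted in place:
 * all three channels of a pixel receive the same monochrome value. */
void mono_aos(uint8_t *px, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t y = mono_pixel(px[3 * i], px[3 * i + 1], px[3 * i + 2]);
        px[3 * i] = px[3 * i + 1] = px[3 * i + 2] = y;
    }
}
```

Because the three coefficients sum to 1, the result never exceeds 255, so the final narrowing to 8 bits loses nothing.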

Slide 24

Slide 24 text

As much as 2x faster 😲
 vs
No more than 2x faster 🙅

Slide 25

Slide 25 text

If it only gets about 2x faster at best,
is it worth programming with intrinsics or assembly code? 🤔

Slide 26

Slide 26 text

More on this later 🤔🤔🤔

Slide 27

Slide 27 text

Back to the main thread

Slide 28

Slide 28 text

Memory layouts that make the most of SIMD

Slide 29

Slide 29 text

Array of Structures (AoS) 
 Structure of Arrays (SoA)

Slide 30

Slide 30 text

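Array of Structures (AoS)
• An array of structures
• AoS is hard to vectorize with SIMD
  • Elements have to be accessed with a stride
  • With 8 bits each for R, G and B, r, g and b must be loaded while skipping 3 bytes at a time before the monochrome coefficients can be applied
• However, NEON has convenient load/store instructions for exactly this purpose!
  • For this case there is the load instruction ld3q (the `vld3q_u8` intrinsic in C), which reads with a stride of 3 bytes and places the values into SIMD registers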

Slide 31

Slide 31 text

Structure of Arrays (SoA)
• A structure of arrays
• SoA is easier to vectorize with SIMD than AoS!
  • The values of an 8-bit array are easy to process in batches of
    128 bits (16 values) with SSE and NEON,
    256 bits (32 values) with AVX2,
    512 bits (64 values) with AVX-512
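
To make the two layouts concrete, here is a minimal C sketch of both, with a conversion from AoS to SoA and a monochrome loop over the SoA form. The type and function names (`rgb8_t`, `rgb8_planes_t`, `aos_to_soa`, `mono_soa`) are assumptions introduced for this example and do not come from the sample repository.

```c
#include <stddef.h>
#include <stdint.h>

/* AoS: one structure per pixel; r, g, b are interleaved in memory. */
typedef struct { uint8_t r, g, b; } rgb8_t;

/* SoA: one array per channel; each channel is contiguous in memory. */
typedef struct { uint8_t *r, *g, *b; } rgb8_planes_t;

/* Split n AoS pixels into three separate channel arrays. */
void aos_to_soa(const rgb8_t *src, rgb8_planes_t dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst.r[i] = src[i].r;
        dst.g[i] = src[i].g;
        dst.b[i] = src[i].b;
    }
}

/* Monochrome conversion over SoA data: every access has stride 1,
 * which is the shape that is easy to process a register at a time. */
void mono_soa(rgb8_planes_t img, uint8_t *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float v = 0.299f * img.r[i] + 0.587f * img.g[i] + 0.114f * img.b[i];
        y[i] = (uint8_t)(v + 0.5f); /* round half up, result <= 255 */
    }
}
```

In the AoS form consecutive red values are 3 bytes apart, whereas in the SoA form 16, 32 or 64 consecutive values of one channel fill a 128-, 256- or 512-bit register directly.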

Slide 32

Slide 32 text

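Let's look at the benchmark results once more
• https://github.com/zacky1972/simd_sample
• On Intel / AVX2, SoA is nearly 2x faster than AoS
  • This matches the standard wisdom that SoA is easier to vectorize
• On ARMv8 / NEON, AoS is slightly faster
  • Is this the effect of NEON's load/store instructions with a 3-byte stride?
  • Once the structure grows to 5 bytes or more there is no corresponding instruction, so it will probably become slower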

Slide 33

Slide 33 text

As long as you rely on auto-vectorization,
laying out memory in SoA style rather than AoS
seems to be the safer choice

Slide 34

Slide 34 text

Key points for using SIMD and vector instructions

Slide 35

Slide 35 text

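Key points for using SIMD and vector instructions
• The flow for the monochrome filter (each of these steps is likely to become a standard pattern)
  1. Load
  2. Widen from 8 bits to 16 bits, splitting into low and high halves
  3. Widen from 16 bits to 32 bits, splitting into low and high halves
  4. Convert 32-bit integers to 32-bit floating point
  5. Multiply → multiply-accumulate → multiply-accumulate
  6. Round the 32-bit floating-point values and convert them to 32-bit integers
  7. Narrow from 32 bits to 16 bits while merging the low and high halves
  8. Narrow from 16 bits to 8 bits while merging the low and high halves
  9. Store
• Schedule the work so that it does not exceed the number of SIMD registers in the CPU
  • When widening and splitting into low and high halves, postpone the follow-on computation for the high half
  • Process R and G first, combine them into one SIMD register with multiply → multiply-accumulate, and only then process B with another multiply-accumulate
• Spelling out the type in the variable name makes the code easier to follow (a Hungarian-notation-like approach)
  • Examples:
    • float32x4_t f32x4_pixel_r;
    • uint8x16_t u8x16_pixel_b;
• NEON has easy-to-understand naming rules and is easier to program than the more complicated Intel SSE/AVX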
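
The nine steps can be sketched with NEON intrinsics as follows. This is an illustration written for this transcript, not the code in the sample repository: the function names are hypothetical, it processes 8 pixels per iteration with `vld3_u8` rather than 16, it rounds by adding 0.5 before a truncating conversion, and it falls back to a scalar loop for leftover pixels and for builds without NEON. Variable names carry their vector type, in the spirit of the naming advice on the slide.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Steps 3 to 7 for four pixels (one half of the widened registers). */
static inline uint16x4_t mono_u16x4(uint16x4_t u16x4_r, uint16x4_t u16x4_g,
                                    uint16x4_t u16x4_b)
{
    /* 3 + 4: widen 16 -> 32 bits, then convert to float */
    float32x4_t f32x4_r = vcvtq_f32_u32(vmovl_u16(u16x4_r));
    float32x4_t f32x4_g = vcvtq_f32_u32(vmovl_u16(u16x4_g));
    /* 5: multiply, then multiply-accumulate R and G first ... */
    float32x4_t f32x4_y = vmulq_n_f32(f32x4_r, 0.299f);
    f32x4_y = vmlaq_n_f32(f32x4_y, f32x4_g, 0.587f);
    /* ... and bring in B only afterwards, to keep fewer registers live */
    float32x4_t f32x4_b = vcvtq_f32_u32(vmovl_u16(u16x4_b));
    f32x4_y = vmlaq_n_f32(f32x4_y, f32x4_b, 0.114f);
    /* 6: round half up (add 0.5, truncating conversion) */
    uint32x4_t u32x4_y = vcvtq_u32_f32(vaddq_f32(f32x4_y, vdupq_n_f32(0.5f)));
    /* 7: narrow 32 -> 16 bits */
    return vmovn_u32(u32x4_y);
}
#endif

/* Convert n interleaved RGB pixels to monochrome in place. */
void mono_rgb8(uint8_t *px, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 8 <= n; i += 8) {
        uint8x8x3_t u8x8x3_rgb = vld3_u8(px + 3 * i);    /* 1: strided load */
        uint16x8_t u16x8_r = vmovl_u8(u8x8x3_rgb.val[0]); /* 2: 8 -> 16 bits */
        uint16x8_t u16x8_g = vmovl_u8(u8x8x3_rgb.val[1]);
        uint16x8_t u16x8_b = vmovl_u8(u8x8x3_rgb.val[2]);
        uint16x4_t u16x4_lo = mono_u16x4(vget_low_u16(u16x8_r),
                                         vget_low_u16(u16x8_g),
                                         vget_low_u16(u16x8_b));
        uint16x4_t u16x4_hi = mono_u16x4(vget_high_u16(u16x8_r),
                                         vget_high_u16(u16x8_g),
                                         vget_high_u16(u16x8_b));
        /* 7 + 8: merge the halves, then narrow 16 -> 8 bits */
        uint8x8_t u8x8_y = vmovn_u16(vcombine_u16(u16x4_lo, u16x4_hi));
        u8x8x3_rgb.val[0] = u8x8_y;
        u8x8x3_rgb.val[1] = u8x8_y;
        u8x8x3_rgb.val[2] = u8x8_y;
        vst3_u8(px + 3 * i, u8x8x3_rgb);                  /* 9: strided store */
    }
#endif
    for (; i < n; i++) { /* scalar tail, and the whole job without NEON */
        float y = 0.299f * px[3 * i] + 0.587f * px[3 * i + 1]
                + 0.114f * px[3 * i + 2];
        uint8_t m = (uint8_t)(y + 0.5f);
        px[3 * i] = px[3 * i + 1] = px[3 * i + 2] = m;
    }
}
```

Splitting the per-half work into a helper is one way to make a recurring pattern reusable; whether the compiler keeps everything in registers still depends on the target and the optimization level.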

Slide 36

Slide 36 text

Hints for SIMD programming
as seen in OpenBLAS

Slide 37

Slide 37 text

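OpenBLAS
• https://github.com/xianyi/OpenBLAS
• BLAS: Basic Linear Algebra Subprograms (https://www.netlib.org/blas/)
• OpenBLAS is an optimized BLAS library released under the BSD license
• The API is designed for speed rather than ease of use
  • Even so, it strikes a fine balance that keeps it general-purpose
• Take GEMM (matrix-matrix multiplication) as an example
  • It is optimized per data type
    • SGEMM (single precision)
    • DGEMM (double precision)
    • CGEMM (single-precision complex)
    • ZGEMM (double-precision complex)
  • Performing the following computations in one call allows global optimization
    • Matrix multiplication
    • Matrix transposition
    • Scalar multiplication of a matrix
    • Matrix addition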
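
What it means for GEMM to combine multiplication, transposition, scaling and addition in one call can be shown with a naive reference loop computing C = alpha * op(A) * op(B) + beta * C. This is a sketch for explanation only: `ref_dgemm` is a hypothetical name with a simplified row-major interface, it is not the OpenBLAS API, and it makes no attempt at the optimizations the library performs.

```c
#include <stddef.h>

/* Naive reference: C = alpha * op(A) * op(B) + beta * C, row-major.
 * op(X) is X when the trans flag is 0 and X transposed otherwise.
 * op(A) is m x k, op(B) is k x n, C is m x n; ld* are row strides. */
void ref_dgemm(int trans_a, int trans_b, size_t m, size_t n, size_t k,
               double alpha, const double *a, size_t lda,
               const double *b, size_t ldb,
               double beta, double *c, size_t ldc)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t p = 0; p < k; p++) {
                /* Transposition is just a different index order,
                 * so no transposed copy is ever materialized. */
                double av = trans_a ? a[p * lda + i] : a[i * lda + p];
                double bv = trans_b ? b[j * ldb + p] : b[p * ldb + j];
                acc += av * bv;
            }
            /* Scaling and addition happen while the element is at hand. */
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```

Doing the four operations as separate passes would read and write the matrices several times; folding them into one routine is what gives an optimized implementation room to schedule loads, multiply-accumulates and stores together.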

Slide 38

Slide 38 text

OpenBLAS
• Examples of assembly code
  • https://github.com/xianyi/OpenBLAS/blob/develop/kernel/arm64/dgemm_kernel_4x4.S
  • https://github.com/xianyi/OpenBLAS/blob/develop/kernel/arm64/dgemm_kernel_4x8.S
  • https://github.com/xianyi/OpenBLAS/blob/develop/kernel/arm64/dgemm_kernel_8x4.S
• The compute kernels are split into fine-grained cases by CPU architecture and by size
• Macros and similar definitions are used to skillfully reuse similar code sequences
• The inner loops are unrolled to improve speed
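
The last two observations, reusing a code sequence through a macro and unrolling the inner loop, can be illustrated in C with a dot product. This is a minimal sketch in the same spirit, not a transcription of the OpenBLAS assembly kernels; the macro `MAC` and the function `dot_unrolled4` are names made up for this example.

```c
#include <stddef.h>

/* One multiply-accumulate step, defined once and reused below. */
#define MAC(acc, x, y, idx) ((acc) += (x)[(idx)] * (y)[(idx)])

/* Dot product with the inner loop unrolled by 4. Four independent
 * accumulators avoid a single long dependency chain. */
double dot_unrolled4(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) { /* unrolled body: 4 elements per trip */
        MAC(s0, x, y, i);
        MAC(s1, x, y, i + 1);
        MAC(s2, x, y, i + 2);
        MAC(s3, x, y, i + 3);
    }
    for (; i < n; i++)           /* remainder when n is not a multiple of 4 */
        MAC(s0, x, y, i);

    return (s0 + s1) + (s2 + s3);
}
```

A kernel specialized for a given size, such as a 4x4 block, can drop the remainder loop entirely, which is one reason the library keeps separate kernels per size.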

Slide 39

Slide 39 text

Summary

Slide 40

Slide 40 text

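Summary
• SIMD is parallel processing on compute units that all perform the same operation
  • CPU SIMD instructions, vector instructions, and GPUs all fall under SIMD
• CPU SIMD instructions operate on SIMD registers of a fixed bit width
  • Machine instructions are not compatible across SIMD register widths
  • Changing the bit width of SIMD instructions requires reprogramming or recompiling
• With vector instructions, no reprogramming or recompiling is needed even when the vector register width changes
• Compared with C plus auto-vectorization, writing the SIMD instructions by hand and optimizing them gives a 2x to 5x speedup
• As long as you rely on auto-vectorization, laying out memory in SoA (structure of arrays) style rather than AoS (array of structures) seems to be the safer choice
• Turning the standard patterns into macros looks promising
• Schedule the work so that it does not exceed the number of SIMD/vector registers in the CPU
• Spelling out the type in the variable name makes the code easier to follow (a Hungarian-notation-like approach)
• NEON has easy-to-understand naming rules and is easier to program than the more complicated Intel SSE/AVX
• Design the API for speed rather than ease of use, while striking a fine balance that keeps it general-purpose
• Split the compute kernels into fine-grained cases by CPU architecture and by size
• Reuse similar code sequences by defining macros and the like
• Unroll the inner loops to improve speed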

Slide 41

Slide 41 text

Returning to the question we set aside

Slide 42

Slide 42 text

If it only gets about 2x faster at best,
is it worth programming with intrinsics or assembly code? 🤔

Slide 43

Slide 43 text

Writing SIMD instructions by hand
should be limited to the parts
where performance is truly required

Slide 44

Slide 44 text

The API should be designed after analyzing
how much the code will be reused
and in what way it will be reused

Slide 45

Slide 45 text

You must be prepared to keep people on hand
who can maintain the code, at all times

Slide 46

Slide 46 text

But
nobody can really afford
to do all that, right?

Slide 47

Slide 47 text

So
we are nurturing a technology seed that generates
ultra-fast SIMD plus multi-core parallel code
from programs written in Elixir, a language that excels at parallel processing 🎉

Slide 48

Slide 48 text

We will be holding an online study group,
so please get in touch if you are interested: [email protected]