
Introduction to Computing Fundamentals and Acceleration

tkclimb
April 27, 2019


These are the slides presented at "Introduction to Computing Fundamentals and Processing Acceleration #1" (コンピューティングの基礎と処理の高速化入門 #1) on connpass.
https://liberal-arts-for-tech.connpass.com/event/123273/


Transcript

  1. Introduction to Computing Fundamentals and Acceleration
    tkclimb (twitter: @tkclimb0911)
    April 27, 2019

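  2. Contents
    1. Principles of computation by computers
    1.1 Boolean algebra
    1.2 Logic circuits
    1.3 Electronic circuits
    1.4 Turing machines
    1.5 von Neumann computers
    2. Accelerating computation with the CPU
    2.1 What is a CPU?
    2.2 ISA and microarchitecture
    2.3 Instruction-level parallelism
    2.4 Memory hierarchy
    2.5 Data-level parallelism
    3. Accelerating computation with the compiler
    3.1 What is a compiler?
    3.2 Representation and interpretation of programs
    3.3 Optimization by the compiler
    3.4 Pipelining and scheduling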

  3. Principles of computation by computers

  4. Abstraction of a computer system
    A computer is an extremely complex system in which many technologies are combined at multiple levels. Here we show a rough layered structure that abstracts them.
    Figure 1: Abstraction of a computer system [5]

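  5. Number representation inside a computer: binary
    Binary numbers were established mathematically by Leibniz in the 17th century and are used to represent numbers in modern computers. A binary number expresses a value as a polynomial in powers of the base 2, with coefficients 0 or 1.
    The example below shows "147" in both decimal and binary.
    decimal: $147_{10} = 1 \times 10^2 + 4 \times 10^1 + 7 \times 10^0$   (1)
    binary: $147_{10} = 10010011_2 = 1 \times 2^7 + 1 \times 2^4 + 1 \times 2^1 + 1 \times 2^0$   (2)
    Likewise, converting decimal 22 to binary gives:
    $10110_2 = 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 22_{10}$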
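The positional expansion above can be sketched in C++ as follows. This is a minimal illustration and not part of the original slides. `to_binary` repeatedly divides by the base 2 and collects the remainders. `from_binary` evaluates the digit string with Horner's method, which is the same polynomial written in nested form. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Convert a non-negative integer to its binary digit string by repeatedly
// dividing by 2; each remainder is the next digit, least significant first.
std::string to_binary(std::uint64_t n) {
    if (n == 0) return "0";
    std::string digits;
    while (n > 0) {
        digits.insert(digits.begin(), static_cast<char>('0' + (n % 2)));
        n /= 2;
    }
    return digits;
}

// Evaluate a binary digit string as sum(d_k * 2^k) using Horner's method:
// double the running value, then add the next digit.
std::uint64_t from_binary(const std::string& bits) {
    std::uint64_t value = 0;
    for (char c : bits) {
        assert(c == '0' || c == '1');
        value = value * 2 + static_cast<std::uint64_t>(c - '0');
    }
    return value;
}
```

With these, `to_binary(147)` yields `"10010011"` and `from_binary("10110")` yields 22, matching equation (2) and the example for 22.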

  6. Binary arithmetic: addition
    Arithmetic works in binary just as it does in decimal. An example of addition is shown below.
    Figure 2: Comparison of addition in decimal and binary [5]

  7. Binary arithmetic: multiplication
    Figure 3: Multiplication in binary [7]
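Binary long multiplication reduces to shifting and adding: each 1 bit of the multiplier contributes the multiplicand shifted to that bit position. Below is a minimal C++ sketch. It assumes unsigned 32-bit operands whose product wraps modulo 2^32, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Shift-and-add multiplication: for every 1 bit in the multiplier, add the
// multiplicand shifted left by that bit position (long multiplication in base 2).
// The result wraps modulo 2^32, like hardware that keeps only 32 result bits.
std::uint32_t shift_add_multiply(std::uint32_t multiplicand, std::uint32_t multiplier) {
    std::uint32_t product = 0;
    while (multiplier != 0) {
        if (multiplier & 1u) {
            product += multiplicand;   // partial product for this bit
        }
        multiplicand <<= 1;            // move to the next bit position
        multiplier >>= 1;
    }
    return product;
}
```

For example, `shift_add_multiply(0b1000, 0b1001)` returns 72 (8 × 9).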

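  8. Boolean algebra
    Boolean algebra was devised by George Boole in the 19th century. An algebraic system (B; +, ·, ¯) is called a Boolean algebra when it satisfies the properties below. By representing each digit of a binary number with Boolean algebra, calculations such as addition and multiplication can be carried out algebraically.
    Figure 4: Axioms of Boolean algebra (Huntington's axioms) [12]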

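  9. Boolean functions and Boolean operations
    A Boolean function is a mapping $f : B^n \to B$ defined using the set of Boolean values $B = \{0, 1\}$ and its Cartesian product $B^n$; for example, $f(1, 0, 0, 1, 0) = 1$.
    A Boolean operation is a two-variable Boolean function $f : B \times B \to B$, and such operations are used heavily as the logical operations of computers. Examples of Boolean operations are shown below.
    Figure 5: Examples of Boolean operations [12]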

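  10. Boolean algebra and logic circuits: BUFFER and NOT
    A Boolean operation realized as a circuit is called a logic circuit. Any operation in Boolean algebra can be expressed as a logic circuit, and the basic circuits are called gates. Below are the truth tables and logic circuits for the identity and negation operations.
    Figure 6: BUFFER gate [5] Figure 7: NOT gate [5]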

  11. Boolean algebra and logic circuits: AND and OR
    Below are the truth tables, notation, and logic circuits of the two main Boolean operations: logical product (AND) and logical sum (OR).
    Figure 8: AND gate [5] Figure 9: OR gate [5]

  12. Boolean algebra and logic circuits: XOR, NAND, NOR
    Truth tables, notation, and logic circuits for the other Boolean operations.
    Figure 10: XOR gate, NAND gate, NOR gate [5]

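  13. Representing Boolean algebra with electronic circuits
    In 1937, Claude Shannon showed that Boolean algebra can be expressed with switching circuits [9]. Because switching circuits can be built from electrical components such as relays, it became possible to compute Boolean operations physically.
    Figure 11: Boolean operations expressed with relay circuits [9]
    In fact, Akira Nakashima had published equivalent results in 1935, but reportedly they are not cited in Shannon's paper. The author is not an expert on this history, so please treat it only as a side note.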

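  14. Representing two values with voltage
    We have seen that Boolean operations can be expressed with switching circuits. In a typical computer, the two values of a Boolean variable are represented by whether a voltage is low or high.
    Voltage level | Value | Voltage
    Low           | 0     | GND
    High          | 1     | VDD
    Modern computers use MOS transistors as the switches. Electronic circuits built from transistors are simple in structure, highly reliable, and suitable for mass production.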

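  15. The electronic element that makes up computers: the MOS transistor
    A MOS transistor has three terminals: gate, source, and drain. There are two types:
    • nMOS: passes the source voltage to the drain when a high voltage is applied to the gate
    • pMOS: does the same when a low voltage is applied to the gate
    Figure 12: Cross-section of MOS transistors [8]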

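  16. Building a switch from a MOS transistor
    The voltage on the gate terminal of a MOS transistor determines whether current flows between the source and the drain. The transistor can therefore be used as a switch that controls the voltage at its terminals.
    Figure 13: Operation of an nMOS transistor [5]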

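  17. Building logic circuits with CMOS
    nMOS and pMOS transistors are each good at passing a different voltage: nMOS passes a low voltage well, and pMOS passes a high voltage well. Circuits are therefore usually built by combining both, an arrangement called CMOS (Complementary MOS). Examples of NOT, NAND, and NOR gates built with CMOS are shown below.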
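NAND (and likewise NOR) is functionally complete: every other gate can be built from it alone, which is one reason CMOS NAND/NOR cells are so common. The following C++ sketch models gates behaviorally as functions on `bool`; it does not model transistors. All function names are hypothetical.

```cpp
#include <cassert>

// Logic levels are modeled as bool (false = 0/GND, true = 1/VDD).
// NAND is the only primitive; every other gate is composed from it.
bool nand_gate(bool a, bool b) { return !(a && b); }

bool not_gate(bool a)         { return nand_gate(a, a); }
bool and_gate(bool a, bool b) { return not_gate(nand_gate(a, b)); }
bool or_gate(bool a, bool b)  { return nand_gate(not_gate(a), not_gate(b)); }  // De Morgan
bool nor_gate(bool a, bool b) { return not_gate(or_gate(a, b)); }

// Classic four-NAND XOR.
bool xor_gate(bool a, bool b) {
    bool n = nand_gate(a, b);
    return nand_gate(nand_gate(a, n), nand_gate(b, n));
}
```

Checking all four input combinations against the truth tables on the previous slides confirms each composed gate.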

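  18. Advantages of using two values in electronic circuits
    A voltage is inherently a real-valued quantity. Treating it as a Boolean variable gives the circuit a degree of tolerance to voltage noise, so information can be transmitted and stored reliably.
    Figure 15: Representing binary numbers with electronic circuits [5]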

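  19. Multiplexer
    A logic circuit that selects one of two inputs is called a multiplexer. The selection is made by a dedicated signal S; conceptually, it corresponds to an if statement in a program.
    Figure 16: Logic circuit and truth table of a multiplexer [5]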

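  20. Sequential circuits and flip-flops
    The circuits described so far do not take time into account and process their inputs immediately, so they are called combinational circuits. In contrast, a logic circuit that introduces a clock and holds values over time is called a sequential circuit.
    A sequential circuit that captures its input and passes it to the output only when the clock changes to 1 is called a flip-flop. At all other times it holds its current value. An example of a D flip-flop is shown below.
    Figure 17: Abstracted logic circuit and truth table of a D flip-flop [5]
    Figure 18: State transitions of a flip-flop [5]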

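  21. Registers
    An N-bit register is an abstract storage element that holds N logic values. In a typical computer, a register is built from N flip-flops that share a common clock input. An example of a 4-bit register is shown below.
    Figure 19: Example of a 4-bit register built from flip-flops [5]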
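The edge-triggered behavior described above can be sketched behaviorally in C++. This assumes positive-edge-triggered D flip-flops and a simulation loop that calls `tick` with the current clock level whenever an input changes. `DFlipFlop` and `Register` are hypothetical names; this is not a hardware description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Positive-edge-triggered D flip-flop: Q takes the value of D only when the
// clock goes from 0 to 1; at every other call Q is held.
class DFlipFlop {
public:
    void tick(bool clk, bool d) {
        if (clk && !prev_clk_) q_ = d;   // rising edge: capture D
        prev_clk_ = clk;
    }
    bool q() const { return q_; }
private:
    bool q_ = false;
    bool prev_clk_ = false;
};

// An N-bit register: N flip-flops driven by one shared clock.
template <std::size_t N>
class Register {
public:
    void tick(bool clk, const std::array<bool, N>& d) {
        for (std::size_t i = 0; i < N; ++i) ffs_[i].tick(clk, d[i]);
    }
    std::array<bool, N> q() const {
        std::array<bool, N> out{};
        for (std::size_t i = 0; i < N; ++i) out[i] = ffs_[i].q();
        return out;
    }
private:
    std::array<DFlipFlop, N> ffs_{};
};
```

The register changes only on a 0→1 clock transition. While the clock stays at 0 or at 1, it keeps its value even if D changes.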

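  22. Adder circuits
    Below is an example of a full adder, a combinational circuit that performs addition. It outputs the result of a 1-bit addition together with the carry. Placing several full adders side by side and connecting each carry-out (cout) to the next carry-in performs multi-bit addition.
    Figure 20: Logic circuit of a full adder [5]
    Alternatively, multi-bit addition can be performed by reusing a single full adder over successive clock cycles. This approach was used when arithmetic units were expensive (e.g., EDVAC).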
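Below is a C++ sketch of the full adder and of chaining eight of them into a ripple-carry adder, as described above. It models only the logic equations, not gate delays. The names and the 8-bit width are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct FullAdderOut { bool sum; bool cout; };

// 1-bit full adder: sum = a XOR b XOR cin, carry-out = majority(a, b, cin).
FullAdderOut full_adder(bool a, bool b, bool cin) {
    bool sum  = a ^ b ^ cin;
    bool cout = (a && b) || (cin && (a ^ b));
    return {sum, cout};
}

// Ripple-carry adder: 8 full adders, each cout feeding the next cin.
// Returns the 8-bit sum; *carry_out (if non-null) receives the final carry.
std::uint8_t ripple_carry_add(std::uint8_t x, std::uint8_t y, bool* carry_out) {
    std::uint8_t result = 0;
    bool carry = false;
    for (int i = 0; i < 8; ++i) {
        FullAdderOut fa = full_adder((x >> i) & 1, (y >> i) & 1, carry);
        if (fa.sum) result |= static_cast<std::uint8_t>(1u << i);
        carry = fa.cout;
    }
    if (carry_out) *carry_out = carry;
    return result;
}
```

For example, `ripple_carry_add(200, 100, &c)` returns 44 with `c == true`, because 300 does not fit in 8 bits (300 − 256 = 44).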

  23. ALU
    Both addition and logical operations can be computed with logic circuits, so they can be combined into a single unit. This unit is called an ALU (Arithmetic Logic Unit).
    Figure 21: Logic circuit of a 1-bit ALU [7]
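Below is a behavioral C++ sketch of a 32-bit ALU in the spirit of the figures:
- An operation-select input picks the result, like the multiplexer in the 1-bit ALU.
- Subtraction is performed as addition of the two's complement (`a + ~b + 1`).
- A zero flag is produced, which a branch such as `beq` could use.

The operation set, its encoding and the names are assumptions and do not correspond to a specific ISA.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operation select for this sketch.
enum class AluOp { And, Or, Add, Sub };

// 32-bit ALU: the select input chooses one operation on the same operands.
// *zero (if non-null) is set when the result is 0.
std::uint32_t alu(AluOp op, std::uint32_t a, std::uint32_t b, bool* zero) {
    std::uint32_t result = 0;
    switch (op) {
        case AluOp::And: result = a & b; break;
        case AluOp::Or:  result = a | b; break;
        case AluOp::Add: result = a + b; break;          // wraps modulo 2^32
        case AluOp::Sub: result = a + (~b + 1u); break;  // a - b via two's complement
    }
    if (zero) *zero = (result == 0);
    return result;
}
```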

  24. An ALU that handles multiple bits
    Figure 22: Logic circuit of a 64-bit ALU [7]

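  25. Turing machines
    So far we have seen that Boolean algebra and its operations, realized as logic circuits, can compute with binary numbers. This alone, however, cannot express every procedure. We also need a proof that a given procedure is computable, and a mechanism to carry it out.
    In 1936, Alan Turing analyzed the procedure a person follows when calculating. He proposed an "automatic machine" that can do exactly the same thing. This automatic machine was established with mathematical methods and is now called the "Turing machine".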

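  26. How a Turing machine works
    A Turing machine consists of a tape of unbounded length, a head that reads (and writes) the tape, and a table of rules that drives the machine. Each cell of the tape holds one symbol from a finite set of symbols.
    Figure 23: Turing machine [2]
    At any moment, the Turing machine is in one state out of a finite set of states. The table defines an action for each combination of a state and a symbol on the tape. Since these are combinations of two finite sets, the number of actions in the table is also finite.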
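Below is a minimal Turing machine simulator in C++. It assumes a rule table keyed by (state, symbol) and a tape stored sparsely in a `std::map`, so the tape can grow in both directions. The example table, which increments a binary number, is a hypothetical machine chosen for illustration; it is not taken from the slides. The simulator has no step limit.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// One table entry: symbol to write, head move (-1 left, +1 right, 0 stay), next state.
struct Action { char write; int move; std::string next; };
using Table = std::map<std::pair<std::string, char>, Action>;

// Run until the halt state. Unwritten cells read as the blank symbol '_'.
std::string run_turing_machine(const Table& table, const std::string& input,
                               std::string state, const std::string& halt) {
    std::map<long, char> tape;
    for (long i = 0; i < static_cast<long>(input.size()); ++i) tape[i] = input[i];
    long head = 0;
    while (state != halt) {
        char symbol = tape.count(head) ? tape[head] : '_';
        const Action& act = table.at({state, symbol});  // throws if no rule applies
        tape[head] = act.write;
        head += act.move;
        state = act.next;
    }
    std::string out;  // non-blank cells, left to right
    for (const auto& cell : tape) if (cell.second != '_') out += cell.second;
    return out;
}

// Hypothetical example machine: add 1 to a binary number.
Table increment_table() {
    return {
        {{"right", '0'}, {'0', +1, "right"}},  // scan to the right end
        {{"right", '1'}, {'1', +1, "right"}},
        {{"right", '_'}, {'_', -1, "carry"}},  // step back onto the last digit
        {{"carry", '1'}, {'0', -1, "carry"}},  // 1 + carry = 0, keep carrying
        {{"carry", '0'}, {'1', 0, "halt"}},    // absorb the carry
        {{"carry", '_'}, {'1', 0, "halt"}},    // carry out of the top digit
    };
}
```

For example, `run_turing_machine(increment_table(), "1011", "right", "halt")` returns `"1100"`.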

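  27. The von Neumann machine
    In today's computers, the tape, head, and table of the Turing machine are implemented as the memory, the program counter, and the processor. This design is said to originate from EDVAC, designed in 1945 by von Neumann, Eckert, Mauchly, and others. EDVAC is regarded as the prototype of what was later called the von Neumann computer [3].
    A von Neumann machine has the following characteristics.
    • Instructions are executed sequentially
    • Data is read from memory or registers one bit at a time, starting from the lowest bit, processed, and written back
    • The same arithmetic hardware is shared among operations as much as possible
    • Instructions and data are stored in the same memory without distinction, and both can be operated on
    • Memory locations have addresses, which are used to specify the data to be processed
    • There is one memory and one processor
    Von Neumann presumably designed the above under the influence of the Turing machine.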

  28. A typical von Neumann processor
    Figure 24: Example of a von Neumann processor [1]

  29. Accelerating computation with the CPU

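  30. What is a CPU?
    A CPU (Central Processing Unit) is generally a densely integrated electronic circuit (IC). It can execute a finite, predefined set of instruction types. Examples of instructions:
    1. Logical operations: AND, OR, etc.
    2. Basic arithmetic operations: Add, Mul, etc.
    3. Instructions that read data from or write data to memory: Load, Store, etc.
    4. Instructions that operate peripheral devices over a bus (in the case of port I/O)
    Well-known CPUs include Intel's Core series and Arm's Cortex-A series.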

  31. Rough structure of a CPU
    Figure 25: Structure of a simple CPU [7]

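  32. Arithmetic units and registers
    Inside a CPU there are units that actually perform arithmetic, along with a number of multi-bit registers. Registers are circuits that hold data temporarily, and each is assigned a number. These numbers are used as operands in machine code, described later.
    Figure 26: Arithmetic unit and registers [7]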

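  33. ISA: Instruction Set Architecture
    The set of instructions a CPU can execute is called its ISA. The ISA is the interface between software and hardware. It can be regarded as the architecture of the processor as seen by the programmer, and it is normally the lowest-level "software" a programmer can use to control a computer.
    In contrast, the internal logic circuitry that executes the ISA is called the microarchitecture (march). It can be described as the architecture of the processor as seen from the inside.
    For compatibility, some CPUs from different vendors adopt the same ISA. For example, Intel and AMD use ISAs that are largely compatible with each other, but their microarchitectures are completely different.
    These slides use RISC-V and x86 as the example ISAs.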

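  34. Structure of an ISA
    As an example of an ISA instruction format, we introduce the RISC-V RV32I R-type format.
    Figure 27: Field layout of the RISC-V ISA [7]
    • opcode: the basic kind of instruction
    • rd: the register the result is written to (destination)
    • rs1: the register holding the first operand
    • rs2: the register holding the second operand
    • funct3: a field that, together with funct7, selects the specific operation
    • funct7: a field that gives additional information
    x5 = x6 + x7 (add x5, x6, x7) -> 0000000 , 00111 , 00110 , 000 , 00101 , 0110011 (funct7, rs2, rs1, funct3, rd, opcode)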
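The field layout can be checked by packing the fields with shifts. Below is a C++ sketch; the function names are hypothetical. The opcode, funct3 and funct7 values for `add` are the ones shown above.

```cpp
#include <cassert>
#include <cstdint>

// Pack a RISC-V R-type instruction. Bit layout, high to low:
// funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
std::uint32_t encode_r_type(std::uint32_t funct7, std::uint32_t rs2, std::uint32_t rs1,
                            std::uint32_t funct3, std::uint32_t rd, std::uint32_t opcode) {
    return ((funct7 & 0x7Fu) << 25) | ((rs2 & 0x1Fu) << 20) | ((rs1 & 0x1Fu) << 15) |
           ((funct3 & 0x7u) << 12) | ((rd & 0x1Fu) << 7) | (opcode & 0x7Fu);
}

// add rd, rs1, rs2: opcode 0110011, funct3 000, funct7 0000000.
std::uint32_t encode_add(std::uint32_t rd, std::uint32_t rs1, std::uint32_t rs2) {
    return encode_r_type(0b0000000, rs2, rs1, 0b000, rd, 0b0110011);
}
```

`encode_add(5, 6, 7)` produces `0x007302B3`, which is the bit string 0000000 00111 00110 000 00101 0110011 shown above.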

  35. Figure 28: The RISC-V ISA [7]

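  36. Machine code and assembly language
    An ISA encoded in binary so that a computer can actually interpret it is called machine code. Because machine code is raw binary data, it is not suited to being read by people. People therefore use assembly language, a simple programming language that maps machine code (almost) one-to-one onto a human-readable form. Assembly programs are converted to machine code by software called an assembler.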

  37. Example of machine code and assembly language: x86
    Below are the x86 machine code and assembly language (GAS syntax) for a function add that adds two variables. Because x86 is a CISC ISA, you can see that its instructions are not all the same length.
    int add(int a, int b) {
      return a + b;
    }
    _add:
    55 // pushq %rbp
    48 89 e5 // movq %rsp, %rbp
    8d 04 37 // leal (%rdi,%rsi), %eax
    5d // popq %rbp
    c3 // retq

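  38. Microarchitecture
    The ISA is the lowest-level software a programmer can control. The microarchitecture, in contrast, refers to the structure of the hardware that implements it, expressed as circuit diagrams or RTL. The next page shows example microarchitectures of an Intel x86 core (Core i7) and an ARM core (Cortex-A53).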

  39. Examples of microarchitectures
    Figure 29: Microarchitecture of the Intel Core i7 [6]
    Figure 30: Microarchitecture of the ARM Cortex-A53 [6]

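  40. How the CPU executes ISA instructions: arithmetic instructions
    The following shows what the CPU (datapath) does when an add instruction is executed.
    // add the values of two registers
    add x1, x2, x3 // a = b + c
    1. The instruction is fetched and the PC is incremented
    2. The instruction is decoded and control signals are sent to each unit
    3. Following those signals, the values of x2 and x3 are read from the register file
    4. The ALU adds the two values
    5. The ALU output is written to x1 in the register file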

  41. How an add instruction is executed by the CPU
    Figure 31: Executing an add instruction on a simple CPU

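  42. How the CPU executes ISA instructions: memory instructions
    The following shows what the CPU (datapath) does when a load instruction is executed.
    // load a value from an array into a register
    lw x1, 4(x6) // x = a[i + 1]
    1. The instruction is fetched and the PC is incremented
    2. The instruction is decoded and control signals are sent to each unit
    3. The value of register x6 (the base address) is read from the register file
    4. The ALU adds that value and the constant 4 (address calculation)
    5. The data memory (DataMemory) reads the data at the computed address
    6. The output of DataMemory is written to x1 in the register file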

  43. How a load instruction is executed by the CPU
    Figure 32: Executing a load instruction on a simple CPU

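  44. How the CPU executes ISA instructions: branch instructions
    The following shows what the CPU (datapath) does when a branch instruction is executed.
    // compare two variables and branch if they are equal
    beq x1, x2, offset // if (x1 == x2) pc += offset;
    1. The instruction is fetched and the PC is incremented
    2. The instruction is decoded and control signals are sent to each unit
    3. The values of registers x1 and x2 are read from the register file
    4. The offset is sign-extended and doubled, then added to the PC value (branch target calculation)
    5. If x1 and x2 are equal, the PC is overwritten with that target
    The offset field used above is 12 bits. It is doubled because the RISC-V specification requires instruction lengths, and therefore branch targets, to be multiples of 2 bytes.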
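The add, lw and beq walkthroughs can be tied together in a tiny instruction-level simulator. This C++ sketch is an illustration under stated assumptions:
- Instructions are kept pre-decoded in a struct rather than as 32-bit machine words.
- Data memory is an array of 32-bit words indexed by byte address / 4.
- The `imm` of `beq` holds the byte offset, i.e. already doubled.
- `Op`, `Instr`, `Cpu` and `step` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pre-decoded instruction.
enum class Op { Add, Lw, Beq };
struct Instr { Op op; int rd; int rs1; int rs2; std::int32_t imm; };

struct Cpu {
    std::array<std::uint32_t, 32> x{};  // register file; x0 is never written
    std::uint32_t pc = 0;               // byte address of the current instruction
    std::vector<std::uint32_t> mem;     // data memory, one 32-bit word per element
};

// Execute one instruction, following the numbered steps on the slides.
void step(Cpu& cpu, const std::vector<Instr>& program) {
    const Instr& in = program.at(cpu.pc / 4);                // fetch
    std::uint32_t next_pc = cpu.pc + 4;                      // PC + 4
    std::uint32_t a = cpu.x[in.rs1], b = cpu.x[in.rs2];      // decode, read registers
    std::uint32_t imm = static_cast<std::uint32_t>(in.imm);  // sign-extended (wraps)
    switch (in.op) {
        case Op::Add:                                        // ALU add, write rd
            if (in.rd != 0) cpu.x[in.rd] = a + b;
            break;
        case Op::Lw: {                                       // ALU: base + offset
            std::uint32_t data = cpu.mem.at((a + imm) / 4);  // data memory read
            if (in.rd != 0) cpu.x[in.rd] = data;             // write back to rd
            break;
        }
        case Op::Beq:                                        // taken: PC + offset
            if (a == b) next_pc = cpu.pc + imm;
            break;
    }
    cpu.pc = next_pc;
}
```

For example, with `x6 = 8` and memory words `{0, 0, 10, 20}`, executing `lw x1, 4(x6)` loads the word at byte address 12, so x1 becomes 20.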

  45. How a branch instruction is executed by the CPU
    Figure 33: Executing a branch instruction on a simple CPU

  46. Example of state transitions of the whole CPU
    Figure 34: Example of state transitions of the whole CPU [1]

  47. Major ISAs
    • x86
    Widely used from consumer computers to workstations.
    • ARM
    Widely used in many mobile devices, microcontrollers, and embedded devices.
    e.g. various smartphones, Raspberry Pi, Nintendo DS.
    • RISC-V
    An open ISA that originated at UC Berkeley (UCB) in 2010. It has been gaining momentum recently.
    • MIPS
    Used in game consoles and embedded devices.
    e.g. PlayStation, Nintendo 64.
    • PowerPC
    Used in home game consoles and supercomputers.
    e.g. IBM workstations, older Macintosh computers, PlayStation 3, Xbox 360.

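  48. What does "fast" mean for a CPU? CPI
    CPI (cycles per instruction) is the number of clock cycles needed to execute an instruction. It measures how fast a given microarchitecture runs an ISA.
    Let:
    • $IC_i$ be the number of times instruction $i$ is executed in a program
    • $IC = \sum_i IC_i$ be the total instruction count
    • $CPI_i$ be the number of clock cycles instruction $i$ needs
    Then CPI can be calculated as:
    $$CPI = \sum_{i=1}^{n} \frac{IC_i}{IC} \times CPI_i$$
    Together with the instruction count and the clock period, CPI makes it possible to compare different ISAs and microarchitecture implementations. For a pipelined processor, however, it is hard to determine this value concretely.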

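  49. CPI calculation example
    Let us compute the CPI ($CPI_{base}$) for a program and processor with the following characteristics.
    • Fraction of floating-point operations: $Freq_{fp} = 25\%$
    • Average CPI of floating-point operations: $CPI_{fp} = 4.0$
    • Average CPI of other operations: $CPI_{others} = 1.33$
    • Fraction of square-root operations: $Freq_{sqrt} = 2\%$
    • CPI of square-root operations: $CPI_{sqrt} = 20$
    $CPI_{base} = (CPI_{fp} \times Freq_{fp}) + (CPI_{others} \times (1 - Freq_{fp}))$   (3)
    $= (4 \times 0.25) + (1.33 \times 0.75) \approx 2.0$   (4)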

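  50. CPI calculation example (continued)
    Consider which of the following two processor design options improves performance more.
    1. Reduce the CPI of square-root operations to 2 ($CPI_1$)
    2. Reduce the CPI of all floating-point operations to 2.5 ($CPI_2$)
    Let $CPI^{new}_{sqrt}$ be the new CPI of square-root operations and $CPI^{new}_{fp}$ the new CPI of floating-point operations. Each option can then be calculated as follows.
    $CPI_1 = CPI_{base} - Freq_{sqrt} \times (CPI_{sqrt} - CPI^{new}_{sqrt})$   (5)
    $= 2.0 - 0.02 \times (20 - 2) = 1.64$   (6)
    $CPI_2 = ((1 - Freq_{fp}) \times CPI_{others}) + (Freq_{fp} \times CPI^{new}_{fp})$   (7)
    $= (0.75 \times 1.33) + (0.25 \times 2.5) \approx 1.62$   (8)
    Since $CPI_2 < CPI_1$, improving all floating-point operations is the slightly better choice.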
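The weighted-sum formula for CPI is easy to evaluate in code. Below is a C++ sketch using the numbers from the two CPI slides; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One instruction class: its fraction of executed instructions (IC_i / IC)
// and the clock cycles it needs (CPI_i).
struct InstrClass { double freq; double cpi; };

// CPI = sum_i (IC_i / IC) * CPI_i
double weighted_cpi(const std::vector<InstrClass>& classes) {
    double cpi = 0.0;
    for (const auto& c : classes) cpi += c.freq * c.cpi;
    return cpi;
}
```

Without rounding, the values are:
- $CPI_{base}$ = 1.9975
- $CPI_1$ = 1.9975 − 0.02 × 18 = 1.6375
- $CPI_2$ = 1.6225

The slide's 2.0, 1.64 and 1.62 are rounded versions of these, and the conclusion $CPI_2 < CPI_1$ is unchanged.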

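  51. Instruction-level parallelism
    CPU architects have long improved performance by executing several instructions in parallel on a single processor. These techniques exploit the potential overlap among instructions to increase speed. This is called instruction-level parallelism (ILP).
    Next, we explain pipelining, the most common example of instruction-level parallelism.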

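  52. Acceleration example 1: pipelining
    In a single-cycle design, every instruction must finish within one long clock cycle, and each hardware resource sits idle for much of that cycle. Modern CPUs address this problem with a structure called a pipeline. Pipelining speeds up processing by exploiting the overlap between instructions and executing them in parallel.
    The laundry example below illustrates the idea. A task is divided into several stages, and resources that are idle while one task runs are used for the next task. As a result, the overall work finishes in a shorter time.
    Figure 35: Making laundry more efficient with pipelining [7]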

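  53. Comparison of instruction execution time: single-cycle vs. pipelined
    The more evenly balanced the stage execution times are, the greater the speedup in instruction execution. This is because the clock period is set by the slowest stage.
    Figure 36: Execution in single-cycle and pipelined designs [7]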
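The effect of stage balance can be quantified with a simple timing model. The C++ sketch below assumes an ideal pipeline with no hazards and no pipeline-register overhead. The stage latencies in the example are hypothetical values in picoseconds, and the function names are also hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Single-cycle: the clock period must cover all stages; one instruction per cycle.
long single_cycle_time(const std::vector<long>& stages, long n) {
    return n * std::accumulate(stages.begin(), stages.end(), 0L);
}

// Ideal pipeline with k stages: the clock period is the slowest stage. The
// first instruction needs k cycles to pass through the pipeline; after that,
// one instruction completes per cycle.
long pipelined_time(const std::vector<long>& stages, long n) {
    assert(!stages.empty());
    long k = static_cast<long>(stages.size());
    long period = *std::max_element(stages.begin(), stages.end());
    return (k + n - 1) * period;
}
```

For example, with stages {200, 100, 200, 200, 100} ps:
- Three instructions take 2400 ps single-cycle.
- The same three take (5 + 3 − 1) × 200 = 1400 ps pipelined.
- If the same 800 ps is split evenly as {160, 160, 160, 160, 160} ps, the pipelined time drops to 7 × 160 = 1120 ps.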

  54. Example of a pipelined microarchitecture
    A pipeline can be implemented by inserting registers between the units.
    Figure 37: Pipelining a simple CPU [7]

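  55. Problems in pipelines: hazards
    Even with a pipeline, an instruction cannot always be executed every cycle. Sometimes the next instruction cannot execute in the next clock cycle, for example because of a data dependence between instructions. Such situations are called hazards. They disrupt the smooth flow of the pipeline and lower its throughput. The three main kinds of hazards are:
    1. Data hazards
    2. Control hazards
    3. Structural hazards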

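  56. Data hazards
    A data hazard occurs when an operand of an instruction depends on the result of a preceding instruction. In the example below, the second instruction (sub) depends on the result of the preceding add, so it is held in the RF stage. In this case, sub cannot proceed until add reaches the WB stage. Cycles in which the pipeline is stalled like this are called pipeline stalls (bubbles).
    add x5, x0, x1
    sub x2, x5, x3
    Figure 38: Example of a data hazard in a pipeline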

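  57. Mitigating data hazards
    In the data hazard example, the preceding instruction's result is produced in the EX stage but is written to the register file only in the later WB stage. This gap causes the hazard.
    The hazard can be mitigated by adding paths that pass the result on as soon as the stage that produces it has finished. This is called forwarding. However, forwarding cannot eliminate every stall. When an instruction takes an operand from a preceding load instruction, a short stall still occurs (load delay).
    Figure 39: Mitigating a data hazard with forwarding
    Figure 40: Forwarding and load delay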

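  58. Control hazards
    A control hazard occurs when a conditional branch is executed, because the address of the next instruction is not known until the branch has been resolved.
    The simplest countermeasure is to stall the fetch of the next instruction until the branch is resolved in EX. However, the deeper the pipeline, the larger the delay this causes.
    Figure 41: Example of a control hazard in a pipeline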

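  59. Mitigating control hazards
    If a conditional branch is not taken, the next instruction does not need to wait until the branch's EX stage to be fetched. One technique therefore works as follows:
    • Assume the branch is not taken and keep the pipeline running as usual
    • As soon as it turns out that the branch is taken, discard (flush) the instructions fetched in the meantime
    If branches are taken about half the time, this saves roughly half of the stall cycles on average compared with always stalling. However, it requires hardware to discard the contents of the pipeline registers.
    Figure 42: Mitigating a control hazard by flushing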

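  60. Structural hazards
    A structural hazard occurs when two or more pipeline stages compete for a resource of which there is only one.
    In RISC pipelines, every instruction passes through the stages in the same order. Instructions therefore rarely compete for resources, and structural hazards are usually not a major problem. In some architectures, however, instruction and data memory accesses share the same path. There, data accesses must be given priority, and the fetch of subsequent instructions must be delayed.
    Figure 43: Example of a structural hazard in a pipeline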

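  61. The memory access time problem
    Execution speed depends not only on the processor's internal structure but also on the time spent communicating with the outside. Main memory is usually far slower than registers. This speed gap causes long stalls on instruction fetches and memory instructions, reducing performance. Furthermore, data such as files is stored on disks, which have larger capacity than main memory but are even slower, so accessing them takes longer still.
    Figure 44: Overview of storage devices [6]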

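  62. Memory hierarchy
    To mitigate the memory access time problem, typical CPUs arrange several storage devices in a hierarchy. Small, fast memories sit at the top, large but slow ones sit at the bottom, and accesses proceed from the top down. Keeping frequently used data in the upper levels shortens the overall access time.
    Figure 45: Concept of the memory hierarchy [4]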

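  63. Acceleration example 2: caches
    A storage device placed between the registers and main memory is called a cache. It typically consists of two or three levels of small memories built from SRAM. When a memory access occurs, the processor first asks the cache whether it holds the data. If it does, the cache returns it (a hit); if not, the request goes to the next lower level of the hierarchy (a miss).
    Figure 46: Accessing a cache [1]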

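  64. Cache organization
    Because a cache must behave just like memory, its data is managed by address. However, caches are built from expensive storage elements, so the data they hold must be chosen efficiently. Caches are therefore designed to exploit the following common memory access patterns of programs.
    • Temporal locality: data accessed recently is likely to be accessed again soon
    • Spatial locality: data near recently accessed addresses is likely to be accessed soon
    The three basic cache organizations are:
    • Direct mapped
    • Fully associative
    • N-way set associative
    Their details are omitted due to time constraints.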
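To make direct mapping and locality concrete, here is a C++ sketch of a direct-mapped cache that only counts hits and misses. The class name and the parameters (number of lines, line size in bytes) are hypothetical, and the sketch deliberately stores no data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Direct-mapped cache model: each address maps to exactly one line,
// index = (addr / line_size) % num_lines; the remaining upper bits are the tag.
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t num_lines, std::size_t line_size)
        : line_size_(line_size), valid_(num_lines, false), tags_(num_lines, 0) {}

    // Returns true on a hit; on a miss the line is filled with the new block.
    bool access(std::uint64_t addr) {
        std::uint64_t block = addr / line_size_;
        std::size_t index = static_cast<std::size_t>(block % valid_.size());
        std::uint64_t tag = block / valid_.size();
        if (valid_[index] && tags_[index] == tag) { ++hits_; return true; }
        valid_[index] = true;
        tags_[index] = tag;
        ++misses_;
        return false;
    }
    std::uint64_t hits() const { return hits_; }
    std::uint64_t misses() const { return misses_; }

private:
    std::uint64_t line_size_;
    std::vector<bool> valid_;
    std::vector<std::uint64_t> tags_;
    std::uint64_t hits_ = 0, misses_ = 0;
};
```

With 8 lines of 64 bytes:
- Reading 256 consecutive bytes in 4-byte steps gives 4 misses and 60 hits (spatial locality).
- Alternating between addresses 0 and 512, which map to the same line, misses every time. This is a conflict that an associative organization could avoid.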

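  65. Exploiting the cache: matrix multiplication
    We compare the execution times of two C++ implementations of matrix multiplication (N = 1000).
    Naive matrix multiplication (3314 ms)
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++) {
        C[i*N + j] = 0;
        for (int k = 0; k < N; k++)
          C[i*N + j] += A[i*N + k] * B[k*N + j];        // B is read with stride N
      }
    Matrix multiplication with B transposed (2518 ms)
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++) {
        C[i*N + j] = 0;
        for (int k = 0; k < N; k++)
          C[i*N + j] += A[i*N + k] * B_trans[j*N + k];  // B_trans is read contiguously
      }
    In the naive version, the innermost loop walks down a column of B with stride N, so nearly every access touches a different cache line. With B transposed in advance (B_trans[j*N + k] = B[k*N + j]), both A and B_trans are read contiguously, which makes better use of spatial locality.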

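  66. Acceleration example 3: superscalar
    When a stage has a long latency, it is effective to provide multiple copies of it. Designs that replicate pipeline resources in this way, so that several instructions can be issued and executed in the same clock cycle, are called superscalar.
    Figure 47: Example of a superscalar processor [1]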

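  67. Data-level parallelism
    Programs usually process one piece of data per instruction. This style is called SISD (Single Instruction, Single Data). In scientific computing and in multimedia processing of images and audio, however, the same instruction is often applied to many pieces of data. In that case a single instruction completes several operations, which improves efficiency. This potential parallelism in the data being processed is called data-level parallelism.
    Common processors implement mechanisms and dedicated instructions that exploit data-level parallelism to speed up programs. We introduce the following three examples.
    • Multimedia SIMD
    • Vector architectures
    • GPUs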

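  68. Acceleration example 4: multimedia SIMD
    SIMD grew out of efforts to compute multimedia data such as audio and images efficiently, since such data uses short lengths like 8 or 16 bits. For example, a 32-bit adder can be split into four 8-bit parts by stopping carry propagation at the part boundaries; it then computes four additions at once. Typically, 128- to 512-bit arithmetic units and registers are provided.
    With SIMD, fewer instructions are needed than when ordinary instructions are issued many times. The following code computes Y = a × X + Y four elements at a time.
    fld f0, a            # Load scalar a
    splat.4D f0, f0      # Make 4 copies of a
    fld.4D f1, 0(x5)     # Load X[i] ... X[i+3]
    fmul.4D f1, f1, f0   # f1[0] = f1[0]*a; ...; f1[3] = f1[3]*a;
    fld.4D f2, 0(x6)     # Load Y[i] ... Y[i+3]
    fadd.4D f2, f2, f1   # f2[0] = f2[0]+f1[0]; ...; f2[3] = f2[3]+f1[3];
    fsd.4D f2, 0(x6)     # Store Y[i] ... Y[i+3]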
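Splitting a 32-bit adder into four 8-bit lanes by cutting the carry chain can be imitated in software, a trick known as SWAR (SIMD within a register). This C++ sketch only illustrates carry blocking; real SIMD units do this in hardware. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Add four 8-bit lanes packed in a 32-bit word, each lane wrapping mod 256.
// Adding only the low 7 bits of each lane guarantees no carry crosses a lane
// boundary; the top bit of each lane is then added without carry-out via XOR
// (a sum bit is a ^ b ^ carry_in).
std::uint32_t add_u8x4(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t low7 = 0x7F7F7F7Fu;  // low 7 bits of every lane
    const std::uint32_t high = 0x80808080u;  // top bit of every lane
    std::uint32_t partial = (a & low7) + (b & low7);  // carries stay inside each lane
    return partial ^ ((a ^ b) & high);
}
```

For example, `add_u8x4(0x01FF0203, 0x01010101)` returns `0x02000304`: the lane 0xFF + 0x01 wraps to 0x00 without disturbing its neighbor.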

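  69. Drawbacks of SIMD and vector architectures
    Multimedia SIMD has the following drawbacks.
    • A separate instruction exists for each vector length, so the number of instructions tends to grow
    • There are no addressing modes such as gather/scatter
    • Per-element conditional execution is not supported
    A vector architecture is a more flexible design than SIMD that can handle more advanced data-parallel operations. It arranges multiple registers and arithmetic units and combines them dynamically, using deep pipelines and parallel functional units.
    This allows parallel instruction execution and lower instruction overhead without increasing the number of instructions. Some vector architectures also support gather/scatter and conditional vector operations.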

  70. Example of a vector architecture
    Figure 48: Example of a vector architecture [6]

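  71. Acceleration example 5: vector architectures
    A vector architecture consists of the following components.
    • Vector registers:
    Each holds one vector. RV64V has 32 vector registers with 64-bit elements. They are connected to the inputs and outputs of the vector functional units through a crossbar switch.
    • Vector functional units:
    Units that execute the specified vector operation on multiple values. Each is fully pipelined and can start a new operation every cycle.
    • Vector memory unit:
    A unit that loads and stores vector data from and to memory. It is fully pipelined and, after an initial overhead, can move one new word every cycle.
    • Scalar registers: registers that compute the addresses used by the vector memory unit and supply scalar values to the vector functional units.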

  72. Assembly for a vector architecture (DAXPY)
    We take a commonly used vector computation as an example and show its assembly with RISC-V vector instructions. Given two vectors X and Y and a scalar a, it computes:
    Y = a × X + Y
    vsetdcfg 4 * FP64   # Enable 4 DP FP vregs
    fld f0, a           # Load scalar a
    vld v0, x5          # Load vector X
    vmul v1, v0, f0     # Vector-scalar multiply
    vld v2, x6          # Load vector Y
    vadd v3, v1, v2     # Vector-vector add
    vst v3, x6          # Store the sum
    vdisable            # Disable vector regs
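For comparison, here is a C++ sketch of the scalar loop that the vector code above replaces; the function name is hypothetical. Every iteration is independent, which is the data-level parallelism the vector instructions exploit by processing many elements per instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar DAXPY: Y = a * X + Y in double precision.
void daxpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];   // iterations do not depend on each other
    }
}
```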

  73. A vector architecture in operation
    Figure 49: Example of parallelizing vector functional units [6]

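  74. GPU
    I could not finish putting this part together for this presentation. Sorry.
    It is not much of a substitute, but I would be glad if you could read my blog below instead. It is a series of about seven articles.
    https://csam.hatenablog.com/entry/2019/02/04/154700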

  75. Accelerating computation with the compiler

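  76. What is a compiler?
    A compiler translates a program written in a high-level language into machine code. The name was coined in the 1950s, when programs such as FORTRAN were first developed.
    The figure below shows C source code being translated into machine code.
    Figure 50: Compiling C (http://gihyo.jp/dev/serial/01/c-programming-introduction/0001)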

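  77. Compiled and interpreted execution
    There are two ways to execute a program:
    • Compilation: the whole input program is first translated into machine code and written to a file, which is then executed
    • Interpretation: the input program is executed by interpreting it directly, without generating such a file
    Typical languages for each approach:
    • Compiled: C, FORTRAN, Rust
    • Interpreted: Python, Ruby, JavaScript
    Some implementations translate the input program not into real machine code but into instructions for a virtual machine. The program is then executed by software that interprets those virtual instructions.
    Java is an example of a programming language that uses a virtual machine. Java programs can be executed in both compiled and interpreted modes.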

  78. The processing flow of a compiler
    Figure 51: The processing flow of a compiler [10]

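  79. Optimization techniques in compilers
    Optimizing a target program means making it more efficient. There are many kinds of optimization, such as making a program run faster or making it smaller; this presentation deals with speed. There are three broad ways to improve execution speed [11].
    • Reduce the number of instructions executed
    • Use faster instructions
    • Increase parallelism
    How these are combined depends on the target computer. For example, on a superscalar machine that can execute several instructions at once, the third approach (increasing parallelism) is effective. On a processor with many complex instructions, such as a CISC machine, using those instructions well reduces the number of instructions executed.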

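  80. Scope of optimization
    Optimization that analyzes only a small part of the program (typically a single basic block) is called local optimization. Optimization over a much larger region, such as a whole procedure or the whole program, is called global optimization. Optimization that looks only at the immediate neighborhood of each instruction is called peephole optimization.
    Optimization is applied both to the intermediate representation and to the architecture-dependent machine code.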
    View full-size slide

  81. ໋ྩͷ࣮ߦճ਺ΛݮΒ͢
    ໋ྩͷ࣮ߦճ਺ΛݮΒ͢ʹ͸ҎԼͷํ๏͕͋Δ [11]ɻ
    1. 1 ౓࣮ߦͨ݁͠ՌΛ࠶ར༻͢Δ: Common Subexpression Elimination ͳͲ
    2. Մೳͳ΋ͷ͸ίϯύΠϧ࣌ʹ࣮ߦ͢Δ: Constant Folding ͳͲ
    3. ໋ྩΛΑΓ࣮ߦස౓ͷ௿͍ͱ͜Ζ΁Ҡ͢: Code Motion ͳͲ
    4. ࣮ߦճ਺ΛݮΒ͢Α͏ʹϓϩάϥϜͷܗΛม׵͢Δ: Loop Transformation ͳͲ
    5. ࣜͷੑ࣭Λར༻࣮ͯ͠ߦΛม׵͢Δ: ୅਺ͱ࿦ཧࣜͷ׆༻ͳͲ
    6. ৑௕ͳ໋ྩΛऔΓআ͘: Dead Code Elimination ͳͲ
    7. ಛघԽ͢Δ: Function Inliningɺ൑ఆஔ͖׵͑ͳͲ
    ࠓճ͸্هͷҰ෦Λ঺հ͢Δɻ࣮͸্هͷز͔ͭͷຊ࣭తͳมܗख๏͸ಉ͡Ͱ͋Δɻ
    ྫ͑͹ɺCommon Subexpression Elimination ͱ Code Motion ͸ Partial Subexpression
    Elimination ͷҰछͰ͋Δɻ
    77

    View full-size slide

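  82. Common Subexpression Elimination
    When statements contain a common expression, as below, the program is rewritten to compute the expression only once. The result is kept in a temporary variable and reused.
    c = a + b // (1)
    ...
    e = (a + b) * d // (2)
    becomes:
    t = a + b // compute into a temporary
    c = t // use the temporary (1)
    ...
    e = t * d // use the temporary (2)
    In general, eliminating a common subexpression requires the following three conditions.
    i. The a + b in (1) and (2) is the same expression
    ii. (1) is always computed before (2)
    iii. The values of a and b do not change between (1) and (2)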

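  83. Constant Folding
    Constant folding performs computations on constants at compile time. For example, consider the following program.
    a = 1.0 + 2.0
    b = a * 3.0
    d = c - b // c is a variable
    Computing a = 3.0 in advance and then using it to compute b = 9.0 finally yields the following (assuming a and b are not needed elsewhere).
    d = c - 9.0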
  84. Loop Motion
    ϧʔϓͷ಺෦Ͱ஋͕มԽ͠ͳ͍ཁૉΛϧʔϓෆม (Loop Invariant) ͱݺͿɻLoop
    Motion ͸ϧʔϓෆมͳܭࢉΛ֎ʹͩ͢͜ͱͰɺܭࢉճ਺ΛݮΒ͢࠷దԽͰ͋ΔɻҎ
    Լʹ Loop Motion ͷ؆୯ͳྫΛࣔ͢ɻ
    for i = 1, n
    ...
    a = b * c
    d = i * 2
    ...
    end
    a = b * c
    for i = 1, n
    ...
    d = i * 2
    ...
    end
    Loop Motion ʹΑͬͯ࡟ݮ͞ΕΔܭࢉྔ͸ҰൠʹͦͷϧʔϓͷΠςϨʔγϣϯ਺ͱԋ
    ࢉࢠͷڧ౓ʹґଘ͢Δɻ্هͰ͸ n-1 ճͷ৐ࢉ͕࡟ݮ͞ΕΔɻ
    80

    View full-size slide

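  85. Using faster instructions
    1. Exploit the memory hierarchy: Register Allocation, Tiling, etc.
    2. Exploit fast instructions provided by the processor: SIMD, incremental condition, etc.
    3. Replace instructions with simpler ones using logical or mathematical information (e.g. strength reduction, such as replacing x * 2 with x << 1)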

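  86. Register Allocation
    Registers are the fastest storage elements in a computer, so a program becomes faster when it is transformed to use registers as much as possible. There are few registers, however, so they must be assigned to variables efficiently. This is called register allocation.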

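  87. Loop Tiling
    Loop tiling is a well-known optimization that exploits the memory hierarchy. A large array is divided into several smaller blocks (tiles), and the computation is performed one tile at a time, which improves the locality of accesses. Below is an example of multiplying n × n square matrices.
    for i = 1, n
      for j = 1, n
        for k = 1, n
          C(i,j) += A(i,k) * B(j,k)
        end // k
      end // j
    end // i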

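  88. Matrix multiplication with loop tiling applied
    for ti = 1, n, t
      for tj = 1, n, t
        for tk = 1, n, t
          for i = ti, min(ti + t - 1, n)
            for j = tj, min(tj + t - 1, n)
              for k = tk, min(tk + t - 1, n)
                C(i,j) += A(i,k) * B(j,k)
              end // k
            end // j
          end // i
        end // tk
      end // tj
    end // ti
    Here t is the tile size. The outer three loops step from tile to tile, and the inner three loops compute within one t × t tile.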
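Below is a runnable C++ version of the same idea. Unlike the pseudocode, it assumes row-major storage and 0-based, half-open loop bounds, so `std::min` clips the last tile when n is not a multiple of t. It also uses the conventional C = A·B indexing `B[k*n + j]` instead of the pseudocode's B(j,k). The names are illustrative, and a good tile size depends on the cache size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// n x n row-major matrices stored in flat vectors: M[i*n + j].
using Matrix = std::vector<double>;

Matrix matmul_naive(const Matrix& A, const Matrix& B, std::size_t n) {
    Matrix C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                C[i*n + j] += A[i*n + k] * B[k*n + j];
    return C;
}

// Tiled version: the working set of each (ti, tj, tk) step is three t x t tiles.
Matrix matmul_tiled(const Matrix& A, const Matrix& B, std::size_t n, std::size_t t) {
    assert(t > 0);
    Matrix C(n * n, 0.0);
    for (std::size_t ti = 0; ti < n; ti += t)
        for (std::size_t tj = 0; tj < n; tj += t)
            for (std::size_t tk = 0; tk < n; tk += t)
                for (std::size_t i = ti; i < std::min(ti + t, n); ++i)
                    for (std::size_t j = tj; j < std::min(tj + t, n); ++j)
                        for (std::size_t k = tk; k < std::min(tk + t, n); ++k)
                            C[i*n + j] += A[i*n + k] * B[k*n + j];
    return C;
}
```

For each element of C, the k terms are still added in increasing k order, so the tiled result is identical to the naive one. Only the order in which elements of C are visited changes.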

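  89. Exploiting fast instructions provided by the device
    Besides general-purpose instructions, many devices provide instructions that execute certain special operations quickly. One example is an instruction that adds 1 to a register value and then branches based on the result; such instructions are often used for loop termination checks. Instructions that use the vector units found in many recent devices are also widely used. See the vector instructions in the RISC-V book. (To be written later.)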

  90. Exploiting fast instructions provided by the device (continued)
    These were explained in the vector processor and SIMD sections of the CPU part; please refer to them.

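  91. Increasing parallelism
    One kind of compiler optimization speeds up a program by executing it in parallel. This is especially effective on large-scale devices such as modern CPUs, and it is frequently used in ordinary programs as well.
    Techniques for increasing parallelism fall broadly into two categories.
    1. Instruction-level parallel execution
    2. Processor-level parallel execution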

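  92. Instruction-level parallel execution
    In the CPU part we noted that modern processors have multiple arithmetic units and can execute instructions in parallel. Here we explain how a compiler can speed up a program by generating code that increases the number of instructions executed on those multiple units.
    Reordering instructions in the compiler to improve execution efficiency is called instruction scheduling. Instruction scheduling applied specifically to loops is called software pipelining. This technique is particularly effective for superscalar processors and for processors with VLIW-style organizations.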

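  93. Software pipelining
    Here we show an example of software pipelining on a processor that satisfies the following conditions.
    • One arithmetic instruction and one memory instruction can be executed at the same time
    • Arithmetic instructions complete in 1 cycle
    • Among memory instructions, loads complete in 2 cycles and stores complete in 1 cycle
    Below are a program that performs a simple calculation inside a loop and its assembly. The next slide shows how instruction execution progresses in each iteration of this program.
    for i = 1, n
    A[i] = A[i] * b + c
    end
    1: Load r1 A[i]
    2: Mul r4 r1 r2 // b in r2
    3: Add r5 r4 r3 // c in r3
    4: Store A[i] r5
    5: loop_check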

  94. Iterations without software pipelining
    Cycle | i = 1         | i = 2         | i = 3 | i = 4
    2     | Load r1 A[1]  |               |       |
    3     | Mul r4 r1 r2  |               |       |
    4     | Add r5 r4 r3  |               |       |
    5     | Store A[1] r5 |               |       |
    6     |               |               |       |
    7     | loop check    |               |       |
    8     |               | Load r1 A[2]  |       |
    9     |               | Mul r4 r1 r2  |       |
    10    |               | Add r5 r4 r3  |       |
    11    |               | Store A[2] r5 |       |
    12    |               |               |       |
    13    |               | loop check    |       |

  95. Program and assembly with software pipelining applied
    for i = 1, n, 4 // assumes n is a multiple of 4
    A[i] = A[i] * b + c
    A[i+1] = A[i+1] * b + c
    A[i+2] = A[i+2] * b + c
    A[i+3] = A[i+3] * b + c
    end
    1: Load r1 A[i]
    2: Mul r4 r1 r2
    3: Add r5 r4 r3
    4: Store A[i] r5
    5: Load r1 A[i+1]
    6: Mul r4 r1 r2
    7: Add r5 r4 r3
    8: Store A[i+1] r5
    9: Load r1 A[i+2]
    10: Mul r4 r1 r2
    11: Add r5 r4 r3
    12: Store A[i+2] r5
    13: Load r1 A[i+3]
    14: Mul r4 r1 r2
    15: Add r5 r4 r3
    16: Store A[i+3] r5
    17: loop_check
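The source transformation on this slide is loop unrolling by 4. Unrolling gives the scheduler independent work to overlap and is a common enabler of software pipelining, rather than software pipelining itself. Below is a C++ sketch that adds a remainder loop, so that n need not be a multiple of 4; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A[i] = A[i] * b + c, unrolled by 4: one loop check per four elements and
// four independent computations per trip that the hardware or compiler can overlap.
void scale_add_unrolled(std::vector<double>& A, double b, double c) {
    std::size_t n = A.size(), i = 0;
    for (; i + 4 <= n; i += 4) {
        A[i]     = A[i]     * b + c;
        A[i + 1] = A[i + 1] * b + c;
        A[i + 2] = A[i + 2] * b + c;
        A[i + 3] = A[i + 3] * b + c;
    }
    for (; i < n; ++i) A[i] = A[i] * b + c;   // leftover elements
}
```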

  96. Iterations with software pipelining
    Cycle | i = 1         | i = 2         | i = 3         | i = 4
    2     | Load r1 A[1]  |               |               |
    3     | Mul r4 r1 r2  |               |               |
    4     | Add r5 r4 r3  | Load r1 A[2]  |               |
    5     |               |               |               |
    6     | Store A[1] r5 | Mul r4 r1 r2  |               |
    7     |               | Add r5 r4 r3  | Load r1 A[3]  |
    8     |               |               |               |
    9     |               | Store A[2] r5 | Mul r4 r1 r2  |
    10    |               |               | Add r5 r4 r3  | Load r1 A[4]
    11    |               |               |               |
    12    |               |               | Store A[3] r5 | Mul r4 r1 r2
    13    |               |               |               | Add r5 r4 r3

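  97. Initiation Interval
    The table on the previous slide shows that, in the middle of the loop, the next iteration starts every 3 cycles (e.g. at cycles 6 and 9). This interval is called the Initiation Interval (II). The naive loop had an II of 6, and the smaller the II, the fewer clock cycles the loop needs.
    Software pipelining therefore lets the program run faster than the naive version. The gain comes from performing fewer loop termination checks between iterations and from arranging the program so that it fills the processor's pipeline by hand.
    Many modern processors use speculative execution to fetch and execute the next instructions in parallel without waiting for the termination check. Loops are therefore rarely executed as inefficiently as in the naive example. However, speculation depends on branch prediction accuracy and branch probabilities, so instruction scheduling in software remains effective even on today's advanced hardware.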
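Here is a rough model of why a smaller II helps. It assumes that every iteration has the same latency L from its first to its last instruction and that a new iteration starts every II cycles.

$$T(n) \approx L + (n - 1)\,\mathrm{II}$$

$$\frac{T_{\mathrm{naive}}(n)}{T_{\mathrm{pipelined}}(n)} \to \frac{\mathrm{II}_{\mathrm{naive}}}{\mathrm{II}_{\mathrm{pipelined}}} = \frac{6}{3} = 2 \quad (n \to \infty)$$

For long loops, the speedup therefore approaches the ratio of the initiation intervals. For short loops, the fixed latency L dominates.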

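  98. Processor-level parallel execution
    In the CPU part we noted that modern processors contain several identical computing units (multi-core) and can execute instructions in parallel. Here we introduce techniques that speed up a program by replicating the work and executing it in parallel on multiple units. There are two patterns, depending on how a single task is divided and executed.
    1. Data parallelism
    2. Task parallelism
    Parallel machines can also be classified into two types according to how main memory is connected.
    1. Shared memory
    2. Distributed memory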

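  99. Summary
    • Computers represent numbers in binary and perform Boolean operations with logic circuits built from switching circuits
    • Typical computers follow the organization called the von Neumann machine
    • A CPU executes the instructions defined by its ISA, and they are executed using shared logic circuits
    • CPUs are pipelined for speed, but hazards can reduce the pipeline's efficiency
    • Arranging memories of different speeds and capacities in a hierarchy mitigates the access time to main memory
    • CPUs can apply the same operation to multiple pieces of data
    • Compilers speed up programs by exploiting the properties of the CPU
    Much of the achievable speedup comes from making good use of the properties of the computer architecture itself. It is therefore important to properly understand the fundamentals of computers.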

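  100. Reflections and future plans
    In this presentation I dropped the GPU part I had planned entirely; my apologies. Because of limited presentation and preparation time, I also could not go deep into CPUs and compilers, and I would like to present those topics somewhere.
    With these lessons in mind, the topics I would like to present in the future are:
    • CPU: superscalar details, out-of-order (OoO) execution, thread-level parallelism, branch prediction, security issues
    • GPU: fundamentals, examples of GPU programming
    • Compilers: front ends, analyses such as data-flow analysis
    • Other: FPGA, Domain Specific Architecture
    Thank you for reading to the end.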

  101. Finally
    If you find mistakes or have comments, please let me know. Contact:
    Twitter ID @tkclimb0911

  102. References
    [1] マイクロプロセッサ・アーキテクチャ入門: RISC プロセッサの基礎から最新プロセッサのしくみまで [Introduction to Microprocessor Architecture: From RISC Fundamentals to the Workings of the Latest Processors]. Tech I. CQ 出版, 2004. ISBN 9784789833318. URL: https://books.google.co.jp/books?id=CQ4lPQAACAAJ.
    [2] 論理と計算のしくみ [The Mechanisms of Logic and Computation]. 岩波書店, 2007. ISBN 9784000061919. URL: https://www.iwanami.co.jp/book/b265606.html.
    [3] 誰がどうやってコンピュータを創ったのか? [Who Created the Computer, and How?]. 共立出版, 1995. ISBN 9784320027428. URL: https://books.google.co.jp/books?id=4ZEsAAAACAAJ.
    [4] H. Ando. プロセッサを支える技術: 果てしなくスピードを追求する世界 [The Technology Behind Processors: A World of Endless Pursuit of Speed]. Web+DB Press プラスシリーズ. 技術評論社, 2011. ISBN 9784774145211. URL: https://books.google.co.jp/books?id=NCRQYgEACAAJ.
    [5] Sarah Harris and David Harris. Digital Design and Computer Architecture: ARM Edition. 1st ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2015. ISBN 0128000562, 9780128000564.
    [6] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. 6th ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2017. ISBN 0128119055, 9780128119051.
    [7] David A. Patterson and John L. Hennessy. Computer Organization and Design RISC-V: The Hardware/Software Interface. 5th ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2017. ISBN 0124077269, 9780124077263.
    [8] poyopoyo Reconf. LSI からわかる自作 CPU [Homemade CPUs Understood from LSI]. 1st ed. 2018. URL: https://booth.pm/ja/items/1046056.
    [9] C. E. Shannon. "A symbolic analysis of relay and switching circuits". In: Electrical Engineering 57.12 (Dec. 1938), pp. 713–723. ISSN 0095-9197. DOI: 10.1109/EE.1938.6431064.
    [10] エイペル アンドリュー W. (Andrew W. Appel). 最新コンパイラ構成技法 [Modern Compiler Construction Techniques]. 翔泳社, 2009. ISBN 9784798114682. URL: https://books.google.co.jp/books?id=MzSFQgAACAAJ.
    [11] 中田育男. コンパイラの構成と最適化 (第 2 版) [Compiler Construction and Optimization, 2nd ed.]. 朝倉書店, 2009. ISBN 9784254121773. URL: https://www.asakura.co.jp/books/isbn/978-4-254-12177-3/.
    [12] 高濱徹行, 小倉久和. 情報の論理数学入門: ブール代数から述語論理まで [Introduction to Logical Mathematics for Information Science: From Boolean Algebra to Predicate Logic]. 近代科学社, 1991. ISBN 9784764901803. URL: https://books.google.co.jp/books?id=JGSYygAACAAJ.