Introduction to Computing Fundamentals and Speedup (コンピューティングの基礎と高速化入門)

tkclimb
April 27, 2019

Slides presented at "コンピューティングの基礎と処理の高速化入門 #1" (Introduction to Computing Fundamentals and Processing Speedup #1), an event hosted on connpass.
https://liberal-arts-for-tech.connpass.com/event/123273/

Transcript

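  1. Contents

    1. Principles of computation by computers
       1.1 Boolean algebra
       1.2 Logic circuits
       1.3 Electronic circuits
       1.4 Turing machines
       1.5 von Neumann computers
    2. Speeding up computation with the CPU
       2.1 What is a CPU?
       2.2 ISA and microarchitecture
       2.3 Instruction-level parallelism
       2.4 Memory hierarchy
       2.5 Data-level parallelism
    3. Speeding up computation with the compiler
       3.1 What is a compiler?
       3.2 Program representation and interpretation
       3.3 Compiler optimization
       3.4 Pipelining and scheduling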
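  2. Number representation inside a computer: binary

    Binary notation was established mathematically by Leibniz in the 17th century and is used for number representation in modern computers. A binary number expresses a value as a polynomial of coefficients and powers of the base 2. The value 147 is shown below in both decimal and binary.

        decimal: 147_{10} = 1 \times 10^2 + 4 \times 10^1 + 7 \times 10^0    (1)
        binary:  10010011_2 = 1 \times 2^7 + 1 \times 2^4 + 1 \times 2^1 + 1 \times 2^0 = 147_{10}    (2)

    Likewise, converting decimal 22 to binary gives:

        10110_2 = 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 22_{10}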
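    The conversion above can be sketched in C++ as repeated division by the base, with the reverse direction evaluating the polynomial. This is a minimal illustration; the helper names `to_binary` and `from_binary` are hypothetical and do not appear in the slides.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Convert a non-negative integer to its binary digit string by
// repeatedly dividing by the base 2 and collecting the remainders.
std::string to_binary(std::uint32_t value) {
    if (value == 0) return "0";
    std::string digits;
    while (value > 0) {
        digits.insert(digits.begin(), static_cast<char>('0' + (value % 2)));
        value /= 2;
    }
    return digits;
}

// Evaluate a binary digit string as a polynomial in powers of 2
// (Horner's scheme: multiply the running value by 2, add the next digit).
std::uint32_t from_binary(const std::string& digits) {
    std::uint32_t value = 0;
    for (char c : digits) {
        assert(c == '0' || c == '1');
        value = value * 2 + static_cast<std::uint32_t>(c - '0');
    }
    return value;
}
```

    For example, `to_binary(147)` yields the digit string of equation (2), and `from_binary("10110")` recovers decimal 22.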
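  3. Boolean algebra

    Boolean algebra was devised by George Boole in the 19th century. An algebraic system (B; +, ·, ¯) is called a Boolean algebra when it satisfies the axioms shown in the figure. By representing each digit of a binary number with Boolean algebra, computations such as addition and multiplication can be carried out algebraically.

    Figure 4: Axioms of Boolean algebra (Huntington's axioms) [12]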
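  4. Boolean functions and Boolean operations

    A Boolean function is a mapping f : B^n → B, defined over the set of Boolean values B = {0, 1} and its n-fold Cartesian product B^n. For example, f(1, 0, 0, 1, 0) = 1.

    A Boolean operation is a Boolean function of two variables, f : B × B → B. Boolean operations are used very frequently as the logical operations of a computer. Examples are shown below.

    Figure 5: Examples of Boolean operations [12]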
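  5. Representing two values with voltage

    We have seen that Boolean operations can be expressed with switching circuits. In a typical computer, a two-valued Boolean variable is represented by a high or low voltage.

        Voltage level | Value | Voltage
        Low           | 0     | GND
        High          | 1     | VDD

    Modern computers use MOS transistors as the switching elements. Electronic circuits built from transistors are structurally simple, highly reliable, and suitable for mass production.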
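  6. Adder circuits

    Below is an example of a full adder, a combinational circuit that performs addition. It outputs the result of a 1-bit addition together with the carry. By placing several full adders side by side and connecting each output carry (cout) to the next input carry, a multi-bit addition can be performed.

    Figure 20: Logic circuit of a full adder [5]

    Alternatively, a single full adder can be reused across clock cycles to perform a multi-bit addition. This approach was used in the era when arithmetic units were expensive (for example, in the EDVAC).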
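    The chaining of full adders described above can be modeled in software. The sketch below assumes the standard full-adder equations (sum = a XOR b XOR cin, carry = ab + cin(a XOR b)); the names `full_adder` and `ripple_add8` and the 8-bit width are illustrative choices, not taken from the slides.

```cpp
#include <cassert>
#include <cstdint>

// Outputs of a 1-bit full adder: the sum bit and the output carry (cout).
struct FullAdderOut {
    bool sum;
    bool cout;
};

// 1-bit full adder expressed with Boolean operations only.
FullAdderOut full_adder(bool a, bool b, bool cin) {
    const bool half = (a != b);                  // a XOR b
    const bool sum_bit = (half != cin);          // a XOR b XOR cin
    const bool carry_bit = (a && b) || (cin && half);
    return {sum_bit, carry_bit};
}

// 8-bit ripple-carry adder: each stage's cout feeds the next stage's cin.
std::uint8_t ripple_add8(std::uint8_t x, std::uint8_t y, bool& carry_out) {
    std::uint8_t result = 0;
    bool carry = false;
    for (int bit = 0; bit < 8; ++bit) {
        const bool a = ((x >> bit) & 1) != 0;
        const bool b = ((y >> bit) & 1) != 0;
        const FullAdderOut out = full_adder(a, b, carry);
        if (out.sum) result = static_cast<std::uint8_t>(result | (1u << bit));
        carry = out.cout;
    }
    carry_out = carry;
    return result;
}
```

    The loop also mirrors the second approach on the slide: one full adder reused once per step, with the carry held between steps.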
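  7. Principle of the Turing machine

    A Turing machine consists of a tape of infinite length, a head that reads the tape, and a table describing the rules that drive the machine. Each cell of the tape holds one symbol out of a finite set of symbols.

    Figure 23: Turing machine [2]

    The machine is always in exactly one state out of a finite set of states. The table defines an instruction for each combination of a state and a tape symbol. Because both sets are finite, the number of instructions in the table is finite as well.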
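  8. The von Neumann machine

    In today's computers, the tape, head, and table of the Turing machine are implemented as memory, the program counter, and the processor. The origin of this design is said to be the EDVAC, designed in 1945 by von Neumann, Eckert, Mauchly, and others, which became the prototype of what was later called the von Neumann computer [3]. A von Neumann machine has the following characteristics.

    • Instructions are executed sequentially
    • Data is fetched sequentially, starting from the low-order bits, from memory or registers, processed, and written back
    • The arithmetic hardware is shared among operations as much as possible
    • Instructions and data are stored in the same memory without distinction, and both can be operands
    • Memory is addressed, and the data to process is selected by address
    • There is one memory and one processor

    Von Neumann is thought to have arrived at this design under the influence of the Turing machine.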
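  9. What is a CPU?

    A CPU (Central Processing Unit) is generally a densely integrated electronic circuit (IC) that can execute a finite, predetermined set of instructions. Examples of instructions:

    1. Logical operations: AND, OR, etc.
    2. Basic arithmetic operations: Add, Mul, etc.
    3. Instructions that read data from or write data to memory: Load, Store, etc.
    4. Instructions that operate peripherals over a bus (in the case of port I/O)

    Well-known CPUs include Intel's Core series and Arm's Cortex-A series.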
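  10. ISA: Instruction Set Architecture

    The collection of instructions that a CPU can execute is called the ISA. The ISA is the interface between software and hardware and can be regarded as the architecture of the processor as seen by the programmer. It is also usually the lowest-level "software" that a programmer can use to control the computer.

    In contrast, the internal logic that implements the ISA is called the microarchitecture (μarch), which is the architecture of the processor as seen from the inside.

    For the sake of generality, CPUs from different vendors sometimes adopt the same ISA. For example, Intel and AMD use ISAs that are largely compatible with each other, yet their microarchitectures are completely different.

    These slides use RISC-V and x86 as example ISAs.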
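  11. Structure of an ISA

    As an example of an instruction format, here is the R-type format of RISC-V RV32I.

    Figure 27: Field layout of the RISC-V ISA [7]

    • opcode: the basic kind of instruction
    • rd: the register that receives the result (destination)
    • rs1: the register holding the first operand
    • rs2: the register holding the second operand
    • funct7: a field that supplies additional information

    Field order in the encoding below: funct7, rs2, rs1, funct3, rd, opcode.

        x5 = x6 + x7 -> 0000000, 00111, 00110, 000, 00101, 0110011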
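    The field layout above can be checked by packing the fields into a 32-bit word. The sketch assumes the standard RV32I R-type bit positions (funct7 at bit 25, rs2 at 20, rs1 at 15, funct3 at 12, rd at 7, opcode at 0); the function name `encode_r_type` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Pack the six R-type fields into one 32-bit instruction word.
// Field widths: funct7 = 7, rs2 = 5, rs1 = 5, funct3 = 3, rd = 5, opcode = 7 bits.
std::uint32_t encode_r_type(std::uint32_t funct7, std::uint32_t rs2,
                            std::uint32_t rs1, std::uint32_t funct3,
                            std::uint32_t rd, std::uint32_t opcode) {
    assert(funct7 < 128 && rs2 < 32 && rs1 < 32);
    assert(funct3 < 8 && rd < 32 && opcode < 128);
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}
```

    With the slide's values, `encode_r_type(0b0000000, 7, 6, 0b000, 5, 0b0110011)` gives the word `0x007302B3` for `add x5, x6, x7`.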
  12. Example of machine code and assembly language: x86

    Below are the x86 machine code and assembly (GAS syntax) for a function add that adds two variables. x86 is a CISC architecture, and the listing shows that instruction lengths are not uniform.

        int add(int a, int b) { return a + b; }

        _add:
          55          // pushq %rbp
          48 89 e5    // movq  %rsp, %rbp
          8d 04 37    // leal  (%rdi,%rsi), %eax
          5d          // popq  %rbp
          c3          // retq
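  13. How the ISA is executed on the CPU: arithmetic instructions

    The following shows the CPU datapath when an add instruction is executed.

        // add the values of two registers
        add x1, x2, x3    // a = b + c

    1. The instruction is fetched and the PC is incremented
    2. The instruction is decoded and control signals are sent to each unit
    3. Following those signals, the data of x2 and x3 is read from the register file
    4. The ALU adds the two values
    5. The ALU output is written to x1 in the register file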
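  14. How the ISA is executed on the CPU: memory instructions

    The following shows the CPU datapath when a load instruction is executed.

        // load a value from an array into a register
        lw x1, 4(x6)    // x = a[i + 1]

    1. The instruction is fetched and the PC is incremented
    2. The instruction is decoded and control signals are sent to each unit
    3. The data of base register x6 is read from the register file
    4. The ALU adds that value and the constant (4) to compute the address
    5. DataMemory loads the data at the computed address
    6. The DataMemory output is written to x1 in the register file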
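  15. How the ISA is executed on the CPU: branch instructions

    The following shows the CPU datapath when a branch instruction is executed.

        // compare two variables and branch if they are equal
        beq x1, x2, offset    // if (x1 == x2) pc += offset;

    1. The instruction is fetched and the PC is incremented
    2. The instruction is decoded and control signals are sent to each unit
    3. The data of registers x1 and x2 is read from the register file
    4. The ALU adds the PC and the offset, which is doubled and sign-extended first
    5. If x1 and x2 are equal, the result overwrites the PC

    The offset field is 12 bits wide. It is doubled because the RISC-V specification requires instruction lengths to be a multiple of 2 bytes.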
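    Steps 4 and 5 above can be sketched as a next-PC computation. The sketch assumes 32-bit instructions (so a branch that is not taken continues at pc + 4) and treats the 12-bit offset field as a two's-complement value; the names `sign_extend12` and `next_pc_beq` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 12-bit two's-complement field to a 32-bit signed value.
std::int32_t sign_extend12(std::uint32_t field) {
    assert(field <= 0xFFFu);
    const std::int32_t value = static_cast<std::int32_t>(field);
    return (field & 0x800u) ? value - 0x1000 : value;
}

// Next PC for `beq x1, x2, offset`: on equality the doubled, sign-extended
// offset is added to the PC; otherwise execution falls through to pc + 4
// (assumption: fixed 32-bit instruction width).
std::uint32_t next_pc_beq(std::uint32_t pc, std::uint32_t x1, std::uint32_t x2,
                          std::uint32_t offset_field) {
    if (x1 == x2) {
        const std::int32_t byte_offset = sign_extend12(offset_field) * 2;
        return pc + static_cast<std::uint32_t>(byte_offset);
    }
    return pc + 4;
}
```

    A field value of `0xFFE` is -2 after sign extension, so a taken branch at `0x100` jumps back 4 bytes to `0xFC`.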
  16. Major ISAs

    • x86: widely used, from consumer computers to workstations.
    • ARM: widely used in mobile devices, microcontrollers, and embedded systems, e.g. smartphones, Raspberry Pi, Nintendo DS.
    • RISC-V: announced as open source by UCB in 2010, and gaining momentum recently.
    • MIPS: used in game consoles and embedded systems, e.g. PlayStation, Nintendo 64.
    • PowerPC: used in home game consoles and supercomputers, e.g. IBM workstations, older Macintosh models, PlayStation3, Xbox360.
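  17. What does "fast" mean for a CPU?: CPI

    CPI (cycles per instruction) is the number of clock cycles a CPU needs to execute an instruction. It serves as a measure of how fast a given microarchitecture runs a given ISA.

    Let IC_i be the number of times instruction i is executed in the program, IC the total instruction count, and CPI_i the number of clock cycles instruction i requires. CPI is then:

        CPI = \sum_{i=1}^{n} \frac{IC_i}{IC} \times CPI_i

    CPI makes it possible to compare different ISAs and microarchitecture implementations. However, when the processor is pipelined, it is difficult to determine this value concretely.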
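  18. Worked example of a CPI calculation

    Let us compute the CPI (CPI_base) for a program and processor with the following characteristics.

    • Fraction of floating-point operations: Freq_fp = 25%
    • Average CPI of floating-point operations: CPI_fp = 4.0
    • Average CPI of other operations: CPI_others = 1.33
    • Fraction of square-root operations: Freq_sqrt = 2%
    • CPI of square-root operations: CPI_sqrt = 20

        CPI_{base} = (CPI_{fp} \times Freq_{fp}) + (CPI_{others} \times (1 - Freq_{fp}))    (3)
                   = (4 \times 0.25) + (1.33 \times 0.75) \approx 2.0    (4)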
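  19. Worked example of a CPI calculation

    Consider which of the following two design alternatives improves performance more.

    1. Reduce the CPI of square-root operations to 2 (CPI_1)
    2. Reduce the CPI of all floating-point operations to 2.5 (CPI_2)

    With CPI_newsqrt as the new square-root CPI and CPI_newfp as the new floating-point CPI, the two cases are computed as follows (taking CPI_others = 1.33 ≈ 4/3).

        CPI_1 = CPI_{base} - Freq_{sqrt} \times (CPI_{sqrt} - CPI_{newsqrt})    (5)
              = 2.0 - 0.02 \times (20 - 2) = 1.64    (6)
        CPI_2 = ((1 - Freq_{fp}) \times CPI_{others}) + (Freq_{fp} \times CPI_{newfp})    (7)
              = (0.75 \times 1.33) + (0.25 \times 2.5) \approx 1.625    (8)

    Since CPI_2 < CPI_1, alternative 2 yields slightly better performance.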
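    The CPI formula and the two design alternatives can be reproduced numerically. The sketch below is an illustration under the slide's figures; the type `InstrClass` and the function names are hypothetical. With CPI_others entered as 1.33 rather than 4/3, the results come out as roughly 1.9975 and 1.6225, which the slides round to 2.0 and 1.625.

```cpp
#include <cassert>
#include <vector>

// One class of instructions: its share of the executed instructions
// (IC_i / IC) and its cycles per instruction (CPI_i).
struct InstrClass {
    double frequency;
    double cpi;
};

// Weighted average CPI = sum over classes of frequency * CPI_i.
double average_cpi(const std::vector<InstrClass>& classes) {
    double total = 0.0;
    for (const InstrClass& c : classes) {
        assert(c.frequency >= 0.0 && c.frequency <= 1.0);
        total += c.frequency * c.cpi;
    }
    return total;
}

// CPI after speeding up one class: subtract the cycles saved per instruction.
double cpi_after_speedup(double base_cpi, double frequency,
                         double old_cpi, double new_cpi) {
    return base_cpi - frequency * (old_cpi - new_cpi);
}
```

    Calling `average_cpi` with the classes {0.25, 4.0} and {0.75, 1.33} corresponds to equations (3) and (4), and `cpi_after_speedup(2.0, 0.02, 20.0, 2.0)` corresponds to equations (5) and (6).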
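  20. Data hazards

    A data hazard occurs when an operand of an instruction depends on the result of a preceding instruction. In the example below, the second instruction (sub) depends on the result of the preceding add, so it waits in the RF stage. The sub instruction cannot execute until the add instruction reaches the WB stage. Cycles in which the pipeline is halted like this are called pipeline stalls (bubbles).

        add x5, x0, x1
        sub x2, x5, x3

    Figure 38: Example of a data hazard in a pipeline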
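  21. Structural hazards

    A structural hazard arises when two or more pipeline stages compete for a compute resource of which there is only one.

    In a RISC design every instruction passes through the stages in order, so instructions rarely compete for resources and structural hazards are seldom a problem. However, when instruction fetches and data accesses share the same path to memory, some architectures must give priority to the data access and delay the fetch of the following instruction.

    Figure 43: Example of a structural hazard in a pipeline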
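  22. Speedup example 2: cache

    The storage placed between the registers and main memory is called the cache. It generally consists of two or three levels of small memories built from SRAM. When a memory access occurs, the processor first queries the cache for the data. If the data is present, the cache returns it (a hit). If not, the request goes to the next lower level of the memory hierarchy (a miss).

    Figure 46: Accessing the cache [1]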
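    The hit and miss behavior described above can be illustrated with a toy model. The slides do not specify a cache organization, so this sketch assumes a single-level, direct-mapped cache that tracks only tags; the class name `DirectMappedCache` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy direct-mapped cache: records which memory block each line holds
// and reports whether an access hits or misses. No data is stored.
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t num_lines, std::size_t line_bytes)
        : line_bytes_(line_bytes), tags_(num_lines, 0), valid_(num_lines, false) {
        assert(num_lines > 0 && line_bytes > 0);
    }

    // Returns true on a hit. On a miss the line is filled, modeling the
    // fetch from the next lower level of the memory hierarchy.
    bool access(std::uint64_t address) {
        const std::uint64_t block = address / line_bytes_;
        const std::size_t index = static_cast<std::size_t>(block % tags_.size());
        const std::uint64_t tag = block / tags_.size();
        if (valid_[index] && tags_[index] == tag) return true;
        valid_[index] = true;
        tags_[index] = tag;
        return false;
    }

private:
    std::size_t line_bytes_;
    std::vector<std::uint64_t> tags_;
    std::vector<bool> valid_;
};
```

    With 4 lines of 16 bytes, accessing address 0 misses, address 8 then hits because it lies in the same line, and address 64 maps to the same line index and evicts it.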
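  23. Example of exploiting the cache: matrix product

    Two C++ implementations of the matrix product are compared by execution time (N = 1000).

    Naive matrix product (3314 ms):

        for (int i = 0; i < N; ++i)
          for (int j = 0; j < N; ++j) {
            C[i*N+j] = 0;
            for (int k = 0; k < N; ++k)
              C[i*N+j] += A[i*N+k] * B[k*N+j];
          }

    Matrix product with B transposed (2518 ms):

        for (int i = 0; i < N; ++i)
          for (int j = 0; j < N; ++j) {
            C[i*N+j] = 0;
            for (int k = 0; k < N; ++k)
              C[i*N+j] += A[i*N+k] * B_trans[j*N+k];
          }

    In the second version the innermost loop reads both A and B_trans at consecutive addresses, which makes better use of each cache line.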
  24. Speedup example 4: multimedia SIMD

    SIMD grew out of attempts to compute multimedia data such as audio and images efficiently, since such data uses short widths like 8 or 16 bits. For example, a 32-bit arithmetic unit can be split into four 8-bit lanes, and by stopping carry propagation at the lane boundaries, four additions can be computed at once. Typically, arithmetic units and registers of 128 to 512 bits are provided.

    Using SIMD reduces the instruction count compared with issuing ordinary instructions repeatedly.

        fld     f0, a         # Load scalar a
        splat.4D f0, f0       # Make 4 copies of a
        fld.4D  f1, 0(x5)     # Load X[i] ... X[i+3]
        fmul.4D f1, f1, f0    # f1[0] = f1[0]*a; ...; f1[3] = f1[3]*a
        fld.4D  f2, 0(x6)     # Load Y[i] ... Y[i+3]
        fadd.4D f2, f2, f1    # f2[0] = f2[0]+f1[0]; ...; f2[3] = f2[3]+f1[3]
        fsd.4D  f2, 0(x6)     # Store Y[i] ... Y[i+3]
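    The idea of splitting a 32-bit adder into four 8-bit lanes by stopping carry propagation can be imitated in plain C++ with bit masks. This is a software analogy rather than how SIMD hardware is built; the function name `add_4x8` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Add four packed 8-bit lanes held in 32-bit words, wrapping each lane
// modulo 256. Carries are prevented from crossing lane boundaries.
std::uint32_t add_4x8(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t low7 = 0x7F7F7F7Fu;   // lower 7 bits of every lane
    const std::uint32_t top = 0x80808080u;    // top bit of every lane
    // Adding only the low 7 bits cannot carry out of a lane.
    const std::uint32_t partial = (a & low7) + (b & low7);
    // The top bit of each lane is a XOR b XOR (carry from the low 7 bits).
    return partial ^ ((a ^ b) & top);
}
```

    For example, the lanes `01 FF 7F 10` plus `01 01 80 01` give `02 00 FF 11`: the second lane wraps to zero without disturbing the first.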
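  25. Drawbacks of SIMD and the vector architecture

    Multimedia SIMD has the following drawbacks.

    • A separate instruction exists for each vector length, so the number of instructions tends to grow
    • There are no addressing modes such as gather/scatter
    • Per-element conditional execution is not supported

    A vector architecture is more flexible than SIMD and is designed to handle more advanced data-parallel operations. It lines up multiple registers and arithmetic units and combines them dynamically, enabling deep pipelines and parallel functional units.

    This allows parallel execution and reduces instruction overhead without adding instructions. Some vector architectures also support gather/scatter and conditional vector operations.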
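  26. Speedup example 5: vector architecture

    A vector architecture consists of the following components.

    • Vector registers: each holds one vector. RV64V has 32 × 64-bit registers. They are connected to the inputs and outputs of the vector functional units through a crossbar switch.
    • Vector functional units: units that perform a specified vector operation on multiple values. Each is fully pipelined and can start a new operation every cycle.
    • Vector memory unit: the unit that loads and stores vector data from and to memory. It is fully pipelined and, after an initial overhead, can move a new word every cycle.
    • Scalar registers: used to compute the addresses for the vector memory unit and to supply scalar values to the vector functional units.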
  27. Vector architecture assembly (DAXPY)

    Using an operation common in vector computation as the example, the RISC-V vector assembly is shown below. Given two vectors X and Y and a scalar a, it computes:

        Y = a \times X + Y

        vsetdcfg 4*FP64     # Enable 4 DP FP vregs
        fld      f0, a      # Load scalar a
        vld      v0, x5     # Load vector X
        vmul     v1, v0, f0 # Vector-scalar multiply
        vld      v2, x6     # Load vector Y
        vadd     v3, v1, v2 # Vector-vector add
        vst      v3, x6     # Store the sum
        vdisable            # Disable vector regs
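    For reference, the same DAXPY computation written as an ordinary scalar loop looks as follows. This is a plain C++ sketch (the function name `daxpy` and the use of `std::vector` are illustrative); the vector assembly above performs the work of this loop body across many elements per instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// DAXPY: y[i] = a * x[i] + y[i] for every element, in double precision.
void daxpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

    Each iteration is independent of the others, which is what allows the loop to be expressed with vector instructions.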
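  28. Compiled and interpreted execution

    Language implementations come in two styles. In the compiled style, the input program is translated completely into machine code, emitted as a file, and then executed. In the interpreted style, no file is generated and the input program is interpreted and executed directly.

    • Compiled: C, FORTRAN, Rust
    • Interpreted: Python, Ruby, JavaScript

    Some implementations translate the input program into virtual instructions rather than real machine code. They execute by running, as software, a program that interprets those virtual instructions.

    Java is an example of a programming language that uses a virtual machine. Java can be executed in both the compiled and the interpreted style.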
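  29. Compiler optimization techniques

    Optimizing the object program means making it efficient. Optimization can target execution speed, code size, and more, but this material deals with speed. Broadly, there are three ways to improve execution speed [11].

    1. Reduce the number of instructions executed
    2. Use faster instructions
    3. Increase the degree of parallelism

    How these are combined depends on the target machine. For example, on a superscalar machine that can execute several instructions at once, item 3 is effective. On processors with many compound instructions, such as CISC machines, using those instructions well reduces the number of instructions executed.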
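  30. Reducing the number of instructions executed

    The number of instructions executed can be reduced in the following ways [11].

    1. Reuse a result that has already been computed: Common Subexpression Elimination, etc.
    2. Execute what can be executed at compile time: Constant Folding, etc.
    3. Move instructions to places that execute less often: Code Motion, etc.
    4. Transform the shape of the program to reduce execution counts: Loop Transformation, etc.
    5. Transform the computation using properties of expressions: algebraic and logical identities, etc.
    6. Remove redundant instructions: Dead Code Elimination, etc.
    7. Specialize: Function Inlining, replacing tests, etc.

    Only some of these are covered here. Several of them are in fact the same transformation at heart. For example, Common Subexpression Elimination and Code Motion are both special cases of Partial Redundancy Elimination.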
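  31. Common Subexpression Elimination

    When statements contain a common expression, as below, the program is changed to compute it once, keep it in a temporary variable, and reuse that variable.

    Before:

        c = a + b          // (1)
        ...
        e = (a + b) * d    // (2)

    After:

        t = a + b          // computed once into a temporary
        c = t              // first use of the temporary
        ...
        e = t * d          // second use of the temporary

    Eliminating a common subexpression generally requires three conditions.

    i.   The a + b in (1) and in (2) have the same form
    ii.  The computation of (1) always happens before the computation of (2)
    iii. The values of a and b do not change between (1) and (2)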
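  32. Constant Folding

    Constant folding is the optimization that evaluates constant computations at compile time. For example, take the following program.

        a = 1.0 + 2.0
        b = a * 3.0
        d = c - b    // c is a variable

    Computing a ahead of time as the constant 3.0, and then using it to compute b as 9.0, finally gives:

        d = c - 9.0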
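    In C++ the same folding can be made explicit with `constexpr`, which asks the compiler to evaluate the constants itself. This is an illustration of the idea rather than a description of what any particular compiler does internally; the namespace and function names are hypothetical. Folding a = 3.0 into b gives 9.0, so the remaining run-time work is a single subtraction.

```cpp
#include <cassert>

namespace folding_demo {

// Both values are constant expressions, evaluated at compile time.
constexpr double a = 1.0 + 2.0;   // 3.0
constexpr double b = a * 3.0;     // 9.0
static_assert(b == 9.0, "b is a compile-time constant");

// c is only known at run time, so only this subtraction remains.
double compute_d(double c) {
    return c - b;
}

// Small self-check: 10.0 - 9.0 is exactly 1.0.
void check_folding() {
    assert(compute_d(10.0) == 1.0);
}

}  // namespace folding_demo
```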
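  33. Loop Motion

    An element whose value does not change inside a loop is called loop invariant. Loop Motion (loop-invariant code motion) is an optimization that moves loop-invariant computations out of the loop to reduce the number of computations. A simple example follows.

    Before:

        for i = 1, n
          ...
          a = b * c
          d = i * 2
          ...
        end

    After:

        a = b * c
        for i = 1, n
          ...
          d = i * 2
          ...
        end

    The amount of computation saved by Loop Motion generally depends on the iteration count of the loop and on the cost of the operator. In the example above, n-1 multiplications are eliminated.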
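    The before and after forms can be written as two C++ functions that must return the same result. The function names are hypothetical, and the loop body is a made-up stand-in for the elided "..." parts of the slide. Optimizing compilers typically perform this hoisting on their own, so the sketch only makes the transformation visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: b * c is recomputed in every iteration although it never changes.
std::vector<int> add_product_naive(const std::vector<int>& in, int b, int c) {
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const int a = b * c;   // loop invariant
        out[i] = in[i] + a;
    }
    return out;
}

// After: the invariant product is hoisted, so it is computed once
// instead of once per iteration.
std::vector<int> add_product_hoisted(const std::vector<int>& in, int b, int c) {
    std::vector<int> out(in.size());
    const int a = b * c;       // computed once, outside the loop
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + a;
    }
    return out;
}
```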
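  34. Using faster instructions

    1. Exploit the memory hierarchy: Register Allocation, Tiling, etc.
    2. Exploit the fast instructions the processor provides: SIMD, incremental condition, etc.
    3. Replace instructions with simpler ones using logical or mathematical information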
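  35. Example of a matrix product with loop tiling applied

    The loops are written as `for var = start, end[, step]`, with tile size t. B is indexed as B(j,k), which assumes it is stored transposed, as B_trans in the earlier matrix-product example.

        for ti = 1, n, t
          for tj = 1, n, t
            for tk = 1, n, t
              for i = ti, min(ti + t - 1, n)
                for j = tj, min(tj + t - 1, n)
                  for k = tk, min(tk + t - 1, n)
                    C(i,j) += A(i,k) * B(j,k)
                  end // k
                end // j
              end // i
            end // tk
          end // tj
        end // ti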
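    A runnable C++ version of the tiled loop nest could look like the sketch below. It assumes row-major square matrices stored in `std::vector<double>` and a second operand that is already transposed (`B_trans`), mirroring the earlier matrix-product slide; the function name and the tile parameter are hypothetical, and a good tile size depends on the cache of the target machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tiled matrix product C = A * B, where B_trans holds B transposed.
// All matrices are n x n, row-major. Each tile triple (ti, tj, tk) works
// on small blocks so the data in use is more likely to stay in the cache.
void matmul_tiled(const std::vector<double>& A, const std::vector<double>& B_trans,
                  std::vector<double>& C, std::size_t n, std::size_t tile) {
    assert(tile > 0);
    assert(A.size() == n * n && B_trans.size() == n * n && C.size() == n * n);
    std::fill(C.begin(), C.end(), 0.0);
    for (std::size_t ti = 0; ti < n; ti += tile)
        for (std::size_t tj = 0; tj < n; tj += tile)
            for (std::size_t tk = 0; tk < n; tk += tile) {
                // Clamp the tile ends so partial tiles at the edges work.
                const std::size_t i_end = std::min(ti + tile, n);
                const std::size_t j_end = std::min(tj + tile, n);
                const std::size_t k_end = std::min(tk + tile, n);
                for (std::size_t i = ti; i < i_end; ++i)
                    for (std::size_t j = tj; j < j_end; ++j) {
                        double acc = C[i * n + j];
                        for (std::size_t k = tk; k < k_end; ++k)
                            acc += A[i * n + k] * B_trans[j * n + k];
                        C[i * n + j] = acc;
                    }
            }
}
```

    Multiplying by the identity matrix is a quick sanity check: the result must equal A for any tile size, including one that does not divide n.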
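  36. Software pipelining

    This section shows an example of software pipelining on a processor that satisfies the following conditions.

    • One arithmetic instruction and one memory instruction can execute at the same time
    • Arithmetic instructions complete in 1 cycle
    • Among memory instructions, a load completes in 2 cycles and a store in 1 cycle

    Below are a program that performs a simple computation inside a loop and its assembly. The following slides show how instruction execution progresses across the iterations of this program.

        for i = 1, n
          A[i] = A[i] * b + c
        end

        1: Load  r1 A[i]
        2: Mul   r4 r1 r2    // b in r2
        3: Add   r5 r4 r3    // c in r3
        4: Store A[i] r5
        5: loop_check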
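  37. Iterations without software pipelining

        Cycle | i = 1         | i = 2         | i = 3 | i = 4
        2     | Load r1 A[1]  |               |       |
        3     | Mul r4 r1 r2  |               |       |
        4     | Add r5 r4 r3  |               |       |
        5     | Store A[1] r5 |               |       |
        6     |               |               |       |
        7     | loop check    |               |       |
        8     |               | Load r1 A[2]  |       |
        9     |               | Mul r4 r1 r2  |       |
        10    |               | Add r5 r4 r3  |       |
        11    |               | Store A[2] r5 |       |
        12    |               |               |       |
        13    |               | loop check    |       |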
  38. Program and assembly with software pipelining applied

    The loop body is unrolled four times so that the end-of-loop check runs once per four elements.

        for i = 1, n, 4
          A[i]   = A[i]   * b + c
          A[i+1] = A[i+1] * b + c
          A[i+2] = A[i+2] * b + c
          A[i+3] = A[i+3] * b + c
        end

         1: Load  r1 A[i]
         2: Mul   r4 r1 r2
         3: Add   r5 r4 r3
         4: Store A[i] r5
         5: Load  r1 A[i+1]
         6: Mul   r4 r1 r2
         7: Add   r5 r4 r3
         8: Store A[i+1] r5
         9: Load  r1 A[i+2]
        10: Mul   r4 r1 r2
        11: Add   r5 r4 r3
        12: Store A[i+2] r5
        13: Load  r1 A[i+3]
        14: Mul   r4 r1 r2
        15: Add   r5 r4 r3
        16: Store A[i+3] r5
        17: loop_check
  39. Iterations with software pipelining

        Cycle | i = 1         | i = 2         | i = 3         | i = 4
        2     | Load r1 A[1]  |               |               |
        3     | Mul r4 r1 r2  |               |               |
        4     | Add r5 r4 r3  | Load r1 A[2]  |               |
        5     |               |               |               |
        6     | Store A[1] r5 | Mul r4 r1 r2  |               |
        7     |               | Add r5 r4 r3  | Load r1 A[3]  |
        8     |               |               |               |
        9     |               | Store A[2] r5 | Mul r4 r1 r2  |
        10    |               |               | Add r5 r4 r3  | Load r1 A[4]
        11    |               |               |               |
        12    |               |               | Store A[3] r5 | Mul r4 r1 r2
        13    |               |               |               | Add r5 r4 r3
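  40. Initiation Interval

    The table on the previous slide shows that, once the loop is under way, a new iteration starts every 3 cycles (for example, cycle 6 to cycle 9). This interval is called the Initiation Interval (II). The II of the original naive loop was 6, which illustrates that the smaller the II, the fewer clock cycles the loop requires.

    This shows that software pipelining lets the program run faster than the naive version. The gain comes from reducing the end-of-loop checks between iterations and arranging the program so that it manually fills the processor pipeline.

    Many modern processors use speculative execution to fetch the following instructions without waiting for the loop check and execute them in parallel, so execution is rarely as inefficient as in the naive loop example. However, this depends on branch-prediction accuracy and on branch probabilities, so instruction scheduling in software remains effective even on today's advanced hardware.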
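  41. Summary

    • Computers represent numbers in binary and perform Boolean operations with logic circuits built from switching circuits
    • A typical computer is organized as a von Neumann machine
    • A CPU executes the instructions defined by its ISA, and they are executed on shared logic circuits
    • CPUs are pipelined for speed, but hazards can reduce pipeline efficiency
    • Layering memories of different speed and capacity mitigates the access time to main memory
    • A CPU can apply the same operation to multiple data items
    • Compilers exploit the properties of the CPU to speed up programs

    Most speedups are achieved by making good use of the properties of the computer architecture itself. A solid understanding of basic computer knowledge is therefore important.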
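  42. Reflections and outlook

    In this presentation I ended up dropping the planned GPU part entirely, for which I apologize. Because of limits on presentation time and preparation time, I also could not go deep into the CPU and the compiler, so I would like to present those topics elsewhere.

    With that in mind, the topics I would like to present in the future are:

    • CPU: details of superscalar execution, OoO execution, thread-level parallelism, branch prediction, security issues
    • GPU: fundamentals, examples of GPU programming
    • Compiler: front end, analyses such as data-flow analysis
    • Other: FPGA, Domain Specific Architecture

    Thank you for reading to the end.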
  43. References

    [1] マイクロプロセッサ・アーキテクチャ入門: RISC プロセッサの基礎から最新プロセッサのしくみまで. Tech I. CQ 出版, 2004. isbn: 9784789833318. url: https://books.google.co.jp/books?id=CQ4lPQAACAAJ.
    [2] 論理と計算のしくみ. 岩波書店, 2007. isbn: 9784000061919. url: https://www.iwanami.co.jp/book/b265606.html.
    [3] 誰がどうやってコンピュータを創ったのか? 共立出版, 1995. isbn: 9784320027428. url: https://books.google.co.jp/books?id=4ZEsAAAACAAJ.
    [4] H. Ando. プロセッサを支える技術: 果てしなくスピードを追求する世界. Web+DB Press プラスシリーズ. 技術評論社, 2011. isbn: 9784774145211. url: https://books.google.co.jp/books?id=NCRQYgEACAAJ.
    [5] Sarah Harris and David Harris. Digital Design and Computer Architecture: ARM Edition. 1st. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2015. isbn: 0128000562, 9780128000564.
    [6] John L. Hennessy and David A. Patterson. Computer Architecture, Sixth Edition: A Quantitative Approach. 6th. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2017. isbn: 0128119055, 9780128119051.
    [7] David A. Patterson and John L. Hennessy. Computer Organization and Design RISC-V: The Hardware/Software Interface. 5th. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2017. isbn: 0124077269, 9780124077263.
    [8] poyopoyo Reconf. LSI からわかる自作 CPU. 1st. 2018. url: https://booth.pm/ja/items/1046056.
    [9] C. E. Shannon. "A symbolic analysis of relay and switching circuits". In: Electrical Engineering 57.12 (Dec. 1938), pp. 713–723. issn: 0095-9197. doi: 10.1109/EE.1938.6431064.
    [10] エイペル アンドリュー W. 最新コンパイラ構成技法. 翔泳社, 2009. isbn: 9784798114682. url: https://books.google.co.jp/books?id=MzSFQgAACAAJ.
    [11] 中田育男. コンパイラの構成と最適化 (第2版). 朝倉書店, 2009. isbn: 9784254121773. url: https://www.asakura.co.jp/books/isbn/978-4-254-12177-3/.
    [12] 高浜 徹行, 小倉 久和. 情報の論理数学入門: ブール代数から述語論理まで. 近代科学社, 1991. isbn: 9784764901803. url: https://books.google.co.jp/books?id=JGSYygAACAAJ.