
hpc170_slide.pdf


My slides from the 170th HPC Research Meeting (第170回HPC研究会, SWoPP2019).
http://id.nii.ac.jp/1001/00198056/

Kazuhiro Serizawa

July 24, 2019

Transcript

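  1. Accelerating I/O Performance in Large-Scale Machine Learning Training
    Kazuhiro Serizawa, Osamu Tatebe
    Graduate School of Systems and Information Engineering, University of Tsukuba; Center for Computational Sciences, University of Tsukuba
    IPSJ SIG Technical Report, HPC Research Meeting (情報処理学会 HPC研究会報告), held at Kitami Civic Hall (北見市民会館), Kitami, Hokkaido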
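  2. Outline
    • Background and purpose of the research
    • Overview of deep neural networks and the impact of reading training data
    • Conventional methods for accelerating read I/O
    • Proposed method
    • Evaluation experiments: benchmark evaluation and real-application evaluation
    • Related work
    • Summary and future work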
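  3. Background (1/2)
    • In recent years, solving image classification tasks with deep neural networks (DNNs) has been widely adopted
    • Training an image classification model with a DNN, however, generally requires reading tens of thousands (e.g. Cifar100) to millions (e.g. ImageNet) of training images, so the time spent on read I/O becomes a bottleneck that cannot be ignored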
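  4. Background (2/2)
    • A real case in which reading the training data is the rate-limiting step
    [Figure: GPU kernel execution during training on Cifar and ImageNet, shown as heat maps of GPU utilization over training iterations. Source: Serizawa and Tatebe, "深層ニューラルネットワークにおける訓練高速化のための自動最適化" (automatic optimization for accelerating DNN training), IPSJ SIG HPC Technical Report]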

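  5. Purpose of the research
    • Propose a method that reduces the time spent on read I/O in DNN training and thereby improves training speed
    • Implement the proposed method in Chainer, a machine learning framework, and compare its read I/O performance with the existing functionality that Chainer provides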

  6. Conventional methods (1/2)
    • Parallel read
    • Mini-batch buffering

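  7. Conventional methods (2/2)
    • Chainer provides a feature that combines the two techniques, parallel read and mini-batch buffering
    • Even with such a mechanism, depending on the I/O performance of the storage in use, reading the training data cannot keep up with processing on the GPU side and limits overall training speed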
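The two conventional techniques named on slides 6 and 7 can be sketched with the Python standard library alone. This is a minimal illustration, not Chainer's implementation: it uses threads rather than processes for brevity, and `read_sample`, `batch_producer`, `iterate_batches`, and all their parameters are hypothetical names introduced here.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def read_sample(index):
    """Hypothetical stand-in for reading and decoding one training image."""
    return index * index


def batch_producer(indices, batch_size, n_workers, buffer):
    """Read samples in parallel and push complete mini-batches into the buffer."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            # Parallel read: the samples of one mini-batch are read concurrently
            # and returned in index order.
            buffer.put(list(pool.map(read_sample, chunk)))
    buffer.put(None)  # sentinel: no more mini-batches


def iterate_batches(indices, batch_size=4, n_workers=4, n_prefetch=2):
    """Yield mini-batches while a background thread keeps the buffer filled."""
    # Mini-batch buffering: the producer may run at most n_prefetch batches ahead.
    buffer = queue.Queue(maxsize=n_prefetch)
    producer = threading.Thread(
        target=batch_producer,
        args=(indices, batch_size, n_workers, buffer),
        daemon=True,
    )
    producer.start()
    while True:
        batch = buffer.get()
        if batch is None:
            break
        yield batch
    producer.join()


batches = list(iterate_batches(list(range(10))))
```

If `read_sample` is slower than the consumer, the buffer runs empty and the consumer blocks in `buffer.get()`, which is the situation slide 7 describes for slow storage.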

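  8. Storage in HPC clusters
    • Compute nodes of HPC clusters are generally equipped with fast node-local storage such as NVMe SSDs, but files must be copied from shared storage for every compute job
    • For reference, on the HPC cluster "Cygnus", copying the ImageNet images directory by directory from shared storage to node-local storage took on the order of minutes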

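  9. Proposed method (1/4): difference from the conventional methods
    • Copy the training data files in parallel from shared storage to node-local storage, and read the files in the order in which their copies complete to generate mini-batches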

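  10. Proposed method (2/4)
    • With the proposed method, training can start without waiting for the file copy to finish, even when node-local storage is used
    [Figure: timelines of the conventional and proposed methods, showing file copy to node-local storage, mini-batch generation, and training against elapsed time]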
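The idea on slides 9 and 10 (copy in parallel, then consume files in completion order instead of waiting for the whole copy) can be sketched as follows. This is an assumption-laden illustration, not the authors' Chainer implementation: `prefetch_and_read`, `make_minibatches`, the thread-based copy pool, and the temporary directories standing in for shared and node-local storage are all hypothetical.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def prefetch_and_read(shared_dir, local_dir, n_copy_workers=4):
    """Copy files from shared_dir to local_dir in parallel and yield
    (file name, contents) as soon as each individual copy has finished."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    sources = sorted(p for p in Path(shared_dir).iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=n_copy_workers) as pool:
        futures = [pool.submit(shutil.copy, src, local_dir / src.name) for src in sources]
        for done in as_completed(futures):
            local_path = Path(done.result())  # shutil.copy returns the destination
            yield local_path.name, local_path.read_bytes()


def make_minibatches(samples, batch_size):
    """Group an iterable of samples into lists of at most batch_size items."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Demo with temporary directories standing in for the two storage tiers.
with tempfile.TemporaryDirectory() as shared, tempfile.TemporaryDirectory() as local:
    for i in range(5):
        (Path(shared) / f"img_{i}.bin").write_bytes(bytes([i]))
    batches = list(make_minibatches(prefetch_and_read(shared, local), batch_size=2))
    copied_names = sorted(p.name for p in Path(local).iterdir())
```

Because samples are yielded in completion order, the first mini-batch is available after only `batch_size` copies have finished, which is why training need not wait for the full dataset to be copied.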
  11. Proposed method (3/4)
    [Figure only]

  12. Proposed method (4/4)
    [Figure: flow of the index list, with the flow of data shown again]

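  13. Evaluation environment
    • The evaluation in this study used the HPC cluster "Cygnus" operated at the Center for Computational Sciences, University of Tsukuba
    CPU: Intel Xeon Gold processor
    Memory: DDR ECC RDIMM
    Node-local storage: Intel SSD DC P-series
    Shared storage: Lustre (DDN EXAScaler)
    GPU: NVIDIA Tesla V100 (HBM2, PCIe)
    OS: CentOS Linux
    [Photo: exterior of Cygnus]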
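  14. Experiment 1 (1/2)
    • Measure the time until a fixed number of mini-batches have been loaded into memory from the mini-batch queue
    • Compare the mean over repeated measurements
    • The data are ImageNet images, and the mini-batch size is fixed
    Evaluation patterns (storage used, method); the number of processes spawned for each stage is varied per pattern:
    • Node-local storage: SSD, conventional method
    • Shared storage: Lustre, conventional method
    • Proposed method: Lustre + SSD, proposed method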
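The measurement described on slide 14 (time to take a fixed number of mini-batches off a queue, averaged over repeated trials) can be sketched like this. The function names, the pre-filled in-memory queue, and the counts used in the demo are hypothetical and are not the benchmark's actual parameters.

```python
import queue
import statistics
import time


def time_to_load(batch_queue, n_batches):
    """Seconds needed to take n_batches mini-batches off batch_queue."""
    start = time.perf_counter()
    for _ in range(n_batches):
        batch_queue.get()
    return time.perf_counter() - start


def mean_load_time(make_queue, n_batches, n_trials):
    """Mean of time_to_load over n_trials freshly created queues."""
    return statistics.mean(
        time_to_load(make_queue(), n_batches) for _ in range(n_trials)
    )


def make_filled_queue(n_batches=8, batch_size=4):
    """Hypothetical stand-in for a mini-batch queue fed by loader processes."""
    q = queue.Queue()
    for b in range(n_batches):
        q.put([b * batch_size + i for i in range(batch_size)])
    return q


mean_seconds = mean_load_time(make_filled_queue, n_batches=8, n_trials=3)
```

In a real run the queue would be filled concurrently by the loader processes, so the measured time reflects storage read throughput rather than queue overhead.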
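  15. Experiment 1 (2/2)
    • The performance of the proposed method improves in proportion to the number of training-data prefetch processes, but shows almost no correlation with the number of mini-batch generation processes
    • For both shared storage and node-local storage, performance improves in proportion to the number of mini-batch generation processes
    [Figures: results when varying the number of prefetch processes in the proposed method; results when varying the number of mini-batch generation processes in all patterns, with the proposed method's prefetch process count held fixed]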
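  16. Discussion of performance in Experiment 1
    [Table: fastest time [sec], iterations/sec, bandwidth [MB/s], and total number of processes used, for node-local storage, shared storage, and the proposed method]
    • As expected, the SSD, which has the highest performance as a storage device, was the fastest
    • The proposed method was faster than shared storage, which confirms the effect of prefetching
    • Compared with shared storage, the proposed method achieved higher read I/O performance with fewer processes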
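  17. Experiment 2 (1/3)
    • As a real-application evaluation, run data-parallel training with Chainer and evaluate the following:
      • Total time required for the data-parallel training run
      • Mini-batch load time within the breakdown of processing
    • The evaluation patterns are the same as in Experiment 1; the number of nodes is varied (PPR fixed)
    • The model is a ResNet, a type of convolutional neural network; the data and mini-batch size are the same as in Experiment 1
    • Note: the results differ from the data-parallel training results printed in the proceedings paper in the following respects:
      • The page cache was dropped with drop_caches before each run
      • The number of training epochs was changed, because results did not change in later epochs
      • The code was optimized (overhead from print debugging was removed as far as possible)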
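  18. Experiment 2 (2/3)
    • Data-parallel training spawns multiple processes with MPI; each process reads its own partition of the training data and trains on it
    • Unlike ordinary training, after Backward the processes synchronize with AllReduce, and the average of the parameters of all processes is computed and shared
    [Figure: processing flow of data-parallel training]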
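The averaging step described on slide 18 can be illustrated without MPI by simulating all processes in one Python process. This is only a sketch of what an averaging AllReduce computes; `allreduce_mean` is a hypothetical helper, and a real run would use an MPI-based communicator instead.

```python
def allreduce_mean(per_process_values):
    """Simulate an averaging AllReduce: every process ends up holding the
    element-wise mean of the vectors contributed by all processes."""
    n_procs = len(per_process_values)
    length = len(per_process_values[0])
    mean = [
        sum(vec[i] for vec in per_process_values) / n_procs
        for i in range(length)
    ]
    # Each process receives its own copy of the same averaged vector.
    return [list(mean) for _ in range(n_procs)]


# Four simulated processes, each holding a two-element vector.
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
synced = allreduce_mean(values)
```

The call cannot return on any process until every process has contributed its vector, which is why a single slow process delays all the others.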

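  19. Experiment 2 (3/3)
    • As the number of nodes increases, the performance of every method scales almost linearly
    • The proposed method was faster than shared storage in every case
    • The time difference between the proposed method and node-local storage is given in seconds; its largest and smallest values occur at different node counts
    [Figures: processing time required for data-parallel training; number of images trained per second by node count, for shared storage, node-local storage, and the proposed method]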
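  20. Breakdown of processing time in Experiment 2 (1/5)
    • In the breakdown of the processing time summed over all MPI processes, mini-batch loading accounts for a share of the total in the first epoch and has almost vanished in the second epoch
    [Figures: total processing time [sec] of all MPI processes for the proposed method and for shared storage]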
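  21. Breakdown of processing time in Experiment 2 (2/5)
    • With the proposed method and with node-local storage, loading takes almost no time in the second epoch; what remains is thought to be only the time to access data from memory
    • Only shared storage still takes a noticeable amount of time (on the order of seconds) in the second epoch, which suggests that some time is spent waiting for reads from storage
    [Figure: total mini-batch load time [sec] of all MPI processes for shared storage, node-local storage, and the proposed method]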
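  22. Breakdown of processing time in Experiment 2 (3/5)
    • During a multi-node training run, compare how the share of wa (I/O wait time), sampled with vmstat on an arbitrarily chosen node, changes over time
    • With node-local storage and the proposed method, wa is almost absent after the first epoch, whereas with shared storage wa stays at a steady level in later epochs as well
    [Figures: wa over time per epoch for node-local storage, the proposed method, and shared storage]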
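The `wa` series compared on slide 22 can be extracted from captured `vmstat` output with a small parser. The sketch below assumes the usual procps layout, in which a header row names the columns; the sample text is made up for illustration and is not data from the experiment, and `parse_wa_column` is a hypothetical helper.

```python
def parse_wa_column(vmstat_output):
    """Return the 'wa' value (percent of CPU time spent waiting for I/O)
    from every sample line of captured vmstat output."""
    wa_index = None
    samples = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "wa" in fields:
            # Column header row; vmstat repeats it periodically.
            wa_index = fields.index("wa")
            continue
        is_sample = all(f.isdigit() for f in fields)
        if wa_index is not None and is_sample and len(fields) > wa_index:
            samples.append(int(fields[wa_index]))
    return samples


# Made-up sample in the usual vmstat layout (not measured data).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1      0 812345  10240 204800    0    0  5120     0 1500 3000 20  5 50 25  0
 1  0      0 811000  10240 206000    0    0     0     0 1200 2500 30  5 65  0  0
"""

wa_values = parse_wa_column(SAMPLE)
```

Locating the column by its header name rather than by a fixed position keeps the parser working if the set of columns differs between vmstat versions.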
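  23. Breakdown of processing time in Experiment 2 (4/5)
    [Figures: processing flow when loading a mini-batch takes longer than one training iteration (left), and when it takes less time than one training iteration (right)]
    • With shared storage, generating a mini-batch took longer than training on one, so a certain amount of mini-batch load time is thought to have been exposed (left figure)
    • With node-local storage and the proposed method, on the other hand, a mini-batch is thought to have been present in the buffer at all times (right figure)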
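The two cases on slide 23 can be reproduced with a small timing model of a loader that runs concurrently with the trainer and fills a bounded buffer. This is a simplified sketch under stated assumptions (constant load and training times, one loader, abstract time units); `simulate_pipeline` and its parameters are hypothetical.

```python
def simulate_pipeline(n_iterations, t_load, t_train, buffer_size=2):
    """Return (total_time, trainer_wait_time) for a loader that prepares
    mini-batches concurrently with training and may hold up to buffer_size
    mini-batches that the trainer has not yet taken."""
    ready = []      # time at which mini-batch k is fully loaded
    taken = []      # time at which the trainer takes mini-batch k
    train_end = 0   # time at which the previous training iteration finished
    waited = 0      # total time the trainer spent waiting for data
    for k in range(n_iterations):
        load_start = ready[k - 1] if k > 0 else 0
        if k >= buffer_size:
            # A buffer slot frees up only when an older batch is taken.
            load_start = max(load_start, taken[k - buffer_size])
        ready.append(load_start + t_load)
        taken.append(max(train_end, ready[k]))
        waited += taken[k] - train_end
        train_end = taken[k] + t_train
    return train_end, waited


slow_storage = simulate_pipeline(n_iterations=4, t_load=3, t_train=1)
fast_storage = simulate_pipeline(n_iterations=4, t_load=1, t_train=3)
```

In this model, when loading is slower than training the trainer waits on every iteration (left figure); when loading is faster, it waits only for the very first mini-batch and the buffer stays occupied afterwards (right figure).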
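  24. Breakdown of processing time in Experiment 2 (5/5)
    • In both epochs, shared storage has the longest AllReduce processing time
    • This is thought to be because, compared with the other two patterns, more processes take a long time to load mini-batches, so each process spends longer waiting at the AllReduce synchronization point for all processes to arrive
    [Figures: total AllReduce processing time [sec] of all MPI processes for shared storage, node-local storage, and the proposed method; example of mini-batch loading prolonging each process's AllReduce synchronization wait]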
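The explanation on slide 24 amounts to a straggler effect: every process waits at AllReduce until the slowest one arrives. A minimal sketch, with made-up arrival times in abstract units and a hypothetical helper `allreduce_wait_times`:

```python
def allreduce_wait_times(arrival_times):
    """Given the time at which each process reaches AllReduce, return how
    long each process waits for the slowest process to arrive."""
    last = max(arrival_times)
    return [last - t for t in arrival_times]


# Made-up example: one of four processes is delayed by a slow mini-batch load.
waits = allreduce_wait_times([10, 10, 14, 10])
total_wait = sum(waits)
```

A single delayed process adds wait time to every other process, so the summed AllReduce time grows with the number of processes even though only one load was slow.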
  25. Related work
    • Proposal of a method that designs a distributed cache server dedicated to training data and speeds up reads by caching all training data in memory [1]
    • Measurement of the effect of parallel reads of training data and mini-batch buffering in TensorFlow, reporting their effectiveness [2]
    [1] Zhu, Y., Chowdhury, F., et al., "Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems", MASCOTS.2018.00023, pp. 145-146 (2018)
    [2] Lu, X., Shi, H., Javed, M. H., et al., "Characterizing Deep Learning over Big Data (DLoBD) Stacks on RDMA-Capable Networks", 2017 IEEE 25th Annual Symposium on High-Performance Interconnects (HOTI), pp. 87-94 (2017)
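  26. Summary
    • This study proposed a method that accelerates read I/O in machine learning by combining slow shared storage with fast node-local storage
    • In the mini-batch loading benchmark, the proposed method loaded mini-batches faster than using shared storage alone
    • In data-parallel training, the proposed method showed the same scalability as the conventional methods and trained faster than shared storage at every node count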
  27. Backup slides follow

  28. Settings when forking processes
    • The Chainer documentation notes that combining InfiniBand with MultiprocessIterator may cause a crash