Slide 1

Slide 1 text

GPU Network Design and Operations: Fundamentals Study Session
Lossless Ethernet: PFC/ECN Edition
Masayuki Kobayashi

Slide 2

Slide 2 text

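Introduction: Target Audience and Goals

■ Target audience
• Network engineers who have never worked with RoCEv2
• Experience operating network switches is assumed

■ Goals
• Understand the outline of Lossless Ethernet
• Understand which technologies and design considerations go into building a RoCEv2 network

■ Notes
• This material covers only the fundamentals that network engineers can readily pick up
• Network designs specific to GPU clusters, server-internal architecture, and CCL (collective communication libraries) are not covered here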

Slide 3

Slide 3 text

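Lossless Ethernet
Why GPU networks need to be lossless
Reference: https://speakerdeck.com/markunet/ (deck slug decodes to "kurautotetasentanetutowakunoimatokorekara"; any hyphens in the slug were lost in extraction)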

Slide 4

Slide 4 text

Backend Network
Inter-cluster communication (Interconnect): a dedicated network

[Diagram: Training, Inference, and Storage clusters attached to Leaf and Spine switches]

Lossless Fabric (Backend)
• Optimized for communication between AI chips
• Rail-optimized topology
• Ultra-low latency, ultra-high bandwidth, contention-free
• Lossless (RDMA)
• GPU-centric

Lossy Fabric (Frontend)
• Optimized for Internet services
• Clos topology
• Best effort
• Lossy (TCP)
• CPU-centric

Slide 5

Slide 5 text

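Backend Network
Why is it needed? Can it not share one network with everything else?

[Diagram: AI/ML servers and frontend servers attached to the same Leaf/Spine fabric]

• TCP/UDP: designed on the assumption that packets get lost
• RDMA: designed on the assumption that packets are never lost
• A design that retransmits vs. a design that avoids retransmission
• RoCEv2 is a technology that applies the brakes before loss happens
• On a network designed for RDMA, TCP cannot reach its full performance

On a network where RoCEv2 is enabled, TCP's congestion-control algorithm ends up rate-limited, so the two are kept on separate networks.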

Slide 6

Slide 6 text

Lossless Ethernet
What does "lossless" mean?

• A set of buffer-management techniques that prevent buffers from overflowing under congestion → this is what "lossless" refers to
• It targets incast traffic, uneven load distribution, and similar conditions
• Link-down events and transmission errors are out of scope (those are covered at the BER/FEC layer)

[Diagram: two NVIDIA GPU servers with eight GPUs each, connected to per-rail Leaf switches and on to Spine switches; callouts mark skewed flows and incast]

Slide 7

Slide 7 text

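Packet Priority and Classification
Queues in network devices

• Each port of a network device (NICs included) has eight virtual queues for transmit and eight for receive. These are called TC (Traffic Class) queues. They share the same physical memory.

Egress TC usage shown in the diagram (the same layout is drawn for the Ingress TCs, between two servers and the switch):
• TC 1: other traffic
• TC 3: high priority
• TC 6: strict priority
• All remaining TCs: unused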

Slide 8

Slide 8 text

Packet Priority and Classification
Queues in network devices

• With RoCEv2, one of the eight transmit-side queues must be statically dedicated to RoCEv2, for example TC 3 for RoCEv2 packets and TC 1 for all other packets.

[Diagram: same Egress/Ingress TC layout as the previous slide]

Slide 9

Slide 9 text

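Packet Priority and Classification
Queues in network devices

• With RoCEv2, behavior is controlled by assigning an appropriate priority to each transmit-side queue.
• Which queue is used on the receive side is implementation-dependent.

Egress TC queue, scheduling, and assigned traffic:
• TC 1: DWRR, other traffic
• TC 3: DWRR, RoCEv2 traffic
• TC 6: Strict, CNP
• All remaining TCs: unused

DWRR: Deficit Weighted Round Robin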

Slide 10

Slide 10 text

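Packet Priority and Classification
Packet classification

• A device must be able to tell which queue should handle each packet.
• Packets are identified by bits in the VLAN tag or in the DSCP field.
• Which of the two priorities is used can be selected in the switch configuration. This is called the Trust Mode.

Frame layout: DMAC | SMAC | VLAN Tag (802.1Q) | EtherType | IP Header | Payload
• VLAN Tag: TPID (16 bits) | PCP, Priority Code Point (3 bits) | DEI (1 bit) | VID (12 bits)
• DS Field in the IP header (8 bits): DSCP, DiffServ Code Point (6 bits) | ECN (2 bits)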
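
The fields named on this slide can be pulled out of raw header values with a few shifts and masks. The following is a minimal Python sketch, assuming the standard 802.1Q TCI layout (PCP 3 bits, DEI 1 bit, VID 12 bits) and the IP DS field layout (DSCP 6 bits, ECN 2 bits). The function names and the `trust_mode` strings are illustrative only and do not correspond to any vendor API.

```python
def parse_vlan_tci(tci: int) -> dict:
    """Split a 16-bit 802.1Q Tag Control Information value."""
    return {"pcp": (tci >> 13) & 0x7, "dei": (tci >> 12) & 0x1, "vid": tci & 0xFFF}


def parse_ds_field(tos: int) -> dict:
    """Split the 8-bit IP DS field (the former ToS byte)."""
    return {"dscp": (tos >> 2) & 0x3F, "ecn": tos & 0x3}


def priority_for(trust_mode: str, tci: int, tos: int) -> int:
    """Return the value a device would classify on (hypothetical helper)."""
    if trust_mode == "pcp":
        return parse_vlan_tci(tci)["pcp"]
    if trust_mode == "dscp":
        return parse_ds_field(tos)["dscp"]
    raise ValueError(f"unknown trust mode: {trust_mode}")


print(parse_vlan_tci(0x6064))  # PCP 3, DEI 0, VID 100
print(parse_ds_field(106))     # DSCP 26, ECN 0b10
```

With trust mode `dscp` the VLAN tag is ignored for classification, which is why an L3 (routed) fabric can carry the priority end to end without VLAN tags.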

Slide 11

Slide 11 text

Packet Priority and Classification
Packet classification: Trust Mode as reported by the switches (NVIDIA Cumulus Linux, then Arista EOS)

    cumulus@leaf_rail-1:mgmt:~$ nv show interface swp1 qos roce status
                        operational
    ------------------  -------------
    pfc
      pfc-priority      3
      rx-enabled        yes
      tx-enabled        yes
    trust
      trust-mode        pcp,dscp
    congestion-control
      congestion-mode   ecn, absolute
      enabled-tc        0,3
      min-threshold     153.00 KB
      max-threshold     1.43 MB
    mode                lossless

    leaf_rail-4#show qos interfaces ethernet 1/1
    Ethernet1/1:
       Trust Mode: DSCP
       Default COS: 0
       Default DSCP: 0
       Port shaping rate: disabled

Slide 12

Slide 12 text

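Packet Priority and Classification
Packet classification

• The receiving device must be able to tell which queue should handle each packet.
• In most cases this is done with DSCP.

[Diagram: a server sends RoCEv2, BGP, and IPv6 RA packets to a switch, which sorts them into Egress TC queues; RoCEv2 packets carry a DSCP value, while BGP and IPv6 RA are labeled DSCP N/A]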

Slide 13

Slide 13 text

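Packet Priority and Classification
DSCP numbers for RoCEv2

• The values are arbitrary, but there is a de facto standard.
• The common mapping is RoCEv2: DSCP 26 / TC 3 and CNP: DSCP 48 / TC 6. These are the values recommended by NVIDIA (Mellanox).
• The mapping only affects the local behavior of each device, but for operational reasons use the same values on every device.

References:
• NVIDIA Enterprise Support (enterprise-support.nvidia.com): article on lossless RoCE configuration for MLNX-OS switches in DSCP-based QoS mode (advanced mode)
• Arista (www.arista.com): Broadcom RoCE Deployment Guide (PDF)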

Slide 14

Slide 14 text

Packet Priority and Classification
NIC-side configuration (most important)

    # use L3 PFC, default=pcp (L2 PFC)
    sudo mlnx_qos -i $IF_NAME --trust dscp

    # enable PFC on PFC Priority 3
    sudo mlnx_qos -i $IF_NAME --pfc 0,0,0,1,0,0,0,0

    # clear Traffic Class (TC) settings
    echo "tclass=-1" | sudo tee /sys/class/infiniband/$DEV_NAME/tc/1/traffic_class

    # set default ToS for RoCE traffic
    # 106 = (DSCP 26 << 2) | ECN 0b10, i.e. DSCP value * 4 plus the ECT(0) bits
    echo 106 | sudo tee /sys/class/infiniband/$DEV_NAME/tc/1/traffic_class

    # set default ToS for RoCE traffic (RDMA-CM connections)
    sudo cma_roce_tos -d $DEV_NAME -t 106

[Diagram: the 8-bit ToS byte is DSCP (6 bits) followed by ECN (2 bits); callouts mark the PFC setting and the DSCP setting]
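
The ToS value 106 and the `--pfc` bitmap used in the commands above can be derived rather than memorized. The sketch below is a hedged illustration in Python: `tos_from` and `pfc_mask` are hypothetical helper names, and the only assumptions are the standard ToS layout (DSCP in the upper 6 bits, ECN in the lower 2) and the eight-entry comma-separated form shown on the slide.

```python
def tos_from(dscp: int, ecn: int = 0) -> int:
    """Combine a 6-bit DSCP and a 2-bit ECN codepoint into an 8-bit ToS value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be 0..63")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must be 0..3")
    return (dscp << 2) | ecn


def pfc_mask(priorities, n: int = 8) -> str:
    """Build an eight-entry 0/1 list with 1 at each PFC-enabled priority."""
    return ",".join("1" if p in priorities else "0" for p in range(n))


print(tos_from(26, 0b10))  # 106: DSCP 26 with ECT(0)
print(tos_from(26))        # 104: DSCP 26 with the ECN bits left at 0
print(pfc_mask({3}))       # 0,0,0,1,0,0,0,0
```

The difference between 104 and 106 is only the ECN bits: 106 marks the traffic as ECN-capable, which the switch needs before it may set CE.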

Slide 15

Slide 15 text

Packet Priority and Classification
NIC-side configuration (most important): verifying the PFC settings

    [markunet@hgx200]$ sudo mlnx_qos -i enp14s0np0 --trust dscp
    DCBX mode: OS controlled
    Priority trust state: dscp
    dscp2prio mapping:
            prio:0 dscp:00,
    default priority:
    Receive buffer size (bytes): 19872,220896,0,0,0,0,0,0,max_buffer_size=4151520
    Cable len: 7
    PFC configuration:
            priority    0   1   2   3   4   5   6   7
            enabled     0   0   0   1   0   0   0   0
            buffer      0   0   0   1   0   0   0   0
    tc: 0 ratelimit: unlimited, tsa: vendor
             priority:  1
    tc: 1 ratelimit: unlimited, tsa: vendor
             priority:  0
    tc: 2 ratelimit: unlimited, tsa: vendor
             priority:  2
    tc: 3 ratelimit: unlimited, tsa: vendor
             priority:  3
    tc: 4 ratelimit: unlimited, tsa: vendor
             priority:  4
    tc: 5 ratelimit: unlimited, tsa: vendor
             priority:  5
    tc: 6 ratelimit: unlimited, tsa: vendor
             priority:  6
    tc: 7 ratelimit: unlimited, tsa: vendor
             priority:  7

Slide 16

Slide 16 text

Packet Priority and Classification
NIC-side configuration (most important)
Reference: blog.mylab.cc, "Enable L3 PFC DCQCN for RoCE on Mellanox ConnectX NICs"

Slide 17

Slide 17 text

Packet Priority and Classification
DSCP to TC Mapping Configuration (switch side)

NVIDIA Cumulus default values (no configuration needed):

    RoCE PCP/DSCP->SP mapping configurations
    ===========================================
           pcp  dscp                     switch-prio
        -  ---  -----------------------  -----------
        0  0    0,1,2,3,4,5,6,7          0
        1  1    8,9,10,11,12,13,14,15    1
        2  2    16,17,18,19,20,21,22,23  2
        3  3    24,25,26,27,28,29,30,31  3
        4  4    32,33,34,35,36,37,38,39  4
        5  5    40,41,42,43,44,45,46,47  5
        6  6    48,49,50,51,52,53,54,55  6
        7  7    56,57,58,59,60,61,62,63  7

Example of configuring Arista so that it uses the same mapping as NVIDIA:

    qos map DSCP 0 1 2 3 4 5 6 7 to traffic-class 0
    qos map DSCP 8 9 10 11 12 13 14 15 to traffic-class 1
    qos map DSCP 16 17 18 19 20 21 22 23 to traffic-class 2
    qos map DSCP 24 25 26 27 28 29 30 31 to traffic-class 3
    qos map DSCP 32 33 34 35 36 37 38 39 to traffic-class 4
    qos map DSCP 40 41 42 43 44 45 46 47 to traffic-class 5
    qos map DSCP 48 49 50 51 52 53 54 55 to traffic-class 6
    qos map DSCP 56 57 58 59 60 61 62 63 to traffic-class 7

Resulting map as displayed on the switch:

    Dscp-tc map:
       d1 :  d2 0  1  2  3  4  5  6  7  8  9
       --------------------------------------
        0 :     0  0  0  0  0  0  0  0  1  1
        1 :     1  1  1  1  1  1  2  2  2  2
        2 :     2  2  2  2  3  3  3  3  3  3
        3 :     3  3  4  4  4  4  4  4  4  4
        4 :     5  5  5  5  5  5  5  5  6  6
        5 :     6  6  6  6  6  6  7  7  7  7
        6 :     7  7  7  7
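
The mapping shown on this slide assigns eight consecutive DSCP values to each traffic class, so it reduces to an integer division by 8. The Python sketch below reproduces it; `dscp_to_tc` and `qos_map_lines` are hypothetical helper names, and the generated text simply mirrors the `qos map DSCP ... to traffic-class N` lines on the slide rather than any official generator.

```python
def dscp_to_tc(dscp: int) -> int:
    """Map a DSCP value to a traffic class using the 8-per-class layout above."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be 0..63")
    return dscp // 8


def qos_map_lines() -> list[str]:
    """Regenerate the eight mapping lines in the form shown on the slide."""
    lines = []
    for tc in range(8):
        dscps = " ".join(str(d) for d in range(tc * 8, tc * 8 + 8))
        lines.append(f"qos map DSCP {dscps} to traffic-class {tc}")
    return lines


print(dscp_to_tc(26))      # 3
print(dscp_to_tc(48))      # 6
print(qos_map_lines()[3])  # qos map DSCP 24 25 26 27 28 29 30 31 to traffic-class 3
```

Generating the lines from one rule is a cheap way to keep every device on the same mapping, which is the operational point the deck makes.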

Slide 18

Slide 18 text

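Packet Priority and Classification
Priority Configuration

Columns on the slide: Egress Traffic Class (TC) | DSCP | scheduling | assigned traffic | ECN/PFC
• TC 1: DWRR, other traffic
• TC 3: DWRR, RoCEv2 traffic, ECN/PFC enabled
• TC 6: Strict, CNP

• DWRR and its weights are only an example; they must be tuned to the specifications of the device.
• Some devices support a different algorithm, such as WRR.
• Blank cells are unused (unconfigured) queues.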
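
To make the DWRR entry concrete, here is a small Python sketch of a textbook Deficit Weighted Round Robin loop. It is an illustration under assumptions, not a model of any switch ASIC: the queue names, the packet size of 1000 bytes, and the quanta of 9500 and 500 bytes (chosen to mirror the 95% / 5% split in the Arista example elsewhere in this deck) are all made up for the example, and the strict-priority CNP queue is left out.

```python
from collections import deque


def dwrr_schedule(queues: dict, quanta: dict, rounds: int) -> list:
    """Serve queues of packet sizes (bytes) with per-round byte quanta."""
    deficit = {name: 0 for name in queues}
    sent = []
    for _ in range(rounds):
        for name, q in queues.items():
            if not q:
                deficit[name] = 0  # an idle queue does not bank credit
                continue
            deficit[name] += quanta[name]
            # Send packets while the head-of-line packet fits in the credit.
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size
                sent.append((name, size))
            if not q:
                deficit[name] = 0
    return sent


queues = {"roce": deque([1000] * 40), "other": deque([1000] * 40)}
sent = dwrr_schedule(queues, {"roce": 9500, "other": 500}, rounds=2)
print(sum(1 for name, _ in sent if name == "roce"))   # 19
print(sum(1 for name, _ in sent if name == "other"))  # 1
```

Over two rounds the byte shares come out at 19:1, that is 95% to 5%, even though the small queue cannot send anything in the first round; the carried-over deficit is what makes the long-run share match the weights.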

Slide 19

Slide 19 text

Packet Priority and Classification
Configuration example: Arista EOS

    leaf_rail-4#show qos interfaces ethernet 1/1
    Ethernet1/1:
       Trust Mode: DSCP
       Default COS: 0
       Default DSCP: 0
       Port shaping rate: disabled

      Tx     Bandwidth    Bandwidth      Shape Rate     Priority   ECN/WRED
     Queue                Guaranteed     (units)
                          (units)
     ------------------------------------------------------------------------------------------
       7     - / -        - / - ( - )    - / - ( - )    SP / SP    D
       6     - / -        - / - ( - )    - / - ( - )    SP / SP    D
       5     - / -        - / - ( - )    - / - ( - )    SP / SP    D
       4     - / -        - / - ( - )    - / - ( - )    SP / SP    D
       3     95% / 95%    - / - ( - )    - / - ( - )    RR / RR    L
       2     - / -        - / - ( - )    - / - ( - )    RR / SP    D
       1     5% / 5%      - / - ( - )    - / - ( - )    RR / RR    D
       0     - / -        - / - ( - )    - / - ( - )    RR / SP    D

    Note: Values are displayed as Operational/Configured

    Legend:
    RR -> Round Robin
    SP -> Strict Priority
    -  -> Not Applicable / Not Configured
    %  -> Percentage of reference
    ECN/WRED: L -> Queue Length ECN Enabled     W -> WRED Enabled     D -> Disabled

Slide 20

Slide 20 text

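Packet Priority and Classification
Packet classification

• What remains is to keep the buffers of the assigned queues from overflowing → this is where the technologies required for Lossless Ethernet come in.

[Diagram: same Egress/Ingress TC layout as before: TC 1 other traffic, TC 3 high priority, TC 6 strict priority, remaining TCs unused]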

Slide 21

Slide 21 text

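Lossless Ethernet
How lossless operation is achieved

Main technologies that make up Lossless Ethernet:
① PFC (Priority Flow Control): IEEE 802.1Qbb
② ECN (Explicit Congestion Notification): listed on the slide under IEEE 802.1Qau (the QCN standard); the ECN field in the IP header itself is defined in RFC 3168
③ CNP (Congestion Notification Packet): IBTA specification

To apply these technologies to RoCEv2 packets, the packets must first be identified.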

Slide 22

Slide 22 text

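Lossless Ethernet
How lossless operation is achieved

[Diagram: two NICs, each with the RoCEv2 stack (Ethernet / IP+UDP / InfiniBand BTH), joined by two ECN/PFC-enabled switches. PFC PAUSE frames travel between adjacent devices, the switch applies ECN marks, the receiving NIC returns a CNP, and DCQCN runs on both NICs.]

• PFC: link-level flow control (hop by hop)
• ECN: end-to-end flow control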

Slide 23

Slide 23 text

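PFC
Priority-based Flow Control

• The queue of the congested Traffic Class (0 to 7) sends a PAUSE frame to the adjacent node.
• A node that receives a PAUSE frame temporarily stops sending packets from the queue that corresponds to that TC.
• Requests to stop and to resume transmission (Xoff: off/on) are controlled per queue.

Reference: JANOG meeting material (www.janog.gr.jp), "DCQOS" PDF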
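
The stop/resume behavior in the bullets above is a hysteresis between two thresholds. The Python sketch below models a single priority queue under assumptions: the class name `PfcQueue`, the threshold names `xoff` and `xon`, and the numeric values are illustrative only, and real devices count buffer cells and add headroom on top.

```python
class PfcQueue:
    """One ingress queue that pauses its upstream peer between two thresholds."""

    def __init__(self, xoff: int, xon: int):
        if xon >= xoff:
            raise ValueError("xon must be below xoff")
        self.xoff = xoff
        self.xon = xon
        self.depth = 0
        self.peer_paused = False

    def _update(self):
        # Crossing xoff upward asks the peer to stop this priority.
        if not self.peer_paused and self.depth >= self.xoff:
            self.peer_paused = True
            return "PAUSE"
        # Draining down to xon asks the peer to resume.
        if self.peer_paused and self.depth <= self.xon:
            self.peer_paused = False
            return "RESUME"
        return None

    def enqueue(self, n: int):
        self.depth += n
        return self._update()

    def dequeue(self, n: int):
        self.depth = max(0, self.depth - n)
        return self._update()


q = PfcQueue(xoff=100, xon=40)
print(q.enqueue(60))   # None
print(q.enqueue(50))   # PAUSE
print(q.dequeue(30))   # None
print(q.dequeue(50))   # RESUME
```

Keeping `xon` well below `xoff` avoids flapping between pause and resume when the queue depth hovers around a single threshold.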

Slide 24

Slide 24 text

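Headroom Buffer
The most important part of a PFC implementation

• A safety margin of buffer space for receiving the packets that are still in flight on the link at the moment PFC is triggered.

Reference: IEEE 802.1 public documents (ieee802.org), contribution on adaptive PFC headroom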

Slide 25

Slide 25 text

Headroom Buffer
Deriving the required buffer space from the cable length

• Specifications differ completely from one hardware platform to another.
• The default values are generally fine; for advanced tuning, the operator calculates and sets the values.

References:
• NVIDIA docs (docs.nvidia.com): Cumulus Linux, Layer 1 and Switch Ports, Quality of Service, "link pause"
• Juniper docs (www.juniper.net): Junos CLI reference, cable-length statement (class of service, congestion notification, QFX Series)
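
Why cable length matters can be seen from a back-of-the-envelope estimate. The Python sketch below is a deliberately simplified model and not any vendor's formula: it assumes a propagation delay of about 5 ns per meter, counts the bytes that can arrive during one round trip plus an optional response delay, and adds two maximum-size frames (one already being sent by the peer when PAUSE arrives, one delaying the PAUSE frame itself). The function name, parameters, and defaults are all assumptions; the example inputs (7 m, 100 Gbit/s, MTU 9216) are borrowed from values that appear elsewhere in this deck.

```python
import math


def headroom_bytes(cable_m: float, link_gbps: float, mtu: int = 9216,
                   ns_per_m: float = 5.0, response_ns: float = 0.0) -> int:
    """Rough per-queue headroom estimate in bytes (simplified model)."""
    bytes_per_ns = link_gbps / 8.0           # 1 Gbit/s equals 0.125 bytes/ns
    round_trip_ns = 2 * cable_m * ns_per_m   # PAUSE out, last data bytes back
    in_flight = (round_trip_ns + response_ns) * bytes_per_ns
    return math.ceil(in_flight + 2 * mtu)


print(headroom_bytes(7, 100))    # 19307
print(headroom_bytes(100, 100))  # 30932
```

In this model the MTU term dominates on short cables, and the cable term grows linearly with both length and link speed, which is one reason uniform cable lengths simplify the calculation.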

Slide 26

Slide 26 text

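Headroom Buffer
Example of per-link settings in a real cluster

• If the design uses differing cable lengths, the calculation becomes tedious, so an EoR rack design is no longer practical.
• The rack design is left-right symmetric for the sake of this calculation.

(Details not public)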

Slide 27

Slide 27 text

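ECN / CNP

• Although PFC acts per queue, it still stops packet transmission outright.
• Rather than stopping, congestion control that asks the sender to reduce its sending rate is preferable.
• Detect congestion before the buffer fills, notify the sender, and let it apply congestion control.
• Sustained PFC leads to performance degradation.

[Diagram: Server (Sender) → Switch → Server (Receiver). The switch experiences congestion and applies Congestion Marking to the Congested Traffic; the receiver sends a Congestion Notification back to the sender with High Priority Marking.]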

Slide 28

Slide 28 text

ECN / CNP
Operating mechanism

Two kinds of congestion notification are used:
• ECN: Explicit Congestion Notification
• CNP: Congestion Notification Packet

ECN field values:
• 0b00: Not-ECT (Not ECN-Capable Transport)
• 0b01: ECT (ECN-Capable Transport)
• 0b10: ECT (ECN-Capable Transport)
• 0b11: CE (Congestion Experienced)

Sequence (Server/Sender → Switch → Server/Receiver; packets are Ethernet / IP header with ECN / UDP / IB):
① The sender sets the ECT bits
② The congested switch changes them to CE
③ The receiver sends a CNP when it receives CE
④ The sender adjusts its sending rate when it receives the CNP
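
Steps ② and ③ come down to rewriting and testing the two ECN bits of the ToS byte. The following Python sketch illustrates that under the standard codepoint assignment from RFC 3168; the function names are hypothetical, and real switches mark probabilistically based on queue depth rather than on every call.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def mark_ce(tos: int) -> int:
    """Switch side: set CE only on packets that declared ECN capability."""
    if tos & 0x3 in (ECT_0, ECT_1):
        return (tos & 0xFC) | CE
    return tos  # Not-ECT packets are never marked (and CE stays CE)


def should_send_cnp(tos: int) -> bool:
    """Receiver side: a CE-marked RoCEv2 packet triggers a CNP."""
    return tos & 0x3 == CE


tos = 106                    # DSCP 26 with ECT(0)
marked = mark_ce(tos)
print(marked)                # 107: same DSCP, ECN now CE
print(should_send_cnp(marked))        # True
print(mark_ce(104))          # 104: Not-ECT stays unmarked
```

Because only the low two bits change, the DSCP value (and therefore the queue the packet uses) is untouched by marking.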

Slide 29

Slide 29 text

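ECN

• Depending on buffer usage, packets are marked at random (ECN CE bits).
• Marking is based on WRED (Weighted Random Early Detection).
• ECN is not a dedicated packet; the mark is applied to RoCEv2 data packets.

[Graph: x-axis Buffer Usage, y-axis ECN Marking Probability. Between the Minimum Threshold and the Maximum Threshold, marked and unmarked packets are mixed; beyond the Maximum Threshold every packet is marked.]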
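
The curve described on this slide can be written as a small piecewise function. This is a hedged Python sketch of the usual WRED-style ramp, not a specific vendor's implementation: the function name and the `max_prob` parameter are assumptions, and the example thresholds (512 and 768, in kilobytes) are borrowed from the Arista configuration shown elsewhere in this deck.

```python
def ecn_mark_probability(depth: float, min_th: float, max_th: float,
                         max_prob: float = 1.0) -> float:
    """Probability that a packet is CE-marked at the given queue depth."""
    if min_th > max_th:
        raise ValueError("min_th must not exceed max_th")
    if depth < min_th:
        return 0.0                       # below the minimum: never mark
    if depth >= max_th:
        return 1.0                       # at or above the maximum: always mark
    # Linear ramp between the two thresholds.
    return max_prob * (depth - min_th) / (max_th - min_th)


print(ecn_mark_probability(100, 512, 768))  # 0.0
print(ecn_mark_probability(640, 512, 768))  # 0.5
print(ecn_mark_probability(800, 512, 768))  # 1.0
print(ecn_mark_probability(512, 512, 512))  # 1.0 (step function)
```

Setting both thresholds to the same value removes the probabilistic region entirely, so the function degenerates into a step: nothing is marked below the threshold and everything is marked at or above it.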

Slide 30

Slide 30 text

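ECN
The case with no room for probabilistic marking

• Setting the minimum and maximum thresholds to the same value marks every packet once the threshold is reached.
• This is the pattern that does no probabilistic marking.
• The NVIDIA default is this pattern.
• Intended for performance-sensitive systems such as HPC.

[Graph: x-axis Buffer Usage, y-axis ECN Marking Probability. Minimum Threshold and Maximum Threshold coincide; beyond that point every packet is marked.]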

Slide 31

Slide 31 text

CNP
CNP Packet Format

• Carried in the BTH (Base Transport Header, InfiniBand)
• Opcode 0x81 (0b10000001)
• A CNP is an Ack-style packet used only for notification, so it carries no RoCEv2 payload.

Slide 32

Slide 32 text

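ECN / CNP
RoCEv2 Packet Format

• Note that ECN is a field in the IP header, whereas CNP is identified by a field in the IB BT header.
• ECN is an IP-header (L3) mechanism: DSCP (6 bits) | ECN (2 bits)
• CNP is an IB-header mechanism: CNP Opcode (8 bits)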
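
Because the CNP is recognized by the BTH opcode rather than by anything in the IP header, a capture tool has to look past UDP to find it. The Python sketch below shows the check under two stated assumptions: RoCEv2 uses UDP destination port 4791, and the CNP opcode is 0x81 in the first byte of the 12-byte BTH. The function name and the hand-built byte strings are illustrative only.

```python
ROCEV2_UDP_PORT = 4791  # IANA-assigned UDP destination port for RoCEv2
CNP_OPCODE = 0x81       # BTH opcode used by Congestion Notification Packets
BTH_LEN = 12            # size of the Base Transport Header in bytes


def is_cnp(udp_dst_port: int, udp_payload: bytes) -> bool:
    """Return True if a UDP payload starts with a BTH whose opcode is CNP."""
    if udp_dst_port != ROCEV2_UDP_PORT or len(udp_payload) < BTH_LEN:
        return False
    return udp_payload[0] == CNP_OPCODE


cnp_bth = bytes([0x81]) + bytes(11)   # opcode 0x81, remaining BTH bytes zeroed
data_bth = bytes([0x04]) + bytes(11)  # some other opcode
print(is_cnp(4791, cnp_bth))    # True
print(is_cnp(4791, data_bth))   # False
print(is_cnp(53, cnp_bth))      # False: not a RoCEv2 port
```

The ECN bits, by contrast, would be read from the ToS byte of the enclosing IP header, which is why the two notifications show up in different places in a packet capture.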

Slide 33

Slide 33 text

DCQCN
Congestion-control algorithm

• Controls the sending rate based on the scheme of IEEE 802.1Qau Quantized Congestion Notification (QCN).
• This differs from window-size control as used by TCP.

Reference: NVIDIA Enterprise Support (enterprise-support.nvidia.com), article "DCQCN CC algorithm"

    # Check DCQCN is enabled on Prio 3
    cat /sys/class/net/$IF_NAME/ecn/roce_np/enable/3
    cat /sys/class/net/$IF_NAME/ecn/roce_rp/enable/3

    # Check counters related to DCQCN
    cat /sys/class/infiniband/$DEV_NAME/ports/1/hw_counters/np_cnp_sent
    cat /sys/class/infiniband/$DEV_NAME/ports/1/hw_counters/np_ecn_marked_roce_packets
    cat /sys/class/infiniband/$DEV_NAME/ports/1/hw_counters/rp_cnp_handled
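
To show what "rate control rather than window control" looks like, here is a minimal Python sketch of the sender-side (reaction point) update rules as they are commonly described for DCQCN: on each CNP the current rate is cut by a factor that depends on a congestion estimate alpha, and alpha decays when no CNP arrives. This is an assumption-laden illustration, not NIC firmware: the class and method names are made up, `g = 1/256` is a commonly quoted default, and the timers, byte counters, and later increase stages are omitted.

```python
class DcqcnSender:
    """Simplified DCQCN reaction point: rate cut, alpha update, fast recovery."""

    def __init__(self, line_rate_gbps: float, g: float = 1 / 256):
        self.rc = line_rate_gbps   # current sending rate
        self.rt = line_rate_gbps   # target rate remembered for recovery
        self.alpha = 1.0           # congestion estimate, between 0 and 1
        self.g = g

    def on_cnp(self) -> None:
        """A CNP arrived: remember the old rate, cut, and raise alpha."""
        self.rt = self.rc
        self.rc = self.rc * (1 - self.alpha / 2)
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_alpha_timer(self) -> None:
        """No CNP during the timer interval: let alpha decay."""
        self.alpha = (1 - self.g) * self.alpha

    def on_fast_recovery(self) -> None:
        """Move the current rate halfway back toward the target rate."""
        self.rc = (self.rt + self.rc) / 2


s = DcqcnSender(100.0)
s.on_cnp()
print(s.rc)            # 50.0
s.on_fast_recovery()
print(s.rc)            # 75.0
s.on_alpha_timer()
print(s.alpha)         # 0.99609375
```

The counters read by the `cat` commands above (`np_cnp_sent`, `rp_cnp_handled`) count the events that would drive `on_cnp` on the respective side.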

Slide 34

Slide 34 text

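DCQCN
All-in-one configuration script
Reference: https://github.com/NVIDIA/ (repository name decodes to "dorocelinux"; any separators in the name were lost in extraction)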

Slide 35

Slide 35 text

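ECN / CNP
Example of actual behavior

• ECN marking and CNP responses continue until the congestion clears.
• The sender-side congestion-control algorithm (DCQCN) adjusts the rate.
• While a link is down on the switch, the sending nodes adjust their rates, and no packet loss occurs during that time.

(Details not public)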

Slide 36

Slide 36 text

PFC / ECN / CNP
Hardware counters

    markunet@leaf_rail-1:mgmt:~$ ethtool -S swp3 | egrep "Q3|Q6|Ecn|Pfc3"
         HwIfInPfc3Pkt: 31848
         HwIfOutPfc3Pkt: 0
         HwIfOutQ3WredDrops: 0
         HwIfOutQ6WredDrops: 0
         HwIfOutQ3BuffDiscards: 0
         HwIfOutQ6BuffDiscards: 0
         HwIfOutQ3Pkts: 318357735060
         HwIfOutQ3Octets: 326513898313396
         HwIfOutEcnMarkedPkts: 11375061
         HwIfOutQ6Pkts: 8245964
         HwIfOutQ6Octets: 643185192
         HwIfInPfc3Duration: 635412
         HwIfOutPfc3Duration: 0
         HwIfInQ3Pkts: 0
         HwIfInQ6Pkts: 0
         HwIfInQ3BuffDiscards: 0
         HwIfInQ6BuffDiscards: 0
         HwIfInQ3SharedBuffDiscards: 0
         HwIfInQ6SharedBuffDiscards: 0

Callouts on the slide:
• RoCEv2 packet counters (Q3)
• Counter of ECN-marked packets (HwIfOutEcnMarkedPkts)
• CNP packet counters (Q6)
• PFC packet counters (Pfc3)
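
Counter output like this is easier to monitor once it is parsed into numbers. The Python sketch below turns `name: value` lines into a dictionary and derives two simple health indicators; the function name is hypothetical, the sample text is a subset of the counters on this slide, and the "marked share" is only a rough indicator because the ECN counter is per interface while the packet counter is per queue.

```python
def parse_ethtool_stats(text: str) -> dict:
    """Parse 'Name: 123' lines (as printed by ethtool -S) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


sample = """
     HwIfInPfc3Pkt: 31848
     HwIfOutQ3BuffDiscards: 0
     HwIfOutQ3Pkts: 318357735060
     HwIfOutEcnMarkedPkts: 11375061
     HwIfOutQ6Pkts: 8245964
"""

stats = parse_ethtool_stats(sample)
marked_share = stats["HwIfOutEcnMarkedPkts"] / stats["HwIfOutQ3Pkts"]
print(stats["HwIfOutQ3BuffDiscards"] == 0)   # True: no drops on the lossless queue
print(f"{marked_share:.6%}")                 # share of packets that were ECN-marked
```

A lossless queue that shows zero buffer discards alongside nonzero ECN and PFC counters is the expected picture: congestion occurred, but it was absorbed by rate control and pause frames rather than by drops.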

Slide 37

Slide 37 text

PFC / ECN / CNP
Hardware counters

    leaf_rail-4#show interfaces ethernet 1/1 counters queue detail
    Port     TxQ   Counter/pkts  Counter/bytes  Drop/pkts     Drop/bytes
    -------  ----  ------------  ------------   ------------  ------------
    Et1/1    UC0   5             638            0             0
    Et1/1    UC1   0             0              0             0
    Et1/1    UC2   0             0              0             0
    Et1/1    UC3   7242261179    477989237814   0             0
    Et1/1    UC4   0             0              0             0
    Et1/1    UC5   0             0              0             0
    Et1/1    UC6   128963        12930230       0             0
    Et1/1    UC7   0             0              0             0
    Et1/1    UC8   1976          539448         0             0

    leaf_rail-4#show qos interfaces ethernet 1/1 ecn counters queue
    Ethernet1/1:
    Tx-Queue    Marked Packets
    ----------  -----------------------
    0           -
    1           -
    2           -
    3           0
    4           -
    5           -
    6           -
    7           -

• Always confirm that traffic is using the intended queues.
• Some queues (for example the extra UC8 row) may be reserved for special system use.

Slide 38

Slide 38 text

RoCEv2 Configuration Example: Arista EOS
Priority mapping and MMU profile configuration

    qos map DSCP 0 1 2 3 4 5 6 7 to traffic-class 0
    qos map DSCP 8 9 10 11 12 13 14 15 to traffic-class 1
    qos map DSCP 16 17 18 19 20 21 22 23 to traffic-class 2
    qos map DSCP 24 25 26 27 28 29 30 31 to traffic-class 3
    qos map DSCP 32 33 34 35 36 37 38 39 to traffic-class 4
    qos map DSCP 40 41 42 43 44 45 46 47 to traffic-class 5
    qos map DSCP 48 49 50 51 52 53 54 55 to traffic-class 6
    qos map DSCP 56 57 58 59 60 61 62 63 to traffic-class 7

    platform trident mmu queue profile RoCE_MMU_Profile
       ingress threshold 1/16
       egress unicast queue 3 threshold 8
    !

• MMU (Memory Management Unit) profile: a profile that configures buffer reservation.
• Change the queue thresholds to match your requirements.
• If the MMU settings are not changed, PFC keeps firing and performance degrades.
• This example is for StrataXGS chips. DNX is configured differently, so take care.

Slide 39

Slide 39 text

RoCEv2 Configuration Example: Arista EOS
Creating the RoCEv2 profile and attaching it to an interface

    qos profile RoCEv2
       priority-flow-control on
       priority-flow-control priority 3 no-drop
       !
       tx-queue 1
          no priority
          bandwidth percent 5
       !
       tx-queue 3
          no priority
          bandwidth percent 95
          random-detect ecn minimum-threshold 512 kbytes maximum-threshold 768 kbytes max-mark-probability 100
    !
    interface Ethernet1/1
       description DOWNLINK
       mtu 9216
       speed forced 100gfull
       no switchport
       ipv6 enable
       service-profile RoCEv2
       !
       tx-queue 3
          random-detect ecn count
       platform trident mmu queue interface-profile RoCE_MMU_Profile
    !

Callouts on the slide:
• `priority-flow-control priority 3 no-drop`: enables PFC on TC 3 (no-drop = lossless)
• `random-detect ecn ...`: the ECN configuration
• Validate with the default ECN thresholds first, and change them only if necessary

Slide 40

Slide 40 text

Load Balancing
The load-distribution problem with ECMP

See the following material for details: JANOG meeting lightning-talk slides by Kobayashi (www.janog.gr.jp, PDF)

    !
    ip hardware fib load-balance distribution dynamic
    ip hardware fib load-balance distribution dynamic member-selection optimal always
    !
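
The distribution problem with static ECMP can be illustrated with a toy hash. The Python sketch below is an assumption-heavy model, not how any switch ASIC hashes: it uses CRC32 over a text form of the 5-tuple, invented IP addresses, and eight flows that differ only in UDP source port (RoCEv2 always uses destination port 4791). With so few long-lived flows, some uplinks typically receive several flows while others receive none, which is the imbalance that dynamic load balancing is meant to reduce.

```python
import zlib
from collections import Counter


def ecmp_pick(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              n_links: int, proto: int = 17) -> int:
    """Pick an uplink index from a hash of the 5-tuple (toy model)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


def link_load(flows, n_links: int) -> list:
    """Count how many flows land on each uplink."""
    counts = Counter(ecmp_pick(*flow, n_links) for flow in flows)
    return [counts.get(i, 0) for i in range(n_links)]


# Eight hypothetical RoCEv2 flows between one pair of hosts.
flows = [("10.0.0.1", "10.0.1.1", 49152 + i, 4791) for i in range(8)]
print(link_load(flows, 4))  # flows per uplink; an even 2/2/2/2 split is not guaranteed
```

The hash is deterministic, so a flow stays on its uplink for its whole lifetime; a few large flows that collide keep colliding until something other than the static hash moves them.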

Slide 41

Slide 41 text

RoCEv2 Configuration Example: Scheduled Fabric
VoQ-based configuration
(Details not public)

Slide 42

Slide 42 text

RoCEv2 Configuration Example: SONiC
Pure SONiC does not support some of the features
(Details not public)

Slide 43

Slide 43 text

RoCEv2 Configuration Example: Juniper Junos
For reference
(Details not public)

Slide 44

Slide 44 text

RoCEv2 Configuration Example: Cisco Nexus
For reference
(Details not public)

Slide 45

Slide 45 text

Lossless Ethernet

• Congestion control should kick in from the upper layers first; PFC is a momentary last resort.

Image quoted from "Network Best Practices for Artificial Intelligence Data Centre", Nemanja Kamenica, Technical Marketing Engineer, Cisco Live BRKDCN session

Slide 46

Slide 46 text

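Operating RoCEv2
Basic considerations

• Understand what workloads run on the network
  (training, inference, storage, large-scale distributed LLM training, etc.)
• Do not mix products whose hardware specifications differ widely (Switch / NIC)
  Buffer capacity and configurable values differ.
• Do not mix software or configurations whose implementations differ widely (NOS)
  Differences such as the packet-distribution method affect performance.
• Operate the network end to end
  Not only the switches but also the GPU servers fall within the network engineer's responsibility.
  Start by understanding the server-internal topology.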

Slide 47

Slide 47 text

EoF