Slide 1

Slide 1 text

Research Paper Introduction #12 “B4: Experience with a Globally-Deployed Software Defined WAN” @cafenero_777 2020/06/16

Slide 2

Slide 2 text

$ which • B4: Experience with a Globally-Deployed Software Defined WAN • Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, Jonathan Zolla, Urs Hölzle, Stephen Stuart and Amin Vahdat • Google, Inc • SIGCOMM ’13 • ACM SIGCOMM Computer Communication Review • https://dl.acm.org/doi/abs/10.1145/2534169.2486019

Slide 3

Slide 3 text

Agenda • Overview and why I read this paper • Introduction • BACKGROUND • DESIGN • TRAFFIC ENGINEERING • TE PROTOCOL AND OPENFLOW • EVALUATION • EXPERIENCE FROM AN OUTAGE • RELATED WORK • CONCLUSIONS

Slide 4

Slide 4 text

Overview and why I read this paper • Overview • Design, implementation, and evaluation of Google's WAN (inter-DC) • Centralized traffic engineering using OpenFlow • Why I read it • It was mentioned in the Jupiter paper (Google's LAN, intra-DC). • SDN -> OpenFlow -> P4/WB-SW/Stratum • https://www.publickey1.jp/blog/18/grpcstratumonfgoogle.html

Slide 5

Slide 5 text

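Introduction • WANs are expensive • Transit costs! To preserve availability, utilization is (only) 30%-40% • Network gear is expensive too • Google's WAN challenges • Control traffic all the way to the edge • Do something about inter-DC bandwidth • Control everything centrally • Built B4, an OpenFlow-based management system, and ran it for 3 years. It also does TE.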

Slide 6

Slide 6 text

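BACKGROUND (1/2) • DCs and caches around the world • Serving search, video, and cloud • A user-facing WAN and an internal WAN (B4) • 99% of internal traffic goes over B4 • The user-facing WAN peers with many ISPs and needs high availability and many protocols • B4 traffic: user data copies, remote storage access, large-scale data push for synchronization • (earlier items: lower volume, more latency-sensitive, higher priority) • Scaled out on commodity HW, but the internal WAN grows faster than the Internet-facing one • Solve scale, fault tolerance, cost, and control -> B4 • Large-scale bandwidth control over only a few dozen sites (roughly the number of DCs), application priorities and burst control at the edge (no complex in-network control), cost optimization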

Slide 7

Slide 7 text

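BACKGROUND (2/2)
Design decision | Rationale and benefits | Challenges
Routers built on merchant switch silicon | Applications control their own traffic and there are only a few DCs -> small buffers and FIBs suffice -> lower cost | Gives up HW redundancy, deep buffering, and large FIBs
Drive links toward 100% utilization | Efficient use of long-haul transport; most applications get high average bandwidth; large transfers have their bandwidth controlled dynamically | Packet loss during failures is unavoidable
Centralized TE | Multipath balancing; application classification and scheduling (link-state routing is not optimal) | No existing protocol provides this; requires knowledge of inter-DC demand
HW/SW separation | Custom routing and monitoring; a programming interface decouples the HW and avoids dependence on it | No prior experience developing with separated HW/SW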

Slide 8

Slide 8 text

DESIGN (Overview) • HW/SW (controller) separation, with controllers replicated via Paxos • Routing/TE separation • ISIS is not run on the switches; routing is realized via OpenFlow • Baseline flows • Prioritized flows (for TE) • Fallback (just delete the flows?) • Terms • NCS (Network Control Server) • NCA (Network Control Application) • OFC (OpenFlow Controller) • OFA (OpenFlow Agent)
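The baseline/TE layering and the fallback idea can be sketched as a toy flow table. This is a minimal sketch, assuming the table matches only on destination prefix. The priority values, action strings, and class names are hypothetical and do not describe B4's actual OpenFlow pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network

# Hypothetical priority levels: TE entries must beat baseline routing entries.
BASELINE_PRIORITY = 10
TE_PRIORITY = 100


@dataclass
class FlowEntry:
    priority: int          # higher value wins
    match: IPv4Network     # destination prefix
    action: str            # e.g. "port:1" or "tunnel:T1" (illustrative strings)


@dataclass
class FlowTable:
    entries: list[FlowEntry] = field(default_factory=list)

    def add(self, entry: FlowEntry) -> None:
        self.entries.append(entry)

    def remove_priority(self, priority: int) -> None:
        """Fallback: drop every entry at this priority; the others stay."""
        self.entries = [e for e in self.entries if e.priority != priority]

    def lookup(self, dst: str) -> str | None:
        addr = IPv4Address(dst)
        hits = [e for e in self.entries if addr in e.match]
        if not hits:
            return None
        # Highest priority first, longest prefix as the tie-breaker.
        return max(hits, key=lambda e: (e.priority, e.match.prefixlen)).action
```

Here, deleting every TE_PRIORITY entry leaves traffic on the baseline routes. The sketch assumes this is the "just delete the flows" fallback.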

Slide 9

Slide 9 text

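DESIGN (Switch Design) • Expensive: • Deep buffers, large FIBs, high-availability HW • Not needed for B4 • Could not be built on existing SDN platforms • So they built their own • 128 x 10G ports, non-blocking • Built from 16 x 10G merchant silicon chips (24 of them) • OFA: translates OF messages into switch HW (driver) commands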
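The 128-port figure can be sanity-checked with simple arithmetic. The wiring below is an assumption, not something stated on the slide: a two-stage folded Clos in which each 16-port chip uses 8 ports for external links and 8 for uplinks. Then 16 edge chips expose 16 x 8 = 128 external ports, and 8 spine chips absorb the 128 uplinks, for 24 chips in total.

```python
def two_stage_clos(chip_ports: int, external_ports: int) -> dict:
    """Chip count for a non-blocking two-stage (folded) Clos of identical chips.

    Each edge chip uses half its ports for external links and half for
    uplinks, one uplink to each spine chip; each spine chip has one port
    per edge chip. Equal up/down counts make the fabric non-blocking.
    """
    down = chip_ports // 2                    # external ports per edge chip
    up = chip_ports - down                    # uplinks per edge chip
    edge = -(-external_ports // down)         # ceiling division
    spine = up                                # one spine chip per uplink
    if edge > chip_ports:
        raise ValueError("a spine chip cannot reach every edge chip")
    return {"edge": edge, "spine": spine, "total": edge + spine}
```

For example, `two_stage_clos(16, 128)` gives 16 edge chips and 8 spine chips.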

Slide 10

Slide 10 text

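DESIGN (NCF/Routing) • Network Control Functionality • Distributed with Paxos: leader election, high availability • NIB management: state (up, drained, etc.), topology, etc. • Routing • RAP (Routing Application Proxy) • BGP/ISIS route updates, interface state management, etc. • Roughly speaking, it translates BGP/ISIS into OF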
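The "BGP/ISIS -> OF" translation done by RAP can be illustrated with a toy routing table. This is a minimal sketch, assuming routes arrive as (prefix, next-hop ports) and are emitted as LPM flow entries that point at ECMP groups. `Route`, `routes_to_openflow`, and the dict layout are hypothetical and are not RAP's real interface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:                       # hypothetical RIB entry learned via BGP/ISIS
    prefix: str
    next_hop_ports: tuple[int, ...]


def routes_to_openflow(routes: list[Route]) -> tuple[list[dict], dict[int, list[int]]]:
    """Translate RIB routes into LPM flow entries plus ECMP groups.

    Routes with the same next-hop set share one group, which keeps the
    group table small (useful on small-table merchant silicon).
    """
    group_ids: dict[tuple[int, ...], int] = {}
    flows = []
    for r in routes:
        key = tuple(sorted(r.next_hop_ports))
        if key not in group_ids:
            group_ids[key] = len(group_ids) + 1    # group ids start at 1
        flows.append({"match": r.prefix, "group": group_ids[key]})
    groups = {gid: list(ports) for ports, gid in group_ids.items()}
    return flows, groups
```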

Slide 11

Slide 11 text

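TRAFFIC ENGINEERING (Centralized TE Architecture) • Goal: optimize bandwidth across applications • Apply TE to inter-site (inter-AS) traffic • Aggregate traffic into FGs (Flow Groups) • Tunnels (T) via IP-in-IP • TGs (Tunnel Groups, mapping each FG onto Ts) are applied between sites • OFC/OFA then push the resulting OF rules to the switches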
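The FG / T / TG relationship can be written down as plain data types. This is a minimal sketch, assuming an FG is a (source site, destination site, QoS class) tuple, a tunnel is a site-level path, and a TG assigns split weights over tunnels for one FG. The class names and the validation rules are illustrative, not B4's data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FlowGroup:                   # aggregated traffic between two sites
    src: str
    dst: str
    qos: str


@dataclass(frozen=True)
class Tunnel:                      # site-level path, realized with IP-in-IP
    tid: int
    sites: tuple[str, ...]


@dataclass
class TunnelGroup:                 # splits one FG over tunnels by weight
    fg: FlowGroup
    weights: dict[Tunnel, float]

    def validate(self) -> None:
        """Every tunnel must run src -> dst, and the split must sum to 1."""
        for t in self.weights:
            if t.sites[0] != self.fg.src or t.sites[-1] != self.fg.dst:
                raise ValueError(f"tunnel {t.tid} does not connect {self.fg.src}->{self.fg.dst}")
        if abs(sum(self.weights.values()) - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1")
```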

Slide 12

Slide 12 text

TRAFFIC ENGINEERING (Bandwidth functions) • Bandwidth is determined from each application's priority • The bandwidth each FG can receive is determined from its demand
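Bandwidth functions can be made concrete with a small allocator. This is a minimal sketch under three assumptions:

- Each function is piecewise linear, mapping a dimensionless fair share to Gbps.
- An FG's function is the sum of its applications' functions.
- There is a single bottleneck link.

B4's real allocator works across many links and tunnels, so this only shows the water-filling intuition. The breakpoints are made up.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import Callable


class BandwidthFunction:
    """Piecewise-linear, non-decreasing map from fair share to bandwidth (Gbps).

    Defined by breakpoints [(share, bw), ...]; flat beyond the last breakpoint,
    which models the application's demand cap.
    """

    def __init__(self, points: list[tuple[float, float]]):
        self.points = sorted(points)
        self.xs = [x for x, _ in self.points]

    def __call__(self, share: float) -> float:
        i = bisect_right(self.xs, share)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (y1 - y0) * (share - x0) / (x1 - x0)


def compose(fns: list[Callable[[float], float]]) -> Callable[[float], float]:
    """FG bandwidth function = sum of its applications' functions."""
    return lambda share: sum(f(share) for f in fns)


def max_fair_share(fgs, capacity: float, hi: float = 100.0, iters: int = 60) -> float:
    """Largest common fair share whose total bandwidth fits one link.

    Bisection is valid because every bandwidth function is non-decreasing.
    """
    if sum(f(hi) for f in fgs) <= capacity:
        return hi                      # the link never saturates
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(f(mid) for f in fgs) <= capacity:
            lo = mid
        else:
            hi = mid
    return lo
```

As a worked example, take two FGs on a 6 Gbps link:

- app1 gets 1 Gbps per unit of share, capped at 10 Gbps.
- app2 gets 0.5 Gbps per unit of share, capped at 5 Gbps.

The common share settles at 4, giving 4 Gbps and 2 Gbps respectively.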

Slide 13

Slide 13 text

TRAFFIC ENGINEERING (TE Optimization Algorithm) • Omitted.

Slide 14

Slide 14 text

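TE PROTOCOL AND OPENFLOW • Mapping between TE constructs, their roles, OF, and HW • Destination IP -> FG -> TG • Tunnel IDs must be configured and managed across all sites • Example of splitting traffic over 2 paths • The path is chosen by hashing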
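Hash-based path selection with a weighted split works as follows. This is a minimal sketch, assuming weights are realized by repeating a tunnel in a bucket list, in the style of a weighted ECMP group, and that SHA-256 over the 5-tuple stands in for the switch hash. `pick_tunnel` and the tunnel names are hypothetical. Because the hash is deterministic per flow, all packets of one flow stay on one tunnel and are not reordered.

```python
import hashlib


def pick_tunnel(five_tuple: tuple, tunnels: list, weights: list) -> str:
    """Pick a tunnel for a flow by hashing its 5-tuple into weighted buckets."""
    # Weight w means the tunnel occupies w buckets (e.g. 1:3 -> 25% / 75%).
    buckets = [t for t, w in zip(tunnels, weights) for _ in range(w)]
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return buckets[int.from_bytes(digest[:4], "big") % len(buckets)]
```

For example, `pick_tunnel(("10.0.0.1", "10.1.0.1", 6, 1234, 80), ["T1", "T2"], [1, 3])` always returns the same tunnel for that flow. Across many flows, about three quarters land on T2.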

Slide 15

Slide 15 text

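TE PROTOCOL AND OPENFLOW • Composing routing and TE • By priority • Routing is implemented in the LPM table • TE is implemented in the ACL/TOS (QoS) table and takes precedence • Converted into a per-site TED (TE Database) • Example entries for each table • Dependencies and failures • If a TG is deleted before its FG, packets are lost • Mechanism to recover when state is lost at one site (restore from the TED) • Mechanism to recover even when state is lost at both sites (resync with the TED store)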
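The rule that an FG must not outlive its TG is a dependency-ordering problem. This is a minimal sketch, assuming each FG references one TG and each TG references its tunnels. The object names in `DEPENDS_ON` are hypothetical. Python's standard `graphlib.TopologicalSorter` gives the install order, and the delete order is simply its reverse. This shows only the ordering idea, not B4's TE protocol.

```python
from graphlib import TopologicalSorter

# Hypothetical objects: node -> the objects it references (must exist first).
DEPENDS_ON = {
    "FG:A->C": {"TG:A->C"},
    "TG:A->C": {"T1", "T2"},
    "T1": set(),
    "T2": set(),
}


def install_order(deps: dict) -> list:
    """Dependencies first: tunnels, then TGs, then the FGs that use them."""
    return list(TopologicalSorter(deps).static_order())


def delete_order(deps: dict) -> list:
    """Reverse of install: remove FGs before their TGs and tunnels,
    so no packet is steered into a group that no longer exists."""
    return list(reversed(install_order(deps)))
```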

Slide 16

Slide 16 text

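EVALUATION (Deployment and Evolution) • Doubled in 2012 • TE features were rolled out quickly • TE mastership lasted 11 days on average • 5 geographically separate locations • 286 topology changes/day • Including changes at the edge (DCs) • TE helps absorb WAN flapping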

Slide 17

Slide 17 text

EVALUATION (TE ops performance) • Fig. 12(d): effect of introducing caching • Optimized in May and November while in production • Topology changes tend to take longer

Slide 18

Slide 18 text

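EVALUATION (Impact of Failures) • A single link failure is recovered immediately via ECMP • Losing an encap switch causes a 3 s outage (now 100 ms) • Losing the OFC or the TE server causes no outage • However, no updates can be made in the meantime, which is reportedly a problem during fail-over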

Slide 19

Slide 19 text

EVALUATION (Link Utilization and Hashing) • Link utilization reaches nearly 100% • Packet loss is also low • Because TE controls bandwidth per priority • Despite no deep buffers, non-ECMP, and bursts • Bandwidth is used efficiently across all physical links

Slide 20

Slide 20 text

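EXPERIENCE FROM AN OUTAGE • Occurred during maintenance • A new switch was manually given the same ID as an existing switch -> large-scale flapping • ISIS LSP flooding -> high load -> BGP neighbors dropped • Old TE state remained, but new TE state could not be installed and tunnels could not be set up -> new traffic could not get through • All OFCs were rebooted (which also exposed an OFC bug under high load) • Improvements • Latency and scalability matter most between OFC and OFA; split the channel in two • Asynchronous, multithreaded processing • Performance profiling and reporting (the logs had actually shown warning signs...) • Losing the TE-OFC connection does not affect traffic (fail-open) • But stale state is left as is; fixed so TE avoids disconnected/failed OFCs • Human error is unavoidable; manual operations should be avoided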

Slide 21

Slide 21 text

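CONCLUSIONS • Describes B4's motivation, design, and evaluation • HW/SW separation; control plane/data plane separation • Operated for 3 years (as of 2013), achieving: • Nearly 100% WAN link utilization (thanks to TE) • Cost efficiency from commodity HW • Failures and the resulting improvements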

Slide 22

Slide 22 text

EoP