#12 “B4: Experience with a Globally-Deployed Software Defined WAN”

SIGCOMM ’13
ACM SIGCOMM Computer Communication Review
https://dl.acm.org/doi/abs/10.1145/2534169.2486019

cafenero_777

June 14, 2023

Transcript

  1. $ which • B4: Experience with a Globally-Deployed Software Defined WAN • Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, Jonathan Zolla, Urs Hölzle, Stephen Stuart and Amin Vahdat • Google, Inc • SIGCOMM ’13 • ACM SIGCOMM Computer Communication Review • https://dl.acm.org/doi/abs/10.1145/2534169.2486019
  2. Agenda • Overview and why I read it • Introduction • BACKGROUND • DESIGN • TRAFFIC ENGINEERING • TE PROTOCOL AND OPENFLOW • EVALUATION • EXPERIENCE FROM AN OUTAGE • RELATED WORK • CONCLUSIONS
  3. Overview and why I read it • Overview • Design, implementation, and evaluation of Google's WAN (inter-DC) • Centrally managed traffic engineering using OpenFlow • Why I read it • It was cited in the Jupiter paper on Google's LAN (intra-DC) • SDN -> OpenFlow -> P4/WB-SW/Stratum • https://www.publickey1.jp/blog/18/grpcstratumonfgoogle.html
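  4. Introduction • WANs are expensive • Transit costs! To keep availability high, utilization is (only) 30%-40% • Network equipment is expensive too • Google's WAN challenges • Want traffic control all the way to the edge • Want to do something about inter-DC bandwidth • Want centralized control • Developed B4, an OpenFlow-based management system, and operated it for 3 years. It also performs TE.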
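  5. BACKGROUND (1/2) • DCs and caches are deployed around the world • They serve search, video, and cloud • A user-facing WAN and a WAN for internal traffic (B4) • 99% of internal traffic goes over B4 • The user-facing WAN goes through many ISPs, needs high availability, and speaks many protocols • Internal traffic: user data, remote storage, data push and synchronization • (Earlier items are lower in volume, more latency-sensitive, and higher in priority) • They use commodity HW and scale out, but the internal WAN is growing faster than the Internet • Want to solve scale, fault tolerance, cost, and control -> B4 • Large-scale bandwidth control; the targets number in the dozens (about the number of DCs); application priority and burst control at the end hosts (no complex in-network control); cost optimization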
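  6. BACKGROUND (2/2) • Design decisions, each with rationale/benefits and challenges • Routers built from merchant switch silicon. Rationale/benefits: application-level traffic control and a small number of DCs -> small buffers and FIBs are enough -> lower cost. Challenges: gives up HW redundancy, deep buffering, and a large FIB • Aim for 100% link utilization. Rationale/benefits: efficient use of long-haul transport; most traffic gets high average bandwidth; large transfers have their bandwidth controlled dynamically. Challenges: packet loss during failures is unavoidable • Centralized TE. Rationale/benefits: multipath balancing; application classification and scheduling (link-state routing is not optimal). Challenges: existing protocols cannot be used; requires knowledge of inter-DC traffic • HW/SW separation. Rationale/benefits: routing and monitoring can be customized; a programming interface decouples the HW and avoids dependence on it. Challenges: no prior experience developing with HW/SW separation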
  7. DESIGN (Overview) • HW/SW (controller) separation, distributed with Paxos • Routing and TE are separated • Realized with OpenFlow instead of ISIS • Baseline flows • Prioritized flows (for TE) • Fallback (just delete the flows?) • Terminology • NCS (NetworkControlServer) • NCA (NetworkControlApplication) • OFC (OpenFlowController) • OFA (OpenFlowAgent) • (Figure labels: NCA, SDN, AS, AS)
  8. DESIGN (Switch Design) • High cost • Deep buffers, large FIBs, high-availability HW • Not needed in B4 • Could not be achieved with existing SDN platforms • So they built their own • 128 x 10G ports, non-blocking • With 16x10G merchant silicon • OFA: BGP -> OF translation
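  9. DESIGN (NCF/Routing) • Network Control Functionality • Distributed with Paxos, leader election, high availability • NIB management: state (in service, drained, etc.), topology, etc. • Routing • RAP (Routing Application Proxy) • BGP/ISIS route updates, interface state management, etc. • Roughly speaking, a translation from BGP/ISIS to OF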
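  10. TRAFFIC ENGINEERING (Centralized TE Architecture) • Want to optimize bandwidth among applications • Want to do TE on inter-site (inter-AS) traffic • Flows are aggregated into an FG (FlowGroup) • Tunnels (T) use IP-in-IP • Applied between sites as a TG (Tunnel Group, a mapping between tunnels and FGs) • After that, the OFC/OFA push the OF rules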
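  11. TE PROTOCOL AND OPENFLOW • Mapping among TE constructs, roles, OF, and HW • Destination IP -> FG -> TG • Tunnel IDs must be configured and managed across all sites • Example of splitting traffic across two paths • The path is chosen by hash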
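The "destination IP -> FG -> TG, then hash to pick a path" step on this slide can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `TunnelGroup` class, the `pick_tunnel` function, the tunnel names, and the choice of SHA-256 over the 5-tuple are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelGroup:
    """Hypothetical TG: tunnel IDs plus the fraction of traffic each should carry."""
    tunnels: tuple  # e.g. ("T1", "T2")
    weights: tuple  # e.g. (0.5, 0.5); assumed to sum to 1.0


def pick_tunnel(tg, five_tuple):
    """Map a flow to one tunnel of the group by hashing its 5-tuple.

    The hash is turned into a point in [0, 1) and compared against the
    cumulative weights, so packets of one flow always take the same tunnel.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    # Keep 53 bits so the division yields an exact float strictly below 1.0.
    point = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    cumulative = 0.0
    for tunnel, weight in zip(tg.tunnels, tg.weights):
        cumulative += weight
        if point < cumulative:
            return tunnel
    return tg.tunnels[-1]  # guard against rounding in the weight sum


if __name__ == "__main__":
    tg = TunnelGroup(tunnels=("T1", "T2"), weights=(0.5, 0.5))
    flow = ("10.0.0.1", "10.1.0.1", 6, 1234, 80)  # src, dst, proto, sport, dport
    print(pick_tunnel(tg, flow))
```

Hashing per flow rather than per packet keeps each flow on a single path, which avoids reordering while still approximating the configured split over many flows.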
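  12. TE PROTOCOL AND OPENFLOW • Composing routing and TE • Prioritized • Routing is implemented as the LPM table • TE is implemented as ACL/TOS (QoS) entries and takes precedence • Translated into a per-site TED (TE-Database) • Example entries • Dependencies and failures • If a TG disappears before its FG is deleted, packets are lost • Mechanism to recover when state is lost at one site (restore from the TED) • Mechanism to recover when state is lost at both sites (synchronize with the TED store)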
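The precedence described on this slide (TE entries in an ACL-style table win over routing entries in the LPM table, and traffic falls back to routing when no TE entry matches) can be sketched as below. The `ForwardingTables` class and its method names are hypothetical and only model the lookup order; they do not reflect the actual switch pipeline or any OpenFlow API.

```python
import ipaddress


class ForwardingTables:
    """Toy model: an ACL-style TE table consulted before a routing LPM table."""

    def __init__(self):
        self.lpm = []  # (network, next_hop) entries installed by routing
        self.acl = []  # (network, tunnel_group) entries installed by TE

    def add_route(self, prefix, next_hop):
        self.lpm.append((ipaddress.ip_network(prefix), next_hop))

    def add_te_entry(self, prefix, tunnel_group):
        self.acl.append((ipaddress.ip_network(prefix), tunnel_group))

    def remove_te_entry(self, prefix):
        network = ipaddress.ip_network(prefix)
        self.acl = [entry for entry in self.acl if entry[0] != network]

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        # TE entries take precedence over routing.
        for network, tunnel_group in self.acl:
            if addr in network:
                return ("te", tunnel_group)
        # Otherwise fall back to the longest matching routing prefix.
        matches = [(net, hop) for net, hop in self.lpm if addr in net]
        if matches:
            _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
            return ("routing", next_hop)
        return ("drop", None)


if __name__ == "__main__":
    tables = ForwardingTables()
    tables.add_route("10.0.0.0/8", "A")
    tables.add_route("10.1.0.0/16", "B")
    tables.add_te_entry("10.1.0.0/16", "TG1")
    print(tables.lookup("10.1.2.3"))
    tables.remove_te_entry("10.1.0.0/16")
    print(tables.lookup("10.1.2.3"))
```

Because the routing entries stay installed underneath, removing a TE entry in this model simply exposes the shortest-path route again, which is the fallback behavior the deck alludes to.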
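  13. EVALUATION (Deployment and Evolution) • Doubled during 2012 • TE features were rolled out quickly • A TE master stays master for 11 days on average • 5 geographically separated locations • 286 topology changes/day • Edge (DC) changes occur too • Flapping WAN links benefit from this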
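  14. EVALUATION (Impact of Failures) • A single link failure recovers immediately via ECMP • An encap switch failure causes a 3-second outage (currently 100 ms) • OFC and TE server failures cause no outage • But state cannot be updated; reportedly this is a problem if it happens during a fail-over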
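  15. EVALUATION (Link Utilization and Hashing) • Link utilization reaches nearly 100% • Packet loss is low too • Because TE controls bandwidth per priority class • Despite shallow buffers, non-ECMP forwarding, and bursts • Bandwidth is used effectively across all physical links as well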
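  16. EXPERIENCE FROM AN OUTAGE • Occurred during maintenance • A new switch was manually configured with the same ID as an existing switch -> large-scale flapping • ISIS LSP flooding -> high load -> BGP neighbor sessions dropped • Old TE state remained, but new TE entries could not be installed and tunnels could not be set up -> new traffic could not flow • All OFCs were rebooted (which also exposed an OFC bug under high load) • Improvements • Latency and scalability between OFC and OFA matter most; split the channel into two • Asynchronous, multithreaded processing • Performance profiling and reporting (the logs had in fact shown warning signs) • Losing the TE-OFC connection does not affect traffic (fail-open) • But the state is then left as it is; fixed so that disconnected or failed OFCs are routed around • Human error is unavoidable; manual operations should be avoided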
  17. EoP