cafenero_777
September 20, 2022

Research Paper Introduction #107 "Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering"


Transcript

  1. Research Paper Introduction #40 "Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering" (#107 overall) @cafenero_777 2022/09/15
  2. Agenda
    • Target paper
    • Overview and why I read it
    1. INTRODUCTION
    2. BACKGROUND AND REQUIREMENTS
    3. DESIGN PRINCIPLES
    4. DESIGN
    5. FEATURE AND ROLLOUT VELOCITY
    6. EVALUATION
    7. EXPERIENCE
    8. RELATED WORK
    9. CONCLUSIONS
  3. Target paper
    • Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering
      • Kok-Kiong Yap et al., 25 authors in total, at Google
      • SIGCOMM '17
      • https://dl.acm.org/doi/10.1145/3098822.3098854
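  4. Overview and why I read it
    • Overview
      • Espresso: a system that lets Google scale its Internet edge layer efficiently
      • Uses commodity switches; routing and packet processing are programmable and host-based
      • As of 2017, it had been in production for two years and carried 22% of total traffic
    • Why I read it
      • I had been curious about it after covering B4, Jupiter, and Andromeda
      • EPE: Egress Peering (Traffic) Engineering
      • SDN in CDN?
    (Reference image: the drinkable kind of espresso)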
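  5. 1. INTRODUCTION
    • Internet routers that deliver Tbps of traffic
      • Bandwidth and port density are at the largest scale (the number of devices is small). FIB entries are also numerous: hundreds of thousands of IPv4 /24s, ACL rules are required, and there are hundreds of BGP sessions.
    • Challenges: flexibility, availability, cost efficiency
      • Global optimization and per-application optimization are not possible. Do it with BGP?
      • Even when the failure probability is low, a failure has large-scale impact
      • Per-port cost is roughly 4-10x (compared with the FIB/ACL of a small router)
    • Espresso
      • Removes the control plane from the peering devices, leaving only a simple MPLS data plane; ACLs and the FIB run on the server side
      • TE can optimize globally (rather than on each peering device alone). BGP runs on servers.
    • Result: working together with global TE, it serves 13% more traffic (compared with BGP-based routing). Video rebuffering also improved by 35% to 170%.
    $ telnet route-server.ip.att.net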
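  6. 2. BACKGROUND AND REQUIREMENTS
    • DC -> B4 -> B2 (Espresso) -> User
      • B4: inter-DC WAN; lower availability; SDN and HW developed in-house
      • B2: DC to peering edge
    • Controlling everything with BGP is too painful
    • Building and operating B4 showed that SDN/TE works; the goal is to bring this to B2
    • Requirements
      • Efficiency: lower the cost per port (raise utilization)
      • Inter-operability: this time there is a counterpart (the peers)
      • Reliability: 99.999% (5 min/year)
      • Incremental Deployment: old and new must be able to run side by side
      • High Feature Velocity: features must be developed and deployed quickly as requirements change
    • Reliability and velocity are often at odds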
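The "99.999% (5 min/year)" figure above can be checked with a little arithmetic. The following is a minimal sketch; the function name `downtime_budget_minutes` and the use of a 365.25-day year are assumptions made here, not something taken from the slides or the paper.

```python
# Minutes in an average year (365.25 days is an assumption of this sketch).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Return the downtime allowed per year, in minutes, for an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{target:.5f} -> {downtime_budget_minutes(target):.2f} min/year")
```

Five nines works out to a little over five minutes per year, which matches the rounded value on the slide.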
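  7. 3. DESIGN PRINCIPLES
    • 1. Hierarchy of local and global control
      • Global optimization, immediate reaction to local events, local autonomy
    • 2. Fail static
      • The data plane (DP) keeps its last known good state while the control plane (CP) is under maintenance
    • 3. Separate CP and DP, and use only simple HW functions (MPLS pop and forwarding only)
      • CP and DP can be upgraded independently
    • 4. Programmability and testing
      • Fully automated from unit to E2E tests; releases happen within the error budget
    • 5. Intent-driven management
      • Reduces human overhead and the operational cost that leads to large-scale outages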
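The "fail static" principle above can be illustrated with a toy forwarding table. This is only a sketch under stated assumptions: the class `FailStaticDataPlane`, its methods, and the validity check are hypothetical names invented for illustration, not Espresso's actual interfaces.

```python
from typing import Dict, Optional


class FailStaticDataPlane:
    """Toy data plane that keeps its last known good table when the controller is away."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}  # prefix -> egress port (hypothetical shape)

    @staticmethod
    def _is_valid(update: Dict[str, str]) -> bool:
        # Assumed validity rule for this sketch: non-empty, and every prefix has a port.
        return len(update) > 0 and all(update.values())

    def apply_update(self, update: Optional[Dict[str, str]]) -> bool:
        """Install the update if it is present and valid; otherwise keep the old state."""
        if update is None or not self._is_valid(update):
            return False  # fail static: forwarding state is left untouched
        self._table = dict(update)
        return True

    def lookup(self, prefix: str) -> Optional[str]:
        return self._table.get(prefix)


if __name__ == "__main__":
    dp = FailStaticDataPlane()
    dp.apply_update({"203.0.113.0/24": "port-1"})
    dp.apply_update(None)  # controller unreachable during maintenance
    print(dp.lookup("203.0.113.0/24"))
```

The point of the design is that a missing or bad control-plane update never clears forwarding state, so the control plane can be restarted or upgraded without touching traffic.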
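  8. 4. DESIGN (1/5) Overview
    • Design overview
      • GC: the central controller; aggregates routes and computes application/policy decisions
      • LC: the controller in each metro; holds an Internet-scale FIB, flow entries per application, and fine-grained DDoS protection
      • Data plane:
        • Peering Fabric built from commodity MPLS/GRE/IP/ACL switches, plus BGP speakers
        • Because the TCAM is small, the edge servers push MPLS labels as needed
      • Intent-driven configuration: covers everything from configuration to version upgrades, including validation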
  9. 4. DESIGN (2/5) Network layout and application-aware routing
    • Ingress is IP + GRE
    • Egress is IP + GRE + MPLS
      • GRE: steers the packet to the router
      • MPLS: steers the packet to the router's egress port (peering port)
    • Application state is taken into account
      • GC/LC choose the GRE and MPLS values (they program the FIB)
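The egress path above (GRE steers the packet to the router, MPLS selects the peering port) can be sketched as a stack of headers. This is a simplified model, not a wire format: the header tuples, the function names `host_encap` and `pf_forward`, and the label-to-port table are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Header = Tuple[str, str]  # (kind, value); a stand-in for real header bytes


def host_encap(inner_dst: str, pf_addr: str, mpls_label: int) -> List[Header]:
    """Edge host side: wrap the user packet as outer IP + GRE + MPLS + inner IP."""
    return [
        ("ip", pf_addr),           # outer IP: delivers the packet to the peering fabric
        ("gre", ""),               # GRE: tunnel to the router
        ("mpls", str(mpls_label)), # MPLS: selects the egress (peering) port
        ("ip", inner_dst),         # original packet toward the user
    ]


def pf_forward(packet: List[Header], label_to_port: Dict[int, str]) -> Tuple[str, List[Header]]:
    """Peering fabric side: strip outer IP + GRE, pop MPLS, and pick the egress port."""
    outer, gre, mpls, *rest = packet
    if outer[0] != "ip" or gre[0] != "gre" or mpls[0] != "mpls":
        raise ValueError("unexpected encapsulation")
    port = label_to_port[int(mpls[1])]
    return port, rest


if __name__ == "__main__":
    pkt = host_encap("198.51.100.7", "10.0.0.1", 1001)
    print(pf_forward(pkt, {1001: "peering-port-3"}))
```

In this model the switch needs only a small label table, while the choice of label (and therefore of peering port) is made on the host.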
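  10. 4. DESIGN (3/5) App-aware TE system
    Globally optimizes application bandwidth and latency according to the available resources (peering, BB, servers)
    egress MAP: <PoP, client prefix, service class> per <PR/PF, egress port>
    • Peering route: where the user is (in BGP terms)
    • User bandwidth: the CDN sends per-user-/24 metrics (goodput, RTT, resubmit/sec) to the GC, where the optimizer estimates demand
    • Link utilization: per-queue utilization is collected and bandwidth is allocated dynamically in CoS order. Latency and bandwidth (+17%) are optimized together.
    • Congestion is detected end to end; peering link, AS path, and client prefix are grouped and traffic is detoured. A 170% improvement in the metrics.
    (Figure labels: Input, Output, Optimizer: computes the optimization from the inputs)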
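The egress map and the "allocate in CoS order" idea above can be sketched as a greedy assignment. This is a minimal illustration under assumptions: the function `greedy_assign`, the tuple layout, the port names, and the convention that a lower priority number is served first are hypothetical, and the real optimizer is far richer than this.

```python
from typing import Dict, List, Optional, Tuple

# Key of the egress map as written on the slide: (PoP, client prefix, service class).
Key = Tuple[str, str, str]
# (key, priority, demand in Gbps, candidate egress ports in preference order)
Demand = Tuple[Key, int, float, List[str]]


def greedy_assign(demands: List[Demand], capacity: Dict[str, float]) -> Dict[Key, Optional[str]]:
    """Assign each demand to its first candidate port with enough remaining capacity."""
    remaining = dict(capacity)
    egress_map: Dict[Key, Optional[str]] = {}
    # Lower priority number is served first (an assumption of this sketch).
    for key, _priority, gbps, candidates in sorted(demands, key=lambda d: d[1]):
        egress_map[key] = None  # stays None if no candidate port fits
        for port in candidates:
            if remaining.get(port, 0.0) >= gbps:
                remaining[port] -= gbps
                egress_map[key] = port
                break
    return egress_map


if __name__ == "__main__":
    ports = {"pf1:port1": 10.0, "pf2:port1": 10.0}
    demands = [
        (("pop-a", "198.51.100.0/24", "bulk"), 1, 5.0, ["pf1:port1", "pf2:port1"]),
        (("pop-a", "203.0.113.0/24", "video"), 0, 8.0, ["pf1:port1", "pf2:port1"]),
    ]
    print(greedy_assign(demands, ports))
```

Here the higher-priority video demand takes the preferred port first, and the bulk demand overflows to the second port because the first no longer has room.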
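  11. 4. DESIGN (4/5) PF: Peering Fabric
    • PF Controller: switch programming and eBGP session management
      • Manages BGPd (deployment and health checks) and configures MPLS rules and per-BGP-session GRE rules
      • Not co-locating it with BGPd allows rack and power redundancy
    • Custom BGP (Raven):
      • Raven: multi-core (multi-threaded), large memory, minimal implementation, extensive tests. cf. Quagga
      • Connecting PF and Raven (BGPd) over tunnels lets each scale out independently and lets the failure domain be designed
      • Incidentally, Fastly reportedly synchronizes the MAC address table with BGPd
      • Because it is upgraded frequently, BGP Graceful Restart is used to keep routes (reportedly not possible with FRR)
    • Commodity MPLS switch
      • The FIB is small, but sufficient for roughly the number of peerings
      • Only part of the ACL rows are installed (the top 5% of rows cover 99% of traffic)
      • An OpenFlow agent deploys the MPLS/IP-GRE encap/decap rules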
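The observation above that a small share of ACL rows covers almost all traffic suggests selecting rows by hit count. The sketch below shows one way to pick such a subset; the function `select_rules`, the hit counters, and the rule names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_rules(hits: Dict[str, int], target: float) -> List[str]:
    """Pick the most-hit rules until they cover at least `target` of all hits."""
    total = sum(hits.values())
    chosen: List[str] = []
    covered = 0
    for rule, count in sorted(hits.items(), key=lambda kv: kv[1], reverse=True):
        if total == 0 or covered / total >= target:
            break  # coverage target reached; remaining rules stay off the switch
        chosen.append(rule)
        covered += count
    return chosen


if __name__ == "__main__":
    counters = {"acl-a": 900, "acl-b": 95, "acl-c": 3, "acl-d": 2}
    print(select_rules(counters, 0.99))
```

Rules left out of the hardware table would still need to be enforced somewhere else; on the slides that role belongs to the server side.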
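  12. 4. DESIGN (5/5) Configuration and management
    • Intent-based: operators submit human-readable intent
      • Internally it is translated into low-level config
      • Consistency is checked; rolling back a setting only requires a change on the consumer side
      • Because it is implemented in software, a declarative language can be, and is, used
      • Changes are applied to the DP gradually, with multi-layer validation on the CP side
      • What sits at BGPd is treated strictly as a cache of the intent/config, in line with the fail-static philosophy
    • Emergency stop buttons (Big Red Buttons)
      • Forced override of the best path, forced de-peering, steering traffic to a specific BGP speaker
    • Network telemetry: anomalies are detected from data-plane metrics, so the PFC can detour immediately (without waiting for a BGP timeout)
    • Probe packets are sent to confirm that traffic actually passes the encap/decap and ACL rules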
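The intent-based flow above (human-readable intent, validation, then translation into low-level config) can be sketched as follows. Everything here is hypothetical: the `PeeringIntent` fields, the validation rules, and the shape of the generated config are assumptions for illustration and do not reflect Google's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PeeringIntent:
    """Human-readable intent for one peering session (hypothetical fields)."""
    peer_asn: int
    fabric: str
    port: str
    drained: bool = False


def validate(intent: PeeringIntent) -> List[str]:
    """Return a list of problems; an empty list means the intent is acceptable."""
    errors: List[str] = []
    if not 1 <= intent.peer_asn <= 4294967295:  # 32-bit AS number range
        errors.append("peer_asn out of range")
    if not intent.fabric:
        errors.append("fabric is empty")
    if not intent.port:
        errors.append("port is empty")
    return errors


def compile_intent(intent: PeeringIntent) -> Dict[str, object]:
    """Translate a validated intent into a low-level config dictionary."""
    errors = validate(intent)
    if errors:
        raise ValueError("; ".join(errors))
    return {
        "fabric": intent.fabric,
        "port": intent.port,
        "bgp_session": {
            "peer_asn": intent.peer_asn,
            "admin_state": "down" if intent.drained else "up",
        },
    }


if __name__ == "__main__":
    print(compile_intent(PeeringIntent(peer_asn=64500, fabric="pf-1", port="port-3")))
```

Because the low-level config is derived from the intent, draining a peer becomes a one-field change to the intent rather than a hand edit of device config.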
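  13. 5. FEATURE AND ROLLOUT VELOCITY
    • There was a clear goal: get away from "emergency maintenance"
      • Components are loosely coupled (they run independently and asynchronously)
      • Canary releases
        • Broad unit tests, combination tests, and E2E tests
        • Compatibility checks for both the current and the new version
        • Once these complete, global rollout
    • Releases every two weeks; even a manual release can be done within a few hours
    • L2VPN for cloud was rolled out in a few months. On the traditional platform it has already taken a year and is still unfinished.
    (Figure labels: number of deployments, validation time, release time)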
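  14. 6. EVALUATION Traffic growth and app-aware TE
    • About 22% of all traffic goes through Espresso (as of 2017)
    • On capacity overrun (overflow), traffic is detoured and served from another metro
      • 13% more traffic can be served than before
    • Even at 100% utilization, loss is around 2% (the GC handles it)
    • Routes are switched using goodput as the indicator
      • Dramatic effect on video streaming UX
      • The congestion was not detected between Google and the ISP. Thanks to Global TE!
    • Emergency stop buttons
      • Stopping a peering: 4 s on average (1.6-20 s), deviation 3 s
      • Stopping a peering through intent (drain): 20 s on average (15-100 s), deviation 8 s
        • Time is spent in the GC, multi-stage validation, the LC, and BGPd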
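  15. 7. EXPERIENCE
    • 1. Insufficient integration testing of a reporting feature: a bug exhausted threads and no jobs could run
      • It had been deployed to part of the fleet, but the Espresso side was drained and peering was served from the traditional routers. Impact was minimal. Integration tests covering both the data and control planes are needed!
    • 2. Newly learned routes were not used
      • The GC was being cleverly "frugal" (aggregation was slow in order to reduce computation). The implementation was changed so that the LC uses the routes; the LC takes priority. A story of the hierarchical design paying off.
    • 3. Operator error (includes the presenter's own notes and guesses)
      • A set of egress options was removed from every Espresso PF device -> a widespread total outage
      • The GC detoured traffic to other peering devices. Spare bandwidth was available, so it was safe. A pre-change prediction (simulator) was shown but ignored. Large changes are now reported as a "failure" rather than a prediction. Safety checks were added, among other fixes.
    • 4. Part of the advertisements was blackholed (includes the presenter's own notes and guesses)
      • Because of the PF's HW TCAM limit, a policy filtered the received routes
      • A BB router policy change started advertising all of the company's prefixes -> some prefixes (VPN and others) were blackholed
      • Migration and old/new coexistence carry the risk of each side harming the other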
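  16. 8. RELATED WORK
    • B4, Jupiter: vendors and software were limited to a small number
    • SDN peering: PoP only (not global). Policies are fine-grained but not app-aware.
    • Cost-focused only (Facebook's TE and others) -> evaluated end to end here
    • Centralized TE controllers have existed for a long time, but none use user metrics (latency and so on) at Internet scale
    • Host-side encapsulation and packet processing evolved from Nicira
    • Declarative network management resembles Facebook's Robotron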
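  17. 9. CONCLUSIONS
    • Criticism of SDN
      • It is used where interoperability is not needed. It is used to cut costs.
    • Espresso
      • Handles Internet-scale peering and policy through hierarchical controllers and cooperation among HW, SW, and edge hosts
      • The value of SDN is the flexibility of software and development speed
      • Separates routing (BGP) from packet processing, and secures fine-grained responsiveness with hierarchical controllers
      • 6x development speed, 75% cost reduction, and it forwards 22% of Google's Internet traffic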
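  18. Thoughts after finishing
    • It was fun to trace SDN + CDN (WAN) from around 2010 onward
    • The declarative flavor is nice
      • Optimization: uses a greedy algorithm, not linear programming
      • Intent-based: uses a declarative language rather than a procedural one
    • Solving (Google's own unusual) requirements through HW/SW cooperation is impressive
    • Large-scale outages in SDN systems are scary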