
Interop New York - Building Your Network for the Next 10 Years


Networking is cool again! After a decade of relative technology stagnation, data center networking has seen dozens of new technologies emerge, offering new business services. This means more networking, more technology and more things to do.

This workshop is well suited to network architects & engineers who want insights into the technology foundation of the data center network for the next 10 years. This session will particularly focus on the design principles for Private and Public Cloud networking and help you understand the difference between "virtual networking" of today and "cloud networking" of tomorrow. This session will attempt to define Software Defined Networking so that you can laugh at "software defined" vendors.

This workshop will explore the following:

Data Center Fabrics
L2 Fabrics - MLAG, TRILL / FabricPath / VCS
L3 Fabrics - Leaf/Spine, Juniper QFabric, Cisco Dynamic Fabric Automation
Virtual Networking (Legacy)
Introduction to Software Defined Networking
Controllers
OpenFlow, Cisco onePK, SLAX, NetConf
Overlay Networking
What is Overlay Networking and Why You Want It
Tunnel protocols - VXLAN, NVGRE, MPLSoGRE
Tunnel Fabrics, Network Agents
Enterprise Security Overhaul
Cloud Networking
Networking for Virtual/Software Defined Data Centers
Programmable Infrastructure
Comparing Software-Only and Integrated Overlays

EtherealMind

October 01, 2013
Transcript

  1. About Me ‣ Host of Packet Pushers Podcast ‣ Freelance

    Network Architect/Engineer ‣ Blog - EtherealMind.com ‣ NetworkComputing.com (http://networkcomputing.com/blogs/author/Greg-Ferro) ‣ Slides: speakerdeck.com/etherealmind
  2. Needs ‣ Understand your needs ‣ Write them down ‣

    All cars are almost the same. ‣ There isn’t much difference between them ‣ They all use the same fuel ‣ They all get you where you want to go
  3. Scheduling ‣ Phones ‣ Questions ‣ Toilets ‣ Break at

    10:30am ‣ Four Parts
  4. Agenda - Part 0 Part 0 - Networking Renaissance Part

    1 - Data Centre Fabrics & Designs Part 2 - Switch Internals Part 3 - SDN, Operations and Overlays
  5. Scaling ‣ Device density increasing ‣ 1 Physical > 50

    Virtual (1,000 MAC -> 50,000 MAC) ‣ 10GbE needed for 20-100 virtual servers per physical server ‣ 10GbE servers mean 40GbE uplinks (see the sketch below)
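    A back-of-the-envelope sketch in Python of the scaling arithmetic above - the densities (50 VMs per host, roughly one MAC per VM) are the slide’s assumptions, not measurements:

        # Slide's assumptions: ~1 MAC per physical server before
        # virtualization, ~50 VMs (each with a MAC) per server after.
        physical_hosts = 1000
        vms_per_host = 50

        macs_before = physical_hosts                 # 1,000 MAC entries
        macs_after = physical_hosts * vms_per_host   # 50,000 MAC entries
        print(macs_before, "->", macs_after)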
  6. Reliability ‣ Networks not reliable ‣ Brittle failure modes in

    Data Centre ‣ Unreliable operations due to hand cranking ‣ sending humans to do a robot’s job ‣ who thinks configuring VLANs or firewall rules is “skill”? ‣ Current change processes are not reliable
  7. Multitenancy ‣ Multitenancy is about security ‣ Public Cloud obvious

    ‣ Enterprises need new Security models ‣ operational security needs more zones ‣ existing tools work but aren’t good enough ‣ Software Defined Data Centers
  8. Mobility ‣ Server mobility ‣ Application mobility ‣ Automated server

    and application provisioning ‣ Faster Service Delivery through automation & orchestration
  9. Virtualization ‣ Virtualization needs new networking ‣ The edge of

    the network moved into the hypervisor ‣ Loss of network control & visibility ‣ Enables Cloud Systems for tight integration between servers, storage & network
  10. The Challenge •Reliability • Risk-free change •Bugs • operational

    reliability •Power • huge problem [Diagram: “Impact Pyramid” - Power & Physical Hosts at the base; Connectivity, Data Centre Network, Servers/Storage/VMware, Applications and Users above]
  11. CapEx •Spend first, ROI later •Buy a large core, planning

    for growth •Servers start small, scale linearly •“Blind” CapEx vs “Step” CapEx / OpEx [Chart: CapEx over time - the network is bought in big steps of port capacity, servers grow incrementally, and the gap is CapEx waste until the next network upgrade]
  12. Failure •Single points of Complex failure •Why have only one

    pair of firewalls? ‣ routing, cost, power users ‣ Only one or two critical services need HA •HA systems are inherently risky & shared-fate systems ‣ Active/Standby firewall •HA in vertical scale systems = $$$$$’s [Diagram: stateful Active/Standby firewall and load balancer pairs between WAN/Internet routers and the server farm]
  13. Configuration Failure ‣ Manual Configuration ‣ All devices are configured

    using “power tools” ‣ Every engineer is a “power user” ‣ Why have an API? Substandard & lacking vendor commitment ‣ Restricts number of devices (requires power users) ‣ “Don’t send a human to do a robot’s job”
  14. Fourth Wave of Virtualization ‣ 1998 - virtual LANs, 2002

    - virtual routing, 2007 - virtual devices ‣ The secret is in the name ‣ another virtualization phase is no big deal ‣ networking will be integrated with servers and storage by 2015
  15. Agenda - Part 1 Part 0 - Networking Renaissance Part

    1 - Data Centre Fabrics & Designs Part 2 - Switch Internals Part 3 - SDN, Operations and Overlays
  16. Disclaimer ‣ Focus on technology & features ‣ Understanding of

    why you buy not what you buy ‣ I don’t endorse vendors, you need to make your own choices ‣ The future is uncertain
  17. Classes of Data Centre Networks Today ‣ Tree ‣ Spanning

    Tree ‣ LAG ‣ MLAG Tomorrow ‣ Fabrics ‣ L2 ECMP ‣ L3 ECMP Wisdom: Everything works for someone. Some things work better than others.
  18. Tree Networking [Diagram: classic three-tier tree - two Core, four Distribution and eight Access switches]
  19. STP Review - Negative ‣ Ethernet networks cannot have loops

    ‣ Ethernet is a dumb (but cheap) protocol ‣ STP prevents loops ‣ the STP protocol has been enhanced (RSTP/MSTP) ‣ inherently unreliable during changes ‣ Brittle failure mode ‣ One mistake and an entire data centre goes down
  20. STP Features TYPES ‣ Classic ‣ PVST / PVST+ ‣

    Cisco proprietary ‣ Rapid STP ‣ Multiple Instance FEATURES ‣ Loop Guard ‣ Root Guard ‣ BPDU Guard ‣ UDLD ‣ Portfast ‣ Root Placement
  21. STP Review - Positive ‣ Cheap - standard feature, no

    licensing ‣ Knowledge, Skills and Resources everywhere ‣ STP will be around until I retire
  22. STP Review - Inefficient ‣ Wasted bandwidth ‣ Doesn’t scale

    - STP databases require ever larger memory ‣ e.g. QinQ might mean 16 million MSTP instances ‣ Requires big expensive switches [Diagram: tree network with blocked links - “VERY BIG $$$” core, “BIG $$$” distribution]
  23. LAG ‣ Link Aggregation Group ‣ Channel bonding, ‣ Port

    aggregation ‣ Bonding multiple Ethernet ports into a single logical connection
  24. ‣ 4 x 10GbE ports ‣ 40G Bandwidth but 10GbE

    Speed ‣ Doesn’t solve latency (see the sketch below) [Diagram: two switches joined by a LAG - bandwidth vs speed]
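    Why 4 x 10GbE gives 40G of bandwidth but only 10GbE of speed: LAG members are chosen by hashing flow headers, so every packet of a given flow lands on the same member link. A minimal sketch of the idea - the hash inputs and algorithm here are illustrative, not any vendor’s implementation:

        import hashlib

        def lag_member(src_ip, dst_ip, src_port, dst_port, members=4):
            # All packets of one flow hash identically, so a single flow
            # can never run faster than one member link (10GbE here).
            key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
            return int(hashlib.md5(key).hexdigest(), 16) % members

        # An elephant flow always lands on the same member link:
        print(lag_member("10.0.0.1", "10.0.0.2", 49152, 445))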
  25. Multi-Chassis LAG [Diagram: physical MLAG - two pairs of switches cross-connected; logical MLAG - each pair appears as a single switch]
  26. [Diagram: two Core/Distribution/Access trees, each with a dual-homed server]
  27. MLAG - Problems •Control plane synchronisation •MLAG Master/Secondary •Peer

    detection •Orphan ports (human error) [Diagram: Core and Distribution MLAG pairs]
  28. MLAG - Problems •Control plane synchronisation •Data paths [Diagram: traffic paths through Core and Distribution MLAG pairs to a dual-homed server]
  29. LAG & MLAG Designs ‣ Spanning Tree Avoidance is big

    business ‣ Link Aggregation works well for many ‣ MLAG is a tricky thing ‣ complex, hard to configure reliably ‣ tends to be buggy and over-featured ‣ Popular for virtualization cores ‣ Doesn’t scale - only 2 switches per MLAG
  30. Proprietary MLAG ‣ Cisco VSS, vPC ‣ HP IRF ‣

    Juniper Virtual Chassis
  31. LAG Limits ‣ Scaling ‣ Co-dependency in MLAG ‣ Limited

    number of LAGs per device / stack ‣ CPU must maintain state ‣ Megaflows can cause unexpected congestion
  32. Latency ‣ New ‘old’ requirement ‣ Distributed storage e.g. vSAN

    ‣ Distributed databases e.g. Ceph, Lustre ‣ Impacts everything ‣ Cheap products have high latency ‣ Consistent latency can be important
  33. Contention Ratios

    Year | Access Ports | Uplink     | Contention
    2001 | 48 x 100MbE  | 2 x 1GbE   | 2.4 : 1
    2005 | 48 x 100MbE  | 4 x 1GbE   | 1.2 : 1
    2009 | 48 x 1GbE    | 2 x 10GbE  | 2.4 : 1
    2013 | 48 x 10GbE   | 4 x 40GbE  | 3 : 1
    ‣ How much traffic can a server generate? ‣ Blade servers = 50 VMs per blade, 8 blades per chassis (see the sketch below)
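    The ratio is simply total access bandwidth divided by total uplink bandwidth; a short Python sketch reproduces the rows above:

        def contention(ports, port_gbps, uplinks, uplink_gbps):
            # Total access bandwidth / total uplink bandwidth
            return (ports * port_gbps) / (uplinks * uplink_gbps)

        rows = {2001: (48, 0.1, 2, 1), 2005: (48, 0.1, 4, 1),
                2009: (48, 1, 2, 10), 2013: (48, 10, 4, 40)}
        for year, row in rows.items():
            print(year, f"{contention(*row):.1f} : 1")
        # 2001 2.4:1, 2005 1.2:1, 2009 2.4:1, 2013 3.0:1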
  34. [Diagram: tree network with blocked links - traffic between servers on different Access switches (steps 1-2-3) must hairpin through Distribution and Core]
  35. Chassis vs Rackables ‣ Chassis switches are required in Tree

    Networks to meet bandwidth ‣ Expensive, complex and “productive bug factories” ‣ Critical failure points ‣ Overloaded with unused features ‣ Rackables are cheap to design and build ‣ Merchant silicon allows focus on software quality ‣ Low power ‣ Better purchasing i.e. one by one instead of one big purchase
  36. Power ‣ Rules of thumb ‣ Fewer switches = less

    power ‣ ToR switch efficiency is better than chassis
  37. Large Bridging Domains ‣ Using VLANs is current practice for

    L2 VM connectivity ‣ VLANs are security zoning ‣ servers in VLAN 20, VDI in 50-70, DMZ 3900-3950 ‣ VLAN everywhere is simple but risky at scale ‣ Every device is “co-dependent” when using bridging e.g. faulty NICs
  38. DCBX [Diagram: switches, server and storage exchanging DCBX capabilities and PAUSE frames] ‣ IEEE DCBX Working Group ‣ PFC (802.1Qbb), ETS/DCBX (802.1Qaz), QCN ‣ much more complicated than this ‣ hard to make work, complex to maintain ‣ proprietary solutions from Cisco, Brocade, Juniper
  39. Bisectional Bandwidth - Tree [Diagram: tree network with blocked links - bisectional bandwidth constrained by the Core]
  40. Bisectional Bandwidth - MLAG [Diagram: MLAG tree with all links forwarding - bisectional bandwidth still constrained by the Core]
  41. Why Fabrics ? ‣ STP and LAG Networks are high

    contention ‣ 8:1 and 4:1 are common ‣ Need modular build-out methods ‣ better purchasing models (less capital) ‣ In virtualization and cloud deployments, East/West > North/South. Goal: Build a Lossless Network
  42. Purchasing •Spend big, ROI later •Network assets rot for years & years [Chart: CapEx over time - port capacity bought in big steps, servers added one at a time, the gap is wasted CapEx]
  43. North South East West [Diagram: tree network - North-South traffic flows between WAN routers and servers, East-West traffic flows between servers across the fabric]
  44. Types of Ethernet Fabric ‣ The term “Fabric” ‣ explained

    in the next section “Switch Internals” ‣ It is possible to build an Ethernet network that is lossless ‣ 1:1 contention ratio at all points ‣ The term “Crossbar Fabric” ‣ internal to chassis switches ‣ ECMP Fabric ‣ is what we are talking about here
  45. Two types of ECMP ‣ L2 ECMP ‣ TRILL or

    SPB standards ‣ Uses a routing protocol to direct Ethernet frames over equal-cost paths ‣ L3 ECMP ‣ IP only ‣ Uses a routing protocol to direct IP packets over equal-cost paths
  46. [Diagram: leaf/spine fabric - four Spine switches, six Edge (Leaf) switches, with servers and storage attached at the leaf]
  47. Equal Cost Multipath [Diagram: four equal-cost paths (1-4) between two servers across a four-spine fabric]
  48. ECMP Width determined by Vendor Implementation [Diagram: fabric width scales with the number of spines; the uplink ports on the leaf nodes set the maximum]
  49. Spine - Unequal Paths [Diagram: spine fabrics where leaves do not connect to every spine, so paths between servers (hops 1-2-3) are unequal]
  50. Spine - Network Only [Diagram: leaf/spine fabric - 40GbE ports on the spine tempt you to connect storage there cheaply. Should you?]
  51. Spine - Network Only [Diagram: storage attached to the spine - megaflows cause unequal load factors across the ECMP paths]
  52. East West [Diagram: leaf/spine fabric - East-West bandwidth between servers and storage is excellent]
  53. Bisectional - ECMP [Diagram: ECMP fabric bisected - multiple paths (1, 2) between cores and edges]
  54. L2 ECMP [Diagram: two leaf/spine fabrics - VLAN 1 spanning the leaves across the L2 ECMP fabric]
  55. L2 ECMP [Diagram: TRILL L2MP core interconnecting STP domains at the edge]
  56. TRILL/SPB [Diagram: RBridges interconnecting STP domains - IS-IS builds a “MAC Address:RBridgeID” table; the ingress RBridge does a MAC lookup and adds TRILL encapsulation, core RBridges forward on RBridge lookups, and the egress RBridge strips the encapsulation and does a final MAC lookup - “Routing by MAC Address”]
  57. L2 ECMP ‣ Uses encapsulation to ‘route’ frames ‣ SPB

    - from Avaya and Huawei ‣ TRILL - mostly proprietary variants: Brocade VCS, Cisco FabricPath ‣ Proprietary (no encapsulation) - Juniper QFabric
  58. L3 ECMP ‣ No connections between spines ‣ upgradeable without

    outage ‣ Semi-independent control planes ‣ Grows piece by piece ‣ Use low cost 1RU switches ‣ Dell S6000, Mellanox S1032, Nexus 5500 ‣ Simplicity means reliability
  59. [Diagram: L3 leaf/spine fabric full of servers - “old” vMotion doesn’t work across it]
  60. L2 / L3 ECMP [Diagram: leaf/spine fabric running L3 in the spine and L2 at the edge]
  61. L2/L3 ECMP ‣ You need VLANs everywhere ‣ legacy

    hosts and vMotion requirements ‣ You want to emulate a tree network ‣ which is dumb (more in overlay networking) ‣ Proprietary solutions from Brocade & Cisco ‣ Multi-Gateway on Spines
  62. When to Use ECMP Designs ‣ ALL THE TIME ‣

    Start with virtualization servers using Overlay Networking ‣ Integrate with your existing network ‣ Data Centre will have many networks ‣ Connect on the LEAF Nodes ‣ L3 ECMP is best with overlay networking ‣ L2 ECMP is better than nothing
  63. Sizing ECMP ‣ Most ECMP solutions are a minimum of 500

    10GbE ports ‣ Different vendors can offer 750 and, in some cases, up to 1500 10GbE ports in a single ECMP network ‣ that’s a lot of ports (see the sizing sketch below)
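    Where such numbers come from: in a two-tier leaf/spine, the spine radix caps the leaf count, and the leaf access ports multiply out from there. A sketch assuming typical 2013-era building blocks (48 x 10GbE + 4 x 40GbE leaves, 32 x 40GbE spines - illustrative figures, not a specific product):

        spine_ports = 32   # 40GbE ports per spine switch (assumed radix)
        leaf_access = 48   # 10GbE server-facing ports per leaf (assumed)

        # Every leaf takes one uplink to every spine, so the spine radix
        # caps the number of leaves in a two-tier fabric:
        max_leaves = spine_ports
        print(max_leaves * leaf_access, "x 10GbE")   # 1536, i.e. ~1500 ports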
  64. [Diagram: “Cloud Server Cluster” (OpenStack / VMware vCloud) on a leaf/spine fabric with 40GbE interfaces, alongside a hand-cranked legacy tree network (bare metal / vCenter / KVM / other), joined at the Core]
  65. [Diagram: the same design grown over time - the cloud cluster on 10GbE interfaces expands while the hand-cranked legacy tree shrinks]
  66. MultiGateway Routing ‣ For L2 ECMP designs, where and how

    are default gateways handled? ‣ For L2/L3 ECMP, where & how is IP routing handled?
  67. Device Sizing, Capacity ‣ MAC Scaling ‣ ARP Scaling ‣

    Mobility ‣ Power (include interfaces)
  68. Industry Changes ‣ Asset rotation to move to 3 -

    5 years (instead of 5-10 today) ‣ CapEx will drop accordingly ‣ Simpler devices ‣ SDN platforms add new costs based on consumption i.e. per-use licensing ‣ Budgets remain static
  69. Agenda - Part 2 Part 0 - Networking Renaissance Part

    1 - Data Centre Fabrics & Designs Part 2 - Switch Internals Part 3 - SDN, Operations and Overlays
  70. Inside Your Switch ‣ Have you thought about what happens

    inside your switch?
  71. Forwarding steps ‣ Receive Frame ‣ Extract Address Information ‣

    Lookup Address Table ‣ Check table entry age ‣ Check & update checksum ‣ Send Frame/Packet
  72. Packet Pathway [Diagram: switch forwarding pipeline - PHY → packet parser → VLAN processor (MST storage) → L2 match/learning (L2 CAM) → L3 match/learning (L3 CAM) → ACL processing (TCAM) → crossbar → QoS policing/shaping → packet rewrite → PHY, with input and output counters at ingress and egress]
  73. Ethernet BUM Traffic ‣ Broadcast ‣ IP ARP ‣ FF:FF:FF:FF:FF:FF

    ‣ Multicast ‣ IP Multicast, VRRP ‣ 01:xx:xx:xx:xx:xx ‣ Unknown Unicast ‣ Flooding for MAC address learning ‣ Switch must replicate the frame and send it out every port (see the sketch below)
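    A toy learning bridge makes the replication rule concrete: learn the source MAC, forward known unicast out one port, and flood everything else out every port except the one it arrived on. A minimal sketch:

        class LearningSwitch:
            def __init__(self, num_ports):
                self.ports = list(range(num_ports))
                self.mac_table = {}                # MAC -> port

            def receive(self, in_port, src_mac, dst_mac):
                self.mac_table[src_mac] = in_port  # learn sender's port
                if dst_mac in self.mac_table:      # known unicast
                    return [self.mac_table[dst_mac]]
                # Broadcast, multicast or unknown unicast: replicate the
                # frame out every port except the ingress port.
                return [p for p in self.ports if p != in_port]

        sw = LearningSwitch(4)                     # shortened toy MACs
        print(sw.receive(0, "00:aa", "ff:ff"))     # flood -> [1, 2, 3]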
  74. BUM Handling [Diagram: the same forwarding pipeline with a packet replication stage feeding the crossbar]
  75. Clos Theory [Diagram: Clos network built from small crossbar switches - any input reaches any output through multiple middle-stage switches]
  76. Crossbar 1 [Diagram: 5x5 crossbar - any input to any output]
  77. Crossbar 2 [Diagram: 5x5 crossbar with a blocked crosspoint (X)]
  78. Crossbar 3 [Diagram: crossbar with input buffers, output buffers and fabric arbitration]
  79. Crossbar 4 [Diagram: switching backplane over time (steps 1-3) - simultaneous inputs from servers and an iSCSI/FCoE storage array contend for the same output and the output blocks]
  80. Crossbar 5 [Diagram: under contention, packets are queued at the input (input queueing)]
  81. Crossbar 6 [Diagram: packets queued at the output (output queueing)]
  82. Crossbar 7 [Diagram: fabric with input and output queues - where would queueing occur?]
  83. Crossbar ‣ there is a lot of “magic” inside

    a switch ‣ the faster the switch, the more complex the magic and the more costly it becomes ‣ crossbar silicon exists in every switch
  84. Content Addressable Memory ‣ High Speed SRAM memory ‣ Performs

    a database lookup in a single clock cycle ‣ Expensive, fast ‣ Different types ‣ BCAM - exact match ‣ TCAM - partial match (see the sketch below)
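    The BCAM/TCAM difference in a few lines of Python: a BCAM lookup is an exact match (a dictionary hit), while each TCAM entry carries a mask so it can match partially. Hardware evaluates every entry in parallel in one clock cycle; this sketch scans in priority order instead:

        def tcam_lookup(key, entries):
            # entries: (value, mask, result), most-specific first.
            # A key matches when all unmasked bits agree.
            for value, mask, result in entries:
                if key & mask == value & mask:
                    return result
            return None

        routes = [  # longest-prefix routes expressed as ternary entries
            (0x0A000100, 0xFFFFFF00, "10.0.1.0/24 -> port 2"),
            (0x0A000000, 0xFF000000, "10.0.0.0/8  -> port 1"),
        ]
        print(tcam_lookup(0x0A000105, routes))  # 10.0.1.5 hits the /24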
  85. ‣ High Power Consumption - SRAM ‣ Speed limited

    to 600 MHz - refresh ‣ Update limitations - synchronous updates ‣ Expensive - high component count
  86. Rack vs Chassis - 1 ‣ Rackable switches use “System

    on Chip” ‣ Low cost but performance limited ‣ ECMP allows small switches to do big things ‣ low cost rackable devices allow for rotation in 3 years (server replacement times) ‣ new features, 40GbE / 100GbE ‣ react to changes in networking ‣ switching on motherboard
  87. Rack vs Chassis - 2 ‣ chassis switches are way

    more expensive and unreliable because of complexity ‣ chassis switches are poor choices for network services ‣ chassis switch mandatory for tree networks & optional for ECMP designs
  88. [Diagram: two chassis backplane layouts - slots connected by a shared bus with OOB management]
  89. Chassis Architecture [Diagram: fabric modules (Fabric ASICs) connected across the chassis backplane to line cards, each with packet processors and port ASICs]
  90. Dual Active Fabrics ‣ Dual Active Fabric ‣ Legacy Switches

    ‣ Chassis Switches Only ‣ Complex ‣ Complexity = Software Risk & Hard to Maintain
  91. Ethernet Bandwidth in 25 Years [Chart: Ethernet speeds by year, 1990-2014 - 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps, 40 Gbps, 100 Gbps, 400 Gbps, 1 Tbps]
  92. QSFP Breakout •40GbE QSFP becomes 4 x 10GbE Passive copper

    coax •100 GbE QSFP becomes 10 x 10GbE (Arista does 12) •Low power, moderate cost
  93. Merchant Silicon ‣ Broadcom Trident2 ‣ Fully-featured, high-density 1RU

    data center switch ‣ 2.56Tbps throughput, DCB-enabled, VLT ‣ 32 x 40GbE, or 8 x 40GbE + 96 x 10GbE in breakout mode ‣ Look for other features ‣ Speeds and feeds no longer important ‣ Dell S6000 operates at room temperature ‣ VTEP capabilities ‣ software quality
  94. 40GbE & 100GbE Physical ‣ 10GbE lanes (4 x 10,

    10 x 10) ‣ Multimode - 40GbE fibre = 4 pairs, 100GbE = 10 pairs ‣ UTP highly unlikely (> 98% certain) ‣ New fibre types used for distance ‣ Use new cabling methods ‣ abandon flood cabling
  95. How Much 10GbE Do You Need ? ‣ A typical

    Broadcom Trident2 switch has 32 x 40GbE ports ‣ Let’s say a blade chassis uses 1 x 40GbE to a pair of switches ‣ Each 40GbE can be 4 x 10GbE (whichever is cheaper) ‣ 1 Blade Chassis = 8 Physical Servers ‣ 1 Physical = 20 Virtual ‣ 32 x 20 x 8 = 5120 Virtual Servers (see the sketch below)
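    The same arithmetic spelled out (every figure is the slide’s assumption):

        switch_40gbe_ports = 32    # Trident2 ToR: one port per blade chassis
        vms_per_physical = 20
        physical_per_chassis = 8

        vms = switch_40gbe_ports * vms_per_physical * physical_per_chassis
        print(vms)                 # 32 x 20 x 8 = 5120 virtual servers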
  96. QoS in the Data Centre ‣ No. ‣ If you

    do, it will break things ‣ Buy bigger switches ‣ It’s cheaper & more reliable ‣ QoS depends heavily on the physical devices ‣ consistent end-to-end QoS is deeply impractical with current systems. No known solutions. ‣ “With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead” - RFC 1925, Rule 3
  97. Please Rate Me ‣ If it’s good they might invite

    me back and I’ll know the effort is worthwhile ‣ I might get champagne with gold flakes in it ‣ If it’s not good, then I will be prevented from inflicting this on anyone else
  98. Agenda - Part 3 Part 0 - Networking Renaissance Part

    1 - Data Centre Fabrics & Designs Part 2 - Switch Internals Part 3 - SDN, Operations and Overlays
  99. SDN ‣ Software Defined Networking ‣ is about new

    ways of operating your network ‣ instead of manual configuration for boring stuff, use software / applications (see the sketch below) ‣ enable Cloud by integrating network with virtualization & storage
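    In that spirit, a hedged sketch of “software instead of manual configuration”: pushing a VLAN over NETCONF with the Python ncclient library. The device address, credentials and XML payload are illustrative placeholders - real payloads depend on the vendor’s data models:

        from ncclient import manager   # pip install ncclient

        # Hypothetical vendor VLAN model -- illustrative only.
        VLAN_CONFIG = """
        <config>
          <vlans xmlns="http://example.com/yang/vlans">
            <vlan><id>20</id><name>servers</name></vlan>
          </vlans>
        </config>
        """

        with manager.connect(host="192.0.2.10", port=830, username="admin",
                             password="secret", hostkey_verify=False) as nc:
            nc.edit_config(target="running", config=VLAN_CONFIG)  # robot, not human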
  100. Can’t predict 10 years of SDN ‣ But I can

    tell you what the next three years will look like ‣ OpenFlow changed and will change everything ‣ drastic change will take time to arrive ‣ OpenFlow probably dominant in hardware in 5 years
  101. Controller Networking [Diagram: SDN controller managing LAN switches - southbound SDN via OpenFlow to the switches, northbound SDN to Quantum/OpenStack, a configuration controller and an orchestration controller; East-West and North-South LAN traffic]
  102. Overlay Networking ‣ Changing the physical network will take

    years ‣ OpenFlow needs new hardware ‣ Competitive pressure means we need new networking today ‣ Overlay Networking
  103. [Diagram: four physical servers, each with a hypervisor vSwitch connecting VM vNICs through NIC drivers and physical NICs to ToR switches]
  104. [Diagram: vSwitches on every server form a virtual network overlaid on the physical tree network]
  105. Overlay Networking •Full Mesh of Tunnels •Using existing assets

    •No impact to physical network - NO CHANGE CONTROL
  106. Overlay Networking •Separate overlay network per customer •Single point

    of control in the “vSwitch” •Lower security obligation on physical network simplifies operation
  107. Overlay Networking •Today, vSwitch is a “robot patch panel”

    •Tomorrow, a network device performing routing, load balancing, filtering/firewall at the network edge
  108. Network performance of x86 •Intel confirms 40Gbps forwarding on a

    single CPU core •Expect to see Fulcrum switch silicon on motherboard in 2015 & CPU die by 2017 (maybe)
  109. Network Agent as Router •Network agent can •filter at the

    edge, •load balance across available paths, •policy route (SRC/DST) into tunnel interfaces
  110. [Diagram: TUNNEL SWITCHING - agents on physical servers switch a logical overlay network of tunnels across the data centre core and ToR switches]
  111. [Diagram: TUNNEL ROUTING - agents route between tunnels across the same physical topology]
  112. [Diagram: network controller coordinating the hypervisor manager, API-enabled network devices and API-enabled software network agents (vSwitches)]
  113. [Diagram: the data centre physical network as underlay - L2 with TRILL / MLAG or L3 with ECMP - carrying the vSwitch overlays]
  114. VTEP ‣ VXLAN Tunnel End Point (VTEP) ‣ Connects

    Overlay to Underlay ‣ at high bandwidth & low latency ‣ VXLAN to VLAN ‣ Not the first option - use software appliances ‣ USE LOTS of software appliances - 1 per VXLAN/VLAN if you can. Still cheaper than a hardware device and cheaper to own (see the sketch below)
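    What a VTEP does on the wire: wrap the original Ethernet frame in a small VXLAN header (a flags byte plus a 24-bit VNI, so ~16M segments versus 4096 VLANs) and carry it in UDP across the underlay. A minimal sketch of the header format later standardised as RFC 7348:

        import struct

        def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
            # 8-byte VXLAN header: flags 0x08 (VNI valid) + 24 reserved
            # bits, then the 24-bit VNI + 8 reserved bits. The result is
            # carried inside UDP (destination port 4789) to the far VTEP.
            header = struct.pack("!II", 0x08 << 24, vni << 8)
            return header + inner_frame

        frame = b"\x00" * 64                       # stand-in Ethernet frame
        print(len(vxlan_encap(frame, vni=5001)))   # 72 = frame + 8 bytes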
  115. [Diagram: overlay tunnels (VXLAN, NVGRE, NVO3 or MPLSoGRE) between vSwitch VTEPs across the fabric; internal firewalls and routers, plus a VTEP switch pair bridging direct-attached and legacy hosts, out to the Internet]
  116. [Diagram: the same design with a network controller managing the VTEPs over OVSDB] •Network controllers make this easy to operate •Easier than MPLS, VRF-Lite, Virtual Device Contexts •Mapping underlay to overlay is painful because orchestration fails
  117. Software Defined Data Centres ‣ Overlay Networking means that

    the network has a new range of services ‣ Full service separation for security and multi-tenancy applications
  118. Today [Diagram: VMDC Design Template v2.1 - Cisco CVD - UCS blade chassis and fabric interconnects behind NX7K core and aggregation contexts, ASA firewall contexts, load balancers, NX5K access and DMZ servers, out to MPLS/WAN and Internet] •Ethernet Core with VDC, MPLS, VRF-Lite or E-VPN •Device contexts •Aieeeeeeeeeeee, change control
  119. [Diagram: Ethernet fabric as “underlay”; agents on the physical servers carry Overlay LAN 1 and Overlay LAN 2 for their VMs]
  120. [Diagram: network controller carves the same physical servers into Virtual Data Center 1 and Virtual Data Center 2]
  121. Software Networking ‣ Virtual Machine as Part of the

    Network ‣ Firewalls, Load Balancers, IDS/IPS ‣ Logging Servers ‣ Overturns How Network Architecture Supports Applications
  122. [Diagram: today’s physical design - routers, firewalls, core and access switches in front of server groups for Application, Shared Resources, Internal Service and DMZ]
  123. [Diagram: the same Application, Shared Resources, Internal Service and DMZ server groups, each now fronted by its own virtual routers and firewalls - shown as two replicated virtual topologies]
  124. Virtual Data Centers ‣ Network separation by overlay and

    controller ‣ Server <-> Network integration at controller
  125. [Diagram: per-customer vDCs behind Internet firewalls - VMs, VDI, load balancer, IDS, AD and logging inside each customer vDC]
  126. [Diagram: customer vDC built from vApp templates in an app catalog - each template bundles a firewall, servers and VDI]
  127. [Diagram: the same catalog-driven customer vDC plus a “Zone 3” vDC with its own router and servers]
  128. [Diagram: Multiple Service Line Model - parallel service lines, each with its own routers, firewalls, load balancers and servers, behind VPN and Internet]
  129. [Diagram: Service Lines with Shared Resources - the same service lines plus a shared “Resources Services vDC” holding Active Directory, patching, antivirus and vCNS]
  130. [Diagram: full virtual data centre template - a Security vDC (VPN, IAM, NAC, Juniper MAG/SA, RSA IAM, SAML, DLP, firewall, proxy) handling remote access/BYOD and inbound/outbound web services with IDS and logging; zone vDCs (Zones 1-3) with vCNS; a shared Resources Services vDC; and inter-org links]
  131. New Networking ‣ Connectivity is not a service or

    a privilege. It’s dumb networking. ‣ Dumb networking must be automated ‣ The future of networking is overlays, SDN, controllers and deep integration ‣ Network services are firewalls, load balancing, automation ‣ Fast automated provisioning instead of manual button pushing
  132. Network Fabrics ‣ Expect to deploy multiple switch fabrics in

    the data centre ‣ Reconsider chassis purchases, build ECMP fabrics ‣ expect to upgrade to 40GbE/100GbE within 3 years ‣ don’t pre-cable new racks, use a cabling system
  133. Please Rate Me ‣ If it’s good they might invite

    me back and I’ll know the effort is worthwhile ‣ I might get champagne with gold flakes in it ‣ If it’s not good, then I will be prevented from inflicting this on anyone else
  134. Thank You - Question Time ‣ Host of Packet Pushers Podcast ‣

    Freelance Network Architect/Engineer ‣ Blog - EtherealMind.com ‣ NetworkComputing.com (http://networkcomputing.com/blogs/author/Greg-Ferro) ‣ Slides: speakerdeck.com/etherealmind