2008: Data Center Networking: Real vs. Ideal - Stanford Clean Slate Seminar

Presentation for the "Stanford Clean Slate Seminar" by Tom Lyon of Nuova Systems; May 13, 2008


Transcript

  1. 03/12/08 Nuova Systems Inc. Page 1 Data Center Networking: Real vs. Ideal

    Tom Lyon
    Stanford Clean Slate Seminar
    May 13, 2008
    [email protected]
  2. Disclaimer

    Opinions expressed herein are my own, and are unlikely to correspond with any official opinions of Nuova Systems or Cisco Systems.
  3. The Data Center Explosion

    - Google
      - Around 1,000,000 servers
      - 20-50 datacenters, up to 100MW each
    - Microsoft:
      - Adding 20,000 servers per month
      - 15x servers and power over next 5 years
    - Electronic Trading
      - Whoever trades the fastest and smartest wins
    - Cloud Computing
    - Oil&Gas
    - Drug Discovery
    - ...
  4. Dimensions of Data Center Networks

    - Functionality
    - Compatibility
    - Reliability
    - Cost
    - Manageability
    - Performance
      - Bandwidth
      - Latency
      - Jitter/Fairness
  5. Functionality: Types of Networks

    - “Normal” Ethernet
    - “Special” Ethernets – console, backup, IPC, ...
    - Storage Network – [Ethernet, Fibre Channel]
    - IPC Network – [Infiniband, Ethernet, exotic]
    - Console Network [Serial, Ethernet, KVM]
    - Power [AC, DC]
    - SMP Interconnect [exotic]
  6. Compatibility

    - Serial ports are direct descendants of the telegraph. Still work with 50 year old teletypes!
    - SCSI is almost 30 years old
    - Ethernet packets backwards compatible to 1980
    - x86 instruction set: 1978
  7. Virtualization

    - Hard to believe in 'Clean Slate'
    - But, Virtualization can “contain” compatibility problems
      - Telnet, VLANs, VPNs, VMware, Virtual disks, ...
    - Virtualization used to share resources
      - 1 mainframe, many systems
    - Virtualization used for consolidation
      - Multiple physical objects to 1
    - Virtualization used for encapsulation
      - Ability to capture, manage, redeploy state
  8. Reliability: No single point of failure

    - 2 general Ethernet connections
    - 2 storage network connections
    - 2 power connections
    - usually 1 console connection
    - Every server on 7 networks!
    - Enter the blade server: shared wiring for servers
    - Net consolidation: Storage, Ethernet & Console
      - iSCSI or FCOE
      - KVM/IP
  9. Cost

    - Driven by volume
      - Why the PC architecture is dominant
    - Inhibited by complexity
      - “Features” are the enemy
  10. Manageability

    - Most systems require human touch, training, configuration, monitoring
      - What if I don't want to set the time on my coffee maker?
    - But systems are just components of a datacenter
    - Reverse Turing Test – management software pretending to be humans
    - We need a standard paradigm for devices to manage other devices
  11. Performance

    - Cost of bandwidth 10,000x less than WAN
    - 1Gb links nearing zero cost
    - This is the year of affordable 10Gbps
    - Higher level protocols are speed-independent
    - PHY layers have largely converged among different standards
    - Most performance problems not related to “raw” performance
  12. Normalized Line Rates

    [Bar chart: normalized line rate in Gbps (axis 0 to 18) for 1G ETH, 4G FC, 8G FC, 10G IB, 10G ETH, 20G IB]
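The bar values of this chart did not survive extraction. As a rough sketch of what "normalized" could mean here, the snippet below assumes the chart compared usable data rate after line-coding overhead. The signaling rates and encodings are commonly published figures for each technology, not values taken from the slide, and the names `LINKS` and `normalized_gbps` are made up for illustration.

```python
# Assumption: "normalized" means data rate after removing line-coding overhead.
# Signaling rates (Gbaud) and encodings are commonly published figures,
# not values read from the slide.
LINKS = {
    "1G ETH": (1.25, 8 / 10),      # 1000BASE-X, 8b/10b
    "4G FC": (4.25, 8 / 10),       # 4GFC, 8b/10b
    "8G FC": (8.5, 8 / 10),        # 8GFC, 8b/10b
    "10G IB": (10.0, 8 / 10),      # InfiniBand 4x SDR, 8b/10b
    "10G ETH": (10.3125, 64 / 66), # 10GBASE-R, 64b/66b
    "20G IB": (20.0, 8 / 10),      # InfiniBand 4x DDR, 8b/10b
}


def normalized_gbps(name):
    """Return the usable data rate in Gbps for a named link type."""
    signaling_rate, coding_efficiency = LINKS[name]
    return signaling_rate * coding_efficiency


if __name__ == "__main__":
    for name in LINKS:
        print(f"{name:8s} {normalized_gbps(name):5.1f} Gbps")
```

Under these assumptions a "10G" InfiniBand link carries less data than a "10G" Ethernet link, which may be the point the chart was making.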
  13. Performance vs Application

    - Storage
      - Large block transfers
      - Latency sensitive
      - “Hardware” endpoints
    - IPC
      - Extremely latency sensitive
      - Wide range of packet sizes
      - “Software” endpoints
    - Generic Ethernet/TCP/IP
      - No huge packets
      - Not as latency sensitive
      - Default for all software
  14. Storage Networks

    - Storage Access slowly evolving from hardware bus to open network
    - NAS vs SAN
    - NFS & CIFS vs SCSI's many flavors
    - Ethernet vs Fibre Channel vs Infiniband
  15. Storage Networks: Ethernet vs EtherNot

    - iSCSI, NFS, CIFS
      - TCP & Ethernet
      - Congestion Loss
      - Stream Oriented
      - Software Transport
      - High CPU overhead
    - SCSI-FCP, SCSI-SRP
      - F.C. and Infiniband
      - Credit Flow Control
      - Block Oriented
      - Hardware Transport
      - Low CPU overhead
  16. Storage Networks: Convergence

    - Data Center Ethernet
    - Choice of congestion classes
      - Lossy vs lossless
    - Choice of storage transports
      - TCP or F.C. (FCOE)
    - Choice of hardware or software transport
      - TOE w TCP, software FCOE, ...
  17. Turning over the rocks...

    The Ugly Reality & Some Clean Alternatives
  18. Topology

    - Data Center networks are tree structured
      - Only topology that Ethernet supports
      - parallel trees for redundancy
    - Bandwidth of core nodes limits bandwidth of entire network
    - Need to evolve to support fat-tree, mesh, arbitrary topologies: “multi-path”
    - Redundancy / Incompatibility between different network layers
      - L2 / Ethernet
      - L3 / IP
      - Storage multi-pathing
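To make the tree versus multi-path contrast on this slide concrete, here is a minimal sketch that counts equal-cost shortest paths between two edge switches, first with every link usable and then with only the links of a spanning tree. The two-tier topology, the switch names, and all function names are hypothetical and chosen only for illustration.

```python
from collections import deque


def build(links):
    """Build an undirected adjacency list from (a, b) link pairs."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def shortest_path_count(adj, src, dst):
    """Count distinct shortest paths from src to dst using BFS."""
    dist = {src: 0}
    count = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                count[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                count[nxt] += count[node]
    return count.get(dst, 0)


def spanning_tree_links(adj, root):
    """Keep only the links of a BFS spanning tree rooted at root."""
    seen = {root}
    queue = deque([root])
    tree = []
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                tree.append((node, nxt))
                queue.append(nxt)
    return tree


def two_tier(n_core, n_edge):
    """Hypothetical topology: every edge switch links to every core switch."""
    return [(f"core{c}", f"edge{e}") for c in range(n_core) for e in range(n_edge)]


if __name__ == "__main__":
    full = build(two_tier(4, 4))
    tree = build(spanning_tree_links(full, "core0"))
    print("multi-path:", shortest_path_count(full, "edge0", "edge1"))
    print("spanning tree:", shortest_path_count(tree, "edge0", "edge1"))
```

In this toy topology the full graph offers four equal-cost paths between two edge switches while the spanning tree leaves one, all through the root core switch, which is one way to see why the core limits the bandwidth of a tree-structured network.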
  19. Control Planes & Politics

    - Servers are redundantly connected, yet don't participate in network topology determination
      - Onto which interface should I send a packet?
    - Multi-path access to storage is very important
      - But storage doesn't participate in network topology either
    - What if there was just one control plane?
      - Unify Ethernet, IP, SCSI addressing
      - Arbitrary topology – simple graph theory
      - Huge boost in possible bandwidth
      - Huge reduction in congestion & latency
  20. Congestion

    - Protocols have different congestion management approaches
      - Traditional Ethernet / TCP – drop & retransmit
      - Fibre Channel – never drop, spread congestion
      - Infiniband – no drop, virtual channels
    - Ideal: Application chooses behavior
    - How to achieve fairness?
    - Ideal: Integrate congestion & topology
      - route around congested nodes
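The difference between "drop & retransmit" and "never drop" on this slide can be sketched with a toy simulation of one sender feeding one receiver buffer faster than it drains. This is an illustrative model only, not how any real Ethernet or Fibre Channel implementation works; the function name, the tick-based timing, and every parameter value are assumptions.

```python
def simulate(mode, ticks=100, offered_per_tick=2, drain_per_tick=1, buffer_slots=8):
    """Toy model of one congested hop.

    mode="drop":   sender transmits blindly; a full buffer discards the packet.
    mode="credit": sender transmits only while it holds a credit; credits
                   return as the receiver drains its buffer.
    """
    queue = 0                 # packets waiting in the receiver buffer
    credits = buffer_slots    # only consulted in credit mode
    delivered = dropped = held_at_sender = 0
    for _ in range(ticks):
        for _ in range(offered_per_tick):
            if mode == "drop":
                if queue < buffer_slots:
                    queue += 1
                else:
                    dropped += 1          # lost; would need a retransmit
            else:
                if credits > 0:
                    credits -= 1
                    queue += 1
                else:
                    held_at_sender += 1   # backlog moves upstream instead
        drained = min(queue, drain_per_tick)
        queue -= drained
        delivered += drained
        if mode == "credit":
            credits += drained
    return {"delivered": delivered, "dropped": dropped, "held_at_sender": held_at_sender}


if __name__ == "__main__":
    print("drop  :", simulate("drop"))
    print("credit:", simulate("credit"))
```

Both modes deliver the same number of packets in this model. The lossy hop discards the excess, while the credit-based hop pushes the backlog back to the sender, which is the "spread congestion" behavior the slide attributes to Fibre Channel.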
  21. Ethernet Congestion Directions

    - Ethernet today supports “pause”
      - All or nothing – congestion spreading
    - Soon: “Per Priority Pause”
      - 8 classes of traffic – independent pause
    - Congestion signaling – IEEE P802.1Qau / QCN
      - backwards congestion notification to source
      - will take a long time to diffuse to many products
    - Adapters need better queueing
      - Single deep queue leads to massive unfairness
      - Multi-queue / flow awareness needed
      - Better integration between hw/sw queue management
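A small sketch of the contrast between all-or-nothing pause and per-priority pause described on this slide. The threshold, the queue depths, and the function name are hypothetical; real switches and adapters signal pause with control frames rather than a function call.

```python
NUM_PRIORITIES = 8  # the slide's "8 classes of traffic"


def paused_classes(queue_depths, threshold, per_priority):
    """Return the set of traffic classes the receiver asks the sender to stop.

    queue_depths holds one receive-queue depth per priority (hypothetical units).
    """
    congested = {p for p, depth in enumerate(queue_depths) if depth >= threshold}
    if per_priority:
        return congested
    # Classic pause is all or nothing: any congested class stops every class.
    return set(range(NUM_PRIORITIES)) if congested else set()


if __name__ == "__main__":
    depths = [0, 2, 0, 40, 1, 0, 0, 3]  # only priority 3 is backed up
    print("link-level pause  :", sorted(paused_classes(depths, 32, per_priority=False)))
    print("per-priority pause:", sorted(paused_classes(depths, 32, per_priority=True)))
```

With one congested class, the link-level variant stops all eight classes while the per-priority variant stops only class 3, leaving the other seven flowing.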
  22. TCP Directions

    - TCP today supports ECN
      - But it's turned off because there are too many broken routers in the world
      - Layer 2 switches don't mark, need L3 awareness
    - Extend TCP to select ECN or not based on routes
      - Enable for “local” or “known good” subnets
    - TCP timeouts are ludicrously high
      - Need to decrease by 1000x for the datacenter
      - OS impact
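One way to picture "select ECN or not based on routes" is a per-destination policy check against a list of trusted subnets. The sketch below uses Python's standard `ipaddress` module. The subnet list, the function names, and the 200 ms starting timeout are all assumptions for illustration, and nothing here changes real socket or kernel settings.

```python
import ipaddress

# Hypothetical list of "local" or "known good" subnets where ECN marking is trusted.
KNOWN_GOOD_SUBNETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.50.0/24"),
]

# Assumed WAN-style minimum retransmission timeout, in seconds (illustrative value).
WAN_MIN_RTO_S = 0.2


def use_ecn(destination):
    """Decide whether to negotiate ECN for a connection to destination."""
    addr = ipaddress.ip_address(destination)
    return any(addr in net for net in KNOWN_GOOD_SUBNETS)


def datacenter_min_rto(wan_rto_s=WAN_MIN_RTO_S, reduction=1000):
    """Scale a timeout down by the 1000x factor the slide calls for."""
    return wan_rto_s / reduction


if __name__ == "__main__":
    for dst in ("10.1.2.3", "192.168.50.7", "203.0.113.9"):
        print(dst, "-> ECN" if use_ecn(dst) else "-> no ECN")
    print("datacenter min RTO:", datacenter_min_rto(), "s")
```

A real implementation would live in the operating system's TCP stack, which is presumably the "OS impact" the slide mentions.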
  23. Layer “Violations”

    - Layer 2 - “transparent” switching
    - Virtual machines
    - Virtual networks
    - IB/FC transport
    - Proxies, firewalls, appliances
  24. World Views - Topology

    - Old world: The network (world) is flat
      - Keep going and you'll eventually get to the edge
    - New world: The network (world) is round
      - Wherever the data goes, it's still in the network
      - Servers just talk to other servers
      - The data goes in, rattles around, and comes back out again
    - Storage: just a network with a time dimension
    - Spacetime Data Fabric?:
      - Data constantly in motion/transformation
      - distance == time == latency
  25. World Views - Manageability

    - Geocentric
      - The CPU/OS is the center of the universe
    - Heliocentric
      - The DataCenter as a holistic unit
    - Galactic
      - DataCenters all over the world

    [Figure label: You are here]
  26. Monitoring

    - How much time, space, bandwidth & energy does my application use?
      - We can almost answer this within a single server
      - But how can we answer this for distributed applications?
    - Monitoring needs transparency – the opposite of virtualization
  27. “Mobile” Processing

    - 99.9% of networking moves the data to the processor
    - Why not move the processing to the data?
      - “function shipping”
      - Hypervisor on disk drive?
      - Map/Reduce model
    - What if the data is distributed?
      - Move the processing into the network – EC2
      - Steiner points?
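The "function shipping" idea on this slide can be sketched by comparing how many values cross the network when data is pulled to a central processor versus when a small function runs next to each partition and only partial results travel. The partition contents, node names, and function names below are hypothetical, and this is a toy map/reduce-style model rather than any real framework's API.

```python
from functools import reduce

# Hypothetical partitions of one data set, each living on a different storage node.
PARTITIONS = {
    "node-a": [3, 1, 4, 1, 5],
    "node-b": [9, 2, 6],
    "node-c": [5, 3, 5, 8, 9, 7],
}


def move_data_to_processor(partitions):
    """Pull every record across the network, then compute centrally.

    Returns (result, number of values transferred).
    """
    records = [x for part in partitions.values() for x in part]
    return sum(records), len(records)


def move_processing_to_data(partitions, map_fn, reduce_fn):
    """Run map_fn where each partition lives; only partial results travel.

    Returns (result, number of values transferred).
    """
    partials = [map_fn(part) for part in partitions.values()]
    return reduce(reduce_fn, partials), len(partials)


if __name__ == "__main__":
    print("data to processor :", move_data_to_processor(PARTITIONS))
    print("processing to data:", move_processing_to_data(PARTITIONS, sum, lambda a, b: a + b))
```

Both approaches compute the same total, but the second moves one value per node instead of one value per record, so the saving grows with partition size.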
  28. Summary

    - Data Centers are the new heavy industry
    - Networking is the raison d'être for data centers
    - From inter-continental to intra-chip, networking issues are the major problems to be solved