
2008: TCP Issues in the Data Center

Presentation for Stanford seminar "The Future of TCP: Train-Wreck or Evolution?"
Raises the issue of buffer-bloat in servers (before the "buffer-bloat" term was used)
by Tom Lyon of Nuova Systems

Tom Lyon

April 01, 2008

Transcript

  1. 03/12/08 Nuova Systems Inc. Page 1
     TCP Issues in the Data Center
     Tom Lyon
     The Future of TCP: Train-wreck or Evolution?
     Stanford University, 2008-04-01
  2. TCP: Not Just for “The Internet”
      Essentially all network software relies on TCP/IP semantics
      “The network is the data center”
      In the data center, gigabits are “free”
       10^5 times cheaper than WAN bandwidth
       Terabit-class switches
       10Gb endpoints
      TCP needs:
       High bandwidth
       Low latency
       Predictability & fairness
  3. Storage Networks
      Storage access slowly evolving from hardware bus to open network
      NAS vs SAN
      NFS & CIFS vs SCSI's many flavors
      Ethernet vs Fibre Channel vs Infiniband
  4. Storage Networks: Ethernet vs EtherNot
     Ethernet:
      iSCSI, NFS, CIFS
      TCP & Ethernet
      Congestion loss
      Stream oriented
      Software transport
      High CPU overhead
     EtherNot:
      SCSI-FCP, SCSI-SRP
      F.C. and Infiniband
      Credit flow control
      Block oriented
      Hardware transport
      Low CPU overhead
  5. Storage Networks: Convergence
      Data Center Ethernet
       Choice of congestion classes: lossy vs lossless
       Choice of storage transports: TCP or F.C. (FCOE)
       Choice of hardware or software transport: TOE w/ TCP, software FCOE, ...
  6. TCP: Time Out of Joint
      TCP was standardized in a much slower world
       ½-second minimum retransmit timeout
       20-microsecond RTT achievable today!
      Fast-retransmit algorithm only works for streams – more data being sent
      Most data center traffic is request/response – often single packets
      Packet loss hurts because TCP won't (not can't) respond fast enough
  7. Congestion in the Data Center
      Gigantic, non-blocking switches are the norm
       Hundreds of ports, terabits of throughput
      Buffers and buffer management are the most costly part of the switch
      Link-based flow control (“pause”) allows a switch to push congestion back to its upstream neighbors
      If the upstream neighbor is the source server, then the congestion “goes away”
       Or does it?
  8. Servers and Gigabits
      Any current x86 server can easily saturate a 1Gb Ethernet link with TCP traffic
      Many current servers can saturate 10Gb Ethernet links!
      Lossless classes cause the pipe to fill faster
      What happens when the first hop – the server's own Ethernet link – is the point of congestion?
  9. TCP and the Fat Pipe
      If TCP doesn't “see” congestion (loss or ECN), it will continue to increase its window to try to get more bandwidth in the network
      Lossless network => high throughput
      But... a single streaming connection will consume all available buffers
      Newer connections will have a hard time getting buffers => extreme unfairness
      The server needs good congestion management
  10. Servers, Ethernet, and Queues
      “Everyone” knows that big, simple FIFO queues are a bad idea in routers
      What do servers have today? Big, simple FIFO queues!
      The queues are owned and maintained by the Ethernet NIC hardware
      Horrible unfairness can be demonstrated with only 2 TCP connections
      Many servers deal with 1000s of TCP connections
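The two-connection unfairness claimed above can be illustrated with a toy FIFO model (connection names and queue depth are my assumptions): one "hog" connection with a deep backlog and one new connection sharing a single NIC transmit queue.

```python
from collections import deque

# Toy illustration of FIFO unfairness between two connections sharing
# one big, simple NIC transmit queue.
def first_service_order(queue):
    """For each connection, count how many packets are transmitted
    before its own first packet leaves the FIFO."""
    first = {}
    for position, conn in enumerate(queue):
        first.setdefault(conn, position)
    return first

# A "hog" connection has already queued 1,000 packets when a new
# connection enqueues its single request packet.
fifo = deque(["hog"] * 1000 + ["new"])
order = first_service_order(fifo)
print(order["hog"], order["new"])  # → 0 1000
```

With strict FIFO service the newcomer's lone packet waits behind the hog's entire backlog; with thousands of connections the effect compounds.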
  11. Connection Size vs Throughput – idle 1G link
      [Chart: throughput (0 to 1,000,000,000) vs connection size (10 to 1,000,000,000, log scale)]
  12. Connection Size vs Throughput – busy 1G link – competing with a single “hog” connection
      [Chart: ideal vs actual throughput (0 to 500,000,000) vs connection size (10 to 1,000,000,000, log scale); annotated “UNFAIR!”]
  13. TCP: Rock or Hard Place?
      With lossy Ethernet, TCP bandwidth can collapse due to stupidly high timeouts
       => Unpredictable performance
      With lossless Ethernet, TCP fairness can collapse due to stupid queuing policies
       => Unpredictable performance
      Data center managers hate unpredictability
      Ethernet standards have evolved; TCP needs to catch up
      TCP and Ethernet implementations must improve
  14. Why does this matter?
      The Earth is being paved by data centers
       Google, Microsoft, NSA, Walmart, Facebook, ...
      Improving TCP means more overall efficiency in the data center
      Heat, CO₂, and radioactive waste are becoming measurable by-products of TCP inefficiency
      Fix TCP => Save the World!