
How shit works: TCP/IP

A talk given at GeeCON 2019 in Kraków, Poland.

From the top of the stack, networks are remarkably simple to use. Just throw in a couple machines, some cables, a network switch and HTTP server, stir gently and season to taste, right?

Your go-to high level protocol (HTTP anyone?) builds on a hierarchy of seemingly simple abstractions that hide immense complexity, and these abstractions inevitably leak. In the fourth part of this series we'll take a closer look at everyone's favorite protocol TCP/IP, enumerate the challenges it faces and see how it aims to solve them.

With luck, you'll come away understanding enough so that next time you run into Connection Reset by Peer issues or SO_LINGER you'll have some idea what you're up against.

Tomer Gabel

May 17, 2019

Transcript

  1. Before We Start… • I love my mom – Dearly

    – No, seriously! • This is a slight exaggeration • Though not entirely Image: Mom Anchor by Mez Love on Flickr (CC BY-NC-ND 2.0)
  2. In Theory… • Point-to-point • One continuous session – Explicit

    preamble – Stateful communication – Explicit termination • In-order delivery • Reliable delivery
  3. … In Practice • Few delivery guarantees – Out of

    order – Unreliable • Fuzzy session boundaries – No explicit preamble – No explicit termination • Receiver is totally overwhelmed
  4. What Worked? • We could address each other. – I

    text my mom on her cellphone – Switchboard maps number to IMSI – Packets are routed to my mom’s device • … but we don’t think about that. Image: Vintage Envelope by Heather on Flickr (CC BY-NC-ND 2.0)
  5. What Worked? • We could communicate. • Routing is complex!

    – Multiple hops – Heterogeneous networks – Each packet can take a different route • … but we don’t think about that. Image: Arpanet map 1973 (public domain), source: WikiMedia Commons
  6. What Worked? • We could transmit. • Radio is complex!

    – Packets arrive at base station – Transceiver converts packets to analog signal – Antennae transmit signal as electromagnetic radiation • … but we don’t think about that. Image: Modern cell and antenna with flat parabola on blue sky by First Responder Network on Flickr (CC BY-NC-ND 2.0)
  7. So... What Went Wrong? • My mom and I failed

    to collaborate – My mom sent faster than I was ready to receive – My mom did not retransmit data that I had missed – There was no way to infer the correct order
  8. Slight Digression • Have you noticed? – Texting is reliable

    – Texting is ordered – But that didn’t help • Reliability is the responsibility of the application • This is known as the end-to-end principle† † See “End-to-end principle” on Wikipedia Image: Neither Snow nor Rain by Kathleen Conklin (CC BY-2.0)
  9. Full Disclosure Bullshit ahead! • I’m not an expert •

    Explanations will be: – Simplified – Inaccurate – Wrong :-) • We’ll barely scratch the surface Image: Public Domain
  10. Product Management 101 Assumptions • Existing infrastructure – Physical transmission

    – Addressing – Routing • In other words, we build on IP
  11. Product Management 101 Assumptions • Existing infrastructure – Physical transmission

    – Addressing – Routing • In other words, we build on IP Requirements • Delivery guarantees – No drops – No duplicates • In-order delivery • Sender cannot overwhelm receiver
  12. Proposed Solution • Super simple: – Send one packet at

    a time – Wait for ack – Rinse and repeat • Neatly solves all our problems! • … well, almost A B “Hi!” “How are you?” “Hey!”
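
The "send one packet, wait for ack" proposal above can be sketched as a small simulation. This is a minimal illustration, not how any real stack is written: the function name `stop_and_wait`, the `loss_rate` parameter and the random loss model are all assumptions made for the example.

```python
import random

def stop_and_wait(messages, loss_rate=0.3, seed=1):
    """Deliver messages one at a time over a lossy channel.

    Returns (delivered, transmissions). The loss model is hypothetical:
    each packet and each ack is dropped independently with `loss_rate`.
    """
    rng = random.Random(seed)
    delivered = []
    transmissions = 0
    expected = 0                      # receiver: next sequence number it wants
    for seq, payload in enumerate(messages):
        acked = False
        while not acked:              # "rinse and repeat" until this packet is acked
            transmissions += 1
            if rng.random() < loss_rate:
                continue              # packet lost: sender times out and resends
            if seq == expected:       # receiver accepts only the next packet in order
                delivered.append(payload)
                expected += 1
            if rng.random() < loss_rate:
                continue              # ack lost: sender resends, receiver ignores the duplicate
            acked = True
    return delivered, transmissions

delivered, sent = stop_and_wait(["Hi!", "Hey!", "How are you?"])
print(delivered, "in", sent, "transmissions")
```

Every message arrives exactly once and in order, at the cost of never having more than one packet in flight.
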
  13. Naïvety Doesn’t Become You • Sure, this works • But

    it’s very inefficient • We want this…
  14. Naïvety Doesn’t Become You • Sure, this works • But

    it’s very inefficient • We want this…
  15. Naïvety Doesn’t Become You • Sure, this works • But

    it’s very inefficient • We want this… • … but actually get this – Serial communication – Slow, oh so slow
  16. Naïvety Doesn’t Become You • Sure, this works • But

    it’s very inefficient • We want this… • … but actually get this – Serial communication – Slow, oh so slow
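
A back-of-the-envelope calculation shows why the serial approach is slow: with one packet per roundtrip, throughput is capped at packet size divided by RTT. The figures below (a 1460-byte segment, a 100 ms roundtrip, 64 KB in flight) are assumed values for illustration, not numbers from the talk.

```python
def max_throughput(bytes_in_flight, rtt_ms):
    """Upper bound in bytes per second when at most `bytes_in_flight` are unacknowledged."""
    return bytes_in_flight * 1000 / rtt_ms

# Assumed figures: one 1460-byte segment per roundtrip vs. 64 KB in flight.
one_at_a_time = max_throughput(1460, rtt_ms=100)
many_in_flight = max_throughput(64 * 1024, rtt_ms=100)

print(f"serial:   {one_at_a_time:,.0f} bytes/s")
print(f"pipelined: {many_in_flight:,.0f} bytes/s")
```

Under these assumptions the serial sender cannot exceed roughly 14.6 KB/s regardless of link speed, which is the motivation for keeping more data in flight.
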
  17. Product Management 102 Assumptions • Existing infrastructure – Physical transmission

    – Addressing – Routing • In other words, we build on IP Requirements • Delivery guarantees – No drops – No duplicates • In-order delivery • Sender cannot overwhelm receiver • Maximize throughput
  18. Product Management 102 Assumptions • Existing infrastructure – Physical transmission

    – Addressing – Routing • In other words, we build on IP Requirements • Delivery guarantees – No drops – No duplicates • In-order delivery • Sender cannot overwhelm receiver • Maximize throughput
  19. Product Management 102 Assumptions • Existing infrastructure – Physical transmission

    – Addressing – Routing • In other words, we build on IP Requirements • Delivery guarantees – No drops – No duplicates • In-order delivery • Flow control
  20. What You All Know • TCP is a transport-layer

    protocol – Point-to-point – Connection-oriented – Reliable • Builds on top of IP Image: TCP packet layout with bit scale by Quliyevferman (CC BY-SA 4.0), source: WikiMedia Commons
  21. “Point-to-Point” Ethernet (PHY) IP TCP Client Ethernet (PHY) IP TCP

    Server Bidirectional byte stream Segments Datagrams Frames
  22. ”Connection-Oriented” • The network is asynchronous – Just packets running

    around – Fully stateless • TCP provides a connection abstraction – Stateful – Explicit handshake – Explicit termination A B syn syn/ ack ack Connection established Roundtrip Time (RTT)
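
From application code the handshake is invisible: the operating system exchanges syn, syn/ack and ack inside `connect()`. A minimal sketch over the loopback interface, where the helper name `loopback_connection` is made up for the example:

```python
import socket

def loopback_connection():
    """Open a TCP connection to ourselves and return (client, server, peer_address)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())   # the kernel performs syn, syn/ack, ack here

    server, peer = listener.accept()         # hand over the established connection
    listener.close()
    return client, server, peer

client, server, peer = loopback_connection()
client.sendall(b"Hi!")                       # bytes flow only after the handshake
print(server.recv(16), "from", peer)
client.close()
server.close()
```
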
  23. “Reliable” • TCP handles: – Ordering – Retransmission • Seems

    simple enough • It’s not. A B drop delay duplicate reorder A B A B A B time
  24. TCP Flow Control • Two seemingly conflicting goals: – Maximize

    throughput – Do not overwhelm receiver • A collaborative protocol Image: Pixabay (via Pexels, free for use)
  25. Sliding Windows 0 1 2 3 4 5 6 7

    8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 MSS = 1
  26. Sliding Windows 0 1 2 3 4 5 6 7

    8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 seq=0
  27. Sliding Windows 0 1 2 3 4 5 6 7

    8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 seq=0 ‘o’ seq=4
  28. Sliding Windows 0 1 2 3 4 5 6 7

    8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 seq=0 ‘o’ seq=4 seq=1 ‘,’ seq=5
  29. Sliding Windows 0 1 2 3 4 5 6 7

    8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 seq=0 ‘o’ seq=4 seq=1 ‘,’ seq=5 seq=2 ‘ ’ seq=6
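
The sliding window walkthrough on these slides can be reproduced with a short trace generator. This is a simplified sketch that assumes no loss and one ack per segment; it follows the slides' convention that an ack carries the sequence number of the segment it acknowledges, whereas real TCP acks carry the next expected byte. The function name is hypothetical.

```python
from collections import deque

def sliding_window_send(data, window=4):
    """Trace a lossless transfer with one byte per segment (MSS = 1)."""
    trace = []
    in_flight = deque()               # sequence numbers sent but not yet acked
    next_seq = 0
    while next_seq < len(data) or in_flight:
        # Fill the window: at most `window` unacknowledged segments at once.
        while next_seq < len(data) and len(in_flight) < window:
            trace.append(("send", next_seq, data[next_seq:next_seq + 1]))
            in_flight.append(next_seq)
            next_seq += 1
        # The oldest segment is acked, which slides the window forward by one.
        acked = in_flight.popleft()
        trace.append(("ack", acked, b""))
    return trace

for kind, seq, payload in sliding_window_send(b"Hello, world!\r\n")[:9]:
    print(kind, seq, payload)
```

The first events are four sends (seq 0 to 3), then ack 0, send 4, ack 1, send 5, mirroring the slide sequence.
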
  30. Packet Loss 0 1 2 3 4 5 6 7

    8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0
  31. Retransmission 0 1 2 3 4 5 6 7 8

    9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0
  32. Retransmission 0 1 2 3 4 5 6 7 8

    9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 ‘e’ seq=1 Retransmission Timeout (RTO) ‘e’ seq=1
  33. Retransmission 0 1 2 3 4 5 6 7 8

    9 10 11 12 13 14 H e l l o , w o r l d ! \r \n MSS = 1 ‘e’ seq=1 seq=4 Sender Receiver Receive window size = 4 ‘H’ seq=0 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 Retransmission Timeout (RTO) ‘e’ seq=1
  34. Retransmission 0 1 2 3 4 5 6 7 8

    9 10 11 12 13 14 H e l l o , w o r l d ! \r \n MSS = 1 ‘e’ seq=1 seq=4 Sender Receiver Receive window size = 4 ‘H’ seq=0 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 ‘e’ seq=1 ‘,’ seq=5 ‘ ’ seq=6
  35. TCP Fast Retransmission 0 1 2 3 4 5 6

    7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 seq=0 Retransmission Timeout (RTO)
  36. TCP Fast Retransmission 0 1 2 3 4 5 6

    7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 seq=0 Retransmission Timeout (RTO) • 3 duplicate acks† • Sequence number is last known delivered † ”Congestion Avoidance and Control“, Jacobson et al, 1988
  37. TCP Fast Retransmission 0 1 2 3 4 5 6

    7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 seq=0 Retransmission Timeout (RTO) Opportunistically send last known + 1 ‘e’ seq=1
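
The duplicate-ack rule can be sketched in two small functions: a receiver that always acks the last in-order sequence number it has (the slides' convention), and a sender that retransmits "last known + 1" on the third duplicate. The function names and the list-based interface are assumptions for illustration.

```python
def receiver_acks(arrivals):
    """Return the cumulative ack sent for each arriving sequence number."""
    received = set()
    last_in_order = -1                # nothing delivered yet
    acks = []
    for seq in arrivals:
        received.add(seq)
        while last_in_order + 1 in received:
            last_in_order += 1        # advance past every contiguous segment
        acks.append(last_in_order)
    return acks

def fast_retransmit(acks, threshold=3):
    """Return the sequence numbers a sender would retransmit early."""
    retransmitted = []
    last_ack, duplicates = None, 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                retransmitted.append(ack + 1)   # opportunistically send last known + 1
        else:
            last_ack, duplicates = ack, 0
    return retransmitted

acks = receiver_acks([0, 2, 3, 4])    # 'e' (seq=1) was lost in transit
print(acks, "->", fast_retransmit(acks))
```

Here the sender resends seq=1 as soon as the third duplicate of ack 0 arrives, without waiting for the retransmission timeout.
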
  38. TCP Dynamics • Multiple variables… – RTT – MSS –

    Window size – RTO Roundtrip Time • Initial value on connection • Tracked on each ack • High variability
  39. TCP Dynamics • Multiple variables… – RTT – MSS –

    Window size – RTO Maximum Segment Size • Largest segment allowed by TCP, in octets (bytes) • Related, but not identical, to MTU • Negotiated at connection time
  40. TCP Dynamics • Multiple variables… – RTT – MSS –

    Window size – RTO TCP Window Size • Determined by the receiver • Advertised with each packet • Size up to 2^16 = 64KB • With TCP Window Scaling, up to 2^14 x 2^16 = 1GB
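
The window limits quoted on this slide are powers of two: a 16-bit window field, optionally shifted left by up to 14 bits with window scaling. A quick check of the arithmetic, with constant and function names chosen for the example:

```python
WINDOW_FIELD_BITS = 16     # width of the window field in the TCP header
MAX_SCALE_SHIFT = 14       # largest shift allowed by the window scale option

def max_window_bytes(scale_shift=0):
    """Upper bound on the advertised window, following the slide's arithmetic.

    Strictly, the largest value a 16-bit field can hold is 2**16 - 1;
    the slide rounds this up to 2**16.
    """
    if not 0 <= scale_shift <= MAX_SCALE_SHIFT:
        raise ValueError("scale shift must be between 0 and 14")
    return 2**WINDOW_FIELD_BITS * 2**scale_shift

print(max_window_bytes() // 1024, "KB")         # 64 KB
print(max_window_bytes(14) // 1024**3, "GB")    # 1 GB
```
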
  41. TCP Dynamics • Multiple variables… – RTT – MSS –

    Window size – RTO Retransmission Timeout • Initially set to 1 second (originally 3 seconds†) • Dynamically adjusted based on RTT † RFC 6298 section 2.1
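
How the timeout tracks RTT can be sketched from the formulas in RFC 6298: a smoothed RTT and an RTT variance, combined as RTO = SRTT + max(G, 4 × RTTVAR) and clamped to a 1 second minimum. The class below is an illustrative sketch; its name, the default clock granularity and the omission of timer backoff are assumptions for the example.

```python
class RtoEstimator:
    """Retransmission timeout estimate following the RFC 6298 formulas (simplified)."""

    ALPHA = 1 / 8      # gain for the smoothed RTT
    BETA = 1 / 4       # gain for the RTT variance
    K = 4              # variance multiplier

    def __init__(self, granularity=0.001, min_rto=1.0):
        self.granularity = granularity   # clock granularity G, assumed to be 1 ms
        self.min_rto = min_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                   # initial value before any RTT sample

    def on_rtt_sample(self, rtt):
        """Update the estimate with a new RTT measurement, in seconds."""
        if self.srtt is None:            # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:                            # variance is updated before the smoothed RTT
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.min_rto,
                       self.srtt + max(self.granularity, self.K * self.rttvar))
        return self.rto

estimator = RtoEstimator()
for sample in (0.5, 0.5, 0.5):
    print(estimator.on_rtt_sample(sample))
```

With a steady 500 ms roundtrip the variance term decays with each sample, so the timeout shrinks toward the floor.
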
  42. TCP Dynamics • Multiple variables… – RTT – MSS –

    Window size – RTO • There are more – Out of scope – Don’t worry :-)
  43. SO CAN I TALK TO MY MOM, YET? You’ve made

    it this far. Good for you!
  44. Closing the Connection • A TCP connection is a conversation

    – A fin is effectively a promise: “I’m done sending” – The corresponding ack means “I’ve received everything” • Both sides must signal fin A B fin ack ack fin B may send data
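
The two-sided shutdown is visible through the socket API: `shutdown(SHUT_WR)` sends a fin while leaving the receive direction open, and the peer observes it as an empty read. A minimal loopback sketch, where the function name and the messages are made up for the example (requires Python 3.8 or later):

```python
import socket

def half_close_demo():
    """A finishes sending, B still replies, then both sides are closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.connect(listener.getsockname())
    b, _ = listener.accept()
    listener.close()

    a.sendall(b"Enjoying GeeCON?")
    a.shutdown(socket.SHUT_WR)        # A's fin: "I'm done sending"

    request = b""
    while chunk := b.recv(1024):      # an empty read means A's fin has arrived
        request += chunk

    b.sendall(b"Yep!")                # B may still send data after A's fin
    b.close()                         # B's fin

    reply = b""
    while chunk := a.recv(1024):      # read until B's fin
        reply += chunk
    a.close()
    return request, reply

print(half_close_demo())
```
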
  45. Closing the Connection • Why is this necessary? • Because

    it’s polite • If one side terminates the connection… • … and the other side sends more data… – “Connection reset by peer” – Like slamming the phone down! A B Yep! Enjoying GeeCON? A drops connection rst
  46. Now It Gets Interesting • Remember, the network is asynchronous

    • Packets may be delayed, duplicated or both • What happens when an “old” packet shows up again? A B fin ack New connection fin ack ”Adios” ”Adios” Old connection
  47. TCP Old Duplicates • Not very probable – Same host

    and port on both ends (”socketpair”) – Same sequence number • Low probability * large scale * long time = inevitability A B fin ack New connection fin ack ”Adios” ”Adios” Old connection
  48. TIME_WAIT • A closed client connection remains in TIME_WAIT state

    • Old duplicates are dropped without rst • On modern TCP stacks, this lasts 1 minute
    druuge:~ tomer.gabel$ dig +short google.com
    172.217.16.14
    druuge:~ tomer.gabel$ curl -s http://172.217.16.14 >/dev/null
    druuge:~ tomer.gabel$ netstat -n | grep 172.217.16.14
    tcp4 0 0 10.0.1.136.65499 172.217.16.14.80 TIME_WAIT
  49. Last But Not Least • On socket.close(), what happens to…

    – Queued (unsent) data? – Unacknowledged segments? • Depends on SO_LINGER – Offers a “grace period” – On timeout, drops connection A B rst ack fin SO_LINGER
  50. Last But Not Least • Setting SO_LINGER=0… – Circumvents normal

    TCP shutdown – Immediately aborts the connection • Sometimes recommended† as a way to avoid TIME_WAIT – Not a good idea! – Unless you know exactly why A B rst † Examples on StackOverflow, ServerFault,
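
The abortive close can be observed on a loopback connection. The sketch below enables SO_LINGER with a zero timeout and then closes the socket; on Linux and macOS the peer's next read is expected to fail with "Connection reset by peer". The `struct linger` layout of two C ints is an assumption that holds on those platforms (Windows uses a different layout), and the function name is hypothetical.

```python
import socket
import struct

def abort_with_linger_zero():
    """Close one end with SO_LINGER=0 and report what the other end sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.connect(listener.getsockname())
    b, _ = listener.accept()
    listener.close()

    # struct linger { int l_onoff; int l_linger; }: enabled, zero-second timeout.
    a.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    a.close()                         # aborts the connection: rst instead of fin

    try:
        data = b.recv(1024)
        outcome = "eof" if data == b"" else "data"
    except ConnectionResetError:
        outcome = "reset"             # "Connection reset by peer"
    finally:
        b.close()
    return outcome

print(abort_with_linger_zero())
```

Because the connection is aborted rather than shut down, the closing side skips TIME_WAIT, which is exactly the shortcut the slide warns against using casually.
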
  51. Further Reading • TCP is a huge subject • We

    haven’t covered: – Delayed Acks – Nagle’s Algorithm – Congestion Control – Multipath • There’s always more! • Introduction to Computer Networks Peter Dordal, LUC • CSEP 561: Network Systems Krishnamurthy et al, University of Washington • List of relevant RFCs Wikipedia
  52. QUESTIONS? Thank you for listening [email protected] @tomerg On GitHub: https://github.com/holograph

    This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.