How shit works: TCP/IP

A talk given at GeeCON 2019 in Kraków, Poland.

From the top of the stack, networks are remarkably simple to use. Just throw in a couple of machines, some cables, a network switch and an HTTP server, stir gently and season to taste, right?

Your go-to high-level protocol (HTTP, anyone?) builds on a hierarchy of seemingly simple abstractions that hide immense complexity, and these abstractions inevitably leak. In the fourth part of this series we'll take a closer look at everyone's favorite protocol, TCP/IP, enumerate the challenges it faces and see how it aims to solve them.

With luck, you'll come away understanding enough so that next time you run into Connection Reset by Peer issues or SO_LINGER you'll have some idea what you're up against.

Tomer Gabel

May 17, 2019

Transcript

  1. How Shit Works: TCP/IP Tomer Gabel Kraków, 15-17 May 2019

  2. How shit works: Talking to my mom Tomer Gabel @ GeeCON 2019, Kraków

  3. Before We Start… • I love my mom – Dearly – No, seriously! • This is a slight exaggeration • Though not entirely Image: Mom Anchor by Mez Love on Flickr (CC BY-NC-ND 2.0)

  4. In Theory…

  5. In Theory… • Point-to-point • One continuous session – Explicit preamble – Stateful communication – Explicit termination • In-order delivery • Reliable delivery

  6. … In Practice

  7. … In Practice • Few delivery guarantees – Out of order

  8. … In Practice • Few delivery guarantees – Out of order – Unreliable

  9. … In Practice • Few delivery guarantees – Out of order – Unreliable • Fuzzy session boundaries – No explicit preamble – No explicit termination • Receiver is totally overwhelmed

  10. THIS IS ACTUALLY AMAZING.

  11. What Worked? • We could address each other. – I text my mom on her cellphone – Switchboard maps number to IMSI – Packets are routed to my mom’s device • … but we don’t think about that. Image: Vintage Envelope by Heather on Flickr (CC BY-NC-ND 2.0)

  12. What Worked? • We could communicate. • Routing is complex! – Multiple hops – Heterogeneous networks – Each packet can take a different route • … but we don’t think about that. Image: Arpanet map 1973 (public domain), source: WikiMedia Commons

  13. What Worked? • We could transmit. • Radio is complex! – Packets arrive at base station – Transceiver converts packets to analog signal – Antennae transmit signal as electromagnetic radiation • … but we don’t think about that. Image: Modern cell and antenna with flat parabola on blue sky by First Responder Network on Flickr (CC BY-NC-ND 2.0)

  14. THE ONION OF ABSTRACTION STRIKES AGAIN!

  15. So... What Went Wrong? • My mom and I failed to collaborate – My mom sent faster than I was ready to receive – My mom did not retransmit data that I had missed – There was no way to infer the correct order

  16. Slight Digression • Have you noticed? – Texting is reliable – Texting is ordered – But that didn’t help • Reliability is the responsibility of the application • This is known as the end-to-end principle† † See “End-to-end principle” on Wikipedia Image: Neither Snow nor Rain by Kathleen Conklin (CC BY-2.0)

  17. WHAT WE NEED IS A PROTOCOL.

  18. How shit works: TCP/IP Tomer Gabel @ GeeCON 2019, Kraków

  20. Full Disclosure Bullshit ahead! • I’m not an expert • Explanations will be: – Simplified – Inaccurate – Wrong :-) • We’ll barely scratch the surface Image: Public Domain

  21. Product Management 101 Assumptions: • Existing infrastructure – Physical transmission – Addressing – Routing • In other words, we build on IP

  22. Product Management 101 Assumptions: • Existing infrastructure – Physical transmission – Addressing – Routing • In other words, we build on IP Requirements: • Delivery guarantees – No drops – No duplicates • In-order delivery • Sender cannot overwhelm receiver

  23. Proposed Solution • Super simple: – Send one packet at a time – Wait for ack – Rinse and repeat • Neatly solves all our problems! • … well, almost [Diagram: A and B exchange “Hi!”, “How are you?” and “Hey!”]

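The "send one packet, wait for ack, rinse and repeat" scheme on this slide is usually called stop-and-wait. Below is a minimal sketch of it as a toy simulation, not real networking code. The function name and the `lost_data` / `lost_acks` parameters are invented for illustration: they list which transmission attempts the pretend network drops.

```python
def stop_and_wait(messages, lost_data=frozenset(), lost_acks=frozenset()):
    """Simulate stop-and-wait delivery (illustrative model only).

    lost_data / lost_acks hold the zero-based transmission attempts whose
    data segment or ack is dropped by the simulated network.
    Returns what the receiver delivered and how many transmissions it took.
    """
    delivered = []
    attempt = 0
    for seq, payload in enumerate(messages):
        acked = False
        while not acked:
            current = attempt
            attempt += 1
            if current in lost_data:
                continue  # segment never arrived: sender times out and resends
            if seq == len(delivered):
                delivered.append(payload)  # next expected segment: deliver it
            # otherwise it is a duplicate: drop it, but still ack it again
            if current in lost_acks:
                continue  # ack never arrived: sender times out and resends
            acked = True
    return delivered, attempt


print(stop_and_wait(["Hi!", "How are you?"], lost_data={0}, lost_acks={2}))
```

Even with one lost segment and one lost ack, the receiver ends up with each message exactly once and in order. The price is four transmissions for two messages, and the sender sits idle while it waits.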
  24. Naïvety Doesn’t Become You • Sure, this works • But it’s very inefficient • We want this…

  26. Naïvety Doesn’t Become You • Sure, this works • But it’s very inefficient • We want this… • … but actually get this – Serial communication – Slow, oh so slow

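To put a number on "slow, oh so slow": with one segment in flight per round trip, throughput cannot exceed the segment size divided by the roundtrip time, however fast the link is. The sketch below works that out. The 1460-byte segment and 100 ms roundtrip are assumed example values, not figures from the talk, and link bandwidth is ignored.

```python
def stop_and_wait_throughput(segment_bytes: int, rtt_ms: int) -> float:
    """Upper bound in bytes/second with one segment in flight per round trip."""
    return segment_bytes * 1000 / rtt_ms


def pipelined_throughput(segment_bytes: int, rtt_ms: int, in_flight: int) -> float:
    """Same bound when `in_flight` segments may be unacknowledged at once."""
    return in_flight * stop_and_wait_throughput(segment_bytes, rtt_ms)


# Assumed example values: 1460-byte segments, 100 ms roundtrip time.
print(stop_and_wait_throughput(1460, 100))   # 14600.0 bytes/s
print(pipelined_throughput(1460, 100, 10))   # 146000.0 bytes/s
```

Keeping several segments in flight is what the "we want this" picture asks for.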
  28. Product Management 102 Assumptions: • Existing infrastructure – Physical transmission – Addressing – Routing • In other words, we build on IP Requirements: • Delivery guarantees – No drops – No duplicates • In-order delivery • Sender cannot overwhelm receiver • Maximize throughput

  30. Product Management 102 Assumptions: • Existing infrastructure – Physical transmission – Addressing – Routing • In other words, we build on IP Requirements: • Delivery guarantees – No drops – No duplicates • In-order delivery • Flow control

  31. SO, TCP THEN?

  32. What You All Know • TCP is a transport-layer protocol – Point-to-point – Connection-oriented – Reliable • Builds on top of IP Image: TCP packet layout with bit scale by Quliyevferman (CC BY-SA 4.0), source: WikiMedia Commons

  33. “Point-to-Point” [Diagram: client and server stacks, each TCP over IP over Ethernet (PHY); the endpoints share a bidirectional byte stream, carried as TCP segments, IP datagrams and Ethernet frames]

  34. “Connection-Oriented” • The network is asynchronous – Just packets running around – Fully stateless • TCP provides a connection abstraction – Stateful – Explicit handshake – Explicit termination [Diagram: syn, syn/ack and ack between A and B, then “Connection established”; Roundtrip Time (RTT) is marked]

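The handshake is easy to watch from user space: a blocking `connect()` returns once the client has received the syn/ack, so timing it gives a rough first estimate of the roundtrip time. Below is a minimal sketch using Python's standard `socket` module against a throwaway listener on the loopback interface. The function name `handshake_time` is made up for this example.

```python
import socket
import time


def handshake_time() -> float:
    """Time a TCP connect() to a local listener, in seconds (rough RTT estimate)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    try:
        start = time.perf_counter()
        # create_connection() blocks until the three-way handshake completes;
        # the kernel answers the syn even before accept() is called.
        with socket.create_connection(listener.getsockname(), timeout=5):
            elapsed = time.perf_counter() - start
    finally:
        listener.close()
    return elapsed


print(f"connect() took {handshake_time() * 1000:.3f} ms")
```

On loopback the number is tiny. Against a remote host the same measurement is dominated by network latency.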
  35. “Reliable” • TCP handles: – Ordering – Retransmission • Seems simple enough • It’s not. [Diagram: four timelines between A and B showing drop, delay, duplicate and reorder]

  36. TCP Flow Control • Two seemingly conflicting goals: – Maximize throughput – Do not overwhelm receiver • A collaborative protocol Image: Pixabay (via Pexels, free for use)

  37. Sliding Windows [Diagram: sender and receiver; the data is “Hello, world!\r\n”, bytes numbered 0 to 14; receive window size = 4; MSS = 1]

  38. Sliding Windows [Diagram, continued: sender transmits ‘H’ seq=0, ‘e’ seq=1, ‘l’ seq=2, ‘l’ seq=3; receiver acks seq=0]

  39. Sliding Windows [Diagram, continued: sender transmits ‘o’ seq=4]

  40. Sliding Windows [Diagram, continued: receiver acks seq=1; sender transmits ‘,’ seq=5]

  41. Sliding Windows [Diagram, continued: receiver acks seq=2; sender transmits ‘ ’ seq=6]

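The sliding-window walkthrough can be replayed with a small simulation. This is a sketch under the same toy assumptions as the slides: MSS of 1 byte, a fixed receive window of 4, no loss, and acks labelled with the sequence number they acknowledge. Real TCP acks carry the next expected sequence number instead. The function name and the trace format are invented for illustration.

```python
from collections import deque


def sliding_window_trace(data: bytes, window: int):
    """Return the send/ack events and the bytes the receiver ends up with."""
    trace = []
    in_flight = deque()      # sequence numbers sent but not yet acked
    next_seq = 0
    received = bytearray()
    while len(received) < len(data):
        # The sender fills the window: at most `window` unacked segments.
        while next_seq < len(data) and len(in_flight) < window:
            trace.append(("send", next_seq))
            in_flight.append(next_seq)
            next_seq += 1
        # The receiver takes the oldest segment and acks it,
        # which frees one slot in the window.
        seq = in_flight.popleft()
        received.append(data[seq])
        trace.append(("ack", seq))
    return trace, bytes(received)


trace, received = sliding_window_trace(b"Hello, world!\r\n", window=4)
print(trace[:9])
```

The first events are four sends, then ack 0, send 4, ack 1, send 5, ack 2: the same order as the slides.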
  42. Packet Loss [Diagram: same data, receive window size = 4, MSS = 1; sender transmits ‘H’ seq=0, ‘e’ seq=1, ‘l’ seq=2, ‘l’ seq=3, ‘o’ seq=4; receiver acks seq=0]

  43. Retransmission [Diagram, continued: receiver acks seq=0 a second time]

  44. Retransmission [Diagram, continued: ‘e’ seq=1 never arrived; after the Retransmission Timeout (RTO) the sender transmits ‘e’ seq=1 again]

  45. Retransmission [Diagram, continued: receiver acks seq=4]

  46. Retransmission [Diagram, continued: sender transmits ‘,’ seq=5 and ‘ ’ seq=6]

  47. TCP Fast Retransmission [Diagram: same data, receive window size = 4, MSS = 1; sender transmits ‘H’ seq=0, ‘e’ seq=1, ‘l’ seq=2, ‘l’ seq=3, ‘o’ seq=4; receiver acks seq=0 three times; the Retransmission Timeout (RTO) is marked]

  48. TCP Fast Retransmission • 3 duplicate acks† • Sequence number is last known delivered † “Congestion Avoidance and Control”, Jacobson et al, 1988

  49. TCP Fast Retransmission • Opportunistically send last known + 1 [Diagram, continued: sender transmits ‘e’ seq=1 again]

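The duplicate-ack rule is compact enough to sketch. The receiver below acks the last in-order sequence number it has delivered, following the slides' labelling (real TCP acks the next expected byte). The sender retransmits "last known + 1" once it has seen three duplicates of the same ack. Both function names are invented for this illustration.

```python
def receiver_acks(arrivals):
    """Cumulative ack (last in-order seq delivered) emitted for each arrival."""
    buffered = set()
    last_in_order = -1
    acks = []
    for seq in arrivals:
        buffered.add(seq)
        # Advance over any buffered segments that are now contiguous.
        while last_in_order + 1 in buffered:
            last_in_order += 1
        acks.append(last_in_order)
    return acks


def fast_retransmit_seq(acks, threshold=3):
    """Sequence number to resend after `threshold` duplicate acks, else None."""
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                return ack + 1   # opportunistically send last known + 1
        else:
            last_ack, duplicates = ack, 0
    return None


acks = receiver_acks([0, 2, 3, 4])     # segment 1 ('e') was lost
print(acks, fast_retransmit_seq(acks))
```

With segment 1 missing, every later arrival triggers another ack for 0. The third duplicate tells the sender to resend segment 1 without waiting for the RTO.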
  50. TCP Dynamics • Multiple variables… – RTT – MSS – Window size – RTO Roundtrip Time: • Initial value on connection • Tracked on each ack • High variability

  51. TCP Dynamics • Multiple variables… – RTT – MSS – Window size – RTO Maximum Segment Size: • Largest segment allowed by TCP, in octets (bytes) • Related, but not identical, to MTU • Negotiated at connection time

  52. TCP Dynamics • Multiple variables… – RTT – MSS – Window size – RTO TCP Window Size: • Determined by the receiver • Advertised with each packet • Size up to 2^16 = 64 KB • With TCP Window Scaling, up to 2^14 × 2^16 = 1 GB

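The window-size limits are plain arithmetic, sketched below. The window field is 16 bits, so strictly speaking the largest advertisable value is 2^16 − 1 = 65,535 bytes. Window scaling shifts that value left by at most 14 bits, which lands just under 2^30. The last helper is an assumption-laden rule of thumb (ignoring loss and congestion control): a sender can have at most one window of data in flight per round trip. All names and the 100 ms roundtrip are illustrative.

```python
MAX_WINDOW_FIELD = 2**16 - 1   # largest value of the 16-bit window field
MAX_SCALE_SHIFT = 14           # largest shift allowed by window scaling


def max_window(shift: int = 0) -> int:
    """Largest advertisable receive window, in bytes, for a given scale shift."""
    if not 0 <= shift <= MAX_SCALE_SHIFT:
        raise ValueError("window scale shift must be between 0 and 14")
    return MAX_WINDOW_FIELD << shift


def throughput_ceiling(window_bytes: int, rtt_ms: int) -> float:
    """Bytes/second if at most one full window is in flight per round trip."""
    return window_bytes * 1000 / rtt_ms


print(max_window(0))                             # 65535 (about 64 KB)
print(max_window(14))                            # 1073725440 (about 1 GB)
print(throughput_ceiling(max_window(0), 100))    # 655350.0 bytes/s at 100 ms RTT
```

This is why an unscaled 64 KB window caps throughput at roughly 650 KB/s on a 100 ms path, regardless of link speed.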
  53. TCP Dynamics • Multiple variables… – RTT – MSS – Window size – RTO Retransmission Timeout: • Initially set to 1 second (originally 3 seconds†) • Dynamically adjusted based on RTT † RFC 6298 section 2.1

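"Dynamically adjusted based on RTT" refers to the estimator in RFC 6298: a smoothed RTT plus four times a smoothed deviation, doubled on every timeout. The sketch below follows the formulas and constants from that RFC (alpha = 1/8, beta = 1/4, K = 4). The class name and the default clock granularity are assumptions for illustration, and real stacks differ in detail; several use a lower minimum than the 1 second the RFC recommends.

```python
class RtoEstimator:
    """Retransmission timeout estimator, sketched after RFC 6298 section 2."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variation
    K = 4

    def __init__(self, granularity=0.001, min_rto=1.0, max_rto=60.0):
        self.granularity = granularity   # clock granularity G (assumed 1 ms)
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None                 # smoothed roundtrip time
        self.rttvar = None               # roundtrip time variation
        self.rto = 1.0                   # initial RTO before any measurement

    def on_rtt_sample(self, rtt: float) -> float:
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto

    def on_timeout(self) -> float:
        """Exponential backoff: double the RTO when the timer expires."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto


estimator = RtoEstimator()
print(estimator.on_rtt_sample(0.5))   # 1.5
print(estimator.on_rtt_sample(0.5))   # 1.25
print(estimator.on_timeout())         # 2.5
```

A steady RTT shrinks the variation term, so the RTO drifts down towards the measured roundtrip time. A timeout doubles it again.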
  54. TCP Dynamics • Multiple variables… – RTT – MSS – Window size – RTO • There are more – Out of scope – Don’t worry :-)

  55. SO CAN I TALK TO MY MOM, YET? You’ve made it this far. Good for you!

  56. Closing the Connection • A TCP connection is a conversation – A fin is effectively a promise: “I’m done sending” – The corresponding ack means “I’ve received everything” • Both sides must signal fin [Diagram: fin and ack in each direction between A and B; after A’s fin, B may still send data]

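The "I'm done sending" promise maps onto `shutdown(SHUT_WR)` in the sockets API: it sends our fin but leaves the receiving direction open. Below is a minimal loopback sketch with Python's standard `socket` and `threading` modules. The helper names and the `got: ` reply format are invented for this example.

```python
import socket
import threading


def _serve_once(listener: socket.socket) -> None:
    """Read until the peer's fin, then reply and close (sending our own fin)."""
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(1024)
            if not chunk:          # empty read: the peer has sent its fin
                break
            chunks.append(chunk)
        conn.sendall(b"got: " + b"".join(chunks))   # B may still send data


def half_close_demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    server = threading.Thread(target=_serve_once, args=(listener,), daemon=True)
    server.start()
    reply = b""
    with socket.create_connection(listener.getsockname(), timeout=5) as client:
        client.sendall(b"Enjoying GeeCON?")
        client.shutdown(socket.SHUT_WR)   # our fin: "I'm done sending"
        while True:
            chunk = client.recv(1024)     # we can still receive
            if not chunk:                 # the server's fin
                break
            reply += chunk
    server.join(timeout=5)
    listener.close()
    return reply


print(half_close_demo())
```

Each side learns that the other is finished by reading an empty result, which is how a fin surfaces in the sockets API.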
  57. Closing the Connection • Why is this necessary? • Because it’s polite • If one side terminates the connection… • … and the other side sends more data… – “Connection reset by peer” – Like slamming the phone down! [Diagram: messages “Enjoying GeeCON?” and “Yep!” between A and B; A drops connection; rst]

  58. Now It Gets Interesting • Remember, the network is asynchronous • Packets may be delayed, duplicated or both • What happens when an “old” packet shows up again? [Diagram: the old connection between A and B carries “Adios” and closes with fin and ack in each direction; a duplicate “Adios” shows up during the new connection]

  59. TCP Old Duplicates • Not very probable – Same host and port on both ends (“socketpair”) – Same sequence number • Low probability * large scale * long time = inevitability [Diagram: same as the previous slide]

  60. TIME_WAIT • A closed client connection remains in TIME_WAIT state • Old duplicates are dropped without rst • On modern TCP stacks, this typically lasts about a minute (60 seconds on Linux)
        druuge:~ tomer.gabel$ dig +short google.com
        172.217.16.14
        druuge:~ tomer.gabel$ curl -s http://172.217.16.14 >/dev/null
        druuge:~ tomer.gabel$ netstat -n | grep 172.217.16.14
        tcp4 0 0 10.0.1.136.65499 172.217.16.14.80 TIME_WAIT

  61. Last But Not Least • On socket.close(), what happens to… – Queued (unsent) data? – Unacknowledged segments? • Depends on SO_LINGER – Offers a “grace period” – On timeout, drops connection [Diagram: fin, ack and rst between A and B, with the SO_LINGER period marked]

  62. Last But Not Least • Setting SO_LINGER=0… – Circumvents normal TCP shutdown – Immediately aborts the connection • Sometimes recommended† as a way to avoid TIME_WAIT – Not a good idea! – Unless you know exactly why [Diagram: rst between A and B] † Examples on StackOverflow, ServerFault,

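To see what SO_LINGER=0 does to the other side, the sketch below enables lingering with a zero timeout on one end of a loopback connection and then closes it. It assumes a Linux or macOS style `struct linger` made of two C ints, hence the `"ii"` format; other platforms may lay it out differently. The function name and the returned strings are made up for this illustration.

```python
import socket
import struct


def abortive_close_demo() -> str:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server_side, _ = listener.accept()

    # struct linger { l_onoff = 1, l_linger = 0 }: linger enabled, zero timeout.
    # Assumed layout: two C ints, as on Linux and macOS.
    server_side.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                           struct.pack("ii", 1, 0))
    server_side.close()   # aborts the connection instead of the normal shutdown

    try:
        data = client.recv(1024)
        outcome = "clean end of stream" if not data else "unexpected data"
    except ConnectionResetError:
        outcome = "connection reset by peer"
    finally:
        client.close()
        listener.close()
    return outcome


print(abortive_close_demo())
```

On a typical Linux machine the peer sees a reset rather than a clean end of stream. Any data still queued on the aborting side is discarded, which is one reason the slide calls this a bad idea unless you know exactly why you want it.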
  63. Further Reading • TCP is a huge subject • We haven’t covered: – Delayed Acks – Nagle’s Algorithm – Congestion Control – Multipath • There’s always more! • Introduction to Computer Networks (Peter Dordal, LUC) • CSEP 561: Network Systems (Krishnamurthy et al, University of Washington) • List of relevant RFCs (Wikipedia)

  64. QUESTIONS? Thank you for listening tomer@tomergabel.com @tomerg On GitHub: https://github.com/holograph This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.