Slide 1

Slide 1 text

How Shit Works: TCP/IP Tomer Gabel Kraków, 15-17 May 2019

Slide 2

Slide 2 text

How shit works: Talking to my mom Tomer Gabel @ GeeCON 2019, Kraków

Slide 3

Slide 3 text

Before We Start… • I love my mom – Dearly – No, seriously! • This is a slight exaggeration • Though not entirely Image: Mom Anchor by Mez Love on Flickr (CC BY-NC-ND 2.0)

Slide 4

Slide 4 text

In Theory…

Slide 5

Slide 5 text

In Theory… • Point-to-point • One continuous session – Explicit preamble – Stateful communication – Explicit termination • In-order delivery • Reliable delivery

Slide 6

Slide 6 text

… In Practice

Slide 7

Slide 7 text

… In Practice • Few delivery guarantees – Out of order

Slide 8

Slide 8 text

… In Practice • Few delivery guarantees – Out of order – Unreliable

Slide 9

Slide 9 text

… In Practice • Few delivery guarantees – Out of order – Unreliable • Fuzzy session boundaries – No explicit preamble – No explicit termination • Receiver is totally overwhelmed

Slide 10

Slide 10 text

THIS IS ACTUALLY AMAZING.

Slide 11

Slide 11 text

What Worked? • We could address each other. – I text my mom on her cellphone – Switchboard maps number to IMSI – Packets are routed to my mom’s device • … but we don’t think about that. Image: Vintage Envelope by Heather on Flickr (CC BY-NC-ND 2.0)

Slide 12

Slide 12 text

What Worked? • We could communicate. • Routing is complex! – Multiple hops – Heterogeneous networks – Each packet can take a different route • … but we don’t think about that. Image: Arpanet map 1973 (public domain), source: Wikimedia Commons

Slide 13

Slide 13 text

What Worked? • We could transmit. • Radio is complex! – Packets arrive at base station – Transceiver converts packets to analog signal – Antennae transmit signal as electromagnetic radiation • … but we don’t think about that. Image: Modern cell and antenna with flat parabola on blue sky by First Responder Network on Flickr (CC BY-NC-ND 2.0)

Slide 14

Slide 14 text

THE ONION OF ABSTRACTION STRIKES AGAIN!

Slide 15

Slide 15 text

So... What Went Wrong? • My mom and I failed to collaborate – My mom sent faster than I was ready to receive – My mom did not retransmit data that I had missed – There was no way to infer the correct order

Slide 16

Slide 16 text

Slight Digression • Have you noticed? – Texting is reliable – Texting is ordered – But that didn’t help • Reliability is the responsibility of the application • This is known as the end-to-end principle† † See “End-to-end principle” on Wikipedia Image: Neither Snow nor Rain by Kathleen Conklin (CC BY 2.0)

Slide 17

Slide 17 text

WHAT WE NEED IS A PROTOCOL.

Slide 18

Slide 18 text

How shit works: TCP/IP Tomer Gabel @ GeeCON 2019, Kraków

Slide 20

Slide 20 text

Full Disclosure Bullshit ahead! • I’m not an expert • Explanations will be: – Simplified – Inaccurate – Wrong :-) • We’ll barely scratch the surface Image: Public Domain

Slide 21

Slide 21 text

Product Management 101 Assumptions • Existing infrastructure – Physical transmission – Addressing – Routing • In other words, we build on IP

Slide 22

Slide 22 text

Product Management 101 Assumptions • Existing infrastructure – Physical transmission – Addressing – Routing • In other words, we build on IP Requirements • Delivery guarantees – No drops – No duplicates • In-order delivery • Sender cannot overwhelm receiver

Slide 23

Slide 23 text

Proposed Solution • Super simple: – Send one packet at a time – Wait for ack – Rinse and repeat • Neatly solves all our problems! • … well, almost A B “Hi!” “How are you?” “Hey!”
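The "send one packet, wait for ack, repeat" scheme can be sketched as a tiny simulation. This is a toy model rather than anything from the talk: the function name and the `lost_data` / `lost_acks` sets (attempt numbers the simulated network drops) are hypothetical, and a timeout is modelled simply as "no ack came back, so send again".

```python
def stop_and_wait(messages, lost_data=frozenset(), lost_acks=frozenset()):
    """Deliver messages one at a time, resending until each one is acked.

    lost_data / lost_acks are hypothetical sets of 0-based attempt numbers
    whose data packet or ack the simulated network drops.
    """
    delivered = []   # what the receiver hands to the application
    attempts = 0     # total transmissions made by the sender
    for seq, payload in enumerate(messages):
        acked = False
        while not acked:
            attempt = attempts
            attempts += 1
            if attempt in lost_data:
                continue  # packet lost: no ack arrives, sender times out and resends
            if seq == len(delivered):
                delivered.append(payload)  # the packet the receiver was waiting for
            # otherwise it is a duplicate: drop it, but still acknowledge it
            acked = attempt not in lost_acks
    return delivered, attempts


delivered, attempts = stop_and_wait(
    ["Hi!", "Hey!", "How are you?"], lost_data={1}, lost_acks={3}
)
print(delivered, attempts)
```

Every message arrives exactly once and in order, but each loss costs a full extra round trip, which is the inefficiency the next slides complain about.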

Slide 24

Slide 24 text

Naïvety Doesn’t Become You • Sure, this works • But it’s very inefficient • We want this…

Slide 26

Slide 26 text

Naïvety Doesn’t Become You • Sure, this works • But it’s very inefficient • We want this… • … but actually get this – Serial communication – Slow, oh so slow
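A back-of-the-envelope sketch of why the serial approach is slow: with one packet per round trip, throughput is capped at one segment per RTT no matter how fast the link is. The function names and the sample figures (1460-byte segments, 100 ms RTT) are illustrative assumptions, not numbers from the talk.

```python
def stop_and_wait_throughput(segment_bytes, rtt_ms):
    """Bytes per second when only one segment may be in flight per round trip."""
    return segment_bytes * 1000 / rtt_ms


def pipelined_throughput(segment_bytes, rtt_ms, in_flight):
    """Bytes per second when `in_flight` segments may be outstanding at once."""
    return in_flight * segment_bytes * 1000 / rtt_ms


print(stop_and_wait_throughput(1460, 100))           # one segment per RTT
print(pipelined_throughput(1460, 100, in_flight=4))  # four segments per RTT
```

Keeping several segments in flight multiplies throughput by the number outstanding, which is why "maximize throughput" shows up as a requirement.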

Slide 28

Slide 28 text

Product Management 102 Assumptions • Existing infrastructure – Physical transmission – Addressing – Routing • In other words, we build on IP Requirements • Delivery guarantees – No drops – No duplicates • In-order delivery • Sender cannot overwhelm receiver • Maximize throughput

Slide 30

Slide 30 text

Product Management 102 Assumptions • Existing infrastructure – Physical transmission – Addressing – Routing • In other words, we build on IP Requirements • Delivery guarantees – No drops – No duplicates • In-order delivery • Flow control

Slide 31

Slide 31 text

SO, TCP THEN?

Slide 32

Slide 32 text

What You All Know • TCP is a transport-layer protocol – Point-to-point – Connection-oriented – Reliable • Builds on top of IP Image: TCP packet layout with bit scale by Quliyevferman (CC BY-SA 4.0), source: Wikimedia Commons

Slide 33

Slide 33 text

“Point-to-Point” Ethernet (PHY) IP TCP Client Ethernet (PHY) IP TCP Server Bidirectional byte stream Segments Datagrams Frames

Slide 34

Slide 34 text

“Connection-Oriented” • The network is asynchronous – Just packets running around – Fully stateless • TCP provides a connection abstraction – Stateful – Explicit handshake – Explicit termination A B syn syn/ack ack Connection established Roundtrip Time (RTT)
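From application code the handshake is invisible: it happens inside connect(). A minimal sketch using Python's standard socket module over loopback (an assumption, chosen so the example needs no external host; `handshake_time` is a made-up helper name). The client's connect() returns once it has received the syn/ack and sent its ack, so the elapsed time is roughly one round trip.

```python
import socket
import time


def handshake_time():
    """Open a loopback TCP connection and time the connect() call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        address = listener.getsockname()

        start = time.perf_counter()
        # connect() blocks while syn and syn/ack are exchanged
        with socket.create_connection(address, timeout=5) as client:
            elapsed = time.perf_counter() - start
            conn, peer = listener.accept()   # the connection is already established
            with conn:
                # both ends agree on who the client is: the state is shared
                return elapsed, peer == client.getsockname()


elapsed, same_peer = handshake_time()
print(f"connect() took {elapsed * 1000:.3f} ms, peer matches: {same_peer}")
```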

Slide 35

Slide 35 text

“Reliable” • TCP handles: – Ordering – Retransmission • Seems simple enough • It’s not. A B drop delay duplicate reorder A B A B A B time
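To get a feel for why ordering is less simple than it seems, here is a sketch of the receiving side: segments may arrive duplicated or reordered, and the receiver has to buffer them until the gaps are filled. It is a simplified model, not how any real stack is written. `reassemble` is a hypothetical name, sequence numbers are byte offsets, and segments are assumed never to partially overlap.

```python
def reassemble(segments):
    """Rebuild the byte stream from (seq, data) pairs given in arrival order."""
    buffered = {}        # out-of-order segments waiting for a gap to close
    next_seq = 0         # next byte offset the application is waiting for
    stream = bytearray()
    for seq, data in segments:
        if seq < next_seq or seq in buffered:
            continue     # duplicate of something already seen: ignore it
        buffered[seq] = data
        # hand over everything that is now contiguous
        while next_seq in buffered:
            chunk = buffered.pop(next_seq)
            stream += chunk
            next_seq += len(chunk)
    return bytes(stream)


arrivals = [(0, b"He"), (4, b"o,"), (2, b"ll"), (2, b"ll"), (0, b"He"), (6, b" w")]
print(reassemble(arrivals))
```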

Slide 36

Slide 36 text

TCP Flow Control • Two seemingly conflicting goals: – Maximize throughput – Do not overwhelm receiver • A collaborative protocol Image: Pixabay (via Pexels, free for use)

Slide 37

Slide 37 text

Sliding Windows 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 MSS = 1

Slide 38

Slide 38 text

Sliding Windows 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 seq=0

Slide 39

Slide 39 text

Sliding Windows 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 seq=0 ‘o’ seq=4

Slide 40

Slide 40 text

Sliding Windows 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 seq=0 ‘o’ seq=4 seq=1 ‘,’ seq=5

Slide 41

Slide 41 text

Sliding Windows 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 seq=0 ‘o’ seq=4 seq=1 ‘,’ seq=5 seq=2 ‘ ’ seq=6
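The sliding-window walkthrough on these slides can be replayed with a short simulation. It assumes a lossless network, MSS = 1 and a window of 4, and follows the slides' convention that an ack carries the sequence number it acknowledges (real TCP acks carry the next expected byte instead). `sliding_window_trace` is a hypothetical helper name.

```python
from collections import deque


def sliding_window_trace(data, window=4):
    """Return the sequence of ("send", seq) / ("ack", seq) events, MSS = 1."""
    events = []
    next_to_send = 0
    in_flight = deque()   # sent but not yet acknowledged
    while next_to_send < len(data) or in_flight:
        # fill the window: send until `window` segments are outstanding
        while next_to_send < len(data) and len(in_flight) < window:
            events.append(("send", next_to_send))
            in_flight.append(next_to_send)
            next_to_send += 1
        # the oldest outstanding segment is acked, sliding the window by one
        events.append(("ack", in_flight.popleft()))
    return events


for event in sliding_window_trace("Hello, world!\r\n")[:10]:
    print(event)
```

The first ten events match the slides: four sends fill the window, then each ack frees room for exactly one more segment.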

Slide 42

Slide 42 text

Packet Loss 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0

Slide 43

Slide 43 text

Retransmission 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0

Slide 44

Slide 44 text

Retransmission 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 ‘e’ seq=1 Retransmission Timeout (RTO) ‘e’ seq=1

Slide 45

Slide 45 text

Retransmission 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n MSS = 1 ‘e’ seq=1 seq=4 Sender Receiver Receive window size = 4 ‘H’ seq=0 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 Retransmission Timeout (RTO) ‘e’ seq=1

Slide 46

Slide 46 text

Retransmission 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n MSS = 1 ‘e’ seq=1 seq=4 Sender Receiver Receive window size = 4 ‘H’ seq=0 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 ‘e’ seq=1 ‘,’ seq=5 ‘ ’ seq=6
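The acks in this scenario are cumulative: the receiver always reports the last sequence number delivered in order, so a hole makes it repeat the same ack until the missing segment shows up. A small sketch of that receiver logic, using the slides' convention for ack numbers and a made-up function name:

```python
def cumulative_acks(arrivals):
    """Return the ack sent after each arriving segment (MSS = 1)."""
    received = set()
    last_in_order = -1    # nothing delivered yet
    acks = []
    for seq in arrivals:
        received.add(seq)
        # advance past every segment that is now contiguous
        while last_in_order + 1 in received:
            last_in_order += 1
        acks.append(last_in_order)
    return acks


# 'e' (seq=1) is lost at first and only arrives after the retransmission timeout
print(cumulative_acks([0, 2, 3, 4, 1]))
```

Once the retransmitted segment arrives, a single ack covers everything buffered in the meantime, which is why the sender can jump straight to seq=5.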

Slide 47

Slide 47 text

TCP Fast Retransmission 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 seq=0 Retransmission Timeout (RTO)

Slide 48

Slide 48 text

TCP Fast Retransmission 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 seq=0 Retransmission Timeout (RTO) • 3 duplicate acks† • Sequence number is last known delivered † “Congestion Avoidance and Control”, Jacobson et al, 1988

Slide 49

Slide 49 text

TCP Fast Retransmission 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 H e l l o , w o r l d ! \r \n Sender Receiver Receive window size = 4 ‘H’ seq=0 MSS = 1 ‘e’ seq=1 ‘l’ seq=2 ‘l’ seq=3 ‘o’ seq=4 seq=0 seq=0 seq=0 Retransmission Timeout (RTO) Opportunistically send last known + 1 ‘e’ seq=1
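On the sender's side, fast retransmission amounts to counting repeated acks. A minimal sketch, again with acks carrying the last delivered sequence number as on the slides; the function name is hypothetical and the threshold of three duplicates is the one quoted above.

```python
DUP_ACK_THRESHOLD = 3


def fast_retransmits(acks):
    """Return the sequence numbers resent early, given acks in arrival order."""
    resent = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == DUP_ACK_THRESHOLD:
                # don't wait for the RTO: resend last known delivered + 1
                resent.append(ack + 1)
        else:
            last_ack = ack
            duplicates = 0
    return resent


print(fast_retransmits([0, 0, 0, 0, 4]))   # one original ack plus three duplicates
```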

Slide 50

Slide 50 text

TCP Dynamics • Multiple variables… – RTT – MSS – Window size – RTO Roundtrip Time • Initial value on connection • Tracked on each ack • High variability

Slide 51

Slide 51 text

TCP Dynamics • Multiple variables… – RTT – MSS – Window size – RTO Maximum Segment Size • Largest segment allowed by TCP, in octets (bytes) • Related, but not identical, to MTU • Negotiated at connection time
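The relationship between MSS and MTU is simple subtraction: the MSS is what remains of the MTU once the IP and TCP headers are accounted for. The sketch below assumes minimum-size headers without options (20 bytes each for IPv4 and TCP, 40 bytes for IPv6); the helper name is made up.

```python
IPV4_HEADER_BYTES = 20   # without IP options
IPV6_HEADER_BYTES = 40   # fixed header, no extension headers
TCP_HEADER_BYTES = 20    # without TCP options


def mss_for_mtu(mtu, ip_header=IPV4_HEADER_BYTES, tcp_header=TCP_HEADER_BYTES):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - ip_header - tcp_header


print(mss_for_mtu(1500))                               # typical Ethernet, IPv4
print(mss_for_mtu(1500, ip_header=IPV6_HEADER_BYTES))  # same link, IPv6
```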

Slide 52

Slide 52 text

TCP Dynamics • Multiple variables… – RTT – MSS – Window size – RTO TCP Window Size • Determined by the receiver • Advertised with each packet • Size up to 2^16 = 64 KB • With TCP Window Scaling, up to 2^14 x 2^16 = 1 GB
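The window arithmetic can be checked in a few lines. The advertised window is a 16-bit field, and window scaling shifts it left by up to 14 bits; since at most one window can be in flight per round trip, window size divided by RTT bounds the throughput. Function names and the 100 ms RTT are illustrative assumptions.

```python
MAX_WINDOW_FIELD = 0xFFFF   # the window is a 16-bit header field
MAX_SCALE_SHIFT = 14        # largest shift allowed by window scaling


def effective_window(advertised, shift=0):
    """Window in bytes after applying the negotiated scale shift."""
    if not 0 <= advertised <= MAX_WINDOW_FIELD:
        raise ValueError("advertised window must fit in 16 bits")
    if not 0 <= shift <= MAX_SCALE_SHIFT:
        raise ValueError("scale shift must be between 0 and 14")
    return advertised << shift


def max_throughput(window_bytes, rtt_ms):
    """Upper bound in bytes per second: one full window per round trip."""
    return window_bytes * 1000 / rtt_ms


print(effective_window(MAX_WINDOW_FIELD))                   # just under 64 KB
print(effective_window(MAX_WINDOW_FIELD, MAX_SCALE_SHIFT))  # just under 1 GB
print(max_throughput(effective_window(MAX_WINDOW_FIELD), 100))
```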

Slide 53

Slide 53 text

TCP Dynamics • Multiple variables… – RTT – MSS – Window size – RTO Retransmission Timeout • Initially set to 1 second (originally 3 seconds†) • Dynamically adjusted based on RTT † RFC 6298 section 2.1
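"Dynamically adjusted based on RTT" refers to the smoothing rules in RFC 6298. Here is a sketch of those rules as I understand them: a smoothed RTT and an RTT variance are updated on every measurement, and the RTO is the smoothed RTT plus four times the variance, never less than one second. The class name is made up, the clock granularity defaults to zero for simplicity, and real stacks differ in details such as the minimum.

```python
class RtoEstimator:
    """Retransmission timeout estimator following the RFC 6298 update rules."""

    ALPHA = 1 / 8    # weight of a new sample in the smoothed RTT
    BETA = 1 / 4     # weight of a new sample in the RTT variance
    MIN_RTO = 1.0    # seconds

    def __init__(self, clock_granularity=0.0):
        self.granularity = clock_granularity
        self.srtt = None     # smoothed round-trip time
        self.rttvar = None   # round-trip time variance
        self.rto = 1.0       # initial RTO before any measurement

    def observe(self, rtt):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # the variance is updated first, using the previous smoothed RTT
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.MIN_RTO, self.srtt + max(self.granularity, 4 * self.rttvar))
        return self.rto


estimator = RtoEstimator()
for sample in (0.5, 0.5, 0.5):
    print(estimator.observe(sample))
```

With a steady RTT the variance term decays and the RTO drifts down towards the floor; a single slow sample pushes it back up.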

Slide 54

Slide 54 text

TCP Dynamics • Multiple variables… – RTT – MSS – Window size – RTO • There are more – Out of scope – Don’t worry :-)

Slide 55

Slide 55 text

SO CAN I TALK TO MY MOM, YET? You’ve made it this far. Good for you!

Slide 56

Slide 56 text

Closing the Connection • A TCP connection is a conversation – A fin is effectively a promise: “I’m done sending” – The corresponding ack means “I’ve received everything” • Both sides must signal fin A B fin ack ack fin B may send data
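The "B may send data" part of the diagram is observable from ordinary socket code: shutting down the write side sends a fin, yet the other direction stays open. A minimal sketch over loopback with Python's socket module (requires Python 3.8+ for the `:=` operator); the helper name and messages are invented for illustration.

```python
import socket


def half_close_demo():
    """A sends, signals fin, and still receives B's reply afterwards."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.create_connection(listener.getsockname(), timeout=5) as a:
            b, _ = listener.accept()
            with b:
                b.settimeout(5)
                a.sendall(b"Enjoying GeeCON?")
                a.shutdown(socket.SHUT_WR)      # A's fin: "I'm done sending"

                request = b""
                while chunk := b.recv(1024):    # an empty read means A's fin arrived
                    request += chunk

                b.sendall(b"Yep!")              # B may still send data
                b.shutdown(socket.SHUT_WR)      # B's fin

                reply = b""
                while chunk := a.recv(1024):
                    reply += chunk
    return request, reply


print(half_close_demo())
```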

Slide 57

Slide 57 text

Closing the Connection • Why is this necessary? • Because it’s polite • If one side terminates the connection… • … and the other side sends more data… – “Connection reset by peer” – Like slamming the phone down! A B Yep! Enjoying GeeCON? A drops connection rst

Slide 58

Slide 58 text

Now It Gets Interesting • Remember, the network is asynchronous • Packets may be delayed, duplicated or both • What happens when an “old” packet shows up again? A B fin ack New connection fin ack “Adios” “Adios” Old connection

Slide 59

Slide 59 text

TCP Old Duplicates • Not very probable – Same host and port on both ends (“socketpair”) – Same sequence number • Low probability * large scale * long time = inevitability A B fin ack New connection fin ack “Adios” “Adios” Old connection

Slide 60

Slide 60 text

TIME_WAIT • A closed client connection remains in TIME_WAIT state • Old duplicates are dropped without rst • On modern TCP stacks, this lasts 1 minute
druuge:~ tomer.gabel$ dig +short google.com
172.217.16.14
druuge:~ tomer.gabel$ curl -s http://172.217.16.14 >/dev/null
druuge:~ tomer.gabel$ netstat -n | grep 172.217.16.14
tcp4 0 0 10.0.1.136.65499 172.217.16.14.80 TIME_WAIT

Slide 61

Slide 61 text

Last But Not Least • On socket.close(), what happens to… – Queued (unsent) data? – Unacknowledged segments? • Depends on SO_LINGER – Offers a “grace period” – On timeout, drops connection A B rst ack fin SO_LINGER

Slide 62

Slide 62 text

Last But Not Least • Setting SO_LINGER=0… – Circumvents normal TCP shutdown – Immediately aborts the connection • Sometimes recommended† as a way to avoid TIME_WAIT – Not a good idea! – Unless you know exactly why A B rst † Examples on StackOverflow, ServerFault,
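To see what "immediately aborts the connection" looks like in practice, the sketch below sets SO_LINGER to zero on one end of a loopback connection and closes it; the other end then sees a reset rather than a clean end of stream. It assumes a Unix-like system where the linger structure is two C ints (on Windows the layout differs), and the helper name is made up. It is shown to illustrate the behaviour, not to recommend it.

```python
import socket
import struct


def abortive_close_demo():
    """Close one end with SO_LINGER=0 and report what the peer observes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        a = socket.create_connection(listener.getsockname(), timeout=5)
        b, _ = listener.accept()
        with b:
            b.settimeout(5)
            # l_onoff=1, l_linger=0: close() skips the fin exchange and sends rst
            # (assumes struct linger is two ints, as on Linux and macOS)
            a.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
            a.close()
            try:
                b.recv(1024)
            except ConnectionResetError:
                return "reset"           # "Connection reset by peer"
            return "closed cleanly"      # a normal fin would have produced an empty read


print(abortive_close_demo())
```

Because no fin is exchanged, the closing side skips TIME_WAIT entirely, which is exactly the protection against old duplicates that the previous slides describe.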

Slide 63

Slide 63 text

Further Reading • TCP is a huge subject • We haven’t covered: – Delayed Acks – Nagle’s Algorithm – Congestion Control – Multipath • There’s always more! • Introduction to Computer Networks Peter Dordal, LUC • CSEP 561: Network Systems Krishnamurthy et al, University of Washington • List of relevant RFCs Wikipedia

Slide 64

Slide 64 text

QUESTIONS? Thank you for listening tomer@tomergabel.com @tomerg On GitHub: https://github.com/holograph This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.