The rsync algorithm

Modern computers are very powerful. These days, mobile phones are packed with multi-core CPUs and even GPUs. Despite these advances in hardware, internet connections in most parts of the world are still surprisingly slow and unreliable. This creates a challenge: how can files be transferred efficiently between computers over a low-bandwidth, high-latency network connection? In 1996, Andrew Tridgell and Paul Mackerras developed a simple solution that powers one of the most useful UNIX tools, rsync. The algorithm remains relevant 21 years after its invention because it solves a timeless problem.

This talk will take you step by step through the rsync algorithm. You will learn that PhD theses are not always scary or unapproachable. Hopefully, you will also leave with a better intuition for the inner workings of rsync, as well as ideas for how its principles can serve as building blocks for solving many other problems.

Camilo Aguilar

April 27, 2017

Transcript

  1. Slides 1–25 (animation): overview of the exchange between Client and Server. The frames show files A and B, later joined by C, with each block signature drawn as Σ.

    Captions, in order of appearance: "Calculating block signatures…", "Sending block signatures to Client…", "Finding differences with B…" (with "Lookup", "Match!" and "Not Found!" callouts as individual blocks are checked), and "Reconstructing C…". Each phase ends with "done".
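The "Calculating block signatures…" phase of the animation can be sketched roughly as follows. This is a minimal illustration, not the code from the talk: `BlockSignature`, `weakChecksum` and `signatures` are hypothetical names, MD5 stands in for the strong checksum (the demo slide names xxhash, which is not in the Go standard library), and the input "charla" is an assumption chosen only because its 2-byte blocks produce the weak values shown in the signature table later in the deck.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// BlockSignature pairs a cheap weak checksum with a strong one for one block.
type BlockSignature struct {
	Index  int    // 1-based block number
	Weak   uint32 // cheap checksum, used as the first filter
	Strong [md5.Size]byte
}

// weakChecksum follows the shape of the rolling checksum shown on the slides:
// a plain byte sum and a position-weighted sum, each reduced modulo 2^16.
func weakChecksum(block []byte) uint32 {
	var a, b uint32
	l := uint32(len(block))
	for i, v := range block {
		a += uint32(v)
		b += (l - uint32(i)) * uint32(v)
	}
	return a%65536 + 65536*(b%65536)
}

// signatures splits data into fixed-size blocks (the last one may be shorter)
// and computes one signature per block.
func signatures(data []byte, blockSize int) []BlockSignature {
	var sigs []BlockSignature
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}
		block := data[start:end]
		sigs = append(sigs, BlockSignature{
			Index:  len(sigs) + 1,
			Weak:   weakChecksum(block),
			Strong: md5.Sum(block),
		})
	}
	return sigs
}

func main() {
	for _, s := range signatures([]byte("charla"), 2) {
		fmt.Printf("index %d weak %x\n", s.Index, s.Weak)
	}
}
```

Only this list of signatures has to cross the network in the first phase, which is what keeps the exchange cheap when the two files are mostly identical.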
  26. Slides 26–100 (animation): step-by-step trace of the matching loop. Labels on the frames: Server, Client, files A and B (later C), Block Size (2 Bytes), Block Match, Weak, Strong, EOF, Signature table, Operations.

    Signature table:
    • 12e00cb → strong: 5982e9e5, index: 1
    • 13400d3 → strong: 1c2fb9a8, index: 2
    • 13900cd → strong: ab9e1994, index: 3

    Trace, as far as the frames show: Block Match and EOF start as False, Weak and Strong as 0. Weak checksums 13d00d6 and 406019b are looked up in the signature table and not found (✘). Weak checksums 12e00cb, 13400d3 and 13900cd are each found (✓) and then confirmed against their strong checksums 5982e9e5, 1c2fb9a8 and ab9e1994 (✓); Block Match becomes True and the block indexes 1, 2 and 3 are added to Operations. A final lookup of 13d00d6 fails (✘), Block Match returns to False and EOF becomes True. File C then appears and the operations are consumed one by one.
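The trace above (weak lookup first, strong checksum only on a weak hit, then a list of operations) can be sketched in Go as below. This is an illustration under stated assumptions rather than the demo's code: `entry`, `Op`, `buildTable`, `delta` and `reconstruct` are hypothetical names, MD5 stands in for the strong checksum, the inputs "charla" and "gocharlago" are made up, and for brevity the weak checksum is recomputed for every window instead of being rolled as on the Incremental Rolling Checksum slide.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

const mod = 1 << 16

// entry is one row of the signature table.
type entry struct {
	index  int // 1-based block index, as on the slides
	strong [md5.Size]byte
}

// Op copies block Index of the old file when Index > 0, else emits Literal.
type Op struct {
	Index   int
	Literal []byte
}

func weak(block []byte) uint32 {
	var a, b uint32
	l := uint32(len(block))
	for i, v := range block {
		a += uint32(v)
		b += (l - uint32(i)) * uint32(v)
	}
	return a%mod + mod*(b%mod)
}

// buildTable indexes every full block of old by its weak checksum.
func buildTable(old []byte, size int) map[uint32][]entry {
	table := make(map[uint32][]entry)
	for i := 0; i+size <= len(old); i += size {
		block := old[i : i+size]
		table[weak(block)] = append(table[weak(block)], entry{i/size + 1, md5.Sum(block)})
	}
	return table
}

// delta scans updated one byte at a time and emits block references or literals.
func delta(updated []byte, table map[uint32][]entry, size int) []Op {
	var ops []Op
	var lit []byte
	i := 0
	for i+size <= len(updated) {
		window := updated[i : i+size]
		match := 0
		if chain, ok := table[weak(window)]; ok { // cheap check first
			strong := md5.Sum(window) // expensive check only on a weak hit
			for _, e := range chain {
				if e.strong == strong {
					match = e.index
				}
			}
		}
		if match == 0 {
			lit = append(lit, updated[i]) // unmatched byte travels as a literal
			i++
			continue
		}
		if len(lit) > 0 {
			ops = append(ops, Op{Literal: lit})
			lit = nil
		}
		ops = append(ops, Op{Index: match})
		i += size
	}
	if lit = append(lit, updated[i:]...); len(lit) > 0 {
		ops = append(ops, Op{Literal: lit})
	}
	return ops
}

// reconstruct replays ops against old to rebuild the updated file.
func reconstruct(old []byte, ops []Op, size int) []byte {
	var out []byte
	for _, op := range ops {
		if op.Index > 0 {
			start := (op.Index - 1) * size
			out = append(out, old[start:start+size]...)
		} else {
			out = append(out, op.Literal...)
		}
	}
	return out
}

func main() {
	old, updated := []byte("charla"), []byte("gocharlago")
	ops := delta(updated, buildTable(old, 2), 2)
	fmt.Printf("%d ops, rebuilt %q\n", len(ops), reconstruct(old, ops, 2))
}
```

With these inputs only the two "go" literals would need to be transferred; the three middle blocks are sent as indexes into the file the receiver already holds.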
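  101. Rolling Checksum

        const (
            mod = 1 << 16 // both halves of the checksum are reduced modulo 2^16
        )

        // rollingHash returns the two 16-bit halves (r1, r2) and the combined
        // 32-bit weak checksum r of a block.
        func rollingHash(block []byte) (uint32, uint32, uint32) {
            var a, b uint32
            l := uint32(len(block))
            for index, value := range block {
                a += uint32(value)                       // plain byte sum
                b += (l - uint32(index)) * uint32(value) // position-weighted sum
            }
            r1 := a % mod
            r2 := b % mod
            r := r1 + (mod * r2)
            return r1, r2, r
        }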
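Read as math, `rollingHash` computes the following for a block of bytes. The symbols $x_i$ (byte values), $l$ (block length) and $M = 2^{16}$ are notation introduced here for illustration; they do not appear on the slides.

$$
r_1 = \Big(\sum_{i=0}^{l-1} x_i\Big) \bmod M, \qquad
r_2 = \Big(\sum_{i=0}^{l-1} (l - i)\,x_i\Big) \bmod M, \qquad
r = r_1 + M \cdot r_2
$$

As a worked case, take a 2-byte block with byte values 99 and 104 (the ASCII string "ch", an input chosen here as an example):

$$
r_1 = 99 + 104 = 203 = \mathtt{0xcb}, \qquad
r_2 = 2 \cdot 99 + 1 \cdot 104 = 302 = \mathtt{0x12e}, \qquad
r = 203 + 65536 \cdot 302 = 19792075 = \mathtt{0x12e00cb}
$$

which is the first weak checksum listed in the signature table on the earlier slides.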
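  105. Incremental Rolling Checksum

        // rollingHash2 slides a window of length l one byte to the right:
        // outgoingValue leaves on the left, incomingValue enters on the right.
        func rollingHash2(l, r1, r2, outgoingValue, incomingValue uint32) (uint32, uint32, uint32) {
            r1 = (r1 - outgoingValue + incomingValue) % mod
            r2 = (r2 - (l * outgoingValue) + r1) % mod // uses the updated r1
            r := r1 + (mod * r2)
            return r1, r2, r
        }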
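A quick way to convince yourself that the incremental update is equivalent to recomputing the checksum is to slide a window over some data and compare both results. The sketch below copies the two functions from the slides; `slide` and the sample input are hypothetical additions for illustration.

```go
package main

import "fmt"

const mod = 1 << 16

func rollingHash(block []byte) (uint32, uint32, uint32) {
	var a, b uint32
	l := uint32(len(block))
	for index, value := range block {
		a += uint32(value)
		b += (l - uint32(index)) * uint32(value)
	}
	r1 := a % mod
	r2 := b % mod
	return r1, r2, r1 + (mod * r2)
}

func rollingHash2(l, r1, r2, outgoingValue, incomingValue uint32) (uint32, uint32, uint32) {
	r1 = (r1 - outgoingValue + incomingValue) % mod
	r2 = (r2 - (l * outgoingValue) + r1) % mod
	return r1, r2, r1 + (mod * r2)
}

// slide returns the checksum of every window of the given size. Only the
// first window is hashed from scratch; the rest are rolled in O(1) each.
func slide(data []byte, size int) []uint32 {
	if size <= 0 || len(data) < size {
		return nil
	}
	r1, r2, r := rollingHash(data[:size])
	sums := []uint32{r}
	for i := 0; i+size < len(data); i++ {
		r1, r2, r = rollingHash2(uint32(size), r1, r2, uint32(data[i]), uint32(data[i+size]))
		sums = append(sums, r)
	}
	return sums
}

func main() {
	data := []byte("charla")
	for i, rolled := range slide(data, 2) {
		_, _, recomputed := rollingHash(data[i : i+2])
		fmt.Printf("%q rolled=%x recomputed=%x\n", data[i:i+2], rolled, recomputed)
	}
}
```

The subtractions can wrap around in `uint32`, but that is harmless here: unsigned arithmetic in Go is modulo 2^32, and since 2^16 divides 2^32, reducing by `mod` afterwards still gives the right residue.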
  109. What could be improved?

    • Adaptive block size based on file size
    • Whole-file checksum generated by the client and verified at the server
    • Pipelining to sync multiple files in parallel over the same network connection
    • Block compression
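For the first item, one possible heuristic is to let the block size grow with the square root of the file size, so that the number of signatures and the size of each block stay balanced. The sketch below is only an illustration of that idea: `adaptiveBlockSize` and both bounds are assumptions made here, not values from the talk.

```go
package main

import (
	"fmt"
	"math"
)

// Assumed bounds for this sketch; neither value comes from the talk.
const (
	minBlockSize = 512
	maxBlockSize = 128 * 1024
)

// adaptiveBlockSize picks a block size close to sqrt(fileSize), rounded down
// to a multiple of 8 and clamped to [minBlockSize, maxBlockSize].
func adaptiveBlockSize(fileSize int64) int {
	if fileSize <= 0 {
		return minBlockSize
	}
	size := int(math.Sqrt(float64(fileSize)))
	size &^= 7 // clear the low three bits: round down to a multiple of 8
	if size < minBlockSize {
		return minBlockSize
	}
	if size > maxBlockSize {
		return maxBlockSize
	}
	return size
}

func main() {
	for _, n := range []int64{4 << 10, 1 << 20, 1 << 30} {
		fmt.Printf("%d bytes -> %d-byte blocks\n", n, adaptiveBlockSize(n))
	}
}
```

With a fixed block size the signature list grows linearly with the file; with a square-root rule both the signature count and the block size grow as the square root of the file size.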
  110. Applications

    • Online data deduplication
    • File forensics
    • Incremental backup systems
    • Efficient software updates …
  111. Acknowledgments

    Thanks for all the invaluable feedback on early drafts to:
    • Catherine Lopez - @lopezcatherine
    • Elizabeth Ramírez - @eramirem
    • David Castillo - @castillobgr
    • Andrew Turley - @casio_juarez
    • Cameron Yick - @hydrosquall
    • Juan P. Osorio - @jpoo90
  112. Acknowledgments

    Thanks to Andrew Tridgell for being so kind with his time and answering all my questions about his PhD thesis.
  113. References

    • https://www.samba.org/~tridge/phd_thesis.pdf
    • https://www.andrew.cmu.edu/course/15-749/READINGS/required/cas/tridgell96.pdf
    • https://speakerdeck.com/ceejbot/hash-functions-and-you
    • https://en.wikipedia.org/wiki/Rabin_fingerprint
    • https://en.wikipedia.org/wiki/Rolling_hash
    • http://olstrans.sourceforge.net/release/OLS2000-rsync/OLS2000-rsync.html
    • http://preshing.com/20110504/hash-collision-probabilities/
  114. Demo

    • Written in Go
    • Using gRPC Streaming
    • LZ4 Compression
    • Lookup table uses dynamic arrays for chaining
    • Pipelining possible by using more gRPC streams on the same TCP connection, one per file
    • Block size: 6 KB
    • Strong checksum algorithm: xxhash