Distributed ID Generation

Nathan Kleyn
February 25, 2015

The theory and code behind generating distributed unique IDs using Redis.


Transcript

  1. Distributed ID Generation @nathankleyn

  2. or, how to make your ID generation less like this:

  3. (no text on this slide)
  4. and more like this:

  5. (no text on this slide)
  6. Requirements

  7. 3 main requirements

  8. 1 an ID for every event

  9. 2 IDs must be unique

  10. 3 it must scale well

  11. Theory

  12. Time

  13. Time is hard. Really hard. (Diamond. Hard. See what I did there?)
  14. Time oracles could solve this problem.

  15. (No, not that oracle)

  16. Time oracles could solve this problem.

  17. NTP tries to solve this problem for the “rest of us”.
  18. (No, not that NTP?)

  19. NTP tries to solve this problem for the “rest of us”.
  20. However, expect ±10ms. (at least)

  21. That’s ±10ms per machine.

  22. Hey Bob, what’s the time? It’s 1970! Back in my day... Fucking Bob always thinks it’s 1970. It’s clearly 1980. Look at my hair! No, Karen. It’s 1990. I’m not wearing this antiquated Disney shirt for kicks.
  23. Time can move backwards or forwards.

  24. 1s ≠ 1000ms

  25. (no text on this slide)
  26. Representing Numbers In Binary

  27. Java has many numeric data types.

  28. However it has no unsigned variants.* (*This is no longer strictly true in Java 8, but it’s a trick: the types are still signed, and there are now functions that perform unsigned operations on them, eg. Integer.divideUnsigned.)
  29. So in Java the MSB is for the sign: 1 when negative and 0 when positive. Java uses two’s complement, so the 16-bit short 1111000000000000 is -4096.
  30. Some languages make it hard to use the sign bit.

  31. A Java long is 64 bits. One is the sign bit, so we have 63 usable bits.
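
The points about the sign bit can be checked directly in Java. The following is a minimal sketch, not code from the talk; the class name `SignBitDemo` and the helper `bits16` are made up for illustration. It prints the 16-bit pattern of a negative and a positive short, confirms that the largest long is 2^63 - 1, and shows two of the Java 8 unsigned helpers on `Long`.

```java
// Sketch only: SignBitDemo and bits16 are illustrative names, not from the talk.
final class SignBitDemo {
    // Returns the 16-bit two's complement pattern of a short, left-padded with zeros.
    static String bits16(short value) {
        String s = Integer.toBinaryString(value & 0xFFFF);
        return String.format("%16s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        // The most significant bit is 1 for a negative value, 0 for a positive one.
        System.out.println(bits16((short) -4096));
        System.out.println(bits16((short) 4096));

        // 63 usable bits: the largest long is 2^63 - 1.
        System.out.println(Long.MAX_VALUE == (1L << 63) - 1);

        // Java 8 helpers reinterpret the same 64 bits as an unsigned value.
        System.out.println(Long.toUnsignedString(-1L));
        System.out.println(Long.divideUnsigned(-1L, 2));
    }
}
```

Keeping the sign bit at 0 means every generated ID is a positive long, which avoids surprises in languages or stores that have no unsigned 64-bit type.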
  32. The Epoch

  33. An epoch is a fixed reference point from which time is measured.

  34. Unix time is measured relative to the epoch 00:00:00.000 UTC on 1/1/1970.

  35. We can define our own epoch.

  36. So with our own epoch starting now, in 1 year our time will be 31,536,000,000ms. Unix time will be ~1,456,348,092,000ms.
  37. Why is this useful? Because it allows us to compress the storage of time.
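
To make the compression concrete, here is a small sketch that compares how many bits a millisecond timestamp needs under Unix time and under a custom epoch. The epoch value (midnight UTC on the day of the talk) and the names `CustomEpochDemo`, `sinceCustomEpoch` and `bitsNeeded` are assumptions chosen for illustration, not values from the talk.

```java
import java.time.Instant;

// Sketch only: the custom epoch below is an assumed value for illustration.
final class CustomEpochDemo {
    // Assumption: the custom epoch is midnight UTC on 2015-02-25.
    static final long CUSTOM_EPOCH_MS = Instant.parse("2015-02-25T00:00:00Z").toEpochMilli();

    // Converts Unix milliseconds to milliseconds since the custom epoch.
    static long sinceCustomEpoch(long unixMillis) {
        return unixMillis - CUSTOM_EPOCH_MS;
    }

    // Number of bits needed to represent a non-negative value.
    static int bitsNeeded(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        // One year after the assumed custom epoch.
        long unixMs = Instant.parse("2016-02-25T00:00:00Z").toEpochMilli();
        long customMs = sinceCustomEpoch(unixMs);
        System.out.println("Unix ms:   " + unixMs + " (" + bitsNeeded(unixMs) + " bits)");
        System.out.println("Custom ms: " + customMs + " (" + bitsNeeded(customMs) + " bits)");
    }
}
```

Under these assumptions the Unix value already needs 41 bits one year in, while the custom-epoch value needs 35, so a fixed-width timestamp field lasts longer when it counts from a recent epoch.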
  38. Redis

  39. Redis is awesome, fast and stable.

  40. It supports scripting via Lua.

  41. We can create a Lua script to make an ID inside Redis.
  42. Redis is not distributed.* (*Redis clustering will arrive in v3.)

  43. We can round-robin between a bunch of Redis servers to achieve distribution.
  44. k-sorting

  45. k-sorting = “roughly sorting”

  46. IDs should provide only k-sorting guarantees.

  47. How It’s Done

  48. Format

  49. Our 64-bit IDs look like this:

  50. A BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB CCCCCCCCCC DDDDDDDDDDDD

  51. A is the reserved sign bit of a Java long (1 bit).

  52. B is the timestamp in milliseconds since the custom epoch (41 bits).

  53. C is the logical shard ID (10 bits).

  54. D is the sequence (12 bits).
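
The layout above can be packed and unpacked with shifts and masks. This is a minimal sketch of that bit arithmetic in Java; the class `IdLayout` and its method names are hypothetical, and the talk itself builds the ID in a Lua script inside Redis rather than in Java.

```java
// Sketch only: IdLayout and its method names are illustrative, not from the talk.
final class IdLayout {
    static final int TIMESTAMP_BITS = 41;
    static final int SHARD_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    static final long MAX_TIMESTAMP = (1L << TIMESTAMP_BITS) - 1;
    static final long MAX_SHARD = (1L << SHARD_BITS) - 1;       // 1023
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095

    // Packs the three fields into one long; the sign bit stays 0.
    static long compose(long timestampMs, long shard, long sequence) {
        if (timestampMs < 0 || timestampMs > MAX_TIMESTAMP) {
            throw new IllegalArgumentException("timestamp does not fit in 41 bits");
        }
        if (shard < 0 || shard > MAX_SHARD) {
            throw new IllegalArgumentException("shard does not fit in 10 bits");
        }
        if (sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("sequence does not fit in 12 bits");
        }
        return (timestampMs << (SHARD_BITS + SEQUENCE_BITS))
                | (shard << SEQUENCE_BITS)
                | sequence;
    }

    static long timestampOf(long id) {
        return id >>> (SHARD_BITS + SEQUENCE_BITS);
    }

    static long shardOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_SHARD;
    }

    static long sequenceOf(long id) {
        return id & MAX_SEQUENCE;
    }

    public static void main(String[] args) {
        long id = compose(31_536_000_000L, 5, 7);
        System.out.println(id + " -> timestamp=" + timestampOf(id)
                + " shard=" + shardOf(id) + " sequence=" + sequenceOf(id));
    }
}
```

Because the fields sum to 63 bits, even the largest values of all three fields leave the sign bit clear.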

  55. The Timestamp

  56. Represented in 41 bits using a custom epoch.

  57. This allows ~69 years of continuous ID generation.

  58. Note this is the first part of the ID, so it has the most bearing on sorting.

  59. Sorting IDs sorts strictly by timestamp, and only roughly by the remainder of the ID (ie. k-sorted).
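
Both timestamp claims can be checked with a few lines of Java. The sketch below is illustrative only (the class `TimestampSortDemo` and its helpers are made-up names): it computes how many whole 365-day years fit in 41 bits of milliseconds, and sorts a handful of IDs from different shards to show that plain numeric sorting orders them by timestamp first.

```java
import java.util.Arrays;

// Sketch only: names are illustrative; the layout is 41-bit time, 10-bit shard, 12-bit sequence.
final class TimestampSortDemo {
    static long id(long timestampMs, long shard, long sequence) {
        return (timestampMs << 22) | (shard << 12) | sequence;
    }

    // Whole 365-day years that fit in a 41-bit millisecond counter.
    static long yearsCovered() {
        long millisPerYear = 365L * 24 * 60 * 60 * 1000;
        return (1L << 41) / millisPerYear;
    }

    public static void main(String[] args) {
        System.out.println("Years covered: " + yearsCovered());

        long[] ids = {
            id(1002, 1, 0), id(1000, 3, 1), id(1001, 2, 0), id(1000, 3, 0), id(1000, 1, 0)
        };
        Arrays.sort(ids);
        for (long value : ids) {
            System.out.println("timestamp=" + (value >>> 22)
                    + " shard=" + ((value >>> 12) & 1023)
                    + " sequence=" + (value & 4095));
        }
    }
}
```

Within a single millisecond the order falls back to shard ID and then sequence, which says nothing about which event really happened first. Combined with clock skew between machines, that is why the ordering is only rough.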
  60. Logical Shard ID

  61. We want to be able to have many Redis servers.

  62. We allow 10 bits for this ID, so we can have up to 1024 ID generation machines.

  63. We give a fixed ID to each Redis server and it stamps its IDs with this ever after.
  64. The Sequence

  65. What happens if you ask the same Redis server to generate multiple IDs in a millisecond?
  66. The sequence ensures IDs are never duplicated when this happens.

  67. We rotate a 12-bit number.

  68. We roll back to 0 when it reaches 4095.

  69. That means a maximum of 4096 IDs per node per millisecond.

  70. If the sequence rolls over twice in the same millisecond, we block until the time changes.
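
The sequence rules can be modelled outside Redis to see how they interact. The following is a rough Java sketch, not the talk's Lua script: the class `SequenceIdGenerator`, the injected clock, and the choice to count IDs issued per millisecond (rather than count rollovers) are all assumptions made for illustration. The counter rotates through 0 to 4095, and once 4096 IDs have been issued in one millisecond the generator waits for the clock to advance.

```java
import java.util.function.LongSupplier;

// Sketch only: a single-node model of the rotating 12-bit sequence; names are illustrative.
final class SequenceIdGenerator {
    private static final long MAX_SEQUENCE = 4095;
    private static final int IDS_PER_MS = 4096;

    private final long shardId;
    private final LongSupplier clock; // milliseconds since the custom epoch
    private long lastTimestamp = -1;
    private long sequence = 0;
    private int issuedThisMs = 0;

    SequenceIdGenerator(long shardId, LongSupplier clock) {
        if (shardId < 0 || shardId > 1023) {
            throw new IllegalArgumentException("shard ID must fit in 10 bits");
        }
        this.shardId = shardId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            // Time can move backwards; refuse rather than risk a duplicate.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now != lastTimestamp) {
            issuedThisMs = 0;
        } else if (issuedThisMs == IDS_PER_MS) {
            // All 4096 sequence values are used for this millisecond: wait for the next one.
            while ((now = clock.getAsLong()) <= lastTimestamp) {
                Thread.yield();
            }
            issuedThisMs = 0;
        }
        lastTimestamp = now;
        issuedThisMs++;
        long current = sequence;
        sequence = (sequence + 1) & MAX_SEQUENCE; // rotate 0..4095
        return (now << 22) | (shardId << 12) | current;
    }

    public static void main(String[] args) {
        long epoch = 1424822400000L; // assumed custom epoch: 2015-02-25T00:00:00Z
        SequenceIdGenerator generator =
                new SequenceIdGenerator(1, () -> System.currentTimeMillis() - epoch);
        for (int i = 0; i < 3; i++) {
            System.out.println(generator.nextId());
        }
    }
}
```

Injecting the clock keeps the sketch testable, since a fake clock can hold time still while thousands of IDs are requested.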
  71. Distributing The Load

  72. Simple round-robin between the Redis servers.

  73. Retry 5 times before failing.
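
A client-side view of the round-robin and retry rule might look like the sketch below. It is illustrative only: `RoundRobinIdClient` is a hypothetical name, each `LongSupplier` stands in for a call to one Redis server, and "retry 5 times" is read here as at most 5 attempts in total, which is an assumption about the talk's meaning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Sketch only: each LongSupplier stands in for a request to one Redis server.
final class RoundRobinIdClient {
    private static final int MAX_ATTEMPTS = 5; // assumption: 5 attempts in total

    private final List<LongSupplier> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinIdClient(List<LongSupplier> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    long nextId() {
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            // Each attempt moves on to the next node, so a failed node is skipped on retry.
            int index = Math.floorMod(next.getAndIncrement(), nodes.size());
            try {
                return nodes.get(index).getAsLong();
            } catch (RuntimeException e) {
                lastFailure = e;
            }
        }
        throw new IllegalStateException(
                "ID generation failed after " + MAX_ATTEMPTS + " attempts", lastFailure);
    }

    public static void main(String[] args) {
        List<LongSupplier> nodes = new ArrayList<>();
        nodes.add(() -> { throw new RuntimeException("node 0 is down"); });
        nodes.add(() -> 42L);
        RoundRobinIdClient client = new RoundRobinIdClient(nodes);
        System.out.println(client.nextId()); // node 0 fails, node 1 answers
    }
}
```

Because every node stamps its own shard ID into the IDs it produces, the client can fall through to any other node without risking a duplicate.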

  74. Fin. Questions?