Ctrie Data Structure

The description of the Ctrie data structure from PPoPP 2012.

Aleksandar Prokopec

February 28, 2012

Transcript

  1. Concurrent Tries with Efficient Non-blocking Snapshots

    Aleksandar Prokopec, Phil Bagwell, Martin Odersky (École Polytechnique Fédérale de Lausanne), Nathan Bronson (Stanford)
  2. Motivation

    val numbers = getNumbers()
    // compute square roots
    numbers foreach { entry =>
      val x = entry.root
      val n = entry.number
      entry.root = 0.5 * (x + n / x)
      if (abs(entry.root - x) < eps) numbers.remove(entry)
    }
  3. Immutable HAMT

    [Figure: hash trie before and after insert(11)] • updates rewrite path from root to leaf
  4. Immutable HAMT

    [Figure: hash trie before and after insert(11)] • updates rewrite path from root to leaf • efficient updates - O(log_k n)
  5. Node compression

    [Figure: sparse child array (entries 48 and 57) compressed with a bitmap] BITPOP(((1 << ((hc >> lev) & 0x1F)) - 1) & BMP)
  6. Node compression

    [Figure: sparse child array (entries 48 and 57) compressed with a bitmap]
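The BITPOP expression on the node compression slides can be sketched in Scala as follows. This is a minimal illustration, not the authors' code: the object name `BitmapIndex` and the method `position` are hypothetical, and a 32-way branching factor (5 hash bits per level) is assumed from the `0x1F` mask.

```scala
object BitmapIndex {
  /** Position in the compressed child array for hashcode `hc` at bit offset
    * `lev`, given the node bitmap `bmp`; None if that branch is absent.
    * (Hypothetical helper; assumes 5 hash bits per level.) */
  def position(hc: Int, lev: Int, bmp: Int): Option[Int] = {
    val idx  = (hc >>> lev) & 0x1F   // which of the 32 logical slots
    val flag = 1 << idx              // the bitmap bit for that slot
    if ((bmp & flag) == 0) None
    else Some(Integer.bitCount(bmp & (flag - 1))) // count set bits below the flag
  }
}
```

The array only stores children that exist, so the count of set bits below the flag gives the child's index in the dense array.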
  7. Ctrie insert

    [Figure: Ctrie before the insert] 17 = 010001₂
  8. Ctrie insert

    [Figure: new node (16, 17) allocated] 17 = 010001₂ • 1) allocate
  9. Ctrie insert

    [Figure: new node (16, 17) linked into the Ctrie] 17 = 010001₂ • 2) CAS
  10. Ctrie insert

    [Figure: Ctrie after the insert] 17 = 010001₂
  11. Ctrie insert

    [Figure: Ctrie with leaf nodes (16, 17) and (20, 25)] 18 = 010010₂
  12. Ctrie insert

    [Figure: new node (16, 17, 18) allocated] 18 = 010010₂ • 1) allocate
  13. Ctrie insert

    [Figure: new node (16, 17, 18) linked into the Ctrie] 18 = 010010₂ • 2) CAS
  14. Ctrie insert

    [Figure: new node (16, 17, 18) linked into the Ctrie] 18 = 010010₂ • 2) CAS • Unless…
  15. Ctrie insert

    [Figure: threads T1 and T2 insert concurrently] T1: 18 = 010010₂ • T2: 28 = 011100₂ • T1-1) allocate (16, 17, 18) • Unless…
  16. Ctrie insert

    [Figure: threads T1 and T2 insert concurrently] T1: 18 = 010010₂ • T2: 28 = 011100₂ • T1-1) allocate (16, 17, 18) • T2-1) allocate (20, 25, 28) • Unless…
  17. Ctrie insert

    [Figure: threads T1 and T2 insert concurrently] T1: 18 = 010010₂ • T2: 28 = 011100₂ • T1-1) allocate (16, 17, 18) • T2-2) CAS (20, 25, 28)
  18. Ctrie insert

    [Figure: threads T1 and T2 insert concurrently] T1: 18 = 010010₂ • T2: 28 = 011100₂ • T1-2) CAS (16, 17, 18) • T2-2) CAS (20, 25, 28)
  19. Ctrie insert

    [Figure: Ctrie after both CAS operations] T1: 18 = 010010₂ • T2: 28 = 011100₂ • Lost insert!
  20. Ctrie insert – 2nd attempt

    [Figure: Ctrie with I-nodes above the nodes (16, 17) and (20, 25)] Solution: I-nodes
  21. Ctrie insert – 2nd attempt

    [Figure: Ctrie with I-nodes] T1: 18 = 010010₂ • T2: 28 = 011100₂
  22. Ctrie insert – 2nd attempt

    [Figure: Ctrie with I-nodes] T1: 18 = 010010₂ • T2: 28 = 011100₂ • T1-1) allocate (16, 17, 18) • T2-1) allocate (20, 25, 28)
  23. Ctrie insert – 2nd attempt

    [Figure: Ctrie with I-nodes] T1-2) CAS (16, 17, 18) • T2-2) CAS (20, 25, 28)
  24. Ctrie insert – 2nd attempt

    [Figure: Ctrie containing both (16, 17, 18) and (20, 25, 28)] Idea: once added to the Ctrie, I-nodes remain present.
  25. Ctrie insert – 2nd attempt

    [Figure: Ctrie containing both (16, 17, 18) and (20, 25, 28)] Remove operation supported as well - details in the paper.
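The allocate-then-CAS pattern on an I-node can be sketched as below. This is a deliberately simplified, single-level illustration rather than the Ctrie algorithm from the paper: `INodeSketch`, `CNode`, and `INode.insert` are hypothetical names, and an immutable `Set[Int]` stands in for a real C-node. The point it shows is that the I-node itself never moves, so a thread whose CAS fails simply re-reads and retries instead of losing its insert.

```scala
import java.util.concurrent.atomic.AtomicReference

object INodeSketch {
  // Immutable stand-in for a C-node (assumption: a plain set of keys).
  final case class CNode(keys: Set[Int])

  // The I-node stays in place; only the C-node it points to is replaced.
  final class INode(initial: CNode) {
    private val main = new AtomicReference[CNode](initial)

    def read: CNode = main.get

    @annotation.tailrec
    final def insert(k: Int): Unit = {
      val old     = main.get
      val updated = CNode(old.keys + k)        // 1) allocate an updated copy
      if (!main.compareAndSet(old, updated))   // 2) CAS the I-node's pointer
        insert(k)                              // another thread won: retry
    }
  }
}
```

Because every update goes through the same atomic reference, two threads inserting 18 and 28 concurrently cannot overwrite each other's result.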
  26. Ctrie size

    [Figure: Ctrie traversal] size = 0
  27. Ctrie size

    [Figure: Ctrie traversal] size = 0
  28. Ctrie size

    [Figure: Ctrie traversal] size = 0
  29. Ctrie size

    [Figure: Ctrie traversal] size = 0
  30. Ctrie size

    [Figure: Ctrie traversal] size = 1
  31. Ctrie size

    [Figure: Ctrie traversal] size = 2
  32. Ctrie size

    [Figure: Ctrie traversal] size = 3
  33. Ctrie size

    [Figure: Ctrie traversal] size = 5
  34. Ctrie size

    [Figure: Ctrie traversal] size = 5 • actual size = 12
  35. Ctrie size

    [Figure: Ctrie traversal; new node (0, 1) allocated] size = 5 • actual size = 12
  36. Ctrie size

    [Figure: Ctrie traversal; new node (0, 1) linked in] size = 5 • CAS • actual size = 11
  37. Ctrie size

    [Figure: Ctrie traversal] size = 5 • actual size = 11
  38. Ctrie size

    [Figure: Ctrie traversal] size = 6 • actual size = 11
  39. Ctrie size

    [Figure: Ctrie traversal; concurrent insert of 19] size = 6 • actual size = 11
  40. Ctrie size

    [Figure: Ctrie traversal; new node (16, 17, 18, 19) allocated] size = 6 • actual size = 11
  41. Ctrie size

    [Figure: Ctrie traversal; new node (16, 17, 18, 19) linked in] size = 6 • CAS • actual size = 12
  42. Ctrie size

    [Figure: Ctrie traversal] size = 6 • actual size = 12
  43. Ctrie size

    [Figure: Ctrie traversal] size = 6 • actual size = 12
  44. Ctrie size

    [Figure: Ctrie traversal] size = 7 • actual size = 9
  45. Ctrie size

    [Figure: Ctrie traversal] size = 8 • actual size = 12
  46. Ctrie size

    [Figure: Ctrie traversal] size = 9 • actual size = 12
  47. Ctrie size

    [Figure: Ctrie traversal] size = 10 • actual size = 12
  48. Ctrie size

    [Figure: Ctrie traversal] size = 11 • actual size = 12
  49. Ctrie size

    [Figure: Ctrie traversal] size = 12 • actual size = 12
  50. Ctrie size

    [Figure: Ctrie traversal] size = 13 • actual size = 12
  51. Ctrie size

    [Figure: Ctrie traversal] size = 13 • actual size = 12 • But the size was never 13!
  52. Global state information

    [Figure: Ctrie] • size • find • filter • iterator
  53. Global state information

    [Figure: Ctrie] • size • find • filter • iterator ⇒ snapshot
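As a small illustration of why a snapshot helps with global-state operations, the sketch below uses `scala.collection.concurrent.TrieMap`, the Ctrie-based map in the Scala standard library. The object name `SnapshotDemo` and the sample keys are made up for the example; only `TrieMap`, `snapshot()`, `put`, `remove`, and `keys` are library calls.

```scala
import scala.collection.concurrent.TrieMap

object SnapshotDemo {
  /** Returns (keys seen through the snapshot, keys in the live map). */
  def run(): (List[Int], List[Int]) = {
    val live = TrieMap(4 -> "a", 9 -> "b", 12 -> "c")
    val snap = live.snapshot()   // independent view of the current state

    // Later updates to the live map do not show up in the snapshot.
    live.remove(4)
    live.put(19, "d")
    live.put(28, "e")

    (snap.keys.toList.sorted, live.keys.toList.sorted)
  }
}
```

Operations such as size, find, filter, or iteration can then run against `snap` and observe one consistent state, regardless of concurrent updates to `live`.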
  54. Snapshot using locks

    [Figure: Ctrie] • copy expensive
  55. Snapshot using locks

    [Figure: Ctrie] • copy expensive • not lock-free
  56. Snapshot using locks

    [Figure: Ctrie; new node (0, 1, 2) being linked in with a CAS] • copy expensive • not lock-free • can insert or remove remain lock-free?
  57. Snapshot using locks

    [Figure: Ctrie; new node (0, 1, 2) being linked in with a CAS] • copy expensive • not lock-free • can insert or remove remain lock-free?
  58. Snapshot using logs

    [Figure: Ctrie] • keep a linked list of previous values in each I-node
  59. Snapshot using logs

    [Figure: Ctrie; I-node keeps the old node next to the new node (0, 1, 2)] • keep a linked list of previous values in each I-node
  60. Snapshot using logs

    [Figure: Ctrie; I-node keeps the old node next to the new node (0, 1, 2)] • keep a linked list of previous values in each I-node • when is it safe to delete old entries?
  61. Snapshot using immutability

    [Figure: Ctrie whose I-nodes all carry generation #1; root pointer]
  62. Snapshot using immutability

    [Figure: Ctrie whose I-nodes all carry generation #1; root pointer] snapshot!
  63. Snapshot using immutability

    [Figure: new root I-node with generation #2] snapshot! • 1) create new I-node at #2
  64. Snapshot using immutability

    [Figure: snapshot pointer refers to the #1 root] snapshot! • 2) set snapshot
  65. Snapshot using immutability

    [Figure: root pointer refers to the #2 I-node, snapshot to the #1 I-node] snapshot! • 3) CAS root to new I-node
  66. Snapshot using immutability

    [Figure: root at #2, snapshot at #1] subsequent insert of 2
  67. Snapshot using immutability

    [Figure: root at #2, snapshot at #1] subsequent insert of 2 • generation #2 - ok!
  68. Snapshot using immutability

    [Figure: root at #2, snapshot at #1] subsequent insert of 2 • generation #1 not ok, too old!
  69. Snapshot using immutability

    [Figure: root at #2, snapshot at #1] subsequent insert of 2 • 1) create updated node at #2
  70. Snapshot using immutability

    [Figure: root at #2, snapshot at #1] subsequent insert of 2 • 2) CAS to the updated node
  71. Snapshot using immutability

    [Figure: root at #2, snapshot at #1] subsequent insert of 2 • #1 too old!
  72. Snapshot using immutability

    [Figure: copy of node (4, 9, 12) under a #2 I-node] subsequent insert of 2 • 1) create updated node at #2
  73. Snapshot using immutability

    [Figure: copy of node (4, 9, 12) under a #2 I-node] subsequent insert of 2 • 2) CAS
  74. Snapshot using immutability

    [Figure: new leaf (0, 1, 2) under the #2 path] subsequent insert of 2 • finally, create a new leaf and CAS
  75. Snapshot using immutability

    [Figure: root at #2 with leaf (0, 1, 2), snapshot at #1] another insert of 3
  76. Snapshot using immutability

    [Figure: leaf (0, 1, 2) replaced by (0, 1, 2, 3)] another insert of 3
  77. Snapshot using immutability

    [Figure: leaf (0, 1, 2) replaced by (0, 1, 2, 3)] But... this won't really work... why?
  78. Snapshot using immutability

    [Figure: T2 prepares node (16, 17, 18) under a #1 I-node] T2: remove 19
  79. Snapshot using immutability

    [Figure: T2 prepares node (16, 17, 18) under a #1 I-node] T2: remove 19 • CAS
  80. Snapshot using immutability

    [Figure: T2 prepares node (16, 17, 18) under a #1 I-node] T2: remove 19 • CAS • How to fail this last CAS?
  81. Snapshot using immutability

    [Figure: T2 prepares node (16, 17, 18) under a #1 I-node] T2: remove 19 • How to fail this last CAS? • DCAS
  82. Snapshot using immutability

    [Figure: T2 prepares node (16, 17, 18) under a #1 I-node] T2: remove 19 • How to fail this last CAS? • DCAS - software based
  83. Snapshot using immutability

    [Figure: T2 prepares node (16, 17, 18) under a #1 I-node] T2: remove 19 • How to fail this last CAS? • DCAS - software based ...creates intermediate objects
  84. GCAS - generation-compare-and-swap

    [Figure: new node (16, 17, 18) with a prev field] T2: remove 19 • 1) set prev field
  85. GCAS - generation-compare-and-swap

    [Figure: new node (16, 17, 18) with a prev field] T2: remove 19 • 2) CAS
  86. GCAS - generation-compare-and-swap

    [Figure: new node (16, 17, 18) with a prev field] T2: remove 19 • 3) read root generation
  87. GCAS - generation-compare-and-swap

    [Figure: prev field of node (16, 17, 18) points to a FailedNode (FN)] 4) if root generation changed CAS prev to FailedNode(prev)
  88. GCAS - generation-compare-and-swap

    [Figure: prev field of node (16, 17, 18) points to a FailedNode (FN)] 4) if root generation changed CAS prev to FailedNode(prev)
  89. GCAS - generation-compare-and-swap

    [Figure: prev field of node (16, 17, 18) points to a FailedNode (FN)] 5) CAS to previous value
  90. GCAS - generation-compare-and-swap

    [Figure: node (16, 17, 18) with a prev field] 4) if root generation unchanged CAS prev to null
  91. GCAS - generation-compare-and-swap

    [Figure: node (16, 17, 18) with prev cleared] 4) if root generation unchanged CAS prev to null
  92. GCAS - generation-compare-and-swap

    [Figure: Ctrie with root at #2 and snapshot at #1] 1) Replace all CAS with GCAS 2) Replace all READ with GCAS_READ (which checks if prev field is null)
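The GCAS steps listed on these slides can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the implementation from the paper or the Scala library: all names (`GcasSketch`, `MainNode`, `FailedNode`, `commit`, `gcas`, `gcasRead`) are hypothetical, a `Set[Int]` stands in for the node contents, and details such as read-only snapshots and aborting root reads are left out.

```scala
import java.util.concurrent.atomic.AtomicReference

object GcasSketch {
  final class Gen                        // object identity marks a generation

  // A prev field holds either the old main node or a FailedNode wrapping it.
  sealed trait Prev
  final case class FailedNode(p: MainNode) extends Prev

  // Stand-in for a C-node; prev is non-null only while a GCAS is in flight.
  final class MainNode(val keys: Set[Int]) extends Prev {
    val prev = new AtomicReference[Prev](null)
  }

  final class INode(initial: MainNode, val gen: Gen) {
    val main = new AtomicReference[MainNode](initial)
  }

  final class Trie(initial: INode) {
    val root = new AtomicReference[INode](initial)

    // Completes (or rolls back) a pending GCAS on `in`, whoever started it.
    @annotation.tailrec
    private def commit(in: INode, m: MainNode): MainNode =
      m.prev.get match {
        case null => m                                   // already committed
        case fn: FailedNode =>                           // 5) CAS to previous value
          if (in.main.compareAndSet(m, fn.p)) fn.p
          else commit(in, in.main.get)
        case p: MainNode =>
          if (root.get.gen eq in.gen) {                  // 3) read root generation
            // 4) generation unchanged: CAS prev to null
            if (m.prev.compareAndSet(p, null)) m else commit(in, m)
          } else {
            // 4) generation changed: CAS prev to FailedNode(prev)
            m.prev.compareAndSet(p, FailedNode(p))
            commit(in, in.main.get)
          }
      }

    // GCAS_READ: never returns a main node whose GCAS is still undecided.
    def gcasRead(in: INode): MainNode = {
      val m = in.main.get
      if (m.prev.get == null) m else commit(in, m)
    }

    def gcas(in: INode, old: MainNode, n: MainNode): Boolean = {
      n.prev.set(old)                                    // 1) set prev field
      if (in.main.compareAndSet(old, n)) {               // 2) CAS
        commit(in, n)
        n.prev.get == null                               // true only if committed
      } else false
    }
  }
}
```

In this sketch a GCAS on an I-node of the current generation behaves like a plain CAS, while a GCAS that races with a root swap is undone through the `FailedNode` marker, so readers going through `gcasRead` never observe the aborted write.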
  93. Snapshot-based size

    def size = {
      var sz = 0
      val it = iterator
      while (it.hasNext) { it.next(); sz += 1 }
      sz
    }
  94. Snapshot-based size

    def size = {
      var sz = 0
      val it = iterator
      while (it.hasNext) { it.next(); sz += 1 }
      sz
    }

    Above is O(n). But, by caching size in nodes - amortized O(log_k n)! (see source code)
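A runnable variant of the counting loop, written against the standard library's `scala.collection.concurrent.TrieMap`, might look like this. The object name `SnapshotSize` is hypothetical, and the sketch only shows the O(n) traversal over a snapshot; it does not implement the size caching mentioned on the slide.

```scala
import scala.collection.concurrent.TrieMap

object SnapshotSize {
  /** Counts entries by walking an iterator over a read-only snapshot,
    * so concurrent updates to `m` cannot skew the count. */
  def size[K, V](m: TrieMap[K, V]): Int = {
    var sz = 0
    val it = m.readOnlySnapshot().iterator
    while (it.hasNext) {
      it.next()   // advance the iterator; the element itself is not needed
      sz += 1
    }
    sz
  }
}
```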
  95. Snapshot-based atomic clear

    def clear(): Unit = {
      val or = READ(root)
      val nr = new INode(new Gen)
      if (!CAS(root, or, nr)) clear()
    } (roughly)
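The same idea can be made runnable with an `AtomicReference` standing in for the root pointer. The names below (`ClearSketch`, `Trie`, `add`) are hypothetical, and the `INode` here carries a plain `Set[Int]` instead of a real trie, so this is only a sketch of the root-swap pattern.

```scala
import java.util.concurrent.atomic.AtomicReference

object ClearSketch {
  final class Gen                                   // fresh generation marker
  final class INode(val keys: Set[Int], val gen: Gen)

  final class Trie(initial: INode) {
    private val root = new AtomicReference[INode](initial)

    def keys: Set[Int] = root.get.keys

    @annotation.tailrec
    final def add(k: Int): Unit = {
      val or = root.get
      if (!root.compareAndSet(or, new INode(or.keys + k, or.gen))) add(k)
    }

    // Atomically replaces the whole contents with an empty root I-node.
    @annotation.tailrec
    final def clear(): Unit = {
      val or = root.get                             // READ(root)
      val nr = new INode(Set.empty, new Gen)        // empty I-node, new generation
      if (!root.compareAndSet(or, nr)) clear()      // retry if the root changed
    }
  }
}
```

A single successful CAS on the root makes every previous entry unreachable at once, which is what makes the clear atomic.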
  96. Conclusion

    • snapshots are linearizable and lock-free • snapshots take constant time • snapshots are horizontally scalable • snapshots add no significant overhead to the algorithm if they aren't used • the approach may be applicable to tree-based lock-free data structures in general (intuition)