
Data Structures and Performance

felipe
November 20, 2019


Transcript

  1. Why
     ➔ Programmers should know some data structures
     ➔ Sometimes we need to think about performance
     ➔ Data structures are interesting
  2. 3 "Programmers waste enormous amounts of time thinking about, or

    worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." Donald Knuth Problems with early optimization
  3. Rob Pike's 5 Rules of Programming (slightly edited)
     Rule 1. You can't tell where a program is going to spend its time. Don't try a speed hack until you've proven that's where the bottleneck is.
     Rule 2. Measure. Don't tune for speed until you've measured.
     Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Only use fancy algorithms when you are sure n is big.
  4. Rob Pike's 5 Rules of Programming (slightly edited)
     Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
     Rule 5. Data structures, not algorithms, are central to programming. Algorithms are self-evident when you choose the right data structure.
  5. Linked List
     Pros:
     ➔ Insert in constant time
     ➔ No need to know the size beforehand
     Cons:
     ➔ Costly random access
     ➔ Requires more space than arrays
     Used in: implementations of stacks and queues, and the buckets of hash maps.
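As a concrete illustration of the constant-time insert, here is a minimal singly linked list sketch in Python; the Node and LinkedList names and the push method are illustrative, not taken from the deck.

```python
class Node:
    """A single linked-list cell: a value plus a pointer to the next cell."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

class LinkedList:
    """Minimal singly linked list with O(1) push at the head."""
    def __init__(self):
        self.head = None

    def push(self, value):
        # Constant time: no shifting or reallocation, just one new node.
        self.head = Node(value, self.head)

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next

    def __contains__(self, value):
        # Random access / search is linear: we must walk the chain.
        return any(v == value for v in self)

stack = LinkedList()
for item in (1, 2, 3):
    stack.push(item)
print(list(stack))  # [3, 2, 1] -- behaves like a stack (LIFO)
```

Pushing at the head never touches existing nodes, which is why linked lists are a natural building block for stacks and queues.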
  6. Vector
     Pros:
     ➔ Access in constant time
     ➔ Less memory consumed than a linked list
     Cons:
     ➔ Slow inserts and deletes (memcpy and memmove are linear)
     ➔ Memory allocation on the heap is slow
     Used in: Python lists, Golang slices, Clojure vectors, Rust vectors...
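A toy growable array (the Vec class below is hypothetical, not from the deck) makes the cost model visible: appends are amortized constant because capacity doubles on growth, while inserting in the middle shifts every later element, mirroring the linear memmove behaviour mentioned above.

```python
class Vec:
    """Toy growable array: contiguous storage, doubling capacity on growth."""
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.items = [None] * self.capacity

    def _grow(self):
        # Reallocate and copy: the occasional O(n) step behind amortized O(1) appends.
        self.capacity *= 2
        new_items = [None] * self.capacity
        new_items[:self.size] = self.items[:self.size]
        self.items = new_items

    def append(self, value):
        if self.size == self.capacity:
            self._grow()
        self.items[self.size] = value
        self.size += 1

    def insert(self, index, value):
        # O(n): every element after `index` has to shift one slot to the right.
        if self.size == self.capacity:
            self._grow()
        for i in range(self.size, index, -1):
            self.items[i] = self.items[i - 1]
        self.items[index] = value
        self.size += 1

    def __getitem__(self, index):
        # O(1): direct offset into contiguous storage.
        return self.items[index]

v = Vec()
for n in range(5):
    v.append(n)
v.insert(0, 99)
print([v[i] for i in range(6)])  # [99, 0, 1, 2, 3, 4]
```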
  7. Hash Map
     (Diagram: the key "a" is run through a hash function producing 21738721, the hash is masked to select one of buckets 0-3, and the bucket pointer leads to the stored key/value pairs a → 1 and b → 2.)
     Lookup: Θ(1) average / O(n) worst case
     Insert: Θ(1) average / O(n) worst case
     Delete: Θ(1) average / O(n) worst case
     Access: not applicable
  8. Hash Map
     Pros:
     ➔ Fast lookup even when n is large
     ➔ Good when order doesn't matter (random access)
     Cons:
     ➔ Becomes inefficient with a high number of collisions
     ➔ Inserts can be slow if reallocation is needed
     Used in: database indexing, caching, sets...
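The sketch below (a hypothetical HashMap class with illustrative names) shows the mechanism from the diagram slide: hash the key, mask the hash to pick a bucket, and chain colliding entries inside the bucket. With short chains, lookup and insert stay close to constant time; if everything collides into one bucket, they degrade to linear scans.

```python
class HashMap:
    """Toy hash map: hash, mask to a bucket index, chain collisions in a list."""
    def __init__(self, n_buckets=8):
        # Power-of-two bucket count so the mask is simply `hash & (n - 1)`.
        self.mask = n_buckets - 1
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) & self.mask]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))        # Θ(1) on average, O(n) if everything collides

    def get(self, key):
        for k, v in self._bucket(key):     # short chain on average -> Θ(1) lookup
            if k == key:
                return v
        raise KeyError(key)

m = HashMap()
m.put("a", 1)
m.put("b", 2)
print(m.get("a"), m.get("b"))  # 1 2
```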
  9. Binary Tree
     Pros:
     ➔ Very fast operations for sorted items
     ➔ Good for searching
     Cons:
     ➔ Becomes inefficient when unbalanced
     ➔ Too much memory copying when the tree is tall
     Used in: heaps, sorting, encoding...
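A minimal unbalanced binary search tree sketch (illustrative names, not from the deck) shows both points above: search follows one root-to-leaf path and an in-order walk yields sorted output, but if keys arrive in sorted order the tree degenerates into a chain and search becomes linear.

```python
class BSTNode:
    """Binary search tree node: smaller keys go left, larger keys go right."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def contains(root, key):
    # O(log n) on a balanced tree, O(n) when the tree has degenerated into a chain.
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def in_order(root):
    # In-order traversal visits the keys in sorted order.
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)

root = None
for k in (5, 2, 8, 1, 9):
    root = insert(root, k)
print(list(in_order(root)), contains(root, 8))  # [1, 2, 5, 8, 9] True
```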
  10. Bloom Filters
      (Diagram: elements x and y are each hashed by functions f, g and h, and the corresponding bits of a bit array are set to 1.)
      Lookup: O(k)    Insert: O(k)    Access and Delete: not supported
      k = number of hash functions
  11. Bloom Filters
      Pros:
      ➔ Low space complexity
      ➔ Very fast to check set membership
      Cons:
      ➔ Too many false positives when the bit array is small
      Used in: web crawlers, weak-password checks, malicious URL detection in Chrome...
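A rough Python sketch of the idea, assuming k independent hash functions derived from SHA-256 with different prefixes (the BloomFilter class and its parameters are illustrative, not from the deck): each item sets k bits, membership checks test those same k bits, and the filter can say "definitely not present" or "probably present" but never supports deletion.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k bit positions per item; no deletes, false positives possible."""
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes          # the `k` in the O(k) insert/lookup cost
        self.bits = bytearray(n_bits)     # one byte per bit, for simplicity

    def _positions(self, item):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))

crawled = BloomFilter()
crawled.add("https://example.com/")
print(crawled.might_contain("https://example.com/"))   # True
print(crawled.might_contain("https://example.org/"))   # False (almost certainly)
```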
  12. Merkle Tree
      (Diagram: leaf blocks B0 and B1 are hashed into Hash 0 and Hash 1, which are hashed together into the Top Hash.)
      Binary tree: Lookup Θ(log2(n)), Insert Θ(log2(n)), Delete Θ(log2(n)), Access ?
      General case: Lookup O(logk(n)), Insert O(logk(n)), Delete O(logk(n)), Access ?
      k = branching factor
  13. Merkle Tree
      Pros:
      ➔ Fast verification of the contents of large data structures
      ➔ Maintains data integrity efficiently
      Cons:
      ➔ Can get really big
      Used in: Btrfs, ZFS, Git, Bitcoin, Ethereum
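A minimal sketch of the verification idea: hash each block, then repeatedly hash pairs of hashes until a single root remains; two parties only need to compare roots to know whether their data diverges. The merkle_root helper below is illustrative (a binary tree that duplicates the last hash on odd levels), not the exact scheme used by any of the systems listed above.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Hash every block, then hash pairs of hashes level by level until one root remains."""
    level = [sha256(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])       # duplicate the last hash on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"B0", b"B1", b"B2", b"B3"]
root = merkle_root(blocks)

# Any change to a block changes the root hash, so comparing roots detects tampering.
tampered = [b"B0", b"B1", b"B2", b"XX"]
print(root.hex()[:16])
print(root == merkle_root(blocks))      # True
print(root == merkle_root(tampered))    # False
```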
  14. Radix Tree (Trie)
      Lookup: O(an)    Insert: O(an)    Delete: O(an)    Access: ?
      a = size of the word being searched for
  15. Radix Tree
      Pros:
      ➔ Good for efficient string representation
      ➔ Fast search
      Cons:
      ➔ Performs like a list when highly unbalanced
      Used in: REST API routers, modeling hierarchical data, textual search...
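For illustration, here is a plain (uncompressed) trie rather than a path-compressed radix tree; the Trie class and method names are illustrative, not from the deck. The point it demonstrates is that lookup cost depends on the length of the word being searched, not on how many words are stored, and that prefix queries fall out naturally, which is why tries suit URL routers and text search.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # one edge per character
        self.is_word = False

class Trie:
    """Plain (uncompressed) trie: lookup cost depends on key length, not on the number of keys."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:            # one step per character of the word
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def starts_with(self, prefix):
        # Prefix queries are what make tries handy for routing and textual search.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return False
        return True

routes = Trie()
routes.insert("/users")
routes.insert("/users/:id")
print(routes.contains("/users"), routes.starts_with("/use"))  # True True
```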
  16. Persistent Vector
      Pros:
      ➔ Good for immutable data representation
      ➔ Performance is close to a vector (depending on the branching factor)
      ➔ Efficient space complexity
      Cons:
      ➔ Updates require copying
      Used in: Clojure's immutable data structures, Immutable.js, Pyrsistent...
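A much simplified sketch of the path-copying idea behind these structures, assuming a fixed-depth tree with a small branching factor (Clojure's real implementation uses 32-way branching plus a tail optimization and handles growth; everything below, including the build/get/assoc names, is illustrative only). An update rebuilds only the nodes on one root-to-leaf path and shares every untouched subtree, which is why updates copy some data yet stay cheap and space-efficient.

```python
BITS = 2                 # branching factor 4 here; Clojure uses 5 bits -> 32-way nodes
WIDTH = 1 << BITS
MASK = WIDTH - 1

def build(values, depth):
    """Build a fixed-depth tree of tuples holding `values` (missing slots become None)."""
    if depth == 0:
        leaf = tuple(values[:WIDTH])
        return leaf + (None,) * (WIDTH - len(leaf))
    span = WIDTH ** depth
    return tuple(build(values[i:i + span], depth - 1) for i in range(0, WIDTH * span, span))

def get(node, index, depth):
    # Walk one level per BITS of the index: O(log_WIDTH n), near-constant in practice.
    for level in range(depth, -1, -1):
        node = node[(index >> (level * BITS)) & MASK]
    return node

def assoc(node, index, value, depth):
    # Path copying: rebuild only the nodes on the root-to-leaf path, share everything else.
    if depth == 0:
        slot = index & MASK
        return node[:slot] + (value,) + node[slot + 1:]
    slot = (index >> (depth * BITS)) & MASK
    return node[:slot] + (assoc(node[slot], index, value, depth - 1),) + node[slot + 1:]

depth = 1                              # two levels -> capacity WIDTH ** 2 = 16
v1 = build(list(range(16)), depth)
v2 = assoc(v1, 5, 99, depth)

print(get(v1, 5, depth), get(v2, 5, depth))   # 5 99 -- the old version is untouched
print(v1[3] is v2[3])                         # True -- untouched subtrees are shared
```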
  17. References
      ➔ https://hypirion.com/musings/understanding-persistent-vector-pt-1
      ➔ https://brilliant.org/wiki/tries/
      ➔ https://en.wikipedia.org/wiki/Merkle_tree
      ➔ https://brilliant.org/wiki/bloom-filter/
      ➔ https://en.wikipedia.org/wiki/Binary_search_tree
      ➔ https://stackoverflow.com/questions/25218880/why-is-stdvectorinsert-complexity-linear-instead-of-being-constant
      ➔ https://users.ece.utexas.edu/~adnan/pike.html
      ➔ https://github.com/Workiva/go-datastructures