SPL Data Structures and their Complexity

norm2782
September 30, 2011

PHP 5.3 gained several new SPL data structures. Since it is important to know when to use which data structure, we will look at how they work and at their algorithmic complexity. An introduction to complexity analysis familiarizes the audience with big-Oh notation and the concepts of space and time complexity. We then discuss several of the SPL data structures.

Transcript

  1. This presentation
      - Understand what data structures are
      - How they are represented internally
      - How “fast” each one is and why that is
  2. Data structures
      - Classes that offer the means to store and retrieve data, possibly in a particular order
      - Implementation is (often) optimised for certain use cases
      - array is PHP’s oldest and most frequently used data structure
      - PHP 5.3 adds support for several others
  3. Current SPL data structures
      - SplDoublyLinkedList
      - SplStack
      - SplQueue
      - SplHeap
      - SplMaxHeap
      - SplMinHeap
      - SplPriorityQueue
      - SplFixedArray
      - SplObjectStorage
  4. Why care?
      - Using the right data structure in the right place could improve performance
      - Already implemented and tested: saves work
      - Can add a type hint in a function definition
      - Adds semantics to your code
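
      As a small illustration of the type-hint point, a function can name the structure it expects in its signature. This is only a sketch: the function name `processJobs` and the job strings are made up for the example.

      ```php
      <?php
      // Hypothetical function: the SplQueue type hint documents (and enforces)
      // that jobs are handled in first-in, first-out order.
      function processJobs(SplQueue $jobs)
      {
          $done = array();
          while (!$jobs->isEmpty()) {
              $done[] = $jobs->dequeue();
          }
          return $done;
      }

      $jobs = new SplQueue();
      $jobs->enqueue('resize image');
      $jobs->enqueue('send mail');

      $processed = processJobs($jobs);
      print_r($processed);
      ```

      Passing a plain array to `processJobs()` would be rejected, which is the semantic benefit the slide refers to.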
  5. Algorithmic complexity
      - We want to be able to talk about the performance of the data structure implementation
          - Running speed (time complexity)
          - Space consumption (space complexity)
      - We describe complexity in terms of input size, which is machine and programming language independent
  7. Example
          for ($i = 0; $i < $n; $i++) {
              for ($j = 0; $j < $n; $j++) {
                  echo 'tick';
              }
          }
      - For some n, how many times is “tick” printed? I.e. what is the time complexity of this algorithm?
      - Answer: n^2 times
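
      The n^2 count can be checked empirically. A minimal sketch, assuming we replace the `echo` with a counter; the helper name `countTicks` is hypothetical.

      ```php
      <?php
      // Hypothetical helper: counts how often the inner statement of the
      // nested loop runs for a given input size $n.
      function countTicks($n)
      {
          $ticks = 0;
          for ($i = 0; $i < $n; $i++) {
              for ($j = 0; $j < $n; $j++) {
                  $ticks++; // stands in for echo 'tick';
              }
          }
          return $ticks;
      }

      foreach (array(1, 10, 100) as $n) {
          echo $n, ' -> ', countTicks($n), PHP_EOL;
      }
      ```

      Multiplying n by 10 multiplies the count by 100, which is the quadratic growth the slide describes.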
  8. Talking about complexity
      - Pick a function to act as a boundary for the algorithm’s complexity
      - Worst case: denoted O (big-Oh)
          - “My algorithm will not be slower than this function”
      - Best case: denoted Ω (big-Omega)
          - “My algorithm will at least be as slow as this function”
      - If they are the same, we write Θ (big-Theta)
      - In the example: both cases are n^2, so the algorithm is in Θ(n^2)
  10. Example 2
          for ($i = 0; $i < $n; $i++) {
              if ($myBool) {
                  for ($j = 0; $j < $n; $j++) {
                      echo 'tick';
                  }
              }
          }
      - What is the time complexity of this algorithm?
          - O(n^2)
          - Ω(n) (if $myBool is false)
          - No Θ!
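
      The gap between the two bounds can be made visible by counting steps for both values of `$myBool`. This is a sketch under one assumption about what counts as a step: one for each `if` check plus one for each tick. The helper name `countSteps` is hypothetical.

      ```php
      <?php
      // Hypothetical helper: counts one step per if check and one per tick.
      function countSteps($n, $myBool)
      {
          $steps = 0;
          for ($i = 0; $i < $n; $i++) {
              $steps++; // the if check in each outer iteration
              if ($myBool) {
                  for ($j = 0; $j < $n; $j++) {
                      $steps++; // stands in for echo 'tick';
                  }
              }
          }
          return $steps;
      }

      echo 'false: ', countSteps(100, false), PHP_EOL; // linear in n
      echo 'true:  ', countSteps(100, true), PHP_EOL;  // quadratic in n
      ```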
  11. We can be a bit sloppy
          for ($i = 0; $i < $n; $i++) {
              if ($myBool) {
                  for ($j = 0; $j < $n; $j++) {
                      echo 'tick';
                  }
              }
          }
      - We describe algorithmic behaviour as input size grows to infinity
      - Constant factors and smaller terms don’t matter too much
      - E.g. 3n^2 + 4n + 1 is in O(n^2)
  12. Other functions
          for ($i = 0; $i < $n; $i++) {
              for ($j = 0; $j < $n; $j++) {
                  echo 'tick';
              }
          }
          for ($i = 0; $i < $n; $i++) {
              echo 'tock';
          }
      - This algorithm is still in Θ(n^2): it prints n^2 ticks and n tocks, and the linear term is dominated by the quadratic one.
  13. Complexity comparison
      - [Figure: growth curves labelled Logarithmic, Linear, Quadratic, Exponential, Factorial and Superexponential; axis ticks from 10^0 to 10^3]
      - Constant: 1, logarithmic: lg n, linear: n, quadratic: n^2, exponential: 2^n, factorial: n!, super-exponential: n^n
  14. In numbers
      Approximate growth for n = 50:

      Function   Value
      1          1
      lg n       5.64
      n          50
      n^2        2500
      n^3        125000
      2^n        1125899906842624
      n!         3.04 * 10^64
      n^n        8.88 * 10^84
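
      The values for n = 50 can be reproduced with a few lines of PHP. A rough sketch: the helper name `growthTable` is hypothetical, the factorial is accumulated as a float so the results are approximations, and the integer result for 2^n assumes a 64-bit PHP build.

      ```php
      <?php
      // Hypothetical helper: evaluates each growth function for a given n.
      function growthTable($n)
      {
          $factorial = 1.0;
          for ($i = 2; $i <= $n; $i++) {
              $factorial *= $i; // float, so large values do not overflow an integer
          }
          return array(
              'const' => 1,
              'lg n'  => log($n, 2),
              'n'     => $n,
              'n^2'   => pow($n, 2),
              'n^3'   => pow($n, 3),
              '2^n'   => pow(2, $n),
              'n!'    => $factorial,
              'n^n'   => pow($n, $n),
          );
      }

      foreach (growthTable(50) as $name => $value) {
          printf("%-6s %.3e\n", $name, $value);
      }
      ```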
  15. Some more notes on complexity
      - Constant time is written 1, but goes for any constant c
      - Polynomial time contains all functions in n^c for some constant c
      - Everything in this presentation will be in polynomial time
  16. Credit where credit is due
      - The first three pictures in this section are from Wikipedia
  20. SplDoublyLinkedList
      - [Figure: doubly linked list with node values 12, 99, 37]
      - Superclass of SplStack and SplQueue
      - SplDoublyLinkedList is strange: it has some hashtable characteristics, while lacking some doubly linked list (DLL) characteristics
      - Interface suggests constant time operations through the ArrayAccess interface, which is not the case
      - Implemented as a conventional DLL in the C code
      - Time complexity
          - Lookup by scanning in O(n)
          - Access to beginning/end in Θ(1)
          - Move to next/previous node in Θ(1)
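
      A short usage sketch of the operations listed above, reusing the node values from the figure. The comments restate the complexity claims from the slide; they are not measured here.

      ```php
      <?php
      $list = new SplDoublyLinkedList();
      $list->push(12);   // append at the end
      $list->push(99);
      $list->push(37);
      $list->unshift(5); // prepend at the beginning

      $first = $list->bottom(); // value at the beginning
      $last  = $list->top();    // value at the end

      // Looks like an array access, but the list is scanned to reach the index.
      $third = $list[2];

      // Iteration moves from node to node, front to back by default.
      $values = array();
      foreach ($list as $value) {
          $values[] = $value;
      }
      print_r($values);
      ```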
  21. SplStack
      - [Figure: stack with Push and Pop at the top]
      - Subclass of SplDoublyLinkedList; adds no new operations
      - Last-in, first-out (LIFO)
      - Pop/push value from/on the top of the stack in Θ(1)
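
      A minimal sketch of the LIFO behaviour, using only the inherited `push()`, `pop()` and `top()` methods; the string values are placeholders.

      ```php
      <?php
      $stack = new SplStack();
      $stack->push('a');
      $stack->push('b');
      $stack->push('c');

      $top    = $stack->top(); // peek at the most recently pushed value
      $popped = $stack->pop(); // remove and return it

      // Iterating a stack visits the remaining values last-in, first-out.
      $remaining = array();
      foreach ($stack as $value) {
          $remaining[] = $value;
      }
      print_r($remaining);
      ```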
  22. SplQueue
      - [Figure: queue with Enqueue at the end and Dequeue at the front]
      - Subclass of SplDoublyLinkedList; adds enqueue/dequeue operations
      - First-in, first-out (FIFO)
      - Read/dequeue element from front in Θ(1)
      - Enqueue element to the end in Θ(1)
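
      The FIFO counterpart, sketched with placeholder values: elements leave the queue in the order in which they were enqueued.

      ```php
      <?php
      $queue = new SplQueue();
      $queue->enqueue('first');
      $queue->enqueue('second');
      $queue->enqueue('third');

      $front = $queue->dequeue(); // removes and returns the oldest element

      $rest = array();
      while (!$queue->isEmpty()) {
          $rest[] = $queue->dequeue();
      }
      print_r($rest);
      ```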
  23. Short excursion: trees
      - [Figure: binary tree with node values 100, 19, 36, 17, 3, 25, 1, 2, 7]
      - Consists of nodes (vertices) and directed edges
      - Each node always has in-degree 1
          - Except the root: always in-degree 0
      - Previous property implies there are no cycles
      - Binary tree: each node has at most two child nodes
  24. SplHeap, SplMaxHeap and SplMinHeap
      - [Figure: binary tree with node values 100, 19, 36, 17, 3, 25, 1, 2, 7]
      - A heap is a tree with the heap property: for all A and B, if B is a child node of A, then
          - val(A) ≥ val(B) for a max-heap: SplMaxHeap
          - val(A) ≤ val(B) for a min-heap: SplMinHeap
      - Where val(A) denotes the value of node A
  25. Heaps contd.
      - SplHeap is an abstract superclass
      - Implemented as binary tree
      - Access to root element in Θ(1)
      - Insertion/deletion in O(lg n)
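
      A brief sketch of both concrete heaps, fed with some of the values from the tree figure. `top()` reads the root without removing it; `extract()` removes it and restores the heap property.

      ```php
      <?php
      $min = new SplMinHeap();
      $max = new SplMaxHeap();

      foreach (array(19, 100, 3, 36, 7) as $value) {
          $min->insert($value);
          $max->insert($value);
      }

      $smallest = $min->top(); // root of the min-heap
      $largest  = $max->top(); // root of the max-heap

      // Repeatedly extracting the root yields the values in sorted order.
      $sorted = array();
      while (!$min->isEmpty()) {
          $sorted[] = $min->extract();
      }
      print_r($sorted);
      ```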
  26. SplPriorityQueue
      - Variant of SplMaxHeap: for all A and B, if B is a child node of A, then prio(A) ≥ prio(B)
      - Where prio(A) denotes the priority of node A
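
      A small sketch with made-up task names and distinct priorities: the element with the highest priority is always extracted first, regardless of insertion order.

      ```php
      <?php
      $queue = new SplPriorityQueue();
      $queue->insert('write report', 1);
      $queue->insert('fix outage', 10);
      $queue->insert('review patch', 5);

      $next = $queue->top(); // highest priority, not removed

      $order = array();
      while (!$queue->isEmpty()) {
          $order[] = $queue->extract();
      }
      print_r($order);
      ```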
  27. SplFixedArray
      - Fixed-size array with numerical indices only
      - Efficient OO array implementation
          - No hashing required for keys
          - Can make assumptions about array size
      - Lookup, insertion, deletion in Θ(1) time
      - Resize in Θ(n)
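
      A usage sketch showing the fixed size in practice: writing past the end raises an exception until the array is explicitly resized with `setSize()`. The values are placeholders.

      ```php
      <?php
      $array = new SplFixedArray(3);
      $array[0] = 12;
      $array[1] = 99;
      $array[2] = 37;

      // Index 3 is outside the fixed size, so the write is rejected.
      $outOfRange = false;
      try {
          $array[3] = 1;
      } catch (RuntimeException $e) {
          $outOfRange = true;
      }

      $array->setSize(5); // explicit resize; new slots are null
      $array[4] = 42;

      $size   = $array->getSize();
      $values = $array->toArray();
      print_r($values);
      ```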
  28. SplObjectStorage
      - Storage container for objects
      - Insertion, deletion in Θ(1)
      - Verification of presence in Θ(1)
      - Missing: set operations
          - Union, intersection, difference, etc.
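
      A sketch of the basic operations, using the `offsetSet()`, `offsetExists()` and `offsetUnset()` methods. The `storageUnion` helper is hypothetical: it shows one way a union could be written by hand on top of the in-place `addAll()` method, and is not part of SPL.

      ```php
      <?php
      // Hypothetical helper: returns a new storage holding the objects of both inputs.
      function storageUnion(SplObjectStorage $x, SplObjectStorage $y)
      {
          $result = new SplObjectStorage();
          $result->addAll($x);
          $result->addAll($y);
          return $result;
      }

      $a = new stdClass();
      $b = new stdClass();
      $c = new stdClass();

      $storage = new SplObjectStorage();
      $storage->offsetSet($a); // insertion
      $storage->offsetSet($b);

      $hasA = $storage->offsetExists($a); // verification of presence
      $hasC = $storage->offsetExists($c);

      $storage->offsetUnset($a);          // deletion
      $count = count($storage);

      $other = new SplObjectStorage();
      $other->offsetSet($b);
      $other->offsetSet($c);

      $union = storageUnion($storage, $other); // contains $b and $c once each
      echo count($union), PHP_EOL;
      ```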
  29. Missing in PHP
      - Set data structure
      - Map/hashtable data structure
          - Does SplDoublyLinkedList satisfy this use case?
          - If yes: split it in two separate structures and make SplDoublyLinkedList a true doubly linked list
      - Immutable data structures
          - Allow us to more easily emulate “pure” functions
          - Fewer bugs in your code due to lack of mutable state
  30. Closing remarks
      - Use the SPL data structures!
      - Choose them with care
      - Reason about your code’s complexity