SPL Data Structures and their Complexity

norm2782
September 30, 2011

PHP 5.3 gained several new SPL data structures. Since it is important to know when to use which data structure, we will look at how they work and at their algorithmic complexity. An introduction to complexity analysis familiarizes the audience with big-Oh notation and the concepts of space and time complexity. We then discuss several of the SPL data structures.

Transcript

  1. 1
    SPL Data Structures and their Complexity
    Jurriën Stutterheim
    September 17, 2011

  2. 2
    1. Introduction

  3. 3
    This presentation §1
    Understand what data structures are
    How they are represented internally
    How “fast” each one is and why that is

  4. 4
    Data structures §1
    Classes that offer the means to store and retrieve data,
    possibly in a particular order
    Implementation is (often) optimised for certain use cases
    array is PHP’s oldest and most frequently used data
    structure
    PHP 5.3 adds support for several others

  5. 5
    Current SPL data structures §1
    SplDoublyLinkedList
    SplStack
    SplQueue
    SplHeap
    SplMaxHeap
    SplMinHeap
    SplPriorityQueue
    SplFixedArray
    SplObjectStorage

  6. 6
    Why care? §1
    Using the right data structure in the right place could
    improve performance
    Already implemented and tested: saves work
    Can add a type hint in a function definition
    Adds semantics to your code

  7. 7
    Algorithmic complexity §1
    We want to be able to talk about the performance of the
    data structure implementation
    Running speed (time complexity)
    Space consumption (space complexity)
    We describe complexity in terms of input size, which is
    machine and programming language independent

  9. 8
    Example §1
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            echo 'tick';
        }
    }
    For some n, how many times is “tick” printed? In other words, what is
    the time complexity of this algorithm?
    n² times
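
A quick way to check the n² count is to replace the echo with a counter. This is a minimal sketch; the function name countTicks is our own and not from the slides.

```php
<?php
// Same nested loops as on the slide, but counting instead of echoing,
// so the number of "tick"s can be inspected for different n.
function countTicks(int $n): int
{
    $ticks = 0;
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            $ticks++; // stands in for: echo 'tick';
        }
    }
    return $ticks;
}

foreach ([1, 10, 100] as $n) {
    // prints 1, 100 and 10000 ticks: always n * n
    echo "n = $n: ", countTicks($n), " ticks\n";
}
```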

  10. 9
    Talking about complexity §1
    Pick a function to act as a bound on the algorithm’s
    complexity
    Worst case: upper bound
    Denoted O (big-Oh)
    “My algorithm will not be slower than this function”
    Best case: lower bound
    Denoted Ω (big-Omega)
    “My algorithm will be at least as slow as this function”
    If they are the same, we write Θ (big-Theta)
    In the example, both bounds are n², so the algorithm is in Θ(n²)
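
The informal phrasing above corresponds to the standard set-based definitions, sketched here in a form equivalent to the one in Cormen et al. 2009 (cited later in the deck); c and n₀ are witness constants.

```latex
\begin{align*}
O(g(n))      &= \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that } 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0 \,\} \\
\Omega(g(n)) &= \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that } 0 \le c\,g(n) \le f(n) \text{ for all } n \ge n_0 \,\} \\
\Theta(g(n)) &= O(g(n)) \cap \Omega(g(n))
\end{align*}
```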

  11. 10
    Visualized §1

  13. 11
    Example 2 §1
    for ($i = 0; $i < $n; $i++) {
        if ($myBool) {
            for ($j = 0; $j < $n; $j++) {
                echo 'tick';
            }
        }
    }
    What is the time complexity of this algorithm?
    O(n²)
    Ω(n) (if $myBool is false)
    No single Θ bound that covers both cases!
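
Where does Ω(n) come from when no tick is printed at all? The outer loop still runs n times. The sketch below counts steps under a simple, assumed cost model (one step per evaluation of the if-condition, one step per tick); countSteps is a hypothetical name.

```php
<?php
// Step counter for the second example (assumed cost model: one step
// per evaluation of the if-condition, one step per 'tick').
function countSteps(int $n, bool $myBool): int
{
    $steps = 0;
    for ($i = 0; $i < $n; $i++) {
        $steps++; // the if-condition is checked n times in every case
        if ($myBool) {
            for ($j = 0; $j < $n; $j++) {
                $steps++; // stands in for: echo 'tick';
            }
        }
    }
    return $steps;
}

echo countSteps(100, true), "\n";  // n + n² = 10100
echo countSteps(100, false), "\n"; // n = 100
```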

  14. 12
    We can be a bit sloppy §1
    for ($i = 0; $i < $n; $i++) {
        if ($myBool) {
            for ($j = 0; $j < $n; $j++) {
                echo 'tick';
            }
        }
    }
    We describe algorithmic behaviour as the input size grows to
    infinity
    Constant factors and lower-order terms don’t matter much
    E.g. 3n² + 4n + 1 is in O(n²)
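
To make the example precise, it is enough to find constants c and n₀ such that the polynomial stays below c·n² from n₀ on. One possible choice (others work too):

```latex
\begin{align*}
3n^2 + 4n + 1 &\le 3n^2 + 4n^2 + n^2 && \text{for all } n \ge 1 \\
              &= 8n^2 \\
\Rightarrow\ 3n^2 + 4n + 1 &\in O(n^2) && \text{with } c = 8,\ n_0 = 1
\end{align*}
```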

  15. 13
    Other functions §1
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            echo 'tick';
        }
    }
    for ($i = 0; $i < $n; $i++) {
        echo 'tock';
    }
    This algorithm is still in Θ(n²): it prints n² + n words in total,
    and the lower-order n term does not matter.
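
The extra tock loop adds only n steps. Sandwiching n² + n between two multiples of n² confirms the Θ(n²) claim; the constants below are one possible choice.

```latex
\begin{align*}
n^2 \;\le\; n^2 + n \;\le\; 2n^2 \quad \text{for all } n \ge 1
\;\Rightarrow\; n^2 + n \in \Theta(n^2) \quad (c_1 = 1,\ c_2 = 2,\ n_0 = 1)
\end{align*}
```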

  16. 14
    Bounds §1
    Figure: Order relations (taken from Cormen et al. 2009)

  17. 15
    Complexity Comparison §1
    [Figure: growth curves, from logarithmic to super-exponential]
    Constant: 1, logarithmic: lg n, linear: n, quadratic: n²,
    exponential: 2ⁿ, factorial: n!, super-exponential: nⁿ

  18. 16
    In numbers §1
    Approximate growth for n = 50:
    f(n)    f(50)
    1       1
    lg n    5.64
    n       50
    n²      2500
    n³      125000
    2ⁿ      1125899906842624 (≈ 1.13 × 10¹⁵)
    n!      3.04 × 10⁶⁴
    nⁿ      8.88 × 10⁸⁴
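
The table can be recomputed in a few lines of PHP. This sketch assumes a 64-bit PHP build, so that 2⁵⁰ still fits in an integer; n! and nⁿ are computed as floats because they overflow any integer type.

```php
<?php
// Recompute the n = 50 table (assumes a 64-bit PHP build, so 2^50 still
// fits in an int). n! and n^n overflow any int, so they are floats.
$n = 50;

$factorial = 1.0;
for ($k = 2; $k <= $n; $k++) {
    $factorial *= $k;
}

$growth = [
    '1'    => 1,
    'lg n' => log($n, 2),
    'n'    => $n,
    'n^2'  => $n ** 2,
    'n^3'  => $n ** 3,
    '2^n'  => 2 ** $n,
    'n!'   => $factorial,
    'n^n'  => pow((float) $n, $n),
];

foreach ($growth as $name => $value) {
    // ints are printed exactly, floats with three significant digits
    $shown = is_int($value) ? (string) $value : sprintf('%.3g', $value);
    printf("%-5s %s\n", $name, $shown);
}
```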

  19. 17
    Some more notes on complexity §1
    Constant time is written 1, but goes for any constant c
    Polynomial time contains all functions in nc for some
    constant c
    Everything in this presentation will be in polynomial time

    View Slide

  20. 18
    2. SPL Data Structures

  21. 19
    Credit where credit is due §2
    The first three pictures in this section are from Wikipedia

  25. 20
    SplDoublyLinkedList §2
    [Figure: doubly linked list holding 12, 99 and 37]
    Superclass of SplStack and SplQueue
    SplDoublyLinkedList is strange: it has some hashtable
    characteristics, while lacking some doubly linked list (DLL)
    characteristics
    Its ArrayAccess interface suggests constant-time indexed access,
    which is not the case
    Implemented as a conventional DLL in the C code
    Time complexity
    Lookup (including index access) by scanning in O(n)
    Access to beginning/end in Θ(1)
    Move to next/previous node in Θ(1)
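
A short sketch of the behaviour described above, using SplDoublyLinkedList methods from the PHP manual (push, unshift, bottom, top and array-style access). The cost comments reflect the slide's description of the C implementation, not a measurement.

```php
<?php
// SplDoublyLinkedList: cheap operations at both ends, but indexed
// access (ArrayAccess) has to walk the list node by node.
$list = new SplDoublyLinkedList();
$list->push(99);     // append at the end
$list->push(37);
$list->unshift(12);  // prepend at the beginning: 12, 99, 37

echo $list->bottom(), ' ', $list->top(), "\n"; // first and last: 12 37

// Looks like array access, but finding index 1 means scanning: O(n).
echo $list[1], "\n"; // 99

// Sequential traversal moves from node to node, Θ(1) per step.
foreach ($list as $index => $value) {
    echo "$index => $value\n";
}
```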

  26. 21
    SplStack §2
    Subclass of SplDoublyLinkedList; adds no new operations
    Last-in, first-out (LIFO)
    Pop/push value from/on the top of the stack in Θ(1)
    [Figure: push and pop at the top of the stack]
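
A minimal SplStack example showing last-in, first-out order:

```php
<?php
// SplStack is last-in, first-out: push() and pop() both act on the top.
$stack = new SplStack();
$stack->push('a');
$stack->push('b');
$stack->push('c');

echo $stack->top(), "\n"; // peek at the top without removing it: c
echo $stack->pop(), "\n"; // c
echo $stack->pop(), "\n"; // b
echo count($stack), "\n"; // 1 ('a' is left)
```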

  27. 22
    SplQueue §2
    Subclass of SplDoublyLinkedList; adds enqueue/dequeue
    operations
    First-in, first-out (FIFO)
    Read/dequeue element from front in Θ(1)
    Enqueue element to the end in Θ(1)
    [Figure: enqueue at the end, dequeue at the front]
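
The matching SplQueue sketch; bottom() is used here to read the front element without removing it:

```php
<?php
// SplQueue is first-in, first-out: enqueue() appends at the end,
// dequeue() removes from the front.
$queue = new SplQueue();
$queue->enqueue('first');
$queue->enqueue('second');
$queue->enqueue('third');

echo $queue->bottom(), "\n";  // read the front without removing it: first
echo $queue->dequeue(), "\n"; // first
echo $queue->dequeue(), "\n"; // second
echo count($queue), "\n";     // 1
```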

  28. 23
    Short excursion: trees §2
    [Figure: tree with levels 100 / 19, 36 / 17, 3, 25, 1 / 2, 7]
    Consists of nodes (vertices) and directed edges
    Each node always has in-degree 1
    Except the root: always in-degree 0
    Together with every node being reachable from the root, this
    implies there are no cycles
    Binary tree: each node has at most two child nodes

  29. 24
    SplHeap, SplMaxHeap and SplMinHeap §2
    [Figure: max-heap with levels 100 / 19, 36 / 17, 3, 25, 1 / 2, 7]
    A heap is a tree with the heap property: for all A and B, if
    B is a child node of A, then
    val(A) ≥ val(B) for a max-heap: SplMaxHeap
    val(A) ≤ val(B) for a min-heap: SplMinHeap
    Where val(A) denotes the value of node A
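
The heap property can be checked mechanically. The sketch assumes a common layout for complete binary trees that the slides do not describe: nodes stored level by level in a list-style array, with the children of index i at 2i + 1 and 2i + 2. The helper isMaxHeap is hypothetical and not part of SPL.

```php
<?php
// Hypothetical helper (not part of SPL): checks the max-heap property for
// a complete binary tree stored level by level in a list-style array,
// where the children of index $i live at 2*$i + 1 and 2*$i + 2.
function isMaxHeap(array $values): bool
{
    $count = count($values);
    for ($i = 0; $i < $count; $i++) {
        foreach ([2 * $i + 1, 2 * $i + 2] as $child) {
            if ($child < $count && $values[$i] < $values[$child]) {
                return false; // a child is larger than its parent
            }
        }
    }
    return true;
}

// The example tree, read level by level.
var_dump(isMaxHeap([100, 19, 36, 17, 3, 25, 1, 2, 7])); // bool(true)
var_dump(isMaxHeap([1, 19, 36]));                        // bool(false)
```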

  30. 25
    Heaps contd. §2
    SplHeap is an abstract superclass
    Implemented as a binary tree
    Access to the root element in Θ(1)
    Insertion and extraction of the root in O(lg n)
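
A sketch of the heap operations listed above, first with the ready-made SplMinHeap and then with a small SplHeap subclass (ShortestFirstHeap is a hypothetical example class) that supplies its own compare() method:

```php
<?php
// SplMinHeap: insert() and extract() are O(lg n), top() is Θ(1).
$heap = new SplMinHeap();
foreach ([17, 3, 25, 1, 100] as $value) {
    $heap->insert($value);
}

echo $heap->top(), "\n";     // smallest element: 1
echo $heap->extract(), "\n"; // removes the root: 1
echo $heap->top(), "\n";     // new root: 3

// SplHeap itself is abstract; a subclass only has to implement compare().
// A positive result means $a belongs closer to the top than $b, so here
// shorter strings come out first.
class ShortestFirstHeap extends SplHeap
{
    protected function compare($a, $b): int
    {
        return strlen($b) - strlen($a);
    }
}

$words = new ShortestFirstHeap();
foreach (['banana', 'fig', 'apple'] as $word) {
    $words->insert($word);
}
echo $words->extract(), "\n"; // fig
```

As a design note, compare() decides what “greater” means and SplHeap keeps the greatest element at the top; that is also how SplMinHeap and SplMaxHeap differ from each other.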

  31. 26
    SplPriorityQueue §2
    Variant of SplMaxHeap: for all A and B, if B is a child
    node of A, then prio(A) ≥ prio(B)
    Where prio(A) denotes the priority of node A
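
A sketch of SplPriorityQueue, where each insert() takes a value and a priority; the task names and priorities are made up for illustration:

```php
<?php
// SplPriorityQueue orders values by their priority, highest first.
$tasks = new SplPriorityQueue();
$tasks->insert('write report', 2);
$tasks->insert('fix production bug', 10);
$tasks->insert('reply to e-mail', 1);

echo $tasks->top(), "\n"; // highest priority, without removing it

$order = [];
while (!$tasks->isEmpty()) {
    $order[] = $tasks->extract(); // O(lg n) per extraction
}
echo implode(', ', $order), "\n";
```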

  32. 27
    SplFixedArray §2
    Fixed-size array with numerical indices only
    Efficient OO array implementation
    No hashing required for keys
    Can make assumptions about array size
    Lookup, insertion and deletion by index in Θ(1) time
    Resize in Θ(n)
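
A sketch of SplFixedArray's index-only interface. The out-of-range write is expected to throw a RuntimeException, which to our knowledge is what current PHP versions do; the slide's Θ(n) resize corresponds to setSize().

```php
<?php
// SplFixedArray: integer indices 0 .. size-1 only, no key hashing.
$array = new SplFixedArray(4); // four slots, initialised to null
$array[0] = 'a';               // write by index
$array[3] = 'd';
echo $array[3], "\n";          // read by index: d

$array->setSize(6);            // resize; existing values are kept
echo $array->getSize(), "\n";  // 6

try {
    $array[10] = 'x';          // outside 0 .. 5: not allowed
} catch (RuntimeException $e) {
    echo 'Error: ', $e->getMessage(), "\n";
}
```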

  33. 28
    SplObjectStorage §2
    Storage container for objects
    Insertion, deletion in Θ(1) on average (hash-based)
    Verification of presence in Θ(1) on average
    Missing: set operations
    Union, intersection, difference, etc.
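
A sketch of the basic SplObjectStorage operations. The Θ(1) figures assume the underlying hash table behaves as usual on average; this example only shows the API, not timings.

```php
<?php
$a = new stdClass();
$b = new stdClass();

$storage = new SplObjectStorage();
$storage->attach($a);              // insertion
$storage->attach($b, 'some data'); // an object can carry associated data
$storage->attach($a);              // attaching again does not duplicate

var_dump(count($storage));         // int(2)
var_dump($storage->contains($b));  // bool(true)
echo $storage[$b], "\n";           // the data attached to $b: some data

$storage->detach($b);              // deletion
var_dump($storage->contains($b));  // bool(false)
```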

  34. 29
    3. Concluding

  35. 30
    Missing in PHP §3
    Set data structure
    Map/hashtable data structure
    Does SplDoublyLinkedList satisfy this use case?
    If yes: split it in two separate structures and make
    SplDoublyLinkedList a true doubly linked list
    Immutable data structures
    Make it easier to emulate “pure” functions
    Fewer bugs in your code due to lack of mutable state

  36. 31
    Closing remarks §3
    Use the SPL data structures!
    Choose them with care
    Reason about your code’s complexity

  37. 32
    Questions §3
    Questions?
