Slide 1

Slide 1 text

1 SPL Data Structures and their Complexity
Jurriën Stutterheim
September 17, 2011

Slide 2

Slide 2 text

2 1. Introduction

Slide 3

Slide 3 text

3 This presentation §1
- Understand what data structures are
- How they are represented internally
- How “fast” each one is and why that is

Slide 4

Slide 4 text

4 Data structures §1
- Classes that offer the means to store and retrieve data, possibly in a particular order
- Implementation is (often) optimised for certain use cases
- array is PHP’s oldest and most frequently used data structure
- PHP 5.3 adds support for several others

Slide 5

Slide 5 text

5 Current SPL data structures §1
- SplDoublyLinkedList
  - SplStack
  - SplQueue
- SplHeap
  - SplMaxHeap
  - SplMinHeap
- SplPriorityQueue
- SplFixedArray
- SplObjectStorage

Slide 6

Slide 6 text

6 Why care? §1
- Using the right data structure in the right place could improve performance
- Already implemented and tested: saves work
- Can add a type hint in a function definition
- Adds semantics to your code

Slide 7

Slide 7 text

7 Algorithmic complexity §1
- We want to be able to talk about the performance of the data structure implementation
  - Running speed (time complexity)
  - Space consumption (space complexity)
- We describe complexity in terms of input size, which makes it independent of machine and programming language

Slide 9

Slide 9 text

8 Example §1
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            echo 'tick';
        }
    }
- For some n, how many times is “tick” printed? I.e. what is the time complexity of this algorithm?
- n^2 times
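The n^2 answer can be checked by counting instead of printing. This is a minimal sketch; the helper name countTicks is hypothetical and not part of the original slides.

```php
<?php
// Hypothetical helper: counts how often the inner statement runs
// instead of echoing 'tick'.
function countTicks(int $n): int
{
    $ticks = 0;
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            $ticks++; // stands in for echo 'tick'
        }
    }
    return $ticks;
}

foreach ([1, 10, 100] as $n) {
    echo $n, ' -> ', countTicks($n), PHP_EOL;
}
```

Multiplying n by ten multiplies the count by a hundred, which is the quadratic growth the slide describes.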

Slide 10

Slide 10 text

9 Talking about complexity §1
- Pick a function to act as boundary for the algorithm’s complexity
- Worst-case
  - Denoted O (big-Oh)
  - “My algorithm will not be slower than this function”
- Best-case
  - Denoted Ω (big-Omega)
  - “My algorithm will at least be as slow as this function”
- If they are the same, we write Θ (big-Theta)
- In the example: both cases are n^2, so the algorithm is in Θ(n^2)

Slide 11

Slide 11 text

10 Visualized §1

Slide 13

Slide 13 text

11 Example 2 §1
    for ($i = 0; $i < $n; $i++) {
        if ($myBool) {
            for ($j = 0; $j < $n; $j++) {
                echo 'tick';
            }
        }
    }
- What is the time complexity of this algorithm?
- O(n^2)
- Ω(n) (if $myBool is false)
- No Θ!
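The gap between the two bounds shows up when both branches are counted separately. A small sketch, assuming a hypothetical helper countSteps that reports outer-loop iterations and ticks:

```php
<?php
// Hypothetical helper: returns how many times the outer loop body
// and the inner 'tick' statement run for a given n and flag.
function countSteps(int $n, bool $myBool): array
{
    $outer = 0;
    $ticks = 0;
    for ($i = 0; $i < $n; $i++) {
        $outer++;
        if ($myBool) {
            for ($j = 0; $j < $n; $j++) {
                $ticks++; // stands in for echo 'tick'
            }
        }
    }
    return ['outer' => $outer, 'ticks' => $ticks];
}

print_r(countSteps(10, true));  // inner loop runs on every pass
print_r(countSteps(10, false)); // only the outer loop does work
```

With the flag set, the work grows with n^2; without it, only the n outer iterations remain, so no single Θ bound covers both cases.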

Slide 14

Slide 14 text

12 We can be a bit sloppy §1
    for ($i = 0; $i < $n; $i++) {
        if ($myBool) {
            for ($j = 0; $j < $n; $j++) {
                echo 'tick';
            }
        }
    }
- We describe algorithmic behaviour as input size grows to infinity
- Constant factors and smaller terms don’t matter too much
- E.g. 3n^2 + 4n + 1 is in O(n^2)
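Why 3n^2 + 4n + 1 is in O(n^2) can be shown in one line. The constants c = 8 and n_0 = 1 used here come from the usual definition of big-Oh (f(n) ≤ c·g(n) for all n ≥ n_0) and are an assumption added for illustration; the slides do not state them.

```latex
3n^2 + 4n + 1 \;\le\; 3n^2 + 4n^2 + n^2 \;=\; 8n^2 \qquad \text{for all } n \ge 1
```

Each smaller term is bounded by a multiple of n^2 once n ≥ 1, so only the constant factor changes.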

Slide 15

Slide 15 text

13 Other functions §1
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            echo 'tick';
        }
    }
    for ($i = 0; $i < $n; $i++) {
        echo 'tock';
    }
- This algorithm is still in Θ(n^2): it makes n^2 + n echo calls, and the smaller term n is dominated by n^2.

Slide 16

Slide 16 text

14 Bounds §1
Figure: Order relations (taken from Cormen et al. 2009)

Slide 17

Slide 17 text

15 Complexity Comparison §1
[Figure: growth curves for logarithmic, linear, quadratic, exponential, factorial and super-exponential functions]
Constant: 1, logarithmic: lg n, linear: n, quadratic: n^2, exponential: 2^n, factorial: n!, super-exponential: n^n

Slide 18

Slide 18 text

16 In numbers §1
Approximate growth for n = 50:
1      1
lg n   5.64
n      50
n^2    2500
n^3    125000
2^n    1125899906842624
n!     3.04 * 10^64
n^n    8.88 * 10^84
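The figures for n = 50 can be recomputed with a few lines of PHP. This is a sketch under the assumption of a 64-bit PHP build; the helper name growthTable is hypothetical, and the largest values are floating-point approximations.

```php
<?php
// Hypothetical helper: evaluates each growth function at n.
// n! and n^n exceed the integer range for n = 50, so they are floats.
function growthTable(int $n): array
{
    $factorial = 1.0;
    for ($k = 2; $k <= $n; $k++) {
        $factorial *= $k;
    }

    return [
        '1'    => 1,
        'lg n' => log($n, 2), // base-2 logarithm
        'n'    => $n,
        'n^2'  => $n ** 2,
        'n^3'  => $n ** 3,
        '2^n'  => 2 ** $n,
        'n!'   => $factorial,
        'n^n'  => $n ** $n,   // overflows to float for n = 50
    ];
}

foreach (growthTable(50) as $label => $value) {
    printf("%-5s %.3e%s", $label, $value, PHP_EOL);
}
```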

Slide 19

Slide 19 text

17 Some more notes on complexity §1
- Constant time is written 1, but goes for any constant c
- Polynomial time contains all functions in n^c for some constant c
- Everything in this presentation will be in polynomial time

Slide 20

Slide 20 text

18 2. SPL Data Structures

Slide 21

Slide 21 text

19 Credit where credit is due §2
- The first three pictures in this section are from Wikipedia

Slide 25

Slide 25 text

20 SplDoublyLinkedList §2
[Figure: doubly linked list with nodes 12, 99, 37]
- Superclass of SplStack and SplQueue
- SplDoublyLinkedList is strange: it has some hashtable characteristics, while lacking some DLL characteristics
- Interface suggests constant time operations through the ArrayAccess interface, which is not the case
- Implemented as a conventional DLL in the C code
- Time complexity
  - Lookup by scanning in O(n)
  - Access to beginning/end in Θ(1)
  - Move to next/previous node in Θ(1)
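A short usage sketch of the operations listed above. The values are arbitrary; the complexity remarks in the comments repeat the claims from the slide rather than measured results.

```php
<?php
$list = new SplDoublyLinkedList();
$list->push(12);    // append at the end
$list->push(99);
$list->push(37);
$list->unshift(5);  // prepend at the beginning

$first = $list->bottom(); // beginning of the list
$last  = $list->top();    // end of the list

// Looks like an array access, but per the slide the list is
// scanned node by node to reach the requested index.
$third = $list[2];

// Iteration moves from node to node.
foreach ($list as $index => $value) {
    echo $index, ': ', $value, PHP_EOL;
}
```

The array-style syntax is what makes the interface misleading: it reads like a constant-time lookup while the underlying structure is still a linked list.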

Slide 26

Slide 26 text

21 SplStack §2
- Subclass of SplDoublyLinkedList; adds no new operations
- Last-in, first-out (LIFO)
- Pop/push value from/on the top of the stack in Θ(1)
[Figure: pop and push at the top of a stack]
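A minimal sketch of LIFO behaviour with SplStack; the values are arbitrary.

```php
<?php
$stack = new SplStack();
$stack->push(1);
$stack->push(2);
$stack->push(3);

$peek   = $stack->top(); // look at the top without removing it
$first  = $stack->pop(); // last value pushed comes out first
$second = $stack->pop();

echo $peek, ' ', $first, ' ', $second, ' remaining: ', count($stack), PHP_EOL;
```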

Slide 27

Slide 27 text

22 SplQueue §2
- Subclass of SplDoublyLinkedList; adds enqueue/dequeue operations
- First-in, first-out (FIFO)
- Read/dequeue element from front in Θ(1)
- Enqueue element to the end in Θ(1)
[Figure: dequeue at the front, enqueue at the end of a queue]
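The same idea for FIFO order, sketched with SplQueue; the string values are arbitrary.

```php
<?php
$queue = new SplQueue();
$queue->enqueue('a');
$queue->enqueue('b');
$queue->enqueue('c');

$served = $queue->dequeue(); // first value enqueued comes out first
$front  = $queue->bottom();  // read the new front without removing it

echo $served, ' ', $front, ' remaining: ', count($queue), PHP_EOL;
```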

Slide 28

Slide 28 text

23 Short excursion: trees §2
[Figure: binary tree with nodes 100, 19, 36, 17, 3, 25, 1, 2, 7]
- Consists of nodes (vertices) and directed edges
- Each node always has in-degree 1
  - Except the root: always in-degree 0
- Previous property implies there are no cycles
- Binary tree: each node has at most two child-nodes

Slide 29

Slide 29 text

24 SplHeap, SplMaxHeap and SplMinHeap §2
[Figure: binary tree with nodes 100, 19, 36, 17, 3, 25, 1, 2, 7]
- A heap is a tree with the heap property: for all A and B, if B is a child node of A, then
  - val(A) ≥ val(B) for a max-heap: SplMaxHeap
  - val(A) ≤ val(B) for a min-heap: SplMinHeap
- Where val(A) denotes the value of node A

Slide 30

Slide 30 text

25 Heaps contd. §2
- SplHeap is an abstract superclass
- Implemented as binary tree
- Access to root element in Θ(1)
- Insertion/deletion in O(lg n)
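A brief sketch of both concrete heap classes, using some of the values from the tree figure. Repeatedly extracting the root of a min-heap yields the values in ascending order.

```php
<?php
$min = new SplMinHeap();
$max = new SplMaxHeap();

foreach ([19, 100, 3, 36, 7] as $value) {
    $min->insert($value);
    $max->insert($value);
}

$smallest = $min->top(); // root of the min-heap
$largest  = $max->top(); // root of the max-heap

// Each extract() removes the root and restores the heap property.
$ascending = [];
while (!$min->isEmpty()) {
    $ascending[] = $min->extract();
}

echo $smallest, ' ', $largest, ' ', implode(',', $ascending), PHP_EOL;
```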

Slide 31

Slide 31 text

26 SplPriorityQueue §2
- Variant of SplMaxHeap: for all A and B, if B is a child node of A, then prio(A) ≥ prio(B)
- Where prio(A) denotes the priority of node A
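A small sketch of priority ordering; the task names and priorities are made up for illustration. Distinct priorities are used because the order among equal priorities should not be relied on.

```php
<?php
$queue = new SplPriorityQueue();
$queue->insert('write report', 1);
$queue->insert('fix outage', 10);
$queue->insert('review patch', 5);

$next = $queue->top(); // highest priority, not removed

// Elements come out in order of descending priority.
$order = [];
while (!$queue->isEmpty()) {
    $order[] = $queue->extract();
}

echo implode(', ', $order), PHP_EOL;
```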

Slide 32

Slide 32 text

27 SplFixedArray §2
- Fixed-size array with numerical indices only
- Efficient OO array implementation
  - No hashing required for keys
  - Can make assumptions about array size
- Lookup, insertion, deletion in Θ(1) time
- Resize in Θ(n)
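A usage sketch for SplFixedArray showing indexed access, resizing, and what happens outside the fixed range. The sizes and values are arbitrary.

```php
<?php
$arr = new SplFixedArray(3); // three slots, all initially null
$arr[0] = 12;
$arr[1] = 99;
$arr[2] = 37;

$second = $arr[1];           // direct access by numeric index

$arr->setSize(5);            // resize; existing values are kept
$sizeAfter = $arr->getSize();

// Writing outside the current size is an error, not an implicit grow.
$outOfRange = false;
try {
    $arr[5] = 1;             // valid indices are now 0..4
} catch (RuntimeException $e) {
    $outOfRange = true;
}
```

Unlike a plain PHP array, the structure never grows on its own, which is what lets the implementation make assumptions about its size.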

Slide 33

Slide 33 text

28 SplObjectStorage §2
- Storage container for objects
- Insertion, deletion in Θ(1)
- Verification of presence in Θ(1)
- Missing: set operations
  - Union, intersection, difference, etc.
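A minimal sketch of insertion, presence checks and deletion, written with the array-style syntax that SplObjectStorage supports through ArrayAccess. The stdClass objects and the attached strings are placeholders.

```php
<?php
$storage = new SplObjectStorage();
$a = new stdClass();
$b = new stdClass();

$storage[$a] = 'first';      // insert object $a with associated data
$storage[$a] = 'first';      // same object again: still one entry

$hasA = isset($storage[$a]); // presence check
$hasB = isset($storage[$b]);
$sizeBefore = count($storage);

unset($storage[$a]);         // deletion
$sizeAfter = count($storage);
```

Objects are identified by identity rather than by content, so two distinct stdClass instances count as two different entries.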

Slide 34

Slide 34 text

29 3. Concluding

Slide 35

Slide 35 text

30 Missing in PHP §3
- Set data structure
- Map/hashtable data structure
  - Does SplDoublyLinkedList satisfy this use case?
  - If yes: split it in two separate structures and make SplDoublyLinkedList a true doubly linked list
- Immutable data structures
  - Allows us to more easily emulate “pure” functions
  - Fewer bugs in your code due to lack of mutable state

Slide 36

Slide 36 text

31 Closing remarks §3
- Use the SPL data structures!
- Choose them with care
- Reason about your code’s complexity

Slide 37

Slide 37 text

32 Questions §3 Questions?