Slide 1

Slide 1 text

Mastering PHP Data Structure 102 Patrick Allaert PHPNW 2012 Manchester, United Kingdom

Slide 2

Slide 2 text

About me ● Patrick Allaert ● Founder of Libereco Technologies ● Playing with PHP/Linux for +10 years ● eZ Publish core developer ● Author of the APM PHP extension ● @patrick_allaert ● [email protected] ● http://github.com/patrickallaert/ ● http://patrickallaert.blogspot.com/

Slide 3

Slide 3 text

APM

Slide 4

Slide 4 text

APM

Slide 5

Slide 5 text

PHP native datatypes ● NULL (IS_NULL) ● Booleans (IS_BOOL) ● Integers (IS_LONG) ● Floating point numbers (IS_DOUBLE) ● Strings (IS_STRING) ● Arrays (IS_ARRAY, IS_CONSTANT_ARRAY) ● Objects (IS_OBJECT) ● Resources (IS_RESOURCE) ● Callable (IS_CALLABLE)

Slide 6

Slide 6 text

Wikipedia datatypes ● 2-3-4 tree ● 2-3 heap ● 2-3 tree ● AA tree ● Abstract syntax tree ● (a,b)-tree ● Adaptive k-d tree ● Adjacency list ● Adjacency matrix ● AF-heap ● Alternating decision tree ● And-inverter graph ● And–or tree ● Array ● AVL tree ● Beap ● Bidirectional map ● Bin ● Binary decision diagram ● Binary heap ● Binary search tree ● Binary tree ● Binomial heap ● Bit array ● Bitboard ● Bit field ● Bitmap ● BK-tree ● Bloom filter ● Boolean ● Bounding interval hierarchy ● B sharp tree ● BSP tree ● B-tree ● B*-tree ● B+ tree ● B-trie ● Bx-tree ● Cartesian tree ● Char ● Circular buffer ● Compressed suffix array ● Container ● Control table ● Cover tree ● Ctrie ● Dancing tree ● D-ary heap ● Decision tree ● Deque ● Directed acyclic graph ● Directed graph ● Disjoint-set ● Distributed hash table ● Double ● Doubly connected edge list ● Doubly linked list ● Dynamic array ● Enfilade ● Enumerated type ● Expectiminimax tree ● Exponential tree ● Fenwick tree ● Fibonacci heap ● Finger tree ● Float ● FM-index ● Fusion tree ● Gap buffer ● Generalised suffix tree ● Graph ● Graph-structured stack ● Hash ● Hash array mapped trie ● Hashed array tree ● Hash list ● Hash table ● Hash tree ● Hash trie ● Heap ● Heightmap ● Hilbert R-tree ● Hypergraph ● Iliffe vector ● Image ● Implicit kd-tree ● Interval tree ● Int ● Judy array ● Kdb tree ● Kd-tree ● Koorde ● Leftist heap ● Lightmap ● Linear octree ● Link/cut tree ● Linked list ● Lookup table ● Map/Associative array/Dictionary ● Matrix ● Metric tree ● Minimax tree ● Min/max kd-tree ● M-tree ● Multigraph ● Multimap ● Multiset ● Octree ● Pagoda ● Pairing heap ● Parallel array ● Parse tree ● Plain old data structure ● Prefix hash tree ● Priority queue ● Propositional directed acyclic graph ● Quad-edge ● Quadtree ● Queap ● Queue ● Radix tree ● Randomized binary search tree ● Range tree ● Rapidly-exploring random tree ● Record (also called tuple or struct) ● Red-black tree ● Rope ● Routing table ● R-tree ● R* tree ● R+ tree ● Scapegoat tree ● Scene graph ● Segment tree ● Self-balancing binary search tree ● Self-organizing list ● Set ● Skew heap ● Skip list ● Soft heap ● Sorted array ● Spaghetti stack ● Sparse array ● Sparse matrix ● Splay tree ● SPQR-tree ● Stack ● String ● Suffix array ● Suffix tree ● Symbol table ● Syntax tree ● Tagged union (variant record, discriminated union, disjoint union) ● Tango tree ● Ternary heap ● Ternary search tree ● Threaded binary tree ● Top tree ● Treap ● Tree ● Trees ● Trie ● T-tree ● UB-tree ● Union ● Unrolled linked list ● Van Emde Boas tree ● Variable-length array ● VList ● VP-tree ● Weight-balanced tree ● Winged edge ● X-fast trie ● Xor linked list ● X-tree ● Y-fast trie ● Zero suppressed decision diagram ● Zipper ● Z-order

Slide 7

Slide 7 text

Game: Can you recognize some structures?

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Array: PHP's untruthfulness PHP “Arrays” are not true Arrays!

Slide 15

Slide 15 text

Array: PHP's untruthfulness PHP “Arrays” are not true Arrays! An array typically looks like this: Data Data Data Data Data Data 0 1 2 3 4 5

Slide 16

Slide 16 text

Array: PHP's untruthfulness PHP “Arrays” can dynamically grow and be iterated both directions (reset(), next(), prev(), end()), exclusively with O(1) operations.

Slide 17

Slide 17 text

Array: PHP's untruthfulness PHP “Arrays” can dynamically grow and be iterated both directions (reset(), next(), prev(), end()), exclusively with O(1) operations. Let's have a Doubly Linked List (DLL): Data Data Data Data Data Head Tail Enables List, Deque, Queue and Stack implementations

Slide 18

Slide 18 text

Array: PHP's untruthfulness PHP “Arrays” elements are always accessible using a key (index).

Slide 19

Slide 19 text

Array: PHP's untruthfulness PHP “Arrays” elements are always accessible using a key (index). Let's have an Hash Table: Data Data Data Data Data Head Tail Bucket Bucket Bucket Bucket Bucket Bucket pointers array Bucket * 0 Bucket * 1 Bucket * 2 Bucket * 3 Bucket * 4 Bucket * 5 ... Bucket * nTableSize -1

Slide 20

Slide 20 text

Array: PHP's untruthfulness http://php.net/manual/en/language.types.array.php: “This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”

Slide 21

Slide 21 text

Optimized for anything ≈ Optimized for nothing!

Slide 22

Slide 22 text

Array: PHP's untruthfulness ● In C: 100 000 integers (using long on 64bits => 8 bytes) can be stored in 0.76 Mb. ● In PHP: it will take 13.97 Mb! ≅ ● A PHP variable (containing an integer) takes 48 bytes. ● The overhead of buckets for every “array” entries is about 96 bytes. ● More details: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html

Slide 23

Slide 23 text

Data Structure

Slide 24

Slide 24 text

Structs (or records, tuples,...)

Slide 25

Slide 25 text

Structs (or records, tuples,...) ● A struct is a value containing other values which are typically accessed using a name. ● Example: Person => firstName / lastName ComplexNumber => realPart / imaginaryPart

Slide 26

Slide 26 text

Structs – Using array $person = array( "firstName" => "Patrick", "lastName" => "Allaert" );

Slide 27

Slide 27 text

Structs – Using a class $person = new PersonStruct( "Patrick", "Allaert" );

Slide 28

Slide 28 text

Structs – Using a class (Implementation) class PersonStruct { public $firstName; public $lastName; public function __construct($firstName, $lastName) { $this->firstName = $firstName; $this->lastName = $lastName; } }

Slide 29

Slide 29 text

Structs – Using a class (Implementation) class PersonStruct { public $firstName; public $lastName; public function __construct($firstName, $lastName) { $this->firstName = $firstName; $this->lastName = $lastName; } public function __set($key, $value) { // a. Do nothing // b. trigger_error() // c. Throws an exception } }

Slide 30

Slide 30 text

Structs – Pros and Cons Array + Uses less memory (PHP < 5.4) - Uses more memory (PHP = 5.4) - No type hinting - Flexible structure +|- Less OO Slightly faster? Class - Uses more memory (PHP < 5.4) + Uses less memory (PHP = 5.4) + Type hinting possible + Rigid structure +|- More OO Slightly slower?

Slide 31

Slide 31 text

(true) Arrays

Slide 32

Slide 32 text

(true) Arrays ● An array is a fixed size collection where elements are each identified by a numeric index.

Slide 33

Slide 33 text

(true) Arrays ● An array is a fixed size collection where elements are each identified by a numeric index. Data Data Data Data Data Data 0 1 2 3 4 5

Slide 34

Slide 34 text

(true) Arrays – Using SplFixedArray $array = new SplFixedArray(3); $array[0] = 1; // or $array->offsetSet() $array[1] = 2; // or $array->offsetSet() $array[2] = 3; // or $array->offsetSet() $array[0]; // gives 1 $array[1]; // gives 2 $array[2]; // gives 3

Slide 35

Slide 35 text

(true) Arrays – Pros and Cons Array - Uses more memory +|- Less OO SplFixedArray + Uses less memory +|- More OO

Slide 36

Slide 36 text

Queues

Slide 37

Slide 37 text

Queues ● A queue is an ordered collection respecting First In, First Out (FIFO) order. ● Elements are inserted at one end and removed at the other.

Slide 38

Slide 38 text

Queues ● A queue is an ordered collection respecting First In, First Out (FIFO) order. ● Elements are inserted at one end and removed at the other. Data Data Data Data Data Data Data Data Enqueue Dequeue

Slide 39

Slide 39 text

Queues – Using array $queue = array(); $queue[] = 1; // or array_push() $queue[] = 2; // or array_push() $queue[] = 3; // or array_push() array_shift($queue); // gives 1 array_shift($queue); // gives 2 array_shift($queue); // gives 3

Slide 40

Slide 40 text

Queues – Using SplQueue $queue = new SplQueue(); $queue[] = 1; // or $queue->enqueue() $queue[] = 2; // or $queue->enqueue() $queue[] = 3; // or $queue->enqueue() $queue->dequeue(); // gives 1 $queue->dequeue(); // gives 2 $queue->dequeue(); // gives 3

Slide 41

Slide 41 text

Stacks

Slide 42

Slide 42 text

Stacks ● A stack is an ordered collection respecting Last In, First Out (LIFO) order. ● Elements are inserted and removed on the same end.

Slide 43

Slide 43 text

Stacks ● A stack is an ordered collection respecting Last In, First Out (LIFO) order. ● Elements are inserted and removed on the same end. Data Data Data Data Data Data Data Data Push Pop

Slide 44

Slide 44 text

Stacks – Using array $stack = array(); $stack[] = 1; // or array_push() $stack[] = 2; // or array_push() $stack[] = 3; // or array_push() array_pop($stack); // gives 3 array_pop($stack); // gives 2 array_pop($stack); // gives 1

Slide 45

Slide 45 text

Stacks – Using SplStack $stack = new SplStack(); $stack[] = 1; // or $stack->push() $stack[] = 2; // or $stack->push() $stack[] = 3; // or $stack->push() $stack->pop(); // gives 3 $stack->pop(); // gives 2 $stack->pop(); // gives 1

Slide 46

Slide 46 text

Queues/Stacks – Pros and Cons Array - Uses more memory (overhead / entry: 96 bytes) - No type hinting +|- Less OO SplQueue / SplStack + Uses less memory (overhead / entry: 48 bytes) + Type hinting possible +|- More OO

Slide 47

Slide 47 text

Sets People with strong views on the distinction between geeks and nerds Geeks Nerds

Slide 48

Slide 48 text

Sets ● A set is a collection with no particular ordering especially suited for testing the membership of a value against a collection or to perform union/intersection/complement operations between them.

Slide 49

Slide 49 text

Sets ● A set is a collection with no particular ordering especially suited for testing the membership of a value against a collection or to perform union/intersection/complement operations between them. Data Data Data Data Data

Slide 50

Slide 50 text

Sets – Using array $set = array(); // Adding elements to a set $set[] = 1; $set[] = 2; $set[] = 3; // Checking presence in a set in_array(2, $set); // true in_array(5, $set); // false array_merge($set1, $set2); // union array_intersect($set1, $set2); // intersection array_diff($set1, $set2); // complement

Slide 51

Slide 51 text

Sets – Using array $set = array(); // Adding elements to a set $set[] = 1; $set[] = 2; $set[] = 3; // Checking presence in a set in_array(2, $set); // true in_array(5, $set); // false array_merge($set1, $set2); // union array_intersect($set1, $set2); // intersection array_diff($set1, $set2); // complement True performance killers!

Slide 52

Slide 52 text

Sets – Mis-usage if ($value === "val1" || $value === "val2" || $value === "val3"))) { // ... }

Slide 53

Slide 53 text

Sets – Mis-usage if (in_array($value, array("val1", "val2", "val3"))) { // ... }

Slide 54

Slide 54 text

Sets – Mis-usage switch ($value) { case "val1": case "val2": case "val3": // ... }

Slide 55

Slide 55 text

Sets – Using array (simple types) $set = array(); // Adding elements to a set $set[1] = true; // Any dummy value $set[2] = true; // is good but NULL! $set[3] = true; // Checking presence in a set isset($set[2]); // true isset($set[5]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement

Slide 56

Slide 56 text

Sets – Using array (simple types) ● Remember that PHP Array keys can be integers or strings only! $set = array(); // Adding elements to a set $set[1] = true; // Any dummy value $set[2] = true; // is good but NULL! $set[3] = true; // Checking presence in a set isset($set[2]); // true isset($set[5]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement

Slide 57

Slide 57 text

Sets – Using array (objects) $set = array(); // Adding elements to a set $set[spl_object_hash($object1)] = $object1; $set[spl_object_hash($object2)] = $object2; $set[spl_object_hash($object3)] = $object3; // Checking presence in a set isset($set[spl_object_hash($object2)]); // true isset($set[spl_object_hash($object5)]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement

Slide 58

Slide 58 text

Sets – Using array (objects) $set = array(); // Adding elements to a set $set[spl_object_hash($object1)] = $object1; $set[spl_object_hash($object2)] = $object2; $set[spl_object_hash($object3)] = $object3; // Checking presence in a set isset($set[spl_object_hash($object2)]); // true isset($set[spl_object_hash($object5)]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement Store a reference of the object!

Slide 59

Slide 59 text

Sets – Using SplObjectStorage (objects) $set = new SplObjectStorage(); // Adding elements to a set $set->attach($object1); // or $set[$object1] = null; $set->attach($object2); // or $set[$object2] = null; $set->attach($object3); // or $set[$object3] = null; // Checking presence in a set isset($set[$object2]); // true isset($set[$object2]); // false $set1->addAll($set2); // union $set1->removeAllExcept($set2); // intersection $set1->removeAll($set2); // complement

Slide 60

Slide 60 text

Sets – Using QuickHash (int) ● No union/intersection/complement operations (yet?) ● Yummy features like (loadFrom|saveTo)(String|File) $set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES); // Adding elements to a set $set->add(1); $set->add(2); $set->add(3); // Checking presence in a set $set->exists(2); // true $set->exists(5); // false // Soonish: isset($set[2]);

Slide 61

Slide 61 text

Sets – Using bitsets define("E_ERROR", 1); // or 1<<0 define("E_WARNING", 2); // or 1<<1 define("E_PARSE", 4); // or 1<<2 define("E_NOTICE", 8); // or 1<<3 // Adding elements to a set $set = 0; $set |= E_ERROR; $set |= E_WARNING; $set |= E_PARSE; // Checking presence in a set $set & E_ERROR; // true $set & E_NOTICE; // false $set1 | $set2; // union $set1 & $set2; // intersection $set1 ^ $set2; // complement

Slide 62

Slide 62 text

Sets – Using bitsets (example) Instead of: function remove($path, $files = true, $directories = true, $links = true, $executable = true) { if (!$files && is_file($path)) return false; if (!$directories && is_dir($path)) return false; if (!$links && is_link($path)) return false; if (!$executable && is_executable($path)) return false; // ... } remove("/tmp/removeMe", true, false, true, false); // WTF ?!

Slide 63

Slide 63 text

Sets – Using bitsets (example) Instead of: define("REMOVE_FILES", 1 << 0); define("REMOVE_DIRS", 1 << 1); define("REMOVE_LINKS", 1 << 2); define("REMOVE_EXEC", 1 << 3); define("REMOVE_ALL", ~0); // Setting all bits function remove($path, $options = REMOVE_ALL) { if (~$options & REMOVE_FILES && is_file($path)) return false; if (~$options & REMOVE_DIRS && is_dir($path)) return false; if (~$options & REMOVE_LINKS && is_link($path)) return false; if (~$options & REMOVE_EXEC && is_executable($path)) return false; // ... } remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)

Slide 64

Slide 64 text

Sets: Conclusions ● Use the key and not the value when using PHP Arrays. ● Use QuickHash for set of integers if possible. ● Use SplObjectStorage as soon as you are playing with objects. ● Don't use array_unique() when you need a set!

Slide 65

Slide 65 text

Maps ● A map is a collection of key/value pairs where all keys are unique.

Slide 66

Slide 66 text

Maps – Using array ● Don't use array_merge() on maps. $map = array(); $map["ONE"] = 1; $map["TWO"] = 2; $map["THREE"] = 3; // Merging maps: array_merge($map1, $map2); // SLOW! $map2 + $map1; // Fast :)

Slide 67

Slide 67 text

Multikey Maps – Using array $map = array(); $map["ONE"] = 1; $map["UN"] =& $map["ONE"]; $map["UNO"] =& $map["ONE"]; $map["TWO"] = 2; $map["DEUX"] =& $map["TWO"]; $map["DUE"] =& $map["TWO"]; $map["UNO"] = "once"; $map["DEUX"] = "twice"; var_dump($map); /* array(6) { ["ONE"] => &string(4) "once" ["UN"] => &string(4) "once" ["UNO"] => &string(4) "once" ["TWO"] => &string(5) "twice" ["DEUX"] => &string(5) "twice" ["DUE"] => &string(5) "twice" } */

Slide 68

Slide 68 text

Heap ● A heap is a tree-based structure in which all elements are ordered with largest key at the top, and the smallest one as leafs.

Slide 69

Slide 69 text

Heap ● A heap is a tree-based structure in which all elements are ordered with largest key at the top, and the smallest one as leafs.

Slide 70

Slide 70 text

Heap – Using array $heap = array(); $heap[] = 3; sort($heap); $heap[] = 1; sort($heap); $heap[] = 2; sort($heap);

Slide 71

Slide 71 text

Heap – Using Spl(Min|Max)Heap $heap = new SplMinHeap; $heap->insert(3); $heap->insert(1); $heap->insert(2);

Slide 72

Slide 72 text

Heaps: Conclusions ● MUCH faster than having to re-sort() an array at every insertion. ● If you don't require a collection to be sorted at every single step and can insert all data at once and then sort(). Array is a much better/faster approach. ● SplPriorityQueue is very similar, consider it is the same as SplHeap but where the sorting is made on the key rather than the value.

Slide 73

Slide 73 text

Bloom filters ● A bloom filter is a space-efficient probabilistic data structure used to test whether an element is member of a set. ● False positives are possible, but false negatives are not!

Slide 74

Slide 74 text

Bloom filters – Using bloomy // BloomFilter::__construct(int capacity [, double error_rate [, int random_seed ] ]) $bloomFilter = new BloomFilter(10000, 0.001); $bloomFilter->add("An element"); $bloomFilter->has("An element"); // true for sure $bloomFilter->has("Foo"); // false, most probably

Slide 75

Slide 75 text

Other related projects ● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types

Slide 76

Slide 76 text

Other related projects ● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types ● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy

Slide 77

Slide 77 text

Other related projects ● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types ● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy ● Weakref: Weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.

Slide 78

Slide 78 text

Conclusions ● Use appropriate data structure. It will keep your code clean and fast.

Slide 79

Slide 79 text

Conclusions ● Use appropriate data structure. It will keep your code clean and fast. ● Think about the time and space complexity involved by your algorithms.

Slide 80

Slide 80 text

Conclusions ● Use appropriate data structure. It will keep your code clean and fast. ● Think about the time and space complexity involved by your algorithms. ● Name your variables accordingly: use “Map”, “Set”, “List”, “Queue”,... to describe them instead of using something like: $ordersArray.

Slide 81

Slide 81 text

Questions?

Slide 82

Slide 82 text

Thanks ● Don't forget to rate this talk on https://joind.in/6941

Slide 83

Slide 83 text

Photo Credits ● Tuned car: http://www.flickr.com/photos/gioxxswall/5783867752 ● London Eye Structure: http://www.flickr.com/photos/photographygal123/4883546484 ● Cigarette: http://www.flickr.com/photos/superfantastic/166215927 ● Heap structure: http://en.wikipedia.org/wiki/File:Max-Heap.svg ● Drawers: http://www.flickr.com/photos/jamesclay/2312912612 ● Stones stack: http://www.flickr.com/photos/silent_e/2282729987 ● Tree: http://www.flickr.com/photos/drewbandy/6002204996