Mastering PHP Data Structure 102

PHP NorthWest 2012 talk in Manchester about PHP Data Structures

Patrick Allaert

October 06, 2012

Transcript

  1. About me

    • Patrick Allaert
    • Founder of Libereco Technologies
    • Playing with PHP/Linux for 10+ years
    • eZ Publish core developer
    • Author of the APM PHP extension
    • @patrick_allaert
    • http://github.com/patrickallaert/
    • http://patrickallaert.blogspot.com/
  2. APM

  3. APM

  4. PHP native datatypes

    • NULL (IS_NULL)
    • Booleans (IS_BOOL)
    • Integers (IS_LONG)
    • Floating point numbers (IS_DOUBLE)
    • Strings (IS_STRING)
    • Arrays (IS_ARRAY, IS_CONSTANT_ARRAY)
    • Objects (IS_OBJECT)
    • Resources (IS_RESOURCE)
    • Callable (IS_CALLABLE)
  5. Wikipedia datatypes • 2-3-4 tree • 2-3 heap • 2-3

    tree • AA tree • Abstract syntax tree • (a,b)-tree • Adaptive k-d tree • Adjacency list • Adjacency matrix • AF-heap • Alternating decision tree • And-inverter graph • And–or tree • Array • AVL tree • Beap • Bidirectional map • Bin • Binary decision diagram • Binary heap • Binary search tree • Binary tree • Binomial heap • Bit array • Bitboard • Bit field • Bitmap • BK-tree • Bloom filter • Boolean • Bounding interval hierarchy • B sharp tree • BSP tree • B-tree • B*-tree • B+ tree • B-trie • Bx-tree • Cartesian tree • Char • Circular buffer • Compressed suffix array • Container • Control table • Cover tree • Ctrie • Dancing tree • D-ary heap • Decision tree • Deque • Directed acyclic graph • Directed graph • Disjoint-set • Distributed hash table • Double • Doubly connected edge list • Doubly linked list • Dynamic array • Enfilade • Enumerated type • Expectiminimax tree • Exponential tree • Fenwick tree • Fibonacci heap • Finger tree • Float • FM-index • Fusion tree • Gap buffer • Generalised suffix tree • Graph • Graph-structured stack • Hash • Hash array mapped trie • Hashed array tree • Hash list • Hash table • Hash tree • Hash trie • Heap • Heightmap • Hilbert R-tree • Hypergraph • Iliffe vector • Image • Implicit kd-tree • Interval tree • Int • Judy array • Kdb tree • Kd-tree • Koorde • Leftist heap • Lightmap • Linear octree • Link/cut tree • Linked list • Lookup table • Map/Associative array/Dictionary • Matrix • Metric tree • Minimax tree • Min/max kd-tree • M-tree • Multigraph • Multimap • Multiset • Octree • Pagoda • Pairing heap • Parallel array • Parse tree • Plain old data structure • Prefix hash tree • Priority queue • Propositional directed acyclic graph • Quad-edge • Quadtree • Queap • Queue • Radix tree • Randomized binary search tree • Range tree • Rapidly-exploring random tree • Record (also called tuple or struct) • Red-black tree • Rope • Routing table • R-tree • R* tree • R+ tree • Scapegoat tree • Scene graph • Segment tree • 
Self-balancing binary search tree • Self-organizing list • Set • Skew heap • Skip list • Soft heap • Sorted array • Spaghetti stack • Sparse array • Sparse matrix • Splay tree • SPQR-tree • Stack • String • Suffix array • Suffix tree • Symbol table • Syntax tree • Tagged union (variant record, discriminated union, disjoint union) • Tango tree • Ternary heap • Ternary search tree • Threaded binary tree • Top tree • Treap • Tree • Trees • Trie • T-tree • UB-tree • Union • Unrolled linked list • Van Emde Boas tree • Variable-length array • VList • VP-tree • Weight-balanced tree • Winged edge • X-fast trie • Xor linked list • X-tree • Y-fast trie • Zero suppressed decision diagram • Zipper • Z-order
  6. Array: PHP's untruthfulness

    PHP “Arrays” are not true Arrays! An array typically looks like this:
    [Figure: six contiguous Data cells, indexed 0 to 5]
  8. Array: PHP's untruthfulness

    PHP “Arrays” can grow dynamically and be iterated in both directions (reset(), next(), prev(), end()), exclusively with O(1) operations. Let's have a Doubly Linked List (DLL):
    [Figure: five Data nodes linked in both directions, with Head and Tail pointers]
    Enables List, Deque, Queue and Stack implementations.
  9. Array: PHP's untruthfulness

    Elements of PHP “Arrays” are always accessible using a key (index). Let's have a Hash Table:
    [Figure: the doubly linked list of Data nodes (Head, Tail), each node wrapped in a Bucket; a Bucket pointers array (Bucket * 0, Bucket * 1, ... Bucket * nTableSize - 1) points to the Buckets]
  10. Array: PHP's untruthfulness

    http://php.net/manual/en/language.types.array.php: “This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”
  11. Array: PHP's untruthfulness

    • In C: 100 000 integers (using long on 64 bits => 8 bytes each) can be stored in 0.76 MB.
    • In PHP: it will take 13.97 MB!
    • A PHP variable (containing an integer) takes 48 bytes.
    • The bucket overhead for each “array” entry is about 96 bytes.
    • More details: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
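The figures above were measured on the PHP versions current at the time of the talk; other versions and platforms will give different numbers. A minimal sketch for measuring the cost on your own setup (the variable names are only illustrative):

```php
<?php
// Rough measurement: the result depends on the PHP version and platform.
$count = 100000;

$before = memory_get_usage();
$numbers = array();
for ($i = 0; $i < $count; $i++) {
    $numbers[] = $i;
}
$used = memory_get_usage() - $before;

// For comparison, a raw C array of 8-byte longs needs $count * 8 bytes.
printf(
    "%d integers: %.2f MB, about %.1f bytes per element\n",
    $count,
    $used / (1024 * 1024),
    $used / $count
);
```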
  12. Structs (or records, tuples, ...)

    • A struct is a value containing other values, which are typically accessed using a name.
    • Examples:
      Person => firstName / lastName
      ComplexNumber => realPart / imaginaryPart
  13. Structs – Using a class (Implementation)

    class PersonStruct {
        public $firstName;
        public $lastName;

        public function __construct($firstName, $lastName) {
            $this->firstName = $firstName;
            $this->lastName = $lastName;
        }
    }
  14. Structs – Using a class (Implementation)

    class PersonStruct {
        public $firstName;
        public $lastName;

        public function __construct($firstName, $lastName) {
            $this->firstName = $firstName;
            $this->lastName = $lastName;
        }

        // Called when code writes to a property that is not declared above
        public function __set($key, $value) {
            // a. Do nothing
            // b. trigger_error()
            // c. Throw an exception
        }
    }
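As a sketch of option (c), the variant below rejects writes to undeclared properties. The choice of OutOfBoundsException and the sample names are assumptions made for illustration, not something the slides prescribe.

```php
<?php
class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    // __set() only runs for undeclared or inaccessible properties,
    // so writes to $firstName and $lastName are unaffected.
    public function __set($key, $value)
    {
        throw new OutOfBoundsException("Unknown property: $key");
    }
}

$person = new PersonStruct("Ada", "Lovelace");
$person->lastName = "Byron"; // declared property: allowed

try {
    $person->middleName = "King"; // undeclared property: rejected
    $rejected = false;
} catch (OutOfBoundsException $e) {
    $rejected = true;
}
```

This keeps the structure rigid: a typo in a property name fails loudly instead of silently creating a new property.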
  15. Structs – Pros and Cons

    Array
    + Uses less memory (PHP < 5.4)
    - Uses more memory (PHP = 5.4)
    - No type hinting
    - Flexible structure
    +|- Less OO
    Slightly faster?

    Class
    - Uses more memory (PHP < 5.4)
    + Uses less memory (PHP = 5.4)
    + Type hinting possible
    + Rigid structure
    +|- More OO
    Slightly slower?
  17. (true) Arrays

    • An array is a fixed-size collection in which each element is identified by a numeric index.
    [Figure: six contiguous Data cells, indexed 0 to 5]
  18. (true) Arrays – Using SplFixedArray

    $array = new SplFixedArray(3);
    $array[0] = 1; // or $array->offsetSet()
    $array[1] = 2; // or $array->offsetSet()
    $array[2] = 3; // or $array->offsetSet()
    $array[0]; // gives 1
    $array[1]; // gives 2
    $array[2]; // gives 3
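Unlike a PHP array, an SplFixedArray does not grow on demand. A short sketch of what "fixed size" means in practice, using only standard SplFixedArray methods:

```php
<?php
$array = new SplFixedArray(3);
$array[0] = 1;
$array[1] = 2;
$array[2] = 3;

// Writing past the end does not grow the array; it throws instead.
try {
    $array[3] = 4;
    $outOfRange = false;
} catch (RuntimeException $e) {
    $outOfRange = true;
}

// Growing has to be requested explicitly.
$array->setSize(4);
$array[3] = 4;

// Convert back to a regular PHP array when needed.
$plain = $array->toArray();
```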
  19. (true) Arrays – Pros and Cons

    Array
    - Uses more memory
    +|- Less OO

    SplFixedArray
    + Uses less memory
    +|- More OO
  21. Queues

    • A queue is an ordered collection respecting First In, First Out (FIFO) order.
    • Elements are inserted at one end and removed at the other.
    [Figure: a row of Data cells; Enqueue at one end, Dequeue at the other]
  22. Queues – Using array

    $queue = array();
    $queue[] = 1; // or array_push()
    $queue[] = 2; // or array_push()
    $queue[] = 3; // or array_push()
    array_shift($queue); // gives 1
    array_shift($queue); // gives 2
    array_shift($queue); // gives 3
  23. Queues – Using SplQueue

    $queue = new SplQueue();
    $queue[] = 1; // or $queue->enqueue()
    $queue[] = 2; // or $queue->enqueue()
    $queue[] = 3; // or $queue->enqueue()
    $queue->dequeue(); // gives 1
    $queue->dequeue(); // gives 2
    $queue->dequeue(); // gives 3
  25. Stacks

    • A stack is an ordered collection respecting Last In, First Out (LIFO) order.
    • Elements are inserted and removed at the same end.
    [Figure: a pile of Data cells; Push and Pop both happen at the top]
  26. Stacks – Using array

    $stack = array();
    $stack[] = 1; // or array_push()
    $stack[] = 2; // or array_push()
    $stack[] = 3; // or array_push()
    array_pop($stack); // gives 3
    array_pop($stack); // gives 2
    array_pop($stack); // gives 1
  27. Stacks – Using SplStack

    $stack = new SplStack();
    $stack[] = 1; // or $stack->push()
    $stack[] = 2; // or $stack->push()
    $stack[] = 3; // or $stack->push()
    $stack->pop(); // gives 3
    $stack->pop(); // gives 2
    $stack->pop(); // gives 1
  28. Queues/Stacks – Pros and Cons

    Array
    - Uses more memory (overhead / entry: 96 bytes)
    - No type hinting
    +|- Less OO

    SplQueue / SplStack
    + Uses less memory (overhead / entry: 48 bytes)
    + Type hinting possible
    +|- More OO
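To illustrate the "type hinting possible" point, here is a small sketch. The function name drain() and the job strings are hypothetical; only the SplQueue methods are real API.

```php
<?php
// Hypothetical helper: the SplQueue type hint documents, and enforces,
// that callers pass a FIFO queue rather than an arbitrary array.
function drain(SplQueue $jobs)
{
    $processed = array();
    while (!$jobs->isEmpty()) {
        $processed[] = $jobs->dequeue();
    }
    return $processed;
}

$jobs = new SplQueue();
$jobs->enqueue("first");
$jobs->enqueue("second");
$jobs->enqueue("third");

$order = drain($jobs); // elements come out in insertion order
```

With a plain array, the same function could not tell a queue from a map or a set at the call boundary.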
  30. Sets

    • A set is a collection with no particular ordering. It is especially suited to testing whether a value is a member of the collection and to performing union/intersection/complement operations between sets.
    [Figure: five unordered Data items]
  32. Sets – Using array

    $set = array();

    // Adding elements to a set
    $set[] = 1;
    $set[] = 2;
    $set[] = 3;

    // Checking presence in a set
    in_array(2, $set); // true
    in_array(5, $set); // false

    array_merge($set1, $set2); // union
    array_intersect($set1, $set2); // intersection
    array_diff($set1, $set2); // complement

    True performance killers!
  33. Sets – Mis-usage

    if ($value === "val1" || $value === "val2" || $value === "val3") {
        // ...
    }
  35. Sets – Using array (simple types)

    • Remember that PHP Array keys can be integers or strings only!

    $set = array();

    // Adding elements to a set
    // Any dummy value works except NULL: isset() returns false for NULL values
    $set[1] = true;
    $set[2] = true;
    $set[3] = true;

    // Checking presence in a set
    isset($set[2]); // true
    isset($set[5]); // false

    $set1 + $set2; // union
    array_intersect_key($set1, $set2); // intersection
    array_diff_key($set1, $set2); // complement
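A small worked example of the key-based approach, including the key coercion that the "integers or strings only" remark hints at. The sample values are arbitrary.

```php
<?php
$set = array();
$set[1] = true;
$set["1"] = true;   // a numeric string key is cast to int 1: same element
$set["one"] = true;

$size = count($set);        // 2, not 3
$hasOne = isset($set[1]);   // true

$other = array("one" => true, 7 => true);

$union = $set + $other;                            // keys 1, "one", 7
$intersection = array_intersect_key($set, $other); // key "one"
$difference = array_diff_key($set, $other);        // key 1
```

Because membership tests use the hash table directly, isset() stays O(1) however large the set grows, whereas in_array() scans every element.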
  37. Sets – Using array (objects)

    $set = array();

    // Adding elements to a set
    $set[spl_object_hash($object1)] = $object1;
    $set[spl_object_hash($object2)] = $object2;
    $set[spl_object_hash($object3)] = $object3;

    // Checking presence in a set
    isset($set[spl_object_hash($object2)]); // true
    isset($set[spl_object_hash($object5)]); // false

    $set1 + $set2; // union
    array_intersect_key($set1, $set2); // intersection
    array_diff_key($set1, $set2); // complement

    Store a reference to the object!
  38. Sets – Using SplObjectStorage (objects)

    $set = new SplObjectStorage();

    // Adding elements to a set
    $set->attach($object1); // or $set[$object1] = null;
    $set->attach($object2); // or $set[$object2] = null;
    $set->attach($object3); // or $set[$object3] = null;

    // Checking presence in a set
    isset($set[$object2]); // true
    isset($set[$object5]); // false

    $set1->addAll($set2); // union
    $set1->removeAllExcept($set2); // intersection
    $set1->removeAll($set2); // complement
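Note that addAll(), removeAllExcept() and removeAll() modify the storage they are called on. A sketch of computing the three results without touching the operands, by working on clones (the stdClass objects are placeholders, and the array-access form is used to add elements):

```php
<?php
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

$set1 = new SplObjectStorage();
$set1[$a] = null;
$set1[$b] = null;

$set2 = new SplObjectStorage();
$set2[$b] = null;
$set2[$c] = null;

// Each operation runs on a copy, so $set1 and $set2 stay intact.
$union = clone $set1;
$union->addAll($set2);                  // a, b, c

$intersection = clone $set1;
$intersection->removeAllExcept($set2);  // b

$difference = clone $set1;
$difference->removeAll($set2);          // a
```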
  39. Sets – Using QuickHash (int)

    • No union/intersection/complement operations (yet?)
    • Yummy features like (loadFrom|saveTo)(String|File)

    $set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES);

    // Adding elements to a set
    $set->add(1);
    $set->add(2);
    $set->add(3);

    // Checking presence in a set
    $set->exists(2); // true
    $set->exists(5); // false

    // Soonish: isset($set[2]);
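  40. Sets – Using bitsets

    // PHP already predefines these error constants; the values are shown for illustration
    define("E_ERROR", 1); // or 1<<0
    define("E_WARNING", 2); // or 1<<1
    define("E_PARSE", 4); // or 1<<2
    define("E_NOTICE", 8); // or 1<<3

    // Adding elements to a set
    $set = 0;
    $set |= E_ERROR;
    $set |= E_WARNING;
    $set |= E_PARSE;

    // Checking presence in a set
    $set & E_ERROR; // non-zero, so true
    $set & E_NOTICE; // 0, so false

    $set1 | $set2; // union
    $set1 & $set2; // intersection
    $set1 & ~$set2; // complement (elements of $set1 not in $set2)
    $set1 ^ $set2; // symmetric difference (elements in exactly one set)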
  41. Sets – Using bitsets (example)

    Instead of:

    function remove($path, $files = true, $directories = true, $links = true, $executable = true)
    {
        if (!$files && is_file($path)) return false;
        if (!$directories && is_dir($path)) return false;
        if (!$links && is_link($path)) return false;
        if (!$executable && is_executable($path)) return false;
        // ...
    }

    remove("/tmp/removeMe", true, false, true, false); // WTF ?!
  42. Sets – Using bitsets (example)

    Use:

    define("REMOVE_FILES", 1 << 0);
    define("REMOVE_DIRS", 1 << 1);
    define("REMOVE_LINKS", 1 << 2);
    define("REMOVE_EXEC", 1 << 3);
    define("REMOVE_ALL", ~0); // Setting all bits

    function remove($path, $options = REMOVE_ALL)
    {
        // ~$options & FLAG is non-zero when FLAG is NOT set in $options
        if (~$options & REMOVE_FILES && is_file($path)) return false;
        if (~$options & REMOVE_DIRS && is_dir($path)) return false;
        if (~$options & REMOVE_LINKS && is_link($path)) return false;
        if (~$options & REMOVE_EXEC && is_executable($path)) return false;
        // ...
    }

    remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)
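The flag manipulations used with such option sets can be summarised in a short sketch. It reuses the REMOVE_* names from the slide; the variable names are illustrative.

```php
<?php
define("REMOVE_FILES", 1 << 0);
define("REMOVE_DIRS", 1 << 1);
define("REMOVE_LINKS", 1 << 2);
define("REMOVE_EXEC", 1 << 3);

$options = REMOVE_FILES | REMOVE_LINKS;

// Test a flag: compare against 0 to get a real boolean.
$removesFiles = ($options & REMOVE_FILES) !== 0; // true
$removesDirs = ($options & REMOVE_DIRS) !== 0;   // false

$options |= REMOVE_DIRS;    // add a flag
$options &= ~REMOVE_LINKS;  // clear a flag

$final = $options; // REMOVE_FILES | REMOVE_DIRS
```

A whole set of options travels as a single integer, so adding a new option later does not change the function signature.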
  43. Sets: Conclusions

    • Use the key and not the value when using PHP Arrays.
    • Use QuickHash for sets of integers if possible.
    • Use SplObjectStorage as soon as you are playing with objects.
    • Don't use array_unique() when you need a set!
  44. Maps – Using array

    • Don't use array_merge() on maps.

    $map = array();
    $map["ONE"] = 1;
    $map["TWO"] = 2;
    $map["THREE"] = 3;

    // Merging maps:
    array_merge($map1, $map2); // SLOW!
    $map2 + $map1; // Fast :)
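Speed aside, the two operations are not interchangeable in every case. The sketch below (with made-up sample data) shows why the operands are swapped for the + operator, and how the results diverge once integer keys are involved.

```php
<?php
$defaults = array("host" => "localhost", "port" => 3306);
$overrides = array("port" => 3307);

// With string keys both produce the same key/value pairs:
$merged = array_merge($defaults, $overrides); // right-hand values win
$summed = $overrides + $defaults;             // left-hand values win

// With integer keys they differ: array_merge() renumbers and appends,
// while + keeps the keys and skips those already present on the left.
$byId1 = array(10 => "ten");
$byId2 = array(10 => "dix", 20 => "vingt");

$mergedIds = array_merge($byId1, $byId2);
$summedIds = $byId1 + $byId2;
```

Here $mergedIds ends up with keys 0, 1 and 2, while $summedIds keeps keys 10 and 20 and the value "ten".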
  45. Multikey Maps – Using array

    $map = array();
    $map["ONE"] = 1;
    $map["UN"] =& $map["ONE"];
    $map["UNO"] =& $map["ONE"];
    $map["TWO"] = 2;
    $map["DEUX"] =& $map["TWO"];
    $map["DUE"] =& $map["TWO"];

    $map["UNO"] = "once";
    $map["DEUX"] = "twice";

    var_dump($map);
    /*
    array(6) {
      ["ONE"] => &string(4) "once"
      ["UN"] => &string(4) "once"
      ["UNO"] => &string(4) "once"
      ["TWO"] => &string(5) "twice"
      ["DEUX"] => &string(5) "twice"
      ["DUE"] => &string(5) "twice"
    }
    */
  47. Heap

    • A heap is a tree-based structure in which all elements are ordered, with the largest key at the top and the smallest ones as leaves.
  48. Heap – Using array

    $heap = array();
    $heap[] = 3;
    sort($heap);
    $heap[] = 1;
    sort($heap);
    $heap[] = 2;
    sort($heap);
  49. Heaps: Conclusions

    • SplHeap is MUCH faster than having to re-sort() an array at every insertion.
    • If you don't need the collection to be sorted at every single step, and can insert all the data at once and then sort() it, an array is a much better/faster approach.
    • SplPriorityQueue is very similar: consider it the same as SplHeap, except that the ordering is based on the key (the priority) rather than on the value.
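The transcript has no SPL heap example, so here is a minimal sketch using the built-in SplMinHeap (chosen to mirror the ascending sort() of the array version; SplMaxHeap keeps the largest element on top instead) and SplPriorityQueue. The sample values are arbitrary.

```php
<?php
$heap = new SplMinHeap();
$heap->insert(3);
$heap->insert(1);
$heap->insert(2);

$top = $heap->top(); // smallest element, left in place

// Extracting repeatedly yields the elements in ascending order.
$sorted = array();
while (!$heap->isEmpty()) {
    $sorted[] = $heap->extract();
}

// SplPriorityQueue orders by a separate priority, highest first.
$queue = new SplPriorityQueue();
$queue->insert("low", 1);
$queue->insert("high", 10);
$first = $queue->extract();
```

No explicit sort() call is needed: each insert() and extract() restores the heap order in logarithmic time.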
  50. Bloom filters

    • A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.
    • False positives are possible, but false negatives are not!
  51. Bloom filters – Using bloomy

    // BloomFilter::__construct(int capacity [, double error_rate [, int random_seed ] ])
    $bloomFilter = new BloomFilter(10000, 0.001);
    $bloomFilter->add("An element");
    $bloomFilter->has("An element"); // true for sure
    $bloomFilter->has("Foo"); // false, most probably
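To show why false positives can occur but false negatives cannot, here is a toy pure-PHP sketch of the idea. It is not how the bloomy extension is implemented: the class name TinyBloomFilter, the md5-based hashing and the default sizes are all assumptions made for illustration, and storing the bits in a PHP array forfeits the space efficiency of a real implementation.

```php
<?php
// Hypothetical illustration only, not the bloomy extension.
class TinyBloomFilter
{
    private $bits;
    private $size;
    private $hashCount;

    public function __construct($size = 1024, $hashCount = 3)
    {
        $this->size = $size;
        $this->hashCount = $hashCount;
        $this->bits = array_fill(0, $size, false);
    }

    // Derive several bit positions by salting the value with an index.
    private function positions($value)
    {
        $positions = array();
        for ($i = 0; $i < $this->hashCount; $i++) {
            $hash = hexdec(substr(md5($i . ":" . $value), 0, 7));
            $positions[] = $hash % $this->size;
        }
        return $positions;
    }

    public function add($value)
    {
        foreach ($this->positions($value) as $position) {
            $this->bits[$position] = true;
        }
    }

    public function has($value)
    {
        foreach ($this->positions($value) as $position) {
            if (!$this->bits[$position]) {
                return false; // a bit is unset: definitely never added
            }
        }
        return true; // all bits set: added, or a collision (false positive)
    }
}

$filter = new TinyBloomFilter();
$emptyHasFoo = $filter->has("Foo");   // nothing added yet
$filter->add("An element");
$hasElement = $filter->has("An element");
```

An added element always finds its own bits set, so has() can never wrongly answer false; a different element may happen to hit bits set by others, which is the source of false positives.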
  54. Other related projects

    • SPL Types: various types implemented as objects: SplInt, SplFloat, SplEnum, SplBool and SplString. http://pecl.php.net/package/SPL_Types
    • Judy: sparse dynamic arrays implementation. http://pecl.php.net/package/Judy
    • Weakref: weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.
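The weak reference behaviour described above can be sketched with the WeakReference class that is built into PHP 7.4 and later. This is a different API from the PECL Weakref extension named on the slide, used here only because it demonstrates the same concept.

```php
<?php
// Requires PHP 7.4+; this is the built-in class, not the PECL extension.
$object = new stdClass();
$ref = WeakReference::create($object);

$aliveBefore = ($ref->get() === $object); // the object still exists

unset($object); // drop the only strong reference

$goneAfter = ($ref->get() === null); // the weak reference did not keep it alive
```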
  56. Conclusions

    • Use the appropriate data structure. It will keep your code clean and fast.
    • Think about the time and space complexity of your algorithms.
    • Name your variables accordingly: use “Map”, “Set”, “List”, “Queue”, ... to describe them instead of something like $ordersArray.
  57. Photo Credits

    • Tuned car: http://www.flickr.com/photos/gioxxswall/5783867752
    • London Eye Structure: http://www.flickr.com/photos/photographygal123/4883546484
    • Cigarette: http://www.flickr.com/photos/superfantastic/166215927
    • Heap structure: http://en.wikipedia.org/wiki/File:Max-Heap.svg
    • Drawers: http://www.flickr.com/photos/jamesclay/2312912612
    • Stones stack: http://www.flickr.com/photos/silent_e/2282729987
    • Tree: http://www.flickr.com/photos/drewbandy/6002204996