About me ● Patrick Allaert ● Founder of Libereco Technologies ● Playing with PHP/Linux for +10 years ● eZ Publish core developer ● Author of the APM PHP extension ● @patrick_allaert ● [email protected] ● http://github.com/patrickallaert/ ● http://patrickallaert.blogspot.com/
Array: PHP's untruthfulness PHP “Arrays” can dynamically grow and be iterated both directions (reset(), next(), prev(), end()), exclusively with O(1) operations.
Array: PHP's untruthfulness PHP “Arrays” can dynamically grow and be iterated both directions (reset(), next(), prev(), end()), exclusively with O(1) operations. Let's have a Doubly Linked List (DLL): Data Data Data Data Data Head Tail Enables List, Deque, Queue and Stack implementations
Array: PHP's untruthfulness PHP “Arrays” elements are always accessible using a key (index). Let's have an Hash Table: Data Data Data Data Data Head Tail Bucket Bucket Bucket Bucket Bucket Bucket pointers array Bucket * 0 Bucket * 1 Bucket * 2 Bucket * 3 Bucket * 4 Bucket * 5 ... Bucket * nTableSize -1
Array: PHP's untruthfulness http://php.net/manual/en/language.types.array.php: “This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”
Array: PHP's untruthfulness ● In C: 100 000 integers (using long on 64bits => 8 bytes) can be stored in 0.76 Mb. ● In PHP: it will take 13.97 Mb! ≅ ● A PHP variable (containing an integer) takes 48 bytes. ● The overhead of buckets for every “array” entries is about 96 bytes. ● More details: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
Structs (or records, tuples,...) ● A struct is a value containing other values which are typically accessed using a name. ● Example: Person => firstName / lastName ComplexNumber => realPart / imaginaryPart
Structs – Using a class (Implementation) class PersonStruct { public $firstName; public $lastName; public function __construct($firstName, $lastName) { $this->firstName = $firstName; $this->lastName = $lastName; } }
Structs – Using a class (Implementation) class PersonStruct { public $firstName; public $lastName; public function __construct($firstName, $lastName) { $this->firstName = $firstName; $this->lastName = $lastName; } public function __set($key, $value) { // a. Do nothing // b. trigger_error() // c. Throws an exception } }
Structs – Pros and Cons Array + Uses less memory (PHP < 5.4) - Uses more memory (PHP = 5.4) - No type hinting - Flexible structure +|- Less OO Slightly faster? Class - Uses more memory (PHP < 5.4) + Uses less memory (PHP = 5.4) + Type hinting possible + Rigid structure +|- More OO Slightly slower?
Queues ● A queue is an ordered collection respecting First In, First Out (FIFO) order. ● Elements are inserted at one end and removed at the other. Data Data Data Data Data Data Data Data Enqueue Dequeue
Stacks ● A stack is an ordered collection respecting Last In, First Out (LIFO) order. ● Elements are inserted and removed on the same end. Data Data Data Data Data Data Data Data Push Pop
Queues/Stacks – Pros and Cons Array - Uses more memory (overhead / entry: 96 bytes) - No type hinting +|- Less OO SplQueue / SplStack + Uses less memory (overhead / entry: 48 bytes) + Type hinting possible +|- More OO
Sets ● A set is a collection with no particular ordering especially suited for testing the membership of a value against a collection or to perform union/intersection/complement operations between them.
Sets ● A set is a collection with no particular ordering especially suited for testing the membership of a value against a collection or to perform union/intersection/complement operations between them. Data Data Data Data Data
Sets – Using array (simple types) $set = array(); // Adding elements to a set $set[1] = true; // Any dummy value $set[2] = true; // is good but NULL! $set[3] = true; // Checking presence in a set isset($set[2]); // true isset($set[5]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement
Sets – Using array (simple types) ● Remember that PHP Array keys can be integers or strings only! $set = array(); // Adding elements to a set $set[1] = true; // Any dummy value $set[2] = true; // is good but NULL! $set[3] = true; // Checking presence in a set isset($set[2]); // true isset($set[5]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement
Sets – Using array (objects) $set = array(); // Adding elements to a set $set[spl_object_hash($object1)] = $object1; $set[spl_object_hash($object2)] = $object2; $set[spl_object_hash($object3)] = $object3; // Checking presence in a set isset($set[spl_object_hash($object2)]); // true isset($set[spl_object_hash($object5)]); // false $set1 + $set2; // union array_intersect_key($set1, $set2); // intersection array_diff_key($set1, $set2); // complement Store a reference of the object!
Sets – Using SplObjectStorage (objects) $set = new SplObjectStorage(); // Adding elements to a set $set->attach($object1); // or $set[$object1] = null; $set->attach($object2); // or $set[$object2] = null; $set->attach($object3); // or $set[$object3] = null; // Checking presence in a set isset($set[$object2]); // true isset($set[$object2]); // false $set1->addAll($set2); // union $set1->removeAllExcept($set2); // intersection $set1->removeAll($set2); // complement
Sets – Using QuickHash (int) ● No union/intersection/complement operations (yet?) ● Yummy features like (loadFrom|saveTo)(String|File) $set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES); // Adding elements to a set $set->add(1); $set->add(2); $set->add(3); // Checking presence in a set $set->exists(2); // true $set->exists(5); // false // Soonish: isset($set[2]);
Sets: Conclusions ● Use the key and not the value when using PHP Arrays. ● Use QuickHash for set of integers if possible. ● Use SplObjectStorage as soon as you are playing with objects. ● Don't use array_unique() when you need a set!
Heaps: Conclusions ● MUCH faster than having to re-sort() an array at every insertion. ● If you don't require a collection to be sorted at every single step and can insert all data at once and then sort(). Array is a much better/faster approach. ● SplPriorityQueue is very similar, consider it is the same as SplHeap but where the sorting is made on the key rather than the value.
Bloom filters ● A bloom filter is a space-efficient probabilistic data structure used to test whether an element is member of a set. ● False positives are possible, but false negatives are not!
Other related projects ● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types
Other related projects ● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types ● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy
Other related projects ● SPL Types: Various types implemented as object: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types ● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy ● Weakref: Weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.
Conclusions ● Use appropriate data structure. It will keep your code clean and fast. ● Think about the time and space complexity involved by your algorithms.
Conclusions ● Use appropriate data structure. It will keep your code clean and fast. ● Think about the time and space complexity involved by your algorithms. ● Name your variables accordingly: use “Map”, “Set”, “List”, “Queue”,... to describe them instead of using something like: $ordersArray.