Mastering PHP Data Structure 102

PHP NorthWest 2012 talk in Manchester about PHP Data Structures

Patrick Allaert

October 06, 2012

Transcript

  1. Mastering PHP Data Structure 102
    Patrick Allaert
    PHPNW 2012 Manchester, United Kingdom

  2. About me

    Patrick Allaert

    Founder of Libereco Technologies

    Playing with PHP/Linux for 10+ years

    eZ Publish core developer

    Author of the APM PHP extension

    @patrick_allaert

    [email protected]

    http://github.com/patrickallaert/

    http://patrickallaert.blogspot.com/

  3. PHP native datatypes

    NULL (IS_NULL)

    Booleans (IS_BOOL)

    Integers (IS_LONG)

    Floating point numbers
    (IS_DOUBLE)

    Strings (IS_STRING)

    Arrays (IS_ARRAY,
    IS_CONSTANT_ARRAY)

    Objects (IS_OBJECT)

    Resources (IS_RESOURCE)

    Callable (IS_CALLABLE)

  4. Wikipedia datatypes

    2-3-4 tree

    2-3 heap

    2-3 tree

    AA tree

    Abstract syntax tree

    (a,b)-tree

    Adaptive k-d tree

    Adjacency list

    Adjacency matrix

    AF-heap

    Alternating decision
    tree

    And-inverter graph

    And–or tree

    Array

    AVL tree

    Beap

    Bidirectional map

    Bin

    Binary decision
    diagram

    Binary heap

    Binary search tree

    Binary tree

    Binomial heap

    Bit array

    Bitboard

    Bit field

    Bitmap

    BK-tree

    Bloom filter

    Boolean

    Bounding interval
    hierarchy

    B sharp tree

    BSP tree

    B-tree

    B*-tree

    B+ tree

    B-trie

    Bx-tree

    Cartesian tree

    Char

    Circular buffer

    Compressed suffix
    array

    Container

    Control table

    Cover tree

    Ctrie

    Dancing tree

    D-ary heap

    Decision tree

    Deque

    Directed acyclic
    graph

    Directed graph

    Disjoint-set

    Distributed hash
    table

    Double

    Doubly connected
    edge list

    Doubly linked list

    Dynamic array

    Enfilade

    Enumerated type

    Expectiminimax tree

    Exponential tree

    Fenwick tree

    Fibonacci heap

    Finger tree

    Float

    FM-index

    Fusion tree

    Gap buffer

    Generalised suffix
    tree

    Graph

    Graph-structured
    stack

    Hash

    Hash array mapped
    trie

    Hashed array tree

    Hash list

    Hash table

    Hash tree

    Hash trie

    Heap

    Heightmap

    Hilbert R-tree

    Hypergraph

    Iliffe vector

    Image

    Implicit kd-tree

    Interval tree

    Int

    Judy array

    Kdb tree

    Kd-tree

    Koorde

    Leftist heap

    Lightmap

    Linear octree

    Link/cut tree

    Linked list

    Lookup table

    Map/Associative
    array/Dictionary

    Matrix

    Metric tree

    Minimax tree

    Min/max kd-tree

    M-tree

    Multigraph

    Multimap

    Multiset

    Octree

    Pagoda

    Pairing heap

    Parallel array

    Parse tree

    Plain old data
    structure

    Prefix hash tree

    Priority queue

    Propositional
    directed acyclic
    graph

    Quad-edge

    Quadtree

    Queap

    Queue

    Radix tree

    Randomized binary
    search tree

    Range tree

    Rapidly-exploring
    random tree

    Record (also called
    tuple or struct)

    Red-black tree

    Rope

    Routing table

    R-tree

    R* tree

    R+ tree

    Scapegoat tree

    Scene graph

    Segment tree

    Self-balancing
    binary search tree

    Self-organizing list

    Set

    Skew heap

    Skip list

    Soft heap

    Sorted array

    Spaghetti stack

    Sparse array

    Sparse matrix

    Splay tree

    SPQR-tree

    Stack

    String

    Suffix array

    Suffix tree

    Symbol table

    Syntax tree

    Tagged union (variant
    record, discriminated
    union, disjoint union)

    Tango tree

    Ternary heap

    Ternary search tree

    Threaded binary tree

    Top tree

    Treap

    Tree

    Trees

    Trie

    T-tree

    UB-tree

    Union

    Unrolled linked list

    Van Emde Boas tree

    Variable-length array

    VList

    VP-tree

    Weight-balanced tree

    Winged edge

    X-fast trie

    Xor linked list

    X-tree

    Y-fast trie

    Zero suppressed
    decision diagram

    Zipper

    Z-order

  5. Game:
    Can you recognize some structures?

  6. Array: PHP's untruthfulness
    PHP “Arrays” are not true Arrays!

  7. Array: PHP's untruthfulness
    PHP “Arrays” are not true Arrays!
    An array typically looks like this:
    [Diagram: six contiguous “Data” cells, indexed 0 to 5]

  8. Array: PHP's untruthfulness
    PHP “Arrays” can dynamically grow and be iterated
    in both directions (reset(), next(), prev(), end()),
    exclusively with O(1) operations.

  9. Array: PHP's untruthfulness
    PHP “Arrays” can dynamically grow and be iterated
    in both directions (reset(), next(), prev(), end()),
    exclusively with O(1) operations.
    Let's have a Doubly Linked List (DLL):
    [Diagram: five “Data” nodes linked between a Head and a Tail pointer]
    Enables List, Deque, Queue and Stack
    implementations
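
    PHP's SPL ships a doubly linked list class, SplDoublyLinkedList. As a minimal sketch (the values are made up for illustration), the same structure can be consumed from either end, which is what makes queue and stack behaviour possible:

```php
<?php
$list = new SplDoublyLinkedList();
$list->push("b");        // append at the tail
$list->push("c");        // append at the tail
$list->unshift("a");     // prepend at the head

$first = $list->shift(); // remove from the head: "a"
$last  = $list->pop();   // remove from the tail: "c"
$remaining = count($list); // only "b" is left
```

    Removing from the head gives queue (FIFO) behaviour, removing from the tail gives stack (LIFO) behaviour.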

  10. Array: PHP's untruthfulness
    PHP “Array” elements are always accessible using a
    key (index).

  11. Array: PHP's untruthfulness
    PHP “Array” elements are always accessible using a
    key (index).
    Let's have a Hash Table:
    [Diagram: a bucket pointers array with “Bucket *” slots 0 to nTableSize - 1; five Buckets, each holding Data, linked between a Head and a Tail pointer]

  12. Array: PHP's untruthfulness
    http://php.net/manual/en/language.types.array.php:
    “This type is optimized for several
    different uses; it can be treated as an
    array, list (vector), hash table (an
    implementation of a map),
    dictionary, collection, stack, queue,
    and probably more.”

  13. Optimized for anything ≈ Optimized for nothing!

  14. Array: PHP's untruthfulness

    In C: 100 000 integers (using long on 64-bit => 8
    bytes each) can be stored in 0.76 MB.

    In PHP: it will take 13.97 MB!

    A PHP variable (containing an integer) takes 48
    bytes.

    The overhead of buckets for every “array” entry is
    about 96 bytes.

    More details:
    http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
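
    The figures above date from the PHP 5.3/5.4 era; PHP 7 reworked the array implementation, so expect different numbers on a current interpreter. A rough sketch for measuring it yourself follows. The helper name measureBytes() is hypothetical, and memory_get_usage() only gives an approximation:

```php
<?php
// Returns the approximate number of bytes allocated while $build
// creates a container holding $count integers.
function measureBytes($build, $count)
{
    $before = memory_get_usage();
    $container = $build($count);
    $after = memory_get_usage();
    unset($container);
    return $after - $before;
}

$count = 100000;

$arrayBytes = measureBytes(function ($n) {
    $a = array();
    for ($i = 0; $i < $n; $i++) {
        $a[] = $i;
    }
    return $a;
}, $count);

$fixedBytes = measureBytes(function ($n) {
    $a = new SplFixedArray($n);
    for ($i = 0; $i < $n; $i++) {
        $a[$i] = $i;
    }
    return $a;
}, $count);

printf("array:         %.1f bytes/element\n", $arrayBytes / $count);
printf("SplFixedArray: %.1f bytes/element\n", $fixedBytes / $count);
```

    The absolute values depend on the PHP version and platform, so no expected output is given here.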

  15. Data Structure

  16. Structs (or records, tuples,...)

  17. Structs (or records, tuples,...)

    A struct is a value containing other values which
    are typically accessed using a name.

    Example:
    Person => firstName / lastName
    ComplexNumber => realPart / imaginaryPart

  18. Structs – Using array
    $person = array(
        "firstName" => "Patrick",
        "lastName" => "Allaert"
    );

  19. Structs – Using a class
    $person = new PersonStruct(
        "Patrick", "Allaert"
    );

  20. Structs – Using a class
    (Implementation)
    class PersonStruct
    {
        public $firstName;
        public $lastName;

        public function __construct($firstName, $lastName)
        {
            $this->firstName = $firstName;
            $this->lastName = $lastName;
        }
    }

  21. Structs – Using a class
    (Implementation)
    class PersonStruct
    {
        public $firstName;
        public $lastName;

        public function __construct($firstName, $lastName)
        {
            $this->firstName = $firstName;
            $this->lastName = $lastName;
        }

        // Called only when writing to a property that is not declared
        public function __set($key, $value)
        {
            // a. Do nothing
            // b. trigger_error()
            // c. Throw an exception
        }
    }
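
    As a sketch of option (c), assuming a LogicException is an acceptable way to signal the mistake (the exception type and message are illustrative choices, not part of the talk):

```php
<?php
class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    // __set() is only invoked for properties that are not declared,
    // so writes to $firstName and $lastName still work normally.
    public function __set($key, $value)
    {
        throw new LogicException("PersonStruct has no property '$key'");
    }
}

$person = new PersonStruct("Patrick", "Allaert");
$person->lastName = "A.";        // fine: declared property

try {
    $person->middleName = "X";   // unknown field, e.g. a typo
    $rejected = false;
} catch (LogicException $e) {
    $rejected = true;
}
```

    This is what makes the structure rigid: a misspelled field fails loudly instead of silently creating a new property.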

  22. Structs – Pros and Cons
    Array
    + Uses less memory (PHP < 5.4)
    - Uses more memory (PHP >= 5.4)
    - No type hinting
    - Flexible structure
    +|- Less OO
    Slightly faster?

    Class
    - Uses more memory (PHP < 5.4)
    + Uses less memory (PHP >= 5.4)
    + Type hinting possible
    + Rigid structure
    +|- More OO
    Slightly slower?

  23. (true) Arrays

  24. (true) Arrays

    An array is a fixed-size collection where elements
    are each identified by a numeric index.

  25. (true) Arrays

    An array is a fixed-size collection where elements
    are each identified by a numeric index.
    [Diagram: six contiguous “Data” cells, indexed 0 to 5]

  26. (true) Arrays – Using SplFixedArray
    $array = new SplFixedArray(3);
    $array[0] = 1; // or $array->offsetSet()
    $array[1] = 2; // or $array->offsetSet()
    $array[2] = 3; // or $array->offsetSet()
    $array[0]; // gives 1
    $array[1]; // gives 2
    $array[2]; // gives 3
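
    To illustrate the "fixed size" part, here is a small sketch of what happens when writing past the end, and how the size can be changed explicitly:

```php
<?php
$array = new SplFixedArray(3);
$array[0] = 1;

try {
    $array[3] = 4;             // index 3 is outside the range 0..2
    $outOfRange = false;
} catch (RuntimeException $e) {
    $outOfRange = true;        // SplFixedArray refuses to grow implicitly
}

$array->setSize(4);            // explicit resize
$array[3] = 4;

$size  = $array->getSize();    // 4
$plain = $array->toArray();    // array(1, null, null, 4)
```

    Slots that were never assigned read back as null.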

  27. (true) Arrays – Pros and Cons
    Array
    - Uses more memory
    +|- Less OO
    SplFixedArray
    + Uses less memory
    +|- More OO

  28. Queues

    A queue is an ordered collection respecting First
    In, First Out (FIFO) order.

    Elements are inserted at one end and removed at
    the other.

  29. Queues

    A queue is an ordered collection respecting First
    In, First Out (FIFO) order.

    Elements are inserted at one end and removed at
    the other.
    [Diagram: a row of “Data” cells, with Enqueue at one end and Dequeue at the other]

  30. Queues – Using array
    $queue = array();
    $queue[] = 1; // or array_push()
    $queue[] = 2; // or array_push()
    $queue[] = 3; // or array_push()
    array_shift($queue); // gives 1
    array_shift($queue); // gives 2
    array_shift($queue); // gives 3

  31. Queues – Using SplQueue
    $queue = new SplQueue();
    $queue[] = 1; // or $queue->enqueue()
    $queue[] = 2; // or $queue->enqueue()
    $queue[] = 3; // or $queue->enqueue()
    $queue->dequeue(); // gives 1
    $queue->dequeue(); // gives 2
    $queue->dequeue(); // gives 3

  32. Stacks

    A stack is an ordered collection respecting Last In,
    First Out (LIFO) order.

    Elements are inserted and removed on the same
    end.

  33. Stacks

    A stack is an ordered collection respecting Last In,
    First Out (LIFO) order.

    Elements are inserted and removed on the same
    end.
    [Diagram: a pile of “Data” cells, with Push and Pop at the same end]

  34. Stacks – Using array
    $stack = array();
    $stack[] = 1; // or array_push()
    $stack[] = 2; // or array_push()
    $stack[] = 3; // or array_push()
    array_pop($stack); // gives 3
    array_pop($stack); // gives 2
    array_pop($stack); // gives 1

  35. Stacks – Using SplStack
    $stack = new SplStack();
    $stack[] = 1; // or $stack->push()
    $stack[] = 2; // or $stack->push()
    $stack[] = 3; // or $stack->push()
    $stack->pop(); // gives 3
    $stack->pop(); // gives 2
    $stack->pop(); // gives 1

  36. Queues/Stacks – Pros and Cons
    Array
    - Uses more memory
    (overhead / entry: 96 bytes)
    - No type hinting
    +|- Less OO
    SplQueue / SplStack
    + Uses less memory
    (overhead / entry: 48 bytes)
    + Type hinting possible
    +|- More OO
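
    A short sketch of the type hinting advantage. The function name drain() and the job strings are hypothetical:

```php
<?php
// The SplQueue type hint documents and enforces that callers pass
// a real FIFO queue rather than an arbitrary array.
function drain(SplQueue $jobs)
{
    $processed = array();
    while (!$jobs->isEmpty()) {
        $processed[] = $jobs->dequeue(); // oldest element first
    }
    return $processed;
}

$jobs = new SplQueue();
$jobs->enqueue("resize image");
$jobs->enqueue("send e-mail");

$order = drain($jobs);
```

    With a plain array parameter nothing would tell the caller, or PHP, which end of the collection is meant to be consumed.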

  37. Sets
    [Venn diagram: two sets, “Geeks” and “Nerds”; their intersection is labelled “People with strong views on the distinction between geeks and nerds”]

  38. Sets

    A set is a collection with no particular ordering
    especially suited for testing the membership of a
    value against a collection or to perform
    union/intersection/complement operations
    between them.

  39. Sets

    A set is a collection with no particular ordering
    especially suited for testing the membership of a
    value against a collection or to perform
    union/intersection/complement operations
    between them.
    [Diagram: five unordered “Data” elements]

  40. Sets – Using array
    $set = array();
    // Adding elements to a set
    $set[] = 1;
    $set[] = 2;
    $set[] = 3;
    // Checking presence in a set
    in_array(2, $set); // true
    in_array(5, $set); // false
    array_merge($set1, $set2); // union
    array_intersect($set1, $set2); // intersection
    array_diff($set1, $set2); // complement

  41. Sets – Using array
    $set = array();
    // Adding elements to a set
    $set[] = 1;
    $set[] = 2;
    $set[] = 3;
    // Checking presence in a set
    in_array(2, $set); // true
    in_array(5, $set); // false
    array_merge($set1, $set2); // union
    array_intersect($set1, $set2); // intersection
    array_diff($set1, $set2); // complement
    True performance killers!

  42. Sets – Mis-usage
    if ($value === "val1" || $value === "val2" || $value === "val3")
    {
        // ...
    }

  43. Sets – Mis-usage
    if (in_array($value, array("val1", "val2", "val3")))
    {
        // ...
    }

  44. Sets – Mis-usage
    switch ($value)
    {
        case "val1":
        case "val2":
        case "val3":
            // ...
    }

  45. Sets – Using array (simple types)
    $set = array();
    // Adding elements to a set
    $set[1] = true; // Any dummy value is fine,
    $set[2] = true; // except NULL (isset() would return false)!
    $set[3] = true;
    // Checking presence in a set
    isset($set[2]); // true
    isset($set[5]); // false
    $set1 + $set2; // union
    array_intersect_key($set1, $set2); // intersection
    array_diff_key($set1, $set2); // complement

  46. Sets – Using array (simple types)

    Remember that PHP Array keys can be integers or
    strings only!
    $set = array();
    // Adding elements to a set
    $set[1] = true; // Any dummy value is fine,
    $set[2] = true; // except NULL (isset() would return false)!
    $set[3] = true;
    // Checking presence in a set
    isset($set[2]); // true
    isset($set[5]); // false
    $set1 + $set2; // union
    array_intersect_key($set1, $set2); // intersection
    array_diff_key($set1, $set2); // complement
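
    A runnable sketch of the key-based approach, assuming the set is built from an existing list of values (array_fill_keys() is one way to do that; the values are made up):

```php
<?php
// Turn lists of values into key-based sets (values must be ints or strings).
$set1 = array_fill_keys(array("val1", "val2", "val3"), true);
$set2 = array_fill_keys(array("val3", "val4"), true);

// Membership tests are hash lookups instead of linear scans
$hasVal2 = isset($set1["val2"]);
$hasVal5 = isset($set1["val5"]);

// Set operations work on the keys
$union        = array_keys($set1 + $set2);
$intersection = array_keys(array_intersect_key($set1, $set2));
$difference   = array_keys(array_diff_key($set1, $set2));
```

    Keep in mind that PHP casts numeric string keys such as "1" to integers, so array_keys() would hand them back as ints.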

  47. Sets – Using array (objects)
    $set = array();
    // Adding elements to a set
    $set[spl_object_hash($object1)] = $object1;
    $set[spl_object_hash($object2)] = $object2;
    $set[spl_object_hash($object3)] = $object3;
    // Checking presence in a set
    isset($set[spl_object_hash($object2)]); // true
    isset($set[spl_object_hash($object5)]); // false
    $set1 + $set2; // union
    array_intersect_key($set1, $set2); // intersection
    array_diff_key($set1, $set2); // complement

  48. Sets – Using array (objects)
    $set = array();
    // Adding elements to a set
    $set[spl_object_hash($object1)] = $object1;
    $set[spl_object_hash($object2)] = $object2;
    $set[spl_object_hash($object3)] = $object3;
    // Checking presence in a set
    isset($set[spl_object_hash($object2)]); // true
    isset($set[spl_object_hash($object5)]); // false
    $set1 + $set2; // union
    array_intersect_key($set1, $set2); // intersection
    array_diff_key($set1, $set2); // complement
    Store a reference to the object!

  49. Sets – Using SplObjectStorage
    (objects)
    $set = new SplObjectStorage();
    // Adding elements to a set
    $set->attach($object1); // or $set[$object1] = null;
    $set->attach($object2); // or $set[$object2] = null;
    $set->attach($object3); // or $set[$object3] = null;
    // Checking presence in a set
    isset($set[$object2]); // true
    isset($set[$object5]); // false
    $set1->addAll($set2); // union
    $set1->removeAllExcept($set2); // intersection
    $set1->removeAll($set2); // complement
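
    Note that addAll(), removeAllExcept() and removeAll() modify the storage they are called on. A sketch that keeps the operands intact by working on clones (the stdClass objects are placeholders):

```php
<?php
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

$set1 = new SplObjectStorage();
$set1[$a] = null;
$set1[$b] = null;

$set2 = new SplObjectStorage();
$set2[$b] = null;
$set2[$c] = null;

$union = clone $set1;
$union->addAll($set2);                  // {a, b, c}

$intersection = clone $set1;
$intersection->removeAllExcept($set2);  // {b}

$difference = clone $set1;
$difference->removeAll($set2);          // {a}
```

    After these calls $set1 and $set2 still contain their original two objects each.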

  50. Sets – Using QuickHash (int)

    No union/intersection/complement operations
    (yet?)

    Yummy features like (loadFrom|saveTo)(String|File)
    $set = new QuickHashIntSet(64,
    QuickHashIntSet::CHECK_FOR_DUPES);
    // Adding elements to a set
    $set->add(1);
    $set->add(2);
    $set->add(3);
    // Checking presence in a set
    $set->exists(2); // true
    $set->exists(5); // false
    // Soonish: isset($set[2]);

  51. Sets – Using bitsets
    define("E_ERROR", 1); // or 1<<0
    define("E_WARNING", 2); // or 1<<1
    define("E_PARSE", 4); // or 1<<2
    define("E_NOTICE", 8); // or 1<<3
    // Adding elements to a set
    $set = 0;
    $set |= E_ERROR;
    $set |= E_WARNING;
    $set |= E_PARSE;
    // Checking presence in a set
    $set & E_ERROR; // non-zero, i.e. true
    $set & E_NOTICE; // 0, i.e. false
    $set1 | $set2; // union
    $set1 & $set2; // intersection
    $set1 & ~$set2; // complement ($set1 ^ $set2 would be the symmetric difference)
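
    A worked sketch with concrete values, using hypothetical PERM_* flags so as not to clash with PHP's predefined E_* constants:

```php
<?php
// Hypothetical permission flags, one bit each
define("PERM_READ", 1 << 0);    // 1
define("PERM_WRITE", 1 << 1);   // 2
define("PERM_EXECUTE", 1 << 2); // 4

$set1 = PERM_READ | PERM_WRITE;      // {READ, WRITE}
$set2 = PERM_WRITE | PERM_EXECUTE;   // {WRITE, EXECUTE}

$union         = $set1 | $set2;      // 7: READ, WRITE, EXECUTE
$intersection  = $set1 & $set2;      // 2: WRITE
$difference    = $set1 & ~$set2;     // 1: READ (in $set1 but not in $set2)
$symmetricDiff = $set1 ^ $set2;      // 5: READ, EXECUTE (in exactly one set)

// Membership tests yield an integer, so compare against 0 for a real boolean
$canWrite   = ($set1 & PERM_WRITE) !== 0;    // true
$canExecute = ($set1 & PERM_EXECUTE) !== 0;  // false
```

    The whole set fits in a single integer, which limits it to as many members as the integer has bits.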

  52. Sets – Using bitsets (example)
    Instead of:
    function remove($path, $files = true, $directories = true, $links = true, $executable = true)
    {
        if (!$files && is_file($path))
            return false;
        if (!$directories && is_dir($path))
            return false;
        if (!$links && is_link($path))
            return false;
        if (!$executable && is_executable($path))
            return false;
        // ...
    }
    remove("/tmp/removeMe", true, false, true, false); // WTF ?!

  53. Sets – Using bitsets (example)
    Use:
    define("REMOVE_FILES", 1 << 0);
    define("REMOVE_DIRS", 1 << 1);
    define("REMOVE_LINKS", 1 << 2);
    define("REMOVE_EXEC", 1 << 3);
    define("REMOVE_ALL", ~0); // Setting all bits
    function remove($path, $options = REMOVE_ALL)
    {
        // (~$options & FLAG) is non-zero when FLAG is NOT set in $options
        if ((~$options & REMOVE_FILES) && is_file($path))
            return false;
        if ((~$options & REMOVE_DIRS) && is_dir($path))
            return false;
        if ((~$options & REMOVE_LINKS) && is_link($path))
            return false;
        if ((~$options & REMOVE_EXEC) && is_executable($path))
            return false;
        // ...
    }
    remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)

  54. Sets: Conclusions

    Use the key and not the value when using PHP
    Arrays.

    Use QuickHash for set of integers if possible.

    Use SplObjectStorage as soon as you are playing
    with objects.

    Don't use array_unique() when you need a set!

  55. Maps

    A map is a collection of key/value pairs where all
    keys are unique.

  56. Maps – Using array

    Don't use array_merge() on maps.
    $map = array();
    $map["ONE"] = 1;
    $map["TWO"] = 2;
    $map["THREE"] = 3;
    // Merging maps:
    array_merge($map1, $map2); // SLOW!
    $map2 + $map1; // Fast :)
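
    Beyond speed, the two differ in behaviour. A sketch with made-up data: for string keys they produce the same pairs (in a different order), but array_merge() renumbers integer keys, which breaks maps keyed by ID:

```php
<?php
$map1 = array("ONE" => 1, "TWO" => 2);
$map2 = array("TWO" => "deux", "THREE" => 3);

$merged = array_merge($map1, $map2); // ONE => 1, TWO => "deux", THREE => 3
$united = $map2 + $map1;             // TWO => "deux", THREE => 3, ONE => 1

// == compares key/value pairs and ignores their order
$sameContent = ($merged == $united);

// With integer keys the results diverge
$byId1 = array(10 => "ten");
$byId2 = array(10 => "dix");

$mergedIds = array_merge($byId1, $byId2); // keys renumbered: 0 => "ten", 1 => "dix"
$unitedIds = $byId2 + $byId1;             // key kept: 10 => "dix"
```

    With the + operator the left operand wins on duplicate keys, hence $map2 + $map1 to mimic array_merge($map1, $map2).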

  57. Multikey Maps – Using array
    $map = array();
    $map["ONE"] = 1;
    $map["UN"] =& $map["ONE"];
    $map["UNO"] =& $map["ONE"];
    $map["TWO"] = 2;
    $map["DEUX"] =& $map["TWO"];
    $map["DUE"] =& $map["TWO"];
    $map["UNO"] = "once";
    $map["DEUX"] = "twice";
    var_dump($map);
    /*
    array(6) {
    ["ONE"] => &string(4) "once"
    ["UN"] => &string(4) "once"
    ["UNO"] => &string(4) "once"
    ["TWO"] => &string(5) "twice"
    ["DEUX"] => &string(5) "twice"
    ["DUE"] => &string(5) "twice"
    }
    */

  58. Heap

    A heap is a tree-based structure in which every
    parent is ordered relative to its children: in a
    max-heap the largest key is at the top (the root)
    and the smallest keys end up among the leaves.

  59. Heap

    A heap is a tree-based structure in which every
    parent is ordered relative to its children: in a
    max-heap the largest key is at the top (the root)
    and the smallest keys end up among the leaves.

  60. Heap – Using array
    $heap = array();
    $heap[] = 3;
    sort($heap);
    $heap[] = 1;
    sort($heap);
    $heap[] = 2;
    sort($heap);

  61. Heap – Using Spl(Min|Max)Heap
    $heap = new SplMinHeap;
    $heap->insert(3);
    $heap->insert(1);
    $heap->insert(2);
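
    A sketch of reading the heap back: top() peeks at the smallest element of an SplMinHeap, and repeated extract() calls return the values in ascending order:

```php
<?php
$heap = new SplMinHeap();
foreach (array(3, 1, 2) as $value) {
    $heap->insert($value);
}

$smallest = $heap->top();      // peek without removing: 1

$sorted = array();
while (!$heap->isEmpty()) {
    $sorted[] = $heap->extract(); // removes the current minimum
}
// $sorted is array(1, 2, 3) and the heap is now empty
```

    Extraction consumes the heap, so clone it first if the content is needed again.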

  62. Heaps: Conclusions

    MUCH faster than having to re-sort() an array at
    every insertion.

    If you don't require the collection to be sorted at
    every single step and can insert all the data at once
    and then sort() it, an array is a much better/faster
    approach.

    SplPriorityQueue is very similar: think of it as an
    SplHeap where elements are ordered by a separate
    priority rather than by the value itself.
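
    A minimal SplPriorityQueue sketch (the task names and priorities are made up). Elements come out highest priority first, whatever the insertion order:

```php
<?php
$queue = new SplPriorityQueue();
$queue->insert("write report", 1);        // insert($value, $priority)
$queue->insert("fix production bug", 10);
$queue->insert("answer e-mail", 5);

$order = array();
while (!$queue->isEmpty()) {
    $order[] = $queue->extract();         // highest priority first
}
// $order: "fix production bug", "answer e-mail", "write report"
```

    The priorities here are all distinct; the order among elements of equal priority should not be relied upon.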

  63. Bloom filters

    A Bloom filter is a space-efficient probabilistic data
    structure used to test whether an element is a
    member of a set.

    False positives are possible, but false negatives are
    not!
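
    To make that property concrete, here is a toy sketch in plain PHP. The class SimpleBloomFilter, its default sizes and the crc32-based hashing are illustrative assumptions; a PHP array of booleans is not space-efficient, so this only demonstrates the logic, not a production implementation:

```php
<?php
// Toy Bloom filter: each value sets a few positions in a bit table.
class SimpleBloomFilter
{
    private $bits;
    private $size;
    private $hashCount;

    public function __construct($size = 1024, $hashCount = 3)
    {
        $this->size = $size;
        $this->hashCount = $hashCount;
        $this->bits = array_fill(0, $size, false);
    }

    // Derives $hashCount positions by salting the value with the hash index.
    private function positions($value)
    {
        $positions = array();
        for ($i = 0; $i < $this->hashCount; $i++) {
            $hash = crc32($i . ":" . $value) & 0x7fffffff; // force non-negative
            $positions[] = $hash % $this->size;
        }
        return $positions;
    }

    public function add($value)
    {
        foreach ($this->positions($value) as $position) {
            $this->bits[$position] = true;
        }
    }

    public function has($value)
    {
        foreach ($this->positions($value) as $position) {
            if (!$this->bits[$position]) {
                return false; // at least one bit unset: definitely absent
            }
        }
        return true;          // all bits set: probably present
    }
}

$filter = new SimpleBloomFilter();
$filter->add("An element");
$known = $filter->has("An element"); // always true once added
```

    An added element always finds all of its bits set, hence no false negatives; another value may happen to hit the same bits, hence possible false positives.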

  64. Bloom filters – Using bloomy
    // BloomFilter::__construct(int capacity [, double error_rate [, int random_seed ] ])
    $bloomFilter = new BloomFilter(10000, 0.001);
    $bloomFilter->add("An element");
    $bloomFilter->has("An element"); // true for sure
    $bloomFilter->has("Foo"); // false, most probably

  65. Other related projects

    SPL Types: Various types implemented as objects:
    SplInt, SplFloat, SplEnum, SplBool and SplString
    http://pecl.php.net/package/SPL_Types

  66. Other related projects

    SPL Types: Various types implemented as objects:
    SplInt, SplFloat, SplEnum, SplBool and SplString
    http://pecl.php.net/package/SPL_Types

    Judy: Sparse dynamic arrays implementation
    http://pecl.php.net/package/Judy

  67. Other related projects

    SPL Types: Various types implemented as objects:
    SplInt, SplFloat, SplEnum, SplBool and SplString
    http://pecl.php.net/package/SPL_Types

    Judy: Sparse dynamic arrays implementation
    http://pecl.php.net/package/Judy

    Weakref: Weak references implementation.
    Provides a gateway to an object without
    preventing that object from being collected by the
    garbage collector.
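
    The slide refers to the PECL Weakref extension. The sketch below instead assumes PHP 7.4 or later, where a WeakReference class is built in; its API differs from the PECL extension's, so treat this as an illustration of the concept only:

```php
<?php
$object = new stdClass();
$ref = WeakReference::create($object);

// While a strong reference exists, the weak reference resolves to the object
$aliveBefore = ($ref->get() === $object);

// Dropping the last strong reference lets PHP destroy the object,
// after which the weak reference resolves to null
unset($object);
$aliveAfter = ($ref->get() !== null);
```

    This is handy for caches and registries that should not keep otherwise unused objects alive.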

  68. Conclusions

    Use the appropriate data structure. It will keep your
    code clean and fast.

  69. Conclusions

    Use the appropriate data structure. It will keep your
    code clean and fast.

    Think about the time and space complexity
    of your algorithms.

  70. Conclusions

    Use the appropriate data structure. It will keep your
    code clean and fast.

    Think about the time and space complexity
    of your algorithms.

    Name your variables accordingly: use “Map”, “Set”,
    “List”, “Queue”,... to describe them instead of using
    something like: $ordersArray.

  71. Thanks

    Don't forget to rate this talk on https://joind.in/6941

  72. Photo Credits

    Tuned car:
    http://www.flickr.com/photos/gioxxswall/5783867752

    London Eye Structure:
    http://www.flickr.com/photos/photographygal123/4883546484

    Cigarette:
    http://www.flickr.com/photos/superfantastic/166215927

    Heap structure:
    http://en.wikipedia.org/wiki/File:Max-Heap.svg

    Drawers:
    http://www.flickr.com/photos/jamesclay/2312912612

    Stones stack:
    http://www.flickr.com/photos/silent_e/2282729987

    Tree:
    http://www.flickr.com/photos/drewbandy/6002204996
