Mastering PHP Data Structure 102

PHP NorthWest 2012 talk in Manchester about PHP Data Structures

Patrick Allaert

October 06, 2012

Transcript

  1. Mastering PHP Data Structure 102
    Patrick Allaert
    PHPNW 2012 Manchester, United Kingdom

  2. About me

    Patrick Allaert

    Founder of Libereco Technologies

    Playing with PHP/Linux for 10+ years

    eZ Publish core developer

    Author of the APM PHP extension

    @patrick_allaert

    http://github.com/patrickallaert/

    http://patrickallaert.blogspot.com/

  3. APM

  5. PHP native datatypes

    NULL (IS_NULL)

    Booleans (IS_BOOL)

    Integers (IS_LONG)

    Floating point numbers (IS_DOUBLE)

    Strings (IS_STRING)

    Arrays (IS_ARRAY, IS_CONSTANT_ARRAY)

    Objects (IS_OBJECT)

    Resources (IS_RESOURCE)

    Callable (IS_CALLABLE)

  6. Wikipedia datatypes

    2-3-4 tree

    2-3 heap

    2-3 tree

    AA tree

    Abstract syntax tree

    (a,b)-tree

    Adaptive k-d tree

    Adjacency list

    Adjacency matrix

    AF-heap

    Alternating decision tree

    And-inverter graph

    And–or tree

    Array

    AVL tree

    Beap

    Bidirectional map

    Bin

    Binary decision diagram

    Binary heap

    Binary search tree

    Binary tree

    Binomial heap

    Bit array

    Bitboard

    Bit field

    Bitmap

    BK-tree

    Bloom filter

    Boolean

    Bounding interval hierarchy

    B sharp tree

    BSP tree

    B-tree

    B*-tree

    B+ tree

    B-trie

    Bx-tree

    Cartesian tree

    Char

    Circular buffer

    Compressed suffix array

    Container

    Control table

    Cover tree

    Ctrie

    Dancing tree

    D-ary heap

    Decision tree

    Deque

    Directed acyclic graph

    Directed graph

    Disjoint-set

    Distributed hash table

    Double

    Doubly connected edge list

    Doubly linked list

    Dynamic array

    Enfilade

    Enumerated type

    Expectiminimax tree

    Exponential tree

    Fenwick tree

    Fibonacci heap

    Finger tree

    Float

    FM-index

    Fusion tree

    Gap buffer

    Generalised suffix tree

    Graph

    Graph-structured stack

    Hash

    Hash array mapped trie

    Hashed array tree

    Hash list

    Hash table

    Hash tree

    Hash trie

    Heap

    Heightmap

    Hilbert R-tree

    Hypergraph

    Iliffe vector

    Image

    Implicit kd-tree

    Interval tree

    Int

    Judy array

    Kdb tree

    Kd-tree

    Koorde

    Leftist heap

    Lightmap

    Linear octree

    Link/cut tree

    Linked list

    Lookup table

    Map/Associative array/Dictionary

    Matrix

    Metric tree

    Minimax tree

    Min/max kd-tree

    M-tree

    Multigraph

    Multimap

    Multiset

    Octree

    Pagoda

    Pairing heap

    Parallel array

    Parse tree

    Plain old data structure

    Prefix hash tree

    Priority queue

    Propositional directed acyclic graph

    Quad-edge

    Quadtree

    Queap

    Queue

    Radix tree

    Randomized binary search tree

    Range tree

    Rapidly-exploring random tree

    Record (also called tuple or struct)

    Red-black tree

    Rope

    Routing table

    R-tree

    R* tree

    R+ tree

    Scapegoat tree

    Scene graph

    Segment tree

    Self-balancing binary search tree

    Self-organizing list

    Set

    Skew heap

    Skip list

    Soft heap

    Sorted array

    Spaghetti stack

    Sparse array

    Sparse matrix

    Splay tree

    SPQR-tree

    Stack

    String

    Suffix array

    Suffix tree

    Symbol table

    Syntax tree

    Tagged union (variant record, discriminated union, disjoint union)

    Tango tree

    Ternary heap

    Ternary search tree

    Threaded binary tree

    Top tree

    Treap

    Tree

    Trees

    Trie

    T-tree

    UB-tree

    Union

    Unrolled linked list

    Van Emde Boas tree

    Variable-length array

    VList

    VP-tree

    Weight-balanced tree

    Winged edge

    X-fast trie

    Xor linked list

    X-tree

    Y-fast trie

    Zero suppressed decision diagram

    Zipper

    Z-order

  7. Game:
    Can you recognize some structures?

  15. Array: PHP's untruthfulness
    PHP “Arrays” are not true Arrays!
    An array typically looks like this:
    [Diagram: six adjacent Data cells, indexed 0 to 5]

  17. Array: PHP's untruthfulness
    PHP “Arrays” can dynamically grow and be iterated
    in both directions (reset(), next(), prev(), end()),
    exclusively with O(1) operations.
    Let's have a Doubly Linked List (DLL):
    [Diagram: five Data nodes in a chain, with Head and Tail pointers]
    Enables List, Deque, Queue and Stack
    implementations

  19. Array: PHP's untruthfulness
    PHP “Array” elements are always accessible using a
    key (index).
    Let's have a Hash Table:
    [Diagram: a bucket pointers array with Bucket * slots 0, 1, 2, 3, 4, 5, ..., nTableSize - 1; each slot points to a Bucket holding Data, and the Buckets are also chained in a list with Head and Tail]

  20. Array: PHP's untruthfulness
    http://php.net/manual/en/language.types.array.php:
    “This type is optimized for several
    different uses; it can be treated as an
    array, list (vector), hash table (an
    implementation of a map),
    dictionary, collection, stack, queue,
    and probably more.”

  21. Optimized for anything ≈ Optimized for nothing!

  22. Array: PHP's untruthfulness

    In C: 100 000 integers (using long on 64-bit => 8
    bytes each) can be stored in 0.76 MB.

    In PHP: it will take 13.97 MB!

    A PHP variable (containing an integer) takes 48
    bytes.

    The bucket overhead for each “array” entry is
    about 96 bytes.

    More details:
    http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
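
The figures above date from the PHP versions current in 2012 and will differ on newer releases. A rough way to check the cost on a given build, assuming the delta reported by memory_get_usage() is a fair approximation of what the array occupies, might look like this (variable names are illustrative):

```php
<?php
// Rough measurement sketch: how much memory does a PHP array of
// 100 000 integers take on this particular build?
// The result depends heavily on the PHP version and platform.
$count = 100000;

$before = memory_get_usage();
$plain = array();
for ($i = 0; $i < $count; $i++) {
    $plain[] = $i;
}
$plainBytes = memory_get_usage() - $before;

// Average cost per entry, to compare with the 8 bytes of a C long.
$bytesPerEntry = $plainBytes / $count;

printf("array of %d integers: %.2f MB (%.1f bytes per entry)\n",
    $count, $plainBytes / 1048576, $bytesPerEntry);
```

The measurement includes spare capacity that the array reserved while growing, so it is an upper bound on the per-entry cost rather than an exact figure.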

  23. Data Structure

  25. Structs (or records, tuples,...)

    A struct is a value containing other values which
    are typically accessed using a name.

    Example:
    Person => firstName / lastName
    ComplexNumber => realPart / imaginaryPart

  26. Structs – Using array
    $person = array(
    "firstName" => "Patrick",
    "lastName" => "Allaert"
    );

  27. Structs – Using a class
    $person = new PersonStruct(
    "Patrick", "Allaert"
    );

  29. Structs – Using a class (Implementation)
    class PersonStruct
    {
        public $firstName;
        public $lastName;

        public function __construct($firstName, $lastName)
        {
            $this->firstName = $firstName;
            $this->lastName = $lastName;
        }

        public function __set($key, $value)
        {
            // a. Do nothing
            // b. trigger_error()
            // c. Throw an exception
        }
    }
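
As an illustration of option (c), here is a minimal sketch in which __set() throws. The class name StrictPersonStruct and the choice of LogicException are assumptions made for the example, not part of the talk:

```php
<?php
// Sketch of option (c): a struct-like class that rejects unknown properties.
// StrictPersonStruct is a hypothetical name used only for this example.
class StrictPersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    // __set() is only invoked for inaccessible or undeclared properties,
    // so writes to $firstName and $lastName are not affected.
    public function __set($key, $value)
    {
        throw new LogicException("Unknown property: $key");
    }
}

$person = new StrictPersonStruct("Patrick", "Allaert");
$person->lastName = "A."; // declared property: plain assignment

$rejected = false;
try {
    $person->middleName = "X"; // undeclared property: goes through __set()
} catch (LogicException $e) {
    $rejected = true;
}
```

Throwing makes a typo in a property name fail loudly, which is the "rigid structure" an array cannot offer.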

  30. Structs – Pros and Cons
    Array
      + Uses less memory (PHP < 5.4)
      - Uses more memory (PHP >= 5.4)
      - No type hinting
      - Flexible structure
      +|- Less OO
      Slightly faster?
    Class
      - Uses more memory (PHP < 5.4)
      + Uses less memory (PHP >= 5.4)
      + Type hinting possible
      + Rigid structure
      +|- More OO
      Slightly slower?

  33. (true) Arrays

    An array is a fixed size collection where elements
    are each identified by a numeric index.
    [Diagram: six adjacent Data cells, indexed 0 to 5]

  34. (true) Arrays – Using SplFixedArray
    $array = new SplFixedArray(3);
    $array[0] = 1; // or $array->offsetSet()
    $array[1] = 2; // or $array->offsetSet()
    $array[2] = 3; // or $array->offsetSet()
    $array[0]; // gives 1
    $array[1]; // gives 2
    $array[2]; // gives 3
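
A small sketch of what the fixed size means in practice, assuming a stock SplFixedArray: writing past the end raises an exception instead of growing the array, and the size only changes through setSize():

```php
<?php
$array = new SplFixedArray(3);
$array[0] = 1;
$array[1] = 2;
$array[2] = 3;

$size = $array->getSize(); // 3

// Unlike a PHP "array", an SplFixedArray does not grow on demand.
$outOfRange = false;
try {
    $array[3] = 4; // index outside 0..2
} catch (RuntimeException $e) {
    $outOfRange = true;
}

// The size changes only when asked explicitly.
$array->setSize(4);
$array[3] = 4;

$values = $array->toArray(); // array(1, 2, 3, 4)
```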

  35. (true) Arrays – Pros and Cons
    Array
    - Uses more memory
    +|- Less OO
    SplFixedArray
    + Uses less memory
    +|- More OO

  38. Queues

    A queue is an ordered collection respecting First
    In, First Out (FIFO) order.

    Elements are inserted at one end and removed at
    the other.
    [Diagram: a row of Data elements; Enqueue adds at one end, Dequeue removes at the other]

  39. Queues – Using array
    $queue = array();
    $queue[] = 1; // or array_push()
    $queue[] = 2; // or array_push()
    $queue[] = 3; // or array_push()
    array_shift($queue); // gives 1
    array_shift($queue); // gives 2
    array_shift($queue); // gives 3

  40. Queues – Using SplQueue
    $queue = new SplQueue();
    $queue[] = 1; // or $queue->enqueue()
    $queue[] = 2; // or $queue->enqueue()
    $queue[] = 3; // or $queue->enqueue()
    $queue->dequeue(); // gives 1
    $queue->dequeue(); // gives 2
    $queue->dequeue(); // gives 3
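
The FIFO behaviour can be checked with a short loop. This sketch only uses enqueue() and dequeue() as above, plus isEmpty(). With a plain array, each array_shift() has to re-index the remaining elements, which is one reason to prefer SplQueue for long queues:

```php
<?php
$queue = new SplQueue();
$queue->enqueue("first");
$queue->enqueue("second");
$queue->enqueue("third");

// Drain the queue: elements come out in insertion order.
$order = array();
while (!$queue->isEmpty()) {
    $order[] = $queue->dequeue();
}
// $order is array("first", "second", "third")
```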

  43. Stacks

    A stack is an ordered collection respecting Last In,
    First Out (LIFO) order.

    Elements are inserted and removed on the same
    end.
    [Diagram: Data elements, with Push and Pop acting on the same end]

  44. Stacks – Using array
    $stack = array();
    $stack[] = 1; // or array_push()
    $stack[] = 2; // or array_push()
    $stack[] = 3; // or array_push()
    array_pop($stack); // gives 3
    array_pop($stack); // gives 2
    array_pop($stack); // gives 1

  45. Stacks – Using SplStack
    $stack = new SplStack();
    $stack[] = 1; // or $stack->push()
    $stack[] = 2; // or $stack->push()
    $stack[] = 3; // or $stack->push()
    $stack->pop(); // gives 3
    $stack->pop(); // gives 2
    $stack->pop(); // gives 1
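
Beyond push() and pop(), SplStack also lets you look at the top element without removing it, and iterates in LIFO order by default. A short sketch:

```php
<?php
$stack = new SplStack();
$stack->push(1);
$stack->push(2);
$stack->push(3);

$top = $stack->top();    // 3, the stack is left unchanged
$popped = $stack->pop(); // 3, now removed

// foreach walks the remaining elements from the top down.
$remaining = array();
foreach ($stack as $value) {
    $remaining[] = $value;
}
// $remaining is array(2, 1)
```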

  46. Queues/Stacks – Pros and Cons
    Array
    - Uses more memory
    (overhead / entry: 96 bytes)
    - No type hinting
    +|- Less OO
    SplQueue / SplStack
    + Uses less memory
    (overhead / entry: 48 bytes)
    + Type hinting possible
    +|- More OO

  47. Sets
    [Venn diagram of two sets, “Geeks” and “Nerds”; label: “People with strong views on the distinction between geeks and nerds”]

  49. Sets

    A set is a collection with no particular ordering
    especially suited for testing the membership of a
    value against a collection or to perform
    union/intersection/complement operations
    between them.
    [Diagram: five Data elements in no particular order]

  51. Sets – Using array
    $set = array();
    // Adding elements to a set
    $set[] = 1;
    $set[] = 2;
    $set[] = 3;
    // Checking presence in a set
    in_array(2, $set); // true
    in_array(5, $set); // false
    array_merge($set1, $set2); // union
    array_intersect($set1, $set2); // intersection
    array_diff($set1, $set2); // complement
    True performance killers!

  52. Sets – Mis-usage
    if ($value === "val1" || $value === "val2" || $value === "val3")
    {
        // ...
    }

  53. Sets – Mis-usage
    if (in_array($value, array("val1", "val2", "val3")))
    {
    // ...
    }

  54. Sets – Mis-usage
    switch ($value)
    {
    case "val1":
    case "val2":
    case "val3":
    // ...
    }

  56. Sets – Using array (simple types)

    Remember that PHP Array keys can be integers or
    strings only!
    $set = array();
    // Adding elements to a set
    $set[1] = true; // Any dummy value works,
    $set[2] = true; // except NULL (isset() would return false)
    $set[3] = true;
    // Checking presence in a set
    isset($set[2]); // true
    isset($set[5]); // false
    $set1 + $set2; // union
    array_intersect_key($set1, $set2); // intersection
    array_diff_key($set1, $set2); // complement
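
A self-contained sketch of the key-based approach with two concrete sets. The element names are illustrative; array_fill_keys() is used here simply as a compact way to turn a list of values into keys:

```php
<?php
// Build two sets whose members are stored as array keys.
$set1 = array_fill_keys(array("apple", "banana", "cherry"), true);
$set2 = array_fill_keys(array("banana", "cherry", "date"), true);

// Membership test: a hash lookup instead of a scan through the values.
$hasBanana = isset($set1["banana"]); // true
$hasDate   = isset($set1["date"]);   // false

// Set operations work on keys; array_keys() lists the members.
$union        = array_keys($set1 + $set2);
$intersection = array_keys(array_intersect_key($set1, $set2));
$complement   = array_keys(array_diff_key($set1, $set2));
```

Since members are keys, duplicates cannot occur, which is why no array_unique() pass is needed afterwards.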

  58. Sets – Using array (objects)
    $set = array();
    // Adding elements to a set
    $set[spl_object_hash($object1)] = $object1;
    $set[spl_object_hash($object2)] = $object2;
    $set[spl_object_hash($object3)] = $object3;
    // Checking presence in a set
    isset($set[spl_object_hash($object2)]); // true
    isset($set[spl_object_hash($object5)]); // false
    $set1 + $set2; // union
    array_intersect_key($set1, $set2); // intersection
    array_diff_key($set1, $set2); // complement
    Store a reference to the object as the value: spl_object_hash() may reuse the hash of a destroyed object!

  59. Sets – Using SplObjectStorage (objects)
    $set = new SplObjectStorage();
    // Adding elements to a set
    $set->attach($object1); // or $set[$object1] = null;
    $set->attach($object2); // or $set[$object2] = null;
    $set->attach($object3); // or $set[$object3] = null;
    // Checking presence in a set
    isset($set[$object2]); // true
    isset($set[$object5]); // false
    $set1->addAll($set2); // union
    $set1->removeAllExcept($set2); // intersection
    $set1->removeAll($set2); // complement
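
Note that addAll(), removeAllExcept() and removeAll() modify the storage they are called on. A sketch that keeps the operands intact by working on clones, using stdClass instances as stand-in objects:

```php
<?php
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

$set1 = new SplObjectStorage();
$set1[$a] = null;
$set1[$b] = null;

$set2 = new SplObjectStorage();
$set2[$b] = null;
$set2[$c] = null;

// Each operation mutates its receiver, so start from a copy of $set1.
$union = clone $set1;
$union->addAll($set2);                 // a, b, c

$intersection = clone $set1;
$intersection->removeAllExcept($set2); // b

$complement = clone $set1;
$complement->removeAll($set2);         // a
```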

  60. Sets – Using QuickHash (int)

    No union/intersection/complement operations
    (yet?)

    Yummy features like (loadFrom|saveTo)(String|File)
    $set = new QuickHashIntSet(64,
    QuickHashIntSet::CHECK_FOR_DUPES);
    // Adding elements to a set
    $set->add(1);
    $set->add(2);
    $set->add(3);
    // Checking presence in a set
    $set->exists(2); // true
    $set->exists(5); // false
    // Soonish: isset($set[2]);

  61. Sets – Using bitsets
    define("E_ERROR", 1); // or 1<<0
    define("E_WARNING", 2); // or 1<<1
    define("E_PARSE", 4); // or 1<<2
    define("E_NOTICE", 8); // or 1<<3
    // Adding elements to a set
    $set = 0;
    $set |= E_ERROR;
    $set |= E_WARNING;
    $set |= E_PARSE;
    // Checking presence in a set
    $set & E_ERROR; // non-zero (truthy)
    $set & E_NOTICE; // 0 (falsy)
    $set1 | $set2; // union
    $set1 & $set2; // intersection
    $set1 & ~$set2; // complement (elements of $set1 not in $set2)
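
A runnable sketch of the same operations with concrete values. The FLAG_* names are hypothetical, chosen so the example does not clash with PHP's predefined E_* constants:

```php
<?php
// Hypothetical flags: each one occupies its own bit.
define("FLAG_READ", 1 << 0);    // 1
define("FLAG_WRITE", 1 << 1);   // 2
define("FLAG_EXECUTE", 1 << 2); // 4

$set1 = FLAG_READ | FLAG_WRITE;    // 3
$set2 = FLAG_WRITE | FLAG_EXECUTE; // 6

// Membership: the result of & is an integer, so compare it with 0.
$hasWrite   = ($set1 & FLAG_WRITE) !== 0;   // true
$hasExecute = ($set1 & FLAG_EXECUTE) !== 0; // false

$union        = $set1 | $set2;  // 7: READ, WRITE, EXECUTE
$intersection = $set1 & $set2;  // 2: WRITE
$complement   = $set1 & ~$set2; // 1: READ (in $set1 but not in $set2)
$symmetric    = $set1 ^ $set2;  // 5: READ, EXECUTE (in exactly one set)
```

The XOR operator gives the symmetric difference, which is why the complement is written with & and ~ here.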

  62. Sets – Using bitsets (example)
    Instead of:
    function remove($path, $files = true, $directories = true, $links = true,
        $executable = true)
    {
        if (!$files && is_file($path))
            return false;
        if (!$directories && is_dir($path))
            return false;
        if (!$links && is_link($path))
            return false;
        if (!$executable && is_executable($path))
            return false;
        // ...
    }
    remove("/tmp/removeMe", true, false, true, false); // WTF ?!

  63. Sets – Using bitsets (example)
    Use:
    define("REMOVE_FILES", 1 << 0);
    define("REMOVE_DIRS", 1 << 1);
    define("REMOVE_LINKS", 1 << 2);
    define("REMOVE_EXEC", 1 << 3);
    define("REMOVE_ALL", ~0); // Setting all bits
    function remove($path, $options = REMOVE_ALL)
    {
        if (~$options & REMOVE_FILES && is_file($path))
            return false;
        if (~$options & REMOVE_DIRS && is_dir($path))
            return false;
        if (~$options & REMOVE_LINKS && is_link($path))
            return false;
        if (~$options & REMOVE_EXEC && is_executable($path))
            return false;
        // ...
    }
    remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)

  64. Sets: Conclusions

    Use the key and not the value when using PHP
    Arrays.

    Use QuickHash for sets of integers if possible.

    Use SplObjectStorage as soon as you are playing
    with objects.

    Don't use array_unique() when you need a set!

  65. Maps

    A map is a collection of key/value pairs where all
    keys are unique.

  66. Maps – Using array

    Don't use array_merge() on maps.
    $map = array();
    $map["ONE"] = 1;
    $map["TWO"] = 2;
    $map["THREE"] = 3;
    // Merging maps:
    array_merge($map1, $map2); // SLOW!
    $map2 + $map1; // Fast :)
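
The two forms agree for string keys but not for integer keys, which array_merge() renumbers. A short sketch with illustrative data:

```php
<?php
// String keys: both forms give the same key/value pairs.
$defaults  = array("host" => "localhost", "port" => 80);
$overrides = array("port" => 8080);

$viaMerge = array_merge($defaults, $overrides); // later array wins
$viaPlus  = $overrides + $defaults;             // left operand wins

// Integer keys: array_merge() renumbers them, + keeps them.
$byId1 = array(10 => "ten");
$byId2 = array(20 => "twenty");

$merged = array_merge($byId1, $byId2); // array(0 => "ten", 1 => "twenty")
$summed = $byId2 + $byId1;             // array(20 => "twenty", 10 => "ten")
```

With +, the left operand takes precedence on duplicate keys, hence $map2 + $map1 to let $map2 override $map1. The element order of the two results can differ even when their contents match.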

  67. Multikey Maps – Using array
    $map = array();
    $map["ONE"] = 1;
    $map["UN"] =& $map["ONE"];
    $map["UNO"] =& $map["ONE"];
    $map["TWO"] = 2;
    $map["DEUX"] =& $map["TWO"];
    $map["DUE"] =& $map["TWO"];
    $map["UNO"] = "once";
    $map["DEUX"] = "twice";
    var_dump($map);
    /*
    array(6) {
    ["ONE"] => &string(4) "once"
    ["UN"] => &string(4) "once"
    ["UNO"] => &string(4) "once"
    ["TWO"] => &string(5) "twice"
    ["DEUX"] => &string(5) "twice"
    ["DUE"] => &string(5) "twice"
    }
    */

  69. Heap

    A heap is a tree-based structure in which all
    elements are ordered with the largest key at the top
    and the smallest ones at the leaves (a max-heap; a
    min-heap reverses the order).

  70. Heap – Using array
    $heap = array();
    $heap[] = 3;
    sort($heap);
    $heap[] = 1;
    sort($heap);
    $heap[] = 2;
    sort($heap);

  71. Heap – Using Spl(Min|Max)Heap
    $heap = new SplMinHeap;
    $heap->insert(3);
    $heap->insert(1);
    $heap->insert(2);
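
To see the ordering, the heap can be drained with extract(). A minimal sketch using the same three values:

```php
<?php
$heap = new SplMinHeap();
foreach (array(3, 1, 2) as $value) {
    $heap->insert($value);
}

$smallest = $heap->top(); // 1, peeks without removing

// extract() removes the current minimum each time.
$sorted = array();
while (!$heap->isEmpty()) {
    $sorted[] = $heap->extract();
}
// $sorted is array(1, 2, 3)
```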

  72. Heaps: Conclusions

    A heap is MUCH faster than having to re-sort() an
    array at every insertion.

    If you don't require the collection to be sorted at
    every single step, and can insert all the data at once
    and then sort() it, an array is a much better/faster
    approach.

    SplPriorityQueue is very similar: consider it the
    same as SplHeap, except that the ordering is based on
    a separate priority (the key) rather than on the value.
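
A brief SplPriorityQueue sketch; the task names and priorities are made up for the example. Distinct priorities are used on purpose, since the order of elements with equal priority should not be relied upon:

```php
<?php
$queue = new SplPriorityQueue();
$queue->insert("write tests", 2);
$queue->insert("fix outage", 10);
$queue->insert("refactor", 1);

// extract() returns the value with the highest priority first.
$order = array();
while (!$queue->isEmpty()) {
    $order[] = $queue->extract();
}
// $order is array("fix outage", "write tests", "refactor")
```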

  73. Bloom filters

    A Bloom filter is a space-efficient probabilistic data
    structure used to test whether an element is a
    member of a set.

    False positives are possible, but false negatives are
    not!

  74. Bloom filters – Using bloomy
    // BloomFilter::__construct(int capacity [, double
    // error_rate [, int random_seed ] ])
    $bloomFilter = new BloomFilter(10000, 0.001);
    $bloomFilter->add("An element");
    $bloomFilter->has("An element"); // true for sure
    $bloomFilter->has("Foo"); // false, most probably
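
For readers without the bloomy extension, the idea can be sketched in plain PHP. This is not how bloomy is implemented: the class name TinyBloomFilter, the fixed bit count and the salted crc32() hashing are all assumptions chosen to keep the example short:

```php
<?php
// Minimal pure-PHP Bloom filter sketch (hypothetical, for illustration only).
class TinyBloomFilter
{
    private $bits;      // binary string used as a bit array
    private $size;      // number of bits
    private $hashCount; // number of hash positions per element

    public function __construct($size = 8192, $hashCount = 3)
    {
        $this->size = $size;
        $this->hashCount = $hashCount;
        $this->bits = str_repeat("\0", (int) ceil($size / 8));
    }

    // Derive several bit positions by salting crc32() with an index.
    private function positions($value)
    {
        $positions = array();
        for ($i = 0; $i < $this->hashCount; $i++) {
            $hash = crc32($i . ':' . $value) & 0x7fffffff; // keep it non-negative
            $positions[] = $hash % $this->size;
        }
        return $positions;
    }

    public function add($value)
    {
        foreach ($this->positions($value) as $pos) {
            $byte = $pos >> 3;
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($pos & 7)));
        }
    }

    public function has($value)
    {
        foreach ($this->positions($value) as $pos) {
            if ((ord($this->bits[$pos >> 3]) & (1 << ($pos & 7))) === 0) {
                return false; // at least one bit unset: definitely absent
            }
        }
        return true; // all bits set: possibly present
    }
}

$filter = new TinyBloomFilter();
$filter->add("An element");
$present = $filter->has("An element"); // true: added elements are always found

$empty = new TinyBloomFilter();
$absent = $empty->has("Foo"); // false: no bit has been set yet
```

Bits are only ever set, never cleared, which is why false negatives cannot happen while false positives can.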

  77. Other related projects

    SPL Types: Various types implemented as objects:
    SplInt, SplFloat, SplEnum, SplBool and SplString
    http://pecl.php.net/package/SPL_Types

    Judy: Sparse dynamic arrays implementation
    http://pecl.php.net/package/Judy

    Weakref: Weak references implementation.
    Provides a gateway to an object without
    preventing that object from being collected by the
    garbage collector.
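
The same idea is available without an extension in PHP 7.4 and later through the built-in WeakReference class. Note that this is a different API from the PECL Weakref extension mentioned above; the sketch below only uses WeakReference::create() and get():

```php
<?php
// Built-in WeakReference (PHP 7.4+), not the PECL Weakref extension.
$object = new stdClass();
$ref = WeakReference::create($object);

// While a strong reference exists, get() returns the object.
$alive = $ref->get() === $object; // true

// Dropping the last strong reference lets PHP destroy the object.
unset($object);
$collected = $ref->get() === null; // true
```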

  80. Conclusions

    Use the appropriate data structure. It will keep your
    code clean and fast.

    Think about the time and space complexity
    of your algorithms.

    Name your variables accordingly: use “Map”, “Set”,
    “List”, “Queue”,... to describe them instead of using
    something like: $ordersArray.

  81. Questions?

  82. Thanks

    Don't forget to rate this talk on https://joind.in/6941

  83. Photo Credits

    Tuned car:
    http://www.flickr.com/photos/gioxxswall/5783867752

    London Eye Structure:
    http://www.flickr.com/photos/photographygal123/4883546484

    Cigarette:
    http://www.flickr.com/photos/superfantastic/166215927

    Heap structure:
    http://en.wikipedia.org/wiki/File:Max-Heap.svg

    Drawers:
    http://www.flickr.com/photos/jamesclay/2312912612

    Stones stack:
    http://www.flickr.com/photos/silent_e/2282729987

    Tree:
    http://www.flickr.com/photos/drewbandy/6002204996
