Mastering PHP Data Structure 102
Patrick Allaert
phpDay 2012 Verona, Italy
About me
● Patrick Allaert
● Founder of Libereco
● Playing with PHP/Linux for 10+ years
● eZ Publish core developer
● Author of the APM PHP extension
● @patrick_allaert
● [email protected]
● http://github.com/patrickallaert/
● http://patrickallaert.blogspot.com/
APM
PHP native datatypes
● NULL (IS_NULL)
● Booleans (IS_BOOL)
● Integers (IS_LONG)
● Floating point numbers (IS_DOUBLE)
● Strings (IS_STRING)
● Arrays (IS_ARRAY, IS_CONSTANT_ARRAY)
● Objects (IS_OBJECT)
● Resources (IS_RESOURCE)
● Callable (IS_CALLABLE)
Wikipedia datatypes
● 2-3-4 tree ● 2-3 heap ● 2-3 tree ● AA tree ● Abstract syntax tree ● (a,b)-tree ● Adaptive k-d tree ● Adjacency list ● Adjacency matrix ● AF-heap ● Alternating decision tree ● And-inverter graph ● And–or tree ● Array ● AVL tree
● Beap ● Bidirectional map ● Bin ● Binary decision diagram ● Binary heap ● Binary search tree ● Binary tree ● Binomial heap ● Bit array ● Bitboard ● Bit field ● Bitmap ● BK-tree ● Bloom filter ● Boolean ● Bounding interval hierarchy ● B sharp tree ● BSP tree ● B-tree ● B*-tree ● B+ tree ● B-trie ● Bx-tree
● Cartesian tree ● Char ● Circular buffer ● Compressed suffix array ● Container ● Control table ● Cover tree ● Ctrie
● Dancing tree ● D-ary heap ● Decision tree ● Deque ● Directed acyclic graph ● Directed graph ● Disjoint-set ● Distributed hash table ● Double ● Doubly connected edge list ● Doubly linked list ● Dynamic array
● Enfilade ● Enumerated type ● Expectiminimax tree ● Exponential tree
● Fenwick tree ● Fibonacci heap ● Finger tree ● Float ● FM-index ● Fusion tree
● Gap buffer ● Generalised suffix tree ● Graph ● Graph-structured stack
● Hash ● Hash array mapped trie ● Hashed array tree ● Hash list ● Hash table ● Hash tree ● Hash trie ● Heap ● Heightmap ● Hilbert R-tree ● Hypergraph
● Iliffe vector ● Image ● Implicit kd-tree ● Interval tree ● Int
● Judy array
● Kdb tree ● Kd-tree ● Koorde
● Leftist heap ● Lightmap ● Linear octree ● Link/cut tree ● Linked list ● Lookup table
● Map/Associative array/Dictionary ● Matrix ● Metric tree ● Minimax tree ● Min/max kd-tree ● M-tree ● Multigraph ● Multimap ● Multiset
● Octree
● Pagoda ● Pairing heap ● Parallel array ● Parse tree ● Plain old data structure ● Prefix hash tree ● Priority queue ● Propositional directed acyclic graph
● Quad-edge ● Quadtree ● Queap ● Queue
● Radix tree ● Randomized binary search tree ● Range tree ● Rapidly-exploring random tree ● Record (also called tuple or struct) ● Red-black tree ● Rope ● Routing table ● R-tree ● R* tree ● R+ tree
● Scapegoat tree ● Scene graph ● Segment tree ● Self-balancing binary search tree ● Self-organizing list ● Set ● Skew heap ● Skip list ● Soft heap ● Sorted array ● Spaghetti stack ● Sparse array ● Sparse matrix ● Splay tree ● SPQR-tree ● Stack ● String ● Suffix array ● Suffix tree ● Symbol table ● Syntax tree
● Tagged union (variant record, discriminated union, disjoint union) ● Tango tree ● Ternary heap ● Ternary search tree ● Threaded binary tree ● Top tree ● Treap ● Tree ● Trees ● Trie ● T-tree
● UB-tree ● Union ● Unrolled linked list
● Van Emde Boas tree ● Variable-length array ● VList ● VP-tree
● Weight-balanced tree ● Winged edge
● X-fast trie ● Xor linked list ● X-tree
● Y-fast trie
● Zero suppressed decision diagram ● Zipper ● Z-order
Game: Can you recognize some structures?
Array: PHP's untruthfulness
PHP “Arrays” are not true Arrays!
An array typically looks like this:
| Data | Data | Data | Data | Data | Data |
   0      1      2      3      4      5
Array: PHP's untruthfulness
PHP “Arrays” can grow dynamically and be iterated in both directions (reset(), next(), prev(), end()), exclusively with O(1) operations.
Let's have a Doubly Linked List (DLL):
Head -> Data <-> Data <-> Data <-> Data <-> Data <- Tail
Enables List, Deque, Queue and Stack implementations
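A minimal sketch of what this means in practice, using only the internal-pointer functions named above; the `$letters` variable is made up for illustration:

```php
<?php
$letters = array("a", "b", "c");

// Walk the internal pointer in both directions.
$last   = end($letters);   // jump to the last element
$middle = prev($letters);  // step one element back
$first  = reset($letters); // rewind to the first element
$second = next($letters);  // step one element forward

// Growing the "array" is a plain append at the tail.
$letters[] = "d";
$newLast = end($letters);
```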
Array: PHP's untruthfulness
Elements of PHP “Arrays” are always accessible using a key (index).
Let's have a Hash Table:
[Diagram: a bucket pointers array (Bucket * slots indexed 0 to nTableSize - 1) pointing to Buckets; each Bucket holds its Data, and the Buckets are also chained in a doubly linked list from Head to Tail.]
Array: PHP's untruthfulness
http://php.net/manual/en/language.types.array.php:
“This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”
Optimized for anything ≈ Optimized for nothing!
Array: PHP's untruthfulness
● In C: 100 000 integers (using long on 64 bits => 8 bytes each) can be stored in 0.76 MB.
● In PHP: it will take ≅ 13.97 MB!
● A PHP variable (containing an integer) takes 48 bytes.
● The overhead of buckets for every “array” entry is about 96 bytes.
● More details: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
Data Structure
Structs (or records, tuples,...)
● A struct is a value containing other values which are typically accessed using a name.
● Example:
Person => firstName / lastName
ComplexNumber => realPart / imaginaryPart
Structs – Using array
$person = array(
    "firstName" => "Patrick",
    "lastName" => "Allaert"
);
Structs – Using a class
$person = new PersonStruct("Patrick", "Allaert");
Structs – Using a class (Implementation)
class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    // Called when writing to an undeclared (or inaccessible) property
    public function __set($key, $value)
    {
        // a. Do nothing
        // b. trigger_error()
        // c. Throw an exception
    }
}
Structs – Pros and Cons
Array
+ Uses less memory (PHP < 5.4)
- Uses more memory (PHP = 5.4)
- No type hinting
- Flexible structure
+|- Less OO
Slightly faster?

Class
- Uses more memory (PHP < 5.4)
+ Uses less memory (PHP = 5.4)
+ Type hinting possible
+ Rigid structure
+|- More OO
Slightly slower?
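To make the "rigid structure" and "type hinting" points concrete, here is a sketch that picks option c (throw an exception) for `__set()`. The `fullName()` helper and the choice of `LogicException` are assumptions made for this illustration:

```php
<?php
final class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    // Runs only for writes to undeclared properties: reject them.
    public function __set($key, $value)
    {
        throw new LogicException("PersonStruct has no property '$key'");
    }
}

// Hypothetical helper: the type hint guarantees both fields exist.
function fullName(PersonStruct $person)
{
    return $person->firstName . " " . $person->lastName;
}

$person = new PersonStruct("Patrick", "Allaert");
$fullName = fullName($person);

$rejected = false;
try {
    $person->middleName = "X"; // undeclared property
} catch (LogicException $e) {
    $rejected = true;
}
```

With a plain array, the same stray write would silently add a new key.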
(true) Arrays
● An array is a fixed size collection where elements are each identified by a numeric index.
| Data | Data | Data | Data | Data | Data |
   0      1      2      3      4      5
(true) Arrays – Using SplFixedArray
$array = new SplFixedArray(3);
$array[0] = 1; // or $array->offsetSet()
$array[1] = 2; // or $array->offsetSet()
$array[2] = 3; // or $array->offsetSet()
$array[0]; // gives 1
$array[1]; // gives 2
$array[2]; // gives 3
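A short sketch of what "fixed size" implies: writing past the end is rejected until the array is explicitly resized. Variable names are illustrative only:

```php
<?php
$array = new SplFixedArray(3);
$array[0] = 1;
$array[1] = 2;
$array[2] = 3;

// Index 3 is outside the fixed size: SplFixedArray throws instead of growing.
$outOfRange = false;
try {
    $array[3] = 4;
} catch (RuntimeException $e) {
    $outOfRange = true;
}

// Growing has to be requested explicitly.
$array->setSize(4);
$array[3] = 4;

$size = $array->getSize();
$asArray = $array->toArray();
```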
(true) Arrays – Pros and Cons
Array
- Uses more memory
+|- Less OO

SplFixedArray
+ Uses less memory
+|- More OO
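The memory claim can be checked on your own setup with a rough measurement like the one below. This is only a sketch: the figures depend heavily on the PHP version and platform, so no expected numbers are given here.

```php
<?php
$count = 100000;

// Memory taken by a regular PHP "array" holding $count integers.
$before = memory_get_usage();
$array = array();
for ($i = 0; $i < $count; $i++) {
    $array[] = $i;
}
$arrayBytes = memory_get_usage() - $before;

// Memory taken by an SplFixedArray holding the same integers.
$before = memory_get_usage();
$fixed = new SplFixedArray($count);
for ($i = 0; $i < $count; $i++) {
    $fixed[$i] = $i;
}
$fixedBytes = memory_get_usage() - $before;

printf("array: %d bytes, SplFixedArray: %d bytes\n", $arrayBytes, $fixedBytes);
```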
Queues
● A queue is an ordered collection respecting First In, First Out (FIFO) order.
● Elements are inserted at one end and removed at the other.
Enqueue: Data -> | Data | Data | Data | Data | Data | Data | -> Data :Dequeue
Queues – Using array
$queue = array();
$queue[] = 1; // or array_push()
$queue[] = 2; // or array_push()
$queue[] = 3; // or array_push()
array_shift($queue); // gives 1
array_shift($queue); // gives 2
array_shift($queue); // gives 3
Queues – Using SplQueue
$queue = new SplQueue();
$queue[] = 1; // or $queue->enqueue()
$queue[] = 2; // or $queue->enqueue()
$queue[] = 3; // or $queue->enqueue()
$queue->dequeue(); // gives 1
$queue->dequeue(); // gives 2
$queue->dequeue(); // gives 3
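A slightly fuller sketch of the SplQueue variant: draining the queue in FIFO order, and what happens on an empty queue (array_shift() on an empty array just returns NULL). The job names are made up:

```php
<?php
$queue = new SplQueue();
$queue->enqueue("first job");
$queue->enqueue("second job");
$queue->enqueue("third job");

// Drain the queue: elements come out in insertion order (FIFO).
$served = array();
while (!$queue->isEmpty()) {
    $served[] = $queue->dequeue();
}

// Dequeuing from an empty SplQueue is an error, not a silent NULL.
$emptyDetected = false;
try {
    $queue->dequeue();
} catch (RuntimeException $e) {
    $emptyDetected = true;
}
```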
Stacks
● A stack is an ordered collection respecting Last In, First Out (LIFO) order.
● Elements are inserted and removed on the same end.
| Data | Data | Data | Data | Data | Data | <-> Data (Push and Pop happen at the same end)
Stacks – Using array
$stack = array();
$stack[] = 1; // or array_push()
$stack[] = 2; // or array_push()
$stack[] = 3; // or array_push()
array_pop($stack); // gives 3
array_pop($stack); // gives 2
array_pop($stack); // gives 1
Stacks – Using SplStack
$stack = new SplStack();
$stack[] = 1; // or $stack->push()
$stack[] = 2; // or $stack->push()
$stack[] = 3; // or $stack->push()
$stack->pop(); // gives 3
$stack->pop(); // gives 2
$stack->pop(); // gives 1
Queues/Stacks – Pros and Cons
Array
- Uses more memory (overhead / entry: 96 bytes)
- No type hinting
+|- Less OO

SplQueue / SplStack
+ Uses less memory (overhead / entry: 48 bytes)
+ Type hinting possible
+|- More OO
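The "type hinting possible" point, sketched with a hypothetical `undoLast()` function. It assumes PHP 7 or later, where a wrong argument type raises a catchable TypeError:

```php
<?php
// Hypothetical helper: the signature documents and enforces "this is a stack".
function undoLast(SplStack $history)
{
    return $history->isEmpty() ? null : $history->pop();
}

$history = new SplStack();
$history->push("type a");
$history->push("type b");

$top = $history->top();        // peek at the last pushed element
$undone = undoLast($history);  // removes and returns it (LIFO)

// A plain array is rejected by the type hint (TypeError on PHP 7+).
$rejected = false;
try {
    undoLast(array("type a"));
} catch (TypeError $e) {
    $rejected = true;
}
```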
Sets
[Image: Venn diagram of “Geeks” and “Nerds”; caption: “People with strong views on the distinction between geeks and nerds”]
Sets
● A set is a collection with no particular ordering especially suited for testing the membership of a value against a collection or to perform union/intersection/complement operations between them.
Sets – Using array
$set = array();

// Adding elements to a set
$set[] = 1;
$set[] = 2;
$set[] = 3;

// Checking presence in a set
in_array(2, $set); // true
in_array(5, $set); // false

array_merge($set1, $set2); // union
array_intersect($set1, $set2); // intersection
array_diff($set1, $set2); // complement

True performance killers!
Sets – Mis-usage
if ($value === "val1" || $value === "val2" || $value === "val3")
{
    // ...
}
Sets – Mis-usage
if (in_array($value, array("val1", "val2", "val3")))
{
    // ...
}
Sets – Mis-usage
switch ($value)
{
    case "val1":
    case "val2":
    case "val3":
        // ...
}
Sets – Using array (simple types)
● Remember that PHP Array keys can be integers or strings only!
$set = array();

// Adding elements to a set
$set[1] = true; // Any dummy value is good,
$set[2] = true; // except NULL (isset() would then return false)!
$set[3] = true;

// Checking presence in a set
isset($set[2]); // true
isset($set[5]); // false

$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
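The key-based approach above, as a small runnable sketch with string members. The fruit names are sample data, and array_fill_keys() is just one convenient way to build such a set from a list:

```php
<?php
// Build two sets: members are the keys, values are a dummy "true".
$set1 = array_fill_keys(array("apple", "banana", "cherry"), true);
$set2 = array_fill_keys(array("banana", "cherry", "date"), true);

// Membership tests are hash lookups, not linear scans.
$hasBanana = isset($set1["banana"]);
$hasDate   = isset($set1["date"]);

// Set operations work on keys.
$union        = array_keys($set1 + $set2);
$intersection = array_keys(array_intersect_key($set1, $set2));
$complement   = array_keys(array_diff_key($set1, $set2));

// Duplicates disappear naturally because keys are unique.
$unique = array_keys(array_fill_keys(array("a", "b", "a"), true));
```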
Sets – Using array (objects)
$set = array();

// Adding elements to a set
$set[spl_object_hash($object1)] = $object1;
$set[spl_object_hash($object2)] = $object2;
$set[spl_object_hash($object3)] = $object3;

// Checking presence in a set
isset($set[spl_object_hash($object2)]); // true
isset($set[spl_object_hash($object5)]); // false

$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement

Store a reference of the object!
Sets – Using SplObjectStorage (objects)
$set = new SplObjectStorage();
// Adding elements to a set
$set->attach($object1); // or $set[$object1] = null;
$set->attach($object2); // or $set[$object2] = null;
$set->attach($object3); // or $set[$object3] = null;

// Checking presence in a set
isset($set[$object2]); // true
isset($set[$object5]); // false

$set1->addAll($set2); // union (modifies $set1)
$set1->removeAllExcept($set2); // intersection (modifies $set1)
$set1->removeAll($set2); // complement (modifies $set1)
Sets – Using QuickHash (int)
● No union/intersection/complement operations (yet?)
● Yummy features like (loadFrom|saveTo)(String|File)
$set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES);

// Adding elements to a set
$set->add(1);
$set->add(2);
$set->add(3);

// Checking presence in a set
$set->exists(2); // true
$set->exists(5); // false
// Soonish: isset($set[2]);
Sets – Using bitsets
define("E_ERROR", 1); // or 1<<0
define("E_WARNING", 2); // or 1<<1
define("E_PARSE", 4); // or 1<<2
define("E_NOTICE", 8); // or 1<<3

// Adding elements to a set
$set = 0;
$set |= E_ERROR;
$set |= E_WARNING;
$set |= E_PARSE;

// Checking presence in a set
$set & E_ERROR; // non-zero (truthy)
$set & E_NOTICE; // 0 (falsy)

$set1 | $set2; // union
$set1 & $set2; // intersection
$set1 & ~$set2; // complement ($set1 ^ $set2 would give the symmetric difference)
Sets – Using bitsets (example)
Instead of:
function remove($path, $files = true, $directories = true, $links = true, $executable = true)
{
    if (!$files && is_file($path)) return false;
    if (!$directories && is_dir($path)) return false;
    if (!$links && is_link($path)) return false;
    if (!$executable && is_executable($path)) return false;
    // ...
}

remove("/tmp/removeMe", true, false, true, false); // WTF ?!
Sets – Using bitsets (example)
Use:
define("REMOVE_FILES", 1 << 0);
define("REMOVE_DIRS", 1 << 1);
define("REMOVE_LINKS", 1 << 2);
define("REMOVE_EXEC", 1 << 3);
define("REMOVE_ALL", ~0); // Setting all bits

function remove($path, $options = REMOVE_ALL)
{
    if (~$options & REMOVE_FILES && is_file($path)) return false;
    if (~$options & REMOVE_DIRS && is_dir($path)) return false;
    if (~$options & REMOVE_LINKS && is_link($path)) return false;
    if (~$options & REMOVE_EXEC && is_executable($path)) return false;
    // ...
}

remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)
Sets: Conclusions
● Use the key and not the value when using PHP Arrays.
● Use QuickHash for sets of integers if possible.
● Use SplObjectStorage as soon as you are playing with objects.
● Don't use array_unique() when you need a set!
Maps
● A map is a collection of key/value pairs where all keys are unique.
Maps – Using array
● Don't use array_merge() on maps.
$map = array();
$map["ONE"] = 1;
$map["TWO"] = 2;
$map["THREE"] = 3;

// Merging maps:
array_merge($map1, $map2); // SLOW!
$map2 + $map1; // Fast :)
Multikey Maps – Using array
$map = array();
$map["ONE"] = 1;
$map["UN"] =& $map["ONE"];
$map["UNO"] =& $map["ONE"];
$map["TWO"] = 2;
$map["DEUX"] =& $map["TWO"];
$map["DUE"] =& $map["TWO"];

$map["UNO"] = "once";
$map["DEUX"] = "twice";

var_dump($map);
/*
array(6) {
  ["ONE"] => &string(4) "once"
  ["UN"] => &string(4) "once"
  ["UNO"] => &string(4) "once"
  ["TWO"] => &string(5) "twice"
  ["DEUX"] => &string(5) "twice"
  ["DUE"] => &string(5) "twice"
}
*/
Heap
● A heap is a tree-based structure in which every parent is ordered with respect to its children: in a max-heap, the largest key is at the top and the smallest keys are among the leaves.
Heap – Using array
$heap = array();
$heap[] = 3;
sort($heap);
$heap[] = 1;
sort($heap);
$heap[] = 2;
sort($heap);
Heap – Using Spl(Min|Max)Heap
$heap = new SplMinHeap;
$heap->insert(3);
$heap->insert(1);
$heap->insert(2);
Heaps: Conclusions
● MUCH faster than having to re-sort() an array at every insertion.
● If you don't require the collection to be sorted at every single step and can insert all data at once and then sort(), an array is a much better/faster approach.
● SplPriorityQueue is very similar: think of it as an SplHeap where the ordering is based on a separate priority (the key) rather than on the value.
Bloom filters
● A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.
● False positives are possible, but false negatives are not!
Bloom filters – Using bloomy
// BloomFilter::__construct(int capacity [, double error_rate [, int random_seed ] ])
$bloomFilter = new BloomFilter(10000, 0.001);

$bloomFilter->add("An element");

$bloomFilter->has("An element"); // true for sure
$bloomFilter->has("Foo"); // false, most probably
Other related projects
● SPL Types: Various types implemented as objects: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types
● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy
● Weakref: Weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.
Conclusions
● Use the appropriate data structure. It will keep your code clean and fast.
● Think about the time and space complexity of your algorithms.
● Name your variables accordingly: use “Map”, “Set”, “List”, “Queue”,... to describe them instead of using something like: $ordersArray.
Questions?
Thanks
● Don't forget to rate this talk on http://joind.in/6371
Photo Credits
● Tuned car: http://www.flickr.com/photos/gioxxswall/5783867752
● London Eye Structure: http://www.flickr.com/photos/photographygal123/4883546484
● Cigarette: http://www.flickr.com/photos/superfantastic/166215927
● Heap structure: http://en.wikipedia.org/wiki/File:Max-Heap.svg
● Drawers: http://www.flickr.com/photos/jamesclay/2312912612
● Stones stack: http://www.flickr.com/photos/silent_e/2282729987
● Tree: http://www.flickr.com/photos/drewbandy/6002204996