mastering php data structure 102 - phpday 2012 verona

Post on 17-May-2015

Category:

Technology

DESCRIPTION

We have all learned data structures at school: arrays, lists, sets, stacks (LIFO), queues (FIFO), heaps, associative arrays, trees, ... and what do we mostly use in PHP? The "array"! In most cases we do anything and everything with it, and then stumble over it when profiling code. In this session, we'll relearn how to use these structures appropriately, looking more closely at how to use arrays, the SPL and other structures from PHP extensions.

TRANSCRIPT

Mastering PHP Data Structure 102

Patrick Allaert

phpDay 2012 Verona, Italy

About me

● Patrick Allaert
● Founder of Libereco
● Playing with PHP/Linux for +10 years
● eZ Publish core developer
● Author of the APM PHP extension
● @patrick_allaert
● patrickallaert@php.net
● http://github.com/patrickallaert/
● http://patrickallaert.blogspot.com/

APM

PHP native datatypes

● NULL (IS_NULL)
● Booleans (IS_BOOL)
● Integers (IS_LONG)
● Floating point numbers (IS_DOUBLE)
● Strings (IS_STRING)
● Arrays (IS_ARRAY, IS_CONSTANT_ARRAY)
● Objects (IS_OBJECT)
● Resources (IS_RESOURCE)
● Callable (IS_CALLABLE)

Wikipedia datatypes

● 2-3-4 tree ● 2-3 heap ● 2-3 tree
● AA tree ● Abstract syntax tree ● (a,b)-tree ● Adaptive k-d tree ● Adjacency list ● Adjacency matrix ● AF-heap ● Alternating decision tree ● And-inverter graph ● And–or tree ● Array ● AVL tree
● Beap ● Bidirectional map ● Bin ● Binary decision diagram ● Binary heap ● Binary search tree ● Binary tree ● Binomial heap ● Bit array ● Bitboard ● Bit field ● Bitmap ● BK-tree ● Bloom filter ● Boolean ● Bounding interval hierarchy ● B sharp tree ● BSP tree ● B-tree ● B*-tree ● B+ tree ● B-trie ● Bx-tree
● Cartesian tree ● Char ● Circular buffer ● Compressed suffix array ● Container ● Control table ● Cover tree ● Ctrie
● Dancing tree ● D-ary heap ● Decision tree ● Deque ● Directed acyclic graph ● Directed graph ● Disjoint-set ● Distributed hash table ● Double ● Doubly connected edge list ● Doubly linked list ● Dynamic array
● Enfilade ● Enumerated type ● Expectiminimax tree ● Exponential tree
● Fenwick tree ● Fibonacci heap ● Finger tree ● Float ● FM-index ● Fusion tree
● Gap buffer ● Generalised suffix tree ● Graph ● Graph-structured stack
● Hash ● Hash array mapped trie ● Hashed array tree ● Hash list ● Hash table ● Hash tree ● Hash trie ● Heap ● Heightmap ● Hilbert R-tree ● Hypergraph
● Iliffe vector ● Image ● Implicit kd-tree ● Interval tree ● Int
● Judy array
● Kdb tree ● Kd-tree ● Koorde
● Leftist heap ● Lightmap ● Linear octree ● Link/cut tree ● Linked list ● Lookup table
● Map/Associative array/Dictionary ● Matrix ● Metric tree ● Minimax tree ● Min/max kd-tree ● M-tree ● Multigraph ● Multimap ● Multiset
● Octree
● Pagoda ● Pairing heap ● Parallel array ● Parse tree ● Plain old data structure ● Prefix hash tree ● Priority queue ● Propositional directed acyclic graph
● Quad-edge ● Quadtree ● Queap ● Queue
● Radix tree ● Randomized binary search tree ● Range tree ● Rapidly-exploring random tree ● Record (also called tuple or struct) ● Red-black tree ● Rope ● Routing table ● R-tree ● R* tree ● R+ tree
● Scapegoat tree ● Scene graph ● Segment tree ● Self-balancing binary search tree ● Self-organizing list ● Set ● Skew heap ● Skip list ● Soft heap ● Sorted array ● Spaghetti stack ● Sparse array ● Sparse matrix ● Splay tree ● SPQR-tree ● Stack ● String ● Suffix array ● Suffix tree ● Symbol table ● Syntax tree
● Tagged union (variant record, discriminated union, disjoint union) ● Tango tree ● Ternary heap ● Ternary search tree ● Threaded binary tree ● Top tree ● Treap ● Tree ● Trees ● Trie ● T-tree
● UB-tree ● Union ● Unrolled linked list
● Van Emde Boas tree ● Variable-length array ● VList ● VP-tree
● Weight-balanced tree ● Winged edge
● X-fast trie ● Xor linked list ● X-tree
● Y-fast trie
● Zero suppressed decision diagram ● Zipper ● Z-order

Game: Can you recognize some structures?

Array: PHP's untruthfulness

PHP “Arrays” are not true Arrays!

An array typically looks like this:

[Diagram: six contiguous Data cells, indexed 0 to 5]

Array: PHP's untruthfulness

PHP “Arrays” can grow dynamically and be iterated in both directions (reset(), next(), prev(), end()), exclusively with O(1) operations.

Let's have a Doubly Linked List (DLL):

[Diagram: five Data nodes chained in both directions, from Head to Tail]

Enables List, Deque, Queue and Stack implementations

Array: PHP's untruthfulness

Elements of PHP “Arrays” are always accessible using a key (index).

Let's have a Hash Table:

[Diagram: a bucket pointers array with Bucket * slots numbered 0 to nTableSize - 1; the slots point to Buckets, each Bucket holds its Data, and the Buckets are chained from Head to Tail]

Array: PHP's untruthfulness

http://php.net/manual/en/language.types.array.php:

“This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more.”

Optimized for anything ≈ Optimized for nothing!

Array: PHP's untruthfulness

● In C: 100 000 integers (using long on 64 bits => 8 bytes each) can be stored in 0.76 MB.
● In PHP: it will take ≅ 13.97 MB!
● A PHP variable (containing an integer) takes 48 bytes.
● The overhead of buckets for every “array” entry is about 96 bytes.
● More details: http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
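The figures above can be checked roughly on any installation. This is a minimal sketch, not the method used for the slide: the helper name `bytesRetainedBy` is hypothetical, and the numbers differ a lot between PHP versions, so no expected output is given.

```php
<?php
// Hypothetical helper: reports how many bytes stay allocated for whatever
// the callback builds, according to memory_get_usage().
function bytesRetainedBy($build)
{
    $before = memory_get_usage();
    $data = $build();            // hold the result so it is not freed yet
    $after = memory_get_usage();
    unset($data);
    return $after - $before;
}

$count = 100000;

$bytes = bytesRetainedBy(function () use ($count) {
    $integers = array();
    for ($i = 0; $i < $count; $i++) {
        $integers[] = $i;
    }
    return $integers;
});

// Raw size of the same data as 8-byte C longs, for comparison.
$rawBytes = $count * 8;

printf("PHP array: %.2f MB (%.1f bytes per entry)\n", $bytes / 1048576, $bytes / $count);
printf("C long[]:  %.2f MB (8 bytes per entry)\n", $rawBytes / 1048576);
```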

Data Structure

Structs (or records, tuples,...)

● A struct is a value containing other values which are typically accessed using a name.

● Example:
Person => firstName / lastName
ComplexNumber => realPart / imaginaryPart

Structs – Using array

$person = array(
    "firstName" => "Patrick",
    "lastName" => "Allaert"
);

Structs – Using a class

$person = new PersonStruct(
    "Patrick", "Allaert"
);

Structs – Using a class (Implementation)

class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    public function __set($key, $value)
    {
        // a. Do nothing
        // b. trigger_error()
        // c. Throws an exception
    }
}

Structs – Pros and Cons

Array
+ Uses less memory (PHP < 5.4)
- Uses more memory (PHP = 5.4)
- No type hinting
- Flexible structure
+|- Less OO
Slightly faster?

Class
- Uses more memory (PHP < 5.4)
+ Uses less memory (PHP = 5.4)
+ Type hinting possible
+ Rigid structure
+|- More OO
Slightly slower?
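To make the "rigid structure" and "type hinting" points concrete, here is a minimal sketch that picks option c (throw an exception) for __set(). The function `fullName()` and the choice of LogicException are assumptions made for illustration only.

```php
<?php
// Sketch of a struct-like class that rejects undeclared properties.
class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    // Called only for properties that are not declared above.
    public function __set($key, $value)
    {
        throw new LogicException("Unknown property: $key");
    }
}

// Hypothetical consumer: the type hint guarantees the shape of $person.
function fullName(PersonStruct $person)
{
    return $person->firstName . " " . $person->lastName;
}

$person = new PersonStruct("Patrick", "Allaert");
echo fullName($person), "\n"; // Patrick Allaert

$message = null;
try {
    $person->firstname = "Pat"; // typo: lower-case "n"
} catch (LogicException $e) {
    $message = $e->getMessage();
    echo $message, "\n"; // Unknown property: firstname
}
```

With a plain array, the same typo would silently create a new key.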

(true) Arrays

(true) Arrays

● An array is a fixed-size collection where each element is identified by a numeric index.

[Diagram: six contiguous Data cells, indexed 0 to 5]

(true) Arrays – Using SplFixedArray

$array = new SplFixedArray(3);
$array[0] = 1; // or $array->offsetSet()
$array[1] = 2; // or $array->offsetSet()
$array[2] = 3; // or $array->offsetSet()
$array[0]; // gives 1
$array[1]; // gives 2
$array[2]; // gives 3

(true) Arrays – Pros and Cons

Array
- Uses more memory
+|- Less OO

SplFixedArray
+ Uses less memory
+|- More OO
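The fixed size is the practical difference from a PHP “array”. A short sketch of what that means in use follows; the variable names are illustrative.

```php
<?php
$array = new SplFixedArray(3);
$array[0] = 1;
$array[1] = 2;
$array[2] = 3;

// Writing past the end does not grow the structure: it throws.
$rejected = false;
try {
    $array[3] = 4;
} catch (RuntimeException $e) {
    $rejected = true;
}

// Growing has to be requested explicitly.
$array->setSize(4);
$array[3] = 4;

echo $array->getSize(), "\n";               // 4
echo implode(",", $array->toArray()), "\n"; // 1,2,3,4
```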

Queues

Queues

● A queue is an ordered collection respecting First In, First Out (FIFO) order.

● Elements are inserted at one end and removed at the other.

[Diagram: a row of Data elements; Enqueue adds a Data element at one end, Dequeue removes one from the other end]

Queues – Using array

$queue = array();
$queue[] = 1; // or array_push()
$queue[] = 2; // or array_push()
$queue[] = 3; // or array_push()
array_shift($queue); // gives 1
array_shift($queue); // gives 2
array_shift($queue); // gives 3

Queues – Using SplQueue

$queue = new SplQueue();
$queue[] = 1; // or $queue->enqueue()
$queue[] = 2; // or $queue->enqueue()
$queue[] = 3; // or $queue->enqueue()
$queue->dequeue(); // gives 1
$queue->dequeue(); // gives 2
$queue->dequeue(); // gives 3

Stacks

Stacks

● A stack is an ordered collection respecting Last In, First Out (LIFO) order.

● Elements are inserted and removed on the same end.

[Diagram: a row of Data elements; Push adds a Data element and Pop removes one at the same end]

Stacks – Using array

$stack = array();
$stack[] = 1; // or array_push()
$stack[] = 2; // or array_push()
$stack[] = 3; // or array_push()
array_pop($stack); // gives 3
array_pop($stack); // gives 2
array_pop($stack); // gives 1

Stacks – Using SplStack

$stack = new SplStack();
$stack[] = 1; // or $stack->push()
$stack[] = 2; // or $stack->push()
$stack[] = 3; // or $stack->push()
$stack->pop(); // gives 3
$stack->pop(); // gives 2
$stack->pop(); // gives 1

Queues/Stacks – Pros and Cons

Array
- Uses more memory (overhead / entry: 96 bytes)
- No type hinting
+|- Less OO

SplQueue / SplStack
+ Uses less memory (overhead / entry: 48 bytes)
+ Type hinting possible
+|- More OO
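The type-hinting advantage can be sketched as follows. The function `drain()` is a hypothetical consumer, not part of the SPL; its signature alone tells the caller that FIFO order is expected.

```php
<?php
// Hypothetical consumer: accepts only a real queue, never an arbitrary array.
function drain(SplQueue $queue)
{
    $seen = array();
    while (!$queue->isEmpty()) {
        $seen[] = $queue->dequeue(); // oldest element first
    }
    return $seen;
}

$queue = new SplQueue();
$queue->enqueue("a");
$queue->enqueue("b");
$queue->enqueue("c");
$fifo = drain($queue);
echo implode(",", $fifo), "\n"; // a,b,c

$stack = new SplStack();
$stack->push("a");
$stack->push("b");
$stack->push("c");
$lifo = array();
while (!$stack->isEmpty()) {
    $lifo[] = $stack->pop(); // newest element first
}
echo implode(",", $lifo), "\n"; // c,b,a
```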

Sets

[Venn diagram of two sets, “Geeks” and “Nerds”, captioned: People with strong views on the distinction between geeks and nerds]

Sets

● A set is a collection with no particular ordering especially suited for testing the membership of a value against a collection or to perform union/intersection/complement operations between them.

[Diagram: five Data elements in no particular order]

Sets – Using array

$set = array();

// Adding elements to a set
$set[] = 1;
$set[] = 2;
$set[] = 3;

// Checking presence in a set
in_array(2, $set); // true
in_array(5, $set); // false

array_merge($set1, $set2); // union
array_intersect($set1, $set2); // intersection
array_diff($set1, $set2); // complement

True performance killers!

Sets – Mis-usage

if ($value === "val1" || $value === "val2" || $value === "val3")
{
    // ...
}

Sets – Mis-usage

if (in_array($value, array("val1", "val2", "val3")))
{
    // ...
}

Sets – Mis-usage

switch ($value)
{
    case "val1":
    case "val2":
    case "val3":
        // ...
}

Sets – Using array (simple types)

● Remember that PHP Array keys can be integers or strings only!

$set = array();

// Adding elements to a set
$set[1] = true; // Any dummy value works,
$set[2] = true; // except NULL: isset() returns false for NULL values
$set[3] = true;

// Checking presence in a set
isset($set[2]); // true
isset($set[5]); // false

$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
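The key-based approach can be applied to the earlier "val1/val2/val3" membership test as well. A minimal sketch, assuming string members; array_fill_keys() is just one convenient way to build the set from a list.

```php
<?php
// Build sets whose members are the array keys; the values are dummies.
$set1 = array_fill_keys(array("val1", "val2", "val3"), true);
$set2 = array_fill_keys(array("val3", "val4"), true);

// Membership is a hash lookup instead of a scan through the values.
var_dump(isset($set1["val2"])); // bool(true)
var_dump(isset($set1["val9"])); // bool(false)

$union        = $set1 + $set2;
$intersection = array_intersect_key($set1, $set2);
$difference   = array_diff_key($set1, $set2);

echo implode(",", array_keys($union)), "\n";        // val1,val2,val3,val4
echo implode(",", array_keys($intersection)), "\n"; // val3
echo implode(",", array_keys($difference)), "\n";   // val1,val2
```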

Sets – Using array (objects)

$set = array();

// Adding elements to a set
$set[spl_object_hash($object1)] = $object1;
$set[spl_object_hash($object2)] = $object2;
$set[spl_object_hash($object3)] = $object3;

// Checking presence in a set
isset($set[spl_object_hash($object2)]); // true
isset($set[spl_object_hash($object5)]); // false

$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement

Store a reference to the object as the value!

Sets – Using SplObjectStorage (objects)

$set = new SplObjectStorage();

// Adding elements to a set
$set->attach($object1); // or $set[$object1] = null;
$set->attach($object2); // or $set[$object2] = null;
$set->attach($object3); // or $set[$object3] = null;

// Checking presence in a set
isset($set[$object2]); // true
isset($set[$object5]); // false

$set1->addAll($set2); // union
$set1->removeAllExcept($set2); // intersection
$set1->removeAll($set2); // complement
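Note that addAll(), removeAllExcept() and removeAll() modify $set1 in place. A minimal sketch of computing each result on a copy instead; cloning the storage first is a choice made here for illustration, and the array-access syntax is used for adding elements.

```php
<?php
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

$set1 = new SplObjectStorage();
$set1[$a] = null;
$set1[$b] = null;

$set2 = new SplObjectStorage();
$set2[$b] = null;
$set2[$c] = null;

// Each operation works on a clone so that $set1 itself stays untouched.
$union = clone $set1;
$union->addAll($set2);                  // a, b, c

$intersection = clone $set1;
$intersection->removeAllExcept($set2);  // b

$difference = clone $set1;
$difference->removeAll($set2);          // a

echo count($union), " ", count($intersection), " ", count($difference), "\n"; // 3 1 1
```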

Sets – Using QuickHash (int)

● No union/intersection/complement operations (yet?)

● Yummy features like (loadFrom|saveTo)(String|File)

$set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES);

// Adding elements to a set
$set->add(1);
$set->add(2);
$set->add(3);

// Checking presence in a set
$set->exists(2); // true
$set->exists(5); // false

// Soonish: isset($set[2]);

Sets – Using bitsets

// Shown for illustration: PHP already predefines these constants with these values.
define("E_ERROR", 1); // or 1<<0
define("E_WARNING", 2); // or 1<<1
define("E_PARSE", 4); // or 1<<2
define("E_NOTICE", 8); // or 1<<3

// Adding elements to a set
$set = 0;
$set |= E_ERROR;
$set |= E_WARNING;
$set |= E_PARSE;

// Checking presence in a set
$set & E_ERROR; // non-zero (true)
$set & E_NOTICE; // 0 (false)

$set1 | $set2; // union
$set1 & $set2; // intersection
$set1 & ~$set2; // complement ($set1 ^ $set2 would be the symmetric difference)

Sets – Using bitsets (example)

Instead of:

function remove($path, $files = true, $directories = true, $links = true, $executable = true)
{
    if (!$files && is_file($path)) return false;
    if (!$directories && is_dir($path)) return false;
    if (!$links && is_link($path)) return false;
    if (!$executable && is_executable($path)) return false;
    // ...
}

remove("/tmp/removeMe", true, false, true, false); // WTF ?!

Sets – Using bitsets (example)

Use:

define("REMOVE_FILES", 1 << 0);
define("REMOVE_DIRS", 1 << 1);
define("REMOVE_LINKS", 1 << 2);
define("REMOVE_EXEC", 1 << 3);
define("REMOVE_ALL", ~0); // Setting all bits

function remove($path, $options = REMOVE_ALL)
{
    // (~$options & FLAG) is non-zero when FLAG is NOT set in $options
    if ((~$options & REMOVE_FILES) && is_file($path)) return false;
    if ((~$options & REMOVE_DIRS) && is_dir($path)) return false;
    if ((~$options & REMOVE_LINKS) && is_link($path)) return false;
    if ((~$options & REMOVE_EXEC) && is_executable($path)) return false;
    // ...
}

remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)
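The set operations on such flags can be checked with a few concrete values. A small sketch reusing the REMOVE_* names from the example above; the helper `hasFlag()` is hypothetical.

```php
<?php
define("REMOVE_FILES", 1 << 0); // 1
define("REMOVE_DIRS",  1 << 1); // 2
define("REMOVE_LINKS", 1 << 2); // 4
define("REMOVE_EXEC",  1 << 3); // 8

// Hypothetical helper: true when $flag is a member of the bitset $options.
function hasFlag($options, $flag)
{
    return ($options & $flag) === $flag;
}

$set1 = REMOVE_FILES | REMOVE_LINKS; // 0101 = 5
$set2 = REMOVE_LINKS | REMOVE_EXEC;  // 1100 = 12

var_dump(hasFlag($set1, REMOVE_LINKS)); // bool(true)
var_dump(hasFlag($set1, REMOVE_DIRS));  // bool(false)

echo $set1 | $set2, "\n";  // 13: union
echo $set1 & $set2, "\n";  // 4:  intersection
echo $set1 & ~$set2, "\n"; // 1:  members of $set1 that are not in $set2
echo $set1 ^ $set2, "\n";  // 9:  members of exactly one of the two sets
```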

Sets: Conclusions

● Use the key and not the value when using PHP Arrays.
● Use QuickHash for sets of integers if possible.
● Use SplObjectStorage as soon as you are playing with objects.
● Don't use array_unique() when you need a set!

Maps

● A map is a collection of key/value pairs where all keys are unique.

Maps – Using array

● Don't use array_merge() on maps.

$map = array();
$map["ONE"] = 1;
$map["TWO"] = 2;
$map["THREE"] = 3;

// Merging maps:
array_merge($map1, $map2); // SLOW!
$map2 + $map1; // Fast :)
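The operand order in `$map2 + $map1` matters: with `+`, the left operand wins on duplicate keys, whereas array_merge() lets the later argument win. A short sketch with illustrative data, which also shows a second difference: array_merge() renumbers integer keys.

```php
<?php
$map1 = array("ONE" => 1, "TWO" => 2);
$map2 = array("TWO" => "two", "THREE" => 3);

$merged = array_merge($map1, $map2); // later argument wins for "TWO"
$united = $map2 + $map1;             // left operand wins for "TWO"

// Same key/value pairs (== ignores the order of the entries).
var_dump($merged == $united); // bool(true)
var_dump($united["TWO"]);     // string(3) "two"

// With integer keys the two are no longer interchangeable.
$byId1 = array(10 => "a");
$byId2 = array(20 => "b");
echo implode(",", array_keys(array_merge($byId1, $byId2))), "\n"; // 0,1
echo implode(",", array_keys($byId2 + $byId1)), "\n";             // 20,10
```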

Multikey Maps – Using array

$map = array();
$map["ONE"] = 1;
$map["UN"] =& $map["ONE"];
$map["UNO"] =& $map["ONE"];
$map["TWO"] = 2;
$map["DEUX"] =& $map["TWO"];
$map["DUE"] =& $map["TWO"];

$map["UNO"] = "once";
$map["DEUX"] = "twice";

var_dump($map);
/*
array(6) {
  ["ONE"] => &string(4) "once"
  ["UN"] => &string(4) "once"
  ["UNO"] => &string(4) "once"
  ["TWO"] => &string(5) "twice"
  ["DEUX"] => &string(5) "twice"
  ["DUE"] => &string(5) "twice"
}
*/

Heap

● A heap is a tree-based structure in which every element is ordered relative to its children: in a max-heap, the largest key is at the top and the smallest ones end up among the leaves.

Heap – Using array

$heap = array();
$heap[] = 3;
sort($heap);
$heap[] = 1;
sort($heap);
$heap[] = 2;
sort($heap);

Heap – Using Spl(Min|Max)Heap

$heap = new SplMinHeap;
$heap->insert(3);
$heap->insert(1);
$heap->insert(2);

Heaps: Conclusions

● MUCH faster than having to re-sort() an array at every insertion.

● If you don't need the collection to be sorted at every single step, and can insert all the data at once and then sort() it, an array is a much better/faster approach.

● SplPriorityQueue is very similar: consider it the same as SplHeap, except that elements are ordered by a separate priority rather than by their value.
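A short sketch of both structures in use. The task names and priorities are made up for illustration.

```php
<?php
// SplMinHeap: the smallest value is always available at the top.
$heap = new SplMinHeap();
foreach (array(3, 1, 2) as $value) {
    $heap->insert($value);
}
echo $heap->top(), "\n"; // 1 (peek without removing)

$sorted = array();
while (!$heap->isEmpty()) {
    $sorted[] = $heap->extract(); // removes the current minimum
}
echo implode(",", $sorted), "\n"; // 1,2,3

// SplPriorityQueue: ordering follows the priority, not the stored value.
$queue = new SplPriorityQueue();
$queue->insert("write report", 1);
$queue->insert("fix outage", 10);
$queue->insert("reply to mail", 5);

$first = $queue->extract();
echo $first, "\n"; // fix outage (highest priority first)
```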

Bloom filters

● A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.

● False positives are possible, but false negatives are not!

Bloom filters – Using bloomy

// BloomFilter::__construct(int capacity [, double error_rate [, int random_seed ] ])
$bloomFilter = new BloomFilter(10000, 0.001);

$bloomFilter->add("An element");

$bloomFilter->has("An element"); // true for sure
$bloomFilter->has("Foo"); // false, most probably
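To show why false negatives cannot happen, here is a deliberately tiny pure-PHP sketch of the idea. It is not how the bloomy extension is implemented: the class name `TinyBloomFilter`, the md5-based hashing and the default sizes are all assumptions, and a real implementation would pack the bits instead of using a PHP array of booleans.

```php
<?php
// Illustrative Bloom filter: k bit positions per element, no stored elements.
class TinyBloomFilter
{
    private $bits;
    private $size;
    private $hashCount;

    public function __construct($size = 1024, $hashCount = 3)
    {
        $this->size = $size;
        $this->hashCount = $hashCount;
        $this->bits = array_fill(0, $size, false);
    }

    // Derives $hashCount positions by salting one hash function.
    private function positions($element)
    {
        $positions = array();
        for ($i = 0; $i < $this->hashCount; $i++) {
            $hash = hexdec(substr(md5($i . ":" . $element), 0, 7));
            $positions[] = $hash % $this->size;
        }
        return $positions;
    }

    public function add($element)
    {
        foreach ($this->positions($element) as $position) {
            $this->bits[$position] = true;
        }
    }

    // True if every position is set: certain for added elements,
    // and only "probably" for anything else (false positives).
    public function has($element)
    {
        foreach ($this->positions($element) as $position) {
            if (!$this->bits[$position]) {
                return false;
            }
        }
        return true;
    }
}

$filter = new TinyBloomFilter();
$filter->add("An element");
var_dump($filter->has("An element")); // bool(true)
```

Because add() only ever sets bits and nothing clears them, an element that was added always finds all of its positions set.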

Other related projects

● SPL Types: Various types implemented as objects: SplInt, SplFloat, SplEnum, SplBool and SplString http://pecl.php.net/package/SPL_Types

● Judy: Sparse dynamic arrays implementation http://pecl.php.net/package/Judy

● Weakref: Weak references implementation. Provides a gateway to an object without preventing that object from being collected by the garbage collector.

Conclusions

● Use the appropriate data structure. It will keep your code clean and fast.

● Think about the time and space complexity of your algorithms.

● Name your variables accordingly: use “Map”, “Set”, “List”, “Queue”,... to describe them instead of using something like: $ordersArray.

Questions?

Thanks

● Don't forget to rate this talk on http://joind.in/6371

Photo Credits

● Tuned car: http://www.flickr.com/photos/gioxxswall/5783867752
● London Eye Structure: http://www.flickr.com/photos/photographygal123/4883546484
● Cigarette: http://www.flickr.com/photos/superfantastic/166215927
● Heap structure: http://en.wikipedia.org/wiki/File:Max-Heap.svg
● Drawers: http://www.flickr.com/photos/jamesclay/2312912612
● Stones stack: http://www.flickr.com/photos/silent_e/2282729987
● Tree: http://www.flickr.com/photos/drewbandy/6002204996
