data structures
DESCRIPTION
This slide deck covers some of the important data structures, such as graphs, tries, suffix trees, and hash tables.
TRANSCRIPT
![Page 1: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/1.jpg)
Data Structures
Placement Lectures 2012
Pranav Gupta
![Page 2: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/2.jpg)
Why do we need Data Structures?
• Efficient and intuitive representation of data
  • Tree using arrays vs tree using pointers
• To solve real-life problems efficiently
  • Insertion
  • Deletion
  • Search
  • Sort
• Applications
  • Social networks
  • Employee hierarchy
  • Recommended items
![Page 3: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/3.jpg)
Basic Operations
1. traverse
2. insert
3. delete
4. find
![Page 4: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/4.jpg)
Data Structures (Basic)
• Arrays
• Linked Lists
• Stacks
• Queues
• Recursion
• Trees – Basic
• Practice Problems
![Page 5: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/5.jpg)
Arrays
![Page 6: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/6.jpg)
• Contiguous and fixed memory allocation (independent of language)
• Random access and modification
• List of (index, value); index is non-negative integer; all values in a given array are of the same data type
• To hold various types of values or have non-numerical indices, use associative arrays/dictionaries – The Dictionary Problem?
![Page 7: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/7.jpg)
• Arrays may also be:
  • 2-D : array of 1-D arrays (a 1-D array is a data type in itself)
  • 3-D : array of 2-D arrays (a 2-D array is a data type in itself)
• Memory placement of multi-dimensional arrays
  1. row-major
  2. column-major
• Useful Operations
  a. Modify
  b. Access
  c. Swap
  d. In-place reverse
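As a rough illustration of the two memory placements and of the in-place reverse operation, here is a minimal C++ sketch. The function names (`row_major_offset`, `column_major_offset`, `reverse_in_place`) are made up for this example and are not from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Offset of element (i, j) when rows are stored one after another
// (row-major); `cols` is the number of columns.
inline std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t cols) {
    assert(j < cols);
    return i * cols + j;
}

// Offset of element (i, j) when columns are stored one after another
// (column-major); `rows` is the number of rows.
inline std::size_t column_major_offset(std::size_t i, std::size_t j, std::size_t rows) {
    assert(i < rows);
    return j * rows + i;
}

// Reverse the first n elements of arr in place: two indices move towards
// each other and swap as they go, so only O(1) extra space is needed.
template <class T>
void reverse_in_place(T *arr, std::size_t n) {
    if (n < 2) return;
    std::size_t lo = 0, hi = n - 1;
    while (lo < hi) {
        std::swap(arr[lo], arr[hi]);
        ++lo;
        --hi;
    }
}
```

For a 3 x 4 array, element (1, 2) sits at offset 6 in row-major order and at offset 7 in column-major order.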
![Page 8: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/8.jpg)
Structure of an Array
template<class T> class Array{
    int size;
    T *arr;
    void put();
    void get();
    …….
};
Useful Libraries
#include <vector>
![Page 9: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/9.jpg)
Irregular Arrays
• Languages known to students at IITG
  1. 2-D Array
  2. Irregular Array
[Figure: the two layouts, with axes labeled Student and Languages]
![Page 10: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/10.jpg)
Special (Arrays ??)
• Diagonal matrix, upper/lower triangular matrix, tridiagonal matrix, symmetric/asymmetric matrices
• We generally deal with 2-D matrices, but 3-D or higher cases are also possible. We generally deal with square matrices, but rectangular (non-square) ones are also possible
• More like functions: an element can be computed from its indices instead of being stored
![Page 11: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/11.jpg)
Special (Arrays ??)
int spec_matrix(int i, int j){
    return no_cols*i + j + 1;
}
• Performance ??
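One way to read the "more like functions" remark and the performance question is sketched below for a diagonal matrix: only the n diagonal entries are stored, and every other element is produced by the accessor. The class name `DiagonalMatrix` and its members are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// An n x n diagonal matrix that keeps only its n diagonal entries.
template <class T>
class DiagonalMatrix {
    std::vector<T> diag_;  // diag_[i] holds element (i, i)

public:
    explicit DiagonalMatrix(std::size_t n) : diag_(n, T()) {}

    std::size_t size() const { return diag_.size(); }

    void set_diagonal(std::size_t i, const T &value) {
        assert(i < diag_.size());
        diag_[i] = value;
    }

    // Off-diagonal elements are never stored; they are "computed" as T().
    T get(std::size_t i, std::size_t j) const {
        assert(i < diag_.size() && j < diag_.size());
        return i == j ? diag_[i] : T();
    }
};
```

Under these assumptions each access still takes O(1) time, while storage drops from n * n elements to n.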
![Page 12: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/12.jpg)
One Dimensional Sparse Array
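[Figure: a 1-D array ary of length 12 and its sparse representation]
Index: 0 1 2 3 4 5 6 7 8 9 10 11
ary:   0 0 0 0 17 0 0 23 14 0 0 0
Sparse ary, as (index, value) pairs: (4, 17), (7, 23), (8, 14)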
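The one-dimensional sparse array above keeps only the nonzero entries (17, 23 and 14 at indices 4, 7 and 8). A minimal sketch of that idea follows; it assumes the pairs are kept sorted by index in a `std::vector` and found by binary search, which is one possible choice and not necessarily the one intended by the slides. The class name `SparseArray1D` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class SparseArray1D {
    typedef std::pair<std::size_t, int> Entry;  // (index, value)
    std::size_t length_;                         // logical length of the array
    std::vector<Entry> entries_;                 // nonzero entries, sorted by index

    static bool index_less(const Entry &e, std::size_t index) {
        return e.first < index;
    }

public:
    explicit SparseArray1D(std::size_t length) : length_(length) {}

    // Store value at index; storing 0 removes the entry again.
    void set(std::size_t index, int value) {
        assert(index < length_);
        auto it = std::lower_bound(entries_.begin(), entries_.end(), index, index_less);
        if (it != entries_.end() && it->first == index) {
            if (value == 0) entries_.erase(it);
            else it->second = value;
        } else if (value != 0) {
            entries_.insert(it, std::make_pair(index, value));
        }
    }

    // Indices without an entry read as 0.
    int get(std::size_t index) const {
        assert(index < length_);
        auto it = std::lower_bound(entries_.begin(), entries_.end(), index, index_less);
        return (it != entries_.end() && it->first == index) ? it->second : 0;
    }

    std::size_t stored() const { return entries_.size(); }
};
```

With this layout a lookup costs O(log k) for k stored entries instead of O(1), which is the price paid for not storing the zeros.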
![Page 13: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/13.jpg)
Two Dimensional Sparse Array
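[Figure: a 6 × 6 sparse matrix (rows and columns indexed 0 to 5) containing the values 8, 12, 33 and 17, shown next to an array of per-row lists holding the pairs (5, 12); (1, 8), (5, 33); (3, 17)]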
Row elements can be accessed efficiently
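A row-oriented layout of this kind could be sketched as follows: one list of (column, value) pairs per row, so all elements of a row are reached without scanning the empty cells. The class name `SparseMatrixRows` is hypothetical, and `std::vector` stands in for the linked lists drawn in the figure.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class SparseMatrixRows {
    typedef std::pair<std::size_t, int> Cell;  // (column, value)
    std::vector<std::vector<Cell> > rows_;      // rows_[r] lists the nonzero cells of row r
    std::size_t cols_;

public:
    SparseMatrixRows(std::size_t rows, std::size_t cols) : rows_(rows), cols_(cols) {}

    void set(std::size_t r, std::size_t c, int value) {
        assert(r < rows_.size() && c < cols_);
        for (Cell &cell : rows_[r]) {
            if (cell.first == c) { cell.second = value; return; }
        }
        rows_[r].push_back(std::make_pair(c, value));
    }

    // Cells that were never set read as 0.
    int get(std::size_t r, std::size_t c) const {
        assert(r < rows_.size() && c < cols_);
        for (const Cell &cell : rows_[r]) {
            if (cell.first == c) return cell.second;
        }
        return 0;
    }

    // All stored cells of one row: the access this layout makes cheap.
    const std::vector<Cell> &row(std::size_t r) const {
        assert(r < rows_.size());
        return rows_[r];
    }
};
```

Reading a whole column in this layout still means visiting every row list, which is presumably what motivates adding a second index over columns.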
![Page 14: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/14.jpg)
Two Dimensional Sparse Array
[Figure: the same 6 × 6 sparse matrix, with the nodes additionally linked from a cols array alongside the rows array]
Efficient row and column elements access
![Page 15: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/15.jpg)
Efficient Representation
[Figure: the same sparse matrix (values 8, 12, 33, 17) with compact rows and cols index structures]
![Page 16: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/16.jpg)
Linked Lists
![Page 17: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/17.jpg)
Why?
• To store heterogeneous data
• To store sparse data
• Flexibility of increase/decrease in size; easy insertion and deletion of elements
• Useful Operations
  • insertion
  • deletion
![Page 18: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/18.jpg)
Logical Arrangement
[Figure: head node (first element) → second element → third element → … → final (tail) node → Null; each node stores the address of the next node]
![Page 19: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/19.jpg)
The Structure
template <class T> class node{
    T data;
    node<T> *next; // Extra (4?) bytes; size of a pointer
};
template <class T> class LinkedList{
    node<T> *head;
    int size; // …..etc etc etc
};
Useful Libraries
#include <list>
![Page 20: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/20.jpg)
View of the Memory
*Struct is stored in contiguous memory
![Page 21: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/21.jpg)
Insertion/Deletion
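Time Complexity:
• Insertion : O(1) / O(n)
• Deletion : O(1) / O(n)
(O(1) at a known position such as the head; O(n) when the position must first be found)
Space Complexity:
• Insertion : O(1)
• Deletion : O(1)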
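The two time bounds can be seen in a small singly linked list sketch: inserting at the head touches one pointer, while deleting a given value first has to walk the list. The member names (`push_front`, `pop_front`, `remove`, `front`) are choices made for this example; the node layout follows the structure shown earlier.

```cpp
#include <cassert>

template <class T>
struct node {
    T data;
    node<T> *next;
};

template <class T>
class LinkedList {
    node<T> *head_;
    int size_;

public:
    LinkedList() : head_(nullptr), size_(0) {}
    LinkedList(const LinkedList &) = delete;             // copying is left out of this sketch
    LinkedList &operator=(const LinkedList &) = delete;
    ~LinkedList() { while (head_) pop_front(); }

    // O(1): the new node simply becomes the head.
    void push_front(const T &value) {
        head_ = new node<T>{value, head_};
        ++size_;
    }

    // O(1): unlink and free the head node.
    void pop_front() {
        assert(head_ != nullptr);
        node<T> *old = head_;
        head_ = head_->next;
        delete old;
        --size_;
    }

    // O(n): walk the list to find the first node holding value, then unlink it.
    bool remove(const T &value) {
        node<T> **link = &head_;  // points at the pointer that leads to the current node
        while (*link && !((*link)->data == value)) link = &(*link)->next;
        if (!*link) return false;
        node<T> *victim = *link;
        *link = victim->next;
        delete victim;
        --size_;
        return true;
    }

    const T &front() const { assert(head_ != nullptr); return head_->data; }
    int size() const { return size_; }
};
```

Both operations allocate or free a single node, which matches the O(1) space bound.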
![Page 22: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/22.jpg)
Tweak some more !
• Doubly Linked Lists
  • Extra (4?) bytes of space vs better accessibility
  • Insertion/deletion ?
• Circular Linked Lists
• How to find the end?
  • Tail pointer
  • Null ‘next’ pointer from last node
  • Last node points to first (circular)
![Page 23: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/23.jpg)
Practice (Linked List)
Linked List 1)
Linked List 2)
![Page 24: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/24.jpg)
Recursion
![Page 25: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/25.jpg)
• To solve a task using that task itself
  • the task should have a recursive nature
  • a task can generally be made recursive by tweaking some parts of it
• Example: the task of piling up n coins vs picking up a suitcase.
• Let the task be a C function. What are the parts of the task?
  1. The input it takes
  2. What it does
  3. The output it gives
![Page 26: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/26.jpg)
• A task is generally performed recursively when a large input can’t be handled directly.
• So, recursion is all about simplifying the input at every step till it becomes trivial (base case)
![Page 27: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/27.jpg)
Implementation – run time stack
• Activation Records (AR)
  • Store the state of a method
    1. input parameters
    2. return values
    3. local variables
    4. return addresses
![Page 28: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/28.jpg)
[Figure: snapshots of the run-time stack while evaluating power(5.6, 2). Activation records are pushed for power(5.6, 2), power(5.6, 1) and power(5.6, 0); the calls then return 1.0, 5.6 and 31.36 in turn as their records are popped]
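The call sequence power(5.6, 2), power(5.6, 1), power(5.6, 0) traced on this slide is consistent with a function along the following lines. The exact signature is an assumption; the slides only show the calls and their values.

```cpp
#include <cassert>

// Computes x raised to the power n for n >= 0.
// Each call gets its own activation record holding x, n and the return
// address; the records are popped in reverse order as the calls return.
double power(double x, int n) {
    assert(n >= 0);
    if (n == 0) return 1.0;      // base case: the input has become trivial
    return x * power(x, n - 1);  // simplify the input by one step
}
```

The recursion depth equals n, so the run-time stack holds n + 1 activation records at its deepest point.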
![Page 29: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/29.jpg)
• An AR is formed on the run-time stack and is private to a method.
• There is only one run-time stack.
[Figure: the run-time stack with the stack pointer marked at successive positions]
![Page 30: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/30.jpg)
Advantages/Disadvantages
1. More readable/understandable/consistent with the definition
2. Memory requirements increase due to the run-time stack
3. Difficult to trace and debug
![Page 31: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/31.jpg)
Types of Recursion
• Tail (vs loop?)
  int factn = 1;
  while (n > 0) factn *= n--;
• Indirect
  • A() -> B() -> C() -> A()
• Nested
  • h(n) = h(2 + h(n-1))
![Page 32: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/32.jpg)
Types of Recursion
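• Excessive: exponential time complexity!

$$
Fib(n) = \begin{cases} 0 & \text{if } n = 0 \\ 1 & \text{if } n = 1 \\ Fib(n-1) + Fib(n-2) & \text{if } n \ge 2 \end{cases}
$$

• Questionable: will it terminate??

$$
f(n) = \begin{cases} 1 & \text{if } n = 1 \\ f(n/2) & \text{if } n \text{ is even} \\ f(3 \cdot n + 1) & \text{otherwise} \end{cases}
$$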
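To make the "excessive" case concrete, the sketch below contrasts a direct recursive Fibonacci, which recomputes the same subproblems and takes exponential time, with a loop that takes linear time. The function names are made up for this example.

```cpp
#include <cassert>

// Direct translation of the recursive definition: the two recursive calls
// overlap heavily, so the number of calls grows exponentially with n.
unsigned long long fib_naive(int n) {
    assert(n >= 0);
    if (n < 2) return static_cast<unsigned long long>(n);
    return fib_naive(n - 1) + fib_naive(n - 2);
}

// Same values computed bottom-up in O(n) time and O(1) space.
// The result fits in 64 bits only up to n = 93.
unsigned long long fib_linear(int n) {
    assert(n >= 0 && n <= 93);
    unsigned long long a = 0, b = 1;  // a = Fib(i), b = Fib(i + 1)
    for (int i = 0; i < n; ++i) {
        unsigned long long next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

Calling `fib_naive` with n around 50 already takes a noticeable amount of time on typical hardware, while `fib_linear` returns immediately.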
![Page 33: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/33.jpg)
Hashes
![Page 34: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/34.jpg)
Why?
• Want to store dictionaries?, associative arrays?
  • arrays with non-numerical indices
• String operations made easy
  • Ex: Finding anagrams
  • Ex: Counting frequency of words in a string
![Page 35: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/35.jpg)
Associative Arrays
• (key, value) pairs where the key is not necessarily a non-negative integer; it can be a string etc.
• Ex: no. of students in each department
  • “cse” => 68
  • “eee” => 120
  • “mech” => 70
  • “biotech” => 30
• Do not allow duplicate keys
  • Dict(“cse”) = “data structures”
  • Dict(“cse”) = “algorithms”
  • Instead, store a set of values under the one key: Dict(“cse”) = {“data structures”, “algorithms”}
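A minimal sketch of an associative array that keeps several values under one key, plus the word-frequency example mentioned earlier, might look like this. It uses `std::map`, which is an ordered tree rather than a hash table; `std::unordered_map` would be the hash-based alternative. The names `MultiDict`, `add_value` and `word_frequencies` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One key maps to a list of values, so a repeated key extends the list
// instead of creating a duplicate entry.
typedef std::map<std::string, std::vector<std::string> > MultiDict;

void add_value(MultiDict &dict, const std::string &key, const std::string &value) {
    assert(!key.empty());  // this sketch treats an empty key as a caller error
    dict[key].push_back(value);
}

// Count how often each whitespace-separated word occurs in text.
std::map<std::string, int> word_frequencies(const std::string &text) {
    std::istringstream in(text);
    std::map<std::string, int> freq;
    std::string word;
    while (in >> word) ++freq[word];  // operator[] starts a missing key at 0
    return freq;
}
```

After adding “data structures” and “algorithms” under “cse”, the dictionary holds a single key with two values.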
![Page 36: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/36.jpg)
Hash Functions
1. HashTable : an array of fixed size
  • TableSize - preferably prime and large
2. Hash function (maps a key to an index of the HashTable)
Techniques
  • use all characters
  • use aggregate properties - length, frequencies of characters
  • first 3 characters, odd characters
Evaluation
  • Uniform distribution; load factor λ? (λ = number of stored keys / TableSize)
  • Utilize table space
  • Quickly computable
![Page 37: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/37.jpg)
3. Collision resolution
  1. separate chaining
    • Linked list at each index
    • Insertion (at head?)
    • Desired length of a chain : close to λ
    • Avg. time for successful search = 1 + 1 + λ/2
    • Disadvantages
      • slow?
      • different data structures - array/linked lists?
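Separate chaining can be sketched as an array of linked lists, with new keys inserted at the head of their chain. The class name `ChainedHashTable`, the default table size 101 and the multiplier 31 in the hash function are all assumptions chosen for illustration; the hash uses every character of the key, as one of the listed techniques suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedHashTable {
    typedef std::pair<std::string, int> KeyValue;
    std::vector<std::list<KeyValue> > buckets_;  // one chain per table index
    std::size_t count_;

    // Polynomial hash over all characters, reduced to a table index.
    std::size_t index_of(const std::string &key) const {
        unsigned long h = 0;
        for (unsigned char c : key) h = h * 31 + c;
        return h % buckets_.size();
    }

public:
    explicit ChainedHashTable(std::size_t table_size = 101)
        : buckets_(table_size), count_(0) {
        assert(table_size > 0);
    }

    // Update the value if the key exists, otherwise insert at the chain head.
    void put(const std::string &key, int value) {
        std::list<KeyValue> &chain = buckets_[index_of(key)];
        for (KeyValue &kv : chain) {
            if (kv.first == key) { kv.second = value; return; }
        }
        chain.emplace_front(key, value);
        ++count_;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int *find(const std::string &key) const {
        const std::list<KeyValue> &chain = buckets_[index_of(key)];
        for (const KeyValue &kv : chain) {
            if (kv.first == key) return &kv.second;
        }
        return nullptr;
    }

    std::size_t size() const { return count_; }
    double load_factor() const { return static_cast<double>(count_) / buckets_.size(); }
};
```

With a reasonably uniform hash, each chain has about λ entries on average, so the cost of a search grows with the load factor rather than with the total number of keys.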
![Page 38: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/38.jpg)
  2. open addressing
    • Single table
    • Desired λ ~ 0.5
    • Apply h0(x), h1(x), h2(x) …
    • hi(x) = (h(x) + f(i)) mod TableSize; f(0) = 0
3 ways to do it
  1. linear probing : f(i) is linear in i
    • f(i) = i (quickly computable vs primary clustering?)
  2. quadratic probing : f(i) is quadratic in i
  3. double hashing
    • f(i) = i.h2(x), where h2 is a second hash function; i.e. hi(x) = h(x) + i.h2(x)
Rehashing
  • What if the table gets full (70%, …. , 100%)
  • Create a new HashTable double? the size
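Linear probing with rehashing could be sketched as below for integer keys. The class name `LinearProbingSet` is hypothetical, the table is rehashed as soon as an insertion would push λ above 0.5, and the new size is simply 2n + 1 rather than the next prime, which is a simplification. Deletion is left out because it needs extra care (for example tombstone markers) in open addressing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class LinearProbingSet {
    std::vector<int> slots_;
    std::vector<bool> used_;
    std::size_t count_;

    // h_i(x) = (h(x) + f(i)) mod TableSize with f(i) = i.
    std::size_t probe(int key, std::size_t i) const {
        std::size_t h = static_cast<std::size_t>(static_cast<unsigned>(key));
        return (h % slots_.size() + i) % slots_.size();
    }

    // Store a key known to be absent. A free slot always exists because
    // the table is kept at most half full.
    void place(int key) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::size_t idx = probe(key, i);
            if (!used_[idx]) {
                slots_[idx] = key;
                used_[idx] = true;
                ++count_;
                return;
            }
        }
        assert(false && "table unexpectedly full");
    }

    // Move every key into a table of roughly double the size.
    void rehash() {
        std::vector<int> old_slots(2 * slots_.size() + 1);
        std::vector<bool> old_used(old_slots.size(), false);
        old_slots.swap(slots_);
        old_used.swap(used_);
        count_ = 0;
        for (std::size_t i = 0; i < old_slots.size(); ++i) {
            if (old_used[i]) place(old_slots[i]);
        }
    }

public:
    explicit LinearProbingSet(std::size_t table_size = 7)
        : slots_(table_size), used_(table_size, false), count_(0) {
        assert(table_size > 0);
    }

    bool contains(int key) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::size_t idx = probe(key, i);
            if (!used_[idx]) return false;  // an empty slot ends the probe sequence
            if (slots_[idx] == key) return true;
        }
        return false;
    }

    // Returns false if the key was already present.
    bool insert(int key) {
        if (contains(key)) return false;
        if ((count_ + 1) * 2 > slots_.size()) rehash();
        place(key);
        return true;
    }

    std::size_t table_size() const { return slots_.size(); }
};
```

Keys such as 10, 17 and 24 all hash to index 3 in a table of size 7 and end up in neighbouring slots, which is the primary clustering the slide alludes to.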
![Page 39: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/39.jpg)
Structure
template<class T> class Hash{
    int TableSize;
    T *arr;
};
Useful Libraries
#include <unordered_map>
![Page 40: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/40.jpg)
Practice (Hashes)
Trie 7)
![Page 41: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/41.jpg)
Graphs
![Page 42: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/42.jpg)
What is it?
In simple words, G = (V, E)
V = {v0, v1, v2, v3, .. vn} is the set of nodes
E = {e0, e1, e2, e3 .. em} is the set of edges
* Any tree is also a graph T = (V, E), so most techniques in graph algorithms apply to trees as well.
[Figure: example graph with nodes v0, v1, v2 and v3]
![Page 43: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/43.jpg)
Representation
1. Adjacency Matrix (|V| * |V|)
2. Adjacency List
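Both representations can be built from a list of edges, as in the following sketch for an undirected graph whose nodes are numbered 0 to n - 1. The function names and the `Edge` typedef are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<int, int> Edge;  // (u, v), nodes numbered from 0

// |V| x |V| table: matrix[u][v] is true when u and v are adjacent.
std::vector<std::vector<bool> > adjacency_matrix(int n, const std::vector<Edge> &edges) {
    assert(n >= 0);
    std::vector<std::vector<bool> > matrix(n, std::vector<bool>(n, false));
    for (const Edge &e : edges) {
        assert(e.first >= 0 && e.first < n && e.second >= 0 && e.second < n);
        matrix[e.first][e.second] = true;
        matrix[e.second][e.first] = true;  // undirected: store both directions
    }
    return matrix;
}

// One list of neighbours per node; total size is proportional to |V| + |E|.
std::vector<std::vector<int> > adjacency_list(int n, const std::vector<Edge> &edges) {
    assert(n >= 0);
    std::vector<std::vector<int> > adj(n);
    for (const Edge &e : edges) {
        assert(e.first >= 0 && e.first < n && e.second >= 0 && e.second < n);
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    return adj;
}
```

The matrix answers "are u and v adjacent?" in O(1) but always uses |V| * |V| cells, while the list is more compact for sparse graphs.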
![Page 44: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/44.jpg)
Breadth First Traversal (BFT)
• Traverse the nodes level by level; nodes at depth 0 before nodes at depth 1 before nodes at depth 2 ....
• Done using a queue
• Ex: 1,2,3,4,5,7,8,6
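A queue-based breadth first traversal over an adjacency list might look like the sketch below. The function name `bfs_order` and the choice to return the visiting order as a vector are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Returns the nodes reachable from start in breadth-first order.
// adj[v] lists the neighbours of node v.
std::vector<int> bfs_order(const std::vector<std::vector<int> > &adj, int start) {
    assert(start >= 0 && static_cast<std::size_t>(start) < adj.size());
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::queue<int> pending;

    visited[start] = true;
    pending.push(start);
    while (!pending.empty()) {
        int v = pending.front();  // oldest discovered node comes out first
        pending.pop();
        order.push_back(v);
        for (int w : adj[v]) {
            if (!visited[w]) {
                visited[w] = true;  // mark on discovery so each node is queued once
                pending.push(w);
            }
        }
    }
    return order;
}
```

Because the queue is first-in first-out, every node at depth d leaves the queue before any node at depth d + 1.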
![Page 45: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/45.jpg)
Depth First Traversal (DFT)
• Move to the next child only after all nodes reachable through the current child are marked
• Done using a stack
• Ex: a, b, c, d, e, h, f, g
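The stack-based version can be sketched as follows. The function name `dfs_order` is hypothetical, and neighbours are pushed in reverse so that the first listed neighbour is explored first, which is a convention chosen here to mimic the usual recursive order.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// Returns the nodes reachable from start in depth-first order.
// adj[v] lists the neighbours of node v.
std::vector<int> dfs_order(const std::vector<std::vector<int> > &adj, int start) {
    assert(start >= 0 && static_cast<std::size_t>(start) < adj.size());
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::stack<int> pending;

    pending.push(start);
    while (!pending.empty()) {
        int v = pending.top();  // most recently discovered node comes out first
        pending.pop();
        if (visited[v]) continue;  // a node may have been pushed more than once
        visited[v] = true;
        order.push_back(v);
        // Push neighbours in reverse so the first neighbour ends up on top.
        for (auto it = adj[v].rbegin(); it != adj[v].rend(); ++it) {
            if (!visited[*it]) pending.push(*it);
        }
    }
    return order;
}
```

The only structural difference from the breadth first version is the container: last-in first-out instead of first-in first-out.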
![Page 46: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/46.jpg)
Trees (Advanced)
![Page 47: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/47.jpg)
Trie (from "retrieval")
• Stores the prefixes of a set of strings in an efficient manner
• Used to store associative arrays/dictionaries
![Page 48: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/48.jpg)
How to create a Trie
• Ex: tin, ten, ted, tea, to, i, in, inn
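A trie for words like these can be sketched with one node per prefix and one child slot per letter. The class below assumes lowercase letters a to z only; the names `Trie`, `insert`, `contains` and `has_prefix` are choices made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

class Trie {
    struct Node {
        bool is_word = false;                         // true if a stored word ends here
        std::array<std::unique_ptr<Node>, 26> child;  // one slot per letter a..z
    };
    Node root_;

    static std::size_t slot(char c) {
        assert(c >= 'a' && c <= 'z');  // this sketch only handles lowercase letters
        return static_cast<std::size_t>(c - 'a');
    }

    // Follow s from the root; nullptr if the path does not exist.
    const Node *walk(const std::string &s) const {
        const Node *node = &root_;
        for (char c : s) {
            node = node->child[slot(c)].get();
            if (!node) return nullptr;
        }
        return node;
    }

public:
    // Words with a common prefix share the nodes of that prefix.
    void insert(const std::string &word) {
        Node *node = &root_;
        for (char c : word) {
            std::unique_ptr<Node> &next = node->child[slot(c)];
            if (!next) next.reset(new Node());
            node = next.get();
        }
        node->is_word = true;
    }

    bool contains(const std::string &word) const {
        const Node *node = walk(word);
        return node != nullptr && node->is_word;
    }

    bool has_prefix(const std::string &prefix) const {
        return walk(prefix) != nullptr;
    }
};
```

After inserting the example words, "in" is found as a word, while "te" is only a prefix shared by ten, ted and tea.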
![Page 49: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/49.jpg)
Pairs of anagrams
• Sort the characters of each string
  • acute -> acetu
  • obtuse -> beostu … etc
• Insert them into the trie
  • Keep storing collisions i.e. multiple values for each key
  • Each set of values gives a group of anagrams
![Page 50: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/50.jpg)
Suffix Tree/Patricia/Radix Tree
• Stores the suffixes of a string
• O(n) space and time to build
• Does not exist for all strings (a suffix that is a prefix of another suffix gets no leaf of its own); add a special symbol $ at the end
![Page 51: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/51.jpg)
Advantages of Suffix Trees
• Store n suffixes in O(n) space.
• Improved string operations, e.g. substring lookup, longest common substring (generalized suffix trees?)
Generalized Suffix Trees
• Each string terminated by a different special symbol
• More space efficient
• Have a different set of algorithms
![Page 52: Data structures](https://reader033.vdocument.in/reader033/viewer/2022061214/54990a80b47959525b8b4767/html5/thumbnails/52.jpg)
Longest Common Substring
1. Make a “generalized suffix tree” for the (2?) strings
2. Traverse the tree to mark every internal node as 1, 2 or (1,2), depending on whether it has below it a leaf terminating with the special symbol of string 1, of string 2, or of both.
3. Find the deepest internal node marked (1,2)
Pattern Matching ?