![Page 1: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/1.jpg)
Indexed Search Tree (Trie)
Nelson Padua-Perez
Chau-Wen Tseng
Department of Computer Science
University of Maryland, College Park
![Page 2: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/2.jpg)
Indexed Search Tree (Trie)
Special case of tree
Applicable when
Key C can be decomposed into a sequence of subkeys C1, C2, …, Cn
Redundancy exists between subkeys
Approach
Store subkey at each node
Path through trie yields full key
Example: Huffman tree
[Figure: tree whose nodes are labeled with subkeys C1, C2, C3, C4]
![Page 3: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/3.jpg)
Tries
Useful for searching strings
String decomposes into sequence of letters
Example
“ART” → “A” “R” “T”
Can be very fast: less overhead than hashing
May reduce memory: exploiting redundancy
May require more memory: explicitly storing substrings
[Figure: trie with nodes labeled S, A, R, T, E, with the path for “ART” marked]
![Page 4: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/4.jpg)
Types of Tries
Standard: single character per node
Compressed: eliminating chains of nodes
Compact: stores indices into original string(s)
Suffix: stores all suffixes of string
![Page 5: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/5.jpg)
Standard Tries
Approach
Each node (except root) is labeled with a character
Children of node are ordered (alphabetically)
Paths from root to leaves yield all input strings
Trie for Morse Code
![Page 6: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/6.jpg)
Standard Trie Example
For strings { a, an, and, any, at }
![Page 7: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/7.jpg)
Standard Trie Example
For strings { bear, bell, bid, bull, buy, sell, stock, stop }
[Figure: standard trie for these strings; each root-to-leaf path spells one of the words, one character per node]
![Page 8: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/8.jpg)
Standard Tries
Node structure
Value between 1…m
Reference to m children
Array or linked list
Example
class Node {
    Letter value;                // a letter from the alphabet V = { V1, V2, …, Vm }
    Node[] child = new Node[m];  // m = size of the alphabet
}
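The node structure above can be turned into a small working trie. The following is a minimal sketch, not the course's implementation: it assumes lowercase keys over 'a'..'z' (so m = 26), keeps the letter implicit in the child array index instead of storing it in the node, and adds a hypothetical `endOfWord` flag so that a string that is a prefix of another (such as "a" and "an") can still be found. The names `StandardTrie`, `insert` and `contains` are illustrative.

```java
import java.util.List;

// Sketch of a standard trie over the lowercase letters 'a'..'z' (m = 26).
class StandardTrie {
    static final int M = 26;                 // alphabet size (assumption)

    static final class Node {
        final Node[] child = new Node[M];    // slot i holds the child for letter 'a' + i
        boolean endOfWord;                   // assumption: marks that a stored string ends here
    }

    private final Node root = new Node();    // the root carries no letter

    // Walks down one level per character, creating missing nodes.
    // Assumes every character of word is in 'a'..'z'.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            int i = c - 'a';
            if (node.child[i] == null) {
                node.child[i] = new Node();
            }
            node = node.child[i];
        }
        node.endOfWord = true;
    }

    // Returns true only if exactly this word was inserted earlier.
    boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            int i = c - 'a';
            if (i < 0 || i >= M || node.child[i] == null) {
                return false;
            }
            node = node.child[i];
        }
        return node.endOfWord;
    }

    public static void main(String[] args) {
        StandardTrie trie = new StandardTrie();
        for (String w : List.of("a", "an", "and", "any", "at")) {
            trie.insert(w);
        }
        System.out.println(trie.contains("and"));  // true
        System.out.println(trie.contains("ant"));  // false
    }
}
```

With an array of children each step down the trie takes constant time, at the cost of m references per node. A linked list of children saves space but may scan up to m siblings at each level.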
![Page 9: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/9.jpg)
Standard Tries
Efficiency
Uses O(n) space
Supports search / insert / delete in O(dm) time
Where
n = total size of strings indexed by trie
d = length of the parameter string
m = size of the alphabet
![Page 10: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/10.jpg)
Word Matching Trie
Insert words into trie
Each leaf stores occurrences of word in the text
[Figure: the text below with character positions 0–88, and its word matching trie; each leaf lists the starting positions of its word]
Text: see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!
Occurrences: bear 6; bell 78; bid 47, 58; bull 30; buy 36; hear 69; see 0, 24; sell 12; stock 17, 40, 51, 62; stop 84
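A word matching trie can be sketched as a standard trie whose word-ending nodes carry a list of starting positions. The example below is an illustration under assumptions: the class and method names are hypothetical, children are kept in a `TreeMap` so they stay ordered, and only words registered with `insert` are indexed, so short words such as "a" and "the" are skipped when they were not inserted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a word matching trie: word-ending nodes store occurrence positions.
class WordMatchingTrie {
    static final class Node {
        final Map<Character, Node> child = new TreeMap<>();  // ordered children
        List<Integer> occurrences;   // non-null only where an indexed word ends
    }

    private final Node root = new Node();

    // Registers a word to be indexed.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.child.computeIfAbsent(c, k -> new Node());
        }
        if (node.occurrences == null) {
            node.occurrences = new ArrayList<>();
        }
    }

    // Scans the text once and records the start position of each indexed word.
    void index(String text) {
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            Node node = root;
            while (i < text.length() && Character.isLetter(text.charAt(i))) {
                if (node != null) {
                    node = node.child.get(text.charAt(i));  // null once the word leaves the trie
                }
                i++;
            }
            if (node != null && node.occurrences != null) {
                node.occurrences.add(start);
            }
        }
    }

    // Returns the recorded start positions, or an empty list for unknown words.
    List<Integer> occurrencesOf(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.child.get(c);
            if (node == null) {
                return List.of();
            }
        }
        return node.occurrences == null ? List.of() : node.occurrences;
    }

    public static void main(String[] args) {
        WordMatchingTrie trie = new WordMatchingTrie();
        for (String w : List.of("bear", "bell", "bid", "bull", "buy",
                                "hear", "see", "sell", "stock", "stop")) {
            trie.insert(w);
        }
        trie.index("see a bear? sell stock! see a bull? buy stock! "
                 + "bid stock! bid stock! hear the bell? stop!");
        System.out.println(trie.occurrencesOf("stock"));  // [17, 40, 51, 62]
        System.out.println(trie.occurrencesOf("see"));    // [0, 24]
    }
}
```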
![Page 11: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/11.jpg)
Compressed Trie
Observation
Internal node v of T is redundant if v has one child and is not the root
Approach
A chain of redundant nodes can be compressed
Replace chain with single node
Include concatenation of labels from chain
Result
Internal nodes have at least 2 children
Some nodes have multiple characters
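The compression step can be sketched as a pass over a standard trie that absorbs a node's only child into the node's label until the node has zero or several children. This is one possible way to do it, under assumptions: all names are hypothetical, and an `endOfWord` flag (not on the slides) stops the merge at nodes where a stored string ends, so no key is lost when one string is a prefix of another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: build a standard trie, then compress chains of single-child nodes.
class CompressedTrie {
    static final class Node {
        String label;          // one character before compression, possibly several after
        boolean endOfWord;     // assumption: marks the end of a stored string
        final Map<Character, Node> child = new TreeMap<>();  // keyed by first character of child label
        Node(String label) { this.label = label; }
    }

    final Node root = new Node("");

    // Standard-trie insertion; call before compress().
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.child.computeIfAbsent(c, k -> new Node(String.valueOf(k)));
        }
        node.endOfWord = true;
    }

    // The root is never merged; every other node is checked for redundancy.
    void compress() {
        for (Node c : root.child.values()) {
            compress(c);
        }
    }

    private static void compress(Node node) {
        // While the node is redundant, absorb its only child into its label.
        while (node.child.size() == 1 && !node.endOfWord) {
            Node only = node.child.values().iterator().next();
            node.label += only.label;
            node.endOfWord = only.endOfWord;
            node.child.clear();
            node.child.putAll(only.child);
        }
        for (Node c : node.child.values()) {
            compress(c);
        }
    }

    // Node labels in preorder, root excluded.
    List<String> labels() {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        for (Node c : node.child.values()) {
            out.add(c.label);
            collect(c, out);
        }
    }

    public static void main(String[] args) {
        CompressedTrie trie = new CompressedTrie();
        for (String w : List.of("bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop")) {
            trie.insert(w);
        }
        trie.compress();
        System.out.println(trie.labels());
        // [b, e, ar, ll, id, u, ll, y, s, ell, to, ck, p]
    }
}
```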
![Page 12: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/12.jpg)
Compressed Trie
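[Figure: standard trie for { bear, bell, bid, bull, buy, sell, stock, stop } and the equivalent compressed trie, whose nodes are labeled b, e, ar, ll, id, u, ll, y, s, ell, to, ck, p]
Nodes other than the root that have only one child need to be compressed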
![Page 13: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/13.jpg)
Compact Tries
Compact representation of a compressed trie
Approach
For an array of strings S = S[0], …, S[s-1]
Store ranges of indices at each node
Instead of substring
Represent as a triplet of integers (i, j, k)
Such that X = S[i][j..k]
Example: S[0] = “abcd”, (0,1,2) = “bc”
Properties
Uses O(s) space, where s = # of strings in the array
Serves as an auxiliary index structure
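A node label in a compact trie is just three integers, and the substring is recovered from the string array on demand. The small sketch below illustrates this; the class name `Triplet` and method `resolve` are hypothetical, and k is treated as inclusive, as in the example (0,1,2) = “bc”.

```java
// Sketch of a compact-trie node label: the triplet (i, j, k) stands for S[i][j..k].
class Triplet {
    final int i;   // index of the string in the array S
    final int j;   // first character position (inclusive)
    final int k;   // last character position (inclusive)

    Triplet(int i, int j, int k) {
        this.i = i;
        this.j = j;
        this.k = k;
    }

    // Recovers the label text; substring's end index is exclusive, hence k + 1.
    String resolve(String[] s) {
        return s[i].substring(j, k + 1);
    }

    public static void main(String[] args) {
        String[] s = { "abcd" };
        System.out.println(new Triplet(0, 1, 2).resolve(s));  // bc
    }
}
```

Because each node stores a constant number of integers regardless of label length, the trie itself stays small while the strings live in the separate array.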
![Page 14: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/14.jpg)
Compact Representation
Example
S[0] = “see”, S[1] = “bear”, S[2] = “sell”, S[3] = “stock”, S[4] = “bull”
S[5] = “buy”, S[6] = “bid”, S[7] = “hear”, S[8] = “bell”, S[9] = “stop”
[Figure: compressed trie for S and its compact representation, in which each node label is replaced by a triplet (i, j, k)]
Node triplets: (1,0,0) = b; (1,1,1) = e; (1,2,3) = ar; (8,2,3) = ll; (6,1,2) = id; (4,1,1) = u; (4,2,3) = ll; (5,2,2) = y; (7,0,3) = hear; (0,0,0) = s; (0,1,1) = e; (0,2,2) = e; (2,2,3) = ll; (3,1,2) = to; (3,3,4) = ck; (9,3,3) = p
![Page 15: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/15.jpg)
Suffix Trie
Compressed trie of all suffixes of text
Example: “IPDPS”
Suffixes
IPDPS
PDPS
DPS
PS
S
Useful for finding pattern in any part of text
Each occurrence is a prefix of some suffix
Example: find PDP in IPDPS
[Figure: suffix trie for “IPDPS”]
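The search idea (a pattern occurs in the text exactly when it is a prefix of some suffix) can be sketched with a plain, uncompressed suffix trie. This is a simplification chosen for brevity, not the compressed structure described on the slide: it inserts every suffix character by character, which takes quadratic space in the text length. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: uncompressed suffix trie used for substring search.
class SuffixTrie {
    static final class Node {
        final Map<Character, Node> child = new HashMap<>();
    }

    private final Node root = new Node();

    // Inserts every suffix text[start..] into the trie.
    SuffixTrie(String text) {
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.child.computeIfAbsent(text.charAt(i), k -> new Node());
            }
        }
    }

    // A pattern occurs in the text iff it can be read along a path from the root.
    boolean containsPattern(String pattern) {
        Node node = root;
        for (char c : pattern.toCharArray()) {
            node = node.child.get(c);
            if (node == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SuffixTrie trie = new SuffixTrie("IPDPS");
        System.out.println(trie.containsPattern("PDP"));  // true (prefix of the suffix PDPS)
        System.out.println(trie.containsPattern("PSD"));  // false
    }
}
```

Once the trie is built, a search costs time proportional to the pattern length, independent of the text length.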
![Page 16: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/16.jpg)
Suffix Trie Example
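[Figure: suffix trie for “minimize” (positions 0–7) and its compact representation, in which each node stores an index range (j, k) into the text, e.g. (7,7) = e, (2,7) = nimize, (6,7) = ze, (4,7) = mize, (1,1) = i, (0,1) = mi]
Suffixes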
minimize
inimize
nimize
imize
mize
ize
ze
e
![Page 17: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/17.jpg)
Encoding Trie (1)
Encoding means representing the alphabet in binary
An encoding trie represents a prefix code
Each leaf stores a character
The code word of a character is given by the path from the root to the leaf storing the character (0 for a left child and 1 for a right child)
Character: a b c d e
Code word: 00 010 011 10 11
[Figure: binary encoding trie with leaves a, b, c, d, e and edges labeled 0 (left) and 1 (right)]
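Decoding with an encoding trie amounts to following 0/1 edges from the root and emitting a character each time a leaf is reached. The sketch below rebuilds the trie from the code table a = 00, b = 010, c = 011, d = 10, e = 11; the class and method names are hypothetical, and the input is assumed to be a string of '0' and '1' characters.

```java
import java.util.Map;

// Sketch: binary encoding trie built from a prefix code, used for decoding.
class EncodingTrie {
    static final class Node {
        Node zero;     // left child, reached on bit 0
        Node one;      // right child, reached on bit 1
        char symbol;   // meaningful only at leaves
        boolean isLeaf() { return zero == null && one == null; }
    }

    private final Node root = new Node();

    // Creates one root-to-leaf path per code word.
    EncodingTrie(Map<Character, String> code) {
        for (Map.Entry<Character, String> e : code.entrySet()) {
            Node node = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (node.zero == null) node.zero = new Node();
                    node = node.zero;
                } else {
                    if (node.one == null) node.one = new Node();
                    node = node.one;
                }
            }
            node.symbol = e.getKey();
        }
    }

    // Follows edges bit by bit; every leaf reached yields one character.
    String decode(String bits) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (char bit : bits.toCharArray()) {
            node = (bit == '0') ? node.zero : node.one;
            if (node == null) {
                throw new IllegalArgumentException("bit sequence is not in the code");
            }
            if (node.isLeaf()) {
                out.append(node.symbol);
                node = root;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        EncodingTrie trie = new EncodingTrie(
            Map.of('a', "00", 'b', "010", 'c', "011", 'd', "10", 'e', "11"));
        System.out.println(trie.decode("0001011"));  // abe
    }
}
```

Because no code word is a prefix of another, the decoder never needs separators between code words.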
![Page 18: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/18.jpg)
Encoding Trie (2)
Given a text string X, we want to find a prefix code for the characters of X that yields a small encoding for X
Frequent characters should have short code-words
Rare characters should have long code-words
Example
X = abracadabra
T1 encodes X into 29 bits: 010 11 011 010 00 010 10 010 11 011 010
T2 encodes X into 24 bits: 00 10 11 00 010 00 011 00 10 11 00
Code words of T1
a b c d r
010 11 00 10 011
Code words of T2
a b c d r
00 10 010 011 11
[Figure: encoding tries T1 and T2, with edges labeled 0 (left) and 1 (right)]
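The bit counts for T1 and T2 can be checked by encoding X with each code table and measuring the result. A minimal sketch, with hypothetical names, taking the two tables from the encodings shown in the example:

```java
import java.util.Map;

// Sketch: compare the encoded size of X under two prefix codes.
class PrefixCodeLength {
    // Concatenates the code word of every character of the text.
    static String encode(String text, Map<Character, String> code) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            bits.append(code.get(c));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String x = "abracadabra";
        // T1 gives the most frequent character 'a' a 3-bit code word.
        Map<Character, String> t1 =
            Map.of('a', "010", 'b', "11", 'c', "00", 'd', "10", 'r', "011");
        // T2 gives 'a' a 2-bit code word and the rare 'c' and 'd' 3 bits each.
        Map<Character, String> t2 =
            Map.of('a', "00", 'b', "10", 'c', "010", 'd', "011", 'r', "11");
        System.out.println(encode(x, t1).length());  // 29
        System.out.println(encode(x, t2).length());  // 24
    }
}
```

The saving comes entirely from 'a', which occurs 5 times: shortening its code word by one bit saves 5 bits.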
![Page 19: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/19.jpg)
Huffman’s Tree
Given a string X, Huffman’s algorithm constructs a prefix code that minimizes the size of the encoding of X
X = abracadabra
Frequencies
a b c d r
5 2 1 1 2
[Figure: construction of the Huffman tree; subtrees are merged in turn with weights 2 (c + d), 4 (b + cd), 6 (r + bcd) and 11 (a + rbcd, the root)]
Encoding of X: 0 110 10 0 1110 0 1111 0 110 10 0 (23 bits)
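Huffman's algorithm can be sketched with a priority queue: repeatedly remove the two lightest subtrees and merge them until one tree remains. The code below is an illustrative version with hypothetical names. It assumes a non-empty input, and because ties between equal weights may be broken differently, the individual code words can differ from the tree on the slide while the total encoded length stays the same.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch of Huffman's algorithm using a min-priority queue of subtrees.
class Huffman {
    static final class Node {
        final int weight;
        final char symbol;   // meaningful only at leaves
        final Node left;
        final Node right;

        Node(char symbol, int weight) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {
            this.symbol = '\0';
            this.weight = left.weight + right.weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null; }
    }

    // Builds a prefix code for the characters of a non-empty text.
    static Map<Character, String> buildCode(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }
        PriorityQueue<Node> queue =
            new PriorityQueue<>((x, y) -> Integer.compare(x.weight, y.weight));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue()));
        }
        while (queue.size() > 1) {          // merge the two lightest subtrees
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node(a, b));
        }
        Map<Character, String> code = new HashMap<>();
        assign(queue.poll(), "", code);
        return code;
    }

    // Left edges append 0, right edges append 1.
    private static void assign(Node node, String prefix, Map<Character, String> code) {
        if (node.isLeaf()) {
            code.put(node.symbol, prefix.isEmpty() ? "0" : prefix);  // single-symbol text
            return;
        }
        assign(node.left, prefix + "0", code);
        assign(node.right, prefix + "1", code);
    }

    // Number of bits needed to encode the text with its Huffman code.
    static int encodedLength(String text) {
        Map<Character, String> code = buildCode(text);
        int bits = 0;
        for (char c : text.toCharArray()) {
            bits += code.get(c).length();
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("abracadabra"));  // 23
    }
}
```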
![Page 20: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/20.jpg)
Tries and Web Search Engines
Search engine index
Collection of all searchable words
Stored in compressed trie
Each leaf of trie
Associated with a word
List of pages (URLs) containing that word
Called occurrence list
Trie is kept in memory (fast)
Occurrence lists kept in external memory
Ranked by relevance
![Page 21: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/21.jpg)
Computational Biology
DNA
Sequence of 4 different nucleotides (ATCG)
Portions of DNA sequence produce proteins (genes)
Genome
Master DNA sequence for organism
For human
46 chromosomes
3 billion nucleotides
![Page 22: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/22.jpg)
![Page 23: Indexed Search Tree (Trie)](https://reader034.vdocument.in/reader034/viewer/2022051821/56814e99550346895dbc42b7/html5/thumbnails/23.jpg)
Tries and Computational Biology
ESTs
Fragments of expressed DNA
Indicator for genes (& location)
5.5 million sequences at NIH
ESTmapper
Build suffix trie of genome
8 hours, 60 Gbytes
Search for ESTs in suffix trie
11 hours w/ 8 processor Sun
Search genome w/ BLAST: 5+ years (predicted)
[Figure: diagram labeled Genome, ESTs, Suffix tree, Mapping, Gene]