Comp 272 Notes




7/23/2019 Comp 272 Notes

http://slidepdf.com/reader/full/comp-272-notes 1/26

Unit 1: Introduction

• Explain the concept of data structures.
In computer science, a data structure is a particular way of organizing data in a computer so that it can be used efficiently. Data structures can implement one or more particular abstract data types (ADTs), which are the means of specifying the contract of operations and their complexity. In comparison, a data structure is a concrete implementation of the contract provided by an ADT.

Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, databases use B-tree indexes for small percentages of data retrieval, and compilers and databases use dynamic hash tables as look-up tables.

Data structures provide a means to manage large amounts of data efficiently for uses such as large databases and internet indexing services. Usually, efficient data structures are key to designing efficient algorithms. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design. Storing and retrieving can be carried out on data stored in both main memory and in secondary memory.

• Explain the concept of algorithms.

As you will recall from earlier in your studies, an algorithm is a step-by-step process that performs actions on data structures. For example, you can design and write code for an algorithm to find the smallest integer in an array of integers; you can design and write code for an algorithm that finds all red pixels in a 2-D colour image.

• Explain the need for efficiency in data structures and algorithms.

These examples tell us that the obvious implementations of data structures do not scale well when the number of items, n, in the data structure and the number of operations, m, performed on the data structure are both large. In these cases, the time (measured in, say, machine instructions) is roughly n × m. The solution, of course, is to carefully organize data within the data structure so that not every operation requires every data item to be inspected. Although it sounds impossible at first, we will see data structures where a search requires looking at only two items on average, independent of the number of items stored in the data structure.

• Distinguish the difference between an interface and an implementation.
A program is an implementation of an algorithm. In fact, every program is an implementation of some algorithm.
When discussing data structures, it is important to understand the difference between a data structure's interface and its implementation. An interface describes what a data structure does, while an implementation describes how the data structure does it.
An interface, sometimes also called an abstract data type, defines the set of operations supported by a data structure and the semantics, or meaning, of those operations. An


interface tells us nothing about how the data structure implements these operations; it only provides a list of supported operations along with specifications about what types of arguments each operation accepts and the value returned by each operation.
A data structure implementation, on the other hand, includes the internal representation of the data structure as well as the definitions of the algorithms that implement the operations supported by the data structure. Thus, there can be many implementations of a single interface. For example, in Chapter 2 we will see implementations of the List interface using arrays, and in Chapter 3 we will see implementations of the List interface using pointer-based data structures. Each implements the same interface, List, but in different ways.

• Use mathematical concepts required to understand data structures and algorithms.

We generally use asymptotic notation to simplify functions. For example, in place of f(n) = 5n log n + 8n − 200 we can write f(n) = O(n log n). This is proven as follows:

5n log n + 8n − 200 ≤ 5n log n + 8n ≤ 5n log n + 8n log n (for n ≥ 2, so that log n ≥ 1) ≤ 13n log n

This demonstrates that the function f(n) = 5n log n + 8n − 200 is in the set O(n log n) using the constants c = 13 and n0 = 2.

Page 3: Comp 272 Notes

7/23/2019 Comp 272 Notes

http://slidepdf.com/reader/full/comp-272-notes 3/26

Using big-Oh notation, the running time can be simplified to its dominant term, here O(n log n); constant factors and lower-order terms are dropped.

• Apply a model of computation.

For this, we use the w-bit word-RAM model. RAM stands for Random Access Machine. In this model, we have access to a random access memory consisting of cells, each of which stores a w-bit word. This implies that a memory cell can represent, for example, any integer in the set {0, 1, ..., 2^w − 1}.
In the word-RAM model, basic operations on words take constant time. This includes arithmetic operations (+, −, *, /, %), comparisons (<, >, =, ≤, ≥), and bitwise boolean operations (bitwise-AND, OR, and exclusive-OR).

Any cell can be read or written in constant time. A computer's memory is managed by a memory management system from which we can allocate or deallocate a block of memory of any size we would like. Allocating a block of memory of size k takes O(k) time and returns a reference (a pointer) to the newly allocated memory block. This reference is small enough to be represented by a single word.

The word-size w is a very important parameter of this model. The only assumption we will make about w is the lower bound w ≥ log n, where n is the number of elements stored in any of our data structures. This is a fairly modest assumption, since otherwise a word is not even big enough to count the number of elements stored in the data structure.

Space is measured in words, so that when we talk about the amount of space used by a data structure, we are referring to the number of words of memory used by the structure. All of our data structures store values of a generic type T, and we assume an element of type T occupies one word of memory.

• Apply correctness, time complexity, and space complexity to data structures and algorithms.

Correctness:

The data structure should correctly implement its interface.

Time complexity:

The running times of operations on the data structure should be as small as possible.

Space complexity:

The data structure should use as little memory as possible.

Worst-case running times:

These are the strongest kind of running time guarantees. If a data structure operation has a worst-case running time of f(n), then one of these operations never takes longer than f(n) time.

Amortized running times:

If we say that the amortized running time of an operation in a data structure is f(n), then this means that the cost of a typical operation is at most f(n). More precisely, if a data structure has an amortized running time of f(n), then a sequence of m operations takes at most m·f(n) time. Some individual operations may take more than f(n) time, but the average, over the entire sequence of operations, is at most f(n).
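To make the amortized bound concrete, here is a small counting experiment (an illustrative sketch, not from the notes): a dynamic array that doubles its capacity copies elements only rarely, so m appends cost O(m) in total even though a single append can cost up to n.

```python
def count_copies(m):
    """Count element copies made while appending m items into a doubling array."""
    copies, n, cap = 0, 0, 1
    for _ in range(m):
        if n == cap:        # array is full: grow by doubling
            copies += n     # growing copies all n current elements
            cap *= 2
        n += 1              # the append itself
    return copies
```

For m = 1000 appends the copies total 1 + 2 + 4 + ... + 512 = 1023 < 2m, so the amortized cost per append is O(1) even though one append copied 512 elements.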

Expected running times:

If we say that the expected running time of an operation on a data structure is f(n), this means that the actual running time is a random variable (see Section 1.3.4) and the expected value of this random variable is at most f(n). The randomization here is with respect to random choices made by the data structure.

Unit 2: Array-Based Lists

• Implement List interfaces.

• Implement Queue interfaces.
A stack is a container of objects that are inserted and removed according to the last-in first-out (LIFO) principle. In the pushdown stack only two operations are allowed: push the item onto the stack, and pop the item out of the stack. A stack is a limited access data structure: elements can be added and removed only at the top. push adds an item to the top of the stack; pop removes the item from the top. A helpful analogy is to think of a stack of books; you can remove only the top book, and you can add a new book on the top.

A queue is a container of objects (a linear collection) that are inserted and removed according to the first-in first-out (FIFO) principle. An excellent example of a queue is a line of students in the food court of the UC. New additions to a line are made at the back of the queue, while removal (or serving) happens at the front. In the queue only two operations are allowed: enqueue and dequeue. Enqueue means to insert an item at the back of the queue; dequeue means removing the front item. The picture demonstrates the FIFO access.
The difference between stacks and queues is in removing. In a stack we remove the item most recently added; in a queue, we remove the item least recently added.
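The two disciplines can be sketched in Python, using a plain list as the stack and collections.deque as the queue:

```python
from collections import deque

stack = []                 # LIFO discipline
stack.append("book1")      # push
stack.append("book2")      # push
top = stack.pop()          # pop: removes the most recently added item

queue = deque()            # FIFO discipline
queue.append("alice")      # enqueue at the back
queue.append("bob")
front = queue.popleft()    # dequeue: removes the least recently added item
```

deque gives O(1) removal at the front; popping the front of a plain Python list would cost O(n).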


Unit 3: Linked Lists

• Implement singly-linked lists.

• Implement doubly-linked lists.
• Implement space-efficient linked lists.

We first study singly-linked lists, which can implement Stack and (FIFO) Queue operations in constant time per operation, and then move on to doubly-linked lists, which can implement Deque operations in constant time.

Linked lists have advantages and disadvantages when compared to array-based implementations of the List interface. The primary disadvantage is that we lose the ability to access any element using get(i) or set(i, x) in constant time. Instead, we have to walk through the list, one element at a time, until we reach the i-th element. The primary advantage is that they are more dynamic: given a reference to any list node u, we can delete or insert a node adjacent to u in constant time. This is true no matter where u is in the list.
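That constant-time insertion and deletion next to a known node u can be sketched with a minimal doubly-linked node (an illustrative sketch, not the textbook's list class):

```python
class Node:
    def __init__(self, x):
        self.x, self.prev, self.next = x, None, None

def insert_after(u, v):
    """Splice v in immediately after u: O(1), no walking required."""
    v.prev, v.next = u, u.next
    if u.next is not None:
        u.next.prev = v
    u.next = v

def remove(u):
    """Unlink u, given only the reference to u: O(1)."""
    if u.prev is not None:
        u.prev.next = u.next
    if u.next is not None:
        u.next.prev = u.prev

a, b = Node("a"), Node("b")
insert_after(a, b)       # list: a <-> b
c = Node("c")
insert_after(a, c)       # list: a <-> c <-> b
remove(c)                # list: a <-> b again
```

Notice that neither operation looks at any node other than u, its neighbours, and v, which is exactly why the cost is independent of the list's length.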

Unit 4: Skiplists

• Implement skiplists.

Unit 5: Hash Tables

• Explain hash functions (division, multiplication, folding, radix transformation, digit rearrangement, length-dependent, mid-square).
• Estimate the effectiveness of hash functions (division, multiplication, folding, radix transformation, digit rearrangement, length-dependent, mid-square).
• Differentiate between various hash functions (division, multiplication, folding, radix transformation, digit rearrangement, length-dependent, mid-square).

Division Hashing

Division hashing using a prime number as the table size is quite popular. A commonly used sequence of primes, each roughly double the previous so that one is effective for each range of table sizes, is: 53, 97, 193, 389, 769.

The search keys that we intend to hash can range from simple numbers to the entire text content of a book. The original object could be textual or numerical or in any medium. We need to convert each object into its equivalent numerical representation in preparation for hashing. That


is, objects are referenced by search keys.
A hash function must guarantee that the number it returns is a valid index to one of the table slots. A simple way is to use (key modulo TableSize).
Example: Suppose we intend to hash strings; i.e., the table is to store strings. A very simple hash function would be to add up the ASCII values of all the characters in the string and take the modulo of the table size, say 101.
Thus "cobb" would be stored at location (99 + 111 + 98 + 98) mod 101 = 406 mod 101 = 2
"hike" would be stored at location (104 + 105 + 107 + 101) mod 101 = 417 mod 101 = 13
"ppqq" would be stored at location (112 + 112 + 113 + 113) mod 101 = 450 mod 101 = 46
"abcd" would be stored at location (97 + 98 + 99 + 100) mod 101 = 394 mod 101 = 91
The key idea is to get numbers far away from each other.
A better hashing function for a string s0 s1 s2 ... sN, given a table size TableSize and a base B (e.g., 31), would be
[ascii(s0)·B^N + ascii(s1)·B^(N−1) + ... + ascii(sN)·B^0] mod TableSize
The computation of this hashing function is very likely to fail for large strings because of overflow in the various terms. This failure can be avoided by using Horner's rule, applying mod at each stage of computation.
Given a polynomial of degree n, p(x) = an·x^n + a(n−1)·x^(n−1) + ... + a1·x + a0, one might suspect that n + (n − 1) + (n − 2) + ... + 1 = n(n + 1)/2 multiplications would be needed to evaluate p(x) for a given x. However, Horner's rule shows that it can be rewritten so that only n multiplications are needed:
p(x) = (...((an·x + a(n−1))·x + a(n−2))·x + ... + a1)·x + a0
This is exactly the way that integer constants are evaluated from strings of characters (digits):
1234 = 1·10^3 + 2·10^2 + 3·10 + 4 = ((1·10 + 2)·10 + 3)·10 + 4
Use of Horner's rule would imply computing the above hash function in the following fashion, taking mod TableSize at each step:
ascii(s0)·B + ascii(s1), then ·B + ascii(s2), ..., finally ·B + ascii(sN)
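Horner's rule with a mod at every step can be sketched as follows (base 31 is an assumed choice; any small base works):

```python
def string_hash(s, table_size, base=31):
    """Polynomial string hash evaluated with Horner's rule, reducing mod
    table_size at every step so intermediate values never overflow."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % table_size
    return h
```

Because the intermediate value is reduced after each multiply-add, it never exceeds base × table_size, so the computation is safe even in fixed-width integer languages.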

Multiplication Hashing

Multiplication hashing uses multiplication by a real number and then a truncation to an integer. Intuitively, we can expect that multiplying a random real number between 0 and 1 with an integer key should give us another random real number. Taking the fractional part of this result should give us most of the digits of precision (i.e., randomness) of the original. The fractional part also restricts output to a range of values.
A good value of the real number to be used in multiplication hashing is c = (sqrt(5) − 1)/2 ≈ 0.618. Thus,
h(k) = floor(m · (k·c − floor(k·c))), where 0 < c < 1.
Here, the key k is multiplied by the constant real number c. We then take the fractional part of k·c and multiply it by m, the table size. Taking the floor of the result gives the hash value of k. Note that the particular value of m does not make much difference.


As you can see by hashing a run of consecutive keys, the hash function breaks the input pattern fairly uniformly across the table.
The division and multiplication hash functions are not order preserving. That is, the original order of objects is not the same as the order in which the hashed values are stored in the table; k1 < k2 does not imply h(k1) < h(k2).
Also, the division and multiplication hash functions are not perfect or minimal perfect hash functions. A minimal perfect hash maps n keys to a range of n elements with no collisions. A perfect hash maps n keys to a range of m elements, m >= n, with no collisions. Refer to http://cmph.sourceforge.net/ for additional information about algorithms that can generate minimal perfect hashing.

Folding Hashing
This method divides the original object or the search key into several parts, adds the parts together, and then uses the last four digits (or some other arbitrary number of digits) as the hashed value or key. For example, a social insurance number such as 123 456 789 can be broken into three parts: 123, 456, and 789. These three numbers are added, yielding 1368. The hash function will be 1368 mod TableSize.
The folding can be done in a number of ways. For instance, one can instead divide the number into parts of unequal sizes and add them together.
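A folding sketch that splits a key's digit string into fixed-size parts (the part length and example key are arbitrary illustrative choices):

```python
def fold_hash(key, part_len, table_size):
    """Split the key's digits into part_len-sized chunks, sum them,
    then reduce mod table_size."""
    s = str(key)
    total = sum(int(s[i:i + part_len]) for i in range(0, len(s), part_len))
    return total % table_size
```

For the key 123456789 with 3-digit parts, the parts are 123, 456, 789 and the sum is 1368.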

Radix Transformation

The number base (or radix) of the search key can be changed, resulting in a different sequence of digits. For example, a decimal-numbered key could be transformed into a hexadecimal-numbered key: the key 345 in base 10 becomes 159 in base 16. High-order digits can be discarded to fit a hash value of uniform length. We then use the division method to obtain a hash value.

Digit Rearrangement

Here, some of the search key digits, say those in positions 2 through 4, are manipulated, resulting in a new search key.
For example, if our key is 123456, we might select the digits in positions 2 through 4, yielding 234. The manipulation can then take many forms:
• reversing the digits (432), resulting in a key of 143256
• performing a circular shift to the right (423), resulting in a key of 142356
• performing a circular shift to the left (342), resulting in a key of 134256
• swapping pairs of digits (324), resulting in a key of 132456

Length-Dependent Hashing

In this method, the key and the length of the key are combined in some way to form either the index itself or an intermediate version. For example, if our key is 8765, we might multiply the first two digits by the length and then divide by the last digit, yielding 69 (87 × 4 = 348; 348 / 5 = 69). If our table size is 43, we would then use the division method, resulting in an index of 26.

Mid-Square Hashing

The key is squared, and the middle part of the result is used as the address for the hash table. The entire key participates in generating the address, so that there is a better chance that different addresses are generated even for keys close to each other. For example:
suppose the key is 3121; the square is 9740641, and the mid value is 406
suppose the key is 3122; the square is 9746884, and the mid value is 468
suppose the key is 3123; the square is 9753129, and the mid value is 531
In practice, it is more efficient to choose a power of 2 for the size of the table and extract the middle part of the bit representation of the square of a key. If the table size is chosen in this example as 1024, the binary representation of the square of 3121 is
1001-0100-1010-0001-0110-0001
The middle part can be easily extracted using a mask and a shift operation.

• Recognize various collision resolution algorithms: open addressing (linear probing, quadratic probing, double hashing), separate chaining (normal, with list heads, with other data structures), coalesced hashing, Robin Hood hashing, cuckoo hashing, hopscotch hashing, dynamic resizing (resizing the whole, incremental resizing).

• Collisions happen when two search keys are hashed into the same slot in the table. There are many ways to resolve collisions in hashing. Alternatively, one can discover a hash function that is perfect, meaning that it maps each search key into a different hash value. Unfortunately, perfect hash functions are effective only in situations where the inputs are fixed and known in advance. A sub-category of perfect hash is minimal perfect hash, where the range of the hash values is also limited, yielding a compact hash table.

• If we are able to develop a perfect hashing function, we do not need to be concerned about collisions or table size. However, often we do not know the size of the input dataset and are not able to develop a perfect hashing function. In these cases, we must choose a method for handling collisions.

• For almost all hash functions, it is possible that more than one key is assigned to the same table slot. For example, if the hash function computes the slot based just on the first letter of the key, then all keys starting with the same letter will be hashed to the same slot, resulting in a collision.

• Collisions can be resolved partially by choosing another hash function, one that computes the slot based on the first two letters of the key. However, even if a hash function is chosen in which all the letters of the key participate, there is still a possibility that a number of keys may hash to the same slot in the hash table.

• Another factor that can be used to avoid collision of multiple keys is the size of the hash table. A larger size will result in fewer collisions, but it will also increase the access time during retrieval.

• A number of strategies have been proposed to prevent collision of multiple keys. Strategies that look for another open position in the table, other than the one to which the key is originally hashed, are called open addressing strategies. We will examine three open addressing strategies: linear probing, quadratic probing, and double hashing.


Linear Probing

• When a collision takes place, you should search for the next available position in the table by making a sequential search. Thus the hash values are generated by
• after-collision: h(k, i) = [h(k) + p(i)] mod TableSize,
• where p(i) is the probing function after the i-th probe. The probing function is one that looks for the next available slot in case of a collision. The simplest probing function is linear probing, for which p(i) = i, where i is the step number.
• Consider a simple example with a table of size 10, hence mod 10. After hashing the keys 9 and 43, the table is shown below. Note that initially a simple division hashing function, h(k) = k mod TableSize, works fine. We will use the modified hash function, h(k, i) = [h(k) + i] mod TableSize, only when there is a collision.

index: 0  1  2  3  4  5  6  7  8  9
key:   -  -  -  43 -  -  -  -  -  9

• When keys 13 and 65 arrive, they are stored as follows. Note that the search key 13 results in a hash value of 3, but slot 3 is already occupied. Thus, using the modified hash function with i = 1, a new hash value of 4 is obtained; slot 4 is free, so it houses the search key 13. The search key 65 directly hashes to slot 5.

index: 0  1  2  3  4  5  6  7  8  9
key:   -  -  -  43 13 65 -  -  -  9

• Suppose we have another key with a value of 54. The key 54 cannot be stored in its designated place (slot 4) because it collides with 13; slot 5 is also occupied, so a new place for it is found by linear probing at position 6, which is empty at this point:

index: 0  1  2  3  4  5  6  7  8  9
key:   -  -  -  43 13 65 54 -  -  9

• When the search reaches the end of the table, it continues from the first location again. Thus the key 59, which hashes to the occupied slot 9, will be stored as follows:

index: 0  1  2  3  4  5  6  7  8  9
key:   59 -  -  43 13 65 54 -  -  9
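The insertions above can be reproduced with a short linear-probing sketch:

```python
def insert_lp(table, k):
    """Insert k by division hashing, stepping to the next free slot
    (wrapping around) on a collision."""
    m = len(table)
    i = k % m
    while table[i] is not None:
        i = (i + 1) % m     # linear probe: p(i) = i
    table[i] = k

table = [None] * 10
for k in [9, 43, 13, 65, 54, 59]:
    insert_lp(table, k)
# final layout: 59 _ _ 43 13 65 54 _ _ 9
```

The run of occupied slots 3 through 6 is exactly a primary cluster in the making: each new key that lands anywhere in it is pushed to the cluster's end, making the cluster grow faster.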

• In linear probing, the keys start forming clusters, which have a tendency to grow fast because more and more collisions take place and the new keys get attached to one end of the cluster. These are called primary clusters.

• The problem with such clusters is that they generate long unsuccessful searches. The search may have to go through to the end of the table and start again from the beginning.

Quadratic Probing

• To overcome the primary clustering problem, quadratic probing places the elements further away rather than in immediate succession.

• Let h(k) be the hash function that maps a search key k to an integer in [0, m − 1], where m is the size of the table. One choice is the following quadratic function for the i-th probe. That is, the modified hash function is used to probe only after a collision has been observed.
• after collision: h(k, i) = (h(k) + c1·i + c2·i^2) mod TableSize, where c2 is not equal to 0
• If c2 = 0, this hash function degenerates into a linear probe. For a given hash table, the values c1 and c2 remain constant. For m = 2^n, a good choice for the constants is c1 = c2 = 1/2.

• For a prime m > 2, most choices of c1 and c2 will make h(k, i) distinct for i in [0, (m − 1)/2]. Such choices include c1 = c2 = 1/2, c1 = c2 = 1, and c1 = 0, c2 = 1.
• Although using quadratic probing gives much better results than using linear probing, the problem of cluster buildup is not avoided altogether. Such clusters are called secondary clusters.

Double Hashing

• The problem of secondary clustering is best addressed with double hashing: a second hash function is used for resolving conflicts.

• Like linear probing, double hashing uses one hash value as a starting point and then repeatedly steps forward an interval until the desired value is located, an empty location is reached, or the entire table has been searched. But the resulting interval is decided using a second, independent hash function; hence the name double hashing. Given independent hash functions h1 and h2, the j-th probe for value k in a hash table of size m is
• h(k, j) = (h1(k) + j·h2(k)) mod m
• Whatever scheme is used for hashing, it is obvious that the search time depends on how

much of the table is filled up. The search time increases with the number of elements in the table. In the worst case, one may have to go through all the table entries.
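A double-hashing sketch; the secondary function here, 1 + (k mod (m − 1)), is an assumed common choice that can never evaluate to zero, not one prescribed by the notes:

```python
def double_hash(k, j, m):
    """j-th probe: (h1(k) + j * h2(k)) mod m."""
    h1 = k % m
    h2 = 1 + (k % (m - 1))   # assumed secondary hash; always in [1, m-1]
    return (h1 + j * h2) % m
```

When m is prime, the step h2(k) is coprime to m, so the m probes for any key visit all m slots before repeating.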

• Similar to other open addressing techniques, double hashing becomes linear as the hash table approaches maximum capacity. Also, it is possible for a poorly chosen secondary hash function to evaluate to zero for some keys, in which case the probe sequence never advances; h2 must be chosen to exclude this.

Separate Chaining

• A popular and space-efficient alternative to the above schemes is separate chaining. Each position of the table is associated with a linked list, or chain, of structures whose data fields store the keys. The hash table itself is a table of references to these linked lists. Thus, keys such as 7, 17, 37, 27, and 57 would all hash to the same position, position 7, in the reference hash table.
• In this scheme, the table can never overflow, because the linked lists are extended only upon the arrival of new keys. A new key is always added to the front of the linked list, thus minimizing insertion time. Many unsuccessful searches may end up in empty lists, which reduces the search time compared with other hashing schemes. This is, of course, at the expense of extra storage for linked-list references. While searching for a key, you must first locate the slot using the hash function and then search through the linked list for the specific entry.


• In this example, John Smith and Sandra Dee end up in the same bucket, table entry 152: entry 152 points first to the John Smith object, which is linked to the Sandra Dee object.

• Insertion of a new key requires appending to either end of the list in the hashed slot.
• Deletion requires searching the list and removing the element.
• Study carefully and thoroughly the section titled "Separate chaining" in http://en.wikipedia.org/wiki/Separate_chaining#Separate_chaining, particularly the two different types of separate chaining: separate chaining with list heads and separate chaining with other structures.
• This web page also introduces you to coalesced hashing, Robin Hood hashing, cuckoo hashing, and hopscotch hashing, which may help you understand how they differ from each other.

Coalesced Hashing

A hybrid of chaining and open addressing, coalesced hashing links together chains of nodes within the table itself. Like open addressing, it achieves space usage and (somewhat diminished) cache advantages over chaining. Like chaining, it does not exhibit clustering effects; in fact, the table can be efficiently filled to a high density. Unlike chaining, it cannot have more elements than table slots.

Cuckoo Hashing

Another alternative open-addressing solution is cuckoo hashing, which ensures constant lookup time in the worst case, and constant amortized time for insertions and deletions. It uses two or more hash functions, which means any key/value pair could be in two or more locations. For lookup, the first hash function is used; if the key/value is not found, then the second hash function is used, and so on. If a collision happens during insertion, then the key is re-hashed with


the second hash function to map it to another bucket. If all hash functions are used and there is still a collision, then the key it collided with is removed to make space for the new key, and the old key is re-hashed with one of the other hash functions, which maps it to another bucket. If that location also results in a collision, then the process repeats until there is no collision or the process traverses all the buckets, at which point the table is resized. By combining multiple hash functions with multiple cells per bucket, very high space utilization can be achieved.

Hopscotch Hashing

Another alternative open-addressing solution is hopscotch hashing, which combines the approaches of cuckoo hashing and linear probing, yet seems in general to avoid their limitations. In particular, it works well even when the load factor grows beyond 0.9. The algorithm is well suited for implementing a resizable concurrent hash table.
The hopscotch hashing algorithm works by defining a neighborhood of buckets near the original hashed bucket, where a given entry is always found. Thus, search is limited to the number of entries in this neighborhood, which is logarithmic in the worst case, constant on average, and with proper alignment of the neighborhood typically requires one cache miss. When inserting an entry, one first attempts to add it to a bucket in the neighborhood. However, if all buckets in this neighborhood are occupied, the algorithm traverses buckets in sequence until an open slot (an unoccupied bucket) is found (as in linear probing). At that point, since the empty bucket is outside the neighborhood, items are repeatedly displaced in a sequence of hops. (This is similar to cuckoo hashing, but with the difference that in this case the empty slot is being moved into the neighborhood, instead of items being moved out with the hope of eventually finding an empty slot.) Each hop brings the open slot closer to the original neighborhood, without invalidating the neighborhood property of any of the buckets along the way. In the end, the open slot has been moved into the neighborhood, and the entry being inserted can be added to it.

Robin Hood Hashing

One interesting variation on double-hashing collision resolution is Robin Hood hashing. The idea is that a new key may displace a key already inserted if its probe count is larger than that of the key at the current position. The net effect of this is that it reduces worst-case search times in the table. This is similar to ordered hash tables, except that the criterion for bumping a key does not depend on a direct relationship between the keys. Since both the worst case and the variation in the number of probes are reduced dramatically, an interesting variation is to probe the table starting at the expected successful probe value and then expand from that position in both directions. External Robin Hood hashing is an extension of this algorithm where the table is stored in an external file and each table position corresponds to a fixed-size page or bucket with B records.

Unit 3: Recursion

• Define recursion.

One of the most succinct properties of modern programming languages like C, C++, and Java (as well as many others) is that these languages allow you to define methods that reference themselves; such methods are said to be recursive. One of the biggest advantages recursive methods bring to the table is that they usually result in more readable and compact solutions to problems.


A recursive method, then, is one that is defined in terms of itself. Generally a recursive algorithm has two main properties:

1. One or more base cases; and
2. A recursive case.
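As a small illustration of these two properties (not course code, just a hypothetical example), a recursive factorial method in Java:

```java
// Sketch: a recursive factorial, labelling the two properties of a
// recursive algorithm.
public class RecursiveFactorial {
    // Computes n! for n >= 0.
    static long factorial(int n) {
        if (n <= 1) return 1;        // base case
        return n * factorial(n - 1); // recursive case: defined in terms of itself
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // prints 120
    }
}
```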

• Inspect recursive programs.

• Create different types of recursive programs.
• Differentiate recursive solutions from iterative solutions.

An iterative solution uses no recursion whatsoever. An iterative solution relies only on the use of loops (e.g., for, while, do-while, etc.). The downside of iterative algorithms is that they tend not to be as clear as their recursive counterparts with respect to their operation. The major advantage of iterative solutions is speed. Most production software you will find uses little or no recursive algorithms whatsoever. The latter property can sometimes be a company's prerequisite to checking in code; e.g., upon check-in a static analysis tool may verify that the code the developer is checking in contains no recursive algorithms. Normally it is systems-level code that has this zero-tolerance policy for recursive algorithms.
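For comparison, the same factorial computation written iteratively; a loop replaces the recursive call (again an illustrative sketch, not course code):

```java
// Sketch: factorial computed iteratively, using only a loop.
public class IterativeFactorial {
    static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) { // the loop replaces the recursive call
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // prints 120
    }
}
```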

• Estimate the time complexity of recursive programs.

Unit 4: Binary Trees

• Define binary tree.
Mathematically, a binary tree is a connected, undirected, finite graph with no cycles and no vertex of degree greater than three.

For most computer science applications, binary trees are rooted: a special node, r, of degree at most two is called the root of the tree. For every node, u, the second node on the path from u to r is called the parent of u. Each of the other nodes adjacent to u is called a child of u. Most of the binary trees we are interested in are ordered, so we distinguish between the left child and right child of u.

The depth of a node, u, in a binary tree is the length of the path from u to the root of the tree. If a node, w, is on the path from u to r, then w is called an ancestor of u and u a descendant of w. The subtree of a node, u, is the binary tree that is rooted at u and contains all of u's descendants. The height of a node, u, is the length of the longest path from u to one of its descendants. The height of a tree is the height of its root. A node, u, is a leaf if it has no children.

• We sometimes think of the tree as being augmented with external nodes. Any node that does not have a left child has an external node as its left child, and, correspondingly, any node that does not have a right child has an external node as its right child (see the corresponding figure in Morin's book).

• Define binary search tree.

A BinarySearchTree is a special kind of binary tree in which each node, u, also stores a data value, u.x, from some total order. The data values in a binary search tree obey the binary search tree property: for a node, u, every data value stored in the subtree rooted at u.left is less than u.x, and every data value stored in the subtree rooted at u.right is greater than u.x.

• Examine a binary tree and binary search tree.

• Implement a binary tree and binary search tree.
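A minimal sketch of a binary search tree with add and find, relying only on the binary search tree property stated above (names are illustrative, not Morin's implementation):

```java
// Sketch: an unbalanced binary search tree supporting add and find.
public class SimpleBST {
    static class Node {
        int x;
        Node left, right;
        Node(int x) { this.x = x; }
    }

    Node root;

    void add(int x) {
        if (root == null) { root = new Node(x); return; }
        Node u = root;
        while (true) {
            if (x < u.x) {
                if (u.left == null) { u.left = new Node(x); return; }
                u = u.left;
            } else if (x > u.x) {
                if (u.right == null) { u.right = new Node(x); return; }
                u = u.right;
            } else {
                return; // already present
            }
        }
    }

    boolean find(int x) {
        Node u = root;
        while (u != null) {
            if (x < u.x) u = u.left;       // everything smaller is in the left subtree
            else if (x > u.x) u = u.right; // everything larger is in the right subtree
            else return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SimpleBST t = new SimpleBST();
        for (int x : new int[]{7, 3, 11, 1, 5}) t.add(x);
        System.out.println(t.find(5) + " " + t.find(6)); // prints "true false"
    }
}
```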

• Define AVL tree.
An AVL tree is a self-balancing binary search tree in which the heights of the two child subtrees of any node differ by at most 1; therefore, it is also said to be height-balanced. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

The balance factor of a node is the height of its right subtree minus the height of its left subtree, and a node with balance factor 1, 0, or -1 is considered balanced. A node with any other balance factor is considered unbalanced and requires rebalancing the tree. The balance factor is either stored directly at each node or computed from the heights of the subtrees.

AVL trees are often compared to red-black trees (see Unit 6) because they support the same set of operations and because red-black trees also take O(log n) time for the basic operations. AVL trees perform better than red-black trees for lookup-intensive applications. AVL trees, red-black trees, and (2,4) trees, to be introduced in Unit 6 and Chapter 9 of Morin's book, share a number of good properties, but AVL trees and (2,4) trees may require some extra operations to deal with restructuring (rotations), fusing, or splitting. However, red-black trees do not have these drawbacks.
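The balance-factor definition above can be checked directly. This sketch computes heights and balance factors recursively and tests the AVL condition; it deliberately omits rotations, and all names are our own:

```java
// Sketch: checking the AVL condition via heights and balance factors.
// Verification only; no rebalancing is performed.
public class AvlCheck {
    static class Node {
        Node left, right;
        Node(Node l, Node r) { left = l; right = r; }
    }

    static int height(Node u) {
        if (u == null) return -1; // height of an empty subtree
        return 1 + Math.max(height(u.left), height(u.right));
    }

    // Balance factor = height(right subtree) - height(left subtree).
    static int balanceFactor(Node u) {
        return height(u.right) - height(u.left);
    }

    static boolean isAvl(Node u) {
        if (u == null) return true;
        int bf = balanceFactor(u);
        return bf >= -1 && bf <= 1 && isAvl(u.left) && isAvl(u.right);
    }

    public static void main(String[] args) {
        Node balanced = new Node(new Node(null, null), new Node(null, null));
        Node skewed = new Node(new Node(new Node(null, null), null), null);
        System.out.println(isAvl(balanced) + " " + isAvl(skewed)); // prints "true false"
    }
}
```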

Unit 5: Scapegoat Trees

• Define scapegoat tree.

A ScapegoatTree is a BinarySearchTree that, in addition to keeping track of the number, n, of nodes in the tree, also keeps a counter, q, that maintains an upper bound on the number of nodes.

At all times, n and q obey the following inequalities: q/2 &lt;= n &lt;= q.

In addition, a ScapegoatTree has logarithmic height; at all times, the height of the scapegoat tree does not exceed log_{3/2}(q).

• Examine a scapegoat tree.

Unfortunately, it will sometimes happen that depth(u) > log_{3/2}(q). In this case, we need to reduce the height. This isn't a big job; there is only one node, namely u, whose depth exceeds log_{3/2}(q). To fix u, we walk from u back up to the root looking for a scapegoat, w. The scapegoat, w, is a very unbalanced node. It has the property that

size(w.child)/size(w) > 2/3,

where w.child is the child of w on the path from the root to u.
• Implement a scapegoat tree.

Unit 6: Red-Black Trees

• Define red-black tree.
• A 2-4 tree is a rooted tree with the following properties:
• Property (height): All leaves have the same depth.
• Property (degree): Every internal node has 2, 3, or 4 children.

• Lemma 9.1: A 2-4 tree with n leaves has height at most log n.


• Examine a red+blac# tree.

A red-black tree is a binary search tree in which each node, u, has a colour which is either red or black. Red is represented by the value 0 and black by the value 1.

Before and after any operation on a red-black tree, the following two properties are satisfied. Each property is defined both in terms of the colours red and black, and in terms of the numeric values 0 and 1.

Property (black-height): There are the same number of black nodes on every root-to-leaf path. (The sum of the colours on any root-to-leaf path is the same.)

Property (no-red-edge): No two red nodes are adjacent. (For any node u, except the root, u.colour + u.parent.colour &gt;= 1.)

Notice that we can always colour the root, r, of a red-black tree black without violating either of these two properties, so we will assume that the root is black, and the algorithms for updating a red-black tree will maintain this. Another trick that simplifies red-black trees is to treat the external nodes (represented by nil) as black nodes. This way, every real node, u, of a red-black tree has exactly two children, each with a well-defined colour. Furthermore, the black-height property now guarantees that every root-to-leaf path in the tree has the same length. In other words, the tree is a 2-4 tree!

Lemma 9.2: The height of a red-black tree with n nodes is at most 2 log n.

• Implement a red-black tree.

Unit 10: Heaps

• Define heap.

A heap is a data structure created using a binary tree. It can be seen as a binary tree with two additional constraints:

The shape property: the tree is a complete binary tree; that is, all levels of the tree, except possibly the last one (deepest), are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.

The heap property: each node is greater than or equal to each of its children according to some comparison predicate which is fixed for the entire data structure.

"Greater than" means according to whatever comparison function is chosen to sort the heap, not necessarily greater than in the mathematical sense, because the quantities are not always numerical. Heaps where the comparison function is the mathematical greater-than are called max-heaps; those where the comparison function is the mathematical less-than are called min-heaps. Conventionally, min-heaps are used because they are readily applicable for use in priority queues.

Note that the ordering of siblings in a heap is not specified by the heap property, so the two children of a parent can be freely interchanged, as long as this does not violate the shape and heap properties.

The binary heap is a special case of the d-ary heap in which d = 2.

• Examine a binary heap tree.

If we apply Eytzinger's method to a sufficiently large tree, some patterns emerge. The left child of the node at index i is at index left(i) = 2i + 1 and the right child of the node at index i is at index right(i) = 2i + 2. The parent of the node at index i is at index parent(i) = (i - 1)/2.

A BinaryHeap implements the (priority) Queue interface.
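The index arithmetic above can be written down directly; a tiny sketch whose method names simply mirror the formulas:

```java
// Sketch: the Eytzinger (implicit array) layout arithmetic used by a
// binary heap stored in a single array.
public class HeapIndices {
    static int left(int i)   { return 2 * i + 1; }
    static int right(int i)  { return 2 * i + 2; }
    static int parent(int i) { return (i - 1) / 2; }

    public static void main(String[] args) {
        // The node at index 1 has children at 3 and 4, and its parent at 0.
        System.out.println(left(1) + " " + right(1) + " " + parent(1)); // prints "3 4 0"
    }
}
```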

• Implement a binary heap tree.
• Define meldable heap.

The MeldableHeap is a priority Queue implementation in which the underlying structure is also a heap-ordered binary tree. However, unlike a BinaryHeap, in which the underlying binary tree is completely defined by the number of elements, there are no restrictions on the shape of the binary tree that underlies a MeldableHeap; anything goes.

The add(x) and remove() operations in a MeldableHeap are implemented in terms of the merge(h1, h2) operation. This operation takes two heap nodes, h1 and h2, and merges them, returning a heap node that is the root of a heap that contains all elements in the subtree rooted at h1 and all elements in the subtree rooted at h2.

• Examine a randomi'ed meldable heap.

Unit 11: Sorting Algorithms

• Describe sorting algorithms (merge, quick, heap, counting, radix).

Merge sort

The merge-sort algorithm is a classic example of recursive divide and conquer: if the length of a is at most 1, then a is already sorted, so we do nothing. Otherwise, we split a into two halves, a0 = a[0],...,a[n/2 - 1] and a1 = a[n/2],...,a[n - 1]. We recursively sort a0 and a1, and then we merge (the now sorted) a0 and a1 to get our fully sorted array a.
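The steps above can be sketched as follows (an illustrative Java version, not Morin's exact code):

```java
import java.util.Arrays;

// Sketch of merge-sort as described above: split, recursively sort, merge.
public class MergeSortDemo {
    static void mergeSort(int[] a) {
        if (a.length <= 1) return;            // already sorted
        int[] a0 = Arrays.copyOfRange(a, 0, a.length / 2);
        int[] a1 = Arrays.copyOfRange(a, a.length / 2, a.length);
        mergeSort(a0);
        mergeSort(a1);
        merge(a0, a1, a);
    }

    // Merge the sorted halves a0 and a1 back into a.
    static void merge(int[] a0, int[] a1, int[] a) {
        int i0 = 0, i1 = 0;
        for (int i = 0; i < a.length; i++) {
            if (i0 == a0.length) a[i] = a1[i1++];
            else if (i1 == a1.length) a[i] = a0[i0++];
            else if (a0[i0] <= a1[i1]) a[i] = a0[i0++];
            else a[i] = a1[i1++];
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 5, 6};
        mergeSort(a);
        System.out.println(Arrays.toString(a)); // prints [1, 2, 5, 5, 6, 9]
    }
}
```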

Quicksort

The quicksort algorithm is another classic divide-and-conquer algorithm. Unlike merge-sort, which does merging after solving the two subproblems, quicksort does all of its work upfront.

Quicksort is simple to describe: pick a random pivot element, x, from a; partition a into the set of elements less than x, the set of elements equal to x, and the set of elements greater than x; and, finally, recursively sort the first and third sets in this partition.
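A sketch of randomized quicksort with the three-way partition just described, simplified to int arrays:

```java
import java.util.Arrays;
import java.util.Random;

// Sketch of randomized quicksort on the subarray a[i..i+n-1].
public class QuickSortDemo {
    static Random rand = new Random();

    static void quickSort(int[] a) { quickSort(a, 0, a.length); }

    static void quickSort(int[] a, int i, int n) {
        if (n <= 1) return;
        int x = a[i + rand.nextInt(n)];  // random pivot
        int p = i - 1, j = i, q = i + n; // three-way partition
        while (j < q) {
            if (a[j] < x) swap(a, j++, ++p);    // less than the pivot
            else if (a[j] > x) swap(a, j, --q); // greater than the pivot
            else j++;                           // equal to the pivot
        }
        quickSort(a, i, p - i + 1);   // recursively sort the "less than" set
        quickSort(a, q, n - (q - i)); // recursively sort the "greater than" set
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 5, 6};
        quickSort(a);
        System.out.println(Arrays.toString(a)); // prints [1, 2, 5, 5, 6, 9]
    }
}
```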

Heap-sort

The heap-sort algorithm is another in-place sorting algorithm. Heap-sort uses the binary heaps discussed in Section 10.1. Recall that the BinaryHeap data structure represents a heap using a single array. The heap-sort algorithm converts the input array into a heap and then repeatedly extracts the minimum value.

More specifically, a heap stores n elements in an array, a, at array locations a[0],...,a[n-1] with the smallest value stored at the root, a[0]. After


transforming a into a BinaryHeap, the heap-sort algorithm repeatedly swaps a[0] and a[n-1], decrements n, and calls trickleDown(0) so that a[0],...,a[n-2] once again are a valid heap representation. When this process ends (because n = 0) the elements of a are stored in decreasing order, so a is reversed to obtain the final sorted order. Figure 11.4 shows an example of the execution of heapSort(a).

Consider a statement of the form c[a[i]]++. This statement executes in constant time, but has k possible different outcomes, depending on the value of a[i]. This means that the execution of an algorithm that makes such a statement cannot be modelled as a binary tree. Ultimately, this is the reason that the algorithms in this section are able to sort faster than comparison-based algorithms.

Counting Sort

Suppose we have an input array a consisting of n integers, each in the range 0,...,k-1. The counting-sort algorithm sorts a using an auxiliary array c of counters. It outputs a sorted version of a as an auxiliary array b.

The idea behind counting-sort is simple: for each i in {0,...,k-1}, count the number of occurrences of i in a and store this in c[i]. Now, after sorting, the output will look like c[0] occurrences of 0, followed by c[1] occurrences of 1, followed by c[2] occurrences of 2,..., followed by c[k-1] occurrences of k-1.
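A sketch of counting-sort as described above; the prefix-sum pass makes the sort stable, which radix-sort relies on:

```java
import java.util.Arrays;

// Sketch: counting-sort for n integers in {0,...,k-1}. The prefix-sum pass
// makes the sort stable (equal values keep their relative order).
public class CountingSortDemo {
    static int[] countingSort(int[] a, int k) {
        int[] c = new int[k];
        for (int value : a) c[value]++;               // c[i] = occurrences of i
        for (int i = 1; i < k; i++) c[i] += c[i - 1]; // prefix sums: end of each run
        int[] b = new int[a.length];
        for (int i = a.length - 1; i >= 0; i--)       // place each value, stably
            b[--c[a[i]]] = a[i];
        return b;
    }

    public static void main(String[] args) {
        int[] a = {3, 0, 2, 3, 1, 0};
        System.out.println(Arrays.toString(countingSort(a, 4))); // prints [0, 0, 1, 2, 3, 3]
    }
}
```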


Radix-Sort

Counting-sort is very efficient for sorting an array of integers when the length, n, of the array is not much smaller than the maximum value, k-1, that appears in the array. The radix-sort algorithm, which we now describe, uses several passes of counting-sort to allow for a much greater range of maximum values.

Radix-sort sorts w-bit integers by using w/d passes of counting-sort to sort these integers d bits at a time. More precisely, radix sort first sorts the integers by their least significant d bits, then their next significant d bits, and so on until, in the last pass, the integers are sorted by their most significant d bits.
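A sketch of radix-sort for non-negative 32-bit integers, using d = 8 bits per pass (so w/d = 4 passes of stable counting-sort):

```java
import java.util.Arrays;

// Sketch: radix-sort of non-negative w-bit integers in w/d passes of
// stable counting-sort, d bits at a time (here w = 32, d = 8).
public class RadixSortDemo {
    static final int W = 32, D = 8;

    static int[] radixSort(int[] a) {
        for (int p = 0; p < W / D; p++) {
            int[] c = new int[1 << D];
            for (int x : a) c[(x >>> (D * p)) & ((1 << D) - 1)]++;
            for (int i = 1; i < c.length; i++) c[i] += c[i - 1]; // prefix sums
            int[] b = new int[a.length];
            for (int i = a.length - 1; i >= 0; i--) {            // stable placement
                int digit = (a[i] >>> (D * p)) & ((1 << D) - 1);
                b[--c[digit]] = a[i];
            }
            a = b; // the next pass sorts by the next d bits
        }
        return a;
    }

    public static void main(String[] args) {
        int[] a = {260, 5, 1000, 3, 512};
        System.out.println(Arrays.toString(radixSort(a))); // prints [3, 5, 260, 512, 1000]
    }
}
```

Stability of each counting-sort pass is what makes this work: ties on the current d bits preserve the order established by the earlier, less significant passes.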


• Estimate the complexity of sorting algorithms.

Merge sort

Therefore, the total amount of time taken by merge-sort is O(n log n).

The mergeSort(a) algorithm runs in O(n log n) time and performs at most n log n comparisons.

Quicksort

When quicksort is called to sort an array containing the integers 0,...,n-1, the expected number of times element i is compared to a pivot element is at most H_{i+1} + H_{n-i}.

A little summing up of harmonic numbers gives us the following theorem about the running time of quicksort:

Theorem 11.2: When quicksort is called to sort an array containing n distinct elements, the expected number of comparisons performed is at most 2n ln n + O(n).

The quickSort(a) method runs in O(n log n) expected time, and the expected number of comparisons it performs is at most 2n ln n + O(n).

Heap sort


The heapSort(a) method runs in O(n log n) time and performs at most 2n log n + O(n) comparisons.

Counting sort

The countingSort(a, k) method can sort an array a containing n integers in the set {0,...,k-1} in O(n + k) time.

Radix sort

For any integer d > 0, the radixSort(a, d) method can sort an array a containing n w-bit integers in O((w/d)(n + 2^d)) time.

If we think, instead, of the elements of the array being in the range 0,...,n^c - 1, and take d = ceil(log n), we obtain the following version of the theorem above:

Corollary 11.1: The radixSort(a, d) method can sort an array a containing n integer values in the range 0,...,n^c - 1 in O(cn) time.
• Compare sorting algorithms.


Algorithm    Comparisons            Guarantee    In-place
Merge-sort   n log n                worst-case   No
Quicksort    1.38 n log n + O(n)    expected     Yes
Heap-sort    2n log n + O(n)        worst-case   Yes

Each of these comparison-based algorithms has its advantages and disadvantages. Merge-sort does the fewest comparisons and does not rely on randomization. Unfortunately, it uses an auxiliary array during its merge phase. Allocating this array can be expensive and is a potential point of failure if memory is limited. Quicksort is an in-place algorithm and is a close second in terms of the number of comparisons, but is randomized, so its running time is not always guaranteed. Heap-sort does the most comparisons, but it is in-place and deterministic.

There is one setting in which merge-sort is a clear winner; this occurs when sorting a linked list. In this case, the auxiliary array is not needed; two sorted linked lists are very easily merged into a single sorted linked list by pointer manipulations (see the corresponding exercise in Morin's book).

The counting-sort and radix-sort algorithms described here are due to Seward. However, variants of radix-sort have been used since the 1920s to sort punch cards using punched-card sorting machines. These machines can sort a stack of cards into two piles based on the existence (or not) of a hole in a specific location on the card. Repeating this process for different hole locations gives an implementation of radix-sort.

Finally, we note that counting sort and radix sort can be used to sort other types of numbers besides non-negative integers. Straightforward modifications of counting sort can sort integers in any interval {a,...,b} in O(n + b - a) time. Similarly, radix sort can sort integers in the same interval in O(n log_n(b - a)) time. Both of these algorithms can also be used to sort floating point numbers in the IEEE 754 floating point format. This is because the IEEE format is designed to allow the comparison of two floating point numbers by comparing their values as if they were integers in a signed-magnitude binary representation.

Unit 12: Graphs

Mathematically, a (directed) graph is a pair G = (V, E) where V is a set of vertices and E is a set of ordered pairs of vertices called edges. An edge (i, j) is directed from i to j; i is called the source of the edge and j is called the target. A path in G is a sequence of vertices v0,...,vk such that, for every r in {1,...,k}, the edge (v_{r-1}, v_r) is in E. A path v0,...,vk is a cycle if, additionally, the edge (vk, v0) is in E. A path (or cycle) is simple if all of its vertices are unique. If there is a path from some vertex i to some vertex j, then we say that j is reachable from i. An example of a graph is shown in Figure 12.1.

Figure 12.1: A graph with twelve vertices. Vertices are drawn as numbered circles and edges are drawn as pointed curves pointing from source to target.

• Represent a graph by a matrix.

Where the adjacency matrix performs poorly is with the outEdges(i) and inEdges(i) operations. To implement these, we must scan all n entries in the corresponding row or column and gather up all the indices, j, where a[i][j], respectively a[j][i], is true. These operations clearly take O(n) time per operation.

Another drawback of the adjacency matrix representation is that it is large. It stores an n x n boolean matrix, so it requires at least n^2 bits of memory. The implementation here uses a matrix of boolean values, so it actually uses on the order of n^2 bytes of memory.

Despite its high memory requirements and the poor performance of the outEdges(i) and inEdges(i) operations, an AdjacencyMatrix can still be useful for some applications. In particular, when the graph G is dense, i.e., it has close to n^2 edges, then a memory usage of n^2 may be acceptable.

• Represent a graph in adjacency lists.

The space used by an AdjacencyLists is O(n + m).
• Understand the execution process of the depth-first-search and breadth-first-search algorithms for traversing a graph.
• Analyze the performance of the depth-first-search and breadth-first-search algorithms for traversing a graph.

When given as input a Graph, g, that is implemented using the AdjacencyLists data structure, the bfs(g, r) algorithm runs in O(n + m) time.

A particularly useful application of the breadth-first-search algorithm is, therefore, in computing shortest paths.

When given as input a Graph, g, that is implemented using the AdjacencyLists data structure, the dfs(g, r) and dfs2(g, r) algorithms each run in O(n + m) time.

• Implement those search algorithms for traversing a graph in pseudo-code or other programming languages, such as Java, C, or C++.
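For example, breadth-first-search over an adjacency-list graph can be sketched as follows (an illustrative version, not the textbook's bfs(g, r)):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Sketch: breadth-first search over an adjacency-list graph, recording the
// order in which vertices are first seen.
public class BfsDemo {
    static List<Integer> bfs(List<List<Integer>> adj, int r) {
        boolean[] seen = new boolean[adj.size()];
        Queue<Integer> q = new ArrayDeque<>();
        List<Integer> order = new ArrayList<>();
        seen[r] = true;
        q.add(r);
        while (!q.isEmpty()) {
            int i = q.remove();
            order.add(i);
            for (int j : adj.get(i)) { // each edge examined once: O(n + m) total
                if (!seen[j]) {
                    seen[j] = true;
                    q.add(j);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Edges: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
        List<List<Integer>> adj = new ArrayList<>();
        adj.add(Arrays.asList(1, 2));
        adj.add(Arrays.asList(3));
        adj.add(Arrays.asList(3));
        adj.add(new ArrayList<>());
        System.out.println(bfs(adj, 0)); // prints [0, 1, 2, 3]
    }
}
```

Each vertex enters the queue at most once and each edge is scanned once, which is where the O(n + m) bound above comes from.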

Unit 13: Binary Trie

• Define trie.

A BinaryTrie encodes a set of w-bit integers in a binary tree. All leaves in the tree have depth w, and each integer is encoded as a root-to-leaf path. The path for the integer x turns left at level i if the ith most significant bit of x is a 0 and turns right if it is a 1.

The add(x), remove(x), and find(x) methods each run in O(w) time.
• Examine a binary trie.
• Explain binary trie.


Each node, u, also contains an additional pointer, u.jump. If u's left child is missing, then u.jump points to the smallest leaf in u's subtree. If u's right child is missing, then u.jump points to the largest leaf in u's subtree.