lecture 10 : searching and hashingbu.edu.eg/portal/uploads/computers and informatics...7 searching...

27
Data Structures Lecture 10 : Searching and Hashing 0 Dr. Essam Halim Houssein Lecturer, Faculty of Computers and Informatics, Benha University http://bu.edu.eg/staff/esamhalim14

Upload: others

Post on 12-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

Data Structures

Lecture 10 :

Searching and Hashing

0

Dr. Essam Halim HousseinLecturer, Faculty of Computers and Informatics, Benha Universityhttp://bu.edu.eg/staff/esamhalim14

Page 2: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

2

Searching and Hashing

Searching is a process of checking and finding an element from a list of

elements. Let A be a collection of data elements, i.e., A is a linear array of

n elements. If we want to find an element “data” in A, then we have to

search for it. The search is successful if data does appear in A and

unsuccessful if otherwise. There are several types of searching

techniques; one has some advantage(s) over other. Following are the two

important searching techniques :

1. Linear or Sequential Searching

2. Binary Searching

Page 3: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

3

Searching and Hashing

7.1. LINEAR OR SEQUENTIAL SEARCHING

In linear search, each element of an array is read one by

one sequentially and it is compared with the desired element.

A search will be unsuccessful if all the elements are read and

the desired element is not found.

Page 4: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

4

Searching and Hashing

A be an array of n elements. data is the element to be searched. Then find

the location loc of data in A.

1. Input an array A of n elements, data and initialize loc = – 1.

2. Initialize i = 0; and repeat step 3 if (i < n) by incrementing i.

3. If (data == A[i])

(a) loc = i

(b) Display “data is found “ at Location loc.

4. Else

(a) Display “data is not found ”

5. Exit

Page 5: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

5

int linearSearch(int array[], int data, int size)

{

int i;

for (i=0; i<size; ++i)

{

if (array[i]== data) // compare data with each element

{

return i;

}

}

return -1; // not matching

}

Searching and Hashing

http://www.youtube.com/watch?v=FYX_yG-9CxE

Page 6: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

6

Searching and Hashing

7.2. BINARY SEARCH

Binary search is an extremely efficient algorithm when it is

compared to linear search. Binary search technique searches

“data” in minimum possible comparisons. Suppose the given

array is a sorted one, otherwise first we have to sort the array

elements.

Then apply the following conditions to search a “data”.

Page 7: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

7

Searching and Hashing

1. Find the middle element of the array (i.e., n/2 is the middle element).

2. Compare the data with the middle element to be searched, then

there are following three cases.

(a) If it is a desired element, then search is successful.

(b) If it is less than desired data, then search only the first half of the

array, i.e., the elements which come to the left side of the middle

element.

(c) If it is greater than the desired data, then search only the second

half of the array, i.e., the elements which come to the right side of

the middle element.

Repeat the same steps until an element is found or exhaust the

search area.

Page 8: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

8

Searching and Hashing

7.2.1. ALGORITHM FOR BINARY SEARCH

Let A be an array of n elements A[1],A[2],A[3],...... A[n]. “Data” is an element

to be searched. “mid” denotes the middle location of a segment of the

element of A. LB and UB is the lower and upper bound of the array which is

under consideration.

Page 9: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

9

Searching and Hashing

Suppose we have an array of 7 elements

LB = 0; UB = 6

mid = (0 + 6)/2 = 3

A[mid] = A[3] = 30

Page 10: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

10

Step 2:

Since (A[3] < data) - i.e., 30 < 45 - reinitialize the variable LB, UB and mid

LB = 4 UB = 6

mid = (4 + 6)/2 = 5

A[mid] = A[5] = 45

Searching and Hashing

LB UB

Page 11: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

11

Searching and Hashing

Step 3:

Since (A[5] < data) - i.e., 45 < 45 – no, Since A[5]== data , searching is

successful.

LB UB

Page 12: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

12

Searching and Hashing

BINARY SEARCH Algorithm

1. Input an array A of n elements and data to be searched.

2. low= 0, upper= n-1;

3. Repeat step 4 : 7 while (low <= upper)

4. mid = (low + upper)/2

5. if (data > A[mid])

(a) low = mid + 1

6. Else If (data < A[mid])

(a) upper = mid–1

7. Else If (A[mid]== data)

(a) Display “the data found at mid”

8. Display “the data is not found”

9. Exit

Page 13: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

13

int binarySearch( int arr [ ], int n, int data)

{

int mid, low = 0, upper = n-1;

while( low <= upper)

{

mid=( low + upper )/2;

if( data > arr [mid])

low = mid+1;

else if ( data < arr [mid])

upper = mid-1;

else if ( data = arr [mid]

return data; // data found

}

return 0; // data not found

}

Page 14: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

14

Searching and Hashing

Page 15: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

15

Searching and Hashing

7.3. HASHING

The searching time of the each searching technique, that were

discussed in the previous section, depends on the comparison. i.e.,

n comparisons required for an array A with n elements. To increase

the efficiency, i.e., to reduce the searching time, we need to

avoid unnecessary comparisons.

Page 16: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

16

Searching and Hashing

Hashing is a technique where we can compute the location of the

desired record in order to retrieve it in a single access. Let there is a

table of n employee records and each employee record is defined by a

unique employee code, which is a key to the record and employee name. If

the key (or employee code) is used as the array index, then the record can

be accessed by the key directly. If L is the memory location where each

record is related with the key. If we can locate the memory address of a

record from the key then the desired record can be retrieved in a

single access.

Page 17: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

17

Searching and Hashing

We assume that the keys k and the address L are integers. So the location

is selected by applying a function which is called hash function. Function

H may not yield different values (or index or many address); it is possible

that two different keys k1 and k2 will yield the same hash address. This

situation is called Hash Collision.

Page 18: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

18

Searching and Hashing

7.3.1. HASH FUNCTION

The basic idea of hash function is the transformation of the key into

the corresponding location in the hash table. A Hash function H can

be defined as a function that takes key as input and transforms it into a

hash table index. Following are the most popular hash functions :

1. Division method

2. Mid Square method

3. Folding method.

Page 19: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

19

Searching and Hashing

7.3.1.1. Division Method

TABLE is an array of database file where the employee details are stored. Choose

a number m, which is larger than the number of keys k. i.e., m is greater than the

total number of records in the TABLE. The number m is usually chosen to be

prime number to minimize the collision.

The hash function H is defined by H(k) = k (mod m) Where H(k) is the hash

address (or index of the array) and here k (mod m) means the remainder when k

is divided by m.

For example:

Let a company has 90 employees and 00, 01, 02, ...... 99 be the two digit 100

memory address (or index or hash address) to store the records. We have

employee code as the key.

Page 20: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

20

Searching and Hashing

Choose m in such a way that it is grater than 90. Suppose m = 93. Then

for the following employee code (or key k) :

H(k) = H(2103) = 2103 (mod 93) = 57

H(k) = H(6147) = 6147 (mod 93) = 9

H(k) = H(3750) = 3750 (mod 93) = 30

Then a typical employee hash table will look like as in Fig. 7.2.

Page 21: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

21

Searching and Hashing

Page 22: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

22

Searching and Hashing

So if you enter the employee code to the hash function, we

can directly retrieve TABLE[H(k)] details directly. Note that if

the memory address begins with 01-m instead of 00-m, then

we have to choose the hash function H(k) = k (mod m)+1.

The very simple hash table example

http://www.algolist.net/Data_structures/Hash_table/Simple_example

Page 23: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

23

Searching and Hashing

7.3.2. HASH COLLISION

It is possible that two non-identical keys K1, K2 are hashed into the same

hash address. This situation is calledHash Collision.

Page 24: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

24

Searching and Hashing

Let us consider a hash table having 7 locations as shown in Fig. 7.6.

Division method is used to hash the key.

H(k) = k (mod m)

Herem is chosen as 10. The Hash function produces any integer between

0 and 9, depending on the value of the key. If we want to insert a new

record with key 500 then

H(500) = 500(mod 10) = 0.

The location 0 in the table is already filled (i.e., not empty). Thuscollision occurred. Collisions are almost impossible to avoid but itcan be minimized considerably by introducing any one of the following::1. Open addressing2. Chaining3. Bucket addressing

Page 25: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

25

Searching and Hashing

2. Chaining

If we try to insert a new record with a key 500 then H(500) = 500 (mod 10) =0. Then

the collision occurs in normal way because there exists a record in the 0th

position. But in chaining corresponding linked list can be extended to accommodate

the new record with the key.

Page 26: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

26

Applications of hash tables

✎ Lots of recent research into using distributed hash tables in

peer-to-peer networks (searching, lookup)

✏ Symbol tables (compilers)

✏ Databases (of phone numbers, IP addresses, etc.)

✏ Dictionaries

http://www.sourcecodesworld.com/

Page 27: Lecture 10 : Searching and Hashingbu.edu.eg/portal/uploads/Computers and Informatics...7 Searching and Hashing 1. Find the middle element of the array (i.e., n/2 is the middle element)

Any Questions?

27

Searching and Hashing