
Data Structures

Board of Studies

Prof. H. N. Verma, Vice-Chancellor, Jaipur National University, Jaipur

Prof. M. K. Ghadoliya, Director, School of Distance Education and Learning, Jaipur National University, Jaipur

Dr. Rajendra Takale, Prof. and Head Academics, SBPIM, Pune

___________________________________________________________________________________________

Subject Expert Panel

Dr. Ramchandra G. Pawar, Director, SIBACA, Lonavala

Ashwini Pandit, Subject Matter Expert, Pune

___________________________________________________________________________________________

Content Review Panel

Gaurav Modi, Subject Matter Expert

Shubhada Pawar, Subject Matter Expert

___________________________________________________________________________________________

Copyright ©

This book contains the course content for Data Structures.

First Edition 2013

Printed by Universal Training Solutions Private Limited

Address: 05th Floor, I-Space, Bavdhan, Pune 411021.

All rights reserved. This book or any portion thereof may not be reproduced, distributed, transmitted, broadcast, or stored in a retrieval system, in any form or by any means, including electronic, mechanical, photocopying, or recording.

___________________________________________________________________________________________


Index

I. Content ... II

II. List of Figures ... VI

III. List of Tables ... VII

IV. Abbreviations ... VIII

V. Application ... 109

VI. Bibliography ... 130

VII. Self Assessment Answers ... 131

Book at a Glance


Contents

Chapter I ... 1
Introduction to Data Structure ... 1
Aim ... 1
Objectives ... 1
Learning outcome ... 1
1.1 Introduction ... 2
1.2 Types of Data Structure ... 2
1.3 Linear Data Structure ... 2
1.4 Types of Linear Data Structure ... 2
1.4.1 Array ... 2
1.4.2 Linear Link List ... 6
1.4.3 Stack ... 8
1.4.4 Queues ... 10
1.5 Non-Linear Data Structure ... 11
1.5.1 Tree ... 11
1.5.1.1 Definition of Tree ... 11
1.5.1.2 Types of Trees ... 11
1.5.1.3 Tree Terminology ... 12
1.5.1.4 Rooted Tree ... 13
1.5.1.5 Ordered Tree ... 13
1.5.1.6 Binary Trees ... 13
1.5.1.7 Example of Trees ... 14
1.5.1.8 Binary Search Tree ... 15
1.5.1.9 ++ Tree Traversal Types ... 16
1.5.1.10 Traversal Method ... 17
Summary ... 18
References ... 18
Recommended Reading ... 18
Self Assessment ... 18

Chapter II ... 21
Data Type and Basics of C Programming ... 21
Aim ... 21
Objectives ... 21
Learning outcome ... 21
2.1 Introduction ... 22
2.2 Introduction to ‘C’ ... 22
2.3 Identifier and Keywords ... 23
2.4 Data Types and Constants ... 23
2.5 Variables ... 25
2.6 Operators and Expressions ... 26
2.7 Preprocessor Directives ... 29
Summary ... 32
References ... 32
Recommended Reading ... 32
Self Assessment ... 33

Chapter III ... 35
Algorithm Analysis ... 35
Aim ... 35
Objectives ... 35
Learning outcome ... 35
3.1 Introduction ... 36
3.2 Comparing Algorithms ... 36
3.3 Measuring Algorithms ... 36
3.4 Timing Algorithms ... 36
3.5 Running Time ... 36
3.6 Counting ... 37
3.7 Cost Functions ... 37
3.8 Input Varies ... 37
3.9 Best, Average and Worst Case ... 37
3.10 Analysing Algorithms ... 38
3.11 Comparisons ... 38
3.12 Searching and Sorting ... 38
3.13 Searching ... 38
3.14 Main Type of Searching ... 38
3.15 Some Important Points ... 38
3.16 Types of Searching Algorithms ... 39
3.17 Sequential Search ... 39
3.18 Some Drawbacks ... 39
3.19 Algorithm Sequential Search ... 39
3.20 Linear Search Algorithm ... 39
3.21 Complexity of Linear Search Algorithm ... 40
3.22 Linear Search Code for Array ... 40
3.23 Binary Search ... 40
3.24 Sorting Technique ... 42
3.25 Bubble Sort ... 42
3.26 Insertion Sort ... 43
3.27 Merge Sort ... 45
3.28 Quick Sort ... 46
3.29 Quick Sort ... 47
3.30 Selection Sort ... 48
3.31 Radix Sort ... 49
Summary ... 50
References ... 50
Recommended Reading ... 50
Self Assessment ... 51

Chapter IV ... 53
Complexity of Algorithm ... 53
Aim ... 53
Objectives ... 53
Learning outcome ... 53
4.1 Introduction to Algorithm ... 54
4.2 Algorithm's Performance ... 54
4.3 Growth of Functions: Asymptotic Notation ... 54
4.4 Types of Asymptotic Notations ... 54
4.5 Θ-Notation (Same Order) ... 55
4.6 O-Notation (Upper Bound) ... 56
4.7 Ω-Notation (Lower Bound) ... 57
4.8 o-Notation ... 57
4.9 ω-Notation ... 57
4.10 Relations Between Θ, O and Ω ... 58
4.11 Running Times ... 58
4.12 Algorithm Analysis ... 58
4.13 Optimality ... 59
4.14 Reduction ... 59
4.15 Comparison of Functions ... 59
4.16 Properties ... 59
4.17 Common Functions ... 60
Summary ... 61
References ... 61
Recommended Reading ... 61
Self Assessment ... 62

Chapter V ... 64
Recursion ... 64
Aim ... 64
Objectives ... 64
Learning outcome ... 64
5.1 Introduction ... 65
5.2 A Simple Illustration of Recursion ... 65
5.3 A Pseudocode Fundraising Strategy ... 66
5.4 Recursion ... 66
5.5 The Recursion Step ... 67
5.6 Use of Recursion ... 67
5.7 Infinite Recursion ... 67
5.8 Recursion and Memory ... 68
5.9 Recursive Algorithms ... 68
5.10 Solving Factorials ... 68
5.11 Iterative Example ... 69
5.12 Recursion Example ... 69
5.13 Visual Example – Factorials ... 70
5.14 Recursion and Iteration ... 70
5.15 Another Example – The Fibonacci Series ... 71
5.16 The Fibonacci Function ... 71
5.17 Visual Example – Fibonacci Series ... 72
5.18 Fibonacci Series – Caution ... 72
5.19 Fibonacci Series – Another Solution ... 73
5.20 Classic Recursive Problems ... 73
5.21 The Towers of Hanoi ... 73
5.22 Indirect Recursion ... 75
5.23 Maze Traversal ... 75
Summary ... 76
References ... 76
Recommended Reading ... 76
Self Assessment ... 77

Chapter VI ... 79
Hash Table ... 79
Aim ... 79
Objectives ... 79
Learning outcome ... 79
6.1 Introduction to Hash Table ... 80
6.2 Types of Hash Table ... 80
6.3 Chained Hash Tables ... 80
6.3.1 Description of Chained Hash Tables ... 80
6.4 Open-addressed Hash Table ... 81
6.5 Selecting a Hash Function ... 81
6.6 Collision Resolution ... 81
6.7 Application of Hash Table ... 81
6.8 Collision Resolution ... 82
6.9 Selecting a Hash Function ... 83
6.10 Division Method ... 83
6.11 Multiplication Method ... 83
6.12 Interface for Chained Hash Tables ... 84
6.13 Implementation and Analysis of Chained Hash Tables ... 86
6.14 Chained Hash Table Example ... 91
6.15 Description of Open-Addressed Hash Tables ... 94
6.16 Collision Resolution ... 94
6.17 Linear Probing ... 95
6.18 Double Hashing ... 96
6.19 Interface for Open-Addressed Hash Tables ... 97
6.20 Implementation and Analysis of Open-Addressed Hash Tables ... 99
Summary ... 106
References ... 106
Recommended Reading ... 106
Self Assessment ... 107


List of Figures

Fig. 1.1 Classification of data structures ... 2
Fig. 1.2 Visual example of Array ... 3
Fig. 1.3 Inserting element ... 5
Fig. 1.4 Linear link list ... 7
Fig. 1.5 Stack ... 9
Fig. 1.6 Stack physical view ... 9
Fig. 1.7 Queue ... 10
Fig. 1.8 Queue physical view ... 10
Fig. 1.9 Tree ... 12
Fig. 1.10 Tree terminology ... 12
Fig. 1.11 Rooted tree ... 13
Fig. 1.12 Binary tree ... 14
Fig. 1.13 Example of trees ... 14
Fig. 1.14 Binary search tree’s example ... 15
Fig. 3.1 Trace binary search first step ... 42
Fig. 3.2 Trace binary search second step ... 42
Fig. 3.3 Trace binary search third step ... 42
Fig. 3.4 Bubble sorting ... 43
Fig. 3.5 Insertion sorting ... 44
Fig. 3.6 Merge sort procedure ... 45
Fig. 3.7 Merge sorting ... 46
Fig. 3.8 Quick sorting ... 47
Fig. 3.9 Selection sort ... 48
Fig. 3.10 Radix sorting ... 49
Fig. 4.1 Θ-Notation ... 55
Fig. 4.2 O-Notation ... 56
Fig. 4.3 Ω-Notation ... 57
Fig. 4.4 Relations between Θ, O and Ω ... 58
Fig. 5.1 Illustration of recursion ... 66
Fig. 5.2 Recursion and memory ... 68
Fig. 5.3 Visual example of factorials ... 70
Fig. 5.4 Fibonacci Series ... 72
Fig. 5.5 Tower of Hanoi ... 74
Fig. 5.6 Indirect recursion ... 75
Fig. 6.1 A chained hash table with five buckets containing a total of seven elements ... 81
Fig. 6.2 Linear probing with h(k, i) = (k mod 11 + i) mod 11 ... 96
Fig. 6.3 Hashing the same keys ... 97


List of Tables

Table 2.1 Keywords
Table 2.2 Character set
Table 2.3 Basic data types
Table 2.4 Data type – range of values
Table 2.5 Arithmetic operators
Table 2.6 Relational operators
Table 2.7 Logical operators
Table 2.8 Operator precedence
Table 2.9 Bitwise operators
Table 6.1 Expected probes as a result of load factor, assuming uniform hashing


Abbreviations

ASP - Active Server Pages
BASIC - Beginner's All-purpose Symbolic Instruction Code
BST - Binary Search Tree
CHT - Chained Hash Tables
COBOL - Common Business-Oriented Language
CPU - Central Processing Unit
DHTML - Dynamic HyperText Markup Language
FIFO - First In First Out
FORTRAN - Formula Translator
HTML - HyperText Markup Language
PERL - Practical Extraction and Report Language
PHP - Hypertext Preprocessor


Chapter I

Introduction to Data Structure

Aim

The aim of this chapter is to:

• introduce the concept of data structures

• discuss the types of data structures

• describe the concept of arrays

Objectives

The objectives of this chapter are to:

• illustrate the function of a queue

• explain the difference between linear and non-linear data structures

• give an overview of trees

Learning outcome

At the end of this chapter, you will be able to:

• explain the concepts of linear and non-linear data structures with Java code

• write code for various types of data structures

• get an overview of stacks


1.1 Introduction
A data structure can be defined as a collection of elements together with all the operations required on those elements. The operations include inserting, deleting, searching for and printing an element. A data structure is a way of representing the logical relationships between individual data elements.

1.2 Types of Data Structure
Data structures can be divided into two types:

• Linear data structures
• Non-linear data structures

These two types are further classified into:

Data structures
  Linear: Array, Linked list, Stack, Queue
  Non-linear: Tree, Graph, Table, Sets

Fig. 1.1 Classification of data structures

1.3 Linear Data Structure
A data structure is said to be linear if its elements form a sequence, or linear list.

Operations on linear structures
The operations on linear structures are as follows:

• Traversal - Travel through the data structure.
• Search - Traverse the data structure looking for a given element.
• Insertion - Add new elements to the data structure.
• Deletion - Remove an element from the data structure.
• Sorting - Arrange the elements in some type of order.
• Merging - Combine two similar data structures into one.

1.4 Types of Linear Data Structure
The various types of linear data structures are described below.

1.4.1 Array
An array is a consecutive group of memory locations that all have the same name and the same type.


[Figure: an array named Grades holding the values 651, 322, 763, 911, 550 and 826 in elements Grades[0] to Grades[5].]

Fig. 1.2 Visual example of Array

Array terminology and declaration
Array declaration:

type arrayName[] = new type[arraySize];

Example:
float Grades[] = new float[7];

• Grades is an array of type float with size 7.
• Grades[0], Grades[1], …, Grades[6] are the elements of Grades; each is of type float.
• 0, 1, 2, …, 6 are the indices of the array, also called subscripts. (Note that the indices start at 0 and NOT 1.)
• When declaring an array, we may also put the brackets before the variable name, i.e., float[] Grades = new float[7];

Initialising arrays
Arrays may be initialised in the following ways:

int n[] = { 2, 4, 6, 8, 10 };

• Creates and initialises a five-element array with the specified values.

int n[] = new int[10];

• Creates a 10-element array with every element initialised to zero.

Note:
• If the element type is a non-primitive type, the expression above would create a 10-element array of nulls.
• You cannot assign an initialiser list to an existing array variable like this:

List = { 1, 2, 3, 4, 5 }; // Wrong! Write List = new int[] { 1, 2, 3, 4, 5 }; instead.

• Array elements are indexed from zero to the size of the array minus one.
• Arrays can hold elements of any type.
• You can check the size of your array by reading its length field:

int anArray[] = new int[5];
System.out.println(anArray.length);

The above example prints 5 to the terminal.


Array parameters
Arrays can be passed into methods as parameters, like any other variable.

int results[] = new int[20];
printResults(results);

The method printResults could be declared with the header:

void printResults(int SomeArray[])

Note:
• Arrays in Java are objects, so a method receives a copy of the reference to the array, not a copy of the array itself. Any changes the method makes to the elements are therefore visible to the caller.
• We don't need to pass in the array size, since the method can get it from the array's length field.
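The short program below illustrates what these notes mean in practice. A method can change the elements of an array it receives, but reassigning its parameter does not affect the caller's variable, because only a copy of the reference was passed. The class and method names (ArrayParams, doubleAll, replace) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical example class showing what a method can and cannot change
// when it receives an array argument.
class ArrayParams {
    // The parameter holds a copy of the caller's reference, so writing
    // through it changes the caller's array.
    static void doubleAll(int[] values) {
        for (int i = 0; i < values.length; i++) {
            values[i] *= 2;
        }
    }

    // Reassigning the parameter only changes the local copy of the
    // reference; the caller's variable still refers to the original array.
    static void replace(int[] values) {
        values = new int[] {0, 0, 0};
    }

    public static void main(String[] args) {
        int[] results = {1, 2, 3};
        doubleAll(results);
        System.out.println(Arrays.toString(results)); // prints [2, 4, 6]
        replace(results);
        System.out.println(Arrays.toString(results)); // still prints [2, 4, 6]
    }
}
```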

Traversing linear arrays
• The usual way to traverse a linear 1-D array is to use a loop.
• For example, to get the overall average grade:

Pseudo-Code: Get_Average (Array [])
Begin
    sum = 0;
    for index = 0 to index = ArraySize - 1
        sum = sum + Array[index];
    average = sum / ArraySize;
    return average;
End

Example: sample Java code

/* Method which calculates and returns the overall average grade */
float Get_Average(float Grades[]) {
    float sum = 0; // initialise the sum variable
    // We use a for loop for traversal here because
    // it's the handiest loop for what we want done
    for (int index = 0; index < Grades.length; index++)
        sum += Grades[index];
    // Return the average
    return sum / Grades.length;
}

Inserting elements
Adding an element to an array/list at an arbitrary position without overwriting the previous values requires that you move all elements "below" that position down by one place.

[Figure: three views of a six-position list. Before: 3, 7, 9, 13, 22. Middle: 13 and 22 moved down one position. After: 12 inserted at position 4, giving 3, 7, 9, 12, 13, 22.]

Fig. 1.3 Inserting element

Algorithm: Insert (List [], position, element, ArraySize)

• Start at the top element of the array.
• Traverse the array backwards so as not to overwrite any previous data.
• Replace the current element with the element before it.
• Stop once we have reached the insertion position in the array.
• Insert the new data into that position.

Sample Pseudo-code

Pseudo-Code: Insert (List [], position, element, ArraySize)
Begin
    for index = ArraySize-1 down to index = position+1
        List[index] = List[index-1];
    List[position] = element;
End

This assumes that the last slot, List[ArraySize-1], is unused before the insertion; otherwise its value is lost.
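The pseudo-code above can be sketched in Java as follows, under the same assumption that the array has at least one unused slot at the end. This version also tracks how many slots are in use through a count parameter, which is an addition not in the original pseudo-code, and it uses 0-based positions, whereas Fig. 1.3 numbers positions from 1. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: inserts element at index position of list, where
// list[0..count-1] holds the current data and list.length > count.
// Returns the new number of elements in use.
class ArrayInsert {
    static int insert(int[] list, int count, int position, int element) {
        if (count >= list.length) {
            throw new IllegalStateException("array is full");
        }
        if (position < 0 || position > count) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        // Walk backwards from the first free slot so nothing is overwritten,
        // copying each element one place down.
        for (int index = count; index > position; index--) {
            list[index] = list[index - 1];
        }
        list[position] = element; // write the new data into the gap
        return count + 1;
    }

    public static void main(String[] args) {
        int[] list = {3, 7, 9, 13, 22, 0}; // six slots, five in use
        int count = insert(list, 5, 3, 12);
        System.out.println(count + " " + Arrays.toString(list)); // prints 6 [3, 7, 9, 12, 13, 22]
    }
}
```

The main method reproduces the example of Fig. 1.3: inserting 12 at 0-based index 3 (position 4 in the figure) moves 13 and 22 down one place.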

Multidimensional array
• Arrays can have more than one dimension.
• They are used to represent tables of data, etc.
• Declaration of a 2-D array: int grid[][] = new int[5][6];
• This declaration is interpreted as an array consisting of 5 rows and 6 columns.
• Another way of declaring a 2-D array is given below:

int grid[][] = new int[2][];
grid[0] = new int[2];
grid[1] = new int[2];

• This is the same as declaring a 2 x 2 array, except that here we create it one row at a time.


Example of a 3-D array:

int space[][][] = new int[100][100][100];

Initialising multidimensional arrays
Example:

int array1[][] = { {1, 2, 3}, {4, 5, 6} };
int array2[][] = new int[3][2];
int array3[][] = { {1, 2}, {4} };

If we were to print out the values of these arrays row by row, we would get:

Array 1      Array 2      Array 3
1 2 3        0 0          1 2
4 5 6        0 0          4
             0 0

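Note: Elements of an array created with new and no initialisers get default values: zero for numeric primitive types and null for non-primitive types, which is why every element of array2 is 0. When an initialiser list is used, each row has exactly as many elements as it has initialisers, so the second row of array3 holds only one element.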
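A minimal sketch that prints the three example arrays row by row is given below. It assumes a hypothetical helper, rows(), that formats each row on its own line; its output matches the table above and shows that the rows of a Java 2-D array are separate arrays that may differ in length.

```java
// Hypothetical demo class: prints 2-D arrays row by row.
class MultiDimDemo {
    // Returns the array's rows as text, one row per line, values separated by spaces.
    static String rows(int[][] array) {
        StringBuilder sb = new StringBuilder();
        for (int[] row : array) {
            for (int c = 0; c < row.length; c++) {
                if (c > 0) sb.append(' ');
                sb.append(row[c]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] array1 = { {1, 2, 3}, {4, 5, 6} };
        int[][] array2 = new int[3][2];
        int[][] array3 = { {1, 2}, {4} };
        System.out.print(rows(array1));
        System.out.print(rows(array2));
        System.out.print(rows(array3));
        System.out.println(array3[1].length); // prints 1: the second row holds one element
    }
}
```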

Multidimensional arrays as parameters
Example:

void warp(int space[][][])

This is the same as declaring a one-dimensional array as a parameter, except that we must remember to include an extra pair of square brackets for each additional dimension.

1.4.2 Linear Linked List
A linear list is a list in which each element has a unique successor.

Types of lists
• General: data can be inserted and deleted anywhere in the list. The data may be unordered (random) or ordered (arranged according to a key).
• Restricted: data can be inserted or deleted only at the ends of the list, as in a LIFO structure (stack) or a FIFO structure (queue).
• Four basic operations are associated with linear lists: insertion, deletion, retrieval and traversal.


[Figure: a START pointer leading to a chain of nodes, each made up of an information part and a next-pointer field.]

Fig. 1.4 Linear link list

Basic operations of linked lists
• A linked list is an ordered collection of data in which each element contains the location of the next element.
• Each element (item), called a node, contains two parts: data and link.
• The link contains a pointer that identifies the next element in the list; a separate head pointer identifies the first element.
• Nodes are called self-referential structures: each instance of the structure contains a pointer to another instance of the same structural type.
• Unlike in arrays, data can easily be inserted into and deleted from a linked list, but searching becomes sequential because the elements are no longer physically stored in sequence.
• The linked list abstract data type consists of the data structure and all the operations that manipulate the data.
• A head node contains metadata about the list, such as a count, a head pointer to the first node and a rear pointer to the last node.
• A data node contains the data, whose type depends entirely on the application, and a pointer to another data structure of its own type.
• The ten low-level algorithms are: create list, insert node, delete node, search list, retrieve node, empty list, full list, list count, traverse list and destroy list.
• The three high-level algorithms are: add node, remove node and print list.


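Linked list using Java

• Declaring a class for nodes, with its constructor:

public class IntNode {
    private int data;      // the element stored in this node
    private IntNode link;  // reference to the next node, or null at the end

    // Constructor for the IntNode
    public IntNode(int initialData, IntNode initialLink) {
        data = initialData;
        link = initialLink;
    }
}

• Declaring two node references:

IntNode head;
IntNode tail;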
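Building on the IntNode class, the sketch below shows one possible singly linked list with insertion at the front, search, deletion and traversal. It repeats IntNode so the example is self-contained. The accessor methods (getData, getLink, setLink) and the IntLinkedList class are assumptions added for this example; they are not part of the declarations above. The count field plays the role of the head-node metadata described earlier.

```java
// IntNode as declared above, plus hypothetical accessor methods so that
// another class can follow and change the links.
class IntNode {
    private int data;
    private IntNode link;

    public IntNode(int initialData, IntNode initialLink) {
        data = initialData;
        link = initialLink;
    }

    public int getData() { return data; }
    public IntNode getLink() { return link; }
    public void setLink(IntNode newLink) { link = newLink; }
}

// Hypothetical singly linked list built from IntNode objects.
class IntLinkedList {
    private IntNode head; // first node, or null when the list is empty
    private int count;    // number of nodes (head-node metadata)

    // Insertion at the front: the new node links to the old first node.
    void addFirst(int value) {
        head = new IntNode(value, head);
        count++;
    }

    // Search: follow the links from head until the value is found.
    boolean contains(int value) {
        for (IntNode cur = head; cur != null; cur = cur.getLink()) {
            if (cur.getData() == value) {
                return true;
            }
        }
        return false;
    }

    // Deletion: link the predecessor past the first node holding value.
    // Returns true if a node was removed.
    boolean remove(int value) {
        IntNode prev = null;
        for (IntNode cur = head; cur != null; prev = cur, cur = cur.getLink()) {
            if (cur.getData() == value) {
                if (prev == null) {
                    head = cur.getLink();          // removing the first node
                } else {
                    prev.setLink(cur.getLink());   // bypass cur
                }
                count--;
                return true;
            }
        }
        return false;
    }

    int size() { return count; }

    // Traversal: visit every node in order and build a printable form.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (IntNode cur = head; cur != null; cur = cur.getLink()) {
            sb.append(cur.getData());
            if (cur.getLink() != null) {
                sb.append(", ");
            }
        }
        return sb.append("]").toString();
    }
}
```

For example, calling addFirst(22), addFirst(13) and addFirst(9) gives [9, 13, 22]; remove(13) then re-links 9 directly to 22. Deletion changes only one link and moves no data, which is why linked lists make insertion and deletion cheap compared with arrays.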

Complex linked list structures
The various complex linked-list structures are as follows:

• Doubly-linked list - Each node has a pointer to both its successor and its predecessor.
• Multilinked list - The list has two or more logical key sequences, so the same data can be ordered in more than one way, for example chronologically.
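As an illustration of the first structure, here is a Java sketch of a doubly-linked list. The DNode and DoublyLinkedList names and the choice of operations are assumptions for this example. Because each node knows its predecessor, a node can be unlinked without first searching for the node before it, and the list can be walked in either direction.

```java
// Hypothetical node of a doubly-linked list: links to both neighbours.
class DNode {
    int data;
    DNode prev, next;
    DNode(int data) { this.data = data; }
}

// Hypothetical doubly-linked list with head and tail references.
class DoublyLinkedList {
    DNode head, tail; // first and last nodes, both null when empty

    // Append at the rear, setting the links in both directions.
    void addLast(int value) {
        DNode node = new DNode(value);
        if (tail == null) {
            head = tail = node;
        } else {
            node.prev = tail;
            tail.next = node;
            tail = node;
        }
    }

    // Unlink a node; its neighbours are reached directly through prev/next,
    // so no search for the predecessor is needed.
    void remove(DNode node) {
        if (node.prev != null) node.prev.next = node.next; else head = node.next;
        if (node.next != null) node.next.prev = node.prev; else tail = node.prev;
        node.prev = node.next = null;
    }

    // Walk backwards from the tail using the predecessor links.
    String backwards() {
        StringBuilder sb = new StringBuilder();
        for (DNode cur = tail; cur != null; cur = cur.prev) {
            sb.append(cur.data);
            if (cur.prev != null) sb.append(' ');
        }
        return sb.toString();
    }
}
```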

1.4.3 Stack

• A stack is a linear list in which all additions and deletions are restricted to one end, called the top. It is a Last In First Out (LIFO) data structure.
• Basic stack operations include:
  Push: adds an item at the top of the stack.
  Pop: removes the item at the top of the stack and returns it to the user.
  Stack top: reads the item at the top of the stack and returns it to the user without removing it.
• In the linked list implementation, the stack is usually represented by a pointer to a stack head structure stored in dynamic memory.


[Figure: items are pushed onto and popped off the top of the stack.]

Fig. 1.5 Stack

[Figure: a head structure pointing to a chain of data nodes.]

Fig. 1.6 Stack physical view

• The head node contains metadata about the stack, such as a count and a pointer to the top of the stack.
• The data node looks like a typical linked list data node.

Stack algorithms
• Create stack
• Push stack
• Pop stack
• Stack top
• Empty stack
• Full stack
• Stack count
• Destroy stack
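A minimal Java sketch of a linked-list stack covering most of the algorithms listed above is given below, assuming int data. The LinkedStack object stands in for the head node (a count plus a pointer to the top). "Full stack" is omitted because a linked stack is only full when memory runs out, and "Destroy stack" is unnecessary in Java, where unreachable nodes are reclaimed by the garbage collector. The class and method names are hypothetical.

```java
// Hypothetical linked-list stack of ints.
class LinkedStack {
    // Data node: a value plus a link to the node below it.
    private static class Node {
        int data;
        Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    private Node top;   // top of the stack, null when empty
    private int count;  // number of elements (head-node metadata)

    // Push: the new node becomes the top and links to the old top.
    void push(int value) {
        top = new Node(value, top);
        count++;
    }

    // Pop: remove the top node and return its data.
    int pop() {
        if (top == null) throw new IllegalStateException("stack is empty");
        int value = top.data;
        top = top.next;
        count--;
        return value;
    }

    // Stack top: read the top element without removing it.
    int stackTop() {
        if (top == null) throw new IllegalStateException("stack is empty");
        return top.data;
    }

    boolean isEmpty() { return top == null; }  // Empty stack
    int count() { return count; }              // Stack count
}
```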


Stack applications
Four common stack applications are:

• Reversing data - A given set of data is reordered so that the first and last elements are exchanged, and all of the positions between the first and last elements are also exchanged relative to each other.
• Parsing - Any logic that breaks data down into independent pieces for further processing.
• Postponement - The use of data is deferred until some later point.
• Backtracking - Making decisions between two or more paths.

Stacks are also useful for implementing recursive algorithms.
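Two of these applications are sketched below in Java, using the standard java.util.ArrayDeque class as the stack (its push and pop methods work on the front of the deque). The reverse and balanced method names are hypothetical. Reversing a string shows the LIFO order directly; checking that brackets are balanced is a small parsing task in which each opening bracket is postponed on the stack until its matching closing bracket arrives.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Two stack applications, using ArrayDeque as the stack.
class StackApps {
    // Reversing data: push every character, then pop them back in LIFO order.
    static String reverse(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) stack.push(c);
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) sb.append(stack.pop());
        return sb.toString();
    }

    // Parsing: check that (), [] and {} are properly nested.
    static boolean balanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);                      // postpone until its partner arrives
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty()) return false;  // closer with no opener
                char open = stack.pop();
                if ((c == ')' && open != '(') || (c == ']' && open != '[')
                        || (c == '}' && open != '{')) {
                    return false;                   // mismatched pair
                }
            }
        }
        return stack.isEmpty(); // leftover openers mean unbalanced input
    }
}
```

For example, reverse("stack") gives "kcats", balanced("a[b(c)]{}") is true and balanced("(]") is false.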

1.4.4 Queues
A queue is a linear list in which data can be inserted at one end (called the rear) and deleted from the other end (called the front). It is a First In First Out (FIFO) data structure.

[Figure: elements leave the queue at the front and join it at the rear.]

Fig. 1.7 Queue

Basic queue operations
The various queue operations are as follows:

• Enqueue - Inserts an element at the rear of the queue.
• Dequeue - Deletes an element from the front of the queue.
• Queue front - Examines the element at the front of the queue.
• Queue rear - Examines the element at the rear of the queue.

[Figure: a head structure whose front and rear pointers refer to the first and last of a chain of data nodes.]

Fig. 1.8 Queue physical view


Queue-linked list implementation
• A queue is usually implemented as a linked list in dynamic memory.
• The head node contains metadata about the queue, such as a count and two pointers to the front and rear of the queue.
• The data node looks like a typical linked list data node.

Queue algorithms
• Create queue
• Enqueue
• Dequeue
• Queue front
• Queue rear
• Empty queue
• Full queue
• Queue count
• Destroy queue
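A minimal Java sketch of the linked-list queue described above follows, assuming int data and hypothetical names. The LinkedQueue object plays the role of the head node, holding the count and the front and rear pointers. Keeping a rear pointer means enqueue does not have to walk the list to find the end.

```java
// Hypothetical linked-list queue of ints.
class LinkedQueue {
    // Data node: a value plus a link to the node behind it.
    private static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    private Node front, rear; // both null when the queue is empty
    private int count;        // number of elements

    // Enqueue: insert at the rear.
    void enqueue(int value) {
        Node node = new Node(value);
        if (rear == null) front = node; else rear.next = node;
        rear = node;
        count++;
    }

    // Dequeue: delete from the front and return the data.
    int dequeue() {
        if (front == null) throw new IllegalStateException("queue is empty");
        int value = front.data;
        front = front.next;
        if (front == null) rear = null; // the queue has become empty
        count--;
        return value;
    }

    int queueFront() {
        if (front == null) throw new IllegalStateException("queue is empty");
        return front.data;
    }

    int queueRear() {
        if (rear == null) throw new IllegalStateException("queue is empty");
        return rear.data;
    }

    boolean isEmpty() { return count == 0; }
    int count() { return count; }
}
```

Note the special case in dequeue: when the last node is removed, rear must also be reset to null; otherwise a later enqueue would link the new node to a node that is no longer in the queue.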

1.5 Non-Linear Data Structure
Non-linear data structures are data structures whose elements do not form a single sequence; for example, the data may be arranged in a hierarchical manner.

• There are four types of non-linear data structures: tree, graph, table and set.

1.5.1 Tree
Trees are discussed in detail below.

1.5.1.1 Definition of Tree

• A tree is a set of related, interconnected nodes in a hierarchical structure.
• It is a non-empty collection of vertices and edges that satisfies certain requirements.
• Its structure resembles the branches of a “tree”, hence the name.

1.5.1.2 Types of Trees
The different types of trees are as follows:

• Rooted tree
• Ordered tree
• M-ary tree and binary tree


Fig. 1.9 Tree

1.5.1.3 Tree Terminology

• A vertex (or node) is a simple object that can have a name and can carry other associated information.
• The first or top node in a tree is called the root node.
• An edge is a connection between two vertices.
• A path in a tree is a list of distinct vertices in which successive vertices are connected by edges in the tree. Example: a, b, d, i is a path.
• The defining property of a tree is that there is precisely one path connecting any two nodes.
• A disjoint set of trees is called a forest.
• Nodes with no children are called leaves or terminal nodes.

Fig. 1.10 Tree terminology

The terms used with trees include:
• Root - This is the unique node in the tree to which further subtrees are attached.
• Degree of the node - The total number of subtrees attached to a node is called the degree of the node. For node a, the degree is 2.
• Leaves - These are the terminal nodes of the tree. Nodes with degree 0 are always leaves. Here the leaves are e, f, g, h and i.
• Internal nodes - The nodes other than the root node and the leaves are called internal nodes. Here b, c and d are internal nodes.
• Parent node - A node that has further sub-branches is called the parent node of those sub-branches. In Fig. 1.10, node b is the parent node of d, e and f, and node c is the parent node of g and h; d, e and f are children of b, and g and h are children of c.
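The terms above can be checked with a small Java sketch. The tree it builds is reconstructed from the text (a is the root with children b and c; b has children d, e and f; c has children g and h; d has child i), since the figure itself is not reproduced here, so treat that shape as an assumption. The TreeNode class, which stores each node's children in a list, and the TerminologyDemo class are also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical general-tree node: each node keeps a list of its children.
class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, TreeNode... kids) {
        this.name = name;
        for (TreeNode k : kids) children.add(k);
    }

    int degree() { return children.size(); }       // number of subtrees attached

    boolean isLeaf() { return children.isEmpty(); } // degree 0

    // Collect the names of all leaves in this subtree, left to right.
    void leaves(List<String> out) {
        if (isLeaf()) out.add(name);
        for (TreeNode k : children) k.leaves(out);
    }

    // Collect nodes that are neither the root of the whole tree nor leaves.
    void internal(List<String> out, boolean isRoot) {
        if (!isRoot && !isLeaf()) out.add(name);
        for (TreeNode k : children) k.internal(out, false);
    }
}

class TerminologyDemo {
    // The tree described in the text (shape reconstructed, see above).
    static TreeNode sampleTree() {
        return new TreeNode("a",
                new TreeNode("b",
                        new TreeNode("d", new TreeNode("i")),
                        new TreeNode("e"),
                        new TreeNode("f")),
                new TreeNode("c", new TreeNode("g"), new TreeNode("h")));
    }

    public static void main(String[] args) {
        TreeNode root = sampleTree();
        List<String> leaves = new ArrayList<>();
        root.leaves(leaves);
        List<String> internal = new ArrayList<>();
        root.internal(internal, true);
        System.out.println(root.degree()); // prints 2
        System.out.println(leaves);        // prints [i, e, f, g, h]
        System.out.println(internal);      // prints [b, d, c]
    }
}
```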


1.5.1.4 Rooted Tree

• A rooted tree is one in which we designate one node as the root (i.e., the tree examples we have been looking at so far are all rooted trees).
• In computer science, the term tree is normally reserved for rooted trees. The more general structure is a free tree.
• In a rooted tree, any node is the root of a subtree that consists of that node and the nodes below it.
• There is exactly one path between the root and each of the other nodes in the tree.
• Each node except the root has exactly one node above it in the tree (i.e., its parent), and we extend the family analogy by talking of children, siblings or grandparents.

[Figure: a rooted tree with its root, two vertices and four leaves labelled.]

Fig. 1.11 Rooted tree

1.5.1.5 Ordered Tree

• An ordered tree is a rooted tree in which the order of the children at every node is specified.
• If each node must have a specific number of children appearing in a specific order, then we have an M-ary tree.
• The simplest type of M-ary tree is the binary tree.

1.5.1.6 Binary Trees
A binary tree is a tree in which each node has zero, one or two children, i.e., no parent can have more than two children. As with any abstract data type, a binary tree can be implemented in a number of ways: using arrays, strings, or structures and pointers.


Fig. 1.12 Binary tree

1.5.1.7 Example of Trees

Fig. 1.13 Example of trees

Representing a Binary Tree Data Structure in Java
Question: How do we represent a binary tree using Java?


Solution: One way of representing a node of a binary tree in Java is as follows:

class BTreeNode {
    int data = 0;
    BTreeNode left = null;
    BTreeNode right = null;
}

[Figure: a new node holding data 0, with null left and right references.]

We may then build our binary tree using this node structure

Constructing the Binary Tree
Let us trace the following code sequence, which builds a root holding 5 with a left child holding 2:

BTreeNode root = null, temp = null;
root = new BTreeNode();
root.data = 5;
temp = new BTreeNode();
temp.data = 2;
temp.left = null;
temp.right = null;
root.left = temp;
root.right = null;

1.5.1.8 Binary Search Tree
A binary tree which conforms to the following properties is called a binary search tree.

Properties of binary search tree
• Each value (key) in the tree exists at most once (i.e., no duplicates).
• The "greater-than" and "less-than" relations are well defined for the data values.
• Sorting constraints for every node n:
  All data in the left subtree of n is less than the data in n.
  All data in the right subtree of n is greater than the data in n.

Fig. 1.14 Binary search tree’s example
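One way to test the sorting constraint in code is sketched below. It is an illustration only: the BTreeNode constructor taking an int and the BSTCheck class are additions for this example. Checking each node only against its immediate children is not enough, so the sketch passes down lower and upper bounds that every key in a subtree must lie strictly between; the strict bounds also reject duplicate keys.

```java
// BTreeNode with a hypothetical convenience constructor.
class BTreeNode {
    int data = 0;
    BTreeNode left = null;
    BTreeNode right = null;

    BTreeNode(int data) { this.data = data; }
}

// Hypothetical checker for the binary search tree properties.
class BSTCheck {
    static boolean isBST(BTreeNode node) {
        // long bounds so that Integer.MIN_VALUE and MAX_VALUE keys are allowed
        return isBST(node, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    // Every key in this subtree must lie strictly between low and high.
    private static boolean isBST(BTreeNode node, long low, long high) {
        if (node == null) return true;                 // an empty subtree is valid
        if (node.data <= low || node.data >= high) return false;
        return isBST(node.left, low, node.data)        // left keys < node.data
            && isBST(node.right, node.data, high);     // right keys > node.data
    }
}
```

For instance, a root 5 with children 2 and 8 passes, but giving 2 a right child 7 fails: 7 is greater than its parent 2, yet it lies in the left subtree of 5.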


Question: How do we construct a BST in Java?
Answer: The most intuitive way of creating a BST, or adding nodes to one, is to use a recursive approach.

Question: What are the base and recursive cases?
Answer: Base case: if the tree is empty, create a new node for the item.
Recursive case: if the key is less than the root's value, add it to the left subtree; if it is greater, add it to the right subtree.

BST Add Node Method

Code for the add node method:

// Adds value to the subtree rooted at node and returns the root of that
// subtree, so the caller should write: root = Add_node(root, value);
BTreeNode Add_node(BTreeNode node, int value) {
    if (node == null) {                 // The base case: empty subtree
        node = new BTreeNode();         // Create the new node
        node.data = value;              // Give it the specified data
        node.left = null;               // Set up left and right pointers
        node.right = null;
        return node;                    // Return the new node to the caller
    }
    // The recursive cases
    if (value < node.data)              // Smaller keys belong in the left subtree
        node.left = Add_node(node.left, value);
    else if (value > node.data)         // Larger keys belong in the right subtree
        node.right = Add_node(node.right, value);
    // A key equal to node.data is already present, so nothing is added
    return node;
}
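The same recursive structure also works for searching. The sketch below pairs a compact version of the add method (written here as a hypothetical static method addNode that returns the root of the subtree) with a hypothetical iterative search method. Each comparison discards one subtree, so a search examines at most one node per level of the tree.

```java
// Node class as defined earlier in this section.
class BTreeNode {
    int data = 0;
    BTreeNode left = null;
    BTreeNode right = null;
}

class BSTDemo {
    // Recursive insertion: returns the root of the subtree; duplicates are ignored.
    static BTreeNode addNode(BTreeNode node, int value) {
        if (node == null) {                     // base case: empty subtree
            node = new BTreeNode();
            node.data = value;
            return node;
        }
        if (value < node.data) node.left = addNode(node.left, value);
        else if (value > node.data) node.right = addNode(node.right, value);
        return node;
    }

    // Search: compare with the current key and descend into one subtree only.
    static boolean search(BTreeNode node, int key) {
        while (node != null) {
            if (key == node.data) return true;
            node = (key < node.data) ? node.left : node.right;
        }
        return false;
    }

    public static void main(String[] args) {
        BTreeNode root = null;
        for (int v : new int[] {5, 2, 8, 1, 9}) {
            root = addNode(root, v); // keep the returned root
        }
        System.out.println(search(root, 8)); // prints true
        System.out.println(search(root, 7)); // prints false
    }
}
```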

1.5.1.9 Tree Traversal Types
We may traverse a binary tree in three ways:

• preorder
• postorder
• inorder

These are described below.

Preorder traversal
• Visit the node first and process it.
• Do a preorder traversal of the left subtree.
• Do a preorder traversal of the right subtree.
That is, preorder traversal visits and processes each node in a tree BEFORE visiting and processing its children.

Postorder traversal
• Do a postorder traversal of the left subtree.
• Do a postorder traversal of the right subtree.
• Visit the node last and process it.
That is, postorder traversal visits and processes each node in the tree AFTER visiting and processing its children.


Inorder traversal
• Do an inorder traversal of the left subtree.
• Visit the node and process it.
• Do an inorder traversal of the right subtree.
In a binary search tree, inorder traversal processes the nodes in ascending sorted order.

1.5.1.10 Traversal Method
The three methods of tree traversal are given below.

Inorder

void inorder(BTreeNode the_node) {
    if (the_node == null)
        return; // empty subtree: do nothing
    inorder(the_node.left);
    System.out.print(the_node.data + " ");
    inorder(the_node.right);
}

Preorder

void preorder(BTreeNode the_node) {
    if (the_node == null)
        return; // empty subtree: do nothing
    System.out.print(the_node.data + " ");
    preorder(the_node.left);
    preorder(the_node.right);
}

Postorder

void postorder(BTreeNode the_node) {
    if (the_node == null)
        return; // empty subtree: do nothing
    postorder(the_node.left);
    postorder(the_node.right);
    System.out.print(the_node.data + " ");
}
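To see the three orders side by side, the sketch below builds a small BST by inserting 5, 2, 8, 1 and 9 and collects each traversal into a list instead of printing it. The class name TraversalDemo, the BTreeNode constructor and the list-collecting variants of inorder, preorder and postorder are assumptions made for this example. The inorder result comes out in ascending order, as expected for a binary search tree.

```java
import java.util.ArrayList;
import java.util.List;

// Node class with a hypothetical convenience constructor.
class BTreeNode {
    int data;
    BTreeNode left, right;
    BTreeNode(int data) { this.data = data; }
}

class TraversalDemo {
    // Recursive BST insertion (duplicates ignored); returns the subtree root.
    static BTreeNode add(BTreeNode node, int value) {
        if (node == null) return new BTreeNode(value);
        if (value < node.data) node.left = add(node.left, value);
        else if (value > node.data) node.right = add(node.right, value);
        return node;
    }

    static void inorder(BTreeNode n, List<Integer> out) {   // left, node, right
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.data);
        inorder(n.right, out);
    }

    static void preorder(BTreeNode n, List<Integer> out) {  // node, left, right
        if (n == null) return;
        out.add(n.data);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    static void postorder(BTreeNode n, List<Integer> out) { // left, right, node
        if (n == null) return;
        postorder(n.left, out);
        postorder(n.right, out);
        out.add(n.data);
    }

    public static void main(String[] args) {
        BTreeNode root = null;
        for (int v : new int[] {5, 2, 8, 1, 9}) root = add(root, v);
        List<Integer> in = new ArrayList<>(), pre = new ArrayList<>(), post = new ArrayList<>();
        inorder(root, in);
        preorder(root, pre);
        postorder(root, post);
        System.out.println(in);   // prints [1, 2, 5, 8, 9]
        System.out.println(pre);  // prints [5, 2, 1, 8, 9]
        System.out.println(post); // prints [1, 2, 9, 8, 5]
    }
}
```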


Summary
• A data structure can be defined as a collection of elements together with all the operations required on those elements.
• The operations include inserting, deleting, searching for and printing an element. A data structure is a way of representing the logical relationships between individual data elements.
• Data structures can be divided into two types, namely linear and non-linear data structures.
• Traversal: travel through the data structure.
• Search: traverse the data structure looking for a given element.
• Insertion: add new elements to the data structure.
• Deletion: remove an element from the data structure.
• Sorting: arrange the elements in some type of order.
• Merging: combine two similar data structures into one.
• An array is a consecutive group of memory locations that all have the same name and the same type.
• The usual way to traverse a linear 1-D array is to use a loop, for example, to get the overall average grade.
• Adding an element to an array/list at an arbitrary position without overwriting the previous values requires that you move all elements "below" that position.
• A linear list is a list in which each element has a unique successor.
• Types of lists are general and restricted.
• Four basic operations associated with linear lists are insertion, deletion, retrieval and traversal.




Self Assessment

1. The ___________ can be defined as the collection of elements and all the possible operations which are required for that set of elements.
a. data structure
b. molecular structure
c. chemical structure
d. code structure

2. Which of the following statements is true?
a. The data structure has one type.
b. The data structure can be divided into three types.
c. The data structure can be divided into four types.
d. The data structure can be divided into two types.

3. A data structure is said to be linear if its elements form a sequence or a ___________.
a. line list
b. least list
c. linear list
d. non-linear list

4. Which of the following is the traversal statement?
a. Travel through the data structure
b. Travel through the data structure for a given element
c. Adding new elements to the data structure
d. Travel in the data structure

5. _____________ is removing an element from the data structure.
a. Sorting
b. Insertion
c. Deletion
d. Merging

6. ___________ is combining two similar data structures into one.
a. Sorting
b. Insertion
c. Deletion
d. Merging

7. _____________ is arranging the elements in some type of order.
a. Sorting
b. Insertion
c. Deletion
d. Merging


8. __________ is adding new elements to the data structure.
a. Sorting
b. Insertion
c. Deletion
d. Merging

9. __________ is traversal through the data structure for a given element.
a. Search
b. Insertion
c. Deletion
d. Merging

10. ______________ is a consecutive group of memory locations that all have the same name and of identical type.
a. Array
b. Pointer
c. Data
d. Element


Chapter II

Data Type and Basics of C Programming

Aim

The aim of this chapter is to:

• introduce programming languages

• describe the ways of developing algorithmic thinking

• create a foundation for you to develop moderate programming skills for industry

Objectives

The objectives of this chapter are to:

• define operations

• describe the advanced programming tools

• introduce the basics of the ‘C’ programming language

Learning outcome

At the end of this chapter, the student will be able to:

• develop their own logic in ‘C’ language programming

• distinguish between identifiers and keywords

• classify different data types, data type qualifiers and variables


2.1 Introduction
A computer is a system made up of two major components: hardware and software. The physical components, i.e., mouse, keyboard, monitor, CPU, etc., are forms of hardware. Software is the second component, which acts as an interface between the user and the hardware.

Without software, it is not possible to use hardware. Software is a set of programs (a program being a set of instructions given to the hardware). Software helps the user to run programs, and a computer system cannot be used without it. In other words, software can be called the bridge between the user and the computer hardware. Computer software is classified into five main categories based on the requirements of the user. These are discussed below.

Operating system
It is the platform of the computer and a necessary component of every computer system; it helps the user to boot the machine. Windows XP, Linux, Windows NT, Mac OS, OS/2, etc. are some examples of operating systems.

Application software
Accounting software like Tally, office automation software like Microsoft Office, and design tools like CorelDRAW, PageMaker or Photoshop are examples of application software, which is developed according to the user's needs.

Programming language
BASIC, C++, C#, Java, etc. are examples of computer programming languages. They are used to develop system and application software.

Advanced development tools
These are basically used in combination. The major categories of this software include:

Front end
Front-end tools design the screens and reports of the software. Visual Basic, Developer 2000 and Visual C++ are some popular front ends.

Back end
A DBMS or RDBMS is used to store the data entered by the user. Oracle, Sybase, Informix, etc. are some popular back ends.

Web based tools
HTML, DHTML, VBScript, JavaScript, ASP, PHP, Perl, etc. are some of the tools used to develop internet websites.

2.2 Introduction to ‘C’
• Information technology is the subject which deals with programming. Programming, or coding, is the process of writing, testing, debugging/troubleshooting and maintaining the source code of computer programs. This source code is written in a programming language.
• ‘C’ is a programming language used to write programs and execute them. It was developed at AT&T's Bell Laboratories in the USA in 1972 by Dennis Ritchie. It is popular because of its reliability, simplicity and ease of use. COBOL was the language used for commercial purposes, BASIC was used as a beginner's language and FORTRAN was used for engineering applications.

There are mainly two types of languages, known as low level and high level languages.

Low level languages
These languages interact directly with the machine hardware, for example, memory.


High level languages
High level languages are close to human language, so people can read and write them much as they understand English.

Example: ‘C’ language
• Code written in ‘C’ is portable, i.e., software written on one type of computer can be adapted to work on another type.
• The language has both low level and high level features: it can interact with the hardware, and it uses English-like words.
• As ‘C’ programs are written in blocks, it is also called a block structured language.
• It is a very powerful language; most of UNIX (a multi-user operating system) is written in ‘C’.
• ‘C’ has five built-in data types and permits almost all data type conversions.
• ‘C’ is widely used in electronic devices like cellular phones, laptops, microwaves, etc.
• It is widely used in 3D applications like video games, where a powerful graphical interface and fast speed are needed.
• It enables the user to write virus as well as antivirus programs.

2.3 Identifiers and Keywords
In ‘C’, every word is either a keyword or an identifier. Keywords are reserved words with a fixed meaning. Identifiers are the names given by the programmer to user-defined variables, arrays and functions. An identifier is a sequence of letters, digits and underscores; it must begin with a letter or an underscore, and both upper case and lower case letters are permitted (‘C’ treats them as different). The keyword set of ‘C’ is given below.

auto       double     int        struct
break      else       long       switch
case       enum       register   typedef
char       extern     return     union
const      float      short      unsigned
continue   for        signed     void
default    goto       sizeof     volatile
do         if         static     while

Table 2.1 Keywords

The following table shows the character set of ‘C’.

Letters       A B C D E F G H I J K L M N O P Q R S T U V W X Y Z, and the lower case letters a to z
Digits        0 1 2 3 4 5 6 7 8 9
Underscore    _
Punctuation   ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ { | } ~

Table 2.2 Character set

2.4 Data Types and Constants
• For any computer programming language, it is essential to have data types, which make it easier to maintain information within a computer program. Data types play a very important part, since the basic principle behind computer programming is to take information, process it and deliver it to the user in a different form.
• In most programming languages the programmer has to declare which data type is used for each data object, and most database systems require the user to declare the type of each data field. Data types differ from one programming language to another and from one database application to another.


• A constant is an entity whose value does not change while the program is executed; it refers to a fixed value. ‘C’ supports various types of constants.

Basic data type            Keyword
Integer                    int
Floating point             float
Double floating point      double
Character                  char
Void                       void

Table 2.3 Basic data types

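Data type    Range of values
int          -32768 to +32767
float        3.4e-38 to 3.4e+38
double       1.7e-308 to 1.7e+308
char         -128 to 127

Table 2.4 Data type – range of values

These are typical values for a 16-bit compiler. The float and double entries give the approximate range of magnitudes of non-zero values (negative values are allowed too). The exact ranges depend on the compiler and are defined in the standard headers <limits.h> and <float.h>.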

int
• used to define integer numbers; no decimal point is used
• takes 2 bytes (16 bits) of memory on a 16-bit machine
• an integer can be positive or negative

float
• used to define floating point numbers, i.e., values with a fractional part
• takes 4 bytes (32 bits) of memory on a 16-bit machine

double
• used to define large floating point numbers, with more precision than float
• takes 8 bytes (64 bits) of memory on a 16-bit machine

char
• used to store a single character (or a small integer) at a time
• takes 1 byte (8 bits) of memory

void
• means "no value": it specifies that a function returns no value or takes no parameters


Type qualifiers
Type qualifiers are used along with the basic data types when declaring variables, to change the range of values a variable can hold. (In standard ‘C’ terminology these are called type modifiers; the term type qualifier is reserved for const and volatile.)
• Short
• Long
• Unsigned
• Unsigned long

Short
• Short int is an integer of lower range.
• Short int may require less memory than an ordinary int or the same amount of memory, but it will never exceed an ordinary int in word length.
• If long int and int have the same memory requirements, short int will typically need half of that memory.

Long
• Long int is an integer of increased range.
• Long int may require more memory than an ordinary int or the same amount of memory, but it will never require less than an ordinary int.
• If short int and an ordinary int have the same memory requirements, long int will typically need double that memory.

Unsigned
• An unsigned int is an integer with no negative range; the space is used instead to increase the positive range.
• Unsigned int and int have the same memory requirements.
• The largest value an unsigned int can hold is about twice the largest value of an ordinary int (for example, 65535 instead of 32767 with 16 bits).

Unsigned long
• Like unsigned int, it has no negative range, but it is based on long int and so may have an increased range.

2.5 Variables
• A variable is an identifier which is used to represent a specific type of information within a designated portion of a program.
• It represents a single data item, i.e., a numerical quantity or a character constant.
• Unlike a constant, the value of a variable can change while the program is executed. A variable is a name for a memory location that stores a value; it stores only one value at a time.
• If the variable holds a real number, it can be stored using the float or double data type.

Variable declaration
The variables are declared along with their data types so that the compiler knows the type of each variable. The general syntax is:

    <data type> <variable name>;

Examples:

    int num;
    float percent;
    char grade;

You can also use extended data types, as shown below:

    long int amount;
    signed int temp;


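Variable initialisation
Values are assigned to variables through variable initialisation. A variable can be declared first and given a value later, or it can be initialised in the declaration itself. Example:

    int num;        // variable declaration
    num = 1;        // value assigned after the declaration
    int count = 1;  // declaration with initialisation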

Declaring variables as constants
The value of a constant variable, once assigned, cannot be changed anywhere in the program. Such variables are declared using the keyword ‘const’ and are initialised in the declaration. Example:

    const float pi = 3.14;

2.6 Operators and Expressions
• Individual constants, variables, array elements and function references can be joined together by various operators to form expressions.
• An operator is a symbol which enables the user to command the computer to perform certain manipulations and calculations.
• The data items that operators act upon are called operands.
• Some operators require two operands, while others act upon only one operand.
• Operators are used in ‘C’ programs to operate on data and variables. ‘C’ has a set of operators which can be classified as follows.

Arithmetic operators
• Basic operations such as addition, subtraction, multiplication and division are performed using arithmetic operators.
• % is called the modulus operator; it gives the remainder of an integer division (for example, 7 % 2 is 1). Both of its operands must be integers.
• The operands acted upon by the arithmetic operators must represent numeric values.
• Division of one integer quantity by another is called integer division; the fractional part of the result is discarded (7 / 2 is 3).
• A floating point quotient is the result if a division operation is carried out with two floating point numbers, or with one floating point number and one integer (7.0 / 2 is 3.5).

Sr. No.  Operator  Meaning
1        +         Addition or unary plus
2        -         Subtraction or unary minus
3        *         Multiplication
4        /         Division
5        %         Modulus

Table 2.5 Arithmetic operators

Examples of arithmetic operators are:

    x + y
    x - y
    -x + y
    a * b + c
    -a * b

where a, b, c, x and y are operands.


Relational operators
• Relational operators are required to compare the relationship between operands.
• They are popularly known as comparison operators. ‘C’ supports the following relational operators.

Sr. No.  Operator  Meaning
1        <         Less than
2        >         Greater than
3        <=        Less than or equal to
4        >=        Greater than or equal to
5        ==        Equal to
6        !=        Not equal to

Table 2.6 Relational operators

• The four operators <, >, <= and >= fall within the same precedence group, which is lower than that of the arithmetic and unary operators.
• The equality operators, equal to (==) and not equal to (!=), form a separate precedence group just beneath the other relational operators.

Logical operators
• Logical operators combine conditions, usually formed with the six relational operators, into expressions whose value is either true or false.
• A true condition is represented by the value 1 (when testing a condition, ‘C’ treats any non-zero value as true).
• A false condition is represented by the value 0.
• Logical operators are used to compare or evaluate logical and relational expressions. AND, OR and NOT are the logical operators.
• The result of an AND operation is true only if both operands are true.
• The result of an OR operation is true if either operand is true or both operands are true. Thus, the result of an OR operation is false only if both operands are false.
• The NOT operation reverses the value of its operand.
• The AND operation has higher precedence than the OR operation.

Sr. No.  Operator  Meaning
1        &&        Logical AND: true if all conditions are true
2        ||        Logical OR: true if any one or all conditions are true
3        !         Logical NOT: negation

Table 2.7 Logical operators

An expression is a combination of operators and operands. Consider c = a + b; here, c stores the sum of a and b.


Operator category                               Operators        Associativity
Arithmetic: multiply, divide and remainder      *, /, %          L → R
Arithmetic: add and subtract                    +, -             L → R
Relational operators                            <, <=, >, >=     L → R
Equality operators                              ==, !=           L → R
AND                                             &&               L → R
OR                                              ||               L → R

Table 2.8 Operator precedence

The categories are listed from highest to lowest precedence.

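Comma operator
• The comma operator is used to separate two or more expressions where only one expression is expected. The expressions are evaluated from left to right, and the value of the whole comma expression is the value of the rightmost expression. For example, after

    a = (b = 3, b = 2);

b is first set to 3 and then to 2, and a receives the value 2.
• The comma operator is also used in a ‘for’ statement, for example to initialise or update two loop variables at once.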

Conditional operators
• An expression that makes use of the conditional operator is called a conditional expression.
• The conditional operator is the combination ‘? :’. It is called the ternary operator because it takes three operands. Its general form is

    expression1 ? expression2 : expression3

• When a conditional expression is evaluated, expression1 is evaluated first.
• If expression1 is true, expression2 is evaluated and becomes the value of the conditional expression.
• If expression1 is false, expression3 is evaluated and becomes the value of the conditional expression.
• A conditional expression often appears on the right hand side of an assignment statement.
• The conditional operator has its own precedence, just above the assignment operators.
• It checks the condition and evaluates the result depending upon the status of the condition, i.e., true or false.

Example,

    max = a > b ? a : b;

Here, if a > b is true, the max variable stores the value of a; if a > b is false, max stores the value of b.

Thus, the conditional operator evaluates a condition as true or false and returns the corresponding result.

Bitwise operators
• A computer understands only machine code, which is in the form of 0s and 1s; ‘0’ and ‘1’ are called bits. Programs are written in a high level language, and a program called a compiler converts the high level language into low level machine code.
• Bitwise operators let a program work directly on this bit pattern: they act on the individual bits of the integer values stored by the computer. Bitwise operations are thus used to perform operations on bits.

Sr. No.  Operator  Meaning
1        &         Bitwise AND
2        |         Bitwise OR
3        ^         Bitwise exclusive OR
4        <<        Shift left
5        >>        Shift right
6        ~         One's complement (bitwise NOT)

Table 2.9 Bitwise operators


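Assignment operators
• In ‘C’ language, there are several different operators that are used to form assignment expressions, which assign the value of an expression to an identifier.
• The most commonly used assignment operator is '='. It is used to assign a value to a variable. The other assignment operators combine an arithmetic operation with assignment; for example, x += 5 is equivalent to x = x + 5.
• Assignment has lower precedence than the arithmetic operators, so the arithmetic on the right hand side is carried out before the assignment.

Example,

    b = 10;

Here, the value 10 is assigned to the variable b.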

Increment and decrement operators
Increment and decrement operators are shortcuts used to change the value of a variable by 1. The operator '++' is the increment operator, used to increase the value by 1, and the operator '--' is the decrement operator, used to reduce the value by 1. Example,

    ++x  is similar to  x = x + 1
    x++  is similar to  x = x + 1
    --y  is similar to  y = y - 1
    y--  is similar to  y = y - 1

If x is 5, then after ++x or x++, x will become 6.

These operators can be written before the variable name (prefix position) or after it (postfix position). The difference appears when the value of the expression is used: with ++x the variable is incremented first and the new value is used, while with x++ the old value is used and then the variable is incremented. For example, if x is 5, y = ++x sets both x and y to 6, but y = x++ sets y to 5 and x to 6.

2.7 Preprocessor Directives
• The ‘C’ preprocessor is exactly what its name implies: it is a program that processes our program before it is passed to the compiler. Preprocessor directives form what can almost be considered a language within the ‘C’ language.
• Preprocessor directives are lines included in the code of a program that are not program statements but directives for the preprocessor. These lines begin with a hash symbol (#). Since the preprocessor runs before the actual compilation of the code begins, it handles all these directives before any code is generated from the statements.
• A preprocessor directive extends only across a single line of code, and it is considered to end as soon as a newline character is found. A directive can be extended over more than one line by placing a backslash (\) just before the newline character at the end of the line.
• Two commonly used types of preprocessor directives are:

• Macro expansion
• File inclusion

Macro expansion
Consider the following program,


    #include <stdio.h>
    #define UPPER 25

    int main(void)
    {
        int i;

        for (i = 1; i <= UPPER; i++)
            printf("\n%d", i);
        return 0;
    }

In this program, instead of writing 25 in the ‘for’ loop we are writing it in the form of UPPER, which has already been defined before ‘main ( )’ through the statement,

#define UPPER 25

This statement is called ‘macro definition’ or more commonly, just a ‘macro’. During preprocessing, the preprocessor replaces every occurrence of UPPER in the program with 25. Example of macro definition,

    #include <stdio.h>
    #define PI 3.1415

    int main(void)
    {
        float r = 6.25;
        float area;

        area = PI * r * r;
        printf("\nArea of circle = %f", area);
        return 0;
    }

UPPER and PI in the above programs are often called ‘macro templates’, whereas, 25 and 3.1415 are their corresponding ‘macro expansions’.

A ‘#define’ directive is often used to define operators, as shown below.

    #include <stdio.h>
    #define AND &&
    #define OR ||

    int main(void)
    {
        int f = 1, x = 4, y = 90;

        if ((f < 5) AND (x <= 20 OR y <= 45))
            printf("\nYour PC will always work fine...");
        else
            printf("\nIn front of the maintenance man");
        return 0;
    }


A ‘#define’ directive could be used even to replace a condition.

    #include <stdio.h>
    #define AND &&
    #define ARRANGE (a > 25 AND a < 50)

    int main(void)
    {
        int a = 30;

        if (ARRANGE)
            printf("within range");
        else
            printf("out of range");
        return 0;
    }

A #define directive could be used to replace even an entire ‘C’ statement.

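    #include <stdio.h>
    #define FOUND printf("X virus");   /* the macro includes its own semicolon */

    int main(void)
    {
        char signature = 'Y';   /* value assumed here for illustration */

        if (signature == 'Y')
            FOUND
        else
            printf("Safe... as yet !");
        return 0;
    }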

File inclusion
File inclusion is used for the following purposes:
• If we have a very large program, the code is divided into several different files, each containing a set of related functions. It is a good programming practice to keep different sections of a large program separate. These files are ‘#included’ at the beginning of the main program file.
• There are some functions and some macro definitions that are needed in almost all programs that we write. These commonly used functions and macro definitions can be stored in a file, and the file can be included in every program we write. This adds all the statements in the file to our program, as if we had typed them in.

There are two ways to write an ‘#include’ statement:
• #include "filename": this command looks for the file in the current directory as well as in the specified list of directories mentioned in the include search path that might have been set up.
• #include <filename>: this command looks for the file in the specified list of directories only.


Summary
• An operating system is a platform for using the computer system. Windows XP, Linux, Mac OS, etc. are a few examples of operating systems. Java, ‘C’, ‘C++’, etc. are computer programming languages.
• Programming languages are divided into two main types: low level programming languages and high level programming languages.
• A low level programming language interacts with the machine hardware, while a high level programming language uses a human understandable language (like English).
• In ‘C’ language, every word is either a keyword or an identifier; the language also has its own character set.
• A constant is a value that does not change when the program is executed.
• ‘C’ has various basic data types: int, float, double, char and void.
• Short, long, unsigned and unsigned long are the data type qualifiers.
• Variables are declared along with their data types so that the compiler understands which type of variable each one is.
• An operator is a symbol that enables the user to command the computer to perform certain manipulations and calculations.
• Operators are classified as arithmetic operators, relational operators, logical operators, the comma operator, the conditional operator, bitwise operators, assignment operators, and increment and decrement operators.
• Preprocessor directives are lines in the code of a program which are not program statements but directives for the preprocessor.


Self Assessment

1. A low level language interacts with the machine __________.
a. software
b. hardware
c. both a and b
d. none of the above

2. In ‘C’ language, every __________ has a keyword or an identifier along with a character set.
a. sentence
b. line
c. word
d. all

3. Increment and decrement operators are shortcuts used to change values by 1. The operator __________ is used to increase the value by 1 and the operator __________ is the decrement operator, used to reduce the value by 1.
a. '++', '--'
b. '-', '++'
c. '+', '--'
d. '--', '++'

4. __________ is used to separate two or more expressions when only one expression is expected.
a. Assignment operator
b. Arithmetic operator
c. Comma operator
d. Conditional operator

5. Which of the following statements is true?
a. The comma operator is a combination of ‘? :’.
b. The conditional operator is a combination of ‘? :’.
c. The conditional operator is a combination of ‘# :’.
d. The conditional operator is a combination of ‘? .’.

6. Which of the following statements is true?
a. Relational operators are required to compare the relationship between operands.
b. Assignment operators are required to compare the relationship between operands.
c. Relational operators are not required to compare the relationship between operands.
d. Logical operators are required to compare the relationship between operands.

7. Which of the following statements is true?
a. The operator ≤ means greater than or equal to.
b. The operator ± means greater than or equal to.
c. The operator ≠ means greater than or equal to.
d. The operator ≥ means greater than or equal to.

8. Which symbol is generally used to begin the lines of preprocessor directives?
a. Percentage (%) sign
b. Hash (#) sign
c. Ampersand (&) sign
d. Underscore (_) sign

9. Which are the four data type qualifiers?
a. Short and long
b. Long and unsigned
c. Long, unsigned and unsigned
d. Short, long, unsigned and unsigned long

10. Which operators are required to compare the relationship between operands?
a. Logical operators
b. Relational operators
c. Conditional operators
d. Comma operators

Chapter III

Algorithm Analysis

Aim


• introduce the concept of algorithm
• explain the significance of algorithm
• elaborate the running time concept in algorithm
• evaluate the efficiency of algorithm

Objectives

• illustrate the importance of searching and sorting techniques
• explain different sorting techniques
• understand the significance of sorting techniques
• differentiate between sorting and searching techniques

Learning outcome

• evaluate the types of searching
• list the drawbacks of the searching technique
• illustrate how the drawbacks of searching can be overcome


3.1 Introduction
Algorithms can be described in terms of:
• time efficiency
• space efficiency

Choosing an appropriate algorithm can make a significant difference in the usability of a system, for example in:
• government and corporate databases with many millions of records, which are accessed frequently
• online search engines
• real time systems where near instantaneous response is required, from air traffic control systems to computer games

3.2 Comparing Algorithms
There are often many ways to solve a problem, and different algorithms can produce the same results; for example, there are numerous sorting algorithms. We are usually interested in how an algorithm performs when its input is large, because with today's hardware most algorithms perform well on small inputs.

3.3 Measuring Algorithms
It is possible to count the number of operations that an algorithm performs, either by a careful visual walkthrough of the algorithm or by inserting code into the algorithm to count and print the number of times that each line executes. It is also possible to time algorithms: compare the system time before and after running an algorithm.

In Java:

    System.currentTimeMillis()

More sophisticated timer classes also exist.

3.4 Timing Algorithms
• It may be useful to time how long an algorithm takes to run.
• In some cases it may be essential to know how long an algorithm takes on some system, e.g., air traffic control systems.
• Running time is affected by a number of factors other than algorithm efficiency.

3.5 Running Time
Running time is affected by the following things:
• CPU speed
• Amount of main memory
• Specialised hardware (e.g., graphics card)
• Operating system
• System configuration (e.g., virtual memory)
• Programming language
• Algorithm implementation
• Other programs
• System tasks (e.g., memory management)


3.6 Counting
• Instead of timing an algorithm, count the number of instructions that it performs.
• The number of instructions performed may vary based on the following things:
  - the size of the input
  - the organisation of the input
• The number of instructions can be written as a cost function of the input size.

    public void printArray(int arr[]) {
        for (int i = 0; i < arr.length; ++i)
            System.out.println(arr[i]);
    }

3.7 Cost Functions
• Instead of choosing a particular input size, we express a cost function for an input of size n.
• Assume that the running time, t, of an algorithm is proportional to the number of operations.
• Express t as a function of n, where t is the time required to process the data using some algorithm A.
• Denote this cost function as $t_A(n)$, i.e., the running time of algorithm A with input size n. For the printArray method below, the println statement executes once per element, so $t_A(n)$ is proportional to n.

    public void printArray(int arr[]) {
        for (int i = 0; i < arr.length; ++i)
            System.out.println(arr[i]);
    }

3.8 Input Varies
• The number of operations usually varies with the size of the input, though not always: looking up an array element by its index takes the same time whatever the size of the array.
• In addition, algorithm performance may vary based on the organisation of the input.
• For example, consider searching a large array. If the target is the first item in the array, the search will be very quick.

3.9 Best, Average and Worst Case
• Algorithm efficiency is often calculated for three broad cases of input:
  - Best case
  - Average (or “usual”) case
  - Worst case
• This analysis considers how performance varies for different inputs of the same size.


3.10 Analysing Algorithms
• It can be difficult to determine the exact number of operations performed by an algorithm.
• An alternative to counting all instructions is to focus on an algorithm's barometer instruction.
• The barometer instruction is the instruction that is executed the most number of times in an algorithm.
• The number of times that the barometer instruction is executed is usually proportional to the algorithm's running time.

3.11 Comparisons
Let's analyse and compare some different algorithms:
• Linear search
• Binary search
• Selection sort
• Insertion sort
• Quick sort

3.12 Searching and Sorting
• It is often useful to find out whether or not a list contains a particular item.
• Such a search can return either true or false, or the position of the item in the list.
• If the array isn't sorted, use linear search:
  - Start with the first item, and go through the array comparing each item to the target.
  - If the target item is found, return true (or the index of the target element).
• Searching and sorting are among the most common operations performed, for example in databases.
• It is much easier to search data that has been sorted, for example a telephone directory.

3.13 Searching
• Computer systems are often used to store large amounts of data from which individual records are retrieved according to some searching criteria.
• The process of finding the location of a specific data item or record with a given key value, or finding the locations of all records which satisfy one or more conditions in a list, is called “searching”.
• If the item exists in the given list, the search is said to be successful; if the element is not found in the given list, the search is said to be unsuccessful.

3.14 Main Types of Searching
Mentioned below are the main types of search:

Internal search
The search in which the whole list resides in the main memory is called internal search.

External search
The search in which most of the list resides in the secondary memory is called external search.

3.15 Some Important Points
• In data structures, the word searching usually refers to internal searching.
• The complexity of any searching method is determined from the number of comparisons performed among the elements of the list in order to find the element.
• The time required for a search operation depends on the complexity of the searching algorithm.
• Basically, three cases have to be considered when we search for a particular element in the list.


Best case
The best case is that in which the element is found during the first comparison.

Worst case
The worst case is that in which the element is found only at the end, i.e., in the last comparison, or is not present in the list at all.

Average case
The average case is that in which the element is found after more comparisons than in the best case, but fewer than in the worst case.

3.16 Types of Searching Algorithms
There are two types of searching algorithms, which are listed below:
• Sequential search
• Binary search

3.17 Sequential Search
• It is also called linear search.
• It is the simplest way of finding an element in a list.
• It searches for the element sequentially in a list, whether the list is sorted or unsorted.
• In a list sorted in ascending order, the search starts from the 1st element and continues until the desired element is found or an element whose value is greater than the value being searched for is reached.
• In a list sorted in descending order, the search starts from the 1st element and continues until the desired element is found or an element whose value is smaller than the value being searched for is reached.
• If the list is unsorted, the search starts from the 1st location and continues until the element is found or the end of the list is reached.

3.18 Some Drawbacks
• It is a slow process for large lists, since in the worst case every element has to be compared.
• It is therefore suitable mainly for small amounts of data.
• It is a time consuming method.

3.19 Algorithm Sequential Search
• This algorithm is used for linear searching.
• ITEM is to be searched for in a linear array DATA having N elements.
• LOC is a variable which stores the location of ITEM in case of a successful search; in case of an unsuccessful search, LOC will contain 0.

3.20 Linear Search Algorithm

Step 1: [Insert ITEM at the end of DATA] Set DATA[N+1] := ITEM.
Step 2: [Initialise counter] Set LOC := 1.
Step 3: [Search for ITEM]
        Repeat while DATA[LOC] != ITEM:
            Set LOC := LOC + 1.
        [End of loop]
Step 4: [Successful?] If LOC = N+1, then Set LOC := 0.
Step 5: Exit.


3.21 Complexity of Linear Search Algorithm
• The complexity of linear search is measured by the number of comparisons f(n) needed to find ITEM in DATA, where DATA contains n elements. Linear search is a step-by-step process in which ITEM is compared with each element of the array in turn.
• In linear search, two cases take the most time:
  - The required element occurs at the end of the array, so linear search has to examine every element.
  - The required element is not present in the given array; this is the worst case.
• In the worst case, the algorithm requires f(n) = n + 1 comparisons (n comparisons with the elements plus one with the sentinel ITEM stored in DATA[N+1]).
• If the element is at the first position in the array, only one comparison is needed.
• Suppose $p_k$ is the probability that ITEM appears in DATA[K], and suppose q is the probability that ITEM does not appear in DATA. Then $p_1 + p_2 + \cdots + p_n + q = 1$.
• Since the algorithm uses k comparisons when ITEM appears in DATA[K], the average number of comparisons is given by

  $$f(n) = 1 \cdot p_1 + 2 \cdot p_2 + \cdots + n \cdot p_n + (n+1) \cdot q$$

• In particular, suppose q is very small and ITEM appears with equal probability in each element of DATA. Then $q \approx 0$ and each $p_i = 1/n$. Accordingly,

  $$f(n) = 1 \cdot \frac{1}{n} + 2 \cdot \frac{1}{n} + \cdots + n \cdot \frac{1}{n} + (n+1) \cdot 0 = (1 + 2 + \cdots + n) \cdot \frac{1}{n} = \frac{n(n+1)}{2} \cdot \frac{1}{n} = \frac{n+1}{2}$$

• That is, in this special case, the average number of comparisons required to find the location of ITEM is approximately equal to half the number of elements in the array.

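3.22 Linear Search Code for Array
The algorithm translates to the following Java method. It does not use a sentinel, and because Java arrays are indexed from 0, it returns -1 (rather than 0) when the key is not found.

    public static int linearSearch(Object[] array, Object key) {
        for (int k = 0; k < array.length; k++)
            if (array[k].equals(key))
                return k;
        return -1;
    }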

3.23 Binary Search
A binary search algorithm is a technique for finding a particular value in a sorted list. It makes progressively better guesses and closes in on the sought value by selecting the middle element of the list, comparing its value to the target value (key) and determining whether the selected value is greater than, less than or equal to the target value. If the guess turns out to be too high, the element just before it becomes the new upper end of the search range; if it is too low, the element just after it becomes the new lower end. Pursuing this approach iteratively, the search range is halved each time until the target value is found or the range becomes empty.

The binary search consists of the following steps:
• Search a sorted array by repeatedly dividing the search interval in half.
• Begin with an interval covering the whole array.
• If the value of the search key is less than the item in the middle of the interval, narrow the interval to the lower half. Otherwise, narrow it to the upper half.
• Repeat the check until the value is found or the interval is empty.

The most straightforward implementation is recursive: it searches the sub-array selected by the comparison, as given in the algorithm below:


Algorithm

    BinarySearch (arr[1...n], value, low, high)
        IF (high < low)
            return -1                  // not found
        Endif
        mid = (low + high) / 2
        IF (arr[mid] > value)
            return BinarySearch (arr, value, low, mid-1)
        Else IF (arr[mid] < value)
            return BinarySearch (arr, value, mid+1, high)
        Else
            return mid                 // found
        Endif
    End

The algorithm is invoked with initial low and high values of 1 and n. We can eliminate the recursion above and convert this to an iterative implementation as given in algorithm below:

Algorithm

    BinarySearch (a[1...n], n, x)
        low = 1
        high = n
        while (low ≤ high)
            mid = (low + high) / 2
            IF (a[mid] > x)
                high = mid - 1
            Else IF (a[mid] < x)
                low = mid + 1
            Else
                return mid             // found
            Endif
        Endwhile
        return -1                      // not found

The figure below illustrates the trace of the algorithm to find the target value of 65.

Binary search is a logarithmic algorithm and runs in $O(\log_2 n)$ time. Specifically, at most $\lfloor \log_2 n \rfloor + 1$ iterations are needed to return an answer. In most cases, it is considerably faster than a linear search. It can be implemented using recursion or iteration, as shown above.


Index:  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]
Value:   10   15   40   50   55   65   75   90   95

First step: low = 1, high = 9, mid = 5. Since a[5] = 55 < 65, set low = 6.

Fig. 3.1 Trace binary search first step

Second step: low = 6, high = 9, mid = 7. Since a[7] = 75 > 65, set high = 6.

Fig. 3.2 Trace binary search second step

Third step: low = 6, high = 6, mid = 6. Since a[6] = 65, the target is found at position 6.

Fig. 3.3 Trace binary search third step

3.24 Sorting Technique
• Many sorting techniques exist: bubble sort, insertion sort, selection sort, merge sort, quick sort, shell sort, radix sort, etc. These techniques differ in their efficiency.
• Different sorting techniques take different amounts of time (and memory/disk space) to sort the same data.
• Some sorting algorithms are better (faster) than others for larger data sets.

3.25 Bubble Sort
• Method: compare pairs of adjacent items, and swap them if they are “out of order”.
• Elements “bubble” to their proper places.
• At the end of each pass, the largest remaining element is in its proper place.
• This is trivial to implement, but very inefficient.
• Each pass may place only a single value in its final position.
• For N values, we need (N-1) passes.

(Figure: the input 2, 1, 4, 2, 3 is processed in five steps that pick min = 1, 2, 2, 3 and 4 in turn, giving the output b[1] = 1, b[2] = 2, b[3] = 2, b[4] = 3, b[5] = 4. The steps use 4, 3, 2, 1 and 0 comparisons and 2, 2, 3, 3 and 1 I/O operations. Work = (4 + 3 + 2 + 1) comparisons + (2 + 2 + 3 + 3 + 1) I/O = 10 comparisons + 11 I/O operations.)

Fig. 3.4 Bubble sorting

Pseudocode

    Let N be the number of elements in the data set
    Repeat N-1 times
        For x = 0 to N-2
            If A[x] > A[x+1], swap them

3.26 Insertion Sort
• Method: select one element at a time and insert it into its proper sorted position.
• Begin by dividing the array into two regions: sorted and unsorted.
• For each pass, move the first unsorted item into its proper position in the sorted region.
• It is slightly more efficient than bubble sort, since it moves fewer elements per round.

Pseudocode
1. A[0] is sorted; A[1] to A[N-1] are unsorted.
2. Repeat the following steps N-1 times:
   a. nextItem = first unsorted element
   b. Shift sorted elements > nextItem over one position (A[x] = A[x-1])
   c. Insert nextItem into its correct position.

Insertion sort example
• Start: [ 29 ][ 10, 14, 37, 13 ]
• Move 10: [ 10, 29 ][ 14, 37, 13 ]
• Move 14: [ 10, 14, 29 ][ 37, 13 ]
• Move 37: [ 10, 14, 29, 37 ][ 13 ]
• Move 13: [ 10, 13, 14, 29, 37 ][ ]
• End: [ 10, 13, 14, 29, 37 ]

Schematic diagram

40 15 30 5 25 10 20 35

15 40 30 5 25 10 20 35

15 30 40 5 25 10 20 35

5 15 30 40 25 10 20 35

5 15 25 30 40 10 20 35

5 10 15 25 30 40 20 35

5 10 15 20 25 30 40 35

5 10 15 20 25 30 35 40

Fig. 3.5 Insertion sorting

Insertion sort code

public void insertionSort(int[] list) {
    for (int unsorted = 1; unsorted < list.length; unsorted++) {
        int nextItem = list[unsorted];
        int loc;
        // Shift larger sorted elements to the right
        for (loc = unsorted; (loc > 0) && (list[loc - 1] > nextItem); loc--) {
            list[loc] = list[loc - 1];
        }
        // Insert nextItem into sorted position
        list[loc] = nextItem;
    }
}


3.27 Merge SortMerge sort is an O(n log n) comparison-based sorting algorithm. In most implementations it is stable, meaning that it preserves the input order of equal elements in the sorted output. Sorting by merging is a recursive, divide-and-conquer strategy. In the base case, we have a sequence with exactly one element in it. Since such a sequence is already sorted, there is nothing to be done. The steps to sort a sequence of elements (n > 1) are as follows:

• Divide the sequence into two subsequences of length ⌈n/2⌉ and ⌊n/2⌋.
• Recursively sort each of the two subsequences.
• Merge the sorted subsequences to obtain the final result.

The figure below shows the idea of merge sort.

Fig. 3.6 Merge sort procedure

Algorithm: To sort the entire sequence A[1 .. n], make the initial call MERGE-SORT(A, 1, n).

MERGE-SORT(A, p, r)
IF p < r                           // Check for base case
    THEN q ← FLOOR[(p + r)/2]      // Divide step
         MERGE-SORT(A, p, q)       // Conquer step
         MERGE-SORT(A, q + 1, r)   // Conquer step
         MERGE(A, p, q, r)         // Combine step

The figure below shows the bottom-up view of the above procedure for n = 8.


[Figure: bottom-up view for n = 8; the unsorted sequence 40 15 30 5 25 10 20 35 is merged into the sorted runs 15 40, 5 30, 10 25 and 20 35, then into 5 15 30 40 and 10 20 25 35, and finally into the sorted sequence 5 10 15 20 25 30 35 40]

Fig. 3.7 Merge sorting

The pseudocode of the MERGE procedure is as follows:

MERGE(A, p, q, r)
n1 ← q − p + 1
n2 ← r − q
Create arrays L[1 .. n1 + 1] and R[1 .. n2 + 1]
FOR i ← 1 TO n1 DO L[i] ← A[p + i − 1]
FOR j ← 1 TO n2 DO R[j] ← A[q + j]
L[n1 + 1] ← ∞
R[n2 + 1] ← ∞
i ← 1
j ← 1
FOR k ← p TO r DO
    IF L[i] ≤ R[j]
        THEN A[k] ← L[i]
             i ← i + 1
        ELSE A[k] ← R[j]
             j ← j + 1
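A C++ sketch of MERGE-SORT and MERGE under a few assumptions of our own: 0-based inclusive bounds, a `std::vector<int>` as the array, and the names `mergeRuns` and `mergeSort`. Because `int` has no ∞, the sentinels are replaced by explicit checks for an exhausted run; taking from the left run on ties keeps the sort stable.

```cpp
#include <cstddef>
#include <vector>

// MERGE(A, p, q, r): merge the sorted runs a[p..q] and a[q+1..r] (inclusive bounds).
void mergeRuns(std::vector<int>& a, std::size_t p, std::size_t q, std::size_t r) {
    std::vector<int> left(a.begin() + p, a.begin() + q + 1);       // L = A[p..q]
    std::vector<int> right(a.begin() + q + 1, a.begin() + r + 1);  // R = A[q+1..r]
    std::size_t i = 0, j = 0;
    for (std::size_t k = p; k <= r; k++) {
        // The bounds checks play the role of the ∞ sentinels in the pseudocode.
        if (j >= right.size() || (i < left.size() && left[i] <= right[j])) {
            a[k] = left[i++];
        } else {
            a[k] = right[j++];
        }
    }
}

// MERGE-SORT(A, p, r): sort a[p..r] in place (inclusive bounds).
void mergeSort(std::vector<int>& a, std::size_t p, std::size_t r) {
    if (p < r) {                          // base case: a single element is already sorted
        std::size_t q = p + (r - p) / 2;  // divide: floor((p + r) / 2) without overflow
        mergeSort(a, p, q);               // conquer the left half
        mergeSort(a, q + 1, r);           // conquer the right half
        mergeRuns(a, p, q, r);            // combine the two sorted halves
    }
}

// Wrapper corresponding to the initial call MERGE-SORT(A, 1, n).
void mergeSort(std::vector<int>& a) {
    if (!a.empty()) mergeSort(a, 0, a.size() - 1);
}
```

The extra vectors make this version use O(n) auxiliary memory, which is the usual trade-off for merge sort's guaranteed O(n log n) time.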

3.28 Quick Sort
The basic version of the quick sort algorithm was invented by C. A. R. Hoare in 1960. It is based on the principle of divide-and-conquer. Quick sort is an algorithm of choice in many situations because it is not difficult to implement, it is a good "general purpose" sort, and it consumes relatively few resources during execution.

• It is in-place, since it uses only a small auxiliary stack.
• On average, it requires only about n log n time to sort n items (in the worst case, its running time grows as n²).
• It has an extremely short inner loop.
• This algorithm has been subjected to a thorough mathematical analysis, so very precise statements can be made about its performance.


Quick sort works by partitioning a given array A[p..r] into two non-empty subarrays A[p..q] and A[q+1..r] such that every key in A[p..q] is less than or equal to every key in A[q+1..r]. The two subarrays are then sorted by recursive calls to quick sort. The exact position of the partition depends on the given array, and the index q is computed as part of the partitioning procedure.

Fig. 3.8 Quick sorting

3.29 Quick Sort

QUICK-SORT(A, p, r)
IF p < r THEN
    q ← PARTITION(A, p, r)
    QUICK-SORT(A, p, q)        // recursive call on the left subarray
    QUICK-SORT(A, q + 1, r)    // recursive call on the right subarray

Note that to sort the entire array, the initial call QUICK-SORT(A, 1, length[A]) is made.

As a first step, quick sort chooses one of the items in the array to be sorted as the pivot. The array is then partitioned on either side of the pivot: elements that are less than or equal to the pivot move toward the left, and elements that are greater than or equal to the pivot move toward the right.

Partitioning the array
The partitioning procedure rearranges the subarray in place as given below:

PARTITION(A, p, r)
x ← A[p]
i ← p − 1
j ← r + 1
WHILE TRUE DO
    REPEAT j ← j − 1
    UNTIL A[j] ≤ x
    REPEAT i ← i + 1
    UNTIL A[i] ≥ x
    IF i < j
        THEN exchange A[i] ↔ A[j]
        ELSE RETURN j

PARTITION selects the first key, A[p], as the pivot key about which the array will be partitioned:
• Keys ≤ A[p] will be moved towards the left.
• Keys ≥ A[p] will be moved towards the right.
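A C++ sketch of QUICK-SORT with the PARTITION procedure above (Hoare's scheme, first key as pivot). The assumptions are ours: 0-based inclusive bounds, signed `long` indices (so that i can start at p − 1 when p = 0), and the names `hoarePartition` and `quickSort`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// PARTITION(A, p, r) with A[p] as the pivot key.
// Returns j such that every key in a[p..j] is <= every key in a[j+1..r].
long hoarePartition(std::vector<int>& a, long p, long r) {
    const int x = a[p];            // pivot key
    long i = p - 1;
    long j = r + 1;
    while (true) {
        do { j--; } while (a[j] > x);   // REPEAT j <- j-1 UNTIL A[j] <= x
        do { i++; } while (a[i] < x);   // REPEAT i <- i+1 UNTIL A[i] >= x
        if (i < j) {
            std::swap(a[i], a[j]);      // exchange A[i] <-> A[j]
        } else {
            return j;
        }
    }
}

// QUICK-SORT(A, p, r): sort a[p..r] in place (inclusive bounds).
void quickSort(std::vector<int>& a, long p, long r) {
    if (p < r) {
        long q = hoarePartition(a, p, r);
        quickSort(a, p, q);        // left subarray
        quickSort(a, q + 1, r);    // right subarray
    }
}

// Wrapper corresponding to the initial call QUICK-SORT(A, 1, length[A]).
void quickSort(std::vector<int>& a) {
    if (!a.empty()) quickSort(a, 0, static_cast<long>(a.size()) - 1);
}
```

Note that the left recursive call includes index q itself; with this partition scheme, q is not the pivot's final position, so it cannot be excluded as it would be in other partition variants.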

3.30 Selection Sort
This type of sorting is called "selection sort" because it works by repeatedly selecting the smallest remaining element. It works as follows: first find the smallest element in the array and exchange it with the element in the first position, then find the second smallest element and exchange it with the element in the second position, and continue in this way until the entire array is sorted. The algorithm and schematic diagram are given below.

Fig. 3.9 Selection sort

SELECTION-SORT(A)
FOR i ← 1 TO n − 1 DO
    min_j ← i
    min_x ← A[i]
    FOR j ← i + 1 TO n DO
        IF A[j] < min_x THEN
            min_j ← j
            min_x ← A[j]
    A[min_j] ← A[i]
    A[i] ← min_x

Selection sort is among the simplest of sorting techniques, and it works very well for small files. Furthermore, despite its evidently "naïve" approach, selection sort has an important application: because each item is actually moved at most once, selection sort is a method of choice for sorting files with very large objects (records) and small keys.
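A direct C++ transcription of SELECTION-SORT(A), shifted to 0-based indexing; the names `selectionSort`, `minJ` and `minX` are our own renderings of the pseudocode's min j and min x.

```cpp
#include <cstddef>
#include <vector>

// Selection sort following SELECTION-SORT(A), 0-indexed.
void selectionSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i + 1 < n; i++) {     // i = 1 .. n-1 in the pseudocode
        std::size_t minJ = i;                       // position of the smallest key seen so far
        int minX = a[i];                            // smallest key seen so far
        for (std::size_t j = i + 1; j < n; j++) {
            if (a[j] < minX) {
                minJ = j;
                minX = a[j];
            }
        }
        a[minJ] = a[i];   // move A[i] into the slot the minimum came from
        a[i] = minX;      // place the minimum at position i
    }
}
```

Each pass of the outer loop performs at most one exchange, so at most n − 1 exchanges occur in total, even though the number of comparisons is always about n²/2.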


3.31 Radix Sort
• Method: form groups based on the digits in the same place (e.g., all items with 3 in the tens place form one group), then combine those groups.
• This requires d iterations, where d is the number of digits in the largest element.
• Worst-case running time: O(dn).

Fig. 3.10 Radix sorting

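Pseudocode
for (J = d down to 1):
1. Initialise 10 groups to empty.
2. for (I = 0 through N-1):
   a. Let K be the Jth digit of A[I] (digits numbered 1 to d from the most significant).
   b. Place A[I] at the end of group K.
   c. Increment the Kth counter.
3. Replace A with group 0 + group 1 + ... + group 9.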
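A C++ sketch of this least-significant-digit radix sort. It assumes non-negative integers in base 10 (the pseudocode does not say how negative values are handled), and it stores each group as a `std::vector`, so the per-group counters of the pseudocode are implicit in each vector's size. The name `radixSort` is our own.

```cpp
#include <cstddef>
#include <vector>

// LSD radix sort for non-negative integers in base 10.
void radixSort(std::vector<int>& a) {
    int maxVal = 0;
    for (int v : a)
        if (v > maxVal) maxVal = v;

    int d = 1;                                    // d = number of digits in the largest element
    for (int m = maxVal; m >= 10; m /= 10) d++;

    long long place = 1;                          // 1 = units, 10 = tens, 100 = hundreds, ...
    for (int pass = 0; pass < d; pass++) {        // least significant digit first
        std::vector<std::vector<int>> groups(10); // initialise 10 groups to empty
        for (int v : a) {
            int k = static_cast<int>((v / place) % 10);  // K = digit of v in the current place
            groups[k].push_back(v);               // place v at the end of group K
        }
        std::size_t idx = 0;
        for (const std::vector<int>& g : groups)  // replace A with group 0 + group 1 + ... + group 9
            for (int v : g) a[idx++] = v;
        place *= 10;
    }
}
```

The method relies on each grouping pass being stable: items with equal digits in the current place keep the order produced by the previous pass, which is why appending to the end of each group matters.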


Summary
• Algorithms can be described in terms of time efficiency and space efficiency.
• Choosing an appropriate algorithm can make a significant difference in the usability of a system.
• It is possible to count the number of operations that an algorithm performs.
• This can be done by a careful visual walkthrough of the algorithm, or by inserting code in the algorithm to count and print the number of times that each line executes.
• In some cases, it may be essential to know how long an algorithm takes on some system, for example, in air traffic control systems.
• Running time is affected by a number of factors other than algorithm efficiency.
• The search in which the whole list resides in the main memory is called internal search.
• The search in which most of the list resides in the secondary memory is called external search.




Self Assessment

1. Which of the following describes algorithms?
a. Time efficiency
b. Space efficiency
c. Gap efficiency
d. Code efficiency

2. It is possible to count the number of operations that an ___________ performs.
a. algorithm
b. code
c. instruction
d. program

3. The number of operations can be counted by a careful visual walkthrough of the algorithm or by inserting code in the algorithm to count and print the number of times that each line _____________.
a. performs
b. executes
c. finishes
d. effects

4. Which of the following sentences is true?
a. It is possible to time algorithms.
b. It is impossible to time algorithms.
c. It is unlikely to time algorithms.
d. It is incorrect to time algorithms.

5. Which of the following sentences is true?
a. It may be destructive to time how long an algorithm takes to run.
b. It may be useless to time how long an algorithm takes to run.
c. It may be useful to time how long an algorithm takes to run.
d. It may be unavoidable to time how long an algorithm takes to run.

6. ___________ is affected by a number of factors other than algorithm efficiency.
a. Execution time
b. Program time
c. Efficiency time
d. Running time

7. Choose the correct terms that affect the running time.
a. CPU speed
b. Operating system
c. System tasks
d. None of the above

8. Instead of timing an algorithm, count the number of _________________ that it performs.
a. codes
b. programs
c. commands
d. instructions

9. Which of the following statements is true?
a. It can be difficult to determine the exact number of instructions performed by an algorithm.
b. It can be difficult to determine the exact number of operations performed by an algorithm.
c. It can be difficult to determine the exact number of acts performed by an algorithm.
d. It can be difficult to determine the exact number of courses performed by an algorithm.

10. The search in which the whole list resides in the main memory is called _______________.
a. internal search
b. interior search
c. inner search
d. inner search

Chapter IV

Complexity of Algorithm

Aim
• introduce the concept of algorithm
• explain the performance of the algorithm
• illustrate the growth of functions

Objectives
• introduce the big-oh notation
• state the concept of Ω-notation
• explain the theory of Θ-notation

Learning outcome
• evaluate various asymptotic notations
• understand the significance of algorithms
• discuss the relation between Θ, Ω and O


4.1 Introduction to Algorithm
• An algorithm is a set of rules for carrying out calculation either by hand or on a machine.
• An algorithm is a finite step-by-step procedure to achieve a required result.
• An algorithm is a sequence of computational steps that transform the input into the output.
• An algorithm is a sequence of operations performed on data that have to be organised in data structures.
• An algorithm is an abstraction of a program to be executed on a physical machine (model of computation).

4.2 Algorithm's Performance
• Two important ways to characterise the effectiveness of an algorithm are its space complexity and time complexity.
• Time complexity of an algorithm concerns determining an expression of the number of steps needed as a function of the problem size.
• Since the step count measure is somewhat coarse, one does not aim at obtaining an exact step count. Instead, one attempts only to get asymptotic bounds on the step count.
• Asymptotic analysis makes use of the O (Big Oh) notation.
• Two other notational constructs used by computer scientists in the analysis of algorithms are Θ (Big Theta) notation and Ω (Big Omega) notation.
• The performance evaluation of an algorithm is obtained by totalling the number of occurrences of each operation when running the algorithm.
• The performance of an algorithm is evaluated as a function of the input size n and is considered only up to a multiplicative constant.
• The following notations are commonly used in performance analysis to characterise the complexity of an algorithm.

4.3 Growth of Functions: Asymptotic Notation
• To characterise the time cost of algorithms, we focus on functions that map input size to (typically, worst-case) running time. (Similarly for space costs.)
• We are interested in precise notation for characterising running-time differences that are likely to be significant across different platforms and different implementations of the algorithms.
• This naturally leads to an interest in the "asymptotic growth" of functions.
• We focus on how the function behaves as its input grows large.
• Asymptotic notation is a standard means for describing families of functions that share similar asymptotic behaviour.
• Asymptotic notation allows us to ignore small input sizes, constant factors, lower-order terms in polynomials, and so forth.

4.4 Types of Asymptotic Notations
The following are the asymptotic notations used in the design and analysis of algorithms:

• Big-Oh notation (O)
• Big-Theta notation (Θ)
• Big-Omega notation (Ω)
• Small-oh notation (o)
• Small-omega notation (ω)


4.5 Θ-Notation (Same order)
• Θ-notation bounds a function to within constant factors.
• We say f(n) = Θ(g(n)) if there exist positive constants n0, c1 and c2 such that to the right of n0 the value of f(n) always lies between c1 g(n) and c2 g(n) inclusive.
• The set notation can be written as follows:

Θ(g(n)) = { f(n) : there exist positive constants c1, c2 and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0 }

We say that g(n) is an asymptotically tight bound for f(n).

Fig. 4.1 Θ-Notation

• Graphically, for all values of n to the right of n0, the value of f(n) lies at or above c1 g(n) and at or below c2 g(n). In other words, for all n ≥ n0, the function f(n) is equal to g(n) to within a constant factor.
• In set terminology, f(n) is said to be a member of the set Θ(g(n)) of functions. In other words, because Θ(g(n)) is a set, we can write:

f(n) ∈ Θ(g(n))

• This indicates that f(n) is a member of Θ(g(n)). The same fact is conventionally written as:

f(n) = Θ(g(n))

• By historical convention, this is written "f(n) = Θ(g(n))", although the idea that f(n) is equal to something called Θ(g(n)) is misleading.

Example: n²/2 − 2n = Θ(n²), with c1 = 1/4, c2 = 1/2, and n0 = 8.
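The constants in this example can be checked directly. The short derivation below divides the defining inequalities by n² (valid for n > 0) and shows why n0 = 8 is the first value that works.

```latex
\begin{aligned}
\tfrac{1}{4}n^2 \le \tfrac{1}{2}n^2 - 2n \le \tfrac{1}{2}n^2
  &\iff \tfrac{1}{4} \le \tfrac{1}{2} - \tfrac{2}{n} \le \tfrac{1}{2}
  && (\text{divide by } n^2 > 0)\\
\tfrac{1}{2} - \tfrac{2}{n} \le \tfrac{1}{2}
  &\ \text{holds for every } n \ge 1
  && (\text{since } 2/n > 0)\\
\tfrac{1}{4} \le \tfrac{1}{2} - \tfrac{2}{n}
  &\iff \tfrac{2}{n} \le \tfrac{1}{4} \iff n \ge 8
  && (\text{so } n_0 = 8)
\end{aligned}
```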


4.6 O-Notation (Upper Bound)
• O-notation gives an upper bound for a function to within a constant factor.
• We write f(n) = O(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n) always lies on or below c g(n).
• In set notation, for a given function g(n), we write the set of functions as:

Ο(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n ≥ n0 }

• We say that the function g(n) is an asymptotic upper bound for the function f(n).
• We use O-notation to give an upper bound on a function, to within a constant factor.

Fig. 4.2 O-Notation

• Graphically, for all values of n to the right of n0, the value of the function f(n) is on or below c g(n).
• We write f(n) = O(g(n)) to indicate that a function f(n) is a member of the set O(g(n)). That is:

f(n) ∈ Ο(g(n))

• Note that f(n) = Θ(g(n)) implies f(n) = O(g(n)), since Θ-notation is a stronger notion than O-notation.

Example: 2n² = O(n³), with c = 1 and n0 = 2.

• Equivalently, we may define "f is of order g" as follows:

If f(n) and g(n) are functions defined on the positive integers, then f(n) is O(g(n)) if and only if there are constants c > 0 and n0 > 0 such that:

| f(n) | ≤ c | g(n) | for all n ≥ n0

Historical note: the notation was introduced in 1894 by the German mathematician Paul Bachmann.


4.7 Ω-Notation (Lower Bound)
• Ω-notation gives a lower bound for a function to within a constant factor.
• We write f(n) = Ω(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n) always lies on or above c g(n).
• In set notation, for a given function g(n), we write the set of functions as:

Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ c g(n) ≤ f(n) for all n ≥ n0 }

• We say that the function g(n) is an asymptotic lower bound for the function f(n).

Fig. 4.3 Ω-Notation

The intuition behind Ω-notation is shown above.

Example: √n = Ω(lg n), with c = 1 and n0 = 16.

4.8 o-Notation
For a given function g(n), the set little-o of g(n) is:

o(g(n)) = { f(n) : ∀ c > 0, ∃ n0 > 0 such that ∀ n ≥ n0, we have 0 ≤ f(n) < c g(n) }

f(n) becomes insignificant relative to g(n) as n approaches infinity: lim (n→∞) f(n)/g(n) = 0.

g(n) is an upper bound for f(n) that is not asymptotically tight. Observe the difference from the previous definitions: here the inequality must hold for every constant c > 0, not just for some c.

4.9 ω-Notation
For a given function g(n), the set little-omega of g(n) is:

ω(g(n)) = { f(n) : ∀ c > 0, ∃ n0 > 0 such that ∀ n ≥ n0, we have 0 ≤ c g(n) < f(n) }

f(n) becomes arbitrarily large relative to g(n) as n approaches infinity: lim (n→∞) f(n)/g(n) = ∞.

g(n) is a lower bound for f(n) that is not asymptotically tight.


4.10 Relations Between Θ,O and Ω

Fig. 4.4 Relations between Θ, O and Ω

Theorem: For any two functions g(n) and f(n), f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)), i.e., Θ(g(n)) = O(g(n)) ∩ Ω(g(n)).

In practice, asymptotically tight bounds are obtained from asymptotic upper and lower bounds.

4.11 Running Times
• "Running time is O(f(n))" ⇒ worst case is O(f(n)).
• An O(f(n)) bound on the worst-case running time ⇒ an O(f(n)) bound on the running time of every input.
• A Θ(f(n)) bound on the worst-case running time does not imply a Θ(f(n)) bound on the running time of every input.
• "Running time is Ω(f(n))" ⇒ best case is Ω(f(n)).
• We can still say "Worst-case running time is Ω(f(n))". This means the worst-case running time is given by some unspecified function g(n) ∈ Ω(f(n)).

4.12 Algorithm Analysis
The complexity of an algorithm is a function g(n) that gives the upper bound of the number of operations (or running time) performed by an algorithm when the input size is n. There are two interpretations of upper bound.

Worst-case complexity
The running time for any input of a given size will be lower than the upper bound, except possibly for some inputs where the maximum is reached.

Average-case complexity
The running time for a given input size is the average number of operations over all problem instances of that size.

Because it is quite difficult to estimate the statistical behaviour of the input, most of the time we content ourselves with the worst-case behaviour. Most of the time, the complexity g(n) is approximated by its family O(f(n)), where f(n) is one of the following functions: n (linear complexity), log n (logarithmic complexity), n^a where a ≥ 2 (polynomial complexity), a^n where a > 1 (exponential complexity).
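To make the difference between a worst case and other cases concrete, here is a C++ sketch that counts the key comparisons made by insertion sort (section 3.26) on a sorted input and on a reverse-sorted input of the same size. The counting instrumentation and the name `insertionSortComparisons` are our own additions.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort on a copy of `a`, returning the number of key comparisons performed.
long long insertionSortComparisons(std::vector<int> a) {
    long long comparisons = 0;
    for (std::size_t unsorted = 1; unsorted < a.size(); unsorted++) {
        int nextItem = a[unsorted];
        std::size_t loc = unsorted;
        while (loc > 0) {
            comparisons++;                       // compare a[loc-1] with nextItem
            if (a[loc - 1] <= nextItem) break;   // found the insertion point
            a[loc] = a[loc - 1];                 // shift the larger element right
            loc--;
        }
        a[loc] = nextItem;
    }
    return comparisons;
}
```

For n = 1000, an already sorted input needs only n − 1 = 999 comparisons, while a reverse-sorted input needs n(n − 1)/2 = 499,500. The worst case grows as O(n²), and that bound holds for every input, even though many inputs finish much faster.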


4.13 Optimality
• Once the complexity of an algorithm has been estimated, the question arises whether this algorithm is optimal.
• An algorithm for a given problem is optimal if its complexity reaches the lower bound over all the algorithms solving this problem.
• For example, any algorithm solving "the intersection of n segments" problem will execute at least n² operations in the worst case, even if it does nothing but print the output, since there can be on the order of n² intersection points to report. This is abbreviated by saying that the problem has Ω(n²) complexity.
• If one finds an O(n²) algorithm that solves this problem, it will be optimal and of complexity Θ(n²).

4.14 Reduction
• Another technique for estimating the complexity of a problem is the transformation of problems, also called problem reduction.
• As an example, suppose we know a lower bound for a problem A, and that we would like to estimate a lower bound for a problem B.
• If we can transform A into B by a transformation step whose cost is less than that for solving A, then B has the same lower bound as A.
• The convex hull problem nicely illustrates the "reduction" technique.
• A lower bound for the convex hull problem is established by reducing the sorting problem (complexity: Θ(n log n)) to the convex hull problem.
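A C++ sketch of the sorting-to-convex-hull reduction, under assumptions of our own: the input numbers are distinct, and each number x is lifted to the point (x, x²) on a parabola. Every lifted point is then a hull vertex, and walking the hull counterclockwise from the leftmost point visits the points in increasing x. Gift wrapping (Jarvis march) is used here only because it does not sort internally, which would make the illustration circular. It runs in O(nh) time, so this particular sketch is O(n²). The lower-bound argument needs only the fact that the lifting and read-off steps cost O(n): a hull algorithm faster than n log n would then give a comparison-based sort faster than the Ω(n log n) sorting bound. All names (`Point`, `crossProduct`, `convexHull`, `sortViaHull`) are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Point {
    long long x, y;
};

// Cross product of (b - a) and (c - a): > 0 if c lies to the left of the directed line a -> b.
long long crossProduct(const Point& a, const Point& b, const Point& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Gift wrapping: hull vertices in counterclockwise order, starting from the leftmost point.
// Assumes no three input points are collinear (true for distinct points on a parabola).
std::vector<Point> convexHull(const std::vector<Point>& pts) {
    std::vector<Point> hull;
    if (pts.empty()) return hull;
    std::size_t start = 0;
    for (std::size_t i = 1; i < pts.size(); i++)
        if (pts[i].x < pts[start].x) start = i;
    std::size_t p = start;
    do {
        hull.push_back(pts[p]);
        std::size_t q = (p + 1) % pts.size();
        for (std::size_t r = 0; r < pts.size(); r++)
            if (crossProduct(pts[p], pts[q], pts[r]) < 0) q = r;  // r is right of p -> q: wrap tighter
        p = q;
    } while (p != start);
    return hull;
}

// The reduction: sort distinct numbers using only a convex hull computation.
std::vector<long long> sortViaHull(const std::vector<long long>& xs) {
    std::vector<Point> pts;
    for (long long x : xs) pts.push_back({x, x * x});               // O(n) transformation
    std::vector<long long> sorted;
    for (const Point& pt : convexHull(pts)) sorted.push_back(pt.x);  // O(n) read-off
    return sorted;
}
```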

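4.15 Comparison of Functions
Comparing two functions f and g asymptotically is analogous to comparing two real numbers a and b (f ↔ g ≈ a ↔ b):
f(n) = O(g(n)) ≈ a ≤ b
f(n) = Ω(g(n)) ≈ a ≥ b
f(n) = Θ(g(n)) ≈ a = b
f(n) = o(g(n)) ≈ a < b
f(n) = ω(g(n)) ≈ a > b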

4.16 PropertiesDiscussed below are the properties of asymptotic notation.

Transitivity
f(n) = Θ(g(n)) & g(n) = Θ(h(n)) ⇒ f(n) = Θ(h(n))
f(n) = O(g(n)) & g(n) = O(h(n)) ⇒ f(n) = O(h(n))
f(n) = Ω(g(n)) & g(n) = Ω(h(n)) ⇒ f(n) = Ω(h(n))
f(n) = o(g(n)) & g(n) = o(h(n)) ⇒ f(n) = o(h(n))
f(n) = ω(g(n)) & g(n) = ω(h(n)) ⇒ f(n) = ω(h(n))

Reflexivity
f(n) = Θ(f(n))
f(n) = O(f(n))
f(n) = Ω(f(n))

Symmetry
f(n) = Θ(g(n)) if and only if g(n) = Θ(f(n))


Complementarity
f(n) = O(g(n)) if and only if g(n) = Ω(f(n))
f(n) = o(g(n)) if and only if g(n) = ω(f(n))

4.17 Common Functions
Some properties of common functions that are useful with asymptotic notation (monotonicity, exponentials, and the relation between exponentials and polynomials) are described below.

• Monotonicity: f(n) is
  monotonically increasing if m ≤ n ⇒ f(m) ≤ f(n);
  monotonically decreasing if m ≥ n ⇒ f(m) ≥ f(n);
  strictly increasing if m < n ⇒ f(m) < f(n);
  strictly decreasing if m > n ⇒ f(m) > f(n).

• Exponentials: useful identities include
  $a^{-1} = 1/a$, $(a^m)^n = a^{mn}$, $a^m a^n = a^{m+n}$

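• Exponentials and polynomials: for all real constants a > 1 and b,
  $\lim_{n \to \infty} \frac{n^b}{a^n} = 0 \;\Rightarrow\; n^b = o(a^n)$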


Summary
• An algorithm is a set of rules for carrying out calculation either by hand or on a machine.
• An algorithm is a finite step-by-step procedure to achieve a required result.
• An algorithm is a sequence of computational steps that transform the input into the output.
• An algorithm is a sequence of operations performed on data that have to be organised in data structures.
• Two important ways to characterise the effectiveness of an algorithm are its space complexity and time complexity.
• Time complexity of an algorithm concerns determining an expression of the number of steps needed as a function of the problem size.
• Since the step count measure is somewhat coarse, one does not aim at obtaining an exact step count.
• Instead, one attempts only to get asymptotic bounds on the step count.
• Asymptotic analysis makes use of the O (Big Oh) notation.
• Two other notational constructs used by computer scientists in the analysis of algorithms are Θ (Big Theta) notation and Ω (Big Omega) notation.
• The performance evaluation of an algorithm is obtained by totalling the number of occurrences of each operation when running the algorithm.
• Θ-notation bounds a function to within constant factors.
• O-notation gives an upper bound for a function to within a constant factor.
• Ω-notation gives a lower bound for a function to within a constant factor.
• In practice, asymptotically tight bounds are obtained from asymptotic upper and lower bounds.
• "Running time is O(f(n))" ⇒ worst case is O(f(n)); an O(f(n)) bound on the worst-case running time ⇒ an O(f(n)) bound on the running time of every input.
• A Θ(f(n)) bound on the worst-case running time does not imply a Θ(f(n)) bound on the running time of every input.
• In worst-case complexity, the running time for any input of a given size will be lower than the upper bound, except possibly for some inputs where the maximum is reached.




Self Assessment

1. An algorithm is a set of rules for carrying out ___________ either by hand or on a machine.
a. calculation
b. sums
c. pictures
d. drawings

2. An algorithm is a finite ___________________ procedure to achieve a required result.
a. role-by-role
b. step-by-step
c. turn-by-turn
d. round-by-round

3. An algorithm is a _______________ of computational steps that transform the input into the output.
a. sequence
b. cycle
c. progression
d. string

4. An algorithm is an abstraction of a _________ to be executed on a physical machine.
a. process
b. code
c. program
d. instruction

5. Two important ways to characterise the effectiveness of an algorithm are its _______________ complexity and _________________ complexity.
a. freedom, time
b. space, moment
c. gap, instance
d. space, time

6. Asymptotic analysis makes use of the __________ notation.
a. O (Big Oh)
b. P (Big P)
c. Q (Big Q)
d. H (Big H)

7. _______ notation bounds a function to within constant factors.
a. Ω
b. α
c. ω
d. Θ

8. ___________ notation gives an upper bound for a function to within a constant factor.
a. O
b. α
c. ω
d. Θ

9. ______________ notation gives a lower bound for a function to within a constant factor.
a. O
b. α
c. Ω
d. Θ

10. The running time for any given size input will be lower than the upper bound except possibly for some values of the input where the maximum is reached in __________ complexity.
a. worst-case
b. average-case
c. slow-case
d. fast-case

Chapter V

Recursion

Aim
• explain the concept of recursion
• discuss infinite recursion function method
• compare recursion and memory

Objectives
• understand recursive algorithms
• illustrate the recursion process in coding language
• compare recursion and iteration
• discuss recursive problem

Learning outcome
• state the concept of tower of Hanoi
• explain indirect recursion
• elaborate maze traversal


5.1 Introduction
• One of the most important "great ideas" in computer science is the concept of recursion, which is the process of solving a problem by dividing it into smaller subproblems of the same form.
• The phrase "of the same form" is the essential characteristic of recursion; without it, all you have is a description of stepwise refinement.
• The fact that recursive decomposition generates subproblems that have the same form as the original problem means that recursive programs will use the same function or method to solve subproblems at different levels of the solution.
• In terms of the structure of the code, the defining characteristic of recursion is having functions that call themselves, directly or indirectly, as the decomposition process proceeds.
• A recursive function is one which calls itself. Multiple calls are active at the same time. Recursion can be used in place of a loop, and sometimes the recursive solution is shorter and more elegant.
• To get the code right, it helps to "think recursively".
• Recursion is a programming technique that allows the programmer to express operations in terms of themselves. In C++, this takes the form of a function that calls itself.
• A useful way to think of recursive functions is to imagine them as a process being performed where one of the instructions is to "repeat the process".
• This makes it sound very similar to a loop because it repeats the same code, and in some ways it is similar to looping.
• On the other hand, recursion makes it easier to express ideas in which the result of the recursive call is necessary to complete the task.
• Of course, it must be possible for the "process" to sometimes be completed without the recursive call.

5.2 A Simple Illustration of Recursion
• Suppose that you are the national fundraising director for a charitable organization and need to raise $1,000,000.
• One possible approach is to find a wealthy donor and ask for a single $1,000,000 contribution. The problem with that strategy is that individuals with the necessary combination of means and generosity are difficult to find. Donors are much more likely to make contributions in the $100 range.
• Another strategy would be to ask 10,000 friends for $100 each. Unfortunately, most of us don't have 10,000 friends.
• There are, however, more promising strategies. You could, for example, find ten regional coordinators and charge each one with raising $100,000. Those regional coordinators could in turn delegate the task to local coordinators, each with a goal of $10,000, continuing the process until a manageable contribution level is reached.
• The following diagram illustrates the recursive strategy for raising $1,000,000 described above.


Fig. 5.1 Illustration of recursion

5.3 A Pseudocode Fundraising Strategy
If we were to implement the fundraising strategy in the form of a C++ function, it would look something like this:

void CollectContributions(int n) {
    if (n <= 100) {
        Collect the money from a single donor.
    } else {
        Find 10 volunteers.
        Get each volunteer to collect n/10 dollars.
        Combine the money raised by the volunteers.
    }
}

What makes this strategy recursive is that the line "Get each volunteer to collect n/10 dollars." will be implemented using the following recursive call: CollectContributions(n / 10).
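A runnable C++ sketch of the pseudocode, with hypothetical details of our own: collecting from a single donor is modelled by returning n and incrementing a donor counter, and the money raised by the volunteers is combined by adding up the return values of the ten recursive calls.

```cpp
// Hypothetical runnable version of CollectContributions: returns the amount raised
// and counts, in `donors`, how many individual donors were asked.
long long collectContributions(long long n, long long& donors) {
    if (n <= 100) {                                     // base case: a single donor gives n dollars
        donors++;
        return n;
    }
    long long total = 0;
    for (int volunteer = 0; volunteer < 10; volunteer++) {  // find 10 volunteers
        total += collectContributions(n / 10, donors);      // each collects n/10 dollars
    }
    return total;                                       // combine the money raised
}
```

Starting from n = 1,000,000, the recursion goes four levels deep (100,000, then 10,000, then 1,000, then 100) and ends with 10,000 donors giving $100 each, which is exactly the "ask 10,000 people" strategy, organised through coordinators.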

5.4 Recursion
• The programs we have written so far have been organised into functions that call other functions in a disciplined, hierarchical way.
• A function is said to be recursive if it calls itself either directly or indirectly through another function.
• A recursive function knows when to stop calling itself once a base case is reached.


e.g.

// print numbers 1 to n backwards (i.e., n, n-1, ..., 1)
int print(int n) {
    if (n == 0)                  // this is the terminating base case
        return 0;
    else {
        System.out.print("" + n + " ");
        return print(n - 1);     // recursive call to itself again
    }
}
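The same example as a C++ sketch. Two choices here are ours rather than the original's: the function returns `void` (the original's return value of 0 is never used), and it writes to a caller-supplied `std::ostream`, so that the output can also be captured in a string. The names `printBackwards` and `backwardsAsString` are hypothetical.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Prints n, n-1, ..., 1 to `out`, each followed by a space.
void printBackwards(int n, std::ostream& out) {
    if (n == 0) {                 // terminating base case: nothing left to print
        return;
    }
    out << n << ' ';              // print the current number first...
    printBackwards(n - 1, out);   // ...then recurse on the smaller problem n-1
}

// Helper that captures the printed output in a string, which is handy for checking it.
std::string backwardsAsString(int n) {
    std::ostringstream buffer;
    printBackwards(n, buffer);
    return buffer.str();
}
```

For example, `printBackwards(4, std::cout);` writes `4 3 2 1 ` to standard output.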

5.5 The Recursion Step
• A recursive method tackles a problem by launching (calling) a copy of itself to work on a smaller problem. This is called the recursion step.
• The recursion step can result in many more such recursive calls, as the method keeps dividing each problem it is called with into two smaller problems.
• It is important to ensure that the recursion terminates.
• Each time, the function calls itself with a slightly simpler version of the original problem.
• This sequence of smaller problems must eventually converge on the base case.

5.6 Use of RecursionTo use recursion properly, we need to remember to provide two parts.

• One (or more) base cases that are not recursive, i.e., cases for which we can directly give a solution:

if (n == 0)
    return 0;

• One (or more) recursive cases that operate on smaller problems that get closer to the base case(s):

return print(n - 1);

Note:
• The base case(s) should always be checked before the recursive calls.
• Most of the time, a recursive function returns a value.

5.7 Infinite RecursionWe must make sure that recursion eventually stops, otherwise it will run forever.

e.g.

// example of a badly defined recursive function
int Bad_recursion(int n) {
    int x = Bad_recursion(n - 1);   // Bad!!! the recursive call happens before the base case is checked
    if (n == 1)
        return 1;
    else
        return n * x;
}
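A sketch of how Bad_recursion can be repaired: test the base case first and make the recursive call only when it is needed. The name `goodRecursion` is hypothetical, and we use `n <= 1` rather than `n == 1` as the stopping test so that the recursion also ends for n ≤ 0 (for which it returns 1).

```cpp
// Corrected version of Bad_recursion: the base case is checked *before* the recursive
// call, so the chain of calls stops once n reaches 1.
int goodRecursion(int n) {
    if (n <= 1) {
        return 1;                         // base case: no further calls
    }
    return n * goodRecursion(n - 1);      // recursive case on a smaller problem
}
```

With the base case checked first, goodRecursion(5) makes five calls in total and returns 5 × 4 × 3 × 2 × 1 = 120, whereas Bad_recursion(5) would keep calling itself with 4, 3, 2, 1, 0, −1, … until the stack overflows.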


5.8 Recursion and Memory
• Each recursive call makes a new copy of that method (actually only its variables) in memory.
• Once a method ends (i.e., returns some data), the copy of that returning method is removed from memory.
• In our previous example, if we called the print function with n = 4, our memory assignments would look like this:

Fig. 5.2 Recursion and memory

5.9 Recursive Algorithms
• A recursive algorithm is a solution that uses recursion to solve the problem.
• As mentioned previously, methods are said to be recursive if they call themselves either directly or indirectly.
• Many problems can be solved and best described through recursive algorithms (e.g. traversal of nodes in a binary tree).
• Some problems are best suited for recursive solutions while others are not.
• Recursion is a complex topic and recursive algorithms can get quite complex.
• We are going to look at some simple problems that can be solved recursively.

5.10 Solving Factorials
The factorial is briefly described below.

Factorial explanation: the product of the positive integers from 1 to n inclusive is called "n factorial", usually denoted by n!

n! = n*(n-1)*(n-2)*…*3*2*1

Mathematical explanation:

$$ n! = \begin{cases} 1 & \text{if } n = 0 \\ n \cdot (n-1)! & \text{if } n > 0 \end{cases} $$

Example: 6! = 6*5*4*3*2*1 = 720


5.11 Iterative Example We can write an iterative solution to the factorial problem:

Pseudo code

Begin
factorial(n)
    result = 1;                // init result, i.e. n = 0 or 1
    for i = 2 to n             // if n is > 1
        result = result * i;   // fact is n*(n-1)!
    endfor
    return result
endmethod
End

Actual code

int factorial(int n) {
    int result = 1;                  // init result, i.e. n = 0 or 1
    for (int i = 2; i <= n; i++)     // if n is > 1
        result = result * i;         // fact is n*(n-1)!
    return result;
}

5.12 Recursion Example We can obtain a recursive definition of the factorial method by observing the relationship from the previous mathematical explanation.

$$ n! = \begin{cases} 1 & \text{if } n = 0 \\ n \cdot (n-1)! & \text{if } n > 0 \end{cases} $$

From this, we can obtain both our base and recursive cases.

Base case
if (n <= 1)
    return 1;

Recursive case
return n * factorial(n - 1);


Factorial method using recursion

int factorial(int n) {
    if (n <= 1)                          // the base case
        return 1;
    else
        return n * factorial(n - 1);     // the recursive case
}
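The iterative (5.11) and recursive (5.12) factorial methods side by side as a C++ sketch. One deliberate change from the `int` versions in the text: the sketch uses `unsigned long long`, because a 32-bit `int` overflows after 12! (13! = 6,227,020,800 exceeds 2^31 − 1), while `unsigned long long` holds n! exactly up to n = 20. The function names are our own.

```cpp
// Iterative factorial (section 5.11).
unsigned long long factorialIterative(unsigned int n) {
    unsigned long long result = 1;         // covers n = 0 and n = 1
    for (unsigned int i = 2; i <= n; i++) {
        result *= i;                       // after this step, result = i!
    }
    return result;
}

// Recursive factorial (section 5.12).
unsigned long long factorialRecursive(unsigned int n) {
    if (n <= 1) {
        return 1;                          // base case
    }
    return n * factorialRecursive(n - 1);  // recursive case: n * (n-1)!
}
```

Both return 720 for n = 6. The recursive version uses one stack frame per call, so factorialRecursive(20) has 20 calls active at its deepest point; the iterative version runs in a single frame.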

5.13 Visual Example – Factorials

Fig. 5.3 Visual example of factorials

5.14 Recursion and Iteration
Which way is better, iteration or recursion? The answer is that it depends on what you are trying to do. Usually, a recursive approach more naturally mirrors the problem at hand, so it makes it simpler to tackle a problem that may not have an obvious answer. However, recursion carries an overhead: each recursive call needs space for a new stack frame. Making these calls costs processor time, and it consumes a lot of memory if the recursion is nested deeply. Iteration does not have this overhead, because it takes place within a single method call, so the cost of repeated method calls and extra memory is avoided.


Difference between Recursion and Iteration

| Recursion | Iteration |
|---|---|
| Terminates when a base case is reached. | Terminates when the loop condition becomes false. |
| Each recursive call requires extra space on the stack (a new stack frame, i.e., memory). | Each iteration requires no extra stack space, as it runs within the same method call. |
| Infinite recursion eventually runs out of memory, resulting in a stack overflow. | An infinite loop could potentially run forever, since no extra memory is being consumed. |
| Solutions to some problems are easier to formulate recursively. | Iterative solutions to a problem may not always be as obvious as a recursive solution. |

5.15 Another Example - The Fibonacci Series
The Fibonacci series:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …

begins with 0 and 1 and has the property that each subsequent Fibonacci number is the sum of the previous two Fibonacci numbers. A Fibonacci number, fib(n), can be expressed mathematically as:

$$\mathrm{fib}(n) = \begin{cases} n & \text{if } n = 0 \text{ or } n = 1 \\ \mathrm{fib}(n-1) + \mathrm{fib}(n-2) & \text{if } n > 1 \end{cases}$$

5.16 The Fibonacci Function
We can obtain our base and recursive cases by observing the mathematical relationship, as we did before with factorials.

Base cases:
    if ((n == 0) || (n == 1))
        return n;

Recursive case:
    return fib(n - 1) + fib(n - 2);

Now we can write the function that computes a Fibonacci number.


unsigned int fib(unsigned int n)
{
    if ((n == 0) || (n == 1))            // the base case
        return n;
    else
        return fib(n - 1) + fib(n - 2);  // the recursive case
}

Note: To produce the Fibonacci series, we would call this function repeatedly, once for each number required in the series.
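Following the note above, a series can be built by calling `fib` once per position. This C sketch repeats the recursive function from the text. It adds a hypothetical helper, `fib_series`, that fills an array with the first `count` Fibonacci numbers.

```c
/* Recursive Fibonacci function from the text. */
unsigned int fib(unsigned int n)
{
    if (n == 0 || n == 1)             /* base cases: fib(0) = 0, fib(1) = 1 */
        return n;
    return fib(n - 1) + fib(n - 2);   /* recursive case */
}

/* Fill series[0..count-1] with fib(0), fib(1), ..., fib(count-1) by
 * calling fib once for each position. */
void fib_series(unsigned int count, unsigned int series[])
{
    for (unsigned int i = 0; i < count; i++)
        series[i] = fib(i);
}
```

With `count` = 10, the array holds 0, 1, 1, 2, 3, 5, 8, 13, 21, 34. Each call recomputes all earlier values from scratch, so this approach slows down very quickly as `count` grows.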

5.17 Visual Example – Fibonacci Series

Fig. 5.4 Fibonacci Series

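5.18 Fibonacci Series – Caution
• A word of caution is in order about recursive programs like the one we use here to generate Fibonacci numbers.
• Each level of recursion in the fib function has a doubling effect on the number of calls, because every call with n > 1 makes two further calls.
• Calculating the 20th Fibonacci number would therefore require on the order of 2^20, or about a million, calls.
• Calculating the 30th Fibonacci number would require on the order of 2^30, or about a billion, calls.
• This is referred to as “exponential complexity”. The doubling estimate is an upper bound: the number of calls actually grows like 1.618^n, so fib(20) takes 21,891 calls and fib(30) takes 2,692,537 calls. The growth is still exponential.
• How would you write it iteratively? Use a bottom-up dynamic programming solution.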
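The sketch below makes the cost visible. `fib_naive` is the recursive function from the text with an added call counter. `fib_iter` is one possible bottom-up solution that keeps only the last two values. Both function names and the global counter `calls` are illustrative assumptions.

```c
/* Counts how many times fib_naive is invoked (illustrative global). */
static unsigned long calls = 0;

/* Naive recursive Fibonacci, instrumented to count its own calls. */
unsigned long fib_naive(unsigned int n)
{
    calls++;                                  /* one more invocation */
    if (n <= 1)
        return n;                             /* base cases */
    return fib_naive(n - 1) + fib_naive(n - 2);
}

/* Bottom-up (iterative) dynamic programming: keep only the last two
 * values, so the work is O(n) and the extra memory is O(1). */
unsigned long fib_iter(unsigned int n)
{
    unsigned long prev = 0, curr = 1;         /* fib(0), fib(1) */
    if (n == 0)
        return 0;
    for (unsigned int i = 2; i <= n; i++) {
        unsigned long next = prev + curr;     /* fib(i) */
        prev = curr;
        curr = next;
    }
    return curr;
}
```

Counting the calls for n = 20 gives 21,891. By comparison, `fib_iter(20)` needs only 19 loop iterations.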


5.19 Fibonacci Series – Another Solution
One way to improve our recursive approach is to use an array in a top-down approach to remember intermediate values that have already been calculated.

knownFib[MAXSIZE] = unknown         // initialise all slots to indicate "unknown"

fib(x)
    if knownFib[x] <> unknown       // if fib number x is already known
        return knownFib[x]          // return the known result
    if x = 0                        // fib number 0 is 0
        t = 0
    if x = 1                        // fib number 1 is 1
        t = 1
    if x > 1                        // fib number x is greater than 1
        t = fib(x-1) + fib(x-2)     // recursively calculate this number
    knownFib[x] = t                 // fib number x is now known, so store it
    return knownFib[x]              // return fib number x

Note: The index into the knownFib array corresponds to the Fibonacci number x, so x can be used directly as an index into knownFib.
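A C version of this memoised pseudocode might look like the following sketch, which rests on three assumptions. It uses a separate `known` flag array instead of a special "unknown" value. `MAXSIZE` is set to 94 because fib(93) is the largest Fibonacci number that fits in an unsigned 64-bit integer. `fib_memo` is a hypothetical name.

```c
#include <stdbool.h>

#define MAXSIZE 94   /* fib(93) is the largest value that fits in 64 bits */

static unsigned long long knownFib[MAXSIZE];
static bool known[MAXSIZE];    /* known[x] becomes true once fib(x) is stored */

/* Top-down memoised Fibonacci: each fib(x) is computed at most once. */
unsigned long long fib_memo(unsigned int x)
{
    unsigned long long t;
    if (x >= MAXSIZE)
        return 0;              /* out of range for this sketch (sentinel) */
    if (known[x])
        return knownFib[x];    /* already known: reuse the stored result */
    if (x < 2)
        t = x;                 /* base cases: fib(0) = 0, fib(1) = 1 */
    else
        t = fib_memo(x - 1) + fib_memo(x - 2);
    knownFib[x] = t;           /* fib number x is now known */
    known[x] = true;
    return t;
}
```

Each fib(x) is computed at most once. The first call to fib(n) therefore makes O(n) calls instead of an exponential number.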

5.20 Classic Recursive Problems
Here are some classic problems that are solved using recursion:
• Queens Problem
• Knights Problem
• Towers of Hanoi

Some more complex sorting algorithms are most naturally expressed with a recursive approach. Two of these, which we will look at later on, are:
• Merge Sort
• Quick Sort

5.21 The Towers of Hanoi
In the Towers of Hanoi problem, there are three pegs (posts) and n disks of different sizes. Each disk has a hole in the middle so that it can fit on any peg. At the beginning of the game, all n disks are on the first peg, arranged so that the largest is at the bottom and the smallest is on the top (so the first peg looks like a tower). The goal of the game is to end up with all disks on the third peg in the same order, that is, smallest on top and increasing in size towards the bottom. However, there are some restrictions on how the disks are moved (which make the problem non-trivial):

• The only allowed type of move is to take one disk from the top of one peg and drop it on another peg; you cannot move several disks at one time.
• A larger disk can never lie above a smaller disk, on any peg.


Fig. 5.5 Tower of Hanoi

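#include <stdio.h>

// Move the top disk from peg 'start' to peg 'finish'.
void MoveSingleDisk(char start, char finish)
{
    printf("Move disk from %c to %c\n", start, finish);
}

// Move n disks from 'start' to 'finish', using 'temp' as the spare peg.
void MoveTower(int n, char start, char finish, char temp)
{
    if (n == 1)                                 // base case: one disk
        MoveSingleDisk(start, finish);
    else
    {
        MoveTower(n - 1, start, temp, finish);  // clear the way
        MoveSingleDisk(start, finish);          // move the largest disk
        MoveTower(n - 1, temp, finish, start);  // restack on top of it
    }
}

int main()
{
    MoveTower(3, 'A', 'C', 'B');
    return 0;
}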
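To check the recursive structure without reading printed output, the variant below records each move in an array instead of printing it. The names `move_tower`, `move_single_disk` and `move_count` are hypothetical. Moving n disks takes two moves of n − 1 disks plus one move of the largest disk, so the total number of moves is 2^n − 1.

```c
/* Number of moves recorded so far (illustrative global counter). */
static int move_count = 0;

/* Record a single move from peg 'from' to peg 'to'. */
static void move_single_disk(char from, char to, char moves[][2])
{
    moves[move_count][0] = from;
    moves[move_count][1] = to;
    move_count++;
}

/* Move n disks from 'start' to 'finish', using 'temp' as the spare peg.
 * moves[] must have room for at least 2^n - 1 entries. */
void move_tower(int n, char start, char finish, char temp, char moves[][2])
{
    if (n == 1) {
        move_single_disk(start, finish, moves);          /* base case */
    } else {
        move_tower(n - 1, start, temp, finish, moves);   /* clear the way */
        move_single_disk(start, finish, moves);          /* largest disk */
        move_tower(n - 1, temp, finish, start, moves);   /* restack on top */
    }
}
```

For 3 disks, the recorded moves are A→C, A→B, C→B, A→C, B→A, B→C, A→C (7 moves). For 10 disks, there are 1023 moves.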


5.22 Indirect Recursion
• A method invoking itself is considered to be direct recursion.
• A method could invoke another method, which invokes another, and so on, until eventually the original method is invoked again.
• For example, method m1 could invoke m2, which invokes m3, which in turn invokes m1 again, until a base case is reached.
• This is called indirect recursion, and it requires all the same care as direct recursion.
• It is often more difficult to trace and debug.

Fig. 5.6 Indirect recursion
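A common small illustration of indirect recursion is a pair of functions that decide whether a number is even or odd by calling each other. This C sketch is purely illustrative, since a simple `n % 2` test is the practical choice. `is_even` and `is_odd` are hypothetical names.

```c
#include <stdbool.h>

bool is_odd(unsigned int n);   /* forward declaration: is_even calls it */

/* is_even calls is_odd, which calls is_even again: indirect recursion.
 * The base case n == 0 stops the chain in either function. */
bool is_even(unsigned int n)
{
    if (n == 0)
        return true;           /* base case */
    return is_odd(n - 1);
}

bool is_odd(unsigned int n)
{
    if (n == 0)
        return false;          /* base case */
    return is_even(n - 1);
}
```

For example, is_even(4) calls is_odd(3), which calls is_even(2), and so on until the base case n == 0 is reached.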

5.23 Maze Traversal
• We can use recursion to find a path through a maze: a path can be found from any location if a path can be found from any of that location’s neighbouring locations.
• At each location we encounter, we mark the location as “visited” and attempt to find a path from that location’s “unvisited” neighbours.
• Recursion keeps track of the path through the maze.
• The base cases are a prohibited move or arrival at the final destination.
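The following C sketch applies these rules to a small grid. The maze layout and cell symbols are assumptions made for illustration: '#' is a wall, '.' is open, and 'E' is the destination. Visited cells are marked 'v', and cells on the successful path are re-marked '+'. `find_path` is a hypothetical name, and neighbours are tried in the order up, down, left, right.

```c
#include <stdbool.h>

#define ROWS 5
#define COLS 5

/* Each row is stored as a string, so it has room for a terminating '\0'. */
bool find_path(char maze[ROWS][COLS + 1], int r, int c)
{
    /* Base case 1: prohibited move (off the grid, a wall, or visited). */
    if (r < 0 || r >= ROWS || c < 0 || c >= COLS)
        return false;
    if (maze[r][c] == '#' || maze[r][c] == 'v' || maze[r][c] == '+')
        return false;

    /* Base case 2: arrival at the final destination. */
    if (maze[r][c] == 'E')
        return true;

    maze[r][c] = 'v';                          /* mark as visited */

    /* Recursive case: try each neighbour in turn. */
    if (find_path(maze, r - 1, c) || find_path(maze, r + 1, c) ||
        find_path(maze, r, c - 1) || find_path(maze, r, c + 1)) {
        maze[r][c] = '+';                      /* this cell is on the path */
        return true;
    }
    return false;                              /* dead end: backtrack */
}
```

If no neighbour leads to the exit, the call returns false and its caller tries the next neighbour. This is how the recursion backtracks, and how the chain of successful calls records the path.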


Summary
• One of the most important “Great Ideas” in CS 106B is the concept of recursion, which is the process of solving a problem by dividing it into smaller subproblems of the same form.
• The phrase “of the same form” is the essential characteristic of recursion; without it, all you have is a description of stepwise refinement of the sort taught in CS 106A.
• Because recursive decomposition generates subproblems that have the same form as the original problem, recursive programs use the same function or method to solve subproblems at different levels of the solution.
• In terms of the structure of the code, the defining characteristic of recursion is having functions that call themselves, directly or indirectly, as the decomposition process proceeds.
• A recursive function is one which calls itself.
• The programs we have written to date have been organised into functions that call other functions in a disciplined, hierarchical way.
• A function is said to be recursive if it calls itself either directly or indirectly through another function.
• A recursive function knows when to stop calling itself once a base case is reached.
• A recursive method tackles a problem by launching (calling) a copy of itself to work on a smaller problem.
• Each recursive call makes a new copy of that method (actually only of its variables) in memory.
• Once the method ends (i.e., returns some data), the copy of that returning method is removed from memory.
• A recursive algorithm is a solution that uses recursion to solve the problem.
• The product of the positive integers from 1 to n inclusive is called “n factorial”.
• Usually, a recursive approach more naturally mirrors the problem at hand, so it makes it simpler to tackle a problem that may not have an obvious answer.
• Recursion carries an overhead: each recursive call needs space for a new stack frame.
• This extra memory use costs processor time and can consume a lot of memory if recursion is nested deeply.


Self Assessment
1. One of the most important “Great Ideas” in ____________ is the concept of recursion, which is the process of solving a problem by dividing it into smaller subproblems of the same form.
a. CC 106B
b. CB 106B
c. SS 106B
d. CS 106B

2. Which of the following statements is true?
a. A recursive function is one which calls itself.
b. A recursive function is one which calls other functions.
c. A recursive function is one which calls some function.
d. A recursive function is one which calls all functions.

3. Which of the following statements is true of recursion?
a. Multiple calls are active at the same time.
b. Only a single call is active at a time.
c. Recursive calls are active at the same time.
d. Non-stop calls are active at the same time.

4. In C++, ___________ takes the form of a function that calls itself.
a. jump
b. recursion
c. duplication
d. repetition

5. A recursive function knows when to stop calling _________ once a base case is reached.
a. other function
b. any function
c. another function
d. itself

6. It is important to ensure that the recursion __________.
a. terminates
b. runs forever
c. executes
d. stops

7. The _________ case(s) should always be checked before the recursive calls.
a. last
b. base
c. first
d. mid


8. __________ is a solution that uses recursion to solve the problem.
a. Recursive algorithm
b. Recursive procedure
c. Recursive function
d. Recursive calls

9. A Fibonacci number is the _________ of the previous two Fibonacci numbers.
a. sum
b. subtraction
c. multiplication
d. division

10. ____________ is often more difficult to trace and debug.
a. Indirect recursion
b. Straight recursion
c. Direct recursion
d. Short recursion


Chapter VI

Hash Table

Aim


• explain the concept of hash tables
• evaluate various types of hash tables
• elaborate on the concept of chained hash tables

Objectives


• discuss the set of operations (interface) used with hash tables
• illustrate collision resolution
• gather information about open-addressed hash tables

Learning outcome


• acquire knowledge of hash tables
• evaluate various hashing techniques
• explain how a hash table is implemented in code


6.1 Introduction to Hash Table
Hash tables support one of the most efficient types of searching: hashing. Fundamentally, a hash table consists of an array in which data is accessed via a special index called a key. The primary idea behind a hash table is to establish a mapping between the set of all possible keys and positions in the array using a hash function.

A hash function accepts a key and returns its hash coding, or hash value. Keys vary in type, but hash codings are always integers. Both computing a hash value and indexing into an array can be performed in constant time, so the beauty of hashing is that we can use it to perform constant-time searches. The resulting hash table is said to be directly addressed when the hash function can guarantee that no two keys will generate the same hash coding. This is ideal, but direct addressing is rarely possible in practice. For example, imagine a phone-mail system in which eight-character names are hashed to find messages for users in the system.

If we were to rely on direct addressing, the hash table would contain more than $26^8 \approx 2.09 \times 10^{11}$ entries, and the majority would be unused, since most character combinations are not names. Typically, the number of entries in a hash table is small relative to the universe of possible keys. Consequently, most hash functions map some keys to the same position in the table. When two keys map to the same position, they collide. A good hash function minimises collisions, but we must still be prepared to deal with them.

This chapter presents two types of hash tables that resolve collisions in different ways.

6.2 Types of Hash Table
In this chapter we will discuss the following topics:
• Chained hash table
• Open-addressed hash table
• Selecting a hash function
• Collision resolution

6.3 Chained Hash Tables
Chained hash tables are hash tables that store data in buckets. Each bucket is a linked list that can grow as large as necessary to accommodate collisions.

6.3.1 Description of Chained Hash Tables

• A chained hash table fundamentally consists of an array of linked lists.
• Each list forms a bucket in which we place all elements hashing to a specific position in the array.
• To insert an element, we first pass its key to a hash function in a process called hashing the key. This tells us in which bucket the element belongs.
• We then insert the element at the head of the appropriate list.
• To look up or remove an element, we hash its key again to find its bucket, and then traverse the appropriate list until we find the element we are looking for.
• Since each bucket is a linked list, a chained hash table is not limited to a fixed number of elements. However, performance degrades if the table becomes too full.
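The steps above can be sketched in C as follows. This is a deliberately minimal, self-contained table that maps string keys to integers. It is not the `chtbl` implementation presented later in this chapter. `ChainedTable`, `Entry`, `ct_insert`, `ct_lookup`, `ct_destroy`, the bucket count of 5 and the string hash are all assumptions. Unlike `chtbl_insert`, this `ct_insert` does not check for duplicate keys.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 5   /* small table, as in Fig. 6.1 */

/* One element in a bucket's linked list. */
typedef struct Entry {
    char *key;
    int value;
    struct Entry *next;
} Entry;

typedef struct {
    Entry *buckets[NBUCKETS];   /* each bucket is the head of a list */
} ChainedTable;

/* Simple polynomial string hash, reduced with the division method. */
static unsigned int hash_key(const char *key)
{
    unsigned int h = 0;
    while (*key)
        h = h * 31 + (unsigned char)*key++;
    return h % NBUCKETS;
}

/* Hash the key, then insert at the head of that bucket's list. */
int ct_insert(ChainedTable *t, const char *key, int value)
{
    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = malloc(strlen(key) + 1);
    if (e->key == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->key, key);
    e->value = value;
    unsigned int b = hash_key(key);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return 0;
}

/* Hash the key again and traverse that bucket's list. */
Entry *ct_lookup(const ChainedTable *t, const char *key)
{
    for (Entry *e = t->buckets[hash_key(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e;
    return NULL;
}

/* Free every entry in every bucket and leave the table empty. */
void ct_destroy(ChainedTable *t)
{
    for (int i = 0; i < NBUCKETS; i++) {
        Entry *e = t->buckets[i];
        while (e != NULL) {
            Entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
        t->buckets[i] = NULL;
    }
}
```

With seven keys in five buckets, at least two keys must share a bucket. Those keys are simply chained together in that bucket's list.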


Fig. 6.1 A chained hash table with five buckets containing a total of seven elements

6.4 Open-addressed Hash Table
An open-addressed hash table stores data in the table itself instead of in buckets. Collisions are resolved using various methods of probing the table.

6.5 Selecting a Hash Function
Selecting a hash function is the crux of hashing. A hash function that distributes keys about the table in a random manner minimises collisions. Thus, it is important to select a hash function that does this well.

6.6 Collision Resolution
Collision resolution is the method of handling the case in which several keys map to the same index. Chained hash tables have an inherent way to resolve collisions; open-addressed hash tables use various forms of probing.

6.7 Applications of Hash Tables
Some applications of hash tables are as follows:

Database systems
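As an illustration of probing, the sketch below uses linear probing, one of the simplest probing methods. If position h(k) is occupied, it tries h(k)+1, h(k)+2, and so on, wrapping around the table. The table size of 11, the non-negative integer keys, the `EMPTY` marker and the function names are all assumptions for this sketch. Deletion is omitted because it needs extra bookkeeping in an open-addressed table.

```c
#include <stdbool.h>

#define TABLE_SIZE 11          /* a small prime table size */
#define EMPTY (-1)             /* marks an unused slot; keys must be >= 0 */

/* Open-addressed table: keys are stored directly in the array. */
typedef struct {
    int slots[TABLE_SIZE];
} OATable;

void oa_init(OATable *t)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        t->slots[i] = EMPTY;
}

/* Linear probing: examine h(k), h(k)+1, h(k)+2, ... (mod m) until the key
 * or a free slot is found. Returns the slot index, or -1 if the table is
 * full or the key is invalid. */
static int oa_probe(const OATable *t, int key)
{
    if (key < 0)
        return -1;
    int home = key % TABLE_SIZE;               /* division method */
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i) % TABLE_SIZE;
        if (t->slots[pos] == key || t->slots[pos] == EMPTY)
            return pos;
    }
    return -1;
}

bool oa_insert(OATable *t, int key)
{
    int pos = oa_probe(t, key);
    if (pos < 0)
        return false;          /* table full (or invalid key) */
    t->slots[pos] = key;       /* new key, or key already present */
    return true;
}

bool oa_contains(const OATable *t, int key)
{
    int pos = oa_probe(t, key);
    return pos >= 0 && t->slots[pos] == key;
}
```

With m = 11, the keys 3, 14 and 25 all hash to position 3. They therefore occupy positions 3, 4 and 5.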

• Database systems, specifically those that require efficient random access, use hash tables.
• Generally, these try to optimise between two types of access methods: sequential and random.
• Hash tables are an important part of efficient random access because they provide a way to locate data in a constant amount of time.

Symbol tables
• The tables used by compilers to maintain information about symbols from a program are called symbol tables.
• Compilers access information about symbols frequently. Therefore, it is important that symbol tables be implemented very efficiently.

Tagged buffers
• A tagged buffer is a mechanism for storing and retrieving data in a machine-independent manner.
• Each data member resides at a fixed offset in the buffer.
• A hash table is stored in the buffer so that the location of each tagged member can be ascertained quickly.
• One use of a tagged buffer is sending structured data across a network to a machine whose byte ordering and structure alignment may not be the same as the original host’s.
• The buffer handles these concerns as the data is stored and extracted member by member.


Data dictionaries
• Data dictionaries are structures that support adding, deleting, and searching for data.
• Although the operations of a hash table and a data dictionary are similar, other data structures may also be used to implement data dictionaries.
• Using a hash table is particularly efficient.

Associative arrays
• Associative arrays are most commonly used in languages that do not support structured types.
• They consist of data arranged so that the nth element of one array corresponds to the nth element of another.
• Associative arrays are useful for indexing a logical grouping of data by several key fields.
• A hash table helps to key into each array efficiently.

6.8 Collision Resolution
• When two keys hash to the same position in a hash table, they collide.
• Chained hash tables have a simple solution for resolving collisions: elements are simply placed in the bucket where the collision occurs.
• One problem with this is that if an excessive number of collisions occur at a specific position, that bucket becomes longer and longer, so accessing its elements takes more and more time.
• Ideally, we would like all buckets to grow at the same rate so that they remain nearly the same size and as small as possible. In other words, the goal is to distribute elements about the table in as uniform and random a manner as possible.
• This theoretically perfect situation is known as uniform hashing; however, in practice it can usually only be approximated.
• Even assuming uniform hashing, performance degrades significantly if we make the number of buckets in the table small relative to the number of elements we plan to insert.
• In this situation, all of the buckets become longer and longer. Thus, it is important to pay close attention to a hash table’s load factor.
• The load factor of a hash table is defined as

$$\alpha = \frac{n}{m}$$

where
n = the number of elements in the table
m = the number of positions into which elements may be hashed

• The load factor of a chained hash table indicates the average number of elements we can expect to encounter in a bucket, assuming uniform hashing.

For example, in a chained hash table with m = 1699 buckets and n = 3198 elements, the load factor is α = 3198/1699 ≈ 1.88, or about 2. Therefore, in this case, we can expect to encounter about two elements while searching any one bucket.

• When the load factor of a table drops below 1, each position will probably contain no more than one element.
• Of course, since uniform hashing is only approximated, in reality we encounter somewhat more or fewer elements than the load factor suggests.
• How close we come to uniform hashing ultimately depends on how well we select our hash function.


6.9 Selecting a Hash Function
• The goal of a good hash function is to approximate uniform hashing, that is, to spread elements about a hash table in as uniform and random a manner as possible.
• A hash function h is a function we define to map a key k to some position x in a hash table; x is called the hash coding of k. Formally, this is stated as:

$$h(k) = x$$

• Generally, most hashing methods assume k to be an integer, so that it can easily be altered mathematically to make h distribute elements throughout the table more uniformly.
• When k is not an integer, we can usually coerce it into one without much difficulty.
• Precisely how to coerce a set of keys depends a great deal on the characteristics of the keys themselves.
• Therefore, it is important to gain as much of a qualitative understanding of them in a particular application as we can.
• For example, if we were to hash the identifiers found in a program, we might observe that many have similar prefixes and suffixes, since developers tend to gravitate towards variable names such as sampleptr, simpleptr, and sentryptr.
• A poor way to coerce these keys would be any method depending strictly on characters at the beginning and end of the keys, since this would result in many of the same integers for k.
• On the other hand, we might try selecting characters from four positions that tend to be somewhat random, permuting them in a way that randomises them further, and packing them into specific bytes of a four-byte integer.
• Whichever approach we choose for coercing keys, the most important thing to remember is that “a hash function should distribute a set of keys about a hash table in a uniform and random manner”.

6.10 Division Method
• Once we have a key k represented as an integer, one of the simplest hashing methods is to map it into one of m positions in a table by taking the remainder of k divided by m. This is called the division method and can be formally stated as:

$$h(k) = k \bmod m$$

• Using this method, if the table has m = 1699 positions and we hash the key k = 25,657, the hash coding is 25,657 mod 1699 = 172.
• Typically, we should avoid values of m that are powers of 2. This is because if $m = 2^p$, h becomes just the p lowest-order bits of k.
• Usually we choose m to be a prime number not too close to a power of 2, while considering storage constraints and load factor.
• For example, if we expect to insert around n = 4500 elements into a chained hash table, we might choose m = 1699, a good prime number between $2^{10}$ and $2^{11}$.
• This results in a load factor of α = 4500/1699 ≈ 2.6, which indicates that generally two or three elements will reside in each bucket, assuming uniform hashing.

6.11 Multiplication Method
• An alternative to the division method is to multiply the integer key k by a constant A in the range 0 < A < 1, extract the fractional part, multiply this value by the number of positions in the table, m, and take the floor of the result.
• Typically, A is chosen to be 0.618, which is the square root of 5, minus 1, all divided by 2. This method is called the multiplication method, which is formally stated as:

$$h(k) = \lfloor m \, (kA \bmod 1) \rfloor, \quad \text{where } A \approx \frac{\sqrt{5} - 1}{2} = 0.618$$


• An advantage of this method is that m, the number of positions in the table, is not as critical as in the division method. For example, if the table contains m = 2000 positions and we hash the key k = 6341, the hash coding is

$$\lfloor (2000)\,((6341)(0.618) \bmod 1) \rfloor = \lfloor (2000)\,(3918.738 \bmod 1) \rfloor = \lfloor (2000)(0.738) \rfloor = 1476$$
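The worked example can be reproduced with the sketch below. To match the hand calculation exactly and avoid floating-point rounding, it assumes A = 0.618 and represents A as the fraction 618/1000 using integer arithmetic. A more precise value of A can place the same key at a different position. `hash_multiplication` is a hypothetical name.

```c
/* Multiplication method h(k) = floor(m * (k*A mod 1)) with A = 0.618,
 * computed with integers as A = 618/1000 so the result is exact. */
unsigned long hash_multiplication(unsigned long k, unsigned long m)
{
    unsigned long long scaled = (unsigned long long)k * 618ULL; /* 1000*k*A */
    unsigned long long frac = scaled % 1000ULL;    /* fractional part * 1000 */
    return (unsigned long)((m * frac) / 1000ULL);  /* floor(m * fraction) */
}
```

For k = 6341 and m = 2000, the function returns 1476. The result is always less than m, because the fractional part is less than 1.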

• In a chained hash table, if we expect to insert no more than n = 4500 elements, we might let m = 2250. This results in a load factor of α = 4500/2250 = 2, which indicates that about two traversals should be required to locate an element in any bucket, assuming uniform hashing.
• Again, notice how this method of hashing allows more flexibility in choosing m to suit the maximum number of traversals acceptable to us.
• The example given below presents a hash function that performs particularly well for strings. It coerces a key into a permuted integer through a series of bit operations.
• The resulting integer is mapped using the division method. The function was adapted from Compilers: Principles, Techniques, and Tools (Reading, MA: Addison-Wesley, 1986), by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, who attributed it to P. J. Weinberger as a hash function that performed well in hashing strings for his compiler.

Example: A hash function that performs well for strings

/* hashpjw.c */
#include "hashpjw.h"   /* declares hashpjw and defines PRIME_TBLSIZ */

/* hashpjw */
int hashpjw(const void *key)
{
    const char   *ptr;
    unsigned int val;

    /* Hash the key by performing a number of bit operations on it. */
    val = 0;
    ptr = key;

    while (*ptr != '\0')
    {
        unsigned int tmp;

        val = (val << 4) + (unsigned char)(*ptr);

        if ((tmp = (val & 0xf0000000)) != 0)
        {
            val = val ^ (tmp >> 24);
            val = val ^ tmp;
        }

        ptr++;
    }

    /* In practice, replace PRIME_TBLSIZ with the actual table size. */
    return val % PRIME_TBLSIZ;
}

6.12 Interface for Chained Hash Tables
Let us discuss the operations provided by the chained hash table interface.

chtbl_init

int chtbl_init(CHTbl *htbl, int buckets, int (*h)(const void *key), int (*match)(const void *key1, const void *key2), void (*destroy)(void *data));


Return value
0 if initialising the hash table is successful, or -1 otherwise.

Description
• chtbl_init initialises the chained hash table specified by htbl.
• This operation must be called for a chained hash table before the hash table can be used with any other operation.
• The number of buckets allocated in the hash table is specified by buckets.
• The function pointer h specifies a user-defined hash function for hashing keys.
• The function pointer match specifies a user-defined function to determine whether two keys match.
• It should return 1 if key1 is equal to key2, and 0 otherwise.
• The destroy argument provides a way to free dynamically allocated data when chtbl_destroy is called.

For example, if the hash table contains data dynamically allocated using malloc, destroy should be set to free, so that the data is freed as the hash table is destroyed.

• For structured data containing several dynamically allocated members, destroy should be set to a user-defined function that calls free for each dynamically allocated member as well as for the structure itself.
• For a hash table containing data that should not be freed, destroy should be set to NULL.

Complexity
O(m), where m is the number of buckets in the hash table.

chtbl_destroy
void chtbl_destroy(CHTbl *htbl);

Return value
None

Description
chtbl_destroy destroys the chained hash table specified by htbl. No other operations are permitted after calling chtbl_destroy unless chtbl_init is called again. The chtbl_destroy operation removes all elements from a hash table and calls the function passed as destroy to chtbl_init once for each element as it is removed, provided destroy was not set to NULL.

Complexity
O(m), where m is the number of buckets in the hash table.

chtbl_insert
int chtbl_insert(CHTbl *htbl, const void *data);

Return value
0 if inserting the element is successful, 1 if the element is already in the hash table, or -1 otherwise.

Description
• chtbl_insert inserts an element into the chained hash table specified by htbl.
• The new element contains a pointer to data, so the memory referenced by data should remain valid as long as the element remains in the hash table.
• It is the responsibility of the caller to manage the storage associated with data.


Complexity
O(1), assuming we approximate uniform hashing well.

chtbl_lookup
int chtbl_lookup(const CHTbl *htbl, void **data);

Return value
0 if the element is found in the hash table, or -1 otherwise.

Description
Determines whether an element matches data in the chained hash table specified by htbl. If a match is found, data points to the matching data in the hash table upon return.

Complexity
O(1)

chtbl_size
int chtbl_size(CHTbl *htbl);

Return value
Number of elements in the hash table.

Description
Macro that evaluates to the number of elements in the chained hash table specified by htbl.

Complexity
O(1)

6.13 Implementation and Analysis of Chained Hash Tables
• A chained hash table consists of an array of buckets.
• Each bucket is a linked list containing the elements that hash to a certain position in the table.
• The structure CHTbl is the chained hash table data structure. This structure consists of six members: buckets is the number of buckets allocated in the table; h, match, and destroy are members used to encapsulate the functions passed to chtbl_init; size is the number of elements currently in the table; and table is the array of buckets.


/* chtbl.h */
/* Header for the Chained Hash Table Abstract Datatype */
#ifndef CHTBL_H
#define CHTBL_H

#include <stdlib.h>
#include "list.h"

/* Define a structure for chained hash tables */
typedef struct CHTbl_
{
    int   buckets;
    int   (*h)(const void *key);
    int   (*match)(const void *key1, const void *key2);
    void  (*destroy)(void *data);
    int   size;
    List  *table;
} CHTbl;

/* Public Interface */
int chtbl_init(CHTbl *htbl, int buckets, int (*h)(const void *key),
               int (*match)(const void *key1, const void *key2),
               void (*destroy)(void *data));
void chtbl_destroy(CHTbl *htbl);
int chtbl_insert(CHTbl *htbl, const void *data);
int chtbl_remove(CHTbl *htbl, void **data);
int chtbl_lookup(const CHTbl *htbl, void **data);

#define chtbl_size(htbl) ((htbl)->size)

#endif

chtbl_init
• The chtbl_init operation initialises a chained hash table so that it can be used in other operations.
• Initialising a chained hash table is a simple operation in which we allocate space for the buckets; initialise each bucket by calling list_init; encapsulate the h, match, and destroy functions; and set the size member to 0.
• The runtime complexity of chtbl_init is O(m), where m is the number of buckets in the table. This is because the O(1) operation list_init must be called once for each of the m buckets. All other parts of the operation run in a constant amount of time.

chtbl_destroy
• The chtbl_destroy operation destroys a chained hash table.
• Primarily this means removing the elements from each bucket and freeing the memory chtbl_init allocated for the table.
• The function passed as destroy to chtbl_init is called once for each element as it is removed, provided destroy was not set to NULL.
• The runtime complexity of chtbl_destroy is O(m), where m is the number of buckets in the table. This is because list_destroy is called once for each bucket.
• In each bucket, we expect to remove a number of elements equal to the load factor of the hash table, which is treated as a small constant.


chtbl_insert
• The chtbl_insert operation inserts an element into a chained hash table.
• Since a key is not allowed to be inserted into the hash table more than once, chtbl_lookup is called to make sure that the table does not already contain the new element.
• If no element with the same key already exists in the hash table, we hash the key for the new element and insert it into the bucket at the position in the hash table that corresponds to the hash coding.
• If this is successful, we increment the table size.
• Assuming we approximate uniform hashing well, the runtime complexity of chtbl_insert is O(1), since chtbl_lookup, hashing a key, and inserting an element at the head of a linked list all run in a constant amount of time.

chtbl_remove
• The chtbl_remove operation removes an element from a chained hash table.
• To remove the element, we hash its key, search the appropriate bucket for an element with a key that matches, and call list_rem_next to remove it.
• The pointer prev maintains a pointer to the element before the one to be removed, since list_rem_next requires this.
• Recall that list_rem_next sets data to point to the data removed from the table.
• If a matching key is not found in the bucket, the element is not in the table.
• If removing the element is successful, we decrease the table size by 1.
• Assuming we approximate uniform hashing well, the runtime complexity of chtbl_remove is O(1). This is because we expect to search a number of elements equal to the load factor of the hash table, which is treated as a small constant.

chtbl_lookup
• The chtbl_lookup operation searches for an element in a chained hash table and returns a pointer to it. This operation works much like chtbl_remove, except that once the element is found, it is not removed from the table.
• Assuming we approximate uniform hashing well, the runtime complexity of chtbl_lookup is O(1). This is because we expect to search a number of elements equal to the load factor of the hash table, which is treated as a small constant.

chtbl_size
• This macro evaluates to the number of elements in a chained hash table. It works by accessing the size member of the CHTbl structure.
• The runtime complexity of chtbl_size is O(1), because accessing a member of a structure is a simple task that runs in a constant amount of time.


/* Implementation of the Chained Hash Table Abstract Datatype */
/* chtbl.c */
#include <stdlib.h>
#include <string.h>

#include "list.h"
#include "chtbl.h"

/* chtbl_init */
int chtbl_init(CHTbl *htbl, int buckets, int (*h)(const void *key),
               int (*match)(const void *key1, const void *key2),
               void (*destroy)(void *data))
{
    int i;

    /* Allocate space for the hash table. */
    if ((htbl->table = (List *)malloc(buckets * sizeof(List))) == NULL)
        return -1;

    /* Initialise the buckets. */
    htbl->buckets = buckets;

    for (i = 0; i < htbl->buckets; i++)
        list_init(&htbl->table[i], destroy);

    /* Encapsulate the functions. */
    htbl->h = h;
    htbl->match = match;
    htbl->destroy = destroy;

    /* Initialise the number of elements in the table. */
    htbl->size = 0;

    return 0;
}

/* chtbl_destroy */
void chtbl_destroy(CHTbl *htbl)
{
    int i;

    /* Destroy each bucket. */
    for (i = 0; i < htbl->buckets; i++)
        list_destroy(&htbl->table[i]);

    /* Free the storage allocated for the hash table. */
    free(htbl->table);

    /* No operations are allowed now, but clear the structure as a precaution. */
    memset(htbl, 0, sizeof(CHTbl));

    return;
}

/* chtbl_insert */
int chtbl_insert(CHTbl *htbl, const void *data)
{
    void *temp;
    int  bucket, retval;

    /* Do nothing if the data is already in the table. */
    temp = (void *)data;

    if (chtbl_lookup(htbl, &temp) == 0)
        return 1;

    /* Hash the key. */
    bucket = htbl->h(data) % htbl->buckets;

    /* Insert the data into the bucket. */
    if ((retval = list_ins_next(&htbl->table[bucket], NULL, data)) == 0)
        htbl->size++;

    return retval;
}

/* chtbl_remove */
int chtbl_remove(CHTbl *htbl, void **data)
{
    ListElmt *element, *prev;
    int      bucket;

    /* Hash the key. */
    bucket = htbl->h(*data) % htbl->buckets;

    /* Search for the data in the bucket. */
    prev = NULL;

    for (element = list_head(&htbl->table[bucket]); element != NULL;
         element = list_next(element))
    {
        if (htbl->match(*data, list_data(element)))
        {
            /* Remove the data from the bucket. */
            if (list_rem_next(&htbl->table[bucket], prev, data) == 0)
            {
                htbl->size--;
                return 0;
            }
            else
            {
                return -1;
            }
        }

        prev = element;
    }

    /* Return that the data was not found. */
    return -1;
}

/* chtbl_lookup*/int chtbl_lookup(const CHTbl *htbl, void **data) ListElmt *element;int bucket;

/* Hash the key*/

bucket = htbl->h(*data) % htbl->buckets;

/*Search for the data in the bucket*/for (element = list_head(&htbl->table[bucket]); element != NULL; element = list_next(element)) if (htbl->match(*data, list_data(element)))

/*Pass back the data from the table*/*data = list_data(element); return 0;

/*Return data was not found*/return -1;

6.14 Chained Hash Table Example
An important application of hash tables is the way compilers maintain information about the symbols encountered in a program. Formally, a compiler translates a program written in one language, a source language such as C, into another language, which is a set of instructions for the machine on which the program will run. To maintain information about the symbols in a program, compilers use a data structure called a symbol table. Symbol tables are often implemented as hash tables because a compiler must be able to store and retrieve information about symbols very quickly.
Several parts of a compiler access the symbol table during the various phases of compilation. One part, the lexical analyser, inserts symbols.


The lexical analyser is the part of a compiler charged with grouping characters from the source code into meaningful strings, called lexemes. These are translated into syntactic elements, called tokens, which are passed on to the parser. The parser performs syntactic analysis. As the lexical analyser encounters symbols in its input stream, it stores information about them in the symbol table. Two important attributes stored by the lexical analyser are a symbol's lexeme and the type of token the lexeme constitutes (e.g., an identifier or an operator).

The example presented here is a very simple lexical analyser that analyses a string of characters and then groups the characters into one of two types of tokens: a token consisting only of digits, or a token consisting of something other than digits alone. For simplicity, we assume that tokens are separated in the input stream by a single blank.

The lexical analyser is implemented as a function, lex, which a parser calls each time it requires another token. The function works by first calling the next_token function (whose implementation is not shown) to get the next blank-delimited string from the input stream istream. If next_token returns NULL, there are no more tokens in the input stream; in this case, the function returns lexit, which tells the parser that there are no more tokens to be processed. If next_token finds a string, some simple analysis is performed to determine what type of token the string represents. Next, the function inserts the lexeme and token type together as a Symbol structure into the symbol table, symtbl, and returns the token type to the parser. The type Symbol is defined in symbol.h, which is not included in this example.

A chained hash table is a good way to implement a symbol table because, in addition to being an efficient way to store and retrieve information, it can store a virtually unlimited amount of data, since each bucket grows as needed. This is important for a compiler, since it is difficult to know how many symbols a program will contain before lexical analysis.

The runtime complexity of lex is O(1), assuming next_token runs in a constant amount of time and lexemes have bounded length. This is because lex simply calls chtbl_insert, which is an O(1) operation assuming uniform hashing.


/* Header for a Simple Lexical Analyzer */
/* lex.h */
#ifndef LEX_H
#define LEX_H

#include "chtbl.h"

/* Define the token types recognized by the lexical analyzer. */
typedef enum Token_ {lexit, error, digit, other} Token;

/* Public Interface */
Token lex(const char *istream, CHTbl *symtbl);

#endif

/* Implementation of a Simple Lexical Analyzer */
/* lex.c */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#include "chtbl.h"
#include "lex.h"
#include "symbol.h"

/* next_token is declared elsewhere; its implementation is not shown. */

/* lex */
Token lex(const char *istream, CHTbl *symtbl) {
    Token token;
    Symbol *symbol;
    int length, retval, i;

    /* Allocate space for a symbol. */
    if ((symbol = (Symbol *)malloc(sizeof(Symbol))) == NULL)
        return error;

    /* Process the next token. */
    if ((symbol->lexeme = next_token(istream)) == NULL) {
        /* Return that there is no more input. */
        free(symbol);
        return lexit;
    } else {
        /* Determine the token type. */
        symbol->token = digit;
        length = strlen(symbol->lexeme);
        for (i = 0; i < length; i++) {
            /* Cast to unsigned char: isdigit is undefined for negative char values. */
            if (!isdigit((unsigned char)symbol->lexeme[i]))
                symbol->token = other;
        }
        memcpy(&token, &symbol->token, sizeof(Token));

        /* Insert the symbol into the symbol table. */
        if ((retval = chtbl_insert(symtbl, symbol)) < 0) {
            free(symbol);
            return error;
        } else if (retval == 1) {
            /* The symbol is already in the symbol table. */
            free(symbol);
        }
    }

    /* Return the token for the parser. */
    return token;
}
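The sketch below is a minimal, self-contained illustration, under stated assumptions, of the symbol-table behaviour lex depends on: each distinct lexeme is stored once and classified as digit or other. It deliberately does not use the book's list.h, chtbl.h or symbol.h, so every name in it (SymTab, SymNode, symtab_insert, symtab_lookup, classify_lexeme, NBUCKETS and the string hash) is hypothetical. Like chtbl_insert, symtab_insert returns 0 for a new element, 1 if the element is already present and -1 on failure; unlike lex, it copies the lexeme instead of taking ownership of it.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Token types for this sketch, mirroring digit/other in lex.h. */
typedef enum { S_DIGIT, S_OTHER } SymToken;

/* One element of a bucket's linked list (hypothetical layout). */
typedef struct SymNode {
    char *lexeme;
    SymToken token;
    struct SymNode *next;
} SymNode;

#define NBUCKETS 11  /* assumed bucket count; any positive value works */

typedef struct {
    SymNode *buckets[NBUCKETS];
    int size;
} SymTab;

/* Hypothetical string hash: multiply-and-add, reduced mod NBUCKETS. */
static unsigned symtab_hash(const char *s) {
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* digit if every character is a decimal digit, as lex decides. */
SymToken classify_lexeme(const char *s) {
    for (; *s; s++)
        if (!isdigit((unsigned char)*s))
            return S_OTHER;
    return S_DIGIT;
}

void symtab_init(SymTab *t) {
    memset(t, 0, sizeof *t);
}

/* Returns the entry for lexeme, or NULL if it is not in the table. */
const SymNode *symtab_lookup(const SymTab *t, const char *lexeme) {
    for (const SymNode *n = t->buckets[symtab_hash(lexeme)]; n != NULL; n = n->next)
        if (strcmp(n->lexeme, lexeme) == 0)
            return n;
    return NULL;
}

/* Returns 0 if inserted, 1 if the lexeme is already present, -1 on failure. */
int symtab_insert(SymTab *t, const char *lexeme) {
    unsigned b = symtab_hash(lexeme);
    if (symtab_lookup(t, lexeme) != NULL)
        return 1;                        /* duplicate: nothing is stored */
    SymNode *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->lexeme = malloc(strlen(lexeme) + 1);
    if (n->lexeme == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->lexeme, lexeme);
    n->token = classify_lexeme(lexeme);
    n->next = t->buckets[b];             /* insert at the head of the bucket */
    t->buckets[b] = n;
    t->size++;
    return 0;
}

/* Frees every node and its copied lexeme. */
void symtab_destroy(SymTab *t) {
    for (int b = 0; b < NBUCKETS; b++) {
        SymNode *n = t->buckets[b];
        while (n != NULL) {
            SymNode *next = n->next;
            free(n->lexeme);
            free(n);
            n = next;
        }
    }
    memset(t, 0, sizeof *t);
}
```

For example, inserting "count", "42" and "count" again leaves two entries in the table, and the second insertion of "count" returns 1, just as a repeated identifier would in lex.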

6.15 Description of Open-Addressed Hash Tables
In a chained hash table, elements reside in buckets extending from each position. In an open-addressed hash table, on the other hand, all elements reside in the table itself. This may be important for some applications that rely on the table being a fixed size. Without a way to extend the number of elements at each position, however, an open-addressed hash table needs another way to resolve collisions.

6.16 Collision Resolution
Whereas chained hash tables have an inherent means of resolving collisions, open-addressed hash tables must handle them in a different way. The way to resolve collisions in an open-addressed hash table is to probe the table. To insert an element, for example, we probe positions until we find an unoccupied one, and insert the element there. To remove or look up an element, we probe positions until the element is located or until we encounter an unoccupied position. If we encounter an unoccupied position before finding the element, or if we end up traversing all of the positions, the element is not in the table.
The goal is to minimise how many probes we have to perform. Exactly how many positions we end up probing depends primarily on two things: the load factor of the hash table and the degree to which elements are distributed uniformly.


Recall that the load factor of a hash table is α = n/m, where n is the number of elements and m is the number of positions into which the elements may be hashed. Since an open-addressed hash table cannot contain more elements than the number of positions in the table (n ≤ m), its load factor is always less than or equal to 1. This makes sense, since no position can ever contain more than one element.
Assuming uniform hashing, the number of positions we can expect to probe in an open-addressed hash table (strictly, an upper bound for an insertion or an unsuccessful search) is

$$\frac{1}{1 - \alpha}$$

For an open-addressed hash table that is half full (whose load factor is 0.5), for example, the number of positions we can expect to probe is 1/(1 – 0.5) = 2. Table 6.1 illustrates how dramatically the expected number of probes increases as the load factor of an open-addressed hash table approaches 1 (or 100%), at which point the table is completely full. In a particularly time-sensitive application, it may be advantageous to increase the size of the hash table to allow extra space for probing.

Load factor (%)    Expected probes
< 50               < 1 / (1 – 0.50) = 2
80                 1 / (1 – 0.80) = 5
90                 1 / (1 – 0.90) = 10
95                 1 / (1 – 0.95) = 20

Table 6.1 Expected probes as a result of load factor, assuming uniform hashing
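The arithmetic behind Table 6.1 can be reproduced with a few lines of C. This is a sketch that assumes uniform hashing, exactly as the table does; the function names load_factor, expected_probes and print_probe_table are illustrative and are not part of the book's interface.

```c
#include <math.h>
#include <stdio.h>

/* Load factor of a hash table: alpha = n / m. */
double load_factor(int n, int m) {
    return (double)n / (double)m;
}

/* Expected probes under uniform hashing: 1 / (1 - alpha), for 0 <= alpha < 1.
   A completely full table (alpha = 1) has no finite bound. */
double expected_probes(double alpha) {
    if (alpha >= 1.0)
        return INFINITY;
    return 1.0 / (1.0 - alpha);
}

/* Print the rows of Table 6.1. */
void print_probe_table(void) {
    const double alphas[] = {0.50, 0.80, 0.90, 0.95};
    for (size_t i = 0; i < sizeof alphas / sizeof alphas[0]; i++)
        printf("%3.0f%%  %5.1f\n", alphas[i] * 100.0, expected_probes(alphas[i]));
}
```

Calling print_probe_table() prints one row per load factor; expected_probes(0.5) returns 2.0, matching the first row of the table.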

How close we come to the figures presented in Table 6.1 depends on how closely we approximate uniform hashing. Just as in a chained hash table, this depends on how well we select our hash function. In an open-addressed hash table, however, it also depends on how we probe subsequent positions in the table when collisions occur. Generally, a hash function for probing positions in an open-addressed hash table is defined by h(k, i) = x, where k is a key, i is the number of times the table has been probed thus far, and x is the resulting hash coding. Typically, h makes use of one or more auxiliary hash functions selected for the same properties as presented for chained hash tables. However, for an open-addressed hash table, h must possess an additional property: as i increases from 0 to m – 1, where m is the number of positions in the hash table, all positions in the table must be visited before any position is visited twice; otherwise, not all positions will be probed.

6.17 Linear Probing
One simple approach to probing an open-addressed hash table is to probe successive positions in the table. Formally stated, if we let i go between 0 and m – 1, where m is the number of positions in the table, a hash function for linear probing is defined as

$$h(k, i) = \left(h'(k) + i\right) \bmod m$$

The function h′ is an auxiliary hash function, which is selected like any hash function; that is, so that elements are distributed in a uniform and random manner. For example, we might choose to use the division method of hashing and let h′(k) = k mod m. In this case, if we hash an element with key k = 2998 into a table of size m = 1000, the hash codings produced are (998 + 0) mod 1000 = 998 when i = 0, (998 + 1) mod 1000 = 999 when i = 1, (998 + 2) mod 1000 = 0 when i = 2, and so on.


Therefore, to insert an element with key k = 2998, we would look for an unoccupied position first at position 998, then 999, then 0, and so on. The advantage of linear probing is that it is simple and there are no constraints on m to ensure that all positions will eventually be probed. Unfortunately, linear probing does not approximate uniform hashing very well. In particular, linear probing suffers from a phenomenon known as primary clustering, in which long runs of occupied positions build up as the table becomes more and more full. Any key that hashes into such a run must probe to its end, and each insertion there makes the run longer, which results in excessive probing.

Fig. 6.2 Linear probing with h(k, i) = (k mod 11 + i) mod 11
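The linear-probing formula and the k = 2998, m = 1000 example can be sketched in C as follows. The helper names (aux_hash, linear_probe, linear_insert) and the EMPTY marker are hypothetical, and keys are assumed to be non-negative ints stored directly in an int array, not through pointers as in the ohtbl code. linear_insert reports how many positions it probed, which makes primary clustering visible.

```c
/* Marks an unoccupied slot; keys are assumed to be non-negative. */
#define EMPTY (-1)

/* Auxiliary hash h'(k) = k mod m (division method), as in the text. */
int aux_hash(int k, int m) {
    return k % m;
}

/* Linear probing: h(k, i) = (h'(k) + i) mod m. */
int linear_probe(int k, int i, int m) {
    return (aux_hash(k, m) + i) % m;
}

/* Insert k into table[0..m) with linear probing. Returns the number of
   positions probed, or -1 if every position is occupied. */
int linear_insert(int *table, int m, int k) {
    for (int i = 0; i < m; i++) {
        int pos = linear_probe(k, i, m);
        if (table[pos] == EMPTY) {
            table[pos] = k;
            return i + 1;
        }
    }
    return -1;
}
```

With m = 11 and keys 0, 1, 2 and 3 already occupying positions 0 to 3, inserting key 11 (which also hashes to position 0) probes five positions before it lands at position 4: a key that hashes into the run has to walk its entire length.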

6.18 Double Hashing
One of the most effective approaches for probing an open-addressed hash table focuses on adding the hash codings of two auxiliary hash functions. Formally stated, if we let i go between 0 and m – 1, where m is the number of positions in the table, a hash function for double hashing is defined as

$$h(k, i) = \left(h_1(k) + i\,h_2(k)\right) \bmod m$$

The functions h1 and h2 are auxiliary hash functions, which are selected like any hash function: so that elements are distributed in a uniform and random manner. However, to ensure that all positions in the table are visited before any position is visited twice, we must adhere to one of the following procedures: either select m to be a power of 2 and make h2 always return an odd value, or make m prime and design h2 so that it always returns a positive integer less than m. Either way, h2(k) and m share no common factor greater than 1.
Typically, we let h1(k) = k mod m and h2(k) = 1 + (k mod m′), where m′ is slightly less than m, say, m – 1 or m – 2. Using this approach, for example, if the hash table contains m = 1699 positions (a prime number) and we hash the key k = 15,385, then h1(k) = 94 and, with m′ = m – 2 = 1697, h2(k) = 113. The positions probed are (94 + (0)(113)) mod 1699 = 94 when i = 0, and every 113th position after this as i increases.
The advantage of double hashing is that it is one of the best forms of probing, producing a good distribution of elements throughout a hash table.


The disadvantage is that m is constrained in order to ensure that all positions in the table will be visited in a • series of probes before any position is probed twice.

Fig. 6.3 Hashing the same keys
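The double-hashing formula, the m = 1699, k = 15,385 example and the requirement that every position be visited before any is visited twice can all be checked with the sketch below. It assumes h1(k) = k mod m and h2(k) = 1 + (k mod m′) as in the text, with the caller supplying m′; the function names and the 4096-position limit in visits_every_position are hypothetical choices for this sketch.

```c
/* h1(k) = k mod m */
int dh_h1(int k, int m) {
    return k % m;
}

/* h2(k) = 1 + (k mod m'), where m' is slightly less than m. */
int dh_h2(int k, int mprime) {
    return 1 + k % mprime;
}

/* Double hashing: h(k, i) = (h1(k) + i * h2(k)) mod m.
   long long keeps i * h2(k) from overflowing for i < m. */
int double_hash_probe(int k, int i, int m, int mprime) {
    long long pos = (long long)dh_h1(k, m) + (long long)i * dh_h2(k, mprime);
    return (int)(pos % m);
}

/* Returns 1 if probes i = 0 .. m-1 visit every position exactly once,
   otherwise 0. Assumes 0 < m <= 4096 (limit chosen for this sketch). */
int visits_every_position(int k, int m, int mprime) {
    unsigned char seen[4096] = {0};
    if (m <= 0 || m > 4096)
        return 0;
    for (int i = 0; i < m; i++) {
        int pos = double_hash_probe(k, i, m, mprime);
        if (seen[pos])
            return 0;   /* a position repeats before all were visited */
        seen[pos] = 1;
    }
    return 1;
}
```

double_hash_probe(15385, 0, 1699, 1697) yields 94, and successive values of i step by 113. By contrast, with the non-prime m = 10 and m′ = 9, key 1 gets h2(1) = 2, which shares the factor 2 with m, so only the five odd positions are ever probed and visits_every_position(1, 10, 9) returns 0.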

6.19 Interface for Open-Addressed Hash Tables
Let us discuss the operations in the interface for open-addressed hash tables.

ohtbl_init

int ohtbl_init(OHTbl *htbl, int positions, int (*h1)(const void *key), int (*h2)(const void *key), int (*match)(const void *key1, const void *key2), void (*destroy)(void *data));

Return Value 0 if initialising the hash table is successful or –1 otherwise.

Description Initialises the open-addressed hash table specified by htbl. This operation must be called for an open-addressed hash table before the hash table can be used with any other operation. The number of positions to be allocated in the hash table is specified by positions. The function pointers h1 and h2 specify user-defined auxiliary hash functions for double hashing. The function pointer match specifies a user-defined function to determine whether two keys match. It should perform in a manner similar to that described for chtbl_init. The destroy argument provides a way to free dynamically allocated data when ohtbl_destroy is called. It works in a manner similar to that described for chtbl_destroy. For an open-addressed hash table containing data that should not be freed, destroy should be set to NULL.


Complexity O(m), where m is the number of positions in the hash table.
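Because ohtbl_init receives h1 and h2 as callbacks that see only a pointer to the key, the callbacks themselves must know the table size when double hashing is set up this way. The sketch below shows one possible set of callbacks for non-negative int keys in a table of 1699 positions; the names int_h1, int_h2, int_match and the POSITIONS macros are hypothetical. int_h2 always returns a value between 1 and 1697, so it satisfies the "m prime, h2 positive and less than m" rule from Section 6.18.

```c
/* Hypothetical table size shared by the callbacks and the caller. */
#define POSITIONS 1699                 /* prime */
#define POSITIONS_AUX (POSITIONS - 2)  /* m' = m - 2 */

/* h1: the key is assumed to point to a non-negative int. */
int int_h1(const void *key) {
    return *(const int *)key % POSITIONS;
}

/* h2: always in 1 .. POSITIONS_AUX, i.e. positive and less than POSITIONS. */
int int_h2(const void *key) {
    return 1 + *(const int *)key % POSITIONS_AUX;
}

/* match: 1 if the keys are equal, 0 otherwise (the ohtbl code treats any
   nonzero result as a match). */
int int_match(const void *key1, const void *key2) {
    return *(const int *)key1 == *(const int *)key2;
}
```

A caller would then pass the same size to the table, for example ohtbl_init(&htbl, POSITIONS, int_h1, int_h2, int_match, NULL), using NULL for destroy if the keys are not dynamically allocated.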

ohtbl_destroy

void ohtbl_destroy(OHTbl *htbl);

Return Value None

Description Destroys the open-addressed hash table specified by htbl. No other operations are permitted after calling ohtbl_destroy unless ohtbl_init is called again. The ohtbl_destroy operation removes all elements from a hash table and calls the function passed as destroy to ohtbl_init once for each element as it is removed, provided destroy was not set to NULL.

Complexity O(m), where m is the number of positions in the hash table.

ohtbl_insert

int ohtbl_insert(OHTbl *htbl, const void *data);

Return Value 0 if inserting the element is successful, 1 if the element is already in the hash table, or –1 otherwise.

Description Inserts an element into the open-addressed hash table specified by htbl. The new element contains a pointer to data, so the memory referenced by data should remain valid as long as the element remains in the hash table. It is the responsibility of the caller to manage the storage associated with data.

Complexity O(1)

ohtbl_remove

int ohtbl_remove(OHTbl *htbl, void **data);

Return Value 0 if removing the element is successful, or –1 otherwise.

Description Removes the element matching data from the open-addressed hash table specified by htbl. Upon return, data points to the data stored in the element that was removed. It is the responsibility of the caller to manage the storage associated with the data.

Complexity O(1)

ohtbl_lookup

int ohtbl_lookup(const OHTbl *htbl, void **data);

Return Value 0 if the element is found in the hash table, or –1 otherwise.


Description Determines whether an element matches data in the open-addressed hash table specified by htbl. If a match is found, upon return data points to the matching data in the hash table.

Complexity O(1)

ohtbl_size

int ohtbl_size(const OHTbl *htbl);

Return Value Number of elements in the hash table.

Description Macro that evaluates to the number of elements in the open-addressed hash table specified by htbl.

Complexity O(1)

6.20 Implementation and Analysis of Open-Addressed Hash Tables
An open-addressed hash table fundamentally consists of a single array. The structure OHTbl is the open-addressed hash table data structure. This structure consists of eight members:

• positions is the number of positions allocated in the hash table.
• vacated is a pointer that will be initialised to a special storage location to indicate that a particular position in the table has had an element removed from it.
• h1, h2, match and destroy are members used to encapsulate the functions passed to ohtbl_init.
• size is the number of elements currently in the table.
• table is the array in which the elements are stored.

The vacated member requires a bit of discussion. Its purpose is to support the removal of elements. An unoccupied position in an open-addressed hash table usually contains a NULL pointer. However, when we remove an element, we cannot set its data pointer back to NULL, because when probing to look up a subsequent element, NULL would indicate that the position is unoccupied and no more probes should be performed. In actuality, one or more elements may have been inserted by probing past the removed element while it was still in the table.
Considering this, we set the data pointer to the vacated member of the hash table data structure when we remove an element. The address of vacated serves as a special sentinel to indicate that a new element may be inserted at the position. This way, when probing to look up an element, we are assured that a NULL really means to stop probing.


/* Header for the Open-Addressed Hash Table Abstract Datatype */
/* ohtbl.h */
#ifndef OHTBL_H
#define OHTBL_H

#include <stdlib.h>

/* Define a structure for open-addressed hash tables. */
typedef struct OHTbl_ {
    int positions;
    void *vacated;

    int (*h1)(const void *key);
    int (*h2)(const void *key);
    int (*match)(const void *key1, const void *key2);
    void (*destroy)(void *data);

    int size;
    void **table;
} OHTbl;

/* Public Interface */
int ohtbl_init(OHTbl *htbl, int positions, int (*h1)(const void *key),
               int (*h2)(const void *key),
               int (*match)(const void *key1, const void *key2),
               void (*destroy)(void *data));
void ohtbl_destroy(OHTbl *htbl);
int ohtbl_insert(OHTbl *htbl, const void *data);
int ohtbl_remove(OHTbl *htbl, void **data);
int ohtbl_lookup(const OHTbl *htbl, void **data);

#define ohtbl_size(htbl) ((htbl)->size)

#endif

ohtbl_init
The ohtbl_init operation initialises an open-addressed hash table so that it can be used in other operations. Initialising an open-addressed hash table is a simple operation in which we allocate space for the table; initialise the pointer in each position to NULL; encapsulate the h1, h2, match and destroy functions; initialise vacated to its sentinel address; and set the size member to 0. The runtime complexity of ohtbl_init is O(m), where m is the number of positions in the table. This is because the data pointer in each of the m positions must be initialised to NULL, and all other parts of the operation run in a constant amount of time.

ohtbl_destroy
The ohtbl_destroy operation destroys an open-addressed hash table. Primarily this means freeing the memory ohtbl_init allocated for the table. The function passed as destroy to ohtbl_init is called once for each element as it is removed, provided destroy was not set to NULL. The runtime complexity of ohtbl_destroy is O(m), where m is the number of positions in the hash table, because we must traverse all positions in the hash table to determine which are occupied. If destroy is NULL, ohtbl_destroy runs in O(1) time.


ohtbl_insert
The ohtbl_insert operation inserts an element into an open-addressed hash table. Since an open-addressed hash table has a fixed size, we first ensure that there is room for the new element to be inserted. Also, since a key is not allowed to be inserted into the hash table more than once, we call ohtbl_lookup to make sure the table does not already contain the new element. Once these conditions are met, we use double hashing to probe the table for an unoccupied position. A position in the table is unoccupied if it points either to NULL or to the address in vacated, a special member of the hash table data structure that indicates that a position has had an element removed from it. Once we find an unoccupied position in the table, we set the pointer at that position to point to the data we wish to insert. After this, we increment the table size. Assuming we approximate uniform hashing well and the load factor of the hash table is relatively small, the runtime complexity of ohtbl_insert is O(1). This is because, to find an unoccupied position at which to insert the element, we expect to probe 1/(1 – α) positions, a number treated as a small constant, where α is the load factor of the hash table.

ohtbl_remove
The ohtbl_remove operation removes an element from an open-addressed hash table. To remove the element, we use double hashing as in ohtbl_insert to locate the position at which the element resides. We continue searching until we locate the element or find NULL. If we find the element, we set data to the data being removed and decrease the table size by 1. We also set the position in the table to the vacated member of the hash table data structure. Assuming we approximate uniform hashing well, the runtime complexity of ohtbl_remove is O(1). This is because we expect to probe 1/(1 – α) positions, a number treated as a small constant, where α is the largest load factor of the hash table since calling ohtbl_init. The performance of this operation depends on the largest load factor, and thus does not improve as elements are removed, because we must still probe past vacated positions. The use of the vacated member only improves the performance of ohtbl_insert.

ohtbl_lookup
The ohtbl_lookup operation searches for an element in an open-addressed hash table and returns a pointer to it. This operation works similarly to ohtbl_remove, except that the element is not removed from the table. Assuming we approximate uniform hashing well, the runtime complexity of ohtbl_lookup is the same as that of ohtbl_remove, O(1). This is because we expect to probe 1/(1 – α) positions, a number treated as a small constant, where α is the largest load factor of the hash table since calling ohtbl_init. Performance depends on the largest load factor since calling ohtbl_init for the same reason as described for ohtbl_remove.


ohtbl_size
This macro evaluates to the number of elements in an open-addressed hash table. It works by accessing the size member of the OHTbl structure. The runtime complexity of ohtbl_size is O(1) because accessing a member of a structure is a simple task that runs in a constant amount of time.

/* Implementation of the Open-Addressed Hash Table Abstract Datatype */
/* ohtbl.c */
#include <stdlib.h>
#include <string.h>

#include "ohtbl.h"

/* Reserve a sentinel memory address for vacated elements. */
static char vacated;

/* ohtbl_init */
int ohtbl_init(OHTbl *htbl, int positions, int (*h1)(const void *key),
               int (*h2)(const void *key),
               int (*match)(const void *key1, const void *key2),
               void (*destroy)(void *data)) {
    int i;

    /* Allocate space for the hash table. */
    if ((htbl->table = (void **)malloc(positions * sizeof(void *))) == NULL)
        return -1;

    /* Initialize each position. */
    htbl->positions = positions;
    for (i = 0; i < htbl->positions; i++)
        htbl->table[i] = NULL;

    /* Set the vacated member to the sentinel memory address reserved for this. */
    htbl->vacated = &vacated;

    /* Encapsulate the functions. */
    htbl->h1 = h1;
    htbl->h2 = h2;
    htbl->match = match;
    htbl->destroy = destroy;

    /* Initialize the number of elements in the table. */
    htbl->size = 0;
    return 0;
}

/* ohtbl_destroy */
void ohtbl_destroy(OHTbl *htbl) {
    int i;

    if (htbl->destroy != NULL) {
        /* Call a user-defined function to free dynamically allocated data. */
        for (i = 0; i < htbl->positions; i++) {
            if (htbl->table[i] != NULL && htbl->table[i] != htbl->vacated)
                htbl->destroy(htbl->table[i]);
        }
    }

    /* Free the storage allocated for the hash table. */
    free(htbl->table);

    /* No operations are allowed now, but clear the structure as a precaution. */
    memset(htbl, 0, sizeof(OHTbl));
    return;
}

/* ohtbl_insert */
int ohtbl_insert(OHTbl *htbl, const void *data) {
    void *temp;
    int position, i;

    /* Do not exceed the number of positions in the table. */
    if (htbl->size == htbl->positions)
        return -1;

    /* Do nothing if the data is already in the table. */
    temp = (void *)data;
    if (ohtbl_lookup(htbl, &temp) == 0)
        return 1;

    /* Use double hashing to hash the key. */
    for (i = 0; i < htbl->positions; i++) {
        position = (htbl->h1(data) + (i * htbl->h2(data))) % htbl->positions;
        if (htbl->table[position] == NULL || htbl->table[position] == htbl->vacated) {
            /* Insert the data into the table. */
            htbl->table[position] = (void *)data;
            htbl->size++;
            return 0;
        }
    }

    /* Return that the hash functions were selected incorrectly. */
    return -1;
}

/* ohtbl_remove */
int ohtbl_remove(OHTbl *htbl, void **data) {
    int position, i;

    /* Use double hashing to hash the key. */
    for (i = 0; i < htbl->positions; i++) {
        position = (htbl->h1(*data) + (i * htbl->h2(*data))) % htbl->positions;
        if (htbl->table[position] == NULL) {
            /* Return that the data was not found. */
            return -1;
        } else if (htbl->table[position] == htbl->vacated) {
            /* Search beyond vacated positions. */
            continue;
        } else if (htbl->match(htbl->table[position], *data)) {
            /* Pass back the data from the table. */
            *data = htbl->table[position];
            htbl->table[position] = htbl->vacated;
            htbl->size--;
            return 0;
        }
    }

    /* Return that the data was not found. */
    return -1;
}

/* ohtbl_lookup */
int ohtbl_lookup(const OHTbl *htbl, void **data) {
    int position, i;

    /* Use double hashing to hash the key. */
    for (i = 0; i < htbl->positions; i++) {
        position = (htbl->h1(*data) + (i * htbl->h2(*data))) % htbl->positions;
        if (htbl->table[position] == NULL) {
            /* Return that the data was not found. */
            return -1;
        } else if (htbl->table[position] == htbl->vacated) {
            /* Search beyond vacated positions; never pass the sentinel to match. */
            continue;
        } else if (htbl->match(htbl->table[position], *data)) {
            /* Pass back the data from the table. */
            *data = htbl->table[position];
            return 0;
        }
    }

    /* Return that the data was not found. */
    return -1;
}
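To see why the vacated sentinel matters, the following self-contained sketch builds a tiny open-addressed table of 11 positions with double hashing. It uses its own hypothetical names (MiniTbl, mini_insert, mini_lookup, mini_remove) and int keys instead of the ohtbl interface. The use_sentinel flag of mini_remove exists only for the demonstration: when it is 0, a removed position is reset to NULL, which is exactly the mistake the text warns against. Duplicate keys are not checked, to keep the sketch short.

```c
#include <stddef.h>

#define MINI_POSITIONS 11          /* prime table size (assumed for the demo) */

/* The address of this object marks a vacated position. */
static char vacated_marker;
#define VACATED ((void *)&vacated_marker)

typedef struct {
    void *slot[MINI_POSITIONS];    /* NULL = never used, VACATED = removed */
    int size;
} MiniTbl;

/* Double hashing: h1(k) = k mod 11, h2(k) = 1 + (k mod 9), so 1 <= h2 < 11. */
static int mini_probe(int k, int i) {
    int h1 = k % MINI_POSITIONS;
    int h2 = 1 + k % (MINI_POSITIONS - 2);
    return (h1 + i * h2) % MINI_POSITIONS;
}

void mini_init(MiniTbl *t) {
    for (int i = 0; i < MINI_POSITIONS; i++)
        t->slot[i] = NULL;
    t->size = 0;
}

/* Stores a pointer to a non-negative int key; returns 0, or -1 if no
   unoccupied position is found. */
int mini_insert(MiniTbl *t, int *key) {
    for (int i = 0; i < MINI_POSITIONS; i++) {
        int p = mini_probe(*key, i);
        if (t->slot[p] == NULL || t->slot[p] == VACATED) {
            t->slot[p] = key;
            t->size++;
            return 0;
        }
    }
    return -1;
}

/* Stops at NULL, skips VACATED, and compares every other occupant. */
int *mini_lookup(const MiniTbl *t, int key) {
    for (int i = 0; i < MINI_POSITIONS; i++) {
        int p = mini_probe(key, i);
        if (t->slot[p] == NULL)
            return NULL;
        if (t->slot[p] != VACATED && *(int *)t->slot[p] == key)
            return (int *)t->slot[p];
    }
    return NULL;
}

/* Removes key. use_sentinel = 0 resets the position to NULL instead of
   VACATED, reproducing the mistake described in the text. */
int mini_remove(MiniTbl *t, int key, int use_sentinel) {
    for (int i = 0; i < MINI_POSITIONS; i++) {
        int p = mini_probe(key, i);
        if (t->slot[p] == NULL)
            return -1;
        if (t->slot[p] != VACATED && *(int *)t->slot[p] == key) {
            t->slot[p] = use_sentinel ? VACATED : NULL;
            t->size--;
            return 0;
        }
    }
    return -1;
}
```

Keys 3 and 14 both have h1 = 3, so 14 lands at position 9 after one collision. If 3 is then removed with the sentinel, mini_lookup(&t, 14) still finds 14 because position 3 is skipped; if 3 is removed by resetting its position to NULL, the lookup stops at position 3 and wrongly reports that 14 is absent.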


Summary
• Hash tables support one of the most efficient types of searching: hashing.
• Fundamentally, a hash table consists of an array in which data is accessed via a special index called a key.
• The primary idea behind a hash table is to establish a mapping between the set of all possible keys and positions in the array using a hash function. If we were to rely on direct addressing, the hash table would contain more than 26^8 ≈ 2.09 × 10^11 entries, and the majority would be unused, since most character combinations are not names.
• A chained hash table is a hash table that stores data in buckets. Each bucket is a linked list that can grow as large as necessary to accommodate collisions.
• An open-addressed hash table is a hash table that stores data in the table itself instead of in buckets. Collisions are resolved using various methods of probing the table.
• Selecting a hash function is the crux of hashing. By distributing keys in a random manner about the table, collisions are minimised. Thus, it is important to select a hash function that accomplishes this.
• Collision resolution is the method of managing several keys that map to the same index. Chained hash tables have an inherent way to resolve collisions; open-addressed hash tables use various forms of probing.
• A chained hash table fundamentally consists of an array of linked lists. Each list forms a bucket in which we place all elements hashing to a specific position in the array.
• To insert an element, we first pass its key to a hash function in a process called hashing the key.
• When two keys hash to the same position in a hash table, they collide. Chained hash tables have a simple solution for resolving collisions: elements are simply placed in the bucket where the collision occurs.
• The goal of a good hash function is to approximate uniform hashing, that is, to spread elements about a hash table in as uniform and random a manner as possible.



Self Assessment
1. Hash tables support one of the most efficient types of searching, i.e., _____________.
a. hashing
b. multiplying
c. adding
d. subtracting

2. Which of the following statements is true?
a. A hash function accepts a key and dissolves its hash coding, or hash value.
b. A hash function accepts a key and adds its hash coding, or hash value.
c. A hash function accepts a key and returns its hash coding, or hash value.
d. A hash function accepts a key and manipulates its hash coding, or hash value.

3. Which of the following statements is true?
a. Keys vary in type, but hash codings are always decimal.
b. Keys vary in type, but muddle codings are always pointers.
c. Keys vary in type, but hash codings are always hexadecimal.
d. Keys vary in type, but hash codings are always integers.

4. Which of the following is a hash table that stores data in buckets?
a. Chained hash table
b. Open-addressed hash table
c. Collision resolution
d. Selecting hash table

5. Which of the following hash tables stores data in the table itself instead of in buckets?
a. Chained hash table
b. Open-addressed hash table
c. Collision resolution
d. Selecting hash table

6. Collisions are resolved using various methods of ___________ the table.
a. penetrating
b. searching
c. probing
d. pointed

7. Selecting a hash function is the __________ of hashing.
a. heart
b. bottom
c. nub
d. crux


8. Compilers use tables to maintain information about __________ from a program.
a. signs
b. symbols
c. code
d. policy

9. ________________ access information about symbols frequently.
a. Compilers
b. Assemblers
c. Instructors
d. Informers

10. A mechanism for storing and retrieving data in a ___________ manner.
a. code-independent
b. instruction-independent
c. machine-independent
d. table-independent


Application I

Program of Bubble Sort in Perl Language and in Python

Sorting Algorithms
To keep business records and sort them by ID number or by the client's last name, one needs a sorting algorithm. To understand the more complex and efficient sorting algorithms, it is important to first understand the simpler but slower ones. This application covers bubble sort, implemented in Perl and in Python. Simple algorithms like this are good enough for most small tasks; to process a large amount of data, you would want a more efficient algorithm, such as the merge sort presented in Application II.

Bubble sort
The simplest sorting algorithm is bubble sort. Bubble sort works by iterating down the array to be sorted from the first element to the last, comparing each pair of adjacent elements and swapping them if they are out of order. This process is repeated as many times as necessary, until the array is sorted. The worst case is an array in reverse order: the element that belongs first starts in the last position and moves only one position toward the front on each pass, so an array of length n may need n – 1 passes to become sorted (plus one more pass that confirms no swaps remain) and as many as n(n – 1)/2 exchanges. The algorithm is as follows:

Algorithm
1. Iterate through every element of the array, starting with the first two elements.
2. If the left element is bigger than the right element, swap them.
3. Repeat steps 1 and 2 until there are no more swaps.
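The behaviour of these steps on a reverse-ordered array can be checked with a short bubble sort instrumented to count passes and exchanges. It is written in C, the language used for the other code in this book, rather than Perl or Python; the BubbleStats structure is a hypothetical addition for the demonstration, and the sort itself follows the three algorithm steps.

```c
#include <stddef.h>

/* Counters reported by bubble_sort (hypothetical helper structure). */
typedef struct {
    int passes;
    int swaps;
} BubbleStats;

/* Bubble sort: sweep adjacent pairs, swap out-of-order ones, and repeat
   until a full pass makes no swaps. */
BubbleStats bubble_sort(int *a, size_t n) {
    BubbleStats st = {0, 0};
    int swapped = 1;
    while (swapped) {
        swapped = 0;
        st.passes++;
        for (size_t i = 0; i + 1 < n; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
                st.swaps++;
                swapped = 1;
            }
        }
    }
    return st;
}
```

For the reverse-ordered array {5, 4, 3, 2, 1} (n = 5), bubble_sort reports 5 passes, the last of which makes no swaps, and 10 = n(n – 1)/2 exchanges.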

Block Diagram

Fig. 1.1 Bubble sorting technique


Perl

#!/apps/perl/bin/perl -w

&main();

sub main {
    my @array = (1,7,4,9,4,7,2,3,0,8);
    print "Before Sort:-\n";
    print "@array\n";
    &bubbleSort(\@array);
    print "After Sort:-\n";
    print "@array\n";
} # main

sub bubbleSort {
    my $aref = shift(@_);
    my $swapHappened = 1;
    while ($swapHappened) {
        $swapHappened = 0;
        for (my $i = 0; $i < @$aref - 1; $i++) {
            if ($aref->[$i] > $aref->[$i+1]) {
                &swap(\$aref->[$i], \$aref->[$i+1]);
                $swapHappened = 1;
            } # if
        } # for
    } # while
} # bubbleSort

sub swap {
    my ($x, $y) = @_;
    my $tmp = $$x;
    $$x = $$y;
    $$y = $tmp;
} # swap

Python

#!/usr/bin/env python

def main():
    array = [1, 7, 4, 9, 4, 7, 2, 3, 0, 8]
    print("Before sort:-")
    print(array)
    bubbleSort(array)
    print("After sort:-")
    print(array)

def bubbleSort(array):
    swapHappened = True
    while swapHappened:
        swapHappened = False
        for x in range(0, len(array) - 1):
            if array[x] > array[x + 1]:
                # Swap data
                array[x], array[x + 1] = array[x + 1], array[x]
                swapHappened = True

main()

Questions
1. What is a sorting algorithm?
2. Write a program for bubble sort.
3. Draw a block diagram of bubble sort.
4. Write the code for bubble sort in Python.


Application II

Merge sort in C++ and C#

Sorting Technique
In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalising data and for producing human-readable output. More formally, the output must satisfy two conditions:
1. The output is in non-decreasing order (each element is no smaller than the previous element according to the desired total order).
2. The output is a permutation, or reordering, of the input.
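The two formal conditions can be turned into a small checker for the output of any sorting routine. This is a sketch, not a sorting algorithm: the helper names are hypothetical, and the permutation test simply sorts copies of both arrays with the C library's qsort and compares them, which is one convenient way (not the only one) to compare the two collections of values.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Condition 1: each element is no smaller than the previous one. */
int is_nondecreasing(const int *a, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (a[i] < a[i - 1])
            return 0;
    return 1;
}

/* Condition 2: out holds exactly the same values as in, in any order.
   Returns 1 or 0, or -1 if temporary storage cannot be allocated. */
int is_permutation(const int *in, const int *out, size_t n) {
    if (n == 0)
        return 1;
    int *x = malloc(n * sizeof *x);
    int *y = malloc(n * sizeof *y);
    int result = -1;
    if (x != NULL && y != NULL) {
        memcpy(x, in, n * sizeof *x);
        memcpy(y, out, n * sizeof *y);
        qsort(x, n, sizeof *x, cmp_int);
        qsort(y, n, sizeof *y, cmp_int);
        result = memcmp(x, y, n * sizeof *x) == 0;
    }
    free(x);
    free(y);
    return result;
}

/* A valid sort output satisfies both conditions. */
int is_valid_sort_output(const int *in, const int *out, size_t n) {
    return is_nondecreasing(out, n) && is_permutation(in, out, n) == 1;
}
```

For the input {5, 2, 4, 6, 1, 3, 2, 6}, the output {1, 2, 2, 3, 4, 5, 6, 6} passes both checks, whereas {2, 1, 2, 3, 4, 5, 6, 6} fails the first condition and {1, 2, 3, 4, 5, 6, 6, 6} fails the second.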

Merge Sort
• Sorting schemes can be classified as internal (sorting data already in memory) and external (sorting a collection of data from secondary storage).
• Merge sort is a sorting algorithm that is useful for both internal and external sorting.
• The merging problem is simpler than sorting an unsorted array, and it is a tool we can use in merge sort.
• The merging problem is this: you are given two arrays, each of which is already sorted. Your job is to efficiently combine the two arrays into one larger array that contains all of the values of the two smaller arrays in sorted order.
The essential idea is this:

1. Keep track of the smallest value in each array that has not yet been placed in the larger array.
2. Compare these two smallest values. One of them must be the smallest of all the values remaining in both arrays.
3. Place the smaller of the two values in the next location in the larger array.
4. Adjust the smallest value for the array it came from.
5. Continue this process until one array runs out; then copy the remaining values of the other array, which are already in order, to the end of the larger array.
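These steps can be sketched in Python with one index per input array. Each index marks the smallest value that has not yet been placed. The function name `merge_sorted` is hypothetical.

```python
def merge_sorted(a, b):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0  # positions of the smallest not-yet-placed values in a and b
    while i < len(a) and j < len(b):
        # The smaller of the two front values is the smallest value left overall.
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # One list is exhausted; the rest of the other is already in order.
    result.extend(a[i:])
    result.extend(b[j:])
    return result

print(merge_sorted([2, 4, 5, 6], [1, 2, 3, 6]))  # [1, 2, 2, 3, 4, 5, 6, 6]
```

Each value is placed exactly once, so merging arrays of total length n takes time proportional to n.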

Illustration of Merge Algorithm
Here is an illustration of merge sort on eight numbers. Adjacent runs are merged level by level until a single sorted sequence remains.

initial sequence:          5 2 4 6 1 3 2 6
after first merge level:   2 5 | 4 6 | 1 3 | 2 6
after second merge level:  2 4 5 6 | 1 2 3 6
sorted sequence:           1 2 2 3 4 5 6 6

Fig. 1.2 Merge sort
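The figure builds the result from the bottom up. It merges runs of length 1, then 2, then 4. The Python sketch below reproduces each level for the same eight numbers, with hypothetical function names. It is a bottom-up (iterative) variant of merge sort, whereas the programs that follow split the list recursively from the top down.

```python
def merge_runs(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]

def bottom_up_merge_sort(values):
    """Sort by merging adjacent runs of width 1, 2, 4, ...
    Returns the sequence after every level, starting with the input."""
    seq = list(values)
    levels = [list(seq)]
    width = 1
    while width < len(seq):
        merged = []
        # Merge each pair of neighbouring runs of the current width.
        for lo in range(0, len(seq), 2 * width):
            merged.extend(merge_runs(seq[lo:lo + width],
                                     seq[lo + width:lo + 2 * width]))
        seq = merged
        levels.append(list(seq))
        width *= 2
    return levels

for level in bottom_up_merge_sort([5, 2, 4, 6, 1, 3, 2, 6]):
    print(level)
# [5, 2, 4, 6, 1, 3, 2, 6]
# [2, 5, 4, 6, 1, 3, 2, 6]
# [2, 4, 5, 6, 1, 2, 3, 6]
# [1, 2, 2, 3, 4, 5, 6, 6]
```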


1. Program of merge sort in C#

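using System;
using System.Collections;

// The two methods are wrapped in a class (name illustrative) so that the listing compiles.
public class MergeSorter
{
    public IList MergeSort(IList list)
    {
        // A list of zero or one element is already sorted
        if (list.Count <= 1)
            return list;

        // Split the list into two halves
        int mid = list.Count / 2;
        IList left = new ArrayList();
        IList right = new ArrayList();
        for (int i = 0; i < mid; i++)
            left.Add(list[i]);
        for (int i = mid; i < list.Count; i++)
            right.Add(list[i]);

        return Merge(MergeSort(left), MergeSort(right));
    }

    public IList Merge(IList left, IList right)
    {
        IList rv = new ArrayList();
        // Repeatedly move the smaller front element into the result.
        // Note: RemoveAt(0) shifts the remaining elements, so each call costs O(n).
        while (left.Count > 0 && right.Count > 0)
        {
            if (((IComparable)left[0]).CompareTo(right[0]) > 0)
            {
                rv.Add(right[0]);
                right.RemoveAt(0);
            }
            else
            {
                rv.Add(left[0]);
                left.RemoveAt(0);
            }
        }
        // Append whatever remains in either list
        for (int i = 0; i < left.Count; i++)
            rv.Add(left[i]);
        for (int i = 0; i < right.Count; i++)
            rv.Add(right[i]);
        return rv;
    }
}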

2. Program of merge sort in C++

#include <vector>
using std::vector;

// merge() is defined after merge_sort(), so declare it first
vector<int> merge(const vector<int>& left, const vector<int>& right);

//! \brief Performs a recursive merge sort on the given vector
//! \param vec The vector to be sorted using the merge sort
//! \return The sorted resultant vector after merge sort is
//!         complete.
vector<int> merge_sort(vector<int>& vec)
{
    // Termination condition: a list is completely sorted if it
    // contains at most one element (<= 1 also covers an empty
    // vector, which would otherwise recurse forever).
    if (vec.size() <= 1)
        return vec;

    // Determine the location of the middle element in the vector
    std::vector<int>::iterator middle = vec.begin() + (vec.size() / 2);

    vector<int> left(vec.begin(), middle);
    vector<int> right(middle, vec.end());

    // Perform a merge sort on the two smaller vectors
    left = merge_sort(left);
    right = merge_sort(right);

    return merge(left, right);
}

And here is the implementation of the merge function:

//! \brief Merges two sorted vectors into one sorted vector
//! \param left A sorted vector of integers
//! \param right A sorted vector of integers
//! \return A sorted vector that is the result of merging two sorted
//!         vectors.
vector<int> merge(const vector<int>& left, const vector<int>& right)
{
    // Fill the resultant vector with sorted results from both vectors
    vector<int> result;
    unsigned left_it = 0, right_it = 0;

    while (left_it < left.size() && right_it < right.size())
    {
        // If the left value is smaller than the right it goes next
        // into the resultant vector
        if (left[left_it] < right[right_it])
        {
            result.push_back(left[left_it]);
            left_it++;
        }
        else
        {
            result.push_back(right[right_it]);
            right_it++;
        }
    }

    // Push the remaining data from both vectors onto the resultant
    while (left_it < left.size())
    {
        result.push_back(left[left_it]);
        left_it++;
    }

    while (right_it < right.size())
    {
        result.push_back(right[right_it]);
        right_it++;
    }

    return result;
}

Here is another recursive implementation of merge sort. It sorts a plain C++ array in place, using a dynamically allocated temporary buffer for each merge:

#include <iostream>
using namespace std;

// Merges the sorted runs a[low..mid] and a[mid+1..high] (both inclusive).
void merge(int a[], const int low, const int mid, const int high)
{
    // Variables declaration.
    int * b = new int[high+1-low];
    int h, i, j, k;
    h = low;
    i = 0;
    j = mid+1;

    // Merges the two arrays into b[] until the first one is finished
    while ((h <= mid) && (j <= high))
    {
        if (a[h] <= a[j])
        {
            b[i] = a[h];
            h++;
        }
        else
        {
            b[i] = a[j];
            j++;
        }
        i++;
    }

    // Completes the array, filling in the missing values
    if (h > mid)
    {
        for (k = j; k <= high; k++)
        {
            b[i] = a[k];
            i++;
        }
    }
    else
    {
        for (k = h; k <= mid; k++)
        {
            b[i] = a[k];
            i++;
        }
    }

    // Copies the result back into the original array
    for (k = 0; k <= high-low; k++)
        a[k+low] = b[k];
    delete[] b;
}

void merge_sort(int a[], const int low, const int high)
{
    // Recursive sort ...
    int mid;
    if (low < high)
    {
        mid = (low+high)/2;
        merge_sort(a, low, mid);
        merge_sort(a, mid+1, high);
        merge(a, low, mid, high);
    }
}

int main()
{
    // Example data (illustrative): a[] is the array to be sorted and
    // arraySize is the number of elements in a[].
    int a[] = {5, 2, 4, 6, 1, 3, 2, 6};
    int arraySize = sizeof(a) / sizeof(a[0]);

    // high is inclusive here, hence arraySize-1. Rewriting the functions
    // so that merge_sort(a, 0, arraySize) works is a natural exercise.
    merge_sort(a, 0, arraySize-1);

    for (int k = 0; k < arraySize; k++)
        cout << a[k] << " ";
    cout << endl;
    return 0;
}
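The suggestion to call `merge_sort(a, 0, arraySize)` works if the upper bound is treated as exclusive, that is, as a half-open range [lo, hi). The Python sketch below assumes that convention. It follows the same steps as the C++ `merge` above: merge the two halves into a temporary buffer, then copy the buffer back. The name `merge_sort_range` is hypothetical.

```python
def merge_sort_range(a, lo, hi):
    """Sort a[lo:hi] in place; hi is exclusive (half-open range)."""
    if hi - lo <= 1:          # zero or one element: already sorted
        return
    mid = (lo + hi) // 2
    merge_sort_range(a, lo, mid)
    merge_sort_range(a, mid, hi)
    # Merge the runs a[lo:mid] and a[mid:hi] into a temporary buffer.
    buf = []
    h, j = lo, mid
    while h < mid and j < hi:
        if a[h] <= a[j]:
            buf.append(a[h])
            h += 1
        else:
            buf.append(a[j])
            j += 1
    buf.extend(a[h:mid])      # at most one of these two is non-empty
    buf.extend(a[j:hi])
    # Copy the merged run back into the original list.
    a[lo:hi] = buf

data = [5, 2, 4, 6, 1, 3, 2, 6]
merge_sort_range(data, 0, len(data))   # no "- 1" with a half-open range
print(data)  # [1, 2, 2, 3, 4, 5, 6, 6]
```

With half-open ranges the two halves are [lo, mid) and [mid, hi). The "+1" and "-1" adjustments in the inclusive version are no longer needed.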

3. Program of merge sort in Java

public int[] mergeSort(int array[])
{
    if (array.length > 1)
    {
        // Split the array into two halves, arr1 and arr2
        int elementsInA1 = array.length / 2;
        int elementsInA2 = array.length - elementsInA1;
        int arr1[] = new int[elementsInA1];
        int arr2[] = new int[elementsInA2];

        for (int i = 0; i < elementsInA1; i++)
            arr1[i] = array[i];

        for (int i = elementsInA1; i < elementsInA1 + elementsInA2; i++)
            arr2[i - elementsInA1] = array[i];

        // Sort each half recursively
        arr1 = mergeSort(arr1);
        arr2 = mergeSort(arr2);

        // Merge the sorted halves back into array
        int i = 0, j = 0, k = 0;

        while (arr1.length != j && arr2.length != k)
        {
            if (arr1[j] <= arr2[k])
            {
                array[i] = arr1[j];
                i++;
                j++;
            }
            else
            {
                array[i] = arr2[k];
                i++;
                k++;
            }
        }

        // Copy any elements left over in either half
        while (arr1.length != j)
        {
            array[i] = arr1[j];
            i++;
            j++;
        }

        while (arr2.length != k)
        {
            array[i] = arr2[k];
            i++;
            k++;
        }
    }

    return array;
}

4. Program of merge sort in JavaScript

function merge_sort(arr) {
    var l = arr.length, m = Math.floor(l / 2);
    if (l <= 1) return arr;
    return merge(merge_sort(arr.slice(0, m)), merge_sort(arr.slice(m)));
}

function merge(left, right) {
    var result = [];
    var ll = left.length, rl = right.length;
    while (ll > 0 && rl > 0) {
        if (left[0] <= right[0]) {
            result.push(left.shift());
            ll--;
        } else {
            result.push(right.shift());
            rl--;
        }
    }
    // At most one of the two arrays still has elements; append them
    if (ll > 0) {
        result.push.apply(result, left);
    } else if (rl > 0) {
        result.push.apply(result, right);
    }
    return result;
}


5. Program of merge sort in PHP

function merge_sort(&$arrayToSort)
{
    if (sizeof($arrayToSort) <= 1)
        return $arrayToSort;

    // split our input array into two halves
    // left...
    $leftFrag = array_slice($arrayToSort, 0, (int)(count($arrayToSort)/2));
    // right...
    $rightFrag = array_slice($arrayToSort, (int)(count($arrayToSort)/2));

    // RECURSION
    // split the two halves into their respective halves...
    $leftFrag = merge_sort($leftFrag);
    $rightFrag = merge_sort($rightFrag);

    $returnArray = merge($leftFrag, $rightFrag);

    return $returnArray;
}

function merge(&$lF, &$rF)
{
    $result = array();

    // while both arrays have something in them
    while (count($lF) > 0 && count($rF) > 0) {
        if ($lF[0] <= $rF[0]) {
            array_push($result, array_shift($lF));
        }
        else {
            array_push($result, array_shift($rF));
        }
    }

    // did not see this in the pseudo code,
    // but it became necessary as one of the arrays
    // can become empty before the other
    array_splice($result, count($result), 0, $lF);
    array_splice($result, count($result), 0, $rF);

    return $result;
}

6. Program of merge sort in Python

def sort(array):
    if len(array) <= 1:
        return array
    mid = len(array) // 2
    return merge(sort(array[0:mid]), sort(array[mid:]))

# this may not be the most thoroughly idiomatic python, or the
# most efficient merge (it duplicates data when "Transmitting")
# but it works
def merge(left, right):
    # Assumes left and right are both non-empty, which sort() guarantees
    merged = []
    i = 0
    j = 0
    while len(merged) < len(left) + len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
            if i == len(left):
                # Knuth, TaoCP Vol 3 5.2.4 calls this the "transmit"
                merged.extend(right[j:])
                break
        else:
            merged.append(right[j])
            j += 1
            if j == len(right):
                merged.extend(left[i:])
                break
    return merged

An alternative implementation splits the list with slicing and merges by popping from the front of each list:

def mergesort(n):
    """Recursively merge sort a list. Returns the sorted list."""
    front = n[:len(n) // 2]
    back = n[len(n) // 2:]
    if len(front) > 1:
        front = mergesort(front)
    if len(back) > 1:
        back = mergesort(back)
    return merge(front, back)

def merge(front, back):
    """Merge two sorted lists together. Returns the merged list."""
    result = []
    while front and back:
        # pick the smaller one from the front and stick it on
        # note that list.pop(0) is a linear operation, so this gives quadratic running time...
        result.append(front.pop(0) if front[0] <= back[0] else back.pop(0))
    # add the remaining end
    result.extend(front or back)
    return result
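The comment in the last `merge` notes that `list.pop(0)` is linear, which makes the merge quadratic. One way around this, sketched below under hypothetical names, is to copy each input into a `collections.deque`. Its `popleft()` removes the front element in constant time, so the overall running time returns to O(n log n) while the structure of the algorithm stays the same.

```python
from collections import deque

def merge_deque(front, back):
    """Merge two sorted sequences; deque.popleft() is O(1)."""
    front, back = deque(front), deque(back)
    result = []
    while front and back:
        # Take from front on ties, so equal elements keep their order.
        result.append(front.popleft() if front[0] <= back[0] else back.popleft())
    # At most one deque is non-empty; append what is left.
    result.extend(front or back)
    return result

def mergesort_deque(n):
    """Recursively merge sort a list using merge_deque. Returns a new list."""
    if len(n) <= 1:
        return list(n)
    mid = len(n) // 2
    return merge_deque(mergesort_deque(n[:mid]), mergesort_deque(n[mid:]))

print(mergesort_deque([1, 7, 4, 9, 4, 7, 2, 3, 0, 8]))
# [0, 1, 2, 3, 4, 4, 7, 7, 8, 9]
```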

7. Program of merge sort in Ruby

def mergesort(list)
  return list if list.size <= 1
  mid   = list.size / 2
  left  = list[0, mid]
  right = list[mid, list.size]
  merge(mergesort(left), mergesort(right))
end

def merge(left, right)
  sorted = []
  until left.empty? or right.empty?
    if left.first <= right.first
      sorted << left.shift
    else
      sorted << right.shift
    end
  end
  sorted.concat(left).concat(right)
end

Questions
1. What is merge sort?
2. Write briefly about sorting techniques.
3. Draw a neat and clean diagram of merge sort.
4. Write a program of merge sort in Ruby.
5. Write a program of merge sort in Python.


Application III

Queue

Queue Data Structure
A queue is a data structure that maintains "First In, First Out" (FIFO) order. It can be pictured as people queuing up to buy tickets. In programming, a queue is commonly used as the data structure behind BFS (breadth-first search), where vertices are visited in the order in which they are discovered.
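The role of the queue in BFS can be sketched in Python. The graph below is a hypothetical adjacency list, and `collections.deque` serves as the FIFO queue: `append` enqueues at the back and `popleft` dequeues from the front.

```python
from collections import deque

def bfs_order(graph, start):
    """Return vertices in the order breadth-first search visits them."""
    visited = {start}
    order = []
    queue = deque([start])              # FIFO queue of discovered vertices
    while queue:
        vertex = queue.popleft()        # dequeue from the front
        order.append(vertex)
        for neighbour in graph[vertex]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)  # enqueue at the back

    return order

# Hypothetical example graph as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": [],
}
print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D', 'E']
```

Because the queue is FIFO, every vertex at distance 1 from the start is visited before any vertex at distance 2.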

Fig. 2 Queue

Queue operations
The operations on a queue Q are:
1. Enqueue: insert an item at the back of queue Q.
2. Dequeue: return (and logically remove) the front item from queue Q.
3. Init: initialize queue Q, resetting all variables.
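A minimal Python sketch of the three queue operations follows. It stores the items in a `collections.deque`. The class and method names are illustrative, and raising `IndexError` on an empty dequeue is an assumed design choice; the C program below prints a message instead.

```python
from collections import deque

class Queue:
    """A FIFO queue offering init, enqueue and dequeue."""

    def __init__(self):
        self.init()

    def init(self):
        # Reset the queue to empty.
        self._items = deque()

    def enqueue(self, item):
        # Insert item at the back of the queue.
        self._items.append(item)

    def dequeue(self):
        # Return and remove the front item; fail if the queue is empty.
        if not self._items:
            raise IndexError("dequeue from empty queue")
        return self._items.popleft()

    def __len__(self):
        return len(self._items)

q = Queue()
for x in (1, 2, 3):
    q.enqueue(x)
print(q.dequeue(), q.dequeue())  # 1 2
```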

Title: Program of queue using array

Aim: To write a program for a queue in C, C++ and Java

Program of Queue in C:

#include <stdio.h>
#include <stdlib.h>   /* for exit() */
#define MAX 5

int queue_arr[MAX];
int rear = -1;
int front = -1;

void insert(void);
void del(void);
void display(void);

int main(void)
{
    int choice;
    while (1)
    {
        printf("1.Insert\n");
        printf("2.Delete\n");
        printf("3.Display\n");
        printf("4.Quit\n");
        printf("Enter your choice : ");
        scanf("%d", &choice);

        switch (choice)
        {
        case 1:
            insert();
            break;
        case 2:
            del();
            break;
        case 3:
            display();
            break;
        case 4:
            exit(0);
        default:
            printf("Wrong choice\n");
        } /*End of switch*/
    } /*End of while*/
} /*End of main()*/

void insert(void)
{
    int added_item;
    if (rear == MAX-1)
        printf("Queue Overflow\n");
    else
    {
        if (front == -1) /*If queue is initially empty */
            front = 0;
        printf("Input the element for adding in queue : ");
        scanf("%d", &added_item);
        rear = rear+1;
        queue_arr[rear] = added_item;
    }
} /*End of insert()*/

void del(void)
{
    if (front == -1 || front > rear)
    {
        printf("Queue Underflow\n");
        return;
    }
    else
    {
        printf("Element deleted from queue is : %d\n", queue_arr[front]);
        front = front+1;
    }
} /*End of del() */

void display(void)
{
    int i;
    /* front > rear means every inserted element has been deleted */
    if (front == -1 || front > rear)
        printf("Queue is empty\n");
    else
    {
        printf("Queue is :\n");
        for (i = front; i <= rear; i++)
            printf("%d ", queue_arr[i]);
        printf("\n");
    }
} /*End of display() */

Program of Queue Implementation in C++

/* Problem : Queue - Data Structure
 * Author  : Stephanus
 * Lang.   : C++
 * Date    : 20 January 2004
 */
#include <iostream>
#define QUEUE_SIZE 100
using namespace std;

class Queue
{
    int q[QUEUE_SIZE];
    int first, last;
    int count;
public:
    Queue();
    void enqueue(int x);
    int dequeue();
    int getSize();
};

Queue::Queue()
{
    first = 0;
    last = QUEUE_SIZE - 1;
    count = 0;
}

// Circular buffer: last wraps around to index 0 after QUEUE_SIZE - 1.
// Note: no overflow (count == QUEUE_SIZE) check is performed.
void Queue::enqueue(int x)
{
    last = (last + 1) % QUEUE_SIZE;
    q[last] = x;
    count++;
}

// Note: no underflow (count == 0) check is performed.
int Queue::dequeue()
{
    int x = q[first];
    first = (first + 1) % QUEUE_SIZE;
    count--;
    return x;
}

int Queue::getSize()
{
    return count;
}

int main()
{
    Queue q;
    q.enqueue(1);
    q.enqueue(2);
    q.enqueue(3);
    while (q.getSize())
        cout << q.dequeue() << endl;
    return 0;
}
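The C++ class keeps `first` and `last` modulo `QUEUE_SIZE`, so slots freed by a dequeue are reused. The linear array in the C program works differently: `rear` only ever grows, so it reports "Queue Overflow" once `rear` reaches `MAX-1`, even if every item has since been deleted. The C++ class, however, performs no overflow or underflow checks. The Python sketch below uses the same first/last/count scheme and adds both checks. The class name is hypothetical, and raising exceptions instead of printing messages is an assumed design choice.

```python
class CircularQueue:
    """Fixed-capacity FIFO queue stored in a list used as a ring buffer."""

    def __init__(self, capacity):
        self.q = [None] * capacity
        self.capacity = capacity
        self.first = 0                # index of the front item
        self.last = capacity - 1      # index of the back item (wraps around)
        self.count = 0

    def enqueue(self, x):
        if self.count == self.capacity:
            raise OverflowError("Queue Overflow")
        self.last = (self.last + 1) % self.capacity
        self.q[self.last] = x
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("Queue Underflow")
        x = self.q[self.first]
        self.first = (self.first + 1) % self.capacity
        self.count -= 1
        return x

# With capacity 3, the slot freed by a dequeue is reused.
cq = CircularQueue(3)
for x in (1, 2, 3):
    cq.enqueue(x)
print(cq.dequeue())   # 1
cq.enqueue(4)         # wraps around into the slot that held 1
print([cq.dequeue() for _ in range(3)])  # [2, 3, 4]
```

Tracking `count` separately resolves the ambiguity between full and empty. In both states `first` sits one position after `last`, modulo the capacity.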

Implementation in Java

/* Problem : Queue - Data Structure
 * Author  : Stephanus
 * Lang.   : JAVA
 * Date    : 20 January 2004
 */
class Queue
{
    final int QUEUE_SIZE = 100;
    private int[] q = new int[QUEUE_SIZE];
    private int first, last;
    private int count;

    Queue()
    {
        first = 0;
        last = QUEUE_SIZE - 1;
        count = 0;
    }

    // Circular buffer: indices wrap around modulo QUEUE_SIZE
    public void enqueue(int x)
    {
        last = (last + 1) % QUEUE_SIZE;
        q[last] = x;
        count++;
    }

    public int dequeue()
    {
        int x = q[first];
        first = (first + 1) % QUEUE_SIZE;
        count--;
        return x;
    }

    public int getSize()
    {
        return count;
    }
}

// Renamed from "queue": a class differing from Queue only in case
// clashes on case-insensitive file systems. Save as QueueDemo.java.
public class QueueDemo
{
    public static void main(String[] args)
    {
        Queue q = new Queue();
        q.enqueue(1);
        q.enqueue(2);
        q.enqueue(3);
        while (q.getSize() != 0)
            System.out.println(q.dequeue());
    }
}


Questions
1. What is a queue in data structures?
2. What do you mean by FIFO? Draw a neat and clean diagram to explain the same.
3. Explain the operations of a queue.
4. Write a program in C for a queue.
5. Write a program in Java for a queue.


Answer-1
Queue Data Structure
A queue is a data structure that maintains "First In, First Out" (FIFO) order. It can be pictured as people queuing up to buy tickets. In programming, a queue is commonly used as the data structure behind BFS (breadth-first search).

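Answer-2
FIFO means First In, First Out: the element that was inserted first is the first one to be removed, just as the person at the front of a ticket queue is served first.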

Answer-3
Queue operations
The operations on a queue Q are:
1. Enqueue: insert an item at the back of queue Q.
2. Dequeue: return (and logically remove) the front item from queue Q.
3. Init: initialize queue Q, resetting all variables.

Answer-4

#include <stdio.h>
#include <stdlib.h>   /* for exit() */
#define MAX 5

int queue_arr[MAX];
int rear = -1;
int front = -1;

void insert(void);
void del(void);
void display(void);

int main(void)
{
    int choice;
    while (1)
    {
        printf("1.Insert\n");
        printf("2.Delete\n");
        printf("3.Display\n");
        printf("4.Quit\n");
        printf("Enter your choice : ");
        scanf("%d", &choice);

        switch (choice)
        {
        case 1:
            insert();
            break;
        case 2:
            del();
            break;
        case 3:
            display();
            break;
        case 4:
            exit(0);
        default:
            printf("Wrong choice\n");
        } /*End of switch*/
    } /*End of while*/
} /*End of main()*/

void insert(void)
{
    int added_item;
    if (rear == MAX-1)
        printf("Queue Overflow\n");
    else
    {
        if (front == -1) /*If queue is initially empty */
            front = 0;
        printf("Input the element for adding in queue : ");
        scanf("%d", &added_item);
        rear = rear+1;
        queue_arr[rear] = added_item;
    }
} /*End of insert()*/

void del(void)
{
    if (front == -1 || front > rear)
    {
        printf("Queue Underflow\n");
        return;
    }
    else
    {
        printf("Element deleted from queue is : %d\n", queue_arr[front]);
        front = front+1;
    }
} /*End of del() */

void display(void)
{
    int i;
    /* front > rear means every inserted element has been deleted */
    if (front == -1 || front > rear)
        printf("Queue is empty\n");
    else
    {
        printf("Queue is :\n");
        for (i = front; i <= rear; i++)
            printf("%d ", queue_arr[i]);
        printf("\n");
    }
} /*End of display() */

Answer-5
Implementation in Java

/* Problem : Queue - Data Structure
 * Author  : Stephanus
 * Lang.   : JAVA
 * Date    : 20 January 2004
 */
class Queue
{
    final int QUEUE_SIZE = 100;
    private int[] q = new int[QUEUE_SIZE];
    private int first, last;
    private int count;

    Queue()
    {
        first = 0;
        last = QUEUE_SIZE - 1;
        count = 0;
    }

    // Circular buffer: indices wrap around modulo QUEUE_SIZE
    public void enqueue(int x)
    {
        last = (last + 1) % QUEUE_SIZE;
        q[last] = x;
        count++;
    }

    public int dequeue()
    {
        int x = q[first];
        first = (first + 1) % QUEUE_SIZE;
        count--;
        return x;
    }

    public int getSize()
    {
        return count;
    }
}

// Renamed from "queue": a class differing from Queue only in case
// clashes on case-insensitive file systems. Save as QueueDemo.java.
public class QueueDemo
{
    public static void main(String[] args)
    {
        Queue q = new Queue();
        q.enqueue(1);
        q.enqueue(2);
        q.enqueue(3);
        while (q.getSize() != 0)
            System.out.println(q.dequeue());
    }
}


Self Assessment Answers

Chapter I
1. a   2. d   3. c   4. a   5. c   6. d   7. b   8. b   9. a   10. a

Chapter II
1. b   2. c   3. a   4. c   5. b   6. a   7. d   8. b   9. d   10. b

Chapter III
1. a, b   2. a   3. b   4. a   5. c   6. d   7. a, b, c   8. d   9. b   10. a

Chapter IV
1. a   2. b   3. a   4. c   5. d   6. a   7. d   8. a   9. c   10. a

Chapter V
1. d   2. a   3. a   4. b   5. d   6. a   7. b   8. a   9. a   10. a

Chapter VI
1. a   2. c   3. d   4. a   5. b   6. c   7. d   8. b   9. a   10. c
