Course notes CS2606: Data Structures and Object-Oriented Development Ryan Richardson John Paul Vergara Department of Computer Science Virginia Tech Spring 2008

Course notes

CS2606: Data Structures and Object-Oriented Development

Ryan Richardson
John Paul Vergara

Department of Computer Science
Virginia Tech
Spring 2008

Goals of this Course

1. Reinforce the concept that costs and benefits exist for every data structure.

2. Learn the commonly used data structures.
   – These form a programmer's basic data structure "toolkit."

3. Understand how to measure the cost of a data structure or program.
   – These techniques also allow you to judge the merits of new data structures that you or others might invent.

The Need for Data Structures

Data structures organize data ⇒ more efficient programs.

More powerful computers ⇒ more complex applications.

More complex applications demand more calculations.

Complex computing tasks are unlike our everyday experience.

Organizing Data

Any organization for a collection of records can be searched, processed in any order, or modified.

The choice of data structure and algorithm can make the difference between a program running in a few seconds or many days.

Efficiency

A solution is said to be efficient if it solves the problem within its resource constraints.
– Space
– Time

• The cost of a solution is the amount of resources that the solution consumes.

Selecting a Data Structure

Select a data structure as follows:

1. Analyze the problem to determine the basic operations that must be supported.

2. Quantify the resource constraints for each operation.

3. Select the data structure that best meets these requirements.

Some Questions to Ask

• Are all data inserted into the data structure at the beginning, or are insertions interspersed with other operations?

• Can data be deleted?

• Are all data processed in some well-defined order, or is random access allowed?

Costs and Benefits

Each data structure has costs and benefits.

Rarely is one data structure better than another in all situations.

Any data structure requires:
– space for each data item it stores,
– time to perform each basic operation,
– programming effort.

Costs and Benefits (cont)

Each problem has constraints on available space and time.

Only after a careful analysis of problem characteristics can we know the best data structure for the task.

Bank example:
– Start account: a few minutes
– Transactions: a few seconds
– Close account: overnight

Example 1.2

Problem: Create a database containing information about cities and towns.

Tasks: Find by name or attribute or location
• Exact match, range query, spatial query

Resource requirements: Times can be from a few seconds for simple queries to a minute or two for complex queries.
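
As a rough illustration of how the first two query types differ, here is a minimal sketch. It assumes, purely for the example, that cities are keyed by name in a std::map and that each record is a single number; the CityDB alias and the functions findExact and findRange are hypothetical names, not part of the course materials. A spatial query by location is not handled well by this kind of one-dimensional ordering.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record store for illustration: city name -> some numeric attribute.
using CityDB = std::map<std::string, long>;

// Exact match: pointer to the stored attribute, or nullptr if the name is absent.
const long* findExact(const CityDB& db, const std::string& name) {
    auto it = db.find(name);
    return it == db.end() ? nullptr : &it->second;
}

// Range query: every name n with lo <= n <= hi, in sorted order.
std::vector<std::string> findRange(const CityDB& db,
                                   const std::string& lo,
                                   const std::string& hi) {
    std::vector<std::string> out;
    // lower_bound jumps to the first key >= lo; the loop then walks keys in order.
    for (auto it = db.lower_bound(lo); it != db.end() && it->first <= hi; ++it)
        out.push_back(it->first);
    return out;
}
```

Both queries rely on the keys being ordered along a single dimension (the name).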

Abstract Data Types

Abstract Data Type (ADT): a definition for a data type solely in terms of a set of values and a set of operations on that data type.

Each ADT operation is defined by its inputs and outputs.

Encapsulation: Hide implementation details.

Data Structure

• A data structure is the physical implementation of an ADT.
– Each operation associated with the ADT is implemented by one or more subroutines in the implementation.

• Data structure usually refers to an organization for data in main memory.

• File structure: an organization for data on peripheral storage, such as a disk drive.
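
One way to picture the ADT / data structure split in C++ is sketched below. The stack example and the names IntStack and ArrayIntStack are assumptions chosen for illustration: the abstract class lists only operations (the ADT), and the derived class is one possible physical implementation whose storage stays hidden.

```cpp
#include <cassert>
#include <vector>

// ADT: described only by its operations and their inputs and outputs.
class IntStack {
public:
    virtual ~IntStack() = default;
    virtual void push(int value) = 0;  // add value on top
    virtual int pop() = 0;             // remove and return the top; requires !empty()
    virtual bool empty() const = 0;    // true if no values are stored
};

// Data structure: one physical implementation, backed by a contiguous array.
class ArrayIntStack : public IntStack {
public:
    void push(int value) override { items.push_back(value); }
    int pop() override {
        assert(!items.empty());        // precondition from the ADT
        int top = items.back();
        items.pop_back();
        return top;
    }
    bool empty() const override { return items.empty(); }

private:
    std::vector<int> items;            // hidden storage (encapsulation)
};
```

Client code written against IntStack would not need to change if the array were swapped for, say, a linked list.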

Metaphors

An ADT manages complexity through abstraction: metaphor.
– Hierarchies of labels

Ex: transistors ⇒ gates ⇒ CPU.

In a program, implement an ADT, then think only about the ADT, not its implementation.

Logical vs. Physical Form

Data items have both a logical and a physical form.

Logical form: definition of the data item within an ADT.
– Ex: Integers in mathematical sense: +, -

Physical form: implementation of the data item within a data structure.
– Ex: 16/32 bit integers, overflow.
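
The gap between the two forms can be made concrete with a small sketch. It assumes an unsigned 16-bit physical form (chosen because unsigned wraparound is well defined in C++) and uses a wider type as a stand-in for the mathematical integers; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Logical form: mathematical addition, modeled here with a much wider type.
long long logicalAdd(long long a, long long b) { return a + b; }

// Physical form: 16-bit unsigned storage; the result wraps modulo 2^16.
std::uint16_t physicalAdd16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(a + b);
}

// True if the mathematical sum can be represented in the 16-bit physical form.
bool fitsIn16(std::uint16_t a, std::uint16_t b) {
    return logicalAdd(a, b) <= std::numeric_limits<std::uint16_t>::max();
}
```

For small operands the two forms agree; once the sum exceeds 65535 the physical result no longer matches the logical one.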

Data Type

ADT: Type, Operations
Data Items: Logical Form

Data Structure: Storage Space, Subroutines
Data Items: Physical Form

Example 1.8

A typical database-style project will have many interacting parts.

Binary Search Trees

What is good and bad about BSTs?
• Space requirements?
• Time requirements?
• Average vs. Worst Case?
• What drives worst-case behavior?

What problems can a BST solve?
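
One way to explore the worst-case question is to build the same set of keys in different insertion orders and compare tree heights. The sketch below uses a deliberately minimal, unbalanced BST of int keys; BstNode, bstInsert, bstHeight and heightAfterInserting are hypothetical names for this illustration only and are not the course's BinNodeT or BST classes.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Minimal unbalanced BST node, for illustration only.
struct BstNode {
    int key;
    std::unique_ptr<BstNode> left, right;
    explicit BstNode(int k) : key(k) {}
};

// Insert key into the tree rooted at root; duplicates are ignored.
void bstInsert(std::unique_ptr<BstNode>& root, int key) {
    std::unique_ptr<BstNode>* cur = &root;
    while (*cur) {
        if (key == (*cur)->key) return;
        cur = key < (*cur)->key ? &(*cur)->left : &(*cur)->right;
    }
    *cur = std::make_unique<BstNode>(key);
}

// Number of nodes on the longest root-to-leaf path (0 for an empty tree).
int bstHeight(const std::unique_ptr<BstNode>& root) {
    if (!root) return 0;
    return 1 + std::max(bstHeight(root->left), bstHeight(root->right));
}

// Build a tree by inserting keys in the given order and report its height.
int heightAfterInserting(const std::vector<int>& keys) {
    std::unique_ptr<BstNode> root;
    for (int k : keys) bstInsert(root, k);
    return bstHeight(root);
}
```

Inserting seven keys in sorted order yields a chain of height 7, while inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 yields a tree of height 3. Search cost follows the height, so insertion order is what drives the difference.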

Example: BST Template Interface

template <typename T> class BinNodeT;  // node type, assumed to be defined elsewhere

template <typename T> class BST {
private:
    BinNodeT<T>* Root;
    // additional private members not shown

public:
    BST();                                  // create empty BST
    BST(const T& D);                        // root holds D
    BST(const BST<T>& D);                   // deep copy support
    BST<T>& operator=(const BST<T>& D);     // deep assignment; not const, since it modifies *this
    bool Insert(const T& D);                // insert element
    bool Delete(const T& D);                // delete element
    T* const Find(const T& D);              // return access to D
    const T* const Find(const T& D) const;  // return access to D
    void Clear();                           // restore to empty state
    ~BST();
};

Spatial Data Structures

What if we want to search for points in 2D or 3D space? How do we store the data?

• Could store separate data structures organizing by x- and y-dimensions (list, BST, etc.)

• This is OK for exact-match queries, but doesn’t handle range queries well

• Could combine

We need a spatial data structure to handle spatial queries.

Spatial Data Structures

• Spatial data records include a sense of location as an attribute.

• Typically location is represented by coordinate data (in 2D or 3D).

• If we are to search spatial data using the locations as key values, we need data structures that efficiently represent selecting among more than two alternatives during a search.

Spatial Data Structures

• One approach for 2D data is to employ quadtrees, in which each internal node can have up to 4 children, each representing a different region obtained by decomposing the coordinate space.

• There are a variety of such quadtrees, many of which are described in:

– The Quadtree and Related Hierarchical Data Structures, Hanan Samet, ACM Computing Surveys, June 1984

Spatial Decomposition

• In binary search trees, the structure of the tree depends not only upon what data values are inserted, but also in what order they are inserted.

• In contrast, the structure of a Point-Region Quadtree is determined entirely by the data values it contains, and is independent of the order of their insertion.

• In effect, each node of a PR Quadtree represents a particular region in a 2D coordinate space.

Spatial Decomposition

• Internal nodes have exactly 4 children (some may be empty), each representing a different, congruent quadrant of the region represented by their parent node.

• Internal nodes do not store data.

• Leaf nodes hold a single data value.

• Therefore, the coordinate space is partitioned as insertions are performed so that no region contains more than a single point.

• PR quadtrees represent points in a finitely-bounded coordinate space.
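
The decomposition into four congruent quadrants can be sketched as below. This is only an illustration under stated assumptions: regions are half-open rectangles, larger y means "north", points on a dividing line go to the east or north side, and the names Quadrant, Region, quadrantOf and childRegion are hypothetical.

```cpp
#include <cassert>

enum class Quadrant { NE, NW, SW, SE };

// Axis-aligned region [xMin, xMax) x [yMin, yMax).
struct Region {
    double xMin, xMax, yMin, yMax;
    double xMid() const { return (xMin + xMax) / 2; }
    double yMid() const { return (yMin + yMax) / 2; }
    bool contains(double x, double y) const {
        return xMin <= x && x < xMax && yMin <= y && y < yMax;
    }
};

// Which of the four congruent quadrants of r holds (x, y)?
// Assumed convention: points on a dividing line go to the east / north side.
Quadrant quadrantOf(const Region& r, double x, double y) {
    bool east = x >= r.xMid();
    bool north = y >= r.yMid();
    if (north) return east ? Quadrant::NE : Quadrant::NW;
    return east ? Quadrant::SE : Quadrant::SW;
}

// The sub-region covered by quadrant q of r.
Region childRegion(const Region& r, Quadrant q) {
    bool east = (q == Quadrant::NE || q == Quadrant::SE);
    bool north = (q == Quadrant::NE || q == Quadrant::NW);
    return Region{east ? r.xMid() : r.xMin, east ? r.xMax : r.xMid(),
                  north ? r.yMid() : r.yMin, north ? r.yMax : r.yMid()};
}
```

Because each child region is computed from its parent's bounds alone, the partition depends only on the bounded coordinate space and the points stored in it.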

PR Quadtree Insertion

• Insertion proceeds recursively, descending until the appropriate leaf node is found, and then partitioning and descending until there is no more than one point within the region represented by each leaf.

• It is possible for a single insertion to add many levels to the relevant subtree, if points lie close enough together.

• Of course, it is also possible for an insertion to require no splitting whatsoever.

• The shape of the tree is entirely independent of the order in which the data elements are added to it.
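
The insertion rule described above can be sketched as follows. This is a minimal sketch, not the course's implementation: the names Pt, QNode, insertBelow, prInsert, prHeight and sameShape are hypothetical, one node type stands in for both internal and leaf nodes, regions are half-open, and the caller is assumed to pass a point that lies inside the given bounds.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <memory>

struct Pt { double x, y; };

struct QNode {
    bool internal = false;                       // false: leaf holding `point`
    Pt point{0.0, 0.0};                          // meaningful only in a leaf
    std::array<std::unique_ptr<QNode>, 4> kids;  // 0=SW 1=SE 2=NW 3=NE; null = empty
};

bool prInsert(std::unique_ptr<QNode>& node, Pt p,
              double x0, double x1, double y0, double y1);

// Pick the quadrant of [x0,x1) x [y0,y1) that contains p and insert there.
bool insertBelow(QNode& n, Pt p, double x0, double x1, double y0, double y1) {
    double xm = (x0 + x1) / 2, ym = (y0 + y1) / 2;
    int q = (p.x >= xm ? 1 : 0) + (p.y >= ym ? 2 : 0);
    return prInsert(n.kids[q], p,
                    (q & 1) ? xm : x0, (q & 1) ? x1 : xm,
                    (q & 2) ? ym : y0, (q & 2) ? y1 : ym);
}

// Insert p into the subtree for the given region; returns false on a duplicate.
bool prInsert(std::unique_ptr<QNode>& node, Pt p,
              double x0, double x1, double y0, double y1) {
    if (!node) {                                 // empty region: make a leaf
        node = std::make_unique<QNode>();
        node->point = p;
        return true;
    }
    if (!node->internal) {                       // occupied leaf: split it
        Pt old = node->point;
        if (old.x == p.x && old.y == p.y) return false;
        node->internal = true;
        insertBelow(*node, old, x0, x1, y0, y1); // push the old point down one level
    }
    return insertBelow(*node, p, x0, x1, y0, y1);
}

// Number of levels in the tree (0 for an empty tree).
int prHeight(const std::unique_ptr<QNode>& n) {
    if (!n) return 0;
    int h = 0;
    for (const auto& k : n->kids) h = std::max(h, prHeight(k));
    return 1 + h;
}

// True if two trees have identical structure (ignoring the stored values).
bool sameShape(const std::unique_ptr<QNode>& a, const std::unique_ptr<QNode>& b) {
    if (!a || !b) return !a && !b;
    if (a->internal != b->internal) return false;
    for (int i = 0; i < 4; ++i)
        if (!sameShape(a->kids[i], b->kids[i])) return false;
    return true;
}
```

In the region [0,16) x [0,16), inserting (1,1) and then (15,15) gives a tree of height 2. Adding (1.5,1.5) forces repeated splits until the two nearby points fall in different quadrants, raising the height to 6. Inserting the same three points in any other order produces a tree of the same shape.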

PR Quadtree Deletion

• Deletion of elements is more complex (naturally) and may involve collapsing of nodes.

• Since deletion is not required for the first project, a discussion of the details will be deferred until a later time.

PR Quadtree Node Implementation

• Of course, the PR Quadtree will be implemented as a C++ template.

• However, it may be somewhat less generic than the general BST discussed earlier.

• During insertion and search, it is necessary to determine whether one point lies NW, NE, SE or SW of another point. Clearly this cannot be accomplished by using the usual relational operators to compare points.

PR Quadtree Node Implementation

Two possible approaches:
1) have the data type provide accessors for the x- and y-coordinates
2) have the type provide a comparator that returns NW, NE, SE or SW

• Either is feasible. It is possible to argue either is better, depending upon the value placed upon various design goals. It is also possible to deal with the issue in other ways.

• In any case, the PR Quadtree implementation will impose fairly strict requirements on any data type that is to be stored in it.
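
To make the two approaches concrete, here is a sketch of a data type that happens to satisfy both. Everything in it is an assumption for illustration: the City class, the Direction enumeration, the member names getX, getY and directionFrom, and the tie-breaking rule that sends points on an axis to the east or north side.

```cpp
#include <cassert>

enum class Direction { NW, NE, SE, SW };

// Hypothetical user data type that satisfies both candidate requirements.
class City {
public:
    City(long x, long y) : x_(x), y_(y) {}

    // Approach 1: accessors; the tree performs the comparisons itself.
    long getX() const { return x_; }
    long getY() const { return y_; }

    // Approach 2: comparator; reports where *this lies relative to (ox, oy).
    // Assumed tie-breaking: points on an axis count as east / north.
    Direction directionFrom(long ox, long oy) const {
        bool east = x_ >= ox;
        bool north = y_ >= oy;
        if (north) return east ? Direction::NE : Direction::NW;
        return east ? Direction::SE : Direction::SW;
    }

private:
    long x_, y_;
};

// Approach 1 in use: a tree-side helper written against the accessors only.
template <typename T>
Direction directionViaAccessors(const T& elem, long ox, long oy) {
    bool east = elem.getX() >= ox;
    bool north = elem.getY() >= oy;
    if (north) return east ? Direction::NE : Direction::NW;
    return east ? Direction::SE : Direction::SW;
}
```

With accessors the tie-breaking rule lives in the tree; with a comparator it lives in the data type. That is one of the design trade-offs the slide alludes to.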

PR Quadtree Implementation

Here's a possible PR Quadtree interface:

#include <ostream>

template <typename T> class prQuadNode;  // node type, assumed to be defined elsewhere

template <typename T> class prQuadTree {
public:
    prQuadTree(int xMinimum, int xMaximum, int yMinimum, int yMaximum);
    bool Insert(const T& Elem);
    bool Delete(const T& Elem);
    T* const Find(const T& Elem);
    const T* const Find(const T& Elem) const;
    void Display(std::ostream& Out) const;

private:
    prQuadNode<T>* Root;
    int xMin, xMax, yMin, yMax;
    // . . .
};

PR Quadtree Implementation

Some comments:

• the tree must be created to organize data elements that lie within a particular, bounded region for the partitioning logic to be correct

• the question of how to manage different types for internal and leaf nodes raises some fascinating design and coding issues…

• there will, of course, be a number of private, recursive helper functions

• how to display the tree also raises some fascinating issues…
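
One possible way to handle different types for internal and leaf nodes is a small class hierarchy, sketched below. This is only one option among several, and every name in it (QuadNodeBase, QuadLeaf, QuadInternal, countLeaves) is hypothetical rather than taken from the course's prQuadNode design.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <utility>

// Common base so the tree can hold either kind of node through one pointer type.
template <typename T>
class QuadNodeBase {
public:
    virtual ~QuadNodeBase() = default;
    virtual bool isLeaf() const = 0;
};

// Leaf: holds a single data value and no children.
template <typename T>
class QuadLeaf : public QuadNodeBase<T> {
public:
    explicit QuadLeaf(const T& elem) : data(elem) {}
    bool isLeaf() const override { return true; }
    T data;
};

// Internal node: holds four child slots and no data.
template <typename T>
class QuadInternal : public QuadNodeBase<T> {
public:
    bool isLeaf() const override { return false; }
    std::array<std::unique_ptr<QuadNodeBase<T>>, 4> child;  // null = empty quadrant
};

// A private recursive helper might look like this: count the stored elements.
template <typename T>
int countLeaves(const QuadNodeBase<T>* node) {
    if (node == nullptr) return 0;
    if (node->isLeaf()) return 1;
    // Safe downcast: a non-leaf node in this sketch is always a QuadInternal.
    const auto* in = static_cast<const QuadInternal<T>*>(node);
    int total = 0;
    for (const auto& c : in->child) total += countLeaves<T>(c.get());
    return total;
}
```

The cost of this design is the downcast inside each helper; alternatives such as a single node type with a flag avoid the cast but waste space in every node.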