
Data Abstraction

Gang Qian

Department of Computer Science, University of Central Oklahoma

Objectives

  Specification of Data Abstractions
  Implementation Issues
  Abstraction Function and Rep Invariant
  Design Issues

Motivations of Data Abstraction
  Allows us to extend the programming language with new data types
  Allows us to focus on the behavior of data objects rather than on their implementation
  Incorporates abstraction both by parameterization and by specification
    Abstraction by parameterization is achieved in the same way as for procedures
    Abstraction by specification is achieved by making the operations part of the new data type

  Implementing a data type mainly means selecting a storage representation for its objects
  Without data abstraction, every program that uses the data type must be written against the storage representation
    Such programs are not easy to modify
  If we combine data types and operations, users only need to use the operations, without knowing the storage representation

Specification for Data Abstraction
  The focus of the specification is to explain the operations of a data type
  Our specification is based on the class construct in Java, but the same idea can be used if a language employs a different mechanism
  Each class defines a type name and the following:
    Constructors
    Instance methods (or simply methods)
      As opposed to static methods or procedures

Data Abstraction Specification Template

    /** OVERVIEW: A brief description of the behavior of the
        type's objects goes here. Mutability. Bounded or not
        for collection types. */
    visibility class dname {
        /** specs for constructors */

        /** specs for methods */
    }

Notes:
  dname is the class name
  The visibility of most classes is public
  The overview part of the specification describes the data abstraction in terms of "well-understood" concepts
  All constructors and methods that appear in the specification should be public
  Since constructors and methods are just special procedures, they use the same notation as stand-alone procedures
    REQUIRES, MODIFIES and EFFECTS
    Still need to be very careful about exceptions
  Usually there are no static methods
  May use this to reference the current object in the specification

Example: IntSet

    /** OVERVIEW: IntSets are mutable, unbounded sets of integers.
        A typical IntSet is {x1,...,xn}. */
    public class IntSet {
        /** EFFECTS: Constructor. Initializes this to be empty. */
        public IntSet ()

        /** MODIFIES: this
            EFFECTS: Adds x to the elements of this,
            i.e., this_post = this + { x }. */
        public void insert (int x)

        /** MODIFIES: this
            EFFECTS: Removes x from this,
            i.e., this_post = this - { x }. */
        public void remove (int x)

        /** EFFECTS: If x is in this returns true else returns false. */
        public boolean isIn (int x)

        /** EFFECTS: Returns the cardinality of this. */
        public int size ()

        /** EFFECTS: If this is empty, throws EmptyException;
            else returns an arbitrary element of this. */
        public int choose () throws EmptyException
    }

Note:
  The object is referred to as this in the specification
  Since a constructor always modifies this, we do not have to include a MODIFIES clause for it
    The modification is transparent to the user anyway
  Mutators: methods that modify this
    insert and remove
    Note the use of this_post
  Observers: return information about the state of the object
  Method choose is underdetermined
    EmptyException: checked or unchecked?
  Method insert does not throw an exception if x is already in the set
    Method remove behaves similarly when x is not in the set
    Whether this is appropriate depends on the application
    May provide additional methods that can throw exceptions
      insertNonDup and removeIfIn
  The specification of IntSet requires that users know the mathematical concept of sets
    A problem with informal specification
    It is usually reasonable to expect such knowledge from the users
    If some concepts are not well-known, more descriptions and/or explanations are needed, including the use of examples, figures or other tools

Example: Poly

    /** OVERVIEW: Polys are immutable polynomials with integer
        coefficients. A typical Poly is c0 + c1x + c2x^2 + ... */
    public class Poly {
        /** EFFECTS: Constructor. Initializes this to be the zero polynomial. */
        public Poly ()

        /** EFFECTS: If n < 0 throws NegativeExponentException;
            else initializes this to be the Poly cx^n. */
        public Poly (int c, int n) throws NegativeExponentException

        /** EFFECTS: Returns the degree of this, i.e., the largest
            exponent with a non-zero coefficient.
            Returns 0 if this is the zero Poly. */
        public int degree ()

        /** EFFECTS: Returns the coefficient of the term of this
            whose exponent is d. */
        public int coeff (int d)

        /** EFFECTS: If q is null throws NullPointerException;
            else returns the Poly this + q. */
        public Poly add (Poly q) throws NullPointerException

        /** EFFECTS: If q is null throws NullPointerException;
            else returns the Poly this * q. */
        public Poly mul (Poly q) throws NullPointerException

        /** EFFECTS: If q is null throws NullPointerException;
            else returns the Poly this - q. */
        public Poly sub (Poly q) throws NullPointerException

        /** EFFECTS: Returns the Poly - this. */
        public Poly minus ()
    }

Note:
  Poly is an immutable class
    There are no mutator methods
  NegativeExponentException: checked or unchecked?

Using Data Abstractions

  Programs should be written solely based on the specification of the data abstraction
  The implementation of the data abstraction should NOT be relied on by the using code

    /** EFFECTS: If p is null throws NullPointerException; else
        returns the Poly obtained by differentiating p. */
    public static Poly diff (Poly p) throws NullPointerException {
        Poly q = new Poly();
        for (int i = 1; i <= p.degree(); i++)
            q = q.add(new Poly(p.coeff(i) * i, i - 1));
        return q;
    }

    /** EFFECTS: If a is null throws NullPointerException; else
        returns a set containing an entry for each distinct
        element of a. */
    public static IntSet getElements (int[] a) throws NullPointerException {
        IntSet s = new IntSet();
        for (int i = 0; i < a.length; i++)
            s.insert(a[i]);
        return s;
    }

Implementing Data Abstraction
  Select a representation, or rep, to store the state of an object
    In Java, the rep is the set of instance variables in the class
    E.g., you may use an array to implement a set. Then the array is the rep of the set object
  Constructors and methods of the object should operate based on the rep
    The rep should support all operations of the object
  Usually a single rep cannot support all operations efficiently
    Therefore, multiple implementations of the same data type may be needed

Example:
  We may use ArrayList<Integer> as the representation of IntSet objects
    The elements in an IntSet object can be stored in an ArrayList
  We can choose between two representations:
    Allow duplicate element values of the set in ArrayList<Integer>
    Let each element of the set occur exactly once in ArrayList<Integer>
  The 1st way is better for insert
  The 2nd way is better for remove and isIn, since the array list is shorter
  If isIn is used more frequently than the other operations, then the 2nd way is favorable
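
The trade-off between the two representations can be made concrete with a minimal sketch of the first one (duplicates allowed). The class name DupIntSet is hypothetical and does not appear in the slides; the sketch only illustrates why insert becomes cheap while remove, isIn and size have to cope with repeated entries.

```java
import java.util.ArrayList;

// Hypothetical class: a sketch of the "duplicates allowed" representation.
class DupIntSet {
    private final ArrayList<Integer> els = new ArrayList<>(); // may hold duplicates

    // insert is cheap: no search for an existing copy is needed.
    void insert(int x) { els.add(x); }

    // remove must delete every copy of x, so it scans the whole list.
    void remove(int x) { els.removeIf(e -> e == x); }

    // isIn scans a list that may be longer than the set it represents.
    boolean isIn(int x) { return els.contains(x); }

    // size must count distinct values, not list entries.
    int size() { return (int) els.stream().distinct().count(); }
}
```

With the second representation (each element stored once), insert has to search first, but remove, isIn and size work on a shorter list.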

Implementing Data Abstraction in Java
  A representation typically has a number of instance variables
    The constructors and methods access and manipulate the instance variables
  From an implementation point of view, objects have both methods and instance variables
  However, as a data abstraction, instance variables should be invisible (private) to users
    It is generally a bad idea to make instance variables public
    Record data types are an exception
      E.g., LinkedListNode

Example: Implementing IntSet

    import java.util.Vector;

    /** OVERVIEW: IntSets are unbounded, mutable sets of integers.
        A typical IntSet is {x1,...,xn}. */
    public class IntSet {
        private Vector<Integer> els; // the rep

        /** EFFECTS: Constructor. Initializes this to be empty. */
        public IntSet () { els = new Vector<Integer>(); }

        /** MODIFIES: this
            EFFECTS: Adds x to the elements of this. */
        public void insert (int x) {
            Integer y = x;
            if (getIndex(y) < 0) els.add(y);
        }

        /** MODIFIES: this
            EFFECTS: Removes x from this. */
        public void remove (int x) {
            int i = getIndex(x);
            if (i < 0) return;
            // Overwrite slot i with the last element, then drop the last slot
            els.set(i, els.lastElement());
            els.remove(els.size() - 1);
        }

        /** EFFECTS: Returns true if x is in this; else returns false. */
        public boolean isIn (int x) {
            return getIndex(x) >= 0; // index 0 is a valid position
        }

        /** EFFECTS: If x is in this returns the index where x
            appears; else returns -1. */
        private int getIndex (Integer x) {
            for (int i = 0; i < els.size(); i++)
                if (x.equals(els.get(i))) return i;
            return -1;
        }

        /** EFFECTS: Returns the cardinality of this. */
        public int size () { return els.size(); }

        /** EFFECTS: If this is empty throws EmptyException;
            else returns an arbitrary element of this. */
        public int choose () throws EmptyException {
            if (els.size() == 0)
                throw new EmptyException("IntSet.choose");
            return els.lastElement();
        }
    }

Note:
  Why does getIndex not use exceptions?
  Method insert guarantees the uniqueness of elements in the vector els
    This condition is essential to the implementation of methods size and remove
  An implementation using an int array is acceptable but less favorable
  The underdetermined method choose gets a determined implementation

Example: Implementing Poly

  Since Poly is immutable, an array can be used as its rep (coefficient array)
    The ith element of the array stores the coefficient of the term with exponent i
    Makes sense only if the poly is dense
  Example 1: Dense
    1 + x - 2x^3 is stored as <1, 1, 0, -2>
  Example 2: Sparse
    1 - 10x^1000 is stored as <1, 0, ..., 0, -10>, where -10 is the 1001st element

Example: Implementing Poly

  The zero Poly can be represented by either an empty array or a one-element array containing zero
    The latter is used in the implementation
  For convenience, an instance variable is used to store the degree of the Poly

    /** OVERVIEW: Polys are immutable polynomials with integer
        coefficients. A typical Poly is c0 + c1x + c2x^2 + ... */
    public class Poly {
        private int[] trms;
        private int deg;

        /** EFFECTS: Constructor. Initializes this to be the zero polynomial. */
        public Poly () { trms = new int[1]; deg = 0; }

        /** EFFECTS: If n < 0 throws NegativeExponentException;
            else initializes this to be the Poly cx^n. */
        public Poly (int c, int n) throws NegativeExponentException {
            if (n < 0)
                throw new NegativeExponentException("Poly(int, int) constructor");
            if (c == 0) { trms = new int[1]; deg = 0; return; }
            trms = new int[n + 1];
            for (int i = 0; i < n; i++) trms[i] = 0;
            trms[n] = c;
            deg = n;
        }

        /** EFFECTS: Initializes this to be the Poly 0x^n. */
        private Poly (int n) { trms = new int[n + 1]; deg = n; }

        /** EFFECTS: Returns the degree of this, i.e., the largest
            exponent with a non-zero coefficient.
            Returns 0 if this is the zero Poly. */
        public int degree () { return deg; }

        /** EFFECTS: Returns the coefficient of the term of this
            whose exponent is d. */
        public int coeff (int d) {
            if (d < 0 || d > deg) return 0;
            else return trms[d];
        }

        /** EFFECTS: If q is null throws NullPointerException;
            else returns the Poly this - q. */
        public Poly sub (Poly q) throws NullPointerException {
            return add(q.minus());
        }

        /** EFFECTS: Returns the Poly - this. */
        public Poly minus () {
            Poly r = new Poly(deg);
            // Note: the bound is <= deg so that the leading coefficient is negated too
            for (int i = 0; i <= deg; i++) r.trms[i] = - trms[i];
            return r;
        }

        // ...
        // See textbook p. 92 and WebCT for complete code
    }

Records

  A record is a collection of fields
    E.g., struct in C/C++
    Java does not have struct, so a class has to be used
    Visibility of instance variables in a record class can be either public or package visible

    /** Overview: A record type */
    class Pair {
        int coeff;
        int exp;
        Pair(int c, int n) { coeff = c; exp = n; }
    }

  No specification is needed for a record other than to indicate that it is a record type

Additional Methods

  Class Object is the ancestor of all Java classes
  Depending on the class, some methods of Object need to be overridden
    We will discuss equals, clone and toString

Method equals
  Conceptually, two objects are equal if they are behaviorally equivalent
    Behaviorally equivalent: two objects are indistinguishable by any sequence of calls to the objects' methods
  For mutable objects, all distinct objects are distinguishable
    Objects are equal only when they are the same object
  Immutable objects with the same state are equivalent
  The equals method of Object tests whether two objects are the same object
    For mutable objects, there is no need to override the equals method of Object
    The equals method of immutable objects should be overridden
  Similarity is a weaker notion of equality
    Two objects are similar if they are not distinguishable by using any observers of their type
    If necessary, you may implement a similar method
      similar and equals are the same for immutable objects
      similar is weaker than equals for mutable types

IntSet s = new IntSet();

IntSet t = new IntSet();

if (s.similar(t)) ...; else ...;
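
To illustrate the difference between similar and equals for a mutable type, here is a minimal sketch. TinySet is a hypothetical stand-in for IntSet (it is not the class from the slides), and the body of similar is only one possible implementation: it compares the current elements, while equals is left as inherited from Object and therefore compares identity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal mutable set, used only to contrast similar with equals.
class TinySet {
    private final List<Integer> els = new ArrayList<>(); // no duplicates

    void insert(int x) { if (!els.contains(x)) els.add(x); }

    // similar: the two sets hold the same elements right now,
    // so no observer can tell them apart at this moment.
    boolean similar(TinySet t) {
        return t != null && els.size() == t.els.size() && els.containsAll(t.els);
    }

    // equals is deliberately NOT overridden: Object.equals compares identity,
    // which is the right notion for a mutable type.
}
```

Two freshly created sets with the same elements are similar but not equal; after a mutator is applied to one of them they are no longer similar either.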

Method clone
  clone creates a new object that is a copy of the object on which clone is invoked
  The clone method of Object assigns the instance variables of the old object to those of the new one
    It may create a sharing problem if any of the instance variables is a reference
    E.g., IntSet, Poly (immutable, so acceptable)
  The default implementation of clone is usually acceptable for immutable types
    If the default clone can be used, it can be inherited by putting implements Cloneable in the class header
  clone needs to be implemented for mutable types

    /** OVERVIEW: ... */
    public class Poly implements Cloneable {
        public boolean equals (Poly q) { // Optimized
            if (q == null || deg != q.deg) return false;
            for (int i = 0; i <= deg; i++)
                if (trms[i] != q.trms[i]) return false;
            return true;
        }

        public boolean equals (Object z) {
            if (!(z instanceof Poly)) return false;
            return equals((Poly) z);
        }
    }

  The definitions of the equals methods can be treated as a template
  The first equals method is an overload, while the second one is the overriding method, since it has the same signature as that in class Object

    /** OVERVIEW: ... */
    public class IntSet {
        ...
        private IntSet (Vector<Integer> v) {
            els = new Vector<Integer>();
            for (int i = 0; i < v.size(); i++)
                els.add(v.get(i));
        }

        public Object clone () { return new IntSet(els); }
    }

Note:
  There is no specification for clone or equals since their meanings are well-understood
  CloneNotSupportedException is thrown if clone is called on an object that neither implements Cloneable nor declares its own clone method
  The signature of clone of a subtype is identical to the signature of clone for Object
      Object clone();
    A cast is needed when clone is used
      IntSet t = (IntSet) s.clone();
    May not be so for user-defined generic classes
      Discussed in Polymorphism
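
The sharing problem of the default clone can be seen in a small sketch. SharedBag and its two methods are hypothetical names chosen for this illustration; the field is left package-visible only so that the effect is easy to observe.

```java
import java.util.ArrayList;

// Hypothetical class showing the sharing problem of the default clone.
class SharedBag implements Cloneable {
    ArrayList<Integer> els = new ArrayList<>();

    // Default behaviour: Object.clone copies the reference, not the list,
    // so the original and the copy end up sharing one ArrayList.
    SharedBag shallowClone() {
        try {
            return (SharedBag) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we implement Cloneable
        }
    }

    // Version suitable for a mutable type: the copy gets its own list.
    SharedBag deepClone() {
        SharedBag c = shallowClone();
        c.els = new ArrayList<>(els);
        return c;
    }
}
```

A change made through the shallow copy is visible in the original, which is why mutable types such as IntSet provide their own clone.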

Method toString
  Produces a string that represents the current state of its object and indicates its type
    E.g., IntSet: {1, 7, 3}
    E.g., Poly: 2 + 3x + 5x^2
  toString of Object only provides the type name and the hash code
  It is advisable to provide a customized toString method for each new type

toString method for IntSet

    public String toString () {
        if (els.size() == 0) return "IntSet: { } ";
        String s = "IntSet: {" + els.elementAt(0);
        for (int i = 1; i < els.size(); i++)
            s = s + ", " + els.elementAt(i);
        return s + "} ";
    }

Aids to Understanding Implementations
  The abstraction function describes the implementer's choice of a particular representation for the data type
    It states how instance variable values are mapped to the state of the abstract object that they represent
  The rep invariant describes the common assumptions on which constructors and methods are implemented
    It allows each operation to be implemented without worrying about the others
  The abstraction function and rep invariant capture why the code is the way it is
    E.g., choose and size of IntSet
    They are valuable to both implementers and other readers of the code
      But not to the user of the data abstraction
  Note that they are NOT part of the specification
    They are written by implementers


Abstraction Function

  The implementation of a data abstraction determines a relationship between the rep and the abstract objects
  The relationship can be defined as a function called the abstraction function (AF)
    It maps from the instance variables (rep of an object) to the abstract object being represented
      E.g., the IntSet uses a vector els
    Specifically, the AF maps from the concrete values of the instance variables to the abstract state of the abstract object
  The following example shows the mapping of concrete states of a real object to abstract states of the abstract object
    E.g., Vector<Integer>: [1, 2] maps to an Integer set {1, 2}
  AFs are often many-to-one mappings
  The AF should be described in a comment in the implementation of an abstract object
  Since informal specification is used, the range of an AF is not mathematically defined
  To overcome the problem, we give a description of a typical abstract object in the specification
    E.g., in IntSet, we have "A typical IntSet is {x1,...,xn}"
    E.g., in Poly, we have "A typical Poly is c0 + c1x + c2x^2 + ..."

Example: Based on the typical abstract IntSet object, we can write the AF for IntSet as follows:

    // The abstraction function is
    // AF(c) = { c.els[i] | 0 <= i < c.els.size }

  The notation { x | p(x) } describes the set of all x such that the predicate p(x) is true
  Note that convenient abbreviations are used
    c.els[i] stands for c.els.get(i)
    It is fine as long as the readers can clearly understand what it means
  Note that you can also choose to write the abstraction function in plain English
    E.g., the AF of the IntSet implementation can be "All elements in the rep els correspond to the elements in the abstract IntSet."
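
The many-to-one nature of the abstraction function can be made tangible by writing the IntSet AF as code. This is only a sketch under an assumption not made in the slides: a java.util.Set stands in for the abstract mathematical set, and the class and method names (IntSetAF, af) are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class IntSetAF {
    // AF(c) = { c.els[i] | 0 <= i < c.els.size }
    // A java.util.Set plays the role of the abstract set (assumption of this sketch).
    static Set<Integer> af(List<Integer> els) {
        return new TreeSet<>(els);
    }
}
```

The concrete states [1, 2] and [2, 1] are different vectors, but af maps both of them to the same abstract set {1, 2}.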

Example:

    // A typical Poly is c0 + c1x + c2x^2 + ...
    // The abstraction function is:
    // AF(c) = c0 + c1x + c2x^2 + ...
    // where
    //   ci = c.trms[i] if 0 <= i < c.trms.length
    //   ci = 0 otherwise

  Or in plain English: The elements in the rep trms correspond to the coefficients of the polynomial object. The index of each element/coefficient in trms corresponds to the exponent of each term in the polynomial
  You do not need to provide an abstraction function for a record type
    A record type provides no abstraction over its rep
    Both the real object and the abstract object are collections of fields that correspond to each other

Representation Invariant

  Not all syntactically correct values of instance variables are semantically correct representations of the state of the abstract object
    E.g., suppose we do not allow duplicate values in Vector<Integer> els of an IntSet. An els containing duplicate values is then not a legitimate representation of an IntSet, although the compiler will accept it
  The representation (rep) invariant is a statement of a property that all legitimate objects satisfy
    A rep invariant is a predicate that is true of legitimate objects
    If it is violated, then the object is corrupted
  Example: for IntSet, we have:

    // The rep invariant is
    // I(c) = c.els != null &&
    //        for all int i, j, 0 <= i, j < c.els.size &&
    //        i != j => c.els[i] != c.els[j]

  The rep invariant is written using predicate calculus notation
  Predicate calculus notation:
    &&: and, conjunction
    ||: or, disjunction
    =>: implication
    for all: universal quantifier
    there exists: existential quantifier
  You may also choose to write the rep invariant in an informal way using plain English
  Example: for IntSet, we have:

    // The rep invariant is:
    // I(c) = c.els != null &&
    //        there are no duplicates in c.els

Example: Consider an alternative representation of IntSet that consists of an array of 100 boolean values plus a Vector<Integer>

    private boolean[] els = new boolean[100];
    private Vector<Integer> otherEls;
    private int sz;

  Based on the above rep, if an integer i between 0 and 99 is in the set, we just set els[i] to be true
  All integers > 99 are stored in otherEls
  For efficiency, we store the size of the set in sz
  This will be an efficient rep if almost all integers that appear are between 0 and 99

The abstraction function:

    // The abstraction function is
    // AF(c) = { c.otherEls[i] | 0 <= i < c.otherEls.size }
    //       + { j | 0 <= j < 100 && c.els[j] }

The rep invariant:

    // The rep invariant is
    // I(c) = c.els != null && c.otherEls != null &&
    //        all elements in c.otherEls are not in the
    //        range 0 to 99 && there are no duplicates in
    //        c.otherEls && c.sz = c.otherEls.size +
    //        (count of true entries in c.els)

The abstraction function in plain English:

    // The abstraction function:
    // The set of ints in the IntSet is the union
    // of the indices of true elements in els
    // and all the elements in otherEls

  Note that sz is redundant
    Whenever there is redundant information in the rep, the relationship of the redundant information to the rest of the rep should be explained in the rep invariant
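
The alternative rep and its invariant can be sketched as a small class. SmallIntSet is a hypothetical name, only insert, isIn and size are shown, and values outside 0 to 99 (including negative ones) are assumed to go to otherEls, in line with the rep invariant above. The repOk method checks the relationship between the redundant field sz and the rest of the rep.

```java
import java.util.ArrayList;

// Hypothetical class for the boolean-array-plus-list representation.
class SmallIntSet {
    private final boolean[] els = new boolean[100];            // flags for 0..99
    private final ArrayList<Integer> otherEls = new ArrayList<>(); // everything else
    private int sz = 0;                                        // redundant: cached size

    void insert(int x) {
        if (isIn(x)) return;                 // keep the no-duplicates property
        if (0 <= x && x < 100) els[x] = true;
        else otherEls.add(x);
        sz++;                                // keep sz consistent with the rep
    }

    boolean isIn(int x) {
        return (0 <= x && x < 100) ? els[x] : otherEls.contains(x);
    }

    int size() { return sz; }

    // Checks: otherEls holds no value in 0..99 and no duplicates,
    // and sz equals otherEls.size plus the count of true flags.
    boolean repOk() {
        int count = 0;
        for (boolean b : els) if (b) count++;
        for (int i = 0; i < otherEls.size(); i++) {
            int v = otherEls.get(i);
            if (0 <= v && v < 100) return false;
            if (otherEls.indexOf(v) != i) return false; // an earlier copy exists
        }
        return sz == count + otherEls.size();
    }
}
```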

Example: Poly

    // The rep invariant is
    // I(c) = c.trms != null && c.trms.length >= 1 &&
    //        c.deg = c.trms.length - 1 &&
    //        c.deg > 0 => c.trms[c.deg] != 0

  If all syntactically correct states of the concrete object are legal representations, we simply have:

    // The rep invariant is
    // I(c) = true

  It is so for all record types
    Since using code can access the rep directly, there is no way for the implementation code of a record type to constrain it
  Thus, rep invariants need not be given for record types
    They must be given for all other types
    They help the implementers and code readers

  It is still possible that a strong relationship holds among the fields of a record. That relationship should be expressed in the rep invariant of the using code
  Assume that another type of rep is used for Poly:

    /** Overview: A record type */
    class Pair { int exp; int coeff; }

    public class Poly {
        Pair[] trms; // only used for non-zero terms
        ...
    }

  Then we have the rep invariant as follows:

    // for all elements e of c.trms,
    // e.exp >= 0 and e.coeff != 0
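
A check of this invariant for the Pair-based rep might look as follows. This is only a sketch: the holder class SparsePolyRep and the static form of repOk are hypothetical, and rejecting a null array or null entries is an extra assumption that the slide's invariant does not state.

```java
/** Overview: A record type */
class Pair {
    int exp;
    int coeff;
    Pair(int coeff, int exp) { this.coeff = coeff; this.exp = exp; }
}

// Hypothetical holder for the check; in the slides this would live in Poly.
class SparsePolyRep {
    // for all elements e of trms: e.exp >= 0 and e.coeff != 0
    static boolean repOk(Pair[] trms) {
        if (trms == null) return false;          // assumption: null array is illegal
        for (Pair e : trms)
            if (e == null || e.exp < 0 || e.coeff == 0) return false;
        return true;
    }
}
```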

Implementing the Abstraction Function and Rep Invariant
  Besides providing the abstraction function and rep invariant as comments, you usually also provide methods to implement them
    Not for record types
  The toString method is used to implement the abstraction function
  The method that checks the rep invariant is called repOk
  repOk specification:

    /** EFFECTS: Returns true if the rep invariant
        holds for this; otherwise returns false. */
    public boolean repOk()

  repOk is public so that using code can use it
  Since the specification of repOk is clear and always the same, it is not necessary to write it

Examples

    // for Poly:
    public boolean repOk() {
        if (trms == null || deg != trms.length - 1 || trms.length == 0)
            return false;
        if (deg == 0) return true;
        return trms[deg] != 0;
    }

    // for IntSet:
    public boolean repOk() {
        if (els == null) return false;
        for (int i = 0; i < els.size(); i++) {
            Integer x = els.get(i);
            for (int j = i + 1; j < els.size(); j++)
                if (x.equals(els.get(j))) return false;
        }
        return true;
    }

  There are two ways to use repOk
    Using code can call it to check the implementation
    Call it inside constructors and methods that modify the rep
      Call it right before they return
  If repOk is costly, these checks can be disabled when the program is in production
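
One possible way to call repOk right before a mutator returns, with a switch for disabling the check, is sketched below. CheckedIntSet, CHECK_REP and checkRep are hypothetical names, and throwing IllegalStateException on a violation is a choice made for this sketch rather than something prescribed by the slides.

```java
import java.util.Vector;

class CheckedIntSet {
    // Hypothetical switch: set to false to skip the checks in production.
    private static final boolean CHECK_REP = true;

    private final Vector<Integer> els = new Vector<>(); // rep: no duplicates

    public void insert(int x) {
        if (!els.contains(x)) els.add(x);
        checkRep(); // right before returning to the caller
    }

    public int size() { return els.size(); }

    public boolean repOk() {
        for (int i = 0; i < els.size(); i++)
            if (els.indexOf(els.get(i)) != i) return false; // an earlier copy exists
        return true;
    }

    private void checkRep() {
        if (CHECK_REP && !repOk())
            throw new IllegalStateException("rep invariant violated");
    }
}
```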

Discussion
  AF and RI are NOT the specification of the data abstraction. They are written by the implementers rather than the designers
  The rep invariant holds whenever an object is used outside its implementation
    It need not hold at every moment inside an operation
    However, it must be true whenever the operations return to their callers
  The abstraction function only makes sense when the rep invariant holds
  A rep invariant should express all constraints on which the object operations depend
    You may imagine that the operations are to be implemented by different people
    The rep invariant should be sufficient to support that scenario
  When implementing a data abstraction, AF and RI should be completed before the implementation of any operation
  All operations of the data abstraction should be implemented such that RI is preserved

Properties of Data Abstraction Implementation
  The rep of an immutable abstraction need not be immutable
    E.g., Poly
  Benevolent side effects are rep modifications that are not visible outside the implementation
    Example: Change the order of the elements in els in an IntSet
  Example: Suppose rational numbers are represented as a pair of integers (fraction):

    int num, denom;

  The abstraction function is:

    // A typical rational is n / d
    // The abstraction function is
    // AF(c) = c.num / c.denom

  Given the rep, there are several issues:
    Zero denominator? No
    Negative rational? Negative numerator
    Reduced form (no common factor)? Not required
  Based on these decisions, we have a rep invariant:

    // The rep invariant is
    // c.denom > 0

  However, to test equality of two rationals, a reduced form is needed (common factors are removed)
    See code in WebCT or on page 110 in the textbook
  The equals method computes the reduced form of the two rationals first, before deciding equality
  The reduced form of the rep replaces the original rep, but the abstract object is the same
    Benevolent side effect
  Benevolent side effects are often performed for efficiency reasons
  They are possible whenever the abstraction function is many-to-one
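
The rational example can be sketched as follows. This is not the textbook's code: the constructor's handling of a zero or negative denominator, the helper names reduce and gcd, and the hashCode are assumptions made for the illustration. The point is that equals rewrites the rep into reduced form while the abstract value n / d stays the same.

```java
class Rational {
    private int num, denom; // rep invariant: denom > 0

    Rational(int n, int d) {
        if (d == 0) throw new ArithmeticException("zero denominator"); // assumed policy
        if (d < 0) { n = -n; d = -d; }  // a negative rational has a negative numerator
        num = n;
        denom = d;
    }

    // Benevolent side effect: changes the rep, not the abstract value.
    private void reduce() {
        int g = gcd(Math.abs(num), denom); // g >= 1 because denom > 0
        num /= g;
        denom /= g;
    }

    private static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Rational)) return false;
        Rational r = (Rational) o;
        reduce();     // both reps are put into reduced form first
        r.reduce();
        return num == r.num && denom == r.denom;
    }

    @Override public int hashCode() { reduce(); return 31 * num + denom; }
}
```

Because 2/4 and 1/2 are different reps of the same abstract rational, the abstraction function is many-to-one, which is what makes the rewrite invisible to users.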

Exposing the Rep
  It is very important that the rep of a data abstraction cannot be modified outside its implementation
  Even if all instance variables are declared private, it is still possible to expose the rep

    /** EFFECTS: Returns a vector containing the elements of
        this, each exactly once, in arbitrary order. */
    public Vector<Integer> allEls () {
        return els;
    }

  Exposing the rep is an implementation error:
    A method returns a mutable object in the rep
    A constructor or method makes a mutable argument object part of the rep

    /** EFFECTS: If elms is null throws NullPointerException;
        else initializes this to contain all elements in elms. */
    public IntSet (Vector<Integer> elms) throws NullPointerException {
        if (elms == null)
            throw new NullPointerException("IntSet.IntSet(Vectors)");
        els = elms;
    }
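
A common remedy for both forms of exposure is to copy at the boundary. The sketch below uses a hypothetical class SafeIntSet (not the IntSet from the slides) whose constructor copies the argument, dropping duplicates so that the no-duplicates invariant holds, and whose allEls returns a fresh vector.

```java
import java.util.Vector;

// Hypothetical variant of IntSet that does not expose its rep.
class SafeIntSet {
    private final Vector<Integer> els = new Vector<>();

    SafeIntSet(Vector<Integer> elms) {
        if (elms == null) throw new NullPointerException("SafeIntSet(Vector)");
        for (Integer e : elms)
            if (!els.contains(e)) els.add(e); // copy the elements, not the reference
    }

    // Returns a copy, so callers cannot modify the rep through the result.
    Vector<Integer> allEls() { return new Vector<>(els); }

    int size() { return els.size(); }
}
```

Later changes to the argument vector, or to the vector returned by allEls, leave the set untouched.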

Design Issues

Mutability
  In general, a type should be immutable if its objects would naturally have unchanging values
    E.g., mathematical objects such as Poly, Rational, etc.
  A type should usually be mutable if it is modeling something from the real world, where it is natural for values of objects to change over time
    E.g., Employee, IntSet, etc.
  There is a trade-off between efficiency and safety
    Immutable abstractions are safer: no problem for sharing
    Immutable abstractions are less efficient: objects may be created and discarded frequently
  Mutability is a property of the type and not of its implementation
    The implementation should support the property

Operation Categories
  Creators: Operations that create objects of their types from scratch
    All creators are constructors
  Producers: Take objects of their type as inputs and create other objects of their type
    E.g., add of Poly
  Mutators: Modify objects of their type
    Only for mutable types
  Observers: Take objects of their type as inputs and return results of other types
  Creators usually create some but not all objects of the data type
    E.g., Poly constructors only create single-term polynomials, while the IntSet constructor only creates the empty set
  Other objects are created by producers or mutators
    E.g., add of Poly, insert of IntSet, etc.
  Mutators play the same role in mutable types that producers play in immutable ones
  A mutable type can have producers as well as mutators
    E.g., clone of IntSet
  Sometimes observers are combined with producers or mutators
    E.g., we may have a chooseAndRemove method for IntSet

Adequacy
  A data type is adequate if it provides enough operations so that everything users need to do with its objects can be done both conveniently and with reasonable efficiency
    There is no precise definition of adequacy
    An inadequate type can usually be recognized
      E.g., an IntSet without isIn
  A basic notion of adequacy can be obtained by considering the operation categories
    In general, an immutable type must have creators, observers and producers
    A mutable type must have creators, observers and mutators
  A type must be fully populated
    Using its creators, mutators and producers, it must be possible to obtain every possible abstract object state
  A type that is intended for general use must have a rich enough set of operations for its intended uses
  But do not offer irrelevant operations
    E.g., sum or sort for IntSet
  If a type is adequate, its operations can be augmented by standalone procedures that are outside the type's implementation (i.e., static methods of some other class)
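
As an illustration of augmenting an adequate type from the outside, the sketch below writes sum as a static method of another class, using only the public operations of a set type. MiniIntSet, its copy method and IntSetUtil are hypothetical; MiniIntSet merely stands in for IntSet so that the example is self-contained, and its choose throws an unchecked IllegalStateException instead of EmptyException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IntSet with just enough operations.
class MiniIntSet {
    private final List<Integer> els = new ArrayList<>(); // no duplicates

    void insert(int x) { if (!els.contains(x)) els.add(x); }
    void remove(int x) { els.remove(Integer.valueOf(x)); } // remove by value
    int size() { return els.size(); }
    int choose() {
        if (els.isEmpty()) throw new IllegalStateException("empty set");
        return els.get(els.size() - 1);
    }
    MiniIntSet copy() {
        MiniIntSet c = new MiniIntSet();
        c.els.addAll(els);
        return c;
    }
}

// Standalone procedure outside the type's implementation.
class IntSetUtil {
    static int sum(MiniIntSet s) {
        MiniIntSet rest = s.copy();   // work on a copy so s is not modified
        int total = 0;
        while (rest.size() > 0) {
            int x = rest.choose();
            total += x;
            rest.remove(x);
        }
        return total;
    }
}
```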

Locality and Modifiability Revisited
  Locality: the ability to reason about a module by just looking at its specification
    It requires that the rep be modified only within its type's implementation
  Modifiability: the ability to re-implement an abstraction without having to modify any using code
    All access to a rep must occur within its implementation
