Data Abstraction
Gang Qian
Department of Computer Science, University of Central Oklahoma
Objectives
Specification of Data Abstractions
Implementation Issues
Abstraction Function and Rep Invariant
Design Issues
Motivations of Data Abstraction
Allows us to extend the programming language with new data types
Allows us to focus on the behavior of data objects rather than their implementation
Incorporates abstraction both by parameterization and by specification
  Abstraction by parameterization is achieved the same way as for procedures
  Abstraction by specification is achieved by making the operations part of the new data type
Implementing a data type mainly means selecting a storage representation for its objects
Without data abstraction, every program that uses the data type must be written against the storage representation
  Such programs are not easy to modify
If we combine the data type with its operations, users only need to use the operations, without knowing the storage representation
Specification for Data Abstraction
The focus of the specification is to explain the operations of a data type
Our specification is based on the Java class, but the same idea can be used if a language employs a different mechanism
Each class defines a type name and the following:
  Constructors
  Instance methods (or simply methods), as opposed to static methods or procedures
Data Abstraction Specification Template
/** OVERVIEW: A brief description of the behavior of the
    type’s objects goes here. Mutability. Bounded or not
    for collection types */
visibility class dname {
    /** specs for constructors */
    /** specs for methods */
}
Notes:
dname is the class name
The visibility of most classes is public
The overview part of the specification describes the data abstraction in terms of “well-understood” concepts
All constructors and methods that appear in the specification should be public
Since constructors and methods are just special procedures, they use the same notation as stand-alone procedures
  REQUIRES, MODIFIES and EFFECTS
  Still need to be very careful about exceptions
Usually no static methods
May use this to reference the current object in the specification
Example: IntSet
/** OVERVIEW: IntSets are mutable, unbounded sets of integers.
    A typical IntSet is {x1,...,xn}. */
public class IntSet {
    /** EFFECTS: Constructor. Initializes this to be empty. */
    public IntSet ()

    /** MODIFIES: this
        EFFECTS: Adds x to the elements of this,
        i.e., this_post = this + { x }. */
    public void insert (int x)

    /** MODIFIES: this
        EFFECTS: Removes x from this,
        i.e., this_post = this - { x }. */
    public void remove (int x)

    /** EFFECTS: If x is in this returns true else returns false. */
    public boolean isIn (int x)

    /** EFFECTS: Returns the cardinality of this. */
    public int size ()

    /** EFFECTS: If this is empty, throws EmptyException;
        else returns an arbitrary element of this. */
    public int choose () throws EmptyException
}
Note:
The object is referred to as this in the specification
Since a constructor always modifies this, we do not have to include a MODIFIES clause for it
  The modification is transparent to the user anyway
Mutators: methods that modify this
  insert and remove
  Note the use of this_post
Observers: methods that return information about the state of the object
Method choose is underdetermined
EmptyException: checked or unchecked?
Method insert does not throw an exception if there is a duplicate int in the set
  Method remove has a similar situation
  Whether this is appropriate depends on the application
  May provide additional methods that can throw exceptions: insertNonDup and removeIfIn
The specification of IntSet requires that users know the mathematical concept of sets
  This is a problem with informal specification
  It is usually reasonable to expect such knowledge from the users
  If some concepts are not well known, more descriptions and/or explanations are needed, including the use of examples, figures or other tools
Example: Poly
/** OVERVIEW: Polys are immutable polynomials with integer
    coefficients. A typical Poly is c0 + c1x + c2x^2 + ... */
public class Poly {
    /** EFFECTS: Constructor. Initializes this to be the zero polynomial. */
    public Poly ()

    /** EFFECTS: If n < 0 throws NegativeExponentException;
        else initializes this to be the Poly cx^n. */
    public Poly (int c, int n) throws NegativeExponentException

    /** EFFECTS: Returns the degree of this, i.e., the largest exponent
        with a non-zero coefficient. Returns 0 if this is the zero Poly. */
    public int degree ()

    /** EFFECTS: Returns the coefficient of the term of this
        whose exponent is d. */
    public int coeff (int d)

    /** EFFECTS: If q is null throws NullPointerException;
        else returns the Poly this + q. */
    public Poly add (Poly q) throws NullPointerException

    /** EFFECTS: If q is null throws NullPointerException;
        else returns the Poly this * q. */
    public Poly mul (Poly q) throws NullPointerException

    /** EFFECTS: If q is null throws NullPointerException;
        else returns the Poly this - q. */
    public Poly sub (Poly q) throws NullPointerException

    /** EFFECTS: Returns the Poly -this. */
    public Poly minus ()
}
Note:
Poly is an immutable class
  There are no mutator methods
NegativeExponentException: checked or unchecked?
Using Data Abstractions
Programs should be written solely based on the specification of the data abstraction
The implementation of the data abstraction should NOT be relied on by the using code
/** EFFECTS: If p is null throws NullPointerException;
    else returns the Poly obtained by differentiating p. */
public static Poly diff (Poly p) throws NullPointerException {
    Poly q = new Poly();
    // Each term c * x^i contributes (c * i) * x^(i - 1); the constant term vanishes
    for (int i = 1; i <= p.degree(); i++)
        q = q.add(new Poly(p.coeff(i) * i, i - 1));
    return q;
}

/** EFFECTS: If a is null throws NullPointerException; else returns
    a set containing an entry for each distinct element of a. */
public static IntSet getElements (int[] a) throws NullPointerException {
    IntSet s = new IntSet();
    // insert ignores duplicates, so each distinct value ends up in s once
    for (int i = 0; i < a.length; i++)
        s.insert(a[i]);
    return s;
}
Implementing Data Abstraction
Select a representation, or rep, to store the state of an object
  In Java, the rep is the set of instance variables in the class
  E.g., you may use an array to implement a set. Then the array is the rep of the set object
Constructors and methods of the object should operate based on the rep
The rep should support all operations of the object
  A single rep usually cannot support all operations efficiently
  Therefore, multiple implementations of the same data type may be needed
Example:
We may use ArrayList<Integer> as the representation of IntSet objects
  The elements in an IntSet object can be stored in an ArrayList
We can choose between two representations:
  1. Allow duplicate element values of the set in ArrayList<Integer>
  2. Let each element of the set occur exactly once in ArrayList<Integer>
The 1st way is better for insert
The 2nd way is better for remove and isIn, since the array list is shorter
If isIn is used more frequently than the other operations, then the 2nd way is preferable
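The trade-off between the two representations can be sketched as follows. This is a minimal illustration, not the course's implementation: the class names DupIntSet and UniqueIntSet are hypothetical, and only the operations needed to show the difference are included.

```java
import java.util.ArrayList;

// Hypothetical class names; the slides describe the two choices only in words.

/** Rep 1: duplicates are allowed in the list. */
class DupIntSet {
    private final ArrayList<Integer> els = new ArrayList<>();

    // insert is cheap: no search is needed before appending.
    void insert(int x) { els.add(x); }

    // remove must delete every copy of x, so it scans the whole list.
    void remove(int x) { els.removeIf(e -> e == x); }

    // isIn may scan a list that is longer than the set itself.
    boolean isIn(int x) { return els.contains(x); }
}

/** Rep 2: each element occurs exactly once in the list. */
class UniqueIntSet {
    private final ArrayList<Integer> els = new ArrayList<>();

    // insert pays for a search to keep the list free of duplicates.
    void insert(int x) { if (!els.contains(x)) els.add(x); }

    // remove deletes the single occurrence of x, if any.
    void remove(int x) { els.remove(Integer.valueOf(x)); }

    boolean isIn(int x) { return els.contains(x); }

    // The list length is the cardinality only because there are no duplicates.
    int size() { return els.size(); }
}
```

Both classes behave the same from the user's point of view for insert, remove and isIn; only the cost of each operation differs.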
Implementing Data Abstraction in Java
A representation typically has a number of instance variables
  The constructors and methods access and manipulate the instance variables
From an implementation point of view, objects have both methods and instance variables
However, as a data abstraction, instance variables should be invisible (private) to users
  It is generally a bad idea to make instance variables public
  Record data types are an exception
    E.g., LinkedListNode
Example: Implementing IntSet
import java.util.Vector;

/** OVERVIEW: IntSets are unbounded, mutable sets of integers.
    A typical IntSet is {x1,...,xn}. */
public class IntSet {
    private Vector<Integer> els; // the rep

    /** EFFECTS: Constructor. Initializes this to be empty. */
    public IntSet () { els = new Vector<Integer>(); }

    /** MODIFIES: this
        EFFECTS: Adds x to the elements of this. */
    public void insert (int x) {
        Integer y = x;
        if (getIndex(y) < 0) els.add(y);
    }

    /** MODIFIES: this
        EFFECTS: Removes x from this. */
    public void remove (int x) {
        int i = getIndex(x);
        if (i < 0) return;
        // Overwrite x with the last element, then drop the last slot
        els.set(i, els.lastElement());
        els.remove(els.size() - 1);
    }

    /** EFFECTS: Returns true if x is in this; else returns false. */
    public boolean isIn (int x) {
        return getIndex(x) >= 0; // index 0 is a valid position
    }

    /** EFFECTS: If x is in this returns the index where x appears;
        else returns -1. */
    private int getIndex (Integer x) {
        for (int i = 0; i < els.size(); i++)
            if (x.equals(els.get(i))) return i;
        return -1;
    }

    /** EFFECTS: Returns the cardinality of this. */
    public int size () { return els.size(); }

    /** EFFECTS: If this is empty throws EmptyException;
        else returns an arbitrary element of this. */
    public int choose () throws EmptyException {
        if (els.size() == 0)
            throw new EmptyException("IntSet.choose");
        return els.lastElement();
    }
}
Note:
Why does getIndex not use exceptions?
Method insert guarantees the uniqueness of elements in the vector els
  This condition is essential to the implementation of methods size and remove
An implementation using an int array is OK but less favorable
The underdetermined method choose gets a determined implementation
Example: Implementing Poly
Since Poly is immutable, an array can be used as its rep (coefficient array)
The ith element of the array stores the coefficient of the ith exponent
  Makes sense only if the Poly is dense
Example 1: Dense
  <1, 1, 0, -2> represents 1 + x - 2x^3
Example 2: Sparse
  <1, 0, …, 0, -10> represents 1 - 10x^1000, where -10 is the 1001st element
Example: Implementing Poly
The zero Poly can be represented by either an empty array or a one-element array containing zero
  The latter is used in the implementation
For convenience, an instance variable is used to store the degree of the Poly
/** OVERVIEW: Polys are immutable polynomials with integer
    coefficients. A typical Poly is c0 + c1x + c2x^2 + ... */
public class Poly {
    private int[] trms;
    private int deg;

    /** EFFECTS: Constructor. Initializes this to be the zero polynomial */
    public Poly () { trms = new int[1]; deg = 0; }

    /** EFFECTS: If n < 0 throws NegativeExponentException;
        else initializes this to be the Poly cx^n */
    public Poly (int c, int n) throws NegativeExponentException {
        if (n < 0)
            throw new NegativeExponentException("Poly(int, int) constructor");
        if (c == 0) { trms = new int[1]; deg = 0; return; }
        trms = new int[n + 1]; // Java fills a new int array with zeros
        trms[n] = c;
        deg = n;
    }

    /** EFFECTS: Initializes this to be the Poly 0x^n */
    private Poly (int n) { trms = new int[n + 1]; deg = n; }

    /** EFFECTS: Returns the degree of this, i.e., the largest exponent
        with a non-zero coefficient. Returns 0 if this is the zero Poly. */
    public int degree () { return deg; }

    /** EFFECTS: Returns the coefficient of the term of this
        whose exponent is d */
    public int coeff (int d) {
        if (d < 0 || d > deg) return 0;
        else return trms[d];
    }

    /** EFFECTS: If q is null throws NullPointerException;
        else returns the Poly this - q */
    public Poly sub (Poly q) throws NullPointerException {
        return add(q.minus());
    }

    /** EFFECTS: Returns the Poly -this. */
    public Poly minus () {
        Poly r = new Poly(deg);
        // i <= deg so that the leading coefficient is negated as well
        for (int i = 0; i <= deg; i++) r.trms[i] = -trms[i];
        return r;
    }

    // ...
    // See textbook p. 92 and WebCT for complete code
}
Records
A record is a collection of fields
  E.g., struct in C/C++
  Java does not have struct, so a class has to be used
  Visibility of instance variables in a record class can be either public or package visible
/** OVERVIEW: A record type */
class Pair {
    int coeff;
    int exp;
    Pair(int c, int n) { coeff = c; exp = n; }
}
No specification is needed for a record other than to indicate that it is a record type
Additional Methods
Class Object is the ancestor of all Java classes
Depending on the class, some methods of Object need to be overridden
We will discuss equals, clone and toString
Method equals
Conceptually, two objects are equal if they are behaviorally equivalent
  Behaviorally equivalent: two objects are indistinguishable by any sequence of calls to the objects’ methods
For mutable objects, all distinct objects are distinguishable
  Objects are equal only when they are the same object
Immutable objects with the same state are equivalent
The equals method of Object tests whether two objects are the same object
  For mutable objects, there is no need to override the equals method of Object
  The equals method of immutable objects should be overridden
Similarity is a weaker notion of equality
  Two objects are similar if they are not distinguishable by using any observers of their type
  If necessary, you may implement a similar method
  similar and equals are the same for immutable objects
  similar is weaker than equals for mutable types
IntSet s = new IntSet();
IntSet t = new IntSet();
if (s.similar(t)) ...; else ...;
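One possible way to write similar for IntSet is sketched below. The slides do not give a body for similar, so this one is an assumption; the class is a cut-down IntSet with only the members the sketch needs.

```java
import java.util.Vector;

// A cut-down IntSet, just enough to show similar(); the body of similar is an assumption.
class IntSet {
    private Vector<Integer> els = new Vector<Integer>();

    void insert(int x) { if (!els.contains(x)) els.add(x); }
    boolean isIn(int x) { return els.contains(x); }
    int size() { return els.size(); }

    /** EFFECTS: Returns true if this and t contain the same elements;
        else returns false. */
    boolean similar(IntSet t) {
        if (t == null || size() != t.size()) return false;
        // Same size and every element of this is in t => same set
        for (Integer e : els)
            if (!t.isIn(e)) return false;
        return true;
    }
}
```

Two empty sets s and t are similar, yet s.equals(t) is false because equals, inherited from Object, compares identity. Inserting into s but not into t makes them distinguishable by observers, so they are no longer similar.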
Method clone
clone creates a new object that is a copy of the object on which clone is invoked
The clone method of Object assigns the instance variables of the old object to those of the new one
  It may create a sharing problem if any of the instance variables is a reference
  E.g., IntSet, Poly (immutable, so acceptable)
The default implementation of clone is usually acceptable for immutable types
  If the default clone can be used, it can be inherited by putting implements Cloneable in the class header
clone needs to be implemented for mutable types
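The sharing problem can be seen in a small sketch. SharedSet, shallowCopy and deepCopy are hypothetical names used only for this illustration; shallowCopy relies on the field-by-field copy made by Object's clone, while deepCopy gives the copy its own vector.

```java
import java.util.Vector;

// Hypothetical class used only to show the sharing problem.
class SharedSet implements Cloneable {
    Vector<Integer> els = new Vector<Integer>();

    // Default behavior: the reference els is copied, not the vector it points to.
    SharedSet shallowCopy() {
        try {
            return (SharedSet) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }

    // What a mutable type needs: the copy gets its own vector.
    SharedSet deepCopy() {
        SharedSet c = new SharedSet();
        c.els = new Vector<Integer>(els);
        return c;
    }
}
```

After a shallow copy, adding an element through the copy also changes the original, because both objects share one vector. After a deep copy the two objects evolve independently.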
/** OVERVIEW: ... */
public class Poly implements Cloneable {
    public boolean equals (Poly q) { // Optimized
        if (q == null || deg != q.deg) return false;
        for (int i = 0; i <= deg; i++)
            if (trms[i] != q.trms[i]) return false;
        return true;
    }

    public boolean equals (Object z) {
        if (!(z instanceof Poly)) return false;
        return equals((Poly) z);
    }
}
The definitions of the equals method can be treated as a template
The first equals method is an overload, while the second one is the overriding method, since it has the same signature as that in class Object
/** OVERVIEW: ... */
public class IntSet {
    ...
    private IntSet (Vector<Integer> v) {
        els = new Vector<Integer>();
        for (int i = 0; i < v.size(); i++) els.add(v.get(i));
    }

    public Object clone () { return new IntSet(els); }
}
Note:
There is no specification for clone or equals since their meanings are well understood
CloneNotSupportedException is thrown if clone is called on an object that neither implements Cloneable nor declares its own clone method
The signature of clone of a subtype is identical to the signature of clone for Object
  Object clone();
  A cast is needed when clone is used
  IntSet t = (IntSet) s.clone();
  May not be so for user-defined generic classes
    Discussed in Polymorphism
Method toString
Produces a string that represents the current state of its object and indicates its type
  E.g., IntSet: {1, 7, 3}
  E.g., Poly: 2 + 3x + 5x^2
toString of Object only provides the type name and the object's hash code
It is advisable to provide a customized toString method for each new type
toString method for IntSet
public String toString () {
if (els.size() == 0) return "IntSet: { } ";
String s = "IntSet: {" + els.elementAt(0);
for (int i = 1; i < els.size(); i++)
s = s + ", " + els.elementAt(i);
return s + "} ";
}
Aids to Understanding Implementations
The abstraction function describes the implementer's choice of a particular representation for the data type
  It explains how instance variable values are mapped to the state of the abstract object that they represent
The rep invariant describes the common assumptions on which constructors and methods are implemented
  It allows each operation to be implemented without worrying about the implementations of the others
The abstraction function and rep invariant capture why the code is the way it is
  E.g., choose and size of IntSet
They are valuable to both implementers and other readers of the code
  But not to the user of the data abstraction
Note that they are NOT part of the specification
  They are written by implementers
Abstraction Function
The implementation of a data abstraction determines a relationship between the rep and the abstract objects
The relationship can be defined as a function called the abstraction function (AF)
  It maps from the instance variables (the rep of an object) to the abstract object being represented
  E.g., IntSet uses a vector els
  Specifically, the AF maps from the concrete values of the instance variables to the abstract state of the abstract object
The following example shows the mapping from concrete states of a real object to abstract states of the abstract object
  E.g.: Vector<Integer>: [1, 2] maps to the Integer set {1, 2}, and [2, 1] maps to the same set
As the example shows, AFs are often many-to-one mappings
The AF should be described in a comment in the implementation of an abstract object
Since informal specification is used, the range of an AF is not mathematically defined
To overcome the problem, we give a description of a typical abstract object in the specification
  E.g., in IntSet, we have “A typical IntSet is {x1,...,xn}”
  E.g., in Poly, we have “A typical Poly is c0 + c1x + c2x^2 + ...”
Example: Based on the typical abstract IntSet object, we can write the AF for IntSet as follows:
// The abstraction function is
// AF(c) = { c.els[i] | 0 <= i < c.els.size }
The notation { x | p(x) } describes the set of all x such that the predicate p(x) is true
Note that convenient abbreviations are used
  c.els[i] stands for c.els.get(i)
  It is fine as long as the readers can clearly understand what it means
Note that you can also choose to write the abstraction function in plain English
  E.g., the AF of the IntSet implementation can be “All elements in the rep els correspond to the elements in the abstract IntSet.”
Example:
// A typical Poly is c0 + c1x + c2x^2 + ...
// The abstraction function is:
// AF(c) = c0 + c1x + c2x^2 + ...
// where
//   ci = c.trms[i] if 0 <= i < c.trms.length
//   ci = 0 otherwise
Or in plain English: The elements in the rep trms correspond to the coefficients of the polynomial object. The index of each element/coefficient in trms corresponds to the exponent of each term in the polynomial
You do not need to provide an abstraction function for a record type
  A record type provides no abstraction over its rep: both the real object and the abstract object are collections of fields that correspond to each other
Representation Invariant
Not all syntactically correct values of instance variables are semantically correct representations of the state of the abstract object
  E.g., suppose we do not allow duplicate values in Vector<Integer> els of an IntSet. Then an els containing duplicate values is not a legitimate representation of an IntSet, although the compiler will accept it
The representation (rep) invariant is a statement of a property that all legitimate objects satisfy
  A rep invariant is a predicate that is true of legitimate objects
  If it is violated, then the object is corrupted
Example: for IntSet, we have:
// The rep invariant is
// I(c) = c.els != null &&
//   for all int i, j,
//     (0 <= i, j < c.els.size && i != j) => c.els[i] != c.els[j]
The rep invariant is written using predicate calculus notation
Predicate calculus notation:
  &&: and, conjunction
  ||: or, disjunction
  =>: implication
  for all: universal quantifier
  there exists: existential quantifier
You may also choose to write the rep invariant in an informal way using plain English
Example: for IntSet, we have:
// The rep invariant is:
// I(c) = c.els != null &&
//        there are no duplicates in c.els
Example: Consider an alternative representation of IntSet that consists of an array of 100 boolean values plus a Vector<Integer>
private boolean[] els = new boolean[100];
private Vector<Integer> otherEls;
private int sz;
Based on the above rep, if an integer i between 0 and 99 is in the set, we just set els[i] to true
All integers outside the range 0 to 99 are stored in otherEls
For efficiency, we store the size of the set in sz
This will be an efficient rep if almost all integers that appear are between 0 and 99
The abstraction function:
// The abstraction function is
// AF(c) = { c.otherEls[i] | 0 <= i < c.otherEls.size }
//         + { j | 0 <= j < 100 && c.els[j] }
The rep invariant:
// The rep invariant is
// I(c) = c.els != null && c.otherEls != null &&
//        all elements in c.otherEls are not in the
//        range 0 to 99 && there are no duplicates in
//        c.otherEls && c.sz = c.otherEls.size +
//        (count of true entries in c.els)
The abstraction function in plain English:
// The abstraction function:
// The set of ints in the IntSet is the union
// of the indices of true elements in els
// and all the elements in otherEls
Note that sz is redundant
  Whenever there is redundant information in the rep, the relationship of the redundant info to the rest of the rep should be explained in the rep invariant
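A minimal sketch of this alternative rep, together with a check of its rep invariant, could look like the following. The class name SmallIntSet is hypothetical and only insert, isIn, size and repOk are shown; it is an illustration under these assumptions rather than the textbook's code.

```java
import java.util.Vector;

// Hypothetical name; a sketch of the boolean-array rep with only a few operations.
class SmallIntSet {
    private boolean[] els = new boolean[100];
    private Vector<Integer> otherEls = new Vector<Integer>();
    private int sz = 0; // redundant: always the number of elements in the set

    void insert(int x) {
        if (isIn(x)) return;
        if (0 <= x && x < 100) els[x] = true;
        else otherEls.add(x);
        sz++; // keep the redundant field consistent with the rest of the rep
    }

    boolean isIn(int x) {
        if (0 <= x && x < 100) return els[x]; // constant-time lookup
        return otherEls.contains(x);
    }

    int size() { return sz; }

    /** EFFECTS: Returns true if the rep invariant holds for this. */
    boolean repOk() {
        if (els == null || otherEls == null) return false;
        int count = 0;
        for (boolean b : els) if (b) count++;
        for (int i = 0; i < otherEls.size(); i++) {
            int v = otherEls.get(i);
            if (0 <= v && v < 100) return false;        // belongs in els instead
            if (otherEls.indexOf(v) != i) return false; // duplicate in otherEls
        }
        return sz == count + otherEls.size();
    }
}
```

The last line of repOk is the part of the invariant that ties the redundant field sz to the rest of the rep.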
Example: Poly
// The rep invariant is
// I(c) = c.trms != null && c.trms.length >= 1 &&
//        c.deg = c.trms.length - 1 &&
//        c.deg > 0 => c.trms[c.deg] != 0
If all syntactically correct states of the concrete object are legal representations, we simply have:
// The rep invariant is
// I(c) = true
It is so for all record types
  Since using code can access the rep directly, there is no way for the implementation code of a record type to constrain it
Thus, rep invariants need not be given for record types
They must be given for all other types
  They help the implementers and code readers
It is always possible that there is some strong relationship among the fields of a record. That relationship should be expressed in the rep invariant of the using code
Assume that another type of rep is used for Poly:
/** OVERVIEW: A record type */
class Pair { int exp; int coeff; }

public class Poly {
    Pair[] trms; // only used for non-zero terms
    ...
}
Then we have the rep invariant as follows:
// for all elements e of c.trms,
// e.exp >= 0 and e.coeff != 0
Implementing the Abstraction Function and Rep Invariant
Besides providing the abstraction function and rep invariant as comments, you usually also provide methods to implement them
  Not for record types
The toString method is used to implement the abstraction function
The method that checks the rep invariant is called repOk
repOk specification:
/** EFFECTS: Returns true if the rep invariant
    holds for this; otherwise returns false */
public boolean repOk()
repOk is public so that using code can use it
Since the specification of repOk is clear and always the same, it is not necessary to write it
Examples
// for Poly:
public boolean repOk() {
    if (trms == null || deg != trms.length - 1 || trms.length == 0)
        return false;
    if (deg == 0) return true;
    return trms[deg] != 0;
}
// for IntSet:
public boolean repOk() {
    if (els == null) return false;
    for (int i = 0; i < els.size(); i++) {
        Integer x = els.get(i);
        for (int j = i + 1; j < els.size(); j++)
            if (x.equals(els.get(j))) return false;
    }
    return true;
}
There are two ways to use repOk
  Using code can call it to check the implementation
  Call it inside constructors and methods that modify the rep
    Call it right before they return
If repOk is costly, these calls can be disabled when the program is in production
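The second way of using repOk might be sketched as follows. The class name CheckedIntSet, the helper checkRep and the switch CHECK_REP are assumptions made for this illustration; the slides only say that the calls can be disabled in production.

```java
import java.util.Vector;

// Hypothetical sketch: CHECK_REP and checkRep are assumed names, not part of the slides.
class CheckedIntSet {
    static boolean CHECK_REP = true; // set to false to skip the checks in production

    private Vector<Integer> els = new Vector<Integer>();

    CheckedIntSet() { checkRep(); }

    void insert(int x) {
        if (!els.contains(x)) els.add(x);
        checkRep(); // right before returning
    }

    void remove(int x) {
        els.remove(Integer.valueOf(x)); // removes the single occurrence, if any
        checkRep();
    }

    int size() { return els.size(); }

    /** EFFECTS: Returns true if the rep invariant holds for this. */
    boolean repOk() {
        if (els == null) return false;
        for (int i = 0; i < els.size(); i++)
            for (int j = i + 1; j < els.size(); j++)
                if (els.get(i).equals(els.get(j))) return false;
        return true;
    }

    // Fails fast if a constructor or mutator left the rep in an illegal state.
    private void checkRep() {
        if (CHECK_REP && !repOk())
            throw new IllegalStateException("rep invariant violated");
    }
}
```

Observers such as size do not modify the rep, so they do not need the check.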
Discussion
AF and RI are NOT the specification of the data abstraction. They are written by the implementers rather than the designers
The rep invariant holds whenever an object is used outside its implementation
  It need not hold all the time during an object operation
  However, it must be true whenever the operations return to their callers
The abstraction function only makes sense when the rep invariant holds
A rep invariant should express all constraints on which the object operations depend
  You may imagine that the operations are to be implemented by different people
  The rep invariant should be sufficient to support that scenario
When implementing a data abstraction, AF and RI should be completed before the implementation of any operation
All operations of the data abstraction should be implemented such that RI is preserved
Properties of Data Abstraction Implementation
The rep of an immutable abstraction need not be immutable
  E.g., Poly
Benevolent side effects are rep modifications that are not visible outside the implementation
  Example: Changing the order of the elements in els in an IntSet
Example: Suppose rational numbers are represented as a pair of integers (fraction):
int num, denom;
The abstraction function is:
// A typical rational is n / d
// The abstraction function is
// AF(c) = c.num / c.denom
Given the rep, there are several decisions to make:
  Zero denominator allowed? No
  Negative rationals? Represented by a negative numerator
  Must the rep be in reduced form (no common factors)? No
Based on these decisions, we have a rep invariant:
// The rep invariant is
// c.denom > 0
However, to test the equality of two rationals, a reduced form is needed (common factors are removed)
  See code in WebCT or on page 110 in the textbook
The equals method first computes the reduced form of both rationals, and then decides equality
The reduced form replaces the original rep, but the abstract object is the same
  This is a benevolent side effect
Benevolent side effects are often performed for efficiency reasons
They are possible whenever the abstraction function is many-to-one
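A sketch of such a Rational type is given below. It is written for this illustration and may differ from the textbook's code; the helper names reduce and gcd, and the accessor repDenom (included only so the rep change can be observed in a demonstration), are assumptions.

```java
// Sketch only; the textbook's version may differ in detail.
class Rational {
    private int num, denom; // rep invariant: denom > 0

    Rational(int n, int d) {
        if (d == 0) throw new ArithmeticException("zero denominator");
        if (d < 0) { n = -n; d = -d; } // keep the sign in the numerator
        num = n;
        denom = d;
    }

    // Benevolent side effect: changes the rep, not the abstract value num / denom.
    private void reduce() {
        int g = gcd(Math.abs(num), denom);
        num /= g;
        denom /= g;
    }

    private static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Rational)) return false;
        Rational r = (Rational) o;
        reduce();   // both reps are put in reduced form first
        r.reduce();
        return num == r.num && denom == r.denom;
    }

    @Override public int hashCode() { reduce(); return 31 * num + denom; }

    int repDenom() { return denom; } // demonstration only: peeks at the rep
}
```

For example, a Rational built from 2 and 4 has denominator 4 in its rep until equals is first called on it, after which the rep is 1 and 2. A user of the abstraction cannot tell the difference, since 2/4 and 1/2 are the same abstract value.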
Exposing the Rep
It is very important that the rep of a data abstraction cannot be modified outside its implementation
Even if all instance variables are declared private, it is still possible to expose the rep
/** EFFECTS: Returns a vector containing the elements of this,
    each exactly once, in arbitrary order */
public Vector<Integer> allEls () {
    return els; // exposes the rep: the caller can now modify els
}
Exposing the rep is an implementation error. It happens when:
  A method returns a mutable object in the rep
  A constructor or method makes a mutable argument object part of the rep
/** EFFECTS: If elms is null throws NullPointerException;
    else initializes this to contain all elements in elms */
public IntSet (Vector<Integer> elms) throws NullPointerException {
    if (elms == null)
        throw new NullPointerException("IntSet.IntSet(Vectors)");
    els = elms; // exposes the rep: the caller still holds a reference to elms
}
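One common way to avoid both kinds of exposure is to copy mutable objects on the way in and on the way out. The sketch below uses the hypothetical class name SafeIntSet and is an illustration under that assumption, not code from the slides.

```java
import java.util.Vector;

// Hypothetical name; shows copying on the way in and on the way out.
class SafeIntSet {
    private Vector<Integer> els;

    /** EFFECTS: If elms is null throws NullPointerException;
        else initializes this to contain all elements in elms. */
    SafeIntSet(Vector<Integer> elms) {
        if (elms == null) throw new NullPointerException("SafeIntSet(Vector)");
        els = new Vector<Integer>();
        // Copy the elements instead of keeping the caller's vector,
        // and skip duplicates so the no-duplicates invariant holds.
        for (Integer e : elms)
            if (!els.contains(e)) els.add(e);
    }

    /** EFFECTS: Returns a vector containing the elements of this,
        each exactly once, in arbitrary order. */
    Vector<Integer> allEls() {
        return new Vector<Integer>(els); // the caller gets a fresh vector
    }

    int size() { return els.size(); }
}
```

Changes the caller makes to the argument vector or to the returned vector no longer reach the rep.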
Design Issues
Mutability
In general, a type should be immutable if its objects would naturally have unchanging values
  E.g., mathematical objects such as Poly, Rational, etc.
A type should usually be mutable if it is modeling something from the real world, where it is natural for values of objects to change over time
  E.g., Employee, IntSet, etc.
There is a trade-off between efficiency and safety
  Immutable abstractions are safer: sharing causes no problems
  Immutable abstractions are less efficient: objects may be created and discarded frequently
Mutability is a property of the type and not of its implementation
  The implementation should support the property
Operation Categories
Creators: Operations that create objects of their types from scratch
  All creators are constructors
Producers: Take objects of their type as inputs and create other objects of their type
  E.g., add of Poly
Mutators: Modify objects of their type
  Only for mutable types
Observers: Take objects of their type as inputs and return results of other types
Creators usually create some but not all objects of the data type
  E.g., Poly constructors only create single-term polynomials, while the IntSet constructor only creates the empty set
Other objects are created by producers or mutators
  E.g., add of Poly, insert of IntSet, etc.
Mutators play the same role in mutable types as producers play in immutable ones
A mutable type can have producers as well as mutators
  E.g., clone of IntSet
Sometimes observers are combined with producers or mutators
  E.g., we may have a chooseAndRemove method for IntSet
Adequacy
A data type is adequate if it provides enough operations so that everything users need to do with its objects can be done both conveniently and with reasonable efficiency
  There is no precise definition of adequacy
  An inadequate type can usually be recognized
    E.g., an IntSet without isIn
A basic notion of adequacy can be obtained by considering the operation categories
  In general, an immutable type must have creators, observers and producers
  A mutable type must have creators, observers and mutators
A type must be fully populated
  Using its creators, mutators and producers, it must be possible to obtain every possible abstract object state
A type that is intended for general use must have a rich enough set of operations for its intended uses
But do not offer irrelevant operations
  E.g., sum or sort for IntSet
If a type is adequate, its operations can be augmented by standalone procedures that are outside the type’s implementation (i.e., static methods of some other class)
Locality and Modifiability Revisited
Locality: the ability to reason about a module by just looking at its specification
  It requires that the rep be modified only within its type’s implementation
Modifiability: the ability to re-implement an abstraction without having to modify any using code
  All access to a rep must occur within its implementation