what is object-oriented programming? - ieee software

What is Object-Oriented Programming?

Bjarne Stroustrup, AT&T Bell Laboratories

“Object-oriented” has become a buzzword that implies “good” programming. But when it comes to really supporting this paradigm, not all languages are equal.


Not all programming languages can be object-oriented. Yet, claims have been made that APL, Ada, Clu, C++, Loops, and Smalltalk are object-oriented languages. I have heard discussions of object-oriented design in C, Pascal, Modula-2, and Chill. Could there somewhere be proponents of object-oriented programming in Fortran and Cobol? I think there must be.

“Object-oriented” has become a high-tech synonym for “good.” Articles in the trade press contain arguments that appear to boil down to syllogisms like:

Ada is good; object-oriented is good; therefore, Ada is object-oriented.

This article presents my view of what object-oriented means in the context of a general-purpose programming language. I present examples in C++, partly to introduce C++ and partly because C++ is one of the few languages that supports data abstraction, object-oriented programming, and traditional programming techniques. I do not cover issues of concurrency and hardware support for specific, higher level language constructs.

An earlier version of this article appeared in Proc. First European Conf. on Object-Oriented Programming, Springer-Verlag, New York, 1987, pp. 51-70.

0740-7459/88/0500/0010/$01.00 © 1988 IEEE

Programming paradigms

Object-oriented programming is a technique - a paradigm for writing “good” programs for a set of problems. If the term “object-oriented language” means anything, it must mean a language that has mechanisms that support the object-oriented style of programming well.

There is an important distinction here: A language supports a programming style if it provides facilities that make it convenient (reasonably easy, safe, and efficient) to use that style. A language does not support a technique if it takes exceptional effort or skill to write such programs; in that case, the language merely enables programmers to use the technique. For example, you can write structured programs in Fortran and type-secure programs in C, and you can use data abstraction in Modula-2, but it is unnecessarily hard to do so because those languages do not support those techniques.

Support for a paradigm comes not only in the obvious form of language facilities that let you use the paradigm directly, but also in the more subtle forms of compile-time and runtime checks for unintentional deviations from the paradigm. Type checking, ambiguity detection, and runtime checks are examples of linguistic support for paradigms. Extralinguistic facilities such as standard libraries and programming environments can also provide significant support for paradigms.

One language is not necessarily better than another because it has a feature the other does not - there are many examples to the contrary. The important issue is not how many features a language has, but that the features it does have are sufficient to support the desired programming styles in the desired application areas. Specifically, it is important that:

All features are cleanly and elegantly integrated into the language.

It is possible to combine features to achieve solutions that would have otherwise required extra, separate features.

There are as few spurious and special-purpose features as possible.

Implementing a feature does not impose significant overhead on programs that do not require it.

A user need only know about the language subset used explicitly to write a program.

The last two principles can be summarized as "what you don't know won't hurt you." If there are any doubts about the usefulness of a feature, it is better left out. It is much easier to add a feature to a language than to remove or modify one that has found its way into the compiler or the literature.

Procedural. The original - and probably still the most common - programming paradigm is:

Decide which procedures you want; use the best algorithms you can find.

The focus is on procedure design - the algorithm needed to perform the desired computation. Languages support this paradigm with facilities for passing arguments to functions and returning values from functions. The literature about this paradigm is filled with discussions of how to pass arguments, how to distinguish different kinds of arguments and different kinds of functions (procedures, routines, macros, etc.), and so on.

Fortran is the original procedural language; Algol-60, Algol-68, C, and Pascal are later inventions in the same tradition.

An example of good procedural style is a square-root function. Given an argument, the function neatly produces a result. To do so, it performs a well-understood mathematical computation.

    double sqrt(double arg)
    {
        // the code for calculating a square root
    }

    void some_function()
    {
        double root2 = sqrt(2);
        // ...
    }

Procedural programming uses functions to create order in a maze of algorithms.

Data hiding. Over the years, the emphasis in the program design has shifted from procedure design to data organization. Among other things, this reflects an increase in program size. A set of related procedures and the data they manipulate is often called a module. The programming paradigm is:

Decide which modules you want; partition the program so that data is hidden in modules.

This paradigm is known as the data-hiding principle. When procedures do not need to be grouped with related data, the procedural style suffices. In fact, the techniques for designing good procedures are still applied, now to each procedure in a module.

The most common example of data hiding is a definition of a stack module. A good solution requires

a user interface for the stack (for example, the functions push() and pop()),

that the stack representation (for example, a vector of elements) can be accessed only through this user interface, and

that the stack is initialized before its first use.

A plausible external interface for a stack module is

    // declaration of the interface of module
    // stack of characters
    char pop();
    void push(char);
    const int stack_size = 100;

Assuming this interface is found in a file called stack.h, the internals can be defined like this:

    #include "stack.h"

    static char v[stack_size];   // "static" means local to this file/module
    static char* p = v;          // the stack is initially empty

    char pop()
    {
        // check for underflow and pop
    }

    void push(char c)
    {
        // check for overflow and push
    }

It is quite feasible to change this stack representation to a linked list. The user does not have access to the representation anyway (because v and p were declared static - that is, local to the file or module in which they were declared). Such a stack can be used like this:

    #include "stack.h"

    void some_function()
    {
        push('c');
        char c = pop();
        if (c != 'c') error("impossible");
    }
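The stack module above leaves the bodies of push() and pop() as comments. A minimal sketch of how they might be filled in follows, with interface and implementation shown in one file for brevity. The use of assert to handle overflow and underflow is an assumption made here; the article does not say what the error policy should be.

```cpp
#include <cassert>

// Interface: what stack.h would declare.
const int stack_size = 100;
void push(char c);
char pop();

// Implementation: "static" keeps the representation local to this file.
static char v[stack_size];
static char* p = v;              // next free slot; the stack starts empty

void push(char c)
{
    assert(p < v + stack_size);  // overflow check (assumed policy: abort)
    *p++ = c;
}

char pop()
{
    assert(p > v);               // underflow check (assumed policy: abort)
    return *--p;
}
```

Because v and p are invisible outside the file, the array could be swapped for a linked list without touching any caller.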

As originally defined, Pascal doesn't provide satisfactory facilities for such grouping: The only way to hide a name from the rest of the program is to make it local to a procedure. This leads to strange procedure nestings and overreliance on global data.

C fares somewhat better. As shown in the example, you can define a module by grouping related function and data definitions in a single source file. The programmer can then control which names are seen by the rest of the program (a name can be seen by the rest of the program unless it has been declared static). So in C you can achieve a degree of modularity. However, there is no generally accepted paradigm for using this facility, and relying on static declarations is rather low-level.

One of Pascal's successors, Modula-2, goes a bit further. It formalizes the module concept by making it a fundamental construct with well-defined module declarations, explicit control of the scope of names (import/export facilities), a module-initialization mechanism, and a set of generally known, accepted usage styles.

In other words, C enables the decomposition of a program into modules, while Modula-2 supports it.

Data abstraction. Programming with modules leads to the centralization of all data of a certain type under the control of a type-manager module. If you wanted two stacks, you would define a stack-manager module with an interface like this:

    // stack_id is a type; no details about
    // stacks or stack_ids are known here:
    class stack_id;

    // make a stack and return its identifier:
    stack_id create_stack(int size);

    // call when stack is no longer needed:
    void destroy_stack(stack_id);

    void push(stack_id, char);
    char pop(stack_id);

This is certainly a great improvement over the traditional unstructured mess, but "types" implemented this way are clearly very different from the types built into a language.

In most important aspects, a type created through a module mechanism is different from a built-in type and enjoys inferior support: Each type-manager module must define a separate mechanism for creating variables of its type, there is no established norm for assigning object identifiers, a variable of such a type has no name known to the compiler or programming environment, and such variables do not obey the usual scope and argument-passing rules.

For example:

    void f()
    {
        stack_id s1;
        stack_id s2;

        s1 = create_stack(200);
        // Oops: forgot to create s2

        push(s1,'a');
        char c1 = pop(s1);
        if (c1 != 'a') error("impossible");

        push(s2,'a');
        char c2 = pop(s2);
        if (c2 != 'a') error("impossible");

        destroy_stack(s2);
        // Oops: forgot to destroy s1
    }

In other words, the module concept that supports the data-hiding paradigm enables data abstraction, but does not support it.
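To make the type-manager idea concrete, here is one possible sketch of a stack manager whose stacks are named by integer handles. The handle representation, the is_valid() helper, and the use of std::vector for storage are all assumptions for illustration; the article specifies only the interface. The size argument is treated as a capacity hint.

```cpp
#include <cassert>
#include <vector>

typedef int stack_id;                           // hypothetical handle type

static std::vector<std::vector<char> > stacks;  // hidden representation
static std::vector<bool> live;                  // which handles are valid

stack_id create_stack(int size)
{
    assert(size > 0);
    stacks.push_back(std::vector<char>());
    stacks.back().reserve(size);
    live.push_back(true);
    return static_cast<stack_id>(stacks.size()) - 1;
}

bool is_valid(stack_id s)                       // helper added for this sketch
{
    return 0 <= s && s < static_cast<int>(live.size()) && live[s];
}

void destroy_stack(stack_id s)
{
    assert(is_valid(s));
    live[s] = false;
    stacks[s].clear();
}

void push(stack_id s, char c)
{
    assert(is_valid(s));
    stacks[s].push_back(c);
}

char pop(stack_id s)
{
    assert(is_valid(s) && !stacks[s].empty());
    char c = stacks[s].back();
    stacks[s].pop_back();
    return c;
}
```

Note that every check here happens at run time: nothing stops a caller from using a handle that was never created or forgetting to destroy one, which is the weakness the text describes.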

Abstract data types. Languages such as Ada, Clu, and C++ attack this problem by letting the user define types that behave in (nearly) the same way as built-in types. Such a type is often called an abstract data type, although I prefer to call it a user-defined type.* The programming paradigm becomes:

Decide which types you want; provide a full set of operations for each type.

When there is no need for more than one object of a type, the data-hiding programming style using modules suffices. Arithmetic types such as rational and complex numbers are common examples of user-defined types:

    class complex {
        double re, im;
    public:
        complex(double r, double i) { re=r; im=i; }
        // float->complex conversion:
        complex(double r) { re=r; im=0; }

        friend complex operator+(complex, complex);
        // binary minus:
        friend complex operator-(complex, complex);
        // unary minus:
        friend complex operator-(complex);
        friend complex operator*(complex, complex);
        friend complex operator/(complex, complex);
        // ...
    };

The declaration of the complex class (the user-defined type) specifies the representation of a complex number and the set of operations on a complex number. The representation is private; that is, re and im are accessible only to the functions specified in the declaration of class complex. Such functions can be defined like this:

    complex operator+(complex a1, complex a2)
    {
        return complex(a1.re+a2.re, a1.im+a2.im);
    }

and used like this:

    complex a = 2.3;
    complex b = 1/a;
    complex c = a+b*complex(1,2.3);
    // ...
    c = -(a/b)+2;
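For readers who want to compile the example, the following sketch fills in the remaining operators using the usual formulas for complex arithmetic. The real() and imag() accessors and the assert on division by zero are additions made here so the result can be inspected; they are not part of the class as declared in the article.

```cpp
#include <cassert>

class complex {
    double re, im;                        // private representation
public:
    complex(double r, double i) : re(r), im(i) {}
    complex(double r) : re(r), im(0) {}   // double -> complex conversion

    double real() const { return re; }    // accessors added for inspection
    double imag() const { return im; }

    friend complex operator+(complex, complex);
    friend complex operator-(complex, complex);
    friend complex operator-(complex);
    friend complex operator*(complex, complex);
    friend complex operator/(complex, complex);
};

complex operator+(complex a, complex b) { return complex(a.re + b.re, a.im + b.im); }
complex operator-(complex a, complex b) { return complex(a.re - b.re, a.im - b.im); }
complex operator-(complex a)            { return complex(-a.re, -a.im); }

complex operator*(complex a, complex b)
{
    return complex(a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re);
}

complex operator/(complex a, complex b)
{
    double d = b.re * b.re + b.im * b.im;   // squared magnitude of b
    assert(d != 0);                          // assumed policy: abort on zero divisor
    return complex((a.re * b.re + a.im * b.im) / d,
                   (a.im * b.re - a.re * b.im) / d);
}
```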

Most, but not all, modules are better expressed as user-defined types. When the programmer prefers to use a module representation, even when a proper facility for defining types is available, he can declare a type that has only a single object of that type. Alternatively, a language might provide a module concept in addition to and distinct from the class concept.

*As Doug McIlroy has said, "Those types are not 'abstract,' they are as real as int and float." Another definition of abstract data types would require a mathematical "abstract" specification of all types (both built-in and user-defined). What is referred to as types in this article would, given such a specification, be concrete specifications of such truly abstract entities.

Problems. A user-defined type defines a sort of black box. Once it has been defined, it does not really interact with the rest of the program. The only way to adapt it to new uses is to modify its definition. This is often too inflexible.

Consider defining a type shape for use in a graphics system. Assume for the moment that the system has to support circles, triangles, and squares. Assume also that you have some classes:

    class point { /* ... */ };
    class color { /* ... */ };

You might define a shape like this:

    enum kind { circle, triangle, square };

    class shape {
        point center;
        color col;
        kind k;
        // representation of shape
    public:
        point where() { return center; }
        void move(point to) { center = to; draw(); }
        void draw();
        void rotate(int);
        // more operations
    };

The type field, k, is used by operations such as draw() and rotate() to determine what shape they are dealing with (in a Pascal-like language, you might use a variant record with tag k). The function draw() might be defined like this:

    void shape::draw()
    {
        switch (k) {
        case circle:
            // draw a circle
            break;
        case triangle:
            // draw a triangle
            break;
        case square:
            // draw a square
        }
    }

This is a mess. It requires that functions such as draw() know about all the kinds of shapes there are. Therefore, the code for any such function must be modified each time a new shape is added to the system.

If you define a new shape, every operation on a shape must be examined and (possibly) modified. You cannot add a new shape to a system unless you have access to the source code for every operation. Because adding a new shape involves touching the code of every important operation on shapes, it can require great skill and may introduce bugs into the code that handles the older shapes.

Also, your choice of how you represent particular shapes can be severely cramped by the requirement that at least some of their representation fit into the typically fixed-sized framework presented by the definition of the general type shape.

The problem is that there is no distinction between the general properties of any shape (a shape has a color, it can be drawn, etc.) and the properties of a specific shape (a circle is a shape that has a radius, is drawn by a circle-drawing function, etc.).

Object-oriented programming. The ability to express this distinction and take advantage of it defines object-oriented programming. A language with constructs that let you express and use this distinction supports object-oriented programming. Other languages don’t.

The Simula inheritance mechanism provides a solution. First, you specify a class that defines the general properties of all shapes:

    class shape {
        point center;
        color col;
        // ...
    public:
        point where() { return center; }
        void move(point to) { center = to; draw(); }
        virtual void draw();
        virtual void rotate(int);
        // ...
    };

The functions marked virtual are those for which the calling interface can be defined, but the implementation cannot be defined except for a specific shape. (“Virtual” is the Simula and C++ term for “may be redefined later in a class derived from this one.”) Given this definition, we can write general functions to manipulate shapes:

    void rotate_all(shape* v[], int size, int angle)
    // rotate all members of vector "v" of size
    // "size" "angle" degrees
    {
        for (int i = 0; i < size; i++) v[i]->rotate(angle);
    }

To define a particular shape, you must say that it is a shape and specify its particular properties:

    class circle : public shape {
        int radius;
    public:
        void draw() { /* ... */ };
    };

In C++, the circle class is said to be derived from the shape class, and the shape class is said to be a base of the circle class. Another terminology calls circle a subclass and shape a superclass.
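The following self-contained sketch shows the same mechanism in a form that compiles on its own. Several details are assumptions made for illustration: point and color are omitted, name() stands in for draw() so the dynamic type can be observed, an angle_ member records rotations, and a second derived class, square, is invented to contrast with circle.

```cpp
#include <cassert>
#include <cstddef>

class shape {
    int angle_;                          // hypothetical state to observe rotation
public:
    shape() : angle_(0) {}
    virtual ~shape() {}
    virtual const char* name() const = 0;          // stands in for draw()
    virtual void rotate(int a) { angle_ += a; }    // default behavior
    int angle() const { return angle_; }
};

class circle : public shape {
    int radius;
public:
    explicit circle(int r) : radius(r) {}
    int size() const { return radius; }
    const char* name() const { return "circle"; }
    void rotate(int) {}                  // rotating a circle changes nothing
};

class square : public shape {
public:
    const char* name() const { return "square"; }
};

// The call v[i]->rotate() is resolved by each object's actual type
// at run time; rotate_all needs no knowledge of circle or square.
void rotate_all(shape* v[], std::size_t size, int angle)
{
    for (std::size_t i = 0; i < size; i++) v[i]->rotate(angle);
}
```

Adding a triangle class later would require no change to shape or rotate_all, which is the point of the contrast with the type-field version.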

The programming paradigm is:

Decide which classes you want; provide a full set of operations for each class; make commonality explicit using inheritance.

Where there is no such commonality, data abstraction suffices. How much types have in common so that the commonality can be exploited using inheritance and virtual functions is the litmus test of the applicability of object-oriented programming.

In some areas, such as interactive graphics, there is clearly enormous opportunity for object-oriented programming. In other areas, such as classical arithmetic types and the computations based on them, there appears to be hardly any need for more than data abstraction.*

Finding commonality among types in a system is not a trivial process. How much commonality can be exploited depends on how the system is designed. Commonality must be actively sought when the system is designed, both by designing classes specifically as building blocks for other types and by examining classes to see if they have similarities that can be exploited in a common base class.

*However, more advanced mathematics may benefit ...


Nygaard and Kerr explain what object-oriented programming is without recourse to specific language constructs; Cargill has written a case study in object-oriented programming.

Supporting data abstraction

Programming with data abstraction is supported with facilities both to define a set of operations for a type and to restrict access to objects of that type to that operation set. However, once that is done the programmer soon finds that language refinements are needed to define and use the new types conveniently.

Initialization and cleanup. When a type's representation is hidden, some mechanism must be provided for a user to initialize variables of that type. A simple solution is to require a user to call some function to initialize a variable before using it. For example,

    class vector {
        int sz;
        int* v;
    public:
        void init(int size);   // call init to initialize
                               // sz and v before the
                               // first use of a vector
        // ...
    };

    vector v;
    // don't use v here
    v.init(10);
    // use v here

This is error-prone and inelegant. A better solution is to allow the designer of a type to provide a distinguished function to do the initialization. Such a function makes allocation and initialization of a variable a single operation (often called instantiation) instead of two operations. Such an instantiation function is often called a constructor.

In cases where constructing objects of a type is nontrivial, it is often necessary to provide a complementary operation to clean up objects after their last use. In C++, a cleanup function is called a destructor. Consider a vector type:

    class vector {
        int sz;     // number of elements
        int* v;     // pointer to integers
    public:
        vector(int);                  // constructor
        ~vector();                    // destructor
        int& operator[](int index);   // subscript operator
    };

The vector constructor can be defined to allocate space like this:

    vector::vector(int s)
    {
        if (s<=0) error("bad vector size");
        sz = s;
        v = new int[s];   // allocate an array of "s" integers
    }

The vector destructor frees storage:

    vector::~vector()
    {
        delete[] v;   // deallocate the memory pointed to by v
    }

C++ does not support garbage collection. It compensates for this by letting a type maintain its own storage management without user intervention. While this is a common use for the constructor/destructor mechanism, many uses of this mechanism are unrelated to storage management.
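The pairing of constructor and destructor can be observed directly with a small sketch. The live_vectors counter is a hypothetical addition used only to show that cleanup happens automatically at the end of a scope, and assert stands in for the article's error() function. Copying is declared but left undefined here because it is discussed separately.

```cpp
#include <cassert>

static int live_vectors = 0;        // hypothetical counter to observe cleanup

class vector {
    int sz;                         // number of elements
    int* v;                         // pointer to integers
    vector(const vector&);          // copying not provided in this sketch
    void operator=(const vector&);
public:
    explicit vector(int s) : sz(s), v(0)
    {
        assert(s > 0);              // assumed policy: a bad size aborts
        v = new int[s]();           // allocate "s" zero-initialized integers
        ++live_vectors;
    }
    ~vector()
    {
        delete[] v;                 // array form matches new int[s]
        --live_vectors;
    }
    int size() const { return sz; }
    int& operator[](int i) { assert(0 <= i && i < sz); return v[i]; }
};
```

A vector declared inside a block is released when the block exits, with no call required from the user.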

Assignment and initialization. Controlling the construction and destruction of objects is sufficient for many types, but not for all. Sometimes, it is also necessary to control copy operations. Consider the vector class:

    vector v1(100);
    vector v2 = v1;   // make a new vector v2
                      // initialized to v1
    v1 = v2;          // assign v2 to v1

It must be possible to define the meaning of the initialization of v2 and its assignment to v1. It should also be possible to prohibit such copy operations; preferably both alternatives should be available. For example:

    class vector {
        int* v;
        int sz;
    public:
        // ...
        void operator=(vector&);   // assignment
        vector(vector&);           // initialization
    };

specifies that user-defined operations should be used to interpret vector assignment and initialization. Assignment might be defined like this:

    void vector::operator=(vector& a)
    // check size and copy elements
    {
        if (sz != a.sz) error("bad vector size for =");
        for (int i = 0; i<sz; i++) v[i] = a.v[i];
    }

Since the assignment operation relies on the old value of the vector being assigned to, the initialization operation must be different. For example:

    vector::vector(vector& a)
    // initialize a vector from another vector
    {
        sz = a.sz;          // same size
        v = new int[sz];    // allocate element array
        // copy elements:
        for (int i = 0; i<sz; i++) v[i] = a.v[i];
    }

In C++, a constructor X(X&) defines all initialization of objects of type X with another object of type X. In addition to explicit initialization, constructors of the form X(X&) are used to handle arguments passed by value and function-return values.
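The difference between the two copy operations can be checked with a compilable sketch. It departs from the article's declarations in a few labeled ways: the parameters are const references and operator= returns a reference, as is now conventional, assert replaces error(), and first_element() is a hypothetical function added to show the copy constructor being used for pass-by-value. The constructor assumes a positive size.

```cpp
#include <cassert>

class vector {
    int sz;
    int* v;
public:
    explicit vector(int s) : sz(s), v(new int[s]()) {}
    ~vector() { delete[] v; }

    // Initialization: the new object has no old value, so it must
    // allocate its own element array before copying.
    vector(const vector& a) : sz(a.sz), v(new int[a.sz])
    {
        for (int i = 0; i < sz; i++) v[i] = a.v[i];
    }

    // Assignment: the target already owns storage of the right size.
    vector& operator=(const vector& a)
    {
        assert(sz == a.sz);          // same-size rule as in the text
        for (int i = 0; i < sz; i++) v[i] = a.v[i];
        return *this;
    }

    int size() const { return sz; }
    int& operator[](int i) { assert(0 <= i && i < sz); return v[i]; }
};

// Passing by value invokes the copy constructor on the argument.
int first_element(vector by_value) { return by_value[0]; }
```

Without the user-defined copy constructor, v1 and v2 would share one element array and both destructors would try to free it.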

In C++, assignment of an object of class X can be prohibited by declaring the assignment operation private:

    class X {
        void operator=(X&);   // only members of X can
        X(X&);                // copy an X
        // ...
    public:
        // ...
    };

Ada does not support constructors, destructors, assignment overloading, or user-defined control of argument passing and function return. This lack of support severely limits the class of types that can be defined and forces the programmer back to data-hiding techniques: The user must design and use type-manager modules instead of proper types.

Parameterized types. Why would you want to define a vector of integers anyway? Typically, a user needs a vector of elements of some type unknown to the writer of the type vector. Consequently the vector type ought to be expressed so it takes the element type as an argument:


    class vector<class T> {
        // vector of elements of type T
        T* v;
        int sz;
    public:
        vector(int s)
        {
            if (s <= 0) error("bad vector size");
            v = new T[sz=s];   // allocate an array
        }
        T& operator[](int i);
        int size() { return sz; }
        // ...
    };

Vectors of specific types can now be defined and used:

    vector<int> v1(100);        // v1 is a vector
                                // of 100 integers
    vector<complex> v2(200);    // v2 is a vector
                                // of 200 complex
                                // numbers

    v2[i] = complex(v1[x],v1[y]);

Ada and Clu support parameterized types. Unfortunately, C++ does not; the notation used here is still experimental. When they are needed, parameterized classes are faked with macros. There need not be any runtime overheads compared with a class where all types involved are specified directly.
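C++ later gained templates, which express the same idea with a slightly different notation from the experimental one used above. The sketch below uses that later template syntax; the assert-based error handling and the suppressed copy operations are simplifications made here.

```cpp
#include <cassert>

template<class T>
class vector {
    T* v;
    int sz;
    vector(const vector&);             // copying omitted from this sketch
    void operator=(const vector&);
public:
    explicit vector(int s) : v(0), sz(s)
    {
        assert(s > 0);                 // assumed policy: a bad size aborts
        v = new T[s]();                // value-initialized elements
    }
    ~vector() { delete[] v; }
    T& operator[](int i) { assert(0 <= i && i < sz); return v[i]; }
    int size() const { return sz; }
};
```

Each use such as vector<int> or vector<double> is compiled for its specific element type, so no runtime overhead is needed relative to a hand-written class.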

Typically, a parameterized type will have to depend on at least some aspect of a type parameter. For example, some of the vector operations must assume that assignment is defined for objects of the parameter type. How can you ensure that? One way is to require that the designer of the parameterized class state the dependency. For example, "T must be a type for which = is defined." A better way is not to require this - or to take a specification of an argument type as a partial specification. A compiler can detect if a missing operation has been applied and give an error message such as

    cannot define vector<non_copy>::operator[](non_copy&):
        type non_copy does not have operator=

This technique lets you define types where the dependency on attributes of a parameter type is handled at the level of the individual operation of the type. For example, you might define a vector with a sort operation. The sort operation might use <, ==, and = on objects of the parameter type. It would still be possible to define vectors of a type for which < was not defined, as long as the vector-sorting operation was not actually invoked.
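Templates in later C++ behave in the way described: a member function of a class template is only checked against a particular element type if that member is actually used. A minimal sketch, in which no_less is a hypothetical element type that supports assignment but not <, and insertion sort is chosen only for brevity:

```cpp
#include <cassert>

struct no_less { int id; };            // hypothetical type: has = but no <

template<class T>
class vector {
    T* v;
    int sz;
    vector(const vector&);             // copying omitted from this sketch
    void operator=(const vector&);
public:
    explicit vector(int s) : v(new T[s]()), sz(s) {}
    ~vector() { delete[] v; }
    T& operator[](int i) { return v[i]; }
    int size() const { return sz; }

    // Uses < and = on T. For a given T this body is only compiled
    // if sort() is called on a vector<T>.
    void sort()
    {
        for (int i = 1; i < sz; i++)
            for (int j = i; j > 0 && v[j] < v[j-1]; j--) {
                T tmp = v[j];
                v[j] = v[j-1];
                v[j-1] = tmp;
            }
    }
};
```

Declaring a vector<no_less> and calling size() compiles; calling sort() on it would be rejected at compile time because no_less has no operator<.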

A problem with parameterized types is that each instantiation creates an independent type. For example, the type vector<char> is unrelated to the type vector<complex>. Ideally you would like to be able to express and use the commonality of types generated from the same parameterized type. For example, both vector<char> and vector<complex> have a size() function that is independent of the parameter type. It is possible, but not easy, to deduce this from the definition of class vector and then let size() be applied to any vector. An interpreted language or a language that supports both parameterized types and inheritance has an advantage here.
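One way a language with both facilities can express this commonality is to move the parts that do not depend on the element type into an ordinary base class. The sketch below assumes later C++ template syntax; vector_base and total_size() are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Hypothetical non-template base: everything independent of T.
class vector_base {
protected:
    int sz;
    explicit vector_base(int s) : sz(s) {}
public:
    int size() const { return sz; }
};

template<class T>
class vector : public vector_base {
    T* v;
    vector(const vector&);             // copying omitted from this sketch
    void operator=(const vector&);
public:
    explicit vector(int s) : vector_base(s), v(new T[s]()) {}
    ~vector() { delete[] v; }
    T& operator[](int i) { return v[i]; }
};

// An ordinary function that accepts any vector<T> through the base.
int total_size(const vector_base& a, const vector_base& b)
{
    return a.size() + b.size();
}
```

Here vector<char> and vector<double> remain distinct types, but both can be passed wherever a vector_base is expected.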

Exception handling. As programs grow, and especially when libraries are used extensively, standards for handling errors (or "exceptional circumstances") become important.

Ada, Algol-68, and Clu each support a standard way to handle exceptions. Unfortunately, C++ does not. When necessary, exceptions are faked using pointers to functions, exception objects, error states, and the C library's signal and longjmp facilities. This is not satisfactory because it fails to provide even a standard framework for error handling.

Consider the vector example again. What should be done when an out-of-range index value is passed to the subscript operator? The designer of the vector class should be able to provide a default behavior for this:

    class vector {
        // ...
        except vector_range {
            // define an exception called
            // vector_range and specify default
            // code for handling it
            error("global: vector range error");
            exit(99);
        }
    }

Instead of calling an error function, vector::operator[]() can invoke the exception-handling code:

    int& vector::operator[](int i)
    {
        if (i<0 || sz<=i) raise vector_range;
        return v[i];
    }

This will cause the call stack to be unraveled until an exception handler for vector_range is found and executed.

An exception handler may be defined for a specific block:

    void f() {
        vector v(10);
        try {
            // errors here are handled by the local
            // exception handler defined below
            // ...
            v[i] = 7;        // potential range error
        }
        except {
            vector::vector_range:
                error("f(): vector range error");
                return;
        }
        // errors here are handled by the
        // global exception handler
        // defined in vector
        int i = g();         // g might cause a range
                             // error using some vector
        v[i] = 7;            // potential range error
    }

There are many ways to define exceptions and the behavior of exception handlers. The facility sketched here resembles the ones found in Modula-2+. This style of exception handling can be implemented so that code is not executed unless an exception is raised (except possibly for some initialization code). It can also be ported across most C implementations by using setjmp() and longjmp() (see the C library manual for your system).

Could exceptions, as defined above, be completely faked in a language such as C++? Unfortunately, no. The snag is that when an exception occurs, the runtime stack must be unraveled up to a point where an exception handler is defined. To do this properly in C++ involves invoking destructors defined in the scopes involved. This is not done by a C longjmp() and cannot in general be done by the user.
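C++ was later extended with a standard exception mechanism using try, throw, and catch, and it does invoke destructors while unraveling the stack. The sketch below uses that later mechanism rather than the except/raise notation of this article. The cleanups counter is a hypothetical addition to make the destructor calls observable, and std::out_of_range is used in place of a vector_range exception.

```cpp
#include <cassert>
#include <stdexcept>

static int cleanups = 0;           // hypothetical counter to observe destructors

class vector {
    int sz;
    int* v;
    vector(const vector&);         // copying omitted from this sketch
    void operator=(const vector&);
public:
    explicit vector(int s) : sz(s), v(new int[s]()) {}
    ~vector() { delete[] v; ++cleanups; }
    int& operator[](int i)
    {
        if (i < 0 || sz <= i) throw std::out_of_range("vector range error");
        return v[i];
    }
};

// Returns true if a range error was raised and handled locally.
bool f(int i)
{
    try {
        vector v(10);              // destroyed during unwinding if v[i] throws
        v[i] = 7;
        return false;
    }
    catch (const std::out_of_range&) {
        return true;
    }
}
```

Whether f() leaves the block normally or through the exception, the destructor of v runs exactly once, which is what longjmp() alone cannot guarantee.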

Coercions. User-defined coercions, such as the one from floating-point numbers to complex numbers implied by the constructor complex(double), have proven unexpectedly useful in C++. Such coercions can be applied explicitly or the programmer can rely on the compiler to add them implicitly where necessary and unambiguous:

    complex a = complex(1);
    complex b = 1;         // implicit:
                           // 1 -> complex(1)
    a = b+complex(2);
    a = b+2;               // implicit:
                           // 2 -> complex(2)

Coercions were introduced into C++ because mixed-mode arithmetic is the norm in languages for numerical work and because most user-defined types used for calculation (of matrices, character strings, and machine addresses) have natural mappings to and from other types.

One use of coercions has proven especially useful in organizing programs:

    complex a = 2;
    complex b = a+2;   // interpreted as
                       // operator+(a,complex(2))
    b = 2+a;           // interpreted as
                       // operator+(complex(2),a)

Only one function is needed to interpret + operations, and the two operands are handled identically by the type system. Furthermore, class complex is written without any need to modify the concept of integers to enable the smooth and natural integration of the two concepts.
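A compact sketch of this arrangement: one converting constructor plus one non-member operator+ is enough for both operand orders. The default argument on the constructor and the real() and imag() accessors are conveniences added here, not part of the article's class.

```cpp
#include <cassert>

class complex {
    double re, im;
public:
    // Also serves as the double -> complex coercion.
    complex(double r, double i = 0) : re(r), im(i) {}
    double real() const { return re; }   // accessors added for inspection
    double imag() const { return im; }

    // A single non-member function handles complex+complex,
    // complex+number, and number+complex.
    friend complex operator+(complex a, complex b)
    {
        return complex(a.re + b.re, a.im + b.im);
    }
};
```

Because operator+ is not a member, neither operand is privileged, so the compiler may convert whichever one is not already a complex.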

This is in contrast to a “pure” object-oriented system, where the operations would be interpreted like this:

    a+2;   // a.operator+(2)
    2+a;   // 2.operator+(a)

making it necessary to modify the integer class to make 2+a legal.

You should avoid modifying existing code as much as possible when adding new facilities to a system. Typically, object-oriented programming offers superior facilities for adding to a system without modifying existing code. In this case, however, data-abstraction facilities provide a better solution.

Iterators. A language that supports data abstraction must provide a way to define control structures. In particular, users need a mechanism to define loops over the elements contained in an object of some user-defined type, without forcing them to depend on implementation details of the user-defined type. Given a sufficiently powerful mechanism for defining new types and the ability to overload operators, this can be handled without a separate mechanism for defining control structures.

For a vector, defining an iterator is not necessary because an ordering is available to a user through the indices. I’ll define one anyway, to demonstrate the technique.

There are several iterator styles. My favorite relies on overloading the function application operator ():*

    class vector_iterator {
        vector& v;
        int i;
    public:
        vector_iterator(vector& r) : v(r) { i = 0; }
        int operator()()
            { return i<v.size() ? v.elem(i++) : 0; }
    };

A vector_iterator can now be declared and used for a vector:

    vector v(sz);
    vector_iterator next(v);
    int i;
    while (i=next()) print(i);

More than one iterator can be active for a single object at one time, and a type may have several different iterator types defined for it so different kinds of iteration can be performed. An iterator is a rather simple control structure. More general mechanisms can also be defined. For example, the C++ standard library provides a coroutine class.
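The claim that several iterators can be active on one object at once can be checked with a self-contained sketch. The minimal vector class, its elem() member, and the sum_twice() function are assumptions made for illustration; as in the text, 0 is reserved to mean "end of iteration", so the elements themselves must be nonzero.

```cpp
#include <cassert>

class vector {
    int sz;
    int* v;
    vector(const vector&);          // copying omitted from this sketch
    void operator=(const vector&);
public:
    explicit vector(int s) : sz(s), v(new int[s]()) {}
    ~vector() { delete[] v; }
    int size() const { return sz; }
    int& elem(int i) { assert(0 <= i && i < sz); return v[i]; }
};

class vector_iterator {
    vector& v;
    int i;
public:
    explicit vector_iterator(vector& r) : v(r), i(0) {}
    // Returns the next element, or 0 once the elements are exhausted.
    int operator()() { return i < v.size() ? v.elem(i++) : 0; }
};

// Walks the same vector with two independent iterators.
int sum_twice(vector& a)
{
    vector_iterator next1(a), next2(a);
    int total = 0, x;
    while ((x = next1()) != 0) total += x;
    while ((x = next2()) != 0) total += x;
    return total;
}
```

Each iterator keeps its own position, so neither pass disturbs the other or the vector itself.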

For many container types, such as vector, you can avoid introducing a separate iterator type by defining an iteration mechanism as part of the type itself. A vector type might be defined to have a “current element”:

    class vector {
        int* v;
        int sz;
        int current;
    public:
        // ...
        int next() { return (current++<sz) ? v[current] : 0; }
        int prev() { return (0<--current) ? v[current] : 0; }
    };

*This style also relies on the existence of a distinct value to represent end of iteration. Often, in particular for C++ pointer types, 0 can be used.

Then the iteration can be performed like this:

vector v( sz); inti; while (i=v.next() ) print ( i ) ;

This solution is not as general as the iterator solution, but it avoids overhead in the important special case where only one kind of iteration is needed and where only one iteration at a time is needed for avector.

If necessary, you can apply a more general solution in addition to this simple one. The simple solution requires more foresight from the designer of the container class than does the iterator solution. The iterator-type technique can also be used to define iterators that can be bound to several different container types, thus providing a mechanism for iterating over different container typeswith asingle iterator type.

Implementation issues. Suppor t for data abstraction is primarily provided in the form of language features implemented by a compiler. However, parameterized types are best implemented with support from a linker with some knowl- edge of the language semantics, and exception handling requires support from the runtime environment. Both can be implemented to meet the strictest criteria for both compile-time speed and efficiency without compromising generality or prcl grammer convenience.

As the power to define types increases, programs will depend increasingly on types from libraries (and not just those described in the language manual). This naturally puts a greater demand on facilities to express what is inserted into or retrieved from a library, for finding out what a library contains, for determining what parts of a library are actually used by a program, and so on.

For a compiled language, facilities to calculate the minimal compilation necessary after a change are important. It is essential that the linker/loader can bring a program into memory for execution without also bringing in a lot of related but unused code. In particular, a library/linker/loader system that includes the code for every operation on a type in the executable program ...


Supporting object-oriented programming

The basic support functions a programmer needs to write object-oriented programs are a class mechanism with inheritance and a mechanism that lets calls of member functions depend on the actual object type (when the actual type is unknown at compile time).

The design of the member-function calling mechanism is critical. In addition, facilities that support data-abstraction techniques are important because the arguments for data abstraction and for its refinements to use types elegantly are equally valid where support for object-oriented programming is available.

The success of both techniques hinges on the design of types and on the ease, flexibility, and efficiency of such types. Object-oriented programming simply lets user-defined types be far more flexible and general than the ones designed using only data-abstraction techniques.

Calling mechanisms. The key language facility to support object-oriented programming is the mechanism by which a member function is invoked for an object. For example, given pointer p, how is a call p->f(arg) handled? There is a range of choices.

In languages such as C++ and Simula, where static type checking is used extensively, the type system can select between different calling mechanisms. In C++, there are two alternatives:

1. Normal function call: The member function to call is determined at compile time (through a lookup in the compiler's symbol tables) and called with the standard function-call mechanism, with an argument added to identify the object for which the function is called. When the standard function call is not efficient enough, the programmer can declare a function to be in-line and the compiler will try to expand its body in-line. This lets you achieve the efficiency of a macro expansion without compromising the standard function semantics. This optimization is equally valuable as a support for data abstraction.

2. Virtual function call: The function called depends on the object type, which usually cannot be determined until runtime. Typically, the pointer p will be of some base class B and the object will be an object of some derived class D. The call mechanism must look into the object and find some information placed there by the compiler to determine which function f is to be called. Once that function, say D::f, is found, it is called with the mechanism described above. At compile time, the name f is converted into an index to a table containing pointers to functions. This virtual-call mechanism can essentially be made as efficient as the normal function-call mechanism. In the standard C++ implementation, only five additional memory references are used. In cases where the actual type can be deduced at compile time, even this overhead is eliminated and in-lining can be used. Such cases are quite common and important.

In languages with weak static type checking, a third, more elaborate alternative must be used. In a language like Smalltalk, a list of the names of all member functions (called methods) of a class is stored so they can be found at runtime:

3. Method invocation: The appropriate table of method names is first found by examining the object that p points to. In this table (or set of tables), the string “f” is looked up to see if the object has an f(). If an f() is found, it is called; otherwise, some error handling takes place. This lookup differs from the lookup done at compile time in a statically checked language because the method invocation uses a method table for the actual object.

A method invocation is inefficient compared with a virtual function call, but it is more flexible. Since static type checking of arguments typically cannot be done for a method invocation, the use of methods must be supported by dynamic type checking.

Type checking. The shape example earlier showed the power of virtual functions. What else does a method-invocation mechanism do for you? It lets you invoke any method for any object.

    p = cs.pop();
    p->takeoff();


The use of static type checking and virtual function calls leads to a somewhat different style of programming than does dynamic type checking and method invocation. For example, a Simula or C++ class specifies a fixed interface to a set of objects (of any derived class), while a Smalltalk class specifies an initial set of operations for objects (of any subclass). In other words, a Smalltalk class is a minimal specification and the user is free to try operations not specified, while a C++ class is an exact specification and only operations specified in the class declaration are guaranteed to be accepted by the compiler.
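The three calling mechanisms can be contrasted in a small sketch in present-day C++. The classes shape and triangle are illustrative only, and dynamic_object is a hypothetical stand-in for Smalltalk-style method lookup by name; it is not how Smalltalk or any C++ implementation actually represents method tables.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

struct shape {
    int id() const { return 1; }              // normal call: bound at compile time
    virtual int sides() const { return 0; }   // virtual call: bound at runtime
    virtual ~shape() {}
};

struct triangle : shape {
    int id() const { return 2; }              // hides shape::id, does not override it
    int sides() const override { return 3; }  // overrides shape::sides
};

// The static type of p (shape) selects shape::id, whatever p points to.
int call_id(const shape* p) { return p->id(); }

// The actual object type selects the function, through a compiler-built table.
int call_sides(const shape* p) { return p->sides(); }

// Hypothetical method invocation: methods are looked up by name at runtime,
// and a failed lookup leads to error handling instead of a compile-time error.
struct dynamic_object {
    std::map<std::string, std::function<int()>> methods;

    int invoke(const std::string& name) const {
        auto it = methods.find(name);
        if (it == methods.end())
            throw std::runtime_error("message not understood: " + name);
        return it->second();
    }
};
```

With a triangle t, call_id(&t) yields the base-class result while call_sides(&t) yields the derived-class result. An attempt to invoke an unknown name on a dynamic_object is detected only when the program runs, which is the flexibility and the cost described above.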

Inheritance. Consider a language that has some form of method lookup without an inheritance mechanism. Does that language support object-oriented programming? I think not.

Clearly, you could do interesting things with the method table to adapt the objects’ behavior to suit conditions. However, to avoid chaos there must be some systematic way to associate methods and the data structures they assume for their object representation. To let an object’s user know what kind of behavior to expect, there would also have to be some standard way to express what is common to the different behaviors the object might adopt. This systematic, standard way is an inheritance mechanism.

Consider a language that has an inheritance mechanism without virtual functions or methods. Does that language support object-oriented programming? I think not: The shape example does not have a good solution in such a language.

However, such a language would be more powerful than a plain data-abstraction language. This contention is supported by the observation that many Simula and C++ programs are structured using class hierarchies without virtual functions. The ability to express commonality (factoring) is an extremely powerful tool. For example, the problems associated with the need to have a common representation of all shapes could be solved; no union would be needed.

However, in the absence of virtual functions, the programmer would have to resort to using type fields to determine actual types of objects, so the problems with the code’s lack of modularity would remain.*
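A minimal sketch of the difference, in present-day C++ with hypothetical names (kind, square, rectangle, vshape, and so on are not from the article): the first half uses inheritance with a type field, the second half uses virtual functions.

```cpp
// Inheritance without virtual functions: each object carries a type field.
enum kind { square_kind, rectangle_kind };

struct shape {
    kind k;                                  // set by each derived class
    explicit shape(kind kk) : k(kk) {}
};

struct square : shape {
    int side;
    explicit square(int s) : shape(square_kind), side(s) {}
};

struct rectangle : shape {
    int w, h;
    rectangle(int ww, int hh) : shape(rectangle_kind), w(ww), h(hh) {}
};

// Every function like this must know about every kind of shape,
// so adding a new shape means editing existing code.
int area(const shape& s) {
    switch (s.k) {
    case square_kind: {
        const square& q = static_cast<const square&>(s);
        return q.side * q.side;
    }
    case rectangle_kind: {
        const rectangle& r = static_cast<const rectangle&>(s);
        return r.w * r.h;
    }
    }
    return 0;
}

// With virtual functions, each class supplies its own operation.
struct vshape {
    virtual int area() const = 0;
    virtual ~vshape() {}
};

struct vsquare : vshape {
    int side;
    explicit vsquare(int s) : side(s) {}
    int area() const override { return side * side; }
};

struct vrectangle : vshape {
    int w, h;
    vrectangle(int ww, int hh) : w(ww), h(hh) {}
    int area() const override { return w * h; }
};

// This function never changes when new shapes are added.
int area_of(const vshape& s) { return s.area(); }
```

Note that the common representation (the base class) is shared in both halves; only the second half removes the central switch.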

This implies that class derivation (subclassing) is an important programming tool in its own right. While it can be used to support object-oriented programming, it has wider uses. This is particularly true if you associate inheritance in object-oriented programming with the idea that a base class expresses a general concept of which all derived classes are specializations. This idea captures only part of the expressive power of inheritance, but it is strongly encouraged by languages where every member function is virtual (or a method).

Given suitable controls over what is inherited, class derivation can be a powerful tool for creating new types. Given a class, derivation can be used to add and subtract features. The relation of the resulting class to its base cannot always be completely described in terms of specialization; factoring is a better term.
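One way to picture derivation that both adds and subtracts features is private derivation in present-day C++. The sketch below is an assumption-laden illustration: int_list and int_stack are hypothetical classes, and private derivation is only one possible form of control over what is inherited.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical base: a list of ints with operations at both ends.
class int_list {
    std::vector<int> elems;
public:
    void push_front(int x) { elems.insert(elems.begin(), x); }
    void push_back(int x)  { elems.push_back(x); }
    int  pop_front()       { int x = elems.front(); elems.erase(elems.begin()); return x; }
    int  pop_back()        { int x = elems.back(); elems.pop_back(); return x; }
    std::size_t size() const { return elems.size(); }
};

// Private derivation hides the whole base interface ("subtracts" it).
// The using-declaration re-exports size(); peek() is an added feature.
class int_stack : private int_list {
public:
    using int_list::size;
    void push(int x) { push_back(x); }
    int  pop()       { return pop_back(); }
    int  peek()      { int x = pop_back(); push_back(x); return x; }
};
```

An int_stack is not a specialization of int_list, since push_front and pop_front are no longer available to its users; it reuses the factored-out representation and code.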

Derivation is another programmer’s tool and there is no foolproof way to predict how it is going to be used, and it is too early (even after 20 years of Simula) to tell which uses are simply misuses.

Multiple inheritance. When class A is a base of class B, B inherits the attributes of A; that is, B is an A in addition to whatever else it might be. Given this explanation, it seems obvious that it might be useful to have class B inherit from two base classes, A1 and A2. This is called multiple inheritance.*

As an example of multiple inheritance, consider two library classes, displayed and task, that respectively represent objects under the control of a display manager and coroutines under the control of a scheduler. A programmer could then create classes such as

    class my_displayed_task : public displayed, public task {
        // my stuff
    };

    class my_task : public task {   // not displayed
        // my stuff
    };

    class my_displayed : public displayed {   // not a task
        // my stuff
    };

With single inheritance, only two of these three choices are open to the programmer. This leads to code replication or loss of flexibility, and typically both. In C++, this example can be handled with no significant overhead (in time or space), compared to single inheritance, and without sacrificing static type checking.
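A compilable sketch of the first choice, in present-day C++. The bodies of displayed and task below are hypothetical stand-ins (the article does not show the library classes), and refresh and schedule are invented helper names.

```cpp
#include <string>

// Hypothetical stand-ins for the two library classes named in the text.
class displayed {
public:
    virtual std::string draw() const { return "drawn"; }
    virtual ~displayed() {}
};

class task {
public:
    virtual std::string run() { return "ran"; }
    virtual ~task() {}
};

// One class that is both a displayed object and a task.
class my_displayed_task : public displayed, public task {
public:
    std::string draw() const override { return "my task drawn"; }
    std::string run() override { return "my task ran"; }
};

// Code written against one base needs no knowledge of the other,
// and every call is still checked statically.
std::string refresh(const displayed& d) { return d.draw(); }
std::string schedule(task& t) { return t.run(); }
```

A my_displayed_task can be handed to both refresh and schedule without casts, which is what single inheritance cannot offer without replicating one of the two bases.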

Ambiguities are detected at compile time:

    class A { public: f(); ... };
    class B { public: f(); ... };
    class C : public A, public B { ... };

    void g() {
        C* p;
        p->f();   // error: ambiguous
    }

In this capability, C++ differs from the object-oriented Lisp dialects that support multiple inheritance. In these Lisp dialects, ambiguities are resolved by considering the order of declarations significant, by considering objects of the same name in different base classes identical, or by combining methods of the same name in base classes into a more complex method of the highest class.

In C++, you would typically resolve the ambiguity by adding a function:

    class C : public A, public B {
    public:
        f() {
            // C's own stuff
            A::f();
            B::f();
        }
        ...
    };

In addition to this fairly straightforward concept of independent, multiple inheritance, there appears to be a need for a more general mechanism to express dependencies between classes in a multiple-inheritance lattice. In C++, the requirement that a subobject be shared in a class object is expressed through the mechanism of a virtual base class:

*This is the problem with Simula’s inspect statement and the reason it does not have a counterpart in C++.

    class W { ... };

    class Bwindow           // window with border
        : public virtual W
    { ... };

    class Mwindow           // window with menu
        : public virtual W
    { ... };

    class BMW               // window with border
                            // and menu
        : public Bwindow, public Mwindow
    { ... };

Here, the single window subobject is shared by the Bwindow and Mwindow subobjects of a BMW. The Lisp dialects use method combinations to ease programming using such complicated class hierarchies. C++ does not.
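The sharing can be checked with a small sketch in present-day C++. The class names follow the text, but the members (redraws, borders, menus, draw) and the helper shares_one_window are hypothetical. Because C++ has no method combination, BMW::draw spells out by hand which base-class functions to call.

```cpp
class W {
public:
    int redraws = 0;
    virtual void draw() { ++redraws; }
    virtual ~W() {}
};

class Bwindow : public virtual W {      // window with border
public:
    int borders = 0;
    void draw() override { ++borders; }
};

class Mwindow : public virtual W {      // window with menu
public:
    int menus = 0;
    void draw() override { ++menus; }
};

class BMW : public Bwindow, public Mwindow {   // window with border and menu
public:
    // Resolves the ambiguity between Bwindow::draw and Mwindow::draw
    // by combining the base-class functions explicitly.
    void draw() override {
        W::draw();
        Bwindow::draw();
        Mwindow::draw();
    }
};

// True if both paths through the lattice reach the same W subobject.
bool shares_one_window(BMW& x) {
    W* via_border = static_cast<Bwindow*>(&x);
    W* via_menu = static_cast<Mwindow*>(&x);
    return via_border == via_menu;
}
```

If the two derivations from W were not virtual, a BMW would contain two W subobjects, and the unqualified use of redraws on a BMW would be ambiguous.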

Encapsulation. Consider a class member (data or function) that must be protected from unauthorized access. What choices are reasonable to delimit the set of functions that may access that member?

The obvious answer for a language supporting object-oriented programming is “all operations defined for this object,” or all member functions. A hidden implication of this answer is that there cannot be a complete and final list of all functions that may access the protected member since you can always add another by deriving a new class from the protected member’s class and then defining a member function of that derived class. This approach combines a large degree of protection from accidents (it’s not easy to define a new derived class by accident) with the flexibility needed for tool building using class hierarchies (you can grant yourself access to protected members by deriving a class).

Unfortunately, the obvious answer for a language supporting data abstraction is different: “List the functions that need access in the class declaration.” There is nothing special about these functions; they need not be member functions.

A nonmember function with access to private class members is called a friend in C++. Class Complex, above, was defined using friend functions. It is sometimes important that a function may be specified as a friend in more than one class. Having the full list of members and friends available is a great advantage when you are trying to understand the behavior of a type and especially when you want to modify it.
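A function that is a friend of more than one class might look like the following sketch in present-day C++. The classes vec and matrix, the dimension N, and the function multiply are hypothetical examples chosen for illustration.

```cpp
#include <cstddef>

const std::size_t N = 3;   // hypothetical fixed dimension for the sketch

class matrix;   // forward declaration so vec can name the friend's parameter

class vec {
    double v[N];
    friend vec multiply(const matrix&, const vec&);
public:
    vec() : v() {}                                   // all elements zero
    double get(std::size_t i) const { return v[i]; }
    void set(std::size_t i, double x) { v[i] = x; }
};

class matrix {
    double m[N][N];
    friend vec multiply(const matrix&, const vec&);
public:
    matrix() : m() {}                                // all elements zero
    void set(std::size_t i, std::size_t j, double x) { m[i][j] = x; }
};

// A nonmember function that reads the private representation of both classes.
vec multiply(const matrix& a, const vec& x) {
    vec r;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            r.v[i] += a.m[i][j] * x.v[j];
    return r;
}
```

Making multiply a member of either class would still leave it without access to the other class's representation; listing it as a friend in both declarations keeps the full set of functions with access visible in the class declarations.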

Here is an example that demonstrates part of the range of choices for encapsulation in C++:

    class B {
        // class members are
        // default private
        int i1;
        void f1();
    protected:
        int i2;
        void f2();
    public:
        int i3;
        void f3();
        friend void g(B*);   // any function can be
                             // designated as a friend
    };

Private and protected members are not generally accessible:

    void h(B* p) {
        p->f1();   // error: B::f1 is private
        p->f2();   // error: B::f2 is protected
        p->f3();   // fine: B::f3 is public
    }

Protected, but not private, members are accessible to members of a derived class:

    class D : public B {
    public:
        void g() {
            f1();   // error: B::f1 is private
            f2();   // fine: B::f2 is protected,
                    // but D is derived from B
            f3();   // fine: B::f3 is public
        }
    };

Friend functions have access to private and protected members just like member functions:

    void g(B* p) {
        p->f1();   // fine: B::f1 is private,
                   // but g() is a friend of B
        p->f2();   // fine: B::f2 is protected,
                   // but g() is a friend of B
        p->f3();   // fine: B::f3 is public
    }

The importance of encapsulation issues increases dramatically as program size increases, and as the number and geographical dispersion of its users expands. For a detailed discussion of encapsulation issues, see Snyder and Stroustrup.

Implementation issues. Support for object-oriented programming is provided primarily by the runtime system and the programming environment. Part of the reason is that object-oriented programming builds on the language improvements already provided to support data abstraction, so relatively few additions are needed.

This assumes that an object-oriented language does indeed support data abstraction. However, the support for data abstraction is often deficient in such languages. Conversely, languages that support data abstraction are typically deficient in their support of object-oriented programming.

Object-oriented programming further blurs the distinction between a programming language and its environment. Because more powerful special- and general-purpose user-defined types can be defined, they pervade user programs. This requires further development of the runtime system, library facilities, debuggers, performance measuring, monitoring tools, and so on. Ideally, these are integrated into a unified programming environment, of which Smalltalk is the best example.

Limits to perfection

To claim to be general-purpose, a language that is designed to exploit the techniques of data hiding, data abstraction, and object-oriented programming must also

run on traditional machines,

coexist with traditional operating systems,

compete with traditional programming languages in runtime efficiency, and

cope with every major application area.

This means that facilities must be available for effective numerical work (floating-point arithmetic without overhead that would make Fortran attractive), and memory must be accessible so that device drivers can be written. It must also be possible to write calls that conform to the (often rather strange) standards required for operating-system interfaces. In addition, it should be possible to call functions written in other languages from an object-oriented language and for functions written in the object-oriented language to be called from a program written in another language.
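In present-day C++, calls in both directions across a language boundary are commonly expressed with C linkage. The sketch below assumes a hosted implementation with the C library available; the names checked_length, compare_ints, and sort_ints are hypothetical.

```cpp
#include <cstdlib>   // std::qsort, a function from the C library
#include <cstring>   // std::strlen, another C library function

// A function written in C++ but given C linkage, so that a program written
// in C (or in another language that can call C functions) can call it.
extern "C" int checked_length(const char* s) {
    return s ? static_cast<int>(std::strlen(s)) : -1;
}

// A comparison callback with C linkage, suitable for passing to C code.
extern "C" int compare_ints(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// C++ code calling a function written in C: qsort sorts the array in place
// and calls back into compare_ints.
void sort_ints(int* data, std::size_t n) {
    std::qsort(data, n, sizeof(int), compare_ints);
}
```

The design point is that neither side needs to know how the other language represents classes; only plain functions and built-in types cross the boundary.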


It also means that an object-oriented language cannot rely completely on mechanisms that cannot be efficiently implemented on a traditional architecture and still expect to be used as a general-purpose language. A very general implementation of method invocation can be a liability unless there are alternative ways of requesting a service.

Similarly, garbage collection can become a performance and portability bottleneck. Most object-oriented programming languages use garbage collection to simplify the programmer’s task and to reduce the complexity of the language and its compiler. However, it ought to be possible to use garbage collection in noncritical areas while retaining control of storage use in areas where it matters. As an alternative, it is feasible to have a language without garbage collection and then provide sufficient expressive power to enable the design of types that maintain their own storage. C++ is an example of this.
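A type that maintains its own storage can be sketched as follows in present-day C++. The class buffer and the counter live_buffers are hypothetical; the counter exists only so the example can show that storage is released at a predictable point without a garbage collector.

```cpp
#include <cstddef>

// Purely illustrative: counts buffer objects that currently own storage.
static int live_buffers = 0;

class buffer {
    int* data;
    std::size_t n;
public:
    // The constructor acquires storage.
    explicit buffer(std::size_t size) : data(new int[size]()), n(size) { ++live_buffers; }

    // Copying gives the new object its own storage.
    buffer(const buffer& other) : data(new int[other.n]), n(other.n) {
        for (std::size_t i = 0; i < n; ++i) data[i] = other.data[i];
        ++live_buffers;
    }

    // Assignment replaces the old storage with a fresh copy.
    buffer& operator=(const buffer& other) {
        if (this != &other) {
            int* fresh = new int[other.n];
            for (std::size_t i = 0; i < other.n; ++i) fresh[i] = other.data[i];
            delete[] data;
            data = fresh;
            n = other.n;
        }
        return *this;
    }

    // The destructor releases storage as soon as the object goes away.
    ~buffer() { delete[] data; --live_buffers; }

    int& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return n; }
};

int live_buffer_count() { return live_buffers; }
```

Users of buffer never call new or delete themselves; the type's constructors, assignment, and destructor decide when storage is acquired and released.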

Exception handling and concurrency are other potential problems. Any feature that is best implemented with help from a linker is likely to become a portability problem.

The alternative to having low-level features in a language is to handle major application areas using separate low-level languages.

Object-oriented programming is programming using inheritance. Data abstraction is programming using user-defined types. With few exceptions, object-oriented programming can and ought to be a superset of data abstraction.

These techniques need proper language support to be effective. Data abstraction needs support primarily in language features; object-oriented programming needs more support in the programming environment. To be considered general-purpose, a language must let you use traditional hardware effectively.

Acknowledgments

An earlier version of this article was presented to the Association of Simula Users meeting in Stockholm in August 1986. The discussions there caused many improvements in both style and content. Brian Kernighan and Ravi Sethi made many constructive comments. Also, thanks to all who helped shape C++.


References

1. K. Nygaard, “Basic Concepts in Object-Oriented Programming,” SIGPlan Notices, Oct. 1986, pp. 128-132.

2. R. Kerr, “Object-Based Programming: A Foundation for Reliable Software,” Proc. 14th Simula Users’ Conf., Simula Information, Oslo, Norway, 1986, pp. 159-165; a short version is “A Materialistic View of the Software ‘Engineering’ Analogy,” SIGPlan Notices, March 1987, pp. 123-125.

3. T. Cargill, “PI: A Case Study in Object-Oriented Programming,” SIGPlan Notices, Nov. 1986, pp. 350-360.

4. B. Liskov et al., “Abstraction Mechanisms in Clu,” Comm. ACM, Aug. 1977, pp. 564-576.

5. J. Shopiro, “Extending the C++ Task System for Real-Time Applications,” Proc. Usenix C++ Workshop, Usenix, Santa Monica, Calif., 1987, pp. 77-94.

6. A. Snyder, “Encapsulation and Inheritance in Object-Oriented Programming Languages,” SIGPlan Notices, Nov. 1986, pp. 38-45.

7. B. Stroustrup, The C++ Programming Language, Addison-Wesley, Reading, Mass., 1986.

8. D. Weinreb and D. Moon, Lisp Machine Manual, Symbolics, Cambridge, Mass., 1981.

9. B. Stroustrup, “Multiple Inheritance for C++,” Proc. Spring European Unix Users Group Conf., EEUG, London, 1987.

Bjarne Stroustrup is the designer and original implementer of C++. His research interests include distributed systems, operating systems, simulation, programming methodology, and programming languages.

Stroustrup received an MS in mathematics and computer science from the University of Aarhus and a PhD in computer science from Cambridge University. He is a distinguished member of the Computer Science Research Center and is a member of IEEE and ACM.

Address questions about this article to the author at AT&T Bell Laboratories, Rm. 2C-324, 600 Mountain Ave., Murray Hill, NJ 07974.
