
SCS 421 SOFTWARE DEVELOPMENT II (NEW) COURSE DESCRIPTION

Using application programmer interfaces (APIs): API programming; class browsers and related tools; programming by example; debugging in the API environment; component-based computing. Human-centered software evaluation: Setting goals for evaluation; evaluation strategies, software development: Approaches, characteristics, and overview of process; prototyping techniques and tools. Software development techniques: Object-oriented analysis and design; component-level design; software requirements and specifications; prototyping; characteristics of maintainable software; software reuse.

Application programming interface

An application programming interface (API) is a particular set of rules ('code') and specifications that software programs can follow to communicate with each other. It serves as an interface between different software programs and facilitates their interaction, similar to the way the user interface facilitates interaction between humans and computers.

An API can be created for applications, libraries, operating systems, etc., as a way of defining their "vocabularies" and resources request conventions (e.g. function-calling conventions). It may include specifications for routines, data structures, object classes, and protocols used to communicate between the consumer program and the implementer program of the API.

Concept

An API can be:

• General, the full set of an API that is bundled in the libraries of a programming language, e.g. Standard Template Library in C++ or the Java API.

• Specific, meant to address a specific problem, e.g. Google Maps API or Java API for XML Web Services.

• Language-dependent, meaning it is only available by using the syntax and elements of a particular language, which makes the API more convenient to use.

• Language-independent, written so that it can be called from several programming languages. This is a desirable feature for a service-oriented API that is not bound to a specific process or system and may be provided as remote procedure calls or web services. For example, a website that allows users to review local restaurants is able to layer their reviews over maps taken from Google Maps, because Google Maps has an API that facilitates this functionality. Google Maps' API controls what information a third-party site can use and how they can use it.

The term API may be used to refer to a complete interface, a single function, or even a set of APIs provided by an organization. Thus, the scope of meaning is usually determined by the context of usage.

Advanced explanation

An API may describe the ways in which a particular task is performed.

In procedural languages like C, the action is usually mediated by a function call; hence the API usually includes a description of all the functions/routines it provides.

For instance, the math.h include file for the C language contains the definitions of the function prototypes of the mathematical functions available in the C language library for mathematical processing (usually called libm). This file describes how to use the functions included in the given library: a function prototype is a signature that describes the number and types of the parameters to be passed to a function and the type of its return value.

The behavior of the functions is usually described in more detail in a human-readable format in printed books or in electronic formats like the man pages; e.g. on UNIX systems the command

man 3 sqrt

will present the signature of the function sqrt in the form:

SYNOPSIS

#include <math.h>

double sqrt(double X);

float sqrtf(float X);

DESCRIPTION

sqrt computes the positive square root of the argument. ...

RETURNS

On success, the square root is returned. If X is real and positive...

That means that the function returns the square root of a positive floating point number (single or double precision) as another floating point number. Hence the API in this case can be interpreted as the collection of the include files used by the C language together with their human-readable description provided by the man pages.

API in modern languages

Most modern programming languages provide the documentation associated with an API in some digital format that makes it easy to consult on a computer. E.g. Perl comes with the tool perldoc:

$ perldoc -f sqrt

sqrt EXPR
sqrt    Return the square root of EXPR. If EXPR is omitted, returns
        square root of $_. Only works on non-negative operands, unless
        you've loaded the standard Math::Complex module.

python comes with the tool pydoc:

$ pydoc math.sqrt

Help on built-in function sqrt in math:

math.sqrt = sqrt(...)

sqrt(x)

Return the square root of x.

ruby comes with the tool ri:

$ ri Math::sqrt

------------------------------------------------------------- Math::sqrt

Math.sqrt(numeric) => float

------------------------------------------------------------------------

Returns the non-negative square root of _numeric_.

Java comes with the documentation organized in html pages (JavaDoc format), while Microsoft distributes the API documentation for its languages (Visual C++, C#, Visual Basic, F#, etc...) embedded in Visual Studio's help system.

API in object-oriented languages

In object-oriented languages, an API usually includes a description of a set of class definitions, with a set of behaviors associated with those classes. A behavior is the set of rules for how an object, derived from that class, will act in a given circumstance. This abstract concept is associated with the real functionality exposed, or made available, by the classes, which is implemented in terms of class methods (or more generally by all of a class's public components, hence all public methods, but also possibly including public fields, constants, and nested objects).

The API in this case can be conceived as the totality of all the methods publicly exposed by the classes (usually called the class interface). This means that the API prescribes the methods by which one interacts with/handles the objects derived from the class definitions.

More generally, one can see the API as the collection of all the kinds of objects one can derive from the class definitions, and their associated possible behaviors. Again: the use is mediated by the public methods, but in this interpretation, the methods are seen as a technical detail of how the behavior is implemented.

For instance, a class representing a Stack can simply expose publicly two methods: push() (to add a new item to the stack) and pop() (to extract the last item, conceptually the one on top of the stack).

In this case the API can be interpreted as the two methods pop() and push(), or, more generally, as the idea that one can use an item of type Stack that implements the behavior of a stack: a pile exposing its top to add/remove elements.
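
To make this concrete, here is a minimal sketch in Java (the class body and its backing list are illustrative assumptions, not taken from the notes) of a Stack class whose public API is just push() and pop(), with the storage hidden as an implementation detail:

import java.util.ArrayList;
import java.util.List;

// Minimal illustration: the public API of this class is just push() and pop().
// The backing list is an implementation detail hidden from callers.
public class Stack<T> {
    private final List<T> items = new ArrayList<>();

    // Add a new item to the top of the stack.
    public void push(T item) {
        items.add(item);
    }

    // Remove and return the item currently on top of the stack.
    public T pop() {
        if (items.isEmpty()) {
            throw new IllegalStateException("stack is empty");
        }
        return items.remove(items.size() - 1);
    }
}

Callers depend only on the push/pop behavior; the stack could later be reimplemented over an array or a linked structure without changing its API.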

This concept can be carried to the point where a class interface in an API has no methods at all, but only behaviors associated with it. For instance, the Java API includes the interface Serializable, which requires that each class implementing it behave in a serializable fashion. This does not require the class to have any public methods; rather, it requires that any class implementing it have a representation that can be saved (serialized) at any time (this is typically true for any class containing simple data and no links to external resources, like an open connection to a file, a remote system, or an external device).
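
As a small illustration in Java (the Point class is a made-up example), a class opts into this behavior simply by declaring the marker interface; no methods are added:

import java.io.Serializable;

// Illustrative sketch: Serializable declares no methods; implementing it only
// promises that instances can be written out and read back by the runtime.
public class Point implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}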

Similarly, the behavior of an object in a concurrent (multi-threaded) environment is not necessarily determined by specific methods belonging to the implemented interface, but still belongs to the API for that class of objects and should be described in the documentation [3].

In this sense, in object oriented languages, the API defines a set of object behaviors, possibly mediated by a set of class methods.

In such languages, the API is still distributed as a library. For example, the Java language libraries include a set of APIs that are provided in the form of the JDK used by the developers to build new Java programs. The JDK includes the documentation of the API in JavaDoc notation.
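
For illustration, here is a minimal sketch of JavaDoc notation on a hypothetical utility method (the class and method are invented for this example); running the javadoc tool over such comments produces the HTML API pages mentioned above:

public final class MathUtil {
    /**
     * Computes the non-negative square root of a value.
     *
     * @param x the value, which must be non-negative
     * @return the square root of x
     * @throws IllegalArgumentException if x is negative
     */
    public static double squareRoot(double x) {
        if (x < 0) {
            throw new IllegalArgumentException("x must be non-negative");
        }
        return Math.sqrt(x);
    }
}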

The quality of the documentation associated with an API is often a factor determining its success in terms of ease of use.

API libraries and frameworks

An API usually is related to a software library: the API describes and prescribes the expected behavior while the library is an actual implementation of this set of rules. A single API can have multiple implementations in the form of different libraries that share the same programming interface.
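
A minimal Java sketch of this idea (all names are hypothetical): the interface plays the role of the API, and two independent classes play the role of interchangeable library implementations of the same rules:

// The "API": a contract that callers program against.
interface Compressor {
    byte[] compress(byte[] data);
}

// One possible implementation ("library A").
class FastCompressor implements Compressor {
    public byte[] compress(byte[] data) {
        // ... a fast but less thorough algorithm would go here ...
        return data.clone();
    }
}

// Another implementation ("library B") of the very same interface.
class SmallCompressor implements Compressor {
    public byte[] compress(byte[] data) {
        // ... a slower but more space-efficient algorithm would go here ...
        return data.clone();
    }
}

Code written against Compressor keeps working no matter which implementation is supplied, which is exactly the API/library distinction described above.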

An API can also be related to a software framework: a framework can be based on several libraries implementing several APIs, but unlike the normal use of an API, access to the behavior built into the framework is mediated by extending its content with new classes plugged into the framework itself. Moreover, the overall flow of control of the program can be outside the control of the caller and in the hands of the framework, via inversion of control or similar mechanisms.
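
The following Java sketch (hypothetical names, simplified to a single template method) illustrates the inversion of control described above: the framework owns the flow of control and calls back into a class that the user plugs in by subclassing:

// Framework code: owns the overall flow of control ("don't call us, we call you").
abstract class ReportFramework {
    // Template method: the fixed sequence defined by the framework.
    public final void run() {
        loadData();
        format();
        print();
    }

    // Extension points the user fills in by subclassing.
    protected abstract void loadData();
    protected abstract void format();

    private void print() {
        System.out.println("report printed");
    }
}

// User code: plugged into the framework; it never drives the sequence itself.
class SalesReport extends ReportFramework {
    protected void loadData() { System.out.println("loading sales data"); }
    protected void format()   { System.out.println("formatting as a table"); }
}

Calling new SalesReport().run() hands control to the framework, which decides when the plugged-in methods are invoked.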

API and protocols

An API can also be an implementation of a protocol.

In general, the difference between an API and a protocol is that the protocol defines a standard way to exchange requests and responses based on a common transport, together with an agreed data/message exchange format, while an API is usually implemented as a library to be used directly. Hence there may be no transport involved (no information physically transferred to or from some remote machine), but rather only simple information exchange via function calls (local to the machine where the processing takes place), with data exchanged in formats expressed in a specific language.

When an API implements a protocol, it can be based on proxy methods for remote invocations that underneath rely on the communication protocol. The role of the API can be precisely to hide the details of the transport protocol. E.g. Java RMI is an API that implements the JRMP protocol, or IIOP in the case of RMI-IIOP.
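
The sketch below is not real RMI code; it is a simplified, local-only Java illustration (all names hypothetical) of the proxy idea: the caller sees an ordinary method call, and the proxy is where a real implementation would marshal arguments and talk to the underlying protocol:

// The service interface the caller programs against.
interface Quote {
    double priceOf(String symbol);
}

// A proxy that stands in for a remote implementation. In a real remote-invocation
// API the body of priceOf() would marshal the arguments, send them over the
// underlying protocol, and unmarshal the reply; here the round trip is only simulated.
class QuoteProxy implements Quote {
    public double priceOf(String symbol) {
        // pretend a network round trip happened here
        return 42.0;
    }
}

class Client {
    public static void main(String[] args) {
        Quote quote = new QuoteProxy();            // looks like a local object
        System.out.println(quote.priceOf("ACME")); // transport details are hidden
    }
}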

Protocols are usually shared between different technologies (systems based on given programming languages running on given operating systems) and usually allow the different technologies to exchange information, acting as an abstraction/mediation level between the two worlds. APIs, by contrast, are specific to a given technology: hence the APIs of a given language cannot be used in other languages unless the function calls are wrapped with specific adaptation libraries.

Object API and protocols

An object API can prescribe a specific object exchange format, while an object exchange protocol can define a way to transfer the same kind of information in a message.

When a message is exchanged via a protocol between two different platforms using objects on both sides, the object in one programming language can be transformed (marshalled and unmarshalled) into an object in a remote and different language. So, for example, when a program written in Java invokes a service written in C# via SOAP or IIOP, both programs use APIs for remote invocation (each local to the machine where it is running) to exchange the information, which they both convert from/to an object in local memory.

By contrast, when a similar object is exchanged via an API local to a single machine, the object (or a reference to it) is effectively exchanged in memory.

API sharing and reuse via virtual machine

Some languages like those running in a virtual machine (e.g. CLI compliant languages in the Common Language Runtime and JVM compliant languages in the Java Virtual Machine) can share APIs.

In this case the virtual machine enables language interoperation: the virtual machine acts as a common denominator that abstracts away from the specific language by using an intermediate bytecode.

Hence this approach maximizes the code reuse potential for all the existing libraries and related APIs.

Fluent API and DSL

An object oriented API is said to be fluent when it aims to provide for more readable code. Fluent APIs can be used to develop Domain-specific languages [6] .
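
A short Java sketch of the fluent style (the builder and its methods are invented for illustration): each call returns the object itself, so calls chain into something close to a sentence:

// Hypothetical fluent builder: each setter returns 'this' so calls can be chained.
class HttpRequestBuilder {
    private String method = "GET";
    private String url = "";
    private String body = "";

    HttpRequestBuilder post(String url) { this.method = "POST"; this.url = url; return this; }
    HttpRequestBuilder withBody(String body) { this.body = body; return this; }

    String build() {
        return method + " " + url + " body=" + body;
    }
}

class FluentDemo {
    public static void main(String[] args) {
        // Reads almost like a sentence thanks to the fluent style.
        String request = new HttpRequestBuilder()
                .post("https://example.com/orders")
                .withBody("{\"item\": 1}")
                .build();
        System.out.println(request);
    }
}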

Web APIs

When used in the context of web development, an API is typically a defined set of Hypertext Transfer Protocol (HTTP) request messages, along with a definition of the structure of response messages, which is usually in an Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format. While "Web API" is virtually a synonym for web service, the recent trend (so-called Web 2.0) has been moving away from Simple Object Access Protocol (SOAP) based services towards more direct Representational State Transfer (REST) style communications. Web APIs allow the combination of multiple services into new applications known as mashups.

Use of APIs to share content

The practice of publishing APIs has allowed web communities to create an open architecture for sharing content and data between communities and applications. In this way, content that is created in one place can be dynamically posted and updated in multiple locations on the web.

1. Photos can be shared from sites like Flickr and Photobucket to social network sites like Facebook and MySpace.

2. Content can be embedded, e.g. embedding a presentation from SlideShare on a LinkedIn profile.

3. Content can be dynamically posted. Sharing live comments made on Twitter with a Facebook account, for example, is enabled by their APIs.

4. Video content can be embedded on sites which are served by another host.

5. User information can be shared from web communities to outside applications, delivering new functionality to the web community that shares its user data via an open API. One of the best examples of this is the Facebook application platform. Another is the OpenSocial platform.

Implementations

The POSIX standard defines an API that allows a wide range of common computing functions to be written in a way such that they may operate on many different systems (Mac OS X and various Berkeley Software Distributions (BSDs) implement this interface); however, making use of this requires re-compiling for each platform. A compatible API, on the other hand, allows compiled object code to function without any changes on any system implementing that API. This is beneficial to both software providers (who may distribute existing software on new systems without producing and distributing upgrades) and users (who may install older software on their new systems without purchasing upgrades), although this generally requires that various software libraries implement the necessary APIs as well.

Microsoft has shown a strong commitment to a backward-compatible API, particularly within their Windows API (Win32) library, such that older applications may run on newer versions of Windows using an executable-specific setting called "Compatibility Mode". Apple Inc. has shown less concern, breaking compatibility or implementing an API in a slower "emulation mode"; this allows greater freedom in development, at the cost of making older software obsolete.

Among Unix-like operating systems, there are many related but incompatible operating systems running on a common hardware platform (particularly Intel 80386-compatible systems). There have been several attempts to standardize the API such that software vendors may distribute one binary application for all these systems; however, to date, none of these have met with much success. The Linux Standard Base is attempting to do this for the Linux platform, while many of the BSD Unixes, such as FreeBSD, NetBSD, and OpenBSD, implement various levels of API compatibility for both backward compatibility (allowing programs written for older versions to run on newer distributions of the system) and cross-platform compatibility (allowing execution of foreign code without recompiling).

Release policies

The two options for releasing an API are:

1. Protecting information on APIs from the general public. For example, Sony used to make its official PlayStation 2 API available only to licensed PlayStation developers. This enabled Sony to control who wrote PlayStation 2 games. This gives companies quality control privileges and can provide them with potential licensing revenue streams.

2. Making APIs freely available. For example, Microsoft makes the Microsoft Windows API public, and Apple releases its APIs Carbon and Cocoa, so that software can be written for their platforms.

A mix of the two behaviors can be used as well.

ABIs

The related term application binary interface (ABI) is a lower level definition concerning details at the assembly language level. For example, the Linux Standard Base is an ABI, while POSIX is an API.

API examples

• ASPI for SCSI device interfacing
• Carbon and Cocoa for the Macintosh
• DirectX for Microsoft Windows
• EHLLAPI
• Java APIs
• OpenGL cross-platform graphics API
• OpenAL cross-platform sound API
• OpenCL cross-platform API for general-purpose computing on CPUs and GPUs
• OpenMP API that supports multi-platform shared-memory multiprocessing programming in C, C++ and FORTRAN on many architectures, including UNIX and Microsoft Windows platforms
• Simple DirectMedia Layer (SDL)
• Talend, which integrates its data management with BPM from Bonita Open Solution
• Windows API

Language bindings and interface generators

APIs that are intended to be used by more than one high-level programming language often provide, or are augmented with, facilities to automatically map the API to features (syntactic or semantic) that are more natural in those languages. This is known as language binding, and is itself an API. The aim is to encapsulate most of the required functionality of the API, leaving a "thin" layer appropriate to each language.

Below are listed some interface generator tools which bind languages to APIs at compile time.

SWIG: an open-source interface bindings generator from many languages to many languages (typically compiled to scripted).

F2PY: FORTRAN to Python interface generator.

SUMMARY

An application programming interface (API) is the interface that a computer system, library or application provides in order to allow requests for service to be made of it by other computer programs, and/or to allow data to be exchanged between them.

Description

One of the primary purposes of an API is to describe how to access a set of functions - for example, an API might describe how to draw windows or icons on the screen using a library that has been written for that purpose. APIs, like most interfaces, are abstract. Software that may be accessed via a particular API is said to implement that API.

For instance, a computer program can (and often must) use its operating system's API to allocate memory and access files. Many types of systems and applications implement APIs, such as graphics systems, databases, networks, web services, and even some computer games.

In many instances, an API is often a part of a Software development kit (SDK). An SDK may include an API as well as other tools and perhaps even some hardware, so the two terms are not strictly interchangeable.

There are various design models for APIs. Interfaces intended for the fastest execution often consist of sets of functions, procedures, variables and data structures. However, other models exist as well, such as the interpreter used to evaluate expressions in ECMAScript/JavaScript, or the abstraction layer, which relieves the programmer from needing to know how the functions of the API relate to the lower levels of abstraction. This makes it possible to redesign or improve the functions within the API without breaking code that relies on it.

Two general lines of policies exist regarding publishing APIs:

1. Some companies guard their APIs zealously. For example, Sony used to make its official PlayStation 2 API available only to licensed PlayStation developers. This is because Sony wanted to restrict how many people could write a PlayStation 2 game, and wanted to profit from them as much as possible. This is typical of companies who do not profit from the sale of API implementations (in this case, Sony broke even on the sale of PlayStation 2 consoles and even took a loss on marketing, instead making it up through game royalties created by API licensing). However, the PlayStation 3 is based entirely on open and publicly available APIs.

2. Other companies propagate their APIs freely. For example, Microsoft deliberately makes most of its API information public, so that software will be written for the Windows platform. The sale of third-party software in turn sells copies of Microsoft Windows. This is typical of companies who profit from the sale of API implementations (in this case, Microsoft Windows, which is sold at a gain for Microsoft).

Some APIs, such as the ones standard to an operating system, are implemented as separate code libraries that are distributed with the operating system. Others require software publishers to integrate the API functionality directly into the application. This forms another distinction in the examples above. Microsoft Windows APIs come with the operating system for anyone to use. Software for embedded systems such as video game consoles generally falls into the application-integrated category. While an official PlayStation API document may be interesting to read, it is of little use without its corresponding implementation, in the form of a separate library or software development kit.

An API that does not require royalties for access and usage is called "open." The APIs provided by Free software (such as all software distributed under the GNU General Public License), are open by definition, since anyone can look into the source of the software and figure out the API. Although usually authoritative "reference implementations" exist for an API (such as Microsoft Windows for the Win32 API), there's nothing that prevents the creation of additional implementations. For example, most of the Win32 API can be provided under a UNIX system using software called Wine.

It is generally lawful to analyze API implementations in order to produce a compatible one. This technique is called reverse engineering for the purposes of interoperability. However, the legal situation is often ambiguous, so that care and legal counsel should be taken before the reverse engineering is carried out. For example, while APIs usually do not have an obvious legal status, they might include patents that may not be used until the patent holder gives permission.

CLASS BROWSER

History of Class Browsers

Most modern class browsers owe their origins to Smalltalk, one of the earliest object-oriented languages. The Smalltalk browser was a series of horizontally abutting panes at the top of a text editor window that listed the class hierarchy of the Smalltalk system. A class selected in one pane would list the subclasses of that class in the next pane to the right. For leaf classes, the farthest right pane would list the class instance variables and allow them to be edited. Most succeeding object-oriented languages differed from Smalltalk in that they were compiled and executed in a discrete runtime environment, rather than being dynamically integrated into a monolithic system like the early Smalltalk environments. Nevertheless, the concept of a table-like or graphical browser to navigate a class hierarchy persisted.

With the popularity of C++ starting in the late-1980s, modern IDEs added class browsers, at first to simply navigate class hierarchies, and later to aid in the creation of new classes. With the introduction of Java in the mid-1990s class browsers became an expected part of any graphic development environment.

Class Browsing in Modern IDEs

All major development environments supply some manner of class browser, including

CodeWarrior for Microsoft Windows, Mac OS, and embedded systems

Microsoft Visual Studio

Eclipse

Borland JBuilder

IntelliJ IDEA

IBM WebSphere

Sun Microsystems Java Studio Creator

Apple Xcode for Mac OS X

Step Ahead Software's Javelin

NetBeans

Zeus IDE

KDevelop

ParcPlace Smalltalk

.NET Reflector

Modern class browsers fall into three general categories: the columnar browsers, the outline browsers, and the diagram browsers.

Columnar Browsers

Continuing the Smalltalk tradition, columnar browsers display the class hierarchy from left to right in a series of columns. Often the rightmost column is reserved for the instance methods or variables of the leaf class.

Outline Browsers

Systems with roots in Microsoft Windows tend to use an outline-form browser, often with colorful (if cryptic) icons to denote classes and their attributes.

Diagram Browsers

In the early years of the 21st century class browsers began to morph into modeling tools, where programmers could not only visualize their class hierarchy as a diagram, but also add classes to their code by adding them to the diagram. Most of these visualization systems have been based on some form of the Unified Modeling Language.

Refactoring Class Browsers

As development environments add refactoring features, many of these features have been implemented in the class browser as well as in text editors. A refactoring browser can allow a programmer to move an instance variable from one class to another simply by dragging it in the graphical user interface, or to combine or separate classes using mouse gestures rather than a large number of text editor commands.

DEBUGGING

Finding and fixing bugs, or "debugging", has always been a major part of computer programming. Maurice Wilkes, an early computing pioneer, described his realization in the late 1940s that much of the rest of his life would be spent finding mistakes in his own programs. As computer programs grow more complex, bugs become more common and difficult to fix. Often programmers spend more time and effort finding and fixing bugs than writing new code. Software testers are professionals whose primary task is to find bugs, or write code to support testing. On some projects, more resources can be spent on testing than on developing the program.

Usually, the most difficult part of debugging is finding the bug in the source code. Once it is found, correcting it is usually relatively easy. Programs known as debuggers exist to help programmers locate bugs by executing code line by line, watching variable values, and other features to observe program behavior. Without a debugger, code can be added so that messages or values can be written to a console (for example with printf in the C language) or to a window or log file to trace program execution or show values.
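
As a minimal illustration of this technique, shown here in Java rather than C (the method and values are made up), temporary trace messages are written around the suspect code to see how far execution gets and what the variables hold:

class TraceDemo {
    static int divide(int total, int count) {
        // Temporary trace output added while hunting a bug.
        System.err.println("divide() called with total=" + total + ", count=" + count);
        int result = total / count;  // fails with ArithmeticException when count == 0
        System.err.println("divide() returning " + result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(divide(10, 2));
        System.out.println(divide(10, 0)); // the trace shows the bad argument just before the crash
    }
}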

However, even with the aid of a debugger, locating bugs is something of an art.

It is not uncommon for a bug in one section of a program to cause failures in a completely different section, thus making it especially difficult to track: for example, an error in a graphics rendering routine may cause a file I/O routine, in an apparently unrelated part of the system, to fail.

Sometimes, a bug is not an isolated flaw, but represents an error of thinking or planning on the part of the programmer. Such logic errors require a section of the program to be overhauled or rewritten. As a part of Code review, stepping through the code modeling the execution process in one's head or on paper can often find these errors without ever needing to reproduce the bug as such, if it can be shown there is some faulty logic in its implementation.

But more typically, the first step in locating a bug is to reproduce it reliably.

Once the bug is reproduced, the programmer can use a debugger or some other tool to monitor the execution of the program in the faulty region, and find the point at which the program went astray.

It is not always easy to reproduce bugs. Some are triggered by inputs to the program which may be difficult for the programmer to re-create. One cause of the Therac-25 radiation machine deaths was a bug (specifically, a race condition) that occurred only when the machine operator very rapidly entered a treatment plan; it took days of practice to become able to do this, so the bug did not manifest in testing or when the manufacturer attempted to duplicate it. Other bugs may disappear when the program is run with a debugger; these are heisenbugs (humorously named after the Heisenberg uncertainty principle).

Debugging is still a tedious task requiring considerable effort. Since the 1990s, particularly following the Ariane 5 Flight 501 disaster, there has been a renewed interest in the development of effective automated aids to debugging. For instance, methods of static code analysis by abstract interpretation have already made significant achievements, while still remaining much of a work in progress.

As with any creative act, sometimes a flash of inspiration will show a solution, but this is rare and, by definition, cannot be relied on.

There are also classes of bugs that have nothing to do with the code itself. If, for example, one relies on faulty documentation or hardware, the code may be written perfectly properly to what the documentation says, but the bug truly lies in the documentation or hardware, not the code. However, it is common to change the code instead of the other parts of the system, as the cost and time to change it is generally less. Embedded systems frequently have workarounds for hardware bugs, since to make a new version of a ROM is much cheaper than remanufacturing the hardware, especially if they are commodity items.

Bug management

It is common practice for software to be released with known bugs that are considered non-critical, that is, that do not affect most users' main experience with the product. While software products may, by definition, contain any number of unknown bugs, measurements during testing can provide an estimate of the number of likely bugs remaining; this becomes more reliable the longer a product is tested and developed ("if we had 200 bugs last week, we should have 100 this week").

Most big software projects maintain two lists of "known bugs": those known to the software team, and those to be told to users. This is not dissimulation, but users are not concerned with the internal workings of the product. The second list informs users about bugs that are not fixed in the current release, or not fixed at all, and a workaround may be offered.

There are various reasons for not fixing bugs:

The developers often don't have time or it is not economical to fix all non-severe bugs.

The bug could be fixed in a new version or patch that is not yet released.

The changes to the code required to fix the bug could be large, expensive, or delay finishing the project.

Even seemingly simple fixes bring the chance of introducing new unknown bugs into the system. At the end of a test/fix cycle some managers may only allow the most critical bugs to be fixed.

Users may be relying on the undocumented, buggy behavior, especially if scripts or macros rely on a behavior; it may introduce a breaking change.

It's "not a bug". A misunderstanding has arisen between expected and provided behavior

Given the above, it is often considered impossible to write completely bug-free software of any real complexity. So bugs are categorized by severity, and low-severity non-critical bugs are tolerated, as they do not affect the proper operation of the system for most users. NASA's SATC managed to reduce the number of errors to fewer than 0.1 per 1000 lines of code (SLOC) but this was not felt to be feasible for any real world projects.

The severity of a bug is not the same as its importance for fixing, and the two should be measured and managed separately. On a Microsoft Windows system a blue screen of death is rather severe, but if it only occurs in extreme circumstances, especially if they are well diagnosed and avoidable, it may be less important to fix than an icon not representing its function well, which though purely aesthetic may confuse thousands of users every single day. This balance, of course, depends on many factors; expert users have different expectations from novices, a niche market is different from a general consumer market, and so on.

A school of thought popularized by Eric S. Raymond as Linus's Law says that popular open-source software has more chance of having few or no bugs than other software, because "given enough eyeballs, all bugs are shallow".[12] This assertion has been disputed, however: computer security specialist Elias Levy wrote that "it is easy to hide vulnerabilities in complex, little understood and undocumented source code," because, "even if people are reviewing the code, that doesn't mean they're qualified to do so."

Like any other part of engineering management, bug management must be conducted carefully and intelligently because "what gets measured gets done" and managing purely by bug counts can have unintended consequences. If, for example, developers are rewarded by the number of bugs they fix, they will naturally fix the easiest bugs first, leaving the hardest, and probably most risky or critical, to the last possible moment ("I only have one bug on my list, but it says 'Make sun rise in the West'"). If the management ethos is to reward the number of bugs fixed, then some developers may quickly write sloppy code knowing they can fix the bugs later and be rewarded for it, whereas careful, perhaps "slower" developers do not get rewarded for the bugs that were never there.

Security vulnerabilities

Malicious software may attempt to exploit known vulnerabilities in a system, which may or may not be bugs. Viruses are not bugs in themselves: they are typically programs that are doing precisely what they were designed to do. However, viruses are occasionally referred to as such in the popular press.

Common types of computer bugs

• Conceptual error: code is syntactically correct, but the programmer or designer intended it to do something else.

• Maths bugs: division by zero; arithmetic overflow or underflow; loss of arithmetic precision due to rounding or numerically unstable algorithms.

• Logic bugs: infinite loops and infinite recursion.

• Syntax bugs: use of the wrong operator, such as performing assignment instead of an equality test (see the sketch after this list). In simple cases this is often warned about by the compiler; in many languages it is deliberately guarded against by language syntax.

• Resource bugs: null pointer dereference; using an uninitialized variable; off-by-one error (counting one too many or too few when looping); access violations; resource leaks, where a finite system resource such as memory or file handles is exhausted by repeated allocation without release; buffer overflow, in which a program tries to store data past the end of allocated storage (this may or may not lead to an access violation, and such bugs can form a security vulnerability); excessive recursion which, though logically valid, causes stack overflow.

• Co-programming bugs: deadlock; race condition; concurrency errors in critical sections, mutual exclusions and other features of concurrent processing; time-of-check-to-time-of-use (TOCTOU), a form of unprotected critical section.

• Team working bugs: unpropagated updates, e.g. the programmer changes "myAdd" but forgets to change "mySubtract", which uses the same algorithm (these errors are mitigated by the Don't Repeat Yourself philosophy); comments out of date or incorrect (many programmers assume the comments accurately describe the code); differences between documentation and the actual product.
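
Two of the bug types listed above, sketched in Java (an invented snippet): an assignment used where an equality test was intended, and an off-by-one loop bound:

class BugExamples {
    public static void main(String[] args) {
        // Syntax bug: '=' assigns instead of comparing, so this branch always runs.
        boolean done = false;
        if (done = true) {                  // intended: if (done == true), or simply if (done)
            System.out.println("looks finished, but it is not");
        }

        // Off-by-one bug: '<=' walks one element past the end of the array.
        int[] values = {1, 2, 3};
        int sum = 0;
        for (int i = 0; i <= values.length; i++) {   // intended: i < values.length
            sum += values[i];               // throws ArrayIndexOutOfBoundsException at i == 3
        }
        System.out.println(sum);
    }
}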

Need for debugging

Once errors are identified in program code, it is necessary to first identify the precise program statements responsible for the errors and then to fix them. Identifying errors in program code and then fixing them is known as debugging.

Debugging approaches

The following are some of the approaches popularly adopted by programmers for debugging.

Brute Force Method:

This is the most common method of debugging but is the least efficient method. In this approach, the program is loaded with print statements to print the intermediate values with the hope that some of the printed values will help to identify the statement in error. This approach becomes more systematic with the use of a symbolic debugger (also called a source code debugger), because values of different variables can be easily checked and breakpoints and watchpoints can be easily set to test the values of variables effortlessly.

Backtracking: This is also a fairly common approach. In this approach, beginning from the statement at which an error symptom has been observed, the source code is traced backwards until the error is discovered. Unfortunately, as the number of source lines to be traced back increases, the number of potential backward paths increases and may become unmanageably large, thus limiting the use of this approach.

Cause Elimination Method: In this approach, a list of causes which could possibly have contributed to the error symptom is developed and tests are conducted to eliminate each. A related technique of identification of the error from the error symptom is the software fault tree analysis.

Program Slicing: This technique is similar to backtracking. Here the search space is reduced by defining slices. A slice of a program for a particular variable at a particular statement is the set of source lines preceding this statement that can influence the value of that variable [Mund2002].

Debugging guidelines

Debugging is often carried out by programmers based on their ingenuity. The following are some general guidelines for effective debugging:

Many times debugging requires a thorough understanding of the program design. Trying to debug based on a partial understanding of the system design and implementation may require an inordinate amount of effort to be put into debugging even simple problems. Debugging may sometimes even require a full redesign of the system. In such cases, a common mistake that novice programmers often make is attempting to fix not the error itself but only its symptoms. One must beware of the possibility that an error correction may introduce new errors. Therefore, after every round of error-fixing, regression testing must be carried out.

Program analysis tools

A program analysis tool means an automated tool that takes the source code or the executable code of a program as input and produces reports regarding several important characteristics of the program, such as its size, complexity, adequacy of commenting, adherence to programming standards, etc. We can classify these into two broad categories of program analysis tools: Static Analysis tools

Dynamic Analysis tools

Static program analysis tools

A static analysis tool is also a program analysis tool. It assesses and computes various characteristics of a software product without executing it. Typically, static analysis tools analyze some structural representation of a program to arrive at certain analytical conclusions, e.g. that some structural properties hold. The structural properties that are usually analyzed are:

Whether the coding standards have been adhered to.

Whether certain programming errors are present, such as uninitialized variables, mismatches between actual and formal parameters, and variables that are declared but never used.

Code walk throughs and code inspections might be considered as static analysis methods. But, the term static program analysis is used to denote automated analysis tools. So, a compiler can be considered to be a static program analysis tool.

Dynamic program analysis tools

Dynamic program analysis techniques require the program to be executed and its actual behavior recorded. A dynamic analyzer usually instruments the code (i.e. adds additional statements to the source code to collect program execution traces). The instrumented code, when executed, allows us to record the behavior of the software for different test cases. After the software has been tested with its full test suite and its behavior recorded, the dynamic analysis tool carries out a post-execution analysis and produces reports which describe the structural coverage that has been achieved by the complete test suite for the program. For example, the post-execution dynamic analysis report might provide data on the extent of statement, branch and path coverage achieved.

Normally the dynamic analysis results are reported in the form of a histogram or a pie chart to describe the structural coverage achieved for different modules of the program. The output of a dynamic analysis tool can be stored and printed easily and provides evidence that thorough testing has been done. The dynamic analysis results indicate the extent of testing performed in white-box mode. If the testing coverage is not satisfactory, more test cases can be designed and added to the test suite. Further, dynamic analysis results can help to eliminate redundant test cases from the test suite.

Integration testing

The primary objective of integration testing is to test the module interfaces, i.e. to check that there are no errors in parameter passing when one module invokes another module. During integration testing, different modules of a system are integrated in a planned manner using an integration plan. The integration plan specifies the steps and the order in which modules are combined to realize the full system. After each integration step, the partially integrated system is tested. An important factor that guides the integration plan is the module dependency graph. The structure chart (or module dependency graph) denotes the order in which different modules call each other. By examining the structure chart, the integration plan can be developed.

Integration test approaches

There are four types of integration testing approaches. Any one (or a mixture) of the following approaches can be used to develop the integration test plan. Those approaches are the following:

Big bang approach

Top-down approach

Bottom-up approach

Mixed-approach

Big-Bang Integration Testing It is the simplest integration testing approach, where all the modules making up a system are integrated in a single step. In simple words, all the modules of the system are simply put together and tested. However, this technique is practicable only for very small systems. The main problem with this approach is that once an error is found during the integration testing, it is very difficult to localize the error as the error may potentially belong to any of the modules being integrated. Therefore, debugging errors reported during big bang integration testing are very expensive to fix.

Bottom-Up Integration Testing In bottom-up testing, each subsystem is tested separately and then the full system is tested. A subsystem might consist of many modules which communicate among each other through well-defined interfaces. The primary purpose of testing each subsystem is to test the interfaces among various modules making up the subsystem. Both control and data interfaces are tested. The test cases must be carefully chosen to exercise the interfaces in all possible manners.

Large software systems normally require several levels of subsystem testing; lower-level subsystems are successively combined to form higher-level subsystems. A principal advantage of bottom-up integration testing is that several disjoint subsystems can be tested simultaneously. In a pure bottom-up testing no stubs are required, only test-drivers are required. A disadvantage of bottom-up testing is the complexity that occurs when the system is made up of a large number of small subsystems. The extreme case corresponds to the big-bang approach.

Top-Down Integration Testing Top-down integration testing starts with the main routine and one or two subordinate routines in the system. After the top-level skeleton has been tested, the immediate subordinate routines of the skeleton are combined with it and tested. The top-down integration testing approach requires the use of program stubs to simulate the effect of lower-level routines that are called by the routines under test. Pure top-down integration does not require any driver routines. A disadvantage of the top-down integration testing approach is that, in the absence of lower-level routines, it may become difficult to exercise the top-level routines in the desired manner, since the lower-level routines perform several low-level functions such as I/O.
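
A minimal Java sketch of a stub (all module names are hypothetical): the unfinished lower-level routine is simulated with a fixed, predictable answer so the top-level routine can already be exercised, and a small driver plays the role a test harness would play:

// Lower-level interface whose real implementation is not ready yet.
interface PaymentGateway {
    boolean charge(String account, double amount);
}

// Stub: simulates the lower-level module with a fixed, predictable answer.
class PaymentGatewayStub implements PaymentGateway {
    public boolean charge(String account, double amount) {
        System.out.println("stub: pretending to charge " + amount + " to " + account);
        return true;   // always succeeds so the caller's logic can be tested
    }
}

// Top-level routine under test; it is exercised against the stub.
class OrderService {
    private final PaymentGateway gateway;
    OrderService(PaymentGateway gateway) { this.gateway = gateway; }

    boolean placeOrder(String account, double amount) {
        return amount > 0 && gateway.charge(account, amount);
    }
}

class TopDownDriver {
    public static void main(String[] args) {
        OrderService service = new OrderService(new PaymentGatewayStub());
        System.out.println("order accepted: " + service.placeOrder("A-100", 9.99));
    }
}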

Mixed Integration Testing A mixed (also called sandwiched) integration testing follows a combination of top-down and bottom-up testing approaches. In top-down approach, testing can start only after the top-level modules have been coded and unit tested. Similarly, bottom-up testing can start only after the bottom level modules are ready. The mixed approach overcomes this shortcoming of the top-down and bottom-up approaches. In the mixed testing approaches, testing can start as and when modules become available. Therefore, this is one of the most commonly used integration testing approaches.

Phased vs. incremental testing

The different integration testing strategies are either phased or incremental. A comparison of these two strategies is as follows:

In incremental integration testing, only one new module is added to the partial system each time.

In phased integration, a group of related modules are added to the partial system each time.

Phased integration requires fewer integration steps compared to the incremental integration approach. However, when failures are detected, it is easier to debug the system in the incremental testing approach, since it is known that the error is caused by the addition of a single module. In fact, big bang testing is a degenerate case of the phased integration testing approach.

System testing

System tests are designed to validate a fully developed system to assure that it meets its requirements. There are essentially three main kinds of system testing:

Alpha Testing. Alpha testing refers to the system testing carried out by the test team within the developing organization.

Beta testing. Beta testing is the system testing performed by a select group of friendly customers.

Acceptance Testing. Acceptance testing is the system testing performed by the customer to determine whether he should accept the delivery of the system.

In each of the above types of tests, various kinds of test cases are designed by referring to the SRS document. Broadly, these tests can be classified into functionality and performance tests. The functionality tests test the functionality of the software to check whether it satisfies the functional requirements as documented in the SRS document. The performance tests test the conformance of the system with the nonfunctional requirements of the system.

Performance testing

Performance testing is carried out to check whether the system meets the non-functional requirements identified in the SRS document. There are several types of performance testing; nine of them are discussed below. The types of performance testing to be carried out on a system depend on the different non-functional requirements of the system documented in the SRS document. All performance tests can be considered as black-box tests.

Stress testing

Volume testing

Configuration testing

Compatibility testing

Regression testing

Recovery testing

Maintenance testing

Documentation testing

Usability testing

Stress Testing Stress testing is also known as endurance testing. Stress testing evaluates system performance when it is stressed for short periods of time. Stress tests are black-box tests which are designed to impose a range of abnormal and even illegal input conditions so as to stress the capabilities of the software. Input data volume, input data rate, processing time, utilization of memory, etc. are tested beyond the designed capacity. For example, if an operating system is supposed to support 15 multiprogrammed jobs, the system is stressed by attempting to run more than 15 jobs simultaneously. A real-time system might be tested to determine the effect of simultaneous arrival of several high-priority interrupts.

Stress testing is especially important for systems that usually operate below the maximum capacity but are severely stressed at some peak demand hours. For example, if the non-functional requirement specification states that the response time should not be more than 20 secs per transaction when 60 concurrent users are working, then during the stress testing the response time is checked with 60 users working simultaneously.
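
A rough Java sketch of such a check (the transaction itself is a hypothetical placeholder): 60 concurrent workers each time one call, and the worst observed response time is compared against the 20-second limit from the example above:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

class StressTestSketch {
    // Placeholder for the real transaction being stress tested.
    static void processTransaction() throws InterruptedException {
        Thread.sleep(50);
    }

    public static void main(String[] args) throws Exception {
        int users = 60;
        ExecutorService pool = Executors.newFixedThreadPool(users);
        List<Future<Long>> timings = new ArrayList<>();

        // Launch 60 simulated users at once, each measuring its own response time.
        for (int i = 0; i < users; i++) {
            timings.add(pool.submit(() -> {
                long start = System.nanoTime();
                processTransaction();
                return (System.nanoTime() - start) / 1_000_000; // milliseconds
            }));
        }

        long worst = 0;
        for (Future<Long> t : timings) {
            worst = Math.max(worst, t.get());
        }
        pool.shutdown();

        System.out.println("worst response time: " + worst + " ms");
        System.out.println("within 20 s limit: " + (worst <= 20_000));
    }
}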

Volume Testing It is especially important to check whether the data structures (arrays, queues, stacks, etc.) have been designed to handle extraordinary situations successfully. For example, a compiler might be tested to check whether the symbol table overflows when a very large program is compiled.

Configuration Testing This is used to analyze system behavior in various hardware and software configurations specified in the requirements. Sometimes systems are built in variable configurations for different users. For instance, we might define a minimal system to serve a single user, and other extension configurations to serve additional users. The system is configured in each of the required configurations and it is checked if the system behaves correctly in all required configurations.

Compatibility Testing This type of testing is required when the system interfaces with other types of systems. Compatibility testing aims to check whether the interface functions perform as required. For instance, if the system needs to communicate with a large database system to retrieve information, compatibility testing is required to test the speed and accuracy of data retrieval.

Regression Testing This type of testing is required when the system being tested is an upgrade of an already existing system, to fix some bugs or enhance functionality, performance, etc. Regression testing is the practice of running an old test suite after each change to the system or after each bug fix to ensure that no new bug has been introduced due to the change or the bug fix. However, if only a few statements are changed, then the entire test suite need not be run - only those test cases that test the functions that are likely to be affected by the change need to be run.

Recovery Testing Recovery testing tests the response of the system to the presence of faults, or loss of power, devices, services, data, etc. The system is subjected to the loss of the mentioned resources (as applicable and discussed in the SRS document) and it is checked if the system recovers satisfactorily. For example, the printer can be disconnected to check if the system hangs. Or, the power may be shut down to check the extent of data loss and corruption.

Maintenance Testing This testing addresses the diagnostic programs, and other procedures that are required to be developed to help maintenance of the system. It is verified that the artifacts exist and they perform properly.

Documentation Testing It is checked that the required user manual, maintenance manuals, and technical manuals exist and are consistent. If the requirements specify the types of audience for which a specific manual should be designed, then the manual is checked for compliance.

Usability Testing Usability testing concerns checking the user interface to see if it meets all user requirements concerning the user interface. During usability testing, the display screens, report formats, and other aspects relating to the user interface requirements are tested.

Error seeding

Sometimes the customer might specify the maximum number of allowable errors that may be present in the delivered system. These are often expressed in terms of the maximum number of allowable errors per line of source code. Error seeding can be used to estimate the number of residual errors in a system.

Error seeding, as the name implies, means seeding the code with some known errors. In other words, some artificial errors are introduced into the program. The number of these seeded errors detected in the course of the standard testing procedure is determined. These values, in conjunction with the number of unseeded errors detected, can be used to predict:

• The number of errors remaining in the product.
• The effectiveness of the testing strategy.

Let N be the total number of defects in the system and let n of these defects be found by testing.

Let S be the total number of seeded defects, and let s of these defects be found during testing.

Assuming that seeded and unseeded defects are equally likely to be detected,

n/N = s/S, or N = S × n/s

Defects still remaining after testing = N - n = n × (S - s)/s

Error seeding works satisfactorily only if the kind of seeded errors matches closely with the kind of defects that actually exist. However, it is difficult to predict the types of errors that exist in software. To some extent, the different categories of errors that remain can be estimated to a first approximation by analyzing historical data of similar projects. Due to the shortcoming that the types of seeded errors should match closely with the types of errors actually existing in the code, error seeding is useful only to a moderate extent.
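
A small worked example of these formulas in Java (the counts are invented purely for illustration):

class ErrorSeedingEstimate {
    public static void main(String[] args) {
        int seeded = 50;        // S: seeded defects planted in the code
        int seededFound = 40;   // s: seeded defects found by the test suite
        int realFound = 120;    // n: real (unseeded) defects found by the same suite

        // N = S * n / s : estimated total number of real defects.
        double totalEstimate = (double) seeded * realFound / seededFound;
        // Remaining = n * (S - s) / s : estimated real defects still in the product.
        double remaining = (double) realFound * (seeded - seededFound) / seededFound;

        System.out.println("estimated total defects: " + totalEstimate);   // 150.0
        System.out.println("estimated remaining defects: " + remaining);   // 30.0
    }
}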

Regression testing

Regression testing does not belong to either unit test, integration test, or system testing. Instead, it is a separate dimension to these three forms of testing. The functionality of regression testing has been discussed earlier.

The following questions have been designed to test the objectives identified for this module:

1. What are the different ways of documenting program code? Which of these is usually the most useful while trying to understand a piece of code?

2. What is a coding standard? Identify the problems that might occur if the engineers of an organization do not adhere to any coding standard.

3. What is the difference between coding standards and coding guidelines? Why are these considered as important in a software development organization?

4. Write down five important coding standards.

5. Write down five important coding guidelines.

6. What do you mean by side effects of a function call? Why are obscure side effects undesirable?

7. What is meant by code review? Why is it required to be completed before performing integration and system testing?

8. Identify the type of errors that can be detected during code walk throughs.

9. Identify the type of errors that can be detected during code inspection.

10. What is clean room testing?

11. Why is it important to properly document a software product?

12. Differentiate between the external and internal documentation of a software product.

13. Identify the necessity of testing of a software product.

14. Distinguish between error and failure. Testing detects which of these two? Justify it.

15. Differentiate between verification and validation in the context of software testing.

16. Is random selection of test cases effective? Justify.

17. Write down major differences between functional testing and structural testing.

18. Do you agree with the statement: The effectiveness of a testing suite in detecting errors in a system can be determined by examining the number of test cases in the suite. Justify your answer.

19. What are driver and stub modules in the context of unit testing of a software product?

20. Given a software and its requirements specification document, how can black-box test suites for this software be designed?

21. Identify two guidelines for the design of equivalence classes for a problem.

22. Explain why boundary value analysis is so important for the design of black box test suite for a problem.

23. Compare the features of stronger testing with the features of complementary testing.

24. Which is the strongest structural testing technique among statement coverage-based testing, branch coverage-based testing, and condition coverage-based testing? Why?

25. Discuss how the control flow graph (CFG) of a program helps in understanding the path coverage-based testing strategy.

26. Draw the control flow graph for the following function named find_maximum. From the control flow graph, determine its cyclomatic complexity.

int find_maximum(int i, int j, int k)
{
    int max;
    if (i > j)
        if (i > k)
            max = i;
        else
            max = k;
    else
        if (j > k)
            max = j;
        else
            max = k;
    return max;
}

27. What is the difference between a path and a linearly independent path in terms of the control flow graph (CFG) of a program?

28. Define a metric from which the upper bound for the number of linearly independent paths of a program can be computed.

29. Consider the following C function named bin_search:

/* num is the number the function searches in a presorted integer array arr */ int bin_search(int num)

{ int min, max;

min = 0; max = 100;

while(min!=max)

{ if (arr[(min+max)/2]>num) max = (min+max)/2; else if(arr[(min+max)/2]