elec 875 design recovery and automated evolution week …post.queensu.ca/~trd/875/pdf/e875w1.pdf ·...

58
ELEC 875 Design Recovery and Automated Evolution Week 1 Modelling

Upload: vanxuyen

Post on 26-Jul-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

ELEC 875Design Recovery

andAutomated Evolution

Week 1Modelling

Page 2: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Papers for next week• Singer, J., Lethbridge, T., Vinson, N. and Anquetil, N., "An

Examination of Software Engineering Work Practices", CASCON '97 , Toronto, October, pp. 209-223.

• Lethbridge, T. and Singer, J. (1997), "Understanding Software Maintenance Tools: Some Empirical Research", Workshop on Empirical Studies of Software Maintenance (WESS 97) , Bari Italy, October, pp. 157-162.

• R. Ferenc, S. Sim, R. Holt , R. Koschke, T. Gyimóthy, "Towards a Standard Schema for C/C++", 8th Working Conference On Reverse Engineering (WCRE'01) , Stuttgart, Germany, October, pp. 49-58.

Page 3: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

System Evolution• Real systems evolve over time◊ not just bug fixes◊ environment changes over time◊ new/old features◊ legacy systems

• Design Recovery◊ Recover design level facts about software

artifacts

• Automated Evolution◊ semi-automated changes to systems

Page 4: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Course Structure• 5 weeks of lectures◊ background material (readings)◊ basis

• Midterm (25%)◊ based on lectures

• Advanced Readings◊ reports (30%) and discussion (15%)

• Project (30%)◊ Project Presentation◊ TXL◊ Other technologies (Rascal/Rational/Custom)

Page 5: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Legacy

noun A sum of money, or a specified article, given to another by will; anything handed down by an ancestor or predecessor

adj associated with something that is outdated or discontinued

Page 6: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Legacy Systems• Software◊ inherited (more than one generation of

developers)◊ valuable

– significant resources to replace– significant risk to replace

• Problems:◊ original developers may not be available◊ older development methods used (outdated?)◊ extensive modifications◊ missing or outdated documentation◊ studies show 50% – 75% of available effort

(domain dependent)

Page 7: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Legacy Systems• Traditionally viewed as old and expensive◊ prohibitively expensive◊ only a matter of time before they must be

replaced◊ drain on resources◊ outdated

Page 8: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Legacy Systems• Alternate View:◊ crown jewels◊ organizations that have not let their legacy

systems get out of control (i.e. most large financial institutions) have a significant advantage over other organizations

◊ system is working and evolves

Page 9: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Legacy Systems• Continuous Evolution◊ You own a wooden ship. You replace each

board in the ship each time you sail. At what point in time do you have a new ship?

- Ship of Theseus (Plutarch)◊ Space Shuttle◊ Operating Systems◊ Compilers◊ Financial Systems (systems written in 1962 are

still running).

Page 10: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Design Recovery• Recover Design Information from Source Artifacts.

Source Artifacts:

Page 11: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Design Recovery• Recover Design Information from Source Artifacts.

Source Artifacts:◊ source code

Page 12: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Design Recovery• Recover Design Information from Source Artifacts.

Source Artifacts:◊ source code◊ database definitions

Page 13: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Design Recovery• Recover Design Information from Source Artifacts.

Source Artifacts:◊ source code◊ database definitions◊ screen definitions (also web page definitions)◊ communication definitions

Page 14: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Design Recovery• Recover Design Information from Source Artifacts.

Source Artifacts:◊ source code◊ database definitions◊ screen definitions (also web page definitions)◊ communication definitions◊ stored procedures

Page 15: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Design Recovery• Recover Design Information from Source Artifacts.

Source Artifacts:◊ source code◊ database definitions◊ screen definitions (also web page definitions)◊ communication definitions◊ stored procedures◊ scripting languages (JCL, TCL, Shell, DOS BAT)

Page 16: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Design Recovery• Recover Design Information from Source Artifacts.

Source Artifacts:◊ source code◊ database definitions◊ screen definitions (also web page definitions)◊ communication definitions◊ stored procedures◊ scripting languages (JCL, TCL, Shell, DOS BAT)◊ some forms of documentation

Page 17: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Design Recovery• Recover Design Information from Source Artifacts.

Source Artifacts:◊ source code◊ database definitions◊ screen definitions (also web page definitions)◊ communication definitions◊ stored procedures◊ scripting languages (JCL, TCL, Shell, DOS BAT)◊ some forms of documentation◊ 4GL languages (application generation)

Page 18: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Resources• Conferences:◊ IEEE International Conference on Software

Maintenance (ICSM)◊ IEEE Working Conference On Reverse

Engineering (WCRE)◊ European Conference On Software

Maintenance and Reengineering (CSMR)◊ IEEE International Conference on Program

Comprehension (ICPC)◊ IEEE International Conference On Software

Engineering (ICSE)◊ Foundations on Software Engineering

Page 19: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Resources• Journals

• Web◊ many researchers in the area

◊ citeseer, google scholar, etc.

Page 20: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Design Recovery Architecture

SrcCode Extractor

AnalysisReporter

Reports

DesignModel

Page 21: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Design Recovery Architecture

SrcCode Extractor

AnalysisReporter

Reports

DesignModel

Page 22: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Modelling -ER

Employee Departmentmember1:∞ 0:1

managedBy0:1

1:∞ No: int Name: string

Page 23: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Modelling - Extended ER

Employee Departmentmember1:∞ 0:1

managedBy0:1

1:∞

Part Time Full Time Temp

No: int Name: string

Benefits: xyzzy Hours: int

Page 24: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Modelling• In traditional design (forward engineering), we model

the problem domain and incorporate that model into the software in some manner.◊ OOAD◊ SA&D

• In design recovery, the problem domain is software. Our model will consist of entities that represent software artifacts (data is a program)

• Long Term Goal: to tie the model extracted from the code to a traditional problem model

Page 25: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Two Models - Base Model

SrcCode Extractor

AnalysisReporter

Reports

DesignModel

Page 26: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Base Model• Entities and Relations in the Base Model directly

represent software artifacts◊ source code elements

• Example Entities◊ variables◊ procedures◊ types◊ statements

Page 27: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Base Model• Example Relations◊ calls (procedure calls a procedure)◊ references (procedure references a variable)◊ modifies (procedure changes value of a variable)◊ isFieldOf (field to structure or class)◊ hasType (type of variable or function)◊ ifPart (component of a compound statement)

Page 28: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Base Model - Notes• some entities have natural names◊ variables ◊ procedures◊ types- names may be predefined or user defined

• some entities do not have natural names◊ statements◊ blocks◊ constants

Page 29: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Base Model - Examplefile main.c void printf(char *, …); char * foo(int); int main(int argc, char **argv){ printf(“hello world%s”,foo(3) }

file foo.c char * foo(int x){ return (“!\n”); }

Page 30: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Base Model - ExampleEntities:

Files: main.c foo.c Functions: foo, main Variables: argc, argv, x Prototypes: foo, printf Constants: “hello world%s”, “!\n” Types: void, int, char*, char**, char

Relations: Contains: (main.c,printf), (main.c, main), (main.c foo) Calls: (main,foo) Parameter: (main, argc), (main, argv), (foo,x) Argument: (foo,3) HasType: (main, int), (foo, char*), (argc, int), (argv,

char**), (x,int), (printf,void),(foo, char*)

Page 31: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Base Model - Issues• Unique Naming◊ some entities have the same name ◊ scoping◊ name spaces (Java, C, C++)◊ Model is a database, need a key for each entity◊ different entity sets - keys needed only for same

entity sets and for entity sets that share relations◊ solutions:

- unique id for each entity (CPPX, Columbus)- name derived from scope (LS/2000)

Page 32: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Base Model - Issues• Resolution◊ sample model cannot connect arguments to

parameters (more than one call? more than one argument?)

◊ Return value of foo?

• Organization◊ Database practice - organize database to answer

common queries◊ any given organization makes some queries hard,

other queries easy

Page 33: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Two Models - Derived Model

SrcCode Extractor

AnalysisReporter

Reports

DesignModel

Page 34: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Derived Model• built on top of the base model◊ derived from information in the base model ◊ new relations between entities◊ new entities for existing entity types◊ new entity types◊ new attributes

• Two types of derived information◊ deterministic computed information

- implementation semantics, storage semantics◊ inferred information (heuristics)

Page 35: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Derived Model - Computed• storage semantics◊ programmers can and do play storage games

struct xyzzy{ int x; float y;};

- x is at offset 0 and is 4 bytes long- y is at offset 4 and is 4 bytes long

• Big Endian/Little Endian

Page 36: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Derived Model - Computedstruct xyzzy{int type;union { struct { … int x; … } option1; struct { … int y; … } option2;} detail;

};

what if fields X and Y have the same offset???

what if the programmer intends them to have the same offset??

Page 37: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Derived Model - Computed• BCD - binary coded decimal• COMP-3 - BCD + Sign Nibble

01 CONV-REC. 05 NUM-VAL PIC 99 COMP-3. 05 ALPHA REDEFINES NUM-VAL. 10 ALPHA-VAL PIC X. 10 FILLER PIC X

MOVE INBYTE to ALPHA-VAL.DIVIDE NUM-VAL BY 10.

Page 38: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Derived Model - Inferred• Use other information to infer information about

entities.• Y2K - Dates◊ Names of Variables and Functions◊ Storage Types of Fields◊ Interaction with OS or with known API◊ Domain Dependent Patterns

01 MTGSTD PIC 9(6).

Page 39: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Derived Model - Inferred• Use other information to infer information about

entities.• Y2K - Dates◊ Names of Variables and Functions◊ Storage Types of Fields◊ Interaction with OS or with known API◊ Domain Dependent Patterns

01 CURRENT-DATE-YYMMDD PIC 9(6).01 MTGSTD PIC 9(6).

IF MTGSTD > CURRENT-DATE-YYMMDD

Page 40: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Derived Model - Inferred• Move to higher level of abstraction• Business Rules, Business Types• Goal:◊ Link to problem model for program◊ Employee Number, Customer Name, Customer

Address

Page 41: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Biggerstaff - Introduction• “Design Recovery For Maintenance and Reuse",

IEEE Computer, 22(7), July 1989, pp. 36–99• Seminal Paper◊ Discusses the General Goal◊ Prototype: Desire - first step towards the goal

Page 42: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Already Happens• “a common, sometimes hidden part of many

activities scattered throughout the software life cycle”

• Development◊ Understand similar systems◊ Understand libraries and systems that interact

• Maintenance◊ Understand system before making changes.

• happens in an inherently unstructured way.

Page 43: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Domain Expertise• Domain Model◊ Human knowledge of the application area.

- needed for development and maintenance

◊ Tools need to abstract domain knowledge as well.

Page 44: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Biggerstaff• Design recovery whenever a system is maintained• Several Steps◊ Program Understanding

- Modules- Key data items- Software engineering artifacts- Informal design abstractions- Relate SE artifacts and informal abstractions to the code

◊ Population of Reuse and Recovery Libraries◊ Applying Results of Design Recovery

Page 45: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Identify the Modules• Not all languages have modules• software of any size has modules• variety of ways to implement modules◊ separate files and compilation units

- module.h module.c- no nested modules- smaller modules (one file)- may be more than one implementation file

- e.g. module1.c module2.c• naming convention for type, procedure or variable

names

Page 46: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Key Data Items• Most programs are organized around one or more

specific data items.◊ Master journal record in transaction systems◊ Master account database◊ Ready, wait and device queues in operating

systems• These data items are some abstraction of the

problem domain. What are they?◊ Customer, Sale, Deposit, Process

• How are they related to the modules◊ SA&D vs ADTs◊ Functional Decomposition vs OO

Page 47: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

SE Artifacts• The result of Design Recovery (as expressed by

Biggerstaff) are design artifacts◊ dependent on shop◊ PDLs, Dataflow, Data Dictionary◊ UML?

• Does not have to match the artifacts originally used to create the system

• Artifacts must be appropriate for system◊ Consequences of a poor fit?◊ UML for 40 year old transaction system

Page 48: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Informal Design Abstractions• Informal descriptions of concepts that occur in the

code (automatable?)• Design Rational• Original Designers are not available, or it may be

so long that they do not remember◊ People’s version of history change over the

years◊ Guess◊ Source Code Comments◊ Existing Documentation

Page 49: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Relating Abstractions to Code• Link the recovered design back to the code• Which functions are part of which module?• Which files are part of a UML class?• Which data structure represents a particular

informal concept

• Necessary to answer low level questions that have been abstracted out◊ needed in order to use the system◊ not designing systems from scratch, modifying

existing systems.- modifications to the design imply modifications to particular pieces of code

Page 50: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Reuse and Application• late 80’s early 90’s - big thing was code reuse• Identify reusable parts of code◊ generalize to make more reusable◊ factoring and decoupling

• Biggerstaff - not just code reuse, but also design recovery reuse◊ help build similar components◊ help recover similar components from other

systems

Page 51: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Analysis• Informal Information◊ semantics is not the only thing

- turing computable argument◊ real systems do not contain random code

- they have to understand it and have some confidence that it actually works

◊ naming conventions◊ structural conventions

• One main goal is to help humans◊ don’t underestimate humans

Page 52: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Informal Informationplausible instances of the conceptual ab- straction.

Associative connections. A conceptual abstraction’s second property is its rich set of informal, natural-language associations that establish its contextual framework for human understanding. That is, the concept of a process management module has semantic connections to other informal, semantic concepts such as context switch- ing, state saving, andmultitasking. Each of these concepts allows association of the concept of a process management module with a large body of knowledge that can help an engineer interpret the design of some specific process management mod- ule or plan the design of a new one.

These two properties provide clues to the role of conceptual abstractions in deal- ing with large complex designs. The struc- tural property provides a way to handle lots of detail without being overwhelmed, as well as a way of describing the application patterns one expects to find in programs. In contrast, the associative linguistic prop- erty offers a way to deal with partially specified (fuzzy) design objects within the universe of informal, natural language- based semantics. These properties relate to two parallel and complementary models - the software-engineering representa- tion model and the natural-language se- mantic model.

The importance of informal informa- tion. An example will illustrate the impor- tance of the informal aspect of conceptual abstractions. Consider the C function in Figure4. This is areal function taken from a multitasking window system’ with the comments removed and meaningful iden- tifiers mapped to semantically empty symbols. What could an analyst tell about the computational intent of this function? Precious little. About all he or she could do is paraphrase the relations expressed in the programming language. For example, the analyst could describe that the function fOOOl calls f0002 with arguments that are global arrays (such as gOOO1) of structures containing some fields (such as so001 and ~0002) . Even if the definitions of all of the functions (f0002, f0003, etc.) were avail- able and similarly transformed, the com- putational intent would remain unclear. What is worse is that, without the informal information, the computational intent of these functions might not be unique. There could be a number of valid interpretations.

The example severs the connection be- tween the artifact and the semantics of the

July 1989

#include <stdio.h> #include “h0001 .h” #include “h0002.h” #include “h0003.h” f000 1 (a000 1 )

unsigned int a000 1 ; I unsigned int io00 1 ; f0002(g0005,d000 1 ,d0002); f0002(a000 1 ,d0003,d0002); f0003(g0001 [a0001 ] .so001 ,go001 [a000 l].sOOO2); go006 = a000 1 ; io001 = g0001[a0001].s0003; if( !f0004(i0001) && (gOOO2->gOOO3)[iOOOl].sOOO4 == d0004)

1 fOO05(iOOO 1 );

Figure 4. Function with no informal semantic clues.

#include <stdio.h> #include “pr0c.h” #include “window.h” #include “g1obdefs.h” change-window(nw)

unsigned int nw; I unsigned int pn; border-attribute(cwin,NORM_ATTR,INV_ATTR,INV-ATTR); border-attribute(nw,NORMHLIT-ATTR,INV-ATTR); move-cursor(wintbl[nw].crow,wintbl[nw].ccol); cwin = nw; pn = wintbl[nw].pnumb; if(!outrange(pn) && (g->proctbl)[pn].procstate == SUSPENDED)

t resume(pn);

Figure 5. Function with some informal semantic clues.

problem domain, eliminating associations between the program and our informal knowledge of the world. Interpretation and understanding of the program has become impossible in any deep sense. Thus, we can see that connotation plays an important role in the process by which people deal with, interpret, and understand programs.

It is exactly this kind of semantically impoverished representation that we usu- ally give to automated tools. If people have difficulty dealing with this kind of repre-

sentation, why should we expect a com- puter to be more successful?

So what sort of informal information is required to understand the program in a nonsuperficial way? Let us consider a slightly enhanced version of this program. Figure 5 maps the symbolic names back to those used in the original code. Here, the names of the functions are more meaning- ful and, if the reader understands a bit about multitasking and window systems, he or she can probably make some good

41

Page 53: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Informal Information

plausible instances of the conceptual ab- straction.

Associative connections. A conceptual abstraction’s second property is its rich set of informal, natural-language associations that establish its contextual framework for human understanding. That is, the concept of a process management module has semantic connections to other informal, semantic concepts such as context switch- ing, state saving, andmultitasking. Each of these concepts allows association of the concept of a process management module with a large body of knowledge that can help an engineer interpret the design of some specific process management mod- ule or plan the design of a new one.

These two properties provide clues to the role of conceptual abstractions in deal- ing with large complex designs. The struc- tural property provides a way to handle lots of detail without being overwhelmed, as well as a way of describing the application patterns one expects to find in programs. In contrast, the associative linguistic prop- erty offers a way to deal with partially specified (fuzzy) design objects within the universe of informal, natural language- based semantics. These properties relate to two parallel and complementary models - the software-engineering representa- tion model and the natural-language se- mantic model.

The importance of informal informa- tion. An example will illustrate the impor- tance of the informal aspect of conceptual abstractions. Consider the C function in Figure4. This is areal function taken from a multitasking window system’ with the comments removed and meaningful iden- tifiers mapped to semantically empty symbols. What could an analyst tell about the computational intent of this function? Precious little. About all he or she could do is paraphrase the relations expressed in the programming language. For example, the analyst could describe that the function fOOOl calls f0002 with arguments that are global arrays (such as gOOO1) of structures containing some fields (such as so001 and ~0002) . Even if the definitions of all of the functions (f0002, f0003, etc.) were avail- able and similarly transformed, the com- putational intent would remain unclear. What is worse is that, without the informal information, the computational intent of these functions might not be unique. There could be a number of valid interpretations.

The example severs the connection be- tween the artifact and the semantics of the

July 1989

#include <stdio.h> #include “h0001 .h” #include “h0002.h” #include “h0003.h” f000 1 (a000 1 )

unsigned int a000 1 ; I unsigned int io00 1 ; f0002(g0005,d000 1 ,d0002); f0002(a000 1 ,d0003,d0002); f0003(g0001 [a0001 ] .so001 ,go001 [a000 l].sOOO2); go006 = a000 1 ; io001 = g0001[a0001].s0003; if( !f0004(i0001) && (gOOO2->gOOO3)[iOOOl].sOOO4 == d0004)

1 fOO05(iOOO 1 );

Figure 4. Function with no informal semantic clues.

#include <stdio.h> #include “pr0c.h” #include “window.h” #include “g1obdefs.h” change-window(nw)

unsigned int nw; I unsigned int pn; border-attribute(cwin,NORM_ATTR,INV_ATTR,INV-ATTR); border-attribute(nw,NORMHLIT-ATTR,INV-ATTR); move-cursor(wintbl[nw].crow,wintbl[nw].ccol); cwin = nw; pn = wintbl[nw].pnumb; if(!outrange(pn) && (g->proctbl)[pn].procstate == SUSPENDED)

t resume(pn);

Figure 5. Function with some informal semantic clues.

problem domain, eliminating associations between the program and our informal knowledge of the world. Interpretation and understanding of the program has become impossible in any deep sense. Thus, we can see that connotation plays an important role in the process by which people deal with, interpret, and understand programs.

It is exactly this kind of semantically impoverished representation that we usu- ally give to automated tools. If people have difficulty dealing with this kind of repre-

sentation, why should we expect a com- puter to be more successful?

So what sort of informal information is required to understand the program in a nonsuperficial way? Let us consider a slightly enhanced version of this program. Figure 5 maps the symbolic names back to those used in the original code. Here, the names of the functions are more meaning- ful and, if the reader understands a bit about multitasking and window systems, he or she can probably make some good

41

Page 54: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Informal Information#include <stdio.h> #include “pr0c.h” #include “window. h” #include “g1obdefs.h” change-window(nw) /*Change current window to window nw*/

/*Number of target window*/ unsigned int nw; 1 unsigned int pn;

/*Restore border of current window to un-highlighted*/ border-attribute(cwin,NORM-ATTRJNV-ATTR);

/*Highlight border of new current window*/ border-attri bute( nw,NORMHLIT-ATTRJNV-ATTR);

/*Move the physical cursor to the new window where the cursor was left, and make nw the current window*/ move~cursor(wintb~[nw].crow,wintbl[nw].ccol); cwin = nw:

/*Resume the process associated with the new window if it is suspended.*/ pn = wintbl[nw].pnumb; if( !outrange(pn) && (g->proctbl)[pn] .procstate == SUSPENDED)

I resume(pn);

Figure 6. Function with many informal semantic clues.

guesses about the operation. The name of the function suggests that it changes which window is currently active, with the new window probably indicated by the argu- ment nw. Further, we can guess that the function border-attribute alters the visual appearance of the windows’ borders, the function move-cursor moves the screen cursor to some position in the new win- dow, and the function resume allows some suspended process to run again (probably the process associated with the new win- dow). The variables similarly come alive with meaning: wintbl is probably the win- dow table and probably has fields ccol and crow that keep track of the cursor (inferred from their use in the call to move-cursor).

By restoring the comments from the original code (see Figure 6), we can cor- roborate several of our guesses and en- hance our understanding of some of the functions and variables.

This exercise should make it clear that the informal linguistic information that the software engineer deals with is not simply supplemental information that can be ig- nored because automated tools do not use it. Rather, this information is fundamental.

42

It provides the ability to determine the computational intent of code in a way that is impossible with just the source code denuded of its informal semantics.

If we are to use this informal informa- tion in design recovery tools, we must propose a form for it, suggest how that form relates to the formal information captured in program source code or in formal specifications, and propose a set of operations on these structures that imple- ments the design recovery process. To accomplish these goals, we must first ana- lyze the proposed design recovery system in a bit more detail.

A model-based design recovery system

What would a design recovery system look like? Figure 7 is a system-level de- scription of a model-based design recov- ery system (called Desire) showing some of the sources of information used to re- cover designs. They include the code of existing systems because such code COII-

tains a large amount of important informa- tion, but there must be other sources as well. Much design information cannot be formally captured in the program source code because programming languages do not contain the constructs necessary to express information such as the informal conceptual abstractions behind the code. For example, the informal conceptual ab- stractions behind the change-window function discussed earlier include win- dows, processes, cursors, and the opera- tions on these entities. And these concep- tual abstractions are woven into a rich set of knowledge about the domain that pro- vides clues to understanding the formal source code structures.

Design recovery results in a hypertext web4 of information that weaves together informal ideas (e.g., the concept of a proc- ess), software engineering artifacts (e.g.. a dataflow diagram of aprocess switch), and details of specific examples of these enti- ties as embodied in code (e.g., one specific subroutine for process switching). This web is projected into externalized reports to help the software engineer understand a specific target system and into internalized data structures for use by the Rose reuse ~ y s t e m . ~ Since the web is built out of hy- pertext frames, the design recovered by the Desire system is simply a set of data struc- tures that represent the conceptual abstractions and express the semiformal relationships among them (see sidebar, “Concepts of object-oriented program- ming”).

To understand how this model-based design recovery system works, the nature of the data items in the model, and how those data items are used, consider the following typical design recovery session using the multitasking window system as the application domain. Using the process model of design recovery as a guide, I will first define a set of data objects (called idioms) that implement the structural pat- terns and associative connections of con- ceptual abstractions. The example idioms codify the domain model’s expectations of the entities and structures in a typical multitasking window system running on a personal computer. I will then informally describe how these domain objects behave during the semiautomated recovery of the design of a specific multitasking window system. This scenario is analogous to a nonautomated design recovery performed by an unaided software engineer.

An example. We start to discover the structure of this multitasking window sys-

COMPUTER

Page 55: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Idioms• Information frames◊ Conceptual model of domain◊ binds in some way to source code.

- direct - indirect◊ drew a lot from frame based AI techniques

popular at the time. Process number, process name, process state

identify elements of a process tableAddress, Name, Postal Code indications of a

Customer.

Page 56: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Desire• linguistic patterns - lexical◊ representation of informal information◊ naming convention

• Structural Requirements◊ presence of one component implies another◊ some structures are aggregations of other

structures• Incomplete Match◊ not all systems are created equal◊ manual intervention

Page 57: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Prototype• lower level◊ functions, files, global data items◊ definition locations, use locations◊ calls uses depends

• Components◊ parser, analysis, view generation◊ links comments to artifacts

• Viewer◊ queries link back to source code

Page 58: ELEC 875 Design Recovery and Automated Evolution Week …post.queensu.ca/~trd/875/pdf/E875W1.pdf · Design Recovery and Automated Evolution Week 1 ... Examination of Software Engineering

ELEC 875 – Design Recovery and Automated Evolution

Analysis• Prototype is lower level◊ starting point is the code◊ may also include comments

• Link Back to Code◊ always important◊ use to modify existing code◊ knowledge of design is important, but only

useful if it helps you in the maintenance task• Manual Intervention◊ Design recovery includes abstract concepts.

Until real AI is created, human mind is still king.