elec 875 design recovery and automated evolution week …post.queensu.ca/~trd/875/pdf/e875w1.pdf ·...
TRANSCRIPT
ELEC 875 – Design Recovery and Automated Evolution
ELEC 875Design Recovery
andAutomated Evolution
Week 1Modelling
ELEC 875 – Design Recovery and Automated Evolution
Papers for next week• Singer, J., Lethbridge, T., Vinson, N. and Anquetil, N., "An
Examination of Software Engineering Work Practices", CASCON '97 , Toronto, October, pp. 209-223.
• Lethbridge, T. and Singer, J. (1997), "Understanding Software Maintenance Tools: Some Empirical Research", Workshop on Empirical Studies of Software Maintenance (WESS 97) , Bari Italy, October, pp. 157-162.
• R. Ferenc, S. Sim, R. Holt , R. Koschke, T. Gyimóthy, "Towards a Standard Schema for C/C++", 8th Working Conference On Reverse Engineering (WCRE'01) , Stuttgart, Germany, October, pp. 49-58.
ELEC 875 – Design Recovery and Automated Evolution
System Evolution• Real systems evolve over time◊ not just bug fixes◊ environment changes over time◊ new/old features◊ legacy systems
• Design Recovery◊ Recover design level facts about software
artifacts
• Automated Evolution◊ semi-automated changes to systems
ELEC 875 – Design Recovery and Automated Evolution
Course Structure• 5 weeks of lectures◊ background material (readings)◊ basis
• Midterm (25%)◊ based on lectures
• Advanced Readings◊ reports (30%) and discussion (15%)
• Project (30%)◊ Project Presentation◊ TXL◊ Other technologies (Rascal/Rational/Custom)
ELEC 875 – Design Recovery and Automated Evolution
Legacy
noun A sum of money, or a specified article, given to another by will; anything handed down by an ancestor or predecessor
adj associated with something that is outdated or discontinued
ELEC 875 – Design Recovery and Automated Evolution
Legacy Systems• Software◊ inherited (more than one generation of
developers)◊ valuable
– significant resources to replace– significant risk to replace
• Problems:◊ original developers may not be available◊ older development methods used (outdated?)◊ extensive modifications◊ missing or outdated documentation◊ studies show 50% – 75% of available effort
(domain dependent)
ELEC 875 – Design Recovery and Automated Evolution
Legacy Systems• Traditionally viewed as old and expensive◊ prohibitively expensive◊ only a matter of time before they must be
replaced◊ drain on resources◊ outdated
ELEC 875 – Design Recovery and Automated Evolution
Legacy Systems• Alternate View:◊ crown jewels◊ organizations that have not let their legacy
systems get out of control (i.e. most large financial institutions) have a significant advantage over other organizations
◊ system is working and evolves
ELEC 875 – Design Recovery and Automated Evolution
Legacy Systems• Continuous Evolution◊ You own a wooden ship. You replace each
board in the ship each time you sail. At what point in time do you have a new ship?
- Ship of Theseus (Plutarch)◊ Space Shuttle◊ Operating Systems◊ Compilers◊ Financial Systems (systems written in 1962 are
still running).
ELEC 875 – Design Recovery and Automated Evolution
Design Recovery• Recover Design Information from Source Artifacts.
Source Artifacts:
ELEC 875 – Design Recovery and Automated Evolution
Design Recovery• Recover Design Information from Source Artifacts.
Source Artifacts:◊ source code
ELEC 875 – Design Recovery and Automated Evolution
Design Recovery• Recover Design Information from Source Artifacts.
Source Artifacts:◊ source code◊ database definitions
ELEC 875 – Design Recovery and Automated Evolution
Design Recovery• Recover Design Information from Source Artifacts.
Source Artifacts:◊ source code◊ database definitions◊ screen definitions (also web page definitions)◊ communication definitions
ELEC 875 – Design Recovery and Automated Evolution
Design Recovery• Recover Design Information from Source Artifacts.
Source Artifacts:◊ source code◊ database definitions◊ screen definitions (also web page definitions)◊ communication definitions◊ stored procedures
ELEC 875 – Design Recovery and Automated Evolution
Design Recovery• Recover Design Information from Source Artifacts.
Source Artifacts:◊ source code◊ database definitions◊ screen definitions (also web page definitions)◊ communication definitions◊ stored procedures◊ scripting languages (JCL, TCL, Shell, DOS BAT)
ELEC 875 – Design Recovery and Automated Evolution
Design Recovery• Recover Design Information from Source Artifacts.
Source Artifacts:◊ source code◊ database definitions◊ screen definitions (also web page definitions)◊ communication definitions◊ stored procedures◊ scripting languages (JCL, TCL, Shell, DOS BAT)◊ some forms of documentation
ELEC 875 – Design Recovery and Automated Evolution
Design Recovery• Recover Design Information from Source Artifacts.
Source Artifacts:◊ source code◊ database definitions◊ screen definitions (also web page definitions)◊ communication definitions◊ stored procedures◊ scripting languages (JCL, TCL, Shell, DOS BAT)◊ some forms of documentation◊ 4GL languages (application generation)
ELEC 875 – Design Recovery and Automated Evolution
Resources• Conferences:◊ IEEE International Conference on Software
Maintenance (ICSM)◊ IEEE Working Conference On Reverse
Engineering (WCRE)◊ European Conference On Software
Maintenance and Reengineering (CSMR)◊ IEEE International Conference on Program
Comprehension (ICPC)◊ IEEE International Conference On Software
Engineering (ICSE)◊ Foundations on Software Engineering
ELEC 875 – Design Recovery and Automated Evolution
Resources• Journals
• Web◊ many researchers in the area
◊ citeseer, google scholar, etc.
ELEC 875 – Design Recovery and Automated Evolution
Design Recovery Architecture
SrcCode Extractor
AnalysisReporter
Reports
DesignModel
ELEC 875 – Design Recovery and Automated Evolution
Design Recovery Architecture
SrcCode Extractor
AnalysisReporter
Reports
DesignModel
ELEC 875 – Design Recovery and Automated Evolution
Modelling -ER
Employee Departmentmember1:∞ 0:1
managedBy0:1
1:∞ No: int Name: string
ELEC 875 – Design Recovery and Automated Evolution
Modelling - Extended ER
Employee Departmentmember1:∞ 0:1
managedBy0:1
1:∞
Part Time Full Time Temp
No: int Name: string
Benefits: xyzzy Hours: int
ELEC 875 – Design Recovery and Automated Evolution
Modelling• In traditional design (forward engineering), we model
the problem domain and incorporate that model into the software in some manner.◊ OOAD◊ SA&D
• In design recovery, the problem domain is software. Our model will consist of entities that represent software artifacts (data is a program)
• Long Term Goal: to tie the model extracted from the code to a traditional problem model
ELEC 875 – Design Recovery and Automated Evolution
Two Models - Base Model
SrcCode Extractor
AnalysisReporter
Reports
DesignModel
ELEC 875 – Design Recovery and Automated Evolution
Base Model• Entities and Relations in the Base Model directly
represent software artifacts◊ source code elements
• Example Entities◊ variables◊ procedures◊ types◊ statements
ELEC 875 – Design Recovery and Automated Evolution
Base Model• Example Relations◊ calls (procedure calls a procedure)◊ references (procedure references a variable)◊ modifies (procedure changes value of a variable)◊ isFieldOf (field to structure or class)◊ hasType (type of variable or function)◊ ifPart (component of a compound statement)
ELEC 875 – Design Recovery and Automated Evolution
Base Model - Notes• some entities have natural names◊ variables ◊ procedures◊ types- names may be predefined or user defined
• some entities do not have natural names◊ statements◊ blocks◊ constants
ELEC 875 – Design Recovery and Automated Evolution
Base Model - Examplefile main.c void printf(char *, …); char * foo(int); int main(int argc, char **argv){ printf(“hello world%s”,foo(3) }
file foo.c char * foo(int x){ return (“!\n”); }
ELEC 875 – Design Recovery and Automated Evolution
Base Model - ExampleEntities:
Files: main.c foo.c Functions: foo, main Variables: argc, argv, x Prototypes: foo, printf Constants: “hello world%s”, “!\n” Types: void, int, char*, char**, char
Relations: Contains: (main.c,printf), (main.c, main), (main.c foo) Calls: (main,foo) Parameter: (main, argc), (main, argv), (foo,x) Argument: (foo,3) HasType: (main, int), (foo, char*), (argc, int), (argv,
char**), (x,int), (printf,void),(foo, char*)
ELEC 875 – Design Recovery and Automated Evolution
Base Model - Issues• Unique Naming◊ some entities have the same name ◊ scoping◊ name spaces (Java, C, C++)◊ Model is a database, need a key for each entity◊ different entity sets - keys needed only for same
entity sets and for entity sets that share relations◊ solutions:
- unique id for each entity (CPPX, Columbus)- name derived from scope (LS/2000)
ELEC 875 – Design Recovery and Automated Evolution
Base Model - Issues• Resolution◊ sample model cannot connect arguments to
parameters (more than one call? more than one argument?)
◊ Return value of foo?
• Organization◊ Database practice - organize database to answer
common queries◊ any given organization makes some queries hard,
other queries easy
ELEC 875 – Design Recovery and Automated Evolution
Two Models - Derived Model
SrcCode Extractor
AnalysisReporter
Reports
DesignModel
ELEC 875 – Design Recovery and Automated Evolution
Derived Model• built on top of the base model◊ derived from information in the base model ◊ new relations between entities◊ new entities for existing entity types◊ new entity types◊ new attributes
• Two types of derived information◊ deterministic computed information
- implementation semantics, storage semantics◊ inferred information (heuristics)
ELEC 875 – Design Recovery and Automated Evolution
Derived Model - Computed• storage semantics◊ programmers can and do play storage games
struct xyzzy{ int x; float y;};
- x is at offset 0 and is 4 bytes long- y is at offset 4 and is 4 bytes long
• Big Endian/Little Endian
ELEC 875 – Design Recovery and Automated Evolution
Derived Model - Computedstruct xyzzy{int type;union { struct { … int x; … } option1; struct { … int y; … } option2;} detail;
};
what if fields X and Y have the same offset???
what if the programmer intends them to have the same offset??
ELEC 875 – Design Recovery and Automated Evolution
Derived Model - Computed• BCD - binary coded decimal• COMP-3 - BCD + Sign Nibble
01 CONV-REC. 05 NUM-VAL PIC 99 COMP-3. 05 ALPHA REDEFINES NUM-VAL. 10 ALPHA-VAL PIC X. 10 FILLER PIC X
MOVE INBYTE to ALPHA-VAL.DIVIDE NUM-VAL BY 10.
ELEC 875 – Design Recovery and Automated Evolution
Derived Model - Inferred• Use other information to infer information about
entities.• Y2K - Dates◊ Names of Variables and Functions◊ Storage Types of Fields◊ Interaction with OS or with known API◊ Domain Dependent Patterns
01 MTGSTD PIC 9(6).
ELEC 875 – Design Recovery and Automated Evolution
Derived Model - Inferred• Use other information to infer information about
entities.• Y2K - Dates◊ Names of Variables and Functions◊ Storage Types of Fields◊ Interaction with OS or with known API◊ Domain Dependent Patterns
01 CURRENT-DATE-YYMMDD PIC 9(6).01 MTGSTD PIC 9(6).
IF MTGSTD > CURRENT-DATE-YYMMDD
ELEC 875 – Design Recovery and Automated Evolution
Derived Model - Inferred• Move to higher level of abstraction• Business Rules, Business Types• Goal:◊ Link to problem model for program◊ Employee Number, Customer Name, Customer
Address
ELEC 875 – Design Recovery and Automated Evolution
Biggerstaff - Introduction• “Design Recovery For Maintenance and Reuse",
IEEE Computer, 22(7), July 1989, pp. 36–99• Seminal Paper◊ Discusses the General Goal◊ Prototype: Desire - first step towards the goal
ELEC 875 – Design Recovery and Automated Evolution
Already Happens• “a common, sometimes hidden part of many
activities scattered throughout the software life cycle”
• Development◊ Understand similar systems◊ Understand libraries and systems that interact
• Maintenance◊ Understand system before making changes.
• happens in an inherently unstructured way.
ELEC 875 – Design Recovery and Automated Evolution
Domain Expertise• Domain Model◊ Human knowledge of the application area.
- needed for development and maintenance
◊ Tools need to abstract domain knowledge as well.
ELEC 875 – Design Recovery and Automated Evolution
Biggerstaff• Design recovery whenever a system is maintained• Several Steps◊ Program Understanding
- Modules- Key data items- Software engineering artifacts- Informal design abstractions- Relate SE artifacts and informal abstractions to the code
◊ Population of Reuse and Recovery Libraries◊ Applying Results of Design Recovery
ELEC 875 – Design Recovery and Automated Evolution
Identify the Modules• Not all languages have modules• software of any size has modules• variety of ways to implement modules◊ separate files and compilation units
- module.h module.c- no nested modules- smaller modules (one file)- may be more than one implementation file
- e.g. module1.c module2.c• naming convention for type, procedure or variable
names
ELEC 875 – Design Recovery and Automated Evolution
Key Data Items• Most programs are organized around one or more
specific data items.◊ Master journal record in transaction systems◊ Master account database◊ Ready, wait and device queues in operating
systems• These data items are some abstraction of the
problem domain. What are they?◊ Customer, Sale, Deposit, Process
• How are they related to the modules◊ SA&D vs ADTs◊ Functional Decomposition vs OO
ELEC 875 – Design Recovery and Automated Evolution
SE Artifacts• The result of Design Recovery (as expressed by
Biggerstaff) are design artifacts◊ dependent on shop◊ PDLs, Dataflow, Data Dictionary◊ UML?
• Does not have to match the artifacts originally used to create the system
• Artifacts must be appropriate for system◊ Consequences of a poor fit?◊ UML for 40 year old transaction system
ELEC 875 – Design Recovery and Automated Evolution
Informal Design Abstractions• Informal descriptions of concepts that occur in the
code (automatable?)• Design Rational• Original Designers are not available, or it may be
so long that they do not remember◊ People’s version of history change over the
years◊ Guess◊ Source Code Comments◊ Existing Documentation
ELEC 875 – Design Recovery and Automated Evolution
Relating Abstractions to Code• Link the recovered design back to the code• Which functions are part of which module?• Which files are part of a UML class?• Which data structure represents a particular
informal concept
• Necessary to answer low level questions that have been abstracted out◊ needed in order to use the system◊ not designing systems from scratch, modifying
existing systems.- modifications to the design imply modifications to particular pieces of code
ELEC 875 – Design Recovery and Automated Evolution
Reuse and Application• late 80’s early 90’s - big thing was code reuse• Identify reusable parts of code◊ generalize to make more reusable◊ factoring and decoupling
• Biggerstaff - not just code reuse, but also design recovery reuse◊ help build similar components◊ help recover similar components from other
systems
ELEC 875 – Design Recovery and Automated Evolution
Analysis• Informal Information◊ semantics is not the only thing
- turing computable argument◊ real systems do not contain random code
- they have to understand it and have some confidence that it actually works
◊ naming conventions◊ structural conventions
• One main goal is to help humans◊ don’t underestimate humans
ELEC 875 – Design Recovery and Automated Evolution
Informal Informationplausible instances of the conceptual ab- straction.
Associative connections. A conceptual abstraction’s second property is its rich set of informal, natural-language associations that establish its contextual framework for human understanding. That is, the concept of a process management module has semantic connections to other informal, semantic concepts such as context switch- ing, state saving, andmultitasking. Each of these concepts allows association of the concept of a process management module with a large body of knowledge that can help an engineer interpret the design of some specific process management mod- ule or plan the design of a new one.
These two properties provide clues to the role of conceptual abstractions in deal- ing with large complex designs. The struc- tural property provides a way to handle lots of detail without being overwhelmed, as well as a way of describing the application patterns one expects to find in programs. In contrast, the associative linguistic prop- erty offers a way to deal with partially specified (fuzzy) design objects within the universe of informal, natural language- based semantics. These properties relate to two parallel and complementary models - the software-engineering representa- tion model and the natural-language se- mantic model.
The importance of informal informa- tion. An example will illustrate the impor- tance of the informal aspect of conceptual abstractions. Consider the C function in Figure4. This is areal function taken from a multitasking window system’ with the comments removed and meaningful iden- tifiers mapped to semantically empty symbols. What could an analyst tell about the computational intent of this function? Precious little. About all he or she could do is paraphrase the relations expressed in the programming language. For example, the analyst could describe that the function fOOOl calls f0002 with arguments that are global arrays (such as gOOO1) of structures containing some fields (such as so001 and ~0002) . Even if the definitions of all of the functions (f0002, f0003, etc.) were avail- able and similarly transformed, the com- putational intent would remain unclear. What is worse is that, without the informal information, the computational intent of these functions might not be unique. There could be a number of valid interpretations.
The example severs the connection be- tween the artifact and the semantics of the
July 1989
#include <stdio.h> #include “h0001 .h” #include “h0002.h” #include “h0003.h” f000 1 (a000 1 )
unsigned int a000 1 ; I unsigned int io00 1 ; f0002(g0005,d000 1 ,d0002); f0002(a000 1 ,d0003,d0002); f0003(g0001 [a0001 ] .so001 ,go001 [a000 l].sOOO2); go006 = a000 1 ; io001 = g0001[a0001].s0003; if( !f0004(i0001) && (gOOO2->gOOO3)[iOOOl].sOOO4 == d0004)
1 fOO05(iOOO 1 );
Figure 4. Function with no informal semantic clues.
#include <stdio.h> #include “pr0c.h” #include “window.h” #include “g1obdefs.h” change-window(nw)
unsigned int nw; I unsigned int pn; border-attribute(cwin,NORM_ATTR,INV_ATTR,INV-ATTR); border-attribute(nw,NORMHLIT-ATTR,INV-ATTR); move-cursor(wintbl[nw].crow,wintbl[nw].ccol); cwin = nw; pn = wintbl[nw].pnumb; if(!outrange(pn) && (g->proctbl)[pn].procstate == SUSPENDED)
t resume(pn);
Figure 5. Function with some informal semantic clues.
problem domain, eliminating associations between the program and our informal knowledge of the world. Interpretation and understanding of the program has become impossible in any deep sense. Thus, we can see that connotation plays an important role in the process by which people deal with, interpret, and understand programs.
It is exactly this kind of semantically impoverished representation that we usu- ally give to automated tools. If people have difficulty dealing with this kind of repre-
sentation, why should we expect a com- puter to be more successful?
So what sort of informal information is required to understand the program in a nonsuperficial way? Let us consider a slightly enhanced version of this program. Figure 5 maps the symbolic names back to those used in the original code. Here, the names of the functions are more meaning- ful and, if the reader understands a bit about multitasking and window systems, he or she can probably make some good
41
ELEC 875 – Design Recovery and Automated Evolution
Informal Information
plausible instances of the conceptual ab- straction.
Associative connections. A conceptual abstraction’s second property is its rich set of informal, natural-language associations that establish its contextual framework for human understanding. That is, the concept of a process management module has semantic connections to other informal, semantic concepts such as context switch- ing, state saving, andmultitasking. Each of these concepts allows association of the concept of a process management module with a large body of knowledge that can help an engineer interpret the design of some specific process management mod- ule or plan the design of a new one.
These two properties provide clues to the role of conceptual abstractions in deal- ing with large complex designs. The struc- tural property provides a way to handle lots of detail without being overwhelmed, as well as a way of describing the application patterns one expects to find in programs. In contrast, the associative linguistic prop- erty offers a way to deal with partially specified (fuzzy) design objects within the universe of informal, natural language- based semantics. These properties relate to two parallel and complementary models - the software-engineering representa- tion model and the natural-language se- mantic model.
The importance of informal informa- tion. An example will illustrate the impor- tance of the informal aspect of conceptual abstractions. Consider the C function in Figure4. This is areal function taken from a multitasking window system’ with the comments removed and meaningful iden- tifiers mapped to semantically empty symbols. What could an analyst tell about the computational intent of this function? Precious little. About all he or she could do is paraphrase the relations expressed in the programming language. For example, the analyst could describe that the function fOOOl calls f0002 with arguments that are global arrays (such as gOOO1) of structures containing some fields (such as so001 and ~0002) . Even if the definitions of all of the functions (f0002, f0003, etc.) were avail- able and similarly transformed, the com- putational intent would remain unclear. What is worse is that, without the informal information, the computational intent of these functions might not be unique. There could be a number of valid interpretations.
The example severs the connection be- tween the artifact and the semantics of the
July 1989
#include <stdio.h> #include “h0001 .h” #include “h0002.h” #include “h0003.h” f000 1 (a000 1 )
unsigned int a000 1 ; I unsigned int io00 1 ; f0002(g0005,d000 1 ,d0002); f0002(a000 1 ,d0003,d0002); f0003(g0001 [a0001 ] .so001 ,go001 [a000 l].sOOO2); go006 = a000 1 ; io001 = g0001[a0001].s0003; if( !f0004(i0001) && (gOOO2->gOOO3)[iOOOl].sOOO4 == d0004)
1 fOO05(iOOO 1 );
Figure 4. Function with no informal semantic clues.
#include <stdio.h> #include “pr0c.h” #include “window.h” #include “g1obdefs.h” change-window(nw)
unsigned int nw; I unsigned int pn; border-attribute(cwin,NORM_ATTR,INV_ATTR,INV-ATTR); border-attribute(nw,NORMHLIT-ATTR,INV-ATTR); move-cursor(wintbl[nw].crow,wintbl[nw].ccol); cwin = nw; pn = wintbl[nw].pnumb; if(!outrange(pn) && (g->proctbl)[pn].procstate == SUSPENDED)
t resume(pn);
Figure 5. Function with some informal semantic clues.
problem domain, eliminating associations between the program and our informal knowledge of the world. Interpretation and understanding of the program has become impossible in any deep sense. Thus, we can see that connotation plays an important role in the process by which people deal with, interpret, and understand programs.
It is exactly this kind of semantically impoverished representation that we usu- ally give to automated tools. If people have difficulty dealing with this kind of repre-
sentation, why should we expect a com- puter to be more successful?
So what sort of informal information is required to understand the program in a nonsuperficial way? Let us consider a slightly enhanced version of this program. Figure 5 maps the symbolic names back to those used in the original code. Here, the names of the functions are more meaning- ful and, if the reader understands a bit about multitasking and window systems, he or she can probably make some good
41
ELEC 875 – Design Recovery and Automated Evolution
Informal Information#include <stdio.h> #include “pr0c.h” #include “window. h” #include “g1obdefs.h” change-window(nw) /*Change current window to window nw*/
/*Number of target window*/ unsigned int nw; 1 unsigned int pn;
/*Restore border of current window to un-highlighted*/ border-attribute(cwin,NORM-ATTRJNV-ATTR);
/*Highlight border of new current window*/ border-attri bute( nw,NORMHLIT-ATTRJNV-ATTR);
/*Move the physical cursor to the new window where the cursor was left, and make nw the current window*/ move~cursor(wintb~[nw].crow,wintbl[nw].ccol); cwin = nw:
/*Resume the process associated with the new window if it is suspended.*/ pn = wintbl[nw].pnumb; if( !outrange(pn) && (g->proctbl)[pn] .procstate == SUSPENDED)
I resume(pn);
Figure 6. Function with many informal semantic clues.
guesses about the operation. The name of the function suggests that it changes which window is currently active, with the new window probably indicated by the argu- ment nw. Further, we can guess that the function border-attribute alters the visual appearance of the windows’ borders, the function move-cursor moves the screen cursor to some position in the new win- dow, and the function resume allows some suspended process to run again (probably the process associated with the new win- dow). The variables similarly come alive with meaning: wintbl is probably the win- dow table and probably has fields ccol and crow that keep track of the cursor (inferred from their use in the call to move-cursor).
By restoring the comments from the original code (see Figure 6), we can cor- roborate several of our guesses and en- hance our understanding of some of the functions and variables.
This exercise should make it clear that the informal linguistic information that the software engineer deals with is not simply supplemental information that can be ig- nored because automated tools do not use it. Rather, this information is fundamental.
42
It provides the ability to determine the computational intent of code in a way that is impossible with just the source code denuded of its informal semantics.
If we are to use this informal informa- tion in design recovery tools, we must propose a form for it, suggest how that form relates to the formal information captured in program source code or in formal specifications, and propose a set of operations on these structures that imple- ments the design recovery process. To accomplish these goals, we must first ana- lyze the proposed design recovery system in a bit more detail.
A model-based design recovery system
What would a design recovery system look like? Figure 7 is a system-level de- scription of a model-based design recov- ery system (called Desire) showing some of the sources of information used to re- cover designs. They include the code of existing systems because such code COII-
tains a large amount of important informa- tion, but there must be other sources as well. Much design information cannot be formally captured in the program source code because programming languages do not contain the constructs necessary to express information such as the informal conceptual abstractions behind the code. For example, the informal conceptual ab- stractions behind the change-window function discussed earlier include win- dows, processes, cursors, and the opera- tions on these entities. And these concep- tual abstractions are woven into a rich set of knowledge about the domain that pro- vides clues to understanding the formal source code structures.
Design recovery results in a hypertext web4 of information that weaves together informal ideas (e.g., the concept of a proc- ess), software engineering artifacts (e.g.. a dataflow diagram of aprocess switch), and details of specific examples of these enti- ties as embodied in code (e.g., one specific subroutine for process switching). This web is projected into externalized reports to help the software engineer understand a specific target system and into internalized data structures for use by the Rose reuse ~ y s t e m . ~ Since the web is built out of hy- pertext frames, the design recovered by the Desire system is simply a set of data struc- tures that represent the conceptual abstractions and express the semiformal relationships among them (see sidebar, “Concepts of object-oriented program- ming”).
To understand how this model-based design recovery system works, the nature of the data items in the model, and how those data items are used, consider the following typical design recovery session using the multitasking window system as the application domain. Using the process model of design recovery as a guide, I will first define a set of data objects (called idioms) that implement the structural pat- terns and associative connections of con- ceptual abstractions. The example idioms codify the domain model’s expectations of the entities and structures in a typical multitasking window system running on a personal computer. I will then informally describe how these domain objects behave during the semiautomated recovery of the design of a specific multitasking window system. This scenario is analogous to a nonautomated design recovery performed by an unaided software engineer.
An example. We start to discover the structure of this multitasking window sys-
COMPUTER
ELEC 875 – Design Recovery and Automated Evolution
Idioms• Information frames◊ Conceptual model of domain◊ binds in some way to source code.
- direct - indirect◊ drew a lot from frame based AI techniques
popular at the time. Process number, process name, process state
identify elements of a process tableAddress, Name, Postal Code indications of a
Customer.
ELEC 875 – Design Recovery and Automated Evolution
Desire• linguistic patterns - lexical◊ representation of informal information◊ naming convention
• Structural Requirements◊ presence of one component implies another◊ some structures are aggregations of other
structures• Incomplete Match◊ not all systems are created equal◊ manual intervention
ELEC 875 – Design Recovery and Automated Evolution
Prototype• lower level◊ functions, files, global data items◊ definition locations, use locations◊ calls uses depends
• Components◊ parser, analysis, view generation◊ links comments to artifacts
• Viewer◊ queries link back to source code
ELEC 875 – Design Recovery and Automated Evolution
Analysis• Prototype is lower level◊ starting point is the code◊ may also include comments
• Link Back to Code◊ always important◊ use to modify existing code◊ knowledge of design is important, but only
useful if it helps you in the maintenance task• Manual Intervention◊ Design recovery includes abstract concepts.
Until real AI is created, human mind is still king.