a journey through the bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope •...
TRANSCRIPT
A Journey through the Bush
Presented by Michael W. GodfreySoftware Architecture Group (SWAG)
Dept of Comp Sci, Univ of Waterloo
This presentation is available fromhttp://plg.uwaterloo.ca/~migod/papers/
WoSEF 00 -- High Level Schemas 2
My answer:
Any schema above the statement level
I see two distinct levels of abstraction:1. Programming language entity level
– Entities are fcns, (non-local) vars, types, classes, …
2. Architectural level– Entities are modules, subsystems, classes, interfaces, …
WoSEF 00 -- High Level Schemas 3
• Lots of – motivational work– ad hoc extractor snarfing– experimental translation mechanisms
• Examples (many others exist)– CORUM I and II– GRAX – TAXForm (TA eXchange FORMat) using Acacia, Rigiparse– Rigi using VA– Dali using Sniff+
WoSEF 00 -- High Level Schemas 4
• I would like to be able to use other extractors …– Want to perform architectural analyses of systems
written in languages other than C
– Want to implement BEAGLE(a tool for exploring software evolution)
• … but extractors differ in languages modelled, level of detail, robustness, bugs, data format, …– I want to be able to convert data between tools.
– Need agreement (awareness) from tool creators
WoSEF 00 -- High Level Schemas 5
PBS Extractor(cfx)
Rigi Extractor(rigiparse)
Dali Extractor(SNiFF+)
TAXFormRepository
PBS Viewerand Abstraction
Tools
SystemArtifacts
BunchClustering Tool
Rigi SHriMPViewer
Dali toTAXFormConverter
Rigi toTAXFormConverter
cfx toTAXFormConverter
Bunch /TAXFormConverter
TAXForm toRigi Converter
WoSEF 00 -- High Level Schemas 6
Universal
High-Level
Procedural
PL/I C
Object-Oriented
C++ Java
Dali C Rigi CPBS C
WoSEF 00 -- High Level Schemas 7
SourceFile
usesfile
Data Type
defines
Procedure Data
definesdefines
usestype
usesdata
defines defines
usesprocedure
uses type
WoSEF 00 -- High Level Schemas 8
Module
depends-on
Subsystemcontains
contains
WoSEF 00 -- High Level Schemas 9
• Would like to concentrate on procedural and OO languages.– Others are interested in COBOL, JCL etc.
• I am interested in high-level info (f calls g)– but not in ASGs, code-level metrics
• Need to agree on– Syntax– Level of granularity and detail– What to do in case of X e.g., X = “missing files”
WoSEF 00 -- High Level Schemas 10
[influenced by Acacia’s C and C++ data models]
Top-level programming language entities:– functions, variables, constants, type definitions (procedural
languages)
– methods, class member data, static methods and member data (object-oriented languages)
Entity containers: – files, modules, classes, packages
WoSEF 00 -- High Level Schemas 11
Entity attributes:– Name, unique identifier (UID -- see next section)– UID of container, UID of containing file (if container is not a file)– Signature/data type– Line number information (see below)– Declared scope/visibility, static or not, final or not– Definition or declaration (see below)
Entity container attributes:– name, UID– relative path (if a file)– version identifier (if provided)– UID of container (if not a file), UID of containing file (if not a file)
WoSEF 00 -- High Level Schemas 12
Relationships:– Function calls, variable uses– Line number information (see below)– Container use/inclusion (by other containers)– Inheritance (various kinds)– “Friendship”, various template relationships
Relationship attributes:– Line number information (see below)– Scope/permission of inheritance
WoSEF 00 -- High Level Schemas 13
Some technical problems:– UID generation? (name-mangling?)– Line numbering (ranges)?– Incomplete information?
• ill-formed code, gcc/K&R-isms• missing header files• resolving entity use to dfn/dcl
(esp. with polymorphism, overloading)
– Pre or post preprocessing?
WoSEF 00 -- High Level Schemas 14
We’ve had these conversations before …
“Getting academics to agree on anything is like herding cats.”
WoSEF 00 -- High Level Schemas 15
• PBS [UWloo]
• Acacia [AT&T]
• cxref, ctags
• TA++ [UOttawa]
• Rigi [UVictoria]
• SPOOL [UMontréal]
• BAUHAUS [UStuttgart]
• GUPRO [UKoblenz]
• SHORE [SD&M]
• Neuhold [UVienna]
WoSEF 00 -- High Level Schemas 16
• Intended use– Level of schema (entity level vs. architectural)
– Amount of detail
• Languages modelled– Multi-lingual– Common super schemas– Model “cross-overs” (e.g., JCL, embedded SQL)
• Hidden assumptions– Known limitations
• Notation/approach to store factbase– Support for translations and transformations
• What’s particularly novel and noteworthy
WoSEF 00 -- High Level Schemas 17
[Holt et al. @ UWaterloo]
• Portable Bookshelf is a reverse engineering tool for creating software architecture models of large systems: – Guinea pigs: Mozilla, Linux, Apache, VIM, Mitel,
TOBEY
• Consists of fact extractor, fact manipulation engine (“grok”), and visualization tool (“landscape”)
sourcecode
cfx groklandscape
viewerentity-level
factsarchitectural
facts
WoSEF 00 -- High Level Schemas 18
WoSEF 00 -- High Level Schemas 19 WoSEF 00 -- High Level Schemas 20
WoSEF 00 -- High Level Schemas 21
[Chen, Gansner et al. @ AT&T]
• History: – CIA → CIAO → Acacia
• Consists of – C and C++ extractors
– SQL-like query engine
– visualization with auto-layout
WoSEF 00 -- High Level Schemas 22
• Entity attributes:– Hex UID, name, kind (file, function, type, var, macro),
filename, datatype (string), typeclass (enum, struct, etc.), linenum info for def/dec, def/dec/undef, param list, template info, scope, storage spec (static, const, inline, inline virtual, etc.), signature
• Relationship attributes:– Linenum info, rel. kind (refers, contains, inherits,
instantiates, typedef, etc.), relationship scope
WoSEF 00 -- High Level Schemas 23
• SQL-like queries for entities and relationships produces “;” delimited textual output:
% ksh cdef -u fu closeTagFile
26f53ece;closeTagFile;function;entry.h;void;regular;83;0;83;dec;00000000;(const boolean);;extern;;;;
76e7ae31;closeTagFile;function;entry.c;void;regular;551;553;563;def;00000000;(const boolean);;extern;;;;
% ksh cref –u - - - m file2=‘osdeps.h’
<all entity1 attrs> ; <all entity2 attrs > ; <rel attrs>
WoSEF 00 -- High Level Schemas 24
ctags, cxref, cscope
• These are “open source” Unix tools that perform extractions:– ctags extracts only entity info
• e.g., file, name, line num, kind, etc• works with C, C++, Eiffel, Fortran, and Java. • Used for fast context switching while editing source code with vim/emacs
– cxref generates cross-reference table for C systems. • Often used for webifying source code (e.g., Linux, Mozilla).
– cscope used for program comprehension of C systems (e.g., who calls f, who uses v)
• Older commercial Unix tool, recently open sourced.
WoSEF 00 -- High Level Schemas 25
[Lethbridge et al. @ UOttawa]
• TKSee aids programming comprehension– i.e., what programmers do all day
– TA++ is the data modelling language
• Want “full story” from the source code:– Want pre-preprocessing view of code for all platforms
and environments (text editor’s view)
– … but most extractors use a compiler front end and preprocess toward a particular target and option set
• Some extractors keep some macro info
WoSEF 00 -- High Level Schemas 26
WoSEF 00 -- High Level Schemas 27 WoSEF 00 -- High Level Schemas 28
WoSEF 00 -- High Level Schemas 29
[Koschke et al. @ UStuttgart]
• Software architecture recovery system– Parse code, look for hidden/decayed abstractions, then
redesign– Uses various heuristics to perform “clustering”– Works both at entity level and subsystem level
• Built from many tools …– … including Rigi viewer and a customized C
parser/extractor that (optionally) dumps RSF
• Example WoSEF problem:– Cannot derive full includes hierarchy from Bauhaus
extracted facts; this was a design decision, as the researchers were not interested in this information
WoSEF 00 -- High Level Schemas 30
WoSEF 00 -- High Level Schemas 31 WoSEF 00 -- High Level Schemas 32
WoSEF 00 -- High Level Schemas 33
[Ebert, Kullbach, Winter et al. @ UKoblenz]
• GUPRO supports simultaneous modelling of inter-related systems written in different programming languages– In particular, concerned with the COBOL/MVS/JCL
mainframe world
• GUPRO is notable because:– Simultaneously multilingual– Explicitly models “boundary crossings” (!)– Looks at (very real) problems of the mainframe world
• COBOL, JCL, database migration
WoSEF 00 -- High Level Schemas 34
• Candidate system is modelled in an object-based repository using a graph-based approach:
EER (modelling language)
+GRAL (constraint language)
• GReQL mechanism supports structured queries on the repository via restricted first-order logic
WoSEF 00 -- High Level Schemas 35
JCL schema COBOL schema
WoSEF 00 -- High Level Schemas 36
Integrated schemas for JCL and COBOL
WoSEF 00 -- High Level Schemas 37 WoSEF 00 -- High Level Schemas 38
[Hess et al. @ SD&M]
• SHORE is a web-based repository that stores information extracted from structured documents e.g., XML-ified source code, reqs spec
• Uses “layered meta model” to integrate different programming languages– Has language independent meta model plus
specializations for Java and COBOL models
– Has parsers (XML-ifiers) for Java, COBOL
WoSEF 00 -- High Level Schemas 39
• Their current schemas are “high-level”, but they propose that a future exchange format should model:– all AST-level (structural) info
– all semantic analysis info
• Not clear (to me) how entity resolution is done (name-based?):– seems to assume a tree-based definitional/structural
view of the code
WoSEF 00 -- High Level Schemas 40
WoSEF 00 -- High Level Schemas 41
Source <<document type>>
CobolSource <<document type>>
XCobolSource <<document type>>
JavaSource <<document type>>
<<is a>> <<is a>> <<is a>>
© 2000 software design & management AG
WoSEF 00 -- High Level Schemas 42
programminglanguages
object orientedlanguages
Common Meta Model for Programming Languages
© 2000 software design & management AG
structure
behavior data
Procedural Programming Languages
© 2000 software design & management AG
WoSEF 00 -- High Level Schemas 43
Parameter Operation 0..* 1 0..* 1
belongs to
Module 0..*
0..* 0..*
reuse
0..*
DataElement
Callable
Source
Namespace
0..*
0..*
0..*
0..*
includes
Element
0..*
1
0..*
1
in
0..*
0..*
0..*
0..*
imports
0..*
0..*
0..*
0..*
uses
0..*
depends on
0..*
Template
TemplateParameter
1
0..*
1
0..*
belongs to
Specification 0..* 0..*
0..*
extends
0..*
<<subset of>>
<<subset of>> <<subset of>>
© 2000 software design & management AG
WoSEF 00 -- High Level Schemas 44
Template(from structure)
TemplateFunction
Operation(from structure)
Signature(from structure)
Specification(from structure)
Function 0..* 0..10..* 0..1
implements
Module(from structure) 0..*0..* 0..*0..*
implements
redefines
Callable(from structure)
Namespace(from structure)
0..*
0..*
0..*
0..*
calls
Element(from structure)
0..*
0..*
0..*
0..*
uses
<<subset of>>
0..*
depends on
0..*
© 2000 software design & management AG
WoSEF 00 -- High Level Schemas 45
Member Variable Constant Parameter (from structure)
Namespace (from structure)
Element (from structure)
0..*
0..*
0..*
0..*
uses 0..*
depends on
0..*
TemplateParameter (from structure)
TemplateArgument
instatiates
DataElement (from structure)
Type
0..* 1 0..* 1 has
aliases <<subset of>>
© 2000 software design & management AG
WoSEF 00 -- High Level Schemas 46
oo structure
oo behavior
Common Meta Model for OO Languages
© 2000 software design & management AG
Behaviour of OO Languages
Class (from oo structure)
Exception (from oo structure)
Method (from oo structure)
0..* 0..* 0..* 0..*
throws
© 2000 software design & management AG
WoSEF 00 -- High Level Schemas 47
MethodeSignature Methode
TemplateClass
TemplateMethode Attribute
Type (from data)
Callable (from structure)
Package
Template (from structure)
Member (from data)
Namespace (from structure)
Operation (from structure)
Signature (from structure)
Function (from behavior)
Module (from structure)
0..* 0..*
0..*
reuse 0..*
Specification (from structure)
0..* 0..*
0..*
extends 0..*
0..* 0..* 0..* 0..* implements
Class
0..* 0..* 0..*
reuse
0..*
Interface 0..* 0..* 0..* 0..*
implements 0..* 0..* 0..*
extends
0..*
© 2000 software design & management AG
WoSEF 00 -- High Level Schemas 48
JavaClass
JavaMethod
JavaException
JavaPackage JavaInterfac
JavaVariable JavaAttribute
JavaParameter JavaSignature
Interface (from oo structure)
Class (from oo structure)
Package (from oo structure)
MethodSignature (from oo structure)
Exception (from oo structure)
Method (from oo structure)
Attribute (from oo structure)
Parameter (from structure)
Variable (from data)
WoSEF 00 -- High Level Schemas 49
[Karin Neuhold @ UWien]
• Consists of parsers + repository + metrics engine• Interested in applying OO metrics to code
– Concerned with “statement level” of detail, but some “flattening” was performed.
• Have parsers for several OO languages …– C++, Java, Delphi, Smalltalk
• … but wanted a single meta-model (repository schema) that would be as language independent as possible.– Some language-specific specialization allowed in
repository
WoSEF 00 -- High Level Schemas 50
WoSEF 00 -- High Level Schemas 51 WoSEF 00 -- High Level Schemas 52
WoSEF 00 -- High Level Schemas 53 WoSEF 00 -- High Level Schemas 54
WoSEF 00 -- High Level Schemas 55 WoSEF 00 -- High Level Schemas 56
WoSEF 00 -- High Level Schemas 57
• Lots of sticky issues at the prog. lang. level:– To pre- or not to pre-process– Entity resolution often not done– What is a function: def, dec, polymorphism,
overloading, templates, …– How to deal with missing libraries, incremental
extractions, versioned extractions, non-ANSI-isms, …
• Conceptual gaps:– COBOL/JCL world very different from C/C++/Java
world– “I didn’t know you wanted full includes info…”
WoSEF 00 -- High Level Schemas 58
• Many of us seem to be doing similar kinds of extractions. It seems like that:– Many extractors can be used within other tools– Some form of common interchange format is feasible
• Challenges:– May want to use multiple tools together
• I am working on a standalone cxref-based hack to add full includes information to a BAUHAUS converter
– Can we take advantage of the web to set up some sort of distributed fact extraction/conversion factory?
Q: �����������