a journey through the bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope •...

15
A Journey through the Bush Presented by Michael W. Godfrey Software Architecture Group (SWAG) Dept of Comp Sci, Univ of Waterloo This presentation is available from http://plg.uwaterloo.ca/~migod/papers/ WoSEF 00 -- High Level Schemas 2 My answer: Any schema above the statement level I see two distinct levels of abstraction: 1. Programming language entity level Entities are fcns, (non-local) vars, types, classes, … 2. Architectural level Entities are modules, subsystems, classes, interfaces, … WoSEF 00 -- High Level Schemas 3 Lots of motivational work ad hoc extractor snarfing experimental translation mechanisms Examples (many others exist) CORUM I and II – GRAX TAXForm (TA eXchange FORMat) using Acacia, Rigiparse Rigi using VA Dali using Sniff+ WoSEF 00 -- High Level Schemas 4 I would like to be able to use other extractors … Want to perform architectural analyses of systems written in languages other than C Want to implement BEAGLE (a tool for exploring software evolution) … but extractors differ in languages modelled, level of detail, robustness, bugs, data format, … I want to be able to convert data between tools. Need agreement (awareness) from tool creators

Upload: others

Post on 17-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

A Journey through the Bush

Presented by Michael W. GodfreySoftware Architecture Group (SWAG)

Dept of Comp Sci, Univ of Waterloo

This presentation is available fromhttp://plg.uwaterloo.ca/~migod/papers/

WoSEF 00 -- High Level Schemas 2

My answer:

Any schema above the statement level

I see two distinct levels of abstraction:1. Programming language entity level

– Entities are fcns, (non-local) vars, types, classes, …

2. Architectural level– Entities are modules, subsystems, classes, interfaces, …

WoSEF 00 -- High Level Schemas 3

• Lots of – motivational work– ad hoc extractor snarfing– experimental translation mechanisms

• Examples (many others exist)– CORUM I and II– GRAX – TAXForm (TA eXchange FORMat) using Acacia, Rigiparse– Rigi using VA– Dali using Sniff+

WoSEF 00 -- High Level Schemas 4

• I would like to be able to use other extractors …– Want to perform architectural analyses of systems

written in languages other than C

– Want to implement BEAGLE(a tool for exploring software evolution)

• … but extractors differ in languages modelled, level of detail, robustness, bugs, data format, …– I want to be able to convert data between tools.

– Need agreement (awareness) from tool creators

Page 2: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 5

PBS Extractor(cfx)

Rigi Extractor(rigiparse)

Dali Extractor(SNiFF+)

TAXFormRepository

PBS Viewerand Abstraction

Tools

SystemArtifacts

BunchClustering Tool

Rigi SHriMPViewer

Dali toTAXFormConverter

Rigi toTAXFormConverter

cfx toTAXFormConverter

Bunch /TAXFormConverter

TAXForm toRigi Converter

WoSEF 00 -- High Level Schemas 6

Universal

High-Level

Procedural

PL/I C

Object-Oriented

C++ Java

Dali C Rigi CPBS C

WoSEF 00 -- High Level Schemas 7

SourceFile

usesfile

Data Type

defines

Procedure Data

definesdefines

usestype

usesdata

defines defines

usesprocedure

uses type

WoSEF 00 -- High Level Schemas 8

Module

depends-on

Subsystemcontains

contains

Page 3: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 9

• Would like to concentrate on procedural and OO languages.– Others are interested in COBOL, JCL etc.

• I am interested in high-level info (f calls g)– but not in ASGs, code-level metrics

• Need to agree on– Syntax– Level of granularity and detail– What to do in case of X e.g., X = “missing files”

WoSEF 00 -- High Level Schemas 10

[influenced by Acacia’s C and C++ data models]

Top-level programming language entities:– functions, variables, constants, type definitions (procedural

languages)

– methods, class member data, static methods and member data (object-oriented languages)

Entity containers: – files, modules, classes, packages

WoSEF 00 -- High Level Schemas 11

Entity attributes:– Name, unique identifier (UID -- see next section)– UID of container, UID of containing file (if container is not a file)– Signature/data type– Line number information (see below)– Declared scope/visibility, static or not, final or not– Definition or declaration (see below)

Entity container attributes:– name, UID– relative path (if a file)– version identifier (if provided)– UID of container (if not a file), UID of containing file (if not a file)

WoSEF 00 -- High Level Schemas 12

Relationships:– Function calls, variable uses– Line number information (see below)– Container use/inclusion (by other containers)– Inheritance (various kinds)– “Friendship”, various template relationships

Relationship attributes:– Line number information (see below)– Scope/permission of inheritance

Page 4: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 13

Some technical problems:– UID generation? (name-mangling?)– Line numbering (ranges)?– Incomplete information?

• ill-formed code, gcc/K&R-isms• missing header files• resolving entity use to dfn/dcl

(esp. with polymorphism, overloading)

– Pre or post preprocessing?

WoSEF 00 -- High Level Schemas 14

We’ve had these conversations before …

“Getting academics to agree on anything is like herding cats.”

WoSEF 00 -- High Level Schemas 15

• PBS [UWloo]

• Acacia [AT&T]

• cxref, ctags

• TA++ [UOttawa]

• Rigi [UVictoria]

• SPOOL [UMontréal]

• BAUHAUS [UStuttgart]

• GUPRO [UKoblenz]

• SHORE [SD&M]

• Neuhold [UVienna]

WoSEF 00 -- High Level Schemas 16

• Intended use– Level of schema (entity level vs. architectural)

– Amount of detail

• Languages modelled– Multi-lingual– Common super schemas– Model “cross-overs” (e.g., JCL, embedded SQL)

• Hidden assumptions– Known limitations

• Notation/approach to store factbase– Support for translations and transformations

• What’s particularly novel and noteworthy

Page 5: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 17

[Holt et al. @ UWaterloo]

• Portable Bookshelf is a reverse engineering tool for creating software architecture models of large systems: – Guinea pigs: Mozilla, Linux, Apache, VIM, Mitel,

TOBEY

• Consists of fact extractor, fact manipulation engine (“grok”), and visualization tool (“landscape”)

sourcecode

cfx groklandscape

viewerentity-level

factsarchitectural

facts

WoSEF 00 -- High Level Schemas 18

WoSEF 00 -- High Level Schemas 19 WoSEF 00 -- High Level Schemas 20

Page 6: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 21

[Chen, Gansner et al. @ AT&T]

• History: – CIA → CIAO → Acacia

• Consists of – C and C++ extractors

– SQL-like query engine

– visualization with auto-layout

WoSEF 00 -- High Level Schemas 22

• Entity attributes:– Hex UID, name, kind (file, function, type, var, macro),

filename, datatype (string), typeclass (enum, struct, etc.), linenum info for def/dec, def/dec/undef, param list, template info, scope, storage spec (static, const, inline, inline virtual, etc.), signature

• Relationship attributes:– Linenum info, rel. kind (refers, contains, inherits,

instantiates, typedef, etc.), relationship scope

WoSEF 00 -- High Level Schemas 23

• SQL-like queries for entities and relationships produces “;” delimited textual output:

% ksh cdef -u fu closeTagFile

26f53ece;closeTagFile;function;entry.h;void;regular;83;0;83;dec;00000000;(const boolean);;extern;;;;

76e7ae31;closeTagFile;function;entry.c;void;regular;551;553;563;def;00000000;(const boolean);;extern;;;;

% ksh cref –u - - - m file2=‘osdeps.h’

<all entity1 attrs> ; <all entity2 attrs > ; <rel attrs>

WoSEF 00 -- High Level Schemas 24

ctags, cxref, cscope

• These are “open source” Unix tools that perform extractions:– ctags extracts only entity info

• e.g., file, name, line num, kind, etc• works with C, C++, Eiffel, Fortran, and Java. • Used for fast context switching while editing source code with vim/emacs

– cxref generates cross-reference table for C systems. • Often used for webifying source code (e.g., Linux, Mozilla).

– cscope used for program comprehension of C systems (e.g., who calls f, who uses v)

• Older commercial Unix tool, recently open sourced.

Page 7: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 25

[Lethbridge et al. @ UOttawa]

• TKSee aids programming comprehension– i.e., what programmers do all day

– TA++ is the data modelling language

• Want “full story” from the source code:– Want pre-preprocessing view of code for all platforms

and environments (text editor’s view)

– … but most extractors use a compiler front end and preprocess toward a particular target and option set

• Some extractors keep some macro info

WoSEF 00 -- High Level Schemas 26

WoSEF 00 -- High Level Schemas 27 WoSEF 00 -- High Level Schemas 28

Page 8: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 29

[Koschke et al. @ UStuttgart]

• Software architecture recovery system– Parse code, look for hidden/decayed abstractions, then

redesign– Uses various heuristics to perform “clustering”– Works both at entity level and subsystem level

• Built from many tools …– … including Rigi viewer and a customized C

parser/extractor that (optionally) dumps RSF

• Example WoSEF problem:– Cannot derive full includes hierarchy from Bauhaus

extracted facts; this was a design decision, as the researchers were not interested in this information

WoSEF 00 -- High Level Schemas 30

WoSEF 00 -- High Level Schemas 31 WoSEF 00 -- High Level Schemas 32

Page 9: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 33

[Ebert, Kullbach, Winter et al. @ UKoblenz]

• GUPRO supports simultaneous modelling of inter-related systems written in different programming languages– In particular, concerned with the COBOL/MVS/JCL

mainframe world

• GUPRO is notable because:– Simultaneously multilingual– Explicitly models “boundary crossings” (!)– Looks at (very real) problems of the mainframe world

• COBOL, JCL, database migration

WoSEF 00 -- High Level Schemas 34

• Candidate system is modelled in an object-based repository using a graph-based approach:

EER (modelling language)

+GRAL (constraint language)

• GReQL mechanism supports structured queries on the repository via restricted first-order logic

WoSEF 00 -- High Level Schemas 35

JCL schema COBOL schema

WoSEF 00 -- High Level Schemas 36

Integrated schemas for JCL and COBOL

Page 10: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 37 WoSEF 00 -- High Level Schemas 38

[Hess et al. @ SD&M]

• SHORE is a web-based repository that stores information extracted from structured documents e.g., XML-ified source code, reqs spec

• Uses “layered meta model” to integrate different programming languages– Has language independent meta model plus

specializations for Java and COBOL models

– Has parsers (XML-ifiers) for Java, COBOL

WoSEF 00 -- High Level Schemas 39

• Their current schemas are “high-level”, but they propose that a future exchange format should model:– all AST-level (structural) info

– all semantic analysis info

• Not clear (to me) how entity resolution is done (name-based?):– seems to assume a tree-based definitional/structural

view of the code

WoSEF 00 -- High Level Schemas 40

Page 11: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 41

Source <<document type>>

CobolSource <<document type>>

XCobolSource <<document type>>

JavaSource <<document type>>

<<is a>> <<is a>> <<is a>>

© 2000 software design & management AG

WoSEF 00 -- High Level Schemas 42

programminglanguages

object orientedlanguages

Common Meta Model for Programming Languages

© 2000 software design & management AG

structure

behavior data

Procedural Programming Languages

© 2000 software design & management AG

WoSEF 00 -- High Level Schemas 43

Parameter Operation 0..* 1 0..* 1

belongs to

Module 0..*

0..* 0..*

reuse

0..*

DataElement

Callable

Source

Namespace

0..*

0..*

0..*

0..*

includes

Element

0..*

1

0..*

1

in

0..*

0..*

0..*

0..*

imports

0..*

0..*

0..*

0..*

uses

0..*

depends on

0..*

Template

TemplateParameter

1

0..*

1

0..*

belongs to

Specification 0..* 0..*

0..*

extends

0..*

<<subset of>>

<<subset of>> <<subset of>>

© 2000 software design & management AG

WoSEF 00 -- High Level Schemas 44

Template(from structure)

TemplateFunction

Operation(from structure)

Signature(from structure)

Specification(from structure)

Function 0..* 0..10..* 0..1

implements

Module(from structure) 0..*0..* 0..*0..*

implements

redefines

Callable(from structure)

Namespace(from structure)

0..*

0..*

0..*

0..*

calls

Element(from structure)

0..*

0..*

0..*

0..*

uses

<<subset of>>

0..*

depends on

0..*

© 2000 software design & management AG

Page 12: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 45

Member Variable Constant Parameter (from structure)

Namespace (from structure)

Element (from structure)

0..*

0..*

0..*

0..*

uses 0..*

depends on

0..*

TemplateParameter (from structure)

TemplateArgument

instatiates

DataElement (from structure)

Type

0..* 1 0..* 1 has

aliases <<subset of>>

© 2000 software design & management AG

WoSEF 00 -- High Level Schemas 46

oo structure

oo behavior

Common Meta Model for OO Languages

© 2000 software design & management AG

Behaviour of OO Languages

Class (from oo structure)

Exception (from oo structure)

Method (from oo structure)

0..* 0..* 0..* 0..*

throws

© 2000 software design & management AG

WoSEF 00 -- High Level Schemas 47

MethodeSignature Methode

TemplateClass

TemplateMethode Attribute

Type (from data)

Callable (from structure)

Package

Template (from structure)

Member (from data)

Namespace (from structure)

Operation (from structure)

Signature (from structure)

Function (from behavior)

Module (from structure)

0..* 0..*

0..*

reuse 0..*

Specification (from structure)

0..* 0..*

0..*

extends 0..*

0..* 0..* 0..* 0..* implements

Class

0..* 0..* 0..*

reuse

0..*

Interface 0..* 0..* 0..* 0..*

implements 0..* 0..* 0..*

extends

0..*

© 2000 software design & management AG

WoSEF 00 -- High Level Schemas 48

JavaClass

JavaMethod

JavaException

JavaPackage JavaInterfac

JavaVariable JavaAttribute

JavaParameter JavaSignature

Interface (from oo structure)

Class (from oo structure)

Package (from oo structure)

MethodSignature (from oo structure)

Exception (from oo structure)

Method (from oo structure)

Attribute (from oo structure)

Parameter (from structure)

Variable (from data)

Page 13: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 49

[Karin Neuhold @ UWien]

• Consists of parsers + repository + metrics engine• Interested in applying OO metrics to code

– Concerned with “statement level” of detail, but some “flattening” was performed.

• Have parsers for several OO languages …– C++, Java, Delphi, Smalltalk

• … but wanted a single meta-model (repository schema) that would be as language independent as possible.– Some language-specific specialization allowed in

repository

WoSEF 00 -- High Level Schemas 50

WoSEF 00 -- High Level Schemas 51 WoSEF 00 -- High Level Schemas 52

Page 14: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 53 WoSEF 00 -- High Level Schemas 54

WoSEF 00 -- High Level Schemas 55 WoSEF 00 -- High Level Schemas 56

Page 15: A Journey through the Bushmigod/papers/2000/highlevelschemas-4.pdf · ctags, cxref, cscope • These are “open source” Unix tools that perform extractions: –ctagsextracts only

WoSEF 00 -- High Level Schemas 57

• Lots of sticky issues at the prog. lang. level:– To pre- or not to pre-process– Entity resolution often not done– What is a function: def, dec, polymorphism,

overloading, templates, …– How to deal with missing libraries, incremental

extractions, versioned extractions, non-ANSI-isms, …

• Conceptual gaps:– COBOL/JCL world very different from C/C++/Java

world– “I didn’t know you wanted full includes info…”

WoSEF 00 -- High Level Schemas 58

• Many of us seem to be doing similar kinds of extractions. It seems like that:– Many extractors can be used within other tools– Some form of common interchange format is feasible

• Challenges:– May want to use multiple tools together

• I am working on a standalone cxref-based hack to add full includes information to a BAUHAUS converter

– Can we take advantage of the web to set up some sort of distributed fact extraction/conversion factory?

Q: �����������