
3/7/2003 Bioinformatics 1

How To Address Rapidly Changing Data Representations in an Evolving Scientific Domain Using Aspect-oriented Programming Techniques + Overview of Bioinformatics at NEU.

Karl Lieberherr ([email protected])

College of Computer and Information Science

Northeastern University

Boston

Motivation

From: Computational Challenges in Structural and Functional Genomics by J. Head-Gordon, IBM Systems Journal, Vol. 40, No. 2, 2001.

Some Quotes From Head-Gordon.

Although warehousing techniques are as vital in the sciences as in business, functional warehouses tailored for specific scientific needs are few and far between.

A key technical reason for this discrepancy is that our understanding of the concepts being explored in an evolving scientific domain changes constantly, leading to rapid changes in data representation.

Some Quotes From Head-Gordon (Refinement).

… evolving scientific domain changes constantly, leading to rapid changes in data representation.

Not only the data representation changes but also the interfaces, so protection against interface changes is needed as well.

Examples: additional or modified fields or arguments; additional or modified types.

More Quotes From Head-Gordon.

When the format of source data changes, the warehouse must be updated to read that source or it will not function properly. The bulk of these modifications involve extremely tedious, low-level translation and integration tasks that typically require the full attention of both database and domain experts. Given the lack of the ability to automate this work, warehouse maintenance costs are prohibitive, and warehouse “up-times” severely restricted.

Protect Against Changes.

Protection against changes in data representation and interfaces. Traditional technique: information hiding protects well against changes in data representation, but it does not help with changes to interfaces.

Protecting against interface changes needs more than information hiding: restriction through shy programming, called Adaptive Programming (AP).

[Figure: Implementation, Interface, Client. Information hiding sits between implementation and interface; shy programming sits between interface and client.]

Problem with Information Hiding

Shy Programming builds on the observation that traditional black-box composition is not restrictive enough. We use the slogan: information hiding is not hiding enough. Black-box composition isolates the implementation from the interface, but does not decouple the interface from its clients.

Cover unimportant parts of the interface

To permit interfaces to evolve, self-discipline is required to avoid programming extensively against the interface. Certain parts of the interface are best treated as if they were covered.

[Figure: Implementation, Interface, Client. Information hiding sits between implementation and interface; shy programming sits between interface and client.]

Shy Programming = Adaptive Programming

This disciplined programming is referred to as shy programming. Shy programming lets the program recover from (or adapt to) interface changes. Shy programming is also called Adaptive Programming (AP). This is similar to the shyness metaphor in the Law of Demeter (LoD): structure evolves over time, so communicate with just a subset of the visible objects.
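The shyness idea can be made concrete with a minimal Java sketch. All class and method names here (Sequence, Protein, Dataset, LawOfDemeterDemo) are hypothetical and chosen only for illustration; they do not come from the talk. The nosy client reaches through three classes and depends on how they are linked, while the shy client talks only to the object it was handed.

```java
import java.util.List;

// All classes below are hypothetical and exist only for this illustration.
class Sequence {
    private final String residues;
    Sequence(String residues) { this.residues = residues; }
    int length() { return residues.length(); }
}

class Protein {
    private final Sequence sequence;
    Protein(Sequence sequence) { this.sequence = sequence; }
    Sequence getSequence() { return sequence; }
    // Shy entry point: callers need not know that a Sequence exists.
    int residueCount() { return sequence.length(); }
}

class Dataset {
    private final List<Protein> proteins;
    Dataset(List<Protein> proteins) { this.proteins = proteins; }
    List<Protein> getProteins() { return proteins; }
    // Shy entry point: the Dataset talks only to its direct parts.
    int totalResidues() {
        int total = 0;
        for (Protein p : proteins) total += p.residueCount();
        return total;
    }
}

class LawOfDemeterDemo {
    // Not shy: depends on Dataset, Protein, and Sequence and on the links between them.
    static int totalNosy(Dataset d) {
        int total = 0;
        for (Protein p : d.getProteins()) total += p.getSequence().length();
        return total;
    }

    // Shy: depends on Dataset only.
    static int totalShy(Dataset d) { return d.totalResidues(); }

    static Dataset sample() {
        return new Dataset(List.of(new Protein(new Sequence("MKT")),
                                   new Protein(new Sequence("GAVLI"))));
    }

    public static void main(String[] args) {
        Dataset d = sample();
        System.out.println(totalNosy(d) + " " + totalShy(d));
    }
}
```

If Protein later stores its residues differently, only Protein changes for the shy client, whereas the nosy client has to be rewritten.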

Decoupling of Interface

We summarize the commonalities and differences between black-box composition and Shy Programming in two principles.
– Black-box Principle: the representation of objects can be changed without affecting clients.
– Shy-Programming Principle: the interface of objects can be changed within certain parameters without affecting clients.

It is important to notice that the Shy-Programming Principle builds on top of the Black-box Principle.

Manager Metaphor.

A manager M is managing a set of group leaders G, each one managing a set of workers W. We consider issues related to informing M and requesting information from M. We use this example to illustrate three points.
– Micromanager: no information restriction.
– Shyness: helps information restriction.
– Complex requests: help information restriction and optimization.

Want to learn about organizing bioinformatics knowledge.

[Figure: hierarchy of manager M, group leaders G, and workers W.]

Manager Metaphor.

Micromanager: no information restriction.
– If the manager is a micromanager (a manager who wants to know about and rely on all the details of the workers' projects), the managing approach is brittle, because whenever the details of one of the workers' projects change, the manager needs to be notified.

[Figure: hierarchy of manager M, group leaders G, and workers W.]

Manager Metaphor.

Micromanager: no information restriction (continued).
– An object-oriented program written in the usual way corresponds to the manager who likes to micromanage. It is full of detailed knowledge of the class graph. An alternative way of formulating the same idea is to observe that it is good when the workers are shy. A shy worker will only share minimal, high-level information with the group leader. This prevents a brittle situation where the group leaders and the manager rely on too much detail.

[Figure: hierarchy of manager M, group leaders G, and workers W.]

Manager Metaphor.

Shyness: helps information restriction.
– It is good for the workers to be shy and only talk to their group leader and not to the manager directly. (Shyness has two facets: talk only to a few friends AND share minimal information with them. Here we use the first facet, while in the previous point we used the second facet.) The group leader will abstract the information from the workers and only pass on the abstract information to the manager. This will prevent the manager from micromanaging. This variant can be viewed as an application of the Law of Demeter (LoD), which states that an object should talk only to closely related objects. The closely related object for a worker is the group leader and not the manager.

[Figure: hierarchy of manager M, group leaders G, and workers W.]

Manager Metaphor.

Shyness: helps information restriction (continued).
– The motivation is that when things change at the worker level, the manager does not necessarily have to be informed. The group leader will be informed and will decide whether the information needs to be passed up.

[Figure: hierarchy of manager M, group leaders G, and workers W; M is shielded from changes at the worker level.]

Manager Metaphor.

Complex requests: help information restriction and optimization.
– The manager does not want to be bothered by many simple requests from the many workers. Instead, the manager prefers to get a complex request from time to time from a group leader. The complex request lets the manager see all the requests as a whole and optimize the overall result, which would not be possible if simple requests came one by one and had to be satisfied immediately, before the totality of all simple requests is seen.

Manager Metaphor.

Complex requests: help information restriction and optimization (continued).
– The same point applies to programming: instead of sending an object a lot of individual data access requests, it is better to send one complex request that can be treated as a whole and optimized accordingly.
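A minimal Java sketch of this point, using a hypothetical Measurements container that is not part of the talk: the first client issues one simple request per element, the second sends a single complex request (a predicate) and leaves it to the receiver to decide how to answer it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical container, invented for this illustration only.
class Measurements {
    private final List<Integer> values = new ArrayList<>();
    int accessCount = 0; // counts simple requests served, to make the difference visible

    void add(int v) { values.add(v); }
    int size() { return values.size(); }

    // Simple request: one value per call.
    int get(int i) { accessCount++; return values.get(i); }

    // Complex request: the whole question arrives at once, so the receiver
    // sees it as a whole and is free to choose how to answer it.
    int countMatching(IntPredicate wanted) {
        int n = 0;
        for (int v : values) if (wanted.test(v)) n++;
        return n;
    }
}

class ComplexRequestDemo {
    // Client built from many simple requests; it depends on indexed access.
    static int countLargeSimple(Measurements m, int threshold) {
        int n = 0;
        for (int i = 0; i < m.size(); i++) if (m.get(i) > threshold) n++;
        return n;
    }

    // Client that sends one complex request.
    static int countLargeComplex(Measurements m, int threshold) {
        return m.countMatching(v -> v > threshold);
    }

    static Measurements sample() {
        Measurements m = new Measurements();
        for (int v : new int[] {3, 10, 25, 7}) m.add(v);
        return m;
    }

    public static void main(String[] args) {
        Measurements m = sample();
        System.out.println(countLargeSimple(m, 8) + " via " + m.accessCount + " simple requests");
        System.out.println(countLargeComplex(m, 8) + " via one complex request");
    }
}
```

Because countMatching receives the whole request, the container could later switch to an index or a parallel scan without any client changing.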

Aspect-oriented Programming (AOP).

AOP is programming with aspects. An aspect is a complex request to modify the execution of a program. It may expose a large interface. An aspect can be implemented efficiently by inserting code into the program at compile time. An aspect should be shy with respect to the program it modifies.

AOSD: not every concern fits into a component: crosscutting

[Table: rows CR1 to CR4 against columns CM1 to CM6. Rows CR1, CR2, and CR3 each have one x; row CR4 has four, so it cuts across several columns.]

Goal: find new component structures that encapsulate “rich” concerns

A Reusable Aspect.

    import java.rmi.RemoteException;

    // Abstract aspect: the logging behavior is fixed, the join points are left open.
    abstract public aspect RemoteExceptionLogging {
        abstract pointcut logPoint();

        // Runs whenever a call matched by logPoint() throws a RemoteException.
        // `log` is assumed to be a print stream declared elsewhere.
        after() throwing (RemoteException e): logPoint() {
            log.println("Remote call failed in: " + thisJoinPoint.toString() + "(" + e + ").");
        }
    }

    // Concrete aspect: binds logPoint() to the calls of interest.
    public aspect MyRMILogging extends RemoteExceptionLogging {
        pointcut logPoint():
            call(* RegistryServer.*.*(..)) ||
            call(private * RMIMessageBrokerImpl.*.*(..));
    }
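AspectJ code needs the AspectJ compiler. As a rough plain-Java analogue of "after throwing" advice, the sketch below uses the standard `java.lang.reflect.Proxy` to log every RemoteException that escapes any method of an interface. This is runtime interception rather than compile-time weaving, and the names Registry, FlakyRegistry, and LoggingProxy are hypothetical, not taken from the talk.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote-style interface, for illustration only.
interface Registry {
    String lookup(String name) throws RemoteException;
}

class FlakyRegistry implements Registry {
    public String lookup(String name) throws RemoteException {
        if (name.isEmpty()) throw new RemoteException("empty name");
        return "service:" + name;
    }
}

class LoggingProxy {
    static final List<String> log = new ArrayList<>();

    // Wraps target so that every RemoteException thrown through iface is logged.
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException ite) {
                Throwable cause = ite.getCause();
                if (cause instanceof RemoteException) {
                    log.add("Remote call failed in: " + method.getName() + "(" + cause + ").");
                }
                throw cause; // rethrow the original exception to the caller
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }

    // Runs one successful and one failing call; returns how many failures were logged.
    static int demo() {
        log.clear();
        Registry r = wrap(new FlakyRegistry(), Registry.class);
        try {
            r.lookup("db");
            r.lookup("");
        } catch (RemoteException e) {
            // expected for the second call
        }
        return log.size();
    }

    public static void main(String[] args) {
        System.out.println(demo() + " failure(s) logged");
        log.forEach(System.out::println);
    }
}
```

The wrapper mentions no method of Registry by name, which is the same kind of shyness the abstract aspect gets from its abstract pointcut.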

Good Aspects Are Shy.

    abstract aspect CapabilityChecking {

        pointcut invocations(Caller c): this(c) && call(void Service.doService(String));

        pointcut workPoints(Worker w): target(w) && call(void Worker.doTask(Task));

        pointcut perCallerWork(Caller c, Worker w): cflow(invocations(c)) && workPoints(w);

        before (Caller c, Worker w): perCallerWork(c, w) {
            w.checkCapabilities(c);
        }
    }

Lessons From Manager Metaphor.

Information hiding does not hide enough. Information hiding makes all public interfaces available, and the Micromanager point shows that only an abstraction of those interfaces should be visible at higher levels.

Lessons From Manager Metaphor (Continued).

In Shy Programming, only high-level information about the class or call graph is visible at the (shy) programming level. This shields the program from many changes to the class or call graph, in the same way as the manager is shielded from many of the changes in the workers' projects. The role of the group leader is played by the glue code that maps high-level information to low-level information and vice versa. Shy Programming is graph-shy.

Application to Bioinformatics Knowledge

Need shy programming and shy knowledge representation techniques for Bioinformatics.

Need domain-specific languages to define function in a structure-shy way.

Another Good Example of AOP.

[Figure: UML class graph with classes BusRoute, BusStopList, BusStop, BusList, Bus, PersonList, and Person; edge labels busStops, buses, waiting, and passengers; list multiplicities 0..*.]

Find all persons waiting at any bus stop on a bus route.

OO solution: one method for each red class.
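The plain OO solution can be sketched in Java as follows. The field layout (BusRoute holds a BusStopList, each BusStop holds a PersonList of waiting persons) is an assumption read off the class names and edge labels of the figure, and the demo class is hypothetical. Every class on the path carries its own countWaitingPersons method, so the path is hard-wired into five classes.

```java
import java.util.ArrayList;
import java.util.List;

class Person {
    final String name;
    Person(String name) { this.name = name; }
}

class PersonList {
    final List<Person> persons = new ArrayList<>();
    int countWaitingPersons() { return persons.size(); }
}

class BusStop {
    final PersonList waiting = new PersonList();
    // Knows that waiting persons live in a PersonList.
    int countWaitingPersons() { return waiting.countWaitingPersons(); }
}

class BusStopList {
    final List<BusStop> stops = new ArrayList<>();
    int countWaitingPersons() {
        int n = 0;
        for (BusStop s : stops) n += s.countWaitingPersons();
        return n;
    }
}

class BusRoute {
    final BusStopList busStops = new BusStopList();
    // Knows that bus stops hang directly off the route.
    int countWaitingPersons() { return busStops.countWaitingPersons(); }
}

class OoSolutionDemo {
    // Builds a route with one stop where two persons are waiting.
    static BusRoute sample() {
        BusRoute route = new BusRoute();
        BusStop centralSquare = new BusStop();
        centralSquare.waiting.persons.add(new Person("Paul"));
        centralSquare.waiting.persons.add(new Person("Seema"));
        route.busStops.stops.add(centralSquare);
        return route;
    }

    public static void main(String[] args) {
        System.out.println(sample().countWaitingPersons());
    }
}
```

Inserting a new class between BusRoute and BusStopList would force an edit to BusRoute and a new forwarding method in the inserted class.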

Traversal Strategy.

[Figure: UML class graph (BusRoute, BusStopList, BusStop, BusList, Bus, PersonList, Person) annotated with the traversal strategy below.]

from BusRoute through BusStop to Person

Find all persons waiting at any bus stop on a bus route.

A complex request.

Robustness of Strategy.

[Figure: UML class graph extended with classes VillageList and Village and the edge label villages, alongside BusRoute, BusStopList, BusStop, BusList, Bus, PersonList, and Person.]

from BusRoute through BusStop to Person

Find all persons waiting at any bus stop on a bus route.

Complex request is class-graph shy.
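To see why such a request survives the change, here is a small Java sketch of a structure-shy traversal built on reflection. It is not the traversal engine used in the talk: it is a naive walker that explores every field, assumes an acyclic object graph, and uses plain lists instead of the list classes in the figure. The class ShyTraversal and the two layouts are assumptions made for illustration. The same call counts waiting persons whether bus stops hang off the route directly or sit inside villages.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class Person { }
class Bus { List<Person> passengers = new ArrayList<>(); }
class BusStop { List<Person> waiting = new ArrayList<>(); }
class Village { List<BusStop> busStops = new ArrayList<>(); }
class BusRoute {
    List<BusStop> busStops = new ArrayList<>(); // used by the original layout
    List<Village> villages = new ArrayList<>(); // used by the evolved layout
    List<Bus> buses = new ArrayList<>();
}

class ShyTraversal {
    // Counts objects of class `to` reached from `node` along paths that
    // pass through an object of class `via`. Assumes no cycles.
    static int count(Object node, Class<?> via, Class<?> to, boolean seenVia) {
        if (node == null) return 0;
        if (node instanceof Iterable) {
            int n = 0;
            for (Object child : (Iterable<?>) node) n += count(child, via, to, seenVia);
            return n;
        }
        if (node.getClass().getName().startsWith("java.")) return 0; // skip library objects
        if (to.isInstance(node)) return seenVia ? 1 : 0;             // stop at the target class
        if (via.isInstance(node)) seenVia = true;
        int n = 0;
        for (Field f : node.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) continue;
            try {
                f.setAccessible(true);
                n += count(f.get(node), via, to, seenVia);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return n;
    }

    static BusStop centralSquare() {
        BusStop stop = new BusStop();
        stop.waiting.add(new Person()); // Paul
        stop.waiting.add(new Person()); // Seema
        return stop;
    }

    static BusRoute originalLayout() {
        BusRoute route = new BusRoute();
        route.busStops.add(centralSquare());
        Bus bus15 = new Bus();
        bus15.passengers.add(new Person()); // a passenger is not waiting at a stop
        route.buses.add(bus15);
        return route;
    }

    static BusRoute evolvedLayout() {
        BusRoute route = new BusRoute();
        Village village = new Village();
        village.busStops.add(centralSquare());
        route.villages.add(village);
        return route;
    }

    public static void main(String[] args) {
        System.out.println(count(originalLayout(), BusStop.class, Person.class, false));
        System.out.println(count(evolvedLayout(), BusStop.class, Person.class, false));
    }
}
```

The request names only BusRoute (the start object), BusStop, and Person, so the intermediate Village level never appears in client code.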

Writing Aspect-oriented Programs With Strategies.

    class BusRoute {
        // Counts the persons reached by the traversal strategy WPStrategy.
        int countWaitingPersons() {
            Integer result = (Integer) Main.cg.traverse(this, WPStrategy, new Visitor() {
                int r;
                public void before(Person host) { r++; }   // called for each Person visited
                public void start() { r = 0; }
                public Object getReturnValue() { return Integer.valueOf(r); }
            });
            return result.intValue();
        }
    }

    String WPStrategy = "from BusRoute through BusStop to Person";

A complex request. Complex request plays role of manager. Complex request is class-graph shy.

Writing Aspect-Oriented Programs With Strategies.

    // Prepare current class graph
    Main.cg = new ClassGraph();

    int r = aBusRoute.countWaitingPersons();

    String WPStrategy = "from BusRoute through BusStop to Person";

ObjectGraph: in UML Notation.

[Figure: UML object graph. Route1:BusRoute has busStops, a :BusStopList containing CentralSquare:BusStop, whose waiting :PersonList contains Paul:Person and Seema:Person. Route1 also has buses, a :BusList containing Bus15:Bus, whose passengers :PersonList contains Joan:Person and Eric:Person.]

ObjectGraphSlice.

[Figure: slice of the UML object graph over Route1:BusRoute, :BusStopList (busStops), CentralSquare:BusStop, :PersonList (waiting), Paul:Person, Seema:Person, :BusList (buses), Bus15:Bus, :PersonList (passengers), Joan:Person, and Eric:Person.]

Summary So Far.

Aspect-oriented software development helps to create software that is
– More flexible; supports easy adaptation to rapidly changing interfaces.
– Easier to understand and also shorter.
– Supports the Shy Programming Principle.

Institute for Complex Scientific Software

Institute Home Page:

http://www.icss.neu.edu/

What?

Problem driving institute:
– Complexity of building software systems to enable scientific research

Objective:
– Develop general methodologies for building complex scientific software using latest computer science research

Goals.

[Figure: diagram with the labels Applications, Computer Science, The Institute, Scientific Software Solutions, and New Methodologies.]

Applicable Computer Science Research.

– Aspect-Oriented Software Development
– Software Components
– Parallelism
– Domain Specific Languages
– Visualization
– Knowledge-Based Support Systems

Three Testbeds.

THEMATICS (M. Ondrechen; protein function from structure; high external visibility)
– Proc. Nat. Academy of Science publication
– Featured in popular scientific magazines: Nature, American Chemical Society, Science Daily

Subsurface Sensing and Imaging (many Institute participants from this area)

Parallel Geant4 (CERN; Cooperman, Reucroft and Swain; particle matter interaction; million line program)

Some Other Faculty Highlights.

Valentin Ilyin.
– Protein structure analysis: novel structural alignment method which produces high quality alignments.
– Visual analytical bioinformatics interface (Friend).

Roger Giese.
– The long term goal is to learn whether the measurement of DNA adducts in people can help to individualize cancer prevention, analogous to the measurement of cholesterol as a biomarker for risk of a heart attack.

Some Other Faculty Highlights.

Bob Futrelle.
– I'm particularly interested in the relations between bio-ontologies and text and diagrams.

Conclusions

Northeastern University and the Institute for Complex Scientific Software create knowledge of significant interest to bioinformatics.

Aspect-Oriented Software Development is a useful technology for the rapidly evolving area of bioinformatics.

The End

PathSet Algorithm

We have developed an efficient graph search algorithm that solves the following problem:

Input:
– Graph G1 = (V1, E1) with source s and target t.
– Graph G2 = (V2, E2) where V1 is a subset of V2.

Question: Does G2 contain a path that is an expansion of a path in G1 from s to t? (The algorithm works even if s and t are sets of nodes.)

Explanation.

Given a path p, a path p' is called an expansion of p if p' can be obtained by inserting one or more elements between elements of p.

More generally, we can find a third graph that succinctly represents all possible such paths in G2.

Do you see applications of such an algorithm in biology?
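The question can be answered with a simple reachability check, sketched below in Java. This is not the PathSet algorithm itself, only a baseline under stated assumptions: graphs are adjacency maps over node names, paths have at least one edge, and an expansion may insert zero or more nodes between consecutive elements. The class ExpansionCheck and the sample graphs (a class graph guessed from the bus route example) are hypothetical. The idea: a G1 edge (u, v) is realizable if v is reachable from u in G2, and an expansion exists iff t is reachable from s using realizable G1 edges only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ExpansionCheck {
    // True if `to` is reachable from `from` in g by a path with at least one edge.
    static boolean reachable(Map<String, List<String>> g, String from, String to) {
        Deque<String> work = new ArrayDeque<>(g.getOrDefault(from, List.of()));
        Set<String> seen = new HashSet<>(work);
        while (!work.isEmpty()) {
            String v = work.pop();
            if (v.equals(to)) return true;
            for (String w : g.getOrDefault(v, List.of())) {
                if (seen.add(w)) work.push(w);
            }
        }
        return false;
    }

    // True if G2 contains an expansion of some G1 path from s to t.
    static boolean hasExpansion(Map<String, List<String>> g1, Map<String, List<String>> g2,
                                String s, String t) {
        // Keep only the G1 edges (u, v) for which G2 has a path from u to v.
        Map<String, List<String>> realizable = new HashMap<>();
        for (Map.Entry<String, List<String>> e : g1.entrySet()) {
            for (String v : e.getValue()) {
                if (reachable(g2, e.getKey(), v)) {
                    realizable.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(v);
                }
            }
        }
        return reachable(realizable, s, t);
    }

    // G1: the "small" graph of important nodes (here, a traversal strategy).
    static Map<String, List<String>> strategyGraph() {
        return Map.of("BusRoute", List.of("BusStop"), "BusStop", List.of("Person"));
    }

    // G2: the "large" graph (here, an assumed class graph with list classes as noise).
    static Map<String, List<String>> classGraph() {
        return Map.of(
            "BusRoute", List.of("BusStopList", "BusList"),
            "BusStopList", List.of("BusStop"),
            "BusStop", List.of("PersonList"),
            "PersonList", List.of("Person"),
            "BusList", List.of("Bus"),
            "Bus", List.of("PersonList"));
    }

    public static void main(String[] args) {
        System.out.println(hasExpansion(strategyGraph(), classGraph(), "BusRoute", "Person"));
    }
}
```

This runs one search of G2 per edge of G1, so it never enumerates the G1 paths themselves; it answers only the yes/no question and does not build the third graph mentioned above.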

Motivation.

G1 is a “small” graph that lists “important” nodes.

G2 is a “large” graph in which we want to recognize paths that are expansions of paths in the “small” graph.

Expansions of paths may contain additional nodes that are “noise” nodes.

Notes

There is such a path in G2 iff the traversal graph of G1 and G2 is not empty.

G1 may have exponentially many paths from s to t.

Topic Switch.

Lessons From Manager Metaphor (Continued).

AOP is related to the Micromanager point through the observation that aspects should be loosely coupled to the base programs they modify. The aspect should not be brittle with respect to the detailed calling structure of the base program, in the same way as the manager should not rely on the details of the workers' projects. There is an intermediary, called glue code, that maps the aspect to the detailed usage context. AOP is call-graph shy.