Page 1

Artificial Intelligence and Large-Scope Science:

Workflow Planning and Beyond

Yolanda Gil

USC/Information Sciences Institute

[email protected]

www.isi.edu/~gil

In collaboration with others in the Intelligent Systems Division and the Center for Grid Technologies at USC/ISI including:

Ewa Deelman, Carl Kesselman, Jim Blythe

Supported in part by NSF’s GriPhyn and SCEC/CME projects, and by internal grants from USC/ISI


Page 2

Outline

Motivation
• Large-scope large-scale science
• Challenges and opportunities for Artificial Intelligence

Research on workflow planning at USC/ISI
• Using AI techniques in Pegasus to generate executable grid workflows

Future directions in support of scientific workflows
• Intelligent interactive assistance and automatic completion
• Active workflows
• Cognitive grids

Knowledge infrastructure for science
• Challenges in Community-Based Knowledge Capture and Representation

Page 3

The Southern California Earthquake Center’s Community Modeling Environment (SCEC-CME) (http://iowa.usc.edu/cmeportal/)

Page 4

Integrating Diverse Models of Complex Phenomena…

Fault models

Fault ruptures

Wave propagation

Historic records

Site response models

Effect on structures

Page 5

…for Broader Use

Geophysicists, civil and structural engineers, city planners, emergency managers, …

• Analyze seismic hazard
• Learn and understand seismic hazard

Of course, scientists need this infrastructure as well!

Page 6

Not Just Large-Scale and HPC Issues: Large-Scope Science and Engineering Research

“Whereas large-scale means increasing the resolution of the solution to a fixed physical model problem, large-scope means increasing the physical complexity of the model itself. Increasing the scope involves adding more physical realism to the simulation, making the actual code more complex and heterogeneous, while keeping the resolution more or less constant.”

-- Report from ACM Workshop on Strategic Directions in Computing Research, A. Sameh et al on Computational Science and Engineering, June 1996

Page 7

How This is Done Today

Scientists:
• Verbal communication needed to compose models
• When an earthquake occurs, hard to respond quickly

Other users (e.g., building engineers):
• Use models based on correlations of historical data
• Employ consultants that know how to set up these models
• Delay in accessing state-of-the-art scientific models

Page 8

Scientific Workflows

Models composed into end-to-end scientific workflows that model/analyze complex physical phenomena
• In-silico experimentation
• Data collection and analysis

Reproducibility, reusability, pedigree
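One way to picture models composed into an end-to-end workflow is as a dependency graph whose components run in an order that respects data flow. The sketch below is only illustrative: the component names are simplified from the hazard-curve example on this slide, and the dependency edges between them are assumptions, not the actual SCEC-CME wiring.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each component maps to the components whose
# outputs it consumes. The edges are illustrative, not taken from SCEC-CME.
workflow = {
    "utm_converter": set(),
    "velocity_at_point": {"utm_converter"},
    "basin_depth_calculator": {"utm_converter"},
    "ruptures": set(),
    "imr_sa_exceedance": {"velocity_at_point", "basin_depth_calculator", "ruptures"},
    "hazard_curve_calculator": {"imr_sa_exceedance"},
}


def execution_order(graph):
    """Return one valid execution order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
```

Recording the graph explicitly, rather than running the models by hand, is what makes a run reproducible and reusable: the same structure can be re-executed or inspected later.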

[Figure: example hazard-curve workflow. Components: Hazard Curve Calculator (SA vs. prob. exc.); IMR: SA exc. prob., Field (2000); Basin-Depth Calculator; UTM Converter (get-Lat-Long-given-UTM); CVM-get-Velocity-at-point; Ruptures (PEER-Fault, rfml). Inputs shown: Rupture(s), Site VS30, Site Basin-Depth-2.5, SA Period, Gaussian Truncation, Std. Dev. Type, Lat./Long., UTM, Gaussian Dist, No Truncation, Total Moment Rate, Duration-Year, Fault-Grid-Spacing, Rupture Offset, Mag-Length-sigma, Dip, Rake, Magnitude (min, max, mean). Task result: hazard curve, SA vs. prob. exc.]

Page 9

Executing Scientific Workflows on Grids

Grids support this process through middleware services:
• Seamless integration and management of resources (OGSA)
• Job submission (Condor)
• Resource monitoring: Monitoring and Discovery Service (MDS)
• Replica Location Service (RLS)
• Metadata Catalog Services (MCS)

[Figure, from [Kesselman 04]:
• Discovery: many sources of data, services, computation; registries (R) organize services of interest to a community
• Access: data integration activities may require access to, and exploration/analysis of, data at many locations
• Exploration and analysis may involve complex, multi-step workflows
• Resource management (RM) is needed to ensure progress and arbitrate competing demands
• Security and policy services must underlie access and management decisions]

Page 10

Application Development and Execution Process

[Figure: four levels, with the selection step that leads from each to the next.
• Application Domain: FFT
• Application Component Selection leads to the Abstract Workflow: FFT filea
• Resource Selection, Data Replica Selection, and Transformation Instance Selection lead to the Concrete Workflow: transfer filea from host1://home/filea to host2://home/file1, then /usr/local/bin/fft /home/file1
• Data Transfer leads to the Execution Environment: host1 and host2, each holding data
Failure recovery methods: Retry; Pick different resources; Specify a different workflow]
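The step from abstract to concrete workflow can be sketched in a few lines. This is a minimal illustration built on the FFT example from the slide; the two catalogs, the `AbstractJob` class, and the `concretize` function are hypothetical stand-ins for the grid services involved, and replica selection is deliberately naive (it takes the first copy).

```python
from dataclasses import dataclass


@dataclass
class AbstractJob:
    transformation: str   # logical component name, e.g. "FFT"
    input_file: str       # logical file name, e.g. "filea"


# Hypothetical catalogs standing in for grid services.
REPLICAS = {"filea": ["host1://home/filea"]}                # logical file -> physical copies
TRANSFORMATIONS = {"FFT": {"host2": "/usr/local/bin/fft"}}  # component -> host -> executable


def concretize(job, target_path="/home/file1"):
    """Map one abstract job to an ordered list of concrete steps."""
    # Naive resource selection: take the first host that offers the component.
    host, executable = next(iter(TRANSFORMATIONS[job.transformation].items()))
    # Naive replica selection: take the first known physical copy.
    source = REPLICAS[job.input_file][0]
    steps = []
    if not source.startswith(f"{host}://"):
        # The data lives elsewhere, so a transfer step is added first.
        steps.append(f"transfer {job.input_file} from {source} to {host}:/{target_path}")
    steps.append(f"{executable} {target_path}")
    return steps


steps = concretize(AbstractJob("FFT", "filea"))
```

For the slide's example this yields a transfer from host1 to host2 followed by the fft invocation on the staged file.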

Page 11

Challenges

Complexity: Many choices are involved as a workflow is composed
• Alternative application components, files, and locations
• Many different interdependencies may occur among components
• May reach many dead ends

Usability: Users should not need to be aware of infrastructure details
• Files are distributed, indexed, replicated
• Match application requirements to host capabilities

Solution cost: Evaluate the alternative solution costs
• Performance
• Reliability
• Resource usage

Global cost: Minimizing cost across organizations
• Individual user’s choices in light of other users’ choices

Reliability of execution: Job resubmission upon failure
• Detection, diagnosis, repair
• Anticipation and avoidance, resource reservations

Page 12

Challenges and opportunities for Artificial Intelligence

We need alternative foundations that offer
• expressive representations to capture the complex knowledge involved in both the application domain and the execution environment
• flexible reasoners to explore this complex space systematically and incorporate constraints, tradeoffs, policies

Many Artificial Intelligence (AI) techniques are relevant:
– Planning to achieve given requirements
– Searching through problem spaces of related choices
– Using and combining heuristics
– Reasoners that can incorporate rules, definitions, axioms, etc.
– Schedulers and resource allocation techniques
– Coordination and communication in distributed problem solving
– Expressive knowledge representation languages
– Reasoning under uncertainty
– Dynamic replanning and reactive control
– Learning in complex dynamic environments
– Learning to improve problem solving skills

Page 13

Outline

Motivation
• Scientific workflows
• Challenges and opportunities for Artificial Intelligence

Research on workflow planning at USC/ISI
• Using AI techniques in Pegasus to generate executable grid workflows

Future directions in support of scientific workflows
• Intelligent interactive assistance and automatic completion
• Active workflows
• Cognitive grids

Knowledge infrastructure for science
• Challenges in Community-Based Knowledge Capture and Representation

Page 14

Reasoning about Distributed Execution Infrastructure in Grids with Pegasus (work with J. Blythe, E. Deelman, C. Kesselman, and others)

[Figure: Pegasus architecture. Labels: Request Manager; Workflow Generation; Virtual Data Language (Chimera); Abstract Workflow; Workflow Planning; Workflow Reduction; Replica and Resource Selector; Data Management; Data Publication; Transformation Catalog; Concrete Workflow; Submission and Monitoring System; workflow executor (DAGMan); Execution on the Grid (tasks, monitoring information); Information and Models; Dynamic information (replica location, available resources) from the Globus Replica Location Service and the Globus Monitoring and Discovery Service; raw data from the detector.]

[Gil et al, IEEE IS 04]

Page 15

Pegasus: Using AI Planning Techniques to Generate Executable Grid Workflows

Given: desired result and constraints
• A desired result (high-level description of data product)
• A set of application components described in the grid
• A set of resources in the grid (dynamic, distributed)
• A set of constraints and preferences on solution quality

Find: an executable job workflow
• A configuration of components that generates the desired result
• A specification of resources where components can be executed and data can be stored
• A specification of data sources and data movements

Approach: Use AI planning techniques to search the solution space and evaluate tradeoffs
• Exploit heuristics to direct the search for solutions and represent optimality and policy criteria
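The "given / find" formulation above can be illustrated with a toy backward-chaining search. This is not the Pegasus planner; it is a minimal sketch that assumes components are described only by their input and output files, and that ignores resources, costs, and heuristics. All component and file names are hypothetical.

```python
# Hypothetical component descriptions: name -> (input files, output file).
COMPONENTS = {
    "extract": (["raw"], "spectrum"),
    "fft": (["raw"], "spectrum"),        # an alternative way to make "spectrum"
    "search": (["spectrum", "targets"], "candidates"),
}


def plan(goal, available, components=COMPONENTS, visiting=frozenset()):
    """Return an ordered list of components that produces `goal`, or None."""
    if goal in available:
        return []                        # data already exists; nothing to run
    if goal in visiting:
        return None                      # avoid circular dependencies
    for name, (inputs, output) in components.items():
        if output != goal:
            continue
        steps = []
        for needed in inputs:
            sub = plan(needed, available, components, visiting | {goal})
            if sub is None:
                break                    # dead end: backtrack to the next component
            steps += [s for s in sub if s not in steps]
        else:
            return steps + [name]        # every input could be produced
    return None


found = plan("candidates", {"raw", "targets"})
dead_end = plan("candidates", {"raw"})
```

A real planner would rank the alternatives (here "extract" versus "fft") with heuristics and solution-quality preferences instead of taking the first one that works.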

Page 16

Advantages of Using AI Planning

• Provides a broad-based, generic foundation
• Uses general techniques to search for solutions
• Explores alternatives, supports backtracking
• Incorporates domain-specific and domain-independent heuristics (as search control rules)
• Allows easy addition of new constraints and rules
• Incorporates optimality and policy into the search for solutions
• Interleaves decisions at various levels
• Can integrate the generation of workflows across users and policies within virtual orgs.

Page 17

Reasoning about Workflows in Pegasus

[Figure: an original workflow of data processing tasks (nodes a to i) and the final workflow derived from it for the desired results. Key: original node; input transfer node; registration node; output transfer node; unnecessary nodes.]
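The idea of dropping unnecessary nodes can be sketched as follows, under the assumption that a node is unnecessary when its output already exists (for example, because a replica catalog lists it) or when nothing needed depends on it. The job graph, file names, and `reduce_workflow` function are hypothetical; the figure's actual dependencies are not recoverable from the transcript.

```python
# Hypothetical workflow: job -> (parent jobs, output file).
JOBS = {
    "a": ([], "fa"),
    "b": (["a"], "fb"),
    "c": (["b"], "fc"),
    "d": (["a"], "fd"),
}


def reduce_workflow(jobs, desired, existing):
    """Return the set of jobs that still have to run to obtain `desired` files."""
    needed = set()
    # Start from the jobs that produce the desired results and walk backwards.
    stack = [job for job, (_, out) in jobs.items() if out in desired]
    while stack:
        job = stack.pop()
        parents, output = jobs[job]
        if job in needed or output in existing:
            continue                     # already kept, or its product can be reused
        needed.add(job)
        stack.extend(parents)            # its inputs must be produced as well
    return needed


kept = reduce_workflow(JOBS, desired={"fc", "fd"}, existing={"fb"})
```

Here job b is dropped because its output fb already exists, while a is still kept because d depends on it.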

Page 18

Pegasus Application Domains (work with E. Deelman and dozens of scientists)

Pulsar search for gravitational-wave physics (LIGO)
• 975 tasks, 1365 data transfers, 975 output files, 96 hrs runtime

Galaxy morphology for NVO and NASA in Montage

Tomography for neural structure reconstruction

High-energy physics – Compact Muon Solenoid
• 7 days, 678 jobs, produced ~200 GB

Gene alignment
• In 24 hours, ~10,000 Grid jobs, >200,000 BLAST executions, produced 50 GB

Page 19

Small Montage Workflow

~1200 nodes [Deelman et al, 04]

Page 20

Artemis: Integrating Distributed Info Sources on the Grid (work with E. Deelman, S. Thakkar, R. Tuchinda)

[Figure: Artemis architecture. Labels: User; Query Wizard; Ontology; Entity selection; Filters; Dynamic Model Generator; Models; Model mappings; Prometheus Query Mediator; Theseus query execution; Data Sources; Metadata Catalog Services.]

[Tuchinda et al, IAAI-04]

Page 21

Outline

Motivation
• Scientific workflows
• Challenges and opportunities for Artificial Intelligence

Research on workflow planning at USC/ISI
• Using AI techniques in Pegasus to generate executable grid workflows

Future directions in support of scientific workflows
• Intelligent interactive assistance and automatic completion
• Active workflows
• Cognitive grids

Knowledge infrastructure for science
• Challenges in Community-Based Knowledge Capture and Representation

Page 22

Scientific Workflows: Future Directions

Using AI to support the workflow creation process
• Interactive assistance and automatic completion

Using AI to support the scientific experimentation process
• Active workflows

Using AI to augment the execution infrastructure
• Cognitive grids

Page 23

The Process of Creating an Executable Workflow

1. Creating a valid workflow template (human guided)
• Selecting application components and connecting inputs and outputs
• Adding other steps for data conversions/transformations

2. Creating instantiated workflow
• Providing input data to pathway inputs (logical assignments)

3. Creating executable workflow (automatically)
• Given requirements of each model, find and assign adequate resources for each model
• Select physical locations for logical names
• Include data movement steps, including data deposition steps

[Diagram labels: User guided; Automated]
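The three stages above can be pictured as successive data structures, each adding bindings to the previous one. This is a minimal sketch under stated assumptions: the class names, the `instantiate` and `make_executable` functions, and the catalog shapes are all hypothetical, and the resource choice is deliberately naive (first candidate wins).

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowTemplate:
    steps: list     # component names, in execution order
    inputs: list    # names of the workflow inputs left unbound


@dataclass
class InstantiatedWorkflow:
    template: WorkflowTemplate
    bindings: dict = field(default_factory=dict)  # input name -> logical file name


def instantiate(template, bindings):
    """Stage 2: bind every template input to a logical file name."""
    missing = [name for name in template.inputs if name not in bindings]
    if missing:
        raise ValueError(f"unbound inputs: {missing}")
    return InstantiatedWorkflow(template, dict(bindings))


def make_executable(instance, replicas, hosts):
    """Stage 3: pick a physical location per logical name and a host per step."""
    locations = {name: replicas[lfn][0] for name, lfn in instance.bindings.items()}
    placement = {step: hosts[step][0] for step in instance.template.steps}
    return {"locations": locations, "placement": placement}


template = WorkflowTemplate(steps=["fft"], inputs=["signal"])
instance = instantiate(template, {"signal": "filea"})
executable = make_executable(
    instance,
    replicas={"filea": ["host1://home/filea"]},
    hosts={"fft": ["host2"]},
)
```

Keeping the stages separate means the same template can be instantiated with different data, and the same instance can be remapped when resources change.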

Page 24

Challenges for Interactive Composition of Valid Workflow Templates

Provide flexible interaction
• User can start from initial data, from data products, or steps
• User can specify abstract descriptions of steps and later specialize them
• User can reuse, merge, or build from scratch

Automatic tracking of workflow constraints
• User is notified if there are problems but does not have to keep track of details

Proactive assistance
• System should not just point out problems but help user by suggesting fixes (always)

And… how do we define what “valid” means?

Page 25

Assisting Users in Creating Workflow Templates (with J. Kim and M. Spraragen)

User interaction results in modifications to workflows
• Specify desired result, external/user provided input
• Add/remove step, add/remove link
• Specialize step (e.g., IMR -> IMR-SA)

As user creates a workflow, intermediate stages result in possibly incorrect workflows

ErrorScan algorithm detects errors and generates possible fixes
• Knowledge base that represents components and constraints
• Formal definitions of desirable properties of workflows based on AI planning techniques

Fixes are multi-step and “click-through”

Errors and fixes are ranked using heuristics

If no errors detected, workflow is guaranteed to be correct

[Kim et al, IUI-04] [Spraragen et al, 04]
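To give a flavor of error detection paired with suggested fixes, here is a toy checker. It is not the ErrorScan algorithm: it covers a single kind of error (a step whose required input nothing provides), and the knowledge base contents, component names, and `scan` function are hypothetical.

```python
# Hypothetical knowledge base: component -> (required input types, output type).
KB = {
    "RuptureGenerator": ([], "Ruptures"),
    "IMR-SA": (["Ruptures", "SiteVS30"], "SA-exc-prob"),
    "HazardCurveCalculator": (["SA-exc-prob"], "HazardCurve"),
}


def scan(steps, user_inputs, kb=KB):
    """Return (error, suggested fixes) pairs for a draft workflow."""
    # Everything the user supplies or some step in the draft produces.
    produced = set(user_inputs) | {kb[step][1] for step in steps}
    findings = []
    for step in steps:
        for needed in kb[step][0]:
            if needed in produced:
                continue
            # Suggest every component that could produce the missing input.
            fixes = [f"add step {c}" for c, (_, out) in kb.items() if out == needed]
            fixes.append(f"provide {needed} as user input")
            findings.append((f"{step} is missing input {needed}", fixes))
    return findings


draft_findings = scan(["IMR-SA", "HazardCurveCalculator"], user_inputs=[])
```

Each finding carries its candidate fixes, so an interface can offer them as one-click actions rather than only reporting the problem.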

Page 26

Scientific Workflows: Future Directions

Using AI to support the workflow creation process
• Interactive assistance and automatic completion

Using AI to support the scientific experimentation process
• Active workflows

Using AI to augment the execution infrastructure
• Cognitive grids

Page 27

Supporting the Interactive and Incremental Nature of Scientific Exploration (with M. Ellisman, E. Deelman, C. Kesselman)

Workflows cannot always be created in advance
• Experimental design depends on initial / partial results
• Scientific experimentation is often exploratory

Need to support interactive and incremental creation and execution of workflows

Active workflows: represent evolving workflows and are continually authored, refined, executed, and modified

Page 28

Supporting the Evolution of Active Workflows (I)

Page 29

Supporting the Evolution of Active Workflows (II)

Page 30

Supporting the Evolution of Active Workflows (and III)

Page 31

Scientific Workflows: Future Directions

Using AI to support the workflow creation process
• Interactive assistance and automatic completion

Using AI to support the scientific experimentation process
• Active workflows

Using AI to augment the execution infrastructure
• Cognitive grids

Page 32

Pervasive Knowledge Sources and Reasoners (work with J. Blythe, E. Deelman, C. Kesselman, H. Tangmurarunkit)

[Figure: a high-level specification of desired results, constraints, requirements, and user policies goes to a Workflow Manager with a Smart Workflow Pool and workflow histories. Intelligent Reasoners: Workflow Refinement; Workflow Repair; Resource Matching; Policy Management. Pervasive Knowledge Sources: Application KB; Resource KB; Policy KB; Other KB. Grid services: Resource Indexes; Replica Locators; Policy Information Services; other Grid services. Community Distributed Resources (e.g., computers, storage, network, simulation codes, data).]

[Gil et al, IEEE IS 04]

Page 33

Cognitive Grids: Pervasive Semantic Representations of the Environment at all Levels

[Figure: layers, from top to bottom:
• Users and Applications
• Intelligent Reasoners (matchmaking, refinement, repair, coordination, negotiation…)
• Higher-Level Services (Virtual Data Tools, Resource Brokers)
• Basic Grid Middleware (Globus Toolkit, Condor-G, DAGMan)
• Grid Resources (Compute, Data, Network)
Knowledge at all levels: Semantic Resource Descriptions; Resource Knowledge-bases; Application Component Models; Resource Policy Descriptions; User and VO policy models; Policy Knowledge-bases; Semantics for file-based data.
Information exchanged between layers: high-level request descriptions; refined workflow; tasks; current request status, results, provenance information; provenance and monitoring; monitoring and resources knowledge.]

Page 34

Cognitive Grids: Distributed Intelligent Reasoners that Incrementally Generate the Workflow

[Figure: workflow state plotted over time against levels of abstraction (application-level knowledge; logical tasks; tasks bound to resources and sent for execution). Stages: user’s request; relevant components; full abstract workflow; partial execution, with executed and not yet executed portions. Reasoners involved: workflow refinement; onto-based matchmaker; workflow repair; policy reasoner.]

Page 35

Many Opportunities for AI Techniques

The Grid Now

Syntax-based matchmaking of resources to job requirements
• Condor matchmaker
• Attribute based discovery and selection

Scheduling of jobs based on

Grid-able users that specify job execution sequences and computing requirements
• Scripting languages
• Workflow languages
• Task graphs

Explicit mappings from task to jobs, simple job brokers

Explicit service negotiation and recovery strategies

The Future Grid

Knowledge-based reasoning about resources enables
• Semantic matchmaking
• Aggregate resource reasoning

Task-level reasoning to plan and schedule jobs and resources
• More agility and coordination

Wide range of users can specify high level requirements in a mixed-initiative mode
• Mapping of high-level requirements to details required for execution

End-to-end resource negotiation and adaptive strategies to accommodate failure
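The difference between syntax-based and semantic matchmaking can be shown with a small sketch. It assumes a toy subclass hierarchy in place of a real ontology; the terms, the `SUBCLASS_OF` table, and both matching functions are hypothetical and do not reflect the Condor matchmaker's actual behavior or syntax.

```python
# Hypothetical ontology: term -> parent term (None at the root).
SUBCLASS_OF = {"RedHat9": "Linux", "Debian": "Linux", "Linux": "Unix", "Unix": None}


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or is one of its descendants."""
    while term is not None:
        if term == ancestor:
            return True
        term = SUBCLASS_OF.get(term)
    return False


def syntactic_match(resource, requirement):
    """Attribute values must be literally equal."""
    return all(resource.get(key) == value for key, value in requirement.items())


def semantic_match(resource, requirement):
    """Attribute values may be any specialization of the required term."""
    return all(
        key in resource and is_a(resource[key], value)
        for key, value in requirement.items()
    )


resource = {"os": "RedHat9"}
requirement = {"os": "Linux"}
literal = syntactic_match(resource, requirement)
by_meaning = semantic_match(resource, requirement)
```

The literal comparison rejects the RedHat9 machine for a job that asks for Linux, while the knowledge-based comparison accepts it because the hierarchy says RedHat9 is a kind of Linux.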

Page 36

Outline

Motivation
• Scientific workflows
• Challenges and opportunities for Artificial Intelligence

Research on workflow planning at USC/ISI
• Using AI techniques in Pegasus to generate executable grid workflows

Future research in support of scientific workflows
• Intelligent interactive assistance and automatic completion
• Active workflows
• Cognitive grids

Knowledge infrastructure for science
• Challenges in Community-Based Knowledge Capture and Representation

Page 37

Knowledge Infrastructure for Science: Challenges in Community-Based Knowledge Capture & Representation

1. be a community-wide effort
2. have community-wide acceptance
3. be used in practice on a daily basis to compose simulation code and annotate their results

Page 38

Scientists Ask Lots of Questions, Knowledge Representation has few Answers

• How do you get started?
• How to ensure the community will accept it (use it)?
• How do you (can you?) represent alternative views?
• What is the process to contribute to it?
• What is the process to make changes to it?
• What is the impact to my application when there is an update?
• How is it implemented?
• How is it managed?
• Who does what, when, where, why?

Page 39

SCEC/GO Workshop on Ontology Development: Lessons Learned and Prospects [Bada et al, forthcoming]

SCEC learns from the Gene Ontology (GO) experience (Workshop Nov’02, Cambridge UK):
• Had a successful jumpstart
• Done by biologists, not knowledge engineers
• Developed by a wide, distributed community
• Focused on specific aspects of genomics
  – Fly-base, yeast, mouse
• Used 24/7 from day 1
• Accepted widely by the community
• Extended based on use requirements of a wide community
• Quite large (13K terms)
• Simple (and messy) representation
• Simple infrastructure
• Process to accommodate changes, curation

Page 40

Some Policies for Organizing Contributions

Curated by knowledge engineers: processes changes requested by users
• http://www.ecocyc.org

Curated by domain experts: group of domain curators processes changes requested by users
• http://www.geneontology.org

Open contributions: any user can add content
• http://www.dmoz.org, http://www.openmind.org

Open editing: any user can edit and create any page on a web site.
• http://wiki.org

Page 41

Broad Range of Contributors of Scientific Knowledge (with T. Chklovski)

[Figure: a spectrum of contributors.
• One end: more inexpensive, more inaccurate, more ambiguous, deeper into society/impact
• Other end (formal statements such as <subclassOf foton …): more expensive, more accurate, more concrete, deeper into the science]

Page 42

Thank you!

Scientific workflows
• pegasus.isi.edu

Cognitive grids
• www.isi.edu/ikcap/cognitive-grids

AI and science
• IEEE Intelligent Systems Jan/Feb 2004, De Roure, Gil, Hendler (Eds), Special issue on e-Science

www.isi.edu/~gil

Page 43

“As We May Think”

“Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them […]. The lawyer has at his touch the associated opinions and decisions of his whole experience, and of the experience of friends and authorities. The patent attorney has on call the millions of issued patents, with familiar trails to every point of his client's interest. […] The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior. […]

There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record. The inheritance from the master becomes, not only his additions to the world's record, but for his disciples the entire scaffolding by which [their additions] were erected.”

--- Vannevar Bush, 1945

http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm

Page 44

Searching for Pulsars with the Pegasus Planner

Used AI planning techniques to compose executable grid workflows with hundreds of jobs

Laser Interferometer Gravitational-Wave Observatory (LIGO) data; LIGO aims to detect the gravitational waves predicted by Einstein’s theory of relativity

Used LIGO’s data collected during the first scientific run of the instruments in Fall 2002

Targeted a set of 1000 locations of known pulsars as well as random locations in the sky

Performed using compute and storage resources at Caltech, University of Southern California, and University of Wisconsin Milwaukee.