
Cloud Computing for e-Science with CARMEN

Paul Watson, Newcastle University

e-Science

“e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it”

John Taylor

Former Director General of the UK Research Councils

Two Strands to this Talk...

Research Challenge

Understanding the brain is the greatest informatics challenge

• Enormous implications for science:

• Medicine

• Biology

• Computer Science

Collecting the Evidence

100,000 neuroscientists generate huge quantities of data

– molecular (genomic/proteomic)
– neurophysiological (time-series activity)
– anatomical (spatial)
– behavioural

Neuroinformatics Problems

• Data is:
– expensive to collect but rarely shared
– in proprietary formats & locally described

• The result is:
– a shortage of analysis techniques that can be applied across neuronal systems
– limited interaction between research centres with complementary expertise

Data in Science

Bowker’s “Standard Scientific Model”

1. Collect data

2. Publish papers

3. Gradually lose the original data

The New Knowledge Economy & Science & Technology Policy, G.C. Bowker

Problems:
– papers often draw conclusions from data that is not published
– inability to replicate experiments
– data cannot be re-used

Codes in Science

Three stages for codes

1. Write code and apply to data

2. Publish papers

3. Gradually lose the original codes

Problems:

– papers often draw conclusions from codes that are not published

– inability to replicate experiments

– codes cannot be re-used

CARMEN

enables sharing and collaborative exploitation of data, analysis code and expertise that are not physically collocated

CARMEN Project

UK EPSRC e-Science Pilot

£5M (2006-10)

20 Investigators

Stirling

St. Andrews

Newcastle

York

Sheffield

Cambridge

Imperial

Plymouth

Warwick

Leicester

Manchester

Newcastle: Colin Ingram, Paul Watson, Stuart Baker, Marcus Kaiser, Phil Lord, Evelyne Sernagor, Tom Smulders, Miles Whittington

York: Jim Austin, Tom Jackson

Stirling: Leslie Smith

Plymouth: Roman Borisyuk

Cambridge: Stephen Eglen

Warwick: Jianfeng Feng

Sheffield: Kevin Gurney, Paul Overton

Manchester: Stefano Panzeri

Leicester: Rodrigo Quian Quiroga

Imperial: Simon Schultz

St. Andrews: Anne Smith

CARMEN Consortium

Industry & Associates

Cracking the Neural Code

[Figure: raw voltage signal traces for three neurones; such data is typically collected using single or multi-electrode array recording.]

Focus on Neural Activity

Epilepsy Exemplar

Data analysis guides the surgeon removing brain tissue

WARNING!

The next 2 Slides show an exposed brain

Epilepsy Exemplar

Recording from removed tissue (up to 20 GB/h)

On-line analysis by distributed collaborators will enable the experiment to be defined during data collection

The repository will enable integration of rare case types from different labs

Advances in Treatment

Data analysis guides the surgeon removing brain tissue

e-Science Requirements Summary

• Sharing
– data
– code

• Capacity
– vast data storage (100TB+ in CARMEN)
– support for data-intensive analysis

CARMEN Cloud Architecture

[Diagram: data storage and analysis sit in the cloud; users access it over the Internet (typically via a browser), upload data & services, and run analyses.]

e-Science Cloud Services

• Amazon (& Google) offer cloud computing
– basic storage & compute services
– e.g. Amazon S3 & EC2

• e-Science needs a set of higher-level services to support user needs

• Which services? ....
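As a rough illustration of what "basic storage & compute" means in practice, here is a minimal sketch using today's boto3 SDK for Amazon S3 (not part of CARMEN; the bucket and file names are invented). Raw bytes go in and out, but search, annotation, sharing and analysis all have to be built as higher-level services on top.

```python
# Minimal sketch of the "basic" storage level that S3 offers: put/get of opaque
# objects, with no notion of experiments, metadata, provenance or sharing.
# Bucket and file names are hypothetical.
import boto3

s3 = boto3.client("s3")

# Upload a raw recording file as an opaque object.
s3.upload_file("recording_001.dat", "carmen-example-bucket", "raw/recording_001.dat")

# Download it again; anything richer (search, annotation, access control,
# analysis services) must be layered above this interface.
s3.download_file("carmen-example-bucket", "raw/recording_001.dat", "copy_of_recording.dat")
```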

CARMEN Cloud (CAIRN)

[Architecture diagram: web portals and rich clients reach the system through a security layer; behind it sit a workflow enactment engine, a registry & service repository (search for data & analysis code), a raw & derived data store, a structured metadata store enabling search & annotation, an analysis code store, and a compute cluster on which services are dynamically deployed.]

Dynasoar

• Code Repository and Deployment
– long-term storage

• Code factored as Web Services
– standard (WS-I) interface
– internals not important: Java, MatLab, C, C#, C++, ...

• Deployers for a variety of service types
– .war files (Tomcat), Virtual Machines (VMWare, Virtual PC), .NET assemblies, database stored procedures

Dynasoar: Dynamic Deployment

[Diagram: (1) a consumer (C) sends a request for s4 to the Web Service Provider (WSP); (2) the service is fetched from the Service Repository and deployed onto a free node at the Host Provider; (3) the request is processed there and the response returned. The deployed service remains in place and can be re-used - unlike job scheduling.]

Dynasoar

[Diagram: a subsequent request for s2 is routed by the Web Service Provider to the existing deployment of that service at the Host Provider, with no re-deployment needed.]
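To make the routing logic on these two slides concrete, here is an illustrative Python sketch. All class and method names are invented for this example; the real Dynasoar operates on WS-I web services and uses the deployers listed above.

```python
# Illustrative sketch only: deploy-on-first-request, then re-use the deployment.

class Node:
    """A compute node that can host deployed service code."""
    def __init__(self, name):
        self.name = name
        self.services = {}                      # service name -> callable

    def deploy(self, service, code):
        self.services[service] = code

    def invoke(self, service, request):
        return self.services[service](request)


class ServiceRepository:
    """Long-term store of analysis code, fetched on demand."""
    def __init__(self, codes):
        self.codes = codes                      # service name -> callable

    def fetch(self, service):
        return self.codes[service]


class HostProvider:
    """Routes requests, deploying a service on first use and re-using it after."""
    def __init__(self, repository, nodes):
        self.repository = repository
        self.nodes = nodes
        self.deployments = {}                   # service name -> Node

    def handle(self, service, request):
        node = self.deployments.get(service)
        if node is None:
            # First request: fetch the code from the repository, deploy it on a node.
            code = self.repository.fetch(service)
            node = self.nodes[len(self.deployments) % len(self.nodes)]
            node.deploy(service, code)
            self.deployments[service] = node
        # The deployment remains in place - unlike job scheduling - so later
        # requests for the same service go straight to the existing copy.
        return node.invoke(service, request)


if __name__ == "__main__":
    repo = ServiceRepository({"s2": lambda r: f"s2 processed {r}"})
    provider = HostProvider(repo, [Node("node1"), Node("node2")])
    print(provider.handle("s2", "spike data"))   # deploys s2, then runs it
    print(provider.handle("s2", "more data"))    # re-uses the existing deployment
```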

Performance Gains

[Diagram: a consumer request is served by an Analysis Service, which in turn issues requests to a Database Service; dynamic deployment lets the analysis code be placed close to the data it uses.]

Scalability

[Chart: response time (seconds) and number of processors in the pool, plotted against arrival rate (0.03 to 1 messages per second).]
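The chart summarises how the processor pool tracks the arrival rate. A hedged sketch of that kind of scaling policy, with invented thresholds and not taken from the CARMEN code, might look like this:

```python
# Sketch of the scaling idea: grow the processor pool when measured response
# time exceeds a target, so response time stays bounded as load rises.

TARGET_RESPONSE_S = 60.0        # acceptable response time (hypothetical)

def rescale(pool_size, observed_response_s, max_pool=18):
    """Return a new pool size given the observed average response time."""
    if observed_response_s > TARGET_RESPONSE_S and pool_size < max_pool:
        return pool_size + 1    # add a processor to absorb the extra load
    return pool_size

# Example: as arrival rate (and hence response time) climbs, the pool grows.
pool = 2
for observed in [20.0, 45.0, 80.0, 120.0, 75.0, 40.0]:
    pool = rescale(pool, observed)
    print(f"response {observed:5.1f}s -> pool size {pool}")
```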

CARMEN Cloud (CAIRN)

[Architecture diagram, as before, annotated with the role of each component: registry & service repository - search for data & analysis code; rich clients - raw signal data search & visualisation; workflow enactment engine - enactment of scientific analysis processes; raw & derived data store; security policies controlling access to data & code; structured metadata store enabling search & annotation; analysis code store; compute cluster on which services are dynamically deployed.]

Controlled Sharing

[Diagram: a scientist's sharing policy evolves over time: "Only I am allowed to see this data", then "My collaborators can now see it", then "Everyone can see it".]

Security Solution

• XACML – standard way to encode rules as (subject, action, resource) triples

• Rules checked on each access
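As an illustration only, checking (subject, action, resource) rules on each access can be sketched in Python as below; CARMEN itself expresses these rules in XACML and evaluates them with a policy engine, and the subjects, groups and resources here are invented.

```python
# Sketch of rule checking as (subject, action, resource) triples, deny by default.

rules = [
    ("alice",         "read", "dataset/retina-001"),   # only the owner...
    ("collaborators", "read", "dataset/retina-001"),   # ...then her collaborators
]

groups = {"collaborators": {"bob", "carol"}}

def is_permitted(subject, action, resource):
    """Check each access request against the rule set (deny unless a rule matches)."""
    for rule_subject, rule_action, rule_resource in rules:
        members = groups.get(rule_subject, {rule_subject})
        if subject in members and action == rule_action and resource == rule_resource:
            return True
    return False

print(is_permitted("bob",  "read", "dataset/retina-001"))   # True  (collaborator)
print(is_permitted("dave", "read", "dataset/retina-001"))   # False (no matching rule)
```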

Controlled Sharing - conflicts

[Diagram: the scientist's policies ("Only I am allowed to see this data", "My collaborators can now see it") clash with the funder's policy: "All data must be accessible to everyone after the end of the project".]

Addressing Conflicts

• Each party expresses policy as XACML rules
• Rules are converted to a formal language: XACML -> VDM++
• Run the formal model to detect conflicts (see the sketch below)
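A greatly simplified illustration of the conflict-detection idea, written in Python rather than VDM++ and not the CARMEN formal model; names and dates are invented. A conflict is a request that one party's policy permits and the other's denies.

```python
# Each policy returns "permit", "deny", or None (no opinion) for a request.
from datetime import date

PROJECT_END = date(2010, 3, 31)   # hypothetical project end date

def scientist_policy(subject, when):
    # "Only I am allowed to see this data."
    return "permit" if subject == "alice" else "deny"

def funder_policy(subject, when):
    # "All data must be accessible to everyone after the end of the project."
    return "permit" if when > PROJECT_END else None   # silent before then

def find_conflicts(subjects, when):
    """Return the requests on which the two policies give opposite decisions."""
    conflicts = []
    for s in subjects:
        a, b = scientist_policy(s, when), funder_policy(s, when)
        if a is not None and b is not None and a != b:
            conflicts.append((s, when, {"scientist": a, "funder": b}))
    return conflicts

# After the project ends, the funder permits what the scientist still denies.
print(find_conflicts(["alice", "bob"], date(2011, 1, 1)))
```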

CARMEN CAIRN

[Architecture diagram labelled with the technologies behind each component: OMII Grimoire (registry), DAME Signal Data Explorer (rich clients), OMII/myGrid Taverna (workflow enactment engine), OGSA-DAI, SRB and DAME (data store), GOLD role & task based security, myGrid & CISBAN (metadata), and Dynasoar (dynamic service deployment).]

Using CARMEN for a typical scenario

1. Data Collection from a Multi-Electrode Array
2. Data Visualisation and Exploration
3. Spike Detection
4. Spike Sorting
5. Analysis
6. Visualisation of Analysis Results

Currently, this is a semi-manual process

CARMEN has automated this….
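As a rough sketch of what is being automated, the pipeline can be written in plain Python. The data here is synthetic and the threshold detection, amplitude-based sorting and firing-rate analysis are deliberately naive stand-ins; in CARMEN the steps are web services chained by a Taverna workflow.

```python
# Sketch of the detect -> sort -> analyse pipeline on a synthetic signal.
import random

random.seed(1)
SAMPLE_RATE_HZ = 10_000

# 1. "Raw voltage signal": noise with a few injected spikes of two sizes.
signal = [random.gauss(0.0, 0.05) for _ in range(SAMPLE_RATE_HZ)]
for t, amp in [(1000, 0.9), (4000, 0.5), (7000, 0.95), (9000, 0.55)]:
    signal[t] += amp

def detect_spikes(sig, threshold=0.4):
    """2. Spike detection: indices where the signal crosses a threshold."""
    return [i for i, v in enumerate(sig) if v > threshold]

def sort_spikes(sig, spike_times):
    """3. Spike sorting (crude): split spikes into units by peak amplitude."""
    units = {"unit_A": [], "unit_B": []}
    for t in spike_times:
        units["unit_A" if sig[t] > 0.7 else "unit_B"].append(t)
    return units

def firing_rates(units, duration_s):
    """4. Analysis: mean firing rate per sorted unit, in spikes per second."""
    return {u: len(times) / duration_s for u, times in units.items()}

spikes = detect_spikes(signal)
units = sort_spikes(signal, spikes)
print(firing_rates(units, duration_s=1.0))   # 5. Results for visualisation
```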

Web Portal

Raw Data Exploration with Signal Data Explorer

Defining the process with Workflow

Running a Workflow

[Diagram: an external client drives the run through the security layer; the Taverna workflow engine queries the registry/repository for available services, takes INPUT data from the SRB file system and an RDBMS, invokes the Spike Sorting and Reporting services (dynamically deployed in Dynasoar), and stores the OUTPUT metadata.]

Running the Workflow

Graphical Output

Movie Output

CARMEN (www.carmen.org.uk)

• is delivering an e-Science infrastructure that can be applied across a diverse range of applications

• uses a Cloud / Software as a Service architecture
• enables cooperation and interdisciplinary working
• aims to deliver new results in neuroscience, computer science and medicine