Science Cloud
Paul Watson, Newcastle University, UK, [email protected]
Research Challenge
Understanding the brain is the greatest informatics challenge
• Enormous implications for science:
• Medicine
• Biology
• Computer Science
Collecting the Evidence
100,000 neuroscientists generate huge quantities of data:
– molecular (genomic/proteomic)
– neurophysiological (time-series activity)
– anatomical (spatial)
– behavioural
Neuroinformatics Problems
• Data is:
– expensive to collect but rarely shared
– in proprietary formats & locally described
• The result is:
– a shortage of analysis techniques that can be applied across neuronal systems
– limited interaction between research centres with complementary expertise
Data in Science
• Bowker’s “Standard Scientific Model”
1. Collect data
2. Publish papers
3. Gradually lose the original data
The New Knowledge Economy & Science & Technology Policy, G.C. Bowker
• Problems:
– papers often draw conclusions from data that is not published
– inability to replicate experiments
– data cannot be re-used
Codes in Science
• Three stages for codes
1. Write code and apply to data
2. Publish papers
3. Gradually lose the original codes
• Problems:
– papers often draw conclusions from codes that are not published
– inability to replicate experiments
– codes cannot be re-used
Plan
• Neuroinformatics - a challenging e-science application
• CARMEN – addressing the challenges
• Cloud Computing for e-science
– Lessons we’ve learnt
• The Promise of Commercial Clouds
cracking the neural code
neurone 1
neurone 2
neurone 3
raw voltage signal data typically collected using single or multi-electrode array recording
Focus on Neural Activity
Epilepsy Exemplar
Data analysis guides surgeon during operation
Further analysis provides evidence
WARNING! The next 2 slides show an exposed human brain
CARMEN
enables sharing and collaborative exploitation of data, analysis code and expertise that are not physically collocated
CARMEN Project
Stirling
St. Andrews
Newcastle
York
Sheffield
Cambridge
Imperial
Plymouth
Warwick
Leicester
Manchester
UK EPSRC e-Science Pilot
$7M (2006-10)
20 Investigators
Industry & Associates
CARMEN e-Science Requirements
• Store
– very large quantities of data (100TB+)
• Analyse
– suite of neuroinformatics services
– support data intensive analysis
• Automate
– workflow
• Share
– under user-control
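The four requirements above can be sketched as a toy, in-memory platform. This is only an illustration of how Store, Analyse, Automate and Share fit together; the class `SciencePlatform`, its method names, the two example services and the sample values are all hypothetical and are not taken from CARMEN.

```python
class SciencePlatform:
    """Toy in-memory stand-in for a Store-Analyse-Automate-Share platform (hypothetical)."""

    def __init__(self):
        self._data = {}      # item id -> (owner, value)
        self._readers = {}   # item id -> users allowed to read it
        self._services = {}  # service name -> analysis function

    def store(self, owner, item_id, value):
        """Store: keep the data; only the owner can read it at first."""
        self._data[item_id] = (owner, value)
        self._readers[item_id] = {owner}

    def share(self, owner, item_id, with_user):
        """Share: only the owner may grant read access to another user."""
        if self._data[item_id][0] != owner:
            raise PermissionError("only the owner can share this item")
        self._readers[item_id].add(with_user)

    def read(self, user, item_id):
        """Return the stored value if the user has been given access."""
        if user not in self._readers[item_id]:
            raise PermissionError(f"{user} may not read {item_id}")
        return self._data[item_id][1]

    def register_service(self, name, func):
        """Analyse: add an analysis service to the platform."""
        self._services[name] = func

    def run_workflow(self, user, item_id, steps, result_id):
        """Automate: apply the named services in order and store the result."""
        value = self.read(user, item_id)
        for name in steps:
            value = self._services[name](value)
        self.store(user, result_id, value)
        return value


platform = SciencePlatform()
platform.store("alice", "recording", [1.0, 2.0, 3.0])  # made-up samples
platform.register_service("remove_mean", lambda xs: [x - sum(xs) / len(xs) for x in xs])
platform.register_service("peak", lambda xs: max(abs(x) for x in xs))

peak = platform.run_workflow("alice", "recording", ["remove_mean", "peak"], "peak_value")
platform.share("alice", "peak_value", "bob")  # bob can read the result, not the raw data
```

Sharing here is per item and granted only by the owner, which is one simple reading of "under user-control"; a real platform would need persistent storage, groups and revocation.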
Background: North East Regional e-Science Centre
• 25 Research Projects across many domains:
– Bioinformatics, Ageing & Health, Neuroscience, Chemical Engineering, Transport, Geomatics, Video Archives, Artistic Performance Analysis, Computer Performance Analysis, ....
• Same key needs:
Store
Analyse
Automate
Share
Result: e-Science Central
• Integrated Store-Analyse-Automate-Share infrastructure
• Web-based
• Generic
– CARMEN neuroinformatics & chemistry as pilots
Science Cloud Architecture
Data storage
and
analysis
Access over Internet
(typically via browser)
Upload data &
services
Run analyses
Cloud Services Continuum (based on Robert Anderson)
Platform (PaaS)
Infrastructure (IaaS)
Software (SaaS)
Google Apps
Google AppEngine
Amazon EC2 & S3
http://et.cairene.net/2008/07/03/cloud-services-continuum/
Microsoft Azure
Salesforce.com
Science Cloud Options
Cloud Infrastructure: Storage & Compute
Science App 1 .... Science App n
Cloud Infrastructure: Storage & Compute
Science Platform
Science App 1 .... Science App n
Users
Service Developers
CARMEN Cloud
Filestore with Pattern Search
Database
Metadata
Service Repository
Processing
Workflow Enactment
Workflow
Security
Browsers & Rich Clients
Editing and Running a Workflow on the Web
Viewing the output of Workflow Runs
Workflow
Result File
Viewing results
Blogs and links
Communicating Results
Linking to results & workflows
What we learnt: Moving into a Cloud
• Moving existing technologies into a cloud can be difficult
– some can’t run in a Cloud at all
Raw Data Exploration with Signal Data Explorer
What we learnt: Scalability
• Clouds offer the potential for scalability
– grab compute power only when needed
• But developers have to write scalable code
– for Infrastructure as a Service Clouds
Dynasoar: Dynamic Deployment
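[Diagram: Consumer (C), Web Service Provider (WSP), Host Provider (nodes 1 to n; node 1 hosts s2 and s5, node n hosts s2) and Service Repository (SR), with request (req) and response (res) arrows and numbered steps 1 to 3; step 2 is "service fetch & deploy".]
A request to s4
The deployed service remains in place and can be re-used - unlike job scheduling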
Dynasoar
[Diagram: Consumer (C), Web Service Provider (WSP) and Host Provider (nodes 1 to n; node 1 hosts s2 and s5, node n hosts s2), with request (req) and response (res) arrows.]
A request for s2 is routed to an existing deployment of the service
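The two Dynasoar slides describe one routing rule: use an existing deployment if there is one, otherwise fetch the service from the repository, deploy it, and leave it in place. A minimal sketch of that rule follows. It is not Dynasoar's actual code or API; the names `Node`, `HostProvider`, `handle` and `fetches`, and the choice to deploy on the node hosting the fewest services, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A host-provider node and the services currently deployed on it."""
    name: str
    deployed: set = field(default_factory=set)


class HostProvider:
    """Routes requests to nodes, deploying services on demand (hypothetical sketch)."""

    def __init__(self, nodes, repository):
        self.nodes = nodes
        self.repository = repository  # service name -> callable implementing it
        self.fetches = 0              # number of fetch-and-deploy operations so far

    def handle(self, service, payload):
        # Prefer a node that already has the service deployed.
        node = next((n for n in self.nodes if service in n.deployed), None)
        if node is None:
            if service not in self.repository:
                raise KeyError(f"unknown service: {service}")
            # Assumed placement policy: the node hosting the fewest services.
            node = min(self.nodes, key=lambda n: len(n.deployed))
            node.deployed.add(service)  # the deployment stays for later requests
            self.fetches += 1
        return node.name, self.repository[service](payload)


# Node contents mirror the slides: node 1 hosts s2 and s5, node n hosts s2.
nodes = [Node("node 1", {"s2", "s5"}), Node("node 2"), Node("node n", {"s2"})]
repository = {name: (lambda x, name=name: f"{name}({x})") for name in ("s2", "s4", "s5")}
provider = HostProvider(nodes, repository)

first = provider.handle("s4", 1)   # not deployed anywhere: fetched and deployed
second = provider.handle("s4", 2)  # reuses the deployment made by the first call
third = provider.handle("s2", 3)   # already deployed: routed without a fetch
```

Only the first request for s4 pays the deployment cost, which is the contrast with job scheduling drawn on the slide.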
Adaptive Dynamic Deployment with Dynasoar
[Chart: response time (seconds, axis 0 to 450) and processors in pool (axis 0 to 18), plotted against arrival rate (messages per second, 0.03 to 1).]
Adding Processors as you need them optimises resources and saves money in pay-as-you-go clouds
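One simple way to add processors only as they are needed is a threshold rule on the observed response time. The sketch below assumes such a rule; it is not the policy used in the Dynasoar experiments. The function `adjust_pool`, the target of 100 seconds, the pool limit of 16 and the response times fed in are all made-up values for illustration.

```python
def adjust_pool(current, response_time, target, max_pool, min_pool=1):
    """Return the new pool size for one control step (assumed threshold policy).

    Grow by one processor when the response time exceeds the target,
    shrink by one when it is below half the target, otherwise hold steady.
    """
    if response_time > target and current < max_pool:
        return current + 1
    if response_time < target / 2 and current > min_pool:
        return current - 1
    return current


pool = 1
history = []
for observed in [50, 150, 180, 120, 90, 40]:  # made-up response times in seconds
    pool = adjust_pool(pool, observed, target=100, max_pool=16)
    history.append(pool)
```

With a fixed pool, `max_pool` is a hard ceiling: once it is reached, response time can only grow with load. In a pay-as-you-go cloud the ceiling becomes a budget decision rather than a hardware limit.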
Commercial pay-as-you-go clouds would allow us to avoid this limit
Hot Off the Press..
• Recent experiments with Microsoft Azure Cloud
– running Chemical analyses
– Silverlight UI
Thanks to:
- Paul Appleby & Team at the Microsoft Technology Centre, Reading
- & MS e-Science Group
Microsoft Azure Cloud for e-Science Demo
Why are Commercial Clouds Important: Before
Research
1. Have good idea
2. Write proposal
3. Wait 6 months
4. If successful, wait 3 months
5. Install Computers
6. Start Work
Science Start-ups
1. Have good idea
2. Write Business Plan
3. Ask VCs to fund
4. If successful..
5. Install Computers
6. Start Work
Why Use Commercial Clouds:
1. Have good idea
2. Grab nodes from Cloud provider
3. Start Work
4. Pay for what you used
• also scalability, cost, sustainability
Commercial Clouds to the Rescue?
• Focus currently on infrastructure as a service
• But, this is only part of the stack
• Can we have pay-as-you-go Science Cloud Platforms?
A Sustainable Science Cloud
Science Platform as a Service
Science App 1 .... Science App n
Commercial Clouds
?
?
Problem: delivering the e-science platform
www.inkspotscience.com
e-Science Central
Cloud Infrastructure: Storage & Compute
Summary: e-Science Central & CARMEN
Software as a Service
Cloud Computing
Social Networking
e-Science Central / CARMEN
• Dynamic Resource Allocation
• Pay-as-you-Go*
• Web based
• Works anywhere
• Controlled Sharing
• Collaboration
• Communities
Summary
• e-Science Central
– Store-Analyse-Automate-Share e-science platform
– Adding content from a range of domains
• CARMEN is piloting this approach for neuroinformatics
• Cloud computing can revolutionise e-science
– reduce time from idea to realisation