Microsoft Research: Computational Ecology and Environmental Science Group
http://research.microsoft.com/en-us/groups/ecology/
Manual Measurement
Automated Measurement
Sample Collection
Historical Photographs
Counting
Ubiquitous
Motes
Aircraft Surveys
Model Output
Typing
Monitoring
Collation
Quality assurance
Aggregation
Analysis
Reporting
Forecasting
Distribution
Done poorly, but a few notable counter-examples
Done poorly to moderately, not easy to find
Sometimes done well, generally discoverable and available, but could be improved
Integration
(I. Zaslavsky & CSIRO, BOM, WMO)
Data-intensive Science
Data acquisition & modelling
Collaboration and visualisation
Analysis & data mining
Dissemination & sharing
Archiving and preserving
fourthparadigm.org
Complex shared detector / Simple instrument (if any)
Complex and heavy process by experts / Ad hoc observations and models
KB
GB
TB
PB
Science happens when PBs, TBs, GBs, and KBs can be mashed up simply
Provenance and trust vary widely
Data acquisition, early processing, and reporting range from a large government agency to individual scientists.
Smaller data often passed around in email; big data downloads can take days (if they finish at all)
Data sharing concerns and patterns vary
Open access followed by (non-repeatable and tedious) pre-processing
True science-ready data sets, but with concerns about misuse and misunderstanding, particularly for hard-won data.
Computational tools differ. Not everyone can get an account at a supercomputer center
Very large computations require engineering (error handling)
Space and time aren’t always simple dimensions
Getting what you need, when you need it
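The point about very large computations needing engineering for error handling can be illustrated with a retry loop. This is a minimal sketch, not taken from the talk: `run_with_retries`, `make_flaky_task`, and all default values are hypothetical names chosen for illustration, and exponential backoff is just one common way to handle transient failures.

```python
import time


def run_with_retries(task, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_task(failures_before_success):
    """Build a hypothetical task that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise IOError("transient failure (simulated)")
        return state["calls"]  # number of calls it took to succeed

    return task
```

Passing `sleep` as a parameter keeps the waiting behaviour replaceable, so the loop can be exercised without real delays.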
Cloud computing is good for…
http://github.com/windowsazure
Customer Data Center
http://fetchclimate2.cloudapp.net/
Data Marketplaces
Web search: “open weather data azure”
http://weatherservice.cloudapp.net
http://research.microsoft.com/en-us/projects/azure/technical-papers.aspx
http://aka.ms/dm0
http://research.microsoft.com/projects/msrceesdm/
Windows Azure for Research Group
@azure4research
www.azure4research.com
MODIS Azure: Computing Evapotranspiration (ET) in the Cloud
A pipeline for download, processing, and reduction of diverse NASA MODIS satellite imagery.
Catharine van Ingen (Microsoft Research), Jie Li, Marty Humphrey (UVA), Youngryel Ryu (UCB), Deb Agarwal (BWC/LBL), Keith Jackson
(BL), Jay Borenstein (Stanford), Team SICT: Vlad Andrei, Klaus Ganser, Samir Selman, Nandita Prabhu (Stanford), Team Nimbus: David Li,
Sudarshan Rangarajan, Shantanu Kurhekar, Riddhi Mittal (Stanford)
[Diagram: MODIS Azure Service]
Scientists submit requests through the MODIS Azure Service Web Role Portal and receive science results.
Stages: Data Collection Stage, Reprojection Stage, Derivation Reduction Stage, Analysis Reduction Stage
Queues: Request Queue, Download Queue, Reprojection Queue, Reduction #1 Queue, Reduction #2 Queue
Inputs and outputs: Source Imagery Download Sites, Source Metadata, Scientific Results
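The queue and stage labels in the diagram suggest a pipeline in which each stage reads work from its own queue and hands results to the next. The sketch below shows that pattern with Python's in-process `queue.Queue` standing in for cloud queues. It is not the MODIS Azure implementation: the stage order, the `run_pipeline` and `tag` helpers, and the tile names are assumptions made for illustration.

```python
from queue import Queue

# Stage order is an assumption inferred from the diagram labels.
STAGES = ["download", "reprojection", "reduction_1", "reduction_2"]


def run_pipeline(requests, handlers):
    """Pass each request through one queue per stage, in order.

    handlers maps a stage name to a function(item) -> item. Each stage
    drains its own queue and enqueues results for the next stage.
    """
    queues = {name: Queue() for name in STAGES}
    results = []
    for request in requests:
        queues[STAGES[0]].put(request)
    for index, name in enumerate(STAGES):
        while not queues[name].empty():
            item = handlers[name](queues[name].get())
            if index + 1 < len(STAGES):
                queues[STAGES[index + 1]].put(item)
            else:
                results.append(item)  # last stage: collect final output
    return results


# Hypothetical handlers that only record which stages a tile has passed.
def tag(stage):
    return lambda item: item + [stage]


handlers = {name: tag(name) for name in STAGES}
print(run_pipeline([["tile-A"], ["tile-B"]], handlers))
```

Decoupling stages through queues means each stage could be scaled or restarted independently. In this single-process sketch the stages simply run one after another.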
Use laptops & desktop computers
Overwhelmed by data
Finding analysis ever more difficult; sharing even harder