Data Challenges in e-Science, Aberdeen. Prof. Malcolm Atkinson, Director. 2nd December 2003
![Page 1: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/1.jpg)
Data Challenges in e-Science
Aberdeen
Prof. Malcolm Atkinson
Director
www.nesc.ac.uk
2nd December 2003
![Page 2: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/2.jpg)
What is e-Science?
![Page 3: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/3.jpg)
Foundation for e-Science
sensor nets
Shared data archives
computers
software
colleagues
instruments
Grid
e-Science methodologies will rapidly transform science, engineering, medicine and business
driven by exponential growth (×1000/decade) enabling a whole-system approach
Diagram derived from Ian Foster’s slide
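The ×1000/decade growth rate quoted on this slide can be restated as an annual factor and a doubling time. The following is a minimal sketch of that arithmetic; the function names are illustrative and not part of the talk.

```python
import math

def annual_factor(factor_per_decade=1000.0, years=10):
    """Growth factor per year implied by a given factor per decade."""
    return factor_per_decade ** (1.0 / years)

def doubling_time_years(factor_per_decade=1000.0, years=10):
    """Years needed to double at the same exponential rate."""
    return years * math.log(2) / math.log(factor_per_decade)

print(f"annual factor ~ {annual_factor():.3f}")
print(f"doubling time ~ {doubling_time_years():.3f} years")
```

Under this reading, ×1000 per decade is very close to doubling every year, since 2^10 = 1024.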
![Page 4: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/4.jpg)
Three-way Alliance
Computing Science: Systems, Notations & Formal Foundation → Process & Trust
Theory: Models & Simulations → Shared Data
Experiment & Advanced Data Collection → Shared Data
Multi-national, Multi-discipline, Computer-enabled Consortia, Cultures & Societies
Requires Much Engineering, Much Innovation
Changes Culture, New Mores, New Behaviours
New Opportunities, New Results, New Rewards
![Page 5: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/5.jpg)
Biochemical Pathway Simulator
Closing the information loop – between lab and computational model.
(Computing Science, Bioinformatics, Beatson Cancer Research Labs)
DTI Bioscience Beacon Project Harnessing Genomics Programme
Slide from Professor Muffy Calder, Glasgow
![Page 6: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/6.jpg)
Why is Data Important?
![Page 7: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/7.jpg)
Data as Evidence – all disciplines
Hypothesis, Curiosity or …
Collections of Data
Analysis & Models
Derived & Synthesised Data
Information, knowledge, decisions & designs
Driven by creativity, imagination and perspiration
![Page 8: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/8.jpg)
Data as Evidence – Historically
Hypothesis, Curiosity or …: Individual’s idea
Collections of Data: Personal collection
Analysis & Models: Personal effort
Derived & Synthesised Data: Lab Notebook
Information, knowledge, decisions & designs
Driven by creativity, imagination, perspiration & personal resources
![Page 9: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/9.jpg)
Data as Evidence – Enterprise Scale
Hypothesis, Curiosity, Business Goals: Agreed Hypothesis or Goals
Collections of Digital Data: Enterprise databases & (archive) file stores
Analysis & Computational Models: Data Production Pipelines
Derived & Synthesised Data: Data Products & Results
Information, knowledge, decisions & designs
Driven by creativity, imagination, perspiration & company’s resources
![Page 10: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/10.jpg)
Data as Evidence – e-Science
Shared Goals, Multiple hypotheses: Communities and Challenges
Collections of Published & Private Data: Multi-enterprise & Public Curation
Analysis, Computation, Annotation: Synthesis from Multiple Sources; Multi-enterprise Models, Computation & Workflow
Derived & Synthesised Data: Shared Data Products & Results
Information, knowledge, decisions & designs
Driven by creativity, imagination, perspiration & shared resources
![Page 11: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/11.jpg)
global in-flight engine diagnostics
in-flight data
airline
maintenance centre
ground station
global network, e.g. SITA
internet, e-mail, pager
DS&S Engine Health Center
data centre
Distributed Aircraft Maintenance Environment: Universities of Leeds, Oxford, Sheffield & York
100,000 aircraft
0.5 GB/flight
4 flights/day
200 TB/day
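The 200 TB/day figure follows directly from the three numbers above it. A minimal sketch of the arithmetic, assuming decimal units (1 TB = 1000 GB); the function name is illustrative:

```python
# Figures quoted on the slide.
AIRCRAFT = 100_000
GB_PER_FLIGHT = 0.5
FLIGHTS_PER_DAY = 4

def daily_volume_tb(aircraft, gb_per_flight, flights_per_day, gb_per_tb=1000):
    """Total in-flight engine data generated per day, in terabytes."""
    return aircraft * gb_per_flight * flights_per_day / gb_per_tb

print(daily_volume_tb(AIRCRAFT, GB_PER_FLIGHT, FLIGHTS_PER_DAY))  # 200.0
```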
![Page 12: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/12.jpg)
LHC Distributed Simulation & Analysis
Tier2 Centre ~1 TIPS
Online System
Offline Farm~20 TIPS
CERN Computer Centre >20 TIPS
RAL Regional Centre
US Regional Centre
French Regional Centre
Italian Regional Centre
Institute (×4) ~0.25 TIPS
Workstations
~100 MBytes/sec
~100 MBytes/sec
100 - 1000 Mbits/sec
• One bunch crossing per 25 ns
• 100 triggers per second
• Each event is ~1 MByte
Physicists work on analysis “channels”
Each institute has ~10 physicists working on one or more channels
Data for these channels should be cached by the institute server
Physics data cache
~PBytes/sec
~ Gbits/sec or Air Freight
Tier2 Centre ~1 TIPS
Tier2 Centre ~1 TIPS
~Gbits/sec
Tier 0
Tier 1
Tier 2
Tier 3
Tier 4
1 TIPS = 25,000 SpecInt95
PC (1999) = ~15 SpecInt95
ScotGRID++ ~1 TIPS
1. CERN
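The rates on this slide are linked by simple arithmetic: the trigger rate times the event size gives the ~100 MBytes/sec recorded stream, and the SpecInt95 figures give a rough count of 1999-era PCs per TIPS. The sketch below only restates the slide's numbers; the function names are illustrative.

```python
# Figures quoted on the slide.
BUNCH_CROSSING_NS = 25
TRIGGERS_PER_SECOND = 100
EVENT_SIZE_MBYTE = 1.0
SPECINT95_PER_TIPS = 25_000
SPECINT95_PER_PC_1999 = 15

def crossings_per_second(spacing_ns=BUNCH_CROSSING_NS):
    """Bunch crossings per second for a given spacing in nanoseconds."""
    return 1_000_000_000 // spacing_ns

def recorded_mbyte_per_second(triggers=TRIGGERS_PER_SECOND,
                              event_mbyte=EVENT_SIZE_MBYTE):
    """Data rate after triggering: accepted events times event size."""
    return triggers * event_mbyte

def pcs_per_tips():
    """Approximate number of 1999 PCs delivering 1 TIPS."""
    return SPECINT95_PER_TIPS / SPECINT95_PER_PC_1999

print(crossings_per_second())        # 40000000
print(recorded_mbyte_per_second())   # 100.0
print(round(pcs_per_tips()))         # 1667
```

On these figures the trigger keeps roughly one crossing in 400,000, and a ~1 TIPS Tier2 centre corresponds to about 1,700 PCs of 1999 vintage.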
![Page 13: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/13.jpg)
DataGrid Testbed
Dubna
Moscow
RAL
Lund
Lisboa
Santander
Madrid
Valencia
Barcelona
Paris
Berlin
Lyon
Grenoble
Marseille
Brno
Prague
Torino
Milano
BO-CNAF
PD-LNL
Pisa
Roma
Catania
ESRIN
CERN
HEP sites
ESA sites
IPSL
ESTEC
KNMI
(>40)
Testbed Sites
![Page 14: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/14.jpg)
Multiple overlapping communities, each with:
• Shared Goals, Multiple hypotheses
• Collections of Published & Private Data
• Analysis, Computation, Annotation
• Derived & Synthesised Data
Supported by common standards & shared infrastructure
![Page 15: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/15.jpg)
Life-science Examples
![Page 16: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/16.jpg)
Database Growth
PDB Content Growth
Bases 45,356,382,990
![Page 17: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/17.jpg)
Wellcome Trust: Cardiovascular Functional Genomics
Glasgow
Edinburgh
Leicester
Oxford
London
Netherlands
Shared data
Public curated data
BRIDGES
IBM
Depends on building & maintaining security, privacy & trust
![Page 18: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/18.jpg)
Comparative Functional Genomics
• Large amounts of data
• Highly heterogeneous: data types, data forms, community
• Highly complex and inter-related
• Volatile
myGrid Project: Carole Goble, University of Manchester
![Page 19: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/19.jpg)
UCSF
UIUC
From Klaus Schulten, Center for Biomolecular Modeling and Bioinformatics, Urbana-Champaign
![Page 20: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/20.jpg)
Home Computers Evaluate AIDS Drugs
Community =
• 1000s of home computer users
• Philanthropic computing vendor (Entropia)
• Research group (Scripps)
Common goal = advance AIDS research
From Steve Tuecke 12 Oct. 01
![Page 21: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/21.jpg)
![Page 22: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/22.jpg)
Astronomy Examples
![Page 23: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/23.jpg)
![Page 24: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/24.jpg)
Global Knowledge Communities driven by Data: e.g., Astronomy
No. & sizes of data sets as of mid-2002, grouped by wavelength
• 12 waveband coverage of large areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins
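To make the "doubling every 12 months" claim concrete, the sketch below projects the mid-2002 total forward. It assumes, purely for illustration, that the stated doubling rate continues unchanged; the talk does not make that projection itself.

```python
import math

BASE_TB = 200.0        # total quoted for mid-2002
DOUBLING_MONTHS = 12   # doubling period quoted on the slide

def projected_tb(years_after_mid_2002, base_tb=BASE_TB,
                 doubling_months=DOUBLING_MONTHS):
    """Projected total volume, assuming steady exponential growth."""
    return base_tb * 2 ** (years_after_mid_2002 * 12 / doubling_months)

def years_to_reach(target_tb, base_tb=BASE_TB,
                   doubling_months=DOUBLING_MONTHS):
    """Years after mid-2002 until the total reaches target_tb."""
    return math.log2(target_tb / base_tb) * doubling_months / 12

print(projected_tb(3))                   # 1600.0
print(f"{years_to_reach(1000):.2f}")     # years until 1 PB (1000 TB)
```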
![Page 25: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/25.jpg)
Sloan Digital Sky Survey Production System
Slide from Ian Foster’s SSDBM 03 keynote
![Page 26: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/26.jpg)
Supernova Cosmology Requires Complex, Widely Distributed Workflow Management
![Page 27: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/27.jpg)
Engineering Examples
![Page 28: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/28.jpg)
whole-system simulations
landing gear models
• braking performance
• steering capabilities
• traction
• dampening capabilities
wing models
• lift capabilities
• drag capabilities
• responsiveness
stabilizer models
• deflection capabilities
• responsiveness
airframe models
human models
• crew capabilities: accuracy, perception, stamina, reaction times, SOPs
engine models
• thrust performance
• reverse thrust performance
• responsiveness
• fuel consumption
NASA Information Power Grid: coupling all sub-system simulations - slide from Bill Johnson
![Page 29: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/29.jpg)
Mathematicians Solve NUG30
• Looking for the solution to the NUG30 quadratic assignment problem
• An informal collaboration of mathematicians and computer scientists
• Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)
14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23
MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
From Miron Livny 7 Aug. 01
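For readers unfamiliar with the problem, here is a minimal sketch of the standard quadratic assignment objective, together with two checks on the slide's own numbers: that the 30-number sequence is a permutation of 1..30, and what 3.46E8 CPU seconds in 7 days means as an average processor count. The flow and distance matrices are toy values invented for illustration (the NUG30 data is not in the talk), and reading the sequence as a facility-to-location assignment is an assumption.

```python
NUG30_SEQUENCE = [14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25,
                  22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23]

def is_permutation(seq):
    """True if seq contains each of 1..len(seq) exactly once."""
    return sorted(seq) == list(range(1, len(seq) + 1))

def qap_cost(flow, dist, perm):
    """Quadratic assignment cost: sum of flow[i][j] * dist[p(i)][p(j)].

    perm[i] is the 1-based location assigned to facility i.
    """
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i] - 1][perm[j] - 1]
               for i in range(n) for j in range(n))

def average_processors(cpu_seconds, days):
    """Average concurrent CPUs, treating `days` as wall-clock time."""
    return cpu_seconds / (days * 86_400)

# Toy 3x3 instance (hypothetical data, not NUG30).
TOY_FLOW = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
TOY_DIST = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]

print(is_permutation(NUG30_SEQUENCE))          # True
print(qap_cost(TOY_FLOW, TOY_DIST, [1, 2, 3]))  # 14
print(qap_cost(TOY_FLOW, TOY_DIST, [2, 1, 3]))  # 16
print(round(average_processors(3.46e8, 7)))    # 572
```

An average of roughly 572 processors against a peak of 1009 gives a sense of how much the pool of available machines fluctuated over the week.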
![Page 30: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/30.jpg)
Network for Earthquake Engineering Simulation
• NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
• On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
From Steve Tuecke 12 Oct. 01
![Page 31: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/31.jpg)
National Airspace Simulation Environment
NASA Information Power Grid: aircraft, flight paths, airport operations and the environment are combined to get a virtual national airspace
Virtual National Air Space (VNAS)
GRC
engine models
LaRC
airframe models
landing gear models
ARC
wing models
stabilizer models
human models
simulation drivers:
• FAA ops data
• weather data
• airline schedule data
• digital flight data
• radar tracks
• terrain data
• surface data
22,000 commercial US flights a day
50,000 engine runs
22,000 airframe impact runs
132,000 landing/take-off gear runs
48,000 human crew runs
66,000 stabilizer runs
44,000 wing runs
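The run counts on this slide can be totalled and related to the flight count. This is a small sketch of that arithmetic; treating every run count as a daily figure for the same 22,000 flights is an assumption, since the slide gives the numbers without a stated period.

```python
FLIGHTS_PER_DAY = 22_000

# Sub-system simulation runs quoted on the slide.
RUNS = {
    "engine": 50_000,
    "airframe impact": 22_000,
    "landing/take-off gear": 132_000,
    "human crew": 48_000,
    "stabilizer": 66_000,
    "wing": 44_000,
}

def total_runs(runs):
    """Total number of sub-system simulation runs."""
    return sum(runs.values())

def runs_per_flight(runs, flights):
    """Runs of each sub-system model per commercial flight."""
    return {name: count / flights for name, count in runs.items()}

print(total_runs(RUNS))  # 362000
for name, ratio in runs_per_flight(RUNS, FLIGHTS_PER_DAY).items():
    print(f"{name}: {ratio:.2f} runs per flight")
```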
![Page 32: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/32.jpg)
Data Challenges
![Page 33: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/33.jpg)
Derived from Ian Foster’s slide at SSDBM, July 03
It’s Easy to Forget How Different 2003 is From 1993
Enormous quantities of data: Petabytes
• For an increasing number of communities, the gating step is not collection but analysis
Ubiquitous Internet: >100 million hosts
• Collaboration & resource sharing the norm
• Security and Trust are crucial issues
Ultra-high-speed networks: >10 Gb/s
• Global optical networks
• Bottlenecks: last kilometre & firewalls
Huge quantities of computing: >100 Top/s
• Moore’s law gives us all supercomputers
• Organising their effective use is the challenge
Moore’s law everywhere
• Instruments, detectors, sensors, scanners, …
• Organising their effective use is the challenge
![Page 34: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/34.jpg)
Tera → Peta Bytes

| | Terabyte | Petabyte |
|---|---|---|
| RAM time to move | 15 minutes | 2 months |
| 1Gb WAN move time | 10 hours ($1000) | 14 months ($1 million) |
| Disk Cost | 7 disks = $5000 (SCSI) | 6800 Disks + 490 units + 32 racks = $7 million |
| Disk Power | 100 Watts | 100 Kilowatts |
| Disk Weight | 5.6 Kg | 33 Tonnes |
| Disk Footprint | Inside machine | 60 m² |
May 2003 Approximately Correct
See also Distributed Computing Economics Jim Gray, Microsoft Research, MSR-TR-2003-24
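The two 1Gb WAN move times on this slide are consistent with each other under simple ×1000 scaling. The sketch below checks that, and also computes the ideal line-rate time for comparison. It assumes decimal units (1 TB = 10^12 bytes) and an average month of 30.44 days; both are assumptions, not figures from the talk.

```python
TB = 10 ** 12
PB = 10 ** 15
HOURS_PER_MONTH = 24 * 30.44  # average month length, an assumption

def ideal_transfer_hours(num_bytes, bits_per_second=1e9):
    """Transfer time at full line rate, ignoring all overheads."""
    return num_bytes * 8 / bits_per_second / 3600

def implied_throughput_mbps(num_bytes, hours):
    """Effective throughput implied by a quoted move time."""
    return num_bytes * 8 / (hours * 3600) / 1e6

SLIDE_TB_HOURS = 10  # "10 hours" quoted for one terabyte

print(f"ideal 1 TB move: {ideal_transfer_hours(TB):.2f} hours")
print(f"implied rate: {implied_throughput_mbps(TB, SLIDE_TB_HOURS):.0f} Mb/s")
scaled_months = SLIDE_TB_HOURS * (PB // TB) / HOURS_PER_MONTH
print(f"1 PB at the same rate: {scaled_months:.1f} months")
```

The quoted 10 hours implies an effective rate well below the nominal 1 Gb/s, and scaling it by 1000 gives close to the 14 months quoted for a petabyte.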
![Page 35: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/35.jpg)
The Story so Far
Technology enables Grids, More Data & …
Information Grids will dominate
Collaboration essential
• Combining approaches
• Combining skills
• Sharing resources
(Structured) Data is the language of Collaboration
Data Access & Integration a Ubiquitous Requirement
Many hard technical challenges
• Scale, heterogeneity, distribution, dynamic variation
Intimate combinations of data and computation
• With unpredictable (autonomous) development of both
![Page 36: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/36.jpg)
Scientific Data
Opportunities
• Global Production of Published Data
• Volume, Diversity
• Combination, Analysis, Discovery
Challenges
• Data Huggers
• Meagre metadata
• Ease of Use
• Optimised integration
• Dependability
Opportunities
• Specialised Indexing
• New Data Organisation
• New Algorithms
• Varied Replication
• Shared Annotation
• Intensive Data & Computation
Challenges
• Fundamental Principles
• Approximate Matching
• Multi-scale optimisation
• Autonomous Change
• Legacy structures
• Scale and Longevity
• Privacy and Mobility
![Page 37: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/37.jpg)
UK e-Science
![Page 38: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/38.jpg)
UK e-Science
e-Science and the Grid
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor
Director General of Research Councils
Office of Science and Technology
From presentation by Tony Hey
![Page 39: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/39.jpg)
e-Science Programme’s Vision
UK will lead in the exploitation of e-Infrastructure
• New, faster and better research
• Engineering design, medical diagnosis, decision support, …
• e-Business, e-Research, e-Design & e-Decision
Depends on leading e-Infrastructure development & deployment
![Page 40: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/40.jpg)
e-Science and SR2002
| Research Council | 2004-6 | (2001-4) |
|---|---|---|
| Medical | £13.1M | (£8M) |
| Biological | £10.0M | (£8M) |
| Environmental | £8.0M | (£7M) |
| Eng & Phys | £18.0M | (£17M) |
| HPC | £2.5M | (£9M) |
| Core Prog. | £16.2M | (£15M) + £20M |
| Particle Phys & Astro | £31.6M | (£26M) |
| Economic & Social | £10.6M | (£3M) |
| Central Labs | £5.0M | (£5M) |
![Page 41: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/41.jpg)
www.nesc.ac.uk
National e-Science Centre
HPC(x)
• National e-Science Institute
• International relationships
• Engineering Task Force
• Grid Support Centre
• Architecture Task Force
• OGSA-DAI
• One of 11 Centre Projects
• GridNet to support standards work
• One of 6 administration projects
• Training team
• 5 Application projects
• 15 “Fundamental” Research projects
• EGEE
Globus Alliance
![Page 42: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/42.jpg)
NeSI in Edinburgh
National e-Science Centre
![Page 43: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/43.jpg)
NeSI Events held in the 2nd Year
(from 1 Aug 2002 to 31 Jul 2003)
We have had 86 events (Year 1 figures in brackets):
• 11 project meetings (4)
• 11 research meetings (7)
• 25 workshops (17 + 1)
• 2 “summer” schools (0)
• 15 training sessions (8)
• 12 outreach events (3)
• 5 international meetings (1)
• 5 e-Science management meetings (7)
(though the definitions are fuzzy!)
> 3600 Participant Days
Suggestions always welcome
Establishing a training team
Investing in community building, skill generation & knowledge development
![Page 44: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/44.jpg)
NeSI Workshops
• Space for real work
• Crossing communities
• Creativity: new strategies and solutions
• Written reports
Workshop topics:
• Scientific Data Mining, Integration and Visualisation
• Grid Information Systems
• Portals and Portlets
• Virtual Observatory as a Data Grid
• Imaging, Medical Analysis and Grid Environments
• Grid Scheduling
• Provenance & Workflow
• GeoSciences & Scottish Bioinformatics Forum
http://www.nesc.ac.uk/events/
![Page 45: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/45.jpg)
E-Infrastructure
![Page 46: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/46.jpg)
OGSA
Infrastructure Architecture
OGSI: Interface to Grid Infrastructure
Data Intensive Applications for Application area X
Compute, Data & Storage Resources
Distributed
Simulation, Analysis & Integration Technology for Application area X
Data Intensive Users
Virtual Integration Architecture
Generic Virtual Data Access and Integration Layer
Structured Data Integration
Structured Data Access
Structured Data: Relational, XML, Semi-structured
Transformation
Registry
Job Submission
Data Transport Resource Usage
Banking
Brokering Workflow
Authorisation
![Page 47: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/47.jpg)
Data Access & Integration Services
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc.
3b. GDS interacts with database
3c. Results of query returned to client as XML
Components: Client, Registry, Factory, Grid Data Service, XML / Relational database
Interaction types: SOAP/HTTP, service creation, API interactions
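The Registry, Factory and Grid Data Service steps (1a to 3c) on this slide can be sketched as plain in-process Python objects. This is an illustration of the interaction pattern only: the class and method names are hypothetical and are not the OGSA-DAI API, SOAP/HTTP is replaced by direct calls, and an in-memory SQLite table with invented rows stands in for the database.

```python
import sqlite3
import xml.etree.ElementTree as ET

class GridDataService:
    """Hypothetical stand-in for a GDS: runs a query, returns XML."""
    def __init__(self, connection):
        self._conn = connection

    def query(self, sql, params=()):
        cursor = self._conn.execute(sql, params)       # 3b: talk to database
        columns = [d[0] for d in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_el, name).text = str(value)
        return ET.tostring(root, encoding="unicode")   # 3c: results as XML

class Factory:
    """Hypothetical factory that creates a data service per request."""
    def __init__(self, connection):
        self._conn = connection

    def create_data_service(self):                     # 2b and 2c
        return GridDataService(self._conn)

class Registry:
    """Hypothetical registry mapping a topic to a factory."""
    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find_factory(self, topic):                     # 1a and 1b
        return self._factories[topic]

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (name TEXT, length INTEGER)")
    conn.execute("INSERT INTO genes VALUES ('abc1', 1200)")
    registry = Registry()
    registry.register("genes", Factory(conn))
    factory = registry.find_factory("genes")           # steps 1a, 1b
    gds = factory.create_data_service()                # steps 2a to 2c
    return gds.query("SELECT name, length FROM genes") # steps 3a to 3c

print(demo())
```

The point of the indirection is that the client never holds a database connection: it only ever sees handles returned by the registry and the factory.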
![Page 48: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/48.jpg)
Future DAI Services?
1a. Request to Registry for sources of data about “x” & “y”
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits sequence of scripts; each has a set of queries to GDS with XPath, SQL, etc.
3b. Client tells analyst
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
Components: Client, Analyst, Data Registry, Data Access & Integration master, GDS1, GDS2, GDS3, GDTS1, GDTS2, Sx, Sy, XML database, Relational database
Interaction types: SOAP/HTTP, service creation, API interactions
“scientific” Application coding scientific insights
Problem Solving Environment
Semantic Meta data
Application Code
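The integration step in this scenario, combining results from a relational source and an XML source, can be illustrated with a small join. This is only a sketch of the idea: the source names echo the slide's Sx and Sy, but the schema, the sample rows and the function names are all invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_relational(conn):
    """Source Sx (hypothetical): gene name -> length from a table."""
    return dict(conn.execute("SELECT name, length FROM genes"))

def query_xml(document):
    """Source Sy (hypothetical): gene name -> annotation from XML."""
    root = ET.fromstring(document)
    return {el.get("gene"): el.text for el in root.findall("annotation")}

def integrate(lengths, annotations):
    """Join the two result sets on the shared gene name."""
    shared = sorted(set(lengths) & set(annotations))
    return [(name, lengths[name], annotations[name]) for name in shared]

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (name TEXT, length INTEGER)")
    conn.executemany("INSERT INTO genes VALUES (?, ?)",
                     [("abc1", 1200), ("def2", 900)])
    sy_document = (
        "<annotations>"
        "<annotation gene='abc1'>kinase</annotation>"
        "<annotation gene='xyz9'>unknown</annotation>"
        "</annotations>"
    )
    return integrate(query_relational(conn), query_xml(sy_document))

print(demo())  # [('abc1', 1200, 'kinase')]
```

Only records present in both sources survive the join, which is one reason heterogeneity and meagre metadata make integration hard in practice.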
![Page 49: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/49.jpg)
Integration is our Focus
Supporting Collaboration
• Bring together disciplines
• Bring together people engaged in shared challenge
• Inject initial energy
• Invent methods that work
Supporting Collaborative Research
• Integrate compute, storage and communications
• Deliver and sustain integrated software stack
• Operate dependable infrastructure service
• Integrate multiple data sources
• Integrate data and computation
• Integrate experiment with simulation
• Integrate visualisation and analysis
High-level tools and automation essential
Fundamental research as a foundation
![Page 50: Data Challenges in e-Science Aberdeen Prof. Malcolm Atkinson Director 2 nd December 2003](https://reader036.vdocument.in/reader036/viewer/2022062417/5515edd3550346cf6f8b5274/html5/thumbnails/50.jpg)
Take Home Message
Data is a Major Source of Challenges AND an Enabler of New Science, Engineering, Medicine, Planning, …
Information Grids
• Support for collaboration
• Support for computation and data grids
• Structured data is fundamental
• Integrated strategies & technologies needed
E-Infrastructure is Here – More to do – technically & socio-economically
Join in – explore the potential – develop the methods & standards
NeSC would like to help you develop e-Science
We seek suggestions and collaboration