NeSC Data Projects and Initiatives
Dr. Dave Berry, Research Manager
Contents
• The Data Deluge
• Web Services
• The DAI vision
• The OGSA-DAI Project and GGF
• The OGSA-DAI Software
• Edikt
• Other relevant projects in the UK
Acknowledgements
This talk includes material prepared by:
• The OGSA-DAI project
• The e-Diamond project
• The BRIDGES project
• The GGF OGSA Working Group
• and others…
The Data Deluge
(Figure: Mont Blanc (4810 m) and Downtown Geneva)
Entering an age of data:
• CERN: the LHC will generate 1 GB/s, or about 10 PB/year
• VLBA (NRAO) generates 1 GB/s today
• Pixar generates 100 TB per movie
Data is stored in many different ways:
• Relational databases
• XML databases
• Flat files
We need ways to facilitate:
• Data discovery
• Data access
• Data integration
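The "1 GB/s = 10 PB/y" figure only works out if the detector does not record all year round. A minimal back-of-envelope check follows. It assumes (this is not on the slide) the common high-energy-physics rule of thumb of about 10^7 seconds of data taking per year:

```python
# Back-of-envelope check of the slide's "1 GB/s = 10 PB/y" figure.
# Assumption (not on the slide): ~1e7 seconds of data taking per year,
# a common rule of thumb for accelerator running time.

GB = 10**9   # bytes (decimal units)
PB = 10**15  # bytes

def yearly_volume_pb(rate_gb_per_s: float, seconds_per_year: float) -> float:
    """Data volume in petabytes for a sustained rate over a given live time."""
    return rate_gb_per_s * GB * seconds_per_year / PB

LIVE_TIME_S = 1e7                  # assumed seconds of beam per year
CALENDAR_YEAR_S = 365 * 24 * 3600  # 31,536,000 s

print(f"{yearly_volume_pb(1.0, LIVE_TIME_S):.1f} PB/year at ~1e7 s live time")
print(f"{yearly_volume_pb(1.0, CALENDAR_YEAR_S):.1f} PB/year if running non-stop")
```

Running non-stop for a full calendar year would give roughly three times as much. The slide's figure is therefore consistent with the shorter live time.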
Astronomical Databases
(Figure: number and sizes of data sets as of mid-2002, grouped by wavelength)
• 12-waveband coverage of large areas of the sky
• Total about 200 TB of data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins
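To see what "doubling every 12 months" implies, here is a minimal sketch that extrapolates the slide's ~200 TB mid-2002 total forward. The function name is illustrative only. So is the assumption that the trend simply continues:

```python
# Extrapolating the slide's trend: ~200 TB in mid-2002, doubling every 12 months.
# Illustrative only; the doubling period is taken from the slide, not measured here.

def projected_tb(start_tb: float, years: float, doubling_years: float = 1.0) -> float:
    """Archive size after `years` if it doubles every `doubling_years`."""
    return start_tb * 2 ** (years / doubling_years)

for offset in range(6):
    size_tb = projected_tb(200, offset)
    print(f"mid-{2002 + offset}: {size_tb:,.0f} TB")
```

Five doublings take 200 TB to 6,400 TB (about 6.4 PB) by mid-2007, if the trend held.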
Bioinformatics Databases
(Figure: PDB content growth)
•Bibliographic (MedLine, …)
•Amino Acid Seq (SWISS-PROT, …)
•3D Molecular Structure (PDB, …)
•Nucleotide Seq (GenBank, EMBL, …)
•Biochemical Pathways (KEGG, WIT…)
•Molecular Classifications (SCOP, CATH,…)
•Motif Libraries (PROSITE, Blocks, …)
Web Services
Using the protocols and ideas that have made the web a success for humans… and applying them to distributed programming:
• HTTP
• Single networking port
• Autonomy & failure handling
• Open standards
Tools & platforms:
• Apache Axis
• WebSphere, .NET, Oracle Application Server, Sun ONE, …
From Browsing to Programming
| | Browsing the web | Programming the web |
|---|---|---|
| Readers | People | Software |
| Discovery | Google, AltaVista, … | UDDI, … |
| Description | N/A | WSDL |
| Operations | GET, POST, … | Service-specific |
| Protocol | HTTP | SOAP over HTTP |
| Format | HTML, XHTML | XML + Schema |
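The "Operations", "Protocol" and "Format" rows can be made concrete. A minimal sketch, using only Python's standard library, builds a SOAP 1.1 envelope around a service-specific operation and prepares (but does not send) the HTTP POST that carries it. Several details are hypothetical: the endpoint URL, the namespace `urn:example:dataservice`, the operation `query`, its `expression` parameter, and the form of the SOAPAction value. A real service's WSDL would define them.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

def build_soap_request(url: str, service_ns: str, operation: str,
                       params: dict) -> urllib.request.Request:
    """Wrap a service-specific operation in a SOAP 1.1 envelope and prepare an HTTP POST."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    payload = ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        # The SOAPAction value is service-defined; this form is an assumption.
        "SOAPAction": f'"{service_ns}/{operation}"',
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")

# Hypothetical endpoint, namespace and operation, for illustration only.
req = build_soap_request(
    "http://example.org/services/DataService",
    "urn:example:dataservice",
    "query",
    {"expression": "SELECT name FROM stars"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would send it; that step is deliberately omitted.
```

Compared with browsing, the difference is that the operation name and its parameters travel inside the XML body, so the same HTTP POST can carry any service-specific call.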
A Perspective on WS Specifications
Open Grid Services Architecture
(Figure: Web Services: business integration, secure and universal access, applications on demand. Grid Protocols: vast resource scalability, global accessibility, resources on demand, continuous availability. Access resource, manage resource, share resource.)
The architecture of the Global Grid Forum
(Figure: OGSA capabilities, grouped into Context Services, Information Services, Infrastructure Services, Security Services, Resource Management Services, Execution Management Services, Data Services and Self-Management Services, with WSRF, WSN and WSDM. Capabilities shown include policy management, VO management, access, integration, provisioning, cataloging, boundary traversal, integrity, authorization, authentication, event management, troubleshooting, discovery, job management, logging, execution planning, workflow management, workload management, application management, deployment, configuration, reservation, naming, heterogeneity management, service level attainment, QoS management and optimization. Source: GGF11 OGSA specification, informational document.)
Data Access and Integration
Web Services for querying and integrating structured data resources
The foundation framework for:
• Building tailored DAI applications
• Higher-level services:
  • Replication: data located in multiple locations
  • Federation: composition of multiple sources
  • Provenance: how was data generated?
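Federation, the "composition of multiple sources", can be sketched in its simplest form: query each source independently, then combine the result sets in the client. Everything here is hypothetical: the two in-memory SQLite databases, their tables and data, and the `federated_join` helper. A production federation layer would also plan the query and push filtering down to the sources.

```python
import sqlite3

def make_source(ddl: str, insert_sql: str, rows: list) -> sqlite3.Connection:
    """Create an independent in-memory database standing in for one data resource."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Two hypothetical, separately managed sources that share an object identifier.
photometry = make_source(
    "CREATE TABLE phot (obj_id TEXT, magnitude REAL)",
    "INSERT INTO phot VALUES (?, ?)",
    [("o1", 12.1), ("o2", 15.4), ("o3", 9.8)],
)
spectra = make_source(
    "CREATE TABLE spec (obj_id TEXT, redshift REAL)",
    "INSERT INTO spec VALUES (?, ?)",
    [("o1", 0.02), ("o3", 0.15), ("o4", 0.31)],
)

def federated_join(left, left_sql, right, right_sql):
    """Query each source separately, then inner-join the result sets on their first column."""
    right_by_key = {row[0]: row[1:] for row in right.execute(right_sql)}
    return [row + right_by_key[row[0]]
            for row in left.execute(left_sql)
            if row[0] in right_by_key]

joined = federated_join(photometry, "SELECT obj_id, magnitude FROM phot ORDER BY obj_id",
                        spectra, "SELECT obj_id, redshift FROM spec")
print(joined)
```

Only objects present in both sources survive the join. Neither database needs to know about the other.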
The OGSA-DAI Project
Powered by ….
Funded by the Grid Core Programme:
• OGSA-DAI: £3 million, 18 months, from Feb 2002
  • Three major releases, three interim releases
• DAIT (DAI-Two): keeps the OGSA-DAI brand name
  • £1.5 million, 24 months, from Oct 2003
  • Four major releases
DAI in GGF and OGSA
Data Access and Integration Services WG (DAIS-WG):
• Strong involvement from OGSA-DAI members
• Standardise the interfaces: WS-DAI
• OGSA-DAI is a reference implementation
• Experience informing specification work
OGSA WG Data Design Team:
• Designing the data-oriented aspects of OGSA
• Created after GGF10 (March 2004)
• Led by NeSC
(Figure: the OGSA capabilities diagram shown earlier, alongside the design teams.)
OGSA Design Teams (within OGSA-WG):
• Information Service design team
• Data Service design team
• EMS design team
• Resource Mgmt design team
• Security Service design team
• Self Mgmt design team
• Core (roadmap) design team
• Naming design team
Data Services design team
• Informal domain expert groups within OGSA
• May include co-chairs of other WGs/RGs
• Output is included in the OGSA specification
(Figure: the OGSA Data Service design team within OGSA-WG, working with DAIS-WG, GSM-WG, GFS-WG, Info-D WG, ADF, OREP, … through teleconferences and face-to-face meetings.)
OGSA v2 Document Deliverables
Root documents: use-case document, Architecture v2, Glossary
Design team documents: service descriptions, scenarios
Working Group specifications: GGF Recommendation documents
How OGSA-DAI works
1a. Client asks the Registry for sources of data about “x”.
1b. Registry responds with a Factory handle.
2a. Client asks the Factory for access to the database.
2b. Factory creates a GridDataService (GDS) to manage access.
2c. Factory returns the handle of the GDS to the client.
3a. Client queries the GDS with XPath, SQL, etc.
3b. GDS interacts with the XML or relational database.
3c. Results of the query are returned to the client as XML.
(Figure: Client, Registry, Factory, Grid Data Service and XML/relational database; legend: SOAP/HTTP, service creation, API interactions.)
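The registry, factory and Grid Data Service steps can be mimicked in a single process to show who talks to whom. This is a sketch under stated assumptions. The `Registry`, `Factory` and `GridDataService` classes and their methods are hypothetical stand-ins, not the OGSA-DAI client API. SQLite stands in for the XML/relational database. Calls that would be SOAP/HTTP messages in a real deployment are ordinary method calls here. The XML result format is likewise invented.

```python
import sqlite3
import xml.etree.ElementTree as ET

class GridDataService:
    """Step 2b: manages one client's access to a database."""
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def perform(self, sql: str) -> str:
        """Steps 3a-3c: run the query against the database and return rows as XML."""
        cursor = self._conn.execute(sql)
        columns = [col[0] for col in cursor.description]
        result = ET.Element("resultSet")
        for row in cursor:
            row_el = ET.SubElement(result, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_el, name).text = str(value)
        return ET.tostring(result, encoding="unicode")

class Factory:
    """Steps 2a-2c: creates a GridDataService and hands it back to the client."""
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def create_service(self) -> GridDataService:
        return GridDataService(self._conn)

class Registry:
    """Steps 1a-1b: maps a topic to the factories that can serve data about it."""
    def __init__(self):
        self._factories = {}

    def register(self, topic: str, factory: Factory) -> None:
        self._factories.setdefault(topic, []).append(factory)

    def lookup(self, topic: str) -> list:
        return list(self._factories.get(topic, []))

# Hypothetical database holding data about "x".
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE x_readings (sensor TEXT, value REAL)")
db.executemany("INSERT INTO x_readings VALUES (?, ?)", [("s1", 0.5), ("s2", 1.5)])

registry = Registry()
registry.register("x", Factory(db))

factory = registry.lookup("x")[0]   # 1a/1b: find a factory for "x"
gds = factory.create_service()      # 2a-2c: obtain a GDS handle
xml_result = gds.perform("SELECT sensor, value FROM x_readings ORDER BY sensor")  # 3a-3c
print(xml_result)
```

The client never handles a database driver or connection string. It only knows the registry, and everything after that is discovered.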
OGSA-DAI compared to JDBC
• Language independence at the client end
• Platform independence
• No need to worry about connection technology, drivers, etc.
• Can handle XML resources
• Can embed additional functionality at the service end:
  • Transformations
  • Third-party delivery
  • Avoiding unnecessary data movement
• Provision of metadata is powerful
• Usefulness of the Registry for service discovery:
  • Dynamic service binding process
Future DAI Services
1a. Client asks the Data Registry for sources of data about “x” & “y”.
1b. Registry responds with a Factory handle.
2a. Client asks the Factory for access and integration from resources Sx and Sy.
2b. Factory creates a network of GridDataServices.
2c. Factory returns the handle of the GDS to the client.
3a. Client submits a sequence of scripts, each with a set of queries (XPath, SQL, etc.), to the GDS.
3b. Client tells the analyst.
3c. Sequences of result sets are returned to the analyst as formatted binary described in a standard XML notation.
(Figure: client application code and analyst; a Data Access & Integration master coordinating GDS1, GDS2, GDS3, GDTS1 and GDTS2 over resources Sx and Sy (an XML database and a relational database); legend: SOAP/HTTP, service creation, API interactions.)
Activities are the drivers
Express a task to be performed by a GDS
Three broad classes of activities:
• Statement
• Transformation
• Delivery
Extensible:
• Easy to add new functionality
• Does not require modification to the service interface
• Extensions operate within the OGSA-DAI framework
Functionality:
• Implemented at the service
• Works where the data is (no need to move data back)
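The three activity classes and the extensibility claim can be illustrated with a small in-process sketch. Everything here is hypothetical: the activity names (`sql_query`, `to_csv`, `gzip`, `deliver_to_file`), the `(conn, data, **params)` calling convention and the `DataService` class are invented for illustration. SQLite stands in for a real data resource, and a local file stands in for the network delivery options. This is not the OGSA-DAI API. It only shows two ideas. First, new activities can be registered without changing the service's single `perform()` entry point. Second, the work runs next to the data, so only the final result moves.

```python
import gzip
import os
import sqlite3
import tempfile
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical activity registry: adding functionality means registering a new
# activity; the service's single perform() entry point never changes.
ACTIVITIES: Dict[str, Callable[..., Any]] = {}

def activity(name: str):
    """Decorator that registers a function as a named activity."""
    def register(fn):
        ACTIVITIES[name] = fn
        return fn
    return register

@activity("sql_query")          # statement activity
def sql_query(conn, data, sql):
    return conn.execute(sql).fetchall()

@activity("to_csv")             # transformation activity
def to_csv(conn, data):
    return "\n".join(",".join(str(v) for v in row) for row in data)

@activity("gzip")               # transformation activity
def gzip_text(conn, data):
    return gzip.compress(data.encode("utf-8"))

@activity("deliver_to_file")    # delivery activity
def deliver_to_file(conn, data, path):
    with open(path, "wb") as fh:
        fh.write(data)
    return path

class DataService:
    """Runs activities next to the data; each activity's output feeds the next."""
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def perform(self, steps: List[Tuple[str, Dict[str, Any]]]) -> Any:
        data = None
        for name, params in steps:
            data = ACTIVITIES[name](self.conn, data, **params)
        return data

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (id INTEGER, value REAL)")
db.executemany("INSERT INTO t VALUES (?, ?)", [(1, 0.5), (2, 1.5)])

out_path = os.path.join(tempfile.mkdtemp(), "result.csv.gz")
delivered = DataService(db).perform([
    ("sql_query", {"sql": "SELECT id, value FROM t ORDER BY id"}),
    ("to_csv", {}),
    ("gzip", {}),
    ("deliver_to_file", {"path": out_path}),
])
print("delivered to", delivered)
```

Registering another transformation (an XSLT step, say) would be one more decorated function, and `perform()` would stay the same.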
OGSA-DAI Deck
Building Applications
• Activities are grouped together into a perform document
  • Data can flow between activities
• Optimisation: avoids multiple message exchanges
• Can deliver to other GDSs: a prerequisite for data integration
• Base middleware for projects requiring data access
• Some capability for data integration
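Grouping activities into one perform document means a single request carries the whole chain, with each activity's output feeding the next. A minimal sketch follows, with SQLite as the data resource. It assumes a made-up XML vocabulary: `perform`, `sqlQuery`, `filter`, `toCsv` and the `input`/`output`/`min` attributes are hypothetical, not the real OGSA-DAI perform-document schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical perform-document vocabulary (NOT the real OGSA-DAI schema).
# Each activity names its output stream; later activities read it by name,
# so the whole chain travels to the service in one message.
PERFORM_DOC = """
<perform>
  <sqlQuery output="rows">SELECT id, value FROM readings ORDER BY id</sqlQuery>
  <filter input="rows" output="large" min="1.0"/>
  <toCsv input="large" output="csv"/>
</perform>
"""

def run_perform_document(conn: sqlite3.Connection, document: str) -> dict:
    """Execute each activity in document order, wiring outputs to inputs by name."""
    streams = {}
    for act in ET.fromstring(document):
        data = streams.get(act.get("input"))
        if act.tag == "sqlQuery":
            result = conn.execute(act.text.strip()).fetchall()
        elif act.tag == "filter":
            threshold = float(act.get("min"))
            result = [row for row in data if row[1] >= threshold]
        elif act.tag == "toCsv":
            result = "\n".join(",".join(map(str, row)) for row in data)
        else:
            raise ValueError(f"unknown activity: {act.tag}")
        streams[act.get("output")] = result
    return streams

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (id INTEGER, value REAL)")
db.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 0.5), (2, 1.5), (3, 2.0)])

streams = run_perform_document(db, PERFORM_DOC)
print(streams["csv"])
```

Here three activities run on the service side for one incoming document. Done as separate calls, the same work would take three round trips and ship the unfiltered rows back to the client.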
Release 4, April 2004
• Provides data access components, an extensible framework for building applications, and some integration components
• Built on top of Globus Toolkit 3.2
• Supports relational, XML and some file resources:
  • MySQL, Oracle, DB2, SQL Server, Postgres, Xindice, CSV
• Supports various delivery options:
  • SOAP, FTP, GridFTP, HTTP, files, email, inter-service
• Supports various transforms:
  • XSLT, ZIP, GZip
• Supports message-level security using X.509 certificates
• Client Toolkit library for application developers
• GUI data browser (contributed by the FirstDIG project)
• Separate Distributed Query Processing components
• Comprehensive documentation and tutorials in XHTML format
Downloads by Release
(Figure: downloads over time from 15/01/2003 to 15/07/2004, y-axis 0 to 3000, with releases R1, R2, R3 and R4 marked.)
2746 downloads (~4.7 downloads a day)
Downloads by country
792 registered users as of 23/8/04
Release 5, October 2004
• Re-engineered, interface-independent core OGSA-DAI functionality
• Improved dependability and security integration
• New file data resources representing flat files queried using full-text searches (e.g. EMBL format)
• Installation and Configuration Wizard, including an “all-in-one installer”
• Improved Data Browser which allows XPath querying
• Set of standard benchmarks
• JSP Quick View interface
• Support for other databases (e.g. Access, eXist, HSQL)
Release 6, April 2006
• Data integration applications supporting identified scenarios
• OGSA-DQP as an integrated part of the release
• Fully compliant JDBC driver for OGSA-DAI
• Support for WS-Security implementations
• Support for stored procedures on all supported databases
• Improved support for different database-specific SQL types
• SQL translation between vendor dialects for a subset of queries
• Support for XQuery data resources
We expect to comply with a version of the emerging DAIS specification at this release.
Who is Using OGSA-DAI?
OGSA-DAI (http://www.ogsadai.org.uk)
AstroGrid (http://www.astrogrid.org/)
BioSimGrid (http://www.biosimgrid.org/)
BioGrid (http://www.biogrid.jp/)
Bridges (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
eDiaMoND (http://www.ediamond.ox.ac.uk/)
FirstDig (http://www.epcc.ed.ac.uk/~firstdig/)
GeneGrid (http://www.qub.ac.uk/escience/projects.php#genegrid)
GEON (http://www.geongrid.org/)
IU RGRBench (http://www.cs.indiana.edu/~plale/projects/RGR/OGSA-DAI.html)
myGrid (http://www.mygrid.org.uk/)
N2Grid (http://www.cs.univie.ac.at/institute/index.html?project-80=80)
ODD-Genes (http://www.epcc.ed.ac.uk/oddgenes/)
OGSA-WebDB (http://www.gtrc.aist.go.jp/dbgrid/)
MCS (http://www.isi.edu/~deelman/MCS/)
INWA (http://www.epcc.ed.ac.uk/projects/inwa/)
GridMiner (http://www.gridminer.org/)
Project classification
(Figure: projects using OGSA-DAI grouped into Biological Sciences, Physical Sciences, Commercial Applications and Computer Sciences: FirstDig, INWA, Bridges, AstroGrid, BioSimGrid, BioGrid, eDiamond, myGrid, ODD-Genes, N2Grid, GEON, MCS, IU RGRBench, OGSA-WebDB, GeneGrid, GridMiner.)
Edikt
The team: 8 professional software engineers, support staff, project manager, commercialisation manager, architect, and SAB
SHEFC-funded research and development grant
3 years' funding: May 2002 to 2005, plus a further 3 years' funding upon successful project and review
(Figure: the Edikt project (requirements analysis, technology matchmaking, gap filling, rigorous engineering) connecting standards, CS research, commercial SW components and skills, and e-Science apps, around grid services for e-Science data management.)
ELDAS: Data Access Service
• Implemented using Enterprise Java Beans
• Data Access Components interface to distinct DBMSs
• Accessible as a grid data service or a web data service
• Another (partial) implementation of the GGF WS-DAI specifications
(Figure: ELDAS Java framework: web and grid users reach an EJB Data Access Service through a Web Servlet or a Grid Proxy; Data Access Components (DACs) connect it to DB2, MySQL, Xindice and Oracle 9i databases.)
BinX: accessing legacy binary data
The problem:
• Many binary data files
• Applications must “know” the data format
• Binary data formats are machine-specific
The solution:
• Write a “stand-aside” format description in XML
• Provide a library to:
  • Interpret the description
  • Provide file access across different machines
  • Build higher-level services
(Figure: an e-Science application uses the BinX library to read binary data files, e.g. from simulations; a BinX file describes each binary file's structure.)
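The stand-aside idea can be illustrated with a toy reader. The record layout, including byte order, lives in an XML description next to the data, and a small library turns it into a decoder. The XML element and attribute names below are invented for this sketch and are not the real BinX schema. Python's `struct` module does the decoding.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical "stand-aside" description. The element and attribute names are
# invented for this sketch and are NOT the real BinX schema.
DESCRIPTION = """
<dataset byteOrder="bigEndian">
  <field name="step" type="int32"/>
  <field name="time" type="float64"/>
  <field name="energy" type="float32"/>
</dataset>
"""

TYPE_CODES = {"int16": "h", "int32": "i", "float32": "f", "float64": "d"}
BYTE_ORDERS = {"bigEndian": ">", "littleEndian": "<"}

def record_layout(description: str):
    """Turn the XML description into field names and a struct.Struct."""
    root = ET.fromstring(description)
    fields = root.findall("field")
    fmt = BYTE_ORDERS[root.get("byteOrder")] + "".join(TYPE_CODES[f.get("type")] for f in fields)
    return [f.get("name") for f in fields], struct.Struct(fmt)

def read_records(description: str, blob: bytes) -> list:
    """Decode fixed-size records; the byte order comes from the description, not the host."""
    names, layout = record_layout(description)
    return [dict(zip(names, values)) for values in layout.iter_unpack(blob)]

# Simulated legacy file contents: two big-endian records.
blob = struct.pack(">idf", 1, 0.5, 2.25) + struct.pack(">idf", 2, 1.0, 3.5)
records = read_records(DESCRIPTION, blob)
for record in records:
    print(record)
```

Because the byte order is declared in the description rather than assumed from the host, the same file decodes identically on little-endian and big-endian machines.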
Mammography
Mammograms have different appearances, depending on image settings and acquisition systems
(Figure: images in a Standard Mammo Format, used for temporal mammography, computer-aided detection and 3D views.)
A prototype of a national database of mammographic images in support of the UK breast screening programme
(Figure: deployment across four sites (UCL, KCL, UED, CHU), each with DB2 Content Manager, OGSA-DAI, core services and DataLoad and TrainingApp applications on a Core & Training API, linked by DB2 Federation; training services and database files are also shown.)
The BRIDGES Project
Biomedical Research Informatics Delivered by Grid Enabled Services
NeSC (Edinburgh and Glasgow) and IBM: www.brc.dcs.gla.ac.uk/projects/bridges
Supporting project for the CFG project:
• Generating data on hypertension
• Rat, mouse and human genome databases
Variety of tools used: BLAST, BLAT, gene prediction, visualisation, …
Variety of data sources and formats: microarray data, genome DBs, project partner research data, medical records, …
Aim is an integrated infrastructure supporting:
• Data federation
• Security
BRIDGES
(Figure: the CFG Virtual Organisation spans private data at Glasgow, Edinburgh, Leicester, Oxford, London and the Netherlands; a data hub combines it with publicly curated data (Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD, …) using OGSA-DAI, an Information Integrator and VO authorisation, alongside SyntenyGrid and BLAST services.)
INWA Project
Innovation Node Western Australia
Informing Business & Regional Policy: Grid-enabled fusion of global data and local knowledge
Involved 10 partners (6 UK + 4 Australia)
Aim:
• Data-mine commercially sensitive data
• Security an absolute MUST
• Employ Grid technologies
• Need access to data and computational resources
OGSA-DAI: access data resources
SunDCG's TOG (Transfer-queue Over Globus): handles job submission to analyse microarray data
(Figure: INWA testbed linking user@australia at Curtin, Australia, and user@edinburgh at EPCC, UK; each site runs Grid Engine, TOG, OGSA-DAI and a Data Browser over bank and telco data, with Australian and UK property data.)
Further Information on OGSA-DAI
The OGSA-DAI Project Site: http://www.ogsadai.org.uk
The DAIS-WG site: http://cs.man.ac.uk/grid-db
OGSA-DAI Users Mailing [email protected] discussion on grid DAI matters
Formal support for OGSA-DAI releases: http://www.ogsadai.org.uk/[email protected]
OGSA-DAI training courses