National e-Science Centre Local Developments
5th February 2004
Dr Richard Sinnott: Technical Director, National e-Science Centre; Deputy Director (Technical), Bioinformatics Research Centre, University of Glasgow
Dr Dave Berry: Research Manager, National e-Science Centre, University of Edinburgh
Overview
NeSC Role in UK e-Science
NeSC Edinburgh developments
– e-Science Institute
– Infrastructure/set-up
– Projects
– Plans
NeSC Glasgow developments
– Infrastructure/set-up
– Projects
– Plans
Conclusions
NeSC’s Role
Help coordinate and lead the UK e-Science Programme
– Community building activities, regional support & outreach
– Grid building as a member of the Engineering Task Force
– Skill building through training events & support centre
Help establish the UK’s international role
– International meetings, standardisation work & presentations
Undertake R&D projects
– To deliver reliable middleware
– To engage industry
– To stimulate the uptake of e-Science technology and methods
Run the e-Science Institute
– Knowledge building through workshops and conferences
– Research visitors and events
NeSC at Edinburgh: Recent Developments
Globus Alliance
Digital Curation Centre
– Edinburgh, Glasgow, UKOLN, CCLRC
New e-Science Lecturer (Particle Physics)
Training Team
– PPARC and EGEE funding
– Manager + 4 trainers
– Europe-wide role
DAI Two (Extension of OGSA-DAI)
OGSA Test Grid
Digital Curation Centre
[Diagram labels: research; development; services; community support & outreach; management & co-ordination; Industry; research collaborators; standards bodies; testbeds & tools; communities of practice: users; curation organisations; Collaborative Associates Network of Data Organisations]
e-Science Institute
A meeting place
The focus for presenting UK e-Science
Visiting researchers
– Collaborate in our research and development
– Engage in and develop our event programme
– Build bridges with their community
– Visits last between one week and six months
Research-oriented event programme
– e-Science research topics
– Training to e-Science research teams
eSI Workshops
Space for real work
Crossing communities
Creativity: new strategies and solutions
Written reports
– Scientific Data Mining, Integration and Visualisation
– Grid Information Systems
– Portals and Portlets
– Virtual Observatory as a Data Grid
– Imaging, Medical Analysis and Grid Environments
– Open Issues in Grid Scheduling
– Data Provenance & Annotation
– e-Science Workflow Services
– GeoSciences & Scottish Bioinformatics Forum
http://www.nesc.ac.uk/events/
Suggestions always welcome!
Projects
OGSA-DAI/DAIT, MS.NETGrid, SunDCG, GridWeaver, BRIDGES, PGPGrid, FirstDIG, ODD-Genes
EGEE, NextGrid
OGSA Test Grid, IBM Early Evaluation
edikt
Publishing Scientific Data
GridPP, AstroGrid, QCDGrid, RealityGrid Portal
Biological Spatio-Temporal Databases
CoAKTinG, Grid-enabled Modelling Tools and Databases for Neuroinformatics, e-Diamond
Dynamic Configuration of Grid Fabrics, Dependable Grid Services, Deductive Synthesis Techniques, Inferring QoS Properties for Grid Applications, Mobile Resource Guarantees
TIES, TIES-II
The Virtual Observatory
International Virtual Observatory Alliance
UK, Australia, EU, China, Canada, Italy, Germany, Japan, Korea, US, Russia, France, India
How to integrate many multi-TB collections of heterogeneous data distributed globally?
Sociological and technological challenges to be met
Data Services
GGF Data Access and Integration Svcs (DAIS)
– OGSI-compliant interfaces to access relational and XML databases
– Needs to be generalized to encompass other data sources (see next slide…)
Generalized DAIS becomes the foundation for:
– Replication: Data located in multiple locations
– Federation: Composition of multiple sources
– Provenance: How was data generated?
Data Access & Integration Services
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc
3b. GDS interacts with database
3c. Results of query returned to client as XML
[Diagram components: Client, Registry, Factory, Grid Data Service, XML / Relational database. Legend: SOAP/HTTP, service creation, API interactions]
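The numbered interaction above (steps 1a to 3c) can be sketched in a few lines of Python. This is only an in-process illustration of the registry, factory and data-service roles, not the OGSA-DAI API: the class names, method names and the demo table are assumptions made for this sketch, and an in-memory SQLite database stands in for the XML / relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: manages access to one database."""

    def __init__(self, connection):
        self._connection = connection

    def query(self, sql, params=()):
        # Step 3b: the GDS interacts with the database.
        cursor = self._connection.execute(sql, params)
        columns = [d[0] for d in cursor.description]
        # Step 3c: results are returned to the client as XML.
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_el, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical factory: creates a GDS for each access request."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        # Steps 2b and 2c: create a GDS and return its handle.
        return GridDataService(self._connection)


class Registry:
    """Hypothetical registry: maps a topic to a factory handle."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find(self, topic):
        # Steps 1a and 1b: look up a source of data about the topic.
        return self._factories[topic]


def build_demo_registry():
    # Invented demo rows, used only to make the sketch runnable.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE samples (label TEXT, species TEXT)")
    connection.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [("s1", "rat"), ("s2", "mouse")],
    )
    registry = Registry()
    registry.register("samples", Factory(connection))
    return registry


registry = build_demo_registry()
factory = registry.find("samples")     # 1a, 1b
service = factory.create_service()     # 2a, 2b, 2c
xml_result = service.query(            # 3a
    "SELECT label FROM samples WHERE species = ?", ("rat",)
)
print(xml_result)
```

In the real architecture each arrow is a SOAP/HTTP call between separate services; the sketch keeps everything in one process so the order of the steps is easy to follow.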
edikt
The team: 8 professional software engineers, support staff, project manager, commercialisation manager, architect, and SAB
SHEFC funded research and development grant
– 3 years funding: May 2002 – 2005
– +3 years funding upon successful project and review
Edikt project
[Diagram labels: Standards; CS Research; Commercial SW components and skills; E-Science Apps; Java Framework; Requirements analysis; Technology matchmaking; Gap filling; Rigorous engineering; Grid Services for e-Science Data Management]
ELDAS – Data Access Service
Implemented using Enterprise Java Beans
Data Access Components interface to distinct DBMSs
Accessible as a grid data service or a web data service
ELDAS runs anywhere
Suitable for grid & web
[Diagram labels: Web User1; Grid User1; Grid User2; Web Servlet; Grid Proxy; ELDAS EJB-DAS with four DACs; DB2 DB; MySQL DB; Xindice DB; Oracle 9i DB]
BinX – accessing legacy binary data
The Problem:
– Many binary data files
– Applications must “know” the data format
– Binary data formats are machine-specific
The Solution:
– Write a “stand-aside” format description in XML
– Provide a library to
  » Interpret the description
  » Provide file access across different machines
– Build higher-level services
BinX file describes binary file structure
[Diagram labels: e-Science Application; BinX Library; Binary Data Files; simulations]
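The "stand-aside description" idea can be illustrated with a small Python sketch. The XML vocabulary below (`dataset`, `field`, `byteOrder`, the type names) is invented for this example and is not the real BinX schema; the point is only that byte order and field layout live in the description, so the reading code stays the same on any machine.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description format, invented for this sketch.
# It is NOT the real BinX schema.
DESCRIPTION = """
<dataset byteOrder="bigEndian">
  <field name="count" type="int32"/>
  <field name="mean" type="float64"/>
</dataset>
"""

# Map the sketch's type and byte-order names onto struct format codes.
STRUCT_CODES = {"int32": "i", "float64": "d"}
ORDER_CODES = {"bigEndian": ">", "littleEndian": "<"}


def build_layout(description_xml):
    """Interpret the description: return field names and a struct layout."""
    root = ET.fromstring(description_xml)
    order = ORDER_CODES[root.get("byteOrder")]
    fields = root.findall("field")
    names = [f.get("name") for f in fields]
    fmt = order + "".join(STRUCT_CODES[f.get("type")] for f in fields)
    return names, struct.Struct(fmt)


def read_record(description_xml, data):
    """Decode one record from raw bytes using only the description."""
    names, layout = build_layout(description_xml)
    values = layout.unpack(data[:layout.size])
    return dict(zip(names, values))


# Simulate a legacy big-endian file written on another machine.
raw = struct.pack(">id", 3, 2.5)
record = read_record(DESCRIPTION, raw)
print(record)
```

Because the format string carries an explicit byte-order prefix, the same bytes decode identically on little-endian and big-endian hosts, which is the machine-independence the slide asks for.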
NeSC at Glasgow: E-Science Hub
Externally: Glasgow end of NeSC
– Involved in UK wide activities
  » ETF: In May 2003 became first UK e-Science Centre to run integration tests across every site of the UK (Level 2) Grid. Therefore 100% access to UK Grid resources at this time
– Public visibility of NeSC
  » responsible for NeSC web site
Internally: Focal point for e-Science research/activities at Glasgow
Work closely with foundation departments
– Department of Computing Science
– Department of Physics & Astronomy
Also working closely with other groups including
– Bioinformatics Research Centre
– Electronics and Electrical Engineering
– Biostatistics, …
Glasgow e-Science Investment
Major investment by university
230 m² of newly renovated floor space in Kelvin Building
– offices
– access grid facility
– training room, equipped with 20 PCs/server for training courses
Funding Technical Director
Resource Consolidation at Glasgow
Building around ScotGrid
Providing shared Grid resource for wide variety of scientists inside/outside Glasgow
– Particle physicists, computer scientists, electronic engineers, bioinformaticians, …
Focal point, knowledge pool, primary resource for e-Science activity at Glasgow
Target shares
– 60% PP, 20% Bioinf, 20% open share…
Hardware
• 59 IBM X Series 330 dual 1 GHz Pentium III with 2GB memory
• 2 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory
• 3 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and 100 + 1000 Mbit/s ethernet
• 1TB disk
• LTO/Ultrium Tape Library
• Cisco ethernet switches
• IBM X Series 370 PIII Xeon with 32 x 512 MB RAM
• 5TB FastT500 disk 70 x 73.4 GB IBM FC Hot-Swap HDD
• eDIKT 28 IBM blades dual 2.4 GHz Xeon with 1.5GB memory
• eDIKT 6 IBM X Series 335 dual 2.4 GHz Xeon with 1.5GB memory
• CDF 10 Dell PowerEdge 2650 2.4 GHz Xeon with 1.5GB memory
• CDF 7.5TB Raid disk
Shared Resources: Disk ~15TB, CPU ~330 1GHz
[Diagram labels: CDF, LHC, BIO]
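As a rough worked example, the target shares can be applied to the approximate CPU figure quoted for the shared resource. The function name is made up for this sketch, and the result is only indicative, since the slide gives both the shares and the CPU count as approximate targets.

```python
TOTAL_CPUS = 330  # approximate figure quoted on the slide
TARGET_SHARES_PERCENT = {"PP": 60, "Bioinf": 20, "open": 20}


def allocate(total, shares_percent):
    """Split a CPU total by integer percentage shares (hypothetical helper)."""
    return {group: total * pct // 100 for group, pct in shares_percent.items()}


allocation = allocate(TOTAL_CPUS, TARGET_SHARES_PERCENT)
print(allocation)
```

Integer percentages are used rather than fractions so that the split is exact and the three shares sum back to the total.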
Projects with NeSC Glasgow Involvement
DCC: National Digital Curation Centre
AMUSE: Autonomous Management of Ubiquitous Systems for e-Health
P2Popt: Performance measurement & mgt of 2-Layer Peer to Peer NWs…
PGPGrid: Peppers Ghost Productions
Equator: Environmental e-Science Interdisciplinary Research Project
BPS: Biochemical Pathway Simulator
BRIDGES
Overview of BRIDGES
Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)
NeSC (Edinburgh and Glasgow) and IBM
2 year project started 1st October 2003
Supporting project for CFG project
– Generating data on hypertension
– Rat, Mouse, Human genome databases
Variety of tools used
– BLAST, FASTA, MPsrch, BLAT, Gene Prediction, visualisation, …
Variety of data sources and formats
– Microarray data, genome DBs, project partner research data, medical records, …
Aim is integrated infrastructure supporting
– Data federation
– Security
CFG Partner Distribution
[Diagram: partner sites Glasgow, Edinburgh, Leicester, Oxford, London and Netherlands; six "Private data" stores; "Shared data"; "Public curated data"]
Problems specific to Bio-Community
[Chart: PDB Content Growth]
• DBs growing exponentially!!!
• Bibliographic (MedLine, …)
• Amino Acid Seq (SWISS-PROT, …)
• 3D Molecular Structure (PDB, …)
• Nucleotide Seq (GenBank, EMBL, …)
• Biochemical Pathways (KEGG, WIT…)
• Molecular Classifications (SCOP, CATH,…)
• Motif Libraries (PROSITE, Blocks, …)
• …
More genomes …
Arabidopsis thaliana
mouse
rat
Caenorhabditis elegans
Drosophila melanogaster
Mycobacterium leprae
Vibrio cholerae
Plasmodium falciparum
Mycobacterium tuberculosis
Neisseria meningitidis Z2491
Helicobacter pylori
Xylella fastidiosa
Borrelia burgdorferi
Rickettsia prowazekii
Bacillus subtilis
Archaeoglobus fulgidus
Campylobacter jejuni
Aquifex aeolicus
Thermotoga maritima
Chlamydia pneumoniae
Pseudomonas aeruginosa
Ureaplasma urealyticum
Buchnera sp. APS
Escherichia coli
Saccharomyces cerevisiae
Yersinia pestis
Salmonella enterica
Thermoplasma acidophilum
Complexity of Biological Data
[Diagram labels: Nucleotide sequences; Nucleotide structures; Gene expressions; Protein structures; Protein functions; Protein-protein interaction (pathways); Cell; Cell signalling; Tissues; Organs; Physiology; Organisms]
BRIDGES Data Integration/Federation
Local repository being developed
– Populated with data that cannot be federated
  » e.g. public data sets with no programmatic interface
– Shared data sets of CFG scientists
Security through
– X.509 PKI (authentication)
– PERMIS (authorisation)
Will make use of e-Science technologies (OGSA-DAI/DAIT, ELDAS, IBM’s DiscoveryLink)
– Automatically keep data fresh/up to date
Web (Grid) services offered that make use of these local data sets
– For example for visualising, searching, querying, …
Example usage scenario …
System Usage Scenario
[Diagram labels]
– BRIDGES Portal
– Client Site X
– Browser based clients…
– Java App downloaded (via WebStart)
– Secure access for CFG VO
– Authorisation: per user, per site
– Personalised Services
– BLAST, Smith W, SV, DL, OGSA-DAI
– Shared/Private Data Sets
– Secure Data Repository
– Up to date results input to DB
– Remote data in Oracle, DB2, Sybase, Excel, flat files, XML...
– wrappers
– Push relevant data onto ScotGrid for BLAST’ing
– Generic services used by other projects
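The "per user, per site" authorisation in the scenario above can be illustrated with a small Python sketch. It does not use the PERMIS API or real X.509 handling: the access rule (shared sets are readable across the virtual organisation, private sets only from the owning site, with per-user revocation) and all names are assumptions made for this example, and the user is assumed to have been authenticated already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str  # stands in for the subject of an authenticated X.509 certificate
    site: str


@dataclass(frozen=True)
class DataSet:
    label: str
    owner_site: str
    shared: bool


def may_read(user, data_set, revoked_users=frozenset()):
    """Hypothetical rule: shared sets are open to the VO, private sets to their own site."""
    if user.name in revoked_users:  # per-user rule
        return False
    if data_set.shared:
        return True
    return user.site == data_set.owner_site  # per-site rule


# Invented users and data sets, used only to exercise the rule.
alice = User("alice", "Glasgow")
bob = User("bob", "Oxford")
shared = DataSet("shared-results", "Glasgow", shared=True)
private = DataSet("glasgow-private", "Glasgow", shared=False)

decisions = [
    may_read(alice, private),
    may_read(bob, private),
    may_read(bob, shared),
    may_read(bob, shared, revoked_users=frozenset({"bob"})),
]
print(decisions)
```

Keeping the decision in one function separate from authentication mirrors the split on the slides, where the PKI establishes who the user is and a policy engine decides what that user may access.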
Conclusions
NeSC continues to provide leadership in UK e-Science
– Difficult with multitude of scientific research areas, heterogeneity of systems and fluidity of technologies
  » GT2, GT3, WSRF, GT4…?
Closer working with GridPP beneficial for everyone
– move towards Production Grid
– ScotGrid a good model for co-operation
Planning for soft landing through diversification and more integration into university
– MRC bids, BBSRC bids, EPSRC bids, …
– UK e-Science operating as community for upcoming DTI funding opportunities
– Plans for developing Grid Computing teaching modules as part of advanced MSc
Website
National e-Science Centre http://www.nesc.ac.uk/
– Mission, Background, Foundation
– Locations, Staff, Resources, Projects
– Register interest, Mailing lists, NeSCForge
– Regional associations and Collaborations
– News, Notices
– Presentations & Lectures http://www.nesc.ac.uk/presentations/
e-Science Institute http://www.nesc.ac.uk/esi/
– Mission, Events (Future and Past)
– Register for Events, Visitor Programme
UK e-Science
– Map and Index of Centres http://www.nesc.ac.uk/centres/
– Technical Papers http://www.nesc.ac.uk/technical_papers/
– Index of >100 Projects http://www.nesc.ac.uk/projects/
– Task Forces http://www.nesc.ac.uk/teams/
General Information
– Glossary, Bibliography, Who’s who
– E-Science job vacancies
Questions…?