![Page 1: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/1.jpg)
The Global Bio Grid
Andrew Grimshaw
University of Virginia
January, 2006
Virginia Center for Grid Research
![Page 2: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/2.jpg)
• Why Bio Grids?
• Grid Basics
• The Global Bio Grid
![Page 3: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/3.jpg)
In ten years the world will be very different.
![Page 4: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/4.jpg)
Think back ten years.
• No web
• Widespread Internet access was new
• Human Genome Project still far from completion
• Science (biology) done primarily in individual labs
![Page 5: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/5.jpg)
Today
• Billions a year in e-commerce
• Internet everywhere
• Broadband to your home
• Wireless becoming pervasive
• Pervasive devices are proliferating – motes
• Sequencing of organisms a daily event. Bioinformatics hitting the mainstream
![Page 6: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/6.jpg)
![Page 7: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/7.jpg)
Tomorrow
• $1000/sequence for humans – becomes standard clinical practice
• “Biology is becoming an information science” (Large Scale Biomedical Science: Exploring Strategies for Future Research, Institute of Medicine, National Research Council, 2003)
• Global interconnected networks – grids
• Provide transparent, secure access to data, applications, and on-demand compute.
• Research using not just your data, but all trusted data, not just your applications, but any trusted application.
• Implications for progress are significant.
![Page 8: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/8.jpg)
There are a number of “catches”
• So much data!
• So many organizations with so little trust!
• So much complexity!
![Page 9: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/9.jpg)
An IT guy’s view
• Data is all over, in many different forms, with lots of different policies
• Need to get the right data in the right place at the right time
• Ontology problem – how do we compare and integrate the databases?
• Need to understand semantics, automatically transform
• Semantics
• Knowledge Discovery – “mining”
![Page 10: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/10.jpg)
This is where grids enter the picture
(we do the plumbing)
![Page 11: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/11.jpg)
Some lessons learned
• 10+ years in academic and commercial grids
• All/most problems are not technical
• Users don’t want change!
• Too many grids are technology-centric
• Must keep “activation energy” low
• Need a user-centric approach
• There are at least four classes of users
• Wide variance in computational savvy
![Page 12: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/12.jpg)
What is a Grid?
A grid is all about gathering together resources and making them accessible to users and applications.
A grid enables users to collaborate securely by sharing processing, applications, workflows and processes, and data across heterogeneous systems and administrative domains, giving faster application execution and easier access to data.
The emphasis is on secure access to a wide variety of resources.
![Page 13: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/13.jpg)
Characteristics of Grid systems
• Numerous resources
• Ownership by mutually distrustful organizations & individuals
• Potentially faulty resources
• Different security requirements & policies required
• Resources are heterogeneous
• Geographically separated
• Different resource management policies
• Connected by heterogeneous, multi-level networks
![Page 14: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/14.jpg)
![Page 15: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/15.jpg)
What grids are not
• The solution to all problems
• Clusters of machines
• SETI@home
• Any one particular technology
![Page 16: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/16.jpg)
Users’ view
[Diagram: users reach resources at Site 0, Site 1, Site 2, and Site 3 (clusters and HPSS) through the Grid, where they run programs, access data, collaborate, and provide shared services.]
![Page 17: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/17.jpg)
Grid Computing Scenarios
Desktop Cycle Aggregation
• Limited acceptance in commercial enterprises
Cluster Grids
• Single owner, department, project
• Single domain, file system
• LAN connection
Campus/Enterprise Grids
• Multiple owners, domains
• Multiple file systems
• WAN connection
Partner Grids
• Multiple owners, sites, domains
• Multiple file systems
• Internet connectivity
Legion Grid Software – Compute and Data Grid
![Page 18: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/18.jpg)
Standards
• Global Grid Forum – ggf.org
• OGSA – Open Grid Services Architecture
• Web-Services based IPC
• WSRF and possibly others
• OGSA-BES – Basic Execution Service
• OGSA-ByteIO – file I/O
• WS-Naming – abstract name to EPR (endpoint reference)
• RNS-lite – Resource Name Space
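The "abstract name to EPR" idea in the list above can be illustrated with a small sketch. This is not the WS-Naming wire protocol or any real library API. `EndpointReference`, `NameResolver`, the `urn:example:` name, and the example.org addresses are all hypothetical, chosen only to show a stable name being rebound when a resource moves.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EndpointReference:
    """Illustrative stand-in for an EPR: only an address string here."""
    address: str


class NameResolver:
    """Hypothetical resolver mapping a stable abstract name to the current EPR."""

    def __init__(self) -> None:
        self._bindings: Dict[str, EndpointReference] = {}

    def bind(self, abstract_name: str, epr: EndpointReference) -> None:
        # Rebinding an existing name models a resource that has moved.
        self._bindings[abstract_name] = epr

    def resolve(self, abstract_name: str) -> EndpointReference:
        try:
            return self._bindings[abstract_name]
        except KeyError:
            raise LookupError(f"no binding for {abstract_name!r}") from None


resolver = NameResolver()
resolver.bind("urn:example:uniprot", EndpointReference("https://site0.example.org/data/uniprot"))
# The data set migrates to another site; clients keep using the same abstract name.
resolver.bind("urn:example:uniprot", EndpointReference("https://site2.example.org/data/uniprot"))
current = resolver.resolve("urn:example:uniprot")
```

Clients hold only the abstract name, so a moved resource needs one rebinding rather than an update to every caller.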
![Page 19: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/19.jpg)
The Global Bio Grid
![Page 20: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/20.jpg)
GBG concept
• Federated access to multiple
  • Data sources
    • Public databases
    • Commercial databases
    • In-house databases, annotations, etc.
  • Application suites (including processes and workflows)
  • Compute resources
• Shared among collaborative research teams
  • Multiple research locations
  • Virtual organizations
• Built on evolving computing standards (GGF, I3C, WS-*)
![Page 21: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/21.jpg)
Global Bio Grid
• Datagrid using Avaki DG technology
  • Working on making ADG available free for “.edu”
  • UVA, NCBIO, U-Texas, Texas Tech
  • Already operational
  • Flat file and relational
  • Working on an OGSA-compliant implementation
• Compute grid at UVA on-line
  • 64 dual-processor Opterons available
  • Sunfires
  • Hundreds of Windows machines
  • Legion 1.8 based – moving towards OGSA-compliant services
• Applications
  • Biomarker
  • Searching PubMed
  • Hospital info integration
![Page 22: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/22.jpg)
Three resource classes illustrate the Grid-effect
• Data
• Processing
• Applications
![Page 23: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/23.jpg)
Data
• Suppose you have collaborators with critical databases (clinical, protein, other) that you need to use.
• You use a number of databases that change on a regular basis.
• You want to “mine” heterogeneous data sets (relational, flat-file, XML, …) in different locations – say in a hospital.
• You want to produce, consume, or share derivative data products, e.g., the result of a set of joins and data transformation steps.
• This applies to business data (BI/EII) as well as life science data.
![Page 24: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/24.jpg)
DataGrid: Unifying fabric for data access
• Transparent access to multiple DBs
• Multiple domains
• Highly-secure, flexible access control
• Automatic cache management and coherence
[Diagram: a research institution (Biology, Biochemistry; APP 1, APP 2) and two partner institutions share sequence data sets (SEQ_1, SEQ_2, SEQ_3) and public databases (PDB, NCBI, EMBL) through a common Data layer.]
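The slide does not say how "automatic cache management and coherence" is achieved. One common approach is a read-through cache that checks a cheap version stamp before serving a local copy. The sketch below assumes that approach. `CoherentCache`, `get_version`, and `fetch` are hypothetical names, and the `origin` dictionary stands in for a remote database.

```python
from typing import Callable, Dict, Tuple


class CoherentCache:
    """Read-through cache that revalidates against the origin's version stamp."""

    def __init__(self, get_version: Callable[[str], int], fetch: Callable[[str], bytes]) -> None:
        self._get_version = get_version  # cheap call: current version at the origin
        self._fetch = fetch              # expensive call: full copy from the origin
        self._entries: Dict[str, Tuple[int, bytes]] = {}
        self.fetch_count = 0             # how many full copies were transferred

    def read(self, name: str) -> bytes:
        version = self._get_version(name)
        cached = self._entries.get(name)
        if cached is not None and cached[0] == version:
            return cached[1]             # cached copy is still current
        data = self._fetch(name)         # missing or stale: refresh from the origin
        self.fetch_count += 1
        self._entries[name] = (version, data)
        return data


# Toy origin standing in for a partner's database that changes over time.
origin = {"SEQ_1": (1, b"ACGT")}
cache = CoherentCache(lambda n: origin[n][0], lambda n: origin[n][1])

first = cache.read("SEQ_1")
second = cache.read("SEQ_1")       # served locally, no second transfer
origin["SEQ_1"] = (2, b"ACGTTT")   # the partner updates the data set
third = cache.read("SEQ_1")        # version changed, so the cache refetches
```

The version check and the fetch are not atomic here; a production system would need to close that window, which this sketch deliberately ignores.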
![Page 25: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/25.jpg)
Three Concrete Examples
• KDS – “data mining” on widely separated data sets such as PubMed.
• “Map” UniProt datasets into the data grid
• Researchers no longer need to spend time downloading the latest version
• Extended Hospital
![Page 26: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/26.jpg)
Extended Hospital
[Diagram: a hospital made up of several department domains, each with its own data, together with research and a data warehouse, linked to insurance companies, emergency vehicles, clinics / large practices, non-related hospitals, and authorized family.]
![Page 27: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/27.jpg)
Processing
• Classic high-throughput computing
• Suppose you have thousands of computationally intensive jobs to run
• SW, CHARMm, Sequest, a.out
• Your usage is bursty – you need a lot over a short period of time, but often have idle resources
• You wish you had more!
![Page 28: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/28.jpg)
Compute Grid: Shared access to processing
• Flexible, location-independent access to virtually unlimited processing, on-demand
• Scheduling, usage, management policies
• System detects, recovers from job failures
• Heterogeneous platform support
• Usage accounting, as required
[Diagram: a research institution (Biology, Biochemistry; APP 1, APP 2) and two partner institutions share a Data layer (SEQ_1, SEQ_2, SEQ_3; public DBs PDB, NCBI, EMBL) and a Processing layer (Cluster 1, Cluster 2, … Cluster N).]
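The "detects, recovers from job failures" point on this slide can be sketched for a batch of independent jobs. This is a minimal local illustration using Python's standard thread pool, not Legion's scheduler. `run_with_retry`, `make_flaky_job`, and the `blast-N` job names are hypothetical, and a `RuntimeError` is assumed to represent a recoverable node failure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_with_retry(job: Callable[[], str], max_attempts: int = 3) -> str:
    """Run one job, retrying on failure up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return job()
        except RuntimeError as err:  # assumed to mean a recoverable job failure
            last_error = err
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error


def make_flaky_job(name: str, failures: int) -> Callable[[], str]:
    """Build a toy job that fails `failures` times before succeeding."""
    state = {"remaining": failures}

    def job() -> str:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError(f"{name}: simulated node failure")
        return f"{name}: done"

    return job


# Six independent jobs; some fail once or twice before completing.
jobs = [make_flaky_job(f"blast-{i}", failures=i % 3) for i in range(6)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in submission order, whatever order jobs finish in.
    results = list(pool.map(run_with_retry, jobs))
```

A real compute grid would also reschedule the retry onto a different machine; the sketch only shows the detect-and-rerun loop.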
![Page 29: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/29.jpg)
Concrete Examples
• Biomarkers project wants to run Sequest-2 using public databases
• CHARMm/Amber
• Gnomad (Altman et al.)
• BLAST, FASTA, …
• Autodock
![Page 30: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/30.jpg)
Applications
• Suppose you want to use applications or workflows developed, maintained, and supported by others – without the hassle of installing all of them on your gear.
• Suppose you want to couple multiple applications developed at different institutions together.
![Page 31: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/31.jpg)
Grid users share applications, employing multiple data & processing resources
• Flexible binary management
• No need to recompile applications
• Securely share applications
• Restrict who gains access
• Restrict where apps run
[Diagram: a research institution (Biology, Biochemistry; APP 1, APP 2) and two partner institutions share a Data layer (PDB, NCBI, EMBL, SEQ_1 … SEQ_N), a Processing layer (Cluster 1, Cluster 2, … Cluster N), and an Applications layer (APP 1, APP 2, … APP N).]
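The two restrictions named on this slide, who gains access and where applications run, can be pictured as a policy attached to each shared application. The sketch below is an assumption about how such a check might look, not the mechanism used by Legion or Avaki. `AppPolicy`, the user identities, and the site names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AppPolicy:
    """Hypothetical sharing policy attached to one shared application."""
    allowed_users: FrozenSet[str]  # who may invoke the application
    allowed_sites: FrozenSet[str]  # where the owner permits it to execute

    def may_run(self, user: str, site: str) -> bool:
        # Both restrictions must hold: an authorized user on a permitted site.
        return user in self.allowed_users and site in self.allowed_sites


policy = AppPolicy(
    allowed_users=frozenset({"alice@partner.example.org"}),
    allowed_sites=frozenset({"cluster-1", "cluster-2"}),
)

ok = policy.may_run("alice@partner.example.org", "cluster-1")
wrong_site = policy.may_run("alice@partner.example.org", "cluster-n")
wrong_user = policy.may_run("mallory@example.org", "cluster-1")
```

Keeping the two sets separate lets an owner share an application widely while still confining execution to machines it trusts.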
![Page 32: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/32.jpg)
Better Research, Faster
• Secure, wide-area access to global breadth of consistent, current data
• Access to vast processing power
• Ability to securely share proprietary data and applications, as needed
[Diagram: a research institution (Biology, Biochemistry; APP 1, APP 2) and two partner institutions share a Data layer (SEQ_1, SEQ_2, SEQ_3; public DBs PDB, NCBI, EMBL), a Processing layer (Cluster 1, Cluster 2, … Cluster N), and an Applications layer (APP 1, APP 2, … APP N).]
![Page 33: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/33.jpg)
Evolution in action
• 50’s: Bare metal programming
• 60’s to 80’s: Batch OS, multi-user timeshare
• Today: Low-level network programming
• Now & future: Grid & WS
![Page 34: The Global Bio Grid](https://reader038.vdocument.in/reader038/viewer/2022102800/56812a93550346895d8e437e/html5/thumbnails/34.jpg)
Summary
• Grids will have a huge impact on the life sciences
• Prototype GBG operational
• Applications are underway
• We’re always looking for new applications