e-Science and Grid: The VL-e approach
L.O. (Bob) Hertzberger
Computer Architecture and Parallel Systems Group
Department of Computer Science
Universiteit van Amsterdam
-
Content
Developments in Grid
Developments in e-Science
Objectives
Virtual Lab for e-Science
Research philosophy
Conclusions
-
Background: ICT push developments
Processing power doubles every 18 months
Memory size doubles every 12 months
Network speed doubles every 9 months
Something has to be done to harness this development
Virtualization of ICT resources
  Internet
  Web
  Grid
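The doubling periods on this slide imply very different growth over the same time span. A minimal sketch of the arithmetic, assuming idealized exponential growth at exactly the stated doubling periods; the function name `growth_factor` and the 36-month horizon are illustrative choices, not taken from the slides:

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Growth factor over `months`, assuming one doubling per period (idealized)."""
    return 2.0 ** (months / doubling_period_months)


# Doubling periods in months, as stated on the slide.
DOUBLING_PERIODS = {
    "processing power": 18,
    "memory size": 12,
    "network speed": 9,
}

if __name__ == "__main__":
    horizon = 36  # assumed horizon: three years
    for name, period in DOUBLING_PERIODS.items():
        factor = growth_factor(horizon, period)
        print(f"{name}: x{factor:.0f} after {horizon} months")
```

Under these assumptions, network speed grows four times as much as processing power over three years, which is the gap the slide refers to.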
-
Internal versus external bandwidth: computer busses versus networks
Time   Network (Mbit/s)   Bus (Mbit/s)   Bus (MByte/s)
1980   0.0015             128            16
1990   0.064              320            40
1995   4                  640            80
1997   155                1056           132
1998   622                1056           132
2001   2500               4224           528
2002   20000              4224           528
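The figures on this slide can be read as a year-by-year comparison of external (network) and internal (bus) bandwidth. A small sketch that uses the slide's values to find the first year in which the network figure exceeds the bus figure; the function names are hypothetical:

```python
# (year, network Mbit/s, bus Mbit/s), values as given on the slide
BANDWIDTH = [
    (1980, 0.0015, 128),
    (1990, 0.064, 320),
    (1995, 4, 640),
    (1997, 155, 1056),
    (1998, 622, 1056),
    (2001, 2500, 4224),
    (2002, 20000, 4224),
]


def network_to_bus_ratio(rows):
    """Return (year, network / bus) for every row."""
    return [(year, net / bus) for year, net, bus in rows]


def first_year_network_exceeds_bus(rows):
    """First year in which network bandwidth is larger than bus bandwidth, or None."""
    for year, net, bus in rows:
        if net > bus:
            return year
    return None


if __name__ == "__main__":
    for year, ratio in network_to_bus_ratio(BANDWIDTH):
        print(f"{year}: network/bus = {ratio:.5f}")
    print("network overtakes bus in:", first_year_network_exceeds_bus(BANDWIDTH))
```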
-
Web & Grid & Web/Grid Services
Web services is a paradigm/way of using/accessing information
Web resources are mostly human-centered (understood by humans; computers can read them but cannot understand them)
Grid is about accessing & sharing computing resources by virtualization
  Data & information repositories
  Experimental facilities
OGSA: service-oriented Grid standard based on Web services
-
Web & Grid & Semantic Web/Grid
Semantic Web resources should also be understandable by computers
This way, many complex tasks formulated and assigned by humans can be automated and executed by agents working on the Semantic Web
But both Grid and Web Services focus only on single services
Semantic Web/Grid should be able to describe a single service as well as the relationships between services, e.g. an aggregate service using a number of services, thereby making knowledge explicit
-
Levels of Grid abstraction
Computational Grid
Data Grid
Information Web/Grid
Knowledge Web/Grid
-
Background information: experimental sciences
There is a tendency to look ever deeper into:
  Matter, e.g. Physics
  Universe, e.g. Astronomy
  Life, e.g. Life sciences
Therefore experiments become increasingly complex
Instrumental consequences are increases in detector:
  Resolution & sensitivity
  Automation & robotization
This results, for instance in life science, in:
  So-called high-throughput methods
  Omics experimentation: genome ===> genomics
-
New technologies in Life Sciences research (University of Amsterdam)
[Figure labels: cell; Methodology/Technology]
-
Paradigm shift in Life science
Past experiments were hypothesis driven
  Evaluate hypothesis
  Complement existing knowledge
Present experiments are data driven
  Discover knowledge from large amounts of data
  Apply statistical techniques
-
Background information: experimental sciences
Experiments become increasingly complex
Driven by detector developments
  Resolution increases
  Automation & robotization increases
Results in an increase in amount and complexity of data
-
The Application data crisis
Scientific experiments start to generate lots of data
  Medical imaging (fMRI): ~1 GByte per measurement (day)
  Bio-informatics queries: 500 GByte per database
  Satellite world imagery: ~5 TByte/year
  Current particle physics: 1 PByte per year
  LHC physics (2007): 10-30 PByte per year
Data is often very distributed
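To give a feeling for what these yearly volumes mean for a distributed infrastructure, they can be converted into the sustained network rate needed to move the data once. This is a rough sketch under stated assumptions: decimal units (1 PByte = 10^15 bytes), a 365-day year, and a perfectly even data rate; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year, no downtime

TBYTE = 10 ** 12  # assumption: decimal units
PBYTE = 10 ** 15


def sustained_gbit_per_s(bytes_per_year: float) -> float:
    """Average rate in Gbit/s needed to transfer `bytes_per_year` within one year."""
    bits_per_second = bytes_per_year * 8 / SECONDS_PER_YEAR
    return bits_per_second / 1e9


if __name__ == "__main__":
    # Yearly volumes as quoted on the slide.
    volumes = {
        "Satellite world imagery (5 TByte/year)": 5 * TBYTE,
        "Current particle physics (1 PByte/year)": 1 * PBYTE,
        "LHC physics, lower bound (10 PByte/year)": 10 * PBYTE,
        "LHC physics, upper bound (30 PByte/year)": 30 * PBYTE,
    }
    for label, volume in volumes.items():
        print(f"{label}: {sustained_gbit_per_s(volume):.4f} Gbit/s sustained")
```

With these assumptions the LHC estimate corresponds to a continuous stream of a few Gbit/s, before any replication to other sites is counted.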
-
Background information: experimental sciences
Experiments become increasingly complex
Driven by detector developments
  Resolution increases
  Automation & robotization increases
Results in an increase in amount and complexity of data
Something has to be done to harness this development
Virtualization of experimental resources: e-Science
-
The what of e-Science
e-Science is the application domain "Science" of Grid & Web
More than only coping with the data explosion
A multi-disciplinary activity combining human expertise & knowledge between:
  A particular domain scientist
  ICT scientist
e-Science demands a different approach to experimentation because the computer is an integrated part of the experiment
Consequence is a radical change in design for experimentation
e-Science should apply and integrate Web/Grid methods wherever and whenever possible
-
Grid and Web Services Convergence
[Figure labels: Grid; Web]
The definition of the Web Service Resource Framework (WSRF) makes an explicit distinction between the service and the stateful entities acted upon by the service, i.e. the resources
This means that Grid and Web communities can move forward on a common base!
Ref: Foster
-
e-Science Objectives
It should enhance the scientific process by:
  Stimulating collaboration by sharing data & information
    Result is re-use of data & information
-
The data sharing potential for Cognition
Collaborative scientific research
  Information sharing
  Metadata modeling
Allows for experiment validation
  Independent confirmation of results
Statistical methodologies
  Access to large collections of data and metadata
Training
  Train the next generation using peer reviewed publications and the associated data
-
Electron tomography data pipeline
Acquisition
Alignment
Reconstruction
Segmentation
Interpretation
-
Cell Centred Database
NCMIR (San Diego)
Maryann Martone and Mark Ellisman
-
e-Science Objectives
It should enhance the scientific process by:
  Stimulating collaboration by sharing data & information
    Improve re-use of data & information
  Combining data and information from different modalities
    Sensor data & information fusion
-
The anatomy of a painting: Scrutinizing Rembrandt in multiple dimensions
-
[Figure: VL chemical imaging data experiment. Labels: Experiment 1, Experiment 2; VL Exp. 1, VL Exp. 2; VL Result 1, VL Result 2; VL Exp. 3 + 4: metadata correlation analysis (CCA); VL Result 3, VL Result 4; to metadata database]
-
[Figure: metadata generation. Labels: 300mm x 300mm, 256 x 256 pixels, 3-D dataset; 400mm x 400mm, 64 x 64 pixels, 3-D dataset; EXP 1, EXP 1; VL EXP 1, VL EXP 2]
-
[Figure: VL EXP 3 and VL EXP 4. Labels: score images (Pos, Neg) with reconstructed MS; score images (Pos, Neg) with reconstructed FTIR spectra; R = 0.858; R = 0.618]
Improved FTIR spectra, S/N >100% increase
-
e-Science Objectives
It should enhance the scientific process by:
  Stimulating collaboration by sharing data & information
    Improve re-use of data & information
  Combining data and information from different modalities
    Sensor data & information fusion
  Realize the combination of real life & (model based) simulation experiments
-
Simulated Vascular Reconstruction in a Virtual Operating Theatre
An example of the combination of real life & (model based) simulation experiments
  Patient-specific vascular geometry (from CT)
  Segmentation
  Blood flow simulation (Lattice Boltzmann)
  Pre-operative planning (interaction)
Suitable for parallelization through functional decomposition
[Figure labels: simulated Fem-Fem bypass; patient's vascular geometry (CTA)]
-
e-Science Objectives
It should enhance the scientific process by:
  Stimulating collaboration by sharing data & information
    Improve re-use of data & information
  Combining data and information from different modalities
    Sensor data & information fusion
  Realize the combination of real life & (model based) simulation experiments
    Modeling of dynamic systems
-
Dynamic bird behaviour
[Figure labels: MODELS; bird distributions; ensembles; calibration and data assimilation; predictions and on-line warnings; bird behaviour in relation to weather and landscape]
-
e-Science Objectives
It should result in:
  Computer aided support for rapid prototyping of ideas
  Stimulate the creativity process
It should realize that by creating & applying:
  New ICT methodologies and a computing infrastructure stimulating this
From this ICT point of view it should support the following application steps:
  Design
  Development & realization
  Execution
  Analysis & interpretation
We try to realize e-Science and its applications via the Virtual Lab for e-Science (VL-e) project
-
Virtual Lab for e-Science research Philosophy
Multidisciplinary research & the development of related ICT infrastructure
Generic application support
Application cases are drivers for computer & computational science and engineering research
-
[Figure: VL-e project. Labels: Food Informatics; Dutch Telescience; Medical Diagnosis & Imaging; Bio-Informatics; Data Intensive Science/LOFAR; Bio-Diversity; Data intensive sciences; VL-e Application Oriented Services; Management of comm. & computing]
-
Two sides of Bioinformatics as an e-Science
The scientific responsibility to develop the underlying computational concepts and models to convert complex biological data into useful biological and chemical knowledge
The technological responsibility to manage and integrate huge amounts of heterogeneous data sources from high throughput experimentation
-
Role of bioinformatics
[Figure labels: cell; methodology; bioinformatics]
-
Virtual Lab for e-Science research Philosophy
Multidisciplinary research and development of related ICT infrastructure
Generic application support
Application cases are drivers for computer & computational science and engineering research
Problem solving partly generic and partly specific
Re-use of components via generic solutions whenever possible
-
[Figure: Virtual Laboratory Application Oriented Services. Three applications, each labeled with: application specific part; potential generic part; management of comm. & computing]
-
Generic e-Science aspects
Virtual Reality Visualization & user interfaces
Modeling & Simulation
Interactive Problem Solving
Data & information management
Data modeling
Dynamic work flow management
Content (knowledge) management
Semantic aspects
Meta data modeling
Ontologies
Wrapper technology
Design for Experimentation
-
Virtual Lab for e-Science research Philosophy
Multidisciplinary research and development of related ICT infrastructure
Generic application support
Application cases are drivers for computer & computational science and engineering research
Problem solving partly generic and partly specific
Re-use of components via generic solutions whenever possible
Rationalization of experimental process
Reproducible & comparable
-
Issues for a reproducible scientific experiment
[Figure labels: experiment; interpretation]
Much of this is lost when an experiment is completed.
-
Scientific experiments & e-Science
Complex experiments:
  contain complex processes
  require interdisciplinary expertise
  need large scale resources
For complex experiments: Grid & high level support
-
Components in a VL-e experiment
Experiment Topology
  Graphical representation of self-contained data processing modules attached to each other in a workflow
Process-Flow Templates (PFT)
  Derived from ontologies
  Graphical representation of data elements and processing steps in an experimental procedure
  Information to support context-sensitive assistance (semantics)
Study
  Descriptions of experimental steps represented as an instance of a PFT with references to experiment topologies
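The relationship between the three components on this slide can be illustrated with a small data model. This is only a sketch: the class and field names are hypothetical and do not reflect the actual VL-e software; the example step names are borrowed from the electron tomography pipeline slide, and the module names and description text are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ExperimentTopology:
    """Data processing modules attached to each other in a workflow."""
    name: str
    modules: List[str]
    connections: List[Tuple[str, str]] = field(default_factory=list)  # (producer, consumer)


@dataclass
class ProcessFlowTemplate:
    """Data elements and processing steps of an experimental procedure."""
    name: str
    steps: List[str]
    assistance: Dict[str, str] = field(default_factory=dict)  # step -> context help


@dataclass
class Study:
    """An instance of a PFT: step descriptions plus references to topologies."""
    template: ProcessFlowTemplate
    descriptions: Dict[str, str] = field(default_factory=dict)
    topologies: Dict[str, ExperimentTopology] = field(default_factory=dict)

    def missing_steps(self) -> List[str]:
        """Template steps that this study has not described yet."""
        return [s for s in self.template.steps if s not in self.descriptions]


# Illustrative values only.
pft = ProcessFlowTemplate(
    name="electron-tomography",
    steps=["Acquisition", "Alignment", "Reconstruction", "Segmentation", "Interpretation"],
)
study = Study(template=pft)
study.descriptions["Acquisition"] = "Tilt series recorded on the microscope"
study.topologies["Reconstruction"] = ExperimentTopology(
    name="reconstruction-topology",
    modules=["read_tilt_series", "back_project"],
    connections=[("read_tilt_series", "back_project")],
)
print(study.missing_steps())
```

Keeping the template separate from the study is what would allow one PFT to be instantiated many times, which matches the re-use goal stated elsewhere in the talk.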
-
Step 1: designing an experiment
Step 2: performing the experiment
Step 3: analyzing the experiment results
Taverna, Kepler, and Triana: model processes for computing tasks.
In VL-e: model both computing tasks and human activity based processes, and model them from the perspective of an entire lifecycle. It tries to support
  Collaboration in different stages
  Information sharing
  Reuse of experiment
[Figure labels: e-Science environment; Scientific Workflow Management Systems; Process Flow Template; PFT in VLAMG; PFT instances in VLAMG; success]
-
Definition of experiment protocols
Workflow definitions
  Recreate complex experiments into process flows
Workflow execution
  Maintain control over the experiment
Data processing execution
  Topologies of data processing modules
Interpretation
  Visualization of processed results to help intuition
Ontology definitions can help in obtaining a well-structured definition of experiment, data and metadata.
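Executing a topology of data processing modules amounts to running each module after the modules it depends on. The following is a minimal sketch of that idea using Python's standard `graphlib`; it is not how VLAM or any other workflow system is implemented, and the function `run_topology`, the module names, and the toy processing steps are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

Module = Callable[[List[float]], List[float]]


def run_topology(
    modules: Dict[str, Module],
    upstream: Dict[str, Set[str]],
    source_data: List[float],
) -> Dict[str, List[float]]:
    """Run every module once, after all of its upstream modules.

    `upstream` maps a module name to the names it consumes from.
    Modules without upstream modules receive `source_data`.
    """
    results: Dict[str, List[float]] = {}
    for name in TopologicalSorter(upstream).static_order():
        inputs = upstream.get(name, set())
        if inputs:
            # Concatenate upstream outputs in a deterministic (sorted) order.
            data = [x for up in sorted(inputs) for x in results[up]]
        else:
            data = source_data
        results[name] = modules[name](data)
    return results


# Toy three-stage topology: acquire -> normalize -> threshold.
modules = {
    "acquire": lambda d: list(d),
    "normalize": lambda d: [x / max(d) for x in d],
    "threshold": lambda d: [x for x in d if x >= 0.5],
}
upstream = {"normalize": {"acquire"}, "threshold": {"normalize"}}

results = run_topology(modules, upstream, [1.0, 2.0, 4.0])
print(results["threshold"])
```

Because every intermediate result is kept in `results`, the same structure could be used to record what happened at each step, which is one way to retain control over the experiment as the slide asks.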
-
Virtual Lab for e-Science research Philosophy
Multidisciplinary research and development of related ICT infrastructure
Generic application support
Application cases are drivers for computer & computational science and engineering research
Problem solving partly generic and partly specific
Re-use of components via generic solutions whenever possible
Rationalization of experimental process
Reproducible & comparable
Two research experimentation environments
  Proof of concept for application experimentation
  Rapid prototyping for computer & computational science experimentation
-
The VL-e infrastructure
[Figure labels: Application specific service; Application; Potential Generic service & Virtual Lab. services; Grid & Network Services; Grid Middleware; Surfnet; Virtual Laboratory]
[Figure labels: VL-e Proof of Concept Environment: Telescience, Medical Application, Bio Informatics Applications]
[Figure labels: VL-e Experimental Environment: Virtual Lab. rapid prototyping (interactive simulation), Additional Grid Services (OGSA services), Network Service (lambda networking)]
[Figure labels: VL-e Certification Environment: Test & Cert. Compatibility, Test & Cert. Grid Middleware, Test & Cert. VL-software]
-
Infrastructure for Applications
Applications are a driving force of the PoC
Experience shows applications value stability
Foster two-way interaction to make this happen
-
Taverna, Kepler, Triana and VLAM.
Step 1: designing an experiment
Step 2: performing the experiment
Step 3: analyzing the experiment results
[Figure labels: e-Science environment; PFT; Scientific Workflow Management Systems (SWMS); research activity to be developed in the Rapid Prototyping environment; stable developments to be used in VL-e in the Proof of Concept environment; success]
-
VL-e PoC environment
Latest certified stable software environment of core grid and VL-e services
Core infrastructure built around clusters and storage at SARA and NIKHEF (production quality)
Controlled extension to other platforms and distributions
On the user end: install needed servers: user interface systems, storage elements for data disclosure, grid-secured DB access
Focus on stability and scalability
-
Hosted services for VL-e
Key services and resources are offered centrally for all applications in VL-e
  Mass data and number crunching on the large resources at SARA
  Storage for data replication & distribution
  Persistent strategic storage on tape
  Resource brokers, resource discovery, user group management
-
Why such a complex scheme?
Software is part of the infrastructure
Stability of core software is needed to develop the new scientific applications
Enable distributed systems management (who runs what version when?)
The grid is one big error amplifier
Computers make mistakes like humans, only much, much faster
-
What did we learn
It is not enough to just transport current applications to Grid or e-Science infrastructures
To fully exploit the potential of e-Science infrastructures one has to learn what is possible
Therefore the full lifecycle of an experiment has to be taken into account
  Workflow management is a first step
  We add semantic information via Process Flow Templates
Application innovation, for instance bio-banking, should be the aim
Grids should be transparent for the end-user
-
Conclusions
e-Science is a lot more than trying to cope with the data explosion alone
Implementation of e-Science systems requires further rationalization and standardization of the experimentation process
e-Science success demands the realization of an environment allowing application driven experimentation & rapid dissemination of feedback on these new methods
We try to do that via the development of a Proof of Concept based on Grid
-
Electron tomography data pipeline development
TOM (Matlab) acquisition and data storage with the SRB
SRB storage
  With the CCDB database for retrieval of experimental data sets
Grid-computing 3D reconstruction refinements
  Currently for EMAN and later also for TOM (Matlab)
[Figure labels: SARA - Amsterdam; 1 Gbit/s]
Here the role of bioinformatics has to be introduced
Data presentation is essential
By virtualizing experimental resources they can become part of a computing network necessary for computer experimentation
Mention the essential paradigm shift
Genomics has to be realized via bioinformatics
Cognition via sharing of data & databasing
Mention the problem of making data comparable and experiments reproducible
Combination of in vitro/in vivo with in silico
  PET/fMRI
  Transcriptomics/proteomics
  Huntington example fMRI
Example of new, more exact models and the need to interact with your computing experiment
Functional decomposition
The different components each run on their own processor and communicate their results to each other.
Mention the tension between the necessity of rationalization and the demand for flexibilization
Our approach: keep application and supporting research together
Example: bioinformatics
Bioinformatics & biogrid
Information management intensive science
Mention the first requirement to come to a system-wide approach towards biology, including modeling of cell processes
How easy is all of this?
For the end-users, VL has a friendly user interface which, in theory, allows them to perform all the tasks we have discussed so far without writing a single program or script. The PFT is completely graphical and presents the needed information to the user on demand (usually by a simple click of the mouse button). It communicates with all the other software components of the VL toolkit and converts their output to a human-readable form (filling forms, sending queries, composing a data process flow).
In reality the end-user does not use the PFT itself; he creates an instance and tunes it. As I said, one of the tasks to be specified is the data flow process: where is the data coming from? Which processes need to be applied? On which resource should the processes be executed? And where is the outcome saved? Mostly this is represented in the PFT as a simple box; by just clicking in the box, the user is provided with another graphical interface, which allows him to compose the process flow of the d