the seek ecogrid: a data grid system for ecology
DESCRIPTION
UC Davis Department of Computer Science. San Diego Supercomputer Center. The SEEK EcoGrid: A Data Grid System for Ecology. Arcot Rajasekar ([email protected]) Matthew Jones ([email protected]) Bertram Ludäscher ([email protected]). Large collaborative NSF/ITR (2002-2007)
The SEEK EcoGrid: A Data Grid System for Ecology
Arcot Rajasekar ([email protected])
Matthew Jones ([email protected])
Bertram Ludäscher ([email protected])
UC Davis Department of Computer Science
San Diego Supercomputer Center
Science Environment for Ecological Knowledge
Large collaborative NSF/ITR (2002-2007)
Bringing together ecologists, IT experts, CS researchers, …
SEEK.ecoinformatics.org
http://seek.ecoinformatics.org
SWDB Aug 29, 2004
What is SEEK?
• Multidisciplinary research project to facilitate …
• Access to ecological, environmental, and biodiversity data
  – Enable data sharing & re-use
  – Enhance data discovery at global scales
• Scalable analysis and synthesis
  – Taxonomic, spatial, temporal, and conceptual integration of data, addressing data heterogeneity issues
  – Enable communication and collaboration for analysis
  – Enable re-use of analytical components
SEEK Components
Main Components:
• Kepler
  – Problem-solving environment for scientific data analysis and visualization: “scientific workflows”
• EcoGrid
  – Distributed data network for environmental, ecological, and systematics data
  – Making diverse environmental data systems interoperate
• Semantic Mediation System
  – “Smart” data discovery and integration
• Knowledge Representation WG
• Taxon WG
• BEAM WG
• Education, Outreach, Training
Ecological Metadata Language
• Metadata: a means to manage ecological data
  – There is no universal data model for ecology
  – Accommodate heterogeneity and dispersion
• EML
  – Common language for archiving and transporting data
  – Discovery information: Creator, Title, Abstract, Keyword, etc.
  – Content
  – Context
  – Physical, logical structure
• SEEK adds semantic structure
An Example EML Document
<?xml version="1.0"?>
<eml:eml packageId="piscoUCSB.5.20" system="knb"
    xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
  <dataset>
    <shortName>Alegria Temperatures</shortName>
    <title>PISCO: Intertidal Temperature Data: Alegria, California: 1996-1997</title>
    <creator id="C.Blanchette">
      <individualName>
        <givenName>Carol</givenName>
        <surName>Blanchette</surName>
      </individualName>
      <organizationName>PISCO</organizationName>
      <address>
        <deliveryPoint>UCSB Marine Science Institute</deliveryPoint>
        <city>Santa Barbara</city>
        <administrativeArea>CA</administrativeArea>
        <postalCode>93106</postalCode>
      </address>
    </creator>
    <abstract>
      <para>These temperature data were collected at Alegria Beach, California, and were ... </para>
    </abstract>
    <keywordSet>
      <keyword>OceanographicSensorData</keyword>
      <keyword>Thermistor</keyword>
      <keywordThesaurus>PISCOCategories</keywordThesaurus>
    </keywordSet>
    <intellectualRights>
      <para>Please contact the authors for permission to use these data. Please also acknowledge the authors in any publications.</para>
    </intellectualRights>
    <contact>
      <references>C.Blanchette</references>
    </contact>
  </dataset>
</eml:eml>
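To illustrate how the discovery information in such a document could be read by a client, here is a minimal Python sketch using only the standard library. The function name `discovery_info` and the abridged copy of the document are assumptions made for this illustration; this is not how Morpho or Metacat process EML.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example EML document (address, abstract, and
# rights statements are left out to keep the sketch short).
EML_DOC = """<?xml version="1.0"?>
<eml:eml packageId="piscoUCSB.5.20" system="knb"
         xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
  <dataset>
    <shortName>Alegria Temperatures</shortName>
    <title>PISCO: Intertidal Temperature Data: Alegria, California: 1996-1997</title>
    <creator id="C.Blanchette">
      <individualName>
        <givenName>Carol</givenName>
        <surName>Blanchette</surName>
      </individualName>
      <organizationName>PISCO</organizationName>
    </creator>
    <keywordSet>
      <keyword>OceanographicSensorData</keyword>
      <keyword>Thermistor</keyword>
      <keywordThesaurus>PISCOCategories</keywordThesaurus>
    </keywordSet>
  </dataset>
</eml:eml>"""


def discovery_info(eml_text):
    """Return the discovery fields (hypothetical helper, not part of EML tooling)."""
    root = ET.fromstring(eml_text)
    # In this example only the root element is namespace-qualified,
    # so the children can be addressed by their plain names.
    dataset = root.find("dataset")
    creators = [
        f"{c.findtext('individualName/givenName')} {c.findtext('individualName/surName')}"
        for c in dataset.findall("creator")
    ]
    return {
        "packageId": root.get("packageId"),
        "title": dataset.findtext("title"),
        "creators": creators,
        "keywords": [k.text.strip() for k in dataset.findall("keywordSet/keyword")],
    }
```

Calling `discovery_info(EML_DOC)` yields a plain dictionary that a search interface could index or display.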
Transform
SEEK Overview
Ecogrid Focus
• Data and Metadata
  – Distributed Data
  – XML-based Metadata
• Service to Semantic Mediation Layer
  – Access to Ontologies and Taxon Services
  – Helping with Semantic Data Integration
• Service to Analysis and Modelling Layer
  – Interaction with Kepler - Workflows
  – Interaction with Grid Computing Facilities
• Access to Legacy Apps
  – LifeMapper
  – Spatial Data Workbench
SEEK EcoGrid
• Goal: allow diverse environmental data systems to interoperate
  – Hides complexity of underlying systems using lightweight interfaces
  – Integrate diverse data networks from ecology, biodiversity, and environmental sciences
• Data systems
  – Any system can implement these interfaces
  – Prototyping using Metacat, SRB, DiGIR, Xanthoria, etc.
• Supports multiple metadata standards
  – EML, Darwin Core as foci
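The idea that any system can implement a lightweight common interface can be sketched as a pair of wrappers around two very different back ends. Everything below is hypothetical: the back-end classes are toy stand-ins (not the Metacat or DiGIR APIs), and only the operation names `query` and `get` are taken from the slides.

```python
from typing import List, Protocol


class EcoGridInterface(Protocol):
    """Common interface assumed for this sketch."""

    def query(self, session: str, query: str) -> List[str]: ...

    def get(self, session: str, identifier: str) -> bytes: ...


class DocumentStoreBackend:
    """Hypothetical native system holding XML documents by id."""

    def __init__(self, docs):
        self.docs = docs

    def search_text(self, needle):
        return [doc_id for doc_id, text in self.docs.items() if needle in text]


class RecordStoreBackend:
    """Hypothetical native system holding flat records (dicts)."""

    def __init__(self, rows):
        self.rows = rows

    def select(self, value):
        return [row for row in self.rows if value in row.values()]


class DocumentStoreWrapper:
    """Exposes the document store through the common interface."""

    def __init__(self, backend):
        self.backend = backend

    def query(self, session, query):
        return self.backend.search_text(query)

    def get(self, session, identifier):
        return self.backend.docs[identifier].encode("utf-8")


class RecordStoreWrapper:
    """Exposes the record store through the same interface."""

    def __init__(self, backend):
        self.backend = backend

    def query(self, session, query):
        return [row["id"] for row in self.backend.select(query)]

    def get(self, session, identifier):
        row = next(r for r in self.backend.rows if r["id"] == identifier)
        return ";".join(f"{k}={v}" for k, v in row.items()).encode("utf-8")


def search_all(nodes: List[EcoGridInterface], session: str, query: str) -> List[str]:
    """Client code sees only the common interface, never the back ends."""
    return [ident for node in nodes for ident in node.query(session, query)]
```

The client function `search_all` never needs to know which kind of system sits behind each node, which is the point of the wrapper approach.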
Web services
• Service Oriented Architecture (SOA)
  – Remote discovery and execution of services
• Network transport of data (HTTP)
• Message format (SOAP/XML)
• Service interface description (WSDL)
[Diagram: Web services architecture with numbered steps 1 to 3; Morpho appears as a labeled component. Diagram from http://www.w3.org/TR/2002/WD-ws-arch-20021114/]
Grid Services
• A Grid service is a Web service, plus:
  • Lifecycle management (persisting the service over outages)
  • State management (tracking sessions across multiple requests)
  • Factory services (allowing many clients to connect)
  • Security (authorization)
  • …
EcoGrid defines a standard set of grid interfaces for use by many data servers
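The factory and state-management ideas can be illustrated with a small in-process sketch. The class names are invented for this example and it does not use any Grid toolkit; it only shows a factory handing each client its own stateful instance.

```python
import itertools


class QueryServiceInstance:
    """One stateful service instance, created for a single client session."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.history = []  # state kept across requests in this session

    def query(self, text):
        self.history.append(text)
        return f"session {self.session_id}: request {len(self.history)}"


class QueryServiceFactory:
    """Creates a fresh instance per client, in the spirit of a factory service."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}

    def create(self):
        instance = QueryServiceInstance(next(self._ids))
        self.instances[instance.session_id] = instance
        return instance

    def destroy(self, session_id):
        # Lifecycle management: explicitly release an instance.
        self.instances.pop(session_id, None)
```

Two clients that each call `create()` get separate instances, so one client's request history never leaks into the other's.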
EcoGrid Example
[Diagram: Morpho, the EcoGrid Registry, and an EcoGrid service exposing query() and get()]
EcoGrid WSDL:
  query(session, query)
  get(session, identifier)
Steps:
1. Publish
2. Find service
3. Return service description
4. Execute search, handle response
5. Execute get, handle response
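The five steps can be walked through with a toy, in-memory registry and node. This is a sketch under simplifying assumptions: there is no network, WSDL, or SOAP involved, the class names are made up, and the substring match stands in for a real query.

```python
class EcoGridNode:
    """Toy data node exposing the two operations named in the EcoGrid WSDL."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # identifier -> content

    def query(self, session, query):
        # Return identifiers whose content contains the query string.
        return [ident for ident, content in self._records.items()
                if query.lower() in content.lower()]

    def get(self, session, identifier):
        return self._records[identifier]


class EcoGridRegistry:
    """Toy registry mapping service names to nodes."""

    def __init__(self):
        self._services = {}

    def publish(self, node):
        # Step 1: a node publishes its service.
        self._services[node.name] = node

    def find(self, name):
        # Steps 2 and 3: the client looks up a service and gets it back.
        return self._services[name]


def client_session(registry, node_name, query):
    node = registry.find(node_name)                      # steps 2-3
    session = "demo-session"                             # placeholder session id
    hits = node.query(session, query)                    # step 4
    return {ident: node.get(session, ident) for ident in hits}  # step 5
```

In the real system the registry would return a service description rather than the object itself; returning the object keeps the sketch self-contained.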
EcoGrid Query Interfaces
• Provides a mechanism for search and retrieval of metadata and federated data
  – Supports third party interaction with search results
    • forwarding of result set identifiers to another service instance for retrieval
• Different levels of compliance
  – Low barrier for participation
  – Bulk of data will be accessible through Type I ResultQuery
EcoGrid Query Level I
• Basic, entry-level exposure of data and metadata for EcoGrid and SEEK
• Response contains data: intended for direct communications rather than 3rd party indirection
ResultsetType query(SessionID, QueryType)
byte[] get(SessionID, objectID)
Query Conditions
• Language-independent representation of a query structure
• Transformed into the appropriate native language of the data store
Example:
<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>
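As one possible illustration of "transformed into the appropriate native language of the data store", the sketch below turns the condition tree into a parameterized SQL WHERE clause and runs it against an in-memory SQLite table. The operator table, the table name `specimens`, and the treatment of `NULL` are assumptions for this example; the slides do not describe how the real wrappers translate queries.

```python
import sqlite3
import xml.etree.ElementTree as ET

CONDITIONS = """<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>"""

# Assumed operator vocabulary; the slides show only LIKE and NOT EQUALS.
SQL_OPERATORS = {"LIKE": "LIKE", "EQUALS": "=", "NOT EQUALS": "<>"}


def to_where(node):
    """Translate a condition tree into (where_clause, parameters)."""
    if node.tag in ("AND", "OR"):
        parts = [to_where(child) for child in node]
        clause = f" {node.tag} ".join(f"({sql})" for sql, _ in parts)
        return clause, [p for _, params in parts for p in params]
    if node.tag != "condition":
        raise ValueError(f"unsupported query element: {node.tag}")
    concept = node.get("concept")
    operator = node.get("operator")
    # Concept names become column names, so accept plain identifiers only.
    if concept is None or not concept.isidentifier() or operator not in SQL_OPERATORS:
        raise ValueError("unsupported condition")
    value = (node.text or "").strip()
    if value == "NULL" and operator == "EQUALS":
        return f"{concept} IS NULL", []
    if value == "NULL" and operator == "NOT EQUALS":
        return f"{concept} IS NOT NULL", []
    return f"{concept} {SQL_OPERATORS[operator]} ?", [value]


def run_query(rows, where, params):
    """Run the translated clause against a throwaway SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE specimens (ScientificName TEXT, DecimalLatitude REAL)")
    con.executemany("INSERT INTO specimens VALUES (?, ?)", rows)
    result = con.execute(
        f"SELECT ScientificName FROM specimens WHERE {where}", params
    ).fetchall()
    con.close()
    return result
```

Values travel as bound parameters rather than being pasted into the SQL string, which keeps the translation safe against injection from query text.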
Specifying the Resultset
• Specify the list of concepts (fields) to be returned in the resultset
• Simple paths used to identify elements or document subtrees
• Effectively flattens the structure of the records, but allows generic representation
Example:
<returnfield>/ScientificName</returnfield>
<returnfield>/Longitude</returnfield>
<returnfield>/Latitude</returnfield>
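The flattening effect of return fields can be sketched as follows. The sample record, the function name `flatten`, and the choice to resolve paths relative to the record's root element are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested record; real records depend on the data source.
RECORD = """<record>
  <ScientificName>Peromyscus leucopus</ScientificName>
  <Location><Longitude>-121.7</Longitude><Latitude>38.5</Latitude></Location>
</record>"""


def flatten(record_xml, return_fields):
    """Map each simple path to a value, keyed by the last path step."""
    record = ET.fromstring(record_xml)
    flat = {}
    for path in return_fields:
        element = record.find(path.lstrip("/"))
        name = path.rsplit("/", 1)[-1]
        if element is None:
            flat[name] = None  # field not present in this record
        elif len(element):
            # The path names a subtree: keep it as serialized XML.
            flat[name] = ET.tostring(element, encoding="unicode").strip()
        else:
            flat[name] = (element.text or "").strip()
    return flat
```

For example, `flatten(RECORD, ["/ScientificName", "/Location/Latitude"])` returns a flat dictionary with the keys `ScientificName` and `Latitude`, regardless of how deeply the values were nested.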
Full Query Example
<egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
  <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
  <returnfield>/ScientificName</returnfield>
  <returnfield>/Longitude</returnfield>
  <returnfield>/Latitude</returnfield>
  <title>Peromyscus genus query</title>
  <condition operator="LIKE" concept="Genus">Peromyscus</condition>
</egq:query>
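A client could assemble such a query document programmatically. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `build_query` and its parameters are hypothetical, the element order simply copies the example, and the `xsi:schemaLocation` attribute is left out.

```python
import xml.etree.ElementTree as ET

EGQ_NS = "ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
ET.register_namespace("egq", EGQ_NS)  # serialize the root as egq:query


def build_query(query_id, system, title, return_fields, conditions, namespace=None):
    """Build a query document shaped like the example (hypothetical helper).

    conditions: list of (operator, concept, value) tuples
    namespace:  optional (prefix, uri) pair for the metadata namespace
    """
    root = ET.Element(f"{{{EGQ_NS}}}query", {"queryId": query_id, "system": system})
    if namespace is not None:
        prefix, uri = namespace
        ET.SubElement(root, "namespace", {"prefix": prefix}).text = uri
    for path in return_fields:
        ET.SubElement(root, "returnfield").text = path
    ET.SubElement(root, "title").text = title
    for operator, concept, value in conditions:
        element = ET.SubElement(root, "condition",
                                {"operator": operator, "concept": concept})
        element.text = value
    return ET.tostring(root, encoding="unicode")
```

Building the document through an XML library rather than string concatenation means attribute values and text are escaped correctly.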
Query Result Set Structure
<rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <startRecord>1</startRecord>
    <endRecord>2</endRecord>
    <recordCount>2</recordCount>
    <namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
    <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
  </resultsetMetadata>
  <record number="1" system="1" identifier="mvz1">
    <returnField name="ScientificName">PEROMYSCUS LEUCOPUS</returnField>
    <returnField name="Longitude">100</returnField>
    <returnField name="Latitude">200</returnField>
  </record>
  …
</rs:resultset>
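On the receiving side, a client might turn a result set into plain dictionaries. The sketch below parses an abridged copy of the example (metadata counters and the elided records are omitted); the function name `parse_resultset` and the output layout are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example result set, with one record.
RESULTSET = """<rs:resultset resultsetId="foo.1.1"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
  <resultsetMetadata>
    <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
  </resultsetMetadata>
  <record number="1" system="1" identifier="mvz1">
    <returnField name="ScientificName">PEROMYSCUS LEUCOPUS</returnField>
    <returnField name="Longitude">100</returnField>
    <returnField name="Latitude">200</returnField>
  </record>
</rs:resultset>"""


def parse_resultset(text):
    """Return one dict per record, resolving each record's system id to its URL."""
    root = ET.fromstring(text)
    systems = {
        s.get("id"): (s.text or "").strip()
        for s in root.findall("resultsetMetadata/system")
    }
    records = []
    for record in root.findall("record"):
        records.append({
            "identifier": record.get("identifier"),
            "system": systems.get(record.get("system")),
            "fields": {f.get("name"): f.text for f in record.findall("returnField")},
        })
    return records
```

Each record keeps its `identifier`, which is what a client would later pass to `get` to retrieve the full object.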
EcoGrid Get & Put
• get enables retrieval of the content of a dataset or file from systems such as SRB and Metacat.
• get also enables SQL querying of relational databases (Oracle, DB2, etc.) that are pre-registered as data sources in SRB.
• put for data: allows users to create (upload) files in EcoGrid resources such as Metacat and SRB.
• put for metadata: the EcoGrid put service also allows ingestion of metadata, such as EML in Metacat or user-defined metadata in SRB.
  – Depends on the availability of an authentication and access control system
  – put(sessionID, objectID, object, type)
  – delete(sessionID, objectID)
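The dependence of put and delete on access control can be sketched with an in-memory store. The class and exception names are invented, and the set of write-enabled session IDs is a stand-in for a real authentication system, which the slides do not describe.

```python
class AccessDenied(Exception):
    """Raised when a session may not modify the store."""


class InMemoryEcoGridStore:
    """Toy store with the get/put/delete operations listed above."""

    def __init__(self, write_sessions):
        # Stand-in for authentication: sessions that are allowed to write.
        self._write_sessions = set(write_sessions)
        self._objects = {}  # object_id -> (object_type, content bytes)

    def _check_write(self, session_id):
        if session_id not in self._write_sessions:
            raise AccessDenied(session_id)

    def put(self, session_id, object_id, obj, object_type):
        self._check_write(session_id)
        self._objects[object_id] = (object_type, bytes(obj))

    def get(self, session_id, object_id):
        # Reads are open to every session in this sketch.
        return self._objects[object_id][1]

    def delete(self, session_id, object_id):
        self._check_write(session_id)
        del self._objects[object_id]
```

Here only sessions listed at construction time may call `put` or `delete`, while any session may call `get`; a real deployment would replace that check with its own authorization policy.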
Building the EcoGrid
[Diagram: EcoGrid nodes. Node types: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system. Labeled sites: AND, LUQ, NTL, VCR, HBR]
• LTER Network (24)
• Natural History Collections (>> 100)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)
EcoGrid Client Interactions
• Modes of interaction
  – Client-server
  – Fully distributed
  – Peer-to-peer
• EcoGrid Registry
  – Node discovery
  – Service discovery
• Aggregation services
  – Centralized access
  – Reliability
  – Data preservation
Layers in EcoGrid
EcoGrid Queries in Kepler
Metadata-driven analysis cycle
Status
• Read, Query & Register completed
• Simple Registry operational
• EcoGrid wrappers completed for:
  – Metacat
  – SRB
  – DiGIR
  – Xanthoria
• Available interfaces
  – WSDL
  – Simple Web Interactivity
  – Kepler
Acknowledgements
This material is based upon work supported by:
The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.
PBI Collaborators:
NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of California, Davis, University of Kansas (Center for Biodiversity Research)
Kepler contributors:
SEEK, Ptolemy II, DOE SDM/SciDAC, GEON, and others.
Q & A
Frequently Asked Questions …
• Which version of Grid services do you use?
  – We currently use 3.2.x because it was the last stable version based on OGSA. It seems that WSRF does not support the OGSA Factory pattern, which is the main Grid Service feature that we use and would not want to lose. We may migrate to WSRF eventually.
• How can a user (or developer) discover what catalogs are on the EcoGrid?
  – In Kepler, click the "Sources" button on the Data tab. The UI allows a basic query of the EcoGrid registry to discover new nodes and choose which should be searched.
  – Developers can program to the EcoGrid Registry API.
• How much is the EcoGrid *integrated*? Is there a common query language?
  – Yes, there is a common query syntax for expressing path-based metadata queries. This syntax does not do any mapping among various metadata languages. We still need a system that can translate a query that uses terms from one metadata language (e.g., DarwinCore) into queries for another metadata language (e.g., EML). The SEEK SMS system will help with this mapping.
Frequently Asked Questions …
• Is the EcoGrid a "federation of federations"?
  – In a sense. The EcoGrid is an *API* (specifically a Grid Services API) that allows clients to use a common set of communication protocols to access diverse data systems. The EcoGrid API has been implemented for Metacat, DiGIR, and SRB, all of which are federations. Because clients can access these systems via EcoGrid, it can be considered a federation of federations. The EcoGrid Registry lists the systems that have published EcoGrid interfaces accessible to clients.
• Where are the WSDLs?
  – http://ecogrid.ecoinformatics.org/ogsa/services/org/ecoinformatics/ecogrid/EcoGridQueryInterfaceLevelOneService?wsdl
• What’s on the EcoGrid right now?
  – The KNB network is gathering data and metadata from NCEAS, 24 LTER sites, and about 200 other field stations (KNB EcoGrid node).
  – The DiGIR system federates access to museum collections data in the form of Darwin Core records. The EcoGrid node at KU points at this network of about 150 museums that are accessible through DiGIR.
  – SRB is currently used to hold some data objects that are described via EML metadata records in the KNB Metacat.
Frequently Asked Questions …
• Where is the code for the EcoGrid?
  – Most code is in CVS at seek/projects/ecogrid. Some Kepler-specific client-side UI code is in the Kepler CVS.
  – http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/seek/projects/ecogrid
  – There are also EcoGrid design docs, meeting notes, etc.
• Are there plans for an "EcoGrid Portal" so that end users can easily access and contribute data?
  – Yes, this is under development. In the interim, one can search the KNB and DiGIR sites individually, or use Kepler.