CEOS IDN Task Team
Chiang Mai, Thailand, May 17, 2003
IDN Agenda
• Minutes from Toulouse, France, May 2003
  http://idn.ceos.org/IDN/Meetings/2003_05_Toulouse
• IDN Reports
• Keyword Update
• Process for adopting DIF Changes
• Status of DIF Proposals
• Thesaurus
• Semantic Web
• IDN Tools and Software Update
  – MD8, MD9, DocBuilder
• Discussion/Issues
Minutes from Toulouse
• IDN Reports (4 presentations, 2 written)
• ISO 19115 Update
• Updated controlled keywords
• DIF Modification Proposals
• Enhanced Project Description
• MD8 Report from UNEP
  – MD8 operations and testing
  – Efforts with PostgreSQL
• MD9 Update
IDN Reports
Status at the American Coordinating Node
New DIFs January 1999 - August 2003
[Figure: number of new DIFs (vertical axis, ticks from 6000 to 14000) by date (horizontal axis).]
DIF Population by Earth Science Topic
AGRICULTURE 5%
ATMOSPHERE 18%
BIOSPHERE 17%
CLIMATE INDICATORS 1%
CRYOSPHERE 4%
HUMAN DIMENSIONS 10%
HYDROSPHERE 6%
LAND SURFACE 13%
OCEANS 13%
PALEOCLIMATE 1%
RADIANCE OR IMAGERY 6%
SUN-EARTH INTERACTIONS 1%
SOLID EARTH 5%
DIF Population by Node
NASA 41%
NOAA 10%
USDA 5%
CIESIN 1%
USGS 4%
CONAE 0%
INPE 0%
CCRS 4%
ESA/ESRIN 1%
DLR 0%
CNES 0%
NEONET 0%
PNRA 0%
RAS 0%
NASDA 1%
JST 1%
CSIRO 0%
UNEP/GRID 4%
AMD 19%
G3OS 0%
CLIVAR 0%
UN 5%
GCMD Usage by Domain Type, January 1999 - August 2003
.gov 3%
.edu 8%
.org 1%
.com 20%
.net 20%
.mil 1%
numeric 24%
.us 1%
non-US 22%
Unique Hosts by Domain: Comparison of 2000, 2001 and 2002
[Figure: bar chart of unique hosts (vertical axis, ticks from 0 to 60000) per domain (.gov, .edu, .org, .com, .net, .mil, numeric, foreign, .us), with one series each for the 2000, 2001 and 2002 totals.]
# Web Page Hits Since January 2001
[Figure: number of hits (vertical axis, ticks from 200000 to 800000) by month, January 2001 to July 2003.]
Searches by Controlled Keyword
Agriculture 9%
Atmosphere 12%
Biosphere 7%
Climate Indicators 0%
Cryosphere 4%
Human Dimensions 7%
Hydrosphere 6%
Land Surface 8%
Oceans 12%
Paleoclimate 4%
Radiance or Imagery 6%
Sun-Earth Interactions 1%
Solid Earth 7%
Location 9%
Source 4%
Sensor 4%
Global Change Master Directory
GCMD Portals to Community-focused Data
DODS
Portal Index: http://gcmd.nasa.gov/Data/portal_index.html
• The index of all of the Portals created by GCMD is available online from the portal index.
• Portal visibility will be improved with a link to the portals from the GCMD homepage (currently the link is available only via the GCMD sitemap).
New Portals This Past Year
• National Center for Atmospheric Research
• World Water Forum
• Remote Sensing for Conservation Portal
• United Nations Earth Science Data
• Antarctic Master Directory
  – Finland
  – Belgium
  – Argentina
IDN/CEOS-GRID Activities
• Participate on the CEOS/GRID Catalog Tiger Team
• Potential CEOS/GRID Collaborations
  – Directory search across CEOS/GRID databases
  – Use of IDN controlled keyword hierarchy
  – Mapping of IDN DIF fields to core ISO 19115 fields as a minimum set of metadata elements
IDN Agency Reports
Keyword Update
New Earth Science Parameter Keywords
47 new Earth science keywords were suggested by the Global Observing System Information Center (GOSIC) (comprising the GOOS, GTOS and GCOS).
These new keywords were mapped to existing GCMD/IDN keywords. Although not all of the suggested keywords were adopted, some reorganization of existing keywords resulted in new and modified keywords.
New Earth Science Parameter Keywords
TOPIC keyword Radiance or Imagery will be restructured to Engineering/Spectral Measurements, to accommodate engineering and level 0 raw data measured from satellites and other remote sensing platforms.
A new TERM under this TOPIC will be: Engineering/Spectral > Spectral Measurements > Acoustic Waves
Cryosphere > Glaciers/Ice Sheets > Ablation Zones/Accumulation Zones
Cryosphere > Glaciers/Ice Sheets > Icebergs
Cryosphere > Glaciers/Ice Sheets > Glacier Facies
Cryosphere > Glaciers/Ice Sheets > Glacier Mass Balance/Ice Sheet Mass Balance
Cryosphere > Glaciers/Ice Sheets > Glacier Thickness/Ice Sheet Thickness
Cryosphere > Glaciers/Ice Sheets > Glacier Topography/Ice Sheet Topography
Cryosphere > Glaciers/Ice Sheets > Glacier Elevation/Ice Sheet Elevation
Cryosphere > Glaciers/Ice Sheets > Glacier Motion/Ice Sheet Motion
New Earth Science Parameter Keywords
New Glaciers/Ice Sheets TERM and variables
New Earth Science Parameter Keywords
Land Surface > Soils > Micronutrients/Trace Elements
Land Surface > Soils > Heavy Metals
Land Surface > Soils > Microfauna
Land Surface > Soils > Microflora
Land Surface > Soils > Macrofauna
Land Surface > Soils > Soil Rooting Depth
Land Surface > Soils > Soil Erosion
Land Surface > Soils > Soil Infiltration
New Soils Keywords
Note: Also applies to TOPIC Agriculture > Soils
New Earth Science Parameter Keywords
Biosphere > Ecological Dynamics > Habitat
Biosphere > Ecological Dynamics > Indicator Species
Biosphere > Ecological Dynamics > Biodiversity
New Biosphere keywords:
New Earth Science Parameter Keywords
Atmosphere > Atmospheric Radiation > Incoming Solar Radiation
Hydrosphere > Water Quality/Water Chemistry > Water Potability
Additional new keywords
Keyword Citation
GCMD will provide a citation, which we kindly request be used by organizations and agencies that use the IDN keywords.
• Many groups have adopted the keywords
• It is important to know how they are being used so agencies can be informed of updates
• GCMD science coordinators put a lot of effort and expertise into choosing scientifically accepted terminology
Changes to the IDN DIF Standard
Proposals to Modify the DIF Structure
• Through Formal Standard: ISO 19115 compatibility
• Through CEOS IDN Interoperability Group
  – Current examples
    • Numerical Model Field
    • Data Set Spatial/Temporal Resolution
  – Advantages
    • More flexible for specified purposes beyond ISO.
    • Faster turnaround time for implementation.
DIF Changes and the Interoperability Process
• Anyone can request a change to the DIF format
• The proposal is circulated on the interop mailing list ([email protected])
• Based on comments and feedback the proposal may be modified and resubmitted
• A vote is requested by the Interop Voting Committee
DIF Proposals
• A proposal passes with a majority vote
• Results of the vote are conveyed to the interop list
• Prioritization, scheduling, and completion of software changes are dependent on the resources of participating members.
Interop Voting Committee
1. Andrea Buffam (CCRS/GeoConnections)
2. Cheryl Solomon (USGS/BRD/GCMD) [email protected]
3. Jolyon Martin (ESA) [email protected]
4. Lee Belbin (JCADM) [email protected]
5. Osamu Ochai (NASDA) [email protected]
6. Lyne Yohe (NSIDC) [email protected]
7. Victor Pusztai (UNEP GRID-Budapest) [email protected]
8. Lorant Czaran (UNEP) [email protected]
9. Sherry Harrison (GHRC)
Proposals to Modify the DIF Structure
• ISO 19115 compatibility
• Numerical Model Field
• Data Set Spatial/Temporal Resolution
DIF/ISO Conversion (Fields that need attention)
Category>Topic > Term > Variable:
Earth Science > Biosphere > Ecological Dynamics > Population
DIF ISO
ISO Topic Category: farming, biota, boundaries, climatologyMeteorologyAtmosphere, economy, elevation, environment, geoscientific information, health, imageryBaseMapsEarthCover, inlandWaters
Metadata Standard FormatMetadata Standard Version
Personnel:Address
Personnel:Address, City, State or Province, Zip Code,
Country
Not currently in the DIF
ISO Topic Categories
farming, biota, boundaries, climatologyMeteorologyAtmosphere, economy, elevation, environment, geoscientific information, health, imageryBaseMapsEarthCover, intelligenceMilitary,inlandWaters, location, oceans, planningCadastre, society, structure, transportation, utilitiesCommunication
Summary of Proposal Status
Non-controversial proposal to bring the DIF into compliance with the international standard.
o Anne Sophie Archambeau from IPSL, France proposed an addition to the DIF to handle numerical output data sets:
Group: Numerical_Experiment
  Model_Name:
  Model_Version:
  Model_URL:
  Model_Configuration:
  Model_Resolution:
  Model_Calendar:
  Group: Model_Integration_Period
    Start_Date:
    Stop_Date:
  End_Group
  Simulation_Name:
  Initial_Conditions:
  Perturbation:
  Imposed_Boundary_Conditions:
End_Group
o Interop discussions on the proposed new fields are summarized in the CEOS IDN Interop Newsletter for April 2003 (http://gcmd.gsfc.nasa.gov/pipermail/interop/2003-April/000011.html)
Numerical Model Fields Proposed for the DIF
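The nested Group / End_Group layout of the proposed fields can be read into a nested structure with a small stack-based parser. The following is only a sketch, not the IDN software: the function name `parse_dif_groups` and all field values in `SAMPLE` (model name, dates, simulation name) are made up for illustration, and only a subset of the proposed fields is shown.

```python
def parse_dif_groups(text):
    """Parse 'Group: ... End_Group' text into nested dictionaries (illustrative only)."""
    root = {}
    stack = [root]  # stack[-1] is the group currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "End_Group":
            if len(stack) == 1:
                raise ValueError("End_Group without a matching Group")
            stack.pop()
            continue
        # Split on the first colon only, so values may contain colons (e.g. URLs).
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name == "Group":
            child = {}
            stack[-1][value] = child
            stack.append(child)
        else:
            stack[-1][name] = value
    if len(stack) != 1:
        raise ValueError("Group not closed with End_Group")
    return root


# Hypothetical values; only the field names come from the proposal.
SAMPLE = """\
Group: Numerical_Experiment
  Model_Name: ExampleModel
  Model_Version: 1.0
  Group: Model_Integration_Period
    Start_Date: 1990-01-01
    Stop_Date: 1999-12-31
  End_Group
  Simulation_Name: control
End_Group
"""

record = parse_dif_groups(SAMPLE)
print(record["Numerical_Experiment"]["Model_Integration_Period"])
```

A stack keeps the parser short and lets it report unbalanced Group / End_Group pairs, which is the most likely authoring error with this layout.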
Summary of Proposal Status
• Received November 18, 2002
• The additional information could be included in the summary
• Development schedules do not permit the prioritization of this addition to the database at this time.
• Comments will be extended until such time that development schedules might permit the initiation of change
• The proposal will be presented to a modeling group in London at the end of September for comment
Data Resolution Proposal
Proposal to provide users with the capability of refining GCMD database searches by Geospatial and Temporal Resolution.
Proposed DIF Syntax (Changes in Bold)
+Group: Data_Resolution
Latitude_Resolution:
Longitude_Resolution:
+Horizontal_Resolution_Range: [choose from the list of geospatial ranges]
Vertical_Resolution:
+Vertical_Resolution_Range: [choose from the list of vertical resolutions]
Temporal_Resolution:
+Temporal_Resolution_Range: [choose from the list of temporal resolutions]
End_Group
DIF authors must select from the set of Geospatial (Horizontal) resolution range valids.
< 1 meter
1 meter - < 30 meters
30 meters - < 100 meters
100 meters - < 1 km
1 km - < 100 km or approximately 0.01 degree - < 1 degree
100 km - < 250 km or approximately 1 degree - < 2.5 degrees
250 km - < 500 km or approximately 2.5 degrees - < 5.0 degrees
500 km - < 1000 km or approximately 5 degrees - < 10 degrees
1000 km or > 10 degrees
Horizontal Resolution Range
Data Resolution Proposal Valids
Vertical Resolution Range
Vertical Resolution refers to both Altitude and Depth resolution.
< 1 meter
1 meter - < 10 meters
10 meters - < 30 meters
30 meters - < 100 meters
100 meters - < 1 km
> 1 km
Temporal Resolution Range
< 1 second
1 second - < 1 minute
1 minute - < 1 hour
Hourly
Daily
Weekly
Monthly
Hourly Climatology
Daily Climatology
Pentad Climatology
Weekly Climatology
Monthly Climatology
Annual
Annual Climatology
Decadal
Climate Normal (30-year climatology)
Data Resolution Proposal Valids
Example
Group: Data_Resolution
Latitude_Resolution: 1 meter
Longitude_Resolution: 1 meter
Horizontal_Resolution_Range: 1 meter - < 30 meters
Vertical_Resolution: 5 meters
Vertical_Resolution_Range: 1 meter - < 10 meters
Temporal_Resolution: Daily
Temporal_Resolution_Range: Daily
End_Group
Data Resolution Proposal Example
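Choosing the range valid that matches a numeric resolution is a simple lookup over the lower bounds of the horizontal ranges. A minimal sketch, assuming the resolution is already expressed in meters: the function name `horizontal_resolution_range` is hypothetical, the labels are shortened to the metric part of each proposed valid, and the last label is reworded for brevity.

```python
from bisect import bisect_right

# Lower bounds (in meters) of every range except the first, in ascending order.
_LOWER_BOUNDS_M = [1, 30, 100, 1_000, 100_000, 250_000, 500_000, 1_000_000]

# Metric part of the proposed horizontal range valids (degree equivalents omitted).
_RANGE_LABELS = [
    "< 1 meter",
    "1 meter - < 30 meters",
    "30 meters - < 100 meters",
    "100 meters - < 1 km",
    "1 km - < 100 km",
    "100 km - < 250 km",
    "250 km - < 500 km",
    "500 km - < 1000 km",
    "1000 km or more",
]


def horizontal_resolution_range(meters):
    """Return the range label whose interval contains the given resolution."""
    if meters <= 0:
        raise ValueError("resolution must be positive")
    # bisect_right counts how many lower bounds are <= meters,
    # which is exactly the index of the matching label.
    return _RANGE_LABELS[bisect_right(_LOWER_BOUNDS_M, meters)]


print(horizontal_resolution_range(1))       # a 1 meter resolution
print(horizontal_resolution_range(25_000))  # a 25 km resolution
```

Because each range is closed at its lower bound and open at its upper bound, a value that sits exactly on a boundary (for example 30 meters) falls into the range that starts there.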
Summary of Proposal Status
New proposal sent through the Interop
Thesaurus Support when Searching Earth Science Data
W.N. Martin, J. C. French (NSF) and A.K. Naidu
Department of Computer Science
University of Virginia
The Vocabulary Problem
• Data is indexed using one set of terms
• Searcher casts query using another set of terms
• Result: relevant data is overlooked
Objective
Assist users in specifying queries when the indexing vocabulary is unknown or unfamiliar.
Provide a thesaurus facility to Earth science searchers.
Thesaurus Overview
• DLR provided the Thesaurus Server in 1998.
• Lookup Assistant and Modification Tool by University of Virginia
  – Dr. Jim French
  – Dr. Worthy Martin
  – Amit K. Naidu (student)
• Minor server changes have been made.
[Figure: graph of thesaurus terms, each followed by a number in parentheses as on the slide: global change (4), pollution (6), air pollution (22), aerosols (5), carbon monoxide (6), NOx (1), sulfur dioxide (6), acidification (1), indoor pollution (1), global warming (5), food contamination (2), trace gases (11), air quality (2). Relationship labels: Broader Terms (Top Terms), Narrower Terms, Related Terms, Unrelated Terms.]
air contamination
air pollutant
air pollutants
air pollution
air-borne contaminants
air-borne contamination
air-borne contaminations
air-pollutants
atmospheric impurities
atmospheric impurity
atmospheric pollutant
atmospheric pollutants
atmospheric pollution
contaminated air
contaminated atmosphere
polluted air
polluted atmosphere
pollution of the air
pollution of the atmosphere
urban air
urban air pollution
urban atmosphere
Conceptual Thesaurus Structure
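One way a thesaurus helps with the vocabulary problem is by rewriting a searcher's term to the term the data was indexed under. The sketch below is illustrative only and is not the Thesaurus Server's interface: it assumes "air pollution" is the preferred indexing term for the entry terms listed above, uses only a few of them, and the names `build_lookup` and `rewrite_query` are hypothetical.

```python
# Assumption: "air pollution" is the preferred term; entry terms taken from the list above.
SYNONYM_RINGS = {
    "air pollution": [
        "air contamination",
        "atmospheric pollution",
        "contaminated air",
        "polluted air",
        "urban air pollution",
    ],
}


def build_lookup(synonym_rings):
    """Map every entry term (lower-cased) to its preferred term."""
    lookup = {}
    for preferred, entries in synonym_rings.items():
        lookup[preferred.lower()] = preferred
        for entry in entries:
            lookup[entry.lower()] = preferred
    return lookup


def rewrite_query(query, lookup):
    """Return the preferred term for a query, or the query unchanged if unknown."""
    return lookup.get(query.strip().lower(), query)


LOOKUP = build_lookup(SYNONYM_RINGS)
print(rewrite_query("Polluted Air", LOOKUP))  # a term the searcher might type
print(rewrite_query("sea ice", LOOKUP))       # not in the thesaurus, passed through
```

Unknown terms are passed through unchanged so that the thesaurus only ever widens what a search can find.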
Thesaurus Architecture
[Diagram labels: Saurus.pl (Perl script); ISIS Thesaurus Server, port 6188; Oracle Thesaurus Database, port 1521; Demo.cgi (Perl script); Document Search servlet; Document Database; Thesaurus Search Applet; Modified Query; Final Search Results; Main Page]
Thesaurus Modification Tool
Last Three Months
• Amit Naidu started working on the project.
• Upgraded Makefiles to work with Oracle 9i and SunOS system on new Development machine.
• Now running both thesaurus server and thesaurus modification tool on the same machine.
• Queries go to Isite free-text search on gcmd.nasa.gov machine.
Future Plans
• Create a script to regularly back up the database and check that the server is running.
• Update the thesaurus database with new terms and relations
• Update the GCMD homepage to include the thesaurus button
DEMO
• Thesaurus Lookup Assistant: http://gcmddev.sesda.com/thesaurus/assistant.html
• Thesaurus Modification Tool: http://gcmddev.sesda.com/thesaurus/tool/applet.html
The Semantic Web for Spatial Data Search
Femke Reitsma
University of Maryland – College Park [email protected]
Why Ontologies: Could the Semantic Web Meet Discovery Challenges?
“The semantic web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation” (Tim Berners-Lee et al. 2001)
The semantic web makes web pages machine “understandable” rather than just human understandable.
Semantic web components:
1. Basic components: ontology + semantically marked-up web page = semantic web
2. Moving up the semantic web layers: Semantic Web layers presented by Tim Berners-Lee
Semantic Web Languages
• Primary languages:
RDF (Resource Description Framework)
RDFS (Resource Description Framework Schema)
OWL (Web Ontology Language)
• Historical development:
XML provides the basic syntax
RDF and RDFS add some tags to XML
DAML+OIL adds some tags to RDF
OWL extends and (almost) replaces DAML+OIL
• Information is encoded as a triple: subject, predicate, and object
For example:
<Femke> <is a> <student>
<Zimbabwe> <is a part of> <Africa>
• All subjects and objects are identified with a Uniform Resource Identifier (URI): e.g. http://www.daml.org/2001/02/geofile/geofile-ont.daml#GeographicLocation
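The subject, predicate, object structure can be illustrated with plain tuples. This is a simplified sketch, not an RDF library: the two triples are the slide's own examples, plain strings stand in for URIs, and the helper `match` is hypothetical.

```python
# Each triple is (subject, predicate, object); real RDF would use URIs, not bare strings.
TRIPLES = [
    ("Femke", "is a", "student"),
    ("Zimbabwe", "is a part of", "Africa"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every position that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


print(match(TRIPLES, predicate="is a part of"))
print(match(TRIPLES, subject="Femke"))
```

Leaving a position as `None` acts as a wildcard, which is the basic idea behind pattern queries over a triple store.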
Basic Structure
What is an ontology?
Big “O” Ontology vs little “o” ontology:
Ontology = metaphysics, the essence of being, reality
ontology = “a logical theory which gives an explicit, partial account of a conceptualization” (Guarino and Giaretta, 1995)
What does an ontology look like?
What does a semantic web page look like?
Ontology ↔ Semantic Content
Ontology: Dublin Core ontology
Semantic Web page: http://owl.mindswap.org
ontology + semantic web =
• Computer parsable
• Inference ability
State code > city code > address code
Computer agent could deduce that a Cornell University address, being in Ithaca, must be in New York State, which is in the U.S., and therefore must be formatted to U.S. standards.
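The deduction in the address example amounts to following "is a part of" links until no broader place remains. A minimal sketch of that chain, assuming each place has a single containing place: the dictionary `PART_OF` and the functions `containers` and `is_within` are hypothetical names, and the facts are the ones stated in the example.

```python
# Direct "is a part of" facts taken from the example above.
PART_OF = {
    "Cornell University": "Ithaca",
    "Ithaca": "New York State",
    "New York State": "U.S.",
}


def containers(place, part_of):
    """Follow part-of links upward and return every containing place, nearest first."""
    chain = []
    seen = {place}
    while place in part_of:
        place = part_of[place]
        if place in seen:
            raise ValueError("cycle in part-of facts")
        seen.add(place)
        chain.append(place)
    return chain


def is_within(place, region, part_of):
    """True if region is reachable from place through part-of links."""
    return region in containers(place, part_of)


print(containers("Cornell University", PART_OF))
print(is_within("Cornell University", "U.S.", PART_OF))
```

Only the three direct facts are stored; the statement that Cornell University is in the U.S. is derived, which is the kind of inference an ontology makes available to a computer agent.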
Example of application:
Find me places to eat accessible via public transport?
Over here ….
Objective:
Explore the potential of the Semantic Web
for distributing spatial data
Current GCMD Search
North America?
2950 records matched your query
Future GCMD Search
North America?
North America [2950]
Limit search by:
- Spatial resolution
- Temporal resolution
- GCMD keywords
Explore results by:
Canada [1348]
USA [1602]
GCMD keywords
……
Key = ability to determine relationships between keywords without explicitly encoding them
North America?
GCMD Database
Sesame Ontology
Java Application
Progressing Towards Level 1:
Sesame = Open Source RDF Schema-based Repository and Querying facility
Keywords → Ontologies
• Importance of careful specification of relationships for the ontology.
CATEGORY > TOPIC > TERM > VARIABLE
e.g. Hydrosphere > Ground Water > Saltwater Intrusion
• For the purposes of the Semantic Web, the keyword structure may need modification, e.g. the Variable Fetch is a measurable property of the Term Ocean Waves; however, the Variable Fisheries is a sub-topic of the Term Agricultural Aquatic Sciences.
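Splitting a keyword path into its named levels is a first step before any relationships between levels can be recorded. The sketch below is an assumption-laden illustration: it assumes the path is written with " > " separators and starts at the CATEGORY level, and the function name `parse_keyword` is hypothetical.

```python
LEVELS = ("CATEGORY", "TOPIC", "TERM", "VARIABLE")


def parse_keyword(path):
    """Split 'A > B > C > D' into a dict keyed by hierarchy level.

    Assumes the path begins at CATEGORY; shorter paths simply omit the deeper levels.
    """
    parts = [part.strip() for part in path.split(">")]
    if not all(parts):
        raise ValueError("empty level in keyword path")
    if len(parts) > len(LEVELS):
        raise ValueError("keyword path has more levels than the hierarchy")
    return dict(zip(LEVELS, parts))


keyword = parse_keyword("Earth Science > Biosphere > Ecological Dynamics > Population")
print(keyword["TERM"], "->", keyword["VARIABLE"])
```

The parsed form records only position in the hierarchy. It does not say whether a Variable is a measurable property or a sub-topic of its Term, which is the distinction an ontology would have to state explicitly.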
Keywords: Projects, Sensors, Sources, Locations, IDN Nodes, Data Centers, Science Keywords, Services Keywords, URL Content Types, Chronostratigraphic Units
DIF Schema
• XSLT style sheet to create DIF schema in Semantic Web language
• Mapping terms to ontology
• Avoiding a monolithic ontology by mapping terms to other ontologies
e.g. Dublin Core
DIFs
• XSLT style sheet to convert DIFs to Semantic Language
• Mapping terms to ontology and DIF Schema
• Recording keywords of finest granularity
• Avoiding a monolithic ontology by mapping terms to other ontologies
e.g. Dublin Core, Cyc, DAML-time
Sesame:
Client 1, Client 2, Client 3 (connections labeled HTTP, HTTP, SOAP)
HTTP Protocol Handler, SOAP Protocol Handler
Request Router
Admin Module, Query Module, Export Module
Repository Abstraction Layer
GCMD Repository:
-RDF DIF files
-Ontologies
-DIF Schema
• Middleware
• GUI or API
• Database: PostgreSQL or Oracle
Advantages for the GCMD
• Semantic Web presents database structure in a machine parsable format
• Ability to search for the semantic relationships among any DIF terms within the ontology
• Do not need to change the database structure when new classes and relationships are added
• Real advantages = when ontology is enriched
IDN Tools and Software Update
DocBuilder: An XML Authoring Tool for ISO 19115
The Next Generation of Metadata Authoring Tools
• Web and stand-alone application
• The Web version will replace the current web tools (DIFbuilder, SERFbuilder, etc.).
• Increases software flexibility by allowing the user to choose what type of document to build (i.e. DIF or SERF or Project Supplemental or even an FGDC or ISO document)
DocBuilder Features
• Object-oriented design. Allows code reuse.
• Java/Jython implementation. Offers platform independence and maintenance reduction.
• XML support. Promotes extensibility and standardized information exchange.
• Multiple versions. Supports diverse users and environments by offering both Web and stand-alone applications.
• MD8 integration. Provides added functionality through distributed database operations.
• Multi-document support. Increases flexibility by allowing the user to choose the format type to build: DIF, SERF, FGDC, ANZLIC, ISO, or Project Supplemental.
• Customization capabilities. Strengthens integration with portals.
DocBuilder HTML Version
• HTML Version: The initial focus of effort was on the Java/Swing version. As this has become a stable product, attention has shifted to the HTML version, which will replace the current Web tools (DIFbuilder, SERFbuilder, etc.).
• Functionality similar to Java/Swing (stand alone) version
• Will look similar to the current Perl tool (DIFbuilder) so users won’t need to learn a whole new tool.
  – All components are written strictly in HTML and JavaScript. No Java components (applets) are included in the Web-client.
  – A fully functional widget has been written that is comparable to the Java/Swing searchable list widget.
  – Features specialized widgets for valids fields (Parameters, Sensor, Source, Location, Data Centers, Personnel) to provide increased functionality
  – In alpha testing now.
Sample screens
Main Page: Overview showing fields “checklist”.
DocBuilder: The next generation of IDN authoring tools
DocBuilder HTML Demo
MD8 Update
MD Software Purpose
• To assist JCADM and CEOS IDN members in their efforts to share collection metadata
• To reduce the manual (costly) effort when exchanging DIF metadata
• To improve the metadata validation and thus improve the quality of the descriptions
• To create a foundation for more advanced applications and APIs
MD8 Metadata Sharing
[Diagram: the UNEP Node, the AADC Node and "Your Node" are each shown with a DB, an MD Server and a Local Database Agent, connected through a Network. Labels shown with the GCMD Node: New Content, Local DB, Incoming Queue Table, MD Server, Local Database Agent, LDA Server, Announcer, Scheduler, Schedule Table, Trigger Table.]
MD8 Installation
• Much effort was placed on an easy install compared to MD7
• Read installation requirements for details
• Some database and web server configuration is required outside of the MD8 install
• Client is capable of autoupdating its code
• UNEP tested the install page and had no significant problems
• AAD tested the install from a more advanced ‘tarball’ and had some problems
MD8 Status
• Used in production at GCMD since 2001
• Operational at AADC
• Operational at UNEP/Budapest
http://griddata.ktm.hu:8080/Data/portals/ceos
• Final release is MD 8.0 build 3
MD8 at UNEP/Budapest
What’s Next: Future Database Changes
• ISO Changes
  – Change Personnel address field
  – Add ISO_Topic_Category field
  – Add Metadata Standard Name field
  – Make mandatory: Citation group, DIF Author, Spatial_Coverage, Dataset_Language, DIF_Creation_Date
Future Database Changes
• GOS project description enhancements
• Better tracking of valids and personnel
• Data center URLs will be tied to the data center valid instead of the DIF record
• New location valids hierarchy
• Dataset geospatial and temporal resolution
What’s Next
• Move HTML DocBuilder into production
• DIF compatibility with ISO
• Web Page Redesign
Home Page
What’s Next (continued)
• Enhanced portal visibility from GCMD homepage
• Adding Services to the AMD
• Automatic emails to DIF authors to remind them to review and update DIFs
MD Version 9
• ISO Changes are top priority
  – Implies some backward compatibility issues that will need to be dealt with
  – In progress
• Replace Oracle with PostgreSQL
• Replace Isite free-text search with Lucene
• Add GOS Project functionality
• Subscription Service for Valids
• Upgrade the spatial search
• Enhance data center buckets
• Implement temporal and spatial resolution refinement query
• Version control of IDN DIFs
• API to retrieve DIFs/Valids since a certain date (CCRS request)
• Tie data center URLs to dc valid, not DIF
• Mini-portal and a save search option
Conclusion: Maintain Operations While Going Forward
– Operational Goals
  • Reduce maintenance
  • Increase content
– Development Goals
  • Increase functionality
  • Serve as the “public face” for discovery of and linkage to CEOS Earth science data
Discussion and Issues