in search of what some of it means rda semantics and metadata workshop feb 23, 2015 peter fox (rpi)...
TRANSCRIPT
In Search of What Some of It Means
RDA Semantics and Metadata Workshop
Feb 23, 2015
Peter Fox (RPI) [email protected] World Constellation
Metadata and documentation
Not more code!
Spectral synthesis components and flow
Getting the metadata?
6
What I wanted ~ 1994-6
Scientists should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means (instruments, models, analysis) using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed. And, it is almost always created in a manner to facilitate its generation not its use.
And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…
What I was doing…
pro read_spec, spectra_name, description, auxiliary_info, model_size, mu_size, wave_size, model, smodel, mu, wave0, wavelength, intensity, brightness_temperature, index1, index2, percent
ncopts = 0;
description_start=0
description_edges=80
i=0
j=0
k=0
; Construct the DB filename
ncid=ncdf_open(string(getenv("SPECTRA")))
inq_struct=ncdf_inquire(ncid)
; /* get dimension info */
tmp_id = ncdf_dimid(ncid, "comment_dim")
ncdf_diminq,ncid, tmp_id, dummy, comment_dim
tmp_id=ncdf_dimid(ncid, "mu_dim")
ncdf_diminq,ncid, tmp_id, dummy, mu_dim
tmp_id=ncdf_dimid(ncid, "wave_dim")
ncdf_diminq,ncid, tmp_id, dummy, wave_dim
tmp_id=ncdf_dimid(ncid, "model_dim")
ncdf_diminq,ncid, tmp_id, dummy, model_dim
tmp_id=ncdf_dimid(ncid, "smodel_dim")
ncdf_diminq,ncid, tmp_id, dummy, smodel_dim
tmp_id=ncdf_dimid(ncid, "item_dim")
ncdf_diminq,ncid, tmp_id, dummy, item_dim
What I was doing… etc.
tmp_id = ncdf_varid (ncid, "description")
ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, description
; Id's for variables
tmp_id=ncdf_varid(ncid, "spectra_name")
ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, spectra_name
tmp_id=ncdf_varid(ncid, "auxiliary_info")
ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, auxiliary_info
tmp_id=ncdf_varid(ncid, "model_size")
ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=item_dim, model_size
start=intarr(1)
edges=intarr(1)
start(0)=0
edges(0)=model_size
tmp_id=ncdf_varid(ncid, "mu_size")
ncdf_varget,ncid, tmp_id, mu_size, OFFSET=start, COUNT=edges
tmp_id=ncdf_varid(ncid, "model")
ncdf_varget,ncid, tmp_id, model, OFFSET=start, COUNT=edges
start=intarr(2)
edges=intarr(2)
start(0)=0
edges(0)=smodel_dim
start(1)=0
edges(1)=model_size
tmp_id=ncdf_varid(ncid, "smodel")
ncdf_varget,ncid, tmp_id, smodel, OFFSET=start, COUNT=edges
What does It all Mean?
Some version of this…
10
Data Information Knowledge
Context
PresentationOrganization
IntegrationConversation
CreationGathering
Experience
~Metadata?
It and Meaning
• It = things that matter– Context
• Meaning = duh -> semantics• Relations!! Real ones!
• But it was more than that, though that often comes later…– Syntax (structure/form)– Semantics (meaning)– Pragmatics (use)
Metadata-Information-Knowledge Ecosystem
12
Metadata Information Knowledge
Context
FormalizationOrganization
IntegrationShared Conceptualization
CreationGathering
Experience
Provenance
• Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
• Provenance: metadata in a given context! Swallow that.
• Knowledge provenance; meaning and relations in multiple contexts!
Perfect is the enemy of the good… (thanks Voltaire)
Origins …
• In 2000-2001 the need for capturing and preserving knowledge in science data became very clear but the barriers were high
• In 2004 we started a virtual observatory project based on semantic technologies
• Use case driven – in solar and solar-terrestrial physics with an emphasis on instrument-based measurements and real data pipelines; we needed implementations
• We knew we also needed integration and provenance (but that came later)
• We aimed to push semantics into our systems to build new ‘prototypes’ but we ‘failed’ ;-)
Tetherless World Constellation 15
In 2004
• 2004 – OWL was a W3 recommendation!!• Protégé 2.x and the Protégé-Java-OWL
API• SWOOP was a viable editor• Jena and the Jena API were in good
shape• Pellet worked• SPARQL was still a twinkle in the RDF
working group’s eye• Semantics were still the realm of computer
scientists
Tetherless World Constellation 16
Design and Development
• We made a conscious decision only to develop ontologies that were required to answer specific use cases and migrate metadata– Both Classes AND Properties (uh-oh…)
• We made a conscious effort to use whatever ontologies were available (cf. trends in metadata… nuff said)
• We were pretty sure that rules would be needed (complex logic or late semantic binding)
• We ignored query (see implementation)
Tetherless World Constellation 17
18
Use Case example
• Plot the neutral temperature from the Millstone-Hill Fabry Perot, operating in the non-vertical mode during January 2000 as a time series.
• Plot the neutral temperature from the Millstone-Hill Fabry Perot, operating in the non-vertical mode during January 2000 as a time series.– Meanings and relations
• Objects=Things!– Neutral temperature is a (temperature is a) parameter– Millstone Hill is a (ground-based observatory is a) observatory– Fabry-Perot is a interferometer is a optical instrument is a instrument– Non-vertical mode is a instrument operating mode– January 2000 is a date-time range– Time is a independent variable/ coordinate– Time series is a data plot is a data product
• Metadata just appeared everywhere…
Semantics - Modern informatics enables a new scale-free** framework approach
• Use cases
• Stakeholders
• Distributed authority
• Access control
• Ontologies
• Maintaining Identity
Semantics between 2004 and 2009
• Ontologies were needed for data integration and provenance and mediation for data mining
• Protégé 3.x and then 4.0 came out• SWOOP development was interrupted• Cmap added OWL predicate support*• SPARQL became a recommendation• Triple stores exploded in use and capability• Linked Open Data started to take off• Pellet 2.0 came out• I used the “M” word less frequently!
Tetherless World Constellation 22
Working with knowledge
Expressivity
Maintainability/ Extensibility
Implementability
Working with semantics
Query
Rule execution
Inference
Semantics between 2009 and now
• Semantic data framework (SeSF)• Substantial knowledge provenance work• Data quality, uncertainty and bias
representations and applications (oh, these are in production at NASA)
• Multi-sensor data synergy advisor• Applications:
– Sea Ice, Carbon Observatory, Integrated Ecosystem Assessments, globalchange.gov, ocean.data.gov, energy.data.gov ….
Tetherless World Constellation 25
Respect and Mediation … how
Discovering new data
Information model
29
Ontology
Core and Framework Semantics - Multi-tiered interoperability
used by
Closing thoughts
• Go ahead, create all the metadata you want, we’ll “materialize” some of it into triples based on semantics for use!
• Go ahead, create all the schema and encodings you want but remember – semantics now lives in an open-world (some of it). You are not the only source of metadata. Not all formal. Link over map.
• Semantics make metadata useful but we do not need all of your metadata
Tetherless World Constellation 31
Contact
• http://tw.rpi.edu
• @taswegian