Knowledge Evolution in
Distributed Geoscience Datasets and
the Role of Semantic Technologies
Xiaogang (Marshall) Ma
Tetherless World Constellation
Rensselaer Polytechnic Institute
x.marshall.ma
rpi.edu/~max7
0000-0002-9110-7369
MarshallXMa
William Smith's 1815 geologic
map of England and Wales
with part of Scotland
William Smith
(1769-1839)
(Image source: Geological Society of London)
Evolution of the
Geological Map of
British Islands / UK
1874, 1906, 1939, 1969, 2007, 2013
(Image source: British Geological Survey)
Definition of “Quaternary” in
several versions of the International
Stratigraphic Chart: 2004, 2005, 2008, 2009
Distributed datasets:
Mismatches of geological
units across political
boundaries
Italy/France near Cuneo/Colmar: Cambrian vs. Carboniferous (Asch et al., 2012)
Wyoming/Colorado: Felsic and hornblendic gneisses vs. Granitic rocks (Ma et al., 2014)
(Base map courtesy: OneGeology-Europe and USGS)
• Data and models, vocabularies, and ontologies
– Have we ever had model-independent datasets?
• Ontology dynamics and a data life cycle
CONCEPT
*Initial concepts
*Questions and answers
*Grant info
COLLECTION
*Questionnaire
*Coded instrument
*CAI metadata
*Paradata
PROCESSING
*Data specs
*Recodes
*Summary descriptive info
DISTRIBUTION
*Terms of use
*Citation
*Packaging info
DISCOVERY
*Catalog record
*Indexing
*Related publications
ANALYSIS
*Replication code
*Publications
ARCHIVING
*Preservation metadata
*Confidentiality
*Additional processing
REPURPOSING
*Post-hoc harmonization
*Data transformations
Diagram reproduced from (Spencer, 2012)
Ontology dynamics
• Ontology Mapping
• Ontology Morphism
• Ontology Matching
• Ontology Articulation
• Ontology Translation
• Ontology Evolution
• Ontology Debugging
• Ontology Versioning
• Ontology Integration
• Ontology Merging
(Flouris et al., 2008)
Potential challenges
• Reworking of the extant data in a data center
– e.g. caused by ontology/vocabulary versioning
• Semantic mismatch among data sources
– e.g. heterogeneity in ontologies of the same topic
• Different understandings of the same dataset
between data providers and data users
– e.g. a data provider understands Quaternary as 1.806 Ma-present,
and a data user understands it as 2.588 Ma-present
• Error propagation in cross-discipline data re-use
– e.g. heterogeneous datasets may cause misconceptions in
subsequent work
(Ma et al., 2014)
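The provider/user mismatch over "Quaternary" can be made concrete with a small sketch. It is only an illustration: the two boundary values (1.806 Ma and 2.588 Ma) come from the bullet above, while the class name, the labels, and the sample age of 2.0 Ma are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodDefinition:
    """One reading of a geologic period, bounded in million years ago (Ma)."""
    label: str
    base_ma: float       # older boundary
    top_ma: float = 0.0  # younger boundary; 0.0 means "present"


# Boundary values are taken from the slide; the labels are placeholders.
QUATERNARY_PROVIDER = PeriodDefinition("Quaternary (provider's reading)", 1.806)
QUATERNARY_USER = PeriodDefinition("Quaternary (user's reading)", 2.588)


def contains(definition: PeriodDefinition, age_ma: float) -> bool:
    """True if an age in Ma falls inside the period under this definition."""
    return definition.top_ma <= age_ma <= definition.base_ma


def disputed_interval(a: PeriodDefinition, b: PeriodDefinition) -> tuple:
    """Age range (younger, older) on which the two definitions disagree."""
    younger, older = sorted((a.base_ma, b.base_ma))
    return younger, older


sample_age_ma = 2.0  # hypothetical sample age, chosen to fall in the gap
print(contains(QUATERNARY_PROVIDER, sample_age_ma))
print(contains(QUATERNARY_USER, sample_age_ma))
print(disputed_interval(QUATERNARY_PROVIDER, QUATERNARY_USER))
```

Any record dated between the two boundaries is labeled differently depending on which definition the reader assumes, which is why recording the vocabulary version alongside the data matters.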
OneGeology-Europe
• 20 European nations
providing national geologic
maps at scale ~1:1M
• Harmonized geological
terms and map legends
• Multilingual labels in 18
languages
• Central portal for data
browsing/query among
distributed data sources
A contribution to
INSPIRE
http://www.onegeology-europe.org
A few recent works of interest
Federated query:
Result of geologic
units with age
‘Cenozoic - from 66
million years to today’
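A federated age query of this kind might be sketched as follows. This is not the OneGeology-Europe portal's implementation: the sources are in-memory lists, every record and name is hypothetical, and only the Cenozoic bounds (66 Ma to today) come from the slide. The sketch assumes all sources already express ages on one harmonized numeric scale, and it selects units whose age range lies entirely within the queried interval.

```python
# Cenozoic bounds in Ma (older, younger), as stated on the slide.
CENOZOIC = (66.0, 0.0)

# Hypothetical distributed sources; names and ages are illustrative only.
SOURCES = {
    "source_a": [
        {"name": "unit A1", "older_ma": 23.0, "younger_ma": 5.3},
        {"name": "unit A2", "older_ma": 145.0, "younger_ma": 66.0},
    ],
    "source_b": [
        {"name": "unit B1", "older_ma": 2.6, "younger_ma": 0.0},
    ],
}


def within(unit: dict, older_ma: float, younger_ma: float) -> bool:
    """True if the unit's whole age range lies inside the queried interval."""
    return unit["older_ma"] <= older_ma and unit["younger_ma"] >= younger_ma


def federated_query(sources: dict, older_ma: float, younger_ma: float) -> list:
    """Run the same age filter against every source and merge the results."""
    results = []
    for source_name, units in sources.items():
        for unit in units:
            if within(unit, older_ma, younger_ma):
                results.append((source_name, unit["name"]))
    return results


print(federated_query(SOURCES, *CENOZOIC))
```

The filter is only meaningful because the age terms were harmonized beforehand; without a shared vocabulary, each source would need its own translation step before the comparison.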
Earth Resource Form
Environmental Impact Value
Exploration Activity Type
Exploration Result
UNFC Value
Earth Resource Expression
Earth Resource Shape
Enduse Potential
Mineral Occurrence Type
Mining Activity Type
Processing Activity Type
Mining Waste Type Value
Commodity Code
Mineral Deposit Group
Mineral Deposit Type
Product Value
Recently finished CGI vocabularies
• Construct a collection of vocabularies for
populating information interchange
documents and enabling interoperability
• Provide labels for concepts, scoped to
various communities defined by
language, science domain, or application
domain
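The idea of one concept carrying labels for several language communities can be sketched as a small lookup. This is a simplified illustration, not the CGI vocabularies' actual encoding: the concept identifier, the dictionary layout, and the function name are hypothetical.

```python
# Hypothetical vocabulary: one concept identifier, several language labels.
CONCEPTS = {
    "urn:example:lithology/granite": {
        "en": "granite",
        "de": "Granit",
        "it": "granito",
    },
}


def label_for(concept_id: str, language: str, fallback: str = "en") -> str:
    """Return the label in the requested language, else the fallback label."""
    labels = CONCEPTS[concept_id]
    return labels.get(language, labels[fallback])


print(label_for("urn:example:lithology/granite", "de"))
print(label_for("urn:example:lithology/granite", "sv"))  # no Swedish label: falls back to English
```

Because data records reference the stable identifier rather than a label, new languages or community-specific labels can be added without touching the data.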
CGI Geoscience Terminology Workgroup
http://cgi-iugs.org/tech_collaboration/geoscience_terminology_working_group.html
USGS Online Geologic Maps
• Standardized vocabulary
with detailed annotation
• Forward and backward
queries between spatial
data and attribute data
• Links to further data
sources, e.g. aeromagnetic
survey, mineral resources
data, soils, geochemical
samples, etc.
http://mrdata.usgs.gov/geology/state/map.html
Recommendations
• Communities of practice on ontology and vocabulary
– Bottom-up, self-organized, and loose top-down control
• Formalize the ‘Concept’ step in a data life cycle
– Top-down, and adopt outputs from the bottom-up approach
• Make it a virtuous circle between the bottom-up and top-down
approaches
Thanks for listening.