2006-03-21 9th Open Forum on Metadata Registries, Kobe, Japan

XMDR Project Overview

Frank Olken & Kevin D. Keck

{olken,kdkeck}@lbl.gov

Lawrence Berkeley National Laboratory

Presentation to Open Metadata Forum

Kobe, Japan

March 21, 2006

XMDR means:

Extended Metadata Registry

The Cast

● Bruce Bargmeyer (LBNL) = Principal Investigator

● Kevin Keck (LBNL) = architect & stds. (design)

● Frank Olken (LBNL) = content characterization & stds. (design)

● John McCarthy (LBNL) = prototype development (management)

● Karlo Berket (LBNL) = prototype development

● Harold Solbrig (Mayo) = content preprocessing via LexGrid, stds

● Gayle Hodge (USGS) = content characterization, acquisition

● Denise Warzel (NCI) = content acquisition, standards, design

● Larry Fitzwater (EPA) = program mgt. (vision, direction)

● Nancy Lawler (DOD) = program mgt. (vision, direction)

● Sam Chance (DOD) = program mgt. (vision, direction)

Organizational Cast

● Lawrence Berkeley National Laboratory

● Environmental Protection Agency

● National Cancer Institute

● Mayo Clinic

● United States Geological Survey

● Department of Defense

Goals

● Assist revisions of ISO/IEC 11179 Metadata Registry Standard to encompass additional semantic descriptions and resources
    Vocabularies, thesauri, etc.
    Ontologies
    Relationships
    Semantic types

● Design and implement prototype Extended Metadata Registry

● Load metadata content into prototype

● Demonstrate prototype

Why Metadata Registries?

● Facilitate reuse/standardization/integration/exchange of data

● Design time:
    Database / messaging / application / forms designers
    Data warehouse design

● Run-time:
    Query formulation / optimization
    Federated data query optimization / processing
    Extract, Transform, Load (ETL) of Data Warehouses
    Semantic services, composition, workflows, ...

● Users
    Finding, understanding data
    Understanding data entry forms

Why Standards?

● Developing metamodel to serve as design for next generation metadata registries

● Evolve ISO/IEC 11179 Metadata Registry Standard
    Edition 2 (current)
        ● UML modeling, relational DB technology implementation
    Edition 3 (new)
        ● UML + OWL (Web Ontology Language) / MOF (Meta Object Facility) / CL (Common Logic) modeling
        ● Add support for ontologies

More on Why MDR Standards?

● MDR Standards
    Can improve metadata creation practice
    Can improve metadata and data reuse
    Facilitate MDR adoption by organizations
    Facilitate MDR interoperability
    Facilitate MDR software marketing
    Facilitate MDR procurement
    Facilitate alignment / mapping among metadata schemas, ...

Proposed Changes to ISO/IEC 11179

● Support for ontologies, etc.

● More formal modeling of relationships

● Semantic types (?)

Changes to ISO/IEC 11179 Std.

● Add support for ontologies, vocabularies
    Add ontologies
    Add predicates (logical formulae)
    Add axioms (asserted to be true)
    Add support for modularization of ontologies
        ● Add inclusion mechanisms for concept systems and ontologies
        ● Assert axioms in context of containing ontology

Why add support for ontologies?

● More precise specification of data semantics (than natural language definitions)

● Machine processing of semantic specifications of data

Classification, subsumption testing, alignment, spatial, temporal reasoning

● Reusable semantic specifications for subject domains

● Conceptual data models to facilitate data integration

● Encoding of much current work on data semantics and terminologies as ontologies

● Useful for machine learning.

Issues in Including Ontologies in ISO/IEC 11179

● Lack of agreement on logical formalisms
    FOL, description logic (which?), ...

● Hence, MDR std must be agnostic among logic formalisms

● Poses difficulties for:
    Standards specification
    MDR implementation
    MDR interoperability

● See work of OMG Ontology Definition Metamodel (ODM) standard

Changes to ISO/IEC 11179 Std.

● Formalize specification of semantic relationships
    Refinement of Edition 2 Classification Schemes
    Add relationships (types), roles, links (instances) among concepts
    Specify attributes of relationships
        ● Reflexivity, irreflexivity, symmetry, anti-symmetry, transitivity
    To support inference across semantic relationships
        ● e.g., transitive closure over is-a, part-of, ...
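
As an illustration of the kind of inference meant here, the following is a minimal Python sketch of a transitive closure over is-a links. It is not code from the XMDR prototype: the function name `transitive_closure` and the sample links are assumptions invented for this example.

```python
from collections import defaultdict


def transitive_closure(links):
    """Return every (x, y) pair reachable by following one or more links."""
    successors = defaultdict(set)
    for source, target in links:
        successors[source].add(target)

    closure = set()
    for start in list(successors):
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already expanded; also guards against cycles
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


# Hypothetical is-a links (child, parent), invented for illustration.
IS_A_LINKS = {
    ("beagle", "dog"),
    ("dog", "mammal"),
    ("mammal", "animal"),
}

# Links that were not asserted but follow from transitivity.
INFERRED = transitive_closure(IS_A_LINKS) - IS_A_LINKS
```

Computing the closure this way is only meaningful for a relationship that is declared transitive, which is why the registry needs to record such attributes.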

Relationship Modeling in ISO/IEC 11179 Edition 3

● Edition 2 has classification schemes and specialized relationships among various metamodel entities

● Proposed for Edition 3
    ● Binary and N-ary semantic relationships among concepts (a.k.a. relations)
    ● Treat data element concept, conceptual value domain, value meaning, etc. as subtypes of concept
    ● More detailed characterization of relationships:
        Roles / links
        Reflexivity, symmetry, anti-symmetry, transitivity, ...

Why care about relationship characterization?

● Who cares about reflexivity, irreflexivity, symmetry, transitivity?

● Answer: need this information for inference on semantic relationships (usually binary)
    Example: Does it make sense to compute transitive closure?
        ● Is-a: transitive
        ● Part-of: sometimes transitive
        ● Equals: transitive, symmetric
        ● Similar: usually symmetric, typically not transitive
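
The properties listed above can be made concrete with a small Python sketch that tests them against a set of links. This is an assumption-laden illustration, not part of ISO/IEC 11179 or the XMDR prototype: all function names and sample links are hypothetical, and a registry would normally declare these properties for a relationship type rather than derive them from its instances.

```python
def is_reflexive(pairs, domain):
    """Every element of the domain is related to itself."""
    return all((x, x) in pairs for x in domain)


def is_irreflexive(pairs, domain):
    """No element of the domain is related to itself."""
    return all((x, x) not in pairs for x in domain)


def is_symmetric(pairs):
    """Whenever x is related to y, y is related to x."""
    return all((y, x) in pairs for (x, y) in pairs)


def is_antisymmetric(pairs):
    """x related to y and y related to x only when x equals y."""
    return all(x == y or (y, x) not in pairs for (x, y) in pairs)


def is_transitive(pairs):
    """Whenever x is related to y and y to w, x is related to w."""
    return all(
        (x, w) in pairs
        for (x, y) in pairs
        for (z, w) in pairs
        if y == z
    )


# Hypothetical links, invented for illustration.
IS_A = {("beagle", "dog"), ("dog", "mammal"), ("beagle", "mammal")}
SIMILAR = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")}
```

In this sample, `IS_A` passes the transitivity check, so a transitive closure over it is meaningful, while `SIMILAR` is symmetric but fails the transitivity check.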

Semantic Types for ISO/IEC 11179

● ISO/IEC 11179 Edition 2 has “datatypes”
    Associated with “value domain”
    i.e., datatypes are an aspect of representation NOT semantics

● Semantic Types
    Concern meaning rather than representation
    Uses:
        ● Constraints over relationship roles
        ● Attribute of concepts, conceptual value domains, ...
        ● Ubiquitous in ontologies, schemas, ...
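
One way to read "constraints over relationship roles" is sketched below in Python. This is only an illustration under stated assumptions: the classes `Concept` and `RelationshipType`, the function `check_link`, the relationship "uses-currency", and the semantic type names are all hypothetical and are not taken from the ISO/IEC 11179 metamodel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    semantic_type: str  # meaning-level category, not a datatype


@dataclass(frozen=True)
class RelationshipType:
    name: str
    role_types: tuple  # semantic type required for each role, in order


def check_link(relationship, participants):
    """Return True if each participant has the semantic type its role requires."""
    if len(participants) != len(relationship.role_types):
        return False
    return all(
        concept.semantic_type == required
        for concept, required in zip(participants, relationship.role_types)
    )


# Hypothetical relationship type and concepts, invented for illustration.
USES_CURRENCY = RelationshipType("uses-currency", ("Country", "Currency"))
JAPAN = Concept("Japan", "Country")
YEN = Concept("Japanese yen", "Currency")
```

Here `check_link(USES_CURRENCY, (JAPAN, YEN))` accepts the link, while swapping the two participants is rejected because the semantic types no longer match the roles. Both concepts could share the same datatype (a character string code), which is the distinction the slide draws between representation and semantics.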

Some Issues for Semantic Types

● Alternative approaches:
    Build semantic types into 11179 metamodel
    Reuse relationships for semantic type specifications
    Treat semantic types as unary predicates in ontologies + axioms

● Should we have a standard set of semantic types (at least base types)?
    Yes, for interoperability
    No, for flexibility

● Collection types, type constructors?

Why Construct A Prototype?

● To explore alternative revisions to ISO/IEC 11179

● To demonstrate that proposed revisions to ISO/IEC 11179 Metadata Registry Std. are:
    Feasible
    Useful

● To experiment with alternative architectures / technologies for constructing extended metadata registries.
    Text retrieval engines - Lucene
    Inference engines - Jena, Kowari (?), ...
    Service oriented architecture (SOA)

● To facilitate deployment of revised ISO/IEC Metadata Registries
    Example implementation
    Open Source Code !

Why Content?

● Content characterization assists in shaping revisions to ISO/IEC 11179

● Content characterization assists in selection of content to load

● Content ingestion, installation, querying provides a means to exercise the prototype
    Testing
    Demonstration
    Performance evaluation
    Utility evaluation

Metadata Content Activities

● Content Characterization
    e.g., graph theoretic characterization

● Content Acquisition

● Content Preprocessing
    Into standard formats for loading (H. Solbrig)

● Content Loading

● Content Querying

Desiderata for Content Selection

● Accessibility
    Licensing, source cooperation, unclassified

● Documentation, familiarity to XMDR collaborators

● Funder interest

● Diversity of metadata types, subject areas

● Diverse graph structures (of semantic relationships)

● OWL encodings available

● Moderate size

● Opportunities for mappings among metadata sets

● Multi-linguality

Content Characterization

● Provenance: Name, source, contact, ...

● Type of metadata:
    thesauri, ontology, ISO/IEC 11179 metadata registry, ...

● Graph Characterization
    Tree, Faceted Classification, partial order (directed acyclic graph), cyclic graph, ...

● Size: # concepts, # links, # bytes

● Definitions ?

● File Formats

● OWL encoding ?

● Multilingual

● Availability / licensing issues

Why Graph-theoretic Content Characterization?

● Important structural taxonomy

● Impacts:
    Expressivity required of registry
    Content representation, index structures
    Search, matching algorithms
    Computational complexity of search, matching, ...
    Inference algorithms
    Computational complexity of inference
    Design / implementation / performance of metadata registries
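
A rough sense of what a graph-theoretic characterization involves can be given with a short Python sketch that labels a set of directed (child, parent) links as a tree, a directed acyclic graph, or a cyclic graph, and reports its size. This is an illustrative assumption, not the characterization procedure used by the XMDR project: the function name `characterize`, the output keys, and the sample links are invented here, and faceted classifications are not distinguished.

```python
from collections import defaultdict


def characterize(links):
    """Classify (child, parent) links and report concept and link counts."""
    links = set(links)
    nodes = {node for link in links for node in link}
    parents = defaultdict(set)
    pending = defaultdict(int)  # unprocessed children per node
    for child, parent in links:
        parents[child].add(parent)
        pending[parent] += 1

    # Kahn's algorithm: peel off nodes whose children are all processed.
    ready = [node for node in nodes if pending[node] == 0]
    peeled = 0
    while ready:
        node = ready.pop()
        peeled += 1
        for parent in parents[node]:
            pending[parent] -= 1
            if pending[parent] == 0:
                ready.append(parent)

    acyclic = peeled == len(nodes)  # leftover nodes lie on or above a cycle
    single_parent = all(len(parents[node]) <= 1 for node in nodes)
    roots = [node for node in nodes if not parents[node]]

    if not acyclic:
        shape = "cyclic graph"
    elif single_parent and len(roots) == 1:
        shape = "tree"
    else:
        shape = "directed acyclic graph"
    return {"shape": shape, "concepts": len(nodes), "links": len(links)}


# Hypothetical broader-term links, invented for illustration.
TREE = {("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")}
DAG = TREE | {("dog", "pet"), ("pet", "animal")}
CYCLE = {("a", "b"), ("b", "a")}
```

The label matters for the impacts listed above: a tree admits simple parent-pointer storage, a directed acyclic graph requires handling multiple parents, and a cyclic graph means traversal and inference code must guard against revisiting nodes.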

Loaded content metadatasets

● National Cancer Institute Thesaurus (NCIT)

● Defense Technical Information Center (DTIC) Thesaurus

● General Multilingual Environmental Thesaurus (GEMET)

● Adult Mouse Anatomical Dictionary

● EPA Terms of the Environment

● ISO 3166 Country Codes

● ISO 4217 Currency Codes

Other Metadatasets of Interest

● NCI Cancer Data Standards Repository (caDSR)

● EPA Environmental Data Registry (EDR)

● NLM Unified Medical Language System (UMLS)

● USGS Geographic Names Information System (GNIS)

● Integrated Taxonomic Information System (ITIS)

● NBII Biocomplexity Thesaurus

● ISO 639 Language Identifiers

● Logical Observation Identifiers Names and Codes (LOINC)

● Getty Thesaurus of Geographic Names (TGN)

● NASA Semantic Web for Earth and Environmental Terminology (SWEET)

● Dublin Core Metadata (?)

Conclusions

● XMDR Activities
    ISO/IEC 11179 Revisions
        ● Support for ontologies, etc.
        ● Relationships
        ● Semantic types
    Prototype Development
    Content (characterization, loading, query)
    Prototype testing, performance evaluation, demos

Coming in Second Part of Talk (Kevin Keck) :

● Detailed discussion of the architecture and technology of the prototype ...

Acknowledgements

● Financial support from U.S. Dept. of Defense, U.S. Environmental Protection Agency

● In kind contributions from U.S. National Cancer Institute, Mayo Clinic, US Geological Survey

● Support from program managers: Nancy Lawler (DOD) and Sam Chance (DOD)

● Comments on drafts of this talk by John L. McCarthy

Contact Information:

● Project: http://xmdr.org/

● Frank Olken:
    Lawrence Berkeley National Laboratory
    Email: olken@lbl.gov
    Tel: 510-486-5891
    URL: http://www.lbl.gov/~olken