SC 32/WG 2 Tutorial: Metadata Registry Standards, July 16, 2007. Bruce Bargmeyer, University of California, Berkeley and Lawrence Berkeley National Laboratory. Tel: +1 510-495-2905 [email protected] JTC1 SC32 N1649



SC 32/WG 2 Tutorial

Metadata Registry Standards

July 16, 2007

Bruce Bargmeyer
University of California, Berkeley
and
Lawrence Berkeley National Laboratory
Tel: +1 [email protected]

JTC1 SC32 N1649

Topics

- Standards development: OMG, ISO (TC 37 & JTC 1/SC 32), W3C, OASIS
- Align, Coordinate, Integrate: Standards, Recommendations, Specifications
- Semantics
- Challenges and Future Directions

Align, Coordinate, Integrate Standards

WG 2 doing OK internally:

[Figure: ISO/IEC 24707, 11179 E3, 19763, 20944]

Align, Coordinate, Integrate Standards

[Figure: WG 1, WG 2, WG 3, WG 4]

SC 32?

Clearwater meeting a step forward

Align, Coordinate, Integrate Standards/Recommendations/Specifications for Semantic Computing

[Figure: Users, together with four standards communities]
- ISO/IEC JTC 1/SC 32: ISO/IEC 11179 Metadata Registries (Metadata Registry, Terminology, Thesaurus, Taxonomy, Data Standards, Ontology, Structured Metadata)
- ISO TC 37: Terminology (diagram labels: CONCEPT, Referent, Refers To, Symbolizes, Stands For; example terms “Rose”, “ClipArt Rose”)
- W3C: Semantic Web (RDF graph: Node, Edge, Node; Subject, Predicate, Object)
- OMG: Object Management (MOF, ODM, CWM, IMM)

Standards Development
Semantics Management and Semantics Services – Semantic Computing

[Figure: OMG, W3C, ISO/IEC JTC 1 SC 32, ISO TC 37]

Align, Co-develop, Fast Track, PAS Submission …

Standards Development
Semantics Management and Semantics Services – Semantic Computing

[Figure: OMG, W3C, ISO/IEC JTC 1 SC 32]

Align, integrate, co-develop, Fast Track, PAS Submission …
Can we coordinate content?

A Success

[Figure: OMG and ISO/IEC JTC 1 SC 32, linked by ISO/IEC 24707 and OMG ODM]

Some text and figures are identical in the two standards:

ISO/IEC 24707 – Common Logic
OMG Ontology Definition Metamodel (ODM)

Standards Development
Semantics Management and Semantics Services – Semantic Computing

ISO/IEC 11179 (Edition 3)

ISO/IEC JTC 1 SC 32

Ongoing effort

Standards Development
Semantics Management and Semantics Services – Semantic Computing

Possible effort

11179 E3 proposals

OMG

Standards Development
Semantics Management and Semantics Services – Semantic Computing

ISO/IEC 11179 (Edition 3)

ISO/IEC JTC 1 SC 32

Hopeful?

OMG

IMM &

Other Possibilities

- OASIS ebXML Registry
- W3C Semantic Web Deployment WG
- TC 37

The Ageless Information Problem
cf: Data, Information, Knowledge, Wisdom

Getting the information that we need, when we need it, without afflicting the excellent minds of humans with toil and drudgery

The litany:
- Too much or too little, irrelevant, not authoritative, out of date
- Unknown quality, not trustable, lacks provenance, no certainty measures
- Difficult to find, difficult to access, difficult to use
- Meaning not clear, relationship to other information not clear
- Data creators do not have the same understanding of the data as end users
- Recorded data loses much real world meaning, context, relationships
- Much of the meaning of data is buried in the processes used to manipulate the data (e.g., in computer code)
- Need improvements in efficiency and effectiveness

Every time we solve it, we re-create it.

New Semantics Capabilities Proposed for ISO/IEC 11179 MDR (Edition 3)

- Improve traditional data management/data administration
  - Use stronger semantics management and semantics services capabilities
- Enable something new
  - Semantic computing

Semantic Computing: The Nub of It

- Processing that takes “meaning” into account
  - Makes use of concept systems, e.g., thesauri and/or ontologies
  - Moves some of the “meaning” of data from computer code to managed semantics
- Processing that uses (e.g., reasons across) the relations between things, not just computing about the things themselves
- Processing that helps to take people out of the computation, reducing the human toil
- Semantics “grounding” for data, data discovery, extraction, mapping, translation, formatting, validation, inferencing, …
- Delivering higher-level results that are more helpful for the user’s thought and action

In The Epic Information Struggle We Have Made Heroic Progress

Files

Machine Processing

Computer Processing Cards Tape Disk

In The Epic Information Struggle We Have Made Heroic Progress

In structuring data and text:
- Structured Data
  - Columns on cards & tape (possibly comma separated)
  - Hierarchical (DBMS)
  - Network
  - Table (relational DBMS)
  - Hierarchy (XML)
  - Graph (RDF)
- Semi-structured text
  - nroff, troff, LaTeX …
  - SGML
  - HTML
  - XML

In The Epic Information Struggle We Have Made Heroic Progress

In documenting data and text (e.g., semantics management):
- Data Standards
  - Code sets
- (Meta)Data Standards
  - Data element definitions, valid values, value meanings
  - Metadata registries (MDR, ISO/IEC 11179)
  - Other standards as presented at this conference
- Concept systems (or KOS)
  - Glossaries
  - Dictionaries
  - Thesauri
  - Taxonomies
  - Ontologies
  - Graphs

Semantic Management
Proposals for 11179 Edition 3

- Improve data management through use of stronger semantics management
  - Databases
  - XML data
  - Other “traditional” data
- Enable new wave of semantic computing
  - Take meaning of data into account
  - Process across relations as well as properties
  - May use reasoning engines, e.g., to draw inferences

Semantic Computing Application: Find and process non-explicit data

[Figure: concept hierarchy from Analgesic Agent down through Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug, and Acetaminophen to the brand names Datril, Anacin-3, and Tylenol]

For example…

Patient data on drugs contains brand names (e.g., Tylenol, Anacin-3, Datril, …).

However, we want to study patients taking analgesic agents.
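A minimal sketch of how a concept system could bridge brand names and the broader class "Analgesic Agent". The parent/child arrangement in `NARROWER` is an assumption made for illustration (the slide's figure does not survive in this transcript), and the patient records, including the non-analgesic "Amoxicillin", are made up.

```python
from collections import deque

# Hypothetical "narrower concept" links; the exact arrangement of the
# slide's hierarchy is assumed here for illustration only.
NARROWER = {
    "Analgesic Agent": ["Non-Narcotic Analgesic"],
    "Non-Narcotic Analgesic": [
        "Analgesic and Antipyretic",
        "Nonsteroidal Antiinflammatory Drug",
    ],
    "Analgesic and Antipyretic": ["Acetaminophen"],
    "Acetaminophen": ["Tylenol", "Anacin-3", "Datril"],
}

# Made-up patient records that mention only brand names.
PATIENTS = [
    {"patient": "p1", "drug": "Tylenol"},
    {"patient": "p2", "drug": "Datril"},
    {"patient": "p3", "drug": "Amoxicillin"},
]


def descendants(concept, narrower=NARROWER):
    """Return every concept reachable from `concept` via narrower links."""
    seen = set()
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def patients_taking(concept, records=PATIENTS):
    """Find patients whose recorded drug falls anywhere under `concept`."""
    terms = descendants(concept) | {concept}
    return [r["patient"] for r in records if r["drug"] in terms]


print(patients_taking("Analgesic Agent"))
```

The query names a class that never appears in the patient data; the match comes entirely from the registered relations.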

A Semantics Application: Specify and compute across Relations, e.g., within a food web in an Arctic ecosystem

[Figure: food web diagram] An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer.

Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)
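A small sketch of computing across relations rather than over individual records. The species and edges are hypothetical (the slide's diagram is not reproduced here); each edge points from a food source to its consumer, matching the direction of biomass transfer described above.

```python
# Hypothetical Arctic food web; each edge runs from a food source to its
# consumer, i.e., in the direction of biomass transfer.
EDGES = [
    ("phytoplankton", "zooplankton"),
    ("zooplankton", "arctic cod"),
    ("arctic cod", "ringed seal"),
    ("ringed seal", "polar bear"),
    ("arctic cod", "seabird"),
]


def build_index(edges):
    """Map each node to the set of nodes its outgoing edges point to."""
    index = {}
    for start, end in edges:
        index.setdefault(start, set()).add(end)
    return index


def reachable(start, index):
    """Return all nodes reachable from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in index.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


CONSUMERS = build_index(EDGES)                           # source -> consumers
SOURCES = build_index([(c, s) for s, c in EDGES])        # consumer -> sources

# Which organisms depend, directly or indirectly, on arctic cod?
print(sorted(reachable("arctic cod", CONSUMERS)))
# What does a polar bear ultimately feed on?
print(sorted(reachable("polar bear", SOURCES)))
```

Both questions are answered by traversing the relation, not by inspecting any property of a single organism.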

Semantics Application: Combine Data, Metadata & Concept Systems

Inference Search Query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003”

Data:

ID   Date       Temp   Hg
A    06-09-13   4.4    4
B    06-09-13   9.3    2
X    06-09-13   6.7    78

Metadata:

Name   Datatype   Definition                       Units
ID     text       Monitoring Station Identifier    not applicable
Date   date       Date                             yy-mm-dd
Temp   number     Temperature (to 0.1 degree C)    degrees Celsius
Hg     number     Mercury contamination            micrograms per liter

Concept system:

Contamination
  Biological
  Chemical
    lead
    cadmium
    mercury
  Radioactive
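A hedged sketch of how the three layers could work together for a query like the one on this slide. The data rows and units come from the slide; the `concept` link on each metadata entry, the reading of the dates as 2006, and the function names are assumptions for illustration. The "downstream from Fletcher Creek" condition is left out because it would need a hydrological relation that the slide does not show.

```python
from datetime import date

# Data rows from the slide (dates read as yy-mm-dd, assumed to mean 2006).
DATA = [
    {"ID": "A", "Date": date(2006, 9, 13), "Temp": 4.4, "Hg": 4},
    {"ID": "B", "Date": date(2006, 9, 13), "Temp": 9.3, "Hg": 2},
    {"ID": "X", "Date": date(2006, 9, 13), "Temp": 6.7, "Hg": 78},
]

# Units from the slide's metadata; the "concept" link is an assumed extension
# that ties each column to a node in the concept system.
METADATA = {
    "Temp": {"units": "degrees Celsius", "concept": "temperature"},
    "Hg": {"units": "micrograms per liter", "concept": "mercury"},
}

# Concept system from the slide: broader concept -> narrower concepts.
CONCEPTS = {
    "contamination": ["biological", "chemical", "radioactive"],
    "chemical": ["lead", "cadmium", "mercury"],
}


def narrower(concept):
    """Return the concept together with everything beneath it."""
    found = {concept}
    for child in CONCEPTS.get(concept, []):
        found |= narrower(child)
    return found


def stations_over(concept, threshold, units, start, end):
    """Stations with any column under `concept` above `threshold` in range."""
    wanted = narrower(concept)
    columns = [
        name
        for name, meta in METADATA.items()
        if meta["concept"] in wanted and meta["units"] == units
    ]
    return sorted(
        {
            row["ID"]
            for row in DATA
            for col in columns
            if row[col] > threshold and start <= row["Date"] <= end
        }
    )


print(stations_over("chemical", 10, "micrograms per liter",
                    date(2006, 1, 1), date(2006, 12, 31)))
```

The query asks about "chemical" contamination, yet the data only has an `Hg` column; the metadata and concept system supply the link. With the slide's own date range (December 2001 to March 2003) the sample rows, dated 2006 under the assumption above, would return no stations.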

Challenge: Use data from systems that record the same facts with different terms

[Figure: registries whose coverage overlaps in Common Content: OASIS/ebXML Registries, ISO 11179 Registries, Ontological Registries, CASE Tool Repositories, UDDI Registries, Software Component Registries, Database Catalogs, Dublin Core Registries. Labels for the shared item: Country Identifier, Data Element, XML Tag, Term Hierarchy, Attribute, Business Specification, Table Column, Business Object.]

Same Fact, Different Terms

Data Element Concept
  Name: Country Identifiers
  Context:
  Definition:
  Unique ID: 5769
  Conceptual Domain:
  Maintenance Org.:
  Steward:
  Classification:
  Registration Authority:
  Others
  Countries: Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe

Data Elements
  Name:
  Context:
  Definition:
  Unique ID: 4572
  Value Domain:
  Maintenance Org.:
  Steward:
  Classification:
  Registration Authority:
  Others

ISO 3166        ISO 3166       ISO 3166       ISO 3166       ISO 3166
English Name    French Name    2-Alpha Code   3-Alpha Code   3-Numeric Code
Algeria         L'Algérie      DZ             DZA            012
Belgium         Belgique       BE             BEL            056
China           Chine          CN             CHN            156
Denmark         Danemark       DK             DNK            208
Egypt           Egypte         EG             EGY            818
France          La France      FR             FRA            250
. . .           . . .          . . .          . . .          . . .
Zimbabwe        Zimbabwe       ZW             ZWE            716
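A minimal sketch of translating between the representations on this slide, assuming the registry simply stores one row per country. The values are the ones shown on the slide; the field names (`alpha2`, `alpha3`, `numeric`, `en`, `fr`) and the `translate` helper are hypothetical.

```python
# One row per country; each key is a different representation of the same
# value meaning. Values are taken from the slide; key names are assumed.
COUNTRIES = [
    {"alpha2": "DZ", "alpha3": "DZA", "numeric": "012", "en": "Algeria", "fr": "L'Algérie"},
    {"alpha2": "BE", "alpha3": "BEL", "numeric": "056", "en": "Belgium", "fr": "Belgique"},
    {"alpha2": "CN", "alpha3": "CHN", "numeric": "156", "en": "China", "fr": "Chine"},
    {"alpha2": "DK", "alpha3": "DNK", "numeric": "208", "en": "Denmark", "fr": "Danemark"},
    {"alpha2": "EG", "alpha3": "EGY", "numeric": "818", "en": "Egypt", "fr": "Egypte"},
    {"alpha2": "FR", "alpha3": "FRA", "numeric": "250", "en": "France", "fr": "La France"},
    {"alpha2": "ZW", "alpha3": "ZWE", "numeric": "716", "en": "Zimbabwe", "fr": "Zimbabwe"},
]


def build_lookup(rows):
    """Index every representation (case-insensitively) to its row."""
    lookup = {}
    for row in rows:
        for value in row.values():
            lookup[value.casefold()] = row
    return lookup


LOOKUP = build_lookup(COUNTRIES)


def translate(value, target):
    """Convert any known representation of a country into `target` form."""
    row = LOOKUP.get(value.casefold())
    return None if row is None else row[target]


print(translate("DK", "fr"))
print(translate("818", "alpha3"))
print(translate("La France", "numeric"))
```

Two systems that record "DK" and "208" are stating the same fact; a registered mapping lets a program treat them as equal without hand-written conversion code in each application.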


Challenge: Draw information together from a broad range of studies, databases, reports, etc.

A semantics application: Information Extraction and Use

[Figure: processing pipeline supported by 11179-3 (E3) and XMDR]
- Extraction Engine: Segment, Classify, Associate, Normalize, Deduplicate
- Discover patterns, Select models, Fit parameters, Inference, Report results
- Actionable Information
- Decision Support

Metadata Registries are Useful

- Registered semantics
  - For “training” extraction engines
  - The “Normalize” function can make use of standard code sets that have mapping between representation forms.
  - The “Classify” function can interact with pre-established concept systems.
- Provenance
  - High precision for proper nouns, less precision (e.g., 70%) for other concepts -> impacts downstream processing
  - Need to track precision
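One way precision tracking might look in practice, as a sketch only: each extracted fact carries its source and an estimated precision, so downstream steps can decide what to trust. The record layout, the example facts, and the 0.95 figure are assumptions; only the 70% figure comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedFact:
    text: str         # the extracted phrase
    concept: str      # concept assigned by the (hypothetical) Classify step
    source: str       # provenance: where the phrase was found
    precision: float  # estimated precision of the extraction, 0.0 to 1.0


# Made-up facts: a proper noun with high precision, and a general concept
# at the slide's example level of 70%.
FACTS = [
    ExtractedFact("Fletcher Creek", "water body", "report-17", 0.95),
    ExtractedFact("elevated mercury", "chemical contamination", "report-17", 0.70),
]


def usable(facts, minimum):
    """Keep only facts whose recorded precision meets the caller's bar."""
    return [f for f in facts if f.precision >= minimum]


print([f.text for f in usable(FACTS, 0.9)])
print([f.text for f in usable(FACTS, 0.5)])
```

Because the precision travels with the fact, a decision-support step can apply a stricter bar than an exploratory search over the same extracted content.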

Challenge: Gain Common Understanding of meaning between Data Creators and Data Users

[Figure: data creators and users (EEA, USGS, DoD, EPA, others) exchange text and data through information systems. Each holds the same subject terms in its own language (English: environ, agriculture, climate, human health, industry, tourism, soil, water, air; Spanish: ambiente, agricultura, tiempo, salud humana, industria, turismo, tierra, agua, aire) alongside numeric data.]

A common interpretation of what the data represents

Practical Vocabulary Management

Vocabulary Management is essential for use of semantic technologies
- Define concepts and relationships
- Harmonize terminology, resolve conflicts
- Collaborate with stakeholders

An approach
- Select a domain of interest
- Enter core concepts and relationships
- Engage community in vocabulary review
- Harmonize, validate and vet the vocabulary
- Enter metadata describing enterprise data
- Link concept system to metadata

Use eXtended MDR Capabilities

- For vocabulary repository
  - Register, harmonize, validate, and vet definitions and relations
- To register mappings between multiple vocabularies
- To register mappings of concepts to data
- To provide semantics services
- To register and manage the provenance of data

11179-3 (E3) is part of the infrastructure for semantics and data management.

These capabilities are proposed for ISO/IEC 11179 Edition 3

11179 (E3) Use

Upside
- Collaborative
  - Supports interaction with community of interest
  - Shared evolution and dissemination
  - Enables Review Cycle
- Standards-based – don’t lock semantics into proprietary technology
- Foundation for strategic data centric applications
- Lays the foundation for Ontology-based Information Management
- Content is reusable for many purposes

Downside
- Managing semantics is HARD WORK, no matter how friendly the tools
- Needs integration with other components

Some Challenges

Data management and metadata management must evolve to address
- More complex data structures (relational, object, hierarchies, graphs)
- Query capabilities
  - More than SQL, XQuery, SPARQL
- Discovery mechanisms
  - More than Google
- Access, mining, extraction

We need stronger semantics management

Metadata Registry Support for

- Registering and mapping ontologies
- Ontology Evolution
- Registering Process Ontologies


Thank You

Acknowledgements
- Karlo Berket, LBNL
- Kevin Keck, LBNL
- John McCarthy, LBNL
- Harold Solbrig, Apelon

This material is based upon work supported by the National Science Foundation under Grant No. 0637122, USEPA and USDOD. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, USEPA or USDOD.

Bruce Bargmeyer
Lawrence Berkeley National Laboratory & Berkeley Water Center
University of California, Berkeley
Tel: +1 [email protected]