TRANSCRIPT
Workshop on the DOI System
DOI SYSTEM: DATA MODEL
International DOI Foundation
Outline / Key concepts in this section
• DOI Data Model and interoperability
• Application profiles
• Kernel metadata
• Metadata declaration
• Role of DOI name metadata
• Origins of the DOI Data Model
• Semantic interoperability
• The indecs principles
• Applications of indecs
• The use of a data dictionary
• Example: rights management
Further reading on key concepts in this section
• DOI Handbook Chapter 4, “DOI Data Model”: http://www.doi.org/handbook_2000/metadata.html
• “DOI System and Data Dictionaries” Factsheet: http://www.doi.org/factsheets/DOIDataDictionaries.html
DOI data model
• The underlying model of how data within the DOI System relates to other data
– Therefore vital for interoperability
• Whereas the Handle System component is needed for every DOI name, the DOI Data Model component has not yet been used to its full extent
– Some applications have used it “behind the scenes”
• Interoperability becomes more important as an economic feature when there are multiple services or multiple uses – which there will be eventually
– Don’t design only for today
Application Profile (AP) Framework
[Figure: DOI names (shown as numbers such as 784, 369, 965, 876, 456, 453, 908) grouped into Application Profiles; each Application Profile is linked to Service Instances, and each Service Instance to a Service Definition.]
• Entities are identified by DOI names
• The properties of groups of DOI names are defined as APs
• APs have one or more Services
• Services have definitions
• New APs and services may be created or made available
• One change to an AP affects all DOI names within that AP
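The grouping described in the AP Framework can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the DOI names and the service names below are all hypothetical, and nothing here reflects how the DOI System itself stores Application Profiles.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDefinition:
    """Definition of a service offered through an Application Profile."""
    name: str
    description: str


@dataclass
class ApplicationProfile:
    """A named group of DOI names that share properties and services."""
    name: str
    doi_names: set = field(default_factory=set)
    services: list = field(default_factory=list)


def services_for(doi_name, profiles):
    """Collect the service names available to one DOI name via its APs."""
    found = []
    for profile in profiles:
        if doi_name in profile.doi_names:
            found.extend(service.name for service in profile.services)
    return sorted(set(found))


# Hypothetical DOI names and services, for illustration only.
articles = ApplicationProfile("articles", {"10.1000/784", "10.1000/369"})
datasets = ApplicationProfile("datasets", {"10.1000/369", "10.1000/908"})
articles.services.append(ServiceDefinition("full-text", "Resolve to the article text"))
datasets.services.append(ServiceDefinition("download", "Resolve to a data file"))

# One change to an AP affects every DOI name grouped under it.
articles.services.append(ServiceDefinition("citation", "Return a formatted citation"))

print(services_for("10.1000/784", [articles, datasets]))
print(services_for("10.1000/369", [articles, datasets]))
```

Because services hang off the profile rather than off each DOI name, adding the "citation" service once makes it available to every DOI name in that profile, including one that also belongs to a second profile.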
DOI Application Profile
A DOI Application Profile is a DOI name view: mechanism for “unity in diversity”: what do all these DOI names have in common?
Based on any interest group’s view of a type of creation (a DOI Name User Community). Functional granularity: create a grouping when you need it.
DOI-APs can overlap: things can be in multiple DOI-APs.
A DOI-AP has a metadata kernel, a Registration Agency, and a Governance/Development Group.
Zero Set = “initial implementation” DOI names (just a single URL redirection; zero additional metadata).
[Figure: initial implementation versus full implementation. Labels: Zero App Profile; Defined App Profiles; single redirection (persistent identifier); metadata (W3C, WIPO, NISO, ISO, IETF, etc.); multiple resolution; activity tracking.]
Each DOI-AP starts from a basic Kernel (8 elements) and may add whatever else it needs: defined by the DOI Name User Community.
DOI name metadata vocabulary being developed - in tandem with ONIX etc
Can/should coincide with or provide sector requirements
Different DOI-APs’ metadata will interoperate if vocabularies are developed within indecs-based model.
DOI Kernel
Element               Value
DOI name              10.1000/ISBN0141255559
resourceIdentifier    ISBN 0141255559
resourceName          Two for the dough
PrincipalAgent, role  Janet Evanovich, author
StructuralType        Physical fixation
Character             Text
Mode                  Visual
referentType          Book
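As a rough sketch, the kernel record above could be held in a small Python data class. The field names are snake_case renderings of the kernel elements chosen for this example (the agent and its role are split into two fields), and the completeness check is an assumption for illustration, not part of any DOI specification.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class KernelMetadata:
    """Kernel elements from the slide; agent and role are held separately."""
    doi_name: str
    resource_identifier: str
    resource_name: str
    principal_agent: str
    agent_role: str
    structural_type: str
    character: str
    mode: str
    referent_type: str

    def missing_elements(self):
        """Return the names of kernel elements that were left empty."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Values copied from the kernel example on the slide.
record = KernelMetadata(
    doi_name="10.1000/ISBN0141255559",
    resource_identifier="ISBN 0141255559",
    resource_name="Two for the dough",
    principal_agent="Janet Evanovich",
    agent_role="author",
    structural_type="Physical fixation",
    character="Text",
    mode="Visual",
    referent_type="Book",
)

print(record.missing_elements())
```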
DOI Kernel
Contains critical minimum metadata for basic recognition (but not complete disambiguation).
Standard base vocabulary
A DOI-AP entity (e.g. “book”) must be analysable in terms of other attributes (e.g. media, mode, content, subject).
DOI Kernel as the basis of each application profile
[Figure: a DOI AP drawn as a compulsory kernel plus metadata for the application.]
Each DOI-AP can be thought of as built from the kernel + extensions…
...But the kernel is actually what several APs have in common (compare the different views of a person):
Son, Legal person, Agent, Alien, Scholar, Library user, Composer, Credit card holder, Shoe purchaser, Author, Lottery entrant, Hospital patient, Citizen, Car driver, Rights owner, Marathon runner, Software licensee, Parent, Tax payer, Club member, e-consumer, Bank account holder, Husband, Charity giver, Hotel guest, Speeding ticket recipient, Disney World visitor, Frequent Flyer, Concert-goer, Passenger, Employee, Voter, Dog owner
DOI Kernel as the basis of each application profile
This kernel cannot be logically defined from first principles
In the absence of existing Application Profiles from which to define this overlap (the kernel), we have made a reasonable estimate from the logical analysis of <indecs>.
DOI Kernel as the basis of each application profile
[Figure: DOI AP 1, DOI AP 2 and DOI AP 3, each with its own metadata for the AP, overlapping in the kernel for any DOI name. Caption: DOI-APs: all metadata in well-formed structure.]
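The idea that the kernel is the overlap between Application Profiles can be illustrated with plain set operations. The profile names and the extension elements (edition, pageCount, duration, fileFormat) are invented for this sketch; only the kernel element names come from the slides.

```python
# Kernel element names taken from the DOI Kernel slide.
KERNEL = {
    "doiName", "resourceIdentifier", "resourceName", "principalAgent",
    "structuralType", "character", "mode", "referentType",
}

# Hypothetical profiles: each is the kernel plus its own extensions.
profiles = {
    "AP 1": KERNEL | {"edition", "pageCount"},
    "AP 2": KERNEL | {"edition", "duration"},
    "AP 3": KERNEL | {"fileFormat"},
}


def common_elements(profiles):
    """Return the elements shared by every profile (the overlap)."""
    element_sets = list(profiles.values())
    return set.intersection(*element_sets) if element_sets else set()


def extensions(profile_elements, kernel):
    """Return the elements a profile adds on top of the kernel."""
    return profile_elements - kernel


kernel = common_elements(profiles)
print(sorted(kernel))
print(sorted(extensions(profiles["AP 2"], kernel)))
```

Note that "edition" is shared by two of the three hypothetical profiles but not by all of them, so it stays an extension rather than becoming part of the overlap.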
Metadata declarations
WHAT:
• Base kernel metadata must be declared.
• DOI-AP-specific metadata is a matter for the DOI Name User Community (Governance Group/Registration Agency) to decide.
HOW:
• Either local webpage or central repository or both (as decided by User Community rules).
• Automated access to metadata declaration via Handle data types?
• XML schemas.
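One way to picture a metadata declaration expressed in XML is the following sketch, which serialises kernel elements with Python's standard library. The root element name kernelDeclaration and the child element names are assumptions made for this example; the slides do not give the actual schema.

```python
import xml.etree.ElementTree as ET


def declare_kernel(elements):
    """Serialise a mapping of kernel element names to values as XML text."""
    root = ET.Element("kernelDeclaration")  # hypothetical root element name
    for name, value in elements.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# Values from the kernel example; element names are illustrative.
declaration = declare_kernel({
    "doiName": "10.1000/ISBN0141255559",
    "resourceIdentifier": "ISBN 0141255559",
    "resourceName": "Two for the dough",
    "referentType": "Book",
})

# A consuming application can parse the declaration back into elements.
parsed = ET.fromstring(declaration)
print(parsed.findtext("resourceName"))
```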
Roles of declared metadata
= Functional specification of the DOI kernel
(a) to assign a unique DOI name to the creation [DOI]
(b) to link the DOI name to the principal local identifier of a creation (if any) to enable the integration of DOI name-related applications and metadata with others [resourceIdentifier]
(c) to enable a searcher or application to identify the creation by its most common name and the party(ies) responsible for its creation or publication [resourceName, principalAgent, agent Role]
(d) to enable a searcher or application to distinguish the fundamental type of creation (abstract, physical, digital or spatio-temporal), and thereby also to distinguish between creations of different types with the same names and creators. [structuralType]
(e) to enable a searcher or application to distinguish the mode of the creation (visual, audio, etc.) [mode, character]
(f) to enable a searcher or application to determine to which DOI name user/application set the creation belongs [DOI-AP].
Handles and metadata: a possible development
Handle data types could create a way of processing metadata as a “distributed database” of services: e.g.
Data types (and results) must be consistent, so the DOI name Handle data type vocabulary must be developed with great care within indecs-based model. Some data types could be application specific.
[email protected]/[email protected]/[email protected]/[email protected]/[email protected]/[email protected]/[email protected]/123456etc.
Origins of DOI data model
• The underlying model of how data within the DOI System relates to other data
• Two components
– Data Dictionary + DOI Application Profile Framework
• Based on the indecs analysis
– Provides tool for precise description of entity through metadata (and mapping to other schemes).
• Met the needs of DOI System development aim: do not re-invent the wheel
• DOI System and indecs development were in parallel
• DOI Application Profile framework
– Provides means of relating entities: grouping entities and expressing relationships
– A mechanism for grouping DOI names with similar properties
Definitions of metadata
• Popular (everyone): Metadata is data about data.
• Logical (<indecs> framework): An item of metadata is a relationship that someone claims exists between two entities*.
• Functional: Metadata is the life-blood of e-commerce.
*entity = something which has identity
Three <indecs> conclusions
#1: All metadata is just a view
e.g. Views of a “person”: some (generic) ways in which you might be identified in metadata schemes...
Son, Legal person, Agent, Alien, Scholar, Library user, Composer, Credit card holder, Shoe purchaser, Author, Lottery entrant, Hospital patient, Citizen, Car driver, Rights owner, Marathon runner, Software licensee, Parent, Tax payer, Club member, e-consumer, Bank account holder, Husband, Charity giver, Hotel guest, Speeding ticket recipient, Disney World visitor, Frequent Flyer, Concert-goer, Passenger, Employee, Voter, Dog owner
In each of these roles “you” will have different IDs and attributes.
The same is true of creations. An identifier for a published article may refer to...
• A manuscript
• The abstract work
• A draft
• A (class of) physical copy in a publication
• A (class of) digital copy (not in a publication)
• A (class of) digital copy in a publication
• A (class of) digital format
• A specific digital copy
• A (class of) paper copy
• A specific paper copy
• An edition
• A reprint
• A translation
• etc… and many combinations of the above
Similar views apply to other types of creations.
Views must not be confused for digital content and rights management. Mistaken identity can be catastrophic.
Increasingly, views need to be interoperable
• e.g. production workflow, rights, marketing within one business; supply chain transfer; etc.
The need for automated, interoperable views in digital commerce will be enormous.
#2: (Almost) all terms need identifiers
Each of the values of a view must be defined and identified if other views are to recognize them (what do you mean by an abstract work? an edition? a format? a scholar? a name?)
So views need comprehensive controlled vocabularies (note our reliance on ISO language, territory, currency, time codes).
Automation needs disambiguation.
Terms of rights must be unambiguous. Anything may be a term of an agreement.
Emergence of the value of structured ontologies for commerce (like the indecs model).
#3: Events are the key to interoperability
Most metadata is “thing” or “people” based.
• static views e.g. “a creation”
In the net future, metadata interoperability will be achieved by describing “events”; relating things and people
• dynamic views e.g. “A created B”
Event descriptions will also be the key to rights metadata (transactions are events)
Meaning
• Assigning metadata to a referent, to enable semantic interoperability
– “say what the referent is”
– Resolution of an identifier may give the referent; or only metadata; or a “manifestation”
• Semantic:
– Do two identifiers from different schemes actually denote the same referent?
– If A says “owner” and B says “owner”, are they referring to the same thing?
– If A says “released” and B says “disseminated”, do they mean different things?
• Interoperability: the ability for identifiers to be used in services outside the direct control of the issuing assigner
– Identifiers assigned in one context may be encountered, and may be re-used, in another place or time - without consulting the assigner. You can’t assume that your assumptions made on assignment will be known to someone else.
• Persistence = interoperability with the future
A pointer is not enough
• Precisely what is being named?
– Suppose I have here a PDF version of Defoe’s “Robinson Crusoe” issued by Norton. I find an identifier – is it of:
– All works by Daniel Defoe?
– The work “Robinson Crusoe”?
– The Norton edition of “Robinson Crusoe”?
– The PDF version of the Norton edition of…. ?
– The PDF version of…held on this server…?
• Most digital objects of interest have compound form, simultaneously embodying several referents.
• Multiple identifiers may be necessary (compare music CDs)
[Figure: metadata schemes (e.g. ONIX, LOM) related either by an agreed term-by-term mapping (“crosswalk”) or through a central dictionary. Example: the term “Author” in ONIX and the term “Writer” in the metadata scheme NormanRights are linked through the dictionary, so that ONIX:Author = NormanRights:Writer.]
Metadata interoperability: semantic problems
Mappings are not simple:
• Different names (and languages) for the same thing (Author vs Writer)
• Same name for different things (title, Title)
• Data elements at different levels of speciality (title vs FullTitle, AlternativeTitle).
• Different allowed values for elements (“pii” vs “not pii”)
• Data at different levels of granularity (journal_article vs SerialArticleWork/SerialArticleVersion).
• Data in different structures (article as attribute of journal or vice versa).
• Data from different sources (local codes vs ONIX codes).
• Different contextual meaning (DOI name of what…?)
• Different representation (1 title vs n titles).
• Different mandatory requirements (ISSN mandatory vs optional)
• Schemas are being updated all the time... etc.
To manage all of this requires a coherent structured approach.
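The central-dictionary approach can be sketched as a hub lookup: each scheme maps its own terms to a shared concept, and translation between schemes goes through that concept. The concept identifier "dd:0001" and the function name are hypothetical; only the ONIX:Author = NormanRights:Writer pairing comes from the slides.

```python
# Each (scheme, term) pair points at one concept in the central dictionary.
# The concept identifier "dd:0001" is invented for this example.
DICTIONARY = {
    ("ONIX", "Author"): "dd:0001",
    ("NormanRights", "Writer"): "dd:0001",
}


def translate(term, source_scheme, target_scheme, dictionary=DICTIONARY):
    """Map a term from one scheme to another via the shared concept."""
    concept = dictionary.get((source_scheme, term))
    if concept is None:
        return None  # the source term is not in the dictionary
    for (scheme, candidate), mapped in dictionary.items():
        if scheme == target_scheme and mapped == concept:
            return candidate
    return None  # the target scheme has no term for this concept


print(translate("Author", "ONIX", "NormanRights"))
print(translate("Writer", "NormanRights", "ONIX"))
```

With a hub of this kind each scheme maintains one mapping to the dictionary, instead of a separate crosswalk to every other scheme.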
DRM
[Figure: three layers.]
• Application layer: Technology Platform (DRM systems, “Semantic Web”)
• Communication layer: Rights Expression Language (XrML, XCML, ODRL, etc.)
• Semantic layer: Rights metadata, Data Dictionary
Dictionary = a common base semantic layer
Semantic = “meaning”
• Does A “mean the same as” B?
– = in practice, does A need a different identifier from B?
– versions; works and manifestations; editions
• For a machine, “A means same as B” = “A has same attributes as B”
– Which attributes? The answer is entirely contextual:
– Do A and B belong to the same class for the purposes of …
– The class is defined by a set of attributes (metadata) (RDF, etc)
• We group similar things together; what is identified is usually a class
– e.g. the class of all copies of the hardback printed second edition of this book from this publisher = the same ISBN
• Ultimately, no one thing is the same as another thing (or they wouldn’t be two things)
– “Roughly speaking, to say of two things that they are identical is nonsense, and to say of one thing that it is identical with itself is to say nothing at all”.
– Leibniz’s Law (no two objects have exactly the same properties)
– A class contains similar things
• Automation = logic
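The point that sameness depends on which attributes matter in a given context can be sketched as follows. The attribute names, the two sample records and the choice of attributes for each purpose are assumptions made for illustration.

```python
def same_for_purpose(a, b, attributes):
    """Treat a and b as the same if they agree on every listed attribute."""
    return all(
        name in a and name in b and a[name] == b[name]
        for name in attributes
    )


# Two physical copies of the same edition (hypothetical records).
copy_1 = {"work": "Robinson Crusoe", "publisher": "Norton",
          "binding": "hardback", "copy_id": "shelf-17"}
copy_2 = {"work": "Robinson Crusoe", "publisher": "Norton",
          "binding": "hardback", "copy_id": "shelf-42"}

# For trade purposes the class is defined by edition-level attributes.
EDITION_CLASS = ("work", "publisher", "binding")
# For a library tracking individual items, the copy itself matters.
ITEM_CLASS = ("copy_id",)

print(same_for_purpose(copy_1, copy_2, EDITION_CLASS))
print(same_for_purpose(copy_1, copy_2, ITEM_CLASS))
```

The same two records are "the same" in one context and different in another; the attribute set passed in plays the role of the class definition.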
2001: Ontologies and Semantic Web
“Ontologies
Of course, this is not the end of the story, because two databases may use different identifiers for what is in fact the same concept, such as zip code. A program that wants to compare or combine information across the two databases has to know that these two terms are being used to mean the same thing. Ideally, the program must have a way to discover such common meanings for whatever databases it encounters.
A solution to this problem is provided by the third basic component of the Semantic Web, collections of information called ontologies.”
Ontology approach: deeper view of metadata
• The key to defining what is identified logically
– enabling people to use their existing metadata
– Ontologies can deliver data dictionaries suitable for mapping
• Fundamental, generic, extensible methods can be used to construct interoperable ontologies – by putting metadata into context:
[Figure: entities with attributes, linked to other entities by relationships; a context that brings together agent, resource, time and place.]
The <indecs> framework
Interoperability of Data in E-Commerce Systems
• <indecs> project 1998-2000
• <indecs2> 2001-2002 (= MPEG21 Rights Data Dictionary)
• Focus on multimedia rights metadata: recognized that rights and descriptive metadata were inseparable. Produced an event-based reference model/framework (parties, resources, agreements)
• indecs: 50% EC funding + consortium members including:
– EDItEUR (international book industry standards/ONIX)
– IFPI (international record industry)
– MPAA (international film industry)
– Various copyright societies and associations
– Various technology providers
– Library and author representatives
– International DOI Foundation
• Metadata in networks needs to support interoperability across
– media (e.g. books, serials, audiovisual, software, abstract works)
– functions (e.g. cataloguing, discovery, workflow, rights mgmt)
– levels of metadata (from simple to complex)
– semantic barriers
– linguistic barriers
Principles:
• Unique Identification: every entity should be uniquely identified within an identified namespace.
• Functional Granularity: it should be possible to identify an entity whenever it needs to be distinguished [1st class].
• Designated Authority: the author of an item of metadata should be securely identified.
• Appropriate Access: everyone requires access to the metadata on which they depend, and privacy and confidentiality for their own metadata from those who are not dependent on it.
• Definition of metadata: An item of metadata is a relationship that someone claims to exist between two referents (description).
Delivered:
• Generic data model of e-commerce
• Applicable to all types of intellectual property
• Specifications for supporting services
• Standardisation proposals
• Documentation at www.indecs.org
Led to:
• Contextual ontology architecture: contexts, roles, identities
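The definition of metadata as a claimed relationship between two referents, together with the Designated Authority principle, suggests a record shape like the one below. This is a sketch only: the class, the relator names and the party identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataItem:
    """A relationship that someone claims exists between two referents."""
    referent: str    # identifier of the entity being described
    relator: str     # the kind of relationship claimed
    related: str     # identifier of the other referent
    claimed_by: str  # who makes the claim (Designated Authority)


def claims_about(referent, items):
    """Return every claim made about one referent, whoever made it."""
    return [item for item in items if item.referent == referent]


# Hypothetical claims from two different parties about the same creation.
items = [
    MetadataItem("10.1000/ISBN0141255559", "hasAuthor", "party:evanovich", "party:publisher-a"),
    MetadataItem("10.1000/ISBN0141255559", "hasReferentType", "Book", "party:agency-b"),
    MetadataItem("10.1000/other", "hasReferentType", "Book", "party:agency-b"),
]

for item in claims_about("10.1000/ISBN0141255559", items):
    print(item.relator, item.related, "claimed by", item.claimed_by)
```

Keeping the claimant on every item means that two parties can make different claims about the same referent without either being overwritten.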
Contextual ontology metamodel overview
[Figure 1: COA MetaModel Overview. Entity types shown: Context, Agent, Resource, Time, Place. Contextual Relationships connect entities through Roles. Non Contextual Relationships (illustrative: any Type of Entity may relate to any other) are made through a Relator; every Relationship has a Relator. Other labels: Verb. Attributes (illustrative: any Entity or Attribute may have Attributes of any type): Descriptor, Name, Identifier, Annotation, Category, Flag, Quantity.]
EntityTypes: An Entity may have typed relationships with Entities of any kind (including those of its own kind).
AttributeTypes: An Entity may have Attributes of any kind. (Attributes, which are a type of Resource, may have their own Attributes.)
1995-2004: Defining what is identified
• Many individual metadata schemes for specific sectors, applications, etc.; vary from simple to complex data models
• 1995+: Dublin Core: need for standardisation on WWW
– 15 (+) elements for “output” for simple resource description
– Now ISO 15836
• Ontology-based activities:
– 1995+: Common Information System “CIS” (CISAC) – rights, music
– 1998: Functional Requirements for Bibliographic Records, “FRBR” (IFLA) – library cataloguing
– 1998-2000: Interoperability of Data in E-Commerce Systems, “indecs” (multiple partners) – generic intellectual property
  • For “e-commerce” read “automation”
  • Influenced by CIS and FRBR
– 2000: ABC/Harmony – generic events-aware model
– Should enable re-use of existing metadata
e.g. Terms of a Licence as a group of Events
[Figure: a LicensingEvent linked to the events it governs.]
• Permits (MAY): UseEvent, 1-n
• Prohibits (MUST NOT): UseEvent, 0-n
• Requires (MUST): Payment, ReportingEvent, etc., 0-n
• Has Exception
• Has Precondition
Event = time, place, entities
This structure allows for whatever level of flexibility or granularity may be required now or in the future.
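A licence modelled as a group of events, as outlined above, might be sketched like this. The class and field names are invented for the example, the sample values are hypothetical, and the reading of "1-n" as "at least one permitted use event" is an interpretation of the figure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """Event = time, place, entities (plus a label for the kind of event)."""
    kind: str
    time: str
    place: str
    entities: tuple


@dataclass
class LicensingEvent:
    """Terms of a licence expressed as related events."""
    permits: list                                   # MAY, 1-n
    prohibits: list = field(default_factory=list)   # MUST NOT, 0-n
    requires: list = field(default_factory=list)    # MUST, 0-n

    def __post_init__(self):
        # Assumed reading of "1-n": a licence permits at least one use.
        if not self.permits:
            raise ValueError("a licence must permit at least one use event")


# Hypothetical terms, for illustration only.
licence = LicensingEvent(
    permits=[Event("UseEvent:print", "2008", "GB", ("party:reader", "10.1000/123456"))],
    prohibits=[Event("UseEvent:resell", "2008", "GB", ("party:reader", "10.1000/123456"))],
    requires=[
        Event("Payment", "2008", "GB", ("party:reader", "party:owner")),
        Event("ReportingEvent", "2008", "GB", ("party:reader", "party:owner")),
    ],
)

print(len(licence.permits), len(licence.prohibits), len(licence.requires))
```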
Contextual Ontology usage examples
• ISO MPEG-21 Rights Data Dictionary (http://iso21000-6.net/)
• DDEX Digital Data EXchange - music industry (http://ddex.net/)
• ONIX: Book industry (+) messaging schemas (www.editeur.org )
• ONIX: Rights: ONIX for Licensing Terms, Repertoire and Distribution
• Digital Library Federation - communication of licence terms (ERMI: working with ONIX for licensing terms)
• DOI Data Dictionary (http://www.doi.org )
• Rightscom’s OntologyX - licensee of early output, plus their own later work (www.rightscom.com )
• RDA (Resource Description and Access); next generation of AACR/MARC cataloguing – RDA/ONIX common framework
• ACAP: Automated Content Access Protocol (http://www.the-acap.org/ )
• Consistent with FRBR, ABC-Harmony, OWL, CIDOC CRM, etc
Development of indecs 2000-2004
[Figure: development timeline (black = what, red = who). indecs (2000), indecs Framework Ltd; CONTECS (2001+), with IFPI/RIAA, MPA, IDF, Dentsu, MMG, Rightscom; by 2004: ISO MPEG21 RDD; indecsDD (IDF + ONIX); OntologyX (Rightscom; Mi3p etc).]
DOI names to express relationships
• DOI name of one item may be related to DOI name of another
– Through multiple resolution, metadata, Application Profiles…
• Example: A DOI name of a work could resolve to several available formats, languages, etc.
[Figure: an article with DOI name 12345 linked to its Chinese version with DOI name 56789.]
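A toy version of this example is sketched below: one DOI name with several resolution entries, filtered by what the requester wants. It uses a plain dictionary and does not call the Handle System; the entry fields, the example.org URLs and the "doi:" target notation are all assumptions for illustration.

```python
# Hypothetical resolution entries for the article with DOI name 12345.
RESOLUTION_TABLE = {
    "12345": [
        {"language": "en", "format": "pdf", "target": "https://example.org/article.pdf"},
        {"language": "en", "format": "html", "target": "https://example.org/article.html"},
        # The Chinese version is itself identified by another DOI name.
        {"language": "zh", "format": "pdf", "target": "doi:56789"},
    ],
}


def resolve(doi_name, **wanted):
    """Return the entries for a DOI name that match every requested property."""
    entries = RESOLUTION_TABLE.get(doi_name, [])
    return [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in wanted.items())
    ]


print(len(resolve("12345")))
print(resolve("12345", language="zh")[0]["target"])
```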
Rights: an example of DOI System potential
• DRM: Technical Protection Measures which use RMI
• But: simple management WITHOUT technical protection also needs RMI
• What is being managed for any rights purpose has to be identified
• We need to accommodate existing and new identifier schemes
• A consistent approach to all kinds of inter-related entities is necessary:
[Figure: People make Stuff; People use Stuff; People do Deals; Deals are about Stuff. People = “identity management”; Stuff = “content management”; Deals = “license management”.]
Describing rights using data
Primary rights events (claims, deals) are described using pieces of data from all these domains:
Rights Statement (“claim”):
[party] owns [right] in [creation] in [time] and [place]
Rights Agreement (“deal”):
[party] agreed with [party] in [time] and [place] that [event]
Pieces of "rights metadata" used in each rights statement are things which need to be identified.
Creations typically have standard identifiers, which may have associated structured data, or which may act as keys to get this data.
Other pieces of data also need standard identifiers (time, party, ...).
Secondary rights events (licences) are also described using pieces of data:
Permission: [party] can [verb] [amount] to [creation] at [time] in [place].
Prohibition: [party] can’t [verb] to [creation] at [time] in [place].
Requirement: [party] must [verb] [amount] to [creation/party] at [time] in [place].
Rights Transfer: [party] can [grant right] to [party] in [creation] at [time] in [place].
The bracketed items are the pieces of "rights metadata" used in each rights declaration.
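The Permission template can be treated as a record whose slots are each filled by an identified piece of data. The sketch below renders the template literally, so the output is not meant to read as natural English; the class name and all sample values are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Permission:
    """Permission: [party] can [verb] [amount] to [creation] at [time] in [place]."""
    party: str
    verb: str
    amount: str
    creation: str
    time: str
    place: str

    def statement(self):
        """Fill the slide's template with this permission's pieces of data."""
        return (f"{self.party} can {self.verb} {self.amount} "
                f"to {self.creation} at {self.time} in {self.place}.")

    def unidentified(self):
        """List the slots that have no value, i.e. nothing to identify."""
        return [name for name, value in asdict(self).items() if not value]


# Hypothetical identifiers and codes, for illustration only.
permission = Permission(
    party="party:A",
    verb="copy",
    amount="1 copy",
    creation="10.1000/ISBN0141255559",
    time="2008",
    place="GB",
)

print(permission.statement())
print(permission.unidentified())
```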
What are these pieces of "rights metadata"?
A mix of data from many sources:
1 Rights “events”: Statements, agreements, transfers, permissions, prohibitions, requirements, assertions, approvals…
2 Descriptive metadata: Creations, creation types, contributor roles, user roles, tools, classifications, measures …
3 Legal terms: Rights, persons, companies, intellectual property, jurisdictions …
4 Financial metadata: Terms, currencies, conventions…
These sets of “rights metadata” are standardized and maintained in different places.
Distributed rights management
This mix of data from many sources is used in many different places by different people in chains of rights events:
[Figure: a chain of rights events: agreement, transfer, statement, agreement, permission, prohibition, permission, assertion, agreement, requirement, etc.]
[party] can [verb] [amount] to [creation] at [time] in [place].
Compound entity can be expanded to reveal more data
Each of these rights events is an information object:
• which needs to be identified (and may be a compound object);
• which may need to link to or use information objects in other databases;
• which should be interoperable
Summary
• DOI Data Model and interoperability
• Application profiles
• Kernel metadata
• Metadata declaration
• Role of DOI name metadata
• Origins of the DOI Data Model
• Semantic interoperability
• The indecs principles
• Applications of indecs
• The use of a data dictionary
• Example: rights management