Page 1

Labelling & Classification using emerging protocols

"wheels you don't have to reinvent & bandwagons you can jump on"

Stephen McGibbon
Lotus Development

Page 2

Assumptions

- The business rationale and benefits of classification and labelling are understood and accepted
- There is a desire to build a prototype quickly
- Interoperability is the wicked issue

Page 3

Labelling and Classification are about Metadata (data about data)

- Generally accepted that we are data rich but information poor.
- Increasing acceptance that as we become information richer we aren't necessarily doing anything about increasing knowledge
- One of the reasons for both of these is that we are metadata poor
- Massive effort to address this (as we shall see)

Page 4

Basic Proposition

- Information Security Labeling and Classification
- InfoSec Metadata
- They're one and the same thing
- => there are a whole load of metadata bandwagons you can jump on

Page 5

Consumer bandwagons are the best!

Consumer issues
- Parents don't want their kids corrupted by the net
- Adults don't want their privacy compromised by e-commerce

Corporate issues
- Corporates want to control who sees what information and when
- Corporates want to retain confidentiality

The biggest difference is the nomenclature

Page 6

Metadata about what?

- Metadata is machine understandable information about anything accessible/retrievable via a URI
  - this includes databases
- Metadata may describe the information source rather than just the output (e.g. the SQL script rather than just the report file)
  - or elements inside the database
  - or attributes of elements in the database
- Much effort to define tag sets to allow for standard classification etc.

Page 7

MetaData Interoperability

requires conventions about
- semantics: the meaning of elements
- structure: human readable
- structure: machine readable
- syntax: grammar to convey semantics and structure

resource description communities e.g. DC

HTML and RDF (XML)

Page 8

Dublin Core Tag Set

- Dublin Core is a set of fifteen elements identified by international, interdisciplinary consensus as being 'core' to the process of describing diverse objects in such a way that they may be effectively discovered and evaluated.
- The Dublin Core is not a replacement for existing detailed metadata structures such as the library world's MARC or the geospatial community's Content Standards for Digital Geospatial Metadata, but can rather be seen as a means of describing the essence - or 'core' - of both library books and maps, and many other types of digital and non-digital resource.
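As a rough illustration of what a fixed "core" tag set buys you, the sketch below checks a record against the fifteen element names. The short names are the ones used in later versions of the Dublin Core Metadata Element Set; the record, its values and the extra "Classification" key are invented for this example.

```python
# The fifteen Dublin Core elements, by their short names.
DUBLIN_CORE_ELEMENTS = frozenset({
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
})


def unknown_elements(record):
    """Return the keys of `record` that are not Dublin Core element names, sorted."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


# Hypothetical record; "Classification" is deliberately not a Dublin Core element.
record = {
    "Title": "Labelling & Classification using emerging protocols",
    "Creator": "Stephen McGibbon",
    "Publisher": "Lotus Development",
    "Classification": "Unclassified",
}

print(unknown_elements(record))
```

Anything the core set does not cover, such as a security classification, has to come from another vocabulary.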

Page 9

XML

- eXtensible Markup Language
- a metalanguage that lets you design your own markup language
- XML is an abbreviated version of SGML - think of XML as being SGML-- rather than HTML++.
- The v1.0 specification was accepted by the W3C as a Recommendation on Feb 10, 1998.

Strong industry backing
- document interoperability - SmartSuite/eSuite/Office

Page 10

How does XML handle metadata?

- XML lets you define your own markup language
- There are no predefined elements in XML, because it is an architecture, not an application, so it is not part of XML's job to specify how or if authors should or should not implement metadata.
- However you can make full use of the extended hypertext features of XML to store or link to metadata in any format (e.g. Dublin Core, Warwick Framework, Resource Description Framework (RDF), and Platform for Internet Content Selection (PICS)).
- HTML/RDF/XML rapidly emerging as the dream ticket
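To make "define your own markup language" concrete, here is a minimal sketch that reads metadata from a home-grown XML document using Python's standard xml.etree.ElementTree parser. The memo, metadata, classification and owner elements are invented for this example; XML itself prescribes none of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical home-grown markup: every element name here is an assumption.
DOCUMENT = """\
<memo>
  <metadata>
    <classification>Company Confidential</classification>
    <owner>Finance</owner>
  </metadata>
  <body>Quarterly figures attached.</body>
</memo>
"""


def read_metadata(xml_text):
    """Return the children of <metadata> as a dict of tag -> text."""
    root = ET.fromstring(xml_text)
    meta = root.find("metadata")
    if meta is None:
        return {}
    return {child.tag: (child.text or "").strip() for child in meta}


print(read_metadata(DOCUMENT))
```

Only parties who already agree on these element names can interpret the label, which is the interoperability gap that shared vocabularies are meant to close.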

Page 11

RDF - resource description framework
a framework for metadata

RDF will allow different application communities to define the metadata property set that best serves the needs of each community. RDF metadata can be used in a variety of application areas such as:

- in resource discovery to provide better search engine capabilities;
- in cataloging for describing the content and content relationships available at a particular Web site, page, or digital library;
- by intelligent software agents to facilitate knowledge sharing and exchange;
- in content rating for child protection and privacy protection;
- in describing collections of pages that represent a single logical "document";
- for describing intellectual property rights of Web pages.

Page 12

RDF - resource description framework
a framework for metadata (continued)

- RDF will use XML as the transfer syntax in order to leverage other tools and code bases being built around XML.
- With digital signatures, RDF will be key to building the "Web of Trust" for electronic commerce, collaboration, and other applications.

(source w3c)

Page 13

RDF will provide (source w3c)

- interoperability of metadata
- machine understandable semantics for metadata
- a uniform query capability for resource discovery
- better precision in resource discovery than full text search
- a processing rules language for automated decision-making about Web resources
- language for retrieving metadata from third parties
- future-proofing applications as schemas evolve

Page 14

RDF

Built on the following rules
- A Resource - anything that can have a URI (Uniform Resource Identifier)
- A PropertyType - a resource that has a name and can be used as a property, e.g. Author or Title
- A Property is the combination of a resource, a property type and a value

<RDF:Description href='http://www.textuality.com/rdf/why-rdf.html'>
  <Author>Tim Bray</Author>
  <Home-Page RDF:href='http://www.textuality.com'/>
</RDF:Description>
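The description above can be flattened into (resource, property type, value) records. The sketch below does so with Python's standard XML parser. The slide's snippet does not declare the RDF prefix, so a placeholder namespace URI is added here; that URI is an assumption, not part of the original example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://example.org/rdf-draft#"  # placeholder namespace URI (assumption)

SNIPPET = f"""\
<RDF:Description xmlns:RDF="{RDF_NS}"
    href="http://www.textuality.com/rdf/why-rdf.html">
  <Author>Tim Bray</Author>
  <Home-Page RDF:href="http://www.textuality.com"/>
</RDF:Description>
"""


def to_triples(xml_text):
    """Flatten one Description element into (resource, property_type, value) tuples."""
    description = ET.fromstring(xml_text)
    resource = description.get("href")
    triples = []
    for child in description:
        # A value is either another resource (RDF:href) or the element's literal text.
        value = child.get(f"{{{RDF_NS}}}href") or (child.text or "").strip()
        triples.append((resource, child.tag, value))
    return triples


for triple in to_triples(SNIPPET):
    print(triple)
```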

Page 15

RDF 101 characteristics

Independence
- PropertyType is a resource therefore any independent organisation/person can invent them.

Interchange
- RDF properties can be converted into XML

Scalability
- RDF properties are simply 3 part records

Page 16

RDF 101 characteristics (continued)

Property Types are Resources
- PropertyTypes can have their own Properties and can be found and manipulated like any other resource

Values can be resources
- e.g. Home-Page

Properties can be resources
- So they can have properties too
- Avoids meta-meta-data, allows us to ask who said the home-page is whatever etc and when
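A minimal sketch of "properties can be resources": a property is modelled as a three-part record whose resource slot may itself hold another property, so "who said it, and when" is just more properties. The Asserted-By and Asserted-On property types and the date are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """One RDF property: a resource, a property type and a value."""
    resource: object  # a URI string, or another Property
    property_type: str
    value: str


home_page = Property(
    "http://www.textuality.com/rdf/why-rdf.html",
    "Home-Page",
    "http://www.textuality.com",
)

# Statements about the statement itself (hypothetical property types and values).
provenance = [
    Property(home_page, "Asserted-By", "Tim Bray"),
    Property(home_page, "Asserted-On", "1998-06-01"),
]


def asserted_by(statement, statements):
    """Return everyone recorded as having asserted `statement`."""
    return [
        p.value
        for p in statements
        if p.resource == statement and p.property_type == "Asserted-By"
    ]


print(asserted_by(home_page, provenance))
```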

Page 17

Vocabularies

- RDF provides a model for metadata, and a syntax so that independent parties can exchange and use it.
- What it doesn't provide is any PropertyTypes of its own.
- Vocabularies are packages of property types
  - e.g. the CDF Submission predates RDF and therefore encodes its vocabulary directly in XML. CDF is the sort of application that is suited to RDF; RDF provides a unifying framework to which a future revision of CDF could evolve.
- TOG could/should work to define vocabulary(ies)

Page 18

RDF Schemas

- used to declare vocabularies defined by a particular community
- define the valid properties in a given RDF description, as well as any characteristics or restrictions of the property-type values themselves
- are identified by the XML namespace mechanism
- exact details currently being discussed in W3C RDF Schema working group
- TOG could/should define the InfoSec Schema(s)
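What a schema check might look like, very roughly: the sketch below declares which property types are valid and which values each accepts, then validates a description against that. This is not RDF Schema syntax (whose details the slide notes were still under discussion); the InfoSec property types and allowed values are invented for illustration.

```python
# Hypothetical InfoSec schema: property types and allowed values are assumptions.
INFOSEC_SCHEMA = {
    "Classification": {"Public", "Internal", "Confidential"},
    "Owner": None,  # None means any value is accepted
}


def validate(description, schema=INFOSEC_SCHEMA):
    """Return a list of problems found in a {property_type: value} description."""
    problems = []
    for property_type, value in description.items():
        if property_type not in schema:
            problems.append(f"unknown property type: {property_type}")
        elif schema[property_type] is not None and value not in schema[property_type]:
            problems.append(f"invalid value for {property_type}: {value}")
    return problems


print(validate({"Classification": "Internal", "Owner": "Finance"}))
print(validate({"Classification": "Secret", "Colour": "red"}))
```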

Page 19

Convergence is starting

Dublin Core and Web Metadata Standards Converge in Helsinki

DUBLIN, Ohio, Nov. 7, 1997--The National Library of Finland and OCLC cosponsored the fifth metadata workshop Oct. 6-8 in Helsinki, Finland, with support from the National Science Foundation and the Coalition for Networked Information.

Seventy-five experts from libraries, the networking research community, the digital library research community and content providers continued work begun in 1995 to reach consensus on conventions for describing resources on the Internet.

Page 20

PICS

- Platform for Internet Content Selection
- Began W3C efforts in metadata in 1995
- Basically a mechanism for communicating ratings of web pages from servers to clients.
  - Labels contain information about contents of page
- PICS-NG (next generation) to address more general problem of associating descriptive information with internet resources based on PICS architecture ... became RDF working group
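The client side of a rating mechanism can be sketched as a comparison between a page's label and the viewer's limits. This is not the PICS label syntax; the category names and the numeric scale are hypothetical, and blocking unknown categories is an assumption of this sketch.

```python
def allowed(label, limits):
    """True if every rated category in `label` is within the viewer's limit.

    Categories with no configured limit are treated as blocked, a conservative
    choice made for this sketch.
    """
    return all(
        category in limits and rating <= limits[category]
        for category, rating in label.items()
    )


# Hypothetical label sent by a server, and limits configured on the client.
label = {"violence": 1, "language": 0}
limits = {"violence": 2, "language": 1}

print(allowed(label, limits))
```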

Page 21

Platform for Privacy Preferences - P3P

- The goal of P3P is to enable Web sites to express their privacy practices and enable users to exercise preferences over those practices.
- P3P products will allow users to be informed of site practices (in both machine and human readable formats), to delegate decisions to their computer when appropriate, and allow users to tailor their relationship to specific sites.
- Sites with practices that fall within the range of a user's preference could, at the option of the user, be accessed "seamlessly."

Page 22

P3P (continued)

- Otherwise users will be notified of a site's practices and have the opportunity to agree to those terms or other terms and continue browsing if they wish.
- A result of such an "agreement" is that data from the user may be transferred to the site with the consent of the user.
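The decision described on these two P3P slides amounts to a set comparison: if every practice a site declares falls within the user's preferences, access is seamless; otherwise the user is asked about the rest. A minimal sketch, with invented practice names and no attempt to follow actual P3P syntax:

```python
def negotiate(site_practices, user_preferences):
    """Return ("seamless", set()) when every declared practice is acceptable,
    otherwise ("prompt", practices_needing_agreement)."""
    outside = set(site_practices) - set(user_preferences)
    if not outside:
        return ("seamless", set())
    return ("prompt", outside)


# Hypothetical practice names.
preferences = {"session-cookies", "anonymous-statistics"}

print(negotiate({"session-cookies"}, preferences))
print(negotiate({"session-cookies", "share-email-with-partners"}, preferences))
```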

Page 23

P3P

- RDF/XML are the meta-data and encoding specifications that will be used for exchanging information in P3P.
- The W3C PICS effort allowed one to make simple statements about Web resources. RDF will provide a more generalized and sophisticated framework for discussing privacy practices and preferences.

Page 24

Summary

- We are really just talking about "Security Metadata"
- RDF/XML provides the means
- Need a Vocabulary of InfoSec PropertyTypes
  - will allow Policies that can use this metadata
  - will allow applications that apply/respect these policies
  - will allow OS that enforce these policies
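Once an InfoSec vocabulary exists, a policy that uses the metadata can be very small. The sketch below assumes a hypothetical "Classification" property type with three ordered levels, and treats unlabelled documents as the most restricted; both are assumptions of this example, not something the talk specifies.

```python
# Hypothetical ordered classification levels, lowest first.
LEVELS = ["Public", "Internal", "Confidential"]


def may_read(clearance, metadata):
    """Allow access when the reader's clearance is at least the document's label.

    Unlabelled documents are treated as the highest level (an assumption).
    """
    classification = metadata.get("Classification", LEVELS[-1])
    return LEVELS.index(clearance) >= LEVELS.index(classification)


print(may_read("Internal", {"Classification": "Public"}))
print(may_read("Internal", {"Classification": "Confidential"}))
```

An application or operating system that respects the policy would call a check like this before releasing the resource.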

Page 25

Business Drivers

- needs to be more than just confidentiality and access control
- Knowledge Management
  - e.g. usage, searching, clustering
- Document Management
  - e.g. validity, copyright, dependencies etc.

Page 27

Q&A

- www.ucc.ie/xml - XML FAQ
- www.w3.org/metadata - W3C Metadata activity
- www.w3.org/RDF/ - W3C RDF Homepage
- www.w3.org/TR/NOTE-rdf-simple-intro - Introduction to RDF Metadata
- www.alphaworks.ibm.com/formula/rdfxml - IBM's RDF for XML provides a Java implementation of RDF
- purl.org/metadata/dublin_core - The Dublin Core Metadata Element Set Home Page
