an open localisation interface to cms using oasis content management interoperability services

32
An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services Aonghus Ó hAirt, Dominic Jones, Leroy Finn and David Lewis Centre for Next Generation Localisation, Trinity College Dublin

Upload: vince

Post on 06-Jan-2016

35 views

Category:

Documents


0 download

DESCRIPTION

An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services. Aonghus Ó hAirt, Dominic Jones, Leroy Finn and David Lewis Centre for Next Generation Localisation, Trinity College Dublin. Challenges for Interoperability. More iterative workflows - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Aonghus Ó hAirt, Dominic Jones, Leroy Finn and David Lewis

Centre for Next Generation Localisation, Trinity College Dublin

Page 2: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Challenges for Interoperability

More iterative workflows From push-based hand-offs

To change notification & fine grained retrievals

Extending data management to support innovation• Statistical MT, named entity recognition, text analytics for QA

and terminology management • All require up-to-date relevant training corpora

Solutions must sit comfortably with technology of:Content Management, Web Publishing & Localisation

No one Standard can address it all

Integrating ITS and CMIS for L10n& a little on XLIFF, RDF and Open Provenance

Page 3: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

CMS-TMS Interoperability

Interoperability roadblock

High variety:Content Management Systems

Content formats

Increasingly dynamic

Add language resource curation

Data driven MT, text analytics

Create Content

PublishContent

Client

LSPPrepare Content

translators

QA

Translate Content

CMS

CMS/CVS

editors

Terminology tools

TMSCAT

QA tools

Web CMS

design

Page 4: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Internationalisation Tag Set (ITS)

Allows I18n and L10n tools to be instructed to treat specific text in specific ways

Principles:Minimise disturbance of original content

Don’t reinvent wheel

Link to existing meta-data before adding new

Defined distinct, independent Data Categories

Identify relevant text using:Attributes to existing elements: LOCAL selection

Xpath selectors in a special element: GLOBAL selection

Page 5: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

C. Lieske, F. Sasaki 2010

Page 6: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

ITS 1.0 Data Categories

Translate: Mark whether the content of an element or attribute should be translated or not

Localization Note:Communicate notes to localizers about a particular item of content

Terminology Mark terms and optionally associate them with information, such as definitions

DirectionalitySpecify the base writing direction of blocks, embeddings and overrides for the Unicode bidirectional algorithm

RubyProvide a short annotation of an associated base text, particularly useful for East Asian languages

Language InformationExpress the language of a given piece of content

Element within TextIdentify how an element behaves relative to its surrounding text, eg. for text segmentation purposes

Page 7: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

ITS 2.0 Draft Data Categories

Page 8: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

ITS and Content Management

Global ITS rules can be defined in an external file

Attribute applied to a node with following precedence:LOCAL attributes

Embedded GLOBAL rules in reverse order

External GLOBAL rules in reverse order

ITS allows tool-specific mechanisms for associating global rules with content – precedence not specified

Common practice to apply a given set of rules to all documents in a project with the same schema

Can this scale to multiple overlapping schema?

Can we use some CMS-level meta-data interoperability solution?

Page 9: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

CMS Interoperability

Integrating with CMS requires the use of an API. Until now, most CMS used proprietary APIs

Proprietary interfaces to CMS lead to limited support, vendor lock-in and poor interoperability between CMS and with localisation tools

Content Management Interoperability Service (CMIS) from OASIS offers a standardised API for interacting with CMS

Localisation is out of scope for CMIS

How can CMIS facilitate the localisation of content across multiple CMS?

Page 10: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

OASIS Content Management Interoperability Services (CMIS)

“defines a domain model and Web Services and Restful AtomPub bindings that can be used by applications to work with one or more Content Management repositories/systems.” (CMIS standard)

Published in 2010

Participation from Adobe, Alfresco, EMC, IBM, Microsoft, Oracle, SAP, and others.

Page 11: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

CMIS Implementations

Alfresco 3.3+Apache Chemistry InMemory Server Athento COIDay Software CRX EMC DocumentumeXo Platform with xCMISFabasoftHP Autonomy Interwoven WorksiteIBM Content ManagerIBM FileNet Content ManagerIBM Content Manager On Demand IBM Connections FilesIBM LotusLive FilesIBM Lotus Quickr ListsISIS Papyrus Objects

KnowledgeTree 3.7+Maarch 1.3Magnolia (CMS) 4.5Microsoft SharePoint Server 2010NCMISNemakiWareNuxeo Platform 5.5O3spaces 3.2+OpenIMSOpenWGA 5.2+PTC WindchillSAP NetWeaver Cloud DocumentSeapine Surround SCM 2011.1Sense/Net 6.0+TYPO3VB.CMIS

Page 12: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

CMIS Objects

A repository is a container of objects.

Objects have four base types: Document object – “elementary information entities managed by the repository”

Folder object – “serves as the anchor for a collection of file-able objects”

Relationship object – “instantiates an explicit, binary, directional, non-invasive, and typed relationship between a Source Object and a Target Object”

Policy object – “represents an administrative policy that can be enforced by a repository, such as a retention management policy.”

(CMIS Specification)

Page 13: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

CMS-L10n Interoperability: Two Requirements

Flexible ITS rule to document bindingsThe same rule to be applied to multiple documents

Multiple rules to be applied to individual documents

Specify the precedence order in which rules are processed for a document

Aim to support external ITS rules via CMIS

Need to signal L10n-relevant updates to documents

MLW-LT (ITS2.0) workgroup identified a requirement for such ‘readiness’ signalling

Aim to support open asynchronous change notification for CMIS

Page 14: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Design: Extending CMIS Implementations

Two approaches to modelling the localisation information:

Custom content modelling

Alfresco aspects

Implementation in repositoryAlfresco (primary)

Nuxeo (basic testing)

Page 15: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

ITS rules using Policy Objects

Translate rules as policy objects

document

translate rule(3)

translate rule(1)

translate rule(2)

translate ruleordering(1,2,3)

Page 16: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

ITS Rules as Folders

Translate rules as folder objects

document (a)

translate rule(3)

translate rule(1)

translate rule(2)

translate ruleordering(1,2,3)

folder parents

Page 17: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Signalling Readiness from CMS

Readiness meta-dataIndicates the readiness of a document for submission to L10n processes or provide an estimate of when it will be ready for a particular process

Data modelready-to-process – type of process to be perfomred next

process-ref – a pointer to an external set of process type definitions used for ready-to-process

ready-at – defines the time the content is ready for the process, it could be some time in the past, or some time in the future

revised – indicates is this is a different version of content that was previously marked as ready for the declared process

priority – high or low

complete-by – indicates target date-time for completing the process

Page 18: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Polling extension to CMIS

Polling schemes describe the way in which documents are polled for updated readiness properties

scheme name / ID

polling interval

notification method

notification target / host

port (for network connection)

readiness property

readiness value

Page 19: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Polling sequence

PollerRepositoryClient 1 Client 2

updateDocument

updateReadiness

poll

results

poll

documentDetails

sendNotification

Page 20: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Readiness

cmis:document

cmis:namecmis:objectId

cmis:createdBy...

loc:doc

loc:readtoprocessloc:processref

loc:readyatloc:revisedloc:priority

loc:completeby

loc:readtoprocessloc:processref

loc:readyatloc:revisedloc:priority

loc:completeby

<<aspect>>loc:readiness

cmis:namecmis:objectId

cmis:createdBy...

cmis:document

loc:doc

Readiness modelled as custom object

Readiness modelled with an aspect

Page 21: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Polling Schemes

document

polling schemefolder

document

document

document

document

polling schemefolder

document

document

document

document

polling schemefolder

document

document

document

folder children

Page 22: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Document model with localisation

loc:readtoprocessloc:processref

loc:readyatloc:revisedloc:priority

loc:completeby

<<aspect>>loc:readiness

cmis:namecmis:objectId

cmis:createdBy...

cmis:document

not:notificationsent

<<aspect>>not:notified

loc:doc

trans:rulesOrder

<<aspect>>trans:rules

Page 23: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Technical setup

Server

Polling System

Notification System

Repository & Localisation Extensions

Client

NotificationReceiver

Client Application

& Localisation Extensions

Repository browser tool

Polling system

Notification system

Test tools

Page 24: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Evaluation

Notification response time

Page 25: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Evaluation

Performance evaluation

Page 26: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Content Management - L10n Workflow Integration

Source CMS

Target CMS

RDF provenance

store

MT

Web-based PE

MT

TM

CAT

Content Management Localisation Preparation

Translation Management

XLIFF store

Parse, filter, segment

XLIFF+ITSXLIFF/PROV

Workflow Management

Rea

ssem

ble

QA viewer

ITS

Page 27: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

XLIFF and Open Provenance

Capture XLIFF transformations that operate on content and its meta-data as the result of content processing by different localisation workflow services

A provenance model used to capture process operations • agents and properties of those processes

Support managing & auditing quality of processes • correlating output of individual steps with professional, crowd

and consumer judgement • support end-to-end process management• terminology management

On-demand language resource assembly • e.g. for parallel text for MT training

Page 28: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Linked Localisation Data: RDF-based loggingOpen Provenance Vocabulary• http://openprovenance.org/• Active W3C Provenance working group

Page 29: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

1240112401 “I am a string”value

2010-02-09T12:30:00

j.doej.doe wasGeneratedBy

wasGeneratedAt

Machine Trans

Machine Trans

wasControlledBy

1560115601

“Je suis un string”

wasTranslatedFrom

value

value

1579015790

“Je suis une string”

2010-02-13T10:07:00wasGeneratedAt

CrowdPE

CrowdPE

wasGeneratedBy

d.jonesd.joneswasControlledBy

wasRevisedFromfr-FR

xml:langwasTranslatedFrom

value

“Je suis un phrase”2010-02-14T10:30:00

wasGeneratedAt

Prof trans

Prof trans

wasGeneratedByc3poc3po

1672316723

16740ms

16740ms

expended

1672716727

2010-02-14T14:05:00

l.jfinnl.jfinn

wasGeneratedAt

Text Classify

Text Classify

wasControlledBy wasGeneratedBy

anomolous

value

wasAnnotatedWith

1673416734

2010-02-14T13:21:00

s.currans.curran

wasGeneratedAt

Trans QA

Trans QA

wasControlledBywasGeneratedBy

pass

value

wasAnnotatedWith

LT Assisted Localisation Process Provenance

1577115771

wasAnnotatedWith

2010-02-12T13:17:00

m.beanm.bean

wasGeneratedAt

Crowd rate

Crowd rate

wasControlledBy wasGeneratedBy

“Poor”

value

Page 30: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Client

Future LSP-Neutral Open Service

Source CMS

Target CMS

QA viewer

CMIS+ITS+PROV

TMS+ L10n tools

XLIFF+ITSCMIS+ ITS+XLIFF

+PROV

TMS+ L10n tools

XLIFF+ITSCMIS+ ITS+XLIFF

+PROV

TMS+ L10n tools

XLIFF+ITSCMIS+ ITS+XLIFF

+PROV

Common Services

Content status/ update

Provenance

query

Resource Curation/Sharing

LSPs

Page 31: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

Conclusion

Have extended CMIS to support:Document level ITS rules

Open document change notification mechanism

Strong potential to streamline CMS-L10n integration in combination with XLIFF and PROV

Achieved with current CMIS specificationCustom extension to folder object

Custom extension to policy object may be better

Next StepsCombining standards for vendor-neutral CMS integration

Aligh with ITS2.0 and XLIFF2.0

Discuss extensions with CMIS-compliant vendors

Page 32: An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services

THANK YOU.Questions.

Follow ITS Use Case at:

http://www.w3.org/International/multilingualweb/lt/wiki/CMS_Neutral_External_ITS_Rules_and_Readiness

Follow XLIFF+ITS mapping at:

http://www.w3.org/International/multilingualweb/lt/wiki/XLIFF_Mapping