institutional services and tools for content, metadata and ipr management


Page 1: Institutional Services and Tools for Content, Metadata and IPR Management

Institutional Services and Tools for Content,

Metadata and IPR Management

Pierfrancesco Bellini, Ivan Bruno, Paolo Nesi, Michela Paolucci

DISIT, Distributed Systems and Internet Technology Lab,

Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Firenze,

Firenze, Italy

{pbellini,ivanb,paolucci}@dsi.unifi.it, [email protected]

Abstract: Multimedia services need to be supported by content, metadata and workflow management systems in order to efficiently manage huge amounts of content items and metadata production, as in the case of cultural institutions. Online digital libraries, cultural heritage institutions and the portals of publishers need an integrated multimedia back office in order to aggregate content and provide it to national and international aggregators. Different technologies need to be integrated to improve existing content management and workflow systems, so that large amounts of data and the processes that handle them can be organized and managed efficiently. The aim of this paper is to formalise and discuss the requirements, design and validation of an institutional aggregator for metadata and content, coping with IPR models for providing content to Europeana, the European international aggregator. ECLAP, the European Collected Library of Artistic Performance, has been realised respecting all these features and taking into account the problems connected to cultural heritage cross-media content in the Performing Arts domain. In order to establish the quality of the institutional services and tools described, a set of measures has been taken and reported.

Keywords: institutional archive; content aggregator; grid computing; workflow; metadata enrichment; metadata validation; semantic computing; IPR management.

I. INTRODUCTION

Multimedia services need to be supported by content, metadata and workflow management systems in order to efficiently manage huge amounts of content items and metadata production. With the introduction of Web 2.0/3.0, and thus of data mining and semantic computing, together with social media and mobile technologies, most digital library and museum services rapidly became obsolete and were forced to change quickly. In most cases, cultural institutions see their content ingested, promoted, distributed and exploited by final users via online commercial partners (e.g., Google, Amazon, YouTube), which may benefit from commercial resale and/or advertising. Thus online digital libraries and cultural heritage institutions, such as ACM, PubMed and IEEE, and the portals of publishers need an integrated multimedia back office. In Europe and in the US, most cultural heritage institutions aggregate content and provide it to national and international aggregators such as Europeana in Europe [1] and the Library of Congress in the US [2]. Europeana holds more than 24 million content items coming from about 2200 different content providers and about 100 content aggregators. Specific workflows and metadata processing and enrichment (name resolution, date analysis, linking with open data, creation of relationships, etc.) are needed to cope with content aggregation and publication. To this end, specific solutions and architectures are needed that share some aspects with, or can be integrated with, traditional workflow management systems [3], [4]. In [15] a visual tool for defining authorization workflow models for e-commerce applications has been proposed.

In this paper, the requirements, the design and the results of ECLAP, an aggregator for Europeana, are reported. ECLAP stands for European Collected Library of Artistic Performance [10]. It was started as a European Commission project, CIP-ICT-PSP.2009.2.2, Grant Agreement N°250481. ECLAP has been designed and developed to ingest content coming from 35 different international cultural institutions in 13 languages, thus creating a considerable online archive for all the performing arts in Europe [12]. So far, it has ingested and processed about 180,000 objects (images, documents, video, audio, 3D, Braille, e-books, etc.) comprising 940,000 items (pages, images, video, audio).

The paper is organized as follows. In Section II, the general requirements of the back office and tools for a cultural heritage content aggregator are reported. Section III provides an overview of ECLAP, describing its architecture and its institutional services and tools, while a report on tools usage is given in Section IV. Conclusions are drawn in Section V.

II. GENERAL REQUIREMENTS

The main requirements of the back office tool for a cultural heritage content aggregator are as follows. The back office has to be capable of:

1) Ingesting a large range of metadata formats (XML-based or Dublin Core, METS, MPEG-21, etc.) coming from different channels (HTTP, FTP, OAI-PMH, etc.). Ingesting implies getting the metadata and content, linking them together, collecting them on suitable storage, and ingesting IPR information. The ingestion may be performed from web page uploads, and/or from batch processing and/or crawling.

2) Performing human content enrichment, such as metadata translation and validation; addition of comments; social media promotion; voting/rating; publication to other portals; editing and corrections; quality assessment, etc. Multiple activities for users imply the possibility of granting different authorizations.

3) Performing automated activities, such as: estimation of technical parameters (duration, size, etc.), extraction of descriptors, indexing, automated translation, searching for VIP names, geoname resolution, linking with LOD (Linked Open Data), metadata assessment (completeness and consistency [6], [14]), and IPR (Intellectual Property Rights) verification. These activities must also include content adaptation and repurposing for a large set of distribution channels, for example, ingesting video in any format and producing multiple formats for distribution on different devices.

4) Harmonising human and automated processing among the activities mentioned above, for example, identifying when human actions are needed, keeping track of the manual activities performed, and blocking human users when automated processing has locked the resource.

5) Scaling up the back office architecture to cope with a large number of transactions on metadata and activities in the back office. In most cases, the back office has to be capable of processing large data sets per day, and thus massive processing on distributed resources such as a GRID is needed [7].

6) Supporting and modelling one or more workflows according to each specific content life cycle. Workflow management also means having different user roles for different activities. For example, not all ingested content is ready to be published at the same time. Thus, some users may be entitled to move the status of their content forward, while others may need approval.

7) Coping with IPR modelling, assignment and verification. This also implies that the IPR model may regulate uses of and access to the digital content, and the exploitation of rights regarding content manipulation and reuse, according to the owner's rights [11].

Moreover, a large number of detailed requirements have been identified, as reported in [8]. Among the additional requirements, we noted the need for: (i) logging and versioning of metadata, to keep track of the work done and the changes performed; (ii) formalizing and managing a number of different roles/capabilities to be assigned to ECLAP users (i.e., enricher, publisher and validator); (iii) providing user-accessible tools for multilingual metadata editing, IPR management, and massive editing of object status information associated with metadata (workflow status, IPR, public/non-public, tags, groups, etc.).

III. ECLAP OVERVIEW

The ECLAP architecture for content and metadata management (see Figure 1) consists of three main areas: the Metadata Ingestion Server, the AXCP back office services [7] and the ECLAP Portal. The Metadata Ingestion Server collects metadata provided by digital archives and libraries and is realised using the MINT metadata mapping tool [8]. There, the metadata arriving in different schemas are mapped according to the ECLAP semantic model and are made available through the OAI-PMH protocol. The AXCP back office services provide automated procedures for content and metadata processing (harvesting, ingestion, analysis, production, adaptation, validation, publishing, etc.). The ECLAP Portal is the Drupal-based front end, which provides front-office tools for working on content and metadata, defining IPR models, managing content and publishing to Europeana. ECLAP provides a social-media-style front-end service (BPN, Best Practice Network) with more than 2200 users. It is directly linked, via a service-oriented interface, to the more complex AXCP-based back office, which copes with the complexity of managing the workflow and of content processing and ingestion. The most important activities addressed by ECLAP are performed in the automated back office.

Figure 1 - ECLAP architecture

In order to better understand content and metadata management, it is useful to describe the ECLAP Overall Scenario in terms of the workflow, rules, procedures, etc., which each Content Provider follows to publish content on ECLAP and then provide it to Europeana (see Figure 2). All content managed in ECLAP is associated with a specific workflow. In the case of the Europeana-based ECLAP workflow, content has to be: (i) uploaded/ingested; (ii) enriched with metadata (some metadata must be sent to Europeana, while others are necessary to describe and manage the content in ECLAP); (iii) associated with an IPR Model (through the IPR Wizard, as described in the next sections) [9].

Uploaded/ingested content is initially available on the ECLAP BPN with maximum restrictions, while its metadata are immediately available for indexing and searching by all kinds of ECLAP users. Only content with (i) a sufficient set of metadata (e.g., the Europeana mandatory metadata) and (ii) a defined IPR license (one from the set admitted by “europeana:rights” in [13]) can be published on Europeana [1].
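The two publication conditions above can be read as a simple predicate over a content record. The following is a minimal sketch under stated assumptions, not the ECLAP implementation: the names `ContentRecord`, `MANDATORY_FIELDS` and `ADMITTED_RIGHTS` are hypothetical, and the field and licence values are placeholders for the actual Europeana mandatory metadata and the values admitted by “europeana:rights” in [13].

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical placeholders: the real mandatory fields and the admitted
# "europeana:rights" values are defined by Europeana [13], not by this sketch.
MANDATORY_FIELDS = {"title", "description", "language", "type"}
ADMITTED_RIGHTS = {
    "public-license",
    "unknown-copyright-status",
    "rights-reserved-restricted-access",
}


@dataclass
class ContentRecord:
    metadata: dict = field(default_factory=dict)
    rights: Optional[str] = None  # licence chosen through the IPR model, if any


def missing_metadata(record):
    """Return the mandatory fields that are absent or empty."""
    return {name for name in MANDATORY_FIELDS if not record.metadata.get(name)}


def can_publish_to_europeana(record):
    """Condition (i): sufficient metadata; condition (ii): an admitted licence."""
    return not missing_metadata(record) and record.rights in ADMITTED_RIGHTS
```

A record that fails the check can still stay on the portal with maximum restrictions; `missing_metadata` tells an enricher which fields remain to be filled in.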

The front office and back office tools of ECLAP cover the whole content life cycle. The adopted AXCP grid solution provides a scalable back office capable of coping with a huge amount of content and metadata processing [7].

[Figure 1 block labels: Library partners; Archive partners; ECLAP Metadata Ingestion Server (OAI-PMH); Ingestion and Harvesting; Resource Injection; Content Retrieval; Database + semantic database; Metadata; AXCP back office services; Content Analysis; Content Processing; Content Indexing and Search; Semantic Computing and Sugg.; Metadata Export; ECLAP Social Service Portal; IPR Wizard/CAS; Metadata Editor; Content Aggregation and Play; Content Upload Management; Content Upload; Content Management; Social Network connections; E-Learning Support]


Figure 2 - ECLAP Flow Overall Scenario

A. Front office tools

The ECLAP front office allows users to cover the whole content life cycle: content upload, enrichment, validation, IPR modelling and editing, content and metadata assessment and management, publication, etc.

Web-based content upload allows users to upload content to the portal through the Upload web page. The page shows a form where users input: DCMI metadata, consisting of a set of Dublin Core fields; Taxonomy, as a selection of multiple classification terms; Groups assignment (ECLAP Groups); Resource data, by selecting one or more files from the user's hard disk or a valid URL (via HTTP or FTP); Workflow type associated with the life cycle of the content and the IPR model, if accessible.

The Metadata Editor is the tool for editing, enriching and validating metadata. According to the user role, the editor works in “enrichment mode” or in “validation mode”, for the users enabled for each. The Metadata Editor allows editing any kind of metadata, organized in specific panels. All changes made to metadata are tracked to maintain the history of changes, including who made each change and when.
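The change tracking described above (what changed, who changed it and when) can be sketched as an append-only history kept next to the current values. This is only an illustration under assumptions: the class names `Change` and `TrackedMetadata`, and the choice of keying values by field and language, are hypothetical and are not taken from the ECLAP Metadata Editor.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Change:
    field_name: str
    language: str
    old_value: Optional[str]
    new_value: str
    user: str
    when: datetime


@dataclass
class TrackedMetadata:
    # Current values, keyed by (field name, language).
    values: dict = field(default_factory=dict)
    # Append-only list of Change entries, oldest first.
    history: list = field(default_factory=list)

    def set(self, field_name, language, new_value, user):
        """Update one field and record who changed it and when."""
        key = (field_name, language)
        old_value = self.values.get(key)
        if old_value == new_value:
            return  # nothing changed, so nothing is logged
        self.values[key] = new_value
        self.history.append(
            Change(field_name, language, old_value, new_value, user,
                   datetime.now(timezone.utc))
        )

    def changes_by(self, user):
        """Return the changes made by one user, oldest first."""
        return [change for change in self.history if change.user == user]
```

Because every entry keeps the previous value, a wrong edit can be undone by writing `old_value` back, which is itself logged as a new change.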

The IPR Wizard allows creating IPR Models that formalize the owner's rights related to publishing content online in the ECLAP context. The implemented IPR Logic Model takes decisions on behalf of the IPR Managers according to the relationships among user roles and among permissions [9]. The IPR manager only has to select one or more permissions for a user role to be associated with an IPR Model (and therefore with a set of content items), and the wizard automatically selects the permissions implied by those chosen (e.g., download implies play; see Figure 3 for the list of permissions). When some restrictions are applied, Creative Commons Licenses [10] cannot be associated with the IPR Model, so the user has to choose one of the restricted licenses allowed by Europeana (“Unknown copyright status” or “Right Reserved – Restricted access”). In ECLAP, many different sets of content permissions (rights) can be imposed by the content owners, which are the ECLAP Content Providers. Examples include: content and metadata upload methods; metadata standards and formats; IPR on content (licenses, permissions, etc.); collection topics; etc. Permissions managed on the ECLAP Portal relate to the following aspects:

• access to the content (e.g., the content can be accessible via progressive download and/or download)
• user device (e.g., the content can be played via a PC and/or a mobile device, iPad, etc.)
• content resolution (e.g., the content can be accessible only in a reduced Low Resolution and/or in High Resolution).
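The automatic selection of implied permissions can be sketched as a closure over an "implies" relation. Only the rule that download implies play comes from the text; the other permission names and rules in `IMPLIES` are hypothetical stand-ins for the relationships of Figure 3, and `expand_permissions` is an illustrative name rather than part of the IPR Wizard.

```python
# "download implies play" is stated in the text; the other two entries are
# hypothetical examples standing in for the relationships of Figure 3.
IMPLIES = {
    "download": {"play"},
    "play_hd": {"play"},                      # hypothetical
    "download_hd": {"download", "play_hd"},   # hypothetical
}


def expand_permissions(selected):
    """Return the selected permissions plus everything they imply."""
    result = set(selected)
    pending = list(selected)
    while pending:
        permission = pending.pop()
        for implied in IMPLIES.get(permission, ()):
            if implied not in result:
                result.add(implied)
                pending.append(implied)  # follow chains of implications
    return result
```

With this relation, selecting only `download` for a user role yields both `download` and `play`, so the manager never has to tick the weaker permission by hand.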

Figure 3 - ECLAP Permissions

The Content Management tool allows users to manage content, publish it to Europeana and perform massive editing on large sets of content elements.

B. Back office services

The ECLAP back-office services consist of a set of grid processes that run automated workflow processes on both single and multiple content items.

Automated ingestion - This process ingests metadata and content coming from ECLAP partners and digital archives, and from the external metadata mapping tool MINT. It allows metadata and digital resources to be ingested both in bulk and individually. When resources are large files, they are provided on a physical device. In this case, ECLAP starts with metadata ingestion only, and the metadata are joined with the digital resources once these become available.

Content production and adaptation - This process works on the digital resources and metadata uploaded via web or ingested. Metadata and digital resources are retrieved from the CMS or the storage area, or downloaded from the provided URL. Incoming metadata (Dublin Core, Taxonomy, Groups, Performing Arts metadata, workflow type, user) are enriched with technical metadata obtained by analysing the digital resource: (i) content format (document, audio, video, image, crossmedia), (ii) content type (file format), (iii) structural information (size, duration, number of pages). The enriched metadata and the digital resource are aggregated and published in the publication database. Metadata are indexed to make the content ready for access on the portal. The production process also works when the digital resource has to be replaced with a new one (Update). To make the incoming digital resource accessible from different devices, Content Adaptation processes are exploited: (i) content adaptation to different resolutions produces content accessible from different devices (iPhone, iPad, Android, Windows Phone, etc.) and, on the ECLAP portal, from any browser; (ii) video adaptation produces Low, Medium and High Definition versions of a video; (iii) metadata translation translates Dublin Core metadata and missing metadata into different languages by using a text translation tool or web service.
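As a rough illustration of the technical-metadata step, the sketch below derives a content format, a content type and a size from a file name using only the Python standard library. It is an assumption-laden simplification and not the AXCP procedure: the function name and dictionary keys are hypothetical, every non audio/video/image type is treated as a document, and duration and page counts are left out because they require format-specific tools.

```python
import mimetypes
import os


def technical_metadata(path):
    """Guess content format, content type and size for a digital resource."""
    mime, _encoding = mimetypes.guess_type(path)
    top_level = mime.split("/")[0] if mime else None
    if top_level in ("audio", "video", "image"):
        content_format = top_level
    elif mime is not None:
        content_format = "document"  # simplification: anything else with a known type
    else:
        content_format = "unknown"
    # Size is only available when the file is actually present on disk.
    size = os.path.getsize(path) if os.path.exists(path) else None
    return {
        "content_format": content_format,
        "content_type": mime,
        "size_bytes": size,
    }
```

The returned dictionary would then be merged with the incoming descriptive metadata before the item is written to the publication database.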

Content management - During the life cycle of content, massive actions on content may be needed: changes in the workflow status, changes in the metadata, addition of details to the metadata sets, etc. Specific actions are also needed to maintain and manage content, working on both single and multiple content items, such as: deleting content, updating metadata, and publishing content uploaded by common users.

ECLAP back-office services and front-office tools both work on content and metadata. Such processes have to work concurrently: back-office services may access and process content in parallel with user activities on the front end. However, translation, enrichment, validation, IPR definition and assessment cannot be performed by more than one process at a time on the same content. On the other hand, sequential processing is too expensive and time consuming to sustain the content workflow and ingestion: in ECLAP, several thousand new content items per day have to be processed. To this end, a workflow state diagram has been modelled, formalized and implemented and, to manage concurrency and guarantee safe access to the content, a lock-unlock access mechanism has been defined. The general workflow state diagram is shown in Figure 4.

Figure 4 - ECLAP Workflow diagram
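The lock-unlock idea can be sketched as a small state machine in which the "under" states act as locks held by one actor. The allowed state pairs below are the transitions that appear in Table I; everything else is an assumption made for illustration: the names `ContentItem`, `WorkflowError` and `LOCKING_STATES`, the choice of which states lock the content, and the single-process setting (no real concurrency control is shown).

```python
# State pairs observed in Table I of this paper.
ALLOWED = {
    ("PROPOSED", "UPLOADED"),
    ("UPLOADED", "UNDER-AXCP"), ("UNDER-AXCP", "UPLOADED"),
    ("UPLOADED", "UNDER-IPR"), ("UNDER-IPR", "UPLOADED"),
    ("UPLOADED", "UNDER-ENRICH"), ("UNDER-ENRICH", "UPLOADED"),
    ("UPLOADED", "UNDER-VALIDATION"), ("UNDER-VALIDATION", "UPLOADED"),
    ("UPLOADED", "UNDER-APPROVAL"), ("UNDER-APPROVAL", "UPLOADED"),
    ("UNDER-APPROVAL", "PUBLISHED"), ("PUBLISHED", "UPLOADED"),
}

# Assumption: these states give exclusive access to whoever entered them.
LOCKING_STATES = {"UNDER-AXCP", "UNDER-IPR", "UNDER-ENRICH", "UNDER-VALIDATION"}


class WorkflowError(Exception):
    """Raised when a transition is not allowed or the content is locked."""


class ContentItem:
    def __init__(self):
        self.state = "PROPOSED"
        self.locked_by = None  # actor currently holding the lock, if any

    def move(self, new_state, actor):
        if (self.state, new_state) not in ALLOWED:
            raise WorkflowError(f"{self.state} -> {new_state} is not allowed")
        if self.locked_by is not None and self.locked_by != actor:
            raise WorkflowError(f"content is locked by {self.locked_by}")
        self.state = new_state
        # Entering a locking state takes the lock; leaving it releases the lock.
        self.locked_by = actor if new_state in LOCKING_STATES else None
```

In this sketch a back-office rule that moves an item to UNDER-AXCP blocks a human enricher until the rule returns the item to UPLOADED, which mirrors the behaviour required in Section II, point 4.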

C. User Roles and Workflow Model

The front-office tools have to allow working on metadata in different ways. In order to implement a high-quality content enrichment process, each specific activity has to be granted to specific people according to their skills, language and the identifier of their institutional Content Provider (CPID). Moreover, the solution has to keep track of every single metadata change, in order to have evidence of the work performed and, if necessary, to recover from errors. To this end, the following user roles have been defined with their parameters:

• WFIPR (CPID): authorizes the definition and validation of IPR models, and IPR assignment to the content of the CPID, by using the IPR Wizard and, during Upload, the IPR Model Assignment.
• WFENRICHER (CPID, {languages}): authorizes metadata enrichment and changes in the specified languages (adding and editing metadata).
• WFVALIDATOR (CPID, {languages}): authorizes the validation of metadata for the specified languages. Metadata fields can be validated individually until the object passes the whole approval phase.
• WFPUBLISHER (CPID): authorizes the final decision on publishing on ECLAP and on Europeana. Single content items or groups of content can be published by using the Content Management Tool and AXCP, together with many other functionalities, plus any new actions programmed on the same tools.

Back-office services are not associated with a specific user role, since they are performed as root user by rules running as automated background processes on the AXCP computing grid, operating on content and metadata.
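The parameterised roles above lend themselves to a simple authorization check. The sketch below is a hedged illustration, not the ECLAP code: the role names and their parameters (CPID and, for enrichers and validators, a set of languages) come from the text, while the `Grant` class, the `is_authorized` function and the example CPID values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    role: str                           # WFIPR, WFENRICHER, WFVALIDATOR or WFPUBLISHER
    cpid: str                           # Content Provider the grant applies to
    languages: frozenset = frozenset()  # only meaningful for language-scoped roles


# Roles whose parameters include a set of languages, as listed in the text.
LANGUAGE_SCOPED = {"WFENRICHER", "WFVALIDATOR"}


def is_authorized(grants, role, cpid, language=None):
    """Check whether any grant covers the role, the provider and the language."""
    for grant in grants:
        if grant.role != role or grant.cpid != cpid:
            continue
        if role in LANGUAGE_SCOPED and language not in grant.languages:
            continue
        return True
    return False
```

A user holding several grants, for example enricher for Italian and publisher for the same provider, is represented as a list of `Grant` objects, which matches the statement that each user may have single or multiple roles.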

IV. REPORT ON TOOLS USAGE

In this section, the results of the ECLAP back office activities performed on content, metadata and IPR until April 2013 are reported. This service allows users and automated workflow processes to interoperate securely and to increase the quality and accessibility of content and metadata without conflicting with each other. It is currently in use by 31 institutions. The number of state transitions and their distribution over time give evidence of the whole activity of the portal on content and metadata, and allow the back-office and the user activities to be analysed separately. Some results are reported in the temporal domain, taking the month as the unit of time.

A. Workflow Users

Currently, there are 29 workflow-qualified users. Each user may have one or more user roles (granted authorizations). The workflow user roles are distributed as follows: 24 enrichers (WFENRICHER), 6 validators (WFVALIDATOR), 23 IPR users (WFIPR) and 9 publishers (WFPUBLISHER).

B. Workflow Transitions

To date, 706,052 workflow transitions have been handled for 117,861 content items, with an average of 6 transitions per content item and a maximum of 104 transitions for a single item. These transitions were performed over 653 days, with an average of 1,014 transitions per day and a maximum of 13,162 transitions in one day, using up to 14 different virtual nodes in the AXCP grid on the DISIT Cloud.

TABLE I. DISTRIBUTION OF WORKFLOW TRANSITIONS

From                    To                    Number of Transitions
'Uploaded'              'Under-AXCP'          179912
'Under-AXCP'            'Uploaded'            179912
'Proposed (creation)'   'Uploaded'            117861
'Uploaded'              'Under-Approval'      113549
'Under-Approval'        'Published'           111362
'Uploaded'              'Under-IPR'           929
'Under-IPR'             'Uploaded'            929
'Uploaded'              'Under-Enrichment'    611
'Under-Enrichment'      'Uploaded'            611
'Under-Approval'        'Uploaded'            212
'Uploaded'              'Under-Validation'    38
'Under-Validation'      'Uploaded'            38
'Published'             'Uploaded'            3

[Labels recovered from Figure 4 (ECLAP workflow diagram): states PROPOSED, UPLOADED, UNDER-AXCP, UNDER-IPR, UNDER-ENRICH, UNDER-VALIDATION, UNDER-APPROVAL, PUBLISHED; actors WFIPR (by IPR Wizard), WFENRICHER (by Metadata Editor), WFVALIDATOR (by Metadata Editor), WFPUBLISHER (manually or by AXCP back office), AXCP back office (translations, content update/adaptation, metadata analysis and validation); transition labels Upload via form, Moderated Upload, Professional & Institutions Upload, Ingestion rule, Upload rule, IPR edit, IPR done, Metadata edit, Enrichment done, Automated enrich rule, Validation request, Validation done not approved, Assessment rule, Final publication rule, Publish to Europeana; stores Administrative database, Publishing database]

C. Back-Office services

The back-office services consist of a set of grid processes that periodically run automated workflow processes on both single and multiple content items.

1) Content and Metadata Ingestion

The number of content items ingested and processed by the back office is 106,525, corresponding to the UPLOADED workflow state of content.

2) Metadata Analysis

Metadata analysis for assessment or automated translation performs a transition to UNDER-AXCP in order to lock the content and prevent users from accessing it for manual editing or validation. In total, 179,912 of these transitions were performed.

3) Metadata Validation

Every time content passes the metadata analysis, the back office performs a transition to UNDER-APPROVAL. In total, 113,549 of these transitions were performed.

4) Content Publication

Every time the back office publishes content that is in the UNDER-APPROVAL workflow state, it performs a new transition to the final state, PUBLISHED. In total, 107,598 of these transitions were performed.

D. Front-Office tools

In this section, the analysis of the activity performed by users via front-office tools is reported.

1) Web Page Upload

11,336 content items have been manually uploaded by users via the Web Page Upload. Once content is uploaded, processing passes to the back office.

2) Metadata Editor

In order to evaluate the usage of the Metadata Editor for enrichment and validation activities, the numbers of workflow transitions from UPLOADED to UNDER-ENRICH and from UPLOADED to UNDER-VALIDATION have been considered. The former gives a measure of the manual enrichment activity, the latter of the manual validation activity. There were 611 transitions related to enrichment and 38 related to validation. Figure 5 reports the distribution over time of the enrichment activity.

3) IPR Wizard, IPR definition model

The ECLAP IPR Wizard is widely used by more than 35 partners in Europe. To evaluate the usage of the IPR Wizard, the number of workflow transitions from UPLOADED to UNDER-IPR over time has been tracked. There were 929 such transitions, and their distribution is reported in Figure 6. Comparing Figures 5 and 6, it is evident that the IPR tool has been used much more than the metadata editing tool. This is due to the fact that, in most cases, the content metadata were ingested from stable institutional databases and archives, while the IPR model was missing from those archives, and Europeana required the institutions to formalize the IPR aspects before content submission.

Figure 5 - Enrichment Activities

Figure 6 - IPR Wizard Activities

For the IPR aspects, 67 different IPR models/licenses have been used. 40 of them are restrictive, non-public models, while 27 are public models allowing full content access. Most content providers used 1, 2 or 3 different IPR models/licenses for their content, while a few partners used 4, 8 or 12 models. Figure 7 reports the number of files used per IPR model. It can be seen that the two most widespread models cover more than half of the whole content collection. On the other hand, the semantic flexibility of the proposed IPR model made it possible to cope with the many different needs of the content owners, who impose the IPR according to the legal rights they can actually provide.

Figure 7 - Statistics on IPR Models.

[Chart residue from Figures 5 to 7: only numeric axis ticks survive; Figure 7 horizontal axis: IPR models 1 to 67; legend: Public, Not Public]


68% of the content is associated with a public IPR model. Regarding the 30 restrictive IPR models defined by a Content Provider, in 11 cases access was restricted to educational group users only, and in 6 cases to group users only (educational and non-educational). Moreover, 23 models have been used to disallow the download of the digital resource for some resource types (regardless of the user type).

TABLE II. IPR MODELS ALLOWING PERMISSIONS BY USER TYPE

Permission          User type
                    public    group    educ./research
only play/access    11        13       19
download & play     3         8        11
no permission       19        12       4

Table II reports, for the three user types (public, group and educational), how many IPR models allow only play/access of the digital resource, allow both download and play, or provide no permission. It can be seen that in most cases the models are used to restrict access by public users (19 out of 30) and to limit the download of the resource.

4) Content Management Tool

To evaluate the usage of the Content Management tool for the publication activity, we measured the number of workflow transitions from UNDER-APPROVAL to PUBLISHED made by partners. There were 3764 transitions, distributed by month as reported in Figure 8.

Figure 8 - Publication Activities.

V. CONCLUSION

The paper discussed the requirements, design and validation of ECLAP, an institutional aggregator for metadata and content that copes with IPR models for providing metadata to Europeana, the European international aggregator. The proposed solution takes into account issues connected to cultural heritage cross-media content in the Performing Arts domain, and integrates front office tools with an automated back office based on a Grid. The solution ensures high quality and provides large-scale multimedia services to manage huge amounts of content and metadata, coming in turn from several national and local institutions: museums, archives, content providers. Finally, the usage analysis highlights the overall activity of ECLAP on content, metadata and IPR until April 2013. It shows that the huge activity on content and metadata aggregation, analysis and validation needed to match the Europeana requirements has been mainly automated and performed by the back office, thus keeping content processing cheap and sustainable. On the front office side, the tools most used by content providers have been those associated with IPR, namely the IPR Wizard and the Content Management tool, since they allow users to finalise the rights and to connect the content to Europeana. Most of the metadata provided were already in good shape, and less than 1% of the content has been corrected from that point of view. On the other hand, the IPR details requested by Europeana required the content providers to associate a new IPR model with 100% of the content. This huge effort has been kept under control by exploiting the IPR Model, and by applying only 67 models to the whole set of more than 120,000 different content items coming from more than 35 different collections and institutions.

ACKNOWLEDGMENT

The authors want to thank all the partners involved in ECLAP, and the European Commission for funding ECLAP in the Theme CIP-ICT-PSP.2009.2.2, Grant Agreement No. 250481.

REFERENCES

[1] Europeana, http://www.europeana.eu

[2] The Library of Congress, http://www.loc.gov

[3] Yu, J., & Buyya, R. A taxonomy of workflow management systems for grid computing. Journal of Grid Computing, 3(3-4), 171-200, 2005.

[4] W.M.P. van der Aalst and K.M. van Hee. Workflow Management: Models, Methods, and Systems. MIT Press, Cambridge, MA, USA, 2002.

[5] European Library of Artistic Performance, ECLAP, http://www.eclap.eu/

[6] Bellini E., Nesi P. "Metadata Quality assessment tool for Open Access Cultural Heritage institutional repositories", Proc. of the ECLAP 2013 conference, 2nd Int. Conf. on Information Technologies for Performing Arts, Media Access and Entertainment, Springer Verlag LNCS, 2013.

[7] Bellini P., Bruno I., Cenni D., Nesi P., "Micro grids for scalable media computing and intelligence on distributed scenarious", IEEE Multimedia, 18 Aug. 2011, IEEE Computer Society Digital Library, ISSN: 1070-986X.

[8] “ECLAP DE3.1 infrastructure: ingestion and processing content and metadata”, 2011, ECLAP Project, http://www.eclap.eu/urn:axmedis:00000:obj:a345a84f-6fdf-4f84-a412-88094ce363e2

[9] Bellini P., Nesi P., Paolucci M. (2013). IPR Management Models for Cultural Heritage on ECLAP Best Practice Network, IEEE International Conference on Communications, 9-13 June, Budapest, Hungary.

[10] Creative Commons, http://creativecommons.org

[11] Wang X., “MPEG-21 rights expression language: Enabling interoperable digital rights management,” IEEE Multimedia, 11(4):84–87, 2004.

[12] ECLAP Partners. List and information on ECLAP Partners available at: http://www.eclap.eu/drupal/?q=node/3578

[13] ‘Guidelines for the europeana:rights metadata element’, v4.0 – 20, 9 February 2012, http://pro.europeana.eu/documents/900548/0d423921-23e0-45fa-82a4-2ac72b3e6f38

[14] Park Jung-Ran, Metadata quality in digital repositories: A survey of the current state of the art, “Cataloging & classification quarterly”, vol. 47, nos. 3-4 (April 2009), p. 213-228

[15] S.K. Chang et al, “Visual Authorization Modeling with Applications to Electronic Commerce”, IEEE Multimedia, (2003), Vol. 10, No. 1, pp. 44-54.
