Cluj Napoca, 28 August 2008
2008 IEEE International Conference on Intelligent Computer Communication and Processing
Digital Libraries Workshop
Towards a GRID-Based Digital Library Management System.
Gheorghe Sebestyén-Pál (1), Doina Banciu (2), Tünde Bálint (1), Bogdan Moscaiuc (1), and Ágnes Sebestyén-Pál (1)
(1) Technical University of Cluj-Napoca
(2) ICI Bucharest
Debrecen, 3-5 September 2008, DAPSYS’08, 7th International Conference on Distributed and Parallel Systems
Content
- Classical vs. Digital Libraries
- Recent research on Digital Libraries (DL)
- Main issues and requirements for DLs
- An ontology-based DL model
- Grid-enabled DL
- Implementation considerations of a pilot DL
- Experiments
- Conclusions
Classical vs. Digital Libraries
- Classical library
  - A repository of knowledge organized mainly on paper
- Digital library
  - Not only a digitized version of a classical library
  - Adds a new set of functionalities and services (e.g. access control, resource management and allocation, complex search and processing services)
  - A data exchange and cooperation environment
  - DLs are becoming digital content management systems
  - Incorporates a wide variety of formats and data types (text, audio, video, multi-document complex digital objects)
  - Uses a variety of communication and data-exchange protocols and standards
IT and Communication technologies involved in the implementation of digital libraries
http://mapageweb.umontreal.ca/turner/meta/english/metamap.html
Goals for modern DLs
- DELOS project’s vision: “to enable any person to access all human knowledge anytime and anywhere, in a friendly, multi-modal, efficient, and effective way, by overcoming barriers of distance, language, and culture and by using multiple Internet-connected devices”
- DL: a knowledge repository and an information exchange infrastructure that allows:
  - data generation, processing and seamless access to relevant information,
  - regardless of the geographic distribution of hardware resources, databases or persons.
Research in digital libraries
- DELOS Network of Excellence
  - Goals: to define and implement digital libraries on new computing and communication technologies
  - Achievements: definition of functional and architectural requirements for DL implementation
- BRICKS project
  - Goals: to design a user- and service-oriented space to share knowledge and resources in a multi-cultural heritage
  - Achievements: definition of a digital library architecture for a very broad and heterogeneous user community; automatic indexing and annotation functionalities
- OpenDLib project
  - Goal: development of a software toolkit for generating dedicated DLs
  - Achievements: tools for content harvesting from existing resources
- Fedora, DSpace: open source software for DLs
- Lucene: open source search engine
Research in digital libraries (cont.)
- Diligent project (part of the EGEE project)
  - Goal: the use of GRID infrastructure for DL implementation
  - Achievements:
    - A new vision of the DL concept: DL = a dynamic digital content repository and management system dedicated to a purpose (e.g. a project, an art collection, an academic course)
    - Definition of generic DL services mapped on GRID services
    - DLs dedicated to different domains, with powerful processing capabilities
- SINRED project (National Excellence project)
  - Goal: development of a national framework for DLs specialized in technical sciences and research
  - Achievements: evaluation of requirements, evaluation of existing software, infrastructure development, DL model definition, implementation of a pilot DL
- SIPADOC project (national research program)
  - Goal: reevaluation of the national patrimony through DLs
  - Achievements: evaluation of digitizing tools
Key issues in DL implementation
- Architectural issues:
  - Distributed nature of storage, processing and access resources
  - Scalability, flexibility, interoperability
- Functional requirements:
  - Core functions: storage, indexing and annotation, data search, content retrieval, user management
  - Content organization should reflect semantic connections
- Processing facilities
  - Data processing services, specialized for different fields
  - Pattern search and recognition
- QoS issues
  - Restricted time to obtain relevant information
  - Reasonable time for complex data processing
- User and access control management
  - Virtual organizations
  - Role-based access
DL = Essence & Metadata Management
[Diagram labels]
- Content types: text, audio, video
- Digital content generation and harvesting
- Management of essence
- Automatic feature (metadata) extraction
- Metadata management
- Cataloging, indexing, annotation
- Access and visualization
- Cataloging information system
An ontology-based Digital Library approach
- Ontology: concepts and relations together with a reasoning engine
- Ontology for technical and scientific domains
- Main concepts:
  - Digital objects: association of content, metadata and procedures
    - Examples: articles, technical reports, prospects, PhD theses, patents
  - Digital collections
    - Set of digital objects structured for a given goal/purpose or based on a given criterion
    - Examples: articles of an author, documents of a domain
  - Events
    - Conferences, workshops, seminars
  - Processes
    - Projects
    - Courses
  - Virtual organizations
    - Roles
    - Users
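The main concepts listed on this slide can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names and the `select` helper are hypothetical and are not taken from the pilot system; the sample documents are invented.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Association of content, metadata and procedures."""
    identifier: str
    content: bytes
    metadata: dict[str, str] = field(default_factory=dict)
    procedures: list[str] = field(default_factory=list)  # hypothetical: names of applicable tools


@dataclass
class DigitalCollection:
    """Digital objects grouped for a purpose or by a criterion."""
    name: str
    purpose: str
    objects: list[DigitalObject] = field(default_factory=list)

    def add(self, obj: DigitalObject) -> None:
        self.objects.append(obj)

    def select(self, key: str, value: str) -> list[DigitalObject]:
        """Return the objects whose metadata field `key` equals `value`."""
        return [o for o in self.objects if o.metadata.get(key) == value]


@dataclass
class VirtualOrganization:
    """Users grouped under named roles."""
    name: str
    roles: dict[str, set[str]] = field(default_factory=dict)  # role -> user names


# Invented sample data: two documents collected for an academic course.
article = DigitalObject("doc-1", b"...", {"type": "article", "author": "A. Author"})
report = DigitalObject("doc-2", b"...", {"type": "technical report", "author": "B. Author"})
course = DigitalCollection("course-materials", "an academic course")
course.add(article)
course.add(report)
by_author = course.select("author", "A. Author")
```

Selecting by a metadata field corresponds to the "based on a given criterion" case, e.g. the articles of one author.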
Grid-enabled digital library services
- Why DLs on GRID infrastructure?
  - Huge volume of documents/digital objects
  - Concurrent access and multiple search engines (see Google)
  - Multimedia streaming
  - Automatic indexing and annotation
  - Complex processing requires prohibitive time
  - User management through virtual organizations
  - Job distribution facilities offered by GRID
DL functions mapped on GRID services
[Diagram labels]
- Computing, storage and communication resources
- Digital Library
- GRID Services
- Collections management
- Catalog and metadata management
- Digital objects management
- Users’ management
- Data visualization
- Virtual organizations management
- Resource management
- Task distribution
- Processing
- Data distribution and replication
- Data processing
Experiments
- Two approaches:
  - DL implementation on Alchemi GRID (Microsoft)
    - Job distribution at thread level
    - Explicit GRID programming
    - Experiments with multimedia streaming (multimedia content distribution)
  - DL implementation on Condor GRID (open source)
    - Job distribution at task level
    - Job and data distribution is transparent to the DL application (distribution is made through separate scripts)
    - Experiments with keyword search over the whole DL content
      - The execution time decreased as the number of executor computers increased
      - For more than 5 executors the scheduling and communication time is comparable with the execution time
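The task-level keyword search experiment follows a scatter/gather pattern: split the content, search each part on a separate executor, merge the hits. The sketch below illustrates that pattern only. It uses Python's `ThreadPoolExecutor` as a local stand-in for grid executors (an assumption; it is not Condor and not the scripts used in the experiments), and the function names and the three sample documents are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split items into n_parts nearly equal slices (round-robin)."""
    return [items[i::n_parts] for i in range(n_parts)]


def search_chunk(chunk, keyword):
    """Task run by one executor: ids of the documents containing keyword."""
    needle = keyword.lower()
    return [doc_id for doc_id, text in chunk if needle in text.lower()]


def distributed_search(documents, keyword, n_executors):
    """Scatter the documents, search each slice in parallel, gather the hits."""
    chunks = partition(list(documents.items()), n_executors)
    with ThreadPoolExecutor(max_workers=n_executors) as pool:
        partial_hits = list(pool.map(search_chunk, chunks, [keyword] * n_executors))
    return sorted(hit for hits in partial_hits for hit in hits)


# Invented three-document library, for illustration only.
library = {
    "doc-1": "Grid services for digital libraries",
    "doc-2": "Multimedia streaming experiments",
    "doc-3": "Job distribution on a GRID infrastructure",
}
hits = distributed_search(library, "grid", n_executors=2)
```

Each slice is independent, so the search itself parallelizes well; the cost that does not shrink is distributing the slices and collecting the results, which is the scheduling and communication time reported on the slide.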
A pilot implementation of a digital library framework developed with GRID support
- Goal: implementation of a digital content storage and retrieval system dedicated to educational and scientific activities (courses, projects, etc.)
- Main requirements:
  - A DL adaptable to a given purpose/goal
  - Access controlled and restricted through virtual organizations
  - Ontology-based approach (concepts, relations, semantic search)
  - Advanced search procedures
  - GRID-enabled full-text search services, for better response time
  - Access through Internet browsers
- The result: a distributed digital library application, which allows:
  - Management of digital objects (upload, storage, indexing, metadata creation)
  - Management of collections
  - Management of users and virtual organizations
Pilot DL details: (www.bib-dig.utcluj.ro)
- Management of digital objects
  - Upload of digital documents
  - Annotation and metadata generation according to Dublin Core
  - Distributed storage of data
- Management of collections
  - Define a new collection
  - Attach new documents to an existing collection
  - Associate access rights with a collection
- Management of users and virtual organizations
  - Define new users and new virtual organizations
  - Define roles
  - Associate roles with users and collections
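The user-management functions above (define roles, associate roles with users and collections, associate access rights with a collection) amount to role-based access control. A minimal sketch follows, assuming a simple in-memory model; the `AccessControl` class, its method names, and the sample users, roles and rights are all hypothetical and do not describe the pilot's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControl:
    # (virtual organization, user) -> roles held by that user
    user_roles: dict[tuple[str, str], set[str]] = field(default_factory=dict)
    # (collection, role) -> rights granted to that role on that collection
    collection_rights: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def assign_role(self, vo, user, role):
        """Associate a role with a user inside a virtual organization."""
        self.user_roles.setdefault((vo, user), set()).add(role)

    def grant(self, collection, role, *rights):
        """Associate access rights on a collection with a role."""
        self.collection_rights.setdefault((collection, role), set()).update(rights)

    def is_allowed(self, vo, user, collection, right):
        """True if any role held by the user carries the right on the collection."""
        roles = self.user_roles.get((vo, user), set())
        return any(
            right in self.collection_rights.get((collection, role), set())
            for role in roles
        )


# Invented example: a course with one teacher and one student.
acl = AccessControl()
acl.assign_role("course-vo", "alice", "teacher")
acl.assign_role("course-vo", "bob", "student")
acl.grant("lecture-notes", "teacher", "read", "upload")
acl.grant("lecture-notes", "student", "read")
```

Rights are attached to roles rather than to individual users, so adding a user to a virtual organization only requires assigning a role.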
Snapshots of the DL application’s interface
bib-dig.utcluj.ro
Search techniques in DLs
- Through keyword or index search:
  - Database techniques
- Through semantic Information Retrieval:
  - Semantic graph with documents and concepts
- Through non-semantic Information Retrieval:
  - Naive Bayes Algorithm
    - Probabilistic approach
    - Based on probabilistic similarity between documents
  - Topic-Based Vector Space Model Algorithm
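To make the probabilistic approach concrete, here is a textbook multinomial Naive Bayes text classifier with add-one smoothing. It is a generic sketch, not necessarily the variant used in the pilot DL; the class name, the whitespace tokenization and the tiny training set are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def log_score(self, text, label):
        """log P(label) + sum of log P(word | label) over the words of text."""
        counts = self.word_counts[label]
        total = sum(counts.values()) + len(self.vocabulary)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            if word in self.vocabulary:  # skip words never seen in training
                score += math.log((counts[word] + 1) / total)
        return score

    def predict(self, text):
        """Return the label with the highest log score."""
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Invented training set with two topics.
model = NaiveBayes().fit(
    ["grid job scheduling", "grid resource scheduling",
     "metadata catalog indexing", "catalog annotation metadata"],
    ["grid", "grid", "catalog", "catalog"],
)
prediction = model.predict("scheduling a grid job")
```

The same scores can rank documents or topics by likelihood for a query, which is one way such a model supports retrieval as well as classification.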
Experimental results
[Figure: Execution time vs. number of executor nodes. X axis: Nodes (1 to 5). Y axis: Time (s), 0 to 8000. Series: search execution time; scheduling and communication time (case 1 and case 2); total time (case 1 and case 2).]
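The interplay between search time and scheduling/communication time can be illustrated with a simple cost model. Everything here is an assumption for illustration: the model (search work divides evenly across nodes, overhead grows linearly with the node count) and both parameter values are hypothetical and are NOT read from the figure. The parameters were chosen so that overhead equals search time at 5 nodes, mirroring the observation that overhead becomes comparable beyond 5 executors.

```python
def total_time(n_nodes, search_time_1, overhead_per_node):
    """Assumed model: search work divides evenly, overhead grows linearly."""
    search = search_time_1 / n_nodes
    overhead = overhead_per_node * n_nodes
    return search + overhead


def speedup(n_nodes, search_time_1, overhead_per_node):
    """Total time on one node divided by total time on n_nodes."""
    return total_time(1, search_time_1, overhead_per_node) / total_time(
        n_nodes, search_time_1, overhead_per_node
    )


# Illustrative parameters only, NOT measured values.
SEARCH_TIME_1 = 1000.0  # hypothetical: seconds to search on one node
OVERHEAD = 40.0         # hypothetical: seconds of scheduling/communication per node

times = {n: total_time(n, SEARCH_TIME_1, OVERHEAD) for n in range(1, 11)}
best_n = min(times, key=times.get)  # node count with the lowest total time
```

Under these assumptions the total time stops improving once the overhead term matches the search term, so adding executors past that point no longer helps.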
Conclusions
- DLs are complex content management systems that extend the functionalities of classical libraries:
  - Semantic organization of a wide variety of information formats
  - Multiple search and data retrieval techniques (including full-text and semantic search):
    - Keyword full-text search
    - Semantic search
    - Statistical and probabilistic retrieval and classification
  - Access control to distributed and remote data
- DLs are data exchange and cooperation environments
  - Useful for remote and cooperative work
- DLs must include powerful search and data retrieval engines
- GRID infrastructures may be a feasible support in the implementation of DLs
  - For more efficient parallel search, classification or automatic annotation
Thank you for your attention
Questions?