Digital Object: A Virtual Online Storage Solution
598C Course Project
Huajing Li




Digital Contents: Not Just Documents…

Some conventional objects

Complex, compound, dynamic objects

Traditional Online Access Method

The web server and application must be aware of the types and components of the managed digital contents:
  HTML documents…
  JPG/GIF photos…
  Video clips…
  Streaming media…

System developers and interface designers must provide an appropriate handler for each of these file types.
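The coupling described on this slide can be sketched in a few lines of Java. This is only an illustration: the class name `TypeAwareFrontEnd`, the file extensions, and the handler strings are all made up for the example and do not come from the project.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a front end that must know every content type up front.
class TypeAwareFrontEnd {
    // One hard-coded handler per file type; the output strings are illustrative only.
    private final Map<String, Function<String, String>> handlers = new LinkedHashMap<>();

    TypeAwareFrontEnd() {
        handlers.put("html", name -> "render page " + name);
        handlers.put("jpg", name -> "show image " + name);
        handlers.put("gif", name -> "show image " + name);
        handlers.put("mpg", name -> "play video " + name);
    }

    // Looks up the handler by file extension; unknown types cannot be served.
    String serve(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase();
        Function<String, String> handler = handlers.get(ext);
        if (handler == null) {
            throw new IllegalArgumentException("no handler for type: " + ext);
        }
        return handler.apply(fileName);
    }

    public static void main(String[] args) {
        TypeAwareFrontEnd frontEnd = new TypeAwareFrontEnd();
        System.out.println(frontEnd.serve("index.html"));
        System.out.println(frontEnd.serve("photo.JPG"));
    }
}
```

Every new content type means another entry in the handler table, which is the maintenance burden the slide points at.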

Traditional Data Storage Access

For tuples stored in a relational database, the system developer must know the table schema and encode it explicitly in a query:
  SELECT name FROM authors WHERE affiliation = 'Penn State';

For on-disk file access, a file path needs to be specified.

Problems

Lack of flexibility
Lack of extensibility
Lack of support for complex data structures
Lack of security control at the data level
Unnecessary duplicate work needs to be performed by different applications / parties

Key Research Questions

How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner?

How can complex objects be designed to be both generic and genre-specific at the same time?

How can we associate services and tools with objects to provide different presentations or transformations of the object content?

How can we associate specialized, fine-grained access control policies with specific objects, or with groups of objects?

How can we facilitate the long-term management and preservation of objects?

Look into Nature: We Have the Hints…

Data no longer exists in isolation in most current applications. It comes with:
  Metadata
  Structural information
  Legal methods that can be applied to the data
  Access control policies
  Links to other digital contents

These features can be grouped into an integral unit, which in turn simplifies the applications.

This is somewhat similar to a Java class.
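To make the Java-class analogy concrete, here is a minimal sketch of such an integral unit. The class name `BundledContent` and all of its members are hypothetical, and structural information is left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: one unit that carries the data plus what is needed to use it.
class BundledContent {
    private final byte[] data;
    private final Map<String, String> metadata = new HashMap<>();
    private final Map<String, Function<byte[], String>> methods = new HashMap<>();
    private final Set<String> allowedUsers = new HashSet<>();   // a very simple access policy
    private final List<String> links = new ArrayList<>();       // IDs of related contents

    BundledContent(byte[] data) {
        this.data = data.clone();
    }

    void putMetadata(String key, String value) { metadata.put(key, value); }
    String getMetadata(String key) { return metadata.get(key); }
    void addMethod(String name, Function<byte[], String> method) { methods.put(name, method); }
    void allow(String user) { allowedUsers.add(user); }
    void linkTo(String otherId) { links.add(otherId); }
    List<String> links() { return new ArrayList<>(links); }

    // Applies a registered method to the data, but only for users the policy permits.
    String invoke(String user, String methodName) {
        if (!allowedUsers.contains(user)) {
            throw new SecurityException("access denied for " + user);
        }
        Function<byte[], String> method = methods.get(methodName);
        if (method == null) {
            throw new IllegalArgumentException("no such method: " + methodName);
        }
        return method.apply(data);
    }
}
```

An application that receives such a unit can ask it what it is and what can be done with it, instead of hard-coding that knowledge.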

Solution

We propose a middleware that virtually represents each piece of digital content in a generic model. This middleware separates front-end applications from back-end storage and provides an abstraction to both sides.
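A minimal sketch of the separation this slide proposes, under stated assumptions: the names `BackEndStore`, `InMemoryStore`, and `StorageMiddleware`, and the `prefix:key` identifier format, are invented for illustration and are not the project's actual design.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical back-end abstraction: any storage that can return bytes for a key.
interface BackEndStore {
    Optional<byte[]> read(String key);
}

// In-memory stand-in for a real store (database, file system, remote server).
class InMemoryStore implements BackEndStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    void write(String key, byte[] value) {
        blobs.put(key, value.clone());
    }

    @Override
    public Optional<byte[]> read(String key) {
        return Optional.ofNullable(blobs.get(key));
    }
}

// Hypothetical middleware: front ends ask for an object ID and never see the store.
class StorageMiddleware {
    private final Map<String, BackEndStore> storeByPrefix = new HashMap<>();

    void mount(String prefix, BackEndStore store) {
        storeByPrefix.put(prefix, store);
    }

    // Resolves "prefix:key" to whichever back end is mounted under that prefix.
    byte[] fetch(String objectId) {
        int colon = objectId.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected prefix:key, got " + objectId);
        }
        BackEndStore store = storeByPrefix.get(objectId.substring(0, colon));
        if (store == null) {
            throw new IllegalArgumentException("no store mounted for " + objectId);
        }
        return store.read(objectId.substring(colon + 1))
                .orElseThrow(() -> new IllegalArgumentException("unknown object " + objectId));
    }
}
```

The front end only calls `fetch`; swapping a back end means mounting a different `BackEndStore`, with no change on the application side.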

Digital Object Model: Architectural View

[Figure: components of a digital object]
  Persistent ID (PID): digital object identifier
  Default Disseminator: service perspective; methods for disseminating "views" of content
  System Metadata: internal; key metadata necessary to manage the object
  Datastream (item), shown three times: item perspective; set of content or metadata items
  Your Extension, shown twice
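The parts named in the figure can be sketched as a small Java class. This is an assumption-laden illustration, not Fedora's actual classes: `DigitalObjectSketch`, the sample PID `demo:1`, and the idea that extensions are added as extra disseminators are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the object model; not Fedora's actual classes.
class DigitalObjectSketch {
    final String pid;                                                  // Persistent ID
    final Map<String, String> systemMetadata = new LinkedHashMap<>();  // internal bookkeeping
    final Map<String, byte[]> datastreams = new LinkedHashMap<>();     // items keyed by ID
    private final Map<String, Function<DigitalObjectSketch, String>> disseminators =
            new LinkedHashMap<>();

    DigitalObjectSketch(String pid) {
        this.pid = pid;
        // Default disseminator: a built-in view that lists the object's items.
        disseminators.put("default", obj -> obj.pid + " " + obj.datastreams.keySet());
    }

    void addDatastream(String id, byte[] content) {
        datastreams.put(id, content.clone());
    }

    // Assumed extension point: register an additional "view" of the content.
    void addDisseminator(String name, Function<DigitalObjectSketch, String> view) {
        disseminators.put(name, view);
    }

    // Produces the named view, or fails if no such disseminator is registered.
    String disseminate(String name) {
        Function<DigitalObjectSketch, String> view = disseminators.get(name);
        if (view == null) {
            throw new IllegalArgumentException("no disseminator named " + name);
        }
        return view.apply(this);
    }
}
```

A client that knows only the PID and a disseminator name can obtain a view without knowing how the datastreams are stored.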

A Well-Known Digital Object Management System: Fedora

[Figure: Fedora Service Framework. Labels recoverable from the diagram:]
  Clients: Client App, Batch Program, Server App, Web Browser
  Protocols: HTTP, SOAP
  Web Service Exposure Layer: Manage, Access, Search, OAI Provider
  Subsystems: Management Subsystem, Access Subsystem, Security Subsystem, Storage Subsystem
  Modules: Policy Enforcement, Policy Mgmt, Content, Object Mgmt, Object Validation, PID Generation, Dissemination, Object Reflection, Search
  Security data: User Authentication, Policies, Users/Groups
  Storage: Digital Objects, Datastreams, XML, RDBMS
  Services and external content: Remote Service, Local Service, External Content Source, External Content Retriever

My Project Work

Bring new features into the previous framework.

Fedora does not provide powerful indexing and query capabilities.
  Full-text indexing based on Lucene.
  Dynamic field indexing.
  Dynamically built in-memory indexes to improve query performance.
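The idea of dynamic fields combined with an in-memory index can be illustrated with a tiny inverted index. This is a plain-Java stand-in written for this example; it is not the Lucene API and not the project's implementation, and the class name `TinyFieldIndex` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java stand-in for a full-text index; this is not the Lucene API.
class TinyFieldIndex {
    // field name -> term -> IDs of the objects whose field contains that term
    private final Map<String, Map<String, Set<String>>> postings = new HashMap<>();

    // Fields are created on first use, so indexing a new field needs no schema change.
    void index(String objectId, String field, String text) {
        Map<String, Set<String>> terms = postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                terms.computeIfAbsent(token, t -> new TreeSet<>()).add(objectId);
            }
        }
    }

    // Returns the sorted IDs of objects whose field contains the term (case-insensitive).
    Set<String> search(String field, String term) {
        Set<String> result = new TreeSet<>();
        Map<String, Set<String>> terms = postings.get(field);
        if (terms == null) {
            return result;
        }
        Set<String> hits = terms.get(term.toLowerCase());
        if (hits != null) {
            result.addAll(hits);
        }
        return result;
    }
}
```

Because the whole structure lives in memory, a lookup is two hash-map reads rather than a scan over stored objects, which is the kind of gain the last bullet refers to.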