digital object: a virtual online storage solution 598c course project huajing li
TRANSCRIPT
Digital Object: A Virtual Online Storage Solution
598C Course Project
Huajing Li
Digital Contents: not just documents…
- Some conventional objects
- Complex, compound, dynamic objects
Traditional Online Access Method
- The web server and application must be aware of the types and components of the managed digital contents:
  - HTML documents…
  - JPG/GIF photos…
  - Video clips…
  - Streaming media…
- The system developer and interface designer must provide an appropriate handler for each of these file types.
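A minimal sketch of the per-type handling described above, assuming a simple dispatch on file extension. The handler functions, the `HANDLERS` table, and `serve` are hypothetical names chosen for illustration; they are not taken from the slides.

```python
import os


# Hypothetical handlers; a real application would render or stream content.
def handle_html(path):
    return f"render HTML page {path}"


def handle_image(path):
    return f"show image {path}"


def handle_video(path):
    return f"play video {path}"


# The application must list every supported type explicitly.
HANDLERS = {
    ".html": handle_html,
    ".jpg": handle_image,
    ".gif": handle_image,
    ".mpg": handle_video,
}


def serve(path):
    """Pick a handler by file extension; unknown types cannot be served."""
    extension = os.path.splitext(path)[1].lower()
    handler = HANDLERS.get(extension)
    if handler is None:
        raise ValueError(f"no handler registered for {extension!r}")
    return handler(path)
```

Any content type missing from the table fails until a developer adds a handler for it, which is the coupling this slide points at.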
Traditional Data Storage Access
- For tuples stored in a relational database, the system developer must know the table schema and encode it explicitly in a query:
  SELECT name FROM authors WHERE affiliation = 'Penn State';
- For on-disk file access, a file path needs to be specified.
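The schema coupling can be made concrete with a small sketch. It assumes an in-memory SQLite database and a two-column `authors` table modeled on the query in the slide; the sample rows and the `authors_at` helper are hypothetical.

```python
import sqlite3

# Hypothetical in-memory database mirroring the slide's "authors" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (name TEXT, affiliation TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [("Alice", "Penn State"), ("Bob", "Elsewhere University")],
)


def authors_at(conn, affiliation):
    # The table name and both column names are hard-coded here: the
    # application cannot issue this query without knowing the schema.
    rows = conn.execute(
        "SELECT name FROM authors WHERE affiliation = ?", (affiliation,)
    )
    return [name for (name,) in rows]


penn_state_authors = authors_at(conn, "Penn State")
```

If the table or a column were renamed, every application embedding this query would have to change with it.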
Problems
- Lack of flexibility
- Lack of extensibility
- Lack of support for complex data structures
- Lack of security control at the data level
- Unnecessary duplicate work needs to be performed by different applications / parties
Key Research Questions
- How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner?
- How can complex objects be designed to be both generic and genre-specific at the same time?
- How can we associate services and tools with objects to provide different presentations or transformations of the object content?
- How can we associate specialized, fine-grained access control policies with specific objects, or with groups of objects?
- How can we facilitate the long-term management and preservation of objects?
Look into the Nature, We Have the Hints…
- In most current applications, data no longer exists in isolation. It comes with:
  - Metadata
  - Structural information
  - Legal methods that can be applied to the data
  - Access control policies
  - Links to other digital contents
- These features can be grouped into an integral unit, which in turn simplifies the applications.
- This is somewhat similar to a Java class.
Solution
- We propose a middleware that virtually represents each piece of digital content in a generic model. This middleware separates front-end applications from back-end storage and provides an abstraction to both sides.
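One way to picture this separation is the sketch below. It assumes a back-end interface with only `put` and `get`; `StorageBackend`, `InMemoryBackend`, and `Middleware` are hypothetical names for illustration, not the project's actual classes.

```python
from typing import Dict, Protocol


class StorageBackend(Protocol):
    """Hypothetical back-end interface: store and fetch bytes by key."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Stand-in for a database or file system back end."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


class Middleware:
    """Front ends talk to this layer by object identifier only."""

    def __init__(self) -> None:
        self._backends: Dict[str, StorageBackend] = {}
        self._location: Dict[str, str] = {}  # object id -> back-end name

    def register_backend(self, name: str, backend: StorageBackend) -> None:
        self._backends[name] = backend

    def store(self, object_id: str, data: bytes, backend: str) -> None:
        self._backends[backend].put(object_id, data)
        self._location[object_id] = backend

    def fetch(self, object_id: str) -> bytes:
        # The caller never learns which back end holds the content.
        return self._backends[self._location[object_id]].get(object_id)


middleware = Middleware()
middleware.register_backend("db", InMemoryBackend())
middleware.register_backend("files", InMemoryBackend())
middleware.store("demo:1", b"<html>report</html>", backend="db")
middleware.store("demo:2", b"\x89PNG...", backend="files")
```

A front end calls `middleware.fetch("demo:1")` without a query or a file path, and a back end can be swapped without touching the front end.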
Digital Object Model: Architectural View
- Persistent ID (PID): digital object identifier
- Disseminators (service perspective: methods for disseminating "views" of content)
  - Default Disseminator
  - Your Extension
  - Your Extension
- System Metadata (internal: key metadata necessary to manage the object)
- Datastreams (item perspective: set of content or metadata items)
  - Datastream (item)
  - Datastream (item)
  - Datastream (item)
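The parts of the model can be sketched as a small data structure. This is an illustrative reading of the diagram, assuming disseminators are plain callables stored by name; the class and function names are hypothetical and this is not Fedora's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Datastream:
    """One content or metadata item (item perspective)."""

    mime_type: str
    content: bytes


@dataclass
class DigitalObject:
    pid: str  # persistent identifier
    system_metadata: Dict[str, str] = field(default_factory=dict)
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    # Service perspective: named methods that produce "views" of the content.
    disseminators: Dict[str, Callable[["DigitalObject"], str]] = field(
        default_factory=dict
    )

    def disseminate(self, name: str = "default") -> str:
        return self.disseminators[name](self)


def list_items(obj: DigitalObject) -> str:
    """Default disseminator in this sketch: list the datastream ids."""
    return ", ".join(sorted(obj.datastreams))


def title_in_caps(obj: DigitalObject) -> str:
    """A custom extension: render the title datastream in upper case."""
    return obj.datastreams["title"].content.decode("utf-8").upper()


obj = DigitalObject(pid="demo:1", system_metadata={"state": "active"})
obj.datastreams["title"] = Datastream("text/plain", b"Digital Object")
obj.datastreams["image"] = Datastream("image/jpeg", b"...")
obj.disseminators["default"] = list_items
obj.disseminators["caps"] = title_in_caps
```

Clients ask the object for a view by name (`obj.disseminate("caps")`) instead of handling each content type themselves.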
A Well-Known Digital Object Management System: Fedora
[Figure: Fedora architecture diagram. Recoverable labels: clients (Client App, Batch Program, Server App, Web Browser) communicating over HTTP and SOAP; Web Service Exposure Layer (Manage, Access, Search, OAI Provider); Management Subsystem; Access Subsystem; Security Subsystem; Storage Subsystem; component labels Object Mgmt, Object Validation, PID Generation, Policy Mgmt, Policy Enforcement, User Authentication, Dissemination, Object Reflection, Search, Content; stores Digital Objects, Datastreams, Policies, Users/Groups (RDBMS, XML); Local Service, Remote Service; External Content Retriever and External Content Source (HTTP).]
My Project Work
- Bring new features into the previous framework.
- Fedora does not provide powerful indexing and query capabilities.
  - Full-text indexing based on Lucene.
  - Dynamic field indexing.
  - Dynamically build in-memory indexes to improve query performance.
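To illustrate the idea of an in-memory index over dynamically named fields, here is a toy inverted index in plain Python. It is a stand-in under stated assumptions, not Lucene and not the project's implementation; `InMemoryIndex` and its methods are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, Set


class InMemoryIndex:
    """Toy inverted index: (field, term) -> set of object ids."""

    def __init__(self) -> None:
        self._postings: Dict[tuple, Set[str]] = defaultdict(set)

    def add(self, object_id: str, fields: Dict[str, str]) -> None:
        # Fields are not declared up front; any field name is indexed as it
        # arrives, which is one reading of "dynamic field indexing".
        for field_name, text in fields.items():
            for term in re.findall(r"\w+", text.lower()):
                self._postings[(field_name, term)].add(object_id)

    def search(self, field_name: str, query: str) -> Set[str]:
        """Return ids whose field contains every term in the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        hits = [self._postings.get((field_name, t), set()) for t in terms]
        return set.intersection(*hits)


index = InMemoryIndex()
index.add("demo:1", {"title": "Digital Object Storage", "author": "Li"})
index.add("demo:2", {"title": "Online Storage Solution", "venue": "598C"})
```

A query such as `index.search("title", "digital storage")` is answered from memory by intersecting posting sets, without scanning the stored objects.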