MARIAN: Searching and Querying Across Heterogeneous Federated Digital Libraries
Marcos André Gonçalves, Robert K. France, Edward A. Fox, Tamas E. Doszkocs
Work performed at Virginia Tech, Blacksburg, VA, USA.
Support provided in part by NSF and the National Library of Medicine.
-
JCDL 2001: First Joint ACM/IEEE Conference on Digital Libraries (+ NSF DLI-2 PI meeting)
http://www.jcdl.org
June 24-28, 2001, in Roanoke, VA
Conference Committee:
General Chair: Edward A. Fox, Virginia Tech
Program Chair: Christine Borgman, UCLA
Treasurer: Neil Rowe, Naval Postgraduate School
Posters Chair: Craig Nevill-Manning, Rutgers U.
-
Outline
NDLTD
Harvesting Strategies and the OAI
MARIAN Middleware
Generating Digital Libraries with 5SL
Future Directions
-
NDLTD (1 of 3)
Context: Networked Digital Library of Theses and Dissertations (www.ndltd.org, www.theses.org)
Please join! Submit your (students') works!
International federation of universities, libraries, and supporting institutions (e.g., the VTLS union catalog)
Extremely heterogeneous:
  Autonomy of management and decentralization
  Disparate protocols, metadata, repositories (e.g., UMI, OCLC's WorldCat), languages, encodings, and user characteristics and preferences
-
NDLTD (2 of 3)
Worldwide organization: educational/social context
National/regional projects in Australia, Catalunya, Germany, India, Latin America (UNESCO/OAS/ISTEC), South Africa (Mellon), and the USA (including OhioLINK)
International conference (225 in March 2000; more expected at the next one, at Caltech)
Steering committee representing supporting groups as well as the hundreds of universities
-
NDLTD (3 of 3)
Unique collection: discipline/document context
Multilingual and multimedia content
Large, book-size documents
Full content in several formats (XML, PDF, etc.)
Large number of bibliographic references
Several sets of metadata of varying quality that can fit with the Open Archives Initiative (www.openarchives.org)
-
Harvesting Strategies
Harvesting vs. Federated Search
Harvesting plus Federated Search
Plus local collections: the NDLTD Union Collection
Multiple harvesting protocols: Harvest System, Z39.50, Dienst, OAI
-
Union Collection Architecture
[Figure: NDLTD/NUDL/digital library users send queries to, and receive results from, the MARIAN Mediation Middleware. Wrappers connect MARIAN to remote collections (German PhysDis, VT OAI, MIT ETD, Greek Hellenic Dissertations, and others) over the Harvest protocol (SOIF), the Open Archives protocol (Dublin Core), the Dienst protocol (RFC 1807), and Z39.50 (MARC). Other components: a 5SL source description, a wrapper generator, and a local data store supporting search services, recommendation services, analysis, indexing, and linking.]
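Each remote collection reaches the MARIAN mediation middleware through a wrapper that hides its protocol and metadata format. The Java sketch below shows only the metadata half of that job, assuming a flat record of field names to values and a per-source mapping table; the `SourceWrapper` class, the collection name, and the MARC tag mapping are hypothetical illustrations, not MARIAN's actual wrapper code. Real MARC fields carry indicators and subfields, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical wrapper: translates a source's native field names into the
 * common field names a mediator can query. Not MARIAN's actual wrapper code.
 */
class SourceWrapper {
    private final String sourceName;
    private final Map<String, String> fieldMap; // native field -> common field

    SourceWrapper(String sourceName, Map<String, String> fieldMap) {
        this.sourceName = sourceName;
        this.fieldMap = fieldMap;
    }

    /** Translate one native record; fields with no mapping are dropped. */
    Map<String, List<String>> wrap(Map<String, List<String>> nativeRecord) {
        Map<String, List<String>> common = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : nativeRecord.entrySet()) {
            String target = fieldMap.get(e.getKey());
            if (target != null) {
                common.computeIfAbsent(target, k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        // Record provenance so results can be traced back to their collection.
        common.put("source", List.of(sourceName));
        return common;
    }
}

class WrapperDemo {
    public static void main(String[] args) {
        // Illustrative MARC mapping: tag 245 = title statement, 100 = personal-name main entry.
        SourceWrapper marc = new SourceWrapper("HellenicDissertations",
                Map.of("245", "title", "100", "creator"));
        Map<String, List<String>> record = Map.of(
                "245", List.of("A thesis title"),
                "100", List.of("Author, A."),
                "999", List.of("local data"));
        System.out.println(marc.wrap(record));
    }
}
```

Keeping the mapping in data rather than code is one way a single wrapper class could serve several sources that share a protocol but differ in field usage.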
-
Open Archives Initiative (OAI)
Interoperability standards released in Jan/Feb
Data providers and service providers
Metadata Harvesting Protocol:
  Unique identifiers (URNs) for each record
  Datestamp for each record, recording when it was last modified, created, or deleted
  HTTP server with scripting capabilities
  6 service requests (verbs): Identify, ListMetadataFormats, ListSets, ListIdentifiers, GetRecord, ListRecords
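OAI requests are plain HTTP requests to a data provider's base URL, with a `verb` parameter naming one of the six service requests and any further arguments as key/value pairs. A minimal Java sketch of building such a request, assuming a placeholder base URL (`http://example.org/oai`); `OaiRequest` is a hypothetical helper, not part of any OAI toolkit, and it does not check which arguments each verb requires.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

/** Hypothetical helper that formats OAI requests as HTTP GET URLs. */
class OaiRequest {
    // The six service requests (verbs) listed on the slide.
    static final Set<String> VERBS = Set.of("Identify", "ListMetadataFormats", "ListSets",
            "ListIdentifiers", "GetRecord", "ListRecords");

    /** Build baseUrl?verb=...&key=value..., URL-encoding each argument value. */
    static String build(String baseUrl, String verb, Map<String, String> args) {
        if (!VERBS.contains(verb)) {
            throw new IllegalArgumentException("unknown OAI verb: " + verb);
        }
        StringJoiner query = new StringJoiner("&", baseUrl + "?", "");
        query.add("verb=" + verb);
        for (Map.Entry<String, String> arg : args.entrySet()) {
            query.add(arg.getKey() + "=" + URLEncoder.encode(arg.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> arguments = new LinkedHashMap<>(); // keeps parameter order
        arguments.put("metadataPrefix", "oai_dc");
        arguments.put("from", "2001-01-01");
        // http://example.org/oai is a placeholder base URL, not a real data provider.
        System.out.println(build("http://example.org/oai", "ListRecords", arguments));
        // prints http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2001-01-01
    }
}
```

Rejecting unknown verbs up front catches misspellings such as `ListMetaFormats` before any request is sent.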
-
[Figure: OAI as a low-barrier interoperability umbrella over metadata. Credit: Herbert Van de Sompel]
-
OAI harvesting tools (credit: Herbert Van de Sompel)
[Figure: a service provider's harvester collects records from a data provider's repository; labels: identifier, datestamp, set, records.]
-
OAI harvesting tools (credit: Herbert Van de Sompel)
Service provider (harvester) and data provider (repository)
Supporting protocol requests: Identify, ListMetadataFormats, ListSets
Harvesting protocol requests: ListRecords, ListIdentifiers, GetRecord
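The per-record datestamp is what makes incremental harvesting possible: a harvester remembers when it last ran and asks the data provider only for records stamped since then, optionally restricted to a set. The sketch below models only the data-provider side of `ListIdentifiers` over an in-memory list, assuming day-granularity dates and inclusive `from`/`until` bounds; the `SelectiveHarvest` class and the identifiers are hypothetical, and a real provider would answer over HTTP with XML.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of datestamp-based selective harvesting. */
class SelectiveHarvest {
    /** Minimal stand-in for an OAI record header. */
    static final class Header {
        final String identifier;
        final LocalDate datestamp; // when the record was last created, modified, or deleted
        final String setSpec;      // set membership; null if the record is in no set

        Header(String identifier, LocalDate datestamp, String setSpec) {
            this.identifier = identifier;
            this.datestamp = datestamp;
            this.setSpec = setSpec;
        }
    }

    /**
     * Data-provider side of ListIdentifiers: identifiers whose datestamp lies in the
     * inclusive [from, until] range (either bound may be null) and, if a set is
     * given, that belong to it.
     */
    static List<String> listIdentifiers(List<Header> repository, LocalDate from,
                                        LocalDate until, String set) {
        List<String> result = new ArrayList<>();
        for (Header h : repository) {
            boolean afterFrom = from == null || !h.datestamp.isBefore(from);
            boolean beforeUntil = until == null || !h.datestamp.isAfter(until);
            boolean inSet = set == null || set.equals(h.setSpec);
            if (afterFrom && beforeUntil && inSet) {
                result.add(h.identifier);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Header> repo = List.of(
                new Header("oai:vt:etd-1", LocalDate.of(2001, 1, 10), "vt"),
                new Header("oai:vt:etd-2", LocalDate.of(2001, 2, 1), "vt"),
                new Header("oai:mit:etd-3", LocalDate.of(2001, 3, 5), "mit"));
        // A harvester that last ran on 2001-02-01 asks only for what changed since then.
        LocalDate lastHarvest = LocalDate.of(2001, 2, 1);
        System.out.println(listIdentifiers(repo, lastHarvest, null, null));
        // prints [oai:vt:etd-2, oai:mit:etd-3]
    }
}
```

With inclusive bounds, records stamped on the day of the previous harvest are fetched again; that small overlap is the price of never missing an update.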
-
Design Features
Combined Harvesting, Federated Search, and Local Collections
Object-Oriented Information Graph Representation
5S Model and 5SL Specification Language
-
MARIAN Middleware
Flexible representation model:
  Information graph
  Class hierarchies
  Weights and weighted sets (with lazy evaluation)
Class-based search:
  Unified Searcher API
Combining heterogeneous information:
  Structural matching
  Synthetic superclasses
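The slide lists weighted sets with lazy evaluation but does not show how that works. One plausible reading, sketched below in Java under that assumption: a weighted set holds a recipe for its contents, computes them only when a caller actually enumerates results, and caches them afterwards, so derived sets (here, weights scaled by a link weight) cost nothing until used. `LazyWeightedSet` and its `scale`/`top` methods are hypothetical names, not MARIAN's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical lazily evaluated weighted set (id -> weight). Contents are computed
 * on first use and cached; derived sets defer their work until enumerated.
 */
class LazyWeightedSet {
    private final Supplier<Map<String, Double>> producer;
    private Map<String, Double> cache; // null until the set is first needed

    LazyWeightedSet(Supplier<Map<String, Double>> producer) {
        this.producer = producer;
    }

    /** Evaluate at most once, then reuse the cached result. */
    Map<String, Double> weights() {
        if (cache == null) {
            cache = producer.get();
        }
        return cache;
    }

    /** Derive a set with every weight multiplied by a factor; evaluates nothing yet. */
    LazyWeightedSet scale(double factor) {
        return new LazyWeightedSet(() -> {
            Map<String, Double> scaled = new HashMap<>();
            weights().forEach((id, w) -> scaled.put(id, w * factor));
            return scaled;
        });
    }

    /** The k highest-weighted ids, best first; this is where evaluation happens. */
    List<String> top(int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(weights().entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            ids.add(entries.get(i).getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] evaluations = {0};
        LazyWeightedSet textMatches = new LazyWeightedSet(() -> {
            evaluations[0]++; // count how often the (possibly expensive) search really runs
            return Map.of("d1", 0.9, "d2", 0.5, "d3", 0.7);
        });
        LazyWeightedSet viaLink = textMatches.scale(0.8); // nothing evaluated yet
        System.out.println(evaluations[0]); // prints 0
        System.out.println(viaLink.top(2)); // prints [d1, d3]
        System.out.println(evaluations[0]); // prints 1
    }
}
```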
-
Information Graph Model (1/2)
Each information object is a node.
Structure is exposed through links.
Features of interest can become nodes, or can remain hidden within node class search methods.
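A small Java sketch of the information graph idea, assuming string identifiers and labeled links: an author becomes a node of its own (class `Individual`, reached through a `HasAuthor` link; both names are borrowed from the NDLTD collection view), while an abstract stays hidden inside the thesis node for that node class's search methods to handle. `InfoGraph` and its methods are hypothetical, not MARIAN's actual representation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical information graph: objects are nodes, structure is exposed as labeled links. */
class InfoGraph {
    static final class Node {
        final String id;
        final String nodeClass; // class whose search methods handle this node
        final String content;   // features left hidden inside the node (e.g. abstract text)

        Node(String id, String nodeClass, String content) {
            this.id = id;
            this.nodeClass = nodeClass;
            this.content = content;
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();
    // link label -> (source node id -> target node ids)
    private final Map<String, Map<String, List<String>>> links = new HashMap<>();

    void addNode(String id, String nodeClass, String content) {
        nodes.put(id, new Node(id, nodeClass, content));
    }

    void link(String fromId, String label, String toId) {
        links.computeIfAbsent(label, k -> new HashMap<>())
             .computeIfAbsent(fromId, k -> new ArrayList<>())
             .add(toId);
    }

    /** Follow every link with the given label out of a node. */
    List<Node> follow(String fromId, String label) {
        List<Node> result = new ArrayList<>();
        for (String id : links.getOrDefault(label, Map.of()).getOrDefault(fromId, List.of())) {
            result.add(nodes.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        InfoGraph g = new InfoGraph();
        g.addNode("etd-1", "Thesis", "Abstract text stays inside the thesis node.");
        g.addNode("person-1", "Individual", "Doe, J.");
        g.link("etd-1", "HasAuthor", "person-1");
        for (Node author : g.follow("etd-1", "HasAuthor")) {
            System.out.println(author.nodeClass + ": " + author.content); // prints Individual: Doe, J.
        }
    }
}
```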
-
Information Graph Model (2/2)
-
Class-Based Search
Common search methods: Text; Link / Weighted Link; Node in Context
Common searcher operations: Match Best (weighted maximum); Match Most (summative union)
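Match Best and Match Most differ only in how weights from several evidence sets are combined. The Java sketch below assumes weighted sets are maps from object id to weight and reads "weighted maximum" as keeping the largest weight per object and "summative union" as adding weights; the slide does not say whether MARIAN normalizes the sums, so this sketch does not. `Combine` is a hypothetical helper, and its `TreeMap` results are sorted by id only so the printed output is deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the two searcher operations, over id -> weight maps. */
class Combine {
    /** Match Best (weighted maximum): each object keeps its highest weight across sets. */
    static Map<String, Double> matchBest(List<Map<String, Double>> sets) {
        Map<String, Double> out = new TreeMap<>();
        for (Map<String, Double> set : sets) {
            set.forEach((id, w) -> out.merge(id, w, Math::max));
        }
        return out;
    }

    /** Match Most (summative union): weights add, so evidence from more sets scores higher. */
    static Map<String, Double> matchMost(List<Map<String, Double>> sets) {
        Map<String, Double> out = new TreeMap<>();
        for (Map<String, Double> set : sets) {
            set.forEach((id, w) -> out.merge(id, w, Double::sum));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> titleHits = Map.of("d1", 0.75, "d2", 0.25);
        Map<String, Double> abstractHits = Map.of("d2", 0.5, "d3", 0.5);
        List<Map<String, Double>> evidence = List.of(titleHits, abstractHits);
        System.out.println(matchBest(evidence)); // prints {d1=0.75, d2=0.5, d3=0.5}
        System.out.println(matchMost(evidence)); // prints {d1=0.75, d2=0.75, d3=0.5}
    }
}
```

Match Best suits alternative routes to the same evidence (any one good match suffices); Match Most rewards objects that satisfy several parts of a query at once.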
-
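Class-Based Search
import java.util.Vector;

// WtdObjSet, InfoDesc, and FullID are MARIAN classes not shown on this slide.
public interface ClassManager {
    // Return the members of this class that match the description, as a weighted set.
    public WtdObjSet match(InfoDesc description);

    // Test whether the object with this identifier belongs to the class.
    public boolean isInClass(FullID id);

    // Resolve one identifier, or a vector of identifiers, to the objects themselves.
    public Object idToObject(FullID id);
    public Vector idsToObjects(Vector ids);
}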
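The `ClassManager` interface depends on MARIAN types (`WtdObjSet`, `InfoDesc`, `FullID`) whose definitions are not shown. To illustrate how one class might implement it, the sketch below swaps in hypothetical simplifications: string ids and descriptions, an id-to-weight map for the weighted set, and `List` in place of `Vector`. `SimpleClassManager` and `TextClassManager` are illustrative names, and the term-overlap weighting is an assumption, not MARIAN's text-matching method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for ClassManager: String replaces FullID and
 * InfoDesc, and Map<String, Double> (id -> weight) replaces WtdObjSet.
 */
interface SimpleClassManager {
    Map<String, Double> match(String description);
    boolean isInClass(String id);
    Object idToObject(String id);
    List<Object> idsToObjects(List<String> ids);
}

/** Hypothetical manager for a class of short text objects, such as keyword phrases. */
class TextClassManager implements SimpleClassManager {
    private final Map<String, String> objects = new HashMap<>(); // id -> text

    void add(String id, String text) {
        objects.put(id, text);
    }

    /** Weight = fraction of the description's terms that occur in the object's text. */
    @Override
    public Map<String, Double> match(String description) {
        String[] terms = description.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, String> e : objects.entrySet()) {
            List<String> words = List.of(e.getValue().toLowerCase(Locale.ROOT).split("\\s+"));
            int hits = 0;
            for (String term : terms) {
                if (words.contains(term)) {
                    hits++;
                }
            }
            if (hits > 0) {
                result.put(e.getKey(), (double) hits / terms.length);
            }
        }
        return result;
    }

    @Override
    public boolean isInClass(String id) {
        return objects.containsKey(id);
    }

    @Override
    public Object idToObject(String id) {
        return objects.get(id);
    }

    @Override
    public List<Object> idsToObjects(List<String> ids) {
        List<Object> result = new ArrayList<>();
        for (String id : ids) {
            result.add(idToObject(id));
        }
        return result;
    }

    public static void main(String[] args) {
        TextClassManager keywords = new TextClassManager();
        keywords.add("k1", "digital libraries");
        keywords.add("k2", "federated search");
        keywords.add("k3", "digital search");
        System.out.println(keywords.match("digital search").get("k3")); // prints 1.0
    }
}
```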
-
Class-Based Search
-
Combining Sources of Information
Structural matching:
  Extends weighted retrieval to include best match to document structure
  Recursive, extensible
Collection views:
  Simple interface to complex collections
  Common interface to diverse collections
  Weighted interface to collections of varying quality
-
NDLTD Collection View (part)
[Figure: part of the collection view. A Thesis/Dissertation class (title, description, body) links to Individual through HasAuthor and to Subject through HasSubject. Source-specific attributes include dc.title, dc.description, dc.creator (HasDcCreator), dc.subject (HasDcSubject), crawlerTitle, crawlerDescription, HasCrawlerAuthor, Headings (HasHeadings), and Keywords (HasKeywords), from sources including PhysDis-ETD (SOIF). SubClasses links carry weights between 0.8 and 1.0.]
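Collection views let one synthetic attribute, such as a title, stand for several source fields of different quality. A Java sketch under stated assumptions: each source field gets a mapping weight (1.0 for Dublin Core titles and 0.8 for crawler-extracted titles, values chosen to echo the 0.8 to 1.0 range in the figure, not read from it), and a record's score is the best weighted match among its mapped fields, in the spirit of Match Best. `CollectionView` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical collection view: a synthetic attribute (e.g. "title") is mapped onto
 * source-specific fields, each with a weight reflecting how much that field is trusted.
 */
class CollectionView {
    // synthetic attribute -> (source field -> mapping weight)
    private final Map<String, Map<String, Double>> mappings = new LinkedHashMap<>();

    void map(String attribute, String sourceField, double weight) {
        mappings.computeIfAbsent(attribute, k -> new LinkedHashMap<>()).put(sourceField, weight);
    }

    /**
     * Score a record for a one-term query on a synthetic attribute: the highest
     * mapping weight among fields whose value contains the term (0.0 if none).
     */
    double score(String attribute, String term, Map<String, String> record) {
        String needle = term.toLowerCase(Locale.ROOT);
        double best = 0.0;
        for (Map.Entry<String, Double> m : mappings.getOrDefault(attribute, Map.of()).entrySet()) {
            String value = record.get(m.getKey());
            if (value != null && value.toLowerCase(Locale.ROOT).contains(needle)) {
                best = Math.max(best, m.getValue());
            }
        }
        return best;
    }

    public static void main(String[] args) {
        CollectionView view = new CollectionView();
        // Illustrative weights: Dublin Core titles trusted more than crawler-extracted ones.
        view.map("title", "dc.title", 1.0);
        view.map("title", "crawlerTitle", 0.8);
        Map<String, String> oaiRecord = Map.of("dc.title", "Digital Library Interoperability");
        Map<String, String> crawledRecord = Map.of("crawlerTitle", "Federated digital libraries");
        System.out.println(view.score("title", "digital", oaiRecord));     // prints 1.0
        System.out.println(view.score("title", "digital", crawledRecord)); // prints 0.8
    }
}
```

A user queries "title" once; the view decides which fields to consult in each source and how far to trust each of them.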
-
5S Model for Digital Libraries (1/2)
Formal model: Streams, Structures, Spaces, Services, Societies
-
5S Model for Digital Libraries (2/2)
Formal model and NDLTD / MARIAN example:
Streams: Document (presentable, indexable information object)
Structures: Weighted Set (e.g., of results to a match operation)
Spaces: Collection Graph; Inheritance Lattice; Measure Space
Services: Adaptive Search; Query History Maintenance
Societies: Library End-Users; DL Builders
-
5SL
Generates Digital Library (Components)
-
Generating Digital Libraries: XML
-
Interoperability with 5S and 5SL
Reductionist / Constructivist Approach
Compositional mappings between DLs
Composition of S-based constructs
Mapping language
-
Student Projects to Integrate
Schedule-driven harvester
SDI / filtering for NDLTD
MARIAN-Phronesis (Spanish; Monterrey); and work with German (Oldenburg / DFG), Portuguese, Chinese, Japanese, and Korean
TREC data formatted for loading
-
Future Work
Fusion on hybrid architecture
Incorporation of belief networks
Using 5SL to generate wrappers
New services/functionalities:
  Personalization (e.g., history, folders)
  Visualization (e.g., Envision applet)
Integration with PetaPlex (100 nodes, 2.5 TB disk capacity, > 300 Mbps to campus backbone, Sornil inversion)
-
Conclusions
NDLTD provides a real, fertile DL testbed.
Harvesting strategies and the OAI
MARIAN middleware: graphs, classes, views
Generating digital libraries with 5SL
Future: high-performance services, experimental comparisons