MARIAN: Searching and Querying Across Heterogeneous Federated Digital Libraries

Marcos André Gonçalves, Robert K. France, Edward A. Fox, Tamas E. Doszkocs


  • MARIAN: Searching and Querying Across Heterogeneous Federated Digital Libraries
    Marcos André Gonçalves, Robert K. France, Edward A. Fox, Tamas E. Doszkocs

    Work performed at Virginia Tech, Blacksburg, VA, USA. Support provided in part by NSF and the National Library of Medicine.

  • JCDL 2001
    First Joint ACM/IEEE Conference on Digital Libraries (+ NSF DLI-2 PI mtg)
    http://www.jcdl.org
    June 24-28, 2001 in Roanoke, VA
    Conference Committee:
    - General Chair: Edward A. Fox, Virginia Tech
    - Program Chair: Christine Borgman, UCLA
    - Treasurer: Neil Rowe, Naval Postgraduate School
    - Posters Chair: Craig Nevill-Manning, Rutgers U.

  • Outline
    - NDLTD
    - Harvesting Strategies and the OAI
    - MARIAN Middleware
    - Generating Digital Libraries with 5SL
    - Future Directions

  • NDLTD (1 of 3)
    Context: Networked Digital Library of Theses and Dissertations, www.ndltd.org, www.theses.org
    Please join! Submit your (students') works!
    - International federation of universities, libraries, supporting institutions (e.g., VTLS union catalog)
    - Extremely heterogeneous
    - Autonomy of management and decentralization
    - Disparate protocols, metadata, repositories (e.g., UMI, OCLC's WorldCat), language, encodings, user characteristics and preferences

  • NDLTD (2 of 3)
    Worldwide organization: educational/social context
    - National/regional projects in Australia, Catalunya, Germany, India, Latin America (UNESCO/OAS/ISTEC), South Africa (Mellon), USA (including OhioLINK)
    - International conference (225 in March 2000, more expected for next, at Caltech)
    - Steering committee representing supporting groups as well as the hundreds of universities

  • NDLTD (3 of 3)
    Unique collection: discipline/document context
    - Multilingual and multimedia content
    - Large book-size documents
    - Full content in several formats (XML, PDF, etc.)
    - Large number of bibliographic references
    - Several sets of metadata with different ranges of quality, which can fit with the Open Archives Initiative (www.openarchives.org)

  • Harvesting Strategies
    - Harvesting vs. Federated Search
    - Harvesting plus Federated Search
    - Plus local collections: the NDLTD Union Collection
    - Multiple harvesting protocols: Harvest System, Z39.50, Dienst, OAI

  • Union Collection Architecture

    [Figure: architecture diagram. Labels recovered from the slide: source collections (German PhysDis Collection, VT OAI Collection, MIT ETD Collection, Greek Hellenic Dissertations Collection, ...); one wrapper per source; protocols (Harvest, Open Archives, Dienst, Z39.50); metadata formats (SOIF, Dublin Core, RFC 1807, MARC); Wrapper Generator fed by a 5SL Source Description; MARIAN Mediation Middleware; Local Data Store; Search Services, Recommendation Services, etc.; Analysis, Indexing, Linking; NDLTD/NUDL/Digital Library User exchanging Queries + Results.]

  • Open Archives Initiative (OAI)
    - Interoperability standards: released Jan/Feb
    - Data + service providers
    - Metadata Harvesting Protocol
    - Unique identifiers (URNs) for each record
    - Date-stamp for each record when last modified/created/deleted
    - HTTP server with scripting capabilities
    - 6 service requests (verbs): Identify, ListMetadataFormats, ListSets, ListIdentifiers, GetRecord, ListRecords
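Because each of the six verbs is an ordinary HTTP request against the data provider's script, a harvester mostly needs to assemble query strings. The following is a minimal sketch of that step, not code from MARIAN: the class name `OaiRequest`, the base URL `http://example.org/oai`, and the sample argument values are hypothetical, while the argument names `from` and `metadataPrefix` follow the OAI protocol.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Builds OAI harvesting request URLs: one HTTP GET per verb (illustrative helper). */
final class OaiRequest {
    private OaiRequest() {}

    /** Joins the base URL, the verb, and any extra arguments into one query string. */
    static String build(String baseUrl, String verb, Map<String, String> args) {
        StringJoiner query = new StringJoiner("&");
        query.add("verb=" + encode(verb));
        for (Map.Entry<String, String> e : args.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "?" + query;
    }

    // Percent-encodes a key or value for use in a query string (requires Java 10+).
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] argv) {
        // Hypothetical repository address; real data providers publish their own base URL.
        String base = "http://example.org/oai";

        Map<String, String> args = new LinkedHashMap<>();
        args.put("from", "2001-01-01");          // only records changed since this date
        args.put("metadataPrefix", "oai_dc");    // ask for Dublin Core metadata

        System.out.println(build(base, "Identify", new LinkedHashMap<>()));
        System.out.println(build(base, "ListRecords", args));
    }
}
```

Keeping the arguments in a `LinkedHashMap` makes the generated URL deterministic, which is convenient when logging or replaying harvest requests.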

  • Low-barrier interop umbrella (Herbert Van de Sompel)
    - metadata

  • OAI harvesting tools (Herbert Van de Sompel)
    - Service provider: harvester
    - Data provider: repository
    - Datestamp, Identifier, Set, Records

  • OAI harvesting tools (Herbert Van de Sompel)
    - Service provider: harvester
    - Data provider: repository
    - Supporting protocol requests: Identify, ListMetadataFormats, ListSets
    - Harvesting protocol requests: ListRecords, ListIdentifiers, GetRecord
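Datestamps and sets are what make incremental harvesting possible: a harvester asks only for records that changed since its last visit, optionally within one set. The sketch below imitates the data-provider side in memory to show that selection logic. `ToyRepository`, its methods, and the sample identifiers are hypothetical and stand in for a real repository behind an HTTP script.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Toy data provider: each record carries an identifier, a datestamp, and a set name. */
final class ToyRepository {
    private static final class Rec {
        final String identifier;
        final LocalDate datestamp;
        final String set;

        Rec(String identifier, String datestamp, String set) {
            this.identifier = identifier;
            this.datestamp = LocalDate.parse(datestamp);  // expects YYYY-MM-DD
            this.set = set;
        }
    }

    private final List<Rec> records = new ArrayList<>();

    void add(String identifier, String datestamp, String set) {
        records.add(new Rec(identifier, datestamp, set));
    }

    /**
     * ListIdentifiers-style selection: identifiers of records whose datestamp is on or
     * after `from`, limited to one set when `setOrNull` is not null.
     */
    List<String> listIdentifiers(LocalDate from, String setOrNull) {
        List<String> ids = new ArrayList<>();
        for (Rec r : records) {
            boolean recent = !r.datestamp.isBefore(from);
            boolean inSet = setOrNull == null || setOrNull.equals(r.set);
            if (recent && inSet) {
                ids.add(r.identifier);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        ToyRepository repo = new ToyRepository();
        repo.add("oai:example:a", "2001-01-15", "theses");
        repo.add("oai:example:b", "2001-02-10", "theses");
        repo.add("oai:example:c", "2001-03-05", "reports");

        // A harvester that last ran on 2001-02-01 only needs the newer records.
        System.out.println(repo.listIdentifiers(LocalDate.parse("2001-02-01"), null));
    }
}
```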

  • Design Features
    - Combined Harvesting, Federated Search, and Local Collections
    - Object-Oriented Information Graph Representation
    - 5S Model and 5SL Specification Language

  • MARIAN Middleware
    - Flexible Representation Model
      - Information Graph
      - Class Hierarchies
      - Weights and Weighted Sets (w. lazy eval)
    - Class-Based Search
      - Unified Searcher API
    - Combining Heterogeneous Information
      - Structural Matching
      - Synthetic Superclasses

  • Information Graph Model (1/2)
    - Each Information Object is a Node.
    - Structure: exposed through Links.
    - Features of interest can become Nodes, or can remain hidden within Node class search methods.
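The node-and-link idea can be illustrated with a very small data structure. This is a sketch under assumptions, not MARIAN's representation: the `InfoGraph` class and its methods are hypothetical, and the class and link names (`ThesisDissertation`, `Individual`, `HasAuthor`) are borrowed from the collection view shown later in the deck. Here the author is exposed as its own node, while the title stays hidden as a field inside the thesis node.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal information graph: typed nodes joined by typed, directed links. */
final class InfoGraph {
    static final class Node {
        final String className;  // e.g. "ThesisDissertation" or "Individual"
        final String id;
        final String text;       // a feature kept inside the node rather than exposed as a node

        Node(String className, String id, String text) {
            this.className = className;
            this.id = id;
            this.text = text;
        }
    }

    private static final class Link {
        final Node from;
        final String type;
        final Node to;

        Link(Node from, String type, Node to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    private final List<Link> links = new ArrayList<>();

    /** Exposes a piece of structure as a typed link between two nodes. */
    void link(Node from, String type, Node to) {
        links.add(new Link(from, type, to));
    }

    /** Follows every link of the given type that leaves `from`. */
    List<Node> follow(Node from, String type) {
        List<Node> targets = new ArrayList<>();
        for (Link l : links) {
            if (l.from == from && l.type.equals(type)) {
                targets.add(l.to);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        InfoGraph g = new InfoGraph();
        Node thesis = new Node("ThesisDissertation", "etd-1", "Federated search in digital libraries");
        Node author = new Node("Individual", "person-1", "A. Student");
        g.link(thesis, "HasAuthor", author);
        System.out.println(g.follow(thesis, "HasAuthor").get(0).text);
    }
}
```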

  • Information Graph Model (2/2)

  • Class-Based Search
    - Common Search Methods
      - Text
      - Link / Weighted Link
      - Node in Context
    - Common Searcher Operations
      - Match Best (weighted maximum)
      - Match Most (summative union)
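The two searcher operations differ only in how they merge the weights an object receives from several partial results. The sketch below reads "weighted maximum" as keeping the highest weight per object and "summative union" as adding the weights; that reading, the plain `Map<String, Double>` stand-in for a weighted set, and the `WeightedCombine` class are assumptions for illustration, and MARIAN may normalize or scale differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Two ways to merge weighted result sets, each mapping an object id to a weight. */
final class WeightedCombine {
    private WeightedCombine() {}

    /** Match Best: an object keeps the highest weight it received from any input set. */
    static Map<String, Double> matchBest(List<Map<String, Double>> sets) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map<String, Double> s : sets) {
            s.forEach((id, w) -> out.merge(id, w, Double::max));
        }
        return out;
    }

    /** Match Most: an object's weights are summed across all input sets. */
    static Map<String, Double> matchMost(List<Map<String, Double>> sets) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map<String, Double> s : sets) {
            s.forEach((id, w) -> out.merge(id, w, Double::sum));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> byTitle = new LinkedHashMap<>();
        byTitle.put("etd-1", 0.5);
        byTitle.put("etd-2", 0.25);

        Map<String, Double> byAbstract = new LinkedHashMap<>();
        byAbstract.put("etd-1", 0.25);
        byAbstract.put("etd-3", 0.5);

        List<Map<String, Double>> sets = Arrays.asList(byTitle, byAbstract);
        System.out.println("best: " + matchBest(sets));
        System.out.println("most: " + matchMost(sets));
    }
}
```

Match Best rewards the single strongest piece of evidence, while Match Most rewards objects that match in many places, so the choice depends on whether partial results are alternatives or independent evidence.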

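  • Class-Based Search

    import java.util.Vector;

    // WtdObjSet, InfoDesc, and FullID are MARIAN types not shown on the slide.
    public interface ClassManager {
        // Match a description against this class; results come back as a weighted object set.
        public WtdObjSet match(InfoDesc description);

        // Test whether the identified object belongs to this class.
        public boolean isInClass(FullID id);

        // Resolve one identifier, or a vector of identifiers, to objects.
        public Object idToObject(FullID id);
        public Vector idsToObjects(Vector ids);
    }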
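To make the shape of this API concrete, here is a simplified sketch of one class manager. It is not MARIAN code: `SimpleClassManager` replaces the slide's `InfoDesc`, `FullID`, and `WtdObjSet` with plain `String` and `Map<String, Double>`, and the scoring rule in `TitleManager` (fraction of query words found in the title) is invented purely for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Simplified stand-in for the slide's ClassManager, using String ids and Map results. */
interface SimpleClassManager {
    Map<String, Double> match(String description);
    boolean isInClass(String id);
    Object idToObject(String id);
}

/** Manages a class of title objects and matches them against free-text descriptions. */
final class TitleManager implements SimpleClassManager {
    private final Map<String, String> titles = new LinkedHashMap<>();

    void add(String id, String title) {
        titles.put(id, title);
    }

    /** Weight = fraction of query words that occur in the title (illustrative rule only). */
    @Override
    public Map<String, Double> match(String description) {
        String[] queryWords = description.toLowerCase().trim().split("\\s+");
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : titles.entrySet()) {
            Set<String> titleWords =
                new HashSet<>(Arrays.asList(e.getValue().toLowerCase().split("\\s+")));
            int hits = 0;
            for (String w : queryWords) {
                if (titleWords.contains(w)) {
                    hits++;
                }
            }
            if (hits > 0) {
                result.put(e.getKey(), (double) hits / queryWords.length);
            }
        }
        return result;
    }

    @Override
    public boolean isInClass(String id) {
        return titles.containsKey(id);
    }

    @Override
    public Object idToObject(String id) {
        return titles.get(id);
    }

    public static void main(String[] args) {
        TitleManager titles = new TitleManager();
        titles.add("etd-1", "Federated Digital Libraries");
        titles.add("etd-2", "Quantum Physics");
        System.out.println(titles.match("digital libraries"));
    }
}
```

Because every class answers the same `match` call, a searcher can treat titles, authors, or whole documents uniformly and only combine the weighted sets they return.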

  • Class-Based Search

  • Combining Sources of Information
    - Structural Matching
      - Extends Weighted Retrieval to include Best Match to Document Structure
      - Recursive, Extensible
    - Collection Views
      - Simple Interface to Complex Collections
      - Common Interface to Diverse Collections
      - Weighted Interface to Collections of Varying Quality

  • NDLTD Collection View (part)

    [Figure: part of the NDLTD collection view for the PhysDis-ETD (SOIF) source. Labels recovered from the slide: classes ThesisDissertation, Individual, Subject, title, description, body, dc.title, crawlerTitle, dc.description, crawlerDescription, Dc.creator, Dc.Subject, Headings, Keywords; links HasAuthor, HasSubject, HasDcCreator, HasCrawlerAuthor, HasDcSubject, HasHeadings, HasKeywords; SubClasses relations annotated with weights between 0.8 and 1.0.]
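The weighted SubClasses relations in this view suggest how a synthetic superclass can offer a "weighted interface to collections of varying quality": a search on the superclass is run against each subclass, and each subclass's results are discounted by the weight on its link. The sketch below shows that idea under assumptions. `SyntheticSuperclass` is a hypothetical class, the pairing of weights with `dc.title` and `crawlerTitle` is illustrative (the slide does not let the weights be matched to specific links), and keeping the best scaled weight is only one possible combination rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A synthetic superclass that discounts each subclass's results by a link weight. */
final class SyntheticSuperclass {
    private final Map<String, Double> subclassWeights = new LinkedHashMap<>();

    void addSubclass(String name, double weight) {
        subclassWeights.put(name, weight);
    }

    /**
     * Combines per-subclass results. Each result weight is multiplied by its subclass
     * link weight, and an object found through several subclasses keeps its best
     * scaled weight.
     */
    Map<String, Double> combine(Map<String, Map<String, Double>> resultsBySubclass) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> sub : subclassWeights.entrySet()) {
            Map<String, Double> results = resultsBySubclass.get(sub.getKey());
            if (results == null) {
                continue;  // this subclass returned nothing
            }
            double linkWeight = sub.getValue();
            results.forEach((id, w) -> out.merge(id, w * linkWeight, Double::max));
        }
        return out;
    }

    public static void main(String[] args) {
        SyntheticSuperclass title = new SyntheticSuperclass();
        title.addSubclass("dc.title", 1.0);      // curated metadata: fully trusted (assumed weight)
        title.addSubclass("crawlerTitle", 0.8);  // crawler-extracted: discounted (assumed weight)

        Map<String, Double> dcHits = new LinkedHashMap<>();
        dcHits.put("etd-1", 0.5);
        Map<String, Double> crawlerHits = new LinkedHashMap<>();
        crawlerHits.put("etd-1", 1.0);
        crawlerHits.put("etd-2", 0.5);

        Map<String, Map<String, Double>> results = new LinkedHashMap<>();
        results.put("dc.title", dcHits);
        results.put("crawlerTitle", crawlerHits);

        System.out.println(title.combine(results));
    }
}
```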

  • 5S Model for Digital Libraries (1/2)
    Formal Model
    - Streams
    - Structures
    - Spaces
    - Services
    - Societies

  • 5S Model for Digital Libraries (2/2)
    Formal Model
    - Streams
    - Structures
    - Spaces
    - Services
    - Societies
    NDLTD / MARIAN Example
    - Document (presentable, indexable information object)
    - Weighted Set (e.g., of results to a match operation)
    - Collection Graph; Inheritance Lattice; Measure Space
    - Adaptive Search; Query History Maintenance
    - Library End-Users; DL Builders

  • 5SLGenerates Digital Library (Components)

  • Generating Digital Libraries: XML

  • Interoperability with 5S and 5SL
    - Reductionist / Constructivist Approach
    - Compositional mappings between DLs
    - Composition of S-based constructs
    - Mapping language

  • Student Projects to Integrate
    - Schedule-driven Harvester
    - SDI / Filtering for NDLTD
    - MARIAN-Phronesis (Spanish, Monterrey); and work with German (Oldenburg / DFG), Portuguese, Chinese, Japanese, Korean
    - TREC data formatted for loading

  • Future Work
    - Fusion on hybrid architecture
    - Incorporation of belief networks
    - Using 5SL to generate wrappers
    - New services / functionalities
      - Personalization (e.g., history, folders)
      - Visualization (e.g., Envision applet)
    - Integration with PetaPlex (100 nodes, 2.5 Tbytes disk capacity, > 300 Mbps to campus backbone, Sornil inversion)

  • Conclusions
    - NDLTD provides a real, fertile DL testbed.
    - Harvesting strategies and the OAI
    - MARIAN middleware: graphs, classes, views
    - Generating Digital Libraries with 5SL
    - Future: high performance services, experimental comparisons