bingo! and daffodil - stanford...

20
1 BINGO! and Daffodil: BINGO! and Daffodil: Personalized Exploration of Personalized Exploration of Digital Libraries and Digital Libraries and Web Sources Web Sources Martin Martin Theobald Theobald Max Max - - Planck Planck - - Institut Institut für für Informatik Informatik Claus Claus - - Peter Peter Klas Klas Universität Universität Duisburg Duisburg - - Essen Essen

Upload: others

Post on 11-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

1

BINGO! and Daffodil: BINGO! and Daffodil: Personalized Exploration of Personalized Exploration of Digital Libraries and Digital Libraries and Web SourcesWeb Sources

Martin Martin TheobaldTheobaldMaxMax--PlanckPlanck--InstitutInstitut fürfür InformatikInformatik

ClausClaus--Peter Peter KlasKlasUniversitätUniversität DuisburgDuisburg--Essen Essen

Page 2: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 2

OverviewOverview

Challenges in Web Recommendations for Digital Library UsersDAFFODILBINGO!Meta Data ProcessingSystem Integration via Web ServicesExperiments

Page 3: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 3

1.1. Recommendations for Recommendations for Digital Library UsersDigital Library Users

Current recommendations systems are based onUser collaboration & mining of user behaviourPre-computed document similarities (clustering, cosine measure, co-citations, etc.) limited to a single DL index

Challenges in providing Web recommendationsWell organized and structured Digital Library world vs. heterogeneous and chaotic WebUnified view on related publications and topic-specific Web documents (e.g., authors’ homepages)“Virtual links” from meta data through additional Deep Web search

Topic-specific hub pagesFull text sources for training

Filtering, user feedback, and iterative topic refinement

Page 4: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 4

DL1 DL2 DLk...

Service

Service

...

Service

Many access points, e.g., DBLP, CiteSeer, ACM, Springer, Achilles, HCIBIBDifferent query formats &

servicesInsufficient coverageInsufficient functionality

2. DAFFODIL: Motivation2. DAFFODIL: Motivation

Page 5: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 5

2.1. Combination of Services & 2.1. Combination of Services & InformationInformation

Virtual Digital Library

Service

Service

...

Service

One access pointIntegration of DLs and individual servicesMeta data extractionSynergy effects

Strategic information searchInformation organization in personal folders

DAFFODILDAFFODIL

DL1 DL2 DLk...

Page 6: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 6

Information Sources

Structured Personal Lib

Interpret RetrievedInformation

Create Knowledge

Retrieve Information

2.2. Lifecycle of Scientific Search as 2.2. Lifecycle of Scientific Search as an Interactive and Iterative Processan Interactive and Iterative Process

Page 7: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 7

2.3. Personal Library2.3. Personal LibraryPublications organized as structured meta dataDigital Library Objects (DLOs)

Authors, titleWeb linksJournalsConferencesKeywords, abstractsQueries

Meta data export in XML or BibTexBasis for information exchange

with BINGO!

Page 8: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 8

3. BINGO!3. BINGO!Focused crawler

Combination of traditional crawler with classifier and link analyzer for topic distillationCrawls mainly within the densely connected neighbourhood graph for the topics of interestFilters out irrelevant documents directly at the crawler frontier

Information portal generationInitialized by a set of training documents (e.g., bookmarks)Extends and maintains a user-specific topic-directory (Yahoo!)

Expert queriesInitialized by a single-topic keyword queryExploits existing index or external search engine for retrievinginitial hubs training & seedImproves recall

Page 9: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 9

World Wide Web......................

Crawler Documentanalyzer

Feature selection

SVMclassifier

HITSlink-

analyzer

URLqueue

Featurevectors,TF*IDF

Hubs &

AuthoritiesTaxonomy

indexTopic-

specificfeatures

3.1. Focused Crawler Architecture3.1. Focused Crawler Architecture

Trainingdocuments

TrainingIterative (Re-)Training,Feedback

Page 10: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 10

4. Meta Data Processing4. Meta Data ProcessingDLOs do provide

Comprehensive meta data describing the publicationDLOs may provide

Links to full text sources (e.g., PS, PDF or DOC - files)Links to homepages, DL portals, or other Web references

Focused crawler requiresFull text documents for feature extraction and trainingTopic-specific hubs as start pages of focused crawl

Solution: Additional Deep Web queries using DLO contents

Web Service Framework (WSF)Replace form submission method of the browser (HTTP post) with new WS

Detection of attribute labels/names next to form fieldsForm types determine attribute types (e.g., String, Date, etc.)

Generation of WSDL description for WS parametersAutomatic deployment under local Tomcat Web Server

Page 11: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 11

Advanced search form of theDBLP Trier

(HTML <form> and <input> tags)

WS

F

WSDL descriptionregistered at local UDDI

Page 12: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 12

4.1. Ontology4.1. Ontology--Enabled Form Matching Enabled Form Matching & Portal Selection& Portal Selection

Ontology serviceIncorporates WordNet, Cyc, and personalized domain ontologiesMapping of attribute names to semantic conceptsQuantified relationships between conceptsApproximate matches between similar attributes in a DLO and WSDL

Portal selectionAggregation of attribute similarities between a DLO and WSDLRanking for the top matching portals to each meta data query

Unified handling of heterogeneous portalsPre-computed (static) mapping on schema level onlyAutomatic form completion and submission

Page 13: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 13

DLO extracted from Achilles

WSDL for DBLP Trier

Synonym

Synonym

1.01.0

HypernymHypernym

0.80.8

Page 14: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 14

5. System Integration5. System IntegrationWeb Service Infrastructure

Two standalone Tomcat Web Servers for WS routingDaffodil at Universität DuisburgBingo! + WSF at MPII Saarbrücken

Asynchronous couplingDistributed multi-user requestsQueuing of recommendation requests at Daffodil (FIFO)Integration of Bingo! into Daffodil’s agent-based structure

Simple communication protocol (via RPCs)1) Daffodil: Init request with basic crawler parameters2) Bingo!: Fetch folders along with user id & timestamp3) Bingo!: Return top recommendations

Page 15: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 15

1) init

DAFFODILDAFFODILDistributed Agents for User-Friendly Access of Digital Libraries

BINGO!BINGO!Bookmark-Induced Gathering of Information

DL 1 DL k Web Portal 1

Web Portal m... ...

Web

Ser

vice

In

frast

ruct

ure

2) fetch

+ +Fulltext sources

CoreDB

WebIR

Work-flow

Training Docs

Web Service Framework

Web Service Framework

Ontology-Enabled

Portal Selector

Ontology-Enabled

Portal Selector Focused Crawler

Focused Crawler

3) return

CoreDB

Author = ...Title = ......

WebIR

Work-flow

DLOs,Metadata

User Folders

BINGO! Agent forRecommendationsBINGO! Agent forRecommendations

Page 16: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 16

WSF: DBLP Trier, HCIBIB, Achilles, NCSTRL, Springer, Google adv.

6. Experiments: User Folder for “Workflow”6. Experiments: User Folder for “Workflow”

Page 17: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 17

6.1. First Iteration & Feedback6.1. First Iteration & Feedback

Page 18: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 18

6.2. Second Iteration6.2. Second Iteration

Page 19: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 19

7. Conclusions & Outlook7. Conclusions & OutlookCoupling of Bingo! and Daffodil provides significantly added value for advanced users

For Daffodil: Context-based search over the WebFor Bingo!: Access to semi-structured meta data and the Deep Web

Future work:More exhaustive experimental evaluationExtensive study of relevance feedbackDLO extraction from recommended URLsPersonalized crawling as a publicly available service for Daffodil users (dedicated server)

Page 20: BINGO! and Daffodil - Stanford Universityinfolab.stanford.edu/~theobald/pub/riao04_bingo-daffodil.pdf · Coupling of Bingo! and Daffodil provides significantly added value for advanced

Theobald, Klas: Personalized Exploration of Digital Libraries and Web Sources 20

Try it out !Try it out !

www.daffodil.dewww.daffodil.de