vishnu kanth reddy challam - ittc home...vishnu kanth reddy challam master’s thesis defense date:...

Post on 05-Sep-2021

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

1University of Kansas

Contextual Information Retrieval Using Ontology-Based User Profiles

Vishnu Kanth Reddy ChallamMaster’s Thesis DefenseDate: Jan 22nd, 2004.

CommitteeDr. Susan Gauch(Chair)

Dr.David AndrewsDr. Jerzy W.Grzymala-Busse

2University of Kansas

Presentation Outline

Search Engines TodaySearch Engine Personalization ContributionsOur Approach for Contextual IRExperiments and EvaluationConclusions and Future Work

3University of Kansas

Search Engines TodayReturn results based on simple key-word matches. No regard for conceptual information.

For E.g. : If the query is “SALSA” Is it………

4University of Kansas

Search Engines Today Contd..

What is the user looking for?No personalization mechanism to understand the information needs of the user.

5University of Kansas

Search Engine Personalization…How?

Collect and represent information about the user.Use this information to either filter or re-rank the results returned from the initial retrieval process or directly use this information in the search process.

6University of Kansas

Search Engine Personalization ..Challenges

How can accurate information about the user’s interests be collected and represented?How can we use this information to deliver personalized search results?

7University of Kansas

Contributions….

We present a novel-approach to personalizing search engines using ontology-based contextual user profiles.Studied the effect of conceptual ranking versus original keyword based ranking.Studied the usage of multiple sources of information to build the user’s contextual profile.

8University of Kansas

Related Work

Semantic WebExplicitly state meaning of content using Knowledge Representation LanguagesDomain specific effortsWeb is democratic!

9University of Kansas

Design Criteria

Monitor and store user information on the client machine or the server.Short term vs. Long termWith server side profiling, privacy is an issue.Instantaneous information needs are hard to satisfy.

10University of Kansas

Contextual SearchNo long term user profilesBuild contextual profiles that capture the information needs of the user at the time they conduct search…TASK ORIENTEDUpload the contextual profile to the server.Privacy

11University of Kansas

How to Build Contextual Profiles?

Monitor the activity of the user on his/her Windows machine. Capture content from Word documents,Web pages, Chat transcripts etc..Classify the captured content to build a contextual profile

12University of Kansas

Monitoring the User Activity

A Windows application that runs in the background.Captured text from open Word, IE, MSN Chat windows.Stored the captured content in a special folder on the clients machine.Content is assigned a time-stamp.

13University of Kansas

Text Classification

Classifier works in 2 phases: training and classification.Training Phase:

Classifier is given a series of documents classified manually.Learns about the features (vocabulary) of the various categories into which the text might be classified.

14University of Kansas

Text Classification Contd…

Classification phase:Classifier, classifies the input text and assigns it to a particular category based on similarity between the features of input text and those extracted from training data.

15University of Kansas

Text Classification : Our Approach

Vector-space model (tf-idf model).Training data are the documents manually assigned into categories of the Standard Tree which is our reference ontology. Classifier creates a vector of vocabulary terms and weights associated with the category in an inverted file.

16University of Kansas

Standard Tree

17University of Kansas

Text Classification: Our Approach Contd..

During classification phase, vector of input document is created.Degree of similarity between training vectors and input document vector calculated using dot product of the vectors.Best matches are the concepts into which the input document is assigned.

18University of Kansas

Building Contextual User Profile

Content created/viewed within a specific time window is classified.The classifier represents the user’s contextual profile for the time window as a weighted ontology. Weight of a concept in the ontology represents the amount of information recently viewed/created that was classified into that concept.

19University of Kansas

Sample Contextual User ProfileCategory-id, WeightCategory-id used to identify the concept in Standard Tree.26878 is Top/Science/Environment/Water_Resources

20University of Kansas

Personalizing Search Results Using Contextual User ProfilesResults are re-ranked using a combination of the original rank and their conceptual rankSimilarity of the documents to the contextual profile is used to calculate the conceptual rank

21University of Kansas

Conceptual RankDocument’s title and summary are classified to create the document profile.Document profile is compared to the contextual profile to calculate the conceptual similarity between document and user’s context.

wherewtik = Weight of Conceptk in Contextiwtjk = Weight of Conceptk in documentj

jk

N

kikji wtwtdoccontextsim ∗= ∑

=1

),(

22University of Kansas

Final Rank

α has a value between 0 and 1 Varying the values of α between 0 and 1 conceptual and keyword ranks can be weighted differently.

Rank Keyword)-(1 Rank Conceptual* Rank Final ∗+= αα

23University of Kansas

Experiments and Evaluation

Wrapper around Google built using Google API.Google Wrapper builds a log of:

1. Queries given by user2. Results & ranks returned by Google3. Result clicked by the user4. Title & Summaries

Randomizes the results returned by Google before displaying them to the user.

24University of Kansas

Google Wrapper

25University of Kansas

Experiments5 users asked to write essays on topics ranging from car buying, labs at ITTC to jewelry.Windows application monitored their activityQueries issued to Google WrapperResult clicked by the user was used as a form of implicit user relevance for analysis.

26University of Kansas

Experiments Contd..Log of 50 queries.6 had to be filtered out. 44 queries analyzedEvaluate number of concepts for the user’s contextual profile, the document profile and the value of α for blending original and conceptual ranks.Analysis based on average rank of the result clicked by the user in our conceptual search engine and baseline system Google.

27University of Kansas

EvaluationProfile built from content of Word documents alone32 queries analyzedVaried the number of concepts for the user profile and the document profile.Average Google Rank is 4.84Best average conceptual rank is 4.68

4.4

4.5

4.6

4.7

4.8

4.9

5

5.1

5.2

Google Top 2Concepts

Fr omDocumentPr of i l e

Top 5Concepts

Fr omDocumentPr of i l e

Top 7Concepts

Fr omDocumentPr of i l e

Al l ConceptsFr om

DocumentPr of i l e

Aver

age

Ran

k

Top 10 Conceptsf rom User Prof ile

Top 20 Conceptsf rom User Prof ile

Top 30 Conceptsf rom User Prof ile

A ll concepts f romUser Prof ile

28University of Kansas

Evaluation Contd…Final Rank calculated using the formula

FR = α*CR +(1-α)*KRBest final rank of 4.59 when α = 0.45.16 percent improvement over Google’s rank of 4.84Contextual information from Word documents can be used to improve web queries.

4.45

4.5

4.55

4.6

4.65

4.7

4.75

4.8

4.85

4.9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Alpha Value

Fina

l Ran

k

29University of Kansas

Evaluation Contd…Profile built from content of Web pages alone31 queries analyzedVaried the number of concepts for the user profile and the document profile.Average Google Rank is 4.58Best average conceptual rank is 4.74( 30 concepts for contextual profile and all concepts for document profile)

0

1

2

3

4

5

6

Google Top 2Concept s

FromDocument

Prof i le

Top 5Concept s

FromDocument

Prof i le

Top 7Concept s

FromDocument

Prof i le

AllConcept s

FromDocument

Prof i le

Aver

age

Ran

k

Top 10 ConceptsFrom User Pro f ile

Top 20 Conceptsf rom User Pro f ile

Top 30 Conceptsf rom User Pro f ile

A ll Concepts f romUser Prof ile

30University of Kansas

Evaluation Contd…Final Rank calculated using the formula

FR = α*CR +(1-α)*KRBest final rank of 4.22 when α = 0.47.86 percent improvement over Google’s rank of 4.74Contextual information from Web Pages can be used to improve web queries.

3.94

4.14.24.34.44.54.64.74.8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Alpha Values

Aver

age

Ran

k

31University of Kansas

Evaluation Contd…

Profile built by combining content of Web pages and Word Documents.Final Profile = β *Word Profile + (1 - β) * Web Profileβ has values between 0 and 1

32University of Kansas

Evaluation Contd….

Effect of α and β22 queries analyzedBest Conceptual Rank 4.36 when α is 0.8 and β is 0.115% improvement over Google’s rank!

4

4.2

4.4

4.6

4.8

5

5.2

5.4

5.6

0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Alpha Values

Aver

age

Ran

k

Bet a = 0

Bet a = 0.1

Bet a = 0.2

Bet a = 0.3

Bet a = 0.4

Bet a = 0.5

Bet a = 0.6

Bet a = 0.7

Bet a = 0.8

Bet a = 0.9

Bet a = 1

33University of Kansas

Evaluation Contd…Effect of α on final rankHigh value of α indicates that conceptual rank should be given more importance.Re-ranking among top 10, all of them match the user’s query equally well.Primary distinguishing factor is conceptual similarity to contextual profile.

4.24.34.44.54.64.74.84.9

55.15.2

0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Alpha Values

Aver

age

Ran

k

34University of Kansas

Evaluation Contd…Effect of β on final rankβ values between 0.1 and 0.5 produce roughly comparable results.Increased importance of Web content maybe because Word documents were short.If more content available in Word documents a higher value of β might have been observed.

4.3

4.4

4.5

4.6

4.7

4.8

4.9

5

5.1

5.2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Beta values

Aver

age

Ran

k

35University of Kansas

ConclusionsContextual profiles improve Web searches.15% improvement over Google when profile is built by combining content from Word documents and Web pagesWithin top 10 results of Google, re-ranking should be done giving more weight to conceptual similarity between documents and the contextual profile

36University of Kansas

Conclusions Contd..All users were expert search engine users. Query length was long. Longer queries tend to disambiguate themselves.System performs better for shorter queries more common on the Web as a whole

37University of Kansas

Future Work

Best time window within which documents captured should be included in the contextual profileAnalyze content from other sources like Chat transcripts, Excel spreadsheets, PowerPoint slides etc..Combination of user’s current context, long and short term interests.

38University of Kansas

Questions or Comments

?? or !!

39University of Kansas

Thank You!

top related