modern information retrieval computer engineering department fall 2005
TRANSCRIPT
Modern Information Modern Information Retrieval Retrieval
Computer engineering Computer engineering departmentdepartment
Fall 2005 Fall 2005
SubjectsSubjects
1 Introduction1 Introduction2 Models2 Models3 Retrieval evaluation3 Retrieval evaluation4 Query languages4 Query languages5 Query reformulation5 Query reformulation6 Text properties6 Text properties7 Text languages7 Text languages8 Text processing8 Text processing9 Information retrieval from the Web9 Information retrieval from the Web
SourcesSources
1Modern Information Retrieval1Modern Information RetrievalR Baeza-Yates and R Baeza-Yates and B Ribeiro-Neto Addison-Wesley 1999B Ribeiro-Neto Addison-Wesley 1999
2Information Retrieval Systems Theory and 2Information Retrieval Systems Theory and ImplementationImplementationG Kowalski Kluwer 1997G Kowalski Kluwer 1997
3Automatic Text Processing3Automatic Text ProcessingG Salton Addison-G Salton Addison-Wesley 1989Wesley 1989
4Database System Concepts 4thEdition4Database System Concepts 4thEditionA A Silberschatz H Korth and S Sudarshan Silberschatz H Korth and S Sudarshan McGraw-Hill 2001McGraw-Hill 2001
5Various Web sites5Various Web sites6Original material6Original material
MotivationMotivation
bull IR representation storage organization of IR representation storage organization of and access to information itemsand access to information items
bull Focus is on the Focus is on the user information needuser information needbull Information itemInformation item Usually text but possibly Usually text but possibly
also image audio video etcalso image audio video etcbull Presently most retrieval of non-text items Presently most retrieval of non-text items
is based on searching their textual is based on searching their textual descriptionsdescriptions
bull Text items are often referred to as Text items are often referred to as documents documents and may be of different scope and may be of different scope (book article paragraph etc)(book article paragraph etc)
Motivation (cont)Motivation (cont)
bull User information needUser information needndash Find all docs containing information on college tennis Find all docs containing information on college tennis
teams which (1) are maintained by a USA university teams which (1) are maintained by a USA university and (2) participate in the NCAA tournamentand (2) participate in the NCAA tournament
bull Emphasis is on the retrieval of information Emphasis is on the retrieval of information (not data)(not data)
Motivation (cont) Data retrieval
which docs contain a set of keywords Well defined semantics a single erroneous object implies
failure Information retrieval
information about a subject or topic semantics is frequently loose small errors are tolerated
IR system interpret contents of information items generate a ranking which reflects
relevance notion of relevance is most important
Motivation (cont) IR at the center of the stage
IR in the last 20 years classification and categorization systems and languages user interfaces and visualization
Still area was seen as of narrow interest Advent of the Web changed this perception
once and for all universal repository of knowledge free (low cost) universal access no central editorial board many problems though IR seen as key to
finding the solutions
Objecttives
1048633Overall objective Minimize search overhead
1048633Measurement of success Precision and recall
1048633Facilitate the overall objective Good search tools Helpful presentation of results
Minimize search overheadMinimize overheadof a user who is locating needed information Overhead Time spent in all steps leading to the reading of
items containing the needed information (query generation query execution scanning results reading non-relevant items etc)
Needed information Either Sufficient information in the system to complete a task All information in the system relevant to the user needs Example ndashshopping
Looking for an item to purchase Looking for an item to purchase at minimal cost
Example ndashresearching Looking for a bibliographic citation that explains a particular
term Building a comprehensive bibliography on a particular
subject
Measurement of success
Two dual measures 1048633Precision Proportion of items retrieved that are
relevant Precision = relevant retrieved total retrieved = Answer Relevant Answer
1048633Recall Proportion of relevant items that are retrieved
Recall = relevant retrieved relevant exist = Answer Relevant Relevant
1048633Most popular measures but others exist
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
SubjectsSubjects
1 Introduction1 Introduction2 Models2 Models3 Retrieval evaluation3 Retrieval evaluation4 Query languages4 Query languages5 Query reformulation5 Query reformulation6 Text properties6 Text properties7 Text languages7 Text languages8 Text processing8 Text processing9 Information retrieval from the Web9 Information retrieval from the Web
SourcesSources
1Modern Information Retrieval1Modern Information RetrievalR Baeza-Yates and R Baeza-Yates and B Ribeiro-Neto Addison-Wesley 1999B Ribeiro-Neto Addison-Wesley 1999
2Information Retrieval Systems Theory and 2Information Retrieval Systems Theory and ImplementationImplementationG Kowalski Kluwer 1997G Kowalski Kluwer 1997
3Automatic Text Processing3Automatic Text ProcessingG Salton Addison-G Salton Addison-Wesley 1989Wesley 1989
4Database System Concepts 4thEdition4Database System Concepts 4thEditionA A Silberschatz H Korth and S Sudarshan Silberschatz H Korth and S Sudarshan McGraw-Hill 2001McGraw-Hill 2001
5Various Web sites5Various Web sites6Original material6Original material
MotivationMotivation
bull IR representation storage organization of IR representation storage organization of and access to information itemsand access to information items
bull Focus is on the Focus is on the user information needuser information needbull Information itemInformation item Usually text but possibly Usually text but possibly
also image audio video etcalso image audio video etcbull Presently most retrieval of non-text items Presently most retrieval of non-text items
is based on searching their textual is based on searching their textual descriptionsdescriptions
bull Text items are often referred to as Text items are often referred to as documents documents and may be of different scope and may be of different scope (book article paragraph etc)(book article paragraph etc)
Motivation (cont)Motivation (cont)
bull User information needUser information needndash Find all docs containing information on college tennis Find all docs containing information on college tennis
teams which (1) are maintained by a USA university teams which (1) are maintained by a USA university and (2) participate in the NCAA tournamentand (2) participate in the NCAA tournament
bull Emphasis is on the retrieval of information Emphasis is on the retrieval of information (not data)(not data)
Motivation (cont) Data retrieval
which docs contain a set of keywords Well defined semantics a single erroneous object implies
failure Information retrieval
information about a subject or topic semantics is frequently loose small errors are tolerated
IR system interpret contents of information items generate a ranking which reflects
relevance notion of relevance is most important
Motivation (cont) IR at the center of the stage
IR in the last 20 years classification and categorization systems and languages user interfaces and visualization
Still area was seen as of narrow interest Advent of the Web changed this perception
once and for all universal repository of knowledge free (low cost) universal access no central editorial board many problems though IR seen as key to
finding the solutions
Objecttives
1048633Overall objective Minimize search overhead
1048633Measurement of success Precision and recall
1048633Facilitate the overall objective Good search tools Helpful presentation of results
Minimize search overheadMinimize overheadof a user who is locating needed information Overhead Time spent in all steps leading to the reading of
items containing the needed information (query generation query execution scanning results reading non-relevant items etc)
Needed information Either Sufficient information in the system to complete a task All information in the system relevant to the user needs Example ndashshopping
Looking for an item to purchase Looking for an item to purchase at minimal cost
Example ndashresearching Looking for a bibliographic citation that explains a particular
term Building a comprehensive bibliography on a particular
subject
Measurement of success
Two dual measures 1048633Precision Proportion of items retrieved that are
relevant Precision = relevant retrieved total retrieved = Answer Relevant Answer
1048633Recall Proportion of relevant items that are retrieved
Recall = relevant retrieved relevant exist = Answer Relevant Relevant
1048633Most popular measures but others exist
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
SourcesSources
1Modern Information Retrieval1Modern Information RetrievalR Baeza-Yates and R Baeza-Yates and B Ribeiro-Neto Addison-Wesley 1999B Ribeiro-Neto Addison-Wesley 1999
2Information Retrieval Systems Theory and 2Information Retrieval Systems Theory and ImplementationImplementationG Kowalski Kluwer 1997G Kowalski Kluwer 1997
3Automatic Text Processing3Automatic Text ProcessingG Salton Addison-G Salton Addison-Wesley 1989Wesley 1989
4Database System Concepts 4thEdition4Database System Concepts 4thEditionA A Silberschatz H Korth and S Sudarshan Silberschatz H Korth and S Sudarshan McGraw-Hill 2001McGraw-Hill 2001
5Various Web sites5Various Web sites6Original material6Original material
MotivationMotivation
bull IR representation storage organization of IR representation storage organization of and access to information itemsand access to information items
bull Focus is on the Focus is on the user information needuser information needbull Information itemInformation item Usually text but possibly Usually text but possibly
also image audio video etcalso image audio video etcbull Presently most retrieval of non-text items Presently most retrieval of non-text items
is based on searching their textual is based on searching their textual descriptionsdescriptions
bull Text items are often referred to as Text items are often referred to as documents documents and may be of different scope and may be of different scope (book article paragraph etc)(book article paragraph etc)
Motivation (cont)Motivation (cont)
bull User information needUser information needndash Find all docs containing information on college tennis Find all docs containing information on college tennis
teams which (1) are maintained by a USA university teams which (1) are maintained by a USA university and (2) participate in the NCAA tournamentand (2) participate in the NCAA tournament
bull Emphasis is on the retrieval of information Emphasis is on the retrieval of information (not data)(not data)
Motivation (cont) Data retrieval
which docs contain a set of keywords Well defined semantics a single erroneous object implies
failure Information retrieval
information about a subject or topic semantics is frequently loose small errors are tolerated
IR system interpret contents of information items generate a ranking which reflects
relevance notion of relevance is most important
Motivation (cont) IR at the center of the stage
IR in the last 20 years classification and categorization systems and languages user interfaces and visualization
Still area was seen as of narrow interest Advent of the Web changed this perception
once and for all universal repository of knowledge free (low cost) universal access no central editorial board many problems though IR seen as key to
finding the solutions
Objecttives
1048633Overall objective Minimize search overhead
1048633Measurement of success Precision and recall
1048633Facilitate the overall objective Good search tools Helpful presentation of results
Minimize search overheadMinimize overheadof a user who is locating needed information Overhead Time spent in all steps leading to the reading of
items containing the needed information (query generation query execution scanning results reading non-relevant items etc)
Needed information Either Sufficient information in the system to complete a task All information in the system relevant to the user needs Example ndashshopping
Looking for an item to purchase Looking for an item to purchase at minimal cost
Example ndashresearching Looking for a bibliographic citation that explains a particular
term Building a comprehensive bibliography on a particular
subject
Measurement of success
Two dual measures 1048633Precision Proportion of items retrieved that are
relevant Precision = relevant retrieved total retrieved = Answer Relevant Answer
1048633Recall Proportion of relevant items that are retrieved
Recall = relevant retrieved relevant exist = Answer Relevant Relevant
1048633Most popular measures but others exist
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
MotivationMotivation
bull IR representation storage organization of IR representation storage organization of and access to information itemsand access to information items
bull Focus is on the Focus is on the user information needuser information needbull Information itemInformation item Usually text but possibly Usually text but possibly
also image audio video etcalso image audio video etcbull Presently most retrieval of non-text items Presently most retrieval of non-text items
is based on searching their textual is based on searching their textual descriptionsdescriptions
bull Text items are often referred to as Text items are often referred to as documents documents and may be of different scope and may be of different scope (book article paragraph etc)(book article paragraph etc)
Motivation (cont)Motivation (cont)
bull User information needUser information needndash Find all docs containing information on college tennis Find all docs containing information on college tennis
teams which (1) are maintained by a USA university teams which (1) are maintained by a USA university and (2) participate in the NCAA tournamentand (2) participate in the NCAA tournament
bull Emphasis is on the retrieval of information Emphasis is on the retrieval of information (not data)(not data)
Motivation (cont) Data retrieval
which docs contain a set of keywords Well defined semantics a single erroneous object implies
failure Information retrieval
information about a subject or topic semantics is frequently loose small errors are tolerated
IR system interpret contents of information items generate a ranking which reflects
relevance notion of relevance is most important
Motivation (cont) IR at the center of the stage
IR in the last 20 years classification and categorization systems and languages user interfaces and visualization
Still area was seen as of narrow interest Advent of the Web changed this perception
once and for all universal repository of knowledge free (low cost) universal access no central editorial board many problems though IR seen as key to
finding the solutions
Objecttives
1048633Overall objective Minimize search overhead
1048633Measurement of success Precision and recall
1048633Facilitate the overall objective Good search tools Helpful presentation of results
Minimize search overheadMinimize overheadof a user who is locating needed information Overhead Time spent in all steps leading to the reading of
items containing the needed information (query generation query execution scanning results reading non-relevant items etc)
Needed information Either Sufficient information in the system to complete a task All information in the system relevant to the user needs Example ndashshopping
Looking for an item to purchase Looking for an item to purchase at minimal cost
Example ndashresearching Looking for a bibliographic citation that explains a particular
term Building a comprehensive bibliography on a particular
subject
Measurement of success
Two dual measures 1048633Precision Proportion of items retrieved that are
relevant Precision = relevant retrieved total retrieved = Answer Relevant Answer
1048633Recall Proportion of relevant items that are retrieved
Recall = relevant retrieved relevant exist = Answer Relevant Relevant
1048633Most popular measures but others exist
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Motivation (cont)Motivation (cont)
bull User information needUser information needndash Find all docs containing information on college tennis Find all docs containing information on college tennis
teams which (1) are maintained by a USA university teams which (1) are maintained by a USA university and (2) participate in the NCAA tournamentand (2) participate in the NCAA tournament
bull Emphasis is on the retrieval of information Emphasis is on the retrieval of information (not data)(not data)
Motivation (cont) Data retrieval
which docs contain a set of keywords Well defined semantics a single erroneous object implies
failure Information retrieval
information about a subject or topic semantics is frequently loose small errors are tolerated
IR system interpret contents of information items generate a ranking which reflects
relevance notion of relevance is most important
Motivation (cont) IR at the center of the stage
IR in the last 20 years classification and categorization systems and languages user interfaces and visualization
Still area was seen as of narrow interest Advent of the Web changed this perception
once and for all universal repository of knowledge free (low cost) universal access no central editorial board many problems though IR seen as key to
finding the solutions
Objecttives
1048633Overall objective Minimize search overhead
1048633Measurement of success Precision and recall
1048633Facilitate the overall objective Good search tools Helpful presentation of results
Minimize search overheadMinimize overheadof a user who is locating needed information Overhead Time spent in all steps leading to the reading of
items containing the needed information (query generation query execution scanning results reading non-relevant items etc)
Needed information Either Sufficient information in the system to complete a task All information in the system relevant to the user needs Example ndashshopping
Looking for an item to purchase Looking for an item to purchase at minimal cost
Example ndashresearching Looking for a bibliographic citation that explains a particular
term Building a comprehensive bibliography on a particular
subject
Measurement of success
Two dual measures 1048633Precision Proportion of items retrieved that are
relevant Precision = relevant retrieved total retrieved = Answer Relevant Answer
1048633Recall Proportion of relevant items that are retrieved
Recall = relevant retrieved relevant exist = Answer Relevant Relevant
1048633Most popular measures but others exist
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Motivation (cont) Data retrieval
which docs contain a set of keywords Well defined semantics a single erroneous object implies
failure Information retrieval
information about a subject or topic semantics is frequently loose small errors are tolerated
IR system interpret contents of information items generate a ranking which reflects
relevance notion of relevance is most important
Motivation (cont) IR at the center of the stage
IR in the last 20 years classification and categorization systems and languages user interfaces and visualization
Still area was seen as of narrow interest Advent of the Web changed this perception
once and for all universal repository of knowledge free (low cost) universal access no central editorial board many problems though IR seen as key to
finding the solutions
Objecttives
1048633Overall objective Minimize search overhead
1048633Measurement of success Precision and recall
1048633Facilitate the overall objective Good search tools Helpful presentation of results
Minimize search overheadMinimize overheadof a user who is locating needed information Overhead Time spent in all steps leading to the reading of
items containing the needed information (query generation query execution scanning results reading non-relevant items etc)
Needed information Either Sufficient information in the system to complete a task All information in the system relevant to the user needs Example ndashshopping
Looking for an item to purchase Looking for an item to purchase at minimal cost
Example ndashresearching Looking for a bibliographic citation that explains a particular
term Building a comprehensive bibliography on a particular
subject
Measurement of success
Two dual measures 1048633Precision Proportion of items retrieved that are
relevant Precision = relevant retrieved total retrieved = Answer Relevant Answer
1048633Recall Proportion of relevant items that are retrieved
Recall = relevant retrieved relevant exist = Answer Relevant Relevant
1048633Most popular measures but others exist
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Motivation (cont) IR at the center of the stage
IR in the last 20 years classification and categorization systems and languages user interfaces and visualization
Still area was seen as of narrow interest Advent of the Web changed this perception
once and for all universal repository of knowledge free (low cost) universal access no central editorial board many problems though IR seen as key to
finding the solutions
Objecttives
1048633Overall objective Minimize search overhead
1048633Measurement of success Precision and recall
1048633Facilitate the overall objective Good search tools Helpful presentation of results
Minimize search overheadMinimize overheadof a user who is locating needed information Overhead Time spent in all steps leading to the reading of
items containing the needed information (query generation query execution scanning results reading non-relevant items etc)
Needed information Either Sufficient information in the system to complete a task All information in the system relevant to the user needs Example ndashshopping
Looking for an item to purchase Looking for an item to purchase at minimal cost
Example ndashresearching Looking for a bibliographic citation that explains a particular
term Building a comprehensive bibliography on a particular
subject
Measurement of success
Two dual measures 1048633Precision Proportion of items retrieved that are
relevant Precision = relevant retrieved total retrieved = Answer Relevant Answer
1048633Recall Proportion of relevant items that are retrieved
Recall = relevant retrieved relevant exist = Answer Relevant Relevant
1048633Most popular measures but others exist
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Objecttives
1048633Overall objective Minimize search overhead
1048633Measurement of success Precision and recall
1048633Facilitate the overall objective Good search tools Helpful presentation of results
Minimize search overheadMinimize overheadof a user who is locating needed information Overhead Time spent in all steps leading to the reading of
items containing the needed information (query generation query execution scanning results reading non-relevant items etc)
Needed information Either Sufficient information in the system to complete a task All information in the system relevant to the user needs Example ndashshopping
Looking for an item to purchase Looking for an item to purchase at minimal cost
Example ndashresearching Looking for a bibliographic citation that explains a particular
term Building a comprehensive bibliography on a particular
subject
Measurement of success
Two dual measures 1048633Precision Proportion of items retrieved that are
relevant Precision = relevant retrieved total retrieved = Answer Relevant Answer
1048633Recall Proportion of relevant items that are retrieved
Recall = relevant retrieved relevant exist = Answer Relevant Relevant
1048633Most popular measures but others exist
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Minimize search overheadMinimize overheadof a user who is locating needed information Overhead Time spent in all steps leading to the reading of
items containing the needed information (query generation query execution scanning results reading non-relevant items etc)
Needed information Either Sufficient information in the system to complete a task All information in the system relevant to the user needs Example ndashshopping
Looking for an item to purchase Looking for an item to purchase at minimal cost
Example ndashresearching Looking for a bibliographic citation that explains a particular
term Building a comprehensive bibliography on a particular
subject
Measurement of success
Two dual measures 1048633Precision Proportion of items retrieved that are
relevant Precision = relevant retrieved total retrieved = Answer Relevant Answer
1048633Recall Proportion of relevant items that are retrieved
Recall = relevant retrieved relevant exist = Answer Relevant Relevant
1048633Most popular measures but others exist
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Measurement of success
Two dual measures 1048633Precision Proportion of items retrieved that are
relevant Precision = relevant retrieved total retrieved = Answer Relevant Answer
1048633Recall Proportion of relevant items that are retrieved
Recall = relevant retrieved relevant exist = Answer Relevant Relevant
1048633Most popular measures but others exist
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Measurement of success (cont)
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Support user search
Support user search providing tools to overcome obstacles such as
Ambiguities inherent in languages Homographs Words with identical spelling but with multiple
meanings Example ChinonmdashJapanese electronics French chateau
Limits to users ability to express needs Lack of system experience or aptitude
Lack of expertise in the area being searched Initially only vague concept of information sought Differences between users vocabulary and authors
vocabulary different words with similar meanings
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Presentation of results
Present search results in format that helps user determine relevant items
Arbitrary (physical) order Relevance order Clustered (eg conceptual similarity) Graphical (visual) representation
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Basic Concepts The User Task
Retrieval information or data purposeful
Browsing glancing around F1 cars Le Mans France tourism
Retrieval
Browsing
Database
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Querying(retrieval) vs Browsing
Two complementary forms of information or data retrieval
Querying Information need (retrieval goal) is focused
and crystallized Contents of repository are well-known Often user is sophisticated
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Querying(retrieval) vs Browsing (cont)
Browsing bullInformation need (retrieval goal) is vague
and imprecise bullContents of repository are not well-known bullOften user is naive
Querying and browsing are often interleaved (in the same session) bullExample present a query to a search
engine browse in the results restate the original query etc
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Pulling vs Pushing information Querying and browsing are both initiated by users
(information is ldquopulledrdquo from the sources) Alternatively information may be ldquopushedrdquo to
users Dynamically compare newly received items against
standing statements of interests of users (profiles) and deliver matching items to user mail files
Asynchronous (background) process Profile defines all areas of interest (whereas an
individual query focuses on specific question) Each item compared against many profiles
(whereas each query is compared against many items)
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
Basic Concepts Logical view of the documents
Document representation viewed as a continuum logical view of docs might shift
structure
Accentsspacing stopwords
Noungroups stemming
Manual indexingDocs
structure Full text
Index terms
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-
UserInterface
Text Operations
Query Operations Indexing
Searching
Ranking
Index
Text
query
user need
user feedback
ranked docs
retrieved docs
logical viewlogical view
inverted file
DB Manager Module
Text Database
Text
The Retrieval Process
- Subjects
- Sources
- Motivation
- Motivation (cont)
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
-