Enhancing Internet Search Engines to Achieve Concept-based Retrieval
F. Lu, T. Johnsten, V. Raghavan, and D. Traylor
Agenda
• Information on the Internet.
• Boolean Retrieval Model and the Internet.
• Personalized Search.
• Concept-Based Retrieval (RUBRIC / CS3).
• CS3 and Boolean Search Engines.
• Deep Web Sources.
• Current & Future Work.
Information on the Internet
• Large volume.
• Rapid growth rate.
• Wide variations in quality and type.
Boolean Retrieval Model and the Internet
• Most Internet search engines are based on the Boolean Retrieval Model.
• Boolean Retrieval Model is relatively easy to implement.
• Limitations:
– Inability to assign weights to query or document terms.
– Inability to rank retrieved documents.
– Naïve users have difficulty in using
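The limitations listed above can be seen in a minimal sketch of Boolean retrieval over an inverted index. The toy documents, their ids, and the helper names `build_index` and `lookup` are assumptions made for illustration; they do not come from the slides.

```python
from typing import Dict, Set

# Toy collection; document ids and texts are invented for illustration.
DOCS: Dict[str, str] = {
    "d1": "hydrogeology of shallow aquifers",
    "d2": "colloid transport in aquifers",
    "d3": "internet search engines",
}


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of ids of the documents that contain it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def lookup(term: str) -> Set[str]:
    """Return the ids of documents containing the term (empty set if none)."""
    return INDEX.get(term.lower(), set())


# Query: aquifers AND (hydrogeology OR colloid)
# AND maps to set intersection, OR maps to set union.
hits = lookup("aquifers") & (lookup("hydrogeology") | lookup("colloid"))
print(sorted(hits))
```

The result is a plain set of document ids: every match is equally "relevant", no term carries a weight, and there is nothing to rank by.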
Personalized Search
[Figure: Personalized Engine. Components shown: User Query, Query Processor, Query Augmentation, Search Engine, Search Results, Result Processor, Personalized Results, User Profile, General Profile.]
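One possible reading of the personalized-search pipeline is sketched below: a query processor augments the user query from a profile, the search engine runs the augmented query, and a result processor reorders the results. The slide does not say how augmentation or result processing work, so both strategies here (appending the highest-weighted profile terms, sorting by summed profile weight) are hypothetical, as are all function names. The General Profile is left out for brevity.

```python
from typing import Callable, Dict, List


def augment_query(query: str, user_profile: Dict[str, float], top_n: int = 2) -> str:
    """Hypothetical query processor: append the top-weighted profile terms."""
    ranked = sorted(user_profile, key=lambda t: user_profile[t], reverse=True)
    extra = [t for t in ranked[:top_n] if t not in query.split()]
    return " ".join([query] + extra)


def rerank(results: List[str], user_profile: Dict[str, float]) -> List[str]:
    """Hypothetical result processor: order results by summed profile weight."""
    def score(text: str) -> float:
        return sum(user_profile.get(word, 0.0) for word in text.lower().split())
    return sorted(results, key=score, reverse=True)


def personalized_search(
    query: str,
    search_engine: Callable[[str], List[str]],
    user_profile: Dict[str, float],
) -> List[str]:
    """User query -> augmentation -> search engine -> result processing."""
    return rerank(search_engine(augment_query(query, user_profile)), user_profile)


# Stand-in search engine that ignores the query and returns fixed results.
def fake_engine(query: str) -> List[str]:
    return ["coffee from java island", "java compiler tutorial"]


profile = {"java": 0.9, "compiler": 0.5}
print(augment_query("tutorial", profile))
print(personalized_search("tutorial", fake_engine, profile))
```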
Concept-Based Retrieval
• Addresses shortcomings of the Boolean Retrieval Model.
• Search requests are specified in terms of concepts, which are structured as rule-base trees.
Development of Rule-Base Trees (General)
• Top-down refinement strategy.
• Support for AND / OR relationships.
• Support for user-defined weights.
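As a rough illustration of these three properties, a rule-base tree could be represented as nested nodes, each carrying an AND/OR operator and a user-defined weight. The `Node` class, the example concept, and every weight below are assumptions for the sketch, not the structure used by RUBRIC or CS3.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    op: str = "TERM"      # "TERM" for a leaf, otherwise "AND" or "OR"
    weight: float = 1.0   # user-defined weight on the edge to the parent
    children: List["Node"] = field(default_factory=list)


# Top-down refinement: the concept is split into sub-concepts, then into terms.
tree = Node("Hydrogeology", "OR", children=[
    Node("hydrogeology", weight=1.0),
    Node("dnapl", weight=0.8),
    Node("colloid transport", "AND", weight=0.6, children=[
        Node("colloid"),
        Node("environmental transport"),
    ]),
])


def outline(node: Node, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one per node."""
    lines = [f"{'  ' * depth}{node.op} {node.label} (w={node.weight})"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(tree)))
```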
Development of Rule-Base Trees (CS3)
• Concept-Set Structuring System (CS3)
• CS3 supports the creation, storage and modification of user-defined concepts
• Post-processing of results of sub-queries
• CS3 user-interface.
CS3 User Interface
Evaluation of Rule-Base Trees (RUBRIC)
• Run-time, bottom-up analysis.
• Propagation of weight values (MIN / MAX).
• Disadvantage of run-time analysis.
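A minimal sketch of run-time, bottom-up evaluation with MIN/MAX propagation follows. It assumes that a leaf scores 1.0 when its term occurs in the document, that AND takes the minimum and OR the maximum of the child scores, and that each node's score is multiplied by its weight. The tuple layout `(kind, weight, payload)` and the example weights are illustrative assumptions; the slides do not spell out these details.

```python
from typing import Set

# A node is (kind, weight, payload):
#   ("TERM", weight, term_string) or ("AND" / "OR", weight, list_of_child_nodes)
TREE = ("OR", 1.0, [
    ("TERM", 1.0, "hydrogeology"),
    ("TERM", 0.8, "dnapl"),
    ("AND", 0.6, [
        ("TERM", 1.0, "colloid"),
        ("TERM", 1.0, "transport"),
    ]),
])


def evaluate(node, doc_terms: Set[str]) -> float:
    """Score one document by walking the tree from the leaves upward."""
    kind, weight, payload = node
    if kind == "TERM":
        value = 1.0 if payload in doc_terms else 0.0
    else:
        scores = [evaluate(child, doc_terms) for child in payload]
        # AND propagates the weakest child, OR the strongest.
        value = min(scores) if kind == "AND" else max(scores)
    return weight * value


print(evaluate(TREE, {"colloid", "transport", "soil"}))
print(evaluate(TREE, {"dnapl", "colloid"}))
print(evaluate(TREE, {"soil"}))
```

In this sketch the whole tree is traversed once for every document that is scored.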
Evaluation of Rule-Base Trees (CS3)
• Static, bottom-up analysis.
• Construct Minimal Term Set (MTS).
• Propagation of terms.
• CS3 user-interface.
MTS-Minimal Term Set
An MTS for a topic is a set of terms such that a document containing every term in the set receives a retrieval status value (RSV) greater than 0; otherwise, the RSV is 0.
A topic can have more than one MTS. A user can choose among these MTSs to perform a search suited to their needs.
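The definition above suggests one way to derive the MTSs statically, by propagating term sets from the leaves to the root: a leaf contributes its own term, an OR node collects the sets of its children, and an AND node combines one set from each child. The sketch below is an assumption about how this could work, not the CS3 algorithm itself; it ignores weights, and the tuple layout and example tree are invented.

```python
from itertools import product
from typing import FrozenSet, Set

# A node is ("TERM", term_string) or ("AND" / "OR", list_of_child_nodes).
TREE = ("OR", [
    ("TERM", "hydrogeology"),
    ("TERM", "dnapl"),
    ("AND", [
        ("TERM", "colloid"),
        ("OR", [("TERM", "transport"), ("TERM", "dnapl")]),
    ]),
])


def term_sets(node) -> Set[FrozenSet[str]]:
    """Return the minimal term sets that make this node score above 0."""
    kind, payload = node
    if kind == "TERM":
        return {frozenset([payload])}
    child_sets = [term_sets(child) for child in payload]
    if kind == "OR":
        # Any child's term set is enough on its own.
        combined = set().union(*child_sets)
    else:
        # AND needs one term set from every child, merged together.
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    # Keep only minimal sets: drop any set that strictly contains another.
    return {s for s in combined if not any(other < s for other in combined)}


for mts in sorted(term_sets(TREE), key=sorted):
    print(sorted(mts))
```

Here the candidate set {colloid, dnapl} is discarded because {dnapl} alone already yields a positive score.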
CS3 and Boolean Search Engines
• CS3 is designed to interface with existing Boolean search engines.
• U.S. Department of Energy’s “Information-Bridge” search engine.
• U.S. Department of Transportation’s “National Transportation Library” search engine.
System Architecture
[Figure: system architecture. Components shown: Client (Java applet); CORBA and CGI connections; Server (Java) and Server (Java/C++); JDBC; ORACLE; DOE InfoBridge…, etc.]
Information-Bridge and CS3
• Search request: Boolean vs. Concept.
• Output: Non-Ranked vs. Ranked.
• Calculation of RSV:
– Given a document D and a set S of MTS expressions satisfied by D, the RSV of D is equal to the sum of all the weights in S plus the maximum weight in S.
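The RSV rule stated above can be written out directly. In this sketch each MTS is paired with a weight; how CS3 assigns a weight to an MTS is not given on the slide, so the weights, the term sets, and the function name `rsv` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical weighted MTS expressions: (terms, weight).
WEIGHTED_MTS: List[Tuple[FrozenSet[str], float]] = [
    (frozenset({"hydrogeology"}), 1.0),
    (frozenset({"dnapl"}), 0.5),
    (frozenset({"colloid", "transport"}), 0.25),
]


def rsv(doc_terms: Set[str], weighted_mts=WEIGHTED_MTS) -> float:
    """Sum of the weights of all satisfied MTSs plus the largest such weight."""
    # An MTS is satisfied when every one of its terms occurs in the document.
    satisfied = [w for terms, w in weighted_mts if terms <= doc_terms]
    if not satisfied:
        return 0.0
    return sum(satisfied) + max(satisfied)


print(rsv({"dnapl", "colloid", "transport"}))  # 0.5 + 0.25 + 0.5 = 1.25
print(rsv({"hydrogeology"}))                   # 1.0 + 1.0 = 2.0
print(rsv({"soil"}))                           # no MTS satisfied: 0.0
```

Because documents now receive different numeric scores, the result list can be sorted, which is what turns the non-ranked Boolean output into a ranked one.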
Information-Bridge and CS3 (Example)
• Boolean search request (“Environmental Science Network” form):
– (“Hydrogeology” OR “Dnapl” OR (“Colloid*” AND “Environmental Transport”)).
• Concept (CS3):
– “Hydrogeology”.
– Rule-Base Tree.
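To connect the two forms of request, the sketch below turns a list of term sets into Boolean sub-queries that a Boolean search engine could accept, one AND clause per set. It treats the three disjuncts of the Boolean request above as stand-in term sets; the slides do not list the actual MTSs of the Hydrogeology rule base, and the function names and query syntax are assumptions.

```python
from typing import List

# Stand-in term sets taken from the disjuncts of the Boolean request above.
TERM_SETS: List[List[str]] = [
    ["Hydrogeology"],
    ["Dnapl"],
    ["Colloid*", "Environmental Transport"],
]


def sub_queries(term_sets: List[List[str]]) -> List[str]:
    """Build one Boolean sub-query per term set by AND-ing its quoted terms."""
    return [" AND ".join(f'"{term}"' for term in terms) for terms in term_sets]


def combined_query(term_sets: List[List[str]]) -> str:
    """OR the sub-queries together, parenthesizing multi-term clauses."""
    clauses = [
        f"({query})" if len(terms) > 1 else query
        for terms, query in zip(term_sets, sub_queries(term_sets))
    ]
    return "(" + " OR ".join(clauses) + ")"


print(sub_queries(TERM_SETS))
print(combined_query(TERM_SETS))
```

Sending the sub-queries separately keeps track of which term sets each document satisfies, which is the information a later scoring step would need.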
CS3 Hydrogeology Rule Base
CS3 search results
Deep Web Sources
• Also referred to as the hidden Web or invisible Web.
• Resides in databases behind search forms, e.g., monster.com, louisiana1st.com, PubMed.
• Web pages in deep Web are generated dynamically based on the submitted queries.
• Not indexed by current search engines. Search engines index content on the surface Web.
Deep Web Sources and Concept-based Retrieval
• Deep Web in terms of size and quality:
– Size (Deep Web) = 500 * Size (Surface Web)
– Quality (Deep Web) = 1000 * Quality (Surface Web)
• Queries submitted at deep Web sources are more stable compared to queries submitted to search engines.
• So, naturally, concept-based retrieval is more suitable for deep Web sources.
Current and Future Work
• Conduct experiments to evaluate effectiveness (future).
• Investigate alternative methods to compute RSVs [KADR00, KDR01*].
• Learning edge weights through relevance feedback [KR00].
• Thesaurus-based rule-base generation [KLR00].
Relevant URLs
[LJRT99*]
Raghavan Home
Publications since 1991
www.allinonenews.com