Leveraging Publisher’s Search Engines to Deliver Relevant Results to Users

Presented by Abe Lederman, President and CTO

Deep Web Technologies, LLC

28th Annual Scholarly Publishing Meeting – Virginia – June 9, 2006

Abe’s Background

• Earned B.S. and M.S. Computer Science degrees, MIT

• 18 years experience developing sophisticated information retrieval applications

• Cofounded Verity, 1988

• Consulted to LANL, 1994-2000

• Deployed first “federated search” portal in the Federal government, 1999

• Founded Deep Web Technologies (DWT), 2002

DWT is a New Mexico-based company focused on providing state-of-the-art software solutions that search, retrieve, aggregate, and analyze content from web-based databases.

The Problem: Searching a large number of sources can lead to a flood of results

Relevance ranking begins as soon as the user clicks the Search button

Ranking Recipe

INGREDIENTS

• Source Selection

• Query Language

• Search Conductor

• Ranking Algorithms

MIX WELL AND SERVE UP RELEVANT RESULTS

Source Selection Optimizer

[Diagram labels: Source Descriptions, Previous Results, Source Selection Optimizer, Search Conductor]

Powerful Query Language

• Takes advantage of search capabilities of each source

• Supports full Boolean operators where possible

• Supports fielded search

• Translates natural language questions into query syntax
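The fielded-search and Boolean bullets above can be illustrated with a minimal sketch of per-source query translation. This is not DWT's implementation: the SourceCapabilities class, the translate_query function, and the "ti" field prefix are hypothetical names chosen for illustration, and real publisher syntaxes vary widely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceCapabilities:
    """Hypothetical description of what one publisher's search engine accepts."""
    name: str
    supports_boolean: bool
    title_field: Optional[str] = None  # field prefix for title search, if any


def translate_query(terms: List[str], title_terms: List[str],
                    source: SourceCapabilities) -> str:
    """Build a query string for one source from a source-neutral query.

    terms may match anywhere; title_terms are wanted in the title.
    """
    parts = list(terms)
    for word in title_terms:
        if source.title_field:
            parts.append(f"{source.title_field}:{word}")
        else:
            parts.append(word)  # fall back to an unfielded term
    # Use explicit Boolean AND only where the source supports it.
    joiner = " AND " if source.supports_boolean else " "
    return joiner.join(parts)


full = SourceCapabilities("full-featured", supports_boolean=True, title_field="ti")
basic = SourceCapabilities("basic", supports_boolean=False)
print(translate_query(["federated"], ["search"], full))
print(translate_query(["federated"], ["search"], basic))
```

The same neutral query degrades gracefully: a source without fielded or Boolean support still receives every term, just with less precision.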

Search Conductor

[Flowchart: Select sources to search; Perform Search; Enough good results? (YES / NO); Can I get more results from “good” sources? (YES / NO); Get Next Results; Deliver results to user]
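One possible reading of the Search Conductor flowchart can be sketched as a loop. The arrows of the original diagram are not recoverable from the transcript, so the control flow here is an assumption: stop when there are enough good results, otherwise keep paging only through sources that returned good results in the last round. The names conduct_search, is_good, enough, and max_rounds are hypothetical.

```python
def conduct_search(sources, is_good, enough=10, max_rounds=5):
    """Collect good results, paging further only into productive sources.

    sources: dict mapping a source name to an iterator of result pages (lists).
    is_good: predicate deciding whether a single result counts as "good".
    """
    good_results = []
    active = dict(sources)  # "Select sources to search"
    for _ in range(max_rounds):
        productive = {}
        for name, pages in active.items():
            page = next(pages, [])  # "Perform Search" / "Get Next Results"
            hits = [r for r in page if is_good(r)]
            good_results.extend(hits)
            if hits:
                productive[name] = pages
        if len(good_results) >= enough:  # "Enough good results?" YES
            break
        if not productive:  # no "good" source has more to give
            break
        active = productive  # continue only with the "good" sources
    return good_results  # "Deliver results to user"


demo_sources = {
    "a": iter([[1, 2], [3, 4], [5]]),
    "b": iter([[-1], [-2]]),
}
print(conduct_search(demo_sources, is_good=lambda r: r > 0, enough=4))
```

In the demo, source "b" returns nothing good in the first round and is dropped, so the second round spends its effort only on source "a".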

Challenges in Organizing and Ranking Results

• Multi-tier Relevance Ranking

• User-driven Ranking

• Clustering of Results

Multi-tier Relevance Ranking

• QuickRank – Ranks results based on occurrence of search terms in title, author, and snippet

• MetaRank – Ranks results utilizing custom algorithms applied to meta-data

• DeepRank – Downloads and indexes full-text documents

HEAVY LIFTING REQUIRED!
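The first tier, QuickRank, is described as ranking on occurrences of search terms in title, author, and snippet. A minimal sketch of that idea follows. The field weights and the whitespace tokenization are assumptions made for illustration, not the actual QuickRank algorithm, and quick_rank is a hypothetical name.

```python
def quick_rank(results, query, weights=None):
    """Order results by weighted counts of query terms in three fields.

    results: list of dicts with optional "title", "author", "snippet" strings.
    weights: per-field multipliers (assumed values; titles count most here).
    """
    weights = weights or {"title": 3.0, "author": 2.0, "snippet": 1.0}
    terms = query.lower().split()

    def score(result):
        total = 0.0
        for field, weight in weights.items():
            # Naive tokenization: split on whitespace, no punctuation handling.
            words = result.get(field, "").lower().split()
            total += weight * sum(words.count(term) for term in terms)
        return total

    return sorted(results, key=score, reverse=True)


demo_results = [
    {"title": "Ocean currents", "author": "A. Smith",
     "snippet": "deep web search overview"},
    {"title": "Deep web search", "author": "B. Jones",
     "snippet": "federated search"},
]
for item in quick_rank(demo_results, "deep web"):
    print(item["title"])
```

Because this tier needs only the fields already present in a result list, it can run as soon as results arrive, which is presumably why it is the cheapest of the three tiers.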

User-driven Ranking

• Credibility of source

• Date range

• Document length

• Document type

• Geographic proximity

• Popularity of document

• Reading level

• Relevance

Desired: Blending (weighting) of the above criteria
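The desired blending can be sketched as a weighted average of per-criterion scores. This assumes each criterion has already been normalized to the range 0 to 1 and that the user supplies non-negative weights. The functions blended_score and rank_by_preferences are hypothetical, and the slide does not say how blending would actually be done.

```python
def blended_score(scores, weights):
    """Weighted average of criterion scores.

    scores: criterion name -> value in [0, 1] (assumed pre-normalized).
    weights: criterion name -> non-negative user-chosen weight.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def rank_by_preferences(documents, weights):
    """Sort documents (dicts with a "scores" mapping) by blended score."""
    return sorted(documents,
                  key=lambda doc: blended_score(doc["scores"], weights),
                  reverse=True)


docs = [
    {"id": "a", "scores": {"relevance": 0.9, "credibility": 0.2}},
    {"id": "b", "scores": {"relevance": 0.6, "credibility": 0.9}},
]
# A user who cares only about relevance, then one who weighs both equally.
print([d["id"] for d in rank_by_preferences(docs, {"relevance": 1.0, "credibility": 0.0})])
print([d["id"] for d in rank_by_preferences(docs, {"relevance": 1.0, "credibility": 1.0})])
```

Changing the weights reorders the same result set without re-running any search, which is what makes this kind of ranking user-driven.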

Clustering

Attributes of Successful Federated Search

• Powerful query language that takes advantage of publisher search capabilities

• Source selection optimizer that reduces unnecessary searches

• Search conductor that gets more results from sources bringing back good results

• A tool that highlights the best search results

• Caching of search results
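Caching of search results, the last attribute above, can be sketched as a small time-limited cache keyed by source and query. The ResultCache class, its ttl_seconds parameter, and the search_fn callback are hypothetical. The slides do not describe how caching is implemented.

```python
import time


class ResultCache:
    """Keeps results per (source, query) for a limited time."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}

    def get_or_search(self, source, query, search_fn):
        """Return cached results if fresh; otherwise call search_fn and store."""
        key = (source, query.strip().lower())  # simple query normalization
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        results = search_fn(source, query)
        self._entries[key] = (now, results)
        return results


calls = []

def fake_search(source, query):
    """Stand-in for a real publisher search; records each call."""
    calls.append((source, query))
    return [f"{source}:{query}"]

cache = ResultCache(ttl_seconds=60.0)
cache.get_or_search("publisher-x", "Deep Web", fake_search)
cache.get_or_search("publisher-x", "deep web ", fake_search)  # served from cache
print(len(calls))
```

A repeated query within the time limit never reaches the publisher, which reduces load on the source and shortens response time for the user.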

Advice for Publishers

• Use good search engines with good relevance ranking

• Return 100 or more results at a time

• Return meta-data (author, journal, snippet) as part of result list

• Provide access to your content through XML Gateway or Web Services

• Speed up search time

Abe Lederman

301 N Guadalupe, Ste 201

Santa Fe, NM 87501

abe@deepwebtech.com

www.deepwebtech.com

Thank You!
