data-oriented content query system: searching for data into text on the web

14
Data-oriented Content Query System: Searching for Data into Text on the Web anwei Zhou, Kevin Chen-Chuan Chang partment of Computer Science UC 1

Upload: sylvia

Post on 21-Mar-2016

30 views

Category:

Documents


2 download

DESCRIPTION

Data-oriented Content Query System: Searching for Data into Text on the Web. Mianwei Zhou, Kevin Chen-Chuan Chang Department of Computer Science UIUC. Many Web Applications Try to Exploit the “Content” of Web Pages. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Data-oriented Content Query System: Searching for Data into Text on the Web

1

Data-oriented Content Query System: Searching for Data into Text on the Web

Mianwei Zhou, Kevin Chen-Chuan ChangDepartment of Computer ScienceUIUC

Page 2: Data-oriented Content Query System: Searching for Data into Text on the Web

Many Web Applications Try to Exploit the “Content” of Web Pages.

Web Info Extractio

n

Typed Entity Search

Web-based

Q/A

In most cases, what we really want are not pages, but the information units inside.

? ?

2

Page 3: Data-oriented Content Query System: Searching for Data into Text on the Web

3

Content-Exploiting Application 1:

Web Information Extraction

Specialized Information Extractors

Web Information Extraction (WIE)(Marius 2006, Cafarella 2005, Etzioni 2004)

Pattern: “#Number

people die of #Disease each

year”

Disease DeathInfluenza 63730 Penumonia

61776

… …Limitation

•Focus on simple patterns.

•Lack of interactivity.

Page 4: Data-oriented Content Query System: Searching for Data into Text on the Web

Content-Exploiting Application 2:

Web-based Question AnsweringWeb-based Question Answering (WQA)

(Wu 2007, Lin 2003, Brill 2002)How many

people die from seasonal flue each year in

US? Keywords:“seasonal flu death”

Parse Top-k results

Around 36,000

Limitation

•Only rely on top-k pages to

retrieve the answer.

4

Page 5: Data-oriented Content Query System: Searching for Data into Text on the Web

Content-Exploiting Application 3:

Typed-Entity SearchTyped-Entity Search (TES)

(Cheng 2007, Cafarella 2007, Chakrabarti 2006)

Amazon Phone

……0.60

0.80

0.90

Ranked Entity List

But … Where is Professor

Limitation

•Limited Number of Data Type

•Lack of Flexibility

5

Page 6: Data-oriented Content Query System: Searching for Data into Text on the Web

General System for Querying Text “Content”, Much Like How DBMS Supports Data Application

? ?Data-oriented Content Query System

Web Info Extractio

n

Typed Entity Search

Web-based QA

Requirements1. Extensible Data Types2. Flexible Contextual

Patterns3. Customizable Scoring

Page 7: Data-oriented Content Query System: Searching for Data into Text on the Web

System FrameworkInput: CQL

Output: Table

Input: CQL (Content Query Language)

OutputExpert Users

Entity Searc

hWeb QA

Common Users

Data-oriented Content Query System

Page 8: Data-oriented Content Query System: Searching for Data into Text on the Web

8

Demo

Online Demohttp://wisdm.cs.uiuc.edu/demos/docqs

Page 9: Data-oriented Content Query System: Searching for Data into Text on the Web

Why Relational Model:Occurrence->Tuple

What we need Relational Model

Person OrganizationLocation

Number

Number

Person

Organization

Location

Page 10: Data-oriented Content Query System: Searching for Data into Text on the Web

What we need

Why Relational Model: Content Query->Relational Operation

Relational Model

Find the population of China

WHERE pattern(…)

GROUP BY #number

ORDER BY conf()

FROM #number

China has a population of 1.3 billion

China with its population of 1.3 billion people

China is established in 1949.

Shanghai is the largest city with 15 million inhabitants in China

1.3 billion 15 million

1. 1.3 billion

2. 15 million

Page 11: Data-oriented Content Query System: Searching for Data into Text on the Web

What we need

Why Relational ModelExtensible Data Type->View

Relational Model

Number

Location

Person

PopulationPhonePrice

CapitalHeadquarter

ProfessorCEOPresident

TableView

Number

population

price

phone

Page 12: Data-oriented Content Query System: Searching for Data into Text on the Web

Main Technique: Highly Efficient and Scalable Index Structure

Index Layer

Parsing Layer

Index Selection Module

Execution Tree

INPUTSELECT

…FROM …WHERE

OUTPUT

Index DesignSpecial Inverted

Index•Contextual Index•Join Index

Query OptimizationGraph Coverage

Problem

Data Type

Repository

Data Type Definition

Experimental Result

• Speed improvement: 6-10

times•Space overhead: Around 2

times original corpus size.

Page 13: Data-oriented Content Query System: Searching for Data into Text on the Web

Data-oriented Content Query System: Supporting Ad-hoc, Interactive “Content” Search.

Data-oriented Content Query System

Web Info Extraction

Typed Entity Search

Web-based Q/A

Interactive

Ad-hoc

Page 14: Data-oriented Content Query System: Searching for Data into Text on the Web

14

Q & AThank You!