Post on 20-Dec-2015


Data Extraction From HTML Tables

Cui Tao

Department of Computer Science

Brigham Young University

Information In Tables

Nowadays, a significant portion of the information on the Web is stored in tables.

The Ontology-Based Extraction

Major Problems

In tables, values and their corresponding attributes are stored separately, but the ontology can only extract data when the two appear together.

Sometimes the attributes in the table are the values in the database, and the values in the table serve only as identifiers for those attributes.

Sometimes the value in a single table cell may supply several attribute values in the database.

Attribute-Value Pair

Attribute: (part of the) constant/keyword rule

How To Solve This Problem?

Put the attribute-value pair together. Try both orders.
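
A minimal sketch of this pairing step, assuming the table has already been parsed into a header row and a data row. The function name pair_cells and the sample car-ad columns are hypothetical, chosen only for illustration; the slides do not name an implementation.

```python
def pair_cells(header, row):
    """Return text snippets that place each attribute next to its value.

    Both orders are produced ("attribute value" and "value attribute"),
    so a keyword/constant rule can match whichever order it expects.
    """
    snippets = []
    for attribute, value in zip(header, row):
        snippets.append(f"{attribute} {value}")  # attribute first
        snippets.append(f"{value} {attribute}")  # value first
    return snippets


# Hypothetical sample table: one header row and one data row.
header = ["Year", "Price", "Mileage"]
row = ["1998", "$4,500", "82,000"]
snippets = pair_cells(header, row)
```

Each snippet can then be handed to the extraction rules as if it were ordinary running text.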

More General…

The attributes in the table are actually values in the database…

Attribute Value

How To Solve This Problem?

Whether the attribute is put in the file depends on the Boolean value.
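
A minimal sketch of this rule, assuming the Boolean columns are known in advance and that cells such as "Yes" or "X" mean true. The set TRUE_MARKS, the function emit_boolean_attributes, and the sample columns are all hypothetical.

```python
# Assumed set of cell contents that count as "true"; adjust per table.
TRUE_MARKS = {"yes", "y", "x", "true", "1"}


def emit_boolean_attributes(header, row, boolean_columns):
    """Build the lines written to the file for one table row.

    For a Boolean column, the attribute name itself is written only when
    the cell is true. Other columns are written as "attribute value".
    """
    emitted = []
    for attribute, value in zip(header, row):
        if attribute in boolean_columns:
            if value.strip().lower() in TRUE_MARKS:
                emitted.append(attribute)  # the attribute is the database value
        else:
            emitted.append(f"{attribute} {value}")
    return emitted


# Hypothetical sample row with two Boolean feature columns.
header = ["Model", "Air Conditioning", "CD Player"]
row = ["Accord", "Yes", "No"]
result = emit_boolean_attributes(header, row, {"Air Conditioning", "CD Player"})
```

Here "CD Player" is dropped because its cell is false, while "Air Conditioning" is kept as a value.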

Value Multiple Information
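
An illustration of this case only, not a method taken from the slides: assume one cell holds text such as "1998 Honda Accord 4 dr", and that each database attribute has its own recognizer. The RECOGNIZERS patterns and the split_cell function are hypothetical.

```python
import re

# Assumed per-attribute recognizers; a real ontology would supply its own rules.
RECOGNIZERS = {
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "Doors": re.compile(r"\b(\d)\s*(?:dr|door)\b", re.IGNORECASE),
}


def split_cell(cell_text):
    """Return every attribute value that can be recognized in one cell."""
    found = {}
    for attribute, pattern in RECOGNIZERS.items():
        match = pattern.search(cell_text)
        if match:
            # Use the captured group when the pattern has one, else the whole match.
            found[attribute] = match.group(match.lastindex or 0)
    return found


values = split_cell("1998 Honda Accord 4 dr")
```

One cell thus yields two attribute values, one for Year and one for Doors.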

More Problems …
