Data Extraction From HTML Tables
Cui Tao
Department of Computer Science
Brigham Young University
Information In Tables
Nowadays, a significant portion of the information on the Web is stored in tables.
The Ontology-Based Extraction
Major Problems
In a table, values and their corresponding attributes appear separately, but the ontology can extract data only when they appear together.
Sometimes the attributes in the table are the values in the database, and the values in the table serve only as indicators for those attributes.
Sometimes the value in a single table cell carries several attribute values for the database.
Attribute-Value Pair
Attribute: (part of the) constant/keyword rule
How To Solve This Problem?
Put the attribute and its value together as a pair. Try both orders.
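The pairing idea can be sketched in Python as follows. This is only an illustration, not the implementation behind these slides: the function names `pair_cells` and `both_orders` and the sample car-ad data are hypothetical, and the table is assumed to be already parsed into a header list and a row list.

```python
def pair_cells(headers, row, attribute_first=True):
    """Join each column header with the cell below it into one text fragment."""
    pairs = []
    for attribute, value in zip(headers, row):
        value = value.strip()
        if not value:
            continue  # skip empty cells: there is nothing to extract
        if attribute_first:
            pairs.append(f"{attribute} {value}")
        else:
            pairs.append(f"{value} {attribute}")
    return pairs


def both_orders(headers, row):
    """Produce both orderings so the extractor can try each one."""
    return {
        "attribute_first": pair_cells(headers, row, attribute_first=True),
        "value_first": pair_cells(headers, row, attribute_first=False),
    }


# Hypothetical sample row (not taken from the slides).
headers = ["Year", "Price", "Mileage"]
row = ["1999", "$4,500", "85,000"]
result = both_orders(headers, row)
print(result["attribute_first"])
print(result["value_first"])
```

Generating both orders matters because a keyword rule may expect the keyword before the value ("Price $4,500") or after it ("85,000 Mileage").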
More General…
The attributes in the table are actually values in the database…
Attribute Value
How To Solve This Problem?
Put the attribute in the file depending on the Boolean value in its cell.
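A minimal Python sketch of this step, under stated assumptions: the set of strings treated as "true" (`TRUE_MARKERS`), the function name `attributes_marked_true`, and the sample feature columns are all hypothetical and do not come from the slides.

```python
# Assumption: cell contents that count as a Boolean "true" marker.
TRUE_MARKERS = {"yes", "y", "true", "x", "1"}


def attributes_marked_true(headers, row):
    """Return the column headers whose cell holds a true marker.

    The header itself is the value to extract; the cell only says
    whether that value applies to this row.
    """
    return [
        attribute
        for attribute, cell in zip(headers, row)
        if cell.strip().lower() in TRUE_MARKERS
    ]


# Hypothetical sample row (not taken from the slides).
headers = ["Air Conditioning", "Sunroof", "CD Player"]
row = ["Yes", "No", "X"]
features = attributes_marked_true(headers, row)
print(features)
```

Only the attributes whose cells are marked true are written out, so the extractor sees "Air Conditioning" and "CD Player" as plain values and never sees "Sunroof".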
Value Multiple Information
More Problems …