Assuming Accurate Layout Information for Web Documents is Available, What Now?
Hassan Alam, Rachmat Hartono, Aman Kumar, Fuad Rahman, Yuliya Tarnikova and Che Wilcox
Human Computer Interaction Group
BCL Technologies Inc., Santa Clara, CA 95050
www.bcltechnologies.com
[email protected]@bcltechnologies.com
Overview of the talk
Web pages vs. document layout
Why do we need layout information?
Web page summarization for handheld devices
The future: Marrying Ontology with XML
Conclusion and Future Work
Related Work
Handcrafting
Transcoding
Adaptive Re-authoring
Handcrafting typically involves a set of content experts crafting web pages by hand for device-specific output.
Transcoding replaces HTML tags with suitable device specific tags, such as HDML, WML and others.
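The tag replacement described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the mapping `TAG_MAP`, the class `Transcoder` and the function `transcode` are hypothetical names, the mapping covers a handful of tags, and the output is not guaranteed to be valid WML.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical, deliberately tiny mapping from HTML tags to WML tags.
# Tags that are not listed are dropped, but their text is kept.
TAG_MAP = {"h1": "big", "h2": "big", "strong": "b", "b": "b",
           "em": "i", "i": "i", "p": "p"}


class Transcoder(HTMLParser):
    """Collects output while replacing HTML tags with mapped WML tags."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        target = TAG_MAP.get(tag)
        if target:
            self.out.append(f"<{target}>")

    def handle_endtag(self, tag):
        target = TAG_MAP.get(tag)
        if target:
            self.out.append(f"</{target}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again for the output markup.
        self.out.append(escape(data))


def transcode(html):
    """Return a single-card WML-style document for the given HTML fragment."""
    parser = Transcoder()
    parser.feed(html)
    parser.close()
    return '<wml><card id="main">' + "".join(parser.out) + "</card></wml>"


print(transcode("<h1>News</h1><div><strong>Hi</strong> &amp; bye</div>"))
```

A pure tag-for-tag replacement like this keeps all of the text, which is why it does nothing to reduce the amount of content sent to a small display.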
Research on adaptive web page re-authoring either uses natural language processing (NLP) explicitly or relies on non-NLP techniques.
Web Page Summarization for Handheld Devices
Web Page Data Structure
Content Analysis
Content Processing for Re-authoring
Verbatim
Transcode
Summarize
Node Merging
Representing the Complete Web page
When to Summarize?
Creating a label
Creating a Summary
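The three output modes listed above (verbatim, transcode, summarize) can be sketched as a per-node decision over a tree of page content. The talk does not state the actual criteria, so everything here is an assumption made for illustration: the `Node` structure, the character budget `max_chars`, the set `SUPPORTED_TAGS`, and the label rule that keeps the first few words.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    VERBATIM = "verbatim"
    TRANSCODE = "transcode"
    SUMMARIZE = "summarize"


@dataclass
class Node:
    """One node of a hypothetical web page data structure."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


# Assumed set of tags the target device can display unchanged.
SUPPORTED_TAGS = {"p", "b", "i", "a"}


def full_text(node):
    """Concatenate the text of a node and all of its descendants."""
    parts = [node.text] + [full_text(child) for child in node.children]
    return " ".join(p for p in parts if p)


def choose_mode(node, max_chars=80):
    """Assumed rule: summarize long content, transcode unsupported markup."""
    if len(full_text(node)) > max_chars:
        return Mode.SUMMARIZE
    if node.tag not in SUPPORTED_TAGS:
        return Mode.TRANSCODE
    return Mode.VERBATIM


def make_label(node, max_words=5):
    """Assumed label rule: the first few words of the node's text."""
    words = full_text(node).split()
    suffix = "..." if len(words) > max_words else ""
    return " ".join(words[:max_words]) + suffix


article = Node("div", "", [Node("p", "word " * 30)])
print(choose_mode(article), make_label(article))
```

A summarized node would then be shown as its label, with the full content one selection away.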
The Future: Marrying Ontology with XML
We assume that we have layout information for a web page.
What do we do then? How do we use this information? How does that information help us get better re-authoring solutions?
We then define an ontology for the domain of web pages.
We define an XML format to encode that information.
What is Ontology and How do We Define it?
Ontology is a specification of a conceptualization.
Ontology establishes a joint terminology between members of a community of interest.
These members can be human or automated agents.
A list of elements
Concept hierarchy
Concept association
Rules or axioms
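The four ingredients listed above can be held in one small data structure. The following Python sketch is an illustration only: the class `Ontology`, its field names, the example concepts and the single axiom are all hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    elements: set = field(default_factory=set)       # a list of elements
    hierarchy: dict = field(default_factory=dict)    # concept hierarchy: child -> parent
    associations: set = field(default_factory=set)   # (subject, relation, object)
    rules: list = field(default_factory=list)        # axioms: callables Ontology -> bool

    def ancestors(self, concept):
        """Walk up the hierarchy, stopping if a cycle is detected."""
        seen = []
        while concept in self.hierarchy and self.hierarchy[concept] not in seen:
            concept = self.hierarchy[concept]
            seen.append(concept)
        return seen

    def is_a(self, concept, ancestor):
        return concept == ancestor or ancestor in self.ancestors(concept)

    def is_consistent(self):
        """True when every rule holds for this ontology."""
        return all(rule(self) for rule in self.rules)


def hierarchy_uses_known_elements(onto):
    """Example axiom: the hierarchy may only mention listed elements."""
    return all(child in onto.elements and parent in onto.elements
               for child, parent in onto.hierarchy.items())


# Hypothetical fragment of an ontology for the domain of web pages.
web_page = Ontology(
    elements={"Block", "TextBlock", "Headline", "Paragraph", "NavigationBar"},
    hierarchy={"TextBlock": "Block", "Headline": "TextBlock",
               "Paragraph": "TextBlock", "NavigationBar": "Block"},
    associations={("Headline", "introduces", "Paragraph")},
    rules=[hierarchy_uses_known_elements],
)

print(web_page.ancestors("Headline"), web_page.is_consistent())
```

Keeping the rules as plain functions means new axioms can be added without changing the structure itself.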
Web Page Summarization for Handheld Devices using Ontology
Web Page Data Structure
Content Analysis
Content Processing for Re-authoring
Verbatim
Transcode
Summarize
Node Merging
Representing the Complete Web page
When to Summarize?
Creating a label
Creating a Summary
Output Level Decided
Use Ontology to re-format the web page
XML Structure Derived
Device Specific Display
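The last steps of this pipeline, using the ontology to re-format the page and deriving an XML structure, might look like the following Python sketch. It assumes that layout analysis has already labelled each block with a concept; the `HIERARCHY` table, the element names and the function `to_xml` are hypothetical, and the XML vocabulary is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology fragment: layout concept -> parent concept.
HIERARCHY = {
    "Headline": "Content",
    "Paragraph": "Content",
    "NavigationBar": "Navigation",
    "Footer": "Boilerplate",
}


def to_xml(blocks, keep=("Content",)):
    """Group labelled blocks under their parent concept and emit XML.

    blocks is a list of (concept, text) pairs; only blocks whose parent
    concept is listed in keep are written to the output.
    """
    root = ET.Element("page")
    for concept, text in blocks:
        parent = HIERARCHY.get(concept, "Unknown")
        if parent not in keep:
            continue
        section = root.find(parent.lower())
        if section is None:
            section = ET.SubElement(root, parent.lower())
        ET.SubElement(section, concept.lower()).text = text
    return ET.tostring(root, encoding="unicode")


blocks = [("Headline", "News"), ("Footer", "Contact us"), ("Paragraph", "Body text")]
print(to_xml(blocks))
```

A device-specific display step could then render this XML for the target device, for example by mapping each element to the markup that device supports.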
What is the Advantage of using Ontology?
It improves the quality of the output in many ways.
It becomes possible to capture the contextual relationship among various components within the document.
It leads to better understanding of the information contained within the document.
This additional information can be used in other processes, such as document categorization and contextual search.
Future Work
It is assumed that the future of mobile browsing lies in the adoption of semantic web technology.
Until that is realized, the proposed approach offers a workable compromise for generating high-fidelity re-authored web pages.
This is an exploratory paper offering a specific pathway to the future of web page re-authoring, provided accurate layout information is available.
Currently, it is beyond the capability of any algorithm to achieve this level of accuracy. However, approximations to that accuracy are attainable and even practical. It will be interesting to discuss other possibilities in this space.
Conclusions
Some ideas were presented on how to produce better web page re-authoring solutions by using linguistic knowledge and ontology, assuming accurate layout information for web pages is available.
It is shown that such an approach will produce high-quality, intelligent summaries of web pages, allowing fast and efficient web browsing on small-display handheld devices.