
Model-Directed Web Transactions Under Constrained Modalities

ZAN SUN, JALAL MAHMUD, and I. V. RAMAKRISHNAN
Stony Brook University
and
SAIKAT MUKHERJEE
Siemens Corporate Research

Online transactions (e.g., buying a book on the Web) typically involve a number of steps spanning several pages. Conducting such transactions under constrained interaction modalities, as exemplified by small-screen handhelds or interactive speech interfaces (the primary mode of communication for visually impaired individuals), is a strenuous, fatigue-inducing activity. But usually one needs to browse only a small fragment of a Web page to perform a transactional step such as a form fillout, selecting an item from a search results list, and so on. We exploit this observation to develop an automata-based process model that delivers only the relevant page fragments at each transactional step, thereby reducing information overload on such narrow interaction bandwidths. We realize this model by coupling techniques from content analysis of Web documents, automata learning, and statistical classification. The process model and associated techniques have been incorporated into Guide-O, a prototype system that facilitates online transactions using a speech/keyboard interface (Guide-O-Speech), or with limited-display-size handhelds (Guide-O-Mobile). Performance of Guide-O and its user experience are reported.

Categories and Subject Descriptors: I.7.5 [Document and Text Processing]: Document Capture - Document analysis

General Terms: Algorithms, Human Factors

Additional Key Words and Phrases: Web transaction, content adaption, assistive device

This work was supported in part by NSF grants CCR-0311512 and IIS-0534419.

This article is an extended version of our paper "Model-Directed Web Transactions under Constrained Modalities," which appears in the proceedings of the International World Wide Web Conference 2006 (WWW '06).

Authors' addresses: Z. Sun, J. Mahmud, and I. V. Ramakrishnan, Department of Computer Science, Stony Brook University, Stony Brook, NY 11794; email: {zsun,jmahmud,ram}@cs.sunysb.edu; S. Mukherjee, Siemens Corporate Research, Princeton, NJ 08540; email: [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].

© 2007 ACM 1559-1131/2007/09-ART12 $5.00 DOI 10.1145/1281480.1281482 http://doi.acm.org/10.1145/1281480.1281482

ACM Transactions on the Web, Vol. 1, No. 3, Article 12, Publication date: September 2007.


ACM Reference Format:
Sun, Z., Mahmud, J., Ramakrishnan, I. V., and Mukherjee, S. 2007. Model-directed web transactions under constrained modalities. ACM Trans. Web 1, 3, Article 12 (September 2007), 34 pages. DOI = 10.1145/1281480.1281482 http://doi.acm.org/10.1145/1281480.1281482

1. INTRODUCTION

The World Wide Web has become the dominant medium for doing e-commerce. The volume of goods bought from online stores continues to grow dramatically. A Web transaction such as buying a CD player from an online store involves a number of user steps spanning several Web pages. As an illustration let us examine some common steps for buying a CD player from Best Buy (http://www.bestbuy.com). To begin, the user fills out the search form with "electronics" as the category and "CD Player" as the item. The search result generated in response is shown in Figure 1(b). The user selects the very first item in the result list, which leads to the page in Figure 1(c) containing product details. To complete the transaction the user adds this selected item to the cart, which leads to the page in Figure 1(d), wherein the user selects checkout. Observe that there are two essential components to a Web transaction: (i) locating the relevant content, such as a search form or the desired item in a Web page, and (ii) performing a sequence of steps, such as filling out a search form, selecting an item from the search list, and doing a checkout. For completing a transaction these steps usually span several pages.

Online transactions such as the one described above are usually performed with graphical Web browsers. The primary mode of interaction with graphical browsers is visual, an intrinsically spatial modality. Hence, users can quickly scan through the rich, engaging content in Web pages scripted for e-commerce and locate the objects of interest quite easily. Moreover, the spatial organization of content in these pages helps users comprehend the sequence of steps necessary to complete a transaction.

Now consider scenarios where visual interaction is impossible (e.g., when the user is a visually handicapped individual) or the interaction media has small displays (e.g., mobile handhelds). Speech is the primary modality of interaction in the former situation. Speech interfaces and small displays offer narrow interaction bandwidths, making it cumbersome and tedious to get to the pertinent content in a page. For instance, state-of-the-art screen readers and audio browsers (e.g., JAWS and IBM's Home Page Reader) provide almost no form of filtering of the content in a Web page, resulting in severe information overload. This problem is further exacerbated when such an interaction spans several pages, as in an online transaction. In particular, the loss of spatially organized content makes it difficult for users to comprehend the sequence of transactional steps. While content summarization can compensate somewhat for this loss, it alone cannot handle the information overload that the user faces.

Thus, there is a need for techniques that facilitate Web transactions using constrained interaction modalities and that are far less cumbersome than current approaches. This article addresses this problem and describes our solution.



Our approach to semantic concept identification in Web pages is built upon our previous work [Mukherjee et al. 2005], where we had proposed a technique for partitioning a page's content into segments constituting concept instances. It uses a combination of structural analysis of the page and machine learning. We adopt it for the problem addressed in this paper and enhance its learning component to produce more robust statistical models of semantic concepts. It is worth noting that the use of content semantics, as opposed to syntax-based techniques for identifying concepts, makes it more robust to structural variations in the pages and scalable over Web sources that share similar content semantics.

We also use automata learning techniques (see Murphy [1996] for a survey) to construct process models from training sequences generated from real Web transactions. The use of process models for online transactions bears similarities to the emerging Web services paradigm for conducting automated transactions. But the fundamental difference is that our technique works on Web pages instead of services exposed by a service provider (see Section 6 for a detailed comparison).

The rest of this article is organized as follows: In Section 2, we describe a user scenario and the architecture of Guide-O, a prototype system that we implemented based on the techniques detailed in this paper. Currently, Guide-O can be configured to work with speech (Guide-O-Speech) or with small-display handhelds (Guide-O-Mobile). In Section 3 we formalize the process model and describe its implementation using DFA learning techniques. Content analysis techniques for semantic concept identification appear in Section 4. Quantitative experimental evaluation of the two Guide-O configurations as well as qualitative user experience appear in Section 5. Related work appears in Section 6, and we conclude the paper in Section 7.

2. THE GUIDE-O SYSTEM

2.1 Use Scenario of Guide-O-Speech

Alice, who is a visually impaired individual, is planning on replacing her broken CD player with a new one from Best Buy. To begin, she speaks Best Buy's URL to Guide-O. After retrieving the home page Guide-O analyzes this page; extracts the two concepts in it, namely Item Taxonomy (the circled item on the left in Figure 1(b)) and Search Form (the circled item on the top of Figure 1); and asks Alice to choose one of them. Alice says "Search Form" and in response Guide-O reads out the drop-down items in the search form, pausing briefly after each item. Alice can pick an item at any time by saying either the item name or its number. Alice says "Electronics" and Guide-O prompts her for the electronic item she wishes to search for. Alice responds with "CD Player." Guide-O submits the search form filled with these two parameters, which results in fetching the page containing the search results shown in Figure 1(b). Guide-O extracts the Search Result concept and begins reading out the brief description associated with each CD player in this list. Alice says "item 1," causing Guide-O to follow the link associated with the first player (CDP-CE375) in the list to the page containing a detailed description of this player (Figure 1(c)). In this page Guide-O extracts three concepts, namely Search Form, Item Detail, and Add To Cart. Alice is asked if she wishes to hear the product details. When Alice responds in the affirmative, Guide-O reads out the detailed description of the CD player she picked earlier. At the end Alice is asked if she wishes to add this to her shopping cart. Alice responds "yes." Guide-O follows the link labeled Add To Cart in Figure 1(c) to the page shown in Figure 1(d). In this page Guide-O extracts the concepts of Search Form, Shopping Cart, Checkout, and Continue Shopping. When presented with these choices, Alice chooses Checkout. To complete the transaction Alice must provide credit card information upon checkout. Its details have been omitted as they are quite similar to the form filling step described in the first step of the scenario. At any point Alice can also say any one of a set of general-purpose navigation commands, such as "Back to top page," "Start over," "Repeat (last item)," or "Stop." In addition, on a laptop or desktop computer Alice could use a keyboard as well as speech to interact with Guide-O.

2.2 Use Scenario of Guide-O-Mobile

Let us examine how Jane, a Verizon subscriber, will use Guide-O-Mobile to pay her telephone bill. To begin, she submits Verizon's URL to Guide-O-Mobile. After retrieving the home page, Guide-O-Mobile analyzes this page, extracts the My Account concept segment (Figure 2(a)) in it, and displays it to Jane. Jane clicks on the Sign in link, and in response the Sign-in concept segment (Figure 2(b)) is presented and Jane types in the login information. On successful login, the Billing Summary segment (Figure 2(c)) is displayed along with two action choices: View-Bill and Pay-Bill. Jane chooses the former and the Complete Bill segment (Figure 2(d)) is shown. In this page, Jane is presented with two action choices: Pay-Bill and View-Detail. Jane picks the latter, which causes the Bill Details concept segment (Figure 2(e)) to be presented. Jane chooses the Pay-Bill action here and the Bill Payment segment (Figure 2(f)) is displayed. Jane provides payment information in this page. On receipt of this data a Submit-Form action takes place and the transaction proceeds to the page where the payment data is reviewed, and a final sign-out completes the transaction.

The use scenario above illustrates how a user conducts Web transactions with Guide-O-Mobile using mobile handheld devices, where graphical interaction with a standard browser is not feasible because of their small screen size. Instead of displaying the entire Web page to the user, Guide-O-Mobile displays only the extracted concepts of a page. Since these are usually few in number with small content, they can be browsed more easily even under the small screen size constraint.

2.3 Guide-O Architecture

The architecture of Guide-O is shown in Figure 3. It can be configured to operate with speech/keyboard inputs (Guide-O-Speech) or with mobile handheld devices (Guide-O-Mobile). In the latter configuration the Interface Manager automatically generates a DHTML page to display the extracted concepts, while for the former configuration it automatically generates a VoiceXML^1 dialog interface. We use freely available text-to-speech synthesizers and speech recognizers in Guide-O-Speech. We have developed a VoiceXML interpreter to handle both speech and keyboard inputs.

The Content Analyzer partitions an input Web page into segments containing semantically related content elements (such as Item Taxonomy and Search Result in Figure 1). Using an underlying shallow ontology of concepts present in Web pages, it classifies these segments to the concepts in the ontology and labels them with their names. (Section 4 describes the classification technique.) Table I shows concept names in our shallow ontology for the online shopping transaction domain (e.g., books, consumer electronics, office supplies). Table II lists some concept names in a shallow ontology for the utility bill payment transaction domain. Associated with each concept is an operation, as shown in the tables. When the user selects a concept in a state, the corresponding operation is invoked. The ontology also includes information for classification of page segments to concepts.

1 http://www.w3.org/TR/voicexml20/

Fig. 2. Utility bill payment using Guide-O-Mobile.

Fig. 3. Guide-O system architecture.

The Browser Object Interface fetches pages from Web servers. Special features include automatic form fillouts and retrieval of pages pointed to by navigable links, which requires execution of JavaScript.

The Process Model orchestrates all the actions that take place in Guide-O. In a state, it takes a URL as the input, calls the Browser Object Interface to fetch the page, and calls the Content Analyzer to extract from the page the concepts that it expects. The extracted concepts are organized as a concept tree that is dispatched by the Process Model to the Interface Manager for presentation to the user. When the user selects a concept, it is sent to the Process Model as an operation on the concept. A state transition based on the operation is made and the cycle repeats.

The architecture described is in essence the runtime engine that drives the Guide-O system. The Process Model and the Concept Extractor in the engine are learned a priori in the learning component.

Table I. Concepts in Ontology for Online Shopping

Concept              Operation
Shopping Cart        view shoppingcart
Add To Cart          add to cart
Edit Cart            update cart
Continue Shopping    continue shopping
Checkout             check out
Search Form          submit searchform
Search Result        item select
Item List            item select
Item Taxonomy        select item category
Item Detail          show item detail

Table II. Concepts in Ontology for Utility Bill Payment

Concept              Operation
My Account           Sign-In
Sign In              Submit-Form
Billing Summary      Pay-Bill, View-Bill
Complete Bill        Pay-Bill, View-Detail
Bill Payment         Submit-Form
Bill Details         Pay-Bill

3. PROCESS MODEL

Formally, our process model is defined as follows: Let $C = \{c_0, c_1, \ldots\}$ be a set of concepts, and let $I(c)$ denote the set of instances of concept $c$. Let $Q = \{q_0, q_1, \ldots\}$ be a set of states. With every state $q_i$ we associate a set $S_i \subseteq C$ of concepts. Let $O = \{o_0, o_1, \ldots\}$ be a set of operations. An operation $o_i$ can take parameters. A transition $\delta$ is a function $Q \times O \rightarrow Q$, and a concept operation $\rho$ is also a function $C \rightarrow O$. Operations label transitions; that is, if $\delta(q_i, o) = q_j$ then $o$ is the label on this transition. An operation $o = \rho(c)$ is enabled in state $q_i$ whenever the user selects an instance of $c \in S_i$, and when it is enabled a transition is made from $q_i$ to state $q_j = \delta(q_i, o)$.

Technically a concept instance is the occurrence of a concept in a Web page.For example, the circled items in Figure 1 are all concept instances. But forbrevity we choose not to make this distinction explicitly and use concepts andconcept instances interchangeably when referring to content segments in Webpages.

Figure 4 illustrates a process model. The concepts associated with the starting state $q_1$ are Item Taxonomy, Item List, and Search Form. This means that if these concept instances are present in the Web page given to $q_1$ as its input, they will be extracted and presented to the user. The user can select any of these concepts. When the user selects the Search Form concept, he is required to supply the form input, upon which the submit searchform operation is invoked. This amounts to submitting the form with the user-supplied form input. A Web page consisting of the search results is generated and a transition is made to $q_3$. As discussed in the use scenario in Section 2, we have omitted the payment steps following checkout. Hence $q_6$, which is entered upon a check out operation, is deemed the final state.

Fig. 4. A process model for online shopping.

Fig. 5. A process model for utility bill payment.

Figure 5 is an example of a process model for utility bill payment correspond-ing to the use scenario for Guide-O-Mobile described in Section 2.
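To make the formal definition concrete, the following is a minimal Python sketch of a process model as a labeled transition system. It is an illustration under stated assumptions, not the Guide-O implementation: the class `ProcessModel`, its field names, the underscore spelling of operation names, the intermediate states `q4` and `q5`, and every transition other than the `submit_searchform` step from `q1` to `q3` and the `check_out` step into `q6` are hypothetical. The concept-to-operation mapping follows Table I.

```python
from dataclasses import dataclass


@dataclass
class ProcessModel:
    start: str
    finals: set
    concepts: dict      # state -> set of concepts S_i expected in that state
    operation_of: dict  # concept -> operation (the concept operation)
    delta: dict         # (state, operation) -> next state

    def step(self, state, concept):
        """Select an instance of `concept` in `state` and follow the transition."""
        if concept not in self.concepts[state]:
            raise ValueError(f"{concept!r} is not expected in state {state}")
        return self.delta[(state, self.operation_of[concept])]

    def run(self, selections):
        """Apply a sequence of concept selections starting from the start state."""
        state = self.start
        for concept in selections:
            state = self.step(state, concept)
        return state


# Hypothetical shopping model: only q1, q3 and q6 are named in the text.
shopping = ProcessModel(
    start="q1",
    finals={"q6"},
    concepts={
        "q1": {"Item Taxonomy", "Item List", "Search Form"},
        "q3": {"Search Result"},
        "q4": {"Item Detail", "Add To Cart"},
        "q5": {"Shopping Cart", "Checkout", "Continue Shopping"},
        "q6": set(),
    },
    operation_of={
        "Search Form": "submit_searchform",
        "Search Result": "item_select",
        "Item List": "item_select",
        "Item Taxonomy": "select_item_category",
        "Item Detail": "show_item_detail",
        "Add To Cart": "add_to_cart",
        "Shopping Cart": "view_shoppingcart",
        "Continue Shopping": "continue_shopping",
        "Checkout": "check_out",
    },
    delta={
        ("q1", "submit_searchform"): "q3",
        ("q3", "item_select"): "q4",
        ("q4", "add_to_cart"): "q5",
        ("q5", "check_out"): "q6",
    },
)

end_state = shopping.run(["Search Form", "Search Result", "Add To Cart", "Checkout"])
```

In this sketch a transaction is complete when `end_state` is in `shopping.finals`; selecting a concept that a state does not expect raises an error instead of silently moving on.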

Process Model Learning

A process model can be built manually or learned from training data. For simple tasks, users can create their own process models. However, when we do not have sufficient knowledge of how to accomplish the task in a step-by-step process, or there are variations in how the same task is carried out in different sites, manual construction of the process model becomes cumbersome. This is more evident when the resultant model is large (i.e., has a large number of states and transitions). Here we propose a learning-based approach to process model construction, which is generic and scalable. Specifically, we built the process model using DFA learning techniques, a thoroughly researched topic (see Section 6 for related work). In the DFA learning problem the training set consists of two sets of example strings, one labeled positive and the other negative. Only strings in the positive set are in the DFA's language. The objective is to construct a DFA that is consistent with respect to these two sets; that is, it should accept strings in the positive set while rejecting those in the negative set. We adapted the heuristic in Oncina and García [1991] for learning our process model, the choice being mainly dictated by its simplicity and low complexity.

Fig. 6. (a) A prefix automaton. (b) Learned DFA.

The training sequences we used for learning the process model consist of strings whose elements are operations on concepts. The sequence <submit searchform, item select, add to cart, check out> is one such example. These training sequences are (manually) labeled "completed" and "not completed." The positive example set ($S^+$) consists of sequences labeled "completed," while the negative example set ($S^-$) consists of those labeled "not completed."

We first construct a prefix tree automaton, as shown in Figure 6(a), using only the examples in $S^+$. In Figure 6(a), the sequence of operations along each root-to-leaf path constitutes a string in $S^+$. For this example $S^-$ consists of the strings: {<check out>, <submit searchform, add to cart>, <submit searchform, check out>, <select item category, add to cart, check out>}. Each prefix of every string in $S^+$ is associated with a unique state in the prefix tree. The prefixes are ordered lexicographically, and each state in the prefix tree automaton is numbered by the position of its corresponding prefix string in this lexicographic order. Next we generalize the prefix tree automaton by state merging. We choose state pairs $(i, j)$, $i < j$, as candidates for merging. The candidate pair $(i, j)$ is merged if it results in a consistent automaton. For example, merging the pair (1,2) is consistent, whereas (3,4) is not merged because the resulting automaton would accept the string <submit searchform, check out> in $S^-$. The DFA that results upon termination of this merging process on the above example set is shown in Figure 6(b).

Since we do at most $Q^2$ state mergings, where $Q$ is the cardinality of $S^+$, the time complexity is polynomial. In Section 5 experimental evaluation of the performance of the learned process model is presented.
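The prefix-tree construction and consistency-checked state merging described above can be sketched in Python as follows. This is a simplified illustration in the spirit of state-merging DFA learning, not the authors' implementation: the function names, the folding strategy used to keep a merged automaton deterministic, the underscore spelling of operations, and the positive example set are all assumptions (the negative set is the one listed in the text).

```python
def build_prefix_tree(positive):
    """One state per distinct prefix, numbered in lexicographic order."""
    prefixes = sorted({tuple(s[:i]) for s in positive for i in range(len(s) + 1)})
    index = {p: i for i, p in enumerate(prefixes)}
    delta = {(index[p[:-1]], p[-1]): index[p] for p in prefixes if p}
    finals = {index[tuple(s)] for s in positive}
    return len(prefixes), delta, finals


def accepts(delta, finals, string, start=0):
    """Run the DFA on `string`; a missing transition means rejection."""
    state = start
    for op in string:
        if (state, op) not in delta:
            return False
        state = delta[(state, op)]
    return state in finals


def merge_states(delta, finals, a, b):
    """Merge a and b, folding further states so the result stays deterministic."""
    rep = {}

    def find(q):
        while q in rep:
            q = rep[q]
        return q

    pending = [(a, b)]
    while pending:
        x, y = (find(q) for q in pending.pop())
        if x == y:
            continue
        rep[max(x, y)] = min(x, y)          # keep the smaller state number
        folded = {}
        for (q, op), t in delta.items():
            key, t = (find(q), op), find(t)
            if folded.setdefault(key, t) != t:
                pending.append((folded[key], t))  # two targets: merge them too
        delta = folded
    return delta, {find(q) for q in finals}


def learn_dfa(positive, negative):
    """Greedily merge state pairs (i, j), i < j, that reject all of `negative`."""
    n, delta, finals = build_prefix_tree(positive)
    alive = set(range(n))
    for j in range(1, n):
        if j not in alive:
            continue
        for i in sorted(q for q in alive if q < j):
            candidate = merge_states(delta, finals, i, j)
            if not any(accepts(*candidate, s) for s in negative):
                delta, finals = candidate
                alive = {0} | {q for q, _ in delta} | set(delta.values())
                break
    return delta, finals


# Hypothetical positive examples; the negative examples are those in the text.
POSITIVE = [
    ("submit_searchform", "item_select", "add_to_cart", "check_out"),
    ("select_item_category", "item_select", "add_to_cart", "check_out"),
]
NEGATIVE = [
    ("check_out",),
    ("submit_searchform", "add_to_cart"),
    ("submit_searchform", "check_out"),
    ("select_item_category", "add_to_cart", "check_out"),
]
dfa = learn_dfa(POSITIVE, NEGATIVE)
```

Because every merge is accepted only if no negative string is accepted, and merging can only enlarge the accepted language, the learned automaton remains consistent with both example sets by construction.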

4. CONTENT ANALYSIS

Herein we describe the content analyzer module that extracts the relevant concepts. In a nutshell, this is achieved by partitioning a Web page into segments of semantically related items and classifying them against concepts in the ontology. In the following we provide the details of our approach.

The Approach

Our approach is based on our previous work on learning-based semantic analysis of Web content [Mukherjee et al. 2005]. Briefly, the technique rests on three key steps: (i) inferring the logical structure of a Web page via structural analysis of its content, (ii) learning statistical models of semantic concepts using lightweight features extracted from both the content and its logical structure in a set of training Web pages, and (iii) applying these models on the logical structures of new Web pages to automatically identify concept instances.

4.1 Structural Analysis

The basic idea of structural analysis comes from our previous work (see Mukherjee et al. [2005] for details), which is based upon the observation that semantically related items in Web pages typically exhibit consistency in presentation style and spatial locality. For example, in the search result page of Best Buy (Figure 1(b)), all the search result items, which are semantically related and located in the middle of the page, have similar presentation: an image followed by the product name and the description of the product.

4.1.1 Type System. Exploiting the above observation, we have redesigned the simple but effective type system of our previous work to encode the presentation style and structural information. Instead of treating the HTML tags equally for the primary type (which is the sequence containing all the HTML tags in the root-to-leaf path in the DOM tree; see Mukherjee et al. [2005] for more details), we divide the HTML tags into three categories: style tags, layout tags, and functional tags (see Table III). Style tags in an HTML page concern the presentation style of visible elements, such as font and style; functional tags are those that enable HTML document browsers to handle events and functions, such as script and meta; layout tags are all the tags other than the above two categories, and are used to arrange the position of the visible elements. For the purpose of partitioning the content, the functional tags are omitted since they do not contribute to the presentation style of the content.

Table III. Categories of the HTML Tags

Category          HTML Tags
Style tags        font, color, style, strong, i, b, a, big, em, img.
Layout tags       table, tr, td, p, br, etc. (All the tags not belonging to the other two categories.)
Functional tags   head, script, meta, var, object, applet.

Fig. 7. Structural analysis of the page in Figure 1(a).

Our type system is based on the Document Object Model (DOM) of the HTML document. A segment of the DOM representation of the HTML page is shown in Figure 7(a), where the internal nodes of the tree are labeled with HTML tags and the displayable content is at the leaf nodes. Formally:

Definition 4.1. Every node in the DOM tree has an associated type $T$, which is defined as follows.

A primary type $T$ of a node $n$ is a 2-tuple $\langle D, L \rangle$, in which $D$ is the style of $n$ indicated by collecting the style tags in the path from the root to $n$ and/or extracting the class attributes from Cascading Style Sheets (CSS);^2 $L$ is the sequence of tags $(t_1, t_2, \ldots)$, in which $t_i$ is the $i$th HTML layout tag in the path from the root node of the DOM tree to $n$.

A compound type $T$ is a sequence of types $(T_1, T_2, \ldots)$, in which each $T_i$ is also a type.

A type $T$ is either a primary type or a compound type.

2 There are many attributes defined in CSS styles. However, in our work we use only a small subset of the CSS style attributes.




Algorithm 1 PartitionTree(n)

Require: n: a node in a DOM tree
 1: if n is a leaf node then
 2:     n.type = the primary type defined above
 3: else if n has only one child node c then
 4:     PartitionTree(c)
 5:     Replace n with c and remove n from the DOM tree
 6: else
 7:     for all child nodes x of n do
 8:         PartitionTree(x)
 9:     end for
10:     Analyze(n)
11: end if

Intuitively, the primary type encodes the presentation style (including location and visual cues such as font type and size) of a piece of content (text or image) that corresponds to a leaf node in a DOM tree. For example, the leaf nodes of the product names ("SONY 5-Disc...", "SONY CD-R...", etc.) in Figure 7(a) have the same primary type $T_1 = \langle D_1, L_1 \rangle$, where $D_1$ = {bold = true, italic = true, link = true, size = 11px, face = serif, color = #000000, image = false} and $L_1$ = ..FORM.TABLE.TR.TD. Similarly, all the descriptions of items ("Variable line...", "Internal storage...", etc.) share the same primary type $T_2 = \langle D_2, L_2 \rangle$, where $D_2$ = {bold = false, italic = false, link = false, size = 9px, face = serif, color = #000000, image = false} and $L_2$ = ..FORM.TABLE.TR.TD.
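As a rough illustration of how a primary type separates style from layout, the Python sketch below computes a simplified $\langle D, L \rangle$ pair from a root-to-leaf tag path using the tag categories of Table III. It is an assumption-laden simplification: here $D$ is just the set of style tags on the path, whereas the paper's $D$ also draws on CSS attributes such as size and color; the function name `primary_type` and the example paths are hypothetical.

```python
# Tag categories taken from Table III; every other tag counts as a layout tag.
STYLE_TAGS = {"font", "color", "style", "strong", "i", "b", "a", "big", "em", "img"}
FUNCTIONAL_TAGS = {"head", "script", "meta", "var", "object", "applet"}


def primary_type(path):
    """Return a simplified (D, L) pair for a root-to-leaf path of tag names.

    D: sorted tuple of the style tags seen on the path (stand-in for the style).
    L: sequence of layout tags in path order; functional tags are omitted.
    """
    tags = [t.lower() for t in path]
    style = tuple(sorted({t for t in tags if t in STYLE_TAGS}))
    layout = tuple(t for t in tags
                   if t not in STYLE_TAGS and t not in FUNCTIONAL_TAGS)
    return style, layout


# Hypothetical paths for a product name (bold link) and a description (plain font).
name_type = primary_type(["html", "body", "form", "table", "tr", "td", "b", "a"])
desc_type = primary_type(["html", "body", "form", "table", "tr", "td", "font"])
```

Both leaves share the same layout sequence but differ in style, so they receive different primary types, which mirrors the distinction between $T_1$ and $T_2$ in the example above.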

A compound type essentially summarizes the structural recurrence information of a subtree rooted at an internal node. For example, in Figure 7(a) the subtree rooted at the circled TABLE node contains the search results. The property of spatial locality combined with consistency in presentation style reveals structural recurrence information about semantically related items. For example, the TR nodes under the circled TABLE node have the same compound type $T_{TR} = T_1 T_2 T_3 T_4 T_5$, where each $T_i$ is the primary type of the $i$th leaf node under them. By concatenating all the types of the circled TABLE node's children, we get a type sequence $T_{TR} T_{TR} T_{TR} \ldots$ where the sequential pattern, $T_{TR}$, exactly captures the structural recurrence information of each semantic unit (i.e., the items in the search results). In our type system, such a repeating type sequence $T_{TR}$ results in a compound type $T_{TABLE} = T_1 T_2 T_3 T_4 T_5$ for the circled TABLE node.

4.1.2 Partition Creation. We now formalize the notion of a pattern used inpartitioning a Web page into semantically related items:

Definition 4.2. A pattern $p$ is a substring of a type sequence $T$ with length $|p| \ge 2$, such that there are $n_p$ ($\ge 2$) occurrences of $p$ in $T$ and, for any two occurrences $p_i$, $p_j$ of $p$, $p_i \cap p_j = \emptyset$.


Algorithm 2 Analyze(n)

Require: n: an internal node in a DOM tree
 1: finalpattern = null
 2: S = the type sequence of all the child nodes of n
 3: repeat
 4:     max = 0; maxpattern = null; maxscheme = null
 5:     collapse adjacent nodes in S which share the same type
 6:     patterns = MaximalPatterns(S)
 7:     for all maximal patterns p in patterns do
 8:         scheme = GenerateScheme(p, S)
 9:         score = EvaluateScheme(scheme)
10:         if score > max then
11:             max = score; maxpattern = p; finalpattern = p; maxscheme = scheme
12:         end if
13:     end for
14:     if maxpattern ≠ null then
15:         partition and update S with scheme maxscheme
16:     end if
17: until maxpattern = null
18: if finalpattern ≠ null then
19:     set the type of n to finalpattern
20: else
21:     set the type of n to S
22:     flatten all the grandchild nodes of n (each child of n will be a 1-level subtree)
23: end if

It is quite common that a sequence contains more than one pattern, and only one pattern can be used to identify the related items. In our previous work, we defined a maximal pattern to be the only candidate based on the frequency of its appearance. A more robust definition is as follows:

Definition 4.3. A pattern $p$ is maximal if and only if, for every pattern $p' \neq p$, either $n_{p'} < n_p$ or $p \not\subset p'$.

For example, in the compound type $T_1 T_2 T_3 T_1 T_2 T_3 T_1 T_2$, the maximal patterns are $T_1 T_2$, $T_1 T_2 T_3$, $T_2 T_3 T_1$, and $T_3 T_1 T_2$. $T_2 T_3$ is also a pattern but not maximal, as the pattern $T_1 T_2 T_3$ repeats the same number of times as $T_2 T_3$ and $T_2 T_3 \subset T_1 T_2 T_3$. $T_1 T_2 T_3 T_1$ has two occurrences in the sequence but it is not a pattern, since the two occurrences overlap with each other.
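Definitions 4.2 and 4.3 can be checked directly on small sequences. The Python sketch below enumerates patterns by brute force and filters the maximal ones; it is meant only to illustrate the definitions. The function names are hypothetical, and the quadratic enumeration is a stand-in for the suffix-tree search used in the paper.

```python
def occurrences(seq, p):
    """Start positions of every occurrence of tuple p in seq."""
    k = len(p)
    return [i for i in range(len(seq) - k + 1) if tuple(seq[i:i + k]) == p]


def patterns(seq):
    """Map each pattern (Definition 4.2) to its number of occurrences n_p."""
    found = {}
    n = len(seq)
    for k in range(2, n // 2 + 1):          # two disjoint copies need 2k <= n
        for i in range(n - k + 1):
            p = tuple(seq[i:i + k])
            if p in found:
                continue
            occ = occurrences(seq, p)
            # Occurrences are pairwise disjoint iff consecutive ones are.
            disjoint = all(b - a >= k for a, b in zip(occ, occ[1:]))
            if len(occ) >= 2 and disjoint:
                found[p] = len(occ)
    return found


def contains(q, p):
    """True if p is a proper contiguous substring of q."""
    return len(p) < len(q) and any(
        q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def maximal_patterns(seq):
    """Patterns p such that every other pattern q has n_q < n_p or does not contain p."""
    pats = patterns(seq)
    return {p for p, n_p in pats.items()
            if all(n_q < n_p or not contains(q, p)
                   for q, n_q in pats.items() if q != p)}


example = ["T1", "T2", "T3", "T1", "T2", "T3", "T1", "T2"]
result = maximal_patterns(example)
# result == {("T1","T2"), ("T1","T2","T3"), ("T2","T3","T1"), ("T3","T1","T2")}
```

On the example sequence this reproduces the four maximal patterns listed above, and it rejects $T_2 T_3$ because $T_1 T_2 T_3$ contains it and occurs equally often.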

Algorithm 3 GenerateScheme(p, S)

Require: p: a maximal pattern. S: a sequence of nodes.
1: Partition $S$ into $\alpha_0 \beta \alpha_1 \beta \ldots \beta \alpha_m$, where $\beta = p$ and $\alpha_i$ represents the node sequence between each pair of patterns. Set the initial partition $\pi_i$ to the $i$th $\beta$.
2: For each $\alpha_i$, split it into $\alpha_{i,1}$ and $\alpha_{i,2}$ if there is a separator between $\alpha_{i,1}$ and $\alpha_{i,2}$.
3: For each $\alpha_i$, if the nodes in $\alpha_i$ have different parent nodes, split it into $\alpha_{i,1}$ and $\alpha_{i,2}$ such that the nodes in each of them come from one parent node.
4: For any sequence $\pi_i \alpha_i \pi_{i+1}$, if after all the above steps it becomes $\pi_i \alpha_{i,1} \alpha_{i,2} \ldots \alpha_{i,k} \pi_{i+1}$, we merge $\alpha_{i,1}$ into $\pi_i$ and $\alpha_{i,k}$ into $\pi_{i+1}$.
5: The final scheme is $\{\pi_1, \pi_2, \ldots, \pi_m\}$.

Our top-level structural analysis algorithm, namely PartitionTree (Algorithm 1), partitions each internal node according to the maximal pattern found in the type sequence of its children. PartitionTree traverses the DOM tree top-down and then restructures it bottom-up. When a leaf node is processed, we collect its primary type in Line 2. For an internal node with only one child, we process its only child, delete the node, and move its child up to the node's parent, as shown in Lines 4-5. However, for an internal node with multiple children, PartitionTree is invoked on its children for type computation (Line 8) and then pattern discovery is performed on the type sequence of its children (Line 10).
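The traversal performed by PartitionTree (Algorithm 1) can be sketched in Python as below. This is a minimal illustration under assumptions: the `Node` class is hypothetical, and `primary_type` and `analyze` are caller-supplied stand-ins for Definition 4.1 and Algorithm 2. The stubs used in the example simply record tag names and concatenate child types; they do not perform pattern discovery.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    type: object = None


def partition_tree(node, primary_type, analyze):
    """Return the root of the restructured subtree for `node`."""
    if not node.children:                       # leaf: record its primary type
        node.type = primary_type(node)
        return node
    if len(node.children) == 1:                 # single child: the child replaces the node
        return partition_tree(node.children[0], primary_type, analyze)
    node.children = [partition_tree(c, primary_type, analyze)
                     for c in node.children]    # process children first (bottom-up)
    analyze(node)                               # then look at the children's type sequence
    return node


def stub_analyze(node):
    """Stand-in for Analyze: the node's type is the sequence of child types."""
    node.type = tuple(child.type for child in node.children)


tree = Node("table", [
    Node("tr", [Node("td", [Node("b")])]),
    Node("tr", [Node("td", [Node("i")]), Node("td", [Node("em")])]),
])
root = partition_tree(tree, primary_type=lambda n: n.tag, analyze=stub_analyze)
```

In the example the first `tr` and its `td` each have a single child, so the leaf `b` is moved up to become a direct child of `table`, while the second `tr` keeps its two (collapsed) leaves.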

Algorithm Analyze takes an internal node n as input. Its main function is

to partition the child nodes of n by examining their structural similarity. Asmentioned above, such similarity is discovered by nding the maximal patternin the type sequence S of its children. The main body of algorithm analyze isan iterative process. At Line 6 the set of maximal patterns in the type sequenceS is generated. Each such maximal pattern is the basis of a partition scheme(Line 8), and we assign a goodness score to the schemes (Line 9). The schemewith thehighest score will be used to actually partition thesequence S (Line 15).We will show the details shortly. The process repeats until no more patternscan be found. Finally, the type of n will be set to the type sequence of the lastfound pattern (if it exists), or to the type sequence S of its children (otherwise).If no pattern is found, its immediate children are deleted and their child nodesare attached to the node n itself (Line 22). In our algorithm, we use sufx treesto nd the repeating substrings in S (see Guseld [1997] for the details). We

nd the maximal patterns amongst these substrings.Let us illustrate the algorithm step by step using an example. Suppose al-gorithm PartitionTree is invoked on the tree shown in Figure 8(a). The nodesare labeled as n i , 1 i 19. Each leaf node has its primary type displayedon the left. The rectangle between n 11 and n 12 is a separator 3 detected in theHTML document. PartitionTree rst traverses to the leaf nodes n 14 and n 15(other leaf nodes are processed similarly), assigns the primary types to themand calls the Analyze algorithm. It rst runs on the node n 6, and is unable tond any pattern. Since all the child nodes of n 6 are leaf nodes, Analyze skipspropagating the grandchildren and just assigns T 2T 3 as the compound type of n 6 . Analyze continues on n 2 where still no pattern can be found. At this time allgrandchildren of n 2 arepropagated up. Node n 6 gets deleted; n 14 and n 15 becomethe immediate children of n 2. The nodes n 3 and n 4 are processed similarly. Thisleads to the intermediate tree in Figure 8(b), where n 2, n 3 and n 4 are assignedthe compound types shown on their left.

³ A separator denotes a horizontal or vertical blank region in a Web page. Its purpose is to indicate the boundary between visible content. Examples of separators include stand-alone BR tags, blank table cells and images used as spacers.



Fig. 8. An example for the partitioning algorithm.

Now we take a closer look at the partition scheme generation and its evaluation. In the sequel, for simplicity, we will refer to a node by its type unless otherwise stated. Therefore, the type sequence can also be regarded as the sequence of nodes in the DOM tree. Next we generate the partition scheme of $S$ for a pattern $p$ using GenerateScheme below.

After a partition scheme is created under node $n$, we assign a goodness score to it. This is done as follows: suppose the scheme $\Pi$ has $n$ partitions under node $n$. Let us denote each partition as $\pi_i$, $1 \le i \le n$. Then the score for $\Pi$ is:

$$\frac{2\,|\Pi|\,n}{|\Pi| + n} \;-\; \sum_{1 \le i \le n} \left(|\pi_i| - \overline{|\pi|}\right)^2, \tag{1}$$

where $|\pi_i|$ denotes the size of the partition $\pi_i$ and $\overline{|\pi|}$ is the mean partition size. The first term in the score is the harmonic mean of the coverage $|\Pi|$ and the number of partitions $n$, and the second term is the standard square deviation of the sizes of all the partitions. The combination is based on the observation that semantically related items usually have contents of similar size (low standard square deviation), and we prefer the scheme whose coverage and number of partitions are both large (high harmonic mean).

We pick the partition scheme with the highest score and do the partitioning based on the selected scheme. The partitioning step first removes all the substructure in the child nodes, resulting in a 1-level tree rooted at $n$. It then aggregates all the child nodes in partition $\pi_i$ under a newly created node $n_i$, namely a pattern node, whose type is set to the pattern $p$. $n_i$ is then inserted as a child of $n$.

Let us continue the example in Figure 8. When Analyze is invoked on $n_1$, the algorithm assigns $n_1$ the type sequence of its children, $T_1T_2T_3T_4T_5T_2T_3T_6T_7T_1T_2T_3$. The maximal patterns discovered are $p_1 = T_2T_3$ and $p_2 = T_1T_2T_3$. Algorithm GenerateScheme is invoked on each of the maximal patterns. For $p_1$, the initial partition scheme is $[T_1](T_2T_3)[T_4T_5](T_2T_3)[T_6T_7T_1](T_2T_3)$ (we use parentheses to indicate the boundaries of the partitions $\pi_i$, and brackets to indicate the boundaries of the remaining segments). Since there is a separator between $n_{11}$ and $n_4$, the segment $T_6T_7T_1$ is split into $[T_6T_7][T_1]$. Also, since $n_4$ and $n_5$ have different parent nodes, the segment $T_4T_5$ is split into $[T_4][T_5]$. The same holds for the segment $T_6T_7$. Accordingly the type sequence is now $[T_1](T_2T_3)[T_4][T_5](T_2T_3)[T_6][T_7][T_1](T_2T_3)$. GenerateScheme modifies the scheme to $(T_1T_2T_3T_4)(T_5T_2T_3T_6)[T_7](T_1T_2T_3)$. This scheme is assigned the score 4.05 ($|\Pi| = 11$, $n = 3$, $|\pi_1| = |\pi_2| = 4$, $|\pi_3| = 3$) by the goodness scoring formula given in (1).
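As a quick check of formula (1), the following minimal Python sketch recomputes the score of the scheme derived from the first pattern. It assumes that the coverage equals the sum of the partition sizes (true here, since 4 + 4 + 3 = 11) and that the second term is the plain sum of squared deviations from the mean partition size. The function name `goodness_score` is ours and does not come from the paper.

```python
def goodness_score(partition_sizes):
    """Score a partition scheme from the sizes of its partitions.

    First term: harmonic mean of coverage and number of partitions.
    Second term: summed squared deviation of the sizes from their mean.
    Assumption: coverage is the total number of nodes in all partitions.
    """
    n = len(partition_sizes)
    coverage = sum(partition_sizes)
    harmonic_mean = 2 * coverage * n / (coverage + n)
    mean_size = coverage / n
    squared_deviation = sum((size - mean_size) ** 2 for size in partition_sizes)
    return harmonic_mean - squared_deviation


# Partition sizes of the scheme (T1T2T3T4)(T5T2T3T6)[T7](T1T2T3)
score_p1 = goodness_score([4, 4, 3])
print(round(score_p1, 2))  # 4.05
```

Equal-sized partitions leave the second term at zero, so the score then reduces to the harmonic mean alone.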

Similarly the maximal pattern $p_2$ results in the partition scheme $(T_1T_2T_3T_4)[T_5T_2T_3T_6][T_7](T_1T_2T_3)$, with the score of 2.93 ($|\Pi| = 7$, $n = 2$, $|\pi_1| = 4$, $|\pi_2| = 3$). Therefore, the partition scheme based on $p_1$ has the highest score and is selected to actually partition the sequence. The partitioning step removes $n_2$, $n_3$, and $n_4$. We insert the new pattern nodes $P$ (see Figure 8(c)) according to the scheme. The newly inserted pattern nodes are assigned the compound type $T_p = T_2T_3$. The type sequence of $n_1$ becomes $T_pT_pT_7T_p$. Since no more patterns can be found in the new type sequence, the loop ends and $n_1$ is assigned the type $T_p$. The final result of Analyze on $n_1$ is shown in Figure 8(c), where the nodes labeled with $P$ are the newly inserted pattern nodes.
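The maximal patterns in this example can be reproduced with a brute-force sketch. It is not the suffix-tree implementation used by the algorithm. It assumes that a pattern is maximal when it occurs at least twice, spans at least two types, and cannot be extended by one type on either side without losing an occurrence. This reading is our assumption, chosen because it matches the patterns reported for Figure 8; the name `maximal_patterns` and the `min_length` parameter are hypothetical.

```python
from collections import defaultdict


def maximal_patterns(type_sequence, min_length=2):
    """Return repeated substrings that no one-symbol extension preserves.

    Brute force over all substrings (cubic time); occurrences may overlap.
    """
    seq = tuple(type_sequence)
    counts = defaultdict(int)
    for start in range(len(seq)):
        for end in range(start + 1, len(seq) + 1):
            counts[seq[start:end]] += 1

    alphabet = set(seq)
    maximal = []
    for pattern, count in counts.items():
        if count < 2 or len(pattern) < min_length:
            continue
        # A pattern is not maximal if a longer one occurs equally often.
        extended = any(
            counts.get((t,) + pattern, 0) == count
            or counts.get(pattern + (t,), 0) == count
            for t in alphabet
        )
        if not extended:
            maximal.append(pattern)
    return sorted(maximal, key=lambda p: (len(p), p))


S = "T1 T2 T3 T4 T5 T2 T3 T6 T7 T1 T2 T3".split()
patterns = maximal_patterns(S)
print(patterns)  # [('T2', 'T3'), ('T1', 'T2', 'T3')]
```

Under the same assumptions, the sequence obtained after partitioning (`Tp Tp T7 Tp`) yields no pattern, which agrees with the loop terminating at that point.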

Finally, in the restructured tree, known also as the partition tree, there are three classes of internal nodes: (i) group, which encapsulates repeating patterns in its immediate children type sequence, (ii) pattern, which captures each individual occurrence of the repeat pattern, or (iii) block, when it is neither group nor pattern. Intuitively the subtree of a group node denotes homogeneous content consisting of semantically related items. For example, observe how all the items in the search results list in Figure 1(a) are rooted under the group node in the partition tree. The leaf nodes of the partition tree correspond to the leaf nodes in the original DOM tree and have content associated with them. The partition tree resulting from structural analysis of the DOM in Figure 7(a) is shown in Figure 7(b). The partition tree represents the logical organization of the page's content.



4.2 Concept Identification

After a partition tree is generated, our task is to associate an instance of a concept to a particular node in the partition tree. The subtree rooted at such a node represents the smallest segment of the Web page covering the concept instance. Towards this, we have designed a feature space for partition tree nodes. For each node in the partition tree, its features are collected as a vector in the feature space. The features are then used to build a statistical concept model. The model is trained using prelabeled nodes. For a new partition tree, each node is ranked by the concept model and the one with the highest rank is selected as the instance of the concept. In the following we give a description of the feature space.

4.2.1 Feature Space. Each node in the partition tree can be represented by a vector of values representing its attributes, namely features. The set of feature vectors forms a multidimensional feature space. In our concept identification problem, given a partition tree node $p$, the value of feature $f_i$, denoted as $n_{f_i,p}$, is the frequency of occurrence of feature $f_i$ in $p$. The features of $p$ are categorized into three types, namely word, p-gram, and t-gram.

(i) Word features. These are features drawn from the text encapsulated within a partition tree node. Intuitively, an instance of a certain concept usually contains a set of particular words occurring more often than others. For example, given the "Search Form" concept, an instance usually contains words like "search" and "find". Such concept-specific words are of great importance in identifying the instance. Word features are the most widely used features in text-related machine learning tasks such as text categorization.

For a leaf node in the partition tree, its word features are drawn from its own text, while for an internal partition tree node, the words present in all the leaves within the subtree rooted at it are aggregated. All the words are case-insensitive, and stop words (e.g., "the", "in", and "please") are ignored in both cases. $n_{f_i,p}$ is the number of times the word $f_i$ occurs in the text of $p$.
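The word-feature counts described above can be sketched as follows. The tokenizer (a simple regular expression), the three-word stop list, and the function name `word_features` are illustrative assumptions; the paper does not specify them.

```python
import re
from collections import Counter

# Illustrative stop list only; a real system would use a fuller one.
STOP_WORDS = {"the", "in", "and", "please"}


def word_features(leaf_texts, stop_words=STOP_WORDS):
    """Count case-insensitive, non-stop words over the leaves of a node.

    For a leaf node pass a one-element list; for an internal node pass
    the texts of all leaves in its subtree.
    """
    counts = Counter()
    for text in leaf_texts:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


features = word_features(["Search the catalog", "Please find a book", "SEARCH"])
print(features["search"])  # 2
```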

(ii) p-gram features. These are features representing the presentation of content. We have observed that in Web sites sharing similar content semantics, the presentation of a semantic concept in those sites exhibits similarity. For instance, in Figure 1(b), each item is presented as a link with the item name, followed by a short text description, and ending with miscellaneous text information. Similar visual presentation can also be found on other sites. A p-gram feature captures these presentation similarities. The basic p-gram features are link, text, and image found in the leaf nodes of a partition tree. Recall that during structural analysis, pattern nodes aggregate every individual repeat in a type sequence. Since repeats are typically associated with similar visual presentation, complex p-gram features are constructed only at pattern nodes by concatenating the p-gram features of their immediate children nodes. Internal nodes aggregate basic and possibly complex p-grams from the subtrees rooted at them. Let $n_{f_i,p}$ be the number of times $f_i$ occurs in the subtree rooted at $p$. For instance, in the left tree of Figure 9, the p-gram feature at the pattern node labeled P is < text link >, while in the right tree of Figure 9, the node labeled B at the same position has the p-gram features {< text >, < link >}.

Fig. 9. t-gram features.

(iii) t-gram features. While p-gram features capture the visual presentation, t-gram features represent the structure of the partition tree. Recall that internal partition tree nodes can be either group, pattern, or block, while link, text, and image are the different classes of leaf nodes. The structural arrangement of these classes of nodes also characterizes a concept, and this is what is captured by t-gram features. Given a partition tree node with $N$ nodes in its subtree, the complete structural arrangement within the node can be described in terms of a set of subtrees of $k$ ($2 \le k \le N$) nodes, where each subtree is an arrangement of group, pattern, block, link, text, or image type nodes. Since enumerating all these subtrees has exponential complexity, we restrict our analysis to subtrees of 2 nodes. When $k = 2$ the t-gram is essentially a parent-child feature. For instance, in Figure 9, when $k = 2$ the t-gram feature space of the left tree is {< G , P > , , }, and the right tree is { , , }, where G and B are labels of group and block nodes respectively.
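For $k = 2$, collecting t-gram features amounts to counting parent-child class pairs in a subtree. The sketch below shows this on a small hypothetical partition tree (it is not one of the trees of Figure 9). The tuple-based tree encoding and the name `tgram_features` are assumptions made for illustration.

```python
from collections import Counter


def tgram_features(tree):
    """Count (parent class, child class) pairs in a partition subtree.

    A tree is a pair (node_class, list_of_child_trees); leaves have an
    empty child list.
    """
    node_class, children = tree
    counts = Counter()
    for child in children:
        counts[(node_class, child[0])] += 1
        counts.update(tgram_features(child))
    return counts


# Hypothetical tree: a group node with two pattern nodes, each holding
# a link leaf followed by a text leaf.
tree = ("group", [
    ("pattern", [("link", []), ("text", [])]),
    ("pattern", [("link", []), ("text", [])]),
])
tgrams = tgram_features(tree)
print(tgrams[("group", "pattern")])  # 2
```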

4.2.2 Concept Model. Our concept identification task is to assign a score $s_{n,c}$ to every node $p$ in the partition tree given a concept $c$. The node covering only the instance of $c$ gets the highest score. Specifically, given that $N$ denotes the domain of nodes, $C$ the domain of concepts and $R$ the domain of real numbers, the task is to approximate the target function $\Phi : N \times C \to R$ with $\hat{\Phi} : N \times C \to R$, called the concept model, such that for any node $n$ assigned the highest score by $\Phi$, $\hat{\Phi}$ will perform similarly (i.e., it is very likely to assign the highest score also).

We use a Bayesian learning method for concept identification. Specifically, we are looking for the node $p$ that has the maximum $\hat{\Phi}(p, c) = P(c \mid p)$ (i.e., given a node $p$, the probability that it is an instance of a concept $c$) among all the nodes in the partition tree. By Bayes' theorem, we have

$$\arg\max_{p \in N} P(c \mid p) = \arg\max_{p \in N} \frac{P(p \mid c)\,P(c)}{P(p)} = \arg\max_{p \in N} P(p \mid c),$$

where $P(c)$ is a constant and $P(p)$ is uniformly distributed. Therefore, our concept model is defined as $P(p \mid c)$, which consists of two components: (i) a probability distribution on the frequency of occurrence of the word, p-gram, and t-gram features of $p$, and (ii) a probability distribution on the number of nodes present in the entire subtree of $p$. A collection of partition trees whose nodes are (manually) labeled as concept instances serves as the training set for learning the parameters of these distributions.

A maximum likelihood approach is used to model the distribution of a feature in a concept. Given a training set $L$ of partition tree nodes identified as instances of concept $c_j$, we are interested in the features occurring in $L$. The probability of occurrence of a feature $f_i$ in $c_j$ is defined using Laplace smoothing as:

$$P(f_i \mid c_j) = \frac{\sum_{p \in L} n_{f_i,p} + 1}{\sum_{i=1}^{|F|} \sum_{p \in L} n_{f_i,p} + |F| + 1},$$

where $n_{f_i,p}$ denotes the number of occurrences of $f_i$ in partition node $p$ and $|F|$ is the total number of unique features, including words, p-grams, and t-grams. Features that do not appear in the training set have the fixed probability of

$$\frac{1}{\sum_{i=1}^{|F|} \sum_{p \in L} n_{f_i,p} + |F| + 1}.$$
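The smoothed estimate can be sketched in a few lines of Python. Here each training node is represented as a `Counter` of feature frequencies, and $|F|$ is taken to be the number of distinct features seen in the training set; both choices, and the name `feature_probabilities`, are our assumptions.

```python
from collections import Counter


def feature_probabilities(training_nodes):
    """Laplace-smoothed feature probabilities for one concept.

    training_nodes: list of Counters, one per labeled node, mapping a
    feature to its frequency in that node.
    Returns (probabilities of seen features, probability of an unseen one).
    """
    totals = Counter()
    for node_counts in training_nodes:
        totals.update(node_counts)
    num_features = len(totals)                      # |F|
    denominator = sum(totals.values()) + num_features + 1
    probabilities = {f: (c + 1) / denominator for f, c in totals.items()}
    unseen_probability = 1 / denominator
    return probabilities, unseen_probability


nodes = [Counter(search=3, find=1), Counter(search=1)]
probs, unseen = feature_probabilities(nodes)
print(probs["search"], probs["find"], unseen)  # 0.625 0.25 0.125
```

The extra 1 in the denominator reserves exactly the probability mass given to an unseen feature, so the seen features plus one unseen feature sum to 1.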

The number of nodes within the subtree of a partition tree node for a concept $c_j$ is modeled as a Gaussian distribution with parameters mean $\mu_{c_j}$ and variance $\sigma^2_{c_j}$, defined as:

$$\mu_{c_j} = \frac{\sum_{p \in L} |p|}{|L|}, \qquad \sigma^2_{c_j} = \frac{\sum_{p \in L} \left(|p| - \mu_{c_j}\right)^2}{|L| - 1},$$

where $|p|$ is the number of nodes within the subtree of $p$, and $|L|$ is the number of training examples.
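A minimal sketch of estimating these two parameters from the subtree sizes of the labeled training nodes is given below. The function name `size_parameters` and the sample sizes are hypothetical; the second value returned is the variance computed with the $|L| - 1$ denominator.

```python
def size_parameters(subtree_sizes):
    """Estimate the mean and the variance of subtree sizes for a concept.

    subtree_sizes: number of nodes in the subtree of each training node.
    Requires at least two training examples.
    """
    count = len(subtree_sizes)
    mean = sum(subtree_sizes) / count
    variance = sum((size - mean) ** 2 for size in subtree_sizes) / (count - 1)
    return mean, variance


# Hypothetical subtree sizes of four nodes labeled with the same concept.
mean, variance = size_parameters([3, 5, 7, 9])
print(mean)  # 6.0
```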

For a node $p$ of any new partition tree, we use a modified multinomial distribution to model the likelihood $P(p \mid c_j)$:

$$P(p \mid c_j) = \frac{N!}{N_{f_1,p}! \cdots N_{f_{|F|},p}!} \prod_{i=1}^{|F|} P(f_i \mid c_j)^{N_{f_i,p}},$$

where $N = K e^{-(|p| - \mu_{c_j})^2 / (2\sigma^2_{c_j})}$, with $K$ being a normalized total feature frequency count, $|p|$ being the total number of partition tree nodes within the subtree rooted at $p$, and $N_{f_i,p}$ a scaled value of $n_{f_i,p}$ such that $\sum_i N_{f_i,p} = N$. The normalization of the multinomial coefficient ensures that a subtree of $p$ whose size is closer to $\mu_{c_j}$ will have greater likelihood for $c_j$.

Note that the above formulation of the likelihood takes into consideration both the number of nodes within $p$ as well as the frequencies of the various features in the content encapsulated within $p$. This results in a tight coupling between content analysis and document structure during concept identification. The partition tree node with the maximum likelihood value is identified as the concept instance. The concept tree created with the identified concept instances is shown in Figure 10. Observe that the irrelevant content in the original DOM tree has been filtered.





Fig. 10. Extracted concept tree.

5. EVALUATION

Herein we report on the experimental evaluation of the learned process model, concept extraction and the integrated Guide-O system. Thirty Web sites spanning the three content domains of books, consumer electronics, and office supplies were used in the evaluation.

To create the concepts in the ontology we did an analysis of the transactional activities done by users on the Web in each of the three domains. Based on the analysis we came up with an inclusive set of concepts in Table I.

5.1 Process Model

We collected 200 example transactional sequences from 30 Web sites. These were labeled sequences whose elements are concept operations, as illustrated in the sequences used in Figure 6. A number of CS graduate students were enlisted for this purpose. Specifically, each student was told to do around 5 to 6 transactions with a Web browser, and the sequences were generated by monitoring their browsing activities. We used 120 of these sequences spanning 15 Web sites (averaging 7 to 9 sequences per site) as the training set for learning the process model. The remaining 80 were used for testing its performance. The learned model is shown in Figure 4.

The first metric that we measured was its recall/precision.⁴ Figure 11(a) shows the recall/precision values for model learning in each of the three domains. Recall/precision values were 90% and 96% for the books domain, 86% and 88% for the consumer electronics domain and 84% and 92% for the office supplies domain.
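The two ratios can be sketched as follows, using the footnote's definitions: recall divides the completed transactions accepted by the model by all completed transactions, and precision divides the same numerator by all accepted transactions. The acceptor `toy_accepts` and the traces are hypothetical stand-ins for the learned model and the test set.

```python
def recall_precision(traces, accepts):
    """Compute (recall, precision) of a process model.

    traces: list of (sequence, completed) pairs.
    accepts: function returning True if the model accepts a sequence.
    Assumes at least one completed and one accepted trace.
    """
    accepted = [(seq, done) for seq, done in traces if accepts(seq)]
    accepted_completed = sum(1 for _, done in accepted if done)
    total_completed = sum(1 for _, done in traces if done)
    return accepted_completed / total_completed, accepted_completed / len(accepted)


def toy_accepts(sequence):
    """Hypothetical acceptor: accept any sequence ending in check_out."""
    return bool(sequence) and sequence[-1] == "check_out"


traces = [
    (["search", "add_to_cart", "check_out"], True),
    (["search", "select_item", "add_to_cart", "check_out"], True),
    (["search", "add_to_cart", "submit_order"], True),   # completed, rejected
    (["search", "check_out"], False),                    # accepted, not completed
    (["check_out"], False),                              # accepted, not completed
]
recall, precision = recall_precision(traces, toy_accepts)
print(round(recall, 2), precision)  # 0.67 0.5
```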

⁴ Recall for a process model is the ratio of the number of completed transactions accepted by the model over the total number of completed transactions. For Precision, this denominator becomes the total number of accepted transactions (completed and not completed).



Fig. 11. Evaluation of the learned process model.

The second metric we measured was the number of transitions that remained to be completed when a true trace (completed transaction) in the test set failed to reach the final state (check out state). Figure 11(b) shows the failure curve. Analysis of the failure curve indicates that almost half of the failure traces end one hop away from the final state of the model and only 10% of the failures end three or more hops away from the final state. This means that fast error recovery techniques can be designed with such a process model.
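One way to measure the remaining hops is sketched below: the trace is run through the DFA until no transition applies, and a breadth-first search then finds the shortest path from that state to the final state. Treating "transitions that remained" as the shortest-path distance is our assumption, and the five-state DFA is a hypothetical example, not the learned model of Figure 4.

```python
from collections import deque


def run_trace(dfa, start, trace):
    """Return the last state reached before the trace fails or ends."""
    state = start
    for symbol in trace:
        next_state = dfa.get(state, {}).get(symbol)
        if next_state is None:
            break
        state = next_state
    return state


def hops_to_final(dfa, state, final):
    """Shortest number of transitions from state to final (None if none)."""
    queue = deque([(state, 0)])
    seen = {state}
    while queue:
        current, distance = queue.popleft()
        if current == final:
            return distance
        for next_state in dfa.get(current, {}).values():
            if next_state not in seen:
                seen.add(next_state)
                queue.append((next_state, distance + 1))
    return None


# Hypothetical linear process model ending in a check-out state q4.
dfa = {
    "q0": {"search": "q1"},
    "q1": {"select_item": "q2"},
    "q2": {"add_to_cart": "q3"},
    "q3": {"check_out": "q4"},
    "q4": {},
}
stuck = run_trace(dfa, "q0", ["search", "select_item", "add_to_cart", "continue_shopping"])
print(stuck, hops_to_final(dfa, stuck, "q4"))  # q3 1
```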

5.2 Concept Extraction

We built a statistical concept model for each of the concepts in Table I. Recall that the five concepts in the upper half of the table are generic for all the three domains whereas those in the lower half are domain specific. For instance, the feature set of a list of books differs from that of consumer electronic items. So we built one model for each concept in the upper half of the table and three (one per domain) for each concept in the lower half.

The concept models were built using the techniques in Section 4. To build the model for each of the five generic concepts we collected 90 pages from 15 out of the 30 Web sites. For each of the domain-specific concepts we collected 30 Web pages from five Web sites that catered to that domain.

Note that pages containing more than one concept were shared during the building of the respective concept models. These models drive the concept extractor at runtime. We measured the recall⁵ of the concept extractor for each concept in the ontology. Roughly 150 Web pages collected from all 30 Web sites were used as the test data. Figure 12 shows the recall values for all of the 10 concepts in each of the three domains.

An examination of the Web pages used in the testing revealed that the high recall rates (above 80% for "Item Taxonomy", "Search Form", "Add To Cart", "Edit Cart", "Continue Shopping", and "Checkout") are due to the high degree of consistency of the presentation styles of these concepts across all these Web sites. The low recall figures for "Item Detail" (about 65% averaged over the three domains) and "Shopping Cart" (about 70%) are mainly due to a high degree of variation in their features across different Web sites. A straightforward way to improve the recall of such concepts is to use more training data. However, even this may not help for concepts such as "Add To Cart" that rely on keywords as the predominant feature. Quite often these are embedded in an image, precluding textual analysis. It appears that in such cases local context surrounding the concept can be utilized as a feature to improve recall.

⁵ Recall value for a concept is the ratio of the number of correctly labeled concept instances in Web pages over the actual number of concept instances present in them.

Fig. 12. Recall for concept extraction.

We also observed that Web pages with sufficient textual content relevant to the concept were identified as an instance of that concept with higher accuracy. Furthermore, concept extraction performed better for schematic Web pages (i.e., Web pages generated from a template), because structural feature extraction is dependent on the presentation style and organization of different elements of the page.

5.3 Integrated System

We conducted quantitative and qualitative evaluation of the integrated system; the former measured time taken to conduct Web transactions and the latter surveyed user experience. The evaluation was conducted separately for each domain using the corresponding domain-specific concept extractors (see Section 5.2).

Experimental Setup. We used a 1.2GHz desktop machine with 256MB RAM as the computing platform for running the core Guide-O system (shown within the outermost box in Figure 3). We also used it to run Guide-O-Speech. To do that we installed our own VoiceXML interpreter along with off-the-shelf speech SDK components. Guide-O-Mobile was architected to run in a client/server mode with the machine running the core Guide-O system as the server and a 400MHz iPaq with 64MB RAM as the client. Thirty CS graduate students were used as evaluators. Prior to evaluation they were trained on how to use the two systems. We chose 18 Web sites (6 for each domain) to evaluate Guide-O-Mobile and 9 (3 for each domain) to evaluate Guide-O-Speech. We conducted roughly 5 to 6 transactions on each of them and calculated mean (μ) and standard deviation (σ) for all the measured metrics. Over 93.33% of those transactions could be completed. Evaluation of the integrated system discussed below is based on these completed transactions.

Table IV. Guide-O-Speech Performance

                 Voice             Pages             Time Taken (using     Time Taken (using
                 Interactions      Explored          Guide-O-Speech)       JAWS Screen Reader)
Web Site         μ       σ         μ       σ         μ        σ            μ        σ
Amazon           8       0         5       0         306.1    13.73        1300     120.2
BN               9       0.82      5       0         351.5    40.78        1130     176.78
AbeBooks         10.8    0.96      6       0.82      384.9    31.35        700      176.78
Amazon           9.8     3.1       5.2     1.26      386.6    60.15        1413     130.11
CompUSA          9       1.15      6       0.82      360.5    13.34        931.5    211.42
tigerdirect      9.4     1.29      6       0.82      357.7    30.07        1293     212.13
OfficeMAX        10      0.82      5.8     0.96      341.6    23.69        686      61.52
OfficeDepot      10      2.16      6       1.41      309.9    45.87        604      55.23
QuillCorp        9.6     1.73      5.2     0.96      382.6    49.61        625      51.84

*All times are in seconds.

5.4 Quantitative Evaluation

Guide-O-Speech. Evaluators were asked to measure the total time taken to complete the transactions with Guide-O-Speech. The screen was disabled and evaluators had to interact with the system using a headphone and keyboard. For baseline comparison evaluators were also asked to conduct another experiment with the JAWS screen reader on the same set of transactions. For every page used in a transaction sequence they were asked to record the time it took JAWS to read from the beginning of the page until the end of the selected concept's content. The sum of these times over all the pages associated with the transaction denotes the time taken to merely listen to the content of the selected concepts with a screen reader. Table IV lists the metrics measured for each Web site.

From 5 to 6 transactions on each Web site (3 Web sites for each content domain), we calculated the mean (μ) and standard deviation (σ) of all the metrics that are listed in Table IV. Figure 13 shows the comparative performance of Guide-O-Speech and JAWS.

Observe that Guide-O-Speech compares quite favorably with the best-of-breed screen readers and hence can serve as a practical assistive device for doing online transactions.





Fig. 13. Guide-O-Speech performance.

Guide-O-Mobile. Evaluators first conducted transactions with Guide-O-Mobile. Next they did the same set of transactions with the original pages loaded onto the iPaq; that is, the page fetched from the Web was dispatched as is to the iPaq. Figure 14(a) lists the metrics measured. The "page load time" column contains the times taken to load the original pages into the iPaq. The "analysis time" is the time taken by the Guide-O server to fetch pages from the Web, do content analysis, generate the DHTML page of the extracted concepts and upload them onto the handheld. Interaction time is measured from the moment a page is loaded into the iPaq until the user takes a navigable action like a form fillout, clicking a link, etc. Figure 14(b) and (c) present graphically the overall and interaction times respectively. Observe the significant decrease in interaction time to complete a Web transaction using Guide-O-Mobile. This reduction is mainly due to the filtering away of irrelevant content by the process model. Consequently the user avoids needless scrolling steps. Furthermore, since the transaction is goal directed by the process model, the user is able to complete the transaction in fewer steps. Another advantage of filtering away irrelevant content is the relatively small page load times. These times have been absorbed in the analysis time.

Qualitative Evaluation. To gauge user experience we prepared a questionnaire for the evaluators (see Table V). They were required to answer it upon completing the quantitative evaluation. The questions were organized into two broad categories, system (S1 to S4) and concepts (C1 to C3): the former to assess the overall functionality and usability of the Guide-O system and the latter to determine the effectiveness of the semantic concepts in doing Web transactions. The questions were quite generic and applicable to both Guide-O-Mobile and Guide-O-Speech. All of the concept questions (C1 to C3) and questions S3 and S4 required a yes/no response. From the responses to S1 and S2 we computed the mean percentage shown in the table.

Results. A large percentage of evaluators felt that the concepts presented were self-explanatory and contained enough information based on which they



over Web pages instead of services. A vast majority of transactions on the Web are still conducted over HTML pages. This focus on Web pages is what sets our work apart from those in Web services. However, with our approach users may not understand the goal of an individual step, because the transaction has been split across several Web pages. Thus users lose the global vision of the transaction. We are currently working on an implementation where the local step of the user can be visually shown in the context of the global task, that is, the process model and the semantics.

Using standard Web services constrains the user to a set of predefined tasks (e.g., searching for a product). However, our approach does not confine a user to whatever service is exposed by the provider. Using our transactional approach, users are able to create services (e.g., a service for buying an airline ticket, a service for job search) that accomplish their personal goals. Thus we are providing a scalable technique for creating services.

Semantic Analysis of Web content. The essence of the technique underlying our content analysis module is to partition a page into segments containing semantically related items and classify them against concepts in the ontology. Web page partitioning techniques have been proposed for adapting content on small screen devices [Buyukkoten et al. 2001; Buyukkoten et al. 2000; Chen et al. 2003; Yin and Lee 2004], content caching [Ramaswamy et al. 2004], data cleaning [Song et al. 2004; Yi and Liu 2003], and search [Yu et al. 2003]. The idea of using content similarities for semantic analysis was also recently explored in Zhang et al. [2004] in the context of Web forms. The fundamental difference between our technique and all the above works is the integration of inferring the logical structure of a page (see the partition tree in Figure 7(b)) with feature learning. This allows us to define and learn features, such as p-grams and t-grams, using the partition tree.

Concept identification in Web pages is related to the body of research on semantic understanding of Web content. Powerful ontology management systems and knowledge bases have been used for interactive annotation of Web pages [Heflin et al. 2003; Kahan et al. 2001]. More automated approaches combine them with linguistic analysis [Popov et al. 2003], segmentation heuristics [Dill et al. 2003], and machine learning techniques [Cimiano et al. 2004; Hammond et al. 2002]. Our semantic analysis technique is an extension of our previous work [Mukherjee et al. 2005], and, in contrast to all the above, does not depend on rich domain information. Instead, our approach relies on lightweight features in a machine learning setting for concept identification. This allows users to define personalized semantic concepts, thereby lending more flexibility to modeling Web transactions. It should also be noted that the extensive work on wrapper learning [Laender et al. 2002] is related to concept identification. However, wrappers are syntax-based solutions and are neither scalable nor robust when compared to semantics-based techniques.

Process Model Learning. Our work on learning process models from user activity logs is related to research in mining workflow process models (see [van der Aalst and Weijters 2004] for a survey). However, our current definition of a process is simpler than traditional notions of workflows. For instance, we do not use sophisticated synchronization primitives. Hence we are able to model our processes as DFAs instead of workflows and learn them from example sequences. Learning DFAs is a thoroughly researched topic (see Murphy [1996] for a comprehensive survey). A classical result is that learning the smallest size DFA that is consistent with respect to a set of positive and negative training examples is NP-hard [Angluin 1978; Gold 1978]. This spurred a number of papers describing efficient heuristics for DFA learning [Murphy 1996; Oncina and García 1991]. We have not proposed any new DFA learning algorithm for our work. Instead we adapted the simple yet effective heuristic with low time complexity described in [Oncina and García 1991]. Navigating to relevant pages in a site using the notion of "information scent" has been explored in Chi et al. [2001]. This notion is modeled using keywords extracted from pages specific to that site. In contrast, our process model is domain specific, and using it a user can do online transactions on sites that share similar content semantics.

Content Adaptation. Adapting Web content for browsing with handhelds and speech interfaces is an ongoing research activity. Initial efforts at adapting Web content for small-screen devices relied on WML (Wireless Markup Language) and WAP (Wireless Application Protocol) for designing and displaying Web pages [Kaasinen et al. 2000]. These approaches imposed an additional burden on Web designers to create separate WML content. In contrast, we do not require any additional effort on the part of content providers: Guide-O takes the original Web pages, processes them, and presents them to the users.

Subsequent research [Buyukkoten et al. 2001; Buyukkoten et al. 2000; Chen et al. 2003; Yang and Wang 2003; Yin and Lee 2004] dealt with adapting HTML content onto these devices by organizing the Web page into tree-like structures and summarizing the content within these structures for efficient browsing. They were effective for ad-hoc exploratory browsing. However, summary structures often cause needless navigational steps when a user is interested in some specific content. In our work we need to first filter away the content based on the transactional context represented by the states of the process model. Summarization can be used to present the filtered content succinctly.

Page-splitting techniques used in content adaptation for small-screen devices are described in Chen et al. [2003], and Xiangye Xiao and Fu [2005]. A page-analysis technique is proposed in Chen et al. [2003] to analyze the structure of a Web page and split it into small logically-related units that fit on the screen of a mobile device. In the case when a Web page is not suitable for splitting, an auto-positioning method or scrolling-by-block is used to facilitate browsing without splitting the original Web page. In Xiangye Xiao and Fu [2005], a Web page that does not fit on a small screen is transformed into a set of pages, each of which fits into the screen. Page-splitting techniques could be somewhat disorienting for users familiar with the original page structure; however, browsing full pages on mobiles is far more disorienting because most of the mobile browsers distort the pages while rendering. Guide-O also uses a partitioning algorithm to split Web pages into partitions and then identify the concepts. However, Guide-O goes beyond the above-mentioned systems, because it presents the users with the concepts relevant to the transactional step.



A number of research projects have tried to condense Web pages by displaying thumbnails and summarizing Web content [Baluja 2006; Milic-Frayling and Sommerer 2002]. For example, SmartView [Milic-Frayling and Sommerer 2002] uses a page-splitting technique to group the elements of a Web page and present them together while allowing a user to zoom into the individual elements. The work described in Baluja [2006] segments Web pages into regions, presents each region using a thumbnail, and also allows a user to zoom into a region by pressing a single key. In the latter work, the segmentation task is formulated as a machine learning problem based on entropy reduction and decision tree learning. Zooming and thumbnail presentation styles can facilitate the identification of relevant sections in Web pages, but they do not pinpoint the concept instance in a page. These features, however, can further enhance Guide-O-Mobile's usability.

Recently, desktop-class applications for browsing on mobile devices appeared on the market. Apple iPhone [IPhone], for instance, provides access to desktop-class applications and various software (e.g., rich HTML email, full-featured Web browsing, widgets, etc.). The ThunderHawk [Thunderhawk] mobile Web browser offers a desktop-level browsing experience to mobile users. It provides rapid operation, zooming capabilities, different viewing modes (e.g., full screen and split screen), etc. Users can adjust a Web page's resolution according to their preferences and view the page with minimal scrolling. Desktop-class browsers are the future of mobile browsing. However, they also require user interaction to scroll and zoom in on the information. Therefore, they could also benefit from concept instance identification and presentation to further improve user experience.

Technologies for making the Web accessible to visually handicapped individuals via nonvisual interfaces analogous to Raman [1998] include work on browser-level support [Asakawa and Itoh 1998]; content adaptation and summarization [Richards and Hanson 2004; Zajicek et al. 1999; Harper and Patel 2005]; ontology-directed exploratory browsing, as in our HearSay audio browser [Ramakrishnan et al. 2004]; and organization and annotation of Web pages for effective audio rendition [Huang and Sundaresan 2000; Pontelli et al. 2002; Takagi et al. 2002].

Some of the most popular screen readers are JAWS [JAWS] and IBM's Home Page Reader [Asakawa and Itoh 1998; Takagi et al. 2002]. An example of a VoiceXML browsing system (which presents information sequentially) is described in [NetECHO[tm]]. None of these applications performs content analysis of Web pages. BrookesTalk [Zajicek et al. 1999] facilitates nonvisual Web access by providing summaries of Web pages to give its users an audio overview of Web page content. The work described in Harper and Patel [2005] generates a gist summary of a Web page to alleviate information overload for blind users.

How we differ from these works is similar to the differences highlighted above vis-a-vis content adaptation research for handhelds. Specifically, we need a more global view of the content in a set of pages in order to determine what should be filtered in a state. An interesting outcome of this generality is that by collapsing all the states in the process model into a single state, our speech-based HearSay system for exploratory browsing becomes a special case of Guide-O-Speech.





7. CONCLUSION

The use of complete Web pages for doing online transactions under constrained interaction modalities can cause severe information overload for the end user. Model-directed Web transactions can substantially reduce, if not eliminate, this problem. Our preliminary experimentation with Guide-O suggests that it is possible to achieve such reductions in practice. We are currently conducting preliminary usability studies of Guide-O-Speech at Helen Keller Services for the Blind (http://www.helenkeller.org) to get feedback from the visually handicapped community.

There are several avenues of future research. In this article we have presented a client-side Web transactional system, in which the process model resides on the client. In the future we will also develop a server-side Web transactional system, in which content providers annotate their Web pages so that concept instances are labeled with the concept name. Content providers can easily use XHTML to label concept instances in their Web sites and describe process models specific to their sites. Most online stores generate their Web pages automatically from templates, so we expect that labeling the content of templates will not require extensive effort on the part of content providers. At the same time, the use of XHTML for semantic labeling imposes no limitation, allowing content providers to define their own states and actions as they deem necessary.
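To make the idea of provider-side labeling concrete, the following is a minimal sketch of how a client might read concept labels from an annotated XHTML page. The article does not specify a markup convention, so everything here is an assumption made for illustration: the `concept-` prefix on `class` attributes, the concept names, the sample page, and the function `labeled_concepts`.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated page: the "concept-<name>" class convention is an
# assumption for this sketch, not a convention defined in the article.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="concept-search_form"><form action="/search"><input name="q"/></form></div>
    <ul class="concept-item_list"><li>Book A</li><li>Book B</li></ul>
    <div class="banner">Unrelated promotion</div>
  </body>
</html>"""

PREFIX = "concept-"  # hypothetical labeling prefix


def labeled_concepts(xhtml):
    """Map each concept name to the text of the elements labeled with it."""
    found = {}
    for el in ET.fromstring(xhtml).iter():
        # Unprefixed attributes such as "class" carry no XML namespace.
        for token in el.get("class", "").split():
            if token.startswith(PREFIX):
                # Join the element's text pieces and normalize whitespace.
                text = " ".join(" ".join(el.itertext()).split())
                found.setdefault(token[len(PREFIX):], []).append(text)
    return found


concepts = labeled_concepts(PAGE)
```

With labels of this kind already in the page, the client would not need to infer concept instances through content analysis; unlabeled elements, such as the banner in the sample, are simply skipped.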

We build a separate ontology and process model for each type of Web transaction (e.g., online shopping, utility bill payment, flight ticket booking). An interesting problem is to develop a generic ontology and process model to support different types of Web transactions. Currently, we do not consider the security mechanisms defined in Web sites to handle sensitive information. In the future, we will investigate the possibility of handling security mechanisms that are already present in the Web site.

The Guide-O system works under the assumption that the Web transaction is performed correctly. We plan to extend our system to incorporate failure scenarios. To handle them, we will maintain a history of transactional progress. On failure (e.g., when the system does not identify any concept on the current page), the model will go back to the previous state. Similarly, when a user chooses to go back, the model will go back to the previous state.
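The rollback behavior described above can be sketched with a history stack over the process automaton. This is an illustrative sketch only, not Guide-O's implementation: the class `ProcessModel`, its methods, the transition table, and all state and concept names are hypothetical.

```python
class ProcessModel:
    """Hypothetical sketch of a process automaton with a history stack."""

    def __init__(self, start, transitions):
        # transitions maps (state, concept) -> next state; names are illustrative.
        self.state = start
        self.transitions = transitions
        self.history = []  # previously visited states, most recent last

    def step(self, concepts_on_page, chosen=None):
        """Advance on the chosen concept; roll back if no concept was identified."""
        if not concepts_on_page:
            # Failure scenario: no concept identified on the current page.
            return self.back()
        key = (self.state, chosen)
        if chosen not in concepts_on_page or key not in self.transitions:
            return self.state  # no valid move: stay in the current state
        self.history.append(self.state)
        self.state = self.transitions[key]
        return self.state

    def back(self):
        """Return to the previous state, if any (on failure or user request)."""
        if self.history:
            self.state = self.history.pop()
        return self.state


transitions = {
    ("search", "search_form"): "results",
    ("results", "item_list"): "item",
    ("item", "add_to_cart"): "cart",
}
model = ProcessModel("search", transitions)
model.step({"search_form"}, "search_form")  # search -> results
model.step({"item_list"}, "item_list")      # results -> item
model.step(set())                           # failure: item -> results
model.back()                                # user goes back: results -> search
```

Failure and an explicit user request share the same `back` path here, mirroring the text's statement that both cases return the model to the previous state.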

In our current framework, Guide-O-Mobile is set up to run as a client/server application with the server doing the bulk of the processing. In the future it should be possible to port all the server steps to the handheld, once improvements in handheld hardware and software technologies provide more memory and a richer set of browser operations than are currently available on handhelds.

In addition, displaying concepts compactly on handhelds along the lines suggested in Florins and Vanderdonckt [2004] will further improve the user's ease of interaction.

Finally, integration of our framework with Web services standards is an interesting problem. Success here will let us specify the process model in BPEL4WS, which in turn will enable interoperation with sites exposing Web pages as well as those exposing Web services.



REFERENCES

AGARWAL, S., HANDSCHUH, S., AND STAAB, S. 2003. Surfing the service web. In International Semantic Web Conference (ISWC). 211–216.

ANGLUIN, D. 1978. On the complexity of minimum inference of regular sets. Inform. Control 39, 3, 337–350.

ASAKAWA, C. AND ITOH, T. 1998. User interface of a home page reader. In Proceedings of the ACM International Conference on Assistive Technologies (ASSETS). ACM Press, 149–156.

BALUJA, S. 2006. Browsing on small screens: Recasting web-page segmentation into an efficient machine learning framework. In Proceedings of the 15th International Conference on World Wide Web (WWW'06). ACM Press, 33–42.

BUYUKKOTEN, O., GARCIA-MOLINA, H., AND PAEPCKE, A. 2001. Seeing the whole in parts: Text summarization for web browsing on handheld devices. In International World Wide Web Conference (WWW). ACM Press, 652–662.

BUYUKKOTEN, O., GARCIA-MOLINA, H., PAEPCKE, A., AND WINOGRAD, T. 2000. Powerbrowser: Efficient web browsing for pdas. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI). ACM Press, 430–437.

CHEN, L., SHADBOLT, N., GOBLE, C. A., TAO, F., COX, S. J., PULESTON, C., AND SMART, P. R. 2003. Towards a knowledge-based approach to semantic service composition. In Proceedings of the International Semantic Web Conference (ISWC). 319–334.

CHEN, Y., MA, W.-Y., AND ZHANG, H.-J. 2003. Detecting web page structure for adaptive viewing on small form factor devices. In Proceedings of the International World Wide Web Conference (WWW). ACM Press, 225–233.

CHI, E., PIROLLI, P., CHEN, K., AND PITKOW, J. 2001. Using information scent to model user information needs and actions on the web. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI). ACM Press, 490–497.

CIMIANO, P., HANDSCHUH, S., AND STAAB, S. 2004. Towards the self-annotating web. In Proceedings of the International World Wide Web Conference (WWW). ACM Press, 462–471.

DILL, S., EIRON, N., GIBSON, D., GRUHL, D., GUHA, R., JHINGRAN, A., KANUNGO, T., RAJAGOPALAN, S., TOMKINS, A., TOMLIN, J., AND YIEN, J. 2003. SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation. In Proceedings of the International World Wide Web Conference (WWW). 831–840.

FLORINS, M. AND VANDERDONCKT, J. 2004. Graceful degradation of user interfaces as a design method for multiplatform systems. In Proceedings of the 9th International Conference on Intelligent User Interface (IUI'04). ACM Press, New York, NY, 140–147.

GOLD, E. M. 1978. Complexity of automaton identification from given data. Inform. Control 37, 3, 302–320.

GUSFIELD, D. 1997. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press.

HAMMOND, B., SHETH, A., AND KOCHUT, K. 2002. Semantic enhancement engine: A modular document enhancement platform for semantic applications over heterogenous content. In Real World Semantic Applications, V. Kashyap and L. Shklar, Eds. IOS Press.

HARPER, S. AND PATEL, N. 2005. Gist summaries for visually impaired surfers. In Proceedings of the ACM International Conference on Assistive Technologies (ASSETS). ACM Press, 90–97.

HEFLIN, J., HENDLER, J. A., AND LUKE, S. 2003. SHOE: A blueprint for the semantic web. In Spinning the Semantic Web, D. Fensel, J. A. Hendler, H. Lieberman, and W. Wahlster, Eds. MIT Press, 29–63.

HESS, A., JOHNSTON, E., AND KUSHMERICK, N. 2004. Assam: A tool for semi-automatically annotating semantic web services. In Proceedings of the International Semantic Web Conference (ISWC). 320–334.

HUANG, A. AND SUNDARESAN, N. 2000. A semantic transcoding system to adapt web services for users with disabilities. In Proceedings of the ACM International Conference on Assistive Technologies (ASSETS). ACM Press, 156–163.

IPhone. http://www.apple.com/iphone/.

JAWS SCREEN READER. http://www.freedomscientific.com.





KAASINEN, E., AALTONEN, M., KOLARI, J., MELAKOSKI, S., AND LAAKKO, T. 2000. Two approaches to bringing internet services to wap devices. In Proceedings of the International World Wide Web Conference (WWW).

KAHAN, J., KOIVUNEN, M., PRUD'HOMMEAUX, E., AND SWICK, R. 2001. Annotea: An open rdf infrastructure for shared web annotations. In Proceedings of the International World Wide Web Conference (WWW). ACM Press, 623–632.

LAENDER, A., RIBEIRO-NETO, B., DA SILVA, A., AND TEIXEIRA, J. 2002. A brief survey of web data extraction tools. SIGMOD Rec. 31, 2.

MILIC-FRAYLING, N. AND SOMMERER, R. 2002. Smartview: Flexible viewing of web page contents. In Proceedings of the International World Wide Web Conference (WWW). ACM Press, 172.

MUKHERJEE, S., RAMAKRISHNAN, I., AND SINGH, A. 2005. Bootstrapping semantic annotation for content-rich html documents. In Proceedings of the International Conference on Data Engineering (ICDE). ACM Press.

MURPHY, K. 1996. Passively learning finite automata. Tech. rep., Santa Fe Institute.

NetECHO[tm]. http://www.internetspeech.com.

ONCINA, J. AND GARCÍA, P. 1991. Inferring Regular Languages in Polynomial Update Time. World Scientific Publishing.

PATIL, A., OUNDHAKAR, S., SHETH, A., AND VERMA, K. 2004. Meteor-s web service annotation framework. In Proceedings of the International World Wide Web Conference (WWW). 553–562.

PONTELLI, E., XIONG, W., GILLIAN, D., SAAD, E., GUPTA, G., AND KARSHMER, A. 2002. Navigation of html tables, frames, and xml fragments. In Proceedings of the ACM International Conference on Assistive Technologies (ASSETS). ACM Press, 25–32.

POPOV, B., KIRYAKOV, A., KIRILOV, A., MANOV, D., OGNYANOFF, D., AND GORANOV, M. 2003. Kim semantic annotation platform. In Proceedings of the International Semantic Web Conference (ISWC).

RAMAKRISHNAN, I., STENT, A., AND YANG, G. 2004. Hearsay: Enabling audio browsing on hypertext content. In Proceedings of the International World Wide Web Conference (WWW). ACM Press, 80–89.

RAMAN, T. V. 1998. Audio system for technical readings. 1410, xvi + 121.

RAMASWAMY, L., IYENGAR, A., LIU, L., AND DOUGLIS, F. 2004. Automatic detection of fragments in dynamically generated web pages. In Proceedings of the International World Wide Web Conference (WWW). ACM Press, 443–454.

RAO, J., KUNGAS, P., AND MATSKIN, M. 2004. Logic-based web services composition: From service description to process model. In Proceedings of the International Conference on Web Services (ICWS).

RICHARDS, J. T. AND HANSON, V. L. 2004. Web accessibility: a broader view. In Proceedings of the 13th International Conference on World Wide Web (WWW'04). ACM Press, New York, NY, 72–79.

SABOU, M., WROE, C., GOBLE, C., AND MISHNE, G. 2005. Learning domain ontologies for web service descriptions: An experiment in bioinformatics. In Proceedings of the International World Wide Web Conference (WWW). 190–198.

SONG, R., LIU, H., WEN, J.-R., AND MA, W.-Y. 2004. Learning block importance models for web pages. In Proceedings of the International World Wide Web Conference (WWW). ACM Press, 203–211.

TAKAGI, H., ASAKAWA, C., FUKUDA, K., AND MAEDA, J. 2002. Site-wide annotation: Reconstructing existing pages to be accessible. In Proceedings of the ACM International Conference on Assistive Technologies (ASSETS). ACM Press.

THUNDERHAWK. http://www.bitstream.com/wireless/index.html.

TRAVERSO, P. AND PISTORE, M. 2004. Automated composition of semantic web services into executable processes. In Proceedings of the International Semantic Web Conference (ISWC).

VAN DER AALST, W. AND WEIJTERS, A. 2004. Process mining: A research agenda. Comput. Industry 53, 231–244.

WOMBACHER, A., FANKHAUSER, P., AND NEUHOLD, E. J. 2004. Transforming bpel into annotated deterministic finite state automata for service discovery. In Proceedings of the International Conference on Web Services (ICWS).



XIANGYE XIAO, QIONG LUO, D. H. AND FU, H. 2005. Slicing* tree based web page transformation for small displays. In Proceedings of the Conference on Information and Knowledge Management. ACM Press, 303–304.

YANG, C. AND WANG, F. L. 2003. Fractal summarization for mobile devices to access large documents on the web. In Proceedings of the International World Wide Web Conference (WWW). ACM Press, 215–224.

YI, L. AND LIU, B. 2003. Eliminating noisy information in web pages for data mining. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (SIGKDD). ACM Press, 117–118.

YIN, X. AND LEE, W. S. 2004. Using link analysis to improve layout on mobile devices. In Proceedings of the International World Wide Web Conference (WWW). ACM Press, 338–344.

YU, S., CAI, D., WEN, J.-R., AND MA, W.-Y. 2003. Improving pseudo-relevance feedback in web information retrieval using web page segmentation. In Proceedings of the International World Wide Web Conference (WWW).

ZAJICEK, M., POWELL, C., AND REEVES, C. 1999. Web search and orientation with brookestalk. In Proceedings of Technology and Persons with Disabilities Conference.

ZHANG, Z., HE, B., AND CHANG, K. C.-C. 2004. Understanding web query interfaces: Best-effort parsing with hidden syntax. In Proceedings of the ACM Conference on Management of Data (SIGMOD). ACM Press, 107–118.

Received October 2006; revised May 2007; accepted July 2007

