Detection and Localization of HTML Presentation Failures Using Computer Vision-Based Techniques
Sonal Mahajan and William G. J. Halfond
Department of Computer Science
University of Southern California
Presentation of a Website
• What do we mean by presentation? – The “look and feel” of the website in a browser
• What is a presentation failure? – The web page rendering ≠ the expected appearance
• Why is it important?
– It takes users only 50 ms to form an opinion about your website (Google research, 2012)
– Affects impressions of trustworthiness, usability, company branding, and perceived quality
End user – no penalty for moving to another website
Business – loses valuable customers
Motivation
• Manual detection is difficult
– Complex interaction between HTML, CSS, and JavaScript
– Hundreds of HTML elements + CSS properties
– Labor intensive and error-prone
• Our approach – Automate the debugging of presentation failures
Two Key Insights
1. Detect presentation failures
[Diagram: a visual comparison of the oracle image and the test web page yields the presentation failures]
Use computer vision techniques
Two Key Insights
2. Localize to faulty HTML elements
[Diagram: a layout tree maps the test web page to the faulty HTML elements]
Use rendering maps
Limitations of Existing Techniques
• Regression debugging – the current version of the web app is modified
– Correct a bug
– Refactor HTML (e.g., convert a <table> layout to a <div> layout)
– DOM comparison techniques (XBT) are not useful if the DOM has changed significantly
• Mockup-driven development – front-end developers convert high-fidelity mockups to HTML pages
– DOM comparison techniques cannot be used, since there is no existing DOM
– Invariant-specification techniques (Selenium, Cucumber, Sikuli) are not practical, since all correctness properties must be specified
– Fighting Layout Bugs: an application-independent correctness checker
Running Example
[Figure: the web page rendering ≠ the expected appearance (oracle)]
Our Approach
Goal – Automatically detect and localize presentation failures in web pages
[Diagram: P1 Detection compares the oracle image and the test web page to find visual differences; P2 Localization uses a pixel–HTML mapping to produce a report]
P1. Detection
• Find visual differences (presentation failures)
• Compare the oracle image and the test page screenshot
• Simple approach: strict pixel-to-pixel equivalence comparison
– Drawbacks
• Spurious differences due to platform differences
• Small differences may be “OK”
Perceptual Image Differencing (PID)
• Uses models of the human visual system
– Spatial sensitivity
– Luminance sensitivity
– Color sensitivity
• Configurable parameters
– Δ : threshold value for a perceptible difference
– F : field of view of the observer
– L : brightness of the display
– C : sensitivity to colors
Shows only human-perceptible differences
Filters out differences belonging to dynamic areas
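To make the comparison concrete, here is a minimal sketch of perceptual differencing in Python. It is not the full PID algorithm (which also models spatial frequency and luminance adaptation); the plain RGB distance and the `delta` threshold are simplifying assumptions.

```python
# A minimal sketch of perceptual differencing, assuming Pillow and NumPy.
# Real PID also models spatial and luminance sensitivity; the plain RGB
# distance and the `delta` threshold here are simplifications.
import numpy as np
from PIL import Image

def perceptual_diff(oracle_path, test_path, delta=10.0):
    oracle = np.asarray(Image.open(oracle_path).convert("RGB"), dtype=float)
    test = np.asarray(Image.open(test_path).convert("RGB"), dtype=float)
    assert oracle.shape == test.shape, "screenshots must match in size"
    dist = np.sqrt(((oracle - test) ** 2).sum(axis=2))  # per-pixel color distance
    ys, xs = np.nonzero(dist > delta)                   # "perceptible" differences
    return list(zip(xs.tolist(), ys.tolist()))          # difference pixels as (x, y)
```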
P1. Detection – Example
[Figure: the test web page screenshot is compared against the oracle using PID; clustering (DBSCAN) then groups the difference pixels into clusters A, B, and C]
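A sketch of the clustering step, assuming scikit-learn's DBSCAN; the `eps` and `min_samples` values are illustrative, not the settings used in the evaluation.

```python
# A sketch of grouping difference pixels into clusters with DBSCAN,
# assuming scikit-learn; eps/min_samples are illustrative values.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_difference_pixels(diff_pixels, eps=5, min_samples=4):
    pts = np.array(diff_pixels)                        # (x, y) pixels from P1
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    clusters = {}
    for point, label in zip(pts, labels):
        if label != -1:                                # -1 marks noise pixels
            clusters.setdefault(label, []).append(tuple(point))
    return list(clusters.values())                     # e.g., clusters A, B, C
```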
P2. Localization
• Identify the faulty HTML elements
• Use an R-tree to map pixel-level visual differences to HTML elements
• “R”ectangle-tree: a height-balanced tree, popular for storing multidimensional data
Use rendering maps to find the faulty HTML elements corresponding to visual differences
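A minimal sketch of the rendering map, assuming the Python `rtree` package and element bounding boxes collected from the browser's layout (e.g., via Selenium); the data layout and function names are illustrative.

```python
# A sketch of the pixel-to-element rendering map, assuming the `rtree`
# package; element bounding boxes would come from the browser's layout.
from rtree import index

def build_rendering_map(elements):
    """elements: list of (xpath, (minx, miny, maxx, maxy)) pairs."""
    idx, xpaths = index.Index(), []
    for i, (xpath, bbox) in enumerate(elements):
        idx.insert(i, bbox)
        xpaths.append(xpath)
    return idx, xpaths

def localize(idx, xpaths, diff_pixels):
    faulty = set()
    for x, y in diff_pixels:
        # every element whose rendered rectangle contains this pixel
        faulty.update(xpaths[i] for i in idx.intersection((x, y, x, y)))
    return faulty
```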
P2. Localization - Example
[Figure: a sub-tree of the R-tree with rectangles R1–R5; the difference pixel at (100, 400) is mapped to the HTML elements (tr[2], td, table, tr, td) whose rectangles contain it]
Result Set:
/html/body/…/tr[2]
/html/body/…/tr[2]/td[1]
/html/body/…/tr[2]/td[1]/table[1]
/html/body/…/tr[2]/td[1]/table[1]/tr[1]
/html/body/…/tr[2]/td[1]/table[1]/td[1]
Special Regions Handling
• Special regions = dynamic portions of the page (actual content not known)
1. Exclusion Region
2. Dynamic Text Region
1. Exclusion Regions
• Only the size-bounding property is applied
[Figure: test web page vs. oracle; difference pixels in the advertisement area are filtered during detection, yet the advertisement box <element> is reported as faulty]
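A minimal sketch of exclusion-region filtering, under the assumption that exclusion regions are given as rectangles; the names are illustrative.

```python
# A sketch of exclusion-region filtering: difference pixels that fall inside
# a declared dynamic region are dropped during detection. The rectangle
# format (minx, miny, maxx, maxy) is an assumption for illustration.
def filter_exclusion_regions(diff_pixels, exclusion_regions):
    def inside(x, y, region):
        minx, miny, maxx, maxy = region
        return minx <= x <= maxx and miny <= y <= maxy
    return [(x, y) for (x, y) in diff_pixels
            if not any(inside(x, y, r) for r in exclusion_regions)]
```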
2. Dynamic Text Regions
• The style properties of the text are known
[Figure: the test web page is modified using the known text styles (text color: red, font-size: 12px, font-weight: bold) to match the oracle, then P1 and P2 are run and the news box <element> is reported as faulty]
P3. Result Set Processing
• Rank the HTML elements in order of their likelihood of being faulty
• Weighted prioritization score
• The lower the score, the higher the likelihood of being faulty
Use heuristics based on element relationships (see the sketch below)
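A minimal sketch of the weighted scoring, assuming each heuristic (C, O, D, P from the following slides) returns a normalized value; the weights are illustrative and the exact weighting scheme may differ from the paper's.

```python
# A sketch of the weighted prioritization score. Each heuristic (C, O, D, P)
# is assumed to return a normalized value in [0, 1]; the weights are
# illustrative, not calibrated values from the paper.
def prioritization_score(element, heuristics, weights):
    return sum(weights[name] * fn(element) for name, fn in heuristics.items())

def rank_result_set(result_set, heuristics, weights):
    # lower score = more likely faulty, so sort ascending
    return sorted(result_set,
                  key=lambda e: prioritization_score(e, heuristics, weights))
```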
3.1 Contained Elements (C)
[Figure: expected vs. actual appearance of a parent containing child1 and child2; elements affected by the failure are marked ✖]
3.2 Overlapped Elements (O)
[Figure: expected vs. actual appearance of a parent with child1 and child2; elements that overlap in the actual rendering are marked ✖]
3.3 Cascading (D)
[Figure: expected vs. actual appearance of element 1, element 2, and element 3; a fault in one element shifts the elements after it, and the affected elements are marked ✖]
3.4 Pixels Ratio (P)
[Figure: a parent and its child are both marked ✖; the difference region covers 100% of the child's pixels but only 20% of the parent's pixels, so the child ranks as more likely faulty]
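A minimal sketch of the pixels-ratio heuristic, assuming bounding boxes in (minx, miny, maxx, maxy) form; the helper name is hypothetical.

```python
# A sketch of the pixels-ratio heuristic: the fraction of an element's
# rendered area covered by difference pixels. A fully covered child scores
# 1.0 while its much larger parent scores lower, so the child ranks higher.
def pixels_ratio(bbox, diff_pixels):
    minx, miny, maxx, maxy = bbox
    area = (maxx - minx + 1) * (maxy - miny + 1)
    covered = sum(1 for (x, y) in diff_pixels
                  if minx <= x <= maxx and miny <= y <= maxy)
    return covered / area
```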
P3. Result Set Processing - Example
[Figure: for cluster A, the unranked result set (/html, /html/body, /html/body/table, …, /html/body/table/…/img) is reordered by prioritization score into a ranked list: 1. /html/body/table/…/img, …, 5. /html/body/table, 6. /html/body, 7. /html; the final report contains ranked results for clusters A–E]
Empirical Evaluation
• RQ1: What is the accuracy of our approach for detecting and localizing presentation failures?
• RQ2: What is the quality of the localization results?
• RQ3: How long does it take to detect and localize presentation failures with our approach?
Experimental Protocol
• Approach implemented in a tool called “WebSee”
• Five real-world subject applications
• For each subject application
– Download the page and take a screenshot; use it as the oracle
– Seed a unique presentation failure to create a variant
– Run WebSee on the oracle and the variant
Subject Applications
Subject Application    Size (Total HTML Elements)    # Generated Test Cases
Gmail                  72                            52
USC CS Research        322                           59
Craigslist             1,100                         53
Virgin America         998                           39
Java Tutorial          159                           50
RQ1: What is the accuracy?
[Bar chart: localization accuracy for Gmail, USC CS Research, Craigslist, Virgin America, and Java Tutorial; values range from 90% to 97%]
• Detection accuracy: a sanity check for PID
• Localization accuracy: the percentage of test cases in which the expected faulty element was reported in the result set
RQ2: What is the quality of localization?
[Chart: result set size per subject application – Gmail: 12 (16% of total elements), USC CS Research: 17 (5%), Craigslist: 32 (3%), Virgin America: 49 (5%), Java Tutorial: 8 (5%); average: 23 (10%)]
[Diagram: within a ranked result set of 23 elements, the expected faulty element has an average rank of 4.8 (2%); when the expected faulty element is not present in the result set, its distance from the reported elements is 6]
RQ3: What is the running time?
[Chart: running-time breakdown – P1 Detection: 21%, P2 Localization: 25%, P3 Result Set Processing: 54%; P3 is dominated by the sub-image search for the cascading heuristic]
• Running time ranged from 7 sec to 3 min, with an average of 87 sec
Comparison with User Study
[Chart: accuracy of graduate-level students vs. WebSee – detection: 76% (students) vs. 100% (WebSee); localization: 36% (students) vs. 93% (WebSee)]
• Graduate-level students
• Manual detection and localization using Firebug
• Time
– Students: 7 min
– WebSee: 87 sec
Case Study with Real Mockups
• Three subject applications
• 45% of the faulty elements were reported in the top five
• 70% were reported in the top ten
• Analysis time was similar
Summary
• A technique for automatically detecting and localizing presentation failures
• Uses computer vision techniques for detection
• Uses rendering maps for localization
• Empirical evaluation shows positive results
Thank you
Normalization Process
• Pre-processing step before detection (a minimal sketch follows)
1. The browser window size is adjusted based on the oracle
2. The zoom level is adjusted
3. Scrolling is handled
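A minimal sketch of these three steps, assuming Selenium WebDriver (Chrome) and Pillow; note that `set_window_size` sets the outer window rather than the exact viewport, so matching the oracle is approximate.

```python
# A sketch of the normalization step, assuming Selenium and Pillow.
from selenium import webdriver
from PIL import Image

def normalized_screenshot(url, oracle_path, out_path="test.png"):
    oracle = Image.open(oracle_path)
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # 1. size the browser window from the oracle image's dimensions
        driver.set_window_size(oracle.width, oracle.height)
        # 2. reset the zoom level to 100%
        driver.execute_script("document.body.style.zoom = '100%';")
        # 3. scroll to the top-left so both images share an origin
        driver.execute_script("window.scrollTo(0, 0);")
        driver.save_screenshot(out_path)
    finally:
        driver.quit()
    return out_path
```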
Difference with XBT
• XBT (cross-browser testing) techniques use DOM comparison – find matched nodes, then compare them
• Regression debugging – correct a bug, refactor HTML (e.g., a <table> to <div> layout)
– When the DOM has changed significantly, XBT cannot find matching DOM nodes, so the comparison is not accurate
• Mockup-driven development – no “golden” version of the page (DOM) exists
– XBT techniques cannot be used
• Our approach – uses computer vision techniques for detection
– Applies to both scenarios
Pixel-to-pixel Comparison
[Figure: the oracle and the test web page screenshot compared pixel by pixel; the resulting difference image flags 98% of the entire image as different]
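A minimal sketch of this naive baseline, assuming NumPy and Pillow; it reports the fraction of mismatched pixels, which can approach 98% even when the pages look nearly identical.

```python
# A sketch of strict pixel-to-pixel comparison, the naive baseline.
import numpy as np
from PIL import Image

def strict_diff_fraction(oracle_path, test_path):
    a = np.asarray(Image.open(oracle_path).convert("RGB"))
    b = np.asarray(Image.open(test_path).convert("RGB"))
    mismatched = (a != b).any(axis=2)   # True where any channel differs
    return mismatched.mean()            # e.g., 0.98 for the running example
```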
P1. Detection
• Find visual differences (presentation failures)
• Simple approach: strict pixel-to-pixel equivalence comparison
• Our approach: perceptual image differencing (PID)
Analyze using computer vision techniques
Perceptual Image Differencing
[Figure: the PID difference image, distinguishing difference pixels from matched pixels]
High fidelity mockups… reasonable?
[A sequence of image-only slides showing examples of high-fidelity mockups]