Mining Test Oracles for Search Engines
Wujie Zheng
wjzheng@cse.cuhk.edu.hk
Outline
- Search Engines Evaluation/Testing
- Our Approach
- Data Collection
- Examples
Search Engines Evaluation/Testing
Search Engine Evaluation
Prepare a set of queries and the ground truth, then evaluate the results of different search engines using well-defined measurements.
- How to prepare queries, i.e., test inputs?
- How to get the ground truth, i.e., test oracles?
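As one illustration of such a measurement, the sketch below computes precision at k for a single query. The slides do not say which measurements are meant, so the metric choice, the function name `precision_at_k`, and the sample data are all assumptions made for illustration.

```python
def precision_at_k(results, relevant, k=10):
    """Fraction of the top-k results that appear in the ground-truth set."""
    top_k = results[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)


# Hypothetical ground truth and engine output for one query.
ground_truth = {"wine.com", "wines.com", "winechateau.com"}
engine_results = ["wine.com", "foodandwine.com", "wines.com", "wineweb.com"]

score = precision_at_k(engine_results, ground_truth, k=4)
print(score)
```

The hard part is not the metric but obtaining `ground_truth`, which is the test-oracle problem the rest of the talk addresses.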
Test Oracles
Previous Approaches
- Manual labeling
  - too costly, hardly reusable
- Clickthrough data
  - cannot find relevant pages that are not in the search results
- Automatic labeling based on the search results of multiple search engines at the same time
  - biased toward systems of similar characteristics
- Use previous search results as test oracles
  - desired search results may change
Mining Test Oracles from Search Results
Basic Idea
Mine implicit rules between inputs/outputs, e.g.,
- tvguide.com, => imdb.com
- basketball-reference.com, => nba.com
- ericsson,sony, => sonyericsson.com
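A mined rule can act as a test oracle for a single query: if every term on the left-hand side is present, the right-hand side is expected too. The following is a minimal sketch of that check; the function name `check_rule` and the three sample transactions are hypothetical.

```python
def check_rule(antecedent, consequent, transaction):
    """Classify one transaction against a rule antecedent => consequent."""
    terms = set(transaction)
    if not set(antecedent) <= terms:
        return "not applicable"  # the rule says nothing about this query
    return "pass" if consequent in terms else "violation"


# Rule taken from the slide: ericsson,sony, => sonyericsson.com
rule = (["ericsson", "sony"], "sonyericsson.com")

# Hypothetical transactions: query words followed by result domains.
ok = ["ericsson", "sony", "sonyericsson.com", "wikipedia.org"]
bad = ["ericsson", "sony", "wikipedia.org", "gsmarena.com"]
other = ["nikon", "nikon.com"]

for transaction in (ok, bad, other):
    print(check_rule(*rule, transaction))
```

A "violation" is only a candidate failure: it flags a result page worth inspecting, not a confirmed bug.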
Build The Dataset
- Terms (features) of inputs
  - Query words
  - Query types
- Terms (features) of outputs
  - Domains of top 10 search results
- Terms (features) of multiple search engines
  - Search engine + domains of top 10 search results
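The feature list above could be turned into one transaction per query roughly as follows. This is a sketch under assumptions: `domain_of` and `build_transaction` are hypothetical helpers, and the domain extraction only strips a leading `www.`, which is simpler than whatever produced the domains in the slides.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Host name of a URL with a leading 'www.' removed (a simplification)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def build_transaction(query, query_type, result_urls, engine=None, top_n=10):
    """Return query words, the query type, then the top-N result domains."""
    terms = query.lower().split()
    terms.append(query_type)
    for url in result_urls[:top_n]:
        domain = domain_of(url)
        # With several engines, prefix each domain with the engine name.
        terms.append(f"{engine}:{domain}" if engine else domain)
    return terms


# Hypothetical result URLs for one query.
urls = ["https://www.wine.com/list", "http://www.foodandwine.com/"]
single = build_transaction("buy wine online", "Food.csv", urls)
multi = build_transaction("buy wine online", "Food.csv", urls, engine="google")
print(single)
print(multi)
```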
Example Dataset
pine,furniture,Home.csv,barnfurnituremart.com,americancountryhomestore.com,overstock.com,prairiecountryfurniture.com,etsy.com,unfinishedfurnituregiant.com,cozylogfurniture.com,directfrommexico.com,oakplus.com,sawdustcityllc.com,
buy,wine,online,Food.csv,wine.com,foodandwine.com,marketviewliquor.com,winechateau.com,wines.com,thewinebuyer.com,wineweb.com,alloutwine.com,cellaraiders.com,french-wine-online.com,
piercing,labret,Beauty.csv,wikipedia.org,youtube.com,youtube.com,about.com,ygoy.com,ehow.com,bodyjewelleryshop.com,google.com,bmezine.com,piercingdot.com
Example Dataset
interest,rates,today,Finance.csv,real.csv,google:wellsfargo.com,google:bankrate.com,google:marketwatch.com,google:interest.com,google:interest.com,google:mortgagenewsdaily.com,google:usbank.com,google:mortgage101.com,google:yahoo.com,google:mortgageloan.com,bing:wellsfargo.com,bing:bankrate.com,bing:marketwatch.com,bing:wsj.com,bing:interest.com,bing:interest.com,bing:bankrate.com,bing:usbank.com,bing:yahoo.com,bing:usatoday.com,yahoo:bankrate.com,yahoo:wellsfargo.com,yahoo:bankrate.com,yahoo:interest.com,yahoo:msn.com,yahoo:money-rates.com,yahoo:cnn.com,yahoo:yahoo.com,yahoo:fxstreet.com,yahoo:marketwatch.com,
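A multi-engine row like the one above can be split back into its parts. The sketch below assumes that terms ending in `.csv` are query-type labels and that `google`, `bing`, and `yahoo` are the only engine prefixes; both are read off the example row rather than stated in the slides, and `split_transaction` is a hypothetical name.

```python
from collections import defaultdict

ENGINES = ("google", "bing", "yahoo")  # prefixes seen in the example row


def split_transaction(line):
    """Split one comma-separated row into words, types, and per-engine domains."""
    query_words, query_types = [], []
    domains = defaultdict(list)
    for term in filter(None, (t.strip() for t in line.split(","))):
        prefix, sep, rest = term.partition(":")
        if sep and prefix in ENGINES:
            domains[prefix].append(rest)
        elif term.endswith(".csv"):  # assumed to mark a query type
            query_types.append(term)
        else:
            # Unprefixed terms are treated as query words, so this helper
            # only suits the multi-engine row format.
            query_words.append(term)
    return query_words, query_types, dict(domains)


# Shortened version of the row above.
row = ("interest,rates,today,Finance.csv,real.csv,"
       "google:wellsfargo.com,google:bankrate.com,"
       "bing:wellsfargo.com,yahoo:bankrate.com,")
words, types, by_engine = split_transaction(row)
print(words, types, by_engine)
```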
Association Rule Mining
Rule form: A,B,C => D
confidence(A => D) = support(A,D) / support(A)
Example: bing:mlb.com, => google:mlb.com,
- support(bing:mlb.com, google:mlb.com) = 26
- support(bing:mlb.com) = 27
- confidence(bing:mlb.com, => google:mlb.com,) = 26/27
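The two quantities can be computed directly from a list of transactions. The sketch below follows the formula on the slide; the four transactions are made up for illustration and are not the data behind the 26/27 figure.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """support(antecedent + consequent) / support(antecedent), or None if undefined."""
    denom = support(transactions, antecedent)
    if denom == 0:
        return None
    return support(transactions, set(antecedent) | {consequent}) / denom


# Hypothetical transactions.
transactions = [
    ["mlb", "bing:mlb.com", "google:mlb.com"],
    ["mlb", "scores", "bing:mlb.com", "google:mlb.com"],
    ["baseball", "bing:mlb.com", "google:espn.com"],
    ["nba", "bing:nba.com", "google:nba.com"],
]

conf = confidence(transactions, ["bing:mlb.com"], "google:mlb.com")
print(conf)  # 2 of the 3 transactions with bing:mlb.com also have google:mlb.com
```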
Association Rule Mining
- Mine all frequent itemsets
- We are most interested in the single-postfix rules, i.e., A=>B, where B’s size is 1
- Algorithm
  For each itemset S
    For each u in S
      Check the rule S-u => u
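The loop above can be sketched as follows. The slides do not name the frequent-itemset miner, so `frequent_itemsets` here is a naive brute-force stand-in (an assumption, suitable only for tiny inputs); `single_consequent_rules` and the thresholds are likewise hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Naive enumeration: count every itemset of up to max_size items."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


def single_consequent_rules(itemsets, min_confidence):
    """For each itemset S and each u in S, check the rule S-u => u."""
    rules = []
    for s, s_count in itemsets.items():
        if len(s) < 2:
            continue
        for u in s:
            antecedent = s - {u}
            # Subsets of a frequent itemset are frequent, so the count exists.
            conf = s_count / itemsets[antecedent]
            if conf >= min_confidence:
                rules.append((antecedent, u, conf))
    return rules


# Hypothetical transactions.
transactions = [
    ["nikon", "nikon.com", "amazon.com"],
    ["nikon", "nikon.com"],
    ["canon", "canon.com", "amazon.com"],
]
itemsets = frequent_itemsets(transactions, min_support=2)
rules = single_consequent_rules(itemsets, min_confidence=1.0)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "=>", consequent, conf)
```

On real data a dedicated miner such as Apriori or FP-growth would replace the brute-force enumeration; the rule-checking loop stays the same.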
Data Collection
Search Engines
- Google
- Bing
- Yahoo
- Baidu
- Sogou
- Soso
Queries
- Google Trends (hot queries), 1,000 queries
- Queries in KDD Cup 2005, 800 queries
- Google AdWords, 15,000 queries, 22 types
- Baidu Tops
Examples
dpreview.com,kenrockwell.com, => amazon.com, : 29/29=1.0, violations: test: 37/40, violations: 3881,4691,4783,
amazon.com,kenrockwell.com, => dpreview.com, : 29/29=1.0, violations: test: 37/39, violations: 2089,8921,
canon,amazon.com, => canon.com, : 22/22=1.0, violations: test: 34/38, violations: 4090,4870,5384,7400,
canon.com,amazon.com, => canon, : 22/22=1.0, violations: test: 34/38, violations: 3560,5409,8983,8988,
canon.com,Hobbies.csv, => canon, : 31/31=1.0, violations: test: 31/34, violations: 3560,5409,8988,
canon.com,dpreview.com, => canon, : 22/22=1.0, violations: test: 24/26, violations: 5409,8983,
gsmarena.com,samsung.com, => samsung, : 26/26=1.0, violations: test: 32/35, violations: 852,1195,1714,
phonenumber.com, => whitepages.com, : 25/25=1.0, violations: test: 11/12, violations: 1077,
Hobbies.csv,nikon, => nikon.com, : 28/28=1.0, violations: test: 26/28, violations: 896,8319,
nikon, => nikon.com, : 28/28=1.0, violations: test: 26/28, violations: 896,8319,
canon.com, => canon, : 37/37=1.0, violations: test: 37/41, violations: 3560,5409,8983,8988,
amazon.com,nikon, => nikon.com, : 25/25=1.0, violations: test: 25/27, violations: 896,8319,
reversephonedirectory.com,Computer.csv, => whitepages.com, : 22/22=1.0, violations: test: 26/30, violations: 1804,4424,5453,8720,
Internet.csv,ericsson, => sonyericsson.com, : 24/24=1.0, violations: test: 23/24, violations: 8776,
reversephonedirectory.com, => whitepages.com, : 22/22=1.0, violations: test: 28/32, violations: 1804,4424,5453,8720,
simplyrecipes.com,about.com, => allrecipes.com, : 25/25=1.0, violations: test: 38/39, violations: 5596,
Finance.csv,oanda.com, => xe.com, : 20/20=1.0, violations: test: 27/30, violations: 3410,5566,5781,
oanda.com, => xe.com, : 20/20=1.0, violations: test: 28/31, violations: 3410,5566,5781,
food.com,foodnetwork.com, => allrecipes.com, : 30/30=1.0, violations: test: 32/34, violations: 7642,8519,
foodnetwork.com,simplyrecipes.com, => allrecipes.com, : 39/39=1.0, violations: test: 40/43, violations: 566,5596,7642,
ericsson,sony, => sonyericsson.com, : 24/24=1.0, violations: test: 23/24, violations: 8776,
myrecipes.com,foodnetwork.com, => allrecipes.com, : 24/24=1.0, violations: test: 28/30, violations: 2748,5252,
myrecipes.com,allrecipes.com, => foodnetwork.com, : 24/24=1.0, violations: test: 28/35, violations: 377,1236,1335,1645,3752,6655,6920,
phonenumber.com,phone, => whitepages.com, : 20/20=1.0, violations: test: 8/9, violations: 1077,
Food.csv,joyofbaking.com, => allrecipes.com, : 27/27=1.0, violations: test: 35/36, violations: 566,
nikonusa.com,nikon, => nikon.com, : 28/28=1.0, violations: test: 26/28, violations: 896,8319,
joyofbaking.com, => allrecipes.com, : 27/27=1.0, violations: test: 35/36, violations: 566,
mortgageloan.com, => bankrate.com, : 20/21=0.9523809523809523, violations: 7719, test: 24/28, violations: 545,1603,5073,7711,
Finance.csv,mortgageloan.com, => bankrate.com, : 20/21=0.9523809523809523, violations: 7719, test: 24/28, violations: 545,1603,5073,7711,
recipes,myrecipes.com, => foodnetwork.com, : 20/21=0.9523809523809523, violations: 7778, test: 20/25, violations: 1236,1335,6655,6920,7770,
recipes,myrecipes.com, => allrecipes.com, : 20/21=0.9523809523809523, violations: 7778, test: 24/25, violations: 7770,
phonearena.com,samsung, => gsmarena.com, : 21/22=0.9545454545454546, violations: 3806, test: 33/34, violations: 3802,
samsung.com,samsungmobile.com, => samsung, : 21/22=0.9545454545454546, violations: 8585, test: 8/10, violations: 1195,4778,
food.com,about.com, => allrecipes.com, : 21/22=0.9545454545454546, violations: 2406, test: 43/46, violations: 5740,7359,8893,
Dining.csv,mcdonalds, => mcdonalds.com, : 21/22=0.9545454545454546, violations: 5326, test: 20/22, violations: 3470,3569,
amazon.com,nikon.com, => nikon, : 25/26=0.9615384615384616, violations: 7295, test: 25/30, violations: 1256,5102,6165,6744,7287,
nikon.com,Hobbies.csv, => nikon, : 28/29=0.9655172413793104, violations: 7295, test: 26/31, violations: 1256,5102,6165,6744,7287,
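Each line above appears to report, for one rule, a fraction and violating query IDs on the mining data, followed by the same on a test set; that reading is an assumption, since the slides do not spell out the format. Under it, the test-set part could be produced by a sketch like this, where `evaluate_rule` and the transaction IDs are hypothetical.

```python
def evaluate_rule(antecedent, consequent, transactions):
    """Return (satisfied, applicable, violation_ids) for one rule.

    `transactions` maps a transaction ID to its list of terms.
    """
    antecedent = set(antecedent)
    applicable, violations = 0, []
    for tid, terms in transactions.items():
        terms = set(terms)
        if antecedent <= terms:
            applicable += 1
            if consequent not in terms:
                violations.append(tid)
    return applicable - len(violations), applicable, violations


# Hypothetical held-out transactions keyed by query ID.
test_set = {
    101: ["nikon", "nikon.com", "amazon.com"],
    102: ["nikon", "dpreview.com"],
    103: ["canon", "canon.com"],
    104: ["nikon", "nikon.com"],
}

satisfied, applicable, violation_ids = evaluate_rule(["nikon"], "nikon.com", test_set)
print(f"test: {satisfied}/{applicable}, violations: {violation_ids}")
```

The returned IDs point back at the queries whose result pages deserve manual inspection.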
Examples of Multiple Search Engines
bing:medicinenet.com,google:emedicinehealth.com, => google:medicinenet.com, : 107/107=1.0, violations:
symptoms,bing:medicinenet.com, => google:webmd.com, : 55/55=1.0, violations:
Hobbies.csv,yahoo:allrecipes.com, => google:allrecipes.com, : 53/53=1.0, violations:
bing:medicinenet.com,yahoo:nih.gov, => google:medicinenet.com, : 100/100=1.0, violations:
google:amazon.com,bing:gsmarena.com, => google:gsmarena.com, : 52/52=1.0, violations:
bing:gsmarena.com,google:youtube.com, => google:gsmarena.com, : 73/73=1.0, violations:
google,google:google.com, => bing:google.com, : 56/56=1.0, violations:
google:allrecipes.com,recipe, => bing:allrecipes.com, : 55/55=1.0, violations:
bing:medicinenet.com,yahoo:mayoclinic.com, => google:medicinenet.com, : 90/90=1.0, violations:
bing:dpreview.com,bing:amazon.com, => google:dpreview.com, : 56/56=1.0, violations:
bing:medicinenet.com,yahoo:mayoclinic.com, => google:mayoclinic.com, : 89/90=0.9888888888888889, violations: 7124,
Home.csv,bing:amazon.com, => google:amazon.com, : 90/91=0.989010989010989, violations: 2124,
bing:medicinenet.com,yahoo:wrongdiagnosis.com, => google:medicinenet.com, : 90/91=0.989010989010989, violations: 8556,
bing:webmd.com,yahoo:wrongdiagnosis.com, => google:webmd.com, : 95/96=0.9895833333333334, violations: 6305,
recipes,yahoo:allrecipes.com, => google:allrecipes.com, : 95/96=0.9895833333333334, violations: 6041,
bing:mayoclinic.com,bing:nih.gov, => google:mayoclinic.com, : 102/103=0.9902912621359223, violations: 583,
bing:mayoclinic.com,bing:medicinenet.com, => google:medicinenet.com, : 124/125=0.992, violations: 645,
bing:medicinenet.com,bing:webmd.com, => google:medicinenet.com, : 136/137=0.9927007299270073, violations: 8556,
yahoo:nextag.com,bing:amazon.com, => google:amazon.com, : 172/173=0.9942196531791907, violations: 4773,
bing:medicinenet.com,google:mayoclinic.com, => google:medicinenet.com, : 174/175=0.9942857142857143, violations: 645,
google:walmart.com,bing:amazon.com, => google:amazon.com, : 177/178=0.9943820224719101, violations: 4773,
bing:mayoclinic.com,google:nih.gov, => google:mayoclinic.com, : 143/145=0.9862068965517241, violations: 1255,583,
bing:amazon.com,yahoo:thefind.com, => google:amazon.com, : 72/73=0.9863013698630136, violations: 4773,
symptoms,bing:webmd.com, => google:webmd.com, : 77/78=0.9871794871794872, violations: 6451,
yahoo:medicinenet.com,yahoo:wrongdiagnosis.com, => google:medicinenet.com, : 77/78=0.9871794871794872, violations: 8556,
yahoo:medicinenet.com,yahoo:mayoclinic.com, => google:mayoclinic.com, : 78/79=0.9873417721518988, violations: 7124,
bing:allrecipes.com,yahoo:allrecipes.com, => google:allrecipes.com, : 160/162=0.9876543209876543, violations: 566,5601,
yahoo:bankrate.com,bing:bankrate.com, => google:bankrate.com, : 82/83=0.9879518072289156, violations: 6266,
Internet.csv,bing:gsmarena.com, => google:gsmarena.com, : 83/84=0.9880952380952381, violations: 7617,
bing:gsmarena.com, => google:gsmarena.com, : 86/87=0.9885057471264368, violations: 7617,
bing:nextag.com,bing:amazon.com, => google:amazon.com, : 176/178=0.9887640449438202, violations: 4773,7343,
bing:mayoclinic.com,bing:answers.com, => google:mayoclinic.com, : 89/90=0.9888888888888889, violations: 6328,
Thank you!