Online Commercial Intention
04/13/23 1
Detecting Online Commercial Intention (OCI)
Honghua (Kathy) Dai, Zaiqing Nie, Lee Wang, Lingzhi Zhao, Ji-Rong Wen, and Ying Li

Agenda
- Motivations and introduction to OCI (Online Commercial Intention)
- A machine learning-based approach for OCI detection
- Experiments
- Conclusion and future work

Motivation
- Serving ads is more effective, and less annoying, when the user has intent to purchase.
- We are interested in detecting web pages and queries that show intention to commit a commercial activity (purchase, rent, bid, or sell…).

OCI vs 3 search goal categories
- Navigational: the immediate intent is to reach a particular site.
- Informational: the intent is to acquire some information assumed to be present on one or more web pages.
- Transactional: the intent is to perform some web-mediated activity.

| | Commercial | Non-Commercial |
|---|---|---|
| Navigational | walmart | hotmail |
| Informational | Digital camera | San Francisco |
| Transactional / Resource | U2 music download | Collide lyrics |

OCI can be seen as a new dimension of user search goals.

Define the OCI detection problem
- A binary classification problem: OCI: Query/Page -> {Commercial, Non-Commercial}
- We can derive the commercial sense from a confidence value that ranges from 0 (no commercial intent) to 1 (strong commercial intent).

[Figure: a scale from 0 to 1; commercial intention grows stronger toward 1]
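
A minimal sketch of how a confidence value in [0, 1] could be turned back into the binary label. The function name `oci_label` and the 0.5 cutoff are assumptions for illustration; the slides do not state a cutoff.

```python
def oci_label(confidence: float, threshold: float = 0.5) -> str:
    """Map a confidence in [0, 1] to a binary OCI label.

    The default threshold of 0.5 is an assumed cutoff, not a value from the slides.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return "Commercial" if confidence >= threshold else "Non-Commercial"
```

For example, `oci_label(0.9)` falls on the commercial side of the assumed cutoff and `oci_label(0.1)` on the non-commercial side.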

Framework of Detecting Page OCI

[Figure: Labeled Training Page Content (e.g., page content of http://shopping.msn.com/: Commercial) -> Keyword Extraction and Selection -> Significant Keywords -> Feature Composition (using the Full HTML Page Content) -> Feature Vectors of selected keywords -> Classification Algorithm -> Page Commercial Intention Detector]
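
Keywords selection
- Select significant and reliable keywords, where $C$ and $\bar{C}$ denote the commercial and non-commercial classes:
  - Significance: $$\mathrm{Sig}(k) = \frac{\max\{\Pr(k \mid C),\ \Pr(k \mid \bar{C})\}}{\Pr(k \mid C) + \Pr(k \mid \bar{C})} - \frac{1}{2}$$
  - Frequency: $$\mathrm{Freq}(k) = \Pr(k \mid C \cup \bar{C})$$
- Keyword selection threshold: for simplicity we use the same threshold for the two measures in the experiments.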
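
The two measures can be sketched in code under stated assumptions: significance is taken as max{Pr(k|C), Pr(k|C̄)} / (Pr(k|C) + Pr(k|C̄)) − 1/2, frequency as Pr(k | C ∪ C̄), and each probability is estimated as the fraction of labeled pages that contain the keyword. The estimator, the function names, and the toy counts are all illustrative assumptions, not details given on the slide.

```python
def keyword_scores(k_in_c, n_c, k_in_nc, n_nc):
    """Return (significance, frequency) for one keyword.

    k_in_c / k_in_nc: number of commercial / non-commercial pages containing
    the keyword; n_c / n_nc: total pages in each class. Estimating Pr(k | class)
    by document fraction is an assumption of this sketch.
    """
    p_c = k_in_c / n_c
    p_nc = k_in_nc / n_nc
    total = p_c + p_nc
    # Significance is 0 when the keyword is equally likely in both classes
    # and 0.5 when it appears in only one class.
    sig = max(p_c, p_nc) / total - 0.5 if total > 0 else 0.0
    # Frequency over all labeled pages, commercial or not.
    freq = (k_in_c + k_in_nc) / (n_c + n_nc)
    return sig, freq


def select_keywords(doc_counts, n_c, n_nc, threshold=0.1):
    """Keep keywords whose significance and frequency both reach the threshold.

    doc_counts maps keyword -> (k_in_c, k_in_nc). The same threshold is used
    for both measures, as the slide states.
    """
    selected = []
    for keyword, (k_in_c, k_in_nc) in doc_counts.items():
        sig, freq = keyword_scores(k_in_c, n_c, k_in_nc, n_nc)
        if sig >= threshold and freq >= threshold:
            selected.append(keyword)
    return selected


# Toy counts (hypothetical): 100 commercial and 100 non-commercial pages.
toy_counts = {"buy": (30, 10), "the": (90, 90), "rare": (1, 0)}
chosen = select_keywords(toy_counts, n_c=100, n_nc=100)
```

In this toy data, "the" is frequent but not significant and "rare" is significant but not frequent, so only a keyword that passes both tests survives.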

Page feature composition
- We define two properties for each keyword $k_i$ in a page $p$:
  - keyword occurrences in inner text: $n_{it}(k_i, p)$
  - keyword occurrences in tag attributes: $n_{ta}(k_i, p)$
- As a result, a page $p$ is represented by a feature vector built from these two properties.
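
A minimal sketch of this feature composition using Python's standard `html.parser`. The whole-word, case-insensitive matching rule and the layout of the vector (inner-text count followed by attribute count for each keyword) are assumptions of this sketch; the slide does not specify them. For simplicity, script and style content is counted as inner text here.

```python
import re
from html.parser import HTMLParser


class _TextAndAttrCollector(HTMLParser):
    """Collect inner text and tag attribute values separately."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.attr_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for _name, value in attrs:
            if value:
                self.attr_parts.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def count_occurrences(keyword, text):
    """Count case-insensitive whole-word occurrences (assumed matching rule)."""
    pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def page_features(html, keywords):
    """Return [n_it(k1), n_ta(k1), n_it(k2), n_ta(k2), ...] for a page."""
    parser = _TextAndAttrCollector()
    parser.feed(html)
    parser.close()
    inner_text = " ".join(parser.text_parts)
    attr_text = " ".join(parser.attr_parts)
    features = []
    for keyword in keywords:
        features.append(count_occurrences(keyword, inner_text))
        features.append(count_occurrences(keyword, attr_text))
    return features


sample = '<a href="/buy-now" title="Buy cheap camera">Buy a camera today</a><img alt="camera">'
vector = page_features(sample, ["buy", "camera"])
```

Keeping the two counts separate lets a classifier weigh a keyword in visible text differently from the same keyword in a link or image attribute.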

Detecting query OCI
- Challenges
  - Only a few search queries contain explicit commercial indicators, such as “buy”, “price”, “rent”, “discount”, etc.
  - Search queries are usually short.
- Solution
  - Enrich the query from an external resource (search engine):
    - First result page (query snippets)
    - Top N landing URLs
  - Query classification problem -> page classification problem

Search result page and Landing URLs

Query OCI Detector based on Top N Landing URLs

[Figure: training pipeline for the TopURL-based Model]
- Training Queries, e.g. digital camera: Commercial; Encarta: Non-Commercial; ...
- Search Engine: returns the result URLs on the 1st result page (Query: digital camera; Rank 1: URL1; Rank 2: URL2; … Rank N: URLN)
- General Page OCI Detector: gives the OCI of the URLs on the 1st result page (Query: digital camera; Rank 1: Commercial; Rank 2: Commercial; … Rank N: Non-Commercial)
- Classification Algorithm / Simple Average: produces the TopURL-based Model
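
The simple-average variant could be sketched as follows. It assumes the page detector returns a confidence in [0, 1] for each landing URL; `page_oci`, the 0.5 cutoff, and the toy scores are hypothetical stand-ins, not parts of the authors' system.

```python
from typing import Callable, Sequence, Tuple


def query_oci_by_average(
    urls: Sequence[str],
    page_oci: Callable[[str], float],
    threshold: float = 0.5,  # assumed cutoff, not from the slides
) -> Tuple[float, str]:
    """Average the page-level OCI confidences of the top N landing URLs."""
    if not urls:
        raise ValueError("need at least one landing URL")
    scores = [page_oci(url) for url in urls]
    average = sum(scores) / len(scores)
    label = "Commercial" if average >= threshold else "Non-Commercial"
    return average, label


# Toy stand-in for the general page OCI detector (hypothetical scores).
toy_scores = {"URL1": 1.0, "URL2": 1.0, "URL3": 0.0}
average, label = query_oci_by_average(list(toy_scores), toy_scores.__getitem__)
```

Passing the page detector in as a callable keeps the averaging step independent of how pages are fetched and scored.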

Query OCI Detector based on first search result page

[Figure: two pipelines]
- Training Process: Training Queries (e.g. digital camera deals: Commercial; Encarta: Non-Commercial; ...) -> Search Engine -> First Search Result Page Content with OCI labels -> Framework of Learning Page OCI in Figure 1 -> First-Search-Result-Page-based Model
- Prediction Process: Query "buy supersonics ticket" -> Search Engine -> First Search Result Page Content -> First-Search-Result-Page-based Model -> buy supersonics ticket: Commercial

Build a dedicated search result page classifier for this purpose.
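
A minimal sketch of the two processes, showing how query classification reduces to page classification. The search engine and the trained model are passed in as callables; `fake_results` and `toy_model` are hypothetical stand-ins for illustration only.

```python
def build_training_set(labeled_queries, fetch_first_result_page):
    """Pair each query's first result page content with the query's OCI label."""
    return [
        (fetch_first_result_page(query), label)
        for query, label in labeled_queries
    ]


def predict_query_oci(query, fetch_first_result_page, result_page_model):
    """Classify a query by classifying its first search result page."""
    return result_page_model(fetch_first_result_page(query))


# Hypothetical stand-ins: canned result pages and a trivial keyword rule.
fake_results = {
    "digital camera deals": "Buy digital cameras, compare prices, free shipping",
    "Encarta": "Encarta encyclopedia articles and reference",
    "buy supersonics ticket": "Buy Supersonics tickets, best prices",
}


def toy_model(content):
    return "Commercial" if "buy" in content.lower() else "Non-Commercial"


training = build_training_set(
    [("digital camera deals", "Commercial"), ("Encarta", "Non-Commercial")],
    fake_results.__getitem__,
)
prediction = predict_query_oci(
    "buy supersonics ticket", fake_results.__getitem__, toy_model
)
```

In a real setting the toy rule would be replaced by a classifier trained on the `(page content, label)` pairs that `build_training_set` produces.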

Labeling process
- We adopted majority vote: 3 human labelers voted for the labels.
- Initial Web pages and queries were randomly selected from our page/query repository.

| | Pages | Queries |
|---|---|---|
| Commercial | 4074 | 602 |
| Non-Commercial | 21823 | 790 |
| Total | 25897 | 1408 |
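
A minimal sketch of majority voting over labeler votes. The function name and the handling of a missing strict majority (returning `None`) are assumptions; with three labelers and two labels a strict majority always exists.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by a strict majority of labelers, else None."""
    label, count = Counter(votes).most_common(1)[0]
    if count * 2 <= len(votes):
        return None  # no strict majority (assumed handling)
    return label


decision = majority_label(["Commercial", "Commercial", "Non-Commercial"])
```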

Experiment Results - Page OCI detector
- The detector reaches its best performance (CF) when the keyword selection threshold is 0.1 (using SVM as the classifier).
- CP, CR and CF are the precision, recall and F1 metrics for detecting commercial intent.

| Keyword Selection Threshold | Keyword Number | CP | CR | CF |
|---|---|---|---|---|
| 0.1 | 391 | 0.930 | 0.925 | 0.928 |
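
As a quick check of how CF relates to CP and CR, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values in the table; because those inputs are already rounded, the result agrees with the reported CF only up to rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


cf = f1_score(0.930, 0.925)  # CP and CR as reported in the table
```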

Experiment Results - Query OCI detector

Query OCI Detector Performance

| Model | CP | CR | CF |
|---|---|---|---|
| Model based on search result page: OCI(FSRPq) | 0.86 | 0.82 | 0.84 |
| Model based on top N landing pages: OCI(TLPq), SVM | 0.75 | 0.57 | 0.65 |
| Model based on top N landing pages: OCI(TLPq), Naïve Average | 0.43 | 1.00 | 0.60 |

- The model based on the first result page returns the best performance.

OCI Distribution among Query Frequency Ranges

[Figure: percentage of Commercial vs. NonCommercial queries (y-axis, 0% to 80%) by Query Frequency Range (x-axis: All Queries, High, Mid, Low, Very Low, Single)]

Conclusions
- The notion of OCI (Online Commercial Intention) and the problem of detecting OCI from pages and queries.
- The framework of building machine learning models to detect OCI based on Web page content.
- Based on this framework, we build models to detect OCI from search queries.

Conclusions (cont.)
- Our framework trains learning models from two types of data sources for a given search query:
  - content of the first search result page (query snippets)
  - content of the top landing URLs returned by the search engine
- Experiments showed that the model based on the first search result page achieved better performance.
- We also discovered an interesting phenomenon: the portion of queries having commercial intention is higher in frequent query sets.

Future work
- Utilize search query click-through logs.
- Reduce labeling effort.
- Take the user's online context into consideration when studying the user's online intention.
- Detect which commercial activity phase a user is in (research/commit).

Future Work (Cont.)
- Detect more detailed commercial intentions in different verticals:
  - Traveling intention and preferences.
  - Branding awareness and preferences.
- Study how specific the user intention is:
  - “Halo2” vs “video games”
  - “cheap airline ticket new york to las vegas” vs “book a flight”
- Study the correlations between conversion rate and user intention.
- Many more interesting research problems!
- We are HIRING! Contact: [email protected]

Thank You for Your Attention!