deep learning powered in- session contextual ranking using clickthrough data xiujun li 1, chenlei...

26
Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1 , Chenlei Guo 2 , Wei Chu 2 , Ye-Yi Wang 2 , Jude Shavlik 1 1 University of Wisconsin – Madison 2 Microsoft Bing, Redmond, Washington, U.S.A. 12/12/2014

Upload: beverley-powell

Post on 13-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Deep Learning Powered In-Session Contextual Ranking

using Clickthrough DataXiujun Li1, Chenlei Guo2, Wei Chu2, Ye-Yi Wang2, Jude

Shavlik1

1University of Wisconsin – Madison2Microsoft Bing, Redmond, Washington, U.S.A.

12/12/2014

Page 2: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Outline

• Project Goal and Motivation Background & Goal Problem Definition

• Methodology Feature Generation Semantic Features Click-Based Features

• Experimental Results Offline Evaluation

• Future Work

Page 3: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Goal

• Incorporate some expressive signals (e.g. topic, domain) from in-session query history into the ranking, to re-rank the results

Page 4: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Example

Query: the dangerous days of daniel x Altrelda wiki

URL Original Rank ControlRankerScore

http://en.wikipedia.org/wiki/The_Dangerous_Days_of_Daniel_X 1 10.922

http://fanon.wikia.com/wiki/The_Dangerous_Days_of_Daniel_X 2 9.2745

http://www.amazon.com/The-Dangerous-Days-Daniel-X/dp/0316119709 3 7.444

http://danielx.wikia.com/wiki/The_Dangerous_Days_of_Daniel_X_(novel) 4 6.7845

http://danielx.wikia.com/wiki/Daniel_X 5 6.4136

http://www.goodreads.com/series/49946-daniel-x 6 5.5571

http://en.wikipedia.org/wiki/Daniel_X:_Watch_the_Skies 7 5.2691

http://www.jamespatterson.com/books_danielX.php 8 4.6821

Example 1. Current Ranking

But this Wikipedia url is an UnSat page clicked by the user in the same session?

Query 2: the dangerous days of daniel x

This is the 6th query in the same session.

Page 5: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Approach

• Correlate users clicks with their satisfaction• Propose a set of fine-grained semantic features to explore the

relations between in-session history queries, clicked URLs and current query, URLs, e.g. o The similarity between previous queries and current URLo The similarity of previous sat/unsat clicked URLs of last query to current URL

• Employ the semantic models (e.g. XCode, DSSM, CDSSM) to measure the similarity• Incorporate these features into the ranking, to re-rank the results

Page 6: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Problem1. ,…, are the history queries in one session.2. ,…, are the URLs in one impression page for query , it contains Sat and unSat clicks.3. is the current query.4. ,…, are the URLs in the impression page for query based on the current ranking algorithm.

Problem: we will re-rank , …, based on some signals (i.e. topic, domain) derived from the history in-session clickthrough data.

Page 7: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Outline

• Project Goal and Motivation Background & Goal Problem Definition

• Methodology Feature Generation Semantic Features Click-Based Features

• Experimental Results Offline Evaluation

• Future Work

Page 8: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Features Engineering – fine-grained featuresFeatures Level Description Category Feature Name

Semantic Features

URL-URLThe similarity between the urls of previous queries and urls of current query

Sat 9 raw featuresUnSat 9 raw features

Query-URL The similarity between the previous queries and urls of current query

Sat 5 raw featuresUnSat 5 raw features

Weighted URL-URLThe weighted similarity between the urls of previous queries and urls of current query

Sat 9 raw featuresUnSat 9 raw features

Click-Based Features

URL Level The domain and history click statistics on the current url of current query

History Click1.clickcount_url_total2.clickcount_url_sat3.clickcount_url_unsat

Domain Click1.domain_click_total2.domain_click_sat3.domain_click_unsat

Impression Level The Sat/UnSat statistics on the in-session history queries

Clicked url 1.sat_click_urls2.unsat_click_urls

Clicked count 1.sat_click_count2.unsat_click_count

Last click 1.last_click_dwelltime

Page 9: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Semantic FeaturesThree Distance to leverage the similarity for semantic features1. dis( , ): The distance between query and , say .2. dis( , ): The distance between i-th url of query and j-th url of

query .3. dis( , ): The distance between query and j-th url of query .

Page 10: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Semantic FeaturesSemantic distance calculation

Model Distance Range Semantics

XCode Hamming distance [0, 512]0 is the most similar,512 is the least similar

DSSM Cosine distance [-1, 1]-1 is the least similar,1 is the most similar

CDSSM Cosine distance [-1, 1]-1 is the least similar,1 is the most similar

Page 11: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

XCode• Map every word, query, url into an intent space.• Measure semantic similarities of words, phrases and entities in an

intent space.

Maximize Mutual Information between Queries and URLsTrained on all of Bing search logs with 100b samples

512 bits X-CodeWord: (0 1 1 1 0 1…)URL: (0 1 1 0 0 0…)

Query: (0 1 1 0 0 1…) Clicked URLs:www.teslamotors.com www.wikipedia.org …

Queries:Tesla carTesla bio … Maximize Mutual

Information between Queries and URLsTrained on all of Bing search logs with 100b samples

Maximize Mutual Information of queries and URLs on Bing logs

* Borrowed from AIM Project

Page 12: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Deep Structured Semantic Model (DSSM)

• Representation: use deep neural networks to extract abstract semantic representations

• Learning: maximize the similarity between the query and the doc

• Scalability: use sub-word unit (e.g., letter-ngram) as raw input so that to handle an unlimited vocabulary (word hashing)

Page 13: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

DSSM Double Model Training

DSSM• DSSM provides a way to encode how well contextual information is

matched in the semantic level. • Map queries/titles/docs … to a common low dimensional

semantic space using DNN• Leverage contextual information, not depend only on context

terms

Training Data Source Model Target Model

<Query, Title> Query Model Title Model

<Query, BrokenURL> Query Model BrokenURL Model

Page 14: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

DSSM Distance Three Distance Calculation

Calculate DSSM Distance between queries Calculate DSSM Distance between URLs Calculate DSSM Distance between query and URL

Query Model

Query Model Title

ModelTitle

ModelTitle

Model

Query Model

Cosine Distance Cosine Distance Cosine Distance

<URL, Title> <URL, Title>

Page 15: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Convolutional DSSM (Preliminary Trial)

• Difference between CDSSM and DSSM• Sliding window: capture the contextual feature

for a word.• Convolutional layer: project each word within a

context window to a local contextual feature vector.

• Max pooling layer: extract the most salient local features to form a fixed-length global contextual feature vector.

• CDSSM Training• Pre-Training on Cosmos• GPU Training.

Page 16: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Offline Experiment

Self-Join

Reducer

Log Data

<PreQuery, CurQuery> Pair

Distance Calculation{XCode, DSSM, CDSSM}

Feature Data Generation{curQuery, Features}

Training Data(1 week)

Validation Data(1 day)

Test Data(1 week)

Input Features

FastRank Feature Selection

Tree Ensemble

ReRanker Evaluator

Baseline Ranker

Indicator Feature

Diagram 1. Feature Data Generation Diagram 2. ReRanking Evaluation

Page 17: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Outline

• Project Goal and Motivation Background & Goal Problem Definition

• Methodology Feature Generation Semantic Features Click-Based Features

• Experimental Results Offline Evaluation

• Future Work

Page 18: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Offline Performance• Individual Features: XCode, Click-Based, DSSM

• For Semantic Features, DSSM (Query-Title) is the best.

• For Click-Based Features, it is also better, and easy to be flighted.

• More info in the table 1 of the Appendix.

• TimeToSuccess: The dwelling time to meet the first SatClick in the session.

• Session Success Position: defined by the position of the first SatClick in the session.

-0.03

-0.025

-0.02

-0.015

-0.01

-0.005

0

0.005

XCode Click-Based DSSM (Query-Title) DSSM (Query-BrokenURL) CDSSM (Query-Title)

Session Success Position

SatClick Rate UnSatClick Rate

TimeToSuccess (sec)TimeToSuccess

From Quickback

Page 19: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Offline Performance• Features Combination: Semantic Features + Click-Based Features

• Semantic features and Click-Based features can provide different information to boost the performance.

• More info in the table 2 in the Appendix.

-0.06

-0.05

-0.04

-0.03

-0.02

-0.01

0

0.01

DSSM (Query-Title) XCode+Click DSSM (Query-Title)+ClickDSSM (Query-BrokenURL)+Click CDSSM (Query-Title)+Click XCode+Click+DSSM (Query-Title)

TimeToSuccess (sec) Session Success PositionTimeToSuccess

From QuickbackSatClick Rate UnSatClick Rate

Page 20: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Click-Based Features Recap.Level Category Features Semantics Function

URL Level

History Click1.clickcount_url_total2.clickcount_url_sat3.clickcount_url_unsat

how many sat/unsat clicks for the current url in the previous queries in this session.

Improve TimeToSuccess, SessionSuccessPosition

Domain Click1.domain_click_total2.domain_click_sat3.domain_click_unsat

how many sat/unsat clicks for the domain of current url in the previous queries in this session

Increase the SatClick Rate and decrease UnSatClick Rate

Impression LevelClicked url 1.sat_click_urls

2.unsat_click_urlsSome statistics from in-session history queries

Clicked count 1.sat_click_count2.unsat_click_count

Page 21: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Offline Performance• Domain Click v.s. History Click

-0.45

-0.4

-0.35

-0.3

-0.25

-0.2

-0.15

-0.1

-0.05

0

0.05

Domain+History Clicks Domain Clicks History Clicks

1 2 30

0.0005

0.001

0.0015

0.002

0.0025

0.003

0.0035

UnSatClick Rate

• Domain Click: Decrease the UnSat click

• History Click: Improve TimeToSuccess, SessionSuccessPosition

• More info in the table 4 in the Appendix.

TimeToSuccess (sec) Session Success PositionTimeToSuccess

From QuickbackUnSatClick RateSatClick Rate

Page 22: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Example

Query: the dangerous days of daniel x Altrelda wiki

URL Original Rank Cranker Score Rank Delta ControlRankerScore ExperimentalRankerPosition domain_click_total domain_click_sat domain_click_unsat

http://fanon.wikia.com/wiki/The_Dangerous_Days_of_Daniel_X 2 9.0423 +1 9.2745 1 0 0 0

http://en.wikipedia.org/wiki/The_Dangerous_Days_of_Daniel_X 1 7.8237 -1 10.922 2 5 0 5

http://danielx.wikia.com/wiki/The_Dangerous_Days_of_Daniel_X_(novel) 4 7.0838 +1 6.7845 3 0 0 0

http://www.goodreads.com/series/49946-daniel-x 6 6.1145 +2 5.5571 4 0 0 0

http://danielx.wikia.com/wiki/Daniel_X 5 5.1809 0 6.4136 6 5 5 0

http://www.amazon.com/The-Dangerous-Days-Daniel-X/dp/0316119709 3 5.0326 -3 7.444 7 5 0 5

http://www.jamespatterson.com/books_danielX.php 8 2.9647 +1 4.6821 8 5 0 5

http://en.wikipedia.org/wiki/Daniel_X:_Watch_the_Skies 7 6.0986 -1 5.2691 5 0 0 0

Example 1. Five unSat clicks on Wikipedia in session log

This is the 6th query in the same session.

In the same session, 2nd query (“the dangerous days of daniel x”), there are 5 unsat clicks for

Wikipedia page.

Page 23: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Appendix

• Evaluation: Table 1. Individual Features Table 2. Features Combination Table 3. Domain Click v.s. History Click Features

Page 24: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Offline PerformanceFeatures TimeToSuccess

(sec)Session Success

PositionTimeToSuccess From

Quickback (sec)SatClick

RateUnSatClick

RateCoverage

All XCode -0.0159 [-0.0267%]

-0.0011 [-0.0483%]

-0.0053 [-0.0194%]

0.0014 [0.1626%]

0.0004 [0.2923%]

0.3288

All Click-Based

-0.0203 [-0.0343%]

-0.0018 [-0.0758%]

-0.0038 [-0.0141%]

0.0019 [0.2178%]

0.0005 [0.3932%]

0.3288

DSSM (Query-Title Model)

-0.0282 [-0.0476%]

-0.0026 [-0.1079%]

-0.0067 [-0.0251%]

0.0016 [0.1897%]

0.0004 [0.3458%]

0.3193

DSSM (Query-BrokenURL

Model)

-0.0194 [-0.0328%]

-0.0018 [-0.0751%]

-0.0047 [-0.0176%]

0.0014 [0.1734%]

0.0004 [0.3223%]

0.3202

CDSSM (Query-Title)

-0.0255 [-0.0430%]

-0.0024 [-0.1031%]

-0.0051 [-0.0191%]

0.0016 [0.1909%]

0.0004 [0.3690%]

0.3193

Table 1. Individual Features

Page 25: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Offline PerformanceFeatures TimeToSuccess

(sec)Session Success

PositionTimeToSuccess From

Quickback (sec)SatClick

RateUnSatClick

RateCoverage

XCode + Click -0.0299 [-0.0504%]

-0.0025 [-0.1049%]

-0.0070 [-0.0259%]

0.0022 [0.2420%]

0.0005 [0.3825%]

0.3288

DSSM (Query-Title) + Click

-0.0366 [-0.0617%]

-0.0035 [-0.1492%]

-0.0059 [-0.0220%]

0.0022 [0.2625%]

0.0005 [0.4541%]

0.3193

DSSM (Query-BrokenURL) +

Click

-0.0286 [-0.0483%]

-0.0029 [-0.1209%]

-0.0044 [-0.0163%]

0.0021 [0.2478%]

0.0005 [0.4369%]

0.3202

XCode + Click + DSSM

(Query-Title)

-0.0499 [-0.0841%]

-0.0045 [-0.1911%]

-0.0109 [-0.0403%]

0.0025 [0.2821%]

0.0005 [0.4133%]

CDSSM(Query-Title) + Click

-0.0379 [-0.0639%]

-0.0037 [-0.1548%]

-0.0060 [-0.0224%]

0.0022 [0.2656%]

0.0005 [0.4567%]

0.3193

Table 2. The feature combination

Page 26: Deep Learning Powered In- Session Contextual Ranking using Clickthrough Data Xiujun Li 1, Chenlei Guo 2, Wei Chu 2, Ye-Yi Wang 2, Jude Shavlik 1 1 University

Offline Performance

Features TimeToSuccess (sec)

Session Success Position

TimeToSuccess From Quickback

(sec)

SatClick Rate

UnSatClick Rate

Coverage UnSatClickSwap

6 Click-Based

-0.3933 [-0.5938%]

-0.0357 [-1.2221%]

-0.0806 [-0.2815%]

0.0118 [1.3100%]

0.0025 [1.0925%]

0.79% 286 -> 307 (+21)

3 domain Click

-0.1971 [-0.2976%]

-0.0147 [-0.5044%]

-0.0758 [-0.2648%]

0.0077 [0.8529%]

0.0014 [0.6157%]

0.79% 249 - > 225 (-24)

3 history Click

-0.3637 [-0.5490%]

-0.0357 [-1.2246%]

-0.0487 [-0.1701%]

0.0086 [0.9562%]

0.0030 [1.2915%]

0.79% 128 -> 213 (+85)

Table 3. Domain Clicks and History Clicks Features