TRANSCRIPT
Fajie Yuan (Tencent); Xiangnan He (University of Science and Technology of China); Alexandros Karatzoglou (Google Research); Liguang Zhang (Tencent)
Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation, SIGIR 2020
PeterRec Data & Code: https://github.com/fajieyuan/sigir2020_peterrec
Outline
➢ Motivation
➢ Related Work
➢ PeterRec
➢ Experiments
Users engage with recommender systems and provide or leave feedback.
Figure: https://www.researchgate.net/figure/The-sequential-recommendation-process-After-the-RS-recommends-an-item-the-user-gives_fig4_311513879
Figure: a single user ("Me") at the center of many services: Car Rec, News Rec, Music Rec, Video Rec, Social APP, Search Engine, Browser, Map APP.
A user has different roles to play in life!
Our Motivation
A user has different roles to play in life!
About myself:
• Male
• Age: 30
• PhD, University of Glasgow
• Researcher at Tencent
• Married
• Hot user on TikTok
• Cold user on Amazon
• New user on Netflix
→ My user model
Our Motivation (cont.)
Our PeterRec: one user model built from these behaviors, serving all of these roles.
Outline
➢ Motivation
➢ Related Work
➢ PeterRec
➢ Experiments
• Recommendation Background:
(1) Content & Context Recommendation
(2) Session-based (Sequential) Recommendation: recommending the next item based on previously recorded user interactions.
Figure: two model families: a DSSM-style (non-sequential) RS model trained with supervised learning, where two embedding-plus-DNN towers produce a user vector and an item vector; and the sequential NextItNet model trained with self-supervised learning, which predicts the next item $y_i$ from the sequence $(x_1, x_2, \dots, x_n)$.
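To make the self-supervised objective concrete, here is a minimal PyTorch-style sketch (illustrative only, not the authors' released code; the class names, layer sizes, and dilations are assumptions): a stack of dilated causal convolutions is trained to predict each next item from the items before it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    """One dilated causal convolution with a residual connection."""
    def __init__(self, dim, dilation, kernel_size=3):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # left-pad only => causal
        self.conv = nn.Conv1d(dim, dim, kernel_size, dilation=dilation)

    def forward(self, h):                         # h: (batch, dim, seq_len)
        return h + torch.relu(self.conv(F.pad(h, (self.pad, 0))))

class NextItNetSketch(nn.Module):
    def __init__(self, num_items, dim=64, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.emb = nn.Embedding(num_items, dim)
        self.blocks = nn.Sequential(*[CausalConvBlock(dim, d) for d in dilations])
        self.out = nn.Linear(dim, num_items)      # softmax layer over the catalog

    def forward(self, x):                         # x: (batch, seq_len) item ids
        h = self.emb(x).transpose(1, 2)           # -> (batch, dim, seq_len)
        h = self.blocks(h).transpose(1, 2)        # -> (batch, seq_len, dim)
        return self.out(h)                        # next-item logits per position

# Self-supervised training pair: input (x_1..x_{n-1}), target (x_2..x_n).
model = NextItNetSketch(num_items=10_000)
x = torch.randint(0, 10_000, (8, 20))             # toy batch of behavior sequences
logits = model(x[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 10_000), x[:, 1:].reshape(-1))
```

No labels are needed: the sequence itself supplies the supervision, which is why this is called self-supervised learning.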
Why sequential recommendation?
• Short videos (TikTok, Weishi, Kuaishou)
• Music (Tencent Music, Yahoo! Music) & News
• Movie clips (YouTube, Netflix)
Non-sequential Rec vs. Sequential Rec:
• Static-only vs. dynamic preference
• Manual feature engineering vs. feature-engineering-free
• Supervised learning vs. unsupervised (self-supervised) learning
Transfer Learning Background
TL aims to extract knowledge from one or more source tasks and apply that knowledge to a target task.
Figure: A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning, online
Transfer Learning (TL) vs. Multi-task Learning (MTL)
[1] ICML 2018: Advances in transfer, multitask, and semi-supervised learning, online
TL vs. MTL:
• Two-stage training vs. joint training
• One objective vs. multiple objectives
• Cares only about the target task vs. cares about all objectives
Transfer Learning (TL) for Recommender Systems (RS)
Motivation:
• User representations may be generic, since a user's preferences tend to be similar across different recommendation tasks. That is, a user's engagement on previous platforms may provide important training signals for other systems.
• Traditional ML models usually fail when modeling new or cold users due to the lack of interaction data.
Outline
➢ Motivation
➢ Related Work
➢ PeterRec
➢ Experiments
Transfer Learning (TL) for Recommender Systems (RS)
Task description:
Source data: $(u, x^u)$, where $x^u = \{x^u_1, x^u_2, \dots, x^u_n\}$ and $x^u_t$ denotes the $t$-th item user $u$ interacted with.
Target data: $(u, y)$, where $y$ is the supervised label in the target dataset.
Example. Source data: users' watching activities in Tencent QQ Browser. Target data: users' watching activities in Kandian (where users are cold or new), or users' profile labels, e.g., age, gender, life status, etc.
PeterRec Architecture
Figure: (a) pre-training on QQ Browser data: a NextItNet-style autoregressive network with parameters $\Theta$ and softmax layer $\mathcal{H}$ reads user clicking behaviors $(x^u_1, x^u_2, x^u_3, x^u_4)$ and predicts the shifted sequence $(x^u_2, x^u_3, x^u_4, x^u_5)$. (b) fine-tuning on the user profile dataset: the pretrained parameters $\tilde{\Theta}$ are reused, a [TCL] token is appended to the input $(x^u_1, \dots, x^u_4)$, and a new classification layer $\tilde{\mathcal{H}}$ with parameters $w$, $\pi$, $\nu$ predicts the label (e.g., gender).
NextItNet: A Simple Convolutional Generative Network for Next Item Recommendation, Yuan et al., WSDM 2019.
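A hedged sketch of the fine-tuning stage, reusing `NextItNetSketch` from the earlier sketch (the [TCL] handling, `ProfileClassifier` name, and the reserved-id convention are illustrative assumptions, not the released implementation): the pretrained weights are reused and frozen, a [TCL] token is appended, and a new head classifies from its position.

```python
import torch
import torch.nn as nn

class ProfileClassifier(nn.Module):
    """Fine-tuning head: classify a user label from the [TCL] position."""
    def __init__(self, backbone, num_classes, tcl_id):
        super().__init__()
        self.backbone = backbone                  # pretrained NextItNetSketch
        self.tcl_id = tcl_id                      # assumed reserved item id for [TCL]
        self.head = nn.Linear(backbone.emb.embedding_dim, num_classes)
        for p in self.backbone.parameters():      # keep the pretrained weights
            p.requires_grad = False               # frozen; only new parts train

    def forward(self, x):                         # x: (batch, seq_len) item ids
        tcl = torch.full((x.size(0), 1), self.tcl_id, dtype=torch.long)
        h = self.backbone.emb(torch.cat([x, tcl], dim=1)).transpose(1, 2)
        h = self.backbone.blocks(h).transpose(1, 2)
        return self.head(h[:, -1])                # label logits, e.g., gender
```

With the backbone frozen, only the new head (and, in PeterRec, the inserted patches described below) receives gradients.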
What can be done by PeterRec?
Figure: the same pre-training (a) and fine-tuning (b) pipeline as above.
• Cold-start problem, e.g., ads recommendation
• User profile prediction, e.g., gender prediction
Problems we meet when a number of tasks are required
Figure: one pretraining model and six separately fine-tuned models, one per task (age, life status, education, gender, profession, ads), each with its own classification parameters $\pi_1, \dots, \pi_6$, $\nu$ and its own full set of fine-tuned parameters $\tilde{\Theta}_1, \dots, \tilde{\Theta}_6$.
Training a separate model for each downstream task is parameter-inefficient, since both the pretraining and fine-tuning models are very large. The number of fine-tuned models grows with the number of downstream tasks: 100 tasks = 100 fine-tuned models.
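A back-of-envelope calculation of this point (all sizes below are toy numbers, not figures from the paper):

```python
backbone = 10_000_000                      # parameters in one pretrained model (toy)
tasks = 100
full_finetune = tasks * backbone           # a separate full model per task
patch_per_task = backbone // 1000          # assume a patch ~0.1% of the backbone
peterrec_style = backbone + tasks * patch_per_task
print(f"{full_finetune:,} vs {peterrec_style:,}")   # 1,000,000,000 vs 11,000,000
```

Sharing one frozen backbone and storing only small per-task patches changes the cost from "100 full models" to "one model plus 100 small patch sets".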
Taking inspiration from grafting
Figure: grafting for plants: (A) a branch of plum, (B) a peach tree, (C) insertion, (D) growing together.
The pretrained model is treated as the peach tree; a model patch (MP) is inserted into the original ResNet block.
The analogy: branch of plum ↔ model patch (MP); peach tree ↔ pretrained model; insertion ↔ insertion; growing together ↔ fine-tuning.
Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation, Yuan et al., SIGIR 2020.
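The grafting analogy maps directly onto code. A minimal sketch under the same toy setup as before (`ModelPatch`, `PatchedBlock`, and the bottleneck size are illustrative assumptions, not the paper's exact patch design): a small bottleneck patch is inserted after each frozen pretrained block, and only the patches "grow together" with the tree during fine-tuning.

```python
import torch
import torch.nn as nn

class ModelPatch(nn.Module):
    """The grafted 'plum branch': a tiny bottleneck module."""
    def __init__(self, dim, bottleneck=8):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)    # project down ...
        self.up = nn.Linear(bottleneck, dim)      # ... and back up

    def forward(self, h):                         # h: (..., dim)
        return self.up(torch.relu(self.down(h)))

class PatchedBlock(nn.Module):
    """The 'peach tree' block stays frozen; only the patch is trainable."""
    def __init__(self, block, dim, bottleneck=8):
        super().__init__()
        self.block = block                        # pretrained residual block
        for p in self.block.parameters():
            p.requires_grad = False
        self.patch = ModelPatch(dim, bottleneck)

    def forward(self, h):                         # h: (batch, dim, seq_len)
        h = self.block(h)
        out = self.patch(h.transpose(1, 2))       # patch operates on (.., dim)
        return h + out.transpose(1, 2)            # graft: add patch output back
```

With dim=64 and bottleneck=8, one patch adds about 64*8 + 8 + 8*64 + 64 ≈ 1.1K parameters per block, a small fraction of the block it is grafted onto.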
Outline
➢ Motivation
➢ Related Work
➢ PeterRec
➢ Experiments
Results: Is pretraining necessary?
Figure: comparison of PeterZero (no pretraining) vs. PeterRec (with pretraining).
Results: What more can be done by PeterRec?
• Adolescent mental health (for parents)
• Payment capacity (for banks)
• Advertising (for companies)
Example: if we have the video-watching behaviors of a teenager, we may learn whether he has depression or a propensity for violence via PeterRec, without resorting to much feature engineering or human-labeled data.