Parallel Recurrent Neural Network Architectures for Feature-rich Session-based Recommendations
TRANSCRIPT
Parallel Recurrent Neural Network Architectures for Feature-rich Session-based Recommendations
Balázs Hidasi, Gravity R&D (@balazshidasi)
Massimo Quadrana, Politecnico di Milano (@mxqdr)
Alexandros Karatzoglou, Telefonica Research (@alexk_z)
Domonkos Tikk, Gravity R&D (@domonkostikk)
Session-based recommendations
• Permanent (user) cold-start
  o User identification
  o Changing goal/intent
  o Disjoint sessions
• Item-to-session recommendations: recommend based on the previous events of the session
• Next event prediction
GRU4Rec [Hidasi et al., ICLR 2016]
• Gated Recurrent Unit
• Adapted to the recommendation problem
  o Session-parallel mini-batches
  o Ranking loss: TOP1
  o Sampling the output
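The TOP1 loss from the GRU4Rec paper averages, over the sampled negative items, a pairwise ranking term plus a regularization term that pushes negative scores towards zero. A minimal NumPy sketch (the function name and array layout are illustrative, not from the talk):

```python
import numpy as np

def top1_loss(scores, target_idx):
    """TOP1 ranking loss of GRU4Rec for one session step.

    scores: 1-D array with the score of the desired next item and of the
            sampled negative items; target_idx points at the desired item.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = scores[target_idx]
    neg = np.delete(scores, target_idx)   # scores of the negative samples
    # Rank the positive item above the negatives, and regularize the
    # negative scores towards zero.
    return np.mean(sigmoid(neg - pos) + sigmoid(neg ** 2))
```

The loss shrinks as the positive item's score rises above the sampled negatives.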
[Figure: left, a GRU unit with input x_t, hidden state h_t, reset gate r and update gate z. Right, session-parallel mini-batches: Sessions 1-5 are laid side by side so that each mini-batch contains the current events of several sessions at once; when a session ends (e.g. i_{1,1}...i_{1,4}), the next unused session takes over its slot. The input is the one-hot vector of the current ItemID; through the GRU layer, the weighted output, and f(), the network produces scores on items; the desired output is the one-hot vector of the next ItemID.]
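The session-parallel mini-batch scheme above can be sketched in plain Python (the function name and list-of-lists session format are illustrative, not from the talk):

```python
def session_parallel_batches(sessions, batch_size):
    """Yield (input_items, target_items) pairs for session-parallel training.

    Sessions (lists of at least two item IDs) are laid side by side: slot k
    of the mini-batch walks through one session event by event, and when
    that session runs out, the next unused session takes over the slot.
    Sketch only: a real implementation also resets the corresponding row
    of the GRU hidden state when a slot switches sessions.
    """
    active = list(range(min(batch_size, len(sessions))))  # session per slot
    pos = [0] * len(active)                               # position per slot
    next_session = len(active)
    while active:
        inputs, targets, finished = [], [], []
        for k, s in enumerate(active):
            inputs.append(sessions[s][pos[k]])        # current event
            targets.append(sessions[s][pos[k] + 1])   # desired next event
            pos[k] += 1
            if pos[k] + 1 >= len(sessions[s]):        # session exhausted
                finished.append(k)
        yield inputs, targets
        for k in reversed(finished):                  # refill or drop slots
            if next_session < len(sessions):
                active[k], pos[k] = next_session, 0
                next_session += 1
            else:
                del active[k], pos[k]
```

For sessions `[[1,2,3],[4,5],[6,7,8]]` and batch size 2, this yields `([1,4],[2,5])`, `([2,6],[3,7])`, `([7],[8])`.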
Item features & user decisions
• Clicking on recommendations
• Influencing factors: prior knowledge of the user; the stuff they see
  o Image (thumbnail, product image, etc.)
  o Text (title, short description, etc.)
  o Price
  o Item properties
  o ...
• Feature extraction from images and text
Naive solution 1: Train on the features
• Item features on the input
• Bad offline accuracy
• Recommendations make sense, but are somewhat general

[Figure: image feature vector → GRU layer → network output → f() → scores on items; the target is the one-hot vector of the next ItemID.]
Naive solution 2: Concatenate session & feature info
• Concatenate the one-hot vector of the ID and the feature vector of the item
• Recommendation accuracy similar to the ID-only network
• The ID input is initially the stronger signal on the next item
  o It dominates the image input
  o The network learns to focus on the weights of the ID

[Figure: one-hot ItemID vector and image feature vector concatenated (with weights w_ID, w_image) and fed into a single GRU layer; weighted output → f() → scores on items; the target is the one-hot vector of the next ItemID.]
Parallel RNN
• Separate RNNs for each input type
  o ID
  o Image feature vector
  o ...
• Subnets model the sessions separately
• Concatenate hidden layers
  o Optional scaling / weighting
• Output computed from the concatenated hidden state

[Figure: two parallel subnets (one-hot ItemID → GRU layer → subnet output; image feature vector → GRU layer → subnet output), concatenated with weights w_ID, w_image; weighted output → f() → scores on items; the target is the one-hot vector of the next ItemID.]
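The output step of the architecture can be sketched in a few lines of NumPy: scale each subnet's hidden state, concatenate, and score all items from the joint state. All sizes and names here are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 100-unit ID subnet, a 100-unit image subnet, 1000 items.
h_id = rng.standard_normal(100)           # hidden state of the ID subnet
h_img = rng.standard_normal(100)          # hidden state of the image subnet
W_out = rng.standard_normal((1000, 200))  # output weights on the joint state

w_id, w_img = 1.0, 1.0                    # optional per-subnet scaling weights
h = np.concatenate([w_id * h_id, w_img * h_img])  # concatenated hidden state
scores = W_out @ h                        # scores on all items, fed into f()
```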
Training
• Backpropagate through the whole network in one go
• Accuracy is slightly better than for the ID-only network
• Subnets learn similar aspects of the sessions
  o Some of the model capacity is wasted
Alternative training methods
• Force the subnets to learn different aspects, so they complement each other
• Inspired by ALS and ensemble learning
• Train one subnet at a time and fix the others; alternate between them
• Depending on the frequency of alternation:
  o Residual training (train each subnet fully)
  o Alternating training (switch per epoch)
  o Interleaving training (switch per mini-batch)
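The three schedules differ only in how often the trained subnet is switched. A small illustrative helper (the name and signature are assumptions, not from the talk):

```python
def trainable_subnet(schedule, epoch, minibatch, n_subnets=2):
    """Index of the subnet that receives gradient updates at this step.

    All other subnets are kept fixed, which forces each subnet to learn
    aspects of the sessions the others have not captured.
    'alternating'  switches the trained subnet every epoch;
    'interleaving' switches it every mini-batch.
    Residual training is the limiting case: one subnet is trained to
    convergence before the next one is trained at all.
    """
    if schedule == "alternating":
        return epoch % n_subnets
    if schedule == "interleaving":
        return minibatch % n_subnets
    raise ValueError(f"unknown schedule: {schedule}")
```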
Experiments on video thumbnails
• Online video site
  o 712,824 items
  o Training: 17,419,964 sessions; 69,312,698 events
  o Test: 216,725 sessions; 921,202 events (target score compared to only the 50K most popular items)
• Feature extraction
  o GoogLeNet (Caffe) pretrained on the ImageNet dataset
  o Features: values of the last average-pooling layer; dense vector of 1024 values, length-normalized to 1
Results on video thumbnails
• Low number of hidden units: MRR improvements
  o +18.72% (vs. item-kNN)
  o +15.41% (vs. ID only, 100)
  o +14.40% (vs. ID only, 200)
• High number of hidden units
  o 500+500 for P-RNN; 1000 for the others
  o Diminishing returns for increasing the number of epochs or hidden units
  o +5-7% in MRR

| Method        | #Hidden units | Recall@20 | MRR@20 |
|---------------|---------------|-----------|--------|
| Item-kNN      | -             | 0.6263    | 0.3740 |
| ID only       | 100           | 0.6831    | 0.3847 |
| ID only       | 200           | 0.6963    | 0.3881 |
| Feature only  | 100           | 0.5367    | 0.3065 |
| Concat        | 100           | 0.6766    | 0.3850 |
| P-RNN (naive) | 100+100       | 0.6765    | 0.4014 |
| P-RNN (res)   | 100+100       | 0.7028    | 0.4440 |
| P-RNN (alt)   | 100+100       | 0.6874    | 0.4331 |
| P-RNN (int)   | 100+100       | 0.7040    | 0.4361 |
Experiments on text
• Classified ad site
  o 339,055 items
  o Train: 1,173,094 sessions; 9,011,321 events
  o Test: 35,741 sessions; 254,857 events (target score compared with all items)
  o Multilanguage user-generated titles and item descriptions
• Feature extraction
  o Concatenated title and description
  o Bi-grams with TF-IDF weighting
  o Sparse vectors of length 1,099,425; 5.44 avg. non-zeros
• Experiments with a high number of hidden units
  o +3.4% in recall
  o +5.37% in MRR
Summary
• Incorporated image and text features into session-based recommendations using RNNs
• Naive inclusion is not effective
• P-RNN architectures
• Training methods for P-RNNs
Future work
• Revisit feature extraction
• Try representations of the whole video
• Use multiple aspects for items
Thanks! Q&A