my phd trajectory
Factorization Machines for Hybrid Recommendation Systems Based
on Behavioral, Product, and Customer Data
Stijn Geuens
Agenda
• PhD Trajectory
• Goals
• Research Questions
• Progress
• Future Work
RecSys 2015 [email protected]
PhD Trajectory
[Figure: diagram with the labels Computer Science, Math & Statistics, Business Expertise, Machine Learning, Data Engineering, Business Analytics, and Data Science]
Research Questions (Machine Learning)
What is the added value of combining different data sources?
• More data beats better models (Halevy, Norvig, Pereira, 2009)
• Rich database
– Explicit ratings
– Implicit ratings
– Customer data
– Product data
– Context data
• Different combination methods
Research Questions (Business Analytics)
How can we evaluate recommender systems in online settings using business metrics?
• Collaboration with a company
• Which metric to optimize?
– Click rates
– Conversion
– Turnover
– Loyalty
– Etc.
• Does a RecSys affect these business performance metrics?
Current Study
Factorization Machines for Hybrid Recommendation Systems Based
on Behavioral, Product, and Customer Data
Motivation
• Typologies of systems using different input data:
– Collaborative filtering, content-based, and hybrid (Adomavicius, Tuzhilin, 2005)
– Collaborative filtering, content-based, demographic, knowledge-based, and hybrid (Burke, 2000; Bobadilla et al., 2013)
• Each system has its advantages and disadvantages
• Hybridization addresses these disadvantages and leads to better performance
• More data trumps better models (Halevy, Norvig, Pereira, 2009)
• This study: hybridization by feature combination, merging different data sources (customer, product, and behavioral data) in a single state-of-the-art algorithm, factorization machines (FM)
• Combining all of these data sources in one algorithm has not been done before, especially not in factorization machines research
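A minimal sketch of what feature combination could look like, assuming a one-hot encoding for the user and the item followed by customer and product attributes. The function names, the encoding, and the example attribute values are illustrative assumptions, not the feature design used in the study.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_feature_vector(user_idx, n_users, item_idx, n_items,
                         customer_features, product_features):
    """Concatenate behavioral, customer, and product data into one vector.

    Behavioral data is represented here only by the (user, item) pair;
    customer_features and product_features are lists of numeric attributes
    (hypothetical examples: age bucket, price band).
    """
    return (one_hot(user_idx, n_users)
            + one_hot(item_idx, n_items)
            + list(customer_features)
            + list(product_features))


# Hypothetical example: user 1 of 3, item 0 of 2,
# one customer attribute and two product attributes.
x = build_feature_vector(1, 3, 0, 2, [0.5], [1.0, 0.0])
```

Because every data source ends up in the same real-valued vector, a single model can consume all of them at once, which is the idea behind hybridization by feature combination.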
Factorization Machines (FM)
• Introduced by Rendle (2010)
• Based on Support Vector Machines (SVM) and factorization models, combining the advantages of both
• SVM: works with any real-valued feature vector, which allows different data sources to be integrated
• Factorization models: variable interactions are calculated from factorized parameters, which allows interactions to be estimated under huge sparsity, where SVMs fail
• General FM model equation of degree 2 (Rendle, 2010):
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j
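As a rough illustration of the degree-2 model, the sketch below computes a prediction from a global bias, one weight per feature, and one k-dimensional factor vector per feature, where the weight of each pairwise interaction is the dot product of the two factor vectors. The parameter names (w0, w, V) and the toy values are assumptions chosen for illustration; the pairwise term uses the linear-time reformulation described in Rendle (2010) rather than an explicit double loop.

```python
def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine prediction for one feature vector.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    V  : n lists of k factors (one factor vector per feature)
    """
    n = len(x)
    k = len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))

    # Pairwise term: 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)

    return w0 + linear + pairwise


# Toy parameters (illustrative only): 3 features, 2 factors each.
prediction = fm_predict(
    x=[1.0, 0.0, 1.0],
    w0=0.1,
    w=[0.2, 0.3, 0.4],
    V=[[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]],
)
```

Since interaction weights come from shared factor vectors instead of one free parameter per feature pair, an interaction can still be estimated for pairs that never co-occur in the training data.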
Algorithms
• 4 factorization machines
– 3 single data source FMs: behavioral data (FMBD), customer data (FMCD), and product data (FMPD)
– 1 hybrid FM based on the 3 distinct data sources (FMPD/CD/BD)
• 1 hybrid CF benchmark model used by the company
– Input: user-item matrix (M), where each element is defined as follows:
Data
• 2 distinct data sets:
– Furniture: 5,368 users and 2,601 items
– Children’s clothing: 5,999 users and 4,372 items
Results
• Evaluation: Recall@5 to Recall@100
• Friedman test with Holm’s procedure (Demšar, 2006):
– Dependent variable = recall
– Independent variable = algorithm
– Cases = combinations of selection size and product category
Algorithm    FMPD/CD/BD    FMBD    CF      FMCD    FMPD
Ranking      1             2.38    2.77    3.95    4.90
NS
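The sketch below shows, under simplifying assumptions, how recall at a given selection size and the average rank of each algorithm across cases could be computed. The average ranks are the quantity the Friedman test compares. Function names and the input layout (one dictionary of recall values per case) are illustrative assumptions; the significance test itself and Holm's correction are not implemented here.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Share of a user's relevant items that appear in the top-k list."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = len(set(ranked_items[:k]) & relevant)
    return hits / len(relevant)


def average_ranks(scores_by_case):
    """Average rank per algorithm over all cases (rank 1 = highest recall).

    scores_by_case: list of dicts mapping algorithm name -> recall.
    Tied algorithms share the mean of the ranks they occupy.
    """
    totals = {name: 0.0 for name in scores_by_case[0]}
    for case in scores_by_case:
        ordered = sorted(case, key=case.get, reverse=True)
        pos = 0
        while pos < len(ordered):
            end = pos
            while (end + 1 < len(ordered)
                   and case[ordered[end + 1]] == case[ordered[pos]]):
                end += 1
            mean_rank = ((pos + 1) + (end + 1)) / 2
            for name in ordered[pos:end + 1]:
                totals[name] += mean_rank
            pos = end + 1
    n_cases = len(scores_by_case)
    return {name: total / n_cases for name, total in totals.items()}
```

An algorithm that has the highest recall in every case gets an average rank of exactly 1.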
Results
• Furniture category
[Figure: recall (0% to 100%) versus selection size (5 to 95) for FMPD, FMCD, FMBD, FMPD/CD/BD, and CF]
• Children’s clothing category
[Figure: recall (0% to 100%) versus selection size (5 to 95) for FMPD, FMCD, FMBD, FMPD/CD/BD, and CF]
Future Work: This study
• Perform a grid search to identify which data sources are the most important (at the data type level and at the individual variable level)
• Creating a benchmark hybrid algorithm combining results of different systems created based on each of the data sources
• Evaluation based on other theoretical metrics (precision, F1, AUC, diversity, novelty, etc.)
Future Work: PhD
• Implement the model at the company and perform real-life A/B tests
– Email system
– Webshop
• Evaluation of the implemented algorithm in terms of business metrics (click rates, conversion rates, turnover, loyalty, etc.)
• Investigate which (combination of) business metrics optimize(s) economic value of the RecSys in both short and long term
• Investigate the impact of a RecSys on economic performance of a company
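One possible way to compare a rate-based business metric (for example, conversion rate) between the two arms of an A/B test is a two-proportion z-test. The sketch below is only an illustration under that assumption; the function name and its inputs are hypothetical, and the planned study may use a different evaluation design.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between arms.

    conv_a, conv_b : number of conversions in arm A and arm B
    n_a, n_b       : number of customers exposed in arm A and arm B
    Returns (z, p_value); positive z means arm B converts better.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Metrics that are not simple rates, such as turnover or loyalty, would need a different test.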
Thank you for your attention
Contact:
Stijn Geuens
IESEG School of Management
Rue de la Digue
F-59000 Lille
(0)3.20.545.892
[email protected]
fr.linkedin.com/pub/stijn-geuens/
stijn.geuens
Advantages and disadvantages of different systems
Collaborative filtering
• Pros: no metadata engineering needed; serendipity in results; adaptive
• Cons: scalability; cold start for new users and items; long tail problem; stability
Content-based
• Pros: comparison between items possible; no metadata engineering needed; adaptive
• Cons: overspecialization; cold start for new users; collection of product information
Advantages and disadvantages of different systems (continued)
Knowledge-based
• Pros: deterministic; no cold start
• Cons: knowledge engineering required; subjective; static
Demographic
• Pros: no metadata engineering needed; serendipity in results
• Cons: long tail; cold start for new users; static