A File-based Approach for Recommender Systems in High-Performance Computing Environments
Simon Dooms @sidooms

Post on 10-May-2015

DESCRIPTION

How can you create a recommender system that works without a database backend and therefore allows perfect scaling across an arbitrary number of computing nodes and cores?

TRANSCRIPT

Page 1: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

A File-based Approach for Recommender Systems in High-Performance Computing Environments

Simon Dooms

@sidooms

Page 2: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Introduction

Is a database always the best option?

09/02/2011 Simon Dooms - Ghent University - RSmeetDB '11 2

[Chart values: 0.5% and 99.5%]

Page 3: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Hardware

Shared storage (RAID5)

InfiniBand ConnectX DDR

194 computing nodes:
• 8 cores @ 2.5 GHz
• 16 GB RAM
• 146 GB local storage

Page 4: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Recommendation workflow

Phase 1: Item Similarity
Item Metadata → Item Similarity Calculation → Item Similarities

Phase 2: User Similarity
Consumptions + Item Similarities → User Similarity Calculation → User Similarities

Phase 3: Recommendation
Consumptions + Item Similarities + User Similarities → Recommendation Calculation

Page 5: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Item similarity

Item Metadata → Item Similarity Calculation → Item Similarities

Page 6: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Item similarity

[Figure: computing nodes (node), each with cores (C)]

Page 7: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

File buckets

MODULO

Example for 3 buckets
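
The modulo scheme named on this slide can be sketched as follows. This is a minimal illustration, assuming integer item IDs and one file per bucket; the function names `bucket_for` and `bucket_filename` and the `bucket_<n>.txt` naming pattern are hypothetical and do not come from the slides.

```python
def bucket_for(item_id: int, n_buckets: int = 3) -> int:
    """Assign an item to a bucket by taking its ID modulo the bucket count."""
    return item_id % n_buckets


def bucket_filename(item_id: int, n_buckets: int = 3) -> str:
    # Hypothetical naming scheme: one file per bucket.
    return f"bucket_{bucket_for(item_id, n_buckets)}.txt"


if __name__ == "__main__":
    # With 3 buckets, consecutive IDs cycle through buckets 0, 1, 2.
    for item_id in range(7):
        print(item_id, "->", bucket_filename(item_id))
```

Because the bucket depends only on the ID, any process can work out which file holds a given item without consulting a central index.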

Page 8: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Writing item similarities

[Figure: cores (C), Local Storage, Shared Storage]
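
The slide only names cores, local storage, and shared storage. One plausible reading, which is an assumption here and not confirmed by the transcript, is that each core first writes its results to node-local storage and the finished file is then copied to shared storage. The sketch below illustrates that reading; the function name `write_then_publish`, the directory layout, and the tab-separated line format are all hypothetical.

```python
import os
import shutil
import tempfile


def write_then_publish(lines, local_dir, shared_dir, filename):
    """Write result lines to a local file, then copy the finished file to shared storage."""
    os.makedirs(local_dir, exist_ok=True)
    os.makedirs(shared_dir, exist_ok=True)

    # Step 1 (assumed): write to fast node-local storage.
    local_path = os.path.join(local_dir, filename)
    with open(local_path, "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")

    # Step 2 (assumed): copy the complete file to shared storage in one go.
    shared_path = os.path.join(shared_dir, filename)
    shutil.copy(local_path, shared_path)
    return shared_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        out = write_then_publish(
            ["1\t2\t0.75", "1\t3\t0.10"],  # hypothetical "item, item, similarity" rows
            os.path.join(root, "local"),
            os.path.join(root, "shared"),
            "bucket_1.txt",
        )
        with open(out, encoding="utf-8") as fh:
            print(fh.read())
```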

Page 9: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

User Similarity

Consumptions + Item Similarities → User Similarity Calculation → User Similarities

Page 10: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

User Similarity

[Figure: computing nodes (node) and cores (C)]

Page 11: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Recommendation calculation

Consumptions + Item Similarities + User Similarities → Recommendation Calculation

Page 12: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Recommendation calculation

Item Similarities

User Similarities

Page 13: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Results

• Proof of concept implementation
• Cultural events dataset
– 5 months of data
– 53,000 items
– 1,700 users
– 14,000 => 6,800 consumptions

Number of nodes used: 10, 20, 40, 80, 160
Execution time scales inversely with the number of nodes
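
To make "scales inversely" concrete, the sketch below computes the execution time relative to the 10-node run under ideal inverse scaling. These are idealized ratios derived from the node counts on the slide, not measured results; the function name `relative_time` is hypothetical.

```python
NODE_COUNTS = [10, 20, 40, 80, 160]  # node counts listed on the slide


def relative_time(nodes: int, baseline_nodes: int = 10) -> float:
    """Execution time relative to the baseline run, assuming ideal inverse scaling."""
    return baseline_nodes / nodes


if __name__ == "__main__":
    for n in NODE_COUNTS:
        # Doubling the node count halves the ideal execution time.
        print(f"{n:>3} nodes: {relative_time(n):.4f} x baseline time")
```

Under this idealization, going from 10 to 160 nodes reduces the execution time to one sixteenth of the baseline.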

Page 14: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Conclusion

• A file-based approach for HPC
• Workflow as independent subjobs
• Workflow ≈ embarrassingly parallel
• Approach both scalable and memory efficient
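
The idea of a workflow made of independent subjobs can be sketched as below. This is an illustration under stated assumptions, not the implementation from the talk: Jaccard similarity over metadata tags is an assumed stand-in metric, a thread pool stands in for the cluster scheduler, and the names `subjob` and `run_all` as well as the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (an assumed stand-in metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def subjob(bucket: int, n_buckets: int, metadata: dict) -> list:
    """Compute similarities for the items whose ID falls in this bucket.

    The subjob only reads its inputs and returns its own results, so it
    shares no mutable state with the other subjobs.
    """
    results = []
    for i in sorted(metadata):
        if i % n_buckets != bucket:
            continue  # this item belongs to another subjob
        for j in sorted(metadata):
            if i != j:
                results.append((i, j, jaccard(metadata[i], metadata[j])))
    return results


def run_all(metadata: dict, n_buckets: int = 3) -> list:
    # A thread pool stands in for the cluster scheduler in this sketch.
    with ThreadPoolExecutor(max_workers=n_buckets) as pool:
        parts = list(pool.map(lambda b: subjob(b, n_buckets, metadata), range(n_buckets)))
    return sorted(row for part in parts for row in part)


if __name__ == "__main__":
    # Hypothetical item metadata: item ID -> set of tags.
    items = {0: {"music", "ghent"}, 1: {"music"}, 2: {"theatre"}, 3: {"music", "ghent"}}
    for row in run_all(items):
        print(row)
```

Since no subjob waits on another, the subjobs can run in any order and on any number of workers, which is what makes the workload close to embarrassingly parallel.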

Page 15: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Simon Dooms

@sidooms

A File-based Approach for Recommender Systems in High-Performance Computing Environments

With the support of IWT Vlaanderen, Stevin Supercomputer Infrastructure at Ghent University, the Hercules Foundation and EWI