Matrix Factorization
Reporter: Sun Yuanshuai
E-mail: [email protected]
Content
1. MF Introduction
2. Application Area
3. My Work
4. Difficulty in my work
MF Introduction
Matrix factorization (abbr. MF), just as the name suggests, decomposes a big matrix into a product of several smaller matrices. It is defined mathematically as follows.
We assume here the target matrix R \in \mathbb{R}^{m \times n} and the factor matrices U \in \mathbb{R}^{m \times K} and V \in \mathbb{R}^{n \times K}, where K << \min(m, n), so that R \approx U V^T. The factors are obtained by

\arg\min_{U,V} f = \sum_{(i,j) \in R} (r_{i,j} - u_i^T v_j)^2

where u_i and v_j denote the i-th row of U and the j-th row of V.
MF Introduction
We quantify the quality of the approximation with the Euclidean distance, so we can get the objective function as follows,
\arg\min_{U,V} f(r, \tilde r) = \sum_{(i,j) \in R} (r_{i,j} - \tilde r_{i,j})^2

where

\tilde r_{i,j} = \sum_{k=1}^{K} u_{i,k} v_{j,k} = u_i^T v_j

i.e. \tilde r_{i,j} is the predicted value.
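To make the objective concrete, here is a minimal Python/NumPy sketch (the names U, V, ratings and the toy data are illustrative, not from the slides) that evaluates f over the observed entries:

    import numpy as np

    def squared_loss(U, V, ratings):
        # f = sum over observed (i, j) of (r_ij - u_i . v_j)^2
        return sum((r - U[i] @ V[j]) ** 2 for i, j, r in ratings)

    # Toy example: m = 4 users, n = 3 items, rank K = 2.
    rng = np.random.default_rng(0)
    U = rng.normal(size=(4, 2))
    V = rng.normal(size=(3, 2))
    ratings = [(0, 1, 4.0), (2, 0, 3.5), (3, 2, 1.0)]   # observed (i, j, r_ij)
    print(squared_loss(U, V, ratings))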
The regularized form of the objective is

\arg\min_{U,V} f(r, \tilde r) = \sum_{(i,j) \in R} (r_{i,j} - \tilde r_{i,j})^2 + \lambda_u \|u_i\|^2 + \lambda_v \|v_j\|^2
MF Introduction: 1. Alternating Descent Method
This method only works when the loss function is based on the Euclidean distance.
Setting the partial derivative with respect to u_i to zero,

\frac{\partial f}{\partial u_i} = -2 \sum_{j} (r_{i,j} - u_i^T v_j) v_j + 2 \lambda_u u_i = 0

where the sum runs over the observed entries (i, j) \in R, so we can get

u_i = \Big( \sum_{j} v_j v_j^T + \lambda_u I \Big)^{-1} \sum_{j} r_{i,j} v_j

The same applies to v_j.
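A minimal Python/NumPy sketch of one alternating-descent sweep over U under the regularized Euclidean loss above; the function name, data layout, and hyper-parameter lam_u are illustrative assumptions:

    import numpy as np

    def als_update_U(U, V, ratings, lam_u):
        # For each row i: u_i = (sum_j v_j v_j^T + lam_u I)^{-1} sum_j r_ij v_j,
        # with the sum running over the items j observed for row i.
        K = U.shape[1]
        by_row = {}
        for i, j, r in ratings:
            by_row.setdefault(i, []).append((j, r))
        for i, obs in by_row.items():
            A = lam_u * np.eye(K)
            b = np.zeros(K)
            for j, r in obs:
                A += np.outer(V[j], V[j])
                b += r * V[j]
            U[i] = np.linalg.solve(A, b)
        return U

    # One full iteration alternates: update U with V fixed, then V with U fixed.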
MF Introduction: 2. Gradient Descent Method
The update rule for U is defined as follows:

\frac{\partial f}{\partial u_i} = -\sum_{j} (r_{i,j} - u_i^T v_j) v_j

u_i \leftarrow u_i - \alpha \cdot \frac{\partial f}{\partial u_i}

where \alpha is the learning rate and u_i is the i-th row of U. The same applies to v_j.
Adding regularization, the full objective is

\arg\min_{U,V} f = \sum_{(i,j) \in R} (r_{i,j} - u_i^T v_j)^2 + \lambda_u \|u_i\|^2 + \lambda_v \|v_j\|^2
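A minimal Python/NumPy sketch of one batch gradient step on U for this regularized objective; alpha and lam_u are assumed hyper-parameters and the data layout follows the earlier sketches:

    import numpy as np

    def gradient_step_U(U, V, ratings, alpha, lam_u):
        # df/du_i = -2 * sum_j (r_ij - u_i . v_j) v_j + 2 * lam_u * u_i
        grad = 2.0 * lam_u * U
        for i, j, r in ratings:
            err = r - U[i] @ V[j]
            grad[i] += -2.0 * err * V[j]
        return U - alpha * grad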
MF Introduction: Gradient Algorithm vs. Stochastic Gradient Algorithm
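The slide contrasts the (batch) gradient algorithm with the stochastic gradient algorithm, which updates the factors from one observed rating at a time. A minimal sketch of the stochastic update (alpha and lam are assumed hyper-parameters; U and V are NumPy arrays as in the earlier sketches):

    def sgd_step(U, V, i, j, r, alpha, lam):
        # Update u_i and v_j from the single observed rating r_ij.
        err = r - U[i] @ V[j]
        u_old = U[i].copy()            # keep the old u_i for the v_j update
        U[i] += alpha * (err * V[j] - lam * U[i])
        V[j] += alpha * (err * u_old - lam * V[j])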
MF Introduction: Online Algorithm
Online-Updating Regularized Kernel Matrix Factorization Models for Large-Scale Recommender Systems
MF Introduction: Loss Function
\arg\min_{U,V} f = \sum_{(i,j) \in R} (r_{i,j} - u_i^T v_j)^2
We update the factor V to reduce the objective function f with conventional gradient descent, as follows:

V \leftarrow V - \varepsilon \, \frac{\partial f}{\partial V}, \qquad \frac{\partial f}{\partial V} = 2 \big( V U^T U - R^T U \big)

Here we choose the step size \varepsilon so that the minimum is reachable; the same applies to the factor matrix U.
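A minimal Python/NumPy sketch of this matrix-form update of V with U held fixed (eps is an assumed step size; R is treated as a dense matrix purely for illustration):

    import numpy as np

    def update_V(R, U, V, eps):
        # dF/dV = 2 * (V U^T U - R^T U) for F = ||R - U V^T||_F^2
        return V - eps * 2.0 * (V @ (U.T @ U) - R.T @ U)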
MF Introduction
Here we proceed under the assumption that SSGD (stratified SGD) can converge to a set of stationary points.
MF Introduction
The idea of DSGD (distributed SGD) is to specialize the SSGD algorithm, choosing strata with a special layout such that SGD can be run on each stratum in a distributed manner.
We see that there is a dependence between the current solution and the previous one obtained in the iteration, i.e. the previous solution has to be known before the current one can be computed. To address this problem, we introduce the notion of interchangeability:
From the definition of interchangeability we can derive the following theorem:
From the theorem, a training matrix that is block-diagonal can be factorized in parallel, i.e. each block Z_i can be computed independently.
MF Introduction
We can compute the block-diagonal matrix in parallel. Our goal, however, is to parallelize the factorization of a general matrix. How can we achieve that?
Now we stratify the input matrix so that each stratum meets the interchangeability condition.
Assume we cut the input matrix into 3×3 blocks, as follows:
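The figures for this example are not in the transcript; below is a minimal Python sketch of the usual DSGD layout for a 3×3 blocking, in which each stratum is a "diagonal" of blocks that share no rows or columns, so SGD can be run on the blocks of one stratum in parallel (this layout is an assumption about the lost figure, consistent with the interchangeability condition above):

    def strata_3x3():
        # Each stratum is a list of (row_block, col_block) pairs that touch
        # disjoint row and column blocks, hence they are interchangeable.
        d = 3
        return [[(b, (b + s) % d) for b in range(d)] for s in range(d)]

    # strata_3x3() ->
    # [[(0, 0), (1, 1), (2, 2)],    # stratum 0
    #  [(0, 1), (1, 2), (2, 0)],    # stratum 1
    #  [(0, 2), (1, 0), (2, 1)]]    # stratum 2
    # One DSGD sub-epoch: pick a stratum and run SGD on each of its blocks in
    # parallel; each block only touches its own rows of U and rows of V.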
Application Area
Any area where dyadic data can be generated.
Dyadic data: in general, dyadic data are measurements on dyads, i.e. pairs of elements coming from two sets.
For example, a customer buys a product, which yields the triple (userId, itemId, rating).
My Work
Figure: the product M × M^T, written as Left Matrix × Right Matrix; the multiplication is carried out block by block and the partial products are summed.
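As far as the lost figure can be recovered, it shows a product computed as a sum of partial block products. A minimal Python/NumPy sketch of that idea (splitting the shared inner dimension into blocks is my assumption about the layout):

    import numpy as np

    def blocked_matmul(left, right, n_blocks=3):
        # left @ right as a sum of partial products over blocks of the
        # shared (inner) dimension.
        splits = np.array_split(np.arange(left.shape[1]), n_blocks)
        out = np.zeros((left.shape[0], right.shape[1]))
        for idx in splits:
            out += left[:, idx] @ right[idx, :]    # one partial product
        return out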
Difficulty in my work
DataSet
I use a total of 5 jobs, but one job cannot complete because the intermediate data generated in the procedure is too big, about 6000 GB. Analyzed as follows: the left matrix is approximately 300 thousand × 250 thousand, the right matrix F is approximately 250 thousand × 10, so the additional data generated is about 300K × 10 × 250K × 8 bytes ≈ 6000 GB, where the 8 is the number of bytes needed to store a double.
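The size estimate works out as follows (assuming 1 GB = 10^9 bytes):

300{,}000 \times 10 \times 250{,}000 \times 8 \ \text{bytes} = 6 \times 10^{12} \ \text{bytes} \approx 6000 \ \text{GB}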
Difficulty in my work
The techniques I have used:
Combiner
Compress
THANK YOU FOR YOUR TIME!