Big Data Analytics
Lucas Rego Drumond
Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science
University of Hildesheim, Germany
GraphLab On Practice
Outline
1. GraphLab Application Deployment
2. Relational Classification Example
3. Factorization Models on GraphLab
1. GraphLab Application Deployment
Steps
1. Install GraphLab into a specific directory
   - Example: cd /home/user/
   - git clone https://github.com/dato-code/PowerGraph.git graphlab
     (the trailing "graphlab" names the checkout directory so that it matches the paths used below)
2. Create a directory for your application under /home/user/graphlab/apps
3. Create a CMakeLists.txt file in your application directory
4. Add the source files for your program to your application directory
5. Run ./configure in /home/user/graphlab
6. Go to /home/user/graphlab/release/apps/your_application and run make
CMakeLists.txt
    project(MyProjectName)
    add_graphlab_executable(executable_name implementation.cpp)
Hello World
    #include <graphlab.hpp>

    int main(int argc, char** argv) {
      // Set up MPI before any other GraphLab call.
      graphlab::mpi_tools::init(argc, argv);
      graphlab::distributed_control dc;

      // dc.cout() prints the message once, even when running on several machines.
      dc.cout() << "Hello World!\n";

      // Shut MPI down again before the program exits.
      graphlab::mpi_tools::finalize();
    }
2. Relational Classification Example
Relational Classification
[Figure: graph with four vertices, v1 with y(v1) = 1, v2 with y(v2) = 3, v3 with y(v3) unknown, and v4 with y(v4) = 2; every edge has weight 1.]
Given a graph $G := (V, E)$ and a set of labels $L$:
- Some nodes have labels $y : V \to L$
- Edges $(v, u)$ have weights $w_{v,u}$
- Task: estimate a function $y : V \to L$
Weighted-vote Relational Neighbor (wvRN)
Probability that a vertex $v \in V$ has label $c \in L$:

$$P(c \mid v) = \frac{1}{Z_v} \sum_{u \in N_v,\; y(u) = c} w_{u,v}$$

Where:

$$Z_v = \sum_{u \in N_v} w_{u,v}$$

- $N_v$ denotes the neighbors of $v$

$$y(v) := \arg\max_{c \in L} P(c \mid v)$$
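The formulas above can be tried out on a single vertex without any GraphLab machinery. The following is a minimal sketch in plain C++; the names `Neighbor`, `kUnlabeled`, `wvrn_probabilities` and `wvrn_predict` are hypothetical and not part of the lecture code. It assumes that unlabeled neighbors still count towards $Z_v$ (as in the definition of $Z_v$ above) and that ties go to the smallest label.

```cpp
#include <cassert>
#include <map>
#include <vector>

const int kUnlabeled = -1;  // hypothetical marker for "no label known"

// One neighbor u of the vertex v being classified (hypothetical type).
struct Neighbor {
  int label;      // y(u), or kUnlabeled
  double weight;  // w_{u,v}
};

// Returns P(c | v) for every label c that occurs among the neighbors.
std::map<int, double> wvrn_probabilities(const std::vector<Neighbor>& nbrs) {
  double z = 0.0;               // Z_v: total weight of all neighbors
  std::map<int, double> votes;  // per-label weight sums
  for (const Neighbor& n : nbrs) {
    assert(n.weight >= 0.0);
    z += n.weight;
    if (n.label != kUnlabeled) votes[n.label] += n.weight;
  }
  if (z > 0.0) {
    for (auto& kv : votes) kv.second /= z;  // normalize by Z_v
  }
  return votes;
}

// Returns arg max_c P(c | v), or kUnlabeled if no neighbor is labeled.
int wvrn_predict(const std::vector<Neighbor>& nbrs) {
  int best = kUnlabeled;
  double best_p = -1.0;
  for (const auto& kv : wvrn_probabilities(nbrs)) {
    if (kv.second > best_p) {  // strict comparison: ties keep the smaller label
      best_p = kv.second;
      best = kv.first;
    }
  }
  return best;
}
```

Calling `wvrn_predict` with the labeled neighbors of a vertex returns the label whose neighbors carry the most weight.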
wvRN Vertex Program:
procedure wvRNGather
    input: vertex $v$, scope $S_v$, ingoing edge $(u \to v)$
    return $(w_{u,v}, y(u))$
end procedure

procedure wvRNApply
    input: vertex $v$, scope $S_v$, gather result $\left(Z_v, \left(\sum_{u \in N_v,\; y(u) = c} w_{u,v}\right)_{c \in L}\right)$
    $y(v) := \arg\max_{c \in L} \sum_{u \in N_v,\; y(u) = c} w_{u,v}$
end procedure
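Between gather and apply, the per-edge results have to be merged into the per-label sums that apply receives. The sketch below models that merge in plain C++ with an `operator+=` on the gather result. It does not use the GraphLab API; the names `LabelVotes`, `wvrn_gather` and `wvrn_apply` are hypothetical, and it assumes every gathered neighbor carries a label.

```cpp
#include <cassert>
#include <map>

// Hypothetical gather result: Z_v plus one weight sum per label.
struct LabelVotes {
  double z = 0.0;
  std::map<int, double> per_label;

  // Merge two partial gather results by adding their sums.
  LabelVotes& operator+=(const LabelVotes& other) {
    z += other.z;
    for (const auto& kv : other.per_label) per_label[kv.first] += kv.second;
    return *this;
  }
};

// Gather for one ingoing edge (u -> v): contributes (w_{u,v}, y(u)).
LabelVotes wvrn_gather(double weight, int neighbor_label) {
  assert(weight >= 0.0);
  LabelVotes g;
  g.z = weight;
  g.per_label[neighbor_label] = weight;
  return g;
}

// Apply: pick the label with the largest accumulated weight.
// Returns fallback_label if nothing was gathered.
int wvrn_apply(const LabelVotes& total, int fallback_label) {
  int best = fallback_label;
  double best_w = 0.0;
  for (const auto& kv : total.per_label) {
    if (kv.second > best_w) {
      best_w = kv.second;
      best = kv.first;
    }
  }
  return best;
}
```

Dividing by $Z_v$ is left out of `wvrn_apply` because it does not change which label attains the maximum.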
wvRN Code
- Code and toy data: http://www.ismll.uni-hildesheim.de/lehre/bd-14s/script/gl_ex/wvRN_example.zip
3. Factorization Models on GraphLab
Factorization models
- Each item $i \in I$ is associated with a latent feature vector $q_i \in \mathbb{R}^k$
- Each user $u \in U$ is associated with a latent feature vector $p_u \in \mathbb{R}^k$
- Each entry in the original matrix can be estimated by

$$r(u, i) = p_u^\top q_i = \sum_{f=1}^{k} p_{u,f}\, q_{i,f}$$
Example
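Movies: Titanic (t), Matrix (m), The Godfather (g), Once (o)
Each user has rated only some of the movies:
- Alice (a): 4, 2, 5
- Bob (b): 4, 3
- John (j): 4, 3
[Figure: the rating matrix R (users Alice, Bob, John by movies T, M, G, O) is approximated by the product P × Q^T of a user factor matrix P and an item factor matrix Q.]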
Learning a factorization model - Objective Function
Task:

$$\arg\min_{P,Q} \sum_{(u,i,r_{ui}) \in D^{\text{train}}} \left(r_{ui} - r(u, i)\right)^2 + \lambda\left(\|P\|^2 + \|Q\|^2\right)$$

Where:
- $r(u, i) := p_u^\top q_i$
- $D^{\text{train}}$ is the training data
- $\lambda$ is a regularization constant
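To make the objective concrete, here is a small sketch that evaluates it for given factor matrices. The type and function names (`Vec`, `Rating`, `predict`, `squared_norm`, `objective`) are hypothetical, and the sketch assumes that $\|\cdot\|^2$ is the squared Frobenius norm, i.e. the sum of all squared entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// One training triple (u, i, r_ui); users and items are row indices into P and Q.
struct Rating {
  std::size_t user;
  std::size_t item;
  double value;
};

// r(u, i) = p_u^T q_i
double predict(const Vec& p, const Vec& q) {
  assert(p.size() == q.size());
  double s = 0.0;
  for (std::size_t f = 0; f < p.size(); ++f) s += p[f] * q[f];
  return s;
}

// Sum of squared entries over all rows (assumed meaning of ||.||^2).
double squared_norm(const std::vector<Vec>& m) {
  double s = 0.0;
  for (const Vec& row : m)
    for (double x : row) s += x * x;
  return s;
}

// Squared error on the training data plus the regularization term.
double objective(const std::vector<Rating>& train, const std::vector<Vec>& P,
                 const std::vector<Vec>& Q, double lambda) {
  double loss = 0.0;
  for (const Rating& r : train) {
    const double err = r.value - predict(P[r.user], Q[r.item]);
    loss += err * err;
  }
  return loss + lambda * (squared_norm(P) + squared_norm(Q));
}
```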
Stochastic Gradient Descent Algorithm
procedure LearnLatentFactors
    input: $D^{\text{train}}$, $\lambda$, $\alpha$
    $(p_u)_{u \in U} \sim N(0, \sigma I)$
    $(q_i)_{i \in I} \sim N(0, \sigma I)$
    repeat
        for $(u, i, r_{u,i}) \in D^{\text{train}}$ do    (in a random order)
            $p_u \leftarrow p_u - \alpha \left(-2 (r_{u,i} - r(u, i))\, q_i + 2 \lambda p_u\right)$
            $q_i \leftarrow q_i - \alpha \left(-2 (r_{u,i} - r(u, i))\, p_u + 2 \lambda q_i\right)$
        end for
    until convergence
    return $P, Q$
end procedure
Recommender System Graph
[Figure: graph with nodes l, n, o, c and p; the edges carry the ratings $r_{np} = 4$, $r_{lp} = 2$ and $r_{oc} = 5$.]
- Nodes:
  - Users $U$
  - Items $I$
- Edges:
  - Ratings $r_{ui}$
Factorization Models on GraphLab
- Node data:
  - User node: $p_u$
  - Item node: $q_i$
- Edge data:
  - Rating $r_{ui}$
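The node and edge data listed above could be laid out as follows. This is a plain in-memory sketch with hypothetical names (`VertexData`, `EdgeData`, `RatingGraph`); in a GraphLab program the two data types would presumably be handed to GraphLab's own distributed graph type instead of the simple container used here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Data stored on every node: p_u for a user node, q_i for an item node.
struct VertexData {
  bool is_user;
  std::vector<double> factor;
};

// Data stored on every edge: the rating r_ui.
struct EdgeData {
  double rating;
};

struct Edge {
  std::size_t user;  // index of the user node
  std::size_t item;  // index of the item node
  EdgeData data;
};

// Minimal stand-in for a graph container (not the GraphLab type).
struct RatingGraph {
  std::vector<VertexData> vertices;
  std::vector<Edge> edges;

  // Adds a node with a zero-initialized factor vector of length k.
  std::size_t add_vertex(bool is_user, std::size_t k) {
    vertices.push_back(VertexData{is_user, std::vector<double>(k, 0.0)});
    return vertices.size() - 1;
  }

  // Adds a rating edge; it must connect a user node to an item node.
  void add_rating(std::size_t user, std::size_t item, double rating) {
    assert(user < vertices.size() && item < vertices.size());
    assert(vertices[user].is_user && !vertices[item].is_user);
    edges.push_back(Edge{user, item, EdgeData{rating}});
  }
};
```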
Factorization Models on GraphLab
User Nodes:
- Gather
  - Compute the error and accumulate the update on each item
- Apply
  - Update latent feature vectors and compute updates for each neighboring item
- Scatter
  - Send message to items with updates and accumulated error
  - Signal neighboring items to execute
Factorization Models on GraphLab
Item Nodes:
- Gather
  - Gather messages from users
- Apply
  - Update latent feature vectors
- Scatter
  - Signal neighboring users to execute