Distributed R for big data
DESCRIPTION
Distributed R for big data. Shivaram Venkataraman*, Indrajit Roy+, Alvin AuYoung+, Rob Schreiber+, Erik Bodzsar#, Kyungyong Lee^+ (*UC Berkeley, +HP Labs, #U Chicago, ^UFL).
TRANSCRIPT
![Page 1: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/1.jpg)
Distributed R for big data
Shivaram Venkataraman*, Indrajit Roy+, Alvin AuYoung+, Rob Schreiber+, Erik Bodzsar#, Kyungyong Lee^+
*UC Berkeley, +HP Labs, #U Chicago, ^UFL
![Page 2: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/2.jpg)
R
Single Threaded + Single Machine
![Page 3: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/3.jpg)
R R R R R
![Page 4: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/4.jpg)
darray
![Page 5: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/5.jpg)
foreach
f(x)
![Page 6: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/6.jpg)
Scale and speed
20x faster than in-memory Hadoop
Power method with 1B edges, Netflix ALS
![Page 7: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/7.jpg)
demo
![Page 8: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/8.jpg)
![Page 9: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/9.jpg)
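lj_matrix <- darray(dim=c(n,n), blocks=c(s,n))  # s rows per block, matching the vector splits
in_vector <- darray(dim=c(n,1), blocks=c(s,1), data=1/n)
out_vector <- darray(dim=c(n,1), blocks=c(s,1))
# One task per row block: multiply it by the whole input vector
foreach(i, 1:length(splits(lj_matrix)),
  function(g = splits(lj_matrix, i), v = splits(in_vector), o = splits(out_vector, i)) {
    o <- g %*% v
    update(o)  # publish this split of out_vector
})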
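The slide's code needs the Presto runtime to run. As a rough single-machine illustration of a row-partitioned power-method step, here is a base-R sketch that uses `lapply` in place of `foreach` and plain row ranges in place of darray splits. The sizes `n = 6` and `s = 2`, the random column-stochastic matrix `m`, and the helper `block_step` are assumptions made for illustration and do not come from the slides.

```r
# Base-R sketch: one power-method step computed block by block.
# This imitates the partitioning idea only; it does not use Presto.
set.seed(1)
n <- 6   # matrix dimension (illustrative)
s <- 2   # rows per block (illustrative); n is a multiple of s here

# Random column-stochastic matrix standing in for lj_matrix.
m <- matrix(runif(n * n), nrow = n)
m <- sweep(m, 2, colSums(m), "/")

# Uniform starting vector, like data = 1/n on the slide.
in_vector <- matrix(1 / n, nrow = n, ncol = 1)

# Row ranges of each block, playing the role of the splits.
starts <- seq(1, n, by = s)
row_blocks <- lapply(starts, function(a) a:(a + s - 1))

# Hypothetical helper: each "task" multiplies its row block
# by the whole input vector.
block_step <- function(rows, mat, v) mat[rows, , drop = FALSE] %*% v

out_blocks <- lapply(row_blocks, block_step, mat = m, v = in_vector)
out_vector <- do.call(rbind, out_blocks)

# The block-wise result agrees with the ordinary product.
stopifnot(isTRUE(all.equal(out_vector, m %*% in_vector)))
```

Because every block reads the whole input vector but writes only its own rows of the output, the tasks are independent of one another, which is what makes the step easy to run in parallel.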
![Page 10: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/10.jpg)
Contact us - alpha version
hpl.hp.com/research/presto.htm
tinyurl.com/presto-project
![Page 11: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/11.jpg)
![Page 12: Distributed R for big data](https://reader036.vdocument.in/reader036/viewer/2022081421/56816621550346895dd97718/html5/thumbnails/12.jpg)
R R R R