scaling apache spark mllib to billions of parameters: spark summit east talk by yanbo liang
TRANSCRIPT
![Page 1: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/1.jpg)
Scaling Apache Spark MLlib tobillions of parameters
Yanbo Liang(Apache Spark committer @ Hortonworks)
joint with Weichen Xu
![Page 2: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/2.jpg)
Outline
• Background• Vector-free L-BFGS on Spark• Logistic regression on vector-free L-BFGS• Performance• Integrate with existing MLlib• Future work
![Page 3: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/3.jpg)
Outline
• Background• Vector-free L-BFGS on Spark• Logistic regression on vector-free L-BFGS• Performance• Integrate with existing MLlib• Future work
![Page 4: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/4.jpg)
Big data + big models = better accuracies
![Page 5: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/5.jpg)
Spark: a unified platform
“Hidden Technical Debt in Machine Learning Systems”, Google
![Page 6: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/6.jpg)
Spark: a unified platform
“Hidden Technical Debt in Machine Learning Systems”, Google
![Page 7: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/7.jpg)
Optimization
• L-BFGS• SGD• WLS
![Page 8: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/8.jpg)
MLlib logistic regression
Initialize coefficients
Broadcast coefficients to Executors
Comput loss and gradient for each instance, and sum them up locally
Comput loss and gradient for each instance, and sum them up locally
Comput loss and gradient for each instance, and sum them up locally
Reduce from executors to get
lossSum and gradientSum
Handle regularizationand
use L-BFGS/OWL-QNto find next step Final model
coefficients
Executors/Workers
Driver/Controller
loop untial converge
![Page 9: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/9.jpg)
Bottleneck 1
Initialize coefficients
Broadcast coefficients to Executors
Comput loss and gradient for each instance, and sum them up locally
Comput loss and gradient for each instance, and sum them up locally
Comput loss and gradient for each instance, and sum them up locally
Reduce from executors to get
lossSum and gradientSum
Handle regularizationand
use L-BFGS/OWL-QNto find next step Final model
coefficients
Executors/Workers
Driver/Controller
loop untial converge
![Page 10: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/10.jpg)
Solution 1Comput loss and gradient for each instance, and sum them up locally
Comput loss and gradient for each instance, and sum them up locally
Comput loss and gradient for each instance, and sum them up locally
Reduce from executors to get
final lossSum and gradientSum
Comput loss and gradient for each instance, and sum them up locally
Reduce fromexecutors to getpartial lossSum
and gradientSum
Reduce fromexecutors to getpartial lossSum
and gradientSum
![Page 11: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/11.jpg)
Bottleneck 2
Initialize coefficients
Broadcast coefficients to Executors
Comput loss and gradient for each instance, and sum them up locally
Comput loss and gradient for each instance, and sum them up locally
Comput loss and gradient for each instance, and sum them up locally
Reduce from executors to get
lossSum and gradientSum
Handle regularizationand
use L-BFGS/OWL-QNto find next step Final model
coefficients
Executors/Workers
Driver/Controller
loop untial converge
![Page 12: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/12.jpg)
Outline
• Background• Vector-free L-BFGS on Spark• Logistic regression on vector-free L-BFGS• Performance• Integrate with existing MLlib• Future work
![Page 13: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/13.jpg)
Distributed Vector
Driver
[2.5, 6.8, 3.1, 9.0, 0.7, 1.9, 2.6, 8.7, 6.2, 1.1, 5.0, 0.0]
worker0 worker1 worker2 worker3
![Page 14: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/14.jpg)
Distributed Vector
Driver
[2.5, 6.8, 3.1, 9.0, 0.7, 1.9, 2.6, 8.7, 6.2, 1.1, 5.0, 0.0]
[2.5, 6.8, 3.1]
worker0
[9.0, 0.7, 1.9]
worker1
[2.6, 8.7, 6.2]
worker2
[1.1, 5.0, 0.0]
worker3
![Page 15: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/15.jpg)
Distributed Vector
• Definition:
• Linear algebra operations:– a * x + y– a * x + b * y + c * z + …– x.dot(y)– x.norm– …
class DistribuedVector(val values: RDD[Vector],val sizePerPart: Int,val numPartitions: Int,val size: Long)
![Page 16: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/16.jpg)
L-BFGSCoefficients: X(k)
Choose descent direction(Two loop recursion)
X(k+1) = alpha * dir(k) + X(k)
Calculate G(k+1) and F(k+1)
![Page 17: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/17.jpg)
L-BFGSCoefficients: X(k)
Choose descent direction(Two loop recursion)
X(k+1) = alpha * dir(k) + X(k)
Calculate G(k+1) and F(k+1)
Worker Worker Worker Worker
WorkerWorkerWorkerWorker
Driver
Space: (2m+1)dTime: 6md
![Page 18: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/18.jpg)
Vector-free L-BFGSCoefficients: X(k)
Choose descent direction(Two loop recursion)
X(k+1) = alpha * dir(k) + X(k)
Calculate G(k+1) and F(k+1)
![Page 19: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/19.jpg)
Vector-free L-BFGSCoefficients: X(k)
Choose descent direction(Two loop recursion)
X(k+1) = alpha * dir(k) + X(k)
Calculate G(k+1) and F(k+1)
Worker Worker Worker Worker
WorkerWorkerWorkerWorker
Driver
Space: (2m+1)*dTime: 6md
Space: (2m+1)^2Time: 8m^2
![Page 20: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/20.jpg)
Outline
• Background• Vector-free L-BFGS on Spark• Logistic regression on vector-free L-BFGS• Performance• Integrate with existing MLlib• Future work
![Page 21: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/21.jpg)
numFeatures
numInstance
partition1
partition2
partition3
![Page 22: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/22.jpg)
numFeatures
numInstance
![Page 23: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/23.jpg)
coefficients: DistributedVectorcoeff0 coeff1 coeff2
![Page 24: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/24.jpg)
coeff2coeff0
coeff0
coeff0
coeff1
coeff1
coeff1
coeff2
coeff2
![Page 25: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/25.jpg)
multiplier02
multiplier22
multiplier12
multiplier01
multiplier00
multiplier11
multiplier10
multiplier21
multiplier20
![Page 26: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/26.jpg)
multiplier02
multiplier22
multiplier12
multiplier01
multiplier00
multiplier11
multiplier10
multiplier21
multiplier20
label0label2
label1
![Page 27: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/27.jpg)
multiplier0
multiplier2
multiplier1
![Page 28: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/28.jpg)
multiplier0
multiplier2
multiplier1
multiplier0
multiplier0
multiplier1
multiplier1
multiplier2
multiplier2
![Page 29: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/29.jpg)
grad02grad00
grad10
grad20
grad01
grad11
grad21
grad12
grad22
![Page 30: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/30.jpg)
grad0 grad1 grad2 gradient: DistributedVector
![Page 31: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/31.jpg)
Regularization and OWL-QN
• Regularization:– L2– L1– elasticNet
• Implement vector-free OWL-QN for objective with L1/elasticNetregularization.
![Page 32: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/32.jpg)
CheckpointCoefficients: X(k)
Choose descent direction(Two loop recursion)
X(k+1) = alpha * dir(k) + X(k)
Calculate G(k+1) and F(k+1)
Worker Worker Worker Worker
WorkerWorkerWorkerWorker
Driver
Space: (2m+1)*dTime: 6md
Space: (2m+1)^2Time: 8m^2
![Page 33: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/33.jpg)
Outline
• Background• Vector-free L-BFGS on Spark• Logistic regression on vector-free L-BFGS• Performance• Integrate with existing MLlib• Future work
![Page 34: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/34.jpg)
Performance(WIP)
0
20
40
60
80
100
120
1M 10M 50M 100M 250M 500M 750M 1B
Time for each epoch
L-BFGS VL-BFGS
![Page 35: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/35.jpg)
Outline
• Background• Vector-free L-BFGS on Spark• Logistic regression on vector-free L-BFGS• Performance• Integrate with existing MLlib• Future work
![Page 36: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/36.jpg)
APIs
val dataset: Dataset[_] = spark.read.format("libsvm").load("data/a9a")
val trainer = new VLogisticRegression().setColsPerBlock(100) .setRowsPerBlock(10) .setColPartitions(3) .setRowPartitions(3) .setRegParam(0.5)
val model = trainer.fit(dataset)println(s"Vector-free logistic regression coefficients:
${model.coefficients}")
![Page 37: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/37.jpg)
Adaptive logistic regression
Number of features Optimization method
less than 4096 WLS/IRLS
more than 4096,but less than 10 million
L-BFGS
more than 10 million VL-BFGS
![Page 38: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/38.jpg)
Whole picture of MLlib
Application layer(Ads CTR, anti-fraud, data science, NLP, …)
WLS/IRLS
Optimization layer
Spark core
L-BFGS VL-BFGS
LinearRegression
Machine learning algorithm layer
LogisticRegression MLP …
SGD …
![Page 39: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/39.jpg)
APIs
val dataset: Dataset[_] = spark.read.format("libsvm").load("data/a9a")
val trainer = new LogisticRegression().setColsPerBlock(100) .setRowsPerBlock(10) .setColPartitions(3) .setRowPartitions(3) .setRegParam(0.5).setSolver(“vl-bfgs”)
val model = trainer.fit(dataset)println(s"Vector-free logistic regression coefficients:
${model.coefficients}")
![Page 40: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/40.jpg)
Outline
• Background• Vector-free L-BFGS on Spark• Logistic regression on vector-free L-BFGS• Performance• Integrate with existing MLlib• Future work
![Page 41: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/41.jpg)
Future work
• Reduce stages for each iteration.• Performance improvements.• Implement LinearRegression, SoftmaxRegression and
MultilayerPerceptronClassifier base on vector-free L-BFGS/OWL-QN.• Real word use case for Ads CTR prediction with billions parameters
and will share experiences and lessons we learned:– https://github.com/yanboliang/spark-vlbfgs
![Page 42: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/42.jpg)
Key takeaways
• Full distributed calculation, full distributed model, can runsuccessfully without OOM.
• VL-BFGS API is consistent with breeze L-BFGS.• VL-BFGS output exactly the same solution as breeze L-BFGS.• Does not require special components such as parameter servers.• Pure library and can be deployed and used easily.• Can benefit from the Spark cluster operation and development
experience.• Use the same platform as the data size increasing.
![Page 43: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/43.jpg)
Reference
• https://papers.nips.cc/paper/5333-large-scale-l-bfgs-using-mapreduce
• https://github.com/mengxr/spark-vl-bfgs• https://spark-summit.org/2015/events/large-scale-lasso-and-elastic-net-regularized-generalized-linear-models/
![Page 44: Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk by Yanbo Liang](https://reader031.vdocument.in/reader031/viewer/2022021506/58abca611a28ab68068b58c9/html5/thumbnails/44.jpg)
Thank You.
Yanbo Liang