scaling spark
![Page 1: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/1.jpg)
SCALING SPARK ON AWS
THE JOURNEY
![Page 2: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/2.jpg)
ABOUT US
![Page 3: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/3.jpg)
Alex Rovner, Director of Data Engineering
Media Platform
Processing Terabytes Daily
![Page 4: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/4.jpg)
PRIOR STATE
TWO CLUSTERS
CORE & ANALYTICS
BOTH IN COLO
![Page 5: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/5.jpg)
CHALLENGES
![Page 6: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/6.jpg)
CHALLENGES
SCALABILITY
ELASTICITY
AGILITY
![Page 7: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/7.jpg)
SPARK
![Page 8: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/8.jpg)
SPARK
SCALABLE
FRIENDLY API
PYTHON SUPPORT
![Page 9: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/9.jpg)
AWS
![Page 10: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/10.jpg)
AWS
ON-DEMAND COMPUTE
FLEXIBLE TERMS
![Page 11: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/11.jpg)
INSTANCES
![Page 12: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/12.jpg)
INSTANCES
D2.8XLARGE
48 TB OF EPHEMERAL STORAGE
244 GB RAM
36 V-CPU
![Page 13: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/13.jpg)
INSTANCES
WHY EPHEMERAL?
![Page 14: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/14.jpg)
INSTANCES
RESERVED VS ON-DEMAND?
![Page 15: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/15.jpg)
INSTANCES
SPOT?
![Page 16: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/16.jpg)
INSTANCES
[Diagram: SPOT instances alongside HDFS on four D2 nodes]
![Page 17: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/17.jpg)
INSTANCES
WAIT, WHAT ABOUT DATA LOCALITY?
![Page 18: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/18.jpg)
HADOOP
![Page 19: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/19.jpg)
HADOOP
WHAT ABOUT EMR?
![Page 20: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/20.jpg)
HADOOP
CDH 5.3
SPARK 1.2 ON YARN
![Page 21: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/21.jpg)
HADOOP
CDH 5.4
SPARK 1.3 ON YARN
![Page 22: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/22.jpg)
HADOOP
CDH 5.4
SPARK 1.5 ON YARN
![Page 24: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/24.jpg)
AUTO SCALE
![Page 25: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/25.jpg)
AUTO SCALE
CALCULATE CLUSTER UTILIZATION
QUERY CM API
V-CORES AVAILABLE, USED & PENDING
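
The utilization step on this slide could be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the `VcoreSnapshot` type and its field names are hypothetical, the call that fetches the numbers from the CM API is left out, and "available" is assumed to mean the total v-cores the cluster can allocate.

```python
from dataclasses import dataclass


@dataclass
class VcoreSnapshot:
    """One reading of cluster v-core metrics (hypothetical field names)."""
    available: int  # assumed: total v-cores the cluster can allocate
    used: int       # v-cores allocated to running containers
    pending: int    # v-cores requested but not yet allocated


def utilization(snapshot: VcoreSnapshot) -> float:
    """Demand (used + pending) as a fraction of available v-cores."""
    if snapshot.available <= 0:
        raise ValueError("available v-cores must be positive")
    return (snapshot.used + snapshot.pending) / snapshot.available
```

Counting pending v-cores as demand means the ratio can exceed 1.0, which is the signal that jobs are queueing for capacity.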
![Page 26: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/26.jpg)
AUTO SCALE
CALCULATE TARGET CAPACITY
TARGET 80% UTILIZATION
LIMIT DOWNSIZING
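
One way to read this slide as arithmetic: size the cluster so that current demand sits at 80% of capacity, and cap how far a single step may shrink it. The sketch below makes assumptions the deck does not state. The function names are hypothetical, and the 10% per-step downsizing cap is only an illustrative value.

```python
def target_vcores(used: int, pending: int, target_utilization_pct: int = 80) -> int:
    """V-cores needed so that demand (used + pending) lands at the target utilization."""
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target utilization must be in (0, 100]")
    demand = used + pending
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(demand * 100) // target_utilization_pct)


def next_capacity(current: int, target: int, max_downsize_pct: int = 10) -> int:
    """Grow straight to the target, but shrink by at most max_downsize_pct per step."""
    if target >= current:
        return target
    floor = current - (current * max_downsize_pct) // 100
    return max(target, floor)
```

For example, 60 used plus 20 pending v-cores gives a target of 100 v-cores. A cluster currently at 200 v-cores would step down to 180 rather than dropping to 100 at once.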
![Page 27: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/27.jpg)
AUTO SCALE
ADJUST CAPACITY
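
Adjusting capacity eventually means turning a v-core target into a number of instances to add or remove. A minimal sketch, assuming every worker contributes the same number of v-cores; the function name and the `vcores_per_instance` parameter are hypothetical, and the call that actually resizes the fleet is omitted.

```python
def instance_delta(target_vcores: int, current_instances: int, vcores_per_instance: int) -> int:
    """Instances to add (positive) or remove (negative) to reach the v-core target."""
    if vcores_per_instance <= 0:
        raise ValueError("vcores_per_instance must be positive")
    # Ceiling division: a partially needed instance still has to be launched.
    needed = -(-target_vcores // vcores_per_instance)
    return needed - current_instances
```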
![Page 28: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/28.jpg)
SPEED BUMPS
![Page 29: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/29.jpg)
SPEED BUMPS
APPLICATION MASTER ON SPOT
YARN LABELS
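
The slide does not say how the labels were wired up. One possibility is Spark's `spark.yarn.am.nodeLabelExpression` and `spark.yarn.executor.nodeLabelExpression` properties, which the Spark "Running on YARN" documentation lists for Spark 1.6 and later, so they may not match the versions used in this deck. The sketch below only assembles `spark-submit` arguments. The helper name and the label names `ondemand` and `spot` are hypothetical, and the labels would have to exist in the YARN cluster already.

```python
from typing import List, Optional


def label_conf_args(am_label: str, executor_label: Optional[str] = None) -> List[str]:
    """Build spark-submit --conf arguments that pin the application master to a node label."""
    args = ["--conf", f"spark.yarn.am.nodeLabelExpression={am_label}"]
    if executor_label is not None:
        # Optionally steer executors to a different label, e.g. spot nodes.
        args += ["--conf", f"spark.yarn.executor.nodeLabelExpression={executor_label}"]
    return args
```

Pinning only the application master to stable nodes keeps a spot termination from killing the whole job, while executors remain free to run on spot capacity.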
![Page 30: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/30.jpg)
SPEED BUMPS
USERS ARE IMPATIENT
IT'S NEVER ENOUGH
![Page 31: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/31.jpg)
SPEED BUMPS
I AM LOST!
YARN LOGS
SET YARN OVERHEAD
CHECK GC TIME
INCREASE EXECUTOR MEMORY
TRY AGAIN
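
To make the "set YARN overhead" and "increase executor memory" steps concrete: on YARN each executor container needs the executor heap plus an off-heap overhead, set in Spark 1.x through `spark.yarn.executor.memoryOverhead` (in MB). The sketch below assumes the default of 10% of executor memory with a 384 MB minimum. That matches Spark 1.4 and later as far as I know, and earlier releases used a smaller factor. The helper names are hypothetical.

```python
from typing import Dict, Optional


def container_memory_mb(executor_memory_mb: int,
                        overhead_mb: Optional[int] = None,
                        overhead_factor: float = 0.10,
                        min_overhead_mb: int = 384) -> int:
    """Memory YARN must grant per executor: heap plus off-heap overhead."""
    if overhead_mb is None:
        # Assumed default rule: a fraction of the heap, with a fixed minimum.
        overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    return executor_memory_mb + overhead_mb


def executor_conf(executor_memory_mb: int, overhead_mb: int) -> Dict[str, str]:
    """Spark properties for an explicit heap size and YARN overhead."""
    return {
        "spark.executor.memory": f"{executor_memory_mb}m",
        "spark.yarn.executor.memoryOverhead": str(overhead_mb),
    }
```

If YARN kills containers for exceeding memory limits, raising the overhead is usually the first thing to try. If the logs show long GC pauses instead, the heap itself is the likelier bottleneck.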
![Page 32: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/32.jpg)
SPEED BUMPS
BROADCASTING IS EVIL
![Page 33: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/33.jpg)
SPEED BUMPS
BROADCASTING “LARGE” DATASETS IS EVIL
![Page 34: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/34.jpg)
CURRENT STATE
THREE CLUSTERS
ANALYTICS & STREAMING (AWS)
CORE (COLO - MOVING SOON!)
![Page 35: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/35.jpg)
BIG SUCCESS!
![Page 36: Scaling spark](https://reader035.vdocument.in/reader035/viewer/2022062503/58f23c921a28aba8208b45a7/html5/thumbnails/36.jpg)
QUESTIONS?