spark / mesos cluster optimization

HAYSSAM SALEH, Spark / Mesos Cluster Optimization, Paris Spark Meetup, April 28th 2016 @Criteo

Upload: ebiznext

Post on 21-Apr-2017


TRANSCRIPT

Page 1: Spark / Mesos Cluster Optimization

HAYSSAM SALEH
Spark / Mesos Cluster Optimization
Paris Spark Meetup, April 28th 2016 @Criteo

Page 2: Spark / Mesos Cluster Optimization

PROJECT CONTEXT

[Diagram labels: Cluster; source data (Click, Nav, AT, Reviews, Events, Supports, Accounts) -> DataLake -> Spark Job -> DataMart (Contribution, Visitor, Session, Search, Banner); [LR] [FD] [Click]]

Page 3: Spark / Mesos Cluster Optimization

Interactive Discovery

Elasticsearch DataMart (KPI)

Page 4: Spark / Mesos Cluster Optimization

Interactive Discovery

Elasticsearch DataMart (KPI)

Page 5: Spark / Mesos Cluster Optimization

OBJECTIVES

• 100 GB of data/day => 50 million requests
• 40 TB/year
• How we went from a 4-hour job on:
  - 6 nodes
  - 8 cores & 32 GB per node
• To a 20-minute job on:
  - 4 nodes, 8 cores & 8 GB per node

Page 6: Spark / Mesos Cluster Optimization

SUMMARY

• Spark concepts
• Spark UI offline
• Application optimization
  - Shuffling
  - Partitioning
  - Closures
• Parameters optimization
  - Spark shuffling
  - Mesos application distribution
• Elasticsearch optimization
  - Google "elasticsearch performance tuning" -> blog.ebiznext.com

Page 7: Spark / Mesos Cluster Optimization

SPARK: CONCEPTS

• Application
  - Main application
• Job
  - Round trip Driver -> Cluster
• Stage
  - Shuffle boundary
• Task
  - Thread working on a single RDD partition
• Partition
  - RDDs are split into partitions
  - A partition is the unit of work for each task
• Executor
  - System process

[Diagram: Application 1-n Job (driver code; one job per Spark action), Job 1-n Stage (split at each shuffle boundary), Stage 1-n Task (one task per partition); an RDD is made of Partitions.]
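To make the job / stage / task hierarchy concrete, here is a small plain-Python sketch (no Spark involved). It assumes a simplified model in which every shuffling transformation opens a new stage and each stage runs one task per partition; the names `SHUFFLE_OPS`, `split_into_stages` and `count_tasks` are hypothetical helpers, not Spark APIs.

```python
# Simplified model: a job's lineage is a list of operation names.
# Operations that shuffle data mark a stage boundary (assumption for illustration).
SHUFFLE_OPS = {"repartition", "cogroup", "join", "reduceByKey", "groupByKey", "distinct"}


def split_into_stages(lineage):
    """Cut the lineage into stages at every shuffling operation."""
    stages = [[]]
    for op in lineage:
        if op in SHUFFLE_OPS and stages[-1]:
            stages.append([])  # shuffle boundary: start a new stage
        stages[-1].append(op)
    return stages


def count_tasks(partitions_per_stage):
    """One task per partition, summed over all stages of the job."""
    return sum(partitions_per_stage)


lineage = ["textFile", "map", "filter", "reduceByKey", "map", "collect"]
stages = split_into_stages(lineage)
tasks = count_tasks([8, 4])  # 8 input partitions, 4 partitions after the shuffle

print(stages)
print(tasks)
```

In real Spark the map side of a shuffle belongs to the upstream stage, so this is only a rough picture of where the boundaries fall.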

Page 8: Spark / Mesos Cluster Optimization

SPARK UI OFFLINE

spark-env.sh

On the driver
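The settings shown on this slide did not survive extraction. As a hedged sketch of one common setup, the driver writes event logs and the history server (configured through spark-env.sh) replays them. The property names `spark.eventLog.enabled`, `spark.eventLog.dir` and `spark.history.fs.logDirectory` are standard Spark settings; the log directory and the helper names are assumptions for illustration.

```python
# Hypothetical log location; replace with a directory both sides can read.
EVENT_LOG_DIR = "hdfs:///spark-events"


def driver_conf(log_dir):
    """Settings passed on the driver so the application writes its event log."""
    return {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
    }


def spark_env_line(log_dir):
    """Line for spark-env.sh so the history server reads the same directory."""
    return f'SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory={log_dir}"'


conf = driver_conf(EVENT_LOG_DIR)
env_line = spark_env_line(EVENT_LOG_DIR)

for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
print(env_line)
```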

Page 9: Spark / Mesos Cluster Optimization

APPLICATION

Page 10: Spark / Mesos Cluster Optimization

JOBS

Page 11: Spark / Mesos Cluster Optimization

STAGES

Page 12: Spark / Mesos Cluster Optimization

STAGES

Page 13: Spark / Mesos Cluster Optimization

TASKS

Page 14: Spark / Mesos Cluster Optimization

SHUFFLING APPLICATION OPTIMIZATION

Page 15: Spark / Mesos Cluster Optimization

WHY OPTIMIZE SHUFFLING

Distance between data & CPU     Duration (scaled)
L1 cache                        1 second
RAM                             3 minutes
Node-to-node communication      3 days

Page 16: Spark / Mesos Cluster Optimization

TRANSFORMATIONS LEADING TO SHUFFLING

• repartition
• cogroup
• ...join
• ...ByKey
• distinct

Page 17: Spark / Mesos Cluster Optimization

SHUFFLING OPTIMIZATION 1/2

Input (one line per partition):
(k1,v1,w1) (k2,v2,w2) (k3,v3,w3)
(k1,v4,w4) (k1,v5,w5) (k3,v6,w6)
(k2,v7,w7) (k2,v8,w8) (k3,v9,w9)

map (on each partition):
(k1,v1) (k2,v2) (k3,v3)
(k1,v4) (k1,v5) (k3,v6)
(k2,v7) (k2,v8) (k3,v9)

Shuffling, then groupByKey:
(k1,[v1,v4,v5]) (k3,[v3,v6,v9]) (k2,[v2,v7,v8])

Sum of each group:
(k1,v1+v4+v5) (k3,v3+v6+v9) (k2,v2+v7+v8)

Bonus: OOM

Page 18: Spark / Mesos Cluster Optimization

SHUFFLING OPTIMIZATION 2/2

Input (one line per partition):
(k1,v1,w1) (k2,v2,w2) (k3,v3,w3)
(k1,v4,w4) (k1,v5,w5) (k3,v6,w6)
(k2,v7,w7) (k2,v8,w8) (k3,v9,w9)

map (on each partition):
(k1,v1) (k2,v2) (k3,v3)
(k1,v4) (k1,v5) (k3,v6)
(k2,v7) (k2,v8) (k3,v9)

reduceByKey (1/2), local to each partition:
(k1,v1) (k2,v2) (k3,v3)
(k1,v4+v5) (k3,v6)
(k2,v7+v8) (k3,v9)

reduceByKey (2/2), after the shuffle:
(k1,v1+v4+v5) (k3,v3+v6+v9) (k2,v2+v7+v8)

Less shuffling
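The difference between the two approaches can be counted in a plain-Python simulation (no Spark involved). It uses the slide's nine records with the assumed values v1..v9 = 1..9; the function names are hypothetical and the "shuffle" is simply the list of records that would leave their partition.

```python
from collections import defaultdict

# The slide's three partitions after the map step, with v1..v9 = 1..9 (assumed values).
partitions = [
    [("k1", 1), ("k2", 2), ("k3", 3)],
    [("k1", 4), ("k1", 5), ("k3", 6)],
    [("k2", 7), ("k2", 8), ("k3", 9)],
]


def group_by_key_then_sum(parts):
    """groupByKey style: every record is shuffled, then grouped and summed."""
    shuffled = [record for part in parts for record in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)  # whole group held in memory (the OOM risk)
    return {key: sum(values) for key, values in groups.items()}, len(shuffled)


def reduce_by_key(parts):
    """reduceByKey style: combine inside each partition, shuffle only partial sums."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value  # step 1/2: map-side combine
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value  # step 2/2: merge partial sums after the shuffle
    return dict(totals), len(shuffled)


grouped_sums, grouped_shuffled = group_by_key_then_sum(partitions)
reduced_sums, reduced_shuffled = reduce_by_key(partitions)

print(grouped_sums, grouped_shuffled)
print(reduced_sums, reduced_shuffled)
```

Both paths give the same sums; with this data the map-side combine shuffles 7 records instead of 9, and the gap grows with the number of repeated keys per partition.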

Page 19: Spark / Mesos Cluster Optimization

OPTIMIZATIONS BROUGHT BY SPARK SQL

☹ Weak typing

☹ Limited Java support
☹ Schema inference not supported

• requires scala.Product

Page 20: Spark / Mesos Cluster Optimization

DATASETS (EXPERIMENTAL)

Scala

Java

☺ API similar to RDD, strongly typed

Page 21: Spark / Mesos Cluster Optimization

REDUCE OBJECT SIZE

Page 22: Spark / Mesos Cluster Optimization

CLOSURES

Page 23: Spark / Mesos Cluster Optimization

CLOSURES

Indirect reference. Use local variables instead.
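The slide's code was not recovered, so the following is a plain-Python analogy rather than the original Scala example. It shows how a function that reads a field through `self` captures the whole object, while copying the field to a local variable first captures only the value. The class `Job` and the helper `captured` are hypothetical.

```python
class Job:
    def __init__(self):
        self.threshold = 10
        # Stands in for heavy driver-side state that tasks do not need.
        self.big_state = list(range(100_000))

    def filter_indirect(self):
        # Reads self.threshold: the closure captures `self`, and big_state with it.
        return lambda x: x > self.threshold

    def filter_local(self):
        threshold = self.threshold  # copy the field to a local variable first
        return lambda x: x > threshold  # the closure captures only the number


def captured(fn):
    """Return the objects a function closes over."""
    return [cell.cell_contents for cell in fn.__closure__]


job = Job()
indirect = captured(job.filter_indirect())
local = captured(job.filter_local())

print(type(indirect[0]).__name__)
print(local)
```

Whatever a closure captures has to be serialized and shipped with each task, which is why the local-variable form is the cheaper one.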

Page 24: Spark / Mesos Cluster Optimization

RIGHT PARTITIONING

Page 25: Spark / Mesos Cluster Optimization

PARTITIONING

• Symptoms
  - Large number of short-lived tasks
• Solutions
  - Review your implementation
  - Define a custom partitioner
  - Control the number of partitions with coalesce and repartition
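As a rough illustration of the last solution, this plain-Python sketch models partitions as lists. It assumes a simplified behaviour: `coalesce` merges neighbouring partitions without redistributing individual records, while `repartition` redistributes every record (a full shuffle). Both functions are hypothetical stand-ins for the Spark operations of the same name.

```python
def coalesce(parts, n):
    """Merge existing partitions into n groups; records stay with their neighbours."""
    merged = [[] for _ in range(n)]
    for index, part in enumerate(parts):
        merged[index * n // len(parts)].extend(part)
    return merged


def repartition(parts, n):
    """Redistribute every record round-robin into n partitions (full shuffle)."""
    merged = [[] for _ in range(n)]
    records = [record for part in parts for record in part]
    for index, record in enumerate(records):
        merged[index % n].append(record)
    return merged


# Twelve one-record partitions: twelve short-lived tasks.
tiny = [[i] for i in range(12)]
fewer = coalesce(tiny, 3)

# Skewed partitions: one task does almost all the work.
skewed = [list(range(10)), [10], [11]]
balanced = repartition(skewed, 3)

print(fewer)
print([len(part) for part in balanced])
```

The trade-off is that coalesce is cheap but keeps existing skew, while repartition balances the data at the cost of a shuffle.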

Page 26: Spark / Mesos Cluster Optimization

CHOOSE THE RIGHT SERIALIZATION ALGORITHM

Page 27: Spark / Mesos Cluster Optimization

REDUCE OBJECT SIZE

• Kryo serialization
  - Space reduction: up to 10x
  - Performance gain: 2x to 3x

1. Require Spark to use Kryo serialization
2. Optimize class naming => reduce object size
3. Force all serialized classes to register

Not applicable to the Dataset API, which uses Encoders instead
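Steps 1 and 3 map onto standard Spark properties (`spark.serializer`, `spark.kryo.registrationRequired`, `spark.kryo.classesToRegister`). The sketch below only assembles them into spark-submit arguments; the class names and helper functions are hypothetical, and the slide's own configuration was not recovered.

```python
def kryo_conf(classes):
    """Build the Spark properties that switch to Kryo and enforce registration."""
    return {
        # Step 1: use Kryo instead of Java serialization.
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        # Step 3: fail fast when an unregistered class is serialized.
        "spark.kryo.registrationRequired": "true",
        # Registered classes are written as a small id instead of their full name.
        "spark.kryo.classesToRegister": ",".join(classes),
    }


def to_submit_args(conf):
    """Turn a property map into spark-submit `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Hypothetical application classes.
conf = kryo_conf(["com.example.Visit", "com.example.Session"])
submit_args = to_submit_args(conf)

print(" ".join(submit_args))
```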

Page 28: Spark / Mesos Cluster Optimization

DATASETS PROMISE

Page 29: Spark / Mesos Cluster Optimization

MEMORY TUNING

Page 30: Spark / Mesos Cluster Optimization

SPARK MEMORY MANAGEMENT – SPARK 1.6.X

[Diagram: 16 GB of memory split into:
  - Memory reserved to Spark: 300 MB
  - Memory assigned to Spark data (spark.memory.fraction = 0.75): 11.78 GB, shared between the storage fraction (spark.memory.storageFraction = 0.5) and the shuffle fraction
  - Memory assigned to user data (1 - spark.memory.fraction): 3.92 GB]
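The figures on the diagram follow from simple arithmetic, sketched here in plain Python. It assumes 1 GB = 1024 MB and the Spark 1.6 defaults quoted on the slide; the function name `memory_regions` is hypothetical.

```python
RESERVED_MB = 300  # memory reserved to Spark


def memory_regions(total_gb, memory_fraction=0.75, storage_fraction=0.5):
    """Split the memory the way the Spark 1.6 unified memory manager does."""
    usable_mb = total_gb * 1024 - RESERVED_MB
    spark_mb = usable_mb * memory_fraction  # storage + shuffle
    user_mb = usable_mb * (1 - memory_fraction)  # user data structures
    storage_mb = spark_mb * storage_fraction  # part protected from eviction
    return {
        "spark_gb": spark_mb / 1024,
        "user_gb": user_mb / 1024,
        "storage_gb": storage_mb / 1024,
        "shuffle_gb": (spark_mb - storage_mb) / 1024,
    }


regions = memory_regions(16)
for name, size in regions.items():
    print(f"{name}: {size:.2f}")
```

For 16 GB this gives about 11.78 GB for Spark data and just under 3.93 GB for user data, in line with the 11.78 / 3.92 shown on the slide.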

Page 31: Spark / Mesos Cluster Optimization

SPARK MEMORY MANAGEMENT – SPARK 1.6.X

• Storage fraction
  - Hosts cached RDDs
  - Hosts broadcast variables
  - Data in this fraction is subject to eviction
• Shuffling fraction
  - Holds intermediate data
  - Spills to disk when full
  - Data in this fraction cannot be evicted by other threads

[Diagram: same memory layout as on page 30 (16 GB total, 300 MB reserved, 11.78 GB Spark data, 3.92 GB user data).]

Page 32: Spark / Mesos Cluster Optimization

SPARK MEMORY MANAGEMENT – SPARK 1.6.X

• Shuffle & storage fractions may borrow memory from each other under certain conditions:
  - The shuffling fraction cannot extend beyond its defined size if the storage fraction uses all its memory
  - The storage fraction cannot evict data from the shuffling fraction, even if the shuffling fraction has expanded beyond its defined size

[Diagram: same memory layout as on page 30 (16 GB total, 300 MB reserved, 11.78 GB Spark data, 3.92 GB user data).]

Page 33: Spark / Mesos Cluster Optimization

THE SHUFFLE MANAGER (spark.shuffle.manager)

Page 34: Spark / Mesos Cluster Optimization

HASH SHUFFLE MANAGER

[Diagram: an Executor runs several Tasks, each working on one Partition; the tasks write separate files (File 1, File 2, File …) to the local filesystem (spark.local.dir).]

Number of tasks = spark.executor.cores / spark.task.cpus (default: 1) [YARN & standalone only]

Page 35: Spark / Mesos Cluster Optimization

SORT SHUFFLE MANAGER

[Diagram: an Executor runs several Tasks, each working on one Partition; each task buffers records in an AppendOnly map, sorts and spills them (spark.shuffle.spill = true), and writes a single file to the local filesystem (spark.local.dir), sorted by reducer id (id1, id2, id3, ...).]

Number of tasks = spark.executor.cores / spark.task.cpus (default: 1) [YARN & standalone only]

Page 36: Spark / Mesos Cluster Optimization

SHUFFLE MANAGER

• spark.shuffle.manager=hash
  - Performs better when the number of mappers/reducers is small (generates M*R files)
• spark.shuffle.manager=sort
  - Performs better for a large number of mappers/reducers
• Best of both worlds
  - spark.shuffle.sort.bypassMergeThreshold
• spark.shuffle.manager=tungsten-sort ???
  - Similar to sort, with the following benefits:
    · Off-heap allocation
    · Works directly on serialized binary objects (much smaller than JVM objects), aka memcopy ☺
    · Uses 8 bytes/record for the sort => takes advantage of the L* cache
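The file counts behind the hash versus sort trade-off can be worked out in a few lines of plain Python. This sketch assumes the basic hash shuffle (one file per mapper/reducer pair, as the M*R figure suggests) and counts only the single data file each map task writes under the sort shuffle; the function names are hypothetical.

```python
def hash_shuffle_files(mappers, reducers):
    """Hash shuffle: each map task writes one file per reducer."""
    return mappers * reducers


def sort_shuffle_files(mappers):
    """Sort shuffle: each map task writes a single data file sorted by reducer id."""
    return mappers


small = (hash_shuffle_files(10, 10), sort_shuffle_files(10))
large = (hash_shuffle_files(1000, 1000), sort_shuffle_files(1000))

print(small)
print(large)
```

With 10 mappers and 10 reducers the hash shuffle produces 100 files, which is manageable; with 1,000 of each it produces 1,000,000 files against 1,000 for the sort shuffle.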

Page 37: Spark / Mesos Cluster Optimization

SPARK ON MESOS

Page 38: Spark / Mesos Cluster Optimization

COARSE GRAINED MODE

☺ Static resource allocation

[Diagram: the Spark Driver (CoarseMesosSchedulerBackend) talks to the Mesos Master; each of the four Mesos Workers runs a CoarseGrainedExecutorBackend hosting one Executor, which runs several Tasks.]

Page 39: Spark / Mesos Cluster Optimization

COARSE GRAINED MODE

☹ Only one executor/node

☹ No control over the number of executors

[Diagram: the same 32 cores obtained either as four executors of 8 cores / 16g (32 cores / 64 GB in total) or as two executors of 16 cores / 16g (32 cores / 32 GB in total).]
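The two outcomes on the diagram can be reproduced with a simplified plain-Python model. It assumes one executor per node, a fixed memory size per executor, and cores taken from each offer until a `spark.cores.max` style limit is reached; the function names are hypothetical and real Mesos offers are more involved.

```python
def coarse_grained_allocation(offered_cores, cores_max, executor_memory_gb):
    """One executor per node; take cores from each offer until cores_max is reached."""
    executors = []
    remaining = cores_max
    for cores in offered_cores:
        if remaining <= 0:
            break
        taken = min(cores, remaining)
        executors.append((taken, executor_memory_gb))
        remaining -= taken
    return executors


def totals(executors):
    """Total (cores, memory in GB) held by the application."""
    return sum(c for c, _ in executors), sum(m for _, m in executors)


# Same limit of 32 cores and 16 GB per executor, two different sets of offers.
four_nodes = totals(coarse_grained_allocation([8, 8, 8, 8], 32, 16))
two_nodes = totals(coarse_grained_allocation([16, 16], 32, 16))

print(four_nodes)
print(two_nodes)
```

The application gets 32 cores either way, but twice as much memory in total when the cores arrive spread over four nodes, which is the lack of control the slide points out.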

Page 40: Spark / Mesos Cluster Optimization

FINE GRAINED MODE

☹ Only one executor/node

☹ No guarantee of resource availability

☹ No control over the number of executors

[Diagram: the Spark Driver (MesosSchedulerBackend) talks to the Mesos Master; each of the four Mesos Workers runs a MesosExecutorBackend hosting one Executor, which runs a Task; spark.mesos.mesosExecutor.cores (default: 1).]

Page 41: Spark / Mesos Cluster Optimization

SOLUTION 1: USE MESOS ROLES

• Limit the number of cores/node on each Mesos worker

☹ Must be configured for each Spark application

• Assign a Mesos role to your Spark job

[Diagram: four executors, each with 8 cores / 16g.]

Page 42: Spark / Mesos Cluster Optimization

SOLUTION 2: DYNAMIC ALLOCATION (COARSE GRAINED MODE ONLY)

• Principles
  - Dynamically add/remove Spark executors
• Requires a dedicated process to move shuffled data (spawned on each Mesos worker through Marathon)
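The two requirements above correspond to standard Spark properties: `spark.dynamicAllocation.enabled`, `spark.shuffle.service.enabled` (the external shuffle service that keeps shuffle data available when executors are removed) and `spark.mesos.coarse`. The checker below is a hypothetical helper, written as a sketch, that flags a configuration missing one of them.

```python
def dynamic_allocation_problems(conf):
    """Return the reasons why dynamic allocation would not work with this conf."""
    problems = []
    if conf.get("spark.dynamicAllocation.enabled") != "true":
        return problems  # dynamic allocation not requested, nothing to check
    if conf.get("spark.shuffle.service.enabled") != "true":
        problems.append("external shuffle service is not enabled")
    # Set explicitly here: the default of spark.mesos.coarse depends on the Spark version.
    if conf.get("spark.mesos.coarse") != "true":
        problems.append("coarse grained mode is not enabled")
    return problems


incomplete = dynamic_allocation_problems({"spark.dynamicAllocation.enabled": "true"})
complete = dynamic_allocation_problems({
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.mesos.coarse": "true",
})

print(incomplete)
print(complete)
```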

Page 43: Spark / Mesos Cluster Optimization

QUESTIONS ?