Running Spark in Production

TRANSCRIPT

Page 1: Running Spark In Production

Running Spark in Production

Vinay Shukla, Director, Product Management (Twitter: @neomythos)
Saisai (Jerry) Shao, Member, Technical Staff

April 13, 2016

Page 2: Running Spark In Production

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Who Are We?

Vinay Shukla
– Product Management: Spark for 2+ years, Hadoop for 3+ years
– Recovering Programmer
– Blog at www.vinayshukla.com, Twitter: @neomythos
– Addicted to Yoga, Hiking, & Coffee

Saisai (Jerry) Shao
– Member of Technical Staff
– Apache Spark Contributor

Page 3: Running Spark In Production

3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark Security: Secure Production Deployments

Page 4: Running Spark In Production

4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Context: Spark Deployment Modes

• Spark on YARN
– Spark driver (SparkContext) runs in the YARN AM (yarn-cluster)
– Spark driver (SparkContext) runs locally in the client (yarn-client)

• The Spark Shell & Spark Thrift Server run in yarn-client mode only

[Diagram: YARN-Cluster vs. YARN-Client — in yarn-cluster mode the Spark Driver runs inside the App Master in the cluster; in yarn-client mode the Spark Driver runs in the Client, with the App Master and Executors in the cluster.]
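As a concrete illustration (a minimal sketch, not from the deck), the yarn-client case is just a SparkContext created in the local JVM; the app name below is an illustrative assumption:

import org.apache.spark.{SparkConf, SparkContext}

// yarn-client: the driver runs in this JVM (this is what the Spark Shell does);
// only the AM and the executors run inside the cluster.
val conf = new SparkConf()
  .setAppName("DeployModeSketch") // hypothetical app name
  .setMaster("yarn-client")
val sc = new SparkContext(conf)

// yarn-cluster: the same application is instead launched with
// spark-submit --master yarn-cluster, and the driver runs inside the YARN AM.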

Page 5: Running Spark In Production

5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark on YARN

[Diagram: Spark on YARN — the user invokes Spark Submit, which contacts the YARN RM; the RM has a Node Manager launch the Spark AM, which in turn launches Executors; Executors read/write HDFS. (Steps 1-4 in the original figure, all inside the Hadoop cluster.)]

Page 6: Running Spark In Production

6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark – Security – Four Pillars

Authentication, Authorization, Audit, Encryption

Spark leverages Kerberos on YARN. This presumes the network is secured, which is the first step of security.

Page 7: Running Spark In Production

7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Kerberos Primer

[Diagram: Client, KDC, NameNode (NN), DataNode (DN), and the client's Kerberos ticket cache]

1. kinit – log in and get a Ticket Granting Ticket (TGT)
2. Client stores the TGT in its ticket cache
3. Get a NameNode service ticket (NN-ST)
4. Client stores the NN-ST in its ticket cache
5. Read/write a file given the NN-ST and file name; the NN returns block locations, block IDs, and Block Access Tokens if access is permitted
6. Read/write a block given a Block Access Token and block ID
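For long-running services that cannot rely on an interactive kinit, the same login can be done programmatically through Hadoop's UserGroupInformation API. A minimal sketch, assuming a hypothetical principal and keytab path:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object KerberosLogin {
  def main(args: Array[String]): Unit = {
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)
    // Hypothetical principal and keytab; equivalent to kinit -kt <keytab> <principal>
    UserGroupInformation.loginUserFromKeytab(
      "johndoe@EXAMPLE.COM",
      "/etc/security/keytabs/johndoe.keytab")
    println(s"Logged in as ${UserGroupInformation.getLoginUser}")
  }
}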

Page 8: Running Spark In Production

8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Kerberos authentication within Spark

[Diagram: Kerberos authentication within Spark (steps 1-7 in the original figure) — John Doe runs kinit against the KDC and gets a service ticket (ST) for Spark; he uses the Spark ST to submit the Spark job; the Spark AM gets a NameNode (NN) service ticket; YARN launches Spark Executors using John Doe's identity; executors read from HDFS using John Doe's delegation token. All of this runs inside the Hadoop cluster.]

Page 9: Running Spark In Production

9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark + X (Source of Data)

[Diagram: the same flow with a generic data source X in place of HDFS (steps 1-7 in the original figure) — John Doe runs kinit against the KDC and gets a service ticket (ST) for Spark; he uses the Spark ST to submit the Spark job; Spark gets an ST for X; YARN launches Spark Executors using John Doe's identity; executors read from X using John Doe's delegation token.]

Page 10: Running Spark In Production

10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark – Kerberos – Example

kinit -kt /etc/security/keytabs/johndoe.keytab [email protected]

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  lib/spark-examples*.jar 10

Page 11: Running Spark In Production

11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark – Authorization

[Diagram: John Doe authenticates against the KDC and gets a service ticket for Spark; he uses the Spark ST to submit a Spark job to the YARN cluster; Spark gets a NameNode (NN) service ticket and executors read from HDFS. Ranger sits in the path to answer "Can John launch this job?" and "Can John read this file?"]

Page 12: Running Spark In Production

12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark – Communication Channels – YARN-CLUSTER Mode

[Diagram: Spark Submit, the YARN RM, and a NM hosting the AM/Driver and Executors Ex 1 … Ex N, together with the Shuffle Service and a Data Source. Five channels are shown: Control/RPC, Shuffle Data, Shuffle Block Transfer, Read/Write Data (to/from the Data Source), and FS – Broadcast, File Download.]

Page 13: Running Spark In Production

13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark – Channel Encryption – Example

– Control/RPC: spark.authenticate = true (leverages YARN to distribute keys)
– Shuffle Data: spark.authenticate.enableSaslEncryption = true
– Shuffle Block Transfer (NM → Executor): leverages YARN-based SSL
– Read/Write Data: depends on the data source; for HDFS, RPC encryption (RC4 | 3DES) or SSL for WebHDFS
– FS – Broadcast, File Download: spark.ssl.enabled = true
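Put together, a minimal sketch of these switches as they might be set in application code (they can equally go in spark-defaults.conf); the app name is an illustrative assumption:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("SecureSparkApp")                            // hypothetical app name
  .set("spark.authenticate", "true")                       // authenticate internal RPC
  .set("spark.authenticate.enableSaslEncryption", "true")  // SASL-encrypt shuffle data
  .set("spark.ssl.enabled", "true")                        // SSL for broadcast / file download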

Page 14: Running Spark In Production

14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Gotchas with Spark Security

Client → Spark Thrift Server → Spark Executors – no identity propagation on the 2nd hop
– Forces STS to run as the Hive user to read all data
– Reduces security
– Workaround: use SparkSQL via the shell or programmatic API
– https://issues.apache.org/jira/browse/SPARK-5159

SparkSQL – granular security unavailable
– Ranger integration will solve this problem (TP coming)
– Brings row/column-level security and masking features to SparkSQL

Spark + HBase with Kerberos
– Issue fixed in Spark 1.4 (SPARK-6918)

Spark Streaming + Kafka + Kerberos + SSL
– Issues fixed in HDP 2.4.x

Spark jobs running > 72 hours
– Kerberos token not renewed before Spark 1.5

Page 15: Running Spark In Production

15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark Performance

Page 16: Running Spark In Production

16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

#1 Big or Small Executor?

spark-submit --master yarn \
  --deploy-mode client \
  --num-executors ? \
  --executor-cores ? \
  --executor-memory ? \
  --class MySimpleApp \
  mySimpleApp.jar \
  arg1 arg2

Page 17: Running Spark In Production

17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Show the Details

Cluster resource: 72 cores + 288 GB memory

[Diagram: 3 nodes, each with 24 cores and 96 GB memory]

Page 18: Running Spark In Production

18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Benchmark with Different Combinations

Cores per E#   Memory per E (GB)   E per Node
1              6                   18
2              12                  9
3              18                  6
6              36                  3
9              54                  2
18             108                 1

#E stands for Executor. Benchmark setup: 18 cores, 108 GB memory per node.

[Chart: benchmark runtimes for each combination; the lower the better.] Except for the smallest and biggest executors, performance is relatively close.

Page 19: Running Spark In Production

19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Big or Small Executor?

Avoid too-small or too-big executors.
– A small executor decreases CPU and memory efficiency.
– A big executor introduces heavy GC overhead.

Usually 3–6 cores and 10–40 GB of memory per executor is a preferable choice; a sizing sketch follows below.
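A minimal sketch of that arithmetic (the node resources below are the benchmark's 18 cores and 108 GB; the 6-core choice is one point inside the recommended range), e.g. in the Scala REPL:

// Pick an executor size within the 3-6 core guideline, then derive the layout.
val coresPerNode = 18
val memPerNodeGb = 108
val coresPerExecutor = 6
val executorsPerNode = coresPerNode / coresPerExecutor   // 3 executors per node
val memPerExecutorGb = memPerNodeGb / executorsPerNode   // 36 GB each (before YARN overhead)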

Page 20: Running Spark In Production

20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Any Other Things?

– Executor memory != container memory
– Container memory = executor memory + overhead memory (10% of executor memory); see the sketch after this list
– Leave some resources for other services and the system

Enable CPU scheduling if you want to constrain CPU usage#

#CPU scheduling - http://hortonworks.com/blog/managing-cpu-resources-in-your-hadoop-yarn-clusters/
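A minimal sketch of the container-memory formula above, assuming Spark 1.x's default overhead of max(10% of executor memory, 384 MB):

val executorMemGb = 36.0
val overheadGb = math.max(executorMemGb * 0.10, 0.384)  // spark.yarn.executor.memoryOverhead default
val containerGb = executorMemGb + overheadGb            // ~39.6 GB requested from YARN per executor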

Page 21: Running Spark In Production

21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

#2 Task Parallelism?

What is the preferable number of partitions (task parallelism)? The operations below accept it as numPartitions; see the sketch after these examples.

rdd.groupByKey(numPartitions)

rdd.reduceByKey(func, [numPartitions])

rdd.sortByKey([ascending], [numPartitions])
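For example, a minimal word-count sketch that sets the reducer-side parallelism explicitly (the input/output paths are hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ParallelismSketch"))
    val counts = sc.textFile("hdfs:///data/words")      // hypothetical input
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _, 200)                          // 200 reduce-side partitions
    counts.saveAsTextFile("hdfs:///out/wordcount")      // hypothetical output
    sc.stop()
  }
}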

Page 22: Running Spark In Production

22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Task Parallelism?

[Diagram: three map tasks feeding two reduce tasks (reduce1, reduce2)]

Page 23: Running Spark In Production

23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Task Parallelism?

Task parallelism == number of reducers

Increasing the task parallelism reduces the amount of data each task has to process.

[Diagram: the same three map tasks now feeding three reduce tasks (reduce1, reduce2, reduce3)]

Page 24: Running Spark In Production

24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Big or Small Tasks?

Small tasks
– Fine-grained partitioning can mitigate data skew to some extent.
– With relatively little data to process per task, disk spills are alleviated.
– The task count goes up, which somewhat increases scheduling overhead.

Big tasks
– A too-big shuffle block (> 2 GB) will crash the Spark application.
– Easily introduce data skew.
– Frequent GC problems.

Summary (see the sketch after this list)
– Make full use of cluster parallelism.
– Choose a preferable task size according to the size of the shuffle input (~128 MB).
– Prefer small tasks over big tasks (increase the task parallelism).
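A minimal sketch of that ~128 MB rule of thumb, assuming a hypothetical 100 GB shuffle input, e.g. in the Scala REPL:

val shuffleInputMb = 100 * 1024                    // hypothetical 100 GB shuffle input
val targetTaskMb = 128                             // ~128 MB per task
val numPartitions = shuffleInputMb / targetTaskMb  // = 800, the value to pass as numPartitions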

Page 25: Running Spark In Production

25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Benchmark With Different Task Parallelism

[Chart: benchmark runtimes at different task-parallelism settings; the lower the better.]

Page 26: Running Spark In Production

26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Conclusion

Different workloads have different processing patterns; knowing the pattern points tuning in the right direction.

No single rule fits all situations; tuning a workload based on its monitored behavior saves effort and gets better results.

The more you know about the framework, the better your tuning results will be.

Page 27: Running Spark In Production

27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Thank You

Vinay Shukla, @neomythos
Saisai (Jerry) Shao