Running Spark in Production

TRANSCRIPT

Page 1: Running Spark In Production

Running Spark in Production

Vinay Shukla, Director, Product Management (Twitter: @neomythos)
Saisai (Jerry) Shao, Member, Technical Staff

April 13, 2016

Page 2: Running Spark In Production

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Who Are We?

Vinay Shukla
– Product Management: Spark for 2+ years, Hadoop for 3+ years
– Recovering Programmer
– Blog at www.vinayshukla.com, Twitter: @neomythos
– Addicted to Yoga, Hiking, & Coffee

Saisai (Jerry) Shao
– Member of Technical Staff
– Apache Spark Contributor

Page 3: Running Spark In Production

3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark Security: Secure Production Deployments

Page 4: Running Spark In Production

4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Context: Spark Deployment Modes

• Spark on YARN
– Spark driver (SparkContext) runs in the YARN AM (yarn-cluster)
– Spark driver (SparkContext) runs locally in the client (yarn-client)

• The Spark Shell & Spark Thrift Server run in yarn-client mode only

[Diagram: YARN-Cluster vs. YARN-Client — in yarn-cluster mode the Spark Driver runs inside the App Master in the cluster; in yarn-client mode the Spark Driver runs in the Client, with the App Master and Executors in the cluster.]
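As a concrete illustration (a minimal sketch, not from the deck), the yarn-client case is just a SparkContext created in the local JVM; the app name below is an illustrative assumption:

import org.apache.spark.{SparkConf, SparkContext}

// yarn-client: the driver runs in this JVM (this is what the Spark Shell does);
// only the AM and the executors run inside the cluster.
val conf = new SparkConf()
  .setAppName("DeployModeSketch") // hypothetical app name
  .setMaster("yarn-client")
val sc = new SparkContext(conf)

// yarn-cluster: the same application is instead launched with
// spark-submit --master yarn-cluster, and the driver runs inside the YARN AM.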

Page 5: Running Spark In Production

5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark on YARN

[Diagram: Spark on YARN — the user invokes Spark Submit, which contacts the YARN RM; the RM has a Node Manager launch the Spark AM, which in turn launches Executors; Executors read/write HDFS. (Steps 1-4 in the original figure, all inside the Hadoop cluster.)]

Page 6: Running Spark In Production

6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark – Security – Four Pillars

Authentication, Authorization, Audit, Encryption

Spark leverages Kerberos on YARN. This presumes the network is secured, which is the first step of security.

Page 7: Running Spark In Production

7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Kerberos Primer

[Diagram: Client, KDC, NameNode (NN), DataNode (DN), and the client's Kerberos ticket cache]

1. kinit – log in and get a Ticket Granting Ticket (TGT)
2. Client stores the TGT in its ticket cache
3. Get a NameNode service ticket (NN-ST)
4. Client stores the NN-ST in its ticket cache
5. Read/write a file given the NN-ST and file name; the NN returns block locations, block IDs, and Block Access Tokens if access is permitted
6. Read/write a block given a Block Access Token and block ID
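For long-running services that cannot rely on an interactive kinit, the same login can be done programmatically through Hadoop's UserGroupInformation API. A minimal sketch, assuming a hypothetical principal and keytab path:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object KerberosLogin {
  def main(args: Array[String]): Unit = {
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)
    // Hypothetical principal and keytab; equivalent to kinit -kt <keytab> <principal>
    UserGroupInformation.loginUserFromKeytab(
      "johndoe@EXAMPLE.COM",
      "/etc/security/keytabs/johndoe.keytab")
    println(s"Logged in as ${UserGroupInformation.getLoginUser}")
  }
}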

Page 8: Running Spark In Production

8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Kerberos authentication within Spark

[Diagram: Kerberos authentication within Spark (steps 1-7 in the original figure) — John Doe runs kinit against the KDC and gets a service ticket (ST) for Spark; he uses the Spark ST to submit the Spark job; the Spark AM gets a NameNode (NN) service ticket; YARN launches Spark Executors using John Doe's identity; executors read from HDFS using John Doe's delegation token. All of this runs inside the Hadoop cluster.]

Page 9: Running Spark In Production

9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark + X (Source of Data)

[Diagram: the same flow with a generic data source X in place of HDFS (steps 1-7 in the original figure) — John Doe runs kinit against the KDC and gets a service ticket (ST) for Spark; he uses the Spark ST to submit the Spark job; Spark gets an ST for X; YARN launches Spark Executors using John Doe's identity; executors read from X using John Doe's delegation token.]

Page 10: Running Spark In Production

10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark – Kerberos – Example

kinit -kt /etc/security/keytabs/johndoe.keytab [email protected]

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  lib/spark-examples*.jar 10

Page 11: Running Spark In Production

11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark – Authorization

[Diagram: John Doe authenticates against the KDC and gets a service ticket for Spark; he uses the Spark ST to submit a Spark job to the YARN cluster; Spark gets a NameNode (NN) service ticket and executors read from HDFS. Ranger sits in the path to answer "Can John launch this job?" and "Can John read this file?"]

Page 12: Running Spark In Production

12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark – Communication Channels – YARN-CLUSTER Mode

[Diagram: Spark Submit, the YARN RM, and a NM hosting the AM/Driver and Executors Ex 1 … Ex N, together with the Shuffle Service and a Data Source. Five channels are shown: Control/RPC, Shuffle Data, Shuffle Block Transfer, Read/Write Data (to/from the Data Source), and FS – Broadcast, File Download.]

Page 13: Running Spark In Production

13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark – Channel Encryption – Example

– Control/RPC: spark.authenticate = true (leverages YARN to distribute keys)
– Shuffle Data: spark.authenticate.enableSaslEncryption = true
– Shuffle Block Transfer (NM → Executor): leverages YARN-based SSL
– Read/Write Data: depends on the data source; for HDFS, RPC encryption (RC4 | 3DES) or SSL for WebHDFS
– FS – Broadcast, File Download: spark.ssl.enabled = true
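Put together, a minimal sketch of these switches as they might be set in application code (they can equally go in spark-defaults.conf); the app name is an illustrative assumption:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("SecureSparkApp")                            // hypothetical app name
  .set("spark.authenticate", "true")                       // authenticate internal RPC
  .set("spark.authenticate.enableSaslEncryption", "true")  // SASL-encrypt shuffle data
  .set("spark.ssl.enabled", "true")                        // SSL for broadcast / file download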

Page 14: Running Spark In Production

14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Gotchas with Spark Security

Client → Spark Thrift Server → Spark Executors – no identity propagation on the 2nd hop
– Forces STS to run as the Hive user to read all data
– Reduces security
– Workaround: use SparkSQL via the shell or programmatic API
– https://issues.apache.org/jira/browse/SPARK-5159

SparkSQL – granular security unavailable
– Ranger integration will solve this problem (TP coming)
– Brings row/column-level security and masking features to SparkSQL

Spark + HBase with Kerberos
– Issue fixed in Spark 1.4 (SPARK-6918)

Spark Streaming + Kafka + Kerberos + SSL
– Issues fixed in HDP 2.4.x

Spark jobs running > 72 hours
– Kerberos token not renewed before Spark 1.5

Page 15: Running Spark In Production

15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Spark Performance

Page 16: Running Spark In Production

16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

#1 Big or Small Executor?

spark-submit --master yarn \
  --deploy-mode client \
  --num-executors ? \
  --executor-cores ? \
  --executor-memory ? \
  --class MySimpleApp \
  mySimpleApp.jar \
  arg1 arg2

Page 17: Running Spark In Production

17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Show the Details

Cluster resource: 72 cores + 288 GB memory

[Diagram: 3 nodes, each with 24 cores and 96 GB memory]

Page 18: Running Spark In Production

18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Benchmark with Different Combinations

Cores per E#   Memory per E (GB)   E per Node
1              6                   18
2              12                  9
3              18                  6
6              36                  3
9              54                  2
18             108                 1

#E stands for Executor. Benchmark setup: 18 cores, 108 GB memory per node.

[Chart: benchmark runtimes for each combination; the lower the better.] Except for the smallest and biggest executors, performance is relatively close.

Page 19: Running Spark In Production

19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Big or Small Executor?

Avoid too-small or too-big executors.
– A small executor decreases CPU and memory efficiency.
– A big executor introduces heavy GC overhead.

Usually 3–6 cores and 10–40 GB of memory per executor is a preferable choice; a sizing sketch follows below.
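A minimal sketch of that arithmetic (the node resources below are the benchmark's 18 cores and 108 GB; the 6-core choice is one point inside the recommended range), e.g. in the Scala REPL:

// Pick an executor size within the 3-6 core guideline, then derive the layout.
val coresPerNode = 18
val memPerNodeGb = 108
val coresPerExecutor = 6
val executorsPerNode = coresPerNode / coresPerExecutor   // 3 executors per node
val memPerExecutorGb = memPerNodeGb / executorsPerNode   // 36 GB each (before YARN overhead)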

Page 20: Running Spark In Production

20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Any Other Things?

– Executor memory != container memory
– Container memory = executor memory + overhead memory (10% of executor memory); see the sketch after this list
– Leave some resources for other services and the system

Enable CPU scheduling if you want to constrain CPU usage#

#CPU scheduling - http://hortonworks.com/blog/managing-cpu-resources-in-your-hadoop-yarn-clusters/
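A minimal sketch of the container-memory formula above, assuming Spark 1.x's default overhead of max(10% of executor memory, 384 MB):

val executorMemGb = 36.0
val overheadGb = math.max(executorMemGb * 0.10, 0.384)  // spark.yarn.executor.memoryOverhead default
val containerGb = executorMemGb + overheadGb            // ~39.6 GB requested from YARN per executor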

Page 21: Running Spark In Production

21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

#2 Task Parallelism?

What is the preferable number of partitions (task parallelism)? The operations below accept it as numPartitions; see the sketch after these examples.

rdd.groupByKey(numPartitions)

rdd.reduceByKey(func, [numPartitions])

rdd.sortByKey([ascending], [numPartitions])
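For example, a minimal word-count sketch that sets the reducer-side parallelism explicitly (the input/output paths are hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ParallelismSketch"))
    val counts = sc.textFile("hdfs:///data/words")      // hypothetical input
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _, 200)                          // 200 reduce-side partitions
    counts.saveAsTextFile("hdfs:///out/wordcount")      // hypothetical output
    sc.stop()
  }
}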

Page 22: Running Spark In Production

22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Task Parallelism?

[Diagram: three map tasks feeding two reduce tasks (reduce1, reduce2)]

Page 23: Running Spark In Production

23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Task Parallelism?

Task parallelism == number of reducers

Increasing the task parallelism reduces the amount of data each task has to process.

[Diagram: the same three map tasks now feeding three reduce tasks (reduce1, reduce2, reduce3)]

Page 24: Running Spark In Production

24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Big or Small Tasks?

Small tasks
– Fine-grained partitioning can mitigate data skew to some extent.
– With relatively little data to process per task, disk spills are alleviated.
– The task count goes up, which somewhat increases scheduling overhead.

Big tasks
– A too-big shuffle block (> 2 GB) will crash the Spark application.
– Easily introduce data skew.
– Frequent GC problems.

Summary (see the sketch after this list)
– Make full use of cluster parallelism.
– Choose a preferable task size according to the size of the shuffle input (~128 MB).
– Prefer small tasks over big tasks (increase the task parallelism).
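A minimal sketch of that ~128 MB rule of thumb, assuming a hypothetical 100 GB shuffle input, e.g. in the Scala REPL:

val shuffleInputMb = 100 * 1024                    // hypothetical 100 GB shuffle input
val targetTaskMb = 128                             // ~128 MB per task
val numPartitions = shuffleInputMb / targetTaskMb  // = 800, the value to pass as numPartitions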

Page 25: Running Spark In Production

25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Benchmark With Different Task Parallelism

[Chart: benchmark runtimes at different task-parallelism settings; the lower the better.]

Page 26: Running Spark In Production

26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Conclusion

Different workloads have different processing patterns; knowing the pattern points tuning in the right direction.

No single rule fits all situations; tuning a workload based on its monitored behavior saves effort and gets better results.

The more you know about the framework, the better your tuning results will be.

Page 27: Running Spark In Production

27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Thank You

Vinay Shukla, @neomythos
Saisai (Jerry) Shao