institute of computing technology understanding big data workloads on modern processors using...

60
INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan http://prof.ict.ac.cn/BigDataBench Professor, ICT, Chinese Academy of Sciences and University of Chinese Academy of Sciences HPBDC 2015 Ohio, USA

Upload: philip-webb

Post on 02-Jan-2016

219 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

INS

TIT

UTE O

F C

OM

PU

TIN

G

TEC

HN

OLO

GY

Understanding Big Data Workloads on Modern Processors using BigDataBench

Jianfeng Zhan

http://prof.ict.ac.cn/BigDataBench

Professor, ICT, Chinese Academy of Sciences and University of Chinese Academy of Sciences

HPBDC 2015Ohio, USA

Page 2: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Outline

BigDataBench Overview

Workload characterization

Multi-tenancy version

Processors evaluation

Page 3: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

What is BigDataBench? An open source big data benchmarking project

• http://prof.ict.ac.cn/BigDataBench

• Search Google using “BigDataBench”

Page 4: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

BigDataBench Details

Methodolo

gy

• Five application domains• Benchmark specification for each domain

Implementat

ion

• 14 Real world data sets & 3 kinds of big data generators• 33 Big data workloads with diverse implementation

Specific-

purpose

Version

• BigDataBench subset version

Page 5: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Five Application Domains

http://www.alexa.com/topsites/global;0

40%

25%

15%

5%15%

Search Engine Social Network Electronic Commerce Media Streaming

Others

Top 20 websites

Taking up 80% of internet services

according to page views and daily visitors

http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-DKB-v2.pdf

600+ new

VIDEOS on YouTube every minute

6600+ new

PHOTOS on FLICKR every minute

100’s

VIDEO feeds from surveillance cameras

80% data growth

are IMAGES, VIDEOS, documents, …

13000+ hours MUSIC streaming on PANDORA every minute

370000+ minutes VOICE calls on Skype every minute

19981999

20002001

20022003

20042005

20062007

20082009

20102011

20122013

2014

2015.30

20406080

100120140160180200

0

20

40

60

80

100

120

140

160

180

DDBJ/EMBL/GenBank database GrowthNucleotides Entries

Nuc

leoti

des (

billi

on)

Entr

ies (

mill

ion)

http://www.ddbj.nig.ac.jp/breakdown_stats/dbgrowth-e.html#dbgrowth-graph

Internet ServiceSearch engine, Social network, E-commerce

Multimedia

Bioinformatics

Page 6: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Benchmark specification

Guidelines for BigDataBench implementation Data model workloads

Model typical application scenarios

Describe data model

Extract important workloads

Page 7: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

BigDataBench Details

Methodolo

gy

• Five application domains• Benchmark specification for each domain

Implementat

ion

• 14 Real world data sets & 3 kinds of big data generators• 33 Big data workloads with diverse implementation

Specific-

purpose

Version

• BigDataBench subset version

Page 8: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

MPI

Shark

ImpalaNoSql

Software Stacks

BigDataBench Summary

33 Workloads

Search Engine

14 Real-world Data Sets

BDGS(Big Data Generator Suite) for scalable data

DataMPI

Hadoop RDMA

Facebook Social Network

ImageNet

Wikipedia Entries

E-commerce Transaction

English broadcasting audio

Amazon Movie Reviews

ProfSearch Resumes

DVD Input Streams

Google Web Graph

SoGou Data

Image scene

MNIST

Genome sequence data Assembly of the human genome

Social

NetworkE-commerce

Multimedia Bioinformatics

Page 9: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Big Data Generator Tool

3 kinds of big data generators Preserving original characteristics of real data Text/Graph/Table generator

Page 10: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

BigDataBench Details

Methodolo

gy

• Five application domains• Benchmark specification for each domain

Implementat

ion

• 14 Real world data sets & 3 kinds of big data generators• 33 Big data workloads with diverse implementation

Specific-

purpose

Version

• BigDataBench subset version

Page 11: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

BigDataBench Subset

Motivation It is expensive to run all the benchmarks for

system and architecture researches• multiplied by different implementations • BigDataBench 3.0 provides about 77 workloads

Identify workload characteristics from a specific perspective

Eliminate the correlation data

Dimension reduction (PCA)

Clustering(K-Means) Subset

Page 12: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Why BigDataBench?Specification

Application domains

Workload Types

Workloads

Scalable data sets (from real data)

Multipleimplementations

Multitenancy

Subsets

Simulatorversion

BigDataBench Y Five Four[1] 33 8 Y Y Y Y

BigBench Y One Three 10 3 N N N N

Cloud-Suite N N/A Two 8 3 N N N Y

HiBench N N/A Two 10 3 N N N N

CALDA Y N/A One 5 N/A Y N N NYCSB Y N/A One 6 N/A Y N N NLinkBench Y N/A One 10 N/A Y N N N

AMPBenchmarks

Y N/A One 4 N/A Y N N N

[1] The four workloads types include Offline Analytics, Cloud OLTP, Interactive Analytics and Online Service

Page 13: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

BigDataBench Users http://prof.ict.ac.cn/BigDataBench/users/ Industry users

Accenture, BROADCOM, SAMSUMG, Huawei, IBM China’s first industry-standard big data benchmark

suite http://prof.ict.ac.cn/BigDataBench/industry-standard-

benchmarks/ About 20 academia groups published papers using

BigDataBench

Page 14: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

BigDataBench Publications BigDataBench: a Big Data Benchmark Suite from Internet Services. 20th IEEE

International Symposium On High Performance Computer Architecture (HPCA-2014).

Characterizing data analysis workloads in data centers. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013) ( Best paper award )

BigOP: generating comprehensive big data workloads as a benchmarking framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014)

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The Fourth workshop on big data benchmarking (WBDB 2014)

Identifying Dwarfs Workloads in Big Data Analytics arXiv preprint arXiv:1505.06872

BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads arXiv preprint arXiv:1504.02205

Page 15: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Outline

BigDataBench Overview

Workload characterization

Multi-tenancy version

Processors evaluation

Page 16: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

H-Gre

p(7)

S-Pa

geRa

nk(1

)

H-Bay

es(1

)

M-K

mea

ns

H-Rea

d(10

)

I-Selec

tQue

ry(9

)

S-Pr

ojec

t(4)

S-Gre

p(1)

H-TPC

-DS-

quer

y3(9

)

S-TP

C-DS-

quer

y10(

4)

S-So

rt(1

)

M-S

ort

0.0013

0.013

0.13

1.3

13

130

Wei

ghte

d I/O

tim

e ra

tio

System Behaviors

Diversified system level behaviors:

The Average Weighted disk I/O time ratio

H-Gre

p(7)

S-Pa

geRa

nk(1

)

H-Bay

es(1

)

M-K

mea

ns

H-Rea

d(10

)

I-Selec

tQue

ry(9

)

S-Pr

ojec

t(4)

S-Gre

p(1)

H-TPC

-DS-

quer

y3(9

)

S-TP

C-DS-

quer

y10(

4)

S-So

rt(1

)

M-S

ort

0%

20%

40%

60%

80%

100%CPU utilization I/O wait ratio

Per

cent

age

Page 17: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

H-Gre

p(7)

S-Pa

geRa

nk(1

)

H-Bay

es(1

)

M-K

mea

ns

H-Rea

d(10

)

I-Selec

tQue

ry(9

)

S-Pr

ojec

t(4)

S-Gre

p(1)

H-TPC

-DS-

quer

y3(9

)

S-TP

C-DS-

quer

y10(

4)

S-So

rt(1

)

M-S

ort

0.0013

0.013

0.13

1.3

13

130

Wei

ghte

d I/O

tim

e ra

tio

System Behaviors

Diversified system level behaviors:

High CPU utilization & less I/O time

The Average Weighted disk I/O time ratio

H-Gre

p(7)

S-Pa

geRa

nk(1

)

H-Bay

es(1

)

M-K

mea

ns

H-Rea

d(10

)

I-Selec

tQue

ry(9

)

S-Pr

ojec

t(4)

S-Gre

p(1)

H-TPC

-DS-

quer

y3(9

)

S-TP

C-DS-

quer

y10(

4)

S-So

rt(1

)

M-S

ort

0%

20%

40%

60%

80%

100%CPU utilization I/O wait ratio

Per

cent

age

Page 18: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

H-Gre

p(7)

S-Pa

geRa

nk(1

)

H-Bay

es(1

)

M-K

mea

ns

H-Rea

d(10

)

I-Selec

tQue

ry(9

)

S-Pr

ojec

t(4)

S-Gre

p(1)

H-TPC

-DS-

quer

y3(9

)

S-TP

C-DS-

quer

y10(

4)

S-So

rt(1

)

M-S

ort

0.0013

0.013

0.13

1.3

13

130

Wei

ghte

d I/O

tim

e ra

tio

System Behaviors

Diversified system level behaviors:

High CPU utilization & less I/O time

Low CPU utilization relatively and lots of I/O time

The Average Weighted disk I/O time ratio

H-Gre

p(7)

S-Pa

geRa

nk(1

)

H-Bay

es(1

)

M-K

mea

ns

H-Rea

d(10

)

I-Selec

tQue

ry(9

)

S-Pr

ojec

t(4)

S-Gre

p(1)

H-TPC

-DS-

quer

y3(9

)

S-TP

C-DS-

quer

y10(

4)

S-So

rt(1

)

M-S

ort

0%

20%

40%

60%

80%

100%CPU utilization I/O wait ratio

Per

cent

age

Page 19: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

System Behaviors

Diversified system level behaviors:

High CPU utilization & less I/O time

Low CPU utilization relatively and lots of I/O time

Medium CPU utilization and I/O time

The average Weighted disk I/O time ratios

H-Gre

p(7)

S-Pa

geRa

nk(1

)

H-Bay

es(1

)

M-K

mea

ns

H-Rea

d(10

)

I-Selec

tQue

ry(9

)

S-Pr

ojec

t(4)

S-Gre

p(1)

H-TPC

-DS-

quer

y3(9

)

S-TP

C-DS-

quer

y10(

4)

S-So

rt(1

)

M-S

ort

0%

20%

40%

60%

80%

100%CPU utilization I/O wait ratio

Per

cent

age

H-Gre

p(7)

S-Pa

geRa

nk(1

)

H-Bay

es(1

)

M-K

mea

ns

H-Rea

d(10

)

I-Selec

tQue

ry(9

)

S-Pr

ojec

t(4)

S-Gre

p(1)

H-TPC

-DS-

quer

y3(9

)

S-TP

C-DS-

quer

y10(

4)

S-So

rt(1

)

M-S

ort

0.0013

0.013

0.13

1.3

13

130

Wei

ghte

d I/O

tim

e ra

tio

Page 20: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Workloads Classification

From perspective of system behaviors: System behaviors vary across different workloads Workloads are divided into 3 categories:

Type Workloads

CPU Intensive H-Grep, S-Kmeans, S-PageRank, H-WordCount, H-Bayes, M-Bayes, M-Kmeans and M-PageRank

I/O Intensive H-Read, H-Difference, I-SelectQuery, S-WordCount, S-Project, S-OrderBy, M-Grep and S-Grep

Hybrid H-TPC-DS-query3, I-OrderBy, S-TPC-DS-query10, S-TPC-DS-query8, S-Sort, M-WordCount and M-Sort

Page 21: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Off-Chip Bandwidth

Most of CPU-intensive workloads have higher off-chip bandwidth (3GB/s), Maximum is 6.2GB/s; Other workloads have lower off-chip bandwidth (0.6GB/s).

MPI based workloads need low memory bandwidth.

Page 22: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

H-Gre

p(7)

S-Pag

eRan

k(1)

H-Bay

es(1

)

M-K

mea

ns

H-Rea

d(10)

I-Selec

tQuer

y(9)

S-Pro

ject

(4)

S-Gre

p(1)

H-TPC

-DS-q

uery3

(9)

S-TPC

-DS-q

uery1

0(4)

S-Sor

t(1)

M-S

ort

TPC-C

Avg_H

PCC

AVG_SPE

Cfp

0

0.5

1

1.5

2

IPC

IPC of BigDataBench vs. other benchmarks

The average IPC of the big data workloads is larger than that of CloudSuite, SPECFP and SPECINT, similar with PARSEC and slightly lower than HPCC

The avrerage IPC of BigDataBench is 1.3 times of that of CloudSuite

Some workloads have high IPC (M_Kmeans, S-TPC-DS-Query8)

Page 23: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Instructions Mix of BigDataBench vs. other benchmarks

Big data workloads are data movement dominated computing with more branch operations

92% percentage in terms of instruction mix (Load + Store + Branch + data movements of INT)

Page 24: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Pipeline Stalls The service workloads have more RAT (Register Allocation Table) stalls The data analysis workloads have more RS (Reservation Station) and

ROB (ReOrder Buffer) full stalls Notable front end stalls (i.e., instruction fetch stall) !

Data analysis

Service

Page 25: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Cache Behaviors of BigDataBench

L1I MPKI Larger than traditional benchmarks, but lower than that of CloudSuite (12 vs.

31) Different among big data workloads

CPU-intensive(8), I/O intensive(22), and hybrid workloads(9) One order of magnitude differences among diverse implementations

M_WordCount is 2, while H_WordCount is 17

CPU-intensive I/O-intensive hybrid

Page 26: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Cache Behaviors

L2 Cache : The IO-intensive workloads undergo more L2 MPKI

L3 Cache: The average L3 MPKI of the big data workloads is lower

than all of the other workloads The underlying software stacks impact data

locality MPI workloads have better data locality and less cache

misses

Page 27: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

TLB Behaviors

ITLB IO-intensive workloads undergo more ITLB MPKI.

DTLB CPU-intensive workloads have more DTLB MPKI.

CPU-intensive I/O-intensive hybrid

0.01

0.1

1

10H

-Gre

p(7

)

S-K

means(1

)

S-P

ageR

ank(1

)

H-W

ord

Count(1)

H-B

ayes(1

)

M-B

ayes

M-K

means

M-P

ageR

ank

H-R

ead(1

0)

H-D

iffere

nce(9

)

I-S

ele

ctQ

uery

(9)

S-W

ord

Count(8)

S-P

roje

ct(4)

S-O

rderB

y(3

)

S-G

rep(1

)

M-G

rep

H-T

PC

-DS

-query

3(9

)

I-O

rderB

y(7

)

S-T

PC

-DS

-query

10(4

)

S-T

PC

-DS

-query

8(1

)

S-S

ort(1

)

M-W

ord

Count

M-S

ort

AV

G_S

_B

igD

ata

TP

C-C

AV

G_C

loudS

uite

Avg_H

PC

C

Avg_P

AR

SE

C

AV

G_S

PE

Cfp

AV

G_S

PE

Cin

t

DTLB MPKI ITLB MPKI

TL

B m

isse

s (

MP

KI)

Page 28: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Our observations from BigDataBench Unique characteristics

Data movement dominated computing with more branch operations• 92% percentage in terms of instruction mix

Notable pipeline frontend stalls

Different behaviors among Big Data workloads Disparity of IPCs and memory access behaviors

• CloudSuite is a subclass of Big Data

Software stacks impacts The L1I cache miss rates have one order of magnitude differences among

diverse implementations with different software stacks.

Page 29: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Correlation Analysis

Compute the correlation coefficients of CPI with other micro-architecture level metrics. Pearson’s correlation coefficient:

Absolute value (from 0 to 1) shows the dependency:• The bigger the absolute value, the stronger the correlation.

Page 30: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Top five coefficients

Naive Bayes Grep WordCount Kmeans FKmeans PageRank Sort Hive IBCF HMM SVM

Page 31: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Insights Frontend stall does not have high correlation

coefficient value for most of big data analytics workloads Frontend stall is not the factor that affects the CPI

performance most. L2 cache misses and TLB misses have high

correlation coefficient values. The long latency memory accesses (access L3 cache or

memory) affect the CPI performance most and should be the optimization point with highest priority.

Page 32: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Outline

BigDataBench Overview

Workload characterization

Multi-tenancy version

Processors evaluation

Page 33: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Cloud Data Centers

Two class of popular workloads Long-running services

• Search engines, E-commerce sites Short-term data analytic jobs

• Hadoop MapReduce, Spark jobs

Page 34: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Problem

Existing benchmarks focus on specific types of workload

Scenarios are not realistic Does not match the typical data center scenario

that mixes different percentages of tenants and workloads shareing the same computing infrastructure

Page 35: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Purpose of BigDataBench-MT

Developing realistic benchmarks to reflect such practical scenarios of mixed workloads. Both service and data analytic workloads Dynamic scaling up and down

The tool is publicly available from http://prof.ict.ac.cn/BigDataBench/multi-tenancyversion

Page 36: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

What can you do with it?

We consider two dimensions of the benchmarking scenarios From tenants’ perspectives From workloads’ perspectives

Page 37: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

You can specify the tenants

The number of tenants Scalability Benchmark: How many tenants are able run

in parallel ? The priorities of tenants

Fairness Benchmark: How fair is the system, i.e., are the available resources equally available to all tenants? If tenants have different priorities ?

Time line How the number and priorities of tenants change

over time?

Page 38: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

You can specify the workloads

Data characteristics Data type, source Input/output data volumes, distributions

Computation semantics Source code Big data software stacks

Job arrival patterns Arrival rate Arrival sequence

Page 39: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Two major challenges Heterogeneity of real workloads

Different workload types e.g. CPU or I/O intensive workloads

Different software stacks e.g. Hadoop, Spark, MPI

Workload dynamicity hidden in real-world traces Arrival patterns

Request/Job submitting time and sequences

Job input sizes e.g. ranging from KB to ZB

Page 40: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Existing big data benchmarks

How to generate real workloads on the basis of real workload traces, is still an open question.

Benchmarks Actual workloads

Real workload traces

Mixed workloads

AMPLab benchmark, Linkbench , Bigbench , YCSB, CloudSuite Yes No No

GridMix , SWIM No Yes No

Page 41: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

System Overview

Three modules Benchmark User

Portal• A visual interface

Combiner of Workloads and Traces

• A matcher of real workloads and traces

Multi-tenant Workload Generator

• A multi-tenant workload generator

Page 42: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Key technique: Combination of real and synthetic data analytic jobs

Goal: Combining the arrival patterns extracted from real traces with real workloads.

Problem: Workload traces only contain anonymous jobs whose workload types and/or input data are unknown.

Page 43: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Solution: the first step Deriving the workload characteristics of both

real and anonymous jobs

Metric DescriptionExecution time Measured in seconds

CPU usage Total CPU time per secondMemory usage Measured in GB

CPI Cycles per instruction

MAI The number of memory accesses per instruction

TABLE. Metrics to represent workload characteristics

Page 44: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Solution: the second step

Matching both types of jobs whose workload characteristics are sufficiently similar

Page 45: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

An Example

Mining Facebook/Google

workload trace(Exact workload characteristics

information

Profiling Hadoop workloads from BigDataBench

(Collect workload characteristics information)

Workload matching

usingk-means

clustering

An example of matching Hadoop workloadsMatching result: replaying basis

Job type Input size (GB)

StartingTime (minutes)

Bayes 2 10

Sort 1 20

K-means 0.5 25

Bayes 5 30

Sort 1 40

Page 46: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

System demostration Three steps to generate a mix of search service and Hadoop MapReduce jobs Traces : 24-hour Sogou user query logs and Google cluster trace.

Step 1-Specification of tested machines and workloadsStep 2-Selection of benchmarking period and scale

Step 3-Generation of mixed workloads

Page 47: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Workloads and traces in BigDataBench-MT

Workloads Software stack Workload traceNutch Web

SearchApache Tomcat 6.0.26, Search Server ( Nutch )

Sogou (http://www.sogou.com/labs/dl/q-e.html)

Hadoop Hadoop 1.0.2 Facebook (https://github.com/SWIMProjectUCB/SWIM/wiki)

Shark Shark 0.8.0 Google data center (https://code.google.com/p/googleclusterdata/)

Multi-tenancy V1.0 releases:

Page 48: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Outline

BigDataBench Overview

Workload characterization

Multi-tenancy version

Processors evaluation

Page 49: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Core Architecture

Multi brawny core (Xeon E5645, 2.4 GHz) 6 Out-of-Order cores Dynamic Multiple Issue (supper scalar) Dynamic Overclocking Simultaneous multithreading

Many wimpy core architecture (Tile-Gx36, 1.2 GHz): 36 In-Order cores Static Multiple Issue (VLIW)

Page 50: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Experiment methodology

User real hardware instead of simulation Real power consumption measurement instead of

modeling Saturate CPU performance by:

Isolate the processor behavior• Over-provisions the disk I/O subsystem by using RAM disk

Optimize benchmarks• Tune the software stack parameters • JVM flags to performance

Page 51: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Execution time For Hadoop based sort, the performance gap is about 1.08×. For the other workloads, more than 2× gaps exist between

Xeon and Tilera.

From the perspective of execution time, the Xeon processor is better than Tilera processor all the time.

Hadoop-Sort

Hadoop-G

rep

Hadoop-B

ayes

Hadoop-K

means

Hadoop-P

ageran

k

Spark

-Sort

Spark

-Grep

Spark

-Bay

es

Spark

-Kmea

ns

Spark

-Pag

erank

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5 Xeon Tilera

Nor

mal

ized

Tim

e

Page 52: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Cycle Counts There are huge cycle count gaps between Xeon and Tilera

ranging from 5.3 to 14. Tilera need more cycles to complete the same amount of

work.

Hadoop-Sort

Hadoop-G

rep

Hadoop-B

ayes

Hadoop-K

means

Hadoop-P

ageran

k

Spark

-Sort

Spark

-Grep

Spark

-Bay

es

Spark

-Kmea

ns

Spark

-Pag

erank

0

2

4

6

8

10

12

14

16 Xeon Tilera

Nor

mal

ized

Cyc

les

Page 53: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Pipeline Efficiency• The theoretical IPC:

Xeon: 4 instructions per cycle

• Pipeline efficiency:

• OoO pipelines are more efficient than in-order ones

Tilera: 1 instruction bundle per cycle

Hadoop-Sort

Hadoop-G

rep

Hadoop-B

ayes

Hadoop-K

means

Hadoop-P

ageran

k

Spark

-Sort

Spark

-Grep

Spark

-Bay

es

Spark

-Kmea

ns

Spark

-Pag

erank

00.05

0.10.15

0.20.25

0.30.35

0.40.45 Tilera Xeon

Pipe

line

Effici

ency

Page 54: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Power Consumption

Tilera is power optimized. Xeon consumes more power.

Hadoop-Sort

Hadoop-G

rep

Hadoop-B

ayes

Hadoop-K

means

Hadoop-P

ageran

k

Spark

-Sort

Spark

-Grep

Spark

-Bay

es

Spark

-Kmea

ns

Spark

-Pag

erank

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8 Tilera Xeon

Nor

mal

ized

Pow

er

Page 55: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Energy Consumption Hadoop based sort consumes less energy on Tilera than on Xeon

Hadoop sort is an extremely I/O intensive workloads. Tilera consumes more energy than Xeon to complete the same amount of

work for most big data workloads The longer execution time offsets the lower power design

Hadoop-Sort

Hadoop-G

rep

Hadoop-B

ayes

Hadoop-K

means

Hadoop-P

ageran

k

Spark

-Sort

Spark

-Grep

Spark

-Bay

es

Spark

-Kmea

ns

Spark

-Pag

erank

0

0.5

1

1.5

2

2.5

3

3.5 Xeon Tilera

Nor

mal

ized

Ene

rgy

Page 56: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Total Cost of Ownership (TCO) Model[*]

Three-year depreciation cycle Hardware costs associated with individual

components CPU Memory Disk Board Power Cooling

[*] K. Lim et al. Understanding and designing new server architectures for emerging warehouse-computing environments. ISCA 2008

Page 57: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Cost model

The cost data originate from diverse sources: Different vendors Corresponding official websites

Power and cooling: An activity factor of 0.75

Page 58: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Performance per TCO

Haoop-based Sort has higher performance per TCO on the Tilera.

For other workloads, Xeon outperforms Tilera.

Hadoop-Sort

Hadoop-G

rep

Hadoop-B

ayes

Hadoop-K

means

Hadoop-P

ageran

k

Spark

-Sort

Spark

-Grep

Spark

-Bay

es

Spark

-Kmea

ns

Spark

-Pag

erank

0

0.5

1

1.5

2

2.5

3

3.5Tilera Xeon Turbo & HT enabled

Nor

mal

ized

Per

form

ance

per

TCO

Page 59: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015

Key Takeaways

Try using an open-source big data benchmark suite from http://prof.ict.ac.cn/BigDataBench

Big Data: data movement dominated computing with more branch operations 92% percentage in terms of instruction mix

Multi-tenancy version: replaying mixed workloads according to publicly available workloads traces.

Wimpy-core processors only suit a part of big data workloads.

Page 60: INSTITUTE OF COMPUTING TECHNOLOGY Understanding Big Data Workloads on Modern Processors using BigDataBench Jianfeng Zhan

BigDataBench HPBDC 2015