
Big Data Management and Systems Design

Dr. Weikuan Yu

Associate Professor

Department of Computer Science

Florida State University

Oct 15, 2015 – CS5935 - S2

Life as a Graduate Student

· Course work
  – Degrees vs. skill sets
· Research
  – Goals vs. means
· Career development
  – Presentation and teaching skills
  – Communication and social skills
  – Internships
  – Networking

Oct 15, 2015 – CS5935 - S3

Big Data Challenge

Sources: 1: http://www.infobarrel.com/Evolution; 2: http://visual.ly/big-data-explosion?utm_source=visually_embed

Where is Data Coming From?

· 3.0 billion Internet users
· 1.35 billion Facebook users
· 550 million tweets per day
· 72 hours of video uploaded per minute
· 4.7 billion Google searches per day
· And so many others

From the dawn of civilization through 2003, the world generated 5 exabytes of data; now 5 exabytes are generated every 2 days. By 2015, the total had reached 7,910 exabytes.

Oct 15, 2015 – CS5935 - S4

Big Data Ecosystem

[Layered stack diagram, from hardware up to applications:]

· Hardware: disks, networks, processors (accelerators)
· Infrastructure: OS; VMs, containers, public and private clouds
· Storage: HDFS, HBase
· Run-Time: MapReduce/Hadoop, Spark, Storm, RAMCloud, MemCached
· Applications and Services: Mahout, Giraph, Pregel, R; Hive, Pig, Shark, Flume

The stack turns data into insight while managing complexity at each layer.

Oct 15, 2015 – CS5935 - S5

Research Strategies?

· Descriptive research
  – Examine and collect facts to document patterns
· Discovery-oriented research
  – Inductive reasoning from patterns to general discoveries
· Engineering-based research
  – Use existing techniques and theories to create a technology or tool
· Hypothesis-driven research
  – Make a hypothesis, then test it using deductive reasoning

Oct 15, 2015 – CS5935 - S6

Overview of Hadoop MapReduce

[Diagram: input splits feed MapTasks; each MapTask writes a Map Output File (MOF). ReduceTasks shuffle and merge the MOFs in memory, run the reduce phase, and write results back to the DFS. The JobTracker assigns both MapTasks and ReduceTasks.]
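To make the map/shuffle/reduce dataflow concrete, here is the canonical Hadoop WordCount job (a standard example, not taken from these slides): each MapTask emits <word, 1> pairs into its MOF, and each ReduceTask sums the counts for the words routed to it.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Each MapTask runs over one input split, one line at a time.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE); // spilled to the task's Map Output File (MOF)
      }
    }
  }

  // Each ReduceTask shuffles/merges the MOF partitions for its keys, then reduces.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      result.set(sum);
      context.write(key, result); // final output goes back to the DFS
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```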

Oct 15, 2015 – CS5935 - S7

Small Job Starvation within Hadoop

[Chart: standalone execution time (sec) and normalized execution time (slowdown) across groups of jobs; slowdowns range from 1.5× up to 52×, with small jobs starved the most.]

Oct 15, 2015 – CS5935 - S8

Hadoop Fair Scheduler

· The most widely used Hadoop scheduler

· Designed to provide fairness among concurrently running jobs (a simplified sketch of its max-min slot sharing follows the diagram below)

· Tasks occupy slots until completion or failure

[Timeline diagram: jobs J-1, J-2, and J-3 share 5 map slots (Slot-M1 to Slot-M5) and 3 reduce slots (Slot-R1 to Slot-R3) as they arrive; each reduce slot stays occupied through both shuffle and reduce until its task finishes.]
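As an illustration of the fair-sharing policy (a minimal sketch, not the actual Fair Scheduler code; the class and method names are hypothetical), the following computes an approximate max-min fair allocation of a pool of slots among jobs with different task demands:

```java
import java.util.Arrays;

// Hypothetical helper: max-min fair allocation of task slots among jobs,
// the policy Hadoop Fair Scheduler applies to map and reduce slots.
public class MaxMinFairShare {

  // demands[i] = number of runnable tasks of job i; returns slots granted per job.
  static int[] allocate(int totalSlots, int[] demands) {
    int n = demands.length;
    int[] grant = new int[n];
    int remaining = totalSlots;
    boolean progress = true;
    // Repeatedly hand each unsatisfied job an equal share of what is left;
    // jobs with small demands are capped, and the surplus is redistributed.
    while (remaining > 0 && progress) {
      progress = false;
      int unsatisfied = 0;
      for (int i = 0; i < n; i++) if (grant[i] < demands[i]) unsatisfied++;
      if (unsatisfied == 0) break;
      int share = Math.max(1, remaining / unsatisfied);
      for (int i = 0; i < n && remaining > 0; i++) {
        int extra = Math.min(share, Math.min(demands[i] - grant[i], remaining));
        if (extra > 0) { grant[i] += extra; remaining -= extra; progress = true; }
      }
    }
    return grant;
  }

  public static void main(String[] args) {
    // 5 map slots shared by three jobs demanding 10, 2, and 4 tasks.
    System.out.println(Arrays.toString(allocate(5, new int[]{10, 2, 4})));
  }
}
```

Note the asymmetry the slides point out: this policy reassigns map slots frequently because map tasks are short, but a reduce slot granted this way is then held until the reduce task ends.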

Oct 15, 2015 – CS5935 - S9

Objective: How to Achieve both Efficiency and Fairness?

· How to correct the non-preemptive nature of reduce tasks for flexible and dynamic allocation of reduce slots?
  – Existing schedulers are not aware of this behavior: once a reduce task is launched, it stays with its reduce slot until the end.

· How to better schedule two different types of tasks?
  – Hadoop schedules both map and reduce tasks with a similar max-min fair-sharing policy, without paying attention to their relationship within a job.
  – Map and reduce slots need to be shared dynamically and proportionally.

Oct 15, 2015 – CS5935 - S10

Preemptive ReduceTasks

· Different from the Linux command "kill -STOP $PID"

· A lightweight, work-conserving preemption mechanism
  – Provides any-time preemption with negligible performance impact
  – Allows a reduce task to resume from where it was preempted
  – Preserves previous computation and I/O

Oct 15, 2015 – CS5935 - S11

Preemption During Shuffle Phase

· Only merge the in-memory segments, keeping the on-disk segments untouched

[Diagram: before preemption, ReduceTask R1 holds fetched segments in its heap on the TaskTracker; on preemption, the in-memory segments are merged into a single on-disk segment recorded in an index; after resume, R1 retrieves the merged segment and continues shuffling.]
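A minimal sketch of this idea (all names are hypothetical; this is not the authors' implementation): on preemption, only the in-memory segments are merged into one on-disk segment and indexed, so the fetched data survives and memory is released.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of work-conserving preemption during the shuffle phase.
class ShufflePreemption {
  static class Segment { byte[] data; }          // one fetched map output

  List<Segment> inMemorySegments = new ArrayList<>();
  List<String>  onDiskSegments   = new ArrayList<>(); // paths, left untouched

  // On preemption: merge only the in-memory segments into a single on-disk
  // segment and record it in the index, so no fetched data is lost.
  void preempt() {
    String merged = mergeToDisk(inMemorySegments); // one sequential write
    onDiskSegments.add(merged);                    // index entry for resume
    inMemorySegments.clear();                      // memory can now be released
  }

  // On resume: reload the index and continue shuffling where the task left
  // off; the merged spill is simply one more on-disk input to the final merge.
  void resume() {
    // re-open onDiskSegments and fetch the remaining map outputs
  }

  private String mergeToDisk(List<Segment> segs) {
    // merge-sort segs and write them as one spill file; return its path
    return "/local/spill-" + System.nanoTime();
  }
}
```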

Oct 15, 2015 – CS5935 - S12

Preemption During Reduce Phase

· Preemption occurs at the boundary of intermediate <key, value> pairs.

· Records the current offset of each segment and the minimum priority queue (MPQ).

[Diagram: before preemption, ReduceTask R1 on the TaskTracker merges segments through an MPQ; on preemption, the per-segment offsets and the MPQ are flushed to the DFS as an index; after resume, R1 retrieves the index and continues from the recorded offsets.]
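Again as a hedged sketch (hypothetical names, not the actual code): the state that must survive a reduce-phase preemption is small — the per-segment read offsets plus the contents of the MPQ, captured at a <key, value> boundary and flushed to the DFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch: checkpointing a ReduceTask in the reduce phase at a
// <key, value> boundary so it can resume without redoing earlier work.
class ReducePhaseCheckpoint {
  // One cursor per merged segment feeding the reduce function.
  static class SegmentCursor { String path; long offset; }

  // State that must survive preemption: per-segment offsets plus the minimum
  // priority queue (MPQ) that drives the multi-way merge of segments.
  static class Checkpoint {
    List<SegmentCursor> cursors = new ArrayList<>();
    List<String> mpqSnapshot = new ArrayList<>(); // contents of the MPQ
  }

  Checkpoint preempt(List<SegmentCursor> cursors, PriorityQueue<String> mpq) {
    Checkpoint cp = new Checkpoint();
    cp.cursors.addAll(cursors);      // current read offset of each segment
    cp.mpqSnapshot.addAll(mpq);      // pending keys in the merge
    flushToDFS(cp);                  // persist so the task can resume anywhere
    return cp;
  }

  void resume(Checkpoint cp, PriorityQueue<String> mpq) {
    mpq.clear();
    mpq.addAll(cp.mpqSnapshot);      // rebuild the merge queue
    // then seek each segment to its recorded offset and continue reducing
  }

  private void flushToDFS(Checkpoint cp) { /* write cp as a small index file */ }
}
```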

Oct 15, 2015 – CS5935 - S13

Evaluation of Preemptive ReduceTask

[Chart: job execution time (sec) when a ReduceTask is preempted at completion ratios from 10% to 90%, comparing work-conserving preemption, killing-based preemption, and no preemption (baseline). Work-conserving preemption shows negligible overhead relative to the baseline.]

Oct 15, 2015 – CS5935 - S14

Fast Completion Scheduler

· Strategy
  – Find a reduce task to preempt and select another to launch, balancing the utilization of reduce slots
  – Decisions to make:
    · Which reduce task to preempt, and how many times?
    · Which one to launch, and on which slot/node?
    · How to avoid starvation and achieve locality?

· New progress metrics of a job:
  – Remaining shuffle time
  – Remaining data (<k,v> pairs) to be reduced

Oct 15, 2015 – CS5935 - S15

FCS Algorithms

· Preemption algorithm
  – Select a reduce task from a job with a longer remaining time and more remaining data
  – Task slackness: the number of times a task has been preempted
  – Avoid starvation: do not preempt a reduce task that already has a large slackness

· Launching algorithm
  – Select another reduce task from the job with the least remaining time and remaining data
  – Delay a reduce task to maximize data locality
  – Avoid aggressive delay: set a threshold based on the cluster size

· Both selection rules are sketched below.
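A compact sketch of these two decisions (class and field names are hypothetical, and the two progress metrics are combined into one score only for brevity; the real scheduler also handles locality delay and slot placement):

```java
import java.util.Comparator;
import java.util.List;

// Sketch of the two FCS decisions described on the slide.
class FastCompletionScheduler {
  static class ReduceTaskInfo {
    double remainingShuffleTime;  // progress metric 1
    long   remainingData;         // progress metric 2: <k,v> pairs left
    int    timesPreempted;        // "slackness"
  }

  static final int MAX_SLACKNESS = 3; // starvation guard (illustrative value)

  // Preempt: pick the running task with the most remaining work, but never
  // one that has already been preempted too many times.
  ReduceTaskInfo pickVictim(List<ReduceTaskInfo> running) {
    return running.stream()
        .filter(t -> t.timesPreempted < MAX_SLACKNESS)
        .max(Comparator.comparingDouble(
            (ReduceTaskInfo t) -> t.remainingShuffleTime + t.remainingData))
        .orElse(null);
  }

  // Launch: pick the pending task with the least remaining work, so small
  // jobs finish quickly and vacate the slot soon.
  ReduceTaskInfo pickToLaunch(List<ReduceTaskInfo> pending) {
    return pending.stream()
        .min(Comparator.comparingDouble(
            (ReduceTaskInfo t) -> t.remainingShuffleTime + t.remainingData))
        .orElse(null);
  }
}
```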

Oct 15, 2015 – CS5935 - S16

Results for Map-heavy Workload

[Chart: average execution time (sec, log scale) of FCS vs. HFS across 10 groups of jobs, with per-group speedup annotations of 1.9, 2.4, 1.9, 2.3, 1.6, 1.9, 1.1, 2.2, and 0.79.]

· FCS reduces average execution time by 31% (171 jobs).

· It significantly speeds up small jobs at a small cost to big jobs.

Oct 15, 2015 – CS5935 - S17

Average ReduceTask Wait Time

[Chart: average ReduceTask wait time (sec, log scale) of FCS vs. HFS across 10 groups of jobs, with annotated values including 19.5, 12.4, 22, 21, 32.2, 1.2, 24.5, 0.5, 0.8, and 27.2.]

· Small jobs benefit from significantly shortened reduce wait times.

· Wait times are reduced by 22× for the jobs in the first 6 groups.

Fairness Evaluation: Maximum Slowdown

[Chart: maximum slowdown (0 to 20) across 10 groups of jobs under the Fair Completion Scheduler.]

· Nearly uniform maximum slowdown across all groups of jobs.

· FCS improves fairness by 66.7% on average.

Oct 15, 2015 – CS5935 - S19

Summary for Coordinated Scheduling

· Identified fairness and efficiency issues caused by the lack of scheduling coordination.

· Introduced Preemptive ReduceTasks for efficient preemption of reduce tasks in long-running jobs.

· Designed and implemented the Fast Completion Scheduler for fast execution of small jobs and better job fairness.

Oct 15, 2015 – CS5935 - S20

Broad Research Interests

Big Data

· System Design and Management
  – Fast data movement
  – Efficient job management
  – Multi-purpose framework
· High Performance Computing
  – Parallel computing models
  – Scalable I/O & communication
  – Computation & I/O optimization
· Data Analytics
  – K-mer indexing for sequence fingerprinting and alignment
  – Scalable image processing
  – Fast community detection
· Security and Reliability
  – Analytics logging and recovery
  – Cloud security
  – Storage security

Oct 15, 2015 – CS5935 - S21

Resources and Capabilities

· Software: Unstructured Data Accelerator (UDA)
  – Accelerator for big data analytics
  – Transferred to Mellanox
· In-house big data platform
  – 22 nodes; InfiniBand and 10GigE
  – SSDs and GPGPUs (Phi and Kepler)
  – Donations from Mellanox, Solarflare, and NVIDIA

Oct 15, 2015 – CS5935 - S22

Sponsors, Contractors, and Collaborators

· Current sponsors and contractors
  – NSF: two active grants on big data analytics, storage, and network systems
  – LLNL: burst-buffer-based storage systems
· Past sponsors and contractors
  – NASA: one grant for I/O optimization of climate applications
  – DOE labs: many contracts for high-performance computing
  – Industry: contracts from Intel, Mellanox, NVIDIA, and Scitor
  – Alabama: Innovation Award; Auburn IGP for TigerCloud
· Collaborators
  – IBM, Intel, Mellanox, Scitor, Solarflare, AMD
  – LBNL, ORNL, LLNL, SNL, GSFC, LPS
  – Illinois Tech, Clemson, College of William & Mary, NJIT, Georgia Tech, Auburn (OIT, Physics, Biology, SFWS)

Oct 15, 2015 – CS5935 - S23

Research Directions

· Main thrusts
  – Big data analytics and systems design
  – Network and data privacy and security
  – Interdisciplinary data-driven computational research

· Key: solve challenging problems with novel strategies…

· Collaborations
  – Students (systems/network oriented, interdisciplinary)
  – Faculty (on and off campus)
  – National laboratories
  – Industry: IBM, Intel, and many more

Oct 15, 2015 – CS5935 - S24

Student Cultivation and Team Building

· Numerous internships
  – ORNL, Sandia, LANL, LLNL, IBM
· Awards and honors
  – First Prize of ACM Grand Finals, SC11 Fellowship ($5,000)
  – Outstanding students: 2011-2014
· Alumni (Ph.D. listed)
  – Yuan Tian – ORNL
  – Xinyu Que – IBM T.J. Watson
  – Yandong Wang – IBM Watson
  – Zhuo Liu – Yahoo!
  – Cong Xu – Intel
  – Bin Wang – Arista Networks
· Current team
  – 4 Ph.D. students
