Big Data Management and Systems Design Dr. Weikuan Yu Associate Professor Department of Computer Science Florida State University

Upload: elvin-marshall

Post on 14-Jan-2016



Big Data Management and Systems Design

Dr. Weikuan Yu

Associate Professor

Department of Computer Science

Florida State University


Oct 15, 2015 – CS5935 - S2

Life as a Graduate Student

· Course work
– Degrees vs. skill sets

· Research
– Goals vs. means

· Career development
– Presentation and teaching skills
– Communication and social skills
– Internships
– Networking


Oct 15, 2015 – CS5935 - S3

Big Data Challenge

Sources: 1: http://www.infobarrel.com/Evolution 2: http://visual.ly/big-data-explosion?utm_source=visually_embed

Where Is Data Coming From?

· 3.0 billion Internet users
· 1.35 billion Facebook users
· 550 million tweets per day
· 72 hours of video per minute
· 4.7 billion Google searches per day
· And many others

From the dawn of civilization to 2003: 5 exabytes of data
5 exabytes of data every 2 days
Year 2015: 7,910 exabytes
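The per-day figures on this slide are easier to compare with system throughput when converted to average per-second rates. The snippet below is a small worked conversion that takes the slide's numbers as given and assumes a uniform rate over the day (real traffic is bursty, so peak rates are higher).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(count_per_day):
    """Average per-second rate, assuming events are spread uniformly over a day."""
    return count_per_day / SECONDS_PER_DAY

tweets_per_sec = per_second(550e6)      # 550 million tweets per day
searches_per_sec = per_second(4.7e9)    # 4.7 billion Google searches per day
video_hours_per_day = 72 * 60 * 24      # 72 hours of video per minute, over a full day
```

This works out to roughly 6,366 tweets and 54,398 searches per second on average, and 103,680 hours of video per day.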


Oct 15, 2015 – CS5935 - S4

Big Data Ecosystem

[Figure: Layered view of the big data ecosystem, from data complexity to insight: Applications, Services, Run-Time, Storage, Infrastructure, and Hardware.]

Components shown: Mahout, Giraph, Pregel, R; Hive, Pig, Shark, Flume; MR/Hadoop; Storm, RAMCloud, Spark; Memcached; HDFS, HBase; VMs, containers, public and private clouds, OS; disks, networks, processors (accelerators)


Oct 15, 2015 – CS5935 - S5

Research Strategies?

· Descriptive research
– Examine and collect facts to document patterns

· Discovery-oriented research
– Inductive reasoning from patterns to general discoveries

· Engineering-based research
– Use existing techniques and theories to create a technology or tool

· Hypothesis-driven research
– Make a hypothesis and then test it using deductive reasoning


Oct 15, 2015 – CS5935 - S6

Overview of Hadoop MapReduce

[Figure: The JobTracker assigns MapTasks and ReduceTasks. Each MapTask reads an input split from DFS, runs map, and writes a map output file (MOF). Each ReduceTask shuffles and merges MOF segments in memory (MEM), runs reduce, and writes its output to DFS.]
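To make the data flow concrete, here is a minimal in-process sketch of the map, shuffle/merge, and reduce phases using word count. It is not Hadoop code: the function names are hypothetical, each "split" is a list of text lines, and partitioning by key hash stands in for the hash partitioning Hadoop applies by default.

```python
from itertools import groupby
from operator import itemgetter

def map_task(split):
    """Map phase: emit (word, 1) for every word in one input split (its 'MOF')."""
    return [(word, 1) for line in split for word in line.split()]

def shuffle_merge(map_outputs, num_reducers):
    """Shuffle: route each pair to a reducer by key hash, then sort (merge) per reducer."""
    partitions = [[] for _ in range(num_reducers)]
    for mof in map_outputs:
        for key, value in mof:
            partitions[hash(key) % num_reducers].append((key, value))
    return [sorted(p) for p in partitions]

def reduce_task(partition):
    """Reduce phase: sum the values of each key in a key-sorted partition."""
    return {key: sum(v for _, v in group)
            for key, group in groupby(partition, key=itemgetter(0))}

def run_job(splits, num_reducers=2):
    map_outputs = [map_task(s) for s in splits]            # one MapTask per split
    partitions = shuffle_merge(map_outputs, num_reducers)  # shuffle/merge
    result = {}
    for p in partitions:                                    # one ReduceTask per partition
        result.update(reduce_task(p))
    return result
```

For example, `run_job([["a b a"], ["b c"]])` counts two occurrences each of `a` and `b` and one of `c`. Because every occurrence of a key lands in the same partition, each ReduceTask can finish its keys without coordinating with the others.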


Oct 15, 2015 – CS5935 - S7

Small Job Starvation within Hadoop

[Figure: Standalone execution time (sec) and normalized execution time (slowdown) for 7 groups of jobs; annotated slowdowns of 1.5× and 52×.]


Oct 15, 2015 – CS5935 - S8

Hadoop Fair Scheduler

· The most widely used Hadoop scheduler.

· It is designed to provide fairness among concurrently running jobs.

· Tasks occupy slots until completion or failure.

[Figure: Jobs J-1, J-2, and J-3 arriving over time on a cluster with 5 map slots (Slot-M1 to Slot-M5) and 3 reduce slots (Slot-R1 to Slot-R3); each ReduceTask spans its shuffle and reduce phases.]
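The consequence of "tasks occupy slots until completion or failure" can be shown with a tiny simulation. This is a hypothetical sketch, not the Fair Scheduler's logic: reduce tasks are placed greedily in arrival order onto the earliest-free slot, and a task never gives its slot up early. The function name and the task durations are made up for illustration.

```python
import heapq

def simulate_nonpreemptive(num_slots, tasks):
    """Greedy, non-preemptive slot assignment (hypothetical illustration).

    tasks: list of (arrival_time, job_id, duration) for reduce tasks.
    Each task takes the earliest-free slot and holds it until it finishes.
    Returns {job_id: [wait time of each task]}.
    """
    slot_free_at = [0.0] * num_slots   # min-heap of times at which slots become free
    heapq.heapify(slot_free_at)
    waits = {}
    for arrival, job, duration in sorted(tasks):
        free_at = heapq.heappop(slot_free_at)
        start = max(arrival, free_at)  # cannot start before a slot is released
        waits.setdefault(job, []).append(start - arrival)
        heapq.heappush(slot_free_at, start + duration)
    return waits

# A long job fills all 3 reduce slots; a small job arrives 5 s later.
waits = simulate_nonpreemptive(
    3, [(0, "J-1", 100), (0, "J-1", 100), (0, "J-1", 100), (5, "J-2", 10)])
# J-2's single 10 s reduce task waits 95 s for a slot.
```

However fair the allocation policy is, it has no slot to hand to J-2 until one of J-1's reduce tasks finishes, which is the starvation pattern shown on the previous slide.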


Oct 15, 2015 – CS5935 - S9

How to achieve both Efficiency and Fairness?

· How can the non-preemptive nature of reduce tasks be overcome to allow flexible and dynamic allocation of reduce slots?

– Existing schedulers are not aware of this behavior: once a reduce task is launched, it holds its reduce slot until it finishes.

· How can the two different types of tasks be scheduled better?

– Hadoop schedules both map and reduce tasks with a similar max-min fair sharing policy, without considering their relationship within a job.

– Map and reduce slots need to be shared dynamically and proportionally.

Objective
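For reference, max-min fair sharing of a pool of slots can be computed by "water-filling": repeatedly split the remaining slots evenly among jobs whose demand is not yet met, capping each job at its demand. The sketch below is a generic illustration of that policy over integer slots; the function name and the tie-breaking rule (alphabetical job order for leftover slots) are assumptions, not Hadoop's implementation.

```python
def max_min_fair(capacity, demands):
    """Water-filling max-min fair allocation of `capacity` integer slots.

    demands: {job_id: number of slots the job could use}.
    """
    alloc = {j: 0 for j in demands}
    remaining = capacity
    active = {j for j, d in demands.items() if d > 0}
    while remaining > 0 and active:
        share = remaining // len(active)
        if share == 0:
            # Fewer slots than unsatisfied jobs: hand out one each (assumed tie-break).
            for j in sorted(active)[:remaining]:
                alloc[j] += 1
            break
        for j in sorted(active):
            give = min(share, demands[j] - alloc[j])  # never exceed a job's demand
            alloc[j] += give
            remaining -= give
        active = {j for j in active if alloc[j] < demands[j]}
    return alloc
```

With 10 slots and demands `{"A": 2, "B": 8, "C": 8}`, job A gets its full 2 and the other two split the rest, 4 each. Applying this independently to map slots and reduce slots is exactly what ignores the map/reduce relationship within a job that the slide points out.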


Oct 15, 2015 – CS5935 - S10

Preemptive ReduceTasks

· Different from the Linux command “kill -STOP $PID”, which merely suspends a process.

· Lightweight work-conserving preemption mechanism.

– Provides any-time preemption with negligible performance impact.

– Allows a reduce task to resume from where it was preempted.

– Preserves previous computation and I/O.
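The difference between work-conserving and killing preemption comes down to whether progress survives. The sketch below models a reduce task whose position in its input is kept across preemptions, next to a kill that discards everything. The class is hypothetical, and doubling each record is a placeholder for real reduce work.

```python
from dataclasses import dataclass, field

@dataclass
class ResumableReduceTask:
    """Hypothetical reduce task whose progress survives preemption (work-conserving)."""
    records: list
    position: int = 0                        # next record to process; kept across preemption
    output: list = field(default_factory=list)

    def run(self, budget):
        """Process up to `budget` records, then give up the slot (simulated preemption)."""
        end = min(self.position + budget, len(self.records))
        for record in self.records[self.position:end]:
            self.output.append(record * 2)   # placeholder for real reduce work
        processed = end - self.position
        self.position = end
        return processed

    def kill(self):
        """Kill-based preemption, for contrast: all progress and output are discarded."""
        self.position = 0
        self.output.clear()

    @property
    def done(self):
        return self.position >= len(self.records)
```

A task created with `[1, 2, 3, 4, 5]` that runs with a budget of 3, is preempted, and is later rescheduled resumes at the fourth record and processes only the remaining two; after `kill()`, the same task would have to redo all five.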


Oct 15, 2015 – CS5935 - S11

Preemption During Shuffle Phase

[Figure: ReduceTask R1 before preemption and after resumption. Shuffled segments held in R1's heap are merged into a segment stored on the TaskTracker together with an index, and R1 retrieves it when it resumes.]

· Only the in-memory segments are merged; on-disk segments are left untouched.
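A minimal sketch of the shuffle-phase rule, assuming each segment is a list of (key, value) pairs already sorted by key: on preemption, only the in-memory segments are merged into one new sorted run, and existing on-disk segments are passed through without being rewritten. The function name and the list-based "disk" are hypothetical; `heapq.merge` provides the sorted k-way merge.

```python
import heapq

def preempt_during_shuffle(in_memory_segs, on_disk_segs):
    """Hypothetical sketch: on preemption, merge only the in-memory segments.

    Returns the new list of on-disk segments; existing ones are not touched.
    """
    merged = list(heapq.merge(*in_memory_segs))  # one sorted run from the heap
    new_on_disk = list(on_disk_segs)             # existing on-disk segments kept as-is
    if merged:
        new_on_disk.append(merged)
    in_memory_segs.clear()                       # heap memory is released
    return new_on_disk
```

Merging only the in-memory data keeps the cost of preemption proportional to the heap size rather than to everything shuffled so far, which is what makes any-time preemption cheap.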


Oct 15, 2015 – CS5935 - S12

Preemption During Reduce Phase

[Figure: ReduceTask R1 before preemption and after resumption: its minimum priority queue (MPQ), output flushed to DFS, and per-segment offsets stored in an index on the TaskTracker and retrieved on resumption.]

· Preemption occurs at the boundary of intermediate <key, val> pairs.

· The current offset of each segment and the minimum priority queue (MPQ) are recorded.
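The reduce-phase mechanism can be sketched as a k-way merge over sorted segments driven by a min-priority queue, which stops only between whole <key, val> pairs and records one offset per segment. In this hypothetical sketch the MPQ is rebuilt from the saved offsets on resume (one head entry per segment), which is an assumption standing in for saving the queue itself; `emit` stands in for reducing and writing output to DFS.

```python
import heapq

def reduce_merge(segments, offsets, max_pairs, emit):
    """Hypothetical preemptible k-way merge.

    segments: list of lists of (key, value), each sorted by key.
    offsets:  per-segment index of the next unread pair (the state saved on preemption).
    max_pairs: how many pairs to process before being preempted.
    Returns the updated offsets.
    """
    offsets = list(offsets)
    # Rebuild the MPQ from the recorded offsets: one head entry per non-empty segment.
    mpq = [(segments[i][off], i) for i, off in enumerate(offsets) if off < len(segments[i])]
    heapq.heapify(mpq)
    processed = 0
    while mpq and processed < max_pairs:
        pair, i = heapq.heappop(mpq)
        emit(pair)                     # stand-in for reduce() and the write to DFS
        offsets[i] += 1                # advance only after a whole pair is consumed
        processed += 1
        if offsets[i] < len(segments[i]):
            heapq.heappush(mpq, (segments[i][offsets[i]], i))
    return offsets
```

Running it on two segments with `max_pairs=3` emits the three smallest pairs and returns offsets such as `[2, 1]`; calling it again with those offsets continues exactly where it stopped, so no pair is processed twice.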


Oct 15, 2015 – CS5935 - S13

Evaluation of Preemptive ReduceTask

[Figure: Job execution time (sec) vs. completion ratio of the ReduceTask (10% to 90%) for Work-Conserving Preemption, Killing Preemption, and No Preemption (Baseline); annotated: negligible overhead.]


Oct 15, 2015 – CS5935 - S14

Fast Completion Scheduler

· Strategy
– Find a reduce task to preempt, select another to launch, and balance the utilization of reduce slots
– Decisions to make:
· Which reduce task to preempt, and how many times?
· Which one to launch, and on which slot/node?
· How to avoid starvation and achieve locality?

· New progress metrics of a job
– Remaining shuffle time
– Remaining data (<k,v> pairs) to be reduced


Oct 15, 2015 – CS5935 - S15

FCS Algorithms

· Preemption algorithm
– Select a reduce task from a job with longer remaining time and more remaining data.
– Task slackness: the number of times a task has been preempted.
– Avoid starvation: do not preempt a reduce task with a large slackness.

· Launching algorithm
– Select another reduce task from a job with the least remaining time / remaining data.
– Delay a reduce task to maximize data locality.
– Avoid aggressive delay: set a threshold based on the cluster size.
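The selection rules above can be sketched as follows. This is a hypothetical illustration, not FCS's code: the slides do not say how remaining time and remaining data are combined, so ordering by time first and data second is an assumption, as are the slackness cap `max_slackness` and the locality threshold `fraction` of the cluster size. All names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class ReduceTaskInfo:
    task_id: str
    job_remaining_time: float  # remaining shuffle time of the owning job (sec)
    job_remaining_data: int    # remaining <k,v> pairs of the owning job
    slackness: int = 0         # number of times this task has been preempted

def pick_victim(running, max_slackness):
    """Preemption: prefer the job with the most remaining time, then the most data,
    skipping tasks already preempted max_slackness times (starvation guard)."""
    candidates = [t for t in running if t.slackness < max_slackness]
    if not candidates:
        return None
    return max(candidates, key=lambda t: (t.job_remaining_time, t.job_remaining_data))

def pick_launch(pending):
    """Launching: prefer the job with the least remaining time, then the least data."""
    if not pending:
        return None
    return min(pending, key=lambda t: (t.job_remaining_time, t.job_remaining_data))

def should_delay(task_is_local, times_delayed, cluster_size, fraction=0.1):
    """Locality delay: pass over a non-local slot until a cluster-size-based
    threshold of skips is reached (fraction is a hypothetical parameter)."""
    threshold = max(1, int(cluster_size * fraction))
    return (not task_is_local) and times_delayed < threshold
```

On a preemption, the chosen victim's `slackness` would be incremented, so the same long-job task cannot be pushed aside indefinitely; the delay threshold scales with the cluster because larger clusters offer more chances of finding a local slot soon.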


Oct 15, 2015 – CS5935 - S16

Results for Map-heavy Workload

[Figure: Average execution time (sec, log scale) of FCS vs. HFS (Hadoop Fair Scheduler) for 10 groups of jobs.]

· FCS reduces average execution time by 31% (171 jobs).

· It significantly speeds up small jobs at a small cost to big jobs.


Oct 15, 2015 – CS5935 - S17

Average ReduceTask Wait Time

[Figure: Average ReduceTask wait time (sec, log scale) of FCS vs. HFS for 10 groups of jobs.]

· Small jobs benefit from significantly shorter reduce wait times.

· Wait time is reduced by 22× for jobs in the first 6 groups.


Oct 15, 2015 – CS5935 - S18

Fairness Evaluation: Maximum Slowdown

[Figure: Maximum slowdown for 10 groups of jobs; legend: Fair, Completion.]

· Nearly uniform maximum slowdown across all groups of jobs.

· FCS improves fairness by 66.7% on average.
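Slowdown, as plotted on the starvation and fairness slides, is a job's execution time in the shared cluster normalized by its standalone execution time; the fairness view takes the worst slowdown in each group. The sketch below computes both from hypothetical (group, time, standalone time) tuples. The sample data is invented, and the slides do not define how the 66.7% fairness improvement is derived, so this does not reproduce that number.

```python
def slowdowns(jobs):
    """jobs: list of (group, execution_time, standalone_time), times in seconds.
    Slowdown = execution time in the shared cluster / standalone execution time."""
    return [(group, t / t_alone) for group, t, t_alone in jobs]

def max_slowdown_per_group(jobs):
    """Fairness view: the worst (maximum) slowdown observed in each group."""
    worst = {}
    for group, s in slowdowns(jobs):
        worst[group] = max(worst.get(group, 0.0), s)
    return worst
```

For instance, a group whose jobs take 30 s and 20 s but need only 10 s alone has a maximum slowdown of 3.0. A scheduler is fairer by this measure when these per-group maxima are close to one another, which is the "nearly uniform" result claimed above.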


Oct 15, 2015 – CS5935 - S19

Summary for Coordinated Scheduling

· Identified fairness and efficiency issues caused by the lack of scheduling coordination.

· Introduced Preemptive ReduceTasks for efficient preemption of reduce tasks from long-running jobs.

· Designed and implemented the Fast Completion Scheduler for fast execution of small jobs and better job fairness.


Oct 15, 2015 – CS5935 - S20

Broad Research Interests

· Big Data System Design and Management
– Fast data movement
– Efficient job management
– Multi-purpose framework

· High Performance Computing
– Parallel computing models
– Scalable I/O & communication
– Computation & I/O optimization

· Data Analytics
– K-mer indexing for sequence fingerprinting and alignment
– Scalable image processing
– Fast community detection

· Security and Reliability
– Analytics logging and recovery
– Cloud security
– Storage security


Oct 15, 2015 – CS5935 - S21

Resources and Capabilities

· Software: Unstructured Data Accelerator (UDA)
– Accelerator for big data analytics
– Transferred to Mellanox

· In-house big data platform
– 22 nodes; InfiniBand and 10GigE
– SSD and GPGPU (Phi and Kepler)
– Donations from Mellanox, Solarflare, and NVIDIA


Oct 15, 2015 – CS5935 - S22

Sponsors, Contractors, and Collaborators

· Current sponsors and contractors:
– NSF: two active grants on big data analytics, storage, and network systems
– LLNL: burst-buffer-based storage systems

· Past sponsors and contractors:
– NASA: one grant for I/O optimization of climate applications
– DOE labs: many contracts for high-performance computing
– Industry: contracts from Intel, Mellanox, NVIDIA, and Scitor
– Alabama: Innovation Award; Auburn IGP for TigerCloud

· Collaborators:
– IBM, Intel, Mellanox, Scitor, Solarflare, AMD
– LBNL, ORNL, LLNL, SNL, GSFC, LPS
– Illinois Tech, Clemson, College of William & Mary, NJIT, Georgia Tech, Auburn (OIT, Physics, Biology, SFWS)


Oct 15, 2015 – CS5935 - S23

Research Directions

· Main thrusts
– Big data analytics and systems design
– Network and data privacy and security
– Interdisciplinary data-driven computational research
– Key: solve challenging problems with novel strategies…

· Collaborations
– Students (systems/network oriented, interdisciplinary)
– Faculty (on and off campus)
– National laboratories
– Industry: IBM, Intel, and many more


Oct 15, 2015 – CS5935 - S24

Student Cultivation and Team Building

· Numerous internships: ORNL, Sandia, LANL, LLNL, IBM

· Awards and honors: First Prize of ACM Grand Finals; SC11 Fellowship ($5000); outstanding students, 2011-2014

· Alumni (Ph.D. listed): Yuan Tian (ORNL), Xinyu Que (IBM T.J. Watson), Yandong Wang (IBM Watson), Zhuo Liu (Yahoo!), Cong Xu (Intel), Bin Wang (Arista Networks)

· Current team: 4 Ph.D. students