Big Data Management and Systems Design
Dr. Weikuan Yu
Associate Professor
Department of Computer Science
Florida State University
Oct 15, 2015 – CS5935 - S2
Life as a Graduate Student
· Course work
– Degrees vs. skill sets
· Research
– Goals vs. means
· Career development
– Presentation and teaching skills
– Communication and social skills
– Internships
– Networking
Oct 15, 2015 – CS5935 - S3
Big Data Challenge
Sources: 1: http://www.infobarrel.com/Evolution 2: http://visual.ly/big-data-explosion?utm_source=visually_embed
Where is Data Coming From?
· 3.0 Billion Internet Users
· 1.35 Billion Facebook Users
· 550 Million Tweets Per Day
· 72 Hrs of Video Per Minute
· 4.7 Billion Google Searches Per Day
· And so many others
· The dawn of civilization to Year 2003: 5 EXABYTES of DATA
· 5 EXABYTES of DATA Every 2 Days
· Year 2015: 7910 EXABYTES
Oct 15, 2015 – CS5935 - S4
Big Data Ecosystem
[Figure: layered big data stack, with the labels Data, Complexity, and Insight.]
Layer labels: Hardware, Infrastructure, Storage, Run-Time, Services, Applications
Components shown:
· Disks, Networks, Processors (Accelerators)
· VM, Containers, Public and Private Clouds, OS
· HDFS, HBase
· MemCached
· MR/Hadoop
· Storm, RamCloud, Spark
· Hive, Pig, Shark, Flume
· Mahout, Giraph, Pregel, R
Oct 15, 2015 – CS5935 - S5
Research Strategies?
· Descriptive research
– Examine and collect facts to document patterns
· Discovery-oriented research
– Inductive reasoning from patterns to general discoveries
· Engineering-based research
– Use existing techniques and theories to create a technology or tool.
· Hypothesis-driven research
– Make a hypothesis and then test the hypothesis using deductive reasoning
Oct 15, 2015 – CS5935 - S6
Overview of Hadoop MapReduce
[Figure: Hadoop MapReduce data flow. The JobTracker assigns MapTasks and ReduceTasks. In the map stage, each MapTask reads a Split from DFS and produces a MOF. In the shuffle/merge stage, ReduceTasks fetch MOFs into memory (MEM). In the reduce stage, ReduceTasks write results to DFS.]
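The map, shuffle/merge, and reduce stages named on this slide can be sketched in single-process Python. This is an illustration of the data flow only, not Hadoop's implementation: the function names (`map_task`, `shuffle`, `reduce_task`, `run_job`) and the word-count workload are assumptions made for the example, and there is no JobTracker, DFS, or parallelism.

```python
from collections import defaultdict
from itertools import chain


def map_task(split):
    """Map stage: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in split.split()]


def shuffle(map_outputs, num_reducers):
    """Shuffle/merge stage: partition map output by key, group values per key,
    and sort each partition by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in chain.from_iterable(map_outputs):
        partitions[hash(key) % num_reducers][key].append(value)
    return [sorted(p.items()) for p in partitions]


def reduce_task(partition):
    """Reduce stage: sum the grouped values of each key in one partition."""
    return {key: sum(values) for key, values in partition}


def run_job(splits, num_reducers=2):
    """Run all map tasks, shuffle their output, then run all reduce tasks."""
    map_outputs = [map_task(s) for s in splits]
    partitions = shuffle(map_outputs, num_reducers)
    result = {}
    for partition in partitions:
        result.update(reduce_task(partition))
    return result


counts = run_job(["big data big systems", "data systems design"])
print(counts)
```

Each key lands in exactly one partition, so the per-reducer results can be combined without conflicts.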
Oct 15, 2015 – CS5935 - S7
Small Job Starvation within Hadoop
[Figure: Standalone Execution Time (sec) and Normalized Execution Time (slowdown) for job Groups 1 to 7. Annotated slowdowns: 1.5× and 52×.]
Oct 15, 2015 – CS5935 - S8
Hadoop Fair Scheduler
· The most widely used Hadoop scheduler
· It is designed to provide fairness among concurrently running jobs.
· Tasks occupy slots until completion or failure
[Figure: timeline of jobs J-1, J-2, and J-3 from Job Arrival onward, occupying 5 Map Slots (Slot-M1 to Slot-M5) and 3 Reduce Slots (Slot-R1 to Slot-R3) over Time, with shuffle and reduce phases marked.]
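To illustrate why tasks that hold a slot until completion can delay a short job, here is a minimal sketch of non-preemptive slot assignment. It is a simplification, not the Hadoop Fair Scheduler: tasks are served in arrival order, and the function name, task names, and durations are hypothetical.

```python
import heapq


def schedule_non_preemptive(tasks, num_slots):
    """Assign tasks to slots in arrival order; a task keeps its slot until done.

    tasks: list of (arrival_time, duration, name) tuples.
    Returns {name: (start_time, finish_time)}.
    """
    free_at = [0.0] * num_slots      # time at which each slot becomes free
    heapq.heapify(free_at)
    timeline = {}
    for arrival, duration, name in sorted(tasks):
        slot_free = heapq.heappop(free_at)   # earliest available slot
        start = max(arrival, slot_free)
        finish = start + duration
        heapq.heappush(free_at, finish)
        timeline[name] = (start, finish)
    return timeline


# Three long reduce tasks of one job fill all three reduce slots;
# a 2-second reduce task of a later job has to wait for one of them.
tasks = [
    (0, 100, "J1-r1"),
    (0, 100, "J1-r2"),
    (0, 100, "J1-r3"),
    (5, 2, "J2-r1"),
]
timeline = schedule_non_preemptive(tasks, num_slots=3)
print(timeline["J2-r1"])
```

In this made-up case the short task arrives at time 5 but cannot start until time 100, when the first long task releases its slot.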
Oct 15, 2015 – CS5935 - S9
How to achieve both Efficiency and Fairness?
· How to correct the non-preemptive nature of reduce tasks for flexible and dynamic allocation of reduce slots?
– Existing schedulers are not aware of such behavior. Once a reduce task is launched, it stays with the reduce slot till the end.
· How to better schedule two different types of tasks?
– Hadoop schedules both map and reduce tasks with a similar max-min fair sharing policy without paying attention to their relationship in a job.
– Map and reduce slots need to be dynamically and proportionally shared.
Objective
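The max-min fair sharing policy mentioned on this slide can be sketched with the textbook water-filling procedure. This is a generic illustration rather than Hadoop's scheduler code; the function name and the slot demands in the example are assumptions.

```python
def max_min_fair_share(demands, capacity):
    """Split `capacity` slots among jobs by max-min fairness (water-filling).

    demands: {job_name: slots_wanted}. Returns {job_name: slots_granted};
    grants may be fractional in this sketch.
    """
    allocation = {job: 0.0 for job in demands}
    remaining = dict(demands)            # jobs whose demand is not yet met
    while remaining and capacity > 1e-9:
        share = capacity / len(remaining)
        # Jobs that want no more than an equal share are fully satisfied.
        satisfied = {j: d for j, d in remaining.items() if d <= share}
        if not satisfied:
            # Everyone wants more than the equal share: split evenly and stop.
            for job in remaining:
                allocation[job] += share
            capacity = 0.0
            break
        for job, demand in satisfied.items():
            allocation[job] += demand
            capacity -= demand
            del remaining[job]
    return allocation


shares = max_min_fair_share({"J-1": 2, "J-2": 8, "J-3": 10}, capacity=12)
print(shares)
```

Here J-1 receives its full demand of 2 slots, and the remaining 10 slots are split evenly between J-2 and J-3. Applying such a policy to map slots and reduce slots independently is what ignores the relationship between the two task types within a job.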
Oct 15, 2015 – CS5935 - S10
Preemptive ReduceTasks
· Different from the Linux command “kill -STOP $PID”.
· Lightweight work-conserving preemption mechanism.
– Provides any-time preemption with negligible performance impact.
– Allows a reduce task to resume from where it was preempted.
– Preserves previous computation and I/O.
Oct 15, 2015 – CS5935 - S11
Preemption During Shuffle Phase
· Merge only the in-memory segments (segs), leaving on-disk segments untouched.
[Figure: R1 on the TaskTracker before preemption and after resume. Labels: Heap, seg, Segment, Merge, Index, Retrieve.]
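One way to picture shuffle-phase preemption is the sketch below: on preemption, the in-memory sorted segments are merged into a single saved segment, the on-disk segments are left alone, and a small index lets the task pick up where it stopped. The class name `ShuffleState`, the JSON file format, the file name, and the index layout are all assumptions for illustration; the slide does not specify them.

```python
import heapq
import json
import os
import tempfile


class ShuffleState:
    """Hypothetical shuffle-phase state of one reduce task."""

    def __init__(self):
        self.in_memory = []   # sorted lists of (key, value) pairs held in memory
        self.on_disk = []     # names of segments that were already spilled

    def preempt(self, workdir):
        """Merge only the in-memory segments into one file and return an index.
        On-disk segments are recorded in the index but not read or rewritten."""
        merged = list(heapq.merge(*self.in_memory))
        path = os.path.join(workdir, "preempted-segment.json")
        with open(path, "w") as f:
            json.dump(merged, f)
        self.in_memory = []   # memory can be released while the task is preempted
        return {"merged_segment": path, "on_disk": list(self.on_disk)}

    @classmethod
    def resume(cls, index):
        """Rebuild the task state from the index written at preemption time."""
        state = cls()
        with open(index["merged_segment"]) as f:
            state.in_memory = [[tuple(pair) for pair in json.load(f)]]
        state.on_disk = list(index["on_disk"])
        return state


with tempfile.TemporaryDirectory() as workdir:
    before = ShuffleState()
    before.in_memory = [[("a", 1), ("c", 3)], [("b", 2)]]
    before.on_disk = ["spill-0"]      # hypothetical name of an earlier spill
    index = before.preempt(workdir)
    after = ShuffleState.resume(index)

print(after.in_memory, after.on_disk)
```

Because the segments are already sorted, the merge is a single sequential pass, which is consistent with the lightweight preemption goal stated earlier.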
Oct 15, 2015 – CS5935 - S12
Preemption During Reduce Phase
· Preemption occurs at the boundary of intermediate <key, val> pairs.
· Record the current offset of each segment and the minimum priority queue (MPQ).
[Figure: R1 on the TaskTracker before preemption and after resume. Labels: MPQ, DFS, Index, Flush, Retrieve, offset.]
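The idea of preempting at a pair boundary and recording per-segment offsets can be sketched with a k-way merge over sorted segments. This is a minimal illustration under assumptions: the function name and the `budget` parameter (which stands in for a preemption request) are hypothetical, and flushing reduce output to DFS is left out.

```python
import heapq


def reduce_with_preemption(segments, offsets=None, budget=None):
    """Merge sorted segments through a minimum priority queue.

    segments: list of sorted lists of (key, value) pairs.
    offsets:  number of pairs already consumed from each segment (for resume).
    budget:   stop after this many pairs, simulating a preemption request.
    Returns (emitted_pairs, checkpoint); checkpoint is None once finished.
    """
    offsets = list(offsets) if offsets else [0] * len(segments)
    # Rebuild the minimum priority queue from the recorded offsets.
    heap = [
        (segments[i][off], i)
        for i, off in enumerate(offsets)
        if off < len(segments[i])
    ]
    heapq.heapify(heap)
    emitted = []
    while heap:
        if budget is not None and len(emitted) >= budget:
            # Preempt between two pairs: the offsets are the whole checkpoint.
            return emitted, offsets
        pair, i = heapq.heappop(heap)
        emitted.append(pair)
        offsets[i] += 1
        if offsets[i] < len(segments[i]):
            heapq.heappush(heap, (segments[i][offsets[i]], i))
    return emitted, None


segments = [[("a", 1), ("d", 4)], [("b", 2), ("c", 3)]]
first, checkpoint = reduce_with_preemption(segments, budget=2)
rest, done = reduce_with_preemption(segments, offsets=checkpoint)
print(first, checkpoint, rest, done)
```

No pair is processed twice and none is skipped: the resumed run continues from the recorded offsets, so earlier computation is preserved.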
Oct 15, 2015 – CS5935 - S13
Evaluation of Preemptive ReduceTask
[Figure: Job Execution Time (sec) versus Completion Ratio of ReduceTask (10% to 90%) for Work-Conserving Preemption, Killing Preemption, and No Preemption (Baseline). Annotation: Negligible overhead.]
Oct 15, 2015 – CS5935 - S14
Fast Completion Scheduler
· Strategy:
– Find a reduce task for preemption and select another for launching, and balance the utilization of reduce slots
– Decisions to make:
    · Which reduce task to preempt, and how many times?
    · Which one to launch and on which slot/node?
    · How to avoid starvation and achieve locality?
· New progress metrics of a job:
– Remaining shuffle time
– Remaining data (<k,v> pairs) to be reduced
Oct 15, 2015 – CS5935 - S15
FCS Algorithms
· Preemption algorithm
– Select a reduce task from a job with longer remaining time + more remaining data
– Task slackness: the number of times a task has been preempted
– Avoid starvation: do not preempt a reduce task that has a high slackness.
· Launching algorithm
– Select another reduce task from a job with the least remaining time / remaining data
– Delay a reduce task to maximize data locality.
– Avoid aggressive delay: set a threshold based on the cluster size.
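The two selection rules above can be sketched as follows. This is one possible reading, not the FCS implementation: the `ReduceTask` fields, the function names, the lexicographic ordering of remaining time and remaining data, and both thresholds (`max_slackness`, `delay_threshold`) are assumptions. The slide does not say how the two metrics are combined or how the delay threshold is derived from the cluster size.

```python
from dataclasses import dataclass


@dataclass
class ReduceTask:
    task_id: str
    job_remaining_time: float   # estimated remaining shuffle time of the job (sec)
    job_remaining_data: int     # <k,v> pairs the job still has to reduce
    slackness: int = 0          # number of times this task has been preempted
    waited: int = 0             # scheduling rounds spent waiting for a local slot
    is_local: bool = True       # whether the offered slot is data-local


def pick_to_preempt(running, max_slackness):
    """Pick the running task whose job has the most work left,
    skipping tasks already preempted max_slackness times (starvation guard)."""
    candidates = [t for t in running if t.slackness < max_slackness]
    if not candidates:
        return None
    return max(candidates,
               key=lambda t: (t.job_remaining_time, t.job_remaining_data))


def pick_to_launch(pending, delay_threshold):
    """Pick the pending task whose job has the least work left. A non-local
    task is delayed until it has waited delay_threshold rounds."""
    eligible = [t for t in pending
                if t.is_local or t.waited >= delay_threshold]
    if not eligible:
        return None
    return min(eligible,
               key=lambda t: (t.job_remaining_time, t.job_remaining_data))


running = [
    ReduceTask("big-r1", 900.0, 10_000_000, slackness=3),
    ReduceTask("mid-r1", 300.0, 2_000_000),
]
pending = [
    ReduceTask("small-r1", 5.0, 1_000, is_local=False),
    ReduceTask("small-r2", 8.0, 2_000),
]
victim = pick_to_preempt(running, max_slackness=3)
winner = pick_to_launch(pending, delay_threshold=2)
print(victim.task_id, winner.task_id)
```

In this made-up example, `big-r1` has already been preempted three times and is protected, so `mid-r1` is preempted; `small-r1` would have less work left but is still being delayed for locality, so `small-r2` is launched.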
Oct 15, 2015 – CS5935 - S16
Results for Map-heavy Workload
· FCS reduces average execution time by 31% (171 jobs).
· Significantly speeds up small jobs at a small cost to big jobs.
[Figure: Average Execution Time (sec), log scale from 10 to 10000, for 10 Groups of Jobs under FCS and HFS. Bar annotations: 1.9, 2.4, 1.9, 2.3, 1.6, 1.9, 1.1, 2.2, 0.79.]
Oct 15, 2015 – CS5935 - S17
Average ReduceTask Wait Time
· Small jobs benefit from significantly shortened reduce wait time.
· Wait time is reduced by 22× for the jobs in the first 6 groups.
[Figure: Average Reduce Task Wait Time (sec), log scale from 1 to 10000, for 10 Groups of Jobs under FCS and HFS. Bar annotations: 19.5, 12.4, 22, 21, 32.2, 1.2, 24.5, 0.5, 0.8, 27.2.]
Oct 15, 2015 – CS5935 - S18
Fairness Evaluation: Maximum Slowdown
· Nearly uniform maximum slowdown for all groups of jobs.
· FCS improves the fairness by 66.7% on average.
[Figure: Maximum Slowdown (0 to 20) for 10 groups of jobs.]
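As a rough sketch of the metric, assume that slowdown is a job's execution time in the shared cluster divided by its standalone execution time, and that the maximum is taken within each group of jobs. Both the definition and the job timings below are assumptions for illustration, not measurements from the talk.

```python
def slowdown(shared_time, standalone_time):
    """Normalized execution time: 1.0 means no slowdown from sharing."""
    return shared_time / standalone_time


def max_slowdown_per_group(jobs):
    """jobs: iterable of (group, shared_time_sec, standalone_time_sec).
    Returns {group: worst slowdown observed in that group}."""
    worst = {}
    for group, shared, standalone in jobs:
        s = slowdown(shared, standalone)
        worst[group] = max(worst.get(group, 0.0), s)
    return worst


# Hypothetical timings: two small jobs in group 1, one large job in group 2.
jobs = [(1, 30.0, 20.0), (1, 60.0, 20.0), (2, 1200.0, 1000.0)]
worst = max_slowdown_per_group(jobs)
print(worst)
```

A scheduler is fairer under this measure when the per-group maxima are close to one another, which is what "nearly uniform maximum slowdown" describes.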
Oct 15, 2015 – CS5935 - S19
Summary for Coordinated Scheduling
· Identified fairness and efficiency issues caused by the lack of scheduling coordination.
· Introduced Preemptive ReduceTasks for efficient preemption of reduce tasks from long-running jobs.
· Designed and implemented the Fast Completion Scheduler for fast execution of small jobs and better job fairness.
Oct 15, 2015 – CS5935 - S20
Broad Research Interests
Big Data
· System Design and Management
– Fast Data Movement
– Efficient job management
– Multi-purpose framework
· High Performance Computing
– Parallel computing models
– Scalable I/O & communication
– Computation & I/O optimization
· Data Analytics
– K-mer indexing for sequence fingerprinting and alignment
– Scalable Image Processing
– Fast community detection
· Security and Reliability
– Analytics logging and Recovery
– Cloud security
– Storage security
Oct 15, 2015 – CS5935 - S21
Resource and Capabilities
· Software: Unstructured Data Accelerator (UDA)
– Accelerator for Big Data Analytics
– Transferred to Mellanox
· In-house big data platform
– 22 nodes; InfiniBand and 10GigE
– SSD and GPGPU (Phi and Kepler)
– Donations from Mellanox, Solarflare, and NVIDIA
Oct 15, 2015 – CS5935 - S22
Sponsors, Contractors, and Collaborators
· Current Sponsors and Contractors:
– NSF: Two active grants on big data analytics, storage and network systems
– LLNL: burst buffer based storage systems
· Past Sponsors and Contractors:
– NASA: one grant for I/O optimization of climate applications
– DOE Labs: many contracts for high-performance computing
– Industry: contracts from Intel, Mellanox, NVIDIA and Scitor
– Alabama: Innovation Award; Auburn IGP for TigerCloud
· Collaborators:
– IBM, Intel, Mellanox, Scitor, Solarflare, AMD
– LBNL, ORNL, LLNL, SNL, GSFC, LPS
– Illinois Tech, Clemson, College of William & Mary, NJIT, Georgia Tech, Auburn (OIT, Physics, Biology, SFWS)
Oct 15, 2015 – CS5935 - S23
Research Directions
· Main Thrusts
· Big Data Analytics and systems design
· Network and data privacy and security
· Interdisciplinary data-driven computational research
· Key: solve challenging problems with novel strategies…
· Collaborations
· Students (systems/network oriented, interdisciplinary)
· Faculty (on and off campus)
· National laboratories
· Industry: IBM, Intel, and many more
Oct 15, 2015 – CS5935 - S24
Student Cultivation and Team Building
· Numerous Internships
– ORNL, Sandia, LANL, LLNL, IBM.
· Awards and honors
– First Prize of ACM Grand Finals SC11 Fellowship ($5000).
– Outstanding students: 2011-2014.
· Alumni (Ph.D. listed)
– Yuan Tian – ORNL
– Xinyu Que – IBM T.J. Watson
– Yandong Wang – IBM Watson
– Zhuo Liu – Yahoo!
– Cong Xu – Intel
– Bin Wang – Arista Networks
· Current Team
– 4 Ph.D. students