Efficient Skyline Computation in MapReduce Kasper Mullesgaard, Jens Laurits Pedersen, Hua Lu Aalborg University Yongluan Zhou University of Southern Denmark

Efficient Skyline Computation in MapReduce

Kasper Mullesgaard, Jens Laurits Pedersen, Hua Lu

Aalborg University

Yongluan Zhou

University of Southern Denmark

Page 2: Efficient Skyline Computation in MapReduce

Skyline Query

• Application: multi-criteria decision making
• Tuple dominance: t1 dominates t2 (t1 ⊰ t2)
– Iff t1 is not worse than t2 in all dimensions, and
– t1 is better than t2 in at least one dimension

• Skyline query:
– Given a dataset, returns all tuples that are not dominated by others
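The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that smaller values are better in every dimension, and the `hotels` data (price, distance) is made up for the example.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(t1: Point, t2: Point) -> bool:
    """True iff t1 is not worse than t2 in all dimensions and better in at least one.

    Assumption: smaller is better in every dimension.
    """
    not_worse = all(a <= b for a, b in zip(t1, t2))
    better_somewhere = any(a < b for a, b in zip(t1, t2))
    return not_worse and better_somewhere


def skyline(tuples: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep every tuple that no other tuple dominates."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]


# Hypothetical (price, distance) pairs.
hotels = [(50, 8), (60, 3), (70, 9), (90, 1), (60, 5)]
print(skyline(hotels))  # [(50, 8), (60, 3), (90, 1)]
```

Here (70, 9) is dominated by (50, 8) and (60, 5) by (60, 3), so neither is part of the skyline.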

Page 3: Efficient Skyline Computation in MapReduce

Scaling Skyline Computation

• Customized solutions:
– Require arbitrary inter-node communication
– Need software stacks to harness a large cluster
– Unproven scalability
– Lack of fault tolerance

• General MapReduce platforms:
– Availability of scalable systems, such as Hadoop
– A strict communication/synchronization model

MapReduce

Challenges of Skyline Computation using MapReduce

• To maximize parallelization:
– Push more work to mappers, i.e. let mappers filter out more non-skyline points
– Be able to utilize multiple reducers
• However, the global skyline cannot be determined from local information alone
– Without global information, mappers have very limited capability to filter out non-skyline points
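The limitation can be seen in a small simulation. The sketch below is an assumption-laden illustration rather than the approach proposed in the talk: each "mapper" computes the skyline of its own chunk, and a single "reducer" merges the local results. The function names and the data are hypothetical, and smaller values are assumed to be better.

```python
def dominates(t1, t2):
    """Smaller-is-better dominance (an assumption of this sketch)."""
    return all(a <= b for a, b in zip(t1, t2)) and any(a < b for a, b in zip(t1, t2))


def local_skyline(chunk):
    """Skyline of one chunk, using only the tuples in that chunk."""
    return [t for t in chunk if not any(dominates(o, t) for o in chunk)]


def map_phase(chunks):
    """Each mapper filters its chunk independently."""
    return [local_skyline(c) for c in chunks]


def reduce_phase(local_results):
    """A single reducer merges all local skylines into the global skyline."""
    candidates = [t for result in local_results for t in result]
    return local_skyline(candidates)


chunks = [[(1, 9), (5, 5), (6, 6)], [(2, 2), (9, 1), (8, 8)]]
local = map_phase(chunks)
print(local)                # [[(1, 9), (5, 5)], [(2, 2), (9, 1)]]
print(reduce_phase(local))  # [(1, 9), (2, 2), (9, 1)]
```

The first mapper cannot discard (5, 5), because the tuple that dominates it, (2, 2), lives in the other chunk. Only the reducer, which sees both local skylines, can remove it.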

Page 6: Efficient Skyline Computation in MapReduce

Grid Partitioning and Bit String Representation

Partition Dominance: pi ⊰ pj iff pi.max ⊰ pj.min

2 5 8

1 4 7

0 3 6

BSR = 011110100
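A bit string like the one above can be produced as sketched below. Several points are assumptions, since the slide only shows the grid and the result: values are taken to be normalized to [0, 1), a bit is set iff its partition is non-empty and not dominated by another non-empty partition, and the partition numbering (first dimension most significant) is inferred from the 3×3 grid. The sample points are made up.

```python
from itertools import product


def cell_of(t, ppd):
    """Grid cell of a tuple, assuming every value lies in [0, 1)."""
    return tuple(min(int(v * ppd), ppd - 1) for v in t)


def partition_dominates(ci, cj):
    """pi ⊰ pj iff pi.max ⊰ pj.min, expressed in cell coordinates.

    In grid units, the max corner of cell ci is ci + 1 in every dimension,
    and the min corner of cell cj is cj itself.
    """
    pi_max = [c + 1 for c in ci]
    return (all(a <= b for a, b in zip(pi_max, cj))
            and any(a < b for a, b in zip(pi_max, cj)))


def bit_string(tuples, ppd, dims):
    """One bit per partition: 1 iff non-empty and not dominated (assumed rule)."""
    non_empty = {cell_of(t, ppd) for t in tuples}
    bits = []
    # product() enumerates cells in the same order as the grid numbering:
    # (0,0)=0, (0,1)=1, (0,2)=2, (1,0)=3, ...
    for cell in product(range(ppd), repeat=dims):
        keep = cell in non_empty and not any(
            partition_dominates(other, cell) for other in non_empty)
        bits.append("1" if keep else "0")
    return "".join(bits)


points = [(0.1, 0.5), (0.2, 0.9), (0.5, 0.1), (0.5, 0.5), (0.9, 0.2), (0.9, 0.9)]
print(bit_string(points, ppd=3, dims=2))  # 011110100
```

The point (0.9, 0.9) falls into partition 8, which is non-empty but dominated by partition 1, so its bit stays 0.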

Page 7: Efficient Skyline Computation in MapReduce

Bit String Generation

Page 8: Efficient Skyline Computation in MapReduce

Determining Partitions Per Dimension (PPD)

• PPD is too high → very few tuples in each partition and too many partitions

• PPD is too low → too many tuples in each partition and less effective pruning

• Idea: generate bit strings for PPD from 2 to

– then choose the one with the most desirable number of tuples per partition
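The selection step might look like the sketch below. It is only one reading of the slide: the upper bound of the PPD range is missing from the transcript, so the candidate range is passed in by the caller, and "most desirable" is interpreted as "closest to a caller-supplied target" for the average number of tuples per non-empty partition. The names `choose_ppd` and `target` are hypothetical.

```python
from collections import Counter


def tuples_per_partition(tuples, ppd):
    """Average number of tuples per non-empty partition (values assumed in [0, 1))."""
    counts = Counter(tuple(min(int(v * ppd), ppd - 1) for v in t) for t in tuples)
    return len(tuples) / len(counts)


def choose_ppd(tuples, candidates, target):
    """Pick the candidate PPD whose average partition size is closest to target."""
    return min(candidates,
               key=lambda ppd: abs(tuples_per_partition(tuples, ppd) - target))


# 256 evenly spread 2-D points, used purely for illustration.
data = [(i / 16 + 1 / 32, j / 16 + 1 / 32) for i in range(16) for j in range(16)]
print(choose_ppd(data, candidates=range(2, 9), target=16))  # 4
```

With PPD = 4 the 256 points fall into 16 partitions of 16 tuples each, which matches the target exactly.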

Page 9: Efficient Skyline Computation in MapReduce

Single Reducer

Page 10: Efficient Skyline Computation in MapReduce

Multi-Reducer

• The single reducer still performs significant work to detect the global skyline, which limits the degree of parallelization

• Idea: independent partition group
– Anti-Dominating Region (ADR):

– Independent Partition Group: A set of partitions Pi is an IPG iff holds

– One reducer is responsible for each IPG.

Page 11: Efficient Skyline Computation in MapReduce

Multi-Reducer

Page 12: Efficient Skyline Computation in MapReduce

Generation of I.P.G.

• Idea: a partition pm is a maximum partition iff ∀p, pm ∉ p.ADR

• Procedure:
1. Find a maximum partition pm
2. Generate IPG = {pm} ∪ pm.ADR
3. Remove pm and repeat from step 1
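The procedure can be sketched as follows. Two things are assumptions, because the formulas are missing from the transcript: the ADR of a partition p is taken to be the set of other surviving partitions whose cell coordinates are not larger than p's in any dimension, and a partition that is already covered by an earlier group does not start a group of its own. Partitions are given as cell coordinates.

```python
def adr(p, partitions):
    """Assumed ADR: other partitions that are <= p in every dimension."""
    return {q for q in partitions if q != p and all(a <= b for a, b in zip(q, p))}


def generate_ipgs(partitions):
    """Repeatedly pick a maximum partition pm and emit {pm} ∪ pm.ADR."""
    adrs = {p: adr(p, partitions) for p in partitions}
    remaining = set(partitions)
    covered = set()
    groups = []
    while remaining:
        # Step 1: a maximum partition lies in no remaining partition's ADR.
        pm = min(p for p in remaining
                 if not any(p in adrs[q] for q in remaining))
        # Step 2: build the group, unless pm is already covered (assumption).
        if pm not in covered:
            group = {pm} | adrs[pm]
            groups.append(group)
            covered |= group
        # Step 3: remove pm and repeat.
        remaining.remove(pm)
    return groups


# Surviving partitions 1, 2, 3, 4 and 6 of the 3×3 grid, as (x, y) cells.
cells = [(0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
for g in generate_ipgs(cells):
    print(sorted(g))
```

For this input the sketch yields three groups: {(0, 1), (0, 2)}, {(0, 1), (1, 0), (1, 1)} and {(1, 0), (2, 0)}. Partitions (0, 1) and (1, 0) each appear in two groups.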

Page 13: Efficient Skyline Computation in MapReduce

Implementation Issues

• More independent groups than reducers
– Need to allocate them to the reducers; two options:
1. Load balancing
2. Minimizing duplicate data transmission

• Elimination of duplicated skyline outputs
– A grid partition appears in multiple IPGs
– Designate one IPG as the responsible group
• Load balancing
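Both issues can be illustrated with simple greedy heuristics. These are stand-ins chosen for the sketch, not the strategies from the talk: groups are assigned largest-first to the least-loaded reducer, and each partition is made the responsibility of whichever containing group currently owns the fewest partitions. The load figures and function names are hypothetical.

```python
def assign_to_reducers(group_loads, num_reducers):
    """Greedy load balancing: largest group first, to the least-loaded reducer."""
    reducers = [[] for _ in range(num_reducers)]
    totals = [0] * num_reducers
    order = sorted(range(len(group_loads)),
                   key=lambda g: group_loads[g], reverse=True)
    for gid in order:
        r = totals.index(min(totals))
        reducers[r].append(gid)
        totals[r] += group_loads[gid]
    return reducers


def designate_responsible(groups):
    """Map each partition to exactly one group that outputs its skyline tuples."""
    owned = [0] * len(groups)
    responsible = {}
    for p in sorted({p for g in groups for p in g}):
        containing = [i for i, g in enumerate(groups) if p in g]
        gid = min(containing, key=lambda i: owned[i])
        responsible[p] = gid
        owned[gid] += 1
    return responsible


print(assign_to_reducers([5, 3, 3, 2], num_reducers=2))  # [[0, 3], [1, 2]]

# Groups given by partition number; partitions 1 and 3 appear twice.
groups = [{1, 2}, {1, 3, 4}, {3, 6}]
print(designate_responsible(groups))  # {1: 0, 2: 0, 3: 1, 4: 1, 6: 2}
```

A reducer then emits a skyline tuple only if the tuple's partition is assigned to the group it is processing, so each tuple is output once.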

Page 14: Efficient Skyline Computation in MapReduce

Experimental Setup

• 13 commodity machines
• Datasets with independent and anti-correlated distributions
• Comparisons:
– MR-BNL
– MR-Angle

Page 15: Efficient Skyline Computation in MapReduce

#Dimensions

Independent data, cardinality: 1×10^5

Page 16: Efficient Skyline Computation in MapReduce

#Dimensions

Anti-correlated data, cardinality: 1×10^5

Page 17: Efficient Skyline Computation in MapReduce

Cardinality (independent data)

Dimensions: 3 Dimensions: 8

Page 18: Efficient Skyline Computation in MapReduce

Cardinality (Anti-corr. data)

Dimensions: 3 Dimensions: 8

Page 19: Efficient Skyline Computation in MapReduce

Number of Reducers

Page 20: Efficient Skyline Computation in MapReduce

Summary

• Grid partitioning and bit strings
– Choose an appropriate number of partitions per dimension (PPD)

• Exploit independent groups to enable multiple reducers
– Good for cases with a large number of skyline tuples
– Merging independent groups
– Eliminate duplicate outputs
