
Performance Tuning on Multicore Systems for Feature Matching within Image Collections

Xiaoxin Tang*, Steven Mills, David Eyers, Zhiyi Huang, Kai-Cheung Leung and Minyi Guo*

Department of Computer Science, University of Otago, New Zealand

* Department of Computer Science, Shanghai Jiao Tong University, China

Contents

• Motivation
• Our work
• Evaluation
• Conclusion


Similarity Search

• Definition:
  – To preprocess a database of N objects so that, given a query object, one can effectively determine its nearest neighbors in the database.
• Applications:
  – Pattern recognition, chemical similarity analysis, statistical classification, etc.

The problem – KNN Search

• K Nearest Neighbor Search:
  – Feature: an array of D elements
    • $f = [e_1, e_2, \ldots, e_D]$
  – Feature Space: a set of features
    • $F_s = \{f_1, f_2, \ldots, f_N\}$
  – Feature Similarity: Euclidean distance
    • $d(f_i, f_j) = \sqrt{\sum_{m=1}^{D} (f_i^m - f_j^m)^2}$
  – Search: given a query feature $f_q$, find the k features in $F_s$ that have the shortest distances to $f_q$.
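The search defined above can be sketched as an exhaustive (brute-force) scan. This is a minimal illustration in plain Python, not the implementation evaluated in the slides; the names `euclidean` and `knn_search` and the toy feature vectors are assumptions made for the example.

```python
import heapq
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_search(feature_space, query, k):
    """Return the k (distance, index) pairs closest to `query`, nearest first."""
    scored = ((euclidean(f, query), i) for i, f in enumerate(feature_space))
    # nsmallest keeps only k candidates in memory while scanning all features
    return heapq.nsmallest(k, scored)

# Toy 2-D feature space (hypothetical data, D = 2, N = 4)
feature_space = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [0.5, 0.0]]
print(knn_search(feature_space, [0.0, 0.0], k=2))  # [(0.0, 0), (0.5, 3)]
```

Every query touches all N features, which is why the cost grows quickly with the number and resolution of images.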

Our Case Study

• Feature Matching: a fundamental problem in many computer vision tasks
  – Use the SIFT algorithm to generate features for each image;
  – Use a k-Nearest Neighbors (k-NN) algorithm to find similar features between images.

Challenges

• Very time-consuming:
  – Datasets are becoming larger:
    • hundreds or thousands of images;
  – Image resolution is increasing:
    • 2300×1500 pixels, or higher.
• New platforms: HPC is moving into the multi-/many-core era:
  • AMD 16-core and 64-core machines.

Motivation

• Performance evaluation:
  – Identify common problems that may limit the performance of feature matching on multi-/many-core platforms.
• Performance tuning:
  – Find general methods to solve the identified problems.


Data Distribution

[Figure: number of images and total number of features per feature size range. X-axis: feature size range (ticks at 10000, 20000, 30000, 40000); y-axes: number of images (0 to 30) and total number of features (0 to 700000). Data labels, in extraction order: images 26, 26, 26, 3; features 181124, 420008, 660949, 146180.]

Data Size

[Figure: data size, kd-tree size, and total size in MB (y-axis, 0 to 45) for each image id (x-axis, 0 to 80).]

Problems

• Unbalanced workload:
  – Levels of parallelism;
  – Scheduling policy.
• Poor last-level cache utilization:
  – Memory architecture.

Levels of parallelism

[Diagram: levels of parallelism across reference images, query images, and features, labeled Level_1, Level_2, Level_3, Level_4, and Level_1&2. Search algorithms shown: Linear, KD-tree, Kmeans, LSH, Others.]

Scheduling policy

• OpenMP scheduling policy:
  – Static: the scheduler assigns an equal number of tasks to each thread (not used);
  – Dynamic: when a thread finishes its current task, it takes new tasks from the global task queue;
  – Guided: the chunk size is adjusted dynamically as tasks are requested from the task queue.
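To see why the policy matters for an unbalanced workload, the three policies can be compared with a small simulation. This is a simplified model written in Python, not OpenMP itself: it assumes known per-task costs, zero scheduling overhead, and a guided chunk size of ceil(remaining / threads). The function names and the toy cost list are hypothetical.

```python
import heapq
import math

def static_makespan(costs, n_threads):
    """Static: contiguous blocks with an (almost) equal task count per thread."""
    block = math.ceil(len(costs) / n_threads)
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))

def queue_makespan(costs, n_threads, next_chunk):
    """Dynamic/guided: the earliest idle thread pulls the next chunk from a shared queue."""
    free_at = [0] * n_threads          # min-heap of times at which threads become idle
    heapq.heapify(free_at)
    pos = 0
    while pos < len(costs):
        size = next_chunk(len(costs) - pos)
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + sum(costs[pos:pos + size]))
        pos += size
    return max(free_at)

def dynamic(chunk):
    """Fixed chunk size per request."""
    return lambda remaining: min(chunk, remaining)

def guided(n_threads, min_chunk=1):
    """Chunk size shrinks as the queue drains (assumed rule: remaining / threads)."""
    return lambda remaining: min(remaining, max(min_chunk, math.ceil(remaining / n_threads)))

# Two expensive tasks followed by six cheap ones, on 4 threads
costs = [10, 10, 1, 1, 1, 1, 1, 1]
print("static :", static_makespan(costs, 4))
print("dynamic:", queue_makespan(costs, 4, dynamic(1)))
print("guided :", queue_makespan(costs, 4, guided(4)))
```

In this toy workload the two expensive tasks land in the same chunk under the static and guided models, while dynamic scheduling with a chunk size of 1 spreads them over two threads and halves the finishing time.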

Memory architecture

• More cores share the memory and the last-level cache:
  – Memory bandwidth:
    • AMD 16-core: 12.8 GB/s
    • AMD 64-core: 25.6 GB/s
  – Last-level cache:
    • AMD 16-core: 6 MB
    • AMD 64-core: 16 MB
• Large images may not fit in the cache and cause many memory accesses, so execution hits the memory wall.
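A rough back-of-the-envelope check makes the cache pressure concrete. The sketch below assumes 128-dimensional SIFT descriptors stored as 32-bit floats and ignores the kd-tree index, the query data, and the fact that the cache is shared between cores; the feature counts are illustrative values, and the helper names are hypothetical.

```python
import math

SIFT_DIMS = 128          # standard SIFT descriptor length
BYTES_PER_ELEMENT = 4    # assumption: descriptors stored as 32-bit floats

def descriptor_bytes(n_features):
    """Memory taken by the raw descriptors of one image."""
    return n_features * SIFT_DIMS * BYTES_PER_ELEMENT

def min_subspaces(n_features, cache_bytes):
    """Smallest number of equal parts such that each part's descriptors fit in the cache."""
    return max(1, math.ceil(descriptor_bytes(n_features) / cache_bytes))

# Last-level cache sizes from the slide, interpreted as MiB
LLC = {"AMD 16-core": 6 * 1024**2, "AMD 64-core": 16 * 1024**2}

for name, cache in LLC.items():
    for n in (10_000, 40_000):
        mb = descriptor_bytes(n) / 1e6
        print(f"{name}: {n} features = {mb:.1f} MB, parts needed: {min_subspaces(n, cache)}")
```

Under these assumptions an image with 40,000 features needs about 20 MB for its descriptors alone, more than either machine's last-level cache.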

Divide-and-Merge

• We propose Divide-and-Merge:
  – The whole feature space is split into several smaller sub-spaces;
  – Each sub-space is searched independently;
  – Their results are merged.
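The three steps can be sketched on top of a brute-force search as follows. This is only an illustration under stated assumptions: the slides do not say how the sub-spaces are chosen, so a contiguous equal-size split is assumed here, the sub-spaces are searched sequentially rather than in parallel, and all function names are hypothetical.

```python
import heapq
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_brute_force(features, query, k, offset=0):
    """k nearest (distance, global_index) pairs within one (sub-)space."""
    scored = ((euclidean(f, query), offset + i) for i, f in enumerate(features))
    return heapq.nsmallest(k, scored)

def divide(features, n_parts):
    """Divide: split into contiguous sub-spaces, returned as (offset, sub_space) pairs."""
    size = max(1, math.ceil(len(features) / n_parts))
    return [(start, features[start:start + size])
            for start in range(0, len(features), size)]

def divide_and_merge_knn(features, query, k, n_parts):
    partial = []
    for offset, sub_space in divide(features, n_parts):
        # Search: each sub-space is small enough to stay cache-resident
        partial.extend(knn_brute_force(sub_space, query, k, offset))
    # Merge: the global top-k is contained in the union of the per-part top-k lists
    return heapq.nsmallest(k, partial)

rng = random.Random(0)
features = [[rng.random() for _ in range(8)] for _ in range(200)]
query = [rng.random() for _ in range(8)]
same = divide_and_merge_knn(features, query, 5, n_parts=4) == knn_brute_force(features, query, 5)
print("matches single-pass search:", same)
```

With an exact search inside each sub-space, merging the per-part results gives the same answer as a single pass, because every global nearest neighbor is also among the k nearest of its own sub-space.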


Time complexity

• Accurate algorithms:
  – Brute force: [formula not preserved]
  – Apply DM: [formula not preserved]
• Approximate algorithms:
  – Randomized KD-Tree: [formula not preserved]
  – Apply DM: [formula not preserved]
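The formulas on this slide did not survive in the transcript. As a hedged stand-in for the brute-force case only, a standard per-query analysis is given below; it is not taken from the slides. It assumes N features of dimension D, k requested neighbors, heap-based top-k selection, and s equal-size sub-spaces (the symbol s is introduced here for illustration).

```latex
\begin{align*}
T_{\text{brute}}(N) &= O(N D + N \log k) \\
T_{\text{DM}}(N, s) &= s \cdot O\!\left(\tfrac{N}{s} D + \tfrac{N}{s} \log k\right) + O(s k \log k) \\
                    &= O(N D + N \log k + s k \log k)
\end{align*}
```

Under these assumptions, splitting leaves the total amount of distance computation unchanged and adds only a merge term that is small when s k is much smaller than N, so any benefit for the exact search would come from better cache behavior rather than from less work.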


Hardware and Software configuration

| Name | CPU | Cache | Memory | OS | Compiler |
|---|---|---|---|---|---|
| AMD 16-core (AMD16) | AMD Opteron Processor 8380, 4 cores × 4 @ 2.5 GHz | L1: 128 KB, L2: 512 KB, L3: 6144 KB | 16 GiB, DDR2 800 MHz, 12.8 GB/s | Ubuntu 12.04.1 | g++-4.4 |
| AMD 64-core (AMD64) | AMD Opteron Processor 6276, 8 cores × 8 @ 2.3 GHz | L1: 48 KB, L2: 1000 KB, L3: 16384 KB | 64 GiB, DDR3 1333 MHz, 21.32 GB/s | Ubuntu 12.04.1 | g++-4.4 |

Environment: OpenCV + OpenMP, one of the setups most frequently used by computer vision researchers to exploit parallel platforms.

Levels of parallelism

[Figure: scalability (y-axis, 0 to 12) for x-axis values 1 to 16, comparing Level_1, Level_2, Level_3, and Level_1&2.]

Scheduling policy (on level_1&2)

[Figure: scalability (y-axis, 0 to 12) for x-axis values 1 to 16, comparing d1, d2, d4, and guided.]

Scheduling policy (on level_3)

[Figure: scalability (y-axis, 0 to 14) for x-axis values 1 to 16, comparing d1, d2, d4, and guided.]


Memory architecture

1. Original Execution

2. Apply Divide-and-Merge

Evaluation on Manawatu Dataset

[Figure, two panels comparing Level_3, Level_3_DM, Level_1&2, and Level_1&2_DM for x-axis values 1 to 64: Scalability (y-axis, 0 to 50) and Speedup (y-axis, 0 to 25).]

Evaluation on Manawatu Dataset

[Figure, two panels comparing Level_3, Level_3_DM, Level_1&2, and Level_1&2_DM for x-axis values 1 to 64: Scalability (y-axis, 0 to 50) and Speedup (y-axis, 0 to 14).]



Conclusion

• We have shown that performance tuning is demanding on modern multicore systems.

• We have comprehensively evaluated the impact of three factors (levels of parallelism, scheduling policy, and memory architecture) that influence large-scale image feature matching.

• We have proposed a Divide-and-Merge algorithm that can greatly improve the speedup and scalability of feature matching algorithms on multicore machines.