improve run generation overlap input,output, and internal cpu work. reduce the number of runs...

31
Improve Run Generation Overlap input,output, and internal CPU work. Reduce the number of runs (equivalently, increase average run length). DISK MEMORY DISK

Post on 22-Dec-2015

221 views

Category:

Documents


1 download

TRANSCRIPT

Improve Run Generation• Overlap input,output, and internal CPU work.• Reduce the number of runs (equivalently, increase average

run length).

DISK

MEMORY

DISK

Internal Quick Sort6 2 8 5 11 10 4 1 9 7 3

Use 6 as the pivot (median of 3).

Input first, middle, and last blocks first.

In-place partitioning.

Input blocks from the ends toward the middle.

Sort left and right groups recursively.

Can begin output as soon as left most block is ready.

4 2 3 5 1 6 10 11 9 7 8

Alternative Internal Sort Scheme

DISK

DISK

B1 B2 B3

Partition into 3 buffers.

Steady State Operation

Read from disk

Write to disk

Run generation

• Synchronization is done when the active input buffer gets empty (the active output buffer will be full at this time).

DISK

MEMORY

DISK

New Strategy

• Use 2 input and 2 output buffers.

• Rest of memory is used for a min loser tree.

Input 1Input 0

Output 0 Output 1

Loser Tree

Steady State Operation

Read from disk

Write to disk

Run generation

• Synchronization is done when the active input buffer gets empty (the active output buffer will be full at this time).

4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8

4

3

8

O0 O1

I0 I1

Initialize

4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8

4

6

8

3

5

1

7

Initialize

O0 O1

I0 I1

4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8

4

6

8

3

5

3

7

1

6

2

9

Initialize

O0 O1

I0 I1

4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8

4

6

8

3

5

3

7

2

5

2

8

1

6

4

9

Initialize

O0 O1

I0 I1

4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8

4

6

8

3

5

3

7

2

5

5

8

1

6

4

9

Initialize

O0 O1

I0 I1

4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8

4

6

8

3

5

3

7

2

5

5

8

2

6

4

9

Initialize

O0 O1

I0 I1

Generate Run 1

14 3 6 8 5 7 3 2 6 9 4 5 2 5 8

4

6

8

3

5

3

7

2

5

5

8

2

6

4

9

O0 O1

I0 I1

3

5

4

Generate Run 1

4 3 6 8 5 7 3 2 6 9 4 5 2 5 8

4

6

8

3

5

3

7

2

5

5

8

2

6

4

9

O0 O1

I0 I1

3

5

4

1

3

3

5

O0

2

3

Generate Run 1

4 3 6 8 5 7 3 6 9 4 5 2 5 8

4

6

8

3

5

3

7

2

5

5

8

3

6

4

9

O1

I0 I1

3

5

4

1

5

4

45

O0

2

3

Generate Run 1

4 3 6 8 5 7 3 6 9 4 5

2

5 8

4

6

8

3

5

3

7

2

5

5

8

3

6

4

9

O1

I0 I1

3

5

4

1

5

4

5

O0

2

34 3 6 8 5 7 3 6 9 4 5

2

5 8

4

6

8

3

5

3

7

2

5

5

8

3

6

4

9

O1

I0 I1

1

5

4

4

1

9

2

5

O0

2

34 3 6 8 5 7 3 6 9 4 5

2

5 8

4

6

8

3

5

3

7

4

5

5

8

3

6

4

9

O1

I0 I1

1

5

4

4

1

9

2

Continue With Run 1

O1

3

45

O0

2

4 3 6 8 5 7 3 6 9 4 5

2

5 8

4

6

8

3

5

3

7

4

5

5

8

4

6

4

9

I0 I1

1

5

1

1

9

2

Continue With Run 1

1

5

1

O1

3

45

O0

2

4 3 6 8 5 7

3

6 9 4 5

2

5 8

4

6

8

3

5

3

7

4

5

5

8

4

6

4

9

I0 I1

1

5

1

1

9

2

Continue With Run 1

5

9

9

5

7

91

O1

3

45

O0

2

4

3

6 8 5 7

3

6 9 4 5

2

5 8

4

6

8

3

5

3

7

4

5

5

8

4

6

4

9

I0 I1

1

5

1

1

9

2

Continue With Run 1

5

9

5

7

2

91

O1

3

45

O0

4

3

6 8 5 7

3

6 9 4 5 5 8

4

6

8

3

5

3

7

4

5

5

8

4

6

4

9

I0 I1

5

1

6

1

3

5

9

5

7

2

91

O1

3

45

O0

4

3

6 8 5 7

3

6 9 4 5 5 8

4

6

8

3

5

3

7

4

5

5

8

4

6

4

9

I0 I1

5

1

6

1

3

5

9

5

7

2

Continue With Run 1

2

2 91

O1

3

45

O0

4

3

6 8 5 7

3

6 9 4 5 5 8

4

6

8

3

5

3

7

4

5

5

8

4

6

4

9

I0 I1

5

1

6

1

3

5

9

5

7

Continue With Run 1

2

6

6

5

2 91

O1

3

45

O0

4

3

6 8 5 7

3

6 9

4

5 5 8

4

6

8

3

5

3

7

4

5

5

8

4

6

4

9

I0 I1

5

1

6

1

3

5

9

5

7

Continue With Run 1

2

6

6

5

1

1

9

5

Buffer Reduction

• We may reduce the number of buffers to 3.

• At any time, 1 us used to read into, 1 to write from, and the 3rd both feeds the loser tree and is filled from the tree.

• The 3 physical buffers rotate through these three roles.

• Let k be number of external nodes in loser tree.

• Run size >= k.

• Sorted input => 1 run.

• Reverse of sorted input => n/k runs.

• Average run size is ~2k.

• Memory capacity = m records.

• Run size using fill memory, sort, and output run scheme = m.

• Use loser tree scheme. Assume block size is b records. Need memory for 4 buffers (4b records). Loser tree k = m – 4b. Average run size = 2k = 2(m – 4b). 2k >= m when m >= 8b.

• Assume b = 100.

m 600 1000 5000 10000

k 200 600 4600 9600

2k 400 1200 9200 19200

• Total internal processing time using fill memory, sort, and output run scheme = O((n/m) m log m) = O(n log m).

• Total internal processing time using loser tree = O(n log k).

• Loser tree scheme generates runs that differ in their lengths.

4 3 6 9

Merging Runs Of Different Length

4 3

6

9

7 15

22

7

13

22