improve run generation overlap input,output, and internal cpu work. reduce the number of runs...
Post on 22-Dec-2015
221 views
TRANSCRIPT
Improve Run Generation• Overlap input,output, and internal CPU work.• Reduce the number of runs (equivalently, increase average
run length).
DISK
MEMORY
DISK
Internal Quick Sort6 2 8 5 11 10 4 1 9 7 3
Use 6 as the pivot (median of 3).
Input first, middle, and last blocks first.
In-place partitioning.
Input blocks from the ends toward the middle.
Sort left and right groups recursively.
Can begin output as soon as left most block is ready.
4 2 3 5 1 6 10 11 9 7 8
Steady State Operation
Read from disk
Write to disk
Run generation
• Synchronization is done when the active input buffer gets empty (the active output buffer will be full at this time).
DISK
MEMORY
DISK
New Strategy
• Use 2 input and 2 output buffers.
• Rest of memory is used for a min loser tree.
Input 1Input 0
Output 0 Output 1
Loser Tree
Steady State Operation
Read from disk
Write to disk
Run generation
• Synchronization is done when the active input buffer gets empty (the active output buffer will be full at this time).
5
O0
2
3
Generate Run 1
4 3 6 8 5 7 3 6 9 4 5 2 5 8
4
6
8
3
5
3
7
2
5
5
8
3
6
4
9
O1
I0 I1
3
5
4
1
5
4
45
O0
2
3
Generate Run 1
4 3 6 8 5 7 3 6 9 4 5
2
5 8
4
6
8
3
5
3
7
2
5
5
8
3
6
4
9
O1
I0 I1
3
5
4
1
5
4
5
O0
2
34 3 6 8 5 7 3 6 9 4 5
2
5 8
4
6
8
3
5
3
7
4
5
5
8
3
6
4
9
O1
I0 I1
1
5
4
4
1
9
2
Continue With Run 1
O1
3
45
O0
2
4 3 6 8 5 7 3 6 9 4 5
2
5 8
4
6
8
3
5
3
7
4
5
5
8
4
6
4
9
I0 I1
1
5
1
1
9
2
Continue With Run 1
1
5
1
O1
3
45
O0
2
4 3 6 8 5 7
3
6 9 4 5
2
5 8
4
6
8
3
5
3
7
4
5
5
8
4
6
4
9
I0 I1
1
5
1
1
9
2
Continue With Run 1
5
9
9
5
7
91
O1
3
45
O0
2
4
3
6 8 5 7
3
6 9 4 5
2
5 8
4
6
8
3
5
3
7
4
5
5
8
4
6
4
9
I0 I1
1
5
1
1
9
2
Continue With Run 1
5
9
5
7
2
91
O1
3
45
O0
4
3
6 8 5 7
3
6 9 4 5 5 8
4
6
8
3
5
3
7
4
5
5
8
4
6
4
9
I0 I1
5
1
6
1
3
5
9
5
7
2
Continue With Run 1
2
2 91
O1
3
45
O0
4
3
6 8 5 7
3
6 9 4 5 5 8
4
6
8
3
5
3
7
4
5
5
8
4
6
4
9
I0 I1
5
1
6
1
3
5
9
5
7
Continue With Run 1
2
6
6
5
2 91
O1
3
45
O0
4
3
6 8 5 7
3
6 9
4
5 5 8
4
6
8
3
5
3
7
4
5
5
8
4
6
4
9
I0 I1
5
1
6
1
3
5
9
5
7
Continue With Run 1
2
6
6
5
1
1
9
5
Buffer Reduction
• We may reduce the number of buffers to 3.
• At any time, 1 us used to read into, 1 to write from, and the 3rd both feeds the loser tree and is filled from the tree.
• The 3 physical buffers rotate through these three roles.
• Let k be number of external nodes in loser tree.
• Run size >= k.
• Sorted input => 1 run.
• Reverse of sorted input => n/k runs.
• Average run size is ~2k.
• Memory capacity = m records.
• Run size using fill memory, sort, and output run scheme = m.
• Use loser tree scheme. Assume block size is b records. Need memory for 4 buffers (4b records). Loser tree k = m – 4b. Average run size = 2k = 2(m – 4b). 2k >= m when m >= 8b.
• Total internal processing time using fill memory, sort, and output run scheme = O((n/m) m log m) = O(n log m).
• Total internal processing time using loser tree = O(n log k).
• Loser tree scheme generates runs that differ in their lengths.