Adaptive Sorting
“A Dynamically Tuned Sorting Library”
“Optimizing Sorting with Genetic Algorithms”
By Xiaoming Li, Maria Jesus Garzaran, and David Padua
Presented by Anton Morozov
Motivations and Observations
• Success of ATLAS, FFTW and SPIRAL (self-tuning linear algebra and signal processing libraries)
What can be done for sorting?
Why are we interested in sorting algorithms?
Does this reflect the performance of the sorting algorithms?
Which additional factors influence the performance of a sorting algorithm?
Performance vs. Standard Deviation
Observation: Quicksort and Merge Sort are both comparison-based sorts, so their performance is independent of the chosen distribution or standard deviation.
Performance depends on the degree of sortedness, i.e. the number of inversions (maximum n(n-1)/2).
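The degree of sortedness mentioned above can be measured by counting inversions, i.e. pairs (i, j) with i < j and a[i] > a[j]. The following is a minimal sketch using a merge-based count; the function name count_inversions is illustrative and does not come from the slides.

```python
def count_inversions(keys):
    """Return (sorted_copy, number_of_inversions) using a merge-sort style count."""
    n = len(keys)
    if n <= 1:
        return list(keys), 0
    mid = n // 2
    left, inv_left = count_inversions(keys[:mid])
    right, inv_right = count_inversions(keys[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left
            merged.append(right[j])
            inversions += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


n = 6
print(count_inversions(list(range(n)))[1])         # already sorted input
print(count_inversions(list(range(n, 0, -1)))[1])  # reversed input reaches n(n-1)/2
```

A reversed array gives the maximum count, which is why it is the extreme case for sortedness-sensitive algorithms such as insertion sort.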
Architectural Model and Empirical Search
• We saw how programs like BLAS and ATLAS use search to establish the parameters of the underlying architecture
So which sorting algorithm is better?
• What does the performance of a sorting algorithm depend on?
• How to choose the best sorting algorithm?
Sorting algorithms
• QuickSort
• Radix Sort
• Cache-Conscious Radix sort
• Merge Sort
• Multiway Merge Sort
• Insertion Sort
• Sorting Networks
• Heap Sort
Register sorts: Insertion Sort and Sorting Networks
Quick Sort
Description: Pick a pivot and partition the records around it: records smaller than the pivot go to the front, larger ones go to the back, and the pivot is placed between them.
Improvements:
• Proceed iteratively rather than recursively
• Choose the pivot among the first, middle and last keys
• Use fast sorts for the small partitions (insertion sort or sorting networks)
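The three improvements can be combined as in the sketch below. It is an illustration only: the cutoff of 16 keys is an assumed value rather than a tuned one, and insertion sort stands in for the fast sort used on small partitions.

```python
def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place (inclusive bounds)."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    return sorted([lo, mid, hi], key=lambda k: a[k])[1]


def quicksort(a, cutoff=16):
    """Iterative quicksort; cutoff=16 is an assumed threshold, not a tuned value."""
    stack = [(0, len(a) - 1)]            # explicit stack replaces recursion
    while stack:
        lo, hi = stack.pop()
        if hi - lo + 1 <= cutoff:
            insertion_sort(a, lo, hi)    # small partition: use the simple sort
            continue
        p = median_of_three(a, lo, hi)
        a[p], a[hi] = a[hi], a[p]        # park the pivot at the end
        pivot, store = a[hi], lo
        for k in range(lo, hi):
            if a[k] < pivot:
                a[k], a[store] = a[store], a[k]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot goes between the two groups
        stack.append((lo, store - 1))
        stack.append((store + 1, hi))
    return a
```

The explicit stack avoids deep recursion on unlucky inputs, and the cutoff is the kind of parameter an empirical search could tune per machine.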
Cache-Conscious Radix Sort
For b-bit integer keys and a radix of size 2^r, the algorithm first sorts by the lowest r bits, then by the next r bits, and so on, for a total of b/r phases. r is chosen so that r ≤ log2(S_TLB) - 1, where S_TLB is the number of entries in the translation look-aside buffer.
Improvements:
• Proceed iteratively
• Compute the histogram of each group of r bits the first time the sort is applied
• Choose r as described above
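A minimal sketch of the digit passes follows. It shows the choice of r and the histograms built in one initial pass, but it is a plain least-significant-digit radix sort and does not reproduce the cache partitioning of the full CC-radix algorithm. The names choose_radix_bits and radix_sort and the value tlb_entries=64 are assumptions for illustration.

```python
import math


def choose_radix_bits(tlb_entries):
    """Largest r with r <= log2(tlb_entries) - 1, and at least 1."""
    return max(1, int(math.log2(tlb_entries)) - 1)


def radix_sort(keys, key_bits=32, tlb_entries=64):
    """LSD radix sort of non-negative integers below 2**key_bits."""
    r = choose_radix_bits(tlb_entries)
    buckets, mask = 1 << r, (1 << r) - 1
    phases = -(-key_bits // r)                 # ceil(key_bits / r)

    # One pass over the input builds the histograms of all digits.
    hist = [[0] * buckets for _ in range(phases)]
    for k in keys:
        for p in range(phases):
            hist[p][(k >> (p * r)) & mask] += 1

    src = list(keys)
    for p in range(phases):
        start, total = [], 0
        for count in hist[p]:                  # first slot of each bucket
            start.append(total)
            total += count
        dst = [0] * len(src)
        for k in src:                          # stable copy into the buckets
            d = (k >> (p * r)) & mask
            dst[start[d]] = k
            start[d] += 1
        src = dst
    return src
```

Because a digit histogram does not depend on the order of the keys, the counts gathered in the first pass remain valid for every later phase.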
Multiway Merge Sort
It partitions the keys into p subsets; each subset is sorted (in this case with CC-radix sort), and the subsets are then merged using a heap. First, the smallest (or largest) element of each subset is placed in a leaf of the heap; then leaves are compared and the appropriate one is promoted.
• The heap contains 2*p-1 nodes.
• Each parent in the heap has A/r children, where A is the cache line size and r is the size of a node.
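The merge phase can be sketched with Python's heapq module. Two simplifications are assumed here: sorted() stands in for the per-subset CC-radix sort, and heapq is a binary heap, so the cache-line-sized fan-out A/r is not modelled. The function name and p=4 are illustrative.

```python
import heapq


def multiway_merge_sort(keys, p=4):
    """Split keys into about p subsets, sort each, then merge them with a heap."""
    if not keys:
        return []
    size = -(-len(keys) // p)                    # ceiling division
    runs = [sorted(keys[i:i + size]) for i in range(0, len(keys), size)]

    # Seed the heap with the smallest element of every run.
    heap = [(run[0], r, 0) for r, run in enumerate(runs)]
    heapq.heapify(heap)

    out = []
    while heap:
        value, r, pos = heapq.heappop(heap)      # smallest remaining key
        out.append(value)
        if pos + 1 < len(runs[r]):               # refill from the same run
            heapq.heappush(heap, (runs[r][pos + 1], r, pos + 1))
    return out
```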
Insertion Sort
Used for small data sizes.
Working from left to right, the algorithm scans to the left of each key and places it in the appropriate position.
Sorting Networks
The algorithm compares pairs of inputs in a fixed sequence and swaps them if the first is bigger than the second.
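As a small illustration, a network for four inputs can be written as a fixed list of compare-and-swap pairs. The five-comparator sequence below is a standard one for four inputs; the names NETWORK_4 and sort4 are made up for this sketch.

```python
# Comparator sequence of a 5-comparator sorting network for 4 inputs.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort4(values):
    """Sort exactly four values with a fixed compare-and-swap sequence."""
    a = list(values)
    if len(a) != 4:
        raise ValueError("this network handles exactly 4 inputs")
    for i, j in NETWORK_4:
        if a[i] > a[j]:            # compare the pair, swap if out of order
            a[i], a[j] = a[j], a[i]
    return a
```

The sequence of comparisons never depends on the data, which is what makes such networks attractive for keys held in registers.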
Input Data Factors
• Number of keys
• Distribution
• Standard deviation
• …
Approximate S.D. with Entropy vector
∑_i -P_i * log2(P_i), where P_i = c_i / N and c_i is the number of keys with value i in that digit
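The entropy vector has one entry per digit of the key. A minimal sketch of the computation, assuming 32-bit keys split into 8-bit digits (both values are illustrative, as is the function name):

```python
import math
from collections import Counter


def entropy_vector(keys, key_bits=32, r=8):
    """Entropy of each r-bit digit: sum_i -P_i * log2(P_i), with P_i = c_i / N."""
    n = len(keys)
    mask = (1 << r) - 1
    vector = []
    for shift in range(0, key_bits, r):
        counts = Counter((k >> shift) & mask for k in keys)   # c_i per digit value
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        vector.append(h)
    return vector
```

A digit whose values are spread evenly has high entropy, while a digit that is the same in every key has entropy zero.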
Learning procedure
Winnow algorithm: ∑_i w_i * E_i > Θ
Computes the weight vector and threshold from the entropy vector
(N, E) → {CC-radix, Multiway Merge(N, E), Quicksort}
Sample the input array (every fourth entry)
Compute the entropy vector
Compute S = ∑_i w_i * entropy_i
If S ≥ Θ
    choose CC-radix
else
    choose others based on size of input
    (either Merge Sort or QuickSort)
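The selection steps above can be sketched as follows. The weights, the threshold theta and the size cutoff are hypothetical inputs that would come from the learning phase, and the rule that maps the input size to Merge Sort or Quicksort is an assumption, since the slides do not state it.

```python
import math
from collections import Counter


def digit_entropies(sample, key_bits=32, r=8):
    """Entropy of each r-bit digit of the sampled keys."""
    n, mask = len(sample), (1 << r) - 1
    result = []
    for shift in range(0, key_bits, r):
        counts = Counter((k >> shift) & mask for k in sample)
        result.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return result


def select_sort(keys, weights, theta, size_cutoff):
    """Return the name of the algorithm to run; all parameters are hypothetical."""
    sample = keys[::4]                                  # every fourth entry
    entropy = digit_entropies(sample)
    s = sum(w * e for w, e in zip(weights, entropy))    # S = sum_i w_i * entropy_i
    if s >= theta:
        return "cc-radix"
    # Which side of the cutoff maps to which algorithm is an assumption here.
    return "multiway-merge" if len(keys) >= size_cutoff else "quicksort"
```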
Selection at run time
Runtime Factors:
• Distribution shape of the data
• Amount of data to sort
• Distribution width
Architectural Factors:
• Cache / TLB size
• Number of registers
• Cache line size
Summarize
Empirical Search
Any, since it doesn’t matter
Learn at installation time
Performance Results
Is it possible to do better?
Sorting Primitives
New sorting algorithms are built from sorting and selection primitives.
• Sorting primitive: a pure sorting algorithm, like those looked at before
• Selection primitive: a process executed at run time to decide which sorting algorithm to apply
Sorting Primitives
• Divide-by-Value (DV): corresponds to the first phase of Quicksort; takes the number of pivots np as a parameter and produces np+1 partitions
- A step in Quicksort
- Select one or multiple pivots and sort the input array around these pivots
• Divide-by-Position (DP): corresponds to the initial break of Merge Sort; takes the size of each partition and the fan-out of the heap as parameters
- Divide input into same-size sub-partitions
- Use heap to merge the multiple sorted sub-partitions
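A rough sketch of Divide-by-Value with several pivots is shown below. The way pivots are picked (evenly spaced positions of a small sorted sample) is an assumption made for the example; the slides do not say how pivots are selected. The function name and np_pivots default are also illustrative.

```python
import bisect


def divide_by_value(keys, np_pivots=3):
    """Split keys into np_pivots + 1 partitions around sorted pivots."""
    # Assumed pivot rule: evenly spaced elements of a sorted sample.
    sample = sorted(keys[:: max(1, len(keys) // 64)])
    step = len(sample) / (np_pivots + 1)
    pivots = [sample[int(step * (k + 1))] for k in range(np_pivots)] if sample else []

    partitions = [[] for _ in range(len(pivots) + 1)]
    for key in keys:
        # bisect_right: keys equal to a pivot go to the partition on its right
        partitions[bisect.bisect_right(pivots, key)].append(key)
    return pivots, partitions
```

Every key in one partition is no larger than any key in the next, so sorting the partitions independently and concatenating them yields a sorted array.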
Sorting Primitives
• Divide-by-Radix (DR): corresponds to a step in the radix sort algorithm.
Parameter: radix (r bits)
Step 1: Scan the input to get the distribution array, which records how many elements fall in each of the 2^r sub-partitions.
Step 2: Compute the accumulative distribution array, which is used as the indexes when copying the input to the destination array.
Step 3: Copy the input to the 2^r sub-partitions.
[Figure: Divide-by-Radix example showing the src. array, the counter array (buckets 0 to 3), the accum. array and the dest. array.]
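The three steps of Divide-by-Radix can be sketched as below. The input array in the usage line is an arbitrary small example and is not claimed to reproduce the values of the original figure; the function name is illustrative.

```python
def divide_by_radix(src, r, shift=0):
    """One Divide-by-Radix step on the r-bit digit starting at bit `shift`."""
    mask = (1 << r) - 1
    counter = [0] * (1 << r)
    for key in src:                          # Step 1: distribution array
        counter[(key >> shift) & mask] += 1

    accum, total = [], 0
    for count in counter:                    # Step 2: running start offsets
        accum.append(total)
        total += count

    dest = [None] * len(src)
    next_slot = list(accum)
    for key in src:                          # Step 3: copy into sub-partitions
        digit = (key >> shift) & mask
        dest[next_slot[digit]] = key
        next_slot[digit] += 1
    return counter, accum, dest


counter, accum, dest = divide_by_radix([3, 0, 1, 1, 1, 2, 2, 3], r=2)
print(counter, accum, dest)
```

Steps 1 and 2 each require an extra pass before any key is moved, which is the cost the uniform-distribution variant tries to avoid.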
Sorting Primitives
• Divide-by-Radix-Assuming-Uniform-Distribution: same as above, but assumes that each bucket contains n/2^r keys.
- Step 1 and Step 2 in DR are expensive.
- If the input elements are distributed nearly evenly among the 2^r sub-partitions, the input can be copied into the destination array directly, assuming every partition has the same number of elements.
- Overhead: partition overflow
Sorting Primitives
Once the partition is small:
• Leaf-Divide-by-Value: same as DV but applied recursively to the partitions; partitions smaller than the threshold are sorted with register sorting.
• Leaf-Divide-by-Radix: same as DR but used on all remaining subsets; subsets smaller than the threshold are sorted with register sorting.
Selection Primitives
• Branch-by-Size: used to select different paths based on size
• Branch-by-Entropy: uses entropy to branch on different paths; uses Winnow for learning the weight vector
Genetic Algorithm
Crossover:
• Propagate good sub-trees
Mutation:
• Mutate the structure of the algorithm.
• Change the parameter values of primitives.
Fitness function:
• Average performance by S.D.
• Uses Rank instead of fitness.
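Selecting by rank rather than by raw fitness can be sketched as follows. The linear weighting (fastest individual gets weight n, slowest gets 1) is an assumption chosen for the example; the slides do not specify the ranking scheme, and the function names are made up.

```python
import random


def rank_weights(times):
    """Map measured run times to selection weights based on rank only."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])   # fastest first
    weights = [0] * n
    for rank, i in enumerate(order):
        weights[i] = n - rank                          # assumed linear ranking
    return weights


def select_parents(population, times, k, rng=random):
    """Pick k parents with probability proportional to rank weight."""
    return rng.choices(population, weights=rank_weights(times), k=k)
```

With ranks, one individual that is far faster than the rest cannot dominate the selection as it could with weights proportional to raw performance.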
Performance Results
Is it possible to do better?
It was observed empirically that the Branch-by-Entropy selection primitive was never used.
Classifier Sorting
Based on the idea that the performance of the algorithm in one region of the input space can be independent of the others.
i is an input characteristic string; c is a condition string made of “1”, “0” and “*” for don’t care.
Example:
Encode number of keys into 4 bits.
0000: 0~1M, 0001: 1~2M…
Number of keys = 10.5M. Encoded into “1100”
[Figure: rule table with columns Condition, Action, Fitness, Accuracy. Condition and input strings shown: 1100, 01**, 1010, 110*. Actions shown: (dr 5 (lq 1 16)), (dp 4 2 (lr 5 16)), (dv 2 (lr 6 16)); the last appears next to condition 110*.]
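Matching an input characteristic string against condition strings can be sketched as below. The rule list reuses the condition strings that survive from the slide; the helper name matches is made up for this example.

```python
def matches(condition, characteristic):
    """True if every non-'*' position of condition equals the input bit."""
    return len(condition) == len(characteristic) and all(
        c == "*" or c == bit for c, bit in zip(condition, characteristic)
    )


rules = ["1100", "01**", "1010", "110*"]
print([c for c in rules if matches(c, "1100")])
```

Several conditions can match the same input, which is where the fitness and accuracy columns would come in to pick among the matching actions.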
Experimental Results
Summary and Future Work
The work presented shows how sorting can be adapted to underlying platforms.
Potential future work:
- Figure out what went wrong or not wrong with those graphs
- Incorporate the notion of “sortedness” into sort selection
- Simplify the selection algorithm
- See if these notions can be used in a cache-oblivious way