adaptive sorting “a dynamically tuned sorting library” “optimizing sorting with genetic...

Adaptive Sorting

“A Dynamically Tuned Sorting Library”

“Optimizing Sorting with Genetic Algorithms”

By Xiaoming Li, Maria Jesus Garzaran, and David Padua

Presented by Anton Morozov

Motivations and Observations

• Success of ATLAS, FFTW and SPIRAL (linear algebra and signal processing libraries)

What can be done for sorting?

Why are we interested in sorting algorithms?

Does this reflect the performance of the sorting algorithms?

Which additional factors influence the performance of the sorting algorithm?

Performance vs. Standard Deviation

Observation: Quicksort and Merge sort are both comparison-based sorts, thus they are independent of the chosen distribution or standard deviation.

Performance depends on the degree of sortedness, i.e. the number of inversions (max $n(n-1)/2$).
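The number of inversions mentioned above can be counted in O(n log n) time with a merge-sort style pass. The following is a minimal sketch for illustration; the function name `count_inversions` is an assumption and does not come from the papers.

```python
def count_inversions(keys):
    """Return (sorted_copy, inversions) for a list of comparable keys."""
    if len(keys) <= 1:
        return list(keys), 0
    mid = len(keys) // 2
    left, inv_left = count_inversions(keys[:mid])
    right, inv_right = count_inversions(keys[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left,
            # so it forms one inversion with each of them.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


# A reversed array of n keys reaches the maximum n(n-1)/2.
n = 5
_, worst = count_inversions(list(range(n, 0, -1)))
print(worst, n * (n - 1) // 2)
```

A sorted input yields 0 inversions, so the count gives one possible numeric measure of "sortedness".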

Architectural Model and Empirical Search

• We saw how programs like BLAS and ATLAS use search to establish the parameters of the underlying architecture

So which sorting algorithm is better?

• What does the performance of a sorting algorithm depend on?
• How to choose the best sorting algorithm?

Sorting algorithms

• QuickSort

• Radix Sort

• Merge Sort

• Insertion Sort

• Sorting Networks

• Heap Sort

• QuickSort

• Radix Sort

• Merge Sort

• Insertion Sort

• Heap Sort

• QuickSort

• Radix Sort

• Cache-Conscious Radix sort

• Merge Sort

• Multiway Merge Sort

• Insertion Sort

• Heap Sort

Register sorts

Quick Sort

Description: Pick a pivot and move the records around it: records smaller than the pivot go to the front, larger ones go to the back, and the pivot is inserted between them.

Improvements:
• Proceed iteratively
• Choose the pivot among the first, middle and last keys
• Use fast sorts for the small partitions (insertion sort or sorting networks)
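The three improvements listed above can be sketched together as follows. This is an illustrative sketch, not the library's implementation: the `cutoff` value of 16 and the function names are assumptions, and insertion sort stands in for the fast small-partition sort.

```python
def insertion_sort_range(a, lo, hi):
    """Sort a[lo..hi] in place (used for small partitions)."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def quicksort(a, cutoff=16):
    """Iterative quicksort with a median-of-three pivot; sorts a in place."""
    stack = [(0, len(a) - 1)]          # explicit stack instead of recursion
    while stack:
        lo, hi = stack.pop()
        if hi - lo + 1 <= cutoff:      # small partition: hand over to a fast sort
            insertion_sort_range(a, lo, hi)
            continue
        mid = (lo + hi) // 2
        pivot = sorted((a[lo], a[mid], a[hi]))[1]   # median of first, middle, last
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        stack.append((lo, j))
        stack.append((i, hi))
    return a
```

The cutoff is exactly the kind of parameter that an empirical search could tune per platform.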

Cache-Conscious Radix Sort

Given $b$-bit integer keys and a radix of size $2^r$, the algorithm first sorts by the lowest $r$ bits, then by the next $r$ bits, and so on, for a total of $b/r$ phases, where $r$ is chosen so that $r \le \log_2 S_{TLB} - 1$ and $S_{TLB}$ is the number of entries in the translation look-aside buffer.

Improvements:
• Proceed iteratively
• Compute the histogram of each group of r bits the first time the sort is applied
• Choose r as described above
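The phase structure described above (sort by the lowest r bits, then the next r bits, for about b/r phases) can be sketched as a plain least-significant-digit radix sort. This sketch only illustrates the choice of r and the phases; it does not model the cache-conscious behavior. The names `choose_radix_bits` and `lsd_radix_sort`, the default of 64 TLB entries, and the use of Python lists as buckets are all assumptions.

```python
def choose_radix_bits(tlb_entries):
    """Largest r with r <= log2(S_TLB) - 1 (at least 1)."""
    floor_log2 = tlb_entries.bit_length() - 1
    return max(1, floor_log2 - 1)


def lsd_radix_sort(keys, key_bits=32, tlb_entries=64):
    """Sort non-negative integers below 2**key_bits, r bits per phase."""
    r = choose_radix_bits(tlb_entries)
    phases = -(-key_bits // r)          # ceil(b / r)
    mask = (1 << r) - 1
    for phase in range(phases):
        shift = phase * r
        buckets = [[] for _ in range(1 << r)]
        for k in keys:                  # stable distribution on the current digit
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return keys
```

With 64 TLB entries the sketch picks r = 5, so 32-bit keys need 7 phases.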

Multiway merge sort.

It partitions the keys into p subsets; each subset is then sorted (in this case with CC-radix sort), and the sorted subsets are merged using a heap. First, the smallest/largest element of each subset is promoted to the leaves of the heap; then the leaves are compared and the appropriate leaf is promoted.

• Heap contains 2*p-1 leaves.
• Each parent in the heap has A/r children, where A is the cache line size and r is the size of a node.
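The partition, sort and heap-merge structure described above can be sketched as follows. This is a simplified illustration under stated assumptions: Python's built-in `sorted` stands in for CC-radix sort on each subset, and `heapq` is a binary heap rather than the cache-line-sized heap with A/r children per parent.

```python
import heapq


def multiway_merge_sort(keys, p):
    """Split keys into about p subsets, sort each, then merge them with a heap."""
    if not keys:
        return []
    size = -(-len(keys) // p)                       # ceil(n / p) keys per subset
    subsets = [sorted(keys[i:i + size])             # stand-in for CC-radix sort
               for i in range(0, len(keys), size)]
    # Each heap entry is (current head, subset index, position in subset).
    heap = [(s[0], idx, 0) for idx, s in enumerate(subsets)]
    heapq.heapify(heap)
    merged = []
    while heap:
        value, idx, pos = heapq.heappop(heap)       # smallest head wins
        merged.append(value)
        if pos + 1 < len(subsets[idx]):             # promote the next key of that subset
            heapq.heappush(heap, (subsets[idx][pos + 1], idx, pos + 1))
    return merged
```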

Insertion Sort

Used for small data sizes

Working from left to right, the algorithm scans to the left of each key and places it in its appropriate position.
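A minimal sketch of the left-to-right scan described above; the function name is illustrative.

```python
def insertion_sort(a):
    """Sort the list a in place and return it."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Scan to the left of the key, shifting larger keys one slot right.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a
```

The inner loop runs once per inversion, which is why insertion sort is fast on small or nearly sorted inputs.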

Sorting Networks

The algorithm compares pairs of inputs in a fixed sequence and swaps them if they are out of order, i.e. if the first is bigger than the second.
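As a concrete instance of the compare-and-swap sequence described above, the sketch below applies a standard 5-comparator network for 4 inputs. The names `NETWORK_4` and `sort4` are illustrative; in a tuned library the sequence would be unrolled so that the keys stay in registers.

```python
# Fixed comparator sequence for 4 inputs: each pair (i, j) with i < j
# means "make sure a[i] <= a[j]".
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort4(a):
    """Sort a list of exactly 4 keys in place using the fixed network."""
    for i, j in NETWORK_4:
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a
```

The sequence of comparisons does not depend on the data, which is what distinguishes a sorting network from insertion sort.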

Input Data Factors

• Number of keys
• Distribution
• Standard deviation
• …

Approximate S.D. with Entropy vector

$\sum_i -P_i \log_2 P_i$, where $P_i = c_i / N$ and $c_i$ is the number of keys with value $i$ in that digit
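The entropy formula above is computed once per digit, giving one entry of the entropy vector for each digit position. A minimal sketch, assuming non-negative integer keys and hypothetical defaults of 8-bit digits and 32-bit keys:

```python
import math
from collections import Counter


def entropy_vector(keys, digit_bits=8, key_bits=32):
    """Return one entropy value per digit, lowest digit first."""
    n = len(keys)
    mask = (1 << digit_bits) - 1
    vector = []
    for shift in range(0, key_bits, digit_bits):
        # c_i: number of keys whose current digit has value i
        counts = Counter((k >> shift) & mask for k in keys)
        vector.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return vector
```

A digit whose values are spread evenly over all 2^digit_bits possibilities has entropy digit_bits; a digit that is identical in every key has entropy 0.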

Learning procedure

Winnow algorithm: $\sum_i w_i E_i > \Theta$

Computes the weight vector and threshold depending on the entropy vector

: (N,E) → {CC-radix, Multiway Merge(N,E), Quicksort}
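For orientation, a generic Winnow-style learner is sketched below. The slides do not say how the entropy vector is encoded as features or how the threshold is set, so everything here is an assumption: features are taken to lie in [0, 1], the threshold is fixed at n/2, and weights are multiplied or divided by `alpha` on mistakes. With 0/1 features this reduces to the textbook Winnow2 update.

```python
def winnow_predict(weights, theta, x):
    """Predict 1 if the weighted sum of features exceeds the threshold."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else 0


def winnow_train(samples, alpha=2.0, epochs=20):
    """samples: list of (features, label), features in [0, 1], label in {0, 1}."""
    n = len(samples[0][0])
    weights = [1.0] * n
    theta = n / 2.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in samples:
            if winnow_predict(weights, theta, x) == label:
                continue
            mistakes += 1
            # Promote active features on a false negative, demote on a false positive.
            factor = alpha if label == 1 else 1.0 / alpha
            weights = [w * factor ** xi for w, xi in zip(weights, x)]
        if mistakes == 0:
            break
    return weights, theta
```

In the setting of the slides, the label would indicate whether CC-radix was the fastest algorithm for a training input.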

Sample the input array (every fourth entry)
Compute the entropy vector
Compute $S = \sum_i w_i \cdot entropy_i$
If $S \ge \Theta$
    choose CC-radix
else
    choose others based on size of input (either Merge Sort or QuickSort)
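The selection steps above can be sketched as a small function that returns the name of the chosen algorithm. The weights and threshold are assumed to come from the learning step. The `size_threshold` parameter, its default, and the rule that larger inputs go to the merge sort are assumptions, since the slides only say the choice is based on input size.

```python
import math
from collections import Counter


def entropy_vector(keys, digit_bits=8, key_bits=32):
    """Entropy of each digit of the keys, lowest digit first."""
    n = len(keys)
    mask = (1 << digit_bits) - 1
    vector = []
    for shift in range(0, key_bits, digit_bits):
        counts = Counter((k >> shift) & mask for k in keys)
        vector.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return vector


def select_sort(keys, weights, theta, size_threshold=1 << 20):
    """Return the name of the algorithm to run on keys."""
    if not keys:
        return "quicksort"
    sample = keys[::4]                              # every fourth entry
    entropy = entropy_vector(sample)
    s = sum(w * e for w, e in zip(weights, entropy))
    if s >= theta:
        return "cc-radix"
    # Assumption: large inputs go to multiway merge sort, small ones to quicksort.
    return "multiway-merge" if len(keys) >= size_threshold else "quicksort"
```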

Selection at run time

Runtime Factors
• Distribution shape of the data
• Amount of data to sort
• Distribution width

Architectural Factors
• Cache / TLB size
• Number of registers
• Cache line size

Summarize

Empirical Search

Any, since it doesn’t matter

Learn at installation time

Performance Results

Is it possible to do better?

Sorting Primitives

To build new sorting algorithms: sorting and selection primitives

• Sorting primitive: a pure sorting algorithm, such as those looked at before
• Selection primitive: a process executed at run time to decide which sorting algorithm to apply

Sorting Primitives

• Divide-by-Value: corresponds to the first phase of Quicksort; takes the number of pivots as a parameter (np+1)
- A step in Quicksort
- Select one or multiple pivots and sort the input array around these pivots

• Divide-by-Position: corresponds to the initial break of Merge Sort; takes the size of each partition and the fan-out of the heap
- Divide the input into same-size sub-partitions
- Use a heap to merge the multiple sorted sub-partitions
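The Divide-by-Value step described above can be sketched as a multi-pivot partition. For clarity this sketch copies keys into separate lists rather than partitioning in place; the function name and the convention that a key equal to a pivot goes to the partition on its right are assumptions.

```python
import bisect


def divide_by_value(keys, pivots):
    """Split keys into len(pivots) + 1 partitions ordered by value range."""
    pivots = sorted(pivots)
    partitions = [[] for _ in range(len(pivots) + 1)]
    for k in keys:
        # Index of the first pivot greater than k selects the partition.
        partitions[bisect.bisect_right(pivots, k)].append(k)
    return partitions


print(divide_by_value([5, 1, 9, 3, 7], pivots=[4, 8]))
# prints [[1, 3], [5, 7], [9]]
```

Every key in one partition is no larger than any key in the next, so the partitions can then be handled independently by whichever primitive comes next.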

Sorting Primitives

• Divide-by-Radix: corresponds to a step in the radix sort algorithm. Takes a radix as a parameter.

Parameter: radix (r bits)
Step 1: Scan the input to get the distribution array, which records how many elements are in each of the $2^r$ sub-partitions.
Step 2: Compute the accumulative distribution array, which is used as the indexes when copying the input to the destination array.
Step 3: Copy the input to the $2^r$ sub-partitions.

[Figure: Divide-by-Radix example showing the counter array, the accumulated distribution array, and the destination array]
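Steps 1 to 3 of Divide-by-Radix can be sketched as a single counting pass over one digit. The function name, the `shift` parameter for selecting the digit, and the small example input are illustrative assumptions.

```python
def divide_by_radix(keys, r, shift=0):
    """Partition keys by the r-bit digit starting at bit `shift`.

    Returns (dest, counter, accum): the partitioned copy of the input,
    the per-digit counts, and the starting index of each sub-partition.
    """
    buckets = 1 << r
    mask = buckets - 1
    # Step 1: count how many keys fall in each of the 2^r sub-partitions.
    counter = [0] * buckets
    for k in keys:
        counter[(k >> shift) & mask] += 1
    # Step 2: accumulate the counts into starting indexes.
    accum = [0] * buckets
    for d in range(1, buckets):
        accum[d] = accum[d - 1] + counter[d - 1]
    # Step 3: copy each key to the next free slot of its sub-partition.
    next_slot = list(accum)
    dest = [None] * len(keys)
    for k in keys:
        d = (k >> shift) & mask
        dest[next_slot[d]] = k
        next_slot[d] += 1
    return dest, counter, accum


dest, counter, accum = divide_by_radix([1, 1, 2, 3, 3, 0, 1, 2], r=2)
print(counter)  # [1, 3, 2, 2]
print(accum)    # [0, 1, 4, 6]
print(dest)     # [0, 1, 1, 1, 2, 2, 3, 3]
```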

Sorting Primitives

• Divide-by-Radix-Assuming-Uniform-Distribution: same as above. Assumes that each bucket contains $n/2^r$ keys.
- Step 1 and Step 2 in DR are expensive.
- If the input elements are distributed among the $2^r$ sub-partitions nearly evenly, the input can be copied into the destination array directly, assuming every partition has the same number of elements.
- Overhead: partition overflow

Sorting Primitives

Once the partition is small:

• Leaf-Divide-by-Value: same as DV but applied recursively to the partitions; partitions smaller than the threshold are sorted with register sorting
• Leaf-Divide-by-Radix: same as DR but used on all remaining subsets; subsets smaller than the threshold are sorted with register sorting

Selection Primitives

• Branch-by-Size: used to select different paths based on size
• Branch-by-Entropy: uses entropy to branch on different paths. Uses Winnow for learning the weight vector

Genetic Algorithm

Crossover:
• Propagate good sub-trees

Mutation:
• Mutate the structure of the algorithm.
• Change the parameter values of primitives.

Genetic Algorithm

Fitness function:
• Average performance by S.D.
• Uses rank instead of fitness.

Performance Results

Is it possible to do better?

It was observed empirically that the Branch-by-Entropy selection primitive was never used

Classifier Sorting

Based on the idea that the performance of the algorithm in one region of the input space can be independent of the other.

i is an input characteristic string, c is a condition string with “1”, “0” and “*” for don’t care.

Example: Encode the number of keys into 4 bits.

0000: 0~1M, 0001: 1~2M…

Number of keys = 10.5M. Encoded into “1100”

Condition   Action                Fitness   Accuracy
            (dr 5 (lq 1 16))      …         …
            (dp 4 2 (lr 5 16))    …         …
…           …
110*        (dv 2 (lr 6 16))
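Matching an input characteristic string against the condition strings in such a rule table can be sketched as follows. Only the `110*` condition and the action strings come from the slide; the other conditions, all fitness values, and the rule of picking the matching entry with the highest fitness are hypothetical and serve only as illustration.

```python
def matches(condition, characteristic):
    """True if every condition symbol is '*' or equals the input bit."""
    return len(condition) == len(characteristic) and all(
        c == "*" or c == bit for c, bit in zip(condition, characteristic)
    )


def select_action(rules, characteristic):
    """Return the action of the best matching rule, or None if none match."""
    candidates = [r for r in rules if matches(r["condition"], characteristic)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["fitness"])["action"]


# Hypothetical rule table: conditions other than "110*" and all fitness
# values are made up for this example.
rules = [
    {"condition": "0***", "action": "(dr 5 (lq 1 16))", "fitness": 0.5},
    {"condition": "1***", "action": "(dp 4 2 (lr 5 16))", "fitness": 0.6},
    {"condition": "110*", "action": "(dv 2 (lr 6 16))", "fitness": 0.9},
]

print(select_action(rules, "1100"))  # prints (dv 2 (lr 6 16))
```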

Experimental Results

Summary and Future Work

The work presented shows how sorting can be adapted to underlying platforms

Potential future work:
- Figure out what went wrong or not wrong with those graphs
- Incorporate the notion of “sortedness” into sort selection
- Simplify the selection algorithm
- See if these notions can be used in a cache-oblivious way
