GPU-Accelerated Genetic Algorithms

Upload: malia

Post on 10-Jan-2016

DESCRIPTION

GPU-Accelerated Genetic Algorithms. Rajvi Shah (+), P. J. Narayanan (+), Kishore Kothapalli (^); IIIT Hyderabad, Hyderabad, India. +: Center for Visual Information Technology. ^: Center for Security, Theory and Algorithmic Research. GAs: an introduction.

TRANSCRIPT

  • Genetic algorithms (GAs): a class of evolutionary algorithms that efficiently solve optimization tasks, with potential applications in many fields.

    Challenge: large execution time.

  • GA flowchart. The user specifies: a representation for the chromosome, GA parameters, a crossover operator, a mutation operator, and a method for fitness evaluation. Flow stages: create initial population, evaluate fitness, check termination criteria (exit if yes), select parents, create new population.
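The flow on this slide can be sketched as a serial loop. This is a minimal illustration, not the authors' implementation: `run_ga`, its parameters, and the toy operators below are hypothetical names, and the user-specified pieces are passed in as plain Python functions.

```python
import random

def run_ga(fitness, select, crossover, mutate,
           pop_size, length, max_generations, target=None, seed=0):
    """Serial GA loop; every name here is illustrative, not from the slides."""
    rng = random.Random(seed)
    # Create initial population: pop_size random binary chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(max_generations):
        scores = [fitness(c) for c in population]          # evaluate fitness
        if target is not None and max(scores) >= target:   # termination criteria
            break
        new_population = []
        while len(new_population) < pop_size:
            a, b = select(population, scores, rng)          # select parents
            for child in crossover(a, b, rng):              # create offspring
                new_population.append(mutate(child, rng))
        population = new_population[:pop_size]
    return population, [fitness(c) for c in population]


# Toy usage: maximize the number of 1 bits.
def pick_two(pop, scores, rng):
    return rng.choice(pop), rng.choice(pop)

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def flip(chrom, rng, rate=0.01):
    return [g ^ 1 if rng.random() < rate else g for g in chrom]

final_pop, final_scores = run_ga(sum, pick_two, one_point, flip,
                                 pop_size=20, length=16, max_generations=50,
                                 target=16)
```

Passing the operators as arguments mirrors the slide's split between what the user specifies and what the framework runs.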

  • GAs have a high degree of parallelism in fitness evaluation, crossover, and mutation.

    Most obvious: chromosome-level parallelism. The same operations run on each chromosome, so use one thread per chromosome.

  • Thread-per-chromosome model: good enough for small to moderately sized multi-core systems, but it doesn't map well to massively multithreaded GPUs.

    Solution: identify and exploit gene-level parallelism.

  • A column of threads reads a chromosome gene by gene, and the threads cooperate to perform operations. This results in coalesced reads and faster processing. (Figure labels: population matrix in memory; thread blocks in a grid.)

  • Program flow diagram

    Steps: parse GA parameters, construct initial population, generate random numbers.

    GPU global memory: random numbers, old population, new population, fitness scores, statistics.

    GPU kernels: evaluation kernel, statistics update kernel, selection kernel, crossover kernel, mutation kernel.

    (Diagram regions are labeled "On CPU" and "On GPU".)

  • Evaluation kernel (program flow diagram repeated; highlighted labels: population, scores)

  • Partially parallel method

    The user specifies a serial code fragment for fitness evaluation.

    Threads are arranged in a 1D grid.

    Each thread executes the user's code on one chromosome, providing chromosome-level parallelism.

    Benefit: abstraction.

    Fully parallel method

    A user familiar with CUDA can effectively use a 2D thread layout.

    Gene-level parallelism is used for fitness evaluation.

    Benefit: efficiency.

  • Task: given item weights, costs, and a knapsack capacity, maximize the total cost of the selected items.

    Representation: a 1D binary string, where 0/1 marks the absence/presence of an item. W and C are the total weight and cost of a given representation.

    Best solution: the one with maximum C subject to W < Wmax.

    Fully parallel method: use a group of threads to compute the total cost and weight in logarithmic time.
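The logarithmic-time totals can be illustrated with a pairwise (tree) reduction. The sketch below is serial Python standing in for the GPU threads. The function names are hypothetical, and scoring an infeasible chromosome as 0 is an assumption, since the slides do not say how over-capacity solutions are penalized.

```python
def tree_reduce_sum(values):
    """Pairwise (tree) sum: each pass doubles the stride, so a parallel
    version needs about log2(len(values)) passes."""
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        # On a GPU, each iteration of this inner loop would be one thread.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0] if data else 0


def knapsack_fitness(chromosome, weights, costs, capacity):
    """Total cost of the selected items, or 0 if the weight limit is violated."""
    total_weight = tree_reduce_sum(g * w for g, w in zip(chromosome, weights))
    total_cost = tree_reduce_sum(g * c for g, c in zip(chromosome, costs))
    # Strict inequality follows the slide (W < Wmax); the 0 penalty is assumed.
    return total_cost if total_weight < capacity else 0
```

For example, with weights `[2, 3, 4, 5]`, costs `[3, 4, 5, 6]`, and capacity 6, the chromosome `[1, 1, 0, 0]` has W = 5 and C = 7, so its fitness is 7.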

  • Statistics update kernel (program flow diagram repeated; highlighted labels: scores, statistics)

  • Selection and termination most often use population statistics.

    We use a standard parallel reduce algorithm to calculate the maximum, minimum, and average scores.

    We use the highly optimized public library CUDPP to sort and rank chromosomes.

  • Selection kernel (program flow diagram repeated; highlighted labels: statistics, parents)

  • Selection kernel: uses N/2 threads; each thread selects two parents for producing offspring.

    Uniform selection: selects parents uniformly at random.

    Roulette wheel selection: a fitness-based approach; the higher the fitness, the better the chance of selection.

  • Roulette wheel selection

    Sort the fitness scores.

    Compute a roulette wheel array by doing a prefix-sum scan of the scores and normalizing it.

    Generate a random number in the range 0 to 1.

    Binary-search the roulette wheel array for the nearest entry smaller than the random number.

    Return the index of that entry in the array.

    Image Courtesy : xyz
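The roulette wheel steps can be sketched in a few lines of serial Python. This is an illustration under stated assumptions, not the authors' kernel: the scan is taken to be an exclusive prefix sum (so the wheel starts at 0 and the nearest entry at or below the random number identifies the chosen slot), scores are assumed non-negative with a positive total, and `build_wheel` and `spin` are hypothetical names.

```python
import bisect
import random
from itertools import accumulate

def build_wheel(scores):
    """Sort the scores, then return (order, wheel), where wheel is the
    normalized exclusive prefix sum of the sorted scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    sorted_scores = [scores[i] for i in order]
    total = sum(sorted_scores)
    inclusive = list(accumulate(sorted_scores))
    wheel = [0.0] + [s / total for s in inclusive[:-1]]
    return order, wheel

def spin(order, wheel, r):
    """Binary-search for the last wheel entry <= r and map that slot
    back to the index of the chromosome in the original score array."""
    slot = bisect.bisect_right(wheel, r) - 1
    return order[slot]

# Usage: chromosome 0 has the highest score, so it is picked most often.
rng = random.Random(42)
order, wheel = build_wheel([6.0, 1.0, 3.0])
counts = [0, 0, 0]
for _ in range(10000):
    counts[spin(order, wheel, rng.random())] += 1
```

Building the wheel once per generation and reusing it for every draw keeps each selection at a single binary search.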

  • Crossover kernel (program flow diagram repeated; highlighted labels: old population, new population)

  • Figure: population in GPU global memory, with threads indexed by thread idy (one per row shown) and thread idx 1 to L (one per gene).
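The thread indexing in the figure suggests gene-level work in the crossover kernel. The following is a minimal serial sketch, assuming one-point crossover (the operator named in the test parameters), one row of threads per parent pair, and one thread per gene. The function names and the `parents` and `points` arrays are hypothetical, not taken from the slides.

```python
def crossover_gene(old_population, parents, points, pair, gene):
    """Work of one 'thread' (pair, gene): produce one gene of each child."""
    a, b = parents[pair]
    first, second = old_population[a][gene], old_population[b][gene]
    if gene >= points[pair]:   # past the crossover point: swap the sources
        first, second = second, first
    return first, second


def crossover(old_population, parents, points):
    """Serial stand-in for the 2D thread grid: y indexes pairs, x indexes genes."""
    length = len(old_population[0])
    new_population = []
    for pair in range(len(parents)):        # thread index y
        child1, child2 = [], []
        for gene in range(length):          # thread index x
            g1, g2 = crossover_gene(old_population, parents, points, pair, gene)
            child1.append(g1)
            child2.append(g2)
        new_population += [child1, child2]
    return new_population
```

With parents `[0, 0, 0, 0]` and `[1, 1, 1, 1]` and a crossover point of 1, the children are `[0, 1, 1, 1]` and `[1, 0, 0, 0]`. Because every gene is computed independently, no thread needs to wait on another.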

  • Mutation kernel (program flow diagram repeated; highlighted label: new population)

  • Mutation kernel: each thread handles one gene and mutates it with the probability of mutation. (Figure labels: thread id y, population, thread (1,4), coin state, gene, flip coin, flip mutator.)
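Flip mutation at gene level can be sketched as follows. This is an illustrative serial version, assuming binary genes and one pre-drawn random number ("coin") per gene. The names `mutate_gene` and `mutate_population` and the `coins` matrix are hypothetical.

```python
def mutate_gene(gene, coin, mutation_rate):
    """Work of one 'thread': flip a binary gene when its coin is below the rate."""
    return gene ^ 1 if coin < mutation_rate else gene


def mutate_population(population, coins, mutation_rate):
    """Apply mutate_gene to every (chromosome, gene) position.
    coins has the same shape as population, with values in [0, 1)."""
    return [[mutate_gene(g, c, mutation_rate) for g, c in zip(chrom, row)]
            for chrom, row in zip(population, coins)]
```

For instance, with a mutation rate of 0.05, the chromosome `[0, 1, 0]` and coins `[0.9, 0.0, 0.5]` give `[0, 0, 0]`: only the middle gene's coin falls below the rate.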

  • Generating random numbers (program flow diagram repeated; highlighted label: random numbers)

  • GAs make extensive use of random numbers.

    There is no primitive for generating a single random number on the fly.

    Solution: generate a pool of random numbers and copy it to the GPU.

    We use a CUDPP routine to generate a large pool of random numbers on the GPU (faster).

    If better-quality random numbers are needed, this can be replaced by a CPU-based routine.
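The pool idea can be illustrated with a small serial sketch. The pool here is filled by Python's `random` module, standing in for either the CUDPP routine or a CPU-based generator. The indexing scheme (draw number times thread count plus thread id, wrapping around) is an assumption made for illustration, and both function names are hypothetical.

```python
import random

def make_pool(size, seed=0):
    """Pre-generate a pool of uniform random numbers in [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]


def pool_value(pool, thread_id, draw, num_threads):
    """Random number for a given thread and draw count.
    Wraps around (and so reuses values) once the pool is exhausted."""
    return pool[(draw * num_threads + thread_id) % len(pool)]
```

Giving each thread its own slot per draw means threads never contend for the same entry, at the cost of reusing values when the pool is smaller than the total number of draws.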

  • Test device: a quarter of an Nvidia Tesla S1030 GPU.

    Test problem: solve a 0/1 knapsack problem.

    Test parameters: representation: a 1D binary string; crossover: one-point crossover; mutation: flip mutation; selection: uniform and roulette wheel.

  • Figures: average run-time for 100 iterations (uniform selection); average run-time for 100 iterations (roulette wheel selection); growth in run-time with increasing N x L. N: population size, L: chromosome length.

  • Our approach is modeled after GAlib and maintains structures for GA, Genome, and Statistics.

    It is built with enough abstraction from the user program that the user does not need to know the CUDA architecture or CUDA programming.

    This can be extended to build a GPU-accelerated GA library.