Using a Genetic Algorithm for Approximate String Matching on
Genetic Code
Carrie Mantsch
December 5, 2003
Outline
• Problem Statement
• Current Techniques
• GA Motivation
• My Algorithm
• Results
• Extension Possibilities
Problem Statement
The problem is to search and align strands of DNA using a genetic algorithm.
Current Techniques
• Approximate string matching
– Usually meant for smaller strings
– Many are set up for k mismatches
• 2 DNA strands of size 90 and 85
– Allowing for 5 gaps in the second strand gives almost 44 million possible alignments
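The figure above can be reproduced with a short count. This is a sketch under one assumption: the 85-letter strand plus its 5 gaps fills all 90 positions of the longer strand, so an alignment is just a choice of which 5 positions hold gaps. The function name is illustrative and not from the talk.

```python
from math import comb


def count_gapped_alignments(long_len: int, gaps: int) -> int:
    """Count the ways to choose which positions of the longer strand face a gap."""
    return comb(long_len, gaps)


print(count_gapped_alignments(90, 5))  # 43949268, i.e. almost 44 million
```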
Current Techniques (cont.)
• Needleman-Wunsch
– Gap penalty -1
– Match bonus +1
– Mismatch 0
• Not practical if the sequence starts in the middle
– Counts the gaps at the beginning and end as penalties.
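A minimal sketch of Needleman-Wunsch global alignment scoring with the values listed above (gap -1, match +1, mismatch 0). It returns only the score, not the alignment itself, and the function name and test strings are made up for illustration.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = 0, gap: int = -1) -> int:
    """Return the best global alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps are charged in full: this is what hurts when the
    # short sequence really starts in the middle of the long one.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align two letters
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "ACGT"))        # 4
print(needleman_wunsch_score("TTTTACGTTTT", "ACG"))  # -5
```

In the second call "ACG" occurs exactly inside the longer string, yet the score is 3 matches minus 8 end gaps, which illustrates the drawback named above.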
Current Techniques (cont.)
• BLAST (Basic Local Alignment Search Tool) and FASTA
– Use domain-specific knowledge
• http://www.ncbi.nlm.nih.gov/BLAST
• http://fasta.bioch.virginia.edu
GA Motivation
• Alien DNA
• Junk DNA
• Extendable to similar text searches without domain specific knowledge
My Algorithm
• The population
– Bit strings of 0’s and 1’s
– 0’s are spaces, 1’s mean a letter is placed there
– The number of 1’s stays constant, equal to the number of letters in the smaller search string
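A sketch of how such a bit string could be built and read. It assumes the bit string is as long as the larger strand, which the slide does not state; the function names and the "-" gap symbol are illustrative.

```python
import random


def random_individual(length: int, num_letters: int, rng: random.Random) -> list:
    """Build a bit string of the given length with exactly num_letters 1's."""
    ones = set(rng.sample(range(length), num_letters))
    return [1 if i in ones else 0 for i in range(length)]


def decode(mask: list, letters: str, gap: str = "-") -> str:
    """Place the letters, in order, at the 1's; every 0 becomes a space (gap)."""
    remaining = iter(letters)
    return "".join(next(remaining) if bit else gap for bit in mask)


print(decode([0, 1, 1, 0, 1], "ACG"))  # -AC-G
```

Because every individual carries the same number of 1's, each one decodes to a complete placement of the smaller string.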
My Algorithm (cont.)
• Breeding
– Rank based selection
• Crossover
– The common place markers are kept the same
– The rest of the place markers are split evenly between the two children
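The breeding and crossover steps above can be sketched as follows. Two points are assumptions rather than details from the talk: selection weights grow linearly with rank, and the non-shared place markers are dealt out at random. All names are illustrative.

```python
import random


def rank_select(population: list, fitness, rng: random.Random) -> list:
    """Pick two parents with probability proportional to fitness rank."""
    ranked = sorted(population, key=fitness)    # worst first, best last
    weights = list(range(1, len(ranked) + 1))   # assumed linear rank weights
    return rng.choices(ranked, weights=weights, k=2)


def crossover(p1: list, p2: list, rng: random.Random) -> tuple:
    """Keep 1's shared by both parents; split the remaining 1's evenly."""
    common = [i for i, (a, b) in enumerate(zip(p1, p2)) if a and b]
    rest = [i for i, (a, b) in enumerate(zip(p1, p2)) if a != b]
    # Parents hold the same number of 1's, so len(rest) is always even.
    rng.shuffle(rest)
    half = len(rest) // 2
    children = []
    for extra in (rest[:half], rest[half:]):
        child = [0] * len(p1)
        for i in common + extra:
            child[i] = 1
        children.append(child)
    return children[0], children[1]


rng = random.Random(1)
a, b = crossover([1, 1, 0, 1, 0, 0], [1, 0, 1, 0, 1, 0], rng)
print(sum(a), sum(b))  # 3 3
```

Splitting the non-shared markers evenly keeps the number of 1's constant in both children, so no repair step is needed after crossover.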
My Algorithm (cont.)
• Mutation
– If the number of gaps is less than one tenth of the small string size, add a gap
– Otherwise delete a gap
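One possible reading of the mutation rule, as a sketch. The slide does not define what counts as a gap or how one is added, so this assumes a gap is a 0 lying between the first and last 1, and that adding or deleting a gap trades a 0 with the unused ends so the length stays fixed. The names are hypothetical, and the mask is assumed to contain at least one 1.

```python
import random


def letter_span(mask: list) -> tuple:
    """Return the indices of the first and last 1 in the mask."""
    first = mask.index(1)
    last = len(mask) - 1 - mask[::-1].index(1)
    return first, last


def internal_gaps(mask: list) -> list:
    """Indices of 0's between the first and last 1 (assumed meaning of 'gap')."""
    first, last = letter_span(mask)
    return [i for i in range(first, last + 1) if mask[i] == 0]


def mutate(mask: list, rng: random.Random) -> list:
    """Add a gap if there are fewer than letters/10 of them, else delete one."""
    mask = list(mask)
    num_letters = sum(mask)
    first, last = letter_span(mask)
    gaps = internal_gaps(mask)
    if len(gaps) < num_letters / 10:
        spare_zero = mask[-1] == 0 or mask[0] == 0
        if num_letters < 2 or not spare_zero:
            return mask                      # nowhere to put a new gap
        pos = rng.randint(first + 1, last)   # new 0 goes before the letter at pos
        if mask[-1] == 0:
            mask.pop()                       # take the 0 from the right end
            mask.insert(pos, 0)
        else:
            mask.pop(0)                      # take it from the left end instead;
            mask.insert(pos - 1, 0)          # indices shift left by one
    else:
        pos = rng.choice(gaps)
        del mask[pos]                        # close one internal gap
        mask.append(0)                       # and pad the right end
    return mask


print(len(internal_gaps(mutate([0, 1, 1, 1, 0, 0], random.Random(0)))))  # 1
```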
Results
• The target match
Results (cont.)
• Ran for 50 generations
• Runs with different random numbers over the same number of generations give best fitness values between about 32 and 67 (the optimal fitness is 90)
Extension Possibilities
• Better representation of population
• Be able to alter fitness evaluation to be more specific to different problems
• Ability to add domain specific knowledge
• Parallel searching
Questions?