Whole genome comparison
Kelley Crouse and Greg Matuszek
Objective
• Implement a parallel program for genome and chromosome comparisons
Background
• MUMmer: serial implementation using a suffix tree
• Parallel implementation using a variant of the Smith-Waterman local alignment algorithm.
Disadvantages
• Neither handles larger genomes and chromosomes quickly
• Parallel version hindered by data structure
How we plan to implement
• A suffix tree will be created using one sequence
• The second sequence will be fragmented and sent out to the workers.
• Each worker will compare its fragment against the suffix tree and report back to the farmer with the location(s) of similarity
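The farmer/worker flow above can be sketched roughly as follows. This is only an illustration under stated assumptions: a thread pool stands in for the real worker processes, `str.find` stands in for the suffix tree lookup, fragments are matched exactly, and the names `find_all`, `worker`, and `farmer` are hypothetical rather than taken from the project.

```python
from concurrent.futures import ThreadPoolExecutor


def find_all(reference, fragment):
    """Stand-in for the suffix tree lookup (assumption): return every
    start index at which fragment occurs exactly in reference."""
    hits = []
    start = reference.find(fragment)
    while start != -1:
        hits.append(start)
        start = reference.find(fragment, start + 1)
    return hits


def worker(reference, offset, fragment):
    """Compare one fragment against the first sequence and report
    (fragment offset in the second sequence, match locations)."""
    return offset, find_all(reference, fragment)


def farmer(reference, query, fragment_len, n_workers=4):
    """Cut the second sequence into fixed-length fragments, hand them
    to the workers, and collect the reported locations by offset."""
    jobs = [(i, query[i:i + fragment_len])
            for i in range(0, len(query), fragment_len)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(worker, reference, off, frag)
                   for off, frag in jobs]
        return dict(f.result() for f in futures)


report = farmer("ACGTACGTTTGA", "ACGTTTGA", fragment_len=4)
```

Here `report` maps each fragment's offset in the second sequence to the positions where it matched the first sequence, which is the information the farmer would need to assemble an alignment.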
What is a Suffix Tree?
• The tree represents all suffixes within a given string
• Used to search for a sub-string within a string
• By matching a test string T against the suffix tree of a string S, it is possible to locate every place where the two strings match
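To make the lookup concrete, here is a minimal sketch that uses a naive, uncompressed suffix trie rather than a true suffix tree. It takes quadratic time and space, so it is only suitable for short strings; a real implementation would compress single-child paths and use a linear-time construction. The function names and the dictionary-based node layout are assumptions made for illustration.

```python
def build_suffix_trie(s):
    """Insert every suffix of s into a trie. Each node records the
    start positions of the suffixes that pass through it."""
    root = {"children": {}, "starts": []}
    for start in range(len(s)):
        node = root
        for ch in s[start:]:
            node = node["children"].setdefault(
                ch, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def find_occurrences(trie, t):
    """Walk the trie along t; the node reached lists every position
    in S where t begins. An empty list means t does not occur."""
    node = trie
    for ch in t:
        node = node["children"].get(ch)
        if node is None:
            return []
    return node["starts"]


trie = build_suffix_trie("GATTACAGATT")
positions = find_occurrences(trie, "ATT")  # [1, 8]
```

The search cost depends only on the length of T, not on the length of S, which is what makes the structure attractive for repeated queries against one large sequence.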
Suffix Tree - Bananas
• Each suffix of “Bananas” is represented within the suffix tree
• A sub-string S can be compared to “Bananas” by following the paths from the root toward each leaf.
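The idea behind the example can be checked without building a tree at all: a string occurs in "bananas" exactly when it is a prefix of one of its suffixes. The sketch below lists the suffixes explicitly (lowercase is used for simplicity, and the helper names are hypothetical); a suffix tree stores the same information in a shared, compact form.

```python
def suffixes(s):
    """Return every suffix of s, longest first."""
    return [s[i:] for i in range(len(s))]


def suffix_starts(s, t):
    """Return the start positions of the suffixes of s that begin
    with t, i.e. the positions where t occurs in s."""
    return [i for i in range(len(s)) if s[i:].startswith(t)]


all_suffixes = suffixes("bananas")
# ['bananas', 'ananas', 'nanas', 'anas', 'nas', 'as', 's']

ana_positions = suffix_starts("bananas", "ana")  # [1, 3]
```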
Fragmenting the Second Sequence
Random fragmenting
• Difficult to assemble alignment
• Allows for small and large fragments
Specific length fragments
• Restricted to one fragment size
• Alignment is easier to assemble
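The two fragmenting strategies might look like this in outline. This is a sketch, not the project's code: the function names, the `min_len`/`max_len` bounds, and the choice to keep each fragment's starting offset alongside it are all assumptions made for illustration.

```python
import random


def fixed_fragments(seq, length):
    """Cut seq into consecutive fragments of one fixed size
    (the last fragment may be shorter)."""
    return [(i, seq[i:i + length]) for i in range(0, len(seq), length)]


def random_fragments(seq, min_len, max_len, seed=None):
    """Cut seq into consecutive fragments whose sizes are drawn
    uniformly from [min_len, max_len]."""
    rng = random.Random(seed)
    fragments, i = [], 0
    while i < len(seq):
        n = rng.randint(min_len, max_len)
        fragments.append((i, seq[i:i + n]))
        i += n
    return fragments


fixed = fixed_fragments("ACGTACGTAC", 4)
# [(0, 'ACGT'), (4, 'ACGT'), (8, 'AC')]

mixed = random_fragments("ACGTACGTAC", min_len=2, max_len=5, seed=1)
```

With fixed sizes, a fragment's position follows directly from its index, which is one reason the alignment is easier to assemble. With random sizes, the offset has to be carried with each fragment.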
What we hope to gain
• Ability to identify conserved regions between genomes (and chromosomes)
• Conduct comparisons between large genomes and chromosomes quickly and accurately