

Introduction

SSAHA stands for Sequence Search and Alignment by Hashing Algorithm, a method for quickly searching the large amounts of data held in DNA databases. Published in 2001, it was designed as a faster alternative to BLAST for sequence alignment against large databases.

SSAHA works by storing the locations of short, fixed-length nucleotide words in a hash table, and then looking up the words of a query sequence to find where they occur.

SSAHA trades memory for speed: the hash table is held in memory, and in return the method can handle high-throughput searches against data sets as large as entire genomes.
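The hash-table idea described above can be illustrated with a small Java sketch. It is a simplification under stated assumptions, not the published implementation: the class name KmerIndexSketch, the method hits, and the use of a string-keyed HashMap are all hypothetical choices made for readability. The sketch stores non-overlapping words of length k from the subject sequence and then looks up every overlapping word of the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a k-letter word index; not the SSAHA source code.
class KmerIndexSketch {
    // Maps each k-letter word to the offsets where it starts in the subject.
    private final Map<String, List<Integer>> table = new HashMap<>();
    private final int k;

    KmerIndexSketch(String subject, int k) {
        this.k = k;
        // Store non-overlapping words: offsets 0, k, 2k, ...
        for (int i = 0; i + k <= subject.length(); i += k) {
            String word = subject.substring(i, i + k);
            table.computeIfAbsent(word, w -> new ArrayList<>()).add(i);
        }
    }

    // Returns {queryOffset, subjectOffset} pairs for every overlapping
    // k-letter word of the query that is present in the table.
    List<int[]> hits(String query) {
        List<int[]> result = new ArrayList<>();
        for (int q = 0; q + k <= query.length(); q++) {
            List<Integer> positions = table.get(query.substring(q, q + k));
            if (positions == null) {
                continue; // this word does not occur at an indexed offset
            }
            for (int p : positions) {
                result.add(new int[] {q, p});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        KmerIndexSketch index = new KmerIndexSketch("acgtacgtttgacc", 2);
        for (int[] hit : index.hits("gtac")) {
            System.out.println("query offset " + hit[0]
                    + " matches subject offset " + hit[1]);
        }
    }
}
```

Building the table costs memory up front, but each query word is then found with a single lookup instead of a scan of the subject, which is the memory-for-speed trade the poster mentions.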

Genomes Available to Search with SSAHA

Acknowledgments

We thank Owen Astrachan for being such a wonderful teacher, our parents for sending us to Duke, and the Genome Revolution Focus for our learning opportunities.

Timeline

Applications

SSAHA has been used to:

• Refine repetitive sequence searches (2005)
• Locate large amounts of genetic variation within the zebrafish species (2006)
• Localize large data sets (2006)
• Align DNA strands (2004)

Nnenna Opara, Marni Siegel, Kaitlyn McPartland, and Meg Eckman
Department of Computer Science, Duke University, Durham, North Carolina 27708

Literature cited

Ning, Zemin, Anthony J. Cox, and James C. Mullikin. 2001. SSAHA: A Fast Search Engine for Large DNA Databases. Genome Research 11: 1725-1729.

For further information

Please contact [email protected] or [email protected]. More information on this assignment and the CompSci 4G class may be found at http://www.cs.duke.edu/~ola/. More information about the Duke Institute for Genome Science and Research may be found at www.genome.duke.edu.

SSAHA: A Fast Search Method for Large DNA Databases

DNA Comparison APT

Problem Statement

When comparing two genomes of the same species, bioinformaticians use SSAHA, which relies on a hash table, to find the similarity between strands of DNA. If the strands are not exactly the same, SSAHA reports the location on the nucleotide strand of the SNP (single nucleotide polymorphism). In this problem, return the base number, counting from 1, of the first SNP on the strand; if the strands are exactly the same, return 0.

Definition

Class: DNAComparison
Method: snpdetector
Parameters: String dna, String n
Returns: String (the base number written as an integer)
Method signature: public String snpdetector(String dna, String n);

Class

public class DNAComparison {
    public String snpdetector(String dna, String n) {
        // fill in code here
    }
}

Constraints

Each string of DNA will have at most 50 characters.
The characters of each strand are expected to be 'a', 'g', 't', or 'c'; any other character is treated as 'a'.
The two strands being compared are the same length.

Examples

strands = {"aaggttcc", "aagtttcc"}
Returns: "4" because the first place the strands disagree is base 4 (g versus t).

strands = {"aaaaaaaaaaaa", "aaaaaaaaxaaa"}
Returns: "0" because the program treats the 'x' as an 'a', so the strands match.
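One possible solution to the problem above is sketched here. It is only an illustration under assumptions: the class name DNAComparisonSketch and the helper normalize are hypothetical, and the sketch uses a direct position-by-position comparison rather than a hash table, since the two strands are guaranteed to have the same length.

```java
// Hypothetical solution sketch for the snpdetector problem.
class DNAComparisonSketch {
    // Maps any character outside a, g, t, c to 'a', as the constraints require.
    private static char normalize(char base) {
        return (base == 'a' || base == 'g' || base == 't' || base == 'c') ? base : 'a';
    }

    public String snpdetector(String dna, String n) {
        for (int i = 0; i < dna.length(); i++) {
            if (normalize(dna.charAt(i)) != normalize(n.charAt(i))) {
                return String.valueOf(i + 1); // base numbers count from 1
            }
        }
        return "0"; // no difference found: the strands match
    }

    public static void main(String[] args) {
        DNAComparisonSketch detector = new DNAComparisonSketch();
        System.out.println(detector.snpdetector("aaggttcc", "aagtttcc"));         // prints 4
        System.out.println(detector.snpdetector("aaaaaaaaaaaa", "aaaaaaaaxaaa")); // prints 0
    }
}
```

Normalizing both characters before comparing them is what makes the second example return "0": the 'x' is read as an 'a' and so never counts as a mismatch.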

Timeline entries (the poster's year labels are 1966, 1989, 2000, 2001, 2004, and 2006; the transcript lists them separately from the events):

• Formation of a hash table is mentioned in a Stanford professor's presentation.
• BLAST is published.
• SSAHA is published.
• SSAHA2 is published.
• Genetic Variation in Zebrafish is published.
• John Sulston and Craig Venter jointly announce the completion of a draft of the Human Genome.