Sudhindra R. Gadagkar, Ph.D.
Computational Biology
University of Dayton
Some background material…
• BS in Fisheries Science from University of Agricultural Sciences, Bangalore, India
• MS in Fisheries Science (Statistics)
Tilapia (Oreochromis niloticus)
Genetics of fish behavior
Ph.D. research (contd.)
• Complex behaviors are heritable (that is, they are influenced by genes)
• Behavior and growth rate are correlated at the genetic level
(either the same gene(s) influence both traits, or the genes responsible for them are closely linked)
Post-doctoral research in Bioinformatics at Arizona State University
What I do now
• There is information in DNA and this information is used by the body.
Source for image: www.nigms.nih.gov/.../genetics/science.html
• DNA is an incredibly long strand, made up of four different molecules (called nucleotides), abbreviated as A, C, G and T.
• For example, the DNA of the longest human chromosome (chromosome 1) is about 8.5 cm long!
• Nearly every cell of the human body contains DNA.
• A single copy of the human genome is more than 3 billion nucleotides long!
• That’s a large number!
Let’s get some perspective
• A DNA sequence can look like the following:
ACTGTTTGAAATTGACCCAGCACTTCTCCCTCGCGCAGACAGAGAGCAGTGTAG
ACGGAGCCTTAATCGCTAGAGCGAATCCCGATGCCCCACCTTCCGTCGGTGCATAAGTCGCACGGCGTCTCCCCCCCGTATGTGGTCTTAGGTAACCGCCGCCGGGCGTAGGGTTCACGGTCGAGGATGAAGATGGCGATTCGTCACCTCGCCAACGGGAGGGACCTCATTCGATCGATCCGCAAGTCTTCGCGGGAGCTCGTCATGCGGAACGCAGGAGACAACACTCTGCGTCGGATGCGCGCCGTATCAGTCGGGTGAGGCACGCCTAGCGATTCGACCTTAATTCCCGGACGCGACGCGAGGAGTTGGGAGATTGCTGCCCAAACCGGTCCGCGCTACTTAGGCTGCCGGACCCTTCTCGCCCCACGGGTGGCGGTGGTAATAGAGTTGGCCCGCCCTCTATGTGTCGGAAAGGGGGAGCCGGGGGCCGTGAGGATGCCCACACTGTCGGCGAGACCATGCTATCGAGCCTCCCTGGGACCCTCGGGGACTTTAGTTCCCACTCGGTTGGGGATTCAGTAGCCACGAATCAGACCGCCCCGGGTGGGGGCTTCGTCGTCTTGTCTTTCCAGCCCCCCTCTACTCTTCCTACTACGCCCGTCTGTCGAGGGTGCCGAGCGCGCAGTGTGCTCCCAGCGGCTCGTGCCAGGTTAGGTAGCCATATGTATTTATCGGCTGAGGACCGCCCGCCGTGTACCGACGATTTTGTTATAATTCTAGAGATGGGCTGGCACTTACCTGCTAGGTTTCTTGTCTGCTATGACTCGTGCGAACAGTCTTACTCTTGGCACAGCCGCGATGGCGATGGTTTAGCGGTTCCCATGGGGGGAATCGCGCGACGGCACCCAGTTCTGTTTCGACCGGACCCTGCTTACTCCTGGCCGAGAGGCCTCATTCTCGTTCGAGTCGATCGCTTATGTTATCGCGCCATTGGGAGTGCTCTGACCAATTACCGACCCGGAGTGTG
Let’s get some perspective
• What if we try to write down the entire sequence (all 3 billion or so nucleotides)?
• After all, we now know the entire human DNA sequence.
Let’s get some perspective
• Let’s see… if we fit 75 letters on each line and 50 lines on a page, then a page will contain 3,750 nucleotides.
• That does sound like a lot (the sequence on the earlier slide had 1,024).
Let’s get some perspective
• A book that contains 100 pages can hold 100 × 3,750 = 375,000 nucleotides.
• That is a lot!
• How thick do you think a book of 100 pages might be?
• An inch maybe.
• We need to write down at least 3 billion letters.
Let’s get some perspective
• Therefore, we need 3,000,000,000 / 375,000 = 8,000 books
• At an inch per book, that is 8,000 inches
• ≈ 667 feet.
The Washington Monument
Source of image: epod.usra.edu/archive/epodviewer.php3?oid=158368
Let’s get some perspective
• ... is 555 feet!
• So imagine a stack of books taller than the Washington Monument, crammed with letters – no spaces, no commas, no paragraphs.
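The back-of-the-envelope arithmetic on the last few slides can be checked in a few lines of Python. This sketch only restates the slides' own assumptions (75 letters per line, 50 lines per page, 100 pages per inch-thick book, 3 billion nucleotides, a 555-foot monument); the constant names are chosen here for illustration.

```python
# Check of the "stack of books" estimate, using the numbers from the slides.
GENOME_LENGTH = 3_000_000_000    # nucleotides (rounded, as on the slides)
LETTERS_PER_LINE = 75
LINES_PER_PAGE = 50
PAGES_PER_BOOK = 100
BOOK_THICKNESS_INCHES = 1.0      # "An inch maybe."
MONUMENT_HEIGHT_FEET = 555

letters_per_page = LETTERS_PER_LINE * LINES_PER_PAGE     # 3,750
letters_per_book = letters_per_page * PAGES_PER_BOOK      # 375,000
books_needed = GENOME_LENGTH // letters_per_book          # 8,000
stack_height_feet = books_needed * BOOK_THICKNESS_INCHES / 12

print(f"{letters_per_page:,} letters per page")
print(f"{letters_per_book:,} letters per book")
print(f"{books_needed:,} books, a stack about {stack_height_feet:.0f} feet tall")
print(f"That is {stack_height_feet - MONUMENT_HEIGHT_FEET:.0f} feet taller than the Washington Monument")
```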
Let’s get some perspective
• And we would have written down the data for one strand in one cell of one human being!
• We need to understand this data.
• Remember, there are no words, no punctuation, no “parts of speech” in this “text”.
• Yet we have to make sense of this information.
Another example
• This is the evolutionary tree of primates.
• There are 10 species here whose evolutionary relationship we are interested in.
Source for image: locus.umdnj.edu/nigms/special/primate.html
How many possible trees?
• Do you know how many possible ways there are to draw the evolutionary history (“tree”) of 10 species?
Formula:
$$\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$
where n is the number of species
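A short Python sketch of the formula above, assuming (our reading of the slide) that it counts rooted, fully bifurcating trees for n labelled species. Python's arbitrary-precision integers keep the result exact even for large n; `num_rooted_trees` is a hypothetical name chosen for illustration.

```python
from math import factorial

def num_rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees for n labelled species:
    (2n - 3)! / (2^(n - 2) * (n - 2)!)."""
    if n < 2:
        raise ValueError("need at least two species")
    # The division is always exact, so integer division loses nothing.
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (3, 4, 5, 10):
    print(f"{n} species: {num_rooted_trees(n):,} possible trees")
```

For the 10 primate species on the earlier slide, this gives 34,459,425 possible rooted trees.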
How many trees!
[Figure: number of possible trees (y-axis, from millions and billions up past $10^{200}$) plotted against the number of sequences (x-axis, 0 to 400), with reference levels marked:]
• $10^{79}$ atoms in the universe
• $10^{37}$ atoms in the bodies of all humans by year 2035
• $5 \times 10^{30}$ prokaryotes living today
• $5 \times 10^{11}$ stars in the Milky Way
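To connect the tree-counting formula with the chart, this sketch (again assuming the formula counts rooted trees, and again using a hypothetical `num_rooted_trees` helper) searches for the smallest number of species whose tree count exceeds the $10^{79}$ atoms in the universe.

```python
from math import factorial

def num_rooted_trees(n: int) -> int:
    # (2n - 3)! / (2^(n - 2) * (n - 2)!), computed exactly with Python integers
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

ATOMS_IN_UNIVERSE = 10 ** 79   # reference level from the chart

# Smallest number of species whose tree count exceeds 10^79.
n = 2
while num_rooted_trees(n) <= ATOMS_IN_UNIVERSE:
    n += 1

digits = len(str(num_rooted_trees(n)))
print(f"{n} species: about 10^{digits - 1} possible rooted trees")
```

The point the chart makes is visible here: a few dozen species are already enough to outnumber the atoms in the universe.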
How many trees represent the true relationship?
• And only one of them is the correct tree because evolution has happened only once.
• And we need to find it!
One final example
Pairwise Alignment – contd.
• Consider these two DNA sequences:
– AATCTATA
– AAGATA
• We want to compare them site by site, so we need to align them by introducing gaps.
• Gaps can be introduced in various places, and in various combinations, as shown next.
Pairwise Alignment – contd.
AATCTATA
AAG--ATA
AATCTATA
AA-G-ATA
AATCTATA
AA--GATA
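The three alignments above are only a few of the possibilities. As a rough illustration, this Python sketch keeps AATCTATA ungapped (as in the examples) and lists every way of placing the two required gaps in AAGATA; allowing gaps in both sequences would produce many more alignments.

```python
from itertools import combinations

top = "AATCTATA"
bottom = "AAGATA"

# Choose which columns of the bottom row hold gaps; the letters of
# AAGATA fill the remaining columns in their original order.
n_gaps = len(top) - len(bottom)
placements = []
for gap_cols in combinations(range(len(top)), n_gaps):
    letters = iter(bottom)
    row = "".join("-" if col in gap_cols else next(letters) for col in range(len(top)))
    placements.append(row)

print(len(placements), "ways to place the gaps")
for row in placements[:5]:
    print(top)
    print(row)
    print()
```

Even this restricted version yields 28 candidate alignments for two short sequences, and a criterion is still needed to decide which one is best.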
Pairwise Alignment – contd.
• Clearly, if the sequences are long, introducing gaps by hand becomes impossible; we would need a computer to help us find the optimal placement of gaps.
• But let us first see what is involved in asking the computer to do this.
• One way, the looooooooong way, is to:
– introduce gaps in every possible position.
The Brute Force Method (the Perspiration approach)
• To get an idea of what the long way involves, let us first look at the first position.
• There are three possible choices:
1. gap in the first sequence,
2. gap in the second sequence, or
3. gap in neither sequence.
That is, the first column (top row = first sequence, bottom row = second sequence) can be:
-  A  A
A  -  A
The Brute Force Method (the Perspiration approach)
• These options are the same for every position.
• Therefore, the number of possible paths, y, for a pair of sequences that are 1 base long is 3.
• If the sequences are 2 bases long, it is $3^2 = 9$.
The Brute Force Method (the Perspiration approach)
• In general, if they are n bases long, then there are $3^n$ paths.
• If n = 20, then $y = 3^{20} \approx 3.5 \times 10^{9}$.
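The slides' simplified counting model (three choices at each of n positions) can be enumerated directly. This sketch uses `itertools.product` to list every path for small n and confirm the $3^n$ count; the labels in `CHOICES` are illustrative, not from the lecture.

```python
from itertools import product

# The three per-position choices listed on the slide (labels are illustrative).
CHOICES = ("gap in first sequence", "gap in second sequence", "gap in neither")

def all_paths(n: int) -> list[tuple[str, ...]]:
    """Every combination of per-position choices for sequences n bases long."""
    return list(product(CHOICES, repeat=n))

for n in (1, 2, 3):
    print(f"n = {n}: {len(all_paths(n))} paths")

# Listing every path quickly becomes impractical, so only count them for n = 20.
print(f"n = 20: {3 ** 20:,} paths")
```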
The Brute Force Method (the Perspiration approach)
• If n = 200, then $y = 3^{200} \approx 2.7 \times 10^{95}$.
• If one path takes 1 nanosecond ($10^{-9}$ seconds), then for a pair of sequences that are each 200 bases long, the computer will need
– $8.4 \times 10^{78}$ years!!
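The running-time estimate can be reproduced with the slide's assumption of 1 nanosecond per path; the 365.25-day year used for the conversion is our assumption.

```python
SECONDS_PER_PATH = 1e-9                  # 1 nanosecond per path (slide's assumption)
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # Julian year (our assumption)

def brute_force_years(n: int) -> float:
    """Years needed to examine all 3**n paths at one path per nanosecond."""
    paths = 3 ** n                       # exact integer count of paths
    return paths * SECONDS_PER_PATH / SECONDS_PER_YEAR

for n in (20, 200):
    print(f"n = {n}: {3 ** n:.1e} paths, {brute_force_years(n):.1e} years")
```

For n = 20 the brute-force search takes only a few seconds; for n = 200 it takes about $8.4 \times 10^{78}$ years.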
Let’s get some perspective
• Needs a super-human effort, eh?
• That’s absolutely right!
• That super-human is the computer.
• But it’s not enough to just use the computer to solve such problems.
• The computer does not have to work hard. It needs to work smart!
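The slides do not say how the computer should "work smart". One standard approach for this problem is dynamic programming, for example the Needleman–Wunsch global alignment algorithm. The sketch below is our illustration rather than the lecture's method, and its scores (match +1, mismatch −1, gap −2) are hypothetical. It fills a table with (m + 1) × (n + 1) entries instead of trying every path, then traces back through the table to recover one optimal alignment.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Scoring values are illustrative assumptions."""
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))

s, top, bottom = global_align("AATCTATA", "AAGATA")
print("score:", s)
print(top)
print(bottom)
```

For two 200-base sequences the table has about 40,000 entries, which a computer fills almost instantly, compared with the $3^{200}$ paths of the brute force method.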
Need Computer!