DNA: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) are composed of linear chains of...


Upload: mervyn-barrett

Post on 02-Jan-2016


Page 1: DNA DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) are composed of linear chains of monomeric units of nucleotides A nucleotide has three parts:

DNA

• DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) are composed of linear chains of monomeric units of nucleotides

• A nucleotide has three parts: a sugar, a phosphate and a base

• Four bases


Secondary Structure of DNA

• Two strands are complementary

• Base pairing: A-T; G-C

• A purine and a pyrimidine pair through complementary H bonding
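
The pairing rule on this slide can be sketched in Python. This is only a minimal illustration: the names `PAIR`, `complement` and `reverse_complement` and the example sequence are hypothetical and do not come from the slides.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
PAIR = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return ''.join(PAIR[base] for base in strand)

def reverse_complement(strand):
    """Return the partner strand read in its own 5'->3' direction."""
    return complement(strand)[::-1]

print(complement('ACTTAGG'))          # TGAATCC
print(reverse_complement('ACTTAGG'))  # CCTAAGT
```

The sketch assumes an uppercase string containing only A, T, G and C; any other character raises a KeyError.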


Genome

• Genome

– The entire DNA of a cell is its genome

– Individual units coding for proteins or RNA are genes

– A gene starts with ATG and ends with one or two stop codons

– This stretch is called an ORF (Open Reading Frame)

• Biological information

– Contained in the genome

– Encoded in the nucleotide sequences of DNA or RNA

– Partitioned into discrete units, genes
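
The "starts with ATG, ends with a stop codon" idea can be tried on a short string. The sketch below is a simplified assumption, not a real gene finder: it scans one strand only, uses the three standard stop codons TAA, TAG and TGA, and the name `find_first_orf` is hypothetical.

```python
STOP_CODONS = {'TAA', 'TAG', 'TGA'}  # the three standard stop codons

def find_first_orf(dna):
    """Return the first ATG...stop stretch in dna, or None if there is none."""
    start = dna.find('ATG')
    while start != -1:
        # Read codon by codon in the reading frame set by this ATG.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i+3] in STOP_CODONS:
                return dna[start:i+3]
        # No in-frame stop codon: try the next ATG, if any.
        start = dna.find('ATG', start + 1)
    return None

print(find_first_orf('CCATGGCTTAAGG'))  # ATGGCTTAA
```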


Genome Databases

• Completed genomes

– FTP site: ftp://ftp.ncbi.nlm.nih.gov/genomes/

– http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/allorg.html

– http://www.ebi.ac.uk/genomes/mot/index.html

– http:/pir.goergetown.edu/pirwww/search/genome.html

• Organism-specific databases

– http://www.unledu/stc-95/ResTools/biotools/biotools10.html

– http://www.fp.mcs.anl.gov/~gaasterland/genomes.html

– http://www.hgmp.mrc.ac.uk/GenomeWeb/genome-db.html

– http://www.bioinformatik.de/cgi-bin/browse/Catalog/Databases/Genome_Proejcts


Human Genome

• Human Genome Project

– Conceived in 1984, begun in 1990, draft sequence published in 2001, completed in 2003 ahead of the 2005 schedule

• What did the sequence reveal?

– 3 Bbp (billion base pairs)

– 24 distinct chromosomes: 22 autosomes plus two sex chromosomes (X, Y)

– Longest 250 Mbp, shortest 55 Mbp

– Mitochondrial genome: circular DNA molecule of 16,569 bp (16.569 kbp)

– ~10^13 cells in the human body

• How many is 3 Bbp?

– A typical 11-pt font prints 60 nucleotides in about 4 in (~10 cm)

– In this format, 3 Bbp writes out to about 5,000 km (~3,100 mi)
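
A quick check of the print-out estimate, assuming 60 nucleotides fit on a line 10 cm long: the total comes to roughly 5,000 km, or about 3,100 miles. The variable names are illustrative only.

```python
genome_bp = 3_000_000_000   # ~3 Bbp, from the slide
bases_per_line = 60         # nucleotides per printed line
line_cm = 10.0              # assumed length of one printed line in cm

# number of lines * line length, converted cm -> m -> km
total_km = genome_bp / bases_per_line * line_cm / 100 / 1000
total_mi = total_km / 1.609344   # km -> miles

print(round(total_km), 'km')  # 5000 km
print(round(total_mi), 'mi')  # 3107 mi
```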


Other Species

Organism                        Genome size (Mbp)   # of genes
Epstein-Barr virus                           0.17           80
E. coli                                      4.6         4,406
Yeast (S. cerevisiae)                       12.5         6,172
Nematode worm (C. elegans)                 100.3        19,099
Thale cress (A. thaliana)                  115.4        25,498
Fruit fly (D. melanogaster)                128.3        13,601
Human (H. sapiens)                        3223.0        20,500
Fugu (Takifugu rubripes)                   390.0        30,000
Wheat                                    16000.0        30,000
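
One way to read this table is as gene density, that is, genes per Mbp. The sketch below uses four rows copied from the table; the helper name `gene_density` is hypothetical. E. coli comes out near 958 genes per Mbp and human near 6, so genome size and gene count do not grow together.

```python
def gene_density(size_mbp, genes):
    """Number of genes per Mbp of genome."""
    return genes / size_mbp

# (genome size in Mbp, number of genes), copied from the table
genomes = {
    'E. coli': (4.6, 4406),
    'S. cerevisiae': (12.5, 6172),
    'C. elegans': (100.3, 19099),
    'H. sapiens': (3223.0, 20500),
}

for name, (size_mbp, genes) in genomes.items():
    print(f'{name:15s} {gene_density(size_mbp, genes):8.1f} genes/Mbp')
```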


Monomer counts in DNA

• In double strands

– # of A = # of T; # of G = # of C

– Erwin Chargaff’s 1st Parity Rule, 1951

• In a single strand?

– # of A ≈ # of T; # of G ≈ # of C (approximately, in sufficiently long strands)

– Erwin Chargaff’s 2nd Parity Rule
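
The first parity rule follows directly from base pairing, which a small count can illustrate. This is a sketch under stated assumptions: the strand `top` is a made-up example, and `base_counts` and `partner_strand` are hypothetical names. A strand this short says nothing about the second rule, which only shows up approximately in long sequences.

```python
def base_counts(strand):
    """Count each of A, T, G, C in one strand."""
    return {base: strand.count(base) for base in 'ATGC'}

def partner_strand(strand):
    """Complementary strand via A<->T and G<->C (order of bases not reversed)."""
    return strand.translate(str.maketrans('ATGC', 'TACG'))

top = 'AACTGGG'                    # hypothetical example strand
both = top + partner_strand(top)   # all bases of the double strand

print(base_counts(top))    # single strand: counts need not match
print(base_counts(both))   # double strand: A == T and G == C
```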


Parsing DNA Data Files

• Download the Yeast Chromosome 1 sequence from www.cs.uml.edu/~kim/100/yeast01.txt to your C:\100

• Open a Command Prompt from Applications (NOT JES)

• cd C:\100

• python

• In Python

– Name the DNA file

– Read all lines and put them into a single string, 'dna'

• What does lines[0] have?

• What is happening here?

>>> fp = open('yeast01.txt')
>>> lines = fp.readlines()
>>> lines[0]


Parsing DNA Data Files

• Line-by-line processing is difficult

• Each line ends with '\n'

• How to concatenate all the lines into a LONG string by removing '\n'

• Why lines[1:], not lines[0:]?

>>> dna = ''.join(lines[1:])
>>> dna[0:100]
>>> dna = dna.replace('\n', '')
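
The interactive steps can be collected into one function. The sketch below assumes the file holds a single sequence whose first line is a header (the reason for `lines[1:]`); the name `read_dna` and the tiny demo file are hypothetical, written to a temporary location so the example runs without yeast01.txt.

```python
import os
import tempfile

def read_dna(path):
    """Read a one-record sequence file: header on line 1, bases after it."""
    with open(path) as fp:
        lines = fp.readlines()
    # Skip lines[0] (assumed header), strip each newline, join the rest.
    return ''.join(line.strip() for line in lines[1:])

# Write a small demo file with a header line and two sequence lines.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as out:
    out.write('>demo header\nACTT\nAGG\n')
    demo_path = out.name

demo = read_dna(demo_path)
os.remove(demo_path)   # clean up the demo file
print(demo)            # ACTTAGG
```

With the course file in place, the same call would be `read_dna('yeast01.txt')`.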


Base-Pair Distribution in a DNA String

• Write a Python function, baseFreq(dna)

– To count the number of 'A', 'T', 'C', 'G' in the concatenated dna string

• How about the distribution of pairs of bases (bimers)?

– ACTTAGG

– AC, CT, TT, TA, AG, GG

• How about trimers, tetramers, pentamers, hexamers, …?
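
Bimers, trimers and longer words can all be counted by sliding a window of length k along the string. This is one possible sketch using `collections.Counter`; the function name `kmer_counts` is hypothetical, and the windows overlap, as in the ACTTAGG example.

```python
from collections import Counter

def kmer_counts(dna, k):
    """Count every overlapping substring of length k in dna."""
    return Counter(dna[i:i+k] for i in range(len(dna) - k + 1))

bimers = kmer_counts('ACTTAGG', 2)
print(sorted(bimers))           # ['AC', 'AG', 'CT', 'GG', 'TA', 'TT']
print(sum(bimers.values()))     # 6
```

A string of length n has n - k + 1 overlapping k-mers, so the seven-base example yields six bimers and five trimers.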


DNA Base Counting

def baseFreq(dna):
    # Return the fractions of A, C, T and G in dna, in that order
    count = [0.0, 0.0, 0.0, 0.0]
    num = 0                      # number of characters examined
    for base in dna:
        if base == 'A':
            count[0] = count[0] + 1
        elif base == 'C':
            count[1] = count[1] + 1
        elif base == 'T':
            count[2] = count[2] + 1
        elif base == 'G':
            count[3] = count[3] + 1
        num = num + 1            # every character counts toward the total
    for i in range(0, 4):
        count[i] = count[i] / num
    return count
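
The same frequencies can also be obtained with the built-in `str.count`, which avoids the explicit loop. This is an alternative sketch, not the version from the slides; the name `base_freq_count` and the empty-string guard are additions made here.

```python
def base_freq_count(dna):
    """Fractions of A, C, T, G in dna, in that order."""
    if not dna:
        return [0.0, 0.0, 0.0, 0.0]   # avoid dividing by zero
    return [dna.count(base) / len(dna) for base in 'ACTG']

print(base_freq_count('ACTTAGGA'))  # [0.375, 0.125, 0.25, 0.25]
```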


Base Counting (in Notepad)

def baseFreq(dna):
    # Return the fractions of A, C, T and G in dna, in that order
    count = [0.0, 0.0, 0.0, 0.0]
    num = 0
    for base in dna:
        if base == 'A':
            count[0] = count[0] + 1
        elif base == 'C':
            count[1] = count[1] + 1
        elif base == 'T':
            count[2] = count[2] + 1
        elif base == 'G':
            count[3] = count[3] + 1
        num = num + 1
    for i in range(0, 4):
        count[i] = count[i] / num
    return count

##### main() function #####
dataFile = input('Enter a DNA file name\n')
fp = open(dataFile)
lines = fp.readlines()
fp.close()
# Skip the header line, lines[0], as on the parsing slide
dnaStr = ''.join(lines[1:])
dnaStr = dnaStr.replace('\n', '')

freq = baseFreq(dnaStr)
print(freq)