Topics in Bioinformatics CS832b Bin Ma

Upload: alexandra-howard

Post on 18-Jan-2016

Page 1: Topics in Bioinformatics CS832b Bin Ma. Lecture 1: Basic

Topics in Bioinformatics

CS832b

Bin Ma

Lecture 1: Basic

Three molecules we will study

• DNA
  • A string over the alphabet {A,C,G,T}
• RNA
  • Primary structure: a string over the alphabet {A,C,G,U}
  • Secondary and tertiary structures
• Protein
  • Primary structure: a string over the alphabet {A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V}
  • Secondary and tertiary structures
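
The three alphabets above can be written down directly as sets. The following is a minimal sketch, assuming plain Python strings as the sequence representation; the function name `classify_sequence` is made up for illustration and is not part of the lecture.

```python
# Alphabets exactly as listed on the slide.
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ARNDCQEGHILKMFPSTWYV")


def classify_sequence(seq):
    """Return the names of all alphabets that contain every letter of seq."""
    letters = set(seq.upper())
    candidates = [
        ("DNA", DNA_ALPHABET),
        ("RNA", RNA_ALPHABET),
        ("protein", PROTEIN_ALPHABET),
    ]
    return [name for name, alphabet in candidates if letters <= alphabet]


print(classify_sequence("AGTAGCCTATGCGA"))  # ['DNA', 'protein']
print(classify_sequence("AGUAGC"))          # ['RNA']
print(classify_sequence("MCSYIRYDTPK"))     # ['protein']
```

Note that A, C, G and T are also amino acid letters, so the alphabet alone cannot always tell a DNA string from a protein string.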

DNA

5’…AGTAGCCTATGCGA…3’
  …::::::::::::::…
3’…TCATCGGATACGCT…5’

5’…AGTAGCCTATGCGA…3’
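
Because the two strands are complementary (A pairs with T, C pairs with G), one strand determines the other, which is why a single 5’ to 3’ string is enough to represent the double helix. A small sketch of that relationship, with function names chosen here for illustration:

```python
# Watson-Crick pairing: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Base-by-base complement, in the same left-to-right order as the input."""
    return strand.translate(COMPLEMENT)


def reverse_complement(strand):
    """The opposite strand, written in its own 5' to 3' direction."""
    return complement(strand)[::-1]


top = "AGTAGCCTATGCGA"              # 5'...3' strand from the slide
print(complement(top))              # TCATCGGATACGCT (read 3' to 5', as drawn)
print(reverse_complement(top))      # TCGCATAGGCTACT (same strand, read 5' to 3')
```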

>CHRXGATCACCTGACATCAGGAGTTCAAGACCAGCCTGCCAACGTGGTGAAACCCCATCTCTACTAAAAATAGGAAATTCACCTGGTGGCAGGTGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGAAGAATCGCTTGAACCCAGGAGGTGGAGATTGCACTGAGCTGAGATCACGCCACTGCGCTCCAGCCTGGGTGACAGAGCAAGACTCCATAAAAAAAAAAATTATAACCTAATGATTAAATACTGTAGGGAAGAGCTTACCACAATTGCTGGCCCATGGCCAATGCTGGGTATAAGACAGCTACTGCAAACAACCATGATGATGATACATCTCTTGTGTAGGGTTAGGTTGTTTGAGACACATTCTATGCTCCTTGATTTGATTGGAAGGTACCTTGGTTCCTTGGGGACTTGGAGGTGACGAAAGCCTCCCTGGGGACAAAACTCACCTTCACTTCTCTAATATCAAGCTTCAGCAACCTGCTCCAGCTACAGCACAGGGTTGGACAGGCCCAACAACAGAGGAAATCCACAAAGTGTGTCTTGACACATACATCCACGGGGTCTAACGAGGTGAGGCCAATGACTGCTTCCACACACCCCAGCCAGACTCTGACTTCACTCCCGGCAGGTTTCAGTAGACTTGGCAGCAGTTGGAGCGAGCTGGCTTCTTGCGGTAGGCAGCCATGTTGGAAGAGCTCCCAATAGTCCTCGTTTCCTGGTAATCTCATGCTTGGATCATCTTCTTCTCTTGAGTGAAGAGAAGAACTGCAGAGAGAGACAGAGACAGAGAGACAGATCACAGGGGCAGTTTCCCCCATACTGTTCTCAAGATAAATGAGTCAACTCTTACACCTCTTTTCTCTGGTGTAAAACAAGGCTGGTGAACAGGCAGAGAGAACTGGGGTGTTGGAGTAGCATTGACCTTCCTTCTTCATCCCTCTATAATCTCTCCTAGTGCAGGAGTAGGAAAACTAAAAATCACACGTCTGATCATCTGTGATCTCAGAGTCTTGGACAAGCCTTGCTTGCCAATCAGCAGGGATGGGAGTTGGAGCCATCTCCAAGTGTCCCCCCACAAATCTATGTCCACCTGGAAGTTTCAAATGCAACTTTATTTGGGAAAGGCAATTTTGCAAATGTTATTAAGTGAAGGATCTAGGGATGAGATCATCCTGGAGTAGGGTGGGTCCTAGGTCAAATGACAGGAAATCTGCCCACCTCGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCACCAAACCTGGCCTATCATTGATTTAATGATTAATACGGTTAGGCTCTGTGTCCCCACCCAAATCTCATCTCAAATTGTAATTCCCATGTGTCCAGGGAGGGAGCTTGTGGAAGGTGATTGGATCACAGGGGCAGTTTTTGTCATGCTGTTCTCATGATAAATGAGTCAATTCTCAGAAGAGATGATGGTTTTAAAGTGTGGCACTTCTTTGCTCTCTTGCTCTCTCTCTCTCCTGAGTAGACTGGCTCATTCTTTCTACTGGTTACAAGCAATAGAAGTGATAACAAAATTGATGGTTTCTCATTTCCTAAATGGTACCAGTGGATTCCTGGTTTCCTCTCTCTCTCTTCTCTCTCTCTATCAACTTTTCCCTCAATCTCTCTATCAACCTCCCTCTCTCTCAATCTCAATCTCTCTCAGTCTCATTCTCAATCTCTTTTGCTCAATCTCTTTCTCAGCTTCTCTCCCTCAATTTCTCTTTTGCAACTTCTCTCTCTCAGTCTGTGTCTCTCAATCTCCCTCTCTCAATCTCTCTTGTAGTCTCCCTGTCTCTCATACTCTCTCTGTTTCTGTCTGTCTCTGCCCTTGCTCTAGGGAAAGCAAGTTCTTATGCTGTAAGTTCTCCTGTAAAAAGGTCCACATGATACGGAACTGGCCATCTTTGGCCAACATGAGTGAGTTTAGAAGTGTGCCTTTCACCAGTTGAGCCTTCAAATGAGATCCCAGCCCTGG
ATGACACAGTGACAGTAACCTGCTAGGAACTGTGAACCAGAGGCACCCAGCCAAGCTGCTCCCAGACTCCCAACCCAGTGAAACCATAAGATAATAAATGCATGTTGTTTTAAGCTGCTAAGTTTGGGGGTCACTTGTTACACAGCAACAGCTGACTCATACATTTTCTTTGAAATTGATTTCCACTTCTGTCACCAGCATCATTCCATAAATTTGCTCTATGTGCATTGCTGACCTGCAGTAGAAGTTTTGGAGAAGTGAACCACATCCCCTTATCTGCCATTTGACAGCAAGCAGCCTCAAACATTCATAATTTCTTTCCTGACTCTCCACTCCACACTGTTGCCTGCCTTCCTGGTTCCAGATCTTTGGATCTGGACTGACACCTGGGCACTGTCATAGGCATCCGTGTGAAGAGACCACCAACAGGCTCTGTGTGAGCAATAAAGCTTTTTAATCACCTGGGTGCAGGTGGGCTGATTCTGAAAAGAGAGTCAGCAAAGAGTGGTGGGATTATCATTAGTTCTTATAGGTTCGGGATAGGTGGTGGAGTTAGGAGCAATTTTTTGTGGGCAGGGAGTGGATCTTACAAAGGACATTCTCAAGGGTGGGGATGATTTTACAAAGTACCTTCTTAAGGGCGGGGGAGGATATTACAAAGTACCTTCTCAAGGGTGGGGATGATTTTACAAAGTACCTTCTTAAGGGCGGGGGAGGATATTACAAAGTACCTTCTCAAGGGTGGGGGTGGATATTACAAAGTACCTTCTTAAGGGCAGGGGAGGATATTACAAAGTACCTTCTCAAGGGGGGGGATGATTTTACAAAGTACCTTCTTAAGGGCGGGGGAGGATATTACAAAGTACCTTCTCAAGGGTGGGGGTGGATATTAGAAAGTACCTTCT

• Chromosome X is one of the chromosomes in the human genome, which has 23 pairs of chromosomes.
• Chromosome X has about 156 million base pairs (GRCh38 reference assembly).
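
The sequence above is shown in FASTA style: a header line starting with ">" followed by the bases. As a rough sketch (not a full FASTA implementation, and `parse_fasta` is a name invented here), such text could be read like this:

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict {header: sequence}.

    A line starting with '>' opens a new record; all following
    non-empty lines are concatenated into that record's sequence.
    """
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {header: "".join(chunks) for header, chunks in parts.items()}


# The first 36 bases of the chromosome X excerpt, split over two lines.
example = ">CHRX\nGATCACCTGACATCAGGAGT\nTCAAGACCAGCCTGCC\n"
records = parse_fasta(example)
print(records["CHRX"])       # GATCACCTGACATCAGGAGTTCAAGACCAGCCTGCC
print(len(records["CHRX"]))  # 36
```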

Genome Sizes

Species                                Size (bp)
Amoeba dubia                           670,000,000,000
Homo sapiens                           3,400,000,000
Drosophila melanogaster                180,000,000
Mycoplasma genitalium                  580,000
Human immunodeficiency virus type 1    9,750

Protein and Amino Acids

Protein

Protein

GOT Ecoli

A protein sequence

>gi|7228451|dbj|BAA92411.1| EST AU055734(S20025) corresponds to a region …

MCSYIRYDTPKLFTHVTKTPPKNQVSNSINDVGSRRATDRSVASCSSEKSVGTMSVKNASSISFEDIEKSISNWKIPKVN

IKEIYHVDTDIHKVLTLNLQTSGYELELGSENISVTYRVYYKAMTTLAPCAKHYTPKGLTTLLQTNPNNRCTTPKTLKWD

EITLPEKWVLSQAVEPKSMDQSEVESLIETPDGDVEITFASKQKAFLQSRPSVSLDSRPRTKPQNVVYATYEDNSDEPSI

SDFDINVIELDVGFVIAIEEDEFEIDKDLLKKELRLQKNRPKMKRYFERVDEPFRLKIRELWHKEMREQRKNIFFFDWYE

SSQVRHFEEFFKGKNMMKKEQKSEAEDLTVIKKVSTEWETTSGNKSSSSQSVSPMFVPTIDPNIKLGKQKAFGPAISEEL

VSELALKLNNLKVNKNINEISDNEKYDMVNKIFKPSTLTSTTRNYYPRPTYADLQFEEMPQIQNMTYYNGKEIVEWNLDG

FTEYQIFTLCHQMIMYANACIANGNKEREAANMIVIGFSGQLKGWWNNYLNETQRQEILCAVKRDDQGRPLPDRDGNGNP

TELKEGFHMEEKDEPIQEDDQVVGTIQKYTKQKWYAEVMYRFIDGSYFQHITLIDSGADVNCIREDEILDQLVQTKREQV

VNSIYLHDNSFPKSMDLPDQKITEKRAKLQDIPHHEERLLDYREKKSRDGQDKLPMEVEQSMATNKNTKILLRAWLLST

A protein sequence may contain from a few hundred to several thousand amino acids.

RNA

Animal cell

Nucleus

Chromatin

Mitochondrion

Nucleolus (rRNA synthesized)

Plasma membrane

Cell coat

Cytoplasm

Protein synthesis

Genetic code

..ATT CAC AGT GGA..
   I   H   S   G
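
The codon-to-amino-acid mapping in this example can be reproduced with a lookup table. The sketch below assumes the standard genetic code, written in the compact 64-letter form with bases ordered T, C, A, G and "*" marking stop codons; the name `translate` is chosen here for illustration.

```python
from itertools import product

# Standard genetic code; codons are enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon, starting at position 0.

    A trailing partial codon is ignored; stop codons appear as '*'.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(translate("ATTCACAGTGGA"))  # IHSG
```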

Notes on translation

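• Reading frame
• Start and end (stop) codons
• The third base of a codon is often not important: many amino acids are encoded by several codons that differ only in the third base.
• Translation proceeds in the 5’ -> 3’ direction.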
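
To make the notion of a reading frame concrete, the sketch below splits one strand into codons at each of the three possible offsets. The helper name `codons_in_frame` is invented for this example.

```python
def codons_in_frame(seq, frame):
    """Split seq into complete codons, starting at offset frame (0, 1 or 2).

    Leftover bases at the end that do not fill a codon are dropped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    usable = seq[frame:]
    return [usable[i:i + 3] for i in range(0, len(usable) - 2, 3)]


seq = "ATTCACAGTGGA"
for frame in range(3):
    print(frame, codons_in_frame(seq, frame))
# 0 ['ATT', 'CAC', 'AGT', 'GGA']
# 1 ['TTC', 'ACA', 'GTG']
# 2 ['TCA', 'CAG', 'TGG']
```

The same bases give three entirely different codon sequences, which is why the start codon matters: it fixes the frame.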

DNA replication

The Central Dogma of Molecular Biology

DNA --(transcription)--> RNA --(translation)--> Protein

DNA --(replication)--> DNA

DNA: genotype. Protein: phenotype.
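
At the string level, the transcription step is very simple if the input is taken to be the coding (sense) strand: the mRNA has the same sequence with U in place of T. A minimal sketch under that assumption (the function name is illustrative):

```python
def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand given 5' to 3'.

    Assumes the input is the coding (sense) strand, so the RNA is the
    same sequence with thymine (T) replaced by uracil (U).
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATTCACAGTGGA"))  # AUUCACAGUGGA
```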

Exception – retroviruses

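DNA --(transcription)--> RNA --(translation)--> Protein

DNA --(replication)--> DNA

In retroviruses, RNA is also copied back into DNA (reverse transcription).

DNA: genotype. Protein: phenotype.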

Protein (Phenotype)

DNA (Genotype)

Biology

Genes

• One gene encodes one protein (or sometimes an RNA).
• Like a program, a gene starts with a start codon (e.g. ATG); each following group of three bases then codes for one amino acid, until a stop codon (e.g. TGA) marks the end of the gene.
• Genes are dense in prokaryotes and sparse in eukaryotes.
• In the middle of a eukaryotic gene there are introns, which are spliced out (as junk) after transcription. The parts that are kept are called exons. Locating genes and their exons in a genome is the task of gene finding.
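
The "start codon, codons, stop codon" picture suggests a naive way to look for candidate genes: scan each reading frame for ATG and read codons until a stop codon. The sketch below does only that, on one strand and ignoring introns, so it is a toy illustration rather than a real gene finder; `find_orfs` is a name made up here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna):
    """Return (start, end) index pairs of open reading frames in dna.

    Each ORF begins with ATG and ends with the first in-frame stop codon;
    dna[start:end] includes both. Only the given strand is scanned.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] != "ATG":
                pos += 3
                continue
            end = pos + 3
            while end + 3 <= len(dna) and dna[end:end + 3] not in STOP_CODONS:
                end += 3
            if end + 3 > len(dna):
                break  # reached the end of the sequence without a stop codon
            orfs.append((pos, end + 3))
            pos = end + 3
    return orfs


dna = "CCATGAAATGACC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])  # 2 11 ATGAAATGA
```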

Introns and Exons

Jumping genes

• Some genes (transposable elements) can move from one position in the genome to another, jumping over other genes.

Gene related diseases

• Hemophilia: the gene is on the X chromosome.
• Sickle-cell anemia: a single-nucleotide mutation in the first exon of the beta-globin gene (it removes a cutting site). 1 in 12 African Americans is a carrier; homozygotes have the disease.
• BRCA1 gene (chr. 17q): responsible for about ½ of inherited breast cancer (10% of breast cancer).
• Fragile X syndrome (intellectual disability): 1 in 1250 males and 1 in 2500 females (dominant, but females have a partially expressed good copy of the gene). FMR-1 gene: more than 200 trinucleotide repeats cause the disease.
• P53 gene: chr. 17p, mutated in about ½ of all cancers.
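
The repeat threshold in the fragile X bullet can be checked mechanically by counting the longest run of consecutive copies of a repeat unit. The sketch below is only an illustration: the default unit "CGG" is the repeat usually cited for this gene but is an assumption here (the slide does not name it), and the helper name is invented.

```python
import re


def longest_repeat_run(dna, unit="CGG"):
    """Return the largest number of back-to-back copies of unit in dna."""
    pattern = re.compile("(?:" + re.escape(unit) + ")+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(dna.upper())]
    return max(runs, default=0)


print(longest_repeat_run("AACGGCGGCGGTTCGG"))            # 3
print(longest_repeat_run("AA" + "CGG" * 250 + "TT") > 200)  # True
```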
