Introduction to Bioinformatics: Molecular Biology Primer

Upload: korbin

Post on 08-Jan-2016



Genetic Material

• DNA (deoxyribonucleic acid) is the genetic material

• Information stored in DNA
  – is the basis of inheritance
  – distinguishes living things from nonliving things

• Genes: the various units that govern a living thing's characteristics at the genetic level


Nucleotides

• Genes themselves contain their information as a specific sequence of nucleotides found in DNA molecules

• Only four different bases occur in DNA molecules
  – Guanine (G)
  – Adenine (A)
  – Thymine (T)
  – Cytosine (C)

• Each base is attached to a phosphate group and a deoxyribose sugar to form a nucleotide.

• The only thing that makes one nucleotide different from another is which nitrogenous base it contains

[Figure: a nucleotide, drawn as a sugar with a phosphate (P) and a base attached]


Nucleoside

[Figure: nucleoside structures for the purine bases (A, G) and the pyrimidine bases (C, T, U)]


Nucleotides

• Complicated genes can be many thousands of nucleotides long

• All of an organism's genetic instructions, its genome, can span millions or even billions of nucleotides


Orientation

• Strings of nucleotides can be attached to each other to make long polynucleotide chains

• 5' (5 prime) end: the end of a string of nucleotides whose 5' carbon is not attached to another nucleotide

• 3' (3 prime) end: the other end of the molecule, with an unattached 3' carbon


[Figure: sugar carbons numbered 1' through 5']


Base Pairing

• Structure of DNA
  – Double helix
  – Seminal paper by Watson and Crick in 1953
  – Rosalind Franklin's contribution

• The information content of one strand is essentially redundant with that of the other
  – Not exactly the same: it is complementary

• Base pairs
  – G paired with C (G≡C, three hydrogen bonds)
  – A paired with T (A=T, two hydrogen bonds)


Base Pairing

• Reverse complements: the 5' end of one strand corresponds to the 3' end of its complementary strand, and vice versa

• Example
  – One strand: 5'-GTATCC-3'
  – The other strand: 3'-CATAGG-5', which written 5' to 3' is 5'-GGATAC-3'

• Upstream: Sequence features that are 5' to a particular reference point

• Downstream: Sequence features that are 3' to a particular reference point

[Figure: a strand drawn 5' to 3', with upstream toward the 5' end and downstream toward the 3' end]
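The complement and reverse-complement rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming uppercase DNA containing only A, C, G, and T; `complement` and `reverse_complement` are hypothetical helper names, not functions from any particular library.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the paired base at each position (same left-to-right order)."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the other strand, written 5' to 3' like the input."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    seq = "GTATCC"                  # 5'-GTATCC-3'
    print(complement(seq))          # CATAGG (read 3' to 5')
    print(reverse_complement(seq))  # GGATAC (read 5' to 3')
```

Reversing after complementing is what keeps both strands written in the conventional 5' to 3' direction.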


DNA Structure


Chromosome

• Threadlike "packages" of genes and other DNA in the nucleus of a cell


• Different kinds of organisms have different numbers of chromosomes

• Humans: 23 pairs, 46 in all


Central Dogma of Molecular Biology

• DNA: information storage

• Protein: functional unit, such as an enzyme

• Gene: instructions needed to make a protein

• Central dogma: DNA → RNA → protein


  – Replication (DNA polymerase): DNA → DNA
  – Reverse transcription (reverse transcriptase): RNA → DNA

• DNA obtained from reverse transcription is called complementary DNA (cDNA). The difference between DNA and cDNA will be discussed later.


• RNA (ribonucleic acid)
  – Single-stranded polynucleotide
  – Bases: A, G, C, and U (uracil) instead of T

• Transcription (simplified …): A → A, G → G, C → C, T → U

[Figure: DNA vs. RNA nucleotides, each a sugar with a phosphate (P) and a base; the DNA sugar carries H where the RNA sugar carries OH]
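The simplified transcription rule above, and the base pairing that happens when the template strand is read, can be sketched in Python. This is a minimal illustration, assuming uppercase DNA containing only A, C, G, and T; `transcribe_coding` and `transcribe_template` are hypothetical helper names.

```python
# Simplified transcription: the mRNA matches the coding strand with T -> U.
def transcribe_coding(coding_5to3: str) -> str:
    """mRNA (5'->3') from the coding strand (5'->3'): A->A, G->G, C->C, T->U."""
    return coding_5to3.upper().replace("T", "U")


# Pairing rule when reading the template strand: A->U, C->G, G->C, T->A.
TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_template(template_3to5: str) -> str:
    """mRNA (5'->3') from the template strand written 3'->5'.

    Because the template is written 3'->5', the paired bases can be read
    left to right and the result is already in 5'->3' order.
    """
    return template_3to5.upper().translate(TEMPLATE_TO_RNA)


if __name__ == "__main__":
    print(transcribe_coding("ATGACC"))    # AUGACC
    print(transcribe_template("TACTGG"))  # AUGACC
```

Both calls yield the same mRNA because the coding strand and the template strand are complementary.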


DNA Replication (DNA → DNA)


DNA Replication Animation

Courtesy of Rob Rutherford, St. Olaf University


Transcription (DNA → RNA)

• Messenger RNA (mRNA): carries the information to be translated

• Ribosomal RNA (rRNA): the working “spine” of the ribosome

• Transfer RNA (tRNA): the “decoder keys” that will translate nucleic acids to amino acids


Transcription Animation

Courtesy of Rob Rutherford, St. Olaf University


Peptides and Proteins

• mRNA → sequence of amino acids connected by peptide bonds

• Amino acid sequence
  – Peptide: fewer than about 30 to 50 amino acids
  – Protein: a longer peptide


Genetic Code: Codon

• Codon: a 3-base RNA sequence

[Figure: genetic code table marking the start codon (AUG) and the stop codons (UAA, UAG, UGA)]


List of Amino Acids

Letter  Amino acid     Symbol  Codons
A       Alanine        Ala     GC*
C       Cysteine       Cys     UGU, UGC
D       Aspartic acid  Asp     GAU, GAC
E       Glutamic acid  Glu     GAA, GAG
F       Phenylalanine  Phe     UUU, UUC
G       Glycine        Gly     GG*
H       Histidine      His     CAU, CAC
I       Isoleucine     Ile     AUU, AUC, AUA
K       Lysine         Lys     AAA, AAG
L       Leucine        Leu     UUA, UUG, CU*
M       Methionine     Met     AUG
N       Asparagine     Asn     AAU, AAC
P       Proline        Pro     CC*
Q       Glutamine      Gln     CAA, CAG
R       Arginine       Arg     CG*, AGA, AGG
S       Serine         Ser     UC*, AGU, AGC
T       Threonine      Thr     AC*
V       Valine         Val     GU*
W       Tryptophan     Trp     UGG
Y       Tyrosine       Tyr     UAU, UAC

• 20 letters; B, J, O, U, X, and Z are not used

• * stands for any of the four bases (e.g., GC* = GCU, GCC, GCA, GCG)
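The table above can be encoded as a 64-entry lookup table and used to translate an mRNA. This is a minimal sketch, assuming uppercase mRNA input; `CODON_TABLE` and `translate` are hypothetical names. The 64-letter string is a common compact encoding of the same standard code: codons are enumerated with bases in U, C, A, G order at each position, and `*` marks a stop codon.

```python
from itertools import product

# One amino-acid letter per codon, in UUU, UUC, UUA, UUG, UCU, ... order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon, ignoring a trailing partial codon."""
    mrna = mrna.upper()
    return "".join(
        CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)
    )


if __name__ == "__main__":
    print(len(CODON_TABLE))                  # 64 = 4**3 codons
    print(translate("AUGACCGUGGGCUCUUAA"))  # MTVGS*
```

Counting the entries per letter reproduces the redundancy in the table: for example, L, S, and R each have six codons, while M and W have one.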


Codon and Reading Frame

• 4 nucleotide letters → 4^3 = 64 possible triplets

• Only 20 known amino acids (< 64)

• Wobbling 3rd base: the third position of a codon often varies without changing the amino acid

• Redundant → more resistant to mutation

• Reading frame: the linear sequence of codons in a gene

• Open Reading Frame (ORF); the definition varies:
  – a reading frame that begins with a start codon and ends at a stop codon
  – a series of codons in a DNA sequence uninterrupted by a stop codon
  → a potential protein-coding region of a DNA sequence
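The first ORF definition above (start codon through stop codon) can be turned into a simple scanner. This is a sketch under stated assumptions: forward strand only, DNA letters A, C, G, and T only, ATG as the only start codon, and the first ATG in a stretch opening the ORF (nested ATGs are not reported separately). `find_orfs` and its `min_codons` parameter are hypothetical names.

```python
from itertools import product

# Standard genetic code on DNA letters (T instead of U); '*' marks stop codons.
CODE = dict(zip(
    ("".join(c) for c in product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int, str]]:
    """Find ATG ... stop ORFs in the three forward frames.

    Returns (start, end, protein): 0-based start, exclusive end that includes
    the stop codon, and the translated protein without the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            aa = CODE[dna[i:i + 3]]
            if start is None and aa == "M":
                start = i  # first ATG since the last stop opens an ORF
            elif start is not None and aa == "*":
                protein = "".join(CODE[dna[j:j + 3]] for j in range(start, i, 3))
                if len(protein) >= min_codons:
                    orfs.append((start, i + 3, protein))
                start = None  # look for the next ATG after this stop
    return orfs
```

For the sequence 5'-ATGACCGTGGGCTCTTAA-3', `find_orfs` returns one ORF, `(0, 18, 'MTVGS')`. Running the same scan on the reverse complement would cover the backward frames, and raising `min_codons` keeps only the substantially longer ORFs that are more likely to be genes.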

Page 33: Introduction to Bioinformatics

Open Reading Frame

• Given a nucleotide sequence, how many reading frames? __
  – __ forward and __ backward

• Example: given the DNA sequence 5'-ATGACCGTGGGCTCTTAA-3'
  – ATG ACC GTG GGC TCT TAA → M T V G S *
  – TGA CCG TGG GCT CTT AA → * P W A L
  – GAC CGT GGG CTC TTA A → D R G L L
  – Figure out the three backward reading frames

• In a random sequence, a stop codon follows a Met within about 20 amino acids on average, since 3 of the 64 codons are stops (64/3 ≈ 21)

• Substantially longer ORFs are often genes or parts of them
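The reading-frame exercise above can be checked by translating every frame: three offsets on the given strand and three on its reverse complement, each read 5' to 3'. This is a minimal sketch, assuming a DNA sequence of A, C, G, and T only; the function names and the `+1`/`-1` style frame labels are our own conventions here, not a standard API.

```python
from itertools import product

# Standard genetic code on DNA letters (T instead of U); '*' marks stop codons.
CODE = dict(zip(
    ("".join(c) for c in product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(dna: str, frame: int) -> str:
    """Translate the complete codons starting at offset `frame` (0, 1 or 2)."""
    return "".join(CODE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Forward frames on the input; backward frames on its reverse complement."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # other strand, written 5'->3'
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate_frame(dna, f)
        frames[f"-{f + 1}"] = translate_frame(rev, f)
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGACCGTGGGCTCTTAA").items():
        print(name, protein)
```

The `+1` to `+3` results match the three forward frames worked out on the slide.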


Translation (RNA → Protein)


Translation Animation

Courtesy of Rob Rutherford, St. Olaf University


Gene Expression

• Gene expression: the process of using the information stored in DNA to make an RNA molecule and then a corresponding protein

• Cells control gene expression by
  – reliably distinguishing the parts of an organism's genome that correspond to the beginnings of genes from those that do not
  – determining which genes code for proteins that are needed at any particular time


Promoter

• The probability P that a given string of nucleotides occurs at a particular position by chance alone, if all four nucleotides are equally frequent, is P = (1/4)^n, where n is the string's length

• Promoter sequences: sequences recognized by RNA polymerases as being associated with a gene

• Example: prokaryotic RNA polymerases scan along DNA looking for a specific set of approximately 13 nucleotides marking the beginning of genes
  – 1 nucleotide that serves as the transcriptional start site
  – 6 that are 10 nucleotides 5' to the start site, and
  – 6 more that are 35 nucleotides 5' to the start site
  – How often would this sequence be expected to occur by chance?
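The formula P = (1/4)^n can be evaluated directly to answer the question for the 13-nucleotide example. This is a minimal sketch, assuming equal base frequencies and independent positions (real genomes violate both); `chance_probability` and `expected_occurrences` are hypothetical helper names.

```python
def chance_probability(n: int) -> float:
    """P = (1/4)**n: chance that a specific n-nucleotide string appears at a
    given position, assuming equal base frequencies and independent positions."""
    return 0.25 ** n


def expected_occurrences(n: int, seq_length: int, strands: int = 1) -> float:
    """Expected chance matches of a specific n-mer in a sequence: there are
    seq_length - n + 1 possible starting positions on each strand."""
    return strands * (seq_length - n + 1) * chance_probability(n)


if __name__ == "__main__":
    n = 13  # 1 start-site nucleotide + 6 + 6 upstream nucleotides
    p = chance_probability(n)
    print(p, 1 / p)  # 1 / p = 4**13 = 67,108,864
```

With n = 13, P = 1/4^13 = 1/67,108,864, so under these assumptions the pattern is expected by chance about once every 67 million nucleotides of one strand.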


Gene Regulation

• Regulatory proteins
  – Capable of binding to a cell's DNA near the promoter of the genes
  – Control gene expression in some circumstances but not in others

• Positive regulation: binding of regulatory proteins makes it easier for an RNA polymerase to initiate transcription

• Negative regulation: binding of the regulatory proteins prevents transcription from occurring


Promoter and Regulatory Example

• Low tryptophan concentration → RNA polymerase binds to the promoter → genes are transcribed

• High tryptophan concentration → the repressor protein becomes active and binds to the operator → it blocks the binding of RNA polymerase to the promoter

• Tryptophan concentration drops → the repressor releases its tryptophan and is released from the DNA → RNA polymerase again transcribes the genes


Gene Structure


Exons and Introns


Exons and Introns Example


Protein Structure and Function

• Genes encode the recipes for proteins


• Proteins are amino acid polymers


Proteins: Molecular Machines

• Movement: proteins in your muscles (myosin and actin) allow you to move


• Digestion, catalysis (enzymes)

• Structure (collagen)


• Signaling (hormones, kinases)

• Transport (energy, oxygen)


Protein Structures


Information Flow in a Nucleated Cell


Point Mutation Example: Sickle-cell Disease

• Wild-type hemoglobin
  DNA (template strand): 3'----CTT----5'
  mRNA: 5'----GAA----3'
  Normal hemoglobin: ------[Glu]------

• Mutant hemoglobin
  DNA (template strand): 3'----CAT----5'
  mRNA: 5'----GUA----3'
  Mutant hemoglobin: ------[Val]------
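The sickle-cell example chains transcription and translation. The sketch below reproduces both rows, assuming the DNA shown is the template strand written 3' to 5' and using one-letter amino-acid codes (E = Glu, V = Val); `express` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code (mRNA letters); '*' marks stop codons.
CODE = dict(zip(
    ("".join(c) for c in product("UCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))
# Pairing rule: template-strand base (read 3'->5') -> mRNA base (5'->3').
TEMPLATE_TO_MRNA = str.maketrans("ACGT", "UGCA")


def express(template_3to5: str) -> tuple[str, str]:
    """Transcribe a template-strand fragment, then translate the mRNA."""
    mrna = template_3to5.upper().translate(TEMPLATE_TO_MRNA)
    protein = "".join(CODE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))
    return mrna, protein


if __name__ == "__main__":
    print(express("CTT"))  # ('GAA', 'E')  E = Glu, wild type
    print(express("CAT"))  # ('GUA', 'V')  V = Val, sickle-cell mutant
```

A single changed base in the template (T to A in the middle position) changes one codon and therefore one amino acid.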


image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.


Thinking about the Human Genome

• 50% is high-copy-number repeats

• About 10% is transcribed (made into RNA)

• Only 1.5% actually codes for protein

• The other 98.5% does not code for protein (once labeled “junk DNA”)


• ~3 × 10^9 bp (3 billion base pairs)

• If each base were one mm long…
  – the genome would stretch about 2000 miles, across the center of Africa
  – an average gene would be about 30 meters long
  – genes would occur about every 270 meters
  – once spliced, the message (mRNA) would be only ~1 meter long
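The scale-model numbers are simple unit conversions. This is a minimal sketch, assuming 1 mm per base as on the slide and 1 mile = 1,609,344 mm; `bases_to_miles` and `meters_to_bases` are hypothetical helper names.

```python
GENOME_BP = 3e9          # ~3 x 10^9 base pairs
MM_PER_MILE = 1_609_344  # 1 mile = 1.609344 km


def bases_to_miles(bases: float, mm_per_base: float = 1.0) -> float:
    """Length of a stretch of DNA in the scale model, in miles."""
    return bases * mm_per_base / MM_PER_MILE


def meters_to_bases(meters: float, mm_per_base: float = 1.0) -> float:
    """Convert a length in the scale model back into a number of bases."""
    return meters * 1000 / mm_per_base


if __name__ == "__main__":
    print(round(bases_to_miles(GENOME_BP)))  # 1864, i.e. roughly 2000 miles
    print(meters_to_bases(30))               # 30000.0 bases: an average gene
    print(meters_to_bases(1))                # 1000.0 bases: a spliced message
    print(GENOME_BP * 1.5 / 100)             # 45000000.0 protein-coding bases
```

The slide's 2000 miles is a rounding of about 1,864 miles; the 1.5% coding fraction from the previous slide corresponds to roughly 45 million bases.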