Introduction to Molecular Biology, Genetics and Genomics
Sushmita Roy
www.biostat.wisc.edu/bmi576/
[email protected]
September 6, 2012
BMI/CS 576
Goals for today
• Molecular biology crash course:
– The different parts of a cell
– DNA, RNA, chromosomes, nucleus, cytoplasm
– Biochemical entities of a cell: mRNA, proteins, metabolites
– genes, heredity, transcription, translation, gene regulation, gene expression, alternative splicing
• Genomics crash course:
– Genomes, functional genomics, other omes, networks
Organization of biological information
Organism
Tissue
Gene
Chromosome
Cell
http://publications.nigms.nih.gov/thenewgenetics/chapter1.html
The central dogma of Molecular biology
DNA
RNA
Proteins
Transcription
Translation
image from the DOE Human Genome Program, http://www.ornl.gov/hgmis
DNA
• Short for deoxyribonucleic acid
• composed of small chemical units called nucleotides (or bases)
– adenine (A), cytosine (C), guanine (G) and thymine (T)
– ATGC is the alphabet
• DNA is double stranded: made up of two twisting strands
• Each strand of DNA is a string composed of the four letters: A, C, G, T
DNA is a double helical molecule
DNA molecules consist of two strands arranged in a double helix
• DNA is made up of nucleotides
The double-helical structure is needed for the DNA molecule to store and pass on genetic information with great precision
James Watson, Francis Crick, Maurice Wilkins and Rosalind Franklin
Watson-Crick Base Pairs
A always bonds to T
C always bonds to G
This is called base pairing.
A and G are double-ringed structures called purines.
C and T are single-ringed structures called pyrimidines.
5’ and 3’ of a DNA molecule
• The backbone of this molecule has alternating sugar and phosphate groups
• each strand of DNA has a “direction”
– at one end, the terminal carbon atom in the backbone is the 5’ carbon atom of the terminal sugar
– at the other end, the terminal carbon atom is the 3’ carbon atom of the terminal sugar
• therefore we can talk about the 5’ and the 3’ ends of a DNA strand
DNA stores the blueprint of an organism
• The heredity molecule
• Has the information needed to make an organism
• Base pairing enables self-replication:
– one strand has all the information
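Because A pairs only with T and C only with G, one strand is enough to reconstruct its partner. The sketch below illustrates this with plain Python strings. The function names and the example sequence are made up for illustration; both strands are written 5’ to 3’, which is why the partner strand is the complement read in reverse.

```python
# Watson-Crick pairing rules: A <-> T and C <-> G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]

if __name__ == "__main__":
    strand = "ATGCCG"                    # illustrative sequence, 5' to 3'
    print(complement(strand))            # TACGGC
    print(reverse_complement(strand))    # CGGCAT
```

Applying `reverse_complement` twice returns the original strand, which is the string-level picture of "one strand has all the information".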
Chromosomes
• All the DNA of an organism is divided up into individual chromosomes
• prokaryotes (single-celled organisms lacking nuclei) typically have a single circular chromosome
• eukaryotes (organisms with nuclei) have a species-specific number of chromosomes
Image from www.genome.gov
DNA packaging in Chromatin
DNA is very long (roughly 2 m per human cell), but the cell is very small
Chromosomes compress the DNA molecule about 50,000-fold
The collection of DNA and proteins is called chromatin.
Different organisms have different numbers of chromosomes
Organism # of chromosomes
Yeast 32
Human 46
Fly 8
Mouse 40
Arabidopsis 10
Worm 12
Genes
• genes are the basic units of heredity
• a gene is a sequence of bases which specifies a protein or an RNA
• the human genome comprises ~ 25,000 protein-coding genes (still being revised)
• One gene can have many functions
• One function can require many genes
…GTATGTCTAAGCCTGAATTCAGTCTGCTTTAAACGGCTTC…
Structure of genes
DNA
Gene, Non-coding, Promoter
Gene A Gene B Gene C
Genomes
• Refers to the complete complement of DNA for a given species
• the human genome consists of 2 × 23 chromosomes (23 pairs)
• every cell (except egg and sperm cells and mature red blood cells) contains the complete genome of an organism
Some Greatest Hits
Genome                        Where                      Year
H. influenzae                 TIGR                       1995
E. coli K-12                  Wisconsin                  1997
S. cerevisiae (yeast)         internat. collab.          1997
C. elegans (worm)             Washington U./Sanger       1998
D. melanogaster (fruit fly)   multiple groups            2000
E. coli O157:H7 (pathogen)    Wisconsin                  2000
H. sapiens (that’s us)        internat. collab./Celera   2001
Mus musculus (mouse)          internat. collab.          2002
Rattus norvegicus (rat)       internat. collab.          2004
Some Genome Sizes
genome # base pairs
HIV 9750
E. coli 4.6 million
S. cerevisiae 12 million
C. elegans 97 million
Drosophila M. 137 million
human 3.1 billion
Number of sequenced genomes
The central dogma of Molecular biology
DNA
RNA
Proteins
Transcription
Translation
RNA
• RNA is like DNA except:
– single stranded
– U is used in place of T
• a strand of RNA can be thought of as a string composed of the four letters: A, C, G, U
Transcription
• In eukaryotes: happens inside the nucleus
• RNA polymerase is an enzyme that builds an RNA strand from a gene
• RNA Pol II is recruited at specific parts of the genome in a condition-specific way.
• Transcription factor proteins are assigned the job of Pol II recruitment.
• RNA that is transcribed from a gene is called messenger RNA (mRNA)
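At the string level, transcription can be sketched in two equivalent ways. The terms "coding strand" and "template strand" are not used on the slides; they are standard textbook names introduced here as an assumption. The mRNA has the same sequence as the coding strand with U in place of T, and it is complementary to the template strand that the polymerase reads. Function names and sequences are illustrative only.

```python
def transcribe_coding(coding_dna: str) -> str:
    """mRNA equals the coding strand with every T replaced by U."""
    return coding_dna.upper().replace("T", "U")

# Pairing between a DNA template base and the RNA base built opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe_template(template_dna: str) -> str:
    """Build the mRNA (5' to 3') from a template strand written 5' to 3'.

    The template is read in reverse because the two strands are antiparallel.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_dna.upper()))

if __name__ == "__main__":
    print(transcribe_coding("ATGGTG"))     # AUGGUG
    print(transcribe_template("CACCAT"))   # AUGGUG
```

Both calls give the same mRNA because `CACCAT` is the reverse complement of `ATGGTG`.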
Transcription: Process of turning DNA into RNA
mRNA
The central dogma of Molecular biology
DNA
RNA
Proteins
Transcription
Translation
Translation
• Process of turning mRNA into proteins.
• Happens inside the cytoplasm in ribosomes
• ribosomes are the machines that synthesize proteins from mRNA
• Translation process reads one codon at a time
• translation begins with the start codon
• translation ends with the stop codon
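The bullets above can be turned into a small sketch: scan for the start codon, read one codon at a time, and stop at a stop codon. The codon table is the standard genetic code, built compactly from the conventional U, C, A, G ordering. The function name and the short example mRNA are hypothetical, and real translation involves much more than this string scan.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order for
# the first, second and third base. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

START_CODON = "AUG"

def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find(START_CODON)
    if start == -1:
        return ""                      # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

if __name__ == "__main__":
    # Hypothetical mRNA: 2 untranslated bases, 5 codons, a stop codon, 2 more bases.
    print(translate("CCAUGGUGCUGUCUCCUUAAGC"))   # MVLSP
```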
Translation happens in ribosomes
Codons
• Each triplet of bases is called a codon
• How many codons are possible?
• Each codon is responsible for coding a particular amino acid.
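The count follows from simple enumeration: each of the three positions can hold any of four bases, so there are 4 × 4 × 4 possibilities. A minimal sketch (the function name is illustrative):

```python
from itertools import product

RNA_BASES = "ACGU"

def all_codons(bases: str = RNA_BASES) -> list:
    """Enumerate every ordered triplet of bases."""
    return ["".join(triplet) for triplet in product(bases, repeat=3)]

if __name__ == "__main__":
    codons = all_codons()
    print(len(codons))    # 64, i.e. 4 ** 3
    print(codons[:4])     # ['AAA', 'AAC', 'AAG', 'AAU']
```

With 64 codons and only 20 amino acids plus stop signals, several codons must map to the same amino acid.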
The Genetic Code
Codons and Reading Frames
Alanine
Threonine
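The same sequence can be split into codons in three different ways depending on where reading starts. The sketch below shows this on a made-up sequence; it is not the sequence from the slide's figure, and the function name is illustrative.

```python
def codons_in_frame(seq: str, frame: int) -> list:
    """Split seq into complete triplets, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

if __name__ == "__main__":
    seq = "GCUACGGAG"                   # hypothetical mRNA fragment
    for frame in range(3):
        print(frame, codons_in_frame(seq, frame))
    # 0 ['GCU', 'ACG', 'GAG']
    # 1 ['CUA', 'CGG']
    # 2 ['UAC', 'GGA']
```

In frame 0 the first two codons, GCU and ACG, code for alanine and threonine in the standard genetic code. Shifting by one or two bases gives entirely different codons.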
Proteins
• Proteins are long strings composed of amino acids
• There are 20 standard amino acids
Amino Acids
Alanine Ala A
Arginine Arg R
Aspartic Acid Asp D
Asparagine Asn N
Cysteine Cys C
Glutamic Acid Glu E
Glutamine Gln Q
Glycine Gly G
Histidine His H
Isoleucine Ile I
Leucine Leu L
Lysine Lys K
Methionine Met M
Phenylalanine Phe F
Proline Pro P
Serine Ser S
Threonine Thr T
Tryptophan Trp W
Tyrosine Tyr Y
Valine Val V
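The table above maps directly onto a lookup structure. As a sketch, the dictionary below holds the one-letter and three-letter codes from the table, and a hypothetical helper spells out a protein string in three-letter form.

```python
# One-letter to three-letter codes, copied from the table above.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "D": "Asp", "N": "Asn", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

def to_three_letter(protein: str) -> str:
    """Spell out a one-letter protein sequence using three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in protein.upper())

if __name__ == "__main__":
    print(to_three_letter("MVLSP"))   # Met-Val-Leu-Ser-Pro
```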
Proteins are the workhorses of the cell
• structural support
• storage of amino acids
• transport of other substances
• coordination of an organism’s activities
• response of cell to chemical stimuli
• movement
• protection against disease
• selective acceleration of chemical reactions
Proteins are complex molecules
• Primary amino acid sequence
• Secondary structure
• Tertiary structure
• Quaternary structure
Some well-known proteins
Hemoglobin: carries oxygen
Insulin: metabolism of sugar
Actin: maintenance of cell structure
Hemoglobin protein HBA1
>gi|224589807:226679-227520 Homo sapiens chromosome 16, GRCh37.p9 Primary Assembly
1 CCCACAGACT CAGAGAGAAC CCACCATGGT GCTGTCTCCT GCCGACAAGA CCAACGTCAA
61 GGCCGCCTGG GGTAAGGTCG GCGCGCACGC TGGCGAGTAT GGTGCGGAGG CCCTGGAGAG
121 GATGTTCCTG TCCTTCCCCA CCACCAAGAC CTACTTCCCG CACTTCGACC TGAGCCACGG
181 CTCTGCCCAG GTTAAGGGCC ACGGCAAGAA GGTGGCCGAC GCGCTGACCA ACGCCGTGGC
241 GCACGTGGAC GACATGCCCA ACGCGCTGTC CGCCCTGAGC GACCTGCACG CGCACAAGCT
301 TCGGGTGGAC CCGGTCAACT TCAAGCTCCT AAGCCACTGC CTGCTGGTGA CCCTGGCCGC
361 CCACCTCCCC GCCGAGTTCA CCCCTGCGGT GCACGCCTCC CTGGACAAGT TCCTGGCTTC
421 TGTGAGCACC GTGCTGACCT CCAAATACCG TTAAGCTGGA GCCTCGGTGG CCATGCTTCT
481 TGCCCCTTTG G
DNA sequence (491 bp)
>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Amino acid sequence (142 aa)
Protein 3D structure
RNA Processing in Eukaryotes
• eukaryotes are organisms that have enclosed nuclei in their cells
• in many eukaryotes, RNAs consist of alternating exon/intron segments
• exons are the coding parts
• introns are spliced out before translation
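Splicing can be pictured as cutting the introns out of the transcript and joining the exons in order. The sketch below assumes exon positions are already known and given as (start, end) intervals with the end excluded. The sequence, the coordinates and the function name are all hypothetical, and the intron is written in lowercase only to make it visible.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join exon segments in order; everything between them (introns) is dropped.

    exons is a list of (start, end) index pairs, end exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

if __name__ == "__main__":
    # Exon 1 (6 bases), intron (10 bases, lowercase), exon 2 (6 bases).
    pre_mrna = "AUGGCU" + "guaaguccag" + "ACGUAA"
    exons = [(0, 6), (16, 22)]
    print(splice(pre_mrna, exons))   # AUGGCUACGUAA
```

Passing a different subset of exon intervals to the same function gives a different mature mRNA, which is the idea behind the alternative splicing mentioned in the goals.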
RNA Splicing
RNA Genes
• not all genes encode proteins
• for some genes the end product is RNA
– ribosomal RNA (rRNA), which includes major constituents of ribosomes
– transfer RNAs (tRNAs), which carry amino acids to ribosomes
– micro RNAs (miRNAs), which play an important regulatory role in various plants and animals
– lincRNAs (long intergenic non-coding RNAs), which play important regulatory roles
Central Dogma revisited
DNA
RNA
Proteins
Transcription
Translation
ncRNA, miRNA, rRNAs
Non-coding RNA processing
Summary
• Key concepts in molecular biology
– Central Dogma
– DNA, RNA, proteins
– Chromosomes, Nucleus, Ribosomes
• Important processes
– Transcription
– Translation
– RNA splicing
Functional Genomics
• Aims to characterize genes and proteins in an organism in an unbiased way using high-throughput technologies.
• Really focused on “beyond the genetic sequence”
• What does a piece of DNA do?
– Gene, regulatory element, a mutation
• Has generated large collections of “omics” datasets
– Gene expression
– Protein expression
– Metabolite levels
Metabolites
• Metabolism:
– A set of chemical processes in cells
– Needed for sustaining life
• Small molecules that are intermediates of metabolism
– Sugar
– Glycerol
• Metabolic pathway
– A set of chemical reactions in a cell
The Tri-Carboxylic Acid cycle
Metabolites
Enzyme
Courtesy KEGG Pathways
Yeast metabolic pathways
Context-specific expression of a cell
• The DNA is static
• But the set of mRNA per cell type, environment, time-point may be different.
• A key process is gene regulation
– determines which genes are expressed when
Environmental signal
Transcriptional gene regulation
• Key control process that determines what genes are expressed when
• Requires
– RNA Polymerase
– Transcription factors
– Energy
http://www.youtube.com/watch?v=WsofH466lqk
Transcriptional gene regulation
Transcription factor level (trans)
HSP12
Transcription factor binding sites (cis)
mRNA levels
P2P1
Promoter
Regulation of GAL genes
• GAL genes are required for yeasts to grow on Galactose.
• There are 4 genes that are metabolic
– GAL1, GAL10, GAL2 and GAL7
• There are three that are regulatory
– GAL4, GAL80 and GAL3
Regulation of GAL genes
No Galactose
In Galactose
A metabolic GAL gene
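The two states in the figure can be sketched as a toy Boolean model. The wiring below is not spelled out on the slides; it is the usual simplified textbook description, taken here as an assumption: Gal4 activates the metabolic genes, Gal80 blocks Gal4, and Gal3 relieves that block when galactose is present. Glucose repression and all quantitative effects are ignored, and the function name is hypothetical.

```python
def metabolic_gal_genes_on(galactose: bool,
                           gal4: bool = True,
                           gal80: bool = True,
                           gal3: bool = True) -> bool:
    """Toy Boolean model of the GAL switch (simplified, assumed wiring).

    The gal4/gal80/gal3 flags say whether each regulatory gene is functional,
    so setting one to False mimics a deletion mutant.
    """
    gal3_active = gal3 and galactose            # Gal3 responds to galactose
    gal80_blocking = gal80 and not gal3_active  # Gal80 inhibits unless Gal3 relieves it
    gal4_active = gal4 and not gal80_blocking   # Gal4 activates when not blocked
    return gal4_active

if __name__ == "__main__":
    print(metabolic_gal_genes_on(galactose=False))               # False
    print(metabolic_gal_genes_on(galactose=True))                # True
    print(metabolic_gal_genes_on(galactose=False, gal80=False))  # True
```

The third call mimics a strain lacking GAL80: in this model the metabolic genes stay on even without galactose.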
Transcriptome
• The entire set of RNA products in a cell
• A cell can decide to make more or less of a particular RNA
– Levels change
• Its constituents are context-specific
• Context is determined by environment of a cell
Transcriptional Regulatory networks
• The entire set of interactions between TFs and genes in an organism
• The transcriptome is the output of a regulatory network
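One simple way to hold such a network in code is a mapping from each transcription factor (TF) to the genes it regulates. The tiny network below is illustrative only: the edges are a simplified assumption based on genes named in these slides, not a curated dataset, and a real network would also carry activation or repression signs and expression levels.

```python
# Toy regulatory network: TF -> list of target genes (assumed edges).
REGULATORY_NETWORK = {
    "GAL4": ["GAL1", "GAL10", "GAL2", "GAL7"],
    "MSN2": ["HSP12"],
}

def expressed_targets(network: dict, active_tfs: set) -> list:
    """Return the sorted set of genes targeted by at least one active TF."""
    targets = set()
    for tf in active_tfs:
        targets.update(network.get(tf, []))
    return sorted(targets)

if __name__ == "__main__":
    print(expressed_targets(REGULATORY_NETWORK, {"GAL4"}))
    # ['GAL1', 'GAL10', 'GAL2', 'GAL7']
    print(expressed_targets(REGULATORY_NETWORK, {"MSN2"}))
    # ['HSP12']
```

Changing the set of active TFs changes the output gene set, which is a crude picture of the transcriptome being the output of the network.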
Image courtesy: Dr. Mike Snyder, http://compbio.pbworks.com/w/page/16252928/Transcription%20Regluatory%20Network#1
Understanding cells requires an iterative approach spanning multiple levels
Ideker et al., Science 2002
Summary
• Cells are made up of many different molecular entities
• Functional genomics enables us to identify these entities
• Cells function via the interaction of these entities
• Putting it together into comprehensive models is a major goal of systems biology