copyright © 2004 by limsoon wong a biology review
TRANSCRIPT
![Page 1: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/1.jpg)
Cop
yrig
ht ©
200
4 by
Lim
soon
Won
g
A Biology Review
![Page 2: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/2.jpg)
Body
• Our body consists of a number of organs• Each organ is composed of a number of
tissues• Each tissue is composed of cells of the
same type
![Page 3: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/3.jpg)
Cell
• Performs two types of function– Chemical reactions necessary to maintain our
life– Pass info for maintaining life to next
generation
• In particular– Protein performs chemical reactions– DNA stores & passes info– RNA is intermediate between DNA & proteins
![Page 4: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/4.jpg)
Protein
• A protein sequence composed from an alphabet of 20 amino acids– Length is usually 20 to
5000 amino acids – Average around 350
amino acids
• Folds into 3D shape, forming the building blocks & performing most of the chemical reactions within a cell
![Page 5: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/5.jpg)
Classification of Amino Acids
• Amino acids can be classified into 4 types.
• Positively charged (basic)– Arginine (Arg, R)– Histidine (His, H)– Lysine (Lys, K)
• Negatively charged (acidic)– Aspartic acid (Asp, D)– Glutamic acid (Glu, E)
![Page 6: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/6.jpg)
Classification of Amino Acids• Polar (overall
uncharged, but uneven charge distribution. can form hydrogen bonds with water. they are called hydrophilic)– Asparagine (Asn, N)– Cysteine (Cys, C)– Glutamine (Gln, Q)– Glycine (Gly, G)– Serine (Ser, S)– Threonine (Thr, T)– Tyrosine (Tyr, Y)
• Nonpolar (overall uncharged and uniform charge distribution. cant form hydrogen bonds with water. they are called hydrophobic)– Alanine (Ala, A)– Isoleucine (Ile, I)– Leucine (Leu, L)– Methionine (Met, M)– Phenylalanine (Phe, F)– Proline (Pro, P)– Tryptophan (Trp, W)– Valine (Val, V)
![Page 7: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/7.jpg)
Genetic Code• Each amino acid is composed of three nucleotides• Start codon: ATG (code for M)• Stop codon: TAA, TAG, TGA
Copyright © 2004 by Limsoon Wong
![Page 8: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/8.jpg)
DNA
• DNA stores instruction needed by the cell to perform daily life function
• Consists of two strands interwoven together and form a double helix
• Each strand is a chain of some small molecules called nucleotides
Francis Crick shows James Watson the model of DNA in their room number 103 of the Austin Wing at the Cavendish Laboratories, Cambridge
Copyright © 2004 Limsoon Wong
![Page 9: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/9.jpg)
A C G T U
Classification of Nucleotides
• 5 different nucleotides: adenine(A), cytosine(C), guanine(G), thymine(T), & uracil(U)
• A, G are purines. They have a 2-ring structure• C, T, U are pyrimidines. They have a 1-ring
structure• DNA only uses A, C, G, & T
Copyright © 2004 by Limsoon Wong
![Page 10: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/10.jpg)
A
T
10Å
G
C
10Å
Watson-Crick rules
• Complementary bases:– A with T (two hydrogen-bonds)– C with G (three hydrogen-bonds)
Copyright © 004 by Limsoon Wong
![Page 11: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/11.jpg)
Double Stranded DNA
• DNA is double stranded in a cell. The two strands are anti-parallel. One strand is reverse complement of the other
• The double strands are interwoven to form a double helix
Copyright © 2004 by Limsoon Wong
![Page 12: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/12.jpg)
Locations of DNAs in a Cell?
• Two types of organisms– Prokaryotes (single-celled organisms with no nuclei. e.g.,
bacteria)
– Eukaryotes (organisms with single or multiple cells. their cells have nuclei. e.g., plant & animal)
• In Prokaryotes, DNA swims within the cell• In Eukaryotes, DNA locates within the
nucleus
![Page 13: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/13.jpg)
Chromosome
• DNA is usually tightly wound around histone proteins and forms a chromosome
• The total info stored in all chromosomes constitutes a genome
• In most multi-cell organisms, every cell contains the same complete set of chromosomes – May have some small differences due to
mutation
• Human genome has 3G base pairs, organized in 23 pairs of chromosomes
![Page 14: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/14.jpg)
Gene
• A gene is a sequence of DNA that encodes a protein or an RNA molecule
• About 30,000 – 35,000 (protein-coding) genes in human genome
• For gene that encodes protein– In Prokaryotic genome, one gene corresponds
to one protein– In Eukaryotic genome, one gene can
corresponds to more than one protein because of the process “alternative splicing”
![Page 15: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/15.jpg)
Complexity of Organism vs. Genome Size
• Human Genome: 3G base pairs
• Amoeba dubia (a single cell organism): 600G base pairs
Genome size has no relationship with the complexity of the organism
![Page 16: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/16.jpg)
Number of Genes vs. Genome Size
• Prokaryotic genome (e.g., E. coli)– Number of base pairs: 5M– Number of genes: 4k– Average length of a gene:
1000 bp
• Eukaryotic genome (e.g., human)– Number of base pairs: 3G– Estimated number of
genes: 30k – 35k– Estimated average length
of a gene: 1000-2000 bp
• ~ 90% of E. coli genome are of coding regions.
• < 3% of human genome is believed to be coding regions
Genome size has no relationship with the number of genes!
![Page 17: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/17.jpg)
RNA vs DNA
• RNA is single stranded• Nucleotides of RNA are similar to that of
DNA, except that have an extra OH at position 2’– Due to this extra OH, it can form more
hydrogen bonds than DNA– So RNA can form complex 3D structure
• RNA use the base U instead of T– U is chemically similar to T– In particular, U is also complementary to A
![Page 18: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/18.jpg)
Central Dogma
• Gene expression consists of two steps– Transcription
DNA mRNA
– Translation mRNA Protein
Copyright © 2004 by Limsoon Wong
![Page 19: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/19.jpg)
Transcription
• Synthesize mRNA from one strand of DNA– An enzyme RNA
polymerase temporarily separates double-stranded DNA
– It begins transcription at transcription start site (ATG)
– A A, CC, GG, & TU
– Once RNA polymerase reaches transcription stop site, transcription stops (TGA, TAG, and TAA)
• Additional “steps” for Eukaryotes– Transcription produces
pre-mRNA that contains both introns & exons
– 5’ cap & poly-A tail are added to pre-mRNA
– RNA splicing removes introns & mRNA is made
– mRNA are transported out of nucleus
![Page 20: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/20.jpg)
Translation
• Synthesize protein from mRNA
• Each amino acid is encoded by consecutive seq of 3 nucleotides, called a codon
• The decoding table from codon to amino acid is called genetic code
• 43=64 diff codons Codons are not 1-to-1
corr to 20 amino acids• All organisms use the
same decoding table• Recall that amino acids
can be classified into 4 groups. A single-base change in a codon is usually not sufficient to cause a codon to code for an amino acid in different group
![Page 21: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/21.jpg)
Ribosome
• Translation is handled by a molecular complex, ribosome, which consists of both proteins & ribosomal RNA (rRNA)
• Ribosome reads mRNA & the translation starts at a start codon (the translation start site)
• With help of tRNA, each codon is translated to an amino acid
• Translation stops once ribosome reads a stop codon (the translation stop site)
![Page 22: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/22.jpg)
Introns and exons
• Eukaryotic genes contain introns & exons– Introns are seq that
are ultimately spliced out of mRNA
– Introns normally satisfy GT-AG rule, viz. begin w/ GT & end w/ AG
– Each gene can have many introns & each intron can have thousands bases
• Introns can be very long
• An extreme example is a gene associated with cystic fibrosis in human:– Length of 24 introns
~1Mb– Length of exons ~1kb
![Page 23: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/23.jpg)
• Unlike eukaryotic genes, a prokaryotic gene typically consists of only one contiguous coding region
Typical Eukaryotic Gene Structure
Copyright © 2004 by Limsoon Wong
Image credit: Xu
![Page 24: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/24.jpg)
Reading frame #1
ATGGCTTACGCTTGC
Reading frame #2
TGGCTTACGCTTGA.
Reading frame #3
GGCTTACGCTTGA..
ATGGCTTACGCTTGAForward strand:
Reading frame #4
TCAAGCGTAAGCCAT
Reading frame #5
CAAGCGTAAGCCAT.
Reading frame #6
AAGCGTAAGCCAT..
TCAAGCGTAAGCCATReverse strand:
Reading Frame
• Each DNA segment has six possible reading frames
Copyright © 2004 by Limsoon Wong
![Page 25: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/25.jpg)
stop stop
ORF
Open Reading Frame (ORF)
• ORF is a segment of DNA with two in-frame stop codons at the two ends and no in-frame stop codon in the middle
• Each ORF has a fixed reading frame
![Page 26: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/26.jpg)
Coding Region
• Each coding region (exon or whole gene) has a fixed translation frame
• A coding region always sits inside an ORF of same reading frame
• All exons of a gene are on the same strand
• Neighboring exons of a gene could have different reading frames
![Page 27: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/27.jpg)
exon1[i, j] in frame A and exon2[m, n] in frame B are consistent if
B = (m - j - 1 + A) mod 3
ATG GCT TGG GCT TTA A -------------- GT TTC CCG GAG AT ------ T GGG
exon 1 exon 3exon 2
Frame Consistency
• Neighbouring exons of a gene should be frame-consistent
![Page 28: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/28.jpg)
What is Gene Finding?
• Find all coding regions from a stretch of DNA sequence, and construct gene structures from the identified exons
• Can be decomposed into– Find coding potential
of a region in a frame– Find boundaries
between coding & non-coding regions
![Page 29: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/29.jpg)
Genbank or nr
candidate gene
BLAST search
sequence alignments with known genes,
alignment p-values
Image credit: Xu
Copyright © 2004 by Limsoon Wong
Search-by-Homology Example: Gene Finding Using
BLAST• High seq similarity typically implies
homologous genes Search for genes in yeast seq using
BLAST Extract Feature for gene identification
![Page 30: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/30.jpg)
BLAST hits
sequence
• Searching all ORFs against known genes in nr db helps identify an initial set of (possibly incomplete) genes
Image credit: Xu
![Page 31: Copyright © 2004 by Limsoon Wong A Biology Review](https://reader036.vdocument.in/reader036/viewer/2022062422/56649e915503460f94b9740c/html5/thumbnails/31.jpg)
• A (yeast) gene starts w/ ATG and ends w/ a stop codon, in same reading frame of ORF
• Have “strong” coding potentials, measured by, preference models, Markov chain model, ...
• Have “strong” translation start signal, measured by weight matrix model, ...
• Have distributions wrt length, G+C composition, ...
• Have special seq signals in flanking regions, ...
known genes
0
%known non-
genes
coding potential
gene length distribution