26/1/2008 1
General Introduction to the Genome
26/1/2008 2
Outline
• Molecular Biology Major Events
• DNA, RNA
• Protein Synthesis (Transcription & Translation)
• Genome Anatomy
• Bioinformatics
• Genomic Signal Processing
26/1/2008 4
Molecular Biology Major Events
1865 Mendel: Inheritance is controlled by unit factors
1869 Johann Friedrich Miescher: DNA discovery
1881: Chromosomes are composed of DNA
26/1/2008 5
Molecular Biology Major Events
1911 Thomas Hunt Morgan: Genes on chromosomes are the discrete units of heredity
1941 George Beadle and Edward Tatum: Genes make proteins
26/1/2008 6
The Central Dogma
[Figure: library analogy in three numbered steps, with labels Nucleus, Book shelves, Book, Target]
What is Life made of?
Eukaryotes vs Prokaryotes
Prokaryotes | Eukaryotes
Single cell | Single or multi cell
No nucleus | Nucleus
No organelles | Organelles
One piece of circular DNA | Chromosomes
No mRNA post-transcriptional modification | Exons/introns splicing
26/1/2008 10
The Cell: Chemical Composition
– 70% Water
– 7% Small molecules
  • Salts
  • Amino acids (Protein)
  • Nucleotides (DNA, RNA)
– 23% Macromolecules
  • Proteins
  • Polysaccharides
  • Lipids
26/1/2008 11
The Cell: The 3 Critical Molecules
• DNA: holds genetic information
• RNA (mRNA, tRNA, rRNA): transfers information, synthesizes protein
• Protein: forms enzymes, forms the body's components
26/1/2008 13
DNA: the Nucleotide
A nucleotide consists of three parts:
• Phosphate
• Sugar
• Nitrogenous base (for example, A)
26/1/2008 14
DNA: Nitrogenous bases
Purines: A, G
Pyrimidines: T, C
26/1/2008 15
DNA: Polymerization reaction
Nucleotides (A, T, G, C) join into a strand with a 5′ phosphate (P) end and a 3′ hydroxyl (OH) end:
5′-ATGCATGC-3′
26/1/2008 16
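DNA: hydrogen bonds
[Figure: two complementary strands held together by hydrogen bonds, A paired with T and G paired with C]
Number of base pairs = genome size
Human genome (HG) = 3200 Mbp (Mb)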
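The pairing rule behind this slide (A with T, G with C) can be sketched in a few lines of Python. The function names `complement_strand` and `reverse_complement` are illustrative and do not come from the slides.

```python
# Watson-Crick pairing: A <-> T, G <-> C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand: str) -> str:
    """Return the paired bases, in the same left-to-right order as the input."""
    return "".join(PAIRS[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the opposite strand written in its own 5' to 3' direction."""
    return complement_strand(strand)[::-1]

top = "ACGTACGT"
print(complement_strand(top))   # TGCATGCA
print(reverse_complement(top))  # ACGTACGT
```

Each base on one strand fixes its partner on the other, so counting base pairs is the same as counting the bases of one strand. That is why genome size is quoted in base pairs.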
26/1/2008 17
DNA: Watson-Crick Model (1953)
[Figure: double helix with paired bases on the inside and the sugar-phosphate backbone on the outside]
26/1/2008 19
RNA versus DNA
Component | DNA | RNA
Phosphate | Phosphate | Phosphate
Sugar | Deoxyribose | Ribose
Nitrogenous bases | G, A, C, T | G, A, C, U
26/1/2008 20
Protein structure
• 1902 - Emil Hermann Fischer wins Nobel prize: showed amino acids are linked and form proteins
[Figure: chain of linked amino acids shown as one-letter codes]
26/1/2008 21
Amino acid: Basic unit of protein
[Figure: an amino acid, with a central carbon (C) bonded to an amino group (NH3+), a carboxylic acid group (COO-), a hydrogen (H), and a side chain (R)]
Different side chains, R, determine the properties of the 20 amino acids.
26/1/2008 23
Protein structure
• Primary structure
• Secondary structure
• Super-secondary structure
• Tertiary structure
• Quaternary structure
26/1/2008 24
Protein Structure: Prediction Problem
Protein sequence (e.g. A F N G S T) -> Protein 3D structure -> Protein function
26/1/2008 25
The Central Dogma: Genes are the blueprints for proteins
[Figure: the genome (DNA) contains many genes, and each gene specifies a protein]
26/1/2008 27
Protein Synthesis: DNA, RNA, and the Flow of Information
• Replication: DNA -> DNA
• Transcription: DNA -> RNA
• Translation: RNA -> Protein
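As a rough illustration of the transcription step, the sketch below treats it as a string operation. It assumes the input is the coding (sense) strand, so the mRNA has the same sequence with U in place of T. The function name `transcribe` is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a coding (sense) DNA strand: same sequence, T replaced by U."""
    return coding_strand.upper().replace("T", "U")

print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```

In the cell the RNA is built against the opposite (template) strand. Working from the coding strand is only a shortcut that gives the same letters.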
26/1/2008 28
Protein Synthesis: Gene Expression
26/1/2008 29
Gene expression: Splicing
[Figure: a gene is transcribed into pre-mRNA with segments 1, 2, 3; splicing produces the mRNA, which is then translated]
26/1/2008 30
Gene expression: Alternative Splicing
[Figure: the same pre-mRNA (segments 1, 2, 3) is spliced into a different arrangement of segments before translation]
26/1/2008 31
Gene expression: mRNA Editing
[Figure: the mRNA (segments 1, 2, 3) is edited after transcription and before translation]
26/1/2008 33
Gene expression: Translation
[Figure: the mRNA is read codon by codon from the start codon to the stop codon, producing a chain of amino acids]
26/1/2008 34
Protein Synthesis: The Genetic Code
[Figure: genetic code table, with the start and stop codons marked]
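To make the genetic code concrete, here is a minimal Python sketch that builds the standard codon table and translates a short mRNA from the first start codon to the first in-frame stop codon. The example sequence and the function name `translate` are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCCAAGUGCGUUUAGCC"))  # MAKCV
```

The example reads AUG GCC AAG UGC GUU and stops at UAG. The bases before the start codon and after the stop codon are ignored.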
26/1/2008 35
Gene Regulation
[Figure: a regulatory protein (R) acts on Gene 1 and controls the expression of its segments 1, 2, 3]
26/1/2008 36
Gene Regulation
[Figure: regulatory proteins acting on Gene 1 and Gene 2]
We have only a little knowledge about regulatory mechanisms.
26/1/2008 37
How big is a genome?
• Genome size = total number of nucleotide base pairs, typically given in millions of base pairs, or megabases (abbreviated Mb or Mbp).
• At 12-point font size, approximately 60 nucleotides of DNA sequence can be written in a line 10 cm in length.
26/1/2008 39
• Written out at 60 nucleotides per 10 cm line, the human genome sequence would stretch for 5000 km, the distance from Montreal to London, Los Angeles to Panama, Tokyo to Calcutta, Cape Town to Addis Ababa, or Auckland to Perth.
• The sequence would fill about 3000 books of 600 pages each.
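The 5000 km figure can be checked with the numbers given in the slides (3200 Mb, 60 nucleotides per 10 cm line). The nucleotides-per-page value at the end is not stated in the slides. It is only what the "3000 books of 600 pages" figure implies.

```python
GENOME_SIZE_BP = 3200 * 10**6   # human genome size, 3200 Mb (from the slides)
NT_PER_LINE = 60                # nucleotides per printed line (from the slides)
LINE_LENGTH_CM = 10             # length of one printed line (from the slides)

lines = GENOME_SIZE_BP / NT_PER_LINE
length_km = lines * LINE_LENGTH_CM / 100 / 1000   # cm -> m -> km
print(round(length_km))    # 5333, i.e. roughly 5000 km

BOOKS = 3000
PAGES_PER_BOOK = 600
nt_per_page = GENOME_SIZE_BP / (BOOKS * PAGES_PER_BOOK)   # implied, not stated in the slides
print(round(nt_per_page))  # 1778
```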
26/1/2008 40
Genome sizes differ between organisms
26/1/2008 41
Genome size is not a good indicator of gene number
26/1/2008 42
• Space is saved in the genomes of less complex organisms because the genes are more closely packed together.
26/1/2008 43
C-value paradox
• The lack of a precise correlation between the complexity of an organism and the size of its genome was long looked on as a bit of a puzzle.
26/1/2008 44
Genome Anatomy
[Figure: a stretch of genome showing the positions of Gene 1 to Gene 6]
26/1/2008 45
Human Genome Anatomy
Human genome = nuclear genome + mitochondrial genome
26/1/2008 46
Human Mitochondrial Genome Anatomy
• It is much smaller than the nuclear genome (~17 kb), and it contains just 37 genes.
• 13 genes code for proteins and 24 specify non-coding RNA.
• The genes do not contain introns.
• It is typical of the mitochondrial genomes of other animals.
26/1/2008 48
Nuclear Human Genome Anatomy
[Figure: composition of the nuclear genome; one segment is labeled 62%]
26/1/2008 49
Nuclear Human Genome Anatomy: Protein Coding Genes
26/1/2008 50
Nuclear Human Genome Anatomy: Protein Coding Genes
[Figure: a gene with five exons, separated by four introns]
Average: nine exons per gene
26/1/2008 51
Two gene segments (V28 and V29-1)
26/1/2008 52
Nuclear Human Genome Anatomy: pseudogene
Pseudogenes are non-functional genes.
26/1/2008 53
Nuclear Human Genome Anatomy: genome-wide repeat
26/1/2008 54
Nuclear Human Genome Anatomy: genome-wide repeat
• Tandemly repeated DNA
  • Minisatellite DNA
  • Microsatellite DNA
• Interspersed genome-wide repeats
  • SINEs
  • LINEs
  • LTR elements
  • DNA transposons
26/1/2008 55
Nuclear Human Genome Anatomy: genome-wide repeat Minisatellite DNA
• Minisatellite DNA is a type of repeat that we are familiar with because of its association with structural features of chromosomes.
• An example is telomeric DNA, which in humans comprises hundreds of copies of the motif 5′-TTAGGG-3′.
5′-TTAGGGTTAGGGTTAGGG...-3′
3′-AATCCCAATCCCAATCCC...-5′
26/1/2008 56
The content of the human nuclear genome: genome-wide repeat Microsatellite DNA
• Microsatellites with a CA repeat, such as 5′-CACACACACA-3′, make up 0.25% of the genome, 8 Mb in all.
• Single base-pair repeats, such as 5′-AAAAAAAAAA-3′, make up another 0.15%.
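A tandem repeat is simply a motif copied back to back, so a short regular expression is enough to find one. The sketch below is illustrative only: the function name `longest_tandem_run` and the test sequences are made up, and real repeat finders also tolerate mismatches. The last line checks the 0.25% figure against the 3200 Mb genome size given earlier in the slides.

```python
import re

def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif found in sequence."""
    pattern = f"(?:{re.escape(motif.upper())})+"
    runs = re.findall(pattern, sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)

telomere_like = "ACGT" + "TTAGGG" * 5 + "ACCA"   # toy sequence, not real data
print(longest_tandem_run(telomere_like, "TTAGGG"))   # 5
print(longest_tandem_run("GGCACACACACATT", "CA"))     # 5

ca_repeat_mb = 3200 * 0.25 / 100   # 0.25% of a 3200 Mb genome
print(ca_repeat_mb)                # 8.0
```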
26/1/2008 57
Nuclear Human Genome Anatomy: genome-wide repeat Interspersed repeat
26/1/2008 58
Gene Classification: Gene function
• This system has the advantage that its fairly broad functional categories can be further subdivided to produce a hierarchy of increasingly specific functional descriptions for smaller and smaller sets of genes.
• The weakness: functions have not yet been assigned to many eukaryotic genes.
26/1/2008 59
Gene Classification: Gene function
• The gene catalog cannot tell us what makes us human.
• Even comparing our genome with the chimpanzee genome may not be enough to determine what makes us human.
26/1/2008 60
Gene Classification: Gene function
• The major categories of protein coding genes represent the most studied areas of cell biology, which means that many of the relevant genes can be recognized because their protein products are known.
• Genes whose products have not yet been identified are more likely to be involved in the less well studied areas of cellular activity.
26/1/2008 61
Gene classification: Protein Domain
• A more powerful method is to base the classification not on the functions of genes but on the structures of the proteins that they specify.
• A protein molecule is constructed from a series of domains, each of which has a particular biochemical function.
26/1/2008 62
Gene classification: Protein Domain
26/1/2008 64
What is Bioinformatics?
• Integration of computational and biological methods to convert biological information into general theories.
[Figure: a long raw DNA sequence (aatgcatgcggctatgctaatgcatgcgg...) as an example of biological information]
26/1/2008 65
Bioinformatics
• Computer Science: data structures, software engineering (C, C++, Perl)
• Biology: cell structure, genome, genes
• Chemistry: DNA, RNA, protein structure, molecular bonds
• Statistics: Markov models, neural networks
26/1/2008 66
Bioinformatics Subareas
• The subareas within bioinformatics include genomics and proteomics.
• Genomics: genome comparison, evolutionary trees, microarray analysis, gene prediction, gene classification, gene regulation
• Proteomics: protein 3D structure prediction, protein-protein interaction, protein alignment
26/1/2008 68
What is GSP?
Genomic Signal Processing (GSP) is the analysis and processing of genomic sequences (such as aatgcatgcggctatgctaatgcatgcgg...) using the theory and methods of signal processing, to gain a global understanding of the genome.
26/1/2008 69
GSP Labs
• The Genomic Signal Processing Laboratory at Texas A&M University.
• The Computational Biology Division of the Translational Genomics Research Institute in Phoenix, Arizona.
Edward R. Dougherty: to model genomic regulatory mechanisms for the purposes of diagnosis and therapy.
26/1/2008 70
GSP Labs
• Genomic Information Systems Laboratory at Columbia University
Dimitris Anastassiou
26/1/2008 71
GSP Labs
• DSP Group, Department of Electrical Engineering, California Institute of Technology
P. P. Vaidyanathan
26/1/2008 72
Mapping Character String to Numerical Sequences
AAAATTTTCCCGGGTAGCTTTCCCGGGT
0001110101010101111111111000
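The slide does not spell out which mapping rule produces its binary string. One mapping that is common in the GSP literature, shown here only as an assumed example, is a set of four binary indicator sequences, one per base. The function name `indicator_sequences` is hypothetical.

```python
def indicator_sequences(dna: str) -> dict:
    """Map a DNA string to four binary sequences, one per base.

    The sequence for base b holds 1 at every position where the
    character is b, and 0 everywhere else.
    """
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}

seqs = indicator_sequences("AAAATTTTCCCGGGTAGCTTTCCCGGGT")
for base, x in seqs.items():
    print(base, "".join(map(str, x)))
```

Every position holds exactly one base, so the four sequences sum to 1 at each position. Once the string is numeric, standard tools such as the Fourier transform can be applied to it.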
26/1/2008 73
Research Areas of GSP
• Gene prediction
  – Hidden Markov Models (HMM)
  – Fourier Transform
  – Wavelet Transform
• Resonant Recognition Model (RRM): identifies the common hot spots of many protein molecules using Fourier transform methods.
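The slides do not say how the Fourier transform is used for gene prediction. A common idea in the GSP literature, sketched here as an assumption, is the period-3 property: protein-coding regions tend to show a peak at frequency N/3 in the spectrum of the base indicator sequences. The function name `period3_power` and the toy sequences are illustrative, and the sequence length is assumed to be a multiple of 3.

```python
import cmath

def period3_power(dna: str) -> float:
    """Sum over A, C, G, T of the squared DFT magnitude at frequency N/3."""
    dna = dna.upper()
    total = 0.0
    for base in "ACGT":
        # DFT of the base's indicator sequence, evaluated at k = N/3,
        # so the kernel exp(-2*pi*i*k*n/N) reduces to exp(-2*pi*i*n/3).
        coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                    for n, ch in enumerate(dna) if ch == base)
        total += abs(coeff) ** 2
    return total

periodic = "ATG" * 20                    # toy sequence with a perfect period of 3
mixed = "ATGCCGTAAGTCAGGTTCAA" * 3       # toy sequence without that structure
print(period3_power(periodic) > period3_power(mixed))  # True
```

In practice this measure is computed in a sliding window along the genome, and windows with a strong period-3 component are flagged as candidate coding regions.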
26/1/2008 76
THANK YOU FOR YOUR ATTENTION