genome sequencing - ahmadrezarafati 1395-01-30
Genome Sequencing
A Seminar in MIS by Ahmadreza Rafati Roudsari
1395/01/30
DNA (Deoxyribonucleic acid)
RNA (Ribonucleic acid)
A brief History…
Marshall Nirenberg is best known for “breaking the genetic code” in 1961, an achievement that won him the Nobel Prize.
Marshall Nirenberg, c. 1968.
Gregor Mendel is usually considered to be the founder of modern genetics (based on his experiments of 1856 to 1863).
Gregor Mendel.
The completed chart of the genetic code
To read the code, select a letter from the left, top, and right columns, such as U-C-A. This combination represents an mRNA codon. Draw imaginary horizontal and vertical lines to connect the letters. They intersect at the amino acid for which they code. For example, UCA is the code for serine.
Scientific Instruments
Spectrophotometer
Electrophoresis instrument
French press
Centrifuge
Multi-plater
Glossary
Amino acid
Essential: Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Threonine, Tryptophan, Valine
Nonessential: Alanine, Arginine*, Asparagine, Aspartic acid, Cysteine*, Glutamic acid, Glutamine*, Glycine, Proline*, Selenocysteine*, Serine*, Tyrosine*
(* essential only in certain cases)
Base
A nucleotide base (guanine, adenine, cytosine, or thymine) is one of the building blocks of DNA, along with phosphates and sugar.
Glossary
Codon
A codon is a series of three bases (a triplet) in mRNA that is read as one unit during protein synthesis. Each codon carries the code for a specific amino acid or for a stop signal.
“Central Dogma”
Francis Crick's “central dogma” of molecular biology, put simply, is: DNA makes RNA makes protein.
DNA and RNA
DNA, deoxyribonucleic acid, and RNA, ribonucleic acid, are molecules that hold the genetic information of each cell.
Escherichia coli
Escherichia coli (/ˌɛʃᵻˈrɪkiə ˈkoʊlaɪ/; also known as E. coli) is a Gram-negative, facultatively anaerobic, rod-shaped bacterium of the genus Escherichia that is commonly found in the lower intestine of warm-blooded organisms (endotherms).
Genome
In modern molecular biology and genetics, the genome is the genetic material of an organism. It consists of DNA (or RNA in RNA viruses). The genome includes both the genes and the non-protein-coding information of the DNA/RNA.
Protein
Proteins (/ˈproʊˌtiːnz/ or /ˈproʊti.ᵻnz/) are large biomolecules, or macromolecules, consisting of one or more long chains of amino acid residues.
Ribosome
The ribosome (/ˈraɪbəˌsoʊm, -boʊ-/) is a complex molecular machine, found within all living cells, that serves as the site of biological protein synthesis (translation). Ribosomes link amino acids together in the order specified by messenger RNA (mRNA) molecules.
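Standard genetic code
(1st base in the left column, 2nd base across the top, 3rd base in the right column)

1st | 2nd base: U               | 2nd base: C           | 2nd base: A               | 2nd base: G            | 3rd
U   | UUU (Phe/F) Phenylalanine | UCU (Ser/S) Serine    | UAU (Tyr/Y) Tyrosine      | UGU (Cys/C) Cysteine   | U
U   | UUC (Phe/F) Phenylalanine | UCC (Ser/S) Serine    | UAC (Tyr/Y) Tyrosine      | UGC (Cys/C) Cysteine   | C
U   | UUA (Leu/L) Leucine       | UCA (Ser/S) Serine    | UAA Stop (Ochre)          | UGA Stop (Opal)        | A
U   | UUG (Leu/L) Leucine       | UCG (Ser/S) Serine    | UAG Stop (Amber)          | UGG (Trp/W) Tryptophan | G
C   | CUU (Leu/L) Leucine       | CCU (Pro/P) Proline   | CAU (His/H) Histidine     | CGU (Arg/R) Arginine   | U
C   | CUC (Leu/L) Leucine       | CCC (Pro/P) Proline   | CAC (His/H) Histidine     | CGC (Arg/R) Arginine   | C
C   | CUA (Leu/L) Leucine       | CCA (Pro/P) Proline   | CAA (Gln/Q) Glutamine     | CGA (Arg/R) Arginine   | A
C   | CUG (Leu/L) Leucine       | CCG (Pro/P) Proline   | CAG (Gln/Q) Glutamine     | CGG (Arg/R) Arginine   | G
A   | AUU (Ile/I) Isoleucine    | ACU (Thr/T) Threonine | AAU (Asn/N) Asparagine    | AGU (Ser/S) Serine     | U
A   | AUC (Ile/I) Isoleucine    | ACC (Thr/T) Threonine | AAC (Asn/N) Asparagine    | AGC (Ser/S) Serine     | C
A   | AUA (Ile/I) Isoleucine    | ACA (Thr/T) Threonine | AAA (Lys/K) Lysine        | AGA (Arg/R) Arginine   | A
A   | AUG (Met/M) Methionine    | ACG (Thr/T) Threonine | AAG (Lys/K) Lysine        | AGG (Arg/R) Arginine   | G
G   | GUU (Val/V) Valine        | GCU (Ala/A) Alanine   | GAU (Asp/D) Aspartic acid | GGU (Gly/G) Glycine    | U
G   | GUC (Val/V) Valine        | GCC (Ala/A) Alanine   | GAC (Asp/D) Aspartic acid | GGC (Gly/G) Glycine    | C
G   | GUA (Val/V) Valine        | GCA (Ala/A) Alanine   | GAA (Glu/E) Glutamic acid | GGA (Gly/G) Glycine    | A
G   | GUG (Val/V) Valine        | GCG (Ala/A) Alanine   | GAG (Glu/E) Glutamic acid | GGG (Gly/G) Glycine    | G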
In RNA, thymine (T) is replaced by uracil (U),
and the deoxyribose is substituted by ribose.
Ribonucleic acid (RNA) is a polymeric molecule implicated in various biological
roles in coding, decoding, regulation, and expression of genes.
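The T-to-U substitution mentioned above can be illustrated with a minimal Python sketch. The function name `dna_to_rna` is hypothetical, and the sketch assumes the input is the coding (sense) strand written 5' to 3', so the RNA spelling differs from the DNA only in T versus U. It does not model the enzymatic process of transcription.

```python
def dna_to_rna(dna: str) -> str:
    """Return the RNA spelling of a DNA coding-strand sequence (T -> U).

    Assumption: `dna` is the coding (sense) strand, so no complementing
    or reversing is needed; only the base letter changes.
    """
    seq = dna.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(invalid)}")
    return seq.replace("T", "U")


if __name__ == "__main__":
    print(dna_to_rna("ATGTCATAA"))  # AUGUCAUAA
```

The sugar change (deoxyribose to ribose) has no counterpart in the letter sequence, which is why the sketch only touches the bases.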
Messenger RNA (mRNA) is a large family of RNA molecules that convey genetic information from DNA to
the ribosome, where they specify the amino acid sequence of the protein products of gene expression.
The genetic code has 7 main characteristics:
1. It is made up of codons, which are triplets of bases. Each codon specifies one amino acid.
2. The codons do not overlap; that is, the sequence GCCCAC contains two triplets, “GCC” and “CAC”, not counting “CCC” and the other three-letter sequences that start inside a codon.
3. The code includes punctuation in the form of three “stop” codons that do not code for an amino acid: UAA, UAG, and UGA.
4. The genetic code is known as a “degenerate” code. This means that each amino acid is specified by between one and six codons. (There are only 20 amino acids and 64 possible codon triplets.)
5. To read each gene and glean the necessary information to form proteins, cells begin at a fixed and particular starting point on the mRNA strand. The initiation codon is AUG (methionine).
6. The mRNA strand is read from the 5' to the 3' end.
7. If there are mutations or errors in the DNA, the message may be changed and an incorrect protein may be formed. (1)
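The characteristics listed above can be sketched in a few lines of Python. This is a simplified illustration, not a model of real translation: the names `CODON_TABLE` and `translate` are hypothetical, the input is assumed to be an mRNA string written 5' to 3' containing only U, C, A, and G, and reading simply starts at the first AUG found.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes in standard-table order (1st base changes
# slowest, 3rd base fastest); "*" marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string (5' -> 3') into one-letter amino acid codes."""
    seq = mrna.upper()
    start = seq.find("AUG")            # characteristic 5: fixed start codon
    if start == -1:
        return ""
    protein = []
    # Characteristics 1 and 2: read non-overlapping triplets.
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":          # characteristic 3: stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGUCAGCCCACUAAGG"))  # MSAH
    # Characteristic 4 (degeneracy): codons per amino acid, stops excluded.
    per_amino_acid = Counter(aa for aa in AMINO_ACIDS if aa != "*")
    print(min(per_amino_acid.values()), max(per_amino_acid.values()))  # 1 6
```

In the example, reading starts at AUG (M), continues through UCA (S), GCC (A), and CAC (H), and ends at the stop codon UAA.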
People with public genome sequences
The first nearly complete human genomes sequenced included that of J. Craig Venter (an American biotechnologist, biochemist, geneticist, and entrepreneur), at 7.5-fold average coverage, in 2007.
Other people with publicly sequenced genomes include:
• James Watson
• a Han Chinese individual
• a Yoruban from Nigeria
• a female leukemia patient
• Seong-Jin Kim
• Steve Jobs, for the cost of $100,000
Sequence Analysis
• In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution.
• In chemistry, sequence analysis comprises techniques used to determine the sequence of a polymer formed of several monomers. In molecular biology and genetics, the same process is called simply "sequencing".
• In marketing, sequence analysis is often used in analytical customer relationship management applications, such as NPTB models (Next Product to Buy).
• In sociology, sequence methods are increasingly used to study life-course and career trajectories, patterns of organizational and national development, conversation and interaction structure, and the problem of work/family synchrony. (2)
GenBank
• The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations.
ftp://ftp.ncbi.nih.gov/ (3)
DNA Patterns
DNA patterns are graphs of DNA or RNA sequences. Various functional structures such as promoters and genes, or larger structures like bacterial or viral genomes, can be analyzed using DNA patterns.
Method
The technique was described in 2012 by Paul Gagniuc and Constantin Ionescu-Tirgoviste. They adapted algorithms from cryptography and optical character recognition to make their graphs.
To graph a DNA pattern, two values, the kappa index of coincidence and the total percentage of cytosine plus guanine (C + G)%, are calculated from a sliding window which is "circulated" over the DNA sequence.
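The sliding-window idea can be sketched as follows. This is an illustration under stated assumptions, not the published method: the function names are hypothetical, the window slides linearly without wrapping around the sequence, and the second value is a standard frequency-based index of coincidence with a 4-letter alphabet, used here as a stand-in for the kappa index of coincidence used by the original authors.

```python
from collections import Counter


def cg_percent(window: str) -> float:
    """Percentage of C plus G bases in the window."""
    return 100.0 * (window.count("C") + window.count("G")) / len(window)


def index_of_coincidence(window: str, c: int = 4) -> float:
    """Frequency-based IC, normalized by alphabet size c (stand-in, see text)."""
    n = len(window)
    if n < 2:
        return 0.0
    counts = Counter(window)
    return c * sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def dna_pattern(sequence: str, window: int = 30, step: int = 1):
    """Return one (C+G %, IC) point per window position along the sequence."""
    seq = sequence.upper()
    points = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        points.append((cg_percent(w), index_of_coincidence(w)))
    return points


if __name__ == "__main__":
    for point in dna_pattern("AAAACCCC", window=4, step=4):
        print(point)
```

Each returned pair is one point of the graph; plotting the pairs for every window position gives a pattern for the whole sequence. The window size and step are free parameters chosen here only for the example.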
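Index of Coincidence (IC)
In cryptography, coincidence counting is the technique (invented by William F. Friedman) of putting two texts side-by-side and counting the number of times that identical letters appear in the same position in both texts. This count, either as a ratio of the total or normalized by dividing by the expected count for a random source model, is known as the index of coincidence, or IC for short.

$$\mathrm{IC} = c \times \left( \frac{n_a}{N} \times \frac{n_a - 1}{N - 1} + \frac{n_b}{N} \times \frac{n_b - 1}{N - 1} + \cdots + \frac{n_z}{N} \times \frac{n_z - 1}{N - 1} \right)$$

where c is the normalizing coefficient (26 for English), n_a is the number of times the letter "a" appears in the text, and N is the length of the text. The same quantity can be written as a sum:

$$\mathrm{IC} = \frac{\sum_{i=1}^{c} n_i (n_i - 1)}{N (N - 1) / c}$$

where N is the length of the text and n_1 through n_c are the frequencies (as integers) of the c letters of the alphabet (c = 26 for monocase English). The sum of the n_i is necessarily N.
The products n(n−1) count the number of combinations of n elements taken two at a time. Each pair is counted twice, but the factor of 2 appears in both the numerator and the denominator and cancels.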
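As a worked example of the definitions above, here is a minimal Python sketch of the normalized index of coincidence, IC = c · Σ n_i(n_i − 1) / (N(N − 1)). The function name and the choice to ignore every character outside A to Z are assumptions made for the illustration.

```python
from collections import Counter


def index_of_coincidence(text: str, c: int = 26) -> float:
    """Normalized IC of a text; c is the alphabet size (26 for English)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)                      # N in the definition
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)             # n_i for each letter that occurs
    return c * sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    # "AABB": N = 4, n_A = n_B = 2, so the sum is 2*1 + 2*1 = 4 and
    # IC = c * 4 / (4 * 3) = c / 3.
    print(index_of_coincidence("AABB", c=1))
    print(index_of_coincidence("ABCD"))   # no repeated letters -> 0.0
```

With this normalization, text drawn uniformly at random from the alphabet gives values near 1, while repetitive text gives larger values.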
Gene Promoter
In genetics, a promoter is a region of DNA that initiates transcription of a particular gene. Promoters are located near the transcription start sites of genes, on the same strand and upstream on the DNA (towards the 5' region of the sense strand). Promoters can be about 100–1000 base pairs long.
http://www.ncbi.nlm.nih.gov/
• The National Center for Biotechnology Information advances scienceand health by providing access to biomedical and genomic information.
Bibliography
• (1) https://history.nih.gov/exhibits/nirenberg/HS5_cracked.htm
• (2) https://en.wikipedia.org/wiki/Sequence_analysis
• (3) https://en.wikipedia.org/wiki/GenBank
• (4) https://en.wikipedia.org/wiki/DNA_Patterns
• (5) https://en.wikipedia.org/wiki/Transfer_RNA
• (6) https://en.wikipedia.org/wiki/Genetic_code
• (7) https://en.wikipedia.org/wiki/Index_of_coincidence
• (8) PromKappa V3.0 Java (uses the DNA pattern method)
• Thanks to https://xbioinformatics.wordpress.com/tag/promkappa/ for software.
Seminar in Management of Information Systems