genome sequencing - ahmadrezarafati 1395-01-30


Genome Sequencing

A Seminar in MIS by Ahmadreza Rafati Roudsari

1395/01/30

DNA (Deoxyribonucleic acid)

RNA (Ribonucleic acid)


A brief History…

Marshall Nirenberg is best known for “breaking the genetic code”: his 1961 experiments began deciphering it, work that earned him a share of the 1968 Nobel Prize in Physiology or Medicine.

Marshall Nirenberg, c. 1968.

Gregor Mendel is usually considered the founder of modern genetics, based on his experiments of 1856-1869.

Gregor Mendel.

The completed chart of the genetic code

To read the code, select the first letter of a codon from the left column, the second from the top row, and the third from the right column, for example U-C-A. This combination represents an mRNA codon. Draw imaginary horizontal and vertical lines to connect the letters; they intersect at the amino acid for which the codon codes. For example, UCA is the code for serine.
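Reading the chart this way amounts to indexing a 4 × 4 × 4 grid: the first base picks a block of rows, the second base picks the column, and the third base picks the row inside the block. A minimal Python sketch, assuming the standard genetic code and the U, C, A, G ordering used on the chart; the names `BASES`, `AMINO_ACIDS` and `read_chart` are illustrative, not taken from any library.

```python
# Bases in the order they appear along each edge of the chart.
BASES = "UCAG"

# One-letter amino-acid codes for all 64 codons in chart order:
# first base varies slowest, third base fastest. '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)


def read_chart(codon: str) -> str:
    """Return the one-letter amino acid (or '*' for stop) for an mRNA codon."""
    if len(codon) != 3:
        raise ValueError("a codon is exactly three bases")
    # BASES.index raises ValueError for anything other than U, C, A, G.
    first, second, third = (BASES.index(b) for b in codon.upper())
    # First base selects a block of 16, second a group of 4, third the entry.
    return AMINO_ACIDS[16 * first + 4 * second + third]


print(read_chart("UCA"))  # S (serine), matching the chart example
```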


Scientific Instruments

Spectrophotometer

Electrophoresis instrument

French press

Centrifuge

Multi-plater


Glossary

Codon: A codon is a sequence of three bases (a triplet) in mRNA that, during protein synthesis, specifies one amino acid or a stop signal. Of the 64 possible codons, 61 code for specific amino acids and 3 are stop codons.

“Central Dogma”: Francis Crick's “central dogma” of molecular biology, put simply, is: DNA makes RNA makes protein.

DNA and RNA: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) are molecules that hold the genetic information of each cell.

Escherichia coli: Escherichia coli (/ˌɛʃᵻˈrɪkiə ˈkoʊlaɪ/; also known as E. coli) is a Gram-negative, facultatively anaerobic, rod-shaped bacterium of the genus Escherichia that is commonly found in the lower intestine of warm-blooded organisms (endotherms).

Genome: In modern molecular biology and genetics, the genome is the genetic material of an organism. It consists of DNA (or RNA in RNA viruses). The genome includes both the genes and the non-protein-coding information of the DNA/RNA.

Protein: Proteins (/ˈproʊˌtiːnz/ or /ˈproʊti.ᵻnz/) are large biomolecules, or macromolecules, consisting of one or more long chains of amino acid residues.

Ribosome: The ribosome (/ˈraɪbəˌsoʊm, -boʊ-/) is a complex molecular machine, found within all living cells, that serves as the site of biological protein synthesis (translation). Ribosomes link amino acids together in the order specified by messenger RNA (mRNA) molecules.


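Standard genetic code (rows: 1st and 3rd base; columns: 2nd base)

1st | 2nd = U      | 2nd = C     | 2nd = A          | 2nd = G         | 3rd
----+--------------+-------------+------------------+-----------------+----
U   | UUU Phe (F)  | UCU Ser (S) | UAU Tyr (Y)      | UGU Cys (C)     | U
    | UUC Phe (F)  | UCC Ser (S) | UAC Tyr (Y)      | UGC Cys (C)     | C
    | UUA Leu (L)  | UCA Ser (S) | UAA Stop (Ochre) | UGA Stop (Opal) | A
    | UUG Leu (L)  | UCG Ser (S) | UAG Stop (Amber) | UGG Trp (W)     | G
C   | CUU Leu (L)  | CCU Pro (P) | CAU His (H)      | CGU Arg (R)     | U
    | CUC Leu (L)  | CCC Pro (P) | CAC His (H)      | CGC Arg (R)     | C
    | CUA Leu (L)  | CCA Pro (P) | CAA Gln (Q)      | CGA Arg (R)     | A
    | CUG Leu (L)  | CCG Pro (P) | CAG Gln (Q)      | CGG Arg (R)     | G
A   | AUU Ile (I)  | ACU Thr (T) | AAU Asn (N)      | AGU Ser (S)     | U
    | AUC Ile (I)  | ACC Thr (T) | AAC Asn (N)      | AGC Ser (S)     | C
    | AUA Ile (I)  | ACA Thr (T) | AAA Lys (K)      | AGA Arg (R)     | A
    | AUG Met (M)* | ACG Thr (T) | AAG Lys (K)      | AGG Arg (R)     | G
G   | GUU Val (V)  | GCU Ala (A) | GAU Asp (D)      | GGU Gly (G)     | U
    | GUC Val (V)  | GCC Ala (A) | GAC Asp (D)      | GGC Gly (G)     | C
    | GUA Val (V)  | GCA Ala (A) | GAA Glu (E)      | GGA Gly (G)     | A
    | GUG Val (V)  | GCG Ala (A) | GAG Glu (E)      | GGG Gly (G)     | G

* AUG codes for methionine and also serves as the initiation (start) codon.

Abbreviations: Phe = Phenylalanine, Leu = Leucine, Ser = Serine, Tyr = Tyrosine, Cys = Cysteine, Trp = Tryptophan, Pro = Proline, His = Histidine, Gln = Glutamine, Arg = Arginine, Ile = Isoleucine, Met = Methionine, Thr = Threonine, Asn = Asparagine, Lys = Lysine, Val = Valine, Ala = Alanine, Asp = Aspartic acid, Glu = Glutamic acid, Gly = Glycine.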
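Encoding this table as a lookup makes its degeneracy easy to verify: counting codons per amino acid gives between one and six codons for each of the 20 amino acids, plus three stop codons. A minimal Python sketch, assuming the standard code shown here; `CODON_TABLE` and `degeneracy` are illustrative names, and '*' stands for a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Amino acids in table order: first base slowest, third base fastest ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every codon to its amino acid, e.g. "UUU" -> "F".
# product(..., repeat=3) yields codons with the third base varying fastest.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: number of codons that specify each amino acid (and stop).
degeneracy = Counter(CODON_TABLE.values())

# Print from most to least degenerate, alphabetically within ties.
for aa, n in sorted(degeneracy.items(), key=lambda item: (-item[1], item[0])):
    print(aa, n)
```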

In RNA, thymine (T) is replaced by uracil (U), and deoxyribose is replaced by ribose.

Ribonucleic acid (RNA) is a polymeric molecule implicated in various biological roles in coding, decoding, regulation, and expression of genes.
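Because RNA uses uracil where DNA uses thymine, deriving an mRNA sequence from DNA is mostly a letter substitution. A minimal Python sketch, assuming both inputs are written 5'→3' and contain only A, C, G, T; the function names are hypothetical. The coding (sense) strand has the same sequence as the mRNA, while the template strand must first be reverse-complemented.

```python
# Watson-Crick pairing used to build the complementary DNA strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe_coding_strand(dna: str) -> str:
    """mRNA matches the coding (sense) strand, with U in place of T."""
    return dna.upper().replace("T", "U")


def transcribe_template_strand(dna: str) -> str:
    """Derive mRNA from the template strand given 5'->3'.

    The template is complemented and reversed to recover the coding-strand
    sequence, then T is replaced by U.
    """
    coding = dna.upper().translate(COMPLEMENT)[::-1]
    return coding.replace("T", "U")


print(transcribe_coding_strand("ATGGCC"))    # AUGGCC
print(transcribe_template_strand("GGCCAT"))  # AUGGCC
```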


Messenger RNA (mRNA) is a large family of RNA molecules that convey genetic information from DNA to the ribosome, where they specify the amino acid sequence of the protein products of gene expression.


The genetic code has 7 main characteristics:

1. It is made up of codons, which are triplets of bases. Each codon specifies a particular amino acid or, for the three stop codons, the end of the protein.

2. The codons do not overlap; that is, the sequence GCCCAC is read as two triplets, “GCC” and “CAC”, not as the overlapping triplets “CCC” and “CCA”.

3. The code includes punctuation in the form of three “stop” codons that do not code for an amino acid: UAA, UAG, and UGA.

4. The genetic code is known as a “degenerate” code: most amino acids are specified by more than one codon, and each amino acid is specified by between one and six codons (there are only 20 amino acids but 64 possible codon triplets).

5. To read each gene and obtain the information needed to form a protein, translation begins at a fixed starting point on the mRNA strand: the initiation codon AUG, which also codes for methionine.

6. The mRNA strand is read from the 5' to the 3' end.

7. If there are mutations or errors in the DNA, the message may be changed, resulting in an incorrect protein. (1)
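Characteristics 1, 2, 3, 5 and 6 combine into a simple translation routine: find the start codon, then read consecutive, non-overlapping triplets 5' to 3' until a stop codon. A minimal Python sketch under simplifying assumptions: the first AUG is taken as the start (real initiation also depends on surrounding sequence context), the input is mRNA written 5'→3' using U rather than T, and `translate` is a hypothetical name.

```python
from itertools import product

BASES = "UCAG"
# Standard code in table order ('*' = stop); third base varies fastest.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

START_CODON = "AUG"


def translate(mrna: str) -> str:
    """Translate mRNA (5'->3') from its first AUG up to the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find(START_CODON)  # fixed starting point (characteristic 5)
    if start == -1:
        return ""
    protein = []
    # Consecutive, non-overlapping triplets read 5'->3' (characteristics 1, 2, 6).
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codons end the protein (characteristic 3)
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCCACUAAGC"))  # MAH: AUG GCC CAC, then UAA stops
```

Note how the GCCCAC stretch is read as GCC and CAC only, as characteristic 2 describes.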


People with public genome sequences

The first nearly complete human genomes were sequenced in 2007, among them J. Craig Venter's (an American, sequenced at 7.5-fold average coverage).

An American biotechnologist, biochemist, geneticist, and entrepreneur.

Other sequenced individuals: James Watson, a Han Chinese individual, a Yoruban from Nigeria, a female leukemia patient, Seong-Jin Kim, and Steve Jobs (at a cost of $100,000).
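“7.5-fold average coverage” means each base of the genome was sequenced about 7.5 times on average: coverage is the total number of sequenced bases divided by the genome length. A minimal Python sketch of that definition; the function names are hypothetical and the numbers in the example are illustrative round values, not the parameters of any actual sequencing project.

```python
import math


def average_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average fold coverage = total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest number of reads that reaches at least the target average coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_length / read_length)


# Illustrative only: a 3,000,000,000-base genome read in 800-base reads.
n = reads_needed(7.5, 800, 3_000_000_000)
print(n, average_coverage(n, 800, 3_000_000_000))
```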


Sequence Analysis

• In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution.

• In chemistry, sequence analysis comprises techniques used to determine the sequence of a polymer formed of several monomers. In molecular biology and genetics, the same process is called simply "sequencing".

• In marketing, sequence analysis is often used in analytical customer relationship management applications, such as NPTB (Next Product to Buy) models.

• In sociology, sequence methods are increasingly used to study life-course and career trajectories, patterns of organizational and national development, conversation and interaction structure, and the problem of work/family synchrony. (2)
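As one small example of the bioinformatics sense above, counting k-mers (all overlapping substrings of length k) is a common first feature extracted from a DNA, RNA or peptide sequence. A minimal Python sketch; `kmer_counts` is a hypothetical name, not a library function.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # A sequence of length L has L - k + 1 overlapping k-mers (none if L < k).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


print(kmer_counts("ACGTACGT", 2).most_common(3))
```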


GenBank

• The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations.

ftp://ftp.ncbi.nih.gov/ (3)
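Nucleotide sequences downloaded from GenBank are commonly exported in FASTA format: a header line starting with `>` followed by one or more sequence lines. A minimal parser sketch, assuming plain, uncompressed FASTA text; `parse_fasta` is a hypothetical name and the example records are made up.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; the header is the text after '>'."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>seq1 hypothetical example
ATGGCC
CACTAA
>seq2 another made-up record
GGGCCC
"""
print(parse_fasta(example.splitlines()))
```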


DNA Patterns

DNA patterns are graphs of DNA or RNA sequences. Various functional structures such as promoters and genes, or larger structures like bacterial or viral genomes, can be analyzed using DNA patterns.

Method

The technique was described in 2012 by Paul Gagniuc and Constantin Ionescu-Tirgoviste (4). They adapted algorithms from cryptography and optical character recognition to make their graphs.

To graph a DNA pattern, two values, the kappa index of coincidence and the total percentage of cytosine plus guanine ((C + G)%), are calculated from a sliding window that is "circulated" over the DNA sequence.
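A minimal Python sketch of the sliding-window idea, with two assumptions flagged: the window simply slides along a linear sequence by `step` bases (the exact "circulated" scheme of the original method is not reproduced), and the coincidence value is the classic Friedman index of coincidence over the four-letter DNA alphabet, IC = Σ n_i(n_i − 1) / (N(N − 1)/c) with c = 4, which may differ from the kappa IC formula used by Gagniuc and Ionescu-Tirgoviste. `dna_pattern`, `window` and `step` are hypothetical names; plotting the points is left out.

```python
from collections import Counter
from typing import List, Tuple


def index_of_coincidence(seq: str, c: int = 4) -> float:
    """Friedman IC normalized by alphabet size c (near 1.0 for uniform random text)."""
    n = len(seq)
    if n < 2:
        return 0.0
    counts = Counter(seq)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1) / c)


def gc_percent(seq: str) -> float:
    """Percentage of bases in seq that are C or G."""
    return 100.0 * sum(base in "CG" for base in seq) / len(seq) if seq else 0.0


def dna_pattern(sequence: str, window: int = 30, step: int = 1) -> List[Tuple[float, float]]:
    """Return one (IC, (C+G)%) point per window position along the sequence."""
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    sequence = sequence.upper()
    points = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        points.append((index_of_coincidence(chunk), gc_percent(chunk)))
    return points


print(dna_pattern("ATATATATGCGCGCGC", window=8, step=4))
```

Each point could then be drawn as one dot on an IC versus (C + G)% plot.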


Index of Coincidence (IC): In cryptography, coincidence counting is the technique (invented by William F. Friedman) of putting two texts side by side and counting the number of times that identical letters appear in the same position in both texts. This count, either as a ratio of the total or normalized by dividing by the expected count for a random source model, is known as the index of coincidence, or IC for short.


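$$\mathrm{IC} = c\left(\frac{n_a}{N}\cdot\frac{n_a - 1}{N - 1} + \frac{n_b}{N}\cdot\frac{n_b - 1}{N - 1} + \cdots + \frac{n_z}{N}\cdot\frac{n_z - 1}{N - 1}\right)$$

where $c$ is the normalizing coefficient (26 for English), $n_a$ is the number of times the letter "a" appears in the text, and $N$ is the length of the text. Equivalently,

$$\mathrm{IC} = \frac{\sum_{i=1}^{c} n_i(n_i - 1)}{N(N - 1)/c}$$

where $N$ is the length of the text and $n_1$ through $n_c$ are the frequencies (as integers) of the $c$ letters of the alphabet ($c = 26$ for monocase English). The sum of the $n_i$ is necessarily $N$.

The products $n(n-1)$ count the number of combinations of $n$ elements taken two at a time (each pair is counted twice, but the factor of 2 appears in both numerator and denominator and cancels).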
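The index of coincidence as defined on this slide, IC = Σ n_i(n_i − 1) / (N(N − 1)/c), translates directly into code. A minimal Python sketch; the `alphabet` parameter is an assumption of this sketch so the same function serves English (c = 26) or DNA (c = 4, letters A, C, G, T). Characters outside the alphabet are ignored.

```python
from collections import Counter


def index_of_coincidence(text: str, alphabet: str) -> float:
    """IC = sum n_i (n_i - 1) / (N (N - 1) / c), counting only letters in `alphabet`."""
    alphabet = alphabet.upper()
    letters = [ch for ch in text.upper() if ch in alphabet]
    n = len(letters)  # N: length of the text after filtering
    if n < 2:
        raise ValueError("need at least two letters")
    c = len(alphabet)  # normalizing coefficient
    counts = Counter(letters)  # n_i for each letter present
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1) / c)


print(index_of_coincidence("AACC", "ACGT"))
```

For "AACC" the numerator is 2 + 2 = 4 and the denominator is 4 · 3 / 4 = 3, so IC = 4/3; a text whose letters are all different scores 0.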


Gene Promoter


In genetics, a promoter is a region of DNA that initiates transcription of a particular gene. Promoters are located near the transcription start sites of genes, on the same strand and upstream on the DNA (towards the 5' region of the sense strand). Promoters can be about 100–1000 base pairs long.
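Because a promoter sits upstream of the transcription start site on the same strand, a candidate promoter region can be cut from a genome sequence by coordinates. A minimal Python sketch with hypothetical names, assuming 0-based coordinates on the plus strand and a known transcription start site (`tss`, the first transcribed base) and strand; real annotations may use other coordinate conventions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def upstream_region(genome: str, tss: int, length: int, strand: str = "+") -> str:
    """Return up to `length` bases immediately upstream of a transcription start site.

    `tss` is the 0-based plus-strand position of the first transcribed base.
    The result is written 5'->3' on the gene's own strand.
    """
    if strand == "+":
        # Upstream on the plus strand lies at lower coordinates (towards 5').
        return genome[max(0, tss - length):tss]
    if strand == "-":
        # For a minus-strand gene, upstream lies at higher plus-strand coordinates.
        return reverse_complement(genome[tss + 1:tss + 1 + length])
    raise ValueError("strand must be '+' or '-'")


print(upstream_region("AAAACCCCGGGGTTTT", tss=8, length=4))  # CCCC
```

With real data, `length` would typically be set within the 100-1000 bp range mentioned above.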


http://www.ncbi.nlm.nih.gov/

• The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
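NCBI also exposes its databases programmatically through the Entrez E-utilities. A minimal Python sketch that builds an `efetch` request for a nucleotide record in FASTA format; the helper names are hypothetical, and NC_000913.3 (an E. coli K-12 reference genome) is used only as an example accession. Real requests should follow NCBI's usage guidelines, including its rate limits.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build an E-utilities efetch URL that returns a nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download the record as text (requires network access)."""
    with urlopen(efetch_fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the request URL; call fetch_fasta() to actually download.
    print(efetch_fasta_url("NC_000913.3"))
```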


Bibliography

• (1) https://history.nih.gov/exhibits/nirenberg/HS5_cracked.htm

• (2) https://en.wikipedia.org/wiki/Sequence_analysis

• (3) https://en.wikipedia.org/wiki/GenBank

• (4) https://en.wikipedia.org/wiki/DNA_Patterns

• (5) https://en.wikipedia.org/wiki/Transfer_RNA

• (6) https://en.wikipedia.org/wiki/Genetic_code

• (7) https://en.wikipedia.org/wiki/Index_of_coincidence

• (8) PromKappa V3.0 Java (uses the DNA pattern method)

• Thanks to https://xbioinformatics.wordpress.com/tag/promkappa/ for software.


Seminar in Management of Information Systems
