![Page 1: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/1.jpg)
CIS 667
Bioinformatics
Cleveland State University
Department of Computer and Information Science
Fall 2003
![Page 2: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/2.jpg)
What is Bioinformatics?
• Field of science in which biology, computer science, and information technology merge to form a single discipline
  - Historically, the creation and maintenance of biological sequence databases has been important
• Biology is being transformed from a purely lab-based science to an information science as well
![Page 3: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/3.jpg)
What is Bioinformatics?
• Three important sub-disciplines
  - Development of new algorithms and statistical methods to analyze relationships among members of large data sets
  - Analysis and interpretation of various types of data (nucleotide and amino acid sequences, protein structures, etc.)
  - Development and implementation of tools for efficient access to and management of various types of data
![Page 4: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/4.jpg)
Why now?
• Recent advances in molecular biology and genomic technologies have led to explosive growth in the amount of biological information generated
• Requires computerized databases to store/organize/index data and specialized tools to view and analyze data
![Page 5: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/5.jpg)
What skills should a Bioinformatician have?
• Deep background in some area of molecular biology
• Understand the central dogma of molecular biology
• Substantial experience with at least one or two major packages
• Experience working in a command-line computing environment
• Experience with both high-level and scripting languages
![Page 6: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/6.jpg)
Others…
• Molecular evolution
• Physical chemistry
• Statistics and probability
• Database design
• Algorithm development
• Molecular biology lab methods
![Page 7: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/7.jpg)
What will we learn?
• Central dogma of molecular biology + other necessary biology background
• Working in a Unix command-line environment
• Programming in Perl
• Algorithms for molecular biology
• Hands-on experience with bioinformatics tools
![Page 8: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/8.jpg)
Molecular Biology
• Primarily concerned with two basic kinds of molecules found in all living things: proteins and nucleic acids
  - Proteins
    - Structural proteins are tissue building blocks, while enzymes catalyze chemical reactions
    - Proteins are chains of amino acids
![Page 9: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/9.jpg)
Example Amino Acid
Figure: an example amino acid (alanine). The alpha carbon is bonded to a hydrogen atom (H), an amino group (H2N), a carboxy group (COOH), and a side chain (CH3).
![Page 10: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/10.jpg)
Amino Acids
• There are 20 naturally occurring amino acids
  - Amino acids can be identified by a 3-letter code (and sometimes by a 1-letter code)
  - In a protein, amino acids are joined by peptide bonds (the C from the carboxy group binds to the N from the amino group)
    - A water molecule is liberated, so we speak of residues in the chain
![Page 11: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/11.jpg)
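Amino Acids

| Name | One-letter code | Three-letter code |
|---|---|---|
| Alanine | A | Ala |
| Cysteine | C | Cys |
| Aspartic Acid | D | Asp |
| Glutamic Acid | E | Glu |
| Phenylalanine | F | Phe |
| Glycine | G | Gly |
| Histidine | H | His |
| Isoleucine | I | Ile |
| Lysine | K | Lys |
| Leucine | L | Leu |
| Methionine | M | Met |
| Asparagine | N | Asn |
| Proline | P | Pro |
| Glutamine | Q | Gln |
| Arginine | R | Arg |
| Serine | S | Ser |
| Threonine | T | Thr |
| Valine | V | Val |
| Tryptophan | W | Trp |
| Tyrosine | Y | Tyr |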
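
The code table above maps naturally onto a dictionary. The following is a minimal Python sketch, not part of the original slides; the names `ONE_TO_THREE` and `to_three_letter` are made up for illustration.

```python
# One-letter to three-letter amino acid codes, copied from the table above.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(protein):
    """Spell out a one-letter protein sequence using three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in protein.upper())


print(to_three_letter("MKV"))  # Met-Lys-Val
```

An unknown letter raises a `KeyError`, which is probably the behaviour one wants when checking that a string is a valid protein sequence.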
![Page 12: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/12.jpg)
Proteins
• Typical protein contains about 300 residues
• A chain has an amino group at one end and a carboxy group at the other, giving the chain an orientation (start to end)
• The sequence of residues in the chain is called the protein’s primary structure
![Page 13: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/13.jpg)
Proteins
• Proteins fold in three dimensions resulting in secondary, tertiary, quaternary structures
• The two most common secondary structures are the α-helix and the β-sheet
![Page 14: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/14.jpg)
Secondary Structure
• Only a small number of patterns are common
• These patterns are formed by regular intramolecular hydrogen bonding
![Page 15: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/15.jpg)
![Page 16: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/16.jpg)
Proteins
• The specific shape that a protein folds into determines its unique function
  - Different shapes mean the protein can bind to different molecules
• Proteins are produced in a cell structure called a ribosome
  - Amino acids are added one after the other in the sequence coded by a messenger ribonucleic acid (mRNA) molecule
![Page 17: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/17.jpg)
Ribosomes
Figure: a ribosome, with its large subunit and small subunit labeled.
![Page 18: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/18.jpg)
Nucleic Acid
• Two types of nucleic acids
  - Ribonucleic acid (RNA)
  - Deoxyribonucleic acid (DNA)
• DNA, like protein, is a chain of simpler molecules, but double stranded
  - Each strand consists of a chain of nucleotides
![Page 19: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/19.jpg)
Nucleic Acids
• Each nucleotide consists of
  - A sugar molecule
  - A phosphate residue
  - A base
• The sugar molecule has five carbon atoms labeled 1’ to 5’
  - The 3’ carbon of one nucleotide is bound, through the phosphate residue, to the 5’ carbon of the next nucleotide in the chain, giving an orientation to the chain
    - 5’ is the start and 3’ is the end
![Page 20: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/20.jpg)
Nucleic Acids
![Page 21: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/21.jpg)
Nucleic Acids
• The chain of sugar/phosphate groups forms the backbone of a strand of DNA
• Attached to each 1’ carbon in the backbone is a molecule called a base
  - There are four different bases
    - Adenine (A)
    - Guanine (G)
    - Cytosine (C)
    - Thymine (T)
![Page 22: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/22.jpg)
DNA
• DNA molecules are double strands
  - The strands form a double helix
  - The strands are held in the helix form by bonds between complementary bases in the two strands
    - A and T are complements
    - G and C are complements
  - We refer to the paired bases as base pairs (bp) and use base pairs as the unit of length of DNA molecules
![Page 23: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/23.jpg)
DNA Double Helix
![Page 24: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/24.jpg)
DNA
• DNA can be considered as a string of letters from the set {A, T, C, G}
  - 5’ … TACTGAA … 3’
• The other strand connected to this one is antiparallel and complementary
  - 3’ … ATGACTT … 5’
• Note that the orientations of the two strands are opposite
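
Because the second strand is both complementary and antiparallel, it can be computed from the first. The sketch below is an illustration added to the transcript, with hypothetical function names; it assumes the input contains only the letters A, T, C, and G.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Base-by-base complement, read in the opposite (3' to 5') direction."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand):
    """The partner strand written in the conventional 5' to 3' direction."""
    return complement(strand)[::-1]


print(complement("TACTGAA"))          # ATGACTT (3' to 5', as on the slide)
print(reverse_complement("TACTGAA"))  # TTCAGTA (same strand, 5' to 3')
```

Sequence databases conventionally store strands 5’ to 3’, so the reverse complement is usually the form one searches for.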
![Page 25: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/25.jpg)
DNA
• Given one of the strands, we can infer the other strand
  - One of the strands can act as a template for the construction of the other
  - This property allows for cell division and replication, with each new cell containing a copy of the DNA from the original cell
• Complementary base pairs are held together by hydrogen bonds
![Page 26: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/26.jpg)
DNA
• In higher organisms, DNA is found inside the cell nucleus
  - Also in cell organelles called mitochondria (plants and animals) and chloroplasts (plants only)
• The DNA is found in a few very long DNA molecules called chromosomes
![Page 27: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/27.jpg)
RNA
• RNA molecules are similar to DNA, but
  - Have a different sugar (ribose rather than deoxyribose)
  - Have the base uracil (U) instead of thymine (T)
    - U binds with A, as does T
  - RNA does not usually form a double helix
    - Hybrid DNA-RNA helices may form
    - Parts of an RNA molecule may bind to other parts of the same molecule by complementarity
  - Three-dimensional structure is variable (compare proteins)
![Page 28: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/28.jpg)
Central Dogma of Molecular Biology
• Information stored in DNA is used to make a transient RNA
  - The process is called transcription and is accomplished through use of the enzyme RNA polymerase
• The RNA is used to make proteins
  - The process is called translation and is performed by ribosomes
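
At the level of strings, transcription can be sketched in a few lines of Python. This is an added illustration, not material from the course; the function names are hypothetical, and the sketch ignores promoters, terminators, and RNA processing.

```python
# Pairing used when RNA is built against a DNA template: A->U, T->A, C->G, G->C.
RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(coding_strand):
    """mRNA for a coding strand given 5' to 3': same letters, with U for T."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_3_to_5):
    """mRNA built against the template strand, which is read 3' to 5'."""
    return "".join(RNA_PAIR[base] for base in template_3_to_5.upper())


print(transcribe("ATGGCC"))           # AUGGCC
print(transcribe_template("TACCGG"))  # AUGGCC
```

Both calls give the same mRNA because the two inputs are the two strands of the same double-stranded fragment.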
![Page 29: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/29.jpg)
RNA Transcription
![Page 30: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/30.jpg)
RNA Transcription
![Page 31: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/31.jpg)
![Page 32: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/32.jpg)
Genes and the Genetic Code
• Each protein in an organism is specified by a contiguous stretch of DNA called a gene
  - Remember that the DNA is contained in a small number of molecules called chromosomes
  - Not all of the DNA specifies some protein
  - Some genes code for RNA products
![Page 33: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/33.jpg)
Gene Expression
• Gene expression is the process of using the information stored in DNA to make an RNA molecule and then a protein
  - RNA polymerases must
    - determine the start of genes
    - determine whether the protein coded by a gene is needed at the present moment
  - The start of a gene is marked by a 13-nucleotide promoter sequence (why 13 and not, e.g., 1?)
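
One way to approach the "why 13" question is to estimate how often a specific sequence of length k turns up purely by chance. The sketch below is an added illustration under a simplifying assumption that is not in the slides: bases are independent and equally likely. The function name and the 1,000,000-base example are hypothetical.

```python
def expected_chance_matches(sequence_length, k):
    """Expected number of chance occurrences of one specific k-mer.

    Assumes independent, equally likely bases, so any given position
    starts a match with probability (1/4) ** k.
    """
    positions = max(sequence_length - k + 1, 0)
    return positions / 4 ** k


genome = 1_000_000  # hypothetical sequence length in bases
print(expected_chance_matches(genome, 1))   # 250000.0
print(expected_chance_matches(genome, 13))  # well below 1
```

A one-nucleotide signal would match at roughly a quarter of all positions, while a 13-nucleotide signal is unlikely to occur by accident in a sequence of this size, so it can mark gene starts with few false positives.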
![Page 34: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/34.jpg)
Gene Expression
![Page 35: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/35.jpg)
Gene Expression
• How does the RNA polymerase then tell if a protein should now be produced?
  - Specific regulatory genes produce proteins that, in some circumstances, can bind to the cell’s DNA near the promoter sequence of a gene they control
    - Regulation is positive when binding makes it easier for RNA polymerase to initiate transcription, and negative when binding makes it harder
![Page 36: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/36.jpg)
Genetic Code
• A gene codes the sequence of amino acids needed to form a protein
• 20 aa > 4 bases, so more than one base is needed to specify an aa
  - 4^2 = 16 < 20 but 4^3 = 64 > 20, so 3 bases suffice
  - Each sequence of 3 bases (a codon) codes for an amino acid (with 3 exceptions)
  - Three codons cause translation to end and are called stop codons
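
The codon-to-amino-acid mapping can be sketched as a lookup table. The Python below is an added illustration, not course code: it builds the standard genetic code from a compact 64-letter string (codons ordered with bases U, C, A, G; `*` marks a stop codon), and the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate codon by codon from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))           # 64
print(translate("AUGAAAGUUUAA"))  # MKV
```

The sketch assumes the input already starts at the first codon; locating that position is a separate problem.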
![Page 37: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/37.jpg)
Genetic Code
• Since 64 > 20, more than one codon must code for some amino acid(s)
• In fact, 18 of the 20 amino acids are coded for by more than one codon
• The genetic code is therefore a degenerate code
  - Errors in transcription may not cause the wrong aa to be produced (especially if the error is in the 3rd nucleotide)
  - Even if the wrong aa is produced due to a single error, a similar aa is likely to be produced
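
The degeneracy claim can be checked by counting codons per amino acid. This is an added sketch using the standard genetic code written as a compact string (codons in U, C, A, G order, `*` for stop); the variable names are made up for illustration.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons for each amino acid, ignoring the three stop codons.
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon = sorted(aa for aa, n in codons_per_aa.items() if n == 1)

print(len(codons_per_aa))  # 20
print(single_codon)        # ['M', 'W']
```

Only methionine (M) and tryptophan (W) have a single codon, which matches the statement that 18 of the 20 amino acids are coded for by more than one codon.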
![Page 38: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/38.jpg)
Open Reading Frames
• One special start codon (AUG) marks the spot where translation begins
• A sequence of codons is called a reading frame
  - A sequence of codons which begins with a start codon and has no stop codons is called an open reading frame (ORF)
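
A simple ORF scan follows directly from this definition. The sketch below is an added illustration with a hypothetical function name; it assumes an mRNA string, looks only at the three forward reading frames, and reports only ORFs that are closed by a stop codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(mrna, min_codons=1):
    """Return ORFs (start codon up to, not including, the stop) in 3 frames."""
    orfs = []
    for frame in range(3):
        current = None  # codons of the ORF being read, or None if outside one
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if current is None:
                if codon == "AUG":
                    current = [codon]
            elif codon in STOP_CODONS:
                if len(current) >= min_codons:
                    orfs.append("".join(current))
                current = None
            else:
                current.append(codon)
    return orfs


print(find_orfs("GGAUGAAAGUUUAACC"))  # ['AUGAAAGUU']
```

Raising `min_codons` filters out short ORFs, which occur often by chance.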
![Page 39: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/39.jpg)
Prokaryotes and Eukaryotes
• Living organisms may be classified as either prokaryotes (bacteria) or eukaryotes (higher organisms like yeast, plants, people)
  - The cells of eukaryotes have a nucleus while those of prokaryotes don’t
  - DNA is linear in eukaryotes and typically circular in prokaryotes
![Page 40: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/40.jpg)
Prokaryotes and Eukaryotes
![Page 41: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/41.jpg)
Introns and Exons
• In prokaryotes, the mRNA copies of the genes correspond directly to the DNA sequence in the genome (with U substituted for T)
• In eukaryotes, the mRNA is carried outside the nucleus before translation
  - The mRNA is modified by splicing out sequences called introns and rejoining the exons that flank them
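
If the exon positions are already known, splicing amounts to joining slices of the transcript. The sketch below is an added illustration; the function name, the example sequence, and the exon coordinates are all made up, and finding those coordinates in real data is the hard part.

```python
def splice(pre_mrna, exons):
    """Join the exon regions, given as (start, end) pairs with end exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: exon 1, one intron, exon 2.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"
exon_2 = "AAAUAA"
pre_mrna = exon_1 + intron + exon_2

print(splice(pre_mrna, [(0, 6), (18, 24)]))  # AUGGCCAAAUAA
```

Choosing a different set of exon coordinates for the same transcript gives a different mature mRNA.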
![Page 42: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/42.jpg)
Introns and Exons
• Splicing is controlled by enzyme complexes called spliceosomes
  - Incorrect splicing leads to frame shifts or premature stop codons which make the resulting protein useless
  - The position of introns is signalled by several specific sequences of nucleotides
    - Since there is more than one sequence, we can have alternative splicing, resulting in different proteins being produced in different circumstances
![Page 43: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/43.jpg)
Molecular Biology Tools
• Molecular biologists use a small set of laboratory techniques to identify the information content of organisms so that it can be processed using bioinformatics methods
![Page 44: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/44.jpg)
Restriction Enzyme Digests
• Restriction enzymes can be used to cut DNA molecules wherever a particular sequence occurs
  - Digesting a DNA molecule and observing how many fragments occur gives some insight into the organization and sequence of that DNA
    - This is called restriction mapping
  - Allowed the isolation of, and experimentation on, individual genes for the first time
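
A digest can be simulated on a sequence string. The sketch below is an added illustration with a hypothetical function name; it treats the DNA as a single linear strand and uses EcoRI as the example enzyme (recognition site GAATTC, cut between G and A), which is not named in the slides.

```python
def digest(dna, site, cut_offset):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is how many bases into the site the cut falls
    (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


fragments = digest("AAGAATTCGGGGAATTCTT", "GAATTC", 1)
print(fragments)                     # ['AAG', 'AATTCGGGG', 'AATTCTT']
print([len(f) for f in fragments])   # [3, 9, 7]
```

Two cut sites on a linear molecule give three fragments. In the lab only the fragment sizes are observed, and restriction mapping works backwards from them.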
![Page 45: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/45.jpg)
Restriction Enzyme Digests
![Page 46: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/46.jpg)
Gel Electrophoresis
• We can separate the fragments of DNA obtained by restriction enzymes with gel electrophoresis
  - DNA fragments are pulled through a gel towards an electrical charge
  - Larger fragments do not move as quickly, so this provides a way to separate the fragments by size
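
Since smaller fragments travel farther, the band order on a gel is simply the fragments sorted by length. This short sketch is an added illustration with a hypothetical function name; it models only the ordering, not actual migration distances.

```python
def run_gel(fragments):
    """Fragment lengths ordered from farthest travelled (smallest) to nearest."""
    return sorted(len(fragment) for fragment in fragments)


print(run_gel(["AAG", "AATTCGGGG", "AATTCTT"]))  # [3, 7, 9]
```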
![Page 47: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/47.jpg)
Gel Electrophoresis
![Page 48: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/48.jpg)
Blotting and Hybridization
• To study a single fragment, DNA is transferred from the gel to a piece of paper or cloth (blotting)
  - The DNA fragments are then permanently attached to the membrane using (e.g.) UV light
  - A specially prepared labeled fragment of DNA (a probe) is allowed to base pair with the fragments to try to find a specific fragment
![Page 49: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/49.jpg)
Blotting and Hybridization
• The probe is tagged using (e.g.) a fluorescent dye so that hybridization can be detected
  - We then determine where on the membrane base pairing has occurred
• DNA chip or microarray techniques are similar
  - Thousands of nucleotide sequences are affixed to portions of a small silica chip
  - A large number of probes are washed over the chip and a laser is used to find which probes bind to which sequences
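
In sequence terms, a probe binds wherever a single-stranded fragment contains the probe's reverse complement. The sketch below is an added illustration with hypothetical names and sequences; it assumes perfect base pairing, whereas real hybridization tolerates some mismatches.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Partner strand of `strand`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def hybridization_sites(fragment, probe):
    """Start positions in `fragment` where `probe` can base pair exactly."""
    target = reverse_complement(probe)
    sites = []
    pos = fragment.find(target)
    while pos != -1:
        sites.append(pos)
        pos = fragment.find(target, pos + 1)
    return sites


print(hybridization_sites("GGTACTGAACC", "TTCAGTA"))  # [2]
```

An empty list means the probe does not bind to that fragment, which on a blot or chip corresponds to no signal at that spot.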
![Page 50: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/50.jpg)
DNA Chip
![Page 51: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/51.jpg)
Cloning
• Large amounts of DNA material are typically required for analysis
  - In cloning, specific DNA fragments are inserted into chromosome-like carriers called vectors in living cells, which copy the fragments as the cells divide
  - The identical copies of the fragments are called molecular clones and can be stored in libraries for later study
  - Vectors are derived from bacteria and yeast chromosomes
![Page 52: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/52.jpg)
Cloning
![Page 53: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/53.jpg)
Polymerase Chain Reaction
• Polymerase Chain Reaction (PCR) is an alternative to cloning for amplifying DNA fragments
  - DNA fragments are heated to break them into single strands
  - Probes (primers) are added to bind to the portion of DNA to be amplified
  - DNA polymerase grows the strands from the probes
  - The process repeats
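
Because each repetition can copy every strand present, the amount of target DNA grows roughly exponentially with the number of cycles. The sketch below is an added illustration; the function name and the `efficiency` parameter are hypothetical, and an efficiency of 1.0 is the idealized case of perfect doubling.

```python
def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Idealized number of copies after a given number of PCR cycles.

    `efficiency` is the fraction of molecules copied per cycle
    (1.0 means every molecule is duplicated each time).
    """
    return starting_copies * (1 + efficiency) ** cycles


print(copies_after(10))  # 1024.0
print(copies_after(30))  # 1073741824.0
```

Thirty ideal cycles turn one molecule into about a billion copies. Real reactions fall short of this as reagents run out.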
![Page 54: CIS 667](https://reader035.vdocument.in/reader035/viewer/2022062517/56812d2c550346895d922baa/html5/thumbnails/54.jpg)
Polymerase Chain Reaction