![Page 1: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/1.jpg)
Introduction to Bioinformatics
CPSC 265
![Page 2: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/2.jpg)
What is bioinformatics?
Interface of biology and computer science
Analysis of proteins, genes and genomes using computer algorithms and computer databases
Genome informatics: making sense of the billions of base pairs of DNA that are sequenced by genomics projects.
Mostly, it’s about protein and DNA sequences
![Page 3: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/3.jpg)
What do bioinformatics researchers do?
Process large data outputs from new technologies
Turn sequence data into whole-genome sequences
Interpret genome sequences in terms of genes and their expression
Find genes that control crop and animal traits, disease, etc.
Model evolution in genomes and proteins
Model and predict 3D structures of proteins
![Page 4: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/4.jpg)
Growth of GenBank
[Figure 2.1, page 17: base pairs of DNA (billions) and sequences (millions) plotted by year, 1982 to 2002. Updated 8-12-04: more than 40 billion base pairs.]
![Page 5: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/5.jpg)
Cost of sequencing is falling exponentially
![Page 6: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/6.jpg)
DNA sequence analysis
Could be like those from our experiment last week
Or, a lot bigger, like the whole human genome.
Some have chromatogram or “quality” data, some don’t.
![Page 7: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/7.jpg)
DNA makes RNA makes protein
Hard to sequence RNA
Very hard to sequence protein
We can deduce RNA sequence from DNA (in bacteria, as easy as turning Ts to Us. In eukarya, we also need to figure out where the introns are)
We can deduce protein sequence from RNA, using the Universal Genetic Code
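As a minimal sketch of the bacterial case described above, turning Ts to Us takes one line of Python. The function name `transcribe` is chosen here for illustration, and the sketch assumes the input is the coding strand; it makes no attempt to find or remove introns.

```python
def transcribe(dna: str) -> str:
    """Return the RNA sequence for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


print(transcribe("CATCAATGACAACTCC"))  # prints CAUCAAUGACAACUCC
```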
![Page 8: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/8.jpg)
Conceptual Translation
In a computer, take each set of three RNA letters, and then figure out what amino acid they code for.
Professional biologists use the SINGLE LETTER CODE
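Conceptual translation can be sketched in a few lines of Python. The names `CODON_TABLE` and `translate` are illustrative choices, not part of any particular library. The table is the standard genetic code written in single-letter form, with `*` marking a stop codon and `X` (an assumption of this sketch) marking any triplet that is not in the table.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate RNA (or DNA) three letters at a time into single-letter amino acids."""
    rna = rna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        # Unknown triplets (for example, ones containing N) become "X".
        protein.append(CODON_TABLE.get(rna[i:i + 3], "X"))
    return "".join(protein)


print(translate("AUGACAACUCCAAAG"))  # prints MTTPK
```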
![Page 9: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/9.jpg)
DNA potentially encodes six proteins
5’ CAT CAA 5’ ATC AAC 5’ TCA ACT
5’ GTG GGT 5’ TGG GTA 5’ GGG TAG
5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
![Page 10: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/10.jpg)
We call these READING FRAMES
5’ CAT CAA 5’ ATC AAT 5’ TCA ATG
5’ GTG GGT 5’ TGG GTA 5’ GGG TAG
5’ CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTACTGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
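The six reading frames can be listed with a short Python sketch: three offsets on the strand as given, and three on its reverse complement. The function names are illustrative, and the sketch assumes the sequence contains only A, C, G and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def reading_frames(dna: str) -> list[list[str]]:
    """Return six lists of codons: three forward frames, then three reverse frames."""
    frames = []
    for strand in (dna.upper(), reverse_complement(dna)):
        for offset in range(3):
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames.append(codons)
    return frames


seq = "CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
for frame in reading_frames(seq):
    # Show the first two codons of each frame, as on the slide.
    print("5'", " ".join(frame[:2]))
```

The first two codons printed for each frame match the six triplet pairs shown on the slide.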
![Page 11: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/11.jpg)
All proteins start with M (ATG)
TAG, TAA and TGA are all STOP
This can help narrow it down
5’ CAT CAA 5’ ATC AAT 5’ TCA ATG
5’ GTG GGT 5’ TGG GTA 5’ GGG TAG
5’ CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTACTGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
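One way to use the start and stop rules to narrow down the frames is to scan each frame for ATG and read on until a stop codon or the end of the sequence. The following is a minimal sketch under those assumptions; `find_orfs` and its result format are hypothetical and not taken from any library, and real gene finders apply further criteria such as a minimum length.

```python
STOP_CODONS = {"TAG", "TAA", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(dna: str) -> list[tuple[str, int, int, str, bool]]:
    """Return (strand, frame, start, orf_sequence, has_stop) for each ATG found.

    frame is 1, 2 or 3; start is a 0-based index on the strand being read;
    has_stop is False when the sequence ends before a stop codon is reached.
    """
    dna = dna.upper()
    strands = {"+": dna, "-": dna.translate(COMPLEMENT)[::-1]}
    orfs = []
    for name, strand in strands.items():
        for offset in range(3):
            i = offset
            while i + 3 <= len(strand):
                if strand[i:i + 3] == "ATG":
                    j = i
                    # Read codon by codon until a stop codon or the end.
                    while j + 3 <= len(strand) and strand[j:j + 3] not in STOP_CODONS:
                        j += 3
                    has_stop = strand[j:j + 3] in STOP_CODONS
                    orfs.append((name, offset + 1, i, strand[i:j], has_stop))
                    i = j  # resume scanning after this open reading frame
                i += 3
    return orfs


for orf in find_orfs("CCATGAAATAGCC"):
    print(orf)
```

On the 50-base sequence from the slide, the results include the ATG in the third forward frame, which runs to the end of the fragment without meeting a stop codon.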
![Page 12: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/12.jpg)
Once you know the sequence of the protein, you can figure out if it has been studied already.
You may even be able to track down a likely structure.
![Page 13: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/13.jpg)
There are three major public DNA databases
GenBank: housed at NCBI (National Center for Biotechnology Information)
EMBL: housed at EBI (European Bioinformatics Institute)
DDBJ: housed in Japan
Page 16
![Page 14: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/14.jpg)
www.ncbi.nlm.nih.gov
![Page 15: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/15.jpg)
PubMed is…
• National Library of Medicine's search service
• 12 million citations in MEDLINE
• links to participating online journals
• PubMed tutorial (via “Education” on side bar)
![Page 16: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/16.jpg)
BLAST is…
• Basic Local Alignment Search Tool
• NCBI's sequence similarity search tool
• supports analysis of DNA and protein databases
• 80,000 searches per day
![Page 17: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/17.jpg)
TaxBrowser is…
• browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)
• taxonomy information such as genetic codes
• molecular data on extinct organisms
![Page 18: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/18.jpg)
From the NCBI homepage, type “lectin” and hit “Search”
![Page 19: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/19.jpg)
![Page 20: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/20.jpg)
PubMed
PubMed is the NCBI gateway to MEDLINE.
MEDLINE contains bibliographic citations and author abstracts from over 4,600 journals published in the United States and in 70 foreign countries.
It has 12 million records dating back to 1966.
Page 35
![Page 21: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/21.jpg)
BLAST
BLAST looks for similarity between your favorite query sequence and other known protein or DNA sequences.
Applications include
• identifying homologs (orthologs and paralogs)
• discovering new genes or proteins
• discovering variants of genes or proteins
• investigating expressed sequence tags (ESTs)
• exploring protein structure and function
page 88
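To give a feel for what "similarity" between two sequences means, here is a minimal sketch of a local alignment score. It is not BLAST's own algorithm: BLAST is a faster heuristic, while this sketch uses the exact dynamic-programming method (Smith-Waterman) with a simple linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) are assumptions picked for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (Smith-Waterman)."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may start anywhere, so scores never drop below 0.
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(local_alignment_score("TTACGTTT", "GGACGGG"))  # prints 6 (the shared ACG)
```

A higher score means a longer or cleaner shared region; two sequences with nothing in common score 0.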
![Page 22: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/22.jpg)
Four components to a BLAST search
(1) Obtain the sequence (query)
(2) Select the BLAST program
(3) Enter sequence
(4) Choose optional parameters
Then click “BLAST”
page 88
![Page 23: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/23.jpg)
Step 2: Choose the BLAST program
blastn (nucleotide BLAST)
blastp (protein BLAST)
tblastn (translated BLAST)
blastx (translated BLAST)
tblastx (translated BLAST)
![Page 24: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/24.jpg)
DNA potentially encodes six proteins
5’ CAT CAA 5’ ATC AAC 5’ TCA ACT
5’ GTG GGT 5’ TGG GTA 5’ GGG TAG
5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
![Page 25: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/25.jpg)
Choose the BLAST program
| Program | Input | Database | Comparisons |
|---|---|---|---|
| blastn | DNA | DNA | 1 |
| blastp | protein | protein | 1 |
| blastx | DNA | protein | 6 |
| tblastn | protein | DNA | 6 |
| tblastx | DNA | DNA | 36 |
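The program table can be written as a small Python lookup, which makes it easy to ask which programs fit a given query and database. The dictionary layout and the function name `candidate_programs` are illustrative only. The comparison counts follow from the six reading frames: translating one side gives 6 comparisons, translating both gives 6 × 6 = 36.

```python
# program name -> (query type, database type, number of frame comparisons)
BLAST_PROGRAMS = {
    "blastn": ("DNA", "DNA", 1),
    "blastp": ("protein", "protein", 1),
    "blastx": ("DNA", "protein", 6),    # query translated in six frames
    "tblastn": ("protein", "DNA", 6),   # database translated in six frames
    "tblastx": ("DNA", "DNA", 36),      # both translated: 6 x 6
}


def candidate_programs(query: str, database: str) -> list[str]:
    """Return the BLAST programs that accept this query type and database type."""
    return [
        name
        for name, (q, d, _) in BLAST_PROGRAMS.items()
        if q == query and d == database
    ]


print(candidate_programs("DNA", "protein"))  # prints ['blastx']
print(candidate_programs("DNA", "DNA"))      # prints ['blastn', 'tblastx']
```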
![Page 26: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/26.jpg)
![Page 27: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/27.jpg)
Step 3: choose the database
nr = non-redundant protein (most general database)
Also can search specific organisms and DNA rather than protein (although ALL DNA is going to take a long time…)
![Page 28: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/28.jpg)
![Page 29: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/29.jpg)
![Page 30: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/30.jpg)
filtering
![Page 31: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/31.jpg)
![Page 32: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/32.jpg)
![Page 33: Introduction to Bioinformatics CPSC 265](https://reader033.vdocument.in/reader033/viewer/2022051623/5681596a550346895dc6aa69/html5/thumbnails/33.jpg)
So now you can
• Find any sequence in the database
• Find relevant publications
• Match DNA to protein sequence
• Find database matches to DNA or protein
• Find conserved domains in protein
• Find the 3D structure of a protein
…Without doing any experiments!