Bioinformatics - Rutgers University (chem.rutgers.edu/~kyc/Teaching/Files/543-05/543-24.pdf)
KYC
Bioinformatics: Lecture 24
Definition of bioinformatics
Overview of the NCBI website
Accessing information about DNA and proteins
--Definition of an accession number
--Five ways to find information on proteins and DNA
Access to biomedical literature
Alignment and BLAST
[Figure: axes labeled "Time of development" and "Body region, physiology, pharmacology, pathology"]
Gene/protein families; in silico experiments
Examples: Retinol-binding protein 4 (RBP4), a member of the lipocalin family; the Pol protein of HIV-1
--sequence alignment
--gene expression
--protein structure
--phylogeny
--homologs in various species
[Figure: HIV-1 Pol domains: aspartyl protease (PR), reverse transcriptase (RT), integrase (IN)]
What is bioinformatics?
• Interface of biology and computers
• Analysis of proteins, genes and genomes using computer algorithms and computer databases
• Genomics is the analysis of genomes. The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects.
Top ten challenges for bioinformatics
[1] Precise models of where and when transcription will occur in a genome (initiation and termination)
[2] Precise, predictive models of alternative RNA splicing
[3] Precise models of signal transduction pathways; ability to predict cellular responses to external stimuli
[4] Determining protein:DNA, protein:RNA, protein:protein recognition codes
[5] Accurate ab initio protein structure prediction
[6] Rational design of small molecule inhibitors of proteins
[7] Mechanistic understanding of protein evolution
[8] Mechanistic understanding of speciation
[9] Development of effective gene ontologies: systematic ways to describe gene and protein function
[10] Education: development of bioinformatics curricula
Source: Ewan Birney, Chris Burge, Jim Fickett
[Figure: bioinformatics, medical informatics, and public health informatics, spanning tool-users and tool-makers; infrastructure: databases, algorithms]
Growth of GenBank
[Figure: base pairs of DNA (billions) and number of sequences (millions) in GenBank by year, 1982 to 2002]
Updated 8-12-04: >40 billion base pairs
[Figure: DNA → RNA → protein → phenotype; genome: genomic DNA databases; transcriptome: cDNA, ESTs, UniGene; proteome: protein sequence databases]
There are three major public DNA databases
• GenBank, housed at NCBI (National Center for Biotechnology Information)
• EMBL, housed at EBI (European Bioinformatics Institute)
• DDBJ, housed in Japan
www.ncbi.nlm.nih.gov

Species in GenBank
all species 128,941
viruses 6,137
bacteria 31,262
archaea 2,100
eukaryota 87,147

The most sequenced organisms in GenBank (base pairs; b = billion)
Homo sapiens 10.7b
Mus musculus 6.5b
Rattus norvegicus 5.6b
Danio rerio 1.7b
Zea mays 1.4b
Oryza sativa 0.8b
Drosophila melanogaster 0.7b
Gallus gallus 0.5b
Arabidopsis thaliana 0.5b
www.ncbi.nlm.nih.gov
PubMed is…
• National Library of Medicine's search service
• 12 million citations in MEDLINE
• links to participating online journals
• PubMed tutorial (via “Education” on the sidebar)
Entrez integrates…
• the scientific literature
• DNA and protein sequence databases
• 3D protein structure data
• population study data sets
• assemblies of complete genomes
Books is…
• a searchable resource of online books
TaxBrowser is…
• a browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)
• taxonomy information such as genetic codes
• molecular data on extinct organisms
Structure site includes…
• Molecular Modelling Database (MMDB)
• biopolymer structures obtained from the Protein Data Bank (PDB)
• Cn3D (a 3D-structure viewer)
• vector alignment search tool (VAST)
OMIM is…
• Online Mendelian Inheritance in Man
• a catalog of human genes and genetic disorders
• edited by Dr. Victor McKusick and others at JHU
BLAST is…
• Basic Local Alignment Search Tool
• NCBI's sequence similarity search tool
• supports analysis of DNA and protein databases
• 80,000 searches per day
Accessing information on molecular sequences
Accession numbers are labels for sequences. NCBI includes databases (such as GenBank) that contain information on DNA, RNA, or protein sequences. You can acquire information beginning with a query such as the name of a protein, or the raw nucleotides comprising a DNA sequence of interest. DNA sequences and other molecular data are tagged with accession numbers that are used to identify a sequence or other record relevant to molecular data.
An accession number is a label used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence. Examples (all for retinol-binding protein, RBP4):
X02775 GenBank genomic DNA sequence
NT_030059 Genomic contig
rs7079946 dbSNP (single nucleotide polymorphism)
N91759.1 An expressed sequence tag (1 of 170)
NM_006744 RefSeq DNA sequence (from a transcript)
NP_006735 RefSeq protein
AAC02945 GenBank protein
Q28369 SwissProt protein
1KT7 Protein Data Bank structure record
Five ways to access DNA and protein sequences
[1] LocusLink with RefSeq
[2] UniGene
[3] Entrez
[4] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI)
[5] ExPASy Sequence Retrieval System (separate from NCBI)
LocusLink is a great starting point: it collects key information on each gene/protein from major databases. It now covers 15 organisms.
Unfortunately, LocusLink is slowly being retired in favor of Entrez Gene.
RefSeq provides a curated, optimal accession number for each DNA (NM_006744) or protein (NP_006735).
NCBI’s important RefSeq project: best representative sequences
RefSeq (accessible via the main page of NCBI) provides an expertly curated accession number that corresponds to the most stable, agreed-upon “reference” version of a sequence.
RefSeq identifiers include the following formats:
Complete genome NC_######
Complete chromosome NC_######
Genomic contig NT_######
mRNA (DNA format) NM_###### e.g. NM_006744
Protein NP_###### e.g. NP_006735
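As a rough illustration, the prefixes listed above are enough to classify a RefSeq accession by record type. This is a minimal sketch, assuming only the four prefixes on this slide (real RefSeq defines more) and tolerating a version suffix such as `.2`; the function name `refseq_type` is hypothetical.

```python
import re
from typing import Optional

# Only the RefSeq prefixes listed on the slide; RefSeq itself defines others.
REFSEQ_PREFIXES = {
    "NC": "complete genome or chromosome",
    "NT": "genomic contig",
    "NM": "mRNA",
    "NP": "protein",
}

# Two capital letters, an underscore, digits, and an optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.\d+)?$")


def refseq_type(accession: str) -> Optional[str]:
    """Return the record type of a RefSeq accession, or None if unrecognized."""
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None  # not RefSeq-style, e.g. a GenBank accession like X02775
    return REFSEQ_PREFIXES.get(match.group(1))


for acc in ["NM_006744", "NP_006735", "NT_030059", "X02775"]:
    print(acc, "->", refseq_type(acc))
```

Note that non-RefSeq labels (GenBank, dbSNP, PDB) simply return `None` here; recognizing them would need their own patterns.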
[2] UniGene: unique genes via ESTs
• Find UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene
• UniGene clusters contain many expressed sequence tags (ESTs), which are DNA sequences (typically 500 base pairs in length) corresponding to the mRNA from an expressed gene. ESTs are sequenced from a complementary DNA (cDNA) library.
• UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene, you get information on its abundance and its regional distribution.
[Figure: DNA, RNA, complementary DNA (cDNA), and protein, with UniGene built from cDNA]
Cluster sizes in UniGene
This is a gene with 10 ESTs associated; the cluster size is 10.
This is a gene with 1 EST associated; the cluster size is 1.
Cluster sizes in UniGene (human)
Cluster size: number of clusters
1: ≈ 8,100
2: 38,200
3-4: 23,300
5-8: 12,000
9-16: 5,600
17-32: 3,700
≈500-1000: 1,050
≈2000-4000: 100
≈8000-16,000: 12
≈16,000-30,000: 2
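The cluster size is simply the number of ESTs assigned to a gene cluster, and the small sizes in the table above fall into power-of-two ranges (1, 2, 3-4, 5-8, ...). The sketch below tallies a histogram in that style; the EST and cluster identifiers are hypothetical, made up for illustration, and the function names are not part of any UniGene tool.

```python
from __future__ import annotations

from collections import Counter


def cluster_sizes(est_to_cluster: dict[str, str]) -> Counter:
    """Count how many ESTs fall into each cluster (the cluster size)."""
    return Counter(est_to_cluster.values())


def size_bin(size: int) -> str:
    """Label a cluster size with a power-of-two bin: 1, 2, 3-4, 5-8, 9-16, ..."""
    if size <= 2:
        return str(size)
    upper = 1
    while upper < size:
        upper *= 2
    return f"{upper // 2 + 1}-{upper}"


def size_histogram(est_to_cluster: dict[str, str]) -> Counter:
    """Number of clusters per size bin."""
    return Counter(size_bin(s) for s in cluster_sizes(est_to_cluster).values())


# Hypothetical EST-to-cluster assignments (not real UniGene data).
assignments = {
    "est01": "Hs.A", "est02": "Hs.A", "est03": "Hs.A",
    "est04": "Hs.B",
    "est05": "Hs.C", "est06": "Hs.C",
    "est07": "Hs.D",
}
print(size_histogram(assignments))
```

Here clusters Hs.B and Hs.D are singletons, Hs.C has size 2, and Hs.A (size 3) lands in the 3-4 bin.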
3. Entrez to access protein & DNA sequences
From the NCBI homepage, type “rbp4” and hit “Go”.
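The same kind of Entrez query can also be issued programmatically through NCBI's E-utilities. The sketch below only builds an ESearch URL for a query such as “rbp4”; it assumes the ESearch endpoint and its `db`, `term`, and `retmax` parameters, and the helper name `esearch_url` is hypothetical. Sending the request needs network access, so that step is left commented out.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an E-utilities ESearch URL for an Entrez database query."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces as "+" and other special characters as %XX.
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = esearch_url("gene", "rbp4")
print(url)
print(esearch_url("protein", "hiv-1 pol"))

# To run the query (requires network access):
# from urllib.request import urlopen
# print(urlopen(url).read().decode())
```

The response is an XML list of record IDs, which would then be passed to a fetch step to retrieve the actual records.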
By applying limits, there are now just two entries
FASTA format
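In FASTA format, each record starts with a single header line beginning with `>` (an identifier plus an optional description), followed by one or more lines of sequence. A minimal parser sketch follows; the header in the example is made up for illustration, and the sequence is the first 50 residues of human RBP as they appear in the alignments later in this lecture.

```python
from __future__ import annotations

from collections.abc import Iterator


def parse_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequence may be wrapped across many lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical header; sequence lines are wrapped as FASTA files usually are.
example = """>rbp_fragment first 50 residues of human RBP
MKWVWALLLLAAWAAAERDCRVSSFRVKEN
FDKARFSGTWYAMAKKDPEG
"""
records = list(parse_fasta(example))
print(records[0][0], len(records[0][1]))
```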
4. EBI and Ensembl
Click “human”.
Enter “RBP4”.
5. ExPASy Sequence Retrieval System
Begin at the main page of NCBI, and type an Entrez query: hiv-1 pol
Searching for HIV-1 pol: following the “genome” link yields a manageable three results.
PubMed at NCBI to find literature information
PubMed is the NCBI gateway to MEDLINE.
MEDLINE contains bibliographic citations and author abstracts from over 4,600 journals published in the United States and in 70 foreign countries.
It has 12 million records dating back to 1966.
MeSH is the acronym for "Medical Subject Headings."
MeSH is the list of the vocabulary terms used for subject analysis of biomedical literature at NLM. MeSH vocabulary is used for indexing journal articles for MEDLINE.
The MeSH controlled vocabulary imposes uniformity and consistency on the indexing of biomedical literature.
lipocalin AND disease (60 results)
lipocalin OR disease (1,650,000 results)
lipocalin NOT disease (530 results)
[Figure: Venn diagrams of terms 1 and 2 illustrating 1 AND 2, 1 OR 2, and 1 NOT 2]
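The Boolean operators behave like set operations on the returned records: AND is the intersection, OR the union, and NOT the difference. A small sketch with hypothetical PubMed ID sets (the numbers are invented; real result sets are far larger):

```python
# Hypothetical PubMed ID sets standing in for two search terms.
lipocalin = {101, 102, 103, 104}
disease = {103, 104, 105, 106, 107}

and_hits = lipocalin & disease   # lipocalin AND disease: in both sets
or_hits = lipocalin | disease    # lipocalin OR disease: in either set
not_hits = lipocalin - disease   # lipocalin NOT disease: in the first only

print(sorted(and_hits))  # [103, 104]
print(sorted(or_hits))   # [101, 102, 103, 104, 105, 106, 107]
print(sorted(not_hits))  # [101, 102]

# Inclusion-exclusion: |A OR B| = |A| + |B| - |A AND B|
assert len(or_hits) == len(lipocalin) + len(disease) - len(and_hits)
```

The same identities apply to the counts above: if both searches were run on the same date, lipocalin AND disease (60) plus lipocalin NOT disease (530) should account for all 590 lipocalin records.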
Pairwise sequence alignment
β-corticotropin (sheep)   ala gly glu asp asp glu
Corticotropin A (pig)     asp gly ala glu asp glu
Oxytocin      CYIQNCPLG
Vasopressin   CYFQNCPRG
Pairwise sequence alignment is the most fundamental operation of bioinformatics:
• It is used to decide if two proteins (or genes) are related structurally or functionally
• It is used to identify domains or motifs that are shared between proteins
• It is the basis of BLAST searching
• It is used in the analysis of genomes
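The simplest measure on an alignment like oxytocin versus vasopressin is percent identity: the fraction of aligned positions holding the same residue. A minimal sketch, assuming the two sequences are already aligned to equal length with no gaps (conventions for the denominator differ once gaps are involved); `percent_identity` is a hypothetical helper name.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions with identical residues (ungapped, equal length)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Oxytocin vs vasopressin: 7 of 9 positions are identical.
print(round(percent_identity("CYIQNCPLG", "CYFQNCPRG"), 1))  # 77.8
```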
retinol-binding protein (NP_006735)
β-lactoglobulin (P02754)
RBP and β-lactoglobulin are homologous proteins that share related three-dimensional structures.
1 MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG 50 RBP
. ||| | . |. . . | : .||||.:| :
1 ...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD. 44 lactoglobulin

51 LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE 97 RBP
: | | | | :: | .| . || |: || |.
45 ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK 93 lactoglobulin

98 DPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAV...........QYSC 136 RBP
|| ||. | :.|||| | . .|
94 IPAVFKIDALNENKVL........VLDTDYKKYLLFCMENSAEPEQSLAC 135 lactoglobulin

137 RLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQ.EELCLARQYRLIV 185 RBP
. | | | : || . | || |
136 QCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI....... 178 lactoglobulin
Pairwise alignment of retinol-binding protein and β-lactoglobulin
Alignment symbols: a bar (|) marks identity, two dots (:) mark very similar residues, and one dot (.) marks somewhat similar residues.
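Gaps appear as dots within a sequence. An internal gap falls inside the aligned region (e.g., the run of dots in the RBP sequence just before QYSC); a terminal gap falls at either end (e.g., the dots at the start and end of the β-lactoglobulin sequence).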
Definitions
Similarity: the extent to which nucleotide or protein sequences are related. It is based upon identity plus conservation.
Identity: the extent to which two sequences are invariant.
Conservation: changes at a specific position of an amino acid or (less commonly, DNA) sequence that preserve the physico-chemical properties of the original residue.
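The difference between identity and similarity can be sketched by counting conservative substitutions as well as identities. The residue groups below are a hypothetical simplification chosen for illustration; real tools derive similarity from a substitution matrix such as BLOSUM62. The example pairs the two corticotropin fragments from the pairwise-alignment slide in one-letter code (AGEDDE vs DGAEDE).

```python
# Hypothetical groups of residues with similar physico-chemical properties.
CONSERVATIVE_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def similar(a: str, b: str) -> bool:
    """True if the residues are identical or fall in the same group."""
    return a == b or any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(seq_a: str, seq_b: str) -> tuple:
    """Fractions of identical and similar (identity plus conservation) positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(zip(seq_a, seq_b))
    identical = sum(a == b for a, b in pairs)
    similar_count = sum(similar(a, b) for a, b in pairs)
    return identical / len(pairs), similar_count / len(pairs)


# beta-corticotropin (sheep) vs corticotropin A (pig)
ident, sim = identity_and_similarity("AGEDDE", "DGAEDE")
print(f"identity {ident:.0%}, similarity {sim:.0%}")  # identity 50%, similarity 67%
```

The asp/glu pair at position 4 is not identical but is conserved (both acidic), so similarity exceeds identity.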
Gaps
• Positions at which a letter is paired with a null are called gaps.
• Gap scores are typically negative.
• Since a single mutational event may cause the insertion or deletion of more than one residue, the presence of a gap is ascribed more significance than the length of the gap.
• In BLAST, it is rarely necessary to change gap values from the default.
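One common way to weight the presence of a gap more heavily than its length is an affine gap penalty: a fixed cost for opening a gap plus a smaller cost per gapped position. The sketch below uses that scheme with hypothetical values (open 11, extend 1) chosen only for illustration, and `-` as the gap character (the alignments in this lecture print gaps as dots).

```python
import re

GAP_OPEN = 11   # hypothetical cost for the existence of a gap
GAP_EXTEND = 1  # hypothetical cost per gapped position


def gap_cost(length: int) -> int:
    """Affine gap cost: fixed opening cost plus a per-position cost."""
    return GAP_OPEN + GAP_EXTEND * length


def total_gap_penalty(aligned_seq: str, gap_char: str = "-") -> int:
    """Sum the (negative) scores of every run of consecutive gap characters."""
    runs = re.findall(re.escape(gap_char) + "+", aligned_seq)
    return -sum(gap_cost(len(run)) for run in runs)


# Same number of gapped positions, very different penalties:
print(total_gap_penalty("MKW---VWA"))  # one gap of length 3: -14
print(total_gap_penalty("MK-W-VW-A"))  # three gaps of length 1: -36
```

Because the opening cost dominates, one long indel (a single mutational event) is penalized far less than several scattered short ones.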
1 .MKWVWALLLLA.AWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDP 48
:: || || || .||.||. .| :|||:.|:.| |||.||||| 
1 MLRICVALCALATCWA...QDCQVSNIQVMQNFDRSRYTGRWYAVAKKDP 47

49 EGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTED 98
|||| ||:||:|||||.|.|.||| ||| :||||:.||.| ||| || |
48 VGLFLLDNVVAQFSVDESGKMTATAHGRVIILNNWEMCANMFGTFEDTPD 97

99 PAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADS 148
||||||:||| ||:|| ||||||::||||| ||: |||| ..||||| |
98 PAKFKMRYWGAASYLQTGNDDHWVIDTDYDNYAIHYSCREVDLDGTCLDG 147

149 YSFVFSRDPNGLPPEAQKIVRQRQEELCLARQYRLIVHNGYCDGRSERNLL 199
|||:||| | || || |||| :..|:| .|| : | |:|:
148 YSFIFSRHPTGLRPEDQKIVTDKKKEICFLGKYRRVGHTGFCESS...... 192
Pairwise alignment of retinol-binding protein from human (top) and rainbow trout (O. mykiss)
Pairwise alignment: the process of lining up two sequences to achieve maximal levels of identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.
Homology: similarity attributed to descent from a common ancestor.
RBP 26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84
+K++ +++ GTW++MA + L + A V T + +L+ W+
glycodelin 23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81
Orthologs: homologous sequences in different species that arose from a common ancestral gene during speciation; they may or may not be responsible for a similar function.
Paralogs: homologous sequences within a single species that arose by gene duplication.
Orthologs: members of a gene (protein) family in various organisms.
This tree shows 13 RBP orthologs.
[Figure: tree of RBP orthologs from common carp, zebrafish, rainbow trout, teleost, African clawed frog, chicken, mouse, rat, rabbit, cow, pig, horse, and human; scale bar: 10 changes]
Paralogs: members of a gene (protein) family within a species.
This tree shows 9 human lipocalins.
[Figure: tree of human lipocalins: apolipoprotein D, retinol-binding protein 4, complement component 8, prostaglandin D2 synthase, neutrophil gelatinase-associated lipocalin, lipocalin 1, odorant-binding protein 2A, progestagen-associated endometrial protein, and alpha-1-microglobulin/bikunin; scale bar: 10 changes]
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html
Pairwise sequence alignment allows us to look back billions of years (BYA: billion years ago).
[Figure: timeline from 4 to 0 BYA marking the origin of life, earliest fossils, eukaryote/archaea divergence, origin of eukaryotes, plant/animal and fungi/animal divergences, and insects]
fly       GAKKVIISAP SAD.APM..F VCGVNLDAYK PDMKVVSNAS CTTNCLAPLA
human     GAKRVIISAP SAD.APM..F VMGVNHEKYD NSLKIISNAS CTTNCLAPLA
plant     GAKKVIISAP SAD.APM..F VVGVNEHTYQ PNMDIVSNAS CTTNCLAPLA
bacterium GAKKVVMTGP SKDNTPM..F VKGANFDKY. AGQDIVSNAS CTTNCLAPLA
yeast     GAKKVVITAP SS.TAPM..F VMGVNEEKYT SDLKIVSNAS CTTNCLAPLA
archaeon  GADKVLISAP PKGDEPVKQL VYGVNHDEYD GE.DVVSNAS CTTNSITPVA

fly       KVINDNFEIV EGLMTTVHAT TATQKTVDGP SGKLWRDGRG AAQNIIPAST
human     KVIHDNFGIV EGLMTTVHAI TATQKTVDGP SGKLWRDGRG ALQNIIPAST
plant     KVVHEEFGIL EGLMTTVHAT TATQKTVDGP SMKDWRGGRG ASQNIIPSST
bacterium KVINDNFGII EGLMTTVHAT TATQKTVDGP SHKDWRGGRG ASQNIIPSST
yeast     KVINDAFGIE EGLMTTVHSL TATQKTVDGP SHKDWRGGRT ASGNIIPSST
archaeon  KVLDEEFGIN AGQLTTVHAY TGSQNLMDGP NGKP.RRRRA AAENIIPTST

fly       GAAKAVGKVI PALNGKLTGM AFRVPTPNVS VVDLTVRLGK GASYDEIKAK
human     GAAKAVGKVI PELNGKLTGM AFRVPTANVS VVDLTCRLEK PAKYDDIKKV
plant     GAAKAVGKVL PELNGKLTGM AFRVPTSNVS VVDLTCRLEK GASYEDVKAA
bacterium GAAKAVGKVL PELNGKLTGM AFRVPTPNVS VVDLTVRLEK AATYEQIKAA
yeast     GAAKAVGKVL PELQGKLTGM AFRVPTVDVS VVDLTVKLNK ETTYDEIKKV
archaeon  GAAQAATEVL PELEGKLDGM AIRVPVPNGS ITEFVVDLDD DVTESDVNAA

Multiple sequence alignment of glyceraldehyde 3-phosphate dehydrogenases
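A first step in reading a multiple sequence alignment is finding columns where every sequence carries the same residue. The sketch below does this for the final 10 columns of the first GAPDH block, and prints a marker line with `*` under fully conserved columns, in the spirit of the asterisk line Clustal prints. The function names are hypothetical, and gap characters are never counted as conserved.

```python
from __future__ import annotations


def conserved_columns(rows: list[str]) -> list[int]:
    """Indices of alignment columns where every sequence has the same residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return [
        i for i, column in enumerate(zip(*rows))
        if len(set(column)) == 1 and column[0] not in ".-"
    ]


def consensus_line(rows: list[str]) -> str:
    """'*' under fully conserved columns, a space elsewhere."""
    conserved = set(conserved_columns(rows))
    return "".join("*" if i in conserved else " " for i in range(len(rows[0])))


# Final 10 columns of the first block of the GAPDH alignment.
block = {
    "fly": "CTTNCLAPLA", "human": "CTTNCLAPLA", "plant": "CTTNCLAPLA",
    "bacterium": "CTTNCLAPLA", "yeast": "CTTNCLAPLA", "archaeon": "CTTNSITPVA",
}
rows = list(block.values())
print(conserved_columns(rows))  # [0, 1, 2, 3, 7, 9]
for name, seq in block.items():
    print(f"{name:<10}{seq}")
print(" " * 10 + consensus_line(rows))
```

Only the archaeal sequence breaks conservation in this stretch, which fits its position as the most distant lineage in the alignment.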
~~~~~EIQDVSGTWYAMTVDREFPEMNLESVTPMTLTTL.GGNLEAKVTM lipocalin 1
LSFTLEEEDITGTWYAMVVDKDFPEDRRRKVSPVKVTALGGGNLEATFTF odorant-binding protein 2a
TKQDLELPKLAGTWHSMAMATNNISLMATLKAPLRVHITSEDNLEIVLHR progestagen-associated endometrial protein
VQENFDVNKYLGRWYEIEKIPTTFENGRCIQANYSLMENGNQELRADGTV apolipoprotein D
VKENFDKARFSGTWYAMAKDPEGLFLQDNIVAEFSVDETGNWDVCADGTF retinol-binding protein
LQQNFQDNQFQGKWYVVGLAGNAI.LREDKDPQKMYATIDKSYNVTSVLF neutrophil gelatinase-associated lipocalin
VQPNFQQDKFLGRWFSAGLASNSSWLREKKAALSMCKSVDGGLNLTSTFL prostaglandin D2 synthase
VQENFNISRIYGKWYNLAIGSTCPWMDRMTVSTLVLGEGEAEISMTSTRW alpha-1-microglobulin
PKANFDAQQFAGTWLLVAVGSACRFLQRAEATTLHVAPQGSTFRKLD... complement component 8
Multiple sequence alignment of human lipocalin paralogs
General approach to pairwise alignment
• Choose two sequences
• Select an algorithm that generates a score
• Allow gaps (insertions, deletions)
• Score reflects degree of similarity
• Alignments can be global or local
• Estimate probability that the alignment occurred by chance
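The steps above can be sketched with the classic dynamic-programming approach to global alignment (Needleman-Wunsch). This is a minimal sketch, not the method behind any particular NCBI tool: it assumes simple hypothetical scores (match +1, mismatch -1, linear gap -2) instead of a substitution matrix and affine gaps. A local (Smith-Waterman) variant would additionally floor each cell at 0 and trace back from the highest-scoring cell.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), using "-" for gap positions.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair residues
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_align("CYIQNCPLG", "CYFQNCPRG")
print(best)  # 5
print(top)
print(bottom)
```

For oxytocin and vasopressin the best global alignment is ungapped (7 matches, 2 mismatches); shorter inputs such as "ACGT" and "AGT" show a gap being placed ("ACGT" over "A-GT").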
Calculation of an alignment score
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Alignment_Scores2.html
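In its simplest form, an alignment score is the sum of substitution-matrix scores for each aligned pair (minus any gap penalties). The sketch below scores the ungapped oxytocin/vasopressin alignment using only the handful of BLOSUM62 entries this pair needs; the excerpt and helper names are for illustration, and a full matrix should be used for any other sequences.

```python
# Excerpt of BLOSUM62: only the residue pairs needed for this example.
BLOSUM62_EXCERPT = {
    ("C", "C"): 9, ("Y", "Y"): 7, ("I", "F"): 0, ("Q", "Q"): 5,
    ("N", "N"): 6, ("P", "P"): 7, ("L", "R"): -2, ("G", "G"): 6,
}


def pair_score(a: str, b: str, matrix: dict):
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    return matrix.get((a, b), matrix.get((b, a)))


def alignment_score(seq_a: str, seq_b: str, matrix: dict) -> int:
    """Sum substitution scores over an ungapped alignment."""
    total = 0
    for a, b in zip(seq_a, seq_b):
        s = pair_score(a, b, matrix)
        if s is None:
            raise KeyError(f"no score for pair {a}/{b} in this excerpt")
        total += s
    return total


# Oxytocin vs vasopressin: 9+7+0+5+6+9+7-2+6
print(alignment_score("CYIQNCPLG", "CYFQNCPRG", BLOSUM62_EXCERPT))  # 47
```

Unlike percent identity, this score credits rare identities (such as C/C) more than common ones and lets a mismatch like I/F score neutrally rather than as a loss.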