Dr. R. Sankar, BSE 633 (2020)

BSE 633A

Bioinformatics & Computational Biology

R. Sankararamakrishnan

References

Bioinformatics: Sequence and Genome Analysis. David W. Mount, Cold Spring Harbor Laboratory Press (2001)

Bioinformatics and Functional Genomics. Jonathan Pevsner, Wiley-Blackwell

Developing Bioinformatics Computer Skills. C. Gibas and P. Jambeck, O'Reilly (2001)

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. R. Durbin, S. Eddy, A. Krogh and G. Mitchison, Cambridge University Press (1998)

Journals: Bioinformatics, BMC Bioinformatics, Nucleic Acids Research, ISMB (conference proceedings), J. Comp. Biol., PLoS Computational Biology

Instructors

Up to the MidSem Exam: Dr. R. Sankar

After the MidSem Exam: Dr. Hamim Zafar

Course evaluation

Quiz I (February, first week): 5%

Midsem: 30%

Quiz II (April, first week): 5%

Assignment/Exercise: 10%

Presentation: 5%

End-semester exam: 40%

Attendance: 5%

BSE633: Course Contents

Introduction to bioinformatics, biological databases and their growth

Concept of homology and definition of associated terms, pairwise sequence alignment, dot-matrix plot, dynamic programming algorithm, global (Needleman-Wunsch) and local (Smith-Waterman) alignments, BLAST

Scoring matrices (PAM and BLOSUM families), gap penalty, statistical significance of alignment

Multiple sequence alignment, sum-of-pairs method, CLUSTAL W, genetic algorithm

Pattern finding in protein and DNA sequences, Gibbs sampler, hidden Markov model, profile construction and searching, PSI-BLAST

Introduction to phylogeny, maximum parsimony method, distance method (neighbor-joining), maximum-likelihood method

Gene prediction in prokaryotes and eukaryotes, homology and ab initio methods

Genome analysis and annotation, comparative genomics

PowerPoint presentations for each class and other course materials will be available at:

http://home.iitk.ac.in/~rsankar/courses/

What is Bioinformatics? - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

What is Computational Biology? - The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

- NIH Definition http://www.bisti.nih.gov/

Nature (Oct. 2017)

The first protein was sequenced in 1953

Number of protein sequences today (11/Dec/2019)

Source: UniProt Database, www.uniprot.org

Swiss-Prot: 561,568 sequences

TrEMBL: 179,250,561 sequences

Myoglobin and Hemoglobin: First protein structures to be determined

Yearly growth of structures in PDB

http://www.pdb.org

159140 structures in PDB Date: 6/Jan/2020

Genome Sequencing: Important milestones

1976: Bacteriophage MS2 (RNA virus); 3,569 bp
1977: PhiX174 (DNA virus); 5,386 bp
1995: Haemophilus influenzae (bacterium); 1.8 million bp
      Methanococcus jannaschii (archaeon); 1.7 million bp
1996: Baker's yeast; 12.1 million bp
1998: Caenorhabditis elegans; 100 million bp
2000: Arabidopsis thaliana; 119 million bp
      Drosophila melanogaster; 165 million bp
2001: Homo sapiens; 3.2 billion bp
2002: Mouse; 3.48 billion bp
2003: Mosquito; 278 million bp
      Japanese pufferfish; 390 million bp
      Rice; 374 million bp
2004: Chicken; 1 billion bp
2005: Chimpanzee; 3.3 billion bp
2010: Western clawed frog; 1.5 billion bp
2013: Zebrafish; 1.5 billion bp

http://www.yourgenome.org/facts/timeline-organisms-that-have-had-their-genomes-sequenced

Number of genome sequences

http://gregoryzynda.com/ncbi/genome/python/2014/03/31/ncbi-genome.html

https://ark-invest.com/research/genome-sequencing

The genome sequencing market is in its infancy, poised to grow at rates difficult to comprehend. Sequencing is introducing deeper scientific knowledge into medical decision making, eliminating wasteful guess work, and moving us closer to a truly personalized healthcare system.

http://www.internationalgenome.org/

598 sequences from India

https://www.nlm.nih.gov/about/2020CJ.html

https://digitalworldbiology.com/blog/bio-databases-2018-how-do-they-taste

>gi|388480089|ref|YP_492284.1| transporter [Escherichia coli str. K-12 substr. W3110]
MSGLKQELGLAQGIGLLSTSLLGTGVFAVPALAALVAGNNSLWAWPVLIILVFPIAIVFAILGRHYPSAG
GVAHFVGMAFGSRLERVTGWLFLSVIPVGLPAALQIAAGFGQAMFGWHSWQLLLAELGTLALVWYIGTRG
ASSSANLQTVIAGLIVALIVAIWWAGDIKPANIPFPAPGNIELTGLFAALSVMFWCFVGLEAFAHLASEF
KNPERDFPRALMIGLLLAGLVYWGCTVVVLHFDAYGEKMAAAASLPKIVVQLFGVGALWIACVIGYLACF
ASLNIYIQSFARLVWSQAQHNPDHYLARLSSRHIPNNALNAVLGCCVVSTLVIHALEINLDALIIYANGI
FIMIYLLCMLAGCKLLQGRYRLLAVVGGLLCVLLLAMVGWKSLYALIMLAGLWLLLPKRKTPENGITT

A sample record in FASTA format
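
A FASTA record consists of a header line beginning with ">" followed by one or more lines of sequence. As a minimal sketch of how such text could be read in Python (the function name parse_fasta and the toy records are illustrative assumptions, not part of the course material):

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are concatenated into one string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input (hypothetical records, not real database entries).
example = """>seq1 toy protein
MSGLKQ
ELGLAQ
>seq2 another toy protein
GIGLLS
"""

for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```

For real work an established parser (for example the one in Biopython) would normally be preferred; the sketch only illustrates the structure of the format.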

Biological Data

Genomic sequences

Single Nucleotide Polymorphisms (SNPs)

Protein amino acid sequences

Protein 3D structures

Gene Expression

Protein function

Biomolecular interactions and networks

Literature

Emergence of ‘Omes’ – The new ‘era’ in Biology

Transcriptome: the mRNA complement of an entire organism, tissue type, or cell

Metabolome: the totality of metabolites in an organism

Lipidome: the totality of lipids

Glycome: the totality of glycans, carbohydrate structures of an organism, a cell or tissue type

Interactome: the totality of the molecular interactions in an organism

Spliceome: the totality of alternatively spliced protein isoforms

Kinome: the totality of protein kinases in a cell

Foldome: the totality of biological structures as skeletons

Dynome: Adding a 4th Dimension to the Protein Database by Terascale Simulation

Reactome: A knowledge base of biological processes

What is Bioinformatics?

A Proposed Definition and Overview of the Field N.M. Luscombe, D. Greenbaum and M. Gerstein

http://bioinfo.mbb.yale.edu/

What is Bioinformatics?

Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and applying “informatics techniques” (derived from applied maths, computer science and statistics) to understand and organize the information associated with these molecules on a large scale. In short, bioinformatics is a management information system for molecular biology and has many practical applications.

Mark Gerstein, Yale University

Bioinformatics is an interdisciplinary field combining mathematical, statistical, and computer methods to analyze medical, biological, biochemical, and biophysical data.

Crystal Structure of ATP-gated P2X4 receptors

Nature July (2009)

What are we going to learn in this course?

How to compare two sequences?

How to compare many sequences?

How to evaluate an alignment?

What are the limitations?

Phylogenetic analysis

Prediction of genes from a given genomic sequence

Comparative genomics

Species & Speciation

Species: a group of populations whose members

have a similar appearance

successfully interbreed

are reproductively isolated from other species

exchange genes with one another (gene flow occurs within the species) and are genetically distinct and isolated from other species

Speciation: the formation of two groups of organisms that are reproductively isolated from each other and thus have no gene flow between them.

When there is no gene flow, the two groups accumulate more and more differences over time.

Gene Duplication

•A redundant duplicate of a gene may acquire divergent mutations and eventually emerge as a new gene.

•Gene duplication is one of the means by which a new gene can arise.

•It is one of only a few ways to increase the amount of genetic material.

•One of the means to create new function

Why should we do sequence alignments?

Useful for discovering functional, structural and evolutionary information in biological sequences

Sequences that are very much alike probably have the same function and 3D structure in the case of proteins

If two sequences from different organisms are similar, they may have descended from a common ancestor sequence

Sequences that share a common ancestor are defined as homologous

Hemoglobin

Orthologous and paralogous genes

                  _________ Rat_gene_1
          Rat    |
         ________X
        |        |_________ Rat_gene_2
        |
  ---( )
        |     _____________ Mouse_gene_1
        |    |
        |____X
       Mouse |_____________ Mouse_gene_2

( ) marks the speciation event separating rat and mouse; X marks a gene duplication within each lineage.

Two genes are said to be orthologous if they diverged after a speciation event.

Two genes are said to be paralogous if they diverged after a duplication event.

http://www.icp.ucl.ac.be/~opperd/private/orthol.html
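
The two definitions can be applied mechanically: find the node at which two genes last shared an ancestor and look at the kind of event it represents. The sketch below encodes the rat/mouse tree as nested tuples; the tuple layout, the event labels and the function names are assumptions made for illustration only.

```python
# Gene tree as nested tuples: (event, left_subtree, right_subtree).
# A leaf is simply the gene name. This encoding is a hypothetical choice.
GENE_TREE = (
    "speciation",
    ("duplication", "Rat_gene_1", "Rat_gene_2"),
    ("duplication", "Mouse_gene_1", "Mouse_gene_2"),
)


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def relationship(tree, gene_a, gene_b):
    """Classify two genes by the event at their last common ancestor."""
    if gene_a == gene_b or not {gene_a, gene_b} <= leaves(tree):
        raise ValueError("need two different genes from the tree")
    node = tree
    while True:
        event, left, right = node
        for child in (left, right):
            # Descend while one child still contains both genes.
            if {gene_a, gene_b} <= leaves(child):
                node = child
                break
        else:
            # The genes split at this node: it is their last common ancestor.
            return "orthologous" if event == "speciation" else "paralogous"


print(relationship(GENE_TREE, "Rat_gene_1", "Rat_gene_2"))    # paralogous
print(relationship(GENE_TREE, "Rat_gene_1", "Mouse_gene_2"))  # orthologous
```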

Types of Homology

Chymotrypsin Subtilisin

Analogous genes

Similar regions in sequences may not have a common ancestor but may have arisen independently by evolutionary pathways converging on the same function

This is called convergent evolution

Such gene/protein sequences are referred to as analogous

Xenologues

Certain infectious agents, such as retroviruses, or species hybridization can introduce foreign DNA into the genome of an organism.

Once introduced, these sequences become part of the genome passed between generations, but the sequence has its origins elsewhere.

Such sequences are called xenologues or xenologous sequences.

Sequence Alignment - Example

Sequence Alignment - Definition

•A procedure for comparing two or more sequences

•Search for a series of individual characters or character patterns that are in the same order in the sequences

•Two sequences are aligned by writing them across a page in two rows

•Identical/similar characters are placed in the same column

•Nonidentical characters are placed either in the same column as a mismatch or opposite a gap in the other sequence

•Optimal alignment: mismatches and gaps are placed so as to bring as many identical and similar characters as possible into the same columns
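
To make the column-by-column view concrete, the sketch below takes two already-aligned rows (gaps written as "-") and counts matches, mismatches and gaps. The scoring values (+1, -1, -2), the function name and the toy sequences are arbitrary assumptions for illustration; the scoring schemes used in practice (PAM, BLOSUM, gap penalties) are covered later in the course.

```python
def score_alignment(row1, row2, match=1, mismatch=-1, gap=-2):
    """Count column types in a pairwise alignment and return (counts, score)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            counts["gap"] += 1       # character opposite a gap
        elif x == y:
            counts["match"] += 1     # identical characters in one column
        else:
            counts["mismatch"] += 1  # nonidentical characters in one column
    score = (counts["match"] * match
             + counts["mismatch"] * mismatch
             + counts["gap"] * gap)
    return counts, score


# Toy alignment written as two rows.
counts, score = score_alignment("GATT-ACA",
                                "GACTTACA")
print(counts, score)  # 6 matches, 1 mismatch, 1 gap -> score 3
```

An optimal alignment is then the arrangement of gaps and mismatches that maximizes this score under the chosen scheme.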

PRA isomerase: IGP synthase

>1PII:_|PDBID|CHAIN|SEQUENCE
MQTVLAKIVADKAIWVEARKQQQPLASFQNEVQPSTRHFYDALQGARTAFILECKKASPSKGVIRDDFDPARIAAIYKHYASAISVLTDEKYFQGSFNFLPIVSQIAPQPILCKDFIIDPYQIYLARYYQADACLLMLSVLDDDQYRQLAAVAHSLEMGVLTEVSNEEEQERAIALGAKVVGINNRDLRDLSIDLNRTRELAPKLGHNVTVISESGINTYAQVRELSHFANGFLIGSALMAHDDLHAAVRRVLLGENKVCGLTRGQDAKAAYDAGAIYGGLIFVATSPRCVNVEQAQEVMAAAPLQYVGVFRNHDIADVVDKAKVLSLAAVQLHGNEEQLYIDTLREALPAHVAIWKALSVGETLPAREFQHVDKYVLDNGQGGSGQRFDWSLLNGQSLGNVLLAGGLGADNCVEAAQTGCAGLDFNSAVESQPGIKDARLLASVFQTLRAY
>PRAI sequence
GENKVCGLTRGQDAKAAYDAGAIYGGLIFVATSPRCVNVEQAQEVMAAAPLQYVGVFRNHDIADVVDKAKVLSLAAVQLHGNEEQLYIDTLREALPAHVAIWKALSVGETLPAREFQHVDKYVLDNGQGGSGQRFDWSLLNGQSLGNVLLAGGLGADNCVEAAQTGCAGLDFNSAVESQPGIKDARLLASVFQTLRAY

Global and Local alignment

Global alignment: Align the entire sequence

Sequences that are quite similar and approximately the same length are suitable candidates

Local alignment: Stretches of sequence with the highest density of matches are aligned

One or more islands of matches or subalignments are generated

More suitable for sequences

•that are similar along some of their lengths and dissimilar in others

•that differ in length

•that share a conserved region or domain
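
The difference between the two modes shows up directly in the dynamic programming recurrences of the Needleman-Wunsch (global) and Smith-Waterman (local) algorithms named in the course contents. The sketch below computes only the best score, not the alignment itself, and uses a simple linear gap penalty; the scoring values, function name and toy sequences are assumptions chosen for illustration.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global mode: leading gaps are penalized along the first row/column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                # Local mode: a negative running score is reset to zero,
                # so an alignment may start anywhere.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Global: score of aligning both sequences end to end.
    # Local: best-scoring island anywhere in the matrix.
    return best if local else score[-1][-1]


# Two toy sequences sharing only a central region.
s1 = "AAAGATTACATTT"
s2 = "CCCGATTACAGGG"
print(alignment_score(s1, s2))              # global score: 1
print(alignment_score(s1, s2, local=True))  # local score: 7
```

With these toy sequences the local score reflects only the shared GATTACA region, while the global score is pulled down by the dissimilar flanks.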

Exercise 1

Get the UniProt (www.uniprot.org) accession ID for the protein whose PDB ID is 1BL8. Go to the corresponding UniProt entry. Find out which databases are cross-linked. What are the related databases from which you can extract information about this protein?