ONLINE BIOINFORMATICS RESOURCES Introduction to Molecular Biology and Bioinformatics (IMBB) 2014 Dedan Githae Email: [email protected] BecA-ILRI Hub; Nairobi, Kenya 16 May, 2014




The larger picture

• Lower cost, improved efficiency and modern technology: sequencers and computers have both become faster, so huge amounts of biological data are now produced at a faster rate than they can be interpreted.

• Storage and exchange of data: there is a need to improve database design, develop better software for database access and manipulation, and harmonize data-entry procedures to compensate for the varied computer procedures and systems used in different laboratories.

Computing power, access to information (data) and the right software (tools) have enabled scientists to answer research questions and make new discoveries at a faster rate.


Online Bioinformatics resources

• INFORMATION
  - Journals / publications
  - Biological research data

• TOOLS – Programs to analyse data


Information

Journal websites: Almost every major journal has a website. Access to abstracts is usually free, even when the full content requires a subscription.

E-journals: Some electronic journals are online-only; some are online versions of printed journals; and some are the online equivalent of a printed journal with additional online-only material (sometimes video and interactive media).


Information

Literature search servers (e.g. NCBI PubMed, Google Scholar, Scopus): search engines for finding references and abstracts on life sciences and biomedical topics across multiple databases.
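A literature search can also be scripted. The sketch below only builds a query URL for NCBI's E-utilities ESearch service against the PubMed database; it does not send the request. The function name, the example search term and the default result count are illustrative assumptions, not part of the lecture.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, max_results=20):
    """Build (but do not send) an ESearch query URL for PubMed.

    `term` is a free-text query as you would type it into the PubMed
    search box; `max_results` caps the number of record IDs returned.
    """
    params = {
        "db": "pubmed",        # database to search
        "term": term,          # the query itself
        "retmax": max_results, # maximum number of IDs to return
        "retmode": "json",     # ask for a JSON response
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical search term, chosen only for illustration.
    print(pubmed_search_url("malaria vaccine", max_results=5))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a list of PubMed IDs matching the term.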


Sequence Databases

Primary databases: the International Nucleotide Sequence Database Collaboration (INSDC) comprises

• GenBank (USA),

• European Nucleotide Archive (Europe) and

• DNA Data Bank of Japan (DDBJ).

They exchange data so that all public sequences are available from each of them.

http://www.ncbi.nlm.nih.gov

http://www.ebi.ac.uk/ena/


Sequence Databases

Secondary Databases

UniProt: a database of protein sequences and functional information, including information about the biological function of proteins derived from the research literature.

Available under UniProt:

• Swiss-Prot: manually annotated and reviewed

• TrEMBL: automatically annotated and not reviewed

UniProt: www.uniprot.org
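The reviewed / unreviewed split is visible in UniProt's FASTA headers, which start with `sp|` for Swiss-Prot entries and `tr|` for TrEMBL entries. The sketch below reads that prefix; the function name and returned field names are assumptions made for illustration, and the second header in the usage example is a made-up placeholder rather than a real entry.

```python
# Map the database prefix of a UniProt FASTA header to its section name.
SECTIONS = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


def parse_uniprot_header(header):
    """Split a UniProt FASTA header of the form '>db|ACCESSION|ENTRY_NAME ...'."""
    line = header.lstrip(">").strip()
    fields = line.split("|", 2)
    if len(fields) != 3 or fields[0] not in SECTIONS:
        raise ValueError("not a UniProt FASTA header: " + header)
    db, accession, rest = fields
    entry_name = rest.split(" ", 1)[0]  # text up to the first space
    return {
        "accession": accession,
        "entry_name": entry_name,
        "section": SECTIONS[db],
        "reviewed": db == "sp",  # Swiss-Prot entries are manually reviewed
    }


if __name__ == "__main__":
    headers = [
        ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2",
        # Placeholder TrEMBL-style header, invented for this example.
        ">tr|X0X0X0|X0X0X0_EXAMPLE Uncharacterized protein",
    ]
    for h in headers:
        print(parse_uniprot_header(h))
```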


Other Databases

Multigenome: Ensembl, genome databases for vertebrates and other eukaryotic species.

Genome-specific databases: e.g. WormBase, Saccharomyces Genome Database, VectorBase, Flu DB, ZFIN (zebrafish model organism database), FlyBase, etc. NB: You can also access these genomes from the public databases.

Biochemical pathway databases: biological activities are orchestrated by various molecules. These databases include KEGG, ExPASy, MetaCyc and BioPath.

Gene expression data: DNA microarrays are arrays of probe molecules that can bind specific DNA / mRNA; fluorescent labelling makes the expression level of genes visible. E.g. NCBI GEO (Gene Expression Omnibus) and Expression Atlas (EBI).

2D PAGE: allows quantitative study of protein concentration in the cell. E.g. SWISS-2DPAGE.

www.yeastgenome.org/

www.fludb.org

www.zfin.org

www.wormbase.org/

http://vectorbase.org/

http://www.metacyc.org/

KEGG: http://www.genome.jp

http://web.expasy.org/pathways/

http://www.molecular-networks.com/databases/biopath

http://www.ncbi.nlm.nih.gov/geo/

http://www.ebi.ac.uk/gxa

http://world-2dpage.expasy.org/swiss-2dpage/


Protein Structure Databases

Protein Data Bank (PDB): experimentally determined structures

Secondary & others:

ModBase: a database of annotated comparative protein structure models (modelled proteins)

SCOP: Structural Classification of Proteins. Classes include α; β; α+β; membrane & cell surface proteins; small proteins; coiled coil proteins, etc.

CATH: hierarchical domain classification of protein structures in the Protein Data Bank (Class | Architecture | Topology | Homologous superfamily)

www.pdb.org

https://modbase.compbio.ucsf.edu/modweb/
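The four CATH levels show up directly in CATH identifiers, which are dot-separated numbers read from the most general level (Class) to the most specific (Homologous superfamily). The sketch below splits such an identifier into its levels. The function name and output keys are assumptions for illustration, and the identifier in the usage example is only a sample of the format.

```python
# The four levels of the CATH hierarchy, most general first.
CATH_LEVELS = ("class", "architecture", "topology", "homologous_superfamily")

# Names of the main CATH classes, keyed by the first number of the identifier.
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


def parse_cath_id(cath_id):
    """Return the identifier of each hierarchy level contained in a CATH ID."""
    numbers = [int(part) for part in cath_id.split(".")]
    if not 1 <= len(numbers) <= len(CATH_LEVELS):
        raise ValueError("expected 1 to 4 dot-separated numbers: " + cath_id)
    levels = {}
    for depth, name in enumerate(CATH_LEVELS[:len(numbers)], start=1):
        # Each level is identified by the prefix of the ID up to that depth.
        levels[name] = ".".join(str(n) for n in numbers[:depth])
    levels["class_name"] = CATH_CLASSES.get(numbers[0], "unknown")
    return levels


if __name__ == "__main__":
    for level, value in parse_cath_id("3.40.50.300").items():
        print(level, value)
```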


Software / Tools

Sources of (informative) tools:

• Journals, e.g. Bioinformatics, Nucleic Acids Research, Journal of Molecular Biology and Protein Science, publish papers on cutting-edge developments and innovations in computational biology methods.

• Most biological databases have software resource listings, e.g. sequence searching and visualisation resources (genome level / alignment level).

• Web servers: “simple” web implementations of the software, with clear inputs, outputs and parameters, graphical data representation and downloadable results.


Basic tasks

Task | Objective | Tool
Sequence similarity | Identify homologous sequences to gain information | BLAST; SSEARCH; FASTA; ENA search
Sequence alignment | Identify conserved regions, domains, motifs | Clustal; MUSCLE
Gene finding | Identify coding regions in gDNA sequences | GENSCAN; ORF Finder; GRAIL
DNA translation | Convert a DNA sequence to protein | Various servers and tools, e.g. ExPASy, Transeq (EBI) and ORF finders
Local pairwise alignment | Detect short regions of homology in longer sequences | BLAST; FASTA
Global pairwise alignment | Find the best full-length alignment between two sequences | ALIGN
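To make the idea of a global pairwise alignment concrete, here is a minimal sketch of Needleman-Wunsch dynamic programming that returns only the best full-length alignment score. It is a textbook method, not necessarily what any tool in the table uses internally, and the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best end-to-end alignment of sequences a and b.

    Uses a linear gap penalty and keeps only one row of the dynamic
    programming table at a time.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against empty `b`
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

A local alignment (the row above it in the table) differs mainly in that scores are not allowed to drop below zero and the best cell anywhere in the table is reported, which is why it finds short regions of similarity inside longer sequences.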

http://web.expasy.org/translate/
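What a translation server does can be sketched in a few lines: read the DNA three bases at a time and look each codon up in the standard genetic code. The function name and the choice to mark unrecognised codons with `X` are assumptions for this example; the sketch covers only the three forward reading frames, whereas web tools typically also report the reverse strand.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the first, second and third base.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string in the given forward frame (0, 1 or 2).

    Stop codons are shown as '*', codons with unrecognised bases as 'X',
    and an incomplete codon at the end is ignored.
    """
    seq = dna.upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCAAGTAA"
    for frame in range(3):
        print(frame, translate(dna, frame))
```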


http://www.ebi.ac.uk/Tools



Protein Domains

A protein domain is a conserved part of a given protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain.

Molecular evolution uses domains as building blocks, and these may be recombined in different arrangements to create proteins with different functions.

Databases include: SMART; PROSITE; NCBI; CATH

http://prosite.expasy.org/

http://smart.embl-heidelberg.de/

Figure: glycinamide ribonucleotide synthetase (GAR-syn) from E. coli.


Motifs

A motif is a locally conserved region or short sequence pattern shared by a set of sequences (found through multiple sequence analysis). It can therefore indicate functional or structural similarity. Motifs can be displayed as sequence logos or as patterns of amino acids.

Patterns of amino acids [PROSITE]:

For example, the N-glycosylation site motif takes the form:

N-{P}-[ST]-{P}

which means:

Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro
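A PROSITE pattern maps almost directly onto a regular expression, which makes it easy to scan a sequence for a motif. PROSITE separates pattern elements with hyphens, so the N-glycosylation pattern is written `N-{P}-[ST]-{P}` below. The sketch handles only the common syntax elements (single residues, `x`, `[...]`, `{...}`, repeat counts and the `<` / `>` anchors); the function names and the test sequence are invented for illustration.

```python
import re

# One PROSITE pattern element: optional '<', a residue spec, optional (n) or (n,m), optional '>'.
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Convert a hyphen-separated PROSITE pattern into a regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        start, core, low, high, end = m.groups()
        if core == "x":
            rx = "."                        # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            rx = core                       # single residue or [alternatives]
        if low:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + rx + ("$" if end else ""))
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (0-based start, matched text) for every hit, including overlapping ones."""
    # A lookahead with a capture group lets finditer report overlapping matches.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Made-up peptide used only to exercise the pattern.
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_motif("N-{P}-[ST]-{P}", "MKNGSAANPTQNKTL"))
```

In the made-up peptide, the Asn followed directly by Pro is skipped, which is exactly what the `{P}` element is for.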


DSSP: secondary structure assignment program by Wolfgang Kabsch and Chris Sander

ExPASy: secondary structure prediction

JPred 3: from the University of Dundee

PDB: DSSP secondary structure


Honorable mention

About PredictProtein

PredictProtein integrates feature prediction for secondary structure, solvent accessibility, transmembrane helices, globular regions, coiled-coil regions, structural switch regions, B-values, disorder regions, intra-residue contacts, protein-protein and protein-DNA binding sites, sub-cellular localization, domain boundaries, beta-barrels, cysteine bonds, metal binding sites and disulphide bridges.

Listed below is a comprehensive list of methods and databases currently incorporated into the server.


Integrated Methods

Method Name | Description | Reference | Web Server | Download
PROFphd | Prediction of secondary structure and solvent accessibility | - | - | ftp://rostlab.org/profphd
PHDhtm | Prediction of transmembrane helices | - | - | ftp://rostlab.org/profphd
PROFtmb | Prediction of transmembrane beta-barrels | - | - | ftp://rostlab.org/proftmb
NORSp | Predictor of non-regular secondary structure | - | - | ftp://rostlab.org/norsp
NCOILS | Calculates the probability that the sequence will adopt a coiled-coil conformation | - | - | -
SEG | Identifies low complexity regions | - | - | -
DISULFIND | Prediction of disulfide bridges | - | - | ftp://rostlab.org/profbval
PROFBval | Prediction of residue mobility | - | - | ftp://rostlab.org/profbval
NorsNet | Prediction of protein disordered sites | - | - | ftp://rostlab.org/norsnet
UCON | Contact based prediction of disordered sites | - | - |
METADISORDER | Consensus based prediction of protein disorder | - | - |
ISIS | Prediction of protein-protein interaction sites | - | - |
LocTree2 | Prediction of sub-cellular localization for all domains of life | - | - | -
MetaStudent | Prediction of GO terms for Molecular Function and Biological Process | - | - | ftp://rostlab.org/metastudent
SNAP2 | Prediction of functional changes due to single nucleotide polymorphism | - | - | -
ConSurf | Identification of functionally important residues from evolutionary conservation | - | - | -

Supporting Databases

Database Name | Description | Reference | Download
UniRef | Used for sequence homology searches by PSI-BLAST | abstract | http://www.uniprot.org/help/uniref
PDB | The PDB archive contains information about experimentally-determined structures of proteins, nucleic acids, and complex assemblies. As a member of the wwPDB, the RCSB PDB curates and annotates PDB data according to agreed upon standards. | abstract | http://www.rcsb.org/pdb/download/download.do
Pfam | A collection of protein families | abstract | ftp://ftp.sanger.ac.uk/pub/databases/Pfam/releases/
PROSITE | Database of biologically significant sites, patterns and profiles | abstract | ftp://ftp.expasy.org/databases/prosite/


In a nutshell…

There is a vast amount of information available out there.

There is a vast number of tools out there.

Select the ones best suited to your project.


Dedan Githae Bioinformatician

BecA-ILRI Hub [email protected]

Online Bioinformatics resources http://hub.africabiosciences.org

Thank you