Transcript
Page 1: PIPELINE FOR FUNCTIONAL ANNOTATION OF NOVEL … accelrys.com/.../pdf/hyseq_pipeline_annot.pdf

Figure: Structure Search Results (example entries HS00222 and HS00945). Labeled fields:

• sequence identifier
• structure method
• query sequence length
• total number of hits for each sequence
• start, end & range of model sequence
• model sequence % identity & similarity
• Psi-blast score & link
• 3D model scores
• SeqFold score
• PDB template origin
• PDB functional annotation
• PDB active site annotation
• SCOP & PDB public database links

Acknowledgements

We would like to thank Sue Andarmani and Ling Jiang (web interface), Savita Jayaram (Structure Plus), Kiran Mukhyala (structure analysis tools), Ami Gavali (SQL database), and Ivan Labat for their excellent work and contributions.

Disclaimer: Sequence and structure data are only representations of the real data


Abstract

We have created a high-throughput, integrated pipeline of structure analysis protocols for over 10,000 of Hyseq’s proprietary protein sequences. The pipeline combines 3D structure prediction and functional annotation (GeneAtlas™, Accelrys Inc., San Diego), parsing and datamining programs, an SQL structure database, and several structural analysis programs, all accessible through an in-house web interface. It has yielded over 100,000 significant structure hits and 3D models for many of our novel protein sequences. After storing the hit information in our database, we datamine the hits by keyword and analyze template-model structure pairs for individual novel proteins. Together, the results are used to aid functional annotation of our sequences by structural homology (including identification of active site residues), to interpret and verify sequence-based annotation, and to rapidly target novel genes to appropriate assays.

High-throughput 3D structure determination from novel gene sequences has created new opportunities for us to discover biopharmaceuticals that act through novel mechanisms.
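The keyword datamining step mentioned in the abstract can be pictured with a small sketch. This is only an illustration in Python with an in-memory SQLite database, not the pipeline's actual implementation: the table name `hits`, its columns, and the example rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical `hits` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits ("
        " seq_id TEXT, template_id TEXT, method TEXT,"
        " score REAL, annotation TEXT)"
    )
    # Invented rows, for illustration only (not real pipeline results).
    rows = [
        ("SEQ_A", "TPL_1", "model", 0.91, "serine protease, catalytic domain"),
        ("SEQ_A", "TPL_2", "threading", 0.40, "immunoglobulin fold"),
        ("SEQ_B", "TPL_3", "model", 0.75, "protein kinase"),
    ]
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def search_hits(conn, keyword):
    """Return (seq_id, template_id, annotation) rows whose annotation contains keyword."""
    cur = conn.execute(
        "SELECT seq_id, template_id, annotation FROM hits"
        " WHERE annotation LIKE ? ORDER BY score DESC",
        (f"%{keyword}%",),
    )
    return cur.fetchall()


conn = build_demo_db()
print(search_hits(conn, "protease"))
```

A parameterized `LIKE` query keeps the keyword out of the SQL string itself; SQLite's `LIKE` is case-insensitive for ASCII text by default, so "Kinase" and "kinase" match the same rows.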

PIPELINE FOR FUNCTIONAL ANNOTATION OF NOVEL PROTEINS BY STRUCTURAL HOMOLOGY

Dana Haley-Vicente* and Nancy Mize

Hyseq Pharmaceuticals Inc., 675 Almanor Ave., Sunnyvale, CA 94086

* Currently at Accelrys, 9685 Scranton Rd., San Diego, CA 92121

Figure: 3D Protein Structure Search. Labeled options:

• searching database by keyword(s) or sequence ID
• filtering for best hits
• search & print fields options

Figure: Structure Analysis. Labeled items:

• query sequence
• model
• PDB template structure
• 3D Viewer Links
• alignment analysis
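One way to picture the "filtering for best hits" option is as keeping the top-scoring hit for each query sequence. The sketch below assumes that reading; the record layout, the field names, and the rule that a higher score is better are assumptions, not details taken from the poster.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    seq_id: str       # query sequence identifier (hypothetical values below)
    template_id: str  # structure template identifier (hypothetical)
    score: float      # assumed: larger means a better hit


def best_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Keep the highest-scoring hit per query sequence, sorted by seq_id."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.seq_id)
        if current is None or hit.score > current.score:
            best[hit.seq_id] = hit
    return sorted(best.values(), key=lambda h: h.seq_id)


# Invented example data, for illustration only.
demo = [
    Hit("SEQ_A", "TPL_1", 0.91),
    Hit("SEQ_A", "TPL_2", 0.40),
    Hit("SEQ_B", "TPL_3", 0.75),
]
for hit in best_hits(demo):
    print(hit.seq_id, hit.template_id, hit.score)
```

If a score where lower is better were used instead (an E-value, for example), the comparison in `best_hits` would need to be reversed.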

USER INTERFACE

PIPELINE & DATABASE

Figure: Individual GeneAtlas™ 3D Model Report (HS01 Project). Labeled items:

• query sequence & template alignment
• secondary structure annotation

Figure: Individual GeneAtlas™ SeqFold Report (HS01 Project). Labeled items:

• statistical analysis

Figure: The Protein Structure Pipeline. Labeled boxes:

• Cloning
• Sequencing
• Protein Sequences (Projects)
• GeneAtlas™
• Template Search (Psi-Blast)
• Sequence / Template Alignment (Psi-Blast, PDB95)
• Model Generation (MODELER)
• Model Evaluation (Profiles-3D/Verify & PMF)
• Threading (SeqFold)
• Model Annotation
• Structure Plus (Parser & Filter Program)
• Structure Database
• Datamining
• 3D Alignment & Statistical Analysis Tools
• 3D Active Site Annotation
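The structure prediction boxes in the pipeline figure can be read as successive stages applied to each protein sequence. The following Python sketch shows that idea only: the stage names are taken from the figure labels, but their order, the `make_stage` and `run_pipeline` helpers, and the record layout are assumptions, and each stage is a placeholder that merely records that it ran rather than calling any real tool.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Stage = Tuple[str, Callable[[Record], Record]]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only notes its own name on the record."""
    def run(record: Record) -> Record:
        updated = dict(record)  # copy so the input record is left unchanged
        updated["completed"] = list(record.get("completed", [])) + [name]
        return updated
    return (name, run)


# Names come from the figure labels; the ordering is an assumed reading.
STAGES: List[Stage] = [
    make_stage(name)
    for name in (
        "Template Search (Psi-Blast)",
        "Sequence / Template Alignment (Psi-Blast, PDB95)",
        "Model Generation (MODELER)",
        "Model Evaluation (Profiles-3D/Verify & PMF)",
        "Threading (SeqFold)",
        "Model Annotation",
    )
]


def run_pipeline(record: Record, stages: List[Stage] = STAGES) -> Record:
    """Pass one sequence record through each stage in turn."""
    for _name, stage in stages:
        record = stage(record)
    return record


result = run_pipeline({"seq_id": "SEQ_A"})  # hypothetical sequence ID
print(result["completed"])
```

Keeping the stages in a plain list makes it easy to reorder or drop a step, for instance if threading were run only when no template is found.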

Figure: Hyseq’s Relational Structure Database. Labeled groups:

• project data
• individual sequence data
• individual hit data
• method type
• active site data for query sequence models
• active site & template PDB structure data
• join to other databases
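The data groups labeled in the database figure suggest a relational layout in which projects, sequences, hits, and active site records are linked by keys. The sketch below is one possible layout in SQLite via Python, offered only as an illustration: every table name, column, and row is hypothetical and does not reproduce Hyseq's actual schema.

```python
import sqlite3

# Hypothetical schema loosely following the figure's data groups.
SCHEMA = """
CREATE TABLE project  (project_id TEXT PRIMARY KEY, description TEXT);
CREATE TABLE sequence (seq_id TEXT PRIMARY KEY,
                       project_id TEXT REFERENCES project(project_id),
                       length INTEGER);
CREATE TABLE hit      (hit_id INTEGER PRIMARY KEY,
                       seq_id TEXT REFERENCES sequence(seq_id),
                       template_id TEXT, method TEXT, score REAL);
CREATE TABLE active_site (hit_id INTEGER REFERENCES hit(hit_id),
                          residue TEXT, position INTEGER);
"""


def hits_with_active_sites(conn, project_id):
    """List (seq_id, template_id, active-site residue count) for one project."""
    cur = conn.execute(
        "SELECT s.seq_id, h.template_id, COUNT(a.residue)"
        " FROM sequence s"
        " JOIN hit h ON h.seq_id = s.seq_id"
        " LEFT JOIN active_site a ON a.hit_id = h.hit_id"
        " WHERE s.project_id = ?"
        " GROUP BY h.hit_id ORDER BY s.seq_id, h.hit_id",
        (project_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Invented rows, for illustration only.
conn.execute("INSERT INTO project VALUES ('P1', 'demo project')")
conn.execute("INSERT INTO sequence VALUES ('SEQ_A', 'P1', 250)")
conn.execute("INSERT INTO hit VALUES (1, 'SEQ_A', 'TPL_1', 'model', 0.91)")
conn.execute("INSERT INTO hit VALUES (2, 'SEQ_A', 'TPL_2', 'threading', 0.40)")
conn.executemany(
    "INSERT INTO active_site VALUES (?, ?, ?)",
    [(1, "HIS", 57), (1, "SER", 195)],
)
print(hits_with_active_sites(conn, "P1"))
```

The `LEFT JOIN` keeps hits that have no active site records, reporting a count of zero for them instead of dropping the row.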
