e-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins
Tracy Craddock, Phillip Lord, Colin Harwood and Anil Wipat
Newcastle University
Outline
Computational challenges of bioinformatics
Secretion in Bacillus
Classification and analysis workflows
Results and discussion
Computational Challenges of Bioinformatics
New requirements from bioinformatics
3 major problems
Heterogeneity
Distribution
Autonomy
Experiments - series of workflows
myGrid and Taverna
Scufl - Simple Conceptual Unified Flow Language
Taverna - Writing, running workflows & examining results
SOAPLAB - Makes applications available
Freefluo - Workflow engine to run workflows
[Diagram labels: Freefluo; SOAPLAB Web Service; Any Application; Web Service e.g. DDBJ BLAST]
Microbase
Grid-based system for microbial genome comparison and analysis
Information repository (and execution environment)
Pre-computed data
Secretion in Bacillus
Predict characteristics & behaviour of bacteria
Identify secreted proteins
Bacillus species diverse behaviour
Soil inhabitants
Harmful bacteria
Importance of Secretion
Mechanism of interaction with environment
Reveal capabilities of an organism
Pathogens are of great interest
Secretory Proteins
[Figure: cell compartments (Cytoplasm, Membrane, Cell Wall, Medium) and secretory protein classes (Signal Peptide, Lipoprotein, Cell wall binding, Transmembrane, LPXTG)]
Bioinformatic Tools
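[Figure: cell compartments (Cytoplasm, Membrane, Cell Wall, Medium) and secretory protein classes (Signal Peptide, Lipoprotein, Cell wall binding, Transmembrane, LPXTG), annotated with the prediction tools SignalP, TMHMM, tmap, MEMSAT, LipoP and ps_scan]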
Classification Workflow
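The classification idea can be sketched in a few lines of Python. This is a minimal illustration only, not the workflow used in this work: the `Predictions` fields, the order of the checks and the 50-residue C-terminal window are all assumptions, and the tool outputs are taken as already parsed. Detection of cell wall binding domains is left out.

```python
import re
from dataclasses import dataclass

# LPXTG motif: Leu, Pro, any residue, Thr, Gly.
LPXTG = re.compile(r"LP.TG")


@dataclass
class Predictions:
    """Hypothetical, already-parsed tool results for one protein."""
    has_signal_peptide: bool  # e.g. derived from SignalP output
    is_lipoprotein: bool      # e.g. derived from LipoP output
    tm_helices: int           # e.g. helix count from TMHMM, tmap or MEMSAT


def classify(seq: str, p: Predictions) -> str:
    """Assign one class label; the rule order is an assumption."""
    if p.is_lipoprotein:
        return "lipoprotein"
    if not p.has_signal_peptide:
        return "not secreted"
    if p.tm_helices > 0:
        return "transmembrane"
    # Assumed window: look for the motif in the last 50 residues only.
    if LPXTG.search(seq[-50:]):
        return "LPXTG"
    return "secreted to medium"


if __name__ == "__main__":
    toy = "M" + "A" * 60 + "LPETG" + "A" * 20  # invented sequence
    print(classify(toy, Predictions(True, False, 0)))
```

In a real workflow each field would be filled by parsing the output of the corresponding service rather than set by hand.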
Process of Analysis
[Figure: genomes analysed, labelled by accession: CP000001, AE017355, AE017225, AE017334, AE016879, AE017194, AE016877, AL009126, CP000002, AE017333, AP006627, BA000004]
Putative secreted proteins
Protein families
Functional classification
Relations
Analysis Workflow
Architecture
Custom-designed database
Provenance tracking
Analysis – computationally intensive
Architecture differs from other systems
Web Portal
Classification Results
Functions of the Clusters
[Bar chart: number of families per functional category]
Similar to unknown proteins
Transport/binding proteins and lipoproteins
Cell Wall
Membrane bioenergetics
Germination
Protein secretion
Sporulation
Metabolism of carbohydrates and related molecules
Specific pathways
Transformation/competence
Metabolism of lipids
Metabolism of phosphate
Transcription regulation
Metabolism of amino acids and related molecules
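A tally of families per functional category, as plotted in the chart, could be produced along these lines. This is a sketch under assumptions: the mapping from family identifier to category is a hypothetical input format, and the three entries are invented placeholders, not results from this work.

```python
from collections import Counter


def families_per_function(family_functions: dict) -> Counter:
    """Count how many families fall into each functional category."""
    return Counter(family_functions.values())


# Placeholder data for illustration only (not real results).
example = {
    "fam1": "Cell Wall",
    "fam2": "Sporulation",
    "fam3": "Cell Wall",
}

counts = families_per_function(example)
for category, n in counts.most_common():
    print(f"{category}: {n}")
```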
Biologist’s Outlook
Results available for subsequent analysis
Data and results are of great interest
eScientist’s Outlook
Microbase simplified data analysis
But …
Autonomy - most services provided originally by external parties
Licensing – limits exposure of services
Distribution - difficulty came from the relatively large datasets
Future Enhancements
Use notification to automatically analyse recently annotated genomes
Migrate workflows to a remote enclosed environment?
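The notification idea above can be illustrated with a small publish/subscribe sketch. Everything here is hypothetical: `GenomeNotifier`, `run_secretion_workflow` and the accession string are invented names for illustration and do not describe the Microbase or Taverna interfaces.

```python
from typing import Callable, List


class GenomeNotifier:
    """Hypothetical in-process notifier for newly annotated genomes."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Register a callback to be called with each new accession."""
        self._subscribers.append(callback)

    def publish(self, accession: str) -> None:
        """Notify every subscriber that a genome has been annotated."""
        for callback in self._subscribers:
            callback(accession)


analysed: List[str] = []


def run_secretion_workflow(accession: str) -> None:
    # Stand-in for launching the classification and analysis workflows.
    analysed.append(accession)


notifier = GenomeNotifier()
notifier.subscribe(run_secretion_workflow)
notifier.publish("EXAMPLE0001")  # placeholder accession, not a real record
print(analysed)
```

A distributed deployment would replace the in-process callback list with a messaging service, but the subscribe and publish roles stay the same.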
Acknowledgments
Phillip Lord, Colin Harwood, Anil Wipat
myGrid
Carole Goble, Tom Oinn … and the rest of the myGrid team
Microbase
Yudong Sun, Anil Wipat, Matthew Pocock, Pete A. Lee, Paul Watson, Keith Flanagan, James T. Worthington