Martin John Bishop UK HGMP Resource Centre Hinxton Cambridge CB10 1SB mbishop@hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk


Page 1: Martin John Bishop UK HGMP Resource Centre Hinxton Cambridge CB10 1 SB mbishop@hgmp.mrc.ac.uk

Martin John Bishop

UK HGMP Resource Centre
Hinxton
Cambridge CB10 1SB
mbishop@hgmp.mrc.ac.uk
http://www.hgmp.mrc.ac.uk

Bioinformatics scope

Genome sequences - DNA
Transcripts - RNA
Proteins
Protein interactions
Macromolecular assemblies
Development and cellular function
Genetic linkage analysis

Molecular biology needs bioinformatics

Biological data - molecules
Sequences
Structures
Gene expression
Proteomes
Pathways
Evolution

Computer analysis - methods
Comparison
Modelling
Co-regulation
Mass spectrometry
Knowledge bases
Phylogenetics

Molecular biology is about information

Central dogma
DNA <-> RNA -> protein -> phenotype <- DNA

Molecules
Processes

Central paradigm
Genome repository <-> RNA world -> Protein sequence -> Protein structure -> Protein function -> Phenotype <- Fed back to genome

Information processing
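
The DNA -> RNA -> protein flow on this slide can be illustrated with a minimal Python sketch. The example sequence, the function names and the deliberately tiny codon table are assumptions made for this illustration; they are not taken from the slides, and a real tool would use the full genetic code.

```python
# Illustrative only: a small subset of the standard genetic code.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*",  # stop
}


def transcribe(dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T -> U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        # "X" marks a codon that is missing from the demo table.
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTTGGAAATAA"  # hypothetical example sequence
rna = transcribe(dna)
protein = translate(rna)
print(rna, protein)
```

The sketch covers only the forward direction; the feedback arrows in the diagram (phenotype back to genome) are not modelled.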

The activities of HGMP-RC

Bioinformatics Services
MHC Fugu Mouse sequencing
Technology development
Research
Biological materials by mail order
Biological services including hotel facilities
Contract R&D
Biology Services
HGMP-RC

On-line service

Mail
Network News
Files / Backup
Services
Data Links
Unrestricted
Public Data
Private Data
Registered users
Information
Analytical tools
On-line service

HGMP-RC SERVICE

Web menu
X (or VNC)
Java Telnet
Telnet menu / Unix login

GENOME WEB

Up to date
Relevant
Fully searchable
Fully verified
Extensive

INTEGRATED ANALYSIS

BLAST
NIX
PIX
GLUE
PIE
MAGI
PINT

COMMON OPTIONS

EMBOSS
GCG
PINE
CLUSTAL
STADEN
PASSWORD

GENOMICS APPLICATIONS

Linkage Analysis
Radiation Hybrid Mapping
Sequence Ready Clone Maps
Genome Databases
Polymorphisms
Sequence Analysis
Gene Prediction
Expression Profiling
Phylogenetic Analysis
Integrated Tools - GLUE, RHYME, NIX, PIE

PROTEOMICS APPLICATIONS

Protein Sequence Analysis
Protein Structure Analysis
Protein Structural Modelling
Proteome Databases
Tools for Peptide Sequence Determination
Protein Cellular Localisation
Protein Functional Studies
Pathways and Protein Interactions
Integrated tools and databases - PIX

NETWORK / JANET SERVICE

LONDON
Currently: 34 Mbps main link
Future: keep 34 Mbps link for backup

CAMBRIDGE
Currently: 8 Mbps redundant link
Future: Gigabit Ethernet
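
To give a feel for what these link speeds mean in practice, the following Python sketch estimates ideal transfer times. The 1 GB file size is a hypothetical figure chosen for illustration, 1 Mbps is taken as 10^6 bits per second, Gigabit Ethernet is taken as 1000 Mbps, and protocol overhead and contention are ignored, so real transfers would be slower.

```python
def transfer_seconds(size_bytes: int, link_mbps: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead and contention."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


FILE_SIZE = 1_000_000_000  # hypothetical 1 GB database release

# Link speeds quoted on the slide, in Mbps.
links = {"Cambridge 8 Mbps": 8, "London 34 Mbps": 34, "Gigabit Ethernet": 1000}

times = {name: transfer_seconds(FILE_SIZE, mbps) for name, mbps in links.items()}
for name, seconds in times.items():
    print(f"{name}: {seconds:.0f} s")
```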

SERVERS

More than 80 servers
1, 4 and 8 CPU SMP
Sparc and Intel
Solaris and Linux
Databases doubling every 14 months
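
The 14-month doubling time can be turned into growth factors over longer periods. The sketch below assumes steady exponential growth, which the slide does not state explicitly; the function name is chosen for this example.

```python
DOUBLING_MONTHS = 14  # figure quoted on the slide


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Factor by which data volume grows over `months` under steady exponential growth."""
    return 2 ** (months / doubling_months)


for years in (1, 2, 5):
    print(f"after {years} year(s): x{growth_factor(12 * years):.2f}")
```

Under this assumption the databases grow by a little over 80% per year, which is the rate that storage and compute capacity would have to match.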

LOADS

Load is the percentage of processes trying to run
Interactive load 50%
Job queues load 100%
Jobs waiting can be 6-10 times the work being processed

PROCESSES AND QUEUES

Menu service (hot swop)
General analysis (overloaded)
Sun BLAST and NIX queue
Dell BLAST queue
BLAST data file server
Interactive Linkage queue
Heavy Linkage queue

USERS’ REAL WORLD PROBLEMS

Comparative method
Extrapolate from known to similar
Hints to reduce the amount of experimental work that needs to be done
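
One simple way to quantify "similar" is percent identity between two already-aligned sequences. The Python sketch below is only an illustration of that idea: it is not BLAST or any HGMP-RC tool, and the two sequences and the function name are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns (excluding gap-gap columns) with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences have a gap; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


known = "MKTAYIAKQR"    # hypothetical well-characterised protein
unknown = "MKTAHIAKQR"  # hypothetical query differing at one position
identity = percent_identity(known, unknown)
print(f"{identity:.0f}% identical")
```

A high identity to a characterised sequence is the kind of hint that can suggest a function to test first, reducing the experimental work needed.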

SOFTWARE SYSTEMS

A variety of technical solutions are used
BLAST
NCBI Entrez
SRS
GeneCards
NIX
ENSEMBL

HELPING THE USER

Information discovery – completeness
Communication – multiple sites
Ontology – uniformity?
Software integration – ease of use
Reasoning about results
Monitoring – repeat queries

MAJOR CHALLENGES

User interface
Back end processing
Cost recovery

NEW TECHNOLOGIES?

Web services
GRID (EMBnet)
Object-orientated computing
Multi-agent systems

TREASURE

Web service with top level container
Customise for the user
User selects a service and opens it as an application
An alternative view can be built around user data as the fundamental objects

IMPLEMENTATION

EMBREO library written in Java handles web service layer (also CORBA, XML-RPC, JDBC and other connectivity)
Also handles file access and transfer and display of results (including use of VNC)
Simple Object Access Protocol (SOAP)
Browser channel uses XML format
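
For readers unfamiliar with SOAP, the sketch below builds a minimal SOAP 1.1 request envelope using Python's standard library. It does not reproduce EMBREO (which the slide says is written in Java) or any actual HGMP-RC interface: the service namespace `urn:example:analysis`, the operation name `runBlast` and its parameters are hypothetical, chosen only to show the shape of the XML.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:analysis"  # hypothetical service namespace


def build_request(operation: str, params: dict) -> bytes:
    """Serialise a SOAP 1.1 request whose body holds a single operation element."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical operation and parameters, for illustration only.
request_xml = build_request("runBlast", {"program": "blastn", "sequence": "ATGGCT"})
print(request_xml.decode("utf-8"))
```

In a real deployment the resulting bytes would be sent in an HTTP POST to the service endpoint, and the reply would be a similar envelope carrying the results.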

USER ACCOUNTING AND CUSTOMIZATION

Currently: very complex
HED
NIS+
Filesystem configuration files

Future: a single database
Lightweight Directory Access Protocol (LDAP)

CREDITS

Gary Williams: Menu systems and Genome Web
Geoff Gibbs: Network and systems
Peter Tribble: Web servers, Queues, Treasure