bioinformatics approaches for…

Page 1: Bioinformatics approaches for…

Bioinformatics approaches for…

Teresa K Attwood
Faculty of Life Sciences & School of Computer Science
University of Manchester, Oxford Road
Manchester M13 9PT, UK

http://www.bioinf.man.ac.uk/dbbrowser/

Page 2: Bioinformatics approaches for…

…analysing GPCRs…

Page 3: Bioinformatics approaches for…

…which craft is best?

Page 4: Bioinformatics approaches for…

Overview

• What are GPCRs?
  – why they’re interesting & important
  – why bioinformatics approaches are important
• In silico function prediction – a reality check
• Family-based methods for characterising GPCRs
• Understanding the tools
  – problems with pair-wise & family-based approaches
  – estimating (biological) significance
• Seeking deeper functional insights
• Conclusions

Page 5: Bioinformatics approaches for…

What are GPCRs? G protein-coupled receptors

[Figure labels: GDP, GTP]

• A functionally diverse family of cell-surface 7TM proteins
• Functional diversity achieved via
  – interaction with a variety of ligands
  – stimulation of various intracellular pathways via coupling to different G proteins

Page 6: Bioinformatics approaches for…

Why are GPCRs interesting?

Attwood, TK & Flower, DR (2002) Trawling the genome for G protein-coupled receptors: the importance of integrating bioinformatic approaches. In Drug Design – Cutting Edge Approaches, pp.60-71.

• They are ubiquitous
  – >800 GPCR genes in the human genome, from 3 major superfamilies
    • rhodopsin-, secretin- & metabotropic glutamate receptor-like
• Share almost no sequence similarity
  – but are united by common 7TM architecture
• Constitute a complex multi-gene family
  – populated by >50 families & >350 subtypes

Page 7: Bioinformatics approaches for…

Isn’t just stamp collecting!

Attwood, TK & Flower, DR (2002) Trawling the genome for G protein-coupled receptors: the importance of integrating bioinformatic approaches. In Drug Design – Cutting Edge Approaches, pp.60-71.

• GPCRs are of profound biomedical importance
  – targets for >50% of prescription drugs
  – yield sales >$16 billion/annum
    • they’re big business!
• Given their importance, we need to
  – characterise the ones we know about
  – identify new ones
    • & discover what they do!
  – e.g., as potential new drug targets

Page 8: Bioinformatics approaches for…

Why studying GPCRs is difficult

• Only 2 crystal structures available
  – bovine rhodopsin (2000) & human β2-adrenergic receptor (2007)
• Many GPCRs haven’t been characterised experimentally
  – remain ‘orphans’, with unknown ligand specificity
• With >800 human GPCRs, this isn’t much to go on!

Page 9: Bioinformatics approaches for…

Why use bioinformatics approaches?

• Computational approaches are important
  – can be used to help identify, characterise & model novel receptors
    • usually by similarity & extrapolation of known characteristics
• Bioinformatics thus offers complementary tools for elucidating the structures & functions of receptors
• But the task is non-trivial
  – GPCRs exhibit rich relationships & complex molecular interactions
    • present many challenges for in silico analysis
  – in trying to derive meaningful functional insights, traditional methods are likely to be limited

Page 10: Bioinformatics approaches for…

[Figure: GPCR signalling pathways. Ligand classes: biogenic amines, amino acids, ions, lipids, peptides, proteins, light, others. Labelled components: GPCR; G protein subunits (αi, αo, αq, αs, β, γ); GDP, GTP; PI3Kγ, PLCβ, PKC, PKA, cAMP, EPAC, Ca2+, PYK2, RTK, Src, Shc, Grb2, Sos, RasGRF, Ras, Rap, Raf1, B-Raf, MEK, MAPK; nucleus (regulation of gene expression)]

We’ve been using biology-unaware search tools to analyse such complex systems.
How far can we truly expect to understand cellular function with such naïve approaches…?

Page 11: Bioinformatics approaches for…

In silico function prediction …a reality check

• What is the function of this structure?
• What is the function of this sequence?
• What is the function of this motif?
  – the fold provides a scaffold, which can be decorated in different ways by different sequences to confer different functions
  – knowing the fold & function allows us to rationalise how the structure effects its function at the molecular level

Page 12: Bioinformatics approaches for…

“A test case for structural genomics: structure-based assignment of the biochemical function of hypothetical protein MJ0577” (Zarembinski et al., PNAS 95, 1998)

Although the structure co-crystallised with ATP, the biochemical function of the protein is unknown

Page 13: Bioinformatics approaches for…

What's in a sequence?

Page 14: Bioinformatics approaches for…

Methods for family analysis

Attwood, TK (2000). The quest to deduce protein function from sequence: the role of pattern databases. Int. J. Biochem. Cell Biol., 32(2), 139–155.

• Single motif methods
  – Exact regex (PROSITE)
  – Fuzzy regex (eMOTIF)
• Multiple motif methods
  – Identity matrices (PRINTS)
  – Weight matrices (Blocks)
• Full domain alignment methods
  – Profiles (Profile Library)
  – HMMs (Pfam)

Page 15: Bioinformatics approaches for…

The challenge of family analysis

• highly divergent family with single function?
• superfamily with many diverse functional families?
  – must distinguish the two if functional analysis is done in silico
  – a tough challenge!

Page 16: Bioinformatics approaches for…

In the beginning was PROSITE

[GSTALIVMYWC]-[GSTANCPDE]-{EDPKRH}-X(2)-[LIVMNQGA]-X(2)-[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R

[Figure label: TM domain]
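A PROSITE pattern is essentially a regular expression written in its own notation, so it can be translated mechanically. The sketch below is a minimal illustration, not PROSITE's own scanner: `prosite_to_regex` and the two toy sequences are hypothetical, the pattern is the one shown on this slide with the missing "[" before LIVMFYWSTAC restored, and the `<`/`>` anchors of the full syntax are not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles x, x(n), x(n,m), [allowed], {forbidden} and literal residues.
    The '<' and '>' anchors are deliberately left out of this sketch.
    """
    elements = pattern.strip().rstrip(".").split("-")
    parts = []
    for element in elements:
        # Split an element such as "x(2)" into its core and repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core in ("x", "X"):
            token = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # any residue except these
        else:
            token = core                       # [allowed set] or a literal residue
        if repeat:
            token += "{" + repeat + "}"
        parts.append(token)
    return "".join(parts)

# Pattern as shown on the slide (missing "[" restored).
SLIDE_PATTERN = (
    "[GSTALIVMYWC]-[GSTANCPDE]-{EDPKRH}-X(2)-[LIVMNQGA]-X(2)-"
    "[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R"
)
regex = re.compile(prosite_to_regex(SLIDE_PATTERN))

# Hypothetical toy sequences, not real receptors.
matching = "MKGSAAALAALGLDRTT"       # contains a segment that fits the pattern
non_matching = "MKGSEAALAALGLDRTT"   # E at the third pattern position is forbidden

print(bool(regex.search(matching)))
print(bool(regex.search(non_matching)))
```

A single forbidden residue is enough to turn a hit into a miss, which is why exact regex matching is both very specific and very brittle.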

Page 17: Bioinformatics approaches for…

Diagnostic limitations of PROSITE

ID   G_PROTEIN_RECEP_F1_1; PATTERN.
AC   PS00237;
DT   APR-1990 (CREATED); NOV-1997 (DATA UPDATE); SEP-2004 (INFO UPDATE).
DE   G-protein coupled receptors family 1 signature.
PA   [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-
PA   [GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM].
NR   /RELEASE=44.6,159201;
NR   /TOTAL=1622(1621); /POSITIVE=1530(1529); /UNKNOWN=0(0);
NR   /FALSE_POS=92(92); /FALSE_NEG=261; /PARTIAL=61;

• This represents an apparent 22% error rate
  – the actual rate is probably higher
• Thus, a match to a pattern is not necessarily true
  – & a mis-match is not necessarily false!
• False-negatives are a fundamental limitation to this type of pattern matching
  – if you don't know what you're looking for, you'll never know you missed it!
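The 22% figure can be reproduced from the NR lines of the entry. The sketch below assumes one plausible reading, namely (false positives + false negatives) divided by the total number of pattern hits; the slide does not state its formula, so `pattern_stats` and that definition are assumptions. Precision and recall are added only for comparison.

```python
def pattern_stats(true_pos: int, false_pos: int, false_neg: int):
    """Return (error_rate, precision, recall) for a diagnostic pattern.

    error_rate is (FP + FN) / (TP + FP): an assumed definition that
    reproduces the slide's figure; it is not stated in the source.
    """
    total_hits = true_pos + false_pos
    error_rate = (false_pos + false_neg) / total_hits
    precision = true_pos / total_hits
    recall = true_pos / (true_pos + false_neg)
    return error_rate, precision, recall

# Counts from the PS00237 NR lines: POSITIVE=1530, FALSE_POS=92, FALSE_NEG=261.
error_rate, precision, recall = pattern_stats(1530, 92, 261)
print(f"apparent error rate: {error_rate:.1%}")
print(f"precision: {precision:.1%}, recall: {recall:.1%}")
```

Note that the false-negative count only covers sequences already known to belong to the family; unannotated misses are invisible to this arithmetic, which is why the true rate is probably higher.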

Page 18: Bioinformatics approaches for…

Where do motifs (fingerprints) fit in? (fingerprints are hierarchical)

[Figure labels: loop region, TM domain, TM domain]

Page 19: Bioinformatics approaches for…

Rhodopsin-like superfamily, family & subtype GPCRs in PRINTS

Attwood, TK (2001) A compendium of specific motifs for diagnosing GPCR subtypes. TiPS, 22(4), 162-165.

Page 20: Bioinformatics approaches for…

Searching PRINTS - FingerPRINTScan

Scordis, P, Flower, DR & Attwood, TK (1999) FingerPRINTScan: intelligent searching of the PRINTS motif database. Bioinformatics, 15, 523-524.

• GPCR fingerprints are embedded in PRINTS
  – allows diagnosis of GPCR mosaics

Page 21: Bioinformatics approaches for…
Page 22: Bioinformatics approaches for…

Visualising fingerprints

Attwood, TK & Findlay, JBC (1993) Design of a discriminating fingerprint for G-protein-coupled receptors. Protein Eng., 6(2), 167–176.

[Figure labels: N, C]

Page 23: Bioinformatics approaches for…

Visualising fingerprints

Attwood, TK & Findlay, JBC (1993) Design of a discriminating fingerprint for G-protein-coupled receptors. Protein Eng., 6(2), 167–176.

[Figure labels: N, C]

Page 24: Bioinformatics approaches for…

Diagnosing partial matches

• Missed by PROSITE
  – wasn’t annotated as a false negative (FN)
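Because a fingerprint is a set of several motifs, a sequence can match some of them and still be reported, where a single regex would simply fail. The toy sketch below illustrates that idea only; it is not the FingerPRINTScan algorithm. The motifs, sequences, scoring scheme and the `min_fraction` threshold are all hypothetical.

```python
from collections import Counter

def count_matrix(aligned_segments):
    """Per-column residue counts for an ungapped motif alignment."""
    return [Counter(column) for column in zip(*aligned_segments)]

def best_window(sequence, matrix):
    """Return (score, start) of the first best-scoring ungapped window."""
    width = len(matrix)
    best_score, best_start = -1, -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[res] for col, res in zip(matrix, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start

def scan_fingerprint(sequence, motifs, min_fraction=0.8):
    """Return (motif_number, start) for each motif whose best window
    reaches min_fraction of that motif's maximum possible score."""
    hits = []
    for number, segments in enumerate(motifs, start=1):
        matrix = count_matrix(segments)
        max_score = sum(max(col.values()) for col in matrix)
        score, start = best_window(sequence, matrix)
        if score >= min_fraction * max_score:
            hits.append((number, start))
    return hits

# Hypothetical three-motif fingerprint (toy data, not taken from PRINTS).
TOY_MOTIFS = [
    ["GNLLV", "GNVLV", "GNSLV"],
    ["DRYLA", "DRYVA", "DRYIA"],
    ["NPIIY", "NPLIY", "NPFIY"],
]

full = scan_fingerprint("MMGNLLVAAADRYLAKKKNPIIYTT", TOY_MOTIFS)
partial = scan_fingerprint("MMGNLLVAAADRYLAKKKAAAAATT", TOY_MOTIFS)
print(f"full match: {len(full)} of {len(TOY_MOTIFS)} motifs")
print(f"partial match: {len(partial)} of {len(TOY_MOTIFS)} motifs")
```

Reporting "2 of 3 motifs" keeps a divergent family member or a fragment visible to the curator instead of silently discarding it.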

Page 25: Bioinformatics approaches for…

An integrated approach

Mulder, NJ, Apweiler, R, Attwood, TK, Bairoch, A et al. (2007) New developments in InterPro. NAR, 35, D224-8.

• To simplify sequence analysis, the family dbs were integrated within a unified annotation resource – InterPro
  – initial partners were PRINTS, PROSITE, profiles & Pfam
    • now many more partners
  – linked to its satellite dbs
    • but lags behind their coverage
  – by Oct 2007, it had 14,768 entries & covered 76% of UniProtKB
    • major role in fly & human genome annotation

Page 26: Bioinformatics approaches for…

InterPro – method comparison

Page 27: Bioinformatics approaches for…

Where has this got us?

Page 28: Bioinformatics approaches for…

Understanding the tools …estimating significance

• How do we know what to believe?
• Let’s explore some of the difficulties that arise when pair-wise search tools (BLAST & FastA) & family-based methods are used naïvely
  – these examples caution us to think about what the results actually mean in biological terms…

Page 29: Bioinformatics approaches for…

Identifying sequence similarity

• GPCRs present many challenges for in silico functional analysis
• Several signature-based methods now available
  – with different areas of optimum application
• Yet naïve, pair-wise similarity searching has been the mainstay of functional annotation efforts
  – it allows us to identify/quantify relationships between sequences
• But quantifying similarity between sequences is not the same as identifying their functions

Page 30: Bioinformatics approaches for…

Problems with pairwise similarity tools

Gaulton, A & Attwood, TK (2003) Bioinformatics approaches for the classification of G protein-coupled receptors. Current Opinion in Pharmacology, 3, 114-120.

• For identifying precise families to which receptors belong & the ligands they bind, pair-wise tools are limited
  – at what level of seq ID is ligand specificity conserved?
    • some GPCRs with 25% ID share a common ligand;
    • others, with greater levels, don’t…
• It may be impossible to tell from BLAST if an orphan belongs to a known family (the top hit), or if it will bind a novel ligand
  – e.g., for the now de-orphaned UR2R, BLAST indicates most similarity to the type 4 SSRs, yet it is known to bind a different (related) ligand
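For reference, "seq ID" is just a count over an alignment. The sketch below assumes one common convention (identical positions divided by aligned, non-gap columns); other denominators are in use and give different numbers. `percent_identity` and the two toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)

# Hypothetical pre-aligned toy sequences.
identity = percent_identity("ACDEFGHIKL", "ACDQFG-IKV")
print(f"{identity:.1f}% identity")
```

The number says how many positions agree, not which ones; two receptors can share most of their scaffold residues and still differ at the few positions that determine ligand specificity.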

Page 31: Bioinformatics approaches for…

When is a GPCR not an SSR?

Query length: 389 AA
Date run: 2002-10-18 09:08:29 UTC+0100 on sib-blast.unil.ch
Taxon: Homo sapiens
Database: XXswissprot
120,412 sequences; 45,523,583 total letters
SWISS-PROT Release 40.29 of 10-Oct-2002

Db  AC      Description                                                 Score  E-value
sp  Q9UKP6  Q9UKP6 Orphan receptor [Homo sapiens...                     782    0.0
sp  P31391  SSR4_HUMAN Somatostatin receptor type 4 (SS4R) [SSTR4]...   167    3e-41
sp  O43603  GALS_HUMAN Galanin receptor type 2 (GAL2-R) (GALR2) [G...   147    4e-35
sp  P30872  SSR1_HUMAN Somatostatin receptor type 1 (SS1R) (SRIF-2...   144    3e-34
sp  P32745  SSR3_HUMAN Somatostatin receptor type 3 (SS3R) (SSR-28...   140    3e-33
sp  P35346  SSR5_HUMAN Somatostatin receptor type 5 (SS5R) (SSTR5)...   140    6e-33
sp  P30874  SPLICE ISOFORM B of P30874 [SSTR2] [Homo sapiens...         134    3e-31
sp  P30874  SSR2_HUMAN Somatostatin receptor type 2 (SS2R) (SRIF-1...   134    3e-31
sp  P48145  GPR7_HUMAN Neuropeptides B/W receptor type 1 (G protei...   133    7e-31
sp  O60755  GALT_HUMAN Galanin receptor type 3 (GAL3-R) (GALR3) [G...   132    2e-30
sp  P41143  OPRD_HUMAN Delta-type opioid receptor (DOR-1) [OPRD1] ...   128    2e-29
sp  P35372  SPLICE ISOFORM 1A of P35372 [OPRM1] [Homo sapien...         125    1e-28
sp  P35372  OPRM_HUMAN Mu-type opioid receptor (MOR-1) [OPRM1] [Ho...   125    1e-28

Page 32: Bioinformatics approaches for…

When is a GPCR not an SSR? …when it’s a UR2R

Query length: 389 AA
Date run: 2002-10-18 09:08:29 UTC+0100 on sib-blast.unil.ch
Taxon: Homo sapiens
Database: XXswissprot
120,412 sequences; 45,523,583 total letters
SWISS-PROT Release 40.29 of 10-Oct-2002

Db  AC      Description                                                 Score  E-value
sp  Q9UKP6  UR2R_HUMAN Urotensin II receptor (UR-II-R) [GPR14] [Ho...   782    0.0
sp  P31391  SSR4_HUMAN Somatostatin receptor type 4 (SS4R) [SSTR4]...   167    3e-41
sp  O43603  GALS_HUMAN Galanin receptor type 2 (GAL2-R) (GALR2) [G...   147    4e-35
sp  P30872  SSR1_HUMAN Somatostatin receptor type 1 (SS1R) (SRIF-2...   144    3e-34
sp  P32745  SSR3_HUMAN Somatostatin receptor type 3 (SS3R) (SSR-28...   140    3e-33
sp  P35346  SSR5_HUMAN Somatostatin receptor type 5 (SS5R) (SSTR5)...   140    6e-33
sp  P30874  SPLICE ISOFORM B of P30874 [SSTR2] [Homo sapiens...         134    3e-31
sp  P30874  SSR2_HUMAN Somatostatin receptor type 2 (SS2R) (SRIF-1...   134    3e-31
sp  P48145  GPR7_HUMAN Neuropeptides B/W receptor type 1 (G protei...   133    7e-31
sp  O60755  GALT_HUMAN Galanin receptor type 3 (GAL3-R) (GALR3) [G...   132    2e-30
sp  P41143  OPRD_HUMAN Delta-type opioid receptor (DOR-1) [OPRD1] ...   128    2e-29
sp  P35372  SPLICE ISOFORM 1A of P35372 [OPRM1] [Homo sapien...         125    1e-28
sp  P35372  OPRM_HUMAN Mu-type opioid receptor (MOR-1) [OPRM1] [Ho...   125    1e-28

Page 33: Bioinformatics approaches for…

[Figure: two plots of %ID against residue number (1 to 380): UR2R_HUMAN vs SOMATOSTANR and UR2R_HUMAN vs UROTENSIN2R]

Page 34: Bioinformatics approaches for…

The trouble with top hits

• The most statistically significant hit is not always the most biologically relevant

• Yet many rule-based ‘expert systems’ still rely on top BLAST or FastA hits to make their diagnoses

• BLAST/FastA ‘see’ generic similarity & not the often-subtle differences that constitute the functional determinants between closely-related receptor families & subtypes

• Failure to appreciate this fundamental point has generated numerous annotation errors in our databases
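The failure mode is easy to reproduce. The sketch below applies a naive "copy the best non-self hit" rule to the first few rows of the BLAST table shown earlier; the function name, the tuple layout and the E-value cut-off are hypothetical, and real expert systems are more elaborate than this.

```python
# (accession, description, score, E-value): first rows of the BLAST table
# for the urotensin II receptor query, as listed on the earlier slide.
HITS = [
    ("Q9UKP6", "Urotensin II receptor", 782, 0.0),
    ("P31391", "Somatostatin receptor type 4", 167, 3e-41),
    ("O43603", "Galanin receptor type 2", 147, 4e-35),
    ("P30872", "Somatostatin receptor type 1", 144, 3e-34),
    ("P32745", "Somatostatin receptor type 3", 140, 3e-33),
]

def naive_top_hit_annotation(hits, query_accession, max_evalue=1e-10):
    """Copy the description of the most significant non-self hit.

    max_evalue is an arbitrary, illustrative cut-off.
    """
    candidates = [hit for hit in hits
                  if hit[0] != query_accession and hit[3] <= max_evalue]
    if not candidates:
        return None
    return min(candidates, key=lambda hit: hit[3])[1]

predicted = naive_top_hit_annotation(HITS, query_accession="Q9UKP6")
print(f"naive annotation: {predicted}")
print("curated annotation: Urotensin II receptor")
```

The hit is statistically overwhelming (E = 3e-41) and still the wrong family, because the rule never looks at the residues that distinguish one family from another.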

Page 35: Bioinformatics approaches for…

Misleading annotation via FastA

[Figure labels: -opioid receptor; -opioid receptor; -opioid receptor; true]

Page 36: Bioinformatics approaches for…

Misleading results from BLAST

• As we’ve seen, it’s tempting to use top hits from BLAST or FastA results to classify unknown proteins
  – but this may lead us (& especially computer programs) to false functional conclusions
• PSI-BLAST is more sensitive than BLAST, because it creates a profile from hits above a given threshold
  – but this too can cause problems
  – let’s take a closer look
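One such problem, profile dilution, can be illustrated with a toy model. The sketch below is not PSI-BLAST's scoring scheme (which uses weighted log-odds matrices with pseudocounts); it uses plain column frequencies to show the direction of the effect. All sequences and function names are hypothetical.

```python
from collections import Counter

def column_frequencies(aligned_segments):
    """Residue frequencies for each column of an ungapped alignment."""
    n = len(aligned_segments)
    return [{res: count / n for res, count in Counter(column).items()}
            for column in zip(*aligned_segments)]

def profile_score(segment, profile):
    """Sum of the profile frequencies of the segment's residues."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, segment))

# Hypothetical toy data: three true family members and two unrelated
# sequences that slipped above the inclusion threshold.
family = ["DRYLA", "DRYVA", "DRYIA"]
intruders = ["ERWLS", "EKWLS"]

tight = column_frequencies(family)
diluted = column_frequencies(family + intruders)

for name, segment in [("family member", "DRYLA"), ("unrelated", "EKWLS")]:
    print(f"{name}: tight={profile_score(segment, tight):.2f}, "
          f"diluted={profile_score(segment, diluted):.2f}")
```

Once unrelated sequences enter the profile, true members score lower and unrelated sequences score higher, so each iteration can drift further from the original family.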

Page 37: Bioinformatics approaches for…
Page 38: Bioinformatics approaches for…

So, is UL78 a GPCR? & if so, what sort?

Page 39: Bioinformatics approaches for…

What PSI-BLAST said (profile dilution in action)

Page 40: Bioinformatics approaches for…

What GeneQuiz said… a thrombin receptor

Page 41: Bioinformatics approaches for…

What GeneQuiz said later…

Page 42: Bioinformatics approaches for…

Overview of results: pair-wise & family-based methods

Page 43: Bioinformatics approaches for…

What is UL78?

Tool No hit Poor hit Significant hit

BLAST GPCRs in list

PSI-BLAST thrombin receptor; chemokine & opioid receptors

PROSITE profile GPCR

Pfam

PRINTS

Blocks-PRINTS GPCR

GeneQuiz thrombin receptor; C5A receptor

Bioinformatics tools, alone, cannot tell us!

Page 44: Bioinformatics approaches for…

So, beware top hits …but also beware bottom hits!

Let us now compare & contrast some InterPro results with those of its source dbs…

Page 45: Bioinformatics approaches for…

Rhodopsin-like superfamily GPCRs in InterPro 2005

IPR000276 GPCR_Rhodopsn 7752 proteins

PS50262 G_PROTEIN_RECEP_F1_2 7702 proteins

PF00001 7tm_1 7064 proteins

PS00237 G_PROTEIN_RECEP_F1_1 6527 proteins

PR00237 GPCRRHODOPSN 5821 proteins (don’t include partials)
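The spread between the methods is easier to see as fractions of the InterPro entry. The sketch below uses the 2005 counts listed above and assumes that each signature's matches are a subset of the proteins in IPR000276; the `coverage` helper is hypothetical.

```python
# 2005 protein counts from the slide.
INTERPRO_ENTRY_TOTAL = 7752  # IPR000276 GPCR_Rhodopsn

SIGNATURE_COUNTS = {
    "PS50262": 7702,  # PROSITE profile
    "PF00001": 7064,  # Pfam HMM
    "PS00237": 6527,  # PROSITE regex
    "PR00237": 5821,  # PRINTS fingerprint (partial matches not included)
}

def coverage(signature_counts, entry_total):
    """Fraction of the InterPro entry's proteins matched by each signature."""
    return {name: count / entry_total
            for name, count in signature_counts.items()}

fractions = coverage(SIGNATURE_COUNTS, INTERPRO_ENTRY_TOTAL)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of IPR000276")
```

These fractions say nothing about which matches are correct; a method that matches fewer proteins may simply be more conservative about what it accepts.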

Page 46: Bioinformatics approaches for…

Rhodopsin-like superfamily GPCRs in the source databases

Pfam FP ? FN ? U ? TP? 8776 matches 7064

PROSITE (profile) FP 3 FN 3 U 12 TP 1837 matches 7702

PROSITE (regex) FP 92 FN 261 U 0 TP 1530 matches 6527

PRINTS FP 0 FN ? U 0 TP 1154 matches 5821 >2165 updated

Page 47: Bioinformatics approaches for…

Rhodopsin-like superfamily GPCRs in InterPro 2007

IPR000276 GPCR_Rhodopsn 16,845 proteins

PS50262 G_PROTEIN_RECEP_F1_2 16,714 proteins

PF00001 7tm_1 15,712 proteins

PR00237 GPCRRHODOPSN 13,405 proteins

PS00237 G_PROTEIN_RECEP_F1_1 13,723 proteins

No human curator has time to validate all these matches…

Page 48: Bioinformatics approaches for…

14,615 rhodopsin-like superfamily GPCRs in Pfam?

Page 49: Bioinformatics approaches for…

ID   Q6NV75 PRELIMINARY; PRT; 609 AA.
AC   Q6NV75;
DT   05-JUL-2004 (TrEMBLrel. 27, Created)
DT   05-JUL-2004 (TrEMBLrel. 27, Last sequence update)
DT   05-JUL-2004 (TrEMBLrel. 27, Last annotation update)
DE   G protein-coupled receptor 153.
GN   Name=GPR153;
OS   Homo sapiens (Human).
OX   NCBI_TaxID=9606
RN   [1]
RP   SEQUENCE FROM N.A.
RC   TISSUE=Brain;
RA   Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,
RA   Jones S.J., Marra M.A.;
RT   "Generation and initial analysis of more than 15,000 full-length
RT   human and mouse cDNA sequences.";
RL   Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002).
RP   SEQUENCE FROM N.A.
RC   TISSUE=Brain;
RA   Strausberg R.;
RL   Submitted (MAR-2004) to the EMBL/GenBank/DDBJ databases.
DR   EMBL; BC068275; AAH68275.1; -.
DR   GO; GO:0004872
DR   InterPro; IPR000276; GPCR_Rhodpsn.
DR   Pfam; PF00001; 7tm_1; 1.
DR   PROSITE; PS50262; G_PROTEIN_RECEP_F1_2; 1.
KW   Receptor
SQ   SEQUENCE 609 AA; 65341 MW; E525CC7F60D0891C CRC64;
     MSDERRLPGS AVGWLVCGGL SLLANAWGIL SVGAKQKKWK PLEFLLCTLA ATHMLNVAVP
     IATYSVVQLR RQRPDFEWNE GLCKVFVSTF YTLTLATCFS VTSLSYHRMW MVCWPVNYRL
     SNAKKQAVHT VMGIWMVSFI LSALPAVGWH DTSERFYTHG CRFIVAEIGL GFGVCFLLLV
     GGSVAMGVIC TAIALFQTLA VQVGRQADHR AFTVPTIVVE DAQGKRRSSI DGSEPAKTSL
     QTTGLVTTIV FIYDCLMGFP VLVVSFSSLR ADASAPWMAL CVLWCSVAQA LLLPVFLWAC
     DRYRADLKAV REKCMALMAN DEESDDETSL EGGISPDLVL ERSLDYGYGG DFVALDRMAK
     YEISALEGGL PQLYPLRPLQ EDKMQYLQVP PTRRFSHDDA DVWAAVPLPA FLPRWGSGED
     LAALAHLVLP AGPERRRASL LAFAEDAPPS RARRRSAESL LSLRPSALDS GPRGARDSPP
     GSPRRRPGPG PRSASASLLP DAFALTAFEC EPQALRRPPG PFPAAPAAPD GADPGEAPTP
     PSSAQRSPGP RPSAHSHAGS LRPGLSASWG EPGGLRAAGG GGSTSSFLSS PSESSGYATL
     HSDSLGSAS
//

GPCR?

Pfam: match Q6NV75/24-297
PROSITE (profile): no match
PROSITE (regex): no match
PRINTS: no match
ClustalW: sequences too divergent to be aligned

false negative

Page 50: Bioinformatics approaches for…

Beware top & bottom hits …but also beware simplistic analysis tools coupled with wet experiments!

Let’s finally look at how hydropathy profiles can compel biologists to make strange deductions…

- & still get their results published in Science!
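For context, a hydropathy profile is nothing more than a sliding-window average over a per-residue scale. The sketch below uses the Kyte-Doolittle scale; the 19-residue window and the 1.6 cut-off are the values commonly quoted for spotting candidate membrane-spanning segments and are assumptions here, as are the function names and toy sequences.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence: str, window: int = 19):
    """Mean hydropathy of every full window along the sequence.

    Residues missing from the scale (e.g. X) are scored as 0.0.
    """
    values = [KYTE_DOOLITTLE.get(res, 0.0) for res in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def hydrophobic_windows(sequence: str, window: int = 19, threshold: float = 1.6):
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, mean in enumerate(profile) if mean > threshold]

# Hypothetical toy sequences: a poly-leucine stretch and a poly-aspartate one.
print(hydrophobic_windows("L" * 25))
print(hydrophobic_windows("D" * 25))
```

Seven hydrophobic stretches are consistent with a 7TM protein, but many unrelated membrane proteins also have hydrophobic helices, so such a profile on its own cannot diagnose a GPCR.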

Page 51: Bioinformatics approaches for…

GPCR?

Pfam: Lanthionine synthetase C-like protein
PROSITE (profile): no match
PROSITE (regex): no match
PRINTS: no match
ClustalW: sequences too divergent to be aligned

ID   Q9C929_ARATH Unreviewed; 401 AA.
AC   Q9C929;
DT   01-JUN-2001, integrated into UniProtKB/TrEMBL.
DT   01-JUN-2001, sequence version 1.
DT   24-JUL-2007, entry version 23.
DE   Putative G protein-coupled receptor; 80093-78432.
GN   Name=F14G24.19; OrderedLocusNames=At1g52920;
OS   Arabidopsis thaliana (Mouse-ear cress).
OC   Eukaryota; Viridiplantae; Streptophyta; ... Arabidopsis.
OX   NCBI_TaxID=3702;
RN   [1]
RP   NUCLEOTIDE SEQUENCE.
RA   Lin X., Kaul S., Town C.D., Benito M., Creasy T.H., Haas B.J., Wu D.,
RA   Maiti R., Ronning C.M., Koo H., Fujii C.Y., Utterback T.R.,
RA   Barnstead M.E., Bowman C.L., White O., Nierman W.C., Fraser C.M.;
RT   "Arabidopsis thaliana chromosome 1 BAC F14G24 genomic sequence.";
RL   Submitted (DEC-1999) to the EMBL/GenBank/DDBJ databases.
RN   [2]
RP   NUCLEOTIDE SEQUENCE.
RA   Town C.D., Kaul S.;
RL   Submitted (JAN-2001) to the EMBL/GenBank/DDBJ databases.
DR   EMBL; AC019018; AAG52264.1; -; Genomic_DNA. [EMBL / GenBank / DDBJ]
DR   PIR; E96570; E96570.
DR   UniGene; At.66935; -.
DR   GenomeReviews; CT485782_GR; AT1G52920.
DR   KEGG; ath:At1g52920; -.
DR   TAIR; At1g52920; -.
DR   GO; GO:0004872; F:receptor activity; IEA:UniProtKB-KW.
DR   InterPro; IPR007822; LANC_like.
DR   InterPro; Graphical view of domain structure.
DR   Pfam; PF05147; LANC_like; 1.
KW   Receptor.
SQ   SEQUENCE 401 AA; 45284 MW; C9D3BF8CC8F0FE0B CRC64;
     MPEFVPEDLS GEEETVTECK DSLTKLLSLP YKSFSEKLHR YALSIKDKVV WETWERSGKR
     VRDYNLYTGV LGTAYLLFKS YQVTRNEDDL KLCLENVEAC DVASRDSERV TFICGYAGVC
     ALGAVAAKCL GDDQLYDRYL ARFRGIRLPS DLPYELLYGR AGYLWACLFL NKHIGQESIS
     SERMRSVVEE IFRAGRQLGN KGTCPLMYEW HGKRYWGAAH GLAGIMNVLM HTELEPDEIK
     DVKGTLSYMI QNRFPSGNYL SSEGSKSDRL VHWCHGAPGV ALTLVKAAQV YNTKEFVEAA
     MEAGEVVWSR GLLKRVGICH GISGNTYVFL SLYRLTRNPK YLYRAKAFAS FLLDKSEKLI
     SEGQMHGGDR PFSLFEGIGG MAYMLLDMND PTQALFPGYE L
//

Page 52: Bioinformatics approaches for…

Remember: computers don’t do biology!

[Figure: GPCR signalling pathways (repeated from Page 10)]

They do sums (quickly) & crude string matching

Page 53: Bioinformatics approaches for…

Seeking deeper functional insights

Attwood, TK, Croning, MD & Gaulton, A (2002) Deriving structural and functional insights from a ligand-based hierarchical classification of G protein-coupled receptors. Protein Eng., 15, 7-12.

• S’family, family & subtype motifs have different locations
• If s’family motifs define the common scaffold, hypothesis:
  – family motifs relate to ligand binding?
  – subtype motifs relate to G protein coupling?
  – powerful tools for subtyping & potentially de-orphaning GPCRs

Page 54: Bioinformatics approaches for…

Locations of ligand-binding residues & motif distribution

Page 55: Bioinformatics approaches for…

Locations of G protein-coupling residues & distribution of motifs

[Figure legend: subtype motifs & # of fingerprints mapping to each region; G protein coupling regions & # of families mapping to each region]

Page 56: Bioinformatics approaches for…

Seeking deeper functional insights?

Attwood, TK, Croning, MD & Gaulton, A (2002) Deriving structural and functional insights from a ligand-based hierarchical classification of G protein-coupled receptors. Protein Eng., 15, 7-12.

• Clearly, many family- & subtype motifs are simply in the ‘wrong’ place for the initial hypothesis to be true

[Figure labels: Muscarinic receptors; Muscarinic receptor M5; GPCR superfamily]

Page 57: Bioinformatics approaches for…

Refining the hypothesis

• Besides, it’s not that simple
  – only part of the answer
• Need to consider that GPCRs don’t function in isolation
  – their functions are modulated via interactions with other proteins
• Also, the phenomenon of dimerisation challenges the view of the GPCR monomer as functional unit
  – many GPCRs exist as homo- & heterodimers
• Such observations demand a more systematic analysis of motifs & their likely functional roles

Page 58: Bioinformatics approaches for…

Oligomerisation & protein-protein interaction residues/regions

A pilot study with adrenergic, bradykinin & dopamine receptors

[Figure legend: family-level motifs; subfamily-level motifs; residues involved in oligomerisation; residues involved in protein-protein interaction; residues involved in G protein coupling; residues involved in ligand binding]

Page 59: Bioinformatics approaches for…

Where next?

• Based on location, some family-level motifs couldn’t be involved in ligand binding & some subtype-level motifs couldn’t be involved in G protein coupling
  – clearly, 3D location must be taken into account
    • functional correlations would then be stronger
• The remaining motifs are likely to be involved in other molecular interactions
  – e.g., dimerisation, effector proteins… (early results promising)
    • this will help us to build a knowledge-based system to help suggest the likely functional roles for family- & subtype-level motifs in future

Page 60: Bioinformatics approaches for…

Conclusions

• There are many barriers to success for the jobbing bioinformatician, e.g.:
  – not fully understanding the processes we’re trying to model & predict (e.g., protein folding)
  – the dynamic nature of biological data
  – not having been rigorous in the way we define &/or describe biology/biological processes in the literature
  – the volume of data, data heterogeneity
  – maintenance of data, propagation of errors…
• Possibly the largest hurdle is that computers are number crunchers
  – they don’t do biology, & trying to teach them is hard
  – & the harder we try, the clearer it is how naïve we’ve been

Page 61: Bioinformatics approaches for…

Conclusions

• In silico functional annotation requires several dbs to be searched & several tools to be used
  – different methods provide different perspectives
  – dbs aren’t complete & their contents don’t fully overlap
• The more dbs searched, the harder it is to interpret results
• The more computers are involved in automating annotation, the greater the need for collaboration
  – especially between s/w developers, annotators & ‘wet’ experimentalists
• The more data we have, the more rigorous we must be in thinking/writing if we are to make sense of the complexities

Page 62: Bioinformatics approaches for…

Conclusions

Flower, DR & Attwood, TK (2004) Integrative bioinformatics for functional genome annotation: trawling for G protein-coupled receptors. Semin. Cell Dev. Biol., 15(6), 693-701.

• For GPCRs, there are many analysis tools available
  – BLAST, FastA, family databases, modelling tools, etc.
• We must understand the limitations of the methods
  – no method is infallible or able to replace the need for biological validation
  – use all available resources & understand their problems – none is best!
• Used wisely, bioinformatics tools are useful
  – BLAST/FastA offer broad brush strokes, motif-methods add fine detail
  – together, they facilitate receptor characterisation & prediction of ligand specificity, & allow identification of novel ligand-binding, G protein-coupling or other likely molecular interaction motifs
• We are a long way from having reliable tools for deducing GPCR function & structure from sequence
  – but with the right approach, there is hope

Page 63: Bioinformatics approaches for…