Update Susan Bridges, Fiona McCarthy, Shane Burgess NRI 2006-04846

Upload: valentine-king

Post on 18-Jan-2016


Page 1: Update Susan Bridges, Fiona McCarthy, Shane Burgess NRI 2006-04846

Update

Susan Bridges, Fiona McCarthy, Shane Burgess

NRI 2006-04846

Page 2:

1. Some of what we’ve been doing: confirmation of predicted/hypothetical proteins in chicken.

2. Something of more interest to almost everyone in here, for analyzing your data.

Page 3:

Educate researchers who need to use GO.

University of Delaware, 12-13 November 2007.

… currently working with researchers from the Universities of Delaware and Maryland to provide the GO annotations needed to publish array data.

First residential workshop at MSU, 20-22 May 2008.

Page 4:

Avian Genome Conference, 18-20 May 2008

GO Annotation Jamboree, 21-22 May 2008

Page 6:

“Hypothetical” and “predicted” proteins

Naive and activated purified CD4+ T cells; transformed CD4+ T cells; spleen; brain tissues; bursal B and stromal cells; muscle; and serum.

Database of all predicted proteins, from chicken build 2.1, using DFF-2D LC MS2 and our computational pipeline.

Experimentally-confirmed 7,809 chicken predicted proteins: 52% were expressed in more than one tissue.

6,027 (77%) of these proteins mapped to human and mouse orthologs and we assigned standardized nomenclature to 5,326 (64%).

8,213 GO associations to 21% of the identified chicken proteins, using the ISS evidence code to transfer function between human-chicken and human-mouse orthologs; this increased the current chicken GO annotations by 8% and doubled the number of manually curated chicken annotations.

In PRIDE and NCBI databases and being used at NCBI to promote XP (computational model) to NP (confirmed product) accessions i.e. the words “hypothetical” and “predicted” are removed.

We also add experimentally-derived cell component GO annotations.

Page 7:

Tissue distribution of expressed ‘predicted’ proteins

[Pie chart. Categories: in one tissue; in two tissues; in three tissues; in four tissues; in five tissues; in six tissues; in seven tissues; in all eight tissues. Slice labels as extracted: 48% (3,779); 1% (61); 4% (313); 7% (561); 26% (2,020); 14% (1,073); 0% (0); 0% (2).]

[Bar chart. Y-axis: Number of proteins (0 to 6000). X-axis: Tissue type (Spleen; UA01; Stroma; T cells; B-cells; Serum; Muscle; Brain). Series: Tissue specific proteins; Proteins identified in other tissues.]

Page 8:

chicken: human/mouse orthologs (1:1)

[Venn diagram. Sets: Mouse orthologs; Human orthologs. Region counts as extracted: 236; 5,685; 106.]

No human or mouse orthologs: 1,784

Page 11:

Cumulative external visits to AgBase

[Line chart. Y-axis: 0 to 10,000 visits. X-axis: months, July 2005 to December 2007.]

Page 14:

Summary of GO annotations for last 12 months

11,716 GO annotations for chicken & cow:

• 214 cow gene products GO annotated (1,521 GO annotations)

• 1,762 chicken gene products GO annotated (10,194 GO annotations)

• in addition, orthology with human and mouse genes used to GO annotate 7,809 computationally ‘predicted’ chicken proteins (8,213 GO annotations)

Page 15:

Annotation metrics

Page 17:

Database distribution of AgBase GO Annotations

[Charts. Panels: Chicken Dec '07; Cow Dec '07. Categories: AgBase Community file; GO Consortium file.]

Page 18:

GO Annotation of Arrays

Page 19:

Genomic Annotation

• Structural annotation, including Sequence Ontology

• Functional annotation using Gene Ontology

• Nomenclature (species’ genome nomenclature committees)

• Other annotations using other bio-ontologies, e.g. Anatomy Ontology

Page 20:

Quality improvement of annotations

[Panels: Pre-annotation; Re-annotation]

Page 21:

GO annotation of arrays.

[Flowchart, top to bottom:]

1. Array IDs → structural mapping → Gene product IDs (‘known’ genes from public databases; ‘predicted’ genes from genome sequencing)

2. Is functional literature available? (YES / NO). YES: GO annotation of literature

3. Are strict mammalian orthologs available? (YES / NO). YES: GO annotation from orthologs (ISO)

4. Electronic GO annotation using InterPro data (IEA)

5. Collate GO annotations; link to array IDs (updateable)

6. Submit to EBI-GOA, GOC
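The decision points in this flowchart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not show which box each NO branch leads to, so the ordering below (literature first, then orthologs, then InterPro-based IEA as the fallback) is assumed, and the names `GeneProduct`, `choose_annotation_route`, and `route_array_ids` are hypothetical, not part of AgBase.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GeneProduct:
    """Hypothetical record for one gene product reached by structural mapping."""
    product_id: str
    has_functional_literature: bool
    has_strict_mammalian_ortholog: bool


def choose_annotation_route(gp: GeneProduct) -> str:
    """Pick an annotation route for one gene product.

    Assumed ordering (not confirmed by the slide):
    literature first, then orthologs (ISO), then InterPro-based IEA.
    """
    if gp.has_functional_literature:
        return "literature"
    if gp.has_strict_mammalian_ortholog:
        return "ISO"
    return "IEA"


def route_array_ids(
    array_to_product: Dict[str, str],
    products: Dict[str, GeneProduct],
) -> Dict[str, Optional[str]]:
    """Map each array ID to a route, or None if it has no gene product.

    Keeping the array-ID link in a separate mapping mirrors the
    'link to array IDs (updateable)' box: the mapping can be refreshed
    without touching the annotations themselves.
    """
    routes: Dict[str, Optional[str]] = {}
    for array_id, product_id in array_to_product.items():
        gp = products.get(product_id)
        routes[array_id] = choose_annotation_route(gp) if gp else None
    return routes


# Made-up identifiers, for illustration only.
products = {
    "P1": GeneProduct("P1", True, True),
    "P2": GeneProduct("P2", False, True),
    "P3": GeneProduct("P3", False, False),
}
array_to_product = {"A1": "P1", "A2": "P2", "A3": "P3", "A4": "P_missing"}
routes = route_array_ids(array_to_product, products)
```

Array IDs that do not resolve to any gene product come back as `None`, which corresponds to the array IDs that map only to ESTs with no associated GO.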

Page 22:

AgBase: annotating arrays

1. Del-Mar 14K Chicken Integrated Systems microarray (GPL1731)

• 14,053 chicken genes represented

• 9,587 contigs GO annotated (CC: 3,514; MF: 6,640; BP: 4,623)

• 3,101 singletons GO annotated (CC: 487; MF: 881; BP: 646)

• many singletons map to chicken ESTs with no associated GO

Page 23:

metabolic process

transport

cell communication

development

immune response

cell death

cell differentiation

response to stress

sensory perception

cell motility

regulation of biological process

cellular organization and biogenesis

behavior

response to chemical stimulus

process unknown

Figure 1A: Biological Process associated with Del-Mar 14K array

Page 24:

Relative amount of GO BP associated with Del-Mar 14K array compared to total chicken GO.

[Bar chart. Y-axis: Array GO / total chicken GO (-6.0 to 6.0). X-axis: GO Biological Processes, left to right: development; immune response; cell death; response to stress; process unknown; cell motility; cell differentiation; behavior; transport; regulation of biological process; sensory perception; response to chemical stimulus; secretion; cellular organization and biogenesis; response to stimulus; metabolic process; cell communication.]

Page 25:

AgBase: annotating arrays

2. TAMU Agilent 44K chicken array

• approx 44,000 chicken genes represented

• added GO annotation for 8,731 chicken gene products

• many of the array IDs with no associated GO annotation map to chicken EST sequences

Page 26:

AgBase: annotating arrays

3. FHCRC Chicken 13K v2.0 (GPL1836)

• 13,007 chicken genes represented

• 2,491 array IDs mapped to chicken gene products & GO annotated

• 628 mapped to chicken gene products with no GO

• approx 2,000 array IDs mapped to human or mouse gene products with GO annotation

Page 27:

GO Annotation Quality Score: “GAQ”

GAQ: no. annotations; DAG depth; GO evidence code

• calculate overall GAQ score for any dataset (e.g. array)

• calculate GAQ for subsets (e.g. biological processes studied using arrays)
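A minimal sketch of how a score built from these three ingredients might be computed. The slide names the inputs (number of annotations, DAG depth, evidence code) but not the formula, so everything specific here is an assumption: each annotation is scored as depth times an evidence-code weight, scores are summed per gene product and per dataset, and the weights in `EVIDENCE_WEIGHT` are invented for illustration and are not AgBase's values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical weights: experimental codes highest, ND lowest.
# These numbers are illustrative assumptions only.
EVIDENCE_WEIGHT: Dict[str, int] = {
    "IDA": 5, "IMP": 5, "IGI": 5, "IPI": 5, "IEP": 5,
    "TAS": 4, "IC": 4,
    "ISS": 3, "RCA": 3,
    "NAS": 2, "IEA": 2,
    "ND": 1,
}

# One annotation: (gene product ID, depth of the GO term in the DAG, evidence code)
Annotation = Tuple[str, int, str]


def annotation_score(dag_depth: int, evidence_code: str) -> int:
    """Score one annotation as depth x evidence weight (assumed formula)."""
    return dag_depth * EVIDENCE_WEIGHT[evidence_code]


def gaq_by_product(annotations: Iterable[Annotation]) -> Dict[str, int]:
    """Sum annotation scores per gene product, so more annotations score higher."""
    scores: Dict[str, int] = defaultdict(int)
    for product_id, dag_depth, evidence_code in annotations:
        scores[product_id] += annotation_score(dag_depth, evidence_code)
    return dict(scores)


def total_gaq(annotations: Iterable[Annotation]) -> int:
    """Overall score for a dataset (e.g. every gene product on an array)."""
    return sum(gaq_by_product(annotations).values())


# Made-up example: one shallow electronic annotation versus
# two deeper annotations, one of them experimental.
example = [
    ("geneA", 2, "IEA"),
    ("geneB", 6, "IDA"),
    ("geneB", 4, "ISS"),
]
per_product = gaq_by_product(example)
overall = total_gaq(example)
```

Scoring a subset, such as the gene products annotated to one biological process, only requires filtering the annotation list before calling `total_gaq`.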

Page 28:

“Gene Ontology”; “Biological Process”

Evidence Code

IEA: inferred from electronic annotation
ISS: inferred from sequence similarity
IMP: inferred from mutant phenotype
IGI: inferred from genetic interaction
IPI: inferred from physical interaction
IDA: inferred from direct assay
IEP: inferred from expression pattern
TAS: traceable author statement
NAS: non-traceable author statement
ND: no biological data available
RCA: inferred from reviewed computational analysis
IC: inferred by curator

Your Favorite Gene

Low GAQ score

Your NEW Favorite gene

High GAQ score

Page 29:

Quantification of re-annotation

Metrics

Granularity: # previous annotations; # re-annotations

Specificity: # chicken annotations; # human/mouse annotations

Quality: GO Annotation Quality (GAQ) score

Page 30:

[Bar chart. Y-axis: Number of annotations (0 to 4,500). X-axis: Annotation type (Whole Array; Chicken; Human/Mouse). Series: Pre-annotation; Re-annotation. Group labels: GRANULARITY; SPECIFICITY. Callouts: 300% increase; 50% increase; 700% increase.]

• 13% of previous annotations to other species were corrected to chicken-specific annotations

Bart van den Berg, CVM MSU / Sue Lamont and Huaijun Zhu

Page 31:

GAQ score summary

Metric                        Pre-annotation   Re-annotation   Fold difference
Depth                                 87,250         231,184               2.7
Confidence score total                39,355         108,537               2.8
Total # proteins (Breadth)               886           4,240               4.8
Total GAQ score                      207,869         579,599               2.8
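The fold differences in this summary appear to be the re-annotation value divided by the pre-annotation value. That reading is an assumption, since the slide does not state the formula; the short check below recomputes the ratios from the slide's own numbers. The computed values may differ from the slide's rounded figures in the last digit: Depth, for instance, comes out at about 2.65 against the slide's 2.7.

```python
# (pre-annotation, re-annotation) pairs taken from the GAQ score summary slide.
summary = {
    "Depth": (87_250, 231_184),
    "Confidence score total": (39_355, 108_537),
    "Total # proteins (Breadth)": (886, 4_240),
    "Total GAQ score": (207_869, 579_599),
}


def fold_difference(pre: float, post: float) -> float:
    """Ratio of the re-annotation value to the pre-annotation value (assumed definition)."""
    return post / pre


folds = {name: fold_difference(pre, post) for name, (pre, post) in summary.items()}

for name, value in folds.items():
    print(f"{name}: {value:.2f}")
```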

Page 32:

Quality improvement of annotations

[Panels: Pre-annotation; Re-annotation]

Page 33:

GO biological process annotations

[Bar chart. X-axis: GO Term. Y-axis: Relative difference, microarray GO / total chicken GO (-6 to 6).]

GO Term                                                                    Relative difference
cell communication                                                                       -4.88
metabolic process                                                                        -3.61
catabolic process                                                                        -1.80
transport                                                                                -0.75
regulation of biological process                                                         -0.04
macromolecule metabolic process                                                           0.18
biological_process                                                                        0.33
cell motility                                                                             0.46
response to stimulus                                                                      1.04
nucleobase, nucleoside, nucleotide and nucleic acid metabolic process                     1.06
cell differentiation                                                                      1.26
cell death                                                                                1.64
multicellular organismal development                                                      5.12

Page 34:

Modeling using the GO

Functional Understanding

Implied; Derived

Physiology (= Cellular Component + Biological Process + Molecular Function)

Network Modeling; Gene Ontology

(interactions)

Page 35:

Hypothesis-driven GO-based data interrogation

Buza, J. J. and S.C. Burgess. Modeling the proteome of a Marek's disease transformed cell line: a natural animal model for CD30 over-expressing lymphomas. Proteomics, 2007. 7:1316-26.

Page 36:

Avian Genome Conference, 18-20 May 2008

GO Annotation Jamboree, 21-22 May 2008