structural and functional genomics in grapevine through flagdb · padova university 2.5 x udine...

33
Structural and Functional Genomics in Grapevine through FLAGdb ++ Sandra Dèrozier, Cécile Guichard, Franck Samson, Jean-Philippe Tamby, Véronique Brunaud, Vincent Thareau, Christophe Caron, Maria Tchoumakov, Roberto Bacilieri, Anne-Françoise Adam-Blondon and Sébastien Aubourg

Upload: others

Post on 11-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Structural and Functional Genomics

in Grapevine through FLAGdb++

Sandra Dèrozier, Cécile Guichard, Franck Samson, Jean-Philippe Tamby, Véronique Brunaud,

Vincent Thareau, Christophe Caron, Maria Tchoumakov, Roberto Bacilieri, Anne-Françoise Adam-Blondon

and Sébastien Aubourg

Page 2: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Why genomics on Vitis vinifera ?

4 106 hl

0,5 109 €

15 106 hl

6 109 €

Fungicides

5000

10000

15000

20000

25000

30000

35000

40000

45000

1992 1993 1994 1995 1996 1997 1998 1999

To

nn

es

Cereals

Grape

Other crops

2n = 38

475 Mb

Production

Export

106 hl

2005 2007

0

10

20

30

40

50

60

Franc

eIta

ly

Spa

inUSA

Arg

entin

a

Aus

tralia

Chi

na

Ger

man

y

Sou

th A

frica

Por

tuga

l

Chi

le

Rom

ania

Page 3: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Genome sequencing

Génoscope 7 X

INRA

Padova University 2.5 X

Udine University 2.5 X

Milano University

France Italy

Franco-Italian consortium

Coord : A.-F. Adam-Blondon

8 X assembly (Jaillon et al., Nature 2007)

12 X assembly (2009)

Pinot Noir PN40024 (93% homozygote)

Sanger after WGS

Page 4: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Genome annotation strategy

Ab initio predictions

coding potential, signals

ATAACATGATCATGTTAAACGTTGTA

ATCAAAACTGATTAAAAAGAACTAAT

GTCGTGATGTCTGCTTTAAAATCTCA

GAAGTGTGGAGAGAGGATTCAAGGG

TAAACCATACCCATGAGGCTGGTGAA

AATGGTACTATCAGCTACATTGACGC

AGGATCCAAGCAAGCTTATTCAGCTT

GACCTGCATCACCCGTTATTCATGAC

AACTGGAGGAAGCCGATACAGACGT

scaffolds

Sequence comparisons

similarities

Spliced alignments of

transcripts

Masking of repeat sequences

Integration and reconciliation

predicted gene models

training set EMBL/GenBank/DDBJ cDNA/EST resources

SpliceMachine

EuGène IMMBLASTX GenomeThreader

EuGène

GeneID

GlimmerHMMEXOFISH GeneWise

EST2Genome

GAZE

Page 5: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Training, evaluation and results

1

2

3

Training of EuGène (MM) and SpliceMachine

EuGène (combiner) optimization

Prediction evaluation

600

genes

EuGène requires

high computation facilities:

(MIG)

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90 100

Sensitivity

Sp

ecif

icit

y

Scale : gene

Mode :

- with integration

- ab initio pure

EuGène predicts 44 414 coding genes in the 12x PN40024 genome

(GAZE : 26 347 genes)

4500 splicing sites

Page 6: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Vitis vinifera in FLAGdb++

Relational database and JAVA application dedicated to integration and

exploration of genomic data (experimental or predicted) in order to help

the functional study of plant genes

EuGène

PFAM

GAZE

repeats

SNP probe

EST+

cDNA

Page 7: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Vitis vinifera in FLAGdb++

Easy access to gene families or large groups of genes

Page 8: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Vitis vinifera in FLAGdb++

TargetP-ChloroP

WolfPSORT

Predotar

TMhmm

- Phylogenetic profiles

- Predicted subcellular localization

- Functional annotations

- Retrieve of sequences (promoters, proteins…) in batch

and numerous other tools…

Page 9: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Vitis vinifera in FLAGdb++

New tool dedicated to orthology relationships between the 4 plant genomes hosted in the database

- Graphical display of

Reciprocal Blast Hits

- Comparison of gene

structures and

promoters

- Global protein

alignments

Page 10: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

From grape genome to wine flavours...

CCAGTGTGCAGCTTGCAGGCGCCAAACGTGATGCCTTGCTTCTCAGTTTCAAAGACGCTAAGC

TGTCTGTGGTGGAGTATGACCCAGGCACTCATGACCTGAAGACCCTGTCTCTACACTACTTTG

AGGAGCCTGAACTTCGGGATGGGTTTGTGCAGAACGTGCATACGCCTCGTGTGCGGGTGGACC

CTGACGGGCGCTGTGCAGCAATGCTCATCTATGGCACGAGGCTGGTAGTTCTGCCTTTCCGCA

GAGAGAGCCTGGCTGAGGAACATGAGGGGCTCATGGGCGAAGGGCAGAGGTCTAGCTTCTTGC

CCAGCTACATCATTGATGTACGGGCTCTGGATGAGAAACTACTCAACATCATTGACCTGCAGT

TCCTGCATGGCTACTATGAGCCCACCCTGCTTATCCTGTTTGAGCCCAACCAGACTTGGCCAG

GGCGGGTGGCTGTGAGGCAGGACACGTGCTCCATTGTGGCTATCTCGCTGAACATCACACAGA

AAGTCCATCCAGTCATCTGGTCCCTCACCAGCTTGCCTTTTGACTGTACCCAGGCCCTGGCTG

TGCCCAAGCCCATAGGTGGGGTAGTGATCTTCGCTGTCAACTCACTGTTGTACCTGAACCAGA

GTGTTCCCCCATATGGTGTGGCTCTCAATAGTCTCACCACAGGCACCACCGCTTTCCCATTGC

GTACCCAAGAGGGTGTACGGATCACCCTAGACTGCGCACAGGCGGCCTTCATCTCTTATGACA

AGATGGTCATCTCCCTCAAGGGTGGTGAGATTTATGTGCTGACCCTCATCACTGACGGCATGC

GAAGTGTCCGAGCGTTTCACTTTGACAAGGCAGCTGCTAGTGTCCTTACCACCAGTATGGTCA

CAATGGAGCCCGGATACCTGTTCCTAGGCTCTCGCCTGGGCAATTCCCTCCTCCTCAAGTACA

CGGAGAAGCTGCAGGAGCCCCCAGCCAGCTCCGTCCGTGAGGCTGCTGACAAGGAAGAGCTGC

a gene family story

Many volatile organic compounds wine flavour (taste and aroma)

skin

flesh

Stored as sugar or amino acid

conjugates in vacuoles

Volatilization of compounds

essential for the flavour

perception

... ...

glycosidases

&

peptidases

from grape, yeast

and/or industrial

enzymes

Lund and Bohlmann (2006) Science

Page 11: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

The terpenoid world

Gewürztraminer

linalool

geraniol

nerol

citronellol

-terpineol

Distinctive floral

character

Girard et al. (2002) Am. J. Enol. Vitic.

Cabernet Sauvignon

anthers and pollen grains

(+)-valencene

(-)-7-epi- -selinene

Protect reproductive tissues against pathogens

Allow pollinators to locate flowers

Martin et al. (2009) PNAS

(22 000 different molecules !)

Page 12: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

aroma

plant-pathogen relationships

plant-pathogen relationships

gibberellic acid

plant-pathogen relationships

isopentenyldiphosphate

C5

The terpenoid biosynthesis : TPS family

Page 13: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Expert annotation process

Known TPS proteins (Arabidopsis, Pinus, Citrus…)

Vitis 12x genome

PN40024

TBLASTN

152 loci TPS-like

Clean and exhaustive structural annotation of the Vitis TPS gene family- Detect and correct erroneous annotations and missing genes

- Discriminate between pseudogenes, partial and complete genes

GAZE predictions

EUGENE predictions

PFAM motifs

EST

cDNA

BLASTX results

TPS knowledge

From

GB/EMBL

&

URGV

Page 14: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

The Vitis TPS family

152 loci homologous to known TPS :

53 full and perfect TPS genes

16 complete TPS genes with only one punctual problem of sequence

(sequencing error ?)

20 partial TPS genes disrupted by unsequenced gap in the 12x assembly

63 pseudogenes with numerous frameshifts, stop codons and/or large deletion(s)

Arabidopsis Rice Poplar

32 full TPS

8 pseudogenes

48 TPS loci

including pseudogenes

45 TPS loci

including pseudogenes

and partial genes

53

89

functional

TPS

GAZE fails to detect 54 of them

EUGENE fails to detect 12 of them GAZE predicts perfectly 12 TPS (23%)

EUGENE predicts perfectly 43 TPS (81%)

Page 15: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Topological organization of the family

690 Kb

20

25

129 out of the 152 TPS loci (85%) are tandemly

organized in 13 clusters from 2 to 45 genes.

Page 16: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Vitis TPS classification

high diversity

of monoterpenes

in PN40024 ?

Page 17: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Geranyl diphosphate synthases (GPPS) family

In Arabidopsis thaliana :

homodimeric GPPS

In Mentha piperita and Clarkia breweri :

heterodimeric GPPS

LSU + SSU

GSVIVP00014674001

GSVIVP00019758001

GSVIVP00025000001

Mentha GPPS LSU

645

477

Clarkia GPPS SSU

GSVIVP00027313001

Mentha GPPS SSU

Antirrhinum GPPS SSU

1000

888

1000

GSVIVP00036188001

GSVIVP00008098001

1000

685

GSVIVP00027948001

Arabidopsis GPPS

GSVIVP00005765001

GSVIVP00022956001

996

1000

1000

1000Probable SSU

SSU

LSU

high diversity

of monoterpenes

In PN40024, there are orthologous

genes for the 2 forms of GPPS

Page 18: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

TPS gene structures

predicted plastidial targeting peptide

mono- and diterpene synthases

RRx(8)W motif

isomerization-cyclization reaction

DDxxD motif

metal ion binding for cleavage

of prenyl diphosphate substrates

Page 19: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Functional characterization of VvTPS genes

(+)-α-phellandrene 40 %

myrcene 15 %

terpinolene 15 %

α-terpinene 7 %

γ-terpinene 6 %

(+)-limonene 5 %

(+)-α-pinene 3 %

(+)-β-phellandrene 3 %

(+)-α-terpineol 3 %

(E)-β-ocimene 2 %

(+)-α-pinene 47 %

(+)-limonene 25 %

(+)-camphene 9 %

myrcene 6 %

(+)-α-terpineol 5 %

(+)-sabinene 3 %

(3S)-linalool 2 %

(+)- -pinene 1 %

(+)-α-phellandrene 1 %

(+)-α-thujene 1 %

α-terpinolene 1 %

(E)-β-ocimene 98 %

(Z)-β-ocimene 2 %

(E)-β-ocimene 53 %

myrcene 42 %

β-pinene 3 %

limonene 2 %

(3R)-linalool 100%

(E)-β-ocimene 100%

(E,E)-α-farnesene 100%

GDP

FDP

1. cDNA cloning of TPS

2. Expression in E. Coli

3. Semi-purification

4. In vitro enzyme assay

5. GC-MS

6. Product identification

Expert annotations, in

silico predictions, protein

sequence analyses are

driving the experimental

characterization of TPS.

Page 20: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Other wine constituents…

from Velasco et al.

(2007) PLOS One

STS CHS

Resveratrol is involved in health benefits : ‘French paradox’

Prevents cancer and cardiovascular disease.

Stilbenes are also known as fungicides.

Colour and astringent

features of wines

48 Stilbene Synthase genes !

Page 21: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Acknowledgements

Franck Samson

Cécile Guichard

Sandra Dèrozier

Jean-Philippe Tamby

Véronique Brunaud

Christophe Caron

Philippe Hugueney

French-Italian Public Consortium for Vitis

Anne-Françoise Adam-Blondon

Olivier Jaillon, Patrick Wincker et al.

Joerg Bohlmann

Diane Martin

Jérome Gouzy

Thomas Schiex

http://urgv.evry.inra.fr/FLAGdb

[email protected]

[email protected]

Toulouse

Page 22: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F
Page 23: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

S. Aubourg et al. (URGV, Evry)

S. Rombauts et al. (PSB, Gent)

J. Gouzy et al. (LIPM, Toulouse)

J. Amselem et al. (URGI, Versailles)

...

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90 100

Sensitivity

Sn=67 %

Sp=56 %

Scale : gene

Mode : ab initio pureFgenesP

GenScan

GlimmerA

GeneMark.hmm

FgeneSH

EuGène

Sp

ecif

icit

y

Why EuGène ?

An eukaryotic gene finder that combines multiple sources of evidence.

T. Schiex et al. (BIA, INRA Toulouse)

http://eugene.toulouse.inra.fr/

Page 24: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Gene training set

1 2 3 4

1 2 3 4

1

23

41

2 3 4

CDS5’ 3’

GC content

0

2

4

6

8

10

12

14

16

Set of 600 experimentally characterized complete genes (full-length mRNA cognate to genomic DNA) ‘representative’ of the whole genome + 4500 splicing sites

Arabidopsis

Populus

Vitis

Oryza

GC%

%

Page 25: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Integration and optimization steps

Tested formulas (plug-ins):

-1- IMM + SpliceMachine

-2- IMM + SpliceMachine + BLASTX

-3- IMM + SpliceMachine + BLASTX + TBLASTX

-4- IMM + SpliceMachine + BLASTX + TBLASTX + GenomeThreader

-5- IMM + SpliceMachine + BLASTX + GenomeThreader

- SwissProtproteomes from :- Arabidopsis- Oryza- Poplar- Medicago

Poplar

genome

EST

cDNA

9 % Arabidopsis

7 % Medicago

6 % Oryza

78 %

Poplar

Page 26: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

IMM+SM

IMM+SM+BLASTX

IMM+SM+BLASTX+TBLASTX

IMM+SM+BLASTX+GT

IMM+SM+BLASTX+TBLASTX+GT

0

10

20

30

40

50

60

70

80

90

100

0 10 20 30 40 50 60 70 80 90 100

Sensibility

Sp

ecif

icit

y

EUGENE training results

Page 27: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

0

10

20

30

40

50

60

50

100

150

200

250

300

350

400

450

500

550

600

650

700

750

800

850

900

950

1000

1050

1100

1150

1200

1250

1300

1350

1400

1450

1500

Intron size (training set)

% introns

size(bp)

86876 introns 157 4 256

68705 introns 360 7 774

2329 introns 716 17 451

Mean max

148087 introns 376 29 834

Long introns (12% include LINES elements)

Page 28: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

0

01-30

31-60

61-90

>90

0

2000

4000

6000

8000

10000

12000

001-30

31-6061-90

>90

EuGène results

CDS

number

Page 29: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Out of 152 TPS loci :

GAZE fails to detect 54 of them (including 6 full TPS genes)

EUGENE fails to detect 12 of them

Out of 53 complete and perfect TPS genes:

(partial and disturbed genes can not be well predicted)

GAZE predicts perfectly 12 TPS 23%

EUGENE predicts perfectly 43 TPS 81%

EuGène vs GAZE (TPS family)

Page 30: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

TPS classification

TPS-a Sesquiterpene and Diterpene Synthases

TPS-b Monoterpene Synthases

TPS-c Diterpene Synthases (copalyl diphosphate synthases)

TPS-d TPS from conifers

TPS-e Diterpene Synthases (ent-kaurene synthases)

TPS-f Monoterpene Synthases (linalool synthases)

TPS-g unknown TPS

Aubourg et al. (2002)

Mol. Genet. Genomics

Page 31: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Functional characterization of TPS genes

Expression of recombinant TPS in E. Coli

semi-purification

In vitro enzyme assay Buffer and substrate selection is driven

by in silico functional predictions

GC-MS

Identification of terpenoid products

cDNA cloning of vitis TPS

Stem, leaf, berry, root and flower tissues Methyl Jasmonate

mRNA extraction

PCR with primers selected from annotated genomic sequences

Gewürztraminer

Pinot Noir

Cabernet Sauvignon

Dr Diane Martin (UBC)

Page 32: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

newSTS7

newSTS8

GSVIVG00010585001

GSVIVG00010580001

Vst1 Vst2

(Vst3)

(Vst3)Vst2

Vst1

GSVIVG00010584001

GSVIVG00010579001

Copia-like

Copia-like

Unsequenced region…

Page 33: Structural and Functional Genomics in Grapevine through FLAGdb · Padova University 2.5 X Udine University 2.5 X Milano University France Italy Franco-Italian consortium Coord : A.-F

Stilbene and Chalcone Synthase families

Proteome and translated genome have been screened with sequences of 3 previously characterized

Stilbene Synthases :

62 loci identified and expertised :

3 CHS genes highly expressed and 11 CHS-like

48 STS genes (discriminated with specific residues) distributing on 2 clusters

(chromosomes 10 and 16). At least 73% of them are expressed. Only 2 grape STS have

been experimentally characterized. All the STS share between 91% and 99.7% of identity.

Out of 62 STS/CHS genes (2 exons):

GAZE fails to detect 36 of them and

only 2 are well predicted !

EUGENE fails to detect only one and

40 are well predicted.