EVOLUTION OF PLANT 3R-MYB AND TIFY GENE FAMILIES AND JASMONATE INDUCED ALTERNATIVE SPLICING RESPONSES IN ARABIDOPSIS

By

GUANQIAO FENG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017

© 2017 Guanqiao Feng

To my beloved parents

ACKNOWLEDGMENTS

First and foremost, I thank my advisor Dr. William Bradley Barbazuk for his

support, guidance, and encouragement during my Ph.D. study. He provided motivation

and made my Ph.D. experience smooth. With his trust, I always felt confident enough to tackle

problems encountered in my research, through which I gained insight on the issues and

learned the methods to solve them. I am grateful to my committee members, all of

whom have been intensively involved in my project development and were always there

to help me out when I confronted obstacles. I started my MYB project with Dr.

Gordon Burleigh and this project was further developed with assistance from Dr.

Edward Braun. Without their generous help, I could not have explored the MYB project

in so many interesting directions. Dr. Sixue Chen provided me with data to explore a

fascinating project on jasmonate related alternative splicing. I appreciate the opportunity

I had to work on it and am thankful for his support and advice on the project.

During my Ph.D. study at UF, my lab members and friends provided much

technical support and informative discussions. I would like to thank all of them: Dr.

George Tiley, Dr. Tong Zhang, Dr. Srikar Chamala, Dr. Wenbin Mei, Lucas Boatwright,

Jerald Noble, Nathan Catlin, Ruth Davenport, Xiaoxian Liu, Qinyin Ling, Dr. Jason Orr,

and Dr. Stela Palii. In addition, I would like to thank the Plant Molecular and Cellular Biology

program, the Department of Biology, and the China Scholarship Council for financial support.

I cannot give enough thanks to my parents who always love, support and

understand me. Their love is the source of my strength. I thank my beloved fiancé

Anirudh Gangadhar for showing up in my life. He supported me continuously to

overcome obstacles encountered while pursuing my Ph.D. degree.

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS .................................................................................................. 4

LIST OF TABLES ............................................................................................................ 8

LIST OF FIGURES .......................................................................................................... 9

LIST OF ABBREVIATIONS ........................................................................................... 11

ABSTRACT ................................................................................................................... 13

CHAPTER

1 INTRODUCTION .................................................................................................... 15

Alternative Splicing Mechanism .............................................................................. 15

Intron Origin Hypothesis ......................................................................................... 18

MYB Gene Family Evolution ................................................................................... 21

TIFY Gene Family Evolution ................................................................................... 24

2 EVOLUTION OF THE 3R-MYB GENE FAMILY IN PLANTS .................................. 27

Background ............................................................................................................. 27

Materials and Methods............................................................................................ 30

Identification of the 3R-MYB Proteins ............................................................... 30

Multiple Sequence Alignments and Phylogenetic Analysis............................... 31

Domain and Motif Identification ........................................................................ 31

Synonymous Divergence among Paralogs....................................................... 32

Syntenic Block Identification ............................................................................. 33

Identification of Intron Positions and AS Analysis ............................................. 33

Analysis of Motifs in Promoter Regions ............................................................ 34

Gene Expression Analysis ................................................................................ 34

Results .................................................................................................................... 35

Global Identification of 3R-MYB Proteins from 65 Plant Species ..................... 35

Phylogenetic Analysis of the Plant 3R-MYB Proteins ....................................... 35

Synteny ............................................................................................................ 36

Synonymous Divergence Analysis of the Three Group 3R-MYBs in Angiosperms ................................................................................................. 36

The Evolutionary History of the Plant 3R-MYBs Motifs .................................... 37

Gene Structure Evolution ................................................................................. 38

Alternative Splicing of the Plant 3R-MYBs........................................................ 38

MSA cis-Regulatory Element Prediction (Cell Cycle Regulation) ..................... 39

Expression Pattern of the Plant 3R-MYBs under Abiotic Stresses ................... 40

Discussion .............................................................................................................. 41

Patterns of Duplication and Loss in Plant 3R-MYB Genes ............................... 41

DNA-Binding Domain and Regulatory Motifs .................................................... 43

Intron Gain and Gene Structure Evolution ........................................................ 44

AS Regulation of the Plant 3R-MYBs ............................................................... 45

Plant 3R-MYBs: Link between Cell Cycle and Abiotic Stresses ....................... 46

3 JASMONATE INDUCED ALTERNATIVE SPLICING RESPONSES IN ARABIDOPSIS ....................................................................................................... 67

Background ............................................................................................................. 67

Materials and Methods............................................................................................ 70

Plant Growth, MeJA Treatment and Harvesting, Transcriptome Library Preparation and Sequencing ......................................................................... 70

Transcriptome Assembly and Differential AS Detection ................................... 71

Open Reading Frame Prediction ...................................................................... 71

Protein Interaction Network Analysis ................................................................ 72

miRNA Target Prediction .................................................................................. 72

Protein Extraction, Digestion, iTRAQ Labeling and LC-MS/MS ........................ 72

Proteomics Data Analysis ................................................................................. 73

Results .................................................................................................................... 74

Transcriptome Sequencing and Genome-Guided Assembly ............................ 74

Jasmonate-Related Protein Interaction Network .............................................. 75

Regulation of Transcription Factors (bHLHs and MYBs) and Splicing Factors (SRs and hnRNPs) ........................................................................... 76

Differential Alternative Splicing in Response to MeJA Treatment ..................... 78

Alternative Splicing Variants Differentially Targeted by miRNA ........................ 80

Alternative Splicing Variants with Novel Functions ........................................... 81

Alternative Splicing Variant of bHLH160 with Potential Novel Function ........... 82

Proteomics Validation for Alternative Splicing .................................................. 84

Discussion .............................................................................................................. 84

Regulatory Functions of JAZ2 and JAZ7 .......................................................... 84

Alternative Splicing Coupled miRNA Regulation .............................................. 87

Functional AS Regulation ................................................................................. 90

4 ORIGIN AND EVOLUTION OF THE TIFY PLANT-SPECIFIC MULTI-DOMAIN GENE FAMILY ...................................................................................................... 113

Background ........................................................................................................... 113

Materials and Methods.......................................................................................... 114

Identification of Members in the TIFY Gene Family ........................................ 114

Multiple Sequence Alignment and Phylogeny ................................................ 115

Domain Identification ...................................................................................... 115

Rate-Shift Analysis of the TIFY Domain ......................................................... 116

Alternative Splicing Analysis ........................................................................... 116

Results .................................................................................................................. 117

Domain Identification and Evolutionary History .............................................. 117

Gene Family Identification and Evolution History ........................................... 118

Domain Dynamics during Evolution ................................................................ 119

Alternative Splicing of Jas-like Intron in PPD Genes ...................................... 120

Discussion ............................................................................................................ 121

Evolutionary History of the TIFY Family ......................................................... 121

Poales Experienced Many Gene Loss Events ................................................ 122

Domain Loss of TIFY Multidomain Family during Evolution ........................... 123

Alternative Splicing of Jas-like Intron of PPD Genes ...................................... 125

5 CONCLUSIONS AND PERSPECTIVES ............................................................... 136

Evolution and Function of the Plant 3R-MYBs ...................................................... 136

Origin and Evolution of the Multidomain TIFY Family ........................................... 138

AS Regulation of Arabidopsis under Jasmonate Treatment ................................. 139

LIST OF REFERENCES ............................................................................................. 140

BIOGRAPHICAL SKETCH .......................................................................................... 157

LIST OF TABLES

Table page

2-1 Data resource summary of the sixty-five plant species used in this study. ......... 64

2-2 Positive selection test results.............................................................................. 66

3-1 RNA-seq library and mapping information. ....................................................... 110

3-2 Gene isoform number in TAIR10 and MeJA RNA-Seq data. ............................ 111

3-3 Differential AS or expression in treatment comparisons. .................................. 112

3-4 Differential AS or expression in tissue comparisons. ........................................ 112

3-5 Differential AS or expression between mutant backgrounds. ........................... 112

LIST OF FIGURES

Figure page

2-1 Species phylogeny and numbers of 3R-MYB genes in each species. ............... 49

2-2 Subgroup classification of the plant 3R-MYBs. ................................................... 50

2-3 Whole length protein ML phylogenetic tree of the plant 3R-MYBs...................... 51

2-4 Syntenic blocks in algae and Amborella that contain 3R-MYB genes. ............... 52

2-5 Tests for origin of the three groups of the plant 3R-MYB genes.. ....................... 53

2-6 Multiple protein alignments of motif 4 with representative species ..................... 54

2-7 Analysis of DNA binding domain of the plant 3R-MYBs proteins. ....................... 55

2-8 Intron evolution of the DNA-binding-domain region of the plant 3R-MYBs. ........ 57

2-9 AS of 3R-MYBs in multiple plant species.. ......................................................... 58

2-10 Predicted MSA element within promoter of the plant 3R-MYB genes. ................ 59

2-11 Violin plots of MSA core sequences in the upstream regions for each group. .... 60

2-12 Expression of the Arabidopsis 3R-MYB genes under abiotic stresses.. ............. 61

2-13 Expression of 3R-MYB genes from nine angiosperm species under abiotic stresses.. ............................................................................................................ 62

2-14 Model of plant 3R-MYB evolution ....................................................................... 63

3-1 jaz2 and jaz7 mutant characterization ................................................................ 94

3-2 Characterization of the assembled transcripts from the MeJA project. ............... 95

3-3 Protein interaction network of AS genes with differential expression/AS.. .......... 96

3-4 Heatmap of differentially expressed AS genes of the transcription factor (bHLH and MYB) or splicing factor (SR and hnRNP) gene family. ..................... 97

3-5 Venn diagram of differential AS or expression genes. ........................................ 98

3-6 Selective cases in which gene expression was regulated by AS in response to MeJA treatment. ............................................................................................. 99

3-7 Two genes under AS regulation in response to MeJA treatment.. .................... 100

3-8 AS genes with multiple miRNA target sites differentially targeted by miRNA between isoforms.. ........................................................................................... 101

3-9 Twenty-one genes which contain miRNA binding sites subjected to AS regulation .......................................................................................................... 102

3-10 miRNA regulation and expression profiles of SMZ, AAO2 and At3g02740 ...... 103

3-11 Cases in which AS won’t generate multiple protein products. .......................... 104

3-12 Genes with different domain structure of the transcript isoforms due to AS. .... 105

3-13 Arabidopsis bHLH160 AS pattern and proposed regulatory function................ 106

3-14 Mapped reads supporting the AS junction in Arabidopsis bHLH160b-. ............. 107

3-15 Proteomics validated AS isoform expression. .................................................. 108

3-16 Differential miRNA regulation of SPL4 splice variants. ..................................... 109

4-1 Logos of the domains in the TIFY family. ......................................................... 128

4-2 Domain distribution across 76 plant species. ................................................... 129

4-3 Distribution of ZML, JAZ, PPD and TIFY family in plant species ...................... 130

4-4 ML tree of ZML, JAZ, PPD and TIFY family...................................................... 131

4-5 Estimated evolutionary history of the four families in the TIFY family. .............. 132

4-6 Rate-shift sites in the TIFY domain across the four families. ............................ 133

4-7 Domain dynamics of the PPD, TIFY and ZML families. .................................... 134

4-8 AS in the Jas-like intron of PPD genes ............................................................. 135

LIST OF ABBREVIATIONS

AltA Alternative acceptor

AltD Alternative donor

ANOVA Analysis of variance

AS Alternative splicing

CBF C-repeat-binding factor

CCT CONSTANS, CO-like, TOC1

cDNA Complementary DNA

COI1 Coronatine Insensitive 1

dN Nonsynonymous substitutions per nonsynonymous site

DNA Deoxyribonucleic acid

DREB Dehydration responsive element-binding factor

dS Synonymous substitutions per synonymous site

ExonS Exon skipping

HMM Hidden Markov model

hnRNP Heterogeneous nuclear ribonucleoprotein

IntronR Intron retention

JA Jasmonic acid

JA-Ile (+)-7-iso-Jasmonoyl-L-isoleucine

Jas Jasmonate-associated

JAZ Jasmonate-ZIM domain

MeJA Methyl jasmonate

miRNA Micro-RNA

ML Maximum likelihood

mRNA Messenger RNA

MSA Mitosis-Specific Activator

NINJA Novel interactor of JAZ

NMD Nonsense-mediated mRNA decay

NRT1.8 Nitrate transporter 1.8

NUDX9 Nudix hydrolase homolog 9

ORF Open reading frame

PPD PEAPOD

PTC Premature termination codon

RNA Ribonucleic acid

RNA-seq mRNA-sequencing

rRNA Ribosomal RNA

snRNA Small nuclear RNA

snRNP Small nuclear ribonucleoprotein

SR Serine/Arginine-rich

TPL TOPLESS

tRNA Transfer RNA

UTR Untranslated region

WGD Whole genome duplication

ZIM Zinc-finger protein expressed in Inflorescence Meristem

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EVOLUTION OF PLANT 3R-MYB AND TIFY GENE FAMILIES AND

JASMONATE INDUCED ALTERNATIVE SPLICING RESPONSES IN ARABIDOPSIS

By

Guanqiao Feng

December 2017

Chair: William Bradley Barbazuk

Major: Plant Molecular and Cellular Biology

One of the biggest challenges for a plant’s survival is to deal with various abiotic

and biotic stresses. During evolution many gene families related to stress responses

have gone through expansion. However, the link between the expansion of these

families and the adaptation of plants to their environment is not clear. In my Ph.D. research I

focused on the molecular evolution and function of two important gene families: 3R-

MYB and TIFY. 3R-MYB and TIFY genes were identified from ~70 plant species

including algae and all major lineages of land plants. Duplication events giving rise to

the expansion of the two families were identified and placed in the context of plant

evolution and speciation. In the 3R-MYB project, I further explored the 3R-MYB motif

and domain organization, gene structure, alternative splicing (AS), promoter, and their

expression patterns under abiotic stresses. In the TIFY project, I focused on domain

architecture evolution, rate-shift analysis in the domain which may contribute to

functional divergence among subfamilies, and AS conservation and dynamics.

Jasmonic acid (JA) is a phytohormone induced by wounding and herbivore attack.

Many MYB and TIFY genes play an important role in the jasmonate signaling pathway.

In plants, AS, a posttranscriptional mechanism providing fast responses towards

endogenous and exogenous stimuli, occurs within ~60% of the protein-coding genes in

the genome. In my third project, I focused on jasmonate induced AS responses in

Arabidopsis using transcriptome and proteome data. Three aspects of AS-related

regulation were addressed: 1) differential expression of AS isoforms identified by a

change in the proportion of AS isoforms from genes in response to methyl jasmonate

(MeJA); 2) genes that undergo differential AS and produce isoforms with potential

miRNA target sites; and 3) genes that undergo AS to produce splice variants with novel

functions. I observed cases where AS alone or AS and transcription together can

influence gene expression in response to jasmonate treatment. Twenty-one genes

which show differential AS were also predicted to be differentially targeted by miRNAs. I

identified 30 cases where alternatively spliced isoforms may have novel functions. For

example, AS of bHLH160 generates an isoform without a basic domain, which may

convert it from an activator to a repressor.

CHAPTER 1

INTRODUCTION

Alternative Splicing Mechanism

Precise and quick response to environmental and developmental change is

crucial for organism survival. AS, through which multiple mRNA products are generated

from a single gene, is a posttranscriptional modification of mRNA that may offer a quick

response to stimuli in eukaryotes. For example, in yeast, AS of ribosomal protein-encoding

genes is induced within minutes in response to amino acid starvation and other

stresses (Pleiss et al. 2007). More than 95% of animal multi-exon genes and more

than 60% of plant multi-exon genes undergo AS (Pan et al. 2008; Marquez et al. 2012;

Zhang et al. 2017), leading to increased transcriptome and proteome complexity.

Introns and exons in pre-mRNA are bounded by conserved sequences that

define their ends. AS is regulated by the interactions of trans-regulatory proteins and

cis-regulatory elements, called splicing factors and splicing signal sequences,

respectively (Nilsen 2003). In general, strong cis-regulatory elements, which are highly

conserved, lead to consistent splicing, while weak cis-regulatory elements lead to

inconsistent splicing (Keren et al. 2010; Reddy et al. 2013). Depending on the

regulatory direction and location, cis-regulatory elements could be divided into four

groups: exonic splicing silencers, exonic splicing enhancers, intronic splicing silencers

and intronic splicing enhancers (Kornblihtt et al. 2013). Similarly, trans-regulatory

proteins could act as activators or repressors during the splicing process. For example,

members of the SR family and of the hnRNP family have been shown to act as activators and

repressors, respectively (Reddy 2004; Martinez-Contreras et al. 2007). Splice site

selection is influenced by cellular stress, developmental stage and cell type in response

to adaptation and environmental stresses (Filichkin et al. 2015).

The splicing reaction is mediated by a holoenzyme called the spliceosome, which

is a large macromolecular complex composed of five snRNAs and tens to hundreds of

auxiliary proteins (Nilsen 2003). The snRNAs function in RNA-protein complexes

called snRNPs (Wahl et al. 2009). The conserved sequences that define a typical

intron include the 5’ splice site, branch site, polypyrimidine tract and 3’ splice site (Schwartz

et al. 2008). Based on the splicing signal sequence, introns can be divided into two

types: U2 and U12 introns (Patel and Steitz 2003; Will and Luhrmann 2005; Basu et al.

2008). More than 99% of introns are U2 introns in plants and animals (Will and

Luhrmann 2005; Basu et al. 2008). Although the two types of introns require different

splicing factors and spliceosomes, the catalytic reactions required to remove both types

of introns are similar (Simpson and Brown 2008). U2 intron splicing requires the U1, U2,

U4, U5 and U6 snRNPs, whereas U12 intron splicing requires the analogous U11, U12,

U4atac and U6atac snRNPs together with U5 (Will and Luhrmann 2005; Simpson and Brown 2008). The U2 intron

splicing process is a two-step reaction (Schwartz et al. 2008; Kornblihtt et al. 2013). In

preparation for splicing, the 5’ splice site is bound by U1 snRNP, the 3’ splice site is

bound by the U2AF65 and U2AF35 auxiliary proteins, and the branch site is initially

bound by branchpoint-binding protein, which is subsequently replaced by U2 snRNP

(Nilsen 2003; Wahl et al. 2009; Kornblihtt et al. 2013). In the first reaction, the 2’

hydroxyl group of the branchpoint adenosine attacks the phosphodiester bond at the 5’

splice junction, forming an intermediate structure called the intron lariat (Nilsen

2003; Wahl et al. 2009). In the second reaction, the free 3’ hydroxyl group of the upstream

exon attacks the phosphodiester bond at the 3’ splice junction, leading to the

excision of the intron lariat and the ligation of the exons (Nilsen 2003; Wahl et al. 2009).

AS generates transcript isoforms that carry different portions of the gene sequence.

Relative to the reference transcript, AS events can be categorized into a few basic types:

ExonS, IntronR, AltD, and AltA (Sammeth et al. 2008; Barbazuk et al. 2008). The ratio

of each type is different between animals and plants (Barbazuk et al. 2008). In animals,

the major type of AS is ExonS, which accounts for ~40% of events, while IntronR accounts for only

~10% (Kim et al. 2007). In contrast, IntronR is common in plants and accounts for

~40% of AS events while ExonS only accounts for ~10% (Wang and Brendel 2006;

Marquez et al. 2012). The differences in AS between animals and plants lead to the

assumption that different splicing regulatory mechanisms exist in plants vs. animals.
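
As a minimal sketch of how the four basic event types can be told apart, the following compares the exon coordinates of an alternative isoform with those of a reference isoform. The function names, the half-open plus-strand coordinates and the decision rules are illustrative assumptions; they are not the procedure used in the studies cited above.

```python
from typing import List, Set, Tuple

# Hypothetical representation: an exon is a half-open interval [start, end)
# on the plus strand; an isoform is a list of its exons.
Interval = Tuple[int, int]


def introns(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of one isoform."""
    ordered = sorted(exons)
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])]


def classify_as_events(reference: List[Interval],
                       alternative: List[Interval]) -> Set[str]:
    """Label the differences between two isoforms of the same gene."""
    ref_introns = introns(reference)
    alt_introns = introns(alternative)
    ref_donors = {donor for donor, _ in ref_introns}
    ref_acceptors = {acceptor for _, acceptor in ref_introns}
    events: Set[str] = set()

    # IntronR: a reference intron lies entirely inside an alternative exon.
    for donor, acceptor in ref_introns:
        if any(s <= donor and acceptor <= e for s, e in alternative):
            events.add("IntronR")

    # ExonS: an internal reference exon lies entirely inside an alternative intron.
    for s, e in sorted(reference)[1:-1]:
        if any(donor <= s and e <= acceptor for donor, acceptor in alt_introns):
            events.add("ExonS")

    # AltD / AltA: a novel intron shares one boundary with a reference intron
    # while its other boundary is not a known reference splice site.
    for donor, acceptor in alt_introns:
        if (donor, acceptor) in ref_introns:
            continue
        if acceptor in ref_acceptors and donor not in ref_donors:
            events.add("AltD")
        if donor in ref_donors and acceptor not in ref_acceptors:
            events.add("AltA")
    return events


reference = [(0, 100), (200, 300), (400, 500)]
retained = [(0, 300), (400, 500)]    # first intron kept in the transcript
skipped = [(0, 100), (400, 500)]     # middle exon left out
print(classify_as_events(reference, retained))
print(classify_as_events(reference, skipped))
```

Requiring that the non-shared boundary be absent from the reference keeps a skipped exon from also being reported as an alternative donor or acceptor event.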

Two models were proposed for splicing site selection: exon definition and intron

definition (Keren et al. 2010). The exon definition model suggests that the spliceosome

recognizes and assembles across exons and then splices out introns. In contrast, the intron

definition model suggests that the spliceosome recognizes, and assembles across,

introns (Keren et al. 2010). Because animals have small exons and large introns, the exon

definition model is thought to predominate in animals. In comparison, plants possess

small introns and large exons and splicing is thought to occur through intron definition

(Keren et al. 2010). These differences in exon vs. intron definition agree with the

observed differences in AS event type preference – a mistake in defining an exon would

lead to ExonS in animals, while a mistake in intron definition in plants would result in

IntronR.

The functional outcome of AS largely depends on the fate of alternatively spliced

transcripts. Will the alternatively spliced isoform be translated into protein? If so, does

the “new” protein have a function? AS will increase the complexity of the proteome if the

splicing event affects the coding region and the alternatively spliced transcripts are translated.

Thus, AS may in part account for the discrepancy between the

number of protein-coding genes in a genome and the much larger proteome (Keren et

al. 2010). However, the role of AS in protein complexity has been observed to be limited

(Severing et al. 2009). AS frequently generates nonsense mRNAs with PTCs that are

subjected to NMD. These unproductive transcripts may participate in post-

transcriptional regulation of protein levels (Lewis et al. 2003; Filichkin and Mockler 2012;

Drechsel et al. 2013). Thus, AS may play important roles in environmental adaptation

and developmental regulation (Staiger and Brown 2013). A combination of

transcriptome and proteome research would contribute to a better understanding of the

functional regulatory mechanism of AS.
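
To illustrate how a PTC-containing isoform might be flagged computationally, the sketch below scans a spliced transcript for its first in-frame stop codon and compares the stop position with the last exon-exon junction. The 50-nucleotide threshold is a commonly used heuristic that is assumed here and is not taken from the text above. The function names and the toy sequences are hypothetical.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(mrna: str, cds_start: int) -> Optional[int]:
    """Return the index of the first in-frame stop codon, or None if absent."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_candidate_ptc(mrna: str, cds_start: int, exon_lengths: List[int],
                      min_distance: int = 50) -> bool:
    """Flag a stop codon that ends more than `min_distance` nt upstream of the
    last exon-exon junction (assumed heuristic, see the lead-in)."""
    stop = first_in_frame_stop(mrna.upper(), cds_start)
    if stop is None or len(exon_lengths) < 2:
        return False
    last_junction = sum(exon_lengths[:-1])  # mRNA position where the last exon begins
    return last_junction - (stop + 3) > min_distance


# Toy transcripts (hypothetical sequences); the coding sequence starts at position 0.
normal = "ATG" + "GCC" * 20 + "GCC" * 5 + "TAA" + "GCC" * 3    # stop in the last exon
early = "ATG" + "GCC" * 2 + "TAA" + "GCC" * 30 + "GCC" * 5 + "TAA"  # stop early in exon 1
print(has_candidate_ptc(normal, 0, [63, 27]))
print(has_candidate_ptc(early, 0, [102, 18]))
```

A flag from such a rule only nominates a transcript as a possible NMD target; whether the isoform is actually degraded would still need experimental support.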

Intron Origin Hypothesis

Introns are non-coding sequences that interrupt the coding sequence of a gene.

Introns are spliced out of the pre-mRNA leading to the uninterrupted coding potential of

mature mRNAs. Introns were first discovered in protein-coding genes (Berget et al.

1977; Chow et al. 1977), followed by the discovery of introns in tRNA-coding genes

(Goodman et al. 1977; Valenzuela et al. 1978) and rRNA-coding genes (LaPolla and

Lambowitz 1979; Dujon 1980). Introns have been found in all biological kingdoms,

in cellular organisms as well as viruses (Irimia and Roy 2014). Group I and II introns are found

in protein-coding genes, tRNA genes and rRNA genes of fungal, mitochondrial and

chloroplast DNA (Saldanha et al. 1993); group I introns are also found in

bacteriophages, eubacteria and nuclear genes of lower eukaryotes (Saldanha et al.

1993). Group I and II introns are self-splicing, although their mechanisms differ. Group

III introns, also called spliceosomal introns, exist in protein-coding genes in the nuclear

genomes of eukaryotes (Brown et al. 1992). Their splicing is directed by the

spliceosome in the two-step manner described above. The catalytic splicing reaction

of Group II introns is identical to that of Group III introns, although it requires no participation

from other RNA enzymes or proteins (Saldanha et al. 1993). The origin of the

spliceosomal intron is a topic of debate in molecular evolution (Koonin 2006; Rogozin et

al. 2012).

Two opposing explanations have dominated the debate for almost 25 years: the

introns-early and introns-late hypotheses (Doolittle 1978; Darnell 1978; Cavalier-Smith

1991; Palmer and Logsdon 1991). According to the introns-early hypothesis, great

numbers of spliceosomal introns existed in the common ancestor of prokaryotes and

eukaryotes, but there was massive intron loss within independent lineages during

evolution (Doolittle 1978; Gilbert 1987; Roy et al. 1999; Fedorov et al. 2001). In

contrast, supporters of the introns-late hypothesis argue that spliceosomal introns

originated in early eukaryotic species and were gained during eukaryote evolution

(Cavalier-Smith 1985; Logsdon et al. 1995; Logsdon 1998; Stoltzfus 1999). All

sequenced genomes of eukaryotic species contain spliceosomal introns, even those of the

earliest-diverging lineages of eukaryotes (Fedorova and Fedorov 2003; Rogozin et al. 2012). The

only reported case of an intronless eukaryotic genome is the Hemiselmis andersenii

nucleomorph genome, which is a highly degraded remnant genome (Lane et al. 2007).

The introns-early hypothesis argues that intron loss is the main driver of intron evolution,

whereas the introns-late hypothesis argues that intron gain drives intron evolution. Analysis of

completely sequenced eukaryotic genomes shows intron-rich and intron-poor species

interspersed throughout the phylogeny, with no simple phylogenetic pattern in their

distribution (Roy and Gilbert 2006). Because evidence exists for both

ancestral introns shared by orthologous genes from animals, plants and protists, and

newly evolved introns in some lineages (Logsdon et al. 1995; Fedorov et al. 2001), the

debate between the introns-early and introns-late hypotheses has not been resolved. As a

compromise between the two extreme views of whether introns originated before or after

the divergence of prokaryotes and eukaryotes, it has been proposed that the

progenitor spliceosomal introns evolved from Group II introns during

eukaryogenesis, followed by both intron gain and intron loss, giving rise to the current

intron distribution (Koonin 2006).

Three evolutionary forces—intron loss, intron gain and intron sliding—are thought

to play important roles in the evolution of intron diversity (Tarrío et al. 2008). For intron

loss, two models have been proposed: 1) the reverse transcription-recombination

model, in which cDNA is generated by reverse transcription of mRNA, followed by

recombination with the genomic copy, leading to intron loss; and 2) the genomic deletion

model, in which introns are lost by direct and exact genomic deletion (Roy and Gilbert 2006 and

references therein). For intron gain, there are at least five models: 1) intron

transposition, in which a spliced intron re-inserts itself into an mRNA at a different

position, followed by reverse transcription; 2) intron transfer, in which recombination

between paralogous genes leads to insertion of an intron into a new position; 3) conversion

of a Group II intron into a spliceosomal intron; 4) transposon insertion; and 5) tandem

genomic duplication (Roy and Gilbert 2006 and references therein). An AS model has

been proposed for intron sliding, in which a change of a strong splice signal to a weak

one (or the reverse) leads to a change in the major isoform (Tarrío et al. 2008). On the one

hand, AS could affect intron evolution by driving intron sliding; on the other

hand, the evolution of an intron, which contains cis-regulatory splicing elements, would

in turn affect AS. A combined analysis would help clarify the role AS plays in

intron evolution as well as the influence intron evolution has on AS regulation.

MYB Gene Family Evolution

MYB transcription factors first appeared more than a billion years ago and

are widely distributed in eukaryotes, including slime molds, fungi, plants and

animals (Lipsick 1996). The basic unit of the MYB domain, the ‘R’ repeat, is

classified into three main types: R1, R2 and R3. Differences in the copy number and/or order

of these repeats make the evolutionary history of the MYB gene family puzzling as well as

enchanting. Solving this puzzle requires clear identification of the MYB types in

different organisms. Animals mainly carry 3R-MYBs, whereas plants mainly carry 3R-MYBs,

R2R3-MYBs and MYB-related genes (Lipsick 1996; Dubos et al. 2010; Davidson et al.

2012). One question that arises is which MYB type was the progenitor before the

divergence of plants and animals: 3R-MYB, R2R3-MYB, or MYB-related? Two

models for MYB evolution have been proposed: the loss model (Lipsick 1996) and the

gain model (Jiang et al. 2004). The loss model postulates that a single progenitor

repeat R replicated, giving rise to two- and three-repeat MYB domains (3R-MYB, also

called R1R2R3-MYB) before the divergence of plants and animals. The three-repeat

MYB domain is the progenitor MYB of plants and animals. During evolution, animals and

plants kept the 3R-MYB proteins, but some plant MYBs also lost the R1 domain, giving

rise to the R2R3-MYBs (Lipsick 1996). Subsequently, the R2R3-MYB subfamily

underwent great expansion during evolution. The gain model predicts that R2R3-

MYB and 3R-MYB coexisted in the common ancestor of plants and animals. During

evolution, plants kept both R2R3-MYB and 3R-MYB, while animals kept 3R-MYB but

lost R2R3-MYB (Jiang et al. 2004). The loss model is better supported by the

conservation of 3R-MYBs between plants and animals and the phylogenetic relationship

between R2R3-MYBs and 3R-MYBs.

In addition to the broad evolutionary relationships of the MYB gene family, how

MYB evolved within animal or plant species is also under investigation. The evolution of

the MYB gene family in animals is relatively clear. Invertebrate species usually harbor

one copy of 3R-MYB, whereas vertebrates usually have three copies of 3R-MYB

(Lipsick 1996; Davidson et al. 2012). It is suggested that a single copy of 3R-MYB

underwent two rounds of duplication and gave rise to B-MYB, A-MYB and c-MYB in

vertebrate species (Davidson et al. 2012). The animal MYB evolution model is well

supported by phylogenetic analysis.

In contrast, the function and phylogeny of the MYB gene family members in

plants are very poorly conserved. In addition, the plant MYB gene family includes at

least three types of MYBs: 3R-MYB, R2R3-MYB and MYB-related. Taken as a whole,

this broad and disparate collection makes evolutionary analysis of the plant MYB gene family a

difficult task. Sequence conservation of the DNA-binding domain and regulatory motifs

currently divides the R2R3-MYB group into 22 subfamilies (Kranz et al. 1998). There

are reports detailing the identification of the MYB gene family in different plant species,

and a few comparative studies of two or several species. However, the detailed expansion

pattern of the MYB gene family in plant species is still not clear. The increasing number of

plant species with sequenced and annotated draft genomes now enables a

broad evolutionary analysis of the MYB gene family across the plant kingdom.

The MYB transcription factor gene family, which regulates gene expression, is

itself under regulation by AS. To the best of our knowledge, only two

reports have been published on AS of MYBs in plants. Arabidopsis

AtMYB59 and AtMYB48, and their rice homologs AK111626 and AK107214, share

a conserved AS pattern, and the expression levels of their splice variants are regulated by

hormone and stress treatments (Li et al. 2006). A genome-scale analysis of

Cucumis sativus identified fifty-five R2R3-MYBs, among which eight exhibit AS events

(Li et al. 2012). All identified MYBs that have AS events show a certain level of

conservation of their AS pattern (Li et al. 2006; Li et al. 2012). Many questions

regarding AS of the MYB gene family are currently unanswered: What proportion of genes

in the MYB gene family undergo AS? What are the outcomes of splicing, and are the

transcript isoforms translated? Is AS conserved during MYB evolution/plant evolution? If

yes, what is the conserved pattern? If not, how does it evolve?

In the plant R2R3-MYB family, two conserved introns have been identified, with

the first conserved intron located in the R2 domain region and the second conserved intron

located in the R3 domain region (Jiang et al. 2004; Matus et al. 2008). Based on the presence

or absence of these two introns, R2R3-MYB genes can be divided into four basic types: 1)

single-exon genes; 2) two-exon genes with a conserved intron in the R2 domain; 3) two-exon

genes with a conserved intron in the R3 domain; and 4) three-exon genes with introns in both the R2

and R3 domains. Intron position and gene structure are largely conserved within

subfamilies. For subfamilies dominated by three-exon genes, the two conserved introns

show occasional loss in several species. Besides the two typical conserved introns,

more ancient subfamilies of R2R3-MYBs show other intron positions within the R2 and

R3 domain region, which are also conserved within subfamilies. For example, intron

pattern analysis of soybean R2R3-MYB genes identified a total of 14 different patterns

(Du et al. 2012), which suggests variation in intron position in R2R3-MYBs.

Similar to R2R3-MYBs, plant 3R-MYBs contain conserved introns within the DNA-

binding domain region. However, the intron positions in 3R-MYBs differ from those in

R2R3-MYBs. During evolution, intron position shows both conservation and change. It

would be interesting to see whether the evolution of MYBs supports the intron gain or the

intron loss hypothesis.

TIFY Gene Family Evolution

Multiple domain proteins are more prevalent than single domain proteins in both

prokaryote and eukaryote genomes. In prokaryotes, approximately 60% of proteins

have multiple domains, while in eukaryotes, more than 70% of proteins have multiple

domains (Vogel et al. 2004; Han et al. 2007). In multiple domain proteins the

function of individual domains may contribute to the overall function of the protein (Han

et al. 2007). In other cases, the recombination of two domains could link two

biochemical processes together (Han et al. 2007). Although over half of all the domain

families are common to bacteria, archaea, and eukaryotes, only 5% of the two-domain

families are common to the three domains of life, suggesting that the emergence of new

domain combinations is related to speciation (Apic et al. 2001; Madera et al. 2004; Vogel

et al. 2004). Understanding the evolution of multi-domain gene families is important to

better understand the evolution of proteins and the process of speciation.

The TIFY family is a multi-domain gene family defined by a highly conserved

TIFY domain, which is about 36 amino acids long and forms a beta-beta-alpha fold

(Vanholme et al. 2007; Chung et al. 2009). Proteins within the TIFY family can be

further divided into four subfamilies based on the presence of other domains (Bai et al.

2011): 1) the TIFY subfamily, with only the TIFY domain; 2) the ZML subfamily, with the TIFY

domain along with a CCT and a C2C2-GATA zinc finger domain (referred to as GATA

hereafter for simplicity); 3) the PPD subfamily, with the TIFY domain along with a PPD and

a Jas-like domain; and 4) the JAZ subfamily, with the TIFY domain and a Jas domain. The TIFY

domain functions in protein-protein interactions, which include interactions among

TIFY proteins (Chini et al. 2009) and interactions between TIFY proteins and other proteins

such as NINJA (Pauwels et al. 2010). The CCT domain is predicted to contain a nuclear

localization signal (Nishii et al. 2000). The GATA domain is a DNA-binding domain

recognizing specific cis-elements (Nishii et al. 2000; Teakle et al. 2002). The Jas

domain mediates protein-protein interactions with various transcription factors as well as

the F-box protein COI1 (Shyu et al. 2012; Chini et al. 2016).

The major molecular mechanisms for generating novel domain architectures are

domain duplication and domain shuffling (including domain deletion and domain

insertion) (Vogel et al. 2004; Han et al. 2007; Stolzer et al. 2015). In eukaryote genomes,

exon boundaries and domain boundaries tend to coincide, suggesting that new domain

combinations may be formed by intronic recombination (Vogel et al. 2004). After the

origin of a new domain architecture, the structure and function of the new combination

determine whether it will be selected for or against during evolution (Vogel et

al. 2004).

The TIFY gene family is a plant-specific gene family (Bai et al. 2011). No TIFY

genes were observed in the single-celled green alga C. reinhardtii or the multicellular green

alga V. carteri, suggesting that the TIFY family originated after the divergence of land

plants from green algae (Bai et al. 2011). As a plant-specific gene family, TIFY genes are

involved in many plant-specific pathways. The ZML genes encode transcription factors

that function in wound-induced lignification in maize (Vélez-Bermúdez et al. 2015) and

photoprotective responses in Arabidopsis (Shaikhali et al. 2012). The PPD genes

regulate leaf morphology in Arabidopsis (White 2006). The JAZ subfamily comprises

transcriptional repressors that play an important role in the jasmonate signaling pathway

(Chini et al. 2016 and references therein). Jasmonate is an essential phytohormone

regulating plant responses towards biotic and abiotic stresses. As the JAZ proteins are

important regulators in jasmonate responses, the evolution of JAZ proteins could

contribute to our understanding of the origin and establishment of the jasmonate

signaling pathway during plant evolution. The earliest-diverging plant species known to contain

JAZ genes is the moss P. patens (Katsir et al. 2008; Bai et al. 2011). Interestingly, the

P. patens genome also encodes jasmonate biosynthetic enzymes and jasmonate-

conjugating enzymes (Terol et al. 2006; Katsir et al. 2008). It is likely that the early land

plants evolved the jasmonate pathway to better adapt to frequent abiotic and biotic

environmental stresses (Katsir et al. 2008).

CHAPTER 2 EVOLUTION OF THE 3R-MYB GENE FAMILY IN PLANTS

Background

The MYB gene family is broadly distributed in eukaryotes (Lipsick 1996), with

many homologs in plants (Dubos et al. 2010; Feller et al. 2011; Du et al. 2013). MYB

proteins are defined by the presence of one or more MYB domains, typically denoted ‘R’

(for repeat), which occur in the DNA-binding domain of MYB transcription factors

(Lipsick 1996; Martin and Paz-Ares 1997; Rosinski and Atchley 1998). Each R repeat

comprises approximately 52 amino acids that contain three regularly spaced conserved

hydrophobic residues (usually tryptophans) that are essential in forming the

hydrophobic pocket (Ogata et al. 1992). MYB domains fold into three alpha helices, with

the second and third helix forming a helix-turn-helix structure (Ogata et al. 1992). MYB

proteins are classified into four major types (1R-MYB/MYB-related, R2R3-MYB, 3R-

MYB and 4R-MYB) based on their number of repeats (Dubos et al. 2010), although this

classification is not necessarily consistent with the MYB phylogeny. There are three

genes in most vertebrates and fewer than ten genes in angiosperms that encode 3R-

MYB proteins (Feller et al. 2011), which include the product of the prototypical c-myb

gene (the cellular homolog of v-myb; Klempnauer et al. 1982). However, the animal and

plant 3R-MYB gene families appear to be separate clades, and the plant 3R-MYB

genes likely gave rise to the diverse (approximately 100 to 200 genes per species)

R2R3-MYB gene families of plants (Braun and Grotewold 1999; Dias et al. 2003). Thus,

understanding the evolution of the 3R-MYB genes in plants is critical for understanding

the evolution of the plant MYB gene family in general.

The primary function of many different MYB proteins appears to be recognition of

specific DNA sequence motifs (e.g., Ording et al. 1994), although MYB domains also

play a role in protein-protein interactions (e.g., Grotewold et al. 2000). Plant 3R-MYB

proteins recognize MSA elements (Ito et al. 1998; Ma et al. 2009), and play a conserved

role in cell cycle regulation. The 3R-MYB proteins in plants regulate the G2/M transition

(Ito et al. 2001), whereas the animal proteins regulate the G1/S transition (Bergoltz et al.

2001). The DNA element (MSA) that plant 3R-MYBs recognize exists in the upstream

promoter region of G2/M-phase specific genes, such as B-type cyclin genes, and it is

both necessary and sufficient for driving G2/M-phase specific gene expression (Ito et al.

2001; Haga et al. 2007; Kato et al. 2009).

Plant 3R-MYBs often are divided into three groups (the A-, B- and C-group; Ito et

al. 2001; Ito 2005). The tobacco NtMybA1 and NtMybA2 genes (A-group) have variable

expression patterns during the cell cycle, with a peak of expression at M-phase, and their

products bind to the MSA element directly and activate B-type cyclin gene expression

(Ito et al. 2001; Kato et al. 2009). The Arabidopsis orthologs (Myb3R1 and Myb3R4) of

those tobacco genes bind to the MSA elements of B2-type cyclin, CELL DIVISION

CYCLE20.1 and KNOLLE, and up-regulate their expression (Haga et al. 2007).

Consistent with their putative role in the cell cycle, double mutants in these A-group

genes exhibit incomplete cytokinesis, multinucleate cells, and defective cell walls in

Arabidopsis (Haga et al. 2011). In contrast, tobacco NtMybB (B-group) is constantly

expressed during the cell cycle, and it functions as a repressor (Ito et al. 2001). Finally,

one of the C-group genes (OsMYB3R-2 in rice) is involved in both cell cycle and abiotic

stresses (Dai et al. 2007; Ma et al. 2009). OsMYB3R-2 is induced by stresses, such

as freezing, drought, and salt, and its overexpression under stress conditions

increases stress tolerance and maintains a high level of cell division (Dai et al. 2007).

The pleiotropic effects of OsMYB3R-2 suggest its possible involvement in the B-type

cyclin pathway and the DREB/CBF pathway (Ma et al. 2009). It is unclear whether A-

and B-group 3R-MYB proteins are also involved in abiotic stresses. Plants have sessile

lifestyles, and coping with abiotic stresses is a challenge for their survival. Placing these

functions of 3R-MYB transcription factors in an evolutionary framework is important for

understanding the ways that plants couple cell cycle and abiotic stress responses.

The genetic basis for functional divergence among the A-, B-, and C-groups of

3R-MYB proteins is also unclear. The C-terminal regions of MYB proteins are highly

divergent, and there is substantial length variation among the A-, B-, and C-groups (Ito

et al. 2001). A negative regulatory domain located in the C-terminal region

represses the transactivation activity of NtMybA2 (A-group); specific cyclin/CDK

complex(es) can phosphorylate specific sites in the NtMybA2 protein and remove the

inhibitory effects (Araki et al. 2004). In tobacco, overexpression of the truncated protein lacking the

negative regulatory domain up-regulates many G2/M-specific genes relative to

overexpression of the full-length protein (Kato et al. 2009). In addition to

these C-terminal regions, there can be divergence within the MYB repeats themselves.

If any such divergent sites exist, they might exhibit shifts in their evolutionary rate

(Gaucher et al. 2002) that would render them detectable.

AS is a process that results in multiple discrete mRNA products from a single

gene. This is a posttranscriptional modification of mRNA that may offer a quick

response to stimuli in eukaryotes. More than 95% of animal multi-exon genes (Pan et al.

2008) and more than 60% of plant multi-exon genes (Marquez et al. 2012) undergo AS.

However, the extent and regulation of AS in the plant 3R-MYBs is largely unknown.

Moreover, the evolutionary forces that shape current intron/exon gene structures (e.g.,

intron gain or intron loss) are unknown.

In this study, I explore the patterns of molecular evolution in the plant 3R-MYB

transcription factor gene family and examine its motif and domain organization, gene

structure, AS, and expression patterns under abiotic stresses. Specifically, I address the

phylogenetic relationships among plant 3R-MYBs, seek to identify candidate sites and

motifs in the 3R-MYB proteins that contribute to their functional divergence, determine

the pattern of intron and AS evolution within the plant 3R-MYBs, and look for evidence

that the A-, B- or C-group 3R-MYBs are involved in abiotic stress responses. Answering

these questions will enhance our understanding of the evolution and function of the 3R-

MYBs in plants and help illuminate the evolution and functional divergence of gene

families encoding plant transcription factors.

Materials and Methods

Identification of the 3R-MYB Proteins

We used HMMER v3.1b2 (Eddy 2011) to conduct profile HMM searches using

the Pfam MYB DNA-binding-domain (PF00249) as a query to search annotated proteins

from 65 plant species (Table 2-1). For gene loci with multiple predicted isoforms, the

primary isoform was used if a primary isoform annotation was available; otherwise the

longest protein was used. We considered sequences with three MYB domains identified

by HMMER with an E-value of ≤ 1.0E-15 to be candidate 3R-MYB proteins. Those

candidate 3R-MYB proteins from the HMMER search were then examined to confirm

that three R repeats are adjacent to one another using the SMART (Letunic et al. 2015),

CDD (Marchler-Bauer et al. 2015), and Pfam (Finn et al. 2014) databases. Proteins with

non-adjacent R repeats or proteins containing other domains besides MYB domains

were removed.
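
The screening criteria above can be illustrated with a minimal Python sketch. This is not the pipeline used in this study: the input is a hypothetical list of already-parsed domain hits, the E-value cutoff is assumed to apply to each domain hit, and adjacency is approximated with a hypothetical gap threshold (`MAX_GAP`), whereas in this study adjacency was confirmed by inspection with SMART, CDD, and Pfam.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1.0e-15  # cutoff stated in the text; assumed here to apply per domain hit
MAX_GAP = 10              # hypothetical: maximum residues allowed between adjacent repeats


def candidate_3r_mybs(hits, max_gap=MAX_GAP):
    """Return protein IDs that carry exactly three adjacent MYB-domain hits.

    hits: iterable of (protein_id, start, end, e_value) tuples, one per
    MYB-domain match (a hypothetical, already-parsed representation).
    """
    by_protein = defaultdict(list)
    for protein_id, start, end, e_value in hits:
        if e_value <= E_VALUE_CUTOFF:
            by_protein[protein_id].append((start, end))

    selected = []
    for protein_id, domains in by_protein.items():
        if len(domains) != 3:
            continue
        domains.sort()
        # Number of residues separating each pair of consecutive repeats.
        gaps = [domains[i + 1][0] - domains[i][1] - 1 for i in range(2)]
        # Overlapping hits (negative gap) and distant hits are both rejected.
        if all(0 <= gap <= max_gap for gap in gaps):
            selected.append(protein_id)
    return sorted(selected)
```

Proteins passing such a filter would still need to be checked for additional non-MYB domains, which this sketch does not attempt.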

Multiple Sequence Alignments and Phylogenetic Analysis

We generated an amino acid multiple sequence alignment for 3R-MYB using

Muscle v3.8.31 with default parameters (Edgar 2004) followed by manual improvements

(Supplemental Data 1), and used it as input to generate an ML phylogenetic tree

based on the entire protein lengths with RAxML v8.1.12 (Stamatakis 2014) using the

LG4X model (Le et al. 2012). Eight tree searches were performed to identify the ML

tree. Then I attempted to improve the ML gene tree topologies using TreeFix (Wu et al.

2013b), which takes the ML gene tree topology, the sequence alignment, and a species

tree topology (Figure 2-1) and tries to find an alternate gene tree topology that implies

fewer duplications and losses than the original ML topology while not significantly

decreasing the likelihood. Five hundred nonparametric bootstrap replicates were run for the

dataset with ML under the LG4X model using RAxML (v8.1.12) (Stamatakis 2014), and

MEGA6 Beta2 software (Tamura et al. 2013) was used to generate the tree figures.

Domain and Motif Identification

We identified group-specific evolutionary rate shifts in the MYB domain region

using a method described by Gaucher et al. (2001). Briefly, I estimated the amino acid

substitution rates of each site in the alignments of the MYB-domains of six groups: 1) A-

group; 2) B-group; 3) C-group; 4) A- and B-groups; 5) B- and C-groups; and 6) A- and

C-groups with PAML v4.8a (Yang 2007) using the LG model (Le and Gascuel 2008)

with Γ-distributed rate variation among sites. We conducted three comparisons: 1) A-

group vs. B- and C-groups; 2) B-group vs. A- and C-groups; and 3) C-group vs. A- and

B-groups. The expected evolutionary rate difference for any comparison of two groups

is zero; large positive or negative values indicate shifts in rates. Sites with amino acid

substitution rate differences larger than 2.57 standard deviations from the mean were

chosen as significantly conserved or dynamic sites.
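
The thresholding step can be sketched as follows. This is an illustration only, not the code used in this study: it assumes that per-site rates for the two groups being compared have already been estimated (for example with PAML) and are supplied as two equal-length lists, and it uses the population standard deviation, which is an assumption because the text does not specify the estimator.

```python
from statistics import mean, pstdev


def rate_shift_sites(rates_group, rates_others, threshold_sd=2.57):
    """Flag alignment sites whose rate difference is an outlier.

    rates_group, rates_others: per-site substitution rates for the focal
    group and for the comparison groups (hypothetical, pre-computed inputs).
    Returns a list of (1-based site index, rate difference).
    """
    diffs = [a - b for a, b in zip(rates_group, rates_others)]
    mu = mean(diffs)
    sd = pstdev(diffs)  # population SD; an assumption of this sketch
    if sd == 0:
        return []
    return [
        (site, diff)
        for site, diff in enumerate(diffs, start=1)
        if abs(diff - mu) > threshold_sd * sd
    ]
```

A positive difference marks a site evolving faster in the focal group (dynamic), and a negative difference marks a site evolving more slowly (conserved).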

The branch-site model in PAML v. 4.8a (Yang 2007) was used to examine the

MYB domain of A-, B- or C-groups for positive selection following their divergence and,

if present, to determine the sites of positive selection. In these tests I compared the

alternative model (branch-site model A) with its corresponding null model (model A with

ω2=1 fixed). Additionally, I tested for positive selection in monocots within A- and C-

groups using the same method to detect whether monocot A- and C-groups have

picked up B-group gene function and thus have accelerated evolutionary rates. In the

positive selection tests, the nucleotide alignments of the DNA-binding-domain region

were generated by back-translation of the amino acid alignments with in-house Perl

scripts.
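
The back-translation step can be sketched in Python as follows. This is a minimal illustration, not the in-house Perl scripts themselves: the function name and the gap character are assumptions, and each unaligned coding sequence is assumed to be in frame and to start at the first aligned residue.

```python
def back_translate(aligned_protein, cds, gap="-"):
    """Thread an in-frame coding sequence onto an aligned protein sequence.

    Each residue is replaced by its codon and each gap by three gap
    characters, so the codon alignment mirrors the protein alignment.
    Trailing codons (e.g. a stop codon) beyond the last residue are ignored.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for residue in aligned_protein if residue != gap)
    if len(codons) < n_residues:
        raise ValueError("coding sequence is shorter than the protein")

    codon_iter = iter(codons)
    return "".join(
        gap * 3 if residue == gap else next(codon_iter)
        for residue in aligned_protein
    )
```

For example, `back_translate("M-KW", "ATGAAATGG")` yields `ATG---AAATGG`. The sketch does not verify that each codon actually encodes the aligned residue; a production script would add that check.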

Motifs in the C-terminus were identified using MEME (Multiple EM for Motif

Elicitation) v.4.10.2 (Bailey et al. 2006). Sequence logos of the C-terminal motifs were

generated with Weblogo Berkeley (http://weblogo.berkeley.edu/logo.cgi).

Synonymous Divergence among Paralogs

PAML v. 4.8a (Yang 2007) was used on the nucleotide alignments described in

the positive selection test (above) to calculate pairwise dS with one ratio model (M0)

(Goldman and Yang 1994) for nucleotide alignments of the MYB-domains of paralogous

genes from each of 40 different angiosperm species (Table 2-1). Pair-wise dS values

were placed into six subsets depending on the group membership of the genes being

compared (A vs. A, B vs. B, C vs. C, A vs. B, B vs. C, and A vs. C). Normal distributions

were fit to the dS distributions of the six groups.
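
The binning and fitting steps can be illustrated with a short Python sketch. It assumes a hypothetical input of (group, group, dS) triples and fits each normal distribution by maximum likelihood (sample mean and population standard deviation); the fitting procedure actually used in this study is not specified here, so this is only one reasonable choice.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_ds_by_comparison(pairs):
    """Group pairwise dS values by comparison type and fit a normal to each.

    pairs: iterable of (group_1, group_2, dS), where groups are "A", "B"
    or "C" (hypothetical, pre-computed input).
    Returns {comparison: (mean, standard deviation)}, e.g. key "A-B".
    """
    subsets = defaultdict(list)
    for group_1, group_2, ds in pairs:
        # Sort the labels so that A vs. B and B vs. A share one subset.
        key = "-".join(sorted((group_1, group_2)))
        subsets[key].append(ds)
    return {key: (mean(values), pstdev(values)) for key, values in subsets.items()}
```

The fitted mean of each subset corresponds to the peak of the dS distribution for that comparison.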

Syntenic Block Identification

To investigate whether the three 3R-MYB genes in

Amborella originated from single-gene duplications or segmental duplication events, I

analyzed the synteny blocks in Amborella trichopoda and Ostreococcus lucimarinus.

Syntenic blocks in Ostreococcus lucimarinus and Amborella trichopoda were identified

with DAGchainer (Haas et al. 2004). Ostreococcus and Amborella proteins were aligned

to one another by all-to-all blastp (v2.2.28) (Altschul et al. 1990). The

combined file of genome annotation (gff3) and blastp results was supplied to

DAGchainer with default parameters. Syntenic blocks that contain the algal and

Amborella 3R-MYB proteins were plotted in R (R Development Core Team 2008).

Identification of Intron Positions and AS Analysis

We extracted gene structure information from gff3 annotation files for 42 species

(Table 2-1). The evolutionary history of introns in the DNA-binding-domain was

reconstructed using maximum parsimony with the phylogenetic trees constructed in this

study (Figure 2-2A; Figure 2-3). We also examined the 3R-MYB genes from six species

for evidence of AS. Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Oryza

sativa, and Amborella trichopoda AS data were acquired from Chamala et al. (2015),

while AS in Sorghum bicolor was identified using the available reference genome

sequence and annotation (Paterson et al. 2009) and publicly available sorghum RNA-

Seq data (GSE30249 and GSE50464 from Gene Expression Omnibus) (Dugas et al.

2011; Olson et al. 2014) using the methodology described in Chamala et al. (2015).

Among the 25 3R-MYB genes identified within these species, 16 genes have evidence

of alternatively spliced transcripts. The gene structures of the sixteen 3R-MYB genes

were displayed with Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn) (Hu

et al. 2015), and the AS patterns were added with manual editing.
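
The parsimony reconstruction of intron history mentioned above can be illustrated with a small sketch. It uses the Fitch algorithm for a single binary character (intron present or absent) on a rooted binary tree; the choice of Fitch (unordered) parsimony, the nested-tuple tree representation, and the leaf names are assumptions made for illustration and are not taken from the analysis itself.

```python
def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees (hypothetical format).
    states: dict mapping leaf name -> 0 (intron absent) or 1 (intron present).
    Returns (set of most-parsimonious states at this node, minimum changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_states, left_cost = fitch(tree[0], states)
    right_states, right_cost = fitch(tree[1], states)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: one gain or loss is required on a child branch.
    return left_states | right_states, left_cost + right_cost + 1
```

With a toy tree `(("sp1", "sp2"), ("sp3", "sp4"))` in which the intron is present only in `sp3` and `sp4`, the function reports one change and an ambiguous root state; resolving such ambiguity as a gain or a loss relies on outgroup information, as in the distribution of introns across algae, mosses, and seed plants.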

Analysis of Motifs in Promoter Regions

We examined sequences from the start codon to a point 2000 base pairs

upstream for 160 3R-MYB genes from 41 species (Table 2-1). These putative promoter

regions were searched on both strands for exact matches to the sequence 5’-AACGG-

3’, which is the core consensus sequence of the MSA element

(T/C)C(T/C)AACGG(T/C)(T/C)A. We compared the number of exact matches to 5’-

AACGG-3’ in 3R-MYB gene promoters with that in 400 randomly sampled genes. We conducted

a one-way ANOVA and Tukey’s HSD test in R (R Development Core Team 2008) to

examine the hypothesis that 3R-MYB genes have more potential MSA elements than

randomly chosen genes. The number of potential MSA elements for each gene was

transformed by square root to normalize residuals and equalize variances before

statistical tests.
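
The motif search and transformation described above can be sketched as follows. This is an illustrative reimplementation rather than the code used in this study; the function names are hypothetical, and promoter sequences are assumed to contain only A, C, G, and T (in either case).

```python
import math

CORE = "AACGG"  # core consensus of the MSA element, as given in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_overlapping(seq, motif):
    """Count exact occurrences of motif in seq, allowing overlaps."""
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


def count_core_both_strands(promoter, core=CORE):
    """Count exact matches to the core motif on both strands."""
    seq = promoter.upper()
    # A match on the reverse strand appears as the reverse complement
    # of the motif on the forward strand.
    return count_overlapping(seq, core) + count_overlapping(seq, reverse_complement(core))


def sqrt_transform(counts):
    """Square-root transform of motif counts prior to ANOVA."""
    return [math.sqrt(c) for c in counts]
```

The transformed counts for each set of promoters would then be passed to the ANOVA and Tukey's HSD test, which were run in R in this study.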

Gene Expression Analysis

We examined 3R-MYB gene expression under various abiotic stresses

(heat, cold, drought and salt) with microarray data available from the AtGenExpress

(Arabidopsis thaliana genome transcript expression study) project (Kilian et al. 2007) for

Arabidopsis; and the Plant Expression Database (PLEXdb) (Dash et al. 2012) for

barley, rice, wheat, maize, grape, soybean, Medicago, poplar, and cotton. For data with

multiple time points I performed a one-way ANOVA test to determine the statistical

significance of expression changes. For data with control and stress conditions, I

performed a two-sample t-test to identify significant expression changes.
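
For the control versus stress comparison, the test statistic can be sketched as follows. The text does not state which form of the two-sample t-test was used, so this sketch assumes Welch's unequal-variance version; it returns only the statistic and the degrees of freedom, leaving the p-value to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(control, stress):
    """Welch's two-sample t statistic and degrees of freedom.

    control, stress: expression values for replicate samples
    (hypothetical inputs; at least two replicates per condition).
    """
    n1, n2 = len(control), len(stress)
    # Squared standard errors of the two sample means.
    se1 = variance(control) / n1
    se2 = variance(stress) / n2
    t = (mean(stress) - mean(control)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df
```

A positive statistic indicates higher expression under stress than in the control.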

Results

Global Identification of 3R-MYB Proteins from 65 Plant Species

We identified 225 3R-MYB genes from 65 plant species using profile HMM

searches (Materials and Methods; Figure 2-1). There was a single 3R-MYB gene in

each of the algal outgroups, whereas the moss (Physcomitrella patens) has two 3R-

MYB genes, possibly resulting from a genome duplication in that lineage (Rensing et al.

2007). Both gymnosperm species that were analyzed have two 3R-MYB genes.

Amborella has three 3R-MYB genes that fall into the A-, B- and C-group, respectively,

indicating gene duplications preceding the origin of angiosperms. All other angiosperm

3R-MYB genes also fall into the A-, B- and C-groups; the number of 3R-MYB genes

found in angiosperm genomes ranges from one (e.g., Citrus sinensis) to nine (e.g.,

Triticum aestivum). The absence of gene members from a certain group of 3R-MYB in a

given species might represent bona fide gene loss but it could also result from an

incomplete or locally misassembled genome, improper annotation, or failure to meet our

screening criteria. However, the absence of B-group 3R-MYBs in many monocots [with

the exception of duckweed (Spirodela polyrhiza), banana (Musa acuminata), and wild

banana (Musa balbisiana)] suggests the loss of B-group 3R-MYBs during monocot

evolution. Based on the distribution of B-group 3R-MYB genes in monocots there were

probably two independent losses: one in the grasses and one in orchid and palms. In

addition, orchid and palms probably also lost A-group 3R-MYBs.

Phylogenetic Analysis of the Plant 3R-MYB Proteins

The 3R-MYB proteins were clearly divided among three groups (the previously

defined A-, B- and C-groups) (Figure 2-2A). The A-, B- and C-group proteins were

present only in angiosperm species, and the single Amborella 3R-MYB gene in each group

was sister to all other species. Within A- and C-groups, genes from monocots formed

one branch while genes from eudicots formed another branch (Figure 2-2A and Figure

2-3). This indicates that no gene duplication occurred within these groups before the divergence of monocots and

eudicots, and that the expansion of 3R-MYBs in angiosperms is mainly due to lineage-

specific duplication events during the evolution of monocots and eudicots.

Synteny

A total of 1911 synteny blocks were identified between the alga Ostreococcus

lucimarinus and Amborella, with an average of 9.5 (standard deviation = 2.8) genes per

synteny block. Examination of these blocks indicates that the region of Ostreococcus

lucimarinus chr9 surrounding a 3R-MYB gene is present in triplicate in Amborella – with

each block in the Amborella genome containing one of the three 3R-MYBs (Figure 2-4).

This suggests that the origin of the three 3R-MYB genes in Amborella resulted from

segmental duplications rather than tandem duplications of a single gene.

Synonymous Divergence Analysis of the Three Group 3R-MYBs in Angiosperms

We analyzed the pair-wise dS values of paralogous 3R-MYB genes within the

same species of angiosperms (Figure 2-5). Inter-group comparisons (A-B, B-C, A-C)

were used to estimate the timing of gene duplication events leading to the divergence of

the three groups. The peaks of the dS distributions of the three inter-group comparisons are

at 1.9, 2.2, and 2.4 for B-C, A-C, and A-B respectively. This suggests that the A-group

diverged before the divergence of B- and C-groups, in agreement with the phylogenetic

tree (Figure 2-2A and Figure 2-3). Intra-group comparisons (A-A, B-B, C-C) were used

to estimate the timing of gene duplication events after the divergence of A-, B- and C-

group. We observed the peaks of the dS distributions of A-A, B-B, and C-C to be at 0.7, 0.9, and

0.5 respectively.

The Evolutionary History of the Plant 3R-MYBs Motifs

Four conserved motifs were identified in the C-terminal region of plant 3R-MYBs

(Figure 2-2). Motif 2 arose early in land plant evolution and was conserved across

moss, gymnosperm, and angiosperm proteins. The other three motifs appear to have

been present within the common ancestor of seed plants (gymnosperms and

angiosperms). Different motifs then appear to have been lost in each group.

Specifically, motif 3 was lost from the A-group proteins, motifs 1 and 4 were lost from

the common ancestor of B- and C-group proteins, and motif 3 was independently lost

from C-group proteins (Figure 2-2B). We also observed a 12-14 amino acid deletion in

motif 4 within the grasses (Figure 2-2C and Figure 2-6). It is unclear whether the lost

fragment in motif 4 affects 3R-MYB function in grasses.

Several amino acid sites in the MYB DNA-binding-domain appear to have

undergone rate shifts (Figure 2-7). Most of the candidate rate-shift sites are located in

the first helix of each R repeat, so they are unlikely to directly impact the DNA-binding

activity since the second and third helix form a helix-turn-helix structure responsible for

DNA binding (Ogata et al. 1992). Our rate shift analyses are consistent with the results

of functional characterization of the three MYB repeats in animal c-MYB (Ogata et al.

1992; Ording et al. 1994). Specifically, there are the fewest (3) rate divergent sites in

R3, which plays the dominant role in DNA-binding, whereas R1 and R2 have more (6

and 7, respectively). Site 85 in R2, showing divergence among A-, B- and C-groups, is

the only site located within the helix-turn-helix structure.

In order to test whether any of the three groups experienced accelerated

evolutionary rates after divergence, I tested for positive selection in the A-, B- and C-groups

using a branch-site model (Materials and Methods). However, none of these three tests

support the hypothesis of positive selection (Table 2-2). Moreover, positive selection in

monocots within the A- and C-groups was also not detected (Table 2-2).
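
For reference, the comparison between branch-site model A and its null model is a likelihood ratio test, which can be sketched as follows. The sketch assumes that the test statistic is compared with a chi-square distribution with one degree of freedom, a common and conservative convention for this test; the log-likelihood values passed in would come from the PAML output, and the function name is hypothetical.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    lnl_null: log-likelihood of the null model (model A with omega2 = 1 fixed).
    lnl_alt: log-likelihood of the alternative model (branch-site model A).
    Returns (test statistic, p-value under chi-square with 1 d.f.).
    """
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For one degree of freedom, the chi-square survival function equals
    # erfc(sqrt(x / 2)), so no external statistics library is required.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

Under this convention a statistic above roughly 3.84 corresponds to p < 0.05; none of the tests summarized in Table 2-2 supported positive selection.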

Gene Structure Evolution

We identified six introns in the DNA-binding-domain region from 160 3R-MYB

genes (Figure 2-7A). Five introns (A, B, C, D and E) are conserved among multiple

species, while the other intron (b) was found only in one sequence. The distribution of

the five conserved introns reveals their evolutionary history (Figure 2-8). Introns A and B

were present in the common ancestor of all land plants and green algae; indeed, intron

A is broadly distributed in eukaryotes (Braun and Grotewold 1999). Two additional

introns (D and E) were gained before the divergence of mosses and seed plants.

Finally, intron C was inserted after the divergence of seed plants from mosses. The

unconserved intron b is found in only one case [Gorai008G117400 (B-group) in

Gossypium raimondii]. Gorai008G117400 has conserved introns A, C, D, and E, and

unconserved intron b in a position close to intron B. The amino acid alignment of the

corresponding region around intron b of Gorai008G117400 is different compared to

other proteins. It is possible that nucleotide substitutions around intron B may have

altered splicing signals; alternatively, it could reflect a sequencing or assembly error.
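The step-wise reasoning about the conserved introns can be sketched as a simple presence/absence comparison across nested lineages. This is only an illustration, assuming the simplified intron sets below (read from the text, not the per-gene data of Figure 2-8); the variable and function names are hypothetical.

```python
# Conserved introns per lineage, simplified from the description in the text.
# Lineages are ordered from the earliest-diverging to the most nested group.
introns_by_lineage = [
    ("green algae", {"A", "B"}),
    ("mosses", {"A", "B", "D", "E"}),
    ("seed plants", {"A", "B", "C", "D", "E"}),
]

def stepwise_gains(ordered_lineages):
    """List the introns that first appear at each successive lineage."""
    seen = set()
    gains = []
    for name, introns in ordered_lineages:
        new_introns = sorted(introns - seen)
        gains.append((name, new_introns))
        seen |= introns
    return gains

for name, new_introns in stepwise_gains(introns_by_lineage):
    # For the first lineage, the "new" introns are those already present
    # in the common ancestor rather than true gains.
    print(name, "->", ", ".join(new_introns) if new_introns else "none")
```

The ordering of first appearances (A and B, then D and E, then C) mirrors the gain order inferred above.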

Notably, I observed four conserved exons at the 3’ end in angiosperm A-group

and gymnosperm 3R-MYB genes. The middle two of the four conserved exons contain

the motif 4 in angiosperm A-group and gymnosperm 3R-MYB proteins (Figure 2-8).

Alternative Splicing of the Plant 3R-MYBs

The proportions of 3R-MYB genes with evidence of AS in Arabidopsis, poplar,

grapevine, rice, sorghum, and Amborella are 100% (5/5), 50% (2/4), 67% (4/6), 25%

(1/4), 33% (1/3), and 100% (3/3), respectively. Thus, 16 of the 25 3R-MYB genes

represented within the six species have evidence of undergoing AS, and these 16

genes produce a total of 30 AS events. Among the 30 AS events, 1 is ExonS, 15 are

IntronR, 7 are AltA, 1 is AltD, and 6 are alternative polyadenylation. Eight of the 30

events occur within UTRs, while 22 events impact the coding region (Figure 2-9). Eight

of the 22 AS events that impact the coding region lead to PTCs. These transcripts may

succumb to nonsense mediated decay (Chang et al. 2007) and may represent

unproductive splicing that may regulate 3R-MYB protein levels (Lareau et al. 2007).

Furthermore, 13 of the 22 events that impact the coding region affect the DNA binding

domain. Of all the AS events identified, I observed two shared AS patterns in 3R-MYB

genes among different species: Amborella Amtr00109.47, Arabidopsis At5g11510 and

At3g09370 shared a conserved AltA event in their second exons; Grape

GSVIVT01027493001 and Arabidopsis At4g00540 shared a conserved AltA event in

their second exons (Figure 2-9). Moreover, I observed a shared alternative

polyadenylation event between the two A-group Arabidopsis genes (At4g32730 and

At5g11510).
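The counts reported in this section can be cross-checked with a short tally. The sketch below only restates the numbers given in the text; the dictionary names are hypothetical, and "AltPolyA" is a label introduced here for the alternative polyadenylation events.

```python
# (genes with AS evidence, genes examined) per species, as reported in the text
as_counts = {
    "Arabidopsis": (5, 5),
    "poplar": (2, 4),
    "grapevine": (4, 6),
    "rice": (1, 4),
    "sorghum": (1, 3),
    "Amborella": (3, 3),
}

# Number of AS events of each type, as reported in the text
event_types = {"ExonS": 1, "IntronR": 15, "AltA": 7, "AltD": 1, "AltPolyA": 6}

genes_with_as = sum(with_as for with_as, _ in as_counts.values())
genes_total = sum(total for _, total in as_counts.values())
total_events = sum(event_types.values())

for species, (with_as, total) in as_counts.items():
    print(f"{species}: {with_as}/{total} = {100 * with_as / total:.0f}%")
print(f"All species: {genes_with_as}/{genes_total} genes, {total_events} AS events")
```

The per-species fractions sum to 16 of 25 genes, and the event types sum to 30 events, matching the totals stated above.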

MSA cis-Regulatory Element Prediction (Cell Cycle Regulation)

The cis-regulatory elements necessary and sufficient to drive G2/M-phase

specific gene expression (MSA) are specific targets of the trans-acting 3R-MYB

proteins. Thus MSAs provide a way to identify candidate genes that might be involved in

the regulation of the G2/M transition during the cell cycle. The plant 3R-MYB genes

have been shown to be self-regulated by MSA elements in their promoter (Kato et al.

2009). We used enrichment of the MSA element core sequence within

regions upstream of 3R-MYB genes from plant species that have not been functionally

characterized as an indication of potential involvement in the cell cycle. We searched for the

MSA element core sequence (5’-AACGG-3’) on either the sense or antisense

strand in the region up to 2 kb upstream of the start codon of the 3R-MYB genes. There

were no significant differences in the number of MSA core sequences on the sense or

antisense strand (Figure 2-10). The average number of MSA element core sequences

in the upstream 2 kb region of each gene of the A-, B-, C-group, and the outgroup

species (algae, moss, and gymnosperms) were 3.3, 3.2, 6.7 and 4.4, respectively. In

contrast, the average number of MSA element core sequences in the upstream

sequences for randomly selected genes was only 1.7. The numbers of MSA element

core sequences in plant 3R-MYB genes are significantly higher than in randomly selected

genes based on ANOVA and Tukey’s HSD test (Figure 2-11). While this suggests the

possibility that plant 3R-MYBs are widely involved in the cell-cycle, this relationship

remains to be experimentally verified.

The number of MSA element core sequences in C-group genes is significantly

higher than that in A- and B-groups, suggesting that the C-group may have different

regulatory mechanisms.
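The motif search described in this section can be sketched as a count of the core sequence on both strands of an upstream window. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, and the input is assumed to be the upstream sequence written 5' to 3' on the sense strand and ending immediately before the start codon.

```python
MSA_CORE = "AACGG"  # MSA element core sequence (5'-AACGG-3')

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]

def count_overlapping(seq, motif):
    """Count all occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

def count_msa_core(upstream, window=2000):
    """Count MSA core hits on the sense and antisense strands.

    Only the last `window` bases before the start codon are scanned.
    """
    region = upstream[-window:].upper()
    sense = count_overlapping(region, MSA_CORE)
    # A hit on the antisense strand appears as the reverse complement
    # of the core sequence when reading the sense strand.
    antisense = count_overlapping(region, reverse_complement(MSA_CORE))
    return sense, antisense

# Toy sequence for illustration only
print(count_msa_core("TTAACGGATCCGTTAAACGGC"))
```

Averaging the summed sense and antisense counts over the genes in each group would give per-group values comparable to those reported above, under the stated assumptions.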

Expression Pattern of the Plant 3R-MYBs under Abiotic Stresses

We analyzed available gene expression profiles of three Arabidopsis 3R-MYB

genes, At4g32730 (A-group), At5g11510 (A-group) and At3g09370 (C-group), under

various abiotic stresses. mRNA accumulation of At5g11510 under favorable growth

conditions was two-fold higher in the root than in the shoot, whereas the other two

genes have similar expression levels in the root and shoot (Figure 2-12). The C-group

gene At3g09370 was induced under two different stress conditions: 1) heat treatment

(both shoot and root); 2) salt stress (only in root). At3g09370 returns to its original

expression level when heat stress is released. The A-group genes At5g11510 and

At4g32730 showed reduced expression under heat treatment in shoot and root tissue,

although change in expression was less dramatic for At4g32730 (Figure 2-12). Overall,

there were several cases where A- and C-group 3R-MYB genes exhibited opposite

patterns of regulation. The Arabidopsis C-group gene At3g09370 shows an upregulated

expression pattern similar to the rice C-group gene OsMYB3R-2 under stress

conditions, implying At3g09370 also plays a role in stress response. The opposite

expression patterns of the A- and C-group genes described above imply a possible

antagonistic regulation of these two groups under abiotic stresses in Arabidopsis.

We analyzed available microarray gene expression profiles of 3R-MYBs in

barley, rice, wheat, maize, grape, soybean, Medicago, poplar, and cotton. Among the

available gene expression profiles, five A-group genes, one B-group gene and six C-

group genes showed significant expression changes in response to one or more stress

treatments (Figure 2-13). Among the fifteen instances of differential expression, six

cases involved upregulated expression: A-group gene MLOC10556 (barley) in response

to cold; B-group gene GSVIVT01019834001 (grape) in response to heat; and four C-

group genes, Glyma18G181100 (soybean) in response to heat, and LOC_Os01g62410

(OsMYB3R-2) (rice), GRMZM2G081919 (maize) and Potri006G085600 (poplar) in

response to drought (Figure 2-13). The remaining nine instances of differential

expression indicated downregulation in response to abiotic stresses.

Discussion

Patterns of Duplication and Loss in Plant 3R-MYB Genes

Plant and animal 3R-MYBs share a common 3R-MYB ancestor, which is

supported by the conservation of an intron in R1 (Braun and Grotewold 1999) and

phylogenetic analyses (Dias et al. 2003). Interestingly, there are similarities in the

evolution of 3R-MYBs in plants and animals. Most invertebrates have a single 3R-MYB

gene whereas vertebrates have three (A-MYB, B-MYB, and c-MYB) (Davidson et al.

2012). All three vertebrate 3R-MYB genes are involved in cell-cycle regulation, although

they have distinct expression patterns and exhibit some degree of functional

differentiation, such as the ability of B-MYB to complement Drosophila MYB mutants

when neither A- nor c-MYB can do so (Davidson et al. 2005). The three vertebrate MYB

genes have originated from two rounds of segmental duplication (Davidson et al. 2012).

They may also be a result of two rounds of WGD in vertebrates (Gibson and Spring

2000), although more recent phylogenetic analyses raise questions about this

hypothesis (Abbasi and Hanif 2012). Analysis of synteny between Amborella trichopoda

and Ostreococcus lucimarinus suggests that the duplication events giving rise to the

three members in Amborella were regional or possibly even WGD events. There are

two putative WGD events, ζ and ε, shared by all angiosperm species (Jiao et al. 2011).

Our phylogenetic analyses suggest that event ε along with a second segmental

duplication could have produced the three angiosperm 3R-MYB groups (Figure 2-14A),

and it is conceivable that they were formed from both ζ and ε events combined with a

gene loss (Figure 2-14B).

Subsequent lineage specific duplication and loss events account for the variation

in the number of 3R-MYB members observed in modern angiosperm species. For

example, the grass lineage probably lost B-group 3R-MYBs (Figure 2-1 and 2-14); and

the orchid and palms possibly lost A- and B-group 3R-MYBs (Figure 2-1). The B-group

3R-MYB gene in tobacco is constitutively expressed during the cell cycle and functions

as a repressor (Ito et al. 2001), whereas A-group 3R-MYB genes in tobacco and

Arabidopsis exhibit circadian expression patterns that peak during M-phase and act as

activators (Ito et al. 2001; Araki et al. 2004; Haga et al. 2007). It was proposed that the

repressors (B-group 3R-MYBs) and activators (A-group 3R-MYBs) collaborate to

control cell progression through the G2/M transition in tobacco (Ito et al. 2001;

Araki et al. 2004). Thus, it is not clear what effect the absence of the B-group 3R-MYBs

has on cell cycle regulation in grasses. One possibility is that the monocot A- or C-

groups have picked up B-group gene function after its loss. In that case we would

expect to see accelerated evolutionary rates in monocots within the A- or C-group.

However, no positive selection in monocot lineages was detected with the method used

(Table 2-2). Considering that orchid and palm might have lost both A- and

B-group 3R-MYBs, the mechanism by which monocot 3R-MYBs regulate the cell cycle might be

more complex.

DNA-Binding Domain and Regulatory Motifs

As R1 does not directly interact with DNA in animal c-MYB, I expected it to be

less conserved compared with R3 and R2. However, I found the R1 domains of plant

3R-MYBs to be highly conserved (Figure 2-7D), suggesting R1 has functional

significance. In animals, R1 of c-MYB participates in intra-molecular interaction with the

carboxyl-terminus of itself (Dash et al. 1996). It is unclear whether that is the case in

plant 3R-MYBs. In addition, R1 of c-MYB influences transactivation of target genes, and

it may play a role in protein-protein interactions (Oelgeschläger et al. 2001). Further

functional characterization of the candidate rate shift sites is likely to establish whether

these lessons from animal c-MYB can provide insights into plant 3R-MYBs and

illuminate the ways that the three different subgroups of the plant 3R-MYB proteins

differ functionally. We did not detect any sites in the MYB domain region in A-, B- or C-

groups under positive selection, suggesting positive selection may not have played a

role in the divergence of these paralogs. However, the power of the branch-site dN/dS test

for positive selection decreases as the dS value increases (Gharib and Robinson-

Rechavi 2013). As the MYB genes in this study came from distantly related species, dS

saturation was expected and it could affect the test results.

The diversity of motifs in the plant 3R-MYBs is a result of both motif gain and loss

during evolution. Motif 4, which originated in a common ancestor to seed plants,

remains in gymnosperm and angiosperm A-group genes but has been lost in B- and C-

group genes. This motif is a repression domain that inhibits the ability of 3R-MYB

proteins to activate downstream genes during the cell cycle in tobacco (Araki et al.

2004) and Arabidopsis (Chandran et al. 2010). Moreover, specific Serine/Threonine

sites in motif 1 and 4 contribute to the removal of this inhibitory effect by cyclin-mediated

phosphorylation (Araki et al. 2004; Chandran et al. 2010). The gain of motif 4 has added

another level of regulation of the 3R-MYB proteins and increased the complexity of the

3R-MYB regulation network. Moreover, grass A-group 3R-MYBs have lost ~12 amino

acids in the middle of the repression motif, motif 4 (Figure 2-2C and Figure 2-5), which

may lead to differential function. Thus, in addition to the lack of B-group genes,

divergent motif 4 is another factor that may contribute to the different cell cycle

regulatory mechanism in grasses compared with the other flowering plants.

Intron Gain and Gene Structure Evolution

The origin of spliceosome-processed introns is a topic of debate (Koonin 2006;

Rogozin et al. 2012) that has focused on two contrasting models: the introns-early and

the introns-late hypothesis (Darnell 1978; Cavalier-Smith 1985). The introns-early

hypothesis argues that gene intron-exon structure evolution is driven by intron loss,

whereas the introns-late hypothesis argues that intron gain is the driver (Tarrío et al.

2008). Braun and Grotewold (1999) found only a single conserved intron position in

eukaryotic 3R-MYBs, suggesting a major role for intron gain in this gene family. Our

results expand on this, providing evidence that plant 3R-MYB genes underwent step-

wise intron gain (Figure 2-8), consistent with the introns-late hypothesis.

AS Regulation of the Plant 3R-MYBs

Although more than 60% of plant multi-exon genes were suggested to undergo

AS (Marquez et al. 2012), very little has been reported regarding alternatively spliced

transcript isoforms from the MYB gene family. Previously, there were two reports of AS

associated with plant R2R3-MYB genes. Arabidopsis AtMYB59 and AtMYB48, and their

rice homologs AK111626 and AK107214, shared a conserved AS pattern, and the

expression levels of their splice variants are regulated during treatment with hormones

and stresses (Li et al. 2006). A genome scale analysis of Cucumis sativus identified

fifty-five R2R3-MYBs, among which eight exhibit AS regulation (Li et al. 2012). Our

analysis suggests that more than 60% (16 out of 25 genes) of the 3R-MYB genes

undergo AS, which is similar to the proportion of genes within plant genomes that are

observed to undergo AS (Marquez et al. 2012), but higher than that reported for the R2R3-

MYBs. Among the 30 AS events observed there are two cases (Amborella

Amtr00109.47, Arabidopsis At5g11510 and At3g09370; Grape GSVIVT01027493001

and Arabidopsis At4g00540) where the same AS pattern was shared between different

species, indicating a possible ancestral AS event. However, the majority of the AS

patterns were species-specific in our analysis. In a study that identified conserved AS

events among nine angiosperm species, Chamala et al. (2015) observed that 18% of

AS events identified in Amborella were shared with at least one other species, while

10% were shared with at least two other species. Plant 3R-MYB AS events seem to be

less conserved relative to AS events among other genes.

Interestingly, I observed a conserved alternative polyadenylation event between

Arabidopsis At4g32730 and At5g11510, both of which belong to the A-group. This AS

event would lead to a truncated protein lacking motif 4, which is the important C-

terminal repression motif (Figure 2-9). A transgenic study of the tobacco A-group gene

NtmybA2 indicated that the C-terminally truncated protein is hyperactive compared with

the full-length protein in upregulating downstream genes (Kato et al. 2009). Our

results indicate that the Arabidopsis A-group 3R-MYB genes could generate both the

primary protein products and the hyperactive protein products via AS.

Plant 3R-MYBs: Link between Cell Cycle and Abiotic Stresses

There are trade-offs between growth and stress resistance in plants. Increased

abiotic stress resistance is usually associated with decreased plant growth (Bechtold et

al. 2010), and arresting the cell cycle could lead to slow plant growth (Inzé and De

Veylder 2006). Molecular evidence for connections between abiotic stress and cell cycle

is emerging, but the mechanisms remain poorly defined. Phytohormones provide one

piece of evidence that cell cycle and abiotic stress response are linked (del Pozo et al.

2005). For example, the key stress hormone abscisic acid (ABA) accumulates under

osmotic stress and regulates various stress responsive genes, leading to increased

stress resistance and growth inhibition (Yoshida et al. 2014). ABA also increases the

expression of cell cycle inhibitors and downregulates factors related to DNA

replication (Wang et al. 1998; Mudgil et al. 2002; Yang et al. 2002; del Pozo et al.

2005). Since it is likely that various abiotic stresses induce ABA, they are expected to

change the rate of cell division. Reactive oxygen species (ROS) provide another

potential link between cell cycle and abiotic stresses. ROS are often produced in

reaction to various abiotic stresses (Mittler et al. 2004), and these can damage DNA

and affect DNA replication, which may affect the progression through cell division (Gill

and Tuteja 2010). A tobacco MAPKKK protein, NPK1, was observed to be involved in

cell cycle, ROS signaling and plant growth (Hirt 2000; Jonak et al. 2002; Nakagami et

al. 2005). In tobacco cells, NPK1 is expressed during M-phase and its protein product

localizes to the phragmoplast and central region of the mitotic spindle, suggesting its

role in cell cycle regulation (Hirt 2000). It has also been proposed that NPK1 senses

H2O2 and activates stress MAPKs in response to increased levels of H2O2 (Hirt 2000;

Nakagami et al. 2005). In addition, the Arabidopsis ANP1, an ortholog of the tobacco

NPK1, downregulates auxin-induced gene expression (Hirt 2000). Although the NPK1

protein is involved in multiple signaling pathways, it is not clear if it mediates interaction

between different signaling pathways.

Since there are often trade-offs between growth and stress resistance, genes

that are positively associated with plant growth and cell cycle are expected to be

downregulated under stress conditions. However, up-regulation under stress conditions

implies a possible stress-related regulatory function of the gene. 3R-MYB genes in

tobacco (Ito et al. 2001; Araki et al. 2004; Ito 2005; Kato et al. 2009; Araki et al. 2012;

Araki et al. 2013), Arabidopsis (Haga et al. 2007; Haga et al. 2011) and rice (Ma et al.

2009) are involved in regulating the cell cycle. Recently, rice OsMYB3R-2, a C-group

3R-MYB, has been shown to play a role in responses to cold stress as well (Dai et al.

2007; Ma et al. 2009); the expression of OsMYB3R-2 is upregulated under various

stress conditions and overexpression of OsMYB3R-2 under cold stress increases

tolerance and maintains a high level of cell division (Ma et al. 2009). Our analysis

identified seven 3R-MYB genes from seven species that were significantly upregulated

under abiotic stresses: barley MLOC10556 in response to cold; grape

GSVIVT01019834001, Arabidopsis At3g09370 and soybean Glyma18G181100 in

response to heat; and rice LOC_Os01g62410 (OsMYB3R-2), maize GRMZM2G081919

and poplar Potri006G085600 in response to drought (Figure 2-12 and 2-13). Among

these seven genes, MLOC10556 is from the A-group, GSVIVT01019834001 is from the B-

group, and the remaining five genes are from the C-group. The observation that C-group

genes from multiple monocot and eudicot species show upregulation under various

stresses suggests that the C-group 3R-MYB genes may be involved in both cell cycle

and stress resistance, and the involvement in abiotic stresses may be an ancestral

condition that is conserved across angiosperms. Identification of the upstream

regulatory genes as well as other downstream target genes will contribute to the

understanding of how plant C-group 3R-MYBs integrate cell cycle regulation and abiotic

stress responses. The animal orthologs of the 3R-MYB genes are solely involved in the

cell cycle. The coupling of abiotic stress response and cell cycle through the 3R-MYB

gene products may play a role in the ability of plants to adapt to their sessile lifestyle.

Figure 2-1. Species phylogeny and numbers of 3R-MYB genes in each species. The species tree in the study was inferred from Ruhfel et al. (2014), Zeng et al. (2014), Vanneste et al. (2014), and Huang et al. (2015). Divergence times were estimated by molecular clock dating from TimeTree (Hedges et al. 2015). Stars on the branches indicate WGD events; the five WGD events Arabidopsis thaliana went through were α, β, γ, ε and ζ. In the species tree, dark green, yellow, purple, blue, green, and red indicate algae, moss, gymnosperms, Amborella trichopoda, monocots, and eudicots, respectively. The numbers of 3R-MYBs identified in each group, and in total, follow the species names. Mya: million years ago.

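[Figure 2-1 graphic: species tree drawn against a geologic timescale, with a time axis from 0 to 1160 Mya and the WGD events ζ, ε, γ, β and α marked on branches.]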

Species Common name Outgroup A_group B_group C_group Total

Bathycoccus prasinos 1 1

Micromonas pusilla CCMP1545 1 1

Micromonas pusilla RCC299 1 1

Ostreococcus lucimarinus 1 1

Ostreococcus sp. RCC809 1 1

Physcomitrella patens moss 2 2

Ginkgo biloba common ginkgo 2 2

Pinus taeda loblolly pine 2 2

Amborella trichopoda 1 1 1 3

Spirodela polyrhiza duckweed 1 1 1 3

Phalaenopsis equestris orchid 1 1

Phoenix dactylifera date palm 1 1

Elaeis guineensis African oil palm 1 1

Musa acuminata banana 2 1 3 6

Musa balbisiana wild banana 2 1 1 4

Panicum virgatum switchgrass 3 2 5

Panicum hallii Hall's panicgrass 1 2 3

Setaria italica foxtail millet 2 2

Sorghum bicolor sorghum 2 1 3

Zea mays maize 1 1

Oryza sativa rice 2 2 4

Brachypodium distachyon purple false brome 2 4 6

Hordeum vulgare barley 1 1

Triticum aestivum bread wheat 4 5 9

Triticum urartu wheat A genome progenitor 1 1

Aquilegia coerulea Colorado blue columbine 1 1 1 3

Nelumbo nucifera sacred lotus 2 2 4

Beta vulgaris sugar beet 1 1 1 3

Actinidia chinensis kiwifruit 2 1 3

Utricularia gibba humped bladderwort 1 1

Mimulus guttatus monkeyflower 2 1 1 4

Nicotiana benthamiana tobacco 2 2 2 6

Capsicum annuum pepper 2 1 3

Solanum lycopersicum tomato 2 1 1 4

Solanum tuberosum potato 1 1 2

Vitis vinifera grapevine 3 2 1 6

Eucalyptus grandis flooded gum 1 1 1 3

Citrus sinensis orange 1 1

Gossypium raimondii cotton 3 2 2 7

Theobroma cacao cacao tree 1 2 1 4

Carica papaya papaya 1 1 1 3

Brassica rapa field mustard 5 4 9

Eutrema salsugineum salt cress 2 1 2 5

Arabidopsis thaliana 2 1 2 5

Capsella grandiflora 2 2 4

Boechera stricta Drummond's rockcress 2 1 2 5

Cucumis sativus cucumber 2 1 3

Citrullus lanatus watermelon 2 1 3

Malus domestica apple 1 2 3

Pyrus bretschneideri Chinese white pear 1 1 1 3

Prunus persica peach 1 1 1 3

Prunus mume mei 1 2 1 4

Fragaria vesca woodland strawberry 1 2 1 4

Glycine max soybean 4 1 3 8

Phaseolus vulgaris common bean 2 2 4

Cajanus cajan pigeon pea 2 1 1 4

Medicago truncatula barrel medic 2 1 2 5

Cicer arietinum chickpea 2 1 3

Lotus japonicus birdsfoot trefoil 1 1 1 3

Ricinus communis castor bean 1 1 1 3

Manihot esculenta cassava 2 1 3

Jatropha curcas physic nut 1 2 1 4

Linum usitatissimum flax 2 1 2 5

Populus trichocarpa poplar 2 1 1 4

Salix purpurea willow 2 3 1 6

Total 11 85 46 83 225

[Clade labels in the graphic: Green algae, Poales, Asterids, Rosids, Eurosids I, Eurosids II.]

Figure 2-2. Subgroup classification of the plant 3R-MYBs. A) ML tree of the full-length plant 3R-MYB proteins. In the ML tree, dark green, yellow, purple, blue, green, and red indicate proteins from algae, moss, gymnosperms, Amborella trichopoda, monocots, and eudicots, respectively. B) Domain and motif structures of the plant 3R-MYBs in each group. Boxes on the right show the protein structure of the 3R-MYB in each group. N: amino-terminus; C: carboxyl-terminus. C) Sequence logos of the four motifs identified in B. Orange stars below amino acids indicate highly conserved amino acid sites. The blue box indicates the lost fragment in motif 4 in grasses.

Figure 2-3. Full-length protein ML phylogenetic tree of the plant 3R-MYBs. Bootstrap values ≥ 50% are shown on the corresponding branches. In the ML tree, dark green, yellow, purple, blue, green and red indicate proteins from algae, moss, gymnosperms, Amborella trichopoda, monocots, and eudicots, respectively. Orange shading indicates sequences from rosids, purple shading indicates sequences from asterids, and blue shading indicates sequences from monocots.

Bathy10g03950 (Bathycoccus prasinos) 192902 (Micromonas pusilla CCMP1545)

88104 (Micromonas pusilla RCC299) 37383 (Ostreococcus RCC809)

34891 (Ostreococcus lucimarinus) Phpat.014G052200 (Physcomitrella patens)

Phpat.010G040000 (Physcomitrella patens) PITA000002016 (Pinus taeda) PITA000027114 (Pinus taeda)

06223 (Ginkgo biloba) 08952 (Ginkgo biloba)

Amborellales Amtr00109.47 (Amborella trichopoda) Alismatales Spipo10G0009900 (Spirodela polyrhiza)

Bchr10P30232 (Musa balbisiana) Achr10P14840 (Musa acuminata)

Bchr2P04866 (Musa balbisiana) Achr2P19890 (Musa acuminata)

Zingiberales

Sobic.009G083800 (Sorghum bicolor) Pavir.J07395 (Panicum virgatum)

Bradi2g31887 (Brachypodium distachyon) Traes.1AS.61D017632 (Triticum aestivum)

Traes.1BS.403DBC53C (Triticum aestivum) Sobic.003G008700 (Sorghum bicolor) Pahal.0003s0659 (Panicum hallii) Pavir.Ea00087 (Panicum virgatum) Pavir.Eb00149 (Panicum virgatum)

LOC_Os01g12860 (Oryza sativa) LOC_Os12g13570 (Oryza sativa)

Bradi2g07677 (Brachypodium distachyon) MLOC10556 (Hordeum vulgare) Traes.3AS.09025DA2E (Triticum aestivum) Traes.3B.ABF465DDF (Triticum aestivum)

Poales

Ranunculales Aquca001.00435 (Aquilegia coerulea) Nenu003474 (Nelumbo nucifera)

Nenu005941 (Nelumbo nucifera) Proteales

GSVIVT01015370001 (Vitis vinifera) Vitales GSVIVT01035663001 (Vitis vinifera)

GSVIVT01035664001 (Vitis vinifera) Caryophyllales Bv2.032710 (Beta vulgaris)

MigutE01513 (Mimulus guttatus) MigutA00232 (Mimulus guttatus)

Lamiales

Solyc08g068320 (Solanum lycopersicum) Capana08g000819 (Capsicum annuum)

Capana11g000012 (Capsicum annuum) Solyc11g071300 (Solanum lycopersicum)

NbS00023020g0004 (Nicotiana benthamiana) NbS00023374g0007 (Nicotiana benthamiana)

Solanales

Cucsa161430 (Cucumis sativus) Cla022224 (Citrullus lanatus)

Cucsa311420 (Cucumis sativus) Cla023007 (Citrullus lanatus)

Cucurbitales

MDP0000179225 (Malus domestica) mrna08793 (Fragaria vesca)

Pbr009229 (Pyrus bretschneideri) Prupe.1G452000 (Prunus persica) Pm005124 (Prunus mume)

Rosales

Ca01884 (Cicer arietinum) Medtr3g110028 (Medicago truncatula)

CM1664.320 (Lotus japonicus) Ccajan11088 (Cajanus cajan)

Phvul009G106700 (Phaseolus vulgaris) Glyma06G082300 (Glycine max) Glyma04G080600 (Glycine max)

Ca14925 (Cicer arietinum) Medtr1g026870 (Medicago truncatula)

Ccajan14244 (Cajanus cajan) Phvul001G061200 (Phaseolus vulgaris)

Glyma17G190900 (Glycine max) Glyma14G143400 (Glycine max)

Fabales

Myrtales Eucgr.C02893 (Eucalyptus grandis) 29794m003447 (Ricinus communis)

Jcr4S11652.10 (Jatropha curcas) Lus10038623 (Linum usitatissimum)

Lus10022136 (Linum usitatissimum) Sapur0283s0180 (Salix purpurea)

Potri018G038000 (Populus trichocarpa) Sapur0446s0220 (Salix purpurea)

Potri006G241700 (Populus trichocarpa)

Malpighiales

Thecc1EG047091t1 (Theobroma cacao) Gorai010G110000 (Gossypium raimondii) Gorai001G028200 (Gossypium raimondii) Gorai009G051000 (Gossypium raimondii)

Malvales

Cpapaya78.76 (Carica papaya) BraraK00691 (Brassica rapa)

BraraA00517 (Brassica rapa) BraraH01296 (Brassica rapa)

Thhalv10024320m (Eutrema salsugineum) AT4G32730 (Arabidopsis thaliana) Cagra4093s0003 (Capsella grandiflora)

Bostr7867s1124 (Boechera stricta) BraraJ02221 (Brassica rapa)

BraraC00465 (Brassica rapa) Thhalv10012583m (Eutrema salsugineum) Bostr20055s0087 (Boechera stricta) Cagra7526s0003 (Capsella grandiflora) AT5G11510 (Arabidopsis thaliana)

Brassicales

Amborellales Amtr00012.146 (Amborella trichopoda) Alismatales Spipo6G0071600 (Spirodela polyrhiza)

Elgu00003.114 (Elaeis guineensis) PDK30s1074861g022 (Phoenix dactylifera)

Arecales

Achr6P05030 (Musa acuminata) Achr7P10520 (Musa acuminata)

Bchr10P31223 (Musa balbisiana) Achr10P26610 (Musa acuminata)

Zingiberales

Asparagales PEQU09277 (Phalaenopsis equestris) LOC_Os05g38460 (Oryza sativa)

Pahal.0696s0006 (Panicum hallii) Si021484m (Setaria italica)

Bradi2g23341 (Brachypodium distachyon) Bradi2g23310 (Brachypodium distachyon)

Bradi2g23320 (Brachypodium distachyon) Traes.1BL.C25B5DDB4 (Triticum aestivum)

Traes.1DL.B26A733D4 (Triticum aestivum) TRIUR3.29290 (Triticum urartu)

LOC_Os01g62410 (Oryza sativa) Bradi2g54640 (Brachypodium distachyon)

Traes.3B.B594FC28C (Triticum aestivum) Traes.3AL.386795528 (Triticum aestivum) Traes.3DL.6BBC889A1 (Triticum aestivum)

GRMZM2G081919 (Zea mays) Sobic.003G352200 (Sorghum bicolor) Si000842m (Setaria italica)

Pahal.0006s0119 (Panicum hallii) Pavir.Ea03349 (Panicum virgatum) Pavir.Eb03676 (Panicum virgatum)

Poales

Ranunculales Aquca003.00045 (Aquilegia coerulea) Nenu007682 (Nelumbo nucifera) Nenu012205 (Nelumbo nucifera)

Proteales

Caryophyllales Bv5.108980 (Beta vulgaris) ugScf00212.g12744 (Utricularia gibba)

MigutL01945 (Mimulus guttatus) Lamiales

NbS00027068g0023 (Nicotiana benthamiana) NbS00007819g0018 (Nicotiana benthamiana)

Solyc09g010820 (Solanum lycopersicum) PGSC0003DMP400015671 (Solanum tuberosum)

Solanales

Vitales GSVIVT01034171001 (Vitis vinifera) Ericales Achn163941 (Actinidia chinensis) Myrtales Eucgr.K03133 (Eucalyptus grandis)

Cucsa175460 (Cucumis sativus) Cla017897 (Citrullus lanatus)

Cucurbitales

Pm002704 (Prunus mume) Prupe.6G255400 (Prunus persica)

mrna18416 (Fragaria vesca) Pbr006264 (Pyrus bretschneideri)

MDP0000197330 (Malus domestica) MDP0000219581 (Malus domestica)

Rosales

CM0147.620 (Lotus japonicus) Ccajan44733 (Cajanus cajan)

Phvul010G012500 (Phaseolus vulgaris) Glyma03G082400 (Glycine max)

Medtr7g061330 (Medicago truncatula) Medtr7g461410 (Medicago truncatula)

Phvul008G102000 (Phaseolus vulgaris) Glyma18G181100 (Glycine max) Glyma07G132200 (Glycine max)

Fabales

Sapur0001s1810 (Salix purpurea) Potri006G085600 (Populus trichocarpa) Jcr4S01332.10 (Jatropha curcas)

29846m000184 (Ricinus communis) cassava004816m (Manihot esculenta)

Malpighiales

Brassicales Cpapaya228.17 (Carica papaya) Thecc1EG021936t1 (Theobroma cacao) Gorai006G172800 (Gossypium raimondii)

Gorai001G249400 (Gossypium raimondii) Malvales

Lus10025351 (Linum usitatissimum) Lus10024394 (Linum usitatissimum)

Malpighiales

Sapindales orange1.1g009402m (Citrus sinensis) BraraJ02886 (Brassica rapa)

Thhalv10013211m (Eutrema salsugineum) AT5G02320 (Arabidopsis thaliana)

Cagra0487s0012 (Capsella grandiflora) Bostr6251s0040 (Boechera stricta) Thhalv10020579m (Eutrema salsugineum)

AT3G09370 (Arabidopsis thaliana) Cagra2515s0028 (Capsella grandiflora)

Bostr22252s0130 (Boechera stricta) BraraE03093 (Brassica rapa)

BraraC03272 (Brassica rapa) BraraA03530 (Brassica rapa)

Brassicales

Amborellales Amtr00009.357 (Amborella trichopoda) Alismatales Spipo1G0035900 (Spirodela polyrhiza)

Bchr2P03456 (Musa balbisiana) Achr2P03880 (Musa acuminata)

Zingiberales

Ranunculales Aquca013.00366 (Aquilegia coerulea) Vitales GSVIVT01019834001 (Vitis vinifera) Caryophyllales Bv5.095680 (Beta vulgaris) Ericales Achn380251 (Actinidia chinensis) Lamiales MigutA00800 (Mimulus guttatus)

Solyc08g080580 (Solanum lycopersicum) Capana01g000277 (Capsicum annuum) PGSC0003DMP400005488 (Solanum tuberosum)

NbS00046693g0003 (Nicotiana benthamiana) NbS00001647g0009 (Nicotiana benthamiana)

Solanales

Pm023786 (Prunus mume) Prupe.5G093400 (Prunus persica)

Pbr039284 (Pyrus bretschneideri) Pbr039284 (Pyrus bretschneideri)

Rosales

CM0021.3260 (Lotus japonicus) Ca02417 (Cicer arietinum)

Medtr5g010650 (Medicago truncatula) Glyma01G217500 (Glycine max)

Ccajan13183 (Cajanus cajan)

Fabales

Myrtales Eucgr.D01905 (Eucalyptus grandis) Thecc1EG016203t1 (Theobroma cacao)

Gorai003G160500 (Gossypium raimondii)Malvales

Sapur0761s0070 (Salix purpurea) Lus10008010 (Linum usitatissimum)

cassava022173m (Manihot esculenta) 30190m011160 (Ricinus communis)

Jcr4S00150.230 (Jatropha curcas)

Malpighiales

Thhalv10029464m (Eutrema salsugineum) Bostr10064s0051 (Boechera stricta)

AT4G00540 (Arabidopsis thaliana)Brassicales

Ericales Achn295821 (Actinidia chinensis)Vitales GSVIVT01027493001 (Vitis vinifera)

Pm018007 (Prunus mume) mrna19115 (Fragaria vesca)

Rosales

Brassicales Cpapaya18.208 (Carica papaya) Thecc1EG005402t1 (Theobroma cacao)

Gorai008G117400 (Gossypium raimondii)Malvales

Sapur0586s0030 (Salix purpurea) Jcr4S10210.20 (Jatropha curcas)

Potri014G079300 (Populus trichocarpa) Sapur0586s0030 (Salix purpurea) Sapur0533s0050 (Salix purpurea)

Malpighiales

71

100

67

100

100

100

100

100

99

100

100100

100

100

100

94

95100

100

95

70

99

79

52

9983

100

100

84

100

100

99

99

100

100

90

100

10096

100

5688

63

100

100

93

100

100

100

100

100

77

72

81

99

100

100

100

92

8559

100

100100

9699

55

100

99

9797

100

94

55

58

90

57

98

56

9955

87

67

10070

9472

96100

97

98

90100

99

100

72100

100

100

100

100

100

67

89

97

61

10097

89

100

7784

100

98

93100

100

7976

88

93

100

93100

59

100

100

9987

78

100

8189

91

61

100

100

89

100

10052

10092

97

67

100

100

98

98

70

100

98

61

91

99

79

85100

97

93

80

100100

94

50

86

100

0.2

Bathy10g03950 (Bathycoccus prasinos) 192902 (Micromonas pusilla CCMP1545)

88104 (Micromonas pusilla RCC299) 37383 (Ostreococcus RCC809)

34891 (Ostreococcus lucim arinus) Phpat.014G052200 (Physcomitrella patens)

Phpat.010G040000 (Physcomitrella patens) PITA000002016 (Pinus teada) PITA000027114 (Pinus teada)

06223 (Ginkgo biloba) 08952 (Ginkgo biloba)

Amborellales Amtr00109.47(Am borella trichopoda)Alismatales Spipo10G0009900 (Spirodela polyrhiza)

Bchr10P30232 (Musa balbisiana) Achr10P14840 (Musa acuminata)

Bchr2P04866 (Musa balbisiana) Achr2P19890 (Musa acuminata)

Zingiberales

Sobic.009G083800 (Sorghum bicolor) Pavir.J07395 (Panicum virgatum)

Bradi2g31887 (Brachypodium distachyon) Traes.1AS.61D017632 (Triticum aestivum)

Traes.1BS.403DBC53C (Triticum aestivum) Sobic.003G008700 (Sorghum bicolor) Pahal.0003s0659 (Panicum hallii) Pavir.Ea00087 (Panicum virgatum) Pavir.Eb00149 (Panicum virgatum)

LOC_Os01g12860 (Oryza sativa) LOC_Os12g13570 (Oryza sativa)

Bradi2g07677 (Brachypodium distachyon) MLOC10556 (Hordeum vulgare) Traes.3AS.09025DA2E (Triticum aestivum) Traes.3B.ABF465DDF (Triticum aestivum)

Poales

Ranunculales Aquca001.00435 (Aquilegia coerulea) Nenu003474 (Nelumbo nucifera)

Nenu005941 (Nelumbo nucifera)Proteales

GSVIVT01015370001 (Vitis vinifera)Vitales GSVIVT01035663001 (Vitis vinifera)

GSVIVT01035664001 (Vitis vinifera)Caryophyllales Bv2.032710 (Beta vulgaris)

MigutE01513 (Minulus guttatus) MigutA00232 (Minulus guttatus)

Lamiales

Solyc08g068320 (Solanum lycopersicum ) Capana08g000819 (Capsicum annuum)

Capana11g000012 (Capsicum annuum) Solyc11g071300 (Solanum lycopersicum )

NbS00023020g0004 (Nicotiana benthamiana) NbS00023374g0007 (Nicotiana benthamiana)

Solanales

Cucsa161430 (Cucumis sativus) Cla022224 (Citrullus lanatus)

Cucsa311420 (Cucumis sativus) Cla023007 (Citrullus lanatus)

Cucurbitales

MDP0000179225 (Malus domestica) mrna08793 (Fragaria vesca)

Pbr009229 (Pyrus bretschneideri) Prupe.1G452000 (Prunus persica) Pm005124 (Prunus mume)

Rosales

Ca01884 (Cicer arietinum) Medtr3g110028 (Medicago truncatula)

CM1664.320 (Lotus japonicus) Ccajan11088 (Cajanus cajan)

Phvul009G106700 (Phaseolus vulgaris) Glyma06G082300 (Glycine max) Glyma04G080600 (Glycine max)

Ca14925 (Cicer arietinum) Medtr1g026870 (Medicago truncatula)

Ccajan14244 (Cajanus cajan) Phvul001G061200 (Phaseolus vulgaris)

Glyma17G190900 (Glycine max) Glyma14G143400 (Glycine max)

Fabales

Myrtales Eucgr.C02893 (Eucalyptus grandis) 29794m003447 (Ricinus communis)

Jcr4S11652.10 (Jatropha curcas) Lus10038623 (Linum usitatissimum)

Lus10022136 (Linum usitatissimum) Sapur0283s0180 (Salix purpurea)

Potri018G038000 (Populus trichocarpa) Sapur0446s0220 (Salix purpurea)

Potri006G241700 (Populus trichocarpa)

Malpighiales

Thecc1EG047091t1 (Theobroma cacao) Gorai010G110000 (Gossypium raimondii) Gorai001G028200 (Gossypium raimondii) Gorai009G051000 (Gossypium raimondii)

Malvales

Cpapaya78.76 (Carica papaya) BraraK00691 (Brassica rapa)

BraraA00517 (Brassica rapa) BraraH01296 (Brassica rapa)

Thhalv10024320m (Eutrema salsugineum) AT4G32730 (Arabidopsis thaliana) Cagra4093s0003 (Capsella grandiflora)

Bostr7867s1124 (Boechera stricta) BraraJ02221 (Brassica rapa)

BraraC00465 (Brassica rapa) Thhalv10012583m (Eutrema salsugineum) Bostr20055s0087 (Boechera stricta) Cagra7526s0003 (Capsella grandiflora) AT5G11510 (Arabidopsis thaliana)

Brassicales

Amborellales Amtr00012.146 (Amborella trichopoda)Alismatales Spipo6G0071600 (Spirodela polyrhiza)

Elgu00003.114 (Elaeis guineensis) PDK30s1074861g022 (Phoenix dactylifera)

Arecales

Achr6P05030 (Musa acuminata) Achr7P10520 (Musa acuminata)

Bchr10P31223 (Musa balbisiana) Achr10P26610 (Musa acuminata)

Zingiberales

Asparagales PEQU09277 (Phalaenopsis equestris) LOC_Os05g38460 (Oryza sativa)

Pahal.0696s0006 (Panicum hallii) Si021484m (Setaria italica)

Bradi2g23341 (Brachypodium distachyon) Bradi2g23310 (Brachypodium distachyon)

Bradi2g23320 (Brachypodium distachyon) Traes.1BL.C25B5DDB4 (Triticum aestivum)

Traes.1DL.B26A733D4 (Triticum aestivum) TRIUR3.29290 (Triticum urartu)

LOC_Os01g62410 (Oryza sativa) Bradi2g54640 (Brachypodium distachyon)

Traes.3B.B594FC28C (Triticum aestivum) Traes.3AL.386795528 (Triticum aestivum) Traes.3DL.6BBC889A1 (Triticum aestivum)

GRMZM2G081919 (Zea mays) Sobic.003G352200 (Sorghum bicolor) Si000842m (Setaria italica)

Pahal.0006s0119 (Panicum hallii) Pavir.Ea03349 (Panicum virgatum) Pavir.Eb03676 (Panicum virgatum)

Poales

Ranunculales Aquca003.00045 (Aquilegia coerulea) Nenu007682 (Nelumbo nucifera) Nenu012205 (Nelumbo nucifera)

Proteales

Caryophyllales Bv5.108980 (Beta vulgaris) ugScf00212.g12744 (Utricularia gibba)

MigutL01945 (Minulus guttatus)Lamiales

NbS00027068g0023 (Nicotiana benthamiana) NbS00007819g0018 (Nicotiana benthamiana)

Solyc09g010820 (Solanum lycopersicum ) PGSC0003DMP400015671 (Solanum tuberosum)

Solanales

Vitales GSVIVT01034171001 (Vitis vinifera)Ericales Achn163941 (Actinidia chinensis)Myrtales Eucgr.K03133 (Eucalyptus grandis)

Cucsa175460 (Cucumis sativus) Cla017897 (Citrullus lanatus)

Cucurbitales

Pm002704 (Prunus mume) Prupe 6G255400 (Prunus persica)

mrna18416 (Fragaria vesca) Pbr006264 (Pyrus bretschneideri)

MDP0000197330 (Malus domestica) MDP0000219581 (Malus domestica)

Rosales

CM0147.620 (Lotus japonicus) Ccajan44733 (Cajanus cajan)

Phvul010G012500 (Phaseolus vulgaris) Glyma03G082400 (Glycine max)

Medtr7g061330 (Medicago truncatula) Medtr7g461410 (Medicago truncatula)

Phvul008G102000 (Phaseolus vulgaris) Glyma18G181100 (Glycine max) Glyma07G132200 (Glycine max)

Fabales

Sapur0001s1810 (Salix purpurea) Potri006G085600 (Populus trichocarpa) Jcr4S01332.10 (Jatropha curcas)

29846m000184 (Ricinus communis) cassava004816m (Manihot esculenta)

Malpighiales

Brassicales Cpapaya228.17 (Carica papaya) Thecc1EG021936t1 (Theobroma cacao) Gorai006G172800 (Gossypium raimondii)

Gorai001G249400 (Gossypium raimondii)Malvales

Lus10025351 (Linum usitatissimum) Lus10024394 (Linum usitatissimum)

Malpighiales

Sapindales orange1.1g009402m (Citrus sinensis) BraraJ02886 (Brassica rapa)

Thhalv10013211m (Eutrema salsugineum) AT5G02320 (Arabidopsis thaliana)

Cagra0487s0012 (Capsella grandiflora) Bostr6251s0040 (Boechera stricta) Thhalv10020579m (Eutrema salsugineum)

AT3G09370 (Arabidopsis thaliana) Cagra2515s0028 (Capsella grandiflora)

Bostr22252s0130 (Boechera stricta) BraraE03093 (Brassica rapa)

BraraC03272 (Brassica rapa) BraraA03530 (Brassica rapa)

Brassicales

Amborellales Amtr00009.357 (Amborella trichopoda)Alismatales Spipo1G0035900 (Spirodela polyrhiza)

Bchr2P03456 (Musa balbisiana) Achr2P03880 (Musa acuminata)

Zingiberales

Ranunculales Aquca013.00366 (Aquilegia coerulea)Vitales GSVIVT01019834001 (Vitis vinifera)Caryophyllales Bv5.095680 (Beta vulgaris)Ericales Achn380251 (Actinidia chinensis)Lamiales MigutA00800 (Mimulus guttatus)

Solyc08g080580 (Solanum lycopersicum) Capana01g000277 (Capsicum annuum) PGSC0003DMP400005488 (Solanum tuberosum)

NbS00046693g0003 (Nicotiana benthamiana) NbS00001647g0009 (Nicotiana benthamiana)

Solanales

Pm023786 (Prunus mume) Prupe.5G093400 (Prunus persica)

Pbr039284 (Pyrus bretschneideri) Pbr039284 (Pyrus bretschneideri)

Rosales

CM0021.3260 (Lotus japonicus) Ca02417 (Cicer arietinum)

Medtr5g010650 (Medicago truncatula) Glyma01G217500 (Glycine max)

Ccajan13183 (Cajanus cajan)

Fabales

Myrtales Eucgr.D01905 (Eucalyptus grandis) Thecc1EG016203t1 (Theobroma cacao)

Gorai003G160500 (Gossypium raimondii)Malvales

Sapur0761s0070 (Salix purpurea) Lus10008010 (Linum usitatissimum)

cassava022173m (Manihot esculenta) 30190m011160 (Ricinus communis)

Jcr4S00150.230 (Jatropha curcas)

Malpighiales

Thhalv10029464m (Eutrema salsugineum) Bostr10064s0051 (Boechera stricta)

AT4G00540 (Arabidopsis thaliana)Brassicales

Ericales Achn295821 (Actinidia chinensis)Vitales GSVIVT01027493001 (Vitis vinifera)

Pm018007 (Prunus mume) mrna19115 (Fragaria vesca)

Rosales

Brassicales Cpapaya18.208 (Carica papaya) Thecc1EG005402t1 (Theobroma cacao)

Gorai008G117400 (Gossypium raimondii)Malvales

Sapur0586s0030 (Salix purpurea) Jcr4S10210.20 (Jatropha curcas)

Potri014G079300 (Populus trichocarpa) Sapur0586s0030 (Salix purpurea) Sapur0533s0050 (Salix purpurea)

Malpighiales

71

100

67

100

100

100

100

100

99

100

100100

100

100

100

94

95100

100

95

70

99

79

52

9983

100

100

84

100

100

99

99

100

100

90

100

10096

100

5688

63

100

100

93

100

100

100

100

100

77

72

81

99

100

100

100

92

8559

100

100100

9699

55

100

99

9797

100

94

55

58

90

57

98

56

9955

87

67

10070

9472

96100

97

98

90100

99

100

72100

100

100

100

100

100

67

89

97

61

10097

89

100

7784

100

98

93100

100

7976

88

93

100

93100

59

100

100

9987

78

100

8189

91

61

100

100

89

100

10052

10092

97

67

100

100

98

98

70

100

98

61

91

99

79

85100

97

93

80

100100

94

50

86

100

0.2

Bathy10g03950 (Bathycoccus prasinos) 192902 (Micromonas pusilla CCMP1545)

88104 (Micromonas pusilla RCC299) 37383 (Ostreococcus RCC809)

34891 (Ostreococcus lucim arinus) Phpat.014G052200 (Physcomitrella patens)

Phpat.010G040000 (Physcomitrella patens) PITA000002016 (Pinus teada) PITA000027114 (Pinus teada)

06223 (Ginkgo biloba) 08952 (Ginkgo biloba)

Amborellales Amtr00109.47(Am borella trichopoda)Alismatales Spipo10G0009900 (Spirodela polyrhiza)

Bchr10P30232 (Musa balbisiana) Achr10P14840 (Musa acuminata)

Bchr2P04866 (Musa balbisiana) Achr2P19890 (Musa acuminata)

Zingiberales

Sobic.009G083800 (Sorghum bicolor) Pavir.J07395 (Panicum virgatum)

Bradi2g31887 (Brachypodium distachyon) Traes.1AS.61D017632 (Triticum aestivum)

Traes.1BS.403DBC53C (Triticum aestivum) Sobic.003G008700 (Sorghum bicolor) Pahal.0003s0659 (Panicum hallii) Pavir.Ea00087 (Panicum virgatum) Pavir.Eb00149 (Panicum virgatum)

LOC_Os01g12860 (Oryza sativa) LOC_Os12g13570 (Oryza sativa)

Bradi2g07677 (Brachypodium distachyon) MLOC10556 (Hordeum vulgare) Traes.3AS.09025DA2E (Triticum aestivum) Traes.3B.ABF465DDF (Triticum aestivum)

Poales

Ranunculales Aquca001.00435 (Aquilegia coerulea) Nenu003474 (Nelumbo nucifera)

Nenu005941 (Nelumbo nucifera)Proteales

GSVIVT01015370001 (Vitis vinifera)Vitales GSVIVT01035663001 (Vitis vinifera)

GSVIVT01035664001 (Vitis vinifera)Caryophyllales Bv2.032710 (Beta vulgaris)

MigutE01513 (Minulus guttatus) MigutA00232 (Minulus guttatus)

Lamiales

Solyc08g068320 (Solanum lycopersicum ) Capana08g000819 (Capsicum annuum)

Capana11g000012 (Capsicum annuum) Solyc11g071300 (Solanum lycopersicum )

NbS00023020g0004 (Nicotiana benthamiana) NbS00023374g0007 (Nicotiana benthamiana)

Solanales

Cucsa161430 (Cucumis sativus) Cla022224 (Citrullus lanatus)

Cucsa311420 (Cucumis sativus) Cla023007 (Citrullus lanatus)

Cucurbitales

MDP0000179225 (Malus domestica) mrna08793 (Fragaria vesca)

Pbr009229 (Pyrus bretschneideri) Prupe.1G452000 (Prunus persica) Pm005124 (Prunus mume)

Rosales

Ca01884 (Cicer arietinum) Medtr3g110028 (Medicago truncatula)

CM1664.320 (Lotus japonicus) Ccajan11088 (Cajanus cajan)

Phvul009G106700 (Phaseolus vulgaris) Glyma06G082300 (Glycine max) Glyma04G080600 (Glycine max)

Ca14925 (Cicer arietinum) Medtr1g026870 (Medicago truncatula)

Ccajan14244 (Cajanus cajan) Phvul001G061200 (Phaseolus vulgaris)

Glyma17G190900 (Glycine max) Glyma14G143400 (Glycine max)

Fabales

Myrtales Eucgr.C02893 (Eucalyptus grandis) 29794m003447 (Ricinus communis)

Jcr4S11652.10 (Jatropha curcas) Lus10038623 (Linum usitatissimum)

Lus10022136 (Linum usitatissimum) Sapur0283s0180 (Salix purpurea)

Potri018G038000 (Populus trichocarpa) Sapur0446s0220 (Salix purpurea)

Potri006G241700 (Populus trichocarpa)

Malpighiales

Thecc1EG047091t1 (Theobroma cacao) Gorai010G110000 (Gossypium raimondii) Gorai001G028200 (Gossypium raimondii) Gorai009G051000 (Gossypium raimondii)

Malvales

Cpapaya78.76 (Carica papaya) BraraK00691 (Brassica rapa)

BraraA00517 (Brassica rapa) BraraH01296 (Brassica rapa)

Thhalv10024320m (Eutrema salsugineum) AT4G32730 (Arabidopsis thaliana) Cagra4093s0003 (Capsella grandiflora)

Bostr7867s1124 (Boechera stricta) BraraJ02221 (Brassica rapa)

BraraC00465 (Brassica rapa) Thhalv10012583m (Eutrema salsugineum) Bostr20055s0087 (Boechera stricta) Cagra7526s0003 (Capsella grandiflora) AT5G11510 (Arabidopsis thaliana)

Brassicales

Amborellales Amtr00012.146 (Amborella trichopoda)Alismatales Spipo6G0071600 (Spirodela polyrhiza)

Elgu00003.114 (Elaeis guineensis) PDK30s1074861g022 (Phoenix dactylifera)

Arecales

Achr6P05030 (Musa acuminata) Achr7P10520 (Musa acuminata)

Bchr10P31223 (Musa balbisiana) Achr10P26610 (Musa acuminata)

Zingiberales

Asparagales PEQU09277 (Phalaenopsis equestris) LOC_Os05g38460 (Oryza sativa)

Pahal.0696s0006 (Panicum hallii) Si021484m (Setaria italica)

Bradi2g23341 (Brachypodium distachyon) Bradi2g23310 (Brachypodium distachyon)

Bradi2g23320 (Brachypodium distachyon) Traes.1BL.C25B5DDB4 (Triticum aestivum)

Traes.1DL.B26A733D4 (Triticum aestivum) TRIUR3.29290 (Triticum urartu)

LOC_Os01g62410 (Oryza sativa) Bradi2g54640 (Brachypodium distachyon)

Traes.3B.B594FC28C (Triticum aestivum) Traes.3AL.386795528 (Triticum aestivum) Traes.3DL.6BBC889A1 (Triticum aestivum) GRMZM2G081919 (Zea mays)

Sobic.003G352200 (Sorghum bicolor) Si000842m (Setaria italica)

Pahal.0006s0119 (Panicum hallii) Pavir.Ea03349 (Panicum virgatum) Pavir.Eb03676 (Panicum virgatum)

Poales

Ranunculales Aquca003.00045 (Aquilegia coerulea) Nenu007682 (Nelumbo nucifera) Nenu012205 (Nelumbo nucifera)

Proteales

Caryophyllales Bv5.108980 (Beta vulgaris) ugScf00212.g12744 (Utricularia gibba)

MigutL01945 (Minulus guttatus)Lamiales

NbS00027068g0023 (Nicotiana benthamiana) NbS00007819g0018 (Nicotiana benthamiana)

Solyc09g010820 (Solanum lycopersicum ) PGSC0003DMP400015671 (Solanum tuberosum)

Solanales

Vitales GSVIVT01034171001 (Vitis vinifera)Ericales Achn163941 (Actinidia chinensis)Myrtales Eucgr.K03133 (Eucalyptus grandis)

Cucsa175460 (Cucumis sativus) Cla017897 (Citrullus lanatus)

Cucurbitales

Pm002704 (Prunus mume) Prupe 6G255400 (Prunus persica)

mrna18416 (Fragaria vesca) Pbr006264 (Pyrus bretschneideri)

MDP0000197330 (Malus domestica) MDP0000219581 (Malus domestica)

Rosales

CM0147.620 (Lotus japonicus) Ccajan44733 (Cajanus cajan)

Phvul010G012500 (Phaseolus vulgaris) Glyma03G082400 (Glycine max)

Medtr7g061330 (Medicago truncatula) Medtr7g461410 (Medicago truncatula)

Phvul008G102000 (Phaseolus vulgaris) Glyma18G181100 (Glycine max) Glyma07G132200 (Glycine max)

Fabales

Sapur0001s1810 (Salix purpurea) Potri006G085600 (Populus trichocarpa) Jcr4S01332.10 (Jatropha curcas)

29846m000184 (Ricinus communis) cassava004816m (Manihot esculenta)

Malpighiales

Brassicales Cpapaya228.17 (Carica papaya) Thecc1EG021936t1 (Theobroma cacao) Gorai006G172800 (Gossypium raimondii)

Gorai001G249400 (Gossypium raimondii)Malvales

Lus10025351 (Linum usitatissimum) Lus10024394 (Linum usitatissimum)

Malpighiales

Sapindales orange1.1g009402m (Citrus sinensis) BraraJ02886 (Brassica rapa)

Thhalv10013211m (Eutrema salsugineum) AT5G02320 (Arabidopsis thaliana)

Cagra0487s0012 (Capsella grandiflora) Bostr6251s0040 (Boechera stricta) Thhalv10020579m (Eutrema salsugineum)

AT3G09370 (Arabidopsis thaliana) Cagra2515s0028 (Capsella grandiflora)

Bostr22252s0130 (Boechera stricta) BraraE03093 (Brassica rapa)

BraraC03272 (Brassica rapa) BraraA03530 (Brassica rapa)

Brassicales

Amborellales Amtr00009.357 (Amborella trichopoda)Alismatales Spipo1G0035900 (Spirodela polyrhiza)

Bchr2P03456 (Musa balbisiana) Achr2P03880 (Musa acuminata)

Zingiberales

Ranunculales Aquca013.00366 (Aquilegia coerulea)Vitales GSVIVT01019834001 (Vitis vinifera)Caryophyllales Bv5.095680 (Beta vulgaris)Ericales Achn380251 (Actinidia chinensis)Lamiales MigutA00800 (Mimulus guttatus)

Solyc08g080580 (Solanum lycopersicum) Capana01g000277 (Capsicum annuum) PGSC0003DMP400005488 (Solanum tuberosum)

NbS00046693g0003 (Nicotiana benthamiana) NbS00001647g0009 (Nicotiana benthamiana)

Solanales

Pm023786 (Prunus mume) Prupe.5G093400 (Prunus persica)

Pbr039284 (Pyrus bretschneideri) Pbr039284 (Pyrus bretschneideri)

Rosales

CM0021.3260 (Lotus japonicus) Ca02417 (Cicer arietinum)

Medtr5g010650 (Medicago truncatula) Glyma01G217500 (Glycine max)

Ccajan13183 (Cajanus cajan)

Fabales

Myrtales Eucgr.D01905 (Eucalyptus grandis) Thecc1EG016203t1 (Theobroma cacao)

Gorai003G160500 (Gossypium raimondii)Malvales

Sapur0761s0070 (Salix purpurea) Lus10008010 (Linum usitatissimum)

cassava022173m (Manihot esculenta) 30190m011160 (Ricinus communis)

Jcr4S00150.230 (Jatropha curcas)

Malpighiales

Thhalv10029464m (Eutrema salsugineum) Bostr10064s0051 (Boechera stricta)

AT4G00540 (Arabidopsis thaliana)Brassicales

Ericales Achn295821 (Actinidia chinensis)Vitales GSVIVT01027493001 (Vitis vinifera)

Pm018007 (Prunus mume) mrna19115 (Fragaria vesca)

Rosales

Brassicales Cpapaya18.208 (Carica papaya) Thecc1EG005402t1 (Theobroma cacao)

Gorai008G117400 (Gossypium raimondii)Malvales

Sapur0586s0030 (Salix purpurea) Jcr4S10210.20 (Jatropha curcas)

Potri014G079300 (Populus trichocarpa) Sapur0586s0030 (Salix purpurea) Sapur0533s0050 (Salix purpurea)

Malpighiales

71

100

67

100

100

100

100

100

99

100

100100

100

100

100

94

95100

100

95

70

99

79

52

9983

100

100

84

100

100

99

99

100

100

90

100

10096

100

5688

63

100

100

93

100

100

100

100

100

77

72

81

99

100

100

100

92

8559

100

100100

9699

55

100

99

9797

100

94

55

58

90

57

98

56

9955

87

67

10070

9472

96100

97

98

90100

99

100

72100

100

100

100

100

100

67

89

97

61

10097

89

100

7784

100

98

93100

100

7976

88

93

100

93100

59

100

100

9987

78

100

8189

91

61

100

100

89

100

10052

10092

97

67

100

100

98

98

70

100

98

61

91

99

79

85100

97

93

80

100100

94

50

86

100

0.2

Bathy10g03950 (Bathycoccus prasinos) 192902 (Micromonas pusilla CCMP1545)

88104 (Micromonas pusilla RCC299) 37383 (Ostreococcus RCC809)

34891 (Ostreococcus lucim arinus) Phpat.014G052200 (Physcomitrella patens)

Phpat.010G040000 (Physcomitrella patens) PITA000002016 (Pinus teada) PITA000027114 (Pinus teada)

06223 (Ginkgo biloba) 08952 (Ginkgo biloba)

71

100

67

100

100

100

98

100

50

86

100

0.2

Bathy10g03950 (Bathycoccus prasinos) 192902 (Micromonas pusilla CCMP1545)

88104 (Micromonas pusilla RCC299) 37383 (Ostreococcus RCC809)

34891 (Ostreococcus lucim arinus) Phpat.014G052200 (Physcomitrella patens)

Phpat.010G040000 (Physcomitrella patens) PITA000002016 (Pinus teada) PITA000027114 (Pinus teada)

06223 (Ginkgo biloba) 08952 (Ginkgo biloba)

71

100

67

100

100

100

98

100

50

86

100

0.2

A_Group

C_Group

B_Group

A_Group

C_Group

B_Group


Figure 2-4. Syntenic blocks containing 3R-MYB genes in an alga (Ostreococcus lucimarinus) and Amborella trichopoda, identified by DAGchainer. The x-axis of each plot shows one of the three Amborella scaffolds, and the y-axis shows the segment of chr9 of the Ostreococcus lucimarinus genome that contains the single 3R-MYB in that taxon. Dots indicate homologous genes; solid dots indicate the 3R-MYB genes.


Figure 2-5. Tests for the origin of the three groups of plant 3R-MYB genes. A) Distribution of the pairwise synonymous distances (dS) for paralogous 3R-MYBs in each angiosperm species. The pairwise dS distributions for the A-A, B-B, C-C, A-B, A-C and B-C comparisons are shown as histograms, each with a fitted normal distribution. B) Normal distributions fitted to the pairwise dS values for the six comparisons.
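The normal fits described in this caption can be illustrated with a minimal sketch. The dS values below are made-up placeholders, not data from this study, and `fit_normal` is a hypothetical helper; the caption does not state which fitting procedure was used for the figure, so a simple fit from the sample mean and standard deviation is assumed here.

```python
from statistics import NormalDist


def fit_normal(ds_values):
    """Fit a normal distribution to pairwise dS values (hypothetical helper).

    Uses the sample mean and sample standard deviation of the values.
    """
    return NormalDist.from_samples(ds_values)


# Placeholder dS values for two comparison classes (NOT data from this study).
ds_by_class = {
    "A-A": [0.8, 1.0, 1.2],
    "A-B": [2.0, 2.5, 3.0],
}

fits = {name: fit_normal(values) for name, values in ds_by_class.items()}
for name, dist in fits.items():
    print(f"{name}: mean={dist.mean:.2f}, sd={dist.stdev:.2f}")
```

Comparing the fitted means across the six comparison classes is one way to read panel B: a class with a larger mean dS corresponds to an older divergence, under the usual assumption that synonymous substitutions accumulate roughly with time.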


Figure 2-6. Multiple protein alignments of motif 4 with representative species.

[Figure labels: Algae; Moss; Gymnosperm; Angiosperm (A_Group); Angiosperm (C_Group); Angiosperm (B_Group); protein schematic from N to C terminus with R1, R2, R3, Motif 1, Motif 2, Motif 3 and Motif 4; alignment groups: Gymnosperms, Angiosperms, Amborella, Eudicots, Monocots.]


Figure 2-7. Analysis of the DNA-binding domain of the plant 3R-MYB proteins. A) Alignment of the DNA-binding domains of representative plant 3R-MYB proteins. Protein groups (A-, B-, or C-) are indicated before the gene names and species are indicated in brackets. The five conserved introns in the DNA-binding domain are indicated by black arrows, black lines, and uppercase bold letters A, B, C, D and E; the other intron is indicated by a gray arrow, a gray line and a lowercase letter b. The number in parentheses after each letter indicates the intron position: “0” indicates an intron between the codons of the two indicated amino acids; “1” indicates an intron between the first and second nucleotides of the codon of the indicated amino acid; “2” indicates an intron between the second and third nucleotides of the codon of the indicated amino acid. Thick black lines at the bottom indicate the three helices in each R repeat (Ogata et al. 1992; Ogata et al. 1994) and blue asterisks indicate the conserved tryptophans. B) Distribution of the amino acid substitution rate differences between each group and the other two groups. Dashed lines indicate our threshold (2.57 standard deviations) for the identification of rate shift sites. C) The sites in each group that have an unusually low (Slow in the Group) or high (Fast in the Group) amino acid substitution rate relative to the other two groups. D) Amino acid sequence logos of the DNA-binding domain of A-, B- and C-group 3R-MYBs with the slow (green) and fast (orange) sites highlighted. Blue boxes above the sequence logos indicate helices, blue lines between them indicate turns, and blue asterisks indicate the conserved tryptophans.
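The intron position codes (0, 1, 2) used in panel A correspond to the standard notion of intron phase. As a minimal sketch, the phase can be computed from the number of coding nucleotides that precede the intron; the function names and the example counts below are illustrative assumptions, not values taken from the figure.

```python
def intron_phase(coding_nt_upstream: int) -> int:
    """Phase of an intron, given how many coding nucleotides precede it."""
    if coding_nt_upstream < 0:
        raise ValueError("coding_nt_upstream must be non-negative")
    return coding_nt_upstream % 3


def describe_intron(coding_nt_upstream: int) -> str:
    """Describe the intron position in the caption's terms (1-based codon numbers)."""
    phase = intron_phase(coding_nt_upstream)
    codon = coding_nt_upstream // 3 + 1  # codon that follows or contains the intron
    if phase == 0:
        return f"phase 0: between codons {codon - 1} and {codon}"
    if phase == 1:
        return f"phase 1: between nucleotides 1 and 2 of codon {codon}"
    return f"phase 2: between nucleotides 2 and 3 of codon {codon}"


# Hypothetical counts of coding nucleotides upstream of three introns.
for n in (30, 31, 32):
    print(n, describe_intron(n))
```

Two introns are usually treated as positionally conserved only when they share both the aligned amino acid position and the phase, which is why the caption reports both pieces of information for each intron.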


Figure 2-8. Intron evolution pattern of the DNA-binding-domain region of the plant 3R-MYBs. For each gene depicted, boxes indicate exons and lines indicate introns; UTR regions are not included in the gene structure. The hashed lines indicate possible introns. Gray, pink and green thick bars indicate the five conserved introns, with the name of each intron at the top. The four conserved motifs are shown at their corresponding positions in the gene structure.

[Figure labels: Algae; Moss; Gymnosperm; Angiosperm (A_Group); Angiosperm (C_Group); Angiosperm (B_Group); Start codon; Stop codon; labels 1 to 6; introns A, B, C, D and E; R1, R2, R3; Motif 1, Motif 2, Motif 3 and Motif 4.]


Figure 2-9. AS of 3R-MYB genes in Amborella, Arabidopsis, grape, poplar, rice and sorghum. The group (A-, B-, or C-) membership for each gene is indicated in brackets. Boxes indicate exons (blue for constitutively spliced; orange for alternatively spliced) and lines indicate introns. Gene structures are drawn to scale and connecting bars indicate homologous exons (green for the six exons encoding the DNA-binding domain; pink for the four exons specific to the A-group; gray for all others). The two black flags in each gene indicate the start and stop codons in the primary transcript and red hexagons indicate stop codons generated by AS. The green circles at the ends of the exons indicate alternative polyadenylation events.


Figure 2-10. Predicted MSA element distribution within the regions 2 kb upstream of the plant 3R-MYB genes.


Figure 2-11. Violin plots of the number of MSA core sequences in the upstream regions for each group of genes. The median number of MSA core sequences in each group is shown by the white dot (the median is on the right side). The width of each violin indicates the kernel density estimate of the data. Letters a, b and c above each violin plot indicate significant differences by ANOVA and Tukey’s HSD test at the 0.05 significance level.


Figure 2-12. Expression profiles of the Arabidopsis 3R-MYB genes under abiotic stresses. The expression levels of three Arabidopsis genes, At4g32730 (A-group), At5g11510 (A-group) and At3g09370 (C-group), are shown in root and shoot under heat (38 °C), cold (4 °C), salt (150 mM NaCl), and drought (dry air stream). In the heat stress treatment, the seedlings were returned to room temperature after a 3-hour treatment (indicated by the red arrow). For each gene, the expression level in root at the 0 time point was normalized to 1, and the expression levels of that gene under other conditions were normalized accordingly. Error bars indicate standard error. Asterisks indicate the significance level from a one-way ANOVA test (*: 0.05; **: 0.01; ***: 0.001).
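The normalization described in this caption (root at the 0 time point set to 1, all other values scaled accordingly) can be sketched as follows. The raw values and the helper name `normalize_to_reference` are hypothetical and serve only to show the arithmetic; they are not measurements from this study.

```python
def normalize_to_reference(levels, reference_key):
    """Scale all expression levels so that the reference condition equals 1."""
    reference = levels[reference_key]
    if reference == 0:
        raise ValueError("reference expression level must be non-zero")
    return {key: value / reference for key, value in levels.items()}


# Hypothetical raw expression values for one gene, keyed by (tissue, hours).
raw = {
    ("root", 0): 4.0,
    ("root", 3): 6.0,
    ("shoot", 0): 2.0,
    ("shoot", 3): 8.0,
}

relative = normalize_to_reference(raw, ("root", 0))
for (tissue, hours), value in relative.items():
    print(f"{tissue} {hours} h: {value:.2f}")
```

Because every gene is scaled to its own root baseline, the plotted values are comparable across treatments and tissues within a gene, but not directly between genes.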

[Plot residue: panels for AT4G32730 (Group A), AT5G11510 (Group A) and AT3G09370 (Group C) under Heat, Cold, Salt and Drought; x-axis: Time (0, 0.25, 0.5, 1, 3, 4, 6, 12 and 24 h for heat; 0, 0.5, 1, 3, 6, 12 and 24 h for the other treatments); y-axis: Relative Expression (0.0 to 2.0); series: Root and Shoot.]


Figure 2-13. Expression profiles of the 3R-MYB genes from nine angiosperm species under abiotic stresses. Labels in the upper left corner of each bar plot indicate the microarray project accession number in PLEXdb (Dash et al. 2012). Detailed descriptions of each experiment are available in PLEXdb (http://www.plexdb.org/index.php) under the corresponding microarray project accession number. Error bars indicate standard error. Asterisks indicate the significance level from a two-sample t-test (*: 0.05; **: 0.01; ***: 0.001). Letters a, b and c above each bar plot indicate significant differences by ANOVA and Tukey’s HSD test at the 0.05 significance level.


Figure 2-14. Model of plant 3R-MYB evolution.


Table 2-1. Data resource summary of the sixty-five plant species used in this study.

Group Species Version Note*

Algae Bathycoccus prasinos 7/15/11 [1]

Algae Ostreococcus lucimarinus v2.0 [1,2,3,4]

Algae Ostreococcus RCC809 v2 [1]

Algae Micromonas pusilla CCMP1545 v3.0 [1,2,3,4]

Algae Micromonas pusilla RCC299 v3.0 [1,2,3,4]

Moss Physcomitrella patens v3.0 [1,2,3,4]

Gymnospermae Ginkgo biloba [1]

Gymnospermae Pinus taeda v1.01/v2 [1,3]

Angiospermae Amborella trichopoda v1.0 [1,2,3,4]

Monocot Spirodela polyrhiza v2 [1,2,3,4]

Monocot Phalaenopsis equestris v5.0 [1]

Monocot Elaeis guineensis v2 [1]

Monocot Phoenix dactylifera v3 [1]

Monocot Musa acuminata v1 [1]

Monocot Musa balbisiana v1 [1]

Monocot Brachypodium distachyon v2.1 [1,2,3,4]

Monocot Hordeum vulgare v1.25 [1]

Monocot Oryza sativa v7.0 [1,2,3,4]

Monocot Panicum virgatum v1.1 [1,2,3,4]

Monocot Setaria italica v2.1 [1,2,3,4]

Monocot Sorghum bicolor v2.1 [1,2,3,4]

Monocot Triticum aestivum v2.2 [1,2,3,4]

Monocot Triticum urartu 1.25 [1]

Monocot Zea mays 6a [1,2,3,4]

Monocot Panicum hallii v0.5 [1,2,3,4]

Basal eudicot Aquilegia coerulea v1.1 [1,2,3,4,5]

Basal eudicot Nelumbo nucifera v1 [1,5]

Eudicot Beta vulgaris v1.1 [1,5]

Eudicot_Rosid Vitis vinifera Genoscope.12X [1,2,3,4,5]

Eudicot_Rosid Eucalyptus grandis v2.0 [1,2,3,4,5]

Eudicot_Rosid Fragaria vesca v1.1 [1,2,3,4,5]

Eudicot_Rosid Malus domestica v1.0 [1,2,3,4,5]

Eudicot_Rosid Prunus mume v1 [1,5]

Eudicot_Rosid Prunus persica v2.1 [1,2,3,4,5]

Eudicot_Rosid Pyrus bretschneideri V121010 [1,5]

Eudicot_Rosid Cajanus cajan v5.0 [1,5]

Eudicot_Rosid Glycine max Wm82.a2.v1 [1,2,3,4,5]

Eudicot_Rosid Medicago truncatula mt4.0v1 [1,2,3,4,5]

Eudicot_Rosid Phaseolus vulgaris v1.0 [1,2,3,4,5]

Eudicot_Rosid Cicer arietinum v1 [1,5]

Eudicot_Rosid Lotus japonicus v2.5 [1,5]


Eudicot_Rosid Citrullus lanatus v1 [1,5]

Eudicot_Rosid Cucumis sativus v1.0 [1,2,3,4,5]

Eudicot_Rosid Jatropha curcas v4.5 [1,5]

Eudicot_Rosid Manihot esculenta v4.1 [1,2,3,4,5]

Eudicot_Rosid Ricinus communis v0.1 [1,2,3,4,5]

Eudicot_Rosid Linum usitatissimum BGIv1.0 [1,2,3,4,5]

Eudicot_Rosid Populus trichocarpa v3.0 [1,2,3,4,5]

Eudicot_Rosid Salix purpurea v1.0 [1,2,3,4,5]

Eudicot_Rosid Gossypium raimondii v2.1 [1,2,3,4,5]

Eudicot_Rosid Theobroma cacao v1.1 [1,2,3,4,5]

Eudicot_Rosid Citrus sinensis v1.1 [1,2,3,4,5]

Eudicot_Rosid Carica papaya ASGPBv0.4 [1,2,3,4,5]

Eudicot_Rosid Arabidopsis thaliana TAIR10 [1,2,3,4,5]

Eudicot_Rosid Brassica rapa v1.3 [1,2,3,4,5]

Eudicot_Rosid Boechera stricta v1.2 [1,2,3,4,5]

Eudicot_Rosid Capsella grandiflora v1.1 [1,2,3,4,5]

Eudicot_Rosid Eutrema salsugineum v1.0 [1,2,3,4,5]

Eudicot_Asterid Capsicum annuum v2.0 [1,5]

Eudicot_Asterid Nicotiana benthamiana v0.4.4 [1,5]

Eudicot_Asterid Solanum lycopersicum iTAGv2.3 [1,2,3,4,5]

Eudicot_Asterid Solanum tuberosum v3.4 [1,2,3,4,5]

Eudicot_Asterid Mimulus guttatus v2.0 [1,2,3,4,5]

Eudicot_Asterid Utricularia gibba v4.1 [1,5]

Eudicot_Asterid Actinidia chinensis v1 [1,5]

*NOTE: 1. Phylogeny analysis; 2. Synonymous divergence analysis; 3. Gene structure analysis; 4. Promoter analysis; 5. Synonymous divergence analysis


Table 2-2. Positive selection test results.

Test H0 (lnL) H1 (lnL)

Positive selection in A group -32669.96291 -32669.96293

Positive selection in B group -32669.96299 -32669.96292

Positive selection in C group -32669.81231 -32669.81241

Positive selection in monocots within A group -10477.43538 -10477.35695

Positive selection in monocots within C group -10656.13493 -10656.13493
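The log-likelihood values in Table 2-2 can be compared with a likelihood ratio test. The sketch below is illustrative only: it assumes a chi-square null distribution with one degree of freedom, which the table does not state, and it treats tiny negative differences as zero on the assumption that they reflect numerical noise. The function name is hypothetical.

```python
import math


def likelihood_ratio_test(lnl_null: float, lnl_alt: float):
    """Return (2 * delta lnL, p-value), assuming a chi-square null with 1 df."""
    # Assumption: small negative differences are numerical noise, so clamp at zero.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# (H0 lnL, H1 lnL) pairs copied from Table 2-2.
tests = {
    "A group": (-32669.96291, -32669.96293),
    "B group": (-32669.96299, -32669.96292),
    "C group": (-32669.81231, -32669.81241),
    "Monocots within A group": (-10477.43538, -10477.35695),
    "Monocots within C group": (-10656.13493, -10656.13493),
}

results = {name: likelihood_ratio_test(h0, h1) for name, (h0, h1) in tests.items()}
for name, (statistic, p_value) in results.items():
    print(f"{name}: 2*dlnL={statistic:.5f}, p={p_value:.3f}")
```

Under these assumptions, none of the test statistics approaches the 1-df critical value of 3.84 at the 0.05 level, so the alternative (positive selection) model does not fit significantly better than the null model in any of the five tests.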


CHAPTER 3 JASMONATE INDUCED ALTERNATIVE SPLICING RESPONSES IN ARABIDOPSIS

Background

Plants have developed inducible defense mechanisms and complex signaling

networks for efficient, precise and fast response to ever-changing environmental stimuli

and unpredictable biotic attacks. JA is a phytohormone induced by biotic stresses

such as herbivore and pathogen attack, as well as by abiotic stresses such as cold,

salt, UV light and ozone (Browse and Howe 2008; Howe and Jander 2008; Katsir et al.

2008; Hu et al. 2013; Valenzuela et al. 2016). JA regulates a variety of plant events

including photosynthesis (Attaran et al. 2014), root and shoot morphogenesis (Gasperini

et al. 2015; Zheng et al. 2017), flowering time (Zhai et al. 2015; Thatcher et al. 2016),

stamen development (Song et al. 2011; Qi et al. 2015a), seed germination (Linkies and

Leubner-Metzger 2011), and senescence (Miao and Zentgraf 2007; Jiang et al. 2014; Qi

et al. 2015b; Yu et al. 2016). In general, JA inhibits plant growth, promotes the

transition from growth to defense, and triggers early reproduction.

JA-Ile, JAZ proteins and the F-box protein COI1 form the key regulatory module

of JA signal transduction and amplification (Thines et al. 2007; Chini et al. 2007). Upon

JA-Ile accumulation, the three-component module forms and the JAZs are

polyubiquitinated and degraded through the 26S proteasome, which releases

suppression of the JAZ-interacting transcription factors that regulate early-jasmonate-

responsive genes (Thines et al. 2007; Chini et al. 2007; Chini et al. 2016). Thirteen JAZ

genes have been identified in the Arabidopsis genome (Bai et al. 2011; Thireault et al.

2015); there is both functional redundancy and specificity (Chini et al. 2016). Members

of the JAZ family share a conserved N-terminal domain (Moreno et al. 2013), a TIFY

(previously known as ZIM) domain (except JAZ13) and a C-terminal Jas domain (Thines

et al. 2007; Chini et al. 2007; Yan et al. 2007). The TIFY domain is responsible for

homo- and hetero-dimerization (Chini et al. 2009) and interaction with NINJA, which

further recruits the co-repressor TPL through its conserved EAR domain (Pauwels et al.

2010). However, JAZ5, JAZ6, JAZ7, JAZ8 and JAZ13 can directly recruit TPL through an EAR motif

in their N-terminal domain (Causier et al. 2012; Shyu et al. 2012; Thireault et al. 2015;

Thatcher et al. 2016). The Jas domain is the major domain responsible for interacting

with transcription factors (Chini et al. 2016). It is also responsible for binding with COI1,

except in JAZ7 and JAZ8 where the Jas domain has diverged (Shyu et al. 2012; Chini

et al. 2016). As a result, JAZ8, and probably JAZ7, are not polyubiquitinated and

degraded in the presence of jasmonate and instead serve as permanent repressors.

Interestingly, almost all JAZs (except JAZ1, JAZ7 and JAZ8) share a homologous

intron (Jas intron) that divides the Jas domain into a 20-amino-acid N-terminal motif and a

7-amino-acid C-terminal motif (X5PY) (Chung et al. 2010). Frequently observed AS around this

conserved intron usually leads to a truncated protein lacking the X5PY motif or lacking

the whole Jas domain (Yan et al. 2007; Chung and Howe 2009; Chung et al. 2010;

Moreno et al. 2013). These truncated proteins retain their ability to interact with

transcription factors through the 20 amino acids remaining in the N-terminal motif of the Jas

domain or a similar sequence in their N-terminal domain, but have reduced (or absent)

ability to be recognized by COI1 (Chung and Howe 2009; Chung et al. 2010; Moreno

et al. 2013; Zhang et al. 2017a). As a result, these AS isoforms avoid degradation and

function as dominant repressors in the presence of JA and may serve to limit up-

regulation of the jasmonate responsive genes. The observed AS around the Jas intron

is conserved among monocots and eudicots, suggesting a conserved function and

underscoring its importance (Chung et al. 2010). It has been predicted that ~60% of the

protein-coding genes of the Arabidopsis genome are alternatively spliced (Zhang et al.

2017b). However, other than for the JAZ repressors, little is known about AS regulation in the

jasmonate signaling pathway. In this project, I explore the role of AS in the jasmonate

signaling pathway.

AS is a post-transcriptional regulatory mechanism that generates multiple

transcripts/isoforms from a single gene through the selection of different splice sites, governed by the

interaction between cis-elements (intronic/exonic splicing enhancer/silencer) and trans-

factors (e.g. SR splicing activators and hnRNP splicing repressors) (Kornblihtt et al.

2013). AS regulation refers to selection of one splice junction over another leading to a

change in the proportion of different isoforms for a given gene. Moreover, AS interacts

with other regulatory mechanisms such as NMD (Kervestin and Jacobson 2012; Kalyna

et al. 2012) and miRNA regulation (Reddy et al. 2013). AS frequently (32% of genes

which undergo AS, Kalyna et al. 2012) causes a PTC that results in the transcript being

recognized by the RNA surveillance system and subsequently degraded through the

NMD pathway (Kervestin and Jacobson 2012). Thus, NMD coupled with AS functions

as an important posttranscriptional mechanism to regulate protein levels (Kervestin and

Jacobson 2012). Inclusion or exclusion of the alternative region may introduce new

miRNA targeting sites that the other isoforms do not have (Reddy et al. 2013), thus

genes producing AS isoforms may be subject to both AS and miRNA regulation. In

plants, experimentally validated miRNA coupled AS regulation has been reported in

Arabidopsis (Wu and Poethig 2006; Yang et al. 2012) and rice (Campo et al. 2013).

Moreover, a simulation study suggested AS occurs more frequently in miRNA binding

sites than in other regions, raising the possibility that this combined regulation could be

an important and prevalent mechanism (Yang et al. 2012). Finally, if the AS isoform is

able to make a protein product, it may generate the same protein or a different protein

depending on whether AS happens in the UTRs or CDS. For cases which generate a

different protein product, these proteins could be non-functional, partially-functional,

redundantly-functional, or neo-functional compared with the primary protein (Reddy et

al. 2013; Staiger and Brown 2013). Isoforms with partial- or neo-functions are especially

interesting as they may regulate an alternate gene function.

In this project, I integrated transcriptomics and proteomics analyses of

Arabidopsis with specific interest in identifying three aspects of AS-related regulation

potentially impacting the jasmonate signaling pathway: 1) differential AS in response to

MeJA treatment; 2) miRNA regulation mediated by differential use of AS isoforms; 3) AS

splice variants with novel functions. In each case, I screened a pool of candidates and

further explored a few interesting examples in detail. In addition, AS events identified

from the RNA-seq data were further validated with the proteomics data.

Materials and Methods

Plant Growth, MeJA Treatment and Harvesting, Transcriptome Library Preparation and Sequencing

This work describes a reanalysis of RNA-Seq reads presented by Yang et al.

2014. Three independent replicate shoot and three independent replicate root samples

were obtained from the Arabidopsis thaliana mutants jaz2 (SALK079895C) and jaz7

(SALK040835C) and from WT (Col-0), with (10 μM) or without (0 μM) MeJA treatment, and

used for transcriptome and proteome sequencing. A total of 36 samples for RNA-Seq as

well as proteomics were obtained. Transcriptome data is available in NCBI Sequence

Read Archive under SRP026541 (Yang et al. 2014).

Transcriptome Assembly and Differential AS Detection

Raw reads from the HiSeq 2000 system were filtered with FASTX-Toolkit

(http://hannonlab.cshl.edu/fastx_toolkit/index.html) and the screened reads were

mapped to the TAIR10 genome (http://www.arabidopsis.org) with GSNAP v2013-07-20

(Wu and Nacu 2010) with a maximum intron size of 8000, a minimum intron size of 20,

and a maximum of 5% mismatches allowed. The uniquely mapped reads from each library were

assembled using Cufflinks v2.2.1 (Trapnell et al. 2013) by the reference-guided method with

the TAIR10 annotation as reference, a maximum intron size of 8000, a minimum intron size of 20,

and a minimum isoform abundance of 5%. The 36 assemblies were merged into a single

transcript reference set by Cuffmerge v2.2.1 (Trapnell et al. 2013) with the minimum isoform

abundance set to 5%. The merged transcript reference set was further filtered with two

criteria: 1) each junction of the transcript is supported by a minimum of 3 mapped reads; 2) for

the alternative region of an IntronR event, the minimum average mapping depth is 4

at 100% coverage, 5 at 90% coverage, and 6 at 80% coverage. The AS events were

called by AStalavista v3.2 (Foissac and Sammeth 2007). Percentages of novel junction

reads (“complete novel” indicates both the 5’ and 3’ splicing sites are novel, “partial

novel” indicates only one splice site is novel) were calculated by RSeQC v2.6.2 (Wang

et al. 2012). Differential AS and expression were identified by Cuffdiff v2.2.1 (Trapnell et

al. 2013) with default parameters using the screened transcript set as reference.
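
As an illustration only, the two filtering criteria applied to the merged transcript set could be expressed as the Python sketch below. The record layout (`AssembledTranscript`) and all field and function names are hypothetical and are not taken from the original pipeline. The sketch also assumes a tiered reading of the IntronR rule: introns with 100% coverage need a mean depth of at least 4, those with 90-100% coverage need at least 5, those with 80-90% need at least 6, and introns with less than 80% coverage are rejected.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds taken from the text above.
MIN_JUNCTION_READS = 3
# (minimum fraction of the retained intron covered, minimum mean depth).
# Treating these as ordered tiers is an assumption of this sketch.
INTRON_DEPTH_TIERS = [(1.00, 4.0), (0.90, 5.0), (0.80, 6.0)]


@dataclass
class AssembledTranscript:
    """Hypothetical record; field names are illustrative only."""
    transcript_id: str
    junction_read_counts: List[int]                     # reads supporting each junction
    retained_intron_coverage: Optional[float] = None    # fraction 0-1; None if no IntronR
    retained_intron_mean_depth: Optional[float] = None  # mean read depth over the intron


def retained_intron_supported(coverage: float, mean_depth: float) -> bool:
    """Apply the depth threshold of the first coverage tier that is met."""
    for min_coverage, min_depth in INTRON_DEPTH_TIERS:
        if coverage >= min_coverage:
            return mean_depth >= min_depth
    return False  # less than 80% of the intron is covered


def passes_filters(transcript: AssembledTranscript) -> bool:
    """Return True if the transcript satisfies both screening criteria."""
    # Criterion 1: every junction has at least three supporting reads.
    if any(n < MIN_JUNCTION_READS for n in transcript.junction_read_counts):
        return False
    # Criterion 2 applies only to transcripts carrying a retained intron.
    if transcript.retained_intron_coverage is None:
        return True
    return retained_intron_supported(
        transcript.retained_intron_coverage,
        transcript.retained_intron_mean_depth or 0.0,
    )
```

Under this reading, a retained intron that is 95% covered at a mean depth of 4.5 would be rejected, whereas the same depth with complete coverage would be accepted.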

Open Reading Frame Prediction

I integrated BLAST and Pfam searches into ORF and protein prediction for the

screened transcript set using TransDecoder v3.0.0 (https://transdecoder.github.io/).

Protein Interaction Network Analysis

Genes with significantly differential expression and/or AS in response to MeJA

treatment were used for protein interaction network construction with STRING v10.0

(Szklarczyk et al. 2015). Genes or their homologs with experimental evidence of protein

interaction were connected with lines. The protein interaction network was displayed with

Cytoscape v3.4.0 (Shannon et al. 2003), with a gene expression heatmap added.

miRNA Target Prediction

We used all 427 Arabidopsis miRNAs from the miRNA database (miRBase

Release 21, Kozomara and Griffiths-Jones 2014) as queries to search the

reference transcript set for target sites with psRNATarget (Dai and Zhao 2011), using the default

parameters of scoring schema V2.

Protein Extraction, Digestion, iTRAQ Labeling and LC-MS/MS

Proteins were quantified as previously described (Koh et al. 2012), and dissolved

in denaturant buffer (0.1% SDS (w/v)) and dissolution buffer (0.5 M triethylammonium

bicarbonate, pH 8.5) in the iTRAQ 8-plex kit (AB Sciex Inc., Foster City, CA, USA). For

each sample, a total of 100 μg of protein was reduced, alkylated, trypsin-digested, and

labeled according to the manufacturer’s instructions (AB Sciex Inc.). The control

samples from wild type, jaz2, and jaz7 were labeled with iTRAQ tags 113, 114, and 115,

respectively, and the corresponding treated samples were labeled with iTRAQ tags 116,

117, and 118, respectively. In addition, all six samples were mixed and labeled with

iTRAQ tag 119 as an internal control. Labeled peptides were desalted with C18-solid

phase extraction and dissolved in strong cation exchange (SCX) solvent A (25% (v/v)

acetonitrile, 10 mM ammonium formate, and 0.1% (v/v) formic acid, pH 2.8). The

peptides were fractionated using an Agilent HPLC 1260 with a polysulfoethyl A column

(2.1 × 100 mm, 5 µm, 300 Å; PolyLC, Columbia, MD, USA). Peptides were eluted with a

linear gradient of 0–20% solvent B (25% (v/v) acetonitrile and 500 mM ammonium

formate, pH 6.8) over 50 min followed by ramping up to 100% solvent B in 5 min. The

absorbance at 280 nm was monitored and a total of 12 fractions were collected. The

fractions were lyophilized and resuspended in LC solvent A (0.1% formic acid in 97%

water (v/v), 3% acetonitrile (v/v)). A hybrid quadrupole Orbitrap (Q Exactive Plus) MS

system (Thermo Fisher Scientific, Bremen, Germany) was used with high energy

collision dissociation (HCD) in each MS and MS/MS cycle. The MS system was

interfaced with an automated Easy-nLC 1000 system (Thermo Fisher Scientific,

Bremen, Germany). Each sample fraction was loaded onto an Acclaim Pepmap 100

pre-column (20 mm × 75 μm; 3 μm-C18) and separated on a PepMap RSLC analytical

column (250 mm × 75 μm; 2 μm-C18) at a flow rate of 350 nl/min during a linear

gradient from solvent A (0.1% formic acid (v/v)) to 30% solvent B (0.1% formic acid (v/v)

and 99.9% acetonitrile (v/v)) for 95 min, to 98% solvent B for 15 min, and held at 98%

solvent B for an additional 30 min. Full MS scans were acquired in the Orbitrap mass

analyzer over m/z 400–2000 range with resolution 70,000 at 200 m/z. The top ten most

intense peaks with charge state ≥ 2 were isolated (with 2 m/z isolation window) and

fragmented in the high energy collision cell using a normalized collision energy of 28%.

The maximum ion injection time for the survey scan and the MS/MS scans was 250

ms, and the ion target values were set to 3e6 and 1e6, respectively. The selected

sequenced ions were dynamically excluded for 60 sec.

Proteomics Data Analysis

The raw MS/MS data files were processed by a thorough database searching

approach considering biological modification and amino acid substitution against

a customized Arabidopsis database using ProteinPilot v4.5 with the Fraglet and Taglet

searches under the Paragon™ algorithm (Shilov et al. 2007). The following parameters

were used for all searches: fixed modification of methylmethane

thiosulfonate-labeled cysteine, fixed iTRAQ modification of amine groups at the N-terminus

and lysine, and variable iTRAQ modification of tyrosine. The false discovery rate

at the peptide level was estimated with the integrated PSPEP tool in the ProteinPilot

Software to be 1.0%. The identified peptide reads were screened for confidence no less

than 95%. The screened peptide reads were mapped to TAIR10 primary protein

database and peptides which failed to map to the database were candidates for

supporting AS isoforms. These candidates were manually validated and only reads

spanning the AS junction were regarded as evidence for that AS isoform.

Proteomics data generation was performed by Mi-Jeong Yoo and Jin Koh.

Results

Transcriptome Sequencing and Genome-Guided Assembly

Thirty-six Arabidopsis RNA-Seq libraries, originally constructed to examine the response to

MeJA in roots and shoots of WT and mutant (jaz2 and jaz7) plants, were

reanalyzed to identify and characterize AS associated with response to MeJA. The

RNA-Seq data was sampled from shoot and root tissues of the Arabidopsis WT (Col-0),

knock-down mutant jaz2 (Figure 3-1; Yan et al. 2014), and overexpression mutant jaz7

(Figure 3-1; Yan et al. 2014) under 0 or 10 μM MeJA treatment, with three biological

replicates each. A total of 578 million reads were available and 78% of these (452M)

were uniquely mapped to the Arabidopsis TAIR10 genome assembly (Table 3-1). The

uniquely mapped reads from each library were assembled with Cufflinks (Trapnell et al.

2013) and these assemblies were further merged via Cuffmerge (Trapnell et al. 2013) to

generate the reference transcript assembly. Isoforms with less than three reads

supporting each of the annotated junctions or IntronR isoforms with low sequence

support for the retained intron were removed from consideration (see Materials and

Methods). A total of 20,524 transcripts from 13,647 genes were identified in our

transcript assembly, among which 15,947 transcripts were known transcripts shared

with TAIR10 annotation and the remaining 4,577 were novel transcripts (Figure 3-2A).

Among all the splice junctions identified in this project, 26% of the junctions have at

least one boundary novel to the TAIR10 annotation (Figure 3-2B). A total of 4446 genes (32.58%

of the identified 13,647 genes) have evidence of undergoing AS with the majority (98%)

of them producing two to five splice variants (Table 3-2). The most abundant AS events

are AltA (38%) and IntronR (30%) (Figure 3-2C). In plants IntronR is usually the most

common AS event (Reddy et al. 2013). The finding that AltA is more common than

IntronR in this analysis may be a result of applying more stringent screening criteria to

IntronR events (Materials and Methods).

Jasmonate-Related Protein Interaction Network

Using the screened transcript assembly as reference, significant differential gene

expression and differential expression of AS isoforms were identified with Cuffdiff

(Trapnell et al. 2013) in response to MeJA treatment (Table 3-3), between tissues

(Table 3-4), and between genotypes (Table 3-5). In each case there were always more

genes with significant expression changes than genes with significant AS changes. The

greatest number of differences was observed between tissues. Genes with significant differential

expression and/or AS in response to jasmonate treatment were used to generate the

protein interaction network with STRING (Szklarczyk et al. 2015). Edges in the network

indicate experimentally validated protein interaction of the two connected proteins or

their homologs. A subset of the network including splicing related proteins, jasmonate

key regulatory proteins (JAZs, COI1, NINJA) as well as genes interacting with them was

isolated and examined further. We arbitrarily divided the network into four modules

(Figure 3-3). One module contains splicing related genes, and this is connected to three

other modules. The first of these contains the key jasmonate regulatory factors, JAZ3,

JAZ10, TIFY7, COI1 and NINJA, as well as transcription factors such as bHLHs and

R2R3-MYBs. The communication between jasmonate signaling and splicing signaling

was shown to be mediated by bHLHs, R2R3-MYBs and three splicing-related proteins:

CBF1-interacting co-repressor (CIR), pre-mRNA-splicing factor ISY1-like protein (LSY1)

and SNW/SKI-interacting protein (SKIP). The second of these modules mainly contains

kinases and transcription factors and its interaction with the splicing-related proteins is

through kinases and SKIP. The third module is centered on a topoisomerase (TOPII)

and a ubiquitin (UBQ11) protein. Interaction between the third module and the

splicing-related proteins is through Embryo Defective 2816 (emb2816) and Always Early 4

(ALY4). As all genes in the protein interaction network are differentially regulated in

response to MeJA, these genes are presumably jasmonate-responsive genes. The

three modules interacting with the splicing-related proteins suggest three possible signal

transduction pathways between jasmonate and mRNA splicing regulation (Figure 3-3):

shared transcription factors (module 1), shared kinases (module 2) and shared ubiquitin

pathways (module 3).

Regulation of Transcription Factors (bHLHs and MYBs) and Splicing Factors (SRs and hnRNPs)

The protein interaction network indicates that bHLH and MYB gene families have

a large presence, which is suggestive of their importance in the jasmonate pathway.

Moreover, the SR and hnRNP gene families are key trans-factors involved in the

selection of splice junctions and play essential roles in AS regulation (Barbazuk et al.

2008). We further analyzed the expression patterns of these four important gene

families in response to MeJA treatment as well as between tissues and genotypes

(Figure 3-4). Genes without any significant expression change in any of the three

comparisons were not included.

I observed changed expression patterns in the mutants compared with WT for

some genes. For example, while the expression of bHLH116 is strongly downregulated

in WT root tissue it is slightly upregulated in jaz2 root tissue and higher still in jaz7 root

tissue in response to MeJA. This expression pattern change indicates bHLH116 might

be regulated by JAZ2 and JAZ7 in the root, which is also supported by the expression

change in genotype comparison (Figure 3-4C). Similarly, bHLH77 might be regulated by

JAZ2 and JAZ7 in the root tissue; bHLH153, bHLH155, bHLH137 might be regulated by

JAZ2 and JAZ7 in the shoot tissue; MYB48 and bHLH36 are likely to be regulated by JAZ7 in

both the shoot and root tissues (Figure 3-4A). The expression profiles of the MYB and

bHLH gene families reveal the direction of their response to MeJA treatment (up

or down), as well as whether they may act downstream of JAZ2 and JAZ7.

The majority of the splicing factors were downregulated in response to MeJA

treatment (Figure 3-4D). The expression changes were less dramatic in the SR and

hnRNP splicing factors compared with the bHLH and MYB transcription factors.

However, I observed several splicing factors that may be involved in jasmonate

responses: RBP45a, RS2Z33, RNPA/B_3, and RSZ21. The expression of RBP45a was

downregulated in response to MeJA in the shoot of WT. However, it shows increased

expression in the jaz7 mutant, and even higher expression in the jaz2 mutant. Another

interesting case is RS2Z33, which is dramatically downregulated in response to MeJA in

jaz7 mutant root tissue, but upregulated in the jaz2 mutant and WT in both shoot and

root tissues (Figure 3-4D). Moreover, the expression of RS2Z33 was greatly induced in

jaz2 mutant compared with WT in the shoot tissue (Figure 3-4F). The expression of

RNPA/B_3 did not change much in response to MeJA in WT shoot. However, its

expression is greatly downregulated in the shoot of jaz2 mutants and greatly

upregulated in the shoot of jaz7 mutants. These altered expression patterns suggest

possible involvement of these splicing factors in the jasmonate pathway regulated by

JAZ2 and JAZ7.

Differential Alternative Splicing in Response to MeJA Treatment

In response to MeJA treatment, the largest number (326) of significantly differential AS events

was observed in the shoot of the jaz7 mutant (Table 3-3). Across the six tissue-genotype combinations

(two tissues sampled from each of three genotypes), only a minority of the

differential AS events observed in response to MeJA in any one combination is shared with the other

combination(s) (Figure 3-5). Genes with significantly differential AS in response to MeJA in

more than one genotype of a specific tissue (shoot or root) were analyzed further

(Figure 3-6). Among these 16 genes, 7 have the AS events in the coding region, and 5

of the 7 lead to a PTC, which could target these transcripts for NMD (Kervestin and

Jacobson 2012). Based on the AS pattern and expression profiles, I observed cases

where AS alone may regulate gene expression (through NMD), such as NUDX9, and

cases where AS and transcription together regulate gene expression, such as NRT1.8

(Figure 3-7).

Two splice variants were identified for gene NUDX9, with an AltA event in the

fifth exon leading to a frame shift that results in a PTC in the AS transcript. There are 17

mapped reads supporting the AS junction (Figure 3-7B). No significant changes in the

gene expression level in response to MeJA were identified. However, I observed

significant differential AS in the shoot tissue of WT and jaz7 upon MeJA treatment. In

the jaz2 mutant, NUDX9 showed a different AS pattern, indicating that a mutation in

JAZ2 may affect the regulation of AS for this gene (Figure 3-7A). When there is no

MeJA treatment the two splice variants are present at similar levels, with only one

isoform (the primary annotated isoform) of the two generating a protein product while

the AltA AS isoform is potentially subject to NMD. However, upon MeJA treatment very

few of the transcripts are of the AltA form, providing the potential to generate twice as

much productive translation of the protein product under MeJA treatment relative to

untreated tissue with no required increase in the rate of transcription (Figure 3-7C).
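
The doubling argument for NUDX9 can be made explicit with a small worked example. The Python sketch below assumes, for illustration only, that the PTC-containing AltA transcripts are fully removed by NMD and that protein output is proportional to the abundance of the productive isoform. The isoform fractions of 0.5 (untreated) and 1.0 (MeJA-treated) are idealized values standing in for "similar levels" and "very few AltA transcripts"; they are not measured quantities.

```python
def productive_output(transcription_rate: float, productive_fraction: float) -> float:
    """Relative amount of translatable mRNA.

    transcription_rate: total transcript output of the gene (arbitrary units).
    productive_fraction: share of transcripts spliced into the productive
    isoform; the remainder is assumed to be degraded through NMD.
    """
    if not 0.0 <= productive_fraction <= 1.0:
        raise ValueError("productive_fraction must lie between 0 and 1")
    return transcription_rate * productive_fraction


# Idealized values: the same transcription rate in both conditions.
untreated = productive_output(transcription_rate=1.0, productive_fraction=0.5)
treated = productive_output(transcription_rate=1.0, productive_fraction=1.0)
fold_change = treated / untreated  # 2.0 under these assumptions
```

With the transcription rate held constant, the shift in splicing alone accounts for the two-fold difference in this simplified model.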

NRT1.8 exhibited both differential expression and AS in response to MeJA

treatment in the root of the three genotypes (Figure 3-7D). Two splice variants were

identified for NRT1.8: the primary annotated isoform and the AS isoform. The AS

isoform contains an IntronR and an alternative polyadenylation event compared with the

primary annotated isoform. The IntronR event affects the CDS where the primary

annotated isoform retained the intron and the AS isoform removed it. Ten mapped

reads support this AS junction (Figure 3-7E). The AS isoform is unproductive as it

contains a PTC. Upon MeJA treatment the expression level of NRT1.8 was up-

regulated and AS generated more of the productive isoform – the primary annotated

isoform. In the case of NRT1.8, differential transcription and AS both play a role to

potentially generate more protein product in response to increased jasmonate (Figure 3-

7F).

Alternative Splicing Variants Differentially Targeted by miRNA

I used psRNATarget (Dai and Zhao 2011) to predict miRNA targets on the

assembled transcripts. A total of 508 genes were predicted to be targeted by miRNAs,

among which 171 have evidence of AS (Figure 3-8). Sixty-four genes with evidence of

AS were targeted by miRNA in different manners between isoforms, suggesting the

potential of being regulated by a combination of AS and miRNA. Among these sixty-four

genes, twenty-one show significantly changed isoform proportions (significantly

differential AS identified by Cuffdiff) in MeJA treatment/tissue comparison/genotype

comparison (Figure 3-9). We cannot directly relate the observed changes in isoform

proportion solely to significantly differential AS in these cases as the changes in the

ratio of isoform abundance may also be a result of miRNA regulation. Most of the

changes in isoform ratio occur between tissues, with only five genes (SMZ, AAO2,

PUB4, LON2, At3g02740) exhibiting a difference in isoform ratios in relation to MeJA

treatment and three (AAO2, LON2 and CEST) exhibiting a difference in isoform ratios

between genotypes (Figure 3-9). The AS pattern and miRNA target sites of

SMZ, AAO2 and At3g02740 were further analyzed as these genes exhibit significantly

changed isoform ratios in response to MeJA and they represent cases where miRNA

target sites were located in the CDS, 3’-UTR and 5’-UTR, respectively (Figure 3-10). The

altered proportions of SMZ and AAO2 isoforms in jaz7 shoot tissue in response to

MeJA, and the altered proportion of At3g02740 isoforms in jaz2 and jaz7 root tissue in

response to MeJA, could be a result of AS regulation, miRNA regulation or a

combination of both.
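
One simple way to screen for genes whose isoforms are targeted differently by miRNAs is sketched below in Python. This is a simplified illustration rather than the procedure actually used: it compares only the sets of miRNAs predicted to target each isoform and ignores the positions of the target sites. The function name and the input layout (a mapping from isoform to gene, plus miRNA-isoform pairs such as might be parsed from target prediction output) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def differentially_targeted_genes(
    isoform_to_gene: Dict[str, str],
    predicted_targets: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return genes whose isoforms differ in their predicted miRNA sets.

    isoform_to_gene: maps every assembled isoform ID to its gene ID.
    predicted_targets: (miRNA ID, isoform ID) pairs from target prediction.
    """
    # Collect the set of miRNAs predicted to target each isoform.
    mirnas_by_isoform = defaultdict(set)
    for mirna_id, isoform_id in predicted_targets:
        mirnas_by_isoform[isoform_id].add(mirna_id)

    # Group the per-isoform miRNA sets by gene; untargeted isoforms get an empty set.
    sets_by_gene = defaultdict(list)
    for isoform_id, gene_id in isoform_to_gene.items():
        sets_by_gene[gene_id].append(frozenset(mirnas_by_isoform.get(isoform_id, ())))

    # Keep genes with several isoforms whose miRNA sets are not all identical.
    return sorted(
        gene_id
        for gene_id, mirna_sets in sets_by_gene.items()
        if len(mirna_sets) > 1 and len(set(mirna_sets)) > 1
    )
```

A gene with one targeted and one untargeted isoform would be reported, whereas a gene whose isoforms all carry the same predicted miRNA set would not.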

Alternative Splicing Variants with Novel Functions

In order to identify AS isoforms with novel functions the ORF and protein

products of the 20,524 transcripts were predicted with TransDecoder (Figure 3-2D;

https://transdecoder.github.io/). Of the 15947 known transcripts identified from both our

project and TAIR10, there is 99.6% agreement between TAIR10 annotation and

TransDecoder prediction, supporting the efficacy of TransDecoder. We used the ORF

annotation from TAIR10 for the 15947 known transcripts and the ORF prediction from

TransDecoder for the remaining 4577 novel transcripts. Transcripts with no ORF

prediction or multiple ORF predictions were excluded from further analysis, thus 3220

genes with evidence of AS (7638 transcripts) with ORF predictions from either TAIR10

or TransDecoder were used for further analysis.

Not all AS leads to multiple protein products of a gene. Indeed, a large proportion

of the AS events does not contribute to protein diversity. One cause is AS in

the UTRs, where the AS isoform generates the same protein product. Another is

AS that introduces a PTC, leading to NMD-mediated transcript degradation. In

this study, 37.6% of the AS events occurred in UTRs (Figure 3-11). Both long 3’-UTR

and introns in the UTR region are cis-elements triggering NMD (Kervestin and Jacobson

2012; Kalyna et al. 2012). Two experimentally validated criteria for NMD in Arabidopsis

were applied to identify potential cases of NMD (Kalyna et al. 2012): 1) 3’-UTR > 350

nt; 2) distance from the stop codon to the last exon junction > 55 nt (Figure 3-11). As a result, a

total of 1745 transcripts from 1106 genes (34.3% of the genes with evidence of AS) were

predicted to be targets of NMD. Eliminating genes that undergo AS but produce only a

single protein product identified 1464 genes with AS transcripts which have the potential

to generate multiple protein products.
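
The two NMD criteria can be written as a short check in transcript coordinates. The Python sketch below is illustrative only: the function name and input layout are hypothetical, it assumes that meeting either criterion is sufficient to flag a transcript, and it interprets the second criterion as a stop codon lying more than 55 nt upstream of the last exon-exon junction.

```python
from typing import List


def is_predicted_nmd_target(
    exon_lengths: List[int],
    stop_codon_end: int,
    max_utr3_length: int = 350,
    max_stop_to_junction: int = 55,
) -> bool:
    """Flag a transcript as a potential NMD target.

    exon_lengths: exon lengths in 5'->3' transcript order (nt).
    stop_codon_end: number of transcript bases up to and including the
    last base of the stop codon.
    """
    transcript_length = sum(exon_lengths)
    utr3_length = transcript_length - stop_codon_end

    # Criterion 1: long 3'-UTR.
    long_utr3 = utr3_length > max_utr3_length

    # Criterion 2: stop codon far upstream of the last exon-exon junction.
    downstream_junction = False
    if len(exon_lengths) > 1:
        last_junction = transcript_length - exon_lengths[-1]
        downstream_junction = (last_junction - stop_codon_end) > max_stop_to_junction

    # Treating the criteria as alternatives is an assumption of this sketch.
    return long_utr3 or downstream_junction
```

For example, a two-exon transcript with exons of 500 and 200 nt and a stop codon ending at position 400 has a 300-nt 3'-UTR but a junction 100 nt downstream of the stop codon, so it would be flagged by the second criterion.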

Among these 1464 genes, 171 were identified that were either splicing-related

genes, jasmonate-related genes, or transcription factors, and also show significant

differential expression and/or AS in response to jasmonate treatment and/or between

tissues and/or between genotypes. Predicted proteins from the 372 transcripts of this

171-gene subset were analyzed with the NCBI Conserved Domain Database (CDD,

Marchler-Bauer et al. 2015) and Simple Modular Architecture Research Tool (SMART,

Letunic et al. 2015) to determine their domain structure. We identified potential novel AS

isoforms by two criteria: 1) the AS protein has conserved domain(s), which suggests

functional importance, and 2) the domain arrangement/structure of the AS variant is

different from the primary protein, which indicates functional divergence. A total of thirty

genes were identified that satisfy both criteria (Figure 3-12). We observed conservation

between members of the same gene family for domain pattern changes. For example,

the MADS-box genes FYF and FLM both have AS isoforms lacking the MADS domain;

R2R3-MYB genes MYB59, MYB48, MYB28, MYB15, MYB47 all have AS isoforms

lacking the R2 domain; 3R-MYB genes MYB3R1 and MYB3R4 both have AS isoforms

lacking the C-terminal repression motif (Figure 3-12); suggesting conservation of AS

regulation of gene families.
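
The two selection criteria amount to a comparison of domain architectures, which can be sketched as follows. This Python fragment is a minimal illustration, not the procedure used in the study: it assumes each protein has already been reduced to an ordered list of domain names (for example from CDD or SMART output), and the function name and domain labels are hypothetical.

```python
from typing import Sequence


def is_candidate_novel_isoform(
    primary_domains: Sequence[str],
    as_domains: Sequence[str],
) -> bool:
    """Apply the two criteria to ordered domain lists (N- to C-terminus).

    Criterion 1: the AS protein retains at least one conserved domain.
    Criterion 2: its domain arrangement differs from the primary protein.
    """
    has_conserved_domain = len(as_domains) > 0
    differs_from_primary = list(as_domains) != list(primary_domains)
    return has_conserved_domain and differs_from_primary


# Illustrative labels for an R2R3-MYB whose AS isoform lacks the R2 domain.
r2r3_primary = ["R2", "R3"]
r2r3_variant = ["R3"]
flagged = is_candidate_novel_isoform(r2r3_primary, r2r3_variant)  # True
```

An AS isoform with no recognizable domain, or with the same arrangement as the primary protein, would not be selected by this check.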

Alternative Splicing Variant of bHLH160 with Potential Novel Function

Among the thirty genes which were predicted to produce AS isoform(s) with

novel functions, I identified bHLH160 as an interesting case. Like all bHLH proteins,

bHLH160 contains a basic-Helix-Loop-Helix domain, where the basic region is

responsible for sequence specific recognition and interaction with DNA, and the HLH

region is responsible for dimerization (Figure 3-13A, Carretero-Paulet et al. 2010).

When the bHLH dimer stably binds with the DNA recognition sequence it serves as a

transcriptional activator/repressor for gene expression. Three transcript isoforms of

bHLH160 were assembled with two of them (TCONS_00034502 and

TCONS_00034498) predicted to generate the primary protein product and the third

isoform (TCONS_00034499) predicted to produce a product lacking part of the N-

terminal region (Figure 3-13A). There are 45 mapped reads supporting the AS junction

of the third isoform (Figure 3-14). Interestingly, the AS isoform lacking the N-terminal

region has lost the 13bp basic region and its upstream sequences leading to a predicted

protein product with only the HLH region (referred to here as bHLH160b-) (Figure 3-13).

Based on the function of the basic and HLH regions, lacking the basic region would lead

to a protein that is able to dimerize but unable to bind DNA. In addition, by dimerizing

with normal bHLH proteins, bHLH160b- would be expected to decrease the active bHLH

dimers (Figure 3-13B), thus bHLH160b- functions in a manner opposite to that of the

primary isoform regardless of whether the protein product produced by the primary

isoform acts as an activator or a repressor. We observed significantly upregulated gene

expression of bHLH160 in response to MeJA treatment in WT and jaz2 mutant roots

(Figure 3-13C) suggesting it might be a jasmonate-responsive gene.

To investigate whether the AS regulatory pattern in bHLH160 is conserved

among other bHLH genes, all other bHLH genes expressed in these data with evidence

of AS were examined. Phylogeny analysis of the bHLH gene family from multiple plant

species indicates bHLH160 is a newly evolved gene in Brassicaceae (Carretero-Paulet

et al. 2010). We further checked the bHLH160 orthologs in other Brassicaceae species,

and three genes with a similar AS pattern based on the gene annotation of Camelina

sativa were identified (Figure 3-13D). Protein sequence alignments of the primary and

AS isoforms of the Arabidopsis bHLH160 and the three Camelina genes

(LOC104712692, LOC104701783, LOC104750971) suggest an alternative start codon

at the first amino acid of the Helix1 region (Figure 3-13E). A similar AS pattern observed

in multiple species suggests that AS generation of both an activator and a repressor

might be a conserved and important regulatory mechanism in bHLH160 and its

orthologous genes.

Proteomics Validation for Alternative Splicing

An attempt was made to validate AS events identified at the transcript level in

this transcriptome analysis with proteomics data. I applied three criteria to select peptide

sequences supporting identified AS events: 1) The peptide sequence should have a confidence level of at

least 95%; 2) The peptide sequence can only be mapped to a single

gene; 3) The peptide sequence is mapped to the AS junction of the AS isoform. With

the above criteria, AS events of nine genes were identified to have evidence supporting

the protein level expression (Figure 3-15). Of the nine AS events, two (PAPP5,

AKINBETA1) are ExonS; two (RCA, GRXC2) are IntronR; two (SYP43, GSTZ1) are

AltA; two (MORF1, PYR6) are alternative polyadenylation; and one (RPAC14) is alternative

promoter (Figure 3-15). Among them, GSTZ1 has support for both the AS junction and

the primary junction.
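The three selection criteria can be read as a simple filter over peptide identifications. The following Python sketch is illustrative only and is not the pipeline used for this analysis; the record fields (`confidence`, `mapped_genes`, `spans_as_junction`) and the example peptides are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideHit:
    sequence: str
    confidence: float         # identification confidence in percent (hypothetical field)
    mapped_genes: frozenset   # genes to which the peptide sequence maps
    spans_as_junction: bool   # True if the peptide covers the AS-specific junction


def supports_as_event(hit, min_confidence=95.0):
    """Return True if a peptide passes all three criteria described in the text."""
    return (
        hit.confidence >= min_confidence   # 1) at least 95% confidence
        and len(hit.mapped_genes) == 1     # 2) maps to a single gene
        and hit.spans_as_junction          # 3) maps to the AS junction
    )


# Toy identifications; none of these are real peptides.
hits = [
    PeptideHit("PEPTIDEA", 99.0, frozenset({"geneA"}), True),
    PeptideHit("PEPTIDEB", 90.0, frozenset({"geneA"}), True),
    PeptideHit("PEPTIDEC", 99.0, frozenset({"geneA", "geneB"}), True),
    PeptideHit("PEPTIDED", 97.0, frozenset({"geneC"}), False),
]
supported = [h.sequence for h in hits if supports_as_event(h)]
print(supported)
```

Only the first toy peptide passes; each of the others fails exactly one criterion, which makes the effect of every filter visible.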

Discussion

Regulatory Functions of JAZ2 and JAZ7

The isoleucine-conjugated JA, JA-Ile, is the single known bioactive molecule of

the JA hormone (Fonseca et al. 2009). One puzzle of the JA signaling pathway is how a

single molecule could regulate so many different biological processes with

uncompromised specificity (Chini et al. 2016). In the three-component key regulatory

module “COI1 – JA-Ile – JAZ”, JAZs are responsible for shunting jasmonate signaling

for various plant-specific responses due to functional redundancy and specificity of their

members (Chini et al. 2016).

JAZ2 and JAZ7 belong to clade I and IV of the JAZ gene family respectively (Bai

et al. 2011). In agreement with their sequence variation, they have several functional

differences. First, JAZ7 has a diverged Jas domain that cannot be recognized by the

F-box protein COI1 (Shyu et al. 2012; Chini et al. 2016). Thus, JAZ7 may serve as a

permanent repressor, different from most of the other JAZ proteins which are

repressible repressors, such as JAZ2. It is also possible that JAZ7 may be targeted by

other F-box proteins (Shyu et al. 2012; Yan et al. 2014). Secondly, JAZ7 recruits the

co-repressor TPL directly through the EAR domain in its N-terminal domain (Causier et al. 2012;

Shyu et al. 2012), whereas JAZ2, lacking the EAR domain, recruits NINJA with its TIFY

domain and NINJA further recruits TPL through its EAR domain (Pauwels et al. 2010).

This difference in dependency on NINJA would determine whether downstream regulation by JAZ2 and

JAZ7 is affected by signals regulating NINJA. Thirdly, JAZ2 is regulated by AS in its Jas

intron (the intron that interrupts the Jas domain), which generates a transcript isoform with a

truncated Jas domain that leads to reduced interaction with COI1 and resistance to

degradation (Chung et al. 2010). Indeed, the jaz2 mutant, which generates a truncated

protein due to a T-DNA insertion in the Jas intron, is similar to the alternative isoform (Yan et al.

2014). However, JAZ7 does not have such AS regulation in the Jas domain because the

JAZ7 gene lacks a Jas intron (Chung et al. 2010). Fourthly, JAZ2 can

form homodimers and heterodimers with other JAZ proteins, such as JAZ1,

JAZ5 and JAZ6, through the TIFY domain (Chung and Howe 2009). However, JAZ7

could not dimerize with any JAZ proteins, including itself (Chung and Howe 2009; Chini

et al. 2009). Thus JAZ2 may function as a dimer whereas JAZ7 likely functions as a

monomer. In addition, dimerization is critical for stabilization of the splice variant of

JAZ10 lacking the Jas domain (Chung and Howe 2009), which may also be the case for

the JAZ2 splice variant. Fifthly, the transcription factors that interact with JAZ2 and JAZ7

overlap but also differ. JAZ2 and JAZ7 both bind MYC3, MYC4, and

bHLH17 (JAM1) (Fernández-Calvo et al. 2011; Fonseca et al. 2014; Thatcher et al.

2016). In addition, JAZ2 interacts with MYC2, MYC5, bHLH13 (JAM2),

bHLH3 (JAM3), bHLH14, TT8, GL3, and EGL3, whereas JAZ7 does not interact with

these (Qi et al. 2011; Fernández-Calvo et al. 2011; Song et al. 2013; Fonseca et al.

2014). Among these transcription factors, bHLH clade IIIe (MYC2, MYC3, MYC4,

MYC5) and IIIf (TT8, GL3, EGL3) mainly function as jasmonate signaling activators (Qi

et al. 2011; Fernández-Calvo et al. 2011), whereas clade IIId (JAM1, JAM2, JAM3,

bHLH14) functions as repressors (Song et al. 2013; Fonseca et al. 2014). The IIIf clade

has a specific role in anthocyanin accumulation and trichome formation (Qi et al. 2011),

suggesting involvement of JAZ2, but not JAZ7 in these regulatory pathways. Recent

research suggests that JAZ7 plays a specific role in light/dark induced responses (Chico et al.

2014; Yu et al. 2016). JAZ7, as well as JAZ1, JAZ5, JAZ9, JAZ10, JAZ11, JAZ12, are

greatly induced in simulated shade conditions leading to repression of the jasmonate-

dependent defense responses (Chico et al. 2014). Moreover, darkness greatly up-

regulates expression of JAZ7, which negatively regulates Arabidopsis leaf senescence

through the MYC2 activator (Yu et al. 2016). Two previously published studies used the

same jaz7 over-expression mutant (SALK_040835) as this project for functional

analysis of JAZ7 (Yan et al. 2014; Thatcher et al. 2016). The jaz7 over-expression

mutant exhibits short roots and reduced weight compared with WT in response to MeJA,

an early flowering phenotype compared with WT in short-day conditions, and increased

susceptibility to the fungal pathogen F. oxysporum (Yan et al. 2014; Thatcher et al.

2016). Interestingly, transgenic JAZ7 overexpression lines do not resemble the

phenotype of the jaz7 over-expression mutant (SALK_040835) suggesting possible

tissue-specific JAZ7 expression in the jaz7 mutant (Thatcher et al. 2016). In this project,

significantly up-regulated expression in the shoot, but not in the root, of the jaz7 mutant compared

with WT was observed in response to MeJA, which supports tissue-specific expression of

JAZ7 in the mutant (Figure 3-1).

In agreement with functional similarity and specificity of the JAZ2 and JAZ7

genes discussed above, I identified genes potentially regulated by JAZ2, JAZ7, or both,

although further experimental validation is needed. For example, NUDX9 might be

regulated by JAZ2 in the shoot due to the changed isoform expression profile of this

gene in the jaz2 mutant compared with WT and the jaz7 mutant (Figure 3-7A); gene

bHLH160 might be regulated by JAZ7 in the root due to the changed gene expression

profile in jaz7 mutant compared with WT and jaz2 mutant (Figure 3-13C); and gene

At3g02740 might be regulated by both JAZ2 and JAZ7 due to the changed isoform

expression profiles of this gene in jaz2 and jaz7 mutants compared with WT (Figure 3-

10C).
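A "changed isoform expression profile" of this kind can be made concrete by comparing the proportion that each isoform contributes to the total expression of a locus between genotypes. The Python sketch below is a minimal illustration with invented expression values; it does not reproduce the statistical test used to call differential AS in this study, and all names in it are hypothetical.

```python
def isoform_proportions(expression):
    """Convert per-isoform expression values into proportions of the locus total."""
    total = sum(expression.values())
    if total == 0:
        return {isoform: 0.0 for isoform in expression}
    return {isoform: value / total for isoform, value in expression.items()}


def proportion_shift(reference, condition):
    """Per-isoform change in proportion (condition minus reference)."""
    ref = isoform_proportions(reference)
    cond = isoform_proportions(condition)
    isoforms = sorted(set(ref) | set(cond))
    return {iso: cond.get(iso, 0.0) - ref.get(iso, 0.0) for iso in isoforms}


# Invented expression values for one locus in two genotypes.
wt = {"primary": 30.0, "alternative": 10.0}
mutant = {"primary": 20.0, "alternative": 20.0}

shift = proportion_shift(wt, mutant)
print(shift)  # {'alternative': 0.25, 'primary': -0.25}
```

Working with proportions rather than raw expression separates a change in splicing from a change in overall transcription of the locus, which is the distinction drawn between differential AS and differential expression.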

Alternative Splicing Coupled miRNA Regulation

AS and miRNA-mediated regulation are two important post-transcriptional regulatory processes

in plants (Alves-Junior et al. 2009; Zhang et al. 2017b) and animals (Pan et al. 2008;

Friedman et al. 2009). In humans, more than 95% of multi-exon protein-coding genes are

under AS regulation (Pan et al. 2008) and over 60% of the protein-coding genes are

conserved targets of miRNA regulation (Friedman et al. 2009). In Arabidopsis, around

60% of multi-exon protein-coding genes are regulated by AS (Zhang et al. 2017b) and

hundreds of genes involved in various pathways are subjected to miRNA regulation

(Alves-Junior et al. 2009). Approximately 33.3% of miRNA binding sites in humans (Wu

et al. 2013a) and more than 12.4% of miRNA binding sites in Arabidopsis (Yang et al.

2012) are predicted to be regulated by AS. The inevitable combination of AS and

miRNA forms an important layer of gene regulation in plants and animals (Reddy et al.

2013; Tian and Manley 2013). Both plant and animal miRNAs mediate

post-transcriptional repression; however, plant miRNAs achieve this through nearly

perfect complementarity, usually with a single binding site in any region of the transcript,

whereas animal miRNAs achieve it through partial complementarity with multiple

binding sites usually located in the 3’UTR of the mRNA (Voinnet 2009; Axtell et al.

2011). As a result, plant miRNAs have stronger and more focused effects on

specific target genes, whereas animal miRNAs have more subtle and broader effects

across the transcriptome (Axtell et al. 2011).
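The notion of complementarity between a miRNA and its target site can be illustrated by counting mismatches against the perfectly complementary site. The Python sketch below is a simplification: it considers Watson-Crick pairs only (G:U wobble pairs are counted as mismatches), it is not the target prediction method used in this study, and the 21-nt sequence is an arbitrary toy sequence rather than a real miRNA.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def count_mismatches(mirna, target_site):
    """Number of positions at which the target site deviates from the
    perfectly complementary site. Both sequences are written 5'->3' and
    must have equal length (no bulges or gaps are modeled)."""
    if len(mirna) != len(target_site):
        raise ValueError("sequences must have equal length")
    expected = reverse_complement(mirna)
    return sum(a != b for a, b in zip(expected, target_site))


mirna = "AUGGCUAGCUUAGCCGAUCGA"  # toy sequence, not a real miRNA
perfect_site = reverse_complement(mirna)
# Change the first base of the site to introduce a single mismatch.
one_mismatch_site = ("A" if perfect_site[0] != "A" else "C") + perfect_site[1:]

print(count_mismatches(mirna, perfect_site))       # 0
print(count_mismatches(mirna, one_mismatch_site))  # 1
```

In this simplified picture, a plant-like site would score at or near zero mismatches across the whole miRNA, whereas an animal-like site would tolerate many mismatches outside a short matched region.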

Published studies suggest at least two mechanisms of AS coupled miRNA

regulation. The first mechanism, which is shared by plants and animals, involves miRNA

binding sites located in the 3’UTR and alternative polyadenylation leading to

inclusion/exclusion of these miRNA binding site(s). By selectively splicing miRNA

binding sites at the 3’UTR region, AS could render isoforms without the miRNA binding

site(s) resistant to miRNA repression with no change on protein structure/function (Wu

and Poethig 2006; Sandberg et al. 2008; Kalsotra et al. 2010; Boutet et al. 2012;

Campo et al. 2013). This mechanism is supported by the fact that most eukaryotic

genes contain more than one cleavage and polyadenylation site (Tian and Manley

2013). In animals, alternative polyadenylation associated miRNA regulation is known to

play important roles in a wide variety of pathways, such as immune cell activation, cell

proliferation, muscle stem cell function, and heart development (Sandberg et al. 2008;

Kalsotra et al. 2010; Boutet et al. 2012). In plants, alternative polyadenylation coupled

miRNA regulation was reported on Arabidopsis SPL3/4/5 (Wu and Poethig 2006) and

rice Nramp6 (Campo et al. 2013). In Arabidopsis, alternative polyadenylation in the 3’UTR

of SPL3/4/5 affects the miR156 binding site and generates miR156-sensitive and

miR156-insensitive isoforms. Increased abundance of miR156-insensitive isoforms promotes the

juvenile-to-adult transition (Wu and Poethig 2006). In rice, the dominant splice variant of

Nramp6 (Natural resistance-associated macrophage protein 6, Os01g31870),

Nramp6.8, contains two miR7695 binding sites in its 3’UTR region. The expression level

of Nramp6.8 is negatively correlated with miR7695 and the over-expression of miR7695

confers pathogen resistance in rice (Campo et al. 2013). The second mechanism is

plant-specific: miRNAs could repress premature or erroneous mRNAs by targeting

miRNA binding sites located in introns. In plants, the most prevalent AS event

is IntronR, which accounts for ~40% of the total AS events (Zhang et al. 2017b). It was

shown that miRNAs target introns in Arabidopsis and rice (Meng et al. 2013), raising the

possibility that miRNA could specifically repress or degrade erroneous intron-containing

mRNAs.

In this project, I identified 64 genes which contain miRNA binding sites potentially

impacted by AS, including the known SPL4 (Figure 3-16; Wu and Poethig 2006). The

actual gene number might be higher as only genes whose expression was detected in

our transcriptome data were analyzed. I observed cases where alternatively spliced

miRNA target sites were located in the CDS (e.g. SMZ), 5’UTR (e.g. At3g02740) and

3’UTR (e.g. AAO2) (Figure 3-10). The cases of 5’UTR-located AS-regulated miRNA

binding sites expand the first mechanism to also include alternative promoter coupled

miRNA regulation, which functions in a similar pattern as alternative polyadenylation

coupled miRNA regulation. In addition, most plant miRNA regulated genes were

regulated by only one miRNA with a single target site (Voinnet 2009; Axtell et al. 2011).

I identified 32 genes (6.3% of miRNA regulated genes) that were predicted to contain

multiple miRNA target sites and some of them are regulated by different miRNAs such

as AAO2 and At3g23900 that contain predicted target sites of miR414 and miR5021,

and GRF3 that contains predicted target sites of miR396 and miR5658 (Figure 3-8;

Figure 3-10). Though further validation of these predictions is needed, these

observations suggest complexity of miRNA regulation of these genes in Arabidopsis.
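The classification of genes by whether their isoforms carry the same or different predicted miRNA target sites can be sketched as follows. This Python example is a hypothetical illustration rather than the procedure used here: it assumes that target sites have already been predicted per isoform, and the gene, isoform, and miRNA names are placeholders.

```python
def classify_gene(isoform_sites):
    """Classify a gene by how predicted miRNA target sites are distributed
    across its isoforms.

    isoform_sites maps an isoform ID to a set of (miRNA, target sequence)
    pairs. Sites are keyed by sequence rather than position because the
    coordinates of the same site can differ between isoforms.
    """
    site_sets = list(isoform_sites.values())
    if not set().union(*site_sets):
        return "no predicted miRNA site"
    if len(site_sets) < 2:
        return "single isoform"
    if all(sites == site_sets[0] for sites in site_sets):
        return "same sites in all isoforms"
    return "sites differ between isoforms"


# Placeholder inputs; names and sequences are invented.
genes = {
    "gene_A": {"isoform_1": {("miR-a", "UUCUUCUUC")}, "isoform_2": set()},
    "gene_B": {"isoform_1": {("miR-b", "GAUGAUGAU")},
               "isoform_2": {("miR-b", "GAUGAUGAU")}},
    "gene_C": {"isoform_1": set(), "isoform_2": set()},
}
for name, isoforms in genes.items():
    print(name, classify_gene(isoforms))
```

Genes in the "sites differ between isoforms" class are the candidates for AS coupled miRNA regulation, since only some of their isoforms would be exposed to the miRNA.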

Functional AS Regulation

Several cases where AS may play functional roles in response to MeJA were

identified. In terms of AS regulation – changed proportion of splice variants under

different conditions – I identified genes NUDX9 and NRT1.8 (Figure 3-7). Both genes

exhibited significantly altered isoform proportion upon increased jasmonate, suggesting

existence of cis-elements (intronic/exonic splicing enhancer/silencer) and different

involvement of trans-factors (e.g. SR splicing activators and hnRNPs splicing

repressors). Specifically, I observed significantly changed expression of SR genes

RS2Z33, RSZ21, and hnRNP genes RBP45a, RNPA/B_3 in response to MeJA

treatment (Figure 3-4). These splicing factors are candidate genes responsible for

differential AS induced by jasmonate. Moreover, I explored the protein interaction network

of jasmonate-responsive genes and identified three possible signal transduction

pathways between jasmonate signaling and splicing regulation – shared transcription

factors, kinases and ubiquitin pathways (Figure 3-3). Increased jasmonate would trigger

the jasmonate signaling pathway. The signal could be mediated and transferred through

the transcription factors, kinases and ubiquitin signaling pathways and ultimately

transmitted to splicing related factors that would regulate AS of the responsive genes,

such as NUDX9 and NRT1.8 (Figure 3-7). Interestingly, the AS isoforms of both genes

are potentially subject to NMD due to a PTC. NMD coupled AS is an efficient

mechanism to downregulate gene expression (Kervestin and Jacobson 2012; Kalyna et

al. 2012) and it may occur in 34.3% of genes predicted to undergo AS in this analysis

(Figure 3-11). Specifically, NRT1.8 shows not only differential AS but also increased

transcription in response to MeJA treatment. The two regulation points – transcription

and AS – function synergistically to increase NRT1.8 protein product (Figure 3-7).

NRT1.8 is a nitrate transporter which enhances nitrate uptake by mediating nitrate

unloading from the xylem vessels (Li et al. 2010). Increased NRT1.8 protein level from

transcription and AS regulation indicates its possible involvement in jasmonate triggered

plant nitrate uptake. NUDX9 is a GDP-D-Man pyrophosphohydrolase, which indirectly

modulates ammonium responses by hydrolysis of GDP-D-Man in the root (Tanaka et al.

2015). Here I report a possible involvement of NUDX9 in the jasmonate signaling

pathway through AS regulation in the shoot of Arabidopsis.
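Whether an isoform with a PTC is a plausible NMD target is often judged from the distance between the stop codon and the last exon-exon junction. The Python sketch below applies one commonly used heuristic, a stop codon more than about 50 nt upstream of the last junction; this threshold is an assumption made for illustration and is not necessarily the criterion applied in this analysis. The exon lengths and stop codon positions are invented.

```python
def last_junction_position(exon_lengths):
    """Transcript coordinate of the last exon-exon junction, counted as the
    number of nucleotides upstream of it; None for single-exon transcripts."""
    if len(exon_lengths) < 2:
        return None
    return sum(exon_lengths[:-1])


def is_candidate_nmd_target(exon_lengths, stop_codon_end, min_distance=50):
    """Flag a transcript whose stop codon ends more than min_distance nt
    upstream of the last exon-exon junction.

    stop_codon_end is the transcript position (nt from the 5' end) of the
    last base of the stop codon. The 50-nt default is an assumed heuristic.
    """
    junction = last_junction_position(exon_lengths)
    if junction is None:
        return False
    return junction - stop_codon_end > min_distance


exons = [200, 150, 300]  # invented exon lengths; last junction at position 350
print(is_candidate_nmd_target(exons, 250))  # True: stop ends 100 nt upstream
print(is_candidate_nmd_target(exons, 320))  # False: stop ends only 30 nt upstream
print(is_candidate_nmd_target(exons, 500))  # False: stop lies in the last exon
```

A stop codon in the last exon, or close to the last junction, is not flagged, which is why the position of an AS-induced PTC relative to downstream junctions matters for the predicted fate of the isoform.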

In addition to regulating the relative abundance of isoforms, AS also plays

important functional roles by generating functionally novel splice variants, as is the case for

the well documented JAZ repressors (Yan et al. 2007; Chung and Howe 2009; Chung et

al. 2010; Moreno et al. 2013). Notably, in response to wounding, the ratio of the

expressed JAZ10 splice variants did not change significantly (Yan et al. 2007). The AS

regulation of the JAZ repressors lies in generating a stable repressor (lacking the

domain recognized by COI1) in addition to the primary degradable repressor

rather than modulating the proportion of isoforms. AS has the potential to generate

isoforms with different functions from the same gene by modulating domain structures.

By identifying AS genes with different domain arrangements between splice variants an

attempt was made to identify cases similar to JAZ repressors in the jasmonate signaling

pathway. We identified a copper-responsive transcription factor bHLH160 (Yamasaki et

al. 2009; Bernal et al. 2012), which has the potential to generate an activator and a

repressor through AS by inclusion/exclusion of sequences coding for the DNA binding

domain (Figure 3-13). The primary isoform of bHLH160 is a transcription factor with a

DNA-binding domain (basic) and dimerization domain (HLH) (Carretero-Paulet et al.

2010). The splice variant bHLH160b- lacks the basic domain and is not expected to be

able to bind with the promoters of genes normally regulated by bHLH160. Moreover,

bHLH160b- protein could sequester functional bHLH160 proteins by forming dimers with

them through the HLH domain. Thus, the splice variant bHLH160b- likely functions in a

manner opposite to the primary isoform. Human Id genes, which only contain the HLH

domain like Arabidopsis bHLH160b-, function in a similar manner to repress gene

expression (Sikder et al. 2003). Interestingly, similar AS regulation was observed in

MADS-box genes, FYF and FLM. The AS isoforms of FYF and FLM have lost the

MADS-box domain, which is the DNA-binding domain, but retained the K-box domain,

which facilitates dimerization of MADS-box proteins (Pařenicová et al. 2003; Figure 3-12).

Thus the splice variants of FYF and FLM may negatively regulate the original

function of the gene in a manner similar to bHLH160b-. AS may result in isoforms with

mutually exclusive functions, or could also result in isoforms that effect the same

outcome but have different strengths, which is the case for MYB3R1 and MYB3R4

(Figure 3-12). MYB3R1 and MYB3R4 are transcriptional activators that upregulate the G2/M

transition in the cell cycle (Haga et al. 2007). AS of MYB3R1 and MYB3R4 leads to

truncated proteins lacking the repression motif, which would serve as hyper-activators

compared with the primary proteins (Kato et al. 2009; Feng et al. 2017). Besides

transcription factors, I also identified many splicing factor genes with different domain

arrangements between splice variants. In most of the identified cases the primary

protein product of the genes has multiple domains and the AS isoform has lost one or

more domain(s) (Figure 3-12). It is not surprising that many of the cases I identified are

transcription factors and splicing factors, as they contain at least two domains: the

DNA/RNA-binding domain and the regulatory domain. Based on these observations, AS

could serve as an important regulator of multi-domain proteins that could generate

splice variants with different functions by altering domain arrangement. Further

functional validation of these cases would help to elucidate the regulatory roles that different

isoforms play and their involvement in the jasmonate signaling pathway.
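Identifying isoforms that have lost one or more domains amounts to comparing the ordered domain lists predicted for the primary and alternative protein products. The Python sketch below illustrates the comparison; it assumes that domain predictions are already available as lists of names, and the function name is hypothetical. The domain names follow those listed in Figure 3-12.

```python
from collections import Counter


def lost_domains(primary_domains, isoform_domains):
    """Domains present in the primary product but missing from the isoform.

    Counter subtraction keeps track of repeated domains, so an isoform that
    retains one of two MYB repeats is reported as having lost one repeat.
    """
    missing = Counter(primary_domains) - Counter(isoform_domains)
    return sorted(missing.elements())


# Domain arrangements of the kind shown in Figure 3-12.
print(lost_domains(["MADS", "K-box"], ["K-box"]))
# ['MADS']
print(lost_domains(["MYB_DNA-binding", "MYB_DNA-binding"], ["MYB_DNA-binding"]))
# ['MYB_DNA-binding']
```

An isoform reported as having lost only its DNA-binding domain while keeping a dimerization domain is the arrangement discussed above for bHLH160b-, FYF, and FLM.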

Figure 3-1. jaz2 and jaz7 mutant characterization. A) Gene structure of jaz2 and jaz7 with the location of the T insertion indicated. B) Expression profile of JAZ2 (left) and JAZ7 (right) based on the transcriptome data. C) Phenotype of Arabidopsis seedlings under 0 μM or 10 μM MeJA (Yan et al. 2014).

Figure 3-2. Characterization of assembled transcripts. A) Comparison of the assembled transcripts from the MeJA RNA-Seq data and TAIR10 annotation. B) Comparison of the mapped junction reads from the MeJA RNA-Seq data to the TAIR10 annotated junctions. C) Identified AS patterns detected in the MeJA RNA-Seq assemblies. D) ORF prediction of the 20,524 transcripts assembled from the MeJA RNA-Seq data with the TransDecoder program (Haas and Papanicolaou 2016).

Figure 3-3. Protein interaction network of genes that undergo AS and show differential expression or differential AS in response to MeJA treatment. Edges indicate experimentally validated interactions of the two connected proteins or their homologs. Yellow boxes indicate proteins related to splicing; blue boxes indicate transcription factors; pink boxes indicate kinase proteins. Heatmaps under the gene name indicate gene expression patterns in response to MeJA treatment.

Figure 3-4. Heatmap of differentially expressed transcription factor (bHLH and MYB) and splicing factor (SR and hnRNP) gene family members that undergo AS. Differentially expressed bHLH and MYB genes under MeJA treatment (A), between tissues (shoot relative to root) (B), and between mutant backgrounds (mutant relative to WT) (C). Differentially expressed SR and hnRNP genes under MeJA treatment (D), between tissues (shoot relative to root) (E), and between mutant backgrounds (mutant relative to WT) (F).

Figure 3-5. Venn diagram of genes that exhibit differential expression or differential AS in response to treatment (MeJA), between tissues (root and shoots) and between mutant backgrounds.

Panels: Differential AS; Differential exp. Sets in each panel: Treatment (no vs. 10 µM MeJA); Tissue (root vs. shoot); Genotype (WT vs. jaz2, WT vs. jaz7, or jaz2 vs. jaz7).

Figure 3-6. Example cases in which gene expression was regulated by AS in response to MeJA treatment. Pie charts under each condition indicate the proportion of each AS isoform present relative to the total expression from the locus. Red and blue lines under pies indicate significant differential AS in shoot and root, respectively.

Figure 3-7. Two genes under AS regulation in response to MeJA treatment. A) Expression profile of NUDX9. Error bars indicate standard deviation. Pie charts under each condition indicate the proportion of each AS isoform relative to the total expression from the locus. Red lines under pie charts indicate significantly differential AS in the shoot tissue. B) Gene structure of the two isoforms of NUDX9. Angled lines indicate introns; thin black boxes indicate UTRs; blue boxes indicate CDS; pink boxes indicate regions which were converted to non-coding regions as a result of an AS induced PTC. Gray line connected red boxes indicate mapped reads supporting the novel junction in the alternative isoform. C) Regulation of NUDX9 in response to MeJA treatment by AS. D) Expression profile of NRT1.8. Blue lines under pie charts indicate significantly differential AS patterns in the root tissue. Red and blue lines above the barplot indicate significantly differential gene expression in shoot and root respectively. E) Gene structure of the two isoforms of NRT1.8. F) Regulation of NRT1.8 in response to MeJA treatment by transcription and AS.

Figure 3-8. Genes undergoing AS that include miRNA binding sites that differ between isoforms. A) Distribution of predicted miRNA target sites in genes that do not undergo AS, genes that undergo AS and AS isoforms of the gene contain different miRNA target sites and genes that undergo AS to produce isoforms with the same miRNA target sites. The darker shaded sections within each category indicate cases with multiple miRNA target sites. B) Gene structure of the seven cases (dark pink in A) with the predicted miRNA binding site indicated. C) Expression profiles of the seven genes shown in B. Pie charts under each condition indicate the proportion of each AS isoform relative to the total expression from the locus. Significantly differential AS is indicated by red lines.

Figure 3-9. Twenty-one genes which contain miRNA binding sites potentially subjected to AS regulation. Blue, green and yellow shading indicate significantly changed isoform proportions identified in treatment, tissue or genotype comparisons, respectively. Targeted sequences within the transcript and their locations are presented within brackets [ ] in the “Target site” column.

AS.diff heatmap columns (shading not reproduced here), in order:
  Treatment (a-f): shoot_WT, shoot_jaz2, shoot_jaz7, root_WT, root_jaz2, root_jaz7
  Tissue (a-f): No MeJA_WT, No MeJA_jaz2, No MeJA_jaz7, 10 µM MeJA_WT, 10 µM MeJA_jaz2, 10 µM MeJA_jaz7
  Genotype (a-l): jaz2-WT (shoot_no MeJA), jaz2-WT (shoot_10 µM MeJA), jaz2-WT (root_no MeJA), jaz2-WT (root_10 µM MeJA), jaz7-WT (shoot_no MeJA), jaz7-WT (shoot_10 µM MeJA), jaz7-WT (root_no MeJA), jaz7-WT (root_10 µM MeJA), jaz2-jaz7 (shoot_no MeJA), jaz2-jaz7 (shoot_10 µM MeJA), jaz2-jaz7 (root_no MeJA), jaz2-jaz7 (root_10 µM MeJA)

miRNA        No.  Gene                      Transcript      Target  Target site
ath-miR156   1    AT1G35515 (MYB8/HOS10)    TCONS_00008685  0
                                            TCONS_00008686  1       ath-miR156 [798-817: CUUCUCUCUCUCUUCUCUCA]
ath-miR172   2    AT3G54990 (SMZ)           TCONS_00085221  1       ath-miR172 [965-985: UUGCAGCAUCAUCAGGAUUCC]
                                            TCONS_00085222  0
ath-miR414   3    AT3G43600 (AAO2)          TCONS_00082086  2       ath-miR414 [4777-4797: UGAUGAUGAUGAUGAAGAUGC]; ath-miR5021 [5090-5109: UUUUCUUCUUCUUCUUCUUC]
                                            TCONS_00082087  0
ath-miR414   4    AT2G23140 (PUB4)          TCONS_00051815  1       ath-miR414 [2818-2838: GGAUGAUGAUGAUGAUGAUGA]
                                            TCONS_00051822  1       ath-miR414 [2877-2897: GGAUGAUGAUGAUGAUGAUGA]
                                            TCONS_00051834  0
ath-miR414   5    AT5G47040 (LON2/APEM10)   TCONS_00137229  0
                                            TCONS_00137255  1       ath-miR414 [916-939: UGACAACGAUGAUGAUGAAGAUGA]
ath-miR415   6    AT4G12560 (CPR1/CPR30)    TCONS_00090637  1       ath-miR415 [1441-1461: CUUUUCUGUCUCUGCUCUGUU]
                                            TCONS_00090640  0
                                            TCONS_00090641  1       ath-miR415 [1396-1416: CUUUUCUGUCUCUGCUCUGUU]
ath-miR472   7    AT5G43730 (RSG2)          TCONS_00119400  0
                                            TCONS_00119402  1       ath-miR472 [1313-1334: GGUAUGGGGGGAAUAGGAAAAA]
                                            TCONS_00119403  1       ath-miR472 [1313-1334: GGUAUGGGGGGAAUAGGAAAAA]
                                            TCONS_00119404  0
ath-miR472   8    AT1G15885                 TCONS_00003955  0
                                            TCONS_00003956  1       ath-miR472 [1191-1212: GGUAUGGGGGGAGUAGGUAAAA]
ath-miR5021  9    AT4G19670                 TCONS_00103441  1       ath-miR5021 [47-67: UUCUUCUUCUUCUUCUUCUCU]
                                            TCONS_00103442  1       ath-miR5021 [47-67: UUCUUCUUCUUCUUCUUCUCU]
                                            TCONS_00103445  0
                                            TCONS_00103446  1       ath-miR5021 [44-64: UUCUUCUUCUUCUUCUUCUCU]
ath-miR5021  10   AT2G27030 (CAM5)          TCONS_00042668  0
                                            TCONS_00042669  0
                                            TCONS_00042670  1       ath-miR5021 [24-44: UUCUUCUUCUUCUUCUUCUCU]
ath-miR5021  11   AT2G46340 (SPA1)          TCONS_00059096  0
                                            TCONS_00059100  1       ath-miR5021 [128-147: CUUUCUUCUUCUUCUUCUUC]
ath-miR5021  12   AT3G02740                 TCONS_00060511  1       ath-miR5021 [114-133: UUUUCUUCUUCUUCUUCUUU]
                                            TCONS_00060512  0
ath-miR5021  13   AT3G05180                 TCONS_00074206  1       ath-miR5021 [1620-1640: UUCUUCUUCUUCUUCUUCUCU]
                                            TCONS_00074207  0
ath-miR5021  14   AT3G44610                 TCONS_00082295  1       ath-miR5021 [18-38: UUCUUCUUCUUCUUCUUCUCU]
                                            TCONS_00082304  0
ath-miR5021  15   AT4G13930 (SHM4)          TCONS_00101323  1       ath-miR5021 [2480-2500: UUUAUCUUCUUCUUCUUCUCC]
                                            TCONS_00101325  0
ath-miR5021  16   AT5G60860 (RABA1F)        TCONS_00124016  1       ath-miR5021 [57-76: UUUUUUUGUUCUUCUUCUCA]
                                            TCONS_00124018  0
ath-miR5641  17   AT5G55530                 TCONS_00122678  0
                                            TCONS_00122680  0
                                            TCONS_00122682  0
                                            TCONS_00122683  1       ath-miR5641 [60-80: UCUUUCUAUCAUCUUCUUACA]
ath-miR5641  18   AT5G44650 (CEST/Y3IP1)    TCONS_00119648  1       ath-miR5641 [1180-1200: UUAUUUAAUCAUCUUCUUCCU]
                                            TCONS_00119653  1       ath-miR5641 [1180-1200: UUAUUUAAUCAUCUUCUUCCU]
                                            TCONS_00119656  0
ath-miR5658  19   AT5G64330 (NPH3/RPT3)     TCONS_00124898  0
                                            TCONS_00124899  0
                                            TCONS_00124902  1       ath-miR5658 [3646-3666: AAUCAUCAUCAUAAUCAUCAU]
ath-miR5658  20   AT4G29230 (NAC75)         TCONS_00095194  2       ath-miR5658 [1487-1507: GAUCAUCACCAUCAUCAUCAU]; ath-miR5658 [1981-2001: GCUCAUCGUCAUCAUCAUCAU]
                                            TCONS_00095197  1       ath-miR5658 [1078-1098: GAUCAUCACCAUCAUCAUCAU]
ath-miR8177  21   AT1G18880 (NPF2.9)        TCONS_00004939  0
                                            TCONS_00004941  1       ath-miR8177 [3771-3791: UA-GAGUGACACAUCAUCACAA]

Figure 3-10. miRNA regulation and expression profiles of SMZ, AAO2 and At3g02740. A) Gene structure of the transcript isoforms of SMZ, AAO2 and At3g02740. Angled lines indicate introns; thin black boxes indicate UTRs; blue/green boxes indicate CDS. Red bars indicate miRNA targeting position. B) Complementarity of miRNAs and their target sites on the transcripts. C) Expression profiles of SMZ, AAO2 and At3g02740. Isoforms subjected to miRNA regulation are indicated in red. Error bars indicate standard deviation. Pie charts under each condition indicate the proportion of each AS isoform present relative to the total expression from the locus. Black lines under pie charts indicate significantly changed isoform proportions.

Figure 3-11. Cases in which AS does not generate multiple protein products. A) Distribution of AS events occurring within the CDS or UTRs. B) Distribution of 3’-UTR lengths of all identified transcripts. C) Distribution of the distance between the stop codon and the last exon junction when the stop codon is not within the last exon of the transcript.

Figure 3-12. AS Genes with different domain structures predicted between transcript isoforms. Heatmaps indicate up- or down-regulated transcript expression in response to MeJA treatment in six conditions (SW: shoot of WT; S2: shoot of jaz2; S7: shoot of jaz7; RW: root of WT; R2: root of jaz2; R7: root of jaz7).

No. Category Gene Transcript SW S2 S7 RW R2 R7 Domain (Interval, E-value)*

TCONS_00124363/5/8 MADS-MEF2-like (3-78, 7.2e-44); K-box (84-170, 1.2e-22)

TCONS_00124364/6 MADS-MEF2-like (3-78, 2.6e-44); K-box (84-156, 1.4e-14)

TCONS_00124367 MADS-MEF2-like ( - ); K-box (56-117, 1.6e-18)

TCONS_00016233 MADS (2-71, 1.8e-25); K-box (95-163, 2.6e-10)

TCONS_00016236 MADS ( - ); K-box (86-126, 5.2e-08)

TCONS_00140881 MYB_DNA-binding (10-57, 7.5e-14); MYB_DNA-binding (63-108, 1.2e-14)

TCONS_00140880 MYB_DNA-binding ( - ); MYB_DNA-binding (42-87, 3.1e-15)

TCONS_00140882 MYB_DNA-binding ( - ); MYB_DNA-binding (2-43, 5.1e-14)

TCONS_00068443 MYB_DNA-binding (9-56, 3.8e-14); MYB_DNA-binding (62-107, 1.8e-14)

TCONS_00068438 MYB_DNA-binding ( - ); MYB_DNA-binding (2-43, 1.3e-13)

TCONS_00068444 MYB_DNA-binding ( - ); MYB_DNA-binding (32-77, 9.3e-15)

TCONS_00141362 MYB_DNA-binding (14-61, 5.1e-16); MYB_DNA-binding (67-112, 4.4e-14)

TCONS_00141363 MYB_DNA-binding ( - ); MYB_DNA-binding (2-33, 1.7e-08)

TCONS_00066032 MYB_DNA-binding (14-61, 5.2e-15); MYB_DNA-binding (67-112, 2.2e-14)

TCONS_00066034 MYB_DNA-binding ( - ); MYB_DNA-binding (10-55, 1.9e-15)

TCONS_00004892 MYB_DNA-binding (14-61, 2.2e-12); MYB_DNA-binding (67-112, 4.8e-13)

TCONS_00004893 MYB_DNA-binding ( - ); MYB_DNA-binding (2-21, 2.2e-03)

TCONS_00096267 MYB_DNA-binding (35-81, 1.0e-15; 87-133, 6.2e-20;139-182, 5.4e-14 ); Repression_motif1 (849-920, NA)

TCONS_00096266 MYB_DNA-binding (35-81, 1.7e-15; 87-133, 1.2e-19; 139-182, 8.8e-14); Repression_motif1 ( - )

TCONS_00112857 MYB_DNA-binding (29-75, 7.5e-17; 81-127, 1.1e-19; 133-176, 3.8e-12); Repression_motif1 (843-908, NA)

TCONS_00112860 MYB_DNA-binding (29-75, 1.2e-16;87-127, 1.7e-19; 133-176, 5.3e-12); Repression_motif1 ( - )

TCONS_00134035/6 myb_SHAQKYF (188-242, 8.9e-21); Myb_CC_LHEQLE (276-321, 4.4e-23)

TCONS_00134034 myb_SHAQKYF (231-285, 8.3e-21); Myb_CC_LHEQLE (319-364, 9.5e-24)

TCONS_00134043 myb_SHAQKYF (231-285, 7.8e-22); Myb_CC_LHEQLE ( - )

TCONS_00048235 myb_SHAQKYF (15-71, 2.2e-25); Myb_CC_LHEQLE (101-146, 5.7e-28)

TCONS_00048234 myb_SHAQKYF ( - ); Myb_CC_LHEQLE (52-97, 4.7e-28)

TCONS_00036895 myb_SHAQKYF (34-90, 2.5e-23); Myb_CC_LHEQLE (130-167, 6.0e-23)

TCONS_00036894 myb_SHAQKYF ( - ); Myb_CC_LHEQLE (65-102, 1.8e-22)

TCONS_00111659 myb_SHAQKYF (192-248, 9.6e-21); MYB_CC_LHEQLE (276-321, 2.3e-23)

TCONS_00111656 myb_SHAQKYF (192-248, 1.2e-21); Myb_CC_LHEQLE ( - )

TCONS_00127440 zf-CCCH (38-61, 1.8e-04); KH_1 (115-177, 2.9e-12); ZnF_C3H1 (205-231, 2.6e-08)

TCONS_00127443 zf-CCCH ( - ); KH_1 (43-105, 1.1e-11); ZnF_C3H1 ( 133-159, 2.2e-09)

TCONS_00053227/9 FAR1 super family (64-137, 1.66e-09); MULE (251-339, 2.2e-19); ZnF_PMZ (538-562, 1.6e-07); Commd super family (777-837, 2.8e-03)

TCONS_00053228 FAR1 super family ( - ); MULE (106-194, 2.2e-19); ZnF_PMZ (393-417, 1.2e-07); Commd super family (632-692, 2.3e-03)

TCONS_00053546 AP2 (153-215, 1.5e-14); AP2 (245-308, 7.9e-25)

TCONS_00053545 AP2 (153-215, 1.4e-14); AP2 ( - )

TCONS_00034498/502 HLH super family (61-119, 2.6e-03)

TCONS_00034499 HLH super family ( - )

TCONS_00007957 MFMR (1-107, 1.0e-35); MFMR_assoc (115-266, 3.2e-23); bZIP_plant_GBF1 (298-348, 5.3e-25); BAR super family (340-385, 9.7e-03)

TCONS_00007970 MFMR ( - ); MFMR assoc (16-167, 4.0e-24); bZIP_plant_GBF1 (199-249, 4.9e-23); BAR super family (241-286, 6.3e-03)

TCONS_00051442 bZIP_plant_RF2 (373-422, 8.3e-22); Dzip-like_N super family (416-462, 3.2e-03)

TCONS_00051446 bZIP_plant_RF2 (373-422, 8.6e-23); Dzip-like_N super family ( - )

TCONS_00121748 RRM1_AtRSp31_like (2-73, 2.1e-37); RRM2_AtRSp31_like (97-166, 6.3e-40); Rubella_Capsid super family (237-327, 8.4e-04)

TCONS_00121746 RRM1_AtRSp31_like (2-73, 1.5e-37); RRM2_AtRSp31_like (97-166, 6.2e-40); Rubella_Capsid super family (237-327, 8.7e-04)

TCONS_00121747 RRM1_AtRSp31_like ( - ); RRM2_AtRSp31_like (64-133, 2.4e-40); Rubella_Capsid super family (204-294, 2.0e-03)

TCONS_00121756 RRM1_AtRSp31_like ( - ); RRM2_AtRSp31_like (64-133, 2.5e-40); Rubella_Capsid super family (204-294, 1.8e-03)

TCONS_00094149/51 RRM1_AtRSp31_like (2-73, 4.3e-37); RRM_SF super family (98-167, 2.4e-38); DUF2722 super family (210-315, 7.6e-03)

TCONS_00094150 RRM1_AtRSp31_like ( - ); RRM_SF super family (65-134, 1.1e-38); DUF2722 super family ( - )

TCONS_00094152 RRM1_AtRSp31_like ( - ); RRM_SF super family (57-126, 8.6e-39); DUF2722 super family ( - )

TCONS_00056283 RRM (2-85, 1.8e-09); Zf-CCHC (99-115, 1.9e-04); Zf-CCHC (121-138, 4.8e-04)

TCONS_00056285 RRM (3-44, 2.3e-03); Zf-CCHC (58-74, 3.5e-04); Zf-CCHC (80-97, 8.6e-04)

TCONS_00056284 RRM ( - ); Zf-CCHC (69-85, 4.5e-04); Zf-CCHC (91-108, 1.1e-03)

TCONS_00084632 RRM (2-92, 2.7e-08); Zf-CCHC (99-115, 1.8e-04); Zf-CCHC ( 121-138, 4.7e-04)

TCONS_00084631 RRM ( - ); Zf-CCHC (58-74, 3.5e-04); Zf-CCHC (80-97, 8.8e-04)

TCONS_00125380 RRM_hnRNPH_ESRPs_RBM12_like (50-119, 3.1e-23); RRM_hnRNPH_ESRPs_RBM12_like (165-239, 4.9e-28)

TCONS_00125379 RRM_hnRNPH_ESRPs_RBM12_like ( - ); RRM_hnRNPH_ESRPs_RBM12_like (70-142, 2.4E-29)

TCONS_00122488 RRM1_SECp43_like (61-138, 2.8e-43); RRM2_SECp43_like (153-232, 6.0e-50); RRM3_NGR1_NAM8_like (259-330, 4.4e-41)

TCONS_00122487 RRM1_SECp43_like (61-138, 1.2e-44); RRM2_SECp43_like (153-222, 7.3e-44); RRM3_NGR1_NAM8_like ( - )

TCONS_00122484 RRM1_SECp43_like (61-138, 2.3e-44); RRM2_SECp43_like (153-232, 5.4e-51); RRM3_NGR1_NAM8_like ( - )

TCONS_00139168 RRM1_PTBPH1_PTBPH2 (16-96, 3.5e-57); RRM2_PTBPH1_PTBPH2 (110-204, 1.4e-58); RRM3_PTBPH1_PTBPH2 (243-339, 2.0e-64)

TCONS_00139169 RRM1_PTBPH1_PTBPH2 ( - ); RRM2_PTBPH1_PTBPH2 ( - ); RRM3_PTBPH1_PTBPH2 (143-239, 2.1e-66)

TCONS_00077879 Smc (117-457, 2.2e-06); PRK12472 (609-751, 1.7e-03)

TCONS_00077880 Smc (134-469, 7.5e-07); PRK12472 (571-713, 1.5e-03)

TCONS_00077870 MDN1 (70-350, 1.34e-04); PRK12472 (417-559, 6.7e-04)

TCONS_00112713 DEADc (48-250, 5.4e-88); HELICc (264-391, 2.0e-38)

TCONS_00112720 DEADc ( - ); HELICc (181-308, 8.5e-39)

TCONS_00047402 EF-G (67-339, 2.0e-179); mtEFG1_II_like (366-446, 1.2e-48); EFG_III (459-534, 2.6e-40); EFG_mtEFG1_IV (537-654, 8.2e-63); mtEFG1_C (659-736, 7.5e-45)

TCONS_00047388 EF-G (67-304, 1.5e-151); mtEFG1_II_like ( - ); EFG_III ( - ); EFG_mtEFG1_IV ( - ); mtEFG1_C ( - )

TCONS_00119648 Transmembrane region2 ( 254-276, NA)

TCONS_00119656 Transmembrane region2 ( - )

* Domain prediction information from NCBI Conserved Domain Database (Marchler-Bauer et al., 2017). 1. Motif information from Feng et al., 2017. 2. Transmembrane prediction from SMART (Simple Modular Architecture Research Tool) (Letunic et al., 2015).

1 MADS BOX AT5G62165 (AGL42/FYF)
2 MADS BOX AT1G77080 (FLM)
3 R2R3-MYB AT5G59780 (MYB59)
4 R2R3-MYB AT3G46130 (MYB48)
5 R2R3-MYB AT5G61420 (MYB28)
6 R2R3-MYB AT3G23250 (MYB15)
7 R2R3-MYB AT1G18710 (MYB47)
8 3R-MYB AT4G32730 (MYB3R1)
9 3R-MYB AT5G11510 (MYB3R4)
10 G2-like AT5G29000 (BHL1)
11 G2-like AT2G01060
12 G2-like AT1G79430 (APL)
13 G2-like AT5G06800
14 C3H AT5G06770
15 FHY3/FAR1 AT2G27110 (FRS3)
16 AP2 AT2G28550 (TOE1)
17 bHLH AT1G71200 (bHLH160)
18 bZIP AT1G32150 (bZIP68)
19 bZIP AT2G21230 (bZIP30)
20 SR splicing factor AT5G52040 (RS41)
21 SR splicing factor AT4G25500 (RS40)
22 SR splicing factor AT2G37340 (RS2Z33)
23 SR splicing factor AT3G53500 (RSZ32)
24 hnRNP splicing factor AT5G66010
25 RNA-binding protein AT5G54900 (RBP45A)
26 RNA-binding protein AT5G53180 (PTB2)
27 ER body protein AT3G15950 (NAI2)
28 RNA helicase AT5G11170 (UAP56A)
29 translation elongation factor AT2G45030 (EFG/EF2)
30 thylakoid protein AT5G44650 (CEST)

[Figure residue: heatmap of transcript isoform expression. Columns: WT_Shoot, J2_Shoot, J7_Shoot, WT_Root, J2_Root, J7_Root. Rows: TCONS transcript isoforms. Color key: value from -5 to 5.]

Figure 3-13. Arabidopsis bHLH160 AS patterns and the proposed regulatory function of the splice variant. A) Gene structure of the transcript isoforms of Arabidopsis bHLH160. B) Protein 3D structure of the Arabidopsis bHLH160 proteins adapted from Ma et al. 1994 and their regulatory roles. C) Expression profile of Arabidopsis bHLH160. Error bars indicate the standard deviation. Pie charts under each condition indicate the proportion of each AS isoform present relative to the total expression from the locus. Red lines above the barplot indicate significantly changed gene expression. D) Gene structure of the transcript isoforms of Camelina sativa LOC104712692, LOC104701783 and LOC104750970. Angled lines indicate introns; black thin boxes indicate UTRs; thick boxes indicate CDS; orange, green and purple indicate basic, helix and loop respectively in the bHLH domain. E) Multiple sequence alignments of the protein isoforms from Arabidopsis thaliana bHLH160, Camelina sativa LOC104712692, LOC104701783 and LOC104750970. The bHLH domain is indicated within black boxes.

Figure 3-14. Mapped reads from the MeJA project supporting the AS junction in Arabidopsis bHLH160b-.

Figure 3-15. Proteomics validated AS isoform expression. Gene structure was displayed by Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn, Hu et al. 2015): thick black line -- UTR; blue box -- exon; angled line -- intron; red angled line -- intron splitting the junction of the transcript isoform supported by proteomics data; black/gray line under blue box -- peptide mapped region. Shaded isoform name indicates primary annotated isoform from TAIR10 annotation.

Figure 3-16. Differential miRNA regulation of SPL4 splice variants. A) Gene structure of SPL4 AS isoforms with the predicted miRNA binding site indicated. B) Complementarity of the miRNA with the predicted SPL4 target site.

Table 3-1. RNA-seq library and mapping information.

Samples # Raw reads # Mapped reads Percentage

WT_shoot_0_rep1 21,128,469 17,025,286 80.6%

WT_shoot_0_rep2 14,186,832 8,909,077 62.8%

WT_shoot_0_rep3 23,486,963 8,769,393 37.3%

WT_shoot_10_rep1 13,657,112 10,920,597 80.0%

WT_shoot_10_rep2 16,134,503 12,685,395 78.6%

WT_shoot_10_rep3 13,510,226 10,641,409 78.8%

WT_root_0_rep1 15,241,432 12,494,869 82.0%

WT_root_0_rep2 40,886,447 32,315,025 79.0%

WT_root_0_rep3 22,960,991 18,992,644 82.7%

WT_root_10_rep1 15,351,676 13,060,948 85.1%

WT_root_10_rep2 17,683,061 14,960,418 84.6%

WT_root_10_rep3 15,102,434 12,796,169 84.7%

jaz2_shoot_0_rep1 15,032,167 10,950,279 72.8%

jaz2_shoot_0_rep2 11,958,237 9,261,665 77.5%

jaz2_shoot_0_rep3 9,277,191 5,794,838 62.5%

jaz2_shoot_10_rep1 11,154,705 8,481,602 76.0%

jaz2_shoot_10_rep2 11,431,445 8,887,788 77.7%

jaz2_shoot_10_rep3 11,715,590 9,311,499 79.5%

jaz2_root_0_rep1 15,850,717 13,107,489 82.7%

jaz2_root_0_rep2 17,974,716 15,159,325 84.3%

jaz2_root_0_rep3 17,514,424 14,196,519 81.1%

jaz2_root_10_rep1 16,661,319 13,609,917 81.7%

jaz2_root_10_rep2 23,840,265 19,031,731 79.8%

jaz2_root_10_rep3 18,466,668 15,379,759 83.3%

jaz7_shoot_0_rep1 10,507,564 8,446,985 80.4%

jaz7_shoot_0_rep2 10,564,614 8,097,926 76.7%

jaz7_shoot_0_rep3 n.a. n.a. n.a.

jaz7_shoot_10_rep1 12,594,778 9,115,023 72.4%

jaz7_shoot_10_rep2 17,867,067 13,564,544 75.9%

jaz7_shoot_10_rep3 15,161,488 11,884,760 78.4%

jaz7_root_0_rep1 17,388,880 14,375,611 82.7%

jaz7_root_0_rep2 17,181,387 14,247,505 82.9%

jaz7_root_0_rep3 15,300,499 13,159,910 86.0%

jaz7_root_10_rep1 16,636,604 13,859,228 83.3%

jaz7_root_10_rep2 13,528,403 11,442,447 84.6%

jaz7_root_10_rep3 20,662,422 17,233,780 83.4%

Total 577,601,296 452,171,360 78.3%

Table 3-2. Gene isoform number in TAIR10 and determined by assembly of the MeJA RNA-Seq data.

isoform #; TAIR10 gene #; TAIR10 percentage; MeJA project gene #; MeJA project percentage

1 21402 78.67% 9201 67.42%

2 4251 15.63% 2998 21.97%

3 1133 4.17% 881 6.46%

4 291 1.07% 345 2.53%

5 89 0.33% 115 0.84%

6 26 0.10% 65 0.48%

7 7 0.03% 21 0.15%

8 5 0.02% 11 0.08%

9 1 0.00% 6 0.04%

10 1 0.00% 0 0.00%

11 0 0.00% 1 0.01%

12 0 0.00% 2 0.02%

13 0 0.00% 0 0.00%

14 0 0.00% 0 0.00%

15 0 0.00% 1 0.01%

Total 27206 100.00% 13647 100.00%
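
The share of multi-isoform genes implied by Table 3-2 can be recomputed directly from the counts. The sketch below is only an arithmetic check on the table; the dictionary and function names are illustrative and are not part of the original analysis pipeline. The MeJA count of genes with more than one isoform comes out at 4,446, the same figure given as the number of AS genes in the footnote to Tables 3-3 to 3-5.

```python
# Isoform-count distributions transcribed from Table 3-2
# (key: isoforms per gene, value: number of genes). Zero rows are omitted.
TAIR10 = {1: 21402, 2: 4251, 3: 1133, 4: 291, 5: 89,
          6: 26, 7: 7, 8: 5, 9: 1, 10: 1}
MEJA = {1: 9201, 2: 2998, 3: 881, 4: 345, 5: 115,
        6: 65, 7: 21, 8: 11, 9: 6, 11: 1, 12: 2, 15: 1}


def multi_isoform_summary(counts):
    """Return (total genes, genes with >1 isoform, percent multi-isoform)."""
    total = sum(counts.values())
    multi = sum(n for isoforms, n in counts.items() if isoforms > 1)
    return total, multi, round(100 * multi / total, 2)


if __name__ == "__main__":
    for label, counts in (("TAIR10", TAIR10), ("MeJA project", MEJA)):
        print(label, multi_isoform_summary(counts))
```

Under these counts, about one fifth of TAIR10 genes and about one third of the genes assembled from the MeJA data have more than one isoform.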

Table 3-3. Differential AS or expression of genes undergoing AS in treatment comparisons.

genotype; tissue; differential AS (splicing.diff, promoter.diff, cds.diff, total); %a; differential expression; %a

WT shoot 44 3 6 48 0.99 537 12.08

jaz2 shoot 32 5 8 37 0.72 586 13.18

jaz7 shoot 234 90 72 326 5.26 1400 31.49

WT root 49 25 15 75 1.10 886 19.93

jaz2 root 32 14 9 48 0.72 557 12.50

jaz7 root 43 22 13 67 0.97 1131 25.44

Table 3-4. Differential AS or expression of genes undergoing AS in tissue comparisons.

genotype; MeJA treatment; differential AS (splicing.diff, promoter.diff, cds.diff, total); %a; differential expression; %a

WT 0μM 159 86 41 249 3.58 1943 43.70

jaz2 0μM 162 75 37 243 3.64 1896 42.65

jaz7 0μM 246 136 88 389 5.53 2049 46.09

WT 10μM 267 165 84 429 6.00 2300 51.73

jaz2 10μM 140 84 53 230 3.15 1885 42.40

jaz7 10μM 259 111 68 372 5.83 2389 53.73

Table 3-5. Differential AS or expression of genes undergoing AS between mutant backgrounds.

tissue; MeJA treatment; differential AS (splicing.diff, promoter.diff, cds.diff, total); %a; differential expression; %a

shoot 0μM 84 6 25 90 2.05 380 8.55

root 0μM 40 14 10 53 0.90 718 16.15

shoot 10μM 205 58 77 231 4.61 1096 24.65

root 10μM 76 10 22 79 1.71 670 15.07

a Percentage was calculated by dividing total changes by a total number of AS genes (4,446) detected.
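
The percentage described in the footnote is a simple ratio against the 4,446 AS genes. The sketch below reproduces a few of the differential expression percentages from Tables 3-3 to 3-5; the constant and function names are illustrative and are not taken from the original analysis.

```python
AS_GENES_DETECTED = 4446  # denominator stated in the table footnote


def percent_of_as_genes(count, total=AS_GENES_DETECTED):
    """Express a gene count as a percentage of all detected AS genes."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    # Differential expression counts from Table 3-3 (WT shoot),
    # Table 3-4 (WT, 0 uM) and Table 3-5 (shoot, 0 uM).
    for count in (537, 1943, 380):
        print(count, percent_of_as_genes(count))
```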

CHAPTER 4 ORIGIN AND EVOLUTION OF THE TIFY PLANT-SPECIFIC MULTI-DOMAIN GENE FAMILY

Background

During the process of genome evolution new arrangements of domains are

formed, which provide additional resources for natural selection. Novel domain

arrangements which are beneficial to the species are likely to be fixed in the population

(Bornberg-Bauer and Albà 2013), and continuous change in domain arrangement is

one of the main forces shaping the evolution of species. More than 70% of eukaryotic

genes are estimated to code for multi-domain proteins (Han et al. 2007). However, our

understanding of the origin and evolution of multi-domain gene families is limited.

TIFY is an excellent example of a multi-domain gene family.

The TIFY family is a plant-specific gene family defined by a highly conserved

domain (TIFY), which is named after the core amino acid motif TIF[F/Y]XG (Vanholme

et al. 2007). The TIFY domain is about 36 amino acids long and forms a beta-beta-

alpha fold that mediates protein-protein interaction between TIFY proteins as well as

with other transcription factors (Vanholme et al. 2007; Chung et al. 2009). Proteins

within the TIFY family can be further divided into four subfamilies based on domain

architecture (Vanholme et al. 2007; Chung et al. 2009; Bai et al. 2011): 1) ZML

subfamily, with TIFY, CCT and GATA domains; 2) JAZ subfamily, with TIFY and Jas

domains; 3) PPD subfamily, with TIFY, PPD and Jas-like domains; and 4) TIFY

subfamily, with only the TIFY domain. TIFY is a plant-specific gene family and is thus

expected to be involved in plant-specific functions. JAZ subfamily proteins act as

transcriptional repressors in the first step of the jasmonate signaling pathway (Chini et al.

2016). The Arabidopsis PPD genes are responsible for lamina size and regulation of

leaf curvature (White 2006). The ZML genes are transcription factors involved in

responses to high-light (Shaikhali et al. 2012) and wound-induced lignification (Vélez-

Bermúdez et al. 2015). Understanding the origin and evolution of the TIFY plant-specific

multi-domain gene family would provide insight into the evolution of its domain

architecture and the relationship between its domain architecture and the plant specific

functions these genes are involved in.
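
As an illustration of the core motif TIF[F/Y]XG described above, the motif can be written as a regular expression and scanned against a protein sequence. This is a minimal sketch, not the identification method used in this chapter (which relies on profile HMMs); the function name and the toy sequences in the usage note are hypothetical.

```python
import re

# Core motif TIF[F/Y]XG as a regular expression; X (any residue) becomes ".".
TIFY_MOTIF = re.compile(r"TIF[FY].G")


def find_tify_motifs(protein_seq):
    """Return (1-based start, matched text) for each core-motif occurrence."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group()) for m in TIFY_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Made-up peptide used only to show the call.
    print(find_tify_motifs("MSSRTIFYGGKV"))
```

A strict pattern of this kind would miss diverged forms such as the T[L/I]S[F/V] variant reported for the ZML subfamily later in this chapter, which is one reason a profile-based search is the more sensitive choice.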

In this project I studied the evolution of the domains contained within the TIFY

gene family members, the evolutionary history of each TIFY subfamily, and the

domain gain/loss dynamics during gene family evolution. I further analyzed differences

of the TIFY domain among the four subfamilies and characterized AS events and their

conservation within the PPD subfamily.

Materials and Methods

Identification of Members in the TIFY Gene Family

I collected protein annotations from 76 plant species and generated a database

including only the primary protein sequences. For species with no primary protein

annotation, the longest protein isoform of each locus was used as the primary protein.

Profile HMM searches against this primary protein database with the Pfam profile HMM

database were conducted with HMMER v3.1b2 (Eddy 2011). The PPD, SRT and Jas-like

domains are not available in the Pfam database, so profile HMMs for them were

built from the domain alignments with HMMER v3.1b2 and used to

identify proteins containing them. The PPD and Jas-like domains were taken from

previous studies (White 2006; Thireault et al. 2015). The SRT domain was identified by

MEME in this study. For each domain, the identification criterion was an E-value ≤ 1.0E-5

for sequence prediction in the HMMER search. Proteins containing TIFY, CCT, ZML, Jas

(CCT_2 in Pfam), Jas-like, PPD, or SRT were kept for phylogenetic analysis. Clusters

that did not contain any protein with a TIFY domain were removed, as were proteins

with abnormal alignments. The remaining proteins were regarded as candidate

members in the TIFY family.
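
Two steps of the procedure above lend themselves to a short sketch: choosing the longest isoform per locus when no primary annotation exists, and keeping only HMMER hits at or below the E-value cutoff of 1.0E-5. The code assumes HMMER's whitespace-delimited `--tblout` table, in which the first column is the target name, the third is the query name, and the fifth is the full-sequence E-value. The function names and input layout are illustrative assumptions, not the scripts used in this study.

```python
def longest_isoform_per_locus(isoforms):
    """Pick the longest protein per locus.

    isoforms: iterable of (locus_id, isoform_id, protein_seq) tuples.
    Returns {locus_id: (isoform_id, protein_seq)}; ties keep the first seen.
    """
    best = {}
    for locus, isoform_id, seq in isoforms:
        if locus not in best or len(seq) > len(best[locus][1]):
            best[locus] = (isoform_id, seq)
    return best


def filter_tblout(lines, max_evalue=1.0e-5):
    """Return (target, query, evalue) for hits at or below the cutoff.

    lines: iterable of text lines from a HMMER --tblout file.
    Comment lines (starting with '#') and blank lines are skipped.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        evalue = float(fields[4])  # full-sequence E-value column
        if evalue <= max_evalue:
            hits.append((fields[0], fields[2], evalue))
    return hits
```

In practice the lines would come from reading a `--tblout` file, and the isoform tuples from parsing a protein FASTA file together with its locus annotation.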

Multiple Sequence Alignment and Phylogeny

The multiple sequence alignments of each subfamily (TIFY, PPD, ZML, JAZ)

were generated from full-length protein sequences using MUSCLE v3.8.31 (Edgar

2004) with default parameters. These alignments were used to infer maximum likelihood (ML)

phylogenetic trees with RAxML v8.2.3 (Stamatakis 2014) under the LG4X model (Le et al. 2012). For

each subfamily, I performed five tree searches and the tree with the highest likelihood

was selected as the ML tree. I ran 500 bootstrap replicates for the PPD and

TIFY subfamily datasets and 200 bootstrap replicates for the ZML and JAZ

subfamily datasets, under the LG4X model (Le et al. 2012) using RAxML v8.2.3

(Stamatakis 2014).

Domain Identification

The TIFY, CCT, GATA, and Jas domains in the TIFY family were identified with

HMMER v3.1b2 using the Pfam domain profile HMM database (TIFY - PF06200, CCT -

PF06203, GATA - PF00320, Jas - PF09425) (Finn et al. 2014). The PPD and Jas-like

domains were identified by HMMER v3.1b2 with domain profile HMMs generated from

alignments of domain annotations from previous studies (White 2006; Thireault et al.

2015). In addition, MEME v4.12.0 (Bailey et al. 2006) was used to search for other

domains present in addition to the TIFY domain in the TIFY subfamily. Domain

sequence logos were generated with WebLogo

(http://weblogo.berkeley.edu/logo.cgi).

Rate-Shift Analysis of the TIFY Domain

We compared amino acid substitution rate differences in the TIFY domain

between the four subfamilies ZML, JAZ, PPD and TIFY using an adapted rate shift

detecting method (Gaucher et al. 2011). First, the amino acid substitution rates in each

site of the TIFY domain for ZML, JAZ, PPD and TIFY subfamilies were estimated using

PAML v4.9a (Yang 2007) with Γ-distributed rate variation among sites under the LG

model (Le and Gascuel 2008). Second, the average amino acid substitution rate in

each site of the TIFY domain of the four subfamilies was calculated. Third, the amino

acid substitution rate of the TIFY domain of each subfamily was compared to the

average amino acid substitution rate at each site. Large positive and negative values in

the rate differences indicate rate shifts and identify variable (evolving) and conserved

sites in the corresponding subfamily relative to the whole family. A cutoff of 2.57 standard

deviations from the mean was used to detect sites with significant rate shifts.
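
The three steps above can be sketched as follows, starting from per-site rates that have already been estimated (in this study, with PAML). The sketch assumes that the mean and standard deviation are computed over the pooled subfamily-by-site rate differences; the text does not state how they were pooled, so this is an assumption. The function name is illustrative.

```python
from statistics import mean, pstdev


def rate_shift_sites(rates, cutoff=2.57):
    """Flag sites whose subfamily rate deviates strongly from the family average.

    rates: dict mapping subfamily name -> list of per-site substitution rates
           (all lists the same length, one entry per TIFY-domain site).
    Returns a list of (subfamily, 1-based site, "variable" or "conserved").
    """
    names = list(rates)
    n_sites = len(rates[names[0]])
    # Step 2: average rate at each site across the subfamilies.
    site_avg = [mean(rates[s][i] for s in names) for i in range(n_sites)]
    # Step 3: difference between each subfamily's rate and the site average.
    diffs = {s: [rates[s][i] - site_avg[i] for i in range(n_sites)] for s in names}
    # Assumption: the cutoff is applied to the pooled differences.
    pooled = [d for s in names for d in diffs[s]]
    mu, sd = mean(pooled), pstdev(pooled)
    shifted = []
    for s in names:
        for i, d in enumerate(diffs[s]):
            if sd > 0 and abs(d - mu) > cutoff * sd:
                shifted.append((s, i + 1, "variable" if d > 0 else "conserved"))
    return shifted
```

A positive difference marks a site that evolves faster in that subfamily than in the family as a whole, and a negative difference marks a site that is more conserved.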

Alternative Splicing Analysis

We identified AS in the PPD subfamily genes from 7 species (A. trichopoda, E.

guineensis, M. acuminata, A. thaliana, P. trichocarpa, V. vinifera, S. lycopersicum) with

a specific focus on the intron interrupting the Jas-like domain (Jas-like intron). Transcript

AS data in A. trichopoda, E. guineensis and M. acuminata was acquired from Mei et al.

(2017); transcript AS data in P. trichocarpa, V. vinifera, S. lycopersicum was extracted

from Chamala et al. (2015); and the AS data in A. thaliana was obtained from Araport11

annotation (Cheng et al. 2016). Among the 14 PPD genes in the 7 species, 7 genes

from 4 species (A. thaliana, P. trichocarpa, V. vinifera, S. lycopersicum) have evidence

of AS in the Jas-like intron. The primary

gene structure of the PPD genes from these four species was illustrated using Gene

Structure Display Server 2.0 (Hu et al. 2015) with the observed AS patterns in the Jas-

like intron indicated.

Results

Domain Identification and Evolutionary History

The TIFY family is defined by the presence of the TIFY domain. Four subfamilies

(ZML, JAZ, PPD, TIFY) with different domain arrangement have been identified (Chung

et al. 2009; Bai et al. 2011). We applied a thorough search against all documented

domains in the Pfam database in an attempt to detect the presence of other domains in

addition to the TIFY domain. Other than the domains identified in the ZML, JAZ, PPD

and TIFY subfamilies no new domains were identified in combination with the TIFY

domain. Of the four subfamilies in the TIFY family, the TIFY subfamily is the only one

defined by a single domain – the TIFY domain. To detect whether there are other

domains/motifs present in the TIFY subfamily, a domain search was conducted with

MEME, and a highly conserved domain/motif was detected. I named the

domain/motif SRT based on the most conserved three amino acids in the alignments

(Figure 4-1). Therefore, the TIFY subfamily, like the other three subfamilies, also

contains multiple domains. A total of seven domains were identified in the TIFY family: TIFY

(in all four subfamilies); CCT (in the ZML subfamily); GATA (in the ZML subfamily); Jas

(in the JAZ subfamily); Jas-like (in the PPD subfamily); PPD (in the PPD subfamily); and

SRT (in the TIFY subfamily) (Figure 4-1). The CCT and GATA domains have been

detected in all available genome sequences including algae, moss, fern, gymnosperm

and angiosperms (Figure 4-2). In comparison, the TIFY, Jas, Jas-like and SRT domains

were only observed in land plants (moss, fern and seed plants); and the PPD domain

was only identified in vascular plants (fern and seed plants) (Figure 4-2). Interestingly,

none of the eleven Poales species contains a PPD, Jas-like or SRT domain, which strongly

indicates domain loss. Almost all identified TIFY, Jas/Jas-like, SRT and PPD

domains in the genome are associated with TIFY family genes. The few exceptions

observed were the result of poor alignments and these instances were removed from

analysis. Thus, TIFY, Jas/Jas-like, PPD, SRT domains are likely TIFY-family-specific. In

comparison, only ~12.3% and ~13.8% of the CCT and GATA domains, respectively,

identified in the genome are in the TIFY family.

Gene Family Identification and Evolutionary History

A total of 1373 TIFY family proteins were identified from 76 plant species. Based

on phylogeny and domain arrangement the TIFY family is divided into four subfamilies –

JAZ, ZML, PPD and TIFY (Figure 4-3). The algae genomes do not contain any proteins

belonging to the TIFY family. JAZ, ZML and TIFY proteins containing all of the domains

that define their subfamily were present in moss, fern, gymnosperms and angiosperms. A JAZ

gene in moss, Phpat.005G044400, which contains the TIFY and Jas-like domains and

lacks the PPD domain, might be the ancestor of PPD genes. PPD proteins

containing all three domains were first identified in fern. The TIFY and PPD

subfamilies are the smallest, with an average of 1.5 and 1.7 genes per species,

respectively. Notably, none of the eleven Poales species examined contains PPD or TIFY

subfamily proteins; however, the PPD and TIFY subfamily proteins were identified in the

ancient monocot species duckweed, orchid, as well as in the palm and banana

lineages. This suggests PPD and TIFY subfamilies were lost within the common

ancestor of grasses during monocot evolution. The JAZ and ZML subfamilies are large

families with an average of 15.6 and 4.7 genes per genome, which suggests they went

through gene family expansion during evolution.

Phylogenetic analysis suggests that the ZML subfamily contains three subgroups

(named A, B, and C) and the JAZ subfamily contains five subgroups (named A, B, C, D

and E) (Figure 4-4). The ZML_A subgroup was rooted by one Amborella gene, and the

ZML_B and ZML_C subgroups together were rooted by one Amborella gene. In addition

to the Amborella branch, each of the three ZML subgroups roughly consists of a monocot

and a eudicot cluster. The five JAZ subgroups all contain gymnosperm genes.

Moreover, the JAZ_A subgroup contains fern genes and the JAZ_B, C, D, E subgroups

were rooted by four fern genes. Interestingly, similar to the loss of PPD and TIFY

subfamilies, the ZML_B subgroup and JAZ_C subgroup were also lost in Poales (Figure

4-5). The PPD and TIFY subfamilies did not undergo gene family expansion prior to the

divergence of the monocot and eudicot lineages.

Domain Dynamics during Evolution

We identified five sites of the TIFY domain with significantly changed amino acid

substitution rates among the four subfamilies (Figure 4-6). Relative to the entire

TIFY family, the first, third and thirty-second sites are more conserved in the PPD, TIFY

and ZML subfamilies, respectively; the twenty-fourth and thirty-sixth sites are more

variable in the PPD and JAZ subfamilies, respectively. Sites with rate shifts could

contribute to functional differences of the TIFY domain among subfamilies. The

ZML subfamily contains a diverged form of the TIFY motif, T[L/I]S[F/V],

whereas the other three subfamilies contain the TIFY pattern.

In order to track domain gain/loss dynamics in the TIFY multidomain gene family,

the phylogenetic tree describing the PPD, TIFY and ZML subfamilies was annotated

with the domain arrangements (Figure 4-7). I observed 22 instances (26.2%) where one

or more domains in PPD proteins were lost. These instances include 1 case of PPD

domain loss, 8 cases of TIFY domain loss, and 15 cases of Jas-like domain loss. I did

not observe any domain loss in the TIFY subfamily. 70 genes (23.7%) within the ZML

subfamily are missing one or more domains. Among these, 39 genes have lost the TIFY

domain, 9 genes have lost the CCT domain, and 32 genes have lost the GATA domain.
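
The domain-loss counts above can be tallied by comparing the domains detected in each protein with the full set expected for its subfamily. Because a protein may lack more than one domain, the per-domain counts can sum to more than the number of proteins with a loss (for example, 1 + 8 + 15 = 24 losses across 22 PPD proteins). The sketch below is illustrative: the expected domain sets follow the subfamily definitions in this chapter, subfamily membership is taken as given (it was assigned by phylogeny here), and the function name and input layout are assumptions.

```python
from collections import Counter

# Expected domain sets per subfamily, as described in this chapter.
EXPECTED_DOMAINS = {
    "PPD": {"PPD", "TIFY", "Jas-like"},
    "ZML": {"TIFY", "CCT", "GATA"},
    "TIFY": {"SRT", "TIFY"},
    "JAZ": {"TIFY", "Jas"},
}


def tally_domain_loss(subfamily, proteins):
    """Count proteins missing expected domains and which domains are missing.

    proteins: dict mapping protein ID -> set of domains detected in it.
    Returns (number of proteins missing >= 1 domain, {domain: times missing}).
    """
    expected = EXPECTED_DOMAINS[subfamily]
    per_domain = Counter()
    n_with_loss = 0
    for domains in proteins.values():
        missing = expected - set(domains)
        if missing:
            n_with_loss += 1
            per_domain.update(missing)
    return n_with_loss, dict(per_domain)
```

Dividing the first returned value by the number of proteins in the subfamily gives the proportion with domain loss.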

Alternative Splicing of Jas-like Intron in PPD Genes

In order to assess AS of the Jas-like intron in the PPD subfamily, AS data from 7

species were examined: one basal angiosperm – A. trichopoda; two monocots – E.

guineensis, M. acuminata; and four eudicots – A. thaliana (eurosid II), P. trichocarpa

(eurosid I), V. vinifera (eurosid), S. lycopersicum (asterid). Among the five PPD genes

from the two monocot species, only one gene (Achr2P08870 from M. acuminata)

contains the Jas-like domain and no AS of the Jas-like intron was observed. The single

PPD gene Amtr0002.597 from A. trichopoda contains the Jas-like domain, but there

was no evidence for AS in the Jas-like intron. All eight PPD genes from eudicot

species have the Jas-like domain, and evidence of AS in the Jas-like intron was

observed for seven of them (Figure 4-8). Of the seven AS events, one is an AltD event (AT4G14713

from A. thaliana), two are AltA events (Potri.002G048500 and Potri.005G214300 from P.

trichocarpa), and four are IntronR events (GSVIVT01018038001 and

GSVIVT01003113001 from V. vinifera, Solyc09g065630 and Solyc06g084120 from S.

lycopersicum). Six of the seven AS events introduce a stop codon immediately downstream

of the α-helix of the Jas-like domain and would lead to a truncated Jas-like domain that

lacks seven amino acids at the C-terminus.

Discussion

Evolutionary History of the TIFY Family

A total of 1373 TIFY family genes were identified from 76 plant species, including algae,

moss, fern, Amborella, monocots and eudicots. Based on phylogeny and domain

arrangement, the TIFY family can be divided into four subfamilies: JAZ, ZML, PPD and

TIFY. The evolutionary history of the TIFY family can be inferred from the domain and

family distribution within the TIFY phylogeny. No TIFY family genes were identified in

algae species, suggesting the TIFY family originated after the divergence of algae from

land plants, which is in agreement with previous observations (Bai et al. 2011). Among

the seven domains (TIFY, Jas, CCT, GATA, PPD, SRT, Jas-like) in the TIFY family,

only the CCT and GATA domains were found in algae species. Genes containing the

CCT domain and genes containing the GATA domain may have merged along with the

newly evolved TIFY domain to form the ZML gene in the early ancestor of land plants.

The CCT and GATA domains were also observed in other gene families; for example,

the B-box zinc finger family has members containing both the CCT and B-box domains

(Khanna et al. 2009). Sequence similarity of the Jas domain with the first half of the

CCT domain (Chung et al. 2009; Figure 4-1) suggests that the Jas domain likely derived

from the CCT domain. In addition, the Jas-like domain is likely a variant of the Jas

domain. Unlike the CCT and GATA domains, the TIFY, Jas/Jas-like, SRT and PPD

domains are restricted to the TIFY family and their origins coincide with the origin of

the four subfamilies. Distribution and phylogeny of subfamilies suggested that the JAZ,

ZML and TIFY subfamilies originated in land plants and the PPD subfamily originated in

vascular plants (Figure 4-5).

During evolution the JAZ and ZML subfamilies expanded. We identified five

subgroups within the JAZ subfamily and three subgroups within the ZML subfamily. The

expansion of the subgroups occurred at different times in the two subfamilies (Figure 4-5).

In moss, the JAZ subfamily experienced duplication, forming two groups: the JAZ_A

subgroup and the precursor of the JAZ_B/C/D/E subgroups. The latter experienced further

expansion, forming the JAZ_B, _C, _D, and _E subgroups after the divergence of ferns

and before the divergence of gymnosperms. However, the ZML subfamily did not

experience any subgroup expansion before the origin of gymnosperms. The first

expansion of the ZML subfamily occurred after the divergence of gymnosperms from other seed

plants and before the origin of angiosperms and resulted in the ZML_A and ZML_B/C

subgroups in Amborella. The ZML_B/C subgroup further duplicated and formed the

ZML_B and ZML_C subgroups after the divergence of Amborella, but before the

divergence of monocots and eudicots. The PPD and TIFY subfamilies did not undergo gene

family expansion, and each contains only a single subgroup (Figure 4-5).

Poales Experienced Many Gene Loss Events

Like gene family expansion, gene family loss also occurs during evolution. Bai et

al. (2011) observed loss of the PPD subfamily in O. sativa, B. distachyon, S. bicolor and

Z. mays and suggested that the PPD subfamily was lost from monocots. The increased

sampling in this study, which covers 76 plant species including ancient members of the monocots

(duckweed, orchid, palm and banana) and members of the grasses, places

the PPD subfamily loss more precisely in the common ancestor of all

Poales rather than the common ancestor of monocots as suggested by Bai et al. (2011)

(Figure 4-1).

Similar to the PPD subfamily, the TIFY subfamily was also lost in Poales (Figure

4-1). A previous study (Bai et al. 2011) identified two TIFY genes in O. sativa and Z.

mays based on the presence of the TIFY domain and absence of the other domains within

the protein sequences. Identifying subfamily members based only on

presence/absence of domains could be misleading, as genes containing only the TIFY

domain could be derived from a JAZ gene that has recently lost the Jas domain, a

ZML gene that has recently lost the CCT and GATA domains, or a PPD gene that has

lost the PPD and Jas-like domains. Rather than placing all TIFY family genes that did

not contain any other domains into the TIFY subfamily, I used phylogeny for

subfamily classification, which proved sensitive to cases of domain gain/loss during

evolution. In addition, I identified a new domain, SRT, that could facilitate the

classification of the TIFY subfamily.

Based on phylogeny, I found that the ZML_B and JAZ_C subgroups were also

lost within Poales. Thus, Poales have lost a large portion of the TIFY family

(PPD subfamily, TIFY subfamily, ZML_B subgroup, and JAZ_C subgroup), which

suggests that the roles played by these genes in other species are either lost or fulfilled

by a different set of genes within the Poales.

Domain Loss of TIFY Multidomain Family during Evolution

One interesting aspect of multidomain gene family evolution is the dynamics of

domain gain/loss (Stolzer et al. 2015). In this project, I explored domain dynamics within

the ZML, PPD and TIFY subfamilies. The TIFY subfamily represents a case in which

domain gain/loss is restricted, as all of its members contain an SRT and a

TIFY domain. The restricted domain dynamics suggest that it is important for both

domains to be present for TIFY subfamily gene function. Genes which have lost the

SRT or TIFY domain could be subjected to strong purifying selection and quickly

removed from the genome.

Approximately 25% of the ZML and PPD subfamily genes have lost one or more

domains, suggesting domain loss is a frequent event in the two subfamilies. PPD genes

typically contain an N-terminal PPD domain, a TIFY domain and a C-terminal Jas-like

domain (White 2006; Bai et al. 2011; Zhang et al. 2012). Variants in domain

arrangement of the PPD subfamily include: absence of the PPD domain, absence of the

TIFY domain, absence of the Jas-like domain, lack of both PPD and Jas-like domains,

and lack of both TIFY and Jas-like domains (Figure 4-7). Arabidopsis PPD genes PPD1

and PPD2 are regulators of lamina size and leaf blade curvature (White 2006). Further

exploration of domain function would help to understand the function of PPD genes

lacking one or more domains.

The ZML subfamily contains three domains: TIFY, CCT and GATA (Bai et al.

2011). I identified five patterns of domain loss in the ZML subfamily: loss of TIFY, loss of

CCT, loss of GATA, loss of both CCT and GATA, and loss of both TIFY and GATA

(Figure 4-7). The Arabidopsis ZML1 and ZML2 genes are transcriptional activators of

photoprotective responses by interacting with the CryR1 cis-element in the promoter of

high-light responsive genes (Shaikhali et al. 2012). The maize ZML2 gene regulates

wound-induced lignification by acting as a transcriptional repressor which binds in the

form of MYB/ZML complex with the GAT(A/C) and AC-rich cis-elements of lignin genes

(Vélez-Bermúdez et al. 2015). The TIFY domain of the JAZ subfamily functions in

protein-protein interaction (Chini et al. 2009; Pauwels et al. 2010), which might be a

similar case for the TIFY domain in the ZML subfamily. If that were the case, ZML

genes without the TIFY domain may have lost the ability to interact with other proteins,

such as with other ZMLs or MYB. The CCT domain was predicted to contain a nuclear

localization signal (Nishii et al. 2000). Losing the CCT domain may render the ZML

protein unable to enter the nucleus and affect its ability to function as a transcription

factor. However, if an interacting protein contains the nuclear localization signal, a ZML

protein without the CCT domain may still be properly targeted. The GATA domain is a DNA-

binding domain that recognizes a specific cis-element (Nishii et al. 2000; Teakle et al.

2002). Loss of the GATA domain may lead to a ZML protein that is unable to bind DNA

sequence and thus fails to activate or repress downstream gene targets. If a ZML

protein that has lost a GATA domain is still able to interact with other proteins it may

compete with normal ZML proteins.
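The domain-loss patterns described above can be expressed as a comparison between the canonical domain set of a subfamily and the domains detected in each gene. The following is a minimal sketch under stated assumptions: the canonical domain sets come from the text, while the gene names, their annotations, and the helper names (`lost_domains`, `loss_fraction`) are hypothetical and are not the pipeline used in this study.

```python
# Canonical domain sets as described in the text.
CANONICAL = {
    "ZML": ("TIFY", "CCT", "GATA"),
    "PPD": ("PPD", "TIFY", "Jas-like"),
    "TIFY": ("SRT", "TIFY"),
}

def lost_domains(subfamily, observed):
    """Return canonical domains of the subfamily that are missing from a gene."""
    present = set(observed)
    return [d for d in CANONICAL[subfamily] if d not in present]

def loss_fraction(genes):
    """Fraction of genes lacking at least one canonical domain."""
    with_loss = sum(1 for subfam, doms in genes.values()
                    if lost_domains(subfam, doms))
    return with_loss / len(genes)

# Hypothetical genes: phylogeny assigns the subfamily, a domain scan the domains.
example_genes = {
    "zml_gene_1": ("ZML", ["TIFY", "CCT", "GATA"]),
    "zml_gene_2": ("ZML", ["TIFY"]),             # lost CCT and GATA
    "ppd_gene_1": ("PPD", ["PPD", "TIFY"]),      # lost Jas-like
    "tify_gene_1": ("TIFY", ["SRT", "TIFY"]),
}

for name, (subfam, doms) in example_genes.items():
    print(name, subfam, "lost:", lost_domains(subfam, doms) or "none")
print("fraction with loss:", loss_fraction(example_genes))
```

The subfamily label is supplied separately from the domain list because, as argued above, subfamily membership is assigned by phylogeny rather than by domain content.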

Alternative Splicing of Jas-like Intron of PPD Genes

Based on sequence similarity the Jas-like domain is likely a variant of the Jas

domain, which in turn may have originated from the CCT domain (Chung et al. 2009).

JAZ proteins are transcriptional repressors which could bind with transcription factors to

repress their function (Thines et al. 2007; Chini et al. 2007; Chini et al. 2016). The Jas

domain of JAZ proteins is split by an intron (the Jas intron) into two parts: a 20 amino

acid N-terminal motif and a 7 amino acid C-terminal motif (X5PY) (Figure 4-8; Chung et

al. 2010). AS in the intron of the Jas domain of JAZ genes plays an important functional

role in the jasmonate signaling pathway (Yan et al. 2007; Chung and Howe 2009; Chung et

al. 2010; Moreno et al. 2013). Moreover, this AS event is conserved among monocots

and eudicots (Chung et al. 2010). AS around the Jas intron usually introduces a stop codon

downstream of the 20 amino acid N-terminal motif and leads to a truncated protein lacking the 7

amino acid C-terminal motif X5PY (Yan et al. 2007; Chung and Howe 2009; Chung et al. 2010). The

Jas domain of JAZ proteins has two main functions: 1) JA-Ile mediated interaction with

COI1 that leads to polyubiquitination of the JAZ proteins by ubiquitin ligase followed by

degradation by the 26S proteasome (Bryan et al. 2007; Chini et al. 2007; Katsir et al.

2008); 2) interaction with transcription factors to suppress their functions (Fernández-

Calvo et al. 2011; Qi et al. 2011; Song et al. 2013; Fonseca et al. 2014; Thatcher et al.

2016). The truncated JAZ proteins lacking the X5PY motif within the Jas domain retain

their ability to interact with transcription factors but their recognition by COI1 is absent or

compromised (Chung and Howe 2009; Chung et al. 2010; Moreno et al. 2013; Zhang et

al. 2017a). Absence of COI1 binding would prevent their degradation, resulting in an

abundance of the transcriptional repressor. Thus, the AS isoforms of the JAZ proteins

may function as permanent repressors.
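The effect of Jas-intron AS on the protein product can be illustrated with a toy translation. The sketch below assumes a retained intron that carries an in-frame stop codon; all sequences are invented for illustration and are not real JAZ sequences, and the helper names (`translate`, `ends_with_x5py`) are hypothetical. For simplicity the X5PY motif is placed at the very end of the toy protein.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate from the first base and stop at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def ends_with_x5py(protein):
    """True if the protein ends with five arbitrary residues followed by PY."""
    return re.search(r".{5}PY$", protein) is not None

# Invented toy sequences.
exon_upstream = "ATGAAACGTGCT"                # encodes M K R A
retained_intron = "GTATGACTTGCAG"             # GTA TGA ...: in-frame stop codon
exon_downstream = "GCTGGTTCTGGTGCTCCTTATTAA"  # encodes A G S G A P Y, then stop

spliced = translate(exon_upstream + exon_downstream)
retained = translate(exon_upstream + retained_intron + exon_downstream)
print(spliced, ends_with_x5py(spliced))
print(retained, ends_with_x5py(retained))
```

In this toy case the fully spliced transcript keeps the C-terminal X5PY motif, whereas the intron-retaining transcript terminates early and loses it, which mirrors the truncation described in the text.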

To detect whether there are similar AS events in the Jas-like domain of PPD

genes, and to investigate whether those AS events are conserved among species, I

collected gene structures and AS data from 7 species including Amborella, two

monocots and four eudicots. Similar to the Jas domain, the Jas-like domain in PPD

genes is split by an intron (the Jas-like intron) into a 20 N-terminal motif and a 7 C-

terminal motif. I observed conserved AS in the Jas-like intron of PPD genes in the

four eudicot species (Figure 4-8). Similar to the AS event in the intron within the Jas

domain within JAZ genes, an AS event in the intron within the Jas-like domain of the

PPD genes frequently (6/7) causes a stop codon leading to a truncated protein lacking

the 7 C-terminal motif. However, I noticed significant differences between the 7 C-

terminal motif of the Jas domain and the Jas-like domain (Figure 4-8). In the Jas

domain, the 7 C-terminal motif contains two highly conserved amino acids, PY, at the C-

terminus (Katsir et al. 2008). In the Jas-like domain, by contrast, the 7 C-terminal motif contains a

conserved basic amino acid (R or K) in the second site and is enriched in basic amino

acids from the 4th to the 7th site (Figure 4-8). Thus the functional lessons we learned

from AS in the Jas domain may not directly apply to AS in the Jas-like domain.

However, it could provide two directions for future functional analysis. The first is to

investigate whether the Jas-like domain could interact with COI1 protein, and if so,

whether the loss of the 7 C-terminal motif affects this interaction. The second is to investigate whether the

Jas-like domain could interact with transcription factors and, if so, whether the loss of the 7 C-

terminal motif affects this interaction. Additionally, nuclear localization sequences are

usually enriched in basic amino acids (Raikhel 1992). The 16th to 27th sites of the Jas-

like domain are enriched with basic amino acids (Figure 4-1), and it will be interesting to

see whether the Jas-like domain carries a nuclear localization signal and whether AS

could interrupt this signal.
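One simple way to screen a domain sequence for stretches enriched in basic residues is a sliding window over lysine (K) and arginine (R) content. The sketch below is only an illustration: the toy sequence, the window width of 12 (chosen to match the span of sites 16 to 27), and the threshold are assumptions, and such a screen does not replace a dedicated nuclear localization signal predictor or experimental validation.

```python
def basic_fraction(seq):
    """Fraction of residues in seq that are lysine (K) or arginine (R)."""
    return sum(aa in "KR" for aa in seq) / len(seq)

def basic_windows(seq, width=12, threshold=0.4):
    """Return (1-based start, end, fraction) for windows at or above threshold."""
    hits = []
    for start in range(len(seq) - width + 1):
        frac = basic_fraction(seq[start:start + width])
        if frac >= threshold:
            hits.append((start + 1, start + width, frac))
    return hits

# Invented 27-residue toy sequence: sites 1-15 lack K/R, sites 16-27 are K/R-rich.
toy_domain = "ADESLPTGVQNFSEL" + "KRAKRSKKRLRK"

for start, end, frac in basic_windows(toy_domain, width=12, threshold=0.7):
    print(f"sites {start}-{end}: {frac:.2f} basic")
```

Running the same function on a truncated AS isoform sequence would show whether the enriched window survives the truncation.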

Figure 4-1. Logos of the domains in the TIFY family. Available secondary structure

information and predicted functional regions were indicated.

Figure 4-2. Domain distributions across 76 plant species. Absence of a domain is

indicated by grey-tone shading. The solid black triangles indicate that the identified domains were limited to the TIFY family. The black-white triangles indicate that only some of the identified domains belong to the TIFY family.

Geologic timescale axis: Time (Mya); ticks at 0, 300, 600, 900, 1660.

No. Species Common name TIFY Jas CCT GATA PPD SRT Jas-like

1 C. paradoxa 0/0 0/0 0/1 0/3 0/0 0/0 0/-

2 C. merolae (Strain 10D) 0/0 0/0 0/3 0/6 0/0 0/0 0/-

3 P. purpureum CCMP1328 0/0 0/0 0/4 0/2 0/0 0/0 0/-

4 V. carteri 0/0 0/0 0/5 0/8 0/0 0/0 0/-

5 C. reinhardtii 0/0 0/0 0/8 0/12 0/0 0/0 0/-

6 C. subellipsoidea C-169 0/0 0/0 0/3 0/6 0/0 0/0 0/-

7 C. variabilis NC64A 0/0 0/0 0/4 0/6 0/0 0/0 0/-

8 B. prasinos 0/0 0/0 0/5 0/7 0/0 0/0 0/-

9 M. pusilla CCMP1545 0/0 0/0 0/3 0/9 0/0 0/0 0/-

10 M. pusilla RCC299 0/0 0/0 0/4 0/6 0/0 0/0 0/-

11 O. lucimarinus 0/0 0/0 0/5 0/8 0/0 0/0 0/-

12 O. sp. RCC809 0/0 0/0 0/3 0/5 0/0 0/0 0/-

13 O. tauri 0/0 0/0 0/4 0/3 0/0 0/0 0/-

14 P. patens moss 16/16 6/6 4/30 4/14 0/0 2/2 1/-

15 S. moellendorffii fern 11/12 2/3 2/13 2/8 3/3 1/1 3/-

16 G. biloba common ginkgo 17/17 8/8 3/19 2/22 1/1 3/3 1/-

17 P. abies Norway spruce 50/56 26/27 1/16 0/13 2/2 3/3 1/-

18 P. taeda loblolly pine 67/68 24/25 1/15 0/14 2/2 2/3 2/-

19 A. trichopoda 10/10 4/4 2/17 2/19 1/1 1/1 1/-

20 S. polyrhiza duckweed 15/15 9/9 3/25 3/20 1/1 1/1 1/-

21 P. equestris orchid 16/16 14/15 4/23 3/21 1/1 1/1 0/-

22 P. dactylifera date palm 16/16 14/15 6/31 5/17 0/0 1/1 0/-

23 E. guineensis African oil palm 14/15 12/12 6/38 4/30 2/2 0/1 0/-

24 M. acuminata banana 49/50 31/32 6/71 6/53 3/3 3/2 1/-

25 M. balbisiana wild banana 31/36 22/26 4/52 3/38 1/2 3/3 1/-

26 P. virgatum switchgrass 42/42 31/31 6/66 6/57 0/0 0/0 0/-

27 P. hallii Hall's panicgrass 21/21 19/19 3/35 3/29 0/0 0/0 0/-

28 S. italica foxtail millet 19/19 17/17 3/36 3/28 0/0 0/0 0/-

29 S. bicolor sorghum 21/21 20/20 3/36 3/30 0/0 0/0 0/-

30 Z. mays maize 32/34 30/34 3/52 3/41 0/0 0/0 0/-

31 O. sativa rice 17/18 17/18 4/40 4/25 0/0 0/0 0/-

32 P. heterocycla moso bamboo 19/20 23/23 6/44 5/29 0/0 0/0 0/-

33 B. distachyon purple false brome 18/18 17/17 5/36 5/28 0/0 0/0 0/-

34 H. vulgare barley 12/12 11/11 4/26 4/17 0/0 0/0 0/-

35 T. aestivum bread wheat 33/33 31/32 12/79 9/52 0/0 0/0 0/-

36 T. urartu wheat A genome progenitor 11/11 10/10 4/31 4/14 0/0 0/0 0/-

37 A. coerulea Colorado blue columbine 12/13 4/4 4/22 5/26 1/1 1/1 1/-

38 N. nucifera sacred lotus 18/18 11/12 5/38 5/33 2/2 2/2 2/-

39 B. vulgaris sugar beet 12/12 6/6 4/22 4/16 1/1 1/1 0/-

40 A. chinensis kiwifruit 7/9 8/8 8/46 7/39 0/0 0/0 0/-

41 U. gibba humped bladderwort 19/21 9/9 3/36 2/28 2/2 2/2 1/-

42 M. guttatus monkeyflower 15/15 8/8 4/32 4/26 2/2 1/1 2/-

43 N. benthamiana tobacco 25/26 15/15 6/46 6/52 3/3 2/2 2/-

44 C. annuum pepper 16/16 8/8 4/26 3/27 2/2 1/1 2/-

45 S. lycopersicum tomato 19/20 12/12 4/30 4/30 2/2 1/1 2/-

46 S. tuberosum potato 20/20 11/11 3/30 3/30 2/2 1/1 2/-

47 V. vinifera grapevine 19/19 10/10 4/26 4/19 2/2 1/1 2/-

48 E. grandis flooded gum 19/19 12/12 4/27 3/22 1/1 1/1 1/-

49 C. sinensis orange 14/14 7/8 4/28 4/22 2/2 1/1 2/-

50 G. raimondii cotton 28/28 15/15 8/57 8/46 3/3 1/1 3/-

51 T. cacao cacao tree 17/17 9/9 5/30 5/24 2/2 1/1 2/-

52 C. papaya papaya 13/13 6/6 4/25 2/22 1/1 1/1 1/-

53 B. rapa field mustard 35/35 21/23 5/65 5/60 2/2 2/2 2/-

54 E. salsugineum salt cress 14/14 8/9 3/34 3/30 1/1 1/1 1/-

55 A. thaliana 18/18 10/12 3/40 3/30 2/2 1/1 2/-

56 C. grandiflora 17/17 9/10 3/37 3/26 2/2 1/1 2/-

57 B. stricta Drummond's rockcress 17/17 9/10 4/38 3/30 2/2 1/1 1/-

58 C. sativus cucumber 15/15 9/9 4/30 4/25 1/1 2/2 1/-

59 C. lanatus watermelon 15/15 9/10 4/26 4/21 1/1 2/2 1/-

60 M. domestica apple 29/32 18/19 5/49 3/35 2/2 3/3 2/-

61 P. bretschneideri Chinese white pear 23/26 12/12 6/44 6/30 2/3 2/2 2/-

62 P. persica peach 16/16 8/8 5/27 5/20 1/1 1/1 1/-

63 P. mume mei 14/14 6/6 5/26 5/20 1/1 1/1 1/-

64 F. vesca woodland strawberry 11/13 5/5 3/25 3/19 0/1 1/1 0/-

65 G. max soybean 35/37 20/21 9/68 9/59 2/2 2/3 2/-

66 P. vulgaris common bean 18/18 9/9 6/38 6/32 1/1 2/2 1/-

67 C. cajan pigeon pea 17/17 11/11 4/33 4/33 1/1 1/1 1/-

68 M. truncatula barrel medic 20/20 12/13 4/36 4/41 1/1 1/1 1/-

69 C. arietinum chickpea 16/16 9/9 5/35 5/27 1/1 1/1 1/-

70 L. japonicus birdsfoot trefoil 11/11 4/4 4/27 3/19 1/1 1/1 1/-

71 R. communis castor bean 14/14 9/9 4/28 4/19 2/2 1/1 2/-

72 M. esculenta cassava 26/26 15/15 7/46 7/35 3/3 1/2 2/-

73 J. curcas physic nut 11/12 7/7 5/30 5/27 3/3 1/1 0/-

74 L. usitatissimum flax 23/24 13/14 4/47 3/33 2/2 2/2 2/-

75 P. trichocarpa poplar 22/22 12/12 8/47 7/39 2/2 2/2 2/-

76 S. purpurea willow 21/21 11/11 7/44 7/39 2/2 2/2 2/-

Total 1,288/1,326 805/835 286/2,324 263/1,911 83/86 74/77 70/-

Each entry gives # domains of the TIFY superfamily / # domains of the genome. Clade labels in the figure: Poales, Asterids, Eurosids II, Eurosids I.

Figure 4-3. Distribution of ZML, JAZ, PPD and TIFY subfamilies in plant species.

Absence of a subfamily is indicated by shading.

Geologic timescale axis: Time (Mya); ticks at 0, 300, 544.

No. Species Common name JAZ ZML PPD TIFY

14 P. patens moss 10 4 1 2

15 S. moellendorffii fern 5 2 3 1

16 G. biloba common ginkgo 11 3 1 3

17 P. abies Norway spruce 51 1 2 3

18 P. taeda loblolly pine 63 1 2 2

19 A. trichopoda 6 2 1 1

20 S. polyrhiza duckweed 11 3 1 1

21 P. equestris orchid 12 4 1 1

22 P. dactylifera date palm 10 7 0 1

23 E. guineensis African oil palm 9 6 2 0

24 M. acuminata banana 37 8 3 3

25 M. balbisiana wild banana 28 5 1 3

26 P. virgatum switchgrass 36 6 0 0

27 P. hallii Hall's panicgrass 18 3 0 0

28 S. italica foxtail millet 16 3 0 0

29 S. bicolor sorghum 18 3 0 0

30 Z. mays maize 31 3 0 0

31 O. sativa rice 14 4 0 0

32 P. heterocycla moso bamboo 18 6 0 0

33 B. distachyon purple false brome 15 5 0 0

34 H. vulgare barley 11 4 0 0

35 T. aestivum bread wheat 27 12 0 0

36 T. urartu wheat A genome progenitor 10 4 0 0

37 A. coerulea Colorado blue columbine 4 6 1 1

38 N. nucifera sacred lotus 11 5 2 2

39 B. vulgaris sugar beet 6 4 1 1

40 A. chinensis kiwifruit 7 8 0 0

41 U. gibba humped bladderwort 12 3 2 2

42 M. guttatus monkeyflower 9 4 2 1

43 N. benthamiana tobacco 17 6 3 2

44 C. annuum pepper 9 4 2 1

45 S. lycopersicum tomato 12 4 2 1

46 S. tuberosum potato 14 3 2 1

47 V. vinifera grapevine 11 5 2 1

48 E. grandis flooded gum 12 5 1 1

49 C. sinensis orange 7 4 2 1

50 G. raimondii cotton 16 8 3 1

51 T. cacao cacao tree 9 5 2 1

52 C. papaya papaya 8 4 1 1

53 B. rapa field mustard 26 5 2 2

54 E. salsugineum salt cress 9 3 1 1

55 A. thaliana 12 3 2 1

56 C. grandiflora 11 3 2 1

57 B. stricta Drummond's rockcress 11 4 2 1

58 C. sativus cucumber 9 4 1 2

59 C. lanatus watermelon 9 4 1 2

60 M. domestica apple 21 5 2 3

61 P. bretschneideri Chinese white pear 13 6 2 2

62 P. persica peach 9 5 1 1

63 P. mume mei 8 5 1 1

64 F. vesca woodland strawberry 7 3 0 1

65 G. max soybean 22 9 2 2

66 P. vulgaris common bean 10 6 1 2

67 C. cajan pigeon pea 12 5 1 1

68 M. truncatula barrel medic 14 4 1 1

69 C. arietinum chickpea 9 5 1 1

70 L. japonicus birdsfoot trefoil 5 4 1 1

71 R. communis castor bean 9 4 2 1

72 M. esculenta cassava 15 7 3 1

73 J. curcas physic nut 8 5 4 1

74 L. usitatissimum flax 15 4 2 2

75 P. trichocarpa poplar 13 8 2 2

76 S. purpurea willow 12 7 2 2

Total 920 295 85 74

Columns are subfamilies of the TIFY superfamily. Clade labels in the figure: Poales, Asterids, Eurosids II, Eurosids I.

Figure 4-4. ML tree of ZML, JAZ, PPD and TIFY subfamilies. Yellow, pink, purple, blue,

green and red indicate proteins from moss, fern, gymnosperms, A. trichopoda, monocots, and eudicots, respectively.

Panels: PPD tree, TIFY tree, ZML tree, JAZ tree. Scale bars: 0.4. Subgroup labels: A to E and A to C.

Figure 4-5. Estimated evolutionary history of the four subfamilies of the TIFY family.

Domain symbols with dotted edges indicate the origin of the domain. Red crosses indicate absence of the subfamily or subgroup. Red arrows indicate expansion or loss events of the subfamily or subgroup.

Figure 4-6. Rate-shift sites in the TIFY domain across the four subfamilies. A) Amino

acid substitution rate differences comparing each subfamily with the average of the four subfamilies. B) Sites with significantly shifted rates in each subfamily. Slow or fast sites indicate sites with significantly low or high amino acid substitution rates in the indicated subfamily compared with the whole family. C) TIFY domain logo in each of the four subfamilies with rate-shifted sites indicated. Green triangles indicate slow sites and orange triangles indicate fast sites. Blue asterisks indicate the conserved motif TIFY. Yellow arrows indicate the predicted β-sheet; and orange cylinder indicates the predicted α-helix (Chung et al. 2009).

Figure 4-7. Domain dynamics of the PPD, TIFY and ZML subfamilies. Gray squares

indicate the presence of the domains shown in the bottom of each tree. Black squares indicate absence of the indicated domain. In the ML tree, yellow, pink, purple, blue, green and red indicate proteins from moss, fern, gymnosperms, A. trichopoda, monocots, and eudicots, respectively.

Panels: PPD (domains PPD, TIFY, Jas-like); TIFY (domains SRT, TIFY); ZML, continued across three panels (domains TIFY, CCT, GATA).

Figure 4-8. AS in the Jas-like intron of the PPD genes. Comparison of the Jas domain

in JAZ proteins (A) and the Jas-like domain in PPD proteins (B) with the intron position indicated. C) Gene structure of PPD genes from four eudicot species. Homologous exons are connected by gray bars. D) Protein product prediction of the AS isoforms. Three-letter codons are separated by dots. Blue indicates sequences in exon 7 and orange indicates alternatively spliced sequences in the Jas-like intron as shown in C. Black rectangles indicate stop codons.

CHAPTER 5 CONCLUSIONS AND PERSPECTIVES

During plant evolution, a comprehensive signal interpretation and transduction

network was developed for precise and fast response to biotic and abiotic stresses.

Transcription factors and transcription coregulators play important roles in this network.

The 3R-MYB and TIFY gene families are transcription factors/coregulators with

members that play significant roles in abiotic/biotic stress responses. In Chapters 2 and 4

I investigated the expansion of the two families and examined features from molecular

evolution of the two families which may contribute to adaptation to the environment.

Within the TIFY family, JAZ proteins are important repressors functioning in the first

step of the jasmonate signaling pathway. AS of JAZ proteins plays an important role in

balancing jasmonate signaling. However, our understanding of AS regulation in the

jasmonate pathway is limited. In Chapter 3, I explored various AS and AS-related

regulation of Arabidopsis in response to increased jasmonate.

Evolution and Function of the Plant 3R-MYBs

There are three groups of 3R-MYB genes in angiosperms: A-, B-, and C-groups.

In Arabidopsis and tobacco, A- and B-groups were involved in regulating the G2/M

transition during cell cycle (Araki et al. 2004; Haga et al. 2007; Kato et al. 2009; Haga et

al. 2011; Araki et al. 2012; Araki et al. 2013) while the C-group gene in rice functions in

both upregulating cell duplication and increasing stress resistance (Dai et al. 2007; Ma

et al. 2009). The analysis of predicted MSA promoter number in A-, B-, C-group and

randomly sampled genes implied involvement of the three groups in cell cycle

regulation, and suggested that the mechanism of cell cycle regulation by C-group genes

might be different from that of the A- and B-groups. Based on the expression profiles of

3R-MYB genes from ten plant species under various abiotic stresses, 3R-MYB genes,

especially C-group 3R-MYBs, may positively regulate stress responses.

In Arabidopsis and tobacco, A-group 3R-MYBs function as activators while B-

group 3R-MYBs function as repressors of G2/M transition. The competition between the

activator and repressor regulates the transition of G2/M. However, this mechanism, if

valid, is not universal for all angiosperms – certainly not for grasses where no B-group

3R-MYBs were identified. Based on phylogeny, B-group 3R-MYBs were likely lost in the

common ancestor of grasses. Moreover, a motif in A-group 3R-MYBs that plays a

repressor role has diverged in grasses relative to other species, suggesting that A-

group 3R-MYBs in grasses may play diverged functional roles. Taken together, the

mechanism of G2/M transition in grasses, which includes many important crop species,

is likely different from that of the eudicot species Arabidopsis and tobacco. Further

investigation of the G2/M transition mechanism in grasses is needed.

Phylogenetic analysis predicts that the expansions of the 3R-MYBs occurred

before the divergence of Amborella from other angiosperm species. More ancient

species, including algae, moss, and likely gymnosperms, only contain one group of 3R-

MYBs. Thus, the regulatory role of 3R-MYBs in the cell cycle in these species would be

different from that in angiosperms. However, our understanding of 3R-MYB function in

these ancient species is lacking. It will be interesting to compare regulation of cell cycle

in these ancient species with the single group of 3R-MYBs, and in modern species with

three (or two in grasses) groups of 3R-MYBs with divergent functions. The involvement of C-group

3R-MYB genes in both the cell cycle and abiotic stress responses could be a newly evolved function

that contributes to better adaptation of plants to the environment.

AS of the two Arabidopsis A-group genes can generate isoforms lacking a

repression motif in the C-terminus leading to a hyperactive isoform. Similar cases were

observed in A-group genes in grape and Amborella. Thus, AS may play an important

role in regulating functional activity of A-group 3R-MYBs.

Origin and Evolution of the Multidomain TIFY Family

The plant-specific multidomain TIFY gene family can be divided into four

subfamilies: ZML, JAZ, PPD, and TIFY. The division of subfamilies based only on domain

arrangement could be misleading as domain rearrangement happens during evolution,

which is supported by observed domain loss in members of the ZML and PPD subfamilies. I

suggest the importance of phylogeny in subfamily division of the TIFY family. Available data

did not reveal any evidence for domain loss in the TIFY subfamily, which suggests that both

domains are important for proper gene function. The JAZ and ZML subfamilies have

undergone gene family expansion during evolution. The JAZ genes are the repressors

in the first step of the jasmonate signaling pathway. The expansion of the JAZ family

during plant evolution may have contributed to the complexity of the jasmonate signaling

pathway. Interestingly, I observed AS in the Jas-like intron of PPD genes in various

eudicot species. The observed AS in the intron of the Jas-like domain of PPD genes is

similar to the AS in the intron of the Jas domain of JAZ genes, which frequently introduces

a premature stop codon and yields a truncated protein that lacks part of the

Jas/Jas-like domain. AS in the Jas intron of JAZ genes could generate functional

proteins with divergent functions compared with that of the primary isoform. It will be

interesting to further analyze whether AS isoforms of the PPD genes are translated, and

if so, to determine the function of the translated protein.

AS Regulation of Arabidopsis under Jasmonate Treatment

In this project, I explored AS and AS-related regulation of Arabidopsis jaz2 and

jaz7 mutants in response to jasmonate treatment with transcriptome and proteome data.

Specifically, I focused on 1) AS regulation – a change in the proportion of AS isoforms

from genes in response to jasmonate treatment; 2) genes that undergo differential AS

and produce isoforms with potential miRNA target sites; and 3) genes that undergo AS

to produce splice variants with novel functions. At each level of examination I identified

a pool of candidate genes, and a few interesting cases were further explored. NUDX9

and NRT1.8 were identified to have significantly changed isoform proportions in

response to MeJA, which suggests that the jasmonate signaling pathway regulates AS of

these genes. SMZ, AAO2 and At3g02740 are jasmonate responsive genes with

predicted miRNA binding sites subjected to AS regulation. AS of bHLH160, FYF and

FLM has the potential to generate an activator and a repressor from the same gene

through different arrangements of domain structures – similar to the AS regulation of

JAZ repressors. Proteomics data validated protein level expression of the AS isoform

for nine genes. Jasmonate signaling and splicing regulation may communicate through

shared transcription factors, splicing factors and ubiquitin pathways. Splicing factors

usually have decreased expression in response to MeJA, and a few candidate splicing

factors involved in jasmonate responses were identified. Taken in aggregate, these

findings expand our understanding of AS regulation in the jasmonate signaling pathway

and provide candidate genes which may play critical roles in AS regulation of

jasmonate responses.
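The notion of AS regulation used here, a change in the proportion of AS isoforms of a gene in response to treatment, can be made concrete with a small calculation. The sketch below is illustrative only: the isoform names and expression values are invented, the helper names are hypothetical, and no statistical test is applied, so it does not reproduce the analysis of Chapter 3.

```python
def isoform_proportions(expression):
    """Convert per-isoform expression values of one gene into proportions."""
    total = sum(expression.values())
    return {iso: value / total for iso, value in expression.items()}

def proportion_shift(control, treated):
    """Change in isoform proportion (treated minus control) for each isoform."""
    p_control = isoform_proportions(control)
    p_treated = isoform_proportions(treated)
    return {iso: p_treated[iso] - p_control[iso] for iso in control}

# Invented expression values for one gene with two isoforms.
control = {"isoform_1": 80.0, "isoform_2": 20.0}
meja = {"isoform_1": 50.0, "isoform_2": 50.0}

for iso, delta in proportion_shift(control, meja).items():
    print(f"{iso}: {delta:+.2f}")
```

Because proportions are computed within each condition, a gene whose isoforms all change in expression by the same factor shows no shift, which separates AS regulation from overall differential expression.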

LIST OF REFERENCES

Abbasi AA, Hanif H. 2012. Phylogenetic history of paralogous gene quartets on human chromosomes 1, 2, 8 and 20 provides no evidence in favor of the vertebrate octoploidy hypothesis. Mol Phylogenet Evol. 63: 922-927.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215: 403-410.

Alves-Junior L, Niemeier S, Hauenschild A, Rehmsmeier M, Merkle T. 2009. Comprehensive prediction of novel microRNA targets in Arabidopsis thaliana. Nucleic Acids Res. 37: 4010-4021.

Apic G, Gough J, Teichmann SA. 2001. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J Mol Biol. 310: 311-325.

Araki S, Ito M, Soyano T, Nishihama R, Machida Y. 2004. Mitotic cyclins stimulate the activity of c-Myb-like factors for transactivation of G2/M phase-specific genes in tobacco. J Biol Chem. 279: 32979-32988.

Araki S, Machida Y, Ito M. 2012. Virus-induced silencing of NtmybA1 and NtmybA2 causes incomplete cytokinesis and reduced shoot elongation in Nicotiana benthamiana. Plant Biotechnol. 29: 483-487.

Araki S, et al. 2013. Cosuppression of NtmybA1 and NtmybA2 causes downregulation of G2/M phase-expressed genes and negatively affects both cell division and expansion in tobacco. Plant Signal Behav. 8: e26780.

Attaran E, et al. 2014. Temporal dynamics of growth and photosynthesis suppression in response to jasmonate signaling. Plant Physiol. 165: 1302-1314.

Axtell MJ, Westholm JO, Lai EC. 2011. Vive la différence: biogenesis and evolution of microRNAs in plants and animals. Genome Biol. 12: 221.

Bai Y, Meng Y, Huang D, Qi Y, Chen M. 2011. Origin and evolutionary analysis of the plant-specific TIFY transcription factor family. Genomics 98: 128-136.

Bailey TL, Williams N, Misleh C, Li WW. 2006. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 34: W369-W373.

Barbazuk WB, Fu Y, McGinnis KM. 2008. Genome-wide analyses of alternative splicing in plants: Opportunities and challenges. Genome Res. 18: 1381-1392.

Basu MK, Makalowski W, Rogozin IB, Koonin EV. 2008. U12 intron positions are more strongly conserved between animals and plants than U2 intron positions. Biol Direct 3: 19.

Bechtold U, et al. 2010. Constitutive salicylic acid defences do not compromise seed yield, drought tolerance and water productivity in the Arabidopsis accession C24. Plant, Cell & Environ. 33: 1959-1973.

Berget SM, Moore C, Sharp PA. 1977. Spliced segments at the 5’ terminus of adenovirus 2 late mRNA. P Natl Acad Sci USA. 74: 3171-3175.

Bergoltz S, et al. 2001. The highly conserved DNA-binding domains of A-, B, and c-Myb differ with respect to DNA-binding phosphorylation and redox properties. Nucleic Acids Res. 29: 3546-3556.

Bernal M, et al. 2012. Transcriptome sequencing identifies SPL7-regulated copper acquisition genes FRO4/FRO5 and the copper dependence of iron homeostasis in Arabidopsis. Plant Cell 24: 738-761.

Bornberg-Bauer E, Albà MM. 2013. Dynamics and adaptive benefits of modular protein evolution. Curr Opin Struct Biol. 23: 459-466.

Boutet SC, et al. 2012. Alternative polyadenylation mediates microRNA regulation of muscle stem cell function. Cell Stem Cell 10: 327-336.

Braun EL, Grotewold E. 1999. Newly discovered plant c-myb-like genes rewrite the evolution of the plant myb gene family. Plant Physiol. 121: 21-24.

Brown JD, Plumpton M, Beggs JD. 1992. The genetics of nuclear premessenger RNA splicing: a complex story. Antonie Van Leeuwenhoek 62: 35-46.

Browse J, Howe GA. 2008. New weapons and a rapid response against insect attack. Plant Physiol. 146: 832-838.

Campo S, et al. 2013. Identification of a novel microRNA (miRNA) from rice that targets an alternatively spliced transcript of the Nramp6 (Natural resistance-associated macrophage protein 6) gene involved in pathogen resistance. New Phytol. 199: 212-227.

Carretero-Paulet L, et al. 2010. Genome-wide classification and evolutionary analysis of the bHLH family of transcription factors in Arabidopsis, poplar, rice, moss and algae. Plant Physiol. 153: 1398-1412.

Chamala S, Feng G, Chavarro C, Barbazuk WB. 2015. Genome-wide identification of evolutionarily conserved alternative splicing events in flowering plants. Front Bioeng Biotechnol. 3: 33.

Chandran D, Inada N, Hather G, Kleindt CK, Wildermuth MC. 2010. Laser microdissection of Arabidopsis cells at the powdery mildew infection site reveals site-specific processes and regulators. P Natl Acad Sci USA. 107: 460-465.

Chang YF, Imam JS, Wilkinson MF. 2007. The nonsense-mediated decay RNA surveillance pathway. Annu Rev Biochem. 76: 51-74.

Cavalier-Smith T. 1985. Selfish DNA and the origin of introns. Nature 315: 283-284.

Cheng CY, et al. 2016. Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. Plant J. 89: 789-804.

Chico JM, et al. 2014. Repression of jasmonate-dependent defenses by shade involves differential regulation of protein stability of MYC transcription factors and their JAZ repressors in Arabidopsis. Plant Cell 26: 1967-1980.

Chini A, et al. 2007. The JAZ family of repressors is the missing link in jasmonate signalling. Nature 448: 666-671.

Chini A, Fonseca S, Chico JM, Fernández-Calvo P, Solano R. 2009. The ZIM domain mediates homo- and heteromeric interactions between Arabidopsis JAZ proteins. Plant J. 59: 77-87.

Chini A, Gimenez-Ibanez S, Goossens A, Solano R. 2016. Redundancy and specificity in jasmonate signalling. Curr Opin Plant Biol. 33: 147-156.

Chow LT, Gelinas RE, Broker TR, Roberts RJ. 1977. An amazing sequence arrangement at the 5’ ends of adenovirus 2 messenger RNA. Cell 12: 1-8.

Chung HS, Howe GA. 2009. A critical role for the TIFY motif in repression of jasmonate signaling by a stabilized splice variant of the JASMONATE ZIM-Domain protein JAZ10 in Arabidopsis. Plant Cell 21: 131-145.

Chung HS, Niu Y, Browse J, Howe GA. 2009. Top hits in contemporary JAZ: An update on jasmonate signaling. Phytochem. 70: 1547-1559.

Chung HS, et al. 2010. Alternative splicing expands the repertoire of dominant JAZ repressors of jasmonate signaling. Plant J. 63: 613-622.

Dai X, et al. 2007. Overexpression of an R1R2R3 MYB gene, OsMYB3R-2, increases tolerance to freezing, drought, and salt stress in transgenic Arabidopsis. Plant Physiol. 143: 1739-1751.

Dai X, Zhao PX. 2011. psRNATarget: a plant small RNA target analysis server. Nucleic Acids Res. 39: W155-W159.

Darnell JE. 1978. Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science 202: 1257-1260.

Dash AB, Orrico FC, Ness SA. 1996. The EVES motif mediates both intermolecular and intramolecular regulation of c-Myb. Gene Dev. 10: 1858-1869.

Dash S, Van Hemert J, Hong L, Wise RP, Dickerson JA. 2012. PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res. 40: D1194-D1201.

Davidson CJ, Guthrie EE, Lipsick JS. 2012. Duplication and maintenance of the Myb genes of vertebrate animals. Biol Open 2: 101-110.

Davidson CJ, Tirouvanziam R, Herzenberg LA, Lipsick JS. 2005. Functional evolution of the vertebrate Myb gene family: B-Myb, but neither A-Myb nor c-Myb, complements Drosophila Myb in hemocytes. Genetics 169: 215-229.

del Pozo JC, Lopez-Matas MA, Ramirez-Parra E, Gutierrez C. 2005. Hormonal control of the plant cell cycle. Physiol Plantarum 123: 173-183.

Dias AP, Braun EL, McMullen MD, Grotewold E. 2003. Recently duplicated maize R2R3 Myb genes provide evidence for distinct mechanisms of evolutionary divergence after duplication. Plant Physiol. 131: 610-620.

Drechsel G, et al. 2013. Nonsense-mediated decay of alternative precursor mRNA splicing variants is a major determinant of the Arabidopsis steady state transcriptome. Plant Cell 25: 3726-3742.

Du H, et al. 2012. Genome-wide analysis of the MYB transcription factor superfamily in soybean. BMC Plant Biol. 12: 106.

Du H, et al. 2013. Genome-wide identification and evolutionary and expression analyses of MYB-related genes in land plants. DNA Res. 20: 437-448.

Dubos C, et al. 2010. MYB transcription factors in Arabidopsis. Trends Plant Sci. 15: 573-581.

Dugas DV, et al. 2011. Functional annotation of the transcriptome of Sorghum bicolor in response to osmotic stress and abscisic acid. BMC Genomics 12: 514.

Dujon B. 1980. Sequence of the intron and flanking exons of the mitochondrial 21S rRNA gene of yeast strains having different alleles at the omega and rib-1 loci. Cell 20: 185-197.

Doolittle WF. 1978. Genes in pieces: were they ever together? Nature 272: 581-582.

Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol. 7: e1002195.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792-1797.

Fedorov A, et al. 2001. Intron distribution difference for 276 ancient and 131 modern genes suggests the existence of ancient introns. P Natl Acad Sci USA. 98: 13177-13182.

Fedorova L, Fedorov A. 2003. Introns in gene evolution. Genetica 118: 123-131.

Feller A, Machemer K, Braun EL, Grotewold E. 2011. Evolutionary and comparative analysis of MYB and bHLH plant transcription factors. Plant J. 66: 94-116.

Feng G, Burleigh JG, Braun EL, Mei W, Barbazuk WB. 2017. Evolution of the 3R-MYB gene family in plants. Genome Biol Evol. 9: 1013-1029.

Fernández-Calvo P, et al. 2011. The Arabidopsis bHLH transcription factors MYC3 and MYC4 are targets of JAZ repressors and act additively with MYC2 in the activation of jasmonate responses. Plant Cell 23: 701-715.

Filichkin S, Mockler TC. 2012. Unproductive alternative splicing and nonsense mRNAs: a widespread phenomenon among plant circadian clock genes. Biol Direct 7: 20.

Filichkin S, Priest HD, Megraw M, Mockler TC. 2015. Alternative splicing in plants: directing traffic at the crossroads of adaptation and environmental stress. Curr Opin Plant Biol. 24: 125-135.

Finn RD, et al. 2014. Pfam: the protein families database. Nucleic Acids Res. 42: D222-D230.

Foissac S, Sammeth M. 2007. ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets. Nucleic Acids Res. 35: W297-W299.

Fonseca S, et al. 2009. (+)-7-iso-Jasmonoyl-L-isoleucine is the endogenous bioactive jasmonate. Nature Chem Biol. 5: 344-350.

Fonseca S, et al. 2014. bHLH003, bHLH013 and bHLH017 are new targets of JAZ repressors negatively regulating JA responses. PLoS ONE 9: e86182.

Friedman RC, Farh KKH, Burge CB, Bartel DP. 2009. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19: 92-105.

Gasperini D, et al. 2015. Multilayered organization of jasmonate signalling in the regulation of root growth. PLoS Genet. 11: e1005300.

Gaucher EA, Gu X, Miyamoto MM, Benner SA. 2002. Predicting functional divergence in protein evolution by site-specific rate shifts. Trends in Biochem Sci. 27: 315-321.

Gaucher EA, Miyamoto MM, Benner SA. 2001. Function-structure analysis of proteins using covarion-based evolutionary approaches: elongation factors. P Natl Acad Sci USA. 98: 548-552.

Gharib WH, Robinson-Rechavi M. 2013. The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC. Mol Biol Evol. 30: 1675-1686.

Gilbert W. 1987. The exon theory of genes. In: Cold Spring Harbor symposia on quantitative biology. Cold Spring Harbor Laboratory Press. 52: 901-905.

Gibson TJ, Spring J. 2000. Evidence in favour of ancient octaploidy in the vertebrate genome. Biochem Soc Trans. 28: 259-264.

Gill SS, Tuteja N. 2010. Reactive oxygen species and antioxidant machinery in abiotic stress tolerance in crop plants. Plant Physiol Biochem. 48: 909-930.

Goldman N, Yang Z. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 11: 725-736.

Goodman HM, Olson MV, Hall BD. 1977. Nucleotide sequence of a mutant eukaryotic gene: the yeast tyrosine-inserting ochre suppressor SUP4-o. P Natl Acad Sci USA. 74: 5453-5457.

Grotewold E, et al. 2000. Identification of the residues in the Myb domain of maize C1 that specify the interaction with the bHLH cofactor R. P Natl Acad Sci USA. 97: 13579-13584.

Han JH, Batey S, Nickson AA, Teichmann SA, Clarke J. 2007. The folding and evolution of multidomain proteins. Nat Rev Mol Cell Biol. 8: 319-330.

Irimia M, Roy SW. 2014. Origin of spliceosomal introns and alternative splicing. In: Cold Spring Harbor perspectives in biology. Cold Spring Harbor Laboratory Press. 6: a016071.

Jiang C, Gu J, Chopra S, Gu X, Peterson T. 2004. Ordered origin of the typical two- and three-repeat Myb genes. Gene 326: 13-22.

Jiang Y, Liang G, Yang S, Yu D. 2014. Arabidopsis WRKY57 functions as a node of convergence for jasmonic acid- and auxin-mediated signaling in jasmonic acid-induced leaf senescence. Plant Cell 26: 230-245.

Keren H, Lev-Maor G, Ast G. 2010. Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet. 11: 345-355.

Kim E, Magen A, Ast G. 2007. Different levels of alternative splicing among eukaryotes. Nucleic Acids Res. 35: 125-131.

Kornblihtt AR, et al. 2013. Alternative splicing: a pivotal step between eukaryotic transcription and translation. Nat Rev Mol Cell Biol. 14: 153-165.

Koonin EV. 2006. The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? Biol Direct 1: 22.

Kranz HD, et al. 1998. Towards functional characterisation of the members of the R2R3-MYB gene family from Arabidopsis thaliana. Plant J. 16: 263-276.

Haas BJ, Delcher AL, Wortman JR, Salzberg SL. 2004. DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 20: 3643-3646.

Haga N, et al. 2007. R1R2R3-Myb proteins positively regulate cytokinesis through activation of KNOLLE transcription in Arabidopsis thaliana. Development 134: 1101-1110.

Haga N, et al. 2011. Mutations in MYB3R1 and MYB3R4 cause pleiotropic developmental defects and preferential down-regulation of multiple G2/M-specific genes in Arabidopsis. Plant Physiol. 157: 706-717.

Hedges SB, Martin J, Suleski M, Paymer M, Kumar S. 2015. Tree of life reveals clock-like speciation and diversification. Mol Biol Evol. 32: 835-845.

Hirt H. 2000. Connecting oxidative stress, auxin, and cell cycle regulation through a plant mitogen-activated protein kinase pathway. P Natl Acad Sci USA. 97: 2405-2407.

Howe GA, Jander G. 2008. Plant immunity to insect herbivores. Annu Rev Plant Biol. 59: 41-66.

Hu Y, Jiang L, Wang F, Yu D. 2013. Jasmonate regulates the INDUCER OF CBF EXPRESSION-C-REPEAT BINDING FACTOR/DRE BINDING FACTOR1 cascade and freezing tolerance in Arabidopsis. Plant Cell 25: 2907-2924.

Hu B, et al. 2015. GSDS 2.0: An upgraded gene feature visualization server. Bioinformatics 31: 1296-1297.

Huang CH, et al. 2015. Resolution of Brassicaceae phylogeny using nuclear genes uncovers nested radiations and supports convergent morphological evolution. Mol Biol Evol. 33: 394-412.

Inzé D, De Veylder L. 2006. Cell cycle regulation in plant development. Annu Rev Genet. 40: 77-105.

Ito M, et al. 1998. A novel cis-acting element in promoters of plant B-type cyclin genes activates M phase-specific transcription. Plant Cell 10: 331-341.

Ito M, et al. 2001. G2/M-phase-specific transcription during the plant cell cycle is mediated by c-Myb-like transcription factors. Plant Cell 13: 1891-1905.

Ito M. 2005. Conservation and diversification of the three-repeat Myb transcription factors in plants. J Plant Res. 118: 61-69.

Jiao Y, et al. 2011. Ancestral polyploidy in seed plants and angiosperms. Nature 473: 97-100.

Jonak C, Ökrész L, Bögre L, Hirt H. 2002. Complexity, cross talk and integration of plant MAP kinase signalling. Curr Opin Plant Biol. 5: 415-424.

Kalsotra A, Wang K, Li PF, Cooper TA. 2010. MicroRNAs coordinate an alternative splicing network during mouse postnatal heart development. Genes Dev. 24: 653-658.

Kalyna M, et al. 2012. Alternative splicing and nonsense-mediated decay modulate expression of important regulatory genes in Arabidopsis. Nucleic Acids Res. 40: 2454-2469.

Kato K, et al. 2009. Preferential up-regulation of G2/M phase-specific genes by overexpression of the hyperactive form of NtmybA2a lacking its negative regulation domain in tobacco BY-2 cells. Plant Physiol. 149: 1945-1957.

Katsir L, Chung HS, Koo AJK, Howe GA. 2008. Jasmonate signaling: a conserved mechanism of hormone sensing. Curr Opin Plant Biol. 11: 428-435.

Kervestin S, Jacobson A. 2012. NMD: a multifaceted response to premature translational termination. Nat Rev Mol Cell Biol. 13: 700-712.

Khanna R, et al. 2009. The Arabidopsis B-box zinc finger family. Plant Cell 21: 3416-3420.

Kilian J, et al. 2007. The AtGenExpress global stress expression data set: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses. Plant J. 50: 347-363.

Klempnauer KH, Gonda TJ, Bishop JM. 1982. Nucleotide sequence of the retroviral leukemia gene v-myb and its cellular progenitor c-myb: the architecture of a transduced oncogene. Cell 31: 453-463.

Koh J, et al. 2012. Comparative proteomics of the recently and recurrently formed natural allopolyploid Tragopogon mirus (Asteraceae) and its parents. New Phytol. 196: 292-305.

Kozomara A, Griffiths-Jones S. 2014. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42: D68-D73.

Lane CE, et al. 2007. Nucleomorph genome of Hemiselmis andersenii reveals complete intron loss and compaction as a driver of protein structure and function. P Natl Acad Sci USA. 104: 19908-19913.

LaPolla RJ, Lambowitz AM. 1978. Ribosomal precursor RNA containing a 2.3-kilobase intron. J Biol Chem. 254: 11746-11750.

Lareau LF, Inada M, Green RE, Wengrod JC, Brenner SE. 2007. Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446: 926-929.

Le SQ, Dang CC, Gascuel O. 2012. Modeling protein evolution with several amino acid replacement matrices depending on site rates. Mol Biol Evol. 29: 2921-2936.

Le SQ, Gascuel O. 2008. An improved general amino acid replacement matrix. Mol Biol Evol. 25: 1307-1320.

Letunic I, Doerks T, Bork P. 2015. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 43: D257-D260.

Lewis BP, Green RE, Brenner SE. 2003. Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. P Natl Acad Sci USA. 100: 189-192.

Li J, et al. 2006. A subgroup of MYB transcription factor genes undergoes highly conserved alternative splicing in Arabidopsis and rice. J Exp Bot. 57: 1263-1273.

Li JY, et al. 2010. The Arabidopsis nitrate transporter NRT1.8 functions in nitrate removal from the xylem sap and mediates cadmium tolerance. Plant Cell 22: 1633-1646.

Li Q, Zhang C, Li J, Wang L, Ren Z. 2012. Genome-wide identification and characterization of R2R3MYB gene family in Cucumis sativus. PLoS ONE 7: e47576.

Linkies A, Leubner-Metzger G. 2012. Beyond gibberellins and abscisic acid: how ethylene and jasmonates control seed germination. Plant Cell Rep. 31: 253-270.

Lipsick JS. 1996. One billion years of Myb. Oncogene 13: 223-235.

Logsdon JM, et al. 1995. Seven newly discovered intron positions in the triose-phosphate isomerase gene: evidence for the introns late theory. P Natl Acad Sci USA. 92: 8507-8511.

Logsdon JM. 1998. The recent origins of spliceosomal introns revisited. Curr Opin Genet Dev. 8: 637-648.

Ma PCM, Rould MA, Weintraub H, Pabo CO. 1994. Crystal structure of MyoD bHLH domain-DNA complex: Perspective on DNA recognition and implications for transcriptional activation. Cell 77: 451-459.

Ma Q, et al. 2009. Enhanced tolerance to chilling stress in OsMYB3R-2 transgenic rice is mediated by alteration in cell cycle and ectopic expression of stress genes. Plant Physiol. 150: 244-256.

Madera M, Vogel C, Kummerfeld SK, Chothia C, Gough J. 2004. The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res. 32: D235-D239.

Marchler-Bauer A, et al. 2015. CDD: NCBI’s conserved domain database. Nucleic Acids Res. 43: D222-D226.

Marquez Y, Brown JWS, Simpson C, Barta A, Kalyna M. 2012. Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Res. 22: 1184-1195.

Martin C, Paz-Ares J. 1997. MYB transcription factors in plants. Trends in Genet. 13: 67-73.

Martinez-Contreras R, et al. 2008. hnRNP proteins and splicing control. Adv Exp Med Biol. 623: 107-122.

Matus JT, Aquea F, Arce-Johnson P. 2008. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes. BMC Plant Biol. 8: 83.

Mei W, Boatwright JL, Feng G, Schnable JC, Barbazuk WB. 2017. Evolutionarily conserved alternative splicing across monocots. Genetics xx: xx-xx.

Meng Y, Shao C, Ma X, Wang H. 2013. Introns targeted by plant microRNAs: a possible novel mechanism of gene regulation. Rice 6: 8.

Miao Y, Zentgraf U. 2007. The antagonist function of Arabidopsis WRKY53 and ESR/ESP in leaf senescence is modulated by the jasmonic and salicylic acid equilibrium. Plant Cell 19: 819-830.

Mittler R, Vanderauwera S, Gollery M, Van Breusegem F. 2004. Reactive oxygen gene network of plants. Trends Plant Sci. 9: 490-498.

Moreno JE, et al. 2013. Negative feedback control of jasmonate signaling by an alternative splice variant of JAZ10. Plant Physiol. 162: 1006-1017.

Mudgil Y, Singh BN, Upadhyaya KC, Sopory SK, Reddy MK. 2002. Cloning and characterization of a cell cycle-regulated gene encoding topoisomerase I from Nicotiana tabacum that is inducible by light, low temperature and abscisic acid. Mol Genet Genomics 267: 380-390.

Nakagami H, Pitzschke A, Hirt H. 2005. Emerging MAP kinase pathways in plant stress signalling. Trends Plant Sci. 10: 339-346.

Nilsen TW. 2003. The spliceosome: the most complex macromolecular machine in the cell? BioEssays 25: 1147-1149.

Nishii A, et al. 2000. Characterization of a novel gene encoding a putative single zinc-finger protein, ZIM, expressed during the reproductive phase in Arabidopsis thaliana. Biosci Biotechnol Biochem. 64: 1402-1409.

Oelgeschläger M, Kowenz-Leutz E, Schreek S, Leutz A, Lüscher B. 2001. Tumorigenic N-terminal deletions of c-Myb modulate DNA binding, transactivation, and cooperativity with C/EBP. Oncogene 20: 7420-7424.

Ogata K, et al. 1992. Solution structure of a DNA-binding unit of Myb: A helix-turn-helix-related motif with conserved tryptophans forming a hydrophobic core. P Natl Acad Sci USA. 89: 6428-6432.

Ogata K, et al. 1994. Solution structure of a specific DNA complex of the Myb DNA-binding domain with cooperative recognition helices. Cell 79: 639-648.

Olson A, et al. 2014. Expanding and vetting Sorghum bicolor gene annotations through transcriptome and methylome sequencing. Plant Genome. 7: 2.

Ording E, Kvavik W, Bostad A, Gabrielsen OS. 1994. Two functionally distinct half sites in the DNA-recognition sequence of the Myb oncoprotein. Eur J Biochem. 222: 113-120.

Palmer JD, Logsdon JM. 1991. The recent origin of introns. Curr Opin Genet Dev. 1: 470-477.

Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. 2008. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 40: 1413-1415.

Pařenicová L, et al. 2003. Molecular and phylogenetic analyses of the complete MADS-Box transcription factor family in Arabidopsis: new openings to the MADS world. Plant Cell 15: 1538-1551.

Patel AA, Steitz JA. 2003. Splicing double: insights from the second spliceosome. Nat Rev Mol Cell Biol. 4: 960-970.

Paterson AH, et al. 2009. The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551-556.

Pauwels L, et al. 2010. NINJA connects the co-repressor TOPLESS to jasmonate signalling. Nature 464: 788-791.

Pleiss JA, Whitworth G, Bergkessel M, Guthrie C. 2007. Rapid, transcript-specific changes in splicing in response to environmental stress. Mol Cell 27: 928-937.

Qi T, et al. 2011. The jasmonate-ZIM-domain proteins interact with the WD-repeat/bHLH/MYB complexes to regulate jasmonate-mediated anthocyanin accumulation and trichome initiation in Arabidopsis thaliana. Plant Cell 23: 1795-1814.

Qi T, Huang H, Song S, Xie D. 2015a. Regulation of jasmonate-mediated stamen development and seed production by a bHLH-MYB complex in Arabidopsis. Plant Cell 27: 1620-1633.

Qi T, et al. 2015b. Regulation of jasmonate-induced leaf senescence by antagonism between bHLH subgroup IIIe and IIId factors in Arabidopsis. Plant Cell 27: 1634-1649.

R Development Core Team. 2014. R: A language and environment for statistical computing. Vienna, Austria.

Raikhel N. 1992. Nuclear targeting in plants. Plant Physiol. 100: 1627-1632.

Reddy ASN. 2004. Plant serine/arginine-rich proteins and their role in pre-mRNA splicing. Trends Plant Sci. 9: 11.

Reddy ASN, Marquez Y, Kalyna M, Barta A. 2013. Complexity of the alternative splicing landscape in plants. Plant Cell 25: 3657-3683.

Rensing SA, et al. 2007. An ancient genome duplication contributed to the abundance of metabolic genes in the moss Physcomitrella patens. BMC Evol Biol. 7: 130.

Rogozin IB, Carmel L, Csuros M, Koonin EV. 2012. Origin and evolution of spliceosomal introns. Biol Direct 7: 11.

Rosinski JA, Atchley WR. 1998. Molecular evolution of the Myb family of transcription factors: evidence for polyphyletic origin. J Mol Evol. 46: 74-83.

Roy SW, Nosaka M, de Souza SJ, Gilbert W. 1999. Centripetal modules and ancient introns. Gene 238: 85-91.

Roy SW, Gilbert W. 2006. The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet. 7: 211-221.

Ruhfel BR, Gitzendanner MA, Soltis PS, Soltis DE, Burleigh JG. 2014. From algae to angiosperms – inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes. BMC Evol Biol. 14: 23.

Saldanha R, Mohr G, Belfort M, Lambowitz AM. 1993. Group I and group II introns. FASEB J. 7: 15-24.

Sammeth M, Foissac S, Guigó R. 2008. A general definition and nomenclature for alternative splicing events. PLoS Comput Biol. 4: e1000147.

Sandberg R, Neilson JR, Sarma A, Sharp PA, Burge CB. 2008. Proliferating cells express mRNAs with shortened 3’ untranslated regions and fewer microRNA target sites. Science 320: 1643-1647.

Schwartz SH, et al. 2008. Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes. Genome Res. 18: 88-103.

Severing EI, van Dijk ADJ, Stiekema WJ, van Ham RCHJ. 2009. Comparative analysis indicates that alternative splicing in plants has limited role in functional expansion of the proteome. BMC Genomics 10: 154.

Shaikhali J, et al. 2012. The CRYPTOCHROME1-dependent response to excess light is mediated through the transcriptional activators ZINC FINGER PROTEIN EXPRESSED IN INFLORESCENCE MERISTEM LIKE1 and ZML2 in Arabidopsis. Plant Cell 24: 3009-3025.

Shannon P, et al. 2003. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13: 2498-2504.

Shilov IV, et al. 2007. The paragon algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra. Mol Cell Proteomics 6: 1638-1655.

Shyu C, et al. 2012. JAZ8 lacks a canonical degron and has an EAR motif that mediates transcriptional repression of jasmonate responses in Arabidopsis. Plant Cell 24: 536-550.

Sikder HA, Devlin MK, Dunlap S, Ryu B, Alani RM. 2003. Id proteins in cell growth and tumorigenesis. Cancer Cell 3: 525-530.

Simpson CG, Brown JWS. 2008. U12-Dependent Intron Splicing in Plants. Nuclear pre-mRNA Processing in Plants 326: 61-82.

Song S, et al. 2011. The Jasmonate-ZIM domain proteins interact with the R2R3-MYB transcription factors MYB21 and MYB24 to affect jasmonate-regulated stamen development in Arabidopsis. Plant Cell 23: 1000-1013.

Song S, et al. 2013. The bHLH subgroup IIId factors negatively regulate jasmonate-mediated plant defense and development. PLoS Genetics 9: e1003653.

Staiger D, Brown JWS. 2013. Alternative splicing at the intersection of biological timing, development, and stress responses. Plant Cell 25: 3640-3656.

Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312-1313.

Stoltzfus A. 1999. On the possibility of constructive neutral evolution. J Mol Evol. 49: 169-181.

Stolzer M, Siewert K, Lai H, Xu M, Durand D. 2015. Event inference in multidomain families with phylogenetic reconciliation. BMC Bioinformatics 16: S8.

Szklarczyk D, et al. 2015. STRING V10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43: D447-D452.

Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 30: 2725-2729.

Tanaka H, et al. 2015. Identification and characterization of Arabidopsis AtNUDX9 as a GDP-D-mannose pyrophosphohydrolase: its involvement in root growth inhibition in response to ammonium. J Exp Bot. 66: 5797-5808.

Tarrío R, Ayala FJ, Rodríguez-Trelles F. 2008. Alternative splicing: a missing piece in the puzzle of intron gain. P Natl Acad Sci USA. 105: 7223-7228.

Teakle GR, Manfield IW, Graham JF, Gilmartin PM. 2002. Arabidopsis thaliana GATA factors: organisation, expression and DNA-binding characteristics. Plant Mol Biol. 50: 43-57.

Terol J, Domingo C, Talon M. 2006. The GH3 family in plants: genome wide analysis in rice and evolutionary history based on EST analysis. Gene. 371: 279-290.

Thatcher LF, et al. 2016. Characterization of a JAZ7 activation-tagged Arabidopsis mutant with increased susceptibility to the fungal pathogen Fusarium oxysporum. J Exp Bot. 67: 2367-2386.

Thines B, et al. 2007. JAZ repressor proteins are targets of the SCFCOI1 complex during jasmonate signalling. Nature 448: 661-665.

Thireault C, et al. 2015. Repression of jasmonate signaling by a non-TIFY JAZ protein in Arabidopsis. Plant J. 82: 669-679.

Tian B, Manley JL. 2013. Alternative cleavage and polyadenylation: the long and short of it. Trends in Biochem Sci. 38: 6.

Trapnell C, et al. 2013. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnol. 31: 46-53.

Valenzuela P, Venegas A, Weinberg F, Bishop R, Rutter WJ. 1978. Structure of yeast phenylalanine-tRNA genes: an intervening DNA segment within the region coding for the tRNA. P Natl Acad Sci USA. 75: 190-194.

Valenzuela CE, et al. 2016. Salt stress response triggers activation of the jasmonate signaling pathway leading to inhibition of cell elongation in Arabidopsis primary root. J Exp Bot. 67: 4209-4220.

Vanholme B, Grunewald W, Bateman A, Kohchi T, Gheysen G. 2007. The tify family previously known as ZIM. Trends Plant Sci. 12: 239-244.

Vanneste K, Maere S, Van de Peer Y. 2014. Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Phil Trans R Soc B. 369: 20130353.

Vélez-Bermúdez IC, et al. 2015. A MYB/ZML complex regulates wound-induced lignin genes in maize. Plant Cell 27: 3245-3259.

Vogel C, Bashton M, Kerrison N, Chothia C, Teichmann SA. 2004. Structure, function and evolution of multidomain proteins. Curr Opin Struct Biol. 14: 208-216.

Voinnet O. 2009. Origin, biogenesis, and activity of plant microRNAs. Cell 136: 669-687.

Wahl MC, Will CL, Lührmann R. 2009. The spliceosome: design principles of a dynamic RNP machine. Cell 136: 701-718.

Wang H, et al. 1998. ICK1, a cyclin-dependent protein kinase inhibitor from Arabidopsis thaliana interacts with both Cdc2a and CycD3, and its expression is induced by abscisic acid. Plant J. 15: 501-510.

Wang BB, Brendel V. 2006. Genomewide comparative analysis of alternative splicing in plants. P Natl Acad Sci USA. 103: 7175-7180.

Wang L, Wang S, Li W. 2012. RSeQC: quality control of RNA-seq experiments. Bioinformatics 28: 2184-2185.

White DWR. 2006. PEAPOD regulates lamina size and curvature in Arabidopsis. P Natl Acad Sci USA. 103: 13238-13243.

Will CL, Lührmann R. 2005. Splicing of a rare class of introns by the U12-dependent spliceosome. Biol Chem. 386: 713-724.

Wu CT, Chiou CY, Chiu HC, Yang UC. 2013a. Fine-tuning of microRNA-mediated repression of mRNA by splicing-regulated and highly repressive microRNA recognition element. BMC Genomics 14: 438.

Wu G, Poethig RS. 2006. Temporal regulation of shoot development in Arabidopsis thaliana by miR156 and its target SPL3. Development 133: 3539-3547.

Wu TD, Nacu S. 2010. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26: 873-881.

Wu YC, Rasmussen MD, Bansal MS, Kellis M. 2013b. TreeFix: statistically informed gene tree error correction using species trees. Syst Biol. 62: 110-120.

Yamasaki H, Hayashi M, Fukazawa M, Kobayashi Y, Shikanai T. 2009. SQUAMOSA promoter binding protein-like7 is a central regulator for copper homeostasis in Arabidopsis. Plant Cell 21: 347-361.

Yan H, et al. 2014. Molecular reprogramming of Arabidopsis in response to perturbation of jasmonate signaling. J Proteome Res. 13: 5751-5766.

Yan Y, et al. 2007. A downstream mediator in the growth repression limb of the jasmonate pathway. Plant Cell 19: 2470-2483.

Yang SW, Jin E, Chung IK, Kim WT. 2002. Cell cycle-dependent regulation of telomerase activity by auxin, abscisic acid and protein phosphorylation in tobacco BY-2 suspension culture cells. Plant J. 29: 617-626.

Yang Z. 2007. PAML4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24: 1586-1591.

Yang X, Zhang H, Li L. 2012. Alternative mRNA processing increases the complexity of microRNA-based gene regulation in Arabidopsis. Plant J. 70: 421-431.

Yoshida T, Mogami J, Yamaguchi-Shinozaki K. 2014. ABA-dependent and ABA-independent signaling in response to osmotic stress in plants. Curr Opin Plant Biol. 21: 133-139.

Yu J, et al. 2016. JAZ7 negatively regulates dark-induced leaf senescence in Arabidopsis. J Exp Bot. 67: 751-762.

Zeng L, et al. 2014. Resolution of deep angiosperm phylogeny using conserved nuclear genes and estimates of early divergence times. Nature Commun. 5: 4956.

Zhai Q, et al. 2015. Transcriptional mechanism of jasmonate receptor COI1-mediated delay of flowering time in Arabidopsis. Plant Cell 27: 2814-2828.

Zhang F, et al. 2017a. Structural insights into alternative splicing-mediated desensitization of jasmonate signaling. P Natl Acad Sci USA. 114: 1720-1725.

Zhang R, et al. 2017b. A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicing. Nucleic Acids Res. 45: 5061-5073.

Zhang Y, et al. 2012. Genome-wide identification and analysis of the TIFY gene family in grape. PLoS ONE 7: e44465.

Zheng Y, et al. 2017. Jasmonate inhibits COP1 activity to suppress hypocotyl elongation and promote cotyledon opening in etiolated Arabidopsis seedlings. Plant J. 90: 1144-1155.

BIOGRAPHICAL SKETCH

Guanqiao Feng is from Xingtai, China, where she completed her primary, middle school, and high school education. In 2007 she moved to Nanjing to begin her undergraduate studies. She received her bachelor’s degree in agriculture in 2011 and her master’s degree in biotechnology in 2013 from Nanjing Agricultural University, China. During that time she developed a keen interest in bioinformatics. After completing her M.Sc., she joined the Plant Molecular and Cellular Biology Graduate Program at the University of Florida in 2013 to pursue her doctoral degree. In her first year, she rotated in Dr. Burleigh’s lab, Dr. Soltis’s lab, and Dr. Barbazuk’s lab. In Dr. Burleigh’s lab, she worked on the evolution of the MYB gene family in the plant kingdom, which later developed into the second chapter of her dissertation. In Dr. Soltis’s lab, she worked with Dr. Gitzendanner on data from the plant 1KP project to estimate the plant phylogeny based on plastid genomes. During her rotation in Dr. Barbazuk’s lab, she was involved in a project to identify and characterize conserved alternative splicing (AS) in flowering plants. After her rotation year, she joined Dr. Barbazuk’s lab to study gene family evolution and alternative splicing. In her Ph.D. research, she pursued three projects: 1) evolution of the 3R-MYB gene family in plants; 2) origin and evolution of the plant-specific, multi-domain TIFY gene family; and 3) jasmonate-induced alternative splicing in Arabidopsis. She also contributed to two other projects: 1) conserved alternative splicing across monocots; and 2) the role of maize RBM48 in minor intron splicing and differentiation. She received her Ph.D. from the University of Florida in the fall of 2017.