© 2017 Guanqiao Feng
EVOLUTION OF PLANT 3R-MYB AND TIFY GENE FAMILIES AND JASMONATE INDUCED ALTERNATIVE SPLICING RESPONSES IN ARABIDOPSIS
By
GUANQIAO FENG
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2017
ACKNOWLEDGMENTS
First and foremost, I thank my advisor Dr. William Bradley Barbazuk for his
support, guidance, and encouragement during my Ph.D. study. He provided motivation
and made my Ph.D. experience smooth. With his trust, I always felt confident tackling
problems encountered in my research, through which I gained insight on the issues and
learned the methods to solve them. I am grateful to my committee members, all of
whom have been intensively involved in my project development and were always there
to help me out when I confronted obstacles. I started my MYB project with Dr.
Gordon Burleigh and this project was further developed with assistance from Dr.
Edward Braun. Without their generous help, I could not have explored the MYB project
in so many interesting directions. Dr. Sixue Chen provided me with data to explore a
fascinating project on jasmonate-related alternative splicing. I appreciate the opportunity
I had to work on it and am thankful for his support and advice on the project.
During my Ph.D. study at UF, my lab members and friends provided much
technical support and informative discussions. I would like to thank all of them: Dr.
George Tiley, Dr. Tong Zhang, Dr. Srikar Chamala, Dr. Wenbin Mei, Lucas Boatwright,
Jerald Noble, Nathan Catlin, Ruth Davenport, Xiaoxian Liu, Qinyin Ling, Dr. Jason Orr, and
Dr. Stela Palii. In addition, I would like to thank the Plant Molecular and Cellular Biology
program, the Department of Biology, and the China Scholarship Council for financial support.
I cannot give enough thanks to my parents who always love, support and
understand me. Their love is the source of my strength. I thank my beloved fiancé
Anirudh Gangadhar for showing up in my life. He supported me continuously to
overcome obstacles encountered while pursuing my Ph.D. degree.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER
1 INTRODUCTION
  Alternative Splicing Mechanism
  Intron Origin Hypothesis
  MYB Gene Family Evolution
  TIFY Gene Family Evolution
2 EVOLUTION OF THE 3R-MYB GENE FAMILY IN PLANTS
  Background
  Materials and Methods
    Identification of the 3R-MYB Proteins
    Multiple Sequence Alignments and Phylogenetic Analysis
    Domain and Motif Identification
    Synonymous Divergence among Paralogs
    Syntenic Block Identification
    Identification of Intron Positions and AS Analysis
    Analysis of Motifs in Promoter Regions
    Gene Expression Analysis
  Results
    Global Identification of 3R-MYB Proteins from 65 Plant Species
    Phylogenetic Analysis of the Plant 3R-MYB Proteins
    Synteny
    Synonymous Divergence Analysis of the Three Group 3R-MYBs in Angiosperms
    The Evolutionary History of the Plant 3R-MYBs Motifs
    Gene Structure Evolution
    Alternative Splicing of the Plant 3R-MYBs
    MSA cis-Regulatory Element Prediction (Cell Cycle Regulation)
    Expression Pattern of the Plant 3R-MYBs under Abiotic Stresses
  Discussion
    Patterns of Duplication and Loss in Plant 3R-MYB Genes
    DNA-Binding Domain and Regulatory Motifs
    Intron Gain and Gene Structure Evolution
    AS Regulation of the Plant 3R-MYBs
    Plant 3R-MYBs: Link between Cell Cycle and Abiotic Stresses
3 JASMONATE INDUCED ALTERNATIVE SPLICING RESPONSES IN ARABIDOPSIS
  Background
  Materials and Methods
    Plant Growth, MeJA Treatment and Harvesting, Transcriptome Library Preparation and Sequencing
    Transcriptome Assembly and Differential AS Detection
    Open Reading Frame Prediction
    Protein Interaction Network Analysis
    miRNA Target Predication
    Protein Extraction, Digestion, iTRAQ Labeling and LC-MS/MS
    Proteomics Data Analysis
  Results
    Transcriptome Sequencing and Genome-Guided Assembly
    Jasmonate-Related Protein Interaction Network
    Regulation of Transcription Factors (bHLHs and MYBs) and Splicing Factors (SRs and hnRNPs)
    Differential Alternative Splicing in Response to MeJA Treatment
    Alternative Splicing Variants Differentially Targeted by miRNA
    Alternative Splicing Variants with Novel Functions
    Alternative Splicing Variant of bHLH160 with Potential Novel Function
    Proteomics Validation for Alternative Splicing
  Discussion
    Regulatory Functions of JAZ2 and JAZ7
    Alternative Splicing Coupled miRNA Regulation
    Functional AS Regulation
4 ORIGIN AND EVOLUTION OF THE TIFY PLANT-SPECIFIC MULTI-DOMAIN GENE FAMILY
  Background
  Materials and Methods
    Identification of Members in the TIFY Gene Family
    Multiple Sequence Alignment and Phylogeny
    Domain Identification
    Rate-Shift Analysis of the TIFY Domain
    Alternative Splicing Analysis
  Results
    Domain Identification and Evolutionary History
    Gene Family Identification and Evolution History
    Domain Dynamics during Evolution
    Alternative Splicing of Jas-like Intron in PPD Genes
  Discussion
    Evolutionary History of the TIFY Family
    Poales Experienced Many Gene Loss Events
    Domain Loss of TIFY Multidomain Family during Evolution
    Alternative Splicing of Jas-like Intron of PPD Genes
5 CONCLUSIONS AND PERSPECTIVES
  Evolution and Function of the Plant 3R-MYBs
  Origin and Evolution of the Multidomain TIFY Family
  AS Regulation of Arabidopsis under Jasmonate Treatment
LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
2-1 Data resource summary of the sixty-five plant species used in this study.
2-2 Positive selection test results.
3-1 RNA-seq library and mapping information.
3-2 Gene isoform number in TAIR10 and MeJA RNA-Seq data.
3-3 Differential AS or expression in treatment comparisons.
3-4 Differential AS or expression in tissue comparisons.
3-5 Differential AS or expression between mutant backgrounds.
LIST OF FIGURES
2-1 Species phylogeny and numbers of 3R-MYB genes in each species.
2-2 Subgroup classification of the plant 3R-MYBs.
2-3 Whole length protein ML phylogenetic tree of the plant 3R-MYBs.
2-4 Syntenic blocks in algae and Amborella that contain 3R-MYB genes.
2-5 Tests for origin of the three groups of the plant 3R-MYB genes.
2-6 Multiple protein alignments of motif 4 with representative species
2-7 Analysis of DNA binding domain of the plant 3R-MYBs proteins.
2-8 Intron evolution of the DNA-binding-domain region of the plant 3R-MYBs.
2-9 AS of 3R-MYBs in multiple plant species.
2-10 Predicted MSA element within promoter of the plant 3R-MYB genes.
2-11 Violin plots of MSA core sequences in the upstream regions for each group.
2-12 Expression of the Arabidopsis 3R-MYB genes under abiotic stresses.
2-13 Expression of 3R-MYB genes from nine angiosperm species under abiotic stresses.
2-14 Model of plant 3R-MYB evolution
3-1 jaz2 and jaz7 mutant characterization
3-2 Characterization of the assembled transcripts from the MeJA project.
3-3 Protein interaction network of AS genes with differential expression/AS.
3-4 Heatmap of differentially expressed AS genes of the transcription factor (bHLH and MYB) or splicing factor (SR and hnRNP) gene family.
3-5 Venn diagram of differential AS or expression genes.
3-6 Selective cases in which gene expression was regulated by AS in response to MeJA treatment.
3-7 Two genes under AS regulation in response to MeJA treatment.
3-8 AS genes with multiple miRNA target sites differentially targeted by miRNA between isoforms.
3-9 Twenty-one genes which contain miRNA binding sites subjected to AS regulation
3-10 miRNA regulation and expression profiles of SMZ, AAO2 and At3g02740
3-11 Cases in which AS won’t generate multiple protein products.
3-12 Genes with different domain structure of the transcript isoforms due to AS.
3-13 Arabidopsis bHLH160 AS pattern and proposed regulatory function.
3-14 Mapped reads supporting the AS junction in Arabidopsis bHLH160b-.
3-15 Proteomics validated AS isoform expression.
3-16 Differential miRNA regulation of SPL4 splice variants.
4-1 Logos of the domains in the TIFY family.
4-2 Domain distribution across 76 plant species.
4-3 Distribution of ZML, JAZ, PPD and TIFY family in plant species
4-4 ML tree of ZML, JAZ, PPD and TIFY family.
4-5 Estimated evolutionary history of the four families in the TIFY family.
4-6 Rate-shift sites in the TIFY domain across the four families.
4-7 Domain dynamics of the PPD, TIFY and ZML families.
4-8 AS in the Jas-like intron of PPD genes
LIST OF ABBREVIATIONS
AltA Alternative acceptor
AltD Alternative donor
ANOVA Analysis of variance
AS Alternative splicing
CBF C-repeat-binding factor
CCT CONSTANS, CO-like, TOC1
cDNA Complementary DNA
COI1 Coronatine Insensitive 1
dN Nonsynonymous substitutions per nonsynonymous site
DNA Deoxyribonucleic acid
DREB Dehydration responsive element-binding factor
dS Synonymous substitutions per synonymous site
ExonS Exon skipping
HMM Hidden Markov model
hnRNP Heterogeneous nuclear ribonucleoprotein
IntronR Intron retention
JA Jasmonic acid
JA-Ile (+)-7-iso-Jasmonoyl-L-isoleucine
Jas Jasmonate-associated
JAZ Jasmonate-ZIM domain
MeJA Methyl jasmonate
miRNA Micro-RNA
ML Maximum likelihood
mRNA Messenger RNA
MSA Mitosis-Specific Activator
NINJA Novel interactor of JAZ
NMD Nonsense-mediated mRNA decay
NRT1.8 Nitrate transporter 1.8
NUDX9 Nudix hydrolase homolog 9
ORF Open reading frame
PPD PEAPOD
PTC Premature termination codon
RNA Ribonucleic acid
RNA-seq mRNA-sequencing
rRNA Ribosomal RNA
snRNA Small nuclear RNA
snRNP Small nuclear ribonucleoprotein
SR Serine/Arginine-rich
TPL TOPLESS
tRNA Transfer RNA
UTR Untranslated region
WGD Whole genome duplication
ZIM Zinc-finger protein expressed in Inflorescence Meristem
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
EVOLUTION OF PLANT 3R-MYB AND TIFY GENE FAMILIES AND
JASMONATE INDUCED ALTERNATIVE SPLICING RESPONSES IN ARABIDOPSIS
By
Guanqiao Feng
December 2017
Chair: William Bradley Barbazuk
Major: Plant Molecular and Cellular Biology
One of the biggest challenges for a plant’s survival is to deal with various abiotic
and biotic stresses. During evolution, many gene families related to stress responses
have expanded. However, the link between the expansion of these
families and the adaptation of plants to their environment is not clear. In my Ph.D. research I
focused on the molecular evolution and function of two important gene families: 3R-MYB
and TIFY. 3R-MYB and TIFY genes were identified from ~70 plant species
including algae and all major lineages of land plants. Duplication events giving rise to
the expansion of the two families were identified and placed in the context of plant
evolution and speciation. In the 3R-MYB project, I further explored motif
and domain organization, gene structure, alternative splicing (AS), promoter elements, and
expression patterns under abiotic stresses. In the TIFY project, I focused on the evolution of
domain architecture, rate-shift analysis of the TIFY domain, which may contribute to
functional divergence among subfamilies, and AS conservation and dynamics.
Jasmonic acid (JA) is a phytohormone induced by wounding and herbivore attack.
Many MYB and TIFY genes play an important role in the jasmonate signaling pathway.
In plants, AS, a posttranscriptional mechanism providing fast responses to
endogenous and exogenous stimuli, occurs in ~60% of the protein-coding genes in
the genome. In my third project, I focused on jasmonate-induced AS responses in
Arabidopsis using transcriptome and proteome data. Three aspects of AS-related
regulation were addressed: 1) differential expression of AS isoforms identified by a
change in the proportion of AS isoforms from genes in response to methyl jasmonate
(MeJA); 2) genes that undergo differential AS and produce isoforms with potential
miRNA target sites; and 3) genes that undergo AS to produce splice variants with novel
functions. I observed cases where AS alone or AS and transcription together can
influence gene expression in response to jasmonate treatment. Twenty-one genes
which show differential AS were also predicted to be differentially targeted by miRNAs. I
identified 30 cases where alternatively spliced isoforms may have novel functions. For
example, AS of bHLH160 generates an isoform without a basic domain, which may
convert it from an activator to a repressor.
CHAPTER 1
INTRODUCTION
Alternative Splicing Mechanism
Precise and quick response to environmental and developmental change is
crucial for organism survival. AS, through which multiple mRNA products are generated
from a single gene, is a posttranscriptional modification of mRNA that may offer a quick
response to stimuli in eukaryotes. In yeast, for example, AS of ribosomal protein-encoding
genes is induced within minutes in response to amino acid starvation and other stresses
(Pleiss et al. 2007). More than 95% of animal multi-exon genes and more
than 60% of plant multi-exon genes undergo AS (Pan et al. 2008; Marquez et al. 2012;
Zhang et al. 2017), leading to increased transcriptome and proteome complexity.
Introns and exons in pre-mRNA are bounded by conserved sequences that
define their ends. AS is regulated by the interactions of trans-regulatory proteins and
cis-regulatory elements, called splicing factors and splicing signal sequences,
respectively (Nilsen 2003). In general, strong cis-regulatory elements, which are highly
conserved, lead to consistent splicing, while weak cis-regulatory elements lead to
inconsistent splicing (Keren et al. 2010; Reddy et al. 2013). Depending on the
regulatory direction and location, cis-regulatory elements could be divided into four
groups: exonic splicing silencers, exonic splicing enhancers, intronic splicing silencers
and intronic splicing enhancers (Kornblihtt et al. 2013). Similarly, trans-regulatory
proteins can act as activators or repressors during the splicing process. For example,
members of the SR family and of the hnRNP family have been shown to act as activators and
repressors, respectively (Reddy 2004; Martinez-Contreras et al. 2007). Splice site
selection varies with cellular stress, developmental stage and cell type, which allows
splicing patterns to adjust to environmental stresses (Filichkin et al. 2015).
The splicing reaction is mediated by a holoenzyme called the spliceosome, which
is a large macromolecular complex composed of five snRNAs and tens to hundreds of
auxiliary proteins (Nilsen 2003). The snRNAs function in RNA-protein complexes
called snRNPs (Wahl et al. 2009). The conserved sequences that define a typical
intron include the 5’ splice site, branch site, polypyrimidine tract and 3’ splice site (Schwartz
et al. 2008). Based on the splicing signal sequence, introns can be divided into two
types: U2 and U12 introns (Patel and Steitz 2003; Will and Luhrmann 2005; Basu et al.
2008). More than 99% of introns in plants and animals are U2 introns (Will and
Luhrmann 2005; Basu et al. 2008). Although the two types of introns require different
splicing factors and spliceosomes, the catalytic reactions required to remove both types
of introns are similar (Simpson and Brown 2008). U2 intron splicing requires the U1, U2,
U4, U5 and U6 snRNPs, whereas U12 intron splicing requires the analogous U11, U12,
U4atac and U6atac snRNPs (Will and Luhrmann 2005; Simpson and Brown 2008). The U2 intron
splicing process is a two-step reaction (Schwartz et al. 2008; Kornblihtt et al. 2013). In
preparation for splicing, the 5’ splice site is bound by the U1 snRNP, the 3’ splice site is
bound by the U2AF65 and U2AF35 auxiliary proteins, and the branch site is initially
bound by branchpoint-binding protein, which is subsequently replaced by the U2 snRNP
(Nilsen 2003; Wahl et al. 2009; Kornblihtt et al. 2013). In the first reaction, the 2’
hydroxyl group of the branchpoint adenosine attacks the phosphodiester bond at the 5’
splice site, forming an intermediate structure called the intron lariat (Nilsen
2003; Wahl et al. 2009). In the second reaction, the free 3’ hydroxyl group of the 5’
exon attacks the phosphodiester bond at the 3’ splice site, leading to
excision of the intron lariat and ligation of the exons (Nilsen 2003; Wahl et al. 2009).
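The sequence signals that define an intron can be illustrated with a minimal sketch. It assumes the canonical GT...AG terminal dinucleotides found in most U2-type introns, which are not stated explicitly in the text, and it summarizes the polypyrimidine tract as the fraction of C/T bases in a hypothetical 20-nt window upstream of the 3’ splice site. The function name and window size are illustrative choices only; this is not a splice site predictor.

```python
def describe_intron(intron_seq, tract_window=20):
    """Summarize the terminal splice signals of one intron sequence (5'->3').

    Assumptions (illustrative, not from the dissertation):
    - canonical U2-type introns start with GT and end with AG;
    - the polypyrimidine tract lies in the `tract_window` bases
      immediately upstream of the terminal AG.
    """
    # Accept DNA or RNA, upper or lower case.
    seq = intron_seq.upper().replace("U", "T")
    donor = seq[:2]        # first two bases: 5' splice site dinucleotide
    acceptor = seq[-2:]    # last two bases: 3' splice site dinucleotide
    # Bases just upstream of the acceptor dinucleotide.
    window = seq[-(tract_window + 2):-2]
    if window:
        py_fraction = sum(base in "CT" for base in window) / len(window)
    else:
        py_fraction = 0.0
    return {
        "donor": donor,
        "acceptor": acceptor,
        "canonical_gt_ag": donor == "GT" and acceptor == "AG",
        "pyrimidine_fraction": py_fraction,
    }


if __name__ == "__main__":
    # Made-up intron: GT donor, a pyrimidine-rich stretch, AG acceptor.
    example = "GTAAGTACGATCGATACTAAC" + "TTCTTTCCTTTTCTTTCTTT" + "AG"
    print(describe_intron(example))
```

A high pyrimidine fraction combined with GT...AG termini is consistent with a strong, constitutively recognized intron in this toy model; real splice site strength depends on many more positions and on trans-acting factors.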
AS generates transcript isoforms that carry different portions of the gene sequence.
Relative to the reference transcript, AS events can be categorized into a few basic types:
ExonS, IntronR, AltD, and AltA (Sammeth et al. 2008; Barbazuk et al. 2008). The proportion
of each type differs between animals and plants (Barbazuk et al. 2008). In animals,
the major type of AS is ExonS, which accounts for ~40% of AS events, while IntronR accounts for only
~10% (Kim et al. 2007). In contrast, IntronR is common in plants and accounts for
~40% of AS events, while ExonS accounts for only ~10% (Wang and Brendel 2006;
Marquez et al. 2012). These differences in AS between animals and plants have led to the
assumption that different splicing regulatory mechanisms exist in plants vs. animals.
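As an illustration only, the four basic event types can be expressed as coordinate comparisons between a reference isoform and an alternative isoform. The sketch below is not the detection pipeline used in this work. The function names, the 1-based inclusive exon coordinates and the plus-strand orientation are assumptions made for the example; on the minus strand the donor and acceptor ends are swapped.

```python
def introns(exons):
    """Return introns as (start, end) gaps between consecutive exons."""
    exons = sorted(exons)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def contains_feature(outer, features):
    """True if any (start, end) feature lies entirely inside `outer`."""
    o_start, o_end = outer
    return any(o_start <= s and e <= o_end for s, e in features)


def classify_as_events(ref_exons, alt_exons):
    """Label the basic AS event types separating two isoforms of one gene.

    Coordinates are 1-based, inclusive, on the plus strand (assumption).
    Returns a sorted list drawn from: ExonS, IntronR, AltD, AltA.
    """
    events = set()
    ref_introns, alt_introns = introns(ref_exons), introns(alt_exons)

    # IntronR: a reference intron is fully covered by one alternative exon.
    for intron in ref_introns:
        if any(e_start <= intron[0] and intron[1] <= e_end
               for e_start, e_end in alt_exons):
            events.add("IntronR")

    # ExonS: a reference exon falls entirely inside an alternative intron.
    for alt_intron in alt_introns:
        if contains_feature(alt_intron, ref_exons):
            events.add("ExonS")

    # AltD / AltA: introns sharing one boundary but differing at the other,
    # excluding pairs that differ because a whole exon was skipped.
    for r in ref_introns:
        for a in alt_introns:
            if contains_feature(a, ref_exons) or contains_feature(r, alt_exons):
                continue
            if r[1] == a[1] and r[0] != a[0]:
                events.add("AltD")   # same acceptor, different donor
            if r[0] == a[0] and r[1] != a[1]:
                events.add("AltA")   # same donor, different acceptor
    return sorted(events)


if __name__ == "__main__":
    reference = [(1, 100), (201, 300), (401, 500)]
    print(classify_as_events(reference, [(1, 100), (401, 500)]))
    print(classify_as_events(reference, [(1, 300), (401, 500)]))
```

Comparing each isoform against a single reference is a simplification; genome-wide surveys typically compare all isoform pairs or work from a splice graph.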
Two models have been proposed for splice site selection: exon definition and intron
definition (Keren et al. 2010). The exon definition model suggests that the spliceosome
recognizes, and assembles across, exons and then splices out the introns. In contrast, the intron
definition model suggests that the spliceosome recognizes, and assembles across,
introns (Keren et al. 2010). Because animals have small exons and large introns, the exon
definition model is thought to predominate in animals. In comparison, plants possess
small introns and large exons, and splicing is thought to occur through intron definition
(Keren et al. 2010). These differences in exon vs. intron definition agree with the
observed differences in AS event type preference – a mistake in defining an exon would
lead to ExonS in animals, while a mistake in intron definition in plants would result in
IntronR.
The functional outcome of AS largely depends on the fate of alternatively spliced
transcripts. Will the alternatively spliced isoform be translated into protein? If so, does
the “new” protein have a function? AS increases the complexity of the proteome if the
splicing event affects the coding region and the alternatively spliced transcripts are translated.
Thus, AS may account in part for the discrepancy between the
number of protein-coding genes in a genome and the much larger proteome (Keren et
al. 2010). However, the contribution of AS to protein complexity has been observed to be limited
(Severing et al. 2009). AS frequently generates nonsense mRNAs with PTCs that are
subject to NMD. These unproductive transcripts may participate in
posttranscriptional regulation of protein levels (Lewis et al. 2003; Filichkin and Mockler 2012;
Drechsel et al. 2013). Thus, AS may play important roles in environmental adaptation
and developmental regulation (Staiger and Brown 2013). A combination of
transcriptome and proteome research would contribute to a better understanding of the
functional regulatory mechanism of AS.
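Whether an isoform carries a PTC can be checked with a simple positional rule. The sketch below assumes the widely used heuristic that a stop codon lying at least about 50 nt upstream of the last exon-exon junction marks a transcript as a likely NMD target. That threshold comes mainly from mammalian studies, is not taken from this dissertation, and may not hold equally in plants. The function names and the 0-based coordinates are likewise illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(mrna, cds_start):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for pos in range(cds_start, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            return pos
    return None


def has_ptc(mrna, cds_start, exon_lengths, min_distance=50):
    """Flag a spliced transcript whose stop codon looks premature.

    mrna         : spliced transcript sequence, 5'->3'
    cds_start    : 0-based index of the A in the start codon
    exon_lengths : exon lengths in transcript order
    min_distance : assumed NMD threshold in nt (heuristic, see lead-in)
    """
    mrna = mrna.upper().replace("U", "T")
    if len(exon_lengths) < 2:
        return False          # no exon-exon junction, rule does not apply
    stop = first_stop(mrna, cds_start)
    if stop is None:
        return False          # no in-frame stop found
    # Transcript index of the first base of the last exon.
    last_junction = sum(exon_lengths[:-1])
    # Distance from the end of the stop codon to the last junction.
    return last_junction - (stop + 3) >= min_distance


if __name__ == "__main__":
    # Toy transcripts, 120 nt each, two exons of 90 and 30 nt.
    early_stop = "ATG" + "GCT" * 3 + "TAA" + "GCT" * 35
    normal_stop = "ATG" + "GCT" * 35 + "TAA" + "GCT" * 3
    print(has_ptc(early_stop, 0, [90, 30]))
    print(has_ptc(normal_stop, 0, [90, 30]))
```

In practice the retained intron or shifted splice site that introduces the stop codon would first be incorporated into the transcript sequence, and proteome evidence would be needed to show whether a flagged isoform is in fact degraded or translated.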
Intron Origin Hypothesis
Introns are non-coding sequences that interrupt the coding sequence of a gene.
Introns are spliced out of the pre-mRNA leading to the uninterrupted coding potential of
mature mRNAs. Introns were first discovered in protein-coding genes (Berget et al.
1977; Chow et al. 1977), followed by the discovery of introns in tRNA-coding genes
(Goodman et al. 1977; Valenzuela et al. 1978) and rRNA-coding genes (LaPolla and
Lambowitz 1979; Dujon 1980). Introns have been found in all biological kingdoms
as well as in viruses (Irimia and Roy 2014). Group I and II introns are found
in protein-coding genes, tRNA genes and rRNA genes of fungal, mitochondrial and
chloroplast DNA (Saldanha et al. 1993); group I introns are also found in
bacteriophages, eubacteria and nuclear genes of lower eukaryotes (Saldanha et al.
1993). Group I and II introns are self-splicing, although their mechanisms differ. Group
III introns, also called spliceosomal introns, exist in protein-coding genes in the nuclear
genomes of eukaryotes (Brown et al. 1992). Their splicing is directed by the
spliceosome in the two-step manner described above. The catalytic splicing reaction
of Group II introns is identical to that of Group III introns, although it requires no participation
from other RNA enzymes or proteins (Saldanha et al. 1993). The origin of the
spliceosomal intron is a topic of debate in molecular evolution (Koonin 2006; Rogozin et
al. 2012).
Two opposing explanations, the introns-early and introns-late hypotheses, have dominated
the debate for almost 25 years (Doolittle 1978; Darnell 1978; Cavalier-Smith
1991; Palmer and Logsdon 1991). According to the introns-early hypothesis, great
numbers of spliceosomal introns existed in the common ancestor of prokaryotes and
eukaryotes, but there was massive intron loss within independent lineages during
evolution (Doolittle 1978; Gilbert 1987; Roy et al. 1999; Fedorov et al. 2001). In
contrast, supporters of the introns-late hypothesis argue that spliceosomal introns
originated in early eukaryotic species and were gained during eukaryote evolution
(Cavalier-Smith 1985; Logsdon et al. 1995; Logsdon 1998; Stoltzfus 1999). All
sequenced genomes of eukaryotic species contain spliceosomal introns, even those of the
earliest lineages of eukaryotes (Fedorova and Fedorov 2003; Rogozin et al. 2012). The
only reported case of an intronless eukaryotic genome is the Hemiselmis andersenii
nucleomorph genome, which is a highly degraded remnant genome (Lane et al. 2007).
The introns-early hypothesis argues that intron loss is the main driver of intron evolution,
whereas the introns-late hypothesis argues that intron gain drives intron evolution. Analysis of
completely sequenced eukaryotic genomes shows intron-rich and intron-poor species
interspersed throughout the phylogeny, with no simple phylogenetic pattern in their
distribution (Roy and Gilbert 2006). Because evidence exists both for
ancestral introns shared by orthologous genes from animals, plants and protists, and for
newly evolved introns in some lineages (Logsdon et al. 1995; Fedorov et al. 2001), the
debate between the introns-early and introns-late hypotheses has not been resolved. As a
compromise between the two extreme views of whether introns originated before or after
the divergence of prokaryotes and eukaryotes, it has been proposed that the
progenitor spliceosomal introns evolved from Group II introns during
eukaryogenesis, followed by both intron gain and intron loss, giving rise to the current
intron distribution (Koonin 2006).
Three evolutionary forces (intron loss, intron gain and intron sliding) are thought
to play important roles in the evolution of intron diversity (Tarrío et al. 2008). For intron
loss, two models have been proposed: 1) the reverse transcription-recombination
model, in which cDNA is generated by reverse transcription of mRNA, followed by
recombination with the genomic copy, leading to intron loss; and 2) the genomic deletion model,
in which introns are lost by direct and exact genomic deletion (Roy and Gilbert 2006 and
references therein). For intron gain, there are at least five models: 1) intron
transposition, in which the spliced intron re-inserts itself into mRNA at a different
position, followed by reverse transcription; 2) intron transfer, in which recombination
between paralogous genes leads to insertion of an intron into a new position; 3) conversion
of a Group II intron into a spliceosomal intron; 4) transposon insertion; and 5) tandem
genomic duplication (Roy and Gilbert 2006 and references therein). An AS model has
been proposed for intron sliding, in which a change of a strong splice signal to a weak
one (or the reverse) leads to a change in the major isoform (Tarrío et al. 2008). On the one
hand, AS could potentially affect intron evolution by driving intron sliding; on the other
hand, the evolution of an intron, which contains cis-regulatory splicing elements, would
in turn affect AS. A combined analysis of both would help to clarify the role AS plays in
intron evolution as well as the influence intron evolution has on AS regulation.
MYB Gene Family Evolution
MYB transcription factors first appeared more than a billion years ago and they
are widely distributed in eukaryote species including slime molds, fungi, plants and
animals (Lipsick 1996). The basic unit of the MYB domain, the ‘R’ repeat, is
classified into three main types: R1, R2 and R3. Differences in the copy number and/or order
of these repeats make the evolutionary history of the MYB gene family puzzling as well as
enchanting. Solving this puzzle requires clear identification of the MYB types in
different organisms. Animals mainly carry 3R-MYBs, whereas plants mainly carry 3R-MYBs,
R2R3-MYBs and MYB-related genes (Lipsick 1996; Dubos et al. 2010; Davidson et al.
2012). One question that arises is which MYB type was the progenitor before the
divergence of plants and animals: 3R-MYB, R2R3-MYB, or MYB-related? Two
models for MYB evolution have been proposed: the loss model (Lipsick 1996) and the
gain model (Jiang et al. 2004). The loss model postulates that a single progenitor
repeat R replicated, giving rise to two- and three-repeat MYB domains (3R-MYB, also
called R1R2R3-MYB) before the divergence of plants and animals. The three-repeat
MYB domain is the progenitor MYB of plants and animals. During evolution, animals and
plants kept the 3R-MYB proteins, but some plant MYBs also lost the R1 domain, giving
rise to the R2R3-MYBs (Lipsick 1996). Subsequently, the R2R3-MYB subfamily
underwent great expansion during evolution. The gain model predicts that R2R3-MYB
and 3R-MYB coexisted in the common ancestor of plants and animals. During
evolution, plants kept both R2R3-MYB and 3R-MYB, while animals kept 3R-MYB but
lost R2R3-MYB (Jiang et al. 2004). The loss model is better supported by the
conservation of 3R-MYBs between plants and animals and the phylogenetic relationship
between R2R3-MYBs and 3R-MYBs.
In addition to the broad evolutionary relationships of the MYB gene family, how
MYB evolved within animal or plant species is also under investigation. The evolution of
the MYB gene family in animals is relatively clear. Invertebrate species usually harbor
one copy of 3R-MYB, whereas vertebrates usually have three copies of 3R-MYB
(Lipsick 1996; Davidson et al. 2012). It has been suggested that a single copy of 3R-MYB
underwent two rounds of duplication and gave rise to B-MYB, A-MYB and c-MYB in
vertebrate species (Davidson et al. 2012). The animal MYB evolution model is well
supported by phylogenetic analysis.
In contrast, the function and phylogeny of the MYB gene family members in
plants are very poorly conserved. In addition, the plant MYB gene family includes at
least three types of MYBs: 3R-MYB, R2R3-MYB and MYB-related. Taken as a whole,
this broad and disparate collection makes evolutionary analysis of the plant MYB gene family a
difficult task. Sequence conservation of the DNA-binding domain and regulatory motifs
currently divides the R2R3-MYB group into 22 subfamilies (Kranz et al. 1998). There
are reports detailing the identification of the MYB gene family in different plant species,
and a few comparative studies of two or a few species. However, the detailed expansion
pattern of the MYB gene family in plant species is still not clear. The increasing number of
plant species with sequenced and annotated genome drafts is now enabling a
broad evolutionary analysis of the MYB gene family across the plant kingdom.
The MYB transcription factor gene family, which regulates gene expression, is
itself under regulation by AS. To the best of our knowledge, only two
research reports have been published on AS of MYBs in plants. Arabidopsis
AtMYB59 and AtMYB48, and their rice homologs AK111626 and AK107214, share
a conserved AS pattern, and the expression levels of their splice variants are regulated by
hormone and stress treatments (Li et al. 2006). A genome-scale analysis of
Cucumis sativus identified fifty-five R2R3-MYBs, among which eight exhibit AS events
(Li et al. 2012). All identified MYBs that have AS events show a certain level of
conservation of their AS pattern (Li et al. 2006; Li et al. 2012). Many questions
regarding AS of the MYB gene family are currently unanswered: What proportion of genes
in the MYB gene family undergo AS? What are the outcomes of splicing, and are the
transcript isoforms translated? Is AS conserved during MYB evolution/plant evolution? If
yes, what is the conserved pattern? If not, how does it evolve?
In the plant R2R3-MYB family, two conserved introns have been identified, with
the first conserved intron located in the R2 domain region and the second conserved intron
located in the R3 domain region (Jiang et al. 2004; Matus et al. 2008). Based on the presence
of these two introns, R2R3-MYB genes can be divided into four basic types: 1)
single-exon genes; 2) two-exon genes with a conserved intron in the R2 domain; 3) two-exon
genes with a conserved intron in the R3 domain; and 4) three-exon genes with introns in both the R2
and R3 domains. Intron position and gene structure are largely conserved within
subfamilies. For subfamilies dominated by three-exon genes, the two conserved introns
show occasional loss in several species. Besides the two typical conserved introns,
more ancient R2R3-MYB subfamilies show other intron positions within the R2 and
R3 domain region, which are also conserved within subfamilies. For example, intron
pattern analysis of soybean R2R3-MYB genes identifies a total of 14 different patterns
(Du et al. 2012), which suggests variation in intron position in R2R3-MYBs.
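
The four basic gene structures described above amount to a lookup on the presence of the two conserved introns. The following is a minimal sketch of that lookup. The function name and its boolean inputs are hypothetical and are not part of the original analysis, and the sketch assumes that no introns other than the two conserved ones are present.

```python
def r2r3_structure_type(has_r2_intron: bool, has_r3_intron: bool) -> str:
    """Map presence of the two conserved introns to one of the four basic types.

    Assumes the gene carries no introns other than the two conserved ones.
    """
    if has_r2_intron and has_r3_intron:
        return "three-exon gene (introns in R2 and R3)"
    if has_r2_intron:
        return "two-exon gene (intron in R2)"
    if has_r3_intron:
        return "two-exon gene (intron in R3)"
    return "single-exon gene"
```

Genes with additional intron positions, such as those in the more ancient subfamilies, would need a richer representation than two booleans.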
Similar to R2R3-MYBs, plant 3R-MYBs contain conserved introns within the DNA-binding
domain region. However, the intron positions of 3R-MYBs differ from those in
R2R3-MYBs. During evolution, intron position shows both conservation and change. It
would be interesting to see whether the evolution of MYBs supports the intron gain or the
intron loss hypothesis.
TIFY Gene Family Evolution
Multiple domain proteins are more prevalent than single domain proteins in both
prokaryote and eukaryote genomes. In prokaryotes, approximately 60% of proteins
have multiple domains, while in eukaryotes, more than 70% of proteins have multiple
domains (Vogel et al. 2004; Han et al. 2007). In multiple domain proteins, the
function of individual domains may contribute to the overall function of the protein (Han
et al. 2007). In other cases, the recombination of two domains could link two
biochemical processes together (Han et al. 2007). Although over half of all the domain
families are common to bacteria, archaea, and eukaryotes, only 5% of the two-domain
families are common to the three domains of life, suggesting that the emergence of new
domain combinations is related to speciation (Apic et al. 2001; Madrea et al. 2004; Vogel
et al. 2004). Understanding the evolution of multi-domain gene families is important to
better understand the evolution of proteins and the process of speciation.
The TIFY family is a multi-domain gene family defined by a highly conserved
TIFY domain, which is about 36 amino acids long and forms a beta-beta-alpha fold
(Vanholme et al. 2007; Chung et al. 2009). Proteins within the TIFY family can be
further divided into four subfamilies based on the presence of other domains (Bai et al.
2011): 1) the TIFY subfamily, with only the TIFY domain; 2) the ZML subfamily, with the TIFY
domain along with a CCT and a C2C2-GATA zinc finger domain (referred to as GATA
hereafter for simplicity); 3) the PPD subfamily, with the TIFY domain along with a PPD and a
Jas-like domain; and 4) the JAZ subfamily, with the TIFY domain and a Jas domain. The TIFY
domain functions in protein-protein interactions, which include interactions between
TIFY proteins (Chini et al. 2009) and interactions of TIFY proteins with other proteins
such as NINJA (Pauwels et al. 2010). The CCT domain is predicted to contain a nuclear
localization signal (Nishii et al. 2000). The GATA domain is a DNA-binding domain
recognizing specific cis-elements (Nishii et al. 2000; Teakle et al. 2002). The Jas
domain mediates protein-protein interaction with various transcription factors as well as
F-box protein COI1 (Shyu et al. 2012; Chini et al. 2016).
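
The subfamily definitions above can be read as a rule set over the domains annotated on a protein. The sketch below illustrates that reading only. The function name and the domain labels ("TIFY", "CCT", "GATA", "PPD", "Jas-like", "Jas") are hypothetical identifiers chosen for this example, and real domain annotations would first have to be mapped onto them.

```python
def tify_subfamily(domains):
    """Assign a TIFY-family protein to a subfamily from its set of domain labels.

    Returns None when no TIFY domain is present (not a TIFY-family protein).
    """
    d = set(domains)
    if "TIFY" not in d:
        return None
    if {"CCT", "GATA"} <= d:   # TIFY + CCT + C2C2-GATA zinc finger
        return "ZML"
    if "PPD" in d:             # TIFY + PPD (+ Jas-like)
        return "PPD"
    if "Jas" in d:             # TIFY + Jas
        return "JAZ"
    return "TIFY"              # TIFY domain only
```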
The major molecular mechanisms for generating novel domain architectures are
domain duplication and domain shuffling (including domain deletion and domain
insertion) (Vogel et al. 2004; Han et al. 2007; Stolzer et al. 2015). In eukaryote genomes,
exon boundaries and domain boundaries tend to coincide, suggesting that new domain
combinations may be formed by intronic recombination (Vogel et al. 2004). After the
origin of a new domain architecture, the structure and function of the new combination
determine whether it is selected for or against during evolution (Vogel et
al. 2004).
The TIFY gene family is a plant-specific gene family (Bai et al. 2011). No TIFY
genes were observed in the single-celled green alga C. reinhardtii or the multicellular green
alga V. carteri, suggesting that the TIFY family originated after the divergence of algae
from land plants (Bai et al. 2011). As a plant-specific gene family, TIFY genes are
involved in many plant-specific pathways. The ZML genes encode transcription factors
that function in wound-induced lignification in maize (Vélez-Bermúdez et al. 2015) and
photoprotective responses in Arabidopsis (Shaikhali et al. 2012). The PPD genes
regulate leaf morphology in Arabidopsis (White 2006). The JAZ subfamily comprises
transcription repressors playing an important role in the jasmonate signaling pathway
(Chini et al. 2016 and references therein). Jasmonate is an essential phytohormone
regulating plant responses towards biotic and abiotic stresses. As the JAZ proteins are
important regulators in jasmonate responses, the evolution of JAZ proteins could
contribute to our understanding of the origin and establishment of the jasmonate
signaling pathway during plant evolution. The earliest-diverging plant species identified as containing
JAZ genes is the moss P. patens (Katsir et al. 2008; Bai et al. 2011). Interestingly, the
P. patens genome also encodes jasmonate biosynthetic and jasmonate-conjugating
enzymes (Terol et al. 2006; Katsir et al. 2008). It is likely that the early land
plants evolved the jasmonate pathway to better adapt to frequent abiotic and biotic
environmental stresses (Katsir et al. 2008).
CHAPTER 2
EVOLUTION OF THE 3R-MYB GENE FAMILY IN PLANTS
Background
The MYB gene family is broadly distributed in eukaryotes (Lipsick 1996), with
many homologs in plants (Dubos et al. 2010; Feller et al. 2011; Du et al. 2013). MYB
proteins are defined by the presence of one or more MYB domains, typically denoted ‘R’
(for repeat), which occur in the DNA-binding domain of MYB transcription factors
(Lipsick 1996; Martin and Paz-Ares 1997; Rosinski and Atchley 1998). Each R repeat
comprises approximately 52 amino acids that contain three regularly spaced conserved
hydrophobic residues (usually tryptophans) that are essential in forming the
hydrophobic pocket (Ogata et al. 1992). MYB domains fold into three alpha helices, with
the second and third helix forming a helix-turn-helix structure (Ogata et al. 1992). MYB
proteins are classified into four major types (1R-MYB/MYB-related, R2R3-MYB, 3R-
MYB and 4R-MYB) based on their number of repeats (Dubos et al. 2010), although this
classification is not necessarily consistent with the MYB phylogeny. There are three
genes in most vertebrates and fewer than ten genes in angiosperms that encode 3R-
MYB proteins (Feller et al. 2011), which include the product of the prototypical c-myb
gene (the cellular homolog of v-myb; Klempnauer et al. 1982). However, the animal and
plant 3R-MYB gene families appear to be separate clades, and the plant 3R-MYB
genes likely gave rise to the diverse (approximately 100 to 200 genes per species)
R2R3-MYB gene families of plants (Braun and Grotewold 1999; Dias et al. 2003). Thus,
understanding the evolution of the 3R-MYB genes in plants is critical for understanding
the evolution of the plant MYB gene family in general.
The primary function of many different MYB proteins appears to be recognition of
specific DNA sequence motifs (e.g., Ording et al. 1994), although MYB domains also
play a role in protein-protein interactions (e.g., Grotewold et al. 2000). Plant 3R-MYB
proteins recognize MSA elements (Ito et al. 1998; Ma et al. 2009), and play a conserved
role in cell cycle regulation. The 3R-MYB proteins in plants regulate the G2/M transition
(Ito et al. 2001), whereas the animal proteins regulate the G1/S transition (Bergoltz et al.
2001). The DNA element (MSA) that plant 3R-MYBs recognize exists in the upstream
promoter region of G2/M-phase specific genes, such as B-type cyclin genes, and it is
both necessary and sufficient for driving G2/M-phase specific gene expression (Ito et al.
2001; Haga et al. 2007; Kato et al. 2009).
Plant 3R-MYBs often are divided into three groups (the A-, B- and C-group; Ito et
al. 2001; Ito 2005). The tobacco NtMybA1 and NtMybA2 genes (A-group) have variable
expression patterns during the cell cycle, with a peak of expression at M-phase, and their
products bind to the MSA element directly and activate B-type cyclin gene expression
(Ito et al. 2001; Kato et al. 2009). The Arabidopsis orthologs (Myb3R1 and Myb3R4) of
those tobacco genes bind to the MSA elements of B2-type cyclin, CELL DIVISION
CYCLE20.1 and KNOLLE, and up-regulate their expression (Haga et al. 2007).
Consistent with their putative role in the cell cycle, double mutants in these A-group
genes exhibit incomplete cytokinesis, multinucleate cells, and defective cell walls in
Arabidopsis (Haga et al. 2011). In contrast, tobacco NtMybB (B-group) is constantly
expressed during the cell cycle, and it functions as a repressor (Ito et al. 2001). Finally,
one of the C-group genes (OsMYB3R-2 in rice) is involved in both cell cycle and abiotic
stresses (Dai et al. 2007; Ma et al. 2009). The OsMYB3R-2 is induced by stresses, such
as freezing, drought, and salt, and its overexpression under stress conditions
increases stress tolerance and maintains a high level of cell division (Dai et al. 2007).
The pleiotropic effects of OsMYB3R-2 suggest its possible involvement in the B-type
cyclin pathway and the DREB/CBF pathway (Ma et al. 2009). It is unclear whether A-
and B-group 3R-MYB proteins are also involved in abiotic stresses. Plants have sessile
lifestyles, and coping with abiotic stresses is a challenge for their survival. Placing these
functions of 3R-MYB transcription factors in an evolutionary framework is important for
understanding the ways that plants couple cell cycle and abiotic stress responses.
The genetic basis for functional divergence among the A-, B-, and C-groups of
3R-MYB proteins is also unclear. The C-terminal regions of MYB proteins are highly
divergent, and there is substantial length variation among the A-, B-, and C-groups (Ito
et al. 2001). There is a negative regulatory domain located in the C-terminal region that
represses the transactivation activity of NtMybA2 (A-group); specific cyclin/CDK
complex(es) could phosphorylate specific sites in the NtMybA2 protein and remove the
inhibitory effects (Araki et al. 2004). Overexpression of the truncated protein without the
negative regulatory domain up-regulates many G2/M-specific genes compared with
overexpression of the full-length protein in tobacco (Kato et al. 2009). In addition to
these C-terminal regions, there can be divergence within the MYB repeats themselves.
If any such divergent sites exist, they might exhibit shifts in their evolutionary rate
(Gaucher et al. 2002) that would render them detectable.
AS is a process that results in multiple discrete mRNA products from a single
gene. This is a posttranscriptional modification of mRNA that may offer a quick
response to stimuli in eukaryotes. More than 95% of animal multi-exon genes (Pan et al.
2008) and more than 60% of plant multi-exon genes (Marquez et al. 2012) undergo AS.
However, the extent and regulation of AS in the plant 3R-MYBs is largely unknown.
Moreover, the evolutionary forces that shape current intron/exon gene structures (e.g.,
intron gain or intron loss) are unknown.
In this study, I explore the patterns of molecular evolution in the plant 3R-MYB
transcription factor gene family and examine its motif and domain organization, gene
structure, AS, and expression patterns under abiotic stresses. Specifically, I address the
phylogenetic relationships among plant 3R-MYBs, seek to identify candidate sites and
motifs in the 3R-MYB proteins that contribute to their functional divergence, determine
the pattern of intron and AS evolution within the plant 3R-MYBs, and look for evidence
that the A-, B- or C-group 3R-MYBs are involved in abiotic stress responses. Answering
these questions will enhance our understanding of the evolution and function of the 3R-
MYBs in plants and help illuminate the evolution and functional divergence of gene
families encoding plant transcription factors.
Materials and Methods
Identification of the 3R-MYB Proteins
We used HMMER v3.1b2 (Eddy 2011) to conduct profile HMM searches using
the Pfam MYB DNA-binding-domain (PF00249) as a query to search annotated proteins
from 65 plant species (Table 2-1). For gene loci with multiple predicted isoforms, the
primary isoform was used if a primary isoform annotation was available; otherwise the
longest protein was used. We considered sequences with three MYB domains identified
by HMMER with an E-value of ≤ 1.0E-15 to be candidate 3R-MYB proteins. Those
candidate 3R-MYB proteins from the HMMER search were then examined to confirm
that three R repeats are adjacent to one another using the SMART (Letunic et al. 2015),
CDD (Marchler-Bauer et al. 2015), and Pfam (Finn et al. 2014) databases. Proteins with
non-adjacent R repeats or proteins containing other domains besides MYB domains
were removed.
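
The first screening step can be sketched as counting, per protein, the MYB domain hits that pass the E-value threshold. This is a minimal illustration and not the pipeline used in this study. It assumes the HMMER output has already been parsed into (protein identifier, domain E-value) pairs, that every domain hit must individually satisfy the threshold, and that candidates carry exactly three such hits. The function name and input format are hypothetical. The adjacency check against SMART, CDD, and Pfam is not reproduced here.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1.0e-15  # threshold stated in the text


def candidate_3r_mybs(hits, cutoff=E_VALUE_CUTOFF, n_repeats=3):
    """Return proteins with exactly `n_repeats` MYB domain hits passing `cutoff`.

    hits: iterable of (protein_id, domain_evalue) tuples, one per domain hit
          (hypothetical pre-parsed input, not a HMMER file format).
    """
    counts = defaultdict(int)
    for protein_id, evalue in hits:
        if evalue <= cutoff:
            counts[protein_id] += 1
    return sorted(p for p, n in counts.items() if n == n_repeats)
```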
Multiple Sequence Alignments and Phylogenetic Analysis
We generated an amino acid multiple sequence alignment for the 3R-MYB proteins using
Muscle v3.8.31 with default parameters (Edgar 2004) followed by manual improvements
(Supplemental Data 1), and used it as input to generate a ML phylogenetic tree
based on the entire protein lengths with RAxML v8.1.12 (Stamatakis 2014) using the
LG4X model (Le et al. 2012). Eight tree searches were performed to identify the ML
tree. Then I attempted to improve the ML gene tree topologies using TreeFix (Wu et al.
2013b), which takes the ML gene tree topology, the sequence alignment, and a species
tree topology (Figure 2-1) and tries to find an alternative gene tree topology that implies
fewer duplications and losses than the original ML topology without significantly
reducing the likelihood. Five hundred nonparametric bootstrap replicates were run for the
dataset with ML under the LG4X model using RAxML (v8.1.12) (Stamatakis 2014), and
MEGA6 Beta2 software (Tamura et al. 2013) was used to generate the tree figures.
Domain and Motif Identification
We identified group-specific evolutionary rate shifts in the MYB domain region
using a method described by Gaucher et al. (2001). Briefly, I estimated the amino acid
substitution rates of each site in the alignments of the MYB-domains of six groups: 1) A-
group; 2) B-group; 3) C-group; 4) A- and B-groups; 5) B- and C-groups; and 6) A- and
C-groups with PAML v4.8a (Yang 2007) using the LG model (Le and Gascuel 2008)
with Γ-distributed rate variation among sites. We conducted three comparisons: 1) A-
group vs. B- and C-groups; 2) B-group vs. A- and C-groups; and 3) C-group vs. A- and
B-groups. The expected evolutionary rate difference for any comparison of two groups
is zero; large positive or negative values indicate shifts in rates. Sites with amino acid
substitution rate differences larger than 2.57 standard deviations from the mean were
chosen as significantly conserved or dynamic sites.
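
The outlier criterion can be illustrated with a short sketch that takes per-site rate estimates for the two sides of one comparison, computes the per-site differences, and flags sites lying more than 2.57 standard deviations from the mean difference. The rate values are assumed to come from a separate PAML run. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify the estimator.

```python
from statistics import mean, pstdev


def rate_shift_sites(rates_a, rates_b, z_cutoff=2.57):
    """Flag alignment sites whose rate difference is an outlier.

    rates_a, rates_b: per-site substitution rates for the two groups compared,
                      in the same site order (hypothetical inputs).
    Returns a list of (1-based site index, rate difference).
    """
    diffs = [a - b for a, b in zip(rates_a, rates_b)]
    mu = mean(diffs)
    sd = pstdev(diffs)  # assumption: population standard deviation
    if sd == 0:
        return []
    return [(i + 1, d) for i, d in enumerate(diffs)
            if abs(d - mu) > z_cutoff * sd]
```

The sign of the flagged difference indicates which group evolves faster at that site, which is how a site would be called conserved or dynamic in one group relative to the others.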
The branch-site model in PAML v. 4.8a (Yang 2007) was used to examine the
MYB domain of A-, B- or C-groups for positive selection following their divergence and,
if present, to determine the sites of positive selection. In these tests I compared the
alternative model (branch-site model A) with its corresponding null model (model A with
ω2=1 fixed). Additionally, I tested for positive selection in monocots within A- and C-
groups using the same method to detect whether monocot A- and C-groups have
picked up B-group gene function and thus have accelerated evolutionary rates. In the
positive selection tests, the nucleotide alignments of the DNA-binding-domain region
were generated by back-translation of the amino acid alignments with in-house Perl
scripts.
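
Back-translation of this kind is usually done by threading each protein's own coding sequence onto its aligned amino acid sequence, so that each residue is replaced by its original codon and each gap by three gap characters. The sketch below shows this idea in Python. It is not the in-house Perl script referred to above, and the function name and inputs are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Build a codon alignment row from an aligned protein and its unaligned CDS.

    Each residue consumes the next codon of `cds`; each gap becomes three gaps.
    Trailing codons in `cds` (for example a stop codon) are ignored.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if n_residues > len(codons):
        raise ValueError("CDS is shorter than the aligned protein")
    row = []
    k = 0  # index of the next unused codon
    for aa in aligned_protein:
        if aa == gap:
            row.append(gap * 3)
        else:
            row.append(codons[k])
            k += 1
    return "".join(row)
```

The sketch does not verify that each codon actually encodes the aligned residue; a production script would normally add that check.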
Motifs in the C-terminus were identified using MEME (Multiple EM for Motif
Elicitation) v.4.10.2 (Bailey et al. 2006). Sequence logos of the C-terminal motifs were
generated with Weblogo Berkeley (http://weblogo.berkeley.edu/logo.cgi).
Synonymous Divergence among Paralogs
PAML v. 4.8a (Yang 2007) was used on the nucleotide alignments described in
the positive selection test (above) to calculate pairwise dS with the one-ratio model (M0)
(Goldman and Yang 1994) for nucleotide alignments of the MYB-domains of paralogous
genes from each of 40 different angiosperm species (Table 2-1). Pair-wise dS values
were placed into six subsets depending on the group membership of the genes being
compared (A vs. A, B vs. B, C vs. C, A vs. B, B vs. C, and A vs. C). Normal distributions
were fit to the dS distributions of the six groups.
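
The binning and fitting step can be sketched as follows. Pairwise dS values are grouped by the (unordered) pair of group labels, and a normal distribution is fit to each subset. The input format and function name are hypothetical, and fitting by sample mean and sample standard deviation is an assumption, since the text does not state the fitting procedure.

```python
from collections import defaultdict
from statistics import NormalDist


def fit_ds_by_comparison(pairs):
    """Fit a normal distribution to dS values for each group comparison.

    pairs: iterable of (group_1, group_2, dS), with groups labelled "A", "B", "C"
           (hypothetical input). The comparison key ignores the order of the pair.
    Subsets with fewer than two values are skipped, since no spread can be estimated.
    """
    subsets = defaultdict(list)
    for g1, g2, ds in pairs:
        key = " vs. ".join(sorted((g1, g2)))
        subsets[key].append(ds)
    return {key: NormalDist.from_samples(values)
            for key, values in subsets.items() if len(values) >= 2}
```

The `mean` attribute of each fitted distribution then serves as the location of the dS peak for that comparison.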
Syntenic Block Identification
In order to investigate whether the origin of the three 3R-MYB genes in
Amborella was due to single gene duplication or segmental duplication events, I
analyzed the synteny blocks in Amborella trichopoda and Ostreococcus lucimarinus.
Syntenic blocks in Ostreococcus lucimarinus and Amborella trichopoda were identified
with DAGchainer (Haas et al. 2004). Ostreococcus and Amborella proteins were aligned
to one another by the all-to-all blastp (v2.2.28) method (Altschul et al. 1990). The
combined file of genome annotation (gff3) and blastp results was supplied to
DAGchainer with default parameters. Syntenic blocks that contain the algal and
Amborella 3R-MYB proteins were plotted in R (R Development Core Team 2008).
Identification of Intron Positions and AS Analysis
We extracted gene structure information from gff3 annotation files for 42 species
(Table 2-1). The evolutionary history of introns in the DNA-binding-domain was
reconstructed using maximum parsimony with the phylogenetic trees constructed in this
study (Figure 2-2A; Figure 2-3). We also examined the 3R-MYB genes from six species
for evidence of AS. Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Oryza
sativa, and Amborella trichopoda AS data were acquired from Chamala et al. (2015),
while AS in Sorghum bicolor was identified using the available reference genome
sequence and annotation (Paterson et al. 2009) and publicly available sorghum RNA-Seq
data (GSE30249 and GSE50464 from Gene Expression Omnibus) (Dugas et al.
2011; Olson et al. 2014) using the methodology described in Chamala et al. (2015).
Among the 25 3R-MYB genes identified within these species, 16 genes have evidence
of alternatively spliced transcripts. The gene structures of the sixteen 3R-MYB genes
were displayed with Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn) (Hu
et al. 2015), and the AS patterns were added with manual editing.
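
As an illustration of the parsimony reconstruction of intron presence and absence, the sketch below implements the bottom-up pass of Fitch parsimony for a single binary character on a rooted binary tree. Fitch parsimony with equal-cost gains and losses is an assumption here, since the text states only that maximum parsimony was used. The tree encoding (nested tuples) and taxon names are hypothetical.

```python
def fitch_sets(tree, leaf_states):
    """Bottom-up Fitch pass for one character.

    tree: a leaf name (str) or a 2-tuple of subtrees (rooted binary tree).
    leaf_states: dict mapping leaf name to state (1 = intron present, 0 = absent).
    Returns (set of most-parsimonious states at this node, minimum number of changes).
    """
    if isinstance(tree, str):  # leaf
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_sets(left, leaf_states)
    right_set, right_cost = fitch_sets(right, leaf_states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no change needed here
        return common, left_cost + right_cost
    # children disagree: keep both options and count one change
    return left_set | right_set, left_cost + right_cost + 1


# Example: an intron present only in the two seed plant taxa
tree = ("alga", ("moss", ("gymnosperm", "angiosperm")))
states = {"alga": 0, "moss": 0, "gymnosperm": 1, "angiosperm": 1}
root_states, n_changes = fitch_sets(tree, states)
```

In this example the root is reconstructed as lacking the intron and a single change is required, which corresponds to one gain on the branch leading to seed plants.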
Analysis of Motifs in Promoter Regions
We examined sequences from the start codon to a point 2000 base pairs
upstream for 160 3R-MYB genes from 41 species (Table 2-1). These putative promoter
regions were searched on both strands for exact matches to the sequence 5’-AACGG-3’,
which is the core consensus sequence of the MSA element
(T/C)C(T/C)AACGG(T/C)(T/C)A. We compared the number of exact matches to 5’-AACGG-3’
in 3R-MYB gene promoters with that in 400 randomly sampled genes. We conducted
a one-way ANOVA and Tukey’s HSD test in R (R Development Core Team 2008) to
examine the hypothesis that 3R-MYB genes have more potential MSA elements than
randomly chosen genes. The number of potential MSA elements for each gene was
transformed by square root to normalize residuals and equalize variances before
statistical tests.
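
The counting and transformation steps can be sketched as below. The code counts exact matches to the core sequence on both strands by also searching for its reverse complement on the given strand, then applies the square root transformation. Function names are hypothetical, and the ANOVA and Tukey’s HSD steps performed in R are not reproduced.

```python
import math

CORE = "AACGG"  # core consensus of the MSA element, as given in the text


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_core_matches(promoter: str, core: str = CORE) -> int:
    """Count exact matches to `core` on both strands of `promoter`."""
    promoter = promoter.upper()
    total = 0
    # A set avoids double counting if the motif equals its reverse complement.
    for motif in {core, reverse_complement(core)}:
        start = promoter.find(motif)
        while start != -1:
            total += 1
            start = promoter.find(motif, start + 1)
    return total


def sqrt_transform(count: int) -> float:
    """Square root transformation applied before the statistical tests."""
    return math.sqrt(count)
```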
Gene Expression Analysis
We examined 3R-MYB gene expression under various abiotic stresses
(heat, cold, drought and salt) with microarray data available from the AtGenExpress
(Arabidopsis thaliana genome transcript expression study) project (Kilian et al. 2007) for
Arabidopsis; and the Plant Expression Database (PLEXdb) (Dash et al. 2012) for
barley, rice, wheat, maize, grape, soybean, Medicago, poplar, and cotton. For data with
multiple time points I performed a one-way ANOVA test to determine the statistical
significance of expression changes. For data with control and stress conditions, I
performed a two-sample t-test to identify significant expression changes.
Results
Global Identification of 3R-MYB Proteins from 65 Plant Species
We identified 225 3R-MYB genes from 65 plant species using profile HMM
searches (Materials and Methods; Figure 2-1). There was a single 3R-MYB gene in
each of the algal outgroups, whereas the moss (Physcomitrella patens) has two 3R-
MYB genes, possibly resulting from a genome duplication in that lineage (Rensing et al.
2007). Both gymnosperm species that were analyzed have two 3R-MYB genes.
Amborella has three 3R-MYB genes that fall into the A-, B- and C-group, respectively,
indicating gene duplications preceding the origin of angiosperms. All other angiosperm
3R-MYB genes also fall into the A-, B- and C-groups; the number of 3R-MYB genes
found in angiosperm genomes ranges from one (e.g., Citrus sinensis) to nine (e.g.,
Triticum aestivum). The absence of gene members from a certain group of 3R-MYB in a
given species might represent bona fide gene loss but it could also result from an
incomplete or locally misassembled genome, improper annotation, or failure to meet our
screening criteria. However, the absence of B-group 3R-MYBs in many monocots [with
the exception of duckweed (Spirodela polyrhiza), banana (Musa acuminata), and wild
banana (Musa balbisiana)] suggests the loss of B-group 3R-MYBs during monocot
evolution. Based on the distribution of B-group 3R-MYB genes in monocots, there were
probably two independent losses: one in the grasses and one in orchids and palms. In
addition, orchids and palms probably also lost A-group 3R-MYBs.
Phylogenetic Analysis of the Plant 3R-MYB Proteins
The 3R-MYB proteins were clearly divided among three groups (the previously
defined A-, B- and C-groups) (Figure 2-2A). The A-, B- and C-group proteins were
present only in angiosperm species, and the single Amborella 3R-MYB gene in each group
was sister to all other species. Within A- and C-groups, genes from monocots formed
one branch while genes from eudicots formed another branch (Figure 2-2A and Figure
2-3). This indicates that no gene duplication occurred within these groups before the
divergence of monocots and eudicots, and that the expansion of 3R-MYBs in angiosperms
is mainly due to lineage-specific duplication events during the evolution of monocots and eudicots.
Synteny
A total of 1911 synteny blocks were identified between algae (Ostreococcus
lucimarinus) and Amborella, with an average of 9.5 (standard deviation = 2.8) genes per
synteny block. Examination of these blocks indicates that the region of Ostreococcus
lucimarinus chr9 surrounding a 3R-MYB gene is present in triplicate in Amborella, with
each block in the Amborella genome containing one of the three 3R-MYBs (Figure 2-4).
This suggests that the origin of the three 3R-MYB genes in Amborella resulted from
segmental duplications rather than tandem duplications of a single gene.
Synonymous Divergence Analysis of the Three Group 3R-MYBs in Angiosperms
We analyzed the pair-wise dS values of paralogous 3R-MYB genes within the
same species of angiosperms (Figure 2-5). Inter-group comparisons (A-B, B-C, A-C)
were used to estimate the timing of gene duplication events leading to the divergence of
the three groups. The peaks of dS distribution of the three inter-group comparisons are
at 1.9, 2.2, and 2.4 for B-C, A-C, and A-B respectively. This suggests that the A-group
diverged before the divergence of B- and C-groups, in agreement with the phylogenetic
tree (Figure 2-2A and Figure 2-3). Intra-group comparisons (A-A, B-B, C-C) were used
to estimate the timing of gene duplication events after the divergence of A-, B- and C-
group. We observed the peak of dS distribution of A-A, B-B, C-C to be at 0.7, 0.9, and
0.5 respectively.
The Evolutionary History of the Plant 3R-MYBs Motifs
Four conserved motifs were identified in the C-terminal region of plant 3R-MYBs
(Figure 2-2). Motif 2 arose early in land plant evolution and was conserved across
moss, gymnosperm, and angiosperm proteins. The other three motifs appear to have
been present within the common ancestor of seed plants (gymnosperms and
angiosperms). Different motifs then appear to have been lost in each group.
Specifically, motif 3 was lost from the A-group proteins, motifs 1 and 4 were lost from
the common ancestor of B- and C-group proteins, and motif 3 was independently lost
from C-group proteins (Figure 2-2B). We also observed a 12-14 amino acid deletion in
motif 4 within the grasses (Figure 2-2C and Figure 2-6). It is unclear whether the
deletion in motif 4 affects 3R-MYB function in grasses.
Several amino acid sites in the MYB DNA-binding-domain appear to have
undergone rate shifts (Figure 2-7). Most of the candidate rate-shift sites are located in
the first helix of each R repeat, so they are unlikely to directly impact the DNA-binding
activity since the second and third helix form a helix-turn-helix structure responsible for
DNA binding (Ogata et al. 1992). Our rate shift analyses are consistent with the results
of functional characterization of the three MYB repeats in animal c-MYB (Ogata et al.
1992; Ording et al. 1994). Specifically, there are the fewest (3) rate divergent sites in
R3, which plays the dominant role in DNA-binding, whereas R1 and R2 have more (6
and 7, respectively). Site 85 in R2, showing divergence among A-, B- and C-groups, is
the only site located within the helix-turn-helix structure.
In order to test whether any of the three groups experienced accelerated
evolutionary rates after divergence, I tested for positive selection in the A-, B- and C-groups
using a branch-site model (Materials and Methods). However, none of these three tests
support the hypothesis of positive selection (Table 2-2). Moreover, positive selection in
monocots within the A- and C-groups was also not detected (Table 2-2).
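
For readers unfamiliar with how a branch-site test is evaluated, the comparison of the alternative and null models is a likelihood ratio test. The sketch below computes the test statistic from two log-likelihood values and a p-value from a chi-square distribution with one degree of freedom. The choice of one degree of freedom is an assumption, since the reference distribution is not stated in the text, and the function name and any log-likelihood values passed to it are hypothetical.

```python
import math


def branch_site_lrt(lnl_null: float, lnl_alt: float):
    """Likelihood ratio test of the alternative model against the null model.

    Returns (test statistic, p-value), with the p-value taken from a chi-square
    distribution with 1 degree of freedom (an assumption of this sketch).
    """
    # Negative values can arise from optimization noise; treat them as zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

A test fails to support positive selection when the p-value exceeds the chosen significance level.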
Gene Structure Evolution
We identified six introns in the DNA-binding-domain region from 160 3R-MYB
genes (Figure 2-7A). Five introns (A, B, C, D and E) are conserved among multiple
species, while the other intron (b) was found only in one sequence. The distribution of
the five conserved introns reveals their evolutionary history (Figure 2-8). Introns A and B
were present in the common ancestor of all land plants and green algae; indeed, intron
A is broadly distributed in eukaryotes (Braun and Grotewold 1999). Two additional
introns (D and E) were gained before the divergence of mosses and seed plants.
Finally, intron C was inserted after the divergence of seed plants from mosses. The
unconserved intron b is found in only one case [Gorai008G117400 (B-group) in
Gossypium raimondii]. Gorai008G117400 has conserved introns A, C, D, and E, and
unconserved intron b in a position close to that of intron B. The amino acid alignment of the
corresponding region around intron b of Gorai008G117400 differs from that of
other proteins. It is possible that nucleotide substitutions around intron B
altered splicing signals; alternatively, it could be a sequencing/assembly error.
Notably, I observed four conserved exons at the 3’ end in angiosperm A-group
and gymnosperm 3R-MYB genes. The middle two of the four conserved exons contain
the motif 4 in angiosperm A-group and gymnosperm 3R-MYB proteins (Figure 2-8).
Alternative Splicing of the Plant 3R-MYBs
The proportions of 3R-MYB genes with evidence of AS in Arabidopsis, poplar,
grapevine, rice, sorghum, and Amborella are 100% (5/5), 50% (2/4), 67% (4/6), 25%
(1/4), 33% (1/3), and 100% (3/3), respectively. Thus, 16 of the 25 3R-MYB genes
represented within the six species have evidence of undergoing AS, and these 16
genes produce a total of 30 AS events. Among the 30 AS events, 1 is ExonS, 15 are
IntronR, 7 are AltA, 1 is AltD, and 6 are alternative polyadenylation. Eight of the 30
events occur within UTRs, while 22 events impact the coding region (Figure 2-9). Eight
of the 22 AS events that impact the coding region lead to PTCs. These transcripts may
be degraded by nonsense-mediated decay (Chang et al. 2007) and could represent
unproductive splicing that regulates 3R-MYB protein levels (Lareau et al. 2007).
Furthermore, 13 of the 22 events that impact the coding region affect the DNA binding
domain. Of all the AS events identified, I observed two shared AS patterns in 3R-MYB
genes among different species: Amborella Amtr00109.47, Arabidopsis At5g11510 and
At3g09370 shared a conserved AltA event in their second exons; Grape
GSVIVT01027493001 and Arabidopsis At4g00540 shared a conserved AltA event in
their second exons (Figure 2-9). Moreover, I observed a shared alternative
polyadenylation event between the two A-group Arabidopsis genes (At4g32730 and
At5g11510).
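The proportions and totals reported in this section can be cross-checked with a short tally. The sketch below uses only the counts stated in the text; the variable names and the event-type key "AltPolyA" are illustrative.

```python
# (genes with AS evidence, genes examined) per species, as reported in the text.
as_counts = {
    "Arabidopsis": (5, 5),
    "poplar": (2, 4),
    "grapevine": (4, 6),
    "rice": (1, 4),
    "sorghum": (1, 3),
    "Amborella": (3, 3),
}

# Number of AS events by type, as reported in the text.
event_types = {"ExonS": 1, "IntronR": 15, "AltA": 7, "AltD": 1, "AltPolyA": 6}

# Percentage of 3R-MYB genes with AS evidence in each species.
percent = {sp: round(100 * a / n) for sp, (a, n) in as_counts.items()}

genes_with_as = sum(a for a, _ in as_counts.values())
genes_total = sum(n for _, n in as_counts.values())
total_events = sum(event_types.values())

print(percent)
print(f"{genes_with_as} of {genes_total} genes; {total_events} AS events")
```

The per-species counts sum to the 16 of 25 genes stated above, and the event types sum to the 30 events, which matches the split into 8 UTR events and 22 coding-region events.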
MSA cis-Regulatory Element Prediction (Cell Cycle Regulation)
The cis-regulatory elements necessary and sufficient to drive G2/M-phase
specific gene expression (MSA) are specific targets of the trans-acting 3R-MYB
proteins. Thus MSAs provide a way to identify candidate genes that might be involved in
the regulation of the G2/M transition during the cell cycle. The plant 3R-MYB genes
have been shown to be self-regulated by MSA elements in their promoter (Kato et al.
2009). We used evidence of enrichment of the MSA element core sequence within
regions upstream of 3R-MYB genes from plant species that have not been functionally
characterized as an indication of potential involvement in the cell cycle. We searched for the
MSA element core sequence (5’-AACGG-3’) on either the sense or antisense
strand in the region up to 2 kb upstream of the start codon of the 3R-MYB genes. There
was no significant difference in the number of MSA core sequences between the sense and
antisense strands (Figure 2-10). The average numbers of MSA element core sequences
in the upstream 2 kb region of each gene of the A-, B- and C-groups and the outgroup
species (algae, moss, and gymnosperms) were 3.3, 3.2, 6.7 and 4.4, respectively. In
contrast, the average number of MSA element core sequences in the upstream
sequences of randomly selected genes was only 1.7. The numbers of MSA element
core sequences in plant 3R-MYB genes are significantly higher than randomly selected
genes based on ANOVA and Tukey’s HSD test (Figure 2-11). While this suggests the
possibility that plant 3R-MYBs are widely involved in the cell-cycle, this relationship
remains to be experimentally verified.
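The motif search described above can be sketched as a count of the core sequence on both strands of an upstream region. This is a minimal illustration, not the pipeline used in this study: the function names and the toy sequence are hypothetical, and the antisense strand is scanned by searching the sense strand for the reverse complement of the motif (CCGTT).

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_occurrences(seq, motif):
    """Count all occurrences of motif in seq, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def count_msa_core(upstream, motif="AACGG"):
    """Return (sense, antisense) counts of the core motif in an upstream region."""
    upstream = upstream.upper()
    sense = count_occurrences(upstream, motif)
    antisense = count_occurrences(upstream, reverse_complement(motif))
    return sense, antisense


# Toy upstream sequence, for illustration only.
print(count_msa_core("TTAACGGATCCGTTAACGG"))
```

In practice the input would be the region of up to 2 kb upstream of each start codon, and the per-gene counts would then be compared between groups.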
The number of MSA element core sequences in C-group genes is significantly
higher than that in A- and B-groups, suggesting that the C-group may have different
regulatory mechanisms.
Expression Pattern of the Plant 3R-MYBs under Abiotic Stresses
We analyzed available gene expression profiles of three Arabidopsis 3R-MYB
genes, At4g32730 (A-group), At5g11510 (A-group) and At3g09370 (C-group), under
various abiotic stresses. mRNA accumulation of At5g11510 under favorable growth
conditions was two-fold higher in the root than in the shoot, whereas the other two
genes have similar expression levels in the root and shoot (Figure 2-12). The C-group
gene At3g09370 was induced under two different stress conditions: 1) heat treatment
(both shoot and root); 2) salt stress (only in root). At3g09370 returns to its original
expression level when heat stress is released. The A-group genes At5g11510 and
At4g32730 showed reduced expression under heat treatment in shoot and root tissue,
although change in expression was less dramatic for At4g32730 (Figure 2-12). Overall,
there were several cases where A- and C-group 3R-MYB genes exhibited opposite
patterns of regulation. The Arabidopsis C-group gene At3g09370 shows an upregulated
expression pattern similar to the rice C-group gene OsMYB3R-2 under stress
conditions, implying At3g09370 also plays a role in stress response. The opposite
expression patterns of the A- and C-group genes described above imply a possible
antagonistic regulation of these two groups under abiotic stresses in Arabidopsis.
We analyzed available microarray gene expression profiles of 3R-MYBs in
barley, rice, wheat, maize, grape, soybean, Medicago, poplar, and cotton. Among the
available gene expression profiles, five A-group genes, one B-group gene and six C-
group genes showed significant expression changes in response to one or more stress
treatments (Figure 2-13). Among the fifteen instances of differential expression, six
cases involved upregulated expression: A-group gene MLOC10556 (barley) in response
to cold; B-group gene GSVIVT01019834001 (grape) in response to heat; and four C-
group genes Glyma18G181100 (soybean) in response to heat, LOC_Os01g62410
(OsMYB3R-2) (rice), GRMZM2G081919 (maize) and Potri006G085600 (poplar) in
response to drought (Figure 2-13). The remaining nine instances of differential
expression indicated downregulation in response to abiotic stresses.
Discussion
Patterns of Duplication and Loss in Plant 3R-MYB Genes
Plant and animal 3R-MYBs share a common 3R-MYB ancestor, which is
supported by the conservation of an intron in R1 (Braun and Grotewold 1999) and
phylogenetic analyses (Dias et al. 2003). Interestingly, there are similarities in the
evolution of 3R-MYBs in plants and animals. Most invertebrates have a single 3R-MYB
gene whereas vertebrates have three (A-MYB, B-MYB, and c-MYB) (Davidson et al.
2012). All three vertebrate 3R-MYB genes are involved in cell-cycle regulation, although
they have distinct expression patterns and exhibit some degree of functional
differentiation, such as the ability of B-MYB to complement Drosophila MYB mutants
when neither A- nor c-MYB can do so (Davidson et al. 2005). The three vertebrate MYB
genes originated from two rounds of segmental duplication (Davidson et al. 2012).
They may also be a result of two rounds of WGD in vertebrates (Gibson and Spring
2000), although more recent phylogenetic analyses raise questions about this
hypothesis (Abbasi and Hanif 2012). Analysis of synteny between Amborella trichopoda
and Ostreococcus lucimarinus suggests that the duplication events giving rise to the
three members in Amborella were regional or possibly even WGD events. There are
two putative WGD events, ζ and ε, shared by all angiosperm species (Jiao et al. 2011).
Our phylogenetic analyses suggest that event ε along with a second segmental
duplication could have produced the three angiosperm 3R-MYB groups (Figure 2-14A),
and it is conceivable that they were formed from both ζ and ε events combined with a
gene loss (Figure 2-14B).
Subsequent lineage specific duplication and loss events account for the variation
in the number of 3R-MYB members observed in modern angiosperm species. For
example, the grass lineage probably lost B-group 3R-MYBs (Figure 2-1 and 2-14); and
the orchid and palms possibly lost A- and B-group 3R-MYBs (Figure 2-1). The B-group
3R-MYB gene in tobacco is constitutively expressed during the cell cycle and functions
as a repressor (Ito et al. 2001), whereas A-group 3R-MYB genes in tobacco and
Arabidopsis exhibit circadian expression patterns that peak during M-phase and act as
activators (Ito et al. 2001; Araki et al. 2004; Haga et al. 2007). It was proposed that the
repressors (B-group 3R-MYBs) and activators (A-group 3R-MYBs) act together to
control progression of the cell through the G2/M transition in tobacco (Ito et al. 2001;
Araki et al. 2004). Thus, it is not clear what effect the absence of the B-group 3R-MYBs
has on cell cycle regulation in grasses. One possibility is that the monocot A- or C-
groups have picked up B-group gene function after its loss. In that case we would
expect to see accelerated evolutionary rates in monocots within the A- or C-group.
However, no positive selection in monocot lineages was detected with the method used
(Table 2-2). Considering that orchid and palms might have lost both A- and
B-group 3R-MYBs, the mechanism of 3R-MYB regulation of the cell cycle in monocots might be
more complex.
DNA-Binding Domain and Regulatory Motifs
As R1 does not directly interact with DNA in animal c-MYB, I expected it to be
less conserved compared with R3 and R2. However, I found the R1 domains of plant
3R-MYBs to be highly conserved (Figure 2-7D), suggesting R1 has functional
significance. In animals, R1 of c-MYB participates in an intramolecular interaction with its
own carboxyl-terminus (Dash et al. 1996). It is unclear whether that is the case in
plant 3R-MYBs. In addition, R1 of c-MYB influences transactivation of target genes, and
it may play a role in protein-protein interactions (Oelgeschläger et al. 2001). Further
functional characterization of the candidate rate shift sites is likely to establish whether
these lessons from animal c-MYB can provide insights into plant 3R-MYBs and
illuminate the ways that the three different subgroups of the plant 3R-MYB proteins
differ functionally. We did not detect any sites in the MYB domain region in A-, B- or C-
groups under positive selection, suggesting positive selection may not have played a
role in the divergence of these paralogs. However, the power of the branch-site dN/dS test
for positive selection decreases as the dS value increases (Gharib and Robinson-
Rechavi 2013). As the MYB genes in this study came from distantly related species, dS
saturation was expected and it could affect the test results.
The diversity of motifs in the plant 3R-MYBs is a result of both motif gain and loss
during evolution. Motif 4, which originated in a common ancestor of seed plants,
remains in gymnosperm and angiosperm A-group genes but has been lost in B- and C-
group genes. This motif is a repression domain that inhibits the ability of 3R-MYB
proteins to activate downstream genes during the cell cycle in tobacco (Araki et al.
2004) and Arabidopsis (Chandran et al. 2010). Moreover, specific serine/threonine
sites in motifs 1 and 4 contribute to the removal of this inhibitory effect by cyclin-mediated
phosphorylation (Araki et al. 2004; Chandran et al. 2010). The gain of motif 4 has added
another level of regulation of the 3R-MYB proteins and increased the complexity of the
3R-MYB regulation network. Moreover, grass A-group 3R-MYBs have lost ~12 amino
acids in the middle of the repression motif, motif 4 (Figure 2-2C and Figure 2-5), which
may lead to differential function. Thus, in addition to the lack of B-group genes,
divergent motif 4 is another factor that may contribute to the different cell cycle
regulatory mechanism in grasses compared with the other flowering plants.
Intron Gain and Gene Structure Evolution
The origin of spliceosome-processed introns is a topic of debate (Koonin 2006;
Rogozin et al. 2012) that has focused on two contrasting models: the introns-early and
the introns-late hypothesis (Darnel 1978; Cavalier-Smith 1985). The introns-early
hypothesis argues that gene intron-exon structure evolution is driven by intron loss,
whereas the introns-late hypothesis argues that intron gain is the driver (Tarrío et al.
2008). Braun and Grotewold (1999) found only a single conserved intron position in
eukaryotic 3R-MYBs, suggesting a major role for intron gain in this gene family. Our
results expand on this, providing evidence that plant 3R-MYB genes underwent step-
wise intron gain (Figure 2-8), consistent with the introns-late hypothesis.
AS Regulation of the Plant 3R-MYBs
Although more than 60% of plant multi-exon genes were suggested to undergo
AS (Marquez et al. 2012), very little has been reported regarding alternatively spliced
transcript isoforms from the MYB gene family. Previously, there were two reports of AS
associated with plant R2R3-MYB genes. Arabidopsis AtMYB59 and AtMYB48, and their
rice homologs AK111626 and AK107214, shared a conserved AS pattern, and the
expression levels of their splice variants are regulated during treatment with hormones
and stresses (Li et al. 2006). A genome scale analysis of Cucumis sativus identified
fifty-five R2R3-MYBs, among which eight exhibit AS regulation (Li et al. 2012). Our
analysis suggests that more than 60% (16 out of 25 genes) of the 3R-MYB genes
undergo AS, which is similar to the proportion of genes within plant genomes that are
observed to undergo AS (Marquez et al. 2012), but higher than the extent of the R2R3-
MYBs. Among the 30 AS events observed there are two cases (Amborella
Amtr00109.47, Arabidopsis At5g11510 and At3g09370; Grape GSVIVT01027493001
and Arabidopsis At4g00540) where the same AS pattern was shared between different
species, indicating a possible ancestral AS event. However, the majority of the AS
patterns were species-specific in our analysis. In a study that identified conserved AS
events among nine angiosperm species, Chamala et al. (2015) observed that 18% of
AS events identified in Amborella were shared with at least one other species, while
10% were shared with at least two other species. Plant 3R-MYB AS events seem to be
less conserved relative to AS events among other genes.
Interestingly, I observed a conserved alternative polyadenylation event between
Arabidopsis At4g32730 and At5g11510, both of which belong to the A-group. This AS
event would lead to a truncated protein lacking motif 4, which is the important C-
terminal repression motif (Figure 2-9). Transgenic study of the tobacco A-group gene
NtmybA2 indicated that the C-terminal truncated protein is hyperactive compared with
the full-length protein in upregulating downstream genes (Kato et al. 2009). Our
results indicate that the Arabidopsis A-group 3R-MYB genes could generate both the
primary protein products and the hyperactive protein products via AS.
Plant 3R-MYBs: Link between Cell Cycle and Abiotic Stresses
There are trade-offs between growth and stress resistance in plants. Increased
abiotic stress resistance is usually associated with decreased plant growth (Bechtold et
al. 2010), and arresting the cell cycle could lead to slow plant growth (Inzé and De
Veylder 2006). Molecular evidence for connections between abiotic stress and cell cycle
is emerging, but the mechanisms remain poorly defined. Phytohormones provide one
piece of evidence that cell cycle and abiotic stress response are linked (del Pozo et al.
2005). For example, the key stress hormone abscisic acid (ABA) accumulates under
osmotic stress and regulates various stress responsive genes, leading to increased
stress resistance and growth inhibition (Yoshida et al. 2014). ABA also increases the
expression of cell cycle inhibitors and downregulates factors related to DNA
replication (Wang et al. 1998; Mudgil et al. 2002; Yang et al. 2002; del Pozo et al.
2005). Since it is likely that various abiotic stresses induce ABA, they are expected to
change the rate of cell division. Reactive oxygen species (ROS) provide another
potential link between cell cycle and abiotic stresses. ROS are often produced in
reaction to various abiotic stresses (Mittler et al. 2004), and these can damage DNA
and affect DNA replication, which may affect the progression through cell division (Gill
and Tuteja 2010). A tobacco MAPKKK protein, NPK1, was observed to be involved in
cell cycle, ROS signaling and plant growth (Hirt 2000; Jonak et al. 2002; Nakagami et
al. 2005). In tobacco cells, NPK1 is expressed during M-phase and its protein product
localizes to the phragmoplast and central region of the mitotic spindle, suggesting its
role in cell cycle regulation (Hirt 2000). It has also been proposed that NPK1 senses
H2O2 and activates stress MAPKs in response to increased levels of H2O2 (Hirt 2000;
Nakagami et al. 2005). In addition, the Arabidopsis ANP1, an ortholog of the tobacco
NPK1, downregulates auxin-induced gene expression (Hirt 2000). Although the NPK1
protein is involved in multiple signaling pathways, it is not clear if it mediates interaction
between different signaling pathways.
Since there are often trade-offs between growth and stress resistance, genes
that are positively associated with plant growth and the cell cycle are expected to be
downregulated under stress conditions. However, up-regulation under stress conditions
implies a possible stress-related regulatory function of the gene. 3R-MYB genes in
tobacco (Ito et al. 2001; Araki et al. 2004; Ito 2005; Kato et al. 2009; Araki et al. 2012;
Araki et al. 2013), Arabidopsis (Haga et al. 2007; Haga et al. 2011) and rice (Ma et al.
2009) are involved in regulating the cell cycle. Recently, rice OsMYB3R-2, a C-group
3R-MYB, has been shown to play a role in responses to cold stress as well (Dai et al.
2007; Ma et al. 2009); the expression of OsMYB3R-2 is upregulated under various
stress conditions and overexpression of OsMYB3R-2 under cold stress increases
tolerance and maintains a high level of cell division (Ma et al. 2009). Our analysis
identified seven 3R-MYB genes from seven species that were significantly upregulated
under abiotic stresses: barley MLOC10556 in response to cold; grape
GSVIVT01019834001, Arabidopsis At3g09370 and soybean Glyma18G181100 in
response to heat; and rice LOC_Os01g62410 (OsMYB3R-2), maize GRMZM2G081919
and poplar Potri006G085600 in response to drought (Figure 2-12 and 2-13). Among
these seven genes, MLOC10556 is from the A-group, GSVIVT01019834001 is from B-
group, while the remaining five genes were from C-group. The observation that C-group
genes from multiple monocot and eudicot species show upregulation under various
stresses suggests that the C-group 3R-MYB genes may be involved in both cell cycle
and stress resistance, and the involvement in abiotic stresses may be an ancestral
condition that is conserved across angiosperms. Identification of the upstream
regulatory genes as well as other downstream target genes will contribute to the
understanding of how plant C-group 3R-MYBs integrate both cell cycle and abiotic
stress response. The animal orthologs of the 3R-MYB genes are solely involved in the
cell cycle. The coupling of abiotic stress response and cell cycle through the 3R-MYB
gene products may play a role in the ability of plants to adapt to their sessile lifestyle.
Figure 2-1. Species phylogeny and numbers of 3R-MYB genes in each species. The species tree in the study was inferred from Ruhfel et al. (2014), Zeng et al. (2014), Vanneste et al. (2014), and Huang et al. (2015). The divergence time was estimated by molecular clock dating from TimeTree (Hedges et al. 2015). Stars on the branches indicate WGD events; the five WGD events Arabidopsis thaliana went through were α, β, γ, ε and ζ. In the species tree dark green, yellow, purple, blue, green, and red indicate algae, moss, gymnosperms, Amborella trichopoda, monocots, and eudicots respectively. Following the species names are the number of 3R-MYBs identified in each group as well as in total. Mya: million years ago.
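[Figure 2-1 graphic: species tree drawn against a geologic timescale. Time axis (Mya): 0, 300, 600, 900, 1160. WGD events marked on branches: ζ, ε, γ, β, α.]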
Species Common name Outgroup A_group B_group C_group Total
Bathycoccus prasinos 1 1
Micromonas pusilla CCMP1545 1 1
Micromonas pusilla RCC299 1 1
Ostreococcus lucimarinus 1 1
Ostreococcus sp. RCC809 1 1
Physcomitrella patens moss 2 2
Ginkgo biloba common ginkgo 2 2
Pinus taeda loblolly pine 2 2
Amborella trichopoda 1 1 1 3
Spirodela polyrhiza duckweed 1 1 1 3
Phalaenopsis equestris orchid 1 1
Phoenix dactylifera date palm 1 1
Elaeis guineensis African oil palm 1 1
Musa acuminata banana 2 1 3 6
Musa balbisiana wild banana 2 1 1 4
Panicum virgatum switchgrass 3 2 5
Panicum hallii Hall's panicgrass 1 2 3
Setaria italica foxtail millet 2 2
Sorghum bicolor sorghum 2 1 3
Zea mays maize 1 1
Oryza sativa rice 2 2 4
Brachypodium distachyon purple false brome 2 4 6
Hordeum vulgare barley 1 1
Triticum aestivum bread wheat 4 5 9
Triticum urartu wheat A genome progenitor 1 1
Aquilegia coerulea Colorado blue columbine 1 1 1 3
Nelumbo nucifera sacred lotus 2 2 4
Beta vulgaris sugar beet 1 1 1 3
Actinidia chinensis kiwifruit 2 1 3
Utricularia gibba humped bladderwort 1 1
Mimulus guttatus monkeyflower 2 1 1 4
Nicotiana benthamiana tobacco 2 2 2 6
Capsicum annuum pepper 2 1 3
Solanum lycopersicum tomato 2 1 1 4
Solanum tuberosum potato 1 1 2
Vitis vinifera grapevine 3 2 1 6
Eucalyptus grandis flooded gum 1 1 1 3
Citrus sinensis orange 1 1
Gossypium raimondii cotton 3 2 2 7
Theobroma cacao cacao tree 1 2 1 4
Carica papaya papaya 1 1 1 3
Brassica rapa field mustard 5 4 9
Eutrema salsugineum salt cress 2 1 2 5
Arabidopsis thaliana 2 1 2 5
Capsella grandiflora 2 2 4
Boechera stricta Drummond's rockcress 2 1 2 5
Cucumis sativus cucumber 2 1 3
Citrullus lanatus watermelon 2 1 3
Malus domestica apple 1 2 3
Pyrus bretschneideri Chinese white pear 1 1 1 3
Prunus persica peach 1 1 1 3
Prunus mume mei 1 2 1 4
Fragaria vesca woodland strawberry 1 2 1 4
Glycine max soybean 4 1 3 8
Phaseolus vulgaris common bean 2 2 4
Cajanus cajan pigeon pea 2 1 1 4
Medicago truncatula barrel medic 2 1 2 5
Cicer arietinum chickpea 2 1 3
Lotus japonicus birdsfoot trefoil 1 1 1 3
Ricinus communis castor bean 1 1 1 3
Manihot esculenta cassava 2 1 3
Jatropha curcas physic nut 1 2 1 4
Linum usitatissimum flax 2 1 2 5
Populus trichocarpa poplar 2 1 1 4
Salix purpurea willow 2 3 1 6
Total 11 85 46 83 225
Clade labels: Green algae, Poales, Asterids, Rosids, Eurosids I, Eurosids II
Figure 2-2. Subgroup classification of the plant 3R-MYBs. A) ML tree of the whole length plant 3R-MYB proteins. In the ML tree: dark green, yellow, purple, blue, green, and red indicate proteins from algae, moss, gymnosperms, Amborella trichopoda, monocots, and eudicots respectively. B) Domain and motif structures of the plant 3R-MYBs in each group. Boxes on the right show the protein structure of the 3R-MYB in each group. N: amino-terminus; C: carboxyl-terminus. C) Sequence logos of the four motifs identified in B. Orange stars below amino acids indicate highly conserved amino acid sites. Blue box indicates the lost fragment in motif 4 in grasses.
Figure 2-3. Whole length protein ML phylogenetic tree of the plant 3R-MYBs. Bootstrap values ≥ 50% are shown on the corresponding branches. In the ML tree: dark green, yellow, purple, blue, green and red indicate proteins from algae, moss, gymnosperms, Amborella trichopoda, monocots, and eudicots respectively. Orange shadow indicates sequences from rosids, purple shadow indicates sequences from asterids, and blue shadow indicates sequences from monocots.
Bathy10g03950 (Bathycoccus prasinos) 192902 (Micromonas pusilla CCMP1545)
88104 (Micromonas pusilla RCC299) 37383 (Ostreococcus RCC809)
34891 (Ostreococcus lucimarinus) Phpat.014G052200 (Physcomitrella patens)
Phpat.010G040000 (Physcomitrella patens) PITA000002016 (Pinus taeda) PITA000027114 (Pinus taeda)
06223 (Ginkgo biloba) 08952 (Ginkgo biloba)
Amborellales Amtr00109.47 (Amborella trichopoda) Alismatales Spipo10G0009900 (Spirodela polyrhiza)
Bchr10P30232 (Musa balbisiana) Achr10P14840 (Musa acuminata)
Bchr2P04866 (Musa balbisiana) Achr2P19890 (Musa acuminata)
Zingiberales
Sobic.009G083800 (Sorghum bicolor) Pavir.J07395 (Panicum virgatum)
Bradi2g31887 (Brachypodium distachyon) Traes.1AS.61D017632 (Triticum aestivum)
Traes.1BS.403DBC53C (Triticum aestivum) Sobic.003G008700 (Sorghum bicolor) Pahal.0003s0659 (Panicum hallii) Pavir.Ea00087 (Panicum virgatum) Pavir.Eb00149 (Panicum virgatum)
LOC_Os01g12860 (Oryza sativa) LOC_Os12g13570 (Oryza sativa)
Bradi2g07677 (Brachypodium distachyon) MLOC10556 (Hordeum vulgare) Traes.3AS.09025DA2E (Triticum aestivum) Traes.3B.ABF465DDF (Triticum aestivum)
Poales
Ranunculales Aquca001.00435 (Aquilegia coerulea) Nenu003474 (Nelumbo nucifera)
Nenu005941 (Nelumbo nucifera)Proteales
GSVIVT01015370001 (Vitis vinifera)Vitales GSVIVT01035663001 (Vitis vinifera)
GSVIVT01035664001 (Vitis vinifera)Caryophyllales Bv2.032710 (Beta vulgaris)
MigutE01513 (Mimulus guttatus) MigutA00232 (Mimulus guttatus)
Lamiales
Solyc08g068320 (Solanum lycopersicum) Capana08g000819 (Capsicum annuum)
Capana11g000012 (Capsicum annuum) Solyc11g071300 (Solanum lycopersicum)
NbS00023020g0004 (Nicotiana benthamiana) NbS00023374g0007 (Nicotiana benthamiana)
Solanales
Cucsa161430 (Cucumis sativus) Cla022224 (Citrullus lanatus)
Cucsa311420 (Cucumis sativus) Cla023007 (Citrullus lanatus)
Cucurbitales
MDP0000179225 (Malus domestica) mrna08793 (Fragaria vesca)
Pbr009229 (Pyrus bretschneideri) Prupe.1G452000 (Prunus persica) Pm005124 (Prunus mume)
Rosales
Ca01884 (Cicer arietinum) Medtr3g110028 (Medicago truncatula)
CM1664.320 (Lotus japonicus) Ccajan11088 (Cajanus cajan)
Phvul009G106700 (Phaseolus vulgaris) Glyma06G082300 (Glycine max) Glyma04G080600 (Glycine max)
Ca14925 (Cicer arietinum) Medtr1g026870 (Medicago truncatula)
Ccajan14244 (Cajanus cajan) Phvul001G061200 (Phaseolus vulgaris)
Glyma17G190900 (Glycine max) Glyma14G143400 (Glycine max)
Fabales
Myrtales Eucgr.C02893 (Eucalyptus grandis) 29794m003447 (Ricinus communis)
Jcr4S11652.10 (Jatropha curcas) Lus10038623 (Linum usitatissimum)
Lus10022136 (Linum usitatissimum) Sapur0283s0180 (Salix purpurea)
Potri018G038000 (Populus trichocarpa) Sapur0446s0220 (Salix purpurea)
Potri006G241700 (Populus trichocarpa)
Malpighiales
Thecc1EG047091t1 (Theobroma cacao) Gorai010G110000 (Gossypium raimondii) Gorai001G028200 (Gossypium raimondii) Gorai009G051000 (Gossypium raimondii)
Malvales
Cpapaya78.76 (Carica papaya) BraraK00691 (Brassica rapa)
BraraA00517 (Brassica rapa) BraraH01296 (Brassica rapa)
Thhalv10024320m (Eutrema salsugineum) AT4G32730 (Arabidopsis thaliana) Cagra4093s0003 (Capsella grandiflora)
Bostr7867s1124 (Boechera stricta) BraraJ02221 (Brassica rapa)
BraraC00465 (Brassica rapa) Thhalv10012583m (Eutrema salsugineum) Bostr20055s0087 (Boechera stricta) Cagra7526s0003 (Capsella grandiflora) AT5G11510 (Arabidopsis thaliana)
Brassicales
Amborellales Amtr00012.146 (Amborella trichopoda)Alismatales Spipo6G0071600 (Spirodela polyrhiza)
Elgu00003.114 (Elaeis guineensis) PDK30s1074861g022 (Phoenix dactylifera)
Arecales
Achr6P05030 (Musa acuminata) Achr7P10520 (Musa acuminata)
Bchr10P31223 (Musa balbisiana) Achr10P26610 (Musa acuminata)
Zingiberales
Asparagales PEQU09277 (Phalaenopsis equestris) LOC_Os05g38460 (Oryza sativa)
Pahal.0696s0006 (Panicum hallii) Si021484m (Setaria italica)
Bradi2g23341 (Brachypodium distachyon) Bradi2g23310 (Brachypodium distachyon)
Bradi2g23320 (Brachypodium distachyon) Traes.1BL.C25B5DDB4 (Triticum aestivum)
Traes.1DL.B26A733D4 (Triticum aestivum) TRIUR3.29290 (Triticum urartu)
LOC_Os01g62410 (Oryza sativa) Bradi2g54640 (Brachypodium distachyon)
Traes.3B.B594FC28C (Triticum aestivum) Traes.3AL.386795528 (Triticum aestivum) Traes.3DL.6BBC889A1 (Triticum aestivum)
GRMZM2G081919 (Zea mays) Sobic.003G352200 (Sorghum bicolor) Si000842m (Setaria italica)
Pahal.0006s0119 (Panicum hallii) Pavir.Ea03349 (Panicum virgatum) Pavir.Eb03676 (Panicum virgatum)
Poales
Ranunculales Aquca003.00045 (Aquilegia coerulea) Nenu007682 (Nelumbo nucifera) Nenu012205 (Nelumbo nucifera)
Proteales
Caryophyllales Bv5.108980 (Beta vulgaris) ugScf00212.g12744 (Utricularia gibba)
MigutL01945 (Mimulus guttatus) Lamiales
NbS00027068g0023 (Nicotiana benthamiana) NbS00007819g0018 (Nicotiana benthamiana)
Solyc09g010820 (Solanum lycopersicum) PGSC0003DMP400015671 (Solanum tuberosum)
Solanales
Vitales GSVIVT01034171001 (Vitis vinifera)Ericales Achn163941 (Actinidia chinensis)Myrtales Eucgr.K03133 (Eucalyptus grandis)
Cucsa175460 (Cucumis sativus) Cla017897 (Citrullus lanatus)
Cucurbitales
Pm002704 (Prunus mume) Prupe 6G255400 (Prunus persica)
mrna18416 (Fragaria vesca) Pbr006264 (Pyrus bretschneideri)
MDP0000197330 (Malus domestica) MDP0000219581 (Malus domestica)
Rosales
CM0147.620 (Lotus japonicus) Ccajan44733 (Cajanus cajan)
Phvul010G012500 (Phaseolus vulgaris) Glyma03G082400 (Glycine max)
Medtr7g061330 (Medicago truncatula) Medtr7g461410 (Medicago truncatula)
Phvul008G102000 (Phaseolus vulgaris) Glyma18G181100 (Glycine max) Glyma07G132200 (Glycine max)
Fabales
Sapur0001s1810 (Salix purpurea) Potri006G085600 (Populus trichocarpa) Jcr4S01332.10 (Jatropha curcas)
29846m000184 (Ricinus communis) cassava004816m (Manihot esculenta)
Malpighiales
Brassicales Cpapaya228.17 (Carica papaya) Thecc1EG021936t1 (Theobroma cacao) Gorai006G172800 (Gossypium raimondii)
Gorai001G249400 (Gossypium raimondii)Malvales
Lus10025351 (Linum usitatissimum) Lus10024394 (Linum usitatissimum)
Malpighiales
Sapindales orange1.1g009402m (Citrus sinensis) BraraJ02886 (Brassica rapa)
Thhalv10013211m (Eutrema salsugineum) AT5G02320 (Arabidopsis thaliana)
Cagra0487s0012 (Capsella grandiflora) Bostr6251s0040 (Boechera stricta) Thhalv10020579m (Eutrema salsugineum)
AT3G09370 (Arabidopsis thaliana) Cagra2515s0028 (Capsella grandiflora)
Bostr22252s0130 (Boechera stricta) BraraE03093 (Brassica rapa)
BraraC03272 (Brassica rapa) BraraA03530 (Brassica rapa)
Brassicales
Amborellales Amtr00009.357 (Amborella trichopoda)Alismatales Spipo1G0035900 (Spirodela polyrhiza)
Bchr2P03456 (Musa balbisiana) Achr2P03880 (Musa acuminata)
Zingiberales
Ranunculales Aquca013.00366 (Aquilegia coerulea)Vitales GSVIVT01019834001 (Vitis vinifera)Caryophyllales Bv5.095680 (Beta vulgaris)Ericales Achn380251 (Actinidia chinensis)Lamiales MigutA00800 (Mimulus guttatus)
Solyc08g080580 (Solanum lycopersicum ) Capana01g000277 (Capsicum annuum) PGSC0003DMP400005488 (Solanum tuberosum)
NbS00046693g0003 (Nicotiana benthamiana) NbS00001647g0009 (Nicotiana benthamiana)
Solanales
Pm023786 (Prunus mume) Prupe.5G093400 (Prunus persica)
Pbr039284 (Pyrus bretschneideri) Pbr039284 (Pyrus bretschneideri)
Rosales
CM0021.3260 (Lotus japonicus) Ca02417 (Cicer arietinum)
Medtr5g010650 (Medicago truncatula) Glyma01G217500 (Glycine max)
Ccajan13183 (Cajanus cajan)
Fabales
Myrtales Eucgr.D01905 (Eucalyptus grandis) Thecc1EG016203t1 (Theobroma cacao)
Gorai003G160500 (Gossypium raimondii)Malvales
Sapur0761s0070 (Salix purpurea) Lus10008010 (Linum usitatissimum)
cassava022173m (Manihot esculenta) 30190m011160 (Ricinus communis)
Jcr4S00150.230 (Jatropha curcas)
Malpighiales
Thhalv10029464m (Eutrema salsugineum) Bostr10064s0051 (Boechera stricta)
AT4G00540 (Arabidopsis thaliana)Brassicales
Ericales Achn295821 (Actinidia chinensis)Vitales GSVIVT01027493001 (Vitis vinifera)
Pm018007 (Prunus mume) mrna19115 (Fragaria vesca)
Rosales
Brassicales Cpapaya18.208 (Carica papaya) Thecc1EG005402t1 (Theobroma cacao)
Gorai008G117400 (Gossypium raimondii)Malvales
Sapur0586s0030 (Salix purpurea) Jcr4S10210.20 (Jatropha curcas)
Potri014G079300 (Populus trichocarpa) Sapur0586s0030 (Salix purpurea) Sapur0533s0050 (Salix purpurea)
Malpighiales
Scale bar: 0.2
[Tree figure clade labels: A_Group, C_Group, B_Group]
Figure 2-4. Syntenic blocks in algae (Ostreococcus lucimarinus) and Amborella trichopoda that contain 3R-MYB genes, identified by DAGchainer. The x-axes of the plots show the three Amborella scaffolds, and the y-axis of each shows the segment on chr9 of the Ostreococcus lucimarinus genome that contains the single 3R-MYB in that taxon. Dots indicate homologous genes; solid dots indicate the 3R-MYB genes.
Figure 2-5. Tests for the origin of the three groups of plant 3R-MYB genes. A) Distribution of the pairwise synonymous distances (dS) for paralogous 3R-MYBs in each angiosperm species. The pairwise dS distributions for the A-A, B-B, C-C, A-B, A-C and B-C comparisons are shown as histograms with a fitted normal distribution. B) Normal distributions fitted to the pairwise dS values for the six groups.
Figure 2-6. Multiple protein alignments of motif 4 with representative species.
[Figure labels: Algae; Moss; Gymnosperm; Angiosperm (A_Group); Angiosperm (C_Group); Angiosperm (B_Group); N; R1, R2, R3; C; Motif 1, Motif 2, Motif 3, Motif 4; Gymnosperms; Angiosperms; Amborella; Eudicots; Monocots]
Figure 2-7. Analysis of the DNA-binding domain of the plant 3R-MYB proteins. A) Alignment of the DNA-binding domains of representative plant 3R-MYB proteins. Protein groups (A-, B-, or C-) are indicated before the gene names and species are indicated inside brackets. The five conserved introns in the DNA-binding domain are indicated by black arrows, black lines, and the uppercase bold letters A, B, C, D and E; the other intron is indicated by a gray arrow, a gray line and the lowercase letter b. The number in parentheses after each letter indicates the intron position: “0” indicates an intron between the codons of the two indicated amino acids; “1” indicates an intron between the first and second nucleotides of the codon of the indicated amino acid; “2” indicates an intron between the second and third nucleotides of the codon of the indicated amino acid. Thick black lines at the bottom indicate the three helices in each R repeat (Ogata et al. 1992; Ogata et al. 1994) and blue asterisks indicate the conserved tryptophans. B) Distribution of the amino acid substitution rate differences comparing each group with the other two groups. Dashed lines indicate our threshold (2.57 standard deviations) for the identification of rate shift sites. C) The sites in each group that have an unusually low (Slow in the Group) or high (Fast in the Group) amino acid substitution rate relative to the other two groups. D) Amino acid alignment logos of the DNA-binding domain of A-, B- and C-group 3R-MYBs with the slow (green) and fast (orange) sites highlighted. Blue boxes above the sequence logos indicate helices, blue lines between them indicate turns, and blue asterisks indicate the conserved tryptophans.
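The intron position codes defined in the caption of Figure 2-7 ("0", "1" and "2") correspond to the remainder obtained when the number of coding nucleotides upstream of the intron is divided by three. The following is a minimal sketch of that relationship; the function names and the example coordinates are hypothetical and are not taken from the analysis pipeline used in this study.

```python
def intron_phase(coding_nt_upstream: int) -> int:
    """Return 0, 1 or 2 given the number of coding nucleotides 5' of the intron."""
    if coding_nt_upstream < 1:
        raise ValueError("at least one coding nucleotide must precede the intron")
    return coding_nt_upstream % 3


def describe_intron(coding_nt_upstream: int) -> str:
    """Describe the intron position in the terms used in the figure caption."""
    phase = intron_phase(coding_nt_upstream)
    complete_codons = coding_nt_upstream // 3
    if phase == 0:
        # The intron falls exactly between two codons.
        return (f"phase 0: between the codons of amino acids "
                f"{complete_codons} and {complete_codons + 1}")
    # The intron interrupts the next codon after 1 or 2 of its nucleotides.
    return f"phase {phase}: inside the codon of amino acid {complete_codons + 1}"


# Hypothetical coordinates, for illustration only.
for n in (300, 301, 302):
    print(n, describe_intron(n))
```

Under this convention, an intron is regarded as conserved between two genes when both the aligned amino acid position and the phase agree.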
Figure 2-8. Intron evolution pattern of the DNA-binding-domain region of the plant 3R-MYBs. For each gene depicted, boxes indicate exons and lines indicate introns; UTR regions are not included in the gene structure. The hashed lines indicate possible introns. Gray, pink and green thick bars indicate the five conserved introns, with the name of each intron on top. The four conserved motifs are shown at the corresponding positions in the gene structure.
[Figure labels: Algae; Moss; Gymnosperm; Angiosperm (A_Group); Angiosperm (C_Group); Angiosperm (B_Group); Start codon; Stop codon; 1 2 3 4 5 6; A, B, C, D, E; R1, R2, R3; Motif 1, Motif 2, Motif 3, Motif 4]
Figure 2-9. AS of 3R-MYB proteins in Amborella, Arabidopsis, grape, poplar, rice and sorghum. The group (A-, B-, or C-) membership for each gene is indicated in brackets. Boxes indicate exons (blue for constitutively spliced; orange for alternatively spliced) and lines indicate introns. Gene structures are drawn to scale and connecting bars indicate homologous exons (green for the six exons encoding the DNA-binding domain; pink for the four exons specific to the A-group; gray for all others). The two black flags in each gene indicate the start and stop codons in the primary transcript and red hexagons indicate stop codons generated by AS. The green circles at the end of the exons indicate alternative polyadenylation events.
Figure 2-10. Predicted MSA element distribution within the regions 2 kb upstream of the plant 3R-MYB genes.
Figure 2-11. Violin plots of the number of MSA core sequences in the upstream regions for each group of genes. The median number of MSA core sequences in each group is shown by the white dot (the median is on the right side). Kernel width indicates the fitted data density under kernel distribution. Letters a, b and c above each violin plot indicate significant differences by ANOVA and Tukey’s HSD test at the 0.05 significance level.
Figure 2-12. Expression profiles of the Arabidopsis 3R-MYB genes under abiotic stresses. Expression levels of three Arabidopsis genes, At4g32730 (A-group), At5g11510 (A-group) and At3g09370 (C-group), are shown in root and shoot under heat (38 °C), cold (4 °C), salt (150 mM NaCl) and drought (dry air stream). In the heat stress treatment, the seedlings were returned to room temperature after 3 hours (indicated by the red arrow). For each gene, the expression level in root at the 0 time point was normalized to 1, and the expression levels of that gene under other conditions were normalized accordingly. Error bars indicate standard error. Asterisks indicate the significance level from a one-way ANOVA test (*: 0.05; **: 0.01; ***: 0.001).
[Figure panels: AT3G09370 (Group C), AT5G11510 (Group A) and AT4G32730 (Group A) under Heat, Cold, Salt and Drought; series Root and Shoot; x-axis Time (0 to 24 h); y-axis Relative Expression (0.0 to 2.0)]
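The normalization described in the caption of Figure 2-12 (root at the 0 time point set to 1, all other values for the same gene scaled accordingly) can be sketched as follows. This is an illustrative sketch only: the function name and the measurement values are hypothetical and do not come from the experiments reported here.

```python
def normalize_to_reference(levels, reference):
    """Scale every expression value of one gene by the value of the reference sample."""
    ref_value = levels[reference]
    if ref_value <= 0:
        raise ValueError("reference expression level must be positive")
    return {sample: value / ref_value for sample, value in levels.items()}


# Hypothetical raw expression values for a single gene, keyed by (tissue, hours).
raw = {
    ("root", 0): 8.0,
    ("root", 3): 4.0,
    ("shoot", 0): 12.0,
    ("shoot", 3): 6.0,
}

# Root at time 0 becomes 1; every other sample is expressed relative to it.
relative = normalize_to_reference(raw, ("root", 0))
for sample, value in relative.items():
    print(sample, value)
```

Because each gene is scaled by its own reference sample, the resulting values are comparable across tissues and time points within a gene but not between genes.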
Figure 2-13. Expression profiles of the 3R-MYB genes from nine angiosperm species under abiotic stresses. Labels in the upper left corner of each bar plot indicate the microarray project accession number in PLEXdb (Dash et al. 2012). A detailed description of each experiment is available in PLEXdb (http://www.plexdb.org/index.php) under the corresponding microarray project accession number. Error bars indicate standard error. Asterisks indicate the significance level from a two-sample t-test (*: 0.05; **: 0.01; ***: 0.001). Letters a, b and c above each bar plot indicate significant differences by ANOVA and Tukey’s HSD test at the 0.05 significance level.
Table 2-1. Data resource summary of the sixty-five plant species used in this study.
Group Species Version Note*
Algae Bathycoccus prasinos 7/15/11 [1]
Algae Ostreococcus lucimarinus v2.0 [1,2,3,4]
Algae Ostreococcus RCC809 v2 [1]
Algae Micromonas pusilla CCMP1545 v3.0 [1,2,3,4]
Algae Micromonas pusilla RCC299 v3.0 [1,2,3,4]
Moss Physcomitrella patens v3.0 [1,2,3,4]
Gymnospermae Ginkgo biloba [1]
Gymnospermae Pinus taeda v1.01/v2 [1,3]
Angiospermae Amborella trichopoda v1.0 [1,2,3,4]
Monocot Spirodela polyrhiza v2 [1,2,3,4]
Monocot Phalaenopsis equestris v5.0 [1]
Monocot Elaeis guineensis v2 [1]
Monocot Phoenix dactylifera v3 [1]
Monocot Musa acuminata v1 [1]
Monocot Musa balbisiana v1 [1]
Monocot Brachypodium distachyon v2.1 [1,2,3,4]
Monocot Hordeum vulgare v1.25 [1]
Monocot Oryza sativa v7.0 [1,2,3,4]
Monocot Panicum virgatum v1.1 [1,2,3,4]
Monocot Setaria italica v2.1 [1,2,3,4]
Monocot Sorghum bicolor v2.1 [1,2,3,4]
Monocot Triticum aestivum v2.2 [1,2,3,4]
Monocot Triticum urartu 1.25 [1]
Monocot Zea mays 6a [1,2,3,4]
Monocot Panicum hallii v0.5 [1,2,3,4]
Basal eudicot Aquilegia coerulea v1.1 [1,2,3,4,5]
Basal eudicot Nelumbo nucifera v1 [1,5]
Eudicot Beta vulgaris v1.1 [1,5]
Eudicot_Rosid Vitis vinifera Genoscope.12X [1,2,3,4,5]
Eudicot_Rosid Eucalyptus grandis v2.0 [1,2,3,4,5]
Eudicot_Rosid Fragaria vesca v1.1 [1,2,3,4,5]
Eudicot_Rosid Malus domestica v1.0 [1,2,3,4,5]
Eudicot_Rosid Prunus mume v1 [1,5]
Eudicot_Rosid Prunus persica v2.1 [1,2,3,4,5]
Eudicot_Rosid Pyrus bretschneideri V121010 [1,5]
Eudicot_Rosid Cajanus cajan v5.0 [1,5]
Eudicot_Rosid Glycine max Wm82.a2.v1 [1,2,3,4,5]
Eudicot_Rosid Medicago truncatula mt4.0v1 [1,2,3,4,5]
Eudicot_Rosid Phaseolus vulgaris v1.0 [1,2,3,4,5]
Eudicot_Rosid Cicer arietinum v1 [1,5]
Eudicot_Rosid Lotus japonicus v2.5 [1,5]
Eudicot_Rosid Citrullus lanatus v1 [1,5]
Eudicot_Rosid Cucumis sativus v1.0 [1,2,3,4,5]
Eudicot_Rosid Jatropha curcas v4.5 [1,5]
Eudicot_Rosid Manihot esculenta v4.1 [1,2,3,4,5]
Eudicot_Rosid Ricinus communis v0.1 [1,2,3,4,5]
Eudicot_Rosid Linum usitatissimum BGIv1.0 [1,2,3,4,5]
Eudicot_Rosid Populus trichocarpa v3.0 [1,2,3,4,5]
Eudicot_Rosid Salix purpurea v1.0 [1,2,3,4,5]
Eudicot_Rosid Gossypium raimondii v2.1 [1,2,3,4,5]
Eudicot_Rosid Theobroma cacao v1.1 [1,2,3,4,5]
Eudicot_Rosid Citrus sinensis v1.1 [1,2,3,4,5]
Eudicot_Rosid Carica papaya ASGPBv0.4 [1,2,3,4,5]
Eudicot_Rosid Arabidopsis thaliana TAIR10 [1,2,3,4,5]
Eudicot_Rosid Brassica rapa v1.3 [1,2,3,4,5]
Eudicot_Rosid Boechera stricta v1.2 [1,2,3,4,5]
Eudicot_Rosid Capsella grandiflora v1.1 [1,2,3,4,5]
Eudicot_Rosid Eutrema salsugineum v1.0 [1,2,3,4,5]
Eudicot_Asterid Capsicum annuum v2.0 [1,5]
Eudicot_Asterid Nicotiana benthamiana v0.4.4 [1,5]
Eudicot_Asterid Solanum lycopersicum iTAGv2.3 [1,2,3,4,5]
Eudicot_Asterid Solanum tuberosum v3.4 [1,2,3,4,5]
Eudicot_Asterid Mimulus guttatus v2.0 [1,2,3,4,5]
Eudicot_Asterid Utricularia gibba v4.1 [1,5]
Eudicot_Asterid Actinidia chinensis v1 [1,5]
*NOTE: 1. Phylogeny analysis; 2. Synonymous divergence analysis; 3. Gene structure analysis; 4. Promoter analysis; 5. Synonymous divergence analysis.
Table 2-2. Positive selection test results.
Test H0 (lnL) H1 (lnL)
Positive selection in A group -32669.96291 -32669.96293
Positive selection in B group -32669.96299 -32669.96292
Positive selection in C group -32669.81231 -32669.81241
Positive selection in monocots within A group -10477.43538 -10477.35695
Positive selection in monocots within C group -10656.13493 -10656.13493
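The log-likelihoods in Table 2-2 can be compared with a likelihood ratio test. The following is a minimal sketch, assuming that the null and alternative models differ by one free parameter so that twice the log-likelihood difference is compared with a chi-square distribution with 1 degree of freedom (critical value 3.84 at the 0.05 level). The degrees of freedom and the function name are assumptions made for illustration; small negative differences are treated as zero because they presumably reflect optimization noise.

```python
import math

# Log-likelihoods copied from Table 2-2: (test, lnL under H0, lnL under H1).
TESTS = [
    ("A group", -32669.96291, -32669.96293),
    ("B group", -32669.96299, -32669.96292),
    ("C group", -32669.81231, -32669.81241),
    ("monocots within A group", -10477.43538, -10477.35695),
    ("monocots within C group", -10656.13493, -10656.13493),
]


def likelihood_ratio_test(lnl_h0, lnl_h1):
    """Return (statistic, p-value), assuming a chi-square null with 1 degree of freedom."""
    # Clamp at zero: H1 should never fit worse than H0 except for numerical noise.
    statistic = max(0.0, 2.0 * (lnl_h1 - lnl_h0))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


for name, lnl_h0, lnl_h1 in TESTS:
    statistic, p_value = likelihood_ratio_test(lnl_h0, lnl_h1)
    print(f"{name}: 2*(lnL1 - lnL0) = {statistic:.5f}, p = {p_value:.3f}")
```

Under these assumptions, the largest statistic (monocots within the A group) is about 0.157, well below 3.84, so none of the five comparisons would reject the null model.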
CHAPTER 3 JASMONATE INDUCED ALTERNATIVE SPLICING RESPONSES IN ARABIDOPSIS
Background
Plants have developed inducible defense mechanisms and complex signaling
networks for efficient, precise and fast response to ever-changing environmental stimuli
and unpredictable biotic attacks. JA is a phytohormone induced by biotic stresses
such as herbivore and pathogen attack, as well as abiotic stresses such as cold,
salt, UV light and ozone (Browse and Howe 2008; Howe and Jander 2008; Katsir et al.
2008; Hu et al. 2013; Valenzuela et al. 2016). JA regulates a variety of plant events
including photosynthesis (Attaran et al. 2014), root and shoot morphogenesis (Gasperini
et al. 2015; Zheng et al. 2017), flowering time (Zhai et al. 2015; Thatcher et al. 2016),
stamen development (Song et al. 2011; Qi et al. 2015a), seed germination (Linkies and
Leubner-Metzger 2011), and senescence (Miao and Zentgraf 2007; Jiang et al. 2014; Qi
et al. 2015b; Yu et al. 2016). In general, JA inhibits plant growth, promotes growth-
defense transition and triggers early reproduction.
JA-Ile, JAZ proteins and the F-box protein COI1 form the key regulatory module
of JA signal transduction and amplification (Thines et al. 2007; Chini et al. 2007). Upon
JA-Ile accumulation, the three-component module forms and the JAZs are
polyubiquitinated and degraded through the 26S proteasome, which releases
suppression of the JAZ-interacting transcription factors that regulate early-jasmonate-
responsive genes (Thines et al. 2007; Chini et al. 2007; Chini et al. 2016). Thirteen JAZ
genes have been identified in the Arabidopsis genome (Bai et al. 2011; Thireault et al.
2015); they show both functional redundancy and specificity (Chini et al. 2016). Members
of the JAZ family share a conserved N-terminal domain (Moreno et al. 2013), a TIFY
(previously known as ZIM) domain (except JAZ13) and a C-terminal Jas domain (Thines
et al. 2007; Chini et al. 2007; Yan et al. 2007). The TIFY domain is responsible for
homo- and hetero-dimerization (Chini et al. 2009) and interaction with NINJA, which
further recruits the co-repressor TPL through its conserved EAR domain (Pauwels et al.
2010). However, JAZ5, JAZ6, JAZ7, JAZ8 and JAZ13 can directly recruit TPL through an EAR
motif in their N-terminal domain (Causier et al. 2012; Shyu et al. 2012; Thireault et al. 2015;
Thatcher et al. 2016). The Jas domain is the major domain responsible for interacting
with transcription factors (Chini et al. 2016). It is also responsible for binding with COI1,
except in JAZ7 and JAZ8 where the Jas domain has diverged (Shyu et al. 2012; Chini
et al. 2016). As a result, JAZ8, and probably JAZ7, are not polyubiquitinated and
degraded in the presence of jasmonate and serve as permanent repressors.
Interestingly, almost all JAZs (except for JAZ1, JAZ7 and JAZ8) share a homologous
intron (Jas intron) that divides the Jas domain into a 20-amino-acid N-terminal motif and
a 7-amino-acid C-terminal motif (X5PY) (Chung et al. 2010). Frequently observed AS around this
conserved intron usually leads to a truncated protein lacking the X5PY motif or lacking
the whole Jas domain (Yan et al. 2007; Chung and Howe 2009; Chung et al. 2010;
Moreno et al. 2013). These truncated proteins retain their ability to interact with
transcription factors through the 20 amino acids remaining in the N-terminal motif of the Jas
domain or a similar sequence in their N-terminal domain, but have a reduced (or no)
ability to be recognized by COI1 (Chung and Howe 2009; Chung et al. 2010; Moreno
et al. 2013; Zhang et al. 2017a). As a result, these AS isoforms avoid degradation and
function as dominant repressors in the presence of JA and may serve to limit up-
regulation of the jasmonate responsive genes. The observed AS around the Jas intron
is conserved among monocots and eudicots, suggesting a conserved function and
underscoring its importance (Chung et al. 2010). It was predicted that ~60% of the
protein-coding genes of the Arabidopsis genome are alternatively spliced (Zhang et al.
2017b). However, other than JAZ repressors, little is known about AS regulation in the
jasmonate signaling pathway. In this project, I explore the role of AS in the jasmonate
signaling pathway.
AS is a post-transcriptional regulatory mechanism that generates multiple
transcripts/isoforms from a single gene through the selection of different splice sites, governed by
interaction between cis-elements (intronic/exonic splicing enhancer/silencer) and trans-
factors (e.g. SR splicing activators and hnRNP splicing repressors) (Kornblihtt et al.
2013). AS regulation refers to selection of one splice junction over another leading to a
change in the proportion of different isoforms for a given gene. Moreover, AS interacts
with other regulatory mechanisms such as NMD (Kervestin and Jacobson 2012; Kalyna
et al. 2012) and miRNA regulation (Reddy et al. 2013). AS frequently (32% of genes
which undergo AS, Kalyna et al. 2012) causes a PTC that results in the transcript being
recognized by the RNA surveillance system and subsequently degraded through the
NMD pathway (Kervestin and Jacobson 2012). Thus, NMD coupled with AS functions
as an important posttranscriptional mechanism to regulate protein levels (Kervestin and
Jacobson 2012). Inclusion or exclusion of the alternative region may introduce new
miRNA targeting sites that the other isoforms do not have (Reddy et al. 2013), thus
genes producing AS isoforms may be subject to both AS and miRNA regulation. In
plants, experimentally validated miRNA coupled AS regulation has been reported in
Arabidopsis (Wu and Poethig 2006; Yang et al. 2012) and rice (Campo et al. 2013).
Moreover, a simulation study suggested AS occurs more frequently in miRNA binding
sites than in other regions, raising the possibility that this combined regulation could be
an important and prevalent mechanism (Yang et al. 2012). Finally, if the AS isoform is
able to make a protein product, it may generate the same protein or a different protein
depending on whether AS happens in the UTRs or CDS. For cases which generate a
different protein product, these proteins could be non-functional, partially-functional,
redundantly-functional, or neo-functional compared with the primary protein (Reddy et
al. 2013; Staiger and Brown 2013). Isoforms with partial- or neo-functions are especially
interesting as they may regulate an alternate gene function.
In this project, I integrated transcriptomics and proteomics analyses of
Arabidopsis with specific interest in identifying three aspects of AS-related regulation
potentially impacting the jasmonate signaling pathway: 1) differential AS in response to
MeJA treatment; 2) miRNA regulation mediated by differential use of AS isoforms; 3) AS
splice variants with novel functions. In each case, I screened a pool of candidates and
further explored a few interesting examples in detail. In addition, AS events identified
from the RNA-seq data were further validated with the proteomics data.
Materials and Methods
Plant Growth, MeJA Treatment and Harvesting, Transcriptome Library Preparation and Sequencing
This work describes a reanalysis of RNA-Seq reads presented by Yang et al.
2014. Three independent replicate shoot and three independent replicate root samples
were obtained from the Arabidopsis thaliana mutants jaz2 (SALK079895C) and jaz7
(SALK040835C) and WT (Col-0) with (10 μM) or without (0 μM) MeJA treatment and
used for transcriptome and proteome sequencing. A total of 36 samples for RNA-Seq as
well as proteomics were obtained. Transcriptome data is available in NCBI Sequence
Read Archive under SRP026541 (Yang et al. 2014).
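The sampling design described above can be enumerated as a quick check on the library count. This is an illustrative sketch only; the label strings are hypothetical and are not the identifiers used in the SRA submission.

```python
from itertools import product

# Factors taken from the text; the label strings themselves are hypothetical.
GENOTYPES = ["WT", "jaz2", "jaz7"]
TISSUES = ["shoot", "root"]
MEJA_UM = [0, 10]            # MeJA concentration in micromolar
REPLICATES = [1, 2, 3]       # independent biological replicates

def enumerate_libraries():
    """Return one label per library in the full factorial design."""
    return [
        f"{genotype}_{tissue}_{conc}uM_rep{rep}"
        for genotype, tissue, conc, rep in product(
            GENOTYPES, TISSUES, MEJA_UM, REPLICATES
        )
    ]

libraries = enumerate_libraries()
print(len(libraries))  # 3 genotypes x 2 tissues x 2 treatments x 3 replicates
```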
Transcriptome Assembly and Differential AS Detection
Raw reads from the HiSeq 2000 system were filtered with FASTX-Toolkit
(http://hannonlab.cshl.edu/fastx_toolkit/index.html) and the screened reads were
mapped to the TAIR10 genome database (http://www.arabidopsis.org) with GSNAP v2013-07-20
(Wu and Nacu 2010) with a maximum intron size of 8000, a minimum intron size of 20,
and a maximum of 5% mismatches allowed. The uniquely mapped reads from each library were
assembled using Cufflinks v2.2.1 (Trapnell et al. 2013) in reference-guided mode with the
TAIR10 annotation as reference, a maximum intron size of 8000, a minimum intron size of 20,
and a minimum isoform abundance of 5%. The 36 assemblies were merged into a single
transcript reference set by Cuffmerge v2.2.1 (Trapnell et al. 2013) with a minimum isoform
abundance of 5%. The merged transcript reference set was further filtered with two
criteria: 1) each junction of the transcript is supported by a minimum of 3 mapped reads; 2) for
the alternative region of an IntronR event, the minimum average mapping depth is 4
at 100% coverage, 5 at 90% coverage, and 6 at 80% coverage. The AS events were
called by AStalavista v3.2 (Foissac and Sammeth 2007). The percentages of novel junction
reads (“complete novel” indicates both the 5’ and 3’ splice sites are novel, “partial
novel” indicates only one splice site is novel) were calculated by RSeQC v2.6.2 (Wang
et al. 2012). Differential AS and expression were identified by Cuffdiff v2.2.1 (Trapnell et
al. 2013) with default parameters using the screened transcript set as reference.
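The two transcript-filtering criteria above can be sketched as follows. This is a minimal illustration, not the script used in this work: the function names are hypothetical, "coverage" is assumed to mean the fraction of retained-intron bases with at least one mapped read, and the thresholds are read as tiers (the first tier whose coverage requirement is met decides the outcome).

```python
from statistics import mean

MIN_JUNCTION_READS = 3
# (minimum fraction of intron bases covered, minimum average depth), strictest coverage first
INTRON_RULES = [(1.0, 4.0), (0.9, 5.0), (0.8, 6.0)]

def junctions_supported(junction_read_counts):
    """Criterion 1: every splice junction has at least 3 supporting reads.
    A transcript without junctions (empty list) passes trivially."""
    return all(n >= MIN_JUNCTION_READS for n in junction_read_counts)

def retained_intron_supported(per_base_depth):
    """Criterion 2: depth/coverage tiers for the retained intron region."""
    if not per_base_depth:
        return False
    coverage = sum(1 for d in per_base_depth if d > 0) / len(per_base_depth)
    avg_depth = mean(per_base_depth)
    for min_coverage, min_depth in INTRON_RULES:
        if coverage >= min_coverage:
            return avg_depth >= min_depth
    return False  # less than 80% of the intron is covered

def keep_transcript(junction_read_counts, retained_intron_depth=None):
    """Apply both criteria; the intron test only applies to IntronR isoforms."""
    if not junctions_supported(junction_read_counts):
        return False
    if retained_intron_depth is None:
        return True
    return retained_intron_supported(retained_intron_depth)
```

For example, `keep_transcript([5], [6] * 9 + [0])` keeps an IntronR isoform whose intron is 90% covered at an average depth of 5.4, whereas the same coverage at an average depth of 4.5 would be rejected.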
Open Reading Frame Prediction
I integrated BLAST and Pfam searches for ORF and protein prediction for the
screened transcript set using TransDecoder v3.0.0 (https://transdecoder.github.io/).
Protein Interaction Network Analysis
Genes with significantly differential expression and/or AS in response to MeJA
treatment were used for protein interaction network construction with STRING v10.0
(Szklarczyk et al. 2015). Genes or their homologs with experimental evidence of protein
interaction were connected with lines. Protein interaction network was displayed with
Cytoscape v3.4.0 (Shannon et al. 2003) with gene expression heatmap further added.
miRNA Target Prediction
We used all 427 Arabidopsis miRNAs from the miRNA database (miRBase
Release 21, Kozomara and Griffiths-Jones 2014) as queries to search the
reference transcript set for target sites with psRNATarget (Dai and Zhao 2011) with default
parameters in scoring schema V2.
Protein Extraction, Digestion, iTRAQ Labeling and LC-MS/MS
Proteins were quantified as previously described (Koh et al. 2012), and dissolved
in denaturant buffer (0.1% SDS (w/v)) and dissolution buffer (0.5 M triethylammonium
bicarbonate, pH 8.5) in the iTRAQ 8-plex kit (AB Sciex Inc., Foster City, CA, USA). For
each sample, a total of 100 μg of protein was reduced, alkylated, trypsin-digested, and
labeled according to the manufacturer’s instructions (AB Sciex Inc.). The control
samples from wild type, jaz2, and jaz7 were labeled with iTRAQ tags 113, 114, and 115,
respectively, and the corresponding treated samples were labeled with iTRAQ tags 116,
117, and 118, respectively. In addition, all six samples were mixed and labeled with
iTRAQ tag 119 as an internal control. Labeled peptides were desalted with C18-solid
phase extraction and dissolved in strong cation exchange (SCX) solvent A (25% (v/v)
acetonitrile, 10 mM ammonium formate, and 0.1% (v/v) formic acid, pH 2.8). The
peptides were fractionated using an Agilent HPLC 1260 with a polysulfoethyl A column
(2.1 × 100 mm, 5 µm, 300 Å; PolyLC, Columbia, MD, USA). Peptides were eluted with a
linear gradient of 0–20% solvent B (25% (v/v) acetonitrile and 500 mM ammonium
formate, pH 6.8) over 50 min followed by ramping up to 100% solvent B in 5 min. The
absorbance at 280 nm was monitored and a total of 12 fractions were collected. The
fractions were lyophilized and resuspended in LC solvent A (0.1% formic acid in 97%
water (v/v), 3% acetonitrile (v/v)). A hybrid quadrupole Orbitrap (Q Exactive Plus) MS
system (Thermo Fisher Scientific, Bremen, Germany) was used with high energy
collision dissociation (HCD) in each MS and MS/MS cycle. The MS system was
interfaced with an automated Easy-nLC 1000 system (Thermo Fisher Scientific,
Bremen, Germany). Each sample fraction was loaded onto an Acclaim Pepmap 100
pre-column (20 mm × 75 μm; 3 μm-C18) and separated on a PepMap RSLC analytical
column (250 mm × 75 μm; 2 μm-C18) at a flow rate of 350 nl/min during a linear
gradient from solvent A (0.1% formic acid (v/v)) to 30% solvent B (0.1% formic acid (v/v)
and 99.9% acetonitrile (v/v)) for 95 min, to 98% solvent B for 15 min, followed by a hold at 98%
solvent B for an additional 30 min. Full MS scans were acquired in the Orbitrap mass
analyzer over m/z 400–2000 range with resolution 70,000 at 200 m/z. The top ten most
intense peaks with charge state ≥ 2 were isolated (with 2 m/z isolation window) and
fragmented in the high energy collision cell using a normalized collision energy of 28%.
The maximum ion injection time for both the survey scan and the MS/MS scans was 250
ms, and the ion target values were set to 3e6 and 1e6, respectively. The selected
sequenced ions were dynamically excluded for 60 sec.
Proteomics Data Analysis
The raw MS/MS data files were processed by a thorough database searching
approach considering biological modification and amino acid substitution against
a customized Arabidopsis database using ProteinPilot v4.5 with the Fraglet and Taglet
searches under the Paragon™ algorithm (Shilov et al. 2007). The following parameters
were used for all searches: fixed modification of methylmethane
thiosulfonate-labeled cysteine, fixed iTRAQ modification of amine groups at the
N-terminus and lysine, and variable iTRAQ modification of tyrosine. The false discovery rate
at the peptide level was estimated with the integrated PSPEP tool in the ProteinPilot
Software to be 1.0%. The identified peptide reads were screened for confidence no less
than 95%. The screened peptide reads were mapped to the TAIR10 primary protein
database, and peptides that failed to map to the database were candidates for
supporting AS isoforms. These candidates were manually validated and only reads
spanning the AS junction were regarded as evidence for that AS isoform.
Proteomics data generation was performed by Mi-Jeong Yoo and Jin Koh.
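The candidate-screening step for peptides can be sketched as below. This is a minimal illustration under stated assumptions: "mapping" is approximated by an exact substring match against each primary protein sequence (the actual mapping procedure is not specified here), and the function name and toy sequences are hypothetical.

```python
def screen_candidates(peptides, primary_proteins, min_confidence=95.0):
    """Return peptides that may support AS isoforms.

    peptides: iterable of (sequence, confidence_percent) tuples.
    primary_proteins: iterable of primary protein sequences.
    A peptide is a candidate if it passes the confidence cut-off and
    does not occur in any primary protein (exact substring match).
    """
    proteins = list(primary_proteins)
    candidates = []
    for sequence, confidence in peptides:
        if confidence < min_confidence:
            continue  # fails the 95% confidence screen
        if any(sequence in protein for protein in proteins):
            continue  # explained by a primary protein, not AS-specific
        candidates.append(sequence)
    return candidates

# Toy data (hypothetical sequences)
primary = ["MKTAYIAKQR", "MSLLTEVETY"]
observed = [("TAYIAK", 99.0), ("AKQRGG", 99.0), ("LLTEVW", 80.0)]
print(screen_candidates(observed, primary))
```

Candidates returned by such a screen would still require the manual junction check described above before being counted as evidence.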
Results
Transcriptome Sequencing and Genome-Guided Assembly
Thirty-six Arabidopsis RNA-Seq libraries originally constructed to examine response to
MeJA in roots and shoots of WT and mutant (jaz2 and jaz7) plants were
reanalyzed to identify and characterize AS associated with response to MeJA. The
RNA-Seq data was sampled from shoot and root tissues of the Arabidopsis WT (Col-0),
knock-down mutant jaz2 (Figure 3-1; Yan et al. 2014), and overexpression mutant jaz7
(Figure 3-1; Yan et al. 2014) under 0 or 10 μM MeJA treatment, with three biological
replicates each. A total of 578 million reads were available and 78% of these (452M)
were uniquely mapped to the Arabidopsis TAIR10 genome assembly (Table 3-1). The
uniquely mapped reads from each library were assembled with Cufflinks (Trapnell et al.
2013) and these assemblies were further merged via Cuffmerge (Trapnell et al. 2013) to
generate the reference transcript assembly. Isoforms with less than three reads
supporting each of the annotated junctions or IntronR isoforms with low sequence
support for the retained intron were removed from consideration (see Materials and
Methods). A total of 20,524 transcripts from 13,647 genes were identified in our
transcript assembly, among which 15,947 transcripts were known transcripts shared
with TAIR10 annotation and the remaining 4,577 were novel transcripts (Figure 3-2A).
Among all the splice junctions identified in this project, 26% of the junctions have at
least one boundary novel to the TAIR10 annotation (Figure 3-2B). A total of 4,446 genes (32.58%
of the identified 13,647 genes) have evidence of undergoing AS with the majority (98%)
of them producing two to five splice variants (Table 3-2). The most abundant AS events
are AltA (38%) and IntronR (30%) (Figure 3-2C). In plants IntronR is usually the most
common AS event (Reddy et al. 2013). The finding that AltA is more common than
IntronR in this analysis may be a result of applying more stringent screening criteria to
IntronR events (Materials and Methods).
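The counts reported in this section are internally consistent, which can be verified with a few lines of arithmetic. The sketch below only restates numbers given in the text; the helper name `pct` is hypothetical.

```python
# Values as reported in the text
known_transcripts, novel_transcripts, total_transcripts = 15_947, 4_577, 20_524
total_reads_millions, unique_reads_millions = 578, 452
as_genes, all_genes = 4_446, 13_647

def pct(part, whole, ndigits=2):
    """Percentage of `part` in `whole`, rounded to `ndigits` decimals."""
    return round(100 * part / whole, ndigits)

# Known plus novel transcripts should equal the assembly total
print(known_transcripts + novel_transcripts == total_transcripts)
# Fraction of reads that mapped uniquely
print(pct(unique_reads_millions, total_reads_millions, 0))
# Fraction of identified genes with evidence of AS
print(pct(as_genes, all_genes))
```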
Jasmonate-Related Protein Interaction Network
Using the screened transcript assembly as reference, significant differential gene
expression and differential expression of AS isoforms were identified with Cuffdiff
(Trapnell et al. 2013) in response to MeJA treatment (Table 3-3), between tissues
(Table 3-4), and between genotypes (Table 3-5). In each case there were always more
genes with significant expression changes than genes with significant AS changes. The
largest number of differences was observed between tissues. Genes with significant differential
expression and/or AS in response to jasmonate treatment were used to generate the
protein interaction network with STRING (Szklarczyk et al. 2015). Edges in the network
indicate experimentally validated protein interaction of the two connected proteins or
their homologs. A subset of the network including splicing related proteins, jasmonate
key regulatory proteins (JAZs, COI1, NINJA) as well as genes interacting with them was
isolated and examined further. We arbitrarily divided the network into four modules
(Figure 3-3). One module contains splicing related genes, and this is connected to three
other modules. The first of these contains the key jasmonate regulatory factors, JAZ3,
JAZ10, TIFY7, COI1 and NINJA, as well as transcription factors such as bHLHs and
R2R3-MYBs. The communication between jasmonate signaling and splicing signaling
was shown to be mediated by bHLHs, R2R3-MYBs and three splicing-related proteins:
CBF1-interacting co-repressor (CIR), pre-mRNA-splicing factor ISY1-like protein (LSY1)
and SNW/SKI-interacting protein (SKIP). The second of these modules mainly contains
kinases and transcription factors and its interaction with the splicing-related proteins is
through kinases and SKIP. The third module is centered on a topoisomerase (TOPII)
and a ubiquitin (UBQ11) protein. Interaction between the third module and the
splicing-related proteins is through Embryo Defective 2816 (emb2816) and Always Early 4
(ALY4). As all genes in the protein interaction network are differentially regulated in
response to MeJA, these genes are presumably jasmonate-responsive genes. The
three modules interacting with the splicing-related proteins suggest three possible signal
transduction pathways between jasmonate and mRNA splicing regulation (Figure 3-3):
shared transcription factors (module 1), shared kinases (module 2) and shared ubiquitin
pathways (module 3).
Regulation of Transcription Factors (bHLHs and MYBs) and Splicing Factors (SRs and hnRNPs)
The protein interaction network indicates that bHLH and MYB gene families have
a large presence, which is suggestive of their importance in the jasmonate pathway.
Moreover, the SR and hnRNP gene families are key trans-factors involved in the
selection of splice junctions and play essential roles in AS regulation (Barbazuk et al.
2008). We further analyzed the expression patterns of these four important gene
families in response to MeJA treatment as well as between tissues and genotypes
(Figure 3-4). Genes without any significant expression change in any of the three
comparisons were not included.
I observed changed expression patterns in the mutants compared with WT for
some genes. For example, while the expression of bHLH116 is strongly downregulated
in WT root tissue it is slightly upregulated in jaz2 root tissue and higher still in jaz7 root
tissue in response to MeJA. This expression pattern change indicates bHLH116 might
be regulated by JAZ2 and JAZ7 in the root, which is also supported by the expression
change in genotype comparison (Figure 3-4C). Similarly, bHLH77 might be regulated by
JAZ2 and JAZ7 in the root tissue; bHLH153, bHLH155, bHLH137 might be regulated by
JAZ2 and JAZ7 in the shoot tissue; MYB48 and bHLH36 are likely to be regulated by JAZ7 in
both the shoot and root tissues (Figure 3-4A). The expression profiles of the MYB and
bHLH gene families reveal the direction of their expression change in response to MeJA treatment (up
or down), as well as whether they may act downstream of JAZ2 and JAZ7.
The majority of the splicing factors were downregulated in response to MeJA
treatment (Figure 3-4D). The expression changes were less dramatic in the SR and
hnRNP splicing factors compared with the bHLH and MYB transcription factors.
However, I observed several splicing factors that may be involved in jasmonate
responses: RBP45a, RS2Z33, RNPA/B_3, and RSZ21. The expression of RBP45a was
downregulated in response to MeJA in the shoot of WT. However, it shows increased
expression in the jaz7 mutant, and even higher expression in the jaz2 mutant. Another
interesting case is RS2Z33, which is dramatically downregulated in response to MeJA in
jaz7 mutant root tissue, but upregulated in the jaz2 mutant and WT in both shoot and
root tissues (Figure 3-4D). Moreover, the expression of RS2Z33 was greatly induced in
jaz2 mutant compared with WT in the shoot tissue (Figure 3-4F). The expression of
RNPA/B_3 did not change much in response to MeJA in WT shoot. However, its
expression is greatly downregulated in the shoot of jaz2 mutants and greatly
upregulated in the shoot of jaz7 mutants. These altered expression patterns suggest
possible involvement of these splicing factors in the jasmonate pathway regulated by
JAZ2 and JAZ7.
Differential Alternative Splicing in Response to MeJA Treatment
In response to MeJA treatment, the largest number (326) of significantly differential AS events
was observed within the shoot of the jaz7 mutant (Table 3-3). In each of the six tissue-genotype
combinations (two tissues sampled from each of three genotypes), only a minority of the
differential AS events observed in response to MeJA is shared with the other
combination(s) (Figure 3-5). Genes with significantly differential AS in response to MeJA in
more than one genotype of a specific tissue (shoot or root) were analyzed further
(Figure 3-6). Among these 16 genes, 7 have the AS events in the coding region, and 5
of the 7 lead to a PTC, which could target these transcripts for NMD (Kervestin and
Jacobson 2012). Based on the AS pattern and expression profiles, I observed cases
where AS alone may regulate gene expression (through NMD), such as NUDX9, and
cases where AS and transcription together regulate gene expression, such as NRT1.8
(Figure 3-7).
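Differential AS in this section refers to a shift in the relative abundance of a gene's isoforms between conditions. The Cufflinks manual describes Cuffdiff's differential splicing statistic as the square root of the Jensen-Shannon divergence between isoform proportion vectors; the sketch below computes that quantity for toy abundances. It is an illustration of the metric only, not Cuffdiff's implementation (variance modelling and significance testing are omitted), and the abundance values are hypothetical.

```python
from math import log2, sqrt

def proportions(abundances):
    """Convert isoform abundances (e.g. FPKM) into relative proportions."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("gene has no expressed isoforms")
    return [a / total for a in abundances]

def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute nothing
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))

# Hypothetical two-isoform gene: similar levels in control, skewed after treatment
control = proportions([50.0, 50.0])
treated = proportions([95.0, 5.0])
print(js_distance(control, treated))
```

A value of 0 means identical isoform proportions in both conditions, and 1 means a complete switch from one isoform to another.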
Two splice variants were identified for gene NUDX9, with an AltA event in the
fifth exon leading to a frame shift that results in a PTC in the AS transcript. There are 17
mapped reads supporting the AS junction (Figure 3-7B). No significant changes in the
gene expression level in response to MeJA were identified. However, I observed
significant differential AS in the shoot tissue of WT and jaz7 upon MeJA treatment. In
the jaz2 mutant, NUDX9 showed a different AS pattern, indicating that a mutation in
JAZ2 may affect the regulation of AS for this gene (Figure 3-7A). When there is no
MeJA treatment the two splice variants are present at similar levels, with only one
isoform (the primary annotated isoform) of the two generating a protein product while
the AltA AS isoform is potentially subject to NMD. However, upon MeJA treatment very
few of the isoforms are of the AltA form providing the potential to generate twice as
much productive translation of the protein product under MeJA treatment relative to
untreated tissue with no required increase in the rate of transcription (Figure 3-7C).
NRT1.8 exhibited both differential expression and AS in response to MeJA
treatment in the root of the three genotypes (Figure 3-7D). Two splice variants were
identified for NRT1.8: the primary annotated isoform and the AS isoform. The AS
isoform contains an IntronR and an alternative polyadenylation event compared with the
primary annotated isoform. The IntronR event affects the CDS where the primary
annotated isoform retained the intron and the AS isoform removed it. Ten mapped
reads support this AS junction (Figure 3-7E). The AS isoform is unproductive as it
contains a PTC. Upon MeJA treatment the expression level of NRT1.8 was up-
regulated and AS generated more of the productive isoform – the primary annotated
isoform. In the case of NRT1.8, differential transcription and AS both play a role to
potentially generate more protein product in response to increased jasmonate (Figure 3-
7F).
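The reasoning behind the NUDX9 and NRT1.8 cases can be made explicit with a toy calculation in which the amount of translatable mRNA is the product of the transcription level and the fraction of transcripts spliced into the productive isoform. This is a simplified sketch: the function name is hypothetical, and all numbers are illustrative stand-ins (only the NUDX9 pattern of roughly equal isoforms before treatment and almost exclusively productive isoform after treatment follows the text).

```python
def productive_output(transcription_level, productive_fraction):
    """Relative amount of translatable mRNA for one gene.

    transcription_level: relative total transcript abundance.
    productive_fraction: share of transcripts without a PTC (0 to 1).
    """
    if not 0.0 <= productive_fraction <= 1.0:
        raise ValueError("productive_fraction must be between 0 and 1")
    return transcription_level * productive_fraction

# NUDX9-like case: transcription unchanged, splicing shifts to the productive isoform
nudx9_control = productive_output(1.0, 0.5)
nudx9_meja = productive_output(1.0, 1.0)
print(nudx9_meja / nudx9_control)   # fold change from AS alone

# NRT1.8-like case: transcription and splicing both change (hypothetical values)
nrt_control = productive_output(1.0, 0.5)
nrt_meja = productive_output(2.0, 0.8)
print(nrt_meja / nrt_control)       # combined fold change
```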
Alternative Splicing Variants Differentially Targeted by miRNA
I used psRNATarget (Dai and Zhao 2011) to predict miRNA targets on the
assembled transcripts. A total of 508 genes were predicted to be targeted by miRNAs,
among which 171 have evidence of AS (Figure 3-8). Sixty-four genes with evidence of
AS were targeted by miRNA in different manners between isoforms, suggesting the
potential of being regulated by a combination of AS and miRNA. Among these sixty-four
genes, twenty-one show significantly changed isoform proportions (significantly
differential AS identified by Cuffdiff) in MeJA treatment/tissue comparison/genotype
comparison (Figure 3-9). We cannot directly relate the observed changes in isoform
proportion solely to significantly differential AS in these cases as the changes in the
ratio of isoform abundance may also be a result of miRNA regulation. Most of the
changes in isoform ratio occur between tissues, with only five genes (SMZ, AAO2,
PUB4, LON2, AT3G02740) exhibiting a difference in isoform ratios in relation to MeJA
treatment and three (AAO2, LON2 and CEST) exhibiting differences in isoform ratios
between genotypes (Figure 3-9). The AS pattern and miRNA target sites of
SMZ, AAO2 and At3g02740 were further analyzed as these genes exhibit significantly
changed isoform ratios in response to MeJA and they represent cases where miRNA
target sites were located in the CDS, 3’-UTR and 5’-UTR, respectively (Figure 3-10). The
altered proportions of SMZ and AAO2 isoforms in jaz7 shoot tissue in response to
MeJA, and the altered proportion of At3g02740 isoforms in jaz2 and jaz7 root tissue in
response to MeJA, could be a result of AS regulation, miRNA regulation or a
combination of both.
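The selection of genes whose isoforms are targeted differently by miRNAs can be sketched as a comparison of the predicted miRNA sets of each isoform. This is a simplified illustration with hypothetical gene, isoform and miRNA names; it compares only miRNA identity per isoform and ignores differences in target-site position or score, which the actual analysis may also have considered.

```python
def differentially_targeted(gene_isoform_targets):
    """Return genes whose isoforms differ in their predicted miRNA targets.

    gene_isoform_targets: {gene_id: {isoform_id: set of miRNA ids}}.
    An isoform with no predicted target is represented by an empty set.
    """
    hits = []
    for gene, isoforms in gene_isoform_targets.items():
        target_sets = {frozenset(targets) for targets in isoforms.values()}
        # At least two isoforms and more than one distinct target set
        if len(isoforms) >= 2 and len(target_sets) > 1:
            hits.append(gene)
    return sorted(hits)

# Toy predictions (all identifiers hypothetical)
predictions = {
    "geneA": {"geneA.1": {"miR_x"}, "geneA.2": set()},       # site lost in isoform 2
    "geneB": {"geneB.1": {"miR_y"}, "geneB.2": {"miR_y"}},   # same targeting
    "geneC": {"geneC.1": {"miR_z"}},                          # no AS
}
print(differentially_targeted(predictions))
```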
Alternative Splicing Variants with Novel Functions
In order to identify AS isoforms with novel functions the ORF and protein
products of the 20,524 transcripts were predicted with TransDecoder (Figure 3-2D;
https://transdecoder.github.io/). Of the 15947 known transcripts identified from both our
project and TAIR10, there is 99.6% agreement between TAIR10 annotation and
TransDecoder prediction, supporting the efficacy of TransDecoder. We used the ORF
annotation from TAIR10 for the 15947 known transcripts and the ORF prediction from
TransDecoder for the remaining 4577 novel transcripts. Transcripts with no ORF
prediction or multiple ORF predictions were excluded from further analysis, thus 3220
genes with evidence of AS (7638 transcripts) with ORF predictions from either TAIR10
or TransDecoder were used for further analysis.
Not all AS leads to multiple protein products of a gene. Indeed, a large proportion
of the AS events do not contribute to protein diversity. One cause is AS in
the UTRs, where the AS isoform generates the same protein product. Another is
the case where AS causes a PTC leading to NMD-mediated transcript degradation. In
this study, 37.6% of the AS events occurred in UTRs (Figure 3-11). Both long 3’-UTR
and introns in the UTR region are cis-elements triggering NMD (Kervestin and Jacobson
2012; Kalyna et al. 2012). Two experimentally validated criteria for NMD in Arabidopsis
(Kalyna et al. 2012) were applied to identify potential cases of NMD: 1) 3’ UTR > 350
nt; 2) distance from the stop codon to the last exon junction > 55 nt (Figure 3-11). As a result, a
total of 1745 transcripts from 1106 genes (34.3% of genes with evidence of AS) were
predicted to be targets of NMD. Eliminating genes that undergo AS but produce only a
single protein product identified 1464 genes with AS transcripts which have the potential
to generate multiple protein products.
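The two NMD criteria can be expressed as a small predicate on transcript coordinates. This is a minimal sketch under stated assumptions: either criterion alone is treated as sufficient, the distance in criterion 2 is measured from the end of the stop codon to the last exon-exon junction downstream of it, and the function name and coordinate convention are hypothetical.

```python
UTR3_THRESHOLD_NT = 350
STOP_TO_JUNCTION_THRESHOLD_NT = 55

def is_predicted_nmd_target(exon_lengths, stop_codon_end):
    """Predict whether a spliced transcript is a potential NMD target.

    exon_lengths: exon lengths (nt) in 5' to 3' order.
    stop_codon_end: 0-based transcript position one past the last base
    of the stop codon.
    """
    transcript_length = sum(exon_lengths)
    # Criterion 1: long 3' UTR
    long_utr3 = (transcript_length - stop_codon_end) > UTR3_THRESHOLD_NT
    # Criterion 2: stop codon far upstream of the last exon-exon junction
    if len(exon_lengths) > 1:
        last_junction = sum(exon_lengths[:-1])
        stop_far_upstream = (
            last_junction - stop_codon_end
        ) > STOP_TO_JUNCTION_THRESHOLD_NT
    else:
        stop_far_upstream = False  # single-exon transcript has no junction
    return long_utr3 or stop_far_upstream
```

For instance, a transcript with exons of 300, 200 and 100 nt whose stop codon ends at position 400 has a 200 nt 3’ UTR but a stop codon 100 nt upstream of the last junction, so it would be flagged by criterion 2.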
Among these 1464 genes, 171 were identified that were either splicing-related
genes, jasmonate-related genes, or transcription factors, and also show significant
differential expression and/or AS in response to jasmonate treatment and/or between
tissues and/or between genotypes. Predicted proteins from the 372 transcripts from this
171-gene subset were analyzed with the NCBI Conserved Domain Database (CDD,
Marchler-Bauer et al. 2015) and Simple Modular Architecture Research Tool (SMART,
Letunic et al. 2015) to determine their domain structure. We identified potential novel AS
isoforms by two criteria: 1. the AS protein has conserved domain(s) that suggests
functional importance and 2. the domain arrangement/structure of the AS variant is
different from the primary protein, which indicates functional divergence. A total of thirty
genes were identified that satisfy both criteria (Figure 3-12). We observed conservation
between members of the same gene family for domain pattern changes. For example,
the MADS-box genes FYF and FLM both have AS isoforms lacking the MADS domain;
R2R3-MYB genes MYB59, MYB48, MYB28, MYB15, MYB47 all have AS isoforms
lacking the R2 domain; 3R-MYB genes MYB3R1 and MYB3R4 both have AS isoforms
lacking the C-terminal repression motif (Figure 3-12); suggesting conservation of AS
regulation of gene families.
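The two criteria for flagging AS isoforms with potentially novel functions reduce to a comparison of ordered domain lists. The sketch below is illustrative only: the function name is hypothetical, and domain annotations are assumed to be available as ordered lists of domain names (as could be derived from CDD or SMART output).

```python
def is_candidate_novel_isoform(primary_domains, as_domains):
    """Apply the two screening criteria to one AS protein.

    primary_domains / as_domains: domain names in N- to C-terminal order.
    Criterion 1: the AS protein retains at least one conserved domain.
    Criterion 2: its domain arrangement differs from the primary protein.
    """
    has_conserved_domain = len(as_domains) > 0
    architecture_differs = list(as_domains) != list(primary_domains)
    return has_conserved_domain and architecture_differs

# An R2R3-MYB whose AS isoform lacks the R2 domain, as described above
print(is_candidate_novel_isoform(["R2", "R3"], ["R3"]))
```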
Alternative Splicing Variant of bHLH160 with Potential Novel Function
Among the thirty genes which were predicted to produce AS isoform(s) with
novel functions, I identified bHLH160 as an interesting case. Like all bHLH proteins,
bHLH160 contains a basic-Helix-Loop-Helix domain, where the basic region is
responsible for sequence specific recognition and interaction with DNA, and the HLH
region is responsible for dimerization (Figure 3-13A, Carretero-Paulet et al. 2010).
When the bHLH dimer stably binds with the DNA recognition sequence it serves as a
transcriptional activator/repressor for gene expression. Three transcript isoforms of
bHLH160 were assembled with two of them (TCONS_00034502 and
TCONS_00034498) predicted to generate the primary protein product and the third
isoform (TCONS_00034499) predicted to produce a product lacking part of the N-
terminal region (Figure 3-13A). There are 45 mapped reads supporting the AS junction
of the third isoform (Figure 3-14). Interestingly, the AS isoform lacking the N-terminal
region has lost the 13bp basic region and its upstream sequences, leading to a predicted
protein product with only the HLH region (hereafter called bHLH160b-) (Figure 3-13).
Based on the function of the basic and HLH regions, lacking the basic region would lead
to a protein that is able to dimerize but unable to bind DNA. In addition, by dimerizing
with normal bHLH proteins, bHLH160b- would be expected to decrease the active bHLH
dimers (Figure 3-13B), thus bHLH160b- functions in a manner opposite to that of the
primary isoform regardless of whether the protein product produced by the primary
isoform acts as an activator or a repressor. We observed significantly upregulated gene
expression of bHLH160 in response to MeJA treatment in WT and jaz2 mutant roots
(Figure 3-13C) suggesting it might be a jasmonate-responsive gene.
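The expected effect of bHLH160b- on the pool of active dimers can be illustrated with a toy model. The assumptions here are mine and are not tested in this work: monomers pair at random with equal affinity, and a dimer can bind DNA only if both partners carry a basic region. Under these assumptions, if a fraction f of the monomer pool is bHLH160b-, the fraction of DNA-binding dimers is (1 - f)^2.

```python
def active_dimer_fraction(basicless_fraction):
    """Fraction of dimers in which both partners carry a basic region.

    basicless_fraction: share of the bHLH monomer pool lacking the basic
    region (0 to 1). Assumes random pairing with equal affinities.
    """
    if not 0.0 <= basicless_fraction <= 1.0:
        raise ValueError("basicless_fraction must be between 0 and 1")
    return (1.0 - basicless_fraction) ** 2

for f in (0.0, 0.25, 0.5):
    print(f, active_dimer_fraction(f))
```

In this model the loss of active dimers is steeper than the share of the truncated isoform itself, which is the usual signature of a dominant-negative partner.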
To investigate whether the AS regulatory pattern in bHLH160 is conserved
among other bHLH genes, all other bHLH genes expressed in these data with evidence
of AS were examined. Phylogenetic analysis of the bHLH gene family from multiple plant
species indicates bHLH160 is a newly evolved gene in Brassicaceae (Carretero-Paulet
et al. 2010). We further checked the bHLH160 orthologs in other Brassicaceae species
and three genes with a similar AS pattern based on the gene annotation of Camelina
sativa were identified (Figure 3-13D). Protein sequence alignments of the primary and
AS isoforms of the Arabidopsis bHLH160 and the three Camelina genes
(LOC104712692, LOC104701783, LOC104750971) suggest an alternative start codon
at the first amino acid of the Helix1 region (Figure 3-13E). A similar AS pattern observed
in multiple species suggests that AS generation of both an activator and a repressor
might be a conserved and important regulatory mechanism in bHLH160 and its
orthologous genes.
Proteomics Validation for Alternative Splicing
An attempt was made to validate AS events identified at the transcript level in
this transcriptome analysis with proteomics data. I applied three criteria to select peptide
sequences supporting identified AS events: 1) the peptide sequence has a confidence
level of at least 95%; 2) the peptide sequence maps to only a single
gene; 3) the peptide sequence maps to the AS junction of the AS isoform. With
the above criteria, AS events of nine genes were identified to have evidence supporting
the protein level expression (Figure 3-15). Of the nine AS events, two (PAPP5,
AKINBETA1) are ExonS; two (RCA, GRXC2) are IntronR; two (SYP43, GSTZ1) are
AltA; two (MORF1, PYR6) are alternative polyadenylation; and one (RPAC14) is an alternative
promoter (Figure 3-15). Among them, GSTZ1 has support for both the AS junction and
the primary junction.
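The three peptide-selection criteria can be combined into a single test, sketched below. This is an illustration rather than the procedure actually used (validation in this work was manual): peptide mapping is approximated by exact substring matching, and the function name, the data layout and the convention that `junction_index` is the 0-based position of the first residue encoded downstream of the AS junction are all hypothetical.

```python
def peptide_supports_junction(peptide, confidence, proteins, as_isoform,
                              junction_index, min_confidence=95.0):
    """Check the three criteria for one peptide and one AS isoform.

    proteins: {isoform_id: (gene_id, protein_sequence)}.
    junction_index: 0-based index, in the AS isoform protein, of the first
    residue encoded after the AS junction.
    """
    # Criterion 1: confidence level of at least 95%
    if confidence < min_confidence:
        return False
    # Criterion 2: the peptide maps to exactly one gene
    genes = {gene for gene, seq in proteins.values() if peptide in seq}
    if len(genes) != 1:
        return False
    # Criterion 3: some occurrence in the AS isoform spans the junction
    seq = proteins[as_isoform][1]
    start = seq.find(peptide)
    while start != -1:
        if start < junction_index < start + len(peptide):
            return True
        start = seq.find(peptide, start + 1)
    return False

# Toy data (hypothetical identifiers and sequences)
toy_proteins = {
    "g1.1": ("g1", "MKTAYIAKQRLLTE"),   # primary isoform
    "g1.2": ("g1", "MKTAYGGSWLLTE"),    # AS isoform, junction after MKTAY
    "g2.1": ("g2", "MSSPLLTEVV"),
}
print(peptide_supports_junction("TAYGGS", 99.0, toy_proteins, "g1.2", 5))
```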
Discussion
Regulatory Functions of JAZ2 and JAZ7
The isoleucine-conjugated JA, JA-Ile, is the single known bioactive molecule of
the JA hormone (Fonseca et al. 2009). One puzzle of the JA signaling pathway is how a
single molecule could regulate so many different biological processes with
uncompromised specificity (Chini et al. 2016). In the three component key regulatory
module “COI1 – JA-Ile – JAZ”, JAZs are responsible for shunting jasmonate signaling
for various plant specific responses due to the functional redundancy and specificity of their
members (Chini et al. 2016).
JAZ2 and JAZ7 belong to clade I and IV of the JAZ gene family respectively (Bai
et al. 2011). In agreement with their sequence variation they have several functional
differences. First, JAZ7 has a diverged Jas domain which cannot be recognized by the
F-box protein COI1 (Shyu et al. 2012; Chini et al. 2016). Thus, JAZ7 may serve as a
permanent repressor, different from most of the other JAZ proteins which are
repressible repressors, such as JAZ2. It is also possible that JAZ7 may be targeted by
other F-box proteins (Shyu et al. 2012; Yan et al. 2014). Secondly, JAZ7 recruits co-
repressor TPL directly by the EAR domain in its N-terminal domain (Causier et al. 2012;
Shyu et al. 2012), whereas JAZ2, lacking the EAR domain, recruits NINJA with its TIFY
domain and NINJA further recruits TPL through its EAR domain (Pauwels et al. 2010).
Dependency on NINJA would affect whether the downstream regulation of JAZ2 and
JAZ7 is affected by signals regulating NINJA. Thirdly, JAZ2 is regulated by AS in its Jas
intron (the intron interrupts the Jas domain), which generates a transcript isoform with a
truncated Jas domain that leads to reduced interaction with COI1 and resistance to
degradation (Chung et al. 2010). Indeed, the jaz2 mutant, which generates a truncated
protein due to a T-DNA insertion in the Jas intron, is similar to the alternative isoform (Yan et al.
2014). However, JAZ7 does not have such AS regulation in the Jas domain simply due
to the lack of a Jas intron in the JAZ7 gene (Chung et al. 2010). Fourthly, JAZ2 can
form homodimers, as well as heterodimers with other JAZ proteins such as JAZ1,
JAZ5 and JAZ6, through the TIFY domain (Chung and Howe 2009). However, JAZ7
cannot dimerize with any JAZ proteins, including itself (Chung and Howe 2009; Chini
et al. 2009). Thus JAZ2 may function as a dimer whereas JAZ7 likely functions as a
monomer. In addition, dimerization is critical for stabilization of the splice variant of
JAZ10 lacking the Jas domain (Chung and Howe 2009), which may also be the case for
the JAZ2 splice variant. Fifthly, the transcription factors that interact with JAZ2 and JAZ7
show similarities as well as differences. JAZ2 and JAZ7 both bind MYC3, MYC4, and
bHLH17 (JAM1) (Fernández-Calvo et al. 2011; Fonseca et al. 2014; Thatcher et al.
2016). In addition, JAZ2 also interacts with MYC2, MYC5, bHLH13 (JAM2),
bHLH3 (JAM3), bHLH14, TT8, GL3, and EGL3, whereas JAZ7 does not interact with
these (Qi et al. 2011; Fernández-Calvo et al. 2011; Song et al. 2013; Fonseca et al.
2014). Among these transcription factors, bHLH clade IIIe (MYC2, MYC3, MYC4,
MYC5) and IIIf (TT8, GL3, EGL3) mainly function as jasmonate signaling activators (Qi
et al. 2011; Fernández-Calvo et al. 2011), whereas clade IIId (JAM1, JAM2, JAM3,
bHLH14) functions as repressors (Song et al. 2013; Fonseca et al. 2014). The IIIf clade
has a specific role in anthocyanin accumulation and trichome formation (Qi et al. 2011),
suggesting that JAZ2, but not JAZ7, is involved in these regulatory pathways. Recent
research suggests that JAZ7 plays a specific role in light/dark-induced responses (Chico et al.
2014; Yu et al. 2016). JAZ7, together with JAZ1, JAZ5, JAZ9, JAZ10, JAZ11, and JAZ12, is
strongly induced under simulated shade, leading to repression of the jasmonate-dependent
defense responses (Chico et al. 2014). Moreover, darkness strongly up-regulates
expression of JAZ7, which negatively regulates Arabidopsis leaf senescence
through the MYC2 activator (Yu et al. 2016). Two previously published studies used the
same jaz7 over-expression mutant (SALK_040835) as this project for functional
analysis of JAZ7 (Yan et al. 2014; Thatcher et al. 2016). The jaz7 over-expression
mutant exhibits shorter roots and reduced weight compared with WT in response to MeJA,
an early-flowering phenotype compared with WT in short-day conditions, and increased
susceptibility to the fungal pathogen F. oxysporum (Yan et al. 2014; Thatcher et al.
2016). Interestingly, transgenic JAZ7 overexpression lines do not reproduce the
phenotype of the jaz7 over-expression mutant (SALK_040835), suggesting possible
tissue-specific JAZ7 expression in the jaz7 mutant (Thatcher et al. 2016). In this project,
significantly up-regulated JAZ7 expression was observed in the shoot, but not in the root, of the jaz7 mutant compared
with WT in response to MeJA, which supports tissue-specific expression of
JAZ7 in the mutant (Figure 3-1).
In agreement with the functional similarity and specificity of the JAZ2 and JAZ7
genes discussed above, I identified genes potentially regulated by JAZ2, JAZ7 or both,
although further experimental validation is needed. For example, NUDX9 might be
regulated by JAZ2 in the shoot, given the changed isoform expression profile of this
gene in the jaz2 mutant compared with WT and the jaz7 mutant (Figure 3-7A);
bHLH160 might be regulated by JAZ7 in the root, given the changed gene expression
profile in the jaz7 mutant compared with WT and the jaz2 mutant (Figure 3-13C); and
At3g02740 might be regulated by both JAZ2 and JAZ7, given the changed isoform
expression profiles of this gene in the jaz2 and jaz7 mutants compared with WT (Figure 3-10C).
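A "changed isoform expression profile" between genotypes can be read as a shift in the share that each isoform contributes to the total expression of a locus. The following is a minimal sketch of that calculation, not the pipeline used in this project. The function names and the expression values are hypothetical and serve only as illustration.

```python
from typing import Dict


def isoform_proportions(expression: Dict[str, float]) -> Dict[str, float]:
    """Return each isoform's share of the total expression at one locus."""
    total = sum(expression.values())
    if total <= 0:
        # No detectable expression: proportions are undefined, report zeros.
        return {isoform: 0.0 for isoform in expression}
    return {isoform: value / total for isoform, value in expression.items()}


def proportion_shift(reference: Dict[str, float],
                     mutant: Dict[str, float]) -> Dict[str, float]:
    """Change in isoform proportion (mutant minus reference) per isoform."""
    ref = isoform_proportions(reference)
    mut = isoform_proportions(mutant)
    isoforms = sorted(set(ref) | set(mut))
    return {iso: mut.get(iso, 0.0) - ref.get(iso, 0.0) for iso in isoforms}


# Hypothetical expression values (arbitrary units), not taken from the data.
wt = {"isoform_1": 30.0, "isoform_2": 10.0}
jaz2 = {"isoform_1": 20.0, "isoform_2": 20.0}

shift = proportion_shift(wt, jaz2)
print(shift)
```

In this toy case the second isoform rises from one quarter to one half of the locus output. Whether such a shift is significant would still require replicate-aware statistical testing, which the sketch does not attempt.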
Alternative Splicing Coupled miRNA Regulation
AS and miRNA-mediated regulation are two important post-transcriptional regulatory processes
in plants (Alves-Junior et al. 2009; Zhang et al. 2017b) and animals (Pan et al. 2008;
Friedman et al. 2009). In humans, more than 95% of multi-exon protein-coding genes are
under AS regulation (Pan et al. 2008) and over 60% of the protein-coding genes are
conserved targets of miRNA regulation (Friedman et al. 2009). In Arabidopsis, around
60% of multi-exon protein-coding genes are regulated by AS (Zhang et al. 2017b) and
hundreds of genes involved in various pathways are subjected to miRNA regulation
(Alves-Junior et al. 2009). Approximately 33.3% of miRNA binding sites in humans (Wu
et al. 2013a) and more than 12.4% of miRNA binding sites in Arabidopsis (Yang et al.
2012) are predicted to be regulated by AS. The combination of AS and
miRNA regulation thus forms an important layer of gene regulation in plants and animals (Reddy et al.
2013; Tian and Manley 2013). Both plant and animal miRNAs mediate post-transcriptional
repression; however, plant miRNAs do so through near-perfect
complementarity to what is usually a single binding site, located in any region of the transcript,
whereas animal miRNAs do so through partial complementarity to multiple
binding sites, usually located in the 3’UTR of the mRNA (Voinnet 2009; Axtell et al.
2011). As a result, plant miRNAs have stronger and more focused effects on
specific target genes, whereas animal miRNAs have subtler and broader effects
across the transcriptome (Axtell et al. 2011).
Published studies suggest at least two mechanisms of AS-coupled miRNA
regulation. The first mechanism, shared by plants and animals, involves miRNA
binding sites located in the 3’UTR and alternative polyadenylation that leads to
inclusion or exclusion of these miRNA binding site(s). By selectively removing miRNA
binding sites from the 3’UTR, AS could render isoforms without the miRNA binding
site(s) resistant to miRNA repression with no change in protein structure/function (Wu
and Poethig 2006; Sandberg et al. 2008; Kalsotra et al. 2010; Boutet et al. 2012;
Campo et al. 2013). This mechanism is supported by the fact that most eukaryotic
genes contain more than one cleavage and polyadenylation site (Tian and Manley
2013). In animals, alternative polyadenylation-associated miRNA regulation is known to
play important roles in a wide variety of pathways, such as immune cell activation, cell
proliferation, muscle stem cell function, and heart development (Sandberg et al. 2008;
Kalsotra et al. 2010; Boutet et al. 2012). In plants, alternative polyadenylation-coupled
miRNA regulation has been reported for Arabidopsis SPL3/4/5 (Wu and Poethig 2006) and
rice Nramp6 (Campo et al. 2013). In Arabidopsis, alternative polyadenylation in the 3’UTR
of SPL3/4/5 affects the miR156 binding site and generates miR156-sensitive and
miR156-insensitive isoforms. Increased abundance of the miR156-insensitive isoforms promotes
the juvenile-to-adult transition (Wu and Poethig 2006). In rice, the dominant splice variant of
Nramp6 (Natural resistance-associated macrophage protein 6, Os01g31870),
Nramp6.8, contains two miR7695 binding sites in its 3’UTR. The expression level
of Nramp6.8 is negatively correlated with miR7695, and over-expression of miR7695
confers pathogen resistance in rice (Campo et al. 2013). The second mechanism is
plant-specific: miRNAs could repress premature or erroneous mRNAs by targeting
binding sites located in introns. In plants, the most prevalent AS event
is IntronR, which accounts for ~40% of the total AS events (Zhang et al. 2017b). It has been
shown that miRNAs target introns in Arabidopsis and rice (Meng et al. 2013), raising the
possibility that miRNAs could specifically repress or degrade erroneous intron-containing
mRNAs.
In this project, I identified 64 genes which contain miRNA binding sites potentially
impacted by AS, including the known SPL4 (Figure 3-16; Wu and Poethig 2006). The
actual gene number might be higher, as only genes whose expression was detected in
our transcriptome data were analyzed. I observed cases where alternatively spliced
miRNA target sites were located in the CDS (e.g. SMZ), 5’UTR (e.g. At3g02740) and
3’UTR (e.g. AAO2) (Figure 3-10). The cases of 5’UTR-located, AS-regulated miRNA
binding sites expand the first mechanism to also include alternative promoter-coupled
miRNA regulation, which functions in a similar way to alternative polyadenylation-coupled
miRNA regulation. In addition, most plant miRNA-regulated genes are
regulated by only one miRNA with a single target site (Voinnet 2009; Axtell et al. 2011).
I identified 32 genes (6.3% of miRNA-regulated genes) that were predicted to contain
multiple miRNA target sites, and some of them are regulated by different miRNAs, such
as AAO2 and At3g23900, which contain predicted target sites of miR414 and miR5021,
and GRF3, which contains predicted target sites of miR396 and miR5658 (Figure 3-8;
Figure 3-10). Though further validation of these predictions is needed, these
observations suggest complexity in the miRNA regulation of these genes in Arabidopsis.
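The classification described above (whether AS changes the set of predicted miRNA target sites of a gene, and whether a gene carries sites for more than one miRNA) can be sketched as a comparison of per-isoform site sets. This is an illustrative sketch only, not the analysis code of this project. The SMZ and AAO2 entries are read from Figure 3-9; "GENE_X", its isoform names, and the function name are hypothetical. The comparison uses miRNA names only and ignores site positions, which is a simplification.

```python
from typing import Dict, List

# Predicted miRNA target sites per transcript isoform, keyed by gene.
# SMZ and AAO2 follow Figure 3-9; GENE_X is a hypothetical locus added
# only to show the "same sites in every isoform" case.
predicted_sites: Dict[str, Dict[str, List[str]]] = {
    "SMZ": {"TCONS_00085221": ["ath-miR172"], "TCONS_00085222": []},
    "AAO2": {
        "TCONS_00082086": ["ath-miR414", "ath-miR5021"],
        "TCONS_00082087": [],
    },
    "GENE_X": {"isoform_a": ["ath-miR5021"], "isoform_b": ["ath-miR5021"]},
}


def classify_gene(isoform_sites: Dict[str, List[str]]) -> Dict[str, bool]:
    """Summarize how predicted miRNA sites are distributed across isoforms."""
    site_sets = [frozenset(sites) for sites in isoform_sites.values()]
    all_mirnas = set().union(*site_sets)
    return {
        # True when at least two isoforms differ in their miRNA site content.
        "as_affects_sites": len(set(site_sets)) > 1,
        # True when sites for more than one distinct miRNA are predicted.
        "multiple_mirnas": len(all_mirnas) > 1,
    }


for gene, isoforms in predicted_sites.items():
    print(gene, classify_gene(isoforms))
```

Under these inputs SMZ and AAO2 are flagged as genes whose miRNA sites are affected by AS, and only AAO2 is flagged as targeted by more than one miRNA.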
Functional AS Regulation
Several cases where AS may play functional roles in response to MeJA were
identified. In terms of AS regulation (a changed proportion of splice variants under
different conditions), I identified the genes NUDX9 and NRT1.8 (Figure 3-7). Both genes
exhibited significantly altered isoform proportions upon increased jasmonate, suggesting
the existence of cis-elements (intronic/exonic splicing enhancers/silencers) and differential
involvement of trans-factors (e.g. SR splicing activators and hnRNP splicing
repressors). Specifically, I observed significantly changed expression of the SR genes
RS2Z33 and RSZ21 and the hnRNP genes RBP45a and RNPA/B_3 in response to MeJA
treatment (Figure 3-4). These splicing factors are candidate genes responsible for
differential AS induced by jasmonate. Moreover, I explored the protein interaction network
of jasmonate-responsive genes and identified three possible signal transduction
pathways between jasmonate signaling and splicing regulation: shared transcription
factors, kinases and ubiquitin pathways (Figure 3-3). Increased jasmonate would trigger
the jasmonate signaling pathway. The signal could be relayed through
the transcription factor, kinase and ubiquitin signaling pathways and ultimately
transmitted to splicing-related factors that would regulate AS of the responsive genes,
such as NUDX9 and NRT1.8 (Figure 3-7). Interestingly, the AS isoforms of both genes
are potentially subject to NMD due to a PTC. NMD-coupled AS is an efficient
mechanism to downregulate gene expression (Kervestin and Jacobson 2012; Kalyna et
al. 2012) and it may occur in 34.3% of genes predicted to undergo AS in this analysis
(Figure 3-11). Specifically, NRT1.8 shows not only differential AS but also increased
transcription in response to MeJA treatment. The two regulation points, transcription
and AS, function synergistically to increase NRT1.8 protein product (Figure 3-7).
NRT1.8 is a nitrate transporter which enhances nitrate uptake by mediating nitrate
unloading from the xylem vessels (Li et al. 2010). Increased NRT1.8 protein level from
transcription and AS regulation indicates its possible involvement in jasmonate-triggered
plant nitrate uptake. NUDX9 is a GDP-D-Man pyrophosphohydrolase, which indirectly
modulates ammonium responses by hydrolysis of GDP-D-Man in the root (Tanaka et al.
2015). Here I report a possible involvement of NUDX9 in the jasmonate signaling
pathway through AS regulation in the Arabidopsis shoot.
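Whether an AS isoform carrying a PTC is a plausible NMD target is often judged from the distance between the stop codon and the last exon-exon junction, the quantity summarized in Figure 3-11C. The sketch below illustrates one such check. The 50-nt threshold is a commonly used heuristic that I assume here for illustration; it is not stated in the text above, and the function names and transcript coordinates are hypothetical. 3’-UTR length is not considered.

```python
from typing import List, Optional

# Assumed heuristic (not taken from the text above): a stop codon lying more
# than ~50 nt upstream of the last exon-exon junction is treated as premature.
MIN_DISTANCE_NT = 50


def stop_to_last_junction(exon_lengths: List[int],
                          stop_codon_end: int) -> Optional[int]:
    """Distance (nt) from the stop codon end to the last exon-exon junction.

    exon_lengths: exon lengths in 5'->3' transcript order.
    stop_codon_end: 1-based transcript coordinate of the last stop-codon base.
    Returns None for single-exon transcripts, which have no junction.
    A negative value means the stop codon lies in the last exon.
    """
    if len(exon_lengths) < 2:
        return None
    last_junction = sum(exon_lengths[:-1])  # last base of the penultimate exon
    return last_junction - stop_codon_end


def is_nmd_candidate(exon_lengths: List[int], stop_codon_end: int,
                     min_distance: int = MIN_DISTANCE_NT) -> bool:
    """Flag a transcript whose stop codon lies far upstream of the last junction."""
    distance = stop_to_last_junction(exon_lengths, stop_codon_end)
    return distance is not None and distance > min_distance


# Hypothetical three-exon transcript (exons of 200, 150 and 300 nt).
primary = is_nmd_candidate([200, 150, 300], stop_codon_end=500)  # stop in last exon
variant = is_nmd_candidate([200, 150, 300], stop_codon_end=280)  # stop 70 nt upstream
print(primary, variant)
```

Here only the variant, whose stop codon ends 70 nt upstream of the last junction, is flagged. A flag of this kind marks a candidate and does not by itself show that the transcript is degraded.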
In addition to regulating the relative abundance of isoforms, AS also plays
important functional roles by generating splice variants with novel functions, as is the case for
the well-documented JAZ repressors (Yan et al. 2007; Chung and Howe 2009; Chung et
al. 2010; Moreno et al. 2013). Notably, in response to wounding, the ratio of the
expressed JAZ10 splice variants did not change significantly (Yan et al. 2007). The AS
regulation of the JAZ repressors lies in generating a stable repressor (lacking the
domain recognized by COI1) in addition to the primary degradable repressor,
rather than in modulating the proportion of isoforms. AS has the potential to generate
isoforms with different functions from the same gene by modulating domain structures.
By identifying AS genes with different domain arrangements between splice variants, I
attempted to identify cases similar to the JAZ repressors in the jasmonate signaling
pathway. I identified a copper-responsive transcription factor, bHLH160 (Yamasaki et
al. 2009; Bernal et al. 2012), which has the potential to generate an activator and a
repressor through AS by inclusion/exclusion of sequences coding for the DNA-binding
domain (Figure 3-13). The primary isoform of bHLH160 is a transcription factor with a
DNA-binding domain (basic) and a dimerization domain (HLH) (Carretero-Paulet et al.
2010). The splice variant bHLH160b- lacks the basic domain and is not expected to be
able to bind the promoters of genes normally regulated by bHLH160. Moreover, the
bHLH160b- protein could sequester functional bHLH160 proteins by forming dimers with
them through the HLH domain. Thus, the splice variant bHLH160b- likely functions in a
manner opposite to the primary isoform. Human Id genes, which contain only the HLH
domain like Arabidopsis bHLH160b-, function in a similar manner to repress gene
expression (Sikder et al. 2003). Interestingly, similar AS regulation was observed in the
MADS-box genes FYF and FLM. The AS isoforms of FYF and FLM have lost the
MADS-box domain, which is the DNA-binding domain, but retained the K-box domain,
which facilitates dimerization of MADS-box proteins (Pařenicová et al. 2003; Figure 3-12).
Thus the splice variants of FYF and FLM may negatively regulate the original
function of the gene in a manner similar to bHLH160b-. AS may result in isoforms with
mutually exclusive functions, or in isoforms that produce the same
outcome with different strengths, which is the case for MYB3R1 and MYB3R4
(Figure 3-12). MYB3R1 and MYB3R4 are transcriptional activators that upregulate the G2/M
transition in the cell cycle (Haga et al. 2007). AS of MYB3R1 and MYB3R4 leads to
truncated proteins lacking the repression motif, which would serve as hyper-activators
compared with the primary proteins (Kato et al. 2009; Feng et al. 2017). Besides
transcription factors, I also identified many splicing factor genes with different domain
arrangements between splice variants. In most of the identified cases, the primary
protein product of the gene has multiple domains and the AS isoform has lost one or
more domain(s) (Figure 3-12). It is not surprising that many of the cases I identified are
transcription factors and splicing factors, as they contain at least two domains: the
DNA/RNA-binding domain and the regulatory domain. Based on these observations, AS
could serve as an important regulator of multi-domain proteins, generating
splice variants with different functions by altering domain arrangement. Further
functional validation of these cases would help to elucidate the regulatory roles that the different
isoforms play and their involvement in the jasmonate signaling pathway.
Figure 3-1. jaz2 and jaz7 mutant characterization. A) Gene structure of jaz2 and jaz7 with the location of the T-DNA insertion indicated. B) Expression profile of JAZ2 (left) and JAZ7 (right) based on the transcriptome data. C) Phenotype of Arabidopsis seedlings under 0 μM or 10 μM MeJA (Yan et al. 2014).
Figure 3-2. Characterization of assembled transcripts. A) Comparison of the assembled transcripts from the MeJA RNA-Seq data and TAIR10 annotation. B) Comparison of the mapped junction reads from the MeJA RNA-Seq data to the TAIR10 annotated junctions. C) Identified AS patterns detected in the MeJA RNA-Seq assemblies. D) ORF prediction of the 20,524 transcripts assembled from the MeJA RNA-Seq data with the TransDecoder program (Haas and Papanicolaou 2016).
Figure 3-3. Protein interaction network of genes that undergo AS and show differential expression or differential AS in response to MeJA treatment. Edges indicate experimentally validated interactions of the two connected proteins or their homologs. Yellow boxes indicate proteins related to splicing; blue boxes indicate transcription factors; pink boxes indicate kinase proteins. Heatmaps under the gene name indicate gene expression patterns in response to MeJA treatment.
Figure 3-4. Heatmap of differentially expressed transcription factor (bHLH and MYB) and splicing factor (SR and hnRNP) gene family members that undergo AS. Differentially expressed bHLH and MYB genes under MeJA treatment (A), between tissues (shoot relative to root) (B), and between mutant backgrounds (mutant relative to WT) (C). Differentially expressed SR and hnRNP genes under MeJA treatment (D), between tissues (shoot relative to root) (E), and between mutant backgrounds (mutant relative to WT) (F).
Figure 3-5. Venn diagram of genes that exhibit differential expression or differential AS in response to treatment (MeJA), between tissues (root and shoots) and between mutant backgrounds.
Venn diagram panels: Differential AS; Differential exp. Sets in each panel: Treatment (no vs. 10 µM MeJA); Tissue (root vs. shoot); Genotype (WT vs. jaz2, WT vs. jaz7, or jaz2 vs. jaz7).
Figure 3-6. Example cases in which gene expression was regulated by AS in response to MeJA treatment. Pie charts under each condition indicate the proportion of each AS isoform present relative to the total expression from the locus. Red and blue lines under pies indicate significant differential AS in shoot and root, respectively.
Figure 3-7. Two genes under AS regulation in response to MeJA treatment. A) Expression profile of NUDX9. Error bars indicate standard deviation. Pie charts under each condition indicate the proportion of each AS isoform relative to the total expression from the locus. Red lines under pie charts indicate significantly differential AS in the shoot tissue. B) Gene structure of the two isoforms of NUDX9. Angled lines indicate introns; thin black boxes indicate UTRs; blue boxes indicate CDS; pink boxes indicate regions which were converted to non-coding regions as a result of an AS-induced PTC. Red boxes connected by gray lines indicate mapped reads supporting the novel junction in the alternative isoform. C) Regulation of NUDX9 in response to MeJA treatment by AS. D) Expression profile of NRT1.8. Blue lines under pie charts indicate significantly differential AS patterns in the root tissue. Red and blue lines above the barplot indicate significantly differential gene expression in shoot and root, respectively. E) Gene structure of the two isoforms of NRT1.8. F) Regulation of NRT1.8 in response to MeJA treatment by transcription and AS.
Figure 3-8. Genes undergoing AS that include miRNA binding sites that differ between isoforms. A) Distribution of predicted miRNA target sites in genes that do not undergo AS, genes that undergo AS to produce isoforms with different miRNA target sites, and genes that undergo AS to produce isoforms with the same miRNA target sites. The darker shaded sections within each category indicate cases with multiple miRNA target sites. B) Gene structure of the seven cases (dark pink in A) with the predicted miRNA binding site indicated. C) Expression profiles of the seven genes shown in B. Pie charts under each condition indicate the proportion of each AS isoform relative to the total expression from the locus. Significantly differential AS is indicated by red lines.
Figure 3-9. Twenty-one genes which contain miRNA binding sites potentially subjected to AS regulation. Blue, green and yellow shading indicate significantly changed isoform proportions identified in treatment, tissue or genotype comparisons, respectively. Targeted sequences within the transcript and their locations are presented within brackets [ ] in the “Target site” column.
Columns: miRNA; No.; Gene; Transcript; Target (number of predicted target sites); Target site [position: sequence]; AS.diff (Treatment, Tissue, Genotype).

ath-miR156
  1. AT1G35515 (MYB8/HOS10)
     TCONS_00008685: 0
     TCONS_00008686: 1; ath-miR156 [798-817: CUUCUCUCUCUCUUCUCUCA]
ath-miR172
  2. AT3G54990 (SMZ)
     TCONS_00085221: 1; ath-miR172 [965-985: UUGCAGCAUCAUCAGGAUUCC]
     TCONS_00085222: 0
ath-miR414
  3. AT3G43600 (AAO2)
     TCONS_00082086: 2; ath-miR414 [4777-4797: UGAUGAUGAUGAUGAAGAUGC]; ath-miR5021 [5090-5109: UUUUCUUCUUCUUCUUCUUC]
     TCONS_00082087: 0
  4. AT2G23140 (PUB4)
     TCONS_00051815: 1; ath-miR414 [2818-2838: GGAUGAUGAUGAUGAUGAUGA]
     TCONS_00051822: 1; ath-miR414 [2877-2897: GGAUGAUGAUGAUGAUGAUGA]
     TCONS_00051834: 0
  5. AT5G47040 (LON2/APEM10)
     TCONS_00137229: 0
     TCONS_00137255: 1; ath-miR414 [916-939: UGACAACGAUGAUGAUGAAGAUGA]
ath-miR415
  6. AT4G12560 (CPR1/CPR30)
     TCONS_00090637: 1; ath-miR415 [1441-1461: CUUUUCUGUCUCUGCUCUGUU]
     TCONS_00090640: 0
     TCONS_00090641: 1; ath-miR415 [1396-1416: CUUUUCUGUCUCUGCUCUGUU]
ath-miR472
  7. AT5G43730 (RSG2)
     TCONS_00119400: 0
     TCONS_00119402: 1; ath-miR472 [1313-1334: GGUAUGGGGGGAAUAGGAAAAA]
     TCONS_00119403: 1; ath-miR472 [1313-1334: GGUAUGGGGGGAAUAGGAAAAA]
     TCONS_00119404: 0
  8. AT1G15885
     TCONS_00003955: 0
     TCONS_00003956: 1; ath-miR472 [1191-1212: GGUAUGGGGGGAGUAGGUAAAA]
ath-miR5021
  9. AT4G19670
     TCONS_00103441: 1; ath-miR5021 [47-67: UUCUUCUUCUUCUUCUUCUCU]
     TCONS_00103442: 1; ath-miR5021 [47-67: UUCUUCUUCUUCUUCUUCUCU]
     TCONS_00103445: 0
     TCONS_00103446: 1; ath-miR5021 [44-64: UUCUUCUUCUUCUUCUUCUCU]
  10. AT2G27030 (CAM5)
     TCONS_00042668: 0
     TCONS_00042669: 0
     TCONS_00042670: 1; ath-miR5021 [24-44: UUCUUCUUCUUCUUCUUCUCU]
  11. AT2G46340 (SPA1)
     TCONS_00059096: 0
     TCONS_00059100: 1; ath-miR5021 [128-147: CUUUCUUCUUCUUCUUCUUC]
  12. AT3G02740
     TCONS_00060511: 1; ath-miR5021 [114-133: UUUUCUUCUUCUUCUUCUUU]
     TCONS_00060512: 0
  13. AT3G05180
     TCONS_00074206: 1; ath-miR5021 [1620-1640: UUCUUCUUCUUCUUCUUCUCU]
     TCONS_00074207: 0
  14. AT3G44610
     TCONS_00082295: 1; ath-miR5021 [18-38: UUCUUCUUCUUCUUCUUCUCU]
     TCONS_00082304: 0
  15. AT4G13930 (SHM4)
     TCONS_00101323: 1; ath-miR5021 [2480-2500: UUUAUCUUCUUCUUCUUCUCC]
     TCONS_00101325: 0
  16. AT5G60860 (RABA1F)
     TCONS_00124016: 1; ath-miR5021 [57-76: UUUUUUUGUUCUUCUUCUCA]
     TCONS_00124018: 0
ath-miR5641
  17. AT5G55530
     TCONS_00122678: 0
     TCONS_00122680: 0
     TCONS_00122682: 0
     TCONS_00122683: 1; ath-miR5641 [60-80: UCUUUCUAUCAUCUUCUUACA]
  18. AT5G44650 (CEST/Y3IP1)
     TCONS_00119648: 1; ath-miR5641 [1180-1200: UUAUUUAAUCAUCUUCUUCCU]
     TCONS_00119653: 1; ath-miR5641 [1180-1200: UUAUUUAAUCAUCUUCUUCCU]
     TCONS_00119656: 0
ath-miR5658
  19. AT5G64330 (NPH3/RPT3)
     TCONS_00124898: 0
     TCONS_00124899: 0
     TCONS_00124902: 1; ath-miR5658 [3646-3666: AAUCAUCAUCAUAAUCAUCAU]
  20. AT4G29230 (NAC75)
     TCONS_00095194: 2; ath-miR5658 [1487-1507: GAUCAUCACCAUCAUCAUCAU]; ath-miR5658 [1981-2001: GCUCAUCGUCAUCAUCAUCAU]
     TCONS_00095197: 1; ath-miR5658 [1078-1098: GAUCAUCACCAUCAUCAUCAU]
ath-miR8177
  21. AT1G18880 (NPF2.9)
     TCONS_00004939: 0
     TCONS_00004941: 1; ath-miR8177 [3771-3791: UA-GAGUGACACAUCAUCACAA]

AS.diff columns:
  Treatment (a-f): shoot_WT, shoot_jaz2, shoot_jaz7, root_WT, root_jaz2, root_jaz7.
  Tissue (a-f): No MeJA_WT, No MeJA_jaz2, No MeJA_jaz7, 10 µM MeJA_WT, 10 µM MeJA_jaz2, 10 µM MeJA_jaz7.
  Genotype (a-l): jaz2-WT (shoot_no MeJA), jaz2-WT (shoot_10 µM MeJA), jaz2-WT (root_no MeJA), jaz2-WT (root_10 µM MeJA), jaz7-WT (shoot_no MeJA), jaz7-WT (shoot_10 µM MeJA), jaz7-WT (root_no MeJA), jaz7-WT (root_10 µM MeJA), jaz2-jaz7 (shoot_no MeJA), jaz2-jaz7 (shoot_10 µM MeJA), jaz2-jaz7 (root_no MeJA), jaz2-jaz7 (root_10 µM MeJA).
Figure 3-10. miRNA regulation and expression profiles of SMZ, AAO2 and At3g02740. A) Gene structure of the transcript isoforms of SMZ, AAO2 and At3g02740. Angled lines indicate introns; thin black boxes indicate UTRs; blue/green boxes indicate CDS. Red bars indicate miRNA targeting position. B) Complementarity of miRNAs and their target sites on the transcripts. C) Expression profiles of SMZ, AAO2 and At3g02740. Isoforms subjected to miRNA regulation are indicated in red. Error bars indicate standard deviation. Pie charts under each condition indicate the proportion of each AS isoform present relative to the total expression from the locus. Black lines under pie charts indicate significantly changed isoform proportions.
Figure 3-11. Cases in which AS does not generate multiple protein products. A) Distribution of AS events occurring within the CDS or UTRs. B) Distribution of 3’-UTR lengths of all identified transcripts. C) Distribution of the distance between the stop codon and the last exon junction when the stop codon is not within the last exon of the transcript.
Figure 3-12. AS genes with different domain structures predicted between transcript isoforms. Heatmaps indicate up- or down-regulated transcript expression in response to MeJA treatment in six conditions (SW: shoot of WT; S2: shoot of jaz2; S7: shoot of jaz7; RW: root of WT; R2: root of jaz2; R7: root of jaz7).
Columns: No.; Category; Gene; Transcript; Domain (Interval, E-value)*. Heatmap columns: SW, S2, S7, RW, R2, R7.

1. MADS box; AT5G62165 (AGL42/FYF)
   TCONS_00124363/5/8: MADS-MEF2-like (3-78, 7.2e-44); K-box (84-170, 1.2e-22)
   TCONS_00124364/6: MADS-MEF2-like (3-78, 2.6e-44); K-box (84-156, 1.4e-14)
   TCONS_00124367: MADS-MEF2-like ( - ); K-box (56-117, 1.6e-18)
2. MADS box; AT1G77080 (FLM)
   TCONS_00016233: MADS (2-71, 1.8e-25); K-box (95-163, 2.6e-10)
   TCONS_00016236: MADS ( - ); K-box (86-126, 5.2e-08)
3. R2R3-MYB; AT5G59780 (MYB59)
   TCONS_00140881: MYB_DNA-binding (10-57, 7.5e-14); MYB_DNA-binding (63-108, 1.2e-14)
   TCONS_00140880: MYB_DNA-binding ( - ); MYB_DNA-binding (42-87, 3.1e-15)
   TCONS_00140882: MYB_DNA-binding ( - ); MYB_DNA-binding (2-43, 5.1e-14)
4. R2R3-MYB; AT3G46130 (MYB48)
   TCONS_00068443: MYB_DNA-binding (9-56, 3.8e-14); MYB_DNA-binding (62-107, 1.8e-14)
   TCONS_00068438: MYB_DNA-binding ( - ); MYB_DNA-binding (2-43, 1.3e-13)
   TCONS_00068444: MYB_DNA-binding ( - ); MYB_DNA-binding (32-77, 9.3e-15)
5. R2R3-MYB; AT5G61420 (MYB28)
   TCONS_00141362: MYB_DNA-binding (14-61, 5.1e-16); MYB_DNA-binding (67-112, 4.4e-14)
   TCONS_00141363: MYB_DNA-binding ( - ); MYB_DNA-binding (2-33, 1.7e-08)
6. R2R3-MYB; AT3G23250 (MYB15)
   TCONS_00066032: MYB_DNA-binding (14-61, 5.2e-15); MYB_DNA-binding (67-112, 2.2e-14)
   TCONS_00066034: MYB_DNA-binding ( - ); MYB_DNA-binding (10-55, 1.9e-15)
7. R2R3-MYB; AT1G18710 (MYB47)
   TCONS_00004892: MYB_DNA-binding (14-61, 2.2e-12); MYB_DNA-binding (67-112, 4.8e-13)
   TCONS_00004893: MYB_DNA-binding ( - ); MYB_DNA-binding (2-21, 2.2e-03)
8. 3R-MYB; AT4G32730 (MYB3R1)
   TCONS_00096267: MYB_DNA-binding (35-81, 1.0e-15; 87-133, 6.2e-20; 139-182, 5.4e-14); Repression_motif[1] (849-920, NA)
   TCONS_00096266: MYB_DNA-binding (35-81, 1.7e-15; 87-133, 1.2e-19; 139-182, 8.8e-14); Repression_motif[1] ( - )
9. 3R-MYB; AT5G11510 (MYB3R4)
   TCONS_00112857: MYB_DNA-binding (29-75, 7.5e-17; 81-127, 1.1e-19; 133-176, 3.8e-12); Repression_motif[1] (843-908, NA)
   TCONS_00112860: MYB_DNA-binding (29-75, 1.2e-16; 87-127, 1.7e-19; 133-176, 5.3e-12); Repression_motif[1] ( - )
10. G2-like; AT5G29000 (BHL1)
   TCONS_00134035/6: myb_SHAQKYF (188-242, 8.9e-21); Myb_CC_LHEQLE (276-321, 4.4e-23)
   TCONS_00134034: myb_SHAQKYF (231-285, 8.3e-21); Myb_CC_LHEQLE (319-364, 9.5e-24)
   TCONS_00134043: myb_SHAQKYF (231-285, 7.8e-22); Myb_CC_LHEQLE ( - )
11. G2-like; AT2G01060
   TCONS_00048235: myb_SHAQKYF (15-71, 2.2e-25); Myb_CC_LHEQLE (101-146, 5.7e-28)
   TCONS_00048234: myb_SHAQKYF ( - ); Myb_CC_LHEQLE (52-97, 4.7e-28)
12. G2-like; AT1G79430 (APL)
   TCONS_00036895: myb_SHAQKYF (34-90, 2.5e-23); Myb_CC_LHEQLE (130-167, 6.0e-23)
   TCONS_00036894: myb_SHAQKYF ( - ); Myb_CC_LHEQLE (65-102, 1.8e-22)
13. G2-like; AT5G06800
   TCONS_00111659: myb_SHAQKYF (192-248, 9.6e-21); Myb_CC_LHEQLE (276-321, 2.3e-23)
   TCONS_00111656: myb_SHAQKYF (192-248, 1.2e-21); Myb_CC_LHEQLE ( - )
14. C3H; AT5G06770
   TCONS_00127440: zf-CCCH (38-61, 1.8e-04); KH_1 (115-177, 2.9e-12); ZnF_C3H1 (205-231, 2.6e-08)
   TCONS_00127443: zf-CCCH ( - ); KH_1 (43-105, 1.1e-11); ZnF_C3H1 (133-159, 2.2e-09)
15. FHY3/FAR1; AT2G27110 (FRS3)
   TCONS_00053227/9: FAR1 super family (64-137, 1.66e-09); MULE (251-339, 2.2e-19); ZnF_PMZ (538-562, 1.6e-07); Commd super family (777-837, 2.8e-03)
   TCONS_00053228: FAR1 super family ( - ); MULE (106-194, 2.2e-19); ZnF_PMZ (393-417, 1.2e-07); Commd super family (632-692, 2.3e-03)
16. AP2; AT2G28550 (TOE1)
   TCONS_00053546: AP2 (153-215, 1.5e-14); AP2 (245-308, 7.9e-25)
   TCONS_00053545: AP2 (153-215, 1.4e-14); AP2 ( - )
17. bHLH; AT1G71200 (bHLH160)
   TCONS_00034498/502: HLH super family (61-119, 2.6e-03)
   TCONS_00034499: HLH super family ( - )
18. bZIP; AT1G32150 (bZIP68)
   TCONS_00007957: MFMR (1-107, 1.0e-35); MFMR_assoc (115-266, 3.2e-23); bZIP_plant_GBF1 (298-348, 5.3e-25); BAR super family (340-385, 9.7e-03)
   TCONS_00007970: MFMR ( - ); MFMR_assoc (16-167, 4.0e-24); bZIP_plant_GBF1 (199-249, 4.9e-23); BAR super family (241-286, 6.3e-03)
19. bZIP; AT2G21230 (bZIP30)
   TCONS_00051442: bZIP_plant_RF2 (373-422, 8.3e-22); Dzip-like_N super family (416-462, 3.2e-03)
   TCONS_00051446: bZIP_plant_RF2 (373-422, 8.6e-23); Dzip-like_N super family ( - )
20. SR splicing factor; AT5G52040 (RS41)
   TCONS_00121748: RRM1_AtRSp31_like (2-73, 2.1e-37); RRM2_AtRSp31_like (97-166, 6.3e-40); Rubella_Capsid super family (237-327, 8.4e-04)
   TCONS_00121746: RRM1_AtRSp31_like (2-73, 1.5e-37); RRM2_AtRSp31_like (97-166, 6.2e-40); Rubella_Capsid super family (237-327, 8.7e-04)
   TCONS_00121747: RRM1_AtRSp31_like ( - ); RRM2_AtRSp31_like (64-133, 2.4e-40); Rubella_Capsid super family (204-294, 2.0e-03)
   TCONS_00121756: RRM1_AtRSp31_like ( - ); RRM2_AtRSp31_like (64-133, 2.5e-40); Rubella_Capsid super family (204-294, 1.8e-03)
21. SR splicing factor; AT4G25500 (RS40)
   TCONS_00094149/51: RRM1_AtRSp31_like (2-73, 4.3e-37); RRM_SF super family (98-167, 2.4e-38); DUF2722 super family (210-315, 7.6e-03)
   TCONS_00094150: RRM1_AtRSp31_like ( - ); RRM_SF super family (65-134, 1.1e-38); DUF2722 super family ( - )
   TCONS_00094152: RRM1_AtRSp31_like ( - ); RRM_SF super family (57-126, 8.6e-39); DUF2722 super family ( - )
22. SR splicing factor; AT2G37340 (RS2Z33)
   TCONS_00056283: RRM (2-85, 1.8e-09); Zf-CCHC (99-115, 1.9e-04); Zf-CCHC (121-138, 4.8e-04)
   TCONS_00056285: RRM (3-44, 2.3e-03); Zf-CCHC (58-74, 3.5e-04); Zf-CCHC (80-97, 8.6e-04)
   TCONS_00056284: RRM ( - ); Zf-CCHC (69-85, 4.5e-04); Zf-CCHC (91-108, 1.1e-03)
23. SR splicing factor; AT3G53500 (RSZ32)
   TCONS_00084632: RRM (2-92, 2.7e-08); Zf-CCHC (99-115, 1.8e-04); Zf-CCHC (121-138, 4.7e-04)
   TCONS_00084631: RRM ( - ); Zf-CCHC (58-74, 3.5e-04); Zf-CCHC (80-97, 8.8e-04)
24. hnRNP splicing factor; AT5G66010
   TCONS_00125380: RRM_hnRNPH_ESRPs_RBM12_like (50-119, 3.1e-23); RRM_hnRNPH_ESRPs_RBM12_like (165-239, 4.9e-28)
   TCONS_00125379: RRM_hnRNPH_ESRPs_RBM12_like ( - ); RRM_hnRNPH_ESRPs_RBM12_like (70-142, 2.4e-29)
25. RNA-binding protein; AT5G54900 (RBP45A)
   TCONS_00122488: RRM1_SECp43_like (61-138, 2.8e-43); RRM2_SECp43_like (153-232, 6.0e-50); RRM3_NGR1_NAM8_like (259-330, 4.4e-41)
   TCONS_00122487: RRM1_SECp43_like (61-138, 1.2e-44); RRM2_SECp43_like (153-222, 7.3e-44); RRM3_NGR1_NAM8_like ( - )
   TCONS_00122484: RRM1_SECp43_like (61-138, 2.3e-44); RRM2_SECp43_like (153-232, 5.4e-51); RRM3_NGR1_NAM8_like ( - )
26. RNA-binding protein; AT5G53180 (PTB2)
   TCONS_00139168: RRM1_PTBPH1_PTBPH2 (16-96, 3.5e-57); RRM2_PTBPH1_PTBPH2 (110-204, 1.4e-58); RRM3_PTBPH1_PTBPH2 (243-339, 2.0e-64)
   TCONS_00139169: RRM1_PTBPH1_PTBPH2 ( - ); RRM2_PTBPH1_PTBPH2 ( - ); RRM3_PTBPH1_PTBPH2 (143-239, 2.1e-66)
27. ER body protein; AT3G15950 (NAI2)
   TCONS_00077879: Smc (117-457, 2.2e-06); PRK12472 (609-751, 1.7e-03)
   TCONS_00077880: Smc (134-469, 7.5e-07); PRK12472 (571-713, 1.5e-03)
   TCONS_00077870: MDN1 (70-350, 1.34e-04); PRK12472 (417-559, 6.7e-04)
28. RNA helicase; AT5G11170 (UAP56A)
   TCONS_00112713: DEADc (48-250, 5.4e-88); HELICc (264-391, 2.0e-38)
   TCONS_00112720: DEADc ( - ); HELICc (181-308, 8.5e-39)
29. Translation elongation factor; AT2G45030 (EFG/EF2)
   TCONS_00047402: EF-G (67-339, 2.0e-179); mtEFG1_II_like (366-446, 1.2e-48); EFG_III (459-534, 2.6e-40); EFG_mtEFG1_IV (537-654, 8.2e-63); mtEFG1_C (659-736, 7.5e-45)
   TCONS_00047388: EF-G (67-304, 1.5e-151); mtEFG1_II_like ( - ); EFG_III ( - ); EFG_mtEFG1_IV ( - ); mtEFG1_C ( - )
30. Thylakoid protein; AT5G44650 (CEST)
   TCONS_00119648: Transmembrane region[2] (254-276, NA)
   TCONS_00119656: Transmembrane region[2] ( - )

* Domain prediction information from NCBI Conserved Domain Database (Marchler-Bauer et al., 2017). [1] Motif information from Feng et al., 2017. [2] Transmembrane prediction from SMART (Simple Modular Architecture Research Tool) (Letunic et al., 2015).

Heatmap columns: WT_Shoot, J2_Shoot, J7_Shoot, WT_Root, J2_Root, J7_Root. Color key: -5 to 5. Heatmap row labels also include TCONS_00014988 and TCONS_00014999.
Figure 3-13. Arabidopsis bHLH160 AS patterns and the proposed regulatory function of the splice variant. A) Gene structure of the transcript isoforms of Arabidopsis bHLH160. B) Protein 3D structure of the Arabidopsis bHLH160 proteins adapted from Ma et al. 1994 and their regulatory roles. C) Expression profile of Arabidopsis bHLH160. Error bars indicate the standard deviation. Pie charts under each condition indicate the proportion of each AS isoform present relative to the total expression from the locus. Red lines above the barplot indicate significantly changed gene expression. D) Gene structure of the transcript isoforms of Camelina sativa LOC104712692, LOC104701783 and LOC104750970. Angled lines indicate introns; black thin boxes indicate UTRs; thick boxes indicate CDS; orange, green and purple indicate basic, helix and loop respectively in the bHLH domain. E) Multiple sequence alignments of the protein isoforms from Arabidopsis thaliana bHLH160, Camelina sativa LOC104712692, LOC104701783 and LOC104750970. The bHLH domain is indicated within black boxes.
Figure 3-14. Mapped reads from the MeJA project supporting the AS junction in Arabidopsis bHLH160b-.
Figure 3-15. Proteomics validated AS isoform expression. Gene structure was displayed by Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn, Hu et al. 2015): thick black line -- UTR; blue box -- exon; angled line -- intron; red angled line -- intron splitting the junction of the transcript isoform supported by proteomics data; black/gray line under blue box -- peptide mapped region. Shaded isoform name indicates primary annotated isoform from TAIR10 annotation.
Figure 3-16. Differential miRNA regulation of SPL4 splice variants. A) Gene structure of SPL4 AS isoforms with the predicted miRNA binding site indicated. B) Complementarity of the miRNA with the predicted SPL4 target site.
Table 3-1. RNA-seq library and mapping information.
Samples # Raw reads # Mapped reads Percentage
WT_shoot_0_rep1 21,128,469 17,025,286 80.6%
WT_shoot_0_rep2 14,186,832 8,909,077 62.8%
WT_shoot_0_rep3 23,486,963 8,769,393 37.3%
WT_shoot_10_rep1 13,657,112 10,920,597 80.0%
WT_shoot_10_rep2 16,134,503 12,685,395 78.6%
WT_shoot_10_rep3 13,510,226 10,641,409 78.8%
WT_root_0_rep1 15,241,432 12,494,869 82.0%
WT_root_0_rep2 40,886,447 32,315,025 79.0%
WT_root_0_rep3 22,960,991 18,992,644 82.7%
WT_root_10_rep1 15,351,676 13,060,948 85.1%
WT_root_10_rep2 17,683,061 14,960,418 84.6%
WT_root_10_rep3 15,102,434 12,796,169 84.7%
jaz2_shoot_0_rep1 15,032,167 10,950,279 72.8%
jaz2_shoot_0_rep2 11,958,237 9,261,665 77.5%
jaz2_shoot_0_rep3 9,277,191 5,794,838 62.5%
jaz2_shoot_10_rep1 11,154,705 8,481,602 76.0%
jaz2_shoot_10_rep2 11,431,445 8,887,788 77.7%
jaz2_shoot_10_rep3 11,715,590 9,311,499 79.5%
jaz2_root_0_rep1 15,850,717 13,107,489 82.7%
jaz2_root_0_rep2 17,974,716 15,159,325 84.3%
jaz2_root_0_rep3 17,514,424 14,196,519 81.1%
jaz2_root_10_rep1 16,661,319 13,609,917 81.7%
jaz2_root_10_rep2 23,840,265 19,031,731 79.8%
jaz2_root_10_rep3 18,466,668 15,379,759 83.3%
jaz7_shoot_0_rep1 10,507,564 8,446,985 80.4%
jaz7_shoot_0_rep2 10,564,614 8,097,926 76.7%
jaz7_shoot_0_rep3 n.a. n.a. n.a.
jaz7_shoot_10_rep1 12,594,778 9,115,023 72.4%
jaz7_shoot_10_rep2 17,867,067 13,564,544 75.9%
jaz7_shoot_10_rep3 15,161,488 11,884,760 78.4%
jaz7_root_0_rep1 17,388,880 14,375,611 82.7%
jaz7_root_0_rep2 17,181,387 14,247,505 82.9%
jaz7_root_0_rep3 15,300,499 13,159,910 86.0%
jaz7_root_10_rep1 16,636,604 13,859,228 83.3%
jaz7_root_10_rep2 13,528,403 11,442,447 84.6%
jaz7_root_10_rep3 20,662,422 17,233,780 83.4%
Total 577,601,296 452,171,360 78.3%
Table 3-2. Gene isoform number in TAIR10 and determined by assembly of the MeJA RNA-Seq data.
isoform # TAIR10 (gene # percentage) MeJA project (gene # percentage)
1 21402 78.67% 9201 67.42%
2 4251 15.63% 2998 21.97%
3 1133 4.17% 881 6.46%
4 291 1.07% 345 2.53%
5 89 0.33% 115 0.84%
6 26 0.10% 65 0.48%
7 7 0.03% 21 0.15%
8 5 0.02% 11 0.08%
9 1 0.00% 6 0.04%
10 1 0.00% 0 0.00%
11 0 0.00% 1 0.01%
12 0 0.00% 2 0.02%
13 0 0.00% 0 0.00%
14 0 0.00% 0 0.00%
15 0 0.00% 1 0.01%
Total 27206 100.00% 13647 100.00%
Table 3-3. Differential AS or expression of genes undergoing AS in treatment comparisons.
genotype tissue differential AS (splicing.diff promoter.diff cds.diff total) %a differential expression %a
WT shoot 44 3 6 48 0.99 537 12.08
jaz2 shoot 32 5 8 37 0.72 586 13.18
jaz7 shoot 234 90 72 326 5.26 1400 31.49
WT root 49 25 15 75 1.10 886 19.93
jaz2 root 32 14 9 48 0.72 557 12.50
jaz7 root 43 22 13 67 0.97 1131 25.44
Table 3-4. Differential AS or expression of genes undergoing AS in tissue comparisons.
genotype MeJA treatment differential AS (splicing.diff promoter.diff cds.diff total) %a differential expression %a
WT 0μM 159 86 41 249 3.58 1943 43.70
jaz2 0μM 162 75 37 243 3.64 1896 42.65
jaz7 0μM 246 136 88 389 5.53 2049 46.09
WT 10μM 267 165 84 429 6.00 2300 51.73
jaz2 10μM 140 84 53 230 3.15 1885 42.40
jaz7 10μM 259 111 68 372 5.83 2389 53.73
Table 3-5. Differential AS or expression of genes undergoing AS between mutant backgrounds.
tissue MeJA treatment differential AS (splicing.diff promoter.diff cds.diff total) %a differential expression %a
shoot 0μM 84 6 25 90 2.05 380 8.55
root 0μM 40 14 10 53 0.90 718 16.15
shoot 10μM 205 58 77 231 4.61 1096 24.65
root 10μM 76 10 22 79 1.71 670 15.07
a Percentage was calculated by dividing total changes by the total number of AS genes (4,446) detected.
CHAPTER 4
ORIGIN AND EVOLUTION OF THE TIFY PLANT-SPECIFIC MULTI-DOMAIN GENE FAMILY
Background
During genome evolution, new arrangements of domains are
formed, which provide additional resources for natural selection. Novel domain
arrangements that are beneficial to the species are likely to be fixed in the population
(Bornberg-Bauer and Albà 2013), and continuous change in domain arrangement is
one of the main forces shaping the evolution of species. It has been estimated that more than
70% of eukaryotic genes code for multi-domain proteins (Han et al. 2007). However, our
understanding of the origin and evolution of multi-domain gene families is limited.
TIFY is an excellent example of a multi-domain gene family.
The TIFY family is a plant-specific gene family defined by a highly conserved
domain (TIFY), which is named after the core amino acid motif TIF[F/Y]XG (Vanholme
et al. 2007). The TIFY domain is about 36 amino acids long and forms a beta-beta-
alpha fold that mediates protein-protein interaction between TIFY proteins as well as
with other transcription factors (Vanholme et al. 2007; Chung et al. 2009). Proteins
within the TIFY family can be further divided into four subfamilies based on domain
architecture (Vanholme et al. 2007; Chung et al. 2009; Bai et al. 2011): 1) ZML
subfamily, with TIFY, CCT and GATA domains; 2) JAZ subfamily, with TIFY and Jas
domains; 3) PPD subfamily, with TIFY, PPD and Jas-like domains; and 4) TIFY
subfamily, with only the TIFY domain. Because TIFY is a plant-specific gene family, it is
expected to be involved in plant-specific functions. Members of the JAZ subfamily act as
transcriptional repressors in the first step of the jasmonate signaling pathway (Chini et al.
2016). The Arabidopsis PPD genes regulate lamina size and
leaf curvature (White 2006). The ZML genes are transcription factors involved in
responses to high-light (Shaikhali et al. 2012) and wound-induced lignification (Vélez-
Bermúdez et al. 2015). Understanding the origin and evolution of the TIFY plant-specific
multi-domain gene family would provide insight into the evolution of its domain
architecture and the relationship between its domain architecture and the plant specific
functions these genes are involved in.
In this project I studied the evolution of the domains contained within TIFY
gene family members, the evolutionary history of each TIFY subfamily, and the
domain gain/loss dynamics during gene family evolution. I further analyzed differences
of the TIFY domain among the four subfamilies and characterized AS events and their
conservation within the PPD subfamily.
Materials and Methods
Identification of Members in the TIFY Gene Family
I collected protein annotations from 76 plant species and generated a database
including only the primary protein sequences. For species with no primary protein
annotation, the longest protein isoform of each locus was used as the primary protein.
Profile HMM searches against this primary protein database with the Pfam profile HMM
database were conducted with HMMER v3.1b2 (Eddy 2011). The PPD, SRT and Jas-
like domains are not available in the Pfam database, so profile HMMs for these domains
were built from the domain alignments with HMMER v3.1b2 and used to
identify proteins containing them. The PPD and Jas-like domains were identified from
previous studies (White 2006; Thireault et al. 2015). The SRT domain was identified by
MEME in this study. For each domain, the identification criterion was a sequence E-value of ≤ 1.0E-5
in the HMMER search. Proteins containing TIFY, CCT, GATA, Jas
(CCT_2 in Pfam), Jas-like, PPD, or SRT domains were kept for phylogenetic analysis. Clusters
that did not contain any protein with a TIFY domain were removed, as were proteins
with abnormal alignments. The remaining proteins were regarded as candidate
members in the TIFY family.
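The E-value criterion above can be illustrated with a minimal Python sketch. It assumes the searches were run with hmmsearch and written with the `--tblout` option, where the first column is the target protein, the third column is the query profile and the fifth column is the full-sequence E-value; the protein names, profile names and scores in the example are hypothetical, and this is not necessarily the script used in this study.

```python
from typing import Dict, Iterable, Set

E_VALUE_CUTOFF = 1.0e-5  # threshold stated in the text


def proteins_by_domain(tblout_lines: Iterable[str],
                       cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Set[str]]:
    """Map each profile HMM name to the proteins hit at or below the cutoff.

    Assumes hmmsearch --tblout layout: whitespace-separated columns, with
    column 1 = target (protein), column 3 = query (profile HMM) and
    column 5 = full-sequence E-value. Comment lines start with '#'.
    """
    hits: Dict[str, Set[str]] = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()
        protein, domain, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= cutoff:
            hits.setdefault(domain, set()).add(protein)
    return hits


# Hypothetical rows for illustration only (invented identifiers and scores).
example = """\
# target name  accession  query name  accession  E-value  score  bias
protA  -  TIFY  PF06200  2.1e-20  70.1  0.3
protA  -  Jas   PF09425  4.0e-09  33.2  0.1
protB  -  TIFY  PF06200  3.0e-04  12.5  0.0
""".splitlines()

hits = proteins_by_domain(example)
print(hits)
```

In this toy input, protB falls above the cutoff and is therefore not retained as a TIFY-containing protein.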
Multiple Sequence Alignment and Phylogeny
Multiple sequence alignments of each subfamily (TIFY, PPD, ZML, JAZ)
were generated from full-length protein sequences using Muscle v3.8.31 (Edgar
2004) with default parameters. These alignments were used to infer maximum likelihood (ML) phylogenetic
trees with RAxML v8.2.3 (Stamatakis 2014) under the LG4X model (Le et al. 2012). For
each subfamily, I performed five tree searches and selected the tree with the highest likelihood
as the ML tree. I ran 500 bootstrap replicates for the PPD and
TIFY subfamily datasets, and 200 bootstrap replicates for the ZML and JAZ
subfamily datasets, with ML under the LG4X model (Le et al. 2012) using RAxML v8.2.3
(Stamatakis 2014).
Domain Identification
The TIFY, CCT, GATA, and Jas domains in the TIFY family were identified with
HMMER v3.1b2 using the Pfam domain profile HMM database (TIFY - PF06200, CCT -
PF06203, GATA - PF00320, Jas - PF09425) (Finn et al. 2014). The PPD and Jas-like
domains were identified by HMMER v3.1b2 with domain profile HMMs generated from
alignments of domain annotations from previous studies (White 2006; Thireault et al.
2015). In addition, MEME v4.12.0 (Bailey et al. 2006) was used to search for other
domains present in addition to the TIFY domain in the TIFY subfamily. Domain
sequence logos were generated with the Berkeley WebLogo server
(http://weblogo.berkeley.edu/logo.cgi).
Rate-Shift Analysis of the TIFY Domain
We compared amino acid substitution rate differences in the TIFY domain
between the four subfamilies ZML, JAZ, PPD and TIFY using an adapted rate-shift
detection method (Gaucher et al. 2011). First, the amino acid substitution rates at each
site of the TIFY domain for the ZML, JAZ, PPD and TIFY subfamilies were estimated using
PAML v4.9a (Yang 2007) with Γ-distributed rate variation among sites under the LG
model (Le and Gascuel 2008). Second, the average amino acid substitution rate at
each site of the TIFY domain across the four subfamilies was calculated. Third, the amino
acid substitution rate of the TIFY domain of each subfamily was compared to the
average amino acid substitution rate at each site. Large positive and negative values of
the rate difference indicate rate shifts and identify variable (evolving) and conserved
sites, respectively, in the corresponding subfamily relative to the whole family. A cutoff of 2.57 standard
deviations from the mean was used to detect sites with significant rate shifts.
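The second and third steps, together with the cutoff, can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-site rates are taken as already estimated (for example by PAML), the mean and standard deviation of the rate differences are computed across the sites of each subfamily, and the toy rates below are invented. A cutoff of 2.57 standard deviations roughly corresponds to the two-sided 1% tail of a normal distribution.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

CUTOFF_SD = 2.57  # cutoff stated in the text


def rate_shift_sites(rates: Dict[str, List[float]],
                     cutoff_sd: float = CUTOFF_SD
                     ) -> Dict[str, List[Tuple[int, float]]]:
    """Return, per subfamily, the (1-based site, rate difference) pairs whose
    difference from the family-wide site average is an outlier.

    Assumption (not stated in the source): the mean and standard deviation
    of the differences are taken over the sites of each subfamily.
    """
    n_sites = len(next(iter(rates.values())))
    # Step 2: average rate at each site across all subfamilies.
    site_mean = [mean(r[i] for r in rates.values()) for i in range(n_sites)]
    flagged: Dict[str, List[Tuple[int, float]]] = {}
    for name, r in rates.items():
        # Step 3: difference between the subfamily rate and the site average.
        diff = [r[i] - site_mean[i] for i in range(n_sites)]
        mu, sd = mean(diff), pstdev(diff)
        flagged[name] = [
            (i + 1, d) for i, d in enumerate(diff)
            if sd > 0 and abs(d - mu) > cutoff_sd * sd
        ]
    return flagged


# Toy rates for a 36-site domain (invented values): one fast site in "PPD".
toy = {name: [1.0] * 36 for name in ("ZML", "JAZ", "PPD", "TIFY")}
toy["PPD"][23] = 5.0  # site 24 evolves faster in this subfamily

shifts = rate_shift_sites(toy)
print(shifts["PPD"])
```

A positive difference marks a site that is more variable in that subfamily than in the family as a whole; a negative difference marks a site that is more conserved.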
Alternative Splicing Analysis
We identified AS in the PPD subfamily genes from 7 species (A. trichopoda, E.
guineensis, M. acuminata, A. thaliana, P. trichocarpa, V. vinifera, S. lycopersicum) with
a specific focus on the intron interrupting the Jas-like domain (Jas-like intron). Transcript
AS data in A. trichopoda, E. guineensis and M. acuminata was acquired from Mei et al.
(2017); transcript AS data in P. trichocarpa, V. vinifera, S. lycopersicum was extracted
from Chamala et al. (2015); and the AS data in A. thaliana was obtained from Araport11
annotation (Cheng et al. 2016). Among the 14 PPD genes in the 7 species, 7 genes
from 4 species (A. thaliana, P. trichocarpa, V. vinifera, S. lycopersicum) have evidence
of AS in the Jas-like intron. The primary
gene structure of the PPD genes from these four species was illustrated using Gene
Structure Display Server 2.0 (Hu et al. 2015) with the observed AS patterns in the Jas-
like intron indicated.
Results
Domain Identification and Evolutionary History
The TIFY family is defined by the presence of the TIFY domain. Four subfamilies
(ZML, JAZ, PPD, TIFY) with different domain arrangement have been identified (Chung
et al. 2009; Bai et al. 2011). We applied a thorough search against all documented
domains in the Pfam database in an attempt to detect the presence of other domains in
addition to the TIFY domain. Other than the domains identified in the ZML, JAZ, PPD
and TIFY subfamilies no new domains were identified in combination with the TIFY
domain. Of the four subfamilies in the TIFY family, the TIFY subfamily is the only subfamily
defined by a single domain, the TIFY domain. To detect whether other
domains/motifs are present in the TIFY subfamily, a motif search was conducted with
MEME and a highly conserved domain/motif was detected. I named the
domain/motif SRT after the three most conserved amino acids in the alignments
(Figure 4-1). Therefore, the TIFY subfamily, like the other three subfamilies, also
contains multiple domains. A total of seven domains were identified in the TIFY family: TIFY
(in all four subfamilies); CCT (in the ZML subfamily); GATA (in the ZML subfamily); Jas
(in the JAZ subfamily); Jas-like (in the PPD subfamily); PPD (in the PPD subfamily); and
SRT (in the TIFY subfamily) (Figure 4-1). The CCT and GATA domains have been
detected in all available genome sequences including algae, moss, fern, gymnosperm
and angiosperms (Figure 4-2). In comparison, the TIFY, Jas, Jas-like and SRT domains
were only observed in land plants (moss, fern and seed plants); and the PPD domain
was only identified in vascular plants (fern and seed plants) (Figure 4-2). Interestingly,
none of the eleven Poales species contains a PPD, Jas-like or SRT domain, which strongly
indicates domain loss. Almost all identified TIFY, Jas/Jas-like, SRT and PPD
domains in the genome are associated with TIFY family genes. The few exceptions
observed were the result of poor alignments and these instances were removed from
analysis. Thus, TIFY, Jas/Jas-like, PPD, SRT domains are likely TIFY-family-specific. In
comparison, only ~12.3% and ~13.8% of the CCT and GATA domains, respectively,
identified in the genome are in the TIFY family.
Gene Family Identification and Evolutionary History
A total of 1373 TIFY family proteins were identified from 76 plant species. Based
on phylogeny and domain arrangement the TIFY family is divided into four subfamilies –
JAZ, ZML, PPD and TIFY (Figure 4-3). The algae genomes do not contain any proteins
belonging to the TIFY family. JAZ, ZML and TIFY proteins containing all domains
defining their respective subfamilies were present in moss, fern, gymnosperms and angiosperms. A JAZ
gene in moss, Phpat.005G044400, which contains the TIFY and Jas-like domains and
lacks the PPD domain, might be the ancestor of PPD genes. PPD proteins
containing all three domains were first identified in fern. The TIFY and PPD
subfamilies are the smallest subfamilies, with an average of 1.5 and 1.7 genes per species,
respectively. Notably, none of the eleven Poales species examined contains PPD or TIFY
subfamily proteins; however, PPD and TIFY subfamily proteins were identified in the
ancient monocot species duckweed and orchid, as well as in the palm and banana
lineages. This suggests that the PPD and TIFY subfamilies were lost in the common
ancestor of grasses during monocot evolution. The JAZ and ZML subfamilies are larger,
with an average of 15.6 and 4.7 genes per genome, respectively, which suggests they went
through gene family expansion during evolution.
Phylogenetic analysis suggests that the ZML subfamily contains three subgroups
(named A, B, and C) and the JAZ subfamily contains five subgroups (named A, B, C, D
and E) (Figure 4-4). The ZML_A subgroup was rooted by one Amborella gene, and the
ZML_B and ZML_C subgroups together were rooted by one Amborella gene. In addition
to the Amborella branch, each of the three ZML subgroups roughly consists of a monocot
and a eudicot cluster. The five JAZ subgroups all contain gymnosperm genes.
Moreover, the JAZ_A subgroup contains fern genes and the JAZ_B, C, D, E subgroups
were rooted by four fern genes. Interestingly, similar to the loss of the PPD and TIFY
subfamilies, the ZML_B subgroup and JAZ_C subgroup were also lost in Poales (Figure
4-5). The PPD and TIFY subfamilies did not undergo gene family expansion prior to the
divergence of the monocot and eudicot lineages.
Domain Dynamics during Evolution
We identified five sites of the TIFY domain with significantly changed amino acid
substitution rates among the four subfamilies (Figure 4-6). Relative to the entire
TIFY family, the first, third and thirty-second sites are more conserved in the PPD, TIFY
and ZML subfamilies, respectively; the twenty-fourth and thirty-sixth sites are more
variable in the PPD and JAZ subfamilies, respectively. Rate-shifted sites could
contribute to functional differences of the TIFY domain among subfamilies. The
ZML subfamily contains a diverged amino acid pattern of the TIFY motif, T[L/I]S[F/V],
whereas the other three subfamilies contain the TIFY pattern.
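As an illustration of the two motif patterns, the bracket notation can be translated into regular expressions. The following Python sketch assumes X stands for any residue; the function name and the test sequences are hypothetical and are not taken from the dataset.

```python
import re

# TIF[F/Y]XG, where X is assumed to be any amino acid.
TIFY_CORE = re.compile(r"TIF[FY].G")
# Diverged ZML pattern T[L/I]S[F/V].
ZML_VARIANT = re.compile(r"T[LI]S[FV]")


def classify_tify_motif(seq: str) -> str:
    """Label a protein (or domain) sequence by the motif variant it carries."""
    if TIFY_CORE.search(seq):
        return "TIFY"
    if ZML_VARIANT.search(seq):
        return "ZML-like"
    return "none"


# Hypothetical sequences for illustration only.
print(classify_tify_motif("MKQLTIFYGGKVHV"))  # canonical core motif
print(classify_tify_motif("AAQLTLSFQGEVYV"))  # diverged ZML-type pattern
print(classify_tify_motif("MAAAAGGG"))        # neither pattern
```

Because the four-residue ZML pattern is short, it can match by chance in unrelated proteins, so a check like this is only meaningful when applied to sequences already located within a TIFY domain.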
In order to track domain gain/loss dynamics in the TIFY multidomain gene family,
the phylogenetic tree describing the PPD, TIFY and ZML subfamilies was annotated
with the domain arrangements (Figure 4-7). I observed 22 instances (26.2%) where one
or more domains in PPD proteins were lost. These instances include 1 case of PPD
domain loss, 8 cases of TIFY domain loss, and 15 cases of Jas-like domain loss; the
per-domain counts sum to more than 22 because some proteins have lost two domains. I did
not observe any domain loss in the TIFY subfamily. Within the ZML
subfamily, 70 genes (23.7%) are missing one or more domains. Among these, 39 genes have lost the TIFY
domain, 9 genes have lost the CCT domain, and 32 genes have lost the GATA domain.
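The tally of domain losses can be sketched in Python as below. The expected architecture of the PPD subfamily is taken from the text; the protein names and their observed domains are invented, and the function is a hypothetical helper rather than the procedure actually used. The example also shows why per-domain counts can exceed the number of affected proteins.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Expected domain architecture of the PPD subfamily, as described in the text.
PPD_EXPECTED = {"PPD", "TIFY", "Jas-like"}


def tally_domain_loss(proteins: Dict[str, Set[str]],
                      expected: Set[str]) -> Tuple[int, Counter]:
    """Count proteins missing at least one expected domain, and count how
    often each individual domain is missing."""
    n_with_loss = 0
    per_domain: Counter = Counter()
    for observed in proteins.values():
        missing = expected - observed
        if missing:
            n_with_loss += 1
            per_domain.update(missing)  # one count per missing domain
    return n_with_loss, per_domain


# Hypothetical proteins for illustration only.
toy_ppd = {
    "geneA": {"PPD", "TIFY", "Jas-like"},  # complete architecture
    "geneB": {"PPD", "TIFY"},              # lost Jas-like
    "geneC": {"PPD"},                      # lost TIFY and Jas-like
    "geneD": {"PPD", "TIFY", "Jas-like"},
}

n_loss, losses = tally_domain_loss(toy_ppd, PPD_EXPECTED)
print(n_loss, dict(losses))
```

In the toy data, two proteins show domain loss but three individual losses are counted, because geneC lacks two domains.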
Alternative Splicing of Jas-like Intron in PPD Genes
In order to assess AS of the Jas-like intron in the PPD subfamily, AS data from 7
species were examined: one basal angiosperm – A. trichopoda; two monocots – E.
guineensis, M. acuminata; and four eudicots – A. thaliana (eurosid II), P. trichocarpa
(eurosid I), V. vinifera (eurosid), S. lycopersicum (asterid). Among the five PPD genes
from the two monocot species, only one gene (Achr2P08870 from M. acuminata)
contains the Jas-like domain and no AS of the Jas-like intron was observed. The single
PPD gene Amtr0002.597 from A. trichopoda contains the Jas-like domain, but there
was no evidence for AS in the Jas-like intron. All eight PPD genes from eudicot
species have the Jas-like domain, and evidence of AS in the Jas-like intron was
observed for seven of them (Figure 4-8). Of the seven AS events, one is an AltD event (AT4G14713
from A. thaliana), two are AltA events (Potri.002G048500 and Potri.005G214300 from P.
trichocarpa), and four are IntronR events (GSVIVT01018038001 and
GSVIVT01003113001 from V. vinifera, Solyc09g065630 and Solyc06g084120 from S.
lycopersicum). Six of the seven AS events introduce a stop codon immediately downstream
of the α-helix of the Jas-like domain and would lead to a truncated Jas-like domain that
lacks seven amino acids at the C-terminus.
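The three event types can be distinguished by comparing the intron boundaries of an alternative isoform with those of the primary isoform. The Python sketch below assumes that AltD, AltA and IntronR denote alternative donor, alternative acceptor and intron retention, respectively, and that introns are given as (donor, acceptor) positions in transcript orientation; the coordinates are hypothetical and the function is not the classification code used by the cited studies.

```python
from typing import Optional, Tuple

Intron = Tuple[int, int]  # (donor position, acceptor position), 5' to 3'


def classify_as_event(reference: Intron,
                      alternative: Optional[Intron]) -> str:
    """Classify how an alternative isoform treats a reference intron.

    `alternative` is None when the isoform has no intron at this position,
    i.e. the reference intron is retained in the transcript.
    """
    if alternative is None:
        return "IntronR"
    ref_donor, ref_acceptor = reference
    alt_donor, alt_acceptor = alternative
    if (alt_donor, alt_acceptor) == (ref_donor, ref_acceptor):
        return "constitutive"  # same splice sites, no AS at this intron
    if alt_acceptor == ref_acceptor:
        return "AltD"  # donor (5' splice site) shifted
    if alt_donor == ref_donor:
        return "AltA"  # acceptor (3' splice site) shifted
    return "other"  # both splice sites differ


# Hypothetical coordinates for illustration only.
jas_like_intron = (1200, 1500)
print(classify_as_event(jas_like_intron, (1215, 1500)))
print(classify_as_event(jas_like_intron, (1200, 1480)))
print(classify_as_event(jas_like_intron, None))
```

Whether an event introduces a premature stop codon would additionally require translating the resulting transcript, which this sketch does not attempt.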
Discussion
Evolutionary History of the TIFY Family
A total of 1373 TIFY family genes were identified in a survey of 76 plant species spanning algae,
moss, fern, Amborella, monocots and eudicots. Based on phylogeny and domain
arrangement, the TIFY family can be divided into four subfamilies: JAZ, ZML, PPD and
TIFY. The evolutionary history of the TIFY family can be inferred from the domain and
family distribution within the TIFY phylogeny. No TIFY family genes were identified in
algae species, suggesting the TIFY family originated after the divergence of algae from
land plants, which is in agreement with previous observations (Bai et al. 2011). Among
the seven domains (TIFY, Jas, CCT, GATA, PPD, SRT, Jas-like) in the TIFY family,
only the CCT and GATA domains were found in algae species. Genes containing the
CCT domain and genes containing the GATA domain may have merged along with the
newly evolved TIFY domain to form the ZML gene in the early ancestor of land plants.
The CCT and GATA domains were also observed in other gene families; for example,
the B-box zinc finger family has members containing both the CCT and B-box domains
(Khanna et al. 2009). Sequence similarity of the Jas domain with the first half of the
CCT domain (Chung et al. 2009; Figure 4-1) suggests that the Jas domain was likely derived
from the CCT domain. In addition, the Jas-like domain is likely a variant of the Jas
domain. Unlike the CCT and GATA domains, the TIFY, Jas/Jas-like, SRT and PPD
domains are restricted to the TIFY family and their origins coincide with the origin of
the four subfamilies. Distribution and phylogeny of the subfamilies suggest that the JAZ,
ZML and TIFY subfamilies originated in land plants and the PPD subfamily originated in
vascular plants (Figure 4-5).
During evolution the JAZ and ZML subfamilies expanded. We identified five
subgroups within the JAZ subfamily and three subgroups within the ZML subfamily. The
expansion of the subgroups occurred at different times in the two subfamilies (Figure 4-
5). In moss, the JAZ subfamily experienced duplication, forming two groups: the JAZ_A
subgroup and the precursor of the JAZ_B/C/D/E subgroups. The latter experienced further
expansion, forming the JAZ_B, _C, _D, and _E subgroups after the divergence of ferns
and before the divergence of gymnosperms. However, the ZML subfamily did not
experience any subgroup expansion before the origin of gymnosperms. The first
expansion of the ZML subfamily occurred after the divergence of gymnosperms from other seed
plants and before the origin of angiosperms and resulted in the ZML_A and ZML_B/C
subgroups in Amborella. The ZML_B/C subgroup further duplicated and formed the
ZML_B and ZML_C subgroups after the divergence of Amborella, but before the
divergence of monocots and eudicots. The PPD and TIFY subfamilies did not undergo gene
family expansion and each contains only a single subgroup (Figure 4-5).
Poales Experienced Many Gene Loss Events
Like gene family expansion, gene family loss also occurs during evolution. Bai et
al. (2011) observed loss of the PPD subfamily in O. sativa, B. distachyon, S. bicolor and
Z. mays and suggested that the PPD subfamily was lost from monocots. The increased
sampling in this study, across 76 plant species including ancient members of the monocots
(duckweed, orchid, palm and banana) and members of the grasses,
places the PPD subfamily loss event more precisely at the common ancestor of all
Poales rather than the common ancestor of monocots as suggested by Bai et al. (2011)
(Figure 4-1).
Similar to the PPD subfamily, the TIFY subfamily also experienced loss in Poales (Figure
4-1). The previous study (Bai et al. 2011) identified two TIFY genes in O. sativa and Z.
mays based on the presence of the TIFY domain and absence of the other domains within
the protein sequences. Identifying gene family membership based only on the
presence/absence of domains could be misleading, as genes containing only the TIFY
domain could be derived from a JAZ gene that has recently lost the Jas domain, or a
ZML gene that has recently lost the CCT and GATA domains, or a PPD gene which has
lost the PPD and Jas-like domains. Rather than placing all TIFY family genes that did
not contain any other domains into the TIFY subfamily, I applied phylogeny for
subfamily classification which proved sensitive for cases of domain gain/loss during
evolution. In addition, I identified a new domain, SRT, that could facilitate the
classification of the TIFY subfamily.
Based on phylogeny, I found that the ZML_B and JAZ_C subgroups were also
lost in Poales. Thus, Poales have lost a large portion of the TIFY family
(PPD subfamily, TIFY subfamily, ZML_B subgroup, and JAZ_C subgroup), which
suggests that the roles played by these genes in other species have been lost or are fulfilled
by a different set of genes within the Poales.
Domain Loss of TIFY Multidomain Family during Evolution
One interesting aspect of multidomain gene family evolution is the dynamics of
domain gain/loss (Stolzer et al. 2015). In this project, I explored domain dynamics within
the ZML, PPD and TIFY subfamilies. The TIFY subfamily represents a case where
domain gain/loss events are restricted, as all the gene members contain an SRT and a
TIFY domain. The restricted domain dynamics suggest that it is important for both
domains to be present for TIFY subfamily gene function. Genes which have lost the
SRT or TIFY domain could be subjected to strong purifying selection and quickly
removed from the genome.
Approximately 25% of the ZML and PPD subfamily genes have lost one or more
domains, suggesting domain loss is a frequent event in the two subfamilies. PPD genes
typically contain an N-terminal PPD domain, a TIFY domain and a C-terminal Jas-like
domain (White 2006; Bai et al. 2011; Zhang et al. 2012). Variants in domain
arrangement of the PPD subfamily include: absence of the PPD domain, absence of the
TIFY domain, absence of the Jas-like domain, lack of both PPD and Jas-like domains,
and lack of both TIFY and Jas-like domains (Figure 4-7). Arabidopsis PPD genes PPD1
and PPD2 are regulators of lamina size and leaf blade curvature (White 2006). Further
exploration of domain function would help to understand the function of PPD genes
lacking one or more domains.
The ZML subfamily contains three domains: TIFY, CCT and GATA (Bai et al.
2011). We identified five patterns of domain loss in the ZML family: loss of TIFY, loss of
CCT, loss of GATA, loss of both CCT and GATA, and loss of both TIFY and GATA
(Figure 4-7). The Arabidopsis ZML1 and ZML2 genes encode transcriptional activators of
photoprotective responses that interact with the CryR1 cis-element in the promoter of
high-light responsive genes (Shaikhali et al. 2012). The maize ZML2 gene regulates
wound-induced lignification by acting as a transcriptional repressor which binds, in the
form of a MYB/ZML complex, to the GAT(A/C) and AC-rich cis-elements of lignin genes
(Vélez-Bermúdez et al. 2015). The TIFY domain of the JAZ subfamily functions in
protein-protein interaction (Chini et al. 2009; Pauwels et al. 2010), which might also be
the case for the TIFY domain in the ZML subfamily. If so, ZML
genes without the TIFY domain may have lost the ability to interact with other proteins,
such as other ZMLs or MYB. The CCT domain was predicted to contain a nuclear
localization signal (Nishii et al. 2000). Losing the CCT domain may render the ZML
protein unable to enter the nucleus and affect its ability to function as a transcription
factor. However, if an interacting protein contains a nuclear localization signal, a ZML protein
without the CCT domain may still be properly targeted. The GATA domain is a DNA-
binding domain that recognizes a specific cis-element (Nishii et al. 2000; Teakle et al.
2002). Loss of the GATA domain may lead to a ZML protein that is unable to bind DNA
and thus fails to activate or repress downstream gene targets. If a ZML
protein that has lost a GATA domain is still able to interact with other proteins, it may
compete with normal ZML proteins.
Alternative Splicing of Jas-like Intron of PPD Genes
Based on sequence similarity the Jas-like domain is likely a variant of the Jas
domain, which in turn may have originated from the CCT domain (Chung et al. 2009).
JAZ proteins are transcriptional repressors which can bind transcription factors to
repress their function (Thines et al. 2007; Chini et al. 2007; Chini et al. 2016). The Jas
domain of JAZ proteins is split by an intron (the Jas intron) into two parts: a 20 amino
acid N-terminal motif and a 7 amino acid C-terminal motif (X5PY) (Figure 4-8; Chung et
al. 2010). AS in the intron of the Jas domain of JAZ genes plays an important functional
role in jasmonate signaling pathway (Yan et al. 2007; Chung and Howe 2009; Chung et
al. 2010; Moreno et al. 2013). Moreover, this AS event is conserved among monocots
and eudicots (Chung et al. 2010). AS around the Jas intron usually introduces a stop codon
downstream of the 20 amino acid N-terminal motif and leads to a truncated protein lacking the 7 amino acid C-
terminal motif X5PY (Yan et al. 2007; Chung and Howe 2009; Chung et al. 2010). The
Jas domain of JAZ proteins has two main functions: 1) JA-Ile mediated interaction with
COI1 that leads to polyubiquitination of the JAZ proteins by ubiquitin ligase followed by
degradation by the 26S proteasome (Bryan et al. 2007; Chini et al. 2007; Katsir et al.
2008); 2) interaction with transcription factors to suppress their functions (Fernández-
Calvo et al. 2011; Qi et al. 2011; Song et al. 2013; Fonseca et al. 2014; Thatcher et al.
2016). The truncated JAZ proteins lacking the X5PY motif within the Jas domain retain
their ability to interact with transcription factors but their recognition by COI1 is absent or
compromised (Chung and Howe 2009; Chung et al. 2010; Moreno et al. 2013; Zhang et
al. 2017a). Absence of COI1 binding would prevent their degradation, resulting in an
abundance of the transcriptional repressor. Thus, the AS isoforms of the JAZ proteins
may function as permanent repressors.
To determine whether similar AS events occur in the Jas-like domain of PPD
genes, and whether those AS events are conserved among species, I
collected gene structures and AS data from seven species: Amborella, two
monocots, and four eudicots. Like the Jas domain, the Jas-like domain in PPD
genes is split by an intron (the Jas-like intron) into a 20-amino-acid N-terminal motif and a
7-amino-acid C-terminal motif. I observed conserved AS in the Jas-like intron of PPD genes
in the four eudicot species (Figure 4-8). Similar to AS in the Jas intron of JAZ genes,
AS in the Jas-like intron of PPD genes frequently (6/7) introduces a stop codon, leading to
a truncated protein lacking the 7-amino-acid C-terminal motif. However, I noticed substantial
differences between the 7-amino-acid C-terminal motifs of the Jas domain and the Jas-like
domain (Figure 4-8). In the Jas domain, this motif contains two highly conserved amino
acids, PY, at the C-terminus (Katsir et al. 2008). In the Jas-like domain, in contrast, it
contains a conserved basic amino acid (R or K) at the second site and is enriched in basic
amino acids from the 4th to the 7th site (Figure 4-8). Thus, the functional lessons learned
from AS in the Jas domain may not apply directly to AS in the Jas-like domain.
They do, however, suggest two directions for future functional analysis. The first is to
investigate whether the Jas-like domain interacts with the COI1 protein and, if so,
whether loss of the 7-amino-acid C-terminal motif affects this interaction. The second is to
investigate whether the Jas-like domain interacts with transcription factors and whether loss
of the 7-amino-acid C-terminal motif affects this interaction. Additionally, nuclear
localization sequences are usually enriched in basic amino acids (Raikhel 1992). The 16th to
27th sites of the Jas-like domain are enriched in basic amino acids (Figure 4-1), so it will be
interesting to see whether the Jas-like domain carries a nuclear localization signal and
whether AS could disrupt this signal.
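As a rough illustration of what "enriched in basic amino acids" means for a stretch of a domain, the fraction of K and R residues in a window can be computed as sketched below. The 27-residue sequence is an invented placeholder, not the Jas-like consensus, and the function names `window` and `basic_fraction` are hypothetical. Only K and R are counted as basic here, following the text above.

```python
def window(seq, start, end):
    """Return residues start..end of seq, using 1-based inclusive site numbers."""
    return seq[start - 1:end]


def basic_fraction(peptide, basic="KR"):
    """Fraction of residues in peptide that are basic (K or R by default)."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(aa in basic for aa in peptide) / len(peptide)


# Invented 27-residue placeholder sequence (assumption for illustration only).
toy_domain = "MSTAEDLGSAPQVNDKRAKRGKRAKKR"

n_terminal = window(toy_domain, 1, 15)
c_terminal = window(toy_domain, 16, 27)

print(f"sites 1-15:  {basic_fraction(n_terminal):.2f}")
print(f"sites 16-27: {basic_fraction(c_terminal):.2f}")
```

A comparison of this kind only flags candidate regions; whether a basic stretch acts as a nuclear localization signal would still need experimental testing, as noted above.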
Figure 4-1. Logos of the domains in the TIFY family. Available secondary structure
information and predicted functional regions are indicated.
Figure 4-2. Domain distributions across 76 plant species. Absence of a domain is
indicated by grey-tone shading. Solid black triangles indicate that all identified domains belong to the TIFY family. Black-white triangles indicate that only some of the identified domains belong to the TIFY family.
Geologic timescale axis: time (Mya), 0 to 1660.
No. Species Common name TIFY Jas CCT GATA PPD SRT Jas-like
1 C. paradoxa 0/0 0/0 0/1 0/3 0/0 0/0 0/-
2 C. merolae (Strain 10D) 0/0 0/0 0/3 0/6 0/0 0/0 0/-
3 P. purpureum CCMP1328 0/0 0/0 0/4 0/2 0/0 0/0 0/-
4 V. carteri 0/0 0/0 0/5 0/8 0/0 0/0 0/-
5 C. reinhardtii 0/0 0/0 0/8 0/12 0/0 0/0 0/-
6 C. subellipsoidea C-169 0/0 0/0 0/3 0/6 0/0 0/0 0/-
7 C. variabilis NC64A 0/0 0/0 0/4 0/6 0/0 0/0 0/-
8 B. prasinos 0/0 0/0 0/5 0/7 0/0 0/0 0/-
9 M. pusilla CCMP1545 0/0 0/0 0/3 0/9 0/0 0/0 0/-
10 M. pusilla RCC299 0/0 0/0 0/4 0/6 0/0 0/0 0/-
11 O. lucimarinus 0/0 0/0 0/5 0/8 0/0 0/0 0/-
12 O. sp. RCC809 0/0 0/0 0/3 0/5 0/0 0/0 0/-
13 O. tauri 0/0 0/0 0/4 0/3 0/0 0/0 0/-
14 P. patens moss 16/16 6/6 4/30 4/14 0/0 2/2 1/-
15 S. moellendorffii fern 11/12 2/3 2/13 2/8 3/3 1/1 3/-
16 G. biloba common ginkgo 17/17 8/8 3/19 2/22 1/1 3/3 1/-
17 P. abies Norway spruce 50/56 26/27 1/16 0/13 2/2 3/3 1/-
18 P. taeda loblolly pine 67/68 24/25 1/15 0/14 2/2 2/3 2/-
19 A. trichopoda 10/10 4/4 2/17 2/19 1/1 1/1 1/-
20 S. polyrhiza duckweed 15/15 9/9 3/25 3/20 1/1 1/1 1/-
21 P. equestris orchid 16/16 14/15 4/23 3/21 1/1 1/1 0/-
22 P. dactylifera date palm 16/16 14/15 6/31 5/17 0/0 1/1 0/-
23 E. guineensis African oil palm 14/15 12/12 6/38 4/30 2/2 0/1 0/-
24 M. acuminata banana 49/50 31/32 6/71 6/53 3/3 3/2 1/-
25 M. balbisiana wild banana 31/36 22/26 4/52 3/38 1/2 3/3 1/-
26 P. virgatum switchgrass 42/42 31/31 6/66 6/57 0/0 0/0 0/-
27 P. hallii Hall's panicgrass 21/21 19/19 3/35 3/29 0/0 0/0 0/-
28 S. italica foxtail millet 19/19 17/17 3/36 3/28 0/0 0/0 0/-
29 S. bicolor sorghum 21/21 20/20 3/36 3/30 0/0 0/0 0/-
30 Z. mays maize 32/34 30/34 3/52 3/41 0/0 0/0 0/-
31 O. sativa rice 17/18 17/18 4/40 4/25 0/0 0/0 0/-
32 P. heterocycla moso bamboo 19/20 23/23 6/44 5/29 0/0 0/0 0/-
33 B. distachyon purple false brome 18/18 17/17 5/36 5/28 0/0 0/0 0/-
34 H. vulgare barley 12/12 11/11 4/26 4/17 0/0 0/0 0/-
35 T. aestivum bread wheat 33/33 31/32 12/79 9/52 0/0 0/0 0/-
36 T. urartu wheat A genome progenitor 11/11 10/10 4/31 4/14 0/0 0/0 0/-
37 A. coerulea Colorado blue columbine 12/13 4/4 4/22 5/26 1/1 1/1 1/-
38 N. nucifera sacred lotus 18/18 11/12 5/38 5/33 2/2 2/2 2/-
39 B. vulgaris sugar beet 12/12 6/6 4/22 4/16 1/1 1/1 0/-
40 A. chinensis kiwifruit 7/9 8/8 8/46 7/39 0/0 0/0 0/-
41 U. gibba humped bladderwort 19/21 9/9 3/36 2/28 2/2 2/2 1/-
42 M. guttatus monkeyflower 15/15 8/8 4/32 4/26 2/2 1/1 2/-
43 N. benthamiana tobacco 25/26 15/15 6/46 6/52 3/3 2/2 2/-
44 C. annuum pepper 16/16 8/8 4/26 3/27 2/2 1/1 2/-
45 S. lycopersicum tomato 19/20 12/12 4/30 4/30 2/2 1/1 2/-
46 S. tuberosum potato 20/20 11/11 3/30 3/30 2/2 1/1 2/-
47 V. vinifera grapevine 19/19 10/10 4/26 4/19 2/2 1/1 2/-
48 E. grandis flooded gum 19/19 12/12 4/27 3/22 1/1 1/1 1/-
49 C. sinensis orange 14/14 7/8 4/28 4/22 2/2 1/1 2/-
50 G. raimondii cotton 28/28 15/15 8/57 8/46 3/3 1/1 3/-
51 T. cacao cacao tree 17/17 9/9 5/30 5/24 2/2 1/1 2/-
52 C. papaya papaya 13/13 6/6 4/25 2/22 1/1 1/1 1/-
53 B. rapa field mustard 35/35 21/23 5/65 5/60 2/2 2/2 2/-
54 E. salsugineum salt cress 14/14 8/9 3/34 3/30 1/1 1/1 1/-
55 A. thaliana 18/18 10/12 3/40 3/30 2/2 1/1 2/-
56 C. grandiflora 17/17 9/10 3/37 3/26 2/2 1/1 2/-
57 B. stricta Drummond's rockcress 17/17 9/10 4/38 3/30 2/2 1/1 1/-
58 C. sativus cucumber 15/15 9/9 4/30 4/25 1/1 2/2 1/-
59 C. lanatus watermelon 15/15 9/10 4/26 4/21 1/1 2/2 1/-
60 M. domestica apple 29/32 18/19 5/49 3/35 2/2 3/3 2/-
61 P. bretschneideri Chinese white pear 23/26 12/12 6/44 6/30 2/3 2/2 2/-
62 P. persica peach 16/16 8/8 5/27 5/20 1/1 1/1 1/-
63 P. mume mei 14/14 6/6 5/26 5/20 1/1 1/1 1/-
64 F. vesca woodland strawberry 11/13 5/5 3/25 3/19 0/1 1/1 0/-
65 G. max soybean 35/37 20/21 9/68 9/59 2/2 2/3 2/-
66 P. vulgaris common bean 18/18 9/9 6/38 6/32 1/1 2/2 1/-
67 C. cajan pigeon pea 17/17 11/11 4/33 4/33 1/1 1/1 1/-
68 M. truncatula barrel medic 20/20 12/13 4/36 4/41 1/1 1/1 1/-
69 C. arietinum chickpea 16/16 9/9 5/35 5/27 1/1 1/1 1/-
70 L. japonicus birdsfoot trefoil 11/11 4/4 4/27 3/19 1/1 1/1 1/-
71 R. communis castor bean 14/14 9/9 4/28 4/19 2/2 1/1 2/-
72 M. esculenta cassava 26/26 15/15 7/46 7/35 3/3 1/2 2/-
73 J. curcas physic nut 11/12 7/7 5/30 5/27 3/3 1/1 0/-
74 L. usitatissimum flax 23/24 13/14 4/47 3/33 2/2 2/2 2/-
75 P. trichocarpa poplar 22/22 12/12 8/47 7/39 2/2 2/2 2/-
76 S. purpurea willow 21/21 11/11 7/44 7/39 2/2 2/2 2/-
Total 1,288/1,326 805/835 286/2,324 263/1,911 83/86 74/77 70/-
Cell values: # domains of the TIFY superfamily / # domains in the genome
Row fields: No., species, common name
Clade labels in the figure: Poales, Asterids, Eurosids I, Eurosids II
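The "in-family / in-genome" cells of the table can be turned into proportions with a short Python sketch such as the one below. It uses only the totals row of the table; the function name `tify_share` is hypothetical, and the Jas-like column is left out because it has no genome-wide count (70/-).

```python
def tify_share(cell):
    """Convert an 'in_family/in_genome' cell (e.g. '1,288/1,326') to a proportion."""
    in_family, in_genome = (int(x.replace(",", "")) for x in cell.split("/"))
    if in_genome == 0:
        raise ValueError("no domains of this type in the genome")
    return in_family / in_genome


# Totals row of the Figure 4-2 table (Jas-like omitted: no genome-wide count).
totals = {
    "TIFY": "1,288/1,326",
    "Jas": "805/835",
    "CCT": "286/2,324",
    "GATA": "263/1,911",
    "PPD": "83/86",
    "SRT": "74/77",
}

for domain, cell in totals.items():
    print(f"{domain}: {tify_share(cell):.1%} of domains fall in the TIFY superfamily")
```

Applied to the totals, the TIFY, Jas, PPD and SRT domains fall almost entirely within the TIFY superfamily (about 96% to 97%), whereas only roughly 12% to 14% of CCT and GATA domains do.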
Figure 4-3. Distribution of ZML, JAZ, PPD and TIFY subfamilies in plant species.
Absence of a subfamily is indicated by shading.
Geologic timescale axis: time (Mya), 0 to 544.
No. Species Common name JAZ ZML PPD TIFY
14 P. patens moss 10 4 1 2
15 S. moellendorffii fern 5 2 3 1
16 G. biloba common ginkgo 11 3 1 3
17 P. abies Norway spruce 51 1 2 3
18 P. taeda loblolly pine 63 1 2 2
19 A. trichopoda 6 2 1 1
20 S. polyrhiza duckweed 11 3 1 1
21 P. equestris orchid 12 4 1 1
22 P. dactylifera date palm 10 7 0 1
23 E. guineensis African oil palm 9 6 2 0
24 M. acuminata banana 37 8 3 3
25 M. balbisiana wild banana 28 5 1 3
26 P. virgatum switchgrass 36 6 0 0
27 P. hallii Hall's panicgrass 18 3 0 0
28 S. italica foxtail millet 16 3 0 0
29 S. bicolor sorghum 18 3 0 0
30 Z. mays maize 31 3 0 0
31 O. sativa rice 14 4 0 0
32 P. heterocycla moso bamboo 18 6 0 0
33 B. distachyon purple false brome 15 5 0 0
34 H. vulgare barley 11 4 0 0
35 T. aestivum bread wheat 27 12 0 0
36 T. urartu wheat A genome progenitor 10 4 0 0
37 A. coerulea Colorado blue columbine 4 6 1 1
38 N. nucifera sacred lotus 11 5 2 2
39 B. vulgaris sugar beet 6 4 1 1
40 A. chinensis kiwifruit 7 8 0 0
41 U. gibba humped bladderwort 12 3 2 2
42 M. guttatus monkeyflower 9 4 2 1
43 N. benthamiana tobacco 17 6 3 2
44 C. annuum pepper 9 4 2 1
45 S. lycopersicum tomato 12 4 2 1
46 S. tuberosum potato 14 3 2 1
47 V. vinifera grapevine 11 5 2 1
48 E. grandis flooded gum 12 5 1 1
49 C. sinensis orange 7 4 2 1
50 G. raimondii cotton 16 8 3 1
51 T. cacao cacao tree 9 5 2 1
52 C. papaya papaya 8 4 1 1
53 B. rapa field mustard 26 5 2 2
54 E. salsugineum salt cress 9 3 1 1
55 A. thaliana 12 3 2 1
56 C. grandiflora 11 3 2 1
57 B. stricta Drummond's rockcress 11 4 2 1
58 C. sativus cucumber 9 4 1 2
59 C. lanatus watermelon 9 4 1 2
60 M. domestica apple 21 5 2 3
61 P. bretschneideri Chinese white pear 13 6 2 2
62 P. persica peach 9 5 1 1
63 P. mume mei 8 5 1 1
64 F. vesca woodland strawberry 7 3 0 1
65 G. max soybean 22 9 2 2
66 P. vulgaris common bean 10 6 1 2
67 C. cajan pigeon pea 12 5 1 1
68 M. truncatula barrel medic 14 4 1 1
69 C. arietinum chickpea 9 5 1 1
70 L. japonicus birdsfoot trefoil 5 4 1 1
71 R. communis castor bean 9 4 2 1
72 M. esculenta cassava 15 7 3 1
73 J. curcas physic nut 8 5 4 1
74 L. usitatissimum flax 15 4 2 2
75 P. trichocarpa poplar 13 8 2 2
76 S. purpurea willow 12 7 2 2
Total 920 295 85 74
Cell values: number of genes in each subfamily of the TIFY superfamily
Row fields: No., species, common name
Clade labels in the figure: Poales, Asterids, Eurosids I, Eurosids II
Figure 4-4. ML tree of ZML, JAZ, PPD and TIFY subfamilies. Yellow, pink, purple, blue,
green and red indicate proteins from moss, fern, gymnosperms, A. trichopoda, monocots, and eudicots, respectively.
Panels: PPD tree, TIFY tree, ZML tree, JAZ tree. Scale bars: 0.4.
Figure 4-5. Estimated evolutionary history of the four subfamilies of the TIFY family.
Domain symbols with dotted edges indicate the origin of the domain. Red crosses indicate absence of the subfamily or subgroup. Red arrows indicate expansion or loss events of the subfamily or subgroup.
Figure 4-6. Rate-shift sites in the TIFY domain across the four subfamilies. A) Amino
acid substitution rate differences comparing each subfamily with the average of the four subfamilies. B) Sites with significantly shifted rates in each subfamily. Slow or fast sites indicate sites with significantly low or high amino acid substitution rates in the indicated subfamily compared with the whole family. C) TIFY domain logo in each of the four subfamilies with rate-shifted sites indicated. Green triangles indicate slow sites and orange triangles indicate fast sites. Blue asterisks indicate the conserved motif TIFY. Yellow arrows indicate the predicted β-sheet, and the orange cylinder indicates the predicted α-helix (Chung et al. 2009).
Figure 4-7. Domain dynamics of the PPD, TIFY and ZML subfamilies. Gray squares
indicate the presence of the domains shown at the bottom of each tree. Black squares indicate absence of the indicated domain. In the ML tree, yellow, pink, purple, blue, green and red indicate proteins from moss, fern, gymnosperms, A. trichopoda, monocots, and eudicots, respectively.
Panels: PPD (domains PPD, TIFY, Jas-like); TIFY (domains SRT, TIFY); ZML, continued across three panels (domains TIFY, CCT, GATA).
Figure 4-8. AS in the Jas-like intron of the PPD genes. Comparison of the Jas domain
in JAZ proteins (A) and the Jas-like domain in PPD proteins (B) with the intron position indicated. C) Gene structure of PPD genes from four eudicot species. Homologous exons are connected with gray bars. D) Protein product prediction of the AS isoforms. Three-letter codons are separated by dots. Blue indicates sequences in exon 7 and orange indicates alternatively spliced sequences in the Jas-like intron as shown in C. Black rectangles indicate stop codons.
CHAPTER 5 CONCLUSIONS AND PERSPECTIVES
During plant evolution, a comprehensive signal interpretation and transduction
network evolved for precise and fast responses to biotic and abiotic stresses.
Transcription factors and transcription coregulators play important roles in this network.
The 3R-MYB and TIFY gene families encode transcription factors/coregulators, and
their members play significant roles in abiotic/biotic stress responses. In Chapters 2 and 4,
I investigated the expansion of the two families and examined features of their molecular
evolution that may contribute to adaptation to the environment.
Within the TIFY family, JAZ proteins are important repressors functioning in the first
step of the jasmonate signaling pathway. AS of JAZ genes plays an important role in
balancing jasmonate signaling. However, our understanding of AS regulation in the
jasmonate pathway is limited. In Chapter 3, I explored AS and AS-related
regulation in Arabidopsis in response to increased jasmonate.
Evolution and Function of the Plant 3R-MYBs
There are three groups of 3R-MYB genes in angiosperms: the A-, B-, and C-groups.
In Arabidopsis and tobacco, A- and B-group genes are involved in regulating the G2/M
transition of the cell cycle (Araki et al. 2004; Haga et al. 2007; Kato et al. 2009; Haga et
al. 2011; Araki et al. 2012; Araki et al. 2013), while the C-group gene in rice functions in
both promoting cell division and increasing stress resistance (Dai et al. 2007; Ma
et al. 2009). Analysis of the number of predicted MSA elements in the promoters of A-, B-,
and C-group genes and of randomly sampled genes implied that all three groups are involved
in cell cycle regulation, and suggested that the mechanism of cell cycle regulation by
C-group genes might differ from that of the A- and B-groups. Based on the expression
profiles of 3R-MYB genes from ten plant species under various abiotic stresses, 3R-MYB
genes, especially C-group 3R-MYBs, may positively regulate stress responses.
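Counting candidate MSA elements in a promoter can be sketched as a simple motif scan. The sketch below assumes the commonly cited MSA core pentamer AACGG and scans both strands; it is not necessarily the prediction procedure used in Chapter 2, and the promoter string and the function names `reverse_complement` and `count_sites` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(promoter, core="AACGG"):
    """Count occurrences of a core motif on both strands of a promoter.

    The default core is an assumption (the commonly cited MSA core pentamer).
    Overlapping matches are counted; a palindromic core is counted once per site.
    """
    promoter = promoter.upper()
    hits = 0
    for motif in {core, reverse_complement(core)}:
        start = promoter.find(motif)
        while start != -1:
            hits += 1
            start = promoter.find(motif, start + 1)
    return hits


# Invented promoter fragment with one forward and one reverse-strand core.
toy_promoter = "TTAACGGTTCCGTTAA"
print("candidate MSA cores:", count_sites(toy_promoter))
```

Counts from 3R-MYB promoters would then be compared with counts from randomly sampled genes to judge whether the elements are over-represented.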
In Arabidopsis and tobacco, A-group 3R-MYBs function as activators while B-group
3R-MYBs function as repressors of the G2/M transition. Competition between the
activators and repressors regulates the G2/M transition. However, this mechanism, if
valid, is not universal among angiosperms, and certainly does not hold for grasses, where
no B-group 3R-MYBs were identified. Based on phylogeny, B-group 3R-MYBs were likely lost
in the common ancestor of grasses. Moreover, a motif in A-group 3R-MYBs that plays a
repressor role has diverged in grasses relative to other species, suggesting that A-group
3R-MYBs in grasses may have diverged functional roles. Taken together, the
mechanism of the G2/M transition in grasses, which include many important crop species,
is likely different from that of the eudicot species Arabidopsis and tobacco. Further
investigation of the G2/M transition mechanism in grasses is needed.
Phylogenetic analysis predicts that the expansion of the 3R-MYBs occurred
before the divergence of Amborella from other angiosperm species. Earlier-diverging
lineages, including algae, moss, and likely gymnosperms, contain only one group of
3R-MYBs. Thus, the regulatory role of 3R-MYBs in the cell cycle in these lineages is likely
different from that in angiosperms. However, our understanding of 3R-MYB function in
these lineages is lacking. It will be interesting to compare regulation of the cell cycle
in these lineages, which have a single group of 3R-MYBs, with that in angiosperms, which
have three (or two in grasses) groups of 3R-MYBs with divergent functions. The involvement
of C-group 3R-MYB genes in both the cell cycle and abiotic stress responses could be a newly
evolved function that contributes to better adaptation of plants to the environment.
AS of the two Arabidopsis A-group genes can generate isoforms lacking a
repression motif in the C-terminus, yielding hyperactive isoforms. Similar cases were
observed in A-group genes in grape and Amborella. Thus, AS may play an important
role in regulating the functional activity of A-group 3R-MYBs.
Origin and Evolution of the Multidomain TIFY Family
The plant-specific, multidomain TIFY gene family can be divided into four
subfamilies: ZML, JAZ, PPD, and TIFY. Dividing subfamilies based only on domain
arrangement could be misleading because domain rearrangement happens during evolution,
as shown by the domain loss observed in members of the ZML and PPD subfamilies. I
therefore suggest that subfamily division of the TIFY family should also take phylogeny
into account. Available data did not reveal any evidence for domain loss in the TIFY
subfamily, which suggests that both domains are important for proper gene function. The
JAZ and ZML subfamilies have undergone expansion during evolution. The JAZ genes encode
the repressors in the first step of the jasmonate signaling pathway. The expansion of the
JAZ subfamily during plant evolution may have contributed to the complexity of the
jasmonate signaling pathway. Interestingly, I observed AS in the Jas-like intron of PPD
genes in various eudicot species. This AS is similar to AS in the Jas intron of JAZ genes:
both frequently introduce a premature stop codon and produce a truncated protein that lacks
part of the Jas/Jas-like domain. AS in the Jas intron of JAZ genes can generate functional
proteins whose functions diverge from that of the primary isoform. It will be
interesting to analyze further whether AS isoforms of the PPD genes are translated and,
if so, to determine the function of the translated proteins.
AS Regulation of Arabidopsis under Jasmonate Treatment
In this project, I explored AS and AS-related regulation in Arabidopsis jaz2 and
jaz7 mutants in response to jasmonate treatment using transcriptome and proteome data.
Specifically, I focused on 1) AS regulation, defined as a change in the proportion of AS
isoforms of a gene in response to jasmonate treatment; 2) genes that undergo differential AS
and produce isoforms with potential miRNA target sites; and 3) genes that undergo AS
to produce splice variants with novel functions. At each level of examination I identified
a pool of candidate genes, and a few interesting cases were explored further. NUDX9
and NRT1.8 showed significantly changed isoform proportions in
response to MeJA, which suggests that the jasmonate signaling pathway regulates AS of
these genes. SMZ, AAO2 and At3g02740 are jasmonate-responsive genes with
predicted miRNA binding sites subject to AS regulation. AS of bHLH160, FYF and
FLM has the potential to generate an activator and a repressor from the same gene
through different arrangements of domain structures, similar to the AS regulation of
JAZ repressors. Proteomics data validated protein-level expression of the AS isoform
for nine genes. Jasmonate signaling and splicing regulation may communicate through
shared transcription factors, splicing factors and ubiquitin pathways. Splicing factors
usually have decreased expression in response to MeJA, and a few candidate splicing
factors involved in jasmonate responses were identified. Taken in aggregate, these
findings expand our understanding of AS regulation in the jasmonate signaling pathway
and provide candidate genes that may play critical roles in AS regulation of
jasmonate responses.
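The notion of AS regulation as a change in isoform proportions can be made concrete with a minimal Python sketch. The expression values below are invented, the isoform labels and function names are hypothetical, and no statistical test is included; this only shows how a shift in proportion differs from a change in overall gene expression.

```python
def isoform_proportions(expression):
    """Convert per-isoform expression values of one gene into proportions."""
    total = sum(expression.values())
    if total == 0:
        raise ValueError("gene is not expressed; proportions are undefined")
    return {isoform: value / total for isoform, value in expression.items()}


def proportion_shift(control, treated):
    """Change in each isoform's proportion (treated minus control)."""
    p_control = isoform_proportions(control)
    p_treated = isoform_proportions(treated)
    return {isoform: p_treated[isoform] - p_control[isoform] for isoform in control}


# Invented expression values for one gene with two isoforms.
control = {"isoform_1": 80.0, "isoform_2": 20.0}
mejA_treated = {"isoform_1": 90.0, "isoform_2": 90.0}

for isoform, delta in proportion_shift(control, mejA_treated).items():
    print(f"{isoform}: change in proportion = {delta:+.2f}")
```

In this toy case both isoforms increase in abundance after treatment, yet the share of the second isoform rises from 0.2 to 0.5, which is the kind of shift the first level of analysis above is concerned with.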
LIST OF REFERENCES
Abbasi AA, Hanif H. 2012. Phylogenetic history of paralogous gene quartets on human chromosomes 1, 2, 8 and 20 provides no evidence in favor of the vertebrate octoploidy hypothesis. Mol Phylogenet Evol. 63: 922-927.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215: 403-410.
Alves-Junior L, Niemeier S, Hauenschild A, Rehmsmeier M, Merkle T. 2009. Comprehensive prediction of novel microRNA targets in Arabidopsis thaliana. Nucleic Acids Res. 37: 4010-4021.
Apic G, Gough J, Teichmann SA. 2001. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J Mol Biol. 310: 311-325.
Araki S, Ito M, Soyano T, Nishihama R, Machida Y. 2004. Mitotic cyclins stimulate the activity of c-Myb-like factors for transactivation of G2/M phase-specific genes in tobacco. J Biol Chem. 279: 32979-32988.
Araki S, Machida Y, Ito M. 2012. Virus-induced silencing of NtmybA1 and NtmybA2 causes incomplete cytokinesis and reduced shoot elongation in Nicotiana benthamiana. Plant Biotechnol. 29: 483-487.
Araki S, et al. 2013. Cosuppression of NtmybA1 and NtmybA2 causes downregulation of G2/M phase-expressed genes and negatively affects both cell division and expansion in tobacco. Plant Signal Behav. 8: e26780.
Attaran E, et al. 2014. Temporal dynamics of growth and photosynthesis suppression in response to jasmonate signaling. Plant Physiol. 165: 1302-1314.
Axtell MJ, Westholm JO, Lai EC. 2011. Vive la différence: biogenesis and evolution of microRNAs in plants and animals. Genome Biol. 12: 221.
Bai Y, Meng Y, Huang D, Qi Y, Chen M. 2011. Origin and evolutionary analysis of the plant-specific TIFY transcription factor family. Genomics 98: 128-136.
Bailey TL, Williams N, Misleh C, Li WW. 2006. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 34: W369-W373.
Barbazuk WB, Fu Y, McGinnis KM. 2008. Genome-wide analyses of alternative splicing in plants: Opportunities and challenges. Genome Res. 18: 1381-1392.
Basu MK, Makalowski W, Rogozin IB, Koonin EV. 2008. U12 intron positions are more strongly conserved between animals and plants than U2 intron positions. Biol Direct 3: 19.
Bechtold U, et al. 2010. Constitutive salicylic acid defences do not compromise seed yield, drought tolerance and water productivity in the Arabidopsis accession C24. Plant, Cell & Environ. 33: 1959-1973.
Berget SM, Moore C, Sharp PA. 1977. Spliced segments at the 5’ terminus of adenovirus 2 late mRNA. P Natl Acad Sci USA. 74: 3171-3175.
Bergholtz S, et al. 2001. The highly conserved DNA-binding domains of A-, B- and c-Myb differ with respect to DNA-binding, phosphorylation and redox properties. Nucleic Acids Res. 29: 3546-3556.
Bernal M, et al. 2012. Transcriptome sequencing identifies SPL7-regulated copper acquisition genes FRO4/FRO5 and the copper dependence of iron homeostasis in Arabidopsis. Plant Cell 24: 738-761.
Bornberg-Bauer E, Albà MM. 2013. Dynamics and adaptive benefits of modular protein evolution. Curr Opin Struct Biol. 23: 459-466.
Boutet SC, et al. 2012. Alternative polyadenylation mediates microRNA regulation of muscle stem cell function. Cell Stem Cell 10: 327-336.
Braun EL, Grotewold E. 1999. Newly discovered plant c-myb-like genes rewrite the evolution of the plant myb gene family. Plant Physiol. 121: 21-24.
Brown JD, Plumpton M, Beggs JD. 1992. The genetics of nuclear premessenger RNA splicing: a complex story. Antonie Van Leeuwenhoek 62: 35-46.
Browse J, Howe GA. 2008. New weapons and a rapid response against insect attack. Plant Physiol. 146: 832-838.
Campo S, et al. 2013. Identification of a novel microRNA (miRNA) from rice that targets an alternatively spliced transcript of the Nramp6 (Natural resistance-associated macrophage protein 6) gene involved in pathogen resistance. New Phytol. 199: 212-227.
Carretero-Paulet L, et al. 2010. Genome-wide classification and evolutionary analysis of the bHLH family of transcription factors in Arabidopsis, poplar, rice, moss and algae. Plant Physiol. 153: 1398-1412.
Chamala S, Feng G, Chavarro C, Barbazuk WB. 2015. Genome-wide identification of evolutionarily conserved alternative splicing events in flowering plants. Front Bioeng Biotechnol. 3: 33.
Chandran D, Inada N, Hather G, Kleindt CK, Wildermuth MC. 2010. Laser microdissection of Arabidopsis cells at the powdery mildew infection site reveals site-specific processes and regulators. P Natl Acad Sci USA. 107: 460-465.
Chang YF, Imam JS, Wilkinson MF. 2007. The nonsense-mediated decay RNA surveillance pathway. Annu Rev Biochem. 76: 51-74.
Cavalier-Smith T. 1985. Selfish DNA and the origin of introns. Nature 315: 283-284.
Cheng CY, et al. 2016. Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. Plant J. 89: 789-804.
Chico JM, et al. 2014. Repression of jasmonate-dependent defenses by shade involves differential regulation of protein stability of MYC transcription factors and their JAZ repressors in Arabidopsis. Plant Cell 26: 1967-1980.
Chini A, et al. 2007. The JAZ family of repressors is the missing link in jasmonate signalling. Nature 448: 666-671.
Chini A, Fonseca S, Chico JM, Fernández-Calvo P, Solano R. 2009. The ZIM domain mediates homo- and heteromeric interactions between Arabidopsis JAZ proteins. Plant J. 59: 77-87.
Chini A, Gimenez-Ibanez S, Goossens A, Solano R. 2016. Redundancy and specificity in jasmonate signalling. Curr Opin Plant Biol. 33: 147-156.
Chow LT, Gelinas RE, Broker TR, Roberts RJ. 1977. An amazing sequence arrangement at the 5’ ends of adenovirus 2 messenger RNA. Cell 12: 1-8.
Chung HS, Howe GA. 2009. A critical role for the TIFY motif in repression of jasmonate signaling by a stabilized splice variant of the JASMONATE ZIM-Domain protein JAZ10 in Arabidopsis. Plant Cell 21: 131-145.
Chung HS, Niu Y, Browse J, Howe GA. 2009. Top hits in contemporary JAZ: An update on jasmonate signaling. Phytochem. 70: 1547-1559.
Chung HS, et al. 2010. Alternative splicing expands the repertoire of dominant JAZ repressors of jasmonate signaling. Plant J. 63: 613-622.
Dai X, et al. 2007. Overexpression of an R1R2R3 MYB gene, OsMYB3R-2, increases tolerance to freezing, drought, and salt stress in transgenic Arabidopsis. Plant Physiol. 143: 1739-1751.
Dai X, Zhao PX. 2011. psRNATarget: a plant small RNA target analysis server. Nucleic Acids Res. 39: W155-W159.
Darnell JE. 1978. Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science 202: 1257-1260.
Dash AB, Orrico FC, Ness SA. 1996. The EVES motif mediates both intermolecular and intramolecular regulation of c-Myb. Genes Dev. 10: 1858-1869.
Dash S, Van Hemert J, Hong L, Wise RP, Dickerson JA. 2012. PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res. 40: D1194-D1201.
Davidson CJ, Guthrie EE, Lipsick JS. 2012. Duplication and maintenance of the Myb genes of vertebrate animals. Biol Open 2: 101-110.
Davidson CJ, Tirouvanziam R, Herzenberg LA, Lipsick JS. 2005. Functional evolution of the vertebrate Myb gene family: B-Myb, but neither A-Myb nor c-Myb, complements Drosophila Myb in hemocytes. Genetics 169: 215-229.
del Pozo JC, Lopez-Matas MA, Ramirez-Parra E, Gutierrez C. 2005. Hormonal control of the plant cell cycle. Physiol Plantarum 123: 173-183.
Dias AP, Braun EL, McMullen MD, Grotewold E. 2003. Recently duplicated maize R2R3 Myb genes provide evidence for distinct mechanisms of evolutionary divergence after duplication. Plant Physiol. 131: 610-620.
Drechsel G, et al. 2013. Nonsense-mediated decay of alternative precursor mRNA splicing variants is a major determinant of the Arabidopsis steady state transcriptome. Plant Cell 25: 3726-3742.
Du H, et al. 2012. Genome-wide analysis of the MYB transcription factor superfamily in soybean. BMC Plant Biol. 12: 106.
Du H, et al. 2013. Genome-wide identification and evolutionary and expression analyses of MYB-related genes in land plants. DNA Res. 20: 437-448.
Dubos C, et al. 2010. MYB transcription factors in Arabidopsis. Trends Plant Sci. 15: 573-581.
Dugas DV, et al. 2011. Functional annotation of the transcriptome of Sorghum bicolor in response to osmotic stress and abscisic acid. BMC Genomics 12: 514.
Dujon B. 1980. Sequence of the intron and flanking exons of the mitochondrial 21S rRNA gene of yeast strains having different alleles at the omega and rib-1 loci. Cell 20: 185-197.
Doolittle WF. 1978. Genes in pieces: were they ever together? Nature 272: 581-582.
Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol. 7: e1002195.
Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792-1797.
Fedorov A, et al. 2001. Intron distribution difference for 276 ancient and 131 modern genes suggests the existence of ancient introns. P Natl Acad Sci USA. 98: 13177-13182.
Fedorova L, Fedorov A. 2003. Introns in gene evolution. Genetica 118: 123-131.
Feller A, Machemer K, Braun EL, Grotewold E. 2011. Evolutionary and comparative analysis of MYB and bHLH plant transcription factors. Plant J. 66: 94-116.
Feng G, Burleigh JG, Braun EL, Mei W, Barbazuk WB. 2017. Evolution of the 3R-MYB gene family in plants. Genome Biol Evol. 9: 1013-1029.
Fernández-Calvo P, et al. 2011. The Arabidopsis bHLH transcription factors MYC3 and MYC4 are targets of JAZ repressors and act additively with MYC2 in the activation of jasmonate responses. Plant Cell 23: 701-715.
Filichkin S, Mockler TC. 2012. Unproductive alternative splicing and nonsense mRNAs: a widespread phenomenon among plant circadian clock genes. Biol Direct 7: 20.
Filichkin S, Priest HD, Megraw M, Mockler TC. 2015. Alternative splicing in plants: directing traffic at the crossroads of adaptation and environmental stress. Curr Opin Plant Biol. 24: 125-135.
Finn RD, et al. 2014. Pfam: the protein families database. Nucleic Acids Res. 42: D222-D230.
Foissac S, Sammeth M. 2007. ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets. Nucleic Acids Res. 35: W297-W299.
Fonseca S, et al. 2009. (+)-7-iso-Jasmonoyl-L-isoleucine is the endogenous bioactive jasmonate. Nature Chem Biol. 5: 344-350.
Fonseca S, et al. 2014. bHLH003, bHLH013 and bHLH017 are new targets of JAZ repressors negatively regulating JA responses. PLoS ONE 9: e86182.
Friedman RC, Farh KKH, Burge CB, Bartel DP. 2009. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19: 92-105.
Gasperini D, et al. 2015. Multilayered organization of jasmonate signalling in the regulation of root growth. PLoS Genet. 11: e1005300.
Gaucher EA, Gu X, Miyamoto MM, Benner SA. 2002. Predicting functional divergence in protein evolution by site-specific rate shifts. Trends Biochem Sci. 27: 315-321.
Gaucher EA, Miyamoto MM, Benner SA. 2001. Function-structure analysis of proteins using covarion-based evolutionary approaches: elongation factors. P Natl Acad Sci USA. 98: 548-552.
Gharib WH, Robinson-Rechavi M. 2013. The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC. Mol Biol Evol. 30: 1675-1686.
Gilbert W. 1987. The exon theory of genes. In: Cold Spring Harbor symposia on quantitative biology. Cold Spring Harbor Laboratory Press. 52: 901-905.
Gibson TJ, Spring J. 2000. Evidence in favour of ancient octaploidy in the vertebrate genome. Biochem Soc Trans. 28: 259-264.
Gill SS, Tuteja N. 2010. Reactive oxygen species and antioxidant machinery in abiotic stress tolerance in crop plants. Plant Physiol Biochem. 48: 909-930.
Goldman N, Yang Z. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 11: 725-736.
Goodman HM, Olson MV, Hall BD. 1977. Nucleotide sequence of a mutant eukaryotic gene: the yeast tyrosine-inserting ochre suppressor SUP4-o. P Natl Acad Sci USA. 74: 5453-5457.
Grotewold E, et al. 2000. Identification of the residues in the Myb domain of maize C1 that specify the interaction with the bHLH cofactor R. P Natl Acad Sci USA. 97: 13579-13584.
Han JH, Batey S, Nickson AA, Teichmann SA, Clarke J. 2007. The folding and evolution of multidomain proteins. Nat Rev Mol Cell Biol. 8: 319-330.
Irimia M, Roy SW. 2014. Origin of spliceosomal introns and alternative splicing. In: Cold Spring Harbor perspectives in biology. Cold Spring Harbor Laboratory Press. 6: a016071.
Jiang C, Gu J, Chopra S, Gu X, Peterson T. 2004. Ordered origin of the typical two- and three-repeat Myb genes. Gene 326: 13-22.
Jiang Y, Liang G, Yang S, Yu D. 2014. Arabidopsis WRKY57 functions as a node of convergence for jasmonic acid- and auxin-mediated signaling in jasmonic acid-induced leaf senescence. Plant Cell 26: 230-245.
Keren H, Lev-Maor G, Ast G. 2010. Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet. 11: 345-355.
Kim E, Magen A, Ast G. 2007. Different levels of alternative splicing among eukaryotes. Nucleic Acids Res. 35: 125-131.
Kornblihtt AR, et al. 2013. Alternative splicing: a pivotal step between eukaryotic transcription and translation. Nat Rev Mol Cell Biol. 14: 153-165.
Koonin EV. 2006. The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? Biol Direct 1: 22.
Kranz HD, et al. 1998. Towards functional characterisation of the members of the R2R3-MYB gene family from Arabidopsis thaliana. Plant J. 16: 263-276.
Haas BJ, Delcher AL, Wortman JR, Salzberg SL. 2004. DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 20: 3643-3646.
Haga N, et al. 2007. R1R2R3-Myb proteins positively regulate cytokinesis through activation of KNOLLE transcription in Arabidopsis thaliana. Development 134: 1101-1110.
Haga N, et al. 2011. Mutations in MYB3R1 and MYB3R4 cause pleiotropic developmental defects and preferential down-regulation of multiple G2/M-specific genes in Arabidopsis. Plant Physiol. 157: 706-717.
Hedges SB, Martin J, Suleski M, Paymer M, Kumar S. 2015. Tree of life reveals clock-like speciation and diversification. Mol Biol Evol. 32: 835-845.
Hirt H. 2000. Connecting oxidative stress, auxin, and cell cycle regulation through a plant mitogen-activated protein kinase pathway. P Natl Acad Sci USA. 97: 2405-2407.
Howe GA, Jander G. 2008. Plant immunity to insect herbivores. Annu Rev Plant Biol. 59: 41-66.
Hu Y, Jiang L, Wang F, Yu D. 2013. Jasmonate regulates the INDUCER OF CBF EXPRESSION-C-REPEAT BINDING FACTOR/DRE BINDING FACTOR1 cascade and freezing tolerance in Arabidopsis. Plant Cell 25: 2907-2924.
Hu B, et al. 2015. GSDS 2.0: An upgraded gene feature visualization server. Bioinformatics 31: 1296-1297.
Huang CH, et al. 2015. Resolution of Brassicaceae phylogeny using nuclear genes uncovers nested radiations and supports convergent morphological evolution. Mol Biol Evol. 33: 394-412.
Inzé D, De Veylder L. 2006. Cell cycle regulation in plant development. Annu Rev Genet. 40: 77-105.
Ito M, et al. 1998. A novel cis-acting element in promoters of plant B-type cyclin genes activates M phase-specific transcription. Plant Cell 10: 331-341.
Ito M, et al. 2001. G2/M-phase-specific transcription during the plant cell cycle is mediated by c-Myb-like transcription factors. Plant Cell 13: 1891-1905.
Ito M. 2005. Conservation and diversification of the three-repeat Myb transcription factors in plants. J Plant Res. 118: 61-69.
Jiao Y, et al. 2011. Ancestral polyploidy in seed plants and angiosperms. Nature 473: 97-100.
Jonak C, Ökrész L, Bögre L, Hirt H. 2002. Complexity, cross talk and integration of plant MAP kinase signalling. Curr Opin Plant Biol. 5: 415-424.
Kalsotra A, Wang K, Li PF, Cooper TA. 2010. MicroRNAs coordinate an alternative splicing network during mouse postnatal heart development. Genes Dev. 24: 653-658.
Kalyna M, et al. 2012. Alternative splicing and nonsense-mediated decay modulate expression of important regulatory genes in Arabidopsis. Nucleic Acids Res. 40: 2454-2469.
Kato K, et al. 2009. Preferential up-regulation of G2/M phase-specific genes by overexpression of the hyperactive form of NtmybA2a lacking its negative regulation domain in tobacco BY-2 cells. Plant Physiol. 149: 1945-1957.
Katsir L, Chung HS, Koo AJK, Howe GA. 2008. Jasmonate signaling: a conserved mechanism of hormone sensing. Curr Opin Plant Biol. 11: 428-435.
Kervestin S, Jacobson A. 2012. NMD: a multifaceted response to premature translational termination. Nat Rev Mol Cell Biol. 13: 700-712.
Khanna R, et al. 2009. The Arabidopsis B-box zinc finger family. Plant Cell 21: 3416-3420.
Kilian J, et al. 2007. The AtGenExpress global stress expression data set: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses. Plant J. 50: 347-363.
Klempnauer KH, Gonda TJ, Bishop JM. 1982. Nucleotide sequence of the retroviral leukemia gene v-myb and its cellular progenitor c-myb: the architecture of a transduced oncogene. Cell 31: 453-463.
Koh J, et al. 2012. Comparative proteomics of the recently and recurrently formed natural allopolyploid Tragopogon mirus (Asteraceae) and its parents. New Phytol. 196: 292-305.
Kozomara A, Griffiths-Jones S. 2014. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42: D68-D73.
Lane CE, et al. 2007. Nucleomorph genome of Hemiselmis andersenii reveals complete intron loss and compaction as a driver of protein structure and function. P Natl Acad Sci USA. 104: 19908-19913.
LaPolla RJ, Lambowitz AM. 1978. Ribosomal precursor RNA containing a 2.3-kilobase intron. J Biol Chem. 254: 11746-11750.
Lareau LF, Inada M, Green RE, Wengrod JC, Brenner SE. 2007. Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446: 926-929.
Le SQ, Dang CC, Gascuel O. 2012. Modeling protein evolution with several amino acid replacement matrices depending on site rates. Mol Biol Evol. 29: 2921-2936.
Le SQ, Gascuel O. 2008. An improved general amino acid replacement matrix. Mol Biol Evol. 25: 1307-1320.
Letunic I, Doerks T, Bork P. 2015. SMART: recent updates, new developments and status in 2015. Nucleic Acids Res. 43: D257-D260.
Lewis BP, Green RE, Brenner SE. 2003. Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. P Natl Acad Sci USA. 100: 189-192.
Li J, et al. 2006. A subgroup of MYB transcription factor genes undergoes highly conserved alternative splicing in Arabidopsis and rice. J Exp Bot. 57: 1263-1273.
Li JY, et al. 2010. The Arabidopsis nitrate transporter NRT1.8 functions in nitrate removal from the xylem sap and mediates cadmium tolerance. Plant Cell 22: 1633-1646.
Li Q, Zhang C, Li J, Wang L, Ren Z. 2012. Genome-wide identification and characterization of R2R3MYB gene family in Cucumis sativus. PLoS ONE 7: e47576.
Linkies A, Leubner-Metzger G. 2012. Beyond gibberellins and abscisic acid: how ethylene and jasmonates control seed germination. Plant Cell Rep. 31: 253-270.
Lipsick JS. 1996. One billion years of Myb. Oncogene 13: 223-235.
Logsdon JM, et al. 1995. Seven newly discovered intron positions in the triose-phosphate isomerase gene: evidence for the introns late theory. P Natl Acad Sci USA. 92: 8507-8511.
Logsdon JM. 1998. The recent origins of spliceosomal introns revisited. Curr Opin Genet Dev. 8: 637-648.
Ma PCM, Rould MA, Weintraub H, Pabo CO. 1994. Crystal structure of MyoD bHLH domain-DNA complex: Perspective on DNA recognition and implications for transcriptional activation. Cell 77: 451-459.
Ma Q, et al. 2009. Enhanced tolerance to chilling stress in OsMYB3R-2 transgenic rice is mediated by alteration in cell cycle and ectopic expression of stress genes. Plant Physiol. 150: 244-256.
Madera M, Vogel C, Kummerfeld SK, Chothia C, Gough J. 2004. The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res. 32: D235-D239.
Marchler-Bauer A, et al. 2015. CDD: NCBI’s conserved domain database. Nucleic Acids Res. 43: D222-D226.
Marquez Y, Brown JWS, Simpson C, Barta A, Kalyna M. 2012. Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Res. 22: 1184-1195.
Martin C, Paz-Ares J. 1997. MYB transcription factors in plants. Trends in Genet. 13: 67-73.
Martinez-Contreras R, et al. 2008. hnRNP proteins and splicing control. Adv Exp Med Biol. 623: 107-122.
Matus JT, Aquea F, Arce-Johnson P. 2008. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes. BMC Plant Biol. 8: 83.
Mei W, Boatwright JL, Feng G, Schnable JC, Barbazuk WB. 2017. Evolutionarily conserved alternative splicing across monocots. Genetics xx: xx-xx.
Meng Y, Shao C, Ma X, Wang H. 2013. Introns targeted by plant microRNAs: a possible novel mechanism of gene regulation. Rice 6: 8.
Miao Y, Zentgraf U. 2007. The antagonist function of Arabidopsis WRKY53 and ESR/ESP in leaf senescence is modulated by the jasmonic and salicylic acid equilibrium. Plant Cell 19: 819-830.
Mittler R, Vanderauwera S, Gollery M, Van Breusegem F. 2004. Reactive oxygen gene network of plants. Trends Plant Sci. 9: 490-498.
Moreno JE, et al. 2013. Negative feedback control of jasmonate signaling by an alternative splice variant of JAZ10. Plant Physiol. 162: 1006-1017.
Mudgil Y, Singh BN, Upadhyaya KC, Sopory SK, Reddy MK. 2002. Cloning and characterization of a cell cycle-regulated gene encoding topoisomerase I from Nicotiana tabacum that is inducible by light, low temperature and abscisic acid. Mol Genet Genomics 267: 380-390.
Nakagami H, Pitzschke A, Hirt H. 2005. Emerging MAP kinase pathways in plant stress signalling. Trends Plant Sci. 10: 339-346.
Nilsen TW. 2003. The spliceosome: the most complex macromolecular machine in the cell? BioEssays 25: 1147-1149.
Nishii A, et al. 2000. Characterization of a novel gene encoding a putative single zinc-finger protein, ZIM, expressed during the reproductive phase in Arabidopsis thaliana. Biosci Biotechnol Biochem. 64: 1402-1409.
Oelgeschläger M, Kowenz-Leutz E, Schreek S, Leutz A, Lüscher B. 2001. Tumorigenic N-terminal deletions of c-Myb modulate DNA binding, transactivation, and cooperativity with C/EBP. Oncogene 20: 7420-7424.
Ogata K, et al. 1992. Solution structure of a DNA-binding unit of Myb: A helix-turn-helix-related motif with conserved tryptophans forming a hydrophobic core. P Natl Acad Sci USA. 89: 6428-6432.
Ogata K, et al. 1994. Solution structure of a specific DNA complex of the Myb DNA-binding domain with cooperative recognition helices. Cell 79: 639-648.
Olson A, et al. 2014. Expanding and vetting Sorghum bicolor gene annotations through transcriptome and methylome sequencing. Plant Genome. 7: 2.
Ording E, Kvavik W, Bostad A, Gabrielsen OS. 1994. Two functionally distinct half sites in the DNA-recognition sequence of the Myb oncoprotein. Eur J Biochem. 222: 113-120.
Palmer JD, Logsdon JM. 1991. The recent origin of introns. Curr Opin Genet Dev. 1: 470-477.
Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. 2008. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 40: 1413-1415.
Pařenicová L, et al. 2003. Molecular and phylogenetic analyses of the complete MADS-Box transcription factor family in Arabidopsis: new openings to the MADS world. Plant Cell 15: 1538-1551.
Patel AA, Steitz JA. 2003. Splicing double: insights from the second spliceosome. Nat Rev Mol Cell Biol. 4: 960-970.
Paterson AH, et al. 2009. The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551-556.
Pauwels L, et al. 2010. NINJA connects the co-repressor TOPLESS to jasmonate signalling. Nature 464: 788-791.
Pleiss JA, Whitworth G, Bergkessel M, Guthrie C. 2007. Rapid, transcript-specific changes in splicing in response to environmental stress. Mol Cell 27: 928-937.
Qi T, et al. 2011. The jasmonate-ZIM-domain proteins interact with the WD-repeat/bHLH/MYB complexes to regulate jasmonate-mediated anthocyanin accumulation and trichome initiation in Arabidopsis thaliana. Plant Cell 23: 1795-1814.
Qi T, Huang H, Song S, Xie D. 2015a. Regulation of jasmonate-mediated stamen development and seed production by a bHLH-MYB complex in Arabidopsis. Plant Cell 27: 1620-1633.
Qi T, et al. 2015b. Regulation of jasmonate-induced leaf senescence by antagonism between bHLH subgroup IIIe and IIId factors in Arabidopsis. Plant Cell 27: 1634-1649.
R Development Core Team. 2014. R: A language and environment for statistical computing. Vienna, Austria.
Raikhel N. 1992. Nuclear targeting in plants. Plant Physiol. 100: 1627-1632.
Reddy ASN. 2004. Plant serine/arginine-rich proteins and their role in pre-mRNA splicing. Trends Plant Sci. 9: 11.
Reddy ASN, Marquez Y, Kalyna M, Barta A. 2013. Complexity of the alternative splicing landscape in plants. Plant Cell 25: 3657-3683.
Rensing SA, et al. 2007. An ancient genome duplication contributed to the abundance of metabolic genes in the moss Physcomitrella patens. BMC Evol Biol. 7: 130.
Rogozin IB, Carmel L, Csuros M, Koonin EV. 2012. Origin and evolution of spliceosomal introns. Biol Direct 7: 11.
Rosinski JA, Atchley WR. 1998. Molecular evolution of the Myb family of transcription factors: evidence for polyphyletic origin. J Mol Evol. 46: 74-83.
Roy SW, Nosaka M, de Souza SJ, Gilbert W. 1999. Centripetal modules and ancient introns. Gene 238: 85-91.
Roy SW, Gilbert W. 2006. The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet. 7: 211-221.
Ruhfel BR, Gitzendanner MA, Soltis PS, Soltis DE, Burleigh JG. 2014. From algae to angiosperms – inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes. BMC Evol Biol. 14: 23.
Saldanha R, Mohr G, Belfort M, Lambowitz AM. 1993. Group I and group II introns. FASEB J. 7: 15-24.
Sammeth M, Foissac S, Guigó R. 2008. A general definition and nomenclature for alternative splicing events. PLoS Comput Biol. 4: e1000147.
Sandberg R, Neilson JR, Sarma A, Sharp PA, Burge CB. 2008. Proliferating cells express mRNAs with shortened 3’ untranslated regions and fewer microRNA target sites. Science 320: 1643-1647.
Schwartz SH, et al. 2008. Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes. Genome Res. 18: 88-103.
Severing EI, van Dijk ADJ, Stiekema WJ, van Ham RCHJ. 2009. Comparative analysis indicates that alternative splicing in plants has limited role in functional expansion of the proteome. BMC Genomics 10: 154.
Shaikhali J, et al. 2012. The CRYPTOCHROME1-dependent response to excess light is mediated through the transcriptional activators ZINC FINGER PROTEIN EXPRESSED IN INFLORESCENCE MERISTEM LIKE1 and ZML2 in Arabidopsis. Plant Cell 24: 3009-3025.
Shannon P, et al. 2003. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13: 2498-2504.
Shilov IV, et al. 2007. The paragon algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra. Mol Cell Proteomics 6: 1638-1655.
Shyu C, et al. 2012. JAZ8 lacks a canonical degron and has an EAR motif that mediates transcriptional repression of jasmonate responses in Arabidopsis. Plant Cell 24: 536-550.
Sikder HA, Devlin MK, Dunlap S, Ryu B, Alani RM. 2003. Id proteins in cell growth and tumorigenesis. Cancer Cell 3: 525-530.
Simpson CG, Brown JWS. 2008. U12-Dependent Intron Splicing in Plants. Nuclear pre-mRNA Processing in Plants 326: 61-82.
Song S, et al. 2011. The Jasmonate-ZIM domain proteins interact with the R2R3-MYB transcription factors MYB21 and MYB24 to affect jasmonate-regulated stamen development in Arabidopsis. Plant Cell 23: 1000-1013.
Song S, et al. 2013. The bHLH subgroup IIId factors negatively regulate jasmonate-mediated plant defense and development. PLoS Genetics 9: e1003653.
Staiger D, Brown JWS. 2013. Alternative splicing at the intersection of biological timing, development, and stress responses. Plant Cell 25: 3640-3656.
Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312-1313.
Stoltzfus A. 1999. On the possibility of constructive neutral evolution. J Mol Evol. 49: 169-181.
Stolzer M, Siewert K, Lai H, Xu M, Durand D. 2015. Event inference in multidomain families with phylogenetic reconciliation. BMC Bioinformatics 16: S8.
Szklarczyk D, et al. 2015. STRING V10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43: D447-D452.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 30: 2725-2729.
Tanaka H, et al. 2015. Identification and characterization of Arabidopsis AtNUDX9 as a GDP-D-mannose pyrophosphohydrolase: its involvement in root growth inhibition in response to ammonium. J Exp Bot. 66: 5797-5808.
Tarrío R, Ayala FJ, Rodríguez-Trelles F. 2008. Alternative splicing: a missing piece in the puzzle of intron gain. P Natl Acad Sci USA. 105: 7223-7228.
Teakle GR, Manfield IW, Graham JF, Gilmartin PM. 2002. Arabidopsis thaliana GATA factors: organisation, expression and DNA-binding characteristics. Plant Mol Biol. 50: 43-57.
Terol J, Domingo C, Talon M. 2006. The GH3 family in plants: genome wide analysis in rice and evolutionary history based on EST analysis. Gene 371: 279-290.
Thatcher LF, et al. 2016. Characterization of a JAZ7 activation-tagged Arabidopsis mutant with increased susceptibility to the fungal pathogen Fusarium oxysporum. J Exp Bot. 67: 2367-2386.
Thines B, et al. 2007. JAZ repressor proteins are targets of the SCFCOI1 complex during jasmonate signalling. Nature 448: 661-665.
Thireault C, et al. 2015. Repression of jasmonate signaling by a non-TIFY JAZ protein in Arabidopsis. Plant J. 82: 669-679.
Tian B, Manley JL. 2013. Alternative cleavage and polyadenylation: the long and short of it. Trends in Biochem Sci. 38: 6.
Trapnell C, et al. 2013. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnol. 31: 46-53.
Valenzuela P, Venegas A, Weinberg F, Bishop R, Rutter WJ. 1978. Structure of yeast phenylalanine-tRNA genes: an intervening DNA segment within the region coding for the tRNA. P Natl Acad Sci USA. 75: 190-194.
Valenzuela CE, et al. 2016. Salt stress response triggers activation of the jasmonate signaling pathway leading to inhibition of cell elongation in Arabidopsis primary root. J Exp Bot. 67: 4209-4220.
Vanholme B, Grunewald W, Bateman A, Kohchi T, Gheysen G. 2007. The tify family previously known as ZIM. Trends Plant Sci. 12: 239-244.
Vanneste K, Maere S, Van de Peer Y. 2014. Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Phil Trans R Soc B. 369: 20130353.
Vélez-Bermúdez IC, et al. 2015. A MYB/ZML complex regulates wound-induced lignin genes in maize. Plant Cell 27: 3245-3259.
Vogel C, Bashton M, Kerrison N, Chothia C, Teichmann SA. 2004. Structure, function and evolution of multidomain proteins. Curr Opin Struct Biol. 14: 208-216.
Voinnet O. 2009. Origin, biogenesis, and activity of plant microRNAs. Cell 136: 669-687.
Wahl MC, Will CL, Lührmann R. 2009. The spliceosome: design principles of a dynamic RNP machine. Cell 136: 701-718.
Wang H, et al. 1998. ICK1, a cyclin-dependent protein kinase inhibitor from Arabidopsis thaliana interacts with both Cdc2a and CycD3, and its expression is induced by abscisic acid. Plant J. 15: 501-510.
Wang BB, Brendel V. 2006. Genomewide comparative analysis of alternative splicing in plants. P Natl Acad Sci USA. 103: 7175-7180.
Wang L, Wang S, Li W. 2012. RSeQC: quality control of RNA-seq experiments. Bioinformatics 28: 2184-2185.
White DWR. 2006. PEAPOD regulates lamina size and curvature in Arabidopsis. P Natl Acad Sci USA. 103: 13238-13243.
Will CL, Lührmann R. 2005. Splicing of a rare class of introns by the U12-dependent spliceosome. Biol Chem. 386: 713-724.
Wu CT, Chiou CY, Chiu HC, Yang UC. 2013a. Fine-tuning of microRNA-mediated repression of mRNA by splicing-regulated and highly repressive microRNA recognition element. BMC Genomics 14: 438.
Wu G, Poethig RS. 2006. Temporal regulation of shoot development in Arabidopsis thaliana by miR156 and its target SPL3. Development 133: 3539-3547.
Wu TD, Nacu S. 2010. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26: 873-881.
Wu YC, Rasmussen MD, Bansal MS, Kellis M. 2013b. TreeFix: statistically informed gene tree error correction using species trees. Syst Biol. 62: 110-120.
Yamasaki H, Hayashi M, Fukazawa M, Kobayashi Y, Shikanai T. 2009. SQUAMOSA promoter binding protein-like7 is a central regulator for copper homeostasis in Arabidopsis. Plant Cell 21: 347-361.
Yan H, et al. 2014. Molecular reprogramming of Arabidopsis in response to perturbation of jasmonate signaling. J Proteome Res. 13: 5751-5766.
Yan Y, et al. 2007. A downstream mediator in the growth repression limb of the jasmonate pathway. Plant Cell 19: 2470-2483.
Yang SW, Jin E, Chung IK, Kim WT. 2002. Cell cycle-dependent regulation of telomerase activity by auxin, abscisic acid and protein phosphorylation in tobacco BY-2 suspension culture cells. Plant J. 29: 617-626.
Yang Z. 2007. PAML4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24: 1586-1591.
Yang X, Zhang H, Li L. 2012. Alternative mRNA processing increases the complexity of microRNA-based gene regulation in Arabidopsis. Plant J. 70: 421-431.
Yoshida T, Mogami J, Yamaguchi-Shinozaki K. 2014. ABA-dependent and ABA-independent signaling in response to osmotic stress in plants. Curr Opin Plant Biol. 21: 133-139.
Yu J, et al. 2016. JAZ7 negatively regulates dark-induced leaf senescence in Arabidopsis. J Exp Bot. 67: 751-762.
Zeng L, et al. 2014. Resolution of deep angiosperm phylogeny using conserved nuclear genes and estimates of early divergence times. Nature Commun. 5: 4956.
Zhai Q, et al. 2015. Transcriptional mechanism of jasmonate receptor COI1-mediated delay of flowering time in Arabidopsis. Plant Cell 27: 2814-2828.
Zhang F, et al. 2017a. Structural insights into alternative splicing-mediated desensitization of jasmonate signaling. P Natl Acad Sci USA. 114: 1720-1725.
Zhang R, et al. 2017b. A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicing. Nucleic Acids Res. 45: 5061-5073.
Zhang Y, et al. 2012. Genome-wide identification and analysis of the TIFY gene family in grape. PLoS ONE 7: e44465.
Zheng Y, et al. 2017. Jasmonate inhibits COP1 activity to suppress hypocotyl elongation and promote cotyledon opening in etiolated Arabidopsis seedlings. Plant J. 90: 1144-1155.
BIOGRAPHICAL SKETCH
Guanqiao Feng is from Xingtai, China, where she completed her primary,
middle school, and high school education. In 2007 she moved to Nanjing to start her
undergraduate studies. She received her bachelor’s degree in agriculture in 2011 and
master’s degree in biotechnology in 2013 from Nanjing Agricultural University, China.
During that time she developed a keen interest in bioinformatics. After graduating with
her M.Sc., she joined the Plant Molecular and Cellular Biology Graduate Program at the
University of Florida in 2013 to pursue her doctoral degree. In the first year, she
rotated in Dr. Burleigh’s lab, Dr. Soltis’s lab and Dr. Barbazuk’s lab. In Dr. Burleigh’s lab,
she worked on the evolution of the MYB gene family in the plant kingdom, which later developed
into the second chapter of her dissertation. While in Dr. Soltis’s lab, she worked with Dr.
Gitzendanner on the plant 1KP project data to estimate the plant phylogeny based on
plastid genomes. During a rotation in Dr. Barbazuk’s lab she was involved in a project
identifying and characterizing conserved alternative splicing in flowering plants. After the first year of
rotations she joined Dr. Barbazuk’s lab to study gene family evolution and alternative
splicing. In her Ph.D. research, she pursued three projects: 1) Evolution of the 3R-MYB
gene family in plants; 2) Origin and evolution of the plant-specific multi-domain TIFY
gene family; 3) Jasmonate-induced alternative splicing in Arabidopsis. Besides
these projects, she was involved in two others: 1) Conserved alternative
splicing across monocots; 2) Maize RBM48 in minor intron splicing and differentiation.
She received her Ph.D. from the University of Florida in the fall of 2017.