Short title: Maize RNA-Seq GCNs

Corresponding author details: Dr. Karen McGinnis

Article title: Construction and Optimization of Large Gene Co-expression Network in Maize Using RNA-Seq Data

Author names and affiliation:
Ji Huang, Stefania Vendramin, Karen M. McGinnis, Department of Biological Science, Florida State University, Tallahassee, FL 32306
Lizhen Shi, Department of Computer Science, Florida State University, Tallahassee, FL 32306

One sentence summary: Large-scale maize co-expression network from RNA-Seq data facilitates gene function and pathway analysis.

Footnotes:
List of author contributions: J.H. and K.M. designed the experiments. J.H. conducted experiments. J.H. and S.V. analyzed the data. J.H., K.M. and S.V. interpreted the data. L.S. and J.H. made the website. J.H., K.M. and S.V. wrote the article.
Funding information: National Science Foundation
Corresponding author email: mcginnis@bio.fsu.edu

Plant Physiology Preview. Published on August 2, 2017, as DOI:10.1104/pp.17.00825. Copyright 2017 by the American Society of Plant Biologists.

Abstract

With the emergence of massively parallel sequencing, genome-wide expression data production has reached an unprecedented level. This abundance of data has greatly facilitated maize research, but may not be amenable to traditional analysis techniques that were optimized for other data types. Using publicly available data, a Gene Co-expression Network (GCN) can be constructed and used for gene function prediction, candidate gene selection and improving understanding of regulatory pathways. Several GCN studies have been done in maize, mostly using microarray datasets. To build an optimal GCN from plant materials RNA-Seq data, parameters for expression data normalization and network inference were evaluated. A comprehensive evaluation of these two parameters and ranked aggregation strategy on network performance using libraries from 1266 maize samples was conducted. Three normalization methods (VST, CPM, RPKM) and ten inference methods, including six correlation and four mutual information (MI) methods, were tested. The three normalization methods had very similar performance. For network inference, correlation methods performed better than MI methods at some genes. Increasing sample size also had a positive effect on GCN. Aggregating single networks together resulted in improved performance compared to single networks.

Introduction



Zea mays (maize) is the most widely produced crop in the United States, and US agriculture accounted for 36% of world maize production in 2015 (USDA, 2016). Maize has also been at the center of genetics research for over 100 years, including McClintock's pioneering work with transposable elements (TEs) (reviewed by McClintock, 1983; Fedoroff, 2012). Due to recent technological advances in nucleic acid sequencing and the availability of the maize genome sequence (Schnable et al., 2009), maize genomics research has been greatly expedited.

RNA-Sequencing (RNA-Seq) has become the favored technique for detecting genome-wide expression patterns. RNA-Seq has some advantages over microarray analysis of gene expression, including single base pair resolution, detection of novel transcripts, and the ability to analyze transcript abundance without existing genome information (reviewed by Wang et al., 2009; Han et al., 2015; Conesa et al., 2016). RNA-Seq data provides information about single nucleotide polymorphisms (SNPs), which facilitates Genome-wide Association Studies (GWAS) (Fu et al., 2013; Li et al., 2013a; Lonsdale et al., 2013; Fadista et al., 2014). Because of its widespread adaptability, over five thousand Illumina platform maize RNA-Seq libraries (Fig. 1A) are available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database (Leinonen et al., 2010), adding to the body of data that can be used to study the maize genome.

The maize genome is large and heterogeneous, and the genome annotation is still far from complete (Mark Cigan et al., 2005; Ficklin and Feltus, 2011). Although recent work has made substantial progress toward describing genome-wide expression patterns in many genotypes, environmental conditions, and tissues, relatively little is known about the function and regulation of most maize genes. Because genes with related biological functions or regulatory mechanisms often have similar expression patterns (Aoki et al., 2007), one way to enhance understanding of gene function is by construction of a Gene Co-expression Network (GCN) (D'haeseleer et al., 2000; Aoki et al., 2007; Usadel et al., 2009; Li et al., 2015c; Serin et al., 2016). GCNs are constructed using data mining tools and algorithms that describe the relatedness between the expression patterns of multiple genes in a pairwise fashion.

The use of GCNs pre-dates the availability of RNA-Seq expression data (Ficklin and Feltus, 2011; Sato et al., 2011; De Bodt et al., 2012), meaning that these approaches were initiated and optimized predominantly with microarray datasets. Maize RNA-Seq samples are already five times more abundant than microarray samples (Fig. 1) and increasing in number, meaning that an RNA-Seq oriented maize GCN protocol would be valuable to the scientific community. Although the initial inputs and results from microarray and RNA-Seq are similar, there are many differences between the data types and analytical approaches. It is therefore anticipated that some adjustments to GCN parameters may improve the efficacy of GCN analysis of RNA-Seq data. GCN construction is typically a multistep process, starting with normalization of input datasets and followed by network inference, network evaluation, and interpretation (Supplemental Fig. 1).

Both RNA-Seq and microarrays are affected by systematic variations (Park et al., 2003; Oshlack and Wakefield, 2009; Zheng et al., 2011; Li et al., 2014b). Therefore, genome-wide expression results generated by either technique need to be normalized prior to analysis (Dillies et al., 2013a; Li et al., 2015b). Variance stabilizing transformation (VST), Counts Per Million (CPM), and Reads Per Kilobase Per Million (RPKM) are three popular normalization methods for RNA-Seq experiments (Mortazavi et al., 2008; Anders and Huber, 2010; Rau et al., 2013).

Some work has been done to evaluate the efficacy of different normalization methods for expression analysis. Giorgi et al. (2013) showed that VST normalization of RNA-Seq data resulted in a GCN with similar characteristics to a microarray-supported network in terms of coefficient and node degree distribution. Normalization with CPM, using the Trimmed Mean of M-values (TMM) to adjust the composition bias between RNA-Seq datasets by calculating normalization factors (Robinson et al., 2010), increased the robustness of analysis among diverse library sizes and compositions (Dillies et al., 2013a). These studies suggest that optimizing normalization methods might improve GCN performance.

There are several methods for gene network inference, including correlation, mutual information (MI), Bayesian network, and probabilistic graphical models. Typically, correlation and MI methods are used for constructing large-scale GCNs with more than ten thousand genes (Krouk et al., 2013). Correlation methods include Pearson Correlation Coefficient (PCC), Spearman's correlation coefficient (SCC), Kendall rank correlation coefficient (KCC), Gini correlation coefficient (GCC), and Biweight midcorrelation (BIC) (Langfelder and Horvath, 2008; Kumari et al., 2012; Ma and Wang, 2012; Ballouz et al., 2015). Cosine similarity coefficient (CSC) has also been used for computing similarities in sparse datasets such as text (Dhillon and Modha, 2001) and protein-protein interaction data (Luo et al., 2015). MI methods include the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR) (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007). The network inference method might also influence GCN performance.
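
As an illustration of how the correlation-type measures differ, the following minimal Python sketch computes PCC, SCC, and CSC for a pair of toy expression profiles. The function names and example values are assumptions made for illustration and are not taken from the analysis pipeline. The toy pair is monotonic but non-linear with one extreme value, which shows why a rank-based measure can be less sensitive to outliers than PCC.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's correlation (SCC): PCC computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def cosine(x, y):
    """Cosine similarity coefficient (CSC): no mean-centering is applied."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical expression profiles of two genes across five samples.
gene_a = [1, 2, 3, 4, 5]
gene_b = [1, 4, 9, 16, 100]  # monotonic in gene_a, with one extreme value
print(pearson(gene_a, gene_b), spearman(gene_a, gene_b), cosine(gene_a, gene_b))
```

Because the ranks of the two toy profiles agree perfectly, SCC is at its maximum while PCC is pulled down by the extreme value.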

Several resources are already available for GCN analysis in maize, including COB (Schaefer et al., 2014), CORNET (De Bodt et al., 2012), CoP (Ogata et al., 2010), PLANEX (Yim et al., 2013), and ATTED-II (Obayashi et al., 2009). All of these databases except ATTED-II used PCC to build GCNs from 128 to 379 microarray datasets. ATTED-II recently updated their database to provide GCNs from both microarray and RNA-Seq data using PCC-based mutual rank (Aoki et al., 2015). Although PCC is widely used, there is very limited evidence that it is the optimal approach for GCN analyses.

GCNs could also be improved by meta-analysis using ranked aggregation from individual networks (Zhong et al., 2014; Ballouz et al., 2015; Wang et al., 2015a). By aggregating individual experiments, only interactions consistent among networks are preserved, which helps reduce noise and highlights conserved interactions. Furthermore, the ranked aggregation method provides a way to efficiently increase the size of the aggregated network with newly available datasets, and recalculation with all datasets is not required when a new one is added. This provides an efficient way to process and incorporate emerging information.
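
One common form of ranked aggregation can be sketched as follows: each network is rank-standardized, the standardized weights are summed edge by edge, and the sum is rank-standardized again. Whether this matches the exact procedure used here is an assumption; the function names, the edge keys, and the toy weights are hypothetical, and all networks are assumed to share the same set of edges.

```python
def rank_standardize(network):
    """Replace each edge weight by its rank divided by the number of edges.

    `network` maps an edge (gene pair) to a weight. The strongest edge gets
    1.0, and tied weights share the average of the ranks they span.
    """
    edges = sorted(network, key=network.get)
    n = len(edges)
    ranked = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and network[edges[j + 1]] == network[edges[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranked[edges[k]] = avg_rank / n
        i = j + 1
    return ranked

def aggregate(networks):
    """Sum rank-standardized networks edge by edge, then re-standardize."""
    standardized = [rank_standardize(net) for net in networks]
    total = {edge: sum(net[edge] for net in standardized)
             for edge in standardized[0]}
    return rank_standardize(total)

# Two toy networks over the same four gene pairs (weights are made up).
net1 = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "C"): 0.5, ("A", "D"): 0.3}
net2 = {("A", "B"): 0.8, ("A", "C"): 0.7, ("B", "C"): 0.2, ("A", "D"): 0.1}
print(aggregate([net1, net2]))
```

Under this scheme, only the running sum of standardized weights has to be stored, so a newly available network can be standardized on its own and added to that sum without recomputing the earlier networks.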

Herein, an extensive evaluation in constructing maize GCNs is reported. Three parameters were tested: normalization method, network inference algorithm, and ranked aggregation method. To our knowledge, this is the first comprehensive attempt to optimize GCN construction using plant RNA-Seq datasets. The network is publicly accessible at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php. A tutorial is also provided as supplemental material.

Results

Manually Curated Maize mRNA Expression Profiling from Publicly Available Datasets

Recently, the usage of RNA-Seq in maize has increased dramatically, from zero data entries in NCBI-SRA in 2008 to over nine hundred in 2016 (Fig. 1B). In contrast, the most widely used Affymetrix expression array for maize had 177 samples in 2008 but only 46 in 2016 (Fig. 1B). GCN construction approaches have not been optimized for RNA-Seq datasets in plants, and doing so could improve the quality and robustness of GCNs. To support a comprehensive evaluation of the effect of RNA-Seq normalization methods and network inference methods on the performance of GCNs, maize RNA-Seq datasets were compiled and processed with a computational pipeline (Supplemental Fig. 1). A total of 1266 high quality RNA-Seq maize libraries from 17 different experiments were selected as input to an expression matrix. The corresponding experimental descriptions and publications, where available, of each library were manually checked for sample information (Supplemental Table S1). Also, filters for read depth and alignment rate were used to remove unqualified libraries (see Methods for detail). Tissue type and haplotype from those libraries were manually curated and found to include a range of sample types (Supplemental Table S1). Shoot apical meristem (SAM), leaf, and root were the top three most abundant tissue types, but a wide range of tissues were represented by multiple libraries in the dataset (Supplemental Fig. 1). The dataset also included multiple haplotypes, although B73 represented approximately 40% of the included libraries. To reduce noise, lowly expressed genes were removed from analysis, leaving 15116 nonredundant genes across the 1266 libraries. For comparative purposes, the Affymetrix GeneChip maize array includes 13339 genes before filtering (GeneChip Maize Genome Array, http://www.affymetrix.com/catalog/131468/AFFY/Maize+Genome+Array#1_1).

Three RNA-Seq Normalization Methods Show Comparable Distribution of Expression

Expression data from distinct sources and experiments can be highly variable because of hybridization artifacts in microarray or variable sequencing depth in RNA-Seq. Many methods have been successfully used for normalizing both microarray and RNA-Seq data to correct for potential biases (Lim et al., 2007; Dillies et al., 2013b; Li et al., 2015b). To find an optimal normalization method for building a maize GCN from RNA-Seq data, three widely used normalization methods were compared: Variance Stabilizing Transformation (VST), Counts Per Million (CPM), and Reads Per Kilobase Per Million (RPKM) (Mortazavi et al., 2008; Anders and Huber, 2010; Rau et al., 2013). For all normalization methods, log2 transformation of the normalized expression values reduced the skew of the data distribution (Supplemental Fig. 2). Several network studies from plant RNA-Seq data used log2 transformation (Davidson et al., 2011; Ma and Wang, 2012; Giorgi et al., 2013; Stelpflug et al., 2015; Walley et al., 2016). In our analysis, genes with CPM > 2 in more than 1000 samples were included. This filter dramatically reduces zero count values in raw data from 30.949% to 0.367%. Moreover, a prior count of one was added at log2 normalization (expression = log2(CPM/RPKM + 1)) to avoid problems with remaining zero values. The log2 transformation reduced skewed distributions and extreme values represented by outliers (Supplemental Fig. 2). Thus, we think it is important to apply log2 transformation to our data.
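
The CPM and RPKM calculations, the expression filter, and the log2 transformation with a prior count of one can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the toy counts and gene lengths are hypothetical, library size is taken as the sum of the counts supplied, and VST is omitted because it depends on a fitted dispersion model rather than a closed-form scaling.

```python
import math

def cpm(counts):
    """Counts Per Million for one library: scale raw counts by library size."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def rpkm(counts, lengths_bp):
    """Reads Per Kilobase Per Million: CPM divided by gene length in kb."""
    return [v / (length / 1000.0) for v, length in zip(cpm(counts), lengths_bp)]

def log2_plus_one(values):
    """log2(x + 1); the prior count of one keeps zero values finite."""
    return [math.log2(v + 1.0) for v in values]

def keep_gene(cpm_across_samples, min_cpm=2.0, min_samples=1000):
    """True if the gene exceeds min_cpm in more than min_samples libraries."""
    return sum(v > min_cpm for v in cpm_across_samples) > min_samples

# Toy library with three genes (raw counts and lengths are made up).
counts = [10, 0, 90]
lengths = [1000, 2000, 500]
print(cpm(counts))
print(rpkm(counts, lengths))
print(log2_plus_one(cpm(counts)))
```

The defaults of `keep_gene` mirror the thresholds stated in the text (CPM > 2 in more than 1000 samples); for a smaller collection of libraries they would need to be adjusted.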

The distribution of gene expression across the 1266 libraries formed a bell-shaped curve with a small additional peak of low expression for all three methods (Supplemental Fig. 2). To determine if these low expression values came from a few or multiple libraries, elements within the range of expression that corresponded to the observed peak (< -3.7 CPM; Supplemental Fig. 2B) were extracted from the CPM-normalized expression matrix and matched to the originating libraries. This demonstrated that the low expression elements were not limited exclusively to specific libraries, but eight libraries contributed over 25% of low elements. A gene ontology enrichment analysis failed to identify significant gene ontology descriptors within the subset of 43 genes that were defined as lowly expressed (data not shown). All eight of these libraries were from pollen tissue, where the average gene expression at 147 Counts Per Million (CPM) is lower than the average gene expression of the other 79 tissues combined, at an average of 183 CPM. Hierarchical clustering and a correlation heatmap with the same data (Stelpflug et al., 2015) show the uniqueness of the pollen tissue expression pattern (Langfelder and Horvath, 2008) (Supplemental Fig. 3). When the lowly expressed elements from RPKM- and VST-normalized data were analyzed to determine library origin and GO enrichment (data not shown), we found a similarly high level of pollen-specific libraries without significant GO categories. In pollen, some highly expressed genes are considered orphan genes (Wu et al., 2014) because they lack detectable homologs in other species. To investigate whether these lowly expressed genes were orphan genes, their gene sequences were blasted against the Setaria italica genome (JGIv2) (BLASTX, e-value < 1E-03). Setaria italica (foxtail millet) is a close relative of maize, from which it diverged 23.4 million years ago (MYA) as estimated by TimeTree (Kumar et al., 2017). Only 1 out of 43 genes lacked detectable homologs in Setaria italica (data not shown), indicating that the majority of these genes are not likely to be orphan genes.

Because RPKM normalization accounts for gene length, the distribution of gene length versus expression for the RPKM method was compared to data normalized by VST and CPM methods. VST- and CPM-normalized data showed very similar overall patterns, with no clear linear relationship between gene length and average expression (Supplemental Fig. 2C). RPKM-normalized data displayed an apparent bias toward elevated expression of a small number of genes less than 5000 bp in length and lower expression of long genes, suggesting that this normalization method might skew the distribution of expression at some genes. Overall, in spite of these differences, the three normalization methods resulted in a similar distribution of expression patterns for most of the genes included in the analysis. Additional analysis was completed to determine if the three normalization methods influence network performance.

Network Performance Does Not Differ Based Upon Normalization Method

To compare the efficacy of three normalization and ten inference methods, a GCN was generated for each combination of normalization and inference methods. Furthermore, all networks were rank-standardized so that edge weights range from 0 to 1 (see Methods). All network evaluations used the whole adjacency matrix (15116 × 15116 in RNA-Seq networks; 11429 × 11429 or 17862 × 17862 in protein networks) without a cut-off. The performance of the different networks was measured by comparing the area under the receiver operator characteristic curves (AUROC). AUROC is a measurement used to evaluate the accuracy of classification models, making it suitable for evaluating GCNs (Gillis and Pavlidis, 2011; Ma and Wang, 2012; Liu et al., 2017). AUROC values range from 0 to 1: a value closer to 1 indicates that the network is discriminating nonrandom patterns (with 1 being perfect classification), random networks return values close to 0.5, and values closer to 0 indicate a high degree of incorrect classification. While an AUROC value close to 1 is optimal, values over 0.7 suggest good performance when analyzing large diverse networks (Gillis and Pavlidis, 2011). To set up the AUROC baseline for the random networks, maize gene IDs were shuffled 10 (for MRNET and CLR) or 1000 times (for PCC) from the normalized expression matrix. The randomized expression matrices were then used for network inference with the designated algorithms and further evaluated. The resulting AUROC values from randomized networks were very close to 0.5 (Supplemental Table S2).
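
To make the AUROC scale concrete, the sketch below computes AUROC as the probability that a randomly chosen known partner receives a higher edge weight than a randomly chosen non-partner, and estimates a random baseline by permuting the labels. The function names and toy values are hypothetical. Permuting labels is a simpler stand-in for the gene ID shuffling described above, which re-infers the network from a randomized expression matrix.

```python
import random

def auroc(scores, labels):
    """Probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def shuffled_baseline(scores, labels, n_shuffles, seed=0):
    """Mean AUROC after randomly permuting the labels n_shuffles times."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    values = []
    for _ in range(n_shuffles):
        permuted = labels[:]
        rng.shuffle(permuted)
        values.append(auroc(scores, permuted))
    return sum(values) / n_shuffles

# Hypothetical edge weights from one gene to 20 others; 5 are known partners.
weights = [i / 20 for i in range(20)]
partners = [0] * 15 + [1] * 5  # the five strongest edges are true partners
print(auroc(weights, partners))
print(round(shuffled_baseline(weights, partners, 1000), 2))
```

In this toy case the known partners hold the five largest weights, so the observed AUROC is at its maximum, while the permuted labels give a mean near the random expectation.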

AUROC values were calculated and compared for three different network characteristics. The first characteristic was designed to test if the network identified genes with known or predicted co-expression patterns, based upon prior results and inclusion in two existing datasets that could serve as a positive control for co-expression. The maize metabolic pathway database (MaizeCyc) contains 413 pathways with more than two genes and was built based upon collection of evidence from genome annotation, phylogenetic distance, and known genes in maize, rice, and Arabidopsis (Monaco et al., 2013). The maize protein-protein interaction database (PPIM) is based upon both predicted and experimentally detected protein interactions (Zhu et al., 2016) and was the second dataset used in this analysis. Only high-confidence interactions from PPIM were used, as defined by ranking in the top 5% in their model (Zhu et al., 2016). For comparison with the GCN, genes within the same MaizeCyc or PPIM pathways were considered co-expressed. The MaizeCyc and PPIM datasets were combined, and genes with fewer than 5 interactions were excluded from evaluation, creating a compiled dataset referred to herein as the Protein-Protein and Pathway dataset (PPPTY). PPPTY had 1720 genes and 104856 interactions that were used in this evaluation. The AUROC value was calculated for each of the 1720 genes.
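
The way a combined reference set of this kind can be assembled is sketched below: pathway membership is expanded into gene pairs, the pairs are merged with a set of protein-protein interaction pairs, and genes with too few partners are excluded from evaluation. The function names, gene and pathway identifiers, and the lowered threshold in the example are hypothetical, and counting interactions as distinct partners after merging is an assumption.

```python
from itertools import combinations

def pathway_pairs(pathways):
    """All unordered gene pairs that share at least one pathway."""
    pairs = set()
    for genes in pathways.values():
        pairs.update(frozenset(p) for p in combinations(sorted(set(genes)), 2))
    return pairs

def evaluated_genes(pair_sets, min_interactions=5):
    """Merge pair sets and return (merged pairs, genes with enough partners)."""
    merged = set().union(*pair_sets)
    degree = {}
    for pair in merged:
        for gene in pair:
            degree[gene] = degree.get(gene, 0) + 1
    kept = {gene for gene, d in degree.items() if d >= min_interactions}
    return merged, kept

# Toy inputs: one pathway of three genes and one protein-protein interaction.
pathways = {"pwy1": ["g1", "g2", "g3"]}
ppi = {frozenset(("g1", "g4"))}
merged, kept = evaluated_genes([pathway_pairs(pathways), ppi], min_interactions=3)
print(len(merged), sorted(kept))
```

The threshold is lowered to 3 in the toy call only so that one gene passes the filter; the default follows the cut-off of 5 given in the text.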

To assess the effect of normalization method on GCNs, AUROC values for all ten inference methods were averaged for each of the three normalization methods. All three normalization methods scored similarly in comparison with the PPPTY dataset (Fig. 2B), with a mean AUROC value around 0.575 for each, suggesting that the predicted networks were more selective than a random network.

The second characteristic was the presence of similar gene ontology (GO) information for maize genes within a detected co-expression set, based upon "guilt by association", which assumes specific subgroups of co-expressed genes have some shared functions (Wolfe et al., 2005). GO annotations were downloaded from AgriGO (Du et al., 2010), which uses signature integration by InterPro to map gene IDs to GO terms rather than co-expression data. InterPro provided over 108 million stable GO terms to the functional protein information database UniProtKB at release 2016_01 (Sangrador-Vegas et al., 2016). Thus, the GO annotations provide a reliable evaluation resource independent of co-expression data. To assess this characteristic, gene ontology information was used in a neighbor voting algorithm (Gillis and Pavlidis, 2011) for sets of co-expression matrices and compared. Co-expression matrices were assessed by 3-fold cross-validation, which involved masking GO terms from some genes to test whether the masked GO terms could be predicted based upon gene expression patterns. A total of 277 GO terms were included in this analysis.
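
The idea behind neighbor voting with cross-validation can be sketched as follows: for one GO term, the annotation is hidden from a fold of annotated genes, every candidate gene is scored by the share of its total edge weight that goes to genes still carrying the annotation, and AUROC measures how well the hidden genes are recovered. This is a simplified illustration rather than the published implementation; the function names, the toy network, and the fold assignment by slicing are assumptions, and each fold is assumed to contain at least one hidden gene.

```python
def auroc(scores, labels):
    """Probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def neighbor_voting_auroc(adj, annotated, n_folds=3):
    """Mean cross-validated AUROC for one GO term.

    adj: dict of dicts with symmetric edge weights and no self-edges.
    annotated: genes known to carry the GO term.
    """
    genes = sorted(adj)
    known = sorted(annotated)
    results = []
    for fold in range(n_folds):
        hidden = set(known[fold::n_folds])   # annotations masked in this fold
        visible = set(known) - hidden        # annotations still available
        candidates = [g for g in genes if g not in visible]
        scores = []
        for g in candidates:
            total = sum(adj[g].values())
            votes = sum(w for h, w in adj[g].items() if h in visible)
            scores.append(votes / total if total else 0.0)
        labels = [g in hidden for g in candidates]
        results.append(auroc(scores, labels))
    return sum(results) / n_folds

# Toy network: genes a, b, c form a tight module; d, e, f are weakly linked.
module = {"a", "b", "c"}
names = ["a", "b", "c", "d", "e", "f"]
adj = {g: {h: (0.9 if g in module and h in module else 0.1)
           for h in names if h != g} for g in names}
print(neighbor_voting_auroc(adj, ["a", "b", "c"]))
```

In the toy network the annotated genes are strongly connected to one another, so each hidden gene collects a larger share of votes than any unannotated gene.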

When GO characteristics were used to assess the networks, all three normalization methods performed similarly, but the AUROC values were higher, at around 0.689 for each, than those observed for comparisons with PPPTY (Fig. 2A). Because GO addresses gene functions and PPPTY emphasizes protein-protein interactions, this suggests that GCNs are better at predicting functional interactions than physical interactions. The p-values from one-way ANOVA testing the effect of normalization method on the PPPTY and GO datasets were 0.9535 and 0.4714, respectively, confirming that the normalization method did not create a significant difference in the AUROC scores associated with the GCNs for the characteristics that were tested.

Finally, proteins that regulate gene expression or modify chromatin structure might interact with the DNA of a subset of co-expressed genes. The interactions between such a protein and regulated DNA could be detected by chromatin immunoprecipitation of associated DNA followed by DNA sequencing (ChIP-Seq). In maize, there are five ChIP-Seq datasets available (Bolduc et al., 2012; Morohashi et al., 2012; Li et al., 2015a; Pautler et al., 2015; Yang et al., 2016), some of which involve lowly expressed or tissue-specific genes. For example, Opaque2 is specifically expressed in endosperm (Li et al., 2015a), Knotted1 is expressed in SAM and floral tissues (Bolduc et al., 2012), and Pericarp Color1 has low expression except in inflorescence and seed (Morohashi et al., 2012). Histone Deacetylase 101 (HDA101) ChIP-Seq data provided the largest dataset for comparison, with 26 confirmed binding targets that are relatively highly expressed in most maize tissues (Yang et al., 2016). Histone deacetylation often correlates with decreased gene expression (Verdin and Ott, 2014). High confidence HDA101 targets were defined as those discovered by ChIP-Seq that also showed increased gene expression in the hda101 mutant. Networks associated with the 26 high confidence HDA101 targets were compared by calculating AUROC. Based upon this analysis, the AUROC values were very similar among networks normalized by VST, CPM, and RPKM (Fig. 2C), which is consistent with the GO and PPPTY evaluations.

Correlation Methods Perform Better than Mutual Information at Some Genes

After normalization of the expression matrices, they can be processed by different methods for GCN inference. To optimize this step, the AUROC values of six correlation (PCC, SCC, KCC, GCC, BIC, CSC) and four mutual information (MI) methods (AA, MA, MRNET, CLR) were compared for the expression matrices that were generated from each of three normalization methods (VST, CPM, RPKM) and then averaged. In general, correlation methods are more computationally efficient, while MI methods are able to reveal non-linear relationships (Li et al., 2015c). PCC is widely used but may be influenced by outliers (Mukaka, 2012). SCC, KCC, and BIC are less sensitive to outliers because SCC and KCC only consider the rank information and BIC calculates based on dataset median instead of mean (Serin et al., 2016). Recently, GCC has been shown to be a better correlation method for gene expression analysis because of its capacity to detect non-linear relationships and insensitivity to outliers (Ma and Wang, 2012). CSC is widely used for text mining and analyzing sparse data with many zeros (Dhillon and Modha, 2001). ARACNE, MRNET, and CLR showed extended gene-dependent relationships under variable biological settings (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007; Li et al., 2013b). To estimate the effectiveness of the inference methods, the same testing parameters with AUROC calculations were performed as described for the testing of normalization methods.
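
The contrast between correlation and MI for non-linear relationships can be illustrated with a small Python sketch. It uses a simple equal-width histogram estimator of mutual information, which is an assumption made for illustration; ARACNE, MRNET, and CLR rely on their own estimators and additional pruning or scoring steps that are not reproduced here. The function names, bin count, and toy profiles are hypothetical.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def discretize(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=4):
    """Histogram estimate of mutual information in bits."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

# Toy profiles with a symmetric, non-linear dependence: y = x squared.
x = list(range(-10, 11))
y = [v * v for v in x]
print(pearson(x, y))
print(mutual_information(x, y))
```

For this symmetric quadratic relationship the linear correlation is essentially zero even though one profile fully determines the other, while the MI estimate is clearly positive.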

Assessed by GO datasets, the 277 AUROC values were averaged to create one average value for each of the 10 inference methods, ranging from 0.620 to 0.724 (Fig. 2D). The average AUROC across all normalization methods for the six correlation methods was 0.718, while the average AUROC for all four MI methods was 0.646. The majority of the 277 GO terms had similar AUROC values in the different correlation method-generated GCNs, and these patterns are different from those observed in the MI-generated GCNs (Fig. 3A). The similarity among different methods was also detectable by pairwise comparison and comparing Pearson correlations between the different methods (Supplemental Fig. 4A).

To evaluate network inference methods with the PPPTY dataset, the AUROC values for 1720 genes were averaged for each combination of normalization and inference methods (Fig. 2E). This evaluation also showed that the networks constructed using correlation methods resulted in higher AUROC values than MI methods, although the CSC method resulted in lower AUROC values than other correlation methods. As demonstrated for the GO evaluation, results from correlation methods were more similar to each other than to the MI methods (Supplemental Fig. 4B). Interestingly, heatmap results indicated that a subset of genes consistently had higher AUROC values when CSC, MRNET/CLR, or AA/MA were used (Fig. 3B), although this includes a small enough number of genes that the average AUROC value over the whole gene set was relatively low for those methods. The gene sets with the highest AUROC values in PCC, CSC, or MRNET were extracted. Characteristics of each gene set were compared in average expression (CPM) and average number of low expressed elements (CPM < 0). The CSC gene set had the smallest number of low expression elements and had higher average expression than both the 1720 gene set and the PCC gene set (Supplemental Fig. 5). This may indicate that the CSC method is better at determining co-expression for highly expressed genes.

The AUROC values from 26 targets of the HDA101 ChIP-Seq dataset reveal that the CSC GCN had the highest AUROC value and that the use of MRNET/CLR GCNs resulted in slightly higher scores than correlation methods (Fig. 2F). This could be explained by the small number of targets creating skewed results, but may also indicate that CSC/MI methods are more suitable for specific types of genes or interactions between genes (Tzfadia et al., 2016). HDA101 is a highly expressed gene in all samples, with an average expression value of 864 CPM and a minimum expression of 289 CPM, so it is possible that HDA101 is more suitable for the CSC method. CPM and RPKM normalization methods had higher AUROC values than VST (Fig. 2C). Using two models of ARACNE (additive, AA, and multiplicative, MA), the co-expression matrices contain less than 0.5% non-zero values for all comparisons, and so these techniques were not included in any additional analyses.

In conclusion, our results indicated that the widely used correlation methods resulted in a more predictive maize GCN from a single expression matrix, but co-expression with some individual genes may be better detected using MI methods. Normalization method did not have a substantial influence on GCN performance, so only CPM normalization was used in conjunction with PCC, SCC, MRNET and CLR inference for subsequent optimization of other parameters.

Increasing Sample Size Had a Positive Effect on GCN

GCN analysis can be accomplished with a variable number of samples and datasets, but sample size can influence the quality of the resulting GCN (Wei et al., 2004; Ballouz et al., 2015). Separate analyses were conducted with different numbers of samples and experiments to empirically determine the effect of sample number on GCN effectiveness. The data in our analysis consisted of 17 experiments, each including between 12 and 404 libraries. For this analysis, CPM normalization followed by each of four inference methods (PCC, SCC, MRNET and CLR) was applied to the 17 experiments, and the 68 resulting networks were evaluated by both GO and PPPTY.

From the GO and PPPTY evaluations, all algorithms exhibited a positive linear relationship between natural-log-transformed sample size and average AUROC values (Fig. 4). The linear relationships were stronger for the PCC and SCC methods, which had higher r-squared values, indicating that correlation methods benefit more from increasing sample size. Thus, for building correlation-based GCNs, as many samples as possible should be included. We also found that, as seen for the total GCN analysis, PCC and SCC had higher average AUROC values than the MRNET and CLR methods in the PPPTY and GO analyses for most of the individual networks (Fig. 5).

Ranked Aggregation of Networks Improved Performance of GCNs

Ranked aggregation for meta-analysis can also change the outcomes of GCN analysis by buffering the effect of sample heterogeneity (Zhong et al., 2014; Wang et al., 2015a; Asnicar et al., 2016). Aggregated rank-standardized correlation/MI matrices were calculated from separate experiments to determine whether this approach enhanced GCN performance. Aggregating individual networks together for meta-analysis can help to highlight true co-expression interactions and reduce noise (Zhong et al., 2014; Wang et al., 2015a; Wang et al., 2015b). This analysis was conducted with the 17 differently sized experiments using the PCC, SCC, MRNET and CLR methods for GCN inference as before, resulting in 68 single GCNs. The 17 experiments were aggregated for PCC, SCC, MRNET and CLR individually and evaluated with the GO and PPPTY datasets.

Of the four aggregated networks that were evaluated, the two built with correlation methods (PCC and SCC) had higher AUROC values than the single network from 1266 samples (Fig. 6 and Supplemental Fig. 6). However, this aggregation strategy did not result in significantly higher AUROC scores for the MRNET and CLR networks compared with single networks built from 1266 samples (two-tailed Wilcoxon rank test for GO evaluation, p-values 0.494 and 0.796). It has been reported that MI estimation accuracy is dependent on sample size (Gao et al., 2015); therefore, individual MI networks built with a small number of libraries may not demonstrate improved accuracy from aggregation. In conclusion, the PCC/SCC-built GCNs performed best using a ranked aggregation strategy, and use of this strategy in combination with the other optimized parameters creates a robust GCN.
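
The aggregation step can be illustrated with a short Python sketch. This is not the code used in the study, which was run in R, and the text does not spell out the exact aggregation rule. The sketch assumes that each per-experiment matrix is rank-standardized, that the standardized matrices are averaged element-wise, and that the average is rank-standardized again. The function names rank_standardize and aggregate_networks, and the averaging of tied ranks, are likewise assumptions.

```python
from bisect import bisect_left, bisect_right


def rank_standardize(matrix):
    """Rank entries (smallest = 1, ties averaged), divide by the entry count,
    and set the diagonal to one."""
    flat = sorted(v for row in matrix for v in row)
    total = len(flat)

    def scaled_rank(value):
        lo, hi = bisect_left(flat, value), bisect_right(flat, value)
        return (lo + 1 + hi) / 2 / total  # mean of the 1-based ranks lo+1..hi

    out = [[scaled_rank(v) for v in row] for row in matrix]
    for d in range(min(len(out), len(out[0]))):
        out[d][d] = 1.0
    return out


def aggregate_networks(networks):
    """Average rank-standardized networks element-wise, then re-standardize.
    Averaging is an assumed aggregation rule, not one stated in the text."""
    standardized = [rank_standardize(net) for net in networks]
    n_rows, n_cols = len(networks[0]), len(networks[0][0])
    mean = [[sum(net[r][c] for net in standardized) / len(standardized)
             for c in range(n_cols)] for r in range(n_rows)]
    return rank_standardize(mean)


# Toy example: two experiments over the same three genes (made-up weights).
exp1 = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
exp2 = [[1.0, 0.7, 0.3], [0.7, 1.0, 0.1], [0.3, 0.1, 1.0]]
aggregated = aggregate_networks([exp1, exp2])
```

In this toy case the gene pair that ranks highly in both experiments keeps the highest aggregated weight, whereas pairs that rank highly in only one experiment are pulled toward each other.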

The Performance of Protein Networks Did Not Exceed Aggregation Networks

In many cases, mRNA levels in a cell are of interest because the mRNA level is thought to be related to the level and function of a protein of interest. However, many researchers have found inconsistencies between mRNA and protein levels (Baerenfaller et al., 2008; Schwanhäusser et al., 2011; Ponnala et al., 2014; Walley et al., 2016). Although relatively little protein expression data is available, such data are amenable to GCN construction and could represent a more direct reflection of interacting proteins. Using a non-modified protein expression atlas from 23 maize tissues based upon mass spectrometry data (Walley et al., 2016), four protein networks were built with PCC, SCC, MRNET and CLR separately and then evaluated using the same PPPTY and GO datasets as previously mentioned.

GCNs constructed from protein expression did not exhibit superior AUROC values to those observed for RNA-Seq-based GCNs using the aggregation strategy (Fig. 6). When evaluated with the GO and PPPTY datasets, the performance of the protein network was lower than that of the aggregated network as well as the single network from 1266 samples. To confirm this result, a two-way ANOVA was computed with pairwise comparison for the GO evaluation, which showed that the effect of network type was significant (Supplemental Table S3). A subsequent pairwise comparison using the Wilcoxon rank sum test indicated that the PCC/SCC methods were significantly better than MRNET/CLR (Supplemental Table S3), although MI methods may be superior for some types of interactions.

The raw protein expression data included 17862 genes, of which 11429 overlapped with our RNA-Seq-based network and were therefore used for the analysis. To demonstrate that the performance of the protein network was not biased by the selection of genes, the PCC method was applied to all 17862 genes to construct a protein network (Supplemental Fig. 7). No improvement could be detected for the protein network derived from 17862 genes, with p-values of 0.635 for the GO evaluation and 0.995 for the PPPTY evaluation from a one-sided Wilcoxon rank sum test.

PCC- and SCC-built GCNs Exhibit Identical Topological and Functional Properties

In addition to evaluation of network performance based upon biological characteristics, networks can be compared based upon several network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules and number of genes in the largest module. Number of nodes is a basic construct in graph theory depicting the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in networks. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics such as protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC- and SCC-built ranked aggregation networks from 17 experiments (PA and SA) and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.
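
To make the two degree-based measures concrete, the following Python sketch computes them from an edge list. It is illustrative only: the values reported here were computed in Cytoscape, and the formulas below are the degree-based definitions commonly associated with Dong and Horvath (2007), namely heterogeneity as the standard deviation of node degree divided by its mean, and centralization as n/(n-2) * (max(k)/(n-1) - density). The use of the population variance, the function name degree_summary and the toy gene names are assumptions.

```python
from math import sqrt


def degree_summary(edges):
    """Return (n_nodes, heterogeneity, centralization) for an undirected edge list.

    Assumes each edge is listed once, without self-loops, and that the
    network has more than two nodes.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    k = list(degree.values())
    n = len(k)
    mean_k = sum(k) / n
    variance = sum((x - mean_k) ** 2 for x in k) / n  # population variance (assumption)
    heterogeneity = sqrt(variance) / mean_k
    density = mean_k / (n - 1)
    centralization = n / (n - 2) * (max(k) / (n - 1) - density)
    return n, heterogeneity, centralization


# Toy star network: one hub gene connected to three other genes.
star = [("hub", "g1"), ("hub", "g2"), ("hub", "g3")]
n_nodes, heterogeneity, centralization = degree_summary(star)
```

A star-shaped network, in which a single hub holds every edge, reaches the maximum centralization of 1 under this definition, which matches the intuition that high centralization reflects a few dominant hubs.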

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the observed highest centralization value for this network. Centralization and heterogeneity are two measures that model the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity, whereas a network with larger values of centralization and heterogeneity may either contain a larger number of hubs or have a number of hubs that is not significantly large but an extremely imbalanced degree distribution. In biological networks, many observations have connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that the high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8). The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 835 unique genes (Fig. 7A). In total, 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of the top interacting genes (27/148), which was expected because of their cellular abundance and involvement in translation. Interestingly, nine epigenetic regulators were found among the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a) and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).

To reveal the underlying properties of the GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The results showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules than the PA and SA networks. Consequently, most genes in the MS network clustered into one very large module of 14054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Tables S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, and so these methods appear to be better for module detection.

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA and MS were compared, and the intersection of the three produced a merged network of 14277 genes and 106591 interactions. PA and SA shared 83.5% of their interactions, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
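
The construction of the merged network can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: it assumes each network is available as a mapping from a gene pair to an edge weight, and the names top_edges and merge_networks, as well as the toy genes and weights, are hypothetical.

```python
def top_edges(weights, k):
    """Return the k highest-weighted undirected edges as sorted gene pairs."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {tuple(sorted(pair)) for pair, _ in ranked[:k]}


def merge_networks(networks, k):
    """Keep only the edges present in the top-k edge set of every network."""
    edge_sets = [top_edges(weights, k) for weights in networks]
    shared = set.intersection(*edge_sets)
    genes = {gene for edge in shared for gene in edge}
    return shared, genes


# Toy networks over three genes; the study used k = 1 million per network.
pa = {("g1", "g2"): 0.90, ("g1", "g3"): 0.80, ("g2", "g3"): 0.10}
sa = {("g2", "g1"): 0.95, ("g1", "g3"): 0.20, ("g3", "g2"): 0.70}
ms = {("g1", "g2"): 0.60, ("g2", "g3"): 0.50, ("g1", "g3"): 0.40}
shared_edges, shared_genes = merge_networks([pa, sa, ms], k=2)
```

Sorting each gene pair before comparison treats an edge as undirected, so the same interaction listed in either orientation is counted once.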

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected, cell wall-related GO terms were enriched among these 214 genes (Fig. 7D; Supplemental Table S9).

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database using BLASTP to retrieve homologs and their annotations. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The results of the literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall-related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1 and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14 and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6) and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes; 48% (36/75) of the co-expressed genes were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial on how to use Cytoscape to visualize any co-expressed genes in our network is provided as supplemental material (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, the VST, CPM and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed in the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, normalization is performed on a library basis, which means that genes within the same library are normalized by similar factors. Thus, when the network is constructed by PCC or BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, on which normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.

When assessing inference methods, the simple and widely used correlation methods such as PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study of human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using the GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods capture different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed more robust performance than any single inference method in in silico and E. coli expression networks, referring to this as "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR together did not result in a network that outperformed the PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq-based maize GCN. This optimization may apply to a range of datasets that share characteristics of maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Processing

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). Adapters were trimmed from the fastq files with Cutadapt 1.8.1 (Martin, 2011). The adapter-trimmed files were then quality checked with FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated with featureCounts 1.5.0 (Liao et al., 2014) from the aligned bam files (Supplemental Fig. S1). Twenty-six libraries with fewer than 5 million total reads and 11 libraries with a total alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined with Snakemake v3.7.1 (Köster and Rahmann, 2012).

Gene Count Normalization

The expression data were normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated with the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(x + 1), where x is the CPM or RPKM value). For both methods, scale factors between samples were estimated by the Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated with the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in additional analyses (15116 genes).
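
The CPM transformation and the expression filter can be illustrated with a simplified Python sketch. It is not a reimplementation of the edgeR workflow used here: it divides by the raw library size and omits the TMM scale factors, and the function names cpm, log2_cpm and expressed_genes, along with the toy counts, are assumptions made for the example.

```python
from math import log2


def cpm(counts):
    """Counts per million per library; counts[gene] holds one raw count per library.
    Uses the raw library size only (no TMM scaling, unlike the edgeR workflow)."""
    n_libs = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_libs)]
    return {gene: [c * 1e6 / lib_sizes[j] for j, c in enumerate(row)]
            for gene, row in counts.items()}


def log2_cpm(counts):
    """expression = log2(CPM + 1)."""
    return {gene: [log2(v + 1) for v in row] for gene, row in cpm(counts).items()}


def expressed_genes(counts, min_cpm=2.0, min_samples=1000):
    """Genes whose CPM exceeds min_cpm in more than min_samples libraries."""
    return [gene for gene, row in cpm(counts).items()
            if sum(v > min_cpm for v in row) > min_samples]


# Toy table with two genes and two libraries of one million reads each.
toy_counts = {"geneA": [999999, 500000], "geneB": [1, 500000]}
toy_expression = log2_cpm(toy_counts)
toy_kept = expressed_genes(toy_counts, min_cpm=2.0, min_samples=1)
```

With the defaults, expressed_genes mirrors the filter described above (more than 2 CPM in more than 1000 samples); the toy call lowers min_samples only so that the two-library example is meaningful.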

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to the normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated with the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated with the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed with the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed with the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value assigned a rank of one. The ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights ranged from zero to one.
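
The rank standardization described above can be written compactly. The Python sketch below is illustrative only, since the analysis itself was run in R; the function name rank_standardize and the averaging of tied ranks are assumptions not stated in the text.

```python
from bisect import bisect_left, bisect_right


def rank_standardize(matrix):
    """Rank all entries (smallest = 1, ties averaged), divide the ranks by the
    number of entries, and set the diagonal to one."""
    flat = sorted(v for row in matrix for v in row)
    total = len(flat)

    def scaled_rank(value):
        lo, hi = bisect_left(flat, value), bisect_right(flat, value)
        return (lo + 1 + hi) / 2 / total  # mean of the 1-based ranks lo+1..hi

    out = [[scaled_rank(v) for v in row] for row in matrix]
    for d in range(min(len(out), len(out[0]))):
        out[d][d] = 1.0
    return out


# Toy 2 x 2 weight matrix (made-up values).
standardized = rank_standardize([[1.0, 0.3], [0.5, 1.0]])
```

Because only ranks are retained, weights produced by different inference methods (correlation coefficients or mutual information) end up on the same zero-to-one scale and can be compared or combined directly.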

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in the CPM- or VST-normalized expression matrices. Networks were then inferred from the randomized expression matrices with the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.

Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions, defined as those ranking in the top 5% of their results, were used for evaluation. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 were downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq-confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for the GO datasets were calculated with the EGAD package (Ballouz et al., 2016) in R. Basically, it utilizes the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes' GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).

Definition of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts that two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts that two genes are co-expressed but they are not; TN, a network predicts that two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts that two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts that a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts that a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts that a gene does not have a specific GO term and it does not have that term in our GO dataset; FN, a network predicts that a gene does not have a specific GO term but it has that GO term in our GO dataset.
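
Given these definitions, the AUROC for a set of scored gene pairs can be illustrated with a short Python sketch. The evaluations here were run with the ROCR and EGAD packages in R; the function below is an independent illustration that uses the equivalent pairwise formulation (the probability that a true pair receives a higher network weight than a false pair, with ties counted as one half). The function name auroc and the toy scores are assumptions, and the quadratic loop is only suitable for small examples.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    scores: network weights for gene pairs; labels: 1 if the pair is a known
    positive (for example, present in the reference dataset), else 0.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example: four gene pairs, two of which are known positives.
toy_auroc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUROC of 0.5 corresponds to random ordering of positives and negatives, and 1.0 to a network that ranks every known positive above every negative.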

Network Clustering and Characterization

For each network, the top 1 million edges were selected as the stringent co-expression network. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distribution were plotted with the NetworkAnalyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) with MCL v14-137 and the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes involved in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, and a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
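
For readers unfamiliar with the enrichment test, the upper-tail hypergeometric p-value for a single GO term can be sketched in Python as follows. This is an illustration of the general test, not of AgriGO's implementation; the function name hypergeom_pvalue, its parameter names and the toy numbers are assumptions, and the Yekutieli correction across terms is not shown.

```python
from math import comb


def hypergeom_pvalue(background, annotated, selected, hits):
    """P(X >= hits) for one GO term.

    background: genes in the reference set; annotated: background genes
    carrying the term; selected: genes in the query list; hits: query genes
    carrying the term.
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(comb(annotated, x) * comb(background - annotated, selected - x)
               for x in range(hits, upper + 1))
    return tail / total


# Toy example: 10 background genes, 5 annotated, and all 5 selected genes are hits.
toy_pvalue = hypergeom_pvalue(background=10, annotated=5, selected=5, hits=5)
```

The counts are kept as exact integers until the final division, which avoids overflow when the background is as large as the 15116 genes used here.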

Database Comparison for the Cell Wall Pathway

Sixteen well-characterized components of cell wall biosynthesis (Penning et al., 2009; Bosch et al., 2011; Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) through its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05 and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4, with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were searched against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.
Supplemental Figure 2. Distribution of gene expression values.
Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.
Supplemental Figure 4. Pairwise comparison among results of inference methods.
Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS).
Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network).
Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. 16 query genes in the maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.
Supplemental Dataset S2. Tutorial: Visualizing co-expression data in Cytoscape.


Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008-2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
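
The outlier rule stated in this legend can be illustrated with a short Python sketch. It flags values lying more than 1.5 times the interquartile range above the upper quartile or below the lower quartile. The function name, the choice of quartile method and the example values are assumptions made for illustration only; boxplot implementations such as the one in R may compute quartiles slightly differently for small samples.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    # "inclusive" quartiles are an assumption; other conventions exist.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical AUROC values, not taken from the study.
example_auroc = [0.62, 0.65, 0.66, 0.68, 0.70, 0.71, 0.73, 0.98]
print(boxplot_outliers(example_auroc))
```

Only the points returned by such a rule would be drawn individually; all other values fall within the whiskers of the box plot.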


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
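
The scaling described in this legend can be sketched in Python as a row-wise z-score. The sketch assumes that each GO term or gene is scaled across the method and normalization combinations, that the sample standard deviation is used, and that values are clipped to the displayed range of -3 to 3; none of these details is confirmed by the legend, and the function name and example values are hypothetical.

```python
from statistics import mean, stdev


def scale_auroc(values, limit=3.0):
    """Z-score one row of AUROC values and clip to [-limit, limit]."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        # A constant row carries no contrast between methods.
        return [0.0] * len(values)
    return [max(-limit, min(limit, (v - mu) / sd)) for v in values]


# Hypothetical AUROC values for one GO term across three methods.
print(scale_auroc([0.60, 0.70, 0.80]))
```

After scaling, every row has mean 0 and unit variance, so the heat map colors compare methods within a GO term or gene rather than absolute AUROC levels.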


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
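
The logarithmic model in this figure was fitted with lm() in R. As a rough illustration of the same idea, the Python sketch below fits average AUROC = a + b × ln(sample size) by ordinary least squares and reports R-squared. The function name and example data are hypothetical, and the p-value that lm() reports is not computed here.

```python
import math


def fit_log_model(sample_sizes, auroc):
    """Least-squares fit of auroc = a + b * ln(sample_size).

    Returns (a, b, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(auroc) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, auroc))
    syy = sum((yi - mean_y) ** 2 for yi in auroc)
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, auroc))
    return a, b, 1.0 - ss_res / syy


# Hypothetical sample sizes and average AUROC values.
sizes = [12, 24, 60, 120, 404, 1266]
values = [0.56, 0.58, 0.61, 0.62, 0.66, 0.69]
print(fit_log_model(sizes, values))
```

A positive slope b indicates that average AUROC increases with the logarithm of sample size, which is the relationship the black lines in the figure summarize.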


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; the S1266 network included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars, "1266"), aggregation network (grey bars, "agg") and protein network (dark grey bars, "pr") were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). In both panels, bold horizontal lines indicate the median, asterisks indicate the mean and grey dots indicate outliers.

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks. 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 × 10^6 edges in three networks. 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In each panel, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files) and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million mapped reads (RPKM). A-B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) log2 transformation. VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from log2 transformed expressions. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
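
The CPM and RPKM normalizations named in this legend follow standard definitions, sketched below in Python for a single library. This is a minimal illustration and not the pipeline used in the study: packages such as edgeR can additionally apply library normalization factors, and the pseudocount of 1 in the log2 step, the function names and the example counts are assumptions.

```python
import math


def cpm(counts):
    """Counts per million: count / library size * 1e6."""
    library_size = sum(counts)
    return [c / library_size * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads.

    Equivalent to count * 1e9 / (library size * gene length in bp).
    """
    library_size = sum(counts)
    return [c * 1e9 / (library_size * length)
            for c, length in zip(counts, lengths_bp)]


def log2_transform(values, pseudocount=1.0):
    """log2(value + pseudocount); the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


# Hypothetical read counts and gene lengths for three genes.
counts = [10, 30, 60]
lengths = [1000, 2000, 500]
print(cpm(counts))
print(rpkm(counts, lengths))
print(log2_transform(cpm(counts)))
```

Because RPKM divides by gene length while CPM does not, plotting both against gene length, as in panel C, shows how each normalization treats long and short genes.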


Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks, between the AUROC values and the PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks, between the AUROC values and the PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Supplemental Figure 5. Characteristics of all 1720 PPPTY gene set (ALL_1720), genes with highest AUROC values in CSC method (CSC), PCC method (PCC) and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network were labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distribution. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.
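
A power-law fit to a node degree distribution of this kind can be sketched in Python as a least-squares line on log-transformed values. This is one common way to obtain a fitted function and R², assumed here for illustration; the legend does not state which fitting procedure was used, and the function names and example edges are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each node degree to the number of nodes with that degree."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return Counter(degree.values())


def fit_power_law(distribution):
    """Fit count = a * degree ** b on log-log values.

    Returns (a, b, r_squared), where r_squared refers to the
    log-transformed data.
    """
    points = [(math.log(k), math.log(c))
              for k, c in distribution.items() if k > 0 and c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in points)
    return math.exp(log_a), b, 1.0 - ss_res / syy


# Hypothetical edge list; gene names are placeholders.
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
print(degree_distribution(edges))
```

A negative exponent b with R² close to 1 on the log-log scale is what is usually read as a scale-free degree distribution.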


Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) were highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science (80- ) 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabasi A-L, Oltvai ZNZN, Barabási A-L (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff N V (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science (80- ) 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol. doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One. doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013;4:291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of “guilt-by-association” within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) “Out of Pollen” Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
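The two quantities this legend relies on, AUROC and the 1.5 × IQR outlier rule, can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the reference list points to R packages such as ROCR); the function names `auroc` and `boxplot_outliers` are hypothetical, and the quantile convention (linear interpolation that includes both endpoints) is an assumption.

```python
from statistics import quantiles


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def boxplot_outliers(values):
    """Return values beyond 1.5 x IQR above the 75% or below the 25% quantile."""
    # Assumption: quartiles by linear interpolation including both endpoints.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - fence or v > q3 + fence]
```

For example, `auroc` could be applied to co-expression scores with labels marking whether each gene belongs to a given GO term, and `boxplot_outliers` to the resulting per-term AUROC values.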

Figure 3. Similarity between the ten inference methods in network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are displayed on a color scale from blue to red. Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
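The scaling and distance steps named in this legend can be sketched as follows. This is an illustration only: the AUROC values are made up, the scaling is assumed to run per GO term across methods, and the sample standard deviation is assumed as the scale.

```python
from math import dist
from statistics import mean, stdev


def zscore(values):
    """Scale values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical AUROC table: one row per GO term, one column per method.
methods = ["PCC", "SCC", "CLR"]
auroc_table = [
    [0.70, 0.72, 0.60],
    [0.65, 0.66, 0.58],
    [0.80, 0.79, 0.62],
]

# Scale each GO term (row) across methods, then read out one profile per method.
scaled = [zscore(row) for row in auroc_table]
profiles = {m: [row[j] for row in scaled] for j, m in enumerate(methods)}

# Euclidean distance between method profiles is the input to the clustering.
d_pcc_scc = dist(profiles["PCC"], profiles["SCC"])
d_pcc_clr = dist(profiles["PCC"], profiles["CLR"])
```

In this toy table the two correlation methods end up closer to each other than either is to CLR, which is the kind of grouping a hierarchical clustering of the distances would report.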

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
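The logarithmic model in this legend has the form average AUROC = a + b · ln(sample size). The authors fitted it with lm() in R; the sketch below is a plain-Python ordinary least squares equivalent for the coefficients and R-squared only (no p-value), and the function name `fit_log_model` is hypothetical.

```python
from math import log


def fit_log_model(sample_sizes, mean_auroc):
    """Fit mean_auroc = a + b * ln(sample_size) by ordinary least squares.

    Returns (a, b, r_squared).
    """
    x = [log(n) for n in sample_sizes]
    y = list(mean_auroc)
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot
```

A positive slope b indicates that performance increases with the logarithm of the sample size, so each doubling of the number of libraries adds roughly the same increment b · ln 2 to the average AUROC.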

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as “_1”, “_2”, and “_3”. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

[Figure 7 panels. A and B, Venn diagrams of the MS, PA, and SA networks. C, GO term network for the hub genes, including chromatin assembly or disassembly, nucleosome organization, DNA packaging, DNA replication, DNA repair, gene silencing, epigenetic regulation of gene expression, RNA processing, and translation. D, GO term network for the cell wall query, including cell wall biogenesis, cell wall organization, cellulose biosynthetic process, glucan metabolic process, phenylpropanoid biosynthetic process, root morphogenesis, and epidermal cell differentiation.]

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
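The overlap counted in panel A can be sketched as a set intersection over the highest-degree genes of each network. The sketch assumes that "highest interacting" means highest node degree; the helper `top_hubs`, the toy edge lists, and the gene names are all hypothetical.

```python
from collections import Counter


def top_hubs(edges, k):
    """Return the k genes with the most edges (ties broken by gene name)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return set(ranked[:k])


# Hypothetical toy edge lists standing in for the PA, SA, and MS networks.
pa = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
sa = [("g1", "g2"), ("g1", "g5"), ("g2", "g5"), ("g2", "g3")]
ms = [("g1", "g3"), ("g1", "g2"), ("g2", "g4"), ("g5", "g6")]

# Genes that are hubs in all three networks (the center of the Venn diagram).
shared = top_hubs(pa, 2) & top_hubs(sa, 2) & top_hubs(ms, 2)
```

With k = 1000 and the real edge lists, the size of `shared` would correspond to the central count of the Venn diagram.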

Figure 8. Cell wall pathway subnetworks. A, Intersections of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with a reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.

Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics, pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science: 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol. doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One. doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Zea mays (maize) is the most widely produced crop in the United States, and US agriculture accounted for 36% of world maize production in 2015 (USDA, 2016). Maize has also been at the center of genetics research for over 100 years, including McClintock's pioneering work with transposable elements (TEs) (reviewed by McClintock, 1983; Fedoroff, 2012). Due to recent technological advances in nucleic acid sequencing and the availability of the maize genome sequence (Schnable et al., 2009), maize genomics research has been greatly expedited.

RNA-Sequencing (RNA-Seq) has become the favored technique for detecting genome-wide expression patterns. RNA-Seq has some advantages over microarray analysis of gene expression, including single base pair resolution, detection of novel transcripts, and the ability to analyze transcript abundance without existing genome information (reviewed by Wang et al., 2009; Han et al., 2015; Conesa et al., 2016). RNA-Seq data provide information about single nucleotide polymorphisms (SNPs), which facilitates Genome-wide Association Studies (GWAS) (Fu et al., 2013; Li et al., 2013a; Lonsdale et al., 2013; Fadista et al., 2014). Because of its widespread adaptability, over five thousand Illumina platform maize RNA-Seq libraries (Fig. 1A) are available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database (Leinonen et al., 2010), adding to the body of data that can be used to study the maize genome.

The maize genome is large and heterogeneous, and the genome annotation is still far from complete (Mark Cigan et al., 2005; Ficklin and Feltus, 2011). Although recent work has made substantial progress toward describing genome-wide expression patterns in many genotypes, environmental conditions, and tissues, relatively little is known about the function and regulation of most maize genes. Because genes with related biological functions or regulatory mechanisms often have similar expression patterns (Aoki et al., 2007), one way to enhance understanding of gene function is by construction of a Gene Co-expression Network (GCN) (D'haeseleer et al., 2000; Aoki et al., 2007; Usadel et al., 2009; Li et al., 2015c; Serin et al., 2016). GCNs are constructed using data mining tools and algorithms that describe the relatedness between the expression patterns of multiple genes in a pairwise fashion.

The use of GCNs pre-dates the availability of RNA-Seq expression data (Ficklin and Feltus, 2011; Sato et al., 2011; De Bodt et al., 2012), meaning that these approaches were initiated and optimized predominantly with microarray datasets. Maize RNA-Seq samples are already five times more abundant than microarray samples (Fig. 1) and are increasing in number, meaning that an RNA-Seq oriented maize GCN protocol would be valuable to the scientific community. Although the initial inputs and results from microarray and RNA-Seq are similar, there are many differences between the data types and analytical approaches. It is therefore anticipated that some adjustments to GCN parameters may improve the efficacy of GCN analysis of RNA-Seq data. GCN construction is typically a multistep process, starting with normalization of input datasets and followed by network inference, network evaluation, and interpretation (Supplemental Fig. 1).

Both RNA-Seq and microarrays are affected by systematic variations (Park et al., 2003; Oshlack and Wakefield, 2009; Zheng et al., 2011; Li et al., 2014b). Therefore, genome-wide expression results generated by either technique need to be normalized prior to analysis (Dillies et al., 2013a; Li et al., 2015b). Variance stabilizing transformation (VST), Counts Per Million (CPM), and Reads Per Kilobase Per Million (RPKM) are three popular normalization methods for RNA-Seq experiments (Mortazavi et al., 2008; Anders and Huber, 2010; Rau et al., 2013).
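
For orientation, the two count-scaling normalizations named above are commonly defined as shown below. The symbols are introduced here for illustration and are not taken from the article: c_{gj} is the read count for gene g in library j, N_j is the total number of mapped reads in library j, and L_g is the length of gene g in base pairs. VST is model-based (Anders and Huber, 2010) and has no comparably short closed form, so it is not written out.

```latex
\mathrm{CPM}_{gj} = \frac{c_{gj}}{N_j} \times 10^{6},
\qquad
\mathrm{RPKM}_{gj} = \frac{c_{gj}}{N_j \, L_g} \times 10^{9}
```

Under these definitions, RPKM equals CPM divided by the gene length in kilobases, so for a given gene the two values differ only by a constant factor across libraries.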

Some work has been done to evaluate the efficacy of different normalization methods for expression analysis. Giorgi et al. (2013) showed that VST normalization of RNA-Seq data resulted in a GCN with similar characteristics to a microarray-supported network in terms of coefficient and node degree distribution. Normalization with CPM, combined with the Trimmed Mean of M-values (TMM) to adjust for composition bias between RNA-Seq datasets by calculating normalization factors (Robinson et al., 2010), increased the robustness of analysis among diverse library sizes and compositions (Dillies et al., 2013a). These studies suggest that optimizing normalization methods might improve GCN performance.

There are several methods for gene network inference, including correlation, mutual information (MI), Bayesian network, and probabilistic graphical models. Typically, correlation and MI methods are used for constructing large-scale GCNs with more than ten thousand genes (Krouk et al., 2013). Correlation methods include Pearson correlation coefficient (PCC), Spearman's correlation coefficient (SCC), Kendall rank correlation coefficient (KCC), Gini correlation coefficient (GCC), and biweight midcorrelation (BIC) (Langfelder and Horvath, 2008; Kumari et al., 2012; Ma and Wang, 2012; Ballouz et al., 2015). Cosine similarity coefficient (CSC) has also been used for computing similarities in sparse datasets such as text (Dhillon and Modha, 2001) and protein-protein interaction data (Luo et al., 2015). MI methods include Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR) (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007). The network inference method might also influence GCN performance.
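
To make the difference between some of these pairwise measures concrete, the sketch below computes PCC, SCC, and CSC for two toy expression profiles in plain Python. It is an illustration only, not the implementation used in the study; the function names and the toy profiles are assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's correlation (SCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def cosine(x, y):
    """Cosine similarity coefficient (CSC): no mean-centering is applied."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical profiles: gene_b rises monotonically but non-linearly with gene_a.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 4.0, 9.0, 16.0, 25.0]

print("PCC:", round(pearson(gene_a, gene_b), 4))
print("SCC:", round(spearman(gene_a, gene_b), 4))
print("CSC:", round(cosine(gene_a, gene_b), 4))
```

Because the second profile is a monotonic but non-linear function of the first, the rank-based SCC reaches 1 while PCC stays slightly below it, which is one reason different inference methods can rank the same gene pair differently.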

Several resources are already available for GCN analysis in maize, including COB (Schaefer et al., 2014), CORNET (De Bodt et al., 2012), CoP (Ogata et al., 2010), PLANEX (Yim et al., 2013), and ATTED-II (Obayashi et al., 2009). All of these databases except ATTED-II used PCC to build GCNs from 128 to 379 microarray datasets. ATTED-II recently updated its database to provide GCNs from both microarray and RNA-Seq data using PCC-based mutual rank (Aoki et al., 2015). Although PCC is widely used, there is very limited evidence that it is the optimal approach for GCN analyses.

GCNs could also be improved by meta-analysis using ranked aggregation from individual networks (Zhong et al., 2014; Ballouz et al., 2015; Wang et al., 2015a). By aggregating individual experiments, only interactions consistent among networks are preserved, which helps reduce noise and highlights conserved interactions. Furthermore, the ranked aggregation method provides a way to efficiently increase the size of the aggregated network with newly available datasets, because recalculation with all datasets is not required when a new one is added. This provides an efficient way to process and incorporate emerging information.
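
The incremental property described above can be sketched as follows. This is a minimal illustration that assumes a simple mean-of-ranks scheme and assumes every network scores the same set of gene pairs; the aggregation procedure actually used in the study may differ, and the class name, function name, and toy scores are hypothetical.

```python
def rank_edges(scores):
    """Rank the edges of one network; the highest score gets rank 1.

    Ties are broken by insertion order because sorted() is stable.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: rank for rank, edge in enumerate(ordered, start=1)}

class RankAggregator:
    """Keeps a running sum of edge ranks so new networks can be added cheaply."""

    def __init__(self):
        self.rank_sums = {}
        self.n_networks = 0

    def add_network(self, scores):
        # Only the new network is ranked; earlier networks are not revisited.
        for edge, rank in rank_edges(scores).items():
            self.rank_sums[edge] = self.rank_sums.get(edge, 0) + rank
        self.n_networks += 1

    def mean_ranks(self):
        """Average rank per edge; a lower value means more consistent support."""
        return {e: s / self.n_networks for e, s in self.rank_sums.items()}

# Two hypothetical single-experiment networks over the same three gene pairs.
net1 = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.5}
net2 = {("g1", "g2"): 0.8, ("g1", "g3"): 0.6, ("g2", "g3"): 0.1}

aggregator = RankAggregator()
aggregator.add_network(net1)
aggregator.add_network(net2)
print(aggregator.mean_ranks())
```

In this toy case the pair ("g1", "g2") ranks first in both networks and keeps the best aggregated rank, whereas the two pairs that the networks disagree on end up with the same, weaker average rank.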

Herein, an extensive evaluation of maize GCN construction is reported. Three parameters were tested: normalization method, network inference algorithm, and ranked aggregation method. To our knowledge, this is the first comprehensive attempt to optimize GCN construction using plant RNA-Seq datasets. The network is publicly accessible at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php. A tutorial is also provided as supplemental material.

Results

Manually Curated Maize mRNA Expression Profiling from Publicly Available Datasets

Recently, the usage of RNA-Seq in maize has increased dramatically, rising from zero data entries in NCBI-SRA in 2008 to over nine hundred in 2016 (Fig. 1B). In contrast, the most widely used Affymetrix expression array for maize had 177 samples in 2008 but only 46 in 2016 (Fig. 1B). GCN construction approaches have not been optimized for RNA-Seq datasets in plants, and doing so could improve the quality and robustness of GCNs. To support a comprehensive evaluation of the effect of RNA-Seq normalization methods and network inference methods on the performance of GCNs, maize RNA-Seq datasets were compiled and processed with a computational pipeline (Supplemental Fig. 1). A total of 1266 high-quality RNA-Seq maize libraries from 17 different experiments were selected as input to an expression matrix. The corresponding experimental descriptions and publications, where available, of each library were manually checked for sample information (Supplemental Table S1). In addition, filters for read depth and alignment rate were used to remove unqualified libraries (see Methods for detail). Tissue type and haplotype from those libraries were manually curated and found to include a range of sample types (Supplemental Table S1). Shoot apical meristem (SAM), leaf, and root were the three most abundant tissue types, but a wide range of tissues were represented by multiple libraries in the dataset (Supplemental Fig. 1). The dataset also included multiple haplotypes, although B73 represented approximately 40% of the included libraries. To reduce noise, lowly expressed genes were removed from analysis, leaving 15116 nonredundant genes across the 1266 libraries. For comparative purposes, the Affymetrix GeneChip maize array includes 13339 genes before filtering (GeneChip Maize Genome Array, http://www.affymetrix.com/catalog/131468/AFFY/Maize+Genome+Array#1_1).

Three RNA-Seq Normalization Methods Show Comparable Distribution of Expression

Expression data from distinct sources and experiments can be highly variable because of hybridization artifacts in microarrays or variable sequencing depth in RNA-Seq. Many methods have been successfully used for normalizing both microarray and RNA-Seq data to correct for potential biases (Lim et al., 2007; Dillies et al., 2013b; Li et al., 2015b). To find an optimal normalization method for building a maize GCN from RNA-Seq data, three widely used normalization methods were compared: Variance Stabilizing Transformation (VST), Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) (Mortazavi et al., 2008; Anders and Huber, 2010; Rau et al., 2013). For all normalization methods, log2 transformation of the normalized expression values reduced the skew of the data distribution (Supplemental Fig. 2). Several network studies from plant RNA-Seq data used log2 transformation (Davidson et al., 2011; Ma and Wang, 2012; Giorgi et al., 2013; Stelpflug et al., 2015; Walley et al., 2016). In our analysis, genes with CPM > 2 in more than 1000 samples were included. This filter dramatically reduced zero count values in the raw data, from 30.949% to 0.367%. Moreover, a prior count of one was added at log2 transformation (expression = log2(CPM/RPKM + 1)) to avoid problems with the remaining zero values. The log2 transformation reduced skewed distributions and extreme values represented by outliers (Supplemental Fig. 2). We therefore applied log2 transformation to our data.
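
The filtering and transformation steps described above can be sketched as follows. This is a minimal illustration in plain Python, assuming a genes-by-libraries table of raw counts; the function names and the toy counts are hypothetical and are not the code used in this study.

```python
import math

def cpm(counts):
    """Counts Per Million for a genes x libraries table of raw counts."""
    lib_sizes = [sum(col) for col in zip(*counts)]
    return [[c / size * 1e6 for c, size in zip(row, lib_sizes)] for row in counts]

def rpkm(counts, gene_lengths_bp):
    """Reads Per Kilobase per Million: CPM divided by gene length in kilobases."""
    return [[v / (length / 1000.0) for v in row]
            for row, length in zip(cpm(counts), gene_lengths_bp)]

def filter_and_log2(counts, min_cpm=2.0, min_samples=1000):
    """Keep genes with CPM > min_cpm in more than min_samples libraries,
    then return (kept row indices, log2(CPM + 1) rows)."""
    values = cpm(counts)
    kept = [i for i, row in enumerate(values)
            if sum(v > min_cpm for v in row) > min_samples]
    return kept, [[math.log2(v + 1.0) for v in values[i]] for i in kept]

# Toy table: 3 genes x 3 libraries (made-up counts, each library sums to 1000).
toy_counts = [[0, 0, 1], [500, 400, 600], [500, 600, 399]]
# min_samples is lowered to 1 only because the toy table has three libraries.
kept, log_expr = filter_and_log2(toy_counts, min_samples=1)
```

With the toy table, the first gene exceeds the CPM threshold in only one library and is dropped, while the other two genes are retained and log2-transformed with a prior count of one.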

The distribution of gene expression across the 1266 libraries formed a bell-shaped curve with a small additional peak of low expression for all three methods (Supplemental Fig. 2). To determine if these low expression values came from a few or multiple libraries, elements within the range of expression that corresponded to the observed peak (< -3.7 CPM; Supplemental Fig. 2B) were extracted from the CPM-normalized expression matrix and matched to the originating libraries. This demonstrated that the low expression elements were not limited exclusively to specific libraries, but eight libraries contributed over 25% of low elements. A gene ontology enrichment analysis failed to identify significant gene ontology descriptors within the subset of 43 genes that were defined as lowly expressed (data not shown). All eight of these libraries were from pollen tissue, where the average gene expression at 147 Counts Per Million (CPM) is lower than the average gene expression of the other 79 tissues combined at 183 CPM. Hierarchical clustering and a correlation heatmap with the same data (Stelpflug et al., 2015) show the uniqueness of the pollen tissue expression pattern (Langfelder and Horvath, 2008) (Supplemental Fig. 3). When the lowly expressed elements from RPKM- and VST-normalized data were analyzed to determine library origin and GO enrichment (data not shown), we found a similarly high level of pollen-specific libraries without significant GO categories. In pollen, some highly expressed genes are considered orphan genes (Wu et al., 2014) because they lack detectable homologs in other species. To investigate whether these lowly expressed genes were orphan genes, their gene sequences were searched against the Setaria italica genome (JGIv2) using BLASTX (e-value < 1E-03). Setaria italica (foxtail millet) is a close relative of maize, from which it diverged 23.4 million years ago (MYA) as estimated by TimeTree (Kumar et al., 2017). Only 1 out of 43 genes lacked detectable homologs in Setaria italica (data not shown), indicating that the majority of these genes are not likely to be orphan genes.

Because RPKM normalization accounts for gene length, the distribution of gene length versus expression for the RPKM method was compared to data normalized by the VST and CPM methods. VST- and CPM-normalized data showed very similar overall patterns, with no clear linear relationship between gene length and average expression (Supplemental Fig. 2C). RPKM-normalized data displayed an apparent bias toward elevated expression of a small number of genes less than 5000 bp in length and lower expression of long genes, suggesting that this normalization method might skew the distribution of expression at some genes. Overall, in spite of these differences, the three normalization methods resulted in a similar distribution of expression patterns for most of the genes included in the analysis. Additional analysis was completed to determine if the three normalization methods influence network performance.


Network Performance Does Not Differ Based Upon Normalization Method

To compare the efficacy of three normalization and ten inference methods, a GCN was generated for each combination of normalization and inference methods. Furthermore, all networks were rank-standardized to limit the edge weights to the range from 0 to 1 (see Methods). All network evaluations used the whole adjacency matrix (15116 × 15116 in RNA-Seq networks; 11429 × 11429 or 17862 × 17862 in protein networks) without a cut-off. The performance of the different networks was measured by comparing the area under the receiver operating characteristic curve (AUROC). AUROC is a measurement used to evaluate the accuracy of classification models, making it suitable for evaluating GCNs (Gillis and Pavlidis, 2011; Ma and Wang, 2012; Liu et al., 2017). AUROC values range from 0 to 1, with a value closer to 1 indicating that the network is discriminating nonrandom patterns and approaching perfect classification, random networks returning values close to 0.5, and values closer to 0 indicating a high degree of incorrect classification. While an AUROC value close to 1 is optimal, values over 0.7 suggest good performance when analyzing large, diverse networks (Gillis and Pavlidis, 2011). To set up the AUROC baseline for the random networks, maize gene IDs were shuffled 10 (for MRNET and CLR) or 1000 times (for PCC) in the normalized expression matrix. The randomized expression matrices were used for network inference with the designated algorithms and further evaluated. The resulting AUROC values from randomized networks were very close to 0.5 (Supplemental Table S2).
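
As an illustration of the two operations named above, the sketch below rescales edge weights to the 0 to 1 range by rank and computes AUROC from the rank-sum statistic. The exact rank-standardization procedure is given in Methods and is not reproduced here; dividing the average rank by the number of edges is an assumption made for this example, and all function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2.0 for v in values]

def rank_standardize(edge_weights):
    """Rescale a list of edge weights to (0, 1] by rank (assumed scheme)."""
    return [r / len(edge_weights) for r in average_ranks(edge_weights)]

def auroc(scores, labels):
    """AUROC from the rank-sum (Mann-Whitney U) statistic.

    scores: edge weights between one query gene and every other gene
    labels: 1 if the other gene is a known partner, otherwise 0
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranks = average_ranks(scores)
    rank_sum = sum(r for r, is_pos in zip(ranks, labels) if is_pos)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

A query gene whose known partners all receive the highest edge weights returns 1, tied weights return 0.5, and partners that all receive the lowest weights return 0, matching the interpretation of AUROC given above.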

AUROC values were calculated and compared for three different network characteristics. The first characteristic was designed to test if the network identified genes with known or predicted co-expression patterns, based upon prior results and inclusion in two existing datasets that could serve as a positive control for co-expression. The maize metabolic pathway database (MaizeCyc) contains 413 pathways with more than two genes and was built based upon a collection of evidence from genome annotation, phylogenetic distance, and known genes in maize, rice and Arabidopsis (Monaco et al., 2013). The maize protein-protein interaction database (PPIM) is based upon both predicted and experimentally detected protein interactions (Zhu et al., 2016) and was the second dataset used in this analysis. Only high-confidence interactions from PPIM were used, as defined by ranking in the top 5% in their model (Zhu et al., 2016). For comparison with the GCN, genes within the same MaizeCyc or PPIM pathways were considered co-expressed. The MaizeCyc and PPIM datasets were combined, and genes with fewer than 5 interactions were excluded from evaluation, creating a compiled dataset referred to herein as the Protein-Protein and Pathway dataset (PPPTY). PPPTY had 1720 genes and 104856 interactions that were used in this evaluation. The AUROC value was calculated for each of the 1720 gene terms.

To assess the effect of normalization method on GCNs, AUROC values for all ten inference methods were averaged for each of the three normalization methods. All three normalization methods scored similarly in comparison with the PPPTY dataset (Fig. 2B), with a mean AUROC value around 0.575 for each, suggesting that the predicted networks were more selective than a random network.

The second characteristic was the presence of similar gene ontology (GO) information for maize genes within a detected co-expression set, based upon "guilt by association", which assumes that specific subgroups of co-expressed genes have some shared functions (Wolfe et al., 2005). GO annotations were downloaded from AgriGO (Du et al., 2010), which uses signature integration by InterPro to map gene IDs to GO terms rather than co-expression data. InterPro provided over 108 million stable GO terms to the functional protein information database UniProtKB at release 2016_01 (Sangrador-Vegas et al., 2016). Thus, the GO annotations provide a reliable evaluation resource independent of co-expression data. To assess this characteristic, gene ontology information was used in a neighbor voting algorithm (Gillis and Pavlidis, 2011) for sets of co-expression matrices and compared. Co-expression matrices were assessed by 3-fold cross-validation, which involved masking GO terms from some genes to test whether the masked GO terms could be predicted based upon gene expression patterns. In total, 277 GO terms were included in this analysis.
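
The masking procedure can be illustrated with a simplified neighbor voting sketch for a single GO term. It assumes that a masked gene is scored by the fraction of its total edge weight that connects to genes still annotated with the term, and it uses a deterministic fold assignment for reproducibility; both choices, the function names and the toy network are assumptions for illustration rather than the published implementation.

```python
def auroc(scores, labels):
    """Fraction of (annotated, unannotated) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, flag in zip(scores, labels) if flag]
    neg = [s for s, flag in zip(scores, labels) if not flag]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def neighbor_voting_auroc(network, annotated, n_folds=3):
    """Mean cross-validated AUROC for one GO term.

    network: symmetric list of lists of edge weights in [0, 1]
    annotated: list of 0/1 flags, 1 if the gene carries the GO term
    """
    n = len(network)
    fold_aurocs = []
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        hidden = set(held_out)  # annotations of these genes are masked
        votes = []
        for i in held_out:
            total = sum(network[i][j] for j in range(n) if j != i)
            voted = sum(network[i][j] for j in range(n)
                        if j != i and annotated[j] and j not in hidden)
            votes.append(voted / total if total > 0 else 0.0)
        truth = [annotated[i] for i in held_out]
        if 0 < sum(truth) < len(truth):  # AUROC needs both classes
            fold_aurocs.append(auroc(votes, truth))
    if not fold_aurocs:
        raise ValueError("no fold contained both annotated and unannotated genes")
    return sum(fold_aurocs) / len(fold_aurocs)

# Toy network: genes 0-2 share a module and the GO term, genes 3-5 do not.
module = [0, 0, 0, 1, 1, 1]
toy_net = [[0.0 if i == j else (0.9 if module[i] == module[j] else 0.1)
            for j in range(6)] for i in range(6)]
toy_score = neighbor_voting_auroc(toy_net, [1, 1, 1, 0, 0, 0])
```

In the toy network the masked annotations are recovered perfectly because annotated genes are tightly connected to each other, whereas a network with uniform edge weights carries no information and scores 0.5.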

When GO characteristics were used to assess the networks, all three normalization methods performed similarly, but the AUROC values, at around 0.689 for each, were higher than those observed for comparisons with PPPTY (Fig. 2A). Because GO addresses gene functions and PPPTY emphasizes protein-protein interactions, this suggests that GCNs are better at predicting functional interactions than physical interactions. The p-values from one-way ANOVA testing the effect of normalization method on the PPPTY and GO datasets were 0.9535 and 0.4714, respectively, confirming that the normalization method did not create a significant difference in the AUROC scores associated with the GCNs for the characteristics that were tested.

Finally, proteins that regulate gene expression or modify chromatin structure might interact with the DNA of a subset of co-expressed genes. The interactions between such a protein and regulated DNA can be detected by chromatin immunoprecipitation of associated DNA followed by DNA sequencing (ChIP-Seq). In maize, there are five ChIP-Seq datasets available (Bolduc et al., 2012; Morohashi et al., 2012; Li et al., 2015a; Pautler et al., 2015; Yang et al., 2016), some of which involve lowly expressed or tissue-specific genes. For example, Opaque2 is specifically expressed in endosperm (Li et al., 2015a), Knotted1 is expressed in SAM and floral tissues (Bolduc et al., 2012), and Pericarp Color1 has low expression except in inflorescence and seed (Morohashi et al., 2012). Histone Deacetylase 101 (HDA101) ChIP-Seq data provided the largest dataset for comparison, with 26 confirmed binding targets that are relatively highly expressed in most maize tissues (Yang et al., 2016). Histone deacetylation often correlates with decreased gene expression (Verdin and Ott, 2014). High-confidence HDA101 targets were defined as those discovered by ChIP-Seq that also showed increased gene expression in the hda101 mutant. Networks associated with the 26 high-confidence HDA101 targets were compared by calculating AUROC. Based upon this analysis, the AUROC values were very similar among networks normalized by VST, CPM and RPKM (Fig. 2C), which is consistent with the GO and PPPTY evaluations.

Correlation Methods Perform Better than Mutual Information at Some Genes

After normalization, the expression matrices can be processed by different methods for GCN inference. To optimize this step, the AUROC values of six correlation (PCC, SCC, KCC, GCC, BIC, CSC) and four mutual information (MI) methods (AA, MA, MRNET, CLR) were compared for the expression matrices that were generated from each of the three normalization methods (VST, CPM, RPKM) and then averaged. In general, correlation methods are more computationally efficient, while MI methods are able to reveal non-linear relationships (Li et al., 2015c). PCC is widely used but may be influenced by outliers (Mukaka, 2012). SCC, KCC and BIC are less sensitive to outliers because SCC and KCC only consider the rank information and BIC calculates based on the dataset median instead of the mean (Serin et al., 2016). Recently, GCC has been shown to be a better correlation method for gene expression analysis because of its capacity to detect non-linear relationships and its insensitivity to outliers (Ma and Wang, 2012). CSC is widely used for text mining and analyzing sparse data with many zeros (Dhillon and Modha, 2001). ARACNE, MRNET and CLR showed extended gene-dependent relationships under variable biological settings (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007; Li et al., 2013b). To estimate the effectiveness of the inference methods, the same testing parameters with AUROC calculations were used as described for the testing of normalization methods.
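
To make the distinction between some of these measures concrete, the sketch below computes PCC, SCC and a cosine similarity for a pair of expression vectors. It assumes that CSC denotes cosine similarity, which the text above does not spell out, and it omits KCC, GCC, BIC and the MI methods; the function names and the toy vectors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2.0 for v in values]

def spearman(x, y):
    """Spearman correlation (SCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def cosine(x, y):
    """Cosine similarity: uncentered, so zeros contribute nothing to the dot product."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

# A monotonic but non-linear pair of toy expression profiles.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 4.0, 9.0, 16.0, 25.0]
```

For the toy profiles, SCC equals 1 because the ranks agree exactly, while PCC is slightly lower (about 0.98) because the relationship is not linear.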

Assessed by the GO dataset, the 277 AUROC values were averaged to create one average value for each of the 10 inference methods, ranging from 0.620 to 0.724 (Fig. 2D). The average AUROC across all normalization methods for the six correlation methods was 0.718, while the average AUROC for the four MI methods was 0.646. The majority of the 277 GO terms had similar AUROC values in the different correlation method-generated GCNs, and these patterns are different from those observed in the MI-generated GCNs (Fig. 3A). The similarity among different methods was also detectable by pairwise comparison and by comparing Pearson correlations between the different methods (Supplemental Fig. 4A).

To evaluate network inference methods with the PPPTY dataset, the AUROC values for 1720 genes were averaged for each combination of normalization and inference methods (Fig. 2E). This evaluation also showed that the networks constructed using correlation methods resulted in higher AUROC values than MI methods, although the CSC method resulted in lower AUROC values than other correlation methods. As demonstrated for the GO evaluation, results from correlation methods were more similar to each other than to the MI methods (Supplemental Fig. 4B). Interestingly, heatmap results indicated that a subset of genes consistently had higher AUROC values when CSC, MRNET/CLR, or AA/MA were used (Fig. 3B), although this includes a small enough number of genes that the average AUROC value over the whole gene set was relatively low for those methods. The gene sets with the highest AUROC values in PCC, CSC or MRNET were extracted. Characteristics of each gene set were compared in terms of average expression (CPM) and average number of lowly expressed elements (CPM < 0). The CSC gene set had the smallest number of low expression elements and had higher average expression than both the 1720 gene set and the PCC gene set (Supplemental Fig. 5). This may indicate that the CSC method is better at determining co-expression for highly expressed genes.

The AUROC values from 26 targets of the HDA101 ChIP-Seq dataset reveal that the CSC GCN had the highest AUROC value, and the use of MRNET/CLR GCNs resulted in slightly higher scores than correlation methods (Fig. 2F). This could be explained by the small number of targets creating skewed results, but may also indicate that CSC/MI methods are more suitable for specific types of genes or interactions between genes (Tzfadia et al., 2016). HDA101 is a highly expressed gene in all samples, with an average expression value of 864 CPM and a minimum expression of 289 CPM, so it is possible that HDA101 is more suitable for the CSC method. CPM and RPKM normalization methods had higher AUROC values than VST (Fig. 2C). Using two models of ARACNE (additive, AA, and multiplicative, MA), the co-expression matrices contained less than 0.5% non-zero values for all comparisons, and so these techniques were not included in any additional analyses.

In conclusion, our results indicated that the widely used correlation methods resulted in a more predictive maize GCN from a single expression matrix, but co-expression with some individual genes may be better detected using MI methods. Normalization method did not have a substantial influence on GCN performance, so only CPM normalization was used in conjunction with PCC, SCC, MRNET and CLR inference for subsequent optimization of other parameters.

Increasing Sample Size Had a Positive Effect on GCN

GCN analysis can be accomplished with a variable number of samples and datasets, but sample size can influence the quality of the resulting GCN (Wei et al., 2004; Ballouz et al., 2015). Separate analyses were conducted with different numbers of samples and experiments to empirically determine the effect of sample number on GCN effectiveness. The data in our analysis consisted of 17 experiments, each including between 12 and 404 libraries. For this analysis, the CPM normalization method followed by each of four inference methods (PCC, SCC, MRNET and CLR) was applied to the 17 experiments, and the 68 resulting networks were evaluated by both GO and PPPTY.

From the GO and PPPTY evaluations, all algorithms exhibited a positive linear relationship between the natural logarithm of sample size and average AUROC values (Fig. 4). The linear relationships were stronger for the PCC and SCC methods, with higher r-squared values, indicating that correlation methods benefit more from increasing sample size. Thus, for building correlation-based GCNs, as many samples as possible should be included. We also found that, as seen for the total GCN analysis, PCC and SCC had higher average AUROC values than the MRNET and CLR methods in the PPPTY and GO analyses for most of the individual networks (Fig. 5).
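
The relationship described above corresponds to an ordinary least-squares fit of average AUROC against the natural logarithm of sample size. A minimal sketch follows; the sample sizes and AUROC values are placeholders chosen only to exercise the code and are not results from this study, and the function name is hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares: return (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)

# Placeholder values for illustration only; not taken from the paper.
sample_sizes = [12, 24, 48, 96, 192, 384]
mean_auroc = [0.55, 0.57, 0.60, 0.61, 0.64, 0.65]
slope, intercept, r_squared = fit_line([math.log(n) for n in sample_sizes], mean_auroc)
```

A positive slope indicates that AUROC increases with the logarithm of sample size, and the r-squared value measures how closely the points follow the fitted line.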

Ranked Aggregation of Networks Improved Performance of GCNs

Ranked aggregation for meta-analysis can also be used to change the outcomes of a GCN by buffering the effect of sample heterogeneity (Zhong et al., 2014; Wang et al., 2015a; Asnicar et al., 2016). Aggregated rank-standardized correlation/MI matrices were calculated from separate experiments to determine if this approach enhanced GCN performance. Aggregating individual networks together for meta-analysis can help to highlight true co-expression interactions and reduce noise (Zhong et al., 2014; Wang et al., 2015a; Wang et al., 2015b). This analysis was conducted with the 17 differently sized experiments using the PCC, SCC, MRNET and CLR methods for GCN inference as before, resulting in 68 single GCNs. The 17 experiments were aggregated for PCC, SCC, MRNET and CLR individually and evaluated by the GO and PPPTY datasets.
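
One way to picture ranked aggregation is sketched below: the edge weights of each experiment are rank-standardized, the standardized values are summed edge by edge, and the sums are rank-standardized again. This order of operations, the function names and the toy weights are assumptions for illustration; the procedure actually used is described in Methods.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2.0 for v in values]

def rank_standardize(edge_weights):
    """Rescale a list of edge weights to (0, 1] by rank."""
    return [r / len(edge_weights) for r in average_ranks(edge_weights)]

def aggregate_networks(networks):
    """Combine per-experiment networks that list the same edges in the same order."""
    standardized = [rank_standardize(weights) for weights in networks]
    summed = [sum(per_edge) for per_edge in zip(*standardized)]
    return rank_standardize(summed)

# Two toy experiments scoring the same three edges (made-up weights).
experiment_1 = [0.9, 0.1, 0.5]
experiment_2 = [0.8, 0.6, 0.2]
aggregated = aggregate_networks([experiment_1, experiment_2])
```

The first edge is ranked highest in both toy experiments and keeps the top aggregated weight, while the two edges on which the experiments disagree end up tied, which illustrates how aggregation favors consistently supported interactions.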


Of the 4 aggregated networks that were evaluated, the two correlation methods (PCC and SCC) had higher AUROC values than the single network from 1266 samples (Fig. 6 and Supplemental Fig. 6). However, this aggregation strategy did not result in significantly higher AUROC scores for the MRNET and CLR networks compared with single networks from 1266 samples (two-tailed Wilcoxon rank test for GO evaluation, p-values 0.494 and 0.796). It has been reported that MI estimation accuracy is dependent on sample size (Gao et al., 2015); therefore, individual MI networks built with a small number of libraries may not demonstrate improved accuracy from aggregation. In conclusion, the PCC/SCC-built GCN performed best using a ranked aggregation strategy, and use of this strategy in combination with the other optimized parameters creates a robust GCN.

The Performance of Protein Networks Did Not Exceed Aggregation Networks

In many cases, mRNA levels in a cell are of interest because mRNA level is thought to be related to the level and function of a protein of interest. However, many researchers have found inconsistencies between mRNA and protein levels (Baerenfaller et al., 2008; Schwanhäusser et al., 2011; Ponnala et al., 2014; Walley et al., 2016). Although relatively little protein expression data is available, this data is amenable to GCN construction and could represent a more direct reflection of interacting proteins. Using a non-modified protein expression atlas from 23 maize tissues based upon mass spectrometry data (Walley et al., 2016), four protein networks were built with PCC, SCC, MRNET and CLR separately and then evaluated using the same PPPTY and GO datasets as previously mentioned.

GCNs constructed from protein expression did not exhibit superior AUROC values to those observed for the RNA-Seq-based GCN using the aggregation strategy (Fig. 6). When evaluated by the GO and PPPTY datasets, the performance of the protein network was lower than that of the aggregated network as well as the single network from 1266 samples. To confirm this result, a two-way ANOVA was computed with pairwise comparison for the GO evaluation, which showed that the effect of network type was significant (Supplemental Table S3). A subsequent pairwise comparison using the Wilcoxon rank sum test indicated that the PCC/SCC methods were significantly better than MRNET/CLR (Supplemental Table S3), although MI methods may be superior for some types of interactions.

The raw protein expression data included 17862 genes, of which 11429 genes overlapped with our RNA-Seq-based network and were therefore used for the analysis. To demonstrate that the performance of the protein network was not biased by the selection of genes, the PCC method was used on the whole set of 17862 genes to construct a protein network (Supplemental Fig. 7). No improvement could be detected from the protein network derived from 17862 genes, with p-values of 0.635 for the GO evaluation and 0.995 for the PPPTY evaluation from a one-sided Wilcoxon rank sum test.

PCC- and SCC-built GCNs Exhibit Identical Topological and Functional Properties

In addition to evaluation of network performance based upon biological characteristics, networks can be compared based upon several different network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules and number of genes in the largest module. Number of nodes is a basic construct in graph theory depicting the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in networks. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics like protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC- and SCC-built ranked aggregation networks from 17 experiments (PA and SA) and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the observed highest centralization value for this network. Centralization and heterogeneity are two measures that describe the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity, while a network with larger values of centralization and heterogeneity may contain a larger number of hubs, or the number of hubs may not be significantly large but the degree distribution is extremely imbalanced. In biological networks, many observations connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8).

The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 835 unique genes (Fig. 7A). 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of top interacting genes (27/148), which was expected because of their cellular abundance and involvement with translation. Interestingly, nine epigenetic regulators were found in the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a) and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).
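
For readers who want to reproduce the two degree-based statistics on their own edge lists, the sketch below follows our reading of the definitions in Dong and Horvath (2007) for unweighted networks: heterogeneity is the standard deviation of node degree divided by its mean, and centralization is n/(n-2) times the difference between the scaled maximum degree and the network density. The use of the population variance, the function names and the toy graphs are assumptions made for this example.

```python
import math

def node_degrees(edges):
    """Degree of each node in an undirected edge list; isolated genes are not counted."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree

def centralization(degree):
    """n/(n-2) * (max(k)/(n-1) - density), with density = mean(k)/(n-1)."""
    n = len(degree)
    k = list(degree.values())
    density = sum(k) / (n * (n - 1))
    return n / (n - 2) * (max(k) / (n - 1) - density)

def heterogeneity(degree):
    """Coefficient of variation of node degree (population variance assumed)."""
    k = list(degree.values())
    mean = sum(k) / len(k)
    variance = sum((v - mean) ** 2 for v in k) / len(k)
    return math.sqrt(variance) / mean

# Toy graphs: a star (one hub, four leaves) and a five-node ring (no hub).
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
```

The star graph, in which a single hub carries every edge, reaches the maximum centralization of 1 and a heterogeneity of 0.75, whereas the ring, in which every node has the same degree, scores 0 on both measures.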

To reveal the underlying properties of the GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules detected than the PA and SA networks. Consequently, most genes in the MS network clustered into one very large module of 14054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Tables S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, and so these methods appear to be better for module detection.

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA and MS were merged together, and the intersection of the three produced a merged network of 14277 genes and 106591 interactions. PA and SA shared 83.5% of common interactions within the networks, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed online (http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php).
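
The merging step amounts to a set intersection over the top-ranked edges of each network. The sketch below assumes each network is available as a mapping from gene pairs to edge weights and treats edges as undirected; the function names, gene labels and weights are hypothetical.

```python
def top_edges(weights, n_top):
    """Return the n_top highest-weighted edges as unordered gene pairs."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {frozenset(pair) for pair, _ in ranked[:n_top]}

def merge_networks(networks, n_top):
    """Edges present in the top n_top of every network, plus the genes they connect."""
    shared = set.intersection(*(top_edges(w, n_top) for w in networks))
    genes = set().union(*shared)
    return shared, genes

# Toy stand-ins for the PA, SA and MS networks (made-up weights).
pa = {("g1", "g2"): 0.9, ("g2", "g3"): 0.8, ("g3", "g4"): 0.1}
sa = {("g2", "g1"): 0.95, ("g3", "g2"): 0.7, ("g1", "g4"): 0.2}
ms = {("g1", "g2"): 0.6, ("g1", "g4"): 0.5, ("g2", "g3"): 0.1}
shared_edges, shared_genes = merge_networks([pa, sa, ms], n_top=2)
```

In the toy example only the edge between g1 and g2 is among the top two edges of all three networks, so the merged network contains that single edge and its two genes.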

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to the existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected, cell wall-related GO terms were enriched for these 214 genes (Fig. 7D; Supplemental Table S9).
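
A guide-gene query of this kind can be sketched as a filter over the merged edge list. The example assumes that the extracted subnetwork consists of every edge touching at least one guide gene, which is our simplification of the analysis criteria; the function name and the gene labels are hypothetical.

```python
def guide_subnetwork(edges, guide_genes):
    """Edges touching at least one guide gene, plus all genes on those edges."""
    guides = set(guide_genes)
    kept = [(a, b) for a, b in edges if a in guides or b in guides]
    genes = {gene for edge in kept for gene in edge}
    return kept, genes

# Toy merged network with two guide genes; guide_2 has no co-expressed partner.
toy_edges = [("guide_1", "x"), ("guide_1", "y"), ("x", "y"), ("z", "w")]
sub_edges, sub_genes = guide_subnetwork(toy_edges, ["guide_1", "guide_2"])
```

As in the case study, a guide gene without any retained edge (guide_2 in the toy example) simply does not appear in the extracted subnetwork.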

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database to retrieve homologs and their annotations using BLASTP. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The results of the literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall-related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1 and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14 and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6) and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes; 48% (36/75) of co-expressed genes were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial on how to use Cytoscape to visualize any co-expressed genes in our network is provided as supplemental material (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, VST, CPM and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, on which normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.
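
The point about rank correlations can be illustrated with a small Python sketch. It covers only one part of the argument, namely that a monotone transformation such as log2(x + 1) leaves SCC unchanged while it changes PCC; it does not model library-level scale factors. The expression values and function names are made up for illustration and are not taken from the maize dataset.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    k = 0
    while k < len(order):
        m = k
        while m + 1 < len(order) and values[order[m + 1]] == values[order[k]]:
            m += 1
        for t in range(k, m + 1):
            ranks[order[t]] = (k + m) / 2.0 + 1.0
        k = m + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical CPM values for two genes across five libraries.
gene_a = [0.0, 3.0, 10.0, 250.0, 40.0]
gene_b = [1.0, 2.0, 30.0, 90.0, 500.0]
log_a = [math.log2(v + 1.0) for v in gene_a]
log_b = [math.log2(v + 1.0) for v in gene_b]

scc_raw, scc_log = spearman(gene_a, gene_b), spearman(log_a, log_b)
pcc_raw, pcc_log = pearson(gene_a, gene_b), pearson(log_a, log_b)
```

Here `scc_raw` and `scc_log` are identical because the log transformation preserves the ordering of the values, whereas `pcc_raw` and `pcc_log` differ.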

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study in human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods capture different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed a more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR together did not result in a network that outperformed PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq based maize GCN. This optimization may apply to a range of datasets that share characteristics with maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Process

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). The adapters in the fastq files were trimmed by Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked by FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). The 26 libraries with fewer than 5 million total reads and the 11 libraries with a total alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined by Snakemake v3.7.1 (Köster and Rahmann, 2012).

Gene Count Normalization

The expression data were normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase per Million (RPKM) were calculated by the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM/RPKM + 1)). For both methods, scale factors between samples were estimated by the Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated by the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in further analysis (15,116 genes).
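
As a rough illustration of the CPM, RPKM and log2 steps described above, the following Python sketch computes them directly from raw counts. It is not the edgeR implementation: it divides by raw library sizes and omits the TMM scale factors, VST is not shown, and all function names and toy numbers are hypothetical.

```python
import math

def cpm(counts, lib_sizes):
    """Counts per million. counts[g][s] is the raw count of gene g in
    library s; lib_sizes[s] is the total read count of library s.
    Assumption: no TMM scaling is applied here."""
    return [[c * 1e6 / lib_sizes[s] for s, c in enumerate(row)]
            for row in counts]

def rpkm(counts, lib_sizes, gene_lengths_bp):
    """Reads per kilobase per million: CPM divided by gene length in kb."""
    return [[v / (gene_lengths_bp[g] / 1000.0) for v in row]
            for g, row in enumerate(cpm(counts, lib_sizes))]

def log2_plus_one(matrix):
    """expression = log2(value + 1), applied element-wise."""
    return [[math.log2(v + 1.0) for v in row] for row in matrix]

def expressed_genes(cpm_matrix, min_cpm=2.0, min_samples=1000):
    """Indices of genes with CPM above min_cpm in more than min_samples
    libraries (the text uses 2 CPM and 1000 samples)."""
    return [g for g, row in enumerate(cpm_matrix)
            if sum(v > min_cpm for v in row) > min_samples]

# Toy example: two genes, two libraries.
counts = [[10, 20], [0, 5]]
lib_sizes = [1_000_000, 2_000_000]
gene_lengths_bp = [2000, 500]

cpm_values = cpm(counts, lib_sizes)
rpkm_values = rpkm(counts, lib_sizes, gene_lengths_bp)
log_cpm = log2_plus_one(cpm_values)
kept = expressed_genes(cpm_values, min_cpm=2.0, min_samples=1)
```

With the toy threshold of more than one library, only the first gene passes the expression filter.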

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient (KCC) was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient (GCC) was calculated by the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation (BIC) was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine Similarity Coefficient (CSC) was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value assigned rank one. Ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights ranged from zero to one.
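
The rank-based rescaling of adjacency weights described above can be sketched in Python as follows. This is a minimal illustration under two assumptions not stated in the text: all n x n entries (including the diagonal) are ranked together, and tied weights receive their average rank. The function name is hypothetical.

```python
def rank_normalize(adj):
    """Replace each weight by rank / number_of_elements (smallest = rank 1),
    then set the diagonal to one. Ties get the average rank (assumption)."""
    n = len(adj)
    flat = sorted((adj[i][j], i, j) for i in range(n) for j in range(n))
    total = len(flat)
    out = [[0.0] * n for _ in range(n)]
    k = 0
    while k < total:
        m = k
        # Extend the run while the next weight ties with the current one.
        while m + 1 < total and flat[m + 1][0] == flat[k][0]:
            m += 1
        avg_rank = ((k + 1) + (m + 1)) / 2.0
        for t in range(k, m + 1):
            _, i, j = flat[t]
            out[i][j] = avg_rank / total
        k = m + 1
    for i in range(n):
        out[i][i] = 1.0
    return out

# Toy symmetric correlation matrix for three genes.
adjacency = [
    [1.0, 0.2, 0.8],
    [0.2, 1.0, -0.5],
    [0.8, -0.5, 1.0],
]
ranked = rank_normalize(adjacency)
```

Because only ranks are retained, networks built by correlation and by mutual information end up on the same zero-to-one scale, which makes them directly comparable.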

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM or VST normalized expression matrices. The randomized expression matrices were then used for network inference by the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.
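
The gene ID shuffling used to build the random baseline can be sketched as below. This is only an illustration of the idea in Python; the function name, the seed argument and the toy gene IDs are hypothetical, and the inference and evaluation steps that follow the shuffle are not shown.

```python
import random

def shuffle_gene_ids(gene_ids, expr_rows, seed=None):
    """Reassign gene IDs to expression rows at random, keeping every
    expression vector intact. Returns {shuffled_gene_id: expression_row}."""
    rng = random.Random(seed)
    shuffled = list(gene_ids)
    rng.shuffle(shuffled)
    return dict(zip(shuffled, expr_rows))

# Toy expression matrix: four genes, three libraries.
gene_ids = ["gene1", "gene2", "gene3", "gene4"]
expr_rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [0.5, 0.1, 0.2]]
randomized = shuffle_gene_ids(gene_ids, expr_rows, seed=1)
```

Shuffling the labels preserves the correlation structure of the matrix but breaks the link between each gene and its annotation, so an evaluation on the shuffled matrix estimates the AUROC expected by chance.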

Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions, defined as those ranking in the top 5% of their results, were used for evaluation. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 were downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. Basically, it utilizes the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes' GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).
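
For readers unfamiliar with AUROC, the following Python sketch computes it from its pairwise definition: the probability that a randomly chosen positive pair receives a higher network weight than a randomly chosen negative pair, with ties counted as one half. It is an illustration only and not the ROCR or EGAD implementation; the function name and the example scores are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.
    scores: network weights; labels: truthy for known co-expressed pairs."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Four gene pairs: the first two are known interactions.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
chance = auroc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])
```

A value of 1.0 means every known interaction outranks every non-interaction, while 0.5 corresponds to random ordering. The quadratic loop is kept for clarity; a rank-based formula would be preferable for networks with millions of edges.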

Definition of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN): For the evaluation using the PPPTY dataset, TP: a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP: a network predicts two genes are co-expressed but they are not; TN: a network predicts two genes are not co-expressed and they are not co-expressed in PPPTY; FN: a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset, TP: a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP: a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN: a network predicts a gene does not have a specific GO term and it does not have that GO term in our GO dataset; FN: a network predicts a gene does not have a specific GO term but it has that GO term in our GO dataset.

Network Clustering and Characterization

For each network, the top 1 million edges were selected as stringent co-expression networks. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted by the NetworkAnalyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) by MCL v14-137 with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.
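
Selecting the strongest edges and counting node degrees can be sketched in Python as below. This is a small illustration of the thresholding idea only; the topological analysis in the text was done in Cytoscape, and the function names and the four-gene toy matrix are hypothetical.

```python
import heapq
from collections import Counter

def top_edges(weights, k):
    """Return the k highest-weight edges (i, j, w) from the upper triangle
    of a symmetric weight matrix, strongest first."""
    n = len(weights)
    candidates = ((weights[i][j], i, j)
                  for i in range(n) for j in range(i + 1, n))
    return [(i, j, w) for w, i, j in heapq.nlargest(k, candidates)]

def node_degrees(edges):
    """Number of retained edges touching each node."""
    degree = Counter()
    for i, j, _ in edges:
        degree[i] += 1
        degree[j] += 1
    return degree

# Toy rank-normalized network for four genes.
weights = [
    [1.0, 0.9, 0.1, 0.4],
    [0.9, 1.0, 0.8, 0.2],
    [0.1, 0.8, 1.0, 0.7],
    [0.4, 0.2, 0.7, 1.0],
]
edges = top_edges(weights, 3)
degrees = node_degrees(edges)
```

In the analysis described above k would be one million; using `heapq.nlargest` avoids sorting the full list of gene pairs.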

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15,116 genes involved in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, and a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
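
The enrichment statistics can be illustrated with a short Python sketch. It assumes an upper-tail hypergeometric test and the Benjamini-Yekutieli adjustment for the "Yekutieli method"; it is not AgriGO's code, and the function names and example numbers are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from a background of N genes,
    K of which carry the GO term, and k of the drawn genes carry it."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, upper + 1))
    return favorable / comb(N, n)

def yekutieli_adjust(pvalues):
    """Benjamini-Yekutieli adjusted p-values (FDR under dependency)."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted

# Example: 12 of 148 query genes carry a term found in 200 of 15116 genes.
p_term = hypergeom_pvalue(12, 148, 200, 15116)
adjusted = yekutieli_adjust([0.01, 0.04])
```

The background size of 15,116 follows the text; the other counts in the example are invented. Terms whose adjusted value exceeds 0.05 would be discarded.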

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05 and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4 with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were blasted against TAIR10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.
Supplemental Figure 2. Distribution of gene expression values.
Supplemental Figure 3. Maize CPM-normalized with log2 transformed gene expression from all tissues and developmental stages.
Supplemental Figure 4. Pairwise comparison among results of inference methods.
Supplemental Figure 5. Characteristics of all 1720 PPPTY gene set (ALL_1720) and genes with highest AUROC values in CSC method (CSC), PCC method (PCC) and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS).
Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among three networks (Merged network).
Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. 16 query genes in maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.
Supplemental Dataset S2. Tutorial: Visualizing Co-expression data in Cytoscape.

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Normalization and network inference methods effect on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for samples constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for samples constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to the standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; the S1266 network included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, "1266"), aggregated network (grey, "agg") and protein network (dark grey, "pr") using PCC, SCC, MRNET and CLR. A, GO evaluation on networks. Inference methods were indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods were indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files) and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million mapped reads (RPKM). A-B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from log2 transformed expressions. C, Normalized gene expression values for 15,116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).

Supplemental Figure 3. Maize CPM-normalized with log2 transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient, ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values as calculated by Pearson correlation shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values as calculated by Pearson correlation shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of all 1720 PPPTY gene set (ALL_1720) and genes with highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network were labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17,862 genes (ppr_all) and with 11,429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17,862 genes (ppr_all) and with 11,429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distribution. The R² value indicates the fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.

Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) were highlighted in colors. Genes not in modules 1-8 are light grey nodes.



Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year and generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
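Every panel of Figure 2 reports AUROC values. As a reminder of what that number measures, here is a minimal sketch of the rank-based definition: the probability that a randomly chosen positive gene receives a higher score than a randomly chosen negative gene, with ties counted as one half. How genes are scored against a GO term or a target list is not described in this legend, so the scores and labels below are hypothetical and this is not the evaluation code used for the figure.

```python
def auroc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: predicted association scores, one per gene.
    labels: truthy for genes annotated to the term (positives), falsy otherwise.
    Quadratic in the number of genes, which is fine for illustration only.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four genes; the first two are annotated to the term.
example = auroc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0])
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the dashed reference lines and box positions in the figure are read relative to those two values.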


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
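The Figure 3 legend describes two steps: scaling AUROC values to a standard normal distribution and clustering by Euclidean distance. The sketch below illustrates both under stated assumptions: scaling is done within each GO term across methods, the sample standard deviation is used, and only the pairwise distances that a hierarchical clustering would start from are computed. The table values and function names are hypothetical.

```python
import math
import statistics


def scale_rows(table):
    """Z-score AUROC values within each term.

    table: {term: {method: auroc}}. Assumption: scaling is per term,
    using the sample standard deviation; constant rows become zeros.
    """
    scaled = {}
    for term, by_method in table.items():
        values = list(by_method.values())
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        scaled[term] = {
            m: 0.0 if sd == 0 else (v - mean) / sd for m, v in by_method.items()
        }
    return scaled


def method_distance(scaled, method_a, method_b):
    """Euclidean distance between two methods across all scaled terms."""
    return math.sqrt(
        sum((row[method_a] - row[method_b]) ** 2 for row in scaled.values())
    )


# Hypothetical AUROC values for two GO terms and three inference methods.
toy = {
    "term1": {"PCC": 0.6, "SCC": 0.7, "CLR": 0.8},
    "term2": {"PCC": 0.5, "SCC": 0.5, "CLR": 0.8},
}
toy_scaled = scale_rows(toy)
pcc_vs_scc = method_distance(toy_scaled, "PCC", "SCC")
```

In this toy table PCC and SCC differ only on the first term, so they end up closer to each other than either is to CLR, which is the kind of grouping the clustered heat map makes visible.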

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against the natural logarithm of sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against the natural logarithm of sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 4 panel titles: A, GO PCC, GO SCC, GO MRNET and GO CLR; B, PPPTY PCC, PPPTY SCC, PPPTY MRNET and PPPTY CLR.
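The Figure 4 legend states that logarithmic regression models were fitted with lm() in R. As a minimal sketch of the same kind of model, average AUROC = a + b × ln(sample size), the following Python code computes the least-squares coefficients and R-squared. It is an illustration rather than the analysis code, it does not compute p-values, and the sample sizes and AUROC values are hypothetical.

```python
import math


def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of mean_auroc = a + b * ln(sample_size).

    Returns (a, b, r_squared). P-values, which lm() also reports,
    are intentionally left out of this sketch.
    """
    xs = [math.log(n) for n in sample_sizes]
    ys = list(mean_aurocs)
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical data generated exactly from a = 0.5, b = 0.03.
sizes = [12, 100, 500, 1266]
aurocs = [0.5 + 0.03 * math.log(n) for n in sizes]
a_hat, b_hat, r2 = fit_log_model(sizes, aurocs)
```

A positive slope `b` in such a model means each multiplicative increase in sample size adds a roughly constant increment to the average AUROC, which is why the x-axis in the figure is on a log scale.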

Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries, plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries, plotted against sample size. Networks with the same number of samples are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.


Fig. 7 panels: A and B, Venn diagrams of the MS, PA and SA networks; C, GO term diagram labeled "Hub"; D, GO term diagram labeled "Cell Wall".

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10⁶ edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all three panels, cyan nodes are genes with reported function in cell wall-related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways, and grey lines indicate network-predicted interactions.

Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EurasipJ Bioinforma Syst Biol doi 101155200779879

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional networks using mutualinformation BMC Bioinformatics 9 461

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al(2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejiacutea-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B EmilianiJ et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithmclustering plugin for Cytoscape BMC Bioinformatics 12 436

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq NatMethods 5 621-628

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Acids Res 37 D987-D991Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biologicalinformation in plants Bioinformatics 26 1267-1268

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics4 33

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-awareRNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009)Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencingexperiments Bioinformatics 29 2146-2152

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital geneexpression data Bioinformatics 26 139-140

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sales G Romualdi C (2011) parmigenemdasha parallel R package for mutual information estimation and gene network reconstructionBioinformatics 27 1876-1877

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicateaccuracy in a sea of changing annotations Database J Biol databases curation 2016

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro aplatform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

COB the co-expression browser PLoS One doi 101371journalpone0099193Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R packagePubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112-1115Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schwanhaumlusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian geneexpression control Nature 473 337-342

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mousecrosses Frontiers in Genetics 20134291

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges FrontPlant Sci 7 444

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a softwareenvironment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes withoutmeiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model basedindices BMC Bioinformatics 13 328

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expressionatlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015)STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression NetworksConstruction and Visualization Tool Front Plant Sci 6 1194

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev MolCell Biol 16 258-264

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omicnetworks in a developmental atlas of maize Science 353 814-818

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics16 S6

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170 pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


stabilizing transformation (VST), Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) are three popular normalization methods for RNA-Seq experiments (Mortazavi et al., 2008; Anders and Huber, 2010; Rau et al., 2013).

Some work has been done to evaluate the efficacy of different normalization methods for expression analysis. Giorgi et al. (2013) showed that VST normalization of RNA-Seq data resulted in a GCN with similar characteristics to a microarray-supported network in terms of coefficient and node degree distribution. Normalization with CPM, using the Trimmed Mean of M-values (TMM) to adjust for composition bias between RNA-Seq datasets by calculating normalization factors (Robinson et al., 2010), increased the robustness of analysis among diverse library sizes and compositions (Dillies et al., 2013a). These studies suggest that optimizing normalization methods might improve GCN performance.

There are several methods for gene network inference, including correlation, mutual information (MI), Bayesian network and probabilistic graphical models. Typically, correlation and MI methods are used for constructing large-scale GCNs with more than ten thousand genes (Krouk et al., 2013). Correlation methods include Pearson Correlation Coefficient (PCC), Spearman's correlation coefficient (SCC), Kendall rank correlation coefficient (KCC), Gini correlation coefficient (GCC) and Biweight midcorrelation (BIC) (Langfelder and Horvath, 2008; Kumari et al., 2012; Ma and Wang, 2012; Ballouz et al., 2015). Cosine similarity coefficient (CSC) has also been used for computing similarities in sparse datasets such as text (Dhillon and Modha, 2001) and protein-protein interaction data (Luo et al., 2015). MI methods include the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR) (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007). The network inference method might also influence GCN performance.

Several resources are already available for GCN analysis in maize, including COB (Schaefer et al., 2014), CORNET (De Bodt et al., 2012), CoP (Ogata et al., 2010), PLANEX (Yim et al., 2013) and ATTED-II (Obayashi et al., 2009). All of these databases except ATTED-II used PCC to build GCNs from 128 to 379 microarray datasets. ATTED-II recently updated its database to provide GCNs from both microarray and RNA-Seq data using PCC-based mutual rank (Aoki et al., 2015). Although PCC is widely used, there is very limited evidence that it is the optimal approach for GCN analyses.

GCNs could also be improved by meta-analysis using ranked aggregation from individual networks (Zhong et al., 2014; Ballouz et al., 2015; Wang et al., 2015a). By aggregating individual experiments, only interactions consistent among networks are preserved, which helps reduce noise and highlights conserved interactions. Furthermore, the ranked aggregation method provides a way to efficiently increase the size of the aggregated network with newly available datasets, and recalculation with all datasets is not required when a new one is added. This provides an efficient way to process and incorporate emerging information.

Herein, an extensive evaluation in constructing maize GCNs is reported. Three parameters were tested: normalization method, network inference algorithm and ranked aggregation method. To our knowledge, this is the first comprehensive attempt at optimizing GCN construction using plant RNA-Seq datasets. The network is publicly accessible at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php. A tutorial is also provided as supplemental material.

Results

Manually Curated Maize mRNA Expression Profiling from Publicly Available Datasets

Recently, the usage of RNA-Seq in maize has increased dramatically, from zero data entries in NCBI-SRA in 2008 to over nine hundred in 2016 (Fig. 1B). In contrast, the most widely used Affymetrix expression array for maize had 177 samples in 2008 but only 46 in 2016 (Fig. 1B). GCN construction approaches have not been optimized for RNA-Seq datasets in plants, and doing so could improve the quality and robustness of GCNs. To support a comprehensive evaluation of the effect of RNA-Seq normalization methods and network inference methods on the performance of GCNs, maize RNA-Seq datasets were compiled and processed with a computational pipeline (Supplemental Fig. 1). In total, 1266 high-quality RNA-Seq maize libraries from 17 different experiments were selected as input to an expression matrix. The corresponding experimental descriptions and publications, where available, of each library were manually checked for sample information (Supplemental Table S1). Also, a filter for read depth and alignment rate was used to remove unqualified libraries (see Methods for detail). Tissue type and haplotype from those libraries were manually curated and found to include a range of sample types (Supplemental Table S1). Shoot apical meristem (SAM), leaf and root were the top three most abundant tissue types, but a wide range of tissues were represented by multiple libraries in the dataset (Supplemental Fig. 1). The dataset also included multiple haplotypes, although B73 represented approximately 40% of the included libraries. To reduce noise, lowly expressed genes were removed from analysis, leaving 15116 nonredundant genes across the 1266 libraries. For comparative purposes, the Affymetrix GeneChip maize array includes 13339 genes before filtering (GeneChip Maize Genome Array, http://www.affymetrix.com/catalog/131468/AFFY/Maize+Genome+Array#1_1).

Three RNA-Seq Normalization Methods Show Comparable Distribution of Expression

Expression data from distinct sources and experiments can be highly variable because of hybridization artifacts in microarrays or variable sequencing depth in RNA-Seq. Many methods have been successfully used for normalizing both microarray and RNA-Seq data to correct for potential biases (Lim et al., 2007; Dillies et al., 2013b; Li et al., 2015b). To find an optimal normalization method for building a maize GCN from RNA-Seq data, three widely used normalization methods were compared. These included Variance Stabilizing Transformation (VST), Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) (Mortazavi et al., 2008; Anders and Huber, 2010; Rau et al., 2013). For all normalization methods, log2 transformation of the normalized expression values reduced the skew of the data distribution (Supplemental Fig. 2). Several network studies from plant RNA-Seq data used log2 transformation (Davidson et al., 2011; Ma and Wang, 2012; Giorgi et al., 2013; Stelpflug et al., 2015; Walley et al., 2016). In our analysis, genes with CPM > 2 in more than 1000 samples were included. This filter dramatically reduced zero count values in the raw data from 30.949% to 0.367%. Moreover, a prior count of one was added at log2 normalization (expression = log2(CPM/RPKM + 1)) to avoid problems with remaining zero values. The log2 transformation reduced skewed distributions and extreme values represented by outliers (Supplemental Fig. 2). Thus, we think it is important to apply log2 transformation for our data.
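
The filtering and transformation steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes CPM is computed by plain library-size scaling (no TMM factors), and the function names `cpm` and `filter_and_log2` and the toy count matrix are hypothetical. The thresholds default to the values stated in the text (CPM > 2 in more than 1000 samples) and are scaled down for the toy example.

```python
import numpy as np

def cpm(counts):
    """Counts Per Million: scale each library (column) to one million reads."""
    lib_sizes = counts.sum(axis=0, keepdims=True)
    return counts / lib_sizes * 1e6

def filter_and_log2(counts, min_cpm=2.0, min_samples=1000):
    """Keep genes with CPM > min_cpm in more than min_samples libraries,
    then return log2(CPM + 1) for the retained genes plus the keep mask."""
    values = cpm(counts.astype(float))
    keep = (values > min_cpm).sum(axis=1) > min_samples
    return np.log2(values[keep] + 1.0), keep

# Toy data (hypothetical): 3 genes (rows) x 4 libraries (columns).
toy = np.array([[10, 20, 30, 40],
                [0, 0, 1, 0],
                [90, 80, 69, 60]])
# The sample threshold is lowered because the toy matrix has only 4 libraries.
expr, kept = filter_and_log2(toy, min_cpm=2.0, min_samples=2)
```

In the toy matrix the second gene is expressed in a single library, so it is dropped, and the prior count of one keeps the logarithm defined for any zeros that remain.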

The distribution of gene expression across the 1266 libraries formed a bell-shaped curve with a small additional peak of low expression for all three methods (Supplemental Fig. 2). To determine if these low expression values came from a few or multiple libraries, elements within the range of expression that corresponded to the observed peak (< -3.7 CPM; Supplemental Fig. 2B) were extracted from the CPM-normalized expression matrix and matched to the originating libraries. This demonstrated that the low expression elements were not limited exclusively to specific libraries, but eight libraries contributed over 25% of low elements. A gene ontology enrichment analysis failed to identify significant gene ontology descriptors within the subset of 43 genes that were defined as lowly expressed (data not shown). All eight of these libraries were from pollen tissue, where the average gene expression at 147 Counts Per Million (CPM) is lower than the average gene expression of the other 79 tissues combined, at an average of 183 CPM. Hierarchical clustering and a correlation heatmap with the same data (Stelpflug et al., 2015) show the uniqueness of the pollen tissue expression pattern (Langfelder and Horvath, 2008) (Supplemental Fig. 3). When the lowly expressed elements from RPKM- and VST-normalized data were analyzed to determine library origin and GO enrichment (data not shown), we found a similarly high level of pollen-specific libraries without significant GO categories. In pollen, some highly expressed genes are considered orphan genes (Wu et al., 2014) because they lack detectable homologs in other species. To investigate whether these lowly expressed genes were orphan genes, their gene sequences were searched against the Setaria italica genome (JGIv2) using BLASTX (e-value < 1E-03). Setaria italica (foxtail millet) is a close relative of maize, from which it diverged 23.4 million years ago (MYA) as estimated by TimeTree (Kumar et al., 2017). Only 1 out of 43 genes lacked detectable homologs in Setaria italica (data not shown), indicating that the majority of these genes are not likely to be orphan genes.

Because RPKM normalization accounts for gene length, the distribution of gene length versus expression for the RPKM method was compared to data normalized by the VST and CPM methods. VST- and CPM-normalized data showed very similar overall patterns, with no clear linear relationship between gene length and average expression (Supplemental Fig. 2C). RPKM-normalized data displayed an apparent bias toward elevated expression of a small number of genes less than 5000 bp in length and lower expression of long genes, suggesting that this normalization method might skew the distribution of expression at some genes. Overall, in spite of these differences, the three normalization methods resulted in a similar distribution of expression patterns for most of the genes included in the analysis. Additional analysis was completed to determine if the three normalization methods influence network performance.

Network Performance Does Not Differ Based Upon Normalization Method

To compare the efficacy of three normalization and ten inference methods, a GCN was generated for each combination of normalization and inference methods. Furthermore, all networks were rank-standardized to limit the edge weights to the range from 0 to 1 (see Methods). All network evaluations used the whole adjacency matrix (15116 × 15116 in RNA-Seq networks; 11429 × 11429 or 17862 × 17862 in protein networks) without a cut-off. The performance of the different networks was measured by comparing the area under the receiver operating characteristic curve (AUROC). AUROC is a measurement used to evaluate the accuracy of classification models, making it suitable for evaluating GCNs (Gillis and Pavlidis, 2011; Ma and Wang, 2012; Liu et al., 2017). AUROC values range from 0 to 1, with a value closer to 1 indicating that the network is discriminating nonrandom patterns and perfect classification, random networks returning values close to 0.5, and values closer to 0 indicating a high degree of incorrect classification. While an AUROC value close to 1 is optimal, values over 0.7 suggest good performance when analyzing large, diverse networks (Gillis and Pavlidis, 2011). To set up the AUROC baseline for the random networks, maize gene IDs were shuffled 10 (for MRNET and CLR) or 1000 times (for PCC) from the normalized expression matrix. The randomized expression matrices were processed with the designated inference algorithms and further evaluated. The resulting AUROC values from randomized networks were very close to 0.5 (Supplemental Table S2).
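
To make the AUROC scale concrete, the sketch below computes it from the Mann-Whitney rank-sum statistic for a single gene: the scores are that gene's edge weights to every other gene, and the labels mark its known partners. The function name `auroc` and the toy vectors are hypothetical, and this is one standard way of computing AUROC rather than the specific implementation used in the study.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the Mann-Whitney rank-sum statistic; tied scores share an average rank."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = np.argsort(scores, kind="stable")
    ranks = np.empty(scores.size, dtype=float)
    ranks[order] = np.arange(1, scores.size + 1)
    # Replace the ranks of tied scores by their mean rank.
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical edge weights from one gene to six others, with three known partners.
weights = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
partners = [True, True, False, True, False, False]
perfect = auroc(weights, partners)               # partners all outrank non-partners
inverted = auroc(weights, np.logical_not(partners))
uninformative = auroc([0.5] * 6, partners)       # constant weights carry no signal
```

The three toy cases correspond to the three regimes described in the text: perfect classification, systematically wrong classification, and a network with no discriminating power.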

AUROC values were calculated and compared for three different network characteristics. The first characteristic was designed to test if the network identified genes with known or predicted co-expression patterns, based upon prior results and inclusion in two existing datasets that could serve as a positive control for co-expression. The maize metabolic pathway database (MaizeCyc) contains 413 pathways with more than two genes and was built based upon a collection of evidence from genome annotation, phylogenetic distance and known genes in maize, rice and Arabidopsis (Monaco et al., 2013). The maize protein-protein interaction database (PPIM) is based upon both predicted and experimentally detected protein interactions (Zhu et al., 2016) and was the second dataset used in this analysis. Only high-confidence interactions from PPIM were used, as defined by ranking in the top 5% in their model (Zhu et al., 2016). For comparison with the GCN, genes within the same MaizeCyc or PPIM pathways were considered co-expressed. The MaizeCyc and PPIM datasets were combined, and genes with fewer than 5 interactions were excluded from evaluation, creating a compiled dataset referred to herein as the Protein-Protein and Pathway dataset (PPPTY). PPPTY had 1720 genes and 104856 interactions that were used in this evaluation. The AUROC value was calculated for each of the 1720 gene terms.

To assess the effect of normalization method on GCNs, AUROC values for all ten inference methods were averaged for each of the three normalization methods. All three normalization methods scored similarly in comparison with the PPPTY dataset (Fig. 2B), with a mean AUROC value around 0.575 for each, suggesting that the predicted networks were more selective than a random network.

The second characteristic was the presence of similar gene ontology (GO) information for maize genes within a detected co-expression set, based upon "guilt by association", which assumes that specific subgroups of co-expressed genes have some shared functions (Wolfe et al., 2005). GO annotations were downloaded from AgriGO (Du et al., 2010), which uses signature integration by InterPro to map gene IDs to GO terms, rather than co-expression data. InterPro provided over 108 million stable GO terms to the functional protein information database UniProtKB at release 2016_01 (Sangrador-Vegas et al., 2016). Thus, the GO annotations provide a reliable evaluation resource independent of co-expression data. To assess this characteristic, gene ontology information was used in a neighbor voting algorithm (Gillis and Pavlidis, 2011) for sets of co-expression matrices and compared. Co-expression matrices were assessed by 3-fold cross-validation, which involved masking GO terms from some genes to test whether the masked GO terms could be predicted based upon gene expression patterns. In total, 277 GO terms were included for this analysis.
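
The cross-validated neighbor voting idea can be illustrated with a short sketch. It is written under stated assumptions rather than as a reproduction of the Gillis and Pavlidis implementation: each gene's vote is taken to be the weighted fraction of its neighbors that carry the term in the training set, and genes whose labels stay visible during a fold are excluded from that fold's AUROC. The names `neighbor_voting_cv`, `toy_net` and `toy_term` are hypothetical.

```python
import numpy as np

def auroc(scores, labels):
    """Probability that a positive outranks a negative (ties count one half)."""
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def neighbor_voting_cv(network, labels, n_folds=3, seed=0):
    """Mean AUROC over folds for one GO term.

    network: symmetric genes x genes edge-weight matrix with positive row sums.
    labels:  boolean vector, True where the gene carries the GO term
             (needs at least n_folds annotated genes).
    """
    rng = np.random.default_rng(seed)
    positives = rng.permutation(np.flatnonzero(labels))
    degree = network.sum(axis=1)
    fold_scores = []
    for held_out in np.array_split(positives, n_folds):
        train = labels.copy()
        train[held_out] = False                 # mask the term on held-out genes
        votes = network @ train.astype(float) / degree
        evaluate = ~train                       # held-out positives and true negatives
        fold_scores.append(auroc(votes[evaluate], labels[evaluate]))
    return float(np.mean(fold_scores))

# Toy network (hypothetical): genes 0-5 form one tight module, genes 6-11 another.
n = 12
toy_net = np.full((n, n), 0.1)
toy_net[:6, :6] = 0.9
toy_net[6:, 6:] = 0.9
np.fill_diagonal(toy_net, 0.0)
toy_term = np.arange(n) < 6                     # term carried by the first module
mean_auroc = neighbor_voting_cv(toy_net, toy_term)
```

Because the masked genes sit in the same module as the remaining annotated genes, their votes exceed those of every unannotated gene, which is the behavior guilt by association relies on.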

When GO characteristics were used to assess the networks, all three normalization methods performed similarly, but the AUROC values, at around 0.689 for each, were higher than those observed for comparisons with PPPTY (Fig. 2A). Because GO addresses gene functions and PPPTY emphasizes protein-protein interactions, this suggests that GCNs are better at predicting functional interactions than physical interactions. The p-values from one-way ANOVA testing the effect of normalization method on the PPPTY and GO datasets were 0.9535 and 0.4714, respectively, confirming that the normalization method did not create a significant difference in the AUROC scores associated with the GCNs for the characteristics that were tested.

Finally, proteins that regulate gene expression or modify chromatin structure might interact with the DNA of a subset of co-expressed genes. The interactions between such a protein and regulated DNA can be detected by chromatin immunoprecipitation of associated DNA followed by DNA sequencing (ChIP-Seq). In maize, there are five ChIP-Seq datasets available (Bolduc et al., 2012; Morohashi et al., 2012; Li et al., 2015a; Pautler et al., 2015; Yang et al., 2016), some of which involve lowly expressed or tissue-specific genes. For example, Opaque2 is specifically expressed in endosperm (Li et al., 2015a), Knotted1 is expressed in SAM and floral tissues (Bolduc et al., 2012), and Pericarp Color1 has low expression except in inflorescence and seed (Morohashi et al., 2012). Histone Deacetylase 101 (HDA101) ChIP-Seq data provided the largest dataset for comparison, with 26 confirmed binding targets that are relatively highly expressed in most maize tissues (Yang et al., 2016). Histone deacetylation often correlates with a decrease in gene expression (Verdin and Ott, 2014). High-confidence HDA101 targets were defined as those discovered by ChIP-Seq that also showed increased gene expression in the hda101 mutant. Networks associated with the 26 high-confidence HDA101 targets were compared by calculating AUROC. Based upon this analysis, the AUROC values were very similar among networks normalized by VST, CPM and RPKM (Fig. 2C), which is consistent with the GO and PPPTY evaluations.

Correlation Methods Perform Better than Mutual Information at Some Genes

After normalization of the expression matrices, they can be processed by different methods for GCN inference. To optimize this step, the AUROC values of six correlation (PCC, SCC, KCC, GCC, BIC, CSC) and four mutual information (MI) methods (AA, MA, MRNET, CLR) were compared for the expression matrices that were generated from each of three normalization methods (VST, CPM, RPKM) and then averaged. In general, correlation methods are more computationally efficient, while MI methods are able to reveal non-linear relationships (Li et al., 2015c). PCC is widely used but may be influenced by outliers (Mukaka, 2012). SCC, KCC and BIC are less sensitive to outliers because SCC and KCC only consider the rank information and BIC calculates based on the dataset median instead of the mean (Serin et al., 2016). Recently, GCC has been shown to be a better correlation method for gene expression analysis because of its capacity to detect non-linear relationships and insensitivity to outliers (Ma and Wang, 2012). CSC is widely used for text mining and analyzing sparse data with many zeros (Dhillon and Modha, 2001). ARACNE, MRNET and CLR showed extended gene-dependent relationships under variable biological settings (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007; Li et al., 2013b). To estimate the effectiveness of the inference methods, the same testing parameters with AUROC calculations were used as described for the testing of normalization methods.
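
The difference in outlier sensitivity between PCC and a rank-based method such as SCC can be seen on a small synthetic example. The sketch below is illustrative only: the two expression profiles are made up, and `spearman` is a simplified helper that assumes no tied values.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    def rank(v):
        return np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

# Synthetic profiles across 10 samples (hypothetical, not data from the study).
gene_a = np.arange(1.0, 11.0)
gene_b = gene_a ** 3                 # monotonic but non-linear in gene_a
gene_b_outlier = gene_b.copy()
gene_b_outlier[0] = 5000.0           # one aberrant sample

pcc_clean, scc_clean = pearson(gene_a, gene_b), spearman(gene_a, gene_b)
pcc_noisy, scc_noisy = pearson(gene_a, gene_b_outlier), spearman(gene_a, gene_b_outlier)
```

On the clean profiles the rank-based coefficient is 1 while PCC falls below 1 because the relationship is not linear. With the single aberrant sample, PCC changes sign, whereas SCC decreases but stays positive.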

Assessed by GO datasets, the 277 AUROC values were averaged to create one average value for each of the 10 inference methods, ranging from 0.620 to 0.724 (Fig. 2D). The average AUROC across all normalization methods for the six correlation methods was 0.718, while the average AUROC for all four MI methods was 0.646. The majority of the 277 GO terms had similar AUROC values in the different correlation method-generated GCNs, and these patterns are different from those observed in the MI-generated GCNs (Fig. 3A). The similarity among different methods was also detectable by pairwise comparison and by comparing Pearson correlations between the different methods (Supplemental Fig. 4A).

To evaluate network inference methods with the PPPTY dataset, the AUROC values for 1720 genes were averaged for each combination of normalization and inference methods (Fig. 2E). This evaluation also showed that the networks constructed using correlation methods resulted in higher AUROC values than MI methods, although the CSC method resulted in lower AUROC values than other correlation methods. As demonstrated for the GO evaluation, results from correlation methods were more similar to each other than to those from the MI methods (Supplemental Fig. 4B). Interestingly, heatmap results indicated that a subset of genes consistently had higher AUROC values when CSC, MRNET/CLR or AA/MA were used (Fig. 3B), although this includes a small enough number of genes that the average AUROC value over the whole gene set was relatively low for those methods. The gene sets with the highest AUROC values in PCC, CSC or MRNET were extracted. Characteristics of each gene set were compared in terms of average expression (CPM) and average number of lowly expressed elements (CPM < 0). The CSC gene set had the smallest number of low expression elements and had higher average expression than both the 1720 gene set and the PCC gene set (Supplemental Fig. 5). This may indicate that the CSC method is better at determining co-expression for highly expressed genes.

The AUROC values from the 26 targets of the HDA101 ChIP-Seq dataset reveal that the CSC GCN had the highest AUROC value, and the use of MRNET/CLR GCNs resulted in slightly higher scores than correlation methods (Fig. 2F). This could be explained by the small number of targets creating skewed results, but may also indicate that CSC/MI methods are more suitable for specific types of genes or interactions between genes (Tzfadia et al., 2016). HDA101 is a highly expressed gene in all samples, with an average expression value of 864 CPM and a minimum expression of 289 CPM, so it is possible that HDA101 is more suitable for the CSC method. CPM and RPKM normalization methods had higher AUROC values than VST (Fig. 2C). Using two models of ARACNE (additive, AA, and multiplicative, MA), the co-expression matrices contained less than 0.5% non-zero values for all comparisons, and so these techniques were not included in any additional analyses.

In conclusion, our results indicated that the widely used correlation methods resulted in a more predictive maize GCN from a single expression matrix, but co-expression with some individual genes may be better detected using MI methods. Normalization method did not have a substantial influence on GCN performance, so only CPM normalization was used in conjunction with PCC, SCC, MRNET and CLR inference for subsequent optimization of other parameters.

Increasing Sample Size Had a Positive Effect on GCN

GCN analysis can be accomplished with a variable number of samples and datasets, but sample size can influence the quality of the resulting GCN (Wei et al., 2004; Ballouz et al., 2015). Separate analyses were conducted with different numbers of samples and experiments to empirically determine the effect of sample number on GCN effectiveness. The data in our analysis consisted of 17 experiments, each including between 12 and 404 libraries. For this analysis, the CPM normalization method followed by each of four inference methods (PCC, SCC, MRNET and CLR) was applied to the 17 experiments, and the 68 resulting networks were evaluated by both GO and PPPTY.

In both the GO and PPPTY evaluations, all algorithms exhibited a positive linear relationship between the natural logarithm of sample size and average AUROC values (Fig. 4). The linear relationships were stronger for the PCC and SCC methods, with higher r-squared values, indicating that correlation methods benefit more from increasing sample size. Thus, for building correlation-based GCNs, as many samples as possible should be included. We also found that, as seen for the total GCN analysis, PCC and SCC had higher average AUROC values than the MRNET and CLR methods for PPPTY and GO analysis for most of the individual networks (Fig. 5).
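
The relationship described here, AUROC regressed on the natural logarithm of sample size, can be fitted with ordinary least squares. The sketch below is a generic illustration: the function name `log_linear_fit` is hypothetical, and the example values are synthetic points placed exactly on a line, not measurements from the 17 experiments.

```python
import numpy as np

def log_linear_fit(sample_sizes, aurocs):
    """Least-squares fit of AUROC = slope * ln(n) + intercept.

    Returns (slope, intercept, r_squared).
    """
    x = np.log(np.asarray(sample_sizes, dtype=float))
    y = np.asarray(aurocs, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    r_squared = 1.0 - residual.var() / y.var()
    return float(slope), float(intercept), float(r_squared)

# Synthetic illustration only: points generated from a known line in ln(n).
sizes = np.array([12, 50, 100, 404])
synthetic_auroc = 0.5 + 0.02 * np.log(sizes)
slope, intercept, r_squared = log_linear_fit(sizes, synthetic_auroc)
```

A larger slope with a higher r-squared for one inference method than another would correspond to the statement that the method benefits more consistently from additional samples.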

Ranked Aggregation of Networks Improved Performance of GCNs

Ranked aggregation for meta-analysis can also be used to change the outcomes of GCN by buffering the effect of sample heterogeneity (Zhong et al., 2014; Wang et al., 2015a; Asnicar et al., 2016). Aggregated rank-standardized correlation/MI matrices were calculated from separate experiments to determine if this approach enhanced GCN performance. Aggregating individual networks together for meta-analysis can help to highlight true co-expression interactions and reduce noise (Zhong et al., 2014; Wang et al., 2015a; Wang et al., 2015b). This analysis was conducted with the 17 differently sized experiments using the PCC, SCC, MRNET and CLR methods for GCN inference as we did previously, resulting in 68 single GCNs. The 17 experiments were aggregated for PCC, SCC, MRNET and CLR individually and evaluated by the GO and PPPTY datasets.
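
One simple way to realize rank standardization and ranked aggregation is sketched below. The exact scheme is defined in the Methods, which are not reproduced in this section, so the details here are assumptions: edges are ranked over the upper triangle with larger weights treated as stronger, ties are broken by position, the diagonal is set to zero, and aggregation averages the per-experiment standardized ranks before ranking again. The names `rank_standardize` and `aggregate` and both toy matrices are hypothetical.

```python
import numpy as np

def rank_standardize(matrix):
    """Replace off-diagonal weights of a symmetric matrix by ranks scaled to [0, 1]."""
    n = matrix.shape[0]
    upper = np.triu_indices(n, k=1)
    values = matrix[upper]
    # Rank 0 is the weakest edge; a stable sort breaks ties by position.
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable").astype(float)
    scaled = ranks / max(values.size - 1, 1)
    out = np.zeros((n, n))
    out[upper] = scaled
    return out + out.T

def aggregate(networks):
    """Average rank-standardized networks, then rank-standardize the average."""
    mean_rank = np.mean([rank_standardize(m) for m in networks], axis=0)
    return rank_standardize(mean_rank)

# Two toy 4-gene networks standing in for two experiments (hypothetical weights).
exp_one = np.array([[0.0, 0.9, 0.1, 0.2],
                    [0.9, 0.0, 0.3, 0.4],
                    [0.1, 0.3, 0.0, 0.8],
                    [0.2, 0.4, 0.8, 0.0]])
exp_two = np.array([[0.0, 0.7, 0.2, 0.1],
                    [0.7, 0.0, 0.3, 0.5],
                    [0.2, 0.3, 0.0, 0.6],
                    [0.1, 0.5, 0.6, 0.0]])
standardized = rank_standardize(exp_one)
combined = aggregate([exp_one, exp_two])
```

Under this averaging scheme, an edge that ranks highly in every experiment stays at the top of the aggregated network, while an edge that is strong in only one experiment is pulled toward the middle.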


Of the 4 aggregated networks that were evaluated, the two correlation methods (PCC and SCC) had higher AUROC values than the single network from 1266 samples (Fig. 6 and Supplemental Fig. 6). However, this aggregation strategy did not result in significantly higher AUROC scores for the MRNET and CLR method networks compared with single networks with 1266 samples (two-tailed Wilcoxon rank test for GO evaluation, p-values 0.494 and 0.796). It has been reported that MI estimation accuracy is dependent on sample size (Gao et al., 2015); therefore, individual MI networks built with a small number of libraries may not demonstrate improved accuracy from aggregation. In conclusion, the PCC/SCC-built GCN performed best using a ranked aggregation strategy, and use of this strategy in combination with the other optimized parameters creates a robust GCN.

The Performance of Protein Networks Did Not Exceed Aggregation Networks

In many cases, mRNA levels in a cell are of interest because mRNA level is thought to be related to the level and function of a protein of interest. However, many researchers have found inconsistencies between mRNA and protein levels (Baerenfaller et al., 2008; Schwanhäusser et al., 2011; Ponnala et al., 2014; Walley et al., 2016). Although relatively little protein expression data are available, these data are amenable to GCN construction and could represent a more direct reflection of interacting proteins. Using a non-modified protein expression atlas from 23 maize tissues based upon mass spectrometry data (Walley et al., 2016), four protein networks were built with PCC, SCC, MRNET and CLR separately and then evaluated using the same PPPTY and GO datasets as previously mentioned.

GCNs constructed from protein expression did not exhibit superior AUROC values to those observed for RNA-Seq-based GCNs using the aggregation strategy (Fig. 6). When evaluated by the GO and PPPTY datasets, the performance of the protein network was lower than that of the aggregated network as well as the single network from 1266 samples. To confirm this result, a two-way ANOVA was computed with pairwise comparison for the GO evaluation, which showed that the effect of network type was significant (Supplemental Table S3). A subsequent pairwise comparison using the Wilcoxon rank sum test indicated that the PCC/SCC methods were significantly better than MRNET/CLR (Supplemental Table S3), although MI methods may be superior for some types of interactions.

The raw protein expression data included 17862 genes, of which 11429 genes overlapped with our RNA-Seq-based network and were therefore used for the analysis. To demonstrate that the performance of the protein network was not biased by the selection of genes, the PCC method was used for the whole set of 17862 genes to construct a protein network (Supplemental Fig. 7). No improvement could be detected from the protein network derived from 17862 genes, with p-values of 0.635 for GO evaluation and 0.995 for PPPTY evaluation from a one-sided Wilcoxon rank sum test.

PCC and SCC-built GCN Exhibit Identical Topological and Functional Properties

In addition to evaluation of network performance based upon biological characteristics, networks can be compared based upon several different network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules, and number of genes in the largest module. Number of nodes is a basic construct in graph theory depicting the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in networks. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics like protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC- and SCC-built ranked aggregation networks from 17 experiments (PA and SA), and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.
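
The two degree-based measures named above can be illustrated with a short sketch. It is not the authors' code (the paper computed these values in Cytoscape); it assumes an undirected edge list without duplicates or self-loops, the Dong and Horvath style definitions (heterogeneity as the coefficient of variation of node degree, centralization as n/(n-2) * (max degree/(n-1) - density)), and a population variance. The function name `degree_summary` is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def degree_summary(edges):
    """Return (n_nodes, density, heterogeneity, centralization).

    Assumes an undirected edge list with no duplicate edges or self-loops
    and more than two nodes.
    """
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    k = list(degree.values())
    n = len(k)
    mean_k = sum(k) / n
    # Population variance of the degree distribution (an assumption).
    var_k = sum((x - mean_k) ** 2 for x in k) / n
    density = mean_k / (n - 1)
    heterogeneity = sqrt(var_k) / mean_k
    centralization = n / (n - 2) * (max(k) / (n - 1) - density)
    return n, density, heterogeneity, centralization


# A star (one hub, four leaves) is maximally centralized; a ring is not.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
print(degree_summary(star))
print(degree_summary(ring))
```

In this toy comparison the star plays the role of a hub-dominated network such as MS, while the ring has perfectly even connectivity.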

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the observed highest centralization value for this network. Centralization and heterogeneity are two measures that model the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity, while a network with larger values of centralization and heterogeneity may contain a larger number of hubs, or the number of hubs may not be significantly large but the degree distribution is extremely imbalanced. In biological networks, many observations connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8). The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 83.5% unique genes (Fig. 7A). 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of the top interacting genes (27/148), which was expected because of their cellular abundance and involvement with translation. Interestingly, nine epigenetic regulators were found in the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a) and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).
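
The power-law check on the node degree distribution mentioned above can be sketched as a straight-line fit on log-log axes. This is an illustration only: the paper obtained its fits from Cytoscape, and the ordinary least-squares fit of log(count) on log(degree), along with the function name `power_law_fit`, is an assumption made here.

```python
from collections import Counter
from math import log


def power_law_fit(degrees):
    """Fit log(count) = a + b * log(degree) by least squares.

    Returns (b, r_squared). Assumes positive degrees, at least two distinct
    degree values, and counts that are not all identical.
    """
    counts = Counter(degrees)
    xs = [log(k) for k in counts]           # log of each distinct degree
    ys = [log(c) for c in counts.values()]  # log of how many nodes have it
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy * sxy / (sxx * syy)
    return slope, r_squared


# Toy degree list in which the number of nodes falls as degree ** -2.
toy_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
print(power_law_fit(toy_degrees))
```

A slope near -2 with an r-squared near 1 is what an ideal scale-free degree distribution produces in this toy case; the maize networks were reported only as exceeding an r-squared of 0.7.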

To reveal the underlying properties of GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules than the PA and SA networks. Most genes in the MS network clustered into one very large module of 14054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Table S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, and so these methods appear to be better for module detection.
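
For readers unfamiliar with MCL, the following is a minimal dense-matrix sketch of its expansion and inflation loop. It is not the MCL program used in the paper; the added self-loops, the convergence tolerance, the cluster read-out threshold and the function names are assumptions, and the approach is only practical for very small graphs. The default inflation of 1.8 matches the value reported in Materials and Methods.

```python
def normalize_columns(m):
    """Scale every column to sum to one (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=1.8, max_iter=100, tol=1e-9):
    """Cluster a small undirected graph given as a square adjacency matrix."""
    n = len(adjacency)
    # Self-loops are added (an assumption); they keep column sums non-zero.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power, then re-normalize each column.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row that still carries weight lists the members of one cluster.
    clusters = {tuple(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(list(c) for c in clusters if c)


# Two disconnected triangles separate into two modules.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(toy))
```

Higher inflation values generally yield more, smaller clusters, which is why the inflation setting matters when comparing module counts between networks.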

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA and MS were merged together, and the intersection of the three produced a merged network of 14277 genes and 106591 interactions. PA and SA shared 83.5% of common interactions within the networks, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
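
The merging step described above amounts to a set intersection over undirected edges. The sketch below assumes each network is a dictionary from gene pairs to edge weights and treats an edge as an unordered pair; the function names and toy gene labels are hypothetical and not taken from the paper.

```python
def top_edges(weights, k):
    """Return the k highest-weighted edges as a set of unordered gene pairs."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {frozenset(pair) for pair, _ in ranked[:k]}


def merge_networks(networks, k):
    """Intersect the top-k edge sets of several networks.

    Returns (shared_edges, genes_in_shared_edges).
    """
    edge_sets = [top_edges(weights, k) for weights in networks]
    shared = set.intersection(*edge_sets)
    genes = set().union(*shared) if shared else set()
    return shared, genes


# Toy stand-ins for the PA, SA and MS networks.
pa = {("g1", "g2"): 0.9, ("g2", "g3"): 0.8, ("g3", "g4"): 0.1}
sa = {("g2", "g1"): 0.95, ("g2", "g3"): 0.7, ("g1", "g4"): 0.2}
ms = {("g1", "g2"): 0.6, ("g3", "g4"): 0.5, ("g2", "g3"): 0.4}
shared, genes = merge_networks([pa, sa, ms], k=2)
print(sorted(sorted(edge) for edge in shared), sorted(genes))
```

Using `frozenset` for each pair means that ("g1", "g2") and ("g2", "g1") count as the same interaction, which matches the undirected nature of co-expression edges.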

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to the existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected for these 214 genes, cell wall-related GO terms were enriched (Fig. 7D; Supplemental Table S9).

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database to retrieve homologs and their annotations using BLASTP. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The results of the literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall-related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1 and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14 and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6) and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes. 48% (36/75) of co-expressed genes were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial is provided as supplemental material on how to use Cytoscape to visualize any co-expressed genes in our network (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, the VST, CPM and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, where normalization has a limited effect. The same is true for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study in human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods detect different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed a more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR together did not result in a network that outperformed the PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq-based maize GCN. This optimization may apply to a range of datasets that share characteristics with maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Processing

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). The adapters for the fastq files were trimmed by Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked by FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). 26 libraries with fewer than 5 million total reads and 11 libraries with an overall alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined by Snakemake v3.7.1 (Köster and Rahmann, 2012).

Gene Count Normalization

The expression data was normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated by the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM/RPKM + 1)). For both methods, scale factors between samples were estimated by Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated by the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in additional analysis (15116 genes).
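
The arithmetic behind the CPM, RPKM and log2(x + 1) steps can be sketched as follows. This is a simplified illustration, not a reimplementation of edgeR: the TMM scale factors used in the paper are omitted, the library size is passed in directly as an assumption, and the function names are hypothetical.

```python
from math import log2


def cpm(counts, library_size):
    """Counts per million for one library (no TMM scaling applied)."""
    return [c * 1e6 / library_size for c in counts]


def rpkm(counts, lengths_bp, library_size):
    """Reads per kilobase of gene length per million reads (no TMM scaling)."""
    return [c * 1e9 / (length * library_size) for c, length in zip(counts, lengths_bp)]


def log2_plus_one(values):
    """The log2(x + 1) transform applied after CPM or RPKM."""
    return [log2(v + 1) for v in values]


# One toy library with three genes.
counts = [10, 0, 990]
lengths_bp = [2000, 1500, 1000]
library_size = sum(counts)
print(log2_plus_one(cpm(counts, library_size)))
print(log2_plus_one(rpkm(counts, lengths_bp, library_size)))
```

Adding one before taking the logarithm keeps genes with zero counts at an expression value of zero rather than negative infinity.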

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated by the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value given a rank of one. Ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights ranged from zero to one.
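
The rank standardization described above can be sketched in a few lines. The treatment of ties (tied weights share their average rank, which keeps a symmetric matrix symmetric) is an assumption made here; the text specifies only that the smallest value receives rank one and that ranks are divided by the number of matrix elements. The function name is hypothetical.

```python
def rank_standardize(matrix):
    """Rank all weights, scale ranks into (0, 1], and set the diagonal to one."""
    n = len(matrix)
    flat = sorted(matrix[i][j] for i in range(n) for j in range(n))
    # Map each distinct value to the mean of the 1-based ranks it occupies.
    rank_of = {}
    start = 0
    while start < len(flat):
        end = start
        while end + 1 < len(flat) and flat[end + 1] == flat[start]:
            end += 1
        rank_of[flat[start]] = (start + 1 + end + 1) / 2
        start = end + 1
    total = n * n
    ranked = [[rank_of[matrix[i][j]] / total for j in range(n)] for i in range(n)]
    for i in range(n):
        ranked[i][i] = 1.0
    return ranked


# A toy correlation matrix for three genes.
pcc = [
    [1.0, 0.2, -0.5],
    [0.2, 1.0, 0.8],
    [-0.5, 0.8, 1.0],
]
for row in rank_standardize(pcc):
    print(row)
```

Because only the ordering of weights survives this step, networks built from correlation and mutual information methods end up on a common scale.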

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM- or VST-normalized expression matrices. Networks were then inferred from the randomized expression matrices by the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.


Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions were used for evaluation, defined as those ranking in the top 5% of their results. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered as co-expressed. Third, maize gene ontology data for AGPv3.30 was downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq-confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. Basically, it utilizes the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes' GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were analyzed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The p-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).
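
As a reminder of what an AUROC value measures, the sketch below computes it from edge weights and known positive pairs using the rank-sum identity: the AUROC equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. It is an independent illustration, not the ROCR or EGAD code used in the paper, and the function name is hypothetical.

```python
def auroc(scores, labels):
    """AUROC via average ranks; labels are truthy for positives.

    Assumes at least one positive and one negative label.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average 1-based rank of the tie group
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Edge weights for four gene pairs; 1 marks a pair known to interact.
print(auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

A value of 0.5 corresponds to random ordering, which is why randomized networks serve as the baseline in the evaluation.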

Definition of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts two genes are co-expressed but they are not co-expressed in the PPPTY dataset; TN, a network predicts two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts a gene does not have a specific GO term and it does not have that GO term in our GO dataset; FN, a network predicts a gene does not have a specific GO term but it has that GO term in our GO dataset.

Network Clustering and Characterization

For each network, the top 1 million edges were selected as stringent co-expression networks. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distribution were plotted by the NetworkAnalyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) by MCL v14-137 with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes involved in our networks were used as background references. Hypergeometric testing was used to calculate p-values, for which a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
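
The hypergeometric enrichment p-value for a single GO term can be written out directly. The sketch below computes the one-sided upper tail, i.e. the probability of seeing at least the observed overlap by chance; whether AgriGO uses exactly this tail definition is an assumption, the multiple-testing correction is not included, and the function name and the numbers in the usage line are illustrative only.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `background` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 15116 background genes (as in the paper), a
# hypothetical GO term with 150 annotated genes, and 214 selected genes
# of which 12 carry the term.
print(hypergeom_enrichment_p(15116, 150, 214, 12))
```

Exact integer binomial coefficients avoid the rounding problems that factorial-based floating-point formulas run into at this background size.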

Database Comparison on Cell Wall Pathway

Sixteen well-characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05, and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4, with the maximum number of interactors set to 100. 76 genes with 817 interactions were retrieved. Maize proteins were blasted against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.

Supplemental Figure 2. Distribution of gene expression values.

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.

Supplemental Figure 4. Pairwise comparison among results of inference methods.

Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY gene set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET).

Supplemental Figure 6. Evaluation of network performance based on sample size and inference.

Supplemental Figure 7. GCN performance comparison between protein networks.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS).

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network).

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).

Supplemental Table S1. RNA-Seq libraries used in this analysis.

Supplemental Table S2. Random network AUROC value baseline.

Supplemental Table S3. ANOVA tables and pairwise comparisons.

Supplemental Table S4. Topological characteristics of four maize networks.

Supplemental Table S5. Gene Ontology annotation for 148 hub genes.

Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S8. 16 query genes in the maize cell wall pathway.

Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.

Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.

Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.

Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.

Supplemental Dataset S1. The merged network in Cytoscape-ready format.

Supplemental Dataset S2. Tutorial: Visualizing co-expression data in Cytoscape.

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; the S1266 network included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

684

Fig. 6. GCN performance comparison among the single network (white, "1266"), aggregated network (grey, "agg"), and protein network (dark grey, "pr") using PCC, SCC, MRNET, and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 x 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes. In all panels, red nodes are the 16 query genes, cyan nodes are genes with reported function in cell wall-related pathways in plants, and dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported into the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM, Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC, Multi-parent Advanced Generation InterCrosses. Genotypes represented by more than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, with the mean and standard deviation taken from the log2-transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
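For readers unfamiliar with the units in this legend, the textbook definitions of CPM and RPKM can be written as a short Python sketch. This is not the authors' pipeline: it ignores library normalization factors that dedicated packages may apply, the pseudocount of 1 before the log2 transform is an assumption, and the counts and gene lengths are made up. The last lines mirror the role of dnorm() by evaluating a normal density with the mean and standard deviation of the log2 values.

```python
import math
import statistics

def cpm(counts):
    """Counts per million: read count scaled by library size."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def log2_transform(values, pseudocount=1.0):
    """log2 transform; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

# Hypothetical library with three genes.
counts = [100, 300, 600]
lengths = [1000, 2000, 500]

cpm_vals = cpm(counts)
rpkm_vals = rpkm(counts, lengths)
log2_cpm = log2_transform(cpm_vals)

# Normal density with the mean and standard deviation of the log2 values,
# analogous to what dnorm() evaluates in R.
mu = statistics.mean(log2_cpm)
sd = statistics.stdev(log2_cpm)
density_at_mean = statistics.NormalDist(mu, sd).pdf(mu)
```

Dividing by gene length is what distinguishes RPKM from CPM, which is why panel C plots expression against gene length.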

Supplemental Figure 3. CPM-normalized, log2-transformed maize gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST-, CPM-, and RPKM-normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST-, CPM-, and RPKM-normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY set (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference method. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2", and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. Dashed lines show the average AUROC value of the 17 individual networks in each category. Mean values of each network are marked with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
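The outlier rule stated in this legend (more than 1.5 times the interquartile range beyond the quartiles) can be sketched in Python as follows. The quartile interpolation method is an assumption, since different software packages compute quartiles slightly differently, and the AUROC values are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    # Assumption: quartiles interpolated with the "inclusive" method.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical AUROC values for one network.
auroc_values = [0.50, 0.52, 0.53, 0.55, 0.56, 0.58, 0.90]
outliers = iqr_outliers(auroc_values)
```

In this toy example only the value 0.90 falls outside the fences, so it would be drawn as an individual dot in a box plot.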

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the fitted power-law distribution. The R2 value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node degree distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to a gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the fitted power-law distribution, with the function and R2 indicated beside it.
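A node degree distribution and a power-law fit of the kind described in this legend can be sketched in Python. The fit below uses least squares on log-transformed values, which is one common approach and is only assumed here; the legend does not say how the curve was fitted. The edge list, gene names, and distribution are toy data, and the function names are hypothetical.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Map each node degree to the number of nodes with that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())

def fit_power_law(dist):
    """Fit count = a * degree**b by least squares on log-log values."""
    x = [math.log(k) for k in dist]
    y = [math.log(dist[k]) for k in dist]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((yi - (log_a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return math.exp(log_a), b, 1 - ss_res / ss_tot

# Toy network: one hub gene connected to three others.
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4")]
toy_dist = degree_distribution(toy_edges)

# Synthetic distribution that follows count = 1000 * degree**-2 exactly.
synthetic = {k: 1000.0 * k ** -2 for k in (1, 2, 4, 5, 10)}
a, b, r2 = fit_power_law(synthetic)
```

A negative exponent b with R2 close to 1 is the pattern usually read as a scale-free-like degree distribution.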


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in color. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R package version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC 1055 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Page | 30

Bioinformatics 12 290 1056

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene 1057 network reconstruction PLoS One 9 e106319 1058

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation 1059 Plant Cell Physiol 56 195ndash214 1060

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein 1061 interaction database for Maize Plant Physiol 170 pp1501821- 1062

Zyprych-Walczak J Szabelska A Handschuh L Goacuterczak K Klamecka K Figlerowicz M Siatkowski I (2015) 1063 The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015 1064

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO dataset comparisons for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using nine inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using nine inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using nine inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
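
The AUROC metric used in this and the following figures can be illustrated with a small rank-based calculation. The Python sketch below is not the authors' implementation; the function name `auroc` and the toy scores and labels are assumptions made for illustration. It relies on the equivalence between AUROC and the normalized Mann-Whitney U statistic, with tied scores given average ranks.

```python
def auroc(scores, labels):
    """AUROC from the Mann-Whitney U statistic (illustrative helper, not from the paper).

    scores: predicted association strengths, higher means more likely positive.
    labels: 1/True for known positives (e.g. genes annotated to a GO term), 0/False otherwise.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Assign 1-based ranks in ascending score order; ties share their average rank.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    n_pos = sum(1 for y in labels if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # U counts (positive, negative) pairs in which the positive is ranked higher.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Toy data (made up): two positives, two negatives.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auroc(toy_scores, toy_labels))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the box plots in this figure are read.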

Figure 3. Similarity between nine inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values colored from negative (blue) to positive (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
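
The scaling step mentioned in this caption can be sketched as a z-score transform applied to the AUROC values of one GO term or gene across methods. This is a minimal Python illustration under assumptions: the caption does not state which standard deviation was used, so the sample form (n - 1 denominator, as in R's `scale()`) is assumed here, and `scale_to_z` and the example values are hypothetical.

```python
from statistics import mean, stdev


def scale_to_z(values):
    """Center values on their mean and divide by the sample standard deviation.

    Illustrative helper (not from the paper). Requires at least two values.
    Returns all zeros when the values are constant, to avoid dividing by zero.
    """
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation, n - 1 denominator (assumed)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Toy AUROC values for one GO term under three hypothetical methods.
toy_auroc = [0.5, 0.6, 0.7]
print(scale_to_z(toy_auroc))
```

After scaling, negative values mark methods that performed below the row average and positive values mark methods that performed above it, which matches the blue-to-red coloring described in the caption.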

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
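
The logarithmic regression described in this caption models average AUROC as a linear function of log(sample size). The authors report using R's lm(); the Python sketch below reproduces only the ordinary least-squares fit and R-squared, not the p-value, and the function name `fit_log_model` and the data points are made up for illustration.

```python
import math


def fit_log_model(sample_sizes, avg_auroc):
    """Fit avg_auroc = intercept + slope * ln(sample_size) by ordinary least squares.

    Illustrative helper (not from the paper). Returns (intercept, slope, r_squared).
    Assumes at least two distinct sample sizes and non-constant AUROC values.
    """
    if len(sample_sizes) != len(avg_auroc) or len(sample_sizes) < 2:
        raise ValueError("need at least two paired observations")

    x = [math.log(n) for n in sample_sizes]  # natural logarithm, as in the caption
    y = list(avg_auroc)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Synthetic example: points generated exactly from 0.55 + 0.02 * ln(n).
sizes = [12, 100, 500, 1266]
toy_auroc = [0.55 + 0.02 * math.log(n) for n in sizes]
intercept, slope, r2 = fit_log_model(sizes, toy_auroc)
print(intercept, slope, r2)
```

A positive slope in such a fit means that each multiplicative increase in sample size adds a roughly constant amount to the average AUROC, so gains diminish as more libraries are added.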

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.

[Figure 7 graphic. Panels A and B: Venn diagrams of the MS, PA, and SA networks. Panels C and D: GO term enrichment graphs labeled "Hub" (terms include chromatin assembly, nucleosome organization, DNA packaging, DNA replication, DNA repair, gene silencing, and epigenetic regulation of gene expression) and "Cell Wall" (terms include cellulose biosynthetic process, glucan metabolic process, cell wall biogenesis, plant-type cell wall organization, hemicellulose metabolic process, and phenylpropanoid metabolic process).]

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions.

Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. EURASIP J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170 pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


publicly accessible at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php. A tutorial is also provided as supplemental material.

Results

Manually Curated Maize mRNA Expression Profiling from Publicly Available Datasets

Recently, the usage of RNA-Seq in maize has increased dramatically, from zero data entries in NCBI-SRA in 2008 to over nine hundred in 2016 (Fig. 1B). In contrast, the most widely used Affymetrix expression array for maize had 177 samples in 2008 but only 46 in 2016 (Fig. 1B). GCN construction approaches have not been optimized for RNA-Seq datasets in plants, and doing so could improve the quality and robustness of GCNs. To support a comprehensive evaluation of the effect of RNA-Seq normalization methods and network inference methods on the performance of GCNs, maize RNA-Seq datasets were compiled and processed with a computational pipeline (Supplemental Fig. 1). In total, 1266 high-quality RNA-Seq maize libraries from 17 different experiments were selected as input to an expression matrix. The corresponding experimental descriptions and publications, where available, of each library were manually checked for sample information (Supplemental Table S1). In addition, a filter for read depth and alignment rate was used to remove unqualified libraries (see Methods for detail). Tissue type and haplotype from those libraries were manually curated and found to include a range of sample types (Supplemental Table S1). Shoot apical meristem (SAM), leaf, and root were the three most abundant tissue types, but a wide range of tissues were represented by multiple libraries in the dataset (Supplemental Fig. 1). The dataset also included multiple haplotypes, although B73 represented approximately 40% of the included libraries. To reduce noise, lowly expressed genes were removed from analysis, leaving 15116 nonredundant genes across the 1266 libraries. For comparison, the Affymetrix GeneChip maize array includes 13339 genes before filtering (GeneChip Maize Genome Array, http://www.affymetrix.com/catalog/131468/AFFY/Maize+Genome+Array#1_1).

Three RNA-Seq Normalization Methods Show Comparable Distribution of Expression

Expression data from distinct sources and experiments can be highly variable because of hybridization artifacts in microarray or variable sequencing depth in RNA-Seq. Many methods have been successfully used for normalizing both microarray and RNA-Seq data to correct for potential biases (Lim et al., 2007; Dillies et al., 2013b; Li et al., 2015b). To find an optimal normalization method for building a maize GCN from RNA-Seq data, three widely used normalization methods were compared: Variance Stabilizing Transformation (VST), Counts Per Million (CPM), and Reads Per Kilobase Per Million (RPKM) (Mortazavi et al., 2008; Anders and Huber, 2010; Rau et al., 2013). For all normalization methods, log2 transformation of the normalized expression values reduced the skew of the data distribution (Supplemental Fig. 2). Several network studies of plant RNA-Seq data used log2 transformation (Davidson et al., 2011; Ma and Wang, 2012; Giorgi et al., 2013; Stelpflug et al., 2015; Walley et al., 2016). In our analysis, genes with CPM > 2 in more than 1000 samples were included. This filter dramatically reduced zero count values in the raw data from 30.949% to 0.367%. Moreover, a prior count of one was added at log2 normalization (expression = log2(CPM/RPKM + 1)) to avoid problems with the remaining zero values. The log2 transformation reduced skewed distributions and extreme values represented by outliers (Supplemental Fig. 2). We therefore consider log2 transformation important for our data.
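The filtering and transformation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names are hypothetical, the CPM and RPKM formulas are the standard definitions, and VST is omitted because it requires a fitted dispersion model.

```python
import math

def cpm(counts):
    """Counts per million for one library (raw read counts, one per gene)."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: CPM additionally scaled by gene length."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def log2_prior(values, prior=1.0):
    """log2(value + prior); the prior count avoids taking log2 of zero."""
    return [math.log2(v + prior) for v in values]

def passes_filter(gene_cpm, min_cpm=2.0, min_samples=1000):
    """Keep a gene if its CPM exceeds min_cpm in more than min_samples libraries."""
    return sum(v > min_cpm for v in gene_cpm) > min_samples

# Toy library with three genes; counts and lengths are made up for illustration.
counts = [10, 30, 0]
lengths = [1000, 2000, 500]
print(log2_prior(cpm(counts)))
print(log2_prior(rpkm(counts, lengths)))
```

In this toy library the second gene has three times the reads of the first but is twice as long, so the gap between the two genes is smaller under RPKM than under CPM.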

The distribution of gene expression across the 1266 libraries formed a bell-shaped curve with a small additional peak of low expression for all three methods (Supplemental Fig. 2). To determine whether these low expression values came from a few or many libraries, elements within the range of expression that corresponded to the observed peak (< -37 CPM; Supplemental Fig. 2B) were extracted from the CPM-normalized expression matrix and matched to their originating libraries. This demonstrated that the low expression elements were not limited exclusively to specific libraries, but eight libraries contributed over 25% of the low elements. A gene ontology enrichment analysis failed to identify significant gene ontology descriptors within the subset of 43 genes that were defined as lowly expressed (data not shown). All eight of these libraries were from pollen tissue, where the average gene expression, at 147 Counts Per Million (CPM), is lower than the average gene expression of the other 79 tissues combined, at 183 CPM. Hierarchical clustering and a correlation heatmap with the same data (Stelpflug et al., 2015) show the uniqueness of the pollen tissue expression pattern (Langfelder and Horvath, 2008) (Supplemental Fig. 3). When the lowly expressed elements from RPKM- and VST-normalized data were analyzed to determine library origin and GO enrichment (data not shown), we found a similarly high level of pollen-specific libraries without significant GO categories. In pollen, some highly expressed genes are considered orphan genes (Wu et al., 2014) because they lack detectable homologs in other species. To investigate whether these lowly expressed genes were orphan genes, their gene sequences were searched against the Setaria italica genome (JGIv2) using BLASTX (e-value < 1E-03). Setaria italica (foxtail millet) is a close relative of maize, from which it diverged 23.4 million years ago (MYA), as estimated by TimeTree (Kumar et al., 2017). Only 1 out of 43 genes lacked detectable homologs in Setaria italica (data not shown), indicating that the majority of these genes are not likely to be orphan genes.

Because RPKM normalization accounts for gene length, the distribution of gene length versus expression for the RPKM method was compared to data normalized by the VST and CPM methods. VST- and CPM-normalized data showed very similar overall patterns, with no clear linear relationship between gene length and average expression (Supplemental Fig. 2C). RPKM-normalized data displayed an apparent bias toward elevated expression of a small number of genes less than 5000 bp in length and lower expression of long genes, suggesting that this normalization method might skew the distribution of expression at some genes. Overall, in spite of these differences, the three normalization methods resulted in a similar distribution of expression patterns for most of the genes included in the analysis. Additional analysis was completed to determine whether the three normalization methods influence network performance.

Network Performance Does Not Differ Based Upon Normalization Method

To compare the efficacy of three normalization and ten inference methods, a GCN was generated for each combination of normalization and inference methods. All networks were then rank-standardized so that edge weights ranged from 0 to 1 (see Methods). All network evaluations used the whole adjacency matrix (15116 × 15116 in RNA-Seq networks, 11429 × 11429 or 17862 × 17862 in protein networks) without a cut-off. The performance of the different networks was measured by comparing the area under the receiver operating characteristic curve (AUROC). AUROC is a measurement used to evaluate the accuracy of classification models, making it suitable for evaluating GCNs (Gillis and Pavlidis, 2011; Ma and Wang, 2012; Liu et al., 2017). AUROC values range from 0 to 1, with a value closer to 1 indicating that the network is discriminating nonrandom patterns and approaching perfect classification, random networks returning values close to 0.5, and values closer to 0 indicating a high degree of incorrect classification. While an AUROC value close to 1 is optimal, values over 0.7 suggest good performance when analyzing large, diverse networks (Gillis and Pavlidis, 2011). To set up the AUROC baseline for random networks, maize gene IDs were shuffled 10 times (for MRNET and CLR) or 1000 times (for PCC) in the normalized expression matrix. The randomized expression matrices were used for inference with the designated algorithms and further evaluated. The resulting AUROC values from randomized networks were very close to 0.5 (Supplemental Table S2).
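As an illustration of the two quantities introduced above, the following sketch rank-standardizes a list of edge weights and computes AUROC from ranks (the Mann-Whitney formulation). It rests on assumptions: the exact rank-standardization used in this study is described in Methods and may differ, and the function names here are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_standardize(weights):
    """Map edge weights to [0, 1] by rank (assumed scheme: (rank - 1) / (n - 1))."""
    n = len(weights)
    if n < 2:
        return [1.0] * n
    return [(r - 1) / (n - 1) for r in average_ranks(weights)]

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    ranks = average_ranks(scores)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy example: edge weights from one gene to four others, and whether each
# partner is a known interactor (1) or not (0). Values are illustrative only.
edge_weights = [0.10, 0.85, 0.40, 0.85]
known = [0, 1, 0, 1]
print(rank_standardize(edge_weights))
print(auroc(edge_weights, known))
```

Because AUROC depends only on the ordering of scores, rank standardization changes the scale of the edge weights without changing the AUROC of a single network.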

AUROC values were calculated and compared for three different network characteristics. The first characteristic was designed to test whether the network identified genes with known or predicted co-expression patterns, based upon prior results and inclusion in two existing datasets that could serve as positive controls for co-expression. The maize metabolic pathway database (MaizeCyc) contains 413 pathways with more than two genes and was built based upon a collection of evidence from genome annotation, phylogenetic distance, and known genes in maize, rice, and Arabidopsis (Monaco et al., 2013). The maize protein-protein interaction database (PPIM), which is based upon both predicted and experimentally detected protein interactions (Zhu et al., 2016), was the second dataset used in this analysis. Only high-confidence interactions from PPIM were used, defined as those ranking in the top 5% in their model (Zhu et al., 2016). For comparison with the GCN, genes within the same MaizeCyc or PPIM pathways were considered co-expressed. The MaizeCyc and PPIM datasets were combined, and genes with fewer than 5 interactions were excluded from evaluation, creating a compiled dataset referred to herein as the Protein-Protein and Pathway dataset (PPPTY). PPPTY had 1720 genes and 104856 interactions that were used in this evaluation. The AUROC value was calculated for each of the 1720 gene terms.
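The construction of a combined reference set such as PPPTY can be sketched as follows. This is a hypothetical reconstruction for illustration only: the function names, the input formats, and the choice to apply the interaction-count filter in a single pass are assumptions, not details taken from this study.

```python
from collections import defaultdict
from itertools import combinations

def pathway_pairs(pathways):
    """All unordered gene pairs that share at least one pathway."""
    pairs = set()
    for genes in pathways.values():
        for a, b in combinations(sorted(set(genes)), 2):
            pairs.add((a, b))
    return pairs

def build_reference(pathways, ppi_pairs, min_interactions=5):
    """Merge pathway co-membership and PPI pairs, then drop sparsely connected genes."""
    pairs = pathway_pairs(pathways)
    pairs |= {tuple(sorted(p)) for p in ppi_pairs if p[0] != p[1]}
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Single-pass filter (assumption): keep genes with enough reference partners.
    return {g: nb for g, nb in neighbours.items() if len(nb) >= min_interactions}

# Toy input with made-up gene and pathway identifiers.
toy_pathways = {"pathway_1": ["g1", "g2", "g3"]}
toy_ppi = [("g1", "g4"), ("g4", "g1")]
print(build_reference(toy_pathways, toy_ppi, min_interactions=3))
```

Each retained gene then serves as one evaluation term, with its reference partners as the positive labels against which the network's edge weights are scored.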

To assess the effect of normalization method on GCNs, AUROC values for all ten inference methods were averaged for each of the three normalization methods. All three normalization methods scored similarly in comparison with the PPPTY dataset (Fig. 2B), with a mean AUROC value around 0.575 for each, suggesting that the predicted networks were more selective than a random network.

The second characteristic was the presence of similar gene ontology (GO) information for maize genes within a detected co-expression set, based upon "guilt by association", which assumes that specific subgroups of co-expressed genes have some shared functions (Wolfe et al., 2005). GO annotations were downloaded from AgriGO (Du et al., 2010), which uses signature integration by InterPro, rather than co-expression data, to map gene IDs to GO terms. InterPro provided over 108 million stable GO terms to the functional protein information database UniProtKB at release 2016_01 (Sangrador-Vegas et al., 2016). Thus, the GO annotations provide a reliable evaluation resource independent of co-expression data. To assess this characteristic, gene ontology information was used in a neighbor voting algorithm (Gillis and Pavlidis, 2011) for sets of co-expression matrices and compared. Co-expression matrices were assessed by 3-fold cross-validation, which involved masking GO terms from some genes to test whether the masked GO terms could be predicted based upon gene expression patterns. In total, 277 GO terms were included in this analysis.
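The idea behind neighbor voting with cross-validation can be illustrated with a small sketch for a single GO term. It is a simplified stand-in, not the implementation of Gillis and Pavlidis (2011): the fold assignment (gene index modulo the number of folds) and the function names are assumptions made for this example.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def neighbor_voting_auroc(weights, labels, n_folds=3):
    """Mean held-out AUROC for one GO term over n_folds masked subsets."""
    n = len(labels)
    fold_scores = []
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        hidden = set(held_out)
        # Held-out genes are treated as unannotated when neighbors vote.
        train = [0 if i in hidden else labels[i] for i in range(n)]
        votes = []
        for i in held_out:
            degree = sum(weights[i][j] for j in range(n) if j != i)
            support = sum(weights[i][j] * train[j] for j in range(n) if j != i)
            votes.append(support / degree if degree else 0.0)
        truth = [labels[i] for i in held_out]
        if 0 < sum(truth) < len(truth):  # both classes are needed for AUROC
            fold_scores.append(auroc(votes, truth))
    if not fold_scores:
        raise ValueError("no fold contained both annotated and unannotated genes")
    return sum(fold_scores) / len(fold_scores)

# Toy network: genes 0-2 form one tightly co-expressed module carrying the GO
# term, genes 3-5 form another module without it. Weights are made up.
module = [0, 0, 0, 1, 1, 1]
toy_weights = [[0.9 if module[i] == module[j] else 0.1 for j in range(6)] for i in range(6)]
toy_labels = [1, 1, 1, 0, 0, 0]
print(neighbor_voting_auroc(toy_weights, toy_labels))
```

Each held-out gene is scored by the share of its total connection weight that goes to genes still annotated with the term, so a network whose modules match the annotation recovers the masked labels.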

When GO characteristics were used to assess the networks, all three normalization methods performed similarly, but the AUROC values, at around 0.689 for each, were higher than those observed for comparisons with PPPTY (Fig. 2A). Because GO addresses gene functions and PPPTY emphasizes protein-protein interactions, this suggests that GCNs are better at predicting functional interactions than physical interactions. The p-values from one-way ANOVA testing the effect of normalization method on the PPPTY and GO datasets were 0.9535 and 0.4714, respectively, confirming that the normalization method did not create a significant difference in the AUROC scores associated with the GCNs for the characteristics that were tested.
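For readers less familiar with the test, the F statistic behind a one-way ANOVA can be computed as in the sketch below. The AUROC values in the example are invented for illustration and are not results from this study; converting F to a p-value additionally requires the F distribution, which is not implemented here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical AUROC values for three normalization methods (one value per
# inference method); these numbers are made up.
vst = [0.57, 0.58, 0.56, 0.59]
cpm = [0.58, 0.57, 0.57, 0.58]
rpkm = [0.56, 0.58, 0.58, 0.57]
print(one_way_anova_f([vst, cpm, rpkm]))
```

A small F relative to its degrees of freedom corresponds to a large p-value, meaning the variation between normalization methods is no greater than the variation within them.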

Finally, proteins that regulate gene expression or modify chromatin structure might interact with the DNA of a subset of co-expressed genes. The interactions between such a protein and regulated DNA can be detected by chromatin immunoprecipitation of associated DNA followed by DNA sequencing (ChIP-Seq). In maize, there are five ChIP-Seq datasets available (Bolduc et al., 2012; Morohashi et al., 2012; Li et al., 2015a; Pautler et al., 2015; Yang et al., 2016), some of which involve lowly expressed or tissue-specific genes. For example, Opaque2 is specifically expressed in endosperm (Li et al., 2015a), Knotted1 is expressed in SAM and floral tissues (Bolduc et al., 2012), and Pericarp Color1 has low expression except in inflorescence and seed (Morohashi et al., 2012). Histone Deacetylase 101 (HDA101) ChIP-Seq data provided the largest dataset for comparison, with 26 confirmed binding targets that are relatively highly expressed in most maize tissues (Yang et al., 2016). Histone deacetylation often correlates with decreased gene expression (Verdin and Ott, 2014). High-confidence HDA101 targets were defined as those discovered by ChIP-Seq that also showed increased gene expression in the hda101 mutant. Networks associated with the 26 high-confidence HDA101 targets were compared by calculating AUROC. Based upon this analysis, the AUROC values were very similar among networks normalized by VST, CPM, and RPKM (Fig. 2C), which is consistent with the GO and PPPTY evaluations.

Correlation Methods Perform Better than Mutual Information at Some Genes

After normalization, the expression matrices can be processed by different methods for GCN inference. To optimize this step, the AUROC values of six correlation (PCC, SCC, KCC, GCC, BIC, CSC) and four mutual information (MI) methods (AA, MA, MRNET, CLR) were compared for the expression matrices generated from each of the three normalization methods (VST, CPM, RPKM) and then averaged. In general, correlation methods are more computationally efficient, while MI methods are able to reveal non-linear relationships (Li et al., 2015c). PCC is widely used but may be influenced by outliers (Mukaka, 2012). SCC, KCC, and BIC are less sensitive to outliers because SCC and KCC consider only rank information and BIC is calculated from the dataset median instead of the mean (Serin et al., 2016). Recently, GCC has been shown to be a better correlation method for gene expression analysis because of its capacity to detect non-linear relationships and its insensitivity to outliers (Ma and Wang, 2012). CSC is widely used for text mining and for analyzing sparse data with many zeros (Dhillon and Modha, 2001). ARACNE, MRNET, and CLR showed extended gene-dependent relationships under variable biological settings (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007; Li et al., 2013b). To estimate the effectiveness of the inference methods, the same AUROC-based tests were performed as described for the testing of normalization methods.
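The difference in outlier sensitivity between correlation measures can be seen in a small example. The sketch below implements three of the measures named above (PCC, SCC, and CSC) from their textbook definitions; the expression values are invented, and the code is an illustration rather than the software used in this study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation (SCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def cosine(x, y):
    """Cosine similarity (CSC): uncentered, so it depends on absolute level."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

# Two hypothetical genes across five samples; the last sample is an outlier.
gene_a = [1, 2, 3, 4, 5]
gene_b = [2, 4, 6, 8, 100]
print(pearson(gene_a, gene_b), spearman(gene_a, gene_b), cosine(gene_a, gene_b))
```

The two toy genes increase together in every sample, so the rank-based SCC is unaffected by the extreme value in the last sample, whereas PCC is pulled well below 1.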

Assessed by the GO dataset, the 277 AUROC values were averaged to create one average value for each of the 10 inference methods, ranging from 0.620 to 0.724 (Fig. 2D). The average AUROC across all normalization methods for the six correlation methods was 0.718, while the average AUROC for the four MI methods was 0.646. The majority of the 277 GO terms had similar AUROC values in the different correlation method-generated GCNs, and these patterns differed from those observed in the MI-generated GCNs (Fig. 3A). The similarity among different methods was also detectable by pairwise comparison and by comparing Pearson correlations between the different methods (Supplemental Fig. 4A).

To evaluate network inference methods with the PPPTY dataset, the AUROC values for 1720 genes were averaged for each combination of normalization and inference methods (Fig. 2E). This evaluation also showed that the networks constructed using correlation methods resulted in higher AUROC values than MI methods, although the CSC method resulted in lower AUROC values than the other correlation methods. As demonstrated for the GO evaluation, results from correlation methods were more similar to each other than those from the MI methods (Supplemental Fig. 4B). Interestingly, heatmap results indicated that a subset of genes consistently had higher AUROC values when CSC, MRNET/CLR, or AA/MA were used (Fig. 3B), although this subset includes few enough genes that the average AUROC value over the whole gene set was relatively low for those methods. The gene sets with the highest AUROC values in PCC, CSC, or MRNET were extracted. The gene sets were compared by average expression (CPM) and by average number of lowly expressed elements (CPM < 0). The CSC gene set had the smallest number of low expression elements and had higher average expression than both the 1720 gene set and the PCC gene set (Supplemental Fig. 5). This may indicate that the CSC method is better at determining co-expression for highly expressed genes.

The AUROC values from 26 targets of HDA101 ChIP-Seq datasets reveal that the CSC GCN had the highest AUROC value, and the use of MRNET/CLR GCNs resulted in slightly higher scores than correlation methods (Fig. 2F). This could be explained by the small number of targets creating skewed results, but may also indicate that CSC/MI methods are more suitable for specific types of genes or interactions between genes (Tzfadia et al., 2016). HDA101 is a highly expressed gene in all samples, with an average expression value of 864 CPM and a minimum expression of 289 CPM, so it is possible that HDA101 is more suitable for the CSC method. CPM and RPKM normalization methods had higher AUROC values than VST (Fig. 2C). Using two models of ARACNE (additive-AA and multiplicative-MA), the co-expression matrices contained less than 0.5% non-zero values for all comparisons, and so these techniques were not included in any additional analyses.

In conclusion, our results indicated that the widely used correlation methods resulted in a more predictive maize GCN from a single expression matrix, but co-expression with some individual genes may be better detected using MI methods. Normalization method did not have a substantial influence on GCN performance, so only CPM normalization was used in conjunction with PCC, SCC, MRNET, and CLR inference for subsequent optimization of other parameters.

Increasing Sample Size Had a Positive Effect on GCN

GCN analysis can be accomplished with a variable number of samples and datasets, but sample size can influence the quality of the resulting GCN (Wei et al., 2004; Ballouz et al., 2015). Separate analyses were conducted with different numbers of samples and experiments to empirically determine the effect of sample number on GCN effectiveness. The data in our analysis consisted of 17 experiments, each including between 12 and 404 libraries. For this analysis, CPM normalization followed by each of four inference methods (PCC, SCC, MRNET, and CLR) was applied to the 17 experiments, and the 68 resulting networks were evaluated by both GO and PPPTY.

From GO and PPPTY evaluation, all algorithms exhibited a positive linear relationship between natural-logarithm-transformed sample size and average AUROC values (Fig. 4). The linear relationships were stronger for the PCC and SCC methods, with higher r-squared values, indicating that correlation methods benefit more from increasing sample size. Thus, for building correlation-based GCNs, as many samples as possible should be included. We also found that, as seen for the total GCN analysis, PCC and SCC had higher average AUROC values than the MRNET and CLR methods in the PPPTY and GO analyses for most of the individual networks (Fig. 5).
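The log-linear relationship between sample size and AUROC can be illustrated with an ordinary least-squares fit of average AUROC against ln(sample size). The following is a minimal Python sketch and not the code used in this study (the analyses here were done in R). The function name fit_log_linear is hypothetical, and the sample sizes and AUROC values in the usage lines are made-up placeholders that only show the calculation; they are not values from Fig. 4.

```python
import math

def fit_log_linear(sample_sizes, aurocs):
    """Least-squares fit of AUROC = intercept + slope * ln(sample size).

    Returns (slope, intercept, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(aurocs)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared

# Hypothetical illustration only: invented numbers, not results from this study.
sizes = [12, 50, 200, 404]
values = [0.55, 0.60, 0.64, 0.67]
slope, intercept, r2 = fit_log_linear(sizes, values)
```

A higher r-squared from such a fit corresponds to the "stronger linear relationship" reported for PCC and SCC.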

Ranked Aggregation of Networks Improved Performance of GCNs

Ranked aggregation for meta-analysis can also be used to change the outcomes of GCN construction by buffering the effect of sample heterogeneity (Zhong et al., 2014; Wang et al., 2015a; Asnicar et al., 2016). Aggregated, rank-standardized correlation/MI matrices were calculated from separate experiments to determine if this approach enhanced GCN performance. Aggregating individual networks together for meta-analysis can help to highlight true co-expression interactions and reduce noise (Zhong et al., 2014; Wang et al., 2015a; Wang et al., 2015b). This analysis was conducted with the 17 differently sized experiments using the PCC, SCC, MRNET, and CLR methods for GCN inference, as before, resulting in 68 single GCNs. The 17 experiments were aggregated for PCC, SCC, MRNET, and CLR individually and evaluated by the GO and PPPTY datasets.
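One way to picture ranked aggregation is sketched below in Python. This is not the authors' implementation, and the exact aggregation formula is not given in this section. The sketch assumes that each per-experiment network is rank-standardized, the standardized weights are summed over experiments for edges scored in every network, and the sums are rank-standardized again. The function names and the toy edge weights are hypothetical.

```python
def rank_standardize(weights):
    """Map edge weights to (0, 1] by rank; tied weights share their average rank."""
    items = sorted(weights.items(), key=lambda kv: kv[1])
    n = len(items)
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and items[j + 1][1] == items[i][1]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[items[k][0]] = avg_rank / n
        i = j + 1
    return ranks

def aggregate_networks(networks):
    """Sum rank-standardized weights over networks, then rank-standardize the sums."""
    edges = set(networks[0])
    for net in networks[1:]:
        edges &= set(net)  # assumption: keep only edges scored in every network
    summed = {e: 0.0 for e in edges}
    for net in networks:
        ranked = rank_standardize({e: net[e] for e in edges})
        for e in edges:
            summed[e] += ranked[e]
    return rank_standardize(summed)

# Hypothetical toy weights for two experiments (not data from this study).
exp1 = {("g1", "g2"): 0.9, ("g1", "g3"): 0.1, ("g2", "g3"): 0.5}
exp2 = {("g1", "g2"): 0.7, ("g1", "g3"): 0.2, ("g2", "g3"): 0.8}
aggregated = aggregate_networks([exp1, exp2])
```

Because only ranks are combined, an experiment with unusually large raw correlations cannot dominate the aggregate, which is the buffering effect described above.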


Of the 4 aggregated networks that were evaluated, the two correlation methods (PCC and SCC) had higher AUROC values than the single network from 1266 samples (Fig. 6 and Supplemental Fig. 6). However, this aggregation strategy did not result in significantly higher AUROC scores for the MRNET and CLR networks compared with single networks from 1266 samples (two-tailed Wilcoxon rank test for GO evaluation, p-values 0.494 and 0.796). It has been reported that MI estimation accuracy is dependent on sample size (Gao et al., 2015); therefore, individual MI networks built with a small number of libraries may not demonstrate improved accuracy from aggregation. In conclusion, the PCC/SCC-built GCNs performed best using a ranked aggregation strategy, and use of this strategy in combination with the other optimized parameters creates a robust GCN.

The Performance of Protein Networks Did Not Exceed Aggregation Networks

In many cases, mRNA levels in a cell are of interest because mRNA level is thought to be related to the level and function of a protein of interest. However, many researchers have found inconsistencies between mRNA and protein levels (Baerenfaller et al., 2008; Schwanhäusser et al., 2011; Ponnala et al., 2014; Walley et al., 2016). Although relatively less protein expression data is available, this data is amenable to GCN construction and could represent a more direct reflection of interacting proteins. Using a non-modified protein expression atlas from 23 maize tissues based upon mass spectrometry data (Walley et al., 2016), four protein networks were built with PCC, SCC, MRNET, and CLR separately and then evaluated using the same PPPTY and GO datasets as previously mentioned.

GCNs constructed from protein expression did not exhibit superior AUROC values to those observed for the RNA-Seq-based GCN using the aggregation strategy (Fig. 6). When evaluated by the GO and PPPTY datasets, the performance of the protein network was lower than that of the aggregated network as well as the single network from 1266 samples. To confirm this result, a two-way ANOVA was computed with pairwise comparison for the GO evaluation, which showed that the effect of network type was significant (Supplemental Table S3). A subsequent pairwise comparison using the Wilcoxon rank sum test indicated that the PCC/SCC methods were significantly better than MRNET/CLR (Supplemental Table S3), although MI methods may be superior for some types of interactions.

The raw protein expression data included 17862 genes, of which 11429 genes overlapped with our RNA-Seq-based network and were therefore used for the analysis. To demonstrate that the performance of the protein network was not biased by the selection of genes, the PCC method was applied to all 17862 genes to construct a protein network (Supplemental Fig. 7). No improvement could be detected from the protein network derived from 17862 genes, with p-values of 0.635 for the GO evaluation and 0.995 for the PPPTY evaluation from a one-sided Wilcoxon rank sum test.

PCC and SCC-built GCN Exhibit Identical Topological and Functional Properties

In addition to evaluation of network performance based upon biological characteristics, networks can be compared based upon several network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules, and number of genes in the largest module. Number of nodes is a basic construct in graph theory depicting the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in networks. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics like protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC- and SCC-built ranked aggregation networks from 17 experiments (PA and SA) and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.
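The two degree-based measures can be made concrete with a short Python sketch. It is not the computation used in this study (topological characteristics here were computed in Cytoscape). It assumes the commonly cited definitions for an unweighted, undirected network with n nodes and degrees k: density = mean(k) / (n - 1), heterogeneity = sqrt(var(k)) / mean(k) with the population variance, and centralization = n / (n - 2) * (max(k) / (n - 1) - density). Treat these formulas, the function name degree_statistics, and the toy graphs as assumptions for illustration.

```python
import math

def degree_statistics(edges):
    """Return (number of nodes, heterogeneity, centralization) for an edge list.

    Assumes an undirected simple graph with more than two nodes and no
    duplicate edges; isolated nodes are not represented in an edge list.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    k = list(degree.values())
    n = len(k)
    mean_k = sum(k) / n
    var_k = sum((ki - mean_k) ** 2 for ki in k) / n  # population variance
    heterogeneity = math.sqrt(var_k) / mean_k
    density = mean_k / (n - 1)
    centralization = n / (n - 2) * (max(k) / (n - 1) - density)
    return n, heterogeneity, centralization

# Toy graphs: a star (one hub, four leaves) and a five-node ring.
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
star_stats = degree_statistics(star)
ring_stats = degree_statistics(ring)
```

Under these definitions a star, where one hub holds every connection, has the maximal centralization of 1, while a ring, where every node has the same degree, has zero heterogeneity and zero centralization.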

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the observed highest centralization value for this network. Centralization and heterogeneity are two measures that describe the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity, whereas a network with larger values of centralization and heterogeneity may either contain a larger number of hubs or contain a number of hubs that is not significantly large but have an extremely imbalanced degree distribution. In biological networks, many observations connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8).

The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 835 unique genes (Fig. 7A). 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation, and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of top interacting genes (27/148), which was expected because of their cellular abundance and involvement with translation. Interestingly, nine epigenetic regulators were found in the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a), and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).

To reveal the underlying properties of GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules than the PA and SA networks. Consequently, most genes in the MS network clustered into one very large module of 14054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Tables S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, and so these methods appear to be better for module detection.

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA, and MS were merged together, and the intersection of the three produced a merged network of 14277 genes and 106591 interactions. PA and SA shared 83.5% of common interactions within the networks, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
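The merging step amounts to a set intersection over the top-ranked edges of each network. A minimal Python sketch under stated assumptions follows; it is not the authors' code. It assumes each network is available as a mapping from gene pairs to weights and that edges are undirected, so (a, b) and (b, a) are the same edge. The function names and the three toy networks are hypothetical, and n_top is 2 here only to keep the example small (the study used the top 1 million edges).

```python
def top_edges(weights, n_top):
    """Return the n_top highest-weighted edges as a set of unordered gene pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return {frozenset(pair) for pair, _ in ranked[:n_top]}

def merge_by_intersection(networks, n_top):
    """Keep only edges found among the top n_top edges of every network."""
    edge_sets = [top_edges(net, n_top) for net in networks]
    shared = set.intersection(*edge_sets)
    genes = set()
    for edge in shared:
        genes.update(edge)
    return shared, genes

# Hypothetical toy networks standing in for PA, SA, and MS.
pa = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.1}
sa = {("b", "a"): 0.95, ("a", "c"): 0.2, ("b", "c"): 0.7}
ms = {("a", "b"): 0.6, ("b", "c"): 0.5, ("a", "c"): 0.4}
merged_edges, merged_genes = merge_by_intersection([pa, sa, ms], n_top=2)
```

Requiring an edge to appear in all three networks trades coverage for confidence, which is why the merged network is much smaller than any single top-1-million network.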

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to the existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes, and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected for these 214 genes, cell wall-related GO terms were enriched (Fig. 7D; Supplemental Table S9).

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database to retrieve homologs and their annotations using BLASTP. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The results of the literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% (7/214) of genes that were involved in cell wall-related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1, and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14, and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6), and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes. 48% (36/75) of co-expressed genes were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial was provided as supplemental material on how to use Cytoscape to visualize any co-expressed genes in our network (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, VST, CPM, and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, on which normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study of human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods capture different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed a more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET, and CLR together did not result in a network that outperformed the PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq-based maize GCN. This optimization may apply to a range of datasets that share characteristics with maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Processing

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). The adapters in the fastq files were trimmed by Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked by FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by FeatureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). 26 libraries with fewer than 5 million reads in total and 11 libraries with less than 70% total alignment rate were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined by Snakemake v3.7.1 (Köster and Rahmann, 2012).

Gene Count Normalization

The expression data was normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated by the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM/RPKM + 1)). For both methods, scale factors between samples were estimated by Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated by the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in additional analysis (15116 genes).
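The CPM transformation and the gene filter can be sketched in a few lines of Python. This is an illustration only, not the edgeR workflow used here: library size is taken as the plain column sum, and the TMM scale factors are deliberately omitted. The function names and the toy count table are hypothetical.

```python
import math

def cpm(counts):
    """counts: dict gene -> list of raw read counts, one entry per library.

    Returns counts per million using each library's column sum
    (simplification: no TMM scaling factors are applied).
    """
    n_libs = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_libs)]
    return {
        gene: [row[j] / lib_sizes[j] * 1e6 for j in range(n_libs)]
        for gene, row in counts.items()
    }

def log2_plus_one(matrix):
    """Apply expression = log2(value + 1) to every entry."""
    return {gene: [math.log2(v + 1) for v in row] for gene, row in matrix.items()}

def filter_expressed(cpm_matrix, min_cpm=2, min_samples=1000):
    """Keep genes with CPM above min_cpm in more than min_samples libraries."""
    return {
        gene: row
        for gene, row in cpm_matrix.items()
        if sum(v > min_cpm for v in row) > min_samples
    }

# Hypothetical toy table with two libraries; min_samples is lowered to fit it.
toy_counts = {"geneA": [500000, 250000], "geneB": [500000, 750000], "geneC": [0, 0]}
toy_cpm = cpm(toy_counts)
kept = filter_expressed(toy_cpm, min_cpm=2, min_samples=1)
expression = log2_plus_one(kept)
```

Adding 1 before the log2 transformation keeps zero counts finite (they map to 0) rather than producing negative infinity.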

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated by the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value assigned rank one. The ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights ranged from zero to one.
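The rank standardization of an adjacency matrix can be sketched as follows in Python. This is an illustrative reading of the description above rather than the authors' R code. It assumes that all n x n entries are ranked together, that tied values receive their average rank, and that "number of elements" means n x n; the function name and the toy matrix are hypothetical.

```python
def rank_standardize_matrix(adj):
    """Replace each weight by rank / (number of elements); set the diagonal to 1."""
    n = len(adj)
    flat = sorted((adj[i][j], i, j) for i in range(n) for j in range(n))
    total = len(flat)
    out = [[0.0] * n for _ in range(n)]
    start = 0
    while start < total:
        end = start
        while end + 1 < total and flat[end + 1][0] == flat[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # smallest value gets rank 1; ties averaged
        for _, i, j in flat[start:end + 1]:
            out[i][j] = avg_rank / total
        start = end + 1
    for i in range(n):
        out[i][i] = 1.0
    return out

# Hypothetical 3-gene correlation matrix.
toy_adj = [
    [1.0, 0.2, 0.8],
    [0.2, 1.0, -0.5],
    [0.8, -0.5, 1.0],
]
standardized = rank_standardize_matrix(toy_adj)
```

After this step, weights from correlation and mutual information methods share the same zero-to-one scale, so networks built by different methods can be compared or combined directly.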

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM or VST normalized expression matrices. The randomized expression matrices were then used for network inference by the PCC, MRNET, or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.
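The randomization step can be pictured with a small Python sketch, given here under stated assumptions and not as the code used in this study. It assumes the expression matrix is a mapping from gene ID to a row of values, and that shuffling gene IDs means reassigning the ID labels across rows while leaving each row's values untouched. The function name, the seed argument, and the toy matrix are hypothetical.

```python
import random

def shuffle_gene_ids(expression, seed=None):
    """Randomly reassign gene IDs to expression rows; row values are unchanged."""
    rng = random.Random(seed)
    genes = list(expression)
    shuffled = genes[:]
    rng.shuffle(shuffled)
    return {new_id: expression[old_id] for new_id, old_id in zip(shuffled, genes)}

# Hypothetical toy matrix: three genes, two samples.
toy_expression = {"g1": [1.0, 2.0], "g2": [3.0, 4.0], "g3": [5.0, 6.0]}
randomized = shuffle_gene_ids(toy_expression, seed=1)
```

Because the annotation (GO terms, PPPTY pairs) stays attached to the IDs while the expression profiles move, a network inferred from the shuffled matrix gives a baseline AUROC expected when co-expression carries no functional information.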


Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions were used for evaluation, defined as those ranking in the top 5% of their results. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathways were considered co-expressed. Third, maize gene ontology data for AGPv3.30 was downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. Basically, it utilizes the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes' GO terms and testing whether the hidden GO terms could be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).
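For readers unfamiliar with AUROC, the following Python sketch computes it from the rank-sum (Mann-Whitney) identity: AUROC equals the probability that a randomly chosen positive pair receives a higher network weight than a randomly chosen negative pair, with ties counted as one half. It is a generic illustration and not the ROCR or EGAD code used in this study; the function name and the toy scores are hypothetical.

```python
def auroc(scores, labels):
    """AUROC via average ranks; labels are truthy for positives, falsy for negatives."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, lab in pairs if lab)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j + 1) if pairs[k][1])
        i = j + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical edge weights and whether each gene pair is a known interaction.
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)
```

An AUROC of 0.5 corresponds to a random ordering and 1.0 to a perfect separation of known from unknown pairs, which is why the random-network baseline is a useful reference point.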

Definition of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts two genes are co-expressed but they are not; TN, a network predicts two genes are not co-expressed and they are not co-expressed in PPPTY; FN, a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts a gene does not have a specific GO term and it does not have that term in our GO dataset; FN, a network predicts a gene does not have a specific GO term but it has that GO term in the GO dataset.

Network Clustering and Characterization

For each network, the top 1 million edges were selected as stringent co-expression networks. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted by the NetworkAnalyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) with MCL v14-137, with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes involved in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, for which a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
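The hypergeometric test behind a single GO term can be written out directly. The Python sketch below is a generic illustration, not AgriGO's implementation, and it omits the Yekutieli multiple-test correction. It computes the upper-tail probability of seeing at least the observed number of annotated genes in the selected set. The function name is hypothetical; in the usage lines only the background size (15116) and the selected-set size (214) come from the text, while the annotation and overlap counts are invented.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_annotated carry the GO term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: a GO term annotating 150 background genes,
# 12 of which appear among 214 selected genes.
p_value = hypergeom_enrichment_p(15116, 150, 214, 12)
```

Exact integer binomial coefficients avoid the floating-point underflow that can occur when the individual probabilities are very small.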

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05, and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4, with the maximum number of interactors set to 100. 76 genes with 817 interactions were retrieved. Maize proteins were searched against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.
Supplemental Figure 2. Distribution of gene expression values.
Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.
Supplemental Figure 4. Pairwise comparison among results of inference methods.
Supplemental Figure 5. Characteristics of the full 1720 PPPTY gene set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS).
Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network).
Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. 16 query genes in the maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.

Supplemental Dataset S2 Tutorial Visualizing Co-expression data in Cytoscape 621


Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO dataset comparisons for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
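
Because AUROC is the performance measure used throughout these legends, a short sketch of the quantity itself may help. It computes a generic AUROC from prediction scores and binary labels using the rank-sum (Mann-Whitney) identity. It is not the evaluation pipeline used in the study, and the scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve from scores and 0/1 labels (rank-sum form)."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        # Find the block of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Ranks i+1 .. j share their average rank.
        average_rank = (i + 1 + j) / 2.0
        rank_sum += average_rank * sum(1 for k in range(i, j) if pairs[k][1])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Hypothetical scores for four genes; label 1 marks genes annotated to a term.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
mixed = auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
uninformative = auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of annotated from non-annotated genes.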


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
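
The scaling and distance steps named in this legend can be sketched as follows. The sketch assumes that each row (one GO term or gene) is standardized across methods using the sample standard deviation, and that methods are then compared by Euclidean distance between their columns; the legend does not state these details, and the AUROC values are hypothetical.

```python
import math
import statistics


def scale_rows(matrix):
    """Z-score each row (one GO term or gene) across its columns (methods)."""
    scaled = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)  # sample SD; assumes the row is not constant
        scaled.append([(value - mean) / sd for value in row])
    return scaled


def column_distances(matrix):
    """Euclidean distance between every pair of columns (methods)."""
    columns = list(zip(*matrix))
    n = len(columns)
    return [[math.dist(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical AUROC values: rows are GO terms, columns are three methods.
auroc_values = [
    [0.50, 0.60, 0.70],
    [0.62, 0.66, 0.70],
    [0.80, 0.70, 0.75],
]
scaled = scale_rows(auroc_values)
distances = column_distances(scaled)
```

The resulting distance matrix is the input a hierarchical clustering routine would use to order the methods in a heat map.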

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
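
The logarithmic model in this legend has the form AUROC = a + b ln(sample size). The following minimal sketch fits it by ordinary least squares in Python rather than with R's lm(); it reports the coefficients and R-squared only (no p-value), and the data points are synthetic values generated from a known line, not results from the study.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of auroc = intercept + slope * ln(sample_size)."""
    xs = [math.log(n) for n in sample_sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(aurocs) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(xs, aurocs))
    ss_tot = sum((y - mean_y) ** 2 for y in aurocs)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Synthetic example: points lying exactly on 0.5 + 0.03 * ln(n).
sizes = [12, 24, 100, 404, 1266]
values = [0.5 + 0.03 * math.log(n) for n in sizes]
intercept, slope, r_squared = fit_log_model(sizes, values)
```

A positive slope in such a fit means performance keeps rising with sample size, but with diminishing returns: each doubling of the number of libraries adds the same fixed increment.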

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled S12_1 to S404; S1266 included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, "1266"), aggregated network (grey, "agg"), and protein network (dark grey, "pr") using PCC, SCC, MRNET, and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks. 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks. 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for the 214 co-expressed genes queried by 16 cell wall pathway genes.
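
The overlap shown in panel A can be computed as a set intersection. The sketch below assumes that "highest interacting" means highest node degree and breaks ties alphabetically for reproducibility; both are assumptions, and the toy edge lists and gene names are hypothetical.

```python
from collections import Counter


def top_genes(edges, k):
    """Return the k genes with the highest degree in an undirected edge list."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    # Sort by descending degree, then by name so ties are deterministic.
    ranked = sorted(degree, key=lambda gene: (-degree[gene], gene))
    return set(ranked[:k])


def shared_top_genes(networks, k):
    """Genes that are among the top k in every network."""
    tops = [top_genes(edges, k) for edges in networks.values()]
    return set.intersection(*tops)


# Hypothetical toy networks standing in for PA, SA, and MS.
networks = {
    "PA": [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")],
    "SA": [("g1", "g2"), ("g1", "g5"), ("g2", "g5"), ("g2", "g3")],
    "MS": [("g1", "g4"), ("g1", "g2"), ("g2", "g4"), ("g3", "g5")],
}
shared = shared_top_genes(networks, k=2)
```

The same intersection applied to sets of edges (stored as sorted gene pairs) would give the shared-edge count in panel B.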

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported into the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM, Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC, Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithmic transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2 transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stage. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient, ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal blocks, between the AUROC values and the PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal blocks, between the AUROC values and the PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY set (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). The average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2", and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network are marked with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
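
The outlier rule stated in this legend is the usual box plot convention and can be sketched in a few lines. The quartile interpolation used here (the "inclusive" method of Python's statistics module) is an assumption; the plotting software behind the figure may compute quartiles slightly differently. The AUROC values are hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Values more than k * IQR above the 75% or below the 25% quantile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical AUROC values for one network; 0.95 lies far above the rest.
aurocs = [0.50, 0.52, 0.55, 0.58, 0.60, 0.95]
flagged = tukey_outliers(aurocs)
```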

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. The red and blue curves show the power-law fitted distributions. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.
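
One common way to obtain a power-law fit such as the one described above is linear least squares on log-log axes, fitting count = a × degree^b. The sketch below illustrates that approach under this assumption; the legend does not say how the fit was computed, and the degree counts are synthetic values generated from a known power law.

```python
import math


def fit_power_law(degrees, counts):
    """Fit counts = a * degree**b by least squares on log-log axes."""
    xs = [math.log(d) for d in degrees]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # R-squared is reported on the log-transformed data.
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


# Synthetic degree distribution following 1600 * degree**-2 exactly.
degrees = [1, 2, 4, 8, 16]
counts = [1600.0 / d ** 2 for d in degrees]
a, b, r_squared = fit_power_law(degrees, counts)
```

A negative exponent with a high R² is the signature of a scale-free-like network, in which a few hub genes carry most of the connections.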


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science (80- ) 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff N V (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science (80- ) 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol. doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Front Genet 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome: 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation to Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO dataset comparisons for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using nine inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using nine inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using nine inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
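The two quantities named in this legend, AUROC and the boxplot outlier rule, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `auroc` and `tukey_outliers` and the toy inputs are assumptions made for illustration. AUROC is computed through the rank-sum identity, and outliers follow the rule of 1.5 times the interquartile range beyond the quartiles.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank-sum identity; tied scores share their average rank."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):          # average the ranks of tied scores
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def tukey_outliers(values):
    """Flag values beyond 1.5 x IQR above the 75% or below the 25% quantile."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)

# Hypothetical edge scores for four gene pairs; 1 marks a known positive pair.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
# Hypothetical AUROC values across GO terms, with one unusually high value.
print(tukey_outliers([0.50, 0.52, 0.55, 0.53, 0.51, 0.95]))
```

An AUROC of 0.5 corresponds to random ranking of the known positives, which is why boxplots of AUROC values are usually read against that baseline.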


Figure 3. Similarity between nine inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue to red. Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
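The scaling and clustering distance described in this legend can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that each row (one GO term or gene) is scaled separately to mean 0 and standard deviation 1, and the function names and the toy matrix are hypothetical.

```python
import numpy as np

def scale_rows(auroc):
    """Scale each row (one GO term or gene) to mean 0 and sample SD 1."""
    auroc = np.asarray(auroc, dtype=float)
    mean = auroc.mean(axis=1, keepdims=True)
    sd = auroc.std(axis=1, ddof=1, keepdims=True)
    return (auroc - mean) / sd

def euclidean_distances(matrix):
    """Pairwise Euclidean distances between columns (one column per method)."""
    x = np.asarray(matrix, dtype=float).T
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical AUROC values: 3 GO terms (rows) by 3 inference methods (columns).
toy = [[0.70, 0.72, 0.55],
       [0.60, 0.61, 0.80],
       [0.65, 0.66, 0.50]]
scaled = scale_rows(toy)
dist = euclidean_distances(scaled)
print(np.round(dist, 2))
```

In this toy matrix the first two methods behave alike, so their distance is the smallest off-diagonal entry; a hierarchical clustering built on such a matrix would join them first.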


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
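The logarithmic model in this legend has the form average AUROC = a + b · ln(sample size). The legend states that the fit was done with lm() in R; the Python sketch below is only a stand-in that reproduces the same least-squares fit and R-squared (p-values are omitted). The function name `fit_log_model` and the example numbers are hypothetical.

```python
import numpy as np

def fit_log_model(sample_sizes, mean_auroc):
    """Least-squares fit of mean_auroc = a + b * ln(sample_size); returns (a, b, R^2)."""
    x = np.log(np.asarray(sample_sizes, dtype=float))
    y = np.asarray(mean_auroc, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # degree-1 polynomial in ln(n)
    fitted = intercept + slope * x
    ss_res = ((y - fitted) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return intercept, slope, 1.0 - ss_res / ss_tot

# Synthetic data that follow the model exactly, for illustration only.
sizes = [12, 50, 200, 1266]
values = 0.5 + 0.03 * np.log(sizes)
a, b, r2 = fit_log_model(sizes, values)
print(a, b, r2)
```

A positive slope b under this model means that each multiplicative increase in sample size adds a roughly constant increment to the average AUROC, so gains diminish as libraries accumulate.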



Figure 5. Evaluation of networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.



Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.


Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with a reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.


Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013, 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


2013; Stelpflug et al., 2015; Walley et al., 2016). In our analysis, genes with CPM > 2 in more than 1000 samples were included. This filter dramatically reduced zero count values in the raw data from 30.949% to 0.367%. Moreover, a prior count of one was added at log2 normalization (expression = log2(CPM/RPKM + 1)) to avoid problems with the remaining zero values. The log2 transformation reduced skewed distributions and extreme values represented by outliers (Supplemental Fig. 2). We therefore consider log2 transformation important for our data.
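The filtering and transformation steps described above can be sketched as follows. This is a minimal illustration and not the pipeline used in this study: the function name `filter_and_log2`, the dictionary-based count matrix, and the choice to compute library sizes before filtering are all assumptions made for the example.

```python
import math

def filter_and_log2(count_matrix, min_cpm=2.0, min_samples=1000, prior=1.0):
    """Filter genes by CPM, then log2-transform (hypothetical helper).

    count_matrix: dict mapping gene ID -> list of raw counts, one per library.
    A gene is kept when its CPM exceeds min_cpm in more than min_samples
    libraries; kept genes are returned as log2(CPM + prior).
    """
    genes = list(count_matrix)
    n_libs = len(next(iter(count_matrix.values())))
    # Assumption: library sizes are summed over all genes, before filtering.
    lib_sizes = [sum(count_matrix[g][j] for g in genes) for j in range(n_libs)]
    result = {}
    for g in genes:
        cpms = [count_matrix[g][j] * 1e6 / lib_sizes[j] for j in range(n_libs)]
        if sum(v > min_cpm for v in cpms) > min_samples:
            # The prior count keeps zero values finite after the log.
            result[g] = [math.log2(v + prior) for v in cpms]
    return result
```

With the thresholds reported here, the call would be `filter_and_log2(counts, min_cpm=2, min_samples=1000)`; smaller thresholds are convenient for toy data.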

The distribution of gene expression across the 1266 libraries formed a bell-shaped curve with a small additional peak of low expression for all three methods (Supplemental Fig. 2). To determine whether these low expression values came from a few or from many libraries, elements within the range of expression that corresponded to the observed peak (< -37 CPM; Supplemental Fig. 2B) were extracted from the CPM-normalized expression matrix and matched to their originating libraries. This demonstrated that the low expression elements were not limited exclusively to specific libraries, but eight libraries contributed over 25% of the low elements. A gene ontology enrichment analysis failed to identify significant gene ontology descriptors within the subset of 43 genes that were defined as lowly expressed (data not shown). All eight of these libraries were from pollen tissue, where the average gene expression, at 147 Counts Per Million (CPM), is lower than the average gene expression of the other 79 tissues combined, at 183 CPM. Hierarchical clustering and a correlation heatmap of the same data (Stelpflug et al., 2015) show the uniqueness of the pollen tissue expression pattern (Langfelder and Horvath, 2008) (Supplemental Fig. 3). When the lowly expressed elements from RPKM- and VST-normalized data were analyzed to determine library origin and GO enrichment (data not shown), we found a similarly high level of pollen-specific libraries without significant GO categories. In pollen, some highly expressed genes are considered orphan genes (Wu et al., 2014) because they lack detectable homologs in other species. To investigate whether these lowly expressed genes were orphan genes, their gene sequences were searched against the Setaria italica genome (JGIv2) using BLASTX (e-value < 1E-03). Setaria italica (foxtail millet) is a close relative of maize, from which it diverged 23.4 million years ago (MYA) as estimated by TimeTree (Kumar et al., 2017). Only 1 out of 43 genes lacked detectable homologs in Setaria italica (data not shown), indicating that the majority of these genes are not likely to be orphan genes.

Because RPKM normalization accounts for gene length, the distribution of gene length versus expression for the RPKM method was compared to that of data normalized by the VST and CPM methods. VST- and CPM-normalized data showed very similar overall patterns, with no clear linear relationship between gene length and average expression (Supplemental Fig. 2C). RPKM-normalized data displayed an apparent bias toward elevated expression of a small number of genes less than 5000 bp in length and lower expression of long genes, suggesting that this normalization method might skew the distribution of expression at some genes. Overall, in spite of these differences, the three normalization methods resulted in a similar distribution of expression patterns for most of the genes included in the analysis. Additional analysis was completed to determine whether the three normalization methods influence network performance.
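To make the length dependence concrete, the short sketch below contrasts CPM and RPKM for two genes with identical read counts but different lengths. It uses the standard definitions of the two measures; the function names and the numbers in the usage note are illustrative assumptions, not values from this study.

```python
def cpm(count, library_size):
    """Counts per million: scales a read count by sequencing depth only."""
    return count * 1e6 / library_size

def rpkm(count, library_size, gene_length_bp):
    """Reads per kilobase per million: scales by depth and by gene length."""
    return count * 1e9 / (library_size * gene_length_bp)
```

For a hypothetical library of one million reads, a gene with 100 reads has a CPM of 100 regardless of its length, whereas its RPKM is 100 at 1000 bp and 20 at 5000 bp. Dividing by length in this way raises the values of short genes relative to long ones, which is in line with the pattern described above.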


Network Performance Does Not Differ Based Upon Normalization Method

To compare the efficacy of the three normalization and ten inference methods, a GCN was generated for each combination of normalization and inference method. Furthermore, all networks were rank-standardized so that edge weights ranged from 0 to 1 (see Methods). All network evaluations used the whole adjacency matrix (15116 × 15116 in RNA-Seq networks; 11429 × 11429 or 17862 × 17862 in protein networks) without a cut-off. The performance of the different networks was measured by comparing the area under the receiver operating characteristic curve (AUROC). AUROC is a measurement used to evaluate the accuracy of classification models, making it suitable for evaluating GCNs (Gillis and Pavlidis, 2011; Ma and Wang, 2012; Liu et al., 2017). AUROC values range from 0 to 1, with a value closer to 1 indicating that the network is discriminating nonrandom patterns and approaching perfect classification, random networks returning values close to 0.5, and values closer to 0 indicating a high degree of incorrect classification. While an AUROC value close to 1 is optimal, values over 0.7 suggest good performance when analyzing large, diverse networks (Gillis and Pavlidis, 2011). To set up the AUROC baseline for random networks, maize gene IDs were shuffled 10 (for MRNET and CLR) or 1000 times (for PCC) in the normalized expression matrix. The randomized expression matrices were used for network inference with the designated algorithms and further evaluated. The resulting AUROC values from randomized networks were very close to 0.5 (Supplemental Table S2).
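The following sketch illustrates the three ideas in this paragraph: rank standardization of edge weights, AUROC computed from ranks, and a shuffled baseline. It is a simplified stand-in and makes several assumptions. Ranks are scaled by the number of edges, so weights fall in (0, 1]; the exact scaling in Methods may differ. The baseline shuffles labels rather than gene IDs in the expression matrix. All function names are hypothetical.

```python
import random

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_standardize(weights):
    """Map edge weights to rank / n, so the strongest edge gets 1.0."""
    n = len(weights)
    return [r / n for r in average_ranks(weights)]

def auroc(scores, labels):
    """AUROC from the rank sum of positives (Mann-Whitney U formulation)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def shuffled_baseline(scores, labels, n_shuffles=1000, seed=0):
    """Mean AUROC after randomly permuting the labels n_shuffles times."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += auroc(scores, shuffled)
    return total / n_shuffles
```

Because AUROC depends only on the ordering of scores, rank standardization leaves it unchanged. This is one reason the measure can be compared across inference methods whose raw edge weights are on different scales.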

AUROC values were calculated and compared for three different network characteristics. The first characteristic was designed to test whether the network identified genes with known or predicted co-expression patterns, based upon prior results and inclusion in two existing datasets that could serve as a positive control for co-expression. The maize metabolic pathway database (MaizeCyc) contains 413 pathways with more than two genes and was built from evidence collected from genome annotation, phylogenetic distance and known genes in maize, rice and Arabidopsis (Monaco et al., 2013). The maize protein-protein interaction database (PPIM), the second dataset used in this analysis, is based upon both predicted and experimentally detected protein interactions (Zhu et al., 2016). Only high-confidence interactions from PPIM were used, defined as those ranking in the top 5% in their model (Zhu et al., 2016). For comparison with the GCN, genes within the same MaizeCyc or PPIM pathways were considered co-expressed. The MaizeCyc and PPIM datasets were combined, and genes with fewer than 5 interactions were excluded from evaluation, creating a compiled dataset referred to herein as the Protein-Protein and Pathway dataset (PPPTY). PPPTY had 1720 genes and 104856 interactions that were used in this evaluation. The AUROC value was calculated for each of the 1720 gene terms.
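The construction of a combined reference set such as PPPTY can be sketched as below. This is only an illustration of the merge-and-filter logic: the function name `build_reference_set`, the representation of each dataset as gene pairs, and the decision to count partners before any filtering are assumptions, and the actual procedure may differ in detail.

```python
from collections import defaultdict

def build_reference_set(pathway_pairs, ppi_pairs, min_interactions=5):
    """Merge two collections of gene pairs and drop sparsely connected genes.

    pathway_pairs, ppi_pairs: iterables of (gene_a, gene_b) tuples.
    Returns a dict mapping each retained gene to its set of partners.
    """
    partners = defaultdict(set)
    for a, b in list(pathway_pairs) + list(ppi_pairs):
        if a != b:  # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)
    # Keep genes with at least min_interactions distinct partners.
    return {g: p for g, p in partners.items() if len(p) >= min_interactions}
```

Each retained gene and its partner set then defines one positive set against which the ranked co-expression weights of that gene can be scored by AUROC.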

To assess the effect of normalization method on GCNs, AUROC values for all ten inference methods were averaged for each of the three normalization methods. All three normalization methods scored similarly in comparison with the PPPTY dataset (Fig. 2B), with a mean AUROC value around 0.575 for each, suggesting that the predicted networks were more selective than a random network.

The second characteristic was the presence of similar gene ontology (GO) information for maize genes within a detected co-expression set, based upon "guilt by association", which assumes that specific subgroups of co-expressed genes have some shared functions (Wolfe et al., 2005). GO annotations were downloaded from AgriGO (Du et al., 2010), which uses signature integration by InterPro, rather than co-expression data, to map gene IDs to GO terms. InterPro provided over 108 million stable GO terms to the functional protein information database UniProtKB at release 2016_01 (Sangrador-Vegas et al., 2016). Thus, the GO annotations provide a reliable evaluation resource independent of co-expression data. To assess this characteristic, gene ontology information was used in a neighbor voting algorithm (Gillis and Pavlidis, 2011) for sets of co-expression matrices, and the results were compared. Co-expression matrices were assessed by 3-fold cross-validation, which involved masking GO terms from some genes to test whether the masked GO terms could be predicted based upon gene expression patterns. In total, 277 GO terms were included in this analysis.
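A simplified version of neighbor voting with cross-validation, for a single GO term, is sketched below. It is not the implementation of Gillis and Pavlidis (2011). The interleaved fold assignment, the scoring of each gene as the fraction of its edge weight connected to unmasked annotated genes, and the evaluation restricted to held-out genes are assumptions chosen to keep the example short.

```python
def auroc(scores, labels):
    """Probability that a positive outscores a negative; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def neighbor_voting_auroc(adjacency, labels, n_folds=3):
    """Mean cross-validated AUROC for one GO term.

    adjacency: symmetric list of lists of edge weights between genes.
    labels: 0/1 per gene, 1 if the gene is annotated with the GO term.
    """
    n = len(labels)
    fold_scores = []
    for fold in range(n_folds):
        held_out = set(range(fold, n, n_folds))  # simple interleaved folds
        # Mask the annotation of held-out genes before voting.
        masked = [0 if i in held_out else labels[i] for i in range(n)]
        votes = []
        for i in range(n):
            total = sum(adjacency[i][j] for j in range(n) if j != i)
            voted = sum(adjacency[i][j] * masked[j] for j in range(n) if j != i)
            votes.append(voted / total if total else 0.0)
        test_idx = sorted(held_out)
        test_labels = [labels[i] for i in test_idx]
        # A fold is scored only if it holds both annotated and unannotated genes.
        if 0 < sum(test_labels) < len(test_labels):
            fold_scores.append(auroc([votes[i] for i in test_idx], test_labels))
    return sum(fold_scores) / len(fold_scores) if fold_scores else float("nan")
```

If genes sharing a GO term are strongly connected in the network, masked annotated genes receive higher votes than unannotated genes and the AUROC approaches 1; in a network with no functional structure it is expected to be near 0.5.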

When GO characteristics were used to assess the networks, all three normalization methods performed similarly, but the AUROC values, at around 0.689 for each, were higher than those observed for comparisons with PPPTY (Fig. 2A). Because GO addresses gene functions and PPPTY emphasizes protein-protein interactions, this suggests that GCNs are better at predicting functional interactions than physical interactions. The p-values from one-way ANOVA testing the effect of normalization method on the PPPTY and GO datasets were 0.9535 and 0.4714, respectively, confirming that the normalization method did not create a significant difference in the AUROC scores associated with the GCNs for the characteristics that were tested.
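For readers unfamiliar with the test, the F statistic behind a one-way ANOVA can be computed as in the sketch below, where each group would hold the AUROC values obtained under one normalization method. The function name is hypothetical, and the sketch stops at the F statistic and its degrees of freedom; converting F to a p-value requires the F distribution, which is available in statistical libraries (for example `scipy.stats.f_oneway`) but not in the Python standard library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

An F value near or below 1 indicates that the differences among group means are no larger than the spread within groups, which is the situation that large p-values such as those reported above reflect.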

Finally, proteins that regulate gene expression or modify chromatin structure might interact with the DNA of a subset of co-expressed genes. The interactions between such a protein and regulated DNA can be detected by chromatin immunoprecipitation of associated DNA followed by DNA sequencing (ChIP-Seq). In maize, there are five ChIP-Seq datasets available (Bolduc et al., 2012; Morohashi et al., 2012; Li et al., 2015a; Pautler et al., 2015; Yang et al., 2016), some of which involve lowly expressed or tissue-specific genes. For example, Opaque2 is specifically expressed in endosperm (Li et al., 2015a), Knotted1 is expressed in SAM and floral tissues (Bolduc et al., 2012), and Pericarp Color1 has low expression except in inflorescence and seed (Morohashi et al., 2012). Histone Deacetylase 101 (HDA101) ChIP-Seq data provided the largest dataset for comparison, with 26 confirmed binding targets that are relatively highly expressed in most maize tissues (Yang et al., 2016). Histone deacetylation often correlates with a decrease in gene expression (Verdin and Ott, 2014). High-confidence HDA101 targets were defined as those discovered by ChIP-Seq that also showed increased gene expression in the hda101 mutant. Networks associated with the 26 high-confidence HDA101 targets were compared by calculating AUROC. Based upon this analysis, the AUROC values were very similar among networks normalized by VST, CPM and RPKM (Fig. 2C), which is consistent with the GO and PPPTY evaluations.

Correlation Methods Perform Better than Mutual Information at Some Genes

After normalization, the expression matrices can be processed by different methods for GCN inference. To optimize this step, the AUROC values of six correlation methods (PCC, SCC, KCC, GCC, BIC, CSC) and four mutual information (MI) methods (AA, MA, MRNET, CLR) were compared for the expression matrices generated from each of the three normalization methods (VST, CPM, RPKM) and then averaged. In general, correlation methods are more computationally efficient, while MI methods are able to reveal non-linear relationships (Li et al., 2015c). PCC is widely used but may be influenced by outliers (Mukaka, 2012). SCC, KCC, and BIC are less sensitive to outliers because SCC and KCC consider only rank information and BIC is calculated from the dataset median instead of the mean (Serin et al., 2016). Recently, GCC has been shown to be a better correlation method for gene expression analysis because of its capacity to detect non-linear relationships and its insensitivity to outliers (Ma and Wang, 2012). CSC is widely used for text mining and for analyzing sparse data with many zeros (Dhillon and Modha, 2001). ARACNE, MRNET, and CLR showed extended gene-dependent relationships under variable biological settings (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007; Li et al., 2013b). To estimate the effectiveness of the inference methods, the same testing parameters with AUROC calculations were applied as described for the testing of normalization methods.
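
The outlier sensitivity described above can be seen in a small Python sketch. The study computed these coefficients in R; the pure-Python functions and the toy expression vectors below are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation coefficient (SCC): PCC computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cosine(x, y):
    """Cosine similarity coefficient (CSC): no centering of the vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


# Hypothetical expression values for two genes across six libraries.
gene_a = [1, 2, 3, 4, 5, 6]
gene_b = [2, 4, 6, 8, 10, 12]            # perfectly linear with gene_a
gene_b_outlier = [2, 4, 6, 8, 10, -50]   # same gene with one aberrant library

pcc_clean = pearson(gene_a, gene_b)
pcc_outlier = pearson(gene_a, gene_b_outlier)
scc_outlier = spearman(gene_a, gene_b_outlier)
```

With these toy values, a single aberrant library is enough to turn PCC negative, whereas SCC stays positive because only the rank of the outlier matters, not its magnitude.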

Assessed by GO datasets, the 277 AUROC values were averaged to create one average value for each of the 10 inference methods, ranging from 0.620 to 0.724 (Fig. 2D). The average AUROC across all normalization methods for the six correlation methods was 0.718, while the average AUROC for the four MI methods was 0.646. The majority of the 277 GO terms had similar AUROC values in the GCNs generated by the different correlation methods, and these patterns differ from those observed in the MI-generated GCNs (Fig. 3A). The similarity among different methods was also detectable by pairwise comparison and by comparing Pearson correlations between the different methods (Supplemental Fig. 4A).

To evaluate network inference methods with the PPPTY dataset, the AUROC values for 1720 genes were averaged for each combination of normalization and inference methods (Fig. 2E). This evaluation also showed that the networks constructed using correlation methods resulted in higher AUROC values than MI methods, although the CSC method resulted in lower AUROC values than the other correlation methods. As demonstrated for the GO evaluation, results from correlation methods were more similar to each other than to those from the MI methods (Supplemental Fig. 4B). Interestingly, heatmap results indicated that a subset of genes consistently had higher AUROC values when CSC, MRNET/CLR, or AA/MA were used (Fig. 3B), although this subset is small enough that the average AUROC value over the whole gene set was relatively low for those methods. The gene sets with the highest AUROC values in PCC, CSC, or MRNET were extracted. The gene sets were compared by average expression (CPM) and by the average number of low-expression elements (CPM < 0). The CSC gene set had the smallest number of low-expression elements and had higher average expression than both the 1720-gene set and the PCC gene set (Supplemental Fig. 5). This may indicate that the CSC method is better at determining co-expression for highly expressed genes.

The AUROC values from the 26 targets in the HDA101 ChIP-Seq dataset reveal that the CSC GCN had the highest AUROC value and that the MRNET/CLR GCNs resulted in slightly higher scores than the correlation methods (Fig. 2F). This could be explained by the small number of targets creating skewed results, but may also indicate that CSC/MI methods are more suitable for specific types of genes or interactions between genes (Tzfadia et al., 2016). HDA101 is a highly expressed gene in all samples, with an average expression value of 864 CPM and a minimum expression of 289 CPM, so it is possible that HDA101 is more suitable for the CSC method. CPM and RPKM normalization methods had higher AUROC values than VST (Fig. 2C). Using the two models of ARACNE (additive, AA, and multiplicative, MA), the co-expression matrices contained less than 0.5% non-zero values for all comparisons, so these techniques were not included in any additional analyses.

In conclusion, our results indicated that the widely used correlation methods resulted in a more predictive maize GCN from a single expression matrix, but co-expression with some individual genes may be better detected using MI methods. Normalization method did not have a substantial influence on GCN performance, so only CPM normalization was used, in conjunction with PCC, SCC, MRNET, and CLR inference, for subsequent optimization of other parameters.

Increased Sample Size Had a Positive Effect on GCN

GCN analysis can be accomplished with a variable number of samples and datasets, but sample size can influence the quality of the resulting GCN (Wei et al., 2004; Ballouz et al., 2015). Separate analyses were conducted with different numbers of samples and experiments to empirically determine the effect of sample number on GCN effectiveness. The data in our analysis consisted of 17 experiments, each including between 12 and 404 libraries. For this analysis, CPM normalization followed by each of four inference methods (PCC, SCC, MRNET, and CLR) was applied to the 17 experiments, and the 68 resulting networks were evaluated by both GO and PPPTY.

In both the GO and PPPTY evaluations, all algorithms exhibited a positive linear relationship between natural-log-transformed sample size and average AUROC values (Fig. 4). The linear relationships were stronger for the PCC and SCC methods, which had higher r-squared values, indicating that correlation methods benefit more from increasing sample size. Thus, for building correlation-based GCNs, as many samples as possible should be included. We also found that, as seen for the total GCN analysis, PCC and SCC had higher average AUROC values than the MRNET and CLR methods in the PPPTY and GO analyses for most of the individual networks (Fig. 5).
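
The kind of fit behind this relationship, average AUROC regressed on the natural logarithm of sample size, can be sketched in Python as follows. The function name `fit_log_linear` is hypothetical, and the AUROC values are placeholders generated from an exact log-linear rule, not results from the study; only the smallest and largest sample sizes (12 and 404) come from the text.

```python
import math


def fit_log_linear(sample_sizes, aurocs):
    """Least-squares fit of AUROC = intercept + slope * ln(sample size)."""
    x = [math.log(n) for n in sample_sizes]
    mean_x = sum(x) / len(x)
    mean_y = sum(aurocs) / len(aurocs)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, aurocs))
    syy = sum((b - mean_y) ** 2 for b in aurocs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder inputs: AUROC values built from an exact rule, not study results.
sizes = [12, 50, 200, 404]
toy_aurocs = [0.5 + 0.03 * math.log(n) for n in sizes]
slope, intercept, r_squared = fit_log_linear(sizes, toy_aurocs)
```

A higher r-squared for one inference method than another, as reported for PCC and SCC, means that sample size explains more of the variation in AUROC for that method.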

Ranked Aggregation of Networks Improved Performance of GCNs

Ranked aggregation for meta-analysis can also be used to change the outcomes of a GCN by buffering the effect of sample heterogeneity (Zhong et al., 2014; Wang et al., 2015a; Asnicar et al., 2016). Aggregated rank-standardized correlation/MI matrices were calculated from separate experiments to determine whether this approach enhanced GCN performance. Aggregating individual networks together for meta-analysis can help to highlight true co-expression interactions and reduce noise (Zhong et al., 2014; Wang et al., 2015a; Wang et al., 2015b). This analysis was conducted with the 17 differently sized experiments using the PCC, SCC, MRNET, and CLR methods for GCN inference, as before, resulting in 68 single GCNs. The 17 experiments were aggregated for PCC, SCC, MRNET, and CLR individually and evaluated by the GO and PPPTY datasets.


Of the 4 aggregated networks that were evaluated, the two correlation methods (PCC and SCC) had higher AUROC values than the single network from 1266 samples (Fig. 6 and Supplemental Fig. 6). However, this aggregation strategy did not result in significantly higher AUROC scores for the MRNET and CLR networks compared with single networks from 1266 samples (two-tailed Wilcoxon rank test for GO evaluation, p-values 0.494 and 0.796). It has been reported that MI estimation accuracy depends on sample size (Gao et al., 2015); therefore, individual MI networks built with a small number of libraries may not demonstrate improved accuracy from aggregation. In conclusion, the PCC/SCC-built GCN performed best using a ranked aggregation strategy, and use of this strategy in combination with the other optimized parameters creates a robust GCN.
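
A minimal Python sketch of one way to aggregate per-experiment networks is given below. The exact aggregation formula is not stated in this section, so a plain element-wise mean of rank-standardized matrices is assumed here; the function name `aggregate_networks` and the toy matrices are hypothetical.

```python
def aggregate_networks(networks):
    """Element-wise mean of equally sized, rank-standardized weight matrices."""
    n_genes = len(networks[0])
    return [
        [sum(net[i][j] for net in networks) / len(networks) for j in range(n_genes)]
        for i in range(n_genes)
    ]


# Two hypothetical experiments over three genes (weights already scaled to 0-1).
exp1 = [[1.0, 0.8, 0.9], [0.8, 1.0, 0.2], [0.9, 0.2, 1.0]]
exp2 = [[1.0, 0.7, 0.1], [0.7, 1.0, 0.3], [0.1, 0.3, 1.0]]
merged = aggregate_networks([exp1, exp2])
```

In this toy case the edge between genes 0 and 2 is the strongest in the first experiment but weak in the second, so after aggregation it ranks below the edge between genes 0 and 1, which is supported by both experiments. This is the sense in which aggregation buffers experiment-specific signal.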

The Performance of Protein Networks Did Not Exceed Aggregation Networks

In many cases, mRNA levels in a cell are of interest because mRNA level is thought to be related to the level and function of a protein of interest. However, many researchers have found inconsistencies between mRNA and protein levels (Baerenfaller et al., 2008; Schwanhäusser et al., 2011; Ponnala et al., 2014; Walley et al., 2016). Although relatively little protein expression data is available, this data is amenable to GCN construction and could represent a more direct reflection of interacting proteins. Using a non-modified protein expression atlas from 23 maize tissues based upon mass spectrometry data (Walley et al., 2016), four protein networks were built with PCC, SCC, MRNET, and CLR separately and then evaluated using the same PPPTY and GO datasets as previously mentioned.

GCNs constructed from protein expression did not exhibit superior AUROC values to those observed for the RNA-Seq-based GCN using the aggregation strategy (Fig. 6). When evaluated by the GO and PPPTY datasets, the performance of the protein network was lower than that of the aggregated network as well as the single network from 1266 samples. To confirm this result, a two-way ANOVA was computed with pairwise comparison for the GO evaluation, which showed that the effect of network type was significant (Supplemental Table S3). A subsequent pairwise comparison using the Wilcoxon rank sum test indicated that the PCC/SCC methods were significantly better than MRNET/CLR (Supplemental Table S3), although MI methods may be superior for some types of interactions.

The raw protein expression data included 17862 genes, of which 11429 overlapped with our RNA-Seq-based network and were therefore used for the analysis. To demonstrate that the performance of the protein network was not biased by the selection of genes, the PCC method was applied to all 17862 genes to construct a protein network (Supplemental Fig. 7). No improvement could be detected for the protein network derived from 17862 genes, with p-values of 0.635 for the GO evaluation and 0.995 for the PPPTY evaluation from a one-sided Wilcoxon rank sum test.

PCC- and SCC-built GCNs Exhibit Identical Topological and Functional Properties

In addition to evaluation of network performance based upon biological characteristics, networks can be compared based upon several network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules, and number of genes in the largest module. Number of nodes is a basic construct in graph theory depicting the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in networks. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics such as protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC- and SCC-built ranked aggregation networks from 17 experiments (PA and SA) and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.
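
Heterogeneity and centralization can be made concrete with a short Python sketch. The formulas below follow commonly used degree-based definitions (heterogeneity as the coefficient of variation of node degree, centralization as a scaled difference between the maximum degree and the network density); that these are exactly the variants computed by the software used in the study is an assumption, and the two toy graphs are hypothetical.

```python
import math
from collections import Counter


def degree_list(edges):
    """Degree of every node that appears in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return list(degree.values())


def heterogeneity(degrees):
    """Coefficient of variation of the degrees: sqrt(var(k)) / mean(k)."""
    mean_k = sum(degrees) / len(degrees)
    var_k = sum((k - mean_k) ** 2 for k in degrees) / len(degrees)
    return math.sqrt(var_k) / mean_k


def centralization(degrees):
    """n / (n - 2) * (max(k) / (n - 1) - density), density = mean(k) / (n - 1)."""
    n = len(degrees)
    density = sum(degrees) / (n * (n - 1))
    return n / (n - 2) * (max(degrees) / (n - 1) - density)


# Two hypothetical five-node graphs: one hub with four leaves, and a ring.
star = [("hub", leaf) for leaf in ("g1", "g2", "g3", "g4")]
ring = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5"), ("g5", "g1")]

star_stats = (heterogeneity(degree_list(star)), centralization(degree_list(star)))
ring_stats = (heterogeneity(degree_list(ring)), centralization(degree_list(ring)))
```

The star graph, dominated by a single hub, reaches the maximum centralization and a high heterogeneity, whereas the ring, in which every node has the same degree, scores zero on both.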

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the highest centralization value observed for this network. Centralization and heterogeneity are two related measures of the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity, while a network with larger values of centralization and heterogeneity may contain a larger number of hubs, or the number of hubs may not be significantly large but the degree distribution may be extremely imbalanced. In biological networks, many observations have connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that the high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8).

The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 835 unique genes (Fig. 7A). 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation, and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of the top interacting genes (27/148), which was expected because of their cellular abundance and involvement in translation. Interestingly, nine epigenetic regulators were found among the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a), and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).

To reveal the underlying properties of the GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules than the PA and SA networks. Most genes in the MS network clustered into one very large module of 14054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Tables S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, and so these methods appear to be better for module detection.

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA, and MS were merged, and the intersection of the three produced a merged network of 14277 genes and 106591 interactions. PA and SA shared 83.5% of their interactions, while 87.3% of the interactions in MS were unique (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
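
The selection and intersection steps can be sketched in Python as follows. The function names (`top_edges`, `merge_by_intersection`), the dictionary layout of the edge weights, and the toy networks are assumptions for illustration; the study kept the top one million edges per network rather than the top two used here.

```python
def top_edges(weights, n_top):
    """Return the n_top heaviest edges as a set of order-independent gene pairs.

    weights: dict mapping (gene_a, gene_b) tuples to edge weights.
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {tuple(sorted(pair)) for pair, _ in ranked[:n_top]}


def merge_by_intersection(*edge_sets):
    """Keep only edges present in every network; also return the genes they touch."""
    shared = set.intersection(*edge_sets)
    genes = {gene for pair in shared for gene in pair}
    return shared, genes


# Hypothetical toy weights standing in for the PA, SA, and MS networks.
pa = {("g1", "g2"): 0.99, ("g2", "g3"): 0.95, ("g1", "g4"): 0.40}
sa = {("g2", "g1"): 0.98, ("g3", "g2"): 0.90, ("g3", "g4"): 0.50}
ms = {("g1", "g2"): 0.97, ("g1", "g4"): 0.96, ("g2", "g3"): 0.10}
shared, genes = merge_by_intersection(*(top_edges(net, 2) for net in (pa, sa, ms)))
```

Sorting each gene pair before comparison treats the edges as undirected, so the same interaction is matched regardless of the order in which the two genes are listed.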

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes, and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected, cell wall-related GO terms were enriched for these 214 genes (Fig. 7D; Supplemental Table S9).

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database using BLASTP to retrieve homologs and their annotations. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall-related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1, and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14, and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6), and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes. Of the co-expressed genes, 48% (36/75) were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial on how to use Cytoscape to visualize any co-expressed genes in our network is provided as supplemental material (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, the VST, CPM, and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is done on a library basis, which means that genes within the same library are normalized by similar factors. So when the network is constructed by PCC or BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, consider only differences in relative rankings, on which normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.

When assessing inference methods, the simple and widely used correlation methods such as PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study of human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods detect different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed more robust performance than any single inference method in in silico and E. coli expression networks, referring to this as "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET, and CLR together did not result in a network that outperformed the PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq-based maize GCN. This optimization may apply to a range of datasets that share characteristics with maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Processing

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). Adapters were trimmed from the fastq files with Cutadapt 1.8.1 (Martin, 2011). The adapter-trimmed files were then quality checked with FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts 1.5.0 (Liao et al., 2014) from the aligned bam files (Supplemental Fig. S1). 26 libraries with fewer than 5 million total reads and 11 libraries with a total alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined with Snakemake v3.7.1 (Köster and Rahmann, 2012).
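
The library exclusion rule can be written as a short Python sketch. The function name `passes_qc`, the record layout, and the three example libraries are hypothetical; only the two thresholds come from the text.

```python
def passes_qc(total_reads, alignment_rate, min_reads=5_000_000, min_alignment=0.70):
    """True when a library meets both thresholds stated in the text."""
    return total_reads >= min_reads and alignment_rate >= min_alignment


# Hypothetical library records: (name, total reads, overall alignment rate).
libraries = [
    ("lib_A", 12_400_000, 0.93),
    ("lib_B", 3_100_000, 0.95),   # too few reads
    ("lib_C", 20_000_000, 0.62),  # alignment rate too low
]
kept = [name for name, reads, rate in libraries if passes_qc(reads, rate)]
```

Applied to the counts reported above, removing 26 and 11 libraries from the original 1303 leaves the 1266 samples used for the expression table.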

Gene Count Normalization

The expression data was normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated with the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM/RPKM + 1)). For both methods, scale factors between samples were estimated by Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated with the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in additional analyses (15116 genes).
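
The log2(CPM + 1) transformation and the expression filter can be sketched in Python as follows. This is a simplified illustration, not the edgeR computation used in the study: it divides by raw library sizes and omits the TMM scale factors, and the function names and toy counts are assumptions.

```python
import math


def log2_cpm(counts):
    """log2(CPM + 1) per gene and library, using raw library sizes (no TMM scaling).

    counts: dict mapping gene ID to a list of raw read counts, one per library.
    """
    n_libs = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_libs)]
    return {
        gene: [math.log2(row[j] / lib_sizes[j] * 1e6 + 1) for j in range(n_libs)]
        for gene, row in counts.items()
    }


def expressed_genes(counts, min_cpm=2.0, min_libraries=1000):
    """Genes with CPM above min_cpm in more than min_libraries libraries."""
    n_libs = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_libs)]
    return [
        gene
        for gene, row in counts.items()
        if sum(row[j] / lib_sizes[j] * 1e6 > min_cpm for j in range(n_libs))
        > min_libraries
    ]


# Hypothetical counts for two genes in two libraries.
toy_counts = {"geneA": [500_000, 0], "geneB": [500_000, 1_000_000]}
toy_expr = log2_cpm(toy_counts)
toy_kept = expressed_genes(toy_counts, min_cpm=2.0, min_libraries=1)
```

Adding 1 before taking the logarithm keeps genes with zero counts at an expression value of zero instead of negative infinity.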

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to the normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated with the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated with the adjacency.matrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed with the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed with the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value assigned a rank of one. The ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights range from zero to one.
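
The rank standardization can be sketched in Python as follows. The handling of ties (average ranks) and the choice to rank all matrix entries, including both copies of each symmetric pair, are assumptions; the function name `rank_standardize` and the toy matrix are hypothetical.

```python
def rank_standardize(matrix):
    """Replace each weight by rank / number of elements, then set the diagonal to 1.

    The smallest weight gets rank 1; tied weights share their average rank.
    """
    n = len(matrix)
    flat = sorted(
        ((matrix[i][j], i, j) for i in range(n) for j in range(n)),
        key=lambda item: item[0],
    )
    out = [[0.0] * n for _ in range(n)]
    start = 0
    while start < len(flat):
        end = start
        while end + 1 < len(flat) and flat[end + 1][0] == flat[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for _, i, j in flat[start:end + 1]:
            out[i][j] = avg_rank / len(flat)
        start = end + 1
    for i in range(n):
        out[i][i] = 1.0
    return out


# Hypothetical correlation matrix for three genes.
toy_corr = [[1.0, 0.5, -0.3], [0.5, 1.0, 0.1], [-0.3, 0.1, 1.0]]
toy_ranked = rank_standardize(toy_corr)
```

Because only the ordering of the weights is kept, matrices produced by correlation and MI methods end up on the same zero-to-one scale and can be compared or combined directly.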

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in the CPM- or VST-normalized expression matrices. The randomized expression matrices were then used for inference by the PCC, MRNET, or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.


Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions, defined as those ranking in the top 5% of their results, were used for evaluation. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 was downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq-confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for the GO datasets were calculated with the EGAD package (Ballouz et al., 2016) in R. It relies on the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding the GO terms of a subset of genes and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were run in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).

Definition of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts two genes are co-expressed but they are not co-expressed in the PPPTY dataset; TN, a network predicts two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts a gene does not have a specific GO term and it does not have that GO term in our GO dataset; FN, a network predicts a gene does not have a specific GO term but it does have that GO term in our GO dataset.
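The pair-level definitions above can be made concrete with a short Python sketch. This is only an illustration under stated assumptions: the function name `confusion_counts`, the gene identifiers, and both pair lists are hypothetical, and pairs are treated as unordered.

```python
from itertools import combinations


def confusion_counts(genes, predicted_pairs, reference_pairs):
    """Count TP, FP, TN and FN over all unordered gene pairs."""
    predicted = {frozenset(p) for p in predicted_pairs}
    reference = {frozenset(p) for p in reference_pairs}
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, b in combinations(sorted(genes), 2):
        pair = frozenset((a, b))
        if pair in predicted:
            # Predicted co-expressed: correct only if the reference agrees.
            counts["TP" if pair in reference else "FP"] += 1
        else:
            # Predicted not co-expressed: a miss if the reference lists the pair.
            counts["FN" if pair in reference else "TN"] += 1
    return counts


genes = ["g1", "g2", "g3", "g4"]
predicted_pairs = [("g1", "g2"), ("g1", "g3")]   # hypothetical network calls
reference_pairs = [("g2", "g1"), ("g3", "g4")]   # hypothetical reference set
counts = confusion_counts(genes, predicted_pairs, reference_pairs)
print(counts)
```

Sweeping the threshold that decides which pairs count as predicted, and recording the resulting true and false positive rates, traces the ROC curve whose area is the AUROC.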

Network Clustering and Characterization

For each network, the top 1 million edges were selected as the stringent co-expression network. Network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distribution were plotted with the Network Analyzer plugin (Doncheva et al., 2012). Graph clustering was performed with the Markov Cluster Algorithm (MCL), using MCL v14-137 with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.
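The edge selection and node degree steps can be sketched in Python as follows. This is a minimal illustration, not the Cytoscape workflow used in this study: the function names `top_edges` and `node_degrees` and the toy edge weights are hypothetical, and the toy example keeps 3 edges where the analysis keeps 1 million.

```python
import heapq
from collections import Counter


def top_edges(edge_weights, k):
    """Return the k highest-weighted edges as (gene_a, gene_b, weight)."""
    edges = ((a, b, w) for (a, b), w in edge_weights.items())
    return heapq.nlargest(k, edges, key=lambda e: e[2])


def node_degrees(edges):
    """Count how many retained edges touch each gene."""
    degree = Counter()
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical edge weights for a four-gene toy network.
edge_weights = {
    ("g1", "g2"): 0.95,
    ("g1", "g3"): 0.90,
    ("g2", "g3"): 0.30,
    ("g3", "g4"): 0.85,
    ("g1", "g4"): 0.10,
}
stringent = top_edges(edge_weights, k=3)
degrees = node_degrees(stringent)
print(stringent)
print(dict(degrees))
```

Tabulating how many genes share each degree value gives the node degree distribution that is later compared with a power-law fit.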


Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15,116 genes in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, and a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
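For readers unfamiliar with the two statistical steps named above, the following Python sketch shows an upper-tail hypergeometric test and a Benjamini-Yekutieli adjustment. It is not AgriGO's implementation: the function names and the example numbers are hypothetical. In the enrichment setting, N would be the 15,116 background genes, K the background genes annotated with a term, n the size of the query gene list, and k the query genes annotated with the term.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values, returned in input order."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, idx in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 10 background genes, 4 annotated, 3 queried, 2 hits.
p = hypergeom_pvalue(k=2, n=3, K=4, N=10)
adj = benjamini_yekutieli([0.01, 0.04, 0.03])
print(p)
print(adj)
```

Terms whose adjusted value exceeds the chosen FDR threshold (0.05 in this analysis) would then be discarded.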

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized components of cell wall biosynthesis (Penning et al., 2009; Bosch et al., 2011) (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) through its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05, and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4, with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were compared against TAIR10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on the topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.

Supplemental Figure 2. Distribution of gene expression values.

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.

Supplemental Figure 4. Pairwise comparison among results of inference methods.


Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY gene set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET).

Supplemental Figure 6. Evaluation of network performance based on sample size and inference.

Supplemental Figure 7. GCN performance comparison between protein networks.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS).

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network).

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).

Supplemental Table S1. RNA-Seq libraries used in this analysis.

Supplemental Table S2. Random network AUROC value baseline.

Supplemental Table S3. ANOVA tables and pairwise comparisons.

Supplemental Table S4. Topological characteristics of four maize networks.

Supplemental Table S5. Gene Ontology annotation for 148 hub genes.

Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S8. 16 query genes in the maize cell wall pathway.

Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.

Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.

Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.

Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.

Supplemental Dataset S1. The merged network in Cytoscape-ready format.

Supplemental Dataset S2. Tutorial: Visualizing co-expression data in Cytoscape.


Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the totals for 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year and generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from comparisons with GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.


Figure 3. Similarity between ten inference methods in network performance based on GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks, plotted against natural-logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks, plotted against natural-logarithm-transformed sample size (log(sample size)). The fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled S12_1 to S404; S1266 included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single networks (white, "1266"), aggregated networks (grey, "agg"), and protein networks (dark grey, "pr") using PCC, SCC, MRNET, and CLR. A, GO evaluation of networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation of networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star marks the mean value of each box. Outliers are plotted as grey dots.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 most highly interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for the 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported into the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM, Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC, Multi-parent Advanced Generation InterCrosses. Genotypes represented by more than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm normalization (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean and standard deviation of the log2-transformed expression values. C, Normalized gene expression values for 15,116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stage. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method is plotted in the diagonal blocks, between the AUROC scatter plots and the PCC values. AUROC values evaluated with the GO datasets are plotted pairwise in the triangle below the diagonal, with the corresponding Pearson correlation coefficients shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method is plotted in the diagonal blocks, between the AUROC scatter plots and the PCC values. AUROC values evaluated with the PPPTY datasets are plotted pairwise in the triangle below the diagonal, with the corresponding Pearson correlation coefficients shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY gene set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). The average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2", and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. Dashed lines show the average AUROC value of the 17 individual networks in each category. Mean values of each network are marked with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17,862 genes (ppr_all) and with 11,429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17,862 genes (ppr_all) and with 11,429 genes (ppr). Both networks were constructed using the Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the fitted power-law distributions. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to each gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the fitted power-law distribution, with the function and R² indicated beside it.


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are shown as light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nat Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra JL, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109: 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R package version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nat Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger‐Wallace E, Haug‐Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science (80- ) 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
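The two quantities this caption relies on, AUROC and the interquartile-range outlier rule, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluation pipeline used for the figure: the function names `auroc` and `boxplot_outliers`, the example scores and labels, and the choice of the "inclusive" quartile definition are all hypothetical, and plotting software may compute quartiles slightly differently.

```python
from statistics import quantiles


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def boxplot_outliers(values, k=1.5):
    """Values more than k * IQR above the 75% or below the 25% quantile."""
    # Assumption: "inclusive" quartiles; other quartile conventions exist.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Hypothetical edge scores and gold-standard labels (1 = known association).
print(auroc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.75

# Hypothetical per-GO-term AUROC values for one network.
print(boxplot_outliers([0.61, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.95]))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the dashed lines in panels C to F are informative only relative to that baseline.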


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between −3 (blue) and 3 (red). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural-logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural-logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
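The logarithmic model in this caption amounts to an ordinary least-squares fit of average AUROC against ln(sample size). The following Python sketch reproduces the slope, intercept and R-squared of such a fit; it is an assumption-laden stand-in for R's lm(), the function name `fit_log_model` is hypothetical, the AUROC values are made up for illustration (only the endpoints 12 and 1266 are sample sizes mentioned in the paper), and the p-value reported by lm() is not computed here.

```python
import math


def fit_log_model(sample_sizes, mean_auroc):
    """Least-squares fit of mean_auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_auroc)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical average AUROC values for networks of increasing size.
sizes = [12, 50, 200, 600, 1266]
aurocs = [0.56, 0.60, 0.63, 0.66, 0.67]
intercept, slope, r2 = fit_log_model(sizes, aurocs)
print(f"AUROC = {intercept:.3f} + {slope:.3f} * ln(n), R^2 = {r2:.3f}")
```

A positive slope under this model means each multiplicative increase in sample size adds a roughly constant amount of AUROC, so gains per added library shrink as the network grows.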



Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means and grey dots indicate outliers.



Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.


Figure 8. Cell wall pathway subnetworks. A, Intersections of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions.


Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science (80- ) 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabasi A-L, Oltvai ZNZN, Barabási A-L (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff N V (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277-286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860-1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1-8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866-12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29-46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25-48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592-1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357-360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520-2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812-1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science (80- ) 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Network Performance Does Not Differ Based Upon Normalization Method

To compare the efficacy of the three normalization and ten inference methods, a GCN was generated for each combination of normalization and inference method. Furthermore, all networks were rank-standardized so that edge weights ranged from 0 to 1 (see Methods). All network evaluations used the whole adjacency matrix (15116 × 15116 in RNA-Seq networks; 11429 × 11429 or 17862 × 17862 in protein networks) without a cut-off. The performance of the different networks was measured by comparing the area under the receiver operator characteristic curve (AUROC). AUROC is a measurement used to evaluate the accuracy of classification models, making it suitable for evaluating GCNs (Gillis and Pavlidis, 2011; Ma and Wang, 2012; Liu et al., 2017). AUROC values range from 0 to 1: a value closer to 1 indicates that the network is discriminating nonrandom patterns and approaching perfect classification, random networks return values close to 0.5, and values closer to 0 indicate a high degree of incorrect classification. While an AUROC value close to 1 is optimal, values over 0.7 suggest good performance when analyzing large, diverse networks (Gillis and Pavlidis, 2011). To set up the AUROC baseline for random networks, maize gene IDs in the normalized expression matrix were shuffled 10 times (for MRNET and CLR) or 1000 times (for PCC). Networks were inferred from the randomized expression matrices using the designated algorithms and then evaluated. The resulting AUROC values from randomized networks were very close to 0.5 (Supplemental Table S2).
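The rank standardization and the AUROC baseline described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy edge list, and the choice to divide ranks by the number of edges are assumptions made for illustration, and the random baseline here shuffles edge labels rather than gene IDs in an expression matrix, which is a simpler stand-in for the same idea.

```python
import random


def average_ranks(values):
    """Return 1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_standardize(weights):
    """Rescale edge weights into (0, 1] by dividing each rank by the edge count."""
    n = len(weights)
    return [r / n for r in average_ranks(weights)]


def auroc(scores, labels):
    """AUROC computed from the rank-sum (Mann-Whitney U) identity."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(average_ranks(scores), labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def shuffled_baseline(scores, labels, n_shuffles, seed=0):
    """Mean AUROC after randomly reassigning the true/false labels to edges."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += auroc(scores, shuffled)
    return total / n_shuffles


# Toy data (hypothetical): 200 edges, of which the 40 heaviest are "true" edges.
weights = [i / 200 for i in range(200)]
truth = [i >= 160 for i in range(200)]
print(auroc(rank_standardize(weights), truth))            # perfectly separated
print(shuffled_baseline(weights, truth, n_shuffles=200))  # close to 0.5
```

Because AUROC depends only on the ordering of the scores, rank standardization leaves it unchanged, which is one reason a rank-based rescaling is convenient when comparing networks whose raw edge weights are on different scales.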

AUROC values were calculated and compared for three different network characteristics. The first characteristic was designed to test whether the network identified genes with known or predicted co-expression patterns, based upon prior results and inclusion in two existing datasets that could serve as a positive control for co-expression. The maize metabolic pathway database (MaizeCyc) contains 413 pathways with more than two genes and was built based upon a collection of evidence from genome annotation, phylogenetic distance, and known genes in maize, rice, and Arabidopsis (Monaco et al., 2013). The maize protein-protein interaction database (PPIM), the second dataset used in this analysis, is based upon both predicted and experimentally detected protein interactions (Zhu et al., 2016). Only high-confidence interactions from PPIM were used, as defined by ranking top 5 in their model (Zhu et al., 2016). For comparison with the GCN, genes within the same MaizeCyc or PPIM pathways were considered co-expressed. The MaizeCyc and PPIM datasets were combined, and genes with fewer than 5 interactions were excluded from evaluation, creating a compiled dataset referred to herein as the Protein-Protein and Pathway dataset (PPPTY). PPPTY had 1720 genes and 104856 interactions that were used in this evaluation. The AUROC value was calculated for each of the 1720 gene terms.
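A per-gene evaluation of this kind can be sketched as follows. This is a hedged illustration only: the nested-dictionary network, the tiny reference set, and the names `build_symmetric` and `per_gene_auroc` are hypothetical, and the default `min_partners=5` simply mirrors the exclusion of genes with fewer than 5 interactions described above.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def build_symmetric(edges):
    """Turn {(a, b): weight} into a nested dict with both directions filled in."""
    net = {}
    for (a, b), w in edges.items():
        net.setdefault(a, {})[b] = w
        net.setdefault(b, {})[a] = w
    return net


def per_gene_auroc(network, reference, min_partners=5):
    """Score each reference gene by how well its edge weights rank its known partners."""
    results = {}
    for gene, partners in reference.items():
        if len(partners) < min_partners or gene not in network:
            continue
        others = [g for g in network[gene] if g != gene]
        labels = [g in partners for g in others]
        if all(labels) or not any(labels):
            continue  # AUROC is undefined without both classes
        scores = [network[gene][g] for g in others]
        results[gene] = auroc(scores, labels)
    return results


# Toy co-expression network and reference interactions (hypothetical values).
network = build_symmetric({
    ("A", "B"): 0.9, ("A", "C"): 0.2, ("A", "D"): 0.1,
    ("B", "C"): 0.3, ("B", "D"): 0.2, ("C", "D"): 0.8,
})
reference = {"A": {"B", "D"}, "B": {"A"}, "C": {"D"}, "D": {"A", "C"}}
print(per_gene_auroc(network, reference, min_partners=1))
```

In the toy example, genes B and C rank their single known partner first, while A and D each have one strongly and one weakly connected partner, so their scores fall to chance level. Averaging such per-gene values gives a single summary for a network.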

To assess the effect of normalization method on GCNs, AUROC values for all ten inference methods were averaged for each of the three normalization methods. All three normalization methods scored similarly in comparison with the PPPTY dataset (Fig. 2B), with a mean AUROC value of around 0.575 for each, suggesting that the predicted networks were more selective than a random network.

The second characteristic was the presence of similar gene ontology (GO) information for maize genes within a detected co-expression set, based upon "guilt by association", which assumes that specific subgroups of co-expressed genes have some shared functions (Wolfe et al., 2005). GO annotations were downloaded from AgriGO (Du et al., 2010), which uses signature integration by InterPro, rather than co-expression data, to map gene IDs to GO terms. InterPro provided over 108 million stable GO terms to the functional protein information database UniProtKB at release 2016_01 (Sangrador-Vegas et al., 2016). Thus, the GO annotations provide a reliable evaluation resource independent of co-expression data. To assess this characteristic, gene ontology information was used in a neighbor voting algorithm (Gillis and Pavlidis, 2011) for sets of co-expression matrices, and the results were compared. Co-expression matrices were assessed by 3-fold cross-validation, which involved masking GO terms from some genes to test whether the masked GO terms could be predicted based upon gene expression patterns. In total, 277 GO terms were included in this analysis.
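The neighbor voting procedure with 3-fold cross-validation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: each candidate gene is scored by the share of its total edge weight that connects to genes still annotated with the GO term, the folds are assigned by a fixed interleaving rather than at random, and the two-module toy network and all function names are hypothetical.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def neighbor_voting_cv(weights, annotated, n_folds=3):
    """Mean AUROC for recovering masked annotations of a single GO term.

    weights:   dict gene -> dict neighbor -> edge weight (no self-edges)
    annotated: set of genes known to carry the GO term
    """
    genes = sorted(weights)
    positives = sorted(annotated)
    folds = [positives[i::n_folds] for i in range(n_folds)]  # fixed split (assumption)
    fold_aurocs = []
    for held_out in folds:
        hidden = set(held_out)
        training = annotated - hidden
        candidates = [g for g in genes if g not in training]
        scores = []
        for g in candidates:
            total = sum(weights[g].values())
            votes = sum(w for h, w in weights[g].items() if h in training)
            scores.append(votes / total if total else 0.0)
        labels = [g in hidden for g in candidates]
        fold_aurocs.append(auroc(scores, labels))
    return sum(fold_aurocs) / len(fold_aurocs)


# Toy network: two modules of six genes; only the first module carries the term.
genes = [f"g{i}" for i in range(12)]
module = {g: (0 if i < 6 else 1) for i, g in enumerate(genes)}
weights = {
    a: {b: (0.9 if module[a] == module[b] else 0.1) for b in genes if b != a}
    for a in genes
}
annotated = set(genes[:6])
print(neighbor_voting_cv(weights, annotated))
```

In this toy case the masked genes sit in the same tightly connected module as the remaining annotated genes, so they always outscore the unannotated module. A network with uninformative edge weights would instead give a value near 0.5.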

When GO characteristics were used to assess the networks, all three normalization methods performed similarly, but the AUROC values, at around 0.689 for each, were higher than those observed for comparisons with PPPTY (Fig. 2A). Because GO addresses gene functions and PPPTY emphasizes protein-protein interactions, this suggests that GCNs are better at predicting functional interactions than physical interactions. The p-values from one-way ANOVA testing the effect of normalization method on the PPPTY and GO datasets were 0.9535 and 0.4714, respectively, indicating that the normalization method did not create a significant difference in the AUROC scores associated with the GCNs for the characteristics that were tested.
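For readers unfamiliar with the test, the F statistic behind a one-way ANOVA can be computed with a few lines of standard-library Python. This is a sketch under assumptions: the AUROC values below are illustrative placeholders, not results from this study, and the function name is hypothetical. Converting F to a p-value requires the F distribution, which a statistics package such as SciPy (`scipy.stats.f_oneway`) provides.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative placeholder AUROC values per normalization method (NOT study data).
auroc_by_normalization = {
    "VST": [0.60, 0.58, 0.55],
    "CPM": [0.59, 0.58, 0.56],
    "RPKM": [0.60, 0.57, 0.56],
}
f_stat, df_b, df_w = one_way_anova_f(list(auroc_by_normalization.values()))
print(f_stat, df_b, df_w)
```

A small F statistic means that the spread between the group means is small relative to the spread within groups, which corresponds to a large p-value such as those reported above.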

Finally, proteins that regulate gene expression or modify chromatin structure might interact with the DNA of a subset of co-expressed genes. The interactions between such a protein and regulated DNA could be detected by chromatin immunoprecipitation of associated DNA followed by DNA sequencing (ChIP-Seq). In maize, there are five ChIP-Seq datasets available (Bolduc et al., 2012; Morohashi et al., 2012; Li et al., 2015a; Pautler et al., 2015; Yang et al., 2016), some of which involve lowly expressed or tissue-specific genes. For example, Opaque2 is specifically expressed in endosperm (Li et al., 2015a), Knotted1 is expressed in SAM and floral tissues (Bolduc et al., 2012), and Pericarp Color1 has low expression except in inflorescence and seed (Morohashi et al., 2012). Histone Deacetylase 101 (HDA101) ChIP-Seq data provided the largest dataset for comparison, with 26 confirmed binding targets that are relatively highly expressed in most maize tissues (Yang et al., 2016). Histone deacetylation often correlates with decreased gene expression (Verdin and Ott, 2014). High-confidence HDA101 targets were defined as those discovered by ChIP-Seq that also showed increased gene expression in the hda101 mutant. Networks associated with the 26 high-confidence HDA101 targets were compared by calculating AUROC. Based upon this analysis, the AUROC values were very similar among networks normalized by VST, CPM and RPKM (Fig. 2C), which is consistent with the GO and PPPTY evaluations.

Correlation Methods Perform Better than Mutual Information at Some Genes

After normalization of the expression matrices, they can be processed by different methods for GCN inference. To optimize this step, the AUROC values of six correlation (PCC, SCC, KCC, GCC, BIC, CSC) and four mutual information (MI) methods (AA, MA, MRNET, CLR) were compared for the expression matrices that were generated from each of three normalization methods (VST, CPM, RPKM) and then averaged. In general, correlation methods are more computationally efficient, while MI methods are able to reveal non-linear relationships (Li et al., 2015c). PCC is widely used but may be influenced by outliers (Mukaka, 2012). SCC, KCC and BIC are less sensitive to outliers because SCC and KCC only consider the rank information, and BIC calculates based on the dataset median instead of the mean (Serin et al., 2016). Recently, GCC has been shown to be a better correlation method for gene expression analysis because of its capacity to detect non-linear relationships and insensitivity to outliers (Ma and Wang, 2012). CSC is widely used for text mining and analyzing sparse data with many zeros (Dhillon and Modha, 2001). ARACNE, MRNET and CLR showed extended gene-dependent relationships under variable biological settings (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007; Li et al., 2013b). To estimate the effectiveness of the inference methods, the same testing parameters with AUROC calculations were performed as described for the testing of normalization methods.
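
To illustrate why rank-based coefficients are less sensitive to outliers than PCC, the following minimal sketch computes both PCC and SCC from first principles. It is not the R cor() call used in this study; the helper names and the toy expression vectors are hypothetical. SCC is obtained by applying PCC to average ranks, so a single extreme value changes PCC but leaves SCC untouched as long as the ordering is preserved.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression of two genes across five samples; the last sample is an outlier.
gene_a = [1, 2, 3, 4, 5]
gene_b = [2, 4, 6, 8, 100]
```

For these toy vectors, spearman(gene_a, gene_b) is 1 (up to floating-point error) because the ordering is perfectly monotone, whereas pearson(gene_a, gene_b) is pulled well below 1 by the single outlier.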

Assessed by GO datasets, the 277 AUROC values were averaged to create one average value for each of the 10 inference methods, ranging from 0.620 to 0.724 (Fig. 2D). The average AUROC across all normalization methods for the six correlation methods was 0.718, while the average AUROC for all four MI methods was 0.646. The majority of the 277 GO terms had similar AUROC values in the different correlation method-generated GCNs, and these patterns are different from those observed in the MI-generated GCNs (Fig. 3A). The similarity among different methods was also detectable by pairwise comparison and by comparing Pearson correlations between the different methods (Supplemental Fig. 4A).

To evaluate network inference methods with the PPPTY dataset, the AUROC values for 1720 genes were averaged for each combination of normalization and inference methods (Fig. 2E). This evaluation also showed that the networks constructed using correlation methods resulted in higher AUROC values than MI methods, although the CSC method resulted in lower AUROC values than other correlation methods. As demonstrated for the GO evaluation, results from correlation methods were more similar to each other than the MI methods (Supplemental Fig. 4B). Interestingly, heatmap results indicated that a subset of genes consistently had higher AUROC values when CSC, MRNET/CLR or AA/MA were used (Fig. 3B), although this includes a small enough number of genes that the average AUROC value over the whole gene set was relatively low for those methods. The gene sets with the highest AUROC values in PCC, CSC or MRNET were extracted. Characteristics of each gene set were compared in terms of average expression (CPM) and average number of low expressed elements (CPM < 0). The CSC gene set had the smallest number of low expression elements and had higher average expression than both the 1720 gene set and the PCC gene set (Supplemental Fig. 5). This may indicate that the CSC method is better at determining co-expression for highly expressed genes.

The AUROC values from 26 targets of HDA101 ChIP-Seq datasets reveal that the CSC GCN had the highest AUROC value, and the use of MRNET/CLR GCNs resulted in slightly higher scores than correlation methods (Fig. 2F). This could be explained by the small number of targets creating skewed results, but may also indicate that CSC/MI methods are more suitable for specific types of genes or interactions between genes (Tzfadia et al., 2016). HDA101 is a highly expressed gene in all samples, with an average expression value of 864 CPM and a minimum expression of 289 CPM, so it is possible that HDA101 is more suitable for the CSC method. CPM and RPKM normalization methods had higher AUROC values than VST (Fig. 2C). Using two models of ARACNE (additive, AA, and multiplicative, MA), the co-expression matrices contained less than 0.5% non-zero values for all comparisons, and so these techniques were not included in any additional analyses.

In conclusion, our results indicated that the widely used correlation methods resulted in a more predictive maize GCN from a single expression matrix, but co-expression with some individual genes may be better detected using MI methods. Normalization method did not have a substantial influence on GCN performance, so only CPM normalization was used in conjunction with PCC, SCC, MRNET and CLR inference for subsequent optimization of other parameters.

Increasing Sample Size Had a Positive Effect on GCN

GCN analysis can be accomplished with a variable number of samples and datasets, but sample size can influence the quality of the resulting GCN (Wei et al., 2004; Ballouz et al., 2015). Separate analyses were conducted with different numbers of samples and experiments to empirically determine the effect of sample number on GCN effectiveness. The data in our analysis consisted of 17 experiments, each including between 12 and 404 libraries. For this analysis, CPM normalization followed by each of four inference methods (PCC, SCC, MRNET and CLR) was applied to the 17 experiments, and the 68 resulting networks were evaluated by both GO and PPPTY.

From GO and PPPTY evaluation, all algorithms exhibited a positive linear relationship between natural logarithm-transformed sample size and average AUROC values (Fig. 4). The linear relationships were stronger for the PCC and SCC methods, which had higher r-squared values, indicating that correlation methods benefit more from increasing sample size. Thus, for building correlation-based GCNs, as many samples as possible should be included. We also found that, as seen for the total GCN analysis, PCC and SCC had higher average AUROC values than the MRNET and CLR methods in the PPPTY and GO analyses for most of the individual networks (Fig. 5).
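
The kind of fit summarized above (average AUROC regressed on the natural logarithm of sample size, with r-squared as the measure of fit) can be sketched with ordinary least squares. The function name is hypothetical, and the data in the usage lines are synthetic points generated from a known line purely to show the calculation; they are not results from this study.

```python
import math


def fit_log_linear(sample_sizes, aurocs):
    """Least-squares fit of AUROC = intercept + slope * ln(sample size).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log(n) for n in sample_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(aurocs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, aurocs))
    ss_tot = sum((y - mean_y) ** 2 for y in aurocs)
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic illustration only: points placed exactly on a line in ln(sample size).
sizes = [12, 50, 200, 404]
synthetic_auroc = [0.5 + 0.02 * math.log(n) for n in sizes]
slope, intercept, r_squared = fit_log_linear(sizes, synthetic_auroc)
```

Because the synthetic points lie exactly on a line, the fit recovers the slope of 0.02 and an r-squared of 1 up to floating-point error; real per-experiment AUROC values would scatter around the line and give a lower r-squared.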

Ranked Aggregation of Networks Improved Performance of GCNs

Ranked aggregation for meta-analysis can also be modified to change the outcomes of GCN by buffering the effect of sample heterogeneity (Zhong et al., 2014; Wang et al., 2015a; Asnicar et al., 2016). Aggregated, rank-standardized correlation/MI matrices were calculated from separate experiments to determine if this approach enhanced GCN performance. Aggregating individual networks together for meta-analysis can help to highlight true co-expression interactions and reduce noise (Zhong et al., 2014; Wang et al., 2015a; Wang et al., 2015b). This analysis was conducted with the 17 differently sized experiments using the PCC, SCC, MRNET and CLR methods for GCN inference, as we did previously, resulting in 68 single GCNs. The 17 experiments were aggregated for PCC, SCC, MRNET and CLR individually and evaluated by the GO and PPPTY datasets.

Of the 4 aggregated networks that were evaluated, the two correlation methods (PCC and SCC) had higher AUROC values than the single network from 1266 samples (Fig. 6 and Supplemental Fig. 6). However, this aggregation strategy did not result in significantly higher AUROC scores for the MRNET and CLR networks compared with single networks from 1266 samples (two-tailed Wilcoxon rank test for GO evaluation; p-values 0.494 and 0.796). It has been reported that MI estimation accuracy is dependent on sample size (Gao et al., 2015); therefore, individual MI networks built with a small number of libraries may not demonstrate improved accuracy from aggregation. In conclusion, the PCC/SCC-built GCN performed best using a ranked aggregation strategy, and use of this strategy in combination with the other optimized parameters creates a robust GCN.
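
A minimal sketch of ranked aggregation is given below, assuming each per-experiment network is stored as a dictionary from gene pairs to edge weights. The function names, the restriction to edges present in every experiment, and the use of the mean as the aggregation rule are assumptions of this example rather than a description of the exact procedure used here. Each network is first rank-standardized so that experiments with different weight scales contribute equally, the standardized ranks are averaged per edge, and the averages are ranked again.

```python
def rank_standardize(weights):
    """Map edge weights to ranks in (0, 1]; the largest weight gets 1.0.

    Tied weights share the average of their ranks.
    """
    edges = sorted(weights, key=weights.get)
    n = len(edges)
    ranked = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and weights[edges[j + 1]] == weights[edges[i]]:
            j += 1
        for k in range(i, j + 1):
            ranked[edges[k]] = ((i + j) / 2 + 1) / n
        i = j + 1
    return ranked


def aggregate_networks(networks):
    """Average rank-standardized weights over networks, then re-rank the averages."""
    shared = set(networks[0])
    for net in networks[1:]:
        shared &= set(net)
    standardized = [rank_standardize({e: net[e] for e in shared}) for net in networks]
    mean_rank = {e: sum(s[e] for s in standardized) / len(standardized) for e in shared}
    return rank_standardize(mean_rank)


# Hypothetical correlation values for three gene pairs in three experiments.
exp1 = {("a", "b"): 0.9, ("a", "c"): 0.2, ("b", "c"): 0.5}
exp2 = {("a", "b"): 0.7, ("a", "c"): 0.1, ("b", "c"): 0.6}
exp3 = {("a", "b"): 0.1, ("a", "c"): 0.9, ("b", "c"): 0.5}
aggregated = aggregate_networks([exp1, exp2, exp3])
```

In this toy case the third experiment disagrees with the other two, yet the pair ("a", "b") remains the top-ranked edge after aggregation, which is the buffering effect against heterogeneous experiments that motivates the strategy.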

The Performance of Protein Networks Did Not Exceed Aggregation Networks

In many cases, mRNA levels in a cell are of interest because mRNA level is thought to be related to the level and function of a protein of interest. However, many researchers have found inconsistencies between mRNA and protein levels (Baerenfaller et al., 2008; Schwanhäusser et al., 2011; Ponnala et al., 2014; Walley et al., 2016). Although relatively less protein expression data is available, this data is amenable to GCN construction and could represent a more direct reflection of interacting proteins. Using a non-modified protein expression atlas from 23 maize tissues based upon mass spectrometry data (Walley et al., 2016), four protein networks were built with PCC, SCC, MRNET and CLR separately and then evaluated using the same PPPTY and GO datasets as previously mentioned.

GCNs constructed from protein expression did not exhibit superior AUROC values to those observed for RNA-Seq based GCNs using the aggregation strategy (Fig. 6). When evaluated by the GO and PPPTY datasets, the performance of the protein network was lower than that of the aggregated network as well as the single network from 1266 samples. To confirm this result, a two-way ANOVA was computed with pairwise comparison for the GO evaluation, which showed that the effect of network type was significant (Supplemental Table S3). A subsequent pairwise comparison using the Wilcoxon rank sum test indicated that the PCC/SCC methods were significantly better than MRNET/CLR (Supplemental Table S3), although MI methods may be superior for some types of interactions.

The raw protein expression data included 17862 genes, of which 11429 genes overlapped with our RNA-Seq-based network and were therefore used for the analysis. To demonstrate that the performance of the protein network was not biased by the selection of genes, the PCC method was applied to the whole set of 17862 genes to construct a protein network (Supplemental Fig. 7). No improvement could be detected for the protein network derived from 17862 genes, with p-values of 0.635 for GO evaluation and 0.995 for PPPTY evaluation from a one-sided Wilcoxon rank sum test.

PCC and SCC-built GCN Exhibit Identical Topological and Functional Properties

In addition to evaluation of network performance based upon biological characteristics, networks can be compared based upon several different network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules and number of genes in the largest module. Number of nodes is a basic construct in graph theory depicting the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in networks. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics like protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC- and SCC-built ranked aggregation networks from 17 experiments (PA and SA) and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.
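
For readers who want to see how heterogeneity and centralization follow from node degrees, a minimal sketch is given below. It uses the degree-based definitions commonly attributed to Dong and Horvath (2007): heterogeneity as the standard deviation of the degree divided by the mean degree, and centralization as n/(n-2) multiplied by (maximum degree/(n-1) minus density). The function name, the use of the population variance, and the assumption of an undirected edge list without duplicates or self-loops belong to this example; the values reported in this study were computed in Cytoscape.

```python
import math


def degree_statistics(edges):
    """Node count, heterogeneity and centralization of an undirected network.

    edges: iterable of (node_u, node_v) pairs, each listed once, no self-loops.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    if n < 3:
        raise ValueError("need at least three connected nodes")
    ks = list(degree.values())
    mean_k = sum(ks) / n
    variance = sum((k - mean_k) ** 2 for k in ks) / n  # population variance (assumption)
    density = mean_k / (n - 1)
    return {
        "nodes": n,
        "heterogeneity": math.sqrt(variance) / mean_k,
        "centralization": n / (n - 2) * (max(ks) / (n - 1) - density),
    }


# Two toy networks: one hub with four neighbors, and a five-node ring.
star = degree_statistics([("hub", leaf) for leaf in "abcd"])
ring = degree_statistics([(i, (i + 1) % 5) for i in range(5)])
```

The star network, in which one hub carries every edge, reaches a centralization of 1 and a heterogeneity of 0.75, while the ring, in which every node has the same degree, scores 0 on both measures. This is the contrast drawn on when a high-centralization network is described as hub-dominated.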

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the observed highest centralization value for this network. Centralization and heterogeneity are two variants to model the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity, while a network with larger values of centralization and heterogeneity may either contain a larger number of hubs or have a number of hubs that is not significantly large but a degree distribution that is extremely imbalanced. In biological networks, many observations connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8).

The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 835 unique genes (Fig. 7A). 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of the top interacting genes (27/148), which was expected because of their cellular abundance and involvement with translation. Interestingly, nine epigenetic regulators were found in the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a) and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).

To reveal the underlying properties of GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules detected than the PA and SA networks. Consequently, most genes in the MS network clustered into one very large module of 14054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Tables S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, and so these methods appear to be better for module detection.

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA and MS were merged together, and the intersection of the three produced a merged network of 14277 genes and 106591 interactions. PA and SA shared 83.5% of common interactions within the networks, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
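
The merging step amounts to a set intersection over the strongest edges of each network. The sketch below assumes each network is a dictionary from gene pairs to weights and treats edges as unordered pairs; the function names and the three toy networks are hypothetical, and k would be one million for the networks described here.

```python
def top_edges(weights, k):
    """Return the k highest-weighted edges as a set of unordered gene pairs."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {frozenset(edge) for edge, _ in ranked[:k]}


def merge_networks(networks, k):
    """Edges found among the top k of every network, plus the genes they connect."""
    tops = [top_edges(net, k) for net in networks]
    shared_edges = set.intersection(*tops)
    genes = set().union(*shared_edges)
    return shared_edges, genes


# Toy stand-ins for the PA, SA and MS networks (weights are made up).
pa = {("g1", "g2"): 0.9, ("g2", "g3"): 0.8, ("g1", "g3"): 0.1}
sa = {("g2", "g1"): 0.95, ("g2", "g3"): 0.7, ("g1", "g3"): 0.2}
ms = {("g1", "g2"): 0.6, ("g1", "g3"): 0.9, ("g2", "g3"): 0.3}
shared_edges, shared_genes = merge_networks([pa, sa, ms], k=2)
```

Storing each edge as a frozenset makes ("g1", "g2") and ("g2", "g1") compare equal, so the intersection does not depend on the order in which a tool happened to write the two gene IDs.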

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to the existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected for these 214 genes, cell wall related GO terms were enriched (Fig. 7D; Supplemental Table S9).

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database to retrieve homologs and their annotations using BLASTP. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The results of the literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1 and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14 and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6) and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes. 48% (36/75) of co-expressed genes were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial was provided as supplemental material on how to use Cytoscape to visualize any co-expressed genes in our network (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, the VST, CPM and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So, when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, on which normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study of human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods assert different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed a more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR together did not result in a network that outperformed the PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq based maize GCN. This optimization may apply to a range of datasets that share characteristics with maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Processing

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). The adapters for the fastq files were trimmed by Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked by FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). 26 libraries with fewer than 5 million total reads and 11 libraries with a total alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined with Snakemake v3.7.1 (Köster and Rahmann, 2012).

Gene Count Normalization

The expression data was normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated by the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM/RPKM + 1)). For both methods, scale factors between samples were estimated by the Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated by the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in additional analysis (15116 genes).
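
As a rough illustration of the arithmetic behind these transformations, the sketch below computes log2(CPM + 1) and log2(RPKM + 1) for a single library and applies the expression filter. It deliberately omits the TMM scale factors that edgeR applies, so it is a simplification rather than a reproduction of the pipeline; the function names and toy counts are hypothetical.

```python
import math


def log2_cpm(counts):
    """log2(CPM + 1) for one library; counts maps gene -> raw read count.

    Simplification: the library size is the plain sum of counts (no TMM factor).
    """
    library_size = sum(counts.values())
    return {g: math.log2(c * 1e6 / library_size + 1) for g, c in counts.items()}


def log2_rpkm(counts, gene_length_bp):
    """log2(RPKM + 1): reads per kilobase of gene length per million mapped reads."""
    library_size = sum(counts.values())
    return {g: math.log2(c * 1e9 / (library_size * gene_length_bp[g]) + 1)
            for g, c in counts.items()}


def expressed_genes(cpm_by_gene, min_cpm=2.0, min_samples=1000):
    """Keep genes whose CPM exceeds min_cpm in more than min_samples samples."""
    return [g for g, values in cpm_by_gene.items()
            if sum(v > min_cpm for v in values) > min_samples]


# Toy library of one million reads.
counts = {"geneA": 1, "geneB": 999999}
lengths = {"geneA": 2000, "geneB": 1000}
cpm = log2_cpm(counts)
rpkm = log2_rpkm(counts, lengths)
```

With one read in a library of one million, geneA has a CPM of 1 and therefore a transformed value of log2(2) = 1; because geneA is 2 kb long, its RPKM is 0.5 and its transformed value is log2(1.5). The added 1 keeps genes with zero counts at a finite value of 0.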

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated by the adjacency.matrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value assigned a rank of one. The ranks were then divided by the number of elements in the matrix and the diagonal was set to one, so that all network weights ranged from zero to one.

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM or VST normalized expression matrices. The randomized expression matrices were then used for network inference by the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.

Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions, defined as those ranking in the top 5% of their results, were used for evaluation. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathways were considered as co-expressed. Third, maize gene ontology data for AGPv3.30 was downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets for HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated with the EGAD package (Ballouz et al., 2016) in R. EGAD relies on the “guilt-by-association” principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes’ GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to “fdr” (Benjamini and Hochberg, 1995).
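To illustrate the AUROC measure described above, the following Python sketch computes AUROC from edge scores and binary reference labels using the rank-sum (Mann-Whitney) identity. It is not the ROCR or EGAD implementation used in this study; the function name `auroc` and the input layout are assumptions made for this example.

```python
def auroc(scores, labels):
    """Rank-based AUROC.

    scores: network edge weights (higher = more strongly co-expressed).
    labels: 1 if the gene pair is in the reference set, else 0.
    Returns the probability that a random positive pair is scored
    above a random negative pair (ties count as one half).
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Mann-Whitney U statistic of the positive class, scaled to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)
```

A value of 0.5 corresponds to a random ranking, which is why random-network baselines are a useful reference point when comparing inference methods.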

Definition of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts two genes are co-expressed but they are not; TN, a network predicts two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts a gene does not have a specific GO term and it does not have that GO term in our GO dataset; FN, a network predicts a gene does not have a specific GO term but it has that GO term in our GO dataset.
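The pairwise definitions above can be sketched in Python as follows. This is a minimal illustration, assuming the network prediction and the reference are both given as sets of unordered gene pairs; the function name `confusion_counts` and the gene identifiers in the usage note are hypothetical.

```python
from itertools import combinations


def confusion_counts(genes, predicted_pairs, reference_pairs):
    """Count TP, FP, TN and FN over all unordered gene pairs.

    genes: iterable of gene identifiers that defines the pair universe.
    predicted_pairs: pairs the network calls co-expressed.
    reference_pairs: pairs that are co-expressed in the reference set.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    reference = {frozenset(p) for p in reference_pairs}
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, b in combinations(sorted(set(genes)), 2):
        pair = frozenset((a, b))
        if pair in predicted:
            counts["TP" if pair in reference else "FP"] += 1
        else:
            counts["FN" if pair in reference else "TN"] += 1
    return counts
```

For example, with four genes, predicted pairs (g1, g2) and (g1, g3), and reference pairs (g2, g1) and (g3, g4), the call returns one TP, one FP, one FN, and three TN. Enumerating every pair is quadratic in the number of genes, so this form is only meant to make the definitions concrete.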

Network Clustering and Characterization

For each network, the top 1 million edges were selected as stringent co-expression networks. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted with the Network Analyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) with MCL v14-137 and the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.
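The edge-selection step can be sketched in Python as shown below: keep the k highest-weighted edges, then count how many retained edges touch each gene (node degree). This is a sketch under assumptions, not the Cytoscape or Network Analyzer procedure used here; the function names and the (gene_a, gene_b, weight) tuple layout are hypothetical.

```python
import heapq
from collections import Counter


def top_edges(edges, k):
    """Return the k edges with the largest weight.

    edges: iterable of (gene_a, gene_b, weight) tuples.
    """
    return heapq.nlargest(k, edges, key=lambda edge: edge[2])


def node_degrees(edges):
    """Count the number of retained edges incident to each gene."""
    degrees = Counter()
    for gene_a, gene_b, _weight in edges:
        degrees[gene_a] += 1
        degrees[gene_b] += 1
    return degrees
```

With k set to 1000000, `top_edges` would correspond to the stringent networks described above, and the output of `node_degrees` is the quantity summarized in a node degree distribution.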

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO’s Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes involved in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, for which a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
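The enrichment test and correction described above can be sketched in Python as follows. This is an illustration rather than AgriGO's implementation. It assumes that the “Yekutieli method” refers to the Benjamini-Yekutieli step-up procedure, and the function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    k: query genes annotated with the GO term.
    n: number of query genes.
    K: background genes annotated with the GO term.
    N: number of background genes (15116 in this study).
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted p-values in the input order."""
    m = len(pvalues)
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m * harmonic / rank)
        adjusted[index] = running_min
    return adjusted
```

Terms would then be retained when their adjusted value is at or below 0.05, which matches the FDR cutoff stated above.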

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05, and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4 with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were blasted against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.

Supplemental Figure 2. Distribution of gene expression values.

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.

Supplemental Figure 4. Pairwise comparison among results of inference methods.

Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY gene set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET).

Supplemental Figure 6. Evaluation of network performance based on sample size and inference.

Supplemental Figure 7. GCN performance comparison between protein networks.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS).

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network).

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).

Supplemental Table S1. RNA-Seq libraries used in this analysis.

Supplemental Table S2. Random network AUROC value baseline.

Supplemental Table S3. ANOVA tables and pairwise comparisons.

Supplemental Table S4. Topological characteristics of four maize networks.

Supplemental Table S5. Gene Ontology annotation for 148 hub genes.

Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S8. 16 query genes in the maize cell wall pathway.

Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.

Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.

Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.

Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.

Supplemental Dataset S1. The merged network in Cytoscape-ready format.

Supplemental Dataset S2. Tutorial: Visualizing co-expression data in Cytoscape.

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO dataset comparisons for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). The fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
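The logarithmic model in Figure 4 has the form average AUROC = a + b · log(sample size). The Python sketch below fits that form by ordinary least squares and reports R-squared. It is only an illustration of the model form and does not reproduce the lm() output (p-values are not computed). The function name `fit_log_model` and any input values are assumptions.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Fit aurocs = intercept + slope * ln(sample_size) by least squares.

    Returns (slope, intercept, r_squared).
    """
    x = [math.log(size) for size in sample_sizes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(aurocs) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("sample sizes must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variance in AUROC explained by the fit.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, aurocs))
    ss_tot = sum((yi - mean_y) ** 2 for yi in aurocs)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared
```

A positive slope under this model means that each multiplicative increase in sample size adds a roughly constant amount to the average AUROC, which is the pattern the log-scaled x-axis is meant to show.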

Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled S12_1 to S404; S1266 included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated “_1”, “_2”, and “_3”. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, “1266”), aggregated network (grey, “agg”), and protein network (dark grey, “pr”) using PCC, SCC, MRNET, and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for the 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported into the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2-transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks, between the AUROC values and the PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values calculated by Pearson correlation shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks, between the AUROC values and the PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values calculated by Pearson correlation shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY gene set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). The average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated “_1”, “_2”, and “_3”. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network are labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distributions. The R2 value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R2 indicated beside it.

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O’Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D’haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock’s challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake--a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures 923 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Page | 27

Effects on reverse engineering gene networks Bioinformatics pp 282ndash288 924

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing 925 genes associated with complex agronomic traits in rice Plant J 90 177-188 926

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) 927 The genotype-tissue expression (GTEx) project Nat Genet 45 580ndash585 928

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data 929 with DESeq2 Genome Biol 15 1 930

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome 931 mapping based on collaborative filtering framework Sci Rep 5 7702 932

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in 933 transcriptome analysis Plant Physiol 160 192ndash203 934

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic 935 networks Bioinformatics 19 1423ndash1430 936

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-937 expression networks reveals novel modular expression pattern and new signaling pathways PLoS Genet 938 9 e1003840 939

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR 940 Bonneau R et al (2012) Wisdom of crowds for robust gene network inference Nat Methods 9 796ndash804 941

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE 942 an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context BMC 943 Bioinformatics 7 S7 944

Mark Cigan A Unger‐Wallace E Haug‐Collet K (2005) Transcriptional gene silencing as a tool for uncovering 945 gene function in maize Plant J 43 929ndash940 946

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 947 pp-10 948

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for 949 differential gene expression analysis in RNA-Seq experiments A matter of relative size of studied 950 transcriptomes Commun Integr Biol 6 e25849 951

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792ndash952 801 953

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional 954 regulatory networks Eurasip J Bioinforma Syst Biol doi 101155200779879 955

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional 956 networks using mutual information BMC Bioinformatics 9 461 957

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J 958 Harper L Gardiner J et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis 959 Plant Genome 6 12 960

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejiacutea-Guerra MK Pourcel L Yilmaz A 961 Feller A Carvalho B Emiliani J et al (2012) A genome-wide regulatory framework identifies maize 962 pericarp color1 controlled genes Plant Cell 24 2745ndash64 963

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker 964 a multi-algorithm clustering plugin for Cytoscape BMC Bioinformatics 12 436 965

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian 966 transcriptomes by RNA-Seq Nat Methods 5 621ndash628 967

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Page | 28

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 968 69ndash71 969

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks 970 for Arabidopsis Nucleic Acids Res 37 D987ndashD991 971

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene 972 modules with biological information in plants Bioinformatics 26 1267ndash1268 973

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol 974 Direct 4 14 975

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray 976 data BMC Bioinformatics 4 33 977

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush 978 J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data 979 bioRxiv 81802 980

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et 981 al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in 982 Maize Plant Cell Online 2 tpc114132506 983

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty 984 DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703ndash1728 985

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing 986 maize leaf Plant J 78 424ndash440 987

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput 988 transcriptome sequencing experiments Bioinformatics 29 2146ndash2152 989

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression 990 analysis of digital gene expression data Bioinformatics 26 139ndash140 991

Sales G Romualdi C (2011) parmigenemdasha parallel R package for mutual information estimation and gene 992 network reconstruction Bioinformatics 27 1876ndash1877 993

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why 994 stability does not indicate accuracy in a sea of changing annotations Database J Biol databases 995 curation 2016 996

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H 997 Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under 998 natural field conditions Nucleic Acids Res 39 D1141ndashD1148 999

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize 1000 transcriptomes using COB the co-expression browser PLoS One doi 101371journalpone0099193 1001

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package 1002

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics 1003 Science (80- ) 326 1112ndash1115 1004

Schwanhaumlusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global 1005 quantification of mammalian gene expression control Nature 473 337ndash342 1006

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-1007 expression modules in mouse crosses Frontiers in Genetics 20134291 1008

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities 1009 and Challenges Front Plant Sci 7 444 1010

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) 1011 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Page | 29

Cytoscape a software environment for integrated models of biomolecular interaction networks Genome 1012 Res 13 2498ndash2504 1013

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R 1014 Bioinformatics 21 3940ndash3941 1015

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of 1016 viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443ndash458 1017

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information 1018 correlation and model based indices BMC Bioinformatics 13 328 1019

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An 1020 expanded maize gene expression atlas based on RNA-sequencing and its use to explore root 1021 development Plant Genome 314ndash362 1022

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A 1023 Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life 1024 Nucleic Acids Res 43 D447ndashD452 1025

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative 1026 Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194 1027

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S 1028 Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and 1029 caveats Plant Cell Environ 32 1633ndash51 1030

USDA (2016) Grain World Markets and Trade 1031

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant 1032 Physiol 153 895ndash905 1033

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism 1034 and Beyond Nat Rev Mol Cell Biol 16 258ndash264 1035

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR 1036 (2016) Integration of omic networks in a developmental atlas of maize Science 353 814ndash818 1037

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene 1038 ranking BMC Bioinformatics 16 S6 1039

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring genendashgene interactions and functional 1040 modules using sparse canonical correlation analysis Ann Appl Stat 9 300ndash323 1041

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 1042 57ndash63 1043

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray 1044 experiments BMC Genomics 5 87 1045

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability ofldquo guilt-by-associationrdquo 1046 within gene coexpression networks BMC Bioinformatics 6 227 1047

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) ldquoOut of Pollenrdquo Hypothesis for Origin of New 1048 Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822ndash2829 1049

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of 1050 Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during 1051 Seed Development Plant Cell 28 629ndash645 1052

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 1053 13 83 1054

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC 1055 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Page | 30

Bioinformatics 12 290 1056

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene 1057 network reconstruction PLoS One 9 e106319 1058

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation 1059 Plant Cell Physiol 56 195ndash214 1060

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein 1061 interaction database for Maize Plant Physiol 170 pp1501821- 1062

Zyprych-Walczak J Szabelska A Handschuh L Goacuterczak K Klamecka K Figlerowicz M Siatkowski I (2015) 1063 The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015 1064

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared with the number of Illumina RNA-Seq samples per year (solid line), 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the same inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the same inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
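The AUROC values reported in this legend can be illustrated with a minimal sketch. It assumes that each candidate gene pair carries a network score (higher meaning stronger predicted association) and a binary label from a reference set such as GO; the function name `auroc` and the example inputs are hypothetical and do not reproduce the evaluation pipeline used in the study. The sketch uses the rank-sum (Mann-Whitney U) identity, with tied scores sharing an average rank.

```python
def auroc(scores, labels):
    """AUROC from the rank-sum (Mann-Whitney U) identity.

    scores: predicted association strengths (higher = more confident).
    labels: 1 for a reference positive, 0 for a negative (hypothetical inputs).
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Positives scored above all negatives give a perfect AUROC.
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

An AUROC of 0.5 corresponds to random ranking, which is why the plotted values are read relative to that baseline.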

Figure 3. Similarity between ten inference methods in network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or in MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue to red. Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
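The scaling and distance steps named in this legend can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: rows are taken to be GO terms (or genes), columns are inference methods, each row is scaled to mean 0 and sample standard deviation 1, and methods are then compared by Euclidean distance between their scaled columns. The function names and the example matrix are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def scale_row(values):
    """Scale one row of AUROC values to mean 0 and sample SD 1."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        # A constant row carries no contrast between methods.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def method_distances(auroc_rows, methods):
    """Euclidean distance between methods after row-wise scaling.

    auroc_rows: list of rows, one per GO term or gene (hypothetical data).
    methods: column names, in the same order as the values in each row.
    """
    scaled = [scale_row(row) for row in auroc_rows]
    columns = {
        name: [row[k] for row in scaled] for k, name in enumerate(methods)
    }
    return {
        (a, b): math.dist(columns[a], columns[b])
        for a, b in combinations(methods, 2)
    }


if __name__ == "__main__":
    rows = [[0.70, 0.72, 0.60], [0.65, 0.66, 0.58]]  # made-up AUROC values
    for pair, d in method_distances(rows, ["PCC", "SCC", "CLR"]).items():
        print(pair, round(d, 3))
```

The resulting pairwise distances are the kind of input a hierarchical clustering routine would use to group methods with similar performance profiles.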

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and P values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
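The logarithmic model in this legend has the form average AUROC = a + b ln(sample size). The sketch below fits that form by ordinary least squares and reports R-squared. It is a Python stand-in for the lm() call named in the legend and does not compute P values; the function name and the example data are hypothetical.

```python
import math


def fit_log_model(sample_sizes, mean_auroc):
    """Least-squares fit of mean_auroc = a + b * ln(sample_size).

    Returns (a, b, r_squared). Inputs are hypothetical illustration data.
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_auroc)
    k = len(x)
    x_bar = sum(x) / k
    y_bar = sum(y) / k

    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R-squared = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    sizes = [12, 100, 1266]
    # Made-up values that follow the model exactly.
    aurocs = [0.5 + 0.03 * math.log(n) for n in sizes]
    print(fit_log_model(sizes, aurocs))
```

A positive slope b under this model means that each multiplicative increase in sample size adds a roughly constant increment to average AUROC, which is the diminishing-returns pattern a log fit is meant to capture.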

Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
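The region counts of a three-set Venn diagram such as those in panels A and B can be computed with plain set operations. The sketch below is illustrative only: the function name `venn3_counts` and the toy gene identifiers are hypothetical, and it assumes the three top-ranked gene (or edge) lists have already been extracted from the networks.

```python
def venn3_counts(a, b, c):
    """Sizes of the seven disjoint regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_b_c": len(a & b & c),  # shared by all three networks
    }


if __name__ == "__main__":
    # Toy identifiers standing in for top-ranked genes from three networks.
    ms = {"g1", "g2", "g3", "g4"}
    pa = {"g3", "g4", "g5"}
    sa = {"g4", "g5", "g6"}
    print(venn3_counts(ms, pa, sa))
```

The seven counts always sum to the size of the union of the three sets, which is a quick consistency check on any reported Venn diagram.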

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with reported function in cell wall-related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways, and grey lines indicate network-predicted interactions.

Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924-13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054-0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200-20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244-56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277-286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860-1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1-8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866-12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29-46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25-48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592-1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357-360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520-2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812-1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science (80- ) 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


expressed genes have some shared functions (Wolfe et al., 2005). GO annotations were downloaded from AgriGO (Du et al., 2010), which uses signature integration by InterPro to map gene IDs to GO terms rather than co-expression data. InterPro provided over 108 million stable GO terms to the functional protein information database UniProtKB at release 2016_01 (Sangrador-Vegas et al., 2016). Thus, the GO annotations provide a reliable evaluation resource independent of co-expression data. To assess this characteristic, gene ontology information was used in a neighbor voting algorithm (Gillis and Pavlidis, 2011) for sets of co-expression matrices and compared. Co-expression matrices were assessed by 3-fold cross-validation, which involved masking GO terms from some genes to test whether the masked GO terms could be predicted based upon gene expression patterns. 277 GO terms were included for this analysis.
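The masking procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring rule (the fraction of a gene's total connection weight that goes to genes still carrying the annotation), the toy 12-gene network, and the names auroc and neighbor_voting_auroc are assumptions made for illustration only.

```python
import random


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def neighbor_voting_auroc(coexp, annotated, n_folds=3, seed=0):
    """Mean AUROC over folds for one GO term.

    coexp: symmetric matrix (list of lists) of co-expression weights.
    annotated: list of 0/1 flags, 1 if the gene carries the GO term.
    """
    genes = list(range(len(coexp)))
    positives = [g for g in genes if annotated[g]]
    random.Random(seed).shuffle(positives)
    folds = [positives[k::n_folds] for k in range(n_folds)]
    fold_aurocs = []
    for held_out in folds:
        hidden = set(held_out)
        # Mask the GO term on the held-out genes.
        visible = [1 if annotated[g] and g not in hidden else 0 for g in genes]
        scores = []
        for g in genes:
            total = sum(coexp[g][h] for h in genes if h != g)
            voted = sum(coexp[g][h] * visible[h] for h in genes if h != g)
            scores.append(voted / total if total > 0 else 0.0)
        # Evaluate on masked positives versus genes that never had the term.
        test = [g for g in genes if not visible[g]]
        fold_aurocs.append(
            auroc([scores[g] for g in test], [1 if g in hidden else 0 for g in test])
        )
    return sum(fold_aurocs) / len(fold_aurocs)


# Toy network (made-up values): genes 0-5 form one module and share the GO term.
n = 12
coexp = [[0.0 if i == j else (0.9 if (i < 6) == (j < 6) else 0.1) for j in range(n)]
         for i in range(n)]
annotated = [1] * 6 + [0] * 6
mean_auroc = neighbor_voting_auroc(coexp, annotated)
```

In this toy case the masked genes remain strongly connected to the genes that keep the annotation, so they outrank every unannotated gene in each fold.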

When GO characteristics were used to assess the networks, all three normalization methods performed similarly, but the AUROC values were higher, at around 0.689 for each, than those observed for comparisons with PPPTY (Fig. 2A). Because GO addresses gene functions and PPPTY emphasizes protein-protein interactions, this suggests that GCNs are better at predicting functional interactions than physical interactions. The p-values from one-way ANOVA testing the effect of normalization method on the PPPTY and GO datasets were 0.9535 and 0.4714, respectively, confirming that the normalization method did not create a significant difference in the AUROC scores associated with the GCNs for the characteristics that were tested.
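For readers unfamiliar with the test, the following Python sketch shows how a one-way ANOVA F statistic is computed from grouped values, with one group per normalization method. The input numbers are placeholders, not AUROC values from this study, and the function name one_way_anova_f is hypothetical. The p-value would then come from the F distribution with the returned degrees of freedom, for example via scipy.stats.f_oneway.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder values only; each inner list stands for one normalization method.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A small F statistic, and therefore a large p-value such as those reported above, means the between-method variation is small relative to the variation within each method.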

Finally, proteins that regulate gene expression or modify chromatin structure might interact with the DNA of a subset of co-expressed genes. The interactions between such a protein and regulated DNA could be detected by chromatin immunoprecipitation of associated DNA followed by DNA sequencing (ChIP-Seq). In maize, there are five ChIP-Seq datasets available (Bolduc et al., 2012; Morohashi et al., 2012; Li et al., 2015a; Pautler et al., 2015; Yang et al., 2016), some of which involve lowly expressed or tissue-specific genes. For example, Opaque2 is specifically expressed in endosperm (Li et al., 2015a), Knotted1 is expressed in SAM and floral tissues (Bolduc et al., 2012), and Pericarp Color1 has low expression except in inflorescence and seed (Morohashi et al., 2012). Histone Deacetylase 101 (HDA101) ChIP-Seq data provided the largest dataset for comparison, with 26 confirmed binding targets that are relatively highly expressed in most maize tissues (Yang et al., 2016). Histone deacetylation often correlates with decreased gene expression (Verdin and Ott, 2014). High-confidence HDA101 targets were defined as those discovered by ChIP-Seq that also showed increased gene expression in the hda101 mutant. Networks associated with the 26 high-confidence HDA101 targets were compared by calculating AUROC. Based upon this analysis, the AUROC values were very similar among networks normalized by VST, CPM and RPKM (Fig. 2C), which is consistent with the GO and PPPTY evaluations.

Correlation Methods Perform Better than Mutual Information at Some Genes

After normalization of the expression matrices, they can be processed by different methods for GCN inference. To optimize this step, the AUROC values of six correlation (PCC, SCC, KCC, GCC, BIC, CSC) and four mutual information (MI) methods (AA, MA, MRNET, CLR) were compared for the expression matrices that were generated from each of three normalization methods (VST, CPM, RPKM) and then averaged. In general, correlation methods are more computationally efficient, while MI methods are able to reveal non-linear relationships (Li et al., 2015c). PCC is widely used but may be influenced by outliers (Mukaka, 2012). SCC, KCC and BIC are less sensitive to outliers because SCC and KCC only consider the rank information and BIC calculates based on the dataset median instead of the mean (Serin et al., 2016). Recently, GCC has been shown to be a better correlation method for gene expression analysis because of its capacity to detect non-linear relationships and insensitivity to outliers (Ma and Wang, 2012). CSC is widely used for text mining and analyzing sparse data with many zeros (Dhillon and Modha, 2001). ARACNE, MRNET and CLR showed extended gene-dependent relationships under variable biological settings (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007; Li et al., 2013b). To estimate the effectiveness of the inference methods, the same testing parameters with AUROC calculations were performed as described for the testing of normalization methods.
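The outlier sensitivity described above can be illustrated with a small Python sketch. It assumes that PCC, SCC and CSC denote the Pearson correlation, Spearman rank correlation and cosine similarity; the two expression profiles are made-up values, and the function names are hypothetical. It is not the pipeline used in this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Made-up profiles of two genes across six samples; the last sample of
# gene_b is an outlier, but the ordering of samples is identical.
gene_a = [1, 2, 3, 4, 5, 6]
gene_b = [2, 4, 6, 8, 10, 100]

pcc = pearson(gene_a, gene_b)
scc = spearman(gene_a, gene_b)
csc = cosine_similarity(gene_a, gene_b)
```

Because the two profiles rank the samples identically, the rank-based measure reports a perfect association, while the single extreme value pulls the Pearson coefficient well below 1.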

Assessed by GO datasets, the 277 AUROC values were averaged to create one average value for each of the 10 inference methods, ranging from 0.620 to 0.724 (Fig. 2D). The average AUROC across all normalization methods for the six correlation methods was 0.718, while the average AUROC for all four MI methods was 0.646. The majority of the 277 GO terms had similar AUROC values in the different correlation method-generated GCNs, and these patterns are different from those observed in the MI-generated GCNs (Fig. 3A). The similarity among different methods was also detectable by pairwise comparison and comparing Pearson correlations between the different methods (Supplemental Fig. 4A).

To evaluate network inference methods with the PPPTY dataset, the AUROC values for 1720 genes were averaged for each combination of normalization and inference methods (Fig. 2E). This evaluation also showed that the networks constructed using correlation methods resulted in higher AUROC values than MI methods, although the CSC method resulted in lower AUROC values than other correlation methods. As demonstrated for the GO evaluation, results from correlation methods were more similar to each other than those from the MI methods (Supplemental Fig. 4B). Interestingly, heatmap results indicated that a subset of genes consistently had higher AUROC values when CSC, MRNET/CLR or AA/MA were used (Fig. 3B), although this includes a small enough number of genes that the average AUROC value over the whole gene set was relatively low for those methods. The gene sets with the highest AUROC values in PCC, CSC or MRNET were extracted. Characteristics of each gene set were compared in average expression (CPM) and average number of low expressed elements (CPM < 0). The CSC gene set had the smallest number of low expression elements and had higher average expression than both the 1720 gene set and the PCC gene set (Supplemental Fig. 5). This may indicate that the CSC method is better at determining co-expression for highly expressed genes.

The AUROC values from 26 targets of the HDA101 ChIP-Seq dataset revealed that the CSC GCN had the highest AUROC value and that MRNET/CLR GCNs resulted in slightly higher scores than correlation methods (Fig. 2F). This could be explained by the small number of targets creating skewed results, but may also indicate that CSC/MI methods are more suitable for specific types of genes or interactions between genes (Tzfadia et al., 2016). HDA101 is highly expressed in all samples, with an average expression value of 864 CPM and a minimum expression of 289 CPM, so it is possible that HDA101 is more suitable for the CSC method. CPM and RPKM normalization methods had higher AUROC values than VST (Fig. 2C). Using two models of ARACNE (additive, AA, and multiplicative, MA), the co-expression matrices contained less than 0.5% non-zero values for all comparisons, so these techniques were not included in any additional analyses.

In conclusion, our results indicated that the widely used correlation methods resulted in a more predictive maize GCN from a single expression matrix, but co-expression with some individual genes may be better detected using MI methods. The normalization method did not have a substantial influence on GCN performance, so only CPM normalization was used in conjunction with PCC, SCC, MRNET and CLR inference for subsequent optimization of other parameters.

Increasing Sample Size Had a Positive Effect on GCN

GCN analysis can be accomplished with a variable number of samples and datasets, but sample size can influence the quality of the resulting GCN (Wei et al., 2004; Ballouz et al., 2015). Separate analyses were conducted with different numbers of samples and experiments to empirically determine the effect of sample number on GCN effectiveness. The data in our analysis consisted of 17 experiments, each including between 12 and 404 libraries. For this analysis, CPM normalization followed by each of four inference methods (PCC, SCC, MRNET and CLR) was applied to the 17 experiments, and the 68 resulting networks were evaluated by both GO and PPPTY.

From the GO and PPPTY evaluations, all algorithms exhibited a positive linear relationship between the natural logarithm of sample size and average AUROC values (Fig. 4). The linear relationships were stronger for the PCC and SCC methods, with higher r-squared values, indicating that correlation methods benefit more from increasing sample size. Thus, for building correlation-based GCNs, as many samples as possible should be included. We also found that, as seen for the total GCN analysis, PCC and SCC had higher average AUROC values than the MRNET and CLR methods in the PPPTY and GO analyses for most of the individual networks (Fig. 5).
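The relationship described above can be sketched as an ordinary least-squares fit of average AUROC against the natural logarithm of sample size. The function name and the numbers below are hypothetical and only illustrate the calculation; they are not values from Fig. 4.

```python
import math

def fit_log_linear(sample_sizes, aurocs):
    """Fit AUROC = intercept + slope * ln(sample size) by least squares.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log(n) for n in sample_sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(aurocs) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in aurocs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical experiments: library counts and the average AUROC of each network.
sizes = [12, 24, 48, 96, 200, 404]
avg_auroc = [0.55, 0.57, 0.60, 0.61, 0.64, 0.66]

slope, intercept, r_squared = fit_log_linear(sizes, avg_auroc)
```

A positive slope with a high r-squared value corresponds to the pattern reported for the PCC and SCC networks.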

Ranked Aggregation of Networks Improved Performance of GCNs

Ranked aggregation for meta-analysis can also be used to change the outcome of a GCN by buffering the effect of sample heterogeneity (Zhong et al., 2014; Wang et al., 2015a; Asnicar et al., 2016). Aggregated, rank-standardized correlation/MI matrices were calculated from separate experiments to determine whether this approach enhanced GCN performance. Aggregating individual networks for meta-analysis can help to highlight true co-expression interactions and reduce noise (Zhong et al., 2014; Wang et al., 2015a; Wang et al., 2015b). This analysis was conducted with the 17 differently sized experiments using the PCC, SCC, MRNET and CLR methods for GCN inference, as before, resulting in 68 single GCNs. The 17 experiments were aggregated for PCC, SCC, MRNET and CLR individually and evaluated with the GO and PPPTY datasets.


Of the four aggregated networks that were evaluated, the two correlation methods (PCC and SCC) had higher AUROC values than the single network from 1266 samples (Fig. 6 and Supplemental Fig. 6). However, this aggregation strategy did not result in significantly higher AUROC scores for the MRNET and CLR networks compared with single networks from 1266 samples (two-tailed Wilcoxon rank test for GO evaluation, p-values 0.494 and 0.796). It has been reported that MI estimation accuracy depends on sample size (Gao et al., 2015); therefore, individual MI networks built with a small number of libraries may not show improved accuracy from aggregation. In conclusion, the PCC/SCC-built GCN performed best using a ranked aggregation strategy, and use of this strategy in combination with the other optimized parameters creates a robust GCN.
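One way to picture ranked aggregation is sketched below. It assumes that each experiment yields a dictionary of edge weights, that aggregation is a simple mean of the rank-standardized weights, and that the mean is re-ranked afterwards. These are illustrative assumptions; the exact aggregation procedure used in this study is not spelled out here, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def rank_standardize(weights):
    """Replace each edge weight by its rank divided by the number of edges.

    The smallest weight gets rank 1; tied weights share their average rank.
    """
    ordered = sorted(weights.values())
    n = len(ordered)
    return {
        edge: (bisect_left(ordered, w) + bisect_right(ordered, w) + 1) / 2 / n
        for edge, w in weights.items()
    }

def aggregate(networks):
    """Average rank-standardized weights over experiments, then re-rank."""
    standardized = [rank_standardize(net) for net in networks]
    # Assumption: only edges present in every experiment are aggregated.
    edges = set.intersection(*(set(s) for s in standardized))
    mean = {e: sum(s[e] for s in standardized) / len(standardized) for e in edges}
    return rank_standardize(mean)

# Hypothetical correlation weights from three experiments.
exp1 = {("g1", "g2"): 0.9, ("g1", "g3"): 0.1, ("g2", "g3"): 0.5}
exp2 = {("g1", "g2"): 0.8, ("g1", "g3"): 0.2, ("g2", "g3"): 0.3}
exp3 = {("g1", "g2"): 0.2, ("g1", "g3"): 0.1, ("g2", "g3"): 0.9}

aggregated = aggregate([exp1, exp2, exp3])
```

In this toy case the g1-g2 edge ranks highest in two of three experiments and therefore receives the top aggregated weight, even though the third experiment ranks it lower.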

The Performance of Protein Networks Did Not Exceed Aggregation Networks

In many cases, mRNA levels in a cell are of interest because the mRNA level is thought to be related to the level and function of a protein of interest. However, many researchers have found inconsistencies between mRNA and protein levels (Baerenfaller et al., 2008; Schwanhäusser et al., 2011; Ponnala et al., 2014; Walley et al., 2016). Although relatively little protein expression data is available, this data is amenable to GCN construction and could represent a more direct reflection of interacting proteins. Using a non-modified protein expression atlas from 23 maize tissues based upon mass spectrometry data (Walley et al., 2016), four protein networks were built with PCC, SCC, MRNET and CLR separately and then evaluated using the same PPPTY and GO datasets as previously mentioned.

GCNs constructed from protein expression did not exhibit AUROC values superior to those observed for RNA-Seq-based GCNs built with the aggregation strategy (Fig. 6). When evaluated with the GO and PPPTY datasets, the performance of the protein network was lower than that of the aggregated network as well as the single network from 1266 samples. To confirm this result, a two-way ANOVA with pairwise comparison was computed for the GO evaluation, which showed that the effect of network type was significant (Supplemental Table S3). A subsequent pairwise comparison using the Wilcoxon rank sum test indicated that the PCC/SCC methods were significantly better than MRNET/CLR (Supplemental Table S3), although MI methods may be superior for some types of interactions.

The raw protein expression data included 17,862 genes, of which 11,429 overlapped with our RNA-Seq-based network and were therefore used for the analysis. To demonstrate that the performance of the protein network was not biased by the selection of genes, the PCC method was applied to all 17,862 genes to construct a protein network (Supplemental Fig. 7). No improvement could be detected for the protein network derived from 17,862 genes, with p-values of 0.635 for the GO evaluation and 0.995 for the PPPTY evaluation from a one-sided Wilcoxon rank sum test.

PCC and SCC-built GCN Exhibit Identical Topological and Functional Properties

In addition to evaluation of network performance based upon biological characteristics, networks can be compared based upon several network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules and number of genes in the largest module. Number of nodes is a basic construct in graph theory that describes the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in a network. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics such as protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC- and SCC-built ranked aggregation networks from 17 experiments (PA and SA) and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.
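For readers who want a concrete picture of heterogeneity and centralization, the sketch below computes both from an undirected edge list. It assumes the degree-based definitions commonly attributed to Dong and Horvath (2007): heterogeneity as the standard deviation of node degree divided by its mean (population variance is assumed here), and centralization as n/(n-2) * (max(k)/(n-1) - density). Cytoscape may differ in detail, so this is an illustration rather than a reimplementation, and the function name is hypothetical.

```python
import math
from collections import Counter

def degree_statistics(edges):
    """Density, centralization and heterogeneity of an undirected network.

    Assumes no duplicate edges, no self-loops and at least three nodes;
    nodes without any edge are not represented in an edge list.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    k = list(degree.values())
    n = len(k)
    mean_k = sum(k) / n
    density = mean_k / (n - 1)
    centralization = n / (n - 2) * (max(k) / (n - 1) - density)
    variance = sum((d - mean_k) ** 2 for d in k) / n
    heterogeneity = math.sqrt(variance) / mean_k
    return {"density": density,
            "centralization": centralization,
            "heterogeneity": heterogeneity}

# A star network: one hub connected to four otherwise unconnected genes.
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
# A ring network: every gene has exactly two neighbors.
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]

star_stats = degree_statistics(star)
ring_stats = degree_statistics(ring)
```

The star network gives the maximal centralization and a clearly non-zero heterogeneity, while the ring gives zero for both, which mirrors the interpretation that larger values point to hub-dominated degree distributions.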

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the observed highest centralization value for this network. Centralization and heterogeneity are two measures that model the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity, while a network with larger values of centralization and heterogeneity may either contain a larger number of hubs or contain a number of hubs that is not significantly large but have an extremely imbalanced degree distribution. In biological networks, many observations have connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8). The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 835 unique genes (Fig. 7A). A total of 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of the top interacting genes (27/148), which was expected because of their cellular abundance and involvement in translation. Interestingly, nine epigenetic regulators were found among the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a) and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).

To reveal the underlying properties of GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules than the PA and SA networks. Consequently, most genes in the MS network clustered into one very large module of 14,054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Tables S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, so these methods appear to be better for module detection.

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA and MS were compared, and the intersection of the three produced a merged network of 14,277 genes and 106,591 interactions. PA and SA shared 83.5% of their interactions, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
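The merging step can be sketched as an intersection of top-ranked edge sets. The function names, the toy weights and the cutoff of two edges are hypothetical; the analysis itself used the top 1 million edges of each network.

```python
def top_edges(weights, n):
    """Return the n highest-weighted edges as undirected gene pairs."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    # frozenset makes (g1, g2) and (g2, g1) the same undirected edge.
    return {frozenset(edge) for edge in ranked[:n]}

def merge_networks(networks, n):
    """Keep only the edges found among the top n edges of every network."""
    shared = set.intersection(*(top_edges(w, n) for w in networks))
    genes = set().union(*shared)
    return shared, genes

# Hypothetical rank-standardized weights for three small networks.
pa = {("g1", "g2"): 0.9, ("g1", "g3"): 0.8, ("g2", "g3"): 0.1}
sa = {("g2", "g1"): 0.95, ("g1", "g3"): 0.7, ("g2", "g3"): 0.2}
ms = {("g1", "g2"): 0.6, ("g2", "g3"): 0.9, ("g1", "g3"): 0.1}

merged_edges, merged_genes = merge_networks([pa, sa, ms], n=2)
```

Only the g1-g2 interaction appears in the top two edges of all three toy networks, so it is the single edge retained in the merged network.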

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected, cell wall-related GO terms were enriched for these 214 genes (Fig. 7D; Supplemental Table S9).

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database using BLASTP to retrieve homologs and their annotations. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall-related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1 and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14 and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6) and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes. Of the co-expressed genes, 48% (36/75) were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial on how to use Cytoscape to visualize any co-expressed genes in our network is provided as supplemental material (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data provides a useful set of optimized parameters to support these analyses.

In our analysis, VST, CPM and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, normalization is performed on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC or BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, consider only differences in relative rankings, on which normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods produced similar expression distributions (Supplemental Fig. 2), MI estimates from different normalizations are expected to be similar.

When assessing inference methods, the simple and widely used correlation methods such as PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study of human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods in the GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods detect different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR did not result in a network that outperformed the PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs built from eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq-based maize GCN. This optimization may apply to a range of datasets that share characteristics with maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Processing

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq2000 or HiSeq2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). Adapters were trimmed from the fastq files with Cutadapt 1.8.1 (Martin, 2011). The adapter-trimmed files were then quality checked with FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). Twenty-six libraries with fewer than 5 million total reads and 11 libraries with less than a 70% total alignment rate were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing pipeline was streamlined with Snakemake v3.7.1 (Köster and Rahmann, 2012).
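The library exclusion criteria can be written as a simple filter. The sketch below is illustrative only: the library identifiers and their statistics are hypothetical, and the thresholds are the read-count and alignment-rate cutoffs stated in the text, with the alignment rate expressed as a fraction.

```python
def passes_qc(total_reads, alignment_rate,
              min_reads=5_000_000, min_rate=0.70):
    """Keep a library only if it meets both thresholds described in the text."""
    return total_reads >= min_reads and alignment_rate >= min_rate

# Hypothetical libraries: (ID, total reads, overall alignment rate).
libraries = [
    ("lib_A", 12_400_000, 0.91),
    ("lib_B", 3_100_000, 0.88),   # excluded: fewer than 5 million reads
    ("lib_C", 20_000_000, 0.52),  # excluded: alignment rate below 70%
]
kept = [lib_id for lib_id, reads, rate in libraries if passes_qc(reads, rate)]

# Bookkeeping from the text: 1303 downloaded, 26 and 11 libraries excluded.
remaining = 1303 - 26 - 11
```

The final subtraction reproduces the 1266 samples carried forward into the expression table.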

Gene Count Normalization

The expression data was normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated with the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(x + 1), where x is the CPM or RPKM value). For both methods, scale factors between samples were estimated by the Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated with the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in additional analysis (15,116 genes).
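The CPM calculation, the log2(x + 1) transformation and the expression filter can be sketched in a few lines. This is a simplified illustration, not the edgeR workflow: it uses the raw library size as the denominator and omits the TMM scaling factors. The function names and the toy count table are hypothetical.

```python
import math

def cpm(counts):
    """Counts per million. counts maps gene -> raw read counts, one per library.

    Simplification: raw library sizes are used, without TMM scaling factors.
    Every library is assumed to contain at least one read.
    """
    n_lib = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_lib)]
    return {gene: [c * 1e6 / lib_sizes[j] for j, c in enumerate(row)]
            for gene, row in counts.items()}

def log2_plus_one(table):
    """Apply expression = log2(x + 1) to every value."""
    return {gene: [math.log2(v + 1) for v in row] for gene, row in table.items()}

def filter_genes(cpm_table, min_cpm=2.0, min_samples=1000):
    """Keep genes expressed above min_cpm in more than min_samples libraries."""
    return {gene: row for gene, row in cpm_table.items()
            if sum(v > min_cpm for v in row) > min_samples}

# Toy table with two libraries, so min_samples is lowered for the example.
counts = {"gene1": [10, 20], "gene2": [990, 980], "gene3": [0, 0]}
table = cpm(counts)
kept = filter_genes(table, min_samples=1)
expression = log2_plus_one(kept)
```

With the defaults, filter_genes applies the thresholds given in the text (higher than 2 CPM in more than 1000 samples); gene3 is dropped in the toy example because it is never expressed.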

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated by the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value given a rank of one. Ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights range from zero to one.
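The rank standardization of an adjacency matrix can be sketched as follows. The sketch assumes that every matrix entry, including the diagonal, takes part in the ranking and that tied values share their average rank; neither detail is stated in the text, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right

def rank_standardize_matrix(matrix):
    """Rank all entries (smallest = 1), divide by the number of entries,
    then set the diagonal to one. Tied values share their average rank."""
    n = len(matrix)
    ordered = sorted(v for row in matrix for v in row)
    total = len(ordered)
    out = [
        [(bisect_left(ordered, v) + bisect_right(ordered, v) + 1) / 2 / total
         for v in row]
        for row in matrix
    ]
    for i in range(n):
        out[i][i] = 1.0
    return out

# Hypothetical symmetric correlation matrix for three genes.
weights = [
    [1.0, 0.2, 0.8],
    [0.2, 1.0, 0.5],
    [0.8, 0.5, 1.0],
]
standardized = rank_standardize_matrix(weights)
```

After this step, weights produced by different inference methods lie on the same zero-to-one scale and can be compared or combined directly.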

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM- or VST-normalized expression matrices. Networks were then inferred from the randomized expression matrices by the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.


Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions, defined as those ranking in the top 5% of their results, were used for evaluation. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 was downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets for HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. Briefly, it utilizes the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding the GO terms of a subset of genes and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).

Definition of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts that two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts that two genes are co-expressed but they are not; TN, a network predicts that two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts that two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts that a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts that a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts that a gene does not have a specific GO term and it does not have that term in our GO dataset; FN, a network predicts that a gene does not have a specific GO term but it has that GO term in our GO dataset.
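Given these definitions, AUROC summarizes how well network weights separate known positive pairs from the rest across all possible thresholds. A minimal sketch is shown below. It uses the rank-sum (Mann-Whitney) formulation rather than the ROCR or EGAD implementations used in this study, and the scores and labels are hypothetical.

```python
from bisect import bisect_left, bisect_right

def auroc(scores, labels):
    """Area under the ROC curve from prediction scores and 0/1 labels.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, counting ties as one half.
    """
    ordered = sorted(scores)
    positives = [s for s, y in zip(scores, labels) if y]
    n_pos = len(positives)
    n_neg = len(scores) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both positive and negative examples are required")
    # Average rank (1-based) of each positive score among all scores.
    rank_sum = sum(
        (bisect_left(ordered, s) + bisect_right(ordered, s) + 1) / 2
        for s in positives
    )
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical edge weights and whether each gene pair is a known interaction.
edge_weights = [0.9, 0.8, 0.3, 0.1]
known_pairs = [1, 0, 1, 0]
value = auroc(edge_weights, known_pairs)
```

In the toy example, three of the four positive-negative pairs are ordered correctly, giving an AUROC of 0.75; a value of 0.5 corresponds to random ordering.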

Network Clustering and Characterization

For each network, the top 1 million edges were selected to form a stringent co-expression network. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distribution were plotted with the NetworkAnalyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) with MCL v14-137 and the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15,116 genes in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, and a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
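The hypergeometric test behind the enrichment analysis can be sketched directly from its definition. This illustrates the statistic only, not AgriGO's implementation, and it omits the multiple test correction. The background size and query size below come from the text, while the annotation counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """Probability of at least k annotated genes in a query list of n genes,
    when K of the N background genes carry the annotation."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

background = 15116      # genes in the networks, used as background
query_size = 214        # co-expressed genes in the cell wall case study
annotated_total = 100   # hypothetical: background genes carrying a GO term
annotated_hits = 10     # hypothetical: query genes carrying that GO term

p_value = hypergeom_pvalue(annotated_hits, query_size, annotated_total, background)
```

A term would count as enriched when its p-value, after correction across all tested terms, falls below the 0.05 cutoff.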

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05 and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4 with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were searched against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1 Pipeline and datasets used for analysis 590

Supplemental Figure 2 Distribution of gene expression values 591

Supplemental Figure 3 Maize CPM-normalized with log2 transformed gene expression from all tissues and 592

developmental stages 593

Supplemental Figure 4 Pairwise comparison among results of inferences methods 594

Supplemental Figure 5. Characteristics of all 1720 genes in the PPPTY gene set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference method.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS).
Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network).
Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. Sixteen query genes in the maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.
Supplemental Dataset S2. Tutorial: Visualizing co-expression data in Cytoscape.

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared with the number of RNA-Seq Illumina samples (solid line) per year from 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO dataset comparisons for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
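
As a rough illustration of the AUROC statistic used throughout these panels, the following sketch computes it from ranks (the Mann-Whitney formulation). It is a generic calculation and makes no claim about the evaluation software used in this study; the function name, the scores, and the labels are hypothetical.

```python
def auroc(scores, labels):
    """AUROC from ranks; tied scores receive their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical prediction scores for four genes; 1 marks a known member
# of the gene set being evaluated.
example = auroc([0.9, 0.6, 0.4, 0.2], [1, 0, 1, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of known members from the remaining genes.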

Figure 3. Similarity among the ten inference methods in network performance based on GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
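
The scaling and distance steps named in this legend can be sketched as follows. This is only an illustration under assumptions: each row (one GO term or gene) is scaled across methods, and methods are then compared by the Euclidean distance between their columns. The AUROC values and helper names are hypothetical, and the hierarchical clustering itself is not shown.

```python
from math import sqrt


def zscore(values):
    """Scale one row to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


methods = ["PCC", "SCC", "MRNET"]
# Hypothetical AUROC values: one row per GO term, one column per method.
auroc_rows = [
    [0.80, 0.79, 0.60],
    [0.70, 0.71, 0.90],
    [0.65, 0.66, 0.55],
]

scaled = [zscore(row) for row in auroc_rows]  # scale each term separately
columns = {m: [row[i] for row in scaled] for i, m in enumerate(methods)}

# Methods with similar per-term performance end up close together.
d_pcc_scc = euclidean(columns["PCC"], columns["SCC"])
d_pcc_mrnet = euclidean(columns["PCC"], columns["MRNET"])
```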

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
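
The regression in this figure was fitted with lm() in R. For readers who want to see the arithmetic, a minimal Python sketch of the same kind of model, average AUROC = a + b × ln(sample size), is given below. The AUROC values are hypothetical, the function name is an assumption, and the p-value calculation is omitted.

```python
from math import log


def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of y = a + b * ln(x); returns (a, b, r_squared)."""
    xs = [log(n) for n in sample_sizes]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(mean_aurocs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, mean_aurocs))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, mean_aurocs))
    ss_tot = sum((y - y_bar) ** 2 for y in mean_aurocs)
    return a, b, 1 - ss_res / ss_tot


# Sample sizes taken from the network labels; AUROC values are hypothetical.
intercept, slope, r_squared = fit_log_model([12, 404, 1266], [0.58, 0.66, 0.70])
```

A positive slope indicates that average AUROC increases with the logarithm of the number of libraries.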

Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled S12_1 to S404; S1266 included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among the single network (white, "1266"), aggregated network (grey, "agg"), and protein network (dark grey, "pr") using PCC, SCC, MRNET, and CLR. A, GO evaluation of networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation of networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median, and the star sign is the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
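
The overlap shown in panel B amounts to intersecting the top-ranked edge sets of the three networks. A small sketch of that operation follows, assuming each network is available as a list of (gene, gene, score) tuples; the gene names, scores, and function names are hypothetical.

```python
def top_edges(weighted_edges, k):
    """Return the k highest-scoring edges as undirected (gene, gene) pairs."""
    ranked = sorted(weighted_edges, key=lambda e: e[2], reverse=True)[:k]
    # Sort each pair so that (a, b) and (b, a) count as the same edge.
    return {tuple(sorted((a, b))) for a, b, _ in ranked}


def shared_edges(*edge_sets):
    """Edges present in every one of the given edge sets."""
    return set.intersection(*edge_sets)


# Hypothetical toy networks standing in for PA, SA, and MS.
pa = [("g1", "g2", 0.9), ("g2", "g3", 0.8), ("g3", "g4", 0.1)]
sa = [("g2", "g1", 0.95), ("g3", "g4", 0.7), ("g2", "g3", 0.2)]
ms = [("g1", "g2", 0.5), ("g1", "g4", 0.4), ("g3", "g4", 0.3)]

merged = shared_edges(top_edges(pa, 2), top_edges(sa, 2), top_edges(ms, 2))
```

With the real networks, k would be 1 × 10^6 and the size of the intersection would correspond to the number of shared edges reported in the legend.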

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Node and edge colors are as in A. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Node and edge colors are as in A.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported into the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM, Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC, Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean and standard deviation of the log2-transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
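
For orientation, the CPM and RPKM definitions named in this legend can be written out as a short sketch. It uses the standard textbook formulas rather than the specific R packages of the pipeline; the counts, gene lengths, and the pseudocount in the log2 step are assumptions made only for illustration.

```python
from math import log2


def cpm(counts):
    """Counts per million for one library of gene-level read counts."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption of this sketch."""
    return [log2(v + pseudocount) for v in values]


# Hypothetical library with three genes.
counts = [100, 300, 600]
lengths_bp = [1000, 2000, 500]

cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths_bp)
log_cpm = log2_transform(cpm_values)
```

Because RPKM divides by gene length and CPM does not, plotting both against length, as in panel C, shows how each method treats long and short genes.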

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST-, CPM-, and RPKM-normalized data. The AUROC value density for each method is plotted in the diagonal blocks, between the AUROC values and the PCC values. AUROC values evaluated with GO datasets are plotted pairwise in the triangle below the diagonal, and the corresponding Pearson correlation coefficients are shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST-, CPM-, and RPKM-normalized data. The AUROC value density for each method is plotted in the diagonal blocks, between the AUROC values and the PCC values. AUROC values evaluated with PPPTY datasets are plotted pairwise in the triangle below the diagonal, and the corresponding Pearson correlation coefficients are shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of all 1720 genes in the PPPTY gene set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). The average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference method. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2", and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network are marked with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed using the Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. The red and blue curves show the fitted power-law distributions. The R^2 value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to a gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the fitted power-law distribution, with the function and R^2 indicated beside it.
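
A power-law fit of a node degree distribution can be sketched as an ordinary least-squares line in log-log space. This is one common way to obtain such a fit and is offered as an assumption, not as a description of the tool that produced the curves in this figure; the toy edge list and function names are hypothetical.

```python
from collections import Counter
from math import exp, log


def degree_distribution(edges):
    """Map each node degree to the number of nodes with that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def fit_power_law(distribution):
    """Fit count = c * degree ** (-gamma); returns (c, gamma, r_squared)."""
    xs = [log(k) for k in distribution]
    ys = [log(distribution[k]) for k in distribution]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return exp(intercept), -slope, 1 - ss_res / ss_tot


# Hypothetical star-shaped toy network: one hub and three leaves.
toy_distribution = degree_distribution([("a", "b"), ("a", "c"), ("a", "d")])
```

An R^2 close to 1 in log-log space is what the legend refers to as agreement with the power-law model.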

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1 to 8 are shown as light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69–71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987–D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267–1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc.114.132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703–1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424–440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146–2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139–140

Sales G Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876–1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141–D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science 326 1112–1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337–342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498–2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940–3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443–458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314–362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447–D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633–51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895–905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258–264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814–818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300–323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57–63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822–2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629–645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195–214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp.15.01821

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples per year (solid line), 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using nine inference methods including Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using nine inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using nine inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
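
The AUROC reported in these panels can be computed from ranks alone. The following is a minimal Python sketch of that calculation and is not the authors' pipeline (the reference list cites EGAD and ROCR for this purpose). The function name `auroc` and its inputs are illustrative assumptions: `scores` holds a network-derived score per gene and `labels` marks membership in a reference set such as a GO term.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: network-derived score for each candidate gene (hypothetical input)
    labels: 1 if the gene belongs to the reference set, otherwise 0
    Tied scores receive the average of the ranks they span.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort indices by ascending score and assign 1-based average ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positives, normalized by the number of pos/neg pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

A value of 0.5 corresponds to random ranking of the reference genes, and 1.0 to a ranking that places every reference gene above every other gene.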


Figure 3. Similarity between nine inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural-logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural-logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
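
The logarithmic model in Figure 4 is an ordinary least-squares fit of average AUROC against ln(sample size). The caption states that the authors used lm() in R; the Python sketch below reproduces only the coefficient and R-squared arithmetic, not the p-value, and the function name `fit_log_model` is an illustrative assumption.

```python
import math


def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of mean_auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_aurocs)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Closed-form simple linear regression on the log-transformed predictor.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R-squared: share of variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared
```

A positive slope indicates that performance keeps improving as libraries are added, with diminishing returns on the original sample-size scale.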


Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.


Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
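
The overlap counts in a three-set Venn diagram such as those in Figure 7 follow from plain set operations. The Python sketch below is illustrative only: it assumes that "highest interacting" means ranking genes by a per-gene connectivity score, which the caption does not specify, and the names `top_k` and `venn3_counts` are hypothetical.

```python
def top_k(connectivity, k):
    """Return the set of the k genes with the highest connectivity score.

    connectivity: dict mapping gene ID to a score (assumed ranking criterion)
    Ties are broken alphabetically by gene ID so the result is deterministic.
    """
    ranked = sorted(connectivity, key=lambda gene: (-connectivity[gene], gene))
    return set(ranked[:k])


def venn3_counts(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b": len((a & b) - c),
        "a_c": len((a & c) - b),
        "b_c": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }
```

With the three networks' top-1000 gene sets as `a`, `b` and `c`, the `a_b_c` entry corresponds to the shared core reported in panel A; the same function applies to edge sets if each edge is stored as an order-independent pair.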


Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network-predicted interactions.


Parsed Citations

Allen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One 7 e29348

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology Plant Cell Physiol 48 381-90

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Network expansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem W Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science 320 938-941

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbers Bioinformatics 31 2123-2130

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868

Barabasi A-L Oltvai ZNZN Barabási A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5 101-113

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat Soc Ser B 289-300

De Bodt S Hollunder J Nelissen H Meulemeester N Inzé D (2012) CORNET 2.0 Integrating plant coexpression protein-protein interactions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K O'Connor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems Genes Dev 26 1685-90

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMC Bioinformatics 10 421

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al (2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

D'haeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineering Bioinformatics 16 707-726

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Höfte H Gonneau M Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures Nature Protoc 7 670-85

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575-1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Fedoroff N V (2012) McClintock's challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244-56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277-286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1-8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 9 29-46

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput Biol 4 e1000117

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int Rev Cell Mol Biol 328 25-48

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference Bioinformatics 28 1592-1597

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360

Köster J Rahmann S (2012) Snakemake: a scalable bioinformatics workflow engine Bioinformatics 28 2520-2522

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time and perturbation Genome Biol 14 123

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol 34 1812-1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery PLoS One 7 e50411

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science (80- ) 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


correlation methods are more computationally efficient, while MI methods are able to reveal non-linear relationships (Li et al., 2015c). PCC is widely used but may be influenced by outliers (Mukaka, 2012). SCC, KCC and BIC are less sensitive to outliers because SCC and KCC consider only rank information and BIC is calculated from the dataset median instead of the mean (Serin et al., 2016). Recently, GCC has been shown to be a better correlation method for gene expression analysis because of its capacity to detect non-linear relationships and its insensitivity to outliers (Ma and Wang, 2012). CSC is widely used for text mining and for analyzing sparse data with many zeros (Dhillon and Modha, 2001). ARACNE, MRNET and CLR showed extended gene-dependent relationships under variable biological settings (Margolin et al., 2006; Faith et al., 2007; Meyer et al., 2007; Li et al., 2013b). To estimate the effectiveness of the inference methods, AUROC values were calculated with the same testing parameters described for the testing of normalization methods.
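
The contrast between PCC and a rank-based measure such as SCC can be illustrated with a small sketch. The expression values below are invented for illustration and the helper names (`pearson`, `average_ranks`, `spearman`) are hypothetical; this is not the code used in the study, only a minimal demonstration of why a single outlier affects PCC more than SCC.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation (SCC): PCC computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented expression profiles for two genes across six samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
# Same profile, but the last sample is replaced by an extreme value.
gene_b_outlier = gene_b[:-1] + [500.0]

print("PCC clean  :", round(pearson(gene_a, gene_b), 3))
print("PCC outlier:", round(pearson(gene_a, gene_b_outlier), 3))
print("SCC clean  :", round(spearman(gene_a, gene_b), 3))
print("SCC outlier:", round(spearman(gene_a, gene_b_outlier), 3))
```

Because the outlier does not change the ordering of the samples, the rank-based coefficient is unchanged, whereas the PCC drops noticeably.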

Assessed by GO datasets, the 277 AUROC values were averaged to create one average value for each of the 10 inference methods, ranging from 0.620 to 0.724 (Fig. 2D). The average AUROC across all normalization methods for the six correlation methods was 0.718, while the average AUROC for all four MI methods was 0.646. The majority of the 277 GO terms had similar AUROC values in the different correlation method-generated GCNs, and these patterns differ from those observed in the MI-generated GCNs (Fig. 3A). The similarity among different methods was also detectable by pairwise comparison of the methods using Pearson correlations (Supplemental Fig. 4A).
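
As a generic illustration of how per-term AUROC values can be computed and then averaged, the sketch below uses the pairwise (Mann-Whitney) definition of AUROC. The scores, labels and function names are hypothetical, and the paper's own evaluation procedure, described with the normalization tests, may differ in how genes are scored.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen positive outscores a randomly
    chosen negative, counting ties as one half."""
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def mean_auroc(per_term):
    """Average the AUROC over several terms (e.g. GO terms)."""
    values = [auroc(scores, labels) for scores, labels in per_term]
    return sum(values) / len(values)

# Two invented terms: a network-derived score per gene, and whether the
# gene is annotated to the term (1) or not (0).
term_a = ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])  # perfectly separated
term_b = ([0.9, 0.2, 0.8, 0.3], [1, 1, 0, 0])  # no better than chance

print(auroc(*term_a), auroc(*term_b), mean_auroc([term_a, term_b]))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the averages reported here should be read.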

To evaluate network inference methods with the PPPTY dataset, the AUROC values for 1720 genes were averaged for each combination of normalization and inference methods (Fig. 2E). This evaluation also showed that the networks constructed using correlation methods resulted in higher AUROC values than MI methods, although the CSC method resulted in lower AUROC values than other correlation methods. As demonstrated for the GO evaluation, results from correlation methods were more similar to each other than to those from the MI methods (Supplemental Fig. 4B). Interestingly, heatmap results indicated that a subset of genes consistently had higher AUROC values when CSC, MRNET/CLR or AA/MA were used (Fig. 3B), although this subset was small enough that the average AUROC value over the whole gene set was relatively low for those methods. The gene sets with the highest AUROC values in PCC, CSC or MRNET were extracted. Characteristics of each gene set were compared in terms of average expression (CPM) and average number of low-expressed elements (CPM < 0). The CSC gene set had the smallest number of low-expression elements and had higher average expression than both the 1720-gene set and the PCC gene set (Supplemental Fig. 5). This may indicate that the CSC method is better at determining co-expression for highly expressed genes.

The AUROC values from 26 targets of the HDA101 ChIP-Seq dataset reveal that the CSC GCN had the highest AUROC value, and the use of MRNET/CLR GCNs resulted in slightly higher scores than correlation methods (Fig. 2F). This could be explained by the small number of targets creating skewed results, but may also indicate that CSC/MI methods are more suitable for specific types of genes or interactions between genes (Tzfadia et al., 2016). HDA101 is highly expressed in all samples, with an average expression value of 864 CPM and a minimum expression of 289 CPM, so it is possible that HDA101 is more suitable for the CSC method. CPM and RPKM normalization methods had higher AUROC values than VST (Fig. 2C). Using two models of ARACNE (additive, AA; multiplicative, MA), the co-expression matrices contained less than 0.5% non-zero values for all comparisons, and so these techniques were not included in any additional analyses.

In conclusion, our results indicated that the widely used correlation methods resulted in a more predictive maize GCN from a single expression matrix, but co-expression with some individual genes may be better detected using MI methods. Normalization method did not have a substantial influence on GCN performance, so only CPM normalization was used in conjunction with PCC, SCC, MRNET and CLR inference for subsequent optimization of other parameters.

Increasing Sample Size Had a Positive Effect on GCN

GCN analysis can be accomplished with a variable number of samples and datasets, but sample size can influence the quality of the resulting GCN (Wei et al., 2004; Ballouz et al., 2015). Separate analyses were conducted with different numbers of samples and experiments to empirically determine the effect of sample number on GCN effectiveness. The data in our analysis consisted of 17 experiments, each including between 12 and 404 libraries. For this analysis, the CPM normalization method followed by each of four inference methods (PCC, SCC, MRNET and CLR) was applied to the 17 experiments, and the 68 resulting networks were evaluated by both GO and PPPTY.

From GO and PPPTY evaluation, all algorithms exhibited a positive linear relationship between natural-logarithm-transformed sample size and average AUROC values (Fig. 4). The linear relationships were stronger for the PCC and SCC methods, with higher r-squared values, indicating that correlation methods benefit more from increasing sample size. Thus, for building correlation-based GCNs, as many samples as possible should be included. We also found that, as seen for the total GCN analysis, PCC and SCC had higher average AUROC values than the MRNET and CLR methods in the PPPTY and GO analyses for most of the individual networks (Fig. 5).
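
The relationship described above, average AUROC regressed on the natural logarithm of sample size, can be sketched with an ordinary least-squares fit. The sample sizes and AUROC values below are invented placeholders, not the values behind Fig. 4, and `fit_log_linear` is a hypothetical helper name.

```python
import math

def fit_log_linear(sample_sizes, aurocs):
    """Least-squares fit of AUROC = intercept + slope * ln(sample size).

    Returns (slope, intercept, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    count = len(x)
    mean_x = sum(x) / count
    mean_y = sum(aurocs) / count
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2
                 for xi, yi in zip(x, aurocs))
    ss_tot = sum((yi - mean_y) ** 2 for yi in aurocs)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Invented example: library counts per experiment and the average AUROC
# of the network built from each experiment.
example_sizes = [12, 24, 48, 96, 200, 404]
example_aurocs = [0.55, 0.57, 0.60, 0.61, 0.64, 0.66]

slope, intercept, r_squared = fit_log_linear(example_sizes, example_aurocs)
print(f"slope={slope:.4f} intercept={intercept:.4f} r2={r_squared:.3f}")
```

A positive slope with a high r-squared value would correspond to the pattern reported for PCC and SCC; a weaker fit would correspond to the pattern reported for the MI methods.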

Ranked Aggregation of Networks Improved Performance of GCNs

Ranked aggregation for meta-analysis is another strategy that can change the outcome of a GCN by buffering the effect of sample heterogeneity (Zhong et al., 2014; Wang et al., 2015a; Asnicar et al., 2016). Aggregated rank-standardized correlation/MI matrices were calculated from separate experiments to determine whether this approach enhanced GCN performance. Aggregating individual networks together for meta-analysis can help to highlight true co-expression interactions and reduce noise (Zhong et al., 2014; Wang et al., 2015a; Wang et al., 2015b). This analysis was conducted with the 17 differently sized experiments using the PCC, SCC, MRNET and CLR methods for GCN inference as before, resulting in 68 single GCNs. The 17 experiments were aggregated for PCC, SCC, MRNET and CLR individually and evaluated by the GO and PPPTY datasets.


Of the 4 aggregated networks that were evaluated, the two correlation methods (PCC and SCC) had higher AUROC values than the single network from 1266 samples (Fig. 6 and Supplemental Fig. 6). However, this aggregation strategy did not result in significantly higher AUROC scores for the MRNET and CLR networks compared with single networks from 1266 samples (two-tailed Wilcoxon rank test for GO evaluation, p-values 0.494 and 0.796). It has been reported that MI estimation accuracy is dependent on sample size (Gao et al., 2015); therefore, individual MI networks built with a small number of libraries may not demonstrate improved accuracy from aggregation. In conclusion, the PCC/SCC-built GCN performed best using a ranked aggregation strategy, and use of this strategy in combination with the other optimized parameters creates a robust GCN.
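
One simple way to picture rank-standardized aggregation is sketched below: within each experiment's network, edge scores are converted to ranks and divided by the number of edges, and the standardized ranks are then averaged across experiments. This is an assumed, simplified scheme for illustration; the exact standardization and aggregation used in the study may differ, and the gene names, scores and function names are hypothetical.

```python
def rank_standardize(network):
    """Replace each edge score by its average rank divided by the edge
    count, so every network is on the same (0, 1] scale."""
    edges = sorted(network, key=lambda edge: network[edge])
    total = len(edges)
    standardized = {}
    i = 0
    while i < total:
        j = i
        # Extend j over a run of tied scores so ties share one rank.
        while j + 1 < total and network[edges[j + 1]] == network[edges[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            standardized[edges[k]] = average_rank / total
        i = j + 1
    return standardized

def aggregate(networks):
    """Average rank-standardized scores over the edges shared by all
    networks."""
    ranked = [rank_standardize(net) for net in networks]
    shared = set(ranked[0]).intersection(*ranked[1:])
    return {edge: sum(r[edge] for r in ranked) / len(ranked)
            for edge in shared}

# Invented co-expression scores for the same three gene pairs in two
# experiments (e.g. PCC values from two separate expression matrices).
experiment_1 = {("g1", "g2"): 0.9, ("g1", "g3"): 0.1, ("g2", "g3"): 0.5}
experiment_2 = {("g1", "g2"): 0.7, ("g1", "g3"): 0.6, ("g2", "g3"): -0.2}

aggregated = aggregate([experiment_1, experiment_2])
for edge, score in sorted(aggregated.items(), key=lambda item: -item[1]):
    print(edge, round(score, 3))
```

In this toy case the pair that ranks highest in both experiments keeps the top aggregated score, while pairs that rank inconsistently are pulled toward the middle, which is the noise-buffering effect attributed to aggregation.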

The Performance of Protein Networks Did Not Exceed Aggregation Networks

In many cases, mRNA levels in a cell are of interest because mRNA level is thought to be related to the level and function of a protein of interest. However, many researchers have found inconsistencies between mRNA and protein levels (Baerenfaller et al., 2008; Schwanhäusser et al., 2011; Ponnala et al., 2014; Walley et al., 2016). Although relatively less protein expression data is available, this data is amenable to GCN construction and could represent a more direct reflection of interacting proteins. Using a non-modified protein expression atlas from 23 maize tissues based upon mass spectrometry data (Walley et al., 2016), four protein networks were built with PCC, SCC, MRNET and CLR separately and then evaluated using the same PPPTY and GO datasets as previously mentioned.

GCNs constructed from protein expression did not exhibit superior AUROC values to those observed for the RNA-Seq-based GCN using the aggregation strategy (Fig. 6). When evaluated by the GO and PPPTY datasets, the performance of the protein network was lower than that of the aggregated network as well as the single network from 1266 samples. To confirm this result, a two-way ANOVA was computed with pairwise comparison for the GO evaluation, which showed that the effect of network type was significant (Supplemental Table S3). A subsequent pairwise comparison using the Wilcoxon rank sum test indicated that the PCC/SCC methods were significantly better than MRNET/CLR (Supplemental Table S3), although MI methods may be superior for some types of interactions.

The raw protein expression data included 17862 genes, of which 11429 genes overlapped with our RNA-Seq-based network and were therefore used for the analysis. To demonstrate that the performance of the protein network was not biased by the selection of genes, the PCC method was used with all 17862 genes to construct a protein network (Supplemental Fig. 7). No improvement could be detected for the protein network derived from 17862 genes, with p-values of 0.635 for the GO evaluation and 0.995 for the PPPTY evaluation from a one-sided Wilcoxon rank sum test.

PCC and SCC-built GCN Exhibit Identical Topological and Functional Properties

In addition to evaluation of network performance based upon biological characteristics, networks can be compared based upon several different network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules and number of genes in the largest module. The number of nodes is a basic construct in graph theory depicting the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in networks. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics like protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC- and SCC-built ranked aggregation networks from 17 experiments (PA and SA) and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.
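The heterogeneity and centralization measures described above can be illustrated with a minimal Python sketch. It assumes the definitions commonly attributed to Dong and Horvath (2007): heterogeneity as the standard deviation of node degree divided by its mean, and centralization as n/(n-2) * (max(k)/(n-1) - density). The function name, the use of the population variance and the toy edge lists are illustrative assumptions; this is not the Cytoscape implementation used in the study.

```python
import math

def degree_summary(edges):
    """Summarize the degree distribution of an undirected network.

    edges: iterable of (gene_a, gene_b) pairs, each edge listed once.
    Formulas are assumed from Dong and Horvath (2007); the population
    variance is an assumption made for this sketch.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    k = list(degree.values())
    n = len(k)
    if n < 3:
        raise ValueError("need at least three connected nodes")
    mean_k = sum(k) / n
    var_k = sum((d - mean_k) ** 2 for d in k) / n
    density = mean_k / (n - 1)
    return {
        "nodes": n,
        "heterogeneity": math.sqrt(var_k) / mean_k,
        "centralization": n / (n - 2) * (max(k) / (n - 1) - density),
    }

# Toy networks with hypothetical gene names:
# a star has one dominant hub, a ring has no hub at all.
star = [("hub", "g1"), ("hub", "g2"), ("hub", "g3"), ("hub", "g4")]
ring = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g1")]
star_stats = degree_summary(star)
ring_stats = degree_summary(ring)
```

In this toy comparison the star network scores high on both measures and the ring scores zero on both, which mirrors the interpretation that larger values point to hub-dominated degree distributions.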

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the observed highest centralization value for this network. Centralization and heterogeneity are two measures that describe the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity, whereas a network with larger values of centralization and heterogeneity may either contain a larger number of hubs or contain a modest number of hubs with an extremely imbalanced degree distribution. In biological networks, many observations connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8). The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 83.5% unique genes (Fig. 7A). 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of the top interacting genes (27/148), which was expected because of their cellular abundance and involvement with translation. Interestingly, nine epigenetic regulators were found in the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a) and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).

To reveal the underlying properties of GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules than the PA and SA networks. Most genes in the MS network clustered into one very large module of 14054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Tables S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, and so these methods appear to be better for module detection.

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA and MS were merged together, and the intersection of the three produced a merged network of 14277 genes and 106591 interactions. PA and SA shared 83.5% of common interactions within the networks, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
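The intersection step described above can be sketched in Python as follows. The function names and the toy edge lists are hypothetical, and treating edges as unordered gene pairs is an assumption of this sketch rather than a documented detail of the original pipeline.

```python
def top_edges(weighted_edges, n_top):
    """Return the n_top highest-weight edges as a set of unordered gene pairs."""
    ranked = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    return {frozenset((a, b)) for a, b, _ in ranked[:n_top]}

def merge_by_intersection(*edge_sets):
    """Keep only the edges that are present in every input network."""
    return set.intersection(*edge_sets)

# Hypothetical toy networks; the study used the top 1 million edges of PA, SA and MS.
pa = [("g1", "g2", 0.99), ("g2", "g3", 0.95), ("g1", "g4", 0.40)]
sa = [("g2", "g1", 0.98), ("g3", "g2", 0.90), ("g3", "g4", 0.30)]
ms = [("g1", "g2", 0.97), ("g1", "g4", 0.85), ("g2", "g3", 0.10)]

merged = merge_by_intersection(top_edges(pa, 2), top_edges(sa, 2), top_edges(ms, 2))
merged_genes = set().union(*merged)  # genes that keep at least one shared edge
```

Only the g1-g2 edge survives here because it is the only pair ranked in the top two of all three toy networks; taking the intersection trades coverage for confidence in the same way.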

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to the existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected, cell wall related GO terms were enriched for these 214 genes (Fig. 7D; Supplemental Table S9).
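A guide-gene query of this kind can be sketched in Python as below. The function name and the gene labels are hypothetical, and the sketch assumes that the extracted subnetwork consists of every edge touching at least one guide gene; the exact selection criteria used in the study are not restated here.

```python
def guide_gene_subnetwork(edges, guide_genes):
    """Extract the edges that touch at least one guide gene.

    Returns the genes in the subnetwork, the selected edges, and any
    guide genes that have no co-expressed partner in the network.
    """
    guides = set(guide_genes)
    sub_edges = [(a, b) for a, b in edges if a in guides or b in guides]
    genes = {gene for edge in sub_edges for gene in edge}
    orphan_guides = guides - genes
    return genes, sub_edges, orphan_guides

# Hypothetical merged network and guide genes.
network = [("CESA_a", "g1"), ("CESA_a", "g2"), ("g2", "g3"), ("CSL_b", "g2")]
genes, sub_edges, orphans = guide_gene_subnetwork(network, ["CESA_a", "CSL_b", "GH_c"])
```

In the toy example one guide gene ends up with no neighbors, analogous to the two guide genes reported above that had no co-expressed genes in the merged network.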

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database using BLASTP to retrieve homologs and their annotations. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The results of the literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1 and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14 and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6) and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes. 48% (36/75) of co-expressed genes were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial on how to use Cytoscape to visualize any co-expressed genes in our network was provided as supplemental material (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, the VST, CPM and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, where normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study in human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods capture different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed a more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR together did not result in a network that outperformed the PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq based maize GCN. This optimization may apply to a range of datasets with shared characteristics of maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Process

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). The adapters were trimmed from the fastq files by Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked by FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). 26 libraries with fewer than 5 million total reads and 11 libraries with an overall alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined by Snakemake v3.7.1 (Köster and Rahmann, 2012).

Gene Count Normalization

The expression data was normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated by the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM/RPKM + 1)). For both methods, scale factors between samples were estimated by Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated by the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in additional analysis (15116 genes).
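The arithmetic behind CPM, RPKM, the log2 transform and the expression filter can be sketched in Python as follows. This is a simplified illustration, not the edgeR computation used in the study: it divides by the plain library size and omits the TMM scale factors, and all function names and input values are hypothetical.

```python
import math

def cpm(counts, library_size):
    """Counts per million for one library (plain library size, no TMM factor)."""
    return [c * 1e6 / library_size for c in counts]

def rpkm(counts, gene_lengths_bp, library_size):
    """Reads per kilobase of gene length per million reads in the library."""
    return [c * 1e9 / (library_size * length)
            for c, length in zip(counts, gene_lengths_bp)]

def log2_plus_one(values):
    """expression = log2(value + 1), as in the text above."""
    return [math.log2(v + 1.0) for v in values]

def passes_filter(cpm_per_sample, min_cpm=2.0, min_samples=1000):
    """True if the gene exceeds min_cpm in more than min_samples samples."""
    return sum(v > min_cpm for v in cpm_per_sample) > min_samples

# Hypothetical library with three genes and 100 total reads.
counts = [10, 0, 90]
lengths_bp = [1000, 2000, 500]
library_cpm = cpm(counts, library_size=100)
library_rpkm = rpkm(counts, lengths_bp, library_size=100)
log_cpm = log2_plus_one(library_cpm)
```

The pseudocount of one keeps genes with zero reads at an expression value of zero after the log2 transform, so they remain defined in downstream correlation calculations.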

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated by the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value receiving a rank of one. Ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights range from zero to one.
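The rank standardization described above can be sketched in Python, together with a plain Pearson correlation for context. The study performed these steps in R; the function names here are hypothetical, and the handling of tied values by average rank is an assumption of this sketch, since tie handling is not specified above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_normalize(matrix):
    """Rank all weights (smallest = 1), divide by the number of matrix
    elements, and set the diagonal to one. Ties share their average rank
    (an assumption made for this sketch)."""
    n = len(matrix)
    total = n * n
    entries = sorted((matrix[i][j], i, j) for i in range(n) for j in range(n))
    out = [[0.0] * n for _ in range(n)]
    pos = 0
    while pos < total:
        end = pos
        while end < total and entries[end][0] == entries[pos][0]:
            end += 1
        average_rank = (pos + 1 + end) / 2.0  # mean of ranks pos+1 .. end
        for _, i, j in entries[pos:end]:
            out[i][j] = average_rank / total
        pos = end
    for i in range(n):
        out[i][i] = 1.0
    return out

# Hypothetical 3-gene adjacency matrix of correlation weights.
weights = [[1.0, 0.2, -0.5],
           [0.2, 1.0, 0.7],
           [-0.5, 0.7, 1.0]]
ranked = rank_normalize(weights)
```

Because only the ordering of weights is kept, networks built with different inference methods end up on a common zero-to-one scale, which is what makes them comparable and combinable.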

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM or VST normalized expression matrices. Networks were then inferred from the randomized expression matrices by the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.
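The gene ID shuffling used to build random baselines can be sketched in Python as below. The function name, the seed argument and the toy expression table are hypothetical; the sketch only shows the label permutation, not the downstream inference and evaluation.

```python
import random

def shuffle_gene_ids(expression, seed=None):
    """Reassign gene IDs to expression profiles at random.

    expression: dict mapping gene ID -> list of expression values.
    The profiles themselves are unchanged; only their labels are permuted,
    which breaks the link between a gene and its annotation.
    """
    rng = random.Random(seed)
    gene_ids = list(expression)
    shuffled_ids = gene_ids[:]
    rng.shuffle(shuffled_ids)
    return {new_id: expression[old_id]
            for new_id, old_id in zip(shuffled_ids, gene_ids)}

# Hypothetical expression table; ten randomized copies as in the MRNET/CLR baseline.
expression = {"g1": [5.1, 0.0, 2.3], "g2": [1.0, 1.2, 0.9], "g3": [0.0, 7.4, 3.3]}
randomized = [shuffle_gene_ids(expression, seed=s) for s in range(10)]
```

Evaluating networks inferred from such permuted tables gives an empirical AUROC baseline against which the real networks can be compared.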


Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions were used for evaluation, defined as those ranking in the top 5% of their results. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 was downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets for HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. Basically, it utilizes the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes' GO terms and testing whether the hidden GO terms could be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).

Definition of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts two genes are co-expressed but they are not co-expressed in the PPPTY dataset; TN, a network predicts two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts a gene does not have a specific GO term and it does not have that GO term in our GO dataset; FN, a network predicts a gene does not have a specific GO term but it has that GO term in our GO dataset.
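To make the link between these definitions and the AUROC concrete, the following Python sketch scores a toy set of predicted gene pairs against a reference set. It computes the AUROC through the rank-sum (Mann-Whitney) identity rather than through ROCR or EGAD, which were the tools used in the study; the function name and the example data are hypothetical.

```python
def auroc(scores, labels):
    """AUROC from predicted weights and binary reference labels.

    Uses the rank-sum identity: the AUROC equals the probability that a
    randomly chosen positive pair is scored above a randomly chosen
    negative pair, with ties counted as one half.
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    pos = 0
    while pos < len(pairs):
        end = pos
        while end < len(pairs) and pairs[end][0] == pairs[pos][0]:
            end += 1
        for idx in range(pos, end):
            ranks[idx] = (pos + 1 + end) / 2.0  # average rank for tied scores
        pos = end
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Hypothetical predicted edge weights and reference (PPPTY-like) gene pairs.
predicted = {("g1", "g2"): 0.9, ("g1", "g3"): 0.8, ("g2", "g3"): 0.3, ("g1", "g4"): 0.1}
reference = {frozenset(("g1", "g2")), frozenset(("g1", "g3"))}
gene_pairs = list(predicted)
scores = [predicted[p] for p in gene_pairs]
labels = [1 if frozenset(p) in reference else 0 for p in gene_pairs]
example_auroc = auroc(scores, labels)
```

Sweeping a threshold over the edge weights changes the TP, FP, TN and FN counts defined above; the AUROC summarizes that whole sweep in one number, with 0.5 corresponding to random ordering.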

Network Clustering and Characterization

For each network, the top 1 million edges were selected as stringent co-expression networks. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted by the NetworkAnalyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) by MCL v14-137 with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes involved in our networks were used as background references. Hypergeometric testing was used to calculate p-values, for which a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
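The statistics behind this enrichment step can be sketched in Python as follows. AgriGO performed the actual analysis; this sketch only illustrates an upper-tail hypergeometric test and the Benjamini-Yekutieli adjustment, and the function names and the example counts (apart from the 15116-gene background and the 214-gene query size) are hypothetical.

```python
from math import comb

def hypergeom_pvalue(overlap, query_size, term_size, background_size):
    """P(X >= overlap) when query_size genes are drawn without replacement
    from a background in which term_size genes carry the GO term."""
    upper = min(query_size, term_size)
    total = comb(background_size, query_size)
    tail = sum(comb(term_size, i) * comb(background_size - term_size, query_size - i)
               for i in range(overlap, upper + 1))
    return tail / total

def yekutieli_adjust(pvalues):
    """Benjamini-Yekutieli adjusted p-values (FDR under dependency)."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical GO term: 150 annotated genes in the background, 20 of them
# among 214 query genes.
p_example = hypergeom_pvalue(20, 214, 150, 15116)
adjusted_example = yekutieli_adjust([0.01, 0.02])
```

A term would be retained under the criteria above only if its adjusted value stays at or below 0.05.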

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05 and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4 with the maximum number of interactors set to 100. 76 genes with 817 interactions were retrieved. Maize proteins were blasted against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. Also, we thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for the helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.
Supplemental Figure 2. Distribution of gene expression values.
Supplemental Figure 3. Maize CPM-normalized and log2 transformed gene expression from all tissues and developmental stages.
Supplemental Figure 4. Pairwise comparison among results of inference methods.
Supplemental Figure 5. Characteristics of all 1720 PPPTY gene set (ALL_1720), genes with highest AUROC values in CSC method (CSC), PCC method (PCC) and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS).
Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS), and the intersection among three networks (Merged network).
Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. 16 query genes in maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.
Supplemental Dataset S2. Tutorial: Visualizing Co-expression data in Cytoscape.

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008-2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.

Figure 2. Normalization and network inference methods effect on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; the S1266 network included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, "1266"), aggregated network (grey, "agg") and protein network (dark grey, "pr") using PCC, SCC, MRNET and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks. 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks. 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
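The overlap shown in the Venn diagram of Figure 7A can be sketched as follows. The sketch assumes that "highest interacting" means highest node degree (number of edges per gene), which the legend does not state explicitly. The function names and the toy edge lists are hypothetical.

```python
from collections import Counter

def top_genes(edges, k):
    """Return the k genes with the highest degree in an undirected edge list."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    return {gene for gene, _ in degree.most_common(k)}

def shared_genes(*gene_sets):
    """Genes present in every one of the given sets (center of the Venn diagram)."""
    return set.intersection(*gene_sets)

# Toy edge lists standing in for the PA, SA and MS networks.
pa = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
sa = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "e")]
ms = [("a", "d"), ("a", "e"), ("d", "e"), ("a", "b")]
common = shared_genes(top_genes(pa, 3), top_genes(sa, 3), top_genes(ms, 3))
```

With real networks, `k` would be 1000 and the edge lists would be the top-ranked edges of each network. Ties at the cutoff are broken by order of first appearance here, so a stricter analysis might prefer an explicit tie rule.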

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files) and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM, Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC, Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithmic transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R with the mean and standard deviation of the log2-transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
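For readers unfamiliar with the two count-based normalizations named in this legend, the standard definitions can be written out as a short sketch. It is an illustration of the textbook formulas for a single library, not the authors' pipeline; the pseudocount used before the log2 transformation is an assumption, and all names and values are hypothetical.

```python
import math

def cpm(counts):
    """Counts Per Million: scale each gene count by library size."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def rpkm(counts, lengths_bp):
    """Reads Per Kilobase per Million mapped reads: CPM further divided by gene length in kb."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def log2_transform(values, pseudocount=1.0):
    """log2 transformation; the pseudocount (an assumption here) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

# One toy library with three genes.
counts = [100, 300, 600]
lengths = [1000, 2000, 500]
cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths)
log_cpm = log2_transform(cpm_values)
```

Because RPKM divides by gene length and CPM does not, plotting both against gene length (as in panel C) shows whether a length dependence remains after normalization.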

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stage. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient, ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal blocks, between the AUROC values and the PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal blocks, between the AUROC values and the PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of the full set of 1720 PPPTY genes (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference method. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2" and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network are labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
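The outlier rule stated in this legend (more than 1.5 times the interquartile range beyond the quartiles) can be sketched as below. The quartile estimator is an assumption: this sketch uses Python's "inclusive" method, whereas R box plots use hinges, so values close to the fences may be classified differently. The function name and data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below the first or above the third quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical AUROC values for one network; 0.95 lies far above the rest.
auroc_values = [0.50, 0.52, 0.55, 0.56, 0.58, 0.60, 0.95]
outliers = iqr_outliers(auroc_values)
```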

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distribution. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node degree distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network). The number of edges linked to a gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.
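A power-law fit of a node degree distribution, as referred to in this legend, can be sketched as follows. The legend does not say how the fit was obtained, so this sketch assumes a least-squares fit on log-log transformed values, with R² computed on the log scale. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Map each node degree to the number of nodes having that degree."""
    degree = Counter()
    for node_a, node_b in edges:
        degree[node_a] += 1
        degree[node_b] += 1
    return Counter(degree.values())

def fit_power_law(dist):
    """Fit count = a * degree**b by least squares on log-log values; returns a, b, R^2."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(c) for c in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_a), b, 1 - ss_res / ss_tot

# Toy distribution that follows count = 100 * degree**-2 exactly.
toy_dist = {1: 100, 2: 25, 4: 6.25}
a, b, r_squared = fit_power_law(toy_dist)
toy_degrees = degree_distribution([("g1", "g2"), ("g1", "g3")])
```

A negative exponent `b` with a high R² is the usual signature of a scale-free-like network, in which a few genes have many connections and most have few.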

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O’Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D’haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff N V (2012) McClintock’s challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R 1014 Bioinformatics 21 3940ndash3941 1015

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of 1016 viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443ndash458 1017

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information 1018 correlation and model based indices BMC Bioinformatics 13 328 1019

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An 1020 expanded maize gene expression atlas based on RNA-sequencing and its use to explore root 1021 development Plant Genome 314ndash362 1022

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A 1023 Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life 1024 Nucleic Acids Res 43 D447ndashD452 1025

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative 1026 Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194 1027

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S 1028 Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and 1029 caveats Plant Cell Environ 32 1633ndash51 1030

USDA (2016) Grain World Markets and Trade 1031

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant 1032 Physiol 153 895ndash905 1033

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism 1034 and Beyond Nat Rev Mol Cell Biol 16 258ndash264 1035

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR 1036 (2016) Integration of omic networks in a developmental atlas of maize Science 353 814ndash818 1037

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene 1038 ranking BMC Bioinformatics 16 S6 1039

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring genendashgene interactions and functional 1040 modules using sparse canonical correlation analysis Ann Appl Stat 9 300ndash323 1041

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
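
The outlier rule stated in the Figure 2 legend can be sketched in a few lines of Python. This is a minimal illustration that assumes the conventional boxplot rule (1.5 times the interquartile range beyond the quartiles); the function name, the quantile interpolation method, and the AUROC values are illustrative assumptions and are not taken from the study.

```python
from statistics import quantiles

def boxplot_outliers(values):
    """Return values more than 1.5 * IQR above Q3 or below Q1."""
    # "inclusive" interpolation is an assumption; plotting tools differ slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical AUROC values, for illustration only.
auroc_values = [0.61, 0.63, 0.64, 0.66, 0.67, 0.68, 0.70, 0.95]
print(boxplot_outliers(auroc_values))
```

Different software packages compute quartiles slightly differently, so borderline points may be classified differently from the published plots.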


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue (lowest) to red (highest). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
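
The scaling and distance steps named in this legend can be illustrated with a small Python sketch. It assumes that AUROC values are standardized within each GO term across methods, using the sample standard deviation, and that methods are then compared by Euclidean distance over their scaled profiles. The helper names and all numbers are hypothetical, and the sketch stops at the distance matrix rather than reproducing the clustering itself.

```python
import math
from itertools import combinations
from statistics import mean, stdev

def zscore(values):
    """Scale values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

methods = ["PCC", "SCC", "CLR"]
# Hypothetical AUROC values: one row per GO term, one column per method.
auroc_by_term = [
    [0.70, 0.72, 0.60],
    [0.65, 0.66, 0.75],
    [0.80, 0.78, 0.62],
]

# Scale within each GO term, then collect one profile per method.
scaled_rows = [zscore(row) for row in auroc_by_term]
profiles = {m: [row[i] for row in scaled_rows] for i, m in enumerate(methods)}

# Pairwise distances between method profiles; smaller means more similar.
distances = {
    (a, b): euclidean(profiles[a], profiles[b])
    for a, b in combinations(methods, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

In this toy input the two correlation methods end up much closer to each other than either is to CLR, which is the kind of grouping a hierarchical clustering of such distances would pick up.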


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
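
The legend reports that the fits were computed with lm() in R. As a rough equivalent, the Python sketch below fits average AUROC against the natural logarithm of sample size by ordinary least squares and reports R-squared. The function name and the input values are hypothetical and are not results from the study; the p-value that lm() also reports is omitted here.

```python
import math

def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of mean_auroc = intercept + slope * ln(sample_size)."""
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_aurocs)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared

# Hypothetical values for illustration only.
sizes = [12, 50, 200, 1266]
aurocs = [0.58, 0.62, 0.65, 0.70]
intercept, slope, r_squared = fit_log_model(sizes, aurocs)
print(f"slope={slope:.4f}, R^2={r_squared:.3f}")
```

A positive slope on the log scale means each multiplicative increase in sample size adds a roughly constant increment to average AUROC, so gains shrink in absolute terms as the dataset grows.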



Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.


Fig 7. A and B, Venn diagrams comparing the MS, PA and SA networks. C, GO term enrichment graph for the hub genes (terms include chromatin assembly, DNA replication, gene silencing and DNA repair). D, GO term enrichment graph for the cell wall genes (terms include cell wall biogenesis, cellulose biosynthetic process and phenylpropanoid metabolic process).

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
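
The edge overlap summarized in the Venn diagrams can be computed with plain set operations. The sketch below is a minimal illustration that assumes each network is available as a mapping from gene pairs to edge weights and that edges are undirected; the function name, gene identifiers and weights are hypothetical.

```python
def top_edges(scores, k):
    """Return the k highest-weighted edges as a set of undirected gene pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    # Sort each pair so that (a, b) and (b, a) count as the same edge.
    return {tuple(sorted(edge)) for edge, _ in ranked}

# Hypothetical toy networks standing in for PA, SA and MS.
pa = {("g1", "g2"): 0.90, ("g2", "g3"): 0.80, ("g1", "g3"): 0.20}
sa = {("g2", "g1"): 0.95, ("g3", "g2"): 0.70, ("g3", "g4"): 0.10}
ms = {("g1", "g2"): 0.50, ("g3", "g4"): 0.40, ("g2", "g3"): 0.30}

k = 2
shared = top_edges(pa, k) & top_edges(sa, k) & top_edges(ms, k)
print(sorted(shared))
```

The same intersection applied to the sets of highest-degree genes, rather than edges, gives the gene-level overlap in panel A.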


Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET single (MS) networks queried by 16 cell wall pathway genes. B, Network retrieved from the CORNET database queried by the same 16 cell wall pathway genes. C, Network retrieved from the STRING database queried by the same 16 cell wall pathway genes. In all panels, red nodes are the 16 query genes, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.


Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: 10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: a matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. EURASIP J Bioinform Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: a R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Front Genet 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, De Leon N, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


CSC method, CPM and RPKM normalization methods had higher AUROC values than VST (Fig. 2C). Using two models of ARACNE (additive, AA, and multiplicative, MA), the co-expression matrices contained less than 0.5% non-zero values for all comparisons, and so these techniques were not included in any additional analyses. In conclusion, our results indicated that the widely used correlation methods resulted in a more predictive maize GCN from a single expression matrix, but co-expression with some individual genes may be better detected using MI methods. Normalization method did not have a substantial influence on GCN performance, so only CPM normalization was used in conjunction with PCC, SCC, MRNET and CLR inference for subsequent optimization of other parameters.
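
As a rough illustration of the CPM-plus-correlation setup chosen above, the sketch below scales a toy count matrix to counts per million and scores one gene pair with PCC and SCC. The count values and all function names are hypothetical, and any filtering or transformation steps described in the Methods are not reproduced here.

```python
from math import sqrt


def cpm(counts):
    """Scale raw counts (rows = genes, columns = libraries) to counts per million."""
    n_libs = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_libs)]
    return [[row[j] * 1e6 / lib_sizes[j] for j in range(n_libs)] for row in counts]


def pearson(x, y):
    """Pearson correlation coefficient (PCC); undefined for constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation coefficient (SCC): PCC computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Toy count matrix (made-up numbers): 3 genes x 4 libraries.
toy_counts = [
    [10, 20, 30, 40],
    [15, 25, 45, 85],
    [75, 55, 25, 75],
]
norm = cpm(toy_counts)
pcc_01 = pearson(norm[0], norm[1])
scc_01 = spearman(norm[0], norm[1])
```

Applying `pearson` or `spearman` to every gene pair yields the co-expression matrix; for genome-scale data a vectorized library would be the practical choice.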

Increasing Sample Size Had a Positive Effect on GCN

GCN analysis can be accomplished with a variable number of samples and datasets, but sample size can influence the quality of the resulting GCN (Wei et al., 2004; Ballouz et al., 2015). Separate analyses were conducted with different numbers of samples and experiments to empirically determine the effect of sample number on GCN effectiveness. The data in our analysis consisted of 17 experiments, each including between 12 and 404 libraries. For this analysis, CPM normalization followed by each of four inference methods (PCC, SCC, MRNET and CLR) was applied to the 17 experiments, and the 68 resulting networks were evaluated by both GO and PPPTY.

From GO and PPPTY evaluation, all algorithms exhibited a positive linear relationship between natural-log-transformed sample size and average AUROC values (Fig. 4). The linear relationships were stronger for the PCC and SCC methods, with higher r-squared values, indicating that correlation methods benefit more from increasing sample size. Thus, for building correlation-based GCNs, as many samples as possible should be included. We also found that, as seen for the total GCN analysis, PCC and SCC had higher average AUROC values than the MRNET and CLR methods in PPPTY and GO analysis for most of the individual networks (Fig. 5).
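
The relationship described above, average AUROC regressed on the natural log of sample size, can be sketched with an ordinary least-squares fit. The function name is hypothetical and the data points are synthetic, generated only to exercise the code; they are not values from Fig. 4.

```python
from math import log


def fit_log_linear(sample_sizes, aurocs):
    """Least-squares fit of AUROC = intercept + slope * ln(sample size).

    Returns (slope, intercept, r_squared).
    """
    x = [log(n) for n in sample_sizes]
    n = len(x)
    mx, my = sum(x) / n, sum(aurocs) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, aurocs))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, aurocs))
    ss_tot = sum((b - my) ** 2 for b in aurocs)
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic example: AUROC values constructed to lie exactly on a log-linear line.
sizes = [12, 50, 200, 404]
synthetic_auroc = [0.5 + 0.02 * log(n) for n in sizes]
slope, intercept, r2 = fit_log_linear(sizes, synthetic_auroc)
```

A higher r-squared for one inference method than another would then be read, as in the text, as that method benefiting more consistently from added samples.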

Ranked Aggregation of Networks Improved Performance of GCNs

Ranked aggregation for meta-analysis can also change the outcome of a GCN by buffering the effect of sample heterogeneity (Zhong et al., 2014; Wang et al., 2015a; Asnicar et al., 2016). Aggregated rank-standardized correlation/MI matrices were calculated from separate experiments to determine if this approach enhanced GCN performance. Aggregating individual networks together for meta-analysis can help to highlight true co-expression interactions and reduce noise (Zhong et al., 2014; Wang et al., 2015a; Wang et al., 2015b). This analysis was conducted with the 17 differently sized experiments using the PCC, SCC, MRNET and CLR methods for GCN inference as before, resulting in 68 single GCNs. The 17 experiments were aggregated for PCC, SCC, MRNET and CLR individually and evaluated by the GO and PPPTY datasets.

Of the four aggregated networks that were evaluated, the two correlation methods (PCC and SCC) had higher AUROC values than the single network from 1266 samples (Fig. 6 and Supplemental Fig. 6). However, this aggregation strategy did not result in significantly higher AUROC scores for the MRNET and CLR networks compared with single networks from 1266 samples (two-tailed Wilcoxon rank test for GO evaluation, p-values 0.494 and 0.796). It has been reported that MI estimation accuracy is dependent on sample size (Gao et al., 2015); therefore, individual MI networks built with a small number of libraries may not demonstrate improved accuracy from aggregation. In conclusion, the PCC/SCC-built GCN performed best using a ranked aggregation strategy, and use of this strategy in combination with the other optimized parameters creates a robust GCN.
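
One simple way to picture ranked aggregation is to convert each experiment's edge scores to standardized ranks and then average those ranks per edge. The sketch below assumes exactly that scheme; the precise standardization used in the Methods may differ, and the function names and toy scores are hypothetical.

```python
def rank_standardize(scores):
    """Map each edge score to a rank in (0, 1]; the strongest edge gets 1.0.

    Ties are broken arbitrarily here, which is a simplification.
    """
    edges = sorted(scores, key=scores.get)
    n = len(edges)
    return {edge: (i + 1) / n for i, edge in enumerate(edges)}


def aggregate(networks):
    """Average rank-standardized scores across networks.

    An edge missing from a network contributes 0.0 for that network.
    """
    ranked = [rank_standardize(net) for net in networks]
    all_edges = set().union(*ranked)
    return {e: sum(r.get(e, 0.0) for r in ranked) / len(ranked) for e in all_edges}


# Two toy single-experiment networks (made-up scores).
net1 = {("a", "b"): 0.9, ("a", "c"): 0.1}
net2 = {("a", "b"): 0.8, ("a", "c"): 0.7}
merged = aggregate([net1, net2])
```

Because only ranks enter the average, an experiment with systematically inflated raw correlations cannot dominate the aggregate, which is one reading of the buffering effect described above.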

The Performance of Protein Networks Did Not Exceed Aggregation Networks

In many cases, mRNA levels in a cell are of interest because mRNA level is thought to be related to the level and function of a protein of interest. However, many researchers have found inconsistencies between mRNA and protein levels (Baerenfaller et al., 2008; Schwanhäusser et al., 2011; Ponnala et al., 2014; Walley et al., 2016). Although relatively little protein expression data is available, this data is amenable to GCN construction and could represent a more direct reflection of interacting proteins. Using a non-modified protein expression atlas from 23 maize tissues based upon mass spectrometry data (Walley et al., 2016), four protein networks were built with PCC, SCC, MRNET and CLR separately and then evaluated using the same PPPTY and GO datasets as previously mentioned.

GCNs constructed from protein expression did not exhibit superior AUROC values to those observed for the RNA-Seq-based GCN using the aggregation strategy (Fig. 6). When evaluated by the GO and PPPTY datasets, the performance of the protein network was lower than that of the aggregated network as well as the single network from 1266 samples. To confirm this result, a two-way ANOVA was computed with pairwise comparison for the GO evaluation, which showed that the effect of network type was significant (Supplemental Table S3). A subsequent pairwise comparison using the Wilcoxon rank sum test indicated that the PCC/SCC methods were significantly better than MRNET/CLR (Supplemental Table S3), although MI methods may be superior for some types of interactions.

The raw protein expression data included 17862 genes, of which 11429 genes overlapped with our RNA-Seq-based network and were therefore used for the analysis. To demonstrate that the performance of the protein network was not biased by the selection of genes, the PCC method was applied to all 17862 genes to construct a protein network (Supplemental Fig. 7). No improvement could be detected for the protein network derived from 17862 genes, with p-values of 0.635 for GO evaluation and 0.995 for PPPTY evaluation from a one-sided Wilcoxon rank sum test.
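
AUROC is the yardstick in all of the comparisons above. As a minimal sketch, it can be computed from the rank sum of the known positives (the Mann-Whitney formulation); how scores and gold-standard labels are assembled per gene or per GO term is described in the Methods and is not reproduced here. The function names and toy values are hypothetical.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum identity.

    scores: co-expression scores; labels: 1 for gold-standard positives, else 0.
    Requires at least one positive and one negative.
    """
    r = ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(ri for ri, lab in zip(r, labels) if lab)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (made-up scores): positives ranked first, last, and interleaved.
toy_scores = [0.9, 0.8, 0.3, 0.1]
perfect = auroc(toy_scores, [1, 1, 0, 0])
inverted = auroc(toy_scores, [0, 0, 1, 1])
mixed = auroc(toy_scores, [1, 0, 1, 0])
```

An AUROC of 0.5 corresponds to random ranking, so the differences between network types reported here are differences in how far each network lifts known interactions above that baseline.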

PCC- and SCC-built GCNs Exhibit Identical Topological and Functional Properties

In addition to evaluation of network performance based upon biological characteristics, networks can be compared based upon several different network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules, and number of genes in the largest module. Number of nodes is a basic construct in graph theory, depicting the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in networks. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics such as protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC- and SCC-built ranked aggregation networks from 17 experiments (PA and SA) and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.
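
To make heterogeneity and centralization concrete, the sketch below computes them for an unweighted edge list using the degree-based definitions commonly attributed to Dong and Horvath (2007): density = mean(k)/(n-1), centralization = n/(n-2) * (max(k)/(n-1) - density), and heterogeneity = sd(k)/mean(k). Using the population variance is an assumption, as is the function name; the tool used in the paper may differ in such details.

```python
from math import sqrt
from statistics import mean, pvariance


def network_concepts(edges):
    """Density, centralization and heterogeneity of an undirected, unweighted network.

    Only nodes that appear in at least one edge are counted; needs >= 3 nodes.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    k = list(degree.values())
    n = len(k)
    k_mean = mean(k)
    density = k_mean / (n - 1)
    centralization = n / (n - 2) * (max(k) / (n - 1) - density)
    heterogeneity = sqrt(pvariance(k)) / k_mean  # population variance assumed
    return {
        "nodes": n,
        "density": density,
        "centralization": centralization,
        "heterogeneity": heterogeneity,
    }


# A star (one hub, four leaves) versus a ring of four equally connected nodes.
star = network_concepts([("hub", x) for x in "abcd"])
ring = network_concepts([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
```

The star reaches the maximum centralization of 1 while the ring scores 0 on both measures, which matches the intuition that high values point to a few dominant hubs or a strongly imbalanced degree distribution.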

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the observed highest centralization value for this network. Centralization and heterogeneity are two measures that model the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity, while a network with larger values of centralization and heterogeneity may either contain a larger number of hubs or contain a modest number of hubs with an extremely imbalanced degree distribution. In biological networks, many observations connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8). The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 835 unique genes (Fig. 7A). 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of top interacting genes (27/148), which was expected because of their cellular abundance and involvement with translation. Interestingly, nine epigenetic regulators were found in the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a) and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).
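
The power-law fits mentioned in this section can be approximated by a straight-line fit on log-log axes: the number of nodes with degree k is regressed on k after taking logarithms of both, and the r-squared of that line is reported. The sketch below assumes this simple approach, which may differ from the fitting procedure of the tool used for the supplemental figures; the function name and degree list are hypothetical.

```python
from collections import Counter
from math import log


def power_law_fit(degrees):
    """Fit count(k) ~ c * k**exponent by least squares on log-log axes.

    degrees: one positive integer per node. Returns (exponent, r_squared).
    """
    counts = Counter(degrees)
    x = [log(k) for k in counts]
    y = [log(c) for c in counts.values()]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    exponent = sxy / sxx
    intercept = my - exponent * mx
    ss_res = sum((b - (intercept + exponent * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return exponent, 1 - ss_res / ss_tot


# Synthetic degree list built so that count(k) = 1000 / k**2 exactly.
toy_degrees = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
exponent, r2 = power_law_fit(toy_degrees)
```

Real degree distributions are noisier than this synthetic one, which is why a threshold such as r-squared over 0.7 is used as evidence of an approximately scale-free architecture.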

To reveal the underlying properties of GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules than the PA and SA networks. Consequently, most genes in the MS network clustered into one very large module of 14054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Tables S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, and so these methods appear to be better for module detection.

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA and MS were merged together, and the intersection of the three produced a merged network of 14277 genes and 106591 interactions. PA and SA shared 83.5% of common interactions within the networks, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
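
The merging step can be pictured as a set intersection: keep the top-scoring edges of each network and retain only those present in all of them. The sketch below assumes undirected edges keyed by gene pairs; the function names and toy scores are hypothetical.

```python
def top_edges(scores, n):
    """Return the n highest-scoring edges as order-independent gene pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {frozenset(edge) for edge, _ in ranked[:n]}


def merge_by_intersection(networks, n):
    """Edges found among the top n of every network."""
    tops = [top_edges(net, n) for net in networks]
    return set.intersection(*tops)


# Toy stand-ins for the PA, SA and MS networks (made-up scores).
pa = {("g1", "g2"): 0.9, ("g1", "g3"): 0.8, ("g2", "g3"): 0.1}
sa = {("g2", "g1"): 0.7, ("g1", "g3"): 0.6, ("g2", "g3"): 0.2}
ms = {("g1", "g2"): 0.5, ("g2", "g3"): 0.4, ("g1", "g3"): 0.3}
high_confidence = merge_by_intersection([pa, sa, ms], n=2)
```

Storing each edge as a `frozenset` makes ("g1", "g2") and ("g2", "g1") compare equal, so the orientation in which an inference method reports a pair does not affect the intersection.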

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to the existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected, cell wall-related GO terms were enriched for these 214 genes (Fig. 7D; Supplemental Table S9).
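
Extracting a guide-gene subnetwork amounts to keeping every edge that touches at least one guide gene. The sketch below assumes that simple first-neighbor rule; the actual analysis criteria are given in the Methods, and the function name and gene labels here are hypothetical placeholders.

```python
def guide_subnetwork(edges, guides):
    """Return (genes, edges) for all edges incident to at least one guide gene."""
    guide_set = set(guides)
    sub_edges = [(u, v) for u, v in edges if u in guide_set or v in guide_set]
    genes = {gene for edge in sub_edges for gene in edge}
    return genes, sub_edges


# Toy merged network with two guide genes (placeholder names).
toy_edges = [("guide1", "x"), ("x", "y"), ("guide2", "z"), ("guide1", "guide2")]
genes, sub_edges = guide_subnetwork(toy_edges, ["guide1", "guide2", "guide3"])
```

A guide gene with no retained edges, such as `guide3` here, simply does not appear in the result, mirroring the two guide genes that had no co-expressed partners in the merged network.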

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database using BLASTP to retrieve homologs and their annotations. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The results of the literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall-related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1 and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14 and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6) and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes; 48% (36/75) of co-expressed genes were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial on how to use Cytoscape to visualize any co-expressed genes in our network is provided as supplemental material (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, VST, CPM and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, on which normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study of human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods assert different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR together did not result in a network that outperformed PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq based maize GCN. This optimization may apply to a range of datasets that share characteristics with maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Processing

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). Adapters were trimmed from the fastq files with Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked with FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). Twenty-six libraries with fewer than 5 million total reads and 11 libraries with an overall alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined with Snakemake v3.7.1 (Köster and Rahmann, 2012).

Gene Count Normalization

The expression data were normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase per Million (RPKM) were calculated with the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(x + 1), where x is the CPM or RPKM value). For both methods, scale factors between samples were estimated by the Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated with the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in further analysis (15116 genes).
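
The CPM and RPKM definitions and the expression filter described above can be illustrated with a minimal Python sketch. This is not the edgeR code used in the analysis: it divides by the raw library size rather than a TMM-scaled effective library size, and the function names and toy numbers are assumptions made for illustration only.

```python
import math

def cpm(counts, lib_size):
    """Counts per million for one library, given its total mapped read count."""
    return [c * 1e6 / lib_size for c in counts]

def rpkm(counts, gene_lengths_bp, lib_size):
    """Reads per kilobase per million: counts scaled by library size and gene length."""
    return [c * 1e9 / (lib_size * length) for c, length in zip(counts, gene_lengths_bp)]

def log2_plus_one(values):
    """The log2(x + 1) transform applied after CPM or RPKM."""
    return [math.log2(v + 1) for v in values]

def passes_expression_filter(gene_cpm_across_samples, min_cpm=2, min_samples=1000):
    """True if the gene exceeds min_cpm in more than min_samples libraries."""
    n_expressed = sum(1 for v in gene_cpm_across_samples if v > min_cpm)
    return n_expressed > min_samples

# Toy library (hypothetical): three genes, 1000 mapped reads in total.
counts = [10, 0, 990]
lengths = [1000, 2000, 500]
toy_cpm = cpm(counts, lib_size=1000)
toy_rpkm = rpkm(counts, lengths, lib_size=1000)
toy_log_cpm = log2_plus_one(toy_cpm)
```

Adding one before the log transform keeps genes with zero counts at an expression value of zero instead of negative infinity.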

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated with the cor() function. Kendall rank Correlation Coefficient (KCC) was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient (GCC) was calculated with the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation (BIC) was computed with the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine Similarity Coefficient (CSC) was computed with the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value receiving rank one. Ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights range from zero to one.
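
The rank standardization described above can be sketched in Python as follows. The handling of tied weights is not stated in the text, so the use of average ranks here is an assumption, as are the function name and the toy matrix.

```python
def rank_standardize(matrix):
    """Replace each weight by rank / (number of matrix elements); diagonal set to 1.

    The smallest weight receives rank 1. Ties receive their average rank,
    which is an assumption (the tie rule is not given in the text).
    """
    n = len(matrix)
    flat = [(matrix[i][j], i, j) for i in range(n) for j in range(n)]
    flat.sort(key=lambda t: t[0])
    total = len(flat)
    out = [[0.0] * n for _ in range(n)]
    k = 0
    while k < total:
        # Find the end of the run of values tied with flat[k].
        end = k
        while end + 1 < total and flat[end + 1][0] == flat[k][0]:
            end += 1
        avg_rank = ((k + 1) + (end + 1)) / 2  # 1-based average rank of the run
        for _, i, j in flat[k:end + 1]:
            out[i][j] = avg_rank / total
        k = end + 1
    for i in range(n):
        out[i][i] = 1.0
    return out

# Toy symmetric correlation matrix (hypothetical values).
weights = [
    [1.0, 0.2, -0.5],
    [0.2, 1.0, 0.8],
    [-0.5, 0.8, 1.0],
]
standardized = rank_standardize(weights)
```

Because only the ordering of the weights is kept, networks built from correlation coefficients and from mutual information end up on the same zero-to-one scale.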

Network Performance Evaluation

To generate random networks, gene IDs were shuffled randomly in CPM- or VST-normalized expression matrices. Networks were then inferred from the randomized expression matrices by the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.
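
One way to picture the gene ID shuffling is the short Python sketch below. It assumes the expression matrix is held as a mapping from gene ID to expression vector; the function name, the seed argument and the toy data are hypothetical and are not taken from the analysis code.

```python
import random

def shuffle_gene_ids(expr, seed=None):
    """Return a copy of expr in which gene IDs are randomly reassigned to rows.

    Expression vectors are left untouched; only the ID attached to each
    vector changes, which breaks the link between profiles and annotations.
    """
    rng = random.Random(seed)
    gene_ids = list(expr)
    shuffled_ids = gene_ids[:]
    rng.shuffle(shuffled_ids)
    return {new_id: expr[old_id] for old_id, new_id in zip(gene_ids, shuffled_ids)}

# Toy expression matrix (hypothetical): gene ID -> expression across three libraries.
expr = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [0.5, 0.1, 0.9],
    "geneC": [4.0, 4.2, 3.9],
}
randomized = shuffle_gene_ids(expr, seed=1)
```

A network inferred from such a matrix has the same weight distribution as the real one, so its AUROC gives a baseline for what the evaluation scores look like when annotations carry no information.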


Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions, defined as those ranking in the top 5% of their results, were used for evaluation. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 were downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated with the EGAD package (Ballouz et al., 2016) in R. EGAD uses the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes' GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).

Definition of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts two genes are co-expressed but they are not; TN, a network predicts two genes are not co-expressed and they are not co-expressed in PPPTY; FN, a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts a gene does not have a specific GO term and it does not have that term in our GO dataset; FN, a network predicts a gene does not have a specific GO term but it has that GO term in our GO dataset.
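
As an illustration of how an AUROC summarizes these four outcomes across all possible thresholds, the Python sketch below computes it as the probability that a randomly chosen positive pair receives a higher network weight than a randomly chosen negative pair. The evaluations in this study used the ROCR and EGAD packages in R; this function, its name and the toy values are assumptions for illustration only.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    scores: network weights for gene pairs; labels: 1 if the pair is
    co-expressed in the reference dataset, else 0. Ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical): four gene pairs, two of them true interactions.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
toy_auc = auroc(toy_scores, toy_labels)
```

The pairwise loop is quadratic in the number of examples, so it suits small illustrations only; a value of 0.5 corresponds to a random ranking and 1.0 to a perfect one.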

Network Clustering and Characterization

For each network, the top 1 million edges were selected as a stringent co-expression network. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted with the NetworkAnalyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) with MCL v14-137, with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.


Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, and a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
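
The hypergeometric enrichment test can be written out directly, as in the Python sketch below. AgriGO performed the actual calculation in this study; the function name, argument names and toy numbers here are assumptions, and the Yekutieli correction is not included.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated with the GO term,
    n: genes in the query list, k: query genes annotated with the GO term.
    """
    upper = min(n, K)
    # math.comb returns 0 when more items are requested than are available,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Toy example (hypothetical): background of 10 genes, 4 carry the term,
# the query list has 3 genes and 2 of them carry the term.
toy_p = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
```

Exact integer arithmetic is used until the final division, which avoids rounding problems in the binomial coefficients for backgrounds as large as the 15116 genes used here.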

Database Comparison on Cell Wall Pathway

Sixteen well-characterized components of cell wall biosynthesis (Penning et al., 2009; Bosch et al., 2011) (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) through its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05 and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4, with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were blasted against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.
Supplemental Figure 2. Distribution of gene expression values.
Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.
Supplemental Figure 4. Pairwise comparison among results of inference methods.
Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY gene set (ALL_1720) and of the genes with highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS).
Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network).
Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. 16 query genes in maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.
Supplemental Dataset S2. Tutorial: Visualizing Co-expression data in Cytoscape.

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; S1266 included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, "1266"), aggregated network (grey, "agg") and protein network (dark grey, "pr") using PCC, SCC, MRNET and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for the 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files) and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM, Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC, Multi-parent Advanced Generation InterCrosses. Genotypes represented by more than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2 transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal blocks, between the AUROC scatter plots and the PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding Pearson correlation coefficients shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal blocks, between the AUROC scatter plots and the PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding Pearson correlation coefficients shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY gene set (ALL_1720) and of the genes with highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2" and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network are labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. The red and blue curves show the power-law fitted distributions. The R2 value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R2 indicated beside it.

Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are light grey nodes.


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year and generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
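For readers unfamiliar with the AUROC metric used throughout these panels, the following minimal sketch computes it from ranked scores with the Mann-Whitney U statistic. It is a generic illustration, not the evaluation pipeline used in the study; the function name `auroc` and the example scores are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: higher means more likely positive; labels: 1 (positive) or 0.
    Tied scores receive average ranks.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# Hypothetical co-expression scores for four genes; 1 marks known annotation.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

An AUROC of 1 means every annotated gene outranks every unannotated gene, 0.5 corresponds to random ranking.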


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
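The scaling and distance steps named in this legend can be sketched as follows. The sketch assumes row-wise z-scores with the sample standard deviation and plain Euclidean distance between method columns; the legend does not specify these details, and the function names and AUROC values below are hypothetical. The resulting distance matrix is the kind of input a clustering routine would take.

```python
import math

def zscore_rows(matrix):
    """Scale each row to mean 0 and sample standard deviation 1."""
    scaled = []
    for row in matrix:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / (len(row) - 1)
        sd = math.sqrt(var)
        # A constant row carries no ranking information; map it to zeros.
        scaled.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return scaled

def column_distances(matrix):
    """Pairwise Euclidean distance between the columns of a matrix."""
    n_cols = len(matrix[0])
    dist = [[0.0] * n_cols for _ in range(n_cols)]
    for a in range(n_cols):
        for b in range(a + 1, n_cols):
            d = math.sqrt(sum((row[a] - row[b]) ** 2 for row in matrix))
            dist[a][b] = dist[b][a] = d
    return dist

# Hypothetical AUROC values: rows are GO terms, columns are three methods.
auroc_values = [
    [0.70, 0.72, 0.60],
    [0.80, 0.82, 0.65],
    [0.55, 0.56, 0.75],
]
scaled = zscore_rows(auroc_values)
dist = column_distances(scaled)
```

In this toy matrix the first two methods behave alike across GO terms, so their distance is small relative to either method's distance from the third.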


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against the natural logarithm of sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against the natural logarithm of sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
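The logarithmic model in this legend has the form average AUROC = intercept + slope × ln(sample size). The sketch below fits that form by ordinary least squares and reports R-squared, which is what a linear model on log-transformed sample size would estimate. It is written in Python for illustration and omits the p-value; the function name `fit_log_model` and the AUROC values are hypothetical and are not results from the study.

```python
import math

def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of mean_auroc = intercept + slope * ln(sample_size)."""
    xs = [math.log(n) for n in sample_sizes]
    ys = list(mean_aurocs)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sample sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # R-squared is undefined when every response value is identical.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return intercept, slope, r_squared

# Hypothetical average AUROC values for four network sizes.
sizes = [12, 100, 500, 1266]
aurocs = [0.60, 0.66, 0.69, 0.72]
intercept, slope, r_squared = fit_log_model(sizes, aurocs)
```

A positive slope with a high R-squared would indicate that performance grows roughly linearly with the logarithm of sample size, so each doubling of samples adds a similar increment.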


Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and protein networks (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and protein networks (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.


Fig. 7 panel labels. A and B, Venn diagrams for the MS, PA and SA networks. C (Hub), enriched GO terms include chromatin assembly, nucleosome organization, DNA replication, DNA repair, gene silencing and translation. D (Cell Wall), enriched GO terms include cellulose biosynthetic process, cell wall biogenesis, glucan metabolic process, phenylpropanoid biosynthetic process and root morphogenesis.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 most highly interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
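The edge overlap summarized in panel B can be sketched as a set intersection over the top-ranked edges of each network. The sketch assumes that edges are undirected and ranked by a single score per edge; the function names, gene identifiers and scores are hypothetical, and the toy cutoff of two edges stands in for the 1 × 10^6 edges used in the figure.

```python
def top_edges(scored_edges, n):
    """Return the n highest-scoring edges as a set of undirected gene pairs."""
    ranked = sorted(scored_edges, key=lambda e: e[2], reverse=True)
    # Sorting each pair makes (a, b) and (b, a) the same undirected edge.
    return {tuple(sorted((u, v))) for u, v, _ in ranked[:n]}

def shared_edges(*edge_sets):
    """Edges present in every one of the given networks."""
    return set.intersection(*edge_sets)

# Hypothetical scored edges (gene_a, gene_b, score) for three toy networks.
pa = [("g1", "g2", 0.9), ("g2", "g3", 0.8), ("g1", "g4", 0.2)]
sa = [("g2", "g1", 0.7), ("g3", "g2", 0.6), ("g4", "g5", 0.5)]
ms = [("g1", "g2", 0.4), ("g5", "g6", 0.3), ("g2", "g3", 0.1)]

top = [top_edges(net, 2) for net in (pa, sa, ms)]
common = shared_edges(*top)
```

Only the edge between g1 and g2 ranks in the top two of all three toy networks, so it is the single member of the three-way intersection.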


Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes); node and line colors are as in A. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes); node and line colors are as in A.


Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EurasipJ Bioinforma Syst Biol doi 101155200779879

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional networks using mutualinformation BMC Bioinformatics 9 461

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al(2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejiacutea-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B EmilianiJ et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithmclustering plugin for Cytoscape BMC Bioinformatics 12 436

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq NatMethods 5 621-628

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Acids Res 37 D987-D991Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biologicalinformation in plants Bioinformatics 26 1267-1268

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics4 33

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-awareRNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009)Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencingexperiments Bioinformatics 29 2146-2152

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital geneexpression data Bioinformatics 26 139-140

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sales G Romualdi C (2011) parmigenemdasha parallel R package for mutual information estimation and gene network reconstructionBioinformatics 27 1876-1877

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicateaccuracy in a sea of changing annotations Database J Biol databases curation 2016

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro aplatform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

COB the co-expression browser PLoS One doi 101371journalpone0099193Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R packagePubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112-1115Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schwanhaumlusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian geneexpression control Nature 473 337-342

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mousecrosses Frontiers in Genetics 20134291

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges FrontPlant Sci 7 444

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a softwareenvironment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes withoutmeiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model basedindices BMC Bioinformatics 13 328

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expressionatlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015)STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression NetworksConstruction and Visualization Tool Front Plant Sci 6 1194

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev MolCell Biol 16 258-264

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omicnetworks in a developmental atlas of maize Science 353 814-818

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics16 S6

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170 pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Of the four aggregated networks that were evaluated, the two correlation methods (PCC and SCC) had higher AUROC values than the single network from 1266 samples (Figure 6 and Supplemental Fig. 6). However, this aggregation strategy did not result in significantly higher AUROC scores for the MRNET and CLR method networks compared with single networks with 1266 samples (two-tailed Wilcoxon rank test for GO evaluation, p-values 0.494 and 0.796). It has been reported that MI estimation accuracy is dependent on sample size (Gao et al., 2015); therefore, individual MI networks built with a small number of libraries may not demonstrate improved accuracy from aggregation. In conclusion, the PCC/SCC-built GCN performed best using a ranked aggregation strategy, and use of this strategy in combination with the other optimized parameters creates a robust GCN.
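
The ranked aggregation idea can be illustrated with a minimal Python sketch. It assumes, which this section does not state, that each experiment yields its own table of edge weights and that aggregation means averaging the rank-normalized weight of each gene pair across experiments. The function names and toy values are hypothetical, and the analysis reported here was carried out in R.

```python
from collections import defaultdict

def rank_normalize(weights):
    """Map raw edge weights to ranks in (0, 1]; the smallest weight gets 1/n."""
    ordered = sorted(weights, key=weights.get)
    n = len(ordered)
    return {edge: (i + 1) / n for i, edge in enumerate(ordered)}

def aggregate(networks):
    """Average the rank-normalized weight of each edge across networks."""
    totals = defaultdict(float)
    for net in networks:
        for edge, rank in rank_normalize(net).items():
            totals[edge] += rank
    return {edge: total / len(networks) for edge, total in totals.items()}

# Hypothetical correlation values for three gene pairs in two experiments.
exp1 = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.5}
exp2 = {("g1", "g2"): 0.7, ("g1", "g3"): 0.1, ("g2", "g3"): 0.8}
agg = aggregate([exp1, exp2])
```

Working on ranks rather than raw weights keeps experiments with different weight scales comparable before they are combined.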

The Performance of Protein Networks Did Not Exceed Aggregation Networks

In many cases, mRNA levels in a cell are of interest because mRNA level is thought to be related to the level and function of a protein of interest. However, many researchers have found inconsistencies between mRNA and protein levels (Baerenfaller et al., 2008; Schwanhäusser et al., 2011; Ponnala et al., 2014; Walley et al., 2016). Although relatively less protein expression data is available, this data is amenable to GCN construction and could represent a more direct reflection of interacting proteins. Using a non-modified protein expression atlas from 23 maize tissues based upon mass spectrometry data (Walley et al., 2016), four protein networks were built with PCC, SCC, MRNET and CLR separately and then evaluated using the same PPPTY and GO datasets as previously mentioned.

GCNs constructed from protein expression did not exhibit superior AUROC values to those observed for the RNA-Seq-based GCN using the aggregation strategy (Fig. 6). When evaluated by the GO and PPPTY datasets, the performance of the protein network was lower than that of the aggregated network as well as the single network from 1266 samples. To confirm this result, a two-way ANOVA was computed with pairwise comparison for the GO evaluation, which showed that the effect of network type was significant (Supplemental Table S3). A subsequent pairwise comparison using the Wilcoxon rank sum test indicated that the PCC/SCC methods were significantly better than MRNET/CLR (Supplemental Table S3), although MI methods may be superior for some types of interactions.

The raw protein expression data included 17862 genes, of which 11429 genes overlapped with our RNA-Seq-based network and were therefore used for the analysis. To demonstrate that the performance of the protein network was not biased by the selection of genes, the PCC method was applied to all 17862 genes to construct a protein network (Supplemental Fig. 7). No improvement could be detected for the protein network derived from 17862 genes, with p-values of 0.635 for the GO evaluation and 0.995 for the PPPTY evaluation from a one-sided Wilcoxon rank sum test.

PCC and SCC-built GCN Exhibit Identical Topological and Functional Properties

In addition to evaluation of network performance based upon biological characteristics, networks can be compared based upon several different network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules, and number of genes in the largest module. Number of nodes is a basic construct in graph theory depicting the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in networks. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics like protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC and SCC-built ranked aggregation networks from 17 experiments (PA and SA) and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.
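
To make heterogeneity and centralization concrete, the following Python sketch computes both from an edge list. It follows the degree-based definitions usually attributed to Dong and Horvath (2007): heterogeneity as the standard deviation of node degree divided by its mean, and centralization as n/(n-2) times the difference between the maximum scaled degree and the network density. These formulas are my reading of that reference and not the output of the software used here; the function name and toy graphs are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def degree_stats(edges):
    """Return (n_nodes, heterogeneity, centralization) for an undirected edge list.

    Only nodes that appear in at least one edge are counted; n must exceed 2.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    k = list(degree.values())
    n = len(k)
    density = mean(k) / (n - 1)
    heterogeneity = pstdev(k) / mean(k)  # coefficient of variation of degree
    centralization = n / (n - 2) * (max(k) / (n - 1) - density)
    return n, heterogeneity, centralization

# A star has one dominant hub; a ring has perfectly uniform degrees.
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
star_stats = degree_stats(star)
ring_stats = degree_stats(ring)
```

The star reaches the maximum centralization of 1, while the ring scores 0 on both measures, which mirrors the contrast between a hub-dominated network and an evenly connected one.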

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the observed highest centralization value for this network. Centralization and heterogeneity are two ways to summarize the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity, while a network with larger values of centralization and heterogeneity may either contain a larger number of hubs or have a modest number of hubs but an extremely imbalanced degree distribution. In biological networks, many observations have connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8). The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 835 unique genes (Fig. 7A). 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of top interacting genes (27/148), which was expected because of their cellular abundance and involvement with translation. Interestingly, nine epigenetic regulators were found in the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a) and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).

To reveal the underlying properties of GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules detected than the PA and SA networks. Consequently, most genes in the MS network clustered into one very large module of 14054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Table S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, and so these methods appear to be better for module detection.

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA and MS were merged together, and the intersection of the three produced a merged network of 14277 genes and 106591 interactions. PA and SA shared 83.5% of common interactions within the networks, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
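
The merging step amounts to a set intersection over the top-ranked edges of each network. A minimal Python sketch follows, with edges stored as unordered gene pairs so that (g1, g2) and (g2, g1) count as the same interaction. The function names and toy weights are hypothetical, and the cutoff of 2 stands in for the 1 million edges used in the analysis.

```python
def top_edges(weights, n):
    """Return the n highest-weighted edges as a set of unordered gene pairs."""
    ranked = sorted(weights, key=weights.get, reverse=True)[:n]
    return {frozenset(edge) for edge in ranked}

def merge_networks(networks, n):
    """Keep only the edges found in the top-n list of every network."""
    edge_sets = [top_edges(weights, n) for weights in networks]
    return set.intersection(*edge_sets)

# Toy stand-ins for the PA, SA and MS edge weights.
pa = {("g1", "g2"): 0.99, ("g2", "g3"): 0.95, ("g1", "g4"): 0.40}
sa = {("g2", "g1"): 0.98, ("g2", "g3"): 0.97, ("g3", "g4"): 0.30}
ms = {("g1", "g2"): 0.90, ("g3", "g4"): 0.85, ("g2", "g3"): 0.10}

merged = merge_networks([pa, sa, ms], n=2)
genes = set().union(*merged)  # genes retained in the merged network
```

Requiring an edge to appear in all three lists trades coverage for confidence, which is why the merged network is much smaller than any single input.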

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to the existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected, cell wall related GO terms were enriched for these 214 genes (Fig. 7D; Supplemental Table S9).
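
A guide-gene query of this kind can be sketched in Python as follows. The sketch assumes that extraction means keeping every edge that touches at least one guide gene (first neighbors only), which is an interpretation and not a criterion stated in the text. The gene identifiers are placeholders.

```python
def guide_subnetwork(edges, guides):
    """Return edges touching a guide gene, the genes involved, and unmatched guides."""
    guides = set(guides)
    kept = [(a, b) for a, b in edges if a in guides or b in guides]
    genes = {gene for edge in kept for gene in edge}
    unmatched = guides - genes  # guide genes with no co-expressed partner
    return kept, genes, unmatched

# Placeholder network and guide list.
edges = [("guideA", "x1"), ("x1", "x2"), ("guideB", "x1"), ("x3", "x4")]
kept, genes, unmatched = guide_subnetwork(edges, ["guideA", "guideB", "guideC"])
```

Reporting the unmatched guides separately corresponds to the observation that some guide genes returned no co-expressed partners.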

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database to retrieve homologs and their annotations using BLASTP. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The results of the literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1 and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14 and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6) and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes. 48% (36/75) of co-expressed genes were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial is provided as supplemental material on how to use Cytoscape to visualize any co-expressed genes in our network (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, the VST, CPM and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. Two rank correlations, SCC and KCC, only consider differences in relative rankings, where normalization has a limited effect. The same is true for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study in human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods detect different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed a more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR together did not result in a network that outperformed the PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq based maize GCN. This optimization may apply to a range of datasets that share characteristics with maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Processing

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). The adapters for the fastq files were trimmed by Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked by FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). 26 libraries with fewer than 5 million total reads and 11 libraries with a total alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined by Snakemake v3.7.1 (Köster and Rahmann, 2012).

Gene Count Normalization

The expression data was normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated by the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM + 1) or log2(RPKM + 1)). For both methods, scale factors between samples were estimated by the Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated by the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included for additional analysis (15116 genes).
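
The arithmetic behind the log2-transformed CPM and RPKM values, and the expression filter, can be sketched in plain Python. This is a simplified illustration and not the edgeR implementation: it uses the raw library size and omits the TMM scale factors described above. All names and toy values are hypothetical.

```python
import math

def log2_cpm(counts, library_size):
    """log2(CPM + 1) for each gene in one library."""
    return {g: math.log2(c / library_size * 1e6 + 1) for g, c in counts.items()}

def log2_rpkm(counts, lengths_bp, library_size):
    """log2(RPKM + 1); gene lengths are given in base pairs."""
    return {
        g: math.log2(c / (lengths_bp[g] / 1e3) / (library_size / 1e6) + 1)
        for g, c in counts.items()
    }

def passes_filter(cpm_values, min_cpm=2, min_samples=1000):
    """True if a gene exceeds min_cpm in more than min_samples samples."""
    return sum(v > min_cpm for v in cpm_values) > min_samples

counts = {"geneA": 500, "geneB": 0}
lengths = {"geneA": 2000, "geneB": 1000}
cpm = log2_cpm(counts, library_size=10_000_000)
rpkm = log2_rpkm(counts, lengths, library_size=10_000_000)
```

Adding 1 before taking the logarithm keeps genes with zero counts defined, mapping them to an expression value of 0.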

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated by the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value assigned a rank of one. Ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights ranged from zero to one.
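
The rank normalization of an adjacency matrix can be sketched in Python as below. Two details are assumptions, since the text does not specify them: tied values receive their average rank, and all n × n elements are ranked before the diagonal is overwritten. The function name and toy matrix are hypothetical.

```python
def rank_normalize_matrix(m):
    """Replace each weight by rank / (number of elements); set the diagonal to 1."""
    n = len(m)
    flat = sorted((m[i][j], i, j) for i in range(n) for j in range(n))
    total = n * n
    out = [[0.0] * n for _ in range(n)]
    pos = 0
    while pos < total:
        end = pos
        # Extend the run over all entries tied with the current value.
        while end + 1 < total and flat[end + 1][0] == flat[pos][0]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based average rank of the tied run
        for _, i, j in flat[pos:end + 1]:
            out[i][j] = avg_rank / total
        pos = end + 1
    for i in range(n):
        out[i][i] = 1.0
    return out

# Toy symmetric correlation matrix for three genes.
weights = [[1.0, 0.2, 0.8],
           [0.2, 1.0, 0.5],
           [0.8, 0.5, 1.0]]
ranked = rank_normalize_matrix(weights)
```

Because only the ordering of the weights survives, networks built with very different inference methods end up on the same zero-to-one scale.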

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM or VST normalized expression matrices. The randomized expression matrices were then used for inference by the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.
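
A minimal Python sketch of the randomization step follows, assuming the expression matrix is held as a mapping from gene ID to a vector of sample values. The function name and the seed argument are hypothetical additions for reproducibility.

```python
import random

def shuffle_gene_ids(expr, seed=None):
    """Reassign each expression vector to a randomly permuted gene ID."""
    rng = random.Random(seed)
    ids = list(expr)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    return {new_id: expr[old_id] for old_id, new_id in zip(ids, shuffled)}

expr = {"g1": [1, 2], "g2": [3, 4], "g3": [5, 6]}
randomized = shuffle_gene_ids(expr, seed=1)
```

Shuffling labels preserves the correlation structure of the data while breaking the link between a gene and its annotation, so the resulting AUROC values serve as a baseline.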

Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions were used for evaluation, defined as those ranking in the top 5% of their results. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 was downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. Basically, it utilizes the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes' GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).
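
For readers unfamiliar with AUROC, the quantity can be computed directly from ranks through its equivalence with the Mann-Whitney U statistic. The Python sketch below is an illustration of that identity and not the ROCR or EGAD code; the function name and toy scores are hypothetical.

```python
def auroc(scores, labels):
    """AUROC via the rank-sum identity; tied scores receive average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        for idx in order[pos:end + 1]:
            ranks[idx] = (pos + end) / 2 + 1  # 1-based average rank
        pos = end + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Edge weights for four gene pairs; label 1 marks a known interaction.
mixed = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
perfect = auroc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

The value is the fraction of positive-negative pairs in which the positive example scores higher, so 0.5 corresponds to a random ranking and 1.0 to a perfect one.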

Definition of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN): For the evaluation using the PPPTY dataset, TP: a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP: a network predicts two genes are co-expressed but they are not; TN: a network predicts two genes are not co-expressed and they are not co-expressed in PPPTY; FN: a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset, TP: a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP: a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN: a network predicts a gene does not have a specific GO term and it doesn't have it in our GO dataset; FN: a network predicts a gene does not have a specific GO term but it has that GO term in our GO dataset.
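
The PPPTY definitions translate into a simple tally over gene pairs. The Python sketch below assumes a single cutoff has already turned edge weights into a set of predicted pairs; AUROC in effect repeats this tally across all cutoffs. The function name and toy data are hypothetical.

```python
from itertools import combinations

def confusion_counts(predicted, known, all_pairs):
    """Count TP, FP, TN and FN over every candidate gene pair."""
    tp = fp = tn = fn = 0
    for pair in all_pairs:
        is_predicted = pair in predicted
        is_known = pair in known
        if is_predicted and is_known:
            tp += 1
        elif is_predicted:
            fp += 1
        elif is_known:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

genes = ["g1", "g2", "g3", "g4"]
all_pairs = [frozenset(p) for p in combinations(genes, 2)]
predicted = {frozenset(("g1", "g2")), frozenset(("g1", "g3"))}
known = {frozenset(("g1", "g2")), frozenset(("g3", "g4"))}
counts = confusion_counts(predicted, known, all_pairs)
```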

Network Clustering and Characterization

For each network, the top 1 million edges were selected as stringent co-expression networks. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted by the Network Analyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) by MCL v14-137 with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes involved in our networks were used as background references. Hypergeometric testing was used to calculate p-values, for which a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
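
The hypergeometric enrichment p-value for a single GO term can be written out in a few lines of Python. This is a sketch of the standard upper-tail test, not AgriGO's code, and it leaves out the Yekutieli correction. The argument names are hypothetical; in the setting described above, the background size N would be 15116.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Toy example: 20 background genes, 5 annotated, 4 in the query, 2 of them annotated.
p = hypergeom_pvalue(k=2, n=4, K=5, N=20)
```

Using exact integer binomials avoids floating-point underflow for small tails, at the cost of speed on very large gene sets.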

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05 and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4, with the maximum number of interactors set to 100. 76 genes with 817 interactions were retrieved. Maize proteins were searched against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

588

Supplemental Data 589

Supplemental Figure 1 Pipeline and datasets used for analysis 590

Supplemental Figure 2 Distribution of gene expression values 591

Supplemental Figure 3 Maize CPM-normalized with log2 transformed gene expression from all tissues and 592

developmental stages 593

Supplemental Figure 4 Pairwise comparison among results of inferences methods 594

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Page | 18

Supplemental Figure 5. Characteristics of all 1720 genes in the PPPTY gene set (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET).

Supplemental Figure 6. Evaluation of network performance based on sample size and inference.

Supplemental Figure 7. GCN performance comparison between protein networks.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS).

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network).

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).

Supplemental Table S1. RNA-Seq libraries used in this analysis.

Supplemental Table S2. Random network AUROC value baseline.

Supplemental Table S3. ANOVA tables and pairwise comparisons.

Supplemental Table S4. Topological characteristics of four maize networks.

Supplemental Table S5. Gene Ontology annotation for 148 hub genes.

Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S8. Sixteen query genes in the maize cell wall pathway.

Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.

Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.

Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.

Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.

Supplemental Dataset S1. The merged network in Cytoscape-ready format.

Supplemental Dataset S2. Tutorial: Visualizing co-expression data in Cytoscape.


Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for samples constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for samples constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
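
The interquartile-range outlier rule described in this legend can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function name `tukey_outliers` and the AUROC values are hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which may differ slightly from the quantile definition used by the plotting software.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical AUROC values for one box in the plot.
aurocs = [0.61, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.95]
print(tukey_outliers(aurocs))  # only 0.95 falls outside the fences
```

The multiplier `k` is left as a parameter so that a stricter or looser fence can be tried without changing the function.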


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
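
The scaling and distance steps described in this legend can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper name `zscore` and the AUROC values are hypothetical, scaling is assumed to be applied per GO term (row), and the sample standard deviation is assumed because the legend does not say which estimator was used.

```python
from math import dist
from statistics import mean, stdev

def zscore(row):
    """Scale one row of AUROC values to mean 0 and unit (sample) standard deviation."""
    m, s = mean(row), stdev(row)
    return [(x - m) / s for x in row]

# Hypothetical AUROC values: one row per GO term, one column per inference method.
methods = ["PCC", "SCC", "MRNET"]
auroc = [
    [0.70, 0.72, 0.60],
    [0.65, 0.66, 0.58],
    [0.80, 0.78, 0.69],
]

scaled = [zscore(row) for row in auroc]
# Transpose so that each method becomes one vector across GO terms.
columns = {name: col for name, col in zip(methods, zip(*scaled))}

# Pairwise Euclidean distances, the usual input to hierarchical clustering.
for i, a in enumerate(methods):
    for b in methods[i + 1:]:
        print(a, b, round(dist(columns[a], columns[b]), 3))
```

With these made-up values the two correlation methods end up much closer to each other than either is to MRNET, which is the kind of grouping a distance-based dendrogram would then display.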

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
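
The logarithmic regression described in this legend can be sketched as follows. The authors fitted the models with `lm()` in R; the Python below is only an illustrative ordinary-least-squares equivalent, and the function name `fit_log_model`, the sample sizes, and the AUROC values are hypothetical.

```python
from math import log

def fit_log_model(sample_sizes, aurocs):
    """Fit auroc = intercept + slope * ln(sample_size) by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    xs = [log(n) for n in sample_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(aurocs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, aurocs))
    ss_tot = sum((y - mean_y) ** 2 for y in aurocs)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical average AUROC values for networks of increasing size.
sizes = [12, 24, 48, 96, 404, 1266]
avg_auroc = [0.56, 0.58, 0.60, 0.61, 0.64, 0.66]
intercept, slope, r2 = fit_log_model(sizes, avg_auroc)
print(f"AUROC ~ {intercept:.3f} + {slope:.3f} * ln(n), R^2 = {r2:.3f}")
```

A positive slope on the log scale means each doubling of sample size adds a roughly constant increment to AUROC, so gains shrink in absolute terms as libraries accumulate.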

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; the S1266 network included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, "1266"), aggregated network (grey, "agg"), and protein network (dark grey, "pr") using PCC, SCC, MRNET, and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median, and the star sign is the mean value of each box. Outliers are plotted as grey dots.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars), and protein networks (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars), and protein networks (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for the 214 co-expressed genes queried by 16 cell wall pathway genes.
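
The overlaps shown in these Venn diagrams amount to set intersections. A minimal Python sketch follows, assuming that edges are undirected; the helper `edge_set` and the gene identifiers are hypothetical and are not taken from the networks themselves.

```python
def edge_set(pairs):
    """Normalize undirected edges so that (a, b) and (b, a) compare equal."""
    return {frozenset(pair) for pair in pairs}

# Hypothetical top-ranked edges from three networks.
pa = edge_set([("g1", "g2"), ("g2", "g3"), ("g3", "g4")])
sa = edge_set([("g2", "g1"), ("g3", "g2"), ("g4", "g5")])
ms = edge_set([("g1", "g2"), ("g5", "g4"), ("g2", "g3")])

shared = pa & sa & ms  # edges present in all three networks
print(len(shared))
```

Storing each edge as a `frozenset` avoids double counting when the same gene pair is listed in opposite orders by different networks.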

Figure 8. Cell wall pathway subnetworks. A, Intersections of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by more than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm normalization (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2 transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
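
For reference, the CPM and RPKM normalizations and the normal-density overlay described in this legend can be sketched as follows. This is a minimal Python illustration rather than the authors' pipeline: the function names, the pseudocount of 1 added before the log2 transform, and the example counts and gene lengths are all assumptions. `NormalDist(mu, sigma).pdf(x)` plays the role of R's `dnorm(x, mean, sd)`.

```python
from math import log2
from statistics import NormalDist, mean, stdev

def cpm(counts):
    """Counts per million: scale each gene's count by the library size."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads: CPM divided by gene length in kb."""
    return [v / (length / 1000) for v, length in zip(cpm(counts), lengths_bp)]

# Hypothetical read counts and gene lengths (bp) for one library.
counts = [500, 1500, 3000, 5000]
lengths = [1000, 3000, 1500, 2500]

log_cpm = [log2(v + 1) for v in cpm(counts)]  # pseudocount of 1 is an assumption
# Normal curve with the same mean and standard deviation as the log2 values.
reference = NormalDist(mean(log_cpm), stdev(log_cpm))
for value in log_cpm:
    print(round(value, 2), round(reference.pdf(value), 4))
```

Only RPKM divides by gene length, which is why panel C plots expression against length: it shows whether a length dependence remains after each normalization.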

Supplemental Figure 3. Maize CPM-normalized and log2 transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values calculated by Pearson correlation shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values calculated by Pearson correlation shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Supplemental Figure 5. Characteristics of all 1720 genes in the PPPTY gene set (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). The average expression (in CPM) of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2", and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network are labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. The red and blue curves show the power-law fitted distributions. The R2 value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to a gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R2 indicated beside it.
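
A power-law fit of the kind shown by the red curves can be sketched as follows. The legend does not say how the curves were fitted, so this Python example assumes one common approach, linear least squares on log-log transformed data; the function name `fit_power_law` and the degree counts are hypothetical.

```python
from math import exp, log

def fit_power_law(degrees, node_counts):
    """Fit node_count = a * degree ** b by least squares on log-log data.

    Returns (a, b, r_squared), with r_squared computed in log space.
    """
    xs = [log(d) for d in degrees]
    ys = [log(c) for c in node_counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return exp(log_a), b, 1 - ss_res / ss_tot

# Hypothetical degree distribution: many low-degree genes, few hubs.
degrees = [1, 2, 4, 8, 16, 32]
node_counts = [4000, 1900, 1050, 480, 260, 120]
a, b, r2 = fit_power_law(degrees, node_counts)
print(f"count ~ {a:.0f} * degree^{b:.2f}, R^2 = {r2:.3f}")
```

A negative exponent `b` with a high R2 is the pattern usually read as scale-free behaviour; note that an R2 computed in log space is not directly comparable to one computed on the untransformed counts.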


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R package version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics, pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty 984 DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703ndash1728 985

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing 986 maize leaf Plant J 78 424ndash440 987

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput 988 transcriptome sequencing experiments Bioinformatics 29 2146ndash2152 989

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One. doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
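The outlier rule stated in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' plotting code: it assumes the conventional 1.5 × interquartile range fences, and the function name `iqr_outliers` and the input values are hypothetical. Quartile conventions differ slightly between tools, so the fences may not match a given boxplot implementation exactly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) under the k * IQR rule."""
    # "inclusive" interpolates between observed data points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical numbers for illustration only; these are not AUROC values from the study.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower_fence, upper_fence, flagged = iqr_outliers(example)
print(lower_fence, upper_fence, flagged)
```

Values outside the two fences are the points a boxplot would draw individually.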

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values displayed on a blue-to-red color scale. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
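The scaling and distance steps named in the Figure 3 caption can be illustrated with a short sketch. It assumes that each GO term (row) is scaled across methods to zero mean and unit standard deviation, and that methods (columns) are then compared by Euclidean distance. The caption does not say which axis was scaled, so that choice is an assumption, and the AUROC numbers below are made up. Only the pairwise distances are computed here; a hierarchical clustering step would consume them.

```python
import math
import statistics


def zscore(values):
    """Scale a list of numbers to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


methods = ["PCC", "SCC", "CLR"]
# Hypothetical AUROC values (rows: GO terms, columns: methods); not from the study.
auroc_rows = [
    [0.70, 0.72, 0.60],
    [0.65, 0.66, 0.80],
]

# Scale each row across methods, then read off one profile per method.
scaled_rows = [zscore(row) for row in auroc_rows]
profiles = {m: [row[j] for row in scaled_rows] for j, m in enumerate(methods)}

d_pcc_scc = euclidean(profiles["PCC"], profiles["SCC"])
d_pcc_clr = euclidean(profiles["PCC"], profiles["CLR"])
print(d_pcc_scc, d_pcc_clr)
```

In this toy input the two correlation methods have similar profiles, so their distance is smaller than the distance between PCC and CLR.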

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-square and p-values were calculated with the lm() function in R. Panels are labeled GO PCC, GO SCC, GO MRNET and GO CLR (A) and PPPTY PCC, PPPTY SCC, PPPTY MRNET and PPPTY CLR (B). PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
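The logarithmic regression described in the Figure 4 caption has the form average AUROC = a + b · ln(sample size). The caption states that the fit was done with lm() in R; the sketch below is a plain Python least-squares equivalent for the coefficients and R-square only (no p-value). The function name `fit_log_model` and the input numbers are hypothetical.

```python
import math


def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of mean_auroc = a + b * ln(sample_size).

    Returns (intercept a, slope b, R-square).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_aurocs)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Synthetic data generated from an exact log model, for illustration only;
# these are not measurements from the study.
sizes = [12, 50, 200, 800]
aurocs = [0.55 + 0.02 * math.log(n) for n in sizes]
a, b, r2 = fit_log_model(sizes, aurocs)
print(a, b, r2)
```

A positive slope b under this model means each multiplicative increase in sample size adds a fixed increment to the average AUROC, which is the diminishing-returns shape a logarithmic fit implies.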

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes. In all panels, red nodes are the 16 query genes, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.

Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science (80- ) 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol. doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One. doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


In addition to evaluating network performance based on biological characteristics, networks can be compared based on several network characteristics, including clustering coefficient, number of nodes, network heterogeneity (Dong and Horvath, 2007), network centralization (Dong and Horvath, 2007), number of detected modules, and number of genes in the largest module. Number of nodes is a basic construct in graph theory depicting the scale of a network. Clustering coefficient and number of modules model how densely nodes are connected in networks. Heterogeneity measures the variability of node connections. Centralization indicates how likely it is that some nodes have significantly more connections than average. In this analysis, each gene corresponds to a node. Based on the extensive evaluation using biological characteristics such as protein-protein interactions (PPPTY) and predicted gene function (GO), three final maize networks were selected for comparison of basic network characteristics based on their overall performance: the PCC- and SCC-built ranked aggregation networks from 17 experiments (PA and SA) and the MRNET-built single network from 1266 total samples (MS). The three networks were constrained to include the top one million predicted interactions, or edges.
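As an illustration only, the degree-based characteristics described above can be computed from an edge list as sketched below. The sketch assumes the definitions commonly attributed to Dong and Horvath (2007): density is mean connectivity divided by n - 1, heterogeneity is the coefficient of variation of connectivity, and centralization is n/(n - 2) * (max(k)/(n - 1) - density). The function name `degree_stats` and the toy graphs are hypothetical and are not part of the original analysis, which may have used other software or a different variance convention.

```python
from math import sqrt

def degree_stats(edges):
    """Degree-based summary of an undirected, unweighted network.

    edges: iterable of (gene_a, gene_b) pairs, each edge listed once.
    Assumes at least three nodes and no isolated nodes.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    n = len(degree)
    k = list(degree.values())
    mean_k = sum(k) / n
    var_k = sum((x - mean_k) ** 2 for x in k) / n  # population variance (assumption)
    density = mean_k / (n - 1)
    return {
        "nodes": n,
        "density": density,
        "heterogeneity": sqrt(var_k) / mean_k,
        "centralization": n / (n - 2) * (max(k) / (n - 1) - density),
    }

# Toy networks: a star (one hub, four leaves) and a ring (all degrees equal).
star = [("hub", "g1"), ("hub", "g2"), ("hub", "g3"), ("hub", "g4")]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
star_stats = degree_stats(star)
ring_stats = degree_stats(ring)
```

In the toy star network a single hub dominates, so both heterogeneity and centralization are high, whereas in the ring every node has the same degree and both values are zero. This mirrors the way these two measures are read as indicators of hub genes in the comparison of PA, SA, and MS.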

In prior studies, most biological networks had scale-free architectures, which fit a power-law distribution (Barabasi et al., 2004; Doncheva et al., 2012; Schaefer et al., 2014). For the three final maize networks constructed using optimized parameters, both the neighborhood connectivity distribution (Supplemental Fig. 8) and the node degree distribution (Supplemental Fig. 9) fit power-law models with r-squared values over 0.7. The MS network had the highest network centralization value. The network heterogeneity value of MS was over two times that of PA and SA, indicating that MS may contain more highly interacting genes (Supplemental Table S4), consistent with the observed highest centralization value for this network. Centralization and heterogeneity are two measures that describe the degree distribution of networks. A scale-free network with more hubs has larger values of centralization and heterogeneity. Conversely, a network with larger values of centralization and heterogeneity may contain a larger number of hubs, or the number of hubs may not be significantly large but the degree distribution may be extremely imbalanced. In biological networks, many observations have connected large values of centralization and heterogeneity with more hub genes (Ma and Zeng, 2003; Horvath and Dong, 2008; Iancu et al., 2012; Scott-Boyer et al., 2013), even though theoretically we cannot rule out the possibility that high values resulted from an extremely imbalanced degree distribution. For the MS network, most highly connected genes interacted with a large number of lowly connected genes; this pattern is also reflected in the decreasing neighborhood connectivity distribution for the MS network (Supplemental Fig. 8). The genes with the most interactions are expected to act as key components in GCNs (Langfelder and Horvath, 2008; Allen et al., 2012) and likely represent central regulators of multi-protein biological processes (Ma et al., 2013; Du et al., 2015). The top 1000 interacting genes from all networks were analyzed in more detail, as these were potential "hub" genes that may regulate other expression patterns and processes. PA and SA shared 95% of the top 1000 interacting genes, while MS had 83.5% unique genes (Fig. 7A). A total of 148 genes were shared among all three networks (Supplemental Table S5), making these genes strong candidates for central biological regulators. The annotation of these genes suggests their participation in a range of basic cellular processes (Fig. 7C), including gene expression, DNA replication, translation, and gene silencing (Supplemental Table S5); the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of the top interacting genes (27/148), which was expected because of their cellular abundance and involvement with translation. Interestingly, nine epigenetic regulators were found among the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a), and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).
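A minimal sketch of how a power-law fit and its r-squared value can be obtained from a node degree list is given below. It assumes an ordinary least-squares fit on log10-transformed degree and frequency, which is one common convention; the fits reported in the supplemental figures may have been produced with other software or settings. The function name `power_law_fit` and the synthetic degree list are hypothetical.

```python
from collections import Counter
from math import log10

def power_law_fit(degrees):
    """Fit log10(frequency) = a + b * log10(degree) by least squares.

    Returns (slope b, r_squared). Needs at least two distinct positive
    degrees with differing frequencies.
    """
    counts = Counter(d for d in degrees if d > 0)
    xs = [log10(k) for k in counts]           # log degree
    ys = [log10(c) for c in counts.values()]  # log number of nodes with that degree
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Synthetic degrees whose frequencies follow 1000 * k**-2 exactly.
toy_degrees = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
slope, r_squared = power_law_fit(toy_degrees)
```

For real networks the points scatter around the fitted line, so r-squared falls below 1; a threshold such as the 0.7 mentioned above is then used to judge whether the power-law description is adequate.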

To reveal the underlying properties of GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules than the PA and SA networks. Most genes in the MS network clustered into one very large module of 14,054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Tables S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, and so these methods appear to be better for module detection.
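The core idea of MCL, alternating expansion (matrix squaring) and inflation (element-wise powering followed by column normalization), can be sketched in a few lines. This is a simplified dense-matrix illustration under assumed settings (inflation of 2, self-loops added, a naive read-out of modules from the converged matrix); it is not the clusterMaker implementation used in the analysis, and the helper names are hypothetical.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Return modules (sorted lists of node indices) for a small graph."""
    n = len(adjacency)
    # Self-loops are added so that every column has a positive sum.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)                                # expansion
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])  # inflation
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Naive read-out: each non-empty row lists the nodes drawn to that attractor.
    modules = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(mod) for mod in modules if mod)

# Two disconnected triangles are recovered as two modules.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
modules = mcl(two_triangles)
```

Larger inflation values generally yield more and smaller modules, which is one reason module counts and sizes should be compared across networks only under the same clustering settings.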

To compile a high-confidence co-expression network, the top 1 million edges from PA, SA, and MS were merged, and the intersection of the three produced a merged network of 14,277 genes and 106,591 interactions. PA and SA shared 83.5% of common interactions within the networks, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
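The merging step can be sketched as a set intersection over the top-ranked edges of each network. The sketch below assumes that each network is available as a list of (gene, gene, score) tuples and that edges are undirected; the function names and the toy data are hypothetical and do not reproduce the original pipeline.

```python
def top_edges(scored_edges, k):
    """Keep the k highest-scoring edges as unordered gene pairs."""
    ranked = sorted(scored_edges, key=lambda e: e[2], reverse=True)[:k]
    return {frozenset((a, b)) for a, b, _ in ranked}

def merge_networks(networks, k):
    """Intersect the top-k edge sets of several networks.

    Returns (shared_edges, genes_in_shared_edges).
    """
    edge_sets = [top_edges(net, k) for net in networks]
    shared = set.intersection(*edge_sets)
    genes = set().union(*shared)
    return shared, genes

# Toy example with k = 2 standing in for the top one million edges.
pa = [("g1", "g2", 0.90), ("g2", "g3", 0.80), ("g3", "g4", 0.10)]
sa = [("g2", "g1", 0.95), ("g2", "g3", 0.70), ("g1", "g4", 0.20)]
ms = [("g1", "g2", 0.50), ("g3", "g2", 0.40), ("g4", "g5", 0.30)]
shared_edges, shared_genes = merge_networks([pa, sa, ms], k=2)
```

Storing each edge as a `frozenset` makes the comparison independent of the order in which the two genes are listed, so an edge reported as g1-g2 in one network matches g2-g1 in another.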

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes, and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected, cell wall-related GO terms were enriched among these 214 genes (Fig. 7D; Supplemental Table S9).
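A guide-gene query of this kind can be sketched as extracting every edge that touches at least one guide gene. The sketch assumes a first-neighbor criterion, which may differ from the analysis criteria actually applied; the function name and the gene identifiers are placeholders, not real maize gene models.

```python
def guide_subnetwork(edges, guide_genes):
    """Extract edges that involve at least one guide gene.

    Returns (sub_edges, genes_in_subnetwork, guides_without_partners).
    """
    guides = set(guide_genes)
    sub_edges = [(a, b) for a, b in edges if a in guides or b in guides]
    genes = {g for edge in sub_edges for g in edge}
    return sub_edges, genes, guides - genes

# Placeholder identifiers; "GH_c" has no co-expressed partner in this toy network.
toy_edges = [("CESA_a", "x1"), ("x1", "x2"), ("CESA_b", "x1"), ("x3", "x4")]
toy_guides = ["CESA_a", "CESA_b", "GH_c"]
sub_edges, sub_genes, orphan_guides = guide_subnetwork(toy_edges, toy_guides)
```

The third return value corresponds to guide genes that, like the two mentioned above, have no co-expressed genes in the network.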

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database using BLASTP to retrieve homologs and their annotations. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall-related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1, and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14, and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6), and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes. Of the co-expressed genes, 48% (36/75) were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial on how to use Cytoscape to visualize any co-expressed genes in our network was provided as supplemental material (Supplemental Dataset S2).
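The literature-support percentages quoted in this case study follow directly from the reported counts, as the short calculation below shows. The dictionary labels are shorthand introduced here for illustration only.

```python
def support_rate(supported, total):
    """Percentage of retrieved genes with literature support, to two decimals."""
    return round(100 * supported / total, 2)

# Counts as reported in the case study: (supported genes, genes examined).
reported = {
    "merged GCN": (67, 214),
    "random genes": (7, 214),
    "CORNET": (40, 210),
    "STRING": (36, 75),
}
rates = {name: support_rate(s, t) for name, (s, t) in reported.items()}
```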

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, VST, CPM, and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, on which normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.
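The point about rank correlations can be illustrated with a small sketch. It is not part of the original analysis, which used cor() in R; the two expression vectors are made-up values, and the helper names (ranks, pearson, spearman) are ours. Under those assumptions it shows that a monotone transformation such as log2(x + 1) leaves SCC unchanged while PCC shifts.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation: both vectors are centered by their means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical CPM values for two genes across six libraries.
gene_a = [1.0, 4.0, 9.0, 30.0, 120.0, 800.0]
gene_b = [2.0, 3.0, 20.0, 25.0, 400.0, 300.0]

log_a = [math.log2(v + 1) for v in gene_a]
log_b = [math.log2(v + 1) for v in gene_b]

scc_raw, scc_log = spearman(gene_a, gene_b), spearman(log_a, log_b)
pcc_raw, pcc_log = pearson(gene_a, gene_b), pearson(log_a, log_b)
print(f"SCC raw={scc_raw:.4f} log2={scc_log:.4f}")
print(f"PCC raw={pcc_raw:.4f} log2={pcc_log:.4f}")
```

Library-specific scaling factors can still reorder a gene's values across libraries, so this sketch covers only the monotone-transformation part of the argument.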

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study of human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods detect different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed more robust performance than any single inference method in in silico and E. coli expression networks, referring to “the wisdom of crowds”. However, for analysis of the available maize data, integration of PCC, SCC, MRNET, and CLR together did not result in a network that outperformed the PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq based maize GCN. This optimization may apply to a range of datasets that share characteristics with maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Processing

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq2000 or HiSeq2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). The adapters in the fastq files were trimmed by Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked by FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by FeatureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). Twenty-six libraries with fewer than 5 million total reads and 11 libraries with a total alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined by Snakemake v3.7.1 (Köster and Rahmann, 2012).
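The library exclusion criteria can be written as a simple filter. The following is a minimal sketch, not the Snakemake workflow used here; the Library record, its field names, and the four example libraries are hypothetical, and only the two thresholds and the sample counts come from the text.

```python
from dataclasses import dataclass

@dataclass
class Library:
    run_id: str            # hypothetical identifier, not a real SRA accession
    total_reads: int       # reads in the library
    alignment_rate: float  # overall alignment rate as a fraction (0 to 1)

MIN_READS = 5_000_000      # libraries with fewer reads are excluded
MIN_ALIGNMENT_RATE = 0.70  # libraries with a lower alignment rate are excluded

def passes_qc(lib):
    """Keep a library only if it meets both thresholds."""
    return lib.total_reads >= MIN_READS and lib.alignment_rate >= MIN_ALIGNMENT_RATE

libraries = [
    Library("RUN_A", 22_400_000, 0.93),
    Library("RUN_B", 3_100_000, 0.95),   # excluded: too few reads
    Library("RUN_C", 18_000_000, 0.52),  # excluded: low alignment rate
    Library("RUN_D", 5_000_000, 0.70),   # kept: exactly at both thresholds
]
kept = [lib.run_id for lib in libraries if passes_qc(lib)]
print(kept)

# Sample accounting reported in the text: 1303 downloaded, 26 + 11 excluded.
remaining = 1303 - 26 - 11
print(remaining)
```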

Gene Count Normalization

The expression data were normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated by the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM/RPKM + 1)). For both methods, scale factors between samples were estimated by the Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated by the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included for further analysis (15116 genes).
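As a rough illustration of the CPM and RPKM calculations and the expression filter, the sketch below uses plain Python rather than edgeR. It is a simplification under stated assumptions: the library size is the raw read total, whereas edgeR would scale it by the TMM factor, and the counts, gene lengths, and function names are hypothetical.

```python
import math

def cpm(counts, library_size):
    """Counts per million for one library (no TMM scaling in this sketch)."""
    return [c * 1e6 / library_size for c in counts]

def rpkm(counts, library_size, gene_lengths_bp):
    """Reads per kilobase of gene length per million reads."""
    return [c * 1e9 / (library_size * length)
            for c, length in zip(counts, gene_lengths_bp)]

def log2_plus_one(values):
    """expression = log2(value + 1), as applied to CPM and RPKM."""
    return [math.log2(v + 1) for v in values]

def keep_gene(cpm_across_samples, min_cpm=2.0, min_samples=1000):
    """Keep a gene expressed above min_cpm in more than min_samples libraries."""
    expressed = sum(1 for v in cpm_across_samples if v > min_cpm)
    return expressed > min_samples

# Hypothetical library: three genes, two million reads in total.
counts = [10, 200, 0]
lengths = [1000, 2000, 500]
cpm_values = cpm(counts, 2_000_000)
rpkm_values = rpkm(counts, 2_000_000, lengths)
log_cpm = log2_plus_one(cpm_values)
print(cpm_values, rpkm_values, log_cpm)

# The threshold is lowered here only so the toy example has enough samples.
print(keep_gene([3.0, 3.0, 1.0], min_samples=1))
```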

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient (KCC) was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient (GCC) was calculated by the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation (BIC) was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine Similarity Coefficient (CSC) was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value assigned rank one. The ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights range from zero to one.
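The rank-based rescaling of adjacency weights can be sketched as follows. This is an illustrative reading of the description above, not the original R code: the function name is ours, the 3 × 3 matrix is made up, and giving tied values their average rank is an assumption, since the text does not say how ties were handled.

```python
def rank_normalize(matrix):
    """Rank all weights (smallest = 1), divide by the number of elements,
    and set the diagonal to one so every weight lies in (0, 1]."""
    n = len(matrix)
    flat = sorted(matrix[i][j] for i in range(n) for j in range(n))
    total = len(flat)
    # Average 1-based rank for each distinct value (assumed tie handling).
    rank_of = {}
    start = 0
    while start < total:
        end = start
        while end + 1 < total and flat[end + 1] == flat[start]:
            end += 1
        rank_of[flat[start]] = (start + end) / 2 + 1
        start = end + 1
    out = [[rank_of[matrix[i][j]] / total for j in range(n)] for i in range(n)]
    for i in range(n):
        out[i][i] = 1.0
    return out

# Hypothetical PCC adjacency matrix for three genes.
pcc = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.5],
    [-0.2, 0.5, 1.0],
]
normalized = rank_normalize(pcc)
for row in normalized:
    print([round(v, 3) for v in row])
```

Because every method's weights end up on the same zero-to-one scale, networks from different inference methods can be compared or aggregated directly.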

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM or VST normalized expression matrices. The randomized expression matrices were then used for network inference by the PCC, MRNET, or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.
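The randomization step can be pictured with the sketch below. It is an assumption-laden illustration rather than the original R code: the expression matrix is a toy dictionary, the function names are ours, and build_and_score stands in for whichever inference and evaluation routine is applied to each shuffled matrix.

```python
import random

def shuffle_gene_ids(expression, rng):
    """Reassign gene IDs at random while keeping every expression profile."""
    gene_ids = list(expression)
    shuffled_ids = gene_ids[:]
    rng.shuffle(shuffled_ids)
    return {new_id: expression[old_id]
            for old_id, new_id in zip(gene_ids, shuffled_ids)}

def random_baseline(expression, build_and_score, repeats, seed=0):
    """Score `repeats` independently shuffled matrices.
    `build_and_score` is a hypothetical callable: matrix -> score."""
    rng = random.Random(seed)
    return [build_and_score(shuffle_gene_ids(expression, rng))
            for _ in range(repeats)]

# Toy expression matrix: gene ID -> expression across three libraries.
expression = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [3.0, 2.0, 1.0],
    "g3": [0.5, 0.5, 4.0],
    "g4": [2.0, 9.0, 2.0],
}
shuffled = shuffle_gene_ids(expression, random.Random(1))

def toy_score(matrix):
    """Placeholder score; a real run would infer a network and compute AUROC."""
    return matrix["g1"][0]

scores = random_baseline(expression, toy_score, repeats=10)
print(len(scores))
```

Shuffling the labels keeps the correlation structure of the matrix intact but breaks the link between each profile and its annotation, which is what makes the resulting AUROC a baseline.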


Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions, defined as those ranking in the top 5% of their results, were used for evaluation. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 were downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. Basically, it utilizes the “guilt-by-association” principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes' GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to “fdr” (Benjamini and Hochberg, 1995).
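For readers unfamiliar with AUROC, the sketch below computes it from ranks (the Mann-Whitney formulation), which gives the probability that a randomly chosen positive pair receives a higher network weight than a randomly chosen negative pair. It is a generic illustration and does not reproduce ROCR or EGAD; the function name, scores, and labels are made up.

```python
def auroc(scores, labels):
    """Area under the ROC curve from ranks; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    positive_rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        positives_here = sum(1 for k in range(i, j + 1) if pairs[k][1])
        positive_rank_sum += mean_rank * positives_here
        i = j + 1
    # Mann-Whitney U statistic for the positives, scaled to [0, 1].
    u = positive_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Hypothetical network weights for six gene pairs; 1 marks a known interaction.
weights = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
known = [1, 1, 0, 1, 0, 0]
example_auroc = auroc(weights, known)
print(round(example_auroc, 4))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect separation, which is why the shuffled networks serve as the reference point.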

Definition of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts two genes are co-expressed but they are not co-expressed in the PPPTY dataset; TN, a network predicts two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts a gene does not have a specific GO term and it does not have that GO term in our GO dataset; FN, a network predicts a gene does not have a specific GO term but it has that GO term in our GO dataset.
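These definitions can be made concrete for the PPPTY case with a short sketch. The gene names, weights, threshold, and function names are hypothetical; in the actual evaluation the threshold is swept across all weights to trace the ROC curve rather than fixed at one value.

```python
def pair(a, b):
    """Unordered gene pair, so (A, B) and (B, A) are the same edge."""
    return frozenset((a, b))

def confusion_counts(predicted_weights, known_pairs, threshold):
    """Count TP, FP, TN, and FN for one weight threshold."""
    tp = fp = tn = fn = 0
    for gene_pair, weight in predicted_weights.items():
        predicted = weight >= threshold  # network calls the pair co-expressed
        known = gene_pair in known_pairs  # pair is listed in the reference set
        if predicted and known:
            tp += 1
        elif predicted:
            fp += 1
        elif known:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

# Hypothetical network weights and reference pairs.
weights = {
    pair("A", "B"): 0.95,
    pair("A", "C"): 0.40,
    pair("B", "C"): 0.85,
    pair("A", "D"): 0.10,
}
known = {pair("A", "B"), pair("A", "C")}

tp, fp, tn, fn = confusion_counts(weights, known, threshold=0.8)
tpr = tp / (tp + fn)  # true positive rate (sensitivity)
fpr = fp / (fp + tn)  # false positive rate
print(tp, fp, tn, fn, tpr, fpr)
```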

Network Clustering and Characterization

For each network, the top 1 million edges were selected as the stringent co-expression network. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted by the Network Analyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) by MCL v14-137 with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.
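Selecting the strongest edges and counting node degrees can be sketched as below. This is a toy illustration with a made-up 4 × 4 weight matrix and k = 3 instead of one million; the function names are ours, and the topology statistics in this study were computed in Cytoscape, not with this code.

```python
import heapq
from collections import Counter

def top_edges(weights, gene_ids, k):
    """Return the k heaviest edges from the upper triangle of a symmetric matrix."""
    n = len(gene_ids)
    candidates = (
        (weights[i][j], gene_ids[i], gene_ids[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return heapq.nlargest(k, candidates)

def node_degrees(edges):
    """Number of retained edges attached to each gene."""
    degree = Counter()
    for _, a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

genes = ["g1", "g2", "g3", "g4"]
weights = [
    [1.0, 0.9, 0.2, 0.4],
    [0.9, 1.0, 0.7, 0.1],
    [0.2, 0.7, 1.0, 0.3],
    [0.4, 0.1, 0.3, 1.0],
]
edges = top_edges(weights, genes, k=3)
degrees = node_degrees(edges)
print(edges)
print(dict(degrees))
```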


Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes involved in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, for which a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
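The two statistical steps can be sketched as follows. This is not AgriGO's implementation: the function names and the small example numbers are ours, and we assume that "Yekutieli method" refers to the Benjamini-Yekutieli procedure for FDR control under dependency.

```python
from math import comb

def hypergeom_upper_tail(k, background, annotated, selected):
    """P(X >= k): chance of seeing at least k annotated genes when `selected`
    genes are drawn from `background` genes, `annotated` of which carry the term."""
    upper = min(annotated, selected)
    favorable = sum(
        comb(annotated, x) * comb(background - annotated, selected - x)
        for x in range(k, upper + 1)
    )
    return favorable / comb(background, selected)

def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy example: 20 background genes, 5 with the term, 4 selected, 2 of them annotated.
p_term = hypergeom_upper_tail(2, 20, 5, 4)
print(round(p_term, 4))

# Hypothetical raw p-values for three GO terms.
adjusted = benjamini_yekutieli([0.001, 0.02, 0.04])
significant = [p <= 0.05 for p in adjusted]
print([round(p, 4) for p in adjusted], significant)
```

In the toy example only the first term survives correction, even though all three raw p-values are below 0.05.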

Database Comparison on Cell Wall Pathway

Sixteen well-characterized components of cell wall biosynthesis (Penning et al., 2009; Bosch et al., 2011) (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05, and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4, with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were blasted against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.
Supplemental Figure 2. Distribution of gene expression values.
Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.
Supplemental Figure 4. Pairwise comparison among results of inference methods.
Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference method.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS).
Supplemental Figure 9. Node degree distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network).
Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. Sixteen query genes in the maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.
Supplemental Dataset S2. Tutorial: Visualizing co-expression data in Cytoscape.

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled S12_1 to S404; S1266 included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as “_1”, “_2”, and “_3”. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, “1266”), aggregated network (grey, “agg”), and protein network (dark grey, “pr”) using PCC, SCC, MRNET, and CLR. A, GO evaluation of networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation of networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars), and protein networks (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars), and protein networks (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for the 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM, Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC, Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2 transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values as calculated by Pearson correlation shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values as calculated by Pearson correlation shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). The average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference method. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as “_1”, “_2”, and “_3”. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network are labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distribution. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node degree distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to a gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R package version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, De Leon N, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008-2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from comparisons with GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
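The two quantities this legend relies on, AUROC and the 1.5 times interquartile range outlier rule, can be illustrated with a minimal sketch. It assumes each candidate is given a score and a binary label (1 for a known positive), and it uses the inclusive quartile definition from the Python standard library. Both choices are assumptions for illustration, and the function names are hypothetical rather than taken from the study.

```python
import statistics


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. Equivalent to the area under the ROC curve.
    """
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def iqr_outliers(values, k=1.5):
    """Values more than k * IQR above the third or below the first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical scores and labels, not data from the study.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
print(iqr_outliers([0.50, 0.52, 0.55, 0.58, 0.60, 0.95]))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of candidates, so a rank-based formulation would be preferable for genome-scale inputs.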


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values displayed on a blue-to-red color scale. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-square and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
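The regression described in this legend was run with lm() in R. As a rough Python sketch of the same model form, average AUROC regressed on the natural logarithm of sample size, the intercept, slope and R-square can be computed as shown here. The input values are hypothetical, the function name is an assumption, and the p-value that lm() reports is not computed.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of auroc = intercept + slope * ln(sample size).

    Returns (intercept, slope, r2). Needs at least two distinct sizes.
    """
    xs = [math.log(n) for n in sample_sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(aurocs) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, aurocs))
    ss_tot = sum((y - mean_y) ** 2 for y in aurocs)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return intercept, slope, r2


# Hypothetical values for illustration only, not results from the study.
sizes = [12, 50, 200, 600, 1266]
scores = [0.55, 0.60, 0.64, 0.67, 0.69]
intercept, slope, r2 = fit_log_model(sizes, scores)
print(f"AUROC = {intercept:.3f} + {slope:.3f} * ln(n), R2 = {r2:.3f}")
```

A positive slope under this model means that each multiplicative increase in sample size adds a roughly constant increment to average AUROC, which is the diminishing-returns shape a logarithmic fit implies.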



Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.


[Figure 7 graphic. Panels A and B: Venn diagrams for the MS, PA and SA networks. Panel C (Hub): enriched GO terms include chromatin assembly, DNA packaging, nucleosome organization, DNA replication, DNA repair and gene silencing. Panel D (Cell Wall): enriched GO terms include cell wall biogenesis, cellulose biosynthetic process, glucan metabolic process, hemicellulose metabolic process and phenylpropanoid biosynthetic process.]

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 x 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
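The edge overlap summarized in panel B can be sketched with set operations. The example below assumes each network is available as a list of (gene, gene, weight) tuples and treats edges as undirected. The function names and toy data are hypothetical and only illustrate how the seven Venn regions could be counted.

```python
def top_edges(weighted_edges, n):
    """Return the n highest-weight edges as a set of unordered gene pairs."""
    ranked = sorted(weighted_edges, key=lambda edge: edge[2], reverse=True)
    # frozenset makes (g1, g2) and (g2, g1) the same edge
    return {frozenset((u, v)) for u, v, _ in ranked[:n]}


def venn_counts(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "a": len(a - b - c),
        "b": len(b - a - c),
        "c": len(c - a - b),
        "ab": len((a & b) - c),
        "ac": len((a & c) - b),
        "bc": len((b & c) - a),
        "abc": len(a & b & c),
    }


# Hypothetical toy networks, not data from the study.
pa = top_edges([("g1", "g2", 0.9), ("g2", "g3", 0.8), ("g3", "g4", 0.1)], 2)
sa = top_edges([("g2", "g1", 0.7), ("g3", "g4", 0.6), ("g1", "g4", 0.2)], 2)
ms = top_edges([("g1", "g2", 0.5), ("g2", "g3", 0.4), ("g1", "g3", 0.1)], 2)
print(venn_counts(pa, sa, ms))
```

The "abc" entry corresponds to the edges shared by all three networks, the quantity that defines the merged network used for the shared-edge comparison.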


Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.




silencing (Supplemental Table S5), the top interacting genes were not limited to a subset of cellular biochemistry. Ribosomal proteins were the largest component of top interacting genes (27/148), which was expected because of their cellular abundance and involvement with translation. Interestingly, nine epigenetic regulators were found in the 148 shared genes, including AGO104 (GRMZM2G141818) (Singh et al., 2011), CHR106 (GRMZM2G071025) (Li et al., 2014a) and LBL1 (GRMZM2G020187) (Dotto et al., 2014), demonstrating the importance of epigenetic regulation for plant development (reviewed by Huang et al., 2017).

To reveal the underlying properties of GCNs, a graph clustering algorithm, the Markov Cluster Algorithm (MCL), was used to identify network modules (Enright et al., 2002; Morris et al., 2011). The result showed a shared pattern between the PA and SA networks that was distinct from the MS network (Supplemental Table S4). The MS network had fewer but larger modules than the PA and SA networks. Consequently, most genes in the MS network clustered into one very large module of 14,054 genes, consistent with the high network centralization value for the MS network. Conversely, the PA and SA networks separated into smaller, distinct modules with related gene ontology enrichment (Supplemental Tables S6 and S7). The pattern displayed by the PA and SA networks (Supplemental Fig. 10) seems more likely to represent biologically relevant pathways, so these methods appear to be better for module detection.

To compile a high-confidence co-expression network, the top 1 million edges from each of PA, SA and MS were compared, and the intersection of the three produced a merged network of 14,277 genes and 106,591 interactions. PA and SA shared 83.5% of their interactions, while MS had 87.3% unique interactions (Fig. 7B). This merged network (Supplemental Dataset S1) was used for a case study analysis of cell wall biosynthesis. The same network can also be accessed at http://www.bio.fsu.edu/mcginnislab/mcn/main_page.php.
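The merging step can be illustrated with a minimal Python sketch. It is not the authors' code (the analysis in this paper was run in R and Cytoscape); the function names `top_edges` and `merge_networks`, and the representation of a network as a dictionary from gene pairs to weights, are assumptions made for illustration only.

```python
def top_edges(weights, k):
    """Return the k highest-weight edges as a set of undirected gene pairs.

    `weights` maps (gene_a, gene_b) tuples to an edge weight (hypothetical layout).
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    # frozenset makes (a, b) and (b, a) the same undirected edge
    return {frozenset(pair) for pair, _ in ranked[:k]}


def merge_networks(networks, k):
    """Keep only the edges that are in the top k of every input network."""
    edge_sets = [top_edges(weights, k) for weights in networks]
    return set.intersection(*edge_sets)


# Toy example with three tiny networks and k = 3 (the paper used k = 1 million)
pa = {("g1", "g2"): 0.99, ("g2", "g3"): 0.95, ("g1", "g4"): 0.90, ("g3", "g4"): 0.10}
sa = {("g2", "g1"): 0.98, ("g2", "g3"): 0.97, ("g3", "g4"): 0.80, ("g1", "g4"): 0.20}
ms = {("g1", "g2"): 0.91, ("g3", "g2"): 0.93, ("g1", "g4"): 0.92, ("g3", "g4"): 0.30}

merged = merge_networks([pa, sa, ms], k=3)
merged_genes = set().union(*merged)  # genes that remain in the merged network
```

Taking the intersection rather than the union trades coverage for confidence: an edge survives only if all three inference strategies rank it highly.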

Case Study: Cell Wall Biosynthesis and Regulation

To demonstrate the functionality of the network, the predicted cell wall biosynthesis pathway from the merged network was compared to the existing knowledge of this pathway. Sixteen well-characterized components of cell wall biosynthesis were selected as guide genes (Supplemental Table S8), including five cellulose synthase genes, seven cellulose synthase-like genes, three glycosyl hydrolase genes and one glycosidase gene (Penning et al., 2009; Bosch et al., 2011). Collectively, 214 genes connected by 377 edges were extracted from the network with the 16 guide genes (Fig. 8A); two guide genes did not have any co-expressed genes in the network that met the analysis criteria. As expected, cell wall-related GO terms were enriched among these 214 genes (Fig. 7D; Supplemental Table S9).
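A guide-gene query of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used in the paper: the network is assumed to be a list of gene-pair edges, the function name `guide_subnetwork` is hypothetical, and only direct neighbors of guide genes are retrieved.

```python
def guide_subnetwork(edges, guides):
    """Extract every edge that touches at least one guide gene.

    Returns the retained edges, the non-guide genes they reach, and the
    guide genes that have no partner in the network.
    """
    guides = set(guides)
    sub_edges = [(a, b) for a, b in edges if a in guides or b in guides]
    nodes = {gene for edge in sub_edges for gene in edge}
    coexpressed = nodes - guides   # candidate pathway members
    orphans = guides - nodes       # guide genes without co-expressed partners
    return sub_edges, coexpressed, orphans


# Toy network: "cesa_a" and "csl_b" are made-up guide gene labels
toy_edges = [("cesa_a", "x1"), ("cesa_a", "x2"), ("x2", "x3"), ("x4", "x5")]
sub_edges, coexpressed, orphans = guide_subnetwork(toy_edges, ["cesa_a", "csl_b"])
```

In the toy example, `csl_b` is reported as an orphan, which mirrors the two guide genes in the text that had no co-expressed genes meeting the analysis criteria.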

The resulting 214 co-expressed genes were queried against the Arabidopsis TAIR 10 protein database using BLASTP to retrieve homologs and their annotations. The literature was manually searched using the maize genes and their Arabidopsis homologs as queries (Supplemental Table S10). The literature survey showed that 31.3% (67/214) of the genes co-expressed with the guide genes had peer-reviewed publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% (7/214) that were involved in cell wall-related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1 and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14 and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6) and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be co-expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 of the 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 of the 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes; 48% (36/75) of the co-expressed genes were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial on how to use Cytoscape to visualize any co-expressed genes in our network is provided as supplemental material (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, VST, CPM and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, where normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study in human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods detect different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed a more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR did not result in a network that outperformed the PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq-based maize GCN. This optimization may apply to a range of datasets that share characteristics of maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Processing

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1,303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in the SRA Toolkit (version 2.5.2). Adapters were trimmed from the fastq files by Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked by FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). Twenty-six libraries with fewer than 5 million total reads and 11 libraries with an overall alignment rate below 70% were excluded, leaving 1,266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined by Snakemake v3.7.1 (Köster and Rahmann, 2012).
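The library exclusion criteria above amount to a simple two-threshold filter. The Python sketch below is illustrative only; the dictionary layout, the library names and the helper `passes_qc` are assumptions, and in practice the read totals and alignment rates would be parsed from the aligner's summary output.

```python
MIN_TOTAL_READS = 5_000_000   # libraries with fewer reads were excluded
MIN_ALIGNMENT_RATE = 0.70     # libraries aligning below 70% were excluded


def passes_qc(total_reads, alignment_rate):
    """Return True when a library meets both thresholds described in the text."""
    return total_reads >= MIN_TOTAL_READS and alignment_rate >= MIN_ALIGNMENT_RATE


# Hypothetical per-library summaries: (total reads, overall alignment rate)
libraries = {
    "lib_a": (12_300_000, 0.91),
    "lib_b": (4_100_000, 0.88),   # too few reads
    "lib_c": (9_800_000, 0.62),   # alignment rate too low
}

kept = sorted(name for name, (reads, rate) in libraries.items() if passes_qc(reads, rate))
```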

Gene Count Normalization

The expression data were normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated by the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM + 1) or log2(RPKM + 1)). For both methods, scale factors between samples were estimated by the Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated by the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1,000 samples were included in further analysis (15,116 genes).
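As a worked illustration of these quantities, the following Python sketch computes CPM, RPKM and the log2(x + 1) transform from a small count table, then applies the expression filter. It is a simplification and not the edgeR implementation: raw library sizes are used directly and the TMM scale factors are omitted, and the function names and the gene-to-counts dictionary layout are assumptions.

```python
import math


def log2_cpm(counts):
    """Compute CPM and log2(CPM + 1) per gene.

    `counts` maps gene ID -> list of raw read counts, one per sample.
    Library size is the plain column sum (no TMM scaling in this sketch).
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    cpm = {g: [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
           for g, row in counts.items()}
    log_cpm = {g: [math.log2(v + 1) for v in row] for g, row in cpm.items()}
    return cpm, log_cpm


def rpkm(cpm_row, gene_length_bp):
    """RPKM is CPM divided by gene length in kilobases."""
    return [v / (gene_length_bp / 1000) for v in cpm_row]


def expressed_genes(cpm, min_cpm=2.0, min_samples=1000):
    """Keep genes above `min_cpm` in more than `min_samples` samples."""
    return [g for g, row in cpm.items() if sum(v > min_cpm for v in row) > min_samples]


toy_counts = {"geneA": [10, 0], "geneB": [999_990, 1_000_000]}
toy_cpm, toy_log_cpm = log2_cpm(toy_counts)
```

The default thresholds in `expressed_genes` follow the text (2 CPM, more than 1,000 samples); for a toy table they need to be lowered, for example `expressed_genes(toy_cpm, min_samples=1)`.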

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated by the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value assigned a rank of one. Ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights ranged from zero to one.
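The rank standardization described above can be sketched as follows. This Python version is an illustration rather than the R code used in the study; it assumes that tied weights (which always occur in a symmetric matrix) receive their average rank, as R's rank() does by default, and the function name `rank_normalize` is hypothetical.

```python
def rank_normalize(matrix):
    """Replace each weight by rank / (number of elements); set the diagonal to 1.

    `matrix` is a square list of lists. The smallest value gets rank 1 and
    tied values share their average rank (an assumption of this sketch).
    """
    n = len(matrix)
    flat = sorted((matrix[i][j], i, j) for i in range(n) for j in range(n))
    total = len(flat)
    ranked = [[0.0] * n for _ in range(n)]
    start = 0
    while start < total:
        end = start
        # extend the block while the next value ties with the current one
        while end + 1 < total and flat[end + 1][0] == flat[start][0]:
            end += 1
        avg_rank = (start + 1 + end + 1) / 2  # 1-based average rank of the tie block
        for _, i, j in flat[start:end + 1]:
            ranked[i][j] = avg_rank / total
        start = end + 1
    for i in range(n):
        ranked[i][i] = 1.0
    return ranked


toy = [[1.0, 0.2, 0.8],
       [0.2, 1.0, -0.5],
       [0.8, -0.5, 1.0]]
toy_ranked = rank_normalize(toy)
```

Because only the ordering of weights is kept, networks built from correlation coefficients and from mutual information become directly comparable on the same zero-to-one scale.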

Network Performance Evaluation

To generate random networks, gene IDs were shuffled randomly in CPM- or VST-normalized expression matrices. Networks were then inferred from the randomized expression matrices by the PCC, MRNET or CLR methods and evaluated. For PCC, 1,000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.
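The randomization used for the baseline can be pictured as a label permutation: the expression profiles stay intact, but each profile is reassigned to a random gene ID. The Python sketch below assumes a dictionary from gene ID to expression vector and a hypothetical helper name `shuffle_gene_ids`; it is not the authors' R code.

```python
import random


def shuffle_gene_ids(expression, seed=None):
    """Return a copy of `expression` with gene IDs randomly reassigned to rows.

    `expression` maps gene ID -> expression vector. The vectors are unchanged;
    only the mapping between IDs and vectors is permuted.
    """
    rng = random.Random(seed)
    gene_ids = list(expression)
    shuffled_ids = gene_ids[:]
    rng.shuffle(shuffled_ids)
    return {new_id: expression[old_id] for new_id, old_id in zip(shuffled_ids, gene_ids)}


toy_expression = {"g1": [1.0, 2.0], "g2": [3.0, 4.0], "g3": [5.0, 6.0], "g4": [7.0, 8.0]}
randomized = shuffle_gene_ids(toy_expression, seed=1)
```

A network inferred from `randomized` keeps the same overall structure as the real one, but the evaluation data no longer match the gene labels, so its AUROC estimates the performance expected by chance.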


Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions, defined as those ranking in the top 5% of their results, were used for evaluation. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 were downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq-confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. It utilizes the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of gene GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).
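For readers unfamiliar with the metric, AUROC can be computed directly from ranks: it equals the probability that a randomly chosen positive pair receives a higher network weight than a randomly chosen negative pair. The Python sketch below uses this rank-sum (Mann-Whitney) formulation; it is an illustration, not the ROCR or EGAD implementation, and the function name `auroc` is an assumption.

```python
def auroc(scores, labels):
    """Area under the ROC curve from the rank-sum statistic.

    `scores` are edge weights (higher means more confidently co-expressed);
    `labels` are 1 for pairs present in the reference set and 0 otherwise.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative example")
    pos_rank_sum = sum(rank for rank, (_, label) in zip(ranks, pairs) if label)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect separation, which is why the shuffled networks described earlier serve as the baseline.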

Definition of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts two genes are co-expressed but they are not co-expressed in the PPPTY dataset; TN, a network predicts two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts a gene does not have a specific GO term and it does not have that GO term in our GO dataset; FN, a network predicts a gene does not have a specific GO term but it has that GO term in our GO dataset.
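The pairwise definitions for the PPPTY evaluation can be written as set operations over gene pairs. The Python sketch below is illustrative; the function name `confusion_counts` and the toy gene labels are assumptions, and it treats a fixed set of predicted edges, whereas AUROC sweeps over every possible weight threshold.

```python
from itertools import combinations


def confusion_counts(predicted, reference, genes):
    """Count TP, FP, TN and FN over all unordered gene pairs.

    `predicted` and `reference` are sets of frozenset gene pairs; `genes`
    defines the universe of pairs that could have been predicted.
    """
    universe = {frozenset(pair) for pair in combinations(genes, 2)}
    tp = len(predicted & reference)             # predicted and in the reference
    fp = len(predicted - reference)             # predicted but not in the reference
    fn = len(reference - predicted)             # in the reference but not predicted
    tn = len(universe - predicted - reference)  # in neither set
    return tp, fp, tn, fn


toy_genes = ["a", "b", "c", "d"]
toy_predicted = {frozenset({"a", "b"}), frozenset({"a", "c"})}
toy_reference = {frozenset({"a", "b"}), frozenset({"c", "d"})}
toy_counts_tuple = confusion_counts(toy_predicted, toy_reference, toy_genes)
```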

Network Clustering and Characterization

For each network, the top 1 million edges were selected as a stringent co-expression network. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted by the NetworkAnalyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) by MCL v14-137 with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15,116 genes included in our networks were used as the background reference. Hypergeometric testing was used to calculate P-values, and a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
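The two statistical steps named above can be sketched in Python. This is an illustration of the general technique, not AgriGO's implementation: the enrichment P-value is the upper tail of the hypergeometric distribution, and the correction is assumed to be the Benjamini-Yekutieli procedure. Function names and the example numbers in the tests are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of seeing at least k annotated genes.

    N: background genes, K: background genes carrying the GO term,
    n: genes in the query list, k: query genes carrying the GO term.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_yekutieli(pvalues):
    """Adjust P-values with the Benjamini-Yekutieli step-up procedure."""
    m = len(pvalues)
    c_m = sum(1 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted
```

With the background of 15,116 genes used here, a call such as `hypergeom_pvalue(k, 214, K, 15116)` would test one GO term against the 214 co-expressed genes, where `k` and `K` come from the annotation table.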

Database Comparison on Cell Wall Pathway

Sixteen well-characterized components of cell wall biosynthesis (Penning et al., 2009; Bosch et al., 2011) (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) through its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson; Correlation coefficient = 0.75; P-value ≤ 0.05; and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4 with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were searched against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.
Supplemental Figure 2. Distribution of gene expression values.
Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.
Supplemental Figure 4. Pairwise comparison among results of inference methods.
Supplemental Figure 5. Characteristics of the full 1,720-gene PPPTY set (ALL_1720) and of genes with the highest AUROC values using the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS).
Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network).
Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. 16 query genes in the maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.
Supplemental Dataset S2. Tutorial: Visualizing co-expression data in Cytoscape.

Figure Legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase Per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and P-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1,266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; S1266 included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1,266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, "1266"), aggregated network (grey, "agg") and protein network (dark grey, "pr") using PCC, SCC, MRNET and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1,266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1,000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1,000 highest interacting genes in three networks. 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks. 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for the 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files) and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported into the R environment for the normalization, inference and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million mapped reads (RPKM). A-B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithmic transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2-transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
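The CPM and RPKM quantities named in this legend can be illustrated with a minimal sketch. It is written in Python rather than the R environment the legend mentions, and it is not the authors' pipeline. The function names, the toy counts and the pseudocount of 1 added before the log2 transformation are assumptions made for illustration. `normal_density` computes the same normal density that R's `dnorm()` returns.

```python
import math

def cpm(counts):
    """Counts per million for one library."""
    library_size = sum(counts)
    return [c / library_size * 1e6 for c in counts]

def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million mapped reads."""
    library_size = sum(counts)
    return [c / (length / 1e3) / (library_size / 1e6)
            for c, length in zip(counts, lengths_bp)]

def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption to avoid log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

def normal_density(x, mean, sd):
    """Normal probability density, the quantity R's dnorm(x, mean, sd) returns."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical toy library: three genes with raw counts and lengths in bp.
counts = [100, 300, 600]
lengths_bp = [1000, 2000, 3000]

log_cpm = log2_transform(cpm(counts))
log_rpkm = log2_transform(rpkm(counts, lengths_bp))

# Reference normal curve built from the mean and SD of the log2 values.
mean = sum(log_cpm) / len(log_cpm)
sd = math.sqrt(sum((v - mean) ** 2 for v in log_cpm) / (len(log_cpm) - 1))
reference_curve = [normal_density(v, mean, sd) for v in log_cpm]
```

CPM corrects only for library size, whereas RPKM also divides by gene length, which is why panel C plots expression against gene length.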

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks, between the AUROC values and the PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks, between the AUROC values and the PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Supplemental Figure 5. Characteristics of the full set of 1720 PPPTY genes (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network are labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
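The outlier rule stated in this legend (more than 1.5 times the interquartile range beyond the quartiles) can be sketched as follows. This is an illustrative Python sketch and not the plotting code used for the figure. The function name `tukey_outliers` is hypothetical, and linear-interpolation quartiles (`method="inclusive"`) are an assumption, since box plot software may compute quartiles slightly differently.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical AUROC values for one network; 0.95 lies far above the rest.
aurocs = [0.60, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.95]
flagged = tukey_outliers(aurocs)
```

Values returned by the function correspond to the points drawn separately from the whiskers in a box plot.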

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area Under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, Area Under the ROC curve (AUROC) values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distribution. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.
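A power-law fit of a node degree distribution, with its R², can be sketched as below. This is a minimal Python illustration under stated assumptions, not the fitting procedure behind the figure. The fit is an ordinary least-squares line on log-transformed degree and node counts, R² is computed on that log scale, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Map each node degree to the number of nodes having that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log-log values; return (a, b, r2)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(lx, ly))
    ss_tot = sum((y - mean_y) ** 2 for y in ly)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot

# Hypothetical degree distribution that follows y = 100 * x**-2 exactly.
degrees = [1, 2, 4, 8]
node_counts = [100.0, 25.0, 6.25, 1.5625]
a, b, r2 = fit_power_law(degrees, node_counts)
```

A negative exponent `b` with R² close to 1 is the pattern usually described as scale-free, where a few genes have many connections and most have few.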


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are shown as light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R package version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: 10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, de Leon N, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
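As an illustration of the AUROC statistic used throughout these evaluations, the following minimal Python sketch computes AUROC from ranked scores through its Mann-Whitney interpretation. It is not the authors' evaluation pipeline; the function name `auroc` and the toy scores and labels are hypothetical and chosen only to show the calculation.

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive item
    scores higher than a randomly chosen negative item (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: a network-derived score for six genes and whether
# each gene truly carries the annotation being predicted (1) or not (0).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auroc(scores, labels))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the dashed lines marking the highest and lowest AUROC values in panels C to F give a quick read on the spread between methods.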


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
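The scaling and clustering steps named in this caption can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: it assumes each row (one GO term or gene) is scaled across methods, uses the sample standard deviation, and compares methods by Euclidean distance between their scaled profiles. The three-method table of AUROC values is hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def zscore(values):
    """Scale values to mean 0 and sample standard deviation 1."""
    mu = mean(values)
    sigma = stdev(values)  # sample SD (n - 1); an assumption of this sketch
    if sigma == 0:
        raise ValueError("cannot scale a constant row")
    return [(v - mu) / sigma for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical AUROC table: one row per GO term, one column per method.
methods = ["PCC", "SCC", "CLR"]
auroc_rows = [
    [0.70, 0.72, 0.60],
    [0.65, 0.66, 0.80],
    [0.55, 0.54, 0.75],
]

# Scale each GO term across methods, then collect one profile per method.
scaled_rows = [zscore(row) for row in auroc_rows]
profiles = dict(zip(methods, zip(*scaled_rows)))

# Pairwise distances between method profiles; small distances mean the two
# methods perform similarly across GO terms and would cluster together.
distances = {
    (m1, m2): euclidean(profiles[m1], profiles[m2])
    for m1, m2 in combinations(methods, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A hierarchical clustering routine would then merge the closest methods first; in this toy table the two correlation methods sit much closer to each other than either does to CLR.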


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
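The logarithmic model in this figure has the form AUROC = a + b · ln(sample size). The caption states that the fit was done with lm() in R; the sketch below is a plain-Python analogue of that least-squares fit, shown only to make the model concrete. The function name `fit_log_model` and the five data points are hypothetical, and the p-value that lm() reports is not computed here.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Ordinary least squares fit of auroc = intercept + slope * ln(n).

    Returns (intercept, slope, r_squared)."""
    xs = [math.log(n) for n in sample_sizes]
    count = len(xs)
    x_bar = sum(xs) / count
    y_bar = sum(aurocs) / count
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, aurocs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, aurocs))
    ss_tot = sum((y - y_bar) ** 2 for y in aurocs)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical average AUROC values for networks of increasing size.
sizes = [12, 50, 200, 800, 1266]
avg_auroc = [0.58, 0.62, 0.66, 0.69, 0.71]
intercept, slope, r_squared = fit_log_model(sizes, avg_auroc)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r_squared:.3f}")
```

A positive slope on the log scale means each doubling of sample size adds a roughly constant increment to average AUROC, so gains shrink in absolute terms as more libraries are added.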

[Figure 4 panels. A: GO PCC, GO SCC, GO MRNET, GO CLR. B: PPPTY PCC, PPPTY SCC, PPPTY MRNET, PPPTY CLR.]


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries plotted against sample size. Networks constructed from the same number of samples are designated "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means and grey dots indicate outliers.


[Figure 7 graphic. A and B: Venn diagrams for the MS, PA and SA networks. C (Hub): enriched GO terms include chromatin assembly, DNA packaging, DNA replication, DNA repair and gene silencing. D (Cell Wall): enriched GO terms include cell wall biogenesis, cellulose biosynthetic process, glucan metabolic process and phenylpropanoid biosynthetic process.]

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
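The overlap shown in the Venn diagrams can be reproduced in outline with set operations. The sketch below is a minimal Python illustration that assumes "highest interacting" means the genes with the most edges (node degree); that reading, the function name `top_connected_genes` and the three toy edge lists are assumptions, not details taken from the paper.

```python
from collections import Counter


def top_connected_genes(edges, k):
    """Return the k genes that take part in the most edges.

    Ties are broken alphabetically so the result is deterministic."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    ranked = sorted(degree, key=lambda gene: (-degree[gene], gene))
    return set(ranked[:k])


# Hypothetical toy edge lists standing in for the PA, SA and MS networks.
pa_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
sa_edges = [("g1", "g2"), ("g1", "g5"), ("g2", "g5"), ("g2", "g3")]
ms_edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g4"), ("g5", "g6")]

k = 2  # the figure uses the top 1000 genes; 2 keeps the toy example readable
shared = (
    top_connected_genes(pa_edges, k)
    & top_connected_genes(sa_edges, k)
    & top_connected_genes(ms_edges, k)
)
print(sorted(shared))
```

The same intersection applied to sets of edges, with each edge stored as a sorted pair so that direction does not matter, would give the shared-edge count of panel B.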


Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with a reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.


Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EurasipJ Bioinforma Syst Biol doi 101155200779879

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional networks using mutualinformation BMC Bioinformatics 9 461

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejía-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B Emiliani J et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithm clustering plugin for Cytoscape BMC Bioinformatics 12 436

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat Methods 5 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146-2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp1501821-

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015


publications indicating a role in cell wall synthesis or related pathways in plants. A search using 214 randomly selected genes as queries returned only 3.27% of genes (7/214) that were involved in cell wall related pathways. This suggests that the network discriminated co-expressed genes and identified some known components of the pathway. Lignin biosynthesis genes are expected to function in cell wall biosynthesis to provide rigidity and strength in the secondary cell wall (reviewed by Vanholme et al., 2010). Interestingly, even though no lignin biosynthesis genes were included in our queries, six lignin biosynthesis genes (PAL1, C4H, 4CL2, HCT, CCoAOMT1 and PDR1) (reviewed by Zhong and Ye, 2015) were found to be co-expressed with the guide genes. At least nine cellulose biosynthesis and assembly genes were discovered, including CESA1, FLA11, IRX9, IRX14 and IRX10 (reviewed by Zhong and Ye, 2015). Moreover, proteins participating in a well-studied physical interaction, CSI1 (Cellulose Synthase Interactive 1), CESA6 (Cellulose Synthase 6) and CESA3 (Cellulose Synthase 3) (Desprez et al., 2007; Gu et al., 2010), were also predicted to be expressed in the network. There were 131 genes without reported functions in cell wall pathways, an indication that GCN analysis can be used to predict undiscovered components of biological pathways in maize.

The cell wall biosynthesis pathway results were also compared with the CORNET Co-expression database (De Bodt et al., 2012) and the STRING functional protein association network (Szklarczyk et al., 2015) using the same 16 genes and similar parameters (see Methods). From CORNET, 10 out of 16 genes had co-expressed genes (Fig. 8B). In total, 210 genes and 325 interactions were retrieved using CORNET, of which 19% (40/210) had publications supporting their function in cell wall pathways (Supplemental Table S11). STRING performed very well, with 14 out of 16 genes demonstrating predicted protein association (Fig. 8C), resulting in 817 interactions with 76 genes. Of the co-expressed genes, 48% (36/75) were experimentally confirmed (Supplemental Table S12), the highest percentage among the three methods. Only one of the lignin biosynthesis genes (PAL1) was found using CORNET, and none were found using STRING. Although STRING appears very robust for predicting protein-protein interactions, this suggests that an optimized GCN analysis has more power to find genes that function together without physically interacting. This case study shows that a robust, optimized GCN can discover physical and functional interactions and enhance the study of biologically relevant interactions. A tutorial on how to use Cytoscape to visualize any co-expressed genes in our network was provided as supplemental material (Supplemental Dataset S2).

Discussion

As the per-read cost of RNA-Seq technology decreases, the use of this technology is quickly increasing. With over five thousand libraries available for maize, there is now ample data to support GCN analysis. This comprehensive evaluation of normalization methods and network inference methods using real maize RNA-Seq data will provide a useful set of optimized parameters to support these analyses.

In our analysis, VST, CPM and RPKM normalization methods had equivalent outcomes for GCN analysis, consistent with prior results using much smaller datasets (Giorgi et al., 2013). Several benchmark studies focusing on differential expression (DE) analysis proposed that RPKM performed poorly and should be avoided (Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, on which normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.
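
The argument about rank correlations can be illustrated with a small sketch. It is not part of the original analysis: the expression values are made up and the function names are hypothetical. It only shows one narrow property, namely that a monotone transformation such as log2(x + 1) leaves a Spearman-type rank correlation unchanged while it generally changes the Pearson correlation. Per-library scaling factors are not covered by this toy case.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    k = 0
    while k < len(order):
        m = k
        while m + 1 < len(order) and values[order[m + 1]] == values[order[k]]:
            m += 1
        for idx in order[k:m + 1]:
            ranks[idx] = (k + m + 2) / 2  # mean of ranks k+1 .. m+1
        k = m + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy expression values for two genes across five samples (invented).
gene_a = [1.0, 5.0, 20.0, 100.0, 400.0]
gene_b = [2.0, 3.0, 50.0, 40.0, 900.0]
log_a = [math.log2(v + 1) for v in gene_a]
log_b = [math.log2(v + 1) for v in gene_b]

scc_raw, scc_log = spearman(gene_a, gene_b), spearman(log_a, log_b)
pcc_raw, pcc_log = pearson(gene_a, gene_b), pearson(log_a, log_b)
print(f"SCC raw={scc_raw:.4f} log={scc_log:.4f}")
print(f"PCC raw={pcc_raw:.4f} log={pcc_log:.4f}")
```

In this toy case the two SCC values coincide because the ranks are identical before and after the transformation, whereas the two PCC values differ.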

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study in human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods assert different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed a more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR together did not result in a network that outperformed PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq based maize GCN. This optimization may apply to a range of datasets that share characteristics with maize, including a large and heterogeneous genome with rich and diverse transposable element composition and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Process

The maize genome and its annotation were downloaded from Ensembl Plants Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, sequenced on Illumina HiSeq 2000 or HiSeq 2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). Adapters were trimmed from the fastq files with Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked with FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). A total of 26 libraries with fewer than 5 million total reads and 11 libraries with an overall alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined with Snakemake v3.7.1 (Köster and Rahmann, 2012).
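
The library exclusion rule can be expressed as a short filter. The sketch below is an illustration under stated assumptions, not the pipeline used in the study: the `Library` record, its field names and the example values are hypothetical, and only the two thresholds (5 million reads, 70% alignment rate) come from the text.

```python
from dataclasses import dataclass

@dataclass
class Library:
    run_id: str            # hypothetical identifier for one RNA-Seq library
    total_reads: int       # number of reads in the library
    alignment_rate: float  # overall alignment rate as a fraction (0 to 1)

MIN_TOTAL_READS = 5_000_000   # libraries below this are excluded
MIN_ALIGNMENT_RATE = 0.70     # libraries below this are excluded

def passes_qc(lib: Library) -> bool:
    """Keep a library only if it meets both thresholds."""
    return (lib.total_reads >= MIN_TOTAL_READS
            and lib.alignment_rate >= MIN_ALIGNMENT_RATE)

def filter_libraries(libraries):
    """Return the libraries retained for the expression table."""
    return [lib for lib in libraries if passes_qc(lib)]

# Invented example records.
libraries = [
    Library("run_a", 12_000_000, 0.91),
    Library("run_b", 3_500_000, 0.88),   # too few reads
    Library("run_c", 20_000_000, 0.55),  # alignment rate too low
]
kept = filter_libraries(libraries)
print([lib.run_id for lib in kept])
```

With the counts reported in the text, the same rule accounts for the sample total: 1303 - 26 - 11 = 1266.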

Gene Count Normalization

The expression data were normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated with the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM + 1) or log2(RPKM + 1)). For both methods, scale factors between samples were estimated by the Trimmed Mean of M-values (TMM) method in edgeR. Variance Stabilizing Transformation (VST) was calculated with the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in additional analysis (15116 genes).
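
The CPM and RPKM definitions and the expression filter can be sketched in a few lines. This is a simplified illustration, not the edgeR code used in the study: it divides by the raw library size rather than a TMM-scaled effective library size, and all function names and example numbers are hypothetical.

```python
import math

def cpm(counts):
    """Counts per million for one library (raw library size, no TMM scaling)."""
    library_size = sum(counts)
    return [c * 1e6 / library_size for c in counts]

def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase per million for one library."""
    library_size = sum(counts)
    return [c * 1e9 / (library_size * length)
            for c, length in zip(counts, gene_lengths_bp)]

def log2_plus_one(values):
    """The transformation described in the text: log2(value + 1)."""
    return [math.log2(v + 1) for v in values]

def keep_gene(cpm_across_samples, min_cpm=2.0, min_samples=1000):
    """True if the gene exceeds min_cpm in more than min_samples samples."""
    n_expressed = sum(1 for v in cpm_across_samples if v > min_cpm)
    return n_expressed > min_samples

# Invented counts for three genes in one library.
counts = [100, 300, 600]
lengths = [1000, 2000, 3000]
cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths)
expression = log2_plus_one(cpm_values)
print(cpm_values, rpkm_values)
```

The default thresholds in `keep_gene` mirror the text (more than 2 CPM in more than 1000 samples); smaller values can be passed when experimenting with toy data.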

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated by the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value assigned rank one. The ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights ranged from zero to one.
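
The rank-based rescaling of adjacency weights can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study. Two details are assumptions: tied values receive their average rank, and Pearson correlation stands in for any of the ten inference methods. The function names and the toy matrix are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(expr):
    """expr: one expression vector per gene. Returns a gene x gene matrix."""
    n = len(expr)
    return [[pearson(expr[i], expr[j]) for j in range(n)] for i in range(n)]

def rank_normalize(matrix):
    """Rank all entries (smallest = 1), divide by the number of entries,
    then set the diagonal to one. Ties get their average rank (assumption)."""
    n = len(matrix)
    cells = sorted((matrix[i][j], i, j) for i in range(n) for j in range(n))
    total = len(cells)
    ranked = [[0.0] * n for _ in range(n)]
    k = 0
    while k < total:
        m = k
        while m + 1 < total and cells[m + 1][0] == cells[k][0]:
            m += 1
        average_rank = (k + m + 2) / 2  # mean of ranks k+1 .. m+1
        for _, i, j in cells[k:m + 1]:
            ranked[i][j] = average_rank / total
        k = m + 1
    for i in range(n):
        ranked[i][i] = 1.0
    return ranked

# Invented expression vectors for four genes across four samples.
expr = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 3, 2, 1], [1, 3, 2, 5]]
network = rank_normalize(correlation_matrix(expr))
for row in network:
    print([round(w, 3) for w in row])
```

Because every off-diagonal value appears twice in a symmetric matrix, the tie-handling choice affects the exact weights but not their ordering.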

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM or VST normalized expression matrices. Networks were then inferred from the randomized expression matrices by the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.
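
The gene ID shuffling used to build the random baseline can be sketched as below. This is an assumed reading of the procedure (each gene ID is reassigned to another gene's expression vector); the dictionary layout, function name and values are hypothetical.

```python
import random

def shuffle_gene_ids(expr_by_gene, seed=None):
    """Reassign gene IDs at random to the existing expression vectors.

    expr_by_gene maps gene ID -> expression vector. The vectors themselves
    are untouched; only the labels are permuted.
    """
    rng = random.Random(seed)
    original_ids = list(expr_by_gene)
    shuffled_ids = original_ids[:]
    rng.shuffle(shuffled_ids)
    return {new_id: expr_by_gene[old_id]
            for new_id, old_id in zip(shuffled_ids, original_ids)}

# Invented expression matrix with three genes and three samples.
expr_by_gene = {
    "gene_1": [5.1, 4.8, 6.0],
    "gene_2": [0.2, 0.0, 0.4],
    "gene_3": [9.7, 9.9, 8.5],
}
randomized = shuffle_gene_ids(expr_by_gene, seed=1)
print(randomized)
```

Repeating the shuffle with different seeds, then inferring and evaluating a network each time, would give a distribution of baseline AUROC values.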

Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions were used for evaluation, defined as those ranking in the top 5% of their results. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 were downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets for HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. It utilizes the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of gene GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).
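
For readers unfamiliar with AUROC, the quantity can be computed directly from ranks. The sketch below is a generic rank-sum (Mann-Whitney) formulation in Python, offered as a stand-in for illustration; it is not the ROCR or EGAD code used in the study, and the scores and labels are invented.

```python
def auroc(scores, labels):
    """AUROC from the rank-sum identity.

    scores: network weights for gene pairs (higher = more co-expressed).
    labels: 1 if the pair is a known positive in the reference set, else 0.
    Tied scores receive their average rank.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    rank_sum_pos = 0.0
    k = 0
    while k < len(pairs):
        m = k
        while m + 1 < len(pairs) and pairs[m + 1][0] == pairs[k][0]:
            m += 1
        average_rank = (k + m + 2) / 2  # mean of ranks k+1 .. m+1
        rank_sum_pos += average_rank * sum(lbl for _, lbl in pairs[k:m + 1])
        k = m + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Invented edge weights and reference labels for four gene pairs.
weights = [0.9, 0.8, 0.3, 0.1]
known = [1, 1, 0, 0]
print(auroc(weights, known))
```

A value of 0.5 corresponds to a ranking no better than chance, which is why randomized networks serve as the baseline.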

Definition of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts two genes are co-expressed but they are not co-expressed in the PPPTY dataset; TN, a network predicts two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts a gene does not have a specific GO term and it does not have that GO term in our GO dataset; FN, a network predicts a gene does not have a specific GO term but it has that GO term in our GO dataset.
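
The four outcomes defined for the PPPTY evaluation can be counted as in the sketch below. It is illustrative only: gene pairs are represented as unordered sets, the gene names and pair lists are invented, and the function names are hypothetical.

```python
def gene_pair(a, b):
    """Unordered gene pair, so (a, b) and (b, a) are the same edge."""
    return frozenset((a, b))

def confusion_counts(predicted_pairs, reference_pairs, all_pairs):
    """Count TP, FP, TN and FN over every candidate gene pair."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for pair in all_pairs:
        predicted = pair in predicted_pairs   # network calls it co-expressed
        known = pair in reference_pairs       # reference set agrees
        if predicted and known:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
        elif known:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

# Invented example with four genes (six possible pairs).
genes = ["g1", "g2", "g3", "g4"]
all_pairs = {gene_pair(a, b) for i, a in enumerate(genes) for b in genes[i + 1:]}
predicted = {gene_pair("g1", "g2"), gene_pair("g1", "g3")}
reference = {gene_pair("g1", "g2"), gene_pair("g3", "g4")}
counts = confusion_counts(predicted, reference, all_pairs)
print(counts)
```

Sweeping the threshold that decides which pairs count as predicted, and recording the true and false positive rates at each step, traces out the ROC curve whose area is the AUROC.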

Network Clustering and Characterization

For each network, the top 1 million edges were selected as stringent co-expression networks. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted by the Network Analyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) with MCL v14-137 and the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.
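
Selecting the strongest edges from a weighted network can be sketched as follows. This is a hypothetical helper, not the code used in the study; the paper selects the top 1 million edges, while the toy example uses k = 2 on a three-gene matrix.

```python
import heapq

def top_edges(weights, k):
    """Return the k heaviest edges as (weight, i, j) tuples with i < j.

    weights is a symmetric gene x gene matrix; only the upper triangle is
    scanned, so each undirected edge is considered once and the diagonal
    is ignored.
    """
    n = len(weights)
    candidates = ((weights[i][j], i, j)
                  for i in range(n) for j in range(i + 1, n))
    return heapq.nlargest(k, candidates)

# Invented rank-normalized weights for three genes.
weights = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
stringent = top_edges(weights, 2)
print(stringent)
```

Using `heapq.nlargest` avoids sorting every candidate edge, which matters when the full matrix holds on the order of a hundred million gene pairs.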

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes involved in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, for which a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
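
The enrichment statistics can be sketched in plain Python. This is an illustration of the general technique, not AgriGO's implementation: it assumes a one-sided hypergeometric test for over-representation and takes the "Yekutieli method" to be the Benjamini-Yekutieli FDR adjustment. Function names and example numbers are hypothetical.

```python
from math import comb

def hypergeom_pvalue(overlap, term_size, module_size, background_size):
    """P(X >= overlap) when drawing module_size genes from a background of
    background_size genes, term_size of which carry the GO term."""
    upper = min(term_size, module_size)
    numerator = sum(
        comb(term_size, i) * comb(background_size - term_size, module_size - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(background_size, module_size)

def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(pvalues)
    harmonic = sum(1 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m * harmonic / rank)
        adjusted[idx] = running_min
    return adjusted

# Invented example: 12 of 50 module genes carry a term annotated to
# 200 of the 15116 background genes.
p = hypergeom_pvalue(12, 200, 50, 15116)
adjusted = benjamini_yekutieli([0.01, 0.04, 0.03])
print(p, adjusted)
```

A term would be retained in this scheme when its adjusted value is at or below 0.05.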

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05 and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4, with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were blasted against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.

Supplemental Figure 2. Distribution of gene expression values.

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.

Supplemental Figure 4. Pairwise comparison among results of inference methods.

Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET).

Supplemental Figure 6. Evaluation of network performance based on sample size and inference.

Supplemental Figure 7. GCN performance comparison between protein networks.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS).

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network).

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).

Supplemental Table S1. RNA-Seq libraries used in this analysis.

Supplemental Table S2. Random network AUROC value baseline.

Supplemental Table S3. ANOVA tables and pairwise comparisons.

Supplemental Table S4. Topological characteristics of four maize networks.

Supplemental Table S5. Gene Ontology annotation for 148 hub genes.

Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S8. 16 query genes in maize cell wall pathway.

Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.

Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.

Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.

Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.

Supplemental Dataset S1. The merged network in Cytoscape-ready format.

Supplemental Dataset S2. Tutorial: Visualizing Co-expression data in Cytoscape.

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008-2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for samples constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for samples constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples constructed using ten inference methods. Outliers were defined as points more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.

657
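The outlier rule stated in the Figure 2 legend can be sketched in a few lines. The Python snippet below is an illustration under stated assumptions, not the code used to draw the figure: the function name `tukey_outliers` and the toy values are hypothetical, and quartile conventions differ between tools, so boundary cases may not match a given plotting package exactly.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    # "inclusive" treats the data as the full population when interpolating
    # quartiles; other quartile conventions can shift the fences slightly.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Toy AUROC-like values, invented purely for illustration.
toy_aurocs = [0.55, 0.60, 0.62, 0.64, 0.65, 0.66, 0.68, 0.70, 0.95]
print(tukey_outliers(toy_aurocs))
```

Values inside the two fences would be drawn as part of the box and whiskers, and only the returned values would appear as individual outlier points.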

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
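Scaling to a standard normal distribution, as described in the Figure 3 legend, amounts to a z-score transformation. The following is a minimal Python sketch, assuming the scaling is applied to the AUROC values of one GO term or gene across the method and normalization combinations. The legend does not state the scaling axis, so that is an assumption, and `zscore` and the toy values are hypothetical.

```python
import statistics


def zscore(values):
    """Center values on their mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # n - 1 denominator
    return [(v - mean) / sd for v in values]


# Toy AUROC values for a single GO term across five methods (invented).
toy_aurocs = [0.61, 0.64, 0.66, 0.70, 0.74]
scaled = zscore(toy_aurocs)
print([round(z, 2) for z in scaled])
```

After this transformation each row has mean 0 and standard deviation 1, which makes terms with different baseline AUROC values comparable on one color scale.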

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural-logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural-logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
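The logarithmic model in Figure 4 is an ordinary least-squares fit of average AUROC on the natural logarithm of sample size. The legend states that the fit was done with lm() in R; the Python sketch below only illustrates the same model form. The function name `fit_log_model` and the toy numbers are hypothetical, and the p-value that lm() reports is not computed here.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of auroc = a + b * ln(sample_size).

    Returns (a, b, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(aurocs)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope on the log scale
    a = mean_y - b * mean_x       # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Toy data: the sample sizes and average AUROC values are invented.
sizes = [12, 24, 48, 404, 1266]
avg_auroc = [0.58, 0.60, 0.61, 0.65, 0.67]
a, b, r2 = fit_log_model(sizes, avg_auroc)
print(f"intercept={a:.3f} slope={b:.3f} R2={r2:.3f}")
```

A positive slope `b` corresponds to performance rising with sample size, with diminishing returns because the predictor is on a log scale.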

Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled S12_1 to S404; S1266 included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated “_1”, “_2” and “_3”. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig 6. GCN performance comparison among single network (white, “1266”), aggregated network (grey, “agg”) and protein network (dark grey, “pr”) using PCC, SCC, MRNET and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star marks the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files) and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by more than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithmic transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2-transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method is plotted in the diagonal blocks, between the AUROC values and the PCC values. AUROC values evaluated by GO datasets are plotted pairwise in the triangle below the diagonal, with the corresponding Pearson correlation coefficients shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method is plotted in the diagonal blocks, between the AUROC values and the PCC values. AUROC values evaluated by PPPTY datasets are plotted pairwise in the triangle below the diagonal, with the corresponding Pearson correlation coefficients shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of the full set of 1720 PPPTY genes (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET). The average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference method. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated “_1”, “_2” and “_3”. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks in each category. Mean values of each network are marked with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distributions. The R2 value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node degree distribution for four networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network). The number of edges linked to each gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R2 indicated beside it.

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl: 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv: 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B: 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra JL, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R package version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics: 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res: gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics: pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: 10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science: 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv: 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je BI, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter CT 3rd, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, van Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of “guilt-by-association” within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) “Out of Pollen” Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO dataset comparisons for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the same ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the same ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
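The outlier rule stated in the Figure 2 legend can be sketched in a few lines of Python. This is an illustration only, not the authors' plotting code: the function name `iqr_outliers`, the example AUROC values, and the quantile interpolation (Python's "inclusive" method) are assumptions, and other quantile definitions can shift the fences slightly.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # Quartiles of the data; "inclusive" interpolation is an assumed choice.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical AUROC values, for illustration only.
aurocs = [0.62, 0.65, 0.66, 0.68, 0.70, 0.71, 0.73, 0.95]
print(iqr_outliers(aurocs))
```

With `k=1.5` this corresponds to the fences described in the legend; only values beyond them would be drawn as separate points in a box plot.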


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a blue-to-red color scale. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
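The scaling and distance steps named in this legend can be illustrated with a small Python sketch. It assumes that scaling was applied row-wise (one row per GO term or gene, one column per method) and uses the sample standard deviation; the helper names `scale_row` and `method_distance` and the AUROC values are hypothetical, and the clustering itself is not reproduced.

```python
import math
from statistics import mean, stdev


def scale_row(values):
    """Center a row on 0 and divide by its sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return [0.0] * len(values)  # constant row: nothing to scale
    return [(v - mu) / sd for v in values]


def method_distance(scaled_rows, i, j):
    """Euclidean distance between the columns for methods i and j."""
    col_i = [row[i] for row in scaled_rows]
    col_j = [row[j] for row in scaled_rows]
    return math.dist(col_i, col_j)


# Hypothetical AUROC values: one row per GO term, one column per method.
auroc = [
    [0.60, 0.70, 0.80],
    [0.55, 0.65, 0.60],
]
scaled = [scale_row(row) for row in auroc]
print(method_distance(scaled, 1, 2))
```

Methods whose scaled AUROC profiles are separated by small distances would end up close together in a clustering based on Euclidean distance.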


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and P values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
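The kind of model described in the Figure 4 legend, average AUROC regressed on the natural logarithm of sample size, can be sketched as an ordinary least-squares fit. The authors used lm() in R; the Python below is only an equivalent illustration, the function name `fit_log_model` and the AUROC values are hypothetical, and P values are not computed here.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of auroc = intercept + slope * ln(sample_size)."""
    x = [math.log(n) for n in sample_sizes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(aurocs) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of variance in AUROC explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2
                 for xi, yi in zip(x, aurocs))
    ss_tot = sum((yi - mean_y) ** 2 for yi in aurocs)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Sample sizes 12 and 1266 come from the legends; the rest is made up.
sizes = [12, 50, 200, 1266]
values = [0.58, 0.63, 0.66, 0.71]  # hypothetical average AUROC values
intercept, slope, r_squared = fit_log_model(sizes, values)
print(f"AUROC = {intercept:.3f} + {slope:.3f} * ln(n), R^2 = {r_squared:.3f}")
```

A positive slope in such a fit means that each multiplicative increase in sample size adds a roughly constant amount to the average AUROC.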

Fig 4 panel titles: A, GO PCC, GO SCC, GO MRNET, GO CLR; B, PPPTY PCC, PPPTY SCC, PPPTY MRNET, PPPTY CLR.


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as “_1”, “_2” and “_3”. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.

Fig 6 panel labels: A, Protein GO; B, Protein PPPTY; vertical axes, AUC.

Fig 7 graphic labels: A and B, Venn diagrams of the MS, PA and SA networks; C, GO term enrichment map labeled "Hub" (terms shown include chromatin organization, DNA replication, DNA repair and gene silencing); D, GO term enrichment map labeled "Cell Wall" (terms shown include cellulose biosynthetic process, cell wall biogenesis and phenylpropanoid metabolic process).

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
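The overlap counts reported in the Figure 7 legend amount to intersecting the top-ranked edges of each network. A minimal Python sketch follows, assuming each network is available as a mapping from a gene pair to an edge score with every pair listed once; the function names, gene identifiers and scores are hypothetical and this is not the authors' pipeline.

```python
def top_edges(scores, k):
    """Return the k highest-scoring edges as unordered gene pairs."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # frozenset makes (a, b) and (b, a) compare as the same edge.
    return {frozenset(edge) for edge in ranked[:k]}


def shared_by_all(*edge_sets):
    """Edges present in every one of the given edge sets."""
    return set.intersection(*edge_sets)


# Toy networks standing in for PA, SA and MS (made-up genes and scores).
pa = {("g1", "g2"): 0.90, ("g1", "g3"): 0.80, ("g2", "g3"): 0.10}
sa = {("g2", "g1"): 0.95, ("g3", "g4"): 0.70, ("g1", "g3"): 0.20}
ms = {("g1", "g2"): 0.50, ("g1", "g3"): 0.40, ("g3", "g4"): 0.30}

shared = shared_by_all(top_edges(pa, 2), top_edges(sa, 2), top_edges(ms, 2))
print(len(shared))
```

The same set logic applied to gene lists instead of edges would give the shared highly interacting genes of panel A.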



Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.


Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff N V (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


(Maza et al., 2013; Dillies et al., 2013b; Zyprych-Walczak et al., 2015). This was not observed for the maize GCN testing. It is possible that the large number of samples from various labs created enough heterogeneity within samples that normalization effects were minimized (Paulson et al., 2016). Furthermore, the normalization is on a library basis, which means genes within the same library are normalized by similar factors. So when the network is constructed by PCC and BIC, where expression vectors are centered by mean or median values, the effect of different normalization methods is probably small. The two rank correlations, SCC and KCC, only consider differences in relative rankings, on which normalization has a limited effect. The same holds for the GCC method. The estimation of mutual information is based on the k-nearest neighbor method implemented in parmigene (Sales and Romualdi, 2011). Since the three normalization methods shared similar expression distributions (Supplemental Fig. 2), MI estimations from different normalizations are expected to be similar.
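The rank-based part of this argument can be illustrated with a minimal Python sketch. The analysis itself was done in R; Python, the helper names, and the toy expression vectors below are assumptions made only for illustration. The sketch shows that a strictly increasing transformation such as log2(x + 1) leaves SCC unchanged while PCC can shift. It does not model per-library scale factors, which may still reorder values across samples.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    k = 0
    while k < len(order):
        m = k
        # Extend m to the end of the current run of tied values.
        while m + 1 < len(order) and values[order[m + 1]] == values[order[k]]:
            m += 1
        avg = (k + m + 2) / 2.0
        for t in range(k, m + 1):
            ranks[order[t]] = avg
        k = m + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy expression vectors for two hypothetical genes across four samples.
gene_a = [1.0, 10.0, 100.0, 1000.0]
gene_b = [2.0, 1.0, 4.0, 3.0]
gene_a_log = [math.log2(v + 1.0) for v in gene_a]

print(spearman(gene_a, gene_b), spearman(gene_a_log, gene_b))
print(pearson(gene_a, gene_b), pearson(gene_a_log, gene_b))
```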

When assessing inference methods, the simple and widely used correlation methods like PCC and SCC are less time-consuming than MI methods. This analysis showed that PCC/SCC-built GCNs had better overall performance. This is consistent with a study in human GCN analysis (Ballouz et al., 2015), but SCC did not score higher than other correlation methods using GO and PPPTY evaluations. Some genes had higher performance using MI methods, but this effect was limited to evaluation with the PPPTY data. This may indicate that correlation and MI inference methods assert different kinds of interactions (Meyer et al., 2008; Marbach et al., 2012; Song et al., 2012). Marbach et al. (2012) stated that integration of multiple inference methods showed a more robust performance than any single inference method in in silico and E. coli expression networks, referring to "the wisdom of crowds". However, for analysis of the available maize data, integration of PCC, SCC, MRNET and CLR together did not result in a network that outperformed PCC and SCC networks (data not shown). This approach was also less effective in more complex S. cerevisiae datasets than in prokaryotic networks (Marbach et al., 2012), suggesting that more work is required to determine whether integrating algorithms can improve GCNs with eukaryotic data.

In conclusion, we extensively evaluated normalization methods and inference methods for building an RNA-Seq based maize GCN. This optimization may apply to a range of datasets with shared characteristics of maize, including a large and heterogeneous genome with rich and diverse transposon element composition, and limited gene annotation.

Materials and Methods

RNA-Seq Data Collection and Process

The maize genome and its annotation were downloaded from Ensembl Plant Release 31 (http://plants.ensembl.org). The original 1303 RNA-Seq samples, based on Illumina HiSeq2000 or HiSeq2500, were downloaded from the NCBI Sequence Read Archive (SRA) (Leinonen et al., 2010). The downloaded files were converted to fastq format using the fastq-dump command in SRA Toolkit (version 2.5.2). The adapters for the fastq files were trimmed by Cutadapt 1.8.1 (Martin, 2011). The adapter-removed files were then quality checked by FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level expression raw read counts were calculated by FeatureCounts 1.5.0 (Liao et al., 2014) from aligned bam files (Supplemental Fig. S1). 26 libraries with less than 5 million reads total and 11 libraries with less than 70% total alignment rate were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined by Snakemake v3.7.1 (Köster and Rahmann, 2012).
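The library exclusion step can be sketched as follows. This is a minimal Python illustration, not the Snakemake workflow used here; the function names, the library IDs, and the per-library statistics are hypothetical, and only the two thresholds (5 million reads, 70% alignment rate) come from the text.

```python
def passes_qc(total_reads, alignment_rate, min_reads=5_000_000, min_rate=0.70):
    """Keep a library only if it meets both thresholds stated in the text."""
    return total_reads >= min_reads and alignment_rate >= min_rate


def filter_libraries(stats):
    """stats maps a library ID to (total_reads, overall_alignment_rate)."""
    return {lib: s for lib, s in stats.items() if passes_qc(*s)}


# Hypothetical per-library statistics, for illustration only.
example_stats = {
    "LIB_A": (12_400_000, 0.91),  # passes both thresholds
    "LIB_B": (3_100_000, 0.88),   # too few reads
    "LIB_C": (20_000_000, 0.52),  # alignment rate too low
}
kept = filter_libraries(example_stats)
print(sorted(kept))
```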

Gene Count Normalization

The expression data were normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase Per Million (RPKM) were calculated by the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM/RPKM + 1)). For both methods, scale factors between samples were estimated by Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated by the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included for additional analysis (15116 genes).
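The CPM transformation and the expression filter can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the edgeR computation: TMM scale factors are not estimated here and are represented by an optional `norm_factors` argument that defaults to 1, and all function and gene names are hypothetical.

```python
import math


def cpm_matrix(counts, norm_factors=None):
    """counts maps gene ID -> list of raw read counts, one per library.

    Assumption: effective library size = column sum * norm factor, where the
    factors stand in for TMM factors that would be estimated elsewhere.
    """
    genes = list(counts)
    n_lib = len(counts[genes[0]])
    lib_sizes = [sum(counts[g][s] for g in genes) for s in range(n_lib)]
    if norm_factors is None:
        norm_factors = [1.0] * n_lib
    effective = [lib_sizes[s] * norm_factors[s] for s in range(n_lib)]
    return {g: [counts[g][s] / effective[s] * 1e6 for s in range(n_lib)]
            for g in genes}


def log2_plus_one(matrix):
    """expression = log2(value + 1), applied to every entry."""
    return {g: [math.log2(v + 1.0) for v in vals] for g, vals in matrix.items()}


def filter_expressed(cpm, min_cpm=2.0, min_samples=1000):
    """Keep genes with CPM above min_cpm in more than min_samples libraries."""
    return {g: vals for g, vals in cpm.items()
            if sum(1 for v in vals if v > min_cpm) > min_samples}


# Toy example with two libraries; min_samples is lowered accordingly.
toy_counts = {"g1": [10, 20], "g2": [90, 180], "g3": [0, 0]}
toy_cpm = cpm_matrix(toy_counts)
toy_kept = filter_expressed(toy_cpm, min_cpm=2.0, min_samples=1)
toy_expr = log2_plus_one(toy_kept)
```

With the 1266-library table described above, the defaults `min_cpm=2.0` and `min_samples=1000` correspond to the filter stated in the text.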

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated by the adjacencymatrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value receiving rank one. Ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights range from zero to one.
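The rank standardization described at the end of this paragraph can be sketched as below. This is a minimal Python illustration with two assumptions that the text does not specify: tied weights share their average rank, and all n x n entries (including the diagonal) are ranked before the diagonal is overwritten. The function name and the toy matrix are hypothetical.

```python
def rank_normalize(adj):
    """Rank all entries (smallest = 1), divide by n*n, set the diagonal to 1."""
    n = len(adj)
    flat = sorted((adj[i][j], i, j) for i in range(n) for j in range(n))
    total = len(flat)
    out = [[0.0] * n for _ in range(n)]
    k = 0
    while k < total:
        m = k
        # Extend m over the run of entries tied with flat[k].
        while m + 1 < total and flat[m + 1][0] == flat[k][0]:
            m += 1
        avg_rank = (k + m + 2) / 2.0  # average of the 1-based ranks k+1 .. m+1
        for _, i, j in flat[k:m + 1]:
            out[i][j] = avg_rank / total
        k = m + 1
    for i in range(n):
        out[i][i] = 1.0
    return out


# Toy symmetric correlation matrix for three hypothetical genes.
toy_adj = [
    [1.0, 0.2, 0.8],
    [0.2, 1.0, -0.5],
    [0.8, -0.5, 1.0],
]
toy_ranked = rank_normalize(toy_adj)
```

Because only the ordering of weights is retained, networks built by correlation and by mutual information end up on the same zero-to-one scale and can be compared or aggregated directly.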

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM or VST normalized expression matrices. The randomized expression matrices were then used for network inference by the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.
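The randomization step can be sketched as follows. This is a hedged Python illustration rather than the R code used for the analysis; the function name, the `seed` parameter, and the toy matrix are assumptions introduced for the example.

```python
import random


def shuffle_gene_ids(expr, seed=None):
    """Return a copy of expr in which gene IDs are randomly reassigned to rows.

    expr maps gene ID -> expression vector. The vectors are left untouched,
    so any link between a gene's identity and its expression profile is broken.
    """
    rng = random.Random(seed)
    ids = list(expr)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    return {new_id: expr[old_id] for old_id, new_id in zip(ids, shuffled)}


# Toy expression matrix with three hypothetical genes and two samples.
toy_expr = {"g1": [1.0, 2.0], "g2": [3.0, 4.0], "g3": [5.0, 6.0]}
toy_random = shuffle_gene_ids(toy_expr, seed=1)
```

Repeating the shuffle with different seeds, then inferring and evaluating a network each time, gives the baseline distribution of evaluation scores expected by chance.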

Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions were used for evaluation, as defined by ranking in the top 5% of their results. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathways were considered as co-expressed. Third, maize gene ontology data for AGPv3.30 were downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets for HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely-used Area under Receiver Operating Characteristic (AUROC) for binary classification problems was used for evaluations. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. Basically, it utilizes the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes' GO terms and testing whether the hidden GO terms could be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).
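For readers unfamiliar with AUROC, the quantity can be sketched through its rank-sum (Mann-Whitney) form. This is a minimal Python illustration, not the ROCR or EGAD implementation; the function name and the toy scores are assumptions. Scores are network edge weights for candidate gene pairs, and a label of 1 marks a pair that is listed as co-expressed in the evaluation dataset.

```python
def auroc(scores, labels):
    """AUROC via the rank-sum identity; ties receive their average rank."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    k = 0
    while k < n:
        m = k
        while m + 1 < n and pairs[m + 1][0] == pairs[k][0]:
            m += 1
        avg = (k + m + 2) / 2.0
        for t in range(k, m + 1):
            ranks[t] = avg
        k = m + 1
    n_pos = sum(1 for _, lab in pairs if lab)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum = sum(r for r, (_, lab) in zip(ranks, pairs) if lab)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy edge weights: positives ranked above all negatives, then interleaved.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
chance_like = auroc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0])
```

An AUROC of 1 means every known co-expressed pair outranks every other pair, and a value near 0.5 is what a random network is expected to give.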

Definition of True Positives (TP), False Positives (FP), True Negatives (TN), False Negatives (FN): For the evaluation using the PPPTY dataset, TP: a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP: a network predicts two genes are co-expressed but they are not; TN: a network predicts two genes are not co-expressed and they are not co-expressed in PPPTY; FN: a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset, TP: a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP: a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN: a network predicts a gene does not have a specific GO term and it doesn't have it in our GO dataset; FN: a network predicts a gene does not have a specific GO term but it has that GO term in the GO dataset.
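The pairwise definitions for the PPPTY evaluation can be written out as a small Python sketch. The function name, gene IDs, and edge sets below are hypothetical and serve only to make the four categories concrete; gene pairs are treated as unordered.

```python
from itertools import combinations


def confusion_counts(genes, predicted_edges, reference_edges):
    """Count TP, FP, TN, FN over all unordered gene pairs."""
    predicted = {frozenset(e) for e in predicted_edges}
    reference = {frozenset(e) for e in reference_edges}
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, b in combinations(genes, 2):
        pair = frozenset((a, b))
        if pair in predicted:
            counts["TP" if pair in reference else "FP"] += 1
        else:
            counts["FN" if pair in reference else "TN"] += 1
    return counts


# Toy example: four genes give six possible pairs.
toy_counts = confusion_counts(
    genes=["a", "b", "c", "d"],
    predicted_edges=[("a", "b"), ("a", "c")],
    reference_edges=[("b", "a"), ("c", "d")],
)
print(toy_counts)
```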

Network Clustering and Characterization

For each network, the top 1 million edges were selected as stringent co-expression networks. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted by the Network Analyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) by MCL v14-137 with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.
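Selecting the strongest edges from a weighted network can be sketched as below. This Python illustration is an assumption about one reasonable way to do it, not the procedure used here; the function name and the toy matrix are hypothetical. Only the upper triangle is scanned so that each gene pair is counted once.

```python
import heapq


def top_edges(weights, genes, n_edges):
    """Return the n_edges heaviest gene pairs as (weight, gene_i, gene_j)."""
    n = len(genes)
    candidates = (
        (weights[i][j], genes[i], genes[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    # nlargest streams over the pairs and keeps only n_edges of them in memory.
    return heapq.nlargest(n_edges, candidates)


# Toy rank-standardized network for three hypothetical genes.
toy_genes = ["g1", "g2", "g3"]
toy_weights = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.7],
    [0.2, 0.7, 1.0],
]
toy_top = top_edges(toy_weights, toy_genes, n_edges=2)
```

For the networks described above, `n_edges` would be 1,000,000, and the resulting edge list is the kind of table that can be exported for clustering and visualization.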

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed in AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes involved in our networks were used as background references. Hypergeometric testing was used to calculate the p-value, for which a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
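The statistics behind this step can be sketched in Python. This is not AgriGO's code: the function names are hypothetical, the enrichment p-value is taken to be the upper tail of the hypergeometric distribution, and the Yekutieli correction is assumed to be the Benjamini-Yekutieli procedure for FDR control under dependency.

```python
from math import comb


def hypergeom_upper_tail(k, n_query, n_annotated, n_background):
    """P(X >= k): chance of k or more annotated genes in a query of n_query
    genes drawn without replacement from n_background genes, of which
    n_annotated carry the GO term."""
    denom = comb(n_background, n_query)
    upper = min(n_query, n_annotated)
    numer = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_query - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


def yekutieli_adjust(pvalues):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy numbers: 10 background genes, 4 annotated, query of 3 with 2 hits.
toy_p = hypergeom_upper_tail(k=2, n_query=3, n_annotated=4, n_background=10)
toy_adjusted = yekutieli_adjust([0.01, 0.04, 0.03])
```

In the setting described above, `n_background` would be 15116, and a term would be retained when its adjusted value is at most 0.05.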

Databases Comparison on Cell Wall Pathway

Sixteen well characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on the website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05, and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4 with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were blasted against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. Also, we thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for the helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.
Supplemental Figure 2. Distribution of gene expression values.
Supplemental Figure 3. Maize CPM-normalized with log2 transformed gene expression from all tissues and developmental stages.
Supplemental Figure 4. Pairwise comparison among results of inference methods.
Supplemental Figure 5. Characteristic of all 1720 PPPTY gene set (ALL_1720), genes with highest AUROC values in CSC method (CSC), PCC method (PCC) and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS).
Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS), and the intersection among three networks (Merged network).
Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. 16 query genes in maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.
Supplemental Dataset S2. Tutorial: Visualizing Co-expression data in Cytoscape.


Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year and generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year from 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
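For readers less familiar with the metric, an AUROC value can be computed from ranked prediction scores with the rank-sum (Mann-Whitney U) identity. The following is a minimal Python sketch under stated assumptions, not the evaluation pipeline used in this study; the function name `auroc` and the toy scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: predicted association strength per gene (higher = more likely positive)
    labels: 1 for genes in the reference set, 0 otherwise
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Rank scores in ascending order (1-based); tied scores share their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # U statistic of the positives, normalized by the number of positive/negative pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical example: four genes, two of which belong to the reference set.
example = auroc([0.9, 0.6, 0.4, 0.1], [1, 0, 1, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to a ranking in which every reference gene scores above every non-reference gene.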


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
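The row scaling described in this legend can be illustrated with a short Python sketch. It assumes that each row (one GO term or gene) is converted to z-scores across methods and that values are then clipped to the plotted range of -3 to 3; the clipping step, the use of the sample standard deviation, and the function name `scale_row` are assumptions made for illustration, not details confirmed by the text.

```python
import statistics


def scale_row(values, limit=3.0):
    """Convert one row of AUROC values to z-scores and clip them to [-limit, limit]."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        # A constant row carries no contrast between methods.
        return [0.0 for _ in values]
    return [max(-limit, min(limit, (v - mean) / sd)) for v in values]


# Hypothetical AUROC values of one gene under three methods.
scaled = scale_row([0.5, 0.6, 0.7])
```

Scaling each row separately highlights which methods perform relatively better or worse for a given gene, independent of how easy that gene is to predict overall.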

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
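The logarithmic model in this legend regresses average AUROC on the natural logarithm of sample size. The legend states that the fit was done with lm() in R; the Python sketch below only illustrates the same ordinary least-squares calculation, and the function name `fit_log_model` and the example numbers are hypothetical. It returns the intercept, slope, and R-squared but does not compute the p-value that lm() also reports.

```python
import math


def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of mean_auroc = intercept + slope * ln(sample_size)."""
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_aurocs)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For simple linear regression, R-squared is the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else float("nan")
    return intercept, slope, r_squared


# Hypothetical data generated from an exact logarithmic trend.
sizes = [12, 24, 48, 96, 404, 1266]
aurocs = [0.55 + 0.02 * math.log(s) for s in sizes]
intercept, slope, r_squared = fit_log_model(sizes, aurocs)
```

A positive slope in such a fit means that each multiplicative increase in sample size adds a roughly constant increment to average AUROC.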

Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled S12_1 to S404; the S1266 network included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated “_1”, “_2”, and “_3”. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, “1266”), aggregated network (grey, “agg”), and protein network (dark grey, “pr”) using PCC, SCC, MRNET, and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for the 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported into the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2-transformed expression values. C, Normalized gene expression values for 15,116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
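The CPM and RPKM normalizations named in this legend follow standard definitions, which the Python sketch below spells out for a single library. It is illustrative only: the function names are hypothetical, library size is taken as the plain sum of gene counts, and the pseudocount of 1 added before the log2 transformation is an assumption, since the legend does not state which offset was used.

```python
import math


def cpm(counts):
    """Counts per million for one library: count / library size * 1e6."""
    library_size = sum(counts)  # assumption: plain sum of gene-level counts
    return [c / library_size * 1e6 for c in counts]


def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase per million mapped reads: CPM divided by gene length in kb."""
    return [value / (length / 1000)
            for value, length in zip(cpm(counts), gene_lengths_bp)]


def log2_transform(values, pseudocount=1.0):
    """log2(value + pseudocount); the pseudocount avoids log2(0) and is an assumption."""
    return [math.log2(v + pseudocount) for v in values]


# Hypothetical library with three genes.
counts = [10, 30, 60]
lengths = [1000, 2000, 500]
cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths)
```

Because RPKM divides by gene length while CPM does not, the two differ most for very short or very long genes, which is the dependence panel C examines.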

Supplemental Figure 3. CPM-normalized, log2-transformed maize gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST-, CPM-, and RPKM-normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST-, CPM-, and RPKM-normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Supplemental Figure 5. Characteristics of the full PPPTY gene set of 1720 genes (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). The average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference method. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated “_1”, “_2”, and “_3”. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network are labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
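The outlier rule stated in this legend can be written out as a short Python sketch. It uses linearly interpolated quartiles from the standard library; boxplot functions in R compute hinges slightly differently, so borderline points may be classified differently from the published plots. The function name `boxplot_outliers` and the example values are hypothetical.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Return values lying more than k * IQR above Q3 or below Q1."""
    # Quartiles by linear interpolation over the observed data (assumption).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical AUROC values with one unusually high score.
flagged = boxplot_outliers([0.50, 0.52, 0.54, 0.56, 0.58, 0.95])
```

In this example the quartiles are 0.525 and 0.575, so the fences sit at 0.45 and 0.65, and only the value 0.95 is flagged.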

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17,862 genes (ppr_all) and with 11,429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17,862 genes (ppr_all) and with 11,429 genes (ppr). Both networks were constructed using the Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distributions. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.
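One common way to obtain a power-law fit of a degree distribution, together with its R², is ordinary least squares on log-transformed axes. The Python sketch below illustrates that approach; it is an assumption that the published curves were fitted in a comparable way, and the function name `fit_power_law` and the example data are hypothetical.

```python
import math


def fit_power_law(degrees, node_counts):
    """Fit node_count = a * degree ** b by least squares on log-log axes."""
    # Zero degrees or zero counts cannot be log-transformed, so they are skipped.
    points = [(math.log(k), math.log(c))
              for k, c in zip(degrees, node_counts) if k > 0 and c > 0]
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    b = sxy / sxx                      # exponent of the power law
    a = math.exp(y_bar - b * x_bar)    # multiplicative constant
    r_squared = sxy ** 2 / (sxx * syy)  # computed on the log-log scale
    return a, b, r_squared


# Hypothetical degree distribution following an exact power law.
degrees = [1, 2, 4, 8, 16]
node_counts = [100 * k ** -1.5 for k in degrees]
a, b, r_squared = fit_power_law(degrees, node_counts)
```

A negative exponent with R² close to 1 is the usual signature of a scale-free topology, in which a few hub genes carry most of the edges.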


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are shown as light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O’Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D’haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock’s challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R package version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146-2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp1501821

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year and generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
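The AUROC statistic named in the Figure 2 caption can be illustrated with a minimal sketch. It assumes each candidate gene has a network-derived score and a binary label (1 if the gene belongs to the evaluation set, such as a GO term); the function name `auroc` and the example values are hypothetical, and this is a generic rank-based calculation rather than the evaluation pipeline used in the study.

```python
def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U), using average ranks for ties."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Hypothetical scores for four genes; the first two are known members.
example = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
print(example)
```

An AUROC of 0.5 corresponds to random ranking, and values closer to 1 mean that known members are ranked above non-members.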


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
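The scaling and distance steps described in the Figure 3 caption can be sketched as follows. The sketch assumes that "scaled to standard normal distribution" means a per-row z-score (subtract the row mean, divide by the sample standard deviation) and that methods are then compared by the Euclidean distance between their columns; the function names and AUROC values are hypothetical, and the clustering step itself is omitted.

```python
import math
import statistics


def zscale(values):
    """Scale one row to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def method_distances(auroc_rows, methods):
    """Scale each row (one GO term or gene), then compare method columns."""
    scaled = [zscale(row) for row in auroc_rows]
    columns = {m: [row[i] for row in scaled] for i, m in enumerate(methods)}
    return {
        (a, b): euclidean(columns[a], columns[b])
        for a in methods
        for b in methods
    }


# Hypothetical AUROC values: rows are GO terms, columns are methods.
methods = ["PCC", "SCC", "CLR"]
auroc_rows = [
    [0.70, 0.71, 0.60],
    [0.65, 0.66, 0.80],
    [0.55, 0.56, 0.50],
]
distances = method_distances(auroc_rows, methods)
closest = min(
    (pair for pair in distances if pair[0] < pair[1]),
    key=distances.get,
)
print(closest)
```

The resulting distance table is the kind of input a hierarchical clustering routine would take to group methods with similar performance profiles.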


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
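The logarithmic regression in Figure 4 was fitted with lm() in R. As an illustration only, the same kind of model, average AUROC = a + b * ln(sample size), can be fitted by ordinary least squares as sketched below; the function name `fit_log_model` and the data points are hypothetical, and the p-value calculation is left out.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of auroc = intercept + slope * ln(sample size)."""
    x = [math.log(n) for n in sample_sizes]
    y = list(aurocs)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical average AUROC values for networks of increasing size.
sizes = [12, 50, 200, 600, 1266]
values = [0.55, 0.58, 0.61, 0.63, 0.64]
intercept, slope, r_squared = fit_log_model(sizes, values)
print(intercept, slope, r_squared)
```

A positive slope in such a fit means that average AUROC rises with sample size, with diminishing returns as more libraries are added.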


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means and grey dots indicate outliers.


Fig 7 (panels A and B: Venn diagrams of the MS, PA and SA networks; panels C and D: GO term enrichment maps, with clusters labeled Hub and Cell Wall)

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 x 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
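The overlap counts in a three-set Venn diagram such as Figure 7A can be derived with plain set operations. The sketch below assumes that the "highest interacting" genes are the genes with the largest interaction counts in each network; the function names, gene identifiers and counts are hypothetical and do not reproduce the values reported in the figure.

```python
def top_k_genes(interaction_counts, k):
    """Return the k genes with the most interactions (ties broken by name)."""
    ranked = sorted(interaction_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gene for gene, _ in ranked[:k]}


def venn3_counts(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }


# Hypothetical interaction counts per gene in three networks.
pa = top_k_genes({"g1": 9, "g2": 8, "g3": 7, "g4": 1}, k=3)
sa = top_k_genes({"g1": 9, "g3": 8, "g5": 7, "g2": 1}, k=3)
ms = top_k_genes({"g1": 9, "g5": 8, "g6": 7, "g3": 1}, k=3)
regions = venn3_counts(pa, sa, ms)
print(regions["a_b_c"])
```

The seven region sizes always sum to the size of the union of the three sets, which gives a quick consistency check on any reported Venn diagram.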


Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.


Parsed Citations

Allen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One 7 e29348

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology Plant Cell Physiol 48 381-90

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Network expansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem W Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science 320 938-941

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbers Bioinformatics 31 2123-2130

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868

Barabási A-L Oltvai ZN (2004) Network biology understanding the cell's functional organization Nat Rev Genet 5 101-113

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat Soc Ser B 289-300

De Bodt S Hollunder J Nelissen H Meulemeester N Inzé D (2012) CORNET 2.0 Integrating plant coexpression protein-protein interactions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K O'Connor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems Genes Dev 26 1685-90

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMC Bioinformatics 10 421

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al (2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

D'haeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineering Bioinformatics 16 707-726

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Höfte H Gonneau M Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures Nature Protoc 7 670-85

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575-1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Fedoroff N V (2012) McClintock's challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244-56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277-286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1-8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 9 29-46

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput Biol 4 e1000117

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science (80- ) 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


checked by FastQC v0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). HISAT2 v2.0.4 (Kim et al., 2015) was used for genome alignment. Gene-level raw read counts were calculated by featureCounts v1.5.0 (Liao et al., 2014) from aligned BAM files (Supplemental Fig. S1). Twenty-six libraries with fewer than 5 million total reads and 11 libraries with an overall alignment rate below 70% were excluded, leaving 1266 samples (Supplemental Table S1) for the final expression table. The processing protocol was streamlined with Snakemake v3.7.1 (Köster and Rahmann, 2012).
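
The library exclusion step can be illustrated with a minimal Python sketch. It is not the authors' Snakemake pipeline: the library IDs, read totals and alignment rates below are invented for illustration, and only the two thresholds (5 million reads, 70% alignment) come from the text.

```python
# Hypothetical per-library QC records: (library_id, total_reads, alignment_rate).
# The IDs and numbers are made up for illustration only.
libraries = [
    ("lib_A", 12_400_000, 0.91),
    ("lib_B", 3_100_000, 0.88),   # fewer than 5 million reads: excluded
    ("lib_C", 20_050_000, 0.64),  # alignment rate below 70%: excluded
    ("lib_D", 5_000_000, 0.70),   # exactly at both thresholds: kept
]

MIN_READS = 5_000_000   # libraries with fewer reads are excluded
MIN_ALIGN_RATE = 0.70   # libraries with a lower overall alignment rate are excluded


def passes_qc(total_reads, alignment_rate):
    """Return True when a library meets both thresholds."""
    return total_reads >= MIN_READS and alignment_rate >= MIN_ALIGN_RATE


kept = [lib_id for lib_id, reads, rate in libraries if passes_qc(reads, rate)]
print(kept)
```

Whether a library sitting exactly on a threshold is kept is an assumption of this sketch; the text only states that libraries below the thresholds were removed.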

Gene Count Normalization

The expression data were normalized using three different methods before constructing GCNs. Counts Per Million (CPM) and Reads Per Kilobase per Million (RPKM) were calculated with the edgeR package (Robinson et al., 2010) in the R environment and then log2 transformed (expression = log2(CPM + 1) or log2(RPKM + 1)). For both methods, scale factors between samples were estimated by the Trimmed Mean of M-values (TMM) in edgeR. Variance Stabilizing Transformation (VST) was calculated with the DESeq2 package (Love et al., 2014). Only genes with expression higher than 2 CPM in more than 1000 samples were included in further analysis (15116 genes).
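
To make the CPM and RPKM definitions and the expression filter concrete, here is a simplified Python sketch. It is an illustration only: it uses the plain library size and does not apply the TMM scale factors that edgeR estimates, and the gene names, counts and lengths are hypothetical.

```python
import math


def cpm(counts):
    """Counts per million for one sample (dict: gene -> raw read count)."""
    library_size = sum(counts.values())
    return {gene: c / library_size * 1e6 for gene, c in counts.items()}


def rpkm(counts, gene_lengths_bp):
    """RPKM: CPM divided by the gene length expressed in kilobases."""
    sample_cpm = cpm(counts)
    return {g: sample_cpm[g] / (gene_lengths_bp[g] / 1000.0) for g in counts}


def log2_plus_one(values):
    """The log2(x + 1) transform applied after CPM or RPKM."""
    return {g: math.log2(v + 1.0) for g, v in values.items()}


def expressed_genes(cpm_by_sample, min_cpm=2.0, min_samples=1000):
    """Genes with CPM above min_cpm in more than min_samples samples."""
    genes = list(cpm_by_sample[0].keys())
    return [g for g in genes
            if sum(sample[g] > min_cpm for sample in cpm_by_sample) > min_samples]


# Toy data (hypothetical): two samples, three genes.
sample_1 = {"g1": 500, "g2": 1500, "g3": 0}
sample_2 = {"g1": 1, "g2": 999_999, "g3": 0}
lengths = {"g1": 2000, "g2": 500, "g3": 1000}

cpm_1 = cpm(sample_1)
rpkm_1 = rpkm(sample_1, lengths)
log_cpm_1 = log2_plus_one(cpm_1)
# With only two toy samples, "more than 1000 samples" is scaled down to "more than 1".
kept_genes = expressed_genes([cpm_1, cpm(sample_2)], min_cpm=2.0, min_samples=1)
print(kept_genes)
```

The default arguments of the hypothetical `expressed_genes` helper mirror the thresholds stated above (2 CPM, 1000 samples).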

Network Inference

Six correlation coefficient methods and four mutual information methods were applied to normalized gene expression data to construct GCNs. All computing steps were done in the R 3.3.1 environment. Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) were calculated by the cor() function. Kendall rank Correlation Coefficient was calculated using the cor.fk() function in the pcaPP package (Filzmoser et al., 2009). Gini Correlation Coefficient was calculated by the adjacency.matrix() function in the rsgcc package (Ma and Wang, 2012). Biweight midcorrelation was computed by the bicor() function in the WGCNA package (Langfelder and Horvath, 2008). Cosine similarity coefficient was computed by the cosine() function in the coop package (Schmidt, 2016). Mutual information results were computed using the parmigene package (Sales and Romualdi, 2011). The adjacency matrix weights derived from the ten inference methods were ranked, with the smallest value receiving rank one. Ranks were then divided by the number of elements in the matrix, and the diagonal was set to one, so that all network weights ranged from zero to one.
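
The rank-based rescaling of adjacency weights described above can be sketched as follows. This is one possible reading, not the authors' R code: it assumes that all n x n entries are ranked together, that tied values share their average rank, and that the divisor is n x n. The example matrix is made up.

```python
def rank_normalize(matrix):
    """Rescale a square weight matrix to (0, 1] using ranks.

    Assumptions of this sketch: every entry is ranked (smallest = rank 1),
    ties receive their average rank, ranks are divided by the total number
    of entries, and the diagonal is then set to one.
    """
    n = len(matrix)
    flat = sorted((matrix[i][j], i, j) for i in range(n) for j in range(n))
    total = len(flat)
    ranked = [[0.0] * n for _ in range(n)]
    pos = 0
    while pos < total:
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < total and flat[end + 1][0] == flat[pos][0]:
            end += 1
        avg_rank = ((pos + 1) + (end + 1)) / 2.0  # ranks are 1-based
        for _, i, j in flat[pos:end + 1]:
            ranked[i][j] = avg_rank / total
        pos = end + 1
    for i in range(n):
        ranked[i][i] = 1.0
    return ranked


# Hypothetical symmetric correlation matrix for three genes.
weights = [
    [1.0, 0.2, 0.8],
    [0.2, 1.0, 0.5],
    [0.8, 0.5, 1.0],
]
normalized = rank_normalize(weights)
print(normalized)
```

Because only the ordering of weights survives this step, networks built with different inference methods end up on a common scale, which is what makes them comparable and combinable later.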

Network Performance Evaluation

To generate the random networks, gene IDs were shuffled randomly in CPM- or VST-normalized expression matrices. Networks were then inferred from the randomized expression matrices by the PCC, MRNET or CLR methods and evaluated. For the PCC method, 1000 repeats of randomization and evaluation were conducted. For MRNET and CLR, each inference step took 2 hours on our server, so 10 repeats were conducted.
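
As a rough illustration of the randomization step, the sketch below permutes gene IDs over the rows of an expression table so that each expression profile is kept intact but attached to a random gene. The function name, the seed argument and the toy data are assumptions of this sketch, not part of the published pipeline.

```python
import random


def shuffle_gene_ids(expression, seed=None):
    """Return a copy of the table with gene IDs randomly reassigned to rows.

    expression: dict mapping gene_id -> list of expression values.
    The profiles themselves are unchanged; only their labels are permuted.
    """
    rng = random.Random(seed)
    gene_ids = list(expression.keys())
    shuffled_ids = gene_ids[:]
    rng.shuffle(shuffled_ids)
    return {new_id: expression[old_id]
            for old_id, new_id in zip(gene_ids, shuffled_ids)}


# Hypothetical normalized expression values for four genes in three samples.
expr = {
    "gene1": [1.0, 2.0, 3.0],
    "gene2": [0.5, 0.4, 0.3],
    "gene3": [7.0, 7.5, 8.0],
    "gene4": [2.2, 2.1, 2.0],
}
randomized = shuffle_gene_ids(expr, seed=1)
print(randomized)
```

A network inferred from such a table has the same weight distribution as the real one, but its edges no longer correspond to the annotated genes, which is what gives a baseline AUROC.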


Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions were used for evaluation, defined as those ranking in the top 5% of their results. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 were downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluation. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for the GO datasets were calculated by the EGAD package (Ballouz et al., 2016) in R. It uses the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of gene GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).
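
For readers unfamiliar with AUROC, the following Python sketch computes it directly from its probabilistic definition: the chance that a randomly chosen positive pair receives a higher network weight than a randomly chosen negative pair, with ties counted as one half. It is a didactic stand-in, not the ROCR or EGAD implementation, and the scores and labels are invented.

```python
def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative), ties count 0.5.

    scores: edge weights predicted by a network.
    labels: 1 (or True) if the gene pair is co-expressed in the reference set.
    This pairwise form is quadratic in time and meant only for small examples.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical edge weights and reference labels for four gene pairs.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
example_auc = auroc(example_scores, example_labels)
print(example_auc)
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to a network that ranks every reference pair above every non-reference pair.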

Definition of True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts two genes are co-expressed but they are not; TN, a network predicts two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts a gene does not have a specific GO term and it does not have that GO term in our GO dataset; FN, a network predicts a gene does not have a specific GO term but it has that GO term in our GO dataset.
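
The pairwise definitions for the PPPTY evaluation can be turned into counts with a short Python sketch. It assumes a single weight threshold decides whether a network "predicts" co-expression, which is an illustrative simplification (AUROC sweeps over all thresholds), and the genes, weights and reference pairs are hypothetical.

```python
from itertools import combinations


def confusion_counts(genes, weights, reference_pairs, threshold):
    """Count TP, FP, TN and FN over all unordered gene pairs.

    weights: dict mapping frozenset({gene_a, gene_b}) -> network weight;
             pairs missing from the dict are treated as weight 0.
    reference_pairs: set of frozensets regarded as truly co-expressed.
    A pair is "predicted" when its weight is at or above the threshold.
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, b in combinations(genes, 2):
        pair = frozenset((a, b))
        predicted = weights.get(pair, 0.0) >= threshold
        actual = pair in reference_pairs
        if predicted and actual:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
        elif actual:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical toy network over four genes.
toy_genes = ["g1", "g2", "g3", "g4"]
toy_weights = {
    frozenset(("g1", "g2")): 0.9,
    frozenset(("g1", "g3")): 0.8,
    frozenset(("g2", "g3")): 0.2,
    frozenset(("g3", "g4")): 0.7,
}
toy_reference = {frozenset(("g1", "g2")), frozenset(("g2", "g3"))}
toy_counts = confusion_counts(toy_genes, toy_weights, toy_reference, threshold=0.5)
print(toy_counts)
```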

Network Clustering and Characterization

For each network, the top 1 million edges were selected as stringent co-expression networks. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted by the Network Analyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) by MCL v14-137 with the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.
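
Selecting the strongest edges of a network is a simple top-N operation. The sketch below shows one way to do it in Python with the standard library; the edge list is hypothetical, and the real cutoff of 1 million edges is replaced by 2 for the toy example.

```python
import heapq


def top_edges(edge_weights, n_edges):
    """Return the n_edges highest-weight edges as (edge, weight) tuples.

    edge_weights: dict mapping (gene_a, gene_b) -> weight.
    heapq.nlargest avoids sorting the full edge list, which helps when
    only a small fraction of all possible edges is retained.
    """
    return heapq.nlargest(n_edges, edge_weights.items(), key=lambda kv: kv[1])


# Hypothetical rank-normalized edge weights.
toy_edges = {
    ("g1", "g2"): 0.97,
    ("g1", "g3"): 0.41,
    ("g2", "g3"): 0.88,
    ("g3", "g4"): 0.12,
}
stringent = top_edges(toy_edges, n_edges=2)
print(stringent)
```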


Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes included in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, for which a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
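
The hypergeometric test behind this kind of enrichment analysis can be written out in a few lines. The sketch below computes the upper-tail probability of seeing at least k annotated genes in a query list; it illustrates the statistic only and is not AgriGO's implementation. The parameter names and the small numeric example are assumptions, and the Yekutieli correction is not included.

```python
from math import comb


def hypergeom_enrichment_p(background_size, annotated_in_background,
                           query_size, annotated_in_query):
    """P(X >= annotated_in_query) under the hypergeometric distribution.

    background_size: number of background genes (15116 in the text above).
    annotated_in_background: background genes carrying the GO term.
    query_size: number of genes in the tested module or gene list.
    annotated_in_query: query genes carrying the GO term.
    """
    upper = min(annotated_in_background, query_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(annotated_in_background, i)
        * comb(background_size - annotated_in_background, query_size - i)
        for i in range(annotated_in_query, upper + 1)
    )
    return tail / total


# Small hypothetical example: 10 background genes, 4 annotated,
# a query of 3 genes of which 2 are annotated.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print(p_value)
```

Exact integer arithmetic via `math.comb` (Python 3.8 or later) keeps the computation stable even for a background as large as the one used here.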

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0/) through its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05 and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4 with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were searched against TAIR10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussions on data analysis.

Supplemental Data 589

Supplemental Figure 1 Pipeline and datasets used for analysis 590

Supplemental Figure 2 Distribution of gene expression values 591

Supplemental Figure 3 Maize CPM-normalized with log2 transformed gene expression from all tissues and 592

developmental stages 593

Supplemental Figure 4 Pairwise comparison among results of inferences methods 594

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Page | 18

Supplemental Figure 5. Characteristics of all 1720 PPPTY gene set genes (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET).

Supplemental Figure 6. Evaluation of network performance based on sample size and inference.

Supplemental Figure 7. GCN performance comparison between protein networks.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS).

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS), and the intersection among three networks (Merged network).

Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA).

Supplemental Table S1. RNA-Seq libraries used in this analysis.

Supplemental Table S2. Random network AUROC value baseline.

Supplemental Table S3. ANOVA tables and pairwise comparisons.

Supplemental Table S4. Topological characteristics of four maize networks.

Supplemental Table S5. Gene Ontology annotation for 148 hub genes.

Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S8. 16 query genes in maize cell wall pathway.

Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in merged network.

Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from merged network.

Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from CORNET database.

Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from STRING database.

Supplemental Dataset S1. The merged network in Cytoscape-ready format.

Supplemental Dataset S2. Tutorial: Visualizing Co-expression data in Cytoscape.


Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008-2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with GO dataset for networks constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using ten inference methods. Outliers were defined as outside of 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
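
The conventional boxplot outlier rule (points lying more than 1.5 times the interquartile range above the upper quartile or below the lower quartile) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `iqr_outliers`, the choice of quantile interpolation and the example values are all assumptions made here, and R's boxplot computes its quartiles slightly differently.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # Quartiles by linear interpolation over the observed range. This is an
    # assumption: R's boxplot uses hinges, so cut points may differ slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical AUROC values, for illustration only.
aurocs = [0.61, 0.63, 0.64, 0.66, 0.67, 0.68, 0.70, 0.95]
print(iqr_outliers(aurocs))
```

The multiplier `k` is exposed as a parameter so that a stricter or looser fence can be tried without changing the function.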


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
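
Scaling a row of AUROC values to z-scores and comparing two methods by Euclidean distance can be sketched as below. This is a rough illustration under stated assumptions: the function names are hypothetical, the sample standard deviation is used (as R's scale() does), and clipping to the range -3 to 3 is assumed here to mirror the color scale rather than taken from the authors' code.

```python
import math
import statistics

def scale_to_z(values, limit=3.0):
    """Center and scale values to z-scores, then clip to [-limit, limit]."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("values are constant; z-scores are undefined")
    return [max(-limit, min(limit, (v - mean) / sd)) for v in values]

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.dist(a, b)

# Hypothetical AUROC values of one gene across three inference methods.
print(scale_to_z([0.55, 0.60, 0.72]))
# Hypothetical scaled profiles of two methods across three genes.
print(euclidean([-1.0, 0.2, 0.8], [-0.5, 0.1, 0.4]))
```

A distance matrix built from `euclidean` over all method pairs is the usual input to hierarchical clustering.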

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
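
A logarithmic model of this kind, average AUROC regressed on the natural logarithm of sample size, can be fitted by ordinary least squares. The sketch below is a Python stand-in for the R lm() call, not the authors' script: the function name `fit_log_model` and the example numbers are hypothetical, and it returns only the intercept, slope and R-squared (no p-value).

```python
import math

def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of auroc = a + b * ln(sample_size); returns (a, b, r2)."""
    x = [math.log(n) for n in sample_sizes]
    y = list(aurocs)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope on the log scale
    a = my - b * mx          # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical sample sizes and average AUROC values, for illustration only.
sizes = [12, 24, 60, 120, 404, 1266]
avg_auroc = [0.58, 0.60, 0.63, 0.64, 0.67, 0.70]
print(fit_log_model(sizes, avg_auroc))
```

A positive slope `b` corresponds to performance that rises with sample size but with diminishing returns, since each doubling of the sample size adds the same increment.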

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; the S1266 included all samples from 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as “_1”, “_2” and “_3”. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, “1266”), aggregated network (grey, “agg”) and protein network (dark grey, “pr”) using PCC, SCC, MRNET and CLR. A, GO evaluation on networks. Inference methods were indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods were indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate median. Asterisks indicate mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate median. Asterisks indicate mean and grey dots indicate outliers.
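
For readers unfamiliar with the AUROC statistic plotted in these panels, the sketch below computes it directly from its definition: the probability that a randomly chosen positive item receives a higher score than a randomly chosen negative one, with ties counted as one half. It illustrates the statistic only and is not the evaluation pipeline used in this study; the function name and the example scores are hypothetical.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Compare every positive with every negative; fine for small examples,
    # but quadratic in size, so a rank-based formula is preferable at scale.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical prediction scores for four genes; label 1 marks genes that
# truly belong to the functional category being predicted.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of positives from negatives.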

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 x 10^6 edges in three networks; 106591 edges were shared among three networks. C, GO term enrichment analyzed by AgriGO for 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from STRING database queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files) and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by more than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million mapped reads (RPKM). A-B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm normalization (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from log2 transformed expressions. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
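
The standard definitions of CPM and RPKM for a single library can be written out as a short sketch. It is illustrative only and does not reproduce the R packages used in this study; the function names, the example counts and gene lengths, and in particular the pseudocount added before the log2 transform are assumptions made here.

```python
import math

def cpm(counts):
    """Counts Per Million: scale each gene count by library size."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def rpkm(counts, lengths_bp):
    """Reads Per Kilobase per Million mapped reads: CPM divided by gene length in kb."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def log2_transform(values, pseudocount=1.0):
    """log2 transform; the pseudocount is an assumption to avoid log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

# Hypothetical read counts and gene lengths for three genes in one library.
counts = [100, 300, 600]
lengths = [1000, 2000, 500]
print(cpm(counts))
print(rpkm(counts, lengths))
print(log2_transform(cpm(counts)))
```

Because RPKM divides by gene length while CPM does not, the two differ most for very short or very long genes, which is the relationship panel C examines.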

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stage. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues calculated by Pearson correlation coefficient, ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values as calculated by Pearson correlation shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values as calculated by Pearson correlation shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Supplemental Figure 5. Characteristics of all 1720 PPPTY gene set genes (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC) and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as “_1”, “_2” and “_3”. Outliers were defined as outside of 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network were labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area Under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, Area Under the ROC curve (AUROC) values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate median. Asterisks indicate mean and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distribution. The R2 value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA) and MRNET-single (MS), and the intersection among three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). Red curve shows the power-law fitted distribution, with the function and R2 indicated beside it.


Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by Markov Cluster Algorithm (MCL) were highlighted in colors. Genes not in modules 1-8 are light grey nodes.


Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra JL, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake – a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene—a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO dataset comparisons for samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
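As an illustration of the AUROC statistic reported in Figure 2, the following minimal Python sketch computes AUROC from a list of scores and binary labels using the rank-sum (Mann-Whitney) identity. It is not the evaluation pipeline used in this study; the function name `auroc` and the toy scores and labels are hypothetical and serve only to show what the value measures.

```python
def auroc(scores, labels):
    """Rank-based AUROC: probability that a random positive outranks a random negative.

    scores: predicted association strengths (higher = more confident).
    labels: 1/True for known positives (e.g. genes annotated to a GO term), else 0/False.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current run of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy example: six genes scored against one functional term.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
print(round(auroc(toy_scores, toy_labels), 3))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of annotated from unannotated genes, which is why the boxplots in this figure are read relative to 0.5.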

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values shown on a color scale from blue (low) to red (high). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
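The scaling and distance steps named in this legend can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the code used to draw the heatmap: it assumes each row holds the AUROC values of one GO term (or gene) across methods and that scaling is applied per row, which the legend does not state explicitly. The function names and the toy matrix are hypothetical.

```python
import math
import statistics


def zscore_rows(matrix):
    """Scale each row (one GO term or gene) to mean 0 and standard deviation 1."""
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)
        if sd == 0:
            # A constant row carries no contrast between methods.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(value - mean) / sd for value in row])
    return scaled


def pairwise_method_distances(matrix, method_names):
    """Euclidean distance between every pair of columns (inference methods)."""
    columns = list(zip(*matrix))
    distances = {}
    for i in range(len(method_names)):
        for j in range(i + 1, len(method_names)):
            distances[(method_names[i], method_names[j])] = math.dist(columns[i], columns[j])
    return distances


# Hypothetical AUROC values: rows are GO terms, columns are methods.
methods = ["PCC", "SCC", "CLR"]
toy_auroc = [
    [0.70, 0.71, 0.60],
    [0.65, 0.66, 0.80],
    [0.80, 0.79, 0.62],
]
scaled = zscore_rows(toy_auroc)
distances = pairwise_method_distances(scaled, methods)
print(min(distances, key=distances.get))
```

Methods separated by the smallest distances are the ones a hierarchical clustering would join first, which is the sense in which the heatmap groups similar inference methods.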

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
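The logarithmic model in this legend regresses average AUROC on the natural logarithm of sample size. The study used lm() in R; the Python sketch below is only an equivalent ordinary least squares illustration, with a hypothetical function name and made-up data points, and it reports the intercept, slope and R-squared but not the p-value.

```python
import math


def fit_log_model(sample_sizes, mean_auroc):
    """Least-squares fit of mean_auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_auroc)
    count = len(x)
    mean_x = sum(x) / count
    mean_y = sum(y) / count
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R-squared is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return intercept, slope, r_squared


# Hypothetical values for illustration only; these are not results from this study.
toy_sizes = [12, 50, 200, 800, 1266]
toy_auroc = [0.60, 0.63, 0.655, 0.68, 0.69]
intercept, slope, r_squared = fit_log_model(toy_sizes, toy_auroc)
print(f"AUROC = {intercept:.3f} + {slope:.3f} * ln(n), R^2 = {r_squared:.3f}")
```

A positive slope in this model means that each multiplicative increase in sample size adds a roughly constant increment to average AUROC, so gains diminish as more libraries are added.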

[Fig. 4 panel titles: GO PCC, GO SCC, GO MRNET, GO CLR (A); PPPTY PCC, PPPTY SCC, PPPTY MRNET, PPPTY CLR (B).]

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks constructed with the same number of samples are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). In both panels, bold horizontal lines indicate medians, asterisks indicate means and grey dots indicate outliers.

[Figure 7 image: Venn diagrams of gene and edge overlap among the MS, PA and SA networks (A, B), and AgriGO enrichment graphs for the hub genes (C) and the cell wall co-expressed genes (D). Terms shown in the hub graph include chromatin assembly, nucleosome organization, DNA packaging, DNA replication, DNA repair, gene silencing and epigenetic regulation of gene expression. Terms shown in the cell wall graph include cell wall biogenesis, cell wall organization, cellulose biosynthetic process, glucan metabolic process, hemicellulose metabolic process and phenylpropanoid metabolic process.]

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
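The overlap counted in panel A can be illustrated with plain set operations. The sketch below assumes that "highest interacting" means highest node degree in an unweighted edge list, which the legend does not define; the function names, gene identifiers and edge lists are hypothetical toy data, not networks from this study.

```python
from collections import Counter


def top_genes_by_degree(edges, k):
    """Return the k genes with the most edges (ties broken alphabetically)."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    ranked = sorted(degree, key=lambda gene: (-degree[gene], gene))
    return set(ranked[:k])


def shared_genes(*gene_sets):
    """Genes present in every set: the central region of the Venn diagram."""
    return set.intersection(*gene_sets)


# Hypothetical toy edge lists standing in for the PA, SA and MS networks.
pa_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
sa_edges = [("g1", "g2"), ("g1", "g5"), ("g2", "g5"), ("g2", "g3")]
ms_edges = [("g1", "g4"), ("g4", "g5"), ("g4", "g2"), ("g1", "g2")]

top = [top_genes_by_degree(edges, k=2) for edges in (pa_edges, sa_edges, ms_edges)]
print(sorted(shared_genes(*top)))
```

With real networks, `k` would be 1000 and the size of the returned intersection corresponds to the count in the central region of the Venn diagram.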

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.

Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science (80- ) 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabasi A-L, Oltvai ZNZN, Barabási A-L (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics, pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science (80- ) 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol. doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One. doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013, 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Four maize datasets were used for evaluation. First, maize protein-protein interactions were downloaded from PPIM v1.1 (Zhu et al., 2016). Only high-confidence interactions were used for evaluation, defined as those ranking in the top 5% in their results. Second, maize pathway information was downloaded from MaizeCyc v2.2 (Monaco et al., 2013). Genes within the same pathway were considered co-expressed. Third, maize gene ontology data for AGPv3.30 were downloaded from AgriGO (Du et al., 2010). GO terms with 20 to 300 genes were used for evaluation. Fourth, ChIP-Seq confirmed targets of HDA101 (GRMZM2G172883) (Yang et al., 2016) were used as positive co-expressed examples for evaluation.

The widely used Area Under the Receiver Operating Characteristic curve (AUROC) for binary classification problems was used for evaluation. Protein-protein interaction and pathway information was parsed into lists of co-expressed genes. The prediction() and performance() functions in the R package ROCR were used to calculate AUROCs (Sing et al., 2005). The 277 AUROC values for GO datasets were calculated with the EGAD package (Ballouz et al., 2016) in R. EGAD relies on the "guilt-by-association" principle that genes with shared GO terms are more likely to be connected. Thus, networks normalized and inferred by different methods can be evaluated by hiding a subset of genes' GO terms and testing whether the hidden GO terms can be predicted from the remaining annotations. The prediction model performance was measured by AUROC values in three-fold cross-validation. All ANOVA and pairwise Wilcoxon rank tests were performed in R using the anova() and pairwise.wilcox.test() functions from the stats package. The P-value adjustment method was set to "fdr" (Benjamini and Hochberg, 1995).
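To make the AUROC measure concrete, the following is a minimal Python sketch rather than the ROCR or EGAD calls used in this study. It assumes each gene pair carries a co-expression score and a 0/1 label from a reference set, and it computes the AUROC through the Mann-Whitney rank-sum identity. The function name auroc and the example values are hypothetical.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC from the Mann-Whitney U statistic.

    scores: co-expression score per gene pair (higher = more confident).
    labels: 1 if the pair is in the reference set, 0 otherwise (assumed 0/1 only).
    Tied scores receive their average rank.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # U counts (positive, negative) pairs in which the positive scores higher.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Hypothetical example: four gene pairs, two of them in the reference set.
example_scores = [0.9, 0.4, 0.6, 0.2]
example_labels = [1, 1, 0, 0]
print(auroc(example_scores, example_labels))
```

An AUROC of 0.5 corresponds to random ranking of reference pairs, and 1.0 to every reference pair outranking every non-reference pair.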

Definition of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). For the evaluation using the PPPTY dataset: TP, a network predicts that two genes are co-expressed and they are co-expressed in the PPPTY dataset; FP, a network predicts that two genes are co-expressed but they are not co-expressed in the PPPTY dataset; TN, a network predicts that two genes are not co-expressed and they are not co-expressed in the PPPTY dataset; FN, a network predicts that two genes are not co-expressed but they are co-expressed in the PPPTY dataset. For the evaluation using the GO dataset: TP, a network predicts that a gene has a specific GO term and it does have that GO term in our GO dataset; FP, a network predicts that a gene has a specific GO term but it does not have that GO term in our GO dataset; TN, a network predicts that a gene does not have a specific GO term and it does not have that GO term in our GO dataset; FN, a network predicts that a gene does not have a specific GO term but it does have that GO term in our GO dataset.
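The pairwise definitions above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the evaluation code used in this study: edges are treated as undirected, every unordered pair of the listed genes is scored, and the function name confusion_counts and the gene labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def confusion_counts(
    genes: Iterable[str],
    predicted_edges: Iterable[Edge],
    reference_edges: Iterable[Edge],
) -> Dict[str, int]:
    """Count TP, FP, TN, and FN over all unordered gene pairs.

    A pair is "predicted" if the network contains the edge, and "true"
    if the reference dataset lists the two genes as co-expressed.
    """
    predicted = {frozenset(edge) for edge in predicted_edges}
    reference = {frozenset(edge) for edge in reference_edges}
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for gene_a, gene_b in combinations(sorted(set(genes)), 2):
        pair = frozenset((gene_a, gene_b))
        in_network = pair in predicted
        in_reference = pair in reference
        if in_network and in_reference:
            counts["TP"] += 1
        elif in_network:
            counts["FP"] += 1
        elif in_reference:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical example with four genes (six possible pairs).
example = confusion_counts(
    genes=["g1", "g2", "g3", "g4"],
    predicted_edges=[("g1", "g2"), ("g2", "g3"), ("g3", "g4")],
    reference_edges=[("g2", "g1"), ("g3", "g4"), ("g1", "g4")],
)
print(example)
```

Sweeping the score threshold that decides which edges count as predicted, and recomputing these counts at each threshold, traces out the ROC curve summarized by the AUROC.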

Network Clustering and Characterization

For each network, the top 1 million edges were selected as stringent co-expression networks. The network topological characteristics were computed in Cytoscape (Shannon et al., 2003). The neighborhood connectivity distribution and node degree distributions were plotted with the NetworkAnalyzer plugin (Doncheva et al., 2012). Graph clustering was performed using the Markov Cluster Algorithm (MCL) with MCL v14.137 and the inflation value set to 1.8 (Enright et al., 2002). All networks were visualized in Cytoscape.
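As an illustration of the edge-selection step, the Python sketch below keeps the k highest-scoring edges of a weighted edge list and tallies node degrees in the retained network. It is not the pipeline used here (topology was computed in Cytoscape); the function names top_edges and node_degrees, the tuple layout, and the toy scores are assumptions made for the example.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple

WeightedEdge = Tuple[str, str, float]  # (gene_a, gene_b, co-expression score)


def top_edges(edges: Iterable[WeightedEdge], k: int) -> List[WeightedEdge]:
    """Return the k edges with the highest scores, best first."""
    return heapq.nlargest(k, edges, key=lambda edge: edge[2])


def node_degrees(edges: Iterable[WeightedEdge]) -> Counter:
    """Count how many retained edges touch each gene."""
    degrees: Counter = Counter()
    for gene_a, gene_b, _score in edges:
        degrees[gene_a] += 1
        degrees[gene_b] += 1
    return degrees


# Hypothetical toy network; the study kept the top 1 million edges per network.
toy_edges = [
    ("g1", "g2", 0.9),
    ("g1", "g3", 0.2),
    ("g2", "g3", 0.7),
    ("g3", "g4", 0.8),
]
stringent = top_edges(toy_edges, k=3)
print(stringent)
print(node_degrees(stringent))
```

heapq.nlargest avoids sorting the full edge list, which matters when the candidate list is far larger than the number of edges kept.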

Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15,116 genes involved in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, and a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
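The two statistical steps in this paragraph can be sketched in Python as follows. This is an illustrative sketch, not AgriGO's implementation: it assumes a one-sided hypergeometric test for over-representation and assumes that "Yekutieli" refers to the Benjamini-Yekutieli FDR procedure. The function names and the toy numbers are hypothetical.

```python
from math import comb
from typing import List, Sequence


def hypergeom_enrichment_p(k: int, background: int, annotated: int, drawn: int) -> float:
    """P(X >= k) for a hypergeometric draw without replacement.

    background: genes in the reference set (15,116 in this study).
    annotated:  background genes carrying the GO term.
    drawn:      genes in the query list (e.g. one network module).
    k:          query genes carrying the GO term (assumed k >= 0).
    """
    upper = min(annotated, drawn)
    total = comb(background, drawn)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(
        comb(annotated, i) * comb(background - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def benjamini_yekutieli(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(pvalues)
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        candidate = pvalues[index] * m * harmonic / rank
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted


# Hypothetical example: 4 of 10 background genes carry a term; 3 genes are drawn.
print(hypergeom_enrichment_p(k=2, background=10, annotated=4, drawn=3))
print(benjamini_yekutieli([0.01, 0.04, 0.03]))
```

Terms whose adjusted value exceeds 0.05 would be discarded under the threshold stated above.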

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized components of cell wall biosynthesis (Penning et al., 2009; Bosch et al., 2011) (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were: Method = Pearson; Correlation coefficient = 0.75; P-value ≤ 0.05; and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4 with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were blasted against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.
Supplemental Figure 2. Distribution of gene expression values.
Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.
Supplemental Figure 4. Pairwise comparison among results of inference methods.
Supplemental Figure 5. Characteristics of all 1720 PPPTY gene set (ALL_1720), genes with highest AUROC values in CSC method (CSC), PCC method (PCC), and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS).
Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network).
Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. 16 query genes in maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.
Supplemental Dataset S2. Tutorial: Visualizing Co-expression data in Cytoscape.

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for samples constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for samples constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

667
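
The scaling step in this legend can be sketched as a z-score transformation. The sketch assumes that scaling is applied per row, that is, to the AUROC values of one GO term or gene across methods, and that the sample standard deviation is used; `scale_row` is a hypothetical helper name.

```python
from statistics import mean, stdev


def scale_row(aurocs):
    """Center one row of AUROC values at 0 and scale it to unit standard deviation."""
    mu = mean(aurocs)
    sd = stdev(aurocs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        # A constant row carries no contrast between methods.
        return [0.0] * len(aurocs)
    return [(a - mu) / sd for a in aurocs]


# Hypothetical AUROC values of one GO term under three methods.
print(scale_row([0.5, 0.6, 0.7]))
```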

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
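
The logarithmic regression described in this legend has the form average AUROC = a + b · ln(sample size). The legend states that the fit was done with lm() in R; the following is a plain Python sketch of the same ordinary least squares fit, with the hypothetical name `fit_log_model` and made-up input values.

```python
import math


def fit_log_model(sample_sizes, mean_aurocs):
    """Fit mean_auroc = intercept + slope * ln(sample_size) by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_aurocs)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Made-up values that lie exactly on a logarithmic curve.
sizes = [12, 100, 404, 1266]
aurocs = [0.5 + 0.02 * math.log(n) for n in sizes]
print(fit_log_model(sizes, aurocs))
```

A positive slope in this model means that each multiplicative increase in sample size adds a roughly constant amount of AUROC, so gains diminish as libraries accumulate.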

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; the S1266 network included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, "1266"), aggregated network (grey, "agg"), and protein network (dark grey, "pr") using PCC, SCC, MRNET, and CLR. A, GO evaluation on networks. Inference methods were indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods were indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks. 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 x 10^6 edges in three networks. 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by more than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm normalization (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from log2 transformed expressions. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
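
The reference normal curve described in this legend was drawn with dnorm() in R. A rough Python equivalent is sketched below, assuming the mean and sample standard deviation are taken from log2 transformed values; dropping non-positive values before the log is an assumption made only for this illustration, and `reference_normal_density` is a hypothetical name.

```python
import math
from statistics import NormalDist, mean, stdev


def reference_normal_density(expression_values, grid):
    """Density of a normal curve matched to log2 expression, evaluated on grid."""
    # Assumption for this sketch: non-positive values are dropped before log2.
    log2_values = [math.log2(v) for v in expression_values if v > 0]
    dist = NormalDist(mean(log2_values), stdev(log2_values))
    return [dist.pdf(x) for x in grid]


# Toy values whose log2 is 0, 1, 2 (mean 1, standard deviation 1).
print(reference_normal_density([1, 2, 4], [0.0, 1.0, 2.0]))
```

Overlaying this curve on the empirical density shows how closely the log2 transformed expression values approach a normal shape.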

Supplemental Figure 3. Maize CPM-normalized, log2 transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient, ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal blocks, between the pairwise AUROC plots and the PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal blocks, between the pairwise AUROC plots and the PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of the full set of 1720 PPPTY genes (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference method. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2", and "_3". Outliers were defined as outside of 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks in each category. Mean values of each network were labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
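
The box plot outlier rule stated in this legend can be written out as a short sketch. It assumes the conventional factor of 1.5 times the interquartile range and uses the inclusive quantile method from the Python standard library, which may differ slightly from the hinge-based quartiles that R box plots use; `iqr_outliers` is a hypothetical name and the values are made up.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up AUROC values with one unusually high entry.
print(iqr_outliers([0.50, 0.52, 0.55, 0.57, 0.60, 0.95]))
```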

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area Under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, Area Under the ROC curve (AUROC) values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median. Asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distribution. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node degree distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.
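
A power-law fit to a node degree distribution, as mentioned in this legend, can be sketched as a straight-line fit on log-log axes: number of nodes = a · degree^b. The legend does not state how the fit was computed, so the least squares approach, the function name `fit_power_law`, and the toy degree list below are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_power_law(degrees):
    """Fit count = a * degree**b by least squares on log-log axes.

    Returns (a, b, r_squared); nodes with degree 0 are ignored.
    """
    counts = Counter(d for d in degrees if d > 0)
    x = [math.log(k) for k in counts]
    y = [math.log(c) for c in counts.values()]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    log_a = y_bar - b * x_bar
    ss_res = sum((yi - (log_a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


# Toy network whose degree counts follow 400 * degree**-2 exactly.
toy = [1] * 400 + [2] * 100 + [4] * 25 + [5] * 16 + [10] * 4 + [20]
print(fit_power_law(toy))
```

A negative exponent with R² close to 1 indicates a few highly connected hub genes and many sparsely connected ones, which is the pattern usually called scale-free.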


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) were highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff N V (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger‐Wallace E, Haug‐Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
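The two quantities named in this legend, the AUROC and the boxplot outlier rule, can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' code: the function names `auroc` and `boxplot_outliers` are hypothetical, the AUROC is computed through the rank-sum identity, and the quartiles use Python's "inclusive" method, which may differ slightly from the hinges an R boxplot would use.

```python
from statistics import quantiles


def auroc(scores, labels):
    """AUROC via the rank-sum identity; tied scores share an average rank.

    scores: one number per candidate gene (higher = stronger prediction)
    labels: 1 if the gene is a known positive, 0 otherwise
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the current group of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def boxplot_outliers(values):
    """Values more than 1.5 x IQR above the 75% or below the 25% quantile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]
```

For example, `auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])` scores both positives above both negatives, and `boxplot_outliers` applied to a list of per-term AUROC values returns the points that would be drawn as outlier dots.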


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or in MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a blue-to-red color scale. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
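The scaling and distance steps described in this legend can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' procedure: it assumes each row (one GO term or gene) is scaled across methods using the sample standard deviation, and that methods are then compared by the Euclidean distance between their columns of scaled values. The function names and the AUROC numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def zscore(values):
    """Scale one row to mean 0 and sample standard deviation 1."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equally long profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def method_distances(rows, methods):
    """Pairwise distances between methods after row-wise scaling.

    rows: one list of AUROC values per GO term, ordered like `methods`
    """
    scaled = [zscore(row) for row in rows]
    # column j of the scaled matrix is the profile of methods[j]
    profiles = {m: [row[j] for row in scaled] for j, m in enumerate(methods)}
    return {
        (a, b): euclidean(profiles[a], profiles[b])
        for i, a in enumerate(methods)
        for b in methods[i + 1:]
    }


# Hypothetical AUROC values for three GO terms and three methods.
methods = ["PCC", "SCC", "CLR"]
rows = [[0.70, 0.72, 0.60], [0.65, 0.66, 0.80], [0.55, 0.54, 0.58]]
distances = method_distances(rows, methods)
```

In this toy input the two correlation methods behave alike, so `distances[("PCC", "SCC")]` is smaller than `distances[("PCC", "CLR")]`, which is the kind of grouping a clustered heat map makes visible.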


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
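The logarithmic model in this legend has the form average AUROC = a + b * ln(sample size). The legend states that the fit was done with lm() in R; the Python sketch below only illustrates the same ordinary least squares calculation. The function name `fit_log_model` and the input numbers are hypothetical, and the p-value is not computed here.

```python
from math import log


def fit_log_model(sample_sizes, avg_aurocs):
    """Least squares fit of avg_auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    x = [log(n) for n in sample_sizes]
    y = list(avg_aurocs)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical data that follow the model exactly, for illustration only.
sizes = [12, 50, 200, 1266]
aurocs = [0.5 + 0.03 * log(n) for n in sizes]
intercept, slope, r_squared = fit_log_model(sizes, aurocs)
```

A positive slope means that each multiplicative increase in the number of libraries adds a roughly constant amount to the average AUROC, which is why the fitted curve flattens on a linear sample-size axis.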

[Figure 4 panels: A, GO evaluation for PCC, SCC, MRNET and CLR; B, PPPTY evaluation for PCC, SCC, MRNET and CLR.]


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). In both panels, bold horizontal lines indicate medians, asterisks indicate means and grey dots indicate outliers.


[Figure 7 panels: A and B, Venn diagrams for the MS, PA and SA networks; C, enriched GO terms for the shared hub genes, including chromatin assembly, nucleosome organization, DNA packaging, DNA replication, DNA repair and gene silencing; D, enriched GO terms for the cell wall query, including cellulose biosynthetic process, cell wall biogenesis, glucan metabolic process and phenylpropanoid metabolic process.]

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
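The overlap counted in the Venn diagrams of panels A and B amounts to intersecting the top-ranked items of each network. The sketch below shows this for edges and is an illustration only: the function names, the toy gene identifiers and the scores are hypothetical, and edges are treated as undirected so that (g1, g2) and (g2, g1) count as the same edge.

```python
def top_k_edges(weights, k):
    """Return the k highest-scoring edges as a set of undirected pairs.

    weights: dict mapping (gene_a, gene_b) to an edge score
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {frozenset(pair) for pair, _score in ranked[:k]}


def shared_edges(*edge_sets):
    """Edges present in every one of the given edge sets."""
    return set.intersection(*edge_sets)


# Toy networks standing in for PA, SA and MS (scores are made up).
pa = {("g1", "g2"): 0.90, ("g2", "g3"): 0.80, ("g1", "g4"): 0.10}
sa = {("g2", "g1"): 0.95, ("g3", "g4"): 0.70, ("g2", "g3"): 0.20}
ms = {("g1", "g2"): 0.50, ("g2", "g3"): 0.40, ("g3", "g4"): 0.30}

common = shared_edges(top_k_edges(pa, 2), top_k_edges(sa, 2), top_k_edges(ms, 2))
```

Here only the g1-g2 edge is among the top two edges of all three toy networks. The same intersection applied to sets of gene names, ranked by number of interactions, would give the shared hub genes of panel A.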


Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.


Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Gene Ontology Enrichment and Visualization

Gene ontology enrichment was analyzed with AgriGO's Singular Enrichment Analysis tool (Du et al., 2010). The 15116 genes involved in our networks were used as the background reference. Hypergeometric testing was used to calculate p-values, and a value below 0.05 was considered significant. The Yekutieli method was used for multiple test correction, and terms with a false discovery rate (FDR) above 0.05 were discarded. The results were then imported into Cytoscape for visualization.
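
As a rough illustration of the statistics described above, the following Python sketch computes a one-sided hypergeometric enrichment p-value and then applies a step-up FDR adjustment. It is not AgriGO's implementation. The function names, the upper-tail form of the test, the reading of "Yekutieli method" as the Benjamini-Yekutieli procedure, and the per-term counts are all assumptions made here for illustration.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M background genes, n of which carry the term."""
    total = comb(M, N)
    upper = min(n, N)
    # Exact integer arithmetic; comb() returns 0 for impossible draws.
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_yekutieli(pvalues):
    """Adjusted p-values under the Benjamini-Yekutieli procedure (assumed meaning of 'Yekutieli method')."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: a 148-gene query list against the 15116-gene background.
BACKGROUND = 15116
QUERY_SIZE = 148
# term -> (query genes with the term, background genes with the term); made-up counts
terms = {"term_A": (12, 200), "term_B": (3, 150), "term_C": (1, 90)}

raw = [hypergeom_sf(k, BACKGROUND, n, QUERY_SIZE) for k, n in terms.values()]
fdr = benjamini_yekutieli(raw)
significant = [t for t, p, q in zip(terms, raw, fdr) if p < 0.05 and q <= 0.05]
```

The final filter mirrors the two thresholds stated in the text: a raw p-value below 0.05 and an FDR no greater than 0.05.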

Databases Comparison on Cell Wall Pathway

Sixteen well-characterized (Penning et al., 2009; Bosch et al., 2011) components of cell wall biosynthesis (Supplemental Table S8) were chosen as query genes to search against CORNET Maize (https://bioinformatics.psb.ugent.be/cornet/versions/cornet_maize1.0) on its website and against the STRING database using the Cytoscape stringApp (http://apps.cytoscape.org/apps/stringapp). The parameters for searching the CORNET database were Method = Pearson, Correlation coefficient = 0.75, P-value ≤ 0.05, and Top genes = 50. This resulted in 210 co-expressed genes and 325 interactions. To search the STRING database, the confidence cutoff was set to 0.4 with the maximum number of interactors set to 100; 76 genes with 817 interactions were retrieved. Maize proteins were blasted against TAIR 10 protein sequences using standalone BLASTP version 2.2.28+ (Camacho et al., 2009).

Acknowledgments

We would like to give special thanks to Dr. Peixiang Zhao (FSU Department of Computer Science) for advice and discussion on topological analysis of maize networks. We also thank Dr. Alan Lemmon (FSU Department of Scientific Computing) and Dr. Jonathan Dennis (FSU Department of Biological Science) for helpful discussion on data analysis.

Supplemental Data

Supplemental Figure 1. Pipeline and datasets used for analysis.
Supplemental Figure 2. Distribution of gene expression values.
Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages.
Supplemental Figure 4. Pairwise comparison among results of inference methods.
Supplemental Figure 5. Characteristics of all 1720 PPPTY gene set genes (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET).
Supplemental Figure 6. Evaluation of network performance based on sample size and inference.
Supplemental Figure 7. GCN performance comparison between protein networks.
Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS).
Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network).
Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA).
Supplemental Table S1. RNA-Seq libraries used in this analysis.
Supplemental Table S2. Random network AUROC value baseline.
Supplemental Table S3. ANOVA tables and pairwise comparisons.
Supplemental Table S4. Topological characteristics of four maize networks.
Supplemental Table S5. Gene Ontology annotation for 148 hub genes.
Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.
Supplemental Table S8. 16 query genes in the maize cell wall pathway.
Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in the merged network.
Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the merged network.
Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the CORNET database.
Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from the STRING database.
Supplemental Dataset S1. The merged network in Cytoscape-ready format.
Supplemental Dataset S2. Tutorial: Visualizing co-expression data in Cytoscape.

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
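
The AUROC statistic used in these panels can be sketched from its rank-based definition: the probability that a randomly chosen true gene pair receives a higher co-expression score than a randomly chosen false pair. The function below is a minimal pure-Python illustration under that definition, not the authors' evaluation code; the function name, the tie handling, and the toy scores are assumptions.

```python
def auroc(scores, labels):
    """Area under the ROC curve from pairwise comparisons (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical co-expression scores for four gene pairs; 1 marks pairs that
# share an annotation in the reference set (for example, a GO term).
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
chance = auroc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0])
```

A value near 0.5 corresponds to a random ranking, which is why a random-network baseline is a natural reference point for the values plotted here.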

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
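
The scaling and distance steps named in this legend can be sketched as follows. This is an illustrative reading, not the plotting code behind the figure: it assumes each row (one GO term or gene) is converted to z-scores across methods, that values are clipped to the plotted range of -3 to 3, and that methods are then compared by Euclidean distance between their columns. All names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def scale_row(values, clip=3.0):
    """Z-score one row of AUROC values, clipped to [-clip, clip] (clipping is an assumption)."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation
    if sd == 0:
        return [0.0 for _ in values]
    return [max(-clip, min(clip, (v - mu) / sd)) for v in values]


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical AUROC matrix: rows are GO terms, columns are three methods.
auroc_rows = [
    [0.70, 0.72, 0.60],
    [0.55, 0.58, 0.66],
    [0.81, 0.80, 0.62],
]
scaled = [scale_row(row) for row in auroc_rows]
columns = list(zip(*scaled))  # one vector per method
dist_first_second = euclidean(columns[0], columns[1])
dist_first_third = euclidean(columns[0], columns[2])
```

In this toy matrix the first two methods behave alike across terms, so their distance is smaller than the distance from either to the third, which is the pattern a hierarchical clustering would pick up.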

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
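
The regression described in this legend has the form average AUROC = a + b × ln(sample size). The legend states that the fit was done with lm() in R; the Python sketch below only illustrates the same ordinary least squares model and its R-squared, and it omits the p-value calculation. The function name and the synthetic data points are assumptions.

```python
from math import log


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of auroc = intercept + slope * ln(sample size); returns (intercept, slope, r_squared)."""
    x = [log(n) for n in sample_sizes]
    count = len(x)
    mean_x = sum(x) / count
    mean_y = sum(aurocs) / count
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, aurocs))
    ss_tot = sum((yi - mean_y) ** 2 for yi in aurocs)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Synthetic points generated from the model itself, purely to exercise the function.
sizes = [12, 48, 404, 1266]
aurocs = [0.55 + 0.02 * log(n) for n in sizes]
intercept, slope, r_squared = fit_log_model(sizes, aurocs)
```

A positive slope in this model means each multiplicative increase in sample size adds a fixed increment to the average AUROC, which is the diminishing-returns shape the log-scaled axis is meant to show.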

Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled S12_1 to S404; S1266 included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, "1266"), aggregated network (grey, "agg"), and protein network (dark grey, "pr") using PCC, SCC, MRNET, and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for the 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM, Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC, Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A and B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2-transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).

743
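As an illustration of the CPM and RPKM normalizations and the log2 transformation named in this legend, the following Python sketch applies their standard definitions to one toy library. It is not the pipeline used in the study, which ran in R; the function names, the pseudocount of 1, and the toy counts and gene lengths are assumptions made for the example. The last lines build a normal reference curve from the mean and standard deviation of the log2 values, similar in role to the dnorm() call.

```python
import math
import statistics


def cpm(counts):
    """Counts Per Million: scale raw gene counts by the library's total count."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def rpkm(counts, lengths_bp):
    """Reads Per Kilobase per Million: CPM divided by gene length in kilobases."""
    return [v / (length / 1000.0) for v, length in zip(cpm(counts), lengths_bp)]


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount (assumed here) keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]


# Hypothetical toy library: raw counts and gene lengths for four genes.
counts = [100, 300, 600, 0]
lengths_bp = [1000, 2000, 500, 1500]

log_cpm = log2_transform(cpm(counts))
log_rpkm = log2_transform(rpkm(counts, lengths_bp))

# Normal reference curve built from the mean and standard deviation of the
# log2 values and evaluated at each observed value.
reference = statistics.NormalDist(statistics.mean(log_cpm), statistics.stdev(log_cpm))
density = [reference.pdf(x) for x in log_cpm]
print(log_cpm, log_rpkm, density)
```

Because RPKM divides by gene length and CPM does not, plotting the two against gene length, as in panel C, is one way to see whether length correction changes the expression profile.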

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated as the Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST-, CPM-, and RPKM-normalized data. The AUROC value density for each method was plotted in the diagonal blocks, between the AUROC values and the PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST-, CPM-, and RPKM-normalized data. The AUROC value density for each method was plotted in the diagonal blocks, between the AUROC values and the PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Supplemental Figure 5. Characteristics of the full PPPTY gene set of 1720 genes (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). Average expression in CPM for the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference method. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2", and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines show the average AUROC value of the 17 individual networks in each category. Mean values of each network are marked with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
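The outlier rule stated in this legend (values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile) can be sketched as follows. This is a minimal illustration, not the plotting code used for the figure: the function name and the toy AUROC values are hypothetical, and the quartiles use Python's "inclusive" interpolation, which may differ slightly from the hinges an R boxplot computes.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical AUROC values for one network; 0.95 sits far above the rest.
aurocs = [0.50, 0.52, 0.55, 0.58, 0.60, 0.95]
print(iqr_outliers(aurocs))
```

With small groups the choice of quantile method can move the fences, so borderline points may be classified differently by other tools.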

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the fitted power-law distributions. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node degree distribution for four networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection of these three networks (Merged network). The number of edges linked to a gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the fitted power-law distribution, with the fitted function and R² indicated beside it.
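A power-law fit of the kind shown in these degree distributions, y = a·x^b, is commonly obtained by ordinary least squares on log-transformed values. The sketch below follows that approach as an assumption; the legend does not state which fitting procedure produced the curves. The function name and the toy degree counts are hypothetical, and R² is computed on the log scale.

```python
import math


def fit_power_law(degrees, node_counts):
    """Fit node_counts = a * degrees**b by least squares in log-log space."""
    xs = [math.log(d) for d in degrees]
    ys = [math.log(c) for c in node_counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                  # exponent (slope in log-log space)
    ln_a = mean_y - b * mean_x     # intercept in log-log space
    ss_res = sum((y - (ln_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(ln_a), b, r_squared


# Hypothetical degree distribution that follows 1000 * degree**-2 exactly.
degrees = [1, 2, 4, 8, 16]
node_counts = [1000.0 * d ** -2 for d in degrees]
a, b, r2 = fit_power_law(degrees, node_counts)
print(a, b, r2)
```

A negative exponent with R² close to 1 is the pattern usually read as scale-free topology: many genes with few edges and a small number of highly connected hubs.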


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are shown as light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics, pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO dataset comparisons for samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
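AUROC, the performance measure in every panel of this figure, can be read as the probability that a randomly chosen true member of a gene set receives a higher score than a randomly chosen non-member. The Python sketch below computes it directly from that pairwise definition. It is an illustration only, not the evaluation code used in the study; the function name, scores, and labels are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve from pairwise comparisons (ties count 0.5)."""
    positives = [s for s, is_member in zip(scores, labels) if is_member]
    negatives = [s for s, is_member in zip(scores, labels) if not is_member]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical prediction scores for six genes; 1 marks annotated members.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of genes, so a rank-based formulation would be preferable at genome scale.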


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue (low) to red (high). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks, plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks, plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
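The logarithmic model described in this legend regresses average AUROC on the natural logarithm of sample size. The following is a minimal sketch of that kind of fit using ordinary least squares in plain Python. It is not the authors' R code: the function name `fit_log_model` is hypothetical, the input numbers are synthetic, and the p-value that lm() reports is not computed here.

```python
import math


def fit_log_model(sample_sizes, values):
    """Least-squares fit of value = intercept + slope * ln(sample size).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log(n) for n in sample_sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(values) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of variance in the values explained by the fit.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, values))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Synthetic values generated from an exact logarithmic trend.
    sizes = [12, 100, 1266]
    avg_auroc = [0.5 + 0.02 * math.log(n) for n in sizes]
    print(fit_log_model(sizes, avg_auroc))
```

A positive slope in such a fit means performance keeps improving as libraries are added, but with diminishing returns, since each doubling of sample size adds the same increment.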

[Fig. 4 panels. A: GO PCC, GO SCC, GO MRNET, GO CLR. B: PPPTY PCC, PPPTY SCC, PPPTY MRNET, PPPTY CLR.]


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.


[Fig. 7 graphic. A, B: Venn diagrams for the MS, PA and SA networks. C: GO term enrichment graph labeled "Hub" (terms include chromatin assembly, nucleosome organization, DNA replication, DNA repair, gene silencing and translation). D: GO term enrichment graph labeled "Cell Wall" (terms include cellulose biosynthetic process, cell wall biogenesis, glucan metabolic process, phenylpropanoid metabolic process and root morphogenesis).]

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks. 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 × 10^6 edges in three networks. 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
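The overlap counts shown in a three-way Venn diagram such as the ones described in this legend can be derived with plain set operations. The sketch below is illustrative only: the helper names `top_k` and `venn3_counts` are hypothetical, the inputs are toy data, and the way the study ranked genes or edges before taking the top entries is assumed to be a simple sort by score.

```python
def top_k(scores, k):
    """Return the set of the k highest-scoring items.

    scores: dict mapping an item (gene or edge identifier) to a score.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def venn3_counts(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }


if __name__ == "__main__":
    # Toy sets standing in for the top genes of three networks.
    pa = {"g1", "g2", "g3", "g4"}
    sa = {"g3", "g4", "g5"}
    ms = {"g4", "g5", "g6"}
    print(venn3_counts(pa, sa, ms))
```

The seven region sizes always sum to the size of the union of the three sets, which is a quick consistency check when reading counts off a diagram.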


Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.


Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EurasipJ Bioinforma Syst Biol doi 101155200779879

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional networks using mutualinformation BMC Bioinformatics 9 461

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al(2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejiacutea-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B EmilianiJ et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithmclustering plugin for Cytoscape BMC Bioinformatics 12 436

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq NatMethods 5 621-628

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Acids Res 37 D987-D991Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biologicalinformation in plants Bioinformatics 26 1267-1268

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics4 33

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-awareRNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Supplemental Figure 5. Characteristic of all 1720 PPPTY gene set (ALL_1720), genes with highest AUROC values in CSC method (CSC), PCC method (PCC), and MRNET method (MRNET).

Supplemental Figure 6. Evaluation of network performance based on sample size and inference.

Supplemental Figure 7. GCN performance comparison between protein networks.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS).

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS), and the intersection among three networks (Merged network).

Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA).

Supplemental Table S1. RNA-Seq libraries used in this analysis.

Supplemental Table S2. Random network AUROC value baseline.

Supplemental Table S3. ANOVA tables and pairwise comparisons.

Supplemental Table S4. Topological characteristics of four maize networks.

Supplemental Table S5. Gene Ontology annotation for 148 hub genes.

Supplemental Table S6. Enriched GO terms for PCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S7. Enriched GO terms for SCC ranked aggregation networks from module 1 to module 8.

Supplemental Table S8. 16 query genes in maize cell wall pathway.

Supplemental Table S9. GO enrichment analysis for 214 co-expressed genes of cell wall query genes in merged network.

Supplemental Table S10. Annotation for co-expressed genes queried by 16 cell wall pathway genes from merged network.

Supplemental Table S11. Annotation for co-expressed genes queried by 16 cell wall pathway genes from CORNET database.

Supplemental Table S12. Annotation for co-expressed genes queried by 16 cell wall pathway genes from STRING database.

Supplemental Dataset S1. The merged network in Cytoscape-ready format.

Supplemental Dataset S2. Tutorial: Visualizing Co-expression data in Cytoscape.


Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008-2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year, generated by microarray platforms GPL4032 and GPL12620, was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with GO dataset for samples constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for samples constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
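As an illustration of the AUROC statistic used throughout these evaluations, the following is a minimal Python sketch, not the authors' R code. It computes AUROC from the rank-sum (Mann-Whitney U) identity. The function name `auroc`, the example scores, and the labels are hypothetical; in a GO evaluation the scores would be per-gene prediction scores for one term and the labels would mark genes annotated to that term.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: per-gene prediction scores (higher = more likely positive)
    labels: 1 for genes annotated to the term, 0 otherwise
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Find the run of tied scores and give each the average rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical scores: both annotated genes outrank both unannotated genes.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a random-network baseline is a useful reference point for the plotted values.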


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-square and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
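The logarithmic regression described in this legend can be sketched as follows. This is a plain-Python illustration of an ordinary least squares fit of average AUROC against ln(sample size), roughly what `lm(auroc ~ log(size))` does in R; it is not the authors' code, it omits the p-value, and the function name `fit_log_model` and the example numbers are made up for illustration.

```python
import math


def fit_log_model(sample_sizes, mean_aurocs):
    """Least squares fit of mean_auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_aurocs)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Synthetic data lying exactly on 0.5 + 0.03 * ln(n); not values from the paper.
sizes = [12, 24, 100, 404, 1266]
values = [0.5 + 0.03 * math.log(n) for n in sizes]
intercept, slope, r_squared = fit_log_model(sizes, values)
```

A positive slope on the log scale means each doubling of sample size adds a roughly constant increment to the average AUROC, so gains diminish as libraries accumulate.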

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; the S1266 network included all samples from 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, "1266"), aggregated network (grey, "agg"), and protein network (dark grey, "pr") using PCC, SCC, MRNET, and CLR. A, GO evaluation on networks. Inference methods were indicated by single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods were indicated by single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate median. Asterisks indicate mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate median. Asterisks indicate mean, and grey dots indicate outliers.

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks. 148 genes were shared among three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 × 10^6 edges in three networks. 106591 edges were shared among three networks. C, GO term enrichment analyzed by AgriGO for 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A-B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm normalization (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from log2 transformed expressions. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
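For readers unfamiliar with the two count-based normalizations named in this legend, the standard definitions can be written out as a short Python sketch. This is not the authors' pipeline (which ran in R), and the function names, the example counts, and the `pseudocount` parameter are assumptions for illustration; the legend does not state how zeros were handled before the log2 transform.

```python
import math


def cpm(counts):
    """Counts per million for one library: count / library size * 1e6."""
    library_size = sum(counts)
    return [c / library_size * 1e6 for c in counts]


def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of gene length per million mapped reads."""
    library_size = sum(counts)
    return [
        c * 1e9 / (library_size * length)
        for c, length in zip(counts, gene_lengths_bp)
    ]


def log2_transform(values, pseudocount=1.0):
    """log2(value + pseudocount); the pseudocount is an assumed choice."""
    return [math.log2(v + pseudocount) for v in values]


# Hypothetical library with three genes.
counts = [10, 30, 60]
lengths = [1000, 2000, 500]
cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths)
log_cpm = log2_transform(cpm_values)
```

CPM corrects only for sequencing depth, whereas RPKM also divides by gene length, which is why panel C plots normalized expression against gene length.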

Supplemental Figure 3. Maize CPM-normalized, log2 transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stage. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient, ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Supplemental Figure 5. Characteristic of all 1720 PPPTY gene set (ALL_1720), genes with highest AUROC values in CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2", and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC value from 17 individual networks of each category. Mean values of each network were labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
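The outlier rule stated in this legend (1.5 times the interquartile range beyond the quartiles) can be illustrated with a short Python sketch. It is an assumption-laden example rather than the authors' plotting code: the function name `iqr_outliers` is hypothetical, and quartiles are computed with the standard library's "inclusive" method, which may differ slightly from the hinge-based quartiles that R box plots use.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical data: Q1 = 3, Q3 = 7, so the fences are -3 and 13.
flagged = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

Only the value 100 falls outside the fences in this example; everything inside them would be drawn as part of the box and whiskers.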

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area Under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, Area Under the ROC curve (AUROC) values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate median. Asterisks indicate mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distribution. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS), and the intersection among three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.


Supplemental Figure 10. Network representation of PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by Markov Cluster Algorithm (MCL) were highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572-15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143-175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670-685

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64-70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra JL, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924-13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054-0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200-20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244-1256

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R package version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277-286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860-1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1-8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866-12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29-46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25-48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592-1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357-360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520-2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812-1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: 10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-2764

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je BI, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter CT 3rd, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, van Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Sekhon RS, Vaillancourt B, Hirsch CN, Buell CR, de Leon N, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-1651

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO dataset comparisons for samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
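The AUROC statistic reported in this and the following figure legends can be illustrated with a minimal sketch. It is written in Python rather than the R tooling cited in the references, and it is not the evaluation pipeline used in this study: the function name `auroc`, the scores and the labels are hypothetical. The sketch assumes each candidate receives a numeric score and a binary label, and it computes the AUROC through the rank-sum (Mann-Whitney U) identity, using average ranks for tied scores.

```python
def auroc(scores, labels):
    """AUROC via the rank-sum identity; tied scores receive their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of scores tied with position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Hypothetical scores and labels (1 = known positive), not data from the paper.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
example_labels = [1, 1, 0, 1, 0, 0]
print(auroc(example_scores, example_labels))
```

Under this definition a value of 0.5 corresponds to a random ranking and 1.0 to a ranking that places every positive above every negative.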


Figure 3. Similarity between ten inference methods in network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values shown on a color scale from blue (lowest) to red (highest). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
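The scaling and distance steps named in the Figure 3 legend can be sketched as follows. This is an illustration in Python, not the code behind the figure: the AUROC values are hypothetical, the helper names `scale_row` and `method_distances` are invented here, and the use of the sample standard deviation for scaling is an assumption. Each row (one GO term or gene) is scaled to mean 0 and unit standard deviation, and the Euclidean distance between methods is then computed, which is the quantity a hierarchical clustering would take as input.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def scale_row(values):
    """Center a row to mean 0 and scale it to unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def method_distances(methods, auroc_rows):
    """Pairwise Euclidean distance between methods after row-wise scaling."""
    scaled = [scale_row(row) for row in auroc_rows]
    columns = {m: [row[i] for row in scaled] for i, m in enumerate(methods)}
    return {
        (a, b): sqrt(sum((x - y) ** 2 for x, y in zip(columns[a], columns[b])))
        for a, b in combinations(methods, 2)
    }


# Hypothetical AUROC values: one row per GO term, one column per method.
methods = ["PCC", "SCC", "MRNET", "CLR"]
auroc_rows = [
    [0.70, 0.72, 0.60, 0.61],
    [0.65, 0.66, 0.58, 0.57],
    [0.80, 0.79, 0.75, 0.76],
]

distances = method_distances(methods, auroc_rows)
closest = min(distances, key=distances.get)
print(closest, round(distances[closest], 3))
```

Methods separated by a small distance have similar scaled AUROC profiles across terms and would be merged early by a clustering step.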


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and P values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
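The logarithmic regression described in the Figure 4 legend amounts to an ordinary least-squares fit of average AUROC against log(sample size). The legend states that lm() in R was used; the sketch below is a Python stand-in under that reading, with a hypothetical function name `fit_log_model` and hypothetical AUROC values (only the sample-size endpoints 12 and 1266 appear in the legends). It returns the intercept, slope and R-squared, and does not reproduce the P values that lm() also reports.

```python
from math import log


def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of AUROC = intercept + slope * ln(sample size)."""
    x = [log(n) for n in sample_sizes]
    y = list(mean_aurocs)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical average AUROC values for five network sizes.
sizes = [12, 50, 200, 800, 1266]
aurocs = [0.58, 0.61, 0.64, 0.66, 0.67]
intercept, slope, r_squared = fit_log_model(sizes, aurocs)
print(f"intercept={intercept:.3f} slope={slope:.4f} R2={r_squared:.3f}")
```

A positive slope in this model means that performance keeps improving as samples are added, but by a smaller amount for each additional sample.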



Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means and grey dots indicate outliers.


Fig 7, panels A to D. Venn diagram set labels: MS, PA, SA. GO terms shown in the enrichment panels include chromatin assembly, nucleosome organization, DNA packaging, gene silencing, DNA replication and DNA repair, together with cell wall biogenesis, cellulose biosynthetic process, glucan metabolic process and phenylpropanoid metabolic process.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
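The overlap counted in Figure 7A can be sketched as a set intersection over the most connected genes of each network. The example below is a Python illustration under stated assumptions: "highest interacting" is read here as highest node degree, the helper names `top_genes_by_degree` and `shared_genes` are invented for this sketch, and the gene identifiers and edge lists are toy values rather than the PA, SA and MS networks themselves.

```python
from collections import Counter


def top_genes_by_degree(edges, k):
    """Return the k genes with the most edges; ties are broken by gene name."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    ranked = sorted(degree, key=lambda gene: (-degree[gene], gene))
    return set(ranked[:k])


def shared_genes(*gene_sets):
    """Genes present in every one of the supplied sets."""
    return set.intersection(*(set(s) for s in gene_sets))


# Toy edge lists standing in for three networks (hypothetical data).
pa_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
sa_edges = [("g1", "g2"), ("g2", "g3"), ("g2", "g4"), ("g1", "g5")]
ms_edges = [("g1", "g3"), ("g1", "g4"), ("g3", "g4"), ("g1", "g2")]

top_sets = [top_genes_by_degree(edges, 2) for edges in (pa_edges, sa_edges, ms_edges)]
print(sorted(shared_genes(*top_sets)))
```

With k set to 1000 and the real edge lists, the size of the returned set would correspond to the central region of the three-way Venn diagram.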


Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.



Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EurasipJ Bioinforma Syst Biol doi 101155200779879

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional networks using mutualinformation BMC Bioinformatics 9 461

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al(2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejiacutea-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B EmilianiJ et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithmclustering plugin for Cytoscape BMC Bioinformatics 12 436

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq NatMethods 5 621-628

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Acids Res 37 D987-D991Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biologicalinformation in plants Bioinformatics 26 1267-1268

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics4 33

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-awareRNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009)Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencingexperiments Bioinformatics 29 2146-2152

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital geneexpression data Bioinformatics 26 139-140

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sales G Romualdi C (2011) parmigenemdasha parallel R package for mutual information estimation and gene network reconstructionBioinformatics 27 1876-1877

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicateaccuracy in a sea of changing annotations Database J Biol databases curation 2016

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro aplatform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

COB the co-expression browser PLoS One doi 101371journalpone0099193Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R packagePubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112-1115Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schwanhaumlusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian geneexpression control Nature 473 337-342

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mousecrosses Frontiers in Genetics 20134291

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges FrontPlant Sci 7 444

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a softwareenvironment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes withoutmeiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model basedindices BMC Bioinformatics 13 328

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expressionatlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015)STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression NetworksConstruction and Visualization Tool Front Plant Sci 6 1194

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev MolCell Biol 16 258-264

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omicnetworks in a developmental atlas of maize Science 353 814-818

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics16 S6

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparsecanonical correlation analysis Ann Appl Stat 9 300-323

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics5 87

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within genecoexpression networks BMC Bioinformatics 6 227

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering PlantsStudy from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize HistoneDeacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstructionPLoS One 9 e106319 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56195-214

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for MaizePlant Physiol 170 pp1501821-

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zyprych-Walczak J Szabelska A Handschuh L Goacuterczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalizationmethods on RNA-Seq data analysis Biomed Res Int 2015

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Figure legends

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
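
The AUROC values in this legend summarize how well a network ranks known positives (for example, gene pairs that share a GO term) above negatives. As a rough illustration only, the Python sketch below computes an AUROC from scores and binary labels using the rank-sum identity. The function name `auroc` and the toy inputs are assumptions made for this example; the sketch does not reproduce the evaluation pipeline used in the paper.

```python
def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) identity.

    scores: predicted association strengths (hypothetical values).
    labels: 1 for a known positive, 0 for a negative.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: both positives outrank both negatives.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

An AUROC of 0.5 corresponds to random ranking, which is why values in the legend are read relative to that baseline.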

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
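
The scaling step in this legend can be pictured as a per-row z-score. The Python sketch below is a minimal illustration under stated assumptions: the function name `scale_row` is hypothetical, each row is taken to hold the AUROC values of one GO term or gene across methods, and the clipping to the range -3 to 3 is an assumption about how the color scale was bounded, not a documented step of the paper.

```python
from statistics import mean, stdev


def scale_row(values, clip=3.0):
    """Center and scale one row to zero mean and unit sample standard deviation.

    Values beyond +/- clip are truncated (assumed here to match a color
    scale running from -3 to 3).
    """
    m = mean(values)
    s = stdev(values)
    if s == 0:
        # A constant row carries no contrast between methods.
        return [0.0] * len(values)
    return [max(-clip, min(clip, (v - m) / s)) for v in values]


# Hypothetical AUROC values of one GO term under three methods.
scaled = scale_row([0.5, 0.6, 0.7])
```

After scaling, rows can be compared by Euclidean distance, as the legend describes for the clustering.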

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 networks of different sizes, plotted against natural-logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 networks of different sizes, plotted against natural-logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
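
The logarithmic regression described in this legend fits average AUROC against log(sample size). The paper reports using lm() in R; the Python sketch below is only an illustrative equivalent of the ordinary least-squares fit and its R-squared. The function name `fit_log_model` and the numbers in the example are assumptions, not values from the study, and p-values are not computed here.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    xs = [math.log(n) for n in sample_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(aurocs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, aurocs))
    ss_tot = sum((y - mean_y) ** 2 for y in aurocs)
    return intercept, slope, 1 - ss_res / ss_tot


# Synthetic data that follow the model exactly (illustrative only).
sizes = [12, 24, 48, 404, 1266]
values = [0.5 + 0.02 * math.log(n) for n in sizes]
intercept, slope, r_squared = fit_log_model(sizes, values)
```

A positive slope in such a fit corresponds to the reported trend that performance rises with sample size, with diminishing returns on the original scale.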

Figure 5. Evaluation of networks of different sizes, constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled S12_1 to S404; S1266 included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as “_1”, “_2”, and “_3”. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 6. GCN performance comparison among single network (white, “1266”), aggregated network (grey, “agg”), and protein network (dark grey, “pr”) using PCC, SCC, MRNET, and CLR. A, GO evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by a single letter (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 most highly interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
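
The edge overlap summarized in panel B amounts to intersecting the top-ranked edge sets of several networks. The Python sketch below shows one way to do this for toy data. The function names `top_edges` and `shared_edges`, the dictionary representation of a network, and all gene identifiers and weights are assumptions for illustration; each undirected gene pair is assumed to appear once per network.

```python
def top_edges(weights, k):
    """Return the k highest-scoring edges of one network as undirected pairs.

    weights: dict mapping (gene_a, gene_b) -> association score.
    """
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # frozenset makes (a, b) and (b, a) compare equal.
    return {frozenset(edge) for edge, _ in ranked}


def shared_edges(networks, k):
    """Edges present in the top-k set of every network."""
    edge_sets = [top_edges(w, k) for w in networks]
    return set.intersection(*edge_sets)


# Three toy networks standing in for PA, SA, and MS (hypothetical values).
pa = {("g1", "g2"): 0.9, ("g2", "g3"): 0.8, ("g1", "g3"): 0.1}
sa = {("g2", "g1"): 0.7, ("g3", "g1"): 0.6, ("g2", "g3"): 0.2}
ms = {("g1", "g2"): 0.5, ("g2", "g3"): 0.4, ("g1", "g3"): 0.3}
common = shared_edges([pa, sa, ms], k=2)
```

With real networks, `k` would be set to the number of top edges retained per network, which the legend gives as one million.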

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in the acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM, Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC, Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A-B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2-transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
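
For readers unfamiliar with the two count-based normalizations named in this legend, the Python sketch below applies the standard CPM and RPKM definitions to one library. The function names, the toy counts and gene lengths, and in particular the pseudocount of 1 added before the log2 transformation are assumptions for illustration; the paper does not state its pseudocount here, and VST (a model-based transformation) is not reproduced.

```python
import math


def cpm(counts):
    """Counts per million: scale each gene count by library size."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: CPM further divided by gene length in kb."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def log2_transform(values, pseudocount=1.0):
    """log2 after adding a pseudocount (assumed value) so zeros stay finite."""
    return [math.log2(v + pseudocount) for v in values]


# One hypothetical library with three genes.
counts = [100, 300, 600]
lengths = [1000, 2000, 500]
cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths)
log_cpm = log2_transform(cpm_values)
```

Because RPKM divides by gene length and CPM does not, only CPM values are expected to retain a dependence on gene length of the kind examined in panel C.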

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stage. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient and ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST-, CPM-, and RPKM-normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST-, CPM-, and RPKM-normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY set (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference method. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as “_1”, “_2”, and “_3”. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks in each category. Mean values of each network are marked with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
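
The outlier rule stated in this legend is the usual box-plot convention. The Python sketch below flags values lying more than 1.5 times the interquartile range beyond the quartiles. The function name `iqr_outliers` and the sample values are assumptions for illustration, and the quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`, which can differ slightly from the hinges R uses for box plots.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR above the 75% or below the 25% quantile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical AUROC values with one unusually high entry.
flagged = iqr_outliers([0.50, 0.52, 0.55, 0.57, 0.60, 0.95])
```

Only the flagged values would be drawn as separate points; the remaining values define the whiskers of the box.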

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distributions. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node degree distribution for four networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to a gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.
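
A power-law fit of a node degree distribution, as described in this legend, is commonly obtained by linear regression on log-transformed values. The Python sketch below counts node degrees from an edge list and fits count = a * degree^b in log-log space. The function names, the edge-list representation, and the choice of log-log least squares are assumptions for illustration; the fitting procedure actually used for the figure is not specified in this section.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each node degree to the number of nodes having that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def fit_power_law(dist):
    """Fit count = a * degree**b by least squares on log-log values.

    dist: mapping degree -> count, with at least two distinct degrees
    and strictly positive counts. Returns (a, b, r_squared).
    """
    xs = [math.log(k) for k in dist]
    ys = [math.log(c) for c in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_a), b, 1 - ss_res / ss_tot


# Toy star network: one hub ("a") linked to three leaves.
toy_dist = degree_distribution([("a", "b"), ("a", "c"), ("a", "d")])

# Synthetic distribution following count = 1000 * degree**-2 exactly.
a, b, r_squared = fit_power_law({k: 1000.0 * k ** -2 for k in range(1, 6)})
```

A negative exponent `b` with a high R² is the pattern usually read as scale-free behavior, where few genes have many connections and most have few.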

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science (80- ) 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabasi A-L, Oltvai ZNZN, Barabási A-L (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O’Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D’haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff N V (2012) McClintock’s challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science (80- ) 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package 1002

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics 1003 Science (80- ) 326 1112ndash1115 1004

Schwanhaumlusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global 1005 quantification of mammalian gene expression control Nature 473 337ndash342 1006

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-1007 expression modules in mouse crosses Frontiers in Genetics 20134291 1008

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities 1009 and Challenges Front Plant Sci 7 444 1010

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, de Leon N, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
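The AUROC reported in these panels can be read as the probability that a randomly chosen positive (for example, a gene annotated to a GO term) is scored above a randomly chosen negative. The following is a minimal Python sketch of that definition, not the authors' pipeline, which relied on R packages listed in the references; the function name `auroc` and the toy scores are assumptions made purely for illustration.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    scores: network-derived scores, higher means more likely positive.
    labels: 1/True for known positives, 0/False for negatives.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two positives and two negatives.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
print(auroc(toy_scores, toy_labels))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic, so a rank-based formulation would be preferable for genome-scale gene lists.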


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values shown on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
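The logarithmic models in this figure have the form average AUROC = a + b · ln(sample size). The legend states that the fits were made with lm() in R; the block below is only a minimal Python sketch of the same ordinary least-squares fit. The function name `fit_log_model` and the numbers in the usage example are hypothetical and are not values from the study, and the sketch reports R-squared only, not the p-values that lm() also provides.

```python
import math


def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of mean_auroc = a + b * ln(sample_size).

    Returns (a, b, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_aurocs)
    count = len(x)
    mean_x = sum(x) / count
    mean_y = sum(y) / count
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope on the log scale
    a = mean_y - b * mean_x            # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return a, b, r_squared


# Hypothetical example: made-up AUROC averages for four network sizes.
sizes = [12, 50, 200, 1266]
aurocs = [0.56, 0.60, 0.62, 0.66]
a, b, r2 = fit_log_model(sizes, aurocs)
print(f"AUROC ~ {a:.3f} + {b:.3f} * ln(n), R^2 = {r2:.3f}")
```

A positive slope b on the log scale means each doubling of the sample size adds roughly b · ln(2) to the average AUROC, which is one way to read the diminishing returns implied by a logarithmic fit.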

[Fig. 4 panels: A, GO evaluation (PCC, SCC, MRNET, CLR); B, PPPTY evaluation (PCC, SCC, MRNET, CLR)]


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks that include the same number of samples are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.


[Fig. 7 panels: A and B, Venn diagrams for the MS, PA and SA networks; C, GO term enrichment graph for the shared hub genes, dominated by chromatin assembly and organization, DNA replication and repair, gene silencing, and macromolecule biosynthesis terms; D, GO term enrichment graph for the cell wall query, dominated by cell wall biogenesis and organization, cellulose and polysaccharide biosynthesis, and phenylpropanoid metabolism terms]

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
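The edge overlap in panel B amounts to taking the highest-scoring edges of each network and intersecting the resulting sets. The block below is a small Python sketch of that idea under stated assumptions: each network is represented as a dictionary from a gene pair to a score, edges are treated as undirected, and the helper names `top_edges` and `shared_top_edges` as well as the toy gene identifiers are hypothetical rather than taken from the paper.

```python
def top_edges(weights, k):
    """Return the k highest-scoring edges as a set of unordered gene pairs.

    weights: dict mapping (gene_a, gene_b) tuples to a score.
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    # frozenset makes ("g1", "g2") and ("g2", "g1") the same edge.
    return {frozenset(pair) for pair, _ in ranked[:k]}


def shared_top_edges(networks, k):
    """Edges that appear in the top-k list of every network."""
    return set.intersection(*(top_edges(w, k) for w in networks))


# Hypothetical toy networks standing in for PA, SA and MS.
pa = {("g1", "g2"): 0.9, ("g2", "g3"): 0.8, ("g1", "g4"): 0.1}
sa = {("g2", "g1"): 0.95, ("g3", "g2"): 0.7, ("g3", "g4"): 0.2}
ms = {("g1", "g2"): 0.6, ("g3", "g4"): 0.5, ("g2", "g3"): 0.4}

shared = shared_top_edges([pa, sa, ms], k=2)
print(sorted(sorted(edge) for edge in shared))  # [['g1', 'g2']]
```

The same pattern applies to the gene-level comparison in panel A if each network is first reduced to its most highly connected genes.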



Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with a reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network-predicted interactions.


Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science (80- ) 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff N V (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science (80- ) 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc114132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between -3 (blue) and 3 (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
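
The scaling and distance steps named in the Figure 3 legend can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the paper; the AUROC numbers, the term names, and the choice to scale each GO term across methods are assumptions made for the example.

```python
import math
import statistics

def scale_to_z(values):
    """Scale values to zero mean and unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical AUROC values: one row per GO term, one column per method.
methods = ["PCC", "SCC", "MRNET"]
auroc = {
    "GO_term_1": [0.70, 0.72, 0.60],
    "GO_term_2": [0.65, 0.66, 0.75],
    "GO_term_3": [0.80, 0.79, 0.62],
}

# Assumption: each GO term (row) is scaled across methods.
scaled = {term: scale_to_z(vals) for term, vals in auroc.items()}

# One scaled profile per method; methods are then compared by distance.
profiles = {m: [scaled[t][i] for t in auroc] for i, m in enumerate(methods)}
dist_pcc_scc = euclidean(profiles["PCC"], profiles["SCC"])
dist_pcc_mrnet = euclidean(profiles["PCC"], profiles["MRNET"])
```

In this toy data the two correlation methods end up much closer to each other than to MRNET, which is the kind of grouping a distance-based clustering would pick up.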

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
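
The logarithmic model in Figure 4 has the form AUROC = a + b · ln(sample size). The sketch below fits that form by ordinary least squares in plain Python; it is not the lm() call used in the paper, it does not compute p-values, and the data points are synthetic (only 12, 404, and 1266 are sample sizes mentioned in the legends).

```python
import math

def fit_log_model(sample_sizes, auroc_values):
    """Least-squares fit of auroc = a + b * ln(sample_size).

    Returns (a, b, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(auroc_values)
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Synthetic example that follows an exact logarithmic relation.
sizes = [12, 24, 48, 100, 404, 1266]
aurocs = [0.55 + 0.02 * math.log(n) for n in sizes]
a, b, r2 = fit_log_model(sizes, aurocs)
```

A positive slope b in such a fit corresponds to performance improving with sample size, with diminishing returns at larger sizes.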

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Seventeen individual networks were labeled as S12_1 to S404; the S1266 network included all samples from the 17 experiments. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig 6. GCN performance comparison among single network (white, "1266"), aggregated network (grey, "agg"), and protein network (dark grey, "pr") using PCC, SCC, MRNET, and CLR. A, GO evaluation on networks. Inference methods are indicated by single letters (p, PCC; s, SCC; m, MRNET; c, CLR). AUROC values were plotted against network types. B, PPPTY evaluation on networks. Inference methods are indicated by single letters (p, PCC; s, SCC; m, MRNET; c, CLR). Network types were plotted against AUROC values. Bold horizontal lines indicate the median; the star sign is the mean value of each box. Outliers are plotted as grey dots.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.
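
Since every panel above reports AUROC, a small sketch of how an AUROC value can be computed from ranked scores may help. It uses the rank-sum (Mann-Whitney) identity in plain Python. The scores and labels are made up, and how the paper labels gene pairs as positives for the GO and PPPTY evaluations is not reproduced here.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum identity.

    scores: higher means a more confident prediction.
    labels: 1 for a true (positive) item, 0 otherwise.
    Tied scores receive the average of their ranks.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Hypothetical edge scores and whether each edge is a known association.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_labels = [1, 1, 0, 1, 0, 0]
example_auroc = auroc(example_scores, example_labels)
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to a ranking that places every positive above every negative.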

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A-B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm transformation (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2 transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
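
For readers unfamiliar with the two simpler normalizations named in this legend, the following plain-Python sketch shows the arithmetic behind CPM and RPKM for a single library. The gene names, counts, and lengths are hypothetical, the library size is taken as the sum of the listed counts, and the pseudocount of 1 before log2 is an assumption for the example. VST is not shown.

```python
import math

def cpm(counts):
    """Counts per million for one library: count / library size * 1e6."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}

def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase per million mapped reads: CPM / gene length in kb."""
    per_million = cpm(counts)
    return {g: per_million[g] / (gene_lengths_bp[g] / 1000.0) for g in counts}

def log2_transform(values, pseudocount=1.0):
    """log2(value + pseudocount); the pseudocount avoids log2(0)."""
    return {g: math.log2(v + pseudocount) for g, v in values.items()}

# Hypothetical read counts and gene lengths for one library.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

cpm_values = cpm(counts)
rpkm_values = rpkm(counts, lengths)
log2_cpm = log2_transform(cpm_values)
```

Here geneB has three times the reads of geneA but is also three times as long, so the two genes receive the same RPKM while their CPM values differ. That length dependence is what panel C of the figure examines.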

Supplemental Figure 3. Maize CPM-normalized, log2 transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stage. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient, ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of all 1720 PPPTY gene set genes (ALL_1720) and of genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples are designated as "_1", "_2", and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC value from the 17 individual networks of each category. Mean values of each network are labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
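
The outlier rule stated in this legend (more than 1.5 × IQR beyond the quartiles) can be written out as a short Python sketch. The AUROC values are invented for illustration, and the "inclusive" quantile method is an assumption; R's boxplot computes hinges slightly differently, so borderline points may be classified differently.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical AUROC values for one network; 0.95 sits far above the rest.
auroc_values = [0.60, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.95]
outliers = boxplot_outliers(auroc_values)
```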

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, Area under the ROC curve (AUROC) values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distribution. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node degree distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS) and the intersection among the three networks (Merged network). The number of edges linked to a gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R2 indicated beside it.
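A power-law fit of a degree distribution, as described in this legend, can be sketched as an ordinary least-squares fit in log-log space. This is only an illustration under that assumption: the function name and the toy data are hypothetical, and the fit shown in the figure may have been produced by a different procedure.

```python
import math

def fit_power_law(degrees, counts):
    """Fit counts ~ a * degrees**b by least squares on log-transformed data.

    Returns (a, b, r_squared), where r_squared is computed in log-log space.
    """
    xs = [math.log(d) for d in degrees]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_a), b, 1 - ss_res / ss_tot

# Hypothetical degree distribution following counts = 1000 * degree**-2
toy_degrees = [1, 2, 4, 8]
toy_counts = [1000 * d ** -2 for d in toy_degrees]
a, b, r_squared = fit_power_law(toy_degrees, toy_counts)
```

A negative exponent b with R2 close to 1 indicates that a few genes have many connections while most have few, which is the pattern usually described as scale-free.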


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are shown as light grey nodes.



Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008-2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for samples constructed using nine inference methods, including Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for samples constructed using nine inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples constructed using nine inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
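The AUROC used throughout this figure measures how often a network ranks a true positive above a true negative. A minimal sketch of the rank-based definition follows. The function name, scores and labels are hypothetical, and this is not the evaluation code used in the study.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical co-expression scores; label 1 marks a known functional partner
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values are compared against 0.5 rather than 0.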


Figure 3. Similarity between nine inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are colored from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
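The scaling and distance steps named in this legend can be sketched as follows. The sketch assumes that each row (one GO term or gene) is z-scored across methods using the sample standard deviation, and that methods are then compared by Euclidean distance between their columns. The table values and function names are hypothetical, and the hierarchical clustering step itself is omitted.

```python
import math
import statistics

def scale_rows(table):
    """Z-score each row (one GO term or gene) across inference methods."""
    scaled = []
    for row in table:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)  # sample standard deviation (an assumption)
        scaled.append([(value - mean) / sd for value in row])
    return scaled

def method_distances(table, methods):
    """Euclidean distance between every pair of method columns after scaling."""
    columns = list(zip(*scale_rows(table)))
    return {
        (methods[i], methods[j]): math.dist(columns[i], columns[j])
        for i in range(len(methods))
        for j in range(i + 1, len(methods))
    }

# Hypothetical AUROC table: rows are GO terms, columns follow `methods`
methods = ["PCC", "SCC", "MRNET"]
auroc_table = [
    [0.70, 0.72, 0.60],
    [0.65, 0.64, 0.55],
]
distances = method_distances(auroc_table, methods)
```

In this toy table the two correlation methods end up much closer to each other than either is to MRNET. Rows with identical values across all methods would need to be filtered first, since their standard deviation is zero.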


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-square and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and protein networks (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate median, asterisks indicate mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and protein networks (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate median, asterisks indicate mean, and grey dots indicate outliers.


835

45

[Figure 7. Panels A and B: Venn diagrams of the MS, PA and SA networks. Panels C and D: GO term enrichment maps, labeled "Hub" and "Cell Wall" respectively.]

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. (A) Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. (B) Venn diagram showing the overlap among the top 1 × 10^6 edges in the three networks; 106,591 edges were shared among the three networks. (C) GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA and MS. (D) GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
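The overlap summarized in the Figure 7 Venn diagrams can be illustrated with a minimal sketch. It assumes that "highest interacting" means highest node degree in an undirected edge list, which the caption does not state explicitly. The function names (`top_k_hubs`, `shared_hubs`), the toy gene identifiers and the toy edges are all hypothetical and are not taken from the study.

```python
from collections import Counter


def top_k_hubs(edges, k):
    """Return the k genes with the highest degree in an undirected edge list.

    Assumption: "highest interacting" is read here as highest node degree.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name so that ties break deterministically.
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return set(ranked[:k])


def shared_hubs(networks, k):
    """Intersect the top-k hub sets of every network (dict: name -> edge list)."""
    hub_sets = [top_k_hubs(edges, k) for edges in networks.values()]
    return set.intersection(*hub_sets)


# Toy networks standing in for PA, SA and MS (hypothetical edges).
networks = {
    "PA": [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")],
    "SA": [("g1", "g2"), ("g1", "g5"), ("g2", "g5"), ("g2", "g3")],
    "MS": [("g1", "g3"), ("g1", "g2"), ("g4", "g5"), ("g2", "g4")],
}

print(sorted(shared_hubs(networks, k=2)))
```

With k = 1000 and the real edge lists, the size of the returned set would correspond to the central region of the Venn diagram in panel A. The same intersection applied to sets of edges rather than genes would correspond to panel B.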

Figure 8. Cell wall pathway subnetworks. (A) Intersections of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes. (B) Network retrieved from the CORNET database queried by the 16 cell wall pathway genes. (C) Network retrieved from the STRING database queried by the 16 cell wall pathway genes. In all panels, red nodes are the query genes, cyan nodes are genes with a reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network-predicted interactions.
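The querying and intersection step behind panel A of Figure 8 can be sketched as follows. This is an illustration only. It assumes that a query returns every edge touching at least one seed gene and that the intersection keeps edges present in all three networks; the caption does not specify either detail. The function names, gene identifiers and edges are hypothetical.

```python
def query_subnetwork(edges, seeds):
    """Return the edges (as unordered pairs) that touch at least one seed gene."""
    seeds = set(seeds)
    return {frozenset((a, b)) for a, b in edges if a in seeds or b in seeds}


def intersect_subnetworks(networks, seeds):
    """Keep only the seed-adjacent edges found in every network."""
    subnets = [query_subnetwork(edges, seeds) for edges in networks.values()]
    return set.intersection(*subnets)


# Toy networks standing in for PA, SA and MS (hypothetical edges).
networks = {
    "PA": [("seed1", "x"), ("seed1", "y"), ("x", "y")],
    "SA": [("x", "seed1"), ("seed1", "z")],
    "MS": [("seed1", "x"), ("y", "z")],
}
seeds = ["seed1"]

shared_edges = intersect_subnetworks(networks, seeds)
print([sorted(edge) for edge in shared_edges])
```

Storing each edge as a `frozenset` makes the comparison independent of the order in which the two genes are listed, so `("x", "seed1")` and `("seed1", "x")` count as the same interaction.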


Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize HistoneDeacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstructionPLoS One 9 e106319 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56195-214

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for MaizePlant Physiol 170 pp1501821-

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zyprych-Walczak J Szabelska A Handschuh L Goacuterczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalizationmethods on RNA-Seq data analysis Biomed Res Int 2015

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.
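As a reading aid for the AUROC values reported in these legends, the metric can be interpreted as the probability that a randomly chosen positive pair (for example, two genes sharing an annotation) is scored above a randomly chosen negative pair. The Python sketch below illustrates only this rank-based definition; the function name `auroc` and the toy scores and labels are hypothetical and are not the evaluation code used in this study.

```python
def auroc(scores, labels):
    """Rank-based AUROC: fraction of (positive, negative) pairs in which
    the positive item has the higher score; ties count as one half."""
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values): co-expression scores for four gene
# pairs, with 1 marking pairs that share a functional annotation.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
example_auroc = auroc(example_scores, example_labels)
```

An AUROC of 0.5 corresponds to random ranking, and values closer to 1 indicate that annotated pairs tend to receive higher scores.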

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks. 148 genes were shared among the three networks. PA: PCC ranked aggregation network; SA: SCC ranked aggregation network; MS: MRNET single network. B, Venn diagram shows the overlap among the top 1 × 10^6 edges in three networks. 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.

Supplemental Figure 1. Pipeline and datasets used for analysis. A, Workflow used in this analysis. Independent steps are labeled in square boxes, with alternative algorithms for each step in the rounded boxes. Software and packages for each step are in italics between the boxes. Raw data files were acquired from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database, converted to a common format (fastq files), and aligned to the maize AGPv3 genome (Alignment). Gene-level reads were counted (Read Count) to generate an expression matrix, which was imported to the R environment for the normalization, inference, and evaluation steps. All networks were visualized in Cytoscape. B, Relative representation of different maize tissues in acquired datasets. Tissues are listed by name with the percentage of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A-B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm (log2) transformation. VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2 transformed expression values. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
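The CPM and RPKM quantities named in this legend, and the normal density that R's dnorm() returns, can be written out in a few lines. The Python sketch below is illustrative only: the function names, the pseudocount of 1 used before the log2 transformation, and the toy counts and gene lengths are assumptions and are not taken from the study, which performed these steps in R.

```python
import math


def cpm(counts):
    """Counts Per Million: each gene count scaled by the library size."""
    library_size = sum(counts)
    return [c / library_size * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """Reads Per Kilobase per Million mapped reads: CPM divided by the
    gene length expressed in kilobases."""
    return [v / (length / 1000.0) for v, length in zip(cpm(counts), lengths_bp)]


def log2_transform(values, pseudocount=1.0):
    """log2 transformation; the pseudocount is an assumption made here
    so that zero values remain defined."""
    return [math.log2(v + pseudocount) for v in values]


def normal_density(x, mean, sd):
    """Density of a normal distribution at x, the quantity dnorm() gives."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# Toy library with three genes (hypothetical read counts and lengths in bp).
toy_counts = [100, 300, 600]
toy_lengths = [1000, 2000, 500]
toy_cpm = cpm(toy_counts)
toy_rpkm = rpkm(toy_counts, toy_lengths)
toy_log2_cpm = log2_transform(toy_cpm)
```

In this toy library the third gene has the highest CPM and, because it is also the shortest, an even larger RPKM, which is the length dependence that panel C examines.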

Supplemental Figure 3. CPM-normalized, log2-transformed maize gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS: days after sowing; DAP: days after pollination; V1-V18: vegetative developmental stages. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks, between the AUROC values and the PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values as calculated by Pearson correlation shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks, between the AUROC values and the PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values as calculated by Pearson correlation shown in the triangle above the diagonal. PCC: Pearson Correlation Coefficient; SCC: Spearman Correlation Coefficient; KCC: Kendall rank Correlation Coefficient; GCC: Gini Correlation Coefficient; Bi: Biweight midcorrelation; CSC: Cosine Similarity Coefficient; AA: Additive ARACNE; MA: Multiplicative ARACNE; MRNET: Minimum Redundancy NETwork; CLR: Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of the full PPPTY gene set of 1720 genes (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as “_1”, “_2”, and “_3”. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC values from the 17 individual networks of each category. Mean values of each network are labeled with asterisks. PCC: Pearson Correlation Coefficient; SCC: Spearman Correlation Coefficient; MRNET: Minimum Redundancy NETwork; CLR: Context Likelihood of Relatedness.
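The outlier rule stated in this legend (values more than 1.5 times the interquartile range beyond the quartiles) can be expressed as a short check. The Python sketch below is one possible reading of that rule; the function name `iqr_outliers` and the example values are hypothetical, and the quartile convention used here (the "inclusive" method of the standard library) is an assumption that may differ slightly from the convention of the plotting software used for the figure.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR above the upper quartile
    or below the lower quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical values: five similar observations and one extreme one.
example_values = [1, 2, 3, 4, 5, 100]
example_outliers = iqr_outliers(example_values)
```

With these example values the quartiles are 2.25 and 4.75, so the fences fall at -1.5 and 8.5 and only the value 100 is flagged.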

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area Under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. The red and blue curves show the power-law fitted distribution. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node degree distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to a gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.
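A power-law fit of a node degree distribution, as described in this legend, is commonly obtained by linear regression on log-transformed values. The Python sketch below shows that approach under stated assumptions: the legend does not say how the fit or R² was computed, so the log-log least-squares method, the function name `fit_power_law`, and the toy degree counts are all hypothetical.

```python
import math


def fit_power_law(degrees, node_counts):
    """Fit node_counts ~ a * degrees**b by least squares on log-log values.

    Returns (a, b, r_squared), with r_squared computed on the
    log-transformed data. All inputs must be positive.
    """
    xs = [math.log(d) for d in degrees]
    ys = [math.log(c) for c in node_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


# Toy distribution following exactly 1000 * degree**-2 (hypothetical values).
toy_degrees = [1, 2, 4, 8]
toy_node_counts = [1000.0, 250.0, 62.5, 15.625]
toy_a, toy_b, toy_r2 = fit_power_law(toy_degrees, toy_node_counts)
```

A negative exponent b with R² close to 1 indicates that many genes have few connections and a small number of genes have many.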


Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) were highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O’Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D’haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64-70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff NV (2012) McClintock’s challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277-286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc114132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498–2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940–3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443–458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314–362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447–D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633–51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895–905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258–264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814–818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300–323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57–63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822–2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629–645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195–214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp.15.01821

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using nine inference methods, including Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using nine inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using nine inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
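The outlier rule stated in the Figure 2 legend can be illustrated with a minimal Python sketch. It assumes the conventional boxplot reading of the rule (values more than 1.5 times the interquartile range beyond the quartiles). The function name `tukey_outliers`, the choice of quartile interpolation, and the AUROC values are illustrative assumptions and are not taken from the study; quartiles computed by R's boxplot routines may differ slightly on small samples.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    # "inclusive" treats the data as the full population when interpolating;
    # this choice is an assumption, other quartile definitions exist.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up AUROC values for illustration only (not data from the study).
auroc_values = [0.62, 0.64, 0.65, 0.66, 0.67, 0.68, 0.70, 0.95]
outliers = tukey_outliers(auroc_values)
print(outliers)
```

With these invented values only the single extreme point falls outside the fences; the remaining values would be drawn inside the box and whiskers.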


Figure 3. Similarity between nine inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values shown on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
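The scaling and distance steps named in the Figure 3 legend can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each row (one GO term or gene) is z-scored across methods using the sample standard deviation, and Euclidean distances are then taken between method columns. The helper names `zscore` and `euclidean`, the choice of rows versus columns, and the numbers are hypothetical and are not the authors' pipeline.

```python
import math
import statistics


def zscore(values):
    """Scale one row to mean 0 and unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # n - 1 denominator
    return [(v - mean) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up AUROC table: rows are GO terms, columns are inference methods.
methods = ["PCC", "SCC", "CLR"]
auroc = [
    [0.60, 0.70, 0.80],
    [0.55, 0.65, 0.60],
    [0.70, 0.72, 0.74],
]

scaled = [zscore(row) for row in auroc]
# Collect each method's scaled values across all GO terms.
columns = {m: [row[i] for row in scaled] for i, m in enumerate(methods)}

dist_pcc_scc = euclidean(columns["PCC"], columns["SCC"])
dist_pcc_clr = euclidean(columns["PCC"], columns["CLR"])
print(dist_pcc_scc, dist_pcc_clr)
```

A hierarchical clustering routine would take the full matrix of such pairwise distances as input; that step is omitted here to keep the sketch dependency-free.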


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
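The logarithmic fit described in the Figure 4 legend corresponds to an ordinary least-squares regression of average AUROC on the natural logarithm of sample size. The legend reports that the fit was done with lm() in R; the following is only a dependency-free Python sketch of the same model form. The function name `fit_log_model` and the synthetic data are assumptions for illustration, and the p-value computation is omitted.

```python
import math


def fit_log_model(sample_sizes, auroc):
    """Least-squares fit of auroc = intercept + slope * ln(sample size).

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(auroc) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, auroc))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, auroc))
    ss_tot = sum((yi - mean_y) ** 2 for yi in auroc)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Synthetic data generated from an exact log relationship (not study data).
sizes = [12, 50, 200, 1266]
auroc = [0.5 + 0.02 * math.log(n) for n in sizes]
intercept, slope, r2 = fit_log_model(sizes, auroc)
print(intercept, slope, r2)
```

Because the synthetic values follow the model exactly, the fit recovers the generating coefficients and an R-squared of essentially 1; real AUROC averages would scatter around the fitted line.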


Figure 5. Evaluation of different sized networks constructed with different number of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.


Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.


Parsed Citations

Allen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One 7 e29348

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology Plant Cell Physiol 48 381–90

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5–e5

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Network expansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem W Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science 320 938–941

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbers Bioinformatics 31 2123–2130

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868

Barabasi A-L Oltvai ZNZN Barabási A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5 101–113

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat Soc Ser B 289–300

De Bodt S Hollunder J Nelissen H Meulemeester N Inzé D (2012) CORNET 2.0 Integrating plant coexpression protein-protein interactions regulatory interactions gene associations and functional annotations New Phytol 195 707–720

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems Genes Dev 26 1685–90

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545–3561

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMC Bioinformatics 10 421

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al (2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineering Bioinformatics 16 707–726

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Höfte H Gonneau M Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572–15577

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143–175

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671–683

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671–683

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures Nature Protoc 7 670–85

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64–70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575–1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924–13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054–0066

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200–20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244–56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277–286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860–1866

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1–8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866–12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 9 29–46

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput Biol 4 e1000117

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize Int Rev Cell Mol Biol 328 25–48

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference Bioinformatics 28 1592–1597

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357–360

Köster J Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine Bioinformatics 28 2520–2522

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time and perturbation Genome Biol 14 123

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol 34 1812–1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery PLoS One 7 e50411

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532–545

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels Nat Genet 45 43–50

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data BMC Bioinformatics 16 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a) Genetic Perturbation of the Maize Methylome Plant Cell 26 4602–4616

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data Nature Biotechnol 32 888–895

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20 664–675

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomic features Bioinformatics 30 923–930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverse engineering gene networks Bioinformatics pp 282–288

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice Plant J 90 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


of the 1266 libraries originating from each tissue. SAM = Shoot Apical Meristem. Samples are grouped by tissue and may be represented by one or more developmental stages of that tissue. Tissues represented by fewer than 10 libraries were grouped together as Others. C, Relative representation of different maize genotypes in our datasets. Genotypes are listed by name with the percentage of the 1266 libraries originating from each genotype. MAGIC = Multi-parent Advanced Generation InterCrosses. Genotypes represented by fewer than 10 libraries were grouped together as Others.

Supplemental Figure 2. Distribution of gene expression values. The frequency of each expression level in the dataset (Density) was plotted against gene expression (Expr), which was calculated after normalization by Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million mapped reads (RPKM). A-B, Distribution of expression values for samples normalized with CPM (black line, CPM graph) and RPKM (black line, RPKM graph) before (A) and after (B) logarithm normalization (log2). VST values are log2 transformed by default. The normal distribution of expression (dotted lines) was calculated using the dnorm() function in R, which takes the mean value and standard deviation from the log2 transformed expressions. C, Normalized gene expression values for 15116 genes were averaged across libraries and plotted as a function of gene length in base pairs (bp).
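
The reference curve described in this legend can be sketched in Python. This is only an illustration of the dnorm()-style calculation: the function names, the pseudocount added before the log2 transform, and the grid size are assumptions made for the example, not details taken from the study.

```python
import math
import statistics


def normal_density(x, mean, sd):
    """Normal probability density at x, the quantity R's dnorm(x, mean, sd) returns."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fitted_normal_curve(expr_values, pseudocount=1.0, n_points=50):
    """Return (mean, sd, curve) for a normal reference curve over log2 expression.

    The pseudocount is a hypothetical choice that avoids log2(0); the legend
    does not state how zero values were handled.
    """
    log_expr = [math.log2(v + pseudocount) for v in expr_values]
    mean = statistics.mean(log_expr)
    sd = statistics.stdev(log_expr)  # sample standard deviation, as R's sd()
    lo, hi = min(log_expr), max(log_expr)
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    curve = [(x, normal_density(x, mean, sd)) for x in grid]
    return mean, sd, curve
```

Plotting the returned curve over a density estimate of the same log2 values gives the kind of comparison the legend describes.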

Supplemental Figure 3. Maize CPM-normalized, log2-transformed gene expression from all tissues and developmental stages (Stelpflug et al., 2015). A, Clustering dendrogram of samples based on Euclidean distance (Height). DAS, days after sowing; DAP, days after pollination; V1-V18, vegetative developmental stage. B, Heat map of the gene expression correlation between pollen tissue and 78 other tissues, calculated by Pearson correlation coefficient, ranging from 0.6 to 1.0. Red color indicates higher correlation.

Supplemental Figure 4. Pairwise comparison among results of inference methods. A, GO evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by GO datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. B, PPPTY evaluation comparisons for VST, CPM, and RPKM normalized data. The AUROC value density for each method was plotted in the diagonal line of blocks between AUROC values and PCC values. AUROC values evaluated by PPPTY datasets were plotted pairwise in the triangle below the diagonal, with the corresponding coefficient values, as calculated by Pearson correlation, shown in the triangle above the diagonal. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; Bi, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY set (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference. A, AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2", and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. Dashed lines are the average AUROC value from the 17 individual networks of each category. Mean values of each network are labeled with asterisks. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
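
The outlier rule stated in this legend can be written out as a short Python sketch. The function name and the quartile interpolation method are assumptions for illustration; plotting software may compute quartiles slightly differently (R's boxplot, for example, uses hinges), so borderline points can differ.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Return values lying more than k * IQR above Q3 or below Q1.

    Quartiles use the "inclusive" interpolation of statistics.quantiles,
    which is an assumed choice, not one stated in the legend.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]
```

Applied to the AUROC values of one network, the function returns the points that would be drawn individually as outliers under this rule.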

Supplemental Figure 7. GCN performance comparison between protein networks. A, Area Under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). B, Area Under the ROC curve (AUROC) values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed by Pearson Correlation Coefficient (PCC). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. Red and blue curves show the power-law fitted distribution. The R2 value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection among the three networks (Merged network). The number of edges linked to the genes (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R2 indicated beside it.
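
A power-law fit with an R2 value, as reported in this legend, can be illustrated with the following Python sketch. It assumes the fit is an ordinary least-squares line on log-transformed degree and node counts, which is a common approach but is not confirmed by the legend; the function name is hypothetical.

```python
import math


def fit_power_law(degrees, counts):
    """Fit counts ~ a * degree**b by least squares on log-log values.

    Returns (a, b, r_squared), where r_squared is computed on the
    log-transformed data. All inputs must be strictly positive.
    """
    xs = [math.log(d) for d in degrees]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # exponent of the power law
    log_a = mean_y - b * mean_x        # intercept on the log scale
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(log_a), b, r_squared
```

A negative exponent b with R2 close to 1 corresponds to the scale-free-like degree distribution that such a plot is meant to show.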

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381-90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5-e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938-941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123-2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101-113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289-300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707-720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685-90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545-3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707-726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572-15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143-175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670-85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64-70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924-13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054-0066

Fedoroff N V (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200-20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244-56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277-286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860-1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1-8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866-12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29-46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25-48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592-1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357-360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520-2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812-1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene 899 association methods for coexpression network construction and biological knowledge discovery PLoS 900 One 7 e50411 901

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC 902 Bioinformatics 9 559 903

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019 904

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks. Bioinformatics, pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: 10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, De Leon N, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
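The two quantities this legend relies on, an AUROC score and the 1.5 × interquartile-range outlier rule, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auroc` and `boxplot_outliers`, the "inclusive" quartile interpolation, and every number below are assumptions chosen only for illustration.

```python
from statistics import quantiles


def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. `labels` holds truthy values for positives.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def boxplot_outliers(values, k=1.5):
    """Return values lying more than k * IQR above Q3 or below Q1."""
    # Assumption: quartiles use linear interpolation ("inclusive" method);
    # other plotting tools may compute quartiles slightly differently.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical co-expression scores for four gene pairs; label 1 marks
# pairs that share an annotation in a made-up reference set.
example_auroc = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])

# Hypothetical per-term AUROC values, one of which is unusually high.
example_outliers = boxplot_outliers([0.50, 0.52, 0.54, 0.56, 0.58, 0.95])
```

The pairwise formulation is quadratic in the number of examples, so it is suitable only for small illustrations; a rank-based computation would be the usual choice at genome scale.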

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a blue-to-red color scale. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
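The scaling and distance steps named in this legend can be sketched as follows. This is an illustrative reading, not the authors' procedure: it assumes that "scaled to standard normal distribution" means a z-score computed per GO term across methods, and the method subset, term names and AUROC values are invented for the example.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Center values on 0 and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


# Hypothetical AUROC values: one row per GO term, one column per method.
methods = ["PCC", "SCC", "CLR"]
auroc_by_term = {
    "term_1": [0.60, 0.70, 0.80],
    "term_2": [0.55, 0.65, 0.75],
    "term_3": [0.90, 0.70, 0.50],
}

# Assumption: scaling is applied within each term (row).
scaled = {term: zscore(row) for term, row in auroc_by_term.items()}


def method_profile(name):
    """Scaled AUROC values of one method across all terms."""
    j = methods.index(name)
    return [scaled[term][j] for term in scaled]


def method_distance(a, b):
    """Euclidean distance between two methods' scaled AUROC profiles."""
    return math.dist(method_profile(a), method_profile(b))
```

A hierarchical clustering routine would then take these pairwise distances as input; methods with similar per-term behavior end up close together.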

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
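The logarithmic model described in this legend, average AUROC regressed on log(sample size), can be sketched in a few lines. The legend states that the fit was done with lm() in R; the Python below is only an equivalent least-squares illustration, the function name `fit_log_model` is hypothetical, the data are made up, and the p-value that lm() reports is not reproduced here.

```python
import math


def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of mean_auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_aurocs)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Made-up example: library counts and invented average AUROC values.
sizes = [12, 50, 200, 800, 1266]
aurocs = [0.56, 0.60, 0.62, 0.65, 0.66]
intercept, slope, r_squared = fit_log_model(sizes, aurocs)
```

A positive slope in such a fit corresponds to performance that keeps improving with sample size, but with diminishing returns on the original (untransformed) scale.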

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). In both panels, bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Fig 7 graphic (panel text only). A and B, Venn diagrams for the MS, PA and SA networks. C, GO enrichment graph labeled Hub; node labels include chromatin assembly, nucleosome organization, DNA packaging, DNA replication, DNA repair, gene silencing and chromatin modification. D, GO enrichment graph labeled Cell Wall; node labels include cellulose biosynthetic process, cell wall biogenesis, glucan metabolic process, phenylpropanoid metabolic process and hemicellulose metabolic process.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
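The overlaps summarized in panels A and B amount to set intersections across three networks. The sketch below illustrates the idea on toy data; it assumes that "highest interacting genes" means genes with the most edges (highest degree), which the legend does not state explicitly, and the helper names `top_hubs` and `shared_edges` and the toy edge lists are hypothetical.

```python
def top_hubs(edges, k):
    """Return the k genes with the most edges; ties are broken by gene name."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    ranked = sorted(degree, key=lambda gene: (-degree[gene], gene))
    return set(ranked[:k])


def shared_edges(*networks):
    """Undirected edges present in every network (gene order is ignored)."""
    edge_sets = [{frozenset(edge) for edge in net} for net in networks]
    return set.intersection(*edge_sets)


# Toy stand-ins for the PA, SA and MS networks; each edge is a gene pair.
pa = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
sa = [("g1", "g2"), ("g2", "g3"), ("g2", "g4"), ("g1", "g5")]
ms = [("g1", "g3"), ("g3", "g4"), ("g3", "g5"), ("g2", "g1")]

# Genes that are among the top 2 hubs of all three toy networks.
shared_hubs = top_hubs(pa, 2) & top_hubs(sa, 2) & top_hubs(ms, 2)
common_edges = shared_edges(pa, sa, ms)
```

Storing each edge as a `frozenset` makes ("g2", "g1") and ("g1", "g2") compare equal, which matters when networks list undirected edges in different orders.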

Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Node and line colors in B and C are as described for A.

Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szcześniak MW Gaffney DJ Elo LL Zhang X et al (2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

D'haeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineering Bioinformatics 16 707-726

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Höfte H Gonneau M Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures Nature Protoc 7 670-85

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575-1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Fedoroff NV (2012) McClintock's challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244-56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R package version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277-286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1-8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 9 29-46

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput Biol 4 e1000117

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize Int Rev Cell Mol Biol 328 25-48

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference Bioinformatics 28 1592-1597

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360

Köster J Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine Bioinformatics 28 2520-2522

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time and perturbation Genome Biol 14 123

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol 34 1812-1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery PLoS One 7 e50411

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data BMC Bioinformatics 16 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a) Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20 664-675

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomic features Bioinformatics 30 923-930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverse engineering gene networks Bioinformatics pp 282-288

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice Plant J 90 177-188

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissue expression (GTEx) project Nat Genet 45 580-585

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol 15 1

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework Sci Rep 5 7702

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis Plant Physiol 160 192-203

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 19 1423-1430

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Marbach D Costello JC Küffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012) Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize Plant J 43 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

McClintock B (1983) The significance of responses of the genome to challenge Science 792-801

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EURASIP J Bioinforma Syst Biol doi 10.1155/2007/79879

Meyer PE Lafitte F Bontempi G (2008) minet A R/Bioconductor package for inferring large transcriptional networks using mutual information BMC Bioinformatics 9 461

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejía-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B Emiliani J et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithm clustering plugin for Cytoscape BMC Bioinformatics 12 436

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat Methods 5 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc.114.132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146-2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 2013 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp.15.01821

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015

Supplemental Figure 5. Characteristics of the full 1720-gene PPPTY gene set (ALL_1720) and of the genes with the highest AUROC values in the CSC method (CSC), PCC method (PCC), and MRNET method (MRNET). Average expression in CPM of the four gene sets is shown as squares; the average number of lowly expressed elements (CPM < 0) is shown as solid circles.

Supplemental Figure 6. Evaluation of network performance based on sample size and inference method. (A) AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries, plotted against sample size. (B) AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries, plotted against sample size. Networks built with the same number of samples are designated "_1", "_2", and "_3". Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. Dashed lines show the average AUROC value from the 17 individual networks of each category. Mean values of each network are marked with asterisks. PCC, Pearson correlation coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Supplemental Figure 7. GCN performance comparison between protein networks. (A) Area under the ROC curve (AUROC) values from GO evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). (B) AUROC values from PPPTY evaluation of protein networks with 17862 genes (ppr_all) and with 11429 genes (ppr). Both networks were constructed using the Pearson correlation coefficient (PCC). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.

Supplemental Figure 8. Average neighborhood connectivity for three selected networks: PCC-aggregated (PA), SCC-aggregated (SA), and MRNET-single (MS). The average neighborhood connectivity distribution of all genes is plotted against the number of neighbors. The top one million edges were chosen for each network. The red and blue curves show the power-law fitted distributions. The R² value indicates the goodness of fit to the power-law model.

Supplemental Figure 9. Node degree distribution for four selected networks: PCC-aggregated (PA), SCC-aggregated (SA), MRNET-single (MS), and the intersection of the three networks (Merged network). The number of edges linked to a gene (node degree) was plotted against the number of genes with that degree (number of nodes). The red curve shows the power-law fitted distribution, with the function and R² indicated beside it.

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in color. Genes not in modules 1-8 are shown as light grey nodes.

Literature Cited

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5–e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123–2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289–300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707–720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685–90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545–3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707–726


Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572–15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143–175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671–683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670–85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64–70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924–13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054–0066

Fedoroff N V (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200–20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244–56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277–286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860–1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1–8


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013;4:291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008-2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO dataset comparisons for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
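The AUROC statistic used throughout Figure 2 can be sketched as follows. This is a generic rank-based calculation, not the evaluation pipeline used for the figure; the function name `auroc` and the example scores are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve from the rank-sum (Mann-Whitney) statistic.

    `scores` are prediction scores (for example, summed edge weights to known
    members of a GO term) and `labels` are 1 for true members, 0 otherwise.
    Tied scores receive their average rank, so ties count as half a win.
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)

    # Assign 1-based ranks, averaging over runs of equal scores
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical scores for four genes, two of which carry the annotation
example = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of annotated from unannotated genes.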


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC values in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are displayed on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
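The scaling and distance steps named in the Figure 3 legend can be sketched in a few lines. The sketch assumes that each GO term (row) is scaled across methods using the sample standard deviation, and it reports only the closest pair of methods, which is the first merge an agglomerative clustering would make. All names and AUROC values below are hypothetical.

```python
import math


def scale_row(values):
    """Scale one row to mean 0 and unit (sample) standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_methods(auroc_rows, methods):
    """Return the pair of methods with the most similar scaled profiles."""
    scaled = [scale_row(row) for row in auroc_rows]
    # Column j of the scaled matrix is the profile of methods[j]
    profiles = {m: [row[j] for row in scaled] for j, m in enumerate(methods)}
    names = sorted(profiles)
    best = None
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            distance = euclidean(profiles[first], profiles[second])
            if best is None or distance < best[0]:
                best = (distance, first, second)
    return best[1], best[2]


# Hypothetical AUROC values: rows are GO terms, columns are methods
methods = ["PCC", "SCC", "MRNET"]
auroc_rows = [
    [0.70, 0.72, 0.55],
    [0.60, 0.61, 0.80],
    [0.65, 0.66, 0.50],
]
pair = closest_methods(auroc_rows, methods)
```

Repeating the closest-pair step on merged groups would give a full hierarchical clustering; the linkage rule used for the figure is not stated in the legend.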


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against the natural-logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against the natural-logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
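The logarithmic regression in Figure 4 was fitted with lm() in R. As an illustration only, the same model, average AUROC = a + b * ln(sample size), can be fitted by ordinary least squares in plain Python. The function name `fit_log_model` and the numbers are hypothetical, and the p-value reported by lm() is not reproduced here.

```python
import math


def fit_log_model(sample_sizes, avg_auroc):
    """Fit avg_auroc = intercept + slope * ln(sample_size) by least squares.

    Returns (intercept, slope, r_squared).
    """
    xs = [math.log(n) for n in sample_sizes]
    ys = list(avg_auroc)
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count

    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical average AUROC values for four network sizes
sizes = [12, 100, 500, 1266]
values = [0.5 + 0.03 * math.log(n) for n in sizes]
intercept, slope, r_squared = fit_log_model(sizes, values)
```

A positive slope means that each multiplicative increase in sample size adds a roughly constant increment to the average AUROC, which is the diminishing-returns pattern a logarithmic model encodes.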


Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.


Fig 7. Panels A and B: Venn diagrams comparing the MS, PA and SA networks. Panel C: network of enriched GO terms for the shared highly interacting genes (labelled "Hub"). Panel D: network of enriched GO terms for the cell wall query (labelled "Cell Wall").

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
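The overlaps summarized in the Figure 7 Venn diagrams amount to set intersections over gene lists and edge lists. The sketch below assumes that "highest interacting" means highest node degree and that edges are undirected; the function names and the toy networks are hypothetical.

```python
from collections import Counter


def top_hubs(edges, n):
    """Return the n genes with the highest degree in an edge list."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    # Sort by descending degree, breaking ties alphabetically
    ranked = sorted(degree, key=lambda gene: (-degree[gene], gene))
    return set(ranked[:n])


def shared_edges(*networks):
    """Return the undirected edges present in every network."""
    # frozenset makes (a, b) and (b, a) compare equal
    normalized = [{frozenset(edge) for edge in net} for net in networks]
    return set.intersection(*normalized)


# Toy edge lists standing in for the PA, SA and MS networks
pa = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
sa = [("g2", "g1"), ("g1", "g3"), ("g5", "g1")]
ms = [("g1", "g2"), ("g3", "g1"), ("g6", "g7")]

shared_hubs = top_hubs(pa, 1) & top_hubs(sa, 1) & top_hubs(ms, 1)
common = shared_edges(pa, sa, ms)
```

With the real networks, `n` would be 1000 for the gene comparison, and each edge list would first be truncated to its top one million edges.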


Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network-predicted interactions.

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved
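The subnetwork in Figure 8A combines two steps: querying each network with a set of pathway genes and keeping only the interactions found in all three networks. The following is a minimal sketch of that idea under stated assumptions. Each network is represented as a set of unordered gene pairs, a query retains every edge that touches at least one query gene, and all gene names and function names are hypothetical. It is not the authors' implementation.

```python
def query_subnetwork(edges, query_genes):
    """Keep the edges that touch at least one query gene."""
    query = set(query_genes)
    return {edge for edge in edges if edge & query}


def consensus_subnetwork(networks, query_genes):
    """Intersect the queried subnetworks of several networks."""
    subnetworks = [query_subnetwork(net, query_genes) for net in networks]
    return set.intersection(*subnetworks)


def edge(a, b):
    """Build an unordered edge between two genes."""
    return frozenset((a, b))


# Toy stand-ins for the PA, SA and MS networks (hypothetical gene names).
pa = {edge("q1", "a"), edge("q1", "b"), edge("a", "b")}
sa = {edge("q1", "a"), edge("q1", "c")}
ms = {edge("q1", "a"), edge("q1", "b"), edge("q1", "c")}

consensus = consensus_subnetwork([pa, sa, ms], query_genes=["q1"])

# Genes in the consensus subnetwork other than the query genes themselves.
partners = set().union(*consensus) - {"q1"}

print(sorted(partners))
```

Requiring an edge to appear in every network is a conservative choice: it trades coverage for agreement between inference methods, so the resulting subnetwork is small but each retained interaction is supported by all inputs.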

Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EurasipJ Bioinforma Syst Biol doi 101155200779879

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional networks using mutualinformation BMC Bioinformatics 9 461

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al(2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejiacutea-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B EmilianiJ et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithmclustering plugin for Cytoscape BMC Bioinformatics 12 436

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq NatMethods 5 621-628

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Acids Res 37 D987-D991Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biologicalinformation in plants Bioinformatics 26 1267-1268

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics4 33

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-awareRNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009)Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencingexperiments Bioinformatics 29 2146-2152

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital geneexpression data Bioinformatics 26 139-140

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sales G Romualdi C (2011) parmigenemdasha parallel R package for mutual information estimation and gene network reconstructionBioinformatics 27 1876-1877

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicateaccuracy in a sea of changing annotations Database J Biol databases curation 2016

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro aplatform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

COB the co-expression browser PLoS One doi 101371journalpone0099193Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R packagePubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112-1115Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schwanhaumlusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian geneexpression control Nature 473 337-342

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mousecrosses Frontiers in Genetics 20134291

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges FrontPlant Sci 7 444

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a softwareenvironment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes withoutmeiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model basedindices BMC Bioinformatics 13 328

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expressionatlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015)STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression NetworksConstruction and Visualization Tool Front Plant Sci 6 1194

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Supplemental Figure 10. Network representation of the PCC ranked aggregation network (PA). Each node is a gene in the network. The eight largest modules detected by the Markov Cluster Algorithm (MCL) are highlighted in colors. Genes not in modules 1-8 are shown as light grey nodes.

Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using nine inference methods, including Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using nine inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using nine inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
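The outlier rule stated in this caption can be sketched in a few lines of Python. This is only an illustration of the 1.5 times interquartile range rule, not the authors' plotting code: the function name `iqr_outliers`, the AUROC values, and the choice of the "inclusive" quartile method are all assumptions, and other quartile definitions can give slightly different fences.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # "inclusive" interpolates linearly between data points; the exact
    # quartile definition used for the published boxplots is not known.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical AUROC values, not taken from the paper.
auroc = [0.50, 0.55, 0.60, 0.62, 0.65, 0.68, 0.70, 0.95]
print(iqr_outliers(auroc))
```

Only the value far above the upper fence is flagged here; values inside the fences would be drawn as part of the box and whiskers.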

Figure 3. Similarity between nine inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values shown on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
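The scaling and distance steps named in this caption can be illustrated with a small Python sketch. It assumes that each row (a GO term or gene) is scaled across methods using the sample standard deviation, and it only computes pairwise Euclidean distances between methods rather than a full hierarchical clustering. The function names and AUROC values are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def scale_rows(auroc_by_method):
    """Z-score each row (GO term or gene) across methods.

    auroc_by_method maps a method name to a list of AUROC values,
    one per row, in the same row order for every method.
    Rows with identical values across methods are assumed not to occur.
    """
    methods = list(auroc_by_method)
    n_rows = len(auroc_by_method[methods[0]])
    scaled = {m: [] for m in methods}
    for i in range(n_rows):
        row = [auroc_by_method[m][i] for m in methods]
        mu, sd = mean(row), stdev(row)
        for m, value in zip(methods, row):
            scaled[m].append((value - mu) / sd)
    return scaled


def euclidean(a, b):
    """Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical AUROC values for three GO terms and three methods.
demo = {
    "PCC": [0.70, 0.60, 0.80],
    "SCC": [0.72, 0.58, 0.79],
    "MRNET": [0.60, 0.65, 0.70],
}
scaled = scale_rows(demo)
print(euclidean(scaled["PCC"], scaled["SCC"]))
print(euclidean(scaled["PCC"], scaled["MRNET"]))
```

In this toy input the two correlation methods end up closer to each other than either is to MRNET, which is the kind of grouping a distance-based clustering would then pick up.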

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-square and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
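The logarithmic regression described in this caption was fitted with lm() in R. As a rough Python equivalent, the sketch below fits average AUROC against the natural logarithm of sample size by ordinary least squares and reports R-square; it does not compute p-values. The function name `fit_log_model` and the data points are assumptions for illustration only (12 and 1266 are the smallest and largest library counts mentioned for Figure 5; the AUROC values are invented).

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of auroc = a + b * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(aurocs)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical average AUROC values for four network sizes.
sizes = [12, 100, 500, 1266]
avg_auroc = [0.58, 0.63, 0.69, 0.71]
intercept, slope, r2 = fit_log_model(sizes, avg_auroc)
print(f"intercept={intercept:.3f} slope={slope:.3f} R2={r2:.3f}")
```

A positive slope in such a fit corresponds to the caption's reading that performance rises with the logarithm of sample size, so each doubling of samples adds roughly the same increment.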

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of the PCC aggregation (PA), SCC aggregation (SA), and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall-related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways. Grey lines indicate network-predicted interactions.

Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381-90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5-e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938-941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123-2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101-113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289-300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707-720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685-90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545-3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707-726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572-15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143-175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670-85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64-70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924-13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054-0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200-20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244-56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R package version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277-286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860-1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1-8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866-12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29-46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25-48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592-1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357-360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520-2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812-1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, de Leon N, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omicnetworks in a developmental atlas of maize Science 353 814-818

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics16 S6

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparsecanonical correlation analysis Ann Appl Stat 9 300-323

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics5 87

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within genecoexpression networks BMC Bioinformatics 6 227

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering PlantsStudy from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize HistoneDeacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstructionPLoS One 9 e106319 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56195-214

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for MaizePlant Physiol 170 pp1501821-

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zyprych-Walczak J Szabelska A Handschuh L Goacuterczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalizationmethods on RNA-Seq data analysis Biomed Res Int 2015

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

  • Parsed Citations
  • Article File
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Figure 7
  • Figure 8
  • Parsed Citations


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values are plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.

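The AUROC values in Figure 2 summarize how well a set of scores ranks known positives above negatives. As a minimal sketch, assuming each candidate gene has a numeric score and a binary label (for example, membership in a GO term), AUROC can be computed from ranks using the Mann-Whitney formulation. The function name `auroc` and the toy scores are hypothetical, and this is an illustration of the metric rather than the authors' evaluation pipeline.

```python
def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores receive average ranks."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, ascending score
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data: co-expression-based scores and known annotations.
toy_scores = [0.9, 0.2, 0.35, 0.6, 0.1]
toy_labels = [1, 1, 0, 1, 0]
toy_auroc = auroc(toy_scores, toy_labels)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the panels above compare normalization and inference methods.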

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are colored on a scale from blue to red. Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

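The scaling and distance steps described for Figure 3 can be sketched as follows. This is a minimal illustration under two assumptions that the legend does not state explicitly: that scaling is applied per GO term (row) across methods, and that methods are compared by the Euclidean distance between their columns of scaled values. The AUROC numbers and the names `zscore`, `euclidean`, and `toy_auroc_table` are hypothetical.

```python
import math


def zscore(values):
    """Scale values to mean 0 and sample standard deviation 1."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    sd = math.sqrt(var)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


methods = ["PCC", "SCC", "MRNET"]
# Hypothetical AUROC values: one row per GO term, one column per method.
toy_auroc_table = [
    [0.70, 0.72, 0.60],
    [0.65, 0.66, 0.55],
]

# Scale each row (GO term) across methods.
scaled_rows = [zscore(row) for row in toy_auroc_table]

# Build one profile per method (a column of scaled values).
profiles = {m: [row[i] for row in scaled_rows] for i, m in enumerate(methods)}

# Pairwise distances; a hierarchical clustering step would start from these.
distances = {
    (a, b): euclidean(profiles[a], profiles[b])
    for i, a in enumerate(methods)
    for b in methods[i + 1:]
}
```

In this toy table the two correlation methods end up closer to each other than either is to MRNET, which is the kind of grouping a distance-based clustering would then make visible.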

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 networks of different sizes, plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 networks of different sizes, plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated using the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

[Figure 4 panels. A: GO PCC, GO SCC, GO MRNET, GO CLR. B: PPPTY PCC, PPPTY SCC, PPPTY MRNET, PPPTY CLR.]
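The logarithmic regression described for Figure 4 has the form average AUROC = a + b · ln(sample size). The legend states that the fit was done with lm() in R; the following is a rough Python equivalent using ordinary least squares, offered only as a sketch. The function name `fit_log_model` and the data points are hypothetical, and the p-value that lm() also reports is omitted here.

```python
import math


def fit_log_model(sample_sizes, mean_auroc):
    """Least-squares fit of mean_auroc = a + b * ln(sample_size).

    Returns (intercept a, slope b, R-squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_auroc)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical sample sizes and average AUROC values, for illustration only.
toy_sizes = [12, 50, 200, 600, 1266]
toy_mean_auroc = [0.58, 0.62, 0.66, 0.69, 0.71]
toy_intercept, toy_slope, toy_r2 = fit_log_model(toy_sizes, toy_mean_auroc)
```

A positive slope with a high R-squared would indicate that performance keeps improving with more samples, but with diminishing returns, since each doubling of sample size adds the same fixed increment.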

Figure 5. Evaluation of networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks constructed with the same number of samples are designated as "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.

[Figure 6 panels: A, Protein GO; B, Protein PPPTY; y axis, AUC.]

[Figure 7 panels: A and B, Venn diagrams for the MS, PA and SA networks; C, GO term network labeled "Hub", with terms such as chromatin assembly, DNA replication, DNA repair and gene silencing; D, GO term network labeled "Cell Wall", with terms such as cellulose biosynthetic process, cell wall biogenesis and phenylpropanoid metabolic process.]

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
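The overlaps shown in the Venn diagrams can be reproduced in outline with set operations. The sketch below assumes that each network supplies a score per gene (for example, a connectivity measure) or per edge, takes the k highest-scoring items from each, and counts the seven regions of a three-set Venn diagram. The function names, gene identifiers and scores are hypothetical, and how "highest interacting" was scored in the study is not restated here.

```python
def top_k(scores, k):
    """Return the set of the k highest-scoring items (genes or edges)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def venn3_counts(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }


# Hypothetical per-gene scores for three networks (stand-ins for PA, SA and MS).
pa = {"g1": 9, "g2": 8, "g3": 7, "g4": 1}
sa = {"g1": 9, "g3": 8, "g5": 7, "g2": 1}
ms = {"g1": 5, "g5": 4, "g6": 3, "g3": 1}

counts = venn3_counts(top_k(pa, 3), top_k(sa, 3), top_k(ms, 3))
print(counts)
```

The same two functions apply to panel B if the dictionary keys are edges (for example, sorted gene-pair tuples) instead of genes.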



Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA) and MRNET single (MS) networks queried by 16 cell wall pathway genes. B, Network retrieved from the CORNET database queried by the same 16 cell wall pathway genes. C, Network retrieved from the STRING database queried by the same 16 cell wall pathway genes. In all panels, red nodes are the 16 query genes, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network-predicted interactions.


Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381-90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5-e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938-941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123-2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101-113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289-300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707-720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685-90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545-3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707-726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572-15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143-175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670-85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64-70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924-13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054-0066

Fedoroff N V (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200-20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244-56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277-286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860-1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1-8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866-12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29-46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25-48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592-1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357-360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520-2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812-1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev MolCell Biol 16 258-264

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omicnetworks in a developmental atlas of maize Science 353 814-818

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics16 S6

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparsecanonical correlation analysis Ann Appl Stat 9 300-323

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics5 87

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within genecoexpression networks BMC Bioinformatics 6 227

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering PlantsStudy from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize HistoneDeacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstructionPLoS One 9 e106319 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56195-214

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for MaizePlant Physiol 170 pp1501821-

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zyprych-Walczak J Szabelska A Handschuh L Goacuterczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalizationmethods on RNA-Seq data analysis Biomed Res Int 2015

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved


Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866–12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29–46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25–48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592–1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520–2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812–1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532–545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43–50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602–4616

Li S, Łabaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888–895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664–675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282–288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177–188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580–585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192–203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423–1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796–804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792–801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745–64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69–71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703–1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424–440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146–2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876–1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141–D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337–342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940–3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443–458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314–362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895–905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258–264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814–818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300–323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57–63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822–2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629–645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
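As a rough illustration of the two quantities this legend relies on, the Python sketch below computes an AUROC from a list of scores and binary annotation labels using the rank-sum identity, and flags boxplot outliers with the 1.5 × IQR rule. The function names, the toy inputs, and the quartile convention (`method="inclusive"`) are assumptions made for the example; the legend does not state how the per-gene scores were produced or which quantile estimator was used.

```python
import statistics


def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) identity; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


def tukey_outliers(values):
    """Values beyond 1.5 x IQR above the 75% quantile or below the 25% quantile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Toy inputs (hypothetical): a higher score should mean "more likely annotated".
example_auroc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
example_outliers = tukey_outliers([1, 2, 3, 4, 100])
```

An AUROC of 0.5 corresponds to a random ranking and 1.0 to a perfect one, which is why the boxplots in each panel can be compared on a common scale.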


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are colored from blue to red. Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
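The scaling and distance steps named in this legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each row (one GO term or gene) is z-scaled across methods using the sample standard deviation, and the distance between two methods is the Euclidean distance between their columns of scaled values. The table contents, function names, and the choice of row-wise scaling are hypothetical and are not taken from the authors' code.

```python
import math
from statistics import mean, stdev


def scale_rows(table):
    """Z-scale each row across methods: (value - row mean) / row sample sd.

    Raises ZeroDivisionError for a row whose values are all identical.
    """
    scaled = {}
    for row_name, by_method in table.items():
        values = list(by_method.values())
        mu = mean(values)
        sd = stdev(values)
        scaled[row_name] = {m: (v - mu) / sd for m, v in by_method.items()}
    return scaled


def method_distance(scaled, method_a, method_b):
    """Euclidean distance between two methods' columns of scaled AUROC values."""
    rows = sorted(scaled)
    col_a = [scaled[r][method_a] for r in rows]
    col_b = [scaled[r][method_b] for r in rows]
    return math.dist(col_a, col_b)


# Hypothetical AUROC values: rows are GO terms, columns are inference methods.
auroc_table = {
    "GO_term_1": {"PCC": 0.60, "SCC": 0.60, "CLR": 0.90},
    "GO_term_2": {"PCC": 0.70, "SCC": 0.70, "CLR": 0.40},
}
scaled = scale_rows(auroc_table)
d_pcc_scc = method_distance(scaled, "PCC", "SCC")  # identical columns
d_pcc_clr = method_distance(scaled, "PCC", "CLR")  # dissimilar columns
```

A hierarchical clustering routine would take the full matrix of such pairwise distances as input; methods with small distances, like the two identical columns here, end up in the same cluster.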


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks plotted against natural-logarithm-transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks plotted against natural-logarithm-transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
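The regression described in this legend has the form AUROC = a + b · ln(sample size). The Python sketch below fits that model by ordinary least squares and reports R-squared. It is an assumed stand-in for the lm() call mentioned in the legend, not the authors' script: the function name and the synthetic data are illustrative, and the p-value that lm() also reports is not computed here.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of AUROC = intercept + slope * ln(sample size)."""
    x = [math.log(n) for n in sample_sizes]
    count = len(x)
    x_bar = sum(x) / count
    y_bar = sum(aurocs) / count
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, aurocs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared: share of the variance in AUROC explained by ln(sample size).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, aurocs))
    ss_tot = sum((yi - y_bar) ** 2 for yi in aurocs)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Synthetic data that follow the model exactly (values are not from the paper).
sizes = [12, 50, 200, 1266]
values = [0.5 + 0.03 * math.log(n) for n in sizes]
intercept, slope, r_squared = fit_log_model(sizes, values)
```

A positive slope on the log scale means each doubling of sample size adds a roughly constant increment to average AUROC, so gains shrink in absolute terms as more libraries are added.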

[Figure 4 panels. A: GO PCC, GO SCC, GO MRNET, GO CLR. B: PPPTY PCC, PPPTY SCC, PPPTY MRNET, PPPTY CLR.]

Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks constructed with the same number of samples are designated "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.

[Figure 6 panels. A: GO evaluation. B: PPPTY evaluation. The protein network bars are labeled "Protein".]

[Figure 7 panels. A, B: Venn diagrams comparing the MS, PA, and SA networks. C: enriched GO terms for the shared highly interacting genes (central node "Hub"), including chromatin assembly, DNA replication, DNA repair, and gene silencing. D: enriched GO terms for the cell wall query (central node "Cell Wall"), including cellulose biosynthetic process, cell wall biogenesis, and phenylpropanoid biosynthetic process.]

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8 Cell wall pathway subnetworks A Intersections of PCC aggregation (PA) SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes) Cyan nodes are genes with reported function in cell wall related pathways in plant Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways Grey lines indicate network predicted interactions B Network retrieved from CORNET database queried by the 16 cell wall pathway genes (red node) Cyan nodes are genes with reported function in cell wall related path-ways in plant Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways Grey lines indicate network predicted interactions C Network retrieved from STRING database queried by the 16 cell wall pathway genes (red nodes) Cyan nodes are genes with reported function in cell wall related pathways in plant Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways Grey lines indicate network predicted interactions


Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381-90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938-941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123-2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101-113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B: 289-300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707-720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685-90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545-3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707-726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572-15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143-175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670-85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64-70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra JL, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924-13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054-0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200-20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244-56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics: 277-286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860-1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1-8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866-12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29-46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25-48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592-1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357-360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520-2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812-1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: 10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science: 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je BI, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc114132506

Penning BW, Hunter CT 3rd, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Sekhon RS, Vaillancourt B, Hirsch CN, Buell CR, de Leon N, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize HistoneDeacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstructionPLoS One 9 e106319 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56195-214

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for MaizePlant Physiol 170 pp1501821-

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zyprych-Walczak J Szabelska A Handschuh L Goacuterczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalizationmethods on RNA-Seq data analysis Biomed Res Int 2015

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved


Effects on reverse engineering gene networks Bioinformatics pp 282–288

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice Plant J 90 177-188

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissue expression (GTEx) project Nat Genet 45 580–585

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol 15 1

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework Sci Rep 5 7702

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis Plant Physiol 160 192–203

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 19 1423–1430

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Marbach D Costello JC Küffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012) Wisdom of crowds for robust gene network inference Nat Methods 9 796–804

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize Plant J 43 929–940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

McClintock B (1983) The significance of responses of the genome to challenge Science (80- ) 792–801

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks Eurasip J Bioinforma Syst Biol doi 10.1155/2007/79879

Meyer PE Lafitte F Bontempi G (2008) minet A R/Bioconductor package for inferring large transcriptional networks using mutual information BMC Bioinformatics 9 461

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejía-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B Emiliani J et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745–64

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithm clustering plugin for Cytoscape BMC Bioinformatics 12 436

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat Methods 5 621–628


Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69–71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987–D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267–1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703–1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424–440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146–2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139–140

Sales G Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876–1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141–D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112–1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337–342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 2013 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498–2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940–3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443–458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314–362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447–D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633–51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895–905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258–264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814–818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300–323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57–63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822–2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629–645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195–214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp1501821-

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year that were generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using nine inference methods, including Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using nine inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using nine inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
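The two quantities this legend relies on, the AUROC and the interquartile-range outlier rule, can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the function names `auroc` and `iqr_outliers`, the toy scores and labels, the default multiplier `k=1.5` (the conventional boxplot value) and the "inclusive" quartile convention are all hypothetical choices made for the example.

```python
import statistics


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond the first or third quartile.

    k=1.5 is the conventional boxplot multiplier (an assumption here).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = k * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]


# Hypothetical example: co-expression scores for four gene pairs and
# whether each pair shares an annotation (1) or not (0).
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
print(auroc(example_scores, example_labels))

# Hypothetical per-term AUROC values with one unusually high value.
print(iqr_outliers([0.50, 0.52, 0.55, 0.58, 0.60, 0.95]))
```

The pairwise formulation is quadratic in the number of labels; it is chosen here for readability, and a rank-based implementation would be preferable for genome-scale data.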


Figure 3. Similarity between nine inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
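The scaling and distance steps named in this legend can be sketched in Python as follows. This is only an illustration under assumptions: the legend does not state whether scaling was applied per GO term or per method, so the sketch scales each row (one GO term across methods); the AUROC values, the three method columns and the helper names `zscore` and `euclidean` are hypothetical.

```python
import itertools
import math
import statistics


def zscore(values):
    """Scale values to mean 0 and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical AUROC values: one row per GO term, one column per method.
methods = ["PCC", "SCC", "CLR"]
auroc_by_term = [
    [0.70, 0.68, 0.60],
    [0.55, 0.56, 0.65],
    [0.80, 0.79, 0.62],
]

# Assumption: each GO term (row) is scaled across methods.
scaled = [zscore(row) for row in auroc_by_term]

# Compare methods by the distance between their scaled profiles (columns).
columns = dict(zip(methods, zip(*scaled)))
distances = {
    (a, b): euclidean(columns[a], columns[b])
    for a, b in itertools.combinations(methods, 2)
}
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 3))
```

In this toy matrix the two correlation methods have similar profiles, so a hierarchical clustering on these distances would join them first; that outcome is a property of the invented numbers, not a result from the study.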


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
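The logarithmic model described in this legend regresses average AUROC on the natural logarithm of sample size. The legend reports that the fit was done with lm() in R; the Python sketch below is an equivalent ordinary least-squares calculation written out by hand, offered only as an illustration. The function name `fit_log_model` and the data points are hypothetical, and the sketch returns the slope, intercept and R-squared but not the p-value.

```python
import math


def fit_log_model(sample_sizes, avg_aurocs):
    """Least-squares fit of avg_auroc = intercept + slope * ln(sample_size).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log(n) for n in sample_sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(avg_aurocs) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, avg_aurocs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(xs, avg_aurocs)
    )
    ss_tot = sum((y - mean_y) ** 2 for y in avg_aurocs)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical average AUROC values for networks built from 12 to 1266
# libraries (the values are invented for illustration).
sizes = [12, 50, 200, 600, 1266]
aurocs = [0.56, 0.60, 0.63, 0.66, 0.67]
slope, intercept, r2 = fit_log_model(sizes, aurocs)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r2:.3f}")
```

A positive slope in such a fit means that performance keeps improving with more samples, but with diminishing returns, since each doubling of sample size adds the same fixed increment.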

[Figure 4 panel titles: A, GO PCC, GO SCC, GO MRNET, GO CLR; B, PPPTY PCC, PPPTY SCC, PPPTY MRNET, PPPTY CLR]


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.


[Figure 7 graphic: panels A and B are Venn diagrams for the MS, PA and SA networks; panel C ("Hub") and panel D ("Cell Wall") are GO term enrichment graphs.]

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.


Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.


Parsed Citations

Allen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One 7 e29348

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology Plant Cell Physiol 48 381-90

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Network expansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem W Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320 938-941

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbers Bioinformatics 31 2123-2130

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868

Barabasi A-L Oltvai ZNZN Barabási A-L (2004) Network biology understanding the cell's functional organization Nat Rev Genet 5 101-113

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat Soc Ser B 289-300

De Bodt S Hollunder J Nelissen H Meulemeester N Inzé D (2012) CORNET 2.0 Integrating plant coexpression protein-protein interactions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K O'Connor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems Genes Dev 26 1685-90

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575-1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Fedoroff NV (2012) McClintock's challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244-56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277-286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1-8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 9 29-46

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput Biol 4 e1000117

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int Rev Cell Mol Biol 328 25-48

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference Bioinformatics 28 1592-1597

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360

Köster J Rahmann S (2012) Snakemake a scalable bioinformatics workflow engine Bioinformatics 28 2520-2522

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time and perturbation Genome Biol 14 123

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol 34 1812-1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery PLoS One 7 e50411

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data BMC Bioinformatics 16 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a) Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20 664-675

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomic features Bioinformatics 30 923-930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverse engineering gene networks Bioinformatics pp 282-288

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice Plant J 90 177-188

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissue expression (GTEx) project Nat Genet 45 580-585

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol 15 1

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework Sci Rep 5 7702

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis Plant Physiol 160 192-203

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 19 1423-1430

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Marbach D Costello JC Küffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012) Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize Plant J 43 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

McClintock B (1983) The significance of responses of the genome to challenge Science (80- ) 792-801

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks Eurasip J Bioinforma Syst Biol doi 10.1155/2007/79879

Meyer PE Lafitte F Bontempi G (2008) minet A R/Bioconductor package for inferring large transcriptional networks using mutual information BMC Bioinformatics 9 461

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejía-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B Emiliani J et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithm clustering plugin for Cytoscape BMC Bioinformatics 12 436

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat Methods 5 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146-2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 2013 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp1501821-

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015



Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146-2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 2013 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S 1028 Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and 1029 caveats Plant Cell Environ 32 1633ndash51 1030

USDA (2016) Grain World Markets and Trade 1031

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant 1032 Physiol 153 895ndash905 1033

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism 1034 and Beyond Nat Rev Mol Cell Biol 16 258ndash264 1035

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR 1036 (2016) Integration of omic networks in a developmental atlas of maize Science 353 814ndash818 1037

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene 1038 ranking BMC Bioinformatics 16 S6 1039

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring genendashgene interactions and functional 1040 modules using sparse canonical correlation analysis Ann Appl Stat 9 300ndash323 1041

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 1042 57ndash63 1043

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray 1044 experiments BMC Genomics 5 87 1045

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability ofldquo guilt-by-associationrdquo 1046 within gene coexpression networks BMC Bioinformatics 6 227 1047

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) ldquoOut of Pollenrdquo Hypothesis for Origin of New 1048 Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822ndash2829 1049

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of 1050 Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during 1051 Seed Development Plant Cell 28 629ndash645 1052

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 1053 13 83 1054

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195–214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from comparisons with GO datasets for samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
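The AUROC score used throughout these evaluations can be illustrated with a minimal sketch. The function below is not the authors' pipeline (the reference list points to R packages such as ROCR); it is a plain-Python illustration, assuming each candidate gene has a co-expression score and a binary label saying whether it belongs to the evaluation set (for example, a GO term). The name `auroc` and the example values are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Returns the fraction of (positive, negative) pairs in which the
    positive item has the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: both annotated genes outrank both unannotated genes.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])  # 1.0
```

An AUROC of 0.5 corresponds to a ranking no better than chance, which is why the figures compare methods by how far their AUROC distributions sit above that level.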


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values colored from blue to red. Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
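The scaling and distance steps behind this kind of heatmap can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes a matrix with one row per GO term (or gene) and one column per inference method, that scaling is applied per row, and that the sample standard deviation is used. The function names and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def zscore_rows(matrix):
    """Scale each row (one GO term or gene) to mean 0 and unit sample SD."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        # A constant row carries no contrast between methods; map it to zeros.
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return scaled


def euclidean_between_columns(matrix):
    """Pairwise Euclidean distances between columns (one per method)."""
    columns = list(zip(*matrix))
    n = len(columns)
    return [[math.dist(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical AUROC values: rows are GO terms, columns are three methods.
auroc_matrix = [
    [0.60, 0.60, 0.90],
    [0.70, 0.70, 0.50],
]
distances = euclidean_between_columns(zscore_rows(auroc_matrix))
```

Methods whose scaled AUROC profiles are close in this distance matrix would be grouped together by a hierarchical clustering step, which is what the dendrogram in such a figure summarizes.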


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models were plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
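The logarithmic model in this figure has the form average AUROC = a + b · ln(sample size). The caption states that the fit was done with lm() in R; the sketch below is a plain-Python equivalent of the ordinary least-squares step only, offered as an illustration. It returns the intercept, slope, and R-squared but does not compute p-values. The function name and the example numbers are hypothetical, not values from the paper.

```python
import math


def fit_log_model(sample_sizes, mean_aurocs):
    """Least-squares fit of mean AUROC = intercept + slope * ln(sample size)."""
    x = [math.log(n) for n in sample_sizes]
    y = list(mean_aurocs)
    count = len(x)
    mean_x = sum(x) / count
    mean_y = sum(y) / count

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical data generated exactly from AUROC = 0.50 + 0.02 * ln(n).
sizes = [12, 50, 200, 800, 1266]
values = [0.50 + 0.02 * math.log(n) for n in sizes]
intercept, slope, r_squared = fit_log_model(sizes, values)
```

A positive slope in this model means each multiplicative increase in sample size adds a roughly constant increment to average AUROC, so gains diminish as libraries accumulate.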


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars), and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.


Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10⁶ edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes. In all panels, red nodes are the 16 query genes, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.


Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015



Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008-2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.

Figure 2. Normalization and network inference methods effect on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM), or Reads per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM, or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM, or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for samples constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET), and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for samples constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
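The AUROC used throughout these legends can be computed from ranks alone. The following is a minimal sketch, not the authors' pipeline (the reference list points to the ROCR package in R): it assumes each candidate gene or gene pair has a co-expression score and a binary label marking membership in the evaluation set, and the function name auroc and the example values are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: co-expression scores (higher = stronger predicted link).
    labels: 1 if the item is in the evaluation set (e.g. shares a GO term), else 0.
    Tied scores receive the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Hypothetical scores: the two true members are ranked above both non-members.
example = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

A value near 0.5 indicates that the scores carry no information about the evaluation set, which is why the legends compare methods by how far their AUROC distributions sit above that level.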


Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values between … (blue) and … (red). Samples normalized by VST, CPM, and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET, and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
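The scaling and clustering step described in this legend can be illustrated with a small sketch. It assumes that scaling is applied per GO term (per row) across methods and that methods are then compared by the Euclidean distance between their scaled profiles; the legend does not state the scaling axis, so that choice, the term names, and the AUROC values below are all hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical AUROC values: one row per GO term, one column per method.
methods = ["PCC", "SCC", "CLR"]
auroc_by_term = {
    "term_a": {"PCC": 0.70, "SCC": 0.72, "CLR": 0.60},
    "term_b": {"PCC": 0.65, "SCC": 0.66, "CLR": 0.80},
}

# Scale each term's AUROC values across methods (assumed scaling axis).
scaled = {
    term: dict(zip(methods, zscore([row[m] for m in methods])))
    for term, row in auroc_by_term.items()
}

# One scaled profile per method, then pairwise distances between methods.
profiles = {m: [scaled[t][m] for t in scaled] for m in methods}
dist_pcc_scc = euclidean(profiles["PCC"], profiles["SCC"])
dist_pcc_clr = euclidean(profiles["PCC"], profiles["CLR"])
```

In this toy input the two correlation methods end up much closer to each other than either is to CLR, which is the kind of grouping a hierarchical clustering on these distances would then display.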


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Regression fitting logarithm models were plotted in black lines. R-square and p-values were calculated by the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
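The logarithmic model in this legend has the form average AUROC = a + b · ln(sample size). The legend states that the fit was done with lm() in R; the sketch below is only an equivalent ordinary least squares fit written in plain Python for illustration. It returns the intercept, slope, and R-square but no p-value, and the function name fit_log_model and the demonstration data are hypothetical.

```python
import math


def fit_log_model(sample_sizes, avg_aurocs):
    """Least-squares fit of avg_auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    y = list(avg_aurocs)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct sample sizes")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return intercept, slope, r_squared


# Synthetic points lying exactly on 0.5 + 0.02 * ln(n), for illustration only.
demo_sizes = [12, 100, 500, 1266]
demo_aurocs = [0.5 + 0.02 * math.log(n) for n in demo_sizes]
demo_intercept, demo_slope, demo_r2 = fit_log_model(demo_sizes, demo_aurocs)
```

A positive slope in such a fit corresponds to performance that keeps improving with more samples, with diminishing returns because the predictor is the logarithm of sample size.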

Fig. 4 panel titles: A, GO PCC, GO SCC, GO MRNET, GO CLR; B, PPPTY PCC, PPPTY SCC, PPPTY MRNET, PPPTY CLR.

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2", and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate median. Asterisks indicate mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). Bold horizontal lines indicate median. Asterisks indicate mean, and grey dots indicate outliers.

Fig. 7, panels A to D. Venn diagram sets: MS, PA, SA. GO enrichment panels: hub genes (chromatin assembly, gene silencing, DNA replication, DNA repair, and related terms) and cell wall genes (cellulose biosynthetic process, cell wall biogenesis, phenylpropanoid metabolic process, and related terms).

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks. 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 x 10^6 edges in three networks. 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for 148 shared highly interacting genes among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
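The overlap shown in the Figure 7 Venn diagrams amounts to intersecting the top-ranked gene sets of the three networks. The sketch below assumes that "highest interacting" means highest node degree (number of edges per gene), which the legend does not state explicitly; the edge lists, gene names, and function names are hypothetical.

```python
from collections import Counter


def top_k_genes(edges, k):
    """Return the k genes with the most edges; ties are broken by gene name."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return set(ranked[:k])


def shared_genes(*gene_sets):
    """Genes present in every one of the given sets."""
    return set.intersection(*gene_sets)


# Hypothetical edge lists standing in for the PA, SA, and MS networks.
pa_edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g1", "g4")]
sa_edges = [("g1", "g2"), ("g1", "g5"), ("g2", "g5"), ("g2", "g6")]
ms_edges = [("g1", "g7"), ("g1", "g2"), ("g7", "g8"), ("g1", "g8")]

top_pa = top_k_genes(pa_edges, 2)
top_sa = top_k_genes(sa_edges, 2)
top_ms = top_k_genes(ms_edges, 2)
hubs = shared_genes(top_pa, top_sa, top_ms)
```

With k set to 1000 and the real networks, the size of the final intersection would correspond to the count reported in the center of the Venn diagram in panel A.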

Fig 8

A B C

Figure 8 Cell wall pathway subnetworks A Intersections of PCC aggregation (PA) SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes) Cyan nodes are genes with reported function in cell wall related pathways in plant Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways Grey lines indicate network predicted interactions B Network retrieved from CORNET database queried by the 16 cell wall pathway genes (red node) Cyan nodes are genes with reported function in cell wall related path-ways in plant Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways Grey lines indicate network predicted interactions C Network retrieved from STRING database queried by the 16 cell wall pathway genes (red nodes) Cyan nodes are genes with reported function in cell wall related pathways in plant Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways Grey lines indicate network predicted interactions


Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EurasipJ Bioinforma Syst Biol doi 101155200779879

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional networks using mutualinformation BMC Bioinformatics 9 461

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al(2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejiacutea-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B EmilianiJ et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithmclustering plugin for Cytoscape BMC Bioinformatics 12 436

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq NatMethods 5 621-628

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Acids Res 37 D987-D991Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biologicalinformation in plants Bioinformatics 26 1267-1268

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics4 33

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-awareRNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009)Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencingexperiments Bioinformatics 29 2146-2152

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital geneexpression data Bioinformatics 26 139-140

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sales G Romualdi C (2011) parmigenemdasha parallel R package for mutual information estimation and gene network reconstructionBioinformatics 27 1876-1877

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicateaccuracy in a sea of changing annotations Database J Biol databases curation 2016

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro aplatform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

COB the co-expression browser PLoS One doi 101371journalpone0099193Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R packagePubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112-1115Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schwanhaumlusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian geneexpression control Nature 473 337-342

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mousecrosses Frontiers in Genetics 20134291

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges FrontPlant Sci 7 444

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a softwareenvironment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes withoutmeiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model basedindices BMC Bioinformatics 13 328

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expressionatlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015)STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression NetworksConstruction and Visualization Tool Front Plant Sci 6 1194

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev MolCell Biol 16 258-264

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omicnetworks in a developmental atlas of maize Science 353 814-818

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics16 S6

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparsecanonical correlation analysis Ann Appl Stat 9 300-323

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics5 87

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within genecoexpression networks BMC Bioinformatics 6 227

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering PlantsStudy from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp1501821-

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008 to 2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008 to 2016.

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using nine inference methods, including Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using nine inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using nine inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
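The AUROC values reported in this legend can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes each candidate gene carries a network-derived score and a binary label marking membership in the evaluation set (for example one GO term), and the function name `auroc` is hypothetical. The value is obtained through the rank-sum identity, with tied scores given average ranks.

```python
def auroc(scores, labels):
    """AUROC from scores and 0/1 labels using the rank-sum (Mann-Whitney U) identity."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort indices by score, then assign 1-based ranks, averaging within ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positives, normalized by the number of pos/neg pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

Under this definition a value of 0.5 corresponds to a random ranking and 1.0 to perfect separation of annotated from unannotated genes.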


Figure 3. Similarity between nine inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values that range from the blue end to the red end of the color scale. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
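The scaling and distance steps named in this legend can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: rows are taken to be GO terms (or genes), columns are inference methods, the population standard deviation is used for scaling, and the helper names `scale_row` and `method_distances` are hypothetical. The resulting distances are what a hierarchical clustering step would consume.

```python
import math
from statistics import mean, pstdev


def scale_row(values):
    """Scale one row (one GO term or gene) to mean 0 and unit variance."""
    mu = mean(values)
    sd = pstdev(values)  # population SD: an assumption, the legend does not specify
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def method_distances(auroc_table, methods):
    """Pairwise Euclidean distances between method columns after row scaling."""
    scaled = [scale_row(row) for row in auroc_table]
    columns = {m: [row[j] for row in scaled] for j, m in enumerate(methods)}
    distances = {}
    for i, a in enumerate(methods):
        for b in methods[i + 1:]:
            squared = sum((x - y) ** 2 for x, y in zip(columns[a], columns[b]))
            distances[(a, b)] = math.sqrt(squared)
    return distances
```

Methods whose scaled AUROC profiles are identical end up at distance zero and would be joined first by any agglomerative clustering.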


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and P values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
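The logarithmic regression described in this legend was fitted by the authors with lm() in R. The following Python sketch shows the same kind of model, average AUROC = a + b * ln(sample size), fitted by ordinary least squares. The function name `fit_log_model` is hypothetical, the P value that lm() reports is not computed here, and any input values are supplied by the caller rather than taken from the paper.

```python
import math


def fit_log_model(sample_sizes, auroc_values):
    """Least-squares fit of auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    xs = [math.log(n) for n in sample_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(auroc_values) / n

    # Closed-form simple linear regression on the log-transformed predictor.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, auroc_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination from residual and total sums of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, auroc_values))
    ss_tot = sum((y - mean_y) ** 2 for y in auroc_values)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared
```

A positive slope in such a fit means that performance keeps rising with sample size, but with diminishing returns, since each doubling of the number of libraries adds the same increment.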

Fig 4 panels: A, GO PCC, GO SCC, GO MRNET, GO CLR; B, PPPTY PCC, PPPTY SCC, PPPTY MRNET, PPPTY CLR

Figure 5. Evaluation of different sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single networks (white bars), aggregation networks (grey bars) and the protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

Fig 7 A to D: Venn diagrams for the MS, PA and SA networks, and GO term enrichment graphs labeled Hub and Cell Wall

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 x 10^6 edges in the three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
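The overlaps summarized in the Figure 7 Venn diagrams amount to set intersections, which the Python sketch below illustrates. It rests on assumptions that the legend does not confirm: "highest interacting" is read here as highest node degree, edges are treated as undirected, and the function names are hypothetical.

```python
from collections import Counter


def top_connected_genes(edges, k):
    """Return the k genes with the highest degree (ties broken by gene name)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return set(ranked[:k])


def shared_genes(*gene_sets):
    """Genes present in every set, i.e. the central region of the Venn diagram."""
    return set.intersection(*gene_sets)


def shared_edges(*edge_lists):
    """Undirected edges present in every network."""
    as_sets = [{frozenset(edge) for edge in edges} for edges in edge_lists]
    return set.intersection(*as_sets)
```

With three networks as input, `shared_genes` applied to the three top-k sets gives the count in the center of panel A, and `shared_edges` gives the corresponding count for panel B.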

Figure 8. Cell wall pathway subnetworks. A, Intersections of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes. In all panels, red nodes are the 16 query genes, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network predicted interactions.

Parsed Citations

Allen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One 7 e29348

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology Plant Cell Physiol 48 381-90

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Network expansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem W Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science 320 938-941

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbers Bioinformatics 31 2123-2130

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868

Barabási A-L Oltvai ZN (2004) Network biology understanding the cell's functional organization Nat Rev Genet 5 101-113

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat Soc Ser B 289-300

De Bodt S Hollunder J Nelissen H Meulemeester N Inzé D (2012) CORNET 2.0 Integrating plant coexpression protein-protein interactions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K O'Connor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems Genes Dev 26 1685-90

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMC Bioinformatics 10 421

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al (2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

D'haeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineering Bioinformatics 16 707-726

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Höfte H Gonneau M Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures Nature Protoc 7 670-85

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575-1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Fedoroff N V (2012) McClintock's challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244-56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277-286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1-8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 9 29-46

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput Biol 4 e1000117

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int Rev Cell Mol Biol 328 25-48

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference Bioinformatics 28 1592-1597

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360

Köster J Rahmann S (2012) Snakemake a scalable bioinformatics workflow engine Bioinformatics 28 2520-2522

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time and perturbation Genome Biol 14 123

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol 34 1812-1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery PLoS One 7 e50411

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data BMC Bioinformatics 16 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a) Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20 664-675

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomic features Bioinformatics 30 923-930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverse engineering gene networks Bioinformatics pp 282-288

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice Plant J 90 177-188

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissue expression (GTEx) project Nat Genet 45 580-585

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol 15 1

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework Sci Rep 5 7702

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis Plant Physiol 160 192-203

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 19 1423-1430

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: 10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science: 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 1. Number of maize microarray and RNA-Seq samples submitted to NCBI (https://www.ncbi.nlm.nih.gov) from 2008-2016. A, A text search of the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database identified samples generated by microarray platforms GPL4032 and GPL12620; the total values for years 2008 to 2016 were combined to represent the number of microarray studies (Microarray). A text search of the NCBI Sequence Read Archive (SRA) database was used to identify RNA-Seq samples generated between 2008 and 2016 using the Illumina sequencing platform (RNA-Seq). B, The number of samples submitted to the NCBI GEO database each year generated by microarray platforms GPL4032 and GPL12620 was identified by a text search (dashed line) and compared to the number of RNA-Seq Illumina samples (solid line) per year, 2008-2016.


Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with GO dataset for samples constructed using nine inference methods, including Pearson Correlation Coefficient (PCC), Spearman correlation coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini correlation coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for samples constructed using nine inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples constructed using nine inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
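The CPM and RPKM normalizations named in the Figure 2 legend follow standard formulas, which can be sketched as below. This is a minimal illustration on made-up counts, not the pipeline used in the study; the function names, gene names and toy values are all hypothetical, and VST (a DESeq2 transformation) is not reproduced here.

```python
def cpm(counts):
    """Counts per million: read count divided by library size, times 1e6."""
    library_size = sum(counts.values())
    return {gene: c / library_size * 1e6 for gene, c in counts.items()}


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: count / (length in kb) / (library size in millions)."""
    library_size = sum(counts.values())
    return {
        gene: c / (lengths_bp[gene] / 1e3) / (library_size / 1e6)
        for gene, c in counts.items()
    }


# Hypothetical single-library toy data (raw read counts and gene lengths in bp).
toy_counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
toy_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

cpm_values = cpm(toy_counts)
rpkm_values = rpkm(toy_counts, toy_lengths)

for gene in toy_counts:
    print(gene, round(cpm_values[gene], 1), round(rpkm_values[gene], 1))
```

In this toy library, geneA and geneB differ threefold in CPM but have the same RPKM, because geneB is three times longer; that length correction is the only difference between the two measures.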


Figure 3. Similarity between nine inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, resulting in scaled AUROC values ranging from low (blue) to high (red). Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
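The scaling and distance steps mentioned in the Figure 3 legend can be illustrated with a small sketch. It assumes that each GO term's AUROC values are z-scored across methods (row scaling with the sample standard deviation) and that methods are then compared by Euclidean distance between their scaled profiles; the legend does not state these details, and the AUROC numbers below are invented for illustration.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Scale a list of values to mean 0 and sample standard deviation 1."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


methods = ["PCC", "SCC", "MRNET"]
# Hypothetical AUROC values: one row per GO term, one column per method.
auroc_by_term = {
    "term_1": [0.70, 0.72, 0.60],
    "term_2": [0.65, 0.66, 0.75],
    "term_3": [0.80, 0.79, 0.70],
}

# Row-wise scaling (assumed): each term is standardized across methods.
scaled = {term: zscore(row) for term, row in auroc_by_term.items()}

# Column profiles: the scaled AUROC of each method across all terms.
profiles = {
    m: [scaled[term][i] for term in auroc_by_term] for i, m in enumerate(methods)
}

d_pcc_scc = euclidean(profiles["PCC"], profiles["SCC"])
d_pcc_mrnet = euclidean(profiles["PCC"], profiles["MRNET"])
print(f"PCC-SCC distance:   {d_pcc_scc:.2f}")
print(f"PCC-MRNET distance: {d_pcc_mrnet:.2f}")
```

With these toy values the two correlation methods end up much closer to each other than either is to MRNET, which is the kind of grouping a hierarchical clustering on these distances would report.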


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
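The logarithmic regression in the Figure 4 legend was fitted with lm() in R. The same kind of fit, average AUROC against the natural log of sample size, can be sketched in plain Python with ordinary least squares. The function name and the data points below are hypothetical (the toy AUROC values are generated from a known line so the fit can be checked), and p-values are not computed here.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(n) for n in sample_sizes]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(aurocs) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, aurocs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, aurocs))
    ss_tot = sum((yi - y_bar) ** 2 for yi in aurocs)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical data: sizes span the 12 to 1266 library range named in the legends,
# AUROC values are synthetic and lie exactly on 0.55 + 0.02 * ln(n).
sizes = [12, 50, 200, 800, 1266]
toy_auroc = [0.55 + 0.02 * math.log(n) for n in sizes]

intercept, slope, r_squared = fit_log_model(sizes, toy_auroc)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r_squared:.3f}")
```

A positive slope in such a fit corresponds to performance that keeps improving with more samples, but with diminishing returns on the original (untransformed) sample-size scale.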


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.


Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks. 148 genes were shared among three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 x 10^6 edges in three networks. 106591 edges were shared among three networks. C, GO term enrichment analyzed by AgriGO for 148 shared highly interacting genes among PA, SA and MS (Hub). D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes (Cell Wall).
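The overlap counts reported for Figure 7A can be reproduced in principle by ranking genes within each network and intersecting the top sets. The sketch below assumes that "highest interacting" means highest node degree, which the legend does not state; the edge lists, gene names and the helper top_hubs are hypothetical, and k is 2 rather than 1000 to keep the toy example readable.

```python
from collections import Counter


def top_hubs(edges, k):
    """Return the k genes with the highest degree in an undirected edge list.

    Ties are broken alphabetically so the result is deterministic.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return set(ranked[:k])


# Hypothetical toy networks standing in for the PA, SA and MS networks.
pa_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
sa_edges = [("g1", "g2"), ("g1", "g5"), ("g2", "g5"), ("g2", "g3")]
ms_edges = [("g1", "g3"), ("g1", "g4"), ("g3", "g4"), ("g1", "g2")]

k = 2
hubs_pa = top_hubs(pa_edges, k)
hubs_sa = top_hubs(sa_edges, k)
hubs_ms = top_hubs(ms_edges, k)

# Central region of the three-set Venn diagram.
shared = hubs_pa & hubs_sa & hubs_ms
print("shared hubs:", sorted(shared))
```

The same set intersection applied to the top-ranked edges of each network, with each edge stored as a sorted gene pair, would give the edge overlap described for Figure 7B.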


Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.


Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381-90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938-941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123-2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101-113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B: 289-300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707-720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685-90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545-3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707-726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Höfte H Gonneau M Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures Nature Protoc 7 670-85

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575-1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Fedoroff NV (2012) McClintock's challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244-56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277-286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1-8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 9 29-46

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput Biol 4 e1000117

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int Rev Cell Mol Biol 328 25-48

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference Bioinformatics 28 1592-1597

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360

Köster J Rahmann S (2012) Snakemake a scalable bioinformatics workflow engine Bioinformatics 28 2520-2522

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time and perturbation Genome Biol 14 123

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol 34 1812-1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery PLoS One 7 e50411

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data BMC Bioinformatics 16 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a) Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20 664-675

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomic features Bioinformatics 30 923-930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverse engineering gene networks Bioinformatics pp 282-288

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice Plant J 90 177-188

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissue expression (GTEx) project Nat Genet 45 580-585

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol 15 1

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework Sci Rep 5 7702

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis Plant Physiol 160 192-203

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 19 1423-1430

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Marbach D Costello JC Küffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012) Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize Plant J 43 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge Science 792-801

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EURASIP J Bioinforma Syst Biol doi 10.1155/2007/79879

Meyer PE Lafitte F Bontempi G (2008) minet A R/Bioconductor package for inferring large transcriptional networks using mutual information BMC Bioinformatics 9 461

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejía-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B Emiliani J et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithm clustering plugin for Cytoscape BMC Bioinformatics 12 436

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat Methods 5 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146-2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp1501821-

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015

Figure 2. Effect of normalization and network inference methods on single network performance. A, Network performance was evaluated by calculating area under the receiver operating characteristic curve (AUROC) values from GO datasets for comparisons with samples normalized using the Variance Stabilizing Transformation (VST), Counts Per Million (CPM) or Reads Per Kilobase per Million (RPKM) methods. B, Network performance was evaluated by calculating AUROC values from PPPTY dataset comparisons for samples normalized using the VST, CPM or RPKM methods. C, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for samples normalized using the VST, CPM or RPKM methods. D, Network performance was evaluated by calculating AUROC values from comparisons with the GO dataset for networks constructed using ten inference methods: Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), Kendall rank Correlation Coefficient (KCC), Gini Correlation Coefficient (GCC), Biweight midcorrelation (BIC), Cosine Similarity Coefficient (CSC), Additive ARACNE (AA), Multiplicative ARACNE (MA), Minimum Redundancy NETwork (MRNET) and Context Likelihood of Relatedness (CLR). E, Network performance was evaluated by calculating AUROC values from comparisons with PPPTY for networks constructed using the ten inference methods. F, Network performance was evaluated by calculating AUROC values from comparisons with HDA101 binding targets for networks constructed using the ten inference methods. Outliers were defined as values more than 1.5 times the interquartile range above the 75% quantile or below the 25% quantile. Median values were plotted as bold horizontal lines. For C to F, the horizontal dashed lines indicate the highest and lowest AUROC values.
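
As a rough illustration of the AUROC evaluation described in this legend, the following Python sketch computes an AUROC from scored items using the rank-sum (Mann-Whitney) formulation. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `auroc`, the idea of scoring each gene for a single GO term, and the toy scores and labels are all hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity.

    scores: one numeric score per gene (higher = more likely annotated).
    labels: truthy for genes annotated with the term, falsy otherwise.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of equal scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Hypothetical example: four genes scored for one GO term.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 1, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)
```

An AUROC near 0.5 means the scores rank annotated genes no better than chance, and values approaching 1 mean annotated genes are consistently ranked first.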

Figure 3. Similarity between ten inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a color scale from blue to red. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini Correlation Coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, Multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
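The logarithmic model in this legend amounts to an ordinary least-squares fit of average AUROC against ln(sample size). The sketch below is a plain-Python analogue of that fit, not the authors' R code: it returns the intercept, slope and R-squared but omits the p-value, and the function name and data points are hypothetical.

```python
import math


def fit_log_model(sample_sizes, aurocs):
    """Least-squares fit of AUROC = a + b * ln(sample size).

    Returns (a, b, r_squared)."""
    x = [math.log(n) for n in sample_sizes]
    y = list(aurocs)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope on the log scale
    a = my - b * mx                    # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


# Made-up average AUROC values for five network sizes.
sizes = [12, 50, 200, 800, 1266]
avg_auroc = [0.57, 0.62, 0.66, 0.69, 0.71]
intercept, slope, r_squared = fit_log_model(sizes, avg_auroc)
```

A positive slope on the log scale means that each multiplicative increase in sample size adds a roughly constant increment to the average AUROC, so gains diminish in absolute terms as more libraries are added.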


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.
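The box-plot elements named in this legend (median, mean, outliers) can be reproduced from a list of AUROC values. The sketch below is illustrative only: it assumes the conventional rule that flags values more than 1.5 times the interquartile range beyond the quartiles, and it uses inclusive quartile interpolation, which may differ slightly from the plotting software the authors used. The function name and values are hypothetical.

```python
from statistics import mean, median, quantiles


def box_summary(values, whisker=1.5):
    """Median, mean and outliers under the whisker * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return {
        "median": median(values),
        "mean": mean(values),
        "outliers": [v for v in values if v < low or v > high],
    }


# Made-up AUROC values for one network; 0.95 lies far above the rest.
summary = box_summary([0.60, 0.62, 0.63, 0.64, 0.65, 0.66, 0.95])
```

Reporting both the median and the mean, as the figure does, helps show when a few extreme terms pull the average away from the typical value.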


Fig 7 panel content: A and B, Venn diagrams for the MS, PA and SA networks. C, GO term map labeled "Hub", with terms including chromatin assembly, nucleosome organization, DNA packaging, DNA replication, DNA repair, gene silencing and chromatin modification. D, GO term map labeled "Cell Wall", with terms including cellulose biosynthetic process, cell wall organization, glucan metabolic process, phenylpropanoid metabolic process and root morphogenesis.

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
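The overlap counts shown in the Venn diagrams can be derived with set operations. The sketch below is a minimal illustration under two assumptions that the legend does not spell out: "highest interacting" is read as highest node degree, and ties are broken alphabetically. The function names and the toy gene identifiers are hypothetical.

```python
from collections import Counter


def top_genes_by_degree(edges, k):
    """Return the k genes with the most edges (ties broken by gene name)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return ranked[:k]


def venn3_counts(pa, sa, ms):
    """Sizes of the seven regions of a three-set Venn diagram."""
    pa, sa, ms = set(pa), set(sa), set(ms)
    return {
        "PA only": len(pa - sa - ms),
        "SA only": len(sa - pa - ms),
        "MS only": len(ms - pa - sa),
        "PA&SA only": len((pa & sa) - ms),
        "PA&MS only": len((pa & ms) - sa),
        "SA&MS only": len((sa & ms) - pa),
        "all three": len(pa & sa & ms),
    }


# Toy gene sets standing in for the top-ranked genes of each network.
counts = venn3_counts({"g1", "g2", "g3"}, {"g2", "g3", "g4"}, {"g3", "g4", "g5"})
```

The same `venn3_counts` function applies to panel B if each edge is represented as a hashable object, for example a `frozenset` of its two gene identifiers.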


Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions.
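Panel A can be read as a two-step operation: intersect the edge sets of the three networks, then keep the edges that touch a query gene. The sketch below follows that reading as an assumption; the legend does not state whether edges between two non-query neighbors were also retained. The function names and gene identifiers are hypothetical.

```python
def undirected(edges):
    """Store each edge as a frozenset so that (a, b) and (b, a) are equal."""
    return {frozenset(edge) for edge in edges}


def query_subnetwork(networks, query_genes):
    """Edges present in every network that involve at least one query gene."""
    shared = set.intersection(*(undirected(net) for net in networks))
    query = set(query_genes)
    return {edge for edge in shared if edge & query}


# Toy edge lists standing in for the PA, SA and MS networks.
pa = [("q1", "x"), ("x", "y"), ("q1", "z")]
sa = [("x", "q1"), ("y", "x"), ("q1", "w")]
ms = [("q1", "x"), ("x", "y")]
subnetwork = query_subnetwork([pa, sa, ms], ["q1"])
```

Requiring an edge to appear in all three networks trades sensitivity for confidence, which fits the use of the intersection to nominate candidate genes not previously linked to cell wall pathways.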


Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EurasipJ Bioinforma Syst Biol doi 101155200779879

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional networks using mutualinformation BMC Bioinformatics 9 461

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al(2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejiacutea-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B EmilianiJ et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithmclustering plugin for Cytoscape BMC Bioinformatics 12 436

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq NatMethods 5 621-628

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Acids Res 37 D987-D991Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biologicalinformation in plants Bioinformatics 26 1267-1268

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics4 33

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-awareRNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009)Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencingexperiments Bioinformatics 29 2146-2152

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp.15.01821

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015


Figure 3. Similarity between nine inference methods on network performance based upon GO (A) and PPPTY (B) evaluation. Genes with the highest AUROC value in CSC or MRNET/CLR are enclosed in blue and green boxes, respectively. Area under the ROC curve (AUROC) values for each GO term or gene were scaled to a standard normal distribution, and the scaled AUROC values are shown on a blue-to-red color scale. Samples normalized by VST, CPM and RPKM were analyzed using each inference method (PCC, SCC, KCC, GCC, BIC, CSC, AA, MA, MRNET and CLR) and clustered based on Euclidean distance. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; KCC, Kendall rank Correlation Coefficient; GCC, Gini correlation coefficient; BIC, Biweight midcorrelation; CSC, Cosine Similarity Coefficient; AA, Additive ARACNE; MA, multiplicative ARACNE; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
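The scaling and clustering step named in the legend above can be illustrated with a minimal sketch. It assumes that scaling is applied per GO term (row) across methods, which the legend does not state, and it computes only the pairwise Euclidean distances that a clustering would start from. The function names and the AUROC values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_rows(table):
    """Scale each row (one GO term) to mean 0 and standard deviation 1."""
    scaled = []
    for row in table:
        mu, sd = mean(row), pstdev(row)
        # A constant row carries no contrast between methods; map it to zeros.
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return scaled


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def method_distances(table, methods):
    """Pairwise Euclidean distance between method columns of the row-scaled table."""
    scaled = zscore_rows(table)
    columns = {m: [row[j] for row in scaled] for j, m in enumerate(methods)}
    return {
        (a, b): euclidean(columns[a], columns[b])
        for i, a in enumerate(methods)
        for b in methods[i + 1:]
    }


# Made-up AUROC values: rows are GO terms, columns are inference methods.
methods = ["PCC", "SCC", "MRNET"]
auroc = [
    [0.70, 0.72, 0.60],
    [0.65, 0.66, 0.58],
    [0.80, 0.79, 0.62],
]
scaled = zscore_rows(auroc)
distances = method_distances(auroc, methods)
closest = min(distances, key=distances.get)  # the most similar pair of methods
```

In this toy table the two correlation columns track each other, so they end up closer to each other than either is to the third column.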

Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different sized networks plotted against natural logarithm transformed sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.
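The logarithmic regression named in the Figure 4 legend was fitted with lm() in R. As a rough illustration of the same kind of model, the sketch below fits average AUROC = intercept + slope × ln(sample size) by ordinary least squares in plain Python and reports R-squared. It does not compute p-values, the function name is hypothetical, and the data are synthetic values generated from a known curve rather than measurements from the study.

```python
from math import log


def fit_log_model(sample_sizes, aurocs):
    """Ordinary least squares fit of auroc = intercept + slope * ln(sample_size)."""
    x = [log(size) for size in sample_sizes]
    count = len(x)
    x_mean = sum(x) / count
    y_mean = sum(aurocs) / count
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, aurocs))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # R-squared: share of the variance in AUROC explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, aurocs))
    ss_tot = sum((yi - y_mean) ** 2 for yi in aurocs)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Synthetic values generated from a known curve, not measurements from the study.
sizes = [12, 50, 200, 600, 1266]
values = [0.55 + 0.02 * log(size) for size in sizes]
intercept, slope, r_squared = fit_log_model(sizes, values)
```

Because the synthetic points lie exactly on a logarithmic curve, the fit recovers the generating intercept and slope; real AUROC values would scatter around the line and give a lower R-squared.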


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median. Asterisks indicate the mean and grey dots indicate outliers.


Fig 7. Panels A and B: Venn diagrams for the MS, PA and SA networks. Panel C (Hub): GO terms dominated by chromatin organization and assembly, DNA replication and repair, and gene silencing. Panel D (Cell Wall): GO terms dominated by cell wall biogenesis and organization, cellulose and glucan metabolism, and phenylpropanoid biosynthesis.

Figure 7. Comparison of top 1000 interacting genes in aggregation and single networks. A, Venn diagram shows the overlap among the 1000 highest interacting genes in three networks. 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram shows the overlap among the top 1 x 10^6 edges in three networks. 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
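The edge overlap reported in panel B of the legend above amounts to a set intersection over the top-ranked edges of each network. The sketch below is one way to compute such an overlap, assuming each network is a list of (gene, gene, score) tuples and that edges are undirected. The helper names, gene identifiers and scores are placeholders, not data or code from the study.

```python
def top_edges(scored_edges, k):
    """Return the k highest-scoring edges as unordered gene pairs."""
    ranked = sorted(scored_edges, key=lambda edge: edge[2], reverse=True)[:k]
    # frozenset makes (a, b) and (b, a) count as the same undirected edge.
    return {frozenset((gene_a, gene_b)) for gene_a, gene_b, _ in ranked}


def shared_edges(networks, k):
    """Edges present in the top-k edge set of every network."""
    edge_sets = [top_edges(edges, k) for edges in networks.values()]
    return set.intersection(*edge_sets)


# Toy networks with placeholder gene names and scores.
networks = {
    "PA": [("g1", "g2", 0.9), ("g2", "g3", 0.8), ("g1", "g4", 0.1)],
    "SA": [("g2", "g1", 0.7), ("g3", "g4", 0.6), ("g2", "g3", 0.5)],
    "MS": [("g1", "g2", 0.95), ("g3", "g2", 0.4), ("g1", "g3", 0.3)],
}
common = shared_edges(networks, k=2)
```

Storing each edge as an unordered pair matters here: the same gene pair listed in opposite orders in two networks would otherwise be counted as two different edges.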

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Node and line colors are as in A. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Node and line colors are as in A.

Parsed Citations

Allen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One 7 e29348

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology Plant Cell Physiol 48 381-90

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Network expansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem W Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science 320 938-941

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbers Bioinformatics 31 2123-2130

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868

Barabási A-L Oltvai ZN (2004) Network biology understanding the cell's functional organization Nat Rev Genet 5 101-113

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat Soc Ser B 289-300

De Bodt S Hollunder J Nelissen H Meulemeester N Inzé D (2012) CORNET 2.0 Integrating plant coexpression protein-protein interactions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K O'Connor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems Genes Dev 26 1685-90

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMC Bioinformatics 10 421

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al (2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

D'haeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineering Bioinformatics 16 707-726

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Höfte H Gonneau M Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures Nature Protoc 7 670-85

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575-1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Fedoroff N V (2012) McClintock's challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244-56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277-286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1-8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 9 29-46

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput Biol 4 e1000117

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int Rev Cell Mol Biol 328 25-48

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference Bioinformatics 28 1592-1597

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360

Köster J Rahmann S (2012) Snakemake: a scalable bioinformatics workflow engine Bioinformatics 28 2520-2522

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time and perturbation Genome Biol 14 123

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol 34 1812-1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery PLoS One 7 e50411

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data BMC Bioinformatics 16 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a) Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20 664-675

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomic features Bioinformatics 30 923-930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 4. Effect of sample size on network performance. A, Average area under the ROC curve (average AUROC) values from GO evaluation of 17 different-sized networks, plotted against the natural logarithm of sample size (log(sample size)). B, Average AUROC values from PPPTY evaluation of 17 different-sized networks, plotted against the natural logarithm of sample size (log(sample size)). Fitted logarithmic regression models are plotted as black lines. R-squared and p-values were calculated with the lm() function in R. PCC, Pearson correlation coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Fig. 4 panel titles: A, GO PCC, GO SCC, GO MRNET, GO CLR; B, PPPTY PCC, PPPTY SCC, PPPTY MRNET, PPPTY CLR.
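The logarithmic fit described in the Figure 4 caption can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the caption says the fits were made with lm() in R, whereas the function name fit_log_model and the sample values below are hypothetical. The sketch returns only the coefficients and R-squared and does not compute p-values.

```python
import math

def fit_log_model(sample_sizes, aurocs):
    """Ordinary least-squares fit of auroc = intercept + slope * ln(sample_size).

    Returns (intercept, slope, r_squared).
    """
    xs = [math.log(n) for n in sample_sizes]
    ys = list(aurocs)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (sample size, AUROC) pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("sample sizes and AUROC values must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, 1.0 - ss_res / ss_tot

# Hypothetical values for illustration only; these are not data from the paper.
sizes = [12, 50, 200, 600, 1266]
scores = [0.61, 0.66, 0.69, 0.70, 0.72]
intercept, slope, r_squared = fit_log_model(sizes, scores)
```

A positive slope in such a fit corresponds to AUROC rising with the logarithm of sample size, which is the relationship the figure plots.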


Figure 5. Evaluation of different-sized networks constructed with different numbers of samples using the PCC (black), SCC (green), MRNET (red), and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries, plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries, plotted against sample size. Networks with the same number of samples are designated "_1", "_2", and "_3". PCC, Pearson correlation coefficient; SCC, Spearman correlation coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.


Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). B, AUROC values from PPPTY evaluation of the single network (white bars), aggregation network (grey bars), and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m), or CLR (c). In both panels, bold horizontal lines indicate medians, asterisks indicate means, and grey dots indicate outliers.
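For readers less familiar with the AUROC statistic compared in Figure 6, the sketch below computes it for one list of scored genes. It is a generic illustration, not the evaluation pipeline used in the study; the function name auroc and the toy inputs are hypothetical. AUROC is taken here as the fraction of (positive, negative) pairs in which the positive gene receives the higher score, with ties counted as one half.

```python
def auroc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: prediction scores, higher meaning more likely positive.
    labels: matching truth values (True for annotated genes).
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))

# Toy example with hypothetical scores: three of four pairs are ordered correctly.
example = auroc([0.9, 0.4, 0.6, 0.1], [True, True, False, False])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the single, aggregation, and protein networks are compared.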


Fig. 7 (image not reproduced). Panels A and B are Venn diagrams with sets labelled MS, PA, and SA. Panel C is a GO term graph labelled Hub, with terms including chromatin assembly, chromatin modification, gene silencing, DNA replication, and DNA repair. Panel D is a GO term graph labelled Cell Wall, with terms including cell wall biogenesis, cellulose biosynthetic process, glucan metabolic process, and phenylpropanoid metabolic process.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
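The edge overlap summarized in the Figure 7 Venn diagrams can be sketched as a set operation. This is an illustrative Python sketch under stated assumptions, not the authors' procedure: the function names top_edges and three_way_overlap, the gene identifiers, and the scores are hypothetical, and each network is assumed to be a dictionary that lists every gene pair once with a score where higher means stronger.

```python
def top_edges(edge_scores, k):
    """Return the k highest-scoring edges as a set of unordered gene pairs."""
    ranked = sorted(edge_scores.items(), key=lambda item: item[1], reverse=True)
    # frozenset makes (a, b) and (b, a) the same undirected edge.
    return {frozenset(pair) for pair, _score in ranked[:k]}

def three_way_overlap(pa, sa, ms):
    """Count the seven regions of a three-set Venn diagram."""
    return {
        "PA only": len(pa - sa - ms),
        "SA only": len(sa - pa - ms),
        "MS only": len(ms - pa - sa),
        "PA and SA only": len((pa & sa) - ms),
        "PA and MS only": len((pa & ms) - sa),
        "SA and MS only": len((sa & ms) - pa),
        "all three": len(pa & sa & ms),
    }

# Toy networks with hypothetical genes and scores.
pa_net = {("g1", "g2"): 0.9, ("g2", "g3"): 0.8, ("g3", "g4"): 0.1}
sa_net = {("g2", "g1"): 0.7, ("g3", "g4"): 0.6, ("g1", "g4"): 0.2}
ms_net = {("g1", "g2"): 0.5, ("g1", "g4"): 0.4, ("g2", "g3"): 0.3}

counts = three_way_overlap(
    top_edges(pa_net, 2), top_edges(sa_net, 2), top_edges(ms_net, 2)
)
```

The same counting applies to the gene-level comparison in panel A if each network is reduced to the set of its most highly connected genes instead of its top edges.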


Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with a reported function in cell wall-related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall-related pathways, and grey lines indicate network-predicted interactions.


Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381-90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5-e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938-941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123-2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabasi A-L, Oltvai ZNZN, Barabási A-L (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101-113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289-300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707-720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685-90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545-3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707-726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572-15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143-175

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures Nature Protoc 7 670-85

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575-1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Fedoroff N V (2012) McClintock's challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244-56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277-286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1-8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 9 29-46

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput Biol 4 e1000117

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int Rev Cell Mol Biol 328 25-48

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference Bioinformatics 28 1592-1597

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360

Köster J Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine Bioinformatics 28 2520-2522

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time and perturbation Genome Biol 14 123

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol 34 1812-1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery PLoS One 7 e50411

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data BMC Bioinformatics 16 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a) Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20 664-675

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomic features Bioinformatics 30 923-930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverse engineering gene networks Bioinformatics pp 282-288

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice Plant J 90 177-188

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissue expression (GTEx) project Nat Genet 45 580-585

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol 15 1

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework Sci Rep 5 7702

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis Plant Physiol 160 192-203

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 19 1423-1430

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Marbach D Costello JC Küffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012) Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize Plant J 43 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

McClintock B (1983) The significance of responses of the genome to challenge Science 792-801

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EURASIP J Bioinforma Syst Biol doi 10.1155/2007/79879

Meyer PE Lafitte F Bontempi G (2008) minet A R/Bioconductor package for inferring large transcriptional networks using mutual information BMC Bioinformatics 9 461

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejía-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B Emiliani J et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithm clustering plugin for Cytoscape BMC Bioinformatics 12 436

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat Methods 5 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146-2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 2013 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp1501821-

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015


Figure 5. Evaluation of different sized networks constructed with different numbers of samples using PCC (black), SCC (green), MRNET (red) and CLR (blue) methods. A, Average AUROC values from GO evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. B, Average AUROC values from PPPTY evaluations of networks constructed using 12 (S12) to 1266 (S1266) libraries were plotted against sample size. Networks with the same number of samples included are designated as "_1", "_2" and "_3". PCC, Pearson Correlation Coefficient; SCC, Spearman Correlation Coefficient; MRNET, Minimum Redundancy NETwork; CLR, Context Likelihood of Relatedness.

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

[Figure 7 image. A and B, three-way Venn diagrams for the MS, PA, and SA networks. C, enriched GO terms for the hub genes, including chromatin assembly, chromatin modification, nucleosome organization, DNA replication, DNA repair, gene silencing, and gene expression. D, enriched GO terms for the cell wall query, including cell wall biogenesis, cell wall organization, cellulose biosynthetic process, glucan metabolic process, phenylpropanoid metabolic process, and hemicellulose metabolic process.]

Figure 7. Comparison of the top 1,000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1,000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106,591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA, and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.
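The overlap counts in Figure 7A can be illustrated with a minimal Python sketch. It assumes that "highest interacting" means highest degree (number of edges), which the caption does not state explicitly; the function names `top_hubs` and `venn3`, the toy edge lists, and the gene labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def top_hubs(edges, k):
    """Return the k genes with the most edges in an undirected edge list."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    # Rank by degree (descending); break ties by gene name for reproducibility.
    ranked = sorted(degree, key=lambda gene: (-degree[gene], gene))
    return set(ranked[:k])


def venn3(pa, sa, ms):
    """Count the genes in each of the seven regions of a three-set Venn diagram."""
    return {
        "PA only": len(pa - sa - ms),
        "SA only": len(sa - pa - ms),
        "MS only": len(ms - pa - sa),
        "PA and SA only": len((pa & sa) - ms),
        "PA and MS only": len((pa & ms) - sa),
        "SA and MS only": len((sa & ms) - pa),
        "all three": len(pa & sa & ms),
    }


# Toy edge lists standing in for the PA, SA, and MS networks (hypothetical genes).
pa_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
sa_edges = [("g1", "g2"), ("g1", "g5"), ("g5", "g6"), ("g5", "g2")]
ms_edges = [("g1", "g3"), ("g1", "g2"), ("g3", "g4"), ("g3", "g5")]

hubs = [top_hubs(edges, k=2) for edges in (pa_edges, sa_edges, ms_edges)]
regions = venn3(*hubs)
print(regions)
```

With real networks, `k` would be 1,000 and the "all three" region would correspond to the 148 shared genes reported in the caption. The same `venn3` function applies to edge sets (panel B) if each edge is stored as a hashable pair.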



Figure 8. Cell wall pathway subnetworks. A, Intersection of the PCC aggregation (PA), SCC aggregation (SA), and MRNET-single (MS) networks queried by 16 cell wall pathway genes. B, Network retrieved from the CORNET database queried by the same 16 cell wall pathway genes. C, Network retrieved from the STRING database queried by the same 16 cell wall pathway genes. In all panels, red nodes are the query genes, cyan nodes are genes with a reported function in cell wall-related pathways in plants, dark grey nodes are genes without prior evidence of involvement in cell wall-related pathways, and grey lines indicate network-predicted interactions.
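A query of the kind shown in Figure 8A can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the subnetwork consists of edges present in all three networks that touch at least one query gene, and the helper names (`undirected`, `shared_seed_edges`, `node_classes`) and gene labels are hypothetical.

```python
def undirected(edges):
    """Store each edge as a frozenset so that (a, b) and (b, a) compare equal."""
    return {frozenset(edge) for edge in edges}


def shared_seed_edges(networks, seeds):
    """Return edges found in every network that touch at least one seed gene."""
    seeds = set(seeds)
    shared = set.intersection(*(undirected(net) for net in networks))
    return {edge for edge in shared if edge & seeds}


def node_classes(edges, seeds, known):
    """Label nodes as in the caption: query (red), known (cyan), novel (dark grey)."""
    labels = {}
    for edge in edges:
        for gene in edge:
            if gene in seeds:
                labels[gene] = "query"
            elif gene in known:
                labels[gene] = "known"
            else:
                labels[gene] = "novel"
    return labels


# Toy networks standing in for PA, SA, and MS (hypothetical gene labels).
pa = [("s1", "x"), ("s1", "y"), ("x", "y")]
sa = [("x", "s1"), ("y", "s1"), ("x", "y")]
ms = [("s1", "x"), ("s1", "y"), ("x", "y"), ("s1", "z")]

seeds = {"s1"}   # stands in for the 16 cell wall pathway query genes
known = {"x"}    # stands in for genes with reported cell wall functions

sub = shared_seed_edges([pa, sa, ms], seeds)
labels = node_classes(sub, seeds, known)
print(sorted(labels.items()))
```

Genes labelled "novel" in such a subnetwork are the candidates for follow-up, because they are consistently co-expressed with the query genes but lack prior annotation.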


Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381-90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938-941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123-2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101-113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289-300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707-720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685-90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545-3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707-726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572-15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143-175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670-85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64-70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924-13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054-0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200-20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244-56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277-286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860-1866

Giorgi F, Del Fabbro C, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1-8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866-12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29-46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25-48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592-1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357-360

Köster J, Rahmann S (2012) Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28: 2520-2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812-1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol Databases Curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp1501821-

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015

Figure 6. GCN performance comparison of networks constructed with 1266 libraries. A, Area under the ROC curve (AUROC) values from GO evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers. B, AUROC values from PPPTY evaluation of single network (white bars), aggregation network (grey bars) and protein network (dark grey bars) were compared for networks constructed using PCC (p), SCC (s), MRNET (m) or CLR (c). Bold horizontal lines indicate the median, asterisks indicate the mean, and grey dots indicate outliers.

[Figure 6 image: panel A, GO evaluation; panel B, PPPTY evaluation; y-axis: AUC]

[Figure 7 image: A and B, Venn diagrams for the MS, PA and SA networks; C, GO term enrichment graph for the hub genes (terms include chromatin assembly, nucleosome organization, DNA packaging, DNA replication, DNA repair and gene silencing); D, GO term enrichment graph for the cell wall query (terms include cell wall biogenesis, cellulose biosynthetic process, glucan metabolic process, phenylpropanoid metabolic process and hemicellulose metabolic process)]

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 × 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 highly interacting genes shared among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of the PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Node and line colors are as in A. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Node and line colors are as in A.

Parsed Citations

Allen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One 7 e29348

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology Plant Cell Physiol 48 381-90

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Network expansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem W Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science 320 938-941

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbers Bioinformatics 31 2123-2130

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868

Barabási A-L Oltvai ZN (2004) Network biology understanding the cell's functional organization Nat Rev Genet 5 101-113

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat Soc Ser B 289-300

De Bodt S Hollunder J Nelissen H Meulemeester N Inzé D (2012) CORNET 2.0 Integrating plant coexpression protein-protein interactions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K O'Connor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems Genes Dev 26 1685-90

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMC Bioinformatics 10 421

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al (2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

D'haeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineering Bioinformatics 16 707-726

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Höfte H Gonneau M Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures Nature Protoc 7 670-85

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575-1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Fedoroff N V (2012) McClintock's challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244-56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277-286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866

Giorgi F Del Fabbro C Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1-8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 9 29-46

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput Biol 4 e1000117

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int Rev Cell Mol Biol 328 25-48

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference Bioinformatics 28 1592-1597

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360

Köster J Rahmann S (2012) Snakemake a scalable bioinformatics workflow engine Bioinformatics 28 2520-2522

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time and perturbation Genome Biol 14 123

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol 34 1812-1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery PLoS One 7 e50411

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data BMC Bioinformatics 16 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a) Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20 664-675

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomic features Bioinformatics 30 923-930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverse engineering gene networks Bioinformatics pp 282-288

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice Plant J 90 177-188

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissue expression (GTEx) project Nat Genet 45 580-585

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol 15 1

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework Sci Rep 5 7702

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis Plant Physiol 160 192-203

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 19 1423-1430

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Marbach D Costello JC Küffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012) Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize Plant J 43 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

McClintock B (1983) The significance of responses of the genome to challenge Science 792-801

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks Eurasip J Bioinforma Syst Biol doi 10.1155/2007/79879

Meyer PE Lafitte F Bontempi G (2008) minet A R/Bioconductor package for inferring large transcriptional networks using mutual information BMC Bioinformatics 9 461

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejía-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B Emiliani J et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithm clustering plugin for Cytoscape BMC Bioinformatics 12 436

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat Methods 5 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics4 33

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-awareRNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009)Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencingexperiments Bioinformatics 29 2146-2152

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital geneexpression data Bioinformatics 26 139-140

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sales G Romualdi C (2011) parmigenemdasha parallel R package for mutual information estimation and gene network reconstructionBioinformatics 27 1876-1877

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicateaccuracy in a sea of changing annotations Database J Biol databases curation 2016

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro aplatform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

COB the co-expression browser PLoS One doi 101371journalpone0099193Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R packagePubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112-1115Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schwanhaumlusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian geneexpression control Nature 473 337-342

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mousecrosses Frontiers in Genetics 20134291

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges FrontPlant Sci 7 444

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a softwareenvironment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes withoutmeiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model basedindices BMC Bioinformatics 13 328

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expressionatlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015)STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression NetworksConstruction and Visualization Tool Front Plant Sci 6 1194

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev MolCell Biol 16 258-264

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omicnetworks in a developmental atlas of maize Science 353 814-818

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics16 S6

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparsecanonical correlation analysis Ann Appl Stat 9 300-323

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics5 87

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within genecoexpression networks BMC Bioinformatics 6 227

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering PlantsStudy from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize HistoneDeacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstructionPLoS One 9 e106319 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56195-214

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for MaizePlant Physiol 170 pp1501821-

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zyprych-Walczak J Szabelska A Handschuh L Goacuterczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalizationmethods on RNA-Seq data analysis Biomed Res Int 2015

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved


Fig 7 (panels A-D). A and B: Venn diagrams of the PA, SA and MS networks. C (Hub): GO terms shown include chromatin assembly, nucleosome organization, DNA packaging, DNA replication, DNA repair, gene silencing and chromatin modification. D (Cell Wall): GO terms shown include cellulose biosynthetic process, cell wall biogenesis, glucan metabolic process, phenylpropanoid metabolic process and hemicellulose metabolic process.

Figure 7. Comparison of the top 1000 interacting genes in aggregation and single networks. A, Venn diagram showing the overlap among the 1000 highest interacting genes in three networks; 148 genes were shared among the three networks. PA, PCC ranked aggregation network; SA, SCC ranked aggregation network; MS, MRNET single network. B, Venn diagram showing the overlap among the top 1 x 10^6 edges in three networks; 106591 edges were shared among the three networks. C, GO term enrichment analyzed by AgriGO for the 148 shared highly interacting genes among PA, SA and MS. D, GO term enrichment analyzed by AgriGO for 214 co-expressed genes queried by 16 cell wall pathway genes.

Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions. C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). Cyan nodes are genes with reported function in cell wall related pathways in plants. Dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways. Grey lines indicate network-predicted interactions.


Parsed CitationsAllen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One7 e29348

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biologyPlant Cell Physiol 48 381-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Networkexpansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem WBaginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science (80- ) 320938-941

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbersBioinformatics 31 2123-2130

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Barabasi A-L Oltvai ZNZN Barabaacutesi A-L (2004) Network biology understanding the cells functional organization Nat Rev Genet 5101-113

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat SocSer B 289-300

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

De Bodt S Hollunder J Nelissen H Meulemeester N Inzeacute D (2012) CORNET 20 Integrating plant coexpression protein-proteininteractions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K OConnor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatorynetwork in maize meristems Genes Dev 26 1685-90

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMCBioinformatics 10 421

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al(2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhaeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineeringBioinformatics 16 707-726

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utilityof RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Houmlfte H Gonneau M Vernhettes S (2007) Organization of cellulosesynthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) Acomprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14671-683

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networksand protein structures Nature Protoc 7 670-85

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulatedand phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using randommatrix theory Hortic Res 2 15026

Pubmed Author and TitleCrossRef Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Google Scholar Author Only Title Only Author and Title

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science (80- ) 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Figure 8. Cell wall pathway subnetworks. A, Intersections of PCC aggregation (PA), SCC aggregation (SA) and MRNET-single (MS) networks queried by 16 cell wall pathway genes (red nodes). B, Network retrieved from the CORNET database queried by the 16 cell wall pathway genes (red nodes). C, Network retrieved from the STRING database queried by the 16 cell wall pathway genes (red nodes). In all panels, cyan nodes are genes with reported function in cell wall related pathways in plants, dark grey nodes are genes without prior knowledge of involvement in cell wall related pathways, and grey lines indicate network-predicted interactions.


Parsed Citations

Allen JD, Xie Y, Chen M, Girard L, Xiao G (2012) Comparing statistical methods for constructing large scale gene networks. PLoS One 7: e29348

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11: R106

Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381-90

Aoki Y, Okamura Y, Tadaka S, Kinoshita K, Obayashi T (2015) ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol 57: e5-e5

Asnicar F, Masera L, Coller E, Gallo C, Sella N, Tolio T, Morettin P, Erculiani L, Galante F, Semeniuta S (2016) NES2RA: Network expansion by stratified variable subsetting and ranking aggregation. Int J High Perform Comput Appl 1094342016662508

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science (80- ) 320: 938-941

Ballouz S, Verleyen W, Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics 31: 2123-2130

Ballouz S, Weber M, Pavlidis P, Gillis J (2016) EGAD: Ultra-fast functional analysis of gene networks. bioRxiv 53868

Barabasi A-L, Oltvai ZNZN, Barabási A-L (2004) Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101-113

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 289-300

De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inzé D (2012) CORNET 2.0: Integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol 195: 707-720

Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O'Connor D, Grotewold E, Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev 26: 1685-90

Bosch M, Mayer C-D, Cookson A, Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545-3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707-726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572-15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143-175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670-85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64-70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924-13929

Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5: 0054-0066

Fedoroff NV (2012) McClintock's challenge in the 21st century. Proc Natl Acad Sci 109(50): 20200-20203

Ficklin SP, Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice. Plant Physiol 156: 1244-56

Filzmoser P, Fritz H, Kalcher K (2009) pcaPP: Robust PCA by Projection Pursuit. R Packag version 1

Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, Zhang J, He C, Du X, Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Commun 4: 2832

Gao S, Ver Steeg G, Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables. Artificial Intelligence and Statistics 277-286

Gillis J, Pavlidis P (2011) The role of indirect connections in gene networks in predicting function. Bioinformatics 27: 1860-1866

Giorgi F, Fabbro C Del, Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2: 1-8

Gu Y, Kaplinsky N, Bringmann M, Cobb A, Carroll A, Sampathkumar A, Baskin TI, Persson S, Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis. Proc Natl Acad Sci 107: 12866-12871

Han Y, Gao S, Muegge K, Zhang W, Zhou B (2015) Advanced applications of RNA sequencing and challenges. Bioinform Biol Insights 9: 29-46

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25-48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592-1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357-360

Köster J, Rahmann S (2012) Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28: 2520-2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812-1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data BMC Bioinformatics 16 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a) Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20 664-675

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomic features Bioinformatics 30 923-930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverse engineering gene networks Bioinformatics pp 282-288

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice Plant J 90 177-188

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissue expression (GTEx) project Nat Genet 45 580-585

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol 15 1

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework Sci Rep 5 7702

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis Plant Physiol 160 192-203

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 19 1423-1430

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Marbach D Costello JC Küffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012) Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize Plant J 43 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge Science 792-801

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks Eurasip J Bioinforma Syst Biol doi 10.1155/2007/79879

Meyer PE Lafitte F Bontempi G (2008) minet A R/Bioconductor package for inferring large transcriptional networks using mutual information BMC Bioinformatics 9 461

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejía-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B Emiliani J et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithm clustering plugin for Cytoscape BMC Bioinformatics 12 436

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat Methods 5 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146-2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol Databases Curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 2013 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp1501821-

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015

Parsed Citations

Allen JD Xie Y Chen M Girard L Xiao G (2012) Comparing statistical methods for constructing large scale gene networks PLoS One 7 e29348

Anders S Huber W (2010) Differential expression analysis for sequence count data Genome Biol 11 R106

Aoki K Ogata Y Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology Plant Cell Physiol 48 381-90

Aoki Y Okamura Y Tadaka S Kinoshita K Obayashi T (2015) ATTED-II in 2016 A Plant Coexpression Database Towards Lineage-Specific Coexpression Plant Cell Physiol 57 e5-e5

Asnicar F Masera L Coller E Gallo C Sella N Tolio T Morettin P Erculiani L Galante F Semeniuta S (2016) NES2RA Network expansion by stratified variable subsetting and ranking aggregation Int J High Perform Comput Appl 1094342016662508

Baerenfaller K Grossmann J Grobei MA Hull R Hirsch-Hoffmann M Yalovsky S Zimmermann P Grossniklaus U Gruissem W Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics Science 320 938-941

Ballouz S Verleyen W Gillis J (2015) Guidance for RNA-seq co-expression network construction and analysis Safety in numbers Bioinformatics 31 2123-2130

Ballouz S Weber M Pavlidis P Gillis J (2016) EGAD Ultra-fast functional analysis of gene networks bioRxiv 53868

Barabási A-L Oltvai ZN (2004) Network biology understanding the cell's functional organization Nat Rev Genet 5 101-113

Benjamini Y Hochberg Y (1995) Controlling the false discovery rate a practical and powerful approach to multiple testing J R Stat Soc Ser B 289-300

De Bodt S Hollunder J Nelissen H Meulemeester N Inzé D (2012) CORNET 2.0 Integrating plant coexpression protein-protein interactions regulatory interactions gene associations and functional annotations New Phytol 195 707-720

Bolduc N Yilmaz A Mejia-Guerra MK Morohashi K O'Connor D Grotewold E Hake S (2012) Unraveling the KNOTTED1 regulatory network in maize meristems Genes Dev 26 1685-90

Bosch M Mayer C-D Cookson A Donnison IS (2011) Identification of genes involved in cell wall biogenesis in grasses by differential gene expression profiling of elongating and non-elongating maize internodes J Exp Bot 62 3545-3561

Camacho C Coulouris G Avagyan V Ma N Papadopoulos J Bealer K Madden TL (2009) BLAST+ architecture and applications BMC Bioinformatics 10 421

Conesa A Madrigal P Tarazona S Gomez-Cabrero D Cervera A McPherson A Szczesniak MW Gaffney DJ Elo LL Zhang X et al (2016) A survey of best practices for RNA-seq data analysis Genome Biol 17 13

D'haeseleer P Liang S Somogyi R (2000) Genetic network inference from co-expression clustering to reverse engineering Bioinformatics 16 707-726

Davidson RM Hansey CN Gowda M Childs KL Lin H Vaillancourt B Sekhon RS de Leon N Kaeppler SM Jiang N et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes Plant Genome J 4 191

Desprez T Juraniec M Crowell EF Jouy H Pochylova Z Parcy F Höfte H Gonneau M Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana Proc Natl Acad Sci 104 15572-15577

Dhillon IS Modha DS (2001) Concept decompositions for large sparse text data using clustering Mach Learn 42 143-175

Dillies M-A Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot G Castel D Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Dillies MA Rau A Aubert J Hennequet-Antier C Jeanmougin M Servant N Keime C Marot NS Castel D Estelle J et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis Brief Bioinform 14 671-683

Doncheva NT Assenov Y Domingues FS Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures Nature Protoc 7 670-85

Dong J Horvath S (2007) Understanding network concepts in modules BMC Syst Biol 1 24

Dotto MC Petsch KA Aukerman MJ Beatty M Hammell M Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development PLoS Genet 10 e1004826

Du D Rawat N Deng Z Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory Hortic Res 2 15026

Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 30 1575-1584

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl Acad Sci 111 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Fedoroff N V (2012) McClintock's challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass species maize and rice Plant Physiol 156 1244-56

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatory network in the maize kernel Nature Commun 4 2832

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables Artificial Intelligence and Statistics 277-286

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana Bioinformatics 2 1-8

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of a cellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S, Dong J (2008) Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4: e1000117

Huang J, Lynn JS, Schulte L, Vendramin S, McGinnis K (2017) Chapter Two - Epigenetic Control of Gene Expression in Maize. Int Rev Cell Mol Biol 328: 25-48

Iancu OD, Kawane S, Bottomly D, Searles R, Hitzemann R, McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 28: 1592-1597

Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357-360

Köster J, Rahmann S (2012) Snakemake - a scalable bioinformatics workflow engine. Bioinformatics 28: 2520-2522

Krouk G, Lingeman J, Colon A, Coruzzi G, Shasha D (2013) Gene regulatory networks in plants: learning causality from time and perturbation. Genome Biol 14: 123

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34: 1812-1819

Kumari S, Nie J, Chen H-S, Ma H, Stewart R, Li X, Lu M-Z, Taylor WM, Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 7: e50411

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559

Leinonen R, Sugawara H, Shumway M (2010) The sequence read archive. Nucleic Acids Res gkq1019

Li C, Qiao Z, Qi W, Wang Q, Yuan Y, Yang X, Tang Y, Mei B, Lv Y, Zhao H, et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize. Plant Cell 27: 532-545

Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet 45: 43-50

Li J, Wei H, Zhao PX (2013b) DeGNServer: Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis. 2013

Li P, Piao Y, Shon HS, Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 16: 347

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science (80- ) 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2: tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science (80- ) 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 2013; 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


gene expression profiling of elongating and non-elongating maize internodes. J Exp Bot 62: 3545-3561

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421

Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szczesniak MW, Gaffney DJ, Elo LL, Zhang X, et al (2016) A survey of best practices for RNA-seq data analysis. Genome Biol 17: 13

D'haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16: 707-726

Davidson RM, Hansey CN, Gowda M, Childs KL, Lin H, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Jiang N, et al (2011) Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes. Plant Genome J 4: 191

Desprez T, Juraniec M, Crowell EF, Jouy H, Pochylova Z, Parcy F, Höfte H, Gonneau M, Vernhettes S (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104: 15572-15577

Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42: 143-175

Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J (2013a) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot NS, Castel D, Estelle J, et al (2013b) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14: 671-683

Doncheva NT, Assenov Y, Domingues FS, Albrecht M (2012) Topological analysis and interactive visualization of biological networks and protein structures. Nature Protoc 7: 670-85

Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1: 24

Dotto MC, Petsch KA, Aukerman MJ, Beatty M, Hammell M, Timmermans MCP (2014) Genome-wide analysis of leafbladeless1-regulated and phased small RNAs underscores the importance of the TAS3 ta-siRNA pathway to maize development. PLoS Genet 10: e1004826

Du D, Rawat N, Deng Z, Gmitter FG (2015) Construction of citrus gene coexpression networks from microarray data using random matrix theory. Hortic Res 2: 15026

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res 38: 64-70

Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575-1584

Fadista J, Vikman P, Laakso EO, Mollet IG, Esguerra J Lou, Taneera J, Storm P, Osmark P, Ladenvall C, Prasad RB (2014) Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci 111: 13924-13929

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpression network inference Bioinformatics 28 1592-1597

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360

Köster J Rahmann S (2012) Snakemake a scalable bioinformatics workflow engine Bioinformatics 28 2520-2522

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time and perturbation Genome Biol 14 123

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol 34 1812-1819

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods for coexpression network construction and biological knowledge discovery PLoS One 7 e50411

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNA Targets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance Reverse Engineering Analysis 2013

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data BMC Bioinformatics 16 347

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a) Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20 664-675

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomic features Bioinformatics 30 923-930

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverse engineering gene networks Bioinformatics pp 282-288

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice Plant J 90 177-188

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissue expression (GTEx) project Nat Genet 45 580-585

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol 15 1

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework Sci Rep 5 7702

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis Plant Physiol 160 192-203

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 19 1423-1430

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Marbach D Costello JC Küffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012) Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize Plant J 43 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge Science 792-801

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks Eurasip J Bioinforma Syst Biol doi 10.1155/2007/79879

Meyer PE Lafitte F Bontempi G (2008) minet A R/Bioconductor package for inferring large transcriptional networks using mutual information BMC Bioinformatics 9 461

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejía-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B Emiliani J et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithm clustering plugin for Cytoscape BMC Bioinformatics 12 436

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq Nat Methods 5 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146-2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol databases curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 2013 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR de Leon N Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp1501821-

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015


Du Z Zhou X Ling Y Zhang Z Su Z (2010) agriGO a GO analysis toolkit for the agricultural community Nucleic Acids Res 38 64-70Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Enright AJ Van Dongen S Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families Nucleic Acids Res 301575-1584

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fadista J Vikman P Laakso EO Mollet IG Esguerra J Lou Taneera J Storm P Osmark P Ladenvall C Prasad RB (2014) Globalgenomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism Proc Natl AcadSci 111 13924-13929

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Faith JJ Hayete B Thaden JT Mogno I Wierzbowski J Cottarel G Kasif S Collins JJ Gardner TS (2007) Large-scale mapping andvalidation of Escherichia coli transcriptional regulation from a compendium of expression profiles PLoS Biol 5 0054-0066

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fedoroff N V (2012) McClintocks challenge in the 21st century Proc Natl Acad Sci 109(50) 20200-20203Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ficklin SP Feltus FA (2011) Gene coexpression network alignment and conservation of gene modules between two grass speciesmaize and rice Plant Physiol 156 1244-56

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Filzmoser P Fritz H Kalcher K (2009) pcaPP Robust PCA by Projection Pursuit R Packag version 1Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Fu J Cheng Y Linghu J Yang X Kang L Zhang Z Zhang J He C Du X Peng Z (2013) RNA sequencing reveals the complex regulatorynetwork in the maize kernel Nature Commun 42832

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gao S Ver Steeg G Galstyan A (2015) Efficient Estimation of Mutual Information for Strongly Dependent Variables ArtificialIntelligence and Statistics 277-286

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gillis J Pavlidis P (2011) The role of indirect connections in gene networks in predicting function Bioinformatics 27 1860-1866Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Giorgi F Fabbro C Del Licausi F (2013) Comparative study of RNA-seq-and Microarray-derived coexpression networks in Arabidopsisthaliana Bioinformatics 2 1-8

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Gu Y Kaplinsky N Bringmann M Cobb A Carroll A Sampathkumar A Baskin TI Persson S Somerville CR (2010) Identification of acellulose synthase-associated protein required for cellulose biosynthesis Proc Natl Acad Sci 107 12866-12871

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Han Y Gao S Muegge K Zhang W Zhou B (2015) Advanced applications of RNA sequencing and challenges Bioinform Biol Insights 929-46

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Horvath S Dong J (2008) Geometric interpretation of gene coexpression network analysis PLoS Comput biol 4 e1000117Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Huang J Lynn JS Schulte L Vendramin S McGinnis K (2017) Chapter Two-Epigenetic Control of Gene Expression in Maize Int RevCell Mol Biol 328 25-48

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Iancu OD Kawane S Bottomly D Searles R Hitzemann R McWeeney S (2012) Utilizing RNA-Seq data for de novo coexpressionnetwork inference Bioinformatics 28 1592-1597

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kim D Langmead B Salzberg SL (2015) HISAT a fast spliced aligner with low memory requirements Nat Methods 12 357-360Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Koumlster J Rahmann S (2012) Snakemakemdasha scalable bioinformatics workflow engine Bioinformatics 28 2520-2522Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Krouk G Lingeman J Colon A Coruzzi G Shasha D (2013) Gene regulatory networks in plants learning causality from time andperturbation Genome Biol 14 123

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumar S Stecher G Suleski M Hedges SB (2017) TimeTree a resource for timelines timetrees and divergence times Mol Biol Evol34 1812-1819

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Kumari S Nie J Chen H-S Ma H Stewart R Li X Lu M-Z Taylor WM Wei H (2012) Evaluation of gene association methods forcoexpression network construction and biological knowledge discovery PLoS One 7 e50411

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Langfelder P Horvath S (2008) WGCNA an R package for weighted correlation network analysis BMC Bioinformatics 9 559Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Leinonen R Sugawara H Shumway M (2010) The sequence read archive Nucleic Acids Res gkq1019Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li C Qiao Z Qi W Wang Q Yuan Y Yang X Tang Y Mei B Lv Y Zhao H et al (2015a) Genome-Wide Characterization of cis-Acting DNATargets Reveals the Transcriptional Regulatory Framework of Opaque2 in Maize Plant Cell 27 532-545

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li H Peng Z Yang X Wang W Fu J Wang J Han Y Chai Y Guo T Yang N (2013a) Genome-wide association study dissects the geneticarchitecture of oil biosynthesis in maize kernels Nat Genet 45 43-50

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li J Wei H Zhao PX (2013b) DeGNServer Deciphering Genome-Scale Gene Networks through High Performance ReverseEngineering Analysis 2013

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li P Piao Y Shon HS Ryu KH (2015b) Comparing the normalization methods for the differential analysis of Illumina high-throughputRNA-Seq data BMC Bioinformatics 16 347 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17 pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The Significance of Responses of the Genome to Challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2 tpc.114.132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Ponnala L, Wang Y, Sun Q, Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf. Plant J 78: 424-440

Rau A, Gallopin M, Celeux G, Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments. Bioinformatics 29: 2146-2152

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140

Sales G, Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics 27: 1876-1877

Sangrador-Vegas A, Mitchell AL, Chang H-Y, Yong S-Y, Finn RD (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. Database J Biol databases curation 2016

Sato Y, Antonio BA, Namiki N, Takehisa H, Minami H, Kamatsuki K, Sugimoto K, Shimizu Y, Hirochika H, Nagamura Y (2011) RiceXPro: a platform for monitoring gene expression in japonica rice grown under natural field conditions. Nucleic Acids Res 39: D1141-D1148

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) "Out of Pollen" Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170: pp.15.01821

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


Li Q Eichten SR Hermanson PJ Zaunbrecher VM Song J Wendt J Rosenbaum H Madzima TF Sloan AE Huang J et al (2014a)Genetic Perturbation of the Maize Methylome Plant Cell 26 4602-4616

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li S Labaj PP Zumbo P Sykacek P Shi W Shi L Phan J Wu P-Y Wang M Wang C (2014b) Detecting and correcting systematicvariation in large-scale RNA sequencing data Nature Biotechnol 32 888-895

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Li Y Pearl SA Jackson SA (2015c) Gene Networks in Plant Biology Approaches in Reconstruction and Analysis Trends Plant Sci 20664-675

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liao Y Smyth GK Shi W (2014) featureCounts an efficient general purpose program for assigning sequence reads to genomicfeatures Bioinformatics 30 923-930

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lim WK Wang K Lefebvre C Califano A (2007) Comparative analysis of microarray normalization procedures Effects on reverseengineering gene networks Bioinformatics pp 282-288

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Liu S Liu Y Zhao J Cai S Qian H Zuo K Zhao L Zhang L (2017) A computational interactome for prioritizing genes associated withcomplex agronomic traits in rice Plant J 90 177-188

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Lonsdale J Thomas J Salvatore M Phillips R Lo E Shad S Hasz R Walters G Garcia F Young N (2013) The genotype-tissueexpression (GTEx) project Nat Genet 45 580-585

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Love MI Huber W Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biol15 1

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Luo X You Z Zhou M Li S Leung H Xia Y Zhu Q (2015) A highly efficient approach to protein interactome mapping based oncollaborative filtering framework Sci Rep 5 7702

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma C Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis PlantPhysiol 160 192-203

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma H-W Zeng A-P (2003) The connectivity structure giant strong component and centrality of metabolic networks Bioinformatics 191423-1430

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ma S Shah S Bohnert HJ Snyder M Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks revealsnovel modular expression pattern and new signaling pathways PLoS Genet 9 e1003840

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EurasipJ Bioinforma Syst Biol doi 101155200779879

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional networks using mutualinformation BMC Bioinformatics 9 461

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al(2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejiacutea-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B EmilianiJ et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithmclustering plugin for Cytoscape BMC Bioinformatics 12 436

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq NatMethods 5 621-628

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

Acids Res 37 D987-D991Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biologicalinformation in plants Bioinformatics 26 1267-1268

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics4 33

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-awareRNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009)Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencingexperiments Bioinformatics 29 2146-2152

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital geneexpression data Bioinformatics 26 139-140

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sales G Romualdi C (2011) parmigenemdasha parallel R package for mutual information estimation and gene network reconstructionBioinformatics 27 1876-1877

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicateaccuracy in a sea of changing annotations Database J Biol databases curation 2016

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro aplatform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB, the co-expression browser. PLoS One doi: 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation: Fast Correlation, Covariance, and Cosine Similarity. R package

Schnable P, Ware D, Fulton R, Stein J (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112-1115

Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M (2011) Global quantification of mammalian gene expression control. Nature 473: 337-342

Scott-Boyer M-P, Haibe-Kains B, Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses. Frontiers in Genetics 4: 291

Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W (2016) Learning from Co-expression Networks: Possibilities and Challenges. Front Plant Sci 7: 444

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498-2504

Sing T, Sander O, Beerenwinkel N, Lengauer T (2005) ROCR: visualizing classifier performance in R. Bioinformatics 21: 3940-3941

Singh M, Goel S, Meeley RB, Dantec C, Parrinello H, Michaud C, Leblanc O, Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein. Plant Cell 23: 443-458

Song L, Langfelder P, Horvath S (2012) Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 13: 328

Stelpflug SC, Rajandeep S, Vaillancourt B, Hirsch CN, Buell CR, Leon N De, Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development. Plant Genome 314-362

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447-D452

Tzfadia O, Diels T, De Meyer S, Vandepoele K, Aharoni A, Van de Peer Y (2016) CoExpNetViz: Comparative Co-Expression Networks Construction and Visualization Tool. Front Plant Sci 6: 1194

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170 pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015

Li Q, Eichten SR, Hermanson PJ, Zaunbrecher VM, Song J, Wendt J, Rosenbaum H, Madzima TF, Sloan AE, Huang J, et al (2014a) Genetic Perturbation of the Maize Methylome. Plant Cell 26: 4602-4616

Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C (2014b) Detecting and correcting systematic variation in large-scale RNA sequencing data. Nature Biotechnol 32: 888-895

Li Y, Pearl SA, Jackson SA (2015c) Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 20: 664-675

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923-930

Lim WK, Wang K, Lefebvre C, Califano A (2007) Comparative analysis of microarray normalization procedures: Effects on reverse engineering gene networks. Bioinformatics pp 282-288

Liu S, Liu Y, Zhao J, Cai S, Qian H, Zuo K, Zhao L, Zhang L (2017) A computational interactome for prioritizing genes associated with complex agronomic traits in rice. Plant J 90: 177-188

Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45: 580-585

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 1

Luo X, You Z, Zhou M, Li S, Leung H, Xia Y, Zhu Q (2015) A highly efficient approach to protein interactome mapping based on collaborative filtering framework. Sci Rep 5: 7702

Ma C, Wang X (2012) Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol 160: 192-203

Ma H-W, Zeng A-P (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 19: 1423-1430

Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP (2013) Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 9: e1003840

Marbach D, Costello JC, Küffner R, Vega NNM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, et al (2012) Wisdom of crowds for robust gene network inference. Nat Methods 9: 796-804

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7: S7

Mark Cigan A, Unger-Wallace E, Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maize. Plant J 43: 929-940

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17: pp-10

Maza E, Frasse P, Senin P, Bouzayen M, Zouine M (2013) Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: A matter of relative size of studied transcriptomes. Commun Integr Biol 6: e25849

McClintock B (1983) The significance of responses of the genome to challenge. Science 792-801

Meyer PE, Kontos K, Lafitte F, Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks. Eurasip J Bioinforma Syst Biol doi: 10.1155/2007/79879

Meyer PE, Lafitte F, Bontempi G (2008) minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9: 461

Monaco MK, Sen TZ, Dharmawardhana PD, Ren L, Schaeffer M, Naithani S, Amarasinghe V, Thomason J, Harper L, Gardiner J, et al (2013) Maize Metabolic Network Construction and Transcriptome Analysis. Plant Genome 6: 12

Morohashi K, Casas MI, Falcone Ferreyra ML, Falcone Ferreyra L, Mejía-Guerra MK, Pourcel L, Yilmaz A, Feller A, Carvalho B, Emiliani J, et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes. Plant Cell 24: 2745-64

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621-628

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987-D991

Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267-1268

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4: 14

Park T, Yi S-G, Kang S-H, Lee S, Lee Y-S, Simon R (2003) Evaluation of normalization methods for microarray data. BMC Bioinformatics 4: 33

Paulson J, Chen C-Y, Lopes-Ramos CM, Kuijjer ML, Platig J, Sonawane AR, Fagny M, Glass K, Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. bioRxiv 81802

Pautler M, Eveland AL, LaRue T, Yang F, Weeks R, Lunde C, Je B Il, Meeley R, Komatsu M, Vollbrecht E, et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize. Plant Cell Online 2 tpc114132506

Penning BW, Hunter 3rd CT, Tayengwa R, Eveland AL, Dugard CK, Olek AT, Vermerris W, Koch KE, McCarty DR, Davis MF, et al (2009) Genetic resources for maize cell wall biology. Plant Physiol 151: 1703-1728

Marbach D Costello JC Kuumlffner R Vega NNM Prill RJ Camacho DM Allison KR Aderhold A Allison KR Bonneau R et al (2012)Wisdom of crowds for robust gene network inference Nat Methods 9 796-804

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Margolin AA Nemenman I Basso K Wiggins C Stolovitzky G Dalla Favera R Califano A (2006) ARACNE an algorithm for thereconstruction of gene regulatory networks in a mammalian cellular context BMC Bioinformatics 7 S7

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mark Cigan A Unger-Wallace E Haug-Collet K (2005) Transcriptional gene silencing as a tool for uncovering gene function in maizePlant J 43 929-940

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads EMBnet J 17 pp-10Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Maza E Frasse P Senin P Bouzayen M Zouine M (2013) Comparison of normalization methods for differential gene expressionanalysis in RNA-Seq experiments A matter of relative size of studied transcriptomes Commun Integr Biol 6 e25849

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

McClintock B (1983) The Significance of Responses of THE GENOME TO CHALLENGE Science (80- ) 792-801Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Kontos K Lafitte F Bontempi G (2007) Information-theoretic inference of large transcriptional regulatory networks EurasipJ Bioinforma Syst Biol doi 101155200779879

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Meyer PE Lafitte F Bontempi G (2008) minet ARBioconductor package for inferring large transcriptional networks using mutualinformation BMC Bioinformatics 9 461

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Monaco MK Sen TZ Dharmawardhana PD Ren L Schaeffer M Naithani S Amarasinghe V Thomason J Harper L Gardiner J et al(2013) Maize Metabolic Network Construction and Transcriptome Analysis Plant Genome 6 12

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morohashi K Casas MI Falcone Ferreyra ML Falcone Ferreyra L Mejiacutea-Guerra MK Pourcel L Yilmaz A Feller A Carvalho B EmilianiJ et al (2012) A genome-wide regulatory framework identifies maize pericarp color1 controlled genes Plant Cell 24 2745-64

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Morris JH Apeltsin L Newman AM Baumbach J Wittkop T Su G Bader GD Ferrin TE (2011) clusterMaker a multi-algorithmclustering plugin for Cytoscape BMC Bioinformatics 12 436

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mortazavi A Williams BA McCue K Schaeffer L Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq NatMethods 5 621-628

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Mukaka MM (2012) A guide to appropriate use of correlation coefficient in medical research Malawi Med J 24 69-71Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Obayashi T Hayashi S Saeki M Ohta H Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis Nucleic Acids Res 37 D987-D991

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biological information in plants Bioinformatics 26 1267-1268

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics 4 33

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4 Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc.114.132506

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009) Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440

Rau A Gallopin M Celeux G Jaffrézic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencing experiments Bioinformatics 29 2146-2152

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital gene expression data Bioinformatics 26 139-140

Sales G Romualdi C (2011) parmigene: a parallel R package for mutual information estimation and gene network reconstruction Bioinformatics 27 1876-1877

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicate accuracy in a sea of changing annotations Database J Biol Databases Curation 2016

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro a platform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using COB the co-expression browser PLoS One doi 10.1371/journal.pone.0099193

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R package

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science 326 1112-1115

Schwanhäusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian gene expression control Nature 473 337-342

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mouse crosses Frontiers in Genetics 2013 4 291

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges Front Plant Sci 7 444

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a software environment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes without meiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model based indices BMC Bioinformatics 13 328

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expression atlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015) STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression Networks Construction and Visualization Tool Front Plant Sci 6 1194

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev Mol Cell Biol 16 258-264

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omic networks in a developmental atlas of maize Science 353 814-818

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics 16 S6

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis Ann Appl Stat 9 300-323

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics 5 87

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks BMC Bioinformatics 6 227

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants Study from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction PLoS One 9 e106319

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56 195-214

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for Maize Plant Physiol 170 pp.15.01821

Zyprych-Walczak J Szabelska A Handschuh L Górczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis Biomed Res Int 2015

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

  • Parsed Citations
  • Article File
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Figure 7
  • Figure 8
  • Parsed Citations
Page 45: 1 : Maize RNA-Seq GCNs · 4 Author names and affiliation: 5 Ji Huang, ... and the genome annotation is still far from complete (Mark 49 Cigan et al., 2005; Ficklin and Feltus, 2011)

Acids Res 37 D987-D991Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ogata Y Suzuki H Sakurai N Shibata D (2010) CoP a database for characterizing co-expressed gene modules with biologicalinformation in plants Bioinformatics 26 1267-1268

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Oshlack A Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology Biol Direct 4 14Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Park T Yi S-G Kang S-H Lee S Lee Y-S Simon R (2003) Evaluation of normalization methods for microarray data BMC Bioinformatics4 33

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Paulson J Chen C-Y Lopes-Ramos CM Kuijjer ML Platig J Sonawane AR Fagny M Glass K Quackenbush J (2016) Tissue-awareRNA-Seq processing and normalization for heterogeneous and sparse data bioRxiv 81802

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Pautler M Eveland AL LaRue T Yang F Weeks R Lunde C Je B Il Meeley R Komatsu M Vollbrecht E et al (2015) FASCIATED EAR4Encodes a bZIP Transcription Factor That Regulates Shoot Meristem Size in Maize Plant Cell Online 2 tpc114132506

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Penning BW Hunter 3rd CT Tayengwa R Eveland AL Dugard CK Olek AT Vermerris W Koch KE McCarty DR Davis MF et al (2009)Genetic resources for maize cell wall biology Plant Physiol 151 1703-1728

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Ponnala L Wang Y Sun Q Wijk KJ (2014) Correlation of mRNA and protein abundance in the developing maize leaf Plant J 78 424-440Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Rau A Gallopin M Celeux G Jaffreacutezic F (2013) Data-based filtering for replicated high-throughput transcriptome sequencingexperiments Bioinformatics 29 2146-2152

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Robinson MD McCarthy DJ Smyth GK (2010) edgeR a Bioconductor package for differential expression analysis of digital geneexpression data Bioinformatics 26 139-140

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sales G Romualdi C (2011) parmigenemdasha parallel R package for mutual information estimation and gene network reconstructionBioinformatics 27 1876-1877

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sangrador-Vegas A Mitchell AL Chang H-Y Yong S-Y Finn RD (2016) GO annotation in InterPro why stability does not indicateaccuracy in a sea of changing annotations Database J Biol databases curation 2016

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sato Y Antonio BA Namiki N Takehisa H Minami H Kamatsuki K Sugimoto K Shimizu Y Hirochika H Nagamura Y (2011) RiceXPro aplatform for monitoring gene expression in japonica rice grown under natural field conditions Nucleic Acids Res 39 D1141-D1148

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schaefer RJ Briskine R Springer NM Myers CL (2014) Discovering functional modules across diverse maize transcriptomes using wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

COB the co-expression browser PLoS One doi 101371journalpone0099193Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R packagePubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112-1115Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schwanhaumlusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian geneexpression control Nature 473 337-342

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mousecrosses Frontiers in Genetics 20134291

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges FrontPlant Sci 7 444

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a softwareenvironment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes withoutmeiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model basedindices BMC Bioinformatics 13 328

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expressionatlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015)STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression NetworksConstruction and Visualization Tool Front Plant Sci 6 1194

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev MolCell Biol 16 258-264

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omicnetworks in a developmental atlas of maize Science 353 814-818

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang W Zhou XJ Liu Z Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking BMC Bioinformatics16 S6

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang YXR Jiang K Feldman LJ Bickel PJ Huang H (2015b) Inferring gene-gene interactions and functional modules using sparsecanonical correlation analysis Ann Appl Stat 9 300-323

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang Z Gerstein M Snyder M (2009) RNA-Seq a revolutionary tool for transcriptomics Nat Rev Genet 10 57-63Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wei C Li J Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments BMC Genomics5 87

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wolfe CJ Kohane IS Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within genecoexpression networks BMC Bioinformatics 6 227

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wu D-D Wang X Li Y Zeng L Irwin DM Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering PlantsStudy from Arabidopsis thaliana Genome Biol Evol 6 2822-2829

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yang H Liu X Xin M Du J Hu Z Peng H Rossi V Sun Q Ni Z Yao Y (2016) Genome-wide Mapping of Targets of Maize HistoneDeacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development Plant Cell 28 629-645

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Yim WC Yu Y Song K Jang CS Lee B-M (2013) PLANEX the plant co-expression database BMC Plant Biol 13 83Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zheng W Chung LM Zhao H (2011) Bias detection and correction in RNA-Sequencing data BMC Bioinformatics 12 290Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Allen JD Xiao G Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstructionPLoS One 9 e106319 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhong R Ye ZH (2015) Secondary cell walls Biosynthesis patterned deposition and transcriptional regulation Plant Cell Physiol 56195-214

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zhu G Wu A Xu X-J Xiao P Lu L Liu J Cao Y Chen L Wu J Zhao X-M (2016) PPIM A protein-protein interaction database for MaizePlant Physiol 170 pp1501821-

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Zyprych-Walczak J Szabelska A Handschuh L Goacuterczak K Klamecka K Figlerowicz M Siatkowski I (2015) The impact of normalizationmethods on RNA-Seq data analysis Biomed Res Int 2015

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

wwwplantphysiolorgon August 22 2020 - Published by Downloaded from Copyright copy 2017 American Society of Plant Biologists All rights reserved

  • Parsed Citations
  • Article File
  • Figure 1
  • Figure 2
  • Figure 3
  • Figure 4
  • Figure 5
  • Figure 6
  • Figure 7
  • Figure 8
  • Parsed Citations
Page 46: 1 : Maize RNA-Seq GCNs · 4 Author names and affiliation: 5 Ji Huang, ... and the genome annotation is still far from complete (Mark 49 Cigan et al., 2005; Ficklin and Feltus, 2011)

COB the co-expression browser PLoS One doi 101371journalpone0099193Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schmidt D (2016) Co-Operation Fast Correlation Covariance and Cosine Similarity R packagePubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schnable P Ware D Fulton R Stein J (2009) The B73 maize genome complexity diversity and dynamics Science (80- ) 326 1112-1115Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Schwanhaumlusser B Busse D Li N Dittmar G Schuchhardt J Wolf J Chen W Selbach M (2011) Global quantification of mammalian geneexpression control Nature 473 337-342

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Scott-Boyer M-P Haibe-Kains B Deschepper CF (2013) Network statistics of genetically-driven gene co-expression modules in mousecrosses Frontiers in Genetics 20134291

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Serin EAR Nijveen H Hilhorst HWM Ligterink W (2016) Learning from Co-expression Networks Possibilities and Challenges FrontPlant Sci 7 444

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Shannon P Markiel A Ozier O Baliga NS Wang JT Ramage D Amin N Schwikowski B Ideker T (2003) Cytoscape a softwareenvironment for integrated models of biomolecular interaction networks Genome Res 13 2498-2504

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Sing T Sander O Beerenwinkel N Lengauer T (2005) ROCR visualizing classifier performance in R Bioinformatics 21 3940-3941Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Singh M Goel S Meeley RB Dantec C Parrinello H Michaud C Leblanc O Grimanelli D (2011) Production of viable gametes withoutmeiosis in maize deficient for an ARGONAUTE protein Plant Cell 23 443-458

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Song L Langfelder P Horvath S (2012) Comparison of co-expression measures mutual information correlation and model basedindices BMC Bioinformatics 13 328

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Stelpflug SC Rajandeep S Vaillancourt B Hirsch CN Buell CR Leon N De Kaeppler SM (2015) An expanded maize gene expressionatlas based on RNA-sequencing and its use to explore root development Plant Genome 314-362

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Szklarczyk D Franceschini A Wyder S Forslund K Heller D Huerta-Cepas J Simonovic M Roth A Santos A Tsafou KP et al (2015)STRING v10 Protein-protein interaction networks integrated over the tree of life Nucleic Acids Res 43 D447-D452

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Tzfadia O Diels T De Meyer S Vandepoele K Aharoni A Van de Peer Y (2016) CoExpNetViz Comparative Co-Expression NetworksConstruction and Visualization Tool Front Plant Sci 6 1194

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Usadel B Obayashi T Mutwil M Giorgi FM Bassel GW Tanimoto M Chow A Steinhauser D Persson S Provart NJ (2009) Co-expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51 wwwplantphysiolorgon August 22 2020 - Published by Downloaded from

Copyright copy 2017 American Society of Plant Biologists All rights reserved

expression tools for plant biology opportunities for hypothesis generation and caveats Plant Cell Environ 32 1633-51Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

USDA (2016) Grain World Markets and Trade

Vanholme R Demedts B Morreel K Ralph J Boerjan W (2010) Lignin biosynthesis and structure Plant Physiol 153 895-905Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Verdin E Ott M (2014) 50 Years of Protein Acetylation From Gene Regulation To Epigenetics Metabolism and Beyond Nat Rev MolCell Biol 16 258-264

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Walley JW Sartor RC Shen Z Schmitz RJ Wu KJ Urich MA Nery JR Smith LG Schnable JC Ecker JR (2016) Integration of omicnetworks in a developmental atlas of maize Science 353 814-818

Pubmed Author and TitleCrossRef Author and TitleGoogle Scholar Author Only Title Only Author and Title

Wang W, Zhou XJ, Liu Z, Sun F (2015a) Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 16: S6

Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H (2015b) Inferring gene-gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 9: 300-323

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63

Wei C, Li J, Bumgarner RE (2004) Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 5: 87

Wolfe CJ, Kohane IS, Butte AJ (2005) Systematic survey reveals general applicability of guilt-by-association within gene coexpression networks. BMC Bioinformatics 6: 227

Wu D-D, Wang X, Li Y, Zeng L, Irwin DM, Zhang Y-P (2014) Out of Pollen Hypothesis for Origin of New Genes in Flowering Plants: Study from Arabidopsis thaliana. Genome Biol Evol 6: 2822-2829

Yang H, Liu X, Xin M, Du J, Hu Z, Peng H, Rossi V, Sun Q, Ni Z, Yao Y (2016) Genome-wide Mapping of Targets of Maize Histone Deacetylase HDA101 Reveals Its Function and Regulatory Mechanism during Seed Development. Plant Cell 28: 629-645

Yim WC, Yu Y, Song K, Jang CS, Lee B-M (2013) PLANEX: the plant co-expression database. BMC Plant Biol 13: 83

Zheng W, Chung LM, Zhao H (2011) Bias detection and correction in RNA-Sequencing data. BMC Bioinformatics 12: 290

Zhong R, Allen JD, Xiao G, Xie Y (2014) Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PLoS One 9: e106319

Zhong R, Ye ZH (2015) Secondary cell walls: Biosynthesis, patterned deposition and transcriptional regulation. Plant Cell Physiol 56: 195-214

Zhu G, Wu A, Xu X-J, Xiao P, Lu L, Liu J, Cao Y, Chen L, Wu J, Zhao X-M (2016) PPIM: A protein-protein interaction database for Maize. Plant Physiol 170 pp1501821-

Zyprych-Walczak J, Szabelska A, Handschuh L, Górczak K, Klamecka K, Figlerowicz M, Siatkowski I (2015) The impact of normalization methods on RNA-Seq data analysis. Biomed Res Int 2015


expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633-51

USDA (2016) Grain: World Markets and Trade

Vanholme R, Demedts B, Morreel K, Ralph J, Boerjan W (2010) Lignin biosynthesis and structure. Plant Physiol 153: 895-905

Verdin E, Ott M (2014) 50 Years of Protein Acetylation: From Gene Regulation To Epigenetics, Metabolism and Beyond. Nat Rev Mol Cell Biol 16: 258-264

Walley JW, Sartor RC, Shen Z, Schmitz RJ, Wu KJ, Urich MA, Nery JR, Smith LG, Schnable JC, Ecker JR (2016) Integration of omic networks in a developmental atlas of maize. Science 353: 814-818

