Short Title: Repli-Seq of the Arabidopsis DNA replication program

Corresponding Author: Linda Hanley-Bowdoin, [email protected]

Genome-Wide Analysis of the Arabidopsis thaliana Replication Timing Program (1)

Lorenzo Concia (a,2), Ashley M. Brooks (a), Emily Wheeler (a), Gregory J. Zynda (b), Emily E. Wear (a), Chantal LeBlanc (c,3), Jawon Song (b), Tae-Jin Lee (a,4), Pete E. Pascuzzi (d), Robert A. Martienssen (c), Matthew W. Vaughn (b), William F. Thompson (a) and Linda Hanley-Bowdoin (a,5)

(a) Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695
(b) Texas Advanced Computing Center, University of Texas at Austin, Austin, TX 78758
(c) Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724
(d) Purdue University Libraries, Purdue University, West Lafayette, IN 47907

Summary Sentence: The Arabidopsis thaliana genome replicates in two non-interacting compartments during early/mid and late S phase.

Authors’ Contributions: Experiments were conceived by LC, RAM, MWV, WFT, and LH-B. Experiments were performed by LC, AMB, EW, EEW, CL and T-JL. Repli-Seq data were analyzed by LC, PP, GJZ, JS, MWV, WFT and LH-B. LC, WFT and LH-B wrote the manuscript with contributions from all authors. All authors read and approved the final manuscript.

(1) Funding Information: This work was supported by a grant (IOS-1025830) from the Plant Genome Research Program of the National Science Foundation to LH-B, WFT, RAM and MWV.

Current addresses: (2) Institute of Plant Sciences Paris-Saclay, Bâtiment 630, Rue Noetzlin, 91190 Gif-sur-Yvette, France; (3) Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT 06511; (4) Syngenta Crop Protection, LLC, Research Triangle Park, NC, 27709

(5) Address correspondence to [email protected]

Plant Physiology Preview. Published on January 4, 2018, as DOI:10.1104/pp.17.01537

Copyright 2018 by the American Society of Plant Biologists. Downloaded from www.plantphysiol.org on June 29, 2018. Copyright © 2018 American Society of Plant Biologists. All rights reserved.


  • ABSTRACT

    Eukaryotes use a temporally regulated process, known as the replication timing program, to ensure that their genomes are fully and accurately duplicated during S phase. Replication timing programs are predictive of genomic features and activity, and considered to be functional readouts of chromatin organization. Although replication timing programs have been described for yeast and animal systems, much less is known about the temporal regulation of plant DNA replication or its relationship to genome sequence and chromatin structure. We used the thymidine analog, 5-ethynyl-2'-deoxyuridine, in combination with flow sorting and Repli-Seq to describe, at high resolution, the genome-wide replication timing program for Arabidopsis thaliana Col-0 suspension cells. We identified genomic regions that replicate predominantly during early, mid and late S phase, and correlated these regions with genomic features and with data for chromatin state, accessibility and long-distance interaction. Arabidopsis chromosome arms tend to replicate early while pericentromeric regions replicate late. Early and mid-replicating regions are gene-rich and predominantly euchromatic, while late regions are rich in transposable elements and primarily heterochromatic. However, the distribution of chromatin states across the different times is complex, with each replication time corresponding to a mixture of states. Early and mid-replicating sequences interact with each other and not with late sequences, but early regions are more accessible than mid regions. The replication timing program in Arabidopsis reflects a bipartite genomic organization with early/mid replicating regions and late regions forming separate, non-interacting compartments. The temporal order of DNA replication within the early/mid compartment may be modulated largely by chromatin accessibility.

  • INTRODUCTION

    In each cell cycle, a cell must produce two identical copies of its genome during S phase. Most of our knowledge about genome replication in higher eukaryotes comes from studies in animals. These studies have indicated that replication is a temporally ordered process (Gilbert, 2010) that occurs in large domains of coordinate replication (replication domains) with multiple origins firing in concert during S phase (MacAlpine et al., 2004; Desprat et al., 2009; Schwaiger et al., 2009; Farkash-Amar and Simon, 2010). The replication timing programs of several metazoan genomes have been characterized (Schübeler et al., 2002; Woodfine et al., 2004; Hiratani et al., 2008; Schwaiger et al., 2009; Hansen et al., 2010). These studies revealed that early replicating chromatin is rich in genes, transcriptionally active, and contains euchromatic histone modifications (Schübeler et al., 2002; Woodfine et al., 2004; Hiratani and Gilbert, 2009; Hansen et al., 2010; Eaton et al., 2011; Lubelsky et al., 2014). Conversely, late replicating chromatin is enriched for heterochromatin and repetitive elements (Gilbert, 2002; Woodfine et al., 2004). Early and late replication domains correlate strongly with the open and closed compartments identified by chromatin conformation capture experiments (Ryba et al., 2010; Yaffe et al., 2010; Pope et al., 2014). These compartments, which are megabases in size, differ widely with respect to nuclease accessibility, gene density, transcriptional activity and epigenetic marks (Lieberman-Aiden et al., 2009; Sexton et al., 2012). Hence, metazoan replication timing programs are predictive of important genomic features and can be considered functional readouts of chromatin organization (Rivera-Mulia et al., 2015).

    Much less is known about how DNA replication occurs temporally and spatially across plant genomes. Although the DNA replication machinery and many aspects of chromatin biology are conserved between plants and animals, there are significant differences, such as the absence in plants of lamins and geminin (Shultz et al., 2007; Thorpe and Charpentier, 2017), which play key roles in chromatin organization and origin function in metazoans. In addition, fundamental processes such as transcriptional regulation have been shown to differ between plants and animals (Meyerowitz, 2002; Hetzel et al., 2016). There is also evidence that the spatiotemporal distribution of replicating DNA is different in plant nuclei than in metazoan cells (Bass et al., 2015). Hence, we cannot assume that DNA replication programs in plants mirror those in animals (Savadel and Bass, 2017).

    Arabidopsis thaliana is an important plant model system because of its small genome, which has been fully sequenced and is well annotated, and the broad range of genomic resources (Arabidopsis Genome Initiative, 2000; Provart et al., 2016). There are genome-wide data available for Arabidopsis chromatin accessibility, histone modifications and chromatin interactions. Because of these resources, Arabidopsis is an ideal system for examining DNA replication programs in plants.

    Our group previously published a description of the replication timing program for Arabidopsis chromosome 4 (Lee et al., 2010). In that study, Arabidopsis suspension cells were pulse-labeled with 5-bromo-2'-deoxyuridine (BrdU) for 1 h, followed by nuclei separation based on DNA content using flow cytometry. Replication was examined in three nuclei populations corresponding to early, mid and late S phase, using a 1-kb tiling microarray platform. While both the spatial resolution and labeling pulse length were comparable to similar studies with metazoans (Schübeler et al., 2002; Hiratani et al., 2008), no major differences were observed between the early and mid S-phase replication profiles for Arabidopsis. This finding led us to conclude that, unlike in animals, the order of origin activation in Arabidopsis in early and mid S phase is stochastic and replication of euchromatin does not follow a strict temporal pattern. Unlike the Arabidopsis chromosome 4 replication timing profiles, we recently observed differences between the early and mid S-phase profiles during replication of the maize genome (Wear et al., 2017). To address these conflicting results, we reexamined the Arabidopsis replication program, focusing more closely on sequences replicating in early and mid S phase. In the process, we adapted our flow cytometry strategy and the Repli-seq methodology to better distinguish between early and mid S replication. We generated a high-resolution replication timing map for the entire Arabidopsis genome, and correlated the replication program with chromatin state, accessibility and interaction data.

  • RESULTS

    Improving Resolution of the Replication Timing Protocol

    We examined several factors that might improve our ability to resolve differences in replication timing. These included the analysis platform used to detect newly synthesized DNA, the thymidine analog used to pulse label nascent DNA, the length of the labeling period, and the flow cytometry strategy for separating nuclei in different stages of S phase.

    Initially, we sought to improve our ability to distinguish sequences replicating in early versus mid S phase by using a more advanced NimbleGen microarray platform with shorter, more closely spaced probes to better resolve replicating DNA sequences. In this experiment, we used the same protocol as our previous study, with the exception of the array platform. The replication timing profiles generated using the NimbleGen arrays show finer structure than those obtained from the tiling arrays (Supplemental Fig. S1). However, the overall replication profiles are very similar for the two array platforms, with early and mid S-phase signals showing very high correlations on both platforms (Supplemental Fig. S2). Thus, we concluded that probe resolution was not a major factor in our ability to distinguish early and mid S-phase replication.

    We then focused on reducing the labeling time and obtaining better separation of early and mid-replicating nuclei (Fig. 1, A and B) (Bass et al., 2014; Wear et al., 2016). Arabidopsis cultured cells were pulse labeled with the thymidine analog, 5-ethynyl-2'-deoxyuridine (EdU), for 10 minutes. After formaldehyde fixation and nuclei isolation, the incorporated EdU was conjugated with Alexa Fluor 488 (AF488) azide using Click chemistry (Salic and Mitchison, 2008). Nuclei were then stained with DAPI and fractionated by flow cytometry using a two-color sort strategy based on EdU incorporation (AF488) and DNA content (DAPI). EdU-labeled nuclei were fractionated into early, mid and late S-phase populations (Fig. 1B, upper panel), while non-replicating G1 and G2 nuclei were excluded based on the absence of EdU. The S-phase gates were assigned by dividing the EdU arc into five equal sections based on DNA content, with the first, third and fifth sections defined as early, mid and late S phase. This resulted in narrower, better separated sorting gates than in our previous experiments, reducing the range of total DNA content within each S-phase fraction and minimizing cross contamination between fractions.
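The gate assignment described above can be illustrated with a minimal sketch. It assumes the EdU arc can be summarized by a linear DNA-content interval, which is a simplification of a two-color sort; the function name `s_phase_gates` and the 2C to 4C range are hypothetical and not taken from the study.

```python
def s_phase_gates(dna_min, dna_max, n_sections=5):
    """Split a DNA-content range into equal-width sections and
    return the first, third and fifth as early, mid and late gates.

    Assumption: gates are defined on a linear DNA-content axis.
    """
    width = (dna_max - dna_min) / n_sections
    sections = [
        (dna_min + i * width, dna_min + (i + 1) * width)
        for i in range(n_sections)
    ]
    # Sections 2 and 4 are left unsorted, which separates the three gates.
    return {"early": sections[0], "mid": sections[2], "late": sections[4]}


# Hypothetical example: DNA content from 2C to 4C.
gates = s_phase_gates(2.0, 4.0)
```

Skipping the second and fourth sections is what keeps the sorted fractions from touching, at the cost of discarding nuclei that fall between gates.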

    Reanalysis of a sample from each fraction by flow cytometry showed minimal overlap ( [...] distribution. Importantly, the read distributions of the mid S-phase samples were clearly different from the early samples (r2 = -0.02 - 0.11).

    Replication profiles were created using the Repliscan pipeline (Zynda et al., 2017). Read counts were averaged over non-overlapping 1-kb bins, and the total number of reads per sample (sequencing depth) was normalized to 1× genome coverage using the RPGC method (Ramírez et al., 2016). Given their high correlations, the biological replicates were combined and sequencing depth was normalized again prior to further analysis. To account for local variation in sequenceability, the normalized read densities were divided by the corresponding densities in the non-replicating G1 reference DNA (Supplemental Fig. S6). Additional low-amplitude variations were removed using a level-3 Haar wavelet transform (Percival and Walden, 2000) to produce smoothed, normalized read density profiles for early, mid and late S phase.
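The three processing steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the Repliscan implementation: depth normalization is approximated by scaling binned counts to a mean of 1, and the level-3 Haar smoothing is approximated by keeping only the coarse approximation, which amounts to averaging blocks of 2^3 = 8 bins. All function names are hypothetical.

```python
def normalize_to_1x(counts):
    """Scale per-bin counts so the genome-wide mean is 1 (a stand-in for RPGC)."""
    mean = sum(counts) / len(counts)
    return [c / mean for c in counts]


def ratio_to_g1(s_phase, g1):
    """Divide S-phase densities by G1 densities; bins with no G1 signal become 0."""
    return [s / g if g > 0 else 0.0 for s, g in zip(s_phase, g1)]


def haar_smooth(signal, level=3):
    """Keep only the level-`level` Haar approximation: each block of
    2**level bins is replaced by its mean. A shorter final block is
    averaged over its own length (an assumption for edge handling)."""
    block = 2 ** level
    smoothed = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        smoothed.extend([sum(chunk) / len(chunk)] * len(chunk))
    return smoothed


# Hypothetical 16-bin example.
early = normalize_to_1x([float(i) for i in range(1, 17)])
g1 = normalize_to_1x([1.0] * 16)
profile = haar_smooth(ratio_to_g1(early, g1))
```

Dividing by G1 before smoothing means that bins which sequence poorly in every sample are corrected individually rather than averaged into their neighbors.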

    We chose not to represent the data as a "log ratio," as is often done in replication timing studies (Lee et al., 2010; Ryba et al., 2011; Pope et al., 2012), because low intensity replication activity transformed to a log ratio would have resulted in a negative number. This creates problems for downstream analyses both computationally and conceptually. Moreover, the ability of log ratio plots to compress extreme values is not necessary here because Repli-Seq profiles cover a limited range of values.
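A short numerical illustration of the point about log ratios, using made-up ratio values: any S-phase/G1 ratio below 1 becomes negative after a log2 transform, whereas the untransformed ratio stays non-negative.

```python
import math

# Hypothetical S-phase/G1 ratios for four bins.
ratios = [0.25, 0.5, 1.0, 2.0]

# Ratios below 1 map to negative values after the transform.
log_ratios = [math.log2(r) for r in ratios]
```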

    Distribution of Replication Activity within Chromosomes

    As illustrated for Arabidopsis chromosome 1 (Fig. 2B, upper panel; Supplemental Fig. S7), visualization of the replication activity at the whole chromosome scale shows a temporal pattern along the chromosome. Early replication intensity is stronger in the distal arms and decreases progressively toward the centromeres. Conversely, late replication is concentrated near the centromere. In contrast, replication during mid S phase is more evenly distributed. The trend is consistent across all five chromosomes, although the early replication signal is less intense in the short arms of the acrocentric chromosomes 2 and 4 (Supplemental Fig. S7).

    Visualization on a smaller scale confirmed that the distal arms replicate mainly in early S and centromeric regions replicate in late S. It also revealed that the proximal arms tend to replicate predominantly in mid S phase, further supporting the trend described above (Fig. 2B, lower panels). We quantified the fraction of replication at each time as a function of the distance from the centromere for all ten Arabidopsis chromosome arms. Because the chromosome arms vary in length, each arm was partitioned into 10 equal-size bins and the fraction of total replication in each bin was determined at each time (Fig. 2D). When the results were plotted as a function of relative distance from the centromere, it was clear that early replication increases as the distance from the centromere increases (Fig. 2D, left panel). In contrast, nearly half of late replication occurs in the three bins closest to the centromere (Fig. 2D, right panel). Mid S replication is more uniformly distributed and clearly different from early replication (Fig. 2D, middle panel).
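The arm-level quantification can be sketched as follows. The sketch assumes each arm is supplied as a list of per-1-kb replication intensities ordered from the centromere outward; the function name `arm_fractions` and the example values are hypothetical.

```python
def arm_fractions(signal, n_bins=10):
    """Partition one chromosome arm into n_bins equal-size bins by
    relative position and return each bin's share of the arm's total
    replication signal.

    `signal` is ordered from the centromere (index 0) to the telomere.
    """
    total = sum(signal)
    n = len(signal)
    sums = [0.0] * n_bins
    for i, value in enumerate(signal):
        # Relative position i / n mapped to a bin index in 0..n_bins-1.
        sums[i * n_bins // n] += value
    return [s / total for s in sums]


# Hypothetical arm whose early signal rises away from the centromere.
fractions = arm_fractions([float(v) for v in range(1, 21)])
```

Working with relative position rather than absolute distance is what makes arms of different lengths comparable in one plot.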

    Early and mid S phase also have distinct features when examined on a fine scale. The differences were especially evident in regions where replication intensities were similar for both time points. Overlaying early and mid-replication profiles in those regions often produced a pattern of alternate early and mid local maxima (Fig. 2C, alternating blue and green line in the top panel), suggestive of replication activity spreading over time from early replicating regions to surrounding mid replicating sequences.

    Segmentation Analysis

    To facilitate more detailed analysis, we partitioned the genome into segments with similar replication times using the Repliscan pipeline (Zynda et al., 2017). This method allows for the possibility that replication of a given locus occurs in more than one time window. Our data showed that no sequence replicated exclusively in a single time window (Fig. 2B; Supplemental Fig. S4). Hence, for a given sequence, we will refer to the "prevalent" time of replication, in which the replication signal is stronger than at the other times. The Repliscan pipeline uses a two-step process to assign a prevalent replication time (RT) to a 1-kb bin based on its replication intensity in early, mid and late S phase. First, 1-kb bins were classified either as replicating or non-replicating based on a threshold established for each chromosome, and only bins with replication intensity above the threshold were used for segmentation analysis. Second, replication signals for each 1-kb bin were divided by the maximum value for that bin, scaling the largest value to 1 and all others between 0 and 1. The bin was then labeled as replicating predominantly at the time with a normalized signal above 0.5. If the bin contained one or more signals within 50% of the highest signal, they were included in the classification. Adjacent bins with the same RT were merged into larger segments. With this approach, we identified segments replicating predominantly in early S phase (E), in both early and mid S phase (EM), only in mid S phase (M), in both mid and late S phase (ML), only in late S phase (L), in early and late S phase (EL), and at all three times (EML) (Fig. 3, A and B). Regions with replication signal below the threshold in all time points were not classified or included in our statistical analyses.
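The two-step classification and the merging of adjacent bins can be sketched as below. This is a simplified reading of the description above rather than the Repliscan code: the replicating/non-replicating test is assumed to compare the strongest of the three signals with the threshold, and the names `classify_bin` and `merge_segments` are hypothetical.

```python
from itertools import groupby


def classify_bin(early, mid, late, threshold):
    """Return the RT class of one 1-kb bin (e.g. "E", "EM", "ML"),
    or None when no signal exceeds the threshold."""
    signals = {"E": early, "M": mid, "L": late}
    peak = max(signals.values())
    if peak <= threshold:
        return None  # treated as non-replicating, left unclassified
    # Scale by the bin maximum and keep every time above 0.5.
    return "".join(
        label for label, value in signals.items() if value / peak > 0.5
    )


def merge_segments(classes, bin_size=1000):
    """Merge runs of adjacent bins with the same class into
    (start, end, class) segments; unclassified bins are skipped."""
    segments = []
    position = 0
    for label, run in groupby(classes):
        length = len(list(run)) * bin_size
        if label is not None:
            segments.append((position, position + length, label))
        position += length
    return segments


# Hypothetical bins: (early, mid, late) intensities with a threshold of 0.2.
bins = [(1.0, 0.3, 0.1), (0.9, 0.2, 0.1), (1.0, 0.8, 0.1),
        (0.1, 0.1, 0.1), (0.1, 0.4, 1.0)]
classes = [classify_bin(e, m, l, 0.2) for e, m, l in bins]
segments = merge_segments(classes)
```

Because the strongest signal always scales to 1, every replicating bin receives at least one label, and the seven possible label combinations correspond to the classes E, EM, M, ML, L, EL and EML.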

    The cumulative genomic coverage of each RT class is shown in Fig. 3C. A single prevalent time of replication was identified for more than half of the genome (31% E + 20% M + 7% L = 58%), while most of the rest of the genome was evenly split between EM (21%) and ML (20%). The EL and EML segment classes together constituted about 1% of the genome, and 2.5% of the genome could not be classified. Given the clear separation of the sorting gates used to generate the early, mid and late populations (Fig. 1B, upper panel), it is noteworthy that 41% of the Arabidopsis genome replicates in the intermediate EM and ML classes. The timing heterogeneity may reflect the presence of subpopulations of cells with related but distinct replication programs and/or allelic heterogeneity that may have arisen during prolonged cell culture (Wang and Wang, 2012). The low coverage of L segments relative to E and M is also noteworthy because the width (range of DNA content) of the three sorting gates was equivalent (Fig. 1B, upper panel).

    The distribution of replication timing segments is similar for the five Arabidopsis chromosomes with the exception of the short arms of chromosomes 2 and 4, which have very few early segments (Fig. 3B). The distal portions of longer chromosome arms are covered with large E segments (>50-100 kb) interspersed with small EM and M segments ( [...] detected several EL and EML segments, but due to their small size and low frequency, we did not include them in subsequent analyses (Fig. 3D).

    Replication Time and Genomic Features

    To explore the relationship between replication timing and major genomic features, we queried the Repli-Seq data using Araport11 genome annotations (Cheng et al., 2017). A visual comparison of the RT segmentation data with genes and transposable elements (Fig. 4A; Supplemental Fig. S9) showed that the gene-rich chromosome arms replicate in E, EM and M while the TE-rich pericentromeric region replicates in ML and L, as described above and reported previously (Lee et al., 2010). To obtain a more detailed picture, we computed the cumulative overlaps of genes, pseudogenes, TEs and unannotated sequences with RT classes for the entire Arabidopsis genome. The overlaps were expressed as a percent of total genomic coverage for a given feature to adjust for abundance differences (Fig. 4B). This analysis gave results similar to the visual inspection of chromosome 1, with segment coverage of genes highest in E and EM and TEs highest in ML and L.

    To assess whether the distributions of the genomic features across the RT classes are statistically different from the distribution across the whole genome, we built a contingency table with the absolute overlaps expressed as the number of 1-kb bins (Table 2) and applied a chi-square test for homogeneity. Differences in the overlaps showed high statistical significance (p-value < 2.2E-16, χ² = 25,561, df = 12). However, when analyzing a large population (N = 116,063), small differences between observed and expected values almost always generate a statistically significant p-value (Sullivan and Feinn, 2012). For this reason, we estimated the "effect size" of the test, defined as the "magnitude of association between categorical variables" (Kotrlik et al., 2011), by calculating Cramér's V statistic. The Cramér's V value for our data was 0.27, within the 0.2 - 0.4 range for a "moderate association" (Rea and Parker, 2005), indicating a nonrandom distribution of genomic features in the RT classes.
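The reported effect size can be checked from the reported test statistics using the standard definition V = sqrt(χ² / (N · (min(r, c) − 1))). The sketch below assumes a 4 × 5 contingency table (four feature types by five RT classes), which is inferred from df = 12 and the classes discussed in the text; Table 2 itself is not reproduced here.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V for an n_rows x n_cols contingency table."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Values reported in the text; the 4 x 5 table shape is an assumption.
v = cramers_v(chi2=25561, n=116063, n_rows=4, n_cols=5)
```

Under that assumption the result rounds to 0.27, in agreement with the value stated above.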

    Next, we identified which genomic features and replication timing segments overlapped either more or less than expected by examining the sign and value of the chi-square adjusted residuals (Agresti, 2007). We split the adjusted residuals into tertiles and classified the relevant combinations as overrepresented (highest tertile), underrepresented (lowest tertile), or similar to expected (central tertile). The arrows and dots in Fig. 4B indicate the assigned category.
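A minimal sketch of the residual analysis follows. The adjusted residual for cell (i, j) uses the standard form (observed − expected) / sqrt(expected · (1 − row share) · (1 − column share)). How the tertile boundaries were computed in the study is not stated, so the rank-based cut used here is an assumption, and the example tables are hypothetical.

```python
import math


def adjusted_residuals(table):
    """Adjusted (standardized) chi-square residuals for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = (expected
                        * (1 - row_totals[i] / n)
                        * (1 - col_totals[j] / n))
            out.append((observed - expected) / math.sqrt(variance))
        result.append(out)
    return result


def classify_by_tertile(residuals):
    """Label each residual by tertile of all residuals (rank-based cut,
    an assumption): lowest = "under", highest = "over", else "expected"."""
    flat = sorted(v for row in residuals for v in row)
    lower = flat[len(flat) // 3]
    upper = flat[(2 * len(flat)) // 3]

    def label(value):
        if value < lower:
            return "under"
        if value >= upper:
            return "over"
        return "expected"

    return [[label(v) for v in row] for row in residuals]


# Hypothetical 2 x 2 table of 1-kb bin counts.
residuals = adjusted_residuals([[30, 10], [10, 30]])
labels = classify_by_tertile([[-3.0, -1.0, 0.0], [0.5, 2.0, 4.0]])
```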

    The statistical analysis confirmed that genes are over-represented in E, EM and M and under-represented in ML and L segments (Fig. 4B). Pseudogenes are enriched in ML segments. This may be due in part to the association of "processed pseudogenes," the products of retrotransposition events (Zheng et al., 2007), with TE-rich pericentromeric regions that replicate in ML (Fig. 4A). Unannotated regions overlap more with E and less with M and ML segments relative to the total genome. The enrichment of unannotated regions in E segments may reflect the fact that the distances between genes in the distal arms are generally much longer than the spaces between TEs or between genes and TEs in the pericentromeres (Fig. 4A). It is worth noting that depletion of unannotated regions in L segments is not statistically significant and, instead, is most likely due to poor annotation of the centromeric regions.

    We then determined the number of protein coding genes, pseudogenes and TE genes in each RT class (Fig. 4C). To control for differences between RT segment coverage, the counts were normalized over the genomic coverage for each RT class and expressed as the number of elements per Mb. The densities of protein coding genes in E (287/Mb), EM (308/Mb) and M (291/Mb) are very similar, then drop in ML (173/Mb) and L (58/Mb) segments. Conversely, TE genes are very sparse in E (4/Mb), EM (10/Mb) and M (18/Mb) but densely packed in ML (79/Mb) and L (155/Mb) segments. The density of pseudogenes across the RT classes is low due to their low number in the Arabidopsis genome.
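The per-Mb normalization is a simple ratio, sketched here with invented counts and coverage; the inputs below are hypothetical and were chosen only so that the output matches the order of magnitude of the densities quoted above.

```python
def density_per_mb(count, coverage_bp):
    """Number of annotated elements per megabase of RT class coverage."""
    return count / (coverage_bp / 1_000_000)


# Hypothetical example: 5,740 genes in 20 Mb of one RT class.
example_density = density_per_mb(5740, 20_000_000)
```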

    We also computed the fraction of each segment covered by genes, TEs and unannotated sequences and generated boxplots showing the range of coverage within each RT class (Fig. 4D). Consistent with the other analyses, E, EM and M segments are gene-rich (left panel) and depleted in TEs (center panel), while ML and L segments are TE-rich and have lower gene content. The unannotated region content of different RT classes is more uniform (right panel), with slightly higher content in E and EM.

    Together, our results indicated that the genomic features associated with the M segments are more similar to those in E and EM segments than in ML and L segments. This was true even though the M segments replicate at a distinct stage of S phase and are more likely to be located in the proximal regions of the chromosome arms, while the E and EM segments are predominantly located in the distal regions.

    The above analyses only used sequence tags that mapped uniquely to the Arabidopsis genome and, as such, did not address replication timing of repetitive sequences. To analyze replication timing of repeats, we queried all the reads after initial processing with TEL, CEN, 45S and 5S repeat sequences from the Plant Repeat Databases (Ouyang and Buell, 2004). Arabidopsis telomeric sequences consist of 2-5 kb stretches of 5'-CCCTAAA-3' repeat units (TEL) (Richards and Ausubel, 1988), while centromeres and pericentromeres contain about 20,000 copies of a 180-bp satellite repeat (CEN) in long arrays extending for several megabases (Lermontova et al., 2015). The 570-750 copies per haploid genome of 45S rRNA genes (45S rDNA) form two 4-Mbp arrays in nucleolar organizing regions located at the ends of the short arms of chromosomes 2 and 4 (Copenhaver and Pikaard, 1996; Havlová et al., 2016). The pericentromeres of chromosomes 3, 4 and 5 also contain heterogeneous arrays including about 1000 copies of the 5S rRNA genes (5S rDNA) (Vaillant et al., 2007).

    For each S-phase dataset, we computed the fraction of reads aligning to each repeat consensus and normalized it to the fraction of reads in the G1 control that aligned to the same consensus (Fig. 4E). The resulting ratio is a measure of enrichment or depletion of a given repeat in reads from early, mid or late S phase. CEN sequences are strongly enriched in late S phase and depleted in early and mid, in agreement with the late replication timing of the centromeres (Fig. 2A; Supplemental Fig. S7). TEL sequences replicate preferentially in early and mid S phase but replication activity is also detectable in late S phase. The lack of a single predominant replication time is likely due to asynchrony between telomeres. In human cells, the telomere replication program is chromosome-specific and influenced by sequences in sub-telomeric regions (Arnoult et al., 2010). Replication of both 5S and 45S rDNA occurs primarily in late S phase, consistent with sequestration and silencing of most 5S and 45S rDNA gene copies by repressive heterochromatin (Layat et al., 2012). However, some 5S and 45S rDNA genes are transcriptionally active and packaged into permissive euchromatin (Douet and Tourmente, 2007; Hamperl et al., 2013; Dvorackova et al., 2017). These active fractions may be the source of the 5S and 45S rDNA reads in the early and mid S-phase datasets.
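
    The normalization to the G1 control described above can be sketched in Python. This is a minimal illustration and not the pipeline used in the study: the function name `repeat_enrichment` and all read counts are hypothetical values chosen only to show the arithmetic.

```python
def repeat_enrichment(sample_hits, sample_total, g1_hits, g1_total):
    """Ratio of the repeat-aligned read fraction in an S-phase sample
    to the same fraction in the G1 control (>1 enriched, <1 depleted)."""
    if sample_total <= 0 or g1_total <= 0 or g1_hits <= 0:
        raise ValueError("read counts must be positive")
    sample_fraction = sample_hits / sample_total
    g1_fraction = g1_hits / g1_total
    return sample_fraction / g1_fraction


# Hypothetical counts of reads matching the CEN consensus (illustrative only).
g1 = {"hits": 20_000, "total": 1_000_000}
s_phase = {
    "early": {"hits": 5_000, "total": 1_000_000},
    "mid": {"hits": 10_000, "total": 1_000_000},
    "late": {"hits": 60_000, "total": 1_000_000},
}

cen_enrichment = {
    stage: repeat_enrichment(c["hits"], c["total"], g1["hits"], g1["total"])
    for stage, c in s_phase.items()
}
print(cen_enrichment)
```

    With these made-up counts the ratio is below 1 in early and mid S phase and above 1 in late S phase, which is the qualitative pattern reported for CEN.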

    Replication Time and Chromatin States

    Chromatin structure influences the replication program (Hiratani et al., 2008; Schwaiger et al., 2009; Picard et al., 2014), with early replication associated with euchromatin and late replication associated with heterochromatin (Ding and MacAlpine, 2011). Some combinations of epigenetic marks occur together more frequently than others (Kharchenko et al., 2011; Roudier et al., 2011; Sequeira-Mendes and Gutierrez, 2016). These combinations define chromatin states that describe the local chromatin environment more accurately than the traditional binary classification and may correlate better with replication timing programs.

    Wang et al. (2015) classified Arabidopsis chromatin into six states (CS) using 16 epigenetic marks. We chose this classification because it is biologically compatible with the large size of replication timing segments compared to other functional regions like transcription units. The classification described two euchromatic states (CS1 and CS5), two heterochromatic states (CS6 and CS3), and two intermediate states (CS2 and CS4). Chromatin devoid of any of the 16 histone marks was defined as "unclear" or CS0.

    We used these chromatin states to examine the relationship between chromatin structure and replication timing. First, we calculated the overlap between each CS and RT class (Fig. 5A). Applying the same procedure as for genomic features, we built a contingency table (Supplemental Table S3) and performed a chi-square test (p-value < 2.2E-16, χ² = 44,932, df = 24). The associated Cramer's V statistic is equal to 0.31, indicating a non-random distribution of chromatin states in RT classes. The adjusted residuals for each combination of RT class and CS were classified into three tertiles indicated by the black arrows and dots in Fig. 5A.
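
    The statistics named above (chi-square statistic, degrees of freedom, Cramer's V and adjusted residuals) can be computed from any contingency table as in the following sketch. It is written in plain Python under stated assumptions: the function name `chi_square_summary` and the toy counts are hypothetical, the p-value is not computed, and the real counts are in Supplemental Table S3, which is not reproduced here. A table with seven chromatin-state rows (CS0 to CS6) and five RT-class columns would give df = (7 - 1)(5 - 1) = 24, matching the reported value, although that layout is itself an assumption.

```python
import math


def chi_square_summary(table):
    """Return (chi2, dof, cramers_v, adjusted_residuals) for a 2-D count table."""
    n_rows, n_cols = len(table), len(table[0])
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)

    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed - expected) ** 2 / expected
            # Adjusted (standardized) residual for cell (i, j).
            variance = expected * (1 - row_sums[i] / total) * (1 - col_sums[j] / total)
            res_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(res_row)

    dof = (n_rows - 1) * (n_cols - 1)
    cramers_v = math.sqrt(chi2 / (total * (min(n_rows, n_cols) - 1)))
    return chi2, dof, cramers_v, residuals


# Toy 2 x 2 table (hypothetical counts, not data from the study).
toy_chi2, toy_dof, toy_v, toy_residuals = chi_square_summary([[30, 10], [10, 30]])
print(toy_chi2, toy_dof, toy_v)
```

    Large positive adjusted residuals mark combinations that are over-represented relative to independence, and large negative values mark under-represented ones.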

    Inspection of the overlap between chromatin states and RT classes revealed that the heterochromatic CS6 and CS3 are more abundantly represented in late replicating regions. However, there is no simple relationship between chromatin states and the replication timing segments (Fig. 5A). All of the chromatin states except for CS6 and CS3 include readily discernible amounts of DNA replicating in each portion of S phase except for late. There is no clear difference in the distribution of RT classes for the euchromatic states, CS1 and CS5, and the intermediate states, CS2 and CS4. While there are small differences in the amount of early replication associated with CS1, CS5, CS2 and CS4, none of these non-heterochromatic states display a strong preference for any particular replication time (cf. the % RT class coverage in Fig. 5B and Table 3).

    Each RT class also contains multiple chromatin states (Fig. 5C). The most striking differences in CS content are found between the three early to mid RT classes and the ML and L classes. E, EM and M have substantial amounts of CS1, CS5, CS2 and CS4, while the L class consists primarily of the heterochromatic states, CS6 and CS3. The ML class includes a similar amount of CS6 but is greatly reduced for CS3, which is characterized by the canonical heterochromatin marks H3K27me1 and H3K9me2 (Luo et al., 2013). Instead, the ML class has a large fraction of CS4 and smaller amounts of CS1, CS5 and CS2, and appears transitional between the early to mid RT classes and the L class. This idea is supported by the pairwise Spearman correlation coefficients in the similarity matrix (Fig. 5D) showing that the chromatin compositions of E, EM and M are similar, while L has a distinctive heterochromatic signature and ML is in between.

    Replication Timing and Chromatin Accessibility

    Replication timing also correlates with chromatin accessibility (Farkash-Amar and Simon, 2010; Hansen et al., 2010; Yaffe et al., 2010; Takebayashi et al., 2012). In plants, open chromatin has been associated with higher gene density and higher levels of transcription (Zhang et al., 2012; Vera et al., 2014), but these studies did not examine the relationship between chromatin accessibility and replication timing. Hence, we compared our replication timing data with the genome-wide mapping of 34,254 DNase I hypersensitive sites (DHS) by Sullivan et al. (2014). We calculated the number of DHS per kb for each replication timing segment and plotted the distribution of the DHS densities for each RT class (Fig. 6A). The number of DHS/kb progressively decreases from E to L segments. Interestingly, only E and EM show a median DHS density above the genome average (0.28 DHS/kb). Only about 25% of M segments contain more DHS than the average, while 25% of ML and 50% of L segments do not contain any DHS.
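
    The per-segment DHS density calculation can be illustrated with a short Python sketch. It assumes, for illustration only, that each DHS is represented by a single midpoint coordinate and that segments are half-open intervals on one chromosome; the function name `dhs_density_by_class` and all coordinates are hypothetical.

```python
from bisect import bisect_left
from statistics import median


def dhs_density_by_class(segments, dhs_positions):
    """Group DHS densities (sites per kb) by RT class.

    segments: list of (start, end, rt_class), half-open bp coordinates.
    dhs_positions: DHS midpoints in bp on the same chromosome.
    """
    positions = sorted(dhs_positions)
    densities = {}
    for start, end, rt_class in segments:
        # Count DHS midpoints with start <= position < end.
        n_sites = bisect_left(positions, end) - bisect_left(positions, start)
        densities.setdefault(rt_class, []).append(n_sites / ((end - start) / 1000))
    return densities


# Hypothetical segments and DHS midpoints (not data from the study).
toy_segments = [(0, 10_000, "E"), (10_000, 30_000, "L"), (30_000, 40_000, "E")]
toy_dhs = [500, 2_500, 7_000, 9_999, 12_000, 31_000, 35_000]

toy_densities = dhs_density_by_class(toy_segments, toy_dhs)
for rt_class, values in toy_densities.items():
    print(rt_class, values, "median:", median(values))
```

    The per-class lists returned here are the kind of input a boxplot such as Fig. 6A would summarize, and the class medians can be compared with a genome-wide average density.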

    To gain further insight into the relationship between DHS density and replication timing, regions of high DHS density were compared with regions showing high local replication activity in early, mid or late S (Fig. 6B). There is an association between DHS site density and local maxima for replication in early S. In contrast, mid replication activity tends to decline around the regions of highest DHS density. There are many fewer DHS sites in centromeric and pericentromeric regions (Fig. 6C), and the DHS sites that are present in these regions do not overlap with local maxima of late replication. Instead, the peaks of DHS density in these regions are often associated with small peaks of early replication interspersed among the much stronger regions of late replication (Fig. 6C).

    The DHS analysis indicated that an open chromatin structure is associated with early replication activity, whereas chromatin replicating in mid S phase, although still classified as euchromatic, is less accessible. This behavior suggests a sequential model for euchromatin replication, starting in regions that can be accessed readily by the replication machinery and then spreading to less accessible regions. In contrast, late replication activity appears unaffected by short-range variations in DHS density, raising the possibility that a different mechanism regulates replication timing within heterochromatin, possibly involving long-range, subnuclear topology similar to what has been suggested for larger genomes (Pope et al., 2014).

    Replication Timing and Long-Range Chromosome Interactions

    Chromosome conformation capture (Hi-C) techniques, which characterize long-distance interactions and reveal large-scale spatial patterns of chromatin, have uncovered two distinct subnuclear compartments in animals (Lieberman-Aiden et al., 2009; Hou et al., 2012; Zhang et al., 2012). These compartments, which differ widely in nuclease accessibility, gene density, transcriptional activity and epigenetic marks, correlate with early and late replicating domains that span 0.1-2 Mbp (Ryba et al., 2010; Ryba et al., 2011).

    Hi-C analysis of the Arabidopsis genome has indicated that its spatial organization is much simpler. Arabidopsis telomeres interact more frequently with other telomeres and with the distal regions of their adjacent chromosome arms, while pericentromeres interact with the adjacent proximal regions of their chromosome arms as well as with other pericentromeres (Feng et al., 2014; Grob et al., 2014). This bipartite configuration recalls the overall distribution of replication activity in early, mid and late S phase (Fig. 4A). To examine the relationship between three-dimensional proximity and replication timing patterns, we compared the RT classes to the chromosome conformation capture datasets described by Liu et al. (2016). We chose this dataset because its reproducibility was established by an earlier study (Wang et al., 2015).

    We aligned the Hi-C reads to the TAIR10 reference genome and identified significant interactions (p-value < 0.001) at 100-kb resolution. To focus attention on long-range interactions, we imposed a minimum 1-Mbp separation between interacting loci because of the strong bias toward local interactions (Dekker et al., 2002; Lieberman-Aiden et al., 2009). We also did not consider inter-chromosomal interactions because the in-solution ligation method used to generate this dataset is known to inflate the number of trans interactions (Nagano et al., 2015). Finally, we excluded sequences within 1 Mbp of telomeres because telomeres tend to interact with very high frequency compared to the rest of the genome (Supplemental Fig. S10) (Feng et al., 2014).
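
    The four filters described above (significance cutoff, intra-chromosomal pairs only, minimum 1-Mbp separation, and exclusion of the terminal 1 Mbp of each chromosome) can be expressed as a single predicate. The sketch below is an illustration under assumptions: the record layout, the function name `keep_interaction`, the use of bin midpoints, and the rounded chromosome lengths are all hypothetical and do not describe the software used in the study.

```python
MIN_SEPARATION = 1_000_000   # bp, minimum distance between interacting loci
TELOMERE_MARGIN = 1_000_000  # bp, excluded zone at each chromosome end
P_CUTOFF = 0.001             # significance threshold stated in the text


def keep_interaction(hit, chrom_lengths):
    """Return True if a bin pair passes all four filters.

    hit: dict with chrom1, pos1, chrom2, pos2 (bin midpoints, bp) and p_value.
    chrom_lengths: {chromosome name: length in bp}.
    """
    if hit["p_value"] >= P_CUTOFF:
        return False
    if hit["chrom1"] != hit["chrom2"]:  # drop inter-chromosomal pairs
        return False
    if abs(hit["pos2"] - hit["pos1"]) < MIN_SEPARATION:
        return False
    length = chrom_lengths[hit["chrom1"]]
    for pos in (hit["pos1"], hit["pos2"]):
        if pos < TELOMERE_MARGIN or pos > length - TELOMERE_MARGIN:
            return False
    return True


# Rounded, illustrative chromosome lengths and hypothetical bin pairs.
toy_lengths = {"Chr1": 30_000_000, "Chr2": 20_000_000}
toy_hits = [
    {"chrom1": "Chr1", "pos1": 5_050_000, "chrom2": "Chr1", "pos2": 12_050_000, "p_value": 1e-5},
    {"chrom1": "Chr1", "pos1": 5_050_000, "chrom2": "Chr1", "pos2": 5_550_000, "p_value": 1e-5},
    {"chrom1": "Chr1", "pos1": 5_050_000, "chrom2": "Chr2", "pos2": 9_050_000, "p_value": 1e-5},
    {"chrom1": "Chr1", "pos1": 450_000, "chrom2": "Chr1", "pos2": 10_050_000, "p_value": 1e-5},
    {"chrom1": "Chr1", "pos1": 5_050_000, "chrom2": "Chr1", "pos2": 12_050_000, "p_value": 0.01},
]
kept = [hit for hit in toy_hits if keep_interaction(hit, toy_lengths)]
print(len(kept), "of", len(toy_hits), "pairs retained")
```

    Only the first toy pair survives; the others fail on separation, chromosome identity, telomere proximity and significance, respectively.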

    Significant Hi-C interactions and associated RT classes are shown for Arabidopsis chromosome 1 in Fig. 7A. Three main groups of interactions are apparent: interactions within the pericentromere (Mbp 13.5-16.5), within each chromosome arm, and between the distal parts of the two arms. This pattern agrees well with the large-scale pattern of early-replicating arms and late-replicating pericentromeres (Fig. 3B; Fig. 4A). Interestingly, while pericentromeric sequences mainly interact among themselves, the distal arms contact other early replicating regions on both chromosome arms. All chromosomes show a similar organization (Supplemental Fig. S11), except the short arms of the acrocentric chromosomes 2 and 4. These results suggested that sequences in spatial proximity within the nucleus tend to replicate at the same time during S phase, irrespective of their map positions along the chromosome.

    We then analyzed the pairs of interacting bins identified by Hi-C to determine the interaction profile for each RT class. The resolution of our replication data is much higher than that of the Hi-C data, so each Hi-C bin can contain multiple RT classes. To address this, we analyzed separately all the interacting pairs of Hi-C bins in which the first bin included a given RT class. Next, we summarized the RT segment classes in the second bin in the pair (Fig. 7B). Some pairs were assigned multiple times corresponding to each RT segment class included in the first bin. We performed the analysis in both directions with similar results, confirming that the choice of the first and second bins in each interacting pair did not influence the outcome (Supplemental Fig. S12). The E, EM and M segment classes have nearly identical interaction profiles, with a slight increase of ML and L segments in bins interacting with EM and M segments relative to E. The L segments interact preferentially with ML and L bins, while the ML segments interact with all RT classes. The ML and L groups are smaller than the E, EM and M groups due to the reduced genomic coverage of these classes (Fig. 3C). To account for this disparity, we expressed the interaction profiles as percent of total coverage for each interaction group (Fig. 7C; Table 4). The cumulative interaction profile of all the groups taken together is also shown for reference.
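
    One possible reading of the procedure above is sketched below in Python. It is an interpretation rather than the authors' implementation: the function name `interaction_profiles`, the representation of each Hi-C bin as a mapping from RT class to covered base pairs, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict

RT_CLASSES = ("E", "EM", "M", "ML", "L")


def interaction_profiles(pairs, bin_coverage):
    """Sum partner-bin RT coverage for every RT class present in the first bin.

    pairs: iterable of (first_bin, second_bin) identifiers.
    bin_coverage: {bin_id: {rt_class: covered_bp}}.
    Returns {rt_class: {partner_rt_class: percent_of_group_total}}.
    """
    totals = {c: defaultdict(float) for c in RT_CLASSES}
    for first, second in pairs:
        for rt_class, covered in bin_coverage[first].items():
            if covered <= 0 or rt_class not in totals:
                continue
            # A pair is counted once per RT class found in its first bin.
            for partner_class, partner_bp in bin_coverage[second].items():
                totals[rt_class][partner_class] += partner_bp

    profiles = {}
    for rt_class, partner_totals in totals.items():
        group_total = sum(partner_totals.values())
        if group_total == 0:
            continue  # no interacting pair had this class in its first bin
        profiles[rt_class] = {
            c: 100 * partner_totals.get(c, 0.0) / group_total for c in RT_CLASSES
        }
    return profiles


# Hypothetical 100-kb bins and interacting pairs (not data from the study).
toy_coverage = {
    "a": {"E": 60_000, "EM": 40_000},
    "b": {"E": 50_000, "M": 50_000},
    "c": {"L": 100_000},
    "d": {"ML": 25_000, "L": 75_000},
}
toy_profiles = interaction_profiles([("a", "b"), ("c", "d")], toy_coverage)
print(toy_profiles)
```

    Expressing each group as a percentage of its own total is what allows small groups such as ML and L to be compared with the larger E, EM and M groups. Swapping the order of each pair and rerunning the function corresponds to the analysis in the opposite direction.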

    We calculated a Pearson correlation matrix for the overlap of the RT classes with interacting partners of each group (Supplemental Table S5) and plotted the results as a heat map (Fig. 7D). The interaction profiles of the E, EM and M groups are strongly correlated, while the L group has a distinct and opposite interaction profile. The interactions of the ML group are intermediate, reinforcing the transitional nature of this RT class. The two interaction clusters related to replication timing, the E/EM/M cluster and the L cluster, correlate with the large-scale organization of chromosomes into early replicating arms and late replicating pericentromeres.
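
    A Pearson correlation matrix of this kind can be computed from per-group interaction profiles as in the sketch below. The profile values are invented to show the calculation (percent of partner coverage in the order E, EM, M, ML, L) and are not the values in Supplemental Table S5; the helper names `pearson` and `correlation_matrix` are likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Pairwise Pearson correlations between named profiles."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Invented interaction profiles (percent of partner coverage: E, EM, M, ML, L).
toy_group_profiles = {
    "E": [30, 25, 25, 15, 5],
    "M": [28, 24, 26, 16, 6],
    "L": [5, 10, 15, 30, 40],
}
toy_matrix = correlation_matrix(toy_group_profiles)
for name, row in toy_matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

    In this toy case the E and M profiles are strongly positively correlated and both are negatively correlated with L, which mirrors the qualitative structure described for Fig. 7D.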

    DISCUSSION

    The Genome-Wide Arabidopsis Replication Program at High Resolution

    We used a new high-resolution strategy to characterize the replication timing program of Arabidopsis suspension cells at the whole genome level. Nearly 60% of the Arabidopsis genome was classified as replicating principally in either early, mid or late S phase. Unlike our earlier study (Lee et al., 2010), clear differences were observed between the sequence populations replicating in early and mid S phase. However, 41% of the genome showed strong replication activity in more than one portion of S phase, indicative of heterogeneity in replication timing.

    Several factors contributed to the increased resolution of our new strategy. Potentially most important, we shortened the labeling time from 1 hour to 10 min after determining that the duration of S phase is only 1.5-1.9 hours for our Arabidopsis cultured cells (Mickelson-Young et al., 2016). We also reduced the widths of the sorting gates and increased the distance between them to minimize cross contamination between nuclei in early, mid and late S phase (Fig. 1). Finally, EdU conjugation to AF488 allowed us to use a two-way sorting strategy to resolve replicating from non-replicating nuclei and reduce contamination of EdU-labeled DNA by unlabeled DNA in the immunoprecipitates.

    The increased resolution is apparent in maps of the raw sequencing reads, which show distinct replication profiles for early, mid and late S phase across 3 highly reproducible biological replicates (Fig. 1C; Supplemental Fig. S5). Although the narrow sorting gates only captured about 50% of the S-phase nuclei, the entire Arabidopsis genome was represented in the read profiles. This may reflect heterogeneity in replication time among genome sequences and/or technical limitations associated with the sensitivity of the flow cytometer, as demonstrated by a study in human cells that used six sorting gates (Hansen et al., 2010).

    The increased resolution is also evident in visual comparisons between the replication profiles for Arabidopsis chromosome 4 generated using a 1-h BrdU pulse versus the 10-min EdU pulse (Supplemental Fig. S13). The profiles for early S phase are very similar, but there are major differences in the mid and late S profiles obtained using the two protocols. These differences correspond to regions that overlap between early and mid or mid and late in the BrdU profiles. The overlap between adjacent time points most likely reflects the inclusion of regions that incorporated BrdU as cells moved from earlier to later S phase during the 1-h pulse, which represents ca. 50% of the length of S phase in the Arabidopsis cultured cells (Mickelson-Young et al., 2016). Notably, there is less overlap between the profiles generated using a 10-min EdU pulse, indicating that the Arabidopsis replication timing program is less stochastic than proposed previously (Lee et al., 2010).

    We presented the EdU replication profiles separately for each time point, rather than assigning a unique replication time to each locus based on the ratio between early and late, as is often described in the literature (Hiratani et al., 2008; Schwaiger et al., 2009; Gilbert, 2010). By doing so, we highlighted the fact that some sequences replicate with high intensity in more than one portion of S phase (Fig. 2B; Supplemental Fig. S4). This almost always happens in consecutive time points, like early-mid or mid-late S phase. However, because of the short pulse length, wide separation between the gates, and sharp separation between populations of sorted nuclei, the heterogeneity is unlikely to be a technical artifact. Given that a sequence can replicate only once in a single cell, this heterogeneity is most likely due to variation between cells in the suspension culture. However, differences between alleles at a locus, often generated in cell cultures by somaclonal variation (Wang and Wang, 2012), may also contribute to the observed heterogeneity.

    Segmentation Analysis

    To reduce the complexity of the data and assign replication times to regions across the Arabidopsis genome, we used the Repliscan pipeline (Zynda et al., 2017) to assign a predominant replication time based on the relative intensity of normalized signal in all three time points. This analysis allowed us to score replication that occurs in more than one time window at a given locus, better representing heterogeneous replication.

    The segmentation analysis assigned a single prevalent replication time (E, M or L) to more than half of the Arabidopsis genome, with the rest divided between EM and ML. Only 1% of the genome was not assigned to a single time or two adjacent times, underscoring the robustness of the segmentation analysis. The shorter labeling time and placement of gates to minimize overlap and emphasize mid-replicating sequences (Fig. 1) led to significant differences in segmentation from our earlier analysis of Arabidopsis chromosome 4 (Lee et al., 2010) (Supplemental Fig. S8). In the current study, 17% of chromosome 4 was classified as EM compared to 37% in the previous study. Concomitantly, sequences classified as E increased to 26% from less than 1% and as M to 22% from 4%. Coverage of L was reduced to 9% from 44%, with most of the late replicating segments located in a few megabases near the centromere. This reduction may reflect the narrower late gate and a shift in its placement to improve resolution. However, sequences classified as ML increased to 23% from 6% and included regions previously regarded as late replicating (Fig. 1B), consistent with the shorter labeling time increasing resolution. EL and EML declined from 8% to 1%. The sizes of the segments identified in this study (Fig. 3D) are comparable to the putative replicons described for Arabidopsis chromosome 4 (Lee et al., 2010) and some animal systems (MacAlpine et al., 2004; Lebofsky et al., 2006; Schwaiger et al., 2009). However, our analysis did not uncover evidence of the larger replication domains that have been described in mammals (Hiratani et al., 2008; Ryba et al., 2010).

    The Replication Program and Genome Organization

    All five Arabidopsis chromosomes showed the same general pattern of replication timing (Fig. 2B; Supplemental Fig. S7). At a macroscopic level, the distal portions of the chromosome arms replicate earlier than proximal regions, while pericentromeric and centromeric regions replicate last. The short arms of chromosomes 2 and 4 are exceptions because they replicate mainly in M and ML, perhaps because of their proximity to pericentromeric regions. This organization agrees generally with the biphasic model of replication that we proposed previously for Arabidopsis (Lee et al., 2010). Analysis of RT classes in relation to genomic features (Fig. 4D) suggested that E, EM and M segments are predominantly euchromatic, and ML and L segments are primarily heterochromatic. However, the distribution of chromatin states across the RT classes is more complex, with each RT class including multiple chromatin states and each chromatin state including several RT classes. This diversity suggests that, particularly in the portion of the genome classically regarded as euchromatic, replication timing may be determined to a large extent by factors that are independent of local chromatin states or by epigenetic features not included in the chromatin state analysis.

    Replication timing data is thought to integrate transcriptional, epigenetic and spatial information across the genome (Hiratani and Gilbert, 2009), and its inclusion in modeling can inform chromatin state assignments. Wang et al. (2015) classified CS2 and CS4 as intermediate between euchromatin and heterochromatin. These assignments were based in part on the lack of transcription of CS2 and CS4 and no enrichment for histone marks associated with active transcription. However, the large amount of CS2 and CS4 in the E, EM and M RT classes indicates that major fractions of these states are in an open, accessible conformation characteristic of euchromatin. Thus, CS2 and CS4 may include nontranscribed euchromatin that replicates with transcribed euchromatin (CS1 and CS5) during early to mid S phase. This idea is supported by the near absence of CS2 and only a small fraction of CS4 replicating with heterochromatin in late S phase. ML segments, which include both euchromatic (CS1, CS5, CS2, and CS4) and heterochromatic (CS6 and CS3) chromatin states, represent a transition from replicating euchromatin to replicating heterochromatin.

    Comparison of replication timing and chromosome conformation data showed that E, EM and M segments interact with each other with equal frequency within and between the arms of a chromosome; L segments interact predominantly with Hi-C bins located in the pericentromeres that encompass ML and L RT classes; and ML segments interact with all RT classes (Fig. 7). This pattern of interaction is consistent with the Arabidopsis genome consisting of two main genomic compartments, one that replicates during early to mid S phase and another that replicates in late S phase. This bipartite chromosomal architecture is reminiscent of the "open" and "closed" compartments identified in the human genome (Lieberman-Aiden et al., 2009). The two compartments have distinctive epigenomic and expression features and correlate with replication time (Hansen et al., 1996; Ryba et al., 2010). It has been proposed that because of the compact nature of the Arabidopsis genome and differences in chromatin organization between plants and metazoans, the pericentromeric regions and chromosome arms may correspond functionally to the closed and open compartments in mammalian genomes (Grob et al., 2014; Feng et al., 2014).

    The datasets used for chromatin state and the long-range interaction studies (and the DHS data discussed below) were generated from Arabidopsis seedlings. During plant development, actively proliferating cells are localized primarily to meristematic regions and primordia and include all cell cycle stages. As a consequence, only a small fraction of the cells used to create the seedling datasets were in S phase. For this reason, future studies that use chromatin data from mitotic cells may uncover relationships between replication timing and chromatin that were not apparent in the comparisons here.

    Nature of Mid S-Phase Replication

    Replication in mid S phase may reflect spreading from regions that initiate during early S phase and/or initiation and elongation events specific to mid S phase. In our data, the distributions of read densities are sharply different in early and mid S phase, with early reads displaying high local maxima separated by deep troughs, while mid reads are more evenly distributed with smaller peaks and dips. These profiles are consistent with models postulating firing of low efficiency origins during mid S phase (Guilbaud et al., 2011), as well as with other models involving replication of regions lacking origins by unidirectional fork progression (Desprat et al., 2009; Ryba et al., 2010). Both of these mechanisms can be incorporated into a model in which origins are not distributed uniformly across a genome (Rhind, 2014; Kaykov et al., 2016) and compete for replication factors (Mantiero et al., 2011), with the likelihood of replication initiating in a given region depending primarily on its origin.

    According to the above models, early replicating regions of the Arabidopsis genome would have more origins and origin clusters, and mid replicating regions would have fewer, more dispersed origins but would not differ dramatically with respect to sequence composition or global chromatin features. The only genome-wide study describing putative origin sequences in Arabidopsis is biased for early replication due to the use of sucrose starvation to arrest cells in G1 before release into BrdU in the presence of hydroxyurea to deplete nucleotide pools (Costas et al., 2011), and thus cannot provide insight into whether origins are enriched in early versus mid-replicating regions. However, our model is supported by the observation that Arabidopsis sequences replicating in early or mid S phase overlap similar genomic features (Fig. 4D) and display similar chromatin state (Fig. 5, C and D) and chromatin interaction profiles (Fig. 7, B, C, and D). At the same time, these regions have different sensitivities to DNase I digestion, with early regions, but not mid regions, enriched for DHS sites (Fig. 6, B and C). Local maxima in early regions are DHS-rich, while local maxima in mid regions are DHS-depleted, suggesting that early replication is associated with a higher degree of chromatin accessibility than mid replication. In this context, it is interesting that the replication program of the human genome can be accurately simulated by a model in which an initiation probability landscape is determined by the locations of DHS sites (Gindin et al., 2014).

    Comparison to the Maize Replication Timing Program

    We recently characterized replication timing in maize root tips labeled with EdU (Wear et al., 2017). The global distributions of the replication timing signals in maize and Arabidopsis are similar, with chromosome arms replicating earlier and pericentromeric and centromeric regions replicating later. As in Arabidopsis, replication in maize is distributed across the RT classes. However, there are more early replicating regions and fewer late regions in Arabidopsis than in maize. This difference likely reflects the very different genic and nongenic (TEs and noncoding sequences) content of the two genomes (Arabidopsis: 51% genic and 49% nongenic; maize: 8% genic and 92% nongenic), with genic sequences tending to replicate earlier. In addition, there are more dispersed blocks of ML and L replicating DNA in maize chromosome arms, which are typically organized into genic regions separated by TE clusters. Maize TEs (81% of the genome) are very abundant in all RT classes, with those closer to genes replicating earlier. In contrast, Arabidopsis TEs (20% of the genome) are located primarily in pericentromeric regions and enriched in ML and L classes.

    There are other important similarities between the Arabidopsis and maize replication timing programs. Strikingly, the sizes of the RT segments are similar even though the maize genome is ca. 20-fold larger than that of Arabidopsis. Some loci show heterogeneity with respect to replication timing in both plant species. Moreover, early replicating regions are more accessible than mid replicating regions. This comparison underscores the role of genome structure in replication timing and highlights common features that are independent of genome organization.

    CONCLUSION

    We developed a high-resolution approach to study the replication program of eukaryotic genomes and applied it to the model plant Arabidopsis thaliana, extending our previous analysis of chromosome 4 (Lee et al., 2010) to the entire genome. Our results confirmed the basic observation that euchromatin replicates during early and mid S phase and heterochromatin replicates in late S phase, similar to most other eukaryotes (Hiratani et al., 2008; Schwaiger et al., 2009; Ryba et al., 2010). However, in this study, we better resolved early and mid-replication patterns within euchromatin. Although very similar in their association with most genomic features and chromatin marks, early and mid-replicating sequences differ strikingly in chromatin accessibility as measured by DHS density. This finding is of particular interest in connection with a recent model proposing that origin accessibility to replication factors is one of the primary determinants of replication programs (Rhind, 2014). The model, which integrates sequential activation of origins with stochastic firing, efficiently predicted the human replication program (Gindin et al., 2014).

    MATERIALS AND METHODS

    Arabidopsis Cell Culture and Nuclei Isolation

    The Arabidopsis thaliana cell line (Col-0, ecotype Columbia) was maintained as described by Lee et al. (2010). Labeling followed the 7-d split protocol, in which 25 mL of fresh medium and 25 mL of a 7-d culture are mixed and grown for 16 h. At 16 h, the cells were labeled with 10 µM 5-ethynyl-2′-deoxyuridine (EdU, Life Technologies) for 10 min. Labeling was terminated by fixing the cells in 1% paraformaldehyde with gentle agitation for 10 min, followed by quenching the formaldehyde with 0.125 M glycine. Fixed cells were filtered through two layers of Miracloth mesh and transferred to 1X phosphate buffered saline (PBS). They were washed in PBS three times and snap frozen in liquid nitrogen. Cells from eight cultures were combined for each of three biological replicates.

    Nuclei were isolated as described previously (Lee et al., 2010; Wear et al., 2016) with the addition of a Percoll gradient step. The frozen cell pellet was ground at 4°C in 40 mL of cell lysis buffer (15 mM Tris-HCl pH 7.5, 2 mM EDTA, 80 mM KCl, 20 mM NaCl, 15 mM β-mercaptoethanol, and 0.1% Triton X-100) using a commercial blender. The ground cell suspension was incubated for 5 min at 4°C, filtered through two layers of Miracloth, and centrifuged at 400 × g for 5 min at 4°C. Nuclei were enriched using a Percoll step gradient as described by Folta and Kaufman (2006) with minor modifications. The nuclei pellet was resuspended in 25 mL of extraction buffer (2 M hexylene glycol, 20 mM PIPES-KOH pH 7.0, 10 mM MgCl2, 5 mM β-mercaptoethanol) and centrifuged at 1500 × g over a discontinuous density gradient (30% and 80% v/v Percoll in gradient buffer: 0.5 M hexylene glycol, 10 mM MgCl2, 5 mM PIPES-KOH pH 7.0, 5 mM β-mercaptoethanol and 1% w/v Triton X-100) for 30 min at 4°C. The nuclei recovered from the 30:80% Percoll interface were resuspended in 15 mL of
    gradient buffer and centrifuged at 1500 × g over a cushion of 30% Percoll (v/v) in gradient buffer for 10 min at 4°C.

    After washing the nuclei pellet in modified cell lysis buffer (15 mM Tris-HCl pH 7.5, 2 mM EDTA, 80 mM KCl, 20 mM NaCl, and 0.1% Triton X-100), the incorporated EdU was conjugated with Alexa Fluor 488 (AF488) using a Click-iT EdU Alexa Fluor 488 Imaging Kit (Life Technologies) as described previously (Wear et al., 2016). Finally, the nuclei were resuspended in the original cell lysis buffer containing 2 µg/mL DAPI and filtered through a CellTrics 20-µm nylon mesh filter (Partec) just before flow cytometry and sorting.

    Flow Cytometry and Sorting

    An InFlux flow cytometer (BD Biosciences) equipped with UV (355 nm) and blue (488 nm) lasers was used to sort nuclei by DNA content (DAPI fluorescence) and EdU incorporation (fluorescence of the conjugated AF488). Events were triggered on forward-angle light scatter (FSC), and data were collected using 90° side scatter (SSC) and 460/50 nm and 530/40 nm bandpass filters (Bass et al., 2014; Wear et al., 2016). Plots of SSC vs. 460/50 nm (DAPI) were used to set analysis and sorting gates that excluded cellular debris.

    Sub-stage gates were used to sort labeled nuclei into pools representing early, mid and late S phase, and to sort unlabeled nuclei in G1 phase as a source of non-replicating reference DNA. The sorting gates were separated from each other to minimize overlap between the sorted populations (Fig. 1B). For each biological replicate, between 90,000 and 160,000 nuclei for each S phase fraction and 1 million unlabeled G1 nuclei were collected in tubes containing STE buffer (100 mM NaCl, 10 mM Tris-HCl pH 7.5, 1 mM EDTA). A small sample of nuclei (~12,000 to 16,000) was also sorted from each gate into cell lysis buffer augmented with 2 µg/mL DAPI and reanalyzed to
    determine the sort purity (Supplemental Fig. S3). Flow cytometry data were analyzed using FlowJo software (Tree Star Inc.).

    Genomic DNA Extraction and Immunoprecipitation of EdU/AF488-Labeled DNA

    Genomic DNA was extracted as described previously (Lee et al., 2010) with minor modifications. After overnight incubation with proteinase K, the samples were incubated with RNase A (50 µg/mL) for 1 h at 37°C prior to addition of PMSF (0.7 mg/mL). The DNA was extracted once with phenol/chloroform/isoamyl alcohol (25:24:1) and twice with chloroform, and precipitated with 0.6 volumes of ice-cold isopropanol overnight at -20°C. The DNA was pelleted by centrifugation, washed twice with 1 mL of 70% ethanol and resuspended in 130 µL of IP dilution buffer (167 mM NaCl, 16.7 mM Tris-HCl pH 8, 1.2 mM EDTA and 1.1% (v/v) Triton X-100). A Covaris S220 ultrasonicator was used to shear the DNA to an average size of 300 bp (parameters: intensity 5, duty cycle 10%, cycles per burst 200, treatment time 180 s). After shearing, 370 µL of IP dilution buffer (Gendrel et al., 2005) was added, and the sheared DNA solution was precleared by gentle agitation in 20 µL of magnetic protein G beads (Dynabeads, Life Technologies) pre-equilibrated with IP dilution buffer at 4°C for 1 h. The beads were removed with a magnet and newly synthesized DNA was immunoprecipitated by incubating with a 1:200 dilution of anti-Alexa Fluor 488 antibody (Molecular Probes, #A-11094) at 4°C overnight. The DNA-antibody complex was captured with 25 µL of pre-equilibrated protein G beads at 4°C for 2 h, followed by washing the beads as described by Gendrel et al. (2005). Bound DNA was eluted from the beads in 250 µL of elution buffer (1% (w/v) SDS, 100 mM sodium bicarbonate) at 65°C for 15 min, transferring the supernatant to a new tube and repeating the elution for a final volume of 500 µL. Eluted DNA was purified with
    a QIAquick PCR Purification Kit (Qiagen) according to the manufacturer's directions. To maximize DNA recovery, pre-warmed (50°C) TE was used for the elution step.

    Library Construction, Sequencing and Analysis of Repli-Seq Data

    Immunoprecipitated DNA was used to construct sequencing libraries with the NEXTflex Illumina ChIP-Seq Library Prep Kit (Bioo Scientific) using the ultra-low input protocol. After adapter ligation, the libraries were amplified with 18 cycles of PCR with the Expand High FidelityPLUS PCR System (Roche). For each experiment, individual samples were barcoded and pooled. The libraries were sequenced with an Illumina HiSeq 2000 platform.

    Raw sequencing data were processed using Trim Galore! (v0.3.7) to remove 3′ universal adapters from the paired reads, trim 5′ ends with fastq quality scores below 20, and remove trimmed reads shorter than 40 bp. The quality-controlled reads were then aligned to the Arabidopsis TAIR10 genome with BWA mem (v0.7.4) using default parameters (Li, 2013). After alignment, reads with multiple alignments were discarded using samtools 1.3 (Li et al., 2009). For mapping statistics and total sequence coverage, see Supplemental Table S1.

    Data were then analyzed as described by Zynda et al. (2017). The scripts can be found at https://github.com/zyndagj/repliscan. Read densities were scored in 1-kb bins across the genome and normalized using sequence depth scaling (Ramírez et al., 2016). The correlation between biological replicates was assessed using multiBigwigSummary and plotted as a heatmap using plotCorrelation in the deepTools 2.0 suite (Ramírez et al., 2016). Replicates were highly correlated (Supplemental Fig. S5).
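
The binning and depth-scaling step can be illustrated with a short Python sketch. This is not the Repliscan or deepTools code. The function names, the 0-based read start positions, and the target depth are assumptions chosen for illustration, and depth scaling is shown in its simplest form (every library is rescaled to the same total read count).

```python
BIN_SIZE = 1_000  # 1-kb bins, as in the text


def bin_counts(read_starts, chrom_length, bin_size=BIN_SIZE):
    """Count reads per fixed-width bin from 0-based read start positions."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def depth_scale(counts, target_depth=1_000_000):
    """Rescale bin counts so the library sums to target_depth (hypothetical value)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    factor = target_depth / total
    return [c * factor for c in counts]


# Toy example: six reads on a 3-kb "chromosome"
early = bin_counts([10, 950, 1200, 2500, 2600, 2999], chrom_length=3000)
scaled = depth_scale(early, target_depth=12)
print(early)   # [2, 1, 3]
print(scaled)  # [4.0, 2.0, 6.0]
```

After scaling, libraries sequenced to different depths can be compared bin by bin.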

    Biological replicates were aggregated by taking the median value in each 1-kb bin. Bins with coverage in the upper and lower 0.1% tails of a calculated normal distribution were removed. Values for each of the S-phase samples were divided by the value for the non-replicating G1
    reference in the corresponding bin to normalize for sequencing bias. To reduce noise, Haar wavelet smoothing was performed using the software package wavelets from Percival and Walden (2000). The Haar wavelet method was chosen because, unlike kernel smoothing methods, it reduces differential noise without spreading peak boundaries.

    Classifying Predominant Replication Time

    The method used to assign a predominant time of replication to each 1-kb bin across the genome is described by Zynda et al. (2017). Each bin was classified as replicating at a given time point if its normalized replication intensity was above a chromosome-specific threshold value, as calculated by the following procedure. Total coverage, defined as the fraction of the chromosome with a signal greater than the threshold in at least one replication time window, was computed as a continuous function of the threshold value using a cubic spline interpolation across the replication values. The first derivative of the coverage function was then calculated using the central difference formula to show the rate of coverage change.
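
A simplified Python sketch of the coverage function and its central-difference derivative follows. It is an illustration only, not the Repliscan implementation: coverage is evaluated on a coarse grid of thresholds instead of being interpolated with a cubic spline, and the function names and toy signal values are assumptions.

```python
def coverage(signals, threshold):
    """Fraction of bins whose signal exceeds the threshold in at least one time window.

    signals: list of (early, mid, late) tuples, one per 1-kb bin.
    """
    hits = sum(1 for bin_values in signals if max(bin_values) > threshold)
    return hits / len(signals)


def central_difference(xs, ys):
    """First derivative of ys with respect to xs at the interior grid points."""
    return [
        (ys[i + 1] - ys[i - 1]) / (xs[i + 1] - xs[i - 1])
        for i in range(1, len(xs) - 1)
    ]


# Toy data: four bins, three time windows each
signals = [(0.2, 0.1, 0.0), (1.5, 0.4, 0.1), (0.3, 2.5, 0.2), (0.1, 0.2, 3.5)]
thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]
cov = [coverage(signals, t) for t in thresholds]
slope = central_difference(thresholds, cov)
print(cov)    # [1.0, 0.75, 0.5, 0.25, 0.0]
print(slope)  # [-0.25, -0.25, -0.25]
```

In real data the slope varies with the threshold, and it is this variation that the threshold-selection procedure exploits.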

    Starting from the point with the highest rate of coverage change (maximum first derivative), the threshold was lowered until the first derivative of the coverage vs. threshold curve effectively flattened out. Below this point any additional signals were uninformative because those regions had already been classified as replicating in other time points. The predominant replication time for a given 1-kb bin was then assigned by considering the relative amounts of total replication signal in early, mid and late S phase. For each 1-kb bin, the three signals were divided by the maximum value, scaling the largest value to 1 and others between 0 and 1. The bin was labeled as the combination of times with a normalized signal above 0.5. This strategy allowed single prevalent time and combinatorial time classifications to be assigned to a given 1-kb bin. Bins
    were classified as undetermined if none of the signals in any of the three time samples reached the threshold value.
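
The labeling rule can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the Repliscan code: the function name, the single-letter labels, and the way the chromosome-specific threshold is passed in are hypothetical, and the threshold is applied only to the largest of the three signals.

```python
LABELS = ("E", "M", "L")  # early, mid, late S phase


def classify_bin(early, mid, late, threshold, cutoff=0.5):
    """Return a label such as "E", "EM" or "EML", or "undetermined".

    threshold: chromosome-specific replication threshold (assumed precomputed).
    cutoff: fraction of the maximum signal a time point must exceed.
    """
    values = (early, mid, late)
    peak = max(values)
    if peak <= 0 or peak < threshold:
        return "undetermined"
    # Scale by the maximum so the largest signal equals 1, then keep
    # every time point whose scaled signal is above the cutoff.
    return "".join(
        label for label, value in zip(LABELS, values) if value / peak > cutoff
    )


print(classify_bin(2.0, 0.4, 0.1, threshold=1.0))  # E
print(classify_bin(2.0, 1.2, 0.1, threshold=1.0))  # EM
print(classify_bin(0.5, 1.0, 2.0, threshold=1.0))  # L
print(classify_bin(0.3, 0.2, 0.1, threshold=1.0))  # undetermined
```

In the third call the mid signal scales to exactly 0.5, which does not exceed the cutoff, so the bin is labeled late only.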

    Replication Intensity and Relative Distance from the Centromere

    Centromere positions in each chromosome were identified with the bedtools 2.25.0 genomecov utility (Quinlan and Hall, 2010) as 1-kb bins with the maximum coverage of 180-bp repeats (Nagaki et al., 2003). Using normalized replication intensity in early, middle and late S phase, the percent of total replication occurring in bins representing successive 10% portions of a given chromosome arm was calculated with a custom R script (R Development Core Team, 2016). Replication within each interval, expressed as percentage of total replication activity for that chromosome arm in that portion of S phase, was plotted as a function of the relative distance from the centromere (Fig. 2D) using the R package ggplot2 (Wickham, 2009).
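
The original calculation used a custom R script that is not reproduced here. The Python sketch below shows one way such a summary could be computed, assuming per-bin intensities for one chromosome arm and one S-phase fraction, ordered from the centromere outward. The function name and toy values are hypothetical.

```python
def percent_by_portion(intensities, n_portions=10):
    """Percent of an arm's total signal falling in successive equal portions.

    intensities: per-bin replication intensities for one arm, ordered from
    the centromere toward the telomere.
    """
    total = sum(intensities)
    if total == 0:
        raise ValueError("arm has no replication signal")
    n = len(intensities)
    sums = [0.0] * n_portions
    for i, value in enumerate(intensities):
        # Map bin index to its portion; the last portion absorbs rounding.
        portion = min(i * n_portions // n, n_portions - 1)
        sums[portion] += value
    return [100.0 * s / total for s in sums]


# Toy arm of 20 bins split into 4 portions for readability
arm = [1] * 10 + [3] * 10
print(percent_by_portion(arm, n_portions=4))  # [12.5, 12.5, 37.5, 37.5]
```

With the default `n_portions=10` this corresponds to the successive 10% portions described in the text.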

    Association of Replication Timing with Genomic Features and Repeat Sequences

    Genomic annotations of genes, pseudogenes and transposable elements (TEs) were obtained from the Araport11 database (TAIR10_GFF3_genes_transposons.AIP.gff.gz at https://www.araport.org/downloads/TAIR10_genome_release/annotation). Unannotated regions were defined as the difference between the genome and all the annotated features. For viewing in IGV 2.3.60 and comparison with Repli-Seq data, the coverage of genes and TEs was defined as the percentage of bases in a specified portion of the genome that overlap with that feature. Gene and TE coverage was scored in 1-kb bins with bedtools v2.25.0 genomecov and map utilities. For visualization in IGV, the data were smoothed using a 50-kb moving average with the R package zoo (Zeileis and Grothendieck, 2005). A custom script is available upon request.

    Associations of genomic features with RT segmentation classes were computed with bedtools v2.25.0 intersect, and their statistical significance was assessed with a chi-squared test. The
    adjusted residuals (Agresti, 2007) were used to measure the relative contribution of each combination of genomic feature and RT class to the significance of the associations.
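
For readers unfamiliar with adjusted residuals, the Python sketch below computes the Pearson chi-squared statistic and the adjusted (standardized) residuals for a contingency table using the standard textbook formula. It is an illustration with made-up counts, not the analysis script used in this study, and the function name is hypothetical.

```python
import math


def adjusted_residuals(table):
    """Chi-squared statistic and adjusted residuals for a contingency table.

    table: list of rows of observed counts (for example, genomic features
    in rows and RT classes in columns).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Standard error of (observed - expected) under independence
            se = math.sqrt(
                expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            )
            out.append((observed - expected) / se)
        residuals.append(out)
    return chi2, residuals


# Made-up 2 x 2 table of overlap counts
chi2, res = adjusted_residuals([[30, 10], [10, 30]])
print(round(chi2, 3))       # 20.0
print(round(res[0][0], 3))  # 4.472
```

Cells with large positive residuals are over-represented relative to independence, and cells with large negative residuals are under-represented.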

    Telomere-related (TEL), centromere-related (CEN), 45S and 5S ribosomal DNA sequences were obtained from Plant Repeat Databases (Ouyang and Buell, 2004). The replication timing of each group of repeats was assessed as described by Gent et al. (2014). Reads from individual biological replicates of G1, early, mid and late S phase samples were aligned to consensus sequences for each group using Blast software (parameter -e 1e-8) (Camacho et al., 2009). For each sample and biological replicate, the number of reads that aligned to each repeat family was normalized to the total number of reads present in the sample. Finally, the relative abundance of each family in the early, mid or late reads was normalized to the relative abundance of the same family in the G1 reference.
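
The two normalization steps for repeat families amount to a ratio of ratios, sketched in Python below. The read counts are invented for illustration and the function names are hypothetical.

```python
def relative_abundance(repeat_reads, total_reads):
    """Fraction of a library's reads that align to a repeat family."""
    return repeat_reads / total_reads


def enrichment_vs_g1(sample, g1):
    """Relative abundance in an S-phase sample divided by that in G1.

    sample, g1: (reads aligned to the repeat family, total reads in library).
    """
    return relative_abundance(*sample) / relative_abundance(*g1)


# Invented counts: a repeat family in a late S-phase library and in G1
late = (3000, 1_000_000)
g1 = (2000, 2_000_000)
ratio = enrichment_vs_g1(late, g1)
print(round(ratio, 6))  # 3.0
```

A ratio above 1 indicates that the family is over-represented in that S-phase fraction relative to the non-replicating reference.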

    Association of Replication Timing with Chromatin States, DNase I Hypersensitivity Sites and Chromosome Conformation

    Repli-Seq data were compared with the chromatin state dataset produced by Wang et al. (2015). The overlaps in bp between each chromatin state (CS) and the five major RT segment classes were calculated using bedtools v2.25.0 intersect, and plotted as absolute and relative coverage. Statistical significance was assessed with a chi-squared test. We used the chi-square adjusted residuals (Agresti, 2007) to identify which RT classes were most different from the expected value in each chromatin state group of features, compared to the genome. The absolute coverage of each chromatin state in each RT class was used to compute the Spearman correlation coefficient between RT classes using the function cor in R, and subsequently plotted as a heat map with the package corrplot (Wei and Simko, 2016).

    To compare replication timing profiles with DNase I hypersensitivity sites (DHSs), we used the dataset (GEO accession PRJNA231710) described by Sullivan et al. (2014). The density of DHSs in each RT class (Fig. 6A; Supplemental Fig. S4) was determined using data from control experiments. The number of DNase cleavages from signal files (Accessions GSM1289359 and GSM1289363) was averaged at 1-kb steps across the genome and smoothed using a 5-kb moving average. The resulting DHS density distribution was plotted as a heat map and overlaid with the early, mid and late replication intensity signals. The DNase I read density files (Col-0.7d_Seedling.NA.NA.DS19992.signal.bw and Col-0.7d_Seedling.NA.NA.DS21094.signal.bw) are at http://plantregulome.org/public/dnase/other/all-reads/signal/. The DNase I hypersensitive peak files (Col-0.7d_Seedling.NA.NA.DS19992.peaks.bed.gz and Col-0.7d_Seedling.NA.NA.DS21094.peaks.bed.gz) are at http://plantregulome.org/public/dnase/other/all-reads/peaks/.

    We used the dataset (Accession number SRR2626429) described by Liu et al. (2016) for chromosome conformation analysis. Sequencing reads were aligned to the TAIR10 reference genome, and experimental artifacts, such as circularized fragments, PCR duplicates, re-ligated adjacent sequences and wrong size fragments, were removed using HICUP with the default parameters (Wingett et al., 2015). Significant interactions, defined as pairs of loci that have a greater number of Hi-C reads than expected by chance (p-value < 0.001), were identified at 100-kb resolution using HOMER (Heinz et al., 2010) and visualized using the CIRCOS tool (Krzywinski et al., 2009) together with the genome segmentation in RT classes. Within each interacting pair of 100-kb bins, we randomized the first and second bins and split the interactions into groups based on the content of RT classes in the first bin. The absolute and relative overlaps of the second bins with RT classes were computed with bedtools v2.25.0 intersect. A Pearson
    correlation matrix was computed using the function cor in R, and subsequently plotted as a heat map with the package corrplot (Wei and Simko, 2016).

    Accession Numbers

    Repli-Seq data from this study are in the NCBI Sequence Read Archive (SRA) under the umbrella accession number PRJNA330547. The SRA numbers are: G1 SAMN05417671, Early SAMN05417674, Mid SAMN05417672, and Late SAMN05417673. Processed data files (E_ratio_3.smooth.bedgraph; M_ratio_3.smooth.bedgraph; L_ratio_3.smooth.bedgraph; ratio_segmentation.gff3) are available from the CyVerse (previously iPlant Collaborative; Merchant et al., 2016) Data Store. The Nimblegen microarray data for Arabidopsis chromosome 4 replication timing are at Gene Expression Omnibus under accession number GSE103321. The tiling microarray data for Arabidopsis chromosome 4 replication timing can be found at Array Express under accession number E-GEOD-30433.

    Supplemental Data

    The following supplemental materials are available.

    Supplemental Table S1. Statistics for sequenced libraries

    Supplemental Table S2. Adjusted residuals for chi-square test on contingency Table 2 describing the overlaps between genomic features and RT classes

    Supplemental Table S3. Overlap between chromatin states (CS) and RT classes

    Supplemental Table S4. Adjusted residuals relative to the chi-square test on the contingency Table S3 describing the overlaps between chromatin states (CS) and RT classes

    Supplemental Table S5. Coverage of RT classes of genomic bins establishing significant long-range interactions

    Supplemental Figure S1. Comparison of replication timing profiles generated using tiling and Nimblegen arrays

    Supplemental Figure S2. Spearman correlation matrix for tiling (TL) and Nimblegen (NG) array platforms

    Supplemental Figure S3. Sorting gates and reanalysis of sorted fractions

    Supplemental Figure S4. Distribution of read density for each sequencing library in representative 1 Mb regions of Arabidopsis chromosomes 1, 3 and 5

    Supplemental Figure S5. Spearman correlation matrix of read densities of sequenced samples

    Supplemental Figure S6. Comparison of linear ratio versus Log2 ratio

    Supplemental Figure S7. Large-scale distribution of read density on the five Arabidopsis chromosomes

    Supplemental Figure S8. Comparison of the distribution of RT classes on Arabidopsis chromosome 4

    Supplemental Figure S9. Replication timing and genomic features

    Supplemental Figure S10. Hi-C background models generated with HOMER for Arabidopsis chromosome 1