predictive models of eukaryotic transcriptional regulation ... · predictive models of eukaryotic...

16
Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter usage between metabolic conditions Downloaded from: https://research.chalmers.se, 2020-12-28 19:46 UTC Citation for the original published paper (version of record): Holland, P., Bergenholm, D., Börlin, C. et al (2019) Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter usage between metabolic conditions Nucleic Acids Research, 47(10): 4986-5000 http://dx.doi.org/10.1093/nar/gkz253 N.B. When citing this work, cite the original published paper. research.chalmers.se offers the possibility of retrieving research publications produced at Chalmers University of Technology. It covers all kind of research output: articles, dissertations, conference papers, reports etc. since 2004. research.chalmers.se is administrated and maintained by Chalmers Library (article starts on next page)

Upload: others

Post on 07-Sep-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

Predictive models of eukaryotic transcriptional regulationreveals changes in transcription factor roles and promoter usagebetween metabolic conditions

Downloaded from: https://research.chalmers.se, 2020-12-28 19:46 UTC

Citation for the original published paper (version of record):Holland, P., Bergenholm, D., Börlin, C. et al (2019)Predictive models of eukaryotic transcriptional regulation reveals changes in transcriptionfactor roles and promoter usage between metabolic conditionsNucleic Acids Research, 47(10): 4986-5000http://dx.doi.org/10.1093/nar/gkz253

N.B. When citing this work, cite the original published paper.

research.chalmers.se offers the possibility of retrieving research publications produced at Chalmers University of Technology.It covers all kind of research output: articles, dissertations, conference papers, reports etc. since 2004.research.chalmers.se is administrated and maintained by Chalmers Library

(article starts on next page)

Page 2: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

4986–5000 Nucleic Acids Research, 2019, Vol. 47, No. 10 Published online 12 April 2019doi: 10.1093/nar/gkz253

Predictive models of eukaryotic transcriptionalregulation reveals changes in transcription factorroles and promoter usage between metabolicconditionsPetter Holland 1, David Bergenholm 1, Christoph S. Borlin 1, Guodong Liu1 andJens Nielsen1,2,3,*

1Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296,Sweden, 2Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, GothenburgSE-41296, Sweden and 3Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark,Kgs. Lyngby DK-2800, Denmark

Received January 11, 2019; Revised March 26, 2019; Editorial Decision March 28, 2019; Accepted April 04, 2019

ABSTRACT

Transcription factors (TF) are central to transcrip-tional regulation, but they are often studied in relativeisolation and without close control of the metabolicstate of the cell. Here, we describe genome-widebinding (by ChIP-exo) of 15 yeast TFs in four chemo-stat conditions that cover a range of metabolicstates. We integrate this data with transcriptomicsand six additional recently mapped TFs to identifypredictive models describing how TFs control geneexpression in different metabolic conditions. Contri-butions by TFs to gene regulation are predicted to bemostly activating, additive and well approximated byassuming linear effects from TF binding signal. No-tably, using TF binding peaks from peak finding al-gorithms gave distinctly worse predictions than sim-ply summing the low-noise and high-resolution TFChIP-exo reads on promoters. Finally, we discoverindications of a novel functional role for three TFs;Gcn4, Ert1 and Sut1 during nitrogen limited aerobicfermentation. In only this condition, the three TFshave correlated binding to a large number of genes(enriched for glycolytic and translation processes)and a negative correlation to target gene transcriptlevels.

INTRODUCTION

The relationship between transcription factor (TF) bind-ing to DNA and gene transcription in eukaryotes is com-plex. This is highlighted in several studies integrating chro-matin immunoprecipitation (ChIP)-based TF binding data

with transcriptomics from knockout or knockdown exper-iments of the TF with the goal of defining regulatory tar-gets. Studies of transcriptional response to hormones foundthat in mice with the TF glucocorticoid receptor knockedout, only 11% of the differently expressed genes were tar-geted by the TF (1) and a similar study of human estro-gen receptor function found only 6% of differentially ex-pressed genes to be targeted (2). Integration of a large-scalestudy using microarray transcriptomics in yeast TFs dele-tion strains (3) with previously generated ChIP-chip data(4) for 188 TFs showed even less overlap with an average3% of differentially expressed genes being targeted by thecorresponding TF (5). Thus, combining ChIP methods withtranscriptomics to understand transcriptional regulation ineukaryotic systems gives disappointing results compared tothe demonstrated successes of this approach in bacteria (6).

The increased difficulty in understanding eukaryal generegulation in comparison to bacteria may be explainedby the additional levels of regulation present, such asnucleosome–TF interactions, histone modifications andlong-range effects of binding. The impressive ENCODEdataset containing TF binding, transcriptomics as well asother chromatin features (7) has been used to explore thecontributions of different features of human promoters togene regulation. Using machine learning approaches, strongpredictive models were created and analysis of the modelssuggested a highly interconnected regulatory system whereTF binding has functional interactions with both nucleo-some occupancy and histone modifications to regulate tran-scriptional outcomes (8). A different approach to createpredictive models of transcriptional regulation based onlyon TF binding was to build a model from TF-associationscores that includes both the strength of the binding eventand the distance from a given gene in data collected from

*To whom correspondence should be addressed. Tel: +46 31 772 3804; Fax: +46 31 772 3801; Email: [email protected]

C© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), whichpermits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 3: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

Nucleic Acids Research, 2019, Vol. 47, No. 10 4987

mouse embryonic stem cells (9). For 12 TFs, these scoreswere combined into principal components and using linearregressions on the principal components it was possible topredict an impressive 65% of the variation in gene expres-sion genome wide (9). Such complex analysis methods willbe important tools to fully understand eukaryal transcrip-tional regulation and may allow cell engineering relying onpredictably changes in gene transcription.

While binding has been mapped for most central yeastTFs in one of the impressive large-scale studies (4,10–12),the majority of this data is captured only in a single state ofthe cell; exponential growth in nutrient excess. Here we per-formed a large-scale study of mapping TF binding of multi-ple yeast TFs known to be involved in metabolic regulationby ChIP-exo (chromatin immunoprecipitation with lambdaexonuclease) in four distinct metabolic states of the yeastcell. We integrate TF binding data with transcriptomics ofthe same metabolic conditions with the goal of buildingpredictive models using relatively simple statistical meth-ods that allow full transparency for insights into contribu-tions of different TFs to gene expression. Using ChIP-exoallowed us to study TF binding with high resolution andminimal background and using yeast as a model organismallowed us to study metabolic gene regulation utilizing a va-riety of nutrients with a constant growth rate in chemostats.

MATERIALS AND METHODS

Yeast strains and media

The host strain for all experiments contained in thisstudy was Saccharomyces cerevisiae CEN.PK 113-5D(URA-). CEN.PK 113–5D with Kluyveromyces lactisURA3 (KiURA3) re-integrated was used as control strainfor transcriptome analysis. Strains for ChIP-exo were cre-ated by amplifying either a TAP tag or a 9xMyc tag withKiURA3 and homology arms for recombination into theC-terminal end of the TF coding sequence.

The components of the chemostat media that were dif-ferent between the experimental conditions are as follows:Nitrogen limited media – 1 g/l (NH4)2SO4, 5.3 g/l K2SO4,150 ml/l glucose 40%, 12 drops Antifoam204. Ethanol lim-ited media – 5 g/l (NH4)2SO4, 6.67 ml/l Ethanol 96%,12 drops Antifoam204. Respiratory glucose limited me-dia – 5 g/l (NH4)2SO4, 18.75 ml/l glucose 40%, 12 dropsAntifoam204. Anaerobic glucose limited media – 5 g/l(NH4)2SO4, 25 ml/l glucose 40%, 4 ml/l ergosterol inTween80 (2.6 g/l), 16 drops Antifoam204. In addition tothe previously stated components changing between the me-dia, all media have the following: 14.4 g/l KH2PO4, 0.5g/l MgSO4, 1 ml/l of 1000× vitamin and 1000× tracemetal stock solutions. The 1000× stocks contains the fol-lowing: Vitamins – 0.05 g/l biotin, 0.2 g/l 4-aminobenzoicacid, 1 g/l nicotinic acid, 1 g/l calcium pantothenate,1 g/l pyridoxine HCl, 1 g/l thiamine HCl, and 25 g/lmyo-inositol. Trace metals – 15.0 g/l EDTA-Na2, 4.5 g/lZnSO4·7H2O, 0.84 g/l MnCl2·2H2O, 0.3 g/l CoCl2·6H2O,0.3 g/l CuSO4·5H2O, 0.4 g/l Na2MoO4·2H2O, 4.5 g/lCaCl2·2H2O, 3 g/l FeSO4·7H2O, 1g/l H3BO3 and 0.1 g/lKI. pH of the media was adjusted by adding KOH pelletsto get media pH of 6.0–6.5 that result in a final pH of allchemostat cultures close to 5.5.

Chemostat cultivation

Cells were cultivated in chemostats with a dilution rate of0.1 h−1 at 30c. Stirring and aeration was performed by ei-ther N2 (fermentative glucose metabolism) or pressurizedair (for the three other conditions) supplied to the cultures(13). Cultures were sampled for either ChIP-exo or tran-scriptomics after steady state was achieved for 48–60 h.

ChIP-exo

When chemostat cultures were measured to be stable for48–60 h, formaldehyde with a final concentration of 1%(wt/vol) and distilled water were added to the cultures tocreate a final OD600 of 1.0 and a total volume of 100ml. Cellswere incubated in formaldehyde for 12 min at room temper-ature followed by quenching by addition of L-glycine to a fi-nal concentration of 125 mM. Cells were then washed twicewith cold TBS and snap-frozen with liquid N2. ChIP-exowas then performed according to a protocol based on theoriginally established protocol (14) with certain modifica-tions, as described in (15). Presentation of the ChIP-exo rawdata and replicates is included in Supplementary Data 1.

Peak finding and target gene identification

Peak detection was performed by GEM (16) with defaultparameters. A peak signal threshold of >2-fold peak signalover the local genomic noise was applied and peaks wereannotated to a gene if it was found within –500 to +500 bpof a given genes TSS, as defined by (17). The full list of peaksdetected by GEM (without peak signal threshold) for eachTF is included in Supplementary Data 2.

RNA sequencing

From chemostats at steady-state, 10 OD600 from three bio-logical replicates were collected into tubes and put directlyon ice. Cells were washed twice in cold TBS and snap-frozedin liquid N2. RNA extraction was performed as describedin the manual for the RNeasy® Mini kit (QIAGEN). RNAquality was inspected by Nanodrop, Qubit and Bioanalyzerbefore proceeding with sample preparation for Illumina se-quencing and following sequencing on the NextSeq 500 sys-tem (2 × 75 bp, mid-output mode; Illumina). The RNA se-quencing read counts per gene in each metabolic conditionis included in Supplementary Data 3.

Sequencing data processing

For both the ChIP-exo data and transcriptomics, the rawsequencing output (.fastq) was mapped to a recently pub-lished CEN.PK genome (18) using Bowtie2 (19) with the -Uparameter. Samtools (20) was then used to generate sortedand indexed .bam files by first creating .bam files by the‘view’ command with parameters –bS –q 20 and further the‘sort’ and ‘index’ commands.

For the ChIP-exo data, the read count covering each nu-cleotide position genome-wide was determined from .bamfiles using the genomecov function of BEDTools (21), wherea read length of 10 was used for all TFs. The read counts ofbiological duplicate were then averaged and output as .wig

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 4: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

4988 Nucleic Acids Research, 2019, Vol. 47, No. 10

files that can be found in our Mendeley Data archives asdescribed under the Data Availability section. All furtherprocessing of ChIP-exo data is through import of .wig filesinto R (22) and executable scripts to reproduce all figuresare included in Supplementary Data 6. We also supply inSupplementary Data 4 processed versions of the .wig data,containing only the total TF read counts for each gene pro-moter (TSS –500 to TSS +500 bp) for each metabolic con-dition. These files can be used directly as inputs to replicateour linear regression analysis.

For RNA sequencing data analysis, Subread (23) wasused to map the aligned reads (.bam) to gene annotations.The resulting output of transcript read counts for the repli-cates from Subread (171206 CENPK subOut wt.txt) canbe found in Supplementary Data 3. Using R (22), geneswere first filtered to only include those with minimum 1read detected in all samples and then the edgeR (24) pack-age was used to calculate FPKM values before averagingthe triplicates. All subsequent analysis using the transcrip-tomics data in the manuscript can be reproduced using theR scripts supplied in Supplementary Data 6.

Linear regressions

The mathematical expression of a simple linear regressionusing one explanatory variable (TF binding of a single TFin this case) to model the relationship to the dependent vari-able (FPKM transcript level in our case) is Yi = �0 + �1 X1i+ εi. Yi is the FPKM of gene i, �0 is the intercept (commonto all genes in the regression), X1i is the amount of TF bind-ing to gene i, �1 is the coefficient that is selected to give thebest fit together with the intercept to the transcript levels(common to all genes in the regression) and εi is the errorof the prediction for gene i.

In multiple linear regressions, several predictors areadded together, each with their own coefficient. The equa-tion then takes the format: Yi = �0 + �1 X1i + . . . + �k Xki+ εi where in our case k indicates the index of a TF. Whileour analysis using TF binding contains 21 TFs, we alwaysuse variable selection (TF selection) from the earth pack-age (25) in R for multiple linear regressions, in which onlythe most predictive set of TFs will be included and addedtogether for predictions of transcript levels of a given setof genes. In some of our analysis we also allow the earthscripts to introduce splines (earth() parameters ‘linpreds =F’ and ‘endspan = 100’), effectively allowing the algorithmto model regions where it is advantageous to have nonlin-earities in the explanatory variables.

RESULTS

Most yeast metabolic TFs show large changes in genes tar-gets in different metabolic states

To get information about several distinctly different statesof yeast metabolism we decided to analyze gene expressionregulation in chemostat cultures operated at the same spe-cific growth rate, but still causing a range of different typesof metabolism; aerobic fermentation using nitrogen limita-tion, respiratory glucose metabolism using glucose limita-tion, fermentative glucose metabolism using anaerobic con-ditions, and gluconeogenic respiration using ethanol limita-

tion. These four states of metabolism should involve largechanges in central carbon metabolism and hence we focusedon TFs that have enriched binding (relative to all otherbinding targets) to central carbon metabolism enzymes. Todefine a list of TFs to focus on we started from the land-mark dataset collected by Harbison et al. containing TFpromoter enrichment genome-wide for a majority of yeastTFs mapped by ChIP-chip in batch cultures with rich media(4). All TFs that had >50 total targeted genes and >5% en-richment of central carbon metabolism genes were selectedas candidates, as well as certain additional TFs suggestedfrom other studies to be important for controlling centralcarbon metabolism. The criteria and process of selectingTFs for this study is described in more detail in Supplemen-tary Data 5.

To map and quantify TF binding, strains were createdwith TFs tagged by a C-terminal TAP or 9xMyc tag. Allstrains were validated for presence of the tag as well as func-tional binding of the tagged TF to a known target gene’spromoter by ChIP-qPCR. The successfully validated strainswere cultivated as biological duplicates in the four differ-ent chemostat conditions and genome-wide binding eventswere mapped and quantified by ChIP-exo. This methodis an improvement over ChIP-seq, including exonucleasetreatment of the cross-linked TF-DNA complex to increasethe resolution and reduce unspecific background binding(14). A demonstration of our raw data and replicates isshown for each TF in Supplementary Data 1.

Peaks were identified by GEM (16), the duplicates aver-aged, and a signal threshold of >2 peak signal relative to thenoise of the local genomic context was applied. Comparingto what degree the targeted genes overlap between the ex-perimental conditions (Figure 1A), only two of the 15 TFsfirst reported in this study show a relatively stable set of tar-gets between the studied nutrient limited conditions: Cbf1and Gcr1. For the remaining TFs there are large changes inwhich genes are being targeted between conditions. By an-alyzing the targeted genes for the most relatively enrichedgene ontology (GO) term, we confirm many well-knownmetabolic roles for these TFs, but also find indications fornew functions such as drug transport for Ert1 and carbohy-drate transport for Sut1 (Figure 1A).

For further analysis of the relationship between TF bind-ing and transcriptional outcomes, we combined binding in-formation of the 15 TFs reported here with data for an ad-ditional 6 TFs obtained with the same experimental con-ditions and using the same protocols (Ino2, Ino4, Hap1,Oaf1 and Pip2 from Bergenholm et al. (26) and Stb5 fromOuyang et al. (27) (Supplementary Figure S1A). We firstexplored the general distribution of peaks on promotersrelative to the TSS and we found the expected strong en-richment upstream of the TSS for all metabolic conditions(Supplementary Figure S1b). Notably, when comparing thenumber of peaks detected for each condition, we discov-ered a significant decrease in count of peaks for most of thestudied TFs in aerobic fermentation (Supplementary Fig-ure S1C).

To look for an explanation for the broad changes in whichgenes are targeted between conditions we compared themost enriched DNA motif bound by the TFs for the dif-ferent metabolic conditions. For most TFs, we found only

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 5: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

Nucleic Acids Research, 2019, Vol. 47, No. 10 4989

A

B

Figure 1. (A) Genome-wide TF binding characterization for multiple conditions of 15 TFs first described in this study. The most enriched DNA sequence(defined by MEME) is indicated under the TF name. The euler diagrams indicate the number of genes targeted between the experimental conditions andhow the conditions overlap. Numbers are shown for all overlaps with more than two genes. For each condition, the top significant (if there is any withP.adj < 0.05) GO category for the genes targeted that condition is shown in the same color color of the condition in the euler diagram. (B) A summary ofour experimental approach.

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 6: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

4990 Nucleic Acids Research, 2019, Vol. 47, No. 10

minor differences in the enriched motif between conditions(Supplementary Figure S2A). There are a few notable situa-tions where the most enriched motif may be different in onecondition, for example Gcn4 and Rtg1 in aerobic fermen-tation, but we cannot conclude if these cases are from a truechange of DNA preference for the TF or if it is due to noisyvariation from the TF having fewer peaks in aerobic fer-mentation. For the lipid metabolism TFs Ino2, Ino4, Hap1,Oaf1 and Pip2, this analysis was performed in Bergenholmet al. (26), with similar conclusions.

Our data and peak detection was further compared totwo previously published datasets. We focused on eight TFsthat are first reported in this manuscript where binding inmultiple conditions was reported by Harbison et al. (4)and we also compared to the refined peak definitions re-ported in the MacIsaac et al. (28). We see strong over-lap for many TFs and generally stronger overlap with theHarbison et al. experiments using synthetic media (Supple-mentary Figure S2B), an expected observation because thismedia more closely resembles our experimental conditions.Two TFs show relatively poor overlap with existing datasets,Rtg1 and Rtg3, but we do see a strong enrichment of bothTFs to genes involved in amino acid biosynthesis (Figure1A), which is a role for these TFs that is thoroughly demon-strated (29,30) and makes us confident in the quality of ourdata also for these TFs.

While the selection of TFs was focused on finding TFs en-riched for binding to central carbon metabolism genes, wedecided to expand the gene sets for further studies of howthe TFs are affecting transcriptional regulation to cover allmetabolic genes. Metabolic genes were defined as being in-cluded in the latest published yeast genome-scale model,v7.6 (31); in total 849 genes from the model that have aclearly defined TSS (17) and where we also have robust geneexpression data from transcriptomics were selected for fur-ther analysis. Using all metabolic genes was a compromiseto have enough genes for strong statistical power and re-liable observations from predictive models, but also retainthe property of having relatively good TF-coverage of thegenes. Our experimental approach is summarized in Figure1B.

Comparing predictive models of transcriptional regulation

We next compared performance of different types of pre-processing of the TF binding data in predicting transcriptlevels (measured by RNA sequencing) using multiple linearregressions. The regressions assume a linear relationship be-tween TF binding and effects on transcriptional regulationand we build a model where TFs binding signal is multipliedby a coefficient and added together to predict transcript lev-els. We first tested different signal/noise ratio (SNR) thresh-olds for TF peak binding signal, but found only a mini-mal effect on performance of the predictive models (Figure2A). A different numeric representation of TF binding isto sum TF binding over an interval of DNA and we foundthat summing all binding -50 to +50bp around the identifiedpeaks gave stronger predictive power to transcriptional out-comes (Figure 2A). We further tested an even simpler sum-mation of the whole promoter region and found that thisgave even better predictive power (Figure 2A). We think this

improvement is most likely driven by contributions to tran-scriptional regulation from relatively weaker TF bindingevents that are not strong enough to be detected by a peakfinding algorithm. The reduced background of ChIP-exo ishere leveraged to be able to detect such weaker events overbackground noise. The promoter signal sum data formatwas also tested with multivariate adaptive regression splines(MARS) (32). In MARS, if it is advantageous for predictionperformance, the algorithm can introduce splines in the lin-ear regressions, effectively allowing a type of peak definitionwhere the peak threshold (spline) is introduced to create alinear relationship between TF binding and transcript levelsonly for a certain range of TF binding strength. We foundthat with MARS, the performance of the predictions fur-ther increased.

We were curious to see where in the promoter region TFbinding is most strongly contributing to gene regulation. Wetested the predictive power of binding in segments of thepromoter using linear regressions and found that bindingsignal upstream of the TSS (where we also detect the ma-jority of strong TF-binding peaks, Supplementary FigureS1B) is predicted to be most consequential for transcrip-tional regulation (Supplementary Figure S2C), but with anotable influence also from binding directly downstream ofthe TSS. Comparing the conditions, it appears that there isa relative increase in influence of TF binding directly down-stream of the TSS in aerobic fermentation (SupplementaryFigure S2c; highest point of red line is downstream of TSSwhile highest point of the other conditions is upstream ofTSS). To select a region of a gene’s promoter which capturesas much as possible of the consequential TF binding for fur-ther analysis, we started with the assumption of a symmet-rical region around the TSS (assumed based on Supplemen-tary Figure S2c) and tested extensions of this region in 50 bpincrements for predicting transcript levels (SupplementaryFigure S2d). The performance of predictions increase untilit reaches –500 to +500 around the TSS, after which thereis no further increase, indicating that this region contains amajority of the consequential TF binding.

MARS define a set of core TFs for different conditions and re-veal general quantitative features of the relationship betweenTF binding and transcriptional regulation

Based on the finding that MARS provided the best pre-dictions of transcript levels from TF binding (Figure 2A),we explored what we could learn about the roles and func-tions of TFs from MARS regressions. For multiple linearregressions, the interesting parameters to describe TF func-tion are the coefficients, which can tell us if the TF is anactivator (positive correlation between binding and tran-script levels) or a repressor (negative correlation betweenbinding and transcript levels). The MARS algorithm canalso introduce splines to improve prediction performance,which can be another parameter of TF function, describ-ing the range of TF binding where there is a linear relation-ship with transcriptional outcomes. Finally, variable selec-tion in MARS will select only the best combination of TFsto predict as much as possible while penalizing increasingthe complexity of the model. To define a set of TFs that aremost strongly predictive of transcriptional regulation for

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 7: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

Nucleic Acids Research, 2019, Vol. 47, No. 10 4991

A

TF binding data preprocessing

Predicting transcript levels from TF binding

2.5 3.5 4.5 5.5

1.5

2.0

2.5

3.0

Cbf1

log10(binding)

log1

0(N

it FP

KM

)

3.5 4.0 4.5 5.0

1.5

2.0

2.5

3.0 Gcr1

log10(binding)

log1

0(N

it FP

KM

)

2.0 2.5 3.0 3.5 4.0 4.5 5.0

1.5

2.0

2.5

3.0 Hap1

log10(binding)

log1

0(N

it FP

KM

)

3.5 4.0 4.5 5.0 5.5 6.0

1.5

2.0

2.5

3.0

Ino4

log10(binding)

log1

0(N

it FP

KM

)

1 2 3 4 5

1.5

2.0

2.5

3.0

Rds2

log10(binding)

log1

0(N

it FP

KM

)

2.5 3.0 3.5 4.0 4.5

1.5

2.0

2.5

3.0 Rtg1

log10(binding)

log1

0(N

it FP

KM

)

2 3 4 5

1.5

2.0

2.5

3.0

Cat8

log10(binding)

log1

0(E

th F

PK

M)

3.0 3.5 4.0 4.5 5.0 5.5

1.5

2.0

2.5

3.0

Gcn4

log10(binding)

log1

0(E

th F

PK

M)

3.0 3.5 4.0 4.5 5.0 5.5

1.5

2.0

2.5

3.0

Gcr1

log10(binding)

log1

0(E

th F

PK

M)

2.5 3.0 3.5 4.0 4.5

1.5

2.0

2.5

3.0 Hap4

log10(binding)

log1

0(E

th F

PK

M)

3 4 5 6

1.5

2.0

2.5

3.0

Ino2

log10(binding)

log1

0(E

th F

PK

M)

1.5 2.5 3.5 4.5

1.5

2.0

2.5 Cat8

log10(binding)

log1

0(A

na F

PK

M)

2.5 3.5 4.5 5.5

1.5

2.0

2.5 Gcn4

log10(binding)

log1

0(A

na F

PK

M)

1 2 3 4

1.5

2.0

2.5 Gcr2

log10(binding)

log1

0(A

na F

PK

M)

3.5 4.0 4.5 5.0

1.5

2.0

2.5 Hap1

log10(binding)

log1

0(A

na F

PK

M)

3.5 4.0 4.5 5.0 5.5 6.0

1.5

2.0

2.5 Ino4

log10(binding)

log1

0(A

na F

PK

M)

2.0 2.5 3.0 3.5 4.0 4.5

1.5

2.0

2.5 Tye7

log10(binding)

log1

0(A

na F

PK

M)

3.0 3.5 4.0 4.5 5.0 5.5 6.0

2.0

2.5

3.0 Gcn4

log10(binding)

log1

0(G

lu F

PK

M)

1 2 3 4 5

2.0

2.5

3.0 Gcr2

log10(binding)

log1

0(G

lu F

PK

M)

3.5 4.0 4.5 5.0 5.5

2.0

2.5

3.0 Hap1

log10(binding)

log1

0(G

lu F

PK

M)

3.0 3.4 3.8 4.2

2.0

2.5

3.0 Hap4

log10(binding)

log1

0(G

lu F

PK

M)

3 4 5 6

2.0

2.5

3.0 Ino4

log10(binding)

log1

0(G

lu F

PK

M)

C D E

B

TFs and splines used TFs and splines used TFs and splines used

TFs and splines used

GDH3

BDH1ACS1

GCV3

CDC19

PMT2FUN26

CYS3

ADE1

YAT1

PHO11

IMA5

OPT1

PHO90

ELO1

ERG20

FBP26

INO1

YUR1

GLG2

LCB3URA2

TRK1

RPE1

GSH1LSB6

PHS1

GWT1

ARG3

ARG2YJL068C

BNA3

TDH1

YJL045W

RNR2

CYR1

AVT1

TDH2

MET3GPI14

ILV3

TES1

BNA1

CYC1

UTR1 CDC8

OPI3

MIR1

BNA2

SFC1

URA8

ADO1CPA2

YMR1

ATP2

STR2

XPT1

MET5

HOM6

PMT4

BAT2

DAL5

JEN1

URA1SAC1

TRP3MST1

PXA2

SPE1

PRS1

TPO5

MCD4

GPM1

SDH1

AVT3

SDH3

TGL1

PGM1

OAC1

AAT1

GFA1

YJU3

CAB3MDH1

VMA5

YNK1

FBA1

OAR1UGP1

MAE1

GPX1URA6

RAM2

ATP7

LAC1

AUR1

MET14

FOX2

SPO14

GAP1

SHB17

YSR3GLG1

KTR2

CCP1

GPT2

MET1

SIS2

MTD1

TGL4

PCK1

MHT1

MMP1

AQY2YBT1 FPS1

SDH2

GPI13

TPO1

DPS1

BPT1

YEH1

LOT6

MEU1

YEH2

AAT2ADE16 COX12

PDC1

ERG3

SHM2

FRS1XYL2

ALT1

SUL2

ERG27

AHP1

CKI1

PDC5

NHA1

PUT1

SPE4

ACS2

ASP3−2

ASP3−3DPH5

IDP2 SAM1

ATG26

COQ9

PNP1

BNA5

VPS34

CDD1

GSY2

LCB5

ECI1

NNT1

ATP14

ECM38

EXG1

MET17

ACO1

STT4

CDA1 NMA1FKS1

DIC1 TAL1

ILV5

ADE13

SUR4

FBP1

VAC14

COX8

VIP1URA4

IMD3

CAR2

VMA6

HMG2

ERG13PHO84

NDI1

COQ5

URA5

TSL1

ALO1

ATP18

HMG1

DAK1NTE1

IMD4

CYB2

CAT2

AMD1

APT1

ERG6

GLO1

PLB2

PLB1

ADI1

HXT2

SEC59

ERG5

FMS1

ARA2

STV1

AAC1

ARG7

ADH3

VBA1

PGM2 ILV2

FOL3ADE17

NDE1

PAH1

ALD3

ALD2

GCV2

ERG2

PFK2

HFA1ERG12 GUA1

ERG8

YMR226C

YHM2

FAA4

GAD1

COX7

TPS3

URA10

SCS7PGM3

GPI12

ABZ2

HER2

LCB1LIP1

ADE4

ADH2

TGL3

ADH6

FET4

FIG4

PHA2PUS4

ERG24

MET2

ALP1

LYP1

PIK1

FOL1

YNL247W

ZWF1

ADE12

SPS19

CHS1

PSD1

MEP2

AAH1

CPT1 NRK1

MLS1

INP52

LEU4

AVT4

MSK1LAT1

COX5A

LAP2

IDH1

KTR5IDP3

PET8

CIT1

LRO1URK1PHO91

ARE2

ABZ1

COQ2

MVD1

LYS9

BIO5 BIO4

DSE4

GRE2

RIB4

ARG8

PFK27MDH2

ITR2

WRS1

COQ3

ADH1

HST1

RIB2

INP54

MET22

PRS5

GPD2ARG1

SPE2

GSH2

MSE1TAT2

PLB3HST3GLO4

VHS3

CYT1

NRT1

CDC21

TGL5RKI1

KTR1

CRC1

LEU9

INP53

GCY1

CAT5

IAH1

ADE2

ORT1

IDH2

LSC1

THI80

ISN1

DDP1GLN4

LCB4

ALE1

HEM15

DCI1

SER1

THI72

HIS3

NPT1

MCT1

ODC2

DFR1

MET7

DGA1

VPH1HEM4

CPA1

DGK1

FAA1

PMT3

PRO2

VMA4

ALA1

PYK2

PUT4

PDE2

ALD4

GDH1

ATF1

IMD2

SAM3

SAM4

ATP15

PLC1

DIP5

FUM1YAH1

HUT1

VMA11

FAS2THI6 PGC1

POS5

COX10

CDC60PPT2

ODC1

IDI1

CAR1

MSD1

MSY1

SSU1

GLR1

YDC1

ATP4

BTS1

ALD6

GRX5

SUR1

KTR6

ISM1

PMA2

ERG10

MET12

CIT3

PDH1

ICL2

ATP20AGC1

ATH1

HTS1

GLN1

VMA13

MSF1 ARO7FCY1

SPE3TKL1

GRS2

PIS1

ANT1

MEP3

TAZ1

ASN1

TPO3

KRE6

GPH1 MET16

DPM1GDB1

QCR2

AQY1

ATP1

BNA4

ILS1

PRS4

PRX1

COR1

FUI1

URA7

RIB1

PET9

ACH1

SCT1 RER2COQ1

UGA2

IPP1

FUR4

CHS3

ETR1

CDS1

HMT1

PDX3

CSG2

CHS2

ATP3

FAT1

CST26

TSC3

BAP2

TAT1

MIS1

AAC3 VPS15

ALG1

LYS2

TKL2

GRS1

TPS1

VMA2

AGP2

ADH5

ARA1

RIB7

IFA38

CSH1

TYR1

ECM31 YPC1RIM2

PGI1

KTR4

LDH1 KTR3

MET8

PYC2

PDB1HIS7

ARO4

DUT1

RIB5

SHM1

TSC10CTP1

VBA2

SUL1

PHO89

CHA1PBN1

APA1

GLK1

ATG22

GRX1

HIS4ILV6

PGS1

CIT2

ADY2

PGK1

SLM5

PMP1

FEN2

FEN1 RBK1

PHO87ARE1

THR4ERS1

TRX3

MPH2

GUD1

GDH2

HEM3

GGC1

VMA1

LYS20

DLD2

DLD1

SFA1

CRD1BPL1

LYS21

QRI1

GET3PMT1

PMT5

RAM1NDE2

THI3

MDH3

COX9IDP1

PSA1

SLC1

FAD1

SIR2

GPD1

TSC13

ATP16

NTH1

GCV1

SES1

ARO3HEM13

BAP3

HEM12

TPI1

TGL2

LCB2

IPT1

TPS2GRX3

ARO1

YCF1EKI1

KGD2

HOM2

ARG82

SDH4HST4

CAB5COQ4

MSS4

ADK1LYS4

FMN1

CTA1

EXG2

MSW1

GLO2DPP1

DPL1

SUR2ATP5

PRO1

GPI11

HNT2IPK1

ASP1

TIM11

YDR341C

HXT7HXT6

HXT3

TRR1TRP4

KEI1

YPR1FRQ1

ARH1ATP17

ATO3

HPT1

URH1

DIT1

ADE8

BNA7

GUK1PHO8

KRE2RIB3

ITR1

SAM2

LPP1

GNP1

GRX2

QCR7

CAB1

STL1

DLD3

HXT13CAN1

PCM1

VMA8

YEL047C

GLY1

GDA1 CYC7

UTR4

BUD16

VMA3

RIP1

URA3

PMP2

YEA6

PMI40YND1

HEM14FAA2 ISC1PRO3YAT2

CHO1

PHM8

SAH1

HOM3PIC2

HIS1

FCY2

FCY21

CEM1

HOR2

ICL1

ARG5

RNR1

ALD5 SER3

ILV1

TRP2

MET6

PRS2

AVT6

COX15YER152C

ADK2TMT1

PDA1

FAU1

AGP3

SEC53

AGX1FRS2

LPD1

GNA1

HXT10

DEG1

GSY1

FAB1

HIS2

MET10

QCR6

BNA6

HXK1HXK2

PDE1

GUS1

ADE5

VRG4SDT1

POX1

ARO8

COX13

COX4TPN1STR3

LYS5

ARO2

GPI10

MET13

COQ8GUP1

FMP37HNM1

NPY1PUS2

PYC1

OLE1

HEM2

PNC1

TRP5

ERG4

LEU1

PMA1

ERG26ECT1

NMA2

VMA7

GSC2

MUP1

ERG25

ADE6

VHT1PDC6

CTT1

VAS1

TPC1

CLD1

MEP1

ASN2

CYS4

CHO2

PSD2

MSM1

ERG1

ATF2

QCR9

TYS1

HIP1

TDH3

PDX1

XKS1

ADE3

SER2 TRX2

PFK1

FMP43

LSC2

CPD1

SOL4

ENO1

COQ6GND2 TNA1

MES1

FOL2

CAB4

BGL2

BIO2

IMA1

MAL11

MAL12

MUP3

GUT1

DUR3

PRS3LAG1

QCR10

LEU5

ERG11

DIA4

ARG4

DED81

YHR020W

THR1

VMA16 PUT2

VMA10

NCP1INM1

COX6

PAN5

DYS1

ERG7

QNS1

MSR1

HXT4

HXT1

HXT5

GEP4

GRE3

TRR2

EPT1 FUR1

ARO9

DCD1

MPC2

SOL3

ENO2

GND1ERG9

BAT1SUC2

POT1

GUT2

PAN6

FLX1

KGD1 AYR1

COX5B

PFK26LYS12

CAB2THS1

SER33

RHR2

HIS6

PDR11

DOT5FAA3

YIA6

INP51DAL1

DAL4

DAL2

DAL7DAL3

LYS1

HYR1HTD2INM2

DUR1

SKN1

AUS1 KCS1

FAS1

THI7

HEM1TSA2

PCT1

PXA1

HST2

NAM2 APA2

YEF1

RNR4TRP1

ACC1

THI20ADH4

GLT1UGA1

HIS5

TPO4AGP1FMT1

PUS12

3

1 2 3log10(Nit_FPKM)

Pre

dict

ed lo

g10(

Nit_

FPK

M)

R2: 0.339

GDH3

BDH1ACS1

GCV3

CDC19

PMT2

FUN26

CYS3

ADE1

YAT1

PHO11

IMA5

OPT1 PHO90ELO1

ERG20

FBP26

INO1

YUR1

GLG2

LCB3

URA2

TRK1

RPE1

GSH1

LSB6

PHS1

GWT1

ARG3

ARG2

YJL068C

BNA3

TDH1

YJL045W

RNR2

CYR1AVT1

TDH2

MET3

GPI14

ILV3

TES1

BNA1

CYC1

UTR1

CDC8

OPI3

MIR1

BNA2 SFC1

URA8

ADO1

CPA2

YMR1

ATP2

STR2

XPT1

MET5

HOM6

PMT4

BAT2

DAL5

JEN1

URA1

SAC1

TRP3

MST1PXA2

SPE1 PRS1

TPO5

MCD4

GPM1

SDH1AVT3

SDH3

TGL1

PGM1

OAC1

AAT1GFA1

YJU3

CAB3

MDH1

VMA5

YNK1

FBA1

OAR1

UGP1MAE1

GPX1

URA6RAM2

ATP7

LAC1

AUR1

MET14

FOX2

SPO14

GAP1

SHB17

YSR3

GLG1

KTR2

CCP1GPT2

MET1

SIS2

MTD1

TGL4PCK1

MHT1MMP1

AQY2

YBT1 FPS1

SDH2

GPI13

TPO1

DPS1

BPT1

YEH1

LOT6

MEU1YEH2

AAT2

ADE16

COX12

PDC1

ERG3

SHM2

FRS1

XYL2 ALT1

SUL2

ERG27

AHP1

CKI1

PDC5NHA1

PUT1

SPE4

ACS2

ASP3−2ASP3−3

DPH5

IDP2

SAM1ATG26

COQ9PNP1

BNA5

VPS34

CDD1

GSY2

LCB5ECI1NNT1

ATP14

ECM38

EXG1

MET17ACO1

STT4

CDA1

NMA1

FKS1

DIC1

TAL1

ILV5

ADE13SUR4FBP1

VAC14

COX8

VIP1

URA4

IMD3

CAR2

VMA6

HMG2

ERG13

PHO84

NDI1

COQ5

URA5

TSL1

ALO1

ATP18

HMG1

DAK1

NTE1

IMD4

CYB2

CAT2AMD1

APT1

ERG6

GLO1PLB2

PLB1

ADI1

HXT2

SEC59

ERG5

FMS1

ARA2

STV1

AAC1

ARG7

ADH3

VBA1

PGM2

ILV2

FOL3

ADE17

NDE1

PAH1

ALD3

ALD2

GCV2

ERG2

PFK2

HFA1

ERG12GUA1

ERG8

YMR226CYHM2

FAA4

GAD1

COX7

TPS3 URA10

SCS7

PGM3

GPI12

ABZ2

HER2

LCB1

LIP1

ADE4ADH2TGL3

ADH6

FET4

FIG4

PHA2

PUS4

ERG24MET2

ALP1

LYP1

PIK1

FOL1 YNL247W

ZWF1

ADE12

SPS19

CHS1PSD1 MEP2

AAH1

CPT1NRK1

MLS1

INP52

LEU4

AVT4

MSK1

LAT1

COX5A

LAP2

IDH1

KTR5

IDP3

PET8

CIT1

LRO1

URK1

PHO91

ARE2

ABZ1

COQ2MVD1

LYS9

BIO5BIO4

DSE4GRE2

RIB4

ARG8

PFK27

MDH2

ITR2 WRS1

COQ3

ADH1

HST1

RIB2INP54

MET22

PRS5

GPD2ARG1

SPE2

GSH2

MSE1

TAT2PLB3

HST3GLO4

VHS3

CYT1

NRT1

CDC21

TGL5

RKI1

KTR1

CRC1

LEU9

INP53

GCY1

CAT5

IAH1 ADE2

ORT1

IDH2

LSC1

THI80

ISN1

DDP1GLN4

LCB4

ALE1HEM15

DCI1

SER1

THI72

HIS3

NPT1

MCT1

ODC2

DFR1

MET7

DGA1

VPH1

HEM4

CPA1

DGK1

FAA1

PMT3

PRO2

VMA4

ALA1

PYK2

PUT4

PDE2

ALD4

GDH1

ATF1IMD2

SAM3 SAM4 ATP15

PLC1

DIP5FUM1

YAH1

HUT1VMA11

FAS2

THI6

PGC1

POS5

COX10

CDC60

PPT2

ODC1

IDI1

CAR1

MSD1MSY1

SSU1

GLR1

YDC1ATP4

BTS1

ALD6

GRX5

SUR1

KTR6

ISM1

PMA2

ERG10

MET12CIT3

PDH1

ICL2ATP20

AGC1

ATH1

HTS1

GLN1

VMA13

MSF1

ARO7

FCY1

SPE3

TKL1

GRS2

PIS1

ANT1

MEP3

TAZ1

ASN1

TPO3

KRE6GPH1

MET16DPM1

GDB1

QCR2

AQY1

ATP1

BNA4

ILS1

PRS4

PRX1

COR1

FUI1

URA7

RIB1

PET9

ACH1

SCT1

RER2COQ1

UGA2

IPP1

FUR4CHS3

ETR1

CDS1

HMT1 PDX3CSG2

CHS2

ATP3

FAT1

CST26

TSC3

BAP2

TAT1MIS1

AAC3

VPS15

ALG1

LYS2

TKL2

GRS1

TPS1

VMA2

AGP2ADH5

ARA1RIB7 IFA38

CSH1TYR1

ECM31

YPC1

RIM2

PGI1

KTR4

LDH1

KTR3

MET8

PYC2

PDB1

HIS7

ARO4

DUT1

RIB5SHM1

TSC10

CTP1VBA2

SUL1PHO89

CHA1

PBN1

APA1

GLK1

ATG22

GRX1

HIS4

ILV6

PGS1 CIT2

ADY2

PGK1

SLM5

PMP1

FEN2

FEN1

RBK1PHO87

ARE1

THR4

ERS1

TRX3

MPH2

GUD1

GDH2

HEM3

GGC1

VMA1

LYS20

DLD2

DLD1

SFA1

CRD1BPL1

LYS21

QRI1

GET3

PMT1

PMT5

RAM1 NDE2

THI3

MDH3

COX9IDP1

PSA1SLC1

FAD1

SIR2

GPD1

TSC13

ATP16

NTH1GCV1

SES1

ARO3

HEM13BAP3

HEM12

TPI1

TGL2

LCB2

IPT1

TPS2

GRX3

ARO1

YCF1EKI1

KGD2HOM2

ARG82

SDH4

HST4

CAB5COQ4

MSS4ADK1

LYS4

FMN1

CTA1

EXG2

MSW1

GLO2

DPP1DPL1SUR2

ATP5

PRO1

GPI11 HNT2

IPK1

ASP1

TIM11YDR341CHXT7

HXT6

HXT3TRR1

TRP4

KEI1

YPR1FRQ1ARH1

ATP17ATO3

HPT1

URH1

DIT1 ADE8

BNA7

GUK1

PHO8

KRE2

RIB3

ITR1

SAM2

LPP1

GNP1

GRX2

QCR7

CAB1 STL1

DLD3

HXT13CAN1

PCM1VMA8

YEL047C

GLY1

GDA1

CYC7UTR4

BUD16VMA3

RIP1

URA3 PMP2

YEA6

PMI40YND1

HEM14

FAA2

ISC1PRO3

YAT2

CHO1

PHM8 SAH1

HOM3

PIC2

HIS1FCY2

FCY21

CEM1

HOR2

ICL1

ARG5

RNR1

ALD5

SER3

ILV1

TRP2

MET6

PRS2

AVT6

COX15

YER152C

ADK2

TMT1PDA1

FAU1

AGP3SEC53

AGX1

FRS2

LPD1

GNA1HXT10

DEG1GSY1

FAB1

HIS2

MET10

QCR6

BNA6

HXK1

HXK2PDE1

GUS1

ADE5

VRG4

SDT1

POX1

ARO8

COX13COX4

TPN1

STR3

LYS5

ARO2

GPI10

MET13

COQ8

GUP1FMP37

HNM1NPY1

PUS2

PYC1

OLE1

HEM2

PNC1

TRP5

ERG4

LEU1PMA1

ERG26

ECT1

NMA2

VMA7

GSC2

MUP1

ERG25

ADE6 VHT1PDC6

CTT1

VAS1TPC1

CLD1MEP1

ASN2

CYS4

CHO2

PSD2

MSM1

ERG1

ATF2

QCR9

TYS1HIP1

TDH3

PDX1

XKS1

ADE3

SER2TRX2

PFK1

FMP43

LSC2

CPD1

SOL4

ENO1

COQ6

GND2

TNA1

MES1FOL2

CAB4

BGL2

BIO2

IMA1

MAL11

MAL12

MUP3

GUT1

DUR3

PRS3

LAG1

QCR10

LEU5

ERG11

DIA4

ARG4

DED81

YHR020W

THR1

VMA16

PUT2

VMA10

NCP1

INM1

COX6

PAN5

DYS1

ERG7

QNS1MSR1

HXT4

HXT1

HXT5

GEP4GRE3

TRR2

EPT1

FUR1

ARO9DCD1

MPC2

SOL3

ENO2

GND1

ERG9

BAT1SUC2

POT1

GUT2

PAN6

FLX1

KGD1

AYR1COX5B

PFK26

LYS12

CAB2

THS1

SER33

RHR2

HIS6

PDR11 DOT5FAA3YIA6

INP51

DAL1DAL4

DAL2

DAL7

DAL3

LYS1

HYR1

HTD2

INM2

DUR1

SKN1AUS1

KCS1

FAS1

THI7

HEM1

TSA2PCT1

PXA1

HST2NAM2

APA2

YEF1

RNR4

TRP1

ACC1

THI20

ADH4

GLT1

UGA1

HIS5

TPO4

AGP1FMT1

PUS1

1.5

2.0

2.5

3.0

3.5

1 2 3log10(Glu_FPKM)

Pred

icte

d lo

g10(

Glu

_FPK

M)

R2: 0.342

GDH3

BDH1

ACS1

GCV3

CDC19

PMT2

FUN26

CYS3ADE1

YAT1

PHO11

IMA5OPT1

PHO90

ELO1 ERG20

FBP26

INO1

YUR1

GLG2LCB3

URA2

TRK1

RPE1

GSH1

LSB6

PHS1

GWT1

ARG3

ARG2

YJL068C

BNA3

TDH1

YJL045W

RNR2

CYR1AVT1

TDH2

MET3

GPI14

ILV3

TES1

BNA1

CYC1

UTR1CDC8

OPI3

MIR1BNA2

SFC1URA8

ADO1

CPA2YMR1

ATP2

STR2 XPT1

MET5

HOM6

PMT4

BAT2

DAL5

JEN1

URA1

SAC1

TRP3

MST1PXA2

SPE1PRS1

TPO5

MCD4

GPM1

SDH1AVT3

SDH3

TGL1

PGM1

OAC1

AAT1GFA1

YJU3

CAB3

MDH1VMA5

YNK1

FBA1

OAR1

UGP1

MAE1

GPX1

URA6

RAM2

ATP7LAC1 AUR1

MET14

FOX2SPO14

GAP1SHB17

YSR3

GLG1

KTR2CCP1GPT2

MET1SIS2

MTD1

TGL4

PCK1

MHT1MMP1AQY2

YBT1 FPS1

SDH2GPI13

TPO1

DPS1

BPT1

YEH1

LOT6

MEU1

YEH2

AAT2

ADE16

COX12

PDC1

ERG3

SHM2

FRS1

XYL2

ALT1

SUL2

ERG27

AHP1

CKI1

PDC5NHA1

PUT1

SPE4

ACS2

ASP3−2

ASP3−3

DPH5

IDP2

SAM1

ATG26

COQ9PNP1

BNA5

VPS34CDD1

GSY2LCB5ECI1 NNT1

ATP14

ECM38

EXG1MET17

ACO1

STT4

CDA1

NMA1

FKS1

DIC1

TAL1

ILV5

ADE13

SUR4FBP1VAC14

COX8

VIP1 URA4

IMD3CAR2

VMA6

HMG2

ERG13

PHO84

NDI1

COQ5URA5

TSL1

ALO1

ATP18

HMG1

DAK1

NTE1

IMD4

CYB2

CAT2AMD1 APT1

ERG6

GLO1PLB2

PLB1

ADI1

HXT2

SEC59

ERG5

FMS1

ARA2

STV1

AAC1ARG7

ADH3

VBA1PGM2

ILV2

FOL3

ADE17

NDE1

PAH1

ALD3

ALD2

GCV2

ERG2

PFK2

HFA1

ERG12

GUA1

ERG8

YMR226C

YHM2FAA4

GAD1

COX7

TPS3

URA10

SCS7

PGM3 GPI12

ABZ2

HER2

LCB1

LIP1

ADE4

ADH2

TGL3ADH6

FET4

FIG4

PHA2

PUS4

ERG24

MET2

ALP1

LYP1

PIK1

FOL1YNL247W

ZWF1

ADE12

SPS19

CHS1PSD1

MEP2

AAH1

CPT1NRK1

MLS1INP52

LEU4

AVT4

MSK1

LAT1

COX5A

LAP2

IDH1

KTR5

IDP3

PET8

CIT1

LRO1URK1

PHO91ARE2

ABZ1

COQ2

MVD1LYS9

BIO5BIO4

DSE4

GRE2

RIB4

ARG8

PFK27MDH2

ITR2WRS1COQ3

ADH1

HST1

RIB2

INP54

MET22

PRS5

GPD2

ARG1

SPE2

GSH2MSE1

TAT2

PLB3

HST3

GLO4VHS3

CYT1

NRT1

CDC21TGL5

RKI1

KTR1

CRC1

LEU9

INP53 GCY1CAT5

IAH1

ADE2

ORT1IDH2

LSC1THI80 ISN1

DDP1

GLN4

LCB4 ALE1HEM15

DCI1

SER1

THI72

HIS3

NPT1

MCT1

ODC2DFR1

MET7DGA1

VPH1

HEM4

CPA1

DGK1

FAA1

PMT3PRO2

VMA4ALA1

PYK2

PUT4

PDE2

ALD4

GDH1

ATF1

IMD2

SAM3SAM4

ATP15

PLC1

DIP5

FUM1

YAH1

HUT1VMA11

FAS2

THI6

PGC1POS5COX10

CDC60

PPT2ODC1

IDI1

CAR1

MSD1MSY1

SSU1

GLR1

YDC1

ATP4

BTS1

ALD6

GRX5

SUR1

KTR6

ISM1PMA2

ERG10

MET12

CIT3PDH1

ICL2

ATP20

AGC1

ATH1

HTS1

GLN1VMA13

MSF1

ARO7

FCY1

SPE3

TKL1

GRS2 PIS1ANT1MEP3

TAZ1

ASN1

TPO3KRE6GPH1

MET16

DPM1

GDB1

QCR2AQY1

ATP1

BNA4 ILS1PRS4PRX1

COR1

FUI1

URA7RIB1

PET9ACH1

SCT1

RER2COQ1

UGA2

IPP1FUR4

CHS3

ETR1

CDS1

HMT1

PDX3

CSG2CHS2

ATP3

FAT1CST26

TSC3

BAP2

TAT1 MIS1AAC3

VPS15ALG1

LYS2

TKL2

GRS1

TPS1VMA2AGP2

ADH5

ARA1

RIB7IFA38

CSH1TYR1

ECM31

YPC1

RIM2

PGI1

KTR4

LDH1KTR3

MET8

PYC2

PDB1

HIS7

ARO4

DUT1

RIB5SHM1

TSC10CTP1

VBA2

SUL1PHO89

CHA1

PBN1APA1

GLK1

ATG22

GRX1

HIS4

ILV6

PGS1

CIT2

ADY2

PGK1

SLM5

PMP1

FEN2

FEN1RBK1PHO87

ARE1

THR4ERS1

TRX3

MPH2

GUD1

GDH2

HEM3

GGC1

VMA1

LYS20

DLD2

DLD1

SFA1CRD1

BPL1

LYS21

QRI1

GET3PMT1PMT5

RAM1

NDE2

THI3 MDH3COX9

IDP1

PSA1SLC1

FAD1

SIR2

GPD1

TSC13

ATP16NTH1

GCV1

SES1

ARO3

HEM13

BAP3

HEM12

TPI1

TGL2LCB2

IPT1

TPS2

GRX3ARO1

YCF1

EKI1

KGD2

HOM2

ARG82SDH4

HST4

CAB5

COQ4

MSS4

ADK1

LYS4

FMN1

CTA1EXG2MSW1

GLO2DPP1

DPL1

SUR2

ATP5

PRO1GPI11HNT2

IPK1

ASP1TIM11

YDR341C

HXT7

HXT6

HXT3

TRR1

TRP4

KEI1

YPR1FRQ1

ARH1 ATP17

ATO3 HPT1

URH1

DIT1

ADE8

BNA7

GUK1

PHO8

KRE2

RIB3ITR1

SAM2

LPP1

GNP1

GRX2

QCR7

CAB1

STL1

DLD3

HXT13

CAN1

PCM1

VMA8

YEL047C

GLY1

GDA1

CYC7UTR4

BUD16 VMA3

RIP1

URA3

PMP2

YEA6 PMI40YND1HEM14

FAA2

ISC1PRO3

YAT2

CHO1

PHM8

SAH1HOM3

PIC2

HIS1

FCY2

FCY21

CEM1

HOR2

ICL1

ARG5

RNR1

ALD5

SER3

ILV1

TRP2

MET6

PRS2AVT6

COX15

YER152C

ADK2TMT1

PDA1

FAU1AGP3

SEC53

AGX1

FRS2

LPD1GNA1

HXT10

DEG1

GSY1

FAB1

HIS2 MET10

QCR6

BNA6

HXK1

HXK2

PDE1GUS1

ADE5

VRG4SDT1

POX1

ARO8

COX13COX4

TPN1

STR3

LYS5

ARO2

GPI10

MET13

COQ8

GUP1

FMP37

HNM1

NPY1

PUS2

PYC1

OLE1

HEM2

PNC1TRP5

ERG4

LEU1

PMA1

ERG26

ECT1

NMA2

VMA7GSC2

MUP1

ERG25

ADE6VHT1

PDC6

CTT1VAS1

TPC1

CLD1

MEP1

ASN2

CYS4

CHO2PSD2

MSM1

ERG1

ATF2

QCR9

TYS1HIP1

TDH3

PDX1XKS1

ADE3

SER2 TRX2

PFK1 FMP43

LSC2

CPD1

SOL4

ENO1

COQ6

GND2TNA1MES1 FOL2

CAB4 BGL2

BIO2

IMA1

MAL11

MAL12

MUP3

GUT1DUR3

PRS3LAG1

QCR10

LEU5

ERG11

DIA4

ARG4

DED81YHR020W

THR1

VMA16

PUT2

VMA10

NCP1

INM1

COX6

PAN5

DYS1

ERG7

QNS1

MSR1HXT4

HXT1

HXT5

GEP4

GRE3

TRR2EPT1

FUR1

ARO9

DCD1

MPC2

SOL3

ENO2

GND1

ERG9

BAT1

SUC2

POT1

GUT2

PAN6

FLX1

KGD1

AYR1

COX5B

PFK26

LYS12

CAB2

THS1SER33

RHR2

HIS6PDR11

DOT5FAA3

YIA6INP51DAL1

DAL4

DAL2

DAL7

DAL3

LYS1

HYR1

HTD2

INM2

DUR1

SKN1

AUS1

KCS1

FAS1

THI7

HEM1

TSA2

PCT1

PXA1

HST2

NAM2

APA2

YEF1

RNR4

TRP1

ACC1

THI20

ADH4GLT1

UGA1

HIS5

TPO4

AGP1

FMT1

PUS1

1

2

3

1 2 3log10(Ana_FPKM)

R2: 0.347

GDH3BDH1

ACS1

GCV3

CDC19

PMT2

FUN26

CYS3

ADE1

YAT1

PHO11

IMA5

OPT1

PHO90

ELO1ERG20

FBP26

INO1

YUR1

GLG2

LCB3

URA2

TRK1

RPE1

GSH1LSB6

PHS1

GWT1

ARG3

ARG2YJL068CBNA3

TDH1YJL045W

RNR2

CYR1AVT1

TDH2

MET3

GPI14

ILV3

TES1BNA1

CYC1

UTR1CDC8

OPI3

MIR1

BNA2

SFC1

URA8

ADO1CPA2YMR1

ATP2

STR2

XPT1

MET5

HOM6PMT4

BAT2

DAL5

JEN1

URA1

SAC1

TRP3

MST1PXA2

SPE1

PRS1

TPO5MCD4

GPM1

SDH1

AVT3

SDH3

TGL1

PGM1

OAC1

AAT1

GFA1 YJU3

CAB3 MDH1

VMA5

YNK1

FBA1

OAR1UGP1

MAE1

GPX1URA6

RAM2

ATP7

LAC1

AUR1

MET14

FOX2

SPO14

GAP1

SHB17

YSR3

GLG1KTR2

CCP1

GPT2

MET1SIS2

MTD1

TGL4

PCK1

MHT1

MMP1

AQY2

YBT1

FPS1

SDH2

GPI13

TPO1

DPS1

BPT1

YEH1

LOT6

MEU1YEH2

AAT2

ADE16

COX12

PDC1

ERG3SHM2

FRS1

XYL2 ALT1

SUL2

ERG27

AHP1

CKI1

PDC5 NHA1

PUT1

SPE4

ACS2

ASP3−2

ASP3−3

DPH5

IDP2

SAM1

ATG26COQ9

PNP1

BNA5

VPS34CDD1

GSY2

LCB5

ECI1NNT1

ATP14

ECM38

EXG1

MET17

ACO1

STT4CDA1

NMA1

FKS1DIC1

TAL1

ILV5

ADE13SUR4

FBP1

VAC14

COX8

VIP1

URA4

IMD3

CAR2

VMA6

HMG2

ERG13PHO84

NDI1

COQ5URA5

TSL1

ALO1

ATP18

HMG1

DAK1NTE1

IMD4

CYB2

CAT2

AMD1APT1

ERG6

GLO1PLB2

PLB1

ADI1 HXT2SEC59

ERG5

FMS1

ARA2

STV1

AAC1

ARG7

ADH3

VBA1 PGM2

ILV2

FOL3

ADE17

NDE1

PAH1ALD3

ALD2

GCV2

ERG2

PFK2

HFA1

ERG12

GUA1

ERG8

YMR226C

YHM2FAA4

GAD1

COX7

TPS3URA10

SCS7

PGM3

GPI12

ABZ2

HER2

LCB1

LIP1

ADE4

ADH2

TGL3ADH6

FET4

FIG4 PHA2PUS4

ERG24

MET2

ALP1LYP1

PIK1

FOL1

YNL247W

ZWF1

ADE12

SPS19

CHS1

PSD1

MEP2

AAH1

CPT1

NRK1

MLS1

INP52

LEU4

AVT4

MSK1 LAT1

COX5A

LAP2

IDH1

KTR5

IDP3

PET8

CIT1

LRO1URK1

PHO91

ARE2

ABZ1COQ2 MVD1

LYS9

BIO5

BIO4

DSE4GRE2

RIB4

ARG8

PFK27

MDH2

ITR2

WRS1COQ3

ADH1HST1

RIB2INP54

MET22

PRS5

GPD2 ARG1

SPE2

GSH2MSE1

TAT2

PLB3

HST3GLO4

VHS3

CYT1

NRT1

CDC21

TGL5

RKI1

KTR1

CRC1

LEU9

INP53

GCY1

CAT5IAH1

ADE2

ORT1

IDH2

LSC1

THI80

ISN1

DDP1

GLN4

LCB4ALE1HEM15

DCI1

SER1

THI72HIS3

NPT1

MCT1 ODC2

DFR1

MET7

DGA1

VPH1

HEM4

CPA1

DGK1

FAA1

PMT3

PRO2VMA4

ALA1

PYK2

PUT4

PDE2

ALD4

GDH1

ATF1

IMD2

SAM3SAM4

ATP15

PLC1DIP5

FUM1

YAH1

HUT1

VMA11FAS2

THI6

PGC1POS5

COX10

CDC60

PPT2

ODC1

IDI1

CAR1

MSD1

MSY1

SSU1

GLR1

YDC1

ATP4

BTS1

ALD6

GRX5

SUR1

KTR6

ISM1PMA2

ERG10

MET12

CIT3

PDH1

ICL2

ATP20

AGC1

ATH1

HTS1

GLN1

VMA13

MSF1ARO7

FCY1SPE3

TKL1

GRS2

PIS1

ANT1

MEP3

TAZ1

ASN1

TPO3

KRE6GPH1 MET16

DPM1GDB1

QCR2

AQY1

ATP1

BNA4

ILS1PRS4

PRX1

COR1

FUI1

URA7

RIB1

PET9

ACH1

SCT1

RER2COQ1UGA2 IPP1

FUR4CHS3

ETR1

CDS1

HMT1

PDX3

CSG2

CHS2

ATP3

FAT1

CST26

TSC3BAP2TAT1

MIS1

AAC3

VPS15

ALG1

LYS2

TKL2

GRS1

TPS1

VMA2AGP2

ADH5

ARA1

RIB7IFA38

CSH1

TYR1

ECM31

YPC1

RIM2

PGI1

KTR4

LDH1 KTR3

MET8

PYC2

PDB1

HIS7

ARO4

DUT1

RIB5SHM1

TSC10CTP1

VBA2

SUL1

PHO89

CHA1 PBN1

APA1

GLK1

ATG22

GRX1

HIS4

ILV6

PGS1

CIT2ADY2

PGK1

SLM5

PMP1

FEN2

FEN1RBK1

PHO87ARE1

THR4

ERS1

TRX3

MPH2

GUD1

GDH2

HEM3

GGC1

VMA1

LYS20

DLD2

DLD1

SFA1CRD1BPL1

LYS21

QRI1

GET3

PMT1PMT5RAM1

NDE2

THI3MDH3

COX9IDP1

PSA1SLC1FAD1

SIR2

GPD1

TSC13

ATP16

NTH1

GCV1

SES1

ARO3HEM13

BAP3

HEM12

TPI1

TGL2LCB2

IPT1

TPS2

GRX3

ARO1

YCF1

EKI1

KGD2

HOM2ARG82

SDH4

HST4CAB5

COQ4

MSS4

ADK1

LYS4

FMN1

CTA1

EXG2MSW1

GLO2

DPP1

DPL1

SUR2

ATP5

PRO1

GPI11HNT2

IPK1

ASP1

TIM11

YDR341C

HXT7HXT6

HXT3

TRR1TRP4

KEI1

YPR1

FRQ1

ARH1

ATP17

ATO3

HPT1URH1

DIT1

ADE8

BNA7

GUK1

PHO8

KRE2

RIB3

ITR1 SAM2

LPP1

GNP1

GRX2

QCR7

CAB1

STL1

DLD3

HXT13CAN1

PCM1

VMA8YEL047C

GLY1

GDA1CYC7UTR4 BUD16

VMA3

RIP1

URA3

PMP2

YEA6PMI40

YND1

HEM14

FAA2

ISC1

PRO3

YAT2

CHO1

PHM8

SAH1

HOM3

PIC2HIS1

FCY2

FCY21

CEM1

HOR2

ICL1

ARG5

RNR1

ALD5

SER3

ILV1

TRP2

MET6

PRS2AVT6COX15YER152C

ADK2TMT1 PDA1

FAU1

AGP3

SEC53

AGX1

FRS2

LPD1

GNA1

HXT10

DEG1

GSY1

FAB1

HIS2

MET10

QCR6

BNA6

HXK1

HXK2

PDE1

GUS1

ADE5

VRG4

SDT1

POX1ARO8

COX13

COX4

TPN1

STR3

LYS5ARO2

GPI10

MET13

COQ8

GUP1

FMP37

HNM1

NPY1

PUS2

PYC1

OLE1

HEM2

PNC1TRP5

ERG4

LEU1

PMA1

ERG26

ECT1NMA2

VMA7

GSC2

MUP1

ERG25

ADE6

VHT1PDC6

CTT1VAS1

TPC1

CLD1

MEP1

ASN2CYS4CHO2

PSD2

MSM1

ERG1ATF2

QCR9

TYS1

HIP1

TDH3

PDX1XKS1

ADE3

SER2

TRX2

PFK1

FMP43

LSC2

CPD1 SOL4

ENO1

COQ6GND2

TNA1

MES1 FOL2

CAB4BGL2

BIO2IMA1

MAL11

MAL12

MUP3

GUT1

DUR3

PRS3

LAG1

QCR10

LEU5

ERG11

DIA4

ARG4DED81

YHR020W

THR1

VMA16

PUT2

VMA10NCP1

INM1

COX6

PAN5

DYS1

ERG7QNS1

MSR1HXT4

HXT1

HXT5

GEP4GRE3

TRR2EPT1 FUR1

ARO9DCD1

MPC2

SOL3

ENO2

GND1ERG9

BAT1

SUC2

POT1 GUT2PAN6

FLX1

KGD1

AYR1

COX5B

PFK26

LYS12CAB2

THS1

SER33

RHR2

HIS6PDR11

DOT5

FAA3

YIA6INP51

DAL1DAL4

DAL2DAL7DAL3

LYS1

HYR1HTD2

INM2

DUR1

SKN1

AUS1

KCS1

FAS1

THI7

HEM1

TSA2

PCT1

PXA1

HST2

NAM2

APA2

YEF1

RNR4

TRP1

ACC1

THI20ADH4

GLT1

UGA1 HIS5

TPO4

AGP1

FMT1 PUS1

2

3

1 2 3 4log10(Eth_FPKM)

Pre

dict

ed lo

g10(

Eth

_FP

KM

)

R2: 0.428

Aerobic fermentation(nitrogen limited)

Gluconeogenic respiration(ethanol limited)

Respiratory glucose metabolism(glucose limited)

Fermentative glucose metabolism(glucose limited, anaerobic)

Gluconeogenic respiration(ethanol limited)

Aerobic fermentation(nitrogen limited)

Fermentative glucose metabolism(glucose limited, anaerobic)Respiratory glucose metabolism(glucose limited)

NitEth

AnaGlu

Nit

Eth

Ana

Glu

Nit

Eth

AnaGlu

Nit

Eth

Ana

Glu

Nit

Eth

Ana

Glu Nit

Eth

Ana

Glu Nit

Eth

Ana

Glu

0.2

0.3

0.4

GEMSNR > 1

Peak signal

GEMSNR > 2

Peak signal

GEMSNR > 3

Peak signal

GEMSNR > 2Sum of

−50 to +50 bpsurrounding

peaks

Sum of TSS−500 −> +500 bp

promotersignal

Sum of TSS−500 −> +500 bp

promotersignal

allowingMARS hinges

R2

Pre

dict

ed lo

g10(

Ana

_FP

KM

)

Figure 2. Comparing performance of TF binding data preprocessing in linear regressions to predict transcript levels and details of multivariate adaptiveregression splines (MARS) models. (A) Correlations between predicted transcript levels and real transcript levels for the different formats of TF bindingdata. The black line indicates the mean of the four metabolic conditions. (B–E) MARS used to predict metabolic gene transcript levels of the differentconditions from the amount of TF binding per gene promoter. The boxes shown below the predictions plots represent the different TFs that are selectedby MARS to give strongest predictive performance in the conditions and how their signal is contributing to predictions in the model.

each condition, we used MARS with cross validation for theTF and spline selection to ensure that only the most robustTFs and splines were included in the final model (modelbuilding is illustrated in Supplementary Figure S3A). Theresulting predictions of transcript levels from TF bindingusing MARS for the four metabolic conditions are shownin Figure 2B–E. Using these conservative MARS models,

we could predict 34–43% of the variation in expression lev-els of metabolic genes at the four metabolic conditions weinvestigated. We judged this predictive power as being suf-ficiently good to assume that the parameters given to theTFs in these predictive models can give insights into howthe TFs are contributing to gene regulation in metabolism.Typical quality control checks of the MARS models pre-

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 8: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

4992 Nucleic Acids Research, 2019, Vol. 47, No. 10

sented in Figure 2B–E, such as the distribution of residualsand QQ plots are shown in Supplementary Figure S3B.

The details of how TFs are contributing at different TFbinding strengths in these predictive models are shown inthe lower panels of Figure 2B–E. The coefficient of the TFsfrom the regressions are illustrated here by either an up-ward slope (activation) or a downward slope (repression).A striking observation from these models is that of the 22TF-transcript level correlations selected by MARS over thefour metabolic conditions, the linear correlations were allpositive, suggesting a predominance of activation in yeastmetabolic gene regulation by TFs. MARS can also createsegments where there is no correlation between TF bindingand transcript levels if that is advantageous for the predic-tions. The most common way that the MARS algorithmscreated splines from TF binding (15/22 cases) is to intro-duce a threshold after which there is a linear relationshipbetween TF binding and transcript level. MARS also foundrelationships between TF binding and transcription thatshow a saturation effect, where more binding does not leadto higher expression (6/22 cases). Of these six cases, fourof them are from the well-known interaction partners Ino2and Ino4, suggesting that nonlinearity between binding andfunctional outcomes may be a general feature of this TFcomplex.

Contributions from several TFs on gene regulation are gen-erally additive, but exploring collinear TFs indicates cases ofmore complex functional interactions

In multiple linear regressions, correlation between explana-tory variables (here TFs) leads to multicollinearity, a redun-dancy in the information contained in explanatory variableswhich may complicate the interpretation of the model, Inour data, collinearity between TFs could be due to TFsinteracting in a common protein complex, or respondingindependently to the same cellular cues that regulate TF-DNA binding. A feature of the TF selection in MARS isthat if there is collinearity between two TFs that are bothstrongly predictive of transcript levels, only the TF thatgives slightly better predictions will be included while theother TF will not be visible. To look for such collinearTFs we first mapped the correlations in binding signal overmetabolic genes and calculated significance of the correla-tions (Figure 3A–D). To find cases in the MARS models ofcollinearity where a TF is not included because there is aslightly better predictor selected, we tested if all includedTFs could be substituted by other TFs with significantlycorrelated binding. Such cases are shown with black bordersin Figure 3A–D and they are TF pairs predicted to regulatesimilar sets of genes in similar ways, indicating overlappingfunctions and/or possibly more complex TF–TF interac-tions. This analysis revealed several known cases of TF in-teractions such as Ino2–Ino4, Gcr1-Gcr2-Tye7 and Cat8-Sip4, but also novel potential interactions such as Gcn4-Rtg1 and Ert1-Ino4.

In the MARS models shown in Figure 2B–E, the contri-bution of TFs binding to each gene is multiplied by a coef-ficient and then added to get the final predicted transcriptlevel for that gene. We further looked for TF-TF interac-tions that contribute to transcriptional regulation in ways

that are numerically more complex than simple addition.All the significantly correlated TFs were tested if the multi-plication of the signal of two collinear TFs give additionalpredictive power compared to addition of the two TFs (Fig-ure 3E–H). Most collinear TF pairs do not show a strongimprovement in predictive power by including a multiplica-tive interaction term, for example the mentioned potentialTF interactions of Cat8-Sip4 and Gcn4-Rtg1 during glu-coneogenic respiration which only gave a 3% and 4% in-crease in predictive power, respectively (Figure 3F, percent-age improvement calculated by (multiplicative R2 increase(y-axis) + additive R2 (x-axis))/additive R2 (x-axis)). TheTF pair that shows the clearest indications of having a morecomplex functional interaction is Ino2–Ino4, having 19%,11%, 39% and 20% improvement (Figure 3E–H) in predic-tive power in the tested metabolic conditions by including amultiplication of the binding signals. TF pairs that togetherexplain >10% of the metabolic gene variation using an onlyadditive regression and also show minimum 10% improvedpredictive power when allowing multiplication are indicatedin red in Figure 3E–H. For Ino2–Ino4, the strongest effectof the multiplication term is seen during fermentative glu-cose metabolism with 39% improved predictive power (Fig-ure 3G). The plot for how the multiplied Ino2–Ino4 signalis contributing to the regression in this condition reveal thatin the genes where both TFs bind strongest together, thereis a predicted reduced activation as compared to interme-diate binding strengths of both TFs, and a similar trend isseen for the Ino2–Ino4 pair for other metabolic conditions(Supplementary Figure S3c).

Clustering metabolic genes based on their relative change inexpression gives a strong enrichment of metabolic processesand improved predictive power of TF binding in linear regres-sions

Linear regressions of metabolic genes with TF selectionthrough MARS defined a small set of TFs that wererobustly associated with transcriptional changes over allmetabolic genes (Figure 2B–E), but TFs that only regu-late a smaller group of genes would be unlikely to get se-lected by this method. We clustered genes by their sum-of-squares normalized expression between conditions to getsmaller clusters of genes with a range of gene expression lev-els that are appropriate for predictive modeling by multiplelinear regressions. The motivation for clustering genes intosmaller groups is to be able to link TFs to specific patterns ofgene expression changes between the tested metabolic con-ditions and to functionally connected groups of genes– thusallowing more detailed predictions about the TFs’ biologi-cal roles. The optimal number of clusters to maximize theseparation of the normalized expression values of metabolicgenes was 16, as determined by Bayesian information crite-rion (Supplementary Figure S4A). Genes were sorted into16 clusters by k-means clustering and we found that mostclusters then show significant enrichment of metabolic pro-cesses, represented by GO categories (Figure 4). We furtherselected four clusters (indicated by black frames in Figure4) that are both enriched for genes of central metabolic pro-cesses and have large transcriptional changes across the dif-ferent metabolic conditions for further studies of how TFs

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 9: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

Nucleic Acids Research, 2019, Vol. 47, No. 10 4993

p-value*** < 0.001** < 0.01* < 0.05

p-value*** < 0.001** < 0.01* < 0.05

*

*

*

*

*

*

*

* *

*

*

*

** **

**

**

**

**

***

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Gcn

4R

tg1

Hap

1E

r t1

Sut

1O

af1

Stb

5C

bf1

Ino2

Ino4

Gcr

2G

cr1

T ye7

Cat

8R

ds2

Rgt

1P

ip2

Rtg

3H

ap4

Sip

4

Leu3Gcn4

Rtg1Hap1

Ert1Sut1

Oaf1Stb5

Cbf1Ino2

Ino4Gcr2

Gcr1Tye7

Cat8Rds2

Rgt1Pip2

Rtg3Hap4

*

*

*

* *

**

**

**

** **

***

***

***

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Gcn

4R

tg1

Hap

4Le

u3R

gt1

Sip

4S

ut1

Pip

2E

rt1C

at8

Rds

2G

cr2

Gcr

1Ty

e7C

bf1

Hap

1In

o2In

o4O

af1

Stb

5

Rtg3Gcn4

Rtg1Hap4

Leu3Rgt1

Sip4Sut1

Pip2Ert1

Cat8Rds2

Gcr2Gcr1

Tye7Cbf1

Hap1Ino2

Ino4Oaf1

Interchangable byMARS TF selection

Interchangable byMARS TF selection

A

B

C

D

p-value*** < 0.001** < 0.01* < 0.05

*

*

*

*

*

*

*

*

*

**

**

**

**

***

***

***

***

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Ino4

Cbf

1H

ap1

Leu3

Rtg

1R

tg3

Sut

1H

ap4

Er t1

Gcn

4S

tb5

Oaf

1P

ip2

Gcr

2S

ip4

Gcr

1Ty

e7R

gt1

Cat

8R

ds2

Ino2Ino4

Cbf1Hap1

Leu3Rtg1

Rtg3Sut1

Hap4Ert1

Gcn4Stb5

Oaf1Pip2

Gcr2Sip4

Gcr1Tye7

Rgt1Cat8

Interchangable byMARS TF selection Cbf1−Ert1

Cbf1−Rtg3Ert1−Rtg3

Gcr1−Tye7Ert1−Hap1

Ert1−Gcn4

Ert1−Hap4

Ert1−Ino2

Ert1−Ino4

Gcn4−Hap4

Gcn4−Ino2 Gcn4−Ino4

Hap4−Ino2

Hap4−Ino4

Ino2−Ino4Cat8−Rds2

0.00

0.02

0.04

0.06

0.00 0.05 0.10 0.15 0.20R2 when adding TF pair contributions

R2

incr

ease

allo

win

g TF

pai

r mul

tiplic

atio

n

p-value*** < 0.001** < 0.01* < 0.05

*

**

**

***

***

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Gcr

1Ty

e7C

bf1

Ino2

Ino4

Leu3

Gcn

4R

tg1

Rtg

3P

ip2

Oaf

1H

ap1

Stb

5R

gt1

Sut

1S

ip4

Cat

8R

ds2

Ert1

Hap

4

Gcr2Gcr1

Tye7Cbf1

Ino2Ino4

Leu3Gcn4

Rtg1Rtg3

Pip2Oaf1

Hap1Stb5

Rgt1Sut1

Sip4Cat8

Rds2Ert1

Interchangable byMARS TF selection

Cat8−Sip4

Gcn4−Rtg1

Gcr1−Tye7

Ino2−Ino4

0.00

0.02

0.04

0.06

0.00 0.05 0.10 0.15 0.20R2 when adding TF pair contributions

R2

incr

ease

allo

win

g TF

pai

r mul

tiplic

atio

n

Cat8−Rds2Gcn4−Rtg1

Gcr1−Gcr2Gcr1−Tye7

Gcr2−Tye7

Ert1−Hap1Ert1−Ino2

Ert1−Ino4

Ert1−Oaf1Ert1−Sut1 Hap1−Ino2

Hap1−Ino4

Hap1−Oaf1

Hap1−Sut1

Ino2−Ino4

Ino2−Oaf1

Ino2−Sut1Ino4−Oaf1Ino4−Sut1

Oaf1−Sut1

Hap1−Sip4Ino2−Sip4

Ino4−Sip4

Cbf1−Gcr1Cbf1−Tye70.00

0.02

0.04

0.06

0.00 0.05 0.10 0.15 0.20R2 when adding TF pair contributions

R2

incr

ease

allo

win

g TF

pai

r mul

tiplic

atio

n

Gcn4−Rtg1

Gcr1−Gcr2Gcr1−Tye7

Gcr2−Tye7Hap1−Ino2 Hap1−Ino4

Hap1−Oaf1

Hap1−Stb5

Ino2−Ino4

Ino2−Oaf1Ino2−Stb5

Ino4−Oaf1

Ino4−Stb5Oaf1−Stb50.00

0.02

0.04

0.06

0.00 0.05 0.10 0.15 0.20R2 when adding TF pair contributions

R2

incr

ease

allo

win

g TF

pai

r mul

tiplic

atio

n

E

F

G

H

Aerobic fermentation (nitrogen limited)

Gluconeogenic respiration (ethanol limited)

Respiratory glucose metabolism(glucose limited)

Fermentative glucose metabolism(glucose limited, anaerobic)

Aerobic fermentation (nitrogen limited)

Gluconeogenic respiration (ethanol limited)

Fermentative glucose metabolism(glucose limited, anaerobic)

Respiratory glucose metabolism(glucose limited)

Figure 3. Exploring contributions of collinear TF pairs to transcriptional regulation. (A–D) Correlation plots illustrating Pearsons correlations (in color)between TF binding in promoters of metabolic genes. Significance (Pearson’s product moment correlation coefficient) is illustrated for TF pairs with P< 0.05, by one or several asterisks, as indicated. Pairs of significantly collinear TFs that are interchangeable in the MARS TF selection in Figure 2B–Eare indicated by a stronger border in (A–D). (E–H) Linear regressions of collinear TF pairs were tested with and without allowing a multiplication of TFsignals of the two TFs. TF pairs indicated in red and with larger fonts have an R2 of the additive regression >0.1 and increased performance with includinga multiplication of the TF pairs of at least 10%.

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 10: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

4994 Nucleic Acids Research, 2019, Vol. 47, No. 10

Genes clustered (k-means)by their relative transcript levelsacross conditions

#3 - cellular response to oxidative stress, p.adj: 2.5e−3

#8

#6

#16 - fatty acid beta−oxidation, p.adj: 8.8e−12

#2

#7 - ATP synthesis coupled proton transport, p.adj: 7.7e−9

#12 - tricarboxylic acid cycle, p.adj: 7.2e−4

#15 - glycolytic process, p.adj: 4.7e−10

#10

#5

#9 - cellular response to nitrogen starvation, p.adj: 2.1e−3

#11 - tRNA aminoacylation for protein translation, p.adj: 2.5e-4

#1 - maltose metabolic process, p.adj: 3.5e-3

#4 - ergosterol biosynthetic process, p.adj: 1.7e-5

#13

#14

0.2 0.6RSS-normalized transcript

0.4 0.8

Cluster number and most enriched GO-term (FDR-adj. p < 0.01)

Aerobicfermentation

(nitrogenlimited)

Gluconeogenicrespiration

(ethanollimited)

Respiratory glucose

metabolism(glucoselimited)

Fermentativeglucose

metabolism(glucoselimited,

anaerobic)

Figure 4. Clustering genes by their relative change in expression (sum of squares normalization) over the four experimental conditions gives enrichment offunctional groups of genes. For clusters which have one or several significantly (FDR-adj P < 0.01) enriched GO terms, the top GO term is indicated withp.adj-value. Clusters containing central metabolic processes selected for further analysis with linear regressions in Figure 5 are indicated by a black frame.

are affecting gene regulation in these clusters through multi-ple linear regressions. While the introduction of splines washighly stable for linear regressions over all metabolic genes,we found the process of model building with MARS usingsplines to be less stable in smaller groups of genes (meancluster size with 16 clusters is 55 genes). For the multiplelinear regressions in the clusters, we retained TF selection(by variable selection in the MARS algorithm) to define themost important TFs, but without introduction of splines.

Using this framework of multiple linear regression, pre-dictions of transcriptional regulation on the clustered genesgives an improvement in predictive power compared to pre-dictions of all metabolic genes (Figure 5E–H, R2: 0.57–0.68). To compare the importance of different TFs for thepredictions of transcript levels in the groups over differentconditions, we calculate the ‘TF importance’ by multiply-ing R2 of the multiple linear regression predictions with therelative contribution of the TF in the linear regression (0–1, calculated by model construction algorithm) and also acoefficient for activation or repression (+1 or –1, respec-tively). Some TFs were found to regulate a certain processover several conditions, such as Hap1 for Cluster 4, enrichedfor ergosterol biosynthesis genes (Figure 5A), but Cluster4 is generally an example of a cluster with relatively largechanges in importance of different TFs for gene regulationin different conditions. To get information about the com-plete set of TFs regulating these clusters of genes, we alsoincluded collinear TFs that were not initially included in

the variable selection, but could replace a significantly cor-related TF (illustrated by a red link under the TF’s namesin the heatmaps of Figure 5). For Cluster 4, Oaf1 was notselected during TF selection for this cluster and was thusnot used in the predictions illustrated in the prediction plotof Figure 5E, but was included in the heatmap because itwas correlated to the Hap1 binding and when excludingHap1 from the TF selection, Oaf1 was included. Becausethe contribution of each TF is linear in these regressions,the heatmaps give a complete view of how each gene is pre-dicted to be regulated by different TFs. For Cluster 4 in fer-mentative glucose metabolism, the main contributors to er-gosterol genes (ERG27, ERG26, ERG11, ERG25, ERG3)are predicted to be Ert1, Hap1 and Oaf1 (Figure 5E).

Cluster 15 is highly enriched for glycolytic processes andacross conditions we see that the TFs predicted to be mostimportant in several conditions are the well-known gly-colytic regulators Gcr1, Gcr2 and Tye7 (Figure 5B). In res-piratory glucose metabolism, Gcr1, Hap1, Ert1 and Rtg1are included by variable selection and together these TFsare able to explain 66% of the variation in the cluster (Fig-ure 5F). Both Gcr2 and Tye7 were found to be collinearand able to replace Gcr1 if it was excluded and these threeTFs together are predicted to be regulating the glycolyticgenes TPI1, CDC19, TDH2, ENO2, PGK1, PGI1, FBA1,TDH3, GPM1, PFK2, TDH1 and PFK1 (Figure 5F). Wenext focused on Cluster 7, containing genes with relativelyhigher expression in the two respiratory conditions (seen

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 11: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

Nucleic Acids Research, 2019, Vol. 47, No. 10 4995

Cluster 4

Gcn

4C

at8

Oaf

1H

ap1

Exp

ress

ion

Er t1

ILV5MPC2ALD5ILV3OLE1ERG3ERG25ERG11SHM2ERG6COX15ERG26SLC1RNR4NCP1ERG7ERG27GUA1ZWF1ADE5ADE13IMD3ADH3KTR1ADE3TAT2HEM2LYP1MTD1ERG24ERG4BNA2SER3SUL2SER2PHO90HMT1YEL047CADE4GPD2HEM12SCT1HEM14HEM1DYS1SAC1COX10PYC1PMT5ADH6PDR11GDB1FMT1CDA1IMD4PMT3

SCT1

FMT1

HMT1SLC1

PMT5

HEM12HEM1

YEL047C

HEM14

ALD5 SER3

COX15

ERG26

ERG4

HEM2

OLE1

PYC1

ADE5

ERG25

RNR4ADE3

SER2

ERG11

NCP1

DYS1

ERG7

MPC2

PDR11

PHO90

ILV3

BNA2

SAC1

MTD1

ERG3

SHM2

SUL2

ERG27

CDA1

ILV5

ADE13IMD3

ERG6

IMD4

ADH3

GUA1

ADE4

ADH6

ZWF1

LYP1

ERG24

TAT2

GPD2

KTR1

PMT3

COX10

GDB1

2.0

2.5

3.0

3.5

1.5 2.0 2.5 3.0 3.5log10(Ana_FPKM)

Pre

dict

ed lo

g10(

Ana

_FP

KM

)

R2: 0.68Ert1, Cat8, Gcn4, Hap1

Cluster 7

Oaf

1

Hap

1

Exp

ress

ion

Hap

4

QCR7INO1RIP1QCR6ATP16ATP14COX12IDH1ADK2COX6COX5AATP1ATP17MHT1ATP20NDI1KGD1SDH1TIM11SDH3SDH2LSC1FEN2ATP7HIS6ATP3CIT1KGD2ATP5BIO4AGP2AQY2QCR2LEU4PPT2IMD2ALP1DIT1ODC1AVT3ACO1SDH4GDH2ATP4MPH2YNK1

ATP1

ATP3

AGP2

FEN2

ATP16

GDH2

MPH2

KGD2

SDH4ATP5

TIM11

ATP17

DIT1

QCR7RIP1

ADK2

QCR6

COX6

IMD2

HIS6

KGD1

INO1 ATP7

YNK1

SDH3

AVT3SDH1

SDH2

AQY2

MHT1

COX12 ATP14

ACO1

NDI1

IDH1

COX5A

LEU4

ALP1

CIT1

BIO4

LSC1

ATP4

ODC1PPT2

ATP20

QCR2

2.0

2.5

3.0

3.5

2 3log10(Eth_FPKM)

Pre

dict

ed lo

g10(

Eth

_FP

KM

)

R2: 0.57Hap4, Hap1, Oaf1

A

Hap1Oaf1

Ino2Ino4

Hap1Gcn4

Rtg1

Hap1Oaf1

Gcn4

Cat8

Ert1Hap1

Gcn4

Rds2

Rtg3

−0.50

−0.25

0.00

0.25

0.50

0.75

Cluster 15 Adjusted p−value4.7e−10glycolytic process6.8e−08vacuolar acidification

3.8e−06ATP hydrolysis coupled proton transport

6.2e−06gluconeogenesis1.2e−05proton transport3.1e−04ion transport

B

Gcr1Tye7

Hap4

Ino2Ino4

Leu3

Rds2

Rtg1Gcr1Tye7

Hap4Ino2

Gcr2

Gcr1

Tye7

Ino2

Gcr2

Rgt1

Gcr1

Tye7

Rtg1

Gcr2

Ert1

Hap1

−0.50

−0.25

0.00

0.25

0.50

0.75

Cat8Ino4

Hap1

Hap4

Oaf1Hap1

Hap4

Oaf1Pip2

Rgt1

−0.50

−0.25

0.00

0.25

0.50

0.75

R2

* R

elat

ive

impo

rtanc

e of

TF

in re

gres

sion

(0−1

) *

corr

. coe

ffici

ent(+

1/-1

)

Adjusted p−value8.8e−12fatty acid beta−oxidation3.1e−03very long−chain fatty acid catabolic process3.1e−03fatty−acyl−CoA transport3.1e−03peroxisomal long−chain fatty acid import8.7e−03fatty acid metabolic process8.7e−03glycerol−3−phosphate metabolic process9.9e−03NADH oxidation

Cluster 16

Gcn4

Ert1Sut1

Gcr1Hap1

Ert1

Gcr1

Cbf1

Ino2Tye7

Leu3

Rtg3

Ino2

Ino4Hap1

Ino2Tye7

Ino4

Cat8

Rgt1

−0.50

−0.25

0.00

0.25

0.50

0.75

CDC19

VMA2

PGI1

TSC10

HIS4

PGK1

PSA1

VMA1

TPI1

EKI1

ITR1

VMA3

BUD16

VMA8

FCY2

FCY21RNR1

GUS1

HXK2

ADH4

UGA1VMA7MUP1

CYS4

TDH3

PFK1

MUP3

VMA10

ENO2

FAA3CYR1

RNR2

TDH1

URA2

ERG20

TDH2

OPI3

FBA1

VMA5

GPM1

SAM1

CDD1

EXG1

FKS1VMA6

ERG13

PFK2

PHO91

MVD1

MET22

VPH1

VMA4

YDC1FAS2

GPH1

2.0

2.5

3.0

3.5

2 3log10(Glu_FPKM)

Pre

dict

ed lo

g10(

Glu

_FP

KM

)

R2: 0.66Gcr1, Hap1, Ert1, Rtg1

Hap

1E

xpre

ssio

nTy

e7G

cr1

Gcr

2R

tg1

Er t1

PFK1TDH1PFK2GPM1TDH3FBA1PGI1PGK1ENO2TDH2CDC19TPI1HXK2RNR2GPH1YDC1VMA7VMA4MUP1CYS4OPI3FAS2ITR1SAM1FCY2FKS1FAA3VMA1UGA1URA2RNR1CDD1EKI1CYR1GUS1VMA10BUD16FCY21TSC10VMA3HIS4VPH1VMA5VMA8MET22ADH4PSA1VMA6VMA2MUP3PHO91MVD1ERG13ERG20EXG1

Sut

1

Gcn

4

Ert1

Gcr

1

Exp

ress

ion

Hap

1

IDP3GPD1ALD4GUT2NDE1STL1PXA2ECI1SFA1TES1MMP1CLD1CTA1HXT5ICL2GUT1JEN1VHT1ATH1POX1YJL045WFMP43POT1SPS19DCI1NDE2FAA2FOX2PXA1

GPD1

NDE2SFA1

CTA1STL1FAA2 POX1

VHT1CLD1

FMP43

GUT1

HXT5

GUT2

POT1

YJL045W

TES1

PXA2

JEN1

FOX2

MMP1

ECI1

NDE1

IDP3

SPS19

DCI1

ALD4

PXA1

ICL2ATH1

1.25

1.50

1.75

2.00

2.25

2.50

1.6 2.0 2.4log10(Nit_FPKM)

Pre

dict

ed lo

g10(

Nit_

FPK

M)

R2: 0.60Gcn4, Hap1, Gcr1

Adjusted p−value1.7e−05ergosterol biosynthetic process1.7e−05steroid biosynthetic process2.7e−04lipid biosynthetic process3.0e−04sterol biosynthetic process3.0e−04purine nucleotide biosynthetic process4.1e−03purine nucleobase biosynthetic process1.5e−02heme biosynthetic process1.9e−02protoporphyrinogen IX biosynthetic process3.6e−02GMP biosynthetic process

Adjusted p−value7.7e−09ATP synthesis coupled proton transport7.8e−06ATP biosynthetic process1.8e−05tricarboxylic acid cycle6.8e−04mitochondrial electron transport, succinate to ubiquinone8.2e−04ion transport1.6e−03proton transport1.9e−03cellular respiration6.6e−03cristae formation8.9e−03obsolete ATP catabolic process1.9e−02hydrogen ion transmembrane transport1.9e−02protein complex oligomerization2.3e−02mitochondrial electron transport, ubiquinol to cytochrome c

E F

C

G

D

H

−4 −2 0 2 4Column Z−Score −2 0 2

Column Z−Score

−3 −1 1 3Column Z−Score

0 −4 −2 0 2 4Column Z−Score

R2

* R

elat

ive

impo

rtanc

e of

TF

in re

gres

sion

(0−1

) *

corr

. coe

ffici

ent(+

1/-1

)

R2

* R

elat

ive

impo

rtanc

e of

TF

in re

gres

sion

(0−1

) *

corr

. coe

ffici

ent(+

1/-1

)

R2

* R

elat

ive

impo

rtanc

e of

TF

in re

gres

sion

(0−1

) *

corr

. coe

ffici

ent(+

1/-1

)

Aerobic fermentation(nitrogen limited)

Gluconeogenic respiration(ethanol limited)

Respiratory glucose metabolism(glucose limited)

Fermentative glucose metabolism(glucose limited, anaerobic)

Cluster 16Cluster 7Cluster 15Cluster 4

Aerobicfermentation

(nitrogenlimited)

Gluconeogenicrespiration(ethanollimited)

Respiratory glucose

metabolism(glucoselimited)

Fermentativeglucose

metabolism(glucoselimited,

anaerobic)

Aerobicfermentation

(nitrogenlimited)

Gluconeogenicrespiration(ethanollimited)

Respiratory glucose

metabolism(glucoselimited)

Fermentativeglucose

metabolism(glucoselimited,

anaerobic)

Aerobicfermentation

(nitrogenlimited)

Gluconeogenicrespiration(ethanollimited)

Respiratory glucose

metabolism(glucoselimited)

Fermentativeglucose

metabolism(glucoselimited,

anaerobic)

Aerobicfermentation

(nitrogenlimited)

Gluconeogenicrespiration(ethanollimited)

Respiratory glucose

metabolism(glucoselimited)

Fermentativeglucose

metabolism(glucoselimited,

anaerobic)

Figure 5. Clustering genes by relative expression gives strong predictive models of the clustered genes. (A–D) All significant (P.adj < 0.05) GO terms for theclustered genes and the relative importance of the TFs selected to give the strongest predictions of transcript levels for the genes in the clusters in differentconditions. Linear regressions (without splines) are used and importance is calculated by R2 (of regression with selected TFs) *relative importance of eachTF (0 to 1) *sign of coefficient (+1 is activation, –1 is repression). (E–H) Prediction plots showing the predicted transcript levels compared to the realtranscript levels from using the selected TFs (written in subtitle of plots). R2 of predicted transcript levels compared to real transcript level is shown inred text. Heatmaps demonstrate the real transcript levels as well as binding signal of each TF normalized column-wise (Z-score). TFs linked by a red lineunder the heatmap have significant collinearity over the cluster genes and were demonstrated to be able replace the other(s) in the variable selection, thushaving overlapping functions in regulation of genes in a given cluster.

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 12: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

4996 Nucleic Acids Research, 2019, Vol. 47, No. 10

in Figure 4) and most strongly enriched for genes of mito-chondrial ATP biosynthesis (Figure 5C). The most impor-tant TF for predictions in this cluster for both the condi-tions with mostly respiratory metabolism is the well-knownmitochondrial regulator Hap4 (Figure 5C). During glu-coneogenic respiration, Hap4 and Hap1 are predicted toboth contribute to the regulation of most electron transportchain genes such as SHD1, NDI1, COX5A, COX6, QCR6,RIP1, QCR7 and ATP14 (Figure 5G). Several subunits ofthe ATP synthase (ATP17, ATP1 and ATP16) as well as cer-tain TCA cycle enzymes (KGD1, KGD2) are predicted to beregulated by Hap4 without Hap1. Oaf1 also contributes toregulation of various genes of the cluster to a lesser extent,in some cases alone, or together with Hap1 and/or Hap4.We also demonstrate the same analysis for Cluster 16, mostenriched for containing genes of fatty acid beta-oxidation(Figure 5D). While most of the predicted effects of TF bind-ing on transcriptional regulation explored thus far has beenactivation, from the importance of TFs in the different con-ditions of Cluster 16, three TFs showed negative correla-tion to transcriptional changes during aerobic fermentation(Figure 5D). The TF selection used three TFs – Gcn4, Hap1and Gcr1 to predict 60% of the variation in the cluster dur-ing aerobic fermentation (Figure 5H). Ert1 and Sut1 werefurther included in the analysis because they were foundto be collinear and able to replace Gcn4 in TF selection.Exploring the contributions of the TFs to expression lev-els of beta-oxidation genes in the heatmap supports an in-verse relationship between Sut1-Gcn4-Ert1 binding an ex-pression levels of several beta-oxidation genes, most notablyfor PXA1, DCI1, CTA1, CLD1, PXA2 and IDP3 (Figure5H).

The influence of TFs on gene regulation from different regionsof the promoter changes between metabolic conditions

In the analysis shown in Supplementary Figure S2C we no-ticed that there were apparent differences in importance ofdifferent regions of the promoter between the conditions,most notably a predicted shift towards more consequentialTF binding downstream of the TSS during aerobic fermen-tation. We reasoned that this could be because TFs withmore consequential binding downstream of the TSS couldbe more important at this condition, or it could be becauseTFs shift their importance from one region of the promoterto another between conditions. To distinguish these twopossibilities we performed simple linear regressions usingthe binding signal for each TF individually in 75 bp regionsof the promoter to look for potential changes in TF im-portance in different regions of the promoter between themetabolic conditions. Comparing the resulting profiles ofthe explanatory power of TF binding in the experimentalconditions revealed a distinctly different profile during aer-obic fermentation compared to the other three metabolicconditions (Figure 6A compared to Figure 6B–D). Impor-tantly, the differences during aerobic fermentation do notappear to be driven by one or a few TFs that regulate fromdownstream of the TSS with stronger importance duringthis condition, but rather a shift of where in the promoterseveral TFs are most consequential to regulation (CompareGcr1, Ino2, Ino4, Stb5, Cbf1 in Figure 6A to B–D).

Another striking observation from the importance of TFsduring aerobic fermentation was a negative correlation be-tween transcriptional changes and binding of three TFsfrom an overlapping region upstream of the TSS; Sut1,Gcn4 and Ert1 (Figure 6A). These regressions were of allmetabolic genes, suggesting that the negative correlationseen with these TFs on beta-oxidation genes, as seen in Fig-ure 5H, may be a more general phenomena of aerobic fer-mentation. To see if these three TFs are acting on the samegenes or separately we compared the binding of the threeTFs in the region 250–450 upstream of the TSS togetherwith expression levels over all metabolic genes to look forgroups of genes where binding of Sut1, Gcn4 and Ert1 arecorrelated to each other and anti-correlated to expressionlevels (Figure 6E). In aerobic fermentation, we found twodistinct groups of genes where all three TFs are generallycorrelated to each other, one with low binding of the threeTFs and high expression levels and another group with rel-atively higher binding of all three TFs and relatively lowerexpression levels. Interestingly, the group of genes where thethree TFs have the strongest negative correlation to tran-script levels is slightly enriched for genes of translation pro-cesses (Figure 6E, Group 1), genes that are likely closelycontrolled due to the nitrogen limitation of these cultures.We also looked for similar relationships between binding ofSut1-Gcn4-Ert1 and transcript levels for the other experi-mental conditions (Supplementary Figure S4B–D), but wefound no coordinated changes in the other conditions, sug-gesting this phenomena is specific to aerobic fermentation.We summarize some of our main findings from Figures 5and 6A–E regarding how TFs regulate difference metabolicprocesses in different conditions in Figure 6F.

DISCUSSION

The latest count of known and putative yeast TFs is 264(33). With coverage of 21 TFs we only see part of the sys-tem of gene regulation by TFs, but by selecting the TFs mostenriched to binding metabolic genes and building predic-tive models of metabolic gene regulation we achieved goodTF coverage per gene. Through various types of analysiswe discovered surprisingly large changes between the stud-ied metabolic conditions, first encountered when comparingsets of target genes for the TFs between conditions shown inFigure 1A. To explain the changing sets of gene targets be-tween conditions, we hypothesized that the TFs could havechanges in preference of DNA motifs between different con-ditions. We cannot exclude changes in motif preference incertain cases such as Rtg1 and Gcn4 in aerobic fermenta-tion (Supplementary Figure S2a), but in general the changesin motif preference between conditions are small and wethink this is not a major determinant of changes in whichgenes are being targeted between conditions. We think themost likely alternative explanation is that there are changesin nucleosome occupancy or histone modifications that al-low more or less binding to different sets of genes over thedifferent conditions studied and that binding site preferenceis only partly driven by the recognized DNA motif. Thisidea is generally supported by the complex two-way inter-actions that have been suggested between TF binding andhistones in eukaryotic gene regulation (8).

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 13: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

Nucleic Acids Research, 2019, Vol. 47, No. 10 4997

Gcn4

Ert1

Sut1

Hap1

Ino4

Cbf1Ino2

Cat8Stb5

Gcr1

−0.06

−0.03

0.00

0.03

0.06

Pip2

Cbf1

Oaf1

Gcr1

Rtg1

Gcn4

Ino4

Cat8Hap4

Ino2

Stb5

Tye7

Hap1

Rgt1Ert1

0.000

0.025

0.050

0.075

0.100

Gcr1

Cbf1

Ino4Gcn4

Stb5

Sut1

Rtg1

Tye7

Ino2Ert1

Hap1

Cat8

Oaf1Rds2

0.00

0.02

0.04

0.06

R2

* co

rrel

atio

n co

effic

ient

(+1/

−1) Gcr1

Gcn4

Cat8

Stb5Cbf1

Ino4

Rtg1

Tye7

Oaf1

Ino2

Gcr2

Hap1

Ert1

0.000

0.025

0.050

0.075

R2

* co

rrel

atio

n co

effic

ient

(+1/

−1)

R2

* co

rrel

atio

n co

effic

ient

(+1/

−1)

R2

* co

rrel

atio

n co

effic

ient

(+1/

−1)

A B

C D

Exp

ress

ion

Sut

1

Ert1

Gcn

4

−4 0 2 4Column Z−Score

2

Adjusted p−value

2.4e−02tRNA aminoacylationfor protein translation

4.2e−02translation

Adjusted p−value5.1e−13glycolytic process7.5e−07gluconeogenesis3.8e−03glucose metabolic process4.1e−02electron transport chain

Group 1

Group 2

E

Aerobic fermentation(nitrogen limited)

Gluconeogenic respiration(ethanol limited)

Respiratory glucose metabolism(glucose limited)

Fermentative glucose metabolism(glucose limited, anaerobic)

Aerobic fermentation(nitrogen limited)

F

GLUCOSE TRAN PORTG

LYC

OLY

SIS

MITOCHONDRIA

T

A

CY

CLE

OXIDATIVE PHOSPHORYLATION

BETA OXIDATION

Increasing carbohydrate transportenzymes during glucose limitation

ERGOSTEROL BIOSYNTHESIS

Increasing mitochondrial processesduring respiratory metabolism

Increasing ergosterol biosynthesis enzymes in anaerobic metabolism

Increased glycolytic enzymes during fermentation

Increasing beta oxidationenzymes in respiratory metabolism

Oaf1 Hap4Hap1Pip2Cat8

Cat8

Ert1 Hap1 Gcn4Cat8 Oaf1

Gcr1Gcr2

Tye7

Ino2Ino4

Rgt1

Rds2

Gcr1Hap1

Rgt1

Ino2Ino4 Hap1AEROBICFERMENTATION(NITROGENLIMITED)

Ert1

Gcn4Sut1

Repressive correlation to highlyexpressed glycolytic and respiratory genes

Activating correlation togenes linked to translationprocesses

TSS TSS +750bpTSS -750bp

TF binding signal integrated over 75 bp intervals

TSS TSS +750bpTSS -750bp

TF binding signal integrated over 75 bp intervals

TSS TSS +750bpTSS -750bp

TF binding signal integrated over 75 bp intervals TSS TSS +750bpTSS -750bp

TF binding signal integrated over 75 bp intervals

Rds2

Figure 6. Comparing TF importance in different segments of the promoter shows a distinct pattern of TF importance during aerobic fermentation ascompared to the other conditions. (A–D) Linear regressions, regressing the TF binding signal of each TF individually, in the indicated segments of thepromoters, against the expression levels. Lines are labeled by the TF name near the highest absolute y-axis value. (E) Two groups of genes where bindingof Sut1, Ert1 and Gcn4 in −450 to −250 relative to the TSS is correlated to each other and oppositely correlated to transcript levels. (F) Summary of thestrongest predicted links between TF function and selected metabolic processes. Based on data presented in Figures 4 and 5, Supplementary Figure S4E–Fand Figure 6A–E.

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 14: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

4998 Nucleic Acids Research, 2019, Vol. 47, No. 10

Based on the success of multiple linear regressions as pre-dictive models of gene regulation from TF binding data, wepropose that a large portion of transcriptional regulationby TFs in yeast is achieved from a linear effect of TF bind-ing on transcriptional outcome. Our predictions also sug-gest that a large amount of the contributions to gene reg-ulation from several metabolic TFs are additive. However,by allowing multiplication of TF binding signal, we do de-tect cases that indicate more complex contributions frompairs of TFs to gene regulation than a simple addition ofthe two TFs binding signal can capture. Most noteworthyis the relationship between Ino2 and Ino4, where the addi-tive contribution from both TFs seem to saturate at a cer-tain strength of binding (Supplementary Figure S3c). TheIno2–Ino4 relationship is best known for being required forphospholipid biosynthesis and Ino2 is described as most im-portant for transcriptional activation of the targeted genes,but Ino2 also depends on Ino4 for translocation into thenucleus (34). There is additional complexity in the regula-tion of this complex by involvement of the Opi1 repressor,which can bind Ino2 to inhibit activation by the complex(35). We cannot conclude on why we see a saturation effectin the transcriptional activation due to increased binding ofboth Ino2 and Ino4, but it could be due to TF–TF competi-tion of the complex with other components of the transcrip-tional machinery, or other nonlinear relationships betweenbinding of the TFs and transcriptional outcomes. In aero-bic fermentation we detect several additional TF pairs forwhich there may be more complex relationships than simpleaddition of the contributing signal. Of these, both Ert1 andGcn4 have relationships to Ino2 and Ino4 binding wherea multiplication of the binding signal clearly improves pre-dictive power (Figure 3E). It is striking that Ert1 and Gcn4are two of only a few TFs that show this kind of complex-ity seen together with the negative correlation seen betweentranscript levels and binding of Sut1, Gcn4 and Ert1 in aer-obic fermentation, but we do not know if these observationsare related.

The main advantage of using linear regressions, as com-pared to more complex machine learning analysis, is thateach TF’s contribution in the predictions can be fully de-scribed. We highlight this feature of our analysis in Fig-ure 5, where the groups of genes are biologically linked andsmall enough to illustrate the amount of binding from eachselected TF in the individual genes of the clusters. In thisanalysis we confirm several previously demonstrated regu-lators of important cellular processes such as Hap1 activat-ing ergosterol biosynthesis (36), Gcr1-Gcr2-Tye7 activatingglycolytic genes (37) and Hap4 activating respiratory pro-cesses (38). We also propose previously unknown contribu-tions from other TFs to these processes such as Ert1 activat-ing ergosterol biosynthesis at anaerobic conditions (Figure5E).

The minimal media used in all chemostats of this studyare without added amino acids, meaning that the feedbackcontrols activating amino acid biogenesis should generallybe activated. Gcn4 is best known as an activator of aminoacid biosynthesis, but also demonstrated in several stud-ies to be a repressor of ribosomal protein expression by anunknown mechanism (39,40). The mechanistic details onhow Gcn4 can activate some genes and repress others re-

main mostly unknown, but the repression has been linkedto functional interactions with the repressive TF Rap1 andhistone acetyltransferase Esa1 (41). A recent study of theyeast strain Y. lipolytica, describing increased lipid accumu-lation of this strain during nitrogen limitation, found thatbeta-oxidation genes were down-regulated in this condition(42). In that study, they also noted that genes with the Gcn4motif in their promoter tended to be down-regulated dur-ing nitrogen limitation, an observation that they could notexplain, but is in line with the negative correlation we ob-serve between Gcn4 binding and transcript levels for betaoxidation genes during nitrogen limitation (Figure 5H). Wepropose that a similar mechanism as was observed in Y.lipolytica exists in S. cerevisiae and further extend it to af-fect a larger set of genes (Figure 6E) and to also involveSut1 and Ert1. Sut1 is previously described to interact withthe co-repressor complex Cyc8-Tup1 (43) while Ert1 is pre-viously described to function as both an activator and re-pressor with the major described functional role being reg-ulating genes driving the diauxic shift (44). Our data doesnot give any further insight into the mechanism of the Sut1-Ert1-Gcn4 repression, but based on the generally reducednumber of peaks (Supplementary Figure S1C) and changesin importance of different regions of the promoter (Supple-mentary Figure S2C, Figure 6A–D) during aerobic fermen-tation, we speculate that Sut1-Ert1-Gcn4 binding is corre-lated to changes to the nucleosomes on each side of theTSS in certain genes, for example histone modifications orchanges in nucleosome occupancy. If the changes in bind-ing of these three TFs is not driven by changes in their pref-erence for motifs, but instead by nucleosomes, it is also notnecessarily the case that the three TFs are directly repressing– the correlation between TF binding and transcriptionalregulation could potentially be a secondary effect of nucleo-some changes that are also causing transcriptional changes.

In conclusion, our experimental design using chemostatsto capture stable states of metabolism reveals large changesin functional roles of different TFs between metabolicstates. The previously demonstrated difficulties in definingthe regulatory targets of eukaryal TFs through transcrip-tomics after TF deletion could be partly explained by thishighly dynamic nature of eukaryal TF function. If the dele-tion of the TF changes cellular conditions enough to shiftthe regulatory roles of a range of one or several other TFs,the following secondary transcriptional changes could be asource of significant changes in genes not targeted directlyby the deleted TF. Our framework of using multiple lin-ear regressions for full transparency of TF contributions totranscriptional regulation without relying on TF deletionwill be equally applicable for future larger-scale studies asbinding data for more TFs with condition-matched tran-scriptomics accumulate to gradually build a system-levelunderstanding of eukaryotic transcriptional regulation.

DATA AVAILABILITY

ChIP-exo raw sequencing data (.fastq) can be retrievedform Arrayexpress accession E-MTAB-6673: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-6673/. Aprocessed format of the ChIP-exo data, summarized asTF reads per gene promoter is included in Supplementary

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 15: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

Nucleic Acids Research, 2019, Vol. 47, No. 10 4999

Data 4. RNA sequencing raw sequencing data (.fastq)can be retrieved from Arrayexpress accession E-MTAB-6657: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-6657/. The number of reads annotated to each geneare found in Supplementary Data 3 and our processing ofread counts to get to FPKM values that are used for allrelevant analysis of this study can be reproduced throughthe R scripts included in Supplementary Data 6.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

Authors’ contributions: Funding acquisition, J.N.; concep-tualization, P.H. and J.N.; experiments, P.H., D.B., C.S.Band G.L.; analysis, P.H., D.B. and C.S.B; writing – origi-nal draft, P.H.; writing – review & editing, D.B, C.S.B, G.L.and J.N.

FUNDING

European Union’s Horizon 2020 research and innovationprogramme [Marie Skłodowska-Curie grant agreement no.722287]; Knut and Alice Wallenberg Foundation and theNovo Nordisk Foundation [NNF10CC1016517]. Fundingfor open access charge: Internal.Conflict of interest statement. None declared.

REFERENCES1. Le,P.P., Friedman,J.R., Schug,J., Brestelli,J.E., Parker,J.B.,

Bochkis,I.M. and Kaestner,K.H. (2005) Glucocorticoidreceptor-dependent gene regulatory networks. PLoS Genet., 1,0159–0170.

2. Fan,J.-B., Benner,C., Hutt,K.R., Garcia-Bassets,I., Fu,X.-D.,Rosenfeld,M.G., Bibikova,M., Jin,M., Kwon,Y.-S., Ye,Z. et al. (2007)Sensitive ChIP-DSL technology reveals an extensive estrogenreceptor -binding program on human gene promoters. Proc. Natl.Acad. Sci. U.S.A., 104, 4852–4857.

3. Hu,Z., Killion,P.J. and Iyer,V.R. (2007) Genetic reconstruction of afunctional transcriptional regulatory network. Nat. Genet., 39,683–687.

4. Harbison,C.T., Gordon,D.B., Lee,T.I., Rinaldi,N.J., Macisaac,K.D.,Danford,T.W., Hannett,N.M., Tagne,J.B., Reynolds,D.B., Yoo,J. et al.(2004) Transcriptional regulatory code of a eukaryotic genome.Nature, 431, 1–5.

5. Gitter,A., Siegfried,Z., Klutstein,M., Fornes,O., Oliva,B., Simon,I.and Bar-Joseph,Z. (2009) Backup in gene regulatory networksexplains differences between binding and knockout results. Mol. Syst.Biol., 5, 276.

6. Fang,X., Sastry,A., Mih,N., Kim,D., Tan,J., Yurkovich,J.T.,Lloyd,C.J., Gao,Y., Yang,L. and Palsson,B.O. (2017) Globaltranscriptional regulatory network for Escherichia coli robustlyconnects gene expression to transcription factor activities. Proc. Natl.Acad. Sci. U.S.A., 114, 10286–10291.

7. ENCODE consortium. (2012) An integrated encyclopedia of DNAelements in the human genome. Nature, 489, 57–74.

8. Cheng,C., Alexander,R., Min,R., Leng,J., Yip,K.Y., Rozowsky,J.,Yan,K., Dong,X., Djebali,S., Ruan,Y. et al. (2012) Understandingtranscriptional regulation by integrative analysis of transcriptionfactor binding data. Genome Res., doi:10.1101/gr.136838.111.

9. Ouyang,Z., Zhou,Q. and Wong,W.H. (2009) ChIP-Seq oftranscription factors predicts absolute and differential geneexpression in embryonic stem cells. Proc. Natl. Acad. Sci. U.S.A., 106,21521–21526.

10. Zhu,C., Byers,K.J.R.P., McCord,R.P., Shi,Z., Berger,M.F.,Newburger,D.E., Saulrieta,K., Smith,Z., Shah,M. V.,Radhakrishnan,M. et al. (2009) High-resolution DNA-bindingspecificity analysis of yeast transcription factors. Genome Res., 19,556–566.

11. Badis,G., Pena-Castillo,L., Ansari,A.Z., van Bakel,H., Hughes,T.R.,Hasinoff,M.J., Terterov,D., Li Yeo,A., Tillo,D., Nislow,C. et al.(2008) A library of yeast transcription factor motifs reveals awidespread function for Rsc3 in targeting nucleosome exclusion atpromoters. Mol. Cell, 32, 878–887.

12. Hughes,T.R. and de Boer,C.G. (2013) Mapping yeast transcriptionalnetworks. Genetics, 195, 9–36.

13. Bergenholm,D., Liu,G., Hansson,D. and Nielsen,J. (2019)Construction of mini-chemostats for high-throughput straincharacterization. Biotechnol. Bioeng., 116, 1029–1038.

14. Rhee,H.S. and Pugh,B.F. (2012) ChiP-exo method for identifyinggenomic location of DNA-binding proteins withnear-single-nucleotide accuracy. Curr. Protoc. Mol. Biol., 100,21.24.1–21.24.14.

15. Liu,G., Bergenholm,D. and Nielsen,J. (2016) Genome-Wide mappingof binding sites reveals multiple biological functions of thetranscription factor Cst6p in saccharomyces cerevisiae. MBio, 7,1–10.

16. Guo,Y., Mahony,S. and Gifford,D.K. (2012) High resolution genomewide binding event finding and motif discovery reveals transcriptionfactor spatial binding constraints. PLoS Comput. Biol., 8, e1002638.

17. Borlin,C.S., Siewers,V., Nielsen,J., Holland,P., Bergenholm,D.,Cvetesic,N. and Lenhard,B. (2018) Saccharomyces cerevisiae displaysa stable transcription start site landscape in multiple conditions.FEMS Yeast Res., 19, 1–10.

18. Salazar,A.N., van den Broek,M., Daran,J.-M.G., Gorter deVries,A.R., Wijsman,M., Brouwers,N., Brickwedde,A., Abeel,T. andde la Torre Cortes,P. (2017) Nanopore sequencing enablesnear-complete de novo assembly of Saccharomyces cerevisiaereference strain CEN.PK113-7D. FEMS Yeast Res., 17,doi:10.1093/femsyr/fox074.

19. Langmead,B. and Salzberg,S.L. (2012) Fast gapped-read alignmentwith Bowtie 2. Nat. Methods, 9, 357–359.

20. Li,H., Handsaker,B., Wysoker,A., Fennell,T., Ruan,J., Homer,N.,Marth,G., Abecasis,G., Durbin,R. and 1000 Genome Project DataProcessing Subgroup (2009) The Sequence Alignment/Map formatand SAMtools. Bioinformatics, 25, 2078–2079.

21. Quinlan,A.R. and Hall,I.M. (2010) BEDTools: A flexible suite ofutilities for comparing genomic features. Bioinformatics, 26, 841–842.

22. R Core team (2016) R: A language and environment for statisticalcomputing. R Foundation for Statistical Computing. Vienna,https://www.R-project.org/.

23. Liao,Y., Smyth,G.K. and Shi,W. (2014) featureCounts: an efficientgeneral purpose program for assigning sequence reads to genomicfeatures. Bioinformatics, 30, 923–930.

24. Robinson,M.D., McCarthy,D.J. and Smyth,G.K. (2010) edgeR: aBioconductor package for differential expression analysis of digitalgene expression data. Bioinformatics, 26, 139–140.

25. Milborrow,S. (2017) earth: Multivariate Adaptive Regression Splines.https://CRAN.R-project.org/package=earth.

26. Bergenholm,D., Liu,G., Holland,P. and Nielsen,J. (2018)Reconstruction of a global transcriptional regulatory network forcontrol of lipid metabolism in yeast by using chromatinimmunoprecipitation with lambda exonuclease digestion. mSystems,3, e00215-17.

27. Ouyang,L., Holland,P., Lu,H., Bergenholm,D. and Nielsen,J. (2018)Integrated analysis of the yeast NADPH-regulator Stb5 revealsdistinct differences in NADPH requirements and regulation indifferent states of yeast metabolism. FEMS Yeast Res., 18, 91.

28. MacIsaac,K.D., Wang,T., Gordon,D.B., Gifford,D.K., Stormo,G.D.and Fraenkel,E. (2006) An improved map of conserved regulatorysites for Saccharomyces cerevisiae. BMC Bioinformatics, 7, 113.

29. Hashim,Z., Mukai,Y., Bamba,T. and Fukusaki,E. (2014) Metabolicprofiling of retrograde pathway transcription factors Rtg1 and Rtg3knockout yeast. Metabolites, 4, 580–598.

30. Crespo,L., Powers,T., Fowler,B. and Hall,M.N. (2002) TheTOR-controlled transcription activators GLN3, RTG1, and RTG3are regulated in response to intracellular levels of glutamine. Proc.Natl. Acad. Sci. U.S.A., 99, 6784–6789.

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019

Page 16: Predictive models of eukaryotic transcriptional regulation ... · Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter

5000 Nucleic Acids Research, 2019, Vol. 47, No. 10

31. Sanchez,B., Li,F., Lu,H., Kerkhoven,E. and Nielsen,J. (2016)Yeast-GEM: yeast 7.6.0. doi:10.5281/zenodo.1495468.

32. Friedman,J.H. (2007) Multivariate adaptive regression splines. Ann.Stat., 19, 123–141.

33. de Boer,C.G. and Hughes,T.R. (2012) YeTFaSCo: a database ofevaluated yeast transcription factor sequence specificities. NucleicAcids Res., 40, D169–D179.

34. Kumme,J., Dietz,M., Wagner,C. and Schuller,H.-J. (2008)Dimerization of yeast transcription factors Ino2 and Ino4 is regulatedby precursors of phospholipid biosynthesis mediated by Opi1repressor. Curr. Genet., 54, 35–45.

35. Lai,K. and Mcgraws,P. (1994) Dual control of inositol transport insaccharomyces cerevisiae by irreversible inactivation of permease andregulation of permease synthesis by IN02, IN04, and OPI1. JBC, 269,2246–2251.

36. Tamura,K.I., Gu,Y., Wang,Q., Yamada,T., Ito,K. and Shimoi,H.(2004) A hap1 mutation in a laboratory strain of Saccharomycescerevisiae results in decreased expression of ergosterol-related genesand cellular ergosterol content compared to sake yeast. J. Biosci.Bioeng., 98, 159–166.

37. Nishi,K., Park,C.S., Pepper,A.E., Eichinger,G., Innis,M.A. andHolland,M.J. (1995) The GCR1 requirement for yeast glycolytic geneexpression is suppressed by dominant mutations in the SGC1 gene,which encodes a novel basic-helix-loop-helix protein. Mol. Cell. Biol.,15, 2646–2653.

38. Blom,J., De Mattos,M.J.T. and Grivell,L.A. (2000) Redirection of therespiro-fermentative flux distribution in Saccharomyces cerevisiae by

overexpression of the transcription factor Hap4P. Appl. Environ.Microbiol., 66, 1970–1973.

39. Natarajan,K., Meyer,M.R., Jackson,B.M., Slade,D., Roberts,C.,Hinnebusch,A.G. and Marton,M.J. (2001) Transcriptional profilingshows that Gcn4p is a master regulator of gene expression duringamino acid starvation in yeast. Mol. Cell. Biol., 21, 4347–4368.

40. Mittal,N., Guimaraes,J.C., Gross,T., Schmidt,A., Vina-Vilaseca,A.,Nedialkova,D.D., Aeschimann,F., Leidel,S.A., Spang,A. andZavolan,M. (2017) The Gcn4 transcription factor reduces proteinsynthesis capacity and extends yeast lifespan. Nat. Commun., 8, 457.

41. Joo,Y.J., Kim,J.H., Kang,U.B., Yu,M.H. and Kim,J. (2011)Gcn4p-mediated transcriptional repression of ribosomal proteingenes under amino-acid starvation. EMBO J., 30, 859–872.

42. Pomraning,K.R., Kim,Y.-M.M., Nicora,C.D., Chu,R.K.,Bredeweg,E.L., Purvine,S.O., Hu,D., Metz,T.O. and Baker,S.E.(2016) Multi-omics analysis reveals regulators of the response tonitrogen limitation in Yarrowia lipolytica. BMC Genomics, 17, 138.

43. Regnacq,M., Alimardani,P., El Moudni,B. and Berges,T. (2001)Sut1p interaction with Cyc8p(Ssn6p) relieves hypoxic genes fromCyc8p-Tup1p repression in Saccharomyces cerevisiae. Mol.Microbiol., 40, 1085–1096.

44. Gasmi,N., Jacques,P.E., Klimova,N., Guo,X., Ricciardi,A., Robert,F.and Turcotte,B. (2014) The switch from fermentation to respiration inSaccharomyces cerevisiae is regulated by the Ert1 transcriptionalactivator/repressor. Genetics, 198, 547–560.

Dow

nloaded from https://academ

ic.oup.com/nar/article-abstract/47/10/4986/5446249 by C

halmers U

niversity of Technology user on 16 July 2019