a consensus s. cerevisiae metabolic model yeast8 and its ... · article a consensus s. cerevisiae...

14
General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. Users may download and print one copy of any publication from the public portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain You may freely distribute the URL identifying the publication in the public portal If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from orbit.dtu.dk on: Jan 25, 2021 A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism Lu, Hongzhong; Li, Feiran; Sánchez, Benjamín J; Zhu, Zhengming; Li, Gang; Domenzain, Iván; Marcišauskas, Simonas; Anton, Petre Mihail; Lappa, Dimitra; Lieven, Christian Total number of authors: 14 Published in: Nature Communications Link to article, DOI: 10.1038/s41467-019-11581-3 Publication date: 2019 Document Version Publisher's PDF, also known as Version of record Link back to DTU Orbit Citation (APA): Lu, H., Li, F., Sánchez, B. J., Zhu, Z., Li, G., Domenzain, I., Marcišauskas, S., Anton, P. M., Lappa, D., Lieven, C., Beber, M. E., Sonnenschein, N., Kerkhoven, E. J., & Nielsen, J. (2019). A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism. Nature Communications, 10(1), 3586. https://doi.org/10.1038/s41467-019-11581-3

Upload: others

Post on 26-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

You may not further distribute the material or use it for any profit-making activity or commercial gain

You may freely distribute the URL identifying the publication in the public portal If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from orbit.dtu.dk on: Jan 25, 2021

A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem forcomprehensively probing cellular metabolism

Lu, Hongzhong; Li, Feiran; Sánchez, Benjamín J; Zhu, Zhengming; Li, Gang; Domenzain, Iván;Marcišauskas, Simonas; Anton, Petre Mihail; Lappa, Dimitra; Lieven, ChristianTotal number of authors:14

Published in:Nature Communications

Link to article, DOI:10.1038/s41467-019-11581-3

Publication date:2019

Document VersionPublisher's PDF, also known as Version of record

Link back to DTU Orbit

Citation (APA):Lu, H., Li, F., Sánchez, B. J., Zhu, Z., Li, G., Domenzain, I., Marcišauskas, S., Anton, P. M., Lappa, D., Lieven,C., Beber, M. E., Sonnenschein, N., Kerkhoven, E. J., & Nielsen, J. (2019). A consensus S. cerevisiae metabolicmodel Yeast8 and its ecosystem for comprehensively probing cellular metabolism. Nature Communications,10(1), 3586. https://doi.org/10.1038/s41467-019-11581-3

Page 2: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

ARTICLE

A consensus S. cerevisiae metabolic model Yeast8and its ecosystem for comprehensively probingcellular metabolismHongzhong Lu1,5, Feiran Li1,5, Benjamín J. Sánchez1, Zhengming Zhu1,2, Gang Li 1, Iván Domenzain 1,

Simonas Marcišauskas 1, Petre Mihail Anton 1, Dimitra Lappa 1, Christian Lieven 3,

Moritz Emanuel Beber 3, Nikolaus Sonnenschein3, Eduard J. Kerkhoven 1 & Jens Nielsen 1,3,4

Genome-scale metabolic models (GEMs) represent extensive knowledgebases that provide a

platform for model simulations and integrative analysis of omics data. This study introduces

Yeast8 and an associated ecosystem of models that represent a comprehensive computa-

tional resource for performing simulations of the metabolism of Saccharomyces cerevisiae––an

important model organism and widely used cell-factory. Yeast8 tracks community develop-

ment with version control, setting a standard for how GEMs can be continuously updated in a

simple and reproducible way. We use Yeast8 to develop the derived models panYeast8 and

coreYeast8, which in turn enable the reconstruction of GEMs for 1,011 different yeast strains.

Through integration with enzyme constraints (ecYeast8) and protein 3D structures

(proYeast8DB), Yeast8 further facilitates the exploration of yeast metabolism at a multi-scale

level, enabling prediction of how single nucleotide variations translate to phenotypic traits.

https://doi.org/10.1038/s41467-019-11581-3 OPEN

1 Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96 Gothenburg, Sweden. 2 School ofBiotechnology, Jiangnan University, 1800 Lihu Road, 214122 Wuxi, Jiangsu, China. 3 The Novo Nordisk Foundation Center for Biosustainability, TechnicalUniversity of Denmark, DK-2800 Kgs Lyngby, Denmark. 4 BioInnovation Institute, Ole Maaløes Vej 3, DK2200 Copenhagen N, Denmark. 5These authorscontributed equally: Hongzhong Lu, Feiran Li. Correspondence and requests for materials should be addressed to J.N. (email: [email protected])

NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications 1

1234

5678

90():,;

Page 3: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

In the era of big data, computational models are instrumentalfor turning different sources of data into valuable knowledgefor e.g. biomedical1,2 or industrial use3. As a bottom-up sys-

tems biology tool, genome scale metabolic models (GEMs) con-nect genes, proteins and reactions, enabling metabolic andphenotypic predictions based on specified constraints4,5. Thequality and scope of GEMs have improved as demonstrated bythe models for human6, yeast7, and E. coli8. With their thousandsof reactions, genes and proteins, GEMs also represent valuableorganism-specific databases. Therefore, it is important to keeptrack of changes as new knowledge is added to GEMs, in order tomake model development recorded, repeatable, free and open tothe community. Approaches to record updates of a GEM exist forthis purpose9,10, albeit with some limitations on their simplicityand flexibility.

S. cerevisiae is a widely used cell factory11,12 and is extensivelyused as a model organism in basic biological and medicalresearch13,14. Recently, the emergence of technologies, such asCRISPR15 and single cell omics data generation16, have acceler-ated the developments in systems biology. Consistent with strongresearch interests in yeast, the relevant GEMs have also under-gone numerous rounds of curation since the first published ver-sion in 200317. These GEMs have contributed significantly tosystems biology studies of yeast including their use as platformsfor multi-omics integration18,19, and use for in silico straindesign20,21. However, the hitherto latest version, Yeast722, withonly 909 genes, falling behind the latest genome annotation,presents a bottleneck for the use of yeast GEMs as a scaffold forintegrating omics datasets.

Metabolism is complex and regulated at several differentlevels23,24. Traditional GEMs only consist of reactions and theirrelated gene and protein identifiers, and therefore cannot accu-rately predict cellular phenotypes under varied environmentalconditions other than nutritional conditions, e.g. simulating theimpact of temperature on growth rates25. Recently, enzymeconstraints26 and protein 3D structures27 have been integratedinto GEMs, thereby expanding their scope of application andlaying the foundation for whole cell modelling. GEMs constrainedwith kcat values and enzyme abundances have been able todirectly integrate proteomics data and correctly predict cellularphenotypes under conditions of stress26. GEMs with protein 3Dstructures connect the structure-related parameters and geneticvariation27 with cellular metabolism6, thus enlarging the predic-tion scope of GEMs. However, it remains challenging to directlypredict cellular metabolism based on changes in proteinsequences with current GEMs. Advanced functional mutationcluster analysis constrained with protein 3D structures is there-fore needed to integrate knowledge on protein structuresinto GEMs.

This study presents Yeast8, the latest release of the consensusGEM of S. cerevisiae22,28–30. We also introduce a model ecosys-tem around this GEM, including ecYeast8, a model incorporatingenzyme constraints; panYeast8 and coreYeast8, representing thepan and core metabolic networks of 1011 S. cerevisiae strains; andproYeast8DB, a database containing 3D structures of metabolicproteins. This model ecosystem has the ability to meet wideapplication demands from the large scientific yeast community insystems and synthetic biology of yeast.

Yeast8 is a consensus GEM maintained in an open andversion-controlled way. Through ecYeast8 and proYeast8DB,multiple parameters related to protein kinetics and 3D structurescould be integrated based on gene–protein-reaction relations.Furthermore, with panYeast8 and coreYeast8, 1011 strain-specificGEMs were reconstructed and compared. Thus, with Yeast8 andits model ecosystem, we demonstrate that the metabolism of yeastcan be characterised and explored in a systematic way.

ResultsRecording community developments of yeast GEMs withGitHub. We devised a general pipeline to record updates to themodel using Git (https://git-scm.com/), a version control system,and GitHub (https://github.com/), a hosting service for Gitrepositories (Fig. 1a and Supplementary Fig. 1). Hereby, werecord everything related to updates of the GEM, includingdatasets, scripts, corrections and each released version of theGEM (Supplementary Fig. 1). This Git version-controlled modelenables open and parallel collaboration for a wide community ofscientists. With Git and GitHub, each version of the yeast GEMcan be released periodically, which helps to promote the simul-taneous development of a model ecosystem around yeast GEM(Fig. 1b).

Increasing the scope of the yeast metabolic network. We sys-tematically improved the yeast GEM while moving from Yeast7to Yeast8 through several rounds of updates (Fig. 1c and Sup-plementary Fig. 2). To improve the genome coverage, we addedadditional genes from iSce92631. Besides, all functional geneannotations of S. cerevisiae from SGD32, BioCyc33, Reactome34,KEGG35 and UniProt36 were collected and compared (Supple-mentary Fig. 3) to update gene–protein-reaction relations (GPRs),as well as adding more GPRs. With Biolog experiments, i.e.evaluation of growth on a range of different carbon and nitrogensources (Supplementary Data 1), and metabolomics mapping (seemethods), extra reactions were added to enable the model toensure growth on the related substrates, as well as connectingthose metabolites with high confidence with the GEM. The bio-mass equation was modified by adding nine trace metal ions andeight cofactors. Additionally, 37 transport reactions were added inorder to eliminate 45 dead-end metabolites. To improve lipidconstraints, we reformulated reactions of lipid metabolism usingthe SLIMEr formalism, which Splits Lipids Into MeasurableEntities37. As SLIMEr imposes additional constraints on both thelipid classes and the acyl chain distribution from metabolomicsdata, it improved the model performances in lipid metabolism.

In each round of model updates, standard quality-control tests,such as reaction mass balance check and ATP yield analysis, wereperformed. The results in Supplementary Fig. 2 and Supplemen-tary Fig. 4 indicate that the gene (reaction) coverage in the modeland its performance were improved during the iterative updateprocess, which was also shown by comparing Yeast8 to Yeast7(Fig. 1d–f). To facilitate the multi-omics integrative analysis andvisualisation, we established a map of yeast metabolic pathways inSBGN (System Biology Graphical Notation) format (Supplemen-tary Fig. 5) using CellDesigner38.

Expanding Yeast8 to enable enzyme constraints. To enhancemodel prediction capabilities of Yeast8, ecYeast8 was generatedby accounting for enzyme constraints using the GECKO frame-work26 (Fig. 1b, Supplementary Table 1). In enzyme-constrainedmodels, metabolic rates are constrained by the intracellularconcentration of the corresponding enzyme multiplied by itsturnover number (kcat value) to ensure that fluxes are kept atphysiologically possible levels. Compared with Yeast8, ecYeast8has a large reduction in flux variability for most of the metabolicreactions in both glucose-limited and glucose-excess conditions(Fig. 2a).

To further evaluate the predictive strength of ecYeast8, wecompared the predicted maximum growth rate to that obtainedfrom experiments in which we cultivated yeast (strain S288c) on322 different combinations of carbon and nitrogen sources usingmicrotiter plates (Fig. 2c). Here, the measured maximum specificgrowth rate for each carbon source under all nitrogen sources was

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3

2 NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications

Page 4: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

scaled based on the growth on ammonia as a reference, withresults displaying that the mean error for the predicted and scaledmeasured cell growth rate in ecYeast8 was 41.9%, which issignificantly lower than using Yeast8 and traditional flux balanceanalysis alone (Fig. 2d–f). With ecYeast8, it is also possible toestimate flux control coefficients (FCC) of enzymes for growth ondifferent carbon sources (Fig. 2b), i.e. enzymes exerting themajority of flux control on growth. For instance, growing onglucose, the major flux controlling enzyme is one of the isoformsof glyceraldehyde-3-phosphate dehydrogenase (Tdh1) whereas onfermentative carbon sources, such as ethanol and acetate, themajority of control is with Oli1, a key component of ATPsynthase, which has earlier been identified to be important forrespiratory metabolism in S. cerevisiae39. This type of analysisusing ecYeast8 can provide clues for in silico cell factory design26.

Generation of panYeast8 and coreYeast8. To investigate thecorrelation between phenotype and genotype, we used genomicsdata of 1011 S. cerevisiae strains from a recent genome-sequencing project and designed a pipeline to reconstructGEMs for each strain (Supplementary Fig. 6a)40. Similar to theCarveMe method41, a pan model for all yeast strains was firstly

reconstructed based on Yeast8 and the comprehensive pangen-ome annotation (Supplementary Note 1). Using panYeast8 andthe gene presence matrix for the 1,011 strains42, strain-specificGEMs (ssGEMs) could be generated automatically (Supplemen-tary Fig. 6a). The reaction number in the models was found to bein range from 3969 to 4013 (Fig. 3a and Supplementary Fig. 7a).

After reconstruction of the ssGEMs, we formulated coreYeast8based on shared reactions, metabolites and genes in all the strains,which contained 3895 reactions, 2666 metabolites and 892 genes.Even though there were 478 variable genes, only 147 reactionswere found to be variable due to the existence of a large numberof isoenzymes, which suggested that during the evolution of yeast,the increased number of gene copies with similar functionsincreased the robustness of cellular function at the protein level.We calculated the ratio of accessory genes (not included in thecoreYeast8) based on subsystems defined in the KEGG databaseand found that the accessory genes were primarily engaged withalternative sugar metabolism and other secondary metabolismpathways (Supplementary Fig. 8). The number of reactions incoreYeast8 was relatively close to Yeast8, signifying that S.cerevisiae metabolism is well conserved among the 1011 yeaststrains. It should be noted that coreYeast8 reflects the coremetabolic functions for all S. cerevisiae strains, and can therefore

+

+

++

+…

0

50

100

Yeast7 Yeast8

Yeast7

0 5 10 15 20New reactions

b

MSYNDPNLNG.…

g6p

ATP

D-gluc f6p

SGD:S000000400

P1270YBR196C

NCBI:852495

CHEBI:15422KEGG:C00002 GeneMNXM3HMDB00538

MNXR138699bigg:PGI

Protein

Metabolite

Yeast8

c

New GPR fromBiolog test

Modelcomparison Yeast8

New GPR fromgene annotation

1133

2680

3949

976

761

pro Yeast8DB

ecYeast8

v1 v2

v1 ≤ kcat1 E1

v2 ≤ kcat2 E2

Add SLIMEreactions

Reaction

d e f

909

2220

3493

1 908 18

Yeast7 iSce926 Backbones

Acyl chains

Merge6721

6604

6427

Pro

tein

num

ber

NCBI5921

KEGG

SGD

UniProt

Data mining C N

SP

ATGTCCTACA.…

YMDBYeast8

Yeast7

762 62 621

164

236S

ubst

rate

genes

mets

rxns

genes

mets

rxns

kcats

PDB files

Tryptophan metabolismbeta-Alanine metabolism

Galactose metabolismCys and met metabolism

Purine metabolismArg and pro metabolismGlutathione metabolism

N-Glycan biosynthesisP

erce

ntag

e(%

)

Geneessentiality

Substrateusage

Memotetotal score

Issue

Commits

Pull request

Developmentbranch

v8.1.0

v8.2.0

v8.0.0

v8.1.1

Yeast GEMecYeast

panYeastproYeastDB…

Masterbranch

a

Featurebranch

Models ecosystem

Fig. 1 Framework of the yeast GEM project. a Recording model updates in a community way using GitHub. b In Yeast8, genes, metabolites and reactionsare annotated with their corresponding IDs from different databases, which simplifies translation between namespaces. Yeast8 forms the basis of themodel ecosystem from which proYeast8DB, ecYeast8, etc., are derived. c Major steps of development from Yeast7 to Yeast8. d Subsystem statisticalanalysis for the reactions added to Yeast8. eMetabolomics mapping between Yeast7, Yeast8 and the YMDB database. f Comparison of Yeast7 and Yeast8in percent accuracy of gene essentiality and substrate usage analysis, as well as in memote test total scores (divided by 100)

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3 ARTICLE

NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications 3

Page 5: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

be seen as a benchmark for reconstructing GEMs for other yeaststrains. Similar to the core E. coli GEM8, we observed thatcoreYeast8 could not grow in silico in a minimal medium since itlacks the ability to synthesise all required precursors for biomassproduction (Supplementary Note 1, Supplementary Table 2). Thiscan be partly related to the definition of ‘core biomasscompositions’ in our model, which still hints that the ability ofparticular yeast strains to grow on a minimal medium seems to bean acquired phenotype.

Evaluation of 1011 strain-specific GEMs. Using the 1011 GEMs,we estimated in silico substrate usage (including carbon, nitrogenand phosphorus) as well as the yield of 26 metabolites and bio-mass for growth on minimal media (Fig. 3b–f). From the sub-strate usage analysis, we found that some of the domesticatedstrains from the ecological origin “Industrial” had a smaller range

of substrates usage, suggesting that the domestication processmay have resulted in the loss of functions that were not routinelyused. As for the yield of biomass, it could be observed that theyield varied greatly between strains from different ecologicalorigins. By comparison, the biomass yield was relatively lower forstrains with the ecological origin ‘Human’, i.e. strains that havebeen isolated from humans (Fig. 3b), likely due to an adaptationtowards growth on complex media, as many building blocks areprovided by the host and the yeasts have adapted to theseconditions.

To further compare the metabolic potential of different strains,we calculated the maximum yields of 20 amino acids and six keyprecursors in a minimal medium from 1011 ssGEMs (Fig. 3f) andthe results illustrated that the gene background can have aremarkable effect on the ranges of maximum yields of desiredproducts. Additionally, taken strains from ‘Industrial’ ecologicalorigin as an example, a considerable variation in the strains’

0.0 0.1 0.2 0.3 0.4 0.50.0

0.2

0.4

Pre

dict

edva

lue

[h–1

]

0.0 0.1 0.2 0.3 0.4 0.5

0.5

4

Rescaled maximum growth rate [h–1]

0.0

0.2

0.4

2

0.5

4

2

0.1

0.3

0.1

0.3

Yeast8 ecYeast8 Mean error

Yeast8 ecYeast8

129%

42%

Amm

onium

Aspar

agine

Cytosin

e

Amino

butyr

ate

Glycine

Alanine

Argini

ne

Aspar

tate

Citrull

ine

Glutam

ate

Glutam

ine

Histidi

ne

Isoleu

cine

Leuc

ine

Met

hionin

e

Ornith

ine

Pheny

lalan

ine

Prolin

e

Serine

Threo

nine

Trypt

opha

n

Valine

Urea

RaffinoseSucroseD−glucoseD−mannoseD−fructose(R)−lactateL−arabinoseD−arabinoseMaltoseD−riboseTuranoseEthanolD−galactosePyruvate

0

0.05

0.1

0.15

0.2

0.25

a b

c

d

Flu

x co

ntro

l coe

ffici

ent

Gro

wth

rate

inm

icro

plat

es

GP

T2

OLI

1

TD

H1

RB

K1

GA

L1

ICL1

FR

D1

Glucose

Acetate

Ethanol

Glycerol

Glucitol

Galactose

Ribose

00.020.040.060.080.1

e f

10–10 10–5 100

Variability range [mmol (gDW)–1 h–1]

0

0.5

1

Cum

ulat

ive

dist

ribut

ion

Yeast8 chemostatecYeast8 chemostat ecYeast8 batch

Yeast8 batch

Fig. 2 ecYeast8 with enzyme constraints shows improved fit over the regular Yeast8 metabolic model. a Comparison of model performance betweenecYeast8 and Yeast8 based on flux variability analysis (FVA). b Flux control coefficient analysis with ecYeast8 simulated on different carbon sources inminimal medium with the growth as the objective function. c Growth rate of S. cerevisiae S288C in micro-plate under different combinations of carbon andnitrogen sources. d, e Prediction of maximum specific growth rates under different combinations of carbon and nitrogen sources using Yeast8 (d) andecYeast8 (e), the red colour zones mean that the data points are overlapped due to higher density. f Mean errors for comparison of measured andpredicted growth rates using Yeast and ecYeast8

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3

4 NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications

Page 6: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

capability to synthesise these amino acids was found (Fig. 3e).Combining genotype information, model simulation and pheno-type data40 revealed that variations in the simulated maximumyields of amino acids are primarily due to differences in theenergy pathways used for ATP regeneration and the absence ofsome essential genes in amino acid synthesis (SupplementaryNote 1). This shows that simulations with ssGEMs can beinstrumental to evaluate the potential of a strain in producingspecific chemicals, as well as enable analysis of the relationbetween genotype and phenotype.

Next, we classified the strains based on the genes and reactionscontained in the 1,011 ssGEMs, as well as the in silico substrateusage. Using PCA analysis we found that the strains could besubdivided into several distinct groups according to the existenceof genes and reactions (Supplementary Fig. 7e, f). However,

strains from different ecological origins can be clustered together,reflecting the high conservation in yeast metabolism. A decisiontree algorithm was further employed to classify strains based ontheir substrate usage, while this initially indicated that strainsfrom ‘Human’ can be separated from ‘Wine’ only based on themaximum growth rate on two substrates (lactic acid and sorbitol,Fig. 3d). When using hierarchical cluster analysis based on thereaction existence in ssGEMs for yeast strains from specificecological origins, taking ‘Human’ as an example, these strainscould be further divided into smaller groups based on the variablereactions shared between these strains (Supplementary Fig. 7g).The above analysis makes it clear that the model itself and therelated simulation can classify different yeast strains, which inturn can be complementary to classical strain classifications basedon single-nucleotide polymorphism (SNP) data40.

Ala ArgAsn Asp Cys Gln Glu Gly His Ile Le

uLy

sM

etPhe Pro Ser Thr Trp Tyr Val

CIACIBCKHAKPAVBABMAHMAQAAQDATQAHLBRKSACE_YBRAFRAEGAGMAEHAFDAHCAQHAEAAERSACE_YCCBFMBFESACE_YCGBEGAHQSACE_YCKSACE_YDK

0

50

100

150

Baker

yBee

r

Bioeth

anol

CiderDair

y

Distille

ry

Ferm

enta

tion

Flower

Fruit

Human

Human

, clin

ical

Indu

strial

Inse

ct

Lab

strain

Natur

e

Palm w

ine

Probio

ticSak

eSoil

Tree

Unkno

wn

Wat

erW

ine

0

0.025

0.050

0.075

Baker

yBee

r

Bioeth

anol

CiderDair

y

Distille

ry

Ferm

enta

tion

Flower

Fruit

Human

Human

, clin

ical

Indu

strial

Inse

ct

Lab

strain

Natur

e

Palm w

ine

Probio

ticSak

eSoil

Tree

Unkno

wn

Wat

erW

ine

050

100150200250

3970

3980

3990

4000

4010

Baker

yBee

r

Bioeth

anol

CiderDair

y

Distille

ry

Ferm

enta

tion

Flower

Fruit

Human

Human

, clin

ical

Indu

strial

Inse

ct

Lab

strain

Natur

e

Palmwine

Probio

ticSak

eSoil

Tree

Unkno

wn

Wat

erW

ineWine100%No Yes

Sake14%

Wine86%

S49 < 0.91 S17 < 0.045

S49 ≥ 0.19

S5 ≥ 0.36

Dairy6%

Sake8%

Human6%

Bakery1%

Wine5%

Distillery2%

Wine30%

Wine80%

Wine50%

S5 < 0.048

Wine6%

Wine45%

S11 < 0.046 S5 < 0.096

Wine43%

Strain classificationwith decision tree

HisIle

Leu

Lys

Met

Phe

Pro

Ser

Thr

TrpTyr

ValSuc Mal

Fum

EthIbu

Orn

Ala

Arg

Asn

AspCys

GlnGlu

0

1.5

3

Gly

a

e

f

b

Yie

ld(m

mol

per

mm

olgl

ucos

e)

c

d

Upper bound Lower bound

Ranges of maximum yields

Strains

0

0.5

1

1.5

2

2.5

Num

ber

of s

ubst

rate

sN

umbe

r of

reac

tions

Number of strains

Yie

ld o

fbio

mas

s

Fig. 3 Conservation and diversity analysis of yeast metabolism through reconstruction of 1,011 strain-specific GEMs (ssGEM). a Amount of reactions ofssGEMs for strains from different ecological origins. The top column graph describes the strain number from each ecological origin. Different strains fromone ecological origin could have the equal number of reactions. In the box plots of this work, if not mentioned, the bold line in the box represents themedian. The lower and upper bounds of box show the first and third quartiles, respectively, and whiskers indicate ± 1.5× the interquartile range (IQR). Thecolor dots overlaid represent the corresponding data points. b Prediction of biomass yield using ssGEMs on minimal media with glucose as the carbonsource. c Number of substrates which can be used in silico for strains from different ecological origins. In the simulation using ssGEMs, 58 carbon sources,46 nitrogen sources, 41 phosphate sources and 12 sulphate sources were used respectively in the minimal media. d Decision tree classification of strainsaccording to the in silico maximum growth rate on different carbon sources (S5: Latic acid, S11: Serine, S17: Sorbitol, S49: Trehalose). The in silico uptakerate for each carbon source was set at 10mmol (gBiomass)−1 h−1. e Comparison of the yield of 20 amino acids from glucose for 30 strains from the groupof “industrial strain”. f Comparison of in silico range of maximum yields of 20 amino acids and six key chemicals for all 1011 ssGEMs

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3 ARTICLE

NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications 5

Page 7: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

Characteristics of metabolism displayed with proYeast8DB.Phenotype is not only determined by absence or presence ofgenes, but also by SNPs. As SNPs occur randomly, we need a wayto test which SNPs are significant, by looking at their distributionover the protein structure. To enable such analysis, we collectedPDB files for all the metabolic proteins in Yeast8, generating theproYeast8DB database (Supplementary Fig. 9a and Fig. 4a). Lowquality PDB files were filtered out in order to only maintain highquality PDB files for our following analysis (SupplementaryNote 2, Supplementary Fig. 9b–e).

To explore the potential functions of protein residue muta-tions, all the 16,258,509 homozygous SNPs from the 1011 S.cerevisiae genome-sequencing project40 were collected (Supple-mentary Fig. 10a, b) and classified as synonymous SNPs (sSNPs)and nonsynonymous SNPs (nsSNPs). The distribution andcorrelation of relative values in nsSNPs and sSNPs (Supplemen-tary Fig. 10c, d) stated clearly that there were fewer nsSNPs thansSNPs, meaning that the variation in protein sequences wassignificantly smaller than variations in nucleotide sequences. Ananalysis concerning the correlation between nsSNPs in genes withthe corresponding protein abundances and number of reactionscatalyzed by the corresponding protein was further conducted(Supplementary Fig. 10e, f). These results hinted that geneshaving high protein abundance and encoding enzymes catalyzinga large number of reactions tended to have fewer nsSNPs. From

this, it transpires that genes catalyzing abundant enzymes orenzymes with many metabolic functions are likely to be moreconserved in evolution. In addition, enzymes with high fluxcontrol coefficients related to the growth on different carbonsources (Fig. 2d) had fewer nsSNPs (Supplementary Fig. 11),which shows genes with a high impact on cell growth may bemore conserved43.

To get further insight into which parts of metabolism may besensitive or prone to mutations we chose the top 30 genes withthe smallest and largest number of relative nsSNPs and conducteda GO-term analysis using DAVID 6.744 (Supplementary Table 3and 4). Genes with the least amount of nsSNPs were enriched inthiamin metabolic process (P value= 0.0001) and glycolysis (Pvalue= 0.0003), indicating that these pathways are conserved inthe evolution of S. cerevisiae. By comparison, genes with the mostnsSNPs were enriched in organic acid biosynthetic process (Pvalue= 0.02), carboxylic acid biosynthetic process (P value=0.02), etc.

Following the findings mentioned above, we then mapped allnsSNPs onto the 3D protein structures of proYeast8DB usingCLUMPS method45, which could find proteins with significantSNP clusters within the protein structures. As an example, usingthe above nsSNP data as input, we ran the CLUMPS pipeline forthree different groups of S. cerevisiae strains: Wild, Wine, andBioethanol strains (Fig. 4b). With a cut-off P value of 0.05,

a

ID mapping

Collected PDBs

Refined PDFs

proYeast8DB

QC&QA

gene-protein-rxnsDomain mapping

b

c

422

415

P value = 0.0452 (CLUMPS)

ORF Ref Position Alt Freq

YML100W D 415 V 35

YML100W K 422 E 9

ORF PDB information

YML100W 342_801_5hvo.1.A

1_299_4flm.1.AYAR035W 3_642_2fy2.1.AYJL068CYMR246W 39_691_5mst.1.A

SNP effect prediction

Residue mutation list

ed

0

20

40

60

Bioeth

anol

Wild

Wine

Num

ber

ofpr

otei

nw

ithp_

valu

e<

0.05

Strain type

CLU

MP

San

alys

is

22

6541

Collect SNPs fromeach type of strains

SWISS-MODELPDB

0

100

200

300

35 54

362

Mutations 6–920Total strains 877 72

2–5 0–1

Strain group

P value = 0.001

P value = 0.012

P value = 0.08

P value = 0.544

Rel

ativ

e gr

owth

rat

e

Mutations 233 858 78

1 0Total strains

Strain group

Num

ber

of s

trai

n

0

0.25

0.5

0.75

1

0

0.25

0.5

0.75

1

Fig. 4 proYeast8DB—connects protein 3D structures with yeast GEM and enables the SNP mapping analysis based on protein structures. a Overview ofproYeast8DB formulation. b CLUMPS analysis to find proteins with significant mutation enrichment45. c Function annotation of residue mutation fromYML100W and mapping those mutations onto the protein structure. The ‘C’ means the related residue mutation is located on the conservative zone ofproteins based on the function annotation. d Classification of strains based on residue mutations in proteins encoded by YML100W, YAR035W, YJL068Cand YMR246W. e Classification of strains based on residue mutations from only one protein encoded by YML100W. The statistical analysis is based onWilcoxon rank sum test

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3

6 NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications

Page 8: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

proteins with significant mutation enrichment from each groupof strains were identified (Fig. 4b). From the Wine group, wefound GO terms like ‘electron transport chain’ (P value= 0.018)and ‘glutathione metabolic process’ (P value= 0.006, Supple-mentary Table 5), with glutathione metabolism being known tobe distinct for wine yeasts, where it plays a role in anti-oxidationof various aromatic compounds46,47.

Interestingly, we discovered that the Bioethanol groupcontained four proteins (YML100W, YAR035W, YJL068C,YMR246W) belonging to the GO term ‘cellular response to heat’(P value= 0.041, Supplementary Table 6), among which,YML100W is a large subunit of the trehalose 6-phosphatesynthase/phosphatase complex, which is known to contribute tosurvival from acute heat stress. Two mutations were identified inthe protein encoded by YML100W, both are located in a smallcluster in the protein’s 3D structure, despite being separated byseven residues in the primary protein sequence (Fig. 4c). Theannotation from the mutfunc database48 (http://mutfunc.com/)suggested that mutation at site of 422 in YML100W occurred in aconserved zone (Fig. 4c). We further wanted to know whether themutations in these four proteins affected the phenotype of S.cerevisiae strains. Firstly, we classified all the strains based onresidue mutations from these four proteins (Fig. 4e). It could befound that the relative growth rate at 42 °C was significantlyhigher for strains with more than six mutations in these fourproteins than for strains that had less than two mutations.Similarly, the relative growth rates based on mutations only fromYML100W were also analyzed, which indicated that thecontribution of the two mutations from this single protein issmaller than over six mutations from all four proteins (Fig. 4f),further demonstrating that the ability to grow at elevatedtemperatures requires epistatic interactions of mutations inseveral genes (or proteins).

Systematic analysis of ecYeast8 and proYeast8DB. To furtherdisplay the value of the yeast model ecosystem, we combinedecYeast8 and proYeast8DB (Fig. 5a) to identify genes and theirmutations that are related to a specific phenotype. Based ongrowth rate data provided in Peter et al.40 (Fig. 5b), we selected50 strains with the highest, medium, and lowest relative growthrate on complex medium with glycerol as the carbon source(growth with glucose was used as reference) (Fig. 5c). We thenran the hotspot analysis pipeline based on proYeast8DB to findhotspot zones of proteins for each group of strains. In parallel, weperformed in silico flux control analysis to calculate FCCs foreach protein by using ecYeast8 with growth as the objectivefunction on the same medium. Hotspot analysis demonstratedthat each group of strains has a set of unique hotspot zones indifferent proteins (Fig. 5d).

With the FCCs as filtering criteria, four proteins were identifiedwith relatively high FCC (>0.01) from 50 strains with highestrelative growth rate, among which YJL052W, one of the isozymesof glyceraldehyde-3-phosphate dehydrogenase (GAPDH), islocated in the key pathway related to glycerol utilisation (Fig. 5f).When we listed all mutations of YJL052W, there exist six residuemutations at site of 31, 73, 24, 70, 125 and 248. Interestingly, wefound four mutations at site of 24, 31, 70 and 73 were close toeach other, far away from the other two mutations at site of 125and 248 (Fig. 5e). The mutation at position of 31 and 73 wereidentified successfully using our hotspot analysis pipeline and theremaining two mutations at position of 24 and 70 could befiltered out according to the definition of important clusters in ahotspot zone, which should be made up of the significant pairs oftwo residues, separated by at least 20 amino acids in the originalprotein sequence49. When the CLUMPS analysis was employed

for the different combinations of mutations in the above site, theP value for mutations at site of 24, 31, 70 and 73 is 0.0247 and Pvalue for mutations at site of 31 and 73 is 0.0195 (SupplementaryTable 7). However, if all the six mutated positions are considered,the P value is 0.7817. Therefore, mutations at position of 24, 31,70 and 73 as a whole could form into a larger hotspot zone. Fromthe protein functional annotation36, we found that this hotspotzone was very close to the binding site of NAD+ at site of 33and 78.

We queried whether there were correlations between mutationsoccurring in hotspot zones and the growth phenotype. From theremaining 746 strains which were not used for the above hotspotanalysis, we found eight strains that exhibit more than onemutation in the hotspot zone and another 15 strains having twomutations in YJL052W, but in the non-hotspot zone. Comparingthe relative growth rate of these two groups of strains with theremaining, we saw that the former 8 strains grows faster oncomplex medium with glycerol as the carbon source than boththe latter 15 strains, as well as all other strains not containing themutations in hotspot zone (Fig. 5g), indicating that the mutationthat happened in the hotspot zone is possibly beneficial for cellgrowth. Therefore, such an comprehensive analysis with ourmodel ecosystem can enable identification of the potentialmutation targets related to a specific phenotype.

DiscussionAs part of this study, we have developed Yeast8 aided by versioncontrol and open collaboration, which has provided a platformfor a continued community-driven expansion of the model. Thisplatform can greatly accelerate iterative updates of the model, andwe believe that this approach should become the future standardfor developing GEMs for other organisms. Yeast8 is the currentlymost comprehensive reconstruction of yeast metabolism, but italso represents a model that can be used for simulations. Theplatform provided through the GitHub repository enables addi-tion of new knowledge when it is acquired as well as using this forfurther improving the model for simulations. Yeast8 is in linewith the latest trend of performing model quality-control analysisin a standardised manner with memote50. Integrating consistentmodel evaluation with community model development will beinstrumental to accelerate high quality development of GEMs.

Through developing Yeast8, we have significantly improved themetabolic scope of the consensus GEM of S. cerevisiae. As moreevidences from experiments and bioinformatics analyses arerevealed and utilised to update GEMs, these models will movecloser to the in vivo network. Based on Yeast8, we developedstrain specific GEMs, enzyme-constrained GEMs (ecYeast8), etc.,which together form a model ecosystem around the yeast GEMand improve cellular phenotype predictions. As an example,ecYeast8 verifies that the yeast phenotypes are to a large extentdetermined by protein resources allocation, which is consistentwith recent research39. We expect predictions of ecYeast8 tofurther improve as more organism-specific kinetic data becomesavailable, hopefully generated in a high-throughput and sys-tematic way51. By comparing panYeast8, coreYeast8 and1,011 strain specific models, it can be concluded that metaboliccapabilities are largely conserved for all S. cerevisiae strains, whichis consistent with a recent study52. However, through strainspecific GEM simulations, we have found subtle metabolic dif-ferences among the strains in the utilisation of substrates and themaximum yield of 26 chemicals. Exploring these differencesconstrained with more physiological data can guide futuremetabolic engineering and help to evaluate the potential of anygiven strain for any desired product, as well as provide cluesabout the mechanisms of evolutionary adaption. Currently, only

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3 ARTICLE

NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications 7

Page 9: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

in silico simulations were conducted using 1011 yeast ssGEMs;therefore, future experimental evidence for other non-referencestrains will be important to evaluate the reliability of our modelpredictions.

With the increasing use of adaptive laboratory evolution inmetabolic engineering it is important to have methods for rapidassessment of mutations in strains with improved phenotypes.Our multi-scale model analysis will be easily applicable in thisarea. As a further extension of Yeast8, we developed proYeast8DB

by collecting and evaluating the yeast metabolic protein 3Dstructures from public databases53,54. This enabled identificationof mutational hotspots associated with specific phenotypes. Fur-thermore, through combining the predicted targets from ecYeast8and proYeast8DB, we demonstrated how to identify mutations inenzymes with high flux control over a given pathway, which may

also be associated with desirable phenotypes. It should be notedthat although proYeast8DB is useful for connecting GEMs toprotein structure information like PDB identifiers and proteinparameters, it is still challenging to directly predict phenotypesusing the model with protein structure variations as input. Highquality 3D protein structures at genome scale are still scarce53,thus the breakthrough in 3D proteins structure simulations55 andthe related residues functional predictions are strongly expected.Nevertheless, like Recon3D6 and the E. coli GEM-PRO27, theproYeast8DB holds value as a means to explore the relationbetween the cell genotype and phenotype with clear evidence.

Yeast8 will continue to be developed together with its ecosys-tem of models. As a whole, they are expected be a solid basis fordeveloping a whole cell model of an eukaryal cell, which mayserve as a stepping stone to a wider use of model simulations in

ecYeast8

Glycerol

DHAP

G3P

GA3P

G1,3P

PYR

TDH1YJL052W

NAD+NADH

ATPADP

TCAcycle

YJL052W

248

125

243170 73

a

b

proYeast8DB

Hotspot analysisFlux controlcoefficient analysis

Genes withhotspot zones

Genes list withsignificant effect

Protein target withhotspot zones

c

e

d

f

FCC from ecYeast8 hotspot from proYeast8DB

Hotspot

Non hotspot

0.0180.0010.0190.0070.0120.0020.0010.0060.0060.0030.0100.0020.0030.0010.0010.0010.0030.0070.0070.0040.0010.0010.0330.0010.0220.0030.0030.0010.0030.0180.0010.0010.0040.0030.0010.0020.0020.0060.001

Rat

ioof

Hot

spot

g

high

Med

ium

low

YNR043W_V@328+P@361YER073W_G@383+A@409YGR061C_S@697+S@1219YGR061C_I@667+D@896YLR044C_V@458+N@506YLL018C_Y@116+L@137YLR153C_Y@125+A@339

YGR204W_V@401+T@807+V@815YGR204W_V@401+V@815YPR191W_Q@140+V@288YFL018C_N@205+E@229YFL030W_M@156+A@192Y OR168W_I@30+Y@61

Y OR168W_I@460+A@547YBL045C_S@157+A@332YPR081C_T@352+V@465YBR153W_I@75+I@179YKL085W_K@44+M@69YJL153C_L@55+Y@460YML120C_F@171+I@420

YNR016C_N@31+I@135+R@165YNR016C_A@135+I@135+R@165

YNL071W_V@285+D@445YBR256C_I@9+K@87

YBR256C_I@9+K@87+A@88YLR044C_V@201+N@336

Y OR168W_T@206+E@229Y OR374W_V@202+V@241

YJL052W_I@31+S@73Y OR168W_M@336+R@477

YER073W_I@351+Y@394YER073W_L@38+G@126YPR081C_S@360+N@415YGR061C_N@923+T@980YBR263W_V@55+F@431YLR180W_I@24+T@69

YKL182W_D@1851+E@1903YJL153C_R@95+D@165YPR191W_I@19+H@202

Mutations

0

1

2

3

Relative growth rate0.0 0.5 1.0 1.5

Den

sity

0.0

0.5

1.0

1.5

Strain groupHigh Medium Low

Rel

ativ

egr

owth

rate

WAP

Den

sity

Step1 All mutated residues

Step2 Filtered residues

Step3 Obtain hotspotsCluster 1 Cluster 2

Step4 calculate P value

Mutations

8

In hotspot

15 738 731

1–2 2 0 0–1

Total strains

Yes No Yes No

Strain group

Rel

ativ

egr

owth

rat

e

P value = 0.003

P value = 0.040

0

0.25

0.5

0.75

1

1

0.8

0.6

0.4

0.2

0

Fig. 5 Systematic analysis of ecYeast8 and proYeast8DB underlies the potential relations between gene mutations and growth phenotype. a Pipeline toconduct the integrative analysis of ecYeast8 and proYeast8DB. b Distribution of relative growth rate with glycerol as carbon substrate in complex medium.c Three groups of strains with high, medium and low relative growth rates used for hotspot analysis. d Heatmap of hotspot zones for three groups of strainsand heatmap of flux control coefficients obtained from ecYeast8 with growth as the objective function. e Mapping of all YJL052W mutations from strainswith high growth rate onto the protein 3D structure. f Position of enzyme encoded by YJL052W in the glycerol utilisation pathway. g Classification of theremaining strains based on mutations in hotspot zones and non-hotspot zones of YJL052W. The statistical analysis is based on Wilcoxon rank sum test

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3

8 NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications

Page 10: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

life sciences, resulting in reducing the costs of developing bio-technology processes and drug discovery.

MethodsTracking model changes with version control. Git and GitHub were used todevelop yeast-GEM in a traceable way. Git is used to track any changes of yeast-GEM, which are stored online in a GitHub repository (Supplementary Fig. 1). Thestructure of the yeast-GEM repository on GitHub contains the following threemain directories:

(1) ComplementaryData, which contains the related database annotation andphysiological data used for yeast-GEM updates. This data is generally stored as tab-separated value (.tsv) format for easier tracking of changes; (2)ComplementaryScripts, which contains all the scripts used to update yeast-GEM;(3) ModelFiles, which contains different formats of yeast-GEM for variousapplications. The.txt and.yml (YAML) formats make it convenient to visualise anychanges in GitHub or Git local clients. The.xml (SBML) format makes it easy toimport the model across different toolboxes and programming languages.

As a standard step, a commit is needed when updating yeast-GEM. To makecommits easy to understand, semantic commit messages are used (SupplementaryFig. 1c). To enable parallel model development, different branches of yeast-GEMare used, including a ‘master’ branch and a ‘devel’ (development) branch.Developers, and even other people from the community, can create new branchesfrom the development branch to introduce their changes, and then request tomerge them back through pull-requests. These changes are only merged to thedevelopment branch, and in turn the changes in the development branch aremerged periodically to the master branch, which contains the stable releases ofthe model.

General procedures used to standardise annotation of metabolites andreactions. For the newly added reactions, their MetaNetX IDs were obtainedaccording to a direct search in the MetaNetX56 database using the related meta-bolite name or EC number information. MetaNetX IDs were also obtained byreaction ID mapping from the KEGG35, Rhea57 and BioCyc33 databases. Thereaction reversibility was corrected based on the BioCyc and BiGG databases58.MetaNetX IDs were also used to obtain the EC number for the correspondingreactions. As the MetaNetX database does not have the reaction name information,the name of each new reaction was obtained based on the reaction ID mapping indatabases of KEGG, ModelSeed and BioCyc.

The compartment annotation of new reactions was refined based oninformation from the UniProt36 and SGD32 databases. The subsystem annotationwas firstly obtained from KEGG35, and if no subsystems were found there,information from BioCyc or Reactome34 was used instead. If the reaction had nogene relations, we assumed that it occurred in the cytoplasm.

For all the metabolites contained in newly added reactions, the relatedMetaNetX IDs were obtained based on the reaction MetaNetX IDs. If not available,they were obtained by ID mapping based on KEGG IDs or ChEBI IDs. Once themetabolite MetaNetX IDs were obtained, the charge, formula, KEGG IDs andChEBI IDs were obtained for the correspondent metabolite based on metabolitesannotation in MetaNetX.

Model update from Yeast7 to Yeast8. Firstly, all the annotations regardingmetabolite ChEBI IDs and KEGG IDs (Supplementary Table 8) were corrected inthe latest version of the consensus GEM of yeast (version 7.6) based on themetabolite annotation available in KEGG and ChEBI59. Additionally, several genesfrom iSce92631 that were not included in yeast 7.6 were added, as with all genesrelated to metabolic processes and transport in SGD, BioCyc, Reactome, KEGGand UniProt. The main databases used for model curation could be found inSupplementary Table 9.

In the Biolog experiments, the strain S288c was grown on 190 carbon sources,95 nitrogen sources, 59 phosphorus sources, and 35 sulphur sources. The resultshowed that S288c could grow on 28 carbon sources, 44 nitrogen sources, 48phosphorus sources and 19 sulphur sources. Based on these results new essentialreactions were added to make the model capable of predicting growth on therelated substrates. Meanwhile, all the metabolomics data contained in the YMDBdatabase (measured metabolites) and the latest metabolomics research(Supplementary Table 10) were collected and compared with that in yeast GEM. Astandard annotation was given for all these metabolites and a pipeline was designedto add the metabolites into the GEM without bringing any new dead-endmetabolites. Detailed procedures in model curation are available inthe Supplementary Methods.

Model validation with varied experimental data sources. To compare themetabolites coverage, the YMDB database60 was parsed. There are 2024 metabo-lites for yeast, among which 871 were measured in S. cerevisiae. For each meta-bolite, ChEBI ID and KEGG ID were assigned, and based on them thecorresponding MetaNetX ID was matched. For metabolites from Yeast7 andYeast8, the MetaNetX ID of each metabolite was also obtained based on IDmapping.

The in silico growth on 190 carbon sources, 95 nitrogen sources, 59 phosphorussources, and 35 sulphur sources were calculated and compared with phenotypedata. The results can be divided into a confusion matrix, which contains: (1) G/G:in vivo growth/in silico growth (true positive); (2) NG/NG: in vivo no growth/insilico no growth (true negative); (3) G/NG: in vivo growth/in silico no growth (falsenegative); (4) NG/G: in vivo no growth/in silico growth (false positive);

The model quality is then evaluated based on accuracy (Eq. 1) and theMatthews’ Correlation Coefficient (MCC)61 (Eq. 2). Accuracy ranges from 0 (worstaccuracy) to 1 (best accuracy). MCC ranges from −1 (total disagreement betweenprediction and observation) to+ 1 (perfect prediction).

Accuracy ¼ TPþ TNTPþ TNþ FTþ FN

ð1Þ

MCC ¼ TP ´TN� FP ´ FNffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðTPþ FPÞðTPþ FNÞðTNþ FPÞðTNþ FNÞp ð2Þ

To conduct gene essentiality analysis, we used the essential gene list from theYeast Deletion Project, available at http://www-sequence.stanford.edu/group/yeast_deletion_project/downloads.html, which was generated from experimentsusing a complete medium. Accuracy and MCC were computed as described above.

The simulated aerobic and anaerobic growth under glucose-limited andnitrogen-limited conditions were compared with reference data62. The followingprocedure was employed to simulate chemostat growth in glucose-limitedconditions. Firstly set the lower bound of glucose and O2 uptake reactions usingexperimental values. Glucose and oxygen uptake fluxes are negative and thereforethe lower bounds are fixed to represent the maximum uptake rates. Secondlymaximise the growth rate.

As for nitrogen-limited conditions, since protein content in biomass dropsdramatically under nitrogen-limited conditions, the biomass composition wasrescaled according to reference conditions63, then set the lower bound as measuredfor NH3 and O2 uptake reactions using experimental values and finally maximisethe growth rate.

Visualisation of Yeast8. The maps of yeast-GEM were drawn for each subsystemusing cellDesigner 4.438 (Supplementary Fig. 5). In-house R scripts were used toproduce the map of each subsystem automatically based on Yeast8. Afterwards, thegraph layout was adjusted manually in cellDesigner 4.4 to improve its quality andthe whole yeast map in SBGN format could be found in https://github.com/SysBioChalmers/Yeast-maps/tree/master/SBMLfiles.

Generation of ecYeast8. The ecYeast8 model was generated based on the latestrelease of the GECKO toolbox, available at https://github.com/SysBioChalmers/GECKO. For each reaction, the algorithm queries all the necessary kcat values fromthe BRENDA database64, according to gene annotation and a hierarchical set ofcriteria, giving priority to substrate and organism specificity. The kcat values arethen added to reactions according to:

� 1

kijcatvj þ ei ¼ 0 ð3Þ

0 � ei � Ei½ � ð4Þ

vj � kijcat � Ei½ � ð5Þwhere vj represents the flux through reaction j, ei represents the amount of enzymeallocated for reaction j, Ei represents the total concentration of enzyme i, and kcatrepresents the highest turnover number available for enzyme i and reaction j. Thedetailed procedure to generate ecYeast8 can be found in the supplementarymaterial of the GECKO paper26.

Simulations with ecYeast8. To predict the maximum growth rate under differentcarbon and nitrogen sources using ecYeast8, the following procedure was used.Firstly remove any constraints for the related uptake rates of carbon and nitrogensources. Next, set minimal media made up of the related carbon and nitrogensources. Lastly, simulate a growth rate maximisation, whereby the optimal value isfixed for posterior minimisation of the total protein usage. This provides a parsi-monious flux distribution.

For comparative FVA between Yeast8 and an ecYeast8, the maximum growthrate and the optimal glucose uptake rates obtained with ecYeast8 are used as fixedvalue and upper bound, respectively, in the original GEM in order to perform a faircomparison of flux variability for the same growth phenotype.

Flux control coefficients (FCCs) are defined as a ratio between a relative changein the flux of interest and a relative change in the correspondent kcat of 0.1%, whichcan be described by:

FCCi ¼vup � vb

vb

� �=

1:001kijcat � kijcatkijcat

!ð6Þ

where vb and vup are the original flux and new fluxes respectively when the kcat isincreased by 0.1%.

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3 ARTICLE

NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications 9

Page 11: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

Re-annotation of the pan-genome from the 1011 yeast genome-sequencingproject. To construct the pan model of yeast (panYeast8), the latest genomicsresearch by Peter et al has consulted40. In Peter’s study, 1011 yeast strains genomeshad been sequenced and analysed. A pan-genome was obtained from all thesestrains, made up of by 6081 non-redundant ORFs from S. cerevisiae S288Creference genome, and 1715 non-reference ORFs (nrORFs) from the other strains.For the 7796 ORFs, a panID was given for each of them. By comparison, 4940ORFs are conserved in all these strains while 2846 ORFs are variables across allthese strains. The annotation of non-redundant 6081 ORFs can be taken directlyfrom the latest S. cerevisiae S288C genome annotation, while related gene–protein-reactions (GPR) can be obtained from Yeast8 directly.

As mentioned in Peter’s article there are 774 nrORFs with the ortholog genesfrom S. cerevisiae S288C genome40. The blast analysis, along with the geneannotation of KEGG web service35, and EggNOG web service65, were employed tocheck and improve the original ortholog relation. To evaluate the ortholog generelations qualitatively, the bi-directional blast hit (BBH) analysis was furtherconducted using Diamond66. Here the best hit in BBH analysis with pidentitylarger than 80% were finally chosen and prepared for a panYeast8 formulation.

To further search reliable new reactions connected with nrORFs, the annotationresults from KEGG and the EggNOG web service were used. According to theformat request for the two web services, the protein fasta files of pan-genome wereuploaded onto KEGG (https://www.genome.jp/tools/kaas/) and EggNOG (http://eggnogdb.embl.de/#/app/emapper). For the KEGG annotation, a BBH (bi-directional best hit) assignment method with the default parameters was used. Forthe EggNOG annotation, the HMMER with the default parameters was used. In theEggNOG annotation, each protein will be mapped onto KO ID and BiGG reactionID while for the KEGG annotation, each protein will be given a unique KO ID. Soif the KO ID for a protein is different between KEGG and EggNOG, then the KOID given by KEGG will be preferred in the further analysis. If the KO ID was givenfor one protein by EggNOG, but not in KEGG, then this annotation will be alsoused for the pan-genome annotation. When the KO ids are obtained, the lists ofKOs from nrORFs are compared with the reference ORFs. New KO ids for thenrORFs were subsequently extracted. Following this, the rxnID was obtained basedon KO-rxnID mapping from KEGG database.

Generation of panYeast8, coreYeast8 and strain specific GEMs. For orthologgenes (e.g. gene C) obtained from pan-genome annotation, they can be mergedbased on the reference gene (e.g. gene A) function in the original model accordingto the following rules: (1) if A or B catalyze the same isoenzyme, the GPR rulecould be changed to ‘A or B or C’ in panYeast8; (2) if A and B belong to a complex,the GPR rule should be updated from ‘A and B’ into ‘(A and B) or (C and B)’.Secondly, 51 new reactions with 13 new genes were merged into panYeast8. As forthe genes identity in the model, in order to reduce chaos, the original gene IDs andgene names from original Yeast8 were kept, while for newly added genes, thepanIDs defined in Peter’s work9 were used to represent the gene name.

Collapsed genes in pan-genome but could be found in yeast GEM, and will bereplaced with the corresponding ortholog genes defined in pan-genome. ssGEMsfor 1011 strains were reconstructed based on panYeast8 along with the relatedstrains specific genes list (Supplementary Fig. 6a). A Matlab function wasdeveloped to generate strain specific models automatically. Based on current geneexistence information, if one gene from a complex is missing, then the reaction isremoved; and if a gene from two isoenzymes is missing, then the reaction will bekept, though the GPRs will be updated to remove the missing gene. After thereconstruction of 1011 ssGEMs, coreYeast8 was generated based on commonreactions, genes, and metabolites across the 1011 ssGEMs.

Strain classification based on PCA, decision tree and cluster analysis. Thehierarchical cluster analysis based on the reaction existence in ssGEMs for yeaststrains is based on R package––dendextend (https://CRAN.R-project.org/package= dendextend). For the PCA analysis of strains based gene (or reaction) existencein ssGEMs, R function-prcomp has been used in this article. The decision treeclassification of strains according to the maximum growth rate on different carbonsources was carried out using the R package––rpart (https://cran.r-project.org/web/packages/rpart/). For the hyperparameters tuning, two R packages—ParamHelpers(https://CRAN.R-project.org/package= ParamHelpers) and mlr (https://CRAN.R-project.org/package=mlr) were further used.

Protein structure collection for proYeast8DB. To establish the protein 3Dstructure models for all genes from yeast GEM (and a few metabolic genes notincluded in current Yeast8), all the protein structures of S. cerevisiae S288C fromthe SWISS-MODEL database67 (https://Swissmodel.expasy.org) on 20 July 2018were downloaded. The total number is about 20332 PDB files including the 8109modelling homology PDB files (PDB_homo) and 12223 experimental PDB files(PDB_ex). Meanwhile all the PDB_ex of S. cerevisiae S288C stored in RCSB PDB54

database were further downloaded. The protein sequences contained in eachPDB_ex were also downloaded. The above two sources of PDB files were merged toobtain the comprehensive PDB files database for S. cerevisiae S288C. With themetabolic gene list of S. cerevisiae S288C to query PDB files database, most genes,with the exception of roughly 217 proteins (in Yeast8.3) could be found in the

related PDB files. To fill this gap, the SWISS-MODEL web service was further usedto build the PDB_homo for 217 proteins. As a result, each of metabolic proteincould have at least one PDB file. All the original proteins annotation, like theresidues sequence and protein length, were downloaded from the SGD database.

Once the PDB files were collected, the parameters of PDBs were extracted andcalculated for quality analysis. As for the PDB_homo, the default parameters fromthe ftp of the SWISS-MODEL database were obtained, and included the proteinUniProt ID, the protein length, the related PDB ID (connected with chainID), thestructure sources, the coordinates of proteins residues covered with PDB structures,the coverage, the resolution, and QMEAN. As for PDB_homo, besides the abovedefault parameters from the SWISS-MODEL database, a greater number ofparameters were obtained by parsing the PDB_homo atom files provided by theSWISS-MODEL with an in-house python script, which included the methods usedto obtain the PDB files, the model template, the protein oliga state, the GMQE,QMN4, sequence identity (SID), and sequence similarity (SIM). In summary, eachPDB_homo contains 18 parameters for further PDB quality analysis.

Some of PDB_ex parameters, like coverage and template ID can also be foundfrom the SWISS-MODEL database. The other important parameters likeresolution, ligands, and oliga state were obtained by parsing PDB_ex files fromRCSB PDB database using (https://github.com/williamgilpin/pypdb). The chainIDfor each PDB_ex was downloaded from the SIFTS database68.

Quality analysis of protein 3D structure. As one protein could be connected withseveral PDB files in different quality levels, it is essential to filter out the PDB of lowquality. In this work, mainly four import parameters, that are sequence identity(SI), sequence similarity (SS), resolution, and QMEAN, were used to classify thePDB_homo. By using a simple normal distribution to describe all these parametersof PDB_homo, a Z score test can be done to calculate the threshold value for Pvalue set at 0.1. The cut-off value of sequence identity, the sequence similarity,resolution, and QMEAN are 17.58, 0.25, 3.8 Å and −6.98 respectively. As stated inthe SWISS-MODEL database, however, a PDB_homo with the QMEAN smallerthan −4 is of low quality. To ensure PDB_homo of higher quality in this work, thecritical parameters are reset as the following: QMEAN ≥−4, SI ≥ 0.25, SS ≥ 0.31,and Resolution ≤ 3.4 Å.

In order to check whether there exists a gap in the PDB_ex files, all residuesequences from PDB databases for each chain of one PDB file were downloaded. Atsome points, however, residue sequences provided by PDB databases were notconsistent with residue sequences contained in the structure. To solve this issue, aBiopython package69 was used to obtain residue sequences for each chain of onePDB file. Next, all residue sequences were blasted with original protein sequencesfor S. cerevisiae S288C from SGD with the aid of Diamond66 in order to checkwhether there existed gaps (mismatches or mutations) in the residue sequencesfrom PDB_ex when compared with the original residue sequences. The PDB_exhas been chosen with the thresholds: pidentity= 100 and resolution ≤ 3.4 Å;otherwise a PDB_homo from SWISS-MODEL database will be used.

Establishing relations of protein domain, gene, protein and reactions(dGRPs). In this work, the Pfam32.0 database70 (https://pfam.xfam.org/) wasmainly used to annotate the domain information of proteins from S. cerevisiaeS288C. If a structure covered all residues of any given domain, it was assigned tothat very domain. For each domain, the coordinates of start and end, the name, thedomain function description, the domain type, e_value, the related PDB ID, andprotein ID, were all summarised. According to the GPRs of Yeast8, the relationbetween gene ID and reaction ID could be obtained. Following this, the domaininformation could be connected with each pair of gene and reaction based on theID mapping.

SNP collection and relative coordinates mapping. Starting from the vcf fileprovided by the recent 1011 yeast strains genomes sequencing projects40 thehomozygous SNP from the massive data file (Supplementary Fig. 10a) were firstlyextracted. The SNPs of low total quality with depth being <2.0, mapping quality<40, genotype quality < 30, and Genotype depth <5 were filtered out based on aseries of standard parameters according to the Broad Institute Genome analysisToolkit (GATK)71.

After filtration, the reliable SNP can be obtained for each strain. The datafurthermore contains each SNP’s strain name, chromosome, coordinates, ref, andalt nucleotide base. In the annotation phase, the SNP type and related gene nameswere further annotated based on the coordinates and the annotation information ofS. cerevisiae S. cerevisiae S288C reference genome (version R64-1-1) from NCBI. Ifthe SNP was not located on CDS zone of gene, it was classified as a type of‘INTEGENIC’. If not this classification, it was otherwise given a gene systematicname, consistent with the gene name format in Yeast8. Based on the above SNPannotation information only those belonging to the metabolic genes (gene list inYeast8 and some other metabolic genes not contained in Yeast8 until now) werechosen. According to the SNP annotation information and the protein sequences ofthe related genes, the SNPs are classified as the sSNP (synonymous singlenucleotide polymorphism) and nsSNP (nonsynonymous single nucleotidepolymorphism). The relative numbers of sSNPs and nsSNPs for each gene were

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3

10 NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications

Page 12: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

calculated, which is equal to the total sSNPs or nsSNPs divided by the relatedprotein length.

Before mapping, the coordinates of mutated residues from each nsSNP need tobe calculated. Firstly, the relative coordinates of mutated residues on the originalprotein sequence can be obtained based on the coordinates of nsSNP on thechromosome. Following this, according to the coordinates mapping between theoriginal protein sequences and the relative residues coordinates in the proteinsstructure, the relative coordinates of the mutated residues in the protein structurescan be estimated and used in the following calculation.

CLUMPS method to calculate p-values of mutation enriched PDB files.Referring to Kamburov’s method45, a WAP score to calculate the pairwise distancesbetween mutated residues for a protein 3D structure.

WAP ¼X

q;rnqnre

�d2q;r

2t2 ð7Þ

Where dq,r in this article is defined as the Euclidean distance (in Å) between αcarbons of any two mutated residues. t is defined as a ‘soft’ distance threshold,which equals to 6 Å. nq and nr are the normalised numbers of samples contains themutations using the followed sigmoidal Hill function:

nq ¼Nmq

θm þ Nmq

ð8Þ

Where Nq is the number of samples with a missense mutation impacting residue qof the protein and θ= 2 and m= 3 are parameters of the Hill function controllingthe critical point (centre) and steepness of the sigmoid function, respectively.Formula (2) was used to normalise the sample number contained in residuemutations q and r, both of which can avoid the impact of higher frequent mutatedresidues in the samples. A detailed description of each formula can be found inKamburov’s article45.

The CLUMPS method can be divided into four steps. Firstly, prepare theneeded SNP information and structure information of one protein. Secondly, withthe normalised mutation number occurring in specific positions, calculate theWAP scores of the samples. Next, assuming that the uniform distribution ofmutations across the protein residues covers the given structure, calculate eachWAP score in 10 randomisations to obtain the null distribution. During thesampling process, the mutation number of residues occurring in random locationswas kept the same as the original values. Lastly, calculate the right tailed P value inthe null distribution for the given mutated protein structures based on the originalWAP score and all the sampled WAP scores. The right tailed P value is defined asthe number of samples with WAP scores larger than the original WAP scored,divided by the total number of samples.

For proteins with P value smaller than 0.05 from strains group of “Bioethonal”and “Wine”, GO-enrichment analysis using DAVID6.7 on-line web service72 wascarried out.

Hotspot analysis of nsSNP mutation. The hotspot analysis pipeline for yeastmainly refers to Niu et al.’s work49. All the SNP and structure information (similarto CLUMPS’ analysis method) were prepared for a group of strains with specificphenotypes. Before carrying out the cluster analysis, the mutated paired residues ofsignificance were filtered according to reference49. These important paired residuesshould meet the followed three criteria: the distance between two residues shouldbe smaller than 10 Å for all the intramolecular clusters analysis; the two residuesshould be separated by at least 20 residues in the original protein sequence; and apermutation method ought to be used to calculate the P value for each pairedresidues (Eq. 9), with a threshold set at 0.05.

P value ¼ n1n2

ð9Þ

Where n1 is the number of paired residues with the distance smaller than that inthe paired residues of target and n2 is the total number of paired residues.

Once the paired residues of significance have been obtained, the clusters madeup of paired residues were obtained based on the undirected graph theory, whichwas realised using the function ‘decompose.graph’ from the R package igraph(https://igraph.org/). For each cluster, its closeness can be calculated using thefunction of ‘closeness.residual’ from the R package entiserve73. The detailedprinciple could be also find in the original research49. As the last step, when acluster was estimated, the P value was calculated based on the CLUMPS analysispipeline in this work.

Prediction of mutations function. To look into the effect of the mutations,mutfunc48 (http://mutfunc.com/) and SIFT (http://snpeff.sourceforge.net/SnpSift.html) were employed to predict the potential effects of nsSNP on protein function.

Growth test using Biolog with different substrate sources. The PhenotypeMicroArray (PM) system was used to test growth on every carbon, nitrogen,phosphorus and sulphur sources74. A total of 190 carbon sources, 95 nitrogensources, 95 phosphorus, and sulphur sources were tested. The PM procedures for S.cerevisiae S288C were based on the protocol of Yeast version of the PM system.

Growth profiling in different media. A total of 14 carbon sources and 23 nitrogensources were combined by orthogonal experiments. Every carbon source andnitrogen source used in the medium were the same C-mole and N-mole as glucose(20 g L−1 glucose) and ammonium sulphate (7.5 g L−1 (NH4)2SO4), respectively.For all other substrate sources, the same minimal medium was used (14.4 g L−1

KH2PO4, 0.5 g L−1 MgSO4∙7H2O, trace metal and vitamin solutions)75. Strainswere cultivated in 96-well plates, and growth performance was determined withGrowth Profiler 960 (Enzyscreen B.V., Heemstede, The Netherlands). The max-imum specific growth rate (μmax) was calculated with the R package—growthrates(https://github.com/tpetzoldt/growthrates).

Statistical analysis. For two group comparison in this work, a two tailed Wil-coxon rank sum test was used.

Reporting summary. Further information on research design is available inthe Nature Research Reporting Summary linked to this article.

Data availabilityThe yeast-GEM project with all related data sources can be found at https://github.com/SysBioChalmers/yeast-GEM, and a full documentation on how the repository works andhow to contribute is available at https://github.com/SysBioChalmers/yeast-GEM/blob/master/.github/CONTRIBUTING.md. The panYeast with all related data sources can befound in https://github.com/SysBioChalmers/panYeast-GEM. The genomic data used inthis work is from40.

Code availabilityMatlab scripts for development of yeast GEM and panYeast can be found in the abovetwo corresponding repositories. R scripts to carry out the CLUMPS and hotspot analysiscan be found at: https://github.com/SysBioChalmers/proYeast8-GEM.

Received: 19 March 2019 Accepted: 17 July 2019

References1. Garza, D. R., van Verk, M. C., Huynen, M. A. & Dutilh, B. E. Towards

predicting the environmental metabolome from metagenomics with amechanistic model. Nat. Microbiol 3, 456–460 (2018).

2. Robinson, J. L. & Nielsen, J. Anticancer drug discovery through genome-scalemetabolic modeling. Current Opinion in. Syst. Biol. 4, 1–8 (2017).

3. Chen, X. et al. DCEO biotechnology: tools to design, construct, evaluate, andoptimize the metabolic pathway for biosynthesis of chemicals. Chem. Rev. 118,4–72 (2017).

4. Schellenberger, J. et al. Quantitative prediction of cellular metabolism withconstraint-based models: the COBRA Toolbox v2.0. Nat. Protoc. 6, 1290–1307(2011).

5. Wang, G., Huang, M. & Nielsen, J. Exploring the potential of Saccharomycescerevisiae for biopharmaceutical protein production. Curr. Opin. Biotechnol.48, 77–84 (2017).

6. Brunk, E. et al. Recon3D enables a three-dimensional view of gene variation inhuman metabolism. Nat. Biotechnol. 36, 272–281 (2018).

7. Heavner, B. D. & Price, N. D. Comparative analysis of yeast metabolicnetwork models highlights progress, opportunities for metabolicreconstruction. PLOS Comput. Biol. 11, e1004530 (2015).

8. Monk, J. M. et al. iML1515, a knowledgebase that computes Escherichia colitraits. Nat. Biotechnol. 35, 904–908 (2017).

9. Aite, M. et al. Traceability, reproducibility and wiki-exploration for “a-la-carte” reconstructions of genome-scale metabolic models. PLoS Comput Biol.14, e1006146 (2018).

10. Spaulding, A. et al. Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology. Brief. Bioinforma. 17, 877–890(2015).

11. Krivoruchko, A. & Nielsen, J. Production of natural products throughmetabolic engineering of Saccharomyces cerevisiae. Curr. Opin. Biotechnol. 35,7–15 (2015).

12. Runguphan, W. & Keasling, J. D. Metabolic engineering of Saccharomycescerevisiae for production of fatty acid-derived biofuels and chemicals. Metab.Eng. 21, 103–113 (2014).

13. Sun, S. et al. An extended set of yeast-based functional assays accuratelyidentifies human disease mutations. Genome Res. 26, 670–680 (2016).

14. Dabas, P., Kumar, D. & Sharma, N. in Yeast Diversity in Human Welfare. (eds.T. Satyanarayana & G. Kunze) 191–214 (Springer Singapore, Singapore,2017).

15. DiCarlo, J. E. et al. Genome engineering in Saccharomyces cerevisiae usingCRISPR-Cas systems. Nucleic Acids Res. 41, 4336–4343 (2013).

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3 ARTICLE

NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications 11

Page 13: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

16. Gasch, A. P. et al. Single-cell RNA sequencing reveals intrinsic and extrinsicregulatory heterogeneity in yeast responding to stress. PLoS Biol. 15,e2004050–e2004050 (2017).

17. Forster, J., Famili, I., Fu, P., Palsson, B. O. & Nielsen, J. Genome-scalereconstruction of the Saccharomyces cerevisiae metabolic network. GenomeRes. 13, 244–253 (2003).

18. Lahtvee, P. J. et al. Absolute quantification of protein and mRNA abundancesdemonstrate variability in gene-specific translation efficiency in yeast. CellSyst. 4, 495–504 (2017).

19. Tian, M. & Reed, J. L. Integrating proteomic or transcriptomic data intometabolic models using linear bound flux balance analysis. Bioinformatics 34,3882–3888 (2018).

20. Lopes, H. & Rocha, I. Genome-scale modeling of yeast: chronology,applications and critical perspectives. FEMS Yeast Res. 17, 1–13 (2017).

21. Pereira, R., Nielsen, J. & Rocha, I. Improving the flux distributions simulatedwith genome-scale metabolic models of Saccharomyces cerevisiae.Metab. Eng.Commun. 3, 153–163 (2016).

22. Aung, H. W., Henry, S. A. & Walker, L. P. Revising the representation of fattyacid, glycerolipid, and glycerophospholipid metabolism in the consensusmodel of yeast metabolism. Ind. Biotechnol. (New Rochelle N. Y) 9, 215–228(2013).

23. Yang, L., Yurkovich, J. T., King, Z. A. & Palsson, B. O. Modeling the multi-scale mechanisms of macromolecular resource allocation. Curr. Opin.Microbiol. 45, 8–15 (2018).

24. Hackett, S. R. et al. Systems-level analysis of mechanisms regulating yeastmetabolic flux. Science 354, 432 (2016).

25. O’Brien, E. J., Lerman, J. A., Chang, R. L., Hyduke, D. R. & Palsson, B. O.Genome-scale models of metabolism and gene expression extend and refinegrowth phenotype prediction. Mol. Syst. Biol. 9, 693 (2013).

26. Sanchez, B. J. et al. Improving the phenotype predictions of a yeast genome-scale metabolic model by incorporating enzymatic constraints. Mol. Syst. Biol.13, 935 (2017).

27. Chang, R. L. et al. Structural systems biology evaluation of metabolicthermotolerance in Escherichia coli. Science 340, 1220–1223 (2013).

28. Dobson, P. D. et al. Further developments towards a genome-scale metabolicmodel of yeast. BMC Syst. Biol. 4, 145 (2010).

29. Heavner, B. D., Smallbone, K., Barker, B., Mendes, P. & Walker, L. P. J. B. S. B.Yeast 5 – an expanded reconstruction of the Saccharomyces cerevisiaemetabolic network. BMC Syst. Biol. 6, 55 (2012).

30. Heavner, B. D., Smallbone, K., Price, N. D. & Walker, L. P. Version 6 of theconsensus yeast metabolic network refines biochemical coverage and improvesmodel performance. Database (Oxf.) 2013, bat059 (2013).

31. Chowdhury, R., Chowdhury, A. & Maranas, C. D. Using gene essentiality andsynthetic lethality information to correct yeast and CHO Cell genome-scalemodels. Metabolites 5, 536–570 (2015).

32. Cherry, J. M. et al. Saccharomyces genome database: the genomics resource ofbudding yeast. Nucleic Acids Res. 40, 700–705 (2012).

33. Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes andthe BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44,471–480 (2016).

34. Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res.46, 649–655 (2018).

35. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes.Nucleic acids Res. 28, 27–30 (2000).

36. The UniProt Consortium. UniProt: the universal protein knowledgebase.Nucleic Acids Res. 45, 158–169 (2017).

37. Sanchez, B. J., Li, F., Kerkhoven, E. J. & Nielsen, J. SLIMEr: probing flexibilityof lipid metabolism in yeast with an improved constraint-based modelingframework. BMC Syst. Biol. 13, 4 (2019).

38. Funahashi, A. et al. CellDesigner 3.5: a versatile modeling tool for biochemicalnetworks. Proc. IEEE 96, 1254–1265 (2008).

39. Nilsson, A. & Nielsen, J. Metabolic trade-offs in yeast are caused by F1F0-ATPsynthase. Sci. Rep. 6, 22264 (2016).

40. Peter, J. et al. Genome evolution across 1,011 Saccharomyces cerevisiaeisolates. Nature 556, 339–344 (2018).

41. Machado, D., Andrejev, S., Tramontano, M. & Patil, K. R. Fast automatedreconstruction of genome-scale metabolic models for microbial species andcommunities. Nucleic Acids Res. 46, 7542–7553 (2018).

42. Monk, J. M. et al. Genome-scale metabolic reconstructions of multipleEscherichia coli strains highlight strain-specific adaptations to nutritionalenvironments. Proc. Natl Acad. Sci. USA 110, 20338–20343 (2013).

43. Jordan, I. K., Rogozin, I. B., Wolf, Y. I. & Koonin, E. V. Essential genes aremore evolutionarily conserved than are nonessential genes in bacteria.Genome Res. 12, 962–968 (2002).

44. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrativeanalysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc.4, 44–57 (2008).

45. Kamburov, A. et al. Comprehensive assessment of cancer missense mutationclustering in protein structures. Proc. Natl Acad. Sci. USA 112, 5486–5495(2015).

46. Mezzetti, F., De Vero, L. & Giudici, P. Evolved Saccharomyces cerevisiae winestrains with enhanced glutathione production obtained by an evolution-basedstrategy. FEMS Yeast Res. 14, 977–987 (2014).

47. Bonciani, T., De Vero, L., Mezzetti, F., Fay, J. C. & Giudici, P. A multi-phaseapproach to select new wine yeast strains with enhanced fermentative fitnessand glutathione production. Appl Microbiol Biotechnol. 102, 2269–2278(2018).

48. Wagih, O. et al. A resource of variant effect predictions of single nucleotidevariants in model organisms. Mol. Syst. Biol. 14, e8430 (2018).

49. Niu, B. et al. Protein-structure-guided discovery of functional mutationsacross 19 cancer types. Nat. Genet 48, 827–837 (2016).

50. Lieven, C. et al. Memote: a community driven effort towards a standardizedgenome-scale metabolic model test suite. bioRxiv, 350991 (2018). https://doi.org/10.1101/350991.

51. Nilsson, A., Nielsen, J. & Palsson, B. O. Metabolic models of protein allocationcall for the kinetome. Cell Syst. 5, 538–541 (2017).

52. Caspeta, L., Shoaie, S., Agren, R., Nookaew, I. & Nielsen, J. J. B. S. B. Genome-scale metabolic reconstructions of Pichia stipitis and Pichia pastoris and insilico evaluation of their potentials. BMC Syst. Biol. 6, 24 (2012).

53. Bienert, S. et al. The SWISS-MODEL repository-new features andfunctionality. Nucleic acids Res. 45, 313–319 (2017).

54. Rose, P. W. et al. The RCSB protein data bank: integrative view of protein,gene and 3D structural information. Nucleic Acids Res. 45, 271–281 (2017).

55. Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate de novo prediction ofprotein contact map by Ultra-Deep learning model. PLoS Comput Biol. 13,e1005324 (2017).

56. Moretti, S. et al. MetaNetX/MNXref—reconciliation of metabolites andbiochemical reactions to bring together genome-scale metabolic networks.Nucleic Acids Res. 44, 523–526 (2016).

57. Lombardot, T. et al. Updates in Rhea: SPARQLing biochemical reaction data.Nucleic Acids Res. 47, 596–600 (2018).

58. King, Z. A. et al. BiGG Models: a platform for integrating, standardizing andsharing genome-scale models. Nucleic Acids Res. 44, 515–522 (2016).

59. Hastings, J. et al. ChEBI in 2016: improved services and an expandingcollection of metabolites. Nucleic Acids Res. 44, 1214–1219 (2016).

60. Ramirez-Gaona, M. et al. YMDB 2.0: a significantly expanded version of theyeast metabolome database. Nucleic Acids Res. 45, 440–445 (2017).

61. Matthews, B. W. Comparison of the predicted and observed secondarystructure of T4 phage lysozyme. Biochim Biophys. Acta 405, 442–451 (1975).

62. Osterlund, T., Nookaew, I., Bordel, S. & Nielsen, J. Mapping condition-dependent regulation of metabolism in yeast through genome-scale modeling.BMC Syst. Biol. 7, 36 (2013).

63. Aon, J. C. & Cortassa, S. Involvement of nitrogen metabolism in the triggeringof ethanol fermentation in aerobic chemostat cultures of saccharomycescerevisiae. Metab. Eng. 3, 250–264 (2001).

64. Jeske, L., Placzek, S., Schomburg, I., Chang, A. & Schomburg, D. BRENDA in2019: a European ELIXIR core data resource. Nucleic Acids Res. 47, 542–549(2019).

65. Huerta-Cepas, J. et al. eggNOG 4.5: a hierarchical orthology framework withimproved functional annotations for eukaryotic, prokaryotic and viralsequences. Nucleic Acids Res. 44, 286–293 (2016).

66. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignmentusing DIAMOND. Nat. Methods 12, 59–60 (2014).

67. Waterhouse, A. et al. SWISS-MODEL: homology modelling of proteinstructures and complexes. Nucleic Acids Res. 46, 296–303 (2018).

68. Velankar, S. et al. SIFTS: structure integration with function, taxonomy andsequences resource. Nucleic Acids Res. 41, 483–489 (2013).

69. Cock, P. J. A. et al. Biopython: freely available Python tools for computationalmolecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

70. Finn, R. D. et al. The Pfam protein families database: towards a moresustainable future. Nucleic Acids Res. 44, 279–285 (2016).

71. McKenna, A. et al. The genome analysis toolkit: A MapReduce framework foranalyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303(2010).

72. Huang, D. W. et al. DAVID bioinformatics resources: expanded annotationdatabase and novel algorithms to better extract biology from large gene lists.Nucleic Acids Res. 35, 169–175 (2007).

73. Jalili, M. et al. CentiServer: a comprehensive resource, web-based applicationand R package for centrality analysis. PLoS ONE 10, e0143111 (2015).

74. Bochner, B. R. New technologies to assess genotype-phenotype relationships.Nat. Rev. Genet. 4, 309–314 (2003).

75. Verduyn, C., Postma, E., Scheffers, W. A. & Van Dijken, J. P. Effect of benzoicacid on metabolic fluxes in yeasts: a continuous-culture study on theregulation of respiration and alcoholic fermentation. Yeast 8, 501–517 (1992).

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3

12 NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications

Page 14: A consensus S. cerevisiae metabolic model Yeast8 and its ... · ARTICLE A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism

AcknowledgementsWe would like to thank Pınar Kocabaş, Hao Wang, Jonathan Robinson, and Rui Pereirafor their valuable input. This project has received funding from the Novo NordiskFoundation (grant no. NNF10CC1016517), the Knut and Alice Wallenberg Foundation,and the European Union’s Horizon 2020 research and innovation program with projectsDD-DeCaF and CHASSY (grant agreements No 686070 and 720824).

Author contributionsH.L., F.L., B.J.S., E.J.K. and J.N. conceived the study. H.L., F.L., B.J.S., Z.Z. and E.J.Kcontributed to the yeast GEM update. B.J.S. and I.D. developed ecYeast8 and performedthe corresponding simulations. H.L., F.L. and E.J.K. designed and generated panYeast8and the 1,011 strain specific GEMs. H.L. designed and contributed to the completion ofproYeast8DB. G.L. assisted in the protein structure data collection for proYeast8DB. Z.Z.and F.L. designed and performed the experiments. Z.Z. and H.L. designed and con-tributed to the yeast map. S.M., P.M.A., D.L., C.L., M.E.B., N.S. and E.J.K. providedsupport and ideas for yeast GEM development using version control. H.L., F.L. and J.N.wrote the original manuscript. All authors read, edited, and approved the final paper.

Additional informationSupplementary Information accompanies this paper at https://doi.org/10.1038/s41467-019-11581-3.

Competing interests: The authors declare no competing interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Peer review information: Nature Communications thanks the anonymous reviewer(s)for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims inpublished maps and institutional affiliations.

Open Access This article is licensed under a Creative CommonsAttribution 4.0 International License, which permits use, sharing,

adaptation, distribution and reproduction in any medium or format, as long as you giveappropriate credit to the original author(s) and the source, provide a link to the CreativeCommons license, and indicate if changes were made. The images or other third partymaterial in this article are included in the article’s Creative Commons license, unlessindicated otherwise in a credit line to the material. If material is not included in thearticle’s Creative Commons license and your intended use is not permitted by statutoryregulation or exceeds the permitted use, you will need to obtain permission directly fromthe copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2019

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11581-3 ARTICLE

NATURE COMMUNICATIONS | (2019) 10:3586 | https://doi.org/10.1038/s41467-019-11581-3 | www.nature.com/naturecommunications 13