a molecular-level landscape of diet-gut microbiome interactions: toward dietary ... · dietary...
TRANSCRIPT
A Molecular-Level Landscape of Diet-Gut Microbiome Interactions:Toward Dietary Interventions Targeting Bacterial Genes
Yueqiong Ni, Jun Li, Gianni Panagiotou
Systems Biology and Bioinformatics Group, School of Biological Sciences, The University of Hong Kong, Hong Kong, Hong Kong, China
ABSTRACT As diet is considered the major regulator of the gut ecosystem, the overall objective of this work was to demonstratethat a detailed knowledge of the phytochemical composition of food could add to our understanding of observed changes infunctionality and activity of the gut microbiota. We used metatranscriptomic data from a human dietary intervention study todevelop a network that consists of >400 compounds present in the administered plant-based diet linked to 609 microbial targetsin the gut. Approximately 20% of the targeted bacterial proteins showed significant changes in their gene expression levels, whilefunctional and topology analyses revealed that proteins in metabolic networks with high centrality are the most “vulnerable”targets. This global view and the mechanistic understanding of the associations between microbial gene expression and dietarymolecules could be regarded as a promising methodological approach for targeting specific bacterial proteins that impact humanhealth.
IMPORTANCE It is a general belief that microbiome-derived drugs and therapies will come to the market in coming years, eitherin the form of molecules that mimic a beneficial interaction between bacteria and host or molecules that disturb a harmful inter-action or proteins that can modify the microbiome or bacterial species to change the balance of “good” and “bad” bacteria in thegut microbiome. However, among the numerous factors, what has proven the most influential for modulating the microbialcomposition of the gut is diet. In line with this, we demonstrate here that a systematic analysis of the interactions between thesmall molecules present in our diet and the gut bacterial proteome holds great potential for designing dietary interventions toimprove human health.
Received 25 July 2015 Accepted 25 September 2015 Published 27 October 2015
Citation Ni Y, Li J, Panagiotou G. 2015. A molecular-level landscape of diet-gut microbiome interactions: toward dietary interventions targeting bacterial genes. mBio 6(6):e01263-15. doi:10.1128/mBio.01263-15.
Editor Sang Yup Lee, Korea Advanced Institute of Science and Technology
Copyright © 2015 Ni et al. This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-ShareAlike 3.0 Unported license,which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original author and source are credited.
Address correspondence to Gianni Panagiotou, [email protected]
A substantial amount of research has established that the com-position and metabolism of human microbiota play crucial
roles in human health. Microbial colonization of the gastrointes-tinal tract varies widely, with the large intestine having not onlythe highest density of microbes in terms of bacterial cells per grambut also the most metabolically active microbial community (1).Firmicutes and Bacteroidetes are the most abundant phyla, butActinobacteria, Proteobacteria, and Verrucomicrobia are also regu-larly found in healthy adults (2). Diet clearly has a major impacton variation in the gut microbiota composition, which is easilydetected in fecal samples even after only a few days after a changein diet (3, 4). The fermentative but also (anaerobic) respiratorybacterial metabolism of dietary components produces an extraor-dinary chemical diversity in the large intestine with protective(e.g., short-chain fatty acids [SCFAs]) or detrimental (e.g., hydro-gen sulfite or bile acids) effects on disease development (5–7). Themost sensitive species in response to a change in diet belong toFirmicutes and Actinobacteria; however, more associations be-tween species abundance and diets have been confirmed, e.g., highmeat intake with an increase in Bacteroidetes, more fiber with ahigh proportion of Prevotella spp., etc.
Dietary phytochemicals are usually small molecules with highstructural diversity, often part of the plant’s secondary metabo-
lism, that when they reach the intestinal tract can cause selectivestress or stimulus to the resident microbiota (8–10). For example,a high intake of cocoa-derived flavonoids increases the abundanceof bifidobacteria and lactobacilli and reduces the level of plasmatriacylglycerol (11). However, phytochemicals are also trans-formed by the gut microbial communities to other metaboliteswith altered bioactivity. For example, conversion of polyphenolicmolecules from the microbiome has been shown to inhibit thetumor necrosis factor, NF-�B, and other inflammatory mediators(12–14). Furthermore, bacterial enzymatic activities, such as�-glucuronidase that converts glucuronides to their respectiveaglycons, are responsible for the retention time of phytochemicalsin the human body. It is generally considered that the Gram-positive bacteria are relatively freely permeable to small molecules(15), while those molecules below a certain molecular mass,around 700 Da, can enter into Gram-negative bacteria throughporin proteins (16).
In a previous work, we applied text mining and naive Bayesclassification in 21 million abstracts from MEDLINE, and weidentified 23,137 phytochemicals that are present in plant-baseddiets (17). Several of these phytochemicals have made it all the wayto pharmacy shelves or have served as lead structures for drugdevelopment, demonstrating their important contribution to
RESEARCH ARTICLE crossmark
November/December 2015 Volume 6 Issue 6 e01263-15 ® mbio.asm.org 1
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
host homeostasis. We also systematically searched for and foundassociations between 2,768 edible plants and 1,613 human diseasephenotypes and characterized the association as being positive ornegative. We developed a web-based database, which to the best ofour knowledge is the most complete source that links phyto-chemicals, diet, and disease, and we were able to provide amolecular-level mode of action on how specific diets could pre-vent disease development (18) or affect drug pharmacodynamicsand pharmacokinetics (19). As a step further, we used this exten-sive source of information to generate a comprehensive picture ofthe interaction profile of 158 edible plants, which are positivelyassociated with reduction in colon cancer, with a predefined can-didate colon cancer target space consisting of ~1,900 human pro-teins (20). We uncovered the key components in colon cancer thatare targeted synergistically by phytochemicals, and we identifiedstatistically significant and highly correlated protein networks thatcould be perturbed by dietary habits. However, in that study, wecompletely ignored interactions of the phytochemicals with thegut microbiome and alterations in gut composition and activity.
The gut microbial ecosystem lies at the interface of the host andenvironment and is considered the third major component regu-lating host health and the onset of disease; therefore, developingways to manipulate the gut microbiota represents a promisingtherapeutic avenue. Trying to identify the “good” and “bad” bac-teria in the intestinal microbiota in order to promote the growthof healthy bacterial species and/or eliminate the others has beenrecognized as one strategy (21); however, not everyone is con-vinced that single species are to be blamed for disease develop-ment. Instead, or maybe in addition, to species-centric studies,targeting specific genes could be a more realistic approach, espe-cially since our knowledge of harmful bacterium-specific meta-bolic products has grown substantially (22). In the present work,we developed the computational framework for studying possibleinteractions between the highly diverse phytochemical space ofour diet and gut microbial proteins. By coupling metatranscrip-tomics, chemoinformatics, and network biology, we created amolecular-level map of diet-bacterium interactions showcasingthat individual dietary components may contribute to the ob-served gene expression activities. To the best of our knowledge,this is the first systematic approach for a mechanistic understand-ing on how phytochemicals are linked to the intestinal microbiota,and it may assist in designing diets with potential therapeutic ben-efit.
RESULTSInteractome analysis of bacterial genes and dietary compo-nents. We adopted the metatranscriptomic data from a recently(2014) published study (4) that aimed to investigate whether thehuman gut microbiota can rapidly respond to dietary interven-tions in a diet-specific manner. Briefly, the original study had twodiet arms, a plant-based diet and an animal-based diet; fecal sam-ples were collected from nine individuals for baseline levels andafter 2 days on each experimental diet, and transcriptome se-quencing (RNA-Seq) analysis of the bacterial community in thegut was performed. In order to utilize the huge diversity ofthe phytochemical space and link the molecular composition ofthe diet with chemoinformatics and network-based analysis, wefocused on only the plant-based diet in this study to achieve amolecular-level investigation of the microbial response to diet.The plant-based diet consisted of a mixture of edible plants, most
of which belong to fruits and vegetables. To obtain an overallpicture of the health-related effects of the administered plant-based diet, we used the plant/food names as a query in the Nu-triChem database (18), and we retrieved data on associated diseasephenotypes. In total, 14 foods in this diet arm (see Table S1A in thesupplemental material) were found in NutriChem and were usedto generate a food-disease network (Fig. 1A). Specifically, for these14 foods, there is experimental evidence for association with 38different diseases that fall into 20 disease categories, which werefurther classified into 8 general disease classes, including meta-bolic disease, cancer, cardiovascular disease, etc. (Table S1C).Among the 14 foods, garlic was associated with the highest num-ber of diseases (20). Notably, a considerable number of these dis-eases, including colon cancer, obesity, and inflammatory boweldisease, have previously been linked to the composition and di-versity of the gut microbiota (23, 24).
Subsequently, the chemical composition of these 14 foods wasretrieved from the NutriChem database (see Table S1B in the sup-plemental material) and 437 unique food compounds (also calledphytochemicals throughout this study) were obtained for furtheranalysis. Of the 437 phytochemicals, 415 (95%) have molecularmasses below 700 Da and thus are expected to enter even Gram-negative bacteria. There are approximately 40 compounds perfood on average; among these foods, peas have the richest phyto-chemical profile (85 phytochemicals), while carrots contain thelowest number of compounds (4 phytochemicals). Tomatoes andrice have “the most novel chemical space,” which denotes thehighest number of compounds present exclusively in one partic-ular food. To investigate the potential bioactivity of these 437phytochemicals, we compared them with 1,536 FDA-approveddrugs in terms of chemical and physical properties. We observedthat most of these food compounds cluster closely or overlap withmany FDA-approved drugs indicated for metabolic diseases (seeFig. S1 in the supplemental material). There was also clear over-lapping with a subset of the FDA-approved drugs that target hu-man proteins with bacterial orthologs in the gut; thus, they (i.e.,the FDA-approved drugs and subsequently their chemically sim-ilar food components) could potentially alter the activity of themicrobial metabolic network (Fig. S2). Some representative foodcompounds, such as methoxsalen (present in papaya), quercetin(garlic, mangos, tomato, cauliflower), aminolevulinic acid (peas,tomato), trifluoperazine (garlic), which can be found in Drug-Bank either as experimental or FDA-approved drugs, are high-lighted in Fig. S2.
To hone in on the food-microbiome interactions at the molec-ular level, we employed compound bioactivity data to uncover thepotential targets of the compounds within the 14 foods. For thatpurpose, we used the ChEMBL database (25), which contains ex-perimentally verified interactions between small molecules andproteins. According to the ChEMBL database, 165 of these 437phytochemicals interacted with 214 proteins. The phytochemicalstargeting the highest number of proteins were quercetin(CHEMBL50), apigenin (CHEMBL28), ellagic acid (CHEMBL6246),and luteolin (CHEMBL151) (listed in decreasing order) (Fig. 1B).Of the 214 protein targets that interact with those food com-pounds, 31 proteins were found to have bacterial orthologs in thegut (see Materials and Methods for details). To evaluate the cred-ibility of interactions between microbial proteins and compoundsgenerated by ortholog inference, we compared the distribution ofsimilarities of the human-microbe orthologs defined in this study,
Ni et al.
2 ® mbio.asm.org November/December 2015 Volume 6 Issue 6 e01263-15
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
with that of human-microbe protein pairs targeted by the samecompounds (experimentally tested, 1,088 in total) retrieved fromthe ChEMBL database. No significant difference (P � 0.05 by theWilcoxon rank sum test) was observed for bit score, identity, or Evalue between these two groups of protein pairs. In addition tothis global comparison, we investigated the similarities of host-microbe pairs used here at the binding site level. It was observedthat their respective binding pockets also possessed medium orhigh similarities, which were not lower than the similarities ofhuman-microbe protein pairs that have been experimentallytested (based on the ChEMBL database) to share the same ligands(data not shown). This provided additional evidence for the fea-sibility of inferring bacterial targets based on homology to hosttargets. Quercetin, luteolin, and ellagic acid were found to possessthe most bacterial protein targets, as well as kaempferol(CHEMBL150), resveratrol (CHEMBL165), apigenin, and trice-tin (CHEMBL247484) (in decreasing order) (Fig. 1B). Quercetinand kaempferol are widespread in the plant-based diet of ourstudy (cauliflower, mangos, onion, and peas), while other phyto-chemicals are present in only one or two foods. When incorporat-ing the first-degree protein-protein interaction (PPI) partners ofthe 31 direct bacterial targets, we significantly expanded the phy-tochemical target space, generating a gut microbiota-specificcompound-target interaction network (Fig. 1B). Among the 31 di-
rect targets, pyruvate kinase (UniProt accession no. P14618) andglyceraldehyde-3-phosphate dehydrogenase (UniProt accessionno. P04406) have the most PPI partners. In summary, 99 com-pounds present in the 14 foods of the plant-based diet were linkedto 609 gut-microbial targets, which included 31 direct targets plus578 protein-protein interaction partners.
Gene expression analysis of phytochemical targets: func-tional implications and topological characteristics. To deter-mine to what extent the gut microbial protein targets of the foodcompounds were significantly influenced by dietary intake, weperformed a differential expression (DE) analysis using the RNA-Seq data before and after administration of the plant-based diet.Compared to the study of David et al. (4), the DE analysis on theRNA-Seq data, both in the preprocessing steps of the raw data andthe statistical analysis, were modified to achieve higher accuracy(see Materials and Methods for more details). We found that 115microbial targets (7 direct targets and 108 PPI targets), approxi-mately 20% of the whole target space (31 direct targets and 578PPI targets, respectively), altered gene expression significantly(false discovery rate [FDR] P value � 0.20 by Wilcoxon signed-rank test). This percentage of expression-changing targets in-duced by dietary phytochemicals is higher than the 8% induced bydrugs (26). This may be due to the fact that phytochemicals couldtarget multiple proteins, i.e., more promiscuous, as well as the
P53778P14679
CHEMBL206555
O14757P47870
O15264
P39900P47869
P49841
P04278
P11511
Q15118
P31749
O75582
P47989
P21917P45983
P14061
P18505
P48169P14867
Q16445
Q99928P31644
P78334
P28472
O14764
CHEMBL252642CHEMBL269630O75762
Q04206
CHEMBL14085
P18507
O00591
CHEMBL404515
P34903
Q9UN88
CHEMBL125743
Q8N1C3
Q92731
Q04760
Q16678
P35354
P51570
P49840
Q9NPH5
CHEMBL1275624
P23219
P36888
O94956
P04798P10828
P04637P27695Q03164
P46063
CHEMBL242341CHEMBL96926Q6W5P4
CHEMBL422
CHEMBL8320Q16637
CHEMBL986CHEMBL15594CHEMBL312163CHEMBL1453208
CHEMBL65675
CHEMBL501174
CHEMBL582
CHEMBL1232258
O75496
CHEMBL1499
P22748P42345
CHEMBL1394423
Q01453
CHEMBL402947
CHEMBL1394708
CHEMBL50
P21397
P68871CHEMBL239243
CHEMBL8659
CHEMBL13766
CHEMBL45068
P09917
CHEMBL525239
CHEMBL23194
P40225
CHEMBL67166
P02545P42858
CHEMBL602
B2RXH2P28482CHEMBL1492383
P10636
P19838
P16050P15428
CHEMBL247484CHEMBL151P33261
Q9Y6L6CHEMBL454173
CHEMBL28CHEMBL366603
CHEMBL1371781CHEMBL145CHEMBL131921
CHEMBL8145P06746CHEMBL165CHEMBL602575
CHEMBL11608
CHEMBL274323Q9NR56
CHEMBL1328484CHEMBL142652
CHEMBL284377
P54132P18054
CHEMBL46403
CHEMBL37537P35869
CHEMBL1537509
P10253
P11230
CHEMBL1347061
P30926
CHEMBL29757
P32297
Q07001
P17787P07510
O14842
CHEMBL9352
CHEMBL265325
CHEMBL42710P04745
CHEMBL1042
P05177
CHEMBL379064
CHEMBL1231491
CHEMBL464983
P14618
Q16539CHEMBL123040
CHEMBL267476
CHEMBL1236378
CHEMBL17962
CHEMBL539648
CHEMBL1569524
CHEMBL428495
P16473
CHEMBL1398443
CHEMBL416
CHEMBL150P11712
P10635
CHEMBL1332980CHEMBL1622CHEMBL254348
CHEMBL239047
CHEMBL415690
CHEMBL465183
CHEMBL234926
CHEMBL1322988CHEMBL721
CHEMBL170365
CHEMBL449317
P22694
Q05586
P08107P12931
P06280
CHEMBL1588379
P16083
Q16790
CHEMBL1418955
P51151
O43570
Q02763
P08581P00533
CHEMBL84446
P22612P84022
CHEMBL284616
P35968
P15121
P37059O15118
Q13315 P03372
P68400O75604CHEMBL1537012CHEMBL308333
P42330P08253
CHEMBL1375795CHEMBL1394721CHEMBL593019
P03956
Q8IW41
CHEMBL66879
CHEMBL82293
CHEMBL1784262
P17252O00141
Q01469
P42226P11309
P49137CHEMBL6246
CHEMBL1096009CHEMBL1604063
CHEMBL222021
CHEMBL1556Q9ULX7
P08069
Q9HC97CHEMBL464404 P06213
P06748Q9UM73
Q4U2R8 P17612
Q15759
CHEMBL1334750
P06493
CHEMBL1461
P07148
P00918
P27338
P14780
CHEMBL350675
O60285
P15090
CHEMBL464982
CHEMBL359965
CHEMBL444478
P49327
CHEMBL1325366
P37088
P51170
P51168
Q15391
CHEMBL610981 CHEMBL122890
P00390O00519P04406
CHEMBL228057CHEMBL190503
CHEMBL576
Q12879
P43166
P35218
P09237
Q9Y2D0
P15144
CHEMBL120568
Q9HBH1
Q99720Q15125
Q9Y468CHEMBL1686
Q9GZT9
CHEMBL514882
CHEMBL1487394
P31645P35498
P29466
CHEMBL1364101
CHEMBL1236376
CHEMBL54922
CHEMBL1609
CHEMBL15799CHEMBL1451395
CHEMBL389507
Q9BY08
Q96RJ0
Q99250
P31639
CHEMBL564
CHEMBL11298
Q9NY46
P55210
P18089
P08913
P08912
P23975
P11229 P25100
P14416
P05164
P28223
P41595
P50406
O15245
P28221
P22303
P35462
P28335
P18825
P35367P20309
P08173P08172
Q8WXA8
Q13133
P28566 P08908
P33765
CHEMBL170154
P47898
A5X5Y0
Q70Z44
O95264CHEMBL171804
P46098
P11509
P43681
CHEMBL32749
P02708P19099
CHEMBL69018
CHEMBL1473644
P15538
O00255
CHEMBL338066P00352
Q99714Q9NUW8
P08684
CHEMBL1329826CHEMBL19612
CHEMBL1552319
CHEMBL1492473CHEMBL1518783
CHEMBL1221925
CHEMBL1254011CHEMBL446452
P30939
CHEMBL539
P28222
P06239
P06241
Q13639 P34969
CHEMBL1201131
CHEMBL497939CHEMBL1506228
CHEMBL506247
CHEMBL1164301CHEMBL279597
CHEMBL485012
CHEMBL24171
CHEMBL934
CHEMBL484901
CHEMBL1293
CHEMBL6640
CHEMBL1419611CHEMBL64
CHEMBL1236019
CHEMBL333298
CHEMBL274318
CHEMBL1475613
CHEMBL39
CHEMBL26215
P24557CHEMBL540
Q16665
CHEMBL1364
CHEMBL1195032
CHEMBL1354559
CHEMBL164660
ENSP00000216780
ENSP00000280605ENSP00000451932
ENSP00000379769
ENSP00000384949ENSP00000311028
ENSP00000280706
ENSP00000354554
ENSP00000384369
ENSP00000297283
ENSP00000333664
ENSP00000342026ENSP00000374354
ENSP00000249750
ENSP00000351706ENSP00000219252
ENSP00000342032
ENSP00000358939ENSP00000265838
ENSP00000277865
ENSP00000361699
ENSP00000306548
ENSP00000359991
ENSP00000262027ENSP00000349142ENSP00000343867ENSP00000364475
ENSP00000327589
ENSP00000355493
ENSP00000377865
P00390ENSP00000299518
ENSP00000282050
ENSP00000323811
ENSP00000367086ENSP00000261772
ENSP00000368226
ENSP00000264932
ENSP00000359691
ENSP00000379678
ENSP00000368066
ENSP00000321259
ENSP00000341504
ENSP00000370737
ENSP00000367923ENSP00000389175
ENSP00000421524
ENSP00000342071ENSP00000346120 ENSP00000295822
ENSP00000296805
ENSP00000302111ENSP00000411803
ENSP00000262746
ENSP00000364486ENSP00000283646
ENSP00000367615
ENSP00000281543
CHEMBL9352
P16083CHEMBL28
ENSP00000261296ENSP00000386350
ENSP00000296411
ENSP00000290846
ENSP00000231572
ENSP00000223836
ENSP00000369456
ENSP00000246337ENSP00000377954
ENSP00000260563
ENSP00000039007
ENSP00000339399ENSP00000319501
ENSP00000393377
ENSP00000369785ENSP00000281081
ENSP00000426638
ENSP00000373176ENSP00000368438
ENSP00000354499
ENSP00000368358ENSP00000315152
ENSP00000235521ENSP00000402038
ENSP00000315644
ENSP00000340019
ENSP00000355325
ENSP00000344818
P27695ENSP00000251588
ENSP00000350491ENSP00000206542
ENSP00000365773
CHEMBL239243
ENSP00000222982
ENSP00000269298
ENSP00000253382
ENSP00000262878
ENSP00000245552
ENSP00000281038
ENSP00000175506
ENSP00000274606
ENSP00000364794
ENSP00000355536
CHEMBL1686
ENSP00000304229
ENSP00000469037
CHEMBL379064
ENSP00000398536
ENSP00000299163ENSP00000264162
Q9GZT9
ENSP00000371393
CHEMBL8145
CHEMBL150ENSP00000288532
ENSP00000333551ENSP00000219066
ENSP00000355518
ENSP00000282276ENSP00000242576
ENSP00000362298
ENSP00000264220
ENSP00000364649ENSP00000446384
ENSP00000303279
ENSP00000295453
ENSP00000280701
ENSP00000471388ENSP00000360268
ENSP00000265537ENSP00000358719
ENSP00000306817ENSP00000363965
ENSP00000339933P04406
ENSP00000360124ENSP00000007516ENSP00000465858ENSP00000216392
ENSP00000414662
ENSP00000287936ENSP00000272252
ENSP00000308897
ENSP00000379038
ENSP00000322439
ENSP00000385638ENSP00000381234ENSP00000264263ENSP00000216962
ENSP00000345771
ENSP00000390107
ENSP00000263629
ENSP00000377527
ENSP00000282356
ENSP00000216774
ENSP00000241052
ENSP00000358042
ENSP00000378782
ENSP00000370376
ENSP00000305995
ENSP00000366156
ENSP00000393953
ENSP00000419851
P51570
ENSP00000308845ENSP00000274813
Q04760
ENSP00000416583
ENSP00000164139
ENSP00000382595
ENSP00000258874
ENSP00000356022
ENSP00000363641
ENSP00000262030
ENSP00000278572
ENSP00000286621
ENSP00000355086
ENSP00000354876
ENSP00000444136
ENSP00000290921
ENSP00000244217
ENSP00000300051ENSP00000373073
ENSP00000361512
ENSP00000364986
ENSP00000302620
ENSP00000390158
ENSP00000307188ENSP00000228347
ENSP00000428115
ENSP00000360569
ENSP00000295448
ENSP00000353770ENSP00000264954
ENSP00000233893
ENSP00000332256ENSP00000437850
ENSP00000352516
ENSP00000361446ENSP00000356354
ENSP00000418082ENSP00000309334
ENSP00000359345
ENSP00000217455
ENSP00000261733
ENSP00000228510
ENSP00000366927
ENSP00000275605
ENSP00000333837
ENSP00000324124
ENSP00000290299
ENSP00000347345
ENSP00000419325
ENSP00000361511
ENSP00000292432
ENSP00000381504ENSP00000410833
ENSP00000312735
ENSP00000335304
ENSP00000354314ENSP00000358417
ENSP00000472847ENSP00000365462
ENSP00000273588ENSP00000352401
ENSP00000367727
ENSP00000369757
ENSP00000395605
ENSP00000357838
ENSP00000342991
ENSP00000265594
ENSP00000285093ENSP00000215587
ENSP00000360262
ENSP00000358549
ENSP00000455114
ENSP00000226299ENSP00000313877
ENSP00000302393
ENSP00000371230
ENSP00000346921
ENSP00000429374ENSP00000308610
ENSP00000244051
ENSP00000261326ENSP00000438404
ENSP00000333490
ENSP00000312304
ENSP00000273920ENSP00000443194ENSP00000359680
ENSP00000273359ENSP00000238018ENSP00000286479
ENSP00000341885
ENSP00000296674
ENSP00000425809
ENSP00000333395
ENSP00000468425ENSP00000263774
ENSP00000272065ENSP00000266971
ENSP00000367992
ENSP00000380514ENSP00000257549
ENSP00000252505ENSP00000315774
ENSP00000344794ENSP00000362329
ENSP00000230354
ENSP00000318318 ENSP00000354532
ENSP00000298556
ENSP00000258317
ENSP00000346015
ENSP00000317379
ENSP00000475569
ENSP00000361965
ENSP00000253004
ENSP00000452762
P47989
ENSP00000359531
ENSP00000449535 ENSP00000252603ENSP00000219302
ENSP00000438588ENSP00000317904
ENSP00000339435
ENSP00000230048ENSP00000353104
ENSP00000219240
ENSP00000340278
ENSP00000013034
ENSP00000300151
ENSP00000225969
CHEMBL312163
P04745
ENSP00000322524
ENSP00000296577
CHEMBL8320CHEMBL151CHEMBL50
P46063ENSP00000263274ENSP00000408176
ENSP00000372316
CHEMBL1537509
CHEMBL1354559
P00352
CHEMBL1371781
ENSP00000358695
ENSP00000266517
CHEMBL1492473
ENSP00000260324
ENSP00000313350
CHEMBL142652
CHEMBL1221925
ENSP00000305480
CHEMBL131921 CHEMBL145
CHEMBL165CHEMBL1375795CHEMBL402947
CHEMBL1195032
CHEMBL1537012
ENSP00000234420ENSP00000246548
Q99714
CHEMBL274323
CHEMBL1096009
Q03164CHEMBL1329826
CHEMBL1518783
ENSP00000263657
ENSP00000317376CHEMBL247484
CHEMBL1492383ENSP00000285848
CHEMBL15594ENSP00000360310
ENSP00000307288
ENSP00000370377
ENSP00000262105
ENSP00000231790
ENSP00000265113
ENSP00000225577 ENSP00000311847
ENSP00000420927
ENSP00000362702
Q9HBH1
ENSP00000268261ENSP00000367893
ENSP00000257347ENSP00000232607
Q15118
CHEMBL19612 O00141
CHEMBL1394423CHEMBL26215
ENSP00000283131
ENSP00000466058
CHEMBL525239CHEMBL308333
CHEMBL1293
CHEMBL1552319
CHEMBL366603P10253CHEMBL338066
CHEMBL6246
CHEMBL1622
CHEMBL8659
ENSP00000328174
P54132
CHEMBL242341
CHEMBL422CHEMBL1453208
CHEMBL485012
CHEMBL1364
CHEMBL254348
CHEMBL46403
CHEMBL1275624
CHEMBL1499CHEMBL497939
CHEMBL540
ENSP00000370750
ENSP00000305221
ENSP00000319851
CHEMBL1569524
ENSP00000267502
ENSP00000286398CHEMBL23194
CHEMBL284377
CHEMBL67166
CHEMBL986
CHEMBL602 CHEMBL45068
CHEMBL11608
ENSP00000258168CHEMBL164660
CHEMBL1164301
ENSP00000296412
CHEMBL1506228ENSP00000420269
CHEMBL454173
CHEMBL721CHEMBL42710CHEMBL602575
CHEMBL465183
CHEMBL267476
CHEMBL1332980ENSP00000397939
CHEMBL464983
CHEMBL1322988
CHEMBL123040
ENSP00000306606
CHEMBL582
CHEMBL1254011
P43166
CHEMBL446452
P06280
CHEMBL1604063
CHEMBL274318
ENSP00000371321
CHEMBL1419611
CHEMBL1394708
CHEMBL415690
CHEMBL24171
CHEMBL1201131CHEMBL1042
ENSP00000326219
ENSP00000224356
ENSP00000255078
ENSP00000251566
ENSP00000303147ENSP00000438144
ENSP00000379339ENSP00000222286
ENSP00000396308ENSP00000229319
ENSP00000270776ENSP00000385725
ENSP00000307241
ENSP00000357535
ENSP00000429082
ENSP00000370517
P14618ENSP00000215754
ENSP00000378890
ENSP00000377446ENSP00000330054
ENSP00000270142
ENSP00000419129ENSP00000300107
ENSP00000443822ENSP00000301761ENSP00000324105
ENSP00000352222ENSP00000325448
ENSP00000265174ENSP00000460924
ENSP00000309477
ENSP00000288937ENSP00000245816
ENSP00000379108
ENSP00000319531
ENSP00000377617
ENSP00000445175
ENSP00000250237
ENSP00000205402ENSP00000304802ENSP00000333667
ENSP00000331897ENSP00000319788ENSP00000359976
ENSP00000314949
ENSP00000264331ENSP00000199320
Q16665
ENSP00000219431
ENSP00000246868ENSP00000297494
ENSP00000265724
ENSP00000338606
ENSP00000354720
ENSP00000372326 ENSP00000264156ENSP00000254322
ENSP00000264606
CHEMBL65675CHEMBL39
O60285ENSP00000306561
ENSP00000321584
ENSP00000468051
ENSP00000260129ENSP00000216911
ENSP00000286794CHEMBL279597CHEMBL37537
ENSP00000268171
ENSP00000351777
ENSP00000307156ENSP00000359507
ENSP00000237612ENSP00000463683
ENSP00000381607
ENSP00000423822
ENSP00000265498ENSP00000358727
ENSP00000471150
ENSP00000406630ENSP00000274242
ENSP00000451976
CHEMBL122890
ENSP00000285737
ENSP00000261434
ENSP00000357981
ENSP00000311430
ENSP00000345023
ENSP00000298510
ENSP00000290765
ENSP00000265462
ENSP00000237858
ENSP00000356410
ENSP00000338964
ENSP00000354677ENSP00000309463
ENSP00000248935
ENSP00000318868
ENSP00000295266
ENSP00000348234
ENSP00000359151
ENSP00000375881
ENSP00000301149
ENSP00000318351
ENSP00000363676
ENSP00000403290
ENSP00000273398ENSP00000320658 ENSP00000403664
ENSP00000321070 ENSP00000319814ENSP00000302728ENSP00000293860
ENSP00000262584
ENSP00000245206
ENSP00000241600
ENSP00000330141
ENSP00000390637
ENSP00000402760
ENSP00000261187ENSP00000255226
ENSP00000288699
ENSP00000325421
ENSP00000354511
ENSP00000269187
ENSP00000302851
ENSP00000225719
P23975
ENSP00000362135
P31645ENSP00000298472
ENSP00000297185
ENSP00000315006
ENSP00000407509
ENSP00000377191
ENSP00000378920
ENSP00000339260
O15245
ENSP00000370571
Q4U2R8
ENSP00000242375
ENSP00000354982
ENSP00000352702ENSP00000378951
ENSP00000254488
ENSP00000329757
ENSP00000336866
ENSP00000326671
ENSP00000323549
ENSP00000287766
ENSP00000270349
ENSP00000417085
ENSP00000428200
ENSP00000279027
ENSP00000352456
ENSP00000278379
ENSP00000286749
ENSP00000358640
ENSP00000362469
ENSP00000075120
CHEMBL564
ENSP00000370589
CHEMBL1328484ENSP00000254719
ENSP00000348886ENSP00000258975
ENSP00000472940
ENSP00000301327
ENSP00000341382ENSP00000356972ENSP00000296273
ENSP00000349576
ENSP00000408295
P31639
ENSP00000266736
ENSP00000209728
ENSP00000263187
ENSP00000267430
ENSP00000298139
ENSP00000370150
ENSP00000259008
CHEMBL17962CHEMBL1231491
ENSP00000382133
ENSP00000343392
ENSP00000321636
ENSP00000375809ENSP00000369411
CHEMBL265325
ENSP00000264233
ENSP00000411532
P49327
ENSP00000256460
ENSP00000353826ENSP00000346839
ENSP00000357218
ENSP00000363868
CHEMBL464982
ENSP00000265686
ENSP00000264595
ENSP00000349078
ENSP00000314508
ENSP00000220584
ENSP00000344223 ENSP00000217426
ENSP00000322706
ENSP00000275428
ENSP00000281455
ENSP00000260645
ENSP00000371110ENSP00000381464
ENSP00000294724
ENSP00000244289
ENSP00000363229CHEMBL66879
ENSP00000350914
ENSP00000332247ENSP00000352608
ENSP00000369820ENSP00000252595
ENSP00000272286
ENSP00000290429
ENSP00000367462ENSP00000234111
ENSP00000278618
ENSP00000324842
ENSP00000403357
ENSP00000264426
ENSP00000345873ENSP00000346601
ENSP00000222481
ENSP00000311984
ENSP00000378669
ENSP00000325875
ENSP00000313921
ENSP00000236959
ENSP00000300738
P22303
ENSP00000361287
ENSP00000416293
ENSP00000386284
ENSP00000323568
ENSP00000282753
ENSP00000221486ENSP00000335153
ENSP00000306306
ENSP00000295688
P00918
ENSP00000307940
ENSP00000273550ENSP00000248923
ENSP00000318480
ENSP00000368119ENSP00000339064ENSP00000431376ENSP00000221476
ENSP00000341136
ENSP00000257770ENSP00000264613
CHEMBL610981
ENSP00000470037
ENSP00000355890
ENSP00000328973
ENSP00000342056ENSP00000280704
ENSP00000339027ENSP00000260985ENSP00000307786ENSP00000352657
ENSP00000234590
ENSP00000301522
ENSP00000225430
ENSP00000216185
ENSP00000267814ENSP00000327251
ChEMBLB
A
165 compounds <> 214 proteins
99 compounds <> 609 proteins
Ortholog identification
Microbiota-specific
STRING
Respiratory
Unclassified
Metabolic NeurologicalGastrointestinal
ImmunologicalCancerCardiovascular
FIG 1 Molecular-level view of the plant-based diet. (A) Food-disease association network for the 14 foods in the plant-based diet of this study. The generaldisease classes were colored differently as illustrated by the legend. The size of the food nodes reflects the number of disease connections for each food. (B) Gutmicrobiota-specific protein target space of the food phytochemicals. (Left) Human direct targets (red rectangles) of the food phytochemicals (green hexagons)were retrieved from the ChEMBL database. (Right) The microbiota-specific direct targets (red rectangles) and their protein-protein interaction (PPI) partners(blue rectangles) were derived after incorporating orthologous relationships with human and STRING protein-protein interaction data.
Phytochemicals Alter Gut Microbial Gene Expression
November/December 2015 Volume 6 Issue 6 e01263-15 ® mbio.asm.org 3
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
possibility of multiple phytochemicals targeting the same protein.These 115 DE targets were not found to be significantly differen-tially expressed when comparing the animal-based diet (4) againstits baseline (FDR P value � 0.20 by Wilcox rank sum test). Nota-bly, the targets of food compounds had a significantly higher pro-portion of DE genes than the nontargeted gut-bacterial proteins(20.7% versus 14.8%; P � 0.0016 by Fisher’s exact test). All thephytochemicals targeting these 115 proteins are below 700 Da,and therefore were expected to enter the bacterial cell. On the basisof the response of the gut microbiome and in order to move be-yond the restricted food set used in this study, we also attemptedto identify other foods that would potentially induce the equiva-lent or similar effect. This endeavor was bridged by the common-alities in the phytochemical composition between the original 14foods used in this study and the 1,772 plant-based foods availablein the NutriChem database. Using as input the set of the 59 phy-tochemicals present in these 14 foods with DE targets, we searchedthe NutriChem database for alternative foods that show the high-est overlap in the phytochemical space. More than 550 foods wereretrieved, while some representative examples in this top 20 foodlist (see Table S4 in the supplemental material) include Ginkgobiloba (ginkgo; 41 shared compounds), sea buckthorn (Hippo-phae; 33 shared compounds), and Lonicera japonica (honeysuckle;28 shared compounds).
Looking into the biological processes mostly affected by thephytochemical-protein interactions, we found that the 115 DEtargets were significantly enriched in 11 pathways (FDR P value �0.05 by Fisher’s exact test) (Fig. 2A), which provided functionalinsights into the gut microbial targets. All of these enriched path-ways are related to metabolism, such as glucose metabolism, tri-carboxylic acid (TCA) cycle (also known as the citric acid cycle),and metabolism of amino acids and derivatives. In contrast, thenon-DE targets displayed a very different profile of significantlyenriched pathways, exemplified by DNA repair and translation-related processes (Fig. 2A). To inspect this discrepancy in a morestatistical manner, we compared the pathway differences betweenDE and non-DE targets using a pathway overrepresentation anal-ysis. We focused on the top 12 significantly enriched pathways forboth groups as shown in Fig. 2A. Seven out of the 12 pathways forthe DE targets (as indicated by asterisks in Fig. 2A) showed signif-icant overrepresentation compared directly with the non-DEgroup (P � 0.05 by Fisher’s exact test); moreover, all seven path-ways were metabolic processes. Conversely, all the overrepre-sented pathways in the non-DE group relative to the DE groupwere not closely related to metabolism. This suggests that the phy-tochemicals present in plant-based diet mainly exert their effecton gut microbiome by altering microbial metabolic activities.Taking the 15 most up- or downregulated genes (fold change) asrepresentatives, we further zoomed in the reactions those DE tar-gets are involved in. The solute carrier family 22 member 1 (or-ganic cation transporter 1, UniProt accession no. Q15245) andGTP:AMP phosphotransferase AK3 (UniProt accession no.Q9UIJ7) were the two most upregulated genes, whereas replica-tion factor C subunit 3 (UniProt accession no. P40938) and glu-cose transporter 3 (UniProt accession no. P11169) were the mostdownregulated genes in the list. The results (Fig. 2B; see Ta-ble S3 in the supplemental material) also show that the majority ofthe phytochemical targets with the highest fold change in geneexpression activity participate in metabolic enzymatic reactions,such as the acetyl coenzyme A (acetyl-CoA) acetyltransferase
(UniProt accession no. Q9BWD1) and aldehyde dehydrogenase(UniProt accession no. P47895). In the list with the most up- ordownregulated genes, there are also several genes with “auxiliary”functions such as ATP binding (UniProt accession no. P61221),potentially playing a regulatory role in response to dietary intake.Phosphopyruvate hydratase (UniProt accession no. P06733) andalcohol dehydrogenase (UniProt accession no. P11766) are theenzymes interacting with the highest number of compoundswithin up- or downregulated gene lists, respectively. However,what also caught our attention in this list is that neither the num-ber of compounds nor number of foods targeting gut-bacterialproteins can be used as an indicator for the observed fold changesin gene expression activity level.
Next, we investigated the differences between DE and non-DEphytochemical targets, both from a chemical and biological per-spective, in an attempt to understand why some of the food targetswill show a significant difference in gene expression levels whileothers will not. First, the chemical features of the compounds wereused to examine whether the DE and non-DE proteins could bedistinguished. From the principal component analysis (PCA)based on the chemical descriptors of the 99 compounds, no clearseparation can be observed between compounds targeting DEproteins and non-DE proteins (data not shown). In addition, thestrength of the compound-protein interactions (based on bioac-tivity data retrieved from the ChEMBL database) was comparedbetween DE and non-DE target groups using a set of compoundsthat target proteins from both groups. As discussed above, thecompounds targeting both DE and non-DE proteins did not showsignificantly different binding affinities toward the two targetgroups (P � 0.05 by Student’s t test). These results suggest that theintrinsic biological properties of the target proteins, rather thanthe chemical characteristics of the compounds or the strength ofinteractions between compounds and proteins, are probably as-sociated with the variation in the community gene expression un-der mediation by phytochemicals. This is also consistent with ourprevious observation that the targeted genes with the highest foldchange activity are not necessarily the ones targeted by the highestnumber of compounds (Fig. 2B).
Therefore, the topological properties of the target proteins inthe orthologous PPI network were further investigated (Fig. 2C).For the whole orthologous PPI network, the power-law distribu-tion of degree connectivity was observed, consistent with the com-mon feature of many biological networks (27) (see Fig. S3A in thesupplemental material). Compared to nontarget proteins, thephytochemical targets were mainly positioned in the central area,possessing higher connectivity (Fig. 2C and Fig. S3B). Withinthese targets, the majority of the network centrality measures, in-cluding degree and closeness centrality, were found to be signifi-cantly different between DE and non-DE targets (P � 0.01 byWilcoxon rank sum test). More specifically, the DE targets of thefood compounds tend to be located in a more “central” positionwithin the PPI network than the non-DE ones (Fig. 2C and D), asreflected by the higher centrality measurements (as well as thelower average shortest path length) of the DE group. The highercloseness centrality of the DE targets indicates that the informa-tion spread is generally faster from a given DE protein to all otherconnected nodes in the network. The DE targets also exhibitedhigher betweenness centrality, which measures the amount ofcontrol a node has on the information spread between othernodes. This implies that compared to non-DE targets, DE targets
Ni et al.
4 ® mbio.asm.org November/December 2015 Volume 6 Issue 6 e01263-15
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
050
100
150
200
Protein degree connectivity
DE targets Non-DE targets
0.00
00.
002
0.00
40.
006
0.00
8
Protein BetweenessCentrality
DE targets Non-DE targets
0.30
0.35
0.40
0.45
0.50
Protein ClosenessCentrality
DE targets Non-DE targets
2.0
2.2
2.4
2.6
2.8
3.0
3.2
3.4
AverageShortestPathLength
DE targets Non-DE targets
0.75
0.80
0.85
Protein Radiality
DE targets Non-DE targets
0e+0
02e
+05
4e+0
56e
+05
8e+0
5
Protein Stress
DE targets Non-DE targets
Galactose catabolism
Methylation
Glycolysis
Citric acid cycle (TCA cycle)
Gluconeogenesis
Pyruvate metabolism and Citric Acid (TCA) cycle
Respiratory electron transport
The citric acid (TCA) cycle and respiratory electron transport
Glucose metabolism
Metabolism of amino acidsand derivatives
Metabolism of carbohydrates
0 5 2010 15
3%
3%
4%
5%
5%
5%
5%
9%
10%
11%
16%
REACT_532
REACT_6946
REACT_1383
REACT_1785
REACT_1520
REACT_1046
REACT_22393
REACT_111083
REACT_723
REACT_13
REACT_474
**
**
*
*
*
DNA strand elongation
Eukaryotic Translation Termination
Formation of a pool of free 40S subunits
L13a-mediated translational silencing of Ceruloplasmin expression
GTP hydrolysis and joining of the 60S ribosomal subunit
Eukaryotic Translation Initiation
Peptide chain elongation
Eukaryotic Translation Elongation
SRP-dependent cotranslational protein targeting to membrane
DNA Repair
Metabolism of nucleotides
REACT_932
REACT_1986
REACT_1797
REACT_79
REACT_2085
REACT_2159
REACT_1404
REACT_1477
REACT_115902
REACT_216
REACT_1698
***
*****
0 5 2010 15
3%
5%
5%
5%
5%
5%
5%
5%
5%
5%
7%
DE
targ
ets
non-
DE
targ
ets
UniProt Regulation Type # of compounds # of foodsO15245 up Transporter 2 2Q9UIJ7 up EnzymeO14556 up EnzymeP30084 up EnzymeP47895 up EnzymeQ16836 up Enzyme
Q9BWD1 up EnzymeP11498 up EnzymeP06733 up Enzyme 13 11A6NJ78 up EnzymeQ9UJZ1 up Ligand binding 39 12P35249 up Replication factor 47 12Q86XE5 up Enzyme 0P13804 up Electro acceptorQ99757 up EnzymeP40938 down Replication factor 47 12P11169 down Transporter 1 1P78417 down EnzymeQ8N0X4 down EnzymeQ9H9Y6 down EnzymeP61221 down Ligand binding 39 12P25685 down Chaperone 9 9P11766 down Enzyme 54 14P23396 down Ribosomal proteinP11182 down EnzymeO14734 down EnzymeQ15070 down Other 39 12Q9Y234 down EnzymeO95477 down Ligand bindingP36957 down Enzyme
3 53 54 82 33 52 36 9
2 3
5 14 84 6
1 12 32 3
2 33 42 3
1 12 44 6
The 15 most up or down regulated genes
C
D
BA
FIG 2 Functional and topological analyses of the differentially expressed (DE) and non-DE phytochemical targets. (A) Significantly enriched pathways for DEtargets and non-DE targets (only the top 11 pathways are shown for non-DE targets). The asterisks indicate the pathways that were significantly overrepresentedcompared to the other group in a direct comparison. (B) List of the 15 most (fold change) up- or downregulated genes (UniProt identifier [ID] or accession no.and type are also shown) targeted by phytochemicals directly or through PPI. The number of compounds targeting these genes and the number of foodscontaining these compounds were listed as well. (C) Complete microbiota-specific protein-protein interaction network. The DE phytochemical targets (DEtargets tend to be located in more central positions within the network) are shown in red, while the non-DE ones are shown in yellow. The nodes in gray indicateproteins that are not phytochemical targets (nontargets). For clarity, not all network edges were shown here. (D) The boxplots of six network topologicalproperties that were significantly different between DE (red) and non-DE targets (yellow) are shown. The y axes of boxplots show the calculated values ofcorresponding topological properties.
Phytochemicals Alter Gut Microbial Gene Expression
November/December 2015 Volume 6 Issue 6 e01263-15 ® mbio.asm.org 5
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
are more likely to be the “bridges” between clusters in the net-work. We also found that more than 80% of the interactionswithin the PPI network are between proteins found in the samegenome. With the reduced PPI network (filtering out interspeciesconnections), the results and conclusions derived from the net-work topological analysis discussed above still stand.
Taxonomic contribution to gene expression variations of thephytochemical targets. Since the phytochemicals present in theplant-based diet could potentially contribute to a gene expressionresponse of the gut microbiota, the follow-up question was whichmicrobe or groups of microbes were mostly influenced or contrib-uting most to the variations in gene expression at the communitylevel for each of the gene categories targeted by the food com-pounds. We calculated the overall expression change of each phy-lum for the 115 DE targets, and subsequently the contribution ofeach phylum in the whole microbial community (Fig. 3). Thephylum Bacteroidetes was found to possess the “broadest respond-ing spectrum,” i.e., it contributed highly to the differential expres-sion of more targeted genes than other phyla. Within the group oftarget genes to which Bacteroidetes contributed predominantly(denoted by the bright blue rectangle in Fig. 3), it was noticed thatmost genes were downregulated at the community level. GeneOntology (GO) annotation indicated that genes from this groupmainly participate in the carboxylic acid metabolic process (GO:0019752), cellular amino acid metabolic process (GO:0006520),as well as RNA and protein metabolic processes. Firmicutes, al-though without the widest spectrum, was the phylum responsiblefor differential expression of most upregulated target genes, whichare mainly involved in biological processes such as glucose (GO:0006006), hexose (GO:0019318), carboxylic acid (GO:0019752),and fatty acid (GO:0006631) metabolic processes. In addition tothe predominant and specific contribution of Firmicutes towardupregulated genes, it also negatively contributed (i.e., oppositedirection of expression change compared to that of the wholecommunity) to a cluster of downregulated genes, which are shownas the distinct blue blocks in Fig. 3. However, this effect was over-ridden by the opposite effect from Bacteroidetes, leading ulti-mately to a community-level downregulation. The group of targetgenes to which Actinobacteria predominantly contributed displaysa distinct profile with several regulatory processes, which are notpresent in the groups of other phyla. All the target genes in thegroups for both Actinobacteria and Proteobacteria had decreasedexpression at the community level.
We further investigated how various microorganisms re-sponded and contributed to variations in gene expression at thespecies level. As can be seen from Fig. S4 in the supplementalmaterial, the responses were phylogenetically diverse, even withinthe same phylum. While a group of species in one particular phy-lum may be the major contributors to community-level expres-sion change of targeted genes, the other members in that phylummay contribute very little. Taking Firmicutes as an example, 50%of the species in this phylum did not show noticeable contributionto the variation in expression of the 115 DE targets. On the otherhand, the majority of those upregulated DE targets primarily stemfrom a few particular species, including Eubacterium rectale,Faecalibacterium prausnitzii, Roseburia hominis, Roseburia intesti-nalis, Coprococcus, and Ruminococcus bromii. The first three spe-cies are the main butyrate-producing bacteria in the human gut(28, 29). With a role in anti-inflammatory process, their reducedpopulations have been reported previously as being correlated
with the development of diseases such as Crohn’s disease (30) andulcerative colitis (31). Similar patterns were also observed in theother three main phyla. Two single species, Bifidobacteriumlongum from Actinobacteria and Escherichia coli from Proteobacte-ria, dominated the corresponding phylum contribution towardcommunity-level expression variations. Furthermore, these twospecies together were mainly responsible for the community-leveldownregulation of most target genes.
The analyses discussed above provided information about therole of each phylum within the whole gut microbial community.In order to understand better the variation of genuine transcrip-tional activity for each phytochemical target within each phylum,we further normalized the gene expression levels by the phylum-level abundance and compared this normalized transcriptionalactivity after being on the diet with that of the baseline period. Thisfurther elucidated whether the higher gene expression level wasderived from a high transcription level or simply dominant phy-lum abundance. As shown in Fig. 4, the phyla Proteobacteria, Ac-tinobacteria, and Verrucomicrobia had decreased transcription ac-tivities for almost all DE targets, while Firmicutes andEuryarchaeota were the two dominant phyla having higher tran-scription levels. For most of the upregulated DE targets, Firmicutesdisplayed the highest increase of transcription activity, which wasin concordance with its predominant contribution to the commu-nity differential expression of upregulated genes mentionedabove. Interestingly, the phylum Euryarchaeota, which consistsmainly of methanogens, had increased transcription activities formost target genes that it expressed. This unique pattern was “hid-den” in the contribution analysis due to its low abundance in themicrobial community.
Designing dietary interventions: the case study of SCFA me-tabolism. Short-chain fatty acid (SCFA) metabolism has been ex-tensively cited as being associated with human health and diseasepathology, playing a pivotal role in regulating the biological pro-cesses in the human gut and colon (32, 33). The levels of SCFAs inthe gut, especially butyrate and propionate, have been connectedwith the development of different diseases ranging from meta-bolic and inflammatory diseases to cancers (34, 35). Positive cor-relations were also reported between the levels of certain SCFAsand particular microbes within the phylum of Firmicutes (Rose-buria sp., E. rectale, and F. prausnitzii) (4). Since altering the levelsof SCFAs could have potentially remarkable health benefits, weused our computational framework here for predicting diets thatcould potentially alter the activity of the SCFA-related pathways.
The butyrate metabolism (Kyoto Encyclopedia of Genes andGenomes [KEGG] ko00650) consists of 63 reactions (in terms ofKEGG reaction identifiers [IDs]) involving 41 metabolites, whilethe propionate metabolism (KEGG ko00640) is composed of 64reactions involving 44 metabolites (Fig. 5A). In order to identifyfoods that could alter the activity of butyrate or propionate me-tabolism, two approaches were adopted here: in NutriChem andChEMBL databases, (i) we searched for food phytochemicals thathave been experimentally tested to interact with proteins involvedin those two SCFA metabolic pathways; (ii) we searched for foodscontaining phytochemicals that are essentially metabolites in-volved in butyrate and propionate metabolism; thus, they are ex-pected to interact with the metabolic proteins. Although therewere no phytochemicals identified in the ChEMBL database thattarget SCFA metabolism-related proteins, the latter approachyielded a set of 98 foods containing 31 phytochemicals (corre-
Ni et al.
6 ® mbio.asm.org November/December 2015 Volume 6 Issue 6 e01263-15
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
Firmicu
tes
Bacter
oidete
s
Euryarc
haeo
ta
rruco
microb
ia
Actino
bacte
ria
Proteo
bacte
riaENSG00000124767ENSG00000173599ENSG00000140374ENSG00000100348ENSG00000241935ENSG00000169519ENSG00000114480ENSG00000165283ENSG00000158125ENSG00000147853ENSG00000138796ENSG00000175003ENSG00000120437ENSG00000163918ENSG00000105679ENSG00000127884ENSG00000122729ENSG00000160179ENSG00000168906ENSG00000067704ENSG00000151413ENSG00000153574ENSG00000096384ENSG00000258429ENSG00000078070ENSG00000023697ENSG00000178802ENSG00000119689ENSG00000137992ENSG00000126088ENSG00000165140ENSG00000108515ENSG00000125166ENSG00000155463ENSG00000132002ENSG00000060971ENSG00000002549ENSG00000125246ENSG00000125630ENSG00000148834ENSG00000197894ENSG00000120265ENSG00000198931ENSG00000059573ENSG00000176974ENSG00000140451ENSG00000178035ENSG00000172172ENSG00000198650ENSG00000114686ENSG00000085760ENSG00000131979ENSG00000074800ENSG00000117118ENSG00000048392ENSG00000213930ENSG00000168393ENSG00000197375ENSG00000101473ENSG00000169919ENSG00000164163ENSG00000165029ENSG00000154330ENSG00000059804ENSG00000133119ENSG00000051341ENSG00000149273ENSG00000142657ENSG00000131747ENSG00000115286ENSG00000144182ENSG00000196365ENSG00000110717ENSG00000158864ENSG00000079739ENSG00000065833ENSG00000100823ENSG00000151806ENSG00000126522ENSG00000177302ENSG00000167325ENSG00000166986ENSG00000139131ENSG00000011376ENSG00000117593ENSG00000144381ENSG00000166411ENSG00000167588ENSG00000138430ENSG00000062485ENSG00000100294ENSG00000116984ENSG00000065911ENSG00000137513ENSG00000151093ENSG00000100994ENSG00000091483ENSG00000106992ENSG00000151224ENSG00000138035ENSG00000073578ENSG00000108479ENSG00000013503ENSG00000101444ENSG00000166800ENSG00000138801ENSG00000178445ENSG00000143627ENSG00000184254ENSG00000101347ENSG00000167996ENSG00000132740ENSG00000146085ENSG00000171298ENSG00000083123
0.5 0.5 1.5
Contribution
GO:0006006 glucose metabolic processGO:0006260 DNA replicationGO:0019318 hexose metabolic processGO:0006281 DNA repairGO:0006259 DNA metabolic processGO:0044267 cellular protein metabolic processGO:0019219 regulation of nucleobase, nucleoside, nucleotide
and nucleic acid metabolic processGO:0031326 regulation of cellular biosynthetic process
GO:0019752 carboxylic acid metabolic processGO:0006520 cellular amino acid metabolic processGO:0044106 cellular amine metabolic processGO:0016070 RNA metabolic processGO:0044267 cellular protein metabolic processGO:0006753 nucleoside phosphate metabolic processGO:0019318 hexose metabolic processGO:0006412 translationGO:0044275 cellular carbohydrate catabolic processGO:0006006 glucose metabolic process
GO:0006006 glucose metabolic processGO:0019318 hexose metabolic processGO:0019752 carboxylic acid metabolic processGO:0034637 cellular carbohydrate biosynthetic processGO:0006631 fatty acid metabolic process
GO:0019752 carboxylic acid metabolic processGO:0044267 cellular protein metabolic processGO:0009108 coenzyme biosynthetic processGO:0045333 cellular respirationGO:0046395 carboxylic acid catabolic processGO:0006006 glucose metabolic processGO:0046394 carboxylic acid biosynthetic processGO:0019318 hexose metabolic processGO:0006753 nucleoside phosphate metabolic processGO:0006464 protein modification process
up-regulated genes
down-regulated genes
Bacteroidetes
Actinobacteria
Proteobacteria
Firmicutes
Phytochemicals Alter Gut Microbial Gene Expression
November/December 2015 Volume 6 Issue 6 e01263-15 ® mbio.asm.org 7
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
sponding to 18 and 19 metabolites from butyrate and propionatemetabolism, respectively) (Fig. 5A). Of these 98 foods, 22 containat least two phytochemicals involved in butyrate/propionate me-tabolism with strawberry, mung bean, and soybean possessing themost (Fig. 5B). Interestingly, 8 of the foods used in the originalstudy of David et al. (4)—where the authors detected significantlyhigher levels of SCFAs in the plant-based diet than in the animal-based diet—were part of the list containing the 98 foods (and 5 ofthem were part of the subgroup of the 22 foods), supporting ournotion that the detailed food composition could help us partlyexplain the observed changes in the gut microbial function andactivity.
To offer additional evidence that dietary interventions con-taining these 22 foods could potentially trigger changes in theSCFA gut metabolism, we developed a food-disease networkbased on experimental studies collected in the NutriChem data-base. We could retrieve 150 disease phenotypes (in terms of Dis-ease Ontology [36] IDs) for these 22 foods (see Fig. S5 in thesupplemental material) with breast cancer, hepatocellular carci-noma, and diabetes appearing as the top three diseases connectedto the highest number of foods (11, 11, and 10, respectively),whereas sweet peppers, tomatoes, and buckwheat have the highestnumber of disease links. While hepatocellular carcinoma and di-abetes have also been linked to gut microbial imbalances (37, 38),we focus here on a subset of the food-disease network, which isrelated to colon cancer, a disease highly associated with the levelsof SCFAs (39). We found that 9 out of the 22 foods have beenassociated with colon cancer (Fig. 5B); interestingly, these 9 foodswere significantly overrepresented in colon cancer than any otherfood-disease associations in the NutriChem database (P � 0.001by Fisher’s exact test). Most of the food-disease associations pres-ent in the literature (and subsequently in the NutriChem data-base) rely on feeding experiments of human or animal models andmonitoring the progress/development of a disease. The method-ology presented here, which relies on the known phytochemicalcomposition of diet and the potential interactions with the gut-specific bacterial genes, could serve as a route for understandingmechanistically the observed phenotypic responses.
Furthermore, we identified the most vulnerable protein targetsof the butyrate and propionate metabolic pathways using theirtopological features (based on the comparisons between DE andnon-DE proteins described above). The proteins in those twometabolic pathways were first ranked according to six significanttopological features independently (degree, closeness centrality,betweenness centrality, radiality, stress, and average shortest pathlength). Then, these six ranked lists were aggregated into a com-bined ranked list in the order of decreasing centrality (or vulner-ability) (see Table S5 in the supplemental material), providinginformation on which protein targets are more likely to changegene expression activity (Fig. 5A). In butyrate metabolism,3-hydroxyacyl-CoA dehydrogenase (K00022), acetyl-CoA acetyl-
transferase (K00626), and enoyl-CoA hydratase (K07511) werefound to be the proteins that if targeted by dietary phytochemicals,significant changes in the gene expression activity should be ex-pected. As to propionate metabolism, our topology analysis indi-cates succinyl-CoA synthetase (GenBank accession no. K01899),L-lactate dehydrogenase (DDBJ accession no. K00016), andmethylmalonyl-CoA mutase (GenBank accession no. K01847) asthe most vulnerable protein targets.
DISCUSSION
The value of the gastrointestinal community members and theirinteraction with the host through a variety of signaling moleculesand mechanisms has become widely recognized (40). Clear evi-dence of this realization is the research spending on the analysis ofthe symbiotic relationships between humans and their indigenousmicrobiome, which is estimated to be more than $500 million inthe last 5 years (41). Metagenomic studies, such as the MetaHitproject (42) and Human Microbiome Project (43), provided themultimillion-gene potential of the intestinal microbiome,whereas metatranscriptomic studies (4, 44) provided the subsetsof these genes that are expressed as a response to a given stimulus.In particular, diet is considered one of the major modulators of theintestinal microbiota and a dominant source of variation in itscomposition (45). However, even though occasionally measure-ments of specific biomarkers in the feces were performed, the fullinteraction spectrum between the food, its molecular compo-nents, and the microbiota was neglected.
It is worthy to note that xenobiotics, including drugs and an-tibiotics, which can be considered small-molecule analogs of di-etary phytochemicals, have been demonstrated to alter the geneexpression of active gut microbiota significantly (46). Here weprovide a molecular-level analysis of the extent and nature of thegut microbiome responsiveness to diet and a computational plat-form for understanding mechanistically how food componentscould drive the activity and function of the gut ecosystem to dif-ferent states. The development of a chemical-protein network be-tween the food components of a plant-based dietary interventionand the human gut-bacterial proteome space revealed that ~20%of the microbiome targets altered significantly their gene expres-sion activity at a community level. Our analysis also indicates thatthe metabolic pathways in the gut microbiota are more likely tochange activity and are easier to be interrupted compared to otherbiological processes, when targeted by dietary phytochemicals.Furthermore, while the chemical features of the phytochemicalscould not shed light on which of the bacterial targets will alter theirgene expression activity, the network analysis revealed the highercentrality within the bacterial PPI network as the most prominentcharacteristic of the DE bacterial targets of phytochemicals. Pre-vious studies have indicated that highly connected nodes, or“hubs,” in the scale-free cellular biological network tend to be theessential genes (47, 48), which will have less expression changes in
FIG 3 Phylum contribution toward the community-level gene expression variations of the differentially expressed (DE) targets. The heatmap illustrates theindividual contribution of each phylum present in gut microbiota for each DE phytochemical target. The 115 DE targets were further separated into two classes,upregulated genes and downregulated genes. The red blocks represent a positive contribution of the phylum toward community-level expression variation (thedirection of change for that phylum in one particular target gene was in agreement with that of the whole microbiota community, regardless of up- ordownregulation). The blue blocks indicate that the expression change of these genes for that phylum were in the direction opposite that of the community (i.e.,negative contribution). For each of the four main phyla, a cluster of genes was identified (enclosed by a rectangle, with the phylum name was displayed on theright), indicating the dominating effect of that phylum on gene expression variations. Gene Ontology annotations (biological processes) for the four gene clusterswere also provided.
Ni et al.
8 ® mbio.asm.org November/December 2015 Volume 6 Issue 6 e01263-15
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
Prote
obac
teria
Actino
bacte
ria
erru
com
icrob
ia
Bacte
roide
tes
Firmicu
tes
Eurya
rcha
eota
ENSG00000124767ENSG00000173599ENSG00000140374ENSG00000100348ENSG00000241935ENSG00000169519ENSG00000114480ENSG00000165283ENSG00000158125ENSG00000147853ENSG00000138796ENSG00000175003ENSG00000120437ENSG00000163918ENSG00000105679ENSG00000127884ENSG00000122729ENSG00000160179ENSG00000168906ENSG00000067704ENSG00000151413ENSG00000153574ENSG00000096384ENSG00000258429ENSG00000078070ENSG00000023697ENSG00000178802ENSG00000119689ENSG00000137992ENSG00000126088ENSG00000165140ENSG00000108515ENSG00000125166ENSG00000155463ENSG00000132002ENSG00000060971ENSG00000002549ENSG00000125246ENSG00000125630ENSG00000148834ENSG00000197894ENSG00000120265ENSG00000198931ENSG00000059573ENSG00000176974ENSG00000140451ENSG00000178035ENSG00000172172ENSG00000198650ENSG00000114686ENSG00000085760ENSG00000131979ENSG00000074800ENSG00000117118ENSG00000048392ENSG00000213930ENSG00000168393ENSG00000197375ENSG00000101473ENSG00000169919ENSG00000164163ENSG00000165029ENSG00000154330ENSG00000059804ENSG00000133119ENSG00000051341ENSG00000149273ENSG00000142657ENSG00000131747ENSG00000115286ENSG00000144182ENSG00000196365ENSG00000110717ENSG00000158864ENSG00000079739ENSG00000065833ENSG00000100823ENSG00000151806ENSG00000126522ENSG00000177302ENSG00000167325ENSG00000166986ENSG00000139131ENSG00000011376ENSG00000117593ENSG00000144381ENSG00000166411ENSG00000167588ENSG00000138430ENSG00000062485ENSG00000100294ENSG00000116984ENSG00000065911ENSG00000137513ENSG00000151093ENSG00000100994ENSG00000091483ENSG00000106992ENSG00000151224ENSG00000138035ENSG00000073578ENSG00000108479ENSG00000013503ENSG00000101444ENSG00000166800ENSG00000138801ENSG00000178445ENSG00000143627ENSG00000184254ENSG00000101347ENSG00000167996ENSG00000132740ENSG00000146085ENSG00000171298ENSG00000083123
−10 0 10
Variation
GO:0006006 glucose metabolic processGO:0006260 DNA replicationGO:0019318 hexose metabolic processGO:0006281 DNA repairGO:0006259 DNA metabolic processGO:0044267 cellular protein metabolic processGO:0019219 regulation of nucleobase, nucleoside, nucleotide
and nucleic acid metabolic processGO:0031326 regulation of cellular biosynthetic process
GO:0019752 carboxylic acid metabolic processGO:0006520 cellular amino acid metabolic processGO:0044106 cellular amine metabolic processGO:0016070 RNA metabolic processGO:0044267 cellular protein metabolic processGO:0006753 nucleoside phosphate metabolic processGO:0019318 hexose metabolic processGO:0006412 translationGO:0044275 cellular carbohydrate catabolic processGO:0006006 glucose metabolic process
GO:0006006 glucose metabolic processGO:0019318 hexose metabolic processGO:0019752 carboxylic acid metabolic processGO:0034637 cellular carbohydrate biosynthetic processGO:0006631 fatty acid metabolic process
GO:0019752 carboxylic acid metabolic processGO:0044267 cellular protein metabolic processGO:0009108 coenzyme biosynthetic processGO:0045333 cellular respirationGO:0046395 carboxylic acid catabolic processGO:0006006 glucose metabolic processGO:0046394 carboxylic acid biosynthetic processGO:0019318 hexose metabolic processGO:0006753 nucleoside phosphate metabolic processGO:0006464 protein modification process
up-regulated genes
down-regulated genes
Bacteroidetes
Actinobacteria
Proteobacteria
Firmicutes
Phytochemicals Alter Gut Microbial Gene Expression
November/December 2015 Volume 6 Issue 6 e01263-15 ® mbio.asm.org 9
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
mild or moderate conditions (49). However, this might not beobserved when an external stimulus is present (as shown in thisstudy) or in a completely different physiological condition such asdisease state (50). As shown here, upon dietary stimulus, thesehighly connected targets are more “vulnerable” (i.e., more suscep-tible to gene expression variation) and would generate a strongerglobal effect upon perturbation. From a therapeutic perspective,these more vulnerable (metabolic) proteins with high centralitycould serve as promising targets in phytochemical-based dietaryinterventions. Furthermore, our phylum-level investigation re-vealed that different phyla of bacteria in the human gut wouldrespond differently in terms of target gene expression activityupon phytochemical perturbation. More specifically, the altered
expression levels of targeted genes related to metabolic processeswere predominantly attributed to different dedicated phyla. Thewhole microbiota community thus works coordinately to achievethe overall actions in response to dietary interventions. Thus,identifying specific-disease-associated signatures in the gut mi-crobiota and products that alter microbial populations or blockspecific bacterial metabolites are expected to lead to the first gen-eration of microbiome therapies.
Even though the role of diet in health and disease states hasbeen recognized for years, the majority of studies in the literatureeither treated food as a black box or the focus was on carbohy-drates, lipids, and fiber, ignoring the small-molecule space. Ignor-ing the systemic role of phytochemicals was attributed to their
FIG 4 Variation of normalized transcriptional activity of DE targets in microbial phyla. The 115 DE targets were divided into two classes, upregulated genes anddownregulated genes. The red blocks represent increased transcriptional level of the gene by the particular microbial phylum, while blue blocks indicatedecreased transcription level. For clarity, logE transformation was performed on the values measuring the variation of transcriptional activities (when the valuewas negative, the absolute value was used for log transformation first and then turned into a negative value again). The four clusters of genes enclosed by rectangleswere the ones from Fig. 3.
14 1
15 1
Celery
1,2-PropanediolopaTrifolium incarnatum
AcetoinAGuava
PotatoesFenugreek
umRaspberry
Mustard greens
Asian rice
2-Hydroxypropionaldehyde
Strawberries
MethylglyoxalAcetate2-Propanol
Glutamic AcideSuccinate
1-Aminocyclopropane-1-carboxylate
Fumarate;Maleate
Mung beans
Soybean
Tomatoes
Sweet pepper
Sweet potato
Peas
2,3-Butanedione
N-Propanol
Propionate
23-Butanediol
2-Ketoglutaric Acid
PropionaldehydeSuccinyl-CoA
Bryonia alba
Beta-HydroxybutyricAcid
Rye
Dihydroxyacetonephosphate
Fallopia multiflora
Spinach
Sunroot
Opium poppy
Buckwheat
Pyruvic Acid
Colon adenocarcinoma
GuavaSweet potatoesColon carcinoma SoybeanTomatoes
BuckwheatColon cancer
Raspberry
Fallopia multiflora Potatoes
Strawberries
A B
FIG 5 Potential dietary interventions targeting the short-chain fatty acid (SCFA) metabolism. (A) The KEGG butyrate (top) (ko00650) and propionate(bottom) (ko00640) metabolic pathways are shown. For both pathways, the metabolites that were also found in 98 plant-based foods present in the NutriChemdatabase were highlighted in yellow. The proteins in the two metabolic pathways were ranked in decreasing order of vulnerability and colored from red (mostsusceptible to change in gene expression when targeted) to green (least susceptible to change in gene expression). (B) Subgroup of the 98 plant-based foods. (Top)The 22 foods (red or yellow ellipses) containing at least two phytochemicals/metabolites involved in butyrate and/or propionate metabolism and their associatedmetabolites (green triangle) are shown. The network is visualized in “degree sorted” style, with a decreasing order of node connectivity from middle bottomtoward counterclockwise direction. Strawberries, mung beans, and soybeans are the foods with the most metabolites and are shown in red. (Bottom) Subset ofthe 22 foods associated with colon cancer. Three different names for “colon cancer” in Disease Ontology were treated equivalently here.
Ni et al.
10 ® mbio.asm.org November/December 2015 Volume 6 Issue 6 e01263-15
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
presence in small amounts in our diet; however, we should notforget that drugs are also given in small amounts, and they have aprofound effect on our health. We believe that the development ofNutriChem, the database linking 1,772 plant-based foods with7,898 phytochemicals and 751 diseases, opened a window of op-portunities for understanding how the molecular components ofdiet interact with the human body, including, as shown here, thebacterial residents of our gut.
However, our analysis is not without limitations: while Nu-triChem is the most complete database of diet-disease associationsat this point in time, the majority of our understanding on theseassociations so far have originated from animal studies or even celllines, calling for more attention and research into this area tomake more accurate extrapolation to humans. It should also benoted that, despite the advances of computational methods, thelack of experimental bioactivity data of small molecules on bacte-rial proteins in public databases is a bottleneck for producing ac-curate ligand-target interaction networks. Further expansion ofour knowledge of the molecular composition of each food willprovide a more accurate picture of the food-microbiome interac-tion network and one more tool for better dietary/therapeuticstrategies. Last but not least, the gene expression changes of theproteins targeted by dietary phytochemicals will not necessarily betranslated into changes of the protein levels. Therefore, additionalmetaproteomic analysis could yield important insights into caus-ative relationships between particular perturbations and the re-spective enzymes/proteins. Furthermore, and since the binding ofa phytochemical to a bacterial target could disrupt the functionwithout affecting the protein’s level, metabolomic data would al-low the establishment of more-accurate linkages between dietarymolecules and the gut ecosystem phenotype.
Even though the framework presented here for modulating thefunctionality and activity of the gut microbiota relies on targetingspecific bacterial genes by food phytochemicals, it can be furtherextended for targeting harmful bacteria, e.g., Bilophila wadswor-thia (51) and Fusobacterium nucleatum (52), which can lead toacute inflammation and potentiate intestinal tumorigenesis, re-spectively, based on the disturbance of species-specific essentialprotein networks.
MATERIALS AND METHODSRNA-Seq data description. The transcriptome sequencing (RNA-Seq)data originated from a recently published study (4) and were deposited inthe Gene Expression Omnibus (GEO) with accession number GSE46761.In that study, the community-wide gene expression status in human gutmicrobiota before (4 and 1 days) and after (3 and 4 days) dietary inter-ventions was measured. To investigate the diet-microbiome interactionsat a molecular level, only the plant-based diet was used here and only theplant foods for which molecular compositions are available. Nine studyvolunteers (subject 1 and 3 to 10) were involved in both baseline andintervention periods of the plant-based diet and were used here. Detaileddata description and more information on experimental design are avail-able in the original publication (4).
Food-phytochemical and food-disease associations. For the foods inthe plant-based diet, their phytochemical composition and disease asso-ciations were retrieved from the NutriChem database (http://www.cbs.d-tu.dk/services/NutriChem-1.0/). After redundancy removal, a collectionof unique compounds formed the phytochemical space of the plant-baseddiet. With regard to the food-disease network, only the food-disease as-sociations supported by more than two references were included (infor-mation from NutriChem database). The third-level Disease Ontologyterm was used to define the disease category. In addition to the foods
present in the plant-based diet, all other links between foods, phytochemi-cals, and diseases were obtained through NutriChem.
Similar chemical space between phytochemicals and drugs. TheSMILES (simplified molecular input line entry system) strings of 437 foodcompounds were retrieved from PubChem (53), while the 1,536 FDA-approved small-molecule drugs and their associated SMILES were ob-tained from DrugBank (54) version 4.1. Based on these structural infor-mation, the RDKit plugin (http://www.rdkit.org) in KNIME (55) wasemployed for the calculation of compound molecular and physical chem-ical descriptors, including 1,024-bit Morgan circular fingerprint, topolog-ical polar surface area (TPSA), octanol-water partition coefficient (SlogP),molecular weight (MW), and numbers of Lipinski hydrogen bond accep-tors (HBA) and donors (HBD). Afterward, a matrix of compound de-scriptors was constructed. Compounds from different groups (e.g., phy-tochemicals or drugs) were distributed along the rows, whereas the 1,024-bit molecular fingerprint and five other properties constituted the 1,029columns. All the principal component analyses (PCAs) were performedinside R.
A subset of all the FDA-approved drugs that according to DrugBankare targeting metabolic pathways was selected; the subset was defined as“drugs with metabolic targets” in this study. Within this category, drugswhose targets have bacterial orthologs based on orthologous analysis (seebelow), were referred to as “drugs with metabolic targets having bacterialorthologs.”
Phytochemical-protein target bioactivity data. Initially, the foodcompounds were mapped to exactly matched compounds in the ChEMBLdatabase (25) or structurally similar ChEMBL compounds, using InChIkey and Morgan circular fingerprints, respectively. Two compounds weredeemed similar when they had a Tanimoto coefficient (calculated fromMorgan fingerprints) higher than 0.85 and their difference in molecularweight was lower than 50 g/mol. The protein targets of the food com-pounds were subsequently retrieved; only interactions within the follow-ing bioactivity thresholds were kept for further analysis: for Ki, dissocia-tion constant (Kd), 50% inhibitory concentration (IC50), and 50%effective concentration (EC50), the p_chem value (calculated as the nega-tive of the logarithm to base 10 of the measured activity) was larger than 6;for inhibition, the measurement value was greater than 30%; for potency,the measurement value was lower than 50 �M (20). To deal with themultiple measurements of the same compound on the same protein, wecalculated a frequency of “positive” measurements among all candidatemeasurements. This was based on the aforementioned thresholds andserved as evidence of compound-protein interaction. Only chemical-protein interactions with a frequency higher than 50% were consideredconfident and were used to derive protein targets of phytochemicals fordownstream analysis (20).
Since the ChEMBL database contains mainly experimental data forinteractions of small molecules with human proteins, the developedchemical-microbial protein network was based on the orthologous rela-tionships between the gut microbial reference genomes and the humangenome. For analyzing the similarity of binding sites between human-microbe protein pairs of phytochemical targets used here, we first ran-domly selected one microbial protein from all orthologous proteins foreach human-microbe protein pair. Then, based on the full sequences ofeach protein pair, COACH (56) was used to identify the stretch of aminoacid sequence that forms the respective binding pocket. These sequencesat the predicted binding site were further compared between human andmicrobial proteins. The same analysis was also performed on randomlyselected human-microbe protein pairs retrieved from ChEMBL that havebeen experimentally tested to share the same ligands.
Depletion of noncoding sequences from raw RNA-Seq data. Toachieve accurate coding sequences (CDS) and functional annotations, aswell as accurate expression level (number of reads per kilobase pair permillion mappable reads [RPKM]) for reads in coding region, we removedall potential rRNA sequences using the following procedures. First, allannotated noncoding sequences from all bacterial references in NCBI
Phytochemicals Alter Gut Microbial Gene Expression
November/December 2015 Volume 6 Issue 6 e01263-15 ® mbio.asm.org 11
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
were downloaded (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/) and com-bined with Silva 16S rRNA sequences into a noncoding RNA database.Second, all the reads were mapped to this noncoding RNA database usingBLASTX with parameters “1e�5 -F F.” At last, we excluded the potentialnoncoding RNA reads if they were mapped to this noncoding RNA data-base with coverage �50%.
Definition of taxonomy information for reads. To define the taxon-omy of the RNA sequences, we retrieved 2,766 bacterial reference genomesequences from the NCBI database (January 2014 version). The informa-tion of the open reading frames (ORFs), their corresponding CDS andprotein sequences, as well as the taxonomy information of the strains werealso retrieved from NCBI. We further mapped all the reads (excludingpotential noncoding RNA) using the BWA (57) version 0.7.4-r385 tothese reference genomes. For each species with a relative abundance ofmore than 0.5% in at least one sample, the strain with the highest numberof total mappable reads from all samples was selected as the representativestrain for this species. Using these procedures, we identified 54 bacterialreference genomes (see Table S2 in the supplemental material). Based onthe alignment with highly identical reads (both identity and coverage ofthe read are more than 95%), SAMtools (58) was employed to manipulatethe BWA alignments. For each coding gene, per gene coverage was calcu-lated using BEDTools (59). The relative abundances of genes in each sam-ple were estimated using RPKM (number of reads per kilobase pair permillion mappable reads) using in-house scripts.
Definition of the ortholog proteins between human and bacteria.We first downloaded all human genes from Ensembl (60) (Ensembl Ge-nome assembly GRCH37 release 75). We selected the longest transcriptfor each gene locus as the representative transcript using in-house pythonscripts. The ortholog relationships between each bacteria gene set from 54reference genomes and human genes was defined using Inparanoid (61)with default parameters. The ortholog pairs between the human genomeand 54 bacterial genomes were 411 on average. We also carried out theortholog definition using reciprocal best-hit blast and found that �95%of the ortholog relationships defined are consistent between two methods.
Functional analysis of metatranscriptomic data. Thirty-eight sam-ples from nine subjects were kept for downstream analysis. For each sub-ject, the two different days (if present) within the same period (baseline ordietary intervention) were merged into one sample. The reads in prepro-cessed merged samples were then mapped to those 54 reference genomeswith BLASTN (e � 1e�05, coverage �70). The genes from the referencegenomes were further annotated with human orthologous genes as de-scribed above. The expression of each phytochemical target was quanti-fied by aggregation of all orthologous microbial genes and then normal-ized using RPKM. Afterward, Wilcoxon’s signed-rank test was employedto determine whether a gene displayed significantly differential expression(DE) after dietary intervention compared with the baseline status. Thefalse discovery rate (62) (FDR) method was used for correction of multi-ple hypothesis testing. Pathway mapping and enrichment analysis for DEand non-DE targets were conducted with Reactome (63) version 49. Adirect comparison of pathways in which DE and non-DE targets areinvolved was conducted with an overrepresentation analysis per-formed in R.
Ortholog-based PPI network and analysis. Since the proteins withinbiological systems rarely act in isolation, we also included the protein-protein interaction (PPI) data that originated from STRING (64) v9.05.Only PPIs with a score higher than 400 (representing a medium-confidence interaction as defined by the STRING authors) and for pairsfor which both human proteins have bacterial orthologs (as definedabove) were retrieved for construction of the microbiota-specific PPI net-work. The PPI network, as well as all other networks displayed in thisstudy, was visualized in Cytoscape (65) v3.2. The NetworkAnalyzer (66)plugin inside Cytoscape was employed for topological analysis, whichcalculated eight topological features: degree connectivity, clustering coef-ficient, topological coefficient, closeness centrality, radiality, stress, be-tweenness centrality, and average shortest path length.
Differential taxonomic contribution to the whole-community re-sponse. The expression change of each gene present in each phylum wascalculated by subtracting the mean expression level before the diet fromthe mean expression level after the diet. Then, for each gene, the phylumcontribution was calculated by dividing the corresponding expressionchange by the overall community-level gene expression change. A positivevalue thus indicates a positive contribution toward, or consistency withthe direction of, the community-level change, regardless of up- or down-regulation. In contrast, a negative value means the opposite direction ofchange between that particular phylum and the microbiota communityon the particular gene. For each of the four main phyla, a group of geneswhich represented an (nearly) exclusive positive contribution of the cor-responding taxon toward community-level expression change of thesegenes was identified. The four groups of genes were functionally anno-tated with Gene Ontology terms (biological processes, third level) usingthe Database for Annotation, Visualization and Integrated Discovery(DAVID) (67).
To measure the species-level contribution, a more stringent thresholdfor BLASTN mapping was used for the gene expression calculation (e �1e�05, identical ratio �95%, where the “identical ratio” was defined asthe product of identity and alignment length, which was then divided byread length). The remaining steps were analogous to the phylum-levelanalysis.
For each microbial phylum, the genuine transcriptional activity ofeach DE target gene was calculated by dividing its mean expression level(among subjects before or after the diet) by the corresponding phylumtaxonomic abundance. This represented the transcriptional activity nor-malized by the phylum-level abundance. Then, subtraction of this nor-malized activity before and after the diet gave the variation of transcrip-tional activity of each gene within each microbial phylum. A positive valuerepresents increased transcription level, while a negative value indicatesdecreased transcription level. The phylum-level abundance data (averageamong subjects before or after the diet) were retrieved from MG-RAST(68).
SCFA-oriented dietary interventions. The proteins participating inbutyrate (KEGG ko00650) and propionate metabolism (KEGG ko00640)were first ranked by individual topological measurements and then aggre-gated with the R package RankAggreg using default settings (69) to pro-vide an overall indication of vulnerability. Last, the pathway maps withmetabolites/phytochemicals and proteins highlighted were generatedwith R Bioconductor package Pathview (70).
SUPPLEMENTAL MATERIALSupplemental material for this article may be found at http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.01263-15/-/DCSupplemental.
Figure S1, PDF file, 0.4 MB.Figure S2, PDF file, 0.1 MB.Figure S3, PDF file, 0.4 MB.Figure S4, PDF file, 0.4 MB.Figure S5, PDF file, 1.5 MB.Table S1, DOCX file, 0.02 MB.Table S2, DOCX file, 0.02 MB.Table S3, DOCX file, 0.02 MB.Table S4, DOCX file, 0.02 MB.Table S5, DOCX file, 0.02 MB.
ACKNOWLEDGMENTS
G.P. and J.L. thank the HKU-SRT of Genomics for their technical sup-port. Y.N. thanks David Westergaard for fruitful discussions and supporton the chemical-protein interaction network construction.
REFERENCES1. Flint HJ, Scott KP, Louis P, Duncan SH. 2012. The role of the gut
microbiota in nutrition and health. Nat Rev Gastroenterol Hepatol9:577–589. http://dx.doi.org/10.1038/nrgastro.2012.156.
2. Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent
Ni et al.
12 ® mbio.asm.org November/December 2015 Volume 6 Issue 6 e01263-15
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
M, Gill SR, Nelson KE, Relman DA. 2005. Diversity of the humanintestinal microbial flora. Science 308:1635–1638. http://dx.doi.org/10.1126/science.1110591.
3. Walker AW, Ince J, Duncan SH, Webster LM, Holtrop G, Ze X, BrownD, Stares MD, Scott P, Bergerat A, Louis P, McIntosh F, Johnstone AM,Lobley GE, Parkhill J, Flint HJ. 2011. Dominant and diet-responsivegroups of bacteria within the human colonic microbiota. ISME J5:220 –230. http://dx.doi.org/10.1038/ismej.2010.118.
4. David LA, Maurice CF, Carmody RN, Gootenberg DB, Button JE,Wolfe BE, Ling AV, Devlin AS, Varma Y, Fischbach MA, Biddinger SB,Dutton RJ, Turnbaugh PJ. 2014. Diet rapidly and reproducibly alters thehuman gut microbiome. Nature 505:559 –563. http://dx.doi.org/10.1038/nature12820.
5. Macfarlane G, Gibson G. 1997. Carbohydrate fermentation, energy trans-duction and gas metabolism in the human large intestine, p 269 –318. InMackie R, White B (ed), Gastrointestinal microbiology. Springer, NewYork, NY.
6. Magee EA, Richardson CJ, Hughes R, Cummings JH. 2000. Contribu-tion of dietary protein to sulfide production in the large intestine: an invitro and a controlled feeding study in humans. Am J Clin Nutr 72:1488 –1494.
7. Barrasa JI, Olmo N, Lizarbe MA, Turnay J. 2013. Bile acids in the colon,from healthy to cytotoxic molecules. Toxicol In Vitro 27:964 –977. http://dx.doi.org/10.1016/j.tiv.2012.12.020.
8. Williamson G, Clifford MN. 2010. Colonic metabolites of berrypolyphenols: the missing link to biological activity? Br J Nutr 104(Suppl3):S48 –S66. http://dx.doi.org/10.1017/S0007114510003946.
9. Bialonska D, Ramnani P, Kasimsetty SG, Muntha KR, Gibson GR,Ferreira D. 2010. The influence of pomegranate by-product and punicala-gins on selected groups of human intestinal microbiota. Int J Food Micro-biol 140:175–182. http://dx.doi.org/10.1016/j.ijfoodmicro.2010.03.038.
10. Selma MV, Espín JC, Tomás-Barberán FA. 2009. Interaction betweenphenolics and gut microbiota: role in human health. J Agric Food Chem57:6485– 6501. http://dx.doi.org/10.1021/jf902107d.
11. Tzounis X, Rodriguez-Mateos A, Vulevic J, Gibson GR, Kwik-Uribe C,Spencer JP. 2011. Prebiotic evaluation of cocoa-derived flavanols inhealthy humans by using a randomized, controlled, double-blind, cross-over intervention study. Am J Clin Nutr 93:62–72. http://dx.doi.org/10.3945/ajcn.110.000075.
12. Cardona F, Andrés-Lacueva C, Tulipani S, Tinahones FJ, Queipo-Ortuño MI. 2013. Benefits of polyphenols on gut microbiota and impli-cations in human health. J Nutr Biochem 24:1415–1422. http://dx.doi.org/10.1016/j.jnutbio.2013.05.001.
13. Russell WR, Labat A, Scobbie L, Duncan SH. 2007. Availability ofblueberry phenolics for microbial metabolism in the colon and the poten-tial inflammatory implications. Mol Nutr Food Res 51:726 –731. http://dx.doi.org/10.1002/mnfr.200700022.
14. Larrosa M, Luceri C, Vivoli E, Pagliuca C, Lodovici M, Moneti G,Dolara P. 2009. Polyphenol metabolites from colonic microbiota exertanti-inflammatory activity on different inflammation models. Mol NutrFood Res 53:1044 –1054. http://dx.doi.org/10.1002/mnfr.200800446.
15. Lambert PA. 2002. Cellular impermeability and uptake of biocides andantibiotics in Gram-positive bacteria and mycobacteria. J Appl Microbiol92(Suppl):46S–54S. http://dx.doi.org/10.1046/j.1365-2672.92.5s1.7.x.
16. Nikaido H, Vaara M. 1985. Molecular basis of bacterial outer membranepermeability. Microbiol Res 49:1–32.
17. Jensen K, Panagiotou G, Kouskoumvekaki I. 2014. Integrated text min-ing and chemoinformatics analysis associates diet to health benefit at mo-lecular level. PLoS Comput Biol 10:e1003432. http://dx.doi.org/10.1371/journal.pcbi.1003432.
18. Jensen K, Panagiotou G, Kouskoumvekaki I. 2015. NutriChem: a sys-tems chemical biology resource to explore the medicinal value of plant-based foods. Nucleic Acids Res 43:D940 –D945. http://dx.doi.org/10.1093/nar/gku724.
19. Jensen K, Ni Y, Panagiotou G, Kouskoumvekaki I. 2015. Developing amolecular roadmap of drug-food interactions. PLoS Comput Biol 11:e1004048. http://dx.doi.org/10.1371/journal.pcbi.1004048.
20. Westergaard D, Li J, Jensen K, Kouskoumvekaki I, Panagiotou G. 2014.Exploring mechanisms of diet-colon cancer associations through candi-date molecular interaction networks. BMC Genomics 15:380. http://dx.doi.org/10.1186/1471-2164-15-380.
21. Atarashi K, Tanoue T, Oshima K, Suda W, Nagano Y, Nishikawa H,Fukuda S, Saito T, Narushima S, Hase K, Kim S, Fritz JV, Wilmes P,
Ueha S, Matsushima K, Ohno H, Olle B, Sakaguchi S, Taniguchi T,Morita H, Hattori M, Honda K. 2013. Treg induction by a rationallyselected mixture of Clostridia strains from the human microbiota. Nature500:232–236. http://dx.doi.org/10.1038/nature12331.
22. Ridlon JM, Kang DJ, Hylemon PB, Bajaj JS. 2014. Bile acids and the gutmicrobiome. Curr Opin Gastroenterol 30:332–338. http://dx.doi.org/10.1097/MOG.0000000000000057.
23. Cho I, Blaser MJ. 2012. The human microbiome: at the interface of healthand disease. Nat Rev Genet 13:260 –270. http://dx.doi.org/10.1038/nrg3182.
24. Clemente JC, Ursell LK, Parfrey LW, Knight R. 2012. The impact of thegut microbiota on human health: an integrative view. Cell 148:1258 –1270.http://dx.doi.org/10.1016/j.cell.2012.01.035.
25. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, LightY, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP. 2012.ChEMBL: a large-scale bioactivity database for drug discovery. NucleicAcids Res 40:D1100 –D1107. http://dx.doi.org/10.1093/nar/gkr777.
26. Iskar M, Campillos M, Kuhn M, Jensen LJ, van Noort V, Bork P. 2010.Drug-induced regulation of target expression. PLoS Comput Biol6:e100925. http://dx.doi.org/10.1371/journal.pcbi.1000925.
27. Barabási A, Oltvai ZN. 2004. Network biology: understanding the cell’sfunctional organization. Nat Rev Genet 5:101–113. http://dx.doi.org/10.1038/nrg1272.
28. Duncan SH, Belenguer A, Holtrop G, Johnstone AM, Flint HJ, LobleyGE. 2007. Reduced dietary intake of carbohydrates by obese subjects re-sults in decreased concentrations of butyrate and butyrate-producing bac-teria in feces. Appl Environ Microbiol 73:1073–1078. http://dx.doi.org/10.1128/AEM.02340-06.
29. Louis P, Flint HJ. 2009. Diversity, metabolism and microbial ecologyof butyrate-producing bacteria from the human large intestine. FEMSMicrobio l Let t 294:1– 8. ht tp : / /dx .doi .org/10 .1111/ j .1574-6968.2009.01514.x.
30. Manichanh C, Rigottier-Gois L, Bonnaud E, Gloux K, Pelletier E,Frangeul L, Nalin R, Jarrin C, Chardon P, Marteau P, Roca J, Dore J.2006. Reduced diversity of faecal microbiota in Crohn’s disease revealedby a metagenomic approach. Gut 55:205–211. http://dx.doi.org/10.1136/gut.2005.073817.
31. Vermeiren J, Van den Abbeele P, Laukens D, Vigsnaes LK, De VosM, Boon N, Van de Wiele T. 2012. Decreased colonization of fecalClostridium coccoides/Eubacterium rectale species from ulcerativecolitis patients in an in vitro dynamic gut model with mucin environ-ment. FEMS Microbiol Ecol 79:685– 696. http://dx.doi.org/10.1111/j.1574-6941.2011.01252.x.
32. Scheppach W. 1994. Effects of short chain fatty acids on gut morphologyand function. Gut 35:S35–S38. http://dx.doi.org/10.1136/gut.35.1_Suppl.S35.
33. den Besten G, van Eunen K, Groen AK, Venema K, Reijngoud DJ,Bakker BM. 2013. The role of short-chain fatty acids in the interplaybetween diet, gut microbiota, and host energy metabolism. J Lipid Res54:2325–2340. http://dx.doi.org/10.1194/jlr.R036012.
34. Hamer HM, Jonkers D, Venema K, Vanhoutvin S, Troost FJ, BrummerRJ. 2008. The role of butyrate on colonic function. Aliment PharmacolTher 27:104 –119. http://dx.doi.org/10.1111/j.1365-2036.2007.03562.x.
35. Canani RB, Costanzo MD, Leone L, Pedata M, Meli R, Calignano A.2011. Potential beneficial effects of butyrate in intestinal and extraintesti-nal diseases. World J Gastroenterol 17:1519 –1528. http://dx.doi.org/10.3748/wjg.v17.i12.1519.
36. Schriml LM, Arze C, Nadendla S, Chang Y-W, Mazaitis M, Felix V,Feng G, Kibbe WA. 2012. Disease Ontology: a backbone for disease se-mantic integration. Nucleic Acids Res 40:D940 –D946. http://dx.doi.org/10.1093/nar/gkr972.
37. Dapito DH, Mencin A, Gwak GY, Pradere JP, Jang MK, Mederacke I,Caviglia JM, Khiabanian H, Adeyemi A, Bataller R, Lefkowitch JH,Bower M, Friedman R, Sartor RB, Rabadan R, Schwabe RF. 2012.Promotion of hepatocellular carcinoma by the intestinal microbiotaand TLR4. Cancer Cell 21:504 –516. http://dx.doi.org/10.1016/j.ccr.2012.02.007.
38. Qin J, Li Y, Cai Z, Li S, Zhu J, Zhang F, Liang S, Zhang W, Guan Y,Shen D, Peng Y, Zhang D, Jie Z, Wu W, Qin Y, Xue W, Li J, Han L, LuD, Wu P, Dai Y, Sun X, Li Z, Tang A, Zhong S, Li X, Chen W, Xu R,Wang M, Feng Q, Gong M, Yu J, Zhang Y, Zhang M, Hansen T,Sanchez G, Raes J, Falony G, Okuda S, Almeida M, LeChatelier E,Renault P, Pons N, Batto JM, Zhang Z, Chen H, Yang R, Zheng W, Li
Phytochemicals Alter Gut Microbial Gene Expression
November/December 2015 Volume 6 Issue 6 e01263-15 ® mbio.asm.org 13
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from
S, Yang H, et al. 2012. A metagenome-wide association study of gutmicrobiota in type 2 diabetes. Nature 490:55– 60. http://dx.doi.org/10.1038/nature11450.
39. Wong JMW, de Souza R, Kendall CWC, Emam A, Jenkins DJA. 2006.Colonic health: fermentation and short chain fatty acids. J Clin Gastroen-terol 40:235–243. http://dx.doi.org/10.1097/00004836-200603000-00015.
40. Holmes E, Li JV, Marchesi JR, Nicholson JK. 2012. Gut microbiotacomposition and activity in relation to host metabolic phenotype anddisease risk. Cell Metab 16:559 –564. http://dx.doi.org/10.1016/j.cmet.2012.10.007.
41. Reardon S. 2014. Microbiome therapy gains market traction. Nature 509:269 –270. http://dx.doi.org/10.1038/509269a.
42. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, NielsenT, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, CaoJ, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, BattoJM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E,Renault P, Sicheritz-Ponten T, Turner K, Zhu H, Yu C, Li S, Jian M,Zhou Y, Li Y, Zhang X, Li S, Qin N, Yang H, Wang J, Brunak S, DoreJ, Guarner F, Kristiansen K, Pedersen O, Parkhill J, Weissenbach J, etal. 2010. A human gut microbial gene catalogue established by metag-enomic sequencing. Nature 464:59 – 65. http://dx.doi.org/10.1038/nature08821.
43. NIH HMP Working Group, Peterson J, Garges S, Giovanni M, McInnesP, Wang L, Schloss JA, Bonazzi V, McEwen JE, Wetterstrand KA, DealC, Baker CC, Di Francesco V, Howcroft TK, Karp RW, Lunsford RD,Wellington CR, Belachew T, Wright M, Giblin C, David H, Mills M,Salomon R, Mullins C, Akolkar B, Begg L, Davis C, Grandison L,Humble M, Khalsa J, Little AR, Peavy H, Pontzer C, Portnoy M, SayreMH, Starke-Reed P, Zakhari S, Read J, Watson B, Guyer M. 2009. TheNIH Human Microbiome Project. Genome Res 19:2317–2323. http://dx.doi.org/10.1101/gr.096651.109.
44. McNulty NP, Yatsunenko T, Hsiao A, Faith JJ, Muegge BD, GoodmanAL, Henrissat B, Oozeer R, Cools-Portier S, Gobert G, Chervaux C,Knights D, Lozupone CA, Knight R, Duncan AE, Bain JR, MuehlbauerMJ, Newgard CB, Heath AC, Gordon JI. 2011. The impact of a consor-tium of fermented milk strains on the gut microbiome of gnotobiotic miceand monozygotic twins. Sci Transl Med 3:106ra106. http://dx.doi.org/10.1126/scitranslmed.3002701.
45. Faith JJ, McNulty NP, Rey FE, Gordon JI. 2011. Predicting a human gutmicrobiota’s response to diet in gnotobiotic mice. Science 333:101–104.http://dx.doi.org/10.1126/science.1206025.
46. Maurice CF, Haiser HJ, Turnbaugh PJ. 2013. Xenobiotics shape thephysiology and gene expression of the active human gut microbiome. Cell152:39 –50. http://dx.doi.org/10.1016/j.cell.2012.10.052.
47. Hwang YC, Lin CC, Chang JY, Mori H, Juan HF, Huang HC. 2009.Predicting essential genes based on network and sequence analysis. MolBiosyst 5:1672–1678. http://dx.doi.org/10.1039/b900611g.
48. Jeong H, Mason SP, Barabási AL, Oltvai ZN. 2001. Lethality and cen-trality in protein networks. Nature 411:41– 42. http://dx.doi.org/10.1038/35075138.
49. Zhou L, Ma X, Sun F. 2008. The effects of protein interactions, geneessentiality and regulatory regions on expression variation. BMC Syst Biol2:54. http://dx.doi.org/10.1186/1752-0509-2-54.
50. Ideker T, Sharan R. 2008. Protein networks in disease. Genome Res18:644 – 652. http://dx.doi.org/10.1101/gr.071852.107.
51. O’Keefe SJ, Li JV, Lahti L, Ou J, Carbonero F, Mohammed K, PosmaJM, Kinross J, Wahl E, Ruder E, Vipperla K, Naidoo V, Mtshali L, TimsS, Puylaert PG, DeLany J, Krasinskas A, Benefiel AC, Kaseb HO,Newton K, Nicholson JK, de Vos WM, Gaskins HR, Zoetendal EG.2015. Fat, fibre and cancer risk in African Americans and rural Africans.Nat Commun 6:6342.
52. Kostic AD, Chun E, Robertson L, Glickman JN, Gallini CA, MichaudM, Clancy TE, Chung DC, Lochhead P, Hold GL, El-Omar EM,Brenner D, Fuchs CS, Meyerson M, Garrett WS. 2013. Fusobacteriumnucleatum potentiates intestinal tumorigenesis and modulates the tumor-immune microenvironment. Cell Host Microbe 14:207–215. http://dx.doi.org/10.1016/j.chom.2013.07.007.
53. Bolton EE, Wang Y, Thiessen PA, Bryant SH. 2008. Chapter 12.PubChem: integrated platform of small molecules and biological activi-ties, p 217–241. In Ralph AW, David CS (ed), Annual reports in compu-tational chemistry, vol 4. Elsevier, Oxford, United Kingdom.
54. Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski
A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S,Dame ZT, Han B, Zhou Y, Wishart DS. 2014. DrugBank 4.0: sheddingnew light on drug metabolism. Nucleic Acids Res 42:D1091–D1097.http://dx.doi.org/10.1093/nar/gkt1068.
55. Berthold M, Cebron N, Dill F, Gabriel T, Kötter T, Meinl T, Ohl P, SiebC, Thiel K, Wiswedel B. 2008. KNIME: the Konstanz information miner,p 319 –326. In Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R(ed), Data analysis, machine learning and applications. Springer, Berlin,Germany. http://dx.doi.org/10.1007/978-3-540-78246-9_38.
56. Yang J, Roy A, Zhang Y. 2013. Protein-ligand binding site recognitionusing complementary binding-specific substructure comparison and se-quence profile alignment. Bioinformatics 29:2588 –2595. http://dx.doi.org/10.1093/bioinformatics/btt447.
57. Li H, Durbin R. 2009. Fast and accurate short read alignment withBurrows-Wheeler transform. Bioinformatics 25:1754 –1760. http://dx.doi.org/10.1093/bioinformatics/btp324.
58. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G,Abecasis G, Durbin R, 1000 Genome Project Data Processing Sub-group. 2009. The Sequence Alignment/Map format and SAMtools. Bioin-formatics 25:2078 –2079. http://dx.doi.org/10.1093/bioinformatics/btp352.
59. Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities forcomparing genomic features. BioInformatics 26:841– 842. http://dx.doi.org/10.1093/bioinformatics/btq033.
60. Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S,Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Giron CG,Gordon L, Hourlier T, Hunt SE, Janacek SH, Johnson N, Juettemann T,Kahari AK, Keenan S, Martin FJ, Maurel T, McLaren W, Murphy DN,Nag R, Overduin B, Parker A, Patricio M, Perry E, Pignatelli M, RiatHS, Sheppard D, Taylor K, Thormann A, Vullo A, Wilder SP, ZadissaA, Aken BL, Birney E, Harrow J, Kinsella R, Muffato M, Ruffier M,Searle SM, Spudich G, Trevanion SJ, Yates A, Zerbino DR, Flicek P.2015. Ensembl 2015. Nucleic Acids Res 43:D662–D669. http://dx.doi.org/10.1093/nar/gku1010.
61. Remm M, Storm CEV, Sonnhammer ELL. 2001. Automatic clustering oforthologs and in-paralogs from pairwise species comparisons. J Mol Biol314:1041–1052. http://dx.doi.org/10.1006/jmbi.2000.5197.
62. Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate—apractical and powerful approach to multiple testing. J R Stat Soc Ser BMethod 57:289 –300.
63. Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, Caudy M,Garapati P, Gillespie M, Kamdar MR, Jassal B, Jupe S, Matthews L, MayB, Palatnik S, Rothfels K, Shamovsky V, Song H, Williams M, Birney E,Hermjakob H, Stein L, D’Eustachio P. 2014. The Reactome pathwayknowledgebase. Nucleic Acids Res 42:D472–D477. http://dx.doi.org/10.1093/nar/gkt1102.
64. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, RothA, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. 2013. STRINGv9.1: protein-protein interaction networks, with increased coverage andintegration. Nucleic Acids Res 41:D808 –D815. http://dx.doi.org/10.1093/nar/gks1094.
65. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, AminN, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment forintegrated models of biomolecular interaction networks. Genome Res 13:2498 –2504. http://dx.doi.org/10.1101/gr.1239303.
66. Assenov Y, Ramirez F, Schelhorn SE, Lengauer T, Albrecht M. 2008.Computing topological parameters of biological networks. Bioinformat-ics 24:282–284. http://dx.doi.org/10.1093/bioinformatics/btm554.
67. Huang DW, Sherman BT, Lempicki RA. 2008. Systematic and integrativeanalysis of large gene lists using DAVID bioinformatics resources. NatProtoc 4:44 –57. http://dx.doi.org/10.1038/nprot.2008.211.
68. Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M,Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA.2008. The metagenomics RAST server—a public resource for the auto-matic phylogenetic and functional analysis of metagenomes. BMC Bioin-formatics 9:386. http://dx.doi.org/10.1186/1471-2105-9-386.
69. Pihur V, Datta S, Datta S. 2009. RankAggreg, an R package for weightedrank aggregation. BMC Bioinformatics 10:62. http://dx.doi.org/10.1186/1471-2105-10-62.
70. Luo W, Brouwer C. 2013. Pathview: an R/Bioconductor package forpathway-based data integration and visualization. Bioinformatics 29:1830 –1831. http://dx.doi.org/10.1093/bioinformatics/btt285.
Ni et al.
14 ® mbio.asm.org November/December 2015 Volume 6 Issue 6 e01263-15
on Decem
ber 2, 2020 by guesthttp://m
bio.asm.org/
Dow
nloaded from