a molecular-level landscape of diet-gut microbiome interactions: toward dietary ... · dietary...

A Molecular-Level Landscape of Diet-Gut Microbiome Interactions:Toward Dietary Interventions Targeting Bacterial Genes

Yueqiong Ni, Jun Li, Gianni Panagiotou

Systems Biology and Bioinformatics Group, School of Biological Sciences, The University of Hong Kong, Hong Kong, Hong Kong, China

ABSTRACT As diet is considered the major regulator of the gut ecosystem, the overall objective of this work was to demonstratethat a detailed knowledge of the phytochemical composition of food could add to our understanding of observed changes infunctionality and activity of the gut microbiota. We used metatranscriptomic data from a human dietary intervention study todevelop a network that consists of >400 compounds present in the administered plant-based diet linked to 609 microbial targetsin the gut. Approximately 20% of the targeted bacterial proteins showed significant changes in their gene expression levels, whilefunctional and topology analyses revealed that proteins in metabolic networks with high centrality are the most “vulnerable”targets. This global view and the mechanistic understanding of the associations between microbial gene expression and dietarymolecules could be regarded as a promising methodological approach for targeting specific bacterial proteins that impact humanhealth.

IMPORTANCE It is a general belief that microbiome-derived drugs and therapies will come to the market in coming years, eitherin the form of molecules that mimic a beneficial interaction between bacteria and host or molecules that disturb a harmful inter-action or proteins that can modify the microbiome or bacterial species to change the balance of “good” and “bad” bacteria in thegut microbiome. However, among the numerous factors, what has proven the most influential for modulating the microbialcomposition of the gut is diet. In line with this, we demonstrate here that a systematic analysis of the interactions between thesmall molecules present in our diet and the gut bacterial proteome holds great potential for designing dietary interventions toimprove human health.

Received 25 July 2015 Accepted 25 September 2015 Published 27 October 2015

Citation Ni Y, Li J, Panagiotou G. 2015. A molecular-level landscape of diet-gut microbiome interactions: toward dietary interventions targeting bacterial genes. mBio 6(6):e01263-15. doi:10.1128/mBio.01263-15.

Editor Sang Yup Lee, Korea Advanced Institute of Science and Technology

Copyright © 2015 Ni et al. This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-ShareAlike 3.0 Unported license,which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original author and source are credited.

Address correspondence to Gianni Panagiotou, [email protected]

A substantial amount of research has established that the com-position and metabolism of human microbiota play crucial

roles in human health. Microbial colonization of the gastrointes-tinal tract varies widely, with the large intestine having not onlythe highest density of microbes in terms of bacterial cells per grambut also the most metabolically active microbial community (1).Firmicutes and Bacteroidetes are the most abundant phyla, butActinobacteria, Proteobacteria, and Verrucomicrobia are also regu-larly found in healthy adults (2). Diet clearly has a major impacton variation in the gut microbiota composition, which is easilydetected in fecal samples even after only a few days after a changein diet (3, 4). The fermentative but also (anaerobic) respiratorybacterial metabolism of dietary components produces an extraor-dinary chemical diversity in the large intestine with protective(e.g., short-chain fatty acids [SCFAs]) or detrimental (e.g., hydro-gen sulfite or bile acids) effects on disease development (5–7). Themost sensitive species in response to a change in diet belong toFirmicutes and Actinobacteria; however, more associations be-tween species abundance and diets have been confirmed, e.g., highmeat intake with an increase in Bacteroidetes, more fiber with ahigh proportion of Prevotella spp., etc.

Dietary phytochemicals are usually small molecules with highstructural diversity, often part of the plant’s secondary metabo-

lism, that when they reach the intestinal tract can cause selectivestress or stimulus to the resident microbiota (8–10). For example,a high intake of cocoa-derived flavonoids increases the abundanceof bifidobacteria and lactobacilli and reduces the level of plasmatriacylglycerol (11). However, phytochemicals are also trans-formed by the gut microbial communities to other metaboliteswith altered bioactivity. For example, conversion of polyphenolicmolecules from the microbiome has been shown to inhibit thetumor necrosis factor, NF-�B, and other inflammatory mediators(12–14). Furthermore, bacterial enzymatic activities, such as�-glucuronidase that converts glucuronides to their respectiveaglycons, are responsible for the retention time of phytochemicalsin the human body. It is generally considered that the Gram-positive bacteria are relatively freely permeable to small molecules(15), while those molecules below a certain molecular mass,around 700 Da, can enter into Gram-negative bacteria throughporin proteins (16).

In a previous work, we applied text mining and naive Bayesclassification in 21 million abstracts from MEDLINE, and weidentified 23,137 phytochemicals that are present in plant-baseddiets (17). Several of these phytochemicals have made it all the wayto pharmacy shelves or have served as lead structures for drugdevelopment, demonstrating their important contribution to

RESEARCH ARTICLE crossmark

November/December 2015 Volume 6 Issue 6 e01263-15 ® mbio.asm.org 1

on Decem

ber 2, 2020 by guesthttp://m

bio.asm.org/

Dow

nloaded from

http://creativecommons.org/licenses/by-nc-sa/3.0/

http://crossmark.crossref.org/dialog/?doi=10.1128/mBio.01263-15&domain=pdf&date_stamp=2015-10-27

mbio.asm.org

http://mbio.asm.org/

host homeostasis. We also systematically searched for and foundassociations between 2,768 edible plants and 1,613 human diseasephenotypes and characterized the association as being positive ornegative. We developed a web-based database, which to the best ofour knowledge is the most complete source that links phyto-chemicals, diet, and disease, and we were able to provide amolecular-level mode of action on how specific diets could pre-vent disease development (18) or affect drug pharmacodynamicsand pharmacokinetics (19). As a step further, we used this exten-sive source of information to generate a comprehensive picture ofthe interaction profile of 158 edible plants, which are positivelyassociated with reduction in colon cancer, with a predefined can-didate colon cancer target space consisting of ~1,900 human pro-teins (20). We uncovered the key components in colon cancer thatare targeted synergistically by phytochemicals, and we identifiedstatistically significant and highly correlated protein networks thatcould be perturbed by dietary habits. However, in that study, wecompletely ignored interactions of the phytochemicals with thegut microbiome and alterations in gut composition and activity.

The gut microbial ecosystem lies at the interface of the host andenvironment and is considered the third major component regu-lating host health and the onset of disease; therefore, developingways to manipulate the gut microbiota represents a promisingtherapeutic avenue. Trying to identify the “good” and “bad” bac-teria in the intestinal microbiota in order to promote the growthof healthy bacterial species and/or eliminate the others has beenrecognized as one strategy (21); however, not everyone is con-vinced that single species are to be blamed for disease develop-ment. Instead, or maybe in addition, to species-centric studies,targeting specific genes could be a more realistic approach, espe-cially since our knowledge of harmful bacterium-specific meta-bolic products has grown substantially (22). In the present work,we developed the computational framework for studying possibleinteractions between the highly diverse phytochemical space ofour diet and gut microbial proteins. By coupling metatranscrip-tomics, chemoinformatics, and network biology, we created amolecular-level map of diet-bacterium interactions showcasingthat individual dietary components may contribute to the ob-served gene expression activities. To the best of our knowledge,this is the first systematic approach for a mechanistic understand-ing on how phytochemicals are linked to the intestinal microbiota,and it may assist in designing diets with potential therapeutic ben-efit.

RESULTSInteractome analysis of bacterial genes and dietary compo-nents. We adopted the metatranscriptomic data from a recently(2014) published study (4) that aimed to investigate whether thehuman gut microbiota can rapidly respond to dietary interven-tions in a diet-specific manner. Briefly, the original study had twodiet arms, a plant-based diet and an animal-based diet; fecal sam-ples were collected from nine individuals for baseline levels andafter 2 days on each experimental diet, and transcriptome se-quencing (RNA-Seq) analysis of the bacterial community in thegut was performed. In order to utilize the huge diversity ofthe phytochemical space and link the molecular composition ofthe diet with chemoinformatics and network-based analysis, wefocused on only the plant-based diet in this study to achieve amolecular-level investigation of the microbial response to diet.The plant-based diet consisted of a mixture of edible plants, most

of which belong to fruits and vegetables. To obtain an overallpicture of the health-related effects of the administered plant-based diet, we used the plant/food names as a query in the Nu-triChem database (18), and we retrieved data on associated diseasephenotypes. In total, 14 foods in this diet arm (see Table S1A in thesupplemental material) were found in NutriChem and were usedto generate a food-disease network (Fig. 1A). Specifically, for these14 foods, there is experimental evidence for association with 38different diseases that fall into 20 disease categories, which werefurther classified into 8 general disease classes, including meta-bolic disease, cancer, cardiovascular disease, etc. (Table S1C).Among the 14 foods, garlic was associated with the highest num-ber of diseases (20). Notably, a considerable number of these dis-eases, including colon cancer, obesity, and inflammatory boweldisease, have previously been linked to the composition and di-versity of the gut microbiota (23, 24).

Subsequently, the chemical composition of these 14 foods wasretrieved from the NutriChem database (see Table S1B in the sup-plemental material) and 437 unique food compounds (also calledphytochemicals throughout this study) were obtained for furtheranalysis. Of the 437 phytochemicals, 415 (95%) have molecularmasses below 700 Da and thus are expected to enter even Gram-negative bacteria. There are approximately 40 compounds perfood on average; among these foods, peas have the richest phyto-chemical profile (85 phytochemicals), while carrots contain thelowest number of compounds (4 phytochemicals). Tomatoes andrice have “the most novel chemical space,” which denotes thehighest number of compounds present exclusively in one partic-ular food. To investigate the potential bioactivity of these 437phytochemicals, we compared them with 1,536 FDA-approveddrugs in terms of chemical and physical properties. We observedthat most of these food compounds cluster closely or overlap withmany FDA-approved drugs indicated for metabolic diseases (seeFig. S1 in the supplemental material). There was also clear over-lapping with a subset of the FDA-approved drugs that target hu-man proteins with bacterial orthologs in the gut; thus, they (i.e.,the FDA-approved drugs and subsequently their chemically sim-ilar food components) could potentially alter the activity of themicrobial metabolic network (Fig. S2). Some representative foodcompounds, such as methoxsalen (present in papaya), quercetin(garlic, mangos, tomato, cauliflower), aminolevulinic acid (peas,tomato), trifluoperazine (garlic), which can be found in Drug-Bank either as experimental or FDA-approved drugs, are high-lighted in Fig. S2.

To hone in on the food-microbiome interactions at the molec-ular level, we employed compound bioactivity data to uncover thepotential targets of the compounds within the 14 foods. For thatpurpose, we used the ChEMBL database (25), which contains ex-perimentally verified interactions between small molecules andproteins. According to the ChEMBL database, 165 of these 437phytochemicals interacted with 214 proteins. The phytochemicalstargeting the highest number of proteins were quercetin(CHEMBL50), apigenin (CHEMBL28), ellagic acid (CHEMBL6246),and luteolin (CHEMBL151) (listed in decreasing order) (Fig. 1B).Of the 214 protein targets that interact with those food com-pounds, 31 proteins were found to have bacterial orthologs in thegut (see Materials and Methods for details). To evaluate the cred-ibility of interactions between microbial proteins and compoundsgenerated by ortholog inference, we compared the distribution ofsimilarities of the human-microbe orthologs defined in this study,

Ni et al.

2 ® mbio.asm.org November/December 2015 Volume 6 Issue 6 e01263-15

on Decem


bio.asm.org/

Dow

nloaded from

mbio.asm.org


with that of human-microbe protein pairs targeted by the samecompounds (experimentally tested, 1,088 in total) retrieved fromthe ChEMBL database. No significant difference (P � 0.05 by theWilcoxon rank sum test) was observed for bit score, identity, or Evalue between these two groups of protein pairs. In addition tothis global comparison, we investigated the similarities of host-microbe pairs used here at the binding site level. It was observedthat their respective binding pockets also possessed medium orhigh similarities, which were not lower than the similarities ofhuman-microbe protein pairs that have been experimentallytested (based on the ChEMBL database) to share the same ligands(data not shown). This provided additional evidence for the fea-sibility of inferring bacterial targets based on homology to hosttargets. Quercetin, luteolin, and ellagic acid were found to possessthe most bacterial protein targets, as well as kaempferol(CHEMBL150), resveratrol (CHEMBL165), apigenin, and trice-tin (CHEMBL247484) (in decreasing order) (Fig. 1B). Quercetinand kaempferol are widespread in the plant-based diet of ourstudy (cauliflower, mangos, onion, and peas), while other phyto-chemicals are present in only one or two foods. When incorporat-ing the first-degree protein-protein interaction (PPI) partners ofthe 31 direct bacterial targets, we significantly expanded the phy-tochemical target space, generating a gut microbiota-specificcompound-target interaction network (Fig. 1B). Among the 31 di-

rect targets, pyruvate kinase (UniProt accession no. P14618) andglyceraldehyde-3-phosphate dehydrogenase (UniProt accessionno. P04406) have the most PPI partners. In summary, 99 com-pounds present in the 14 foods of the plant-based diet were linkedto 609 gut-microbial targets, which included 31 direct targets plus578 protein-protein interaction partners.

Gene expression analysis of phytochemical targets: func-tional implications and topological characteristics. To deter-mine to what extent the gut microbial protein targets of the foodcompounds were significantly influenced by dietary intake, weperformed a differential expression (DE) analysis using the RNA-Seq data before and after administration of the plant-based diet.Compared to the study of David et al. (4), the DE analysis on theRNA-Seq data, both in the preprocessing steps of the raw data andthe statistical analysis, were modified to achieve higher accuracy(see Materials and Methods for more details). We found that 115microbial targets (7 direct targets and 108 PPI targets), approxi-mately 20% of the whole target space (31 direct targets and 578PPI targets, respectively), altered gene expression significantly(false discovery rate [FDR] P value � 0.20 by Wilcoxon signed-rank test). This percentage of expression-changing targets in-duced by dietary phytochemicals is higher than the 8% induced bydrugs (26). This may be due to the fact that phytochemicals couldtarget multiple proteins, i.e., more promiscuous, as well as the

P53778P14679

CHEMBL206555

O14757P47870

O15264

P39900P47869

P49841

P04278

P11511

Q15118

P31749

O75582

P47989

P21917P45983

P14061

P18505

P48169P14867

Q16445

Q99928P31644

P78334

P28472

O14764

CHEMBL252642CHEMBL269630O75762

Q04206

CHEMBL14085

P18507

O00591

CHEMBL404515

P34903

Q9UN88

CHEMBL125743

Q8N1C3

Q92731

Q04760

Q16678

P35354

P51570

P49840

Q9NPH5

CHEMBL1275624

P23219

P36888

O94956

P04798P10828

P04637P27695Q03164

P46063

CHEMBL242341CHEMBL96926Q6W5P4

CHEMBL422

CHEMBL8320Q16637

CHEMBL986CHEMBL15594CHEMBL312163CHEMBL1453208

CHEMBL65675

CHEMBL501174

CHEMBL582

CHEMBL1232258

O75496

CHEMBL1499

P22748P42345

CHEMBL1394423

Q01453

CHEMBL402947

CHEMBL1394708

CHEMBL50

P21397

P68871CHEMBL239243

CHEMBL8659

CHEMBL13766

CHEMBL45068

P09917

CHEMBL525239

CHEMBL23194

P40225

CHEMBL67166

P02545P42858

CHEMBL602

B2RXH2P28482CHEMBL1492383

P10636

P19838

P16050P15428

CHEMBL247484CHEMBL151P33261

Q9Y6L6CHEMBL454173

CHEMBL28CHEMBL366603

CHEMBL1371781CHEMBL145CHEMBL131921

CHEMBL8145P06746CHEMBL165CHEMBL602575

CHEMBL11608

CHEMBL274323Q9NR56


CHEMBL284377

P54132P18054

CHEMBL46403

CHEMBL37537P35869

CHEMBL1537509

P10253

P11230

CHEMBL1347061

P30926

CHEMBL29757

P32297

Q07001

P17787P07510

O14842

CHEMBL9352

CHEMBL265325

CHEMBL42710P04745

CHEMBL1042

P05177

CHEMBL379064

CHEMBL1231491

CHEMBL464983

P14618

Q16539CHEMBL123040

CHEMBL267476

CHEMBL1236378

CHEMBL17962

CHEMBL539648

CHEMBL1569524

CHEMBL428495

P16473

CHEMBL1398443

CHEMBL416

CHEMBL150P11712

P10635


CHEMBL239047

CHEMBL415690

CHEMBL465183

CHEMBL234926


CHEMBL170365

CHEMBL449317

P22694

Q05586

P08107P12931

P06280

CHEMBL1588379

P16083

Q16790

CHEMBL1418955

P51151

O43570

Q02763

P08581P00533

CHEMBL84446

P22612P84022

CHEMBL284616

P35968

P15121

P37059O15118

Q13315 P03372

P68400O75604CHEMBL1537012CHEMBL308333

P42330P08253


P03956

Q8IW41

CHEMBL66879

CHEMBL82293

CHEMBL1784262

P17252O00141

Q01469

P42226P11309

P49137CHEMBL6246


CHEMBL222021

CHEMBL1556Q9ULX7

P08069

Q9HC97CHEMBL464404 P06213

P06748Q9UM73

Q4U2R8 P17612

Q15759

CHEMBL1334750

P06493

CHEMBL1461

P07148

P00918

P27338

P14780

CHEMBL350675

O60285

P15090

CHEMBL464982

CHEMBL359965

CHEMBL444478

P49327

CHEMBL1325366

P37088

P51170

P51168

Q15391

CHEMBL610981 CHEMBL122890

P00390O00519P04406


CHEMBL576

Q12879

P43166

P35218

P09237

Q9Y2D0

P15144

CHEMBL120568

Q9HBH1

Q99720Q15125

Q9Y468CHEMBL1686

Q9GZT9

CHEMBL514882

CHEMBL1487394

P31645P35498

P29466

CHEMBL1364101

CHEMBL1236376

CHEMBL54922

CHEMBL1609


CHEMBL389507

Q9BY08

Q96RJ0

Q99250

P31639

CHEMBL564

CHEMBL11298

Q9NY46

P55210

P18089

P08913

P08912

P23975

P11229 P25100

P14416

P05164

P28223

P41595

P50406

O15245

P28221

P22303

P35462

P28335

P18825

P35367P20309

P08173P08172

Q8WXA8

Q13133

P28566 P08908

P33765

CHEMBL170154

P47898

A5X5Y0

Q70Z44

O95264CHEMBL171804

P46098

P11509

P43681

CHEMBL32749

P02708P19099

CHEMBL69018

CHEMBL1473644

P15538

O00255

CHEMBL338066P00352

Q99714Q9NUW8

P08684


CHEMBL1552319


CHEMBL1221925


P30939

CHEMBL539

P28222

P06239

P06241

Q13639 P34969

CHEMBL1201131


CHEMBL506247


CHEMBL485012

CHEMBL24171

CHEMBL934

CHEMBL484901

CHEMBL1293

CHEMBL6640


CHEMBL1236019

CHEMBL333298

CHEMBL274318

CHEMBL1475613

CHEMBL39

CHEMBL26215

P24557CHEMBL540

Q16665

CHEMBL1364

CHEMBL1195032

CHEMBL1354559

CHEMBL164660

ENSP00000216780

ENSP00000280605ENSP00000451932

ENSP00000379769

ENSP00000384949ENSP00000311028

ENSP00000280706

ENSP00000354554

ENSP00000384369

ENSP00000297283

ENSP00000333664

ENSP00000342026ENSP00000374354

ENSP00000249750

ENSP00000351706ENSP00000219252

ENSP00000342032

ENSP00000358939ENSP00000265838

ENSP00000277865

ENSP00000361699

ENSP00000306548

ENSP00000359991

ENSP00000262027ENSP00000349142ENSP00000343867ENSP00000364475

ENSP00000327589

ENSP00000355493

ENSP00000377865

P00390ENSP00000299518

ENSP00000282050

ENSP00000323811

ENSP00000367086ENSP00000261772

ENSP00000368226

ENSP00000264932

ENSP00000359691

ENSP00000379678

ENSP00000368066

ENSP00000321259

ENSP00000341504

ENSP00000370737

ENSP00000367923ENSP00000389175

ENSP00000421524

ENSP00000342071ENSP00000346120 ENSP00000295822

ENSP00000296805

ENSP00000302111ENSP00000411803

ENSP00000262746

ENSP00000364486ENSP00000283646

ENSP00000367615

ENSP00000281543

CHEMBL9352

P16083CHEMBL28

ENSP00000261296ENSP00000386350

ENSP00000296411

ENSP00000290846

ENSP00000231572

ENSP00000223836

ENSP00000369456

ENSP00000246337ENSP00000377954

ENSP00000260563

ENSP00000039007

ENSP00000339399ENSP00000319501

ENSP00000393377

ENSP00000369785ENSP00000281081

ENSP00000426638

ENSP00000373176ENSP00000368438

ENSP00000354499

ENSP00000368358ENSP00000315152

ENSP00000235521ENSP00000402038

ENSP00000315644

ENSP00000340019

ENSP00000355325

ENSP00000344818

P27695ENSP00000251588

ENSP00000350491ENSP00000206542

ENSP00000365773

CHEMBL239243

ENSP00000222982

ENSP00000269298

ENSP00000253382

ENSP00000262878

ENSP00000245552

ENSP00000281038

ENSP00000175506

ENSP00000274606

ENSP00000364794

ENSP00000355536

CHEMBL1686

ENSP00000304229

ENSP00000469037

CHEMBL379064

ENSP00000398536

ENSP00000299163ENSP00000264162

Q9GZT9

ENSP00000371393

CHEMBL8145

CHEMBL150ENSP00000288532

ENSP00000333551ENSP00000219066

ENSP00000355518

ENSP00000282276ENSP00000242576

ENSP00000362298

ENSP00000264220

ENSP00000364649ENSP00000446384

ENSP00000303279

ENSP00000295453

ENSP00000280701

ENSP00000471388ENSP00000360268

ENSP00000265537ENSP00000358719

ENSP00000306817ENSP00000363965

ENSP00000339933P04406


ENSP00000414662

ENSP00000287936ENSP00000272252

ENSP00000308897

ENSP00000379038

ENSP00000322439


ENSP00000345771

ENSP00000390107

ENSP00000263629

ENSP00000377527

ENSP00000282356

ENSP00000216774

ENSP00000241052

ENSP00000358042

ENSP00000378782

ENSP00000370376

ENSP00000305995

ENSP00000366156

ENSP00000393953

ENSP00000419851

P51570

ENSP00000308845ENSP00000274813

Q04760

ENSP00000416583

ENSP00000164139

ENSP00000382595

ENSP00000258874

ENSP00000356022

ENSP00000363641

ENSP00000262030

ENSP00000278572

ENSP00000286621

ENSP00000355086

ENSP00000354876

ENSP00000444136

ENSP00000290921

ENSP00000244217

ENSP00000300051ENSP00000373073

ENSP00000361512

ENSP00000364986

ENSP00000302620

ENSP00000390158

ENSP00000307188ENSP00000228347

ENSP00000428115

ENSP00000360569

ENSP00000295448

ENSP00000353770ENSP00000264954

ENSP00000233893

ENSP00000332256ENSP00000437850

ENSP00000352516

ENSP00000361446ENSP00000356354

ENSP00000418082ENSP00000309334

ENSP00000359345

ENSP00000217455

ENSP00000261733

ENSP00000228510

ENSP00000366927

ENSP00000275605

ENSP00000333837

ENSP00000324124

ENSP00000290299

ENSP00000347345

ENSP00000419325

ENSP00000361511

ENSP00000292432

ENSP00000381504ENSP00000410833

ENSP00000312735

ENSP00000335304

ENSP00000354314ENSP00000358417

ENSP00000472847ENSP00000365462

ENSP00000273588ENSP00000352401

ENSP00000367727

ENSP00000369757

ENSP00000395605

ENSP00000357838

ENSP00000342991

ENSP00000265594

ENSP00000285093ENSP00000215587

ENSP00000360262

ENSP00000358549

ENSP00000455114

ENSP00000226299ENSP00000313877

ENSP00000302393

ENSP00000371230

ENSP00000346921

ENSP00000429374ENSP00000308610

ENSP00000244051

ENSP00000261326ENSP00000438404

ENSP00000333490

ENSP00000312304

ENSP00000273920ENSP00000443194ENSP00000359680

ENSP00000273359ENSP00000238018ENSP00000286479

ENSP00000341885

ENSP00000296674

ENSP00000425809

ENSP00000333395

ENSP00000468425ENSP00000263774

ENSP00000272065ENSP00000266971

ENSP00000367992

ENSP00000380514ENSP00000257549

ENSP00000252505ENSP00000315774

ENSP00000344794ENSP00000362329

ENSP00000230354

ENSP00000318318 ENSP00000354532

ENSP00000298556

ENSP00000258317

ENSP00000346015

ENSP00000317379

ENSP00000475569

ENSP00000361965

ENSP00000253004

ENSP00000452762

P47989

ENSP00000359531

ENSP00000449535 ENSP00000252603ENSP00000219302

ENSP00000438588ENSP00000317904

ENSP00000339435

ENSP00000230048ENSP00000353104

ENSP00000219240

ENSP00000340278

ENSP00000013034

ENSP00000300151

ENSP00000225969

CHEMBL312163

P04745

ENSP00000322524

ENSP00000296577


P46063ENSP00000263274ENSP00000408176

ENSP00000372316

CHEMBL1537509

CHEMBL1354559

P00352

CHEMBL1371781

ENSP00000358695

ENSP00000266517

CHEMBL1492473

ENSP00000260324

ENSP00000313350

CHEMBL142652

CHEMBL1221925

ENSP00000305480



CHEMBL1195032

CHEMBL1537012

ENSP00000234420ENSP00000246548

Q99714

CHEMBL274323

CHEMBL1096009

Q03164CHEMBL1329826

CHEMBL1518783

ENSP00000263657

ENSP00000317376CHEMBL247484

CHEMBL1492383ENSP00000285848

CHEMBL15594ENSP00000360310

ENSP00000307288

ENSP00000370377

ENSP00000262105

ENSP00000231790

ENSP00000265113

ENSP00000225577 ENSP00000311847

ENSP00000420927

ENSP00000362702

Q9HBH1

ENSP00000268261ENSP00000367893

ENSP00000257347ENSP00000232607

Q15118

CHEMBL19612 O00141


ENSP00000283131

ENSP00000466058


CHEMBL1293

CHEMBL1552319

CHEMBL366603P10253CHEMBL338066

CHEMBL6246

CHEMBL1622

CHEMBL8659

ENSP00000328174

P54132

CHEMBL242341


CHEMBL485012

CHEMBL1364

CHEMBL254348

CHEMBL46403

CHEMBL1275624


CHEMBL540

ENSP00000370750

ENSP00000305221

ENSP00000319851

CHEMBL1569524

ENSP00000267502

ENSP00000286398CHEMBL23194

CHEMBL284377

CHEMBL67166

CHEMBL986


CHEMBL11608

ENSP00000258168CHEMBL164660

CHEMBL1164301

ENSP00000296412

CHEMBL1506228ENSP00000420269

CHEMBL454173


CHEMBL465183

CHEMBL267476

CHEMBL1332980ENSP00000397939

CHEMBL464983

CHEMBL1322988

CHEMBL123040

ENSP00000306606

CHEMBL582

CHEMBL1254011

P43166

CHEMBL446452

P06280

CHEMBL1604063

CHEMBL274318

ENSP00000371321

CHEMBL1419611

CHEMBL1394708

CHEMBL415690

CHEMBL24171


ENSP00000326219

ENSP00000224356

ENSP00000255078

ENSP00000251566

ENSP00000303147ENSP00000438144

ENSP00000379339ENSP00000222286

ENSP00000396308ENSP00000229319

ENSP00000270776ENSP00000385725

ENSP00000307241

ENSP00000357535

ENSP00000429082

ENSP00000370517

P14618ENSP00000215754

ENSP00000378890

ENSP00000377446ENSP00000330054

ENSP00000270142

ENSP00000419129ENSP00000300107

ENSP00000443822ENSP00000301761ENSP00000324105

ENSP00000352222ENSP00000325448

ENSP00000265174ENSP00000460924

ENSP00000309477

ENSP00000288937ENSP00000245816

ENSP00000379108

ENSP00000319531

ENSP00000377617

ENSP00000445175

ENSP00000250237

ENSP00000205402ENSP00000304802ENSP00000333667

ENSP00000331897ENSP00000319788ENSP00000359976

ENSP00000314949

ENSP00000264331ENSP00000199320

Q16665

ENSP00000219431

ENSP00000246868ENSP00000297494

ENSP00000265724

ENSP00000338606

ENSP00000354720

ENSP00000372326 ENSP00000264156ENSP00000254322

ENSP00000264606

CHEMBL65675CHEMBL39

O60285ENSP00000306561

ENSP00000321584

ENSP00000468051

ENSP00000260129ENSP00000216911

ENSP00000286794CHEMBL279597CHEMBL37537

ENSP00000268171

ENSP00000351777

ENSP00000307156ENSP00000359507

ENSP00000237612ENSP00000463683

ENSP00000381607

ENSP00000423822

ENSP00000265498ENSP00000358727

ENSP00000471150

ENSP00000406630ENSP00000274242

ENSP00000451976

CHEMBL122890

ENSP00000285737

ENSP00000261434

ENSP00000357981

ENSP00000311430

ENSP00000345023

ENSP00000298510

ENSP00000290765

ENSP00000265462

ENSP00000237858

ENSP00000356410

ENSP00000338964

ENSP00000354677ENSP00000309463

ENSP00000248935

ENSP00000318868

ENSP00000295266

ENSP00000348234

ENSP00000359151

ENSP00000375881

ENSP00000301149

ENSP00000318351

ENSP00000363676

ENSP00000403290

ENSP00000273398ENSP00000320658 ENSP00000403664

ENSP00000321070 ENSP00000319814ENSP00000302728ENSP00000293860

ENSP00000262584

ENSP00000245206

ENSP00000241600

ENSP00000330141

ENSP00000390637

ENSP00000402760

ENSP00000261187ENSP00000255226

ENSP00000288699

ENSP00000325421

ENSP00000354511

ENSP00000269187

ENSP00000302851

ENSP00000225719

P23975

ENSP00000362135

P31645ENSP00000298472

ENSP00000297185

ENSP00000315006

ENSP00000407509

ENSP00000377191

ENSP00000378920

ENSP00000339260

O15245

ENSP00000370571

Q4U2R8

ENSP00000242375

ENSP00000354982

ENSP00000352702ENSP00000378951

ENSP00000254488

ENSP00000329757

ENSP00000336866

ENSP00000326671

ENSP00000323549

ENSP00000287766

ENSP00000270349

ENSP00000417085

ENSP00000428200

ENSP00000279027

ENSP00000352456

ENSP00000278379

ENSP00000286749

ENSP00000358640

ENSP00000362469

ENSP00000075120

CHEMBL564

ENSP00000370589

CHEMBL1328484ENSP00000254719

ENSP00000348886ENSP00000258975

ENSP00000472940

ENSP00000301327

ENSP00000341382ENSP00000356972ENSP00000296273

ENSP00000349576

ENSP00000408295

P31639

ENSP00000266736

ENSP00000209728

ENSP00000263187

ENSP00000267430

ENSP00000298139

ENSP00000370150

ENSP00000259008


ENSP00000382133

ENSP00000343392

ENSP00000321636

ENSP00000375809ENSP00000369411

CHEMBL265325

ENSP00000264233

ENSP00000411532

P49327

ENSP00000256460

ENSP00000353826ENSP00000346839

ENSP00000357218

ENSP00000363868

CHEMBL464982

ENSP00000265686

ENSP00000264595

ENSP00000349078

ENSP00000314508

ENSP00000220584

ENSP00000344223 ENSP00000217426

ENSP00000322706

ENSP00000275428

ENSP00000281455

ENSP00000260645

ENSP00000371110ENSP00000381464

ENSP00000294724

ENSP00000244289

ENSP00000363229CHEMBL66879

ENSP00000350914

ENSP00000332247ENSP00000352608

ENSP00000369820ENSP00000252595

ENSP00000272286

ENSP00000290429

ENSP00000367462ENSP00000234111

ENSP00000278618

ENSP00000324842

ENSP00000403357

ENSP00000264426

ENSP00000345873ENSP00000346601

ENSP00000222481

ENSP00000311984

ENSP00000378669

ENSP00000325875

ENSP00000313921

ENSP00000236959

ENSP00000300738

P22303

ENSP00000361287

ENSP00000416293

ENSP00000386284

ENSP00000323568

ENSP00000282753

ENSP00000221486ENSP00000335153

ENSP00000306306

ENSP00000295688

P00918

ENSP00000307940

ENSP00000273550ENSP00000248923

ENSP00000318480


ENSP00000341136

ENSP00000257770ENSP00000264613

CHEMBL610981

ENSP00000470037

ENSP00000355890

ENSP00000328973

ENSP00000342056ENSP00000280704


ENSP00000234590

ENSP00000301522

ENSP00000225430

ENSP00000216185

ENSP00000267814ENSP00000327251

ChEMBLB

A

165 compounds <> 214 proteins

99 compounds <> 609 proteins

Ortholog identification

Microbiota-specific

STRING

Respiratory

Unclassified

Metabolic NeurologicalGastrointestinal

ImmunologicalCancerCardiovascular

FIG 1 Molecular-level view of the plant-based diet. (A) Food-disease association network for the 14 foods in the plant-based diet of this study. The generaldisease classes were colored differently as illustrated by the legend. The size of the food nodes reflects the number of disease connections for each food. (B) Gutmicrobiota-specific protein target space of the food phytochemicals. (Left) Human direct targets (red rectangles) of the food phytochemicals (green hexagons)were retrieved from the ChEMBL database. (Right) The microbiota-specific direct targets (red rectangles) and their protein-protein interaction (PPI) partners(blue rectangles) were derived after incorporating orthologous relationships with human and STRING protein-protein interaction data.

Phytochemicals Alter Gut Microbial Gene Expression


on Decem


bio.asm.org/

Dow

nloaded from

http://www.ncbi.nlm.nih.gov/protein/P14618


mbio.asm.org


possibility of multiple phytochemicals targeting the same protein.These 115 DE targets were not found to be significantly differen-tially expressed when comparing the animal-based diet (4) againstits baseline (FDR P value � 0.20 by Wilcox rank sum test). Nota-bly, the targets of food compounds had a significantly higher pro-portion of DE genes than the nontargeted gut-bacterial proteins(20.7% versus 14.8%; P � 0.0016 by Fisher’s exact test). All thephytochemicals targeting these 115 proteins are below 700 Da,and therefore were expected to enter the bacterial cell. On the basisof the response of the gut microbiome and in order to move be-yond the restricted food set used in this study, we also attemptedto identify other foods that would potentially induce the equiva-lent or similar effect. This endeavor was bridged by the common-alities in the phytochemical composition between the original 14foods used in this study and the 1,772 plant-based foods availablein the NutriChem database. Using as input the set of the 59 phy-tochemicals present in these 14 foods with DE targets, we searchedthe NutriChem database for alternative foods that show the high-est overlap in the phytochemical space. More than 550 foods wereretrieved, while some representative examples in this top 20 foodlist (see Table S4 in the supplemental material) include Ginkgobiloba (ginkgo; 41 shared compounds), sea buckthorn (Hippo-phae; 33 shared compounds), and Lonicera japonica (honeysuckle;28 shared compounds).

Looking into the biological processes mostly affected by thephytochemical-protein interactions, we found that the 115 DEtargets were significantly enriched in 11 pathways (FDR P value �0.05 by Fisher’s exact test) (Fig. 2A), which provided functionalinsights into the gut microbial targets. All of these enriched path-ways are related to metabolism, such as glucose metabolism, tri-carboxylic acid (TCA) cycle (also known as the citric acid cycle),and metabolism of amino acids and derivatives. In contrast, thenon-DE targets displayed a very different profile of significantlyenriched pathways, exemplified by DNA repair and translation-related processes (Fig. 2A). To inspect this discrepancy in a morestatistical manner, we compared the pathway differences betweenDE and non-DE targets using a pathway overrepresentation anal-ysis. We focused on the top 12 significantly enriched pathways forboth groups as shown in Fig. 2A. Seven out of the 12 pathways forthe DE targets (as indicated by asterisks in Fig. 2A) showed signif-icant overrepresentation compared directly with the non-DEgroup (P � 0.05 by Fisher’s exact test); moreover, all seven path-ways were metabolic processes. Conversely, all the overrepre-sented pathways in the non-DE group relative to the DE groupwere not closely related to metabolism. This suggests that the phy-tochemicals present in plant-based diet mainly exert their effecton gut microbiome by altering microbial metabolic activities.Taking the 15 most up- or downregulated genes (fold change) asrepresentatives, we further zoomed in the reactions those DE tar-gets are involved in. The solute carrier family 22 member 1 (or-ganic cation transporter 1, UniProt accession no. Q15245) andGTP:AMP phosphotransferase AK3 (UniProt accession no.Q9UIJ7) were the two most upregulated genes, whereas replica-tion factor C subunit 3 (UniProt accession no. P40938) and glu-cose transporter 3 (UniProt accession no. P11169) were the mostdownregulated genes in the list. The results (Fig. 2B; see Ta-ble S3 in the supplemental material) also show that the majority ofthe phytochemical targets with the highest fold change in geneexpression activity participate in metabolic enzymatic reactions,such as the acetyl coenzyme A (acetyl-CoA) acetyltransferase

(UniProt accession no. Q9BWD1) and aldehyde dehydrogenase(UniProt accession no. P47895). In the list with the most up- ordownregulated genes, there are also several genes with “auxiliary”functions such as ATP binding (UniProt accession no. P61221),potentially playing a regulatory role in response to dietary intake.Phosphopyruvate hydratase (UniProt accession no. P06733) andalcohol dehydrogenase (UniProt accession no. P11766) are theenzymes interacting with the highest number of compoundswithin up- or downregulated gene lists, respectively. However,what also caught our attention in this list is that neither the num-ber of compounds nor number of foods targeting gut-bacterialproteins can be used as an indicator for the observed fold changesin gene expression activity level.

Next, we investigated the differences between DE and non-DEphytochemical targets, both from a chemical and biological per-spective, in an attempt to understand why some of the food targetswill show a significant difference in gene expression levels whileothers will not. First, the chemical features of the compounds wereused to examine whether the DE and non-DE proteins could bedistinguished. From the principal component analysis (PCA)based on the chemical descriptors of the 99 compounds, no clearseparation can be observed between compounds targeting DEproteins and non-DE proteins (data not shown). In addition, thestrength of the compound-protein interactions (based on bioac-tivity data retrieved from the ChEMBL database) was comparedbetween DE and non-DE target groups using a set of compoundsthat target proteins from both groups. As discussed above, thecompounds targeting both DE and non-DE proteins did not showsignificantly different binding affinities toward the two targetgroups (P � 0.05 by Student’s t test). These results suggest that theintrinsic biological properties of the target proteins, rather thanthe chemical characteristics of the compounds or the strength ofinteractions between compounds and proteins, are probably as-sociated with the variation in the community gene expression un-der mediation by phytochemicals. This is also consistent with ourprevious observation that the targeted genes with the highest foldchange activity are not necessarily the ones targeted by the highestnumber of compounds (Fig. 2B).

Therefore, the topological properties of the target proteins inthe orthologous PPI network were further investigated (Fig. 2C).For the whole orthologous PPI network, the power-law distribu-tion of degree connectivity was observed, consistent with the com-mon feature of many biological networks (27) (see Fig. S3A in thesupplemental material). Compared to nontarget proteins, thephytochemical targets were mainly positioned in the central area,possessing higher connectivity (Fig. 2C and Fig. S3B). Withinthese targets, the majority of the network centrality measures, in-cluding degree and closeness centrality, were found to be signifi-cantly different between DE and non-DE targets (P � 0.01 byWilcoxon rank sum test). More specifically, the DE targets of thefood compounds tend to be located in a more “central” positionwithin the PPI network than the non-DE ones (Fig. 2C and D), asreflected by the higher centrality measurements (as well as thelower average shortest path length) of the DE group. The highercloseness centrality of the DE targets indicates that the informa-tion spread is generally faster from a given DE protein to all otherconnected nodes in the network. The DE targets also exhibitedhigher betweenness centrality, which measures the amount ofcontrol a node has on the information spread between othernodes. This implies that compared to non-DE targets, DE targets

Ni et al.


on Decem


bio.asm.org/

Dow

nloaded from

http://www.ncbi.nlm.nih.gov/protein/Q9UIJ7



http://www.ncbi.nlm.nih.gov/protein/Q9BWD1





mbio.asm.org


050

100

150

200

Protein degree connectivity

DE targets Non-DE targets

0.00

00.

002

0.00

40.

006

0.00

8

Protein BetweenessCentrality


0.30

0.35

0.40

0.45

0.50

Protein ClosenessCentrality


2.0

2.2

2.4

2.6

2.8

3.0

3.2

3.4

AverageShortestPathLength


0.75

0.80

0.85

Protein Radiality


0e+0

02e

+05

4e+0

56e

+05

8e+0

5

Protein Stress


Galactose catabolism

Methylation

Glycolysis

Citric acid cycle (TCA cycle)

Gluconeogenesis

Pyruvate metabolism and Citric Acid (TCA) cycle

Respiratory electron transport

The citric acid (TCA) cycle and respiratory electron transport

Glucose metabolism

Metabolism of amino acidsand derivatives

Metabolism of carbohydrates

0 5 2010 15

3%

3%

4%

5%

5%

5%

5%

9%

10%

11%

16%

REACT_532

REACT_6946

REACT_1383

REACT_1785

REACT_1520

REACT_1046

REACT_22393

REACT_111083

REACT_723

REACT_13

REACT_474

**

**

*

*

*

DNA strand elongation

Eukaryotic Translation Termination

Formation of a pool of free 40S subunits

L13a-mediated translational silencing of Ceruloplasmin expression

GTP hydrolysis and joining of the 60S ribosomal subunit

Eukaryotic Translation Initiation

Peptide chain elongation

Eukaryotic Translation Elongation

SRP-dependent cotranslational protein targeting to membrane

DNA Repair

Metabolism of nucleotides

REACT_932

REACT_1986

REACT_1797

REACT_79

REACT_2085

REACT_2159

REACT_1404

REACT_1477

REACT_115902

REACT_216

REACT_1698

***

*****

0 5 2010 15

3%

5%

5%

5%

5%

5%

5%

5%

5%

5%

7%

DE

targ

ets

non-

DE

targ

ets

UniProt Regulation Type # of compounds # of foodsO15245 up Transporter 2 2Q9UIJ7 up EnzymeO14556 up EnzymeP30084 up EnzymeP47895 up EnzymeQ16836 up Enzyme

Q9BWD1 up EnzymeP11498 up EnzymeP06733 up Enzyme 13 11A6NJ78 up EnzymeQ9UJZ1 up Ligand binding 39 12P35249 up Replication factor 47 12Q86XE5 up Enzyme 0P13804 up Electro acceptorQ99757 up EnzymeP40938 down Replication factor 47 12P11169 down Transporter 1 1P78417 down EnzymeQ8N0X4 down EnzymeQ9H9Y6 down EnzymeP61221 down Ligand binding 39 12P25685 down Chaperone 9 9P11766 down Enzyme 54 14P23396 down Ribosomal proteinP11182 down EnzymeO14734 down EnzymeQ15070 down Other 39 12Q9Y234 down EnzymeO95477 down Ligand bindingP36957 down Enzyme

3 53 54 82 33 52 36 9

2 3

5 14 84 6

1 12 32 3

2 33 42 3

1 12 44 6

The 15 most up or down regulated genes

C

D

BA

FIG 2 Functional and topological analyses of the differentially expressed (DE) and non-DE phytochemical targets. (A) Significantly enriched pathways for DEtargets and non-DE targets (only the top 11 pathways are shown for non-DE targets). The asterisks indicate the pathways that were significantly overrepresentedcompared to the other group in a direct comparison. (B) List of the 15 most (fold change) up- or downregulated genes (UniProt identifier [ID] or accession no.and type are also shown) targeted by phytochemicals directly or through PPI. The number of compounds targeting these genes and the number of foodscontaining these compounds were listed as well. (C) Complete microbiota-specific protein-protein interaction network. The DE phytochemical targets (DEtargets tend to be located in more central positions within the network) are shown in red, while the non-DE ones are shown in yellow. The nodes in gray indicateproteins that are not phytochemical targets (nontargets). For clarity, not all network edges were shown here. (D) The boxplots of six network topologicalproperties that were significantly different between DE (red) and non-DE targets (yellow) are shown. The y axes of boxplots show the calculated values ofcorresponding topological properties.



on Decem


bio.asm.org/

Dow

nloaded from

mbio.asm.org


are more likely to be the “bridges” between clusters in the net-work. We also found that more than 80% of the interactionswithin the PPI network are between proteins found in the samegenome. With the reduced PPI network (filtering out interspeciesconnections), the results and conclusions derived from the net-work topological analysis discussed above still stand.

Taxonomic contribution to gene expression variations of thephytochemical targets. Since the phytochemicals present in theplant-based diet could potentially contribute to a gene expressionresponse of the gut microbiota, the follow-up question was whichmicrobe or groups of microbes were mostly influenced or contrib-uting most to the variations in gene expression at the communitylevel for each of the gene categories targeted by the food com-pounds. We calculated the overall expression change of each phy-lum for the 115 DE targets, and subsequently the contribution ofeach phylum in the whole microbial community (Fig. 3). Thephylum Bacteroidetes was found to possess the “broadest respond-ing spectrum,” i.e., it contributed highly to the differential expres-sion of more targeted genes than other phyla. Within the group oftarget genes to which Bacteroidetes contributed predominantly(denoted by the bright blue rectangle in Fig. 3), it was noticed thatmost genes were downregulated at the community level. GeneOntology (GO) annotation indicated that genes from this groupmainly participate in the carboxylic acid metabolic process (GO:0019752), cellular amino acid metabolic process (GO:0006520),as well as RNA and protein metabolic processes. Firmicutes, al-though without the widest spectrum, was the phylum responsiblefor differential expression of most upregulated target genes, whichare mainly involved in biological processes such as glucose (GO:0006006), hexose (GO:0019318), carboxylic acid (GO:0019752),and fatty acid (GO:0006631) metabolic processes. In addition tothe predominant and specific contribution of Firmicutes towardupregulated genes, it also negatively contributed (i.e., oppositedirection of expression change compared to that of the wholecommunity) to a cluster of downregulated genes, which are shownas the distinct blue blocks in Fig. 3. However, this effect was over-ridden by the opposite effect from Bacteroidetes, leading ulti-mately to a community-level downregulation. The group of targetgenes to which Actinobacteria predominantly contributed displaysa distinct profile with several regulatory processes, which are notpresent in the groups of other phyla. All the target genes in thegroups for both Actinobacteria and Proteobacteria had decreasedexpression at the community level.

We further investigated how various microorganisms re-sponded and contributed to variations in gene expression at thespecies level. As can be seen from Fig. S4 in the supplementalmaterial, the responses were phylogenetically diverse, even withinthe same phylum. While a group of species in one particular phy-lum may be the major contributors to community-level expres-sion change of targeted genes, the other members in that phylummay contribute very little. Taking Firmicutes as an example, 50%of the species in this phylum did not show noticeable contributionto the variation in expression of the 115 DE targets. On the otherhand, the majority of those upregulated DE targets primarily stemfrom a few particular species, including Eubacterium rectale,Faecalibacterium prausnitzii, Roseburia hominis, Roseburia intesti-nalis, Coprococcus, and Ruminococcus bromii. The first three spe-cies are the main butyrate-producing bacteria in the human gut(28, 29). With a role in anti-inflammatory process, their reducedpopulations have been reported previously as being correlated

with the development of diseases such as Crohn’s disease (30) andulcerative colitis (31). Similar patterns were also observed in theother three main phyla. Two single species, Bifidobacteriumlongum from Actinobacteria and Escherichia coli from Proteobacte-ria, dominated the corresponding phylum contribution towardcommunity-level expression variations. Furthermore, these twospecies together were mainly responsible for the community-leveldownregulation of most target genes.

The analyses discussed above provided information about therole of each phylum within the whole gut microbial community.In order to understand better the variation of genuine transcrip-tional activity for each phytochemical target within each phylum,we further normalized the gene expression levels by the phylum-level abundance and compared this normalized transcriptionalactivity after being on the diet with that of the baseline period. Thisfurther elucidated whether the higher gene expression level wasderived from a high transcription level or simply dominant phy-lum abundance. As shown in Fig. 4, the phyla Proteobacteria, Ac-tinobacteria, and Verrucomicrobia had decreased transcription ac-tivities for almost all DE targets, while Firmicutes andEuryarchaeota were the two dominant phyla having higher tran-scription levels. For most of the upregulated DE targets, Firmicutesdisplayed the highest increase of transcription activity, which wasin concordance with its predominant contribution to the commu-nity differential expression of upregulated genes mentionedabove. Interestingly, the phylum Euryarchaeota, which consistsmainly of methanogens, had increased transcription activities formost target genes that it expressed. This unique pattern was “hid-den” in the contribution analysis due to its low abundance in themicrobial community.

Designing dietary interventions: the case study of SCFA me-tabolism. Short-chain fatty acid (SCFA) metabolism has been ex-tensively cited as being associated with human health and diseasepathology, playing a pivotal role in regulating the biological pro-cesses in the human gut and colon (32, 33). The levels of SCFAs inthe gut, especially butyrate and propionate, have been connectedwith the development of different diseases ranging from meta-bolic and inflammatory diseases to cancers (34, 35). Positive cor-relations were also reported between the levels of certain SCFAsand particular microbes within the phylum of Firmicutes (Rose-buria sp., E. rectale, and F. prausnitzii) (4). Since altering the levelsof SCFAs could have potentially remarkable health benefits, weused our computational framework here for predicting diets thatcould potentially alter the activity of the SCFA-related pathways.

The butyrate metabolism (Kyoto Encyclopedia of Genes andGenomes [KEGG] ko00650) consists of 63 reactions (in terms ofKEGG reaction identifiers [IDs]) involving 41 metabolites, whilethe propionate metabolism (KEGG ko00640) is composed of 64reactions involving 44 metabolites (Fig. 5A). In order to identifyfoods that could alter the activity of butyrate or propionate me-tabolism, two approaches were adopted here: in NutriChem andChEMBL databases, (i) we searched for food phytochemicals thathave been experimentally tested to interact with proteins involvedin those two SCFA metabolic pathways; (ii) we searched for foodscontaining phytochemicals that are essentially metabolites in-volved in butyrate and propionate metabolism; thus, they are ex-pected to interact with the metabolic proteins. Although therewere no phytochemicals identified in the ChEMBL database thattarget SCFA metabolism-related proteins, the latter approachyielded a set of 98 foods containing 31 phytochemicals (corre-

Ni et al.


on Decem


bio.asm.org/

Dow

nloaded from

mbio.asm.org


Firmicu

tes

Bacter

oidete

s

Euryarc

haeo

ta

rruco

microb

ia

Actino

bacte

ria

Proteo

bacte

riaENSG00000124767ENSG00000173599ENSG00000140374ENSG00000100348ENSG00000241935ENSG00000169519ENSG00000114480ENSG00000165283ENSG00000158125ENSG00000147853ENSG00000138796ENSG00000175003ENSG00000120437ENSG00000163918ENSG00000105679ENSG00000127884ENSG00000122729ENSG00000160179ENSG00000168906ENSG00000067704ENSG00000151413ENSG00000153574ENSG00000096384ENSG00000258429ENSG00000078070ENSG00000023697ENSG00000178802ENSG00000119689ENSG00000137992ENSG00000126088ENSG00000165140ENSG00000108515ENSG00000125166ENSG00000155463ENSG00000132002ENSG00000060971ENSG00000002549ENSG00000125246ENSG00000125630ENSG00000148834ENSG00000197894ENSG00000120265ENSG00000198931ENSG00000059573ENSG00000176974ENSG00000140451ENSG00000178035ENSG00000172172ENSG00000198650ENSG00000114686ENSG00000085760ENSG00000131979ENSG00000074800ENSG00000117118ENSG00000048392ENSG00000213930ENSG00000168393ENSG00000197375ENSG00000101473ENSG00000169919ENSG00000164163ENSG00000165029ENSG00000154330ENSG00000059804ENSG00000133119ENSG00000051341ENSG00000149273ENSG00000142657ENSG00000131747ENSG00000115286ENSG00000144182ENSG00000196365ENSG00000110717ENSG00000158864ENSG00000079739ENSG00000065833ENSG00000100823ENSG00000151806ENSG00000126522ENSG00000177302ENSG00000167325ENSG00000166986ENSG00000139131ENSG00000011376ENSG00000117593ENSG00000144381ENSG00000166411ENSG00000167588ENSG00000138430ENSG00000062485ENSG00000100294ENSG00000116984ENSG00000065911ENSG00000137513ENSG00000151093ENSG00000100994ENSG00000091483ENSG00000106992ENSG00000151224ENSG00000138035ENSG00000073578ENSG00000108479ENSG00000013503ENSG00000101444ENSG00000166800ENSG00000138801ENSG00000178445ENSG00000143627ENSG00000184254ENSG00000101347ENSG00000167996ENSG00000132740ENSG00000146085ENSG00000171298ENSG00000083123

0.5 0.5 1.5

Contribution

GO:0006006 glucose metabolic processGO:0006260 DNA replicationGO:0019318 hexose metabolic processGO:0006281 DNA repairGO:0006259 DNA metabolic processGO:0044267 cellular protein metabolic processGO:0019219 regulation of nucleobase, nucleoside, nucleotide

and nucleic acid metabolic processGO:0031326 regulation of cellular biosynthetic process

GO:0019752 carboxylic acid metabolic processGO:0006520 cellular amino acid metabolic processGO:0044106 cellular amine metabolic processGO:0016070 RNA metabolic processGO:0044267 cellular protein metabolic processGO:0006753 nucleoside phosphate metabolic processGO:0019318 hexose metabolic processGO:0006412 translationGO:0044275 cellular carbohydrate catabolic processGO:0006006 glucose metabolic process

GO:0006006 glucose metabolic processGO:0019318 hexose metabolic processGO:0019752 carboxylic acid metabolic processGO:0034637 cellular carbohydrate biosynthetic processGO:0006631 fatty acid metabolic process

GO:0019752 carboxylic acid metabolic processGO:0044267 cellular protein metabolic processGO:0009108 coenzyme biosynthetic processGO:0045333 cellular respirationGO:0046395 carboxylic acid catabolic processGO:0006006 glucose metabolic processGO:0046394 carboxylic acid biosynthetic processGO:0019318 hexose metabolic processGO:0006753 nucleoside phosphate metabolic processGO:0006464 protein modification process

up-regulated genes

down-regulated genes

Bacteroidetes

Actinobacteria

Proteobacteria

Firmicutes



on Decem


bio.asm.org/

Dow

nloaded from

mbio.asm.org


sponding to 18 and 19 metabolites from butyrate and propionatemetabolism, respectively) (Fig. 5A). Of these 98 foods, 22 containat least two phytochemicals involved in butyrate/propionate me-tabolism with strawberry, mung bean, and soybean possessing themost (Fig. 5B). Interestingly, 8 of the foods used in the originalstudy of David et al. (4)—where the authors detected significantlyhigher levels of SCFAs in the plant-based diet than in the animal-based diet—were part of the list containing the 98 foods (and 5 ofthem were part of the subgroup of the 22 foods), supporting ournotion that the detailed food composition could help us partlyexplain the observed changes in the gut microbial function andactivity.

To offer additional evidence that dietary interventions con-taining these 22 foods could potentially trigger changes in theSCFA gut metabolism, we developed a food-disease networkbased on experimental studies collected in the NutriChem data-base. We could retrieve 150 disease phenotypes (in terms of Dis-ease Ontology [36] IDs) for these 22 foods (see Fig. S5 in thesupplemental material) with breast cancer, hepatocellular carci-noma, and diabetes appearing as the top three diseases connectedto the highest number of foods (11, 11, and 10, respectively),whereas sweet peppers, tomatoes, and buckwheat have the highestnumber of disease links. While hepatocellular carcinoma and di-abetes have also been linked to gut microbial imbalances (37, 38),we focus here on a subset of the food-disease network, which isrelated to colon cancer, a disease highly associated with the levelsof SCFAs (39). We found that 9 out of the 22 foods have beenassociated with colon cancer (Fig. 5B); interestingly, these 9 foodswere significantly overrepresented in colon cancer than any otherfood-disease associations in the NutriChem database (P � 0.001by Fisher’s exact test). Most of the food-disease associations pres-ent in the literature (and subsequently in the NutriChem data-base) rely on feeding experiments of human or animal models andmonitoring the progress/development of a disease. The method-ology presented here, which relies on the known phytochemicalcomposition of diet and the potential interactions with the gut-specific bacterial genes, could serve as a route for understandingmechanistically the observed phenotypic responses.

Furthermore, we identified the most vulnerable protein targetsof the butyrate and propionate metabolic pathways using theirtopological features (based on the comparisons between DE andnon-DE proteins described above). The proteins in those twometabolic pathways were first ranked according to six significanttopological features independently (degree, closeness centrality,betweenness centrality, radiality, stress, and average shortest pathlength). Then, these six ranked lists were aggregated into a com-bined ranked list in the order of decreasing centrality (or vulner-ability) (see Table S5 in the supplemental material), providinginformation on which protein targets are more likely to changegene expression activity (Fig. 5A). In butyrate metabolism,3-hydroxyacyl-CoA dehydrogenase (K00022), acetyl-CoA acetyl-

transferase (K00626), and enoyl-CoA hydratase (K07511) werefound to be the proteins that if targeted by dietary phytochemicals,significant changes in the gene expression activity should be ex-pected. As to propionate metabolism, our topology analysis indi-cates succinyl-CoA synthetase (GenBank accession no. K01899),L-lactate dehydrogenase (DDBJ accession no. K00016), andmethylmalonyl-CoA mutase (GenBank accession no. K01847) asthe most vulnerable protein targets.

DISCUSSION

The value of the gastrointestinal community members and theirinteraction with the host through a variety of signaling moleculesand mechanisms has become widely recognized (40). Clear evi-dence of this realization is the research spending on the analysis ofthe symbiotic relationships between humans and their indigenousmicrobiome, which is estimated to be more than $500 million inthe last 5 years (41). Metagenomic studies, such as the MetaHitproject (42) and Human Microbiome Project (43), provided themultimillion-gene potential of the intestinal microbiome,whereas metatranscriptomic studies (4, 44) provided the subsetsof these genes that are expressed as a response to a given stimulus.In particular, diet is considered one of the major modulators of theintestinal microbiota and a dominant source of variation in itscomposition (45). However, even though occasionally measure-ments of specific biomarkers in the feces were performed, the fullinteraction spectrum between the food, its molecular compo-nents, and the microbiota was neglected.

It is worthy to note that xenobiotics, including drugs and an-tibiotics, which can be considered small-molecule analogs of di-etary phytochemicals, have been demonstrated to alter the geneexpression of active gut microbiota significantly (46). Here weprovide a molecular-level analysis of the extent and nature of thegut microbiome responsiveness to diet and a computational plat-form for understanding mechanistically how food componentscould drive the activity and function of the gut ecosystem to dif-ferent states. The development of a chemical-protein network be-tween the food components of a plant-based dietary interventionand the human gut-bacterial proteome space revealed that ~20%of the microbiome targets altered significantly their gene expres-sion activity at a community level. Our analysis also indicates thatthe metabolic pathways in the gut microbiota are more likely tochange activity and are easier to be interrupted compared to otherbiological processes, when targeted by dietary phytochemicals.Furthermore, while the chemical features of the phytochemicalscould not shed light on which of the bacterial targets will alter theirgene expression activity, the network analysis revealed the highercentrality within the bacterial PPI network as the most prominentcharacteristic of the DE bacterial targets of phytochemicals. Pre-vious studies have indicated that highly connected nodes, or“hubs,” in the scale-free cellular biological network tend to be theessential genes (47, 48), which will have less expression changes in

FIG 3 Phylum contribution toward the community-level gene expression variations of the differentially expressed (DE) targets. The heatmap illustrates theindividual contribution of each phylum present in gut microbiota for each DE phytochemical target. The 115 DE targets were further separated into two classes,upregulated genes and downregulated genes. The red blocks represent a positive contribution of the phylum toward community-level expression variation (thedirection of change for that phylum in one particular target gene was in agreement with that of the whole microbiota community, regardless of up- ordownregulation). The blue blocks indicate that the expression change of these genes for that phylum were in the direction opposite that of the community (i.e.,negative contribution). For each of the four main phyla, a cluster of genes was identified (enclosed by a rectangle, with the phylum name was displayed on theright), indicating the dominating effect of that phylum on gene expression variations. Gene Ontology annotations (biological processes) for the four gene clusterswere also provided.

Ni et al.


on Decem


bio.asm.org/

Dow

nloaded from

http://www.ncbi.nlm.nih.gov/nuccore?term=K01899

http://www.ncbi.nlm.nih.gov/nuccore?term=K01847

mbio.asm.org


Prote

obac

teria

Actino

bacte

ria

erru

com

icrob

ia

Bacte

roide

tes

Firmicu

tes

Eurya

rcha

eota

ENSG00000124767ENSG00000173599ENSG00000140374ENSG00000100348ENSG00000241935ENSG00000169519ENSG00000114480ENSG00000165283ENSG00000158125ENSG00000147853ENSG00000138796ENSG00000175003ENSG00000120437ENSG00000163918ENSG00000105679ENSG00000127884ENSG00000122729ENSG00000160179ENSG00000168906ENSG00000067704ENSG00000151413ENSG00000153574ENSG00000096384ENSG00000258429ENSG00000078070ENSG00000023697ENSG00000178802ENSG00000119689ENSG00000137992ENSG00000126088ENSG00000165140ENSG00000108515ENSG00000125166ENSG00000155463ENSG00000132002ENSG00000060971ENSG00000002549ENSG00000125246ENSG00000125630ENSG00000148834ENSG00000197894ENSG00000120265ENSG00000198931ENSG00000059573ENSG00000176974ENSG00000140451ENSG00000178035ENSG00000172172ENSG00000198650ENSG00000114686ENSG00000085760ENSG00000131979ENSG00000074800ENSG00000117118ENSG00000048392ENSG00000213930ENSG00000168393ENSG00000197375ENSG00000101473ENSG00000169919ENSG00000164163ENSG00000165029ENSG00000154330ENSG00000059804ENSG00000133119ENSG00000051341ENSG00000149273ENSG00000142657ENSG00000131747ENSG00000115286ENSG00000144182ENSG00000196365ENSG00000110717ENSG00000158864ENSG00000079739ENSG00000065833ENSG00000100823ENSG00000151806ENSG00000126522ENSG00000177302ENSG00000167325ENSG00000166986ENSG00000139131ENSG00000011376ENSG00000117593ENSG00000144381ENSG00000166411ENSG00000167588ENSG00000138430ENSG00000062485ENSG00000100294ENSG00000116984ENSG00000065911ENSG00000137513ENSG00000151093ENSG00000100994ENSG00000091483ENSG00000106992ENSG00000151224ENSG00000138035ENSG00000073578ENSG00000108479ENSG00000013503ENSG00000101444ENSG00000166800ENSG00000138801ENSG00000178445ENSG00000143627ENSG00000184254ENSG00000101347ENSG00000167996ENSG00000132740ENSG00000146085ENSG00000171298ENSG00000083123

−10 0 10

Variation

GO:0006006 glucose metabolic processGO:0006260 DNA replicationGO:0019318 hexose metabolic processGO:0006281 DNA repairGO:0006259 DNA metabolic processGO:0044267 cellular protein metabolic processGO:0019219 regulation of nucleobase, nucleoside, nucleotide

and nucleic acid metabolic processGO:0031326 regulation of cellular biosynthetic process

GO:0019752 carboxylic acid metabolic processGO:0006520 cellular amino acid metabolic processGO:0044106 cellular amine metabolic processGO:0016070 RNA metabolic processGO:0044267 cellular protein metabolic processGO:0006753 nucleoside phosphate metabolic processGO:0019318 hexose metabolic processGO:0006412 translationGO:0044275 cellular carbohydrate catabolic processGO:0006006 glucose metabolic process

GO:0006006 glucose metabolic processGO:0019318 hexose metabolic processGO:0019752 carboxylic acid metabolic processGO:0034637 cellular carbohydrate biosynthetic processGO:0006631 fatty acid metabolic process

GO:0019752 carboxylic acid metabolic processGO:0044267 cellular protein metabolic processGO:0009108 coenzyme biosynthetic processGO:0045333 cellular respirationGO:0046395 carboxylic acid catabolic processGO:0006006 glucose metabolic processGO:0046394 carboxylic acid biosynthetic processGO:0019318 hexose metabolic processGO:0006753 nucleoside phosphate metabolic processGO:0006464 protein modification process

up-regulated genes

down-regulated genes

Bacteroidetes

Actinobacteria

Proteobacteria

Firmicutes



on Decem


bio.asm.org/

Dow

nloaded from

mbio.asm.org


mild or moderate conditions (49). However, this might not beobserved when an external stimulus is present (as shown in thisstudy) or in a completely different physiological condition such asdisease state (50). As shown here, upon dietary stimulus, thesehighly connected targets are more “vulnerable” (i.e., more suscep-tible to gene expression variation) and would generate a strongerglobal effect upon perturbation. From a therapeutic perspective,these more vulnerable (metabolic) proteins with high centralitycould serve as promising targets in phytochemical-based dietaryinterventions. Furthermore, our phylum-level investigation re-vealed that different phyla of bacteria in the human gut wouldrespond differently in terms of target gene expression activityupon phytochemical perturbation. More specifically, the altered

expression levels of targeted genes related to metabolic processeswere predominantly attributed to different dedicated phyla. Thewhole microbiota community thus works coordinately to achievethe overall actions in response to dietary interventions. Thus,identifying specific-disease-associated signatures in the gut mi-crobiota and products that alter microbial populations or blockspecific bacterial metabolites are expected to lead to the first gen-eration of microbiome therapies.

Even though the role of diet in health and disease states hasbeen recognized for years, the majority of studies in the literatureeither treated food as a black box or the focus was on carbohy-drates, lipids, and fiber, ignoring the small-molecule space. Ignor-ing the systemic role of phytochemicals was attributed to their

FIG 4 Variation of normalized transcriptional activity of DE targets in microbial phyla. The 115 DE targets were divided into two classes, upregulated genes anddownregulated genes. The red blocks represent increased transcriptional level of the gene by the particular microbial phylum, while blue blocks indicatedecreased transcription level. For clarity, logE transformation was performed on the values measuring the variation of transcriptional activities (when the valuewas negative, the absolute value was used for log transformation first and then turned into a negative value again). The four clusters of genes enclosed by rectangleswere the ones from Fig. 3.

14 1

15 1

Celery

1,2-PropanediolopaTrifolium incarnatum

AcetoinAGuava

PotatoesFenugreek

umRaspberry

Mustard greens

Asian rice

2-Hydroxypropionaldehyde

Strawberries

MethylglyoxalAcetate2-Propanol

Glutamic AcideSuccinate

1-Aminocyclopropane-1-carboxylate

Fumarate;Maleate

Mung beans

Soybean

Tomatoes

Sweet pepper

Sweet potato

Peas

2,3-Butanedione

N-Propanol

Propionate

23-Butanediol

2-Ketoglutaric Acid

PropionaldehydeSuccinyl-CoA

Bryonia alba

Beta-HydroxybutyricAcid

Rye

Dihydroxyacetonephosphate

Fallopia multiflora

Spinach

Sunroot

Opium poppy

Buckwheat

Pyruvic Acid

Colon adenocarcinoma

GuavaSweet potatoesColon carcinoma SoybeanTomatoes

BuckwheatColon cancer

Raspberry

Fallopia multiflora Potatoes

Strawberries

A B

FIG 5 Potential dietary interventions targeting the short-chain fatty acid (SCFA) metabolism. (A) The KEGG butyrate (top) (ko00650) and propionate(bottom) (ko00640) metabolic pathways are shown. For both pathways, the metabolites that were also found in 98 plant-based foods present in the NutriChemdatabase were highlighted in yellow. The proteins in the two metabolic pathways were ranked in decreasing order of vulnerability and colored from red (mostsusceptible to change in gene expression when targeted) to green (least susceptible to change in gene expression). (B) Subgroup of the 98 plant-based foods. (Top)The 22 foods (red or yellow ellipses) containing at least two phytochemicals/metabolites involved in butyrate and/or propionate metabolism and their associatedmetabolites (green triangle) are shown. The network is visualized in “degree sorted” style, with a decreasing order of node connectivity from middle bottomtoward counterclockwise direction. Strawberries, mung beans, and soybeans are the foods with the most metabolites and are shown in red. (Bottom) Subset ofthe 22 foods associated with colon cancer. Three different names for “colon cancer” in Disease Ontology were treated equivalently here.

Ni et al.


on Decem


bio.asm.org/

Dow

nloaded from

mbio.asm.org


presence in small amounts in our diet; however, we should notforget that drugs are also given in small amounts, and they have aprofound effect on our health. We believe that the development ofNutriChem, the database linking 1,772 plant-based foods with7,898 phytochemicals and 751 diseases, opened a window of op-portunities for understanding how the molecular components ofdiet interact with the human body, including, as shown here, thebacterial residents of our gut.

However, our analysis is not without limitations: while Nu-triChem is the most complete database of diet-disease associationsat this point in time, the majority of our understanding on theseassociations so far have originated from animal studies or even celllines, calling for more attention and research into this area tomake more accurate extrapolation to humans. It should also benoted that, despite the advances of computational methods, thelack of experimental bioactivity data of small molecules on bacte-rial proteins in public databases is a bottleneck for producing ac-curate ligand-target interaction networks. Further expansion ofour knowledge of the molecular composition of each food willprovide a more accurate picture of the food-microbiome interac-tion network and one more tool for better dietary/therapeuticstrategies. Last but not least, the gene expression changes of theproteins targeted by dietary phytochemicals will not necessarily betranslated into changes of the protein levels. Therefore, additionalmetaproteomic analysis could yield important insights into caus-ative relationships between particular perturbations and the re-spective enzymes/proteins. Furthermore, and since the binding ofa phytochemical to a bacterial target could disrupt the functionwithout affecting the protein’s level, metabolomic data would al-low the establishment of more-accurate linkages between dietarymolecules and the gut ecosystem phenotype.

Even though the framework presented here for modulating thefunctionality and activity of the gut microbiota relies on targetingspecific bacterial genes by food phytochemicals, it can be furtherextended for targeting harmful bacteria, e.g., Bilophila wadswor-thia (51) and Fusobacterium nucleatum (52), which can lead toacute inflammation and potentiate intestinal tumorigenesis, re-spectively, based on the disturbance of species-specific essentialprotein networks.

MATERIALS AND METHODSRNA-Seq data description. The transcriptome sequencing (RNA-Seq)data originated from a recently published study (4) and were deposited inthe Gene Expression Omnibus (GEO) with accession number GSE46761.In that study, the community-wide gene expression status in human gutmicrobiota before (4 and 1 days) and after (3 and 4 days) dietary inter-ventions was measured. To investigate the diet-microbiome interactionsat a molecular level, only the plant-based diet was used here and only theplant foods for which molecular compositions are available. Nine studyvolunteers (subject 1 and 3 to 10) were involved in both baseline andintervention periods of the plant-based diet and were used here. Detaileddata description and more information on experimental design are avail-able in the original publication (4).

Food-phytochemical and food-disease associations. For the foods inthe plant-based diet, their phytochemical composition and disease asso-ciations were retrieved from the NutriChem database (http://www.cbs.d-tu.dk/services/NutriChem-1.0/). After redundancy removal, a collectionof unique compounds formed the phytochemical space of the plant-baseddiet. With regard to the food-disease network, only the food-disease as-sociations supported by more than two references were included (infor-mation from NutriChem database). The third-level Disease Ontologyterm was used to define the disease category. In addition to the foods

present in the plant-based diet, all other links between foods, phytochemi-cals, and diseases were obtained through NutriChem.

Similar chemical space between phytochemicals and drugs. TheSMILES (simplified molecular input line entry system) strings of 437 foodcompounds were retrieved from PubChem (53), while the 1,536 FDA-approved small-molecule drugs and their associated SMILES were ob-tained from DrugBank (54) version 4.1. Based on these structural infor-mation, the RDKit plugin (http://www.rdkit.org) in KNIME (55) wasemployed for the calculation of compound molecular and physical chem-ical descriptors, including 1,024-bit Morgan circular fingerprint, topolog-ical polar surface area (TPSA), octanol-water partition coefficient (SlogP),molecular weight (MW), and numbers of Lipinski hydrogen bond accep-tors (HBA) and donors (HBD). Afterward, a matrix of compound de-scriptors was constructed. Compounds from different groups (e.g., phy-tochemicals or drugs) were distributed along the rows, whereas the 1,024-bit molecular fingerprint and five other properties constituted the 1,029columns. All the principal component analyses (PCAs) were performedinside R.

A subset of all the FDA-approved drugs that according to DrugBankare targeting metabolic pathways was selected; the subset was defined as“drugs with metabolic targets” in this study. Within this category, drugswhose targets have bacterial orthologs based on orthologous analysis (seebelow), were referred to as “drugs with metabolic targets having bacterialorthologs.”

Phytochemical-protein target bioactivity data. Initially, the foodcompounds were mapped to exactly matched compounds in the ChEMBLdatabase (25) or structurally similar ChEMBL compounds, using InChIkey and Morgan circular fingerprints, respectively. Two compounds weredeemed similar when they had a Tanimoto coefficient (calculated fromMorgan fingerprints) higher than 0.85 and their difference in molecularweight was lower than 50 g/mol. The protein targets of the food com-pounds were subsequently retrieved; only interactions within the follow-ing bioactivity thresholds were kept for further analysis: for Ki, dissocia-tion constant (Kd), 50% inhibitory concentration (IC50), and 50%effective concentration (EC50), the p_chem value (calculated as the nega-tive of the logarithm to base 10 of the measured activity) was larger than 6;for inhibition, the measurement value was greater than 30%; for potency,the measurement value was lower than 50 �M (20). To deal with themultiple measurements of the same compound on the same protein, wecalculated a frequency of “positive” measurements among all candidatemeasurements. This was based on the aforementioned thresholds andserved as evidence of compound-protein interaction. Only chemical-protein interactions with a frequency higher than 50% were consideredconfident and were used to derive protein targets of phytochemicals fordownstream analysis (20).

Since the ChEMBL database contains mainly experimental data forinteractions of small molecules with human proteins, the developedchemical-microbial protein network was based on the orthologous rela-tionships between the gut microbial reference genomes and the humangenome. For analyzing the similarity of binding sites between human-microbe protein pairs of phytochemical targets used here, we first ran-domly selected one microbial protein from all orthologous proteins foreach human-microbe protein pair. Then, based on the full sequences ofeach protein pair, COACH (56) was used to identify the stretch of aminoacid sequence that forms the respective binding pocket. These sequencesat the predicted binding site were further compared between human andmicrobial proteins. The same analysis was also performed on randomlyselected human-microbe protein pairs retrieved from ChEMBL that havebeen experimentally tested to share the same ligands.

Depletion of noncoding sequences from raw RNA-Seq data. Toachieve accurate coding sequences (CDS) and functional annotations, aswell as accurate expression level (number of reads per kilobase pair permillion mappable reads [RPKM]) for reads in coding region, we removedall potential rRNA sequences using the following procedures. First, allannotated noncoding sequences from all bacterial references in NCBI



on Decem


bio.asm.org/

Dow

nloaded from

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46761

http://www.cbs.dtu.dk/services/NutriChem-1.0/

http://www.cbs.dtu.dk/services/NutriChem-1.0/

http://www.rdkit.org

mbio.asm.org


were downloaded (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/) and com-bined with Silva 16S rRNA sequences into a noncoding RNA database.Second, all the reads were mapped to this noncoding RNA database usingBLASTX with parameters “1e�5 -F F.” At last, we excluded the potentialnoncoding RNA reads if they were mapped to this noncoding RNA data-base with coverage �50%.

Definition of taxonomy information for reads. To define the taxon-omy of the RNA sequences, we retrieved 2,766 bacterial reference genomesequences from the NCBI database (January 2014 version). The informa-tion of the open reading frames (ORFs), their corresponding CDS andprotein sequences, as well as the taxonomy information of the strains werealso retrieved from NCBI. We further mapped all the reads (excludingpotential noncoding RNA) using the BWA (57) version 0.7.4-r385 tothese reference genomes. For each species with a relative abundance ofmore than 0.5% in at least one sample, the strain with the highest numberof total mappable reads from all samples was selected as the representativestrain for this species. Using these procedures, we identified 54 bacterialreference genomes (see Table S2 in the supplemental material). Based onthe alignment with highly identical reads (both identity and coverage ofthe read are more than 95%), SAMtools (58) was employed to manipulatethe BWA alignments. For each coding gene, per gene coverage was calcu-lated using BEDTools (59). The relative abundances of genes in each sam-ple were estimated using RPKM (number of reads per kilobase pair permillion mappable reads) using in-house scripts.

Definition of the ortholog proteins between human and bacteria.We first downloaded all human genes from Ensembl (60) (Ensembl Ge-nome assembly GRCH37 release 75). We selected the longest transcriptfor each gene locus as the representative transcript using in-house pythonscripts. The ortholog relationships between each bacteria gene set from 54reference genomes and human genes was defined using Inparanoid (61)with default parameters. The ortholog pairs between the human genomeand 54 bacterial genomes were 411 on average. We also carried out theortholog definition using reciprocal best-hit blast and found that �95%of the ortholog relationships defined are consistent between two methods.

Functional analysis of metatranscriptomic data. Thirty-eight sam-ples from nine subjects were kept for downstream analysis. For each sub-ject, the two different days (if present) within the same period (baseline ordietary intervention) were merged into one sample. The reads in prepro-cessed merged samples were then mapped to those 54 reference genomeswith BLASTN (e � 1e�05, coverage �70). The genes from the referencegenomes were further annotated with human orthologous genes as de-scribed above. The expression of each phytochemical target was quanti-fied by aggregation of all orthologous microbial genes and then normal-ized using RPKM. Afterward, Wilcoxon’s signed-rank test was employedto determine whether a gene displayed significantly differential expression(DE) after dietary intervention compared with the baseline status. Thefalse discovery rate (62) (FDR) method was used for correction of multi-ple hypothesis testing. Pathway mapping and enrichment analysis for DEand non-DE targets were conducted with Reactome (63) version 49. Adirect comparison of pathways in which DE and non-DE targets areinvolved was conducted with an overrepresentation analysis per-formed in R.

Ortholog-based PPI network and analysis. Since the proteins withinbiological systems rarely act in isolation, we also included the protein-protein interaction (PPI) data that originated from STRING (64) v9.05.Only PPIs with a score higher than 400 (representing a medium-confidence interaction as defined by the STRING authors) and for pairsfor which both human proteins have bacterial orthologs (as definedabove) were retrieved for construction of the microbiota-specific PPI net-work. The PPI network, as well as all other networks displayed in thisstudy, was visualized in Cytoscape (65) v3.2. The NetworkAnalyzer (66)plugin inside Cytoscape was employed for topological analysis, whichcalculated eight topological features: degree connectivity, clustering coef-ficient, topological coefficient, closeness centrality, radiality, stress, be-tweenness centrality, and average shortest path length.

Differential taxonomic contribution to the whole-community re-sponse. The expression change of each gene present in each phylum wascalculated by subtracting the mean expression level before the diet fromthe mean expression level after the diet. Then, for each gene, the phylumcontribution was calculated by dividing the corresponding expressionchange by the overall community-level gene expression change. A positivevalue thus indicates a positive contribution toward, or consistency withthe direction of, the community-level change, regardless of up- or down-regulation. In contrast, a negative value means the opposite direction ofchange between that particular phylum and the microbiota communityon the particular gene. For each of the four main phyla, a group of geneswhich represented an (nearly) exclusive positive contribution of the cor-responding taxon toward community-level expression change of thesegenes was identified. The four groups of genes were functionally anno-tated with Gene Ontology terms (biological processes, third level) usingthe Database for Annotation, Visualization and Integrated Discovery(DAVID) (67).

To measure the species-level contribution, a more stringent thresholdfor BLASTN mapping was used for the gene expression calculation (e �1e�05, identical ratio �95%, where the “identical ratio” was defined asthe product of identity and alignment length, which was then divided byread length). The remaining steps were analogous to the phylum-levelanalysis.

For each microbial phylum, the genuine transcriptional activity ofeach DE target gene was calculated by dividing its mean expression level(among subjects before or after the diet) by the corresponding phylumtaxonomic abundance. This represented the transcriptional activity nor-malized by the phylum-level abundance. Then, subtraction of this nor-malized activity before and after the diet gave the variation of transcrip-tional activity of each gene within each microbial phylum. A positive valuerepresents increased transcription level, while a negative value indicatesdecreased transcription level. The phylum-level abundance data (averageamong subjects before or after the diet) were retrieved from MG-RAST(68).

SCFA-oriented dietary interventions. The proteins participating inbutyrate (KEGG ko00650) and propionate metabolism (KEGG ko00640)were first ranked by individual topological measurements and then aggre-gated with the R package RankAggreg using default settings (69) to pro-vide an overall indication of vulnerability. Last, the pathway maps withmetabolites/phytochemicals and proteins highlighted were generatedwith R Bioconductor package Pathview (70).

SUPPLEMENTAL MATERIALSupplemental material for this article may be found at http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.01263-15/-/DCSupplemental.

Figure S1, PDF file, 0.4 MB.Figure S2, PDF file, 0.1 MB.Figure S3, PDF file, 0.4 MB.Figure S4, PDF file, 0.4 MB.Figure S5, PDF file, 1.5 MB.Table S1, DOCX file, 0.02 MB.Table S2, DOCX file, 0.02 MB.Table S3, DOCX file, 0.02 MB.Table S4, DOCX file, 0.02 MB.Table S5, DOCX file, 0.02 MB.

ACKNOWLEDGMENTS

G.P. and J.L. thank the HKU-SRT of Genomics for their technical sup-port. Y.N. thanks David Westergaard for fruitful discussions and supporton the chemical-protein interaction network construction.

REFERENCES1. Flint HJ, Scott KP, Louis P, Duncan SH. 2012. The role of the gut

microbiota in nutrition and health. Nat Rev Gastroenterol Hepatol9:577–589. http://dx.doi.org/10.1038/nrgastro.2012.156.

2. Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent

Ni et al.


on Decem


bio.asm.org/

Dow

nloaded from

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/

http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.01263-15/-/DCSupplemental

http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.01263-15/-/DCSupplemental

http://dx.doi.org/10.1038/nrgastro.2012.156

mbio.asm.org


M, Gill SR, Nelson KE, Relman DA. 2005. Diversity of the humanintestinal microbial flora. Science 308:1635–1638. http://dx.doi.org/10.1126/science.1110591.

3. Walker AW, Ince J, Duncan SH, Webster LM, Holtrop G, Ze X, BrownD, Stares MD, Scott P, Bergerat A, Louis P, McIntosh F, Johnstone AM,Lobley GE, Parkhill J, Flint HJ. 2011. Dominant and diet-responsivegroups of bacteria within the human colonic microbiota. ISME J5:220 –230. http://dx.doi.org/10.1038/ismej.2010.118.

4. David LA, Maurice CF, Carmody RN, Gootenberg DB, Button JE,Wolfe BE, Ling AV, Devlin AS, Varma Y, Fischbach MA, Biddinger SB,Dutton RJ, Turnbaugh PJ. 2014. Diet rapidly and reproducibly alters thehuman gut microbiome. Nature 505:559 –563. http://dx.doi.org/10.1038/nature12820.

5. Macfarlane G, Gibson G. 1997. Carbohydrate fermentation, energy trans-duction and gas metabolism in the human large intestine, p 269 –318. InMackie R, White B (ed), Gastrointestinal microbiology. Springer, NewYork, NY.

6. Magee EA, Richardson CJ, Hughes R, Cummings JH. 2000. Contribu-tion of dietary protein to sulfide production in the large intestine: an invitro and a controlled feeding study in humans. Am J Clin Nutr 72:1488 –1494.

7. Barrasa JI, Olmo N, Lizarbe MA, Turnay J. 2013. Bile acids in the colon,from healthy to cytotoxic molecules. Toxicol In Vitro 27:964 –977. http://dx.doi.org/10.1016/j.tiv.2012.12.020.

8. Williamson G, Clifford MN. 2010. Colonic metabolites of berrypolyphenols: the missing link to biological activity? Br J Nutr 104(Suppl3):S48 –S66. http://dx.doi.org/10.1017/S0007114510003946.

9. Bialonska D, Ramnani P, Kasimsetty SG, Muntha KR, Gibson GR,Ferreira D. 2010. The influence of pomegranate by-product and punicala-gins on selected groups of human intestinal microbiota. Int J Food Micro-biol 140:175–182. http://dx.doi.org/10.1016/j.ijfoodmicro.2010.03.038.

10. Selma MV, Espín JC, Tomás-Barberán FA. 2009. Interaction betweenphenolics and gut microbiota: role in human health. J Agric Food Chem57:6485– 6501. http://dx.doi.org/10.1021/jf902107d.

11. Tzounis X, Rodriguez-Mateos A, Vulevic J, Gibson GR, Kwik-Uribe C,Spencer JP. 2011. Prebiotic evaluation of cocoa-derived flavanols inhealthy humans by using a randomized, controlled, double-blind, cross-over intervention study. Am J Clin Nutr 93:62–72. http://dx.doi.org/10.3945/ajcn.110.000075.

12. Cardona F, Andrés-Lacueva C, Tulipani S, Tinahones FJ, Queipo-Ortuño MI. 2013. Benefits of polyphenols on gut microbiota and impli-cations in human health. J Nutr Biochem 24:1415–1422. http://dx.doi.org/10.1016/j.jnutbio.2013.05.001.

13. Russell WR, Labat A, Scobbie L, Duncan SH. 2007. Availability ofblueberry phenolics for microbial metabolism in the colon and the poten-tial inflammatory implications. Mol Nutr Food Res 51:726 –731. http://dx.doi.org/10.1002/mnfr.200700022.

14. Larrosa M, Luceri C, Vivoli E, Pagliuca C, Lodovici M, Moneti G,Dolara P. 2009. Polyphenol metabolites from colonic microbiota exertanti-inflammatory activity on different inflammation models. Mol NutrFood Res 53:1044 –1054. http://dx.doi.org/10.1002/mnfr.200800446.

15. Lambert PA. 2002. Cellular impermeability and uptake of biocides andantibiotics in Gram-positive bacteria and mycobacteria. J Appl Microbiol92(Suppl):46S–54S. http://dx.doi.org/10.1046/j.1365-2672.92.5s1.7.x.

16. Nikaido H, Vaara M. 1985. Molecular basis of bacterial outer membranepermeability. Microbiol Res 49:1–32.

17. Jensen K, Panagiotou G, Kouskoumvekaki I. 2014. Integrated text min-ing and chemoinformatics analysis associates diet to health benefit at mo-lecular level. PLoS Comput Biol 10:e1003432. http://dx.doi.org/10.1371/journal.pcbi.1003432.

18. Jensen K, Panagiotou G, Kouskoumvekaki I. 2015. NutriChem: a sys-tems chemical biology resource to explore the medicinal value of plant-based foods. Nucleic Acids Res 43:D940 –D945. http://dx.doi.org/10.1093/nar/gku724.

19. Jensen K, Ni Y, Panagiotou G, Kouskoumvekaki I. 2015. Developing amolecular roadmap of drug-food interactions. PLoS Comput Biol 11:e1004048. http://dx.doi.org/10.1371/journal.pcbi.1004048.

20. Westergaard D, Li J, Jensen K, Kouskoumvekaki I, Panagiotou G. 2014.Exploring mechanisms of diet-colon cancer associations through candi-date molecular interaction networks. BMC Genomics 15:380. http://dx.doi.org/10.1186/1471-2164-15-380.

21. Atarashi K, Tanoue T, Oshima K, Suda W, Nagano Y, Nishikawa H,Fukuda S, Saito T, Narushima S, Hase K, Kim S, Fritz JV, Wilmes P,

Ueha S, Matsushima K, Ohno H, Olle B, Sakaguchi S, Taniguchi T,Morita H, Hattori M, Honda K. 2013. Treg induction by a rationallyselected mixture of Clostridia strains from the human microbiota. Nature500:232–236. http://dx.doi.org/10.1038/nature12331.

22. Ridlon JM, Kang DJ, Hylemon PB, Bajaj JS. 2014. Bile acids and the gutmicrobiome. Curr Opin Gastroenterol 30:332–338. http://dx.doi.org/10.1097/MOG.0000000000000057.

23. Cho I, Blaser MJ. 2012. The human microbiome: at the interface of healthand disease. Nat Rev Genet 13:260 –270. http://dx.doi.org/10.1038/nrg3182.

24. Clemente JC, Ursell LK, Parfrey LW, Knight R. 2012. The impact of thegut microbiota on human health: an integrative view. Cell 148:1258 –1270.http://dx.doi.org/10.1016/j.cell.2012.01.035.

25. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, LightY, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP. 2012.ChEMBL: a large-scale bioactivity database for drug discovery. NucleicAcids Res 40:D1100 –D1107. http://dx.doi.org/10.1093/nar/gkr777.

26. Iskar M, Campillos M, Kuhn M, Jensen LJ, van Noort V, Bork P. 2010.Drug-induced regulation of target expression. PLoS Comput Biol6:e100925. http://dx.doi.org/10.1371/journal.pcbi.1000925.

27. Barabási A, Oltvai ZN. 2004. Network biology: understanding the cell’sfunctional organization. Nat Rev Genet 5:101–113. http://dx.doi.org/10.1038/nrg1272.

28. Duncan SH, Belenguer A, Holtrop G, Johnstone AM, Flint HJ, LobleyGE. 2007. Reduced dietary intake of carbohydrates by obese subjects re-sults in decreased concentrations of butyrate and butyrate-producing bac-teria in feces. Appl Environ Microbiol 73:1073–1078. http://dx.doi.org/10.1128/AEM.02340-06.

29. Louis P, Flint HJ. 2009. Diversity, metabolism and microbial ecologyof butyrate-producing bacteria from the human large intestine. FEMSMicrobio l Let t 294:1– 8. ht tp : / /dx .doi .org/10 .1111/ j .1574-6968.2009.01514.x.

30. Manichanh C, Rigottier-Gois L, Bonnaud E, Gloux K, Pelletier E,Frangeul L, Nalin R, Jarrin C, Chardon P, Marteau P, Roca J, Dore J.2006. Reduced diversity of faecal microbiota in Crohn’s disease revealedby a metagenomic approach. Gut 55:205–211. http://dx.doi.org/10.1136/gut.2005.073817.

31. Vermeiren J, Van den Abbeele P, Laukens D, Vigsnaes LK, De VosM, Boon N, Van de Wiele T. 2012. Decreased colonization of fecalClostridium coccoides/Eubacterium rectale species from ulcerativecolitis patients in an in vitro dynamic gut model with mucin environ-ment. FEMS Microbiol Ecol 79:685– 696. http://dx.doi.org/10.1111/j.1574-6941.2011.01252.x.

32. Scheppach W. 1994. Effects of short chain fatty acids on gut morphologyand function. Gut 35:S35–S38. http://dx.doi.org/10.1136/gut.35.1_Suppl.S35.

33. den Besten G, van Eunen K, Groen AK, Venema K, Reijngoud DJ,Bakker BM. 2013. The role of short-chain fatty acids in the interplaybetween diet, gut microbiota, and host energy metabolism. J Lipid Res54:2325–2340. http://dx.doi.org/10.1194/jlr.R036012.

34. Hamer HM, Jonkers D, Venema K, Vanhoutvin S, Troost FJ, BrummerRJ. 2008. The role of butyrate on colonic function. Aliment PharmacolTher 27:104 –119. http://dx.doi.org/10.1111/j.1365-2036.2007.03562.x.

35. Canani RB, Costanzo MD, Leone L, Pedata M, Meli R, Calignano A.2011. Potential beneficial effects of butyrate in intestinal and extraintesti-nal diseases. World J Gastroenterol 17:1519 –1528. http://dx.doi.org/10.3748/wjg.v17.i12.1519.

36. Schriml LM, Arze C, Nadendla S, Chang Y-W, Mazaitis M, Felix V,Feng G, Kibbe WA. 2012. Disease Ontology: a backbone for disease se-mantic integration. Nucleic Acids Res 40:D940 –D946. http://dx.doi.org/10.1093/nar/gkr972.

37. Dapito DH, Mencin A, Gwak GY, Pradere JP, Jang MK, Mederacke I,Caviglia JM, Khiabanian H, Adeyemi A, Bataller R, Lefkowitch JH,Bower M, Friedman R, Sartor RB, Rabadan R, Schwabe RF. 2012.Promotion of hepatocellular carcinoma by the intestinal microbiotaand TLR4. Cancer Cell 21:504 –516. http://dx.doi.org/10.1016/j.ccr.2012.02.007.

38. Qin J, Li Y, Cai Z, Li S, Zhu J, Zhang F, Liang S, Zhang W, Guan Y,Shen D, Peng Y, Zhang D, Jie Z, Wu W, Qin Y, Xue W, Li J, Han L, LuD, Wu P, Dai Y, Sun X, Li Z, Tang A, Zhong S, Li X, Chen W, Xu R,Wang M, Feng Q, Gong M, Yu J, Zhang Y, Zhang M, Hansen T,Sanchez G, Raes J, Falony G, Okuda S, Almeida M, LeChatelier E,Renault P, Pons N, Batto JM, Zhang Z, Chen H, Yang R, Zheng W, Li



on Decem


bio.asm.org/

Dow

nloaded from

http://dx.doi.org/10.1126/science.1110591


http://dx.doi.org/10.1038/ismej.2010.118

http://dx.doi.org/10.1038/nature12820


http://dx.doi.org/10.1016/j.tiv.2012.12.020

http://dx.doi.org/10.1016/j.tiv.2012.12.020

http://dx.doi.org/10.1017/S0007114510003946

http://dx.doi.org/10.1016/j.ijfoodmicro.2010.03.038

http://dx.doi.org/10.1021/jf902107d

http://dx.doi.org/10.3945/ajcn.110.000075

http://dx.doi.org/10.3945/ajcn.110.000075

http://dx.doi.org/10.1016/j.jnutbio.2013.05.001

http://dx.doi.org/10.1016/j.jnutbio.2013.05.001

http://dx.doi.org/10.1002/mnfr.200700022



http://dx.doi.org/10.1046/j.1365-2672.92.5s1.7.x

http://dx.doi.org/10.1371/journal.pcbi.1003432


http://dx.doi.org/10.1093/nar/gku724



http://dx.doi.org/10.1186/1471-2164-15-380

http://dx.doi.org/10.1186/1471-2164-15-380


http://dx.doi.org/10.1097/MOG.0000000000000057

http://dx.doi.org/10.1097/MOG.0000000000000057

http://dx.doi.org/10.1038/nrg3182


http://dx.doi.org/10.1016/j.cell.2012.01.035

http://dx.doi.org/10.1093/nar/gkr777




http://dx.doi.org/10.1128/AEM.02340-06

http://dx.doi.org/10.1128/AEM.02340-06

http://dx.doi.org/10.1111/j.1574-6968.2009.01514.x

http://dx.doi.org/10.1111/j.1574-6968.2009.01514.x

http://dx.doi.org/10.1136/gut.2005.073817

http://dx.doi.org/10.1136/gut.2005.073817

http://dx.doi.org/10.1111/j.1574-6941.2011.01252.x

http://dx.doi.org/10.1111/j.1574-6941.2011.01252.x

http://dx.doi.org/10.1136/gut.35.1_Suppl.S35

http://dx.doi.org/10.1136/gut.35.1_Suppl.S35

http://dx.doi.org/10.1194/jlr.R036012

http://dx.doi.org/10.1111/j.1365-2036.2007.03562.x

http://dx.doi.org/10.3748/wjg.v17.i12.1519

http://dx.doi.org/10.3748/wjg.v17.i12.1519



http://dx.doi.org/10.1016/j.ccr.2012.02.007

http://dx.doi.org/10.1016/j.ccr.2012.02.007

mbio.asm.org


S, Yang H, et al. 2012. A metagenome-wide association study of gutmicrobiota in type 2 diabetes. Nature 490:55– 60. http://dx.doi.org/10.1038/nature11450.

39. Wong JMW, de Souza R, Kendall CWC, Emam A, Jenkins DJA. 2006.Colonic health: fermentation and short chain fatty acids. J Clin Gastroen-terol 40:235–243. http://dx.doi.org/10.1097/00004836-200603000-00015.

40. Holmes E, Li JV, Marchesi JR, Nicholson JK. 2012. Gut microbiotacomposition and activity in relation to host metabolic phenotype anddisease risk. Cell Metab 16:559 –564. http://dx.doi.org/10.1016/j.cmet.2012.10.007.

41. Reardon S. 2014. Microbiome therapy gains market traction. Nature 509:269 –270. http://dx.doi.org/10.1038/509269a.

42. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, NielsenT, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, CaoJ, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, BattoJM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E,Renault P, Sicheritz-Ponten T, Turner K, Zhu H, Yu C, Li S, Jian M,Zhou Y, Li Y, Zhang X, Li S, Qin N, Yang H, Wang J, Brunak S, DoreJ, Guarner F, Kristiansen K, Pedersen O, Parkhill J, Weissenbach J, etal. 2010. A human gut microbial gene catalogue established by metag-enomic sequencing. Nature 464:59 – 65. http://dx.doi.org/10.1038/nature08821.

43. NIH HMP Working Group, Peterson J, Garges S, Giovanni M, McInnesP, Wang L, Schloss JA, Bonazzi V, McEwen JE, Wetterstrand KA, DealC, Baker CC, Di Francesco V, Howcroft TK, Karp RW, Lunsford RD,Wellington CR, Belachew T, Wright M, Giblin C, David H, Mills M,Salomon R, Mullins C, Akolkar B, Begg L, Davis C, Grandison L,Humble M, Khalsa J, Little AR, Peavy H, Pontzer C, Portnoy M, SayreMH, Starke-Reed P, Zakhari S, Read J, Watson B, Guyer M. 2009. TheNIH Human Microbiome Project. Genome Res 19:2317–2323. http://dx.doi.org/10.1101/gr.096651.109.

44. McNulty NP, Yatsunenko T, Hsiao A, Faith JJ, Muegge BD, GoodmanAL, Henrissat B, Oozeer R, Cools-Portier S, Gobert G, Chervaux C,Knights D, Lozupone CA, Knight R, Duncan AE, Bain JR, MuehlbauerMJ, Newgard CB, Heath AC, Gordon JI. 2011. The impact of a consor-tium of fermented milk strains on the gut microbiome of gnotobiotic miceand monozygotic twins. Sci Transl Med 3:106ra106. http://dx.doi.org/10.1126/scitranslmed.3002701.

45. Faith JJ, McNulty NP, Rey FE, Gordon JI. 2011. Predicting a human gutmicrobiota’s response to diet in gnotobiotic mice. Science 333:101–104.http://dx.doi.org/10.1126/science.1206025.

46. Maurice CF, Haiser HJ, Turnbaugh PJ. 2013. Xenobiotics shape thephysiology and gene expression of the active human gut microbiome. Cell152:39 –50. http://dx.doi.org/10.1016/j.cell.2012.10.052.

47. Hwang YC, Lin CC, Chang JY, Mori H, Juan HF, Huang HC. 2009.Predicting essential genes based on network and sequence analysis. MolBiosyst 5:1672–1678. http://dx.doi.org/10.1039/b900611g.

48. Jeong H, Mason SP, Barabási AL, Oltvai ZN. 2001. Lethality and cen-trality in protein networks. Nature 411:41– 42. http://dx.doi.org/10.1038/35075138.

49. Zhou L, Ma X, Sun F. 2008. The effects of protein interactions, geneessentiality and regulatory regions on expression variation. BMC Syst Biol2:54. http://dx.doi.org/10.1186/1752-0509-2-54.

50. Ideker T, Sharan R. 2008. Protein networks in disease. Genome Res18:644 – 652. http://dx.doi.org/10.1101/gr.071852.107.

51. O’Keefe SJ, Li JV, Lahti L, Ou J, Carbonero F, Mohammed K, PosmaJM, Kinross J, Wahl E, Ruder E, Vipperla K, Naidoo V, Mtshali L, TimsS, Puylaert PG, DeLany J, Krasinskas A, Benefiel AC, Kaseb HO,Newton K, Nicholson JK, de Vos WM, Gaskins HR, Zoetendal EG.2015. Fat, fibre and cancer risk in African Americans and rural Africans.Nat Commun 6:6342.

52. Kostic AD, Chun E, Robertson L, Glickman JN, Gallini CA, MichaudM, Clancy TE, Chung DC, Lochhead P, Hold GL, El-Omar EM,Brenner D, Fuchs CS, Meyerson M, Garrett WS. 2013. Fusobacteriumnucleatum potentiates intestinal tumorigenesis and modulates the tumor-immune microenvironment. Cell Host Microbe 14:207–215. http://dx.doi.org/10.1016/j.chom.2013.07.007.

53. Bolton EE, Wang Y, Thiessen PA, Bryant SH. 2008. Chapter 12.PubChem: integrated platform of small molecules and biological activi-ties, p 217–241. In Ralph AW, David CS (ed), Annual reports in compu-tational chemistry, vol 4. Elsevier, Oxford, United Kingdom.

54. Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski

A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S,Dame ZT, Han B, Zhou Y, Wishart DS. 2014. DrugBank 4.0: sheddingnew light on drug metabolism. Nucleic Acids Res 42:D1091–D1097.http://dx.doi.org/10.1093/nar/gkt1068.

55. Berthold M, Cebron N, Dill F, Gabriel T, Kötter T, Meinl T, Ohl P, SiebC, Thiel K, Wiswedel B. 2008. KNIME: the Konstanz information miner,p 319 –326. In Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R(ed), Data analysis, machine learning and applications. Springer, Berlin,Germany. http://dx.doi.org/10.1007/978-3-540-78246-9_38.

56. Yang J, Roy A, Zhang Y. 2013. Protein-ligand binding site recognitionusing complementary binding-specific substructure comparison and se-quence profile alignment. Bioinformatics 29:2588 –2595. http://dx.doi.org/10.1093/bioinformatics/btt447.

57. Li H, Durbin R. 2009. Fast and accurate short read alignment withBurrows-Wheeler transform. Bioinformatics 25:1754 –1760. http://dx.doi.org/10.1093/bioinformatics/btp324.

58. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G,Abecasis G, Durbin R, 1000 Genome Project Data Processing Sub-group. 2009. The Sequence Alignment/Map format and SAMtools. Bioin-formatics 25:2078 –2079. http://dx.doi.org/10.1093/bioinformatics/btp352.

59. Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities forcomparing genomic features. BioInformatics 26:841– 842. http://dx.doi.org/10.1093/bioinformatics/btq033.

60. Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S,Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Giron CG,Gordon L, Hourlier T, Hunt SE, Janacek SH, Johnson N, Juettemann T,Kahari AK, Keenan S, Martin FJ, Maurel T, McLaren W, Murphy DN,Nag R, Overduin B, Parker A, Patricio M, Perry E, Pignatelli M, RiatHS, Sheppard D, Taylor K, Thormann A, Vullo A, Wilder SP, ZadissaA, Aken BL, Birney E, Harrow J, Kinsella R, Muffato M, Ruffier M,Searle SM, Spudich G, Trevanion SJ, Yates A, Zerbino DR, Flicek P.2015. Ensembl 2015. Nucleic Acids Res 43:D662–D669. http://dx.doi.org/10.1093/nar/gku1010.

61. Remm M, Storm CEV, Sonnhammer ELL. 2001. Automatic clustering oforthologs and in-paralogs from pairwise species comparisons. J Mol Biol314:1041–1052. http://dx.doi.org/10.1006/jmbi.2000.5197.

62. Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate—apractical and powerful approach to multiple testing. J R Stat Soc Ser BMethod 57:289 –300.

63. Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, Caudy M,Garapati P, Gillespie M, Kamdar MR, Jassal B, Jupe S, Matthews L, MayB, Palatnik S, Rothfels K, Shamovsky V, Song H, Williams M, Birney E,Hermjakob H, Stein L, D’Eustachio P. 2014. The Reactome pathwayknowledgebase. Nucleic Acids Res 42:D472–D477. http://dx.doi.org/10.1093/nar/gkt1102.

64. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, RothA, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. 2013. STRINGv9.1: protein-protein interaction networks, with increased coverage andintegration. Nucleic Acids Res 41:D808 –D815. http://dx.doi.org/10.1093/nar/gks1094.

65. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, AminN, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment forintegrated models of biomolecular interaction networks. Genome Res 13:2498 –2504. http://dx.doi.org/10.1101/gr.1239303.

66. Assenov Y, Ramirez F, Schelhorn SE, Lengauer T, Albrecht M. 2008.Computing topological parameters of biological networks. Bioinformat-ics 24:282–284. http://dx.doi.org/10.1093/bioinformatics/btm554.

67. Huang DW, Sherman BT, Lempicki RA. 2008. Systematic and integrativeanalysis of large gene lists using DAVID bioinformatics resources. NatProtoc 4:44 –57. http://dx.doi.org/10.1038/nprot.2008.211.

68. Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M,Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA.2008. The metagenomics RAST server—a public resource for the auto-matic phylogenetic and functional analysis of metagenomes. BMC Bioin-formatics 9:386. http://dx.doi.org/10.1186/1471-2105-9-386.

69. Pihur V, Datta S, Datta S. 2009. RankAggreg, an R package for weightedrank aggregation. BMC Bioinformatics 10:62. http://dx.doi.org/10.1186/1471-2105-10-62.

70. Luo W, Brouwer C. 2013. Pathview: an R/Bioconductor package forpathway-based data integration and visualization. Bioinformatics 29:1830 –1831. http://dx.doi.org/10.1093/bioinformatics/btt285.

Ni et al.


on Decem


bio.asm.org/

Dow

nloaded from



http://dx.doi.org/10.1097/00004836-200603000-00015

http://dx.doi.org/10.1016/j.cmet.2012.10.007

http://dx.doi.org/10.1016/j.cmet.2012.10.007

http://dx.doi.org/10.1038/509269a



http://dx.doi.org/10.1101/gr.096651.109

http://dx.doi.org/10.1101/gr.096651.109

http://dx.doi.org/10.1126/scitranslmed.3002701

http://dx.doi.org/10.1126/scitranslmed.3002701


http://dx.doi.org/10.1016/j.cell.2012.10.052

http://dx.doi.org/10.1039/b900611g

http://dx.doi.org/10.1038/35075138

http://dx.doi.org/10.1038/35075138

http://dx.doi.org/10.1186/1752-0509-2-54

http://dx.doi.org/10.1101/gr.071852.107

http://dx.doi.org/10.1016/j.chom.2013.07.007

http://dx.doi.org/10.1016/j.chom.2013.07.007

http://dx.doi.org/10.1093/nar/gkt1068

http://dx.doi.org/10.1007/978-3-540-78246-9_38

http://dx.doi.org/10.1093/bioinformatics/btt447


http://dx.doi.org/10.1093/bioinformatics/btp324




http://dx.doi.org/10.1093/bioinformatics/btq033

http://dx.doi.org/10.1093/bioinformatics/btq033



http://dx.doi.org/10.1006/jmbi.2000.5197



http://dx.doi.org/10.1093/nar/gks1094

http://dx.doi.org/10.1093/nar/gks1094

http://dx.doi.org/10.1101/gr.1239303

http://dx.doi.org/10.1093/bioinformatics/btm554

http://dx.doi.org/10.1038/nprot.2008.211

http://dx.doi.org/10.1186/1471-2105-9-386

http://dx.doi.org/10.1186/1471-2105-10-62

http://dx.doi.org/10.1186/1471-2105-10-62


mbio.asm.org


a molecular-level landscape of diet-gut microbiome interactions: toward dietary ... · dietary...

Documents