using network models for analyzing microbiome data · genomics: study of structure, function,...
TRANSCRIPT
Outline:• Brief introduction to metagenomics
• Case study: Phyllosphere fungal networks, andBare Patch Disease of Wheat
• Hands-on example
Using network models for analyzing microbiome data
ByRavin Poudel
Ricardo I. Alcala-BriseñoGarrett Lab
ICPP, Boston, MA
Genomics: Study of structure, function, evolution, and mapping of genomes.
Metagenomics ( Environmental Genomics or Community Genomics) is the study of genomes recovered from environmental samples without the need for culturing them.
Culture-independent analysisv 16S ribosomal RNA (rRNA) sequencingv Whole genome sequencingv Metagenomics: No need for culturing/ use samples directly from environment
Brief introduction to metagenomics
Example: 16S ribosomal RNA (rRNA) sequencing
Sample
PCR amplification
Sequencing
Identification
OTUs / Taxa
Sam
ple
Count Data
Case study- metagenomics and network models • OTU/ taxon is represented as a node
• Links defines the relationship between two OTUs
• Various methods can be used to define the links
• Presence or absence of association (co-occurrence)
• Statistical approach to define the association • Such as correlation, proportionality
• Methods robust for metagenomics data ~ deal with compositional bias and spurious correlation.
• SparCC, SpiecEasi….. and so on…
• Mostly, to reduce compositional bias associated with data type.
OTU/Taxon
Link
Phytopathology, 2016
Pat
hog
en r
esp
onse
var
iabl
e
Host response variable
+
+
_
_
General Network Analysis
• OTU with sequence count
Host-Focused Network Analysis
• OTU with sequence count• Host associated variable
Pathogen-Focused Network Analysis
• OTU with sequence count• Pathogen fitness variable
Diseased-Focused Network Analysis
• OTU with sequence count• Pathogen fitness variable• Host associated variable
General frameworkGoal: To select potential candidate taxa based on – interactions and network attributes
6
Microbiome networks: A systems framework for identifying candidate microbial assemblages for disease management
• Node = Fungal taxon
• Links = association calculated using SparCC
• Fungal phyllosphere data (Jumpponen et. al., 2010)
Data Source:
Microbiome networks: A systems framework for identifying candidate microbial assemblages for disease management
General analysis
Aureobasidium
A
0
14
Aureobasidium
Module hubs Network hubs
Peripherals Connectors-2
-1
0
1
2
3
0.00 0.25 0.50 0.75 1.00Among-module connectivity
With
in-m
odul
e de
gree
B
Phosynthetic rate
C
Pathogen
D
Aureobasidium
A
0
14
Aureobasidium
Module hubs Network hubs
Peripherals Connectors-2
-1
0
1
2
3
0.00 0.25 0.50 0.75 1.00Among-module connectivity
With
in-m
odul
e de
gree
B
Phosynthetic rate
C
Pathogen
D
"rnetcarto"
Guimera et al. (2005).
Connector
Peripheral
Module hub
Network hub
Module
Microbiome networks: A systems framework for identifying candidate microbial assemblages for disease management
First degree neighborsSecond degree neighborsThird degree neighborsMixed Association
First degree neighborsSecond degree neighborsThird degree neighborsMixed Association
Host-focused analysis
Aureobasidium
A 014
Aureobasidium
Module hubs Network hubs
Peripherals Connectors-2
-1
0
1
2
3
0.00 0.25 0.50 0.75 1.00Among-module connectivity
With
in-m
odul
e de
gree
B
Photosynthetic rate
C
Pathogen
D
Goal à Biofertilizers
Blockingà Negatively Linked Nodes
Aureobasidium
A
0
14
Aureobasidium
Module hubs Network hubs
Peripherals Connectors-2
-1
0
1
2
3
0.00 0.25 0.50 0.75 1.00Among-module connectivity
With
in-m
odul
e de
gree
B
Phosynthetic rate
C
Pathogen
DMicrobiome networks: A systems framework for identifying candidate microbial assemblages for disease management
Pathogen-focused analysis
First degree neighborsSecond degree neighborsThird degree neighborsMixed Association
First degree neighborsSecond degree neighborsThird degree neighborsMixed Association
Blockingà Positively Linked Nodes
Goal à Biocontrols
Applied and Environmental Microbiology, 2013
Data source:
Disease-focused analysis
Berendsen, 2012
11
Disease
Enterobacteriaceae
SphingomonadaceaeSphingomonadaceae
Microbacteriaceae
Pseudomonadaceae
Micrococcaceae
Comamonadaceae
Chitinophagaceae
Chitinophagaceae
Streptomycetaceae
PseudomonadaceaeEnterobacteriaceae
Burkholderiaceae
Pseudomonadaceae
Acidobacteriaceae
Streptomycetaceae
Burkholderiaceae
Moraxellaceae
Microbacteriaceae
Nocardiaceae
Enterobacteriaceae
Micromonosporaceae
Chitinophagaceae
Nocardioidaceae
Caulobacteraceae
Flavobacteriaceae
Sphingomonadaceae
Bradyrhizobiaceae
Gemmatimonadetes
Caulobacteraceae
Sphingobacteriaceae
Bradyrhizobiaceae
Phyllobacteriaceae
Gemmatimonadetes
Mycobacteriaceae
Bradyrhizobiaceae
Micrococcaceae
Ellin329
Actinomycetales
Actinomycetales
Dermabacteraceae
Cytophagaceae
Oxalobacteraceae
Gaiellaceae
Comamonadaceae
Flavobacteriaceae
N1423WL
Sphingobacteriaceae
Pseudonocardiaceae
Sphingobacteriaceae
Mycobacteriaceae
Xanthomonadaceae
Frankiaceae
Propionibacteriaceae
Xanthomonadaceae
Actinomycetales
Comamonadaceae
Enterobacteriaceae
Ellin5301
Propionibacteriaceae
Gaiellaceae
Sphingomonadaceae
Glycomycetaceae
Pseudomonadaceae
Mycobacteriaceaea
Enterobaå◊cteriaceae
Nocardioidaceae
Rhizobiales
Pseudomonadaceae
Ellin6075
Frankiaceae
Comamonadaceae
SC-I-84
Burkholderiaceae
N1423WL
odGeodermatophilaceae
Propionibacteriaceae
Rhizobiales
Gemmatimonadetes
Acidobacteriaceae
Methylobacteriaceae
Acetobacteraceae
Rhodospirillaceae
Solirubrobacterales
Chitinophagaceae
Nocardiaceae
Hyphomicrobiaceae
Propionibacteriaceae
Geodermatophilaceae
Oxalobacteraceae
Comamonadaceae
N1423WL
Methylobacteriaceae
Caulobacteraceae
Burkholderiales
Sphingobacteriaceae
TM7-3
TM7-1
Pseudomonadaceae
GaiellaceaeEllin329
Sphingobacteriaceae
Rhodospirillaceae
Gemmatimonadetes
Micrococcaceae
Sphingobacteriaceae
Ellin329
Chitinophagaceae
Koribacteraceae
Acetobacteraceae
Moraxellaceae
Rhodospirillaceae
Acidobacteriaceae
Solirubrobacterales
Gemmatimonadetes
Burkholderiaceae
Caulobacteraceae
Micrococcaceae
Xanthomonadaceae
Microbacteriaceae
Xanthomonadaceae
SC-I-84
Nocardioidaceae
TM7-1
Solirubrobacterales
Solibacteraceae
Koribacteraceae
Hyphomicrobiaceae
Hyphomicrobiaceae
Acidobacteriaceae
Acidobacteriaceae
SC-I-84
Xanthomonadaceae
Flavobacteriaceae
Sphingomonadaceae
SinobacteraceaeAcidobacteriaceae
RhodospirillaceaeAcidobacteriaceae
Solibacteraceae
Solirubrobacterales
GemmatimonadetesEllin6537
WPS-2
Koribacteracea
Disease-focused analysis
• Blue link = positive association with Rhizoctonia solani
• Red link = negative association with Rhizoctonia solani
• Node size= Abundance
• Node associated with disease
• Node associated with healthy state
12
Poster Number: 431-P
Do grafting and rootstock genotype affect the rhizobiome? A study of tomato systems
Hands-on experience on building network models withmicrobiome date
Let’s switch to RStudio
# Loading required packages library(igraph)library(Hmisc) library(Matrix)
# Load the data with the OTU table: otudata.csvotu.table<-read.csv(file.choose(), header=T, row.names = 1)
# Read taxonomy file associated with OTU table into new object: otu_taxonomy.csv
tax<-read.csv(file.choose(),header=T, row.names = 1)
# Check how many OTUs we havedim(otu.table)
# Check for the decrease in the number of OTUsdim(otu.table.filter)
# Keep the OTUs with more than 10 countsotu.table.filter<-otu.table[,colSums(otu.table)>10]
# Calculate pairwise correlation between OTUsotu.cor<-rcorr(as.matrix(otu.table.filter), type="spearman")
# Get p-value matrixotu.pval <- forceSymmetric(otu.cor$P) # Self-correlation as NA
# Select only the taxa for the filtered OTUs by using rownames of otu.pvalsel.tax <-tax[rownames(otu.pval),,drop=FALSE]
# Sanity check all.equal(rownames(sel.tax), rownames(otu.pval))
# Filter the association based on p-values and level of correlationsp.yes<-otu.pval<0.001
# Select the r values for p.yesr.val=otu.cor$r # select all the correlation values p.yes.r<-r.val*p.yes # only select correlation values based on p-value criterion
# Select OTUs by level of correlationp.yes.r<-abs(p.yes.r)>0.75 # output is logical vectorp.yes.rr<-p.yes.r*r.val # use logical vector for subscripting.
# Create an adjacency matrixadjm<-as.matrix(p.yes.rr)
# Add taxonomic information associated with adjacency matrixcolnames(adjm)<-as.vector(sel.tax$Family)rownames(adjm)<-as.vector(sel.tax$Family)
# Create an adjacency matrix in igraph format net.grph=graph.adjacency(adjm,mode="undirected",weighted=TRUE,diag=FALSE)
# Calculate edge weight == level of correlationedgew<-E(net.grph)$weight
# Identify isolated nodesbad.vs<-V(net.grph)[degree(net.grph) == 0]
# Remove isolated nodesnet.grph <-delete.vertices(net.grph, bad.vs)
# Plot the graph objectplot(net.grph,
vertex.size=4,vertex.frame.color="black",edge.curved=F,edge.width=1.5,layout=layout.fruchterman.reingold,edge.color=ifelse(edgew<0,"red","blue"),vertex.label=NA,vertex.label.color="black",vertex.label.family="Times New Roman",vertex.label.font=2)
unclassified
PlanctomycetaceaePasteuriaceae
unclassified
unclassified
Bacteroidetes_incertae_sedis_family_incertae_sedis
unclassified
Coxiellaceae
Acidobacteria_Gp6_family_incertae_sedis
PlanctomycetaceaeSpartobacteria_family_incertae_sedis
Planctomycetaceae
Polyangiaceae
Acidobacteria_Gp4_family_incertae_sedis
Parachlamydiaceae
unclassified
Planctomycetaceae
unclassifiedOpitutaceae
unclassified
Alicyclobacillaceae
unclassified
Halomonadaceae
Acidobacteria_Gp16_family_incertae_sedis
unclassified
Clostridiaceae_1
unclassified
unclassified
Planctomycetaceae
Planctomycetaceae
Planctomycetaceae
Planctomycetaceae
unclassified
Planctomycetaceae
Planctomycetaceae
unclassified
PlanctomycetaceaeSubdivision3_family_incertae_sedis
Subdivision3_family_incertae_sedis
RhodospirillaceaeNannocystaceae Peptostreptococcaceae
unclassifiedunclassified
unclassified
Rubrobacteraceae
BRC1_family_incertae_sedis
Acidobacteria_Gp4_family_incertae_sedis
Nitrospiraceae
unclassified
unclassified
Acidobacteria_Gp4_family_incertae_sedisunclassified
unclassified
unclassified
unclassified
Bdellovibrionaceae
unclassified
Coxiellaceae
TM7_family_incertae_sedis
unclassified
unclassified
unclassified
unclassified
unclassified
Parachlamydiaceae
unclassified
Planctomycetaceae
unclassified
unclassified
Spartobacteria_family_incertae_sedis
Cryomorphaceae
OP11_family_incertae_sedis
OD1_family_incertae_sedis
unclassified
Acidobacteria_Gp10_family_incertae_sedis
unclassified
unclassified
unclassified
Parachlamydiaceae
Parachlamydiaceae
Parachlamydiaceae
OD1_family_incertae_sedis
unclassified
unclassified
OD1_family_incertae_sedisOD1_family_incertae_sedis
unclassified
unclassified
unclassified
Legionellaceae
OD1_family_incertae_sedis
unclassified
unclassified
unclassified
# Summaries of network traitsV(net.grph)
E(net.grph)
vcount(net.grph)
ecount(net.grph)
plot(degree(net.grph))
max(degree(net.grph))
min(degree(net.grph))
http://igraph.org/r/