![Page 1: NGS Bioinformatics Workshop 2.5 Meta-Analysis of Genomic Data May 30 th, 2012 IRMACS 10900 Facilitator: Richard Bruskiewich Adjunct Professor, MBB Acknowledgment:](https://reader036.vdocument.in/reader036/viewer/2022062423/56649c9e5503460f9495df50/html5/thumbnails/1.jpg)
NGS Bioinformatics Workshop
2.5 Meta-Analysis of Genomic Data
May 30th, 2012, IRMACS 10900
Facilitator: Richard Bruskiewich, Adjunct Professor, MBB
Acknowledgment: Several slides courtesy of Professor Fiona Brinkman, MBB
Today’s Agenda
A brief overview of the bioinformatics for:
• SNP detection software
• Proteins
• Systems biology
• Metagenomics (some resources; very brief…)
Group feedback: bioinformatics needs at SFU?
NGS-based SNP Analysis Programs
From: Nielsen et al. 2011. Nature Reviews Genetics 12:443-451
BIOINFORMATICS OF PROTEINS
From DNA to Protein to Systems
ATGGAATTC…
Amino Acid Properties – Venn Diagram
Polypeptides
[Figure: polypeptide chain with side chains R1–R4 linked by peptide bonds; N-terminus shown as H3N+]
Ramachandran Plot
Secondary Structure (SS) Prediction
Note the major assumptions in all methods:
– The entire information for forming SS is contained in the primary sequence
– Side groups of residues will determine structure
• Pattern recognition
– Looks for patterns in common SSs such as amphipathic alpha-helices (e.g. a pattern of polar and non-polar residues)
• Homology
– Predict the SS of the central residue of a given segment from homologous segments (neighbors)
– Based on alignments of homologous residues from a protein family
– Assumption: homologous proteins = similar structure
– Extension: use BLOSUM to detect similarity or, better, use a Position Specific Scoring Matrix (PSSM)
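The pattern-recognition idea above (a periodic pattern of polar and non-polar residues in an amphipathic alpha-helix) can be illustrated with a hydrophobic-moment calculation. This is only a minimal sketch: the Kyte-Doolittle hydropathy scale, the 100° per-residue rotation, and the function name `hydrophobic_moment` are assumptions chosen for illustration, not the method used by any of the prediction programs listed in this workshop.

```python
import math

# Kyte-Doolittle hydropathy scale (an assumed choice; other scales exist).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment of seq, assuming each residue is rotated
    by angle_deg around the helix axis (about 100 degrees for an alpha-helix)."""
    delta = math.radians(angle_deg)
    sin_sum = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)

# Hypothetical test sequences: alternating hydrophobic/charged blocks vs. all leucine.
amphipathic = "LKKLLKKLLKKLLKKLLK"
uniform = "L" * 18

print(hydrophobic_moment(amphipathic))  # noticeably larger
print(hydrophobic_moment(uniform))      # close to zero
```

A window with a high moment has its hydrophobic residues clustered on one face of the putative helix, which is the signal a pattern-based predictor looks for.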
SS Prediction Programs
• PredictProtein-PHD (72%)
– http://www.predictprotein.org/
• PREDATOR (75%)
– http://www-db.embl-heidelberg.de/jss/servlet/de.embl.bk.wwwTools.GroupLeftEMBL/argos/predator/predator_info.html
• PSIpred (77%)
– http://bioinf.cs.ucl.ac.uk/psipred/ (PSSM generated by PSI-BLAST, better sequence database, won CASP competition for many years)
• Jpred (81%)
– http://www.compbio.dundee.ac.uk/jpred/
Tertiary Structure
Lactate Dehydrogenase: Mixed α/β
Immunoglobulin Fold: β
Hemoglobin B Chain: α
Tertiary Structure: Protein Folds
Holm, L. and Sander, C. (1996) Mapping the protein universe. Science, 273, 595-603.
Protein Folds
Folds: definition is difficult, and different criteria are used for different classification systems
– Normally formed around a separate hydrophobic core
Current protein fold taxonomy (very roughly…)
– Approx. 1000-2000 different estimated folds, depending on method of analysis
– Of which about half are estimated to be known (500-1000)
– Average domain size approx. 150 aa (50 – 250 aa approx std dev)
Protein Fold Major Classes
All alpha proteins (all α)
All beta proteins (all β)
Alpha/beta proteins (α/β)
- Parallel strands connected by helices (βαβ motifs)
Alpha plus beta proteins (α+β)
- More irregular α and β combinations
“Other”
- Often subclassified now
Protein Fold Classification
• Curated/Semi-Manual Classification
– SCOP (Structural Classification Of Proteins): http://scop.mrc-lmb.cam.ac.uk/scop/
– CATH (Class, Architecture, Topology, Homologous superfamily): http://www.cathdb.info/
SCOP classification
Family: clear evolutionary relationship
– Residue identities >= 30%
– OR known similar functions and structures (example: globins form a family though some are only 15% identical)
Superfamily: probable common evolutionary origin
– Low sequence identities, but structural and functional features suggest a common evolutionary origin (example: actin, the ATPase domain of heat shock proteins, and hexokinase form a superfamily)
Fold: major structural similarity
– Same major SS in the same arrangement with the same topological connections
– May occur by convergent evolution
SCOP example
CATH example
Protein Fold Classification
• Automated Classification
– DALI: http://ekhidna.biocenter.helsinki.fi/dali
– VAST (Vector Alignment Search Tool): http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml
DALI/FSSP – Automated classification
Exhaustive all-against-all 3D structure comparison of protein structures currently in the PDB
Domain Classification # (DC_l_m_n_p)
l: fold space attractor region
m: globular folding topology/fold type (clusters of structural neighbours in fold space with average pairwise Z-scores, by Dali, above 2)
n: functional family (PSI-Blast, clusters of identically conserved functional residues, E.C. numbers, Swissprot keywords)
p: sequence family (>25% identities)
VAST – Automated classification
http://www.ncbi.nlm.nih.gov/Structure/VAST/vasthelp.html
All-against-all BLAST comparison of NCBI’s MMDB (database of known protein structures at NCBI, derived from the PDB)
Clustered into groups by a neighbor joining procedure, using BLAST p-value cutoffs of C or less (where C=10e-7, 10e-40 or 10e-80, to reflect three different levels of redundancy). A fourth level of classification is based on sequence identity
Motif and Domain Searching
• InterPro – an integration of tools (PROSITE, PFAM, PRINTS, PRODOM)
– http://www.ebi.ac.uk/interpro/
• Expasy Tools has more…
– PATTINPROT, to search for patterns in proteins yourself, etc…
But first… check whether the analysis you want to do has already been done!
e.g. www.ebi.ac.uk/proteome/ and db.psort.org
Phylofacts
PhyloFacts includes hidden Markov models for classification of user-submitted protein sequences to protein families across the Tree of Life.
http://phylogenomics.berkeley.edu/phylofacts/
Subcellular Localization Prediction – Example of the benefit of integrating results with a Bayesian approach
Localization Prediction - methods
Several programs analyze single features:
TargetP
Initially one program analyzed multiple features:
PSORT I (eukaryotes and prokaryotes)
Developed in 1990
PSORT I prediction method: Rule based
Nakai & Kanehisa, Proteins: Structure, Function, Genetics (1991)
Compositional Analysis
• Molecular Weight
• Amino Acid Frequency
• Isoelectric Point
• UV Absorptivity
• Solubility, Size, Shape
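Two of the compositional properties listed above, amino acid frequency and molecular weight, can be computed directly from a sequence. The sketch below assumes standard average residue masses (rounded, so results are approximate) and uses hypothetical function names; isoelectric point and the other properties need additional parameters and are not attempted here.

```python
from collections import Counter

# Average residue masses in daltons (rounded textbook values; treat as approximate).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.015  # one water per chain accounts for the free N- and C-termini

def composition(seq):
    """Fraction of the sequence made up by each amino acid present."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}

def molecular_weight(seq):
    """Approximate average molecular weight of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

peptide = "MEF"  # translation of the ATGGAATTC example shown earlier
print(composition(peptide))
print(molecular_weight(peptide))
```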
SYSTEMS BIOLOGY
Systems Biology
What is systems biology?
① Considers all (or many) of the proteins and genes in the system
② Links proteins and genes using interactions and functions
③ Uses computational models to study system
④ Provides insights into mechanisms, system dynamics, global properties
Molecular Interaction (MI) Network
Nodes = Gene / Protein
Edge = Interaction
Possible interactions:
– phosphorylation
– physical binding
– transcriptional regulation
– others?
Cytoscape
http://www.cytoscape.org/
Cytoscape supports many use cases in molecular and systems biology, genomics, and proteomics:
Load molecular and genetic interaction data sets in many formats
Project and integrate global datasets and functional annotations
Establish powerful visual mappings across these data
Perform advanced analysis and modeling using Cytoscape plugins
Visualize and analyze human-curated pathway datasets such as Reactome or KEGG.
Cytoscape
Attributes for highlighted nodes / edges
Change visible attributes
Network navigation
Visible networks
Search for nodes
Control tabs: Network, VizMapper, plugin tabs
Cytoscape – Loading Data
Data Files:
1. Network (Simple Interaction Format)
2. Node attributes (tab-delimited)
3. Gene expression (tab-delimited)
Cytoscape – Loading Data
1. Network (Simple Interaction Format)
• Format:
gene1 interaction_type gene2
• E.g.:
C1QB pp C1R
C1R pp C2
C2 pp C4
…
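To make the Simple Interaction Format concrete, here is a minimal sketch of reading such lines outside Cytoscape. The function names `parse_sif` and `adjacency` are hypothetical, and the parser assumes whitespace-separated fields with one source, one interaction type, and one or more targets per line.

```python
from collections import defaultdict

def parse_sif(lines):
    """Return a list of (source, interaction_type, target) edges."""
    edges = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and lines naming only an isolated node
        source, interaction = fields[0], fields[1]
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges

def adjacency(edges):
    """Undirected neighbor sets, ignoring the interaction type."""
    neighbors = defaultdict(set)
    for source, _, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)
    return neighbors

# The three example interactions from the slide ("pp" = protein-protein).
example = ["C1QB pp C1R", "C1R pp C2", "C2 pp C4"]
edges = parse_sif(example)
print(edges)
print(sorted(adjacency(edges)["C1R"]))
```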
Cytoscape – Loading Data
2. Gene Attribute (tab-delimited table)
• Maps data values to nodes
Load File
Check off “Show Text File Import Options”
Check off “Transfer first line as attribute names..”
Preview
Cytoscape – Loading Data
3. Gene expression (tab-delimited table)
• Format:
gene1 exp_cond1 exp_cond2 … sig_cond1 sig_cond2 …
• Expression value: fold-change or intensity from microarray
• Significance value: p-value for the test of whether the expression value differs between conditions (smaller values indicate stronger evidence of a difference)
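The expression table layout described above can be read with a few lines of code. This sketch assumes a header row, one gene per line, and all expression columns before all significance columns; the column names and the numeric values in the sample are invented purely for illustration.

```python
import csv
import io

def parse_expression(text, n_conditions):
    """Map gene -> (expression values, significance values)."""
    table = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row (assumed to be present)
    for row in reader:
        if not row:
            continue
        values = [float(v) for v in row[1:]]
        table[row[0]] = (values[:n_conditions], values[n_conditions:])
    return table

# Made-up example data: two conditions, so two expression and two significance columns.
sample = (
    "GENE\texp_cond1\texp_cond2\tsig_cond1\tsig_cond2\n"
    "C1QB\t2.1\t-0.4\t0.001\t0.30\n"
    "C1R\t1.7\t0.2\t0.004\t0.55\n"
)
table = parse_expression(sample, n_conditions=2)
print(table["C1QB"])
```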
Cytoscape – Network Style
In “Vizmapper” tab…
Double-click “Node color”
Select expression fold-change values (CMexp)
Select “Continuous Mapping” as mapping type
Can change color by double-clicking on arrows
Systems Biology Analyses
1. Differentially-expressed subnetworks
• jActiveModules
2. Functional enrichment
• BiNGO
Differentially-Expressed Subnetworks
Search for sub-networks that contain a significant number of differentially-expressed genes (nodes)
All genes in a sub-network interact… so these highly differentially-expressed sub-networks may represent a critical pathway or complex involved in a condition of interest
Differentially-Expressed Subnetworks
jActive algorithm:
– Searches for sub-networks that contain a significant number of differentially-expressed genes (or nodes)
– Heuristic: won’t always find the optimum result
– Z-score indicates how unlikely it is to find, by chance, a subnetwork with a similar number of DE genes (higher scores are more significant)
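One way to turn per-gene p-values into a single subnetwork score is sketched below. It assumes each p-value is converted to a z-score with the inverse normal CDF and the z-scores are combined as their sum divided by the square root of the subnetwork size; this is an illustrative formulation and the plugin's actual implementation (including its calibration against random gene sets and its heuristic search) may differ. The function name `aggregate_z` and the p-values are hypothetical.

```python
import math
from statistics import NormalDist

def aggregate_z(p_values):
    """Aggregate z-score for a subnetwork: sum of per-gene z-scores / sqrt(k).

    Each p-value must lie strictly between 0 and 1, otherwise inv_cdf raises.
    """
    norm = NormalDist()
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    return sum(z_scores) / math.sqrt(len(z_scores))

# Made-up p-values: a strongly differentially-expressed subnetwork vs. a flat one.
active = [0.001, 0.002, 0.01]
inactive = [0.4, 0.6, 0.5]
print(aggregate_z(active))
print(aggregate_z(inactive))
```

Dividing by the square root of the size keeps scores comparable across subnetworks with different numbers of genes.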
jActive - Inputs
Search from highlighted nodes
Select expression significance (p-values)
jActive - Results
Subnetworks listed here
Highlight result and click “Create Network”
Functional Enrichment
Also called over-representation analysis
Searches for common or related functions in a gene set
Is there a common annotation (e.g. pathway, GO term) for a set of genes that is more frequent than you would expect by chance?
Gene Ontology
• Controlled vocabulary describing functions, processes and cell components
• Consistency between organisms and gene products
• GO terms are linked by relationships (is-a, part-of) and have a hierarchy (parent – child)
[Figure: example GO hierarchy with is-a and part-of edges connecting the terms “protein complex”, “organelle”, “mitochondrion” and “fatty acid beta-oxidation multienzyme complex”, plus other protein complexes and other organelles]
Functional Enrichment
BiNGO:
– Looks for GO terms that are over-represented in a set of genes
– Displays the results in two ways: a table with p-values, and a graph showing relationships between terms
– Uses the hypergeometric test to statistically test for over-representation of each GO term
– Performs multiple hypothesis correction (since we are testing multiple GO terms for over-representation)
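The two statistical steps named above, a hypergeometric test per GO term followed by multiple hypothesis correction, can be sketched with the standard library alone. The Benjamini-Hochberg procedure is used here as one common correction and is an assumption, as are the function names; this is not BiNGO's code.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a set of n
    drawn from N genes, of which K carry the GO term."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical numbers: 8 of 20 selected genes carry a term found in 50 of 5000 genes.
p = hypergeom_pvalue(k=8, n=20, K=50, N=5000)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Testing many GO terms at once inflates the number of false positives, which is why the raw p-values are adjusted before a significance threshold is applied.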
BiNGO - Inputs
Click Start BiNGO
Select “Custom” and then load go.annot file
Lower significance level
Fill in Name
BiNGO - Results
BiNGO - Results
General GO Terms
Specific GO Terms
Significance
EGAN: Exploratory Gene Association Networks
http://akt.ucsf.edu/EGAN/
METAGENOMICS
What is Metagenomics?
The culture-independent isolation and characterization of DNA from uncultured microorganism communities
Nice reading list on the topic: http://www.cbcb.umd.edu/confcour/CMSC828G-materials/reading-list.html
See also: Thomas, T., Gilbert, J. and Meyer, F. 2012. Metagenomics - a guide from sampling to data analysis. Microb. Inform. Exp. doi:10.1186/2042-5783-2-3 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3351745/
I will just mention a few relevant bioinformatics tools here (no specific endorsements implied).
MG-RAST server
http://metagenomics.nmpdr.org/
Meyer, F. et al. 2008. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386. doi:10.1186/1471-2105-9-386
MEGAN - MEtaGenome ANalyzer
http://ab.inf.uni-tuebingen.de/software/megan/
Huson DH et al. 2007. MEGAN analysis of metagenomic data. Genome Res. 17: 377-386