Comparative network analysis of neurological disorders focuses the genome-wide search for autism genes.
Dennis P. Wall, PhD
Center for Biomedical [email protected]
wall.hms.harvard.edu
Outline
• Rationale & Biological Significance (30 mins)
• Present status (5 mins)
• Project Plan (25 mins)
Introduction
• Polygenic & Multigenic
• Many genes have been linked to autism
• Few genes have been replicated across studies
• Difficult for a single researcher to grasp the complexity of the autism gene landscape
Behavioral overlap with other disorders
• Autism
• Fragile X
• Rett Syndrome
• Tuberous Sclerosis
• Angelman
• Schizophrenia
• Epilepsy
• Seizure Disorder
• Mental Retardation
• Others??
Approach
• Build the network of all genes implicated in Autism to date
• Conduct large comparative analysis of Autism and other neurological disorders at the level of genes, biological processes, and networks
• Leverage existing research on Autism-related disorders to find new genetic leads.
Building Gene Lists for All Neurological Disorders (433)
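[Figure: disorder names (Asperger, Fragile X, Tourette’s, OCD, …) from the disease source (NINDS) are looked up in the gene-disease sources (OMIM, GeneCards) to fill a disease gene database, which yields per-disorder gene lists (Autism, OCD, Epilepsy, Ataxia).]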
ADHD
Tourette Syndrome
Attention Deficit Hyperactivity Disorder
Primary Lateral Sclerosis
Neurotoxicity
Down Syndrome
Dementia
Alzheimers Disease
Alzheimer Disease
Brain Injury
Stroke
Multiple Sclerosis
Systemic Lupus Erythematosus
Cerebral Palsy
Erbs Palsy
Neuronal Migration Disorders
West Syndrome
De Morsiers Syndrome
Williams Syndrome
Hydrocephalus
Encephalopathy
Huntington Disease
Epilepsy
Schizophrenia
Asperger Syndrome
Angelman Syndrome
Autism
Rett Syndrome
Hypotonia
Infantile Hypotonia
Spasticity
Microcephaly
Mental Retardation
Fragile X
Ataxia
Hypoxia
Seizure Disorder
Tuberous Sclerosis
Obsessive Compulsive Disorder
Major Depression
Migraine
Autism Cluster
[Figure: binary matrix of Disorders by Genes (1100100101…, 1110101011…, 1001010100…, 1001011101…, // 1101011101…); label: Autism Cluster]
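The slides do not say how the disorders were grouped from the binary matrix. As a rough sketch only, the following assumes Jaccard distance between disorder rows; the gene lists are small illustrative stand-ins, not the lists mined from OMIM or GeneCards, and the helper names are hypothetical.

```python
from itertools import combinations

# Toy gene lists (illustrative only, not the real mined lists).
gene_lists = {
    "Autism": {"MECP2", "FMR1", "TSC1", "UBE3A", "RELN"},
    "Rett Syndrome": {"MECP2", "FMR1", "UBE3A"},
    "Tuberous Sclerosis": {"TSC1", "TSC2", "MECP2"},
    "Migraine": {"CACNA1A", "ATP1A2"},
}

# Columns of the binary matrix: every gene seen in any list.
all_genes = sorted(set().union(*gene_lists.values()))

def binary_vector(genes):
    """Row of the disorder-by-gene matrix: 1 if the gene is linked to the disorder."""
    return [1 if g in genes else 0 for g in all_genes]

matrix = {disorder: binary_vector(genes) for disorder, genes in gene_lists.items()}

def jaccard_distance(u, v):
    """1 - |shared genes| / |genes in either list| (assumed metric)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 - both / either if either else 0.0

distances = {
    (a, b): jaccard_distance(matrix[a], matrix[b])
    for a, b in combinations(gene_lists, 2)
}

def nearest_to(disorder):
    """Other disorders ordered from most to least similar gene list."""
    scored = []
    for (a, b), d in distances.items():
        if disorder in (a, b):
            scored.append((d, b if a == disorder else a))
    return [name for _, name in sorted(scored)]

print(nearest_to("Autism"))
```

Feeding the same distances into a hierarchical clustering routine would give a tree of disorders; the group that contains Autism would then play the role of the Autism Cluster.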
Network Construction
• Data derived from STRING (http://string.embl.de/)
• Integration of protein-protein (p-p) interaction (interactome), co-expression (transcriptome), orthology (orthologome), text (bibliome), and other lines of evidence.
• Focus on creating networks of possible interactions within a normal cell using classification methods (random forests)
Sequence coEvolution
P-P Interaction
FXYD1 is identified as a MeCP2 target gene whose de-repression may directly contribute to Rett syndrome neuronal pathogenesis
Text (aka Bibliome)
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030043
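[Figure: Random Forest Decision. A gene pair (A, B) is described by evidence features D1 D2 D3 D4 D5, e.g. {1,0,2,1,0}; several decision trees split on different subsets of D1 to D5 and each gives a Yes/No answer on whether A and B interact.]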
Correlated Expression
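To make the voting idea concrete, here is a much-simplified stand-in for a random forest: bagged one-split trees ("stumps"), each trained on a bootstrap sample and one randomly chosen evidence feature. This is not the classifier used for the talk; the training pairs, the binary feature encoding, and all function names are assumptions made for illustration.

```python
import random

# Toy training set: five evidence flags (D1..D5) per gene pair, label 1 = interacts.
TRAIN = [
    ([1, 1, 1, 1, 1], 1), ([1, 1, 1, 1, 1], 1), ([1, 1, 1, 1, 1], 1),
    ([0, 0, 0, 0, 0], 0), ([0, 0, 0, 0, 0], 0), ([0, 0, 0, 0, 0], 0),
]

def train_stump(sample, feature):
    """Fit a one-split tree on one feature; the threshold is the sample mean."""
    thr = sum(x[feature] for x, _ in sample) / len(sample)

    def majority(labels, default):
        return int(2 * sum(labels) >= len(labels)) if labels else default

    overall = majority([y for _, y in sample], 0)
    left = majority([y for x, y in sample if x[feature] <= thr], overall)
    right = majority([y for x, y in sample if x[feature] > thr], overall)
    return feature, thr, left, right

def train_forest(data, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]
        forest.append(train_stump(sample, rng.randrange(n_features)))
    return forest

def interaction_score(forest, x):
    """Fraction of stumps voting 'Yes, these two genes interact'."""
    votes = [right if x[f] > thr else left for f, thr, left, right in forest]
    return sum(votes) / len(votes)

forest = train_forest(TRAIN)
strong = interaction_score(forest, [1, 1, 1, 1, 1])   # every line of evidence present
mixed = interaction_score(forest, [1, 0, 1, 1, 0])    # partial evidence
none = interaction_score(forest, [0, 0, 0, 0, 0])     # no evidence
print(strong, mixed, none)
```

The score is a vote fraction, so a pair supported by only some evidence types lands between the two extremes; a threshold on that fraction decides which edges enter the network.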
Networks for all AC disorders
Disorder (N nodes / E edges)
• Hypoxia (586N/4359E)
• Fragile X (97N/100E)
• Tuberous Sclerosis (110N/204E)
• Hypotonia (154N/208E)
• Autism (145N/164E)
• Microcephaly (135N/166E)
• Rett (48N/74E)
• Angelman (51N/57E)
• Spasticity (62N/40E)
• Mental Retardation (573N/1035E)
• Ataxia (428N/1489E)
• Seizure Disorder (35N/13E)
• Inf. Hypotonia (29N/16E)
• Asperger (15N/9E)
autworks.hms.harvard.edu
Multi-disorder component of autism (MDAG)
• 66 out of 127 involved in at least one member of the autism cluster
• Highly connected component of the autism network
Biological Process | p value | MDAG genes
transmission of nerve impulse | 3.00E-11 | ABAT, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SCN1A, SLC6A4, TH, TPH1, TSC1
nervous system development | 3.29E-11 | ALDH5A1, APOE, ARX, BTD, CHRNA4, DAB1, DCX, FMR1, FOXP2, GABRA5, GATA3, GRIN2A, HOXA1, MAP2, MECP2, MET, NDN, NF1, NTF5, PAX3, PTEN, RELN, TSC1, UBE3A, VLDLR
synaptic transmission | 7.68E-10 | ABAT, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SLC6A4, TH, TPH1
cell-cell signaling | 3.12E-09 | ABAT, ADM, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SCN1A, SLC6A4, SSTR5, TH, TPH1, TSC1
brain development | 2.64E-06 | ARX, DAB1, DCX, FOXP2, GABRA5, HOXA1, MET, NF1, RELN, TSC1, UBE3A
generation of neurons | 2.43E-05 | APOE, ARX, DAB1, DCX, MAP2, MECP2, MET, NDN, NF1, NTF5, PTEN, RELN, VLDLR
regulation of cell proliferation | 2.45E-04 | ADM, ARX, CHRNA7, DHCR7, FOXP2, GRPR, MECP2, MET, NDN, NF1, PAX3, PTEN, SSTR5, TSC1
cell migration | 3.93E-03 | ARX, DAB1, DCX, MET, NDN, NF1, PAX3, PTEN, RELN, VLDLR
homeostasis | 1.90E-02 | ADM, APOE, ARX, CHRNA4, CHRNA7, GRIN2A, MBD1, NDN, NF1, SCN1A, SLC40A1, SSTR5, TH
cell morphogenesis | 1.94E-02 | APOE, ARX, ATP10A, DCX, MAP2, MECP2, NDN, PTEN, RELN, TSC1
ion transport | 2.74E-02 | ARX, CACNA1D, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GRIN2A, MECP2, MET, SCN1A, SLC40A1, TSC1
cell differentiation | 4.35E-02 | ADM, APOE, ARX, DAB1, DCX, DHCR7, EXT2, FXR1, GATA3, GLO1, GRIN2A, MAP2, MECP2, MET, NDN, NF1, NTF5, PAX3, PTEN, RELN, TSC1, VLDLR
Significantly enriched MDAG processes
• Ion Transport: P = 2.7E-02
• Cell Proliferation: P = 2.45E-04
• CNS Development: P = 3.29E-11
• Synaptic Transmission: P = 7.68E-10
• Fisher’s exact test
• Bonferroni adjustment
• 14648 biological processes from Gene Ontology tested
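The enrichment statistics named on this slide can be sketched as follows. This is a minimal illustration, assuming a one-sided Fisher's exact test (the hypergeometric upper tail) followed by a Bonferroni multiplier of 14648; the counts in the example are made up, since the slides do not give the background gene totals, and the function names are hypothetical.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for enrichment.

    N: genes in the background, K: background genes annotated to the process,
    n: genes in the list of interest, k: list genes annotated to the process.
    Returns P(X >= k) under the hypergeometric null.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

def bonferroni(p, n_tests):
    """Multiply by the number of tests and cap at 1."""
    return min(1.0, p * n_tests)

# Made-up counts: 17 of 66 list genes fall in a process that covers
# 300 of 20000 background genes.
raw_p = fisher_enrichment_p(k=17, n=66, K=300, N=20000)
adjusted_p = bonferroni(raw_p, n_tests=14648)
print(raw_p, adjusted_p)
```

Because the adjustment multiplies every p value by 14648, only processes with very small raw p values remain significant after correction.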
Process-Driven Predictions
[Figure: three columns]
• Biological Processes: CNS development, Synaptic Transmission, Ion Transport, Cell Proliferation…
• Autism Cluster Disorders: Fragile X, Tuberous Sclerosis, Seizure Disorder, Mental Retardation
• Putative New Genes: 64 new genes, all of which occur in 2 or more of the Autism Cluster Disorders
Experimental Validation
• GEO6575 (from UC Davis M.I.N.D. institute)
• White blood cell Affymetrix U133plus2.0
• 17 samples of autistic children without regression
• 18 children with regression
• 9 children with mental retardation or developmental delay
• 12 typically developing children from the general population
Data-driven approach to FDR detection can be ineffective
• Standard data-driven application of false discovery rate (FDR) control yields few genes below an FDR threshold of 0.05 (with these data, only 2 genes survive).
• This is common when the signal is weak and the background noise is large (e.g. microarray experiments).
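A small sketch shows why a genome-wide test can bury a real signal that a focused candidate list would keep. It assumes the Benjamini-Hochberg procedure as the FDR method, which the slides do not specify, and the p values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented example: one gene with raw p = 0.0004, everything else is noise.
genome_wide = [0.0004] + [0.5] * 19999   # tested against the whole array
focused = [0.0004] + [0.5] * 49          # tested within a 50-gene candidate list

print(benjamini_hochberg(genome_wide)[0])  # far above 0.05
print(benjamini_hochberg(focused)[0])      # below 0.05
```

The raw p value is identical in both runs; only the number of hypotheses changes, which is the sense in which prior knowledge focuses the search.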
Results of process-driven search
• 43 Process-derived gene predictions had FDR-adjusted p values <0.05
• Highly significant rate of validation: 65% of predictions confirmed by expression data
Results of network-driven search
• 267 occurred in 1 autism cluster disorder
• 58 occurred in 2
• 17 in 3
• 3 in 4 sibling disorders
• A total of 345 new predictions
Results of network-driven search
• 301 had FDR-adjusted p values <0.05
• 90% (!) of predictions verified by expression data
[Figure: plot with axis label "average distance"; annotation "43"]
Prior knowledge focuses whole-genomic search
• 43 process-derived gene predictions had FDR-adjusted p values <0.05 (65% validation rate)
• 301 network-derived gene predictions had FDR-adjusted p values <0.05 (90% validation rate)
The rate of validation in both cases is significantly non-random
Top 20 genes occurring in 3 or more Autism Sibling Disorders
For many of these candidates, their roles in neurological impairment have been studied in autism cluster disorders, but not in autism.
Molecular Triangulation
• Genes: SLC16A2, OPHN1, AR, PAFAH1B1, FLNA, SLC6A8, MYO5A, FXN, L1CAM
• Disorders: Mental Retardation, Fragile X, Hypotonia, Ataxia, Hypoxia, Microcephaly, Rett Syndrome, Spasticity, Tuberous Sclerosis
• GO biological process enrichment: cytoskeleton organization, cell organization/biogenesis, cell communication, cell motility
Conclusions
• Previous research has implicated between 100 and 1500 genes as contributors to the molecular physiology of Autism.
• Our knowledge-driven approach provides a logical means to filter the genome-wide search.
• Global “ask” swamped by noisy signal
• Informed, knowledge-driven “ask” results in biologically significant gene predictions
• Comparative analysis of Autism with related neurological disorders provides a focused search for novel gene candidates
Autworks
• Autworks is a web-driven navigation system that allows any researcher to view and search through the network of genes implicated in autism and related neurological disorders
• Built to aid and abet the role of serendipity and inspiration for researchers working on autism and other complex neuro diseases.
• http://autworks.hms.harvard.edu
The Plan
• Bring our analytical strategies and Autworks to the cloud
– Beef up underbelly using AWS storage and the Amazon “Turkforce”
– Scale up comparative network analysis
– Enlarge validation database, verify/re-verify computational predictions, robustify the candidates
Aim 1: Build the neurological disease “gene core” of the Autworks relational database
Database | Description | Stats
Database of Genomic Variants | A curated catalogue of structural variation in the human genome | ~31615 total entries (indels, inversions, and copy number variation)
dbSNP | NCBI’s central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms | ~6,136,008 SNPs for Human
Chromosomal Variation in Man* | Searchable reference of chromosomal variation | > 3000 links to publications describing 30 different types of chromosomal variation in human disease
Human Gene Mutation Database* | Established for the study of mutational mechanisms in human genes | 62901 mutations in public release
OMIM* | NCBI’s compendium of human genes and genetic phenotypes | 12,634 genes for ~2459 phenotypes/diseases
GeneCards* | Searchable database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information | All known and predicted human genes with summaries of known disease association
SNPedia* | SNPedia shares information about the effects of variations in DNA, citing peer-reviewed scientific publications | 4621 SNPs
* Can be queried with a disease or gene term
Aim 1: Steps
(1) Extract the entire set of neurological disorders listed by NINDS (currently 433) to ensure that we can find any and all commonalities to Autism.
(2) Mine all databases in above Table that can be searched using a disease term as the query, specifically the Online Mendelian Inheritance in Man (OMIM), GeneCards, Chromosomal Variation in Man, the Human Gene Mutation Database (HGMD), and SNPedia.
(3) Combine and import the features from each of the online resources into a relational database that will become the backend of Autworks, being careful to remove any redundancies.
(4) Cross-reference resources to comprehensively populate data model.
Gene-disease data model “Gene Core”
Field | Description
Gene | Official gene symbol from HUGO
Variant ID | Unique identifier (e.g., RS#, SS#, etc.)
Variant Type | SNP, CNV, Indel, etc.
Genomic Location | Chromosomal coordinates (hg build 36)
Source | Database(s) from where gene and/or gene variant was derived
OMIM score | Confidence score used by OMIM
Polyphen score | Score indicating severity of mutation
Disease | Autism and related neurological disorders
PubmedID | Article(s) describing the genetic variant
This data model will share much in common with the Variome project’s database.
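The field list for the gene core maps naturally onto a single relational table. The sketch below uses SQLite purely for illustration; the column names, the uniqueness rule used to drop redundant records, and the variant identifiers are all assumptions (the identifiers are placeholders, not real RS numbers), and the actual Autworks schema is not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gene_core (
        gene             TEXT NOT NULL,   -- official HUGO symbol
        variant_id       TEXT NOT NULL,   -- e.g. RS#, SS#
        variant_type     TEXT,            -- SNP, CNV, Indel, ...
        genomic_location TEXT,            -- chromosomal coordinates (hg build 36)
        source_database  TEXT,            -- where the record came from
        omim_score       TEXT,
        polyphen_score   REAL,
        disease          TEXT NOT NULL,
        pubmed_id        TEXT,
        UNIQUE (gene, variant_id, disease)  -- assumed redundancy rule
    )
""")

COLUMNS = ("gene", "variant_id", "variant_type", "genomic_location",
           "source_database", "omim_score", "polyphen_score",
           "disease", "pubmed_id")

def add_record(record):
    """Insert one mined record; exact repeats of (gene, variant, disease) are skipped."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT OR IGNORE INTO gene_core ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        [record.get(column) for column in COLUMNS],
    )

# Placeholder records standing in for rows mined from different sources.
records = [
    {"gene": "SHANK3", "variant_id": "example-variant-1", "variant_type": "SNP",
     "source_database": "OMIM", "disease": "Autism", "pubmed_id": "17173049"},
    {"gene": "SHANK3", "variant_id": "example-variant-1", "variant_type": "SNP",
     "source_database": "GeneCards", "disease": "Autism", "pubmed_id": "17173049"},
    {"gene": "MECP2", "variant_id": "example-variant-2", "variant_type": "Indel",
     "source_database": "HGMD", "disease": "Rett Syndrome"},
]
for record in records:
    add_record(record)

row_count = conn.execute("SELECT COUNT(*) FROM gene_core").fetchone()[0]
print(row_count)
```

Skipping a repeat also drops its source name; a fuller design would keep sources in a separate table so that one variant can list every database that reports it.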
[Figure: literature-mining pipeline]
• Medline abstracts, for example:
PMID: 17173049 SHANK3 (also known as ProSAP2) regulates the structural organization of dendritic spines and is a binding partner of neuroligins; genes encoding neuroligins are mutated in autism and Asperger syndrome. Here, we report that a mutation of a single copy of SHANK3 on chromosome 22q13 can result in language and/or social communication disorders...
PMID: 17304222 We identified an important component for controlled actin assembly, abelson interacting protein-1 (Abi-1), as a binding partner for the postsynaptic density (PSD) protein ProSAP2/Shank3. During early neuronal development, Abi-1 is localized in neurites and growth cones; at later stages, the protein is enriched in dendritic spines and PSDs…
• GeneTagger: candidate gene filtered
• MeSH Major Topics: MeSH term filtered
• Annotator checks accuracy through BioNotate system
• Example tags: ABI1, Shank3, Autism
• Results: Gene-Gene and Gene-Disease corpora
Can we Turkify this process???
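The tagging and co-occurrence step in this pipeline can be approximated with plain dictionary matching. The sketch below is not GeneTagger or BioNotate; the synonym tables are hand-written assumptions, and the two abstracts are shortened from the excerpts quoted on the slide.

```python
import re
from collections import Counter
from itertools import combinations

# Hand-written synonym tables (assumed; a real tagger uses curated lexicons).
GENES = {
    "SHANK3": ["SHANK3", "ProSAP2"],
    "ABI1": ["Abi-1", "abelson interacting protein-1"],
}
DISEASES = {
    "autism": ["autism"],
    "Asperger syndrome": ["Asperger syndrome"],
}

abstracts = {
    "17173049": "SHANK3 (also known as ProSAP2) regulates the structural "
                "organization of dendritic spines and is a binding partner of "
                "neuroligins; genes encoding neuroligins are mutated in autism "
                "and Asperger syndrome.",
    "17304222": "We identified an important component for controlled actin "
                "assembly, abelson interacting protein-1 (Abi-1), as a binding "
                "partner for the postsynaptic density (PSD) protein ProSAP2/Shank3.",
}

def find_terms(text, vocabulary):
    """Return canonical names whose synonyms appear as whole words in the text."""
    found = set()
    for canonical, names in vocabulary.items():
        for name in names:
            if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE):
                found.add(canonical)
                break
    return found

gene_gene = Counter()
gene_disease = Counter()
for pmid, text in abstracts.items():
    genes = find_terms(text, GENES)
    diseases = find_terms(text, DISEASES)
    gene_gene.update(combinations(sorted(genes), 2))
    gene_disease.update((g, d) for g in genes for d in diseases)

print(gene_gene)
print(gene_disease)
```

Co-occurrence alone cannot tell a binding partner from a passing mention, which is why the slide routes candidate pairs through a human annotator.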
Aim 2: Build interaction & network cores for Autworks
Database | Description
Protein-Protein Interaction | Derived directly from STRING [18]. STRING incorporates >80,000 p-p interactions from numerous sources including MINT [24], HPRD [25], BIND [26], DIP [27], BioGrid [28, 29], KEGG [30], and Reactome [31]. These databases contain records from two-hybrid assays, synthetic lethality assays, mass spectrometry, co-immunoprecipitation, and more.
Phylogenetic Profiles | We will take the union of evidence from STRING and evidence from RoundUP, which was built by the PI and has greater coverage than STRING’s orthology information (21,000+ unique phylogenetic profiles for more than 30 eukaryotic organisms [2]). Phylogenetic profiles are commonly used to predict functional relationships between proteins [32, 33].
Gene Ontology (GO) | GO [34, 35] contains >923034 unique biological process, function, and cellular component terms. A shared process, function, and/or cellular location can be used to predict protein-protein interaction. This has been incorporated into STRING.
Co-Expression | We will combine data from STRING with our own in-house co-expression database, ChipperDB [23]. ChipperDB contains a sizable portion of NCBI’s Gene Expression Omnibus [36]. Co-expression is a proven method for predicting shared function and protein-protein interaction [37].
Bibliome | Statistically relevant co-occurrences of gene names, and semantically specified interactions found via Natural Language Processing [16].
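[Figure: five evidence sources (GO, Co-Ex, P-P intx, Bibliome, Phylo-profiles) feed a Classifier that produces the Interaction Core; the Network core holds per-disorder networks (Ataxia, Mental Retardation, Autism).]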
Can we “cloud” it up???
Aim 3: comparative network analysis on the cloud
[Figure: Schizophrenia and Autism networks side by side]
- Find disease-filtered interacting partners
- Find shortest paths between candidates
- Find minimal subnetworks
- Verify and reconstruct networks appropriately
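Of the operations listed for Aim 3, the shortest-path search is the easiest to sketch. The example below runs a breadth-first search over an unweighted, undirected network; the edge list is a toy assumption for illustration, not data from STRING or Autworks.

```python
from collections import deque

# Toy interaction edges (illustrative only).
EDGES = [
    ("SHANK3", "ABI1"),
    ("SHANK3", "NLGN3"),
    ("NLGN3", "NRXN1"),
    ("FXYD1", "MECP2"),
]

def build_network(edges):
    """Undirected adjacency map: gene -> set of interacting partners."""
    network = {}
    for a, b in edges:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network

def shortest_path(network, source, target):
    """Breadth-first search; returns the list of genes on one shortest path, or None."""
    if source == target:
        return [source]
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in network.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == target:
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

network = build_network(EDGES)
print(shortest_path(network, "ABI1", "NRXN1"))   # path through SHANK3 and NLGN3
print(shortest_path(network, "ABI1", "MECP2"))   # None: different components
```

Each candidate pair needs its own search and the searches are independent, so this kind of workload splits easily across many machines.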