A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes

Angela K. Dean, Stephen E. Harris, Jianhua Ruan

Post on 16-Jan-2016


Overview
- Osteocytes: background and motivation
- Review of the biological central dogma
- Osteocyte gene set derivation
  - Osteocyte purification
  - Microarray experiments
  - Functional annotation analysis
- Sequence analysis of promoter regions
- Construction of regulatory network
- Partitioning to define cis-regulatory modules
- Results

Background – Cellular functions
- Certain types of cells perform specific biological functions
- Key genes must be activated to perform correctly
- Osteocytes play an essential role in regulating bone formation and remodeling
- We want to identify these key genes and the activators of these genes

Why study osteocyte cells?
- Identifying these key genes (and their activators) involved in the bone-formation process may lead to new targeted therapies
  - For osteoporosis, loss of bone in space travel, extended bed rest, etc.

Molecular Biology Central Dogma

We want to identify these associations between transcription factors (TFs) and the genes that they regulate in order to build a “transcriptional regulatory network”

Osteocyte cells are hard to isolate
- Embedded within the bone matrix, and lacking molecular and cell surface markers, they are seemingly inaccessible
- How to characterize and isolate these cells?
- Solution: create a “special” mouse that contains an inserted “special” gene that drives fluorescence in osteocytes

Isolating osteocytes
- Osteocytes are known to highly express Dentin matrix protein 1 (DMP1)
- A transgene was created with the same promoter (activation) region as DMP1 that drives GFP, then inserted into this transgenic mouse
- Cells that highly express DMP1 (osteocytes) will also drive GFP
- We can now purify osteocytes from other cells using fluorescence-activated cell sorting

Identifying key osteocyte genes using microarray
- Microarray experiments allow us to measure the activity of genes (expression profile)
- We compared the expression profiles of the purified osteocyte cells (+GFP) to non-osteocyte cells (-GFP)
- Identified the top 269 genes expressed > 3-fold higher in the +GFP cells as compared to the -GFP cells (FDR-corrected p-value < 0.05)
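
The selection rule on this slide (more than 3-fold higher in +GFP, FDR-corrected p-value below 0.05) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the slide does not say which FDR procedure was used, so Benjamini-Hochberg is assumed here, and the gene names, signal values and raw p-values are invented.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_enriched(genes: Dict[str, Tuple[float, float, float]],
                    min_fold: float = 3.0,
                    max_fdr: float = 0.05) -> List[str]:
    """genes maps name -> (mean +GFP signal, mean -GFP signal, raw p-value)."""
    names = list(genes)
    fdr = benjamini_hochberg([genes[n][2] for n in names])
    return [n for n, q in zip(names, fdr)
            if genes[n][0] / genes[n][1] > min_fold and q < max_fdr]


# Hypothetical genes and values, invented purely for illustration.
toy = {
    "geneA": (900.0, 100.0, 0.001),  # 9-fold, strong evidence
    "geneB": (250.0, 100.0, 0.002),  # only 2.5-fold
    "geneC": (800.0, 100.0, 0.200),  # 8-fold, weak evidence
    "geneD": (400.0, 100.0, 0.010),  # 4-fold, good evidence
}
selected = select_enriched(toy)
print(selected)
```

Both conditions must hold, so a gene with a large fold change but weak statistical support (geneC) is dropped, as is a well-supported gene below the fold threshold (geneB).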

Identifying functionally-related osteocyte genes
- Each of the 269 genes has one or more GO terms or PIR keywords associated with it
- Gene Ontology (GO) terms describe biological processes, cellular components and molecular functions
- A Protein Information Resource (PIR) keyword is an annotation from the PIR database

Functional Annotation Clustering
- For each GO term associated with a gene or group of genes within the 269 set, a p-value is computed using the hypergeometric distribution and adjusted for multiple testing using the Benjamini method
- Enrichment score per cluster is the geometric mean of the individual GO p-values
- DAVID Bioinformatics Tool was used for the clustering
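
The two quantities named on this slide can be written out as a short sketch. It assumes a plain upper-tail hypergeometric test and omits the multiple-testing adjustment; DAVID's exact statistic may differ in detail. All counts and p-values below are toy numbers. The -log10 value printed at the end is only a common way of presenting such scores, not something the slide states.

```python
from math import comb, exp, log, log10
from typing import List


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    when K of the N background genes carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def geometric_mean(pvals: List[float]) -> float:
    """Geometric mean, computed in log space for numerical stability."""
    return exp(sum(log(p) for p in pvals) / len(pvals))


# Toy numbers: 20 background genes, 5 annotated, list of 4 genes, 2 annotated.
p_term = hypergeom_pvalue(k=2, n=4, K=5, N=20)

# Toy cluster made of two GO terms with p-values 1e-2 and 1e-4.
cluster_score = geometric_mean([0.01, 0.0001])

print(p_term, cluster_score, -log10(cluster_score))
```

A smaller geometric mean (equivalently, a larger -log10 value) indicates a cluster whose member terms are more strongly over-represented.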

Functional annotation clustering results
- As expected, most enriched clusters relate to “extracellular region”, “system development”, etc.
- Cluster 2 relates to bone, and interestingly, Cluster 5 relates to muscle
- We narrowed our 269 gene set to these 98 genes corresponding to bone and muscle

Identifying TF Binding Sites in the 98 gene set
- We searched the 5kb promoter sequence upstream of the transcription start site (TSS) of each gene for known TF binding motifs from the TRANSFAC database, using the rVista tool
- Filtered the TF motifs to keep only those conserved between mouse and human genomes
- Conserved motifs increase confidence
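
The conservation filter can be illustrated with a deliberately simplified sketch. It swaps in plain consensus-string matching on a pre-aligned pair of sequences, whereas the real search used TRANSFAC motifs through rVista. The sequences are invented, and the consensus `YTAWWWWTAR` is a Mef2-like pattern chosen only for illustration, not taken from TRANSFAC.

```python
import re
from typing import List

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}


def conserved_sites(mouse_aln: str, human_aln: str, consensus: str) -> List[int]:
    """Return alignment columns where the motif matches in BOTH species.

    Windows containing a gap character ('-') never match, so only
    ungapped, aligned occurrences are reported.
    """
    if len(mouse_aln) != len(human_aln):
        raise ValueError("aligned sequences must have equal length")
    pattern = re.compile("".join(IUPAC[c] for c in consensus))
    width = len(consensus)
    hits = []
    for start in range(len(mouse_aln) - width + 1):
        mouse_window = mouse_aln[start:start + width]
        human_window = human_aln[start:start + width]
        if pattern.fullmatch(mouse_window) and pattern.fullmatch(human_window):
            hits.append(start)
    return hits


# Invented aligned promoter fragments (27 columns each).
mouse = "GGCTAAAAATAGCCGTTATATATAACC"
human = "GACTATTTATAGCCGTTGTATATAACC"
sites = conserved_sites(mouse, human, "YTAWWWWTAR")
print(sites)
```

In this toy pair the mouse sequence has two matches (columns 2 and 15), but only the first is also a match in human, so the second is filtered out as not conserved.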

Identifying TF Binding Sites in the 98 gene set
- Many motifs identified related to bone & muscle
- 67 of the 98 genes contained over 10 conserved Mef2 binding sites in their promoters

Bone & muscle genes and their number of conserved Mef2 binding sites

Building the transcriptional regulatory network
- Created a network consisting of the 98 gene set and their conserved and enriched TFs as nodes
- An edge between a gene and a TF represents the statistically significant presence of that TF’s binding site on the promoter of that gene
- TFs filtered using conservation AND enrichment to produce more reliable edges and reduce noise
- Enrichment of a TF motif is determined by a p-value based on the number of occurrences in the 5kb upstream of this gene, as compared to the number of occurrences in the 5kb upstream of the rest of the genes in the genome
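
The slide does not name the statistical test behind the enrichment p-value. As one possible reading, the sketch below treats the genome-wide average number of sites per 5kb promoter as a Poisson rate and asks how surprising a gene's own count is. The Poisson model, the 0.05 threshold, and all names and counts are assumptions made for illustration.

```python
from math import exp
from typing import Dict, List, Tuple


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    term = exp(-lam)  # P(X = 0)
    below = 0.0       # accumulates P(X < k)
    for i in range(k):
        below += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - below)


def build_edges(site_counts: Dict[Tuple[str, str], int],
                background_mean: Dict[str, float],
                alpha: float = 0.05) -> List[Tuple[str, str]]:
    """Keep a (gene, TF) edge when the gene's site count is unexpectedly high
    relative to the genome-wide mean count for that TF."""
    edges = []
    for (gene, tf), count in site_counts.items():
        if poisson_upper_tail(count, background_mean[tf]) < alpha:
            edges.append((gene, tf))
    return edges


# Hypothetical counts of conserved sites per promoter.
counts = {("geneA", "TF1"): 8, ("geneB", "TF1"): 2}
background = {"TF1": 2.0}  # assumed genome-wide mean sites per promoter
edges = build_edges(counts, background)
print(edges)
```

With a background mean of 2 sites, 8 sites in one promoter is rare enough to create an edge, while 2 sites is not.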

Modular structure of the regulatory network
- Final network consisted of 98 genes and 153 conserved and over-represented TFs
- To identify possible combinatorial effects of TF binding sites (TFBS), we partitioned the genes in the network using the Q-Cut algorithm
- Q-Cut is a graph partitioning algorithm for finding dense subnets (i.e., communities). It optimizes a statistical score called the modularity, and automatically determines the most appropriate number of communities
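
The modularity score that Q-Cut optimizes can be computed directly for any given partition. The sketch below evaluates the standard Newman-Girvan modularity of an undirected, unweighted graph. It is not Q-Cut itself, which searches over partitions, and the toy graph is invented.

```python
from typing import Dict, List, Tuple


def modularity(edges: List[Tuple[str, str]], community: Dict[str, int]) -> float:
    """Q = sum over communities of (fraction of edges inside the community)
    minus (fraction of edge ends attached to the community) squared."""
    m = len(edges)
    degree: Dict[str, int] = {}
    inside: Dict[int, int] = {}  # edges with both ends in the same community
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            inside[community[u]] = inside.get(community[u], 0) + 1
    total_degree: Dict[int, int] = {}
    for node, d in degree.items():
        c = community[node]
        total_degree[c] = total_degree.get(c, 0) + d
    return sum(inside.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Toy graph: two triangles joined by a single bridge edge (c-d).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
q_split = modularity(toy_edges, {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1})
q_single = modularity(toy_edges, {n: 0 for n in "abcdef"})
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than lumping everything into one community, which is the kind of comparison a modularity optimizer makes when choosing the number of communities.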

- We reduced noise and created a sparser gene-gene network for better partitioning
- We created this temporary network by assigning a cosine similarity score to each pair of genes according to their shared TFs
- Cosine similarity is a measure of similarity between two vectors (each vector contains 153 slots for the 153 enriched TFs in the 98 gene set)
- Edges between genes represent their similarity score, and this net was converted to a sparse net by connecting each gene to its k nearest neighbors (k=7) and employing a similarity score cutoff of 0.5
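
A minimal sketch of the sparsification step follows, using 4 TF slots per gene instead of 153 and invented binary profiles. Two details are assumptions because the slide leaves them open: an edge is kept if either gene lists the other among its k nearest neighbors, and a score exactly at the cutoff is kept.

```python
from math import sqrt
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two TF-profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(profiles: Dict[str, List[float]], k: int = 7,
              cutoff: float = 0.5) -> Set[Tuple[str, str]]:
    """Link each gene to its k most similar genes, dropping links below cutoff."""
    edges: Set[Tuple[str, str]] = set()
    for gene, vec in profiles.items():
        scored = sorted(((cosine(vec, other_vec), other)
                         for other, other_vec in profiles.items() if other != gene),
                        reverse=True)
        for score, other in scored[:k]:
            if score >= cutoff:
                # Undirected edge, stored once with names in sorted order.
                edges.add((min(gene, other), max(gene, other)))
    return edges


# Hypothetical profiles: 1 means the TF is linked to the gene, 0 means it is not.
toy_profiles = {
    "g1": [1, 1, 0, 0],
    "g2": [1, 1, 1, 0],
    "g3": [0, 0, 1, 1],
    "g4": [0, 0, 0, 1],
}
sparse_edges = knn_graph(toy_profiles, k=2, cutoff=0.5)
print(sorted(sparse_edges))
```

Here g2 and g3 share one TF, but their similarity (about 0.41) falls under the cutoff, so the sparse network separates into two pairs.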

Identifying modules in the initial regulatory network
- Q-Cut was then applied to this gene-gene network, resulting in communities with many common TF binding sites

Interesting clusters
- Cluster below shows a strong community structure between 16 genes and their common TFBS
- Representative of many TFs coordinately regulating a small set of genes

A putative model of a transcriptional network
- A proposed model was built using the network results
- DMP1 & Sost (highly expressed in osteocytes) are shown to be regulated by Mef2 and Myogenin

Putative model used to generate hypotheses
- We now have an ex vivo system for pure osteocytes in a proper microenvironment to conduct experimental validation based on this model
- Here the osteocytes will make appropriate levels of osteocyte-specific genes
- Experiments are currently underway

Conclusions
- We used a systems biology method to construct a putative transcriptional regulatory network model for osteocytes, by integrating
  - Microarray data
  - Functional annotation
  - Comparative genomics
  - Graph-theoretic knowledge
- Many parts of the network can be confirmed by the literature
- Experiments are currently underway to further validate the model