NOVEL APPROACHES TO THE VISUALIZATION OF CELL SPECIFIC GENE
EXPRESSION PATTERNS
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF BIOENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Chuba Benson Oyolu
December 2010
This dissertation is online at: http://purl.stanford.edu/gq062rg0666
© 2011 by Chuba Benson Odimegwu Oyolu. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Julie Baker, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Russ Altman
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Karl Deisseroth
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of
Philosophy
(Julie C. Baker PhD) Principal Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of
Philosophy
(Russ B. Altman M.D. PhD)
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of
Philosophy
(Karl Deisseroth M.D. PhD)
Approved for the Stanford University Committee on Graduate Studies
ABSTRACT
The fate of a cell is largely determined by the unique patterns of gene
expression found within it. Complex biological machinery exists within each
cell to manipulate chromatin state, and ultimately control gene expression.
Developmental processes such as cellular differentiation require very specific
chemical signals and environmental conditions. These serve as triggers to put
the chromatin modification schemes that produce the resultant patterns of
differential gene expression into action, leading to the formation of the cell
type of interest. My thesis work is an in-depth study of the link between
chromatin modification, gene expression, and the unique genetic signatures
that characterize distinct cells on unicellular and multi-cellular levels. On the
multi-cellular level, I have examined histone modification patterns for their
effects on gene activation and repression during human embryonic stem cell
differentiation. On the unicellular level, I have worked with a variety of cell
types to ascertain the degree of individuality that exists between single
members of relatively homogeneous cell groups while simultaneously looking
for housekeeping gene expression signatures that can be used to classify
each cell type into a unique group. To further elucidate the patterns of gene
expression found within cell groups and the single cells that comprise them, I
have worked to develop new computational methods that produce visual aids
depicting the gene expression signatures of single cells and cell groups.
ACKNOWLEDGEMENTS
I would first and foremost like to thank the creator for all the help and comfort
that I received in dire moments without which I would never have come this
far. To my parents Edith and Victor Oyolu, your advice and unconditional love
gave me the confidence to persevere regardless of the circumstances and
challenges I faced. I would like to thank the members of my thesis committee
for the insightful and valuable advice they have given me throughout my
career. I would like to especially thank Dr Julie Baker for excellent mentorship
throughout my post-graduate degree, and all the members of the Baker lab for
being so generous with their time and expertise. I would also like to thank my
collaborators… especially those in the Quake and Sidow labs for excellent
correspondence and remarkable technical work. The Genetics department not
only welcomed me from Bioengineering with open arms, but also gave me the
opportunity to do the work I enjoy. And for that, I will be eternally grateful.
TABLE OF CONTENTS
Chapter 1 Introduction
Section I - Chromatin Modification
Chapter 2 Nodal Signaling Refines Bivalent Domains During Endoderm Formation in hESCs
Chapter 3 Cell specific vector generated surface plots “ChIPvect_gui”
Section II - Single Cell Gene Expression
Chapter 4 SC Express: A visual aid to uniquely identify single cells
Chapter 5 Analysis of Gene Expression Patterns in Single Human Embryonic Stem Cells and Their Derivatives Allows for Cellular Classification
Chapter 6 Outlook
Chapter 7 Archive: MATLAB code
References
LIST OF FIGURES
Figures 2.1-2.5, Figures S2.1-S2.9, Figures 3.1-3.5, Figures 4.1-4.5, Figures 5.1-5.5
LIST OF TABLES
Table 2.1, Tables S2.1-S2.3, Tables 3.1-3.2, Table 4.1, Tables 5.1-5.2
Though a vast majority of cell types contain the same base genetic
template, it is currently understood that the uniqueness of each cell is
endowed through selective expression and repression of genes (Schnabel,
Marlovits et al. 2002). Some of the factors internal to the cell that are known to
influence gene expression include: histone modification, transcription factor
binding, and DNA methylation (Jaenisch and Bird 2003; Brunner, Johnson et
al. 2009). Environmental signals received by developing cells serve to trigger
these mechanisms, leading to differential gene expression which eventually
culminates in cell fate determination. The goal of my thesis is two-fold. First, to
understand the epigenetic and transcriptional mechanisms that lead to
differential gene expression on a multi-cellular level, and secondly, to
determine the amount of genetic variation that exists between single members
of the same cell group.
Until relatively recently, differences in the sequence of DNA were
assumed to be solely responsible for the morphological and functional
differences between cells. Research in the past decade has shown that
epigenetic mechanisms are in fact largely responsible for differential gene
expression, and thus the functional and morphological differences between
cells (Bernstein, Mikkelsen et al. 2006). Eukaryotic DNA in its native state is
neatly packaged with histone proteins to form chromatin. Chromatin can take
two forms: the heterochromatic (inactive) and euchromatic (active) form. It is
currently held that the transition between these two forms of chromatin is
largely determined by modifications to the histone proteins that comprise the
nucleosome (Bernstein, Mikkelsen et al. 2006).
The advent of chromatin immunoprecipitation coupled with
high-throughput sequencing (ChIP-Seq) has provided a tool with which to monitor
the effect of specific histone modifications on the control of gene expression
(Johnson, Mortazavi et al. 2007). This method, coupled with expression
profiling, has shown that in most cells, histone modifications on specific lysine
residues promote either activation or repression of genes. For example, it is
held that tri-methylation of the fourth lysine residue (K4) on histone 3 (H3) is
generally associated with the activation of gene expression (Shi, Hong et al.
2006). On the other side of the coin, tri-methylation of the twenty-seventh
lysine residue (K27) of H3 is thought to be associated with gene repression
(Viré, Brenner et al. 2006). Even with this good base of knowledge, many
questions concerning the dynamics of histone modification during human
embryonic stem cell (hESC) differentiation remain unanswered.
hESCs have become one of the major tools in regenerative medicine
and tissue engineering, making it imperative to understand the key
mechanisms that govern their differentiation to more mature cell types. During
development, the three primary germ layers that yield nearly all the cell types in
the mature organism are specified: endoderm, mesoderm, and ectoderm
(Wells and Melton 2000). The primary germ layer known as
endoderm is of particular interest because it is the source of essential visceral
organs such as the lung, liver, and pancreas (D'Amour, Agulnick et al. 2005;
Sherwood, Jitianu et al. 2007). Though
experimental protocols have been developed to effect the differentiation of
hESCs to definitive endoderm, the dynamic changes in the state of chromatin
that occur during this transition have not been well studied. Studying the effect
of histone modifications on the activation of gene expression may yield
valuable insight into the number and types of genes actively involved in
endoderm specification from hESCs.
While methods such as chromatin immunoprecipitation and microarrays
allow for the study of gene expression on a multi-cellular level, there is
growing interest in the prospect of examining gene expression on the single
cell level. The relatively recent application of microfluidic technology to biology
has fashioned an era in which the expression levels of selected genes within
single cells can be readily observed (Thorsen, Maerkl et al.
2002; Warren, Bryder et al. 2006). As a result, it is possible to ask
questions concerning the degree of uniformity between the gene signatures of
single members of the same cell group. Gene expression data at single-cell
resolution lend themselves well to the design of novel computational
methods that facilitate visualization of the unique genetic signatures that
characterize each single cell and each group of cells of the same type.
The work in this dissertation begins on the multi-cellular scale with the
study of the synergistic interactions between histone modification and nodal
signaling that lead to cell fate determination during endoderm development.
To address this topic, the differentiation of hESCs into definitive endoderm
was used as a model system. After
examining these questions on the multi-cellular level, we transitioned to the
unicellular level with the aim of examining the degree of transcriptional
variation between single cells of the same type. And to this end, considerable
effort was devoted towards developing computational tools to enable
visualization of the gene expression patterns within single cells.
CONTRIBUTION
The work in this chapter was done in collaboration with Dr Si Wan Kim.
In this body of work, I assumed responsibility for the following:
1. Tissue Culture and maintenance of human embryonic stem cells.
2. Differentiation of human embryonic stem cells into endodermal cells
3. Chromatin immunoprecipitation experiments
4. Data analysis, including peak calling and comparison with data from
outside sources
5. Manuscript editing and figure generation in preparation for publication
SUMMARY
Uncovering the network that mediates NODAL signaling is critical
toward understanding both maintenance of pluripotency and early cell fate
commitment. To gain insights into the NODAL transcriptional network in
hESCs and derived endoderm, we analyzed the genomic targets for
SMAD2/3, SMAD3, SMAD4, and FOXH1 - as well as the chromatin modifying
marks, H3K4me3 and H3K27me3 - using ChIP-Seq technology. Mapping
sequencing reads to the human genome revealed an unprecedented number
of direct targets of NODAL signaling. We find that while the association of any
of these transcription factors within 1 kb of a transcription start site is
predictive of transcriptional activity, the presence of multiple SMAD2/3-bound targets within
10 kb is the most predictive motif for transcriptional activation, especially in
endoderm. Despite the differentiation toward endoderm, we find that bivalent
regions, containing both H3K4me3 and H3K27me3, are still predominant
features of the chromatin, and may even be increased from hESCs.
Significantly, SMAD2/3 bound regions containing the broadest bivalent
signature are specifically resolved upon endoderm differentiation and are
highly predictive of transcriptional activation. The correlation between
SMAD2/3 binding, bivalent resolution and transcriptional activation suggests
that SMAD2/3 directly or indirectly plays an important role in bivalent
resolution within regions critical for endodermal specification. It further
provides a system in which to study how these key ‘poised’ regions become
activated.
INTRODUCTION
Embryogenesis is a complex process, requiring the coordinated
regulation of thousands of genes with a myriad of biological functions. While
we know a great deal about the general signaling pathways and how they
affect cell fate decisions, once these pathways enter the nucleus, very little is
known about how they bind necessary sequences, what those sequences are,
how the chromatin is configured at these regions, and how this combination of
events triggers the next emerging cell fate. Some of the major unresolved
questions in developmental biology pertain to how signaling pathways become
diversified in the nucleus and how these resulting combinations of genes
influence specific developmental fates.
Endoderm is one of the first cell types to emerge during embryogenesis
and does so under the control of the NODAL signaling pathway. The secreted
protein - NODAL - signals through serine/threonine kinase receptors to
activate the intracellular proteins and transcription factors, SMAD2, SMAD3
and SMAD4. These transcription factors form an association with FOXH1 at
target regions within the genome. Several direct targets of SMAD2/3/4 and
FOXH1 have been elucidated which play key roles in endoderm development,
including GSC, PITX2, LEFTY1, LEFTY2, NODAL and CADHERIN (Shiratori,
Sakuma et al. 2001; Saijoh, Oki et al. 2003; von Both, Silvestri et al. 2004; Izzi,
Silvestri et al. 2007). However, very little is known about how the SMAD2/3/4
and FOXH1 complex assembles at specific genomic targets in a cell type
specific manner. Recently, mouse FOXH1 targets have been bioinformatically
identified using a combination of FOXH1 and SMAD2 consensus sequences
(Silvestri, Narimatsu et al. 2008), but it remains unknown which of these
targets are functionally bound within different cell types. NODAL signaling is
pleiotropic, being involved not only in the establishment of endoderm, but
repeatedly throughout development in the formation of the heart, skin, bones,
and reproductive tracts (von Both, Silvestri et al. 2004; Owens, Han et al.
2008). It has also been implicated in a large variety of cancers (Gupta et al.,
2004; Lee et al., 2010; Mangone et al., 2010; Xu et al., 2004). Recently, it has
been shown that NODAL signaling is required for the maintenance of
pluripotency in human embryonic stem cells (hESCs) (Besser 2004; James,
Levine et al. 2005; Vallier, Alexander et al. 2005; Vallier, Mendjan et al. 2009)
which appears contradictory as it is also involved in the first stages of
differentiation toward endoderm in these cells (D'Amour, Agulnick et al. 2005;
D'Amour, Bang et al. 2006). As NODAL has long been known to have strong
dose dependent effects on cell fate specification, it is likely that the decision
between maintaining pluripotency versus differentiation is due to significant
changes in downstream targets in response to varying levels of NODAL signal.
The effect of NODAL in maintaining pluripotency may also be
dependent upon the distinct chromatin state existing in hESCs. hESCs are
known to have a high degree of heterochromatin and have been shown to
have a prevalent histone signature, called a bivalent domain, where a genomic
region is associated with both active (H3K4me3) and repressive (H3K27me3)
histone marks (Bernstein, Mikkelsen et al. 2006; Ku, Koche et al. 2008). These
bivalent domains, especially those that span broad regions, are associated
with developmentally regulated cell fate genes. Thus the bivalent mark in
hESCs has been hypothesized to ‘poise’ developmental genes for rapid
activation (Bernstein, Mikkelsen et al. 2006). Indeed, several reports have
shown that these bivalent marks are resolved into either repressive
(H3K27me3) or active (H3K4me3) states upon differentiation, suggesting that
cell fate commitment may require the release of this primed bivalent state
(Bernstein, Mikkelsen et al. 2006; Zhao, Han et al. 2007).
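The chromatin states described in this paragraph can be written as a simple rule. The sketch below is illustrative only: it assumes that peak calls for each mark have already been reduced to a yes/no flag per region, and the function names and state labels are hypothetical rather than taken from this work.

```python
def chromatin_state(has_h3k4me3: bool, has_h3k27me3: bool) -> str:
    """Label a region by which of the two histone marks it carries."""
    if has_h3k4me3 and has_h3k27me3:
        return "bivalent"      # both the active and the repressive mark
    if has_h3k4me3:
        return "active"        # H3K4me3 only
    if has_h3k27me3:
        return "repressed"     # H3K27me3 only
    return "unmarked"          # neither mark detected


def is_resolved(state_before: str, state_after: str) -> bool:
    """A bivalent region is 'resolved' once it becomes active or repressed."""
    return state_before == "bivalent" and state_after in ("active", "repressed")
```

Under this rule, a region that is bivalent in hESCs and carries only H3K4me3 after differentiation counts as resolved toward the active state.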
In order to examine the role of NODAL signaling in both pluripotency
and endoderm specification, and how chromatin state influences the response
to these signals, we provide a genomic analysis of SMAD2/3, SMAD3, SMAD4
and FOXH1 targets in both hESCs and hESCs differentiated into endoderm.
We demonstrate that targets for these transcription factors are highly dynamic
and change between the two cell types, suggesting that different loci may
indeed be used to drive different fates. We further show that SMAD2/3,
SMAD3, SMAD4 or FOXH1 binding within 10 kb of the transcription start site
(TSS) is highly predictive of transcription. Additionally, the binding of multiple
sites adjacent to a promoter holds even greater predictive power, particularly
for SMAD2/3 within endodermal cells, suggesting that the presence of multiple
complexes correlates strongly with transcriptional levels.
To elucidate whether these responses are due to chromatin state, we
performed genome wide mapping of marks associated with H3K4me3 and
H3K27me3 in both hESCs and derived endoderm. Although hESC derived
endoderm has similar bivalent domains to hESCs, we show that those regions
selectively associated with SMAD2/3 lose the broad bivalent context within the
endoderm. Interestingly, these SMAD2/3 bound regions are the most
favorable context for inducing an endodermal transcriptional response.
Overall, we report an extensive resource for targets of this important pathway
and associate binding activity to specific chromatin contexts.
RESULTS
Genome-Wide Target Analysis of SMAD2/3/4 and FOXH1 in hESCs and
Derived Endoderm
To characterize the downstream NODAL targets during the
differentiation of hESCs into the endodermal lineage, we performed ChIP-Seq
using antibodies against SMAD2/3, SMAD3, SMAD4, and FOXH1. Since
NODAL has a pleiotropic and somewhat contradictory function to both prevent
and induce differentiation in hESCs, we sought to evaluate this pathway in
both hESCs and endoderm derived from hESCs after treatment with ACTIVIN,
which is known to activate the same pathway. Comparison of NODAL targets between
these stages provides insight into the networks involved in pluripotency and
endoderm formation and can be used to evaluate how these networks change
through time. We examined multiple antibodies against SMAD2/3, SMAD3,
SMAD4 and FOXH1 for their ability to pull down chromatin in both hESCs and
derived-endoderm and found several, including two SMAD2/3 antibodies (anti-
rabbit; SMAD2/3_A and anti-goat; SMAD2/3_B, Table 1), that were highly
efficient based upon extensive validation. By using ChIP-qPCR, we analyzed
enrichment of several known SMAD targets, including LEFTY1 and LEFTY2
(Figure S2.1). GAPDH intronic sequences were used as negative controls.
After validation, three ChIPs were pooled from each antibody as well as input
controls in both hESCs and derived endoderm. Libraries were then generated
and sequenced with an Illumina Genome Analyzer II. Sequence tags were
mapped to the human genome (hg18) using Eland and binding sites were
identified using CisGenome (Ji, Jiang et al. 2008). Each binding site was
associated with the nearest gene TSS (UCSC Known Gene) within 1000 kb (1
Mb), 100 kb, 10 kb and 1 kb (Table 1). For the transcription factors,
SMAD2/3_A, SMAD2/3_B, SMAD3, SMAD4, and FOXH1, we generated 10.2,
8.7, 6.9, 9 and 10 million mapped reads in hESCs and 9.6, 5.9, 6.1, 6.1 and
11 million mapped reads in derived endoderm, respectively (Table 2.1). We
compared the targets elucidated from the two SMAD2/3 antibodies (A and B)
and found a high degree of overlap in both hESCs and derived endoderm
(92.9% and 74.1%, respectively). As the two SMAD2/3 antibodies detected
similar targets, but more were identified using SMAD2/3_B, all subsequent
analysis was performed on the B dataset.
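The association of each binding site with its nearest TSS at several distance cut-offs can be sketched as follows. This is a minimal illustration and not the CisGenome procedure itself: it assumes a single chromosome, ignores strand, and uses hypothetical function names and a TSS list pre-sorted by position.

```python
from bisect import bisect_left

WINDOWS_BP = (1_000, 10_000, 100_000, 1_000_000)  # 1 kb, 10 kb, 100 kb, 1 Mb


def nearest_tss(site_pos, tss_list):
    """Return (gene, distance) for the TSS closest to site_pos.

    tss_list must be a non-empty list of (position, gene) tuples on one
    chromosome, sorted by position.
    """
    positions = [pos for pos, _ in tss_list]
    i = bisect_left(positions, site_pos)
    candidates = tss_list[max(i - 1, 0):i + 1]  # neighbours on either side
    pos, gene = min(candidates, key=lambda t: abs(t[0] - site_pos))
    return gene, abs(pos - site_pos)


def windows_containing(distance):
    """List the cut-offs (in bp) that a site at this distance satisfies."""
    return [w for w in WINDOWS_BP if distance <= w]
```

A site 800 bp from its nearest TSS would be counted at all four cut-offs, whereas a site 50 kb away would be counted only at 100 kb and 1 Mb.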
Our dataset reveals an unprecedented number of direct targets of
NODAL signaling. Unexpectedly, we found that FOXH1 occupancy is vastly
expanded upon differentiation into endoderm while SMAD2/3 becomes more
limited: SMAD2/3 binds 14,833 sites in hESCs, but only 2,915 in derived
endoderm while FOXH1 binds 9,702 sites in hESCs and 29,292 regions in
derived endoderm. This differential use of particular transcription factors
suggests that they occupy very distinct target regions and that FOXH1 may be
acting to coat the chromatin upon differentiation, a role consistent with its
known ‘pioneering’ activities to facilitate opening chromatin (Cirillo and Zaret
1999; Cirillo, Lin et al. 2002). Overall, this provides an unprecedented dataset
in which to mine for NODAL targets and putative effectors of this important
pathway.
SMAD2/3/4 associate with different targets in hESCs and derived
endoderm.
We examined the genome distribution of NODAL targets before and
after differentiation in order to determine target dynamics. To this end, we
categorized each binding target based on whether it resided on an annotated
exon, intron, promoter (±10 kb from the TSS), or intergenic region (Figure
2.1A). We found that, in hESCs, SMAD2, 3 and 4 are bound at similar
frequencies to each of these genomic regions and the binding of these
transcription factors is mostly concentrated within genes or surrounding genes,
not within intergenic regions. In contrast, most of the SMAD binding (85%)
occurs in intergenic and intronic regions in derived endoderm with less than
5% and 10% occurring in exons and promoters, respectively. Surprisingly, the
genomic distribution of FOXH1 targets remains more constant between these
two cell types, exhibiting a high degree of binding outside of exons and
promoters. This mimics the distribution of SMADs within derived endoderm,
but not in hESCs. Overall, the SMAD transcription factors display remarkable
dynamics in the genomic distribution of their binding regions even within the 5
days that separate hESCs from endoderm, with the SMAD proteins
preferentially occupying exon and promoter regions in hESCs only.
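The categorization of binding targets into genomic regions can be illustrated with a short sketch. The order in which the categories are tested (promoter first, then exon, then intron) is an assumption made here for illustration, as is the simplified gene model; only the ±10 kb promoter definition comes from the text.

```python
PROMOTER_HALF_WIDTH = 10_000  # +/-10 kb from the TSS, as defined in the text


def classify_site(pos, gene):
    """Classify a binding-site position relative to one gene model.

    gene is a dict with 'tss', 'start', 'end' and 'exons' (a list of
    (start, end) tuples); all coordinates are inclusive.
    """
    if abs(pos - gene["tss"]) <= PROMOTER_HALF_WIDTH:
        return "promoter"    # assumed to take precedence over exon/intron
    if any(start <= pos <= end for start, end in gene["exons"]):
        return "exon"
    if gene["start"] <= pos <= gene["end"]:
        return "intron"      # inside the gene but not in any exon
    return "intergenic"
```

Tallying these labels over all binding sites of one factor in one cell type yields the kind of distribution summarized in Figure 2.1.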
As SMAD binding is dynamic between hESCs and derived endoderm,
we sought to define how these targets are utilized in the different cells. By
analyzing the overlapping targets between each transcription factor in either
hESCs or derived endoderm, we found that most of the SMAD binding targets
change upon differentiation. Only 459 of the 14,833 (3%) SMAD2/3 targets in
hESCs are preserved in the derived endodermal cells (Figure 2.2b). A similar
pattern is observed for SMAD3 (180/2,688; 6.7%), and SMAD4 (345/3,936;
8.8%). On the other hand, FOXH1 retains almost 50% of its hESC targets
upon differentiation toward endoderm. Together, this suggests that a vast
change in transcription factor occupancy is triggered upon differentiation
toward endoderm.
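The notion of a target being "preserved" between cell types can be made concrete with a small sketch. The overlap criterion used here (at least one shared base between peaks on the same chromosome) is an assumption, and the function names are hypothetical; the final lines simply recompute the percentages from the counts reported above.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def fraction_preserved(peaks_before, peaks_after):
    """Fraction of peaks_before that overlap any peak in peaks_after."""
    kept = sum(
        any(overlaps(p, q) for q in peaks_after) for p in peaks_before
    )
    return kept / len(peaks_before)


# Counts reported in the text: (preserved targets, total hESC targets).
reported = {"SMAD2/3": (459, 14_833), "SMAD3": (180, 2_688), "SMAD4": (345, 3_936)}
percent = {tf: round(100 * kept / total, 1) for tf, (kept, total) in reported.items()}
# percent == {'SMAD2/3': 3.1, 'SMAD3': 6.7, 'SMAD4': 8.8}
```

The all-against-all comparison is quadratic in the number of peaks; for genome-scale peak lists a sorted sweep or interval tree would be the more practical choice.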
SMAD2/3/4 Associate With Similar Neighboring Genes in hESCs and
Derived Endoderm
As SMAD2, 3 and 4 were bound to distinct targets within hESCs and
derived endoderm, we tested whether these targets surrounded the same
neighboring genetic region. For example, SMAD2/3 may bind different targets
in hESCs and endoderm, but the targets may still be responsible for regulating
the same genes. To this end, we examined the overlap between genes called
within the regions bound by all of the transcription factors analyzed. We found
that genes lying within target regions remained more consistent between
hESCs and endoderm than the targets themselves. For example, 1,134 of the
1,905 (60%) genes neighboring SMAD2/3 targets within 100 kb in endoderm
were also targeted in hESCs, compared to 6.7% of the exact targets (Figure
2.1b). This suggests that while the NODAL targets are dynamic during
differentiation they tend to occupy regions surrounding similar genes. These
findings strongly support the notion that transcription factors are highly
dynamic and use different loci within gene regions to mediate distinct
transcriptional responses.
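The distinction between overlap at the level of binding sites and overlap at the level of neighboring genes can be shown with a toy example. The coordinates and gene names below are hypothetical placeholders, and the 100 kb window matches the distance quoted above.

```python
def genes_near(sites, tss_by_gene, window=100_000):
    """Set of genes whose TSS lies within `window` bp of any site."""
    return {
        gene
        for gene, tss in tss_by_gene.items()
        if any(abs(site - tss) <= window for site in sites)
    }


# Toy coordinates: no binding site is shared between the two cell types...
tss_by_gene = {"GENE_A": 200_000, "GENE_B": 900_000}
hesc_sites = [150_000, 880_000]
endo_sites = [260_000]

hesc_genes = genes_near(hesc_sites, tss_by_gene)
endo_genes = genes_near(endo_sites, tss_by_gene)

# ...yet every gene neighbouring an endoderm site also neighbours an hESC site.
shared = len(endo_genes & hesc_genes) / len(endo_genes)
```

In this toy case the site-level overlap is zero while the gene-level overlap is complete, which is the qualitative pattern described in this section.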
SMAD2, SMAD3, SMAD4, and FOXH1 are known to regulate similar
downstream targets in a variety of cellular contexts and are known to form
complexes at these sites (Attisano, Silvestri et al. 2001; Silvestri, Narimatsu et
al. 2008). Therefore, we examined the overlapping targets between these
transcription factors. We found that, in both hESCs and derived endoderm, all
SMAD transcription factors are bound near a highly overlapping set of genes,
regardless of the distance examined from TSS (Figure 2.1b and Figure S2.2).
Most gene regions were bound by all three proteins. Comparison between the
putative target genes for the SMAD2, 3 and 4 proteins with those of FOXH1
show that while some overlap exists in hESCs, due to the overwhelming
genome wide occupancy of FOXH1, it is extensive in derived endoderm,
encompassing almost all (98.6%) of SMAD target genes (Figure 2.1b).
SMAD2/3/4 and FOXH1 Complexes are Highly Predictive of Gene
Transcription If Present Within 10 kb of TSS
SMAD2, 3, 4 and FOXH1 bind thousands of regions genome wide, but
since transcription factor binding does not necessarily equal transcriptional
activity, we sought to understand how these binding signatures correlate with
gene expression output. To this end, we first performed an extensive
microarray time course of hESC differentiation into endoderm post ACTIVIN
treatment, sampling on days 0, 1, 3, and 5. Interestingly, several
critical lineage specification genes, including GSC, MIXL1 and EOMES, are
highly enriched (more than 35-fold) after the first 24 hours of differentiation
(Table S2.1). The regions surrounding each of these developmentally
important genes exhibit specific NODAL target regions for both hESCs and
derived endoderm, illustrating the dynamic nature of SMAD2/3/4 binding
(Figure 2.2). For example, upon differentiation to endoderm, EOMES and
GSC are bound by SMAD2/3/4 in regions not bound in hESCs (see Figure 2.2
dotted black boxes). Conversely, several regions bound in hESCs are lost in
endoderm.
We next determined the most favorable context of SMAD2/3/4 or
FOXH1 binding that could be correlated to a transcriptional response. To this
end, we examined all regions in the genome surrounding a TSS at 1 kb, 10 kb
and 1 Mb and identified each that contained regions bound to SMAD2/3/4 or
FOXH1 within both hESCs and derived endoderm. We next correlated these
binding contexts with neighboring gene transcription levels, the total of which
were averaged and compared with transcriptional levels of genes with no
detectable binding. Surprisingly, we find that in both hESCs and derived
endoderm, the presence of a SMAD2/3, SMAD3, SMAD4 or FOXH1 binding
event within 1 kb of a TSS is significantly correlated with an increase in
transcriptional levels, above background levels (Student’s t-test; in hESCs, P values
were 1.5E-50, 5.6E-38, 2.5E-15 and 5.6E-16, respectively; in endoderm, 3.5E-12,
8.7E-12, 1.7E-05 and 4.3E-12, respectively; see Figure S2.2). Once this
distance is expanded to 10 kb or 1 Mb, this correlation diminishes for all
transcription factors.
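The comparison between genes with and without a nearby binding event can be sketched as a two-sample Student's t statistic. The version below assumes the classical equal-variance (pooled) form, which may differ from the exact test settings used for the reported P values; it returns only the statistic and degrees of freedom, from which a P value would be obtained using the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def student_t(bound, unbound):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). A positive t means the bound
    genes have the higher mean expression level.
    """
    n1, n2 = len(bound), len(unbound)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(bound) + (n2 - 1) * variance(unbound)) / df
    t = (mean(bound) - mean(unbound)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df
```

Here `bound` would hold the expression levels of genes with a binding event inside the chosen window and `unbound` those of genes with no detectable binding.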
We next examined only the 10 kb interval and asked whether the
accumulation of multiple SMAD2/3/4 or FOXH1 binding events could be
correlated with transcriptional activity. In derived endoderm, we find that the presence of three
or more binding regions of SMAD2/3, SMAD3, SMAD4 or FOXH1 proteins is
highly correlated with increased transcription levels and the more target
regions within this interval, the more significant the correlation. This correlation
is particularly strong for regions containing three or more SMAD2/3 or SMAD3
bound sites in derived endoderm (Student’s t-test; P = 1.5E-22 and 9.5E-16,
respectively; see Figure S2.4). Overall, these data strongly suggest that in both
hESCs and endodermal cells NODAL targets are more likely to be activated if
any of these transcription factors have concentrated regions of binding within
10 kb from the TSS.
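Grouping genes by the number of binding events near the TSS can be sketched as below. The function names are hypothetical, and the example assumes one chromosome with positions given in base pairs; the 10 kb window and the threshold of three sites follow the text.

```python
def binding_counts(tss_by_gene, sites, window=10_000):
    """Count binding sites within `window` bp of each gene's TSS."""
    return {
        gene: sum(abs(site - tss) <= window for site in sites)
        for gene, tss in tss_by_gene.items()
    }


def genes_with_at_least(counts, minimum=3):
    """Genes whose TSS neighbourhood holds `minimum` or more sites."""
    return sorted(gene for gene, n in counts.items() if n >= minimum)
```

The expression levels of the genes returned by `genes_with_at_least` can then be compared against those of genes with fewer or no binding events.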
Genome-Wide Mapping of Chromatin Marks, H3K4me3 and H3K27me3, in
hESCs and Derived Endoderm
As the regions surrounding the TSS appear to be critical for SMAD
activation of transcription, we next sought to examine whether these regions
are associated with particular chromatin conformations. To this end, we
performed ChIP-Seq using antibodies against H3K4me3 and H3K27me3. For
H3K4me3 and H3K27me3, we generated 7.3 and 17.9 million mapped reads
in hESCs and 10.3 and 19.6 million mapped reads in derived endoderm,
respectively (Table 2.1). Since the binding of H3K4me3 and H3K27me3 has a
far wider distribution than that of transcription factors, we sought to address
whether our depth of sequencing reached saturation. To this end, we called
peaks from pooled reads (two biological replicates for H3K4me3 and three for
H3K27me3) and checked the levels of saturation of unique peaks called.
H3K4me3 reads reached saturation, but not H3K27me3 even after additional
sequencing (Figure S2.5). To further verify these histone datasets, we
compared those generated for hESCs to other published accounts (Pan, Tian
et al. 2007; Zhao, Han et al. 2007). Although different hESC lines were used
(H9, H1, hES3), a high percentage of genes containing H3K4me3 peaks are
found in common (ours and Pan et al., 71% and 83%, respectively; ours and
Zhao et al., 68% and 88%, respectively). In contrast, a relatively lower
percentage of genes containing H3K27me3 peaks are found in common (ours
and Pan et al., 64% and 50%, respectively; ours and Zhao et al., 44% and
65%, respectively). These data suggest that either more extensive sequencing
depth might be necessary or that H3K27me3 marks are more variable than
H3K4me3 marks among cell lines.
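The idea behind the saturation check can be illustrated with a simplified sketch. It does not reproduce the peak caller used in this work: as a stand-in, a genomic bin is called a "peak" once it holds a minimum number of reads, and the bin assignment, threshold, and subsampling fractions are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def saturation_curve(read_bins, fractions=(0.25, 0.5, 0.75, 1.0),
                     min_reads=5, seed=0):
    """Count 'peaks' found in nested random subsets of the reads.

    read_bins lists, for every mapped read, the genomic bin it falls in.
    A bin counts as a peak once it holds at least min_reads reads.
    """
    reads = list(read_bins)
    random.Random(seed).shuffle(reads)
    curve = []
    for frac in fractions:
        subset = reads[:int(len(reads) * frac)]   # nested prefixes
        counts = Counter(subset)
        curve.append(sum(1 for n in counts.values() if n >= min_reads))
    return curve
```

If the number of peaks stops rising as the fraction of reads approaches one, sequencing depth is approaching saturation; a curve that is still climbing at the full read set corresponds to the situation described for H3K27me3.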
Endoderm Contains Predominant Bivalent Domains
As it is known that bivalent domains containing both bound H3K4me3
and H3K27me3 become resolved during differentiation, we sought to examine
how these marks were altered during endoderm specification. To this end, we
used K-means clustering to visualize H3K27me3 and H3K4me3 enrichment
around 16,621 TSSs in both hESCs and derived endoderm (Heintzman, Stuart
et al. 2007; Hon, Ren et al. 2008). This analysis enabled a clear demarcation
of nine different groups (1-9) containing unique signatures which exist in both
cell types (Figure 2.3). Furthermore, GO analysis defines these clusters,
showing that several have unique biological functions (Table S2.2).
Interestingly, in endoderm there are more bivalent classifications than in
hESCs as depicted by Groups 5-7. This is due to the addition of H3K27me3 in
narrow domains along these regions, which are not present in hESCs. The
bivalent groups with the strongest and widest H3K27me3 marks (Group 1 and
Group 4) are strongly associated with specific biological functions. Group 1
contains genes with roles in various developmental processes
(“Developmental Group”; P = 1.1E-88). In this endodermal context, however,
this Group 1 ‘Developmental Group’ is highly enriched in regions involved in
endoderm formation, including EOMES, GSC, PITX2, SOX17 and GATA4.
Group 4 on the other hand contains genes with roles in cell adhesion and
communication (P = 2.5E-08 and 2.3E-14, respectively). While it is known that
the bivalent motif exists in various forms (Ku, Koche et al. 2008; Cui, Zang et
al. 2009), we were surprised to see how many different patterns emerged
upon clustering. Interestingly, unlike other more terminally differentiated cell
types, including neural precursor cells derived from embryonic stem cells,
endoderm appears to have maintained a high degree of bivalency (Bernstein,
Mikkelsen et al. 2006; Mikkelsen, Ku et al. 2007; Pan, Tian et al. 2007; Zhao,
Han et al. 2007).
As it is well known that different histone marks associate with activation
and repression of transcription, we were interested in understanding how
Groups 1-9 correlated with both SMAD binding and transcriptional activation in
the context of endoderm. To this end we used our microarray time course of
hESC differentiation into endoderm to associate the behavior of transcripts,
whether induced, constitutive, inactive, or repressed with a specific histone
grouping (1-9) (Figure S2.7a). Groups 3 and 6, which have predominant
H3K4me3 with minor H3K27me3, are associated with a range of
transcriptional behaviors, including induction, repression and constitutive
expression (both Groups 3 and 6; all P < 1.0E-03, see Experimental
Procedures for statistical analysis). Groups 8 and 9, which have little or no
H3K4me3 or H3K27me3, are associated - as might be expected - with inactive
regions (both P < 1.0E-03). Interestingly, Groups 1, 2 and 4 are associated
with transcripts that become activated upon differentiation (P < 1.0E-03,
1.5E-02 and 2.8E-02, respectively).
SMAD2/3 Association Correlates with Resolution of Bivalent Domains in Group 1
While bivalent regions are prevalent in endoderm, we sought to
examine whether regions associated with active transcription were still in a
bivalent conformation in the endodermal cells. To this end, we examined only
transcripts that were induced during differentiation from hESCs to endoderm
and divided these into their bivalent groupings (1-9). Histogram plots of the
amount of H3K4me3 and H3K27me3 at each expressed region for each group
are shown in Figure 2.4a. While the bivalent conformation is still observed, even
at expressed regions in most of the bivalent groups, this conformation is
being strongly resolved in Group 1 and moderately in Group 4 (Groups 1 and
4; all P for H3K4me3 and H3K27me3 < 1.0E-06, see Experimental Procedures
for statistical analysis). Overall, this suggests that Group 1 genes associated
with transcriptional activation upon differentiation toward endoderm have
unique chromatin alterations.
We next sought to determine whether SMAD2/3 binding could be
associated with these important chromatin changes. To this end, we examined
whether SMAD2/3 binding at the ‘induced’ regions could predict resolution of
bivalency. Of the 32 upregulated genes in Group 1, 21 genes were bound by
SMAD2/3. As illustrated in the browser shots of Figure 2.2 and Figure S2.6, all
21 of these regions displayed almost complete resolution of the bivalent
domain compared to much less resolution at the other loci (Figure 2.4b) (both
P for H3K4me3 and H3K27me3 < 1.0E-06). Interestingly, the 21 bound
regions included important endoderm specification genes, including EOMES,
GSC, SOX17, GATA4, GATA6, and FOXA2 (Table S2.3). This suggests that
SMAD2/3 directly or causally plays an important role in bivalent resolution
within these regions which are critical for endodermal specification and
provides a system in which to study how these key ‘poised’ regions become
activated.
Bivalent Domain is the Optimal Conformation for SMAD-Induced
Transcriptional Activation
The presence of SMAD2/3 is correlated with the resolution of bivalent
domains in Group 1, particularly at highly expressed loci, including EOMES,
GSC, SOX17, GATA4, GATA6 and FOXA2, all endoderm specification
molecules. Here we sought to determine whether this SMAD2/3 association
was also predictive of active transcription. To this end, we analyzed the
location surrounding the TSS from each group for both SMAD binding and
resulting increase in transcriptional levels between hESCs and derived
endoderm. Surprisingly, we find that the binding of SMAD2/3 within Group 1 is
predictive of expression changes only within the endoderm. This is illustrated
in Figure 2.5, where we plot the log2 value of hESC versus endoderm
expression on regions bound by SMAD2/3, SMAD4 and FOXH1. Only Group 1
genes show increased activation of transcription correlated with SMAD2/3
binding. This is further illustrated when using regions bound by combinations
of the transcription factors. Regardless of the combination of bound
transcription factors, the only transcription factor that can be associated with
transcriptional change is SMAD2/3 in the Group 1 context (Figure 2.5 and
Figure S2.8 and S2.9). These results strongly suggest that the endodermal
bivalent state with the broadest H3K4me3 and H3K27me3 domains is the
most conducive for activation of transcription by SMAD2/3. This activation is
mediated by SMAD2/3 binding, not SMAD4 or FOXH1, and is probably
precipitated by a resolving bivalent domain.
DISCUSSION
While many inroads have been made in understanding endoderm
formation in vertebrates, the next paradigm shifts in embryology will be
advanced by the application of new technologies. As ChIP-Seq becomes more
utilized in the scientific community, many reports have described transcription
factor binding in hESCs and other developmental cell types (Boyer, Lee et al.
2005). To date, our datasets are unique, representing not just a single
transcription factor, but a complex of factors. Furthermore, these datasets
follow the dynamics of this complex through developmental time – from
pluripotency to endoderm in hESCs. The generated datasets for SMAD2, 3, 4,
FOXH1, H3K4me3 and H3K27me3 provide insight into mechanisms
underlying how SMAD transcription factors mediate NODAL signaling to
specify endoderm.
During endoderm differentiation, SMAD transcription factors specify
target genes to be transcribed when they are required for the execution of the
NODAL-induced developmental program. The subsets of target genes
necessary for the closely related functions are likely to be coordinately marked
and expressed to meet the need. Although the means by which this
coordination of transcription factor-induced gene expression is achieved is not
clear, it is becoming apparent that chromatin modification plays a key role.
Recently, a number of studies have shown that the levels of histone
methylation and the recruitment of histone methyltransferase with transcription
factors are critical for their transcriptional activity (Demers, Chaturvedi et al.
2007; McKinnell, Ishibashi et al. 2008; Cheng, Wu et al. 2009). In agreement
with this view, we showed that chromatin conformation around the TSS plays
a critical role in deciding which groups of genes become activated by all
transcription factors studied. In this paper, we presented genomic evidence
that the surroundings of TSSs are specifically equipped with histone
methylation marks to fulfill this coordinated control. Interestingly, within the
endoderm, we have defined subtle classes of bivalent domains, each with
distinct annotations, transcriptional responses, and binding variability. Group 1
represents the bivalent domain whose function is to regulate ‘Developmental
Genes’ which recapitulates previous findings (Bernstein, Mikkelsen et al.
2006). In addition, we showed another subclass bivalent group, Group 4,
which is strongly annotated to neuronal activities and cell adhesion and is not
identified in other studies (Pan, Tian et al. 2007; Zhao, Han et al. 2007). While
a small fraction (less than 20%) of monovalent genes has been shown to
become bivalent in more differentiated cell types including mouse embryonic
fibroblasts (MEFs) and neural progenitor cells (NPCs) (Mikkelsen, Ku et al.
2007; Zhao, Han et al. 2007), we showed that most of the monovalent genes
with H3K4me3 appear to become bivalent during endoderm formation (as
observed in Groups 3, 5, 6, and 7). Since these various bivalent groups
revealed in derived endoderm are associated with distinct annotations and
display unique histone marks, they can be further classified into the types
associated with Polycomb repressive complexes (PRC) as previously
discussed (Ku, Koche et al. 2008). Groups 1 and 4 are likely to be PRC1-
positive because they exhibit large H3K27me3 regions and maintain the
bivalent conformation during differentiation, and are strongly annotated
to development and cell signaling. Interestingly, Groups 5, 6 and 7 are likely to
contain PRC1-negative bivalent domains that emerged during endoderm formation
as they display small H3K27me3 regions and are associated with non-
developmental functions such as protein and DNA metabolism. These groups
suggest that new genes may become poised throughout stages of
differentiation for new functions.
Overall, and unexpectedly, the bivalent domains in endoderm derived
from hESCs have not yet been resolved and are even increased relative to the
hESC state. This maintenance of the bivalent state is distinctly different from what
has previously been reported. While bivalent domains are prevalent in hESCs,
encompassing more than 2000 promoters in the genome, most of these
bivalent domains are resolved in more differentiated cell types including MEFs
and NPCs (Bernstein et al., 2006; Mikkelsen et al., 2007; Pan et al., 2007;
Zhao et al., 2007). The resolution is particularly true for genes restricted to
regulation of specialized functions, strongly suggesting that the bivalent state
resolves to a monovalent state to activate developmentally important gene
transcription. We suggest that the difference between the unresolved but
active endoderm bivalent domains, and the resolved bivalent domains in
MEFs and NPCs lies in the degree of differentiation. Endoderm is one of the
first cell types that arise in the embryo and therefore must maintain a degree
of plasticity. It might not be surprising that these more plastic cellular types
retain a more bivalent conformation and even may utilize new subtleties in this
conformation to activate gene transcription. Our observations at particular
endoderm-specific loci reflect an intermediate stage of the bivalent, not
completely resolved, but clearly changing toward a more monovalent state at
important promoter regions. In our case, this is reflected by the Group 1
promoters which are bound by SMAD2/3. These are highly active promoters in
hESC derived endoderm and include key endoderm specification genes,
including GATA4, GATA6, FOXA2, GSC, and PITX2. Interestingly, two thirds
of these promoters were bound by SMAD2/3 in hESCs, but were inactive in
that cell type and did not display the subtle H3K4me3 and H3K27me3
changes found within endoderm, possibly suggesting that SMAD binding
precedes the chromatin change. Whether this association is due to SMAD2/3
binding altering the conformation of the bivalents of this class or whether this
conformation allows for initial SMAD2/3 binding is unknown, but will be an
interesting avenue of further pursuit.
Accompanying this paper is the complete dataset for SMAD2/3,
SMAD3, SMAD4 and FOXH1 targets in hESCs and derived endoderm and
their effects on neighboring gene transcription, a resource that can be both
mined for enhancers of specific gene loci and for genomic studies. One of our
surprising findings is that SMAD2, 3, 4 binding is highly dynamic; few specific
target regions are maintained from hESCs to endoderm. This suggests that
the SMAD transcription complex is constantly in flux, using a variety of
different sites to elicit activation of individual loci. Furthermore, we also show
that FOXH1 has very different binding behavior than the SMAD proteins. First,
throughout differentiation, FOXH1 maintains association with the same
general genomic locations, whereas SMAD proteins become far more
localized in intergenic regions once cells have become endoderm. Second,
upon differentiation, FOXH1 exhibits widespread binding throughout the
genome whereas the SMADs become far more restricted to specific locales.
Third, FOXH1 binding has much less effect on transcriptional responses.
These all appear to be consistent with a role of FOXH1, not specifically as a
transcriptional activator, but as a pioneer protein which associates with
chromatin to recruit histone modifiers to these loci (Cirillo et al., 2002; Cirillo
and Zaret, 1999).
NODAL signaling is reused throughout development to guide the
formation of a plethora of tissue types. It has also been implicated in several
cancers (Xu, Zhong et al. 2004; Lee, Jan et al. 2010; Mangone, Walder et al.
2010). Despite the importance of this signaling pathway, few direct targets
have been elucidated since the SMAD transcription factors were identified
more than 14 years ago. Here we provide a comprehensive dataset that can
be used for the functional examination of thousands of additional targets.
These targets, several of which are bound by the SMAD complex in both
hESCs and derived endoderm, may also be bound and activated in a
multitude of other normal and diseased cell types. Thus, we anticipate that the
analysis of these factors will have wide-spread benefit to the scientific
community.
Figure 2.1: Cell Type-Specific Recruitment of SMADs and FOXH1 (a) Predicted genomic distribution of transcription factor binding. SMADs and FOXH1 targets were classified into annotated exons, introns, promoters, or intergenic region using UCSC Known Genes (Human browser hg18). Promoter regions are defined as regions within 10 kb from TSS. (b) Venn diagram representing the overlap of SMAD2/3 binding targets (upper left, Peaks) and associated genes (upper right, Genes) within 100 kb between hESCs (blue circle) and derived endoderm (red circle). The overlap of SMAD2/3 and FOXH1 binding targets (lower left) and SMAD2/3/4 targets (lower right) in derived endoderm.
Figure 2.2: Genome-Wide Mapping of SMAD2/3, SMAD4, H3K4me3 and H3K27me3 Using ChIP-Seq UCSC genome browser screen shots showing the loci of SMAD2/3 and SMAD4 binding and histone marks in the genome of EOMES and GSC in hESCs (blue) and derived endoderm (red). Dotted boxes indicate unique regions of SMAD2/3 and SMAD4 binding in derived endoderm, and asterisk indicates ACTIVIN response element in the promoter region (Danilov et al., 1998). K4 and K27 stand for H3K4me3 and H3K27me3, respectively.
Figure 2.3: Clustering of H3K4me3 and H3K27me3 Patterns in Promoter Regions K-means clustering was performed to visualize H3K4me3 (K4) and H3K27me3 (K27) marks surrounding 16,621 TSSs. Promoter regions covered were ±5 kb from TSS. Yellow areas are the regions of the log2 peak intensity higher than zero; black areas close to zero; and blue areas lower than zero.
Figure 2.4: Chromatin Signature Changes in Differentially Expressed and SMAD2/3 Bound Genes (a) The peak levels (histograms) of H3K4me3 (K4) and H3K27me3 (K27) in both hESCs and endoderm. Black solid lines indicate the histograms of all genes in each Group. Induced genes are represented in red lines. R represents normalized enrichment over the background. (b) The histograms of H3K4me3 (K4) and H3K27me3 (K27) peaks of Group 1 genes induced and also bound by SMAD2/3 in endoderm. SMAD2/3 bound and not-bound genes were represented in red and blue lines, respectively.
Figure 2.5: Regulation of Gene Expression by Transcription Factor Complexes in Each Cluster. Genes bound by a single transcription factor or by duplexes with SMAD2/3 in hESCs (a) and endoderm (b) were plotted as scatter plots based on their expression levels. The number in each graph is the number of bound genes. Trend lines for individual gene sets were drawn to help distinguish the expression differences.
Figure S2.1: ChIP Assay for SMAD2/3/4 and FOXH1 Binding to Known Targets H9 hESCs were differentiated to definitive endoderm by ACTIVIN treatment for 5 days. Cells were harvested and processed for ChIP with anti-SMAD2/3, SMAD3, SMAD4, or FOXH1 antibodies. The fold enrichment of the precipitated DNA by each of the antibodies versus the input control was determined by qPCR using positive target primers for LEFTY1 and LEFTY2 and negative target primer for GAPDH intronic region.
Figure S2.2: SMAD/FOXH1 Targets in hESCs and Derived Endoderm within 1 Mb, 10 kb and 1 kb from TSS. (a) Venn diagram representing the overlapping targets of SMAD2/3 between hESCs (blue circle) and derived endoderm (red circle). (b) Overlapping targets of SMAD2/3 (red circle) and FOXH1 (blue circle) in derived endoderm. (c) Overlapping targets of SMAD2/3 (red), SMAD3 (purple) and SMAD4 (green) in derived endoderm.
Figure S2.3: Expression Correlation of Transcription Factor Binding at Different Distances. Expression levels of genes bound by transcription factors at different distances (< 1 kb, 1 kb to 10 kb, and 10 kb to 1 Mb) in hESCs (a) and endoderm (b). Whiskers represent the 5th and 95th percentiles of genes in each group. Student's t-tests were performed on each group in comparison with the None group in the same distance category. One asterisk denotes P < 0.05 and two asterisks P < 0.01.
Figure S2.4: Expression Correlation of Transcription Factor Binding with Different Numbers of Sites. Expression levels of genes with different numbers of transcription factor binding sites in hESCs (a) and endoderm (b). Genes bound by transcription factors within 10 kb were analyzed. Whiskers represent the 5th and 95th percentiles of genes in each group. Student's t-tests were performed on each group in comparison with the None group < 10 kb. One asterisk denotes P < 0.05 and two asterisks P < 0.01.
Figure S2.5: H3K4me3 and H3K27me3 ChIP-Seq Peak Saturation. Peaks were called from each bin of the pooled reads and the numbers of unique peaks called were plotted to check the levels of saturation (see Experimental Procedures).
Figure S2.6: Genome-Wide Mapping of SMAD2/3, SMAD4, H3K4me3 and H3K27me3 UCSC genome browser screen shots showing the loci of SMAD2/3 and SMAD4 binding and histone marks in the genome of FOXA2, ACSS1 and LMO1 in hESCs (blue) and derived endoderm (red). FOXA2 is an induced Group 1 gene, and ACSS1 and LMO1 are Group1 genes but not in the induced subset. K4 and K27 stand for H3K4me3 and H3K27me3, respectively.
Figure S2.7: Enrichment of Differential Gene Expression and Transcription Factor Binding in Clusters. (a) The numbers of genes observed in each expression categories (induced, repressed, constitutive and inactive during hESC differentiation to endoderm) were plotted in red bars. The numbers of genes in random occurrence (average of 1000 random pulls) were plotted in blue bars. (b) The numbers of genes bound by SMAD2/3, SMAD4 or FOXH1 were plotted in red bars. The numbers of genes in random occurrence (average of 1000 random pulls) were plotted in blue bars. Upper panel: genes bound in hESCs, Lower panel: newly bound genes in endoderm.
Figure S2.8: Regulation of Gene Expression by Transcription Factor Complexes in Clusters. Genes bound by a single transcription factor or by duplexes with either SMAD4 or FOXH1 in hESCs (a) and endoderm (b) were plotted as scatter plots based on their expression levels. The number in each graph is the number of bound genes. Trend lines for individual gene sets were drawn to help distinguish the expression differences.
Figure S2.9: Regulation of Gene Expression by Triple Transcription Factor Complexes in Clusters. Genes bound by a single transcription factor or by triplexes in hESCs (a) and endoderm (b) were plotted as scatter plots based on their expression levels. The number in each graph is the number of bound genes. Trend lines for individual gene sets were drawn to help distinguish the expression differences.
Table 2.1: ChIP-Seq Data and Analysis Summary
The numbers of reads, peaks and associated genes of all transcription factors and histone marks studied are presented separately in hESCs and derived endoderm (Endo).

                                     Associated Genes (kb)
ChIP       Cell  Reads       Peaks   1000    100     10      1
SMAD2/3_A  hESC  10,200,000  4,032   3,588   3,077   1,916   1,249
SMAD2/3_A  Endo  9,605,287   1,037   1,117   827     272     72
SMAD2/3_B  hESC  8,708,351   14,833  9,777   9,052   7,057   5,715
SMAD2/3_B  Endo  5,910,789   2,915   2,604   1,905   567     106
SMAD3      hESC  6,928,056   2,688   2,511   2,062   1,197   745
SMAD3      Endo  6,055,629   2,296   2,107   1,466   400     67
SMAD4      hESC  8,959,821   3,936   3,533   3,223   2,293   1,702
SMAD4      Endo  6,066,743   4,531   2,768   2,753   906     207
FOXH1      hESC  10,400,000  9,702   6,897   5,797   2,646   1,123
FOXH1      Endo  10,800,000  29,292  11,631  10,385  4,734   1,324
Input      hESC  9,465,441   -       -       -       -
Input      Endo  9,716,862   -       -       -       -
H3K4me3    hESC  7,338,695   24,030
H3K4me3    Endo  10,326,110  29,688
H3K27me3   hESC  17,893,702  13,936
H3K27me3   Endo  19,595,165  26,293
Input      hESC  8,824,050   -
Input      Endo  10,876,757  -
Supplemental Table S2.1: Expression of Lineage Specification Genes
Gene Day 0 Day 1 Day 3 Day 5
GSC 73 29 31 24 1424 3320 1248 1614 2829 2806 2967 2985
EOMES 180 95 78 158 4488 7220 4228 3589 3696 4228 4820 4957
MIXL1 252 186 159 141 4188 10592 4883 2653 5151 5767 7088 6971
Table S2.1: Expression of Lineage Specification Genes Individual numbers in each gene and time point represent expression data from biological replicates.
Supplemental Table S2.2: Gene Ontology Analysis of Cluster Groups
  Biological Process                                  p-value
Group 1
  mRNA transcription regulation                       6.91E-93
  Developmental processes                             1.11E-88
  mRNA transcription                                  1.47E-86
  Ectoderm development                                1.50E-66
  Neurogenesis                                        1.18E-63
  Nucleoside, nucleotide and nucleic acid metabolism  1.09E-58
  Segment specification                               1.08E-27
  Mesoderm development                                1.51E-20
  Embryogenesis                                       3.75E-16
  Other receptor mediated signaling pathway           3.85E-12
  Anterior/posterior patterning                       4.18E-12
  Skeletal development                                2.35E-11
  Cell communication                                  9.68E-09
  Oncogenesis                                         2.42E-06
  Muscle development                                  6.34E-05
  Cell proliferation and differentiation              8.22E-05
Group 2
  mRNA transcription                                  2.30E-07
  Nucleoside, nucleotide and nucleic acid metabolism  3.58E-07
Group 3
  Nucleoside, nucleotide and nucleic acid metabolism  2.17E-11
  mRNA transcription                                  9.57E-10
  mRNA transcription regulation                       5.41E-09
  Oncogenesis                                         9.36E-08
  Developmental processes                             2.72E-06
  Protein phosphorylation                             1.65E-05
Group 4
  Neuronal activities                                 7.48E-24
  Signal transduction                                 1.35E-21
  Ion transport                                       1.63E-15
  Cell communication                                  2.27E-14
  Synaptic transmission                               3.97E-13
  Cation transport                                    1.30E-12
  Transport                                           3.35E-12
  Cell adhesion                                       2.49E-08
  Cell surface receptor mediated signal transduction  4.79E-06
  Cell adhesion-mediated signaling                    4.63E-05
Group 5
  Intracellular protein traffic                       9.22E-08
  Protein metabolism and modification                 1.01E-06
Group 6
  (Protein metabolism and modification)               (1.30E-03)
Group 7
  (DNA metabolism)                                    (1.62E-02)
Group 8
  Immunity and defense                                3.15E-18
  Cell surface receptor mediated signal transduction  2.63E-08
  Signal transduction                                 8.76E-07
  Cytokine and chemokine mediated signaling pathway   1.21E-06
  Cell structure                                      7.30E-06
  Cell structure and motility                         8.27E-06
  Muscle contraction                                  2.72E-05
Group 9
  Olfaction                                           5.98E-54
  Chemosensory perception                             5.00E-53
  Sensory perception                                  2.52E-42
  G-protein mediated signaling                        3.03E-32
  Cell surface receptor mediated signal transduction  2.10E-21
  Immunity and defense                                1.03E-11
  Signal transduction                                 3.69E-08
  Interferon-mediated immunity                        1.51E-07
  Cytokine and chemokine mediated signaling pathway   4.11E-06
Table S2.2: Gene Ontology Analysis of Cluster Groups
GO terms in the biological process with P below 1.0E-05 are listed in each Group.
Supplemental Table S2.3: Induced Genes in Group 1
Gene     Accession No.  hESC Expression  Endoderm Expression  SMAD2/3 Target
NTF3     NM_001102654   23               97                   *
PITX2    NM_000325      401              1617                 *
EOMES    NM_005442      104              3052                 *
MXI1     NM_130439      228              628                  *
DUSP4    NM_001394      188              1028                 *
FOXA2    NM_021784      121              859                  *
GATA4    NM_002052      62               1238                 *
PLXNA4   NM_020911      121              306                  *
NTN1     NM_004822      35               416                  *
TBX3     NM_016569      29               184                  *
C1orf61  NM_006365      69               500                  *
EPHB3    NM_004443      81               325                  *
HLX      NM_021958      52               189                  *
PDE10A   NM_006661      85               564                  *
FOXQ1    NM_033260      71               884                  *
SFRP1    NM_003012      1253             3359                 *
FGF17    NM_003867      97               5132                 *
GSC      NM_173849      45               2058                 *
HAND1    NM_004821      36               154                  *
GATA6    NM_005257      118              4761                 *
SOX17    NM_022454      80               1869                 *
NOG      NM_005450      44               360                  -
TPPP3    NM_015964      41               197                  -
PCDH7    NM_032457      171              546                  -
HNF1B    NM_000458      40               245                  -
CYP26A1  NM_000783      1191             6523                 -
COL2A1   NM_001844      69               374                  -
AHNAK    NM_001620      406              1274                 -
SHH      NM_000193      92               287                  -
DLX5     NM_005221      37               241                  -
CRLF1    NM_004750      2237             3251                 -
MSX2     NM_002449      65               640                  -
Table S2.3: Induced Genes in Group 1
SMAD2/3 targets are marked by an asterisk.
EXPERIMENTAL PROCEDURES
Cell Culture and Differentiation
Undifferentiated H9 hESCs (WiCell) were maintained on mouse
embryonic fibroblast (MEF) feeder layers or on Matrigel (1:20 dilution; BD
Biosciences) in mouse embryonic fibroblast-conditioned medium (CM). CM
was produced by conditioning MEFs for at least 24 hours in Dulbecco's
modified Eagle's medium/Ham's F-12 medium (DMEM/F12) supplemented
with 20% knockout serum replacement (Gibco), 1 mM L-glutamine, 0.1 mM
nonessential amino acids, 0.1 mM 2-mercaptoethanol, and 8 ng/ml
recombinant human fibroblast growth factor-basic (bFGF; Peprotech). Cultures
were routinely passaged with 200 U/ml type IV collagenase (Gibco) at the split
ratio of 1:3 to 1:4 every 4–5 days.
Definitive endoderm precursors were generated from hESCs as
previously described (D'Amour et al., 2005). Differentiation was performed in
RPMI-1640 medium supplemented with glutamax, 100 ng/ml recombinant
human ACTIVIN A (R&D Systems), penicillin/streptomycin, and defined fetal
bovine serum (FBS; HyClone) at sequentially increased concentrations (0,
0.2 to 2%). 2% FBS was maintained afterwards in cultures over the duration of
differentiation.
Endoderm formation was validated by real-time RT-PCR with the total
RNAs isolated from differentiated cells. After washing once in phosphate
buffered saline pH 7.4 (PBS) containing 0.2% bovine serum albumin (BSA),
cells were harvested in Trizol (Invitrogen) and total RNAs were isolated
according to the manufacturer's protocol. One-step RT-PCR was performed
on iCycler (BioRad) using iScript RT-PCR SYBR Green Supermix (Bio-Rad).
The primer sequences were previously described (D'Amour et al., 2005).
Gene Expression
Time course gene expression was performed on day 0, 1, 3, and 5
differentiated cells. Cells were washed once in PBS containing 0.2% BSA and
used for total RNA preparation using Trizol (Invitrogen). rRNAs were removed
from the isolated total RNAs and gene expression was analyzed using
GeneChip Human Exon 1.0 ST Array (Affymetrix) at the Stanford shared
protein and nucleic acid (PAN) facility. Exon array data were processed using
GeneBASE (Kapur et al., 2007). Probe intensities were corrected using
background probes. Probes were selected and summarized for gene level
expression. Gene expression profiles were pooled for quantile-normalization.
To examine gene expression specific to endodermal cells, CXCR4
positive cells were isolated from day 5 differentiated cells using FACS. Cells
were harvested and dissociated using 0.05% trypsin/EDTA (Invitrogen)
followed by neutralization with PBS containing 10% FBS. Cells were strained
with 40 µm strainer (BD Biosciences) and washed twice in PBS containing
0.2% BSA and 0.09% sodium azide (Staining Buffer). Cells were labeled with
antibodies against CXCR4-Phycoerythrin (R&D Systems) at 10 µl per 2.5 × 10^5
cells for 30-45 minutes on ice. Cells were washed twice and resuspended in
the Staining Buffer. CXCR4 positive cells were analyzed and isolated using a
FACS Aria (BD Bioscience) at the Stanford shared FACS facility. Isolated cells
were either used for total RNA preparation using Trizol (Invitrogen) or cross-
linked with formaldehyde for chromatin immunoprecipitation (ChIP).
ChIP-Seq
ChIP was performed as previously described (Johnson et al., 2007).
5 × 10^6 cells cross-linked with formaldehyde were used for each ChIP. The
cross-linked cells were sonicated in 500 µl of a lysis buffer (50 mM Tris pH8.1,
10 mM EDTA, 1 % SDS) with protease inhibitor cocktail (Roche) to generate
200- to 600-bp fragments. Fragmented chromatin was immunoprecipitated
with magnetic beads coupled with 5 µg of each antibody. The antibodies used
were anti-SMAD2/3 (Santa Cruz Biotechnology, sc-8332 or R&D Systems,
AF3797), anti-SMAD3 (Abcam, ab28379), anti-SMAD4 (R&D Systems,
AF2097), anti-FOXH1 (R&D Systems, AF4248), anti-H3K4me3 (Abcam,
ab8580) and anti-H3K27me3 antibody (Upstate, 07-449). After washing,
precipitated DNA was purified and an aliquot was used for PCR validation.
The primers used for qPCR to quantify the ChIP-enriched DNA are as
follows. For transcription factor ChIP:
LEFTY1 (Forward, 5’-TGTTTGCAGAGGGATAATAG-3’; Reverse, 5’-TAATTCACAGGACTGATTGG-3’),
LEFTY2 (Forward, 5’-AGCCTGAAGAGTTTTGTTTG-3’; Reverse, 5’-TCCTGACGACTAATCAGACC-3’),
GAPDH (Forward, 5’-AAGTGGATATTGTTGCCATC-3’; Reverse, 5’-GGAATACGTGAGGGTATGAA-3’),
and negative control (Forward, 5’-TAGCCAAAAGAAGGAAGCAACAG-3’; Reverse, 5’-CTAAAGGTAGGGCTGGAAGCAAT-3’).
For histone ChIP:
GAPDH (Forward, 5’-TCGACAGTCAGCCGCATCT-3’; Reverse, 5’-CTAGCCTCCCGGGTTTCTCT-3’),
RLP30 (Forward, 5’-CAAGGCAAAGCGAAATTGGT-3’; Reverse, 5’-GCCCGTTCAGTCTCTTCGATT-3’),
MYOD (Forward, 5’-CCGCCTGAGCAAAGTAAATGA-3’; Reverse, 5’-GGCAACCGCTGGTTTGG-3’),
and SERPINA1 (Forward, 5’-GGCTCAAGCTGGCATTCCTG-3’; Reverse, 5’-GGCTTAATCACGCACTGAGCTTA-3’).
Relative occupancy values were
calculated by determining the apparent immunoprecipitation efficiency (ratio of
the amount of immunoprecipitated DNA to that of the input sample) and
normalized to the level observed at a negative control region, which was
defined as 1.0.
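As a rough illustration of the normalization just described, the sketch below computes a relative occupancy value from DNA amounts. The function names, argument names and example amounts are hypothetical and are not taken from this study.

```python
def ip_efficiency(ip_amount, input_amount):
    """Apparent immunoprecipitation efficiency: IP DNA relative to input DNA."""
    if input_amount <= 0:
        raise ValueError("input amount must be positive")
    return ip_amount / input_amount


def relative_occupancy(target_ip, target_input, control_ip, control_input):
    """Efficiency at a target region, scaled so the negative control equals 1.0."""
    control_eff = ip_efficiency(control_ip, control_input)
    if control_eff == 0:
        raise ValueError("negative control efficiency is zero")
    return ip_efficiency(target_ip, target_input) / control_eff


# Invented amounts in arbitrary units, for illustration only.
example_occupancy = relative_occupancy(
    target_ip=8.0, target_input=100.0, control_ip=0.5, control_input=100.0
)
```

By construction, passing the negative control region as the target returns 1.0, which matches the definition given in the text.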
Sequencing libraries were prepared using Genomic DNA Sample Kit
(Illumina) according to the manufacturer's protocol. The ChIP-Seq libraries
were sequenced using the Genome Analyzer II (Illumina) and its analysis software.
Sequencing Data Processing
Transcription factor ChIP-Seq reads were processed to call peaks using
CisGenome, an analysis tool for genomic data (Ji et al., 2008). The sliding
window size for peak calling was set to 300 bp, and the threshold number of
reads required for a peak to be called was 11. The false discovery rate
allowed was 0.01. The resulting peaks were mapped to the human genome
hg18 to identify the locations and numbers of peaks around annotated genes.
Histone H3K4me3 and H3K27me3 peaks were called using QuEST 2.4
(Valouev et al., 2008). We used the “histone” bandwidth setting with “relaxed”
peak-calling parameters.
Transcription Factor Binding Regions and Associated Genes
We parsed the targets to see their distributions across the gene body.
UCSC Known Genes (Human browser hg18) were used to assign the targets
to annotated genomic regions: exon, intron, promoter (±10 kb from TSS), or
intergenic region. The numbers of the target peaks reaching at least 1 bp into
each genomic region were counted. To avoid counting a target more than once
when it overlapped two different regions, the regions of target binding were
sequentially searched in the order of promoter, exon, intron, and intergenic
region. In addition, when we analyzed the numbers of overlapping targets
existing for each transcription factor between the two cell states, the numbers
of the target peaks that overlapped the previous site by at least 1 bp
were counted.
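The sequential search order described above can be sketched as follows. This is a minimal illustration, not the code used in this work: the data layout, function names and toy coordinates are all hypothetical, and a real analysis would read the intervals from the UCSC Known Genes annotation.

```python
# Search order taken from the text: promoter, exon, intron, then intergenic.
PRIORITY = ("promoter", "exon", "intron")


def overlaps(a_start, a_end, b_start, b_end):
    """True when two closed intervals share at least 1 bp."""
    return a_start <= b_end and b_start <= a_end


def classify_peak(peak, annotations):
    """Assign a peak to the first region class it overlaps, in priority order.

    peak: (chrom, start, end)
    annotations: dict mapping region class -> list of (chrom, start, end)
    """
    chrom, start, end = peak
    for region_class in PRIORITY:
        for r_chrom, r_start, r_end in annotations.get(region_class, []):
            if r_chrom == chrom and overlaps(start, end, r_start, r_end):
                return region_class
    # Nothing matched, so the peak is treated as intergenic.
    return "intergenic"


def count_by_region(peaks, annotations):
    """Count each peak exactly once, under its highest-priority class."""
    counts = {"promoter": 0, "exon": 0, "intron": 0, "intergenic": 0}
    for peak in peaks:
        counts[classify_peak(peak, annotations)] += 1
    return counts


# Invented toy annotation and peaks, for illustration only.
toy_annotations = {
    "promoter": [("chr1", 0, 20000)],
    "exon": [("chr1", 10000, 10500), ("chr1", 30000, 30500)],
    "intron": [("chr1", 10501, 29999)],
}
toy_peaks = [
    ("chr1", 10200, 10400),  # overlaps promoter and exon, counted as promoter
    ("chr1", 30100, 30200),  # exon only
    ("chr1", 25000, 25100),  # intron only
    ("chr1", 90000, 90100),  # no annotation
    ("chr2", 100, 200),      # different chromosome
]
toy_counts = count_by_region(toy_peaks, toy_annotations)
```

Because each peak is returned at the first matching class, a peak that spans a promoter and an exon contributes only to the promoter count, which is the behavior the priority ordering is meant to guarantee.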
We examined the overlap between genes called within the regions
bound by all of the transcription factors analyzed. For the associated genes for
each target, the nearest genes within 1 Mb, 100 kb, 10 kb and 1 kb from TSS
were counted. Further, we examined the numbers of genes lying within target
regions between hESCs and endoderm within the same distance categories
(Figure 2.1b and Figure S2.2).
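One way to picture the nearest-gene assignment across the four distance categories is the sketch below. It assumes a single chromosome, measures distance from a peak center to the TSS, and uses invented gene names and coordinates; none of these choices is specified in the text, so they should be read as assumptions.

```python
# Distance categories from the text: 1 kb, 10 kb, 100 kb and 1 Mb.
DISTANCE_CUTOFFS_BP = (1_000, 10_000, 100_000, 1_000_000)


def nearest_gene(peak_center, tss_positions):
    """Return (gene, distance) for the TSS closest to a peak center."""
    gene = min(tss_positions, key=lambda g: abs(tss_positions[g] - peak_center))
    return gene, abs(tss_positions[gene] - peak_center)


def genes_within_cutoffs(peak_centers, tss_positions):
    """For each cutoff, collect nearest genes lying within that distance."""
    result = {cutoff: set() for cutoff in DISTANCE_CUTOFFS_BP}
    for center in peak_centers:
        gene, distance = nearest_gene(center, tss_positions)
        for cutoff in DISTANCE_CUTOFFS_BP:
            if distance <= cutoff:
                result[cutoff].add(gene)
    return result


# Invented TSS positions and peak centers on one chromosome.
toy_tss = {"GENE_A": 100_000, "GENE_B": 500_000}
toy_peak_centers = [100_500, 450_000, 2_000_000]
toy_associated = genes_within_cutoffs(toy_peak_centers, toy_tss)
gene_counts = {cutoff: len(genes) for cutoff, genes in toy_associated.items()}
```

The per-cutoff counts in `gene_counts` correspond in spirit to the "Associated Genes (kb)" columns of Table 2.1, with larger windows capturing a superset of the genes found in smaller ones.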
Using the expression timecourse, we determined the most favorable
context of SMAD2/3/4 or FOXH1 binding that could be correlated to a
transcriptional response. First, we examined all regions in the genome
surrounding a TSS at 1 kb, 10 kb and 1 Mb and analyzed the expression
levels of genes identified to contain regions bound to SMAD2/3/4 or FOXH1.
Second, we examined all genomic regions within 10 kb of a TSS that contained
different numbers (one, two, or more than three) of SMAD2/3/4- or
FOXH1-bound sites. Student’s t-tests were performed to determine whether
transcription factor binding correlated with transcription levels.
Histone Modification and Associated Genes
To determine sequence library saturation, we simulated random
subsets for each library. We examined how many more peaks were
computationally identified using 10% of all reads, up to 100% in 10%
increments. In this way, if significantly more peaks are called when using
100% of reads versus 90%, then the library is not yet saturated. If the number
of identified peaks levels off with <100% of reads, then the library is
considered saturated.
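The subsampling scheme above can be sketched as follows. This is a Python illustration under stated assumptions: toy_peak_count is a hypothetical stand-in for the real peak caller, its window and min_reads defaults are arbitrary, and reads are represented simply as integer positions.

```python
import random

def toy_peak_count(reads, window=100, min_reads=5):
    """Stand-in peak caller (hypothetical): count fixed windows that hold
    at least min_reads reads. A real analysis would call the peak caller here."""
    bins = {}
    for pos in reads:
        bins[pos // window] = bins.get(pos // window, 0) + 1
    return sum(1 for n in bins.values() if n >= min_reads)

def saturation_curve(reads, call_peaks=toy_peak_count, seed=0):
    """Return (percent of reads, peaks called) for 10%, 20%, ..., 100%."""
    rng = random.Random(seed)
    curve = []
    for tenth in range(1, 11):
        k = len(reads) * tenth // 10
        subset = rng.sample(reads, k)  # random subset drawn without replacement
        curve.append((tenth * 10, call_peaks(subset)))
    return curve
```

If the last few entries of the curve report roughly the same number of peaks, the library would be considered saturated in the sense used above.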
To further verify our H3K27me3 and H3K4me3 ChIP-Seq datasets, we
compared the datasets generated for hESCs to other published accounts (Pan
et al., 2007; Zhao et al., 2007). Specifically, we identified the number of genes
in the intersection of the set of genes that were within 10 kb of reads in our
dataset, and another set of genes that were within 10 kb of reads in other
published work.
Histone Peak Clustering
We used K-means clustering
(http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm) to
visualize the H3K4me3 and H3K27me3 marks surrounding TSSs in the genome. The
wiggle/enrichment plots represent normalized enrichment over the
background. The data points were the normalized enrichment values that are
calculated by QuEST. The log2(enrichment) values were used for clustering
and plotting. H3K4me3 and H3K27me3 marks were analyzed depending on
their patterns within ±5 kb of the TSS from UCSC Known Genes. For gene loci
with isoforms with alternate TSSs, we chose the TSS with the largest
H3K4me3 peak. Genes with a TSS within 10 kb of another gene TSS were
discarded for clustering analysis.
To functionally define these clusters, GO analysis was performed using
DAVID (the Database for Annotation, Visualization and Integrated Discovery)
(http://david.niaid.nih.gov). In addition, we examined how Groups 1-9 are
correlated with transcriptional activation in the context of endoderm. To this
end we used our microarray timecourse of hESC differentiation into endoderm
to associate the behavior of transcripts, whether induced, constitutive, inactive,
or repressed with a specific histone grouping (1-9). We compared the day 5
CXCR4 positive samples (d5) with hESC samples (d0). For each gene, we
calculated the fold change (R), difference (D) between the means of the two
groups, and the Welch's t-test p-value using dChip (Li and Wong, 2001).
Induced genes were defined by R > 2 and D > 100 of d5 over d0, and P ≤
0.05. Repressed genes were defined by R > 2 and D > 100 of d0 over d5, and
P ≤ 0.05. We also calculated the logarithm-transformed average (A)
and difference (M) of the means of d0 and d5 for each gene. We calculated
the z-scores of A (ZA) and the z-scores of M (ZM) for all genes. Constitutive
genes were defined by ZA > 1 and ZM < 1. Inactive genes were defined by ZA <
-1 and ZM < 1.
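The four categories can be expressed compactly in code. The sketch below is a Python illustration, not the dChip workflow itself. It assumes base-2 logarithms, z-scores computed with the population standard deviation, A taken as the mean of the two log values and M as log2(d5) minus log2(d0), and induced/repressed tested before constitutive/inactive; none of these choices is stated in the text. The thresholds are copied as written, including ZM < 1 rather than |ZM| < 1.

```python
import math
import statistics

def zscores(values):
    """Z-scores using the population standard deviation (an assumption)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def classify_genes(genes):
    """genes: list of dicts with keys name, d0, d5 (group means, > 0)
    and p (Welch's t-test p-value). Returns {name: label}."""
    a = [(math.log2(g["d0"]) + math.log2(g["d5"])) / 2 for g in genes]
    m = [math.log2(g["d5"]) - math.log2(g["d0"]) for g in genes]
    za, zm = zscores(a), zscores(m)
    labels = {}
    for g, z_a, z_m in zip(genes, za, zm):
        up = g["d5"] / g["d0"] > 2 and g["d5"] - g["d0"] > 100      # R and D, d5 over d0
        down = g["d0"] / g["d5"] > 2 and g["d0"] - g["d5"] > 100    # R and D, d0 over d5
        if g["p"] <= 0.05 and up:
            label = "induced"
        elif g["p"] <= 0.05 and down:
            label = "repressed"
        elif z_a > 1 and z_m < 1:
            label = "constitutive"
        elif z_a < -1 and z_m < 1:
            label = "inactive"
        else:
            label = "unclassified"
        labels[g["name"]] = label
    return labels
```

Genes that meet none of the four definitions are returned as "unclassified", a label introduced here only for the sketch.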
Transcription Factor Binding and Histone Marks
We compared genes bound by the transcription factors to each histone
group to examine whether different groups are enriched for genes associated
with SMAD2/3, SMAD4 or FOXH1. We counted the genes bound by
SMAD2/3, SMAD4 or FOXH1 within 100 kb (until reaching other gene) in each
histone group. These counts were compared with the numbers of genes
expected by chance in each group. For each cluster group with N genes, we
calculated a background expectation by randomly drawing N genes from the
total sample and recording the number of bound genes; this was repeated
1000 times. In addition, we
examined whether the binding of transcription factors within each group is
predictive of expression changes between hESCs and derived endoderm. We
analyzed the location within 100 kb (until reaching other gene) surrounding the
TSS from each group for both factor binding and resulting increase in
transcriptional levels. Further, we also compared the regions bound by
combinations of the transcription factors in each group. We identified
complexes by using a sliding window of 600 bp. Student’s t-tests were
performed to elucidate correlation of those transcription factor bindings with
transcription levels.
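The random-draw background for the group counts can be sketched as below. This is a Python illustration with hypothetical names (background_expectation, enrichment). The empirical p-value with a +1 correction is an assumption added for the example; the text only states that the observed counts were compared with the random counts.

```python
import random

def background_expectation(all_genes, bound_genes, group_size, draws=1000, seed=0):
    """Draw group_size genes at random, count how many are bound, repeat.

    all_genes   -- list of every gene in the total sample
    bound_genes -- iterable of genes bound by the factor of interest
    """
    rng = random.Random(seed)
    bound = set(bound_genes)
    counts = []
    for _ in range(draws):
        sample = rng.sample(all_genes, group_size)
        counts.append(sum(1 for gene in sample if gene in bound))
    return counts

def enrichment(observed, counts):
    """Mean background count and a one-sided empirical p-value."""
    mean = sum(counts) / len(counts)
    p = (1 + sum(1 for c in counts if c >= observed)) / (len(counts) + 1)
    return mean, p
```

An observed count well above the background mean, with a small empirical p-value, would indicate that the histone group is enriched for bound genes.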
We examined whether there is a subsignature within histone groups
that is conformationally distinct for transcription factor binding and
transcriptional increase. To this end, we analyzed the extent of H3K4me3 and
H3K27me3 association around TSS regions of subgroups bound by
transcription factors and compared it with that of subgroups not associated
with transcription factor binding. The changes in H3K4me3 and H3K27me3 peaks
were statistically analyzed as follows: within each cluster group, we calculated
the average profile for all the genes in the whole group (across 100 bins). We
took the subset of interest (e.g. genes bound by SMAD2/3 in endoderm) and
calculated the average profile for that. Then we calculated the sum of squares
of the deviation from the mean at each bin to get a measure of deviation from
the whole group. We permuted 1000 random groups of the same size as the
original subset and calculated the background distribution of scores.
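The deviation score and its permutation background can be sketched as follows. This is a Python illustration under stated assumptions: each gene's profile is a list of binned enrichment values (100 bins in the text), the function names are hypothetical, and the +1 corrected empirical p-value is an addition for the example.

```python
import random

def mean_profile(profiles):
    """Average profile across genes, bin by bin."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def deviation_score(subset, group_mean):
    """Sum over bins of the squared deviation of the subset mean from the group mean."""
    sub_mean = mean_profile(subset)
    return sum((s - g) ** 2 for s, g in zip(sub_mean, group_mean))

def permutation_test(group, subset_indices, permutations=1000, seed=0):
    """Compare the subset's score with scores of random subsets of equal size."""
    rng = random.Random(seed)
    group_mean = mean_profile(group)
    observed = deviation_score([group[i] for i in subset_indices], group_mean)
    k = len(subset_indices)
    background = []
    for _ in range(permutations):
        picks = rng.sample(range(len(group)), k)
        background.append(deviation_score([group[i] for i in picks], group_mean))
    p = (1 + sum(1 for b in background if b >= observed)) / (permutations + 1)
    return observed, p
```

A small p-value means that few random subsets of the same size deviate from the group average as strongly as the subset of interest.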
CHAPTER 3
ChIPvect_gui: Cell Specific Vector Generated Surface Plots
Invention Disclosure Docket Number: 08-330
Stanford University Office of Technology Licensing
ABSTRACT
ChIP-seq has enabled scientists to paint an accurate picture of genetic
occupancy by identifying, with a high degree of accuracy, regions in the
genome that are occupied by transcription factors and/or modified histone
proteins of interest (Johnson, Mortazavi et al. 2007). This new technology has
spawned an array of bioinformatics tools that are designed to organize and
prime these data for analysis and the extraction of meaningful conclusions
(Valouev, Johnson et al. 2008). Though the bioinformatics tools currently
available are extremely useful in their own right, there remains a need for an
intuitive method that capitalizes on ChIP-seq data to decipher the unique
identity of a given cell type. The software invention presented here,
ChIPvect_gui, has been created using MATLAB as a platform, and it aims to
meet this need in a way that will be understandable and accessible to any
scientist who possesses a basic level of competence using a personal
computer.
BACKGROUND
Coupling chromatin immunoprecipitation with high-throughput sequencing has
yielded a technique that provides an accurate account of the regions in the
genome that are bound by certain transcription factors or associated with
histone proteins that are modified in a specific way (Bernstein, Mikkelsen et al.
2006; Johnson, Mortazavi et al. 2007). In turn, identifying gene regions that
are bound by certain transcription factors may hint at the role of genes during
development and cellular differentiation (Visel, Blow et al. 2008). It is
imperative that data of this quality and importance be presented in ways that
will highlight the striking epigenetic patterns that are inherent within it.
ChIPvect_gui is designed to illuminate striking histone modification and
transcription factor binding patterns that are present in chip-seq data.
ChIPvect_gui is based on fundamental concepts from linear algebra theory
and is built with a user-friendly graphical user interface (GUI) that makes the
package easy for any scientist with a basic level of competency using a
personal computer to understand. The software package has a variety of tools
and features that expose the user to novel ways of visualizing chip-seq data.
Each feature of ChIPvect_gui can be accessed through a simple click of the
appropriate button in the GUI at the appropriate time. To use ChIPvect_gui, an
input text file in tab-delimited (.txt) format containing the relevant data is
required. Such files may be generated by the user in Microsoft Excel or in any
text editor such as TextWrangler.
Here, each feature of ChIPvect_gui is presented in detail. The concept behind
each one of the features is carefully described, and the functionality of each
feature is demonstrated using an example dataset. The MATLAB code written
to generate the GUI is also included in this chapter.
RESULTS
Application Features: Surface Plot
To use this feature of ChIPvect_gui, the user must provide data from three
separate chip-seq experiments performed on the same cell type. Each chip-
seq experiment is to be performed using an antibody that binds specifically to
a unique protein of interest. The surface plot function acts to generate a 3D
surface based on an input data file containing data from the three separate
chip-seq experiments referred to above. This input data matrix is n rows by 3
columns in size as illustrated in Table 3.1. Each column of the input data
matrix represents the number of sequence tags discovered in each of the
three chip-seq experiments. Each row of the input data matrix corresponds to a
specific gene that the user is interested in examining. Therefore, the entry in
the nth row of the mth column of the matrix contains the number of sequence
tags found to be associated with the nth gene in the mth chip-seq experiment.
Upon receipt of this input data matrix, the surface plot functionality of
ChIPvect_gui creates the three-dimensional surface in the following way. An
input data matrix in the same form as that seen in Table 3.1 is laid flat on the
X-Y plane in three-dimensional space, and a dot is placed above the X-Y
plane in the middle of each cell within the matrix. Each dot rises to a
height above the X-Y plane that is equal to the number within its cell.
For example, if a certain gene in an
experiment had a total of 1000 sequence tags associated with it, the surface
plot feature would generate a dot rising 1000 units above the X-Y plane in the
middle of the appropriate cell of the corresponding matrix. This same
procedure is repeated for all the data points in the matrix leading to an array of
dots at varying heights and positions in three-dimensional space. Next, all
such dots are connected resulting in a three-dimensional surface that is
representative of the unique epigenetic patterns present in a distinct cell type.
Figure 3.1 depicts one such surface.
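The construction of the dots can be sketched in Python as follows. This is not the MATLAB code behind ChIPvect_gui. It assumes unit-sized matrix cells, with columns running along X and rows along Y, and the function names parse_matrix and surface_points are hypothetical.

```python
def parse_matrix(text):
    """Parse tab-delimited rows: gene name, then one value per experiment."""
    genes, rows = [], []
    for line in text.strip().splitlines():
        fields = line.split("\t")
        genes.append(fields[0])
        rows.append([float(v) for v in fields[1:]])
    return genes, rows

def surface_points(rows):
    """One (x, y, z) dot per matrix cell: the cell centre in the X-Y plane,
    with the cell's value as the height above the plane."""
    return [(col + 0.5, row + 0.5, value)
            for row, values in enumerate(rows)
            for col, value in enumerate(values)]
```

Connecting neighbouring dots, for example with a surface-plotting routine, would then give the three-dimensional surface described above.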
Application Features: Vector Plot & Vector Plot Annotation
The vector plot feature of ChIPvect_gui utilizes a matrix that is identical to that
accepted by the surface plot functionality shown in Table 3.1. The output
however, is markedly different. In this case, chip-seq data is presented in a
three-dimensional vector field. Each vector in the three-dimensional vector
field represents one gene, and the numerical contents of each row represent
the X, Y, and Z components of each vector. For example the 3D vector
representing Nanog in Table 3.1 would have 0.255056, 0.955676, and
0.827119 as its X, Y, and Z components respectively. A vector for each gene
in the list provided by the user is created in the same way. Figure 3.2 depicts
an array of vectors generated from the data set in Table 3.1. This data comes
from a chip-seq experiment in which the genetic occupancy of Oct4, Sox2, and
Nanog were examined in mouse embryonic stem cells (mESC) (Chen, Xu et
al. 2008).
To complement the vector plot functionality of ChIPvect_gui, a vector plot
annotation feature has also been added. This feature takes the form of a
checkbox in the GUI. When selected, it simply labels each vector in
three-dimensional space with the appropriate gene name. This is designed to give
the user a better sense of the vector that represents each gene, and indirectly
allows the user to gauge the relative magnitude of occupancy of each
transcription factor on each gene.
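As a rough illustration of how each row becomes a labelled vector, the Python sketch below uses three rows copied from Table 3.1. It is not the MATLAB implementation. The names TABLE, magnitude and annotated_vectors are hypothetical, and sorting by vector length is added only to show how relative occupancy might be gauged.

```python
import math

# Three rows copied from Table 3.1; the components are the Oct4, Sox2 and
# Nanog columns, used as X, Y and Z respectively.
TABLE = {
    "Nanog": (0.255056, 0.955676, 0.827119),
    "Klf4": (0.100434, 0.154611, 0.105109),
    "Myc": (0.255056, 0.0, 0.374297),
}

def magnitude(vector):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in vector))

def annotated_vectors(table):
    """Return (gene label, tail, tip, length) tuples, longest vector first."""
    origin = (0.0, 0.0, 0.0)
    items = [(gene, origin, tip, magnitude(tip)) for gene, tip in table.items()]
    return sorted(items, key=lambda item: item[3], reverse=True)
```

With these three rows the Nanog vector is the longest, followed by Myc and then Klf4.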
Application Features: Vector-Generated Surface
The vector-generated surface is an important extension of the vector plot. The
vector plot serves as a skeletal framework for the vector-generated surface.
The vector-generated surface is created by connecting the tips of all the
vectors in the vector plot to form a three dimensional shape that can be used
to identify a specific cell type. One such shape is depicted in Figure 3.3 and it
is uniquely defined by the pattern of transcription factor occupancy associated
with user-selected genes. The vector-generated surface could serve as a
readily available visual aid for the classification of cell types.
Application Features: Chromposition
Chromposition provides yet another intuitive way of creating a
three-dimensional shape from ChIP-seq data. In contrast to the surface plot and
vector plot features, chromposition takes the location of each discovered
sequence tag into consideration when producing the three-dimensional shape.
It is important to note that the form of the matrix accepted by this feature of the
ChIPvect_gui software package is quite different from the matrices accepted by
the first three features described above. A truncated form of the matrix
accepted by this feature is depicted in Table 3.2. These data were obtained from
efforts to map STAT1 targets in interferon-γ stimulated HeLa cells (Robertson,
Hirst et al. 2007).
Upon receipt of the input data matrix, chromposition generates a
representative three-dimensional shape in the following way. First,
chromposition generates a virtual circle drawn with 22 lines that are evenly
spaced apart from one another and run from the center of the circle to its
circumference. Each line represents one of the 22 autosomal chromosome pairs in the
human genome and is labeled accordingly (i.e. “Chr1” through “Chr22”). For
each distinct row in the matrix shown in Table 3.2, this feature of ChIPvect_gui
“looks” in the first column for the chromosome number and selects the
line that represents that chromosome. It then takes the
number in the second column, which is the position on the chromosome where
a majority of the tags were found, and locates the corresponding point on the
relevant virtual chromosome. A dot is placed directly above the
located point, at a height equal to the number
of tags associated with the data point in question. This procedure is repeated
for each row, generating a large number of data points in three-dimensional
space. Finally, connecting all the dots generated from the
matrix yields a 3D shape. Figures 3.4 and
3.5 provide a pictorial representation of the ideas discussed above.
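The placement of each dot on the wheel can be sketched as follows. This is a Python illustration, not the MATLAB code of ChIPvect_gui. The text does not state how a base-pair position is scaled along a spoke, so dividing by a caller-supplied max_position is an assumption, as are the function names and the choice of placing Chr1 at angle zero.

```python
import math

N_SPOKES = 22  # one spoke per chromosome, Chr1 through Chr22

def parse_row(line):
    """Split a whitespace-delimited row such as 'Chr16 88615427 318'."""
    chrom, position, tags = line.split()
    return int(chrom[3:]), int(position), int(tags)  # 'Chr16' -> 16

def to_point(chrom, position, tags, max_position):
    """Map one row to (x, y, z): the spoke is chosen by chromosome number,
    the distance along the spoke by position, and the height by tag count."""
    angle = 2 * math.pi * (chrom - 1) / N_SPOKES   # evenly spaced spokes
    radius = position / max_position               # assumed scaling to [0, 1]
    return (radius * math.cos(angle), radius * math.sin(angle), float(tags))
```

Applying to_point to every row of a matrix like Table 3.2 gives the set of dots that chromposition connects into a shape.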
Application Features: Custom Data Cursor
The chromposition feature of ChIPvect_gui has been fitted with a custom data
cursor that displays the chromosome number and the number of sequence
tags associated with a given peak. This feature was designed to make it easy
for any user to quickly and accurately determine the basic information
associated with a given data peak. Usage of the custom data cursor is
illustrated in Figure 3.5, where the chromosome number, number of tags, and
position on the chromosome associated with a specific peak are displayed.
DISCUSSION
ChIP-seq has emerged as a potent method for determining the patterns of
transcription factor occupancy and histone modification within cell groups of all
kinds (Bernstein, Mikkelsen et al. 2006; Johnson, Mortazavi et al. 2007; Chen,
Xu et al. 2008). It provides a more in-depth data set than its predecessor,
ChIP-chip, as scientists can now directly gauge transcription factor binding in
the genome rather than rely on oligomer hybridization to pre-made probes for
the detection of transcription factor occupancy. This advance in the field of
genomics has inspired the development of many potent computational tools
directed at priming data for the discovery of unique patterns (Ji, Jiang et al.
2008; Valouev, Johnson et al. 2008). With data of this quality readily available,
the need for computational techniques that build upon current methods to
highlight unique transcription factor binding patterns has become increasingly
apparent. Here we report the creation of a software tool that can be used to
present chip-seq data in novel ways, which serve to highlight interesting trends
contained within it.
Current computational tools that are used for chip-seq data analysis can be
used to convert the raw data to sequence tag form, which can then be readily
aligned with the genome of the appropriate model organism (Valouev,
Johnson et al. 2008). While this is very important for chip-seq data analysis,
there are a few analytical vantage points that are missing from this approach.
First, this method of presenting chip-seq data does not afford the user an
opportunity to examine the entire dataset at a glance. Second, the user is not
presented with the opportunity to ascertain if there is a unique binding pattern
for the transcription factor(s) under inspection that characterizes each cell
type. And lastly, it is rare to find a computational tool that is equipped with a
GUI and is easy enough for a novice to master in a short amount of time.
ChIPvect_gui has been designed to meet the above stated needs. For
instance, the “chromposition” feature of ChIPvect_gui allows the user to look
at the entire data set at a glance by consolidating the 22 human autosomes
into a genetic wheel. The ability to view all the data points at a glance is very
conducive to discovering regions in the genome that are frequently and
purposefully bound by a transcription factor under study. The chromposition
feature of ChIPvect_gui is also fitted with a data cursor that can give the user
the exact chromosome number, the position along the chromosome, and the
number of tags that correspond to any peak within the data set. The vector
plot functionality of ChIPvect_gui has the potential to yield a three dimensional
shape that could serve as a signature for each distinct cell group. For each cell
type, one can select genes that are known to be highly expressed and/or
important for the determination of cell identity, and create a three-dimensional
shape based on the transcription factor binding patterns related to those
genes. The resulting three-dimensional shape created using the vector plot
function of ChIPvect_gui could serve as a shape signature for that particular
cell type. One can see potential applications of this tool in the field of stem cell
engineering in which scientists strive to produce distinct cell types from naïve
embryonic stem cells. Upon application of a differentiation protocol to naïve
cells, researchers can take the newly differentiated cells and create their own
characteristic three-dimensional shape with the tools described here. If this
shape happens to match the already established shape for the cell type in
question, this could serve as strong verification for the differentiation protocol
being developed.
CONCLUSIONS
ChIPvect_gui is an interactive chip-seq data analysis tool that produces three-
dimensional shapes based on the patterns of transcription factor binding found
within a particular cell type. ChIPvect_gui offers the user a variety of different
forms in which chip-seq data can be presented. It is designed to produce
characteristic shapes that can serve as three-dimensional signatures of
distinct cells, as well as to provide a global view of the transcription factor
binding patterns found within a particular cell group.
IMPLEMENTATION
ChIPvect_gui is implemented in MATLAB 7. The application can be installed on
any local personal computer in a dedicated sub-directory. ChIPvect_gui is
executed from the MATLAB 7 prompt by typing in a single command and
providing user-defined names for the input files. The user must provide input
data files containing ChIP-seq data in the appropriate format as depicted in
Tables 3.1 and 3.2. These data will be used for each individual feature of
ChIPvect_gui. ChIPvect_gui is designed to convert this information into a
three-dimensional shape of the appropriate kind depending on which feature
of ChIPvect_gui is activated.
Figure 3.1: Surface Plot
The surface plot above is formed through the use of a simple matrix containing the normalized number of tags (directly proportional to transcription factor occupancy) associated with user selected genes. This matrix is placed flat on the x-y plane, and the number of sequence tags present in each cell of the matrix dictates the topology of the surface.
Figure 3.2: The Vector Plot and Annotation Function of ChIPvect_gui
The vectors in the figure above represent the degree of binding/occupancy of Sox2, Oct4, and Nanog on each of the genes noted in the figure. The normalized number of reads found for each chip-seq experiment serves as the X, Y, or Z component of each one of the vectors. The vector annotation function has also been used here to label each vector with the gene that it represents.
Figure 3.3: Vector Generated Surface Feature of ChIPvect_gui
The vector-generated surface is a telling extension of the vector plot shown in figure 3.2. This feature of ChIPvect_gui works to connect the tips of all the vectors produced by the vector plot function leading to the creation of a three-dimensional surface whose shape is informed by the transcription factor binding patterns found within the cell under inspection. The vector plot annotation function can also be used here to label the appropriate regions of the three-dimensional shape that results from the tips of the vectors from the vector plot.
Figure 3.4: The Chromposition Function of ChIPvect_gui
The data shown in this figure represent the binding patterns of STAT1 in normal HeLa cells (a), and HeLa cells that have been stimulated with interferon-γ (b). The genetic wheel on the X-Y plane in each of the figures above serves as the basis of chromposition. In the genetic wheel, each spoke represents one chromosome and the data points from the chip-seq experiment are systematically placed on each chromosome depending on the binding regions detected. Chromposition offers the user a view of the entire data set at a glance, and this result provides clues about the parts of the genome that have the highest amount of transcription factor activity.
Figure 3.5: Chromposition Custom Data Cursor
Chromposition is fitted with a custom data cursor that allows the user to obtain information about any data peak in the figure window with the simple click of a button. This custom data cursor provides the chromosome number, chromosome position, and the number of sequence tags/reads associated with each data point.
Table 3.1: Input Data Matrix for Surface Plot and Vector Plot

Gene Name   # of Tags Found in    # of Tags Found in    # of Tags Found in
            Oct4 chip-seq         Sox2 chip-seq         Nanog chip-seq
            (Normalized Value)    (Normalized Value)    (Normalized Value)
Nanog       0.255056              0.955676              0.827119
Oct4        0.809273              0.648718              0.827119
Sox2        0.809273              0.394118              0.120909
Klf4        0.100434              0.154611              0.105109
E2f1        0                     0                     0
Esrrb       0.443651              0.886154              0.165065
CTCF        0                     0.365347              0.142826
Mycn        0.505594              0.955676              0.543939
Myc         0.255056              0                     0.374297
Smad1       0                     0                     0
STAT3       0.809273              0.234118              0
Tcfcp2l1    0.505594              0.394118              0.105109
Zfx         0                     0                     0

The above table highlights the general form of sample data that serves as input for the surface and vector plotting features of ChIPvect_gui. Each row contains the normalized number of tags associated with each gene in each chip-seq experiment. Note that the numbers are normalized for each chip-seq experiment.
Table 3.2: Truncated version of Chromposition Input data matrix

Chr #    Position    # of Tags
Chr1     556461      113
Chr1     559591      42
Chr1     703604      29
Chr1     845039      48
.
.
.
Chr16    88615427    318
Chr16    88675610    33
Chr17    200296      26
Chr17    259843      39
.
.
.
Chr22    48715687    28
Chr22    48730558    49
Chr22    48796845    106
Chr22    48810821    18
Chr22    49127275    47

Each row represents STAT1 binding patterns found in interferon-γ stimulated HeLa cells. The first column indicates the chromosome number, the second column contains the midpoint of the STAT1 binding range found along the appropriate chromosome, and the third column is simply the number of sequence tags found within that region from the chip-seq experiment.
ABSTRACT
Recent developments in microfluidics have made it possible to examine the
expression patterns of single cells via multiplexed quantitative real time PCR.
This powerful technological advance necessitates the development of
bioinformatics tools that can elucidate the unique patterns of gene expression
within a single cell, and facilitate the comparison of gene expression patterns
between cells.
Here we chronicle the design and implementation of SC Express, a MATLAB-based
bioinformatics tool that produces a three-dimensional shape that is
reflective of the expression patterns of a single cell. We show that the
three-dimensional shape generated using the genetic contents within a single cell is
reproducible. The software package accepts tab-delimited text files containing
the relevant gene expression data and provides a graphical user interface that
enables facile comparison of any two individual cell types on the same screen.
SC Express is a bioinformatics tool that provides a means to visualize gene
expression patterns that are reflective of individual cells.
BACKGROUND
Recent developments in microfluidics have made it possible to examine the
expression patterns of single cells (Thorsen, Maerkl et al.
2002). One of these platforms allows for simultaneous examination of 48
transcripts within 48 isolated single cells with remarkable sensitivity and
reproducibility (Spurgeon, Jones et al. 2008). The
technological advances stated above present an opportunity to elucidate the
expression patterns within any single cell and to examine the similarities and
differences between individual cells. There is therefore an emerging need for a
bioinformatics tool that can be used to interpret and categorize the unique and
perhaps subtle signatures of individual cells at a complex molecular level.
In this chapter, the invention of SC Express - a software tool that can be used
to present the expression profiles of individual cells obtained via multiplexed
quantitative real time PCR (qRT-PCR) in a three-dimensional form – is
highlighted. This software package accepts cycle threshold (CT) and fold
enrichment values for 24 transcripts analyzed within 48 single cells. The
software generates a three-dimensional shape for each cell based on the
patterns of gene expression found within it. Technical replicate experiments
performed using the genetic material obtained from the same cell are found to
yield extremely similar three-dimensional shapes when analyzed using SC
Express. These three-dimensional shapes allow for visualization of data, and
thus illuminate the subtle expression patterns within a single cell in a way not
possible with more traditional methods. The general principle of SC Express
can be easily applied to the analysis of many other complex data sets such as
chromatin immunoprecipitation coupled with high-throughput sequencing
(ChIP-seq), chromatin immunoprecipitation coupled with microarrays
(ChIP-chip), and RNA sequencing (RNA-seq), making the basic principle
behind SC Express useful for a variety of methodological
applications.
RESULTS
Application Features: Rendering the Cell Specific 3-D Shapes
SC Express is a tool that can be used to visualize complex transcriptional data
from an individual cell. To test its reliability in depicting these signatures,
human embryonic stem cells (hESCs) were cultured and differentiated towards
definitive endoderm (Thompson, Itskovitz-Eldor et al. 1998; D'Amour,
Agulnick et al. 2005). Single definitive endoderm cells were isolated
using fluorescence activated cell sorting (FACS), lysed, and transcripts within
each single cell were reverse transcribed. After 22 rounds of amplification, 24
different primers representing well-known definitive endoderm genes were
used in qRT-PCR reactions on 48 individual cells using Biomark™ 48.48 chips
depicted in Figure 4.1 (Spurgeon, Jones et al. 2008). A
tab-delimited Microsoft Excel file containing the resulting CT and fold
enrichment values was used as input data. A truncated version of the input
data file is illustrated in Table 4.1. These CT and fold enrichment values
constitute the numerical basis upon which SC Express generates three-
dimensional shapes for each cell. The representative three-dimensional shape
for each cell is developed in the following way.
First a virtual genetic wheel with lines that emanate from its center and extend
to its circumference is created on the x-y plane in three-dimensional space.
Each line contained in this circle represents one of the 24 genes whose
pattern of expression the user has chosen to examine. Each line is 50 units in
length and is labeled with the symbol of the gene that it represents. A pictorial
representation of the virtual genetic wheel is shown in Figure 4.2a.
After the virtual genetic wheel has been created, SC Express uses the input
data to initiate the second step involved in creating the cell specific three-
dimensional shapes. The input data file takes the form of a two-column matrix
with a variable number of rows (some multiple of 24) depending on how many
individual cells are being examined. The first column of the input data matrix
contains fold enrichment values calculated by the user relative to some set
standard, while the second column contains the CT values recorded by the
48.48 dynamic array system. SC Express breaks apart the input data into
smaller matrices each having a length of precisely 24 rows as illustrated in
Table 4.1. These smaller matrices are stored in memory as data that will be
used to create the unique three-dimensional shape for each cell. The first 24
rows of the input data matrix will be used to create the three-dimensional
shape that represents cell 1, with the next 24 being used for cell 2, and so on.
Once the individual matrices have been assigned to each cell, the final step
generates a three-dimensional shape for each cell. For each gene in the 24 by 2
matrix representing a single cell, SC Express finds the line on the virtual
genetic wheel that represents that gene, counts along it the number of units
that corresponds to the rounded CT value associated
with that gene, and marks the spot. Next, it creates a stem that rises above the
x-y plane to a height that is equal to the fold enrichment value recorded in the
same row. This stem emanates from the x-y plane at the exact spot that was
previously located using the gene name and the appropriate CT value. This
same procedure is performed for all of the 24 genes selected resulting in a set
of stems in three-dimensional space as shown in Figure 4.2b.
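The splitting of the input matrix and the placement of the stems can be sketched as follows. This is a Python illustration, not the MATLAB code of SC Express. It assumes rows are (fold enrichment, CT) pairs in a fixed gene order, that the spokes are evenly spaced with the first gene at angle zero, and the names split_cells and stems are hypothetical.

```python
import math

GENES_PER_CELL = 24

def split_cells(rows, size=GENES_PER_CELL):
    """Break the two-column input into consecutive blocks of 24 rows, one per cell."""
    if len(rows) % size:
        raise ValueError("row count must be a multiple of %d" % size)
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def stems(cell_rows):
    """Return one (base, tip) pair per gene for a single cell.

    The base lies on the gene's spoke at a distance equal to the rounded CT
    value; the tip sits directly above it at the fold enrichment value.
    """
    out = []
    for index, (fold, ct) in enumerate(cell_rows):
        angle = 2 * math.pi * index / len(cell_rows)
        # Python's round() rounds halves to the nearest even integer, which
        # differs from MATLAB's round() for values such as 20.5.
        radius = round(ct)
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        out.append(((x, y, 0.0), (x, y, fold)))
    return out
```

Connecting the tips returned by stems for one cell would give that cell's three-dimensional shape.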
The set of stems constructed as in Figure 4.2b serves as a skeletal
framework upon which the actual three-dimensional shape is built. This part of
SC Express works by connecting the tips of all the stems created to yield a
three-dimensional shape that has been developed by taking the expression
levels of each assayed gene in the single cell into consideration. An example
of a completed cell specific three dimensional shape is illustrated in Figure
4.2c. The three-dimensional shapes in Figure 4.3 were produced
using data obtained from technical replicates of a Biomark™ 48.48 dynamic
array experiment. Each three-dimensional shape was created from the
expression profile of a single cell. Figures 4.3a and 4.3b were created using
the gene expression profile of the same cell, while Figures 4.3c and 4.3d
were created using the gene expression profile of a different cell. In general,
Figure 4.3 shows that the two shapes produced using the genetic material
from the same cell appear similar, while shapes produced from different cells
look different.
Application Features: Vector Based Variation Scores
For any pair of three-dimensional shapes generated in the SC Express GUI,
vector based variation scores can be calculated to serve as a measure of the
difference between them. The vector based variation scores are generated as
follows.
Imaginary vectors are extended from the origin of the virtual genetic wheel to
the tip of every stem in the set that serves as the skeletal framework of each
three-dimensional shape. Thus while comparing two cells, there will be a pair
of imaginary vectors that exist in the same vertical plane – one from each
three-dimensional shape - representing each gene.
Next, the corresponding sets of imaginary vectors generated for each shape
are compared to obtain a variation score. Specifically, the dot product between
pairs of vectors that represent the same gene in separate three-dimensional
shapes serves as the foundation upon which the variation score is determined.
For the purposes of illustration we will consider two imaginary vectors A1 and
A2 that represent the same gene in different three-dimensional shapes as
shown in Figure 4.4. The variation score between the three-dimensional
shapes being considered for this particular gene is defined as the angle
between vectors A1 and A2 which can be obtained using the dot product of
both vectors as shown in the mathematical formulas below.
\[ A_1 \cdot A_2 = |A_1|\,|A_2| \cos\theta \tag{1} \]
\[ \theta = \cos^{-1}\!\left( \frac{A_1 \cdot A_2}{|A_1|\,|A_2|} \right) \tag{2} \]
As the angle between A1 and A2 approaches zero, the two vectors point in
increasingly similar directions. For each gene under consideration, an identical procedure is used to
determine the variation between the corresponding pair of imaginary vectors.
SC Express displays the maximum variation score between all corresponding
pairs of vectors, the minimum variation score between all corresponding pairs
of vectors, and the average variation score which is an average measure of
the degree of variation between each pair of corresponding vectors
representing a gene in the pool selected by the user.
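The calculation in equations (1) and (2), together with the maximum, minimum, and average scores, can be sketched as follows. This is a minimal illustration rather than the SC Express implementation: the function names are hypothetical, and each vector is assumed to be given as the (x, y, z) coordinates of a stem tip, with the tail at the origin of the virtual genetic wheel.

```python
import math


def variation_score(a1, a2):
    """Angle in degrees between two 3D vectors, per equations (1) and (2)."""
    dot = sum(p * q for p, q in zip(a1, a2))
    norm1 = math.sqrt(sum(p * p for p in a1))
    norm2 = math.sqrt(sum(q * q for q in a2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("angle is undefined for a zero-length vector")
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_theta))


def summarize_variation(tips_a, tips_b):
    """Return (maximum, minimum, average) score over corresponding genes."""
    scores = [variation_score(a, b) for a, b in zip(tips_a, tips_b)]
    return max(scores), min(scores), sum(scores) / len(scores)
```

The two input lists are assumed to hold the stem tips of the two cells in the same gene order, so that each pair of vectors represents the same gene.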
The variation scores obtained upon comparison of three-dimensional shapes
are also displayed as a stem plot within the GUI. Specifically, the measure of
variation for each gene (in degrees) is plotted as a stem that emanates from
the x-axis. Each stem rises to a height that is exactly equal to the amount of
variation that was calculated for the gene that it represents. This pictorial
representation of the variation in gene expression between any two cells was
designed to give the user a means to discern the particular gene in a chosen
pool that contributes most to the difference between the general expression
patterns of any pair of cells.
Application Features: Graphical User Interface
To ensure the best possible user experience, the software was developed with
a graphical user interface (GUI) on its front end. This GUI is fitted with two
visualization windows that display the three-dimensional shapes generated for
any cell. Drop down menus accompany each visualization window and allow
the user to select the expression data generated for any individual cell. With a
single click of the cell specific 3D plot button, a three-dimensional shape that
represents the selected cell will appear in the appropriate visualization
window. This style fosters an environment in which users can immediately
discern the apparent similarities or differences between any two cells profiled
in the experiment. Figure 4.5 presents a pictorial representation of the SC
Express GUI.
DISCUSSION
Current genotyping technology is used to observe gene expression patterns
on a multi-cellular level (Mark Schena, Dari Shalon et al. 1995). The next
frontier is to analyze expression patterns within an individual cell. Technology
is moving quickly toward this goal, as indicated by the scientific studies that
have attempted to examine single cell expression (Eberwine, Yeh et al. 1992;
Levsky, Shenoy et al. 2002), and the computational approaches required to explore
and analyze these data must keep pace. The relatively recent application of
microfluidics to cell biology has enabled a robust quantitative measure of the
expression patterns within a single cell (Sandra L. Spurgeon, Robert C. Jones
et al. 2008). Here, we report the development of a software tool that can be
used to visualize single cell expression data, providing an unprecedented
ability to compare specific patterns of gene expression. SC Express provides
resolution of data sets on a single cell level and a level of simplicity that is
unattainable with current bioinformatics tools.
Current bioinformatics analysis techniques such as clustering may allow users
to identify groups of single cells that show closely related patterns of gene
expression, but SC Express offers the added advantage of assigning a unique
three-dimensional signature to each cell. The unique three-dimensional
shapes assigned to each cell allow for direct visual comparison of single cell
specific expression patterns. The apparent reproducibility of these cell specific
three-dimensional signatures suggests that they could be used as accurate
markers of cell identity. In addition, SC Express readily calculates maximum,
minimum, and average variation scores between cell specific expression
patterns. The SC Express GUI is also fitted with a variation score plot that
depicts the variation in expression for each gene upon comparison of cell
specific three-dimensional shapes. The simplicity of design allows any user
with a basic level of competence in personal computing to examine the
nuances in the expression patterns of a single cell.
SC Express was built to exploit the accuracy and reproducibility of
microfluidics enabled single cell analyses by facilitating visual comparison of
the expression profile of one single cell to another. SC Express’s features
include a GUI front end containing functional push buttons and two
visualization windows within which specific three-dimensional shapes for any
individual cell can be formed. For reasons of familiarity, the program has been
designed to accept tab delimited Microsoft Excel files as input data. The above
listed features of the SC Express GUI were chosen in order to simplify use of
the application and thus tailor it to as wide a range of potential users as
possible.
Although expression data were used to test the efficacy of SC Express, the
basic underlying architecture of SC Express can easily be tailored to a variety
of datasets. Using similar logic, but different parameter assignments, SC
Express can generate three-dimensional shapes from ChIP-seq or ChIP-chip
data. Thus the high utility of SC Express makes it a useful platform for
scientists across multiple disciplines to analyze a wide range of diverse
biological datasets.
CONCLUSIONS
SC Express generates three-dimensional shapes that represent the genetic
characteristics of a single cell. Each cell specific three-dimensional shape is
created using the results of 48 RT-qPCR reactions performed on a single cell.
SC Express is designed to give users the freedom to compare individual cells
through the use of three-dimensional shapes that are created using the
genetic characteristics of each cell and provides variation scores that serve as
a measure of variation between any two cell specific three-dimensional
shapes.
IMPLEMENTATION
SC Express is implemented in MATLAB 7. The application can be installed on
local computers in a dedicated sub-directory. SC Express is executed from the
MATLAB 7 prompt by typing in a single command, and providing user defined
names for the input files. Upon execution, the program will prompt the user to
provide two tab delimited text files that contain a list of recorded RT-qPCR CT
values and calculated fold enrichments for each and every individual cell
considered in two separate experiments. SC Express is designed to convert
this information into a three-dimensional shape that uniquely represents each
individual cell.
Figure 4.1: Biomark 48.48 Dynamic Array Device allows for the execution of 2304 (48 × 48) simultaneous qRT-PCR runs on a single chip
Figure 4.2: Step-wise Construction of Three-dimensional Shape (a) The virtual genetic wheel in three dimensions (b) The array of stems in three-dimensional space that serves as a skeletal framework of three-dimensional shape (c) Completed three-dimensional shape
Figure 4.3: Three-dimensional cell specific plots Figures (a) and (b) were generated from technical replicate Biomark™ 48.48 array enabled qRT-PCR experiments performed using cDNA obtained from the same exact single cell: cell 1. (a) Represents the first technical replicate performed using Biomark™ 48.48 chip number: 1131100054 (b) Represents the second technical replicate performed using Biomark™ 48.48 chip number: 1131100055. Figures (c) and (d) depict the three dimensional shapes generated using the genetic contents of another cell: cell 4.
Figure 4.4: Variation Score between three-dimensional shapes (a) and (b) depict imaginary vectors A1 and A2 that have been drawn to represent the same gene in different three-dimensional shapes. The angle between vectors A1 and A2 serves as the variation score for that particular gene between the three-dimensional shapes being considered.
Figure 4.5: SC Express graphical user interface A screenshot of the SC Express graphical user interface. Visualization windows (A), Cell Specific 3D Plot button (B), Drop down menus (C), Variation score calculator (D), and the Variation Score Plot (E) are all shown in the figure.
Cell  Gene Symbol  Column 1: Fold Enrichment  Column 2: CT Value
10    FOXA2        6.317687933                14.81324486
      GATA4        2.651446641                16.06909628
      APOA2        0.517275768                18.42538484
      SMARCD3      0.565844471                18.29420511
      NR0B1        4.72631E-05                31.84211789
      PRSS2        0.454294384                18.61778746
      S100A16      87.57701716                11.02040246
      FOXQ1        5.025226082                15.1434441
      SAMD11       2.03331E-05                33.16318622
      PORCN        2.852996557                15.9604623
      SMAD6        3.096406161                15.84283484
      PREX1        0.759743641                17.86911417
      REEP6        0.689517521                18.00975132
      GATA6        24.68464796                12.84709707
      GSC          4.177925458                15.40998366
      CXCR4        4.114427694                15.43763149
      SOX17        2.419376574                16.20095549
      MID1IP1      1.489734425                16.89770791
      NODAL        27.69063403                12.68135029
      NFKBIA       21.63575053                13.03779544
      FXYD6        0.25719157                 19.43241416
      CST3         0.004238115                25.37495379
      SOX1         8.65736E-05                31.23749261
      GAPDH        100.0469864                10.82877637
Table 4.1: A truncated version of the SC Express input data file The table shows the information contained in the input data matrix. Specifically, the above table shows data recorded for cell #10 in an actual experiment. All 48 cells have such truncated matrix forms, and the complete data file is a concatenation of all 48 truncated matrices arranged with respect to the numerical order of the cells.
Chapter 5
Analysis of Gene Expression Patterns in Single Human Embryonic Stem
Cells and Their Derivatives Allows for Cellular Classification
ABSTRACT
Background: Discriminating between different cells within complex mixtures is
key to multiple disciplines including stem cell biology, cancer biology, and
developmental biology. Recent developments in microfluidics have made it
possible to examine the expression patterns of single cells via multiplexed
quantitative real time PCR. This powerful technological advance allows for in
depth exploration of the unique patterns of gene expression within a single
cell, and facilitates the comparison of gene expression patterns between cells.
Results: In this report, we demonstrate that transcriptional variation between
isolated single cells is high, but that this variability can be used to clearly
distinguish different cellular types. With the aid of SC Express, a
computational tool that we developed, we show that single isolated endoderm
cells derived from human embryonic stem cells (hESCs) have a surprising
degree of transcriptional variation. Looking closely at this variation, we found
three housekeeping transcripts that change significantly between cell types,
such that the relative expression of these three markers, when plotted in three-
dimensional space, can clearly discriminate between different cellular types,
including 293T, HepG2, induced pluripotent stem cells (iPSCs), hESCs, and
endoderm derived from both iPSCs and hESCs.
Conclusion: Housekeeping transcripts are endemic to all cells, and our study
shows that these transcripts may be useful in discriminating between different
BACKGROUND
Distinguishing between subtle varieties of cell types is central to many
disciplines, including regenerative medicine and cancer biology. In both fields,
it is essential to identify particular cells within a complex mixture, whether it be
differentiating cultures or tumors. While the transcriptomes of whole
organisms, organ systems and culture regimes, have been described, the
extent of the transcriptional similarities between individual cells within these
populations is far from understood. This distinction is critical, as embryos,
organs, and tumors contain diverse populations of cells. Cell surface receptors
have been highly successful at isolating specific cells from these complex
tissues (Charles M. Baum, Irving L. Weissman et al. 1992; Kevin A D'Amour,
Alan D Agulnick et al. 2005), but it is likely that cellular complexity is far
greater than a few markers can reflect.
Recent developments in microfluidics have made it possible to examine
the expression patterns of a variety of markers within single cells (Todd
Thorsen, Sebastian J. Maerkl et al. 2002). One of these platforms allows for
simultaneous examination of 48 transcripts within 48 isolated single cells with
remarkable sensitivity and reproducibility (Sandra L. Spurgeon, Robert C.
Jones et al. 2008). The technological advances stated above present an
opportunity to elucidate the expression patterns within any single cell and to
examine the similarities and differences between individual cells.
The study of variation between cells may allow insight into lineage
specification. On one hand, the discovery of a finite number of distinct cellular
expression patterns may indicate the existence of cellular subgroups
inherently fated to yield cells of a certain type. On the other hand, extreme
single cell individuality might indicate that transcript levels vary tremendously
between single cells and may not be an indicator of the future identity of a
specific cellular type. Having studied transcriptional variation within purified
hESC derived endodermal cells using recent breakthroughs in microfluidics
(Todd Thorsen, Sebastian J. Maerkl et al. 2002) and a novel bioinformatic
approach termed SC Express, we find widespread transcriptional variation
between single definitive endoderm cells. This result suggests that these cells
are either highly individualistic or that cellular fate tolerates a high degree of
transcriptional variability. We also found three housekeeping transcripts that
are uncharacteristically variable between definitive endoderm and hESC
populations. Interestingly, the relative expression of these three markers, when
plotted in three-dimensional space, can clearly discriminate between different
cellular types, including 293T, HepG2, induced pluripotent stem cells (iPSCs),
hESCs, and endoderm derived from both iPSCs and hESCs. These three
transcripts may be used to discriminate different cellular types, may aid
basic biological understanding of lineage formation in hESCs, and may have
applications in regenerative medicine.
RESULTS
Gene Expression Profiling in Single Definitive Endoderm Cells
We hypothesized that expression profiling of single endodermal cells
would enable their classification into specific cellular groups reflective of
different endodermal fates. To this end, we differentiated hESC towards
definitive endoderm using an established differentiation protocol (Kevin A
D'Amour, Alan D Agulnick et al. 2005), and used fluorescence activated cell
sorting (FACS) to isolate cells within the differentiated population that
expressed the chemokine cell surface receptor CXCR4. Next, we selected 22
endoderm specific genes from the intersection of RNA sequencing (RNA-seq)
and exon array experiments performed on these same CXCR4+ definitive
endoderm cells. The endoderm specific genes used in our experiments are
listed in Table 5.1. We added a well-known ectoderm marker (SOX1) and a
housekeeping gene (GAPDH) to the list of genes as controls. Using the
multiplexed quantitative real time PCR (qRT-PCR) Biomark™ system (Aaron
R. Wheeler, William R. Throndset et al. 2003), we profiled the relative
expression levels of these 24 genes in ~ 80 single CXCR4+ cells. Specifically,
we calculated fold enrichment values by comparing the cycle threshold (CT)
values of each gene to that of the common baseline control: GAPDH. We
found that each CXCR4+ definitive endoderm cell showed a unique pattern of
gene expression as depicted in Figure 5.1a. Even after analyzing the
expression of 22 endoderm specific genes in ~80 CXCR4+ definitive endoderm
cells, no two cells displayed identical expression patterns. Thus the transcript
levels of lineage specific molecules during endoderm specification appear to
occur on a continuum.
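As an illustration of how a fold enrichment value relative to GAPDH can be derived from CT values, the sketch below uses the simplified relation 100 × 2^(CT_reference − CT_gene). This is an assumption made for illustration: the function name `fold_enrichment`, the scale factor of 100, and the perfect doubling per cycle are not stated in the text, although the values listed in Table 4.1 appear to be approximately consistent with this relation.

```python
def fold_enrichment(ct_gene, ct_reference, scale=100.0):
    """Expression of a gene relative to a reference gene such as GAPDH.

    Assumes the amount of product doubles every cycle, so each extra
    cycle needed to reach threshold halves the relative expression.
    """
    return scale * 2.0 ** (ct_reference - ct_gene)


# CT values for FOXA2 and GAPDH in cell 10, taken from Table 4.1.
foxa2 = fold_enrichment(14.81324486, 10.82877637)
```

For the FOXA2 row of Table 4.1 this sketch gives a value close to the tabulated fold enrichment, although other rows agree less exactly, so the thesis calculation may include steps that are not modeled here.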
SC Express: A Tool to Visualize Gene Expression Patterns in Single
Cells
In order to visualize the individual expression patterns of each single
cell, we created SC Express: a method that uses single cell gene expression
to produce three-dimensional shapes. CT values from the Biomark™ 48.48
experiments were used to calculate fold enrichment values using the ΔΔCT
method. The resulting fold enrichment values and original CT values from
which they were calculated were used to create each cell specific three-
dimensional shape using SC Express (See Chapter 4 for a detailed description
of three dimensional shape construction). We found that cell specific three-
dimensional shapes created using our method tend to be individualistic (i.e.
specific to each cell) and reproducible as seen in Figure 5.2. At this point, we
surmised that a more stably expressed class of genes would be better for our
search for patterns that represent distinct cell types.
Housekeeping Gene Expression Within Single Cells
Since tissue specific transcripts were highly variable within individual
endoderm cells, we tested housekeeping transcripts to gauge their
consistency in expression within these same cells. While lineage specific
genes appear to be loosely regulated at the transcription level, research has
shown that housekeeping genes are more tightly controlled (Robert D. Barber,
Dan W. Harmer et al. 2005). We selected primers for 22 known housekeeping
genes (Eli Eisenberg and Levanon 2003) listed in Table 5.1 and performed
single cell PCR on FACS isolated SSEA4+ hESC and CXCR4+ hESC derived
endoderm. In general, housekeeping gene expression within single CXCR4+
hESC derived endoderm and single SSEA4+ hESCs was more uniform than
that of the tissue specific transcripts (Figures 5.1b and 5.1c), suggesting that
housekeeping gene expression is more tightly controlled than that of genes in
regulatory pathways. Interestingly, while the relative level of most of the
housekeeping transcripts appeared to be consistent between hESC and hESC
derived endoderm, a few showed variability.
We also examined the expression of our housekeeping gene set (Table
5.1) within all 6 different cell lines (293T, HepG2, induced pluripotent stem cells
(iPSCs), hESCs, and endoderm derived from both iPSCs and hESCs). During
analysis of housekeeping gene expression in our selected cell lines, we
observed some variation between cells of different types. We performed
principal component analysis (PCA) to determine whether certain cell lines
cluster together or diverge based on the expression patterns of our set of
selected housekeeping genes. Our principal component analysis revealed 6
principal components, or axes of variation in the data: PC1, PC2, PC3, PC4,
PC5, and PC6. It should be noted that these principal components are ranked
based on how much variation in the data they explain. For example, principal
component 1 (PC1) explains most of the variation in the data, followed by
PC2, PC3, and so on. The first four principal components account for 87.8% of
the total variation in the dataset (PC1: 33.6%, PC2: 27.2%, PC3: 19.5%, PC4:
7.5%). In general, our data show that single cells from the same group tend
to have similar housekeeping gene expression patterns, resulting in the
formation of clusters of the different cell types (Figure 5.3). The first principal
component, PC1, separates our hESC cluster from the HepG2 cluster,
marking these two clusters as the most distinct within our dataset (Figure
5.3a). The combination of PC3 and PC4 distinguishes the iPS derived
definitive endoderm from the other cell lines within the group (Figure 5.3b),
and also allows for the emergence of distinct HepG2 and hESC endoderm clusters.
The combination of PC1 and PC3 isolates iPS endoderm and HepG2 as
distinct clusters within the dataset (Figure 5.3c).
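The cumulative share of variation attributed to the first four principal components can be checked with simple arithmetic. The short sketch below uses only the percentages quoted above; the variable and function names are hypothetical.

```python
# Percent of total variation explained by each component, as quoted above.
explained = {"PC1": 33.6, "PC2": 27.2, "PC3": 19.5, "PC4": 7.5}


def cumulative_variance(percentages):
    """Running total of explained variation, rounded to one decimal place."""
    total = 0.0
    running = []
    for value in percentages:
        total += value
        running.append(round(total, 1))
    return running


cumulative = cumulative_variance(explained.values())
# Share left for the remaining components (PC5 and PC6).
remaining = round(100.0 - cumulative[-1], 1)
```

The running total reaches 87.8% after PC4, which matches the figure stated in the text and leaves 12.2% of the variation to PC5 and PC6 combined.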
Though the expression patterns of all the genes within our
housekeeping pool yielded good clustering of the different cell lines after
principal component analysis (Figure 5.3), we sought to find the genes that
contributed most to the variation between cell lines. To this end, we ranked the
housekeeping genes based on their contribution to the PCA. The
housekeeping genes are listed in Table 5.2 in order of decreasing importance
(1 being the most important, and 24 being the least important) to our PCA.
Since PCA yielded cell type specific clusters based on housekeeping
gene expression, we assessed different permutations of the top 10
contributors to our PCA (Table 5.2) to see if any three of them resulted in
unique clustering of the different cell types. Using the expression patterns of
the top three contributors to our PCA, (LDHA, ACTB, NONO) we performed
principal component analysis again to see if the distinction between clusters of
different cell lines became even more striking relative to the PCA done with
the full set of housekeeping genes. We found that the principal components
derived based on the top three genes (LDHA, ACTB, NONO), were able to
demarcate the cell clusters more effectively (Figure 5.4a). Also, the
combination of LDHA, GPI, and NONO yielded good separation of the cell
clusters after PCA (Figure 5.4b). As a control, we performed PCA using three
genes picked at random from our housekeeping gene set (SOX1, TXN, NCL),
and found that these genes did not clearly separate the different cell types into
distinct clusters as in the case of the top three contributors to the PCA (Figure
5.4c). Further, the top three contributors to the PCA yielded the best visual
separation between the cell lines when the expression patterns of these genes
were used as the X, Y, and Z components for each cell in three-dimensional
rectangular coordinates (Figure 5.5a). As a control, we plotted all the different
cell lines in three-dimensional space using the expression patterns of SOX1,
TXN, and NCL as X, Y, and Z components respectively. In this case, the well-
defined cell specific clustering observed for these same cell types using
LDHA, ACTB, NONO as X, Y and Z components was lost.
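The three-dimensional representation described above amounts to reading three expression values per cell as rectangular coordinates. The sketch below illustrates the idea under stated assumptions: the function names are hypothetical, the input is assumed to be a list of (cell type, expression dictionary) pairs, and the per-type centroid is added here only as a convenient summary of where each cluster sits; it is not part of the analysis reported in the text.

```python
from collections import defaultdict


def to_point(expression, genes=("LDHA", "ACTB", "NONO")):
    """Use the expression of three genes as the X, Y, and Z coordinates."""
    return tuple(expression[g] for g in genes)


def centroids(cells, genes=("LDHA", "ACTB", "NONO")):
    """Mean 3D position of the cells belonging to each cell type."""
    groups = defaultdict(list)
    for cell_type, expression in cells:
        groups[cell_type].append(to_point(expression, genes))
    return {
        cell_type: tuple(sum(axis) / len(points) for axis in zip(*points))
        for cell_type, points in groups.items()
    }
```

Passing a different `genes` tuple, for example the control set SOX1, TXN, and NCL, places the same cells in a different coordinate system, which is the comparison drawn in the preceding paragraph.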
DISCUSSION
Definitive endoderm cells show remarkable versatility in serving as the
precursor to a multitude of cell types that constitute the visceral organs (Kevin
A D'Amour, Alan D Agulnick et al. 2005; Richard I. Sherwood, Cristian Jitianu
et al. 2007). The developmental versatility of definitive endoderm raises the
question of how homogeneous this cell population is at the transcriptional level.
A number of scenarios could arise: single members of this group could be
identical, they could completely differ from one another, or the entire
population could be segregated into sub populations each primed to yield
unique somatic cell types. Answering such questions requires analysis of
lineage specific gene expression patterns on the single cell level. Here, we
show that lineage specific gene expression patterns within single CXCR4+
definitive endoderm cells are highly individualistic. Several theories could
explain the unique patterns of lineage specific gene expression observed in
each endoderm cell. In one scenario, the expression level of each lineage
specific gene may only need to exceed a certain threshold for the cells to
attain endodermal fate. In this model, cellular identity may be controlled by
posttranscriptional mechanisms. Studies have shown that protein levels within
a cell can be modulated by intricate posttranscriptional mechanisms
(Nishimoto T 1981). These posttranscriptional mechanisms may act to keep
the amount of lineage specific proteins produced within a range that confers
endodermal character. Therefore, though the pattern of endoderm specific
genes expressed from cell to cell appears to be stochastic, the developmental
potential of each cell may be identical due to similar levels of protein
expression. If this were indeed the correct mechanism of gene expression
regulation, housekeeping genes would appear to be exempt from this method of
control. The pattern of housekeeping gene expression that we discovered is
tightly controlled and does not appear to require this mode of regulation.
In a second scenario, transcript variation may reflect the actual
diversity, vast plasticity and developmental potential of definitive endoderm.
Fate mapping studies during mouse embryo development suggest that cellular
sub groups within definitive endoderm are inherently fated to yield cells of a
given type (Kristie A. Lawson, Juanito J. Meneses et al. 1991; Kimberly D.
Tremblay and Zaret 2005). On the other hand, co-culture experiments in the
embryo show that endoderm is not fully committed to any lineage in the early
stages of development (James M. Wells and Melton 2000). These studies in
addition to our own findings suggest that precursor cells within a given lineage
are not irreversibly fated to give rise to defined cell types. The transcriptional
heterogeneity observed within the definitive endoderm population is more
supportive of a developmental model in which each cell has some potential to
develop into a handful of cell types, but the eventual fate decision is pliable
and ultimately cemented by the presence of unique permutations of
developmental factors in the right concentration, at the right place and time.
This might be particularly true of endoderm derived in culture since the precise
inductive interactions characteristic of the three-dimensional embryo are not
present. We investigated endodermal heterogeneity within the embryo proper,
but were unable to consistently isolate single live endoderm cells from the
embryo. Thus, it is unclear whether transcript variability is specific to in vitro
differentiation conditions using hESCs. Nonetheless, this heterogeneity is a
critical issue to be understood when producing cell types for regenerative
medicine applications.
We have also shown through principal component analysis that
housekeeping gene expression is unique enough between different cell types
to result in the formation of distinct clusters. Further, we discovered three
housekeeping genes within our selected pool with sufficient variability in their
patterns of expression to distinguish between six different cell lines, including
hESCs and iPSCs. Although housekeeping genes have traditionally been
used to normalize gene expression data, recent work has shown that the
expression of this class of genes may vary from cell to cell (Luigi Warren,
David Bryder et al. 2006). Our work expands on this observation and
suggests that the variation in housekeeping transcripts could be an untapped
resource with which to distinguish different cell types, and could be an
important tool for both regenerative medicine and clinical diagnostics. For
example, variation in housekeeping gene expression could be used as a tool
to select specialized cell types from differentiating culture. It could also serve
as a possible diagnostic to distinguish between cancerous and non-cancerous
cells of the same cell type. In the future, it would be interesting to examine the
expression patterns of all housekeeping genes within as many different cells
as possible in search of a subset whose patterns of expression can be used to
distinguish between them. In general, single cell gene expression data is
immensely powerful and holds great promise for the study of development,
disease progression, and the treatment of disease.
Figure 5.1: Gene expression profiles within CXCR4+ definitive endoderm and SSEA4+ hESC. Expression patterns of endoderm specific and housekeeping genes are shown. Each panel contains ~40 traces with each trace representing the expression patterns of a single cell. (a): Endoderm specific genes are uniquely expressed in each CXCR4+ definitive endoderm cell. (b): Relative to lineage specific genes, housekeeping gene expression patterns within single definitive endoderm cells are much more uniform. (c): Housekeeping genes are also uniformly expressed in a group of single SSEA4+ human embryonic stem cells
Figure 5.2: Single cell specific three-dimensional shapes representing unique patterns of endoderm specific gene expression within CXCR4+ definitive endoderm cells The same CXCR4+ endoderm cell was used to generate (a) and (b) above. A separate CXCR4+ endoderm cell was used to generate (c) and (d) above. These shapes show that three dimensional shapes generated using endoderm specific gene expression patterns within the same single cells are highly reproducible, though the pattern of expression for these genes within single cells is unique.
Figure 5.3: Principal Component Analysis (PCA) Yields Distinct Clustering of Unique Cell Types Expression patterns of our entire housekeeping gene set were used for PCA. The first 4 principal components (PCs) accounted for ~ 88% of variation in the data. The PCs yielded clustering of the different cell types. (a) PC1 and PC2 distinguish the hESC cluster from the HepG2 cluster. (b) PC3 versus PC4 reveals distinct HepG2, hESC, hESC endoderm, and iPS endoderm clusters in the data set. (c)
Figure 5.4: Top three housekeeping gene PCA contributors allow for the formation of distinct cell clusters Expression patterns of the top housekeeping gene contributors to our PCA (ACTB, LDHA, NONO) were used for another PCA. (a) The principal components obtained from this analysis (PC1 and PC2) resulted in the formation of cell specific clusters. As controls, the expression patterns of LDHA, GPI, NONO (c), and SOX1, TXN, NCL (b) were also used for a separate PCA. The resulting plots (b) and (c) show that the clusters lose distinction in each of the two cases.
Figure 5.5: Plotting the expression patterns of the top three contributors to our PCA (ACTB, LDHA, NONO) results in the formation of cell specific clusters that can be visually distinguished from one another. Each cell is represented in three-dimensional space using the expression patterns of ACTB, LDHA, and NONO as X,Y and Z components respectively. This results in the formation of well defined cell specific clusters in three-dimensional space.
Table 5.1: Definitive endoderm and housekeeping gene sets used in single cell experiments
Gene Class    Number of genes  Gene Symbols
Endoderm      22               FOXA2, GATA4, APOA2, SMARCD3, NR0B1, PRSS2, S100A16, FOXQ1, SAMD11, PORCN, SMAD6, PREX1, REEP6, GATA6, GSC, CXCR4, SOX17, MID1IP1, NODAL, NFKBIA, FXYD6, CST3
Housekeeping  22               ACTB, CTSD, GAPDH, ALDOA, ALDOC, NDUFA7, CCND3, PGK1, NONO, LDHA, ARHGDIA, SAFB, CTSB, CDA, CANX, MSN, FBL, TXN, PRPH, NCL, CSK, GPI
Table 5.2: Ranking the housekeeping genes in order of decreasing significance to the principal component analysis
Rank | Gene Symbol
1 | ACTB
2 | NONO
3 | LDHA
4 | NCL
5 | TXN
6 | GPI
7 | CTSD
8 | CANX
9 | PGK1
10 | CTSB
11 | MSN
12 | NDUFA7
13 | SAFB
14 | ARHGDIA
15 | FBL
16 | ALDOC
17 | CCND3
18 | ALDOA
19 | CDA
20 | PRPH
21 | CSK
MATERIALS AND METHODS
hESC Maintenance
hESCs were maintained in mouse embryonic fibroblast (MEF) conditioned media on
10 cm tissue culture plates (BD Falcon) coated with Matrigel (R&D). The media
consisted of DMEM F-12 (GIBCO), 20% Knockout serum (GIBCO), non-essential
amino acids (GIBCO), 4 ng/ml basic fibroblast growth factor
(PeproTech), L-glutamine (GIBCO), and β-mercaptoethanol. The media was
conditioned by MEFs for 24 hours at 37 °C in 5% CO2. Cells were fed every 24
hours, and passaged every 4 to 5 days.
iPS Maintenance
iPS cells were maintained on mouse embryonic feeder layers in DMEM F-12
(GIBCO) supplemented with 20% knockout serum (GIBCO), non-essential
amino acids, 8 ng/ml basic fibroblast growth factor (PeproTech), L-glutamine,
and β-mercaptoethanol. Cell culture media was replaced daily, and cells were
passaged in a 1:3 ratio every 4 to 5 days.
hESC and iPS Cell Differentiation
hESCs were differentiated to definitive endoderm using the TGF-β signaling
molecule activin A. hESC media was aspirated and the cells were washed in
PBS (GIBCO) to remove any lingering traces of serum. Differentiation was
carried out in RPMI (GIBCO) containing 100ng/ml of activin A, and defined
FBS (Hyclone). The concentration of FBS in the solution was increased
stepwise during differentiation: 0% for the first 24 h, 0.2% for the next
24 h, and 2% for all subsequent days of differentiation.
iPS cells were differentiated in much the same way as hESCs using the TGF-β
signaling molecule activin A. Differentiation was carried out in RPMI containing
100ng/ml of activin A, and defined fetal bovine serum (Hyclone). The
concentration of fetal bovine serum in the solution was increased stepwise
during differentiation: 0% for the first 24 h, 0.2% for the next 24 h, and 2%
for all subsequent days of differentiation.
Tissue Culture: 293T
293T cells were cultured on 15 cm dishes in DMEM (GIBCO) supplemented
with 10% FBS (GIBCO) and 1% penicillin streptomycin. Media was replaced
every 2 days. Cells were harvested by trypsinization for subsequent lysis,
amplification, and Biomark experiments.
Tissue Culture: HepG2
HepG2 cells were cultured in T175 flasks (BD Falcon) in DMEM cell culture
media (GIBCO) supplemented with 10% FBS (GIBCO) and 1% penicillin
streptomycin (GIBCO). Media was replaced every 2 days and cells were
harvested via trypsinization for subsequent amplification and Biomark
experiments.
FACS
Definitive endoderm cells were washed with PBS to remove any traces of
serum, and harvested using 0.05% trypsin/EDTA (GIBCO). Cells were briefly
washed in PBS and then again in Stain Buffer (BD Pharmingen). Human serum
substitute (Irvine Scientific) was added to prevent non-specific binding.
Endoderm cells were stained using monoclonal phycoerythrin labeled
antibodies against CXCR4 (R&D) for 30 – 45 minutes. After staining, cells
were washed twice in BD stain buffer, and resuspended in PBS. Single
definitive endoderm cells were sorted into individual wells of low profile 96 well
plates (Thermo Scientific) containing 5 µl of CellsDirect 2x reaction mix
(Invitrogen) and SUPERase-In (Applied Biosystems) per well. The FACS
experiments were carried out at the Stanford FACS facility using BD FACS
Aria equipment. hESCs were sorted with the same protocol used to sort
definitive endoderm. However, in the case of hESCs, monoclonal anti-human
allophycocyanin labeled antibodies against SSEA4 (R&D) were used to isolate
single hESCs into each well of low profile 96 well plates.
Biomark 48.48 Experiments
Immediately after cell sorting, cells were lysed, mRNA was reverse transcribed
(at 50 °C for 15 minutes), and the resulting cDNA was amplified using TaqMan
primers (Applied Biosystems) specific to our selected gene set. The product
of this pre-amplification step was diluted in a 1:4 ratio with TE
buffer (IDT). The diluted cDNA product was combined with Fluidigm’s sample
loading reagent, developed specifically for multiplexed quantitative real time
PCR (qRT-PCR) using the Biomark 48.48™ system. The TaqMan assay for
each analyzed gene was mixed with Fluidigm’s assay loading reagent in a 1:1
ratio in preparation for the qRT-PCR experiment. 5 µl of each cDNA sample
mixture and 5 µl of each TaqMan assay mixture were distributed into the
appropriate wells on the 48.48 microfluidic chip. The chip was then primed and
loaded using the Biomark NanoFlex integrated fluidic chip controller, and
inserted into the Biomark machine for multiplexed qRT-PCR.
Principal Component Analysis
Principal components analysis (PCA) is a linear dimensionality reduction
method that is widely used in population genetic studies. The technique seeks
to identify a small number of components that together account for most of the
variation in the data. Given an m x n matrix X of data for m cells at n loci, it is
common practice to perform PCA on the covariance matrix, estimated from the
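data as follows:
$$C = \frac{1}{n}\,(X - \mu_X \mathbf{1}^{T})(X - \mu_X \mathbf{1}^{T})^{T}$$
where µX is the m x 1 vector of average expression for each individual cell over all
genes, and 1 is the n x 1 vector of ones, so that C is the m x m covariance matrix between cells.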
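As an illustration only, the cell-by-cell covariance and its principal components can be sketched in Python with NumPy. This is not the code used in this study: the function names `cell_covariance` and `principal_components` are hypothetical, and the 1/n normalization is an assumption.

```python
import numpy as np

def cell_covariance(X):
    """Covariance between cells for an m x n matrix X (m cells, n genes).

    Each cell (row) is centered on its own mean over all genes, and the
    products are averaged over the n genes (1/n normalization, assumed).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=1, keepdims=True)
    return centered @ centered.T / X.shape[1]

def principal_components(C):
    """Eigen-decompose a symmetric covariance matrix.

    Returns the eigenvalues in decreasing order and the matching
    eigenvectors as columns; column i holds one coordinate per cell
    along principal component i + 1.
    """
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    return eigvals[order], eigvecs[:, order]
```

Plotting the first two columns of the returned eigenvector matrix against each other would give a PC1 versus PC2 view of the cells.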
Sample preparation
PCA can be highly influenced by outlier individuals. Thus, we first opted to
screen out cells with possibly aberrant expression profiles at one or more
housekeeping genes used in the study. To this end, we examined the
distribution of expressions over all cells at each gene; we then systematically
excluded cells whose expression level at any one gene deviated from the
average expression at that locus by more than 10 standard deviations. This
process resulted in the exclusion of 3 cells, bringing the total sample size in
this analysis to 189.
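The screening rule described above can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the procedure as actually implemented: the function name `screen_outlier_cells` is hypothetical, missing values are assumed to be encoded as NaN, and the sample standard deviation (ddof=1) is assumed.

```python
import numpy as np

def screen_outlier_cells(expr, max_sd=10.0):
    """Drop cells whose expression at any gene is an extreme outlier.

    expr is a cells x genes matrix. A cell is excluded if, at any gene,
    its value lies more than max_sd standard deviations from that gene's
    mean across all cells. NaN entries are ignored when computing the
    per-gene statistics and never count as outliers.
    Returns the filtered matrix and the boolean keep mask.
    """
    expr = np.asarray(expr, dtype=float)
    gene_mean = np.nanmean(expr, axis=0)
    gene_sd = np.nanstd(expr, axis=0, ddof=1)
    # A constant gene has sd == 0; use inf so its z-scores become 0.
    safe_sd = np.where(gene_sd > 0, gene_sd, np.inf)
    z = np.abs(expr - gene_mean) / safe_sd
    z = np.where(np.isnan(z), 0.0, z)
    keep = ~(z > max_sd).any(axis=1)
    return expr[keep], keep
```

The returned mask makes it easy to report how many cells were excluded, for example with `(~keep).sum()`.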
Treatment of missing data
Of the 189 remaining cells, only 6 were found to be missing expression data at
one or more genes. Of these, 5 had missing information for only one of the 24
housekeeping genes; the remaining cell had missing information at two loci.
To compute the covariance matrix as described above, we first set the
expression of the missing genes in these cells to 0. To correct for biases that
may result from these missing data, we then normalized the entries of the
covariance matrix by the number of non-missing genes used to estimate the
covariance in expression between every pair of cells. In other words, for each
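entry we now have:
$$c_{ij} = \frac{1}{n_{ij}} \sum_{k=1}^{n} (x_{ik} - \mu_i)(x_{jk} - \mu_j)$$
where nij is the number of genes that are non-missing in both cells i and j, xik is the
expression of gene k in cell i, and µi is the average expression of cell i over all genes.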
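One way to realize this normalization is sketched below in Python with NumPy. It is an illustration under assumptions rather than the implementation used here: missing values are assumed to be encoded as NaN, each cell is centered on the mean of its non-missing genes, and the zero is applied to the centered value so that a missing gene contributes nothing to the sum. The function name `pairwise_covariance` is hypothetical.

```python
import numpy as np

def pairwise_covariance(expr):
    """Cell-by-cell covariance normalized by shared non-missing genes.

    expr is a cells x genes matrix with NaN marking missing values.
    Returns the covariance matrix and the matrix of counts n_ij, the
    number of genes observed in both cell i and cell j.
    """
    expr = np.asarray(expr, dtype=float)
    observed = ~np.isnan(expr)
    cell_mean = np.nanmean(expr, axis=1, keepdims=True)
    # Missing entries are set to 0 after centering (assumption).
    centered = np.where(observed, expr - cell_mean, 0.0)
    obs = observed.astype(float)
    n_ij = obs @ obs.T
    with np.errstate(invalid="ignore", divide="ignore"):
        cov = (centered @ centered.T) / n_ij
    return cov, n_ij
```

For a pair of cells with complete data, n_ij equals the total number of genes and the entry reduces to the ordinary 1/n covariance.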
Identification of PCA-correlated genes
To identify a subset of genes that best explains the variation in the data, we
adopted a method described in the context of genome-wide human genetic
studies (Paschou, Ziv et al. 2007). This technique aims to select a small set of
SNPs that best capture the intricate genetic relationships between human
populations, and is readily applicable to our dataset. Briefly, their algorithm
determines the number of significant principal components derived from the
data. These principal components are then used to compute an importance
score for each locus; the markers with the highest scores are those with
highest correlation to the PCA. We now describe how these steps were
applied to the single cell data in this study.
Identifying significant principal components
Estimating the number of significant PCs is an area of active research in
Random Matrix Theory. The original paper from Paschou et al. suggests
comparing the structure of the matrix corresponding to each PC and all
smaller ones to that of a random matrix constructed from the same entries. A
cutoff is then specified, and principal components that exhibit more structure
than the resulting random ones are retained as significant. While this method
enjoys the advantage of being computationally fast and straightforward to
implement, it tends to overestimate the number of significant PCs.
Another approach draws from the observation that, for a suitably normalized m
x n rectangular matrix, the eigenvalues of the PCA are approximately Tracy-
Widom distributed for large m and n (Johnstone 2001; Patterson, Price et al.
2006). Quantiles from the Tracy-Widom distribution have been computed in a
number of studies and are readily available (Matlab), and thus one could in
theory use the distribution to compute p-values for all of our PCs. In practice
however, the fact that our study focuses on a very small number of genes (24
loci) invalidates the assumption that the Tracy-Widom approximation would
hold in this case.
To circumvent these limitations, we opted instead to retain the first few
principal components that together explain a certain arbitrary proportion of the
variance in the data. We find that the first 4 principal components account for
99% of the variance in the data. This observation was corroborated by plots of
PC4 versus PC5 and PC5 versus PC6, which together did not appear to
capture any of the structure in the data (not shown). Thus, we elected to use
the first 4 principal components to obtain the importance scores for our
housekeeping genes.
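The variance-based cutoff can be illustrated with a short Python/NumPy sketch. The function name `components_for_variance` and the default threshold are assumptions made for the example; the threshold itself is, as noted above, arbitrary.

```python
import numpy as np

def components_for_variance(eigenvalues, target=0.99):
    """Smallest k whose leading k eigenvalues explain at least `target`
    of the total variance. Also returns the cumulative fractions."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    vals = np.clip(vals, 0.0, None)        # ignore tiny negative round-off
    cumulative = np.cumsum(vals) / vals.sum()
    # First position where the cumulative fraction reaches the target.
    k = int(np.searchsorted(cumulative, target, side="left")) + 1
    return min(k, vals.size), cumulative
```

Inspecting the returned cumulative fractions alongside k shows how sensitive the choice is to the threshold.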
Computation of importance scores
The singular value decomposition theorem states that any rectangular m x n
matrix can be decomposed into a factorization of the form:
$$X = U D V^{T}$$
where D is the diagonal matrix of singular values. Hence, in vector notation, the data matrix can be written as the sum:
$$X = \sum_{i} d_i\, u_i v_i^{T}$$
where di is the ith singular value, and ui and vi are the ith columns of matrices U
and V, respectively. Paschou et al. argue then that the SNPs that have the
largest effects on the PCs should have large coefficients vi; they therefore
propose the importance score (Paschou, Ziv et al. 2007):
$$p_j = \sum_{i=1}^{k} \left(v_i^{j}\right)^{2}$$
where v_i^j is the jth entry of vi, and k is now the number of significant PCs retained for the analysis (in our
case, k=4).
To determine which subset of genes has the greatest influence on our PCA,
we proceeded in two steps. We first removed all 6 cells with missing data and
used the statistical package R to obtain the singular value decomposition of
the data matrix. Finally, using the right singular matrix, we used the above
equation to compute importance scores for all 24 housekeeping genes.
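For illustration, the score computation can be sketched in Python with NumPy in place of R. This is a sketch under assumptions, not the analysis script used in this study: the function names `importance_scores` and `rank_genes` are hypothetical, X is assumed to be the cells x genes matrix with incomplete cells already removed, and no further centering or scaling is applied.

```python
import numpy as np

def importance_scores(X, k):
    """Score each gene (column of X) by the sum of its squared
    coefficients in the first k right singular vectors."""
    X = np.asarray(X, dtype=float)
    # Rows of Vt are the right singular vectors, ordered by
    # decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return np.sum(Vt[:k, :] ** 2, axis=0)

def rank_genes(X, gene_names, k):
    """Return (gene, score) pairs from highest to lowest score."""
    scores = importance_scores(X, k)
    order = np.argsort(scores)[::-1]
    return [(gene_names[i], float(scores[i])) for i in order]
```

Because the right singular vectors are orthonormal, the scores sum to k, so each score can be read as a gene's share of the retained components.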
As mankind continues to unravel the mysteries of mammalian
development, it is becoming increasingly apparent that studying the
mechanics involved in the control of gene expression on both the multi-cellular
and unicellular levels is of utmost importance. In recent times, tools with which
to extensively study the dynamics of gene expression have been developed.
Technological advances such as the DNA microarray and second generation
sequencing instruments - the Illumina Genome Analyzer, the HeliScope single
molecule sequencer, and Life Technologies’ SOLiD 4 - have provided a
means to gauge the expression patterns of any cell type, organ or tissue with
unprecedented accuracy and depth. These new technologies have spawned
an era in which expression profiling of different cell groups, whole transcript
sequencing, the study of transcription factor occupancy, and even whole
genome sequencing are now possible (Jackson, Bartz et al. 2003; Johnson,
Mortazavi et al. 2007; Pushkarev, Neff et al. 2009). While the above
mentioned instruments are immensely powerful, they are mostly limited to
addressing scientific questions on a multi-cellular scale.
On a smaller but equally significant scale, methods have been
established to examine gene expression patterns within single cells (Levsky,
Shenoy et al. 2002; Wheeler, Throndset et al. 2003;
Spurgeon, Jones et al. 2008). These efforts have
culminated in robust systems such as the Biomark™, which can readily assay
the expression of as many as 48 genes within 48 single cells. These single
cell gene expression assay platforms answer many of the same questions as
the second generation sequencing platforms, but on a higher level of
resolution. With these new single cell ready platforms, it has now become
possible to examine questions concerning the genetic similarity and
differences between single cells of the same type, or between cells of different
types (Guo, Huss et al. 2010).
The marriage of second generation sequencing and single cell analysis
is an extremely important avenue for the study of development. With
microarrays and second generation sequencing instruments, it is possible to
assess the average global measure of gene expression within a group of cells
as they progress from naivety to a more determined state (Sherwood,
Jitianu et al. 2007). On the other hand, single cell gene
expression instruments can be used to examine each cell within the transient
cell populations formed as naïve cell groups mature (Guo, Huss et al. 2010).
Using these two powerful technologies in tandem, we can now unequivocally
determine if the average expression pattern found on the multi-cellular scale is
representative of each individual cell, or rather, just an average measure of the
gene expression patterns seen on the unicellular level.
Answering questions pertaining to cellular diversity within a given cell
group is key to the budding disciplines of regenerative medicine and tissue
engineering. In these related fields, it is important to obtain pure populations of
therapeutically relevant cell types for implantation into an afflicted individual.
To ensure safety of the recipient/patient in this case, the identity of the cells
being implanted for therapeutic reasons must be unequivocally determined.
With the burgeoning toolkit of biotech instruments currently available, it is
becoming possible to determine the identity of the members of any group of
cells. This obviously bodes well for the field of cell replacement therapy, where
replacing specific diseased cells to restore a particular function within a
diseased patient is the ultimate goal.
The efforts in this thesis have been mainly directed towards using
second generation sequencing instruments to understand the epigenetic
changes that occur as hESCs become more determined, and elucidating the
gene expression patterns in cells that have been derived from hESCs on a
single cell level. This powerful combination of genetic analysis on the multi-
cellular and the unicellular level is what I hope is the beginning of larger scale
efforts to characterize therapeutically relevant cell types obtained from hESCs,
to ensure the safety of potential cell replacement therapy patients in the future.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%   ChIPvect GUI PROGRAM CODE   %
%   Created By Chuba B. Oyolu   %
%   Date: 07/29/2008            %
%   Last Modified: 06/21/2009   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function varargout = chipvect_gui(varargin)
%CHIPVECT_GUI M-file for chipvect_gui.fig
%CHIPVECT_GUI, by itself, creates a new CHIPVECT_GUI or raises the
%existing singleton*.
%H = CHIPVECT_GUI returns the handle to a new CHIPVECT_GUI or the
%handle to the existing singleton*.
%CHIPVECT_GUI('CALLBACK',hObject,eventData,handles,...) calls the local
%function named CALLBACK in CHIPVECT_GUI.M with the given input
%arguments.
%CHIPVECT_GUI('Property','Value',...) creates a new CHIPVECT_GUI or
%raises the existing singleton*. Starting from the left, property value
%pairs are applied to the GUI before chipvect_gui_OpeningFunction gets
%called. An unrecognized property name or invalid value makes property
%application stop. All inputs are passed to chipvect_gui_OpeningFcn via
%varargin.
%*See GUI Options on GUIDE's Tools menu. Choose "GUI allows only one
%instance to run (singleton)".
%See also: GUIDE, GUIDATA, GUIHANDLES
%Edit the above text to modify the response to help chipvect_gui
%Last Modified by GUIDE v2.5 05-Feb-2009 09:39:40
%****************Begin initialization code - DO NOT EDIT***********%
gui_Singleton = 1;
gui_State = struct('gui_Name', mfilename, ...
                   'gui_Singleton', gui_Singleton, ...
                   'gui_OpeningFcn', @chipvect_gui_OpeningFcn, ...
                   'gui_OutputFcn', @chipvect_gui_OutputFcn, ...
                   'gui_LayoutFcn', [] , ...
                   'gui_Callback', []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
%**************End initialization code - DO NOT EDIT***************%
%Executes just before chipvect_gui is made visible.
function chipvect_gui_OpeningFcn(hObject, eventdata, handles, varargin)
%This function has no output args, see OutputFcn.
%hObject - handle to figure
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
%varargin - command line arguments to chipvect_gui (see VARARGIN)
%Import the file from Microsoft Excel...
filename1 = input('Please Enter Filename: ','s');
chipmat = xlsread(['/Applications/MATLAB_SV74/' filename1 '.xls']);
%User Input for axis labels...
labelx = input('Please Label X axis: ','s');
labely = input('Please Label Y axis: ','s');
labelz = input('Please Label Z axis: ','s');
filename2 = input('Please Enter Filename For Chromposition: ','s');
raw_data = dlmread(['/Applications/MATLAB_SV74/' filename2 '.txt']);
%Ensure that the file is the right size...
%(isequal compares both dimensions; an elementwise ~= inside "if" only
%triggers when every dimension differs)
if ~isequal(size(chipmat), [12 3])
    error('INVALID MATRIX SIZE... MATRIX DIMENSIONS MUST BE 12 rows X 3 columns');
end
zero_vect = [0 0 0];
%Data for Surface Plot...
handles.surf = chipmat;
%Data for Vector Arrow Plot...
handles.vectarrow = chipmat;
%Data for Vector generated surface...
handles.vectgen = chipmat;
%Data for Chromposition...
handles.chromposition = raw_data;
%Data for Chrompeaks...
handles.chrompeaks = raw_data;
%Label for the x,y, and z axes
handles.labelx = labelx;
handles.labely = labely;
handles.labelz = labelz;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Default is to start with the surface plot %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
handles.current_data = handles.surf;
surf(handles.current_data);
title('Matrix Generated 3D-Surface Plot');
shading interp;
%Choose default command line output for chipvect_gui
handles.output = hObject;
%Update handles structure
guidata(hObject, handles);
%UIWAIT makes chipvect_gui wait for user response (see UIRESUME)
%uiwait(handles.figure1);

%Outputs from this function are returned to the command line.
function varargout = chipvect_gui_OutputFcn(hObject, eventdata, handles)
%varargout cell array for returning output args (see VARARGOUT);
%hObject handle to figure
%eventdata reserved - to be defined in a future version of MATLAB
%handles structure with handles and user data (see GUIDATA)
%Get default command line output from handles structure
varargout{1} = handles.output;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Button press in pushbutton1 (Surface Plot Button) %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pushbutton1_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton1 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
surf(handles.current_data);
title('Matrix Generated 3D-Surface Plot');
shading interp;
rotate3d off

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Button press in pushbutton2 (Vector Plot Button) %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pushbutton2_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton2 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
j = 1;
zero_vect = [0 0 0];
for j = 1:length(handles.current_data);
    if j > length(handles.current_data);
        break;
    else
        vectarrow(zero_vect,handles.current_data(j,:));
        hold on;
        j = j + 1;
        xlabel(handles.labelx);
        ylabel(handles.labely);
        zlabel(handles.labelz);
    end
end
hold off;
rotate3d off
title('Matrix Generated 3D-Vector Plot');

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Check in Checkbox Right Below Vector Plot (Annotate Vector Plot)%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function checkbox1_Callback(hObject,eventdata,handles)
%Define Vector that contains the names of all genes being considered
genevect = {'Nanog';'Oct4';'Sox2';'klf4';'E2f1';'Esrrb';'CTCF'; ...
            'Mycn';'Myc';'Smad1';'STAT3';'Tcfcp2I1';'Zfx';'Gene 14'};
nullvect = {'';'';'';'';'';'';'';'';'';'';'';'';'';''};
checkboxStatus = get(handles.checkbox1,'Value');
k = 1;
if checkboxStatus == 1;
    for k = 1:length(handles.current_data);
        text(handles.current_data(k,1),handles.current_data(k,2),handles.current_data(k,3),genevect(k));
        hold on;
        k = k + 1;
    end
end
hold off;
if checkboxStatus == 0;
    for k = 1:length(handles.current_data);
        text(handles.current_data(k,1),handles.current_data(k,2),handles.current_data(k,3),nullvect(k));
        hold on;
        k = k + 1;
    end
end
hold off;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Button press in pushbutton3 (Vector Generated Surface Button) %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pushbutton3_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton3 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
x = handles.current_data(:,1);
y = handles.current_data(:,2);
z = handles.current_data(:,3);
tri = delaunay(x,y);
h = trisurf(tri,x,y,z);
shading interp;
lighting phong;
xlabel(handles.labelx);
ylabel(handles.labely);
zlabel(handles.labelz);
rotate3d off
title('Vector Generated 3D-Surface');

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Button press in pushbutton4 (Enable 3D Plot Rotation Button) %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pushbutton4_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton4 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
rotate3d on

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Button press in pushbutton6 (Chromposition) %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pushbutton6_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton6 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
cursor_mat = handles.chromposition;
handles.chromposition(:,2) = (handles.chromposition(:,2)/max(handles.chromposition(:,2)))*10000;
handles.chromposition(:,2) = round(handles.chromposition(:,2));
%Generating the Points on the circle...
NOP = 23;
radius_circ = max(handles.chromposition(:,2));
center = [0,0,10];
style = '.';
global radius_circ;
THETA=linspace(0,2*pi,NOP);
RHO=ones(1,NOP)*radius_circ;
[X,Y] = pol2cart(THETA,RHO);
X=X+center(1);
Y=Y+center(2);
Z = center(3)*ones(1,length(X));
H=plot3(X,Y,Z,style);
xlabel('x coordinate');
ylabel('y coordinate');
zlabel('Number Of Reads');
axis square;
grid
%Creating the spokes of the bicycle wheel...
chuba = [X,Y];
emeka = [chuba(:,1:23);chuba(:,24:46)];
coord_mat = emeka';
%One dashed spoke from the center to each of the 23 points on the circle
for s = 1:23
    line([0 coord_mat(s,1)],[0 coord_mat(s,2)],[10 10],'Marker','.','LineStyle','--');
end
hold on;
%Labelling the Chromosomes...
for s = 1:22
    text(coord_mat(s,1),coord_mat(s,2),10,sprintf('Chr%d',s));
end
%Obtaining all line coordinates...
%(the coordinates of spokes 1 to 23 are stacked in order into pos_mat)
pos_mat = [];
for s = 1:23
    [a,b] = conect(0,coord_mat(s,1),0,coord_mat(s,2));
    pos_mat = [pos_mat; [a;b]'];
end
%Get coordinates for each data point...
for m = 1:length(handles.chromposition);
    coord_index(m) = (radius_circ * (handles.chromposition(m,1)-1)) + handles.chromposition(m,2);
    handles.chromposition(m,4) = pos_mat(coord_index(m),1);
    handles.chromposition(m,5) = pos_mat(coord_index(m),2);
    m = m + 1;
end
%Render the 3Dimensional Form...
x = handles.chromposition(:,4);
y = handles.chromposition(:,5);
z = handles.chromposition(:,3);
tri = delaunay(x,y);
h = trisurf(tri,x,y,z);
title ('Surface Plot: Topographical Display of Enrichment');
shading interp;
lighting phong;
cursor_mat(:,4) = handles.chromposition(:,4);
cursor_mat(:,5) = handles.chromposition(:,5);
global cursor_mat;
hold off
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Button press in pushbutton8... (Zooming in) %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pushbutton8_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton8 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
zoom on;

%Executes when figure1 is resized.
function figure1_ResizeFcn(hObject, eventdata, handles)
%hObject - handle to figure1 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Button press in pushbutton9... (Zooming out) %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pushbutton9_Callback(hObject, eventdata, handles)
zoom out;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Custom Cursor for Chromposition %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pushbutton11_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton11 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
global cursor_mat;
dcm_obj = datacursormode;
set(dcm_obj,'UpdateFcn',@myupdatefcn);

%Executes on selection change in popupmenu4.
function popupmenu4_Callback(hObject, eventdata, handles)
%hObject - handle to popupmenu4 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
%Hints: contents = get(hObject,'String') returns popupmenu4 contents as
%cell array contents{get(hObject,'Value')} returns selected item from
%popupmenu4
str = get(hObject, 'String');
val = get(hObject,'Value');
switch str{val};
    case 'All Chromosomes' % User selects All Chromosomes
        handles.chrompeaks = handles.chromposition;
    case 'Chromosome 1' % User selects Chromosome 1.
        handles.chrompeaks = handles.chrom1;
    case 'Chromosome 2' % User selects Chromosome 2.
        handles.chrompeaks = handles.chrom2;
end
%Save the handles structure...
guidata(hObject,handles)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Chrompeaks Pulldown Menu %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function popupmenu4_CreateFcn(hObject, eventdata, handles)
%hObject - handle to popupmenu4 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - empty - handles not created until after all CreateFcns
%called
%Hint: popupmenu controls usually have a white background on Windows.
%See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Custom Cursor for Chrompeaks %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pushbutton12_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton12 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
global cursor_mat2
dcm_obj = datacursormode;
set(dcm_obj,'UpdateFcn',@updatefcn);

%%%%%%%%%%%%%%%%%%%%%%%%%%
% Chrompeaks Push Button %
%%%%%%%%%%%%%%%%%%%%%%%%%%
function pushbutton13_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton13 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
cursor_mat2 = handles.chrompeaks;
cursor_mat2(:,4) = zeros(length(cursor_mat2),1);
raw_data = handles.chrompeaks;
global cursor_mat2;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Matrix for chromosome #1 %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Potential Dummy Variable
one = 0;
for i = 1:length(raw_data)-1;
    if raw_data(i,1) == 1;
        mat_one(i,:) = raw_data(i,:);
    elseif raw_data(1,1) ~= 1;
        mat_one = zeros(1,3);
        one = -1;
        break
    end
    i = i + 1;
end
if mat_one ~= 0;
    mat_one = sortrows(mat_one,2);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Matrix for chromosome #2 %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Potential Dummy Variable
two = 0;
length_one = length(mat_one)+one+1;
%Create matrix for chromosome #2
for i = length_one:length(raw_data)-1;
    if raw_data(i,1) == 2;
        mat_two((i - (length_one - 1)),:) = raw_data(i,:);
    elseif raw_data(length_one,1) ~= 2;
        mat_two = zeros(1,3);
        two = -1;
        break
    end
    i = i + 1;
end
if mat_two ~= 0;
    mat_two = sortrows(mat_two,2);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Matrix for chromosome #3 %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Potential Dummy Variable
three = 0;
%Calculating length of all 2 matrices...
length_two = length(mat_one)+one+length(mat_two)+two+1;
%Create matrix for chromosome #3 for i = length_two:length(raw_data)-1; if raw_data(i,1) == 3; mat_three((i - (length_two - 1)),:) = raw_data(i,:); elseif raw_data(length_two,1) ~= 3; mat_three = zeros(1,3); three = -1; break end i = i + 1; end if mat_three ~= 0; mat_three = sortrows(mat_three,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #4 % %%%%%%%%%%%%%%%%%%%%%%%%%%%% %Potential Dummy Variable four = 0; %Calculating length of all 3 matrices... length_three = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+1; %Create matrix for chromosome #4 for i = length_three:length(raw_data)-1; if raw_data(i,1) == 4; mat_four((i - (length_three - 1)),:) = raw_data(i,:); elseif raw_data(length_three,1) ~= 4; mat_four = zeros(1,3); four = -1; break end i = i + 1; end if mat_four ~= 0; mat_four = sortrows(mat_four,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #5 % %%%%%%%%%%%%%%%%%%%%%%%%%%%% %Potential Dummy Variable five = 0; %Calculating length of all 4 matrices... length_four = length(mat_one)+one+length(mat_two)+two+length(mat_three)...
+three+length(mat_four)+four+1; %Create matrix for chromosome #5 for i = length_four:length(raw_data)-1; if raw_data(i,1) == 5; mat_five((i - (length_four - 1)),:) = raw_data(i,:); elseif raw_data(length_four,1) ~= 5; mat_five = zeros(1,3); five = -1; break end i = i + 1; end if mat_five ~= 0; mat_five = sortrows(mat_five,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #6 % %%%%%%%%%%%%%%%%%%%%%%%%%%%% %Potential Dummy Variable six = 0; %Calculating length of all 5 matrices... length_five = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+1; %Potential dummy variable... six = 0; %Create matrix for chromosome #6 for i = length_five:length(raw_data)-1; if raw_data(i,1) == 6; mat_six((i - (length_five - 1)),:) = raw_data(i,:); elseif raw_data(length_five,1) ~= 6; mat_six = zeros(1,3); six = -1; break end i = i + 1; end if mat_six ~= 0; mat_six = sortrows(mat_six,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #7 % %%%%%%%%%%%%%%%%%%%%%%%%%%%% %Potential Dummy variable... seven = 0;
%Calculating length of all 6 matrices... length_six = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+1; % Create matrix for chromosome #7 for i = length_six:length(raw_data)-1; if raw_data(i,1) == 7; mat_seven((i - (length_six - 1)),:) = raw_data(i,:); elseif raw_data(length_six,1) ~= 7; mat_seven = zeros(1,3); seven = -1; break end i = i + 1; end if mat_seven ~= 0; mat_seven = sortrows(mat_seven,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #8 % %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Potential Dummy Variable eight = 0; % Calculating length of all 7 matrices... length_seven = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)+six... +length(mat_seven)+seven+1; % Create matrix for chromosome #8 for i = length_seven:length(raw_data)-1; if raw_data(i,1) == 8; mat_eight((i - (length_seven - 1)),:) = raw_data(i,:); elseif raw_data(length_seven,1) ~= 8; mat_eight = zeros(1,3); eight = -1; break end i = i + 1; end if mat_eight ~= 0; mat_eight = sortrows(mat_eight,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #9 % %%%%%%%%%%%%%%%%%%%%%%%%%%%% % Potential Dummy variable
nine = 0; % Calculating length of all 8 matrices... length_eight = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)+six... +length(mat_seven)+seven+length(mat_eight)+eight+1; % Create matrix for chromosome #9 for i = length_eight:length(raw_data)-1; if raw_data(i,1) == 9; mat_nine((i - (length_eight - 1)),:) = raw_data(i,:); elseif raw_data(length_eight,1) ~= 9; mat_nine = zeros(1,3); nine = -1; break end i = i + 1; end if mat_nine ~= 0; mat_nine = sortrows(mat_nine,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #10 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% ten = 0; % Calculating length of all 9 matrices... length_nine = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+1; % Create matrix for chromosome #10 for i = length_nine:length(raw_data)-1; if raw_data(i,1) == 10; mat_ten((i - (length_nine - 1)),:) = raw_data(i,:); elseif raw_data(length_nine,1) ~= 10; mat_ten = zeros(1,3); ten = -1; break end i = i + 1; end if mat_ten ~= 0; mat_ten = sortrows(mat_ten,2); end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #11 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% oneone = 0; % Calculating length of all 10 matrices... length_ten = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+length(mat_ten)+ten+1; % Create matrix for chromosome #11 for i = length_ten:length(raw_data)-1; if raw_data(i,1) == 11; mat_oneone((i - (length_ten - 1)),:) = raw_data(i,:); elseif raw_data(length_ten,1) ~= 11; mat_oneone = zeros(1,3); oneone = -1; break end i = i + 1; end if mat_oneone ~= 0; mat_oneone = sortrows(mat_oneone,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #12 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% onetwo = 0; % Calculating length of all 11 matrices... length_oneone = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+length(mat_ten)+ten+length(mat_oneone)+oneone+1; % Create matrix for chromosome #12 for i = length_oneone:length(raw_data)-1; if raw_data(i,1) == 12; mat_onetwo((i - (length_oneone - 1)),:) = raw_data(i,:); elseif raw_data(length_oneone,1) ~= 12; mat_onetwo = zeros(1,3); onetwo = -1; break
end i = i + 1; end if mat_onetwo ~= 0; mat_onetwo = sortrows(mat_onetwo,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #13 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% onethree = 0; % Calculating length of all 12 matrices... length_onetwo = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+length(mat_ten)+ten+length(mat_oneone)+oneone+length(mat_onetwo)... +onetwo+1; % Create matrix for chromosome #13 for i = length_onetwo:length(raw_data)-1; if raw_data(i,1) == 13; mat_onethree((i - (length_onetwo - 1)),:) = raw_data(i,:); elseif raw_data(length_onetwo,1) ~= 13; mat_onethree = zeros(1,3); onethree = -1; break end i = i + 1; end if mat_onethree ~= 0; mat_onethree = sortrows(mat_onethree,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #14 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% onefour = 0; % Calculating length of all 13 matrices... length_onethree = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)...
+nine+length(mat_ten)+ten+length(mat_oneone)+oneone+length(mat_onetwo)... +onetwo+length(mat_onethree)+onethree+1; % Create matrix for chromosome #14 for i = length_onethree:length(raw_data)-1; if raw_data(i,1) == 14; mat_onefour((i - (length_onethree - 1)),:) = raw_data(i,:); elseif raw_data(length_onethree,1) ~= 14; mat_onefour = zeros(1,3); onefour = -1; break end i = i + 1; end if mat_onefour ~= 0; mat_onefour = sortrows(mat_onefour,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #15 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% onefive = 0; % Calculating length of all 14 matrices... length_onefour = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+length(mat_ten)+ten+length(mat_oneone)+oneone+length(mat_onetwo)... +onetwo+length(mat_onethree)+onethree+length(mat_onefour)+onefour+1; % Create matrix for chromosome #15 for i = length_onefour:length(raw_data)-1; if raw_data(i,1) == 15; mat_onefive((i - (length_onefour - 1)),:) = raw_data(i,:); elseif raw_data(length_onefour,1) ~= 15; mat_onefive = zeros(1,3); onefive = -1; break end i = i + 1; end if mat_onefive ~= 0; mat_onefive = sortrows(mat_onefive,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Matrix for chromosome #16 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% onesix = 0; % Calculating length of all 15 matrices... length_onefive = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+length(mat_ten)+ten+length(mat_oneone)+oneone+length(mat_onetwo)... +onetwo+length(mat_onethree)+onethree+length(mat_onefour)+onefour... +length(mat_onefive)+onefive+1; % Create matrix for chromosome #16 for i = length_onefive:length(raw_data)-1; if raw_data(i,1) == 16; mat_onesix((i - (length_onefive - 1)),:) = raw_data(i,:); elseif raw_data(length_onefive,1) ~= 16; mat_onesix = zeros(1,3); onesix = -1; break end i = i + 1; end if mat_onesix ~= 0; mat_onesix = sortrows(mat_onesix,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #17 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% oneseven = 0; % Calculating length of all 16 matrices... length_onesix = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+length(mat_ten)+ten+length(mat_oneone)+oneone+length(mat_onetwo)... +onetwo+length(mat_onethree)+onethree+length(mat_onefour)+onefour... +length(mat_onefive)+onefive+length(mat_onesix)+onesix+1; % Create matrix for chromosome #17
for i = length_onesix:length(raw_data)-1; if raw_data(i,1) == 17; mat_oneseven((i - (length_onesix - 1)),:) = raw_data(i,:); elseif raw_data(length_onesix,1) ~= 17; mat_oneseven = zeros(1,3); oneseven = -1; break end i = i + 1; end if mat_oneseven ~= 0; mat_oneseven = sortrows(mat_oneseven,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #18 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% oneeight = 0; % Calculating length of all 17 matrices... length_oneseven = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+length(mat_ten)+ten+length(mat_oneone)+oneone+length(mat_onetwo)... +onetwo+length(mat_onethree)+onethree+length(mat_onefour)+onefour... +length(mat_onefive)+onefive+length(mat_onesix)+onesix+length(mat_oneseven)... +oneseven+1; % Create matrix for chromosome #18 for i = length_oneseven:length(raw_data)-1; if raw_data(i,1) == 18; mat_oneeight((i - (length_oneseven - 1)),:) = raw_data(i,:); elseif raw_data(length_oneseven,1) ~= 18; mat_oneeight = zeros(1,3); oneeight = -1; break end i = i + 1; end if mat_oneeight ~= 0; mat_oneeight = sortrows(mat_oneeight,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #19 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
onenine = 0; % Calculating length of all 18 matrices... length_oneeight = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+length(mat_ten)+ten+length(mat_oneone)+oneone+length(mat_onetwo)... +onetwo+length(mat_onethree)+onethree+length(mat_onefour)+onefour... +length(mat_onefive)+onefive+length(mat_onesix)+onesix+length(mat_oneseven)... +oneseven+length(mat_oneeight)+oneeight+1; % Create matrix for chromosome #19 for i = length_oneeight:length(raw_data)-1; if raw_data(i,1) == 19; mat_onenine((i - (length_oneeight - 1)),:) = raw_data(i,:); elseif raw_data(length_oneeight,1) ~= 19; mat_onenine = zeros(1,3); onenine = -1; break end i = i + 1; end if mat_onenine ~= 0; mat_onenine = sortrows(mat_onenine,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #20 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% twozero = 0; % Calculating length of all 19 matrices... length_onenine = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+length(mat_ten)+ten+length(mat_oneone)+oneone+length(mat_onetwo)... +onetwo+length(mat_onethree)+onethree+length(mat_onefour)+onefour... +length(mat_onefive)+onefive+length(mat_onesix)+onesix... +length(mat_oneseven)+oneseven+length(mat_oneeight)+oneeight...
+length(mat_onenine)+onenine+1; % Create matrix for chromosome #20 for i = length_onenine:length(raw_data)-1; if raw_data(i,1) == 20; mat_twozero((i - (length_onenine - 1)),:) = raw_data(i,:); elseif raw_data(length_onenine,1) ~= 20; mat_twozero = zeros(1,3); twozero = -1; break end i = i + 1; end if mat_twozero ~= 0; mat_twozero = sortrows(mat_twozero,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Matrix for chromosome #21 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% twoone = 0; % Calculating length of all 20 matrices... length_twozero = length(mat_one)+one+length(mat_two)+two+length(mat_three)... +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)... +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)... +nine+length(mat_ten)+ten+length(mat_oneone)+oneone+length(mat_onetwo)... +onetwo+length(mat_onethree)+onethree+length(mat_onefour)+onefour... +length(mat_onefive)+onefive+length(mat_onesix)+onesix... +length(mat_oneseven)+oneseven+length(mat_oneeight)+oneeight... +length(mat_onenine)+onenine+length(mat_twozero)+twozero+1; % Create matrix for chromosome #21 for i = length_twozero:length(raw_data)-1; if raw_data(i,1) == 21; mat_twoone((i - (length_twozero - 1)),:) = raw_data(i,:); elseif raw_data(length_twozero,1) ~= 21; mat_twoone = zeros(1,3); twoone = -1; break end i = i + 1; end if mat_twoone ~= 0; mat_twoone = sortrows(mat_twoone,2); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Matrix for chromosome #22 %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
twotwo = 0;
% Calculating length of all 21 matrices...
length_twoone = length(mat_one)+one+length(mat_two)+two+length(mat_three)...
    +three+length(mat_four)+four+length(mat_five)+five+length(mat_six)...
    +six+length(mat_seven)+seven+length(mat_eight)+eight+length(mat_nine)...
    +nine+length(mat_ten)+ten+length(mat_oneone)+oneone+length(mat_onetwo)...
    +onetwo+length(mat_onethree)+onethree+length(mat_onefour)+onefour...
    +length(mat_onefive)+onefive+length(mat_onesix)+onesix...
    +length(mat_oneseven)+oneseven+length(mat_oneeight)+oneeight...
    +length(mat_onenine)+onenine+length(mat_twozero)+twozero...
    +length(mat_twoone)+twoone+1;
% Create matrix for chromosome #22
for i = length_twoone:length(raw_data);
    if raw_data(i,1) == 22;
        mat_twotwo((i - (length_twoone - 1)),:) = raw_data(i,:);
    elseif raw_data(length_twoone,1) ~= 22;
        mat_twotwo = zeros(1,3);
        twotwo = -1;
        break
    end
    i = i + 1;
end
if mat_twotwo ~= 0;
    mat_twotwo = sortrows(mat_twotwo,2);
end
length_vector = [length(mat_one) length(mat_two) length(mat_three)...
    length(mat_four) length(mat_five) length(mat_six) length(mat_seven)...
    length(mat_eight) length(mat_nine) length(mat_ten) length(mat_oneone)...
    length(mat_onetwo) length(mat_onethree) length(mat_onefour) length(mat_onefive)...
    length(mat_onesix) length(mat_oneseven) length(mat_oneeight) length(mat_onenine)...
    length(mat_twozero) length(mat_twoone) length(mat_twotwo)];
cols = max(length_vector);
image_matrix = zeros(44,cols);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Assign the contents of each individual matrix to the right row      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
image_matrix(1,1:length(mat_one)) = mat_one(:,3);
image_matrix(2,1:cols) = zeros(1,cols);
image_matrix(3,1:length(mat_two)) = mat_two(:,3);
image_matrix(4,1:cols) = zeros(1,cols);
image_matrix(5,1:length(mat_three)) = mat_three(:,3);
image_matrix(6,1:cols) = zeros(1,cols);
image_matrix(7,1:length(mat_four)) = mat_four(:,3);
image_matrix(8,1:cols) = zeros(1,cols);
image_matrix(9,1:length(mat_five)) = mat_five(:,3);
image_matrix(10,1:cols) = zeros(1,cols);
image_matrix(11,1:length(mat_six)) = mat_six(:,3);
image_matrix(12,1:cols) = zeros(1,cols);
image_matrix(13,1:length(mat_seven)) = mat_seven(:,3);
image_matrix(14,1:cols) = zeros(1,cols);
image_matrix(15,1:length(mat_eight)) = mat_eight(:,3);
image_matrix(16,1:cols) = zeros(1,cols);
image_matrix(17,1:length(mat_nine)) = mat_nine(:,3);
image_matrix(18,1:cols) = zeros(1,cols);
image_matrix(19,1:length(mat_ten)) = mat_ten(:,3);
image_matrix(20,1:cols) = zeros(1,cols);
image_matrix(21,1:length(mat_oneone)) = mat_oneone(:,3);
image_matrix(22,1:cols) = zeros(1,cols);
image_matrix(23,1:length(mat_onetwo)) = mat_onetwo(:,3);
image_matrix(24,1:cols) = zeros(1,cols);
image_matrix(25,1:length(mat_onethree)) = mat_onethree(:,3);
image_matrix(26,1:cols) = zeros(1,cols);
image_matrix(27,1:length(mat_onefour)) = mat_onefour(:,3);
image_matrix(28,1:cols) = zeros(1,cols);
image_matrix(29,1:length(mat_onefive)) = mat_onefive(:,3);
image_matrix(30,1:cols) = zeros(1,cols);
image_matrix(31,1:length(mat_onesix)) = mat_onesix(:,3);
image_matrix(32,1:cols) = zeros(1,cols);
image_matrix(33,1:length(mat_oneseven)) = mat_oneseven(:,3);
image_matrix(34,1:cols) = zeros(1,cols);
image_matrix(35,1:length(mat_oneeight)) = mat_oneeight(:,3);
image_matrix(36,1:cols) = zeros(1,cols);
image_matrix(37,1:length(mat_onenine)) = mat_onenine(:,3);
image_matrix(38,1:cols) = zeros(1,cols);
image_matrix(39,1:length(mat_twozero)) = mat_twozero(:,3);
image_matrix(40,1:cols) = zeros(1,cols);
image_matrix(41,1:length(mat_twoone)) = mat_twoone(:,3);
image_matrix(42,1:cols) = zeros(1,cols);
image_matrix(43,1:length(mat_twotwo)) = mat_twotwo(:,3);
ycoord = [1:(length(mat_one)+one),1:(length(mat_two)+two),1:(length(mat_three)+three),...
    1:(length(mat_four)+four),1:(length(mat_five)+five),1:(length(mat_six)+six),...
    1:(length(mat_seven)+seven),1:(length(mat_eight)+eight),1:(length(mat_nine)+nine),...
    1:(length(mat_ten)+ten),1:(length(mat_oneone)+oneone),1:(length(mat_onetwo)+onetwo),...
    1:(length(mat_onethree)+onethree),1:(length(mat_onefour)+onefour),...
    1:(length(mat_onefive)+onefive),1:(length(mat_onesix)+onesix),...
    1:(length(mat_oneseven)+oneseven),1:(length(mat_oneeight)+oneeight),...
    1:(length(mat_onenine)+onenine),1:(length(mat_twozero)+twozero),...
    1:(length(mat_twoone)+twoone),1:(length(mat_twotwo)+twotwo)];
% Insert the y-coordinates into the matrix...
cursor_mat2(1:length(ycoord),4) = ycoord;

%%%%%%%%%%%%%%%%%%
% Plot the image %
%%%%%%%%%%%%%%%%%%
% fig = figure;
x = mesh(image_matrix(1:44,1:cols));
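
The per-chromosome blocks above repeat one pattern 22 times: collect the rows whose first column matches a chromosome number, sort them by the second column, and write the third column into every other row of image_matrix. A minimal sketch of that pattern as a single loop follows. It is an illustration only, not the code used for the results in this dissertation. The sample raw_data values and the names n_chrom, chrom_rows, and counts are assumptions introduced here, and, unlike the listing, the sketch does not require the input rows to be grouped by chromosome.

```matlab
% Hypothetical three-column input: [chromosome, position, value].
raw_data = [1 300 0.5; 1 100 0.9; 2 50 0.2; 2 10 0.7; 2 30 0.4; 4 5 1.0];
n_chrom  = 4;                      % assumed here; the listing handles 22

% Collect the rows of each chromosome with logical indexing and sort
% them by position (column 2), as the listing does with sortrows.
chrom_rows = cell(1, n_chrom);
counts     = zeros(1, n_chrom);
for c = 1:n_chrom
    rows      = raw_data(raw_data(:,1) == c, :);
    counts(c) = size(rows, 1);
    if counts(c) > 0
        rows = sortrows(rows, 2);
    end
    chrom_rows{c} = rows;
end

% One image row per chromosome on the odd rows, blank rows in between.
cols         = max(counts);
image_matrix = zeros(2 * n_chrom, cols);
for c = 1:n_chrom
    if counts(c) > 0
        image_matrix(2*c - 1, 1:counts(c)) = chrom_rows{c}(:, 3)';
    end
end
```

A chromosome with no rows (chromosome 3 in the sample) leaves its image row at zero, so this form needs no placeholder row or offset correction.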
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SC EXPRESS GUI PROGRAM CODE   %
% Created By Chuba B. Oyolu     %
% Date: 05/26/2009              %
% Last Modified: 09/21/2009     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function varargout = sc_exp_v2(varargin)
%SC_EXP_V2 M-file for sc_exp_v2.fig
%SC_EXP_V2, by itself, creates a new SC_EXP_V2 or raises the existing
%singleton*.
%H = SC_EXP_V2 returns the handle to a new SC_EXP_V2 or the handle to
%the existing singleton*.
%SC_EXP_V2('CALLBACK',hObject,eventData,handles,...) calls the local
%function named CALLBACK in SC_EXP_V2.M with the given input arguments.
%SC_EXP_V2('Property','Value',...) creates a new SC_EXP_V2 or raises
%the existing singleton*. Starting from the left, property value
%pairs are applied to the GUI before sc_exp_v2_OpeningFunction gets
%called. An unrecognized property name or invalid value makes
%property application stop. All inputs are passed to
%sc_exp_v2_OpeningFcn via varargin.
%*See GUI Options on GUIDE's Tools menu. Choose "GUI allows only one
%instance to run (singleton)".
%See also: GUIDE, GUIDATA, GUIHANDLES
%Edit the above text to modify the response to help sc_exp_v2
%Last Modified by GUIDE v2.5 29-May-2009 11:17:52
%****************Begin initialization code - DO NOT EDIT***********%
gui_Singleton = 1;
gui_State = struct('gui_Name', mfilename, ...
    'gui_Singleton', gui_Singleton, ...
    'gui_OpeningFcn', @sc_exp_v2_OpeningFcn, ...
    'gui_OutputFcn', @sc_exp_v2_OutputFcn, ...
    'gui_LayoutFcn', [] , ...
    'gui_Callback', []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
%**************End initialization code - DO NOT EDIT***************%

%Executes just before sc_exp_v2 is made visible.
function sc_exp_v2_OpeningFcn(hObject, eventdata, handles, varargin)
%This function has no output args, see OutputFcn.
%hObject - handle to figure
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
%varargin - command line arguments to sc_exp_v2 (see VARARGIN)
%Supply input file number 1...
filename = input('Enter Filename for Visualization Window #1: ','s');
raw_data = dlmread(['/Applications/MATLAB_SV74/' filename '.txt']);
raw_data(:,2) = round(raw_data(:,2));
input_mat_length = length(raw_data);
%Pad the matrix if a complete data set is not offered...
if input_mat_length < 1152;
    raw_data((input_mat_length +1):1152,:) = zeros(1152-(input_mat_length),2);
end
%Supply input file number 2....
filename2 = input('Enter Filename for Visualization Window #2: ','s');
raw_data2 = dlmread(['/Applications/MATLAB_SV74/' filename2 '.txt']);
raw_data2(:,2) = round(raw_data2(:,2));
input_mat_length2 = length(raw_data2);
%Pad the matrix if a complete data set is not offered...
if input_mat_length2 < 1152;
    raw_data2((input_mat_length2 +1):1152,:) = zeros(1152-(input_mat_length2),2);
end
%Break up input file #1 into appropriate blocks to represent each
%cell...
handles.cell1 = raw_data(1:24,:);
handles.cell2 = raw_data(25:48,:);
handles.cell3 = raw_data(49:72,:);
handles.cell4 = raw_data(73:96,:);
handles.cell5 = raw_data(97:120,:);
handles.cell6 = raw_data(121:144,:);
handles.cell7 = raw_data(145:168,:);
handles.cell8 = raw_data(169:192,:);
handles.cell9 = raw_data(193:216,:);
handles.cell10 = raw_data(217:240,:);
handles.cell11 = raw_data(241:264,:);
handles.cell12 = raw_data(265:288,:);
handles.cell13 = raw_data(289:312,:);
handles.cell14 = raw_data(313:336,:);
handles.cell15 = raw_data(337:360,:);
handles.cell16 = raw_data(361:384,:);
handles.cell17 = raw_data(385:408,:);
handles.cell18 = raw_data(409:432,:);
handles.cell19 = raw_data(433:456,:);
handles.cell20 = raw_data(457:480,:);
handles.cell21 = raw_data(481:504,:);
handles.cell22 = raw_data(505:528,:);
handles.cell23 = raw_data(529:552,:);
handles.cell24 = raw_data(553:576,:);
handles.cell25 = raw_data(577:600,:);
handles.cell26 = raw_data(601:624,:);
handles.cell27 = raw_data(625:648,:);
handles.cell28 = raw_data(649:672,:);
handles.cell29 = raw_data(673:696,:);
handles.cell30 = raw_data(697:720,:);
handles.cell31 = raw_data(721:744,:);
handles.cell32 = raw_data(745:768,:);
handles.cell33 = raw_data(769:792,:);
handles.cell34 = raw_data(793:816,:);
handles.cell35 = raw_data(817:840,:);
handles.cell36 = raw_data(841:864,:);
handles.cell37 = raw_data(865:888,:);
handles.cell38 = raw_data(889:912,:);
handles.cell39 = raw_data(913:936,:);
handles.cell40 = raw_data(937:960,:);
handles.cell41 = raw_data(961:984,:);
handles.cell42 = raw_data(985:1008,:);
handles.cell43 = raw_data(1009:1032,:);
handles.cell44 = raw_data(1033:1056,:);
handles.cell45 = raw_data(1057:1080,:);
handles.cell46 = raw_data(1081:1104,:);
handles.cell47 = raw_data(1105:1128,:);
handles.cell48 = raw_data(1129:1152,:);
%Break up input file #2 into appropriate blocks to represent each
%cell...
handles.sec_cell1 = raw_data2(1:24,:);
handles.sec_cell2 = raw_data2(25:48,:);
handles.sec_cell3 = raw_data2(49:72,:);
handles.sec_cell4 = raw_data2(73:96,:);
handles.sec_cell5 = raw_data2(97:120,:);
handles.sec_cell6 = raw_data2(121:144,:);
handles.sec_cell7 = raw_data2(145:168,:);
handles.sec_cell8 = raw_data2(169:192,:);
handles.sec_cell9 = raw_data2(193:216,:);
handles.sec_cell10 = raw_data2(217:240,:);
handles.sec_cell11 = raw_data2(241:264,:);
handles.sec_cell12 = raw_data2(265:288,:);
handles.sec_cell13 = raw_data2(289:312,:);
handles.sec_cell14 = raw_data2(313:336,:);
handles.sec_cell15 = raw_data2(337:360,:);
handles.sec_cell16 = raw_data2(361:384,:);
handles.sec_cell17 = raw_data2(385:408,:);
handles.sec_cell18 = raw_data2(409:432,:);
handles.sec_cell19 = raw_data2(433:456,:);
handles.sec_cell20 = raw_data2(457:480,:);
handles.sec_cell21 = raw_data2(481:504,:);
handles.sec_cell22 = raw_data2(505:528,:);
handles.sec_cell23 = raw_data2(529:552,:);
handles.sec_cell24 = raw_data2(553:576,:);
handles.sec_cell25 = raw_data2(577:600,:);
handles.sec_cell26 = raw_data2(601:624,:);
handles.sec_cell27 = raw_data2(625:648,:);
handles.sec_cell28 = raw_data2(649:672,:);
handles.sec_cell29 = raw_data2(673:696,:);
handles.sec_cell30 = raw_data2(697:720,:);
handles.sec_cell31 = raw_data2(721:744,:);
handles.sec_cell32 = raw_data2(745:768,:);
handles.sec_cell33 = raw_data2(769:792,:);
handles.sec_cell34 = raw_data2(793:816,:);
handles.sec_cell35 = raw_data2(817:840,:);
handles.sec_cell36 = raw_data2(841:864,:);
handles.sec_cell37 = raw_data2(865:888,:);
handles.sec_cell38 = raw_data2(889:912,:);
handles.sec_cell39 = raw_data2(913:936,:);
handles.sec_cell40 = raw_data2(937:960,:);
handles.sec_cell41 = raw_data2(961:984,:);
handles.sec_cell42 = raw_data2(985:1008,:);
handles.sec_cell43 = raw_data2(1009:1032,:);
handles.sec_cell44 = raw_data2(1033:1056,:);
handles.sec_cell45 = raw_data2(1057:1080,:);
handles.sec_cell46 = raw_data2(1081:1104,:);
handles.sec_cell47 = raw_data2(1105:1128,:);
handles.sec_cell48 = raw_data2(1129:1152,:);
%Choose default command line output for sc_exp_v2
handles.output = hObject;
%Update handles structure
guidata(hObject, handles);
%UIWAIT makes sc_exp_v2 wait for user response (see UIRESUME)
%uiwait(handles.figure1);

%Outputs from this function are returned to the command line.
function varargout = sc_exp_v2_OutputFcn(hObject, eventdata, handles)
%varargout cell array for returning output args (see VARARGOUT);
%hObject handle to figure
%eventdata reserved - to be defined in a future version of MATLAB
%handles structure with handles and user data (see GUIDATA)
%Get default command line output from handles structure
varargout{1} = handles.output;

%Executes on selection change in popupmenu2.
function popupmenu2_Callback(hObject, eventdata, handles)
%hObject handle to popupmenu2 (see GCBO)
%eventdata reserved - to be defined in a future version of MATLAB
%handles structure with handles and user data (see GUIDATA)
%Determine the selected data set.
str = get(hObject, 'String');
val = get(hObject,'Value');
set(handles.avgvalue,'String','0.');
set(handles.minvalue,'String','0.');
set(handles.maxvalue,'String','0.');
%Set current data to the selected Cell.
switch str{val};
    case 'Cell 1' %User selects Cell 1.
        handles.current_data = handles.cell1;
    case 'Cell 2' %User selects Cell 2.
        handles.current_data = handles.cell2;
    case 'Cell 3' %User selects Cell 3.
        handles.current_data = handles.cell3;
    case 'Cell 4' %User selects Cell 4.
        handles.current_data = handles.cell4;
    case 'Cell 5' %User selects Cell 5.
        handles.current_data = handles.cell5;
    case 'Cell 6' %User selects Cell 6.
        handles.current_data = handles.cell6;
    case 'Cell 7' %User selects Cell 7.
        handles.current_data = handles.cell7;
    case 'Cell 8' %User selects Cell 8.
        handles.current_data = handles.cell8;
    case 'Cell 9' %User selects Cell 9.
        handles.current_data = handles.cell9;
    case 'Cell 10' %User selects Cell 10.
        handles.current_data = handles.cell10;
    case 'Cell 11' %User selects Cell 11.
        handles.current_data = handles.cell11;
    case 'Cell 12' %User selects Cell 12.
        handles.current_data = handles.cell12;
    case 'Cell 13' %User selects Cell 13.
        handles.current_data = handles.cell13;
    case 'Cell 14' %User selects Cell 14.
        handles.current_data = handles.cell14;
    case 'Cell 15' %User selects Cell 15.
        handles.current_data = handles.cell15;
    case 'Cell 16' %User selects Cell 16.
        handles.current_data = handles.cell16;
    case 'Cell 17' %User selects Cell 17.
        handles.current_data = handles.cell17;
    case 'Cell 18' %User selects Cell 18.
        handles.current_data = handles.cell18;
    case 'Cell 19' %User selects Cell 19.
        handles.current_data = handles.cell19;
    case 'Cell 20' %User selects Cell 20.
        handles.current_data = handles.cell20;
    case 'Cell 21' %User selects Cell 21.
        handles.current_data = handles.cell21;
    case 'Cell 22' %User selects Cell 22.
        handles.current_data = handles.cell22;
    case 'Cell 23' %User selects Cell 23.
        handles.current_data = handles.cell23;
    case 'Cell 24' %User selects Cell 24.
        handles.current_data = handles.cell24;
    case 'Cell 25' %User selects Cell 25.
        handles.current_data = handles.cell25;
    case 'Cell 26' %User selects Cell 26.
        handles.current_data = handles.cell26;
    case 'Cell 27' %User selects Cell 27.
        handles.current_data = handles.cell27;
    case 'Cell 28' %User selects Cell 28.
        handles.current_data = handles.cell28;
    case 'Cell 29' %User selects Cell 29.
        handles.current_data = handles.cell29;
    case 'Cell 30' %User selects Cell 30.
        handles.current_data = handles.cell30;
    case 'Cell 31' %User selects Cell 31.
        handles.current_data = handles.cell31;
    case 'Cell 32' %User selects Cell 32.
        handles.current_data = handles.cell32;
    case 'Cell 33' %User selects Cell 33.
        handles.current_data = handles.cell33;
    case 'Cell 34' %User selects Cell 34.
        handles.current_data = handles.cell34;
    case 'Cell 35' %User selects Cell 35.
        handles.current_data = handles.cell35;
    case 'Cell 36' %User selects Cell 36.
        handles.current_data = handles.cell36;
    case 'Cell 37' %User selects Cell 37.
        handles.current_data = handles.cell37;
    case 'Cell 38' %User selects Cell 38.
        handles.current_data = handles.cell38;
    case 'Cell 39' %User selects Cell 39.
        handles.current_data = handles.cell39;
    case 'Cell 40' %User selects Cell 40.
        handles.current_data = handles.cell40;
    case 'Cell 41' %User selects Cell 41.
        handles.current_data = handles.cell41;
    case 'Cell 42' %User selects Cell 42.
        handles.current_data = handles.cell42;
    case 'Cell 43' %User selects Cell 43.
        handles.current_data = handles.cell43;
    case 'Cell 44' %User selects Cell 44.
        handles.current_data = handles.cell44;
    case 'Cell 45' %User selects Cell 45.
        handles.current_data = handles.cell45;
    case 'Cell 46' %User selects Cell 46.
        handles.current_data = handles.cell46;
    case 'Cell 47' %User selects Cell 47.
        handles.current_data = handles.cell47;
    case 'Cell 48' %User selects Cell 48.
        handles.current_data = handles.cell48;
end
%Save the handles structure.
guidata(hObject,handles)

%Hints: contents = get(hObject,'String') returns popupmenu2 contents as
%cell array contents{get(hObject,'Value')} returns selected item
%from popupmenu2

%Executes during object creation, after setting all properties.
function popupmenu2_CreateFcn(hObject, eventdata, handles)
%hObject handle to popupmenu2 (see GCBO)
%eventdata reserved - to be defined in a future version of MATLAB
%handles empty - handles not created until after all CreateFcns called

%Hint: popupmenu controls usually have a white background on Windows.
%See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

%Executes on selection change in popupmenu3.
function popupmenu3_Callback(hObject, eventdata, handles)
%hObject handle to popupmenu3 (see GCBO)
%eventdata reserved - to be defined in a future version of MATLAB
%handles structure with handles and user data (see GUIDATA)

%Hints: contents = get(hObject,'String') returns popupmenu3 contents as
%cell array contents{get(hObject,'Value')} returns selected item
%from popupmenu3

%Determine the selected data set.
str = get(hObject, 'String');
val = get(hObject,'Value');

%Set current data to the selected Cell ('Cell 1' through 'Cell 48').
%The cell number is parsed from the menu string and used to select the
%matching field handles.sec_cell1 ... handles.sec_cell48.
cell_num = sscanf(str{val},'Cell %d');
if isscalar(cell_num) && cell_num >= 1 && cell_num <= 48 && strcmp(str{val},sprintf('Cell %d',cell_num))
    handles.current_data2 = handles.(sprintf('sec_cell%d',cell_num));
end

%Save the handles structure.
guidata(hObject,handles);

%Executes during object creation, after setting all properties.
function popupmenu3_CreateFcn(hObject, eventdata, handles)
%hObject - handle to popupmenu3 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - empty - handles not created until after all CreateFcns
%called

%Hint: popupmenu controls usually have a white background on Windows.
%See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

%Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton1 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
axes(handles.axes1);
NOP = 25;
global radius_circ; %Declare the global before assigning to it.
radius_circ = 50;
center = [0,0,0];
style = '.';
THETA=linspace(0,2*pi,NOP);
RHO=ones(1,NOP)*radius_circ;
[X,Y] = pol2cart(THETA,RHO);
X=X+center(1);
Y=Y+center(2);
Z = center(3)*ones(1,length(X));
H=plot3(X,Y,Z,style);
axis square;
%Creating the spokes of the bicycle wheel...
chuba = [X,Y];
emeka = [chuba(:,1:25);chuba(:,26:50)];
coord_mat = emeka';
%Draw one spoke from the center to each of the 24 points on the circle.
for k = 1:24
    line([0 coord_mat(k,1)],[0 coord_mat(k,2)],[0 0],'Marker','.','LineStyle','--');
end
%List the gene names...
gene_names = {'FoxA2','Gata4','Apoa2','Smarcd3','Nrob1','Prss2', ...
    'S100a16','Foxq1','Samd11','Porcn','Smad6','Prex1', ...
    'Reep6','Gata6','Gsc','Cxcr4','Sox17','Mid1ip1', ...
    'Nodal','Nfkbia','Fxyd6','Cst3','Sox1','Gapdh'};
for k = 1:24
    text(coord_mat(k,1),coord_mat(k,2),0,gene_names{k});
end
hold on
%Obtain the coordinate @ which each line touches
%circumference of the circle...
%The points returned by conect for each spoke are stacked, spoke by
%spoke, into the rows of pos_mat.
pos_mat = [];
for k = 1:24
    [a,b] = conect(0,coord_mat(k,1),0,coord_mat(k,2));
    pos_mat = [pos_mat; [a;b]'];
end
data1 = handles.current_data;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Get coordinates for each data point...%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for m = 1:24;
    coord_index(m) = (radius_circ * (m-1)) + data1(m,2);
    if round(data1(m,1)) == 0;
        %NOTE: 'continue' skips to the next gene, so the four
        %statements that follow it in this branch are never executed.
        continue
        data1(m,2) = 41;
        data1(m,3) = pos_mat(coord_index(m),1);
        data1(m,4) = pos_mat(coord_index(m),2);
        m = m + 1;
    else
        coord_index(m) = (radius_circ * (m-1)) + data1(m,2);
        data1(m,3) = pos_mat(coord_index(m),1);
        data1(m,4) = pos_mat(coord_index(m),2);
        m = m + 1; %Has no effect: the for loop resets m each pass.
    end
end
x = data1(:,3);
y = data1(:,4);
z = data1(:,1);
tri = delaunay(x,y);
h = trisurf(tri,x,y,z);
shading interp;
lighting phong;
grid;
rotate3d on;
hold off;

%Executes on button press in pushbutton2.
function pushbutton2_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton2 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
axes(handles.axes2);
NOP = 25;
global radius_circ; %Declare the global before assigning to it.
radius_circ = 50;
center = [0,0,0];
style = '.';
THETA=linspace(0,2*pi,NOP);
RHO=ones(1,NOP)*radius_circ;
[X,Y] = pol2cart(THETA,RHO);
X=X+center(1);
Y=Y+center(2);
Z = center(3)*ones(1,length(X));
H=plot3(X,Y,Z,style);
axis square;
%Creating the spokes of the bicycle wheel...
chuba = [X,Y];
emeka = [chuba(:,1:25);chuba(:,26:50)];
coord_mat = emeka';
%Draw one spoke from the center to each of the 24 points on the circle.
for k = 1:24
    line([0 coord_mat(k,1)],[0 coord_mat(k,2)],[0 0],'Marker','.','LineStyle','--');
end
%List the gene names...
gene_names = {'FoxA2','Gata4','Apoa2','Smarcd3','Nrob1','Prss2', ...
    'S100a16','Foxq1','Samd11','Porcn','Smad6','Prex1', ...
    'Reep6','Gata6','Gsc','Cxcr4','Sox17','Mid1ip1', ...
    'Nodal','Nfkbia','Fxyd6','Cst3','Sox1','Gapdh'};
for k = 1:24
    text(coord_mat(k,1),coord_mat(k,2),0,gene_names{k});
end
hold on
%Obtain the coordinate @ which each line touches
%circumference of the circle...
%The points returned by conect for each spoke are stacked, spoke by
%spoke, into the rows of pos_mat.
pos_mat = [];
for k = 1:24
    [a,b] = conect(0,coord_mat(k,1),0,coord_mat(k,2));
    pos_mat = [pos_mat; [a;b]'];
end
data2 = handles.current_data2;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Get coordinates for each data point...%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for m = 1:24;
    coord_index(m) = (radius_circ * (m-1)) + data2(m,2);
    if round(data2(m,1)) == 0;
        %NOTE: 'continue' skips to the next gene, so the four
        %statements that follow it in this branch are never executed.
        continue
        data2(m,2) = 41;
        data2(m,3) = pos_mat(coord_index(m),1);
        data2(m,4) = pos_mat(coord_index(m),2);
        m = m + 1;
    else
        coord_index(m) = (radius_circ * (m-1)) + data2(m,2);
        data2(m,3) = pos_mat(coord_index(m),1);
        data2(m,4) = pos_mat(coord_index(m),2);
        m = m + 1; %Has no effect: the for loop resets m each pass.
    end
end
x = data2(:,3);
y = data2(:,4);
z = data2(:,1);
tri = delaunay(x,y);
h = trisurf(tri,x,y,z);
shading interp;
lighting phong;
grid;
rotate3d on;
hold off;

%Executes during object creation, after setting all properties.
function avgvalue_CreateFcn(hObject, eventdata, handles)
%hObject - handle to avgvalue (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - empty - handles not created until after all CreateFcns
%called
%handles.avgvalue
%set(handles.avgvalue,'String',theta1);

%Executes during object creation, after setting all properties.
function minvalue_CreateFcn(hObject, eventdata, handles)
%hObject - handle to minvalue (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - empty - handles not created until after all CreateFcns
%called
%handles is empty at this point, so the control is set through hObject.
set(hObject,'String','0.');

%Executes during object creation, after setting all properties.
function maxvalue_CreateFcn(hObject, eventdata, handles)
%hObject - handle to maxvalue (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - empty - handles not created until after all CreateFcns
%called
%handles is empty at this point, so the control is set through hObject.
set(hObject,'String','0.');

%Executes on button press in pushbutton3.
function pushbutton3_Callback(hObject, eventdata, handles)
%hObject - handle to pushbutton3 (see GCBO)
%eventdata - reserved - to be defined in a future version of MATLAB
%handles - structure with handles and user data (see GUIDATA)
datamat1 = handles.current_data;
datamat2 = handles.current_data2;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Mathematical Measure of Similarity %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%For each of the 24 genes, form a two-element vector from columns 2 and 1
%of each data matrix and compute the angle (in degrees) between the
%vector from the first cell and the vector from the second cell.
%Put all the angles in one vector
theta_vect = zeros(1,24);
for k = 1:24
    vector_a = [datamat1(k,2) datamat1(k,1)];
    vector_b = [datamat2(k,2) datamat2(k,1)];
    mag_a = sqrt((datamat1(k,1))^2 + (datamat1(k,2))^2);
    mag_b = sqrt((datamat2(k,1))^2 + (datamat2(k,2))^2);
    theta_vect(k) = acos(dot(vector_a,vector_b)/(mag_a*mag_b))*(180/pi);
end
x = [1:length(theta_vect)];
set(handles.avgvalue,'String',mean(theta_vect));
set(handles.maxvalue,'String',max(theta_vect));
set(handles.minvalue,'String',min(theta_vect));
axes(handles.axes3)
%plot((1:length(theta_vect)),theta_vect);
stem(x,theta_vect);
xlabel('Genes');
ylabel('Variation Score (Degrees)');
grid;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% pcanalysis PROGRAM CODE       %
% Creator: Chuba B. Oyolu       %
% Date: 07/29/2008              %
% Last Modified: 09/2/2010      %
% Version 1                     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Begin new executable cell
%Program will prompt user for file containing principal components
%The user is allowed to supply two separate input files...
filename = input('Enter full PCA filename: ','s');
pca_data = dlmread(['/Applications/MATLAB_SV74/' filename '.txt']);
filename2 = input('Enter top three PCA filename: ','s');
pca_data2 = dlmread(['/Applications/MATLAB_SV74/' filename2 '.txt']);
%Get the length of both files for efficient manipulation
fLength = size(pca_data,1); %- Get length of entire file...
fLength2 = size(pca_data2,1); %- Get length of entire file...

%% Begin new executable cell
%Graphics for PCA performed using all genes...
%Need to divvy up the input file containing the pc analysis for all
%cells into the appropriate sections
hESCblk = pca_data(1:40,:);
endoblk = pca_data(41:79,:);
iPSblk = pca_data(80:103,:);
iPSendoblk = pca_data(104:136,:);
tntblk = pca_data(137:160,:);
hepg2blk = pca_data(161:189,:);
%Plot all possible combinations of principal components 1 through 4
%with one another
%This block takes care of all combinations containing PC1
for pidX = 1:4
    figure(10+pidX)
    plot(hESCblk(:,1),hESCblk(:,pidX),'b.')
    hold on
    plot(endoblk(:,1),endoblk(:,pidX),'r.')
    plot(iPSblk(:,1),iPSblk(:,pidX),'m.')
    plot(iPSendoblk(:,1),iPSendoblk(:,pidX),'g.')
    plot(tntblk(:,1),tntblk(:,pidX),'k.')
    plot(hepg2blk(:,1),hepg2blk(:,pidX),'c.')
    hold off
end
clear pidX
%This block takes care of all combinations containing PC2
for pidX = 3:4
    figure(20+pidX)
    plot(hESCblk(:,2),hESCblk(:,pidX),'b.')
    hold on
    plot(endoblk(:,2),endoblk(:,pidX),'r.')
    plot(iPSblk(:,2),iPSblk(:,pidX),'m.')
    plot(iPSendoblk(:,2),iPSendoblk(:,pidX),'g.')
    plot(tntblk(:,2),tntblk(:,pidX),'k.')
    plot(hepg2blk(:,2),hepg2blk(:,pidX),'c.')
    hold off
end
clear pidX
%This block takes care of the combination of PC3 & PC4
for pidX = 4
    figure(30+pidX)
    plot(hESCblk(:,3),hESCblk(:,pidX),'b.')
    hold on
    plot(endoblk(:,3),endoblk(:,pidX),'r.')
    plot(iPSblk(:,3),iPSblk(:,pidX),'m.')
    plot(iPSendoblk(:,3),iPSendoblk(:,pidX),'g.')
    plot(tntblk(:,3),tntblk(:,pidX),'k.')
    plot(hepg2blk(:,3),hepg2blk(:,pidX),'c.')
    hold off
end
clear pidX

%% Begin new executable cell
%Graphics for PCA performed using top three genes...
%Need to divvy up the topt_scellpca file into sections
hESCblk2 = pca_data2(1:40,:);
endoblk2 = pca_data2(41:78,:);
iPSblk2 = pca_data2(79:102,:);
iPSendoblk2 = pca_data2(103:136,:);
tntblk2 = pca_data2(137:160,:);
hepg2blk2 = pca_data2(161:189,:);
%This block plots the relationship between both principal components
%PC1 and PC2 for all cells
for pidX = 1:2
    figure(210+pidX)
    plot(hESCblk2(:,1),hESCblk2(:,pidX),'b.')
    hold on
    plot(endoblk2(:,1),endoblk2(:,pidX),'r.')
    plot(iPSblk2(:,1),iPSblk2(:,pidX),'m.')
    plot(iPSendoblk2(:,1),iPSendoblk2(:,pidX),'g.')
    plot(tntblk2(:,1),tntblk2(:,pidX),'k.')
    plot(hepg2blk2(:,1),hepg2blk2(:,pidX),'c.')
    hold off
end
clear pidX
REFERENCES
Wheeler, A. R., W. R. Throndset, et al. (2003). "Microfluidic device for single-cell analysis." Anal. Chem. 75: 3581-3586.
Attisano, L., C. Silvestri, et al. (2001). "The transcriptional role of Smads and FAST (FoxH1) in TGFbeta and activin signalling." Mol Cell Endocrinol 180(1-2): 3-11.
Bernstein, B. E., T. S. Mikkelsen, et al. (2006). "A bivalent chromatin structure marks key developmental genes in embryonic stem cells." Cell 125(2): 315-26.
Besser, D. (2004). "Expression of Nodal, Lefty-A, and Lefty-B in Undifferentiated Human Embryonic Stem Cells Requires Activation of Smad2/3." Journal of Biological Chemistry 279: 45076-45084.
Boyer, L. A., T. I. Lee, et al. (2005). "Core Transcriptional Regulatory Circuitry in Human Embryonic Stem Cells." Cell 122: 947-956.
Brunner, A. L., D. S. Johnson, et al. (2009). "Distinct DNA Methylation Patterns Characterize Differentiated Human Embryonic Stem Cells and Developing Human Fetal Liver." Genome Research 19: 1044-1056.
Baum, C. M., I. L. Weissman, et al. (1992). "Isolation of a candidate human hematopoietic stem-cell population." PNAS 89: 2804-2808.
Chen, X., H. Xu, et al. (2008). "Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells." Cell 133: 1106 - 1117.
Cheng, Y., W. Wu, et al. (2009). "Erythroid GATA1 function revealed by genome-wide analysis of transcription factor occupancy, histone modifications, and mRNA expression." Genome Res 19(12): 2172-84.
Cirillo, L. A., F. R. Lin, et al. (2002). "Opening of compacted chromatin by early developmental transcription factors HNF3 (FoxA) and GATA-4." Mol Cell 9(2): 279-89.
Cirillo, L. A. and K. S. Zaret (1999). "An early developmental transcription factor complex that is more stable on nucleosome core particles than on free DNA." Mol Cell 4(6): 961-9.
Cui, K., C. Zang, et al. (2009). "Chromatin signatures in multipotent human hematopoietic stem cells indicate the fate of bivalent genes during differentiation." Cell Stem Cell 4(1): 80-93.
D'Amour, K. A., A. D. Agulnick, et al. (2005). "Efficient differentiation of human embryonic stem cells to definitive endoderm." Nat Biotechnol 23(12): 1534-41.
D'Amour, K. A., A. G. Bang, et al. (2006). "Production of pancreatic hormone-expressing endocrine cells from human embryonic stem cells." Nat Biotechnol 24(11): 1392-401.
Demers, C., C. P. Chaturvedi, et al. (2007). "Activator-mediated recruitment of the MLL2 methyltransferase complex to the beta-globin locus." Mol Cell 27(4): 573-84.
Eberwine, J., H. Yeh, et al. (1992). "Analysis of gene expression in single live neurons." Proc Natl Acad Sci 89: 3010 - 3014.
Eisenberg, E. and E. Y. Levanon (2003). "Human housekeeping genes are compact." Trends in Genetics 19: 362-365.
Guo, G., M. Huss, et al. (2010). "Resolution of Cell Fate Decisions Revealed by Single-Cell Gene Expression Analysis from Zygote to Blastocyst." Developmental Cell 18: 675-685.
Heintzman, N. D., R. K. Stuart, et al. (2007). "Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome." Nat Genet 39(3): 311-8.
Hon, G., B. Ren, et al. (2008). "ChromaSig: a probabilistic approach to finding common chromatin signatures in the human genome." PLoS Comput Biol 4(10): e1000201.
Izzi, L., C. Silvestri, et al. (2007). "Foxh1 recruits Gsc to negatively regulate Mixl1 expression during early mouse development." EMBO J 26(13): 3132-43.
Jackson, A. L., S. R. Bartz, et al. (2003). "Expression profiling reveals off-target gene regulation by RNAi." Nature Biotechnology 21: 635 - 637.
Jaenisch, R. and A. Bird (2003). "Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals." Nature Genetics 33: 245 - 254.
James, D., A. J. Levine, et al. (2005). "TGFbeta/activin/nodal signaling is necessary for the maintenance of pluripotency in human embryonic stem cells." Development 132(6): 1273-82.
Wells, J. M. and D. A. Melton (2000). "Early mouse endoderm is patterned by soluble factors from adjacent germ layers." Development 127: 1563-1572.
Ji, H., H. Jiang, et al. (2008). "An integrated software system for analyzing ChIP-chip and ChIP-seq data." Nat Biotechnol 26(11): 1293-300.
Johnson, D. S., A. Mortazavi, et al. (2007). "Genome-wide mapping of in vivo protein-DNA interactions." Science 316: 1497–1502.
Tremblay, K. D. and K. S. Zaret (2005). "Distinct populations of endoderm cells converge to generate the embryonic liver bud and ventral foregut tissues." Dev. Biol. 280: 87-99.
Lawson, K. A., J. J. Meneses, et al. (1991). "Clonal analysis of epiblast fate during germ layer formation in the mouse embryo." Development 113: 891-911.
Ku, M., R. P. Koche, et al. (2008). "Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains." PLoS Genet 4(10): e1000242.
Lee, C. C., H. J. Jan, et al. (2010). "Nodal promotes growth and invasion in human gliomas." Oncogene 29(21): 3110-23.
Levsky, J., S. Shenoy, et al. (2002). "Single-cell gene expression profiling." Science 297: 836 - 840.
Warren, L., D. Bryder, et al. (2006). "Transcription factor profiling in individual hematopoietic progenitors by digital RT-PCR." PNAS 103: 17807-17812.
Mangone, F. R., F. Walder, et al. (2010). "Smad2 and Smad6 as predictors of overall survival in oral squamous cell carcinoma patients." Mol Cancer 9: 106.
Schena, M., D. Shalon, et al. (1995). "Quantitative monitoring of gene expression patterns with a complementary DNA microarray." Science 270: 467-470.
McKinnell, I. W., J. Ishibashi, et al. (2008). "Pax7 activates myogenic genes by recruitment of a histone methyltransferase complex." Nat Cell Biol 10(1): 77-84.
Mikkelsen, T. S., M. Ku, et al. (2007). "Genome-wide maps of chromatin state in pluripotent and lineage-committed cells." Nature 448(7153): 553-60.
Nishimoto T, I. R., Ajiro K, Yamamoto S, Takahashi T (1981). "The synthesis of protein(S) for chromosome condensation may be regulated by a post-transcriptional mechanism." J. Cell. Physiol 109: 299-308
Owens, P., G. Han, et al. (2008). "The role of Smads in skin development." J Invest Dermatol 128(4): 783-90.
Pan, G., S. Tian, et al. (2007). "Whole-genome analysis of histone H3 lysine 4 and lysine 27 methylation in human embryonic stem cells." Cell Stem Cell 1(3): 299-312.
Pushkarev, D., N. F. Neff, et al. (2009). "Single-molecule sequencing of an individual human genome." Nature Biotechnology 27: 847 - 850.
Sherwood, R. I., C. Jitianu, et al. (2007). "Prospective isolation and global gene expression analysis of definitive and visceral endoderm." Dev Biol 304: 541-555.
Barber, R. D., D. W. Harmer, et al. (2005). "Gapdh as a housekeeping gene: analysis of gapdh mRNA expression in a panel of 72 human tissues." Physiol. Genomics 21: 389-395.
Robertson, G., M. Hirst, et al. (2007). "Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing." Nature Methods 4: 651 - 657
Saijoh, Y., S. Oki, et al. (2003). "Left-right patterning of the mouse lateral plate requires nodal produced in the node." Dev Biol 256(1): 160-72.
Spurgeon, S. L., R. C. Jones, et al. (2008). "High Throughput Gene Expression Measurement with Real Time PCR in Microfluidic Dynamic Array." PLoS ONE 3: e1662. doi:10.1371/journal.pone.0001662.
Schnabel, M., S. Marlovits, et al. (2002). "Dedifferentiation-associated changes in morphology and gene expression in primary human articular chondrocytes in cell culture." Osteoarthritis and Cartilage 10: 62-70.
Shi, X., T. Hong, et al. (2006). "ING2 PHD domain links histone H3 lysine 4 methylation to active gene repression." Nature 442: 96 - 99.
Shiratori, H., R. Sakuma, et al. (2001). "Two-step regulation of left-right asymmetric expression of Pitx2: initiation by nodal signaling and maintenance by Nkx2." Mol Cell 7(1): 137-49.
Silvestri, C., M. Narimatsu, et al. (2008). "Genome-wide identification of Smad/Foxh1 targets reveals a role for Foxh1 in retinoic acid regulation and forebrain development." Dev Cell 14(3): 411-23.
Thomson, J., J. Itskovitz-Eldor, et al. (1998). "Embryonic stem cell lines derived from human blastocysts." Science 282: 1145-1147.
Thorsen, T., S. J. Maerkl, et al. (2002). "Microfluidic Large-Scale Integration." Science 298: 580-584.
Vallier, L., M. Alexander, et al. (2005). "Activin/Nodal and FGF pathways cooperate to maintain pluripotency of human embryonic stem cells." J Cell Sci 118(Pt 19): 4495-509.
Vallier, L., S. Mendjan, et al. (2009). "Activin/Nodal signalling maintains pluripotency by controlling Nanog expression." Development 136(8): 1339-49.
Valouev, A., D. S. Johnson, et al. (2008). "Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data." Nature Methods 5: 829 - 834.
Viré, E., C. Brenner, et al. (2006). "The Polycomb group protein EZH2 directly controls DNA methylation." Nature 439: 871 - 874.
Visel, A., M. J. Blow, et al. (2008). "ChIP-seq accurately predicts tissue-specific activity of enhancers." Nature 457: 854-858.
von Both, I., C. Silvestri, et al. (2004). "Foxh1 is essential for development of the anterior heart field." Dev Cell 7(3): 331-45.
Xu, G., Y. Zhong, et al. (2004). "Nodal induces apoptosis and inhibits proliferation in human epithelial ovarian cancer cells via activin receptor-like kinase 7." J Clin Endocrinol Metab 89(11): 5523-34.
Zhao, X. D., X. Han, et al. (2007). "Whole-genome mapping of histone H3 Lys4 and 27 trimethylations reveals distinct genomic compartments in human embryonic stem cells." Cell Stem Cell 1(3): 286-98.