Systems Biology II
Roadmap
• Review from a long time ago when we last visited this topic.
• Review of some work we have done using a systems biology approach.
• Look at some research that benefited from adopting systems biology approaches.
“Inner Life of a Cell”, SIGGRAPH 2006 showcase winner
• Need to fight infection
– WBC
• Need to keep blood from leaking out
Two ways of looking at a problem
• Top down or bottom up
– Either look at the whole organism and abstract large portions of it
– Or try to understand each small piece and then, after understanding every small piece, assemble them into the whole
– Both are used, both are valid, and they complement each other
Theoretical types of control
Expression measurements
Visualizing the data
[Figure legend: blue line (pp), yellow line (pd)]
Graph theory, networks
• Two types of networks
– Exponential and scale free
– Most cellular networks are scale free
– It makes the most sense to study the interactions of the central nodes, not the outer nodes
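The difference between the two network types shows up in the size of the largest hub. A minimal sketch, assuming NetworkX's Erdos-Renyi generator as the exponential network and its Barabasi-Albert generator as the scale-free one (the generators, sizes, and seeds are illustrative choices, not taken from the slides):

```python
import networkx as nx

n_nodes = 2000  # arbitrary size, for illustration only

# Exponential (random) network: Erdos-Renyi graph with mean degree of about 4.
random_net = nx.gnp_random_graph(n_nodes, 4 / (n_nodes - 1), seed=1)
# Scale-free network: Barabasi-Albert preferential attachment, mean degree of about 4.
scale_free_net = nx.barabasi_albert_graph(n_nodes, 2, seed=1)


def max_degree(graph):
    """Number of connections of the most connected node."""
    return max(degree for _, degree in graph.degree())


random_max = max_degree(random_net)
scale_free_max = max_degree(scale_free_net)
print("largest hub, random network:    ", random_max)
print("largest hub, scale-free network:", scale_free_max)
```

Both graphs have a similar average number of connections, but only the scale-free one grows a few very large central nodes.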
Using network properties of a large complex data-set to evaluate the
correlation of gene expression from a large microarray experiment
Design of initial experiment
[Diagram labels: gene expression; SHR-SP; SR/JR/HSD; 120 ♂ F2 rats; genotyping; …; mRNA of whole eyes; ♂ ♀; F1 rats]
Summary of eQTL linkages
[Plot: transcript location vs. marker location; regions labeled Cis and Trans]
NPCE: Non-Positional Correlation of Expression
• Pair-wise correlation
• Capture bio-relatedness
– Macromolecular structures
– Metabolic pathways
– Disease genes
• Devoid of marker information
– More information
– Not dependent on marker density
– More noise
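Pair-wise correlation of expression needs only the expression matrix, with no marker data. A minimal sketch using simulated values (the matrix, the gene roles, and the noise levels are made up for illustration; only the 120-sample count echoes the experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 120  # same count as the F2 animals; the values themselves are simulated

# Hypothetical expression matrix: rows are genes, columns are samples (log2 scale).
shared = rng.normal(size=n_samples)
expression = np.vstack([
    shared + 0.3 * rng.normal(size=n_samples),  # gene_a follows the shared signal
    shared + 0.3 * rng.normal(size=n_samples),  # gene_b follows the shared signal
    rng.normal(size=n_samples),                 # gene_c is unrelated
])

# np.corrcoef treats each row as one variable and returns the gene-by-gene matrix of r.
r = np.corrcoef(expression)
r2 = r ** 2
print(np.round(r2, 2))
```

For real data, the same call on a genes-by-samples array gives every pair-wise value at once.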
Strongly correlated genes
[Scatter plot: expression of BBS7 (log2) vs. expression of BBS3 (log2); r2 = 0.78]
Weak correlation
[Scatter plot: expression of ABCA4 (log2) vs. expression of BBS4 (log2); r2 = 0.16]
Distribution of r2
[Histogram: x-axis bins from -0.89 to 0.97; y-axis counts from 0 to 8,000,000. Marked cutoffs: p = 0.001 at r2 = 0.78; p = 0.01 at r2 = 0.66]
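One way such cutoffs could be read off a distribution is as quantiles of the observed pair-wise values. This is an assumption, since the slides do not say how the p values were computed. The sketch below uses simulated, unrelated genes, and `cutoff_for` is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated stand-in for real data: 200 unrelated genes measured in 120 samples.
expression = rng.normal(size=(200, 120))

r = np.corrcoef(expression)
# Keep each gene pair once (upper triangle, diagonal excluded).
pair_r = r[np.triu_indices_from(r, k=1)]


def cutoff_for(p, values):
    """Value of |r| that only a fraction p of the pairs exceed."""
    return float(np.quantile(np.abs(values), 1 - p))


cutoff_01 = cutoff_for(0.01, pair_r)
cutoff_001 = cutoff_for(0.001, pair_r)
print(f"p = 0.01  -> |r| >= {cutoff_01:.2f}")
print(f"p = 0.001 -> |r| >= {cutoff_001:.2f}")
```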
Combining pathway information with correlation
[Histogram: Median Correlations of Pathways. x-axis: median correlation (0.5 down to 0.02); y-axis: number (0 to 8). Series: All Pathways, Random, Shorter Pathways]
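The comparison in this figure can be sketched as the median pair-wise correlation inside a pathway against the same statistic for a random gene set of equal size. The data, the pathway membership, and the helper name below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n_genes, n_samples = 300, 120
expression = rng.normal(size=(n_genes, n_samples))

# Hypothetical pathway: genes 0 to 9 share one common signal.
pathway = list(range(10))
expression[pathway] += rng.normal(size=n_samples)

r = np.corrcoef(expression)


def median_pairwise_r(members):
    """Median correlation over all distinct pairs within a gene set."""
    sub = r[np.ix_(members, members)]
    return float(np.median(sub[np.triu_indices_from(sub, k=1)]))


pathway_median = median_pairwise_r(pathway)
random_set = rng.choice(np.arange(10, n_genes), size=len(pathway), replace=False)
random_median = median_pairwise_r(random_set)
print("pathway median r:", round(pathway_median, 2))
print("random median r: ", round(random_median, 2))
```

In this simulation the gap is large by construction; the slides report that for real pathways the difference from random sets is small.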
Pairwise correlations are not enough?
• Looking at known pathways, a simple cutoff value is not identifiable
• Partial correlation or multiple correlations
– More feasible, but still difficult
– May only work in a subset of pathways
• Most useful if you want to confirm membership in a known group?
– Difference between random and known pathways is small
• Another way?
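Partial correlation asks how strongly two genes are related once the other genes are accounted for. A minimal sketch, assuming the standard inverse-correlation-matrix formula and a simulated chain x -> y -> z (it needs more samples than genes, which is one reason the approach is difficult on microarray data):

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation of every pair of rows, controlling for all other rows."""
    precision = np.linalg.inv(np.corrcoef(data))
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


rng = np.random.default_rng(3)
n_samples = 500
# Simulated chain: x and z are related only through y.
x = rng.normal(size=n_samples)
y = x + 0.5 * rng.normal(size=n_samples)
z = y + 0.5 * rng.normal(size=n_samples)
data = np.vstack([x, y, z])

plain = np.corrcoef(data)
partial = partial_correlations(data)
print("r(x, z)     =", round(float(plain[0, 2]), 2))
print("r(x, z | y) =", round(float(partial[0, 2]), 2))
```

The plain correlation between x and z is high, while the partial correlation is close to zero because y explains the relationship.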
Networks
“Real-world” Networks
• Tend to be highly clustered
• Tend to have short path lengths
• Many nodes with few interactions
– Few nodes with many interactions
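These three properties can be measured directly. The sketch below uses NetworkX's `powerlaw_cluster_graph` as a stand-in for a real-world network and compares it with a random graph of the same size; the generator, its parameters, and the seed are illustrative assumptions:

```python
import networkx as nx

n_nodes = 500
# Stand-in for a "real-world" network: growth model that produces hubs and triangles.
real_like = nx.powerlaw_cluster_graph(n_nodes, 3, 0.9, seed=7)
# Random graph with the same number of nodes and edges, for comparison.
random_like = nx.gnm_random_graph(n_nodes, real_like.number_of_edges(), seed=7)


def largest_component(graph):
    """Path lengths are only defined within a connected piece of the graph."""
    return graph.subgraph(max(nx.connected_components(graph), key=len))


real_clustering = nx.average_clustering(real_like)
random_clustering = nx.average_clustering(random_like)
real_path = nx.average_shortest_path_length(largest_component(real_like))
degrees = sorted((degree for _, degree in real_like.degree()), reverse=True)

print("clustering (real-like vs. random):", real_clustering, random_clustering)
print("average path length:", real_path)
print("largest and median degree:", degrees[0], degrees[len(degrees) // 2])
```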
Useful tools
• Cytoscape
– Best for visualization
– Limited (for us anyway) number of nodes
– http://www.cytoscape.org/
• NetworkX
– Python module
– Visualization and network description
– https://networkx.lanl.gov/
Using network properties
• Can we use networks to identify “critical” genes?
• Is it possible to determine a usable “cutoff” for correlations used to make the network?
– What correlation value will give a usable, relevant network?
– Is this value similar to the p value determined from the distribution of correlations?
• Is it possible to use network properties to identify a grouping of interacting genes (e.g., pathway, subunits, or other interactions)?
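A minimal sketch of the first two questions: connect genes whose correlation meets a cutoff, then rank genes by number of connections. The gene names, the simulated data, the 0.72 cutoff, and the helper `correlation_network` are all hypothetical:

```python
import numpy as np
import networkx as nx


def correlation_network(expression, gene_names, cutoff):
    """Connect two genes when |r| between their expression profiles meets the cutoff."""
    r = np.corrcoef(expression)
    graph = nx.Graph()
    graph.add_nodes_from(gene_names)
    rows, cols = np.where(np.triu(np.abs(r) >= cutoff, k=1))
    for i, j in zip(rows, cols):
        graph.add_edge(gene_names[i], gene_names[j], r=float(r[i, j]))
    return graph


rng = np.random.default_rng(4)
n_samples = 1000  # large simulated sample so the example is stable
hub_signal = rng.normal(size=n_samples)

# Hypothetical genes: "hub" drives g1 to g4; g5 to g7 are unrelated noise.
names = ["hub", "g1", "g2", "g3", "g4", "g5", "g6", "g7"]
profiles = [hub_signal]
profiles += [hub_signal + 0.75 * rng.normal(size=n_samples) for _ in range(4)]
profiles += [rng.normal(size=n_samples) for _ in range(3)]
expression = np.vstack(profiles)

network = correlation_network(expression, names, cutoff=0.72)
by_degree = sorted(network.degree(), key=lambda item: item[1], reverse=True)
print(by_degree[:3])
```

Lowering the cutoff in this example would also connect g1 to g4 with each other, which illustrates why the choice of correlation value changes which genes look "critical".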
Highly Connected genes
Gene      @ correlation   # connect   Function
Glul      .9, .8, .7      1498        Glutamine synthetase
Gnai3     .85, .8         832         Guanine nucleotide-binding protein
Smad4     .8              726         Common mediator of signal transduction
Syngap    .8              672         Ras-GTPase activator
Orc4      .8, .75, .7     2521        Directs DNA replication
Psma4     .7, .65, .6     7054        Multicatalytic proteinase
Rabep1    .7, .65, .6     7037        Rab GTPase binding effector protein
Pcna      .7, .65, .6     6799        Proliferating cell nuclear antigen
Common ontologies
• Molecular function
– Most common: none
– glutamate-ammonia ligase activity
– GTPase activator activity
– carrier activity
– structural molecule activity
– DNA binding
• Biological process
– nitrogen fixation
– transport
– vesicle fusion
– cell motility
– small GTPase mediated signal transduction
What correlation level to use
[Chart: Highly Connected Nodes. x-axis: correlation level (0.9 to 0.65); one y-axis: number of nodes (0 to 9000); other y-axis: percent of total edges (0 to 3). Series: Most connected, Percent of total]
Other parameters
[Chart: clustering and density vs. correlation level (0.95 to 0.65); one y-axis from 0 to 0.4, the other from 0 to 0.018. Series: Clustering, Density]
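Density and clustering can be tracked across correlation levels with a short loop. A minimal sketch on simulated data (the shared-signal model, gene count, and cutoffs are illustrative assumptions):

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(5)
n_genes, n_samples = 60, 120

# Simulated data: each gene carries a different share of one common signal,
# so pair-wise correlations span a range of values.
shared = rng.normal(size=n_samples)
weights = rng.uniform(0.5, 2.0, size=n_genes)
expression = weights[:, None] * shared + rng.normal(size=(n_genes, n_samples))

r = np.abs(np.corrcoef(expression))
np.fill_diagonal(r, 0.0)  # no self-connections

summary = {}
for cutoff in (0.9, 0.8, 0.7, 0.6):
    # Adjacency matrix: 1 where the correlation meets the cutoff, else 0.
    graph = nx.from_numpy_array((r >= cutoff).astype(int))
    summary[cutoff] = (nx.density(graph), nx.average_clustering(graph))
    print(cutoff, summary[cutoff])
```

Density can only grow as the cutoff is lowered, because every edge present at a strict cutoff is still present at a looser one.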
Validating a graph's biological relevance
• Need to use information to pick the correlation level(s) used to construct a graph.
• After the graph is constructed
– How well does it predict known bio-interactions?
Validating against pathways
• KEGG has a nice collection of pathway annotations (http://www.genome.jp/kegg/)
– Also has a web service interface
– Allows programmatic access to pathway annotations (http://www.genome.jp/kegg/soap/)
• By species
• By pathway
• By pathway type
• Some problems: KEGG ID vs. Affy probe ID
– May be a many-to-many relationship
Rattus norvegicus (rat) metabolic pathways
• KEGG has 110 metabolic pathways
• Range in size from 3 members to hundreds of members
• Examples:
– Novobiocin biosynthesis
– ATP synthesis
– Fructose and mannose metabolism
Path length
[Histogram: Path length ratio. Axis label: Rand/Pathway; axis ticks from 0.98 down to 0.02 and from 0 to 8]
Path coverage
[Histogram: Coverage of a .70 correlation network. x-axis: percent coverage (2 to 98); y-axis: number of pathways (0 to 8). Series: KEGG Pathways, Random]
Different values
• Using a correlation of .9
– No coverage for either pathway or random set
– Not enough connections; they may be significant, but only a small fraction are present
• Lower correlations
– Less clear
– Much larger networks
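The slides do not define how coverage was computed. As one possible reading, the sketch below takes coverage to be the percentage of a gene set's members that have at least one network edge to another member of the same set. That definition, the toy network, and the gene sets are assumptions for illustration:

```python
import networkx as nx


def percent_coverage(graph, members):
    """Percent of members linked to at least one other member (assumed definition)."""
    if not members:
        return 0.0
    present = [gene for gene in members if gene in graph]
    induced = graph.subgraph(present)
    linked = sum(1 for gene in present if induced.degree(gene) > 0)
    # Members missing from the network count as not covered.
    return 100.0 * linked / len(members)


# Hypothetical correlation network and gene sets.
network = nx.Graph()
network.add_edges_from([("a", "b"), ("b", "c"), ("d", "x"), ("e", "y")])
pathway = ["a", "b", "c", "d"]
random_set = ["a", "d", "e", "y"]

pathway_coverage = percent_coverage(network, pathway)
random_coverage = percent_coverage(network, random_set)
print("pathway coverage:", pathway_coverage)
print("random coverage: ", random_coverage)
```

Running the same function over each KEGG pathway and over size-matched random sets would give the two distributions compared in a coverage histogram.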
Why networks != correlations
[Network diagram: nodes Bbs2, Bbs4, Bbs6, Bbs8, Bbs1, Bbs3, Bbs7, Bbs5, Bbs11, Bbs9, and Abca4. Edge legend by correlation: 0.45-0.54, 0.55-0.64, 0.65+. Annotations: p < 0.001, p < 0.0002]
Conclusions
• Network properties show promise as a way to look at this data
• Pair-wise correlations and networks are unable to predict pathways or other interactions with certainty
– But they can help
• Using network tools and frameworks is a way to manage and simplify analysis
Acknowledgments
Microarray collaborators
Ed Stone
Val Sheffield
Jian Huang
Kwang-youn Kim
Ruth Swiderski
Kevin Knudtson
Rod Philp
CBCB
Todd Scheetz
Tom Casavant
Terry Braun
Nathan Schulz
Example Studies
• Physicochemical modeling of cell signaling pathways. B.B. Aldridge et al. Nature Cell Biology. 8(11) Nov 2006. 1195-1203.
• Reverse engineering of regulatory networks in human B cells. K. Basso. Nature Genetics. 37(4) Apr 2005. 382-397.
• Dynamic proteomics in individual human cells uncovers widespread cell-cycle dependence of nuclear proteins. A. Sigal. Nature Methods. 3(7) Jul 2006. 525-532.
• Structural systems biology: modeling protein interactions. P. Aloy. Nature Reviews. Mar 2006. 188-198.
Reverse engineering of regulatory networks in human B cells
• Have lots of microarrays; how can you reconstruct the network of regulation?
– Lower organisms: works
– Higher organisms: too much noise
• ARACNe: algorithm for the reconstruction of accurate cellular networks
– Find correlated genes
– Remove indirect correlations
Mutual Information
• How much does value t1 tell you about value t2?
• If MI = 0, there is no information; if MI = 1, you have perfect information.
• Similar to correlation coefficient but able to capture more complex interactions.
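A minimal sketch of the last point, using a simple histogram estimate of mutual information. The estimator, the bin count, and the simulated data are illustrative assumptions, and the value is reported in bits rather than on the normalized 0 to 1 scale described above:

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information between x and y, in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nonzero = p_xy > 0
    ratio = p_xy[nonzero] / (p_x * p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log2(ratio)))


rng = np.random.default_rng(6)
t1 = rng.uniform(-1, 1, size=5000)
noise = rng.uniform(-1, 1, size=5000)
t2 = t1 ** 2 + 0.05 * noise  # depends on t1, but not in a linear way

mi_dependent = mutual_information(t1, t2)
mi_independent = mutual_information(t1, noise)
r_dependent = float(np.corrcoef(t1, t2)[0, 1])

print("MI(t1, t2)    =", round(mi_dependent, 2))
print("MI(t1, noise) =", round(mi_independent, 2))
print("r(t1, t2)     =", round(r_dependent, 2))
```

The correlation coefficient is near zero for the quadratic relationship, while the mutual information is clearly above the value for unrelated data.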
Find direct interactions
• Use “data transmission theory”
– Data processing inequality (DPI)
– If (x,y) and (y,z) directly interact and (x,z) indirectly interact
• Mutual information of x,z will be less than x,y or y,z
• High MI values confound analysis
• Three-member loops are common and difficult to parse.
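The DPI idea above can be sketched as a pruning step: in every triplet where all three pairs are connected, drop the pair with the lowest mutual information. This is a simplified illustration and not the published ARACNe implementation; the `mi` edge attribute, the `tolerance` parameter, and the example values are assumptions:

```python
from itertools import combinations

import networkx as nx


def prune_with_dpi(graph, tolerance=0.0):
    """Return a copy with the weakest edge of every fully connected triplet removed."""
    to_remove = set()
    for a, b, c in combinations(graph.nodes, 3):
        edges = [(a, b), (b, c), (a, c)]
        if all(graph.has_edge(u, v) for u, v in edges):
            edges.sort(key=lambda edge: graph.edges[edge]["mi"])
            weakest, middle = edges[0], edges[1]
            # Mark the edge only if it is clearly weaker than both others.
            if graph.edges[weakest]["mi"] < graph.edges[middle]["mi"] * (1 - tolerance):
                to_remove.add(frozenset(weakest))
    pruned = graph.copy()
    pruned.remove_edges_from(tuple(edge) for edge in to_remove)
    return pruned


network = nx.Graph()
network.add_edge("x", "y", mi=0.9)
network.add_edge("y", "z", mi=0.8)
network.add_edge("x", "z", mi=0.3)  # indirect: x and z interact only through y

pruned = prune_with_dpi(network)
print(sorted(tuple(sorted(edge)) for edge in pruned.edges))
```

All triplets are judged against the original network, so removing one edge does not change the decision for another triplet. Checking every triplet is slow for large networks; it is kept here for clarity.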
Assessing validity and coverage
Validation and conclusions
• Validated 34 candidates by ChIP-chip
• Draw conclusions about the hierarchical nature of the MYC network
• Identify important members of the network for further study.
Dynamic proteomics in individual human cells uncovers widespread cell-cycle dependence of nuclear proteins
• Measure temporal and spatial relations in dividing cells of 20 fluorescently labeled proteins.
Keys
• New technique to introduce a fluorescent label that does not perturb the protein function (as much)
• In-silico synchronization
Results of the paper:
• Large number of proteins that probably are involved in cell cycle control
• A general, scalable technique for studying location and interaction of proteins.