reverse engineering gene networks using singular value decomposition and robust regression
![Page 1: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/1.jpg)
Reverse engineering gene networks using singular value decomposition and robust regression
M.K. Stephen Yeung
Jesper Tegner
James J. Collins
![Page 2: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/2.jpg)
General idea
Reverse-engineer:
• Genome-wide scale
• Small amount of data
• No prior knowledge
• Using SVD to obtain a family of possible solutions
• Using robust regression to choose from them
![Page 3: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/3.jpg)
If the system is near a steady state, its dynamics can be approximated by a linear system of N ODEs:
$$\dot{x}_i(t) = -\lambda_i x_i(t) + \sum_{j=1}^{N} W_{ij} x_j(t) + b_i(t) + \xi_i(t)$$
xi = concentration of mRNA (reflects the expression level of the genes)
λi = self-degradation rates
bi = external stimuli
ξi = noise
Wij = type and strength of the effect of the jth gene on the ith gene
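The linear model on this slide can be explored numerically. The following is a minimal Python sketch, assuming NumPy, a forward-Euler integrator, no noise (ξ = 0) and a constant stimulus b. The function name `simulate_linear_network`, the step size and the 3-gene example values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def simulate_linear_network(W, lam, b, x0, dt=0.01, steps=2000):
    """Forward-Euler integration of dx/dt = -lam * x + W @ x + b (noise omitted)."""
    x = np.asarray(x0, dtype=float).copy()
    traj = [x.copy()]
    for _ in range(steps):
        dx = -lam * x + W @ x + b   # right-hand side of the linear ODE system
        x = x + dt * dx
        traj.append(x.copy())
    return np.array(traj)

# Hypothetical 3-gene network: gene 0 activates gene 1, gene 1 represses gene 2.
W = np.array([[0.0,  0.0, 0.0],
              [0.5,  0.0, 0.0],
              [0.0, -0.3, 0.0]])
lam = np.array([1.0, 1.0, 1.0])    # self-degradation rates
b = np.array([1.0, 0.0, 0.5])      # constant external stimuli

traj = simulate_linear_network(W, lam, b, x0=np.zeros(3))
x_final = traj[-1]

# Analytical steady state: (W - diag(lam)) x + b = 0
x_steady = np.linalg.solve(W - np.diag(lam), -b)
print(x_final, x_steady)
```

With these values the simulated state settles onto the analytical steady state, which is the regime in which the linear approximation is meant to hold.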
![Page 4: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/4.jpg)
![Page 5: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/5.jpg)
Assumptions made:
• The connections have no time dependence (so W is not time-dependent), and they are not changed by the experiments
• The system is near a steady state
• Noise is neglected, so exact measurements are assumed
• $\dot{X}$ can be calculated accurately enough
![Page 6: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/6.jpg)
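In M experiments with N genes:
• each time, apply stimuli (b1,…,bN) to the genes
• measure the concentrations of the N mRNAs (x1,…,xN) using a microarray
You get:
$$X_{N \times M} = \begin{pmatrix} x_1^1 & x_1^2 & \cdots & x_1^M \\ x_2^1 & x_2^2 & \cdots & x_2^M \\ \vdots & \vdots & \ddots & \vdots \\ x_N^1 & x_N^2 & \cdots & x_N^M \end{pmatrix}$$
subscript i = mRNA number
superscript j = experiment number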
![Page 7: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/7.jpg)
The goal is to use as few measurements as possible. With this method (and exact measurements):
M = O(log(N))
e.g. in the 1st test, the results will be:
![Page 8: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/8.jpg)
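The system becomes:
$$\dot{X}_{N \times M} = A_{N \times N}\, X_{N \times M} + B_{N \times M}$$
with A = W + diag(−λi).
Compute $\dot{X}$ from several measurements of the data for X (e.g. using interpolation).
Goal = deduce W (or A) from the rest:
$$X^T_{M \times N}\, A^T_{N \times N} = \dot{X}^T_{M \times N} - B^T_{M \times N}$$
If M = N, compute $(X^T)^{-1}$, but usually M << N (this is our goal: M = O(log(N))).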
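The time derivative of X is not measured directly; the slide suggests estimating it from several measurements, e.g. by interpolation. A minimal sketch of one way to do that, assuming NumPy and a quadratic polynomial through three samples per gene. The function name `estimate_derivative`, the sampling times and the exponential test signals are hypothetical choices for illustration only.

```python
import numpy as np

def estimate_derivative(t, x_samples, t_eval):
    """Fit a quadratic through the samples of each gene and differentiate it at t_eval.

    t         : sampling times, shape (n_samples,)
    x_samples : concentrations, shape (n_samples, n_genes)
    """
    x_samples = np.asarray(x_samples, dtype=float)
    derivs = []
    for gene in x_samples.T:                      # one column per gene
        coeffs = np.polyfit(t, gene, deg=2)       # interpolating polynomial (3 points, degree 2)
        derivs.append(np.polyval(np.polyder(coeffs), t_eval))
    return np.array(derivs)

# Hypothetical samples: two genes relaxing exponentially.
t = np.array([0.00, 0.05, 0.10])
x_samples = np.column_stack([np.exp(-t), 2.0 * np.exp(-0.5 * t)])

x_dot = estimate_derivative(t, x_samples, t_eval=0.05)
exact = np.array([-np.exp(-0.05), -np.exp(-0.025)])   # true derivatives at t = 0.05
print(x_dot, exact)
```

Each such estimate supplies one column of the derivative matrix; the accuracy depends on the sample spacing and on measurement noise, which this sketch ignores.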
![Page 9: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/9.jpg)
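Therefore, use SVD (to find the least squares solution):
$$X^T_{M \times N} = U_{M \times N}\, W_{N \times N}\, V^T_{N \times N}$$
Here, U and V are orthogonal ($U^T = U^{-1}$) and W is diag(w1,…,wN), with wi the singular values of X. (This W is the diagonal matrix of singular values, not the connectivity matrix.)
Suppose all wi = 0 come first, so wi = 0 for i = 1…L and wi ≠ 0 for i = L+1…N. Then:
$$U_{M \times N}\, \mathrm{diag}(w_i)_{N \times N}\, V^T_{N \times N}\, A^T_{N \times N} = \dot{X}^T_{M \times N} - B^T_{M \times N}$$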
![Page 10: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/10.jpg)
Then the least squares (L2) solution to the problem is:
$$A_0 = (\dot{X}_{N \times M} - B_{N \times M})\, U_{M \times N}\, \mathrm{diag}(1/w_j)\, V^T_{N \times N}$$
with 1/wj replaced by 0 if wj = 0.
So this formula tries to match every data point as closely as possible.
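The SVD-based least squares solution can be sketched in a few lines of Python. This is an illustration under stated assumptions: NumPy's `np.linalg.svd` returns only min(M, N) singular values, sorted from largest to smallest, so the zero singular values sit at the end rather than at the beginning as on the slide. The function name `svd_particular_solution`, the tolerance `tol` and the random test data are hypothetical.

```python
import numpy as np

def svd_particular_solution(X, X_dot, B, tol=1e-10):
    """Return A0 = (X_dot - B) U diag(1/w_j) V^T, with 1/w_j set to 0 when w_j is numerically 0."""
    U, w, Vt = np.linalg.svd(X.T, full_matrices=True)   # X^T = U diag(w) V^T
    r = len(w)                                          # number of singular values returned
    nonzero = w > tol
    w_inv = np.where(nonzero, 1.0 / np.where(nonzero, w, 1.0), 0.0)
    return (X_dot - B) @ U[:, :r] @ np.diag(w_inv) @ Vt[:r]

# Hypothetical data: N = 6 genes, M = 3 experiments, sparse "true" A.
rng = np.random.default_rng(0)
N, M = 6, 3
A_true = -np.eye(N)
A_true[1, 0] = 0.8
A_true[4, 2] = -0.5
X = rng.normal(size=(N, M))
B = rng.normal(size=(N, M))
X_dot = A_true @ X + B          # exact, noise-free "measurements"

A0 = svd_particular_solution(X, X_dot, B)
print(np.round(A0, 2))
```

In this example A0 reproduces the data exactly but is generally dense, so it does not equal the sparse `A_true`; that mismatch is what motivates choosing a different member of the solution family.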
![Page 11: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/11.jpg)
But all possible solutions are:
$$A = A_0 + C V^T$$
with C = (cij)N×N, where cij = 0 if j > L and cij is otherwise a free scalar coefficient.
How to choose from this family of solutions?
The least squares method tries to match every data point as closely as possible → a not-so-sparse matrix with a lot of small entries.
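That every member of this family fits the data equally well can be checked numerically. A minimal sketch, assuming NumPy and random test data; because NumPy orders singular values from largest to smallest, the free directions are the last rows of `Vt` here instead of the first L columns of V as on the slide. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 5, 2                               # hypothetical sizes with M < N
X = rng.normal(size=(N, M))
B = rng.normal(size=(N, M))
X_dot = rng.normal(size=(N, M))           # arbitrary stand-in for measured derivatives

# Particular (least squares) solution A0.
A0 = (X_dot - B) @ np.linalg.pinv(X)

# Directions of V belonging to zero singular values of X^T.
U, w, Vt = np.linalg.svd(X.T, full_matrices=True)
rank = int(np.sum(w > 1e-10))
null_rows = Vt[rank:]                     # shape (N - rank, N); null_rows @ X is ~0

# Any coefficients on those directions give another exact solution.
C_free = rng.normal(size=(N, N - rank))
A_alt = A0 + C_free @ null_rows

print(np.abs(A_alt @ X + B - X_dot).max())   # residual of the alternative solution
```

Since the added term is orthogonal to the rows of A0, A0 is the member of the family with the smallest Frobenius norm, which is why it tends to spread many small entries over the matrix.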
![Page 12: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/12.jpg)
1. Based on prior biological knowledge, impose constraints on the solutions.
e.g.: when we know that 2 genes are related, the solution must reflect this in the matrix
2. Work from the assumption that normal gene networks are sparse, and look for the sparsest matrix
thus: search for the cij that maximize the number of zero entries in A
![Page 13: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/13.jpg)
So:
• get as many zero entries as you can
• therefore get a sparse matrix
• the non-zero entries form the connections
• fit as many measurements as you can, exactly: “robust regression” (so exact measurements are assumed)
![Page 14: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/14.jpg)
Do this using L1 regression. Thus, when considering
$$A = A_0 + C V^T$$
we want to “minimize” A. The L1 regression idea is then to look for the solution C for which
$$\| A_0 + C V^T \|_1$$
is minimal.
This yields as many zero entries as possible.
The implementation used the simplex method (a linear programming method).
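The L1 step can be written as a linear program, one row of A at a time. The sketch below is an illustration, not the authors' implementation: it assumes SciPy's `linprog` with the HiGHS dual simplex solver (`method="highs-ds"`), takes the L1 norm as the sum of absolute values of the entries, and uses hypothetical names (`sparsest_row_l1`, `V_null`) and random test data.

```python
import numpy as np
from scipy.optimize import linprog

def sparsest_row_l1(a0, V_null):
    """Minimize ||a0 + V_null @ c||_1 over c, via auxiliary variables t_j >= |a_j|."""
    n, k = V_null.shape
    cost = np.concatenate([np.zeros(k), np.ones(n)])        # minimize sum of t
    A_ub = np.block([[ V_null, -np.eye(n)],                 #  (a0 + V c) <= t
                     [-V_null, -np.eye(n)]])                # -(a0 + V c) <= t
    b_ub = np.concatenate([-a0, a0])
    bounds = [(None, None)] * k + [(0, None)] * n           # c free, t non-negative
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs-ds")
    if not res.success:
        raise RuntimeError(res.message)
    return a0 + V_null @ res.x[:k]

# Hypothetical sparse network: N = 12 genes, M = 6 experiments.
rng = np.random.default_rng(2)
N, M = 12, 6
A_true = -np.eye(N)
for i in range(N):
    A_true[i, rng.integers(N)] += rng.uniform(-1, 1)        # one extra connection per row
X = rng.normal(size=(N, M))
B = rng.normal(size=(N, M))
X_dot = A_true @ X + B

A0 = (X_dot - B) @ np.linalg.pinv(X)                        # least squares member
U, w, Vt = np.linalg.svd(X.T, full_matrices=True)
rank = int(np.sum(w > 1e-10))
V_null = Vt[rank:].T                                        # free directions, shape (N, N - rank)

A_l1 = np.vstack([sparsest_row_l1(A0[i], V_null) for i in range(N)])
print(np.abs(A0).sum(), np.abs(A_l1).sum())
```

Because each row is solved independently, the rows can be processed in parallel. Whether `A_l1` actually coincides with `A_true` depends on how sparse the network is and on how many experiments are available.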
![Page 15: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/15.jpg)
Thus, to reverse-engineer a network of N genes, we “only” need Mc = O(log N) experiments.
Then Mc << N, and the computational cost will be $O(N^4)$.
(Brute-force methods would have a cost of O(N!/(k!(N−k)!)), with k non-zero entries.)
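To get a feeling for the gap between the two cost estimates, the binomial coefficient can be compared with N to the fourth power for a few sizes. A small sketch using only the Python standard library; the function names and the choice k = 10 are hypothetical, and constant factors are ignored.

```python
import math

def brute_force_candidates(N, k):
    """Number of ways to choose which k of N entries are non-zero: N! / (k! (N-k)!)."""
    return math.comb(N, k)

def quartic_cost(N):
    """Order-of-magnitude cost N^4 quoted for the SVD + robust regression approach."""
    return N ** 4

k = 10  # hypothetical number of non-zero entries
for N in (50, 100, 200):
    print(N, brute_force_candidates(N, k), quartic_cost(N))
```

The combinatorial count outgrows the polynomial cost quickly as N increases, which is the point of the comparison on the slide.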
![Page 16: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/16.jpg)
Test 1
• Create a random connectivity matrix: for each row, select k entries to be non-zero
- k < kmax << N (to impose sparseness)
- each non-zero entry is drawn at random from a uniform distribution
• Apply random perturbations
• Take measurements while the system relaxes back to its previous steady state → X
• Compute $\dot{X}$ by interpolation
• Do this M times
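The construction of the random connectivity matrix for this test can be sketched as follows. This is a minimal sketch assuming NumPy; the function name `random_sparse_connectivity`, the uniform range [-1, 1] and the example sizes are assumptions, since the slides do not state them.

```python
import numpy as np

def random_sparse_connectivity(N, k_max, rng, low=-1.0, high=1.0):
    """For each row, pick k < k_max columns and fill them with uniform random weights."""
    W = np.zeros((N, N))
    for i in range(N):
        k = rng.integers(1, k_max)                    # 1 <= k < k_max
        cols = rng.choice(N, size=k, replace=False)   # distinct regulators of gene i
        W[i, cols] = rng.uniform(low, high, size=k)
    return W

rng = np.random.default_rng(3)
W = random_sparse_connectivity(N=200, k_max=10, rng=rng)
print(np.count_nonzero(W, axis=1).max())              # largest number of inputs to any gene
```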
![Page 17: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/17.jpg)
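Test 1
• Then apply the algorithm to obtain an approximation Ã of A
• Compute the error of the computed Ã:
$$E = \sum_{i=1}^{N} \sum_{j=1}^{N} e_{ij}, \quad \text{where } e_{ij} = \begin{cases} 1 & \text{if } |A_{ij} - \tilde{A}_{ij}| > \delta \\ 0 & \text{otherwise} \end{cases}$$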
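The error measure counts how many entries of the recovered matrix differ noticeably from the true one. A minimal sketch assuming NumPy; the function name `reconstruction_error` and the default tolerance `delta=1e-3` are hypothetical, as the slides do not give a value.

```python
import numpy as np

def reconstruction_error(A_true, A_est, delta=1e-3):
    """Count the entries where |A_true - A_est| exceeds the tolerance delta."""
    return int(np.sum(np.abs(A_true - A_est) > delta))

# Hypothetical 3-gene example with one wrongly recovered connection.
A_true = -np.eye(3)
A_est = A_true.copy()
A_est[0, 1] = 0.5
print(reconstruction_error(A_true, A_est))
```

A value of zero means that every connection, including every absent one, was recovered to within the tolerance.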
![Page 18: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/18.jpg)
• Results: Mc = O(log(N))
• Better than SVD alone, without regression:
![Page 19: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/19.jpg)
Test 2
• One-dimensional cascade of genes
• Result for N = 400:
Mc = 70
![Page 20: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/20.jpg)
Test 3
• Large sparse gene network, with random connections, external stimuli,…
• Results are the same as in the previous tests
![Page 21: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/21.jpg)
Discussion
Advantages:
• Very little data needed, in comparison with neural networks and Bayesian models
• No prior knowledge needed
• Easy to parallelize, as it recovers the connectivity matrix row by row (gene by gene)
• Also applicable to protein networks
![Page 22: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/22.jpg)
Discussion
Disadvantages:
• Less efficient for small networks (M≈N)
• No quantification yet of the necessary “sparseness”, though an average of 10 connections is good for a network containing > 200 genes
• Uncertain $\dot{X}$
• Especially useful with exact data, which we don’t have
![Page 23: Reverse engineering gene networks using singular value decomposition and robust regression](https://reader036.vdocument.in/reader036/viewer/2022081520/56815b1f550346895dc8d432/html5/thumbnails/23.jpg)
Improvements
• Other algorithms to impose sparseness: alternatives are possible both for L1 (the basic criterion) and for simplex (the implementation)
• By using a deterministic linear system of ODEs, a lot has been neglected (noise, time delays, nonlinearities)
• Connections could be changed by the experiments; a time-dependent W is then necessary