CSE, Marmara University, mimoza.marmara.edu.tr/~m.sakalli/cse546, including some slides of Papoulis.
TRANSCRIPT
1
Why CpG islands?
Including some slides of Papoulis. These notes will be further modified.
Dec/15/09
Notes on probability are from A. Papoulis and S. U. Pillai.
2
RNA interference and DNA methylation (RCOOH → RCOOCH3)
• Methylation is involved in the regulation of gene expression, protein function, and RNA metabolism.
• A cell is a combination of numerous proteins, each determining how the cell functions. Disproportionately expressed proteins have devastating effects.
• Two possible vulnerabilities:
– One is at the transcriptional level: while DNA is converted to mRNA, an antisense oligonucleotide can bind to the unprocessed gene in the DNA, creating a three-strand complex and thereby blocking the transcription process.
– The second vulnerability is at the level of translation. Translation is a ribosome-guided process for manufacturing a protein from mRNA. There, once antisense RNA hybridizes to mRNA, protein generation is inhibited, since the editing enzymes that splice introns from RNAs are blocked. RNase H recognizes the double-helix complex of the antisense strand bound to mRNA, frees the antisense strand, and cleaves the mRNA.
• Antisense therapy: HIV, influenza, and cancer treatment, where replication and transcription are targeted.
3
RNA interference and DNA methylation (RCOOH → RCOOCH3)
• RNA interference (RNAi) is a system controlling (either increasing or decreasing) the activity of RNAs. MicroRNA (miRNA) and small interfering RNA (siRNA) are direct products of genes and can bind to other specific RNAs. They play roles in defending cells against parasitic genes – viruses and transposons – but also in gene expression in general. It is universal.
• The methylation process differs in prokaryotic and eukaryotic cells: in the former it occurs at the C5 of the cytosine pyrimidine ring and at the N6 nitrogen of the adenine purine ring, while in the latter it occurs at the C5 carbon of cytosine pyrimidine sites.
• In mammals, methylation occurs at the C5 of the CpG dinucleotide. CpG makes up about 1% of the human genome, and most CpG sites are methylated. Unmethylated CpG islands are present in regulatory regions, including promoters; methylation there impedes transcription and protein modeling (chromatin and histone).
• One abnormality caused by incomplete methylation, for example, is Rett syndrome – an epigenetic abnormality. Methylated histones hold DNA tightly, blocking transcription.
4
• The occurrence of CpG sequences is among the least frequent in many genomes – rarer than would be expected from the independent probabilities of C and G. This is said to be because the C in CpG has a tendency to methylate, becoming methyl-C; since methylation is suppressed in the areas around genes, these areas retain a relatively higher concentration of CpG, in islands.
• Epigenetic importance: methyl-C has a high chance of mutating to T, and is therefore important in epigenetic inheritance, as well as in controlling gene expression and regulation.
• Questions: How close is a short sequence to being a CpG island? What is the likelihood that a long sequence contains one or more CpG islands? And, more importantly, what relation does it bear – coincidental, or for some functional reason?
• Hence, Markov chains.
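As a sketch of why Markov chains help here: the standard approach scores a sequence with two first-order Markov models, one for inside CpG islands ("+") and one for background ("−"), and compares their log-likelihoods. The transition tables below are illustrative placeholders, not trained values (real tables are estimated by counting dinucleotides in annotated sequences):

```python
import math

# Toy first-order Markov models for CpG-island ("+") vs background ("-") DNA.
# These probabilities are made-up placeholders for demonstration only.
PLUS = {
    "A": {"A": 0.18, "C": 0.27, "G": 0.43, "T": 0.12},
    "C": {"A": 0.17, "C": 0.37, "G": 0.27, "T": 0.19},
    "G": {"A": 0.16, "C": 0.34, "G": 0.38, "T": 0.12},
    "T": {"A": 0.08, "C": 0.36, "G": 0.38, "T": 0.18},
}
MINUS = {
    "A": {"A": 0.30, "C": 0.21, "G": 0.28, "T": 0.21},
    "C": {"A": 0.32, "C": 0.30, "G": 0.08, "T": 0.30},
    "G": {"A": 0.25, "C": 0.24, "G": 0.30, "T": 0.21},
    "T": {"A": 0.18, "C": 0.24, "G": 0.29, "T": 0.29},
}

def log_odds(seq):
    """Sum of log2 P+(x_i | x_{i-1}) - log2 P-(x_i | x_{i-1});
    a positive score favours the CpG-island model."""
    s = 0.0
    for prev, cur in zip(seq, seq[1:]):
        s += math.log2(PLUS[prev][cur] / MINUS[prev][cur])
    return s

print(log_odds("CGCGCGCG"))   # CG-rich: positive score
print(log_odds("ATATATAT"))   # AT-rich: negative score
```

Note the design choice: the score is a sum of per-transition log-odds, so a sliding window of this score directly answers "how close is this stretch to being a CpG island?".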
5
A Markov chain is a stochastic process, a discrete process {Xn}, n ∈ {0, 1, 2, . . .}, with the Markov property: the conditional probability distribution of the future states depends only upon the current state and a fixed number of past states (with m memories). A continuous-time MC has a continuous time index.

Pr{Xm+1 = j | X0 = k0, . . . , Xm−1 = km−1, Xm = i} = Pr{Xm+1 = j | Xm = i},

the transition probabilities, for every i, j, k0, . . . , km−1 and for every m. (Compare: a finite state machine, an i.i.d. sequence.)

Stationary: for all n, the transition matrix does not change over time, and the future state depends only on the current state i and not on the previous states:

Pr{Xn+1 = j | Xn = i} = Pr{X1 = j | X0 = i}.
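A minimal simulation makes the time-homogeneity concrete: the same transition matrix is applied at every step, so empirical one-step frequencies approach the matrix rows. The 3-state chain below is a hypothetical example:

```python
import random
from collections import Counter

# A hypothetical 3-state stationary (time-homogeneous) Markov chain:
# the same transition matrix P is used at every step.
P = {
    0: [0.5, 0.3, 0.2],
    1: [0.1, 0.6, 0.3],
    2: [0.4, 0.2, 0.4],
}

def step(state, rng):
    # The next state depends only on the current state (Markov property).
    return rng.choices([0, 1, 2], weights=P[state])[0]

def simulate(n, start=0, seed=0):
    rng = random.Random(seed)
    path = [start]
    for _ in range(n):
        path.append(step(path[-1], rng))
    return path

path = simulate(10_000)

# Empirical one-step frequencies out of state 0 should approach row P[0].
nexts = Counter(b for a, b in zip(path, path[1:]) if a == 0)
total = sum(nexts.values())
print([round(nexts[s] / total, 2) for s in (0, 1, 2)])
```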
6
The one-step transition matrix for a Markov chain with states S = {0, 1, 2} is P = [pij], i, j ∈ S, where pij = Pr{X1 = j | X0 = i} ≥ 0 and each row sums to 1.

Accessibility: a Markov process is ergodic if it is possible to communicate between any two states i and j. The chain is irreducible if all states communicate.

A chain is periodic if it returns to the same state every k steps (periodicity k); it is aperiodic if there is no such repetitive k.

A state that, once entered, cannot be left is an absorbing state. A chain with a reachable absorbing state cannot be irreducible.
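These definitions can be checked mechanically from a transition matrix. The sketch below (with made-up example chains) tests irreducibility by graph reachability and lists absorbing states:

```python
# Classify states of a finite chain from its transition matrix (list of rows).
# A state i is absorbing if p_ii = 1; the chain is irreducible if every
# state can reach every other state (all pairs communicate).

def reachable(P, i):
    # Depth-first search over edges with nonzero probability.
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_irreducible(P):
    n = len(P)
    return all(len(reachable(P, i)) == n for i in range(n))

def absorbing_states(P):
    return [i for i, row in enumerate(P) if row[i] == 1.0]

# Hypothetical examples: a deterministic cycle (irreducible) and a chain
# where state 2 is absorbing.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
trap  = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.0, 0.0, 1.0]]
print(is_irreducible(cycle), is_irreducible(trap), absorbing_states(trap))
```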
[Figure: two small three-state transition diagrams over states {0, 1, 2}, the first with edge probabilities (1) on every arrow, the second with edge probabilities (1) and (0.5).]
7
[Figure: two state-transition diagrams with their structure matrices; X marks a possible (nonzero-probability) transition.]

States 0–3:
      0  1  2  3
  0 [ 0  X  0  X ]
  1 [ X  0  0  0 ]
  2 [ X  0  0  0 ]
  3 [ 0  0  X  X ]

States 0–4:
      0  1  2  3  4
  0 [ X  X  0  0  X ]
  1 [ X  X  0  0  0 ]
  2 [ 0  0  X  X  0 ]
  3 [ 0  0  0  X  0 ]
  4 [ 0  0  0  0  0 ]
8
Learn these:
• Conditional probability, joint probability.
• Independence of occurrences of events.
• Bayesian process.
• Expressing sequences statistically with their distributions. Discriminating states.
• MLE, EM.
• MCMC, for producing a desired posterior distribution: 1. Metropolis–Hastings, random-walk MC; 2. Gibbs sampling.
• Markov chains and the properties maintained: stationary, ergodic, irreducible, aperiodic.
• Hidden Markov Models (the goal is to detect the sequence of underlying states that is likely to give rise to an observed sequence). This is the Viterbi algorithm.
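To make the last bullet concrete, here is a minimal Viterbi sketch for a two-state HMM ("island" vs "background"); all probabilities are illustrative placeholders, not trained values:

```python
import math

# Minimal Viterbi decoding for a 2-state HMM. Parameters are made up:
# the "island" state favours C/G emissions, "background" favours A/T.
states = ["island", "background"]
start = {"island": 0.5, "background": 0.5}
trans = {"island": {"island": 0.9, "background": 0.1},
         "background": {"island": 0.1, "background": 0.9}}
emit = {"island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

def viterbi(obs):
    # delta[s] = log-probability of the best state path ending in state s.
    delta = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []
    for x in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr[s] = best
            delta[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][x])
        back.append(ptr)
    # Trace back the most likely underlying state sequence.
    path = [max(states, key=delta.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

print(viterbi("TATCGCGCGCGCGTAT"))
```

The decoded path labels the CG-rich stretch "island" and the flanks "background", which is exactly the "detect the underlying states" goal stated above.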
9
Independence: A and B are said to be independent events if

P(AB) = P(A) P(B).    (1-45)

Notice that the above definition is a probabilistic statement, not a set-theoretic notion such as mutual exclusiveness.

Suppose A and B are independent; then

P(A|B) = P(AB)/P(B) = P(A)P(B)/P(B) = P(A).    (1-46)

Thus if A and B are independent, the event that B has occurred does not give any clue about the occurrence of the event A. It makes no difference to A whether B has occurred or not.

PILLAI
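Definition (1-45) can be checked exhaustively on a small sample space. In this illustrative two-dice example, "first die is even" and "the sum is 7" turn out to be independent, while two mutually exclusive events are not:

```python
from fractions import Fraction
from itertools import product

# Enumerate two fair dice to check (1-45) exactly. The events are chosen
# for illustration: A = "first die is even", B = "the sum is 7".
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] % 2 == 0
B = lambda w: w[0] + w[1] == 7
AB = lambda w: A(w) and B(w)

print(prob(AB) == prob(A) * prob(B))  # independent: P(AB) = P(A)P(B)

# Contrast with mutual exclusiveness: A and C = "first die is odd" are
# mutually exclusive (P(AC) = 0) but NOT independent.
C = lambda w: w[0] % 2 == 1
AC = lambda w: A(w) and C(w)
print(prob(AC), prob(A) * prob(C))
```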
10
Example 1.2: A box contains 6 white and 4 black balls. Remove two balls at random without replacement. What is the probability that the first one is white and the second one is black?
Let W1 = “first ball removed is white”
B2 = “second ball removed is black”
11
We need P(W1 B2). Using the conditional probability rule,

P(W1 B2) = P(B2 W1) = P(B2 | W1) P(W1),    (1-47)

since W1 B2 = B2 W1. But

P(W1) = 6/(6 + 4) = 6/10 = 3/5,

and

P(B2 | W1) = 4/(4 + 5) = 4/9,

and hence

P(W1 B2) = (3/5)(4/9) = 4/15 ≈ 0.27.
12
Are the events W1 and B2 independent? Our common sense says no. To verify this we need to compute P(B2). Of course, the fate of the second ball very much depends on that of the first ball. The first ball has two options: W1 = “first ball is white” or B1 = “first ball is black”. Note that W1 ∪ B1 = Ω and W1 ∩ B1 = φ, hence W1 together with B1 forms a partition. Thus (see (1-42)–(1-44))

P(B2) = P(B2 | W1) P(W1) + P(B2 | B1) P(B1)
      = (4/9)(3/5) + (3/9)(2/5) = 12/45 + 6/45 = 2/5,

and

P(B2) P(W1) = (2/5)(3/5) = 6/25 ≠ 4/15 = P(W1 B2).

As expected, the events W1 and B2 are dependent.
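The computations above can be verified by enumerating all ordered draws, which reproduces P(W1 B2) = 4/15 and P(B2) = 2/5 exactly:

```python
from fractions import Fraction
from itertools import permutations

# Exhaustive check of Example 1.2: 6 white (W) and 4 black (B) balls,
# two drawn in order without replacement.
balls = ["W"] * 6 + ["B"] * 4
draws = list(permutations(range(10), 2))  # ordered pairs of distinct balls

def prob(event):
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

W1 = lambda d: balls[d[0]] == "W"  # first ball removed is white
B2 = lambda d: balls[d[1]] == "B"  # second ball removed is black

p_w1b2 = prob(lambda d: W1(d) and B2(d))
p_b2 = prob(B2)
print(p_w1b2, p_b2)               # 4/15 and 2/5
print(p_w1b2 == prob(W1) * p_b2)  # False: W1 and B2 are dependent
```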
13
From (1-35),

P(AB) = P(A|B) P(B).    (1-48)

Similarly, from (1-35),

P(AB) = P(BA) = P(B|A) P(A),    (1-49)

or

P(A|B) P(B) = P(B|A) P(A),

or Bayes’ theorem:

P(A|B) = P(B|A) P(A) / P(B).    (1-50)
14
Although simple enough, Bayes’ theorem has an interesting interpretation:
P(A|B): the a-posteriori probability of A given B.
P(B): the evidence that “B has occurred” (the new information).
P(B|A): the likelihood of B given A.
P(A): the a-priori probability of the event A.
We can also view the event B as new knowledge obtained from a fresh experiment. We know something about A as P(A). The new information is available in terms of B. The new information should be used to improve our knowledge/understanding of A. Bayes’ theorem gives the exact mechanism for incorporating such new information.
15
A more general version of Bayes’ theorem involves a partition of Ω. From (1-50),

P(Ai | B) = P(B | Ai) P(Ai) / P(B) = P(B | Ai) P(Ai) / Σ_{k=1}^{n} P(B | Ak) P(Ak),    (1-51)

where we have made use of (1-44). In (1-51), Ai, i = 1, …, n, represent a set of mutually exclusive events with associated a-priori probabilities P(Ai), i = 1, …, n. With the new information “B has occurred”, the information about Ai can be updated by the n conditional probabilities P(B | Ai), i = 1, …, n, using (1-47).
16
Example 1.3: Two boxes B1 and B2 contain 100 and 200 light bulbs respectively. The first box (B1) has 15 defective bulbs and the second has 5. Suppose a box is selected at random and one bulb is picked out.

(a) What is the probability that it is defective?

Solution: Note that box B1 has 85 good and 15 defective bulbs. Similarly, box B2 has 195 good and 5 defective bulbs. Let D = “defective bulb is picked out”. Then

P(D | B1) = 15/100 = 0.15,  P(D | B2) = 5/200 = 0.025.
17
Since a box is selected at random, they are equally likely; thus B1 and B2 form a partition as in (1-43):

P(B1) = P(B2) = 1/2.

Using (1-44) we obtain

P(D) = P(D | B1) P(B1) + P(D | B2) P(B2) = 0.15 · (1/2) + 0.025 · (1/2) = 0.0875.    (1-52)

Thus, there is about 9% probability that a bulb picked at random is defective.
18
(b) Suppose we test the bulb and it is found to be defective. What is the probability that it came from box 1, i.e., P(B1 | D)?

Notice that initially P(B1) = 0.5; then we picked out a box at random and tested a bulb that turned out to be defective. Can this information shed some light on the fact that we might have picked up box 1?

From (1-52),

P(B1 | D) = P(D | B1) P(B1) / P(D) = (0.15 · 1/2) / 0.0875 = 0.8571,

and P(B1 | D) = 0.857 > 0.5 = P(B1); indeed it is more likely at this point that we must have chosen box 1 in favor of box 2. (Recall box 1 has six times more defective bulbs compared to box 2.)
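The whole example is a direct application of the partition form (1-51), and can be reproduced with exact arithmetic:

```python
from fractions import Fraction

# Example 1.3 via the partition form of Bayes' theorem.
# Priors and likelihoods come straight from the problem statement.
prior = {"B1": Fraction(1, 2), "B2": Fraction(1, 2)}
like = {"B1": Fraction(15, 100), "B2": Fraction(5, 200)}   # P(D|Bi)

# Total probability (1-44): P(D) = sum_i P(D|Bi) P(Bi).
p_d = sum(like[b] * prior[b] for b in prior)

# Bayes' theorem (1-51): P(B1|D) = P(D|B1) P(B1) / P(D).
post_b1 = like["B1"] * prior["B1"] / p_d

print(float(p_d), float(post_b1))  # 0.0875 and about 0.857
```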
19
14. Stochastic Processes

Introduction

Let ξ denote the random outcome of an experiment. To every such outcome suppose a waveform X(t, ξ) is assigned. The collection of such waveforms forms a stochastic process. The set {ξ_k} and the time index t can be continuous or discrete (countably infinite or finite) as well. For fixed ξ_i ∈ S (the set of all experimental outcomes), X(t, ξ_i) is a specific time function. For fixed t = t1, X1 = X(t1, ξ) is a random variable. The ensemble of all such realizations X(t, ξ) over time represents the stochastic

[Fig. 14.1: an ensemble of realizations X(t, ξ1), X(t, ξ2), …, X(t, ξk), …, X(t, ξn) plotted against t, with the time instants t1 and t2 marked.]

PILLAI/Cha
20
process X(t) (see Fig. 14.1). For example,

X(t) = a cos(ω0 t + φ),

where φ is a uniformly distributed random variable in (0, 2π), represents a stochastic process. Stochastic processes are everywhere: Brownian motion, stock market fluctuations, various queuing systems all represent stochastic phenomena.

If X(t) is a stochastic process, then for fixed t, X(t) represents a random variable. Its distribution function is given by

F_X(x, t) = P{X(t) ≤ x}.    (14-1)

Notice that F_X(x, t) depends on t, since for a different t we obtain a different random variable. Further,

f_X(x, t) = dF_X(x, t)/dx    (14-2)

represents the first-order probability density function of the process X(t).
21
For t = t1 and t = t2, X(t) represents two different random variables X1 = X(t1) and X2 = X(t2) respectively. Their joint distribution is given by

F_X(x1, x2; t1, t2) = P{X(t1) ≤ x1, X(t2) ≤ x2},    (14-3)

and

f_X(x1, x2; t1, t2) = ∂²F_X(x1, x2; t1, t2) / (∂x1 ∂x2)    (14-4)

represents the second-order density function of the process X(t). Similarly, f_X(x1, x2, …, xn; t1, t2, …, tn) represents the nth-order density function of the process X(t). Complete specification of the stochastic process X(t) requires the knowledge of f_X(x1, x2, …, xn; t1, t2, …, tn) for all ti, i = 1, 2, …, n, and for all n (an almost impossible task in reality).
22
Mean of a stochastic process:

μ_X(t) = E{X(t)} = ∫ x f_X(x, t) dx    (14-5)

represents the mean value of the process X(t). In general, the mean of a process can depend on the time index t.

The autocorrelation function of a process X(t) is defined as

R_XX(t1, t2) = E{X(t1) X*(t2)} = ∫∫ x1 x2* f_X(x1, x2; t1, t2) dx1 dx2,    (14-6)

and it represents the interrelationship between the random variables X1 = X(t1) and X2 = X(t2) generated from the process X(t).

Properties:
1. R_XX(t1, t2) = R*_XX(t2, t1) = [E{X(t2) X*(t1)}]*.    (14-7)
2. R_XX(t, t) = E{|X(t)|²} ≥ 0 (average instantaneous power).
23
3. R_XX(t1, t2) represents a nonnegative-definite function, i.e., for any set of constants {a_i}, i = 1, …, n,

Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j* R_XX(t_i, t_j) ≥ 0.    (14-8)

Eq. (14-8) follows by noticing that E{|Y|²} ≥ 0 for Y = Σ_{i=1}^{n} a_i X(t_i). The function

C_XX(t1, t2) = R_XX(t1, t2) − μ_X(t1) μ*_X(t2)    (14-9)

represents the autocovariance function of the process X(t).

Example 14.1: Let

z = ∫_{−T}^{T} X(t) dt.

Then

E[|z|²] = ∫_{−T}^{T} ∫_{−T}^{T} E{X(t1) X*(t2)} dt1 dt2 = ∫_{−T}^{T} ∫_{−T}^{T} R_XX(t1, t2) dt1 dt2.    (14-10)
24
Example 14.2: Let

X(t) = a cos(ω0 t + φ),  φ ~ U(0, 2π).    (14-11)

This gives

μ_X(t) = E{X(t)} = a E{cos(ω0 t + φ)}
       = a cos(ω0 t) E{cos φ} − a sin(ω0 t) E{sin φ} = 0,    (14-12)

since E{cos φ} = (1/2π) ∫_0^{2π} cos φ dφ = 0 = E{sin φ}.

Similarly,

R_XX(t1, t2) = a² E{cos(ω0 t1 + φ) cos(ω0 t2 + φ)}
            = (a²/2) E{cos ω0(t1 − t2) + cos(ω0(t1 + t2) + 2φ)}
            = (a²/2) cos ω0(t1 − t2).    (14-13)
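A quick Monte Carlo sketch of (14-12) and (14-13): averaging over many random phases approximates the ensemble mean and autocorrelation. The parameter values below are arbitrary choices:

```python
import math
import random

# Monte Carlo check of (14-12) and (14-13) for X(t) = a*cos(w0*t + phi),
# phi ~ U(0, 2*pi). a, w0, t1, t2 are arbitrary illustrative values.
a, w0, t1, t2 = 2.0, 1.5, 0.7, 2.1
rng = random.Random(1)
n = 200_000

mean = 0.0
corr = 0.0
for _ in range(n):
    phi = rng.uniform(0.0, 2.0 * math.pi)
    x1 = a * math.cos(w0 * t1 + phi)
    x2 = a * math.cos(w0 * t2 + phi)
    mean += x1        # estimates E{X(t1)}, which should be ~0
    corr += x1 * x2   # estimates R_XX(t1, t2)
mean /= n
corr /= n

theory = (a * a / 2.0) * math.cos(w0 * (t1 - t2))
print(round(mean, 2), round(corr, 2), round(theory, 2))
```

Note that the estimated autocorrelation depends only on t1 − t2, in agreement with (14-13), which is what makes this process wide-sense stationary.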
25
Stationary Stochastic Processes

Stationary processes exhibit statistical properties that are invariant to a shift in the time index. Thus, for example, second-order stationarity implies that the statistical properties of the pairs {X(t1), X(t2)} and {X(t1+c), X(t2+c)} are the same for any c. Similarly, first-order stationarity implies that the statistical properties of X(ti) and X(ti+c) are the same for any c.

In strict terms, the statistical properties are governed by the joint probability density function. Hence a process is nth-order Strict-Sense Stationary (S.S.S.) if

f_X(x1, x2, …, xn; t1, t2, …, tn) = f_X(x1, x2, …, xn; t1+c, t2+c, …, tn+c)    (14-14)

for any c, where the left side represents the joint density function of the random variables X1 = X(t1), X2 = X(t2), …, Xn = X(tn) and the right side corresponds to the joint density function of the random variables X'1 = X(t1+c), X'2 = X(t2+c), …, X'n = X(tn+c). A process X(t) is said to be strict-sense stationary if (14-14) is true for all ti, i = 1, 2, …, n, for all n, and for any c.
26
For a first-order strict-sense stationary process, from (14-14) we have

f_X(x, t) = f_X(x, t + c)    (14-15)

for any c. In particular c = −t gives

f_X(x, t) = f_X(x),    (14-16)

i.e., the first-order density of X(t) is independent of t. In that case

E[X(t)] = ∫ x f(x) dx = μ, a constant.    (14-17)

Similarly, for a second-order strict-sense stationary process we have from (14-14)

f_X(x1, x2; t1, t2) = f_X(x1, x2; t1 + c, t2 + c)

for any c. For c = −t2 we get

f_X(x1, x2; t1, t2) = f_X(x1, x2; t1 − t2),    (14-18)
27
i.e., the second-order density function of a strict-sense stationary process depends only on the difference of the time indices t1 − t2. In that case the autocorrelation function is given by

R_XX(t1, t2) = E{X(t1) X*(t2)}
            = ∫∫ x1 x2* f_X(x1, x2; τ = t1 − t2) dx1 dx2
            = R_XX(t1 − t2) = R_XX(τ) = R*_XX(−τ),    (14-19)

i.e., the autocorrelation function of a second-order strict-sense stationary process depends only on the difference of the time indices τ = t1 − t2.

Notice that (14-17) and (14-19) are consequences of the stochastic process being first- and second-order strict-sense stationary. On the other hand, the basic conditions for first- and second-order stationarity – Eqs. (14-16) and (14-18) – are usually difficult to verify. In that case, we often resort to a looser definition of stationarity, known as Wide-Sense Stationarity (W.S.S.), by making use of
28
(14-17) and (14-19) as the necessary conditions. Thus, a process X(t) is said to be Wide-Sense Stationary if

(i) E{X(t)} = μ, a constant,    (14-20)

and

(ii) E{X(t1) X*(t2)} = R_XX(t1 − t2),    (14-21)

i.e., for wide-sense stationary processes, the mean is a constant and the autocorrelation function depends only on the difference between the time indices. Notice that (14-20)–(14-21) do not say anything about the nature of the probability density functions; instead they deal with the average behavior of the process. Since (14-20)–(14-21) follow from (14-16) and (14-18), strict-sense stationarity always implies wide-sense stationarity. However, the converse is not true in general, the only exception being the Gaussian process. This follows since, if X(t) is a Gaussian process, then by definition X1 = X(t1), X2 = X(t2), …, Xn = X(tn) are jointly Gaussian random variables for any t1, t2, …, tn, whose joint characteristic function is given by
29
φ_X(ω1, ω2, …, ωn) = exp( j Σ_{k=1}^{n} μ(t_k) ω_k − (1/2) Σ_{l=1}^{n} Σ_{k=1}^{n} C_XX(t_l, t_k) ω_l ω_k ),    (14-22)

where C_XX(t_i, t_k) is as defined in (14-9). If X(t) is wide-sense stationary, then using (14-20)–(14-21) in (14-22) we get

φ_X(ω1, ω2, …, ωn) = exp( j μ Σ_{k=1}^{n} ω_k − (1/2) Σ_{i=1}^{n} Σ_{k=1}^{n} C_XX(t_i − t_k) ω_i ω_k ),    (14-23)

and hence if the set of time indices are shifted by a constant c to generate a new set of jointly Gaussian random variables X'1 = X(t1 + c), X'2 = X(t2 + c), …, X'n = X(tn + c), then their joint characteristic function is identical to (14-23). Thus the sets of random variables {X_i}, i = 1, …, n, and {X'_i}, i = 1, …, n, have the same joint probability distribution for all n and all c, establishing the strict-sense stationarity of Gaussian processes from their wide-sense stationarity.

To summarize: if X(t) is a Gaussian process, then wide-sense stationarity (w.s.s.) ⇒ strict-sense stationarity (s.s.s.). Notice that the joint p.d.f. of Gaussian random variables depends only on their second-order statistics, which is also the basis
30
The ergodic hypothesis: an isolated system in thermal equilibrium, evolving in time, will pass through all the accessible microstates at the same recurrence rate, i.e., all accessible microstates are equally probable.
The average over long times will equal the average over the ensemble of all equi-energetic microstates: if we take a snapshot of a system with N microstates, we will find the system in any of these microstates with the same probability.
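For the random-phase cosine of Example 14.2, this "time average equals ensemble average" idea can be illustrated numerically: one long realization averages to the same value (zero) as an ensemble of independent phases at a fixed time. Parameters below are arbitrary choices:

```python
import math
import random

# Illustration of the ergodic idea with X(t) = a*cos(w0*t + phi).
a, w0 = 1.0, 2.0
phi = random.Random(7).uniform(0.0, 2.0 * math.pi)

# Time average over one long realization (Riemann sum).
T, dt = 1000.0, 0.01
steps = int(T / dt)
time_avg = sum(a * math.cos(w0 * k * dt + phi) for k in range(steps)) * dt / T

# Ensemble average at a fixed t, over many independent random phases.
rng = random.Random(8)
trials = 100_000
ens_avg = sum(a * math.cos(w0 * 0.5 + rng.uniform(0.0, 2.0 * math.pi))
              for _ in range(trials)) / trials

print(round(time_avg, 3), round(ens_avg, 3))  # both near 0
```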