finding relations between variables - unibo.it · 2015. 10. 22. · the term is often attributed to...
TRANSCRIPT
![Page 1: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/1.jpg)
FINDING RELATIONS BETWEEN VARIABLES
Correlation
![Page 2: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/2.jpg)
Relation between coupled variables
What couples of variables are in relation?
![Page 3: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/3.jpg)
Correlated variables
![Page 4: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/4.jpg)
Uncorrelated variables
![Page 5: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/5.jpg)
Definition The covariance of two random variable X and Y
is
Theorem For any two random variables X and Y.
Cov( , ) E[( [ ])( E[ ])]X Y X X Y Y
Var[ ] Var[ ] Var[ ] 2Cov( , ).X Y X Y X Y
Variance and Moments of a Random Variable
![Page 6: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/6.jpg)
Independent variables COV=0
][][][
][][][][][][][),(
YEXEXYE
YEXEXEYEYEXEXYEYXCov
][][)()()()(
)(
][
,
,
YEXEyPyxPxyPxPyx
yxPyx
XYE
j
jj
i
ii
ji
jiji
ji
jiji
For discrete variables (for continuous, integral instead of sum)
For Independent variables
X,Y independent COV (X,Y) =0
The viceversa is not always true
![Page 7: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/7.jpg)
Covariance and Pearson’s Correlation index
Variable 1 Variable 2
Item1 x11 x21
Item 2 x12 x22
Item i x1i x2i
Item m x1m x2m
Mean M1- M2-
m
i
i
m
i
i
xn
M
xn
M
1
22
1
11
1
1
m
i
ii
m
i
ii
MxMx
nxxcorr
MxMxn
xx
1 21
221121
1
221121
1
1),(
1
1),cov(
![Page 8: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/8.jpg)
Correlation
1,1),( 21 xxcorr
![Page 9: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/9.jpg)
When is a correlation significant?
m
i
ii MxMx
nxxr
1 21
221121
1
1),(
Given a correlation index:
A test variable can be computed under the null hypothesis that r=0
t is distributed as Student’s t test with n-2 degrees of freedom It assumes normality of x
![Page 10: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/10.jpg)
Graph showing the minimum value of Pearson's correlation coefficient that is significantly different from zero at the 0.05 level, for a given sample size.
![Page 11: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/11.jpg)
Example: discovery of a misconduct
Repeatability test: 2 different experimentalist were asked to take the same solution and to perform 24 independent ELISA assays on a 6x4 plate.
They submitted to the assessor the following results out of the spectrophotometer, ordered following the well
![Page 12: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/12.jpg)
S1 S2P1 0,481 0,496P2 0,485 0,501P3 0,479 0,495P4 0,506 0,522P5 0,467 0,48P6 0,474 0,491P7 0,469 0,48P8 0,475 0,489P9 0,514 0,52P10 0,52 0,524P11 0,526 0,531P12 0,494 0,509P13 0,535 0,54P14 0,524 0,526P15 0,481 0,492P16 0,502 0,509P17 0,479 0,484P18 0,491 0,495P19 0,503 0,515P20 0,472 0,481P21 0,481 0,486P22 0,503 0,512P23 0,448 0,454P24 0,519 0,526
The assessor suspectedthat the experimenter submitted two reads of the same plate
How to prove it?
![Page 13: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/13.jpg)
Example: discovery of a misconduct
0,44
0,45
0,46
0,47
0,48
0,49
0,5
0,51
0,52
0,53
0,54
0,55
0,44 0,46 0,48 0,5 0,52 0,54
S2
S1
R=0.978
![Page 14: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/14.jpg)
Example: discovery of a misconduct
R=0.978 n=24 t=22.05
Objection: the test is valid only when data are normally distributed and we cannot prove that.
Any other idea?
![Page 15: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/15.jpg)
Use the data themselves to generate random experiments
S1 S2P1 0,481 0,496P2 0,485 0,501P3 0,479 0,495P4 0,506 0,522P5 0,467 0,48P6 0,474 0,491P7 0,469 0,48P8 0,475 0,489P9 0,514 0,52P10 0,52 0,524P11 0,526 0,531P12 0,494 0,509P13 0,535 0,54P14 0,524 0,526P15 0,481 0,492P16 0,502 0,509P17 0,479 0,484P18 0,491 0,495P19 0,503 0,515P20 0,472 0,481P21 0,481 0,486P22 0,503 0,512P23 0,448 0,454P24 0,519 0,526
S1 Random(S2)P1 0,481 0,495P2 0,485 0,522P3 0,479 0,48P4 0,506 0,491P5 0,467 0,48P6 0,474 0,489P7 0,469 0,52P8 0,475 0,524P9 0,514 0,531P10 0,52 0,509P11 0,526 0,54P12 0,494 0,526P13 0,535 0,492P14 0,524 0,509P15 0,481 0,484P16 0,502 0,495P17 0,479 0,515P18 0,491 0,481P19 0,503 0,486P20 0,472 0,512P21 0,481 0,454P22 0,503 0,526P23 0,448 0,496P24 0,519 0,501
![Page 16: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/16.jpg)
0,44
0,46
0,48
0,5
0,52
0,54
0,56
0,44 0,46 0,48 0,5 0,52 0,54
Rand
om (S2)
S1
R=0.25
![Page 17: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/17.jpg)
Building a distribution by random resampling
Iterate the process of shuffling and computation of r many times (say 1000)
Compute a cumulative histogram counting the resamplings scoring with correlation≥r
0,00
200,00
400,00
600,00
800,00
1000,00
0,00 0,10 0,20 0,30 0,40 0,50 0,60 0,70 0,80 0,90
![Page 18: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/18.jpg)
Building a distribution by random resampling
The plot gives the probability (per thousand) of obtaining a given correlation with random pairings of the original data P-value independent on the assumptions on the data distribution
0,00
200,00
400,00
600,00
800,00
1000,00
0,00 0,10 0,20 0,30 0,40 0,50 0,60 0,70 0,80 0,90
This is only an example plot: Compute by yourself the plot corresponding to the data available in misconduct.xls file
![Page 19: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/19.jpg)
Bootstrapping
The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where the main character pulls himself out of a swamp by his hair (specifically, his pigtail), but the Baron does not, in fact, pull himself out by his bootstraps
![Page 20: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/20.jpg)
Correlation index assumes linear dependence
R=0.816
![Page 21: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/21.jpg)
Non parametric correlation: Spearman
Given a set of paired (xi,yi) sort separately the two variables, obtaining the ranks.
The Sperman’s correlation is the Pearson’s correlation of the ranked variables: (Rxi,Ryi),
![Page 22: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/22.jpg)
![Page 23: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/23.jpg)
Under the null hypothesis (r=0)
Is distributed as a Student’s t test with n-2 degrees of freedom
![Page 24: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/24.jpg)
Categorical data: Matthews correlation index
Secreted Non Secreted Total
With Signal peptide a b a + b
Without Signal Peptide c d c + d
total a + c b + d n
cdbdcaba
ad-bcMCC
![Page 25: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/25.jpg)
REGRESSION
![Page 26: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/26.jpg)
Regression
Regression analysis: any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variableand one or more independent variables.
Plot of Height vs Weight
100 140 180 220 260
Weight
4.6
5
5.4
5.8
6.2
6.6
7
Heig
ht
![Page 27: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/27.jpg)
Linear regression: Which is the linethat best fits the data?
?
?
?
![Page 28: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/28.jpg)
Linear regression
residuals or errors
iii baxy
baxy
Interpolating line
Data Points:
x y
1 6
2 1
3 9
4 5
5 17
6 12
![Page 29: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/29.jpg)
Least squares line
baxy Choose the line (a,b) that minimize
m
i
ii
m
i
i baxyE1
2
1
2
![Page 30: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/30.jpg)
Minimizing
m
i
ii
m
i
i baxyE1
2
1
2
2
1
2
1
1
2
1
11
2
1 1
111
2
11
111
),cov(
1
1
0020
11020
x
m
i
i
m
i
ii
m
i
i
m
i
ii
m
i
i
m
i
i
m
i
m
i
iii
m
i
i
m
i
i
m
i
i
m
i
iii
m
i
ii
m
i
i
m
i
i
m
i
ii
yxa
xxxm
xyyxm
xxmx
xymyx
xxx
xyyx
a
xxaxyxayxxbaxya
E
xaybxm
aym
bbaxyb
E
![Page 31: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/31.jpg)
Data Points:
x y
1 6
2 1
3 9
4 5
5 17
6 12
![Page 32: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/32.jpg)
Polynomial interpolation
The same technique can be applied byimposing a polynomial regression model
p is the degree of the polynomial
ak and b are the trainable coefficients
p
k
k
k bxaxPy1
)(
![Page 33: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/33.jpg)
Polynomials can perfectly interpolate a set of points
A set of data consisting of m points can be perfectlyinterpolated with a polynomial of degree m-1 2 points define a unique line, 3 points define a unique parabola (or a
line, if aligned) and so on…
Increasing the degree of the polynomial corresponds to decreasingthe error.
Bishop C, Pattern recognition and Machine Learning, Springer
![Page 34: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/34.jpg)
values for parameters
M=0 M=1 M=3 M=9
b 0.19 0.82 0.31 0.35
a1 -1.27 7.99 232.37
a2 -25.43 -5321
a3 17.37 48568
a4 -231639
a5 640042
a6 -1061800
a7 1042400
a8 -557682
a9 125201
![Page 35: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/35.jpg)
But…
The interpolation is useless and do not gives a goodpredictive model for extrapolationOVERFITTING
We cannot use the model for predicting Y in this region
![Page 36: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/36.jpg)
Overfitting
![Page 37: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/37.jpg)
Low number of points increasesrisk of overfitting
Bishop C Pattern Recognition and Machine Learning
![Page 38: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/38.jpg)
High values for parameters
M=0 M=1 M=3 M=9
b 0.19 0.82 0.31 0.35
a1 -1.27 7.99 232.37
a2 -25.43 -5321
a3 17.37 48568
a4 -231639
a5 640042
a6 -1061800
a7 1042400
a8 -557682
a9 125201
Bishop C Pattern Recognition and Machine Learning
![Page 39: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/39.jpg)
Regularized Error function
Discouraging high values for coefficients
p
k
k
m
i
ii axPyE1
2
1
2)(
Parameter weighting the strength of regularization
![Page 40: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/40.jpg)
Values for parameters for M=9
ln λ=0 ln λ=-18 ln λ=-∞
w0 0.13 0.35 0.35
w1 -0.05 4.74 232.37
w2 -0.06 -0.77 -5321
w3 -0.05 -31.97 48568
w4 -0.03 -3.89 -231639
w5 -0.02 55.28 640042
w6 -0.01 41.32 -1061800
w7 0.00 -45.95 1042400
w8 0.00 -91.53 -557682
w9 0.01 72.68 125201
Bishop C Pattern Recognition and Machine Learning
![Page 41: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/41.jpg)
λ=0Bishop C Pattern Recognition and Machine Learning
![Page 42: FINDING RELATIONS BETWEEN VARIABLES - unibo.it · 2015. 10. 22. · The term is often attributed to Rudolf Erich Raspe's story The Surprising Adventures of Baron Munchausen, where](https://reader035.vdocument.in/reader035/viewer/2022071515/613744400ad5d2067648827e/html5/thumbnails/42.jpg)
Regularized regression avoids overfitting
Bishop C Pattern Recognition and Machine Learning