stat_19 july fnl
TRANSCRIPT
-
8/12/2019 Stat_19 July Fnl
[Figure: Bivalvia species of India. Delta+ (x-axis) against Lambda+ (y-axis) with contours for number of species (S); points: GJ(129), MH(101), GA(75), KA(37), KL(60), LK(81), OR(164), WB(92), AP(100), TN(269), AN(252).]
[Figure: Bivalvia species of India. Delta+ against number of species for GJ, MH, GA, KA, KL, LK, OR, WB, AP, TN and AN.]
[Figure: Bivalvia species of India. Ordination of GJ, MH, GA, KA, KL, LK, OR, WB, AP, TN and AN. Resemblance: Gamma+; dissimilarity 25; 2D stress: 0.1.]
Statistics
Reliability: Significance
Strength of relationship: Meaningfulness
t-test, ANOVA
[Figure: Env of Mandovi. Axis from 0 to 100 (%); labels: 2-4; 5,7 C; 6,8 B; 1 A.]
[Figure: Macrofauna of Mandovi. Group-average cluster dendrogram of samples (MOR'07, MOR'08, MOR'09, LSR'08, LSR'09, PMonR'07, PMR'07, EMR'07) against similarity (20 to 100). Transform: Log(X+1). Resemblance: S17 Bray-Curtis similarity.]
-
Type of Statistics
1. Descriptive: e.g. mean, median, standard deviation, standard error, variance
2. Correlation: relation between parameters
3. Inferential: differences between or within groups
-
Descriptive Statistics
Mean: arithmetic average of the scores. Considers both the number of scores and their values.
Median: middle point in an ordered distribution, at which an equal number of scores lie on each side.
Mode: most frequently occurring score
-
Median
Example: 71, 73, 74, 75, 72
Step One: Place the scores in order from lowest to highest: 71, 72, 73, 74, 75
Step Two: Calculate the position of the median using the following formula:
Mdn position = (n + 1)/2 = (5 + 1)/2 = 3, so the median is the 3rd score, 73
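The two steps above can be sketched in Python. This is a minimal illustration using the slide's own scores; the helper name `median_position` is hypothetical and not part of the original material.

```python
import statistics

def median_position(n):
    """Return the 1-based position of the median among n ordered scores."""
    return (n + 1) / 2

scores = [71, 73, 74, 75, 72]
ordered = sorted(scores)                   # step one: 71, 72, 73, 74, 75
position = median_position(len(ordered))   # step two: (5 + 1) / 2 = 3.0
median_score = ordered[int(position) - 1]  # valid here because n is odd

print(ordered, position, median_score)
# statistics.median also handles even n by averaging the two middle scores
print(statistics.median(scores))
```

For an even number of scores the position ends in .5, which means the median lies halfway between the two middle scores.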
-
Mode
Mode: most frequently occurring score
Which of the following scores is the mode?
Unimodal: 3, 7, 3, 9, 9, 3, 5, 1, 8, 5
Bimodal: 2, 4, 9, 6, 4, 6, 6, 2, 8, 2
Multimodal: 7, 7, 6, 6, 5, 5, 4, 4
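As a quick check of the three examples, the modes can be found with `statistics.multimode` from the Python standard library (Python 3.8 or later). The dictionary keys below are only labels chosen for this sketch.

```python
from statistics import multimode

datasets = {
    "unimodal": [3, 7, 3, 9, 9, 3, 5, 1, 8, 5],
    "bimodal": [2, 4, 9, 6, 4, 6, 6, 2, 8, 2],
    "multimodal": [7, 7, 6, 6, 5, 5, 4, 4],
}

# multimode returns every value that shares the highest frequency,
# in the order in which each value first appears
modes = {name: multimode(scores) for name, scores in datasets.items()}

for name, found in modes.items():
    print(name, found)
```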
-
Mean versus Median
The median is not influenced by extreme sample values and is a better measure of centrality if the distribution is skewed.
If mean = median = mode, the distribution is said to be symmetrical, as is the case for a normal distribution.
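A small sketch of why the median is preferred for skewed data. The two lists are made-up values for illustration only; the second differs from the first by a single extreme score.

```python
import statistics

symmetric = [2, 3, 4, 5, 6]
skewed = [2, 3, 4, 5, 60]   # same scores, but the largest is now an outlier

for label, data in (("symmetric", symmetric), ("skewed", skewed)):
    # the mean moves with the outlier, the median does not
    print(label, statistics.mean(data), statistics.median(data))
```

The mean shifts from 4 to 14.8 while the median stays at 4.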
-
Descriptive Statistics: Variability
Measures of variability: extent of similarity or difference in a set of data
E.g. range, standard deviation, variance
-
Standard Deviation (SD)
Standard deviation (s): a measure of the variability, or spread, of a set of scores around the mean
Based on the sum of squared differences between each score and the mean (these differences are known as deviation scores)
A good approach for measuring variability around the mean
-
Standard Deviation
The sample standard deviation, s, is the square root of the variance:
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
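The formula can be written out step by step in Python. This is a minimal sketch; `sample_sd` is a hypothetical helper name, and the scores are reused from the median example.

```python
import math
import statistics

def sample_sd(xs):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = [(x - mean) ** 2 for x in xs]
    return math.sqrt(sum(squared_deviations) / (n - 1))

scores = [71, 72, 73, 74, 75]
sd_by_hand = sample_sd(scores)
sd_library = statistics.stdev(scores)   # standard-library equivalent

print(sd_by_hand, sd_library)
```

The deviation scores here are -2, -1, 0, 1, 2, so the sum of squares is 10 and the standard deviation is the square root of 10/4.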
-
Variance
Square of the standard deviation (s^2)
Used in: regression analysis, analysis of variance (ANOVA), and the determination of the reliability of a test
Also known as the mean square (MS)
-
Sample Variance
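The sample variance, s^2, is the sum of the squared deviations from the sample mean divided by n - 1 (approximately the arithmetic mean of the squared deviations):
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$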
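To connect the variance with the "mean square" used later in ANOVA, the sketch below computes the sum of squares (SS), the degrees of freedom and their ratio. The density values are hypothetical.

```python
import statistics

densities = [12, 15, 11, 18, 14]   # hypothetical density counts

n = len(densities)
mean = sum(densities) / n
sum_of_squares = sum((x - mean) ** 2 for x in densities)   # SS
degrees_of_freedom = n - 1
mean_square = sum_of_squares / degrees_of_freedom          # MS = s^2

print(sum_of_squares, degrees_of_freedom, mean_square)
print(statistics.variance(densities))   # same value from the standard library
```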
-
Normal Distribution of Data
Graphical assessment of normality (probability plots)
Shapiro-Wilk test (W statistic)
D'Agostino test (D statistic)
Goodness-of-fit tests (e.g. Kolmogorov-Smirnov test)
Data normally distributed: parametric test
Data not normally distributed: transformation or non-parametric test
E.g. log (growth rate), square root (density data), arcsine (% and ratio data)
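The three transformations named above might look like this in Python. Two details are assumptions, since the slide does not state them: the log is taken as the natural log, and the arcsine transform is taken in its common arcsine square-root form applied to a percentage between 0 and 100.

```python
import math

def log_transform(x):
    """Natural log of a positive value (the base is an assumption)."""
    return math.log(x)

def sqrt_transform(x):
    """Square root, often applied to density or count data."""
    return math.sqrt(x)

def arcsine_transform(percent):
    """Arcsine square-root transform of a percentage (0-100), in radians."""
    return math.asin(math.sqrt(percent / 100))

print(log_transform(2.5), sqrt_transform(16), arcsine_transform(25))
```

After transforming, the data would be tested for normality again before a parametric test is chosen.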
-
Univariate Analysis
t-test: difference between two mean values
Analysis of variance (ANOVA)
-
t-test
Comparison of two mean values
E.g. density data between two sites
E.g. difference between control and experiment
One-tailed: tests for a difference in one specified direction
Two-tailed: tests for a difference in either direction, i.e. higher or lower than the mean
-
t-test contd..
Independent t-test: compares unrelated data, e.g. male and female, or two different sites
Dependent t-test: compares related data, e.g. before and after
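The two kinds of t-test differ in how the t statistic is built. The sketch below computes only the statistic and its degrees of freedom with the standard library; the p value would then come from a t table or a statistics package. The independent version assumes equal variances (pooled variance), and all data values and function names are hypothetical.

```python
import math
import statistics

def independent_t(a, b):
    """Student's t for two unrelated samples, using the pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, na + nb - 2

def dependent_t(before, after):
    """Paired t: a one-sample test on the within-pair differences."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

site_a = [10, 12, 14, 16, 18]
site_b = [20, 22, 24, 26, 28]
t_ind, df_ind = independent_t(site_a, site_b)

before = [5, 6, 7, 8]
after = [6, 8, 8, 10]
t_dep, df_dep = dependent_t(before, after)

print(t_ind, df_ind, t_dep, df_dep)
```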
-
Analysis of Variance (ANOVA)
One-way: one independent variable, e.g. site, month or season
Two-way: two independent variables, e.g. site and month
Factorial: more than two independent variables, e.g. transect, site (area) and month/season
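For the one-way case, the F ratio can be computed directly from the group data. This is a minimal sketch with hypothetical values for three sites; `one_way_anova` is an illustrative helper, not a library function.

```python
def one_way_anova(groups):
    """Return (SS between, df between, SS within, df within, F)."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    group_means = [sum(group) / len(group) for group in groups]

    # variation of the group means around the grand mean
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # variation of the scores around their own group mean
    ss_within = sum((x - m) ** 2
                    for group, m in zip(groups, group_means) for x in group)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_ratio

sites = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # one list of replicates per site
ss_b, df_b, ss_w, df_w, f_ratio = one_way_anova(sites)
print(ss_b, df_b, ss_w, df_w, f_ratio)
```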
-
ANOVA
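OC
Source      SS        Degr. of freedom   MS         F         p
Intercept   2.33635   1                  2.336346   112.024   0.0000
station     0.86437   9                  0.096041   4.6050    0.0021
Error       0.41712   20                 0.020856
SS: sum of squares
Degrees of freedom = n - 1
MS: mean square (SS divided by its degrees of freedom)
F: ratio of the effect mean square to the residual (error) mean square; for a significant effect the F value should be greater than the critical (cut-off) value
p: probability value; p < 0.05 indicates significance at the 95 % confidence level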
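The relationships between the columns can be checked against the numbers in the table. The sketch below recomputes MS and F for the station effect from the reported SS and degrees of freedom; the variable names are chosen for this example only.

```python
# SS and degrees of freedom as reported in the OC table
ss_station, df_station = 0.86437, 9
ss_error, df_error = 0.41712, 20

ms_station = ss_station / df_station   # mean square of the effect
ms_error = ss_error / df_error         # residual mean square
f_station = ms_station / ms_error      # F ratio for station

print(round(ms_station, 6), round(ms_error, 6), round(f_station, 4))
```

The results agree with the tabulated MS (0.096041 and 0.020856) and F (4.6050) to the precision shown.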
-
One way ANOVA
OC
Source      SS        Degr. of freedom   MS         F         p
Intercept   2.33635   1                  2.336346   112.024   0.0000
station     0.86437   9                  0.096041   4.6050    0.0021
Error       0.41712   20                 0.020856
Phaeopigment
Source      SS        Degr. of freedom   MS         F         p
Intercept   0.02431   1                  0.02431    7.08660   0.01496
station     0.02900   9                  0.00322    0.93941   0.51402
Error       0.06861   20                 0.00343
-
Two way ANOVA
Source        SS         Degr. of freedom   MS         F          p
Intercept     272.8705   1                  272.8705   575.1452   0.000000
season        9.4421     2                  4.7210     9.9508     0.000185
Stn           15.3212    9                  1.7024     3.5882     0.001262
season*Stn    13.9800    18                 0.7767     1.6370     0.079326
Error         28.4663    60                 0.4744

Source        SS         Degr. of freedom   MS         F          p
Intercept     15.11210   1                  15.11210   290.6176   0.000
season        0.02009    2                  0.01005    0.1932     0.824
Tide          0.54502    2                  0.27251    5.2406     0.0072
season*Tide   0.57176    4                  0.14294    2.7489     0.0336
Error         4.21200    81                 0.05200
-
Post hoc test
     season   Tide   {1}     {2}     {3}    {4}    {5}     {6}    {7}    {8}    {9}
                     0.161   .533    .53    .46    .458    .40    .33    .49    .34
1    1        1
2    1        2      0.013
3    1        3      0.011   1.00
4    2        1      0.211   0.97    0.97
5    2        2      0.11    0.99    0.99   0.99
6    2        3      0.30    0.939   0.92   1.00   0.99
7    3        1      0.75    0.577   0.54   0.99   0.96    0.99
8    3        2      0.037   0.999   0.99   0.99   0.99    0.99   0.79
9    3        3      0.707   0.624   0.59   0.99   0.973   0.99   1.00   0.83
-
Factorial ANOVA
Abundance
Source    df   MS         F       P
Season    2    43365239   10.30   0.00005125
Stn       9    14757634   3.50    0.00042912
tide      2    50601728   11.54   0.00001643
S x Stn   18   19414787   4.61    0.00000001
S x T     4    7078555    1.35    0.25004741
Stn x T   18   16288388   3.71    0.00000147
-
Correlation
A linear relationship between two variables
Pearson (r and p): parametric
Spearman (rho and p): non-parametric
Relation: positive or negative (r ranges from -1 through 0 to +1)
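The two coefficients can be compared on the same data. In this sketch Spearman's rho is computed as Pearson's r on the ranks, with tied values given their average rank; the function names and the salinity and abundance values are hypothetical, and p values are left to a table or a statistics package.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

salinity = [5, 10, 15, 20, 25]
abundance = [1, 4, 9, 16, 25]   # rises with salinity, but not linearly

r = pearson_r(salinity, abundance)
rho = spearman_rho(salinity, abundance)
print(r, rho)
```

Because abundance always increases with salinity, rho is exactly 1, while r is slightly below 1 because the increase is not a straight line.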
-
Multiple Regression
Relation of one variable (e.g. biological) to 2 or more variables (e.g. environmental)
Multiple Regression Results
Dependent: BR   Multiple R = .77193627   F = 6.881218
R^2 = .59588560   df = 3, 14
No. of cases: 18   adjusted R^2 = .50928966   p = .004444
Standard error of estimate: 23.371691805
Intercept: -1.680825308   Std. Error: 7.052847   t(14) = -.2383   p = .8151
FF beta = .271   GR beta = .499   GR/DR beta = .291
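The summary values in this output are linked by standard formulas, which can be checked from the reported R^2, the number of cases (18) and the number of predictors (3). The variable names below are chosen for this sketch.

```python
import math

r_squared = 0.59588560   # reported R^2
n_cases = 18
n_predictors = 3         # FF, GR and GR/DR

df_model = n_predictors
df_error = n_cases - n_predictors - 1   # 14, as in "df = 3, 14"

multiple_r = math.sqrt(r_squared)
adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / df_error
f_value = (r_squared / df_model) / ((1 - r_squared) / df_error)

print(round(multiple_r, 6), round(adjusted_r_squared, 6), round(f_value, 4))
```

The recomputed values match the reported Multiple R, adjusted R^2 and F.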
-
Multivariate Analyses
Cluster and nMDS
SIMPER
ANOSIM
BIOENV
Principal Component Analysis (PCA)
Canonical Correspondence Analysis (CCA)
PRIMER E and MVSTEP
-
Test for Normality of data
Histogram plot
Check for skewness and kurtosis
Kolmogorov-Smirnov test: used if the data sets are unequal, e.g. Station 1 (10 replicates/station), Station 2 (7 replicates/station)
Shapiro-Wilk test (W statistic)
D'Agostino test (D statistic)
Lilliefors test
Normal distribution (p > 0.05): parametric analysis (t-test, ANOVA, Pearson correlation)
Not normal distribution: transform the data and check for normality again
After transformation, normal distribution: parametric analysis
After transformation, not normal distribution: non-parametric analysis
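The skewness and kurtosis check mentioned above can be sketched with the standard library. The functions below use the simple moment-based definitions (dividing by n), which is an assumption; statistics packages often apply small-sample corrections, so their values may differ slightly. The data are hypothetical.

```python
def central_moment(xs, order):
    """Average of (x - mean) ** order over the sample."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: 0 for a symmetrical sample."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

A clearly positive or negative skewness suggests trying a transformation before choosing between parametric and non-parametric analysis.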
-
Analysis type | Example | Parametric test | Non-parametric test
Compare means between 2 independent groups | Abundance variation between Mandovi and Zuari | Independent t-test | Wilcoxon rank-sum test
Compare two quantitative measurements from the same individual | Difference before and after | Dependent t-test | Wilcoxon signed-rank test
Compare means between > 2 groups | Abundance between Mandovi, Zuari, Chapora, Sal | One-way ANOVA | Kruskal-Wallis test
Estimate relation between 1 dependent and 1 independent variable | Relation of biotic and abiotic data | Pearson correlation (r from -1 through 0 to +1; p < 0.05) | Spearman correlation (rho from -1 through 0 to +1; p < 5 %)
Estimate relation between 1 dependent and 2 or more independent variables | Relation of phytoplankton density with temperature, salinity, DO etc. | Multiple regression (check beta value and p) |
-
Take-home points
Parametric and nonparametric are two broad classifications of statistical procedures.
Parametric tests are based on assumptions about the distribution of the underlying population from which the sample was taken. The most common parametric assumption is that data are approximately normally distributed.
Nonparametric tests do not rely on assumptions about the shape or parameters of the underlying population distribution.
If the data deviate strongly from the assumptions of a parametric procedure, using the parametric procedure could lead to incorrect conclusions.
You should be aware of the assumptions associated with a parametric procedure (normality test, e.g. Shapiro-Wilk test, or histogram).
If you determine that the assumptions of the parametric procedure are not valid, use an analogous nonparametric procedure instead (previous slide).
Nonparametric tests are often a good option for small data sets (n < 30).
Nonparametric procedures generally have less power.
Interpretation of nonparametric procedures can also be more difficult than for parametric procedures.
-
Thank you!
Next Saturday ?????