stat_19 july fnl

8/12/2019 Stat_19 July Fnl



4/32

[Figure: Bivalvia species of India. Panels: Lambda+ against Delta+; Delta+ against number of species; resemblance (Gamma+) plot with a dissimilarity level of 25 and a 2D stress of 0.1. Labels: GJ(129), MH(101), GA(75), KA(37), KL(60), LK(81), OR(164), WB(92), AP(100), TN(269), AN(252).]

Statistics

Reliability: Significance
Strength of relationship: Meaningfulness

t-test, ANOVA

[Figure: Environment of Mandovi. Axis: B % (0 to 100). Groups: A (1), B (6, 8), C (2-4, 5, 7).]

[Figure: Macrofauna of Mandovi, group average. Samples: MOR'07, MOR'08, MOR'09, LSR'08, LSR'09, PMonR'07, PMR'07, EMR'07. Axis: Similarity (20 to 100). Transform: Log(X+1). Resemblance: S17 Bray Curtis similarity.]


5/32

Types of Statistics

1. Descriptive: e.g. mean, median, standard deviation, standard error, variance

2. Correlation: relation between parameters

3. Inferential: differences between or within groups


6/32

Descriptive Statistics

Mean: arithmetic average of the scores. Considers both the number of scores and their values

Median: middle point in an ordered distribution, at which an equal number of scores lie on each side

Mode: most frequently occurring score


7/32

Median

Example: 71, 73, 74, 75, 72

Step One: Place the scores in order from lowest to highest: 71, 72, 73, 74, 75

Step Two: Calculate the position of the median using the following formula:

Mdn position = (n + 1)/2 = (5 + 1)/2 = 3rd score, so the median is 73
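The two steps above can be checked with a short Python sketch. It uses the five example scores from the slide; the variable names are illustrative, and the position formula gives a single score only when the number of scores is odd.

```python
import statistics

scores = [71, 73, 74, 75, 72]

# Step One: order the scores from lowest to highest.
ordered = sorted(scores)

# Step Two: position of the median, (n + 1) / 2 (a single score only when n is odd).
n = len(ordered)
position = (n + 1) // 2
median_by_position = ordered[position - 1]  # positions count from 1, list indices from 0

print(ordered, position, median_by_position)

# The standard library gives the same value.
print(statistics.median(scores))
```

For an even number of scores, `statistics.median` returns the average of the two middle scores.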


8/32

Mode

Mode: most frequently occurring score

Which of the following scores is the mode?

Unimodal: 3, 7, 3, 9, 9, 3, 5, 1, 8, 5

Bimodal: 2, 4, 9, 6, 4, 6, 6, 2, 8, 2

Multimodal: 7, 7, 6, 6, 5, 5, 4, 4
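As a quick check of the three examples, the sketch below finds the most frequent score or scores in each list. It assumes Python 3.8 or later, where `statistics.multimode` is available; the dictionary name is illustrative.

```python
from statistics import multimode

examples = {
    "unimodal": [3, 7, 3, 9, 9, 3, 5, 1, 8, 5],
    "bimodal": [2, 4, 9, 6, 4, 6, 6, 2, 8, 2],
    "multimodal": [7, 7, 6, 6, 5, 5, 4, 4],
}

# multimode returns every score that shares the highest frequency,
# in the order in which each score first appears.
modes = {name: multimode(scores) for name, scores in examples.items()}

for name, found in modes.items():
    print(name, found)
```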


9/32

Mean versus Median

The median is not influenced by extreme sample values and is a better measure of central tendency if the distribution is skewed.

If mean = median = mode, the distribution is symmetrical, as in a normal distribution
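A small sketch of why the median is preferred for skewed data. Both data sets are made up for illustration; the second differs from the first only in its largest value.

```python
import statistics

symmetric = [10, 12, 14, 16, 18]   # hypothetical, evenly spread values
skewed = [10, 12, 14, 16, 180]     # same values, but with one extreme observation

mean_symmetric = statistics.mean(symmetric)
median_symmetric = statistics.median(symmetric)
mean_skewed = statistics.mean(skewed)
median_skewed = statistics.median(skewed)

# The single extreme value pulls the mean upward but leaves the median unchanged.
print("symmetric: mean", mean_symmetric, "median", median_symmetric)
print("skewed:    mean", mean_skewed, "median", median_skewed)
```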


10/32

Descriptive Statistics: Variability

Measures of variability: extent of similarity or difference in a set of data

E.g. range, standard deviation, variance


11/32

Standard Deviation (SD)

Standard deviation (s): a measure of the variability, or spread, of a set of scores around the mean

Based on the differences between each score and the mean (known as deviation scores)

A good approach for measuring variability around the mean


12/32

Standard Deviation

The sample standard deviation, s, is the square root of the sample variance:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$


13/32


14/32

Variance

Square of the standard deviation (s^2)

Used in: regression analysis, analysis of variance (ANOVA), and the determination of the reliability of a test

Also known as the mean square (MS)


15/32

Sample Variance

The sample variance, s^2, is the sum of the squared deviations from the sample mean divided by n - 1 (approximately their arithmetic mean):

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
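A minimal sketch of the sample variance and standard deviation, assuming the usual n - 1 denominator. The five scores are reused from the median example purely for illustration.

```python
import math
import statistics

scores = [71, 72, 73, 74, 75]

n = len(scores)
mean = sum(scores) / n

# Deviation scores: difference between each score and the mean, then squared.
squared_deviations = [(x - mean) ** 2 for x in scores]

sample_variance = sum(squared_deviations) / (n - 1)  # s^2
sample_sd = math.sqrt(sample_variance)               # s

print(sample_variance, sample_sd)

# The standard library uses the same n - 1 definitions.
print(statistics.variance(scores), statistics.stdev(scores))
```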


16/32

Normal Distribution of Data

Ways to assess normality:
Graphical assessment of normality (probability plots)
Shapiro-Wilk's test (W-statistic)
D'Agostino test (D-statistic)
Goodness-of-fit tests (e.g. Kolmogorov-Smirnov test)

Data normally distributed: parametric test
Data not normally distributed: transformation or non-parametric test

Transformations: e.g. log (growth rate), square root (density data), arcsine (%, ratio data)
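The three transformations can be sketched as follows. All values are made up for illustration. Two choices here are assumptions, not taken from the slide: the logarithm is base 10, and the arcsine transformation is applied in its common arcsine-square-root form to proportions (percentages divided by 100).

```python
import math

growth_rates = [1.0, 10.0, 100.0]      # hypothetical growth rates (must be > 0)
densities = [4.0, 9.0, 16.0]           # hypothetical density counts
proportions = [0.0, 0.25, 0.5, 1.0]    # hypothetical percentages / 100

# Log transformation (base 10 assumed).
log_growth = [math.log10(x) for x in growth_rates]

# Square-root transformation for density data.
sqrt_density = [math.sqrt(x) for x in densities]

# Arcsine square-root transformation for proportions, result in radians.
arcsine_proportion = [math.asin(math.sqrt(p)) for p in proportions]

print(log_growth)
print(sqrt_density)
print(arcsine_proportion)
```

After transforming, normality should be checked again before choosing a parametric test.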


17/32

Univariate Analysis

t-test: difference between two mean values

Analysis of variance (ANOVA)


18/32

t-test

Comparison of two mean values
E.g. density data between two sites
Difference between control and experiment

One-tailed: tests for a difference in one direction only

Two-tailed: tests for a difference in both directions, i.e. higher or lower than the mean


19/32

t-test contd.

Independent t-test: comparing unrelated data
E.g. male and female, or two different sites

Dependent t-test: data that are related, e.g. before and after
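The difference between the two designs shows up in how the t statistic is built. The sketch below computes only the t statistic and its degrees of freedom; the p value would come from a t table or a statistics package. The independent version assumes equal variances in the two groups (Student's pooled form), and all data are made up for illustration.

```python
import math
from statistics import mean, variance


def independent_t(a, b):
    """Student's t for two unrelated samples, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


def dependent_t(before, after):
    """Paired t: a one-sample t on the within-pair differences."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    standard_error = math.sqrt(variance(diffs) / n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical density data from two different sites (unrelated samples).
site_a = [10, 12, 14, 16, 18]
site_b = [20, 22, 24, 26, 28]
t_ind, df_ind = independent_t(site_a, site_b)

# Hypothetical measurements on the same five units before and after a treatment.
before = [10, 12, 14, 16, 18]
after = [11, 15, 16, 19, 19]
t_dep, df_dep = dependent_t(before, after)

print(t_ind, df_ind)
print(t_dep, df_dep)
```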


20/32

Analysis of Variance (ANOVA)

One-way: one independent variable, e.g. site, month or season

Two-way: 2 independent variables, e.g. site and month

Factorial: > 2 independent variables, e.g. transect, site (area) and month/season
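For the one-way case, the F ratio can be built directly from the group data. This is a minimal sketch with three made-up groups (for example three sites); the function name and the returned keys are illustrative, and the p value is left to an F table or a statistics package.

```python
from statistics import mean


def one_way_anova(groups):
    """Return SS, df, MS and F for a one-way ANOVA on a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares: values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Hypothetical values for three sites, three replicates each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```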


21/32

ANOVA

OC          SS        Degr. of freedom   MS         F         p
Intercept   2.33635   1                  2.336346   112.024   0.0000
station     0.86437   9                  0.096041   4.6050    0.0021
Error       0.41712   20                 0.020856

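SS: sum of squares
Degrees of freedom = n - 1
MS: mean square (SS divided by its degrees of freedom)
F: ratio of the effect mean square to the residual (error) mean square. The F value should be greater than the cut-off (critical) value
p: significant at the 95% confidence level when p < 0.05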
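The relations between the columns can be checked against the station and error rows of the table above. This sketch only redoes the arithmetic from the printed SS and degrees of freedom; small differences in the last digit come from rounding in the table.

```python
# Values copied from the OC table (station and Error rows).
ss_station, df_station = 0.86437, 9
ss_error, df_error = 0.41712, 20

# MS = SS / degrees of freedom.
ms_station = ss_station / df_station
ms_error = ss_error / df_error

# F = effect mean square / residual (error) mean square.
f_station = ms_station / ms_error

print(round(ms_station, 6), round(ms_error, 6), round(f_station, 4))
```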


22/32

One way ANOVA

OC          SS        Degr. of freedom   MS         F         p
Intercept   2.33635   1                  2.336346   112.024   0.0000
station     0.86437   9                  0.096041   4.6050    0.0021
Error       0.41712   20                 0.020856

Phaeopigment   SS        Degr. of freedom   MS        F         p
Intercept      0.02431   1                  0.02431   7.08660   0.01496
station        0.02900   9                  0.00322   0.93941   0.51402
Error          0.06861   20                 0.00343


23/32

Two way ANOVA

             SS         Degr. of freedom   MS         F          p
Intercept    272.8705   1                  272.8705   575.1452   0.000000
season       9.4421     2                  4.7210     9.9508     0.000185
Stn          15.3212    9                  1.7024     3.5882     0.001262
season*Stn   13.9800    18                 0.7767     1.6370     0.079326
Error        28.4663    60                 0.4744

              SS         Degr. of freedom   MS         F          p
Intercept     15.11210   1                  15.11210   290.6176   0.000
season        0.02009    2                  0.01005    0.1932     0.824
Tide          0.54502    2                  0.27251    5.2406     0.0072
season*Tide   0.57176    4                  0.14294    2.7489     0.0336
Error         4.21200    81                 0.05200


24/32

Post hoc test

    season  Tide   {1}     {2}     {3}    {4}    {5}     {6}    {7}    {8}    {9}
                   0.161   .533    .53    .46    .458    .40    .33    .49    .34
1   1       1
2   1       2      0.013
3   1       3      0.011   1.00
4   2       1      0.211   0.97    0.97
5   2       2      0.11    0.99    0.99   0.99
6   2       3      0.30    0.939   0.92   1.00   0.99
7   3       1      0.75    0.577   0.54   0.99   0.96    0.99
8   3       2      0.037   0.999   0.99   0.99   0.99    0.99   0.79
9   3       3      0.707   0.624   0.59   0.99   0.973   0.99   1.00   0.83


25/32

Factorial ANOVA

Abundance

Source    df   MS         F       P
Season    2    43365239   10.30   0.00005125
Stn       9    14757634   3.50    0.00042912
tide      2    50601728   11.54   0.00001643
S x Stn   18   19414787   4.61    0.00000001
S x T     4    7078555    1.35    0.25004741
Stn x T   18   16288388   3.71    0.00000147


26/32

Correlation

A linear relationship between two variables

Pearson's (r and p): parametric

Spearman (rho and p): non-parametric

Relation: positive or negative (r ranges from -1 through 0 to +1)
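A minimal sketch of both coefficients, written out by hand so the link between them is visible: Spearman's rho is Pearson's r computed on ranks. The ranking helper assumes there are no tied values, the data are made up, and p values are not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y_curved = [1, 4, 9, 16, 25]      # increases with x, but not in a straight line
y_decreasing = [10, 8, 6, 4, 2]   # perfect negative linear relation

r_curved = pearson_r(x, y_curved)
rho_curved = spearman_rho(x, y_curved)
r_negative = pearson_r(x, y_decreasing)

print(r_curved, rho_curved, r_negative)
```

The curved example gives rho = 1 but r slightly below 1, because Spearman measures any monotonic relation while Pearson measures only the linear one.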


27/32

Multiple Regression

Correlation of one variable (e.g. biological) with 2 or more variables (e.g. environmental)

Multiple Regression Results

Dependent: BR
Multiple R = .77193627    F = 6.881218
R² = .59588560            df = 3, 14
No. of cases: 18          adjusted R² = .50928966    p = .004444
Standard error of estimate: 23.371691805
Intercept: -1.680825308   Std. Error: 7.052847   t(14) = -.2383   p = .8151

FF beta = .271   GR beta = .499   GR/DR beta = .291
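The summary figures in this output are linked by standard formulas, which can be checked from Multiple R, the number of cases and the number of predictors alone. In the sketch below the value .59588560 is treated as the square of Multiple R, and the three predictors are taken to be FF, GR and GR/DR, as suggested by df = 3, 14; variable names are illustrative.

```python
# Values copied from the regression output.
multiple_r = 0.77193627
n_cases = 18
n_predictors = 3  # FF, GR and GR/DR (assumed from df = 3, 14)

r_squared = multiple_r ** 2

df_regression = n_predictors
df_residual = n_cases - n_predictors - 1

# Adjusted R^2 penalises R^2 for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / df_residual

# Overall F test of the regression.
f_statistic = (r_squared / df_regression) / ((1 - r_squared) / df_residual)

# t for the intercept = estimate / standard error.
t_intercept = -1.680825308 / 7.052847

print(round(r_squared, 8), round(adjusted_r_squared, 8))
print(df_regression, df_residual, round(f_statistic, 6), round(t_intercept, 4))
```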


28/32

Multivariate Analyses

Cluster and nMDS
SIMPER
ANOSIM
BIOENV
Principal Component Analysis (PCA)
Canonical Correspondence Analysis (CCA)
PRIMER E and MVSTEP


29/32

Test for Normality of data

Histogram plot
Check for skewness and kurtosis

Kolmogorov-Smirnov test: used if the data sets are unequal in size, e.g. Station 1 (10 replicates/station), Station 2 (7 replicates/station)

Shapiro-Wilk's test (W-statistic)
D'Agostino test (D-statistic)
Lilliefors test

Normal distribution (p > 0.05): parametric analysis (t-test, ANOVA, Pearson correlation)

Not normal distribution: apply a transformation and check for normality again
  Normal distribution after transformation: parametric analysis
  Not normal distribution after transformation: non-parametric analysis
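The skewness and kurtosis check can be sketched in plain Python. This uses the simple moment-based (population) definitions, which is an assumption: statistical packages often apply small-sample corrections and will give slightly different numbers. The data are made up, and the formal tests listed above (Shapiro-Wilk and others) are not reproduced here.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both near 0 for normal data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric data
right_skewed = [1, 1, 1, 2, 10]   # hypothetical data with a long right tail

skew_sym, kurt_sym = skewness_and_excess_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_excess_kurtosis(right_skewed)

print(skew_sym, kurt_sym)
print(skew_right, kurt_right)
```

A skewness far from 0 (here positive for the right-skewed set) is a sign that a transformation or a non-parametric test may be needed.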


30/32

Analysis type | Example | Parametric test | Non-parametric test

Compare mean between 2 independent groups | Abundance variation between Mandovi and Zuari | Independent t-test | Wilcoxon rank-sum test

Compare two quantitative measurements from the same individual | Difference before and after | Dependent t-test | Wilcoxon signed-rank test

Compare mean between > 2 groups | Abundance between Mandovi, Zuari, Chapora, Sal | One-way ANOVA | Kruskal-Wallis test

Estimate relation between 1 dependent and 1 independent variable | Relation of biotic and abiotic data | Pearson correlation (r from -1 through 0 to +1; p < 0.05) | Spearman correlation (-1 through 0 to +1; p < 5%)

Estimate relation between 1 dependent and > 2 independent variables | Relation of phytoplankton density with temperature, salinity, DO etc. | Multiple regression (check the beta value and p) |
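To show what a rank-based alternative looks like, here is a minimal sketch of the statistic behind the Wilcoxon rank-sum test, in its Mann-Whitney U form. It assumes no tied values, uses made-up abundance figures for two groups, and stops at the U statistic; the p value would come from a table or a statistics package.

```python
def rank_sum_u(a, b):
    """Return (U_a, U_b) for two independent samples; assumes no tied values."""
    combined = sorted(a + b)
    # Rank 1 goes to the smallest value across both samples.
    rank_of = {value: rank for rank, value in enumerate(combined, start=1)}
    rank_sum_a = sum(rank_of[x] for x in a)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Hypothetical abundance values for two estuaries (not real survey data).
group_1 = [12, 15, 11, 18]
group_2 = [22, 25, 19, 30]

u_1, u_2 = rank_sum_u(group_1, group_2)
print(u_1, u_2)
```

Because the test works on ranks only, it does not require the abundance values to be normally distributed. A U of 0 for one group means every value in that group is smaller than every value in the other.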


31/32

Take-home points

Parametric and nonparametric are two broad classifications of statistical procedures.

Parametric tests are based on assumptions about the distribution of the underlying population from which the sample was taken. The most common parametric assumption is that data are approximately normally distributed.

Nonparametric tests do not rely on assumptions about the shape or parameters of the underlying population distribution.

If the data deviate strongly from the assumptions of a parametric procedure, using the parametric procedure could lead to incorrect conclusions.

You should be aware of the assumptions associated with a parametric procedure (normality test, e.g. Shapiro-Wilk test or histogram).

If you determine that the assumptions of the parametric procedure are not valid, use an analogous nonparametric procedure instead (previous slide).

Nonparametric tests are often a good option for small data sets (n < 30).

Nonparametric procedures generally have less power.

Interpretation of nonparametric procedures can also be more difficult than for parametric procedures.


32/32

Thank you!

Next Saturday ?????
