stat_19 july fnl

Upload: srinivas-darsipudi

Post on 03-Jun-2018


TRANSCRIPT

  • 8/12/2019 Stat_19 July Fnl

    1/32


    [Figures: Bivalvia species of India: Delta+ vs Lambda+; Delta+ vs number of species; cluster dendrogram (Resemblance: Gamma+, dissimilarity axis) and 2D nMDS plot (2D Stress: 0.1). Regions: GJ(129), MH(101), GA(75), KA(37), KL(60), LK(81), OR(164), WB(92), AP(100), TN(269), AN(252).]

    Statistics

    Reliability: significance

    Strength of relationship: meaningfulness

    t-test, ANOVA

    [Figures: Env of Mandovi: cluster dendrogram (B % axis, 0-100) with sample groups A (1), B (6, 8) and C (2-4, 5, 7). Macrofauna of Mandovi: group-average cluster dendrogram of samples MOR'07, MOR'08, MOR'09, LSR'08, LSR'09, PMonR'07, PMR'07 and EMR'07 (similarity axis 20-100; Transform: Log(X+1); Resemblance: S17 Bray-Curtis similarity).]


    Types of Statistics

    1. Descriptive: e.g. mean, median, standard deviation, standard error, variance

    2. Correlation: relationship between parameters

    3. Inferential: differences between or within groups


    Descriptive Statistics

    Mean: arithmetic average of the scores; it takes into account both the number of scores and their values.

    Median: middle point of an ordered distribution, with an equal number of scores on each side.

    Mode: most frequently occurring score.


    Median Example: 71, 73, 74, 75, 72

    Step One: Place the scores in order from lowest to highest: 71, 72, 73, 74, 75

    Step Two: Calculate the position of the median using the formula (n + 1)/2:

    Mdn position = (5 + 1)/2 = 3rd score, so the median is 73
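The two steps above can be sketched as a small Python function. This is an illustrative sketch, not a prescribed method; the name `median_by_position` is hypothetical, and the even-n branch (averaging the two middle scores) is the usual convention when (n + 1)/2 falls between two positions.

```python
def median_by_position(scores):
    """Median found with the (n + 1) / 2 position rule."""
    ordered = sorted(scores)            # Step one: order lowest to highest
    n = len(ordered)
    position = (n + 1) / 2              # Step two: 1-based position of the median
    if position.is_integer():           # odd n: a single middle score
        return ordered[int(position) - 1]
    # even n: the position falls between two scores, so average them
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_by_position([71, 73, 74, 75, 72]))  # 73
```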


    Mode

    Mode: most frequently occurring score

    Which score is the mode in each of the following sets?

    Unimodal: 3, 7, 3, 9, 9, 3, 5, 1, 8, 5

    Bimodal: 2, 4, 9, 6, 4, 6, 6, 2, 8, 2

    Multimodal: 7, 7, 6, 6, 5, 5, 4 and 4
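The three example sets can be checked with Python's standard `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest count rather than a single mode. A short sketch:

```python
from statistics import multimode

examples = {
    "unimodal": [3, 7, 3, 9, 9, 3, 5, 1, 8, 5],
    "bimodal": [2, 4, 9, 6, 4, 6, 6, 2, 8, 2],
    "multimodal": [7, 7, 6, 6, 5, 5, 4, 4],
}

for label, scores in examples.items():
    # multimode lists all values sharing the highest count,
    # in the order each value first appears in the data
    print(label, multimode(scores))

# unimodal [3]
# bimodal [2, 6]
# multimodal [7, 6, 5, 4]
```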


    Mean versus Median

    The median is not influenced by extreme values and is a better measure of central tendency when the distribution is skewed.

    In a symmetrical, unimodal distribution such as the normal distribution, mean = median = mode.
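A small sketch of why the median is preferred for skewed data, using hypothetical abundance counts (not data from these slides) that contain one very large value:

```python
from statistics import mean, median

# Hypothetical abundance counts with one very large value (right-skewed)
counts = [12, 14, 15, 15, 16, 18, 120]
print(mean(counts))    # 30: pulled upward by the single large count
print(median(counts))  # 15: the middle of the ordered scores

# Making the large value ten times larger moves the mean but not the median
counts_extreme = counts[:-1] + [1200]
print(mean(counts_extreme), median(counts_extreme))
```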


    Descriptive Statistics: Variability

    Measures of variability describe the extent of similarity or difference in a set of data.

    E.g. range, standard deviation, variance
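The three measures of variability listed above can be computed with Python's standard `statistics` module. This sketch reuses the scores from the median example; `stdev` and `variance` use the sample (n - 1) divisor.

```python
from statistics import stdev, variance

scores = [71, 72, 73, 74, 75]           # ordered scores from the median example

data_range = max(scores) - min(scores)  # range: highest minus lowest score
sd = stdev(scores)                      # sample standard deviation (divisor n - 1)
var = variance(scores)                  # sample variance, equal to sd squared

print(data_range, round(sd, 3), var)    # 4 1.581 2.5
```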


    Standard Deviation (SD)

    Standard deviation (s): a measure of the variability, or spread, of a set of scores around the mean

    Based on the differences between each score and the mean (known as deviation scores); because deviation scores always sum to zero, they are squared before being averaged

    A good approach for measuring variability around the mean


    Standard Deviation

    The sample standard deviation, s, is the square root of the variance:

    $$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
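The formula for s translates directly into a few lines of Python. A minimal sketch (the function name `sample_sd` is hypothetical) that also shows why the deviation scores must be squared: on their own they cancel out.

```python
import math


def sample_sd(xs):
    """Sample standard deviation: square root of SS / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n                    # sample mean
    deviations = [x - x_bar for x in xs]   # deviation scores
    ss = sum(d * d for d in deviations)    # sum of squared deviations
    return math.sqrt(ss / (n - 1))


scores = [71, 72, 73, 74, 75]
# The raw deviation scores cancel out, which is why they are squared first
print(sum(x - sum(scores) / len(scores) for x in scores))  # 0.0
print(sample_sd(scores))                                   # about 1.581
```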


    Variance

    Square of the standard deviation (s²)

    Used in regression analysis, analysis of variance (ANOVA) and the determination of the reliability of a test

    In ANOVA, a variance estimate (a sum of squares divided by its degrees of freedom) is known as the mean square (MS)


    Sample Variance

    $$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$

    The sample variance, s², is the sum of the squared deviations from the sample mean divided by n - 1 rather than n, which makes it an unbiased estimate of the population variance.
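A quick sketch contrasting the n - 1 divisor (sample variance) with the n divisor (population variance), using a small hypothetical sample. Python's `statistics.variance` uses n - 1 and `statistics.pvariance` uses n.

```python
from statistics import pvariance, variance

xs = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical sample
x_bar = sum(xs) / len(xs)                # 5.0
ss = sum((x - x_bar) ** 2 for x in xs)   # sum of squared deviations: 32.0

s2 = ss / (len(xs) - 1)                  # sample variance, divisor n - 1
print(s2, variance(xs))                  # both about 4.571
print(pvariance(xs))                     # population formula (divisor n): equals 4
```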


    Normal Distribution of Data

    Ways to assess normality: graphical assessment (probability plots), Shapiro-Wilk's test (W-statistic), D'Agostino test (D-statistic), goodness-of-fit tests (e.g. Kolmogorov-Smirnov test)

    Data normally distributed: parametric test. Not normally distributed: transformation or non-parametric test

    E.g. log (growth rate), square root (density data), arcsine square root (percentage and proportion data)
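A minimal sketch of the three transformations. It assumes log(x + 1) in base 10 so that zero counts stay defined (the Log(X+1) transform also appears in the Mandovi cluster figure), and that percentages are converted to proportions before the arcsine square-root step; the function names are hypothetical.

```python
import math


def log_transform(x):
    """log10(x + 1): the +1 keeps zero counts defined (base 10 assumed)."""
    return math.log10(x + 1)


def sqrt_transform(x):
    """Square-root transform, e.g. for density counts."""
    return math.sqrt(x)


def arcsine_transform(percent):
    """Arcsine square-root transform of a percentage, returned in radians."""
    proportion = percent / 100          # convert % to a 0-1 proportion first
    return math.asin(math.sqrt(proportion))


densities = [0, 3, 15, 99]
print([round(log_transform(d), 3) for d in densities])
print([round(sqrt_transform(d), 3) for d in densities])
print(round(arcsine_transform(50), 4))  # pi/4, about 0.7854
```

After transforming, the normality check should be repeated on the transformed values.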


    Univariate Analysis

    t-test: difference between two mean values

    Analysis of variance (ANOVA)


    t-test

    Comparison of two mean values, e.g. density data between two sites, or the difference between control and experiment

    One-tailed: tests for a difference in one specified direction only (e.g. site A higher than site B)

    Two-tailed: tests for a difference in either direction, i.e. higher or lower than the other mean
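To show how the one-tailed and two-tailed decisions can differ for the same data, here is a sketch of Student's t for two independent samples (pooled variance, equal variances assumed). The site data are hypothetical, and the critical values for df = 8 at alpha = 0.05 (2.306 two-tailed, 1.860 one-tailed) are taken from a standard t table rather than computed.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))  # standard error of the difference
    return (mean(a) - mean(b)) / se, df


# Hypothetical densities at two sites, 5 replicates each
site_a = [12, 14, 15, 16, 18]
site_b = [9, 11, 12, 13, 15]
t, df = pooled_t(site_a, site_b)

# Critical t values for df = 8 and alpha = 0.05, from a standard t table
T_CRIT_TWO_TAILED = 2.306   # difference in either direction
T_CRIT_ONE_TAILED = 1.860   # H1 stated in advance: site A > site B

print(round(t, 3), df)                                        # 2.121 8
print("two-tailed significant:", abs(t) > T_CRIT_TWO_TAILED)  # False
print("one-tailed significant:", t > T_CRIT_ONE_TAILED)       # True
```

The one-tailed test is only legitimate when the direction was specified before looking at the data.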


    t-test (contd.)

    Independent t-test: compares unrelated data, e.g. male and female, or two different sites

    Dependent (paired) t-test: compares related data, e.g. before and after
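A dependent (paired) t-test is a one-sample t-test on the within-pair differences. A minimal sketch with hypothetical before/after values; the function name `paired_t` is illustrative only.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Dependent (paired) t: one-sample t-test on the within-pair differences."""
    diffs = [x_after - x_before for x_before, x_after in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Hypothetical values measured on the same 5 units before and after a treatment
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 16]
t, df = paired_t(before, after)
print(round(t, 3), df)
```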


    Analysis of Variance (ANOVA)

    One-way: one independent variable (factor), e.g. site, month or season

    Two-way: two independent variables, e.g. site and month

    Factorial: more than two independent variables, e.g. transect, site (area) and month/season
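For the one-way case, the F ratio can be computed by hand by splitting the total variation into between-group and within-group sums of squares. A sketch with hypothetical values from three sites (the function name is illustrative):

```python
from statistics import mean


def one_way_anova(groups):
    """Return F, df_between and df_within for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n_total = len(groups), len(values)
    group_means = [mean(g) for g in groups]
    # Between-group SS: spread of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group (error) SS: spread of each value around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical values from three sites, 3 replicates each
print(one_way_anova([[3, 4, 5], [6, 7, 8], [9, 10, 11]]))  # (27.0, 2, 6)
```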


    ANOVA: OC

    Source      SS        df   MS         F         p
    Intercept   2.33635   1    2.336346   112.024   0.0000
    station     0.86437   9    0.096041   4.6050    0.0021
    Error       0.41712   20   0.020856

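    SS: sum of squares. df: degrees of freedom (number of groups - 1 for the factor; total number of observations - number of groups for the error term). MS: mean square (SS/df). F: ratio of the factor mean square to the residual (error) mean square; the effect is significant when F exceeds the critical F value for its degrees of freedom, i.e. when p < 0.05 (95% confidence).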
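The OC table can be checked by hand: MS = SS/df and F = MS(station)/MS(Error). This sketch copies SS and df from that table; the station count and total number of observations are inferred from the degrees of freedom, not stated on the slide. If SciPy is installed, `scipy.stats.f.sf(f_station, 9, 20)` gives the upper-tail p value for the computed F.

```python
# SS and df copied from the OC one-way ANOVA table
ss = {"station": 0.86437, "Error": 0.41712}
df = {"station": 9, "Error": 20}

ms = {source: ss[source] / df[source] for source in ss}  # MS = SS / df
f_station = ms["station"] / ms["Error"]                   # F = MS_factor / MS_error

# df_station = k - 1 and df_error = N - k imply k = 10 stations and N = 30 observations
k = df["station"] + 1
n_total = df["Error"] + k

print(round(ms["station"], 6), round(ms["Error"], 6), round(f_station, 3), k, n_total)
```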


    One-way ANOVA

    OC
    Source      SS        df   MS         F         p
    Intercept   2.33635   1    2.336346   112.024   0.0000
    station     0.86437   9    0.096041   4.6050    0.0021
    Error       0.41712   20   0.020856

    Phaeopigment
    Source      SS        df   MS         F         p
    Intercept   0.02431   1    0.02431    7.08660   0.01496
    station     0.02900   9    0.00322    0.93941   0.51402
    Error       0.06861   20   0.00343


    Two-way ANOVA

    Source        SS         df   MS         F          p
    Intercept     272.8705   1    272.8705   575.1452   0.000000
    season        9.4421     2    4.7210     9.9508     0.000185
    Stn           15.3212    9    1.7024     3.5882     0.001262
    season*Stn    13.9800    18   0.7767     1.6370     0.079326
    Error         28.4663    60   0.4744

    Source        SS         df   MS         F          p
    Intercept     15.11210   1    15.11210   290.6176   0.000
    season        0.02009    2    0.01005    0.1932     0.824
    Tide          0.54502    2    0.27251    5.2406     0.0072
    season*Tide   0.57176    4    0.14294    2.7489     0.0336
    Error         4.21200    81   0.05200
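The degrees of freedom in a balanced two-way ANOVA follow a fixed pattern, which is a quick way to check a table. In this sketch the replicate counts (3 per season x station cell, 10 per season x tide cell) are assumptions inferred from the Error df, not values stated on the slide.

```python
def two_way_df(a, b, n):
    """Degrees of freedom for a balanced two-way ANOVA: a x b cells, n replicates per cell."""
    return {
        "A": a - 1,
        "B": b - 1,
        "A*B": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
    }


# 3 seasons x 10 stations x 3 replicates (assumed) -> 2, 9, 18, 60
print(two_way_df(3, 10, 3))
# 3 seasons x 3 tides x 10 replicates (assumed) -> 2, 2, 4, 81
print(two_way_df(3, 3, 10))
```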


    Post hoc test: pairwise p values for the season x tide groups {1} to {9}

    Group  Season  Tide  Mean   {1}    {2}    {3}    {4}    {5}    {6}    {7}    {8}
    {1}    1       1     0.161
    {2}    1       2     .533   0.013
    {3}    1       3     .53    0.011  1.00
    {4}    2       1     .46    0.211  0.97   0.97
    {5}    2       2     .458   0.11   0.99   0.99   0.99
    {6}    2       3     .40    0.30   0.939  0.92   1.00   0.99
    {7}    3       1     .33    0.75   0.577  0.54   0.99   0.96   0.99
    {8}    3       2     .49    0.037  0.999  0.99   0.99   0.99   0.99   0.79
    {9}    3       3     .34    0.707  0.624  0.59   0.99   0.973  0.99   1.00   0.83
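A post hoc matrix is read by looking for pairs whose p value falls below the chosen alpha. This sketch assumes the values are read row by row from the lower-triangular matrix on this slide (row {i} lists p for {i} against {1}, {2}, ...) and uses alpha = 0.05.

```python
# Post hoc p values read from the matrix: row i lists p for group {i} vs {1}, {2}, ...
rows = {
    2: [0.013],
    3: [0.011, 1.00],
    4: [0.211, 0.97, 0.97],
    5: [0.11, 0.99, 0.99, 0.99],
    6: [0.30, 0.939, 0.92, 1.00, 0.99],
    7: [0.75, 0.577, 0.54, 0.99, 0.96, 0.99],
    8: [0.037, 0.999, 0.99, 0.99, 0.99, 0.99, 0.79],
    9: [0.707, 0.624, 0.59, 0.99, 0.973, 0.99, 1.00, 0.83],
}

ALPHA = 0.05
significant = [
    (i, j + 1)                         # (group {i}, group {j + 1})
    for i, p_values in rows.items()
    for j, p in enumerate(p_values)
    if p < ALPHA
]
print(significant)  # [(2, 1), (3, 1), (8, 1)]
```

Under this reading, only group {1} (season 1, tide 1) differs significantly from other groups.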


    Factorial ANOVA: Abundance

    Source     df   MS         F       P
    Season     2    43365239   10.30   0.00005125
    Stn        9    14757634   3.50    0.00042912
    tide       2    50601728   11.54   0.00001643
    S x Stn    18   19414787   4.61    0.00000001
    S x T      4    7078555    1.35    0.25004741
    Stn x T    18   16288388   3.71    0.00000147


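    Correlation

    Relationship between two variables

    Pearson (r and p): parametric; measures a linear relationship

    Spearman (rho and p): non-parametric; based on ranks, so it measures a monotonic relationship

    Relationship: positive or negative (r ranges from -1 through 0 to +1)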
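A sketch of both coefficients in plain Python: Pearson's r from deviation products, and Spearman's rho as Pearson's r computed on ranks (tied values receive the average rank). Python 3.10+ also provides `statistics.correlation`, and 3.12 adds a `method="ranked"` option for Spearman; the hand-written version below shows the mechanics.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from sums of deviation products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman_rho(x, y):
    """Spearman correlation: Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # monotonic but not linear
print(round(pearson_r(x, y), 3), spearman_rho(x, y))
```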


    Multiple Regression

    Relation of one variable (e.g. biological) to two or more variables (e.g. environmental)

    Multiple Regression Results

    Dependent: BR    Multiple R = .77193627    F = 6.881218
    R² = .59588560    df = 3, 14
    No. of cases: 18    Adjusted R² = .50928966    p = .004444
    Standard error of estimate: 23.371691805
    Intercept: -1.680825308    Std. error: 7.052847    t(14) = -.2383    p = .8151

    FF beta = .271    GR beta = .499    GR/DR beta = .291
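The summary lines of the regression output are linked: R² is Multiple R squared, adjusted R² penalises the number of predictors, and the overall F follows from R² and the degrees of freedom. This sketch recomputes them from Multiple R, the 18 cases and the 3 predictors shown on the slide.

```python
# Summary values copied from the multiple regression output
multiple_r = 0.77193627
n_cases = 18
k = 3                    # predictors with beta values: FF, GR and GR/DR

r2 = multiple_r ** 2                                       # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n_cases - 1) / (n_cases - k - 1)  # penalises extra predictors
df_model, df_resid = k, n_cases - k - 1                    # df = 3, 14
f_stat = (r2 / df_model) / ((1 - r2) / df_resid)           # overall F test

print(round(r2, 4), round(adj_r2, 4), df_model, df_resid, round(f_stat, 3))
```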


    Multivariate Analyses

    Cluster analysis and nMDS, SIMPER, ANOSIM, BIOENV

    Principal Component Analysis (PCA), Canonical Correspondence Analysis (CCA)

    PRIMER-E and MVSTEP
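Cluster analysis and nMDS in this kind of study usually start from a similarity matrix, such as the Bray-Curtis similarity on Log(X+1)-transformed abundances used for the Mandovi macrofauna. A minimal sketch for one pair of hypothetical samples; it assumes a natural log, though any log base gives the same Bray-Curtis value because numerator and denominator scale together.

```python
import math


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0-100) between two samples of species abundances."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return 100 * (1 - numerator / denominator)


def log_x_plus_1(sample):
    """Log(X+1) transform; natural log assumed (the base does not change Bray-Curtis)."""
    return [math.log1p(x) for x in sample]


# Hypothetical abundances of four species in two samples
s1 = [10, 0, 5, 120]
s2 = [8, 2, 0, 60]
print(round(bray_curtis_similarity(s1, s2), 1))
print(round(bray_curtis_similarity(log_x_plus_1(s1), log_x_plus_1(s2)), 1))
```

Repeating this for every pair of samples gives the similarity matrix that a group-average cluster or an nMDS works from.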


    Test for Normality of Data

    1. Histogram plot: check for skewness and kurtosis

    2. Normality tests: Kolmogorov-Smirnov test (used if data sets are unequal, e.g. Station 1 with 10 replicates and Station 2 with 7 replicates), Shapiro-Wilk's test (W-statistic), D'Agostino test (D-statistic), Lilliefors test

    3. Normal distribution (p > 0.05): parametric analysis (t-test, ANOVA, Pearson correlation)

    4. Not normally distributed: transform the data and check for normality again
       Normal distribution: parametric analysis
       Still not normal: non-parametric analysis
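Skewness and kurtosis, the two shape measures to check on a histogram, can be computed from the central moments. This sketch uses the simple moment-based estimators (g1 and excess kurtosis g2); statistical packages often report bias-adjusted versions, so their values may differ slightly for small samples.

```python
def skewness_kurtosis(xs):
    """Moment-based sample skewness (g1) and excess kurtosis (g2)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n   # third central moment
    m4 = sum((x - m) ** 4 for x in xs) / n   # fourth central moment
    g1 = m3 / m2 ** 1.5                      # 0 for a symmetrical sample
    g2 = m4 / m2 ** 2 - 3                    # 0 for a normal distribution
    return g1, g2


print(skewness_kurtosis([1, 2, 3, 4, 5]))    # symmetrical: g1 = 0
print(skewness_kurtosis([1, 1, 1, 2, 10]))   # right-skewed: g1 > 0
```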


    Analysis Type, Example, Parametric Test and Non-parametric Test

    Compare means between 2 independent groups
      Example: abundance variation between Mandovi and Zuari
      Parametric: independent t-test
      Non-parametric: Wilcoxon rank-sum test

    Compare two quantitative measurements from the same individual
      Example: difference before and after
      Parametric: dependent (paired) t-test
      Non-parametric: Wilcoxon signed-rank test

    Compare means between > 2 groups
      Example: abundance between Mandovi, Zuari, Chapora and Sal
      Parametric: one-way ANOVA
      Non-parametric: Kruskal-Wallis test

    Estimate the relation between 1 dependent and 1 independent variable
      Example: relation of biotic and abiotic data
      Parametric: Pearson correlation (r from -1 to +1; significant if p < 0.05)
      Non-parametric: Spearman correlation (rho from -1 to +1; significant if p < 0.05)

    Estimate the relation between 1 dependent and 2 or more independent variables
      Example: relation of phytoplankton density with temperature, salinity, DO etc.
      Parametric: multiple regression (check the beta values and p)
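The Wilcoxon rank-sum test is equivalent to the Mann-Whitney U test: both samples are ranked together and the rank sum of one sample is converted to U. A minimal sketch of the statistic only (the function name is hypothetical); the decision still needs a critical-value table or a normal approximation, or a library routine such as `scipy.stats.mannwhitneyu` if SciPy is available.

```python
def rank_sum_u(a, b):
    """Mann-Whitney U (equivalent to the Wilcoxon rank-sum test) for two independent samples."""
    combined = [(v, 0) for v in a] + [(v, 1) for v in b]
    combined.sort(key=lambda item: item[0])   # rank both samples together
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1                 # tied values share the average rank
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(a), len(b)
    u1 = r1 - n1 * (n1 + 1) / 2               # U for the first sample
    u2 = n1 * n2 - u1                         # U for the second sample
    return min(u1, u2)                        # smaller U is compared with tables


# Hypothetical abundances from two estuaries, 3 samples each
print(rank_sum_u([1, 3, 5], [2, 4, 6]))   # 3.0
```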


    Take-home points

    Parametric and non-parametric are two broad classifications of statistical procedures.

    Parametric tests are based on assumptions about the distribution of the underlying population from which the sample was taken. The most common parametric assumption is that the data are approximately normally distributed.

    Non-parametric tests do not rely on assumptions about the shape or parameters of the underlying population distribution.

    If the data deviate strongly from the assumptions of a parametric procedure, using that procedure could lead to incorrect conclusions. You should be aware of the assumptions associated with a parametric procedure and check them (normality test, e.g. Shapiro-Wilk test, or a histogram).

    If you determine that the assumptions of the parametric procedure are not valid, use an analogous non-parametric procedure instead (previous slide).

    Non-parametric tests are often a good option for small data sets (n < 30).

    Non-parametric procedures generally have less power than the corresponding parametric procedure when the parametric assumptions are met.

    Interpretation of non-parametric procedures can also be more difficult than for parametric procedures.


    Thank you!

    Next Saturday?