Statistics (English) II - MUNI MED

Advanced statistics "there are three kinds of lies: lies, damned lies and statistics" Disraeli


Post on 24-Oct-2020


  • Advanced statistics

    "there are three kinds of lies: lies, damned lies and statistics" Disraeli

  • Definition

    • procedure used in data exploration, organisation, presentation, analysis and interpretation, facilitating decision making

    • separates real effects from random variation

    • descriptive
      – tables
      – graphs
      – measures

    • inferential
      – estimating size
      – sampling

    • hypothesis testing

    • making models

  • Variability

    • repeated measurements: 18.2 °C, 18.5 °C, 19.1 °C, 18.7 °C

    • intra-population variability: 180 cm, 175 cm, 165 cm, 157 cm

    • ecological variability, inter-population differences, ethnic differences = BIODIVERSITY

    • temporal fluctuation (over time)

  • Data description

    • data types: qualitative vs. quantitative

    • frequency: histogram

    • tendency: mean, range, median, mode, quartiles, …

    • variability: standard deviation, variance

    • distribution: symmetric (sometimes normal), asymmetric
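The measures listed above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module, applied to the four repeated temperature measurements from the Variability slide. The choice of the sample (n - 1) formulas for variance and standard deviation is an assumption; the slides do not say which version is meant.

```python
import statistics

# Repeated temperature measurements (°C) from the Variability slide.
temps = [18.2, 18.5, 19.1, 18.7]

mean = statistics.mean(temps)
median = statistics.median(temps)
value_range = max(temps) - min(temps)
quartiles = statistics.quantiles(temps, n=4)   # three cut points: Q1, Q2, Q3
variance = statistics.variance(temps)          # sample variance (n - 1 denominator)
std_dev = statistics.stdev(temps)              # sample standard deviation

print(f"mean={mean:.3f} median={median:.2f} range={value_range:.1f}")
print(f"quartiles={quartiles}")
print(f"variance={variance:.4f} sd={std_dev:.4f}")
```

With only four values every observation is unique, so the mode is not informative here and is left out.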

  • Distributions

    • symmetric: mean = median = mode

    • asymmetric: the median and the mean no longer coincide
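A small numerical illustration of the two cases may help. Both samples below are hypothetical and were invented only to show the pattern; they are not data from the lecture.

```python
import statistics

# Hypothetical samples, invented for illustration only.
symmetric = [1, 2, 3, 3, 3, 4, 5]        # mirror image around 3
right_skewed = [1, 2, 2, 2, 3, 4, 14]    # one long tail to the right

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(
        name,
        "mean =", statistics.mean(data),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
    )
```

In the skewed sample the single large value pulls the mean upwards while the median stays put, which is why the median is often preferred for describing asymmetric data.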

  • Inferential statistics

    • population

    • sample

  • Hypothesis testing

    • drawing conclusions about a population by analysing a sample

    • research hypothesis
      – null hypothesis
      – alternative hypothesis

  • Statistical significance???

    • observed difference: did it arise by chance, or is it a genuine experimental effect?
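One way to make the "by chance or genuine effect" question concrete is a permutation test, sketched below. This technique is not named in the slides; it is used here only because it needs no distributional formulas. The function name `permutation_p_value` and both data sets are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measurements for two groups, invented for illustration only.
control = [18.2, 18.5, 19.1, 18.7]
treated = [19.4, 19.9, 19.2, 20.1]

def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference in means."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    total = extreme = 0
    # Enumerate every way of relabelling len(a) of the pooled values as group "a".
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so the observed split itself is counted despite rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

p = permutation_p_value(control, treated)
print(f"p = {p:.4f}")
```

The returned value is the share of all relabellings that give a difference at least as large as the one observed. A small share suggests that chance alone is an unlikely explanation.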

  • Statistical errors vs. measure of statistical significance

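    • Type I: false positive
      – the study showed an effect which in reality does not exist

    • Type II: false negative
      – an effect was there but the study missed it

    • P-value: probability of obtaining a difference at least as large as the one observed if the null hypothesis is true; it is compared with the significance level α, the accepted probability of a type I error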
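The link between the significance level and type I errors can be checked by simulation. The sketch below assumes a two-sided z-test with a known standard deviation, chosen only because its p-value can be computed with the standard library. The function names, sample size, trial count and seed are all hypothetical choices.

```python
import math
import random

def z_test_p_value(a, b, sigma):
    """Two-sided p-value for a difference in means when sigma is known."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(alpha=0.05, n=20, trials=4000, seed=1):
    """Fraction of 'significant' results when both groups share one population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both samples come from the same population, so any "effect" is false.
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(a, b, sigma=1.0) < alpha:
            hits += 1
    return hits / trials

rate = false_positive_rate()
print(f"false-positive rate = {rate:.3f}")
```

Because the null hypothesis is true in every simulated study, each "significant" result is a type I error, and their long-run frequency should be close to the chosen α.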

  • Statistical tests

    • parametric (for normal, normal-like or transformed distribution) vs. non-parametric (distribution-free) tests

    • unpaired tests: comparison of a parameter between 2 or more groups of subjects
      – parametric: independent t-test (two-sample t-test), ANOVA / MANOVA, regression, correlation (Pearson)
      – non-parametric: Mann-Whitney, median test, rank correlation (Spearman, Kendall), chi-squared, Kruskal-Wallis ANOVA

    • paired tests: comparison of a pair of parameters in subjects from one group at time intervals
      – parametric: dependent t-test (one-sample), ANOVA
      – non-parametric: Wilcoxon paired test
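The difference between the unpaired and paired designs shows up clearly in the t statistic. The sketch below computes both by hand on the same numbers. The readings are hypothetical, and the unpaired version assumes equal variances (Student's pooled form), which the slides do not specify.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical blood-pressure readings (mmHg), invented for illustration only.
before = [142, 150, 138, 160, 155]
after = [135, 146, 136, 151, 150]

def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def paired_t(a, b):
    """Paired t statistic: a one-sample t-test on the within-subject differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

t_unpaired = unpaired_t(before, after)
t_paired = paired_t(before, after)
print(f"unpaired t = {t_unpaired:.2f} (df = {len(before) + len(after) - 2})")
print(f"paired t   = {t_paired:.2f} (df = {len(before) - 1})")
```

When the same subjects are measured twice, pairing removes the between-subject variability from the error term, so the paired statistic is much larger here. Treating such data as unpaired would waste that information.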

  • ANOVA - Analysis of Variance

    • determines the probability that two or more samples were drawn from the same parent population

    • can be sub-classified by secondary or grouping variables (two-way ANOVA)
      – e.g. blood pressure before and after two different treatments
      – if some of the measurements were made in the same subjects, the analysis can be corrected for repeated measures

    • comparing two samples with one-way ANOVA is very similar to performing an unpaired t-test
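The last point can be verified numerically: for two groups, the one-way ANOVA F statistic equals the square of the pooled-variance t statistic. The sketch below computes both by hand. The helper names and the data are hypothetical.

```python
from statistics import mean, variance

def one_way_anova_f(*groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

# Hypothetical measurements for two independent groups, for illustration only.
group_a = [142, 150, 138, 160, 155]
group_b = [135, 146, 136, 151, 150]

f_stat = one_way_anova_f(group_a, group_b)
t_stat = pooled_t(group_a, group_b)
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

The same function accepts three or more groups, which is the case where ANOVA is needed and a single t-test no longer applies.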

  • Regression and Correlation

    • Regression indicates a (mathematical) relationship between two or more variables

    • parametric analysis rules apply, so the data must either be normally distributed or be convertible to a normal distribution by transformation

    • Correlation indicates an association between two variables

    • for rank correlation, non-parametric rules apply, as the association is calculated between the ranks of the variables, not the variables themselves

    • correlation describes an association, not cause and effect!

  • Regression model

    • where a and b are parameters in the regression model

    • the emphasis is on predicting one variable from the other

    • the least-squares criterion is used to judge goodness of fit
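The model equation itself did not survive in this transcript. The sketch below assumes the common straight-line form y = a + b·x, with a as the intercept and b as the slope; the original slide may have used a different form or the opposite naming. The function name and the data are hypothetical.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) minimising the sum of squared residuals of y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope (assumed meaning of b)
    a = y_bar - b * x_bar    # intercept (assumed meaning of a)
    return a, b

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
print(f"y = {a:.3f} + {b:.3f} * x")
print(f"predicted y at x = 6: {a + b * 6:.3f}")
```

The last line shows the predictive use of the model: once a and b are estimated, a value of y can be predicted for an x that was not measured.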

  • Correlation

    • a measure of the degree of linear relationship between two variables (usually labeled X and Y)

    • it is possible for two variables to be related (correlated) without one causing the other

    • correlation coefficient (r)
      – the sign of the correlation coefficient (+, -) defines the direction of the relationship
      – the absolute value of the correlation coefficient measures the strength of the relationship
      – r = 0.0 indicates the absence of a linear relationship
      – r = +1.0 and r = -1.0 indicate a perfect linear relationship
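The three reference values of r can be reproduced with a short hand-written Pearson coefficient. The function name and all data below are hypothetical and chosen so that the outcomes are easy to check.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # perfect positive linear relationship
falling = [10, 8, 6, 4, 2]    # perfect negative linear relationship
u_shaped = [2, 1, 0, 1, 2]    # clearly related to x, but not linearly

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_u = pearson_r(xs, u_shaped)
print(f"rising: {r_rising:.2f}, falling: {r_falling:.2f}, U-shaped: {r_u:.2f}")
```

The U-shaped case is a reminder that r close to 0 rules out only a linear relationship; the variables may still depend on each other in another way.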

  • Scatterplots

  • Chi-Square test

    • a very useful, robust, simple test

    • handles classification data (e.g. lived vs. died)

    • most frequently used for the 2 x 2 cross-tabulation table

    • the null hypothesis is that the survival rate does not differ between the two groups
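For a 2 x 2 table the test can be worked through by hand. The sketch below computes the Pearson chi-square statistic without a continuity correction, which is an assumption since the slides do not mention one. The counts and the function name are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if survival were independent of group.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = treatment groups, columns = (lived, died).
table = [[30, 10],
         [20, 20]]

chi2, p = chi_square_2x2(table)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

The closed-form p-value works only because a 2 x 2 table has one degree of freedom; larger tables need the general chi-square distribution.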