Multivariate Analysis ANOVA and MANOVA Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br [email protected]


  • Multivariate Analysis ANOVA and MANOVA

    Prof. Dr. Anselmo E de Oliveira

    anselmo.quimica.ufg.br

    [email protected]

    http://www.quimica.ufg.br/docentes/anselmo

  • ANOVA

    • ANalysis Of VAriance

    – separates and estimates the different sources of variation

    – tests whether changing a controlled factor leads to a significant difference between the mean values obtained

    – one-way ANOVA: a single factor (controlled or random) in addition to the random measurement error

  • How ANOVA Works

    • ANOVA basically splits the variability into between-group variability and within-group variability, and compares the two

    • The larger the first is compared with the second, the stronger the evidence that there is variability between the groups

  • How ANOVA Works

    • Define the total sum of squares, SQT, computed from all the data, where $\bar{x}$ is the overall sample mean

    $$SQT = \sum_i (x_i - \bar{x})^2$$

    • Note that the usual estimate of the variance of a sample is

    $$\sigma^2 = \frac{SQT}{N - 1} = MQ$$

    • SQT can be split as $SQT = SQD + SQE$, where SQD is the within-group sum of squares and SQE is the between-group sum of squares

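  • How ANOVA Works

    • SQD and SQE are defined as

    $$SQD = \sum_{i \in \text{gp1}} (x_i - \bar{x}_1)^2 + \sum_{i \in \text{gp2}} (x_i - \bar{x}_2)^2 + \cdots$$

    where $\bar{x}_k$ is the sample mean of group k, and

    $$SQE = n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2 + \cdots$$

    where $n_k$ is the sample size of group k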
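The three sums of squares can be computed directly from their definitions. What follows is a minimal Python sketch. It assumes the data arrive as a list of groups, with one list of measurements per group. The function name `sums_of_squares` is hypothetical and does not come from any library. The usage line applies it to the reagent measurements from this deck's worked example and checks numerically that SQT = SQD + SQE.

```python
def sums_of_squares(groups):
    """Return (SQT, SQD, SQE) for a list of groups of measurements."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # SQT: squared deviations of every value from the overall mean
    sqt = sum((x - grand_mean) ** 2 for x in all_values)
    sqd = 0.0  # within groups: deviations from each group's own mean
    sqe = 0.0  # between groups: n_k times (group mean - overall mean)^2
    for g in groups:
        mean_k = sum(g) / len(g)
        sqd += sum((x - mean_k) ** 2 for x in g)
        sqe += len(g) * (mean_k - grand_mean) ** 2
    return sqt, sqd, sqe

# Reagent measurements for samples A, B, C and D
groups = [[102, 100, 101], [101, 101, 104], [97, 95, 99], [90, 92, 94]]
sqt, sqd, sqe = sums_of_squares(groups)
print(sqt, sqd, sqe)  # → 210.0 24.0 186.0
```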

  • How ANOVA Works

    • Once the variability has been separated, independent estimates of the common population variance $\sigma^2$ can be obtained from SQE and SQD. These estimates are called mean squares (MQ), and we obtain the following estimates

    $$\sigma^2_{\text{between}} = \frac{SQE}{m - 1}$$

    $$\sigma^2_{\text{within}} = \frac{SQD}{N - m}$$

    where m is the number of groups and N is the total sample size
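Each mean square is a sum of squares divided by its degrees of freedom. This short sketch uses a hypothetical function name, `variance_estimates`. The example call uses SQE = 186, SQD = 24, m = 4 and N = 12, the values from the reagent example in this deck.

```python
def variance_estimates(sqe, sqd, m, n_total):
    """Between- and within-group estimates of the common variance sigma^2."""
    var_between = sqe / (m - 1)       # m - 1 degrees of freedom
    var_within = sqd / (n_total - m)  # N - m degrees of freedom
    return var_between, var_within

print(variance_estimates(186, 24, m=4, n_total=12))  # → (62.0, 3.0)
```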

  • Comparing Several Means

    • Problem: stability of a fluorescent reagent

    Sample   Conditions                   Measurements     Mean
    A        freshly prepared             102, 100, 101    101
    B        stored 1 h, in the dark      101, 101, 104    102
    C        stored 1 h, in the shade     97, 95, 99       97
    D        stored 1 h, in the light     90, 92, 94       92

    Overall mean: $\bar{x} = 98$
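The group means and the overall mean in the table follow from the raw measurements. This is a small Python sketch. The dictionary layout is an assumption made for illustration, and the comments repeat the storage conditions from the table.

```python
# Measurements from the table, keyed by sample
samples = {
    "A": [102, 100, 101],  # freshly prepared
    "B": [101, 101, 104],  # stored 1 h, in the dark
    "C": [97, 95, 99],     # stored 1 h, in the shade
    "D": [90, 92, 94],     # stored 1 h, in the light
}
means = {name: sum(v) / len(v) for name, v in samples.items()}
all_values = [x for v in samples.values() for x in v]
grand_mean = sum(all_values) / len(all_values)
print(means, grand_mean)
```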

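  • Comparing Several Means

    • SQD

    $$SQD = \sum_{i \in \text{gp1}} (x_i - \bar{x}_1)^2 + \sum_{i \in \text{gp2}} (x_i - \bar{x}_2)^2 + \cdots$$

    $$\begin{aligned}
    SQD = {} & \underbrace{(102-101)^2 + (100-101)^2 + (101-101)^2}_{\text{sample A}} \\
    & + \underbrace{(101-102)^2 + (101-102)^2 + (104-102)^2}_{\text{sample B}} \\
    & + \underbrace{(97-97)^2 + (95-97)^2 + (99-97)^2}_{\text{sample C}} \\
    & + \underbrace{(90-92)^2 + (92-92)^2 + (94-92)^2}_{\text{sample D}}
    \end{aligned}$$

    $$SQD = 2 + 6 + 8 + 8 = 24$$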
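The per-sample terms of SQD can be reproduced in a few lines of Python. This sketch only re-does the arithmetic above. Each entry is the sum of squared deviations of one sample's measurements from that sample's own mean.

```python
samples = [[102, 100, 101], [101, 101, 104], [97, 95, 99], [90, 92, 94]]

# Contribution of each sample (A, B, C, D) to SQD
contributions = []
for values in samples:
    mean = sum(values) / len(values)
    contributions.append(sum((x - mean) ** 2 for x in values))

print(contributions, sum(contributions))  # → [2.0, 6.0, 8.0, 8.0] 24.0
```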

  • Comparing Several Means

    • SQE

    $$SQE = n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2 + \cdots$$

    In this example, $n_1 = n_2 = n_3 = n_4 = 3$

    $$SQE = 3(101-98)^2 + 3(102-98)^2 + 3(97-98)^2 + 3(92-98)^2$$

    $$SQE = 27 + 48 + 3 + 108 = 186$$
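The SQE terms can be checked the same way. This minimal sketch takes the group means and overall mean from the table above. Each term is n_k times the squared distance between a group mean and the overall mean.

```python
group_means = [101, 102, 97, 92]  # samples A, B, C, D
n_k = 3                           # every sample has three measurements
grand_mean = 98

terms = [n_k * (mean - grand_mean) ** 2 for mean in group_means]
print(terms, sum(terms))  # → [27, 48, 3, 108] 186
```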

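  • Comparing Several Means

    • SQT

    $$SQT = SQD + SQE = 24 + 186 = 210$$

    • Between-sample estimate

    $$\sigma^2_{\text{between}} = \frac{SQE}{m - 1} = \frac{186}{4 - 1} = 62$$

    • Within-sample estimate

    $$\sigma^2_{\text{within}} = \frac{SQD}{N - m} = \frac{24}{12 - 4} = 3$$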
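As an independent check, SQT can be computed straight from all twelve measurements with the definition $\sum_i (x_i - \bar{x})^2$. It should equal SQD + SQE = 24 + 186. This Python sketch then divides SQE and SQD by their degrees of freedom.

```python
values = [102, 100, 101, 101, 101, 104, 97, 95, 99, 90, 92, 94]
grand_mean = sum(values) / len(values)            # 98.0
sqt = sum((x - grand_mean) ** 2 for x in values)  # SQT computed directly

m, n_total = 4, len(values)
var_between = 186 / (m - 1)      # SQE / (m - 1)
var_within = 24 / (n_total - m)  # SQD / (N - m)
print(sqt, var_between, var_within)  # → 210.0 62.0 3.0
```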

  • How ANOVA Works

    • Hypothesis test

    – null hypothesis, H0: all samples belong to the same population, with a common mean and variance $\sigma^2$

    – estimates of $\sigma^2$

    • variation within the samples: $\sigma^2_{\text{within}}$

    • variation between the samples: $\sigma^2_{\text{between}}$

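  • How ANOVA Works

    • Hypothesis test

    – if H0 is true, the two estimates of $\sigma^2$ should not differ significantly: $\sigma^2_{\text{between}} - \sigma^2_{\text{within}} \approx 0$

    – if H0 is false, $\sigma^2_{\text{between}} > \sigma^2_{\text{within}}$, so $\sigma^2_{\text{between}} / \sigma^2_{\text{within}} > 1$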

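  • How ANOVA Works

    • Hypothesis test and F test

    – to find out whether one value is greater than another (a one-tailed test), use the F test, with $F_{calc} = \sigma^2_{\text{between}} / \sigma^2_{\text{within}}$

    – if $F_{calc} > F_{tab}$, the null hypothesis is rejected

    – in the example, $F_{calc} = 62/3 \approx 20.7$ is greater than $F_{tab} = 4.07$ (3 and 8 degrees of freedom), so, at the 95% confidence level, the sample means differ significantly

    • F table, P = 0.05

    – the table is indexed by the degrees of freedom of the numerator and of the denominator

    – the tabulated F values are always greater than 1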

    http://home.comcast.net/~sharov/PopEcol/tables/f005.html
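The decision rule reduces to one comparison. This is a minimal Python sketch with a hypothetical function name, `f_test`. The critical value 4.07 is typed in by hand as the standard tabulated F for 3 and 8 degrees of freedom at P = 0.05. No F distribution is computed here.

```python
def f_test(var_between, var_within, f_table):
    """One-tailed F test: reject H0 when F_calc exceeds the tabulated value."""
    f_calc = var_between / var_within
    return f_calc, f_calc > f_table

# Assumed table value: F(3, 8) at P = 0.05 is about 4.07
f_calc, reject_h0 = f_test(62, 3, f_table=4.07)
print(round(f_calc, 1), reject_h0)  # → 20.7 True
```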

  • ANOVA Table

    Source of variation    SQ     Degrees of freedom    MQ    F
    Between samples        186    3                     62    20.7
    Within samples         24     8                     3
    Total                  210    11
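The table can also be assembled programmatically from SQE, SQD, m and N. This is a sketch with a hypothetical function name, `anova_table`. The row layout (source, SQ, degrees of freedom, MQ, F) mirrors the slide, and blank cells are represented as `None`.

```python
def anova_table(sqe, sqd, m, n_total):
    """Rows of a one-way ANOVA table: (source, SQ, df, MQ, F)."""
    df_between, df_within = m - 1, n_total - m
    ms_between, ms_within = sqe / df_between, sqd / df_within
    return [
        ("Between samples", sqe, df_between, ms_between, ms_between / ms_within),
        ("Within samples", sqd, df_within, ms_within, None),
        ("Total", sqe + sqd, n_total - 1, None, None),
    ]

for source, ss, df, ms, f in anova_table(186, 24, m=4, n_total=12):
    ms_text = "" if ms is None else f"{ms:g}"
    f_text = "" if f is None else f"{f:.1f}"
    print(f"{source:<16}{ss:>6}{df:>5}{ms_text:>6}{f_text:>7}")
```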

  • Octave

    • Performs the F test to check whether the means are statistically equal

    • Does not display the ANOVA table

    • Computes the value of F

    • Computes the p value

    – H0: the means are equal

    – p < 0.05: the null hypothesis is rejected

    • Computes gle (between-groups degrees of freedom) and gld (within-groups degrees of freedom)

    • Matrix Y: one group per column

  • Octave

    > Y = [102 101 97 90; 100 101 95 92; 101 104 99 94];  % reagent data, samples A-D in columns
    > [p, F, gle, gld] = anova(Y)
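For readers without Octave, here is a standard-library Python sketch that returns the same four quantities (p, F, gle, gld) for a matrix Y with one group per column. The function names `one_way_anova` and `f_sf` are hypothetical. The p value comes from the identity P(F > f) = I_x(d2/2, d1/2), with x = d2/(d2 + d1·f) and I_x the regularized incomplete beta function. That integral is evaluated with Simpson's rule, a numerical shortcut chosen here that is not necessarily what Octave does internally. The sketch also assumes gld ≥ 2 and F > 0.

```python
import math

def f_sf(f, d1, d2, steps=2000):
    """P(F > f) for an F(d1, d2) variable (assumes d2 >= 2 and f > 0)."""
    a, b = d2 / 2, d1 / 2
    x = d2 / (d2 + d1 * f)
    h = x / steps  # steps must be even for Simpson's rule

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return total * h / 3 / math.exp(log_beta)

def one_way_anova(Y):
    """Return (p, F, gle, gld) for a matrix Y (list of rows), one group per column."""
    groups = [list(col) for col in zip(*Y)]
    m = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(map(sum, groups)) / n_total
    means = [sum(g) / len(g) for g in groups]
    sqe = sum(len(g) * (mk - grand_mean) ** 2 for g, mk in zip(groups, means))
    sqd = sum((x - mk) ** 2 for g, mk in zip(groups, means) for x in g)
    gle, gld = m - 1, n_total - m
    F = (sqe / gle) / (sqd / gld)
    return f_sf(F, gle, gld), F, gle, gld

Y = [[102, 101, 97, 90],
     [100, 101, 95, 92],
     [101, 104, 99, 94]]  # rows = replicates, columns = samples A-D
p, F, gle, gld = one_way_anova(Y)
print(f"p = {p:.5f}, F = {F:.1f}, gle = {gle}, gld = {gld}")
```

With SciPy installed, `scipy.stats.f_oneway` applied to the four columns is a convenient cross-check of both F and p.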

  • Java Application

    http://www.psych.utah.edu/stat/introstats/anovaflash.html

  • MANOVA

    MANOVA: Multivariate Analysis of Variance

    http://www-bcf.usc.edu/~mmclaugh/550x/PPTslides/WeekElevenSlides/MANOVA.ppt

  • MANOVA: What Kinds of Hypotheses Can it Test?

    • A MANOVA, or multivariate analysis of variance, is a way to test the hypothesis that one or more independent variables (IVs), or factors, have an effect on a set of two or more dependent variables (DVs)

    – For example, you might wish to test the hypothesis that sex and ethnicity interact to influence a set of job-related outcomes, including attitudes toward co-workers, attitudes toward supervisors, feelings of belonging in the work environment, and identification with the corporate culture

    – As another example, you might want to test the hypothesis that three different methods of teaching writing result in significant differences in ratings of student creativity, student acquisition of grammar, and assessments of writing quality by an independent panel of judges

  • Why Should You Do a MANOVA?

    • You do a MANOVA instead of a series of one-at-a-time ANOVAs for two main reasons

    – Supposedly, to reduce the experiment-wise level of Type I error. Eight F tests at 0.05 each can push the experiment-wise probability of making a Type I error (rejecting the null hypothesis when it is in fact true) as high as 40%! That figure is the upper bound 8 × 0.05; if the tests are independent, the probability is $1 - 0.95^8 \approx 34\%$. The so-called overall or omnibus test protects against this inflated error probability only when the null hypothesis is true. If you follow up a significant multivariate test with a bunch of ANOVAs on the individual variables without adjusting the error rates for the individual tests, there is no "protection"

    – Another reason to do a MANOVA: none of the individual ANOVAs may produce a significant main effect on its DV, but in combination the DVs might, which suggests that the variables are more meaningful taken together than considered separately

    • MANOVA takes into account the intercorrelations among the DVs
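The inflation of the experiment-wise Type I error can be computed directly. This sketch assumes the individual tests are independent, which gives $1 - (1 - \alpha)^n$. The slide's 40% is the simpler upper bound $n \alpha$, which holds whether or not the tests are independent. The function name `familywise_error` is hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

print(round(familywise_error(0.05, 8), 3))  # → 0.337
print(8 * 0.05)                             # upper bound n * alpha → 0.4
```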

  • Type I and II error

    • Type I error: a Type I error occurs when one rejects the null hypothesis when it is true

    • Type II error: a Type II error occurs when one rejects the alternative hypothesis (fails to reject the null hypothesis) when the alternative hypothesis is true

    If there is a diagnostic value demarcating the choice between two means, moving it to decrease Type I error will increase Type II error (and vice versa)

    http://www.cs.uni.edu/~campbell/stat/inf5.html
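The trade-off can be made concrete with a simple rule: "reject H0 if an observation exceeds a cutoff". This Python sketch assumes normal distributions, with hypothetical means 0 under H0 and 2 under H1 and a standard deviation of 1. Moving the cutoff up lowers alpha (the Type I error rate) and raises beta (the Type II error rate). The helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def error_rates(cutoff, mean_h0=0.0, mean_h1=2.0, sd=1.0):
    """Type I (alpha) and Type II (beta) rates for 'reject H0 if x > cutoff'."""
    alpha = 1 - normal_cdf((cutoff - mean_h0) / sd)  # reject H0 although it is true
    beta = normal_cdf((cutoff - mean_h1) / sd)       # keep H0 although H1 is true
    return alpha, beta

for cutoff in (0.5, 1.0, 1.5, 2.0):
    alpha, beta = error_rates(cutoff)
    print(f"cutoff={cutoff}: alpha={alpha:.3f}, beta={beta:.3f}")
```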

  • Assumptions of MANOVA

    1. Multivariate normality

    – All of the DVs must be distributed normally (this can be visualized with histograms, and tests are available for checking it)

    – Any linear combination of the DVs must be distributed normally

    • Check pairwise relationships among the DVs for nonlinear relationships using scatter plots

  • Assumptions of MANOVA

    – All subsets of the variables must have a multivariate normal distribution

    • These requirements are rarely, if ever, tested in practice

    • MANOVA is assumed to be a robust test that can stand up to departures from multivariate normality in terms of Type I error rate

    [Figure: log-likelihood density (log scale) for a correlated multivariate normal distribution]

    http://cxc.harvard.edu/contrib/sherpa/scipy11/

  • Assumptions of MANOVA

    – Statistical power (power to detect a main or interaction effect) may be reduced when distributions are very plateau-like (platykurtic)

    – If the classes in the center of the distribution have more or less the same frequency, the resulting histogram looks like a plateau
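A plateau-like (platykurtic) distribution has negative excess kurtosis, while a normal distribution has zero. This sketch uses the moment-based estimate $g_2 = m_4 / m_2^2 - 3$. The evenly spread values 1 to 10 are a hypothetical "plateau" in which every class has the same frequency. The function name is hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

flat = list(range(1, 11))  # hypothetical plateau: every class equally frequent
print(round(excess_kurtosis(flat), 3))  # → -1.224
```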

  • Assumptions of MANOVA, cont’d

    2. Homogeneity of the covariance matrices

    – In ANOVA we talked about the need for the variances of the dependent variable to be equal across levels of the independent variable

    • In MANOVA, the univariate requirement of equal variances has to hold for each one of the dependent variables

    – In MANOVA we extend this concept and require that the "covariance matrices" be homogeneous

    • Computations in MANOVA require the use of matrix algebra, and each person's "score" on the dependent variables is actually a "vector" of scores on DV1, DV2, DV3, …, DVn

    • The matrices of the covariances (the variance shared between any two variables) have to be equal across all levels of the independent variable

  • Assumptions of MANOVA, cont’d

    – This homogeneity assumption is tested with a test similar to Levene's test for the ANOVA case. It is called Box's M, and it works the same way: it tests the null hypothesis that the covariance matrices of the dependent variables are equal across levels of the independent variable, so a significant result indicates that they differ

    • Putting this in English, what you don't want is the case where, if your independent variable (IV) was, for example, ethnicity, all the people in the "other" category had scores on their 6 dependent variables clustered very tightly around their mean, whereas people in the "white" category had scores on the vector of 6 dependent variables clustered very loosely around the mean. You don't want a leptokurtic set of distributions for one level of the IV and a platykurtic set for another level
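Box's M compares the per-group sample covariance matrices $S_i$ with their pooled version using $M = (N - k)\ln|S_{pooled}| - \sum_i (n_i - 1)\ln|S_i|$. This is a minimal standard-library sketch of that statistic. The two-DV scores are hypothetical, each group's covariance matrix is assumed nonsingular, and all function names are hypothetical. M is 0 when every group has the same covariance matrix and grows as they diverge. The sketch stops at M itself. Statistical packages convert M to a chi-square or F approximation to obtain the significance level that the slides refer to.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of observation vectors."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    det = 1.0
    for col in range(len(a)):
        pivot = max(range(col, len(a)), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, len(a)):
            factor = a[r][col] / a[col][col]
            for c in range(col, len(a)):
                a[r][c] -= factor * a[col][c]
    return det

def box_m(groups):
    """Box's M = (N - k) ln|S_pooled| - sum_i (n_i - 1) ln|S_i|."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    covs = [covariance_matrix(g) for g in groups]
    p = len(covs[0])
    pooled = [[sum((len(g) - 1) * s[i][j] for g, s in zip(groups, covs)) / (n_total - k)
               for j in range(p)] for i in range(p)]
    return (n_total - k) * math.log(determinant(pooled)) - sum(
        (len(g) - 1) * math.log(determinant(s)) for g, s in zip(groups, covs))

# Hypothetical scores on two DVs for three groups (levels of the IV)
tight = [[5, 6], [6, 5], [5, 5], [6, 7], [4, 5]]
shifted = [[x + 10, y + 10] for x, y in tight]  # same spread, different means
loose = [[1, 9], [9, 2], [3, 3], [10, 10], [2, 6]]
print(box_m([tight, shifted]), box_m([tight, loose]))
```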

  • Assumptions of MANOVA, cont’d

    • If Box's M is significant, it means you have violated an assumption of MANOVA. This is not much of a problem if you have equal cell sizes and large N; it is a much bigger issue with small sample sizes and/or unequal cell sizes (in factorial ANOVA, if there are unequal cell sizes, the sums of squares for the three sources (two main effects and interaction effect) won't add up to the Total Sum of Squares, SS)

  • Assumptions of MANOVA, cont’d

    3. Independence of observations

    – Subjects' scores on the dependent measures should not be influenced by or related to scores of other subjects in the condition or level

    – This can be tested with an intraclass correlation coefficient if lack of independence of observations is suspected
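There are several intraclass correlation coefficients, and the slide does not say which one it means. This sketch assumes ICC(1), one common form computed from one-way ANOVA mean squares for equal-size groups: $(MS_{between} - MS_{within}) / (MS_{between} + (k - 1) MS_{within})$, where k is the group size. Values near 0 are consistent with independent scores, and large values mean that scores within a condition resemble each other. The scores and the function name are hypothetical.

```python
def icc1(groups):
    """ICC(1) from a one-way ANOVA on equal-size groups (k = group size)."""
    m, k = len(groups), len(groups[0])
    n_total = m * k
    grand_mean = sum(map(sum, groups)) / n_total
    means = [sum(g) / k for g in groups]
    ms_between = k * sum((mk - grand_mean) ** 2 for mk in means) / (m - 1)
    ms_within = sum((x - mk) ** 2 for g, mk in zip(groups, means) for x in g) / (n_total - m)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical scores: each inner list is one condition with three subjects
clustered = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]
print(round(icc1(clustered), 3))  # → 0.897
```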

  • MANOVA in Action®

    http://www.portalaction.com.br/552-manova