Lesson 2 - Univariate Statistics and Experimental Design


    Sensory Evaluation Methods


Copyright The Regents of the University of California 2006. Copyright Dr. Jean-Xavier Guinard 2006.

Lesson 2: Univariate Statistics and Experimental Design

    Topic 2.1: Univariate Statistics

Terms
Summary Statistics

    Topic 2.2: Central Tendency and Dispersion

    Topic 2.3: The Null Hypothesis and Type I and Type II Errors

The Null Hypothesis (H0)
Type I and Type II Errors

    Topic 2.4: Basic Statistical Concepts and Associated Tests

Degrees of Freedom
Confidence Interval
One-tailed or Two-tailed Test?

The Normal Distribution
Central Limit Theorem
The Binomial Test
The Chi-Square Test
Student's t-test
Statistical versus Practical Significance

    Topic 2.5: Correlation and Regression

    Topic 2.6: Analysis of Variance

    Judges: are they a random or fixed effect?

    Topic 2.7: Multiple Mean Comparison Tests

    Topic 2.8: Nonparametric Statistics

Topic for discussion: How carefully should you check the assumptions behind the statistical tests you use?

    Topic 2.9: Experimental Design

Significance, Power and Precision
Sample Size Determination
Randomization

Types of Designs
Balanced Incomplete Block Design (BIB)
Crossover Design
Factorial Designs
Which Design to Choose?

    References

    Tables

Topic 2.1: Univariate Statistics

    Lesson Objectives

In this lesson we will cover basic statistics and the principles of experimental design. For some, this will be a review, and for others, it will represent a significant amount of new material. The goal here is not so much to go into the theory or the mathematics of statistical tests and experimental design, but rather to show their application to the design and analysis of sensory tests. Some theoretical background is required, though, to fully grasp the nature of these statistical protocols and the type of questions they are suited to answer. You will be given that background. More importantly, you will become sufficiently knowledgeable about experimental design and basic statistics to be able to select the appropriate design and statistical procedure to analyze data from a given sensory test and to interpret the meaning of the results.

The working title for this lesson includes the term 'univariate statistics' (in contrast with multivariate statistics, which we will cover in Lesson 6), because we will focus mostly on situations where one variable is analyzed (or two in the case of correlation and regression).

Most of the assignments for this lesson will have you run statistical tests on actual data. We are providing a set of guidelines on how to run the tests with various software programs, or even by hand, if you are so inclined. You can find this univariate statistics tutorial on the topic outline for this lesson.

    Objectives:

1. To provide a detailed overview of relevant statistical principles, and of statistical tests, their applications, and interpretation.

    2. To describe and explain the basic concepts of experimental design.

To cite Michael O'Mahony (1986) from his statistics textbook, "Because behavioral and biological data are so inherently variable, statistical analysis is a useful tool for pinpointing and clarifying trends which otherwise might be obscured by a welter of numbers." But we could also cite Benjamin Disraeli here: "There are three kinds of lies: lies, damned lies, and statistics." The message to take away is that statistics can be manipulated to the point of reversing the outcome of a test. So beware...

For example, one might treat judges as a so-called fixed effect in an analysis of variance of scaling data and conclude that there is a significant difference among the samples for the scaled attribute, and just as easily (and validly) reach the opposite conclusion (no difference among the samples) by treating judges as a so-called random effect. [We will explain the difference between random and fixed effects later in this lesson.] So consider both the message and the messenger (and if you are the messenger, behave as a rigorous and honest statistician - hence the need for some working knowledge of statistics).


It is often said that the most important part of a research experiment is the experimental design. This is because a poorly designed experiment will produce data of limited validity. We will examine the basic principles of experimental design and go over the types of designs suited for your experimental needs.

Another misconception we must dismiss right off the bat is that the numerical, and therefore complex, nature of statistics and experimental design makes them boring and somewhat scary. Experimental design and statistical analysis are exercises in logic, and as such they can be rather entertaining. We aim to have everyone enjoy designing sensory tests and running statistical analyses by the end of this course (and doing it right!).

Because experimental design requires some knowledge and understanding of basic statistics, we will begin this lesson with basic statistics and end it with experimental design.

    Terms

    First, let's go over some terms we will use often.

1. Descriptive statistics are used to describe the data (e.g., graphs, tables, averages, ranges, etc.).

2. Inferential statistics are used to infer, from a sample, facts about the population it came from. It follows that a parameter is a fact regarding a population, whereas a statistic is a fact regarding a sample.

Statistical tests for inferential statistics are divided into parametric and nonparametric tests. Parametric tests are used to analyze data from interval or ratio scales (continuously distributed, following a normal distribution), while nonparametric tests are designed to handle ordinal data (ranks) and nominal data (categories).

    Summary Statistics

There are two major distinctions in summary statistics:

    Whether the statistics are measures of central tendency (i.e., where do most of the numbers fall?),

    or

    Whether they are measures of dispersion (i.e., how much spread is there in the numbers?)

    The following section will discuss these topics in greater detail.


Topic 2.2: Central Tendency and Dispersion

Measures of central tendency are intended to provide a feel for the average value of a data set. There are three commonly used measures of central tendency:

1. The mean is the average of all data points.
2. The median is the middle number of a set of numbers arranged in order.
3. The mode is the value which occurs most frequently in the data set.

While a data set can have only one mean and median, it can potentially be multimodal (have several modes). Can you think of an example of a multimodal distribution from the first course?

The median has the advantage of not being influenced by outliers in the data set. That is not true of the mean, however. So watch out for outliers if, like most people, your favorite measure of central tendency is the mean.

It is possible to draw inferences about population means from sample means; thus the mean is the most widely used measure of central tendency. For example, by recording the age of a sample of 300 undergraduate students at a public university and calculating the mean age of that sample, we can assume that the mean age of college students across the nation won't be too far off that number.

Measures of dispersion indicate how scattered the data are. There are five commonly used measures of dispersion:

1. The range is the difference between the highest value and the lowest value in the data set = Maximum - Minimum

2. The interquartile range (IQR) is the range between the lower and upper quartiles (25th and 75th percentiles) = 75th percentile - 25th percentile

3. The variance is the sum of the squared differences between the observed values and the mean, averaged across the population (or sample).

Where: X is an observed value, μ and x̄ are the population and sample means respectively, and N is the size (or number) of the sampled population:

- population: σ² = Σ(X − μ)² / N

- sample: s² = Σ(X − x̄)² / (N − 1)

    4. The standard deviation is the square root of the variance

    5. The coefficient of variation equals the standard deviation divided by the mean

The range of the sample is not a good estimate of the range of the population; it is too small. This limitation is reduced by increasing the size of the sample.

    The interquartile range eliminates the effect of outliers on the range.
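To make these summary statistics concrete, here is a minimal Python sketch (NumPy only; the ratings are hypothetical and not part of the original lesson) computing the measures of central tendency and dispersion described above.

```python
import numpy as np

# Hypothetical intensity ratings from 10 judges (illustrative data only)
ratings = np.array([4.5, 5.0, 5.5, 6.0, 6.0, 6.5, 7.0, 7.0, 7.0, 9.0])

# Central tendency
mean = ratings.mean()
median = np.median(ratings)
values, counts = np.unique(ratings, return_counts=True)
mode = values[counts.argmax()]                 # most frequent value

# Dispersion
value_range = ratings.max() - ratings.min()    # maximum - minimum
iqr = np.percentile(ratings, 75) - np.percentile(ratings, 25)
var_sample = ratings.var(ddof=1)               # sample variance (divides by N - 1)
std_sample = ratings.std(ddof=1)               # sample standard deviation
cv = std_sample / mean                         # coefficient of variation

print(mean, median, mode, value_range, iqr, var_sample, std_sample, cv)
```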


We should also define the standard error of the mean (s.e.m.) as the standard deviation divided by the square root of N, the number of observations from which the mean was derived. In most of the figures you will find in published research, means are shown along with s.e.m. confidence intervals.

You should familiarize yourself with the basic ways in which summary statistics are calculated for data spreadsheets in your favorite office software(s) (e.g., Microsoft Excel) and/or statistical software (Minitab, SAS, SPSS, etc.).

    Also consult the univariate statistics tutorial on the lesson outline.

Topic 2.3: The Null Hypothesis and Type I and Type II Errors

    The Null Hypothesis (H0)

The Null Hypothesis (H0) is typically a hypothesis stating that there is NO DIFFERENCE:

Between two samples
Between the means of two sets of numbers
Between the number of people or things in various categories

    Type I and Type II Errors

A Type I Error is one we commit if we reject the Null Hypothesis when it is actually true. In a difference test, for example, that means concluding that the two samples are different when they are not perceptibly different. The risk of committing a Type I Error is called alpha (α).

A Type II Error is one we commit if we do not reject the Null Hypothesis when it (i.e., the Null Hypothesis) is false. In a difference test, that means concluding that the two samples are not different when they are perceptibly different. Beta (β) is the risk of NOT finding a difference when one actually exists.

Most statistical tests are designed around the significance level of alpha (α). Traditionally, alpha significance levels of 5%, 1% and 0.1%, expressed as p < 0.05, p < 0.01, or p < 0.001 respectively, are used to accept or reject the Null Hypothesis.

P = 1 - β

The power of a statistical test, P, is defined as 1 - beta. In discrimination testing, P is the probability of finding a difference if one actually exists, or the probability of making the correct decision that the two samples are perceptibly different. The power of the test P is dependent on:

The magnitude of the difference between samples,
The size of alpha, and
The number of judges performing the test

In practice, we set the desired level of P to determine how many judges should be recruited to conduct the test. This is what is known as power analysis. It is an important tool for anyone involved in experimental design and data collection, and one that we will examine in more detail in our discrimination testing lesson.
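As an illustration of the relationship between alpha, beta and power (this sketch is not from the lesson; the number of judges and the assumed proportion of correct responses are hypothetical), the power of a one-tailed paired comparison analyzed with a binomial test can be computed directly: alpha sets the critical number of correct responses, and power is the chance of reaching that count when a difference really exists.

```python
from scipy.stats import binom

n = 40        # number of judges (hypothetical)
p0 = 0.5      # chance probability of a correct answer under H0 (paired comparison, one-tailed)
p1 = 0.75     # assumed proportion of correct answers if the samples really differ (hypothetical)
alpha = 0.05

# Critical count: smallest number of correct responses that rejects H0 at level alpha
c = int(binom.ppf(1 - alpha, n, p0)) + 1

beta = binom.cdf(c - 1, n, p1)    # risk of missing a real difference (Type II error)
power = 1 - beta                  # P = 1 - beta

print(f"reject H0 with at least {c}/{n} correct; beta = {beta:.3f}, power = {power:.3f}")
```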

Topic 2.4: Basic Statistical Concepts and Associated Tests

    Degrees of Freedom

Basically, the number of degrees of freedom refers to the number of categories to which data can be assigned 'freely' without being predetermined. For example, if we produced 10 samples using one of 3 processing methods, and we had 4 samples manufactured with process A and 4 samples manufactured with process B, then we know that the remaining 2 had to be manufactured with process C. For the three categories, we could only assign samples freely to two. For the third, we had no choice. So in most cases, the number of degrees of freedom is n - 1, where n is the number of categories.

    Confidence Interval

A confidence interval gives a sense of the range within which the true mean is likely to fall.

It is typically set at +/- 2 s.e.m. (standard error of the mean) for a 95% probability level. It is derived from the critical value of t at p < 0.05 (for a high number of degrees of freedom).

As mentioned above, the standard error of the mean is equal to the standard deviation divided by the square root of N, the number of observations from which the mean is derived.
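A minimal sketch (hypothetical ratings, not from the lesson) of a 95% confidence interval around a mean, using the exact t critical value rather than the rounded factor of 2:

```python
import numpy as np
from scipy import stats

ratings = np.array([6.1, 5.8, 6.4, 7.0, 6.2, 5.9, 6.6, 6.3])   # hypothetical data
n = len(ratings)

mean = ratings.mean()
sem = ratings.std(ddof=1) / np.sqrt(n)      # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)       # two-tailed critical t for alpha = 0.05

ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem
print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```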

    One-tailed or two-tailed Test?

This is a question that often arises in the use of statistical tests of the null hypothesis. It basically has to do with whether there is only one correct 'direction' or 'answer' for a test with two alternative outcomes, or whether both can be envisioned.

An example from difference testing can help here. In a paired comparison (a type of difference test - see Lesson 4), we may compare two samples for the intensity of one attribute ("which sample is stronger for attribute X?"), or for preference ("which do you prefer?"). In the first scenario, there is only one correct answer (one of the samples MUST be stronger than the other), so we use one-tailed probabilities to analyze the results. In the second case, the person is free to prefer one sample or the other. Both alternatives are 'acceptable,' and so we use two-tailed probabilities.

Some statisticians argue that in most scenarios, you may not know whether one-tailed or two-tailed probabilities are warranted, and that you should always use two-tailed probabilities. We concur, except in those (few) instances where you have a very good reason for using a one-tailed test.


The Normal Distribution

Mathematically, a normal or Gaussian distribution with mean μ and standard deviation σ is defined as follows:

f(x) = [1 / (σ√(2π))] · exp[ −(x − μ)² / (2σ²) ]

    Central Limit Theorem

This theorem states that sampling distributions, provided that the sample size is large enough, will approach normal distributions. For example, if you plot the ages of the students in an undergraduate class of 20 at UC Davis, the distribution may or may not be normal. But if you plot the data from 10 classes of 20 together, it will most likely be normally distributed.

To normalize distribution statistics, mean and variance data can be converted into z-scores:

z = (X − μ) / σ

A z-score is really the distance from the mean in terms of standard deviations.


    The Binomial Test

A binomial test is used to determine whether more observations fall into one of two categories vs. the other. We use the so-called binomial expansion to calculate probabilities for various problems, such as tossing coins and drawing cards from a deck.

A common use for the binomial test in sensory evaluation is for the analysis of difference tests. For difference testing, the two categories are judges able to discriminate vs. judges unable to discriminate between samples.
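A hedged sketch of a binomial test in Python (SciPy 1.7+ for binomtest; the counts are hypothetical): 18 of 30 judges answer correctly in a forced-choice difference test where guessing alone would give a correct answer one time in three.

```python
from scipy.stats import binomtest

correct = 18      # judges giving the correct answer (hypothetical)
n = 30            # total number of judges
p_chance = 1 / 3  # probability of a correct answer by guessing alone

# One-tailed test: are there more correct answers than chance alone would predict?
result = binomtest(correct, n, p_chance, alternative='greater')
print(f"p-value = {result.pvalue:.4f}")
```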

    The Chi-Square Test

    The chi-square test is used to test hypotheses about frequency of occurrence.

    Just like there is a normal and a binomial distribution, there is a chi-square distribution.

Chi-square = Σ (O − E)² / E, summed over all categories, where O is the observed frequency and E the expected frequency in each category.

In practice, we compare the calculated chi-square value to the largest value that could be expected to occur by chance under the null hypothesis (found in tables for various levels of significance).

The chi-square test is very powerful (it readily rejects H0), which guards against a Type II error, but it does not guard well against a Type I error (see below).

    The chi-square test requires that all observations be independent.
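A minimal chi-square sketch (hypothetical counts): do 90 consumers choose among three products in equal proportions?

```python
from scipy.stats import chisquare

observed = [42, 28, 20]   # consumers choosing product A, B or C (hypothetical)
expected = [30, 30, 30]   # equal split expected under the null hypothesis

chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, df = {len(observed) - 1}, p = {p:.4f}")
```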


    Student's t-test

The comparison of the means from two sets of observations (e.g., two products or two panels) is a very common task in sensory evaluation. Are they statistically different, or more or less the same? This question is answered using the t-statistic, which is distributed somewhat like a z-score, except that its distribution is wider and flatter. We compare the calculated t-value for those two means to tabled values of t at the usual probability levels of alpha (5%, 1%, and 0.1%).

There are three variations of the t-test (a code sketch follows the list):

1. One-sample t-test. This test determines whether a sample (we mean a statistical sample here) with a given mean x̄ comes from a population with a given mean mu (μ).

2. Two-sample t-test, related samples. This test determines whether two samples were drawn from the same population (means not significantly different) or from different populations (means significantly different), in a related-samples design. We would use this t-test to compare the mean ratings for a panel of judges in two conditions - for example, rating the intensity of an attribute in a sample under white light vs. red light.

3. Two-sample t-test, independent (unrelated) samples. This test determines whether two samples were drawn from the same population (means not significantly different) or from different populations (means significantly different), in an independent-samples design - for example, comparing the mean ratings given to a sample by two different panels.
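The three variations map directly onto standard SciPy calls; the sketch below uses made-up ratings rather than the lesson's data.

```python
import numpy as np
from scipy import stats

white_light = np.array([5.2, 6.1, 5.8, 6.4, 5.9, 6.0, 5.5, 6.2])  # hypothetical panel ratings
red_light   = np.array([5.8, 6.5, 6.1, 6.9, 6.2, 6.4, 5.9, 6.6])  # same judges, second condition
panel_B     = np.array([4.9, 5.5, 5.2, 5.8, 5.4, 5.1, 5.6, 5.3])  # ratings from a different panel

# 1. One-sample t-test: does the panel mean differ from a hypothesized population mean of 6?
print(stats.ttest_1samp(white_light, popmean=6.0))

# 2. Two-sample t-test, related samples: the same judges under two conditions
print(stats.ttest_rel(white_light, red_light))

# 3. Two-sample t-test, independent samples: two different panels rating the same sample
print(stats.ttest_ind(white_light, panel_B))
```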

Statistical versus Practical Significance

A consideration with regard to evaluating significance is the role of the number of cases or people that are involved in testing for significance. With large N's, such as with tests where there are 100 or more observations or respondents, you may reach statistical significance at the .05 level with very small differences in the means. Thus, although statistical significance may be obtained, the actual difference in the means may be inconsequential relative to other factors which could influence decision-making. This "practical" significance should be taken into account when making recommendations from a sensory test.


Topic 2.5: Correlation and Regression

Linear correlation is a measure of the degree of association between two data sets; it indicates to what degree a plot of the two sets of data fits on a straight line.

Linear regression is the technique whereby a straight line that best fits the data set is drawn as close as possible to all the points.

The formula used to calculate Pearson's product-moment correlation coefficient r is:

r = Σ(X − x̄)(Y − ȳ) / √[ Σ(X − x̄)² · Σ(Y − ȳ)² ]

The calculated value for r is compared to a table of critical values for Pearson's product-moment correlation coefficient. Those are given for various significance levels, and for one-tailed and two-tailed scenarios. Note that the degrees of freedom for a correlation coefficient are n - 2, where n is the number of pairs of X and Y observations used to calculate r. This is the one time when degrees of freedom are not equal to n - 1!
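A minimal sketch (hypothetical data) computing Pearson's r with SciPy, which reports the p-value directly instead of requiring a table lookup; the degrees of freedom are n - 2 as noted above.

```python
import numpy as np
from scipy import stats

sweetness  = np.array([2.1, 3.4, 4.0, 5.2, 6.1, 6.8, 7.5])    # sensory ratings (hypothetical)
sugar_conc = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]) # sugar levels (hypothetical units)

r, p = stats.pearsonr(sweetness, sugar_conc)
print(f"r = {r:.3f}, r^2 = {r**2:.3f}, p = {p:.4f} (df = {len(sweetness) - 2})")
```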


Both linear correlation and linear regression are based on the assumption that there is a linear relationship between the data sets. THIS IS AN IMPORTANT AND SOMETIMES OVERLOOKED ASSUMPTION.

    Linear correlation is based on four additional assumptions:

Each pair of X and Y values must be independent.
The data must come from a bivariate normal distribution.
Generally, X and Y should be randomly sampled.
X and Y should be homoscedastic (of equal variance).

    In practice, these assumptions are rarely checked.

    A significant correlation between two variables X and Y may imply that:

1. X causes Y
2. Y causes X
3. X and Y are both caused by some other factor
4. None of the above

That is, correlation does NOT necessarily imply CAUSALITY.

Another consideration in looking at the significance of correlation coefficients is the fact that tables only give us information on what level of confidence we can have that r is not zero. They do not tell us that this r will be of practical value. For example, an r of 0.2 could be significantly different from zero at the .05 level, but this would only mean that X accounts for four percent of the variance in Y. (Note: r² is called the coefficient of determination and indicates the variance of Y accounted for by X, in this example 0.2² = 0.04 or 4%.)


Topic 2.6: Analysis of Variance

Analysis of variance is by far the most common statistical test performed on sensory evaluation and consumer testing data, where more than two products are compared using scaled responses, such as the intensity of an attribute in descriptive analysis, or the degree of liking of a product in consumer testing. It provides a very sensitive tool for assessing whether treatment variables, such as ingredient or processing variables, have an effect on the sensory properties and/or acceptability of a product.

Analysis of variance is a method used for finding and quantifying variation that can be attributed to specific (assignable) causes, against a background of existing variation due to other (unexplained and non-assignable) causes. These other unexplained causes account for the so-called experimental error or noise in the data. What the analysis of variance algorithm essentially does is to compare the size of the variance from each assignable cause to that of the experimental error. If the ratio of the two is large, it is likely that the assignable cause is a significant source of variation in the data (i.e., it accounts for a significant chunk of the variance). If the ratio is small, that variable does not contribute too much of the variance in the data, and it is not a significant source of variation. So this is all about sorting and separating the 'big chunks' from the 'small chunks' among the sources of variation and their interactions. The problem usually lies in figuring out where the medium-size chunks belong - with the large or the small ones?

Analysis of variance estimates the variance or squared deviations attributable to each factor in the design. This is the degree to which each factor or variable moves the data away from the overall mean of the data set. It also estimates the amount of variance represented by the error. (Think of the error as the 'other' variance, not attributable to the factors we manipulated in our experiment.) Then, analysis of variance examines the ratio of each factor's variance to the error variance.

This ratio follows the distribution of an F-statistic, and it is called an F-ratio. Mathematically, it is the ratio of the mean-squared differences among treatments over the mean-squared error. In a two-product situation, it is simply the square of the t-value (see Student's t-test above). A significant F-ratio for a given factor states that the means for that factor were significantly different - all it takes for that to happen is for two of the means to be 'different'. Note that an F-ratio has two degrees of freedom: one for the numerator (number of treatment levels minus one), and one for the denominator - the error's.
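A minimal one-way sketch (hypothetical ratings for three products) showing the F-ratio and, in the two-product case, the fact that F is the square of t:

```python
import numpy as np
from scipy import stats

product_A = np.array([6.2, 5.8, 6.5, 6.0, 6.3, 5.9])   # hypothetical ratings
product_B = np.array([5.1, 5.4, 4.9, 5.3, 5.0, 5.2])
product_C = np.array([6.8, 7.1, 6.5, 7.0, 6.9, 6.7])

# One-way analysis of variance across the three products
F, p = stats.f_oneway(product_A, product_B, product_C)
print(f"F = {F:.2f}, p = {p:.5f}")

# Two-product case: the F-ratio equals the square of the t-value
t, _ = stats.ttest_ind(product_A, product_B)
F2, _ = stats.f_oneway(product_A, product_B)
print(f"t^2 = {t**2:.3f} vs F = {F2:.3f}")
```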


    Judges: are they a random or fixed effect?

This is a somewhat controversial issue which is going to take us back to our definition of the individuals involved in sensory tests. Are the (trained) judges doing a sensory evaluation test? Or are they human subjects whose thresholds are getting measured, or perhaps consumers giving hedonic ratings?

If the individuals have been randomly sampled from the general population (and hence are representative of that population), then they should be treated as a random effect. This is the case when a consumer panel of 300 housewives is assembled to evaluate the quality of TV dinners. They represent the general population of housewives at large. This is also the case with a sample of 300 overweight human subjects recruited randomly from the at-large population of overweight individuals for a clinical study of a diet pill.

In sensory analytical tests (sensory evaluation), however, judges are TRAINED, which causes them to no longer represent a random sample from the population. In an analysis of variance design that includes judges as a source of variation, we recommend that they be treated as a fixed effect. But to be fair, we should note that many statisticians and sensory scientists will argue otherwise, and treat judges in a descriptive analysis as a random effect. (You may want to consult some of the main sensory evaluation textbooks on the subject or discuss this with the company's statistician(s), and make up your own mind.)

Does it make a difference anyway? Well yes, it can. The difference resides in the calculation of the F-ratio for the other fixed effects. In a design with judges, samples, replications, and their interactions as sources of variation, the sample F-ratio will be:

    MS Samples / MS Error if judges are treated as a fixed effect,

    or

MS Samples / MS Judges×Samples if they are treated as a random effect, where MS stands for Mean Squares.

    Sometimes, this can make a difference in the significance of the F-ratio.

How do we run an analysis of variance and then read and interpret the results? This will be an important component of the tutorial at the end of the lesson, and the focus of one of the exercises for this lesson. We will use SAS (Statistical Analysis Systems - PC Version) to run our ANOVA. But you may use any other software with that capability. We usually present the results as a table showing F-ratios and their significance for the sources of variation (treatments) and their interactions, as shown in the table from your reading assignment.
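The SAS tutorial itself is not reproduced here, but the fixed-vs-random distinction can be sketched in Python with statsmodels (the judges, samples and ratings below are simulated, not the lesson's data): the ANOVA table is computed once, and the sample F-ratio is then formed against either the error mean square (judges fixed) or the judge-by-sample mean square (judges random).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated replicated descriptive data: 6 judges x 3 samples x 2 replications (hypothetical)
rng = np.random.default_rng(1)
rows = [{'judge': j, 'sample': s,
         'rating': 5 + {'A': 0.0, 'B': 0.8, 'C': 1.5}[s] + 0.3 * j + rng.normal(0, 0.5)}
        for j in range(6) for s in ['A', 'B', 'C'] for _ in range(2)]
df = pd.DataFrame(rows)

model = smf.ols('rating ~ C(judge) * C(sample)', data=df).fit()
aov = sm.stats.anova_lm(model, typ=2)
ms = aov['sum_sq'] / aov['df']                         # mean square for each source

F_fixed = ms['C(sample)'] / ms['Residual']             # judges treated as a fixed effect
F_random = ms['C(sample)'] / ms['C(judge):C(sample)']  # judges treated as a random effect
print(aov)
print(f"F (judges fixed) = {F_fixed:.2f}, F (judges random) = {F_random:.2f}")
```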


Topic 2.7: Multiple Mean Comparison Tests

In an analysis of variance, we determine whether sources of variation are significant or not by examining F-ratios. For example, if the F-ratio for a treatment in the design is significant, it means that overall, the means for that treatment are significantly different. At this point, there is a need to determine how individual means for that treatment differ from each other. This is done with multiple mean comparison tests, most of which are based on the t-test. The rationale here is to avoid the inflated risk of a Type I error that is inherent when making paired comparisons between means with multiple t-tests.

There are different methods for multiple mean comparisons, including Duncan's studentized range test, Fisher's Least Significant Difference (LSD) test, Newman-Keuls' test, Scheffe's test, and Tukey's Honestly-Significant-Difference (HSD) test.

Scheffe's test is the most conservative (i.e., least likely to produce significant differences among means), and Fisher's LSD the least (i.e., it requires the smallest difference between means to establish significance). The Duncan test guards well against Type I error among a set of paired comparisons, after a significant F-ratio has been found in the ANOVA for that treatment. We recommend using either Fisher's LSD test or Duncan's test depending on the circumstances. Fisher's Least Significant Difference is calculated as follows:

LSD = t × √(2 × MSE / n)

Where:

MSE = the error mean square from the analysis of variance

n = the number of observations on which each treatment mean is based (e.g., the number of judges, samples or replications, depending on the factor being compared) in a one-way ANOVA or one-factor repeated measures analysis of variance

t = the t-value for a two-tailed test with the degrees of freedom of the error term (that number is 1.98 with a high enough number of degrees of freedom) and an alpha level of 5, 1 or 0.1%.

If the difference between 2 treatment means is larger than the LSD, then the means are deemed to be significantly different at the corresponding alpha level (5, 1 or 0.1%).

Duncan's studentized range statistic q for a pair of means goes as follows:

q = (x̄A − x̄B) / √(MSE / n)

    This q value must exceed a tabulated value based on the number of means being compared.

All of these multiple mean comparison tests are typically included as options in the statistical software you use. It is simply a matter of writing in the one or two lines in your program requesting that these multiple mean comparisons be run after the main ANOVA procedure. We will go through that procedure with our ANOVA example in the tutorial, and you will make multiple mean comparisons in an ANOVA assignment as well.
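A minimal sketch of Fisher's LSD as defined above (the MSE, error degrees of freedom, group size and means are hypothetical; in practice they come straight from the ANOVA output):

```python
import numpy as np
from scipy import stats

mse = 0.85        # error mean square from the ANOVA (hypothetical)
df_error = 22     # degrees of freedom of the error term (hypothetical)
n = 12            # observations contributing to each treatment mean (hypothetical)
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df_error)   # two-tailed critical t
lsd = t_crit * np.sqrt(2 * mse / n)

means = {'A': 6.05, 'B': 5.20, 'C': 6.78}       # hypothetical treatment means
for a, b in [('A', 'B'), ('A', 'C'), ('B', 'C')]:
    diff = abs(means[a] - means[b])
    verdict = 'significant' if diff > lsd else 'not significant'
    print(f"{a} vs {b}: |difference| = {diff:.2f}, LSD = {lsd:.2f} -> {verdict}")
```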


Topic 2.8: Nonparametric Statistics

The t-test, analysis of variance, and multiple mean comparisons are examples of parametric statistics. These methods are suited for situations where the variables under study are continuous, as with rating scales (i.e., an intensity rating on a 0-10 numerical scale varies continuously from 0 to 10). There are many instances in sensory evaluation, however, when we categorize performance into right and wrong answers, or when we count the number of people who choose one product over another. This is called discrete, categorical data, and to properly study it statistically, we must use nonparametric statistics.

We have already introduced the binomial and chi-square distributions and tests and their applications. Other nonparametric tests of interest in sensory evaluation are rank order tests.

Ranking is a form of difference testing where two or more samples are ranked in order of intensity of an attribute or preference. The simplest case of ranking is the paired comparison. A simple nonparametric test of difference with paired data is the sign test. It only involves counting the direction of paired scores, assuming a 50/50 split under the null hypothesis. Once the + and - signs have been counted (no ties allowed), we compute the ratio of pluses to total pairs and check the corresponding probability level in a two-tailed binomial probability table.

An alternative to the independent samples t-test is the Mann-Whitney U test. This test can be used in a situation where two sets of data are to be compared and the level of measurement is ordinal.

Two tests are commonly used to analyze ranking data with multiple products. One is the Friedman analysis of variance. The second is Kramer's rank sum test. Both will be covered in some detail in the lesson on difference testing (Lesson 4).
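A sketch of these nonparametric tests with SciPy (hypothetical data throughout): the sign test is run as a two-tailed binomial test on the + counts, the Mann-Whitney U test compares two independent sets of ordinal scores, and the Friedman test handles rankings of several products by the same judges.

```python
from scipy import stats

# Sign test: 14 of 20 judges preferred sample A over sample B (no ties), two-tailed
print(stats.binomtest(14, 20, 0.5, alternative='two-sided').pvalue)

# Mann-Whitney U test: ordinal scores from two independent groups
group_1 = [3, 5, 4, 6, 7, 5, 4]
group_2 = [6, 7, 8, 6, 9, 7, 8]
print(stats.mannwhitneyu(group_1, group_2, alternative='two-sided'))

# Friedman test: ranks given by 5 judges to 3 products (one list per product)
product_A = [1, 2, 1, 1, 2]
product_B = [2, 1, 2, 3, 1]
product_C = [3, 3, 3, 2, 3]
print(stats.friedmanchisquare(product_A, product_B, product_C))
```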

Finally, we should mention an alternative to Pearson's product-moment correlation coefficient - the Spearman rank order correlation. It is a recommended alternative for data with a high degree of skew or outliers, or for data from ordinal scales. The Spearman rank order correlation (ρ, or rho) asks whether the two variables line up in similar rankings across a set of observations. Tables of significance indicate whether an association exists on the basis of these rankings. The data must be converted to ranks first (if it is not ranking data to begin with), and a difference score calculated for each pair of ranks (d):

ρ = 1 − (6 Σd²) / [N(N² − 1)]

Where:

ρ = the Spearman correlation coefficient (rho)

Σd² = the sum of the squares of the differences between ranks

N = the number of cases

To summarize the parallels between parametric and nonparametric statistics: the independent-samples t-test corresponds to the Mann-Whitney U test, the related-samples comparison to the sign test, the analysis of variance on ranks to the Friedman test, and Pearson's r to Spearman's rho.


Food for thought: How carefully should you check the assumptions behind the statistical tests you use?

    "Very carefully!" is going to be the answer from your statistician. The livelihood of a statistician is indeed to makesure that the statistics are done right, and that means checking all the assumptions behind a statistical test.

You should never lose sight of the bigger question, however (i.e., the one you are trying to answer with the sensory test you carried out).

It is rather unlikely that, if you follow sound experimental design and test execution practices, the conclusions you reach will be completely flawed because one of the many assumptions underlying the particular statistical test you used to analyze the data was violated.

    For example, let's examine the following question:

Is there a correlation between the acceptability of frozen yogurt (as measured by hedonic ratings from consumers) and acidity (as measured by pH or sensory ratings of sourness)? If the two variables are indeed related (most likely inversely so), the fact that the two sets of observations you collect for a set of yogurt samples varying in acidity may not quite have the same variance should not lead you to the wrong findings. (To figure out which assumption(s) may have been violated here, see the section on correlation and regression above.)


Topic 2.9: Experimental Design

The experimental design of a research study is the most critical component of any study. We can define experimental design as an organized approach to the collection of experimental data. Ideally, this approach should define the population to be studied, the randomization process, the administration of treatments, the sample size requirement, and the method(s) of statistical analysis.

    In statistical terms, the experimental design defines:

The size and number of the experimental units.
The manner in which the treatments are allotted to the units.
The appropriate type and grouping of the experimental units.

The experimental unit is the smallest subdivision of the experimental material such that any two could be assigned to different treatments.

    Significance, Power and Precision

Hypotheses and estimates in the experimental 'world' are usually subject to error. An experiment should therefore be designed in such a way that:

1. The probability (i.e., the alpha value) of rejecting the null hypothesis H0 (μ1 = μ2) when it is true is low (typically 5%, 1%, or 0.1%) = significance level

2. The probability of rejecting the null hypothesis when it is false (1 - β) is high (typically 90%) = power

3. The estimate of the difference between μ1 and μ2 should (with high confidence - 95%) be within +/- 10% of being correct = precision of estimate

To accomplish this, one first needs an estimate of the expected standard deviation of the intended experiment. This does not need to be ironclad, and any preliminary experiment or literature information would be usable.

Next, one needs to decide, based on the experimental objective, how large an 'important' difference must be for it to be worth identifying. This is NOT a statistical concept, but rather an educated decision based on the background and implications of your results and conclusions.


    Sample Size Determination

The issue of the number of judges, subjects, or consumers required for a given sensory test is not always dealt with properly. Whereas many sensory scientists are satisfied to use what is considered a 'typical' number of judges for a given test (e.g., 30 for a difference test, 10-15 for descriptive analysis, and 100-300 for a consumer test), there are statistical means to determine what an appropriate number of subjects should be for a given application. These means are collectively known as 'power analysis.' It requires estimates of the variance and of within-subject correlations for the measurement to be carried out in the sensory test (e.g., intensity scaling, hedonic scaling, etc.), and of the minimum size of the difference required to establish significance (e.g., 1 point on a 10-point intensity scale or a 9-point hedonic scale).

Note that power analysis is more critical for groups of human subjects or consumers randomly selected from the population they represent. In the case of judges recruited for descriptive analysis, because they are trained to use a scale in a specific way, those estimates of variance and within-judge correlations are low and high, respectively, and the required sample size for a panel goes down dramatically. We therefore know (and this is confirmed by experience) that a panel size of 10-15 judges is amply sufficient for descriptive analysis (provided they are properly trained).

Now, let's look at an example, based on a clinical study with human subjects. We completed a study of the effect of dietary fat on preferences for fat in selected foods (Guinard et al., 1999). The main hypothesis was that a shift in energy from fat in the diet would result in a corresponding shift in hedonic response to fat in foods. A power analysis was carried out to estimate the size of the sample that would be adequate to test our hypothesis. We took estimates for variance (MSE = 1.89) and within-subject correlations (ρ = 0.66) from a prior study of animal fat acceptability among athletes conducted with 40 subjects on 29 meat and dairy products using the 9-point hedonic scale. Based on that study, we deemed a minimum detectable difference of 0.8 on the 9-point scale adequate to verify our hypothesis. With a power of 90%, and a two-tailed test with alpha = 0.05, we found that minimum detectable differences of 0.9, 0.8 and 0.7 would be achieved with 20, 25 or 30 subjects, respectively. We recruited 25 subjects for our experiment.

For details on how to incorporate these estimates into the calculation of the required sample size, consult Fleiss (1986), Schlesselman (1973), or any experimental design textbook.
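As a rough illustration only (this is a normal-approximation shortcut, not the exact procedure of Fleiss, 1986, or Schlesselman, 1973, that the study used), the sketch below converts the quoted variance and within-subject correlation into the standard deviation of a paired difference and solves for the number of subjects; it lands in the same neighborhood as, but not exactly on, the 20/25/30 figures quoted above.

```python
import numpy as np
from scipy.stats import norm

mse = 1.89     # variance estimate from the prior study
rho = 0.66     # within-subject correlation
alpha = 0.05   # two-tailed
power = 0.90

sd_diff = np.sqrt(2 * mse * (1 - rho))            # sd of a within-subject difference score
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)     # normal-approximation multiplier

for delta in (0.9, 0.8, 0.7):                     # minimum detectable difference, 9-point scale
    n = (z * sd_diff / delta) ** 2
    print(f"delta = {delta}: about {int(np.ceil(n))} subjects")
```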


    Randomization

Randomization is the process whereby we assure that each experimental unit has an equal probability of being assigned to each treatment. The purpose of randomization is to guarantee that the statistical test will have a valid significance level.

In sensory testing, a common form of randomization is the random distribution of panelists to a specific group. By doing this, the uncontrolled variation among panelists is distributed across treatment groups, and the treatment effect is therefore similarly affected, resulting in cancellation of the effect on overall variation.

Another form of randomization is the random ordering of sample presentation in a sensory test. It is recommended that the sample order be counterbalanced, with each serving sequence occurring an equal number of times (fully counterbalanced design). It is also recommended to use completely balanced serving sequences when the possibility of carryover effects between samples exists. The Mutually Orthogonal Latin Squares (MOLS) designs developed by Wakeling and MacFie (1995) are an example of designs where every sample occurs in every position in the sequence (first, second, ...) the same number of times, as well as before and after every other sample in the design the same number of times. This is particularly useful in preference mapping applications where consumers see several samples, and first-order and/or carryover effects are a possibility.
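The MOLS construction of Wakeling and MacFie (1995) is not reconstructed here, but a Williams Latin square, a closely related device, already gives serving orders in which every sample appears once in every position and every ordered pair of samples appears once in adjacent positions (for an even number of samples). The sketch below is illustrative only; the sample and panelist labels are made up.

```python
def williams_square(t):
    """Williams Latin square for an even number of treatments t (labelled 0 .. t-1).

    Every treatment appears once in each serving position, and each ordered pair
    of treatments appears exactly once in adjacent (carryover) positions.
    """
    if t % 2:
        raise ValueError("this simple construction requires an even number of samples")
    first, lo, hi = [], 0, t - 1
    for i in range(t):                 # interleave: 0, t-1, 1, t-2, 2, ...
        first.append(lo if i % 2 == 0 else hi)
        lo, hi = (lo + 1, hi) if i % 2 == 0 else (lo, hi - 1)
    return [[(x + shift) % t for x in first] for shift in range(t)]

samples = ['A', 'B', 'C', 'D']
for panelist, row in enumerate(williams_square(len(samples)), start=1):
    print(f"panelist {panelist}: {[samples[i] for i in row]}")
```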


In the layout for a RCB (randomized complete block) design, every block (e.g., every judge or session) evaluates every treatment once. The corresponding ANOVA table partitions the total variation as follows (b blocks, t treatments):

Source       Degrees of freedom
Blocks       b - 1
Treatments   t - 1
Error        (b - 1)(t - 1)
Total        bt - 1

The RCB design is frequently used when trained panels must evaluate several samples in replicate (not feasible in one single session). In this case, it is best to have each judge evaluate all samples in a single session, and then return to evaluate them again in another session, etc. In this type of study, the blocks are the sessions, and the samples are randomized across judges within each block.


    Balanced Incomplete Block Design (BIB)

    This is an extension of the RCB design, with the following parameters:

t = number of treatments
k = number of experimental units per block
r = number of replications of each treatment
b = number of blocks (judges)
λ = number of blocks in which each pair of treatments is compared

    These parameters are not independent and the following requirements apply:

rt = bk = N and λ(t - 1) = r(k - 1)

    where N is the total number of observations in the experiment.
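A tiny sketch (the parameter values are hypothetical) that checks whether a proposed set of BIB parameters satisfies the two requirements above before any samples are poured:

```python
def bib_ok(t, k, r, b, lam):
    """Check the BIB requirements: rt = bk = N and lambda * (t - 1) = r * (k - 1)."""
    return r * t == b * k and lam * (t - 1) == r * (k - 1)

# Example: 7 treatments in blocks of 3, 3 replications, 7 blocks, lambda = 1 (hypothetical)
print(bib_ok(t=7, k=3, r=3, b=7, lam=1))   # True: this combination is a valid BIB
```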

In the layout for a BIB design, each block (judge or session) receives only k of the t treatments, arranged so that every pair of treatments occurs together in λ blocks. The corresponding (intrablock) ANOVA table is:

Source                  Degrees of freedom
Blocks                  b - 1
Treatments (adjusted)   t - 1
Error                   N - t - b + 1
Total                   N - 1


A BIB design is used when there are too many treatments in the experiment for the judges to evaluate all the samples in a single session (block). In this case, judges evaluate subsets of samples during different sessions.


    Crossover Design

A crossover design is a plan characterized by the measurement of the response of judges/consumers from the evaluation of two treatments/products, each being evaluated in sequence. Crossover designs are not recommended for use in sensory evaluation (with judges), but they are used extensively in clinical research and have their application in consumer home-use tests, where consumers are asked to first use a product for some period of time and then use the other product, after which they complete a questionnaire regarding the products.

The layout of the crossover design is:

Group I:  Period 1 = Product A, Period 2 = Product B
Group II: Period 1 = Product B, Period 2 = Product A

In designing a crossover home-use test, two groups of consumers are formed - I and II. Consumers are assigned randomly to the two groups. Group I uses product A in the first period and product B in the second, and vice versa for Group II.

Two assumptions of this model, which may not always be met in a home-use test, are that (1) there is no product-by-period interaction; that is, the difference between products A and B is the same regardless of the sequence in which they were evaluated; and (2) there are no order or carry-over effects from one product to the other. Neither assumption is likely to hold true, however.


Factorial Designs

Factorial designs are plans used to study the effects of two or more factors on product attributes, where each level of a factor is varied simultaneously with the other factors in the experiment. Chief among factorial designs is the 2^k factorial design. It is the foundation of response surface methodology, an optimization technique we will discuss in Courses 3 and 4.

In the 2^k factorial design, there are k factors (e.g., A, B, C, ...), each at two levels - low and high.

In a two-factor experiment, the response Y consists of the effects of factors A, B, the interaction AB, and the residual E (error). Note that without replications, the AB interaction cannot be separated from the error. The statistical model is written as:

Y_ijk = μ + A_i + B_j + (AB)_ij + E_ijk

    Where:

i = 1, 2 (low, high)
j = 1, 2 (low, high)
k = 1, 2, ..., r replications

A center point can be added to the 2^k factorial design. It is located half-way between the low and high levels of the two variables.

The 3^k factorial design is another design where we consider low, medium, and high levels of each variable.
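A minimal sketch of a 2^2 factorial with r = 3 replications, fitting the model above with statsmodels (the factor names and simulated responses are hypothetical); the A*B term in the formula expands to the A and B main effects plus the AB interaction.

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# 2^2 factorial: two factors, each at a low (-1) and high (+1) level, 3 replications each
rng = np.random.default_rng(0)
rows = [{'A': a, 'B': b,
         'y': 5 + 1.2 * a - 0.8 * b + 0.5 * a * b + rng.normal(0, 0.3)}   # simulated response
        for a, b in itertools.product([-1, 1], repeat=2) for _ in range(3)]
df = pd.DataFrame(rows)

model = smf.ols('y ~ A * B', data=df).fit()   # Y = mu + A + B + AB + error
print(sm.stats.anova_lm(model, typ=2))
```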


Tables

Table of critical values for chi-square
Table of critical values for Pearson's product-moment correlation coefficient
Tables of Spearman rank order correlation values
Table of critical values of t (Student's t-test)


Univariate Statistics Tutorial

Most of the assignments for this lesson will have you run statistical tests on actual data. We are providing a set of guidelines here on how to run the tests with Excel or PC SAS. The first tutorial covers mean, variance, standard deviation, t-test, chi-squared, correlation and regression on Excel. The second tutorial covers analysis of variance on SAS.

Stats Tutorial 1
Stats Tutorial 2 (Analysis of Variance Tutorial)
