

SPSS Basics for Probability Distributions

Prepared by: Chris Hay-Jahans, UAS Mathematics Program

Built-in Statistical Functions in SPSS

Begin by defining some variables in the Variable View of a data file. Save this file as “Probability_Distributions.sav” and save the corresponding output file as “Probability_Distributions.spo”.

Accessing built-in statistical functions (and others) in SPSS is fairly straightforward: select Transform and then Compute Variable.

The Compute Variable window then opens.


For the present, let’s identify where the various distribution-related functions are located. The menu of all available functions in SPSS is in the box labeled Function group. Selecting one of the function groups provides a sub-menu in the Functions and Special Variables box.

Selecting CDF & Noncentral CDF gives access to a list of cumulative distribution functions.

Selecting PDF & Noncentral PDF gives access to a list of probability density functions.


Selecting Inverse DF gives access to inverse functions for cumulative probabilities.

Selecting Random Numbers provides access to a list of random number generators for specific probability distributions.


Selecting Significance provides access to functions which may be used in computing a significance (commonly referred to as a p-value) corresponding to the F- and Chi-Square distributions.

Selecting Statistical provides access to some of the more routine functions used in statistical computations.


Finally, selecting Arithmetic provides access to a list of commonly used arithmetic, algebraic and transcendental functions which might be needed.

When using any of the functions identified above (or any others, for that matter), a Target Variable has to be defined or listed, and then a Numeric Expression (the desired computational formula) has to be entered. This formula may or may not involve built-in functions, and it may or may not involve other variables listed in the data file. Once the formula is entered and the OK button is clicked, SPSS computes values for every row that has at least one data value in it.

Computing Probabilities and their Inverses

Consider the following simple examples.

Working with built-in functions for the Binomial Distribution

Go to the Data View of the data file, enter the data values 5, 10, 15 and 20 for the variable x. The following discussion addresses computations of cumulative probabilities for binomial distributions.

Open the Compute Variable window, identify “Binomial” as the Target Variable, then select CDF & Noncentral CDF in the Function group box and highlight Cdf.Binom. To place this function in the Numeric Expression box, click on the upward-arrow button located right next to the Function group box.

Whenever a function or special variable is highlighted in the Functions and Special Variables box, a description appears in the central space reserved for instructions and descriptions, right below the “calculator” key-pad.


In this case, note that the Numeric Expression “CDF.BINOM(?,?,?)” requires a “quant”, an “n”, and a “prob”. This function computes a probability of the form

P(X ≤ x) = CDF.BINOM(x, n, p) for a binomial distribution with probability of success p and number of trials n.

Enter x for the first “?” (SPSS takes the values of x from the data file), then set n = 23 and p = 0.37.


Now click on OK. A little window will pop up asking if the existing values of the variable should be changed. Select OK.

The output file will be updated with a “log” of the computation performed, and the data file will now contain the cumulative probabilities

P(X ≤ 5), P(X ≤ 10), P(X ≤ 15) and P(X ≤ 20)

for the binomial distribution with parameters n = 23 and p = 0.37.
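As a cross-check outside SPSS, the same cumulative probabilities can be reproduced with a few lines of Python. This is only a sketch using the standard library; the helper name binom_cdf is an assumption made here and is not part of SPSS.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# Same inputs as the worked example: x = 5, 10, 15, 20 with n = 23, p = 0.37.
xs = [5, 10, 15, 20]
binomial = [binom_cdf(x, 23, 0.37) for x in xs]

for x, prob in zip(xs, binomial):
    print(f"P(X <= {x:2d}) = {prob:.4f}")
```

The first value should agree, to four decimal places, with the 0.0938 quoted for x = 5 in the discussion of the reverse computation.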

Adjust the decimal places for the variable “Binomial” to four places, then copy the values from “Binomial” to the variable “p”, and reset the decimal places for “p” as well.

Now, suppose the reverse computation is desired, i.e., find the x for which the cumulative probability is 0.0938, and so on. The answers are already given in the variable x, but consider a process for finding these values.

It appears that SPSS does not have inverse functions associated with cumulative probabilities for discrete probability distributions. This provides an opportunity to illustrate the use of SPSS’ Command Syntax.

A Command Syntax Illustration: Begin by opening a “Syntax” file (as opposed to a data or output file).


In this new file, type in the command syntax shown below and save the file as “InvBinom.sps”. Note that all text that follows a “/*” is a comment for user reference; comments tell users of the “program” what the purpose of a particular piece of code is.

Now, to run the program, select Run and then All.

Open the data file and notice the new values placed under “InvBinom”.


The values entered are exactly those under the variable “x”.
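The idea behind such an inverse computation can be sketched in Python. This is not the InvBinom.sps program (which is not reproduced here); it is a hypothetical illustration that searches x = 0, 1, …, n for the value whose cumulative probability is closest to the target. The names binom_cdf and inv_binom are assumptions.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def inv_binom(prob, n, p):
    """Return the x in 0..n whose cumulative probability is closest to prob."""
    return min(range(n + 1), key=lambda x: abs(binom_cdf(x, n, p) - prob))

# Round trip: compute the cumulative probabilities, then recover x from them.
targets = [binom_cdf(x, 23, 0.37) for x in (5, 10, 15, 20)]
recovered = [inv_binom(t, 23, 0.37) for t in targets]
print(recovered)
```

Choosing the closest value rather than requiring an exact match means a probability rounded to four places, such as 0.0938, still maps back to x = 5. For probabilities very close to 1 several x-values become nearly indistinguishable, which is one reason such a search is reliable only for theoretically exact inputs.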

Comments about the above “Program” and Command Syntax: In general, when dealing with a binomial distribution it is the probability of success that is of most interest, this being the quantity which might need estimating. The above “program” will work only if you are dealing with a theoretically exact set of probabilities. Its introduction here is for illustrative purposes only. Advanced users of SPSS will find the most use for the Command Syntax language; most users will never need it. Note that making effective use of the Command Syntax feature requires familiarity with computer programming and a reasonably high level of comfort with mathematics.

To compute probabilities of the form P(X = x) for a binomial distribution using SPSS’ built-in function, open the Function group PDF & Noncentral PDF and, in Functions and Special Variables, select the function Pdf.Binom.

Thus, for a binomial distribution with parameters n and p (n trials and probability of success p),

P(X = x) = Pdf.Binom(x, n, p).

Use this function in the same manner in which the cumulative probabilities were computed earlier. Be sure to assign a target variable.
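For comparison, the point probability P(X = x) follows directly from the binomial formula C(n, x) p^x (1 − p)^(n − x). The Python sketch below (standard library only; binom_pmf is a name chosen here, not an SPSS function) evaluates it for the same x-values with n = 23 and p = 0.37.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Point probabilities for the x-values used earlier.
point_probs = {x: binom_pmf(x, 23, 0.37) for x in (5, 10, 15, 20)}
for x, prob in point_probs.items():
    print(f"P(X = {x:2d}) = {prob:.4f}")

# The point probabilities over x = 0..23 add up to 1.
total = sum(binom_pmf(k, 23, 0.37) for k in range(24))
```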

Working with built-in functions for the Normal Distribution

As with binomial distributions, the main characteristics of a normal distribution are determined by two parameters. For the normal distribution (for the built-in functions in SPSS at least) the two parameters are the mean μ and the standard deviation σ (occasionally the variance is used instead of the standard deviation).


Thus, to compute the cumulative probabilities corresponding to the given values of x in the data file, the population mean and standard deviation of the random variable X are needed. Remember that the normal distribution is a theoretical distribution; thus, at best, one may estimate the true mean and standard deviation.

Suppose the data given earlier is obtained from a normal distribution with μ = 6.25 and standard deviation σ = 3.5.

As with the binomial distribution, the cumulative distribution function for a normal distribution with mean μ and standard deviation σ is accessed by opening the Compute Variable window and then selecting Cdf.Normal. The Numeric Expression Cdf.Normal(x, μ, σ) computes the probability P(−∞ < X < x), where x represents the data value being used in the function expression.

Being sure to identify a target variable, click on OK to get


Note that to compute a probability of the form P(a < X < b) using SPSS, one would use the numeric expression

Cdf.Normal(b, μ, σ) - Cdf.Normal(a, μ, σ).
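Both the cumulative probability and the interval probability can be checked outside SPSS. The sketch below uses Python’s standard statistics.NormalDist with μ = 6.25 and σ = 3.5; the helper interval_prob is an assumption introduced here to mirror the difference of two Cdf.Normal values.

```python
from statistics import NormalDist

# Normal distribution from the example: mean 6.25, standard deviation 3.5.
X = NormalDist(mu=6.25, sigma=3.5)

# P(-inf < X < x) for the data values x = 5, 10, 15, 20.
normal = [X.cdf(x) for x in (5, 10, 15, 20)]

def interval_prob(a, b, dist=X):
    """P(a < X < b), computed as the difference of two cumulative values."""
    return dist.cdf(b) - dist.cdf(a)

print([round(v, 4) for v in normal])
print(round(interval_prob(5, 10), 4))
```

As a sanity check, the interval from μ − σ to μ + σ should carry a probability of about 0.6827.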

Increase the number of decimal places, if desired, for the variable “Normal” and then copy and paste the newly computed values into the variable “p”.

Now, consider finding values of x for which P(−∞ < X < x) = p. Think of this task as solving this equation for x. SPSS does this through the function Idf.Normal. Keeping the same values μ = 6.25 and σ = 3.5, set up the Compute Variable window with the numeric expression Idf.Normal(p, 6.25, 3.5).

The computed values then show up in the data file.
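The inverse computation can be sketched in the same way. Assuming Python’s statistics.NormalDist as a stand-in for the SPSS functions, the round trip below first computes p from x and then solves P(−∞ < X < x) = p for x again.

```python
from statistics import NormalDist

X = NormalDist(mu=6.25, sigma=3.5)

xs = [5, 10, 15, 20]
p = [X.cdf(x) for x in xs]                  # cumulative probabilities
inv_normal = [X.inv_cdf(prob) for prob in p]  # solve for x given p

for x, prob, back in zip(xs, p, inv_normal):
    print(f"x = {x:2d}  p = {prob:.4f}  inverse = {back:.4f}")
```

If the probabilities are rounded (for example to four decimal places) before inverting, the recovered x-values will differ slightly from the originals.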


Unlike in the binomial case, the function Pdf.Normal(x, μ, σ) does not return the value of P(X = x). This function computes the probability density of the normal distribution, with specified mean μ and standard deviation σ, at x; see the general discussion on normal distributions in the text.

This function can be used to obtain the graph of the normal distribution curve (with specified mean μ and standard deviation σ). However, it will not play much of a direct role in this course.
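To illustrate how density values trace out the normal curve, the following Python sketch evaluates the density on a grid from μ − 3σ to μ + 3σ. The grid spacing and variable names are assumptions; plotting the (x, density) pairs in any graphing tool would give the familiar bell shape.

```python
from statistics import NormalDist

X = NormalDist(mu=6.25, sigma=3.5)

# Grid of x-values from mu - 3*sigma to mu + 3*sigma in steps of sigma/10.
grid = [X.mean + X.stdev * (k / 10) for k in range(-30, 31)]
curve = [(x, X.pdf(x)) for x in grid]

# The density is a height, not a probability; it is largest at the mean.
peak_x, peak_density = max(curve, key=lambda point: point[1])
print(peak_x, round(peak_density, 4))
```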

Working with built-in functions for the t-Distribution

The parameter needed to determine a t-distribution is the degrees of freedom, df = n − 1. Computing probabilities using a t-distribution follows the same steps as for binomial and normal distributions. The cumulative distribution function for a t-distribution with degrees of freedom df = n − 1 is accessed by opening the Compute Variable window and then selecting Cdf.T. The Numeric Expression Cdf.T(x, df) returns the probability P(−∞ < T < x).


The computed values show up in the data file.

As in previous cases, inverses of probabilities can be computed, here using the function Idf.T, and the results are again placed in the data file.

Note: Some poor notation has crept in, my apologies – in the above data file the letter t represents a probability. In practice the letter t is reserved for the “t-value” of a t-distribution.


The probability density function of t-distributions will not play much of a direct role in this course.
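Python’s standard library has no t-distribution, so the sketch below approximates the cumulative probability by integrating the t density numerically (Simpson’s rule) and inverts it by bisection. All function names, the step count, the search bracket of ±100 and the choice df = 10 are assumptions made for illustration; this is not how SPSS computes Cdf.T or Idf.T.

```python
import math

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T < x): 0.5 plus a Simpson's-rule integral of the density from 0 to x."""
    h = x / steps  # negative for x < 0, which makes the integral negative
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3

def t_inv(prob, df, lo=-100.0, hi=100.0):
    """Solve t_cdf(x, df) = prob for x by bisection on [lo, hi]."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_cdf(2.228, 10), 4))
print(round(t_inv(0.975, 10), 3))
```

With df = 10, the familiar table value t = 2.228 should correspond to a cumulative probability of about 0.975.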

Working with built-in functions for the χ²-Distribution

The parameter needed to determine a χ²-distribution (chi-square distribution) is again the degrees of freedom, df = n − 1. Computing probabilities follows the same steps as before. The cumulative distribution function for a χ²-distribution with degrees of freedom df = n − 1 is accessed by opening the Compute Variable window and then selecting Cdf.Chisq. The Numeric Expression Cdf.Chisq(x, df) returns the probability P(−∞ < χ² < x), which equals P(0 ≤ χ² < x) because χ² values are never negative.

The computed values appear in the data file.

Note: Once again, be aware of the poor notation. Here, the variable “Chi” represents probabilities and the variable “InvChi” represents “χ²-values” of the χ²-distribution.


To obtain inverses of probabilities (i.e., to find the “x-values”) involving a χ²-distribution, the process is the same as before, this time using the function Idf.Chisq.

Again, the probability density function of χ²-distributions will not play much of a direct role in this course.
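The chi-square computations can also be sketched in Python using only the standard library. The cumulative probability is evaluated through the series for the regularized lower incomplete gamma function, and the inverse is found by bisection. The function names and the example value df = 10 are assumptions; this is an illustration, not SPSS’s algorithm.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square < x) via the series for the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = 1.0 / math.gamma(a + 1)   # k = 0 term of sum y^k / Gamma(a + k + 1)
    total = term
    k = 1
    while term > 1e-17 * total and k < 10_000:
        term *= y / (a + k)
        total += term
        k += 1
    return math.exp(a * math.log(y) - y) * total

def chi2_inv(prob, df):
    """Solve chi2_cdf(x, df) = prob for x by bisection; needs 0 < prob < 1."""
    if not 0 < prob < 1:
        raise ValueError("prob must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < prob:   # widen the bracket until it contains x
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_cdf(18.307, 10), 4))
print(round(chi2_inv(0.95, 10), 3))
```

The table value 18.307 for df = 10 should correspond to a cumulative probability of about 0.95.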

Working with built-in functions for the F-Distribution

First, rename the variable “F” to “FProb”.

When working with F-distributions, the parameters needed to determine a distribution include two degrees of freedom, dfN (of the numerator) and dfD (of the denominator). Computing probabilities follows the same steps as before – here suppose, for the sake of example, that the two degrees of freedom are dfN = 3 and dfD = 5.

The cumulative distribution function for the desired F-distribution is accessed by opening the Compute Variable window and then selecting Cdf.F. The Numeric Expression Cdf.F(x, dfN, dfD) returns the probability P(−∞ < F < x).

Similarly, the inverse of a probability involving an F-distribution is accessed by opening the Compute Variable window and then selecting Idf.F.

The Compute Variable windows for each of these are shown next, the first being for computing probabilities.


Then for inverses of probabilities

The results of the above computations are


Yet again, the probability density function of F-distributions will not play much of a direct role in this course.
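For the F-distribution with dfN = 3 and dfD = 5, a rough numerical cross-check is possible in Python by integrating the F density with Simpson’s rule and inverting by bisection. This is a sketch under stated assumptions: the function names and step count are chosen here, and the integration as written assumes dfN ≥ 2 so that the density is finite at 0.

```python
import math

def f_pdf(x, dfn, dfd):
    """Density of the F-distribution with (dfn, dfd) degrees of freedom."""
    a, b = dfn / 2, dfd / 2
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return (dfn / dfd) ** a * x ** (a - 1) * (1 + dfn * x / dfd) ** (-(a + b)) / beta

def f_cdf(x, dfn, dfd, steps=4000):
    """P(F < x) by Simpson's rule on [0, x]; assumes dfn >= 2."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(0.0, dfn, dfd) + f_pdf(x, dfn, dfd)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f_pdf(k * h, dfn, dfd)
    return total * h / 3

def f_inv(prob, dfn, dfd):
    """Solve f_cdf(x) = prob for x by bisection; needs 0 < prob < 1."""
    if not 0 < prob < 1:
        raise ValueError("prob must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while f_cdf(hi, dfn, dfd) < prob:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if f_cdf(mid, dfn, dfd) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(f_cdf(5.41, 3, 5), 3))
print(round(f_inv(0.95, 3, 5), 2))
```

The table value 5.41 for (3, 5) degrees of freedom should correspond to a cumulative probability of about 0.95.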

Assessing the Normality of a Random Variable Graphically

A common method for determining whether the underlying population of a random variable, say X, for a set of data is (approximately) normally distributed is to use what is generically called a normal probability plot. Here, a particular type of normal probability plot, called a Q-Q normal probability plot, is obtained for a given set of data, first from scratch and then using a built-in routine available in SPSS.

The data used are shown below

Computational Procedures for Obtaining a Q-Q Normal Probability Plot

Begin by sorting the data in ascending order. To do this, select Data on the toolbar and then click on Sort Cases.


In the Sort Cases window, identify the variable to be sorted and the Sort Order

The result will be


Now insert a new variable before the variable “x”. Name this new variable “i” and enter the values 1, 2, …, 12 for it.

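Now obtain the plotting positions using Blom’s approximation, which uses the formula

p = (i − 3/8) / (n + 1/4)

where i is the rank of the sorted data value and n is the sample size (here n = 12).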



This is done using the Compute Variable feature.

Adjust the decimal places for the computed variable values to 4 places.
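The plotting positions can be checked with a short Python sketch. It assumes the standard form of Blom’s approximation, p = (i − 3/8) / (n + 1/4), with n = 12 ranks; the function name blom_positions is hypothetical.

```python
def blom_positions(n):
    """Blom plotting positions p_i = (i - 3/8) / (n + 1/4) for i = 1..n."""
    return [(i - 3 / 8) / (n + 1 / 4) for i in range(1, n + 1)]

p = blom_positions(12)
for i, value in enumerate(p, start=1):
    print(f"i = {i:2d}  p = {value:.4f}")
```

The positions are symmetric about 0.5, so the first and last values add up to 1, as do the second and second-to-last, and so on.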

Though not necessary, now standardize the x-values using the mean and standard deviation of the sample.


You will have to compute the mean and standard deviation for the data first. One way to do this is to use the SPSS Descriptives routine (select Analyze, then Descriptive Statistics, then Descriptives).


It is useful to note that SPSS provides a means of obtaining standardized values; see Save standardized values as variables in the Descriptives window. The standardized values are computed using the formula

z = (x − x̄) / s

where x̄ is the mean of the data values and s is the standard deviation. You can limit the amount of output by opening the Options window and selecting only what is desired.
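The standardization step can be sketched in Python as follows. The twelve data values below are hypothetical (the handout’s data are not reproduced here), and the sample standard deviation with divisor n − 1 is assumed.

```python
from statistics import mean, stdev

# Hypothetical sample of 12 values, sorted in ascending order.
x = sorted([4.1, 5.3, 5.9, 6.2, 6.8, 7.0, 7.4, 7.9, 8.3, 8.8, 9.6, 10.7])

x_bar = mean(x)   # sample mean
s = stdev(x)      # sample standard deviation (divisor n - 1)

# Standardized ("observed") values z = (x - x_bar) / s.
zx = [(value - x_bar) / s for value in x]
print([round(z, 4) for z in zx])
```

By construction the standardized values have mean 0 and standard deviation 1.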


The standardized values are saved under the variable name “Zx”. These values will be referred to as “Observed Values”.

The next step in the process is to compute the “Expected Values”. Since it is the normality of the data that is being examined, the distribution used to obtain these expected values is a normal distribution. Furthermore, since the observed data values have been standardized it makes sense to use the Standard Normal distribution. Each plotting position “p” provides the cumulative probability associated with the rank of the corresponding observed data value. The expected value is then obtained by computing

Ze = Idf.Normal(p, 0, 1).


Use the Compute Variable feature to obtain this.
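A minimal Python sketch of the expected-value step is given below. It assumes n = 12 and Blom plotting positions, and uses statistics.NormalDist().inv_cdf as a stand-in for Idf.Normal(p, 0, 1).

```python
from statistics import NormalDist

n = 12
# Blom plotting positions for ranks i = 1..n.
p = [(i - 3 / 8) / (n + 1 / 4) for i in range(1, n + 1)]

# Expected values: standard normal quantiles at the plotting positions.
standard_normal = NormalDist(mu=0, sigma=1)
ze = [standard_normal.inv_cdf(prob) for prob in p]

for prob, z in zip(p, ze):
    print(f"p = {prob:.4f}  Ze = {z:.4f}")
```

Because the plotting positions are symmetric about 0.5, the expected values are symmetric about 0.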

Open the Variable View of the data file and label “Zx” as “Observed Values” and “Ze” as “Expected Values”.

All that is needed to obtain the Q-Q normal probability plot has now been obtained. Any one of the (three) graphing features may now be used to obtain a scatter plot of the observed values against the expected values. The “closeness” to normality is then indicated by how “closely” the scatter plot approximates the line y = x.

The two output columns are shown below.


Now, to obtain the plot, open the Legacy Dialogs and select Scatter/Dot. Choose Simple Scatter and then assign “Ze” to the x-axis and “Zx” to the y-axis. You can add a Title and then select OK. The initial appearance of the Q-Q plot is

Earlier editing methods can be used to obtain


For ease of comparison, the reference line y = x can be included as follows. Click on the button that produces the “Add a reference line from Equation” pop-up text box.

A reference line will appear on the chart area and the Properties window will show the equation of the line in the Custom Equation box in the form

Y = a * x + b

For this Q-Q normal probability plot (using standardized values), make sure that a = 1 and b = 0.


The end result is

Observe that the scatter plot follows the line y = x very closely, and the points are clustered randomly (with no systematic patterns) about the line. This suggests that it is reasonable to assume that the underlying population of the random variable X is (at least approximately) normally distributed. See class handout for a detailed analysis and interpretation of Q-Q normal probability plots.

Obtaining a Q-Q Normal Probability Plot using the Built-in SPSS Feature

SPSS has a built-in feature for constructing Q-Q normal probability plots, which shortens the process considerably.


Select the toolbar commands shown below

In the Q-Q Plots window assign the original variable to Variables, check the indicated boxes etc. and select OK.

The resulting Q-Q normal probability plot is similar to the one obtained “from scratch”, with the exception that the “Expected Values” are placed on the vertical axis rather than the horizontal axis.


Open the Chart Editor window and begin by transposing the graph. This is done by clicking on the button that yields a pop-up box containing the text “Transpose chart coordinate system”.

Further edits can then be made to obtain a Q-Q normal probability plot that is very close in appearance to the earlier obtained plot.