Download - Basic Statistics with Microsoft Excel
Basic Statistics with Microsoft Excel
Helen Dixon
Aim and Objectives Aim of today’s course
To illustrate how Excel can be used to carry out some basic statistical analyses and tests
ObjectivesTo show you how to use some of the
statistical worksheet functions available within Excel
To show you how to use some of the tools available in the Analysis ToolPak
To make you aware of the limitations of Excel
Why use Excel? Software more accessible Previous familiarity with software Easy to format output Better charting facilities than some
statistical applications Access to other key Excel facilities Easy to use results with other applications
Problems with Excel Errors due to rounding, missing data or extreme
values Not suitable for very large data sets Output labelled or arranged inappropriately Need to repeat processes for different variables
or options No record of analyses Some algorithms are numerically unstable - little
or no information about algorithms employed Analysis ToolPak results are not dynamic and
may vary with results generated by functions
Statistical Functions
Frequency Distributions Mean, Median and Mode Percentiles and Quartiles Deviation and Squared Deviation about
the Mean Variance and Standard Deviation Covariance and the Correlation Coefficient
Frequency Use COUNTIF to count how many times
an item appears in a list=COUNTIF(range, criteria)
Use FREQUENCY to calculate how often values occur within a range=FREQUENCY(data_array, bins_array)
Can also use Histogram tool in Analysis Toolpak
Mean, Median, Mode
Use AVERAGE or AVERAGEA to calculate the arithmetic mean=AVERAGE(number1, number2, etc.)
Use MEDIAN to return the middle number=MEDIAN(number1, number2, etc)
Use MODE to return the most common value=MODE(number1, number2, etc)
Percentiles and Quartiles Use PERCENTILE to return the kth percentile of
a data set =PERCENTILE(array, percentile) Percentile argument is a value between 0 and 1
Use QUARTILE to return the given quartile of a data set =QUARTILE(array, quart) Quart is 1, 2, 3 or 4 IQR = Q3-Q1
May return different values to statistical package
Variance and Standard Deviation Use VAR, VARA, VARP or VARPA to
calculate the variance for a rangeE.g. =VAR(value1, value 2, etc.)Squared deviations about the mean/N or /n-1
Use STDEV, STDEVA, STDEVP or STDEVPA to calculate the standard deviation for a range=E.g. =STDEV(value1, value2, etc.)Positive square root of variance
Covariance and the Correlation Coefficient Use COVAR to calculate the covariance
=COVAR(array1, array2)Average of products of deviations for each data point
pairDepends on units of measurement
Use CORREL to return the correlation coefficient =CORREL(array1, array2)Returns value between -1 and +1
Also available in Analysis ToolPak
Probability Numerical measure of the likelihood that
an event will occur Some probabilities that can be calculated
using Excel:Binomial ProbabilitiesPoisson ProbabilitiesHypergeometric ProbabilitiesNormal ProbabilitiesExponential Probabilities
Binomial Probabilities
Use BINOMDIST to compute binomial distribution probabilities and cumulative binomial probabilities=BINOMDIST(number_s, trials, probability_s,
cumulative)Calculates the probability that a sequence of
independent trials with two possible outcomes will have a given number of successes
Cumulative is either TRUE or FALSE
Poisson Probabilities
Use POISSON to compute Poisson Probabilities=POISSON(x, mean, cumulative)Shows the probability of x occurrences of an
event over a specified interval of time or space
Hypergeometric Probabilities Use HYPGEOMDIST to compute
hypergeometric probabilities=HYPGEOMDIST(sample_s,
number_sample, population_s, number_pop)Computes the probability of x successes
(sample_s) in n trials (number_sample) when the trials are dependent
Similar to Binomial except trials are not independent – probability of success changes from trial to trial
Does not compute cumulative probabilities
Normal Probabilities Use NORMSDIST or NORMDIST to
compute the cumulative probability=NORMSDIST(z)=NORMDIST(x, mean, standard_dev,
cumulative) Use NORMSINV or NORMINV to compute
the z or x value given a cumulative probability=NORMSINV(probability)=NORMINV(probability, mean, standard_dev)
Exponential Probabilities Use EXPONDIST to compute exponential
probabilities=EXPONDIST(x, lambda, cumulative)x is the random variableLambda is 1/meanUseful in computing probabilities for the time it
takes to complete a task
Analysis ToolPak
Descriptive Statistics Correlation Linear Regression t-Tests z-Tests ANOVA Covariance
Descriptive Statistics
Mean, Median, Mode Standard Error Standard Deviation Sample Variance Kurtosis Skewness Confidence Level for
Mean
Range Minimum Maximum Sum Count kth Largest kth Smallest
Correlation and Regression
Correlation is a measure of the strength of linear association between two variables Values between -1 and +1 Values close to -1 indicate strong negative relationship Values close to +1 indicate strong positive relationship Values close to 0 indicate weak relationship
Linear Regression is the process of finding a line of best fit through a series of data points Can also use the SLOPE, INTERCEPT, CORREL and RSQ
functions
t-Tests and z-Tests
Used to test hypotheses by comparing means If sample means are equal suggests both
samples came from same population t-Test – n <30
Equal or unequal variances or paired testCheck result using TTEST function
z-Test – n>30Used for means with known variances
ANOVA: Analysis of Variances
Compares variances in two or more data sets If difference is found it can be assumed that the
means of the data sets are different Single Factor – use instead of t-Test for more than 2
samples Two Factor with Replication – useful when data can
be classified along 2 different dimensions Two Factor without Replication – as above but only
one observation for each pair
PivotTables
Use for crosstabulations Data must be in tabular format: columns
with headings, no blank columns Easy to pivot data Easy to create PivotCharts Can summarise and analyse data without
affecting data source
Final Tips
Excel only suitable for basic analysis using small data sets
Later versions of Excel more reliable than Excel 97
Check Analysis TookPak results with worksheet functions
Check overall results by hand or with dedicated statistical package