
Questionnaire Design and Surveys Sampling


The contents of this site are aimed at students who need to perform basic statistical analyses on data from sample surveys, especially those in marketing science. Students are expected to have a basic knowledge of statistics, such as descriptive statistics and the concept of hypothesis testing.

Professor Hossein Arsham   

MENU

1. Questionnaire Design and Surveys Management
2. Sample Size in Surveys Sampling
3. Multilevel Statistical Models
4. Surveys Sampling Routines
5. Cronbach's Alpha (Coefficient Alpha)
6. Instrumentality Theory
7. Value Measurements Survey Instruments (Rokeach's Value Survey)
8. Interesting and Useful Sites


Questionnaire Design and Surveys Management

This part of the course is aimed at students who need to perform basic statistical analyses on data from sample surveys, especially those in marketing science. Students are expected to have a basic knowledge of statistics, such as descriptive statistics and the concept of hypothesis testing.

When the sampling units are human beings, the main methods of collecting information are:

face-to-face interviewing
postal surveys
telephone surveys
direct observation.

Questionnaire Design and Surveys Sampling

http://ubmail.ubalt.edu/~harsham/stat-data/opre330Surveys.htm (1 of 21) [9/3/2001 3:25:49 PM]


Objectives:

To enable students to understand the integrated processes of designing and conducting quantitative survey research projects.

To give students experience of grappling with problems in the design of survey samples, the construction of data collection instruments and the management of survey projects.

To make students aware of the main sources of error in the survey process and of ways of detecting, controlling and minimising such error.

Course Content:

The quantitative survey process from project formulation, statistical design and sampling, through instrument design and question formulation, to data processing.

Basic principles and practice of probability sample design for field surveys.

How to operationalise concepts, word questions, and design, develop and test survey instruments, taking account of intended uses of the data collected.

Principles of manual coding and editing of survey data, computer editing and preparing data for analysis.

Sources of error in survey data, ways of assessing them and ways of minimising error.

Planning and management of large-scale surveys, piloting and pretesting, relations with stakeholders in the sponsored survey process, and issues in survey ethics.

The main questions are:

What is the purpose of the survey?

What kinds of questions would the survey be developed to answer?

What sorts of actions is the company considering based on the results of the survey?

Step 1: Planning Questionnaire Research

Consider the advantages and disadvantages of using questionnaires.
Prepare written objectives for the research.
Have your objectives reviewed by others.
Review the literature related to the objectives.
Determine the feasibility of administering your questionnaire to the population of interest.
Prepare a time-line.


Step 2: Conducting Item Try-Outs and an Item Analysis

Have your items reviewed by others.
Conduct "think-alouds" with several people.
Carefully select individuals for think-alouds.
Consider asking about 10 individuals to write detailed responses on a draft of your questionnaire.
Ask some respondents to respond to the questionnaire for an item analysis.
In the first stage of an item analysis, tally the number of respondents who selected each choice.
In the second stage of an item analysis, compare the responses of high and low groups on individual items.
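The two stages of the item analysis can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names, the example data and the `group_size` parameter are hypothetical, and the size of the high and low groups is left to the analyst (the top and bottom 27% of total scores is one common convention).

```python
from collections import Counter


def tally_choices(responses):
    """Stage 1: count how many respondents selected each choice of one item."""
    return Counter(responses)


def high_low_means(item_scores, total_scores, group_size):
    """Stage 2: mean score on one item in the high and low total-score groups.

    item_scores[r] and total_scores[r] belong to the same respondent r.
    Returns (mean of high group, mean of low group).
    """
    if not 0 < group_size <= len(total_scores) // 2:
        raise ValueError("group_size must be between 1 and half the sample")
    order = sorted(range(len(total_scores)), key=lambda r: total_scores[r])
    low, high = order[:group_size], order[-group_size:]

    def mean_item(group):
        return sum(item_scores[r] for r in group) / len(group)

    return mean_item(high), mean_item(low)


# Hypothetical data: 10 respondents, one item scored 0/1, and total scores.
choices = ["b", "a", "b", "c", "b", "a", "b", "b", "c", "b"]
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
totals = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(tally_choices(choices))            # counts per choice
print(high_low_means(item, totals, 3))   # (1.0, 0.0)
```

An item that discriminates well shows a clearly higher mean in the high group; items on which the two groups score alike are candidates for revision.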

Step 3: Preparing a Questionnaire for Administration

Write a descriptive title for the questionnaire.
Write an introduction to the questionnaire.
Group the items by content, and provide a subtitle for each group.
Within each group of items, place items with the same format together.
At the end of the questionnaire, indicate what respondents should do next.
Prepare an informed consent form, if needed.
If the questionnaire will be mailed to respondents, avoid having your correspondence look like junk mail.
If the questionnaire will be mailed, consider including a token reward.
If the questionnaire will be mailed, write a follow-up letter.
If the questionnaire will be administered in person, consider preparing written instructions for the administrator.

Step 4: Selecting a Sample of Respondents

Identify the accessible population.
Avoid using samples of convenience.
Simple random sampling is a desirable method of sampling.
Systematic sampling is an acceptable method of sampling.
Stratification may reduce sampling errors.
Consider using random cluster sampling when every member of a population belongs to a group.
Consider using multistage sampling to select respondents from large populations.
Consider the importance of getting precise results when determining sample size.
Remember that using a large sample does not compensate for a bias in sampling.
Consider sampling nonrespondents to get information on the nature of a bias.
The bias in the mean is the difference of the population means for respondents and nonrespondents multiplied by the population nonresponse rate.
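In symbols, the last point says Bias = W_nr (Ybar_r - Ybar_nr), where W_nr is the population nonresponse rate and Ybar_r, Ybar_nr are the population means of respondents and nonrespondents. The Python sketch below, with hypothetical means and a hypothetical 30% nonresponse rate, checks this against the population mean written as a weighted mix of the two groups; the function names are illustrative.

```python
def nonresponse_bias(mean_respondents, mean_nonrespondents, nonresponse_rate):
    """Bias of the respondent mean: W_nr * (Ybar_r - Ybar_nr)."""
    if not 0 <= nonresponse_rate <= 1:
        raise ValueError("nonresponse_rate must lie between 0 and 1")
    return nonresponse_rate * (mean_respondents - mean_nonrespondents)


def population_mean(mean_respondents, mean_nonrespondents, nonresponse_rate):
    """Population mean as the weighted mix of respondents and nonrespondents."""
    w = nonresponse_rate
    return (1 - w) * mean_respondents + w * mean_nonrespondents


# Hypothetical figures: respondents average 60, nonrespondents 50,
# and 30% of the population does not respond.
bias = nonresponse_bias(60, 50, 0.3)
check = 60 - population_mean(60, 50, 0.3)  # respondent mean minus true mean
print(bias, check)  # the two agree (3 points, up to float rounding)
```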


Step 5: Preparing Statistical Tables and Figures

Prepare a table of frequencies.
Consider calculating percentages and arranging them in a table with the frequencies.
For nominal data, consider constructing a bar graph.
Consider preparing a histogram to display a distribution of scores.
Consider preparing frequency polygons if distributions of scores are to be compared.
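A table of frequencies with percentages takes only a few lines of Python. The sketch below is illustrative; the answer categories are hypothetical.

```python
from collections import Counter


def frequency_table(responses):
    """Rows of (category, frequency, percentage), most frequent first."""
    counts = Counter(responses)
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]


# Hypothetical answers to a nominal question.
answers = ["Agree", "Disagree", "Agree", "Neutral", "Agree", "Disagree"]
for category, freq, pct in frequency_table(answers):
    print(f"{category:<10}{freq:>4}{pct:>8.1f}%")
```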

Step 6: Describing Averages and Variability

Use the median as the average for ordinal data.
Consider using the mean as the average for equal interval data.
Use the median as the average for highly skewed, equal interval data.
Use the range very sparingly as the measure of variability.
If the median has been selected as the average, use the interquartile range as the measure of variability.
If the mean has been selected as the average, use the standard deviation as the measure of variability.
Keep in mind that the standard deviation has a special relationship to the normal curve that helps in its interpretation.
For moderately asymmetrical distributions, the mode, median and mean approximately satisfy the empirical relation mode ≈ 3*median - 2*mean.
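The pairings of average and variability, and the empirical mode relation, can be checked with Python's `statistics` module (Python 3.8+ for `quantiles`). This is a sketch on hypothetical scores; the quartiles use the "inclusive" interpolation method, which is one of several conventions.

```python
import statistics


def describe(scores):
    """Averages and measures of variability for one set of scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    median = statistics.median(scores)
    mean = statistics.mean(scores)
    return {
        "median": median,
        "iqr": q3 - q1,                          # pair with the median
        "mean": mean,
        "sd": statistics.stdev(scores),          # pair with the mean
        "mode": statistics.mode(scores),
        "pearson_mode": 3 * median - 2 * mean,   # empirical approximation
    }


# Hypothetical scores on an equal-interval scale.
print(describe([2, 4, 4, 5, 7, 9, 11]))
```

For these seven scores the median is 5 with an interquartile range of 4, the mean is 6 with a standard deviation of about 3.16, and the approximation gives 3 against an actual mode of 4, a reminder that the relation is only a rough guide.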

Step 7: Describing Relationships

For the relationship between two nominal variables, prepare a contingency table.
When groups have unequal numbers of respondents, include percentages in contingency tables.
For the relationship between two equal interval variables, compute a correlation coefficient.
Interpret a Pearson r using the coefficient of determination.
For the relationship between a nominal variable and an equal interval variable, examine differences among averages.
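Two of these steps are sketched below in Python: a Pearson r with its coefficient of determination r², and a contingency table expressed as row percentages so that groups of unequal size can be compared. The data and function names are hypothetical.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation coefficient for paired equal-interval data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def row_percentages(row_var, col_var):
    """Contingency table of two nominal variables as row percentages."""
    cells = Counter(zip(row_var, col_var))
    row_totals = Counter(row_var)
    return {cell: 100 * n / row_totals[cell[0]] for cell, n in cells.items()}


# Hypothetical data.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.3f}, r^2 = {r * r:.2f}")  # r^2: share of variance accounted for
print(row_percentages(["M", "M", "F", "F", "F"], ["Yes", "No", "Yes", "Yes", "No"]))
```

Here r is about 0.775, so about 60% of the variance in y is accounted for by its linear relationship with x; the row percentages put the groups of 2 and 3 respondents on a common footing.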

Step 8: Estimating Margins of Error

It is extremely difficult, and often impossible, to evaluate the effects of a bias in sampling.
When evaluating a percentage, consider the standard error of a percentage.
When evaluating a mean, consider the standard error of the mean.
When evaluating a median, consider the standard error of the median.
Consider building confidence intervals, especially when comparing two or more groups.
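The standard errors named above, and a normal-theory confidence interval, can be sketched in Python as follows. Assumptions of this sketch: 1.96 is used as the 95% critical value, and the standard error of the median uses the large-sample approximation sqrt(pi/2)·σ/sqrt(n), which holds for normally distributed data. The function names and the survey figure are hypothetical.

```python
import math

Z_95 = 1.96  # assumed critical value for a 95% confidence level


def se_percentage(pct, n):
    """Standard error of a sample percentage (in percentage points)."""
    return math.sqrt(pct * (100 - pct) / n)


def se_mean(sd, n):
    """Standard error of a sample mean."""
    return sd / math.sqrt(n)


def se_median(sd, n):
    """Large-sample SE of a median, assuming normally distributed data."""
    return math.sqrt(math.pi / 2) * sd / math.sqrt(n)


def confidence_interval(estimate, se, z=Z_95):
    """Normal-theory interval: estimate +/- z * se."""
    return estimate - z * se, estimate + z * se


# Hypothetical survey result: 50% of 100 respondents agree.
se = se_percentage(50, 100)
print(se, confidence_interval(50, se))   # 5.0 and roughly (40.2, 59.8)
```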

Step 9: Writing Reports of Questionnaire Research

In an informal report, variations in the organization of the report are permitted.
Academic reports should begin with a formal introduction that cites literature.
The second section of academic reports should describe the research methods.
The third section of academic reports should describe the results.
The last section of academic reports should be a discussion.
Acknowledge any weakness in your research methodology.

Missing Values on a Sensitive Topic

A natural way to get answers is to assure people, as far as possible, that the survey is anonymous, and to find a way to make the respondent at least minimally comfortable. According to the US General Accounting Office publication "Developing and Using Questionnaires" (October 1983), chapter 9, you should do the following:

1. Explain to the respondent the reasons for asking the questions.
2. Make response categories as broad as possible.
3. Word the question in a nonjudgemental style that avoids the appearance of censure, or, if possible, make the behavior in question appear to be socially acceptable.
4. Present the request as matter-of-factly as possible.
5. Guarantee confidentiality or anonymity.
6. Make sure the respondent knows the information will not be used in any threatening way.
7. Explain how the information will be handled.
8. Avoid cross-classification that would allow responses to be pinpointed.

Sources of Error

1. The use of an inadequate frame.
2. A poorly designed questionnaire.
3. Recording and measurement errors.
4. Non-response problems.

For example, consider the following question: "Over the last twelve months would you say your health has on the whole been: Good? / Fairly good? / Not good?" The respondent is required to tick one of three boxes labelled in this way.

What is wrong with this?

It is the ONLY question on the form which asks about a matter of opinion rather than fact, but this distinction is not in any way represented in its layout or wording.

Whereas a question about opinion should offer a 'Don't Know' response option, none is provided here. In some cases, such as the Census form, the Census advisory staff are adamant that the question must be answered. Thus a person with no opinion on the matter is in a quandary and threatened with possible legal action.

This particular question is also highly ambiguous about the qualitative nature of what is being asked (your health). Is one to respond in terms of how one feels, how one can perform, diagnosed conditions, comparisons with peer groups, comparisons with other periods of one's life, or what?

Visit also the following Web sites:

Association for Survey Computing
Write more effective survey questions
Research Methods Knowledge Base
Research Methods & Statistics Resources
Sampling In Research
Sampling, Questionnaire Distribution and Interviewing
SRMSNET: An Electronic Bulletin Board for Survey Researchers
Sampling and Surveying Handbook

Read also some of the articles in Public Opinion Quarterly; this journal focuses on methodologies for survey research.

The following software packages also offer the option of telephone interviewing:
CASES
Surveycraft
The Survey System
Ronin's Results for Research
Sawtooth's Ci3
CATI for Windows
Westat's Blaise
Mercator's SNAP Survey Software

Sample Size in Surveys Sampling

People sometimes ask me what fraction of the population they need. I answer, "It's irrelevant; accuracy is determined by sample size alone." This answer has to be modified if the sample is a sizable fraction of the population.

For an item scored 0/1 for no/yes, the standard deviation of the item scores is given by $SD = \sqrt{p(1-p)}$, where p is the proportion obtaining a score of 1.

The standard error of estimate SE (the standard deviation of the range of possible p values based on your sample estimate) is given by $SE = SD/\sqrt{N} = \sqrt{p(1-p)/N}$, where N is the sample size. Thus, SE is at a maximum when p = 0.5, so the worst-case scenario occurs when 50% agree and 50% disagree.

The sample size, N, can then be expressed as the smallest integer greater than or equal to $0.25/SE^2$.

Thus, for SE to be 0.01 (i.e. 1%), a sample size of 2500 would be needed; for 2%, 625; 3%, 278; 4%, 157; 5%, 100.

Note, incidentally, that as long as the sample is a small fraction of the total population, the actual size of the population is entirely irrelevant for the purposes of this calculation.
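The calculation above can be reproduced in Python. This sketch rounds the required N up, so that the target SE is not exceeded, and uses exact fractions so that a value such as 0.25/0.01² comes out as exactly 2500 rather than a floating-point value a hair above it. The function names are illustrative.

```python
import math
from fractions import Fraction


def standard_error(p, n):
    """SE of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def sample_size_for_se(target_se, p=Fraction(1, 2)):
    """Smallest N with sqrt(p(1 - p) / N) <= target_se (exact arithmetic)."""
    target_se = Fraction(target_se)
    p = Fraction(p)
    if target_se <= 0 or not 0 <= p <= 1:
        raise ValueError("need target_se > 0 and 0 <= p <= 1")
    return math.ceil(p * (1 - p) / target_se ** 2)


for pct in range(1, 6):
    print(f"SE = {pct}%: N = {sample_size_for_se(Fraction(pct, 100))}")
```

Running it prints 2500, 625, 278, 157 and 100 for SE targets of 1% to 5%.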

Sample sizes with regard to binary data:

$$n = \frac{t^2 N p(1-p)}{t^2 p(1-p) + \alpha^2 (N-1)}$$

with N being the total number of cases in the population, n the sample size, α the tolerated error (the margin of error, not a significance level), t the value taken from the t distribution corresponding to the chosen confidence level, and p the probability of the event.
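A direct Python transcription of this formula, as a sketch: it rounds n up to a whole number, uses t = 1.96 (the large-sample value for 95% confidence) as an assumed default, and calls the tolerated error `margin` to avoid confusion with a significance level. Names and defaults are illustrative.

```python
import math


def sample_size_finite(N, p=0.5, margin=0.05, t=1.96):
    """n = t^2 N p(1-p) / (t^2 p(1-p) + margin^2 (N - 1)), rounded up."""
    if N < 1 or not 0 < p < 1 or margin <= 0:
        raise ValueError("need N >= 1, 0 < p < 1 and margin > 0")
    numerator = t ** 2 * N * p * (1 - p)
    denominator = t ** 2 * p * (1 - p) + margin ** 2 * (N - 1)
    return math.ceil(numerator / denominator)


for population in (100, 10_000, 1_000_000_000):
    print(population, sample_size_finite(population))
```

For a population of 10,000 with p = 0.5 and a 5-point margin this gives 370; as N grows, the result approaches t²p(1 - p)/α², which rounds up to 385 here, in line with the remark that population size stops mattering once the sample is a small fraction of it.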

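There are several formulas for the sample size needed for a t-test. The simplest one is

$$n = \frac{2 (Z_\alpha + Z_\beta)^2 \sigma^2}{D^2}$$

where n is the sample size per group, σ is the common standard deviation, D is the difference between means to be detected, and Z_α and Z_β are standard normal critical values for the significance level and for the power 1 - β (for a two-sided test, Z_α is taken at α/2). This formula underestimates the sample size, but is reasonable for large sample sizes. A less inaccurate formula replaces the Z values with t values and requires iteration, since the degrees of freedom of the t distribution depend on the sample size. The accurate formula uses a noncentral t distribution, and it also requires iteration.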

The simplest approximation when several groups are to be compared is to replace the first Z value in the above formula with the value from the studentized range statistic that is used to derive Tukey's follow-up test. If you don't have sufficiently detailed tables of the studentized range, you can approximate the Tukey follow-up test using a Bonferroni correction. That is, change the first Z value to $Z_{\alpha/k}$, where k is the number of comparisons.

Neither of these solutions is exact. I suspect that the exact solution is a bit messy, but either of the above approaches is probably close enough, especially if the resulting sample size is larger than (say) 30.
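The normal-approximation formula, with the optional Bonferroni adjustment, can be sketched in Python using only the standard library (`statistics.NormalDist`, Python 3.8+). It shares the formula's tendency to slightly underestimate n; the defaults (two-sided α = 0.05, power 0.80) and the function name are assumptions for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(sigma, diff, alpha=0.05, power=0.80, comparisons=1, two_sided=True):
    """Normal-approximation sample size per group for comparing two means.

    n = 2 (z_a + z_b)^2 sigma^2 / D^2, where z_a uses alpha / comparisons
    (Bonferroni) and is split over both tails when two_sided is True.
    """
    a = alpha / comparisons
    if two_sided:
        a /= 2
    z_a = NormalDist().inv_cdf(1 - a)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / diff ** 2)


# Detect a difference of half a standard deviation.
print(n_per_group(sigma=1.0, diff=0.5))                 # one comparison: 63
print(n_per_group(sigma=1.0, diff=0.5, comparisons=3))  # Bonferroni for 3 comparisons
```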

A better stopping rule for conventional statistical tests is as follows:
Test some minimum (predetermined) number of subjects.
Stop if the p-value is less than or equal to .01, or greater than or equal to .36; otherwise, run more subjects.

Another option, obviously, is to stop if and when the number of subjects becomes too great for the effect to be of practical interest. This procedure keeps α at about 0.05.
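The stopping rule can be expressed as a small decision function. To let readers check the α claim under their own settings, the sketch also simulates the rule when the null hypothesis is true. Everything about the simulation is an assumption of this sketch, not part of the rule as stated: a one-sample, two-sided z-test with known σ = 1, a minimum of 20 subjects, batches of 10, and a cap of 200 subjects standing in for the "too many subjects" stop. The estimated rejection rate depends on these choices.

```python
import random
from statistics import NormalDist


def stopping_decision(p_value, lower=0.01, upper=0.36):
    """Apply the rule: reject at p <= lower, stop and retain at p >= upper."""
    if p_value <= lower:
        return "reject"
    if p_value >= upper:
        return "retain"
    return "continue"


def simulate_rejection_rate(n_min=20, batch=10, n_max=200, reps=2000, seed=1):
    """Fraction of simulated null-true studies in which the rule rejects H0."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(reps):
        data = [rng.gauss(0.0, 1.0) for _ in range(n_min)]
        while True:
            z = sum(data) / len(data) ** 0.5        # z-test of mean 0, sigma 1
            p = 2 * (1 - std_normal.cdf(abs(z)))    # two-sided p-value
            decision = stopping_decision(p)
            if decision != "continue" or len(data) >= n_max:
                if decision == "reject":
                    rejections += 1
                break
            data.extend(rng.gauss(0.0, 1.0) for _ in range(batch))
    return rejections / reps


print(stopping_decision(0.004), stopping_decision(0.2), stopping_decision(0.5))
print(f"estimated alpha: {simulate_rejection_rate():.3f}")
```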


We may categorize probability proportional to size (PPS) sampling, stratification, and ratio estimation (or any other form of model-assisted estimation) as tools that protect one from the results of a very unlucky sample. The first two (PPS sampling and stratification) do this by manipulating the sampling plan (PPS sampling being conceptually a limiting case of stratification). Model-assisted estimation methods such as ratio estimation serve the same purpose by introducing auxiliary information into the estimation procedure. Which tools are preferable depends, as others have said, on costs, on the availability of information that allows use of these tools, and on the potential payoffs (none of these will help much if the stratification, PPS or ratio-estimation variable is not well correlated with the response variable of interest).

Therefore, you must use whatever tools are at your disposal that would improve your estimates at feasible cost.
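As an illustration of how ratio estimation brings auxiliary information into the estimate, here is a minimal Python sketch of the classical ratio estimator of a population total, Y_R = (ybar / xbar) · X, where X is the known population total of an auxiliary variable x. The textbook form is used here rather than anything specified on this page, and the sample values are hypothetical.

```python
def ratio_estimate_total(sample_y, sample_x, population_x_total):
    """Ratio estimator of a population total: (sum y / sum x) * X."""
    if len(sample_y) != len(sample_x) or not sample_y:
        raise ValueError("sample_y and sample_x must be non-empty and paired")
    ratio = sum(sample_y) / sum(sample_x)
    return ratio * population_x_total


def expansion_estimate_total(sample_y, population_size):
    """Plain expansion estimator N * ybar, which ignores x, for comparison."""
    return population_size * sum(sample_y) / len(sample_y)


# Hypothetical "unlucky" sample of 4 of N = 20 units whose mean x (12.5)
# is above the population mean x (X / N = 200 / 20 = 10).
y = [30, 28, 33, 9]
x = [15, 14, 16, 5]
print(expansion_estimate_total(y, 20))   # 500.0
print(ratio_estimate_total(y, x, 200))   # 400.0, scaled back by X
```

The gain depends on how well y tracks x, which is the correlation condition stated above.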

There are also heuristic methods for determining sample size. For example, in healthcare behavior and process measurement, sampling criteria are designed for a 95% CI of 10 percentage points around a population mean of 0.50. There is also a heuristic rule: "If the number of individuals in the target population is smaller than 50 per month, systems do not use sampling procedures but, attempt to collect data from all individuals in the target population." See, e.g., the Joint Commission on Accreditation of Healthcare Organizations.

Also visit: Probability Sampling

Multilevel Statistical Models

Many kinds of data, including observational data collected in the human and biological sciences, have a hierarchical or clustered structure. For example, animal and human studies of inheritance deal with a natural hierarchy where offspring are grouped within families. Offspring from the same parents tend to be more alike in their physical and mental characteristics than individuals chosen at random from the population at large.

Many designed experiments also create data hierarchies, for example clinical trials carried out in several randomly chosen centers or groups of individuals. Multilevel models are concerned only with the fact of such hierarchies, not their provenance. We refer to a hierarchy as consisting of units grouped at different levels. Thus offspring may be the level 1 units in a 2-level structure where the level 2 units are the families; students may be the level 1 units clustered within schools that are the level 2 units.

The existence of such data hierarchies is neither accidental nor ignorable. Individual people differ, as do individual animals, and this necessary differentiation is mirrored in all kinds of social activity, where the latter is often a direct result of the former, for example when students with similar motivations or aptitudes are grouped in highly selective schools or colleges. In other cases, the groupings may arise for reasons less strongly associated with the characteristics of individuals, such as the allocation of young children to elementary schools, or the allocation of patients to different clinics. Once groupings are established, even if their establishment is effectively random, they will tend to become differentiated, and this differentiation implies that the group and its members both influence and are influenced by the group membership. To ignore this relationship risks overlooking the importance of group effects, and may also render invalid many of the traditional statistical analysis techniques used for studying data relationships.

A simple example will show its importance. A well-known and influential study of primary (elementary) school children carried out in the 1970s claimed that children exposed to so-called 'formal' styles of teaching reading exhibited more progress than those who were not. The data were analyzed using traditional multiple regression techniques, which recognized only the individual children as the units of analysis and ignored their grouping within teachers and classes. The results were statistically significant. Subsequently, it was demonstrated that when the analysis accounted properly for the grouping of children into classes, the significant differences disappeared, and the 'formally' taught children could not be shown to differ from the others.

This re-analysis is the first important example of a multilevel analysis of social science data. In essence, what was occurring here was that the children within any one classroom, because they were taught together, tended to be similar in their performance. As a result they provide rather less information than would have been the case if the same number of students had been taught separately by different teachers. In other words, the basic unit for purposes of comparison should have been the teacher, not the student. The function of the students can be seen as providing, for each teacher, an estimate of that teacher's effectiveness. Increasing the number of students per teacher would increase the precision of those estimates but not change the number of teachers being compared. Beyond a certain point, simply increasing the numbers of students in this way hardly improves things at all. On the other hand, increasing the number of teachers to be compared, with the same or somewhat smaller number of students per teacher, considerably improves the precision of the comparisons.
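The argument can be quantified with Kish's design effect, deff = 1 + (m - 1)ρ, where m is the number of students per teacher and ρ is the intraclass correlation (how alike students taught together are). This formula is not given in the original text; the Python sketch below uses a hypothetical ρ = 0.2 and assumes equal class sizes.

```python
def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * icc."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


ICC = 0.2  # hypothetical intraclass correlation within classrooms
for teachers, pupils in [(10, 30), (10, 60), (20, 15)]:
    n_eff = effective_sample_size(teachers, pupils, ICC)
    print(f"{teachers} teachers x {pupils} pupils: effective n = {n_eff:.1f}")
```

Under these assumptions, doubling the pupils per teacher (10 × 30 to 10 × 60) raises the effective size only from about 44 to about 47, while spreading the same 300 pupils over 20 teachers raises it to about 79. With a fixed number of teachers the effective size can never exceed n_clusters / icc (50 here).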

Researchers have long recognized this issue. In education, for example, there has been much debate about the so-called 'unit of analysis' problem, which is the one just outlined. Before multilevel modelling became well developed as a research tool, the problems of ignoring hierarchical structures were reasonably well understood, but they were difficult to solve because powerful general-purpose tools were unavailable. Special-purpose software, for example for the analysis of genetic data, has been available longer, but it was restricted to 'variance components' models and was not suitable for handling general linear models. Sample survey workers have recognized this issue in another form. When population surveys are carried out, the sample design typically mirrors the hierarchical population structure, in terms of geography and household membership. Elaborate procedures have been developed to take such structures into account when carrying out statistical analyses.

For more details, visit Multilevel Models Project

References and Further Readings:
Goldstein H., Multilevel Statistical Models, Halsted Press, New York, 1995.
Longford N., Random Coefficient Models, Clarendon Press, Oxford, 1993.

These books cover a very wide range of applications and theory.

Surveys Sampling Routines

Note: The following programs refer to Practical Methods for Design and Analysis of Complex Surveys, by R. Lehtonen and E. Pahkinen, Wiley, Chichester, 1995. See also L. Lyberg et al. (Editors), Survey Measurement and Process Quality, New York, Wiley, 1997.

Other software packages for survey analysis include Le Sphinx, CENVAR, CLUSTERS, Epi Info, Generalized Estimation System, Super CARP, Stata, SUDAAN, VPLX, WesVarPC, and OSIRIS IV. For a detailed review, visit Summary of Survey Analysis Software.

TITLE 'Bernoulli sampling; PI=0.25, N=32'.
GET FILE = (input dataset).
* Each case is kept independently with probability PI.
COMPUTE PI = 0.25.
COMPUTE EPSN = UNIFORM(1).
SELECT IF (EPSN LT PI).
SAVE OUTFILE = (output dataset).

TITLE 'Simple random sampling with replacement; n=8, N=32'.
GET FILE = (input dataset).
* Case i covers the interval (E, L] = (i-1, i].
COMPUTE L = L + 1.
LEAVE L.
COMPUTE E = L - 1.
NUMERIC W (F2).
COMPUTE W = 0.
* Draw 8 uniform numbers on the first case and keep them for all cases.
* W counts how many of them fall in the case's interval (times selected).
DO REPEAT A = A1 TO A8.
IF ($CASENUM = 1) A = UNIFORM(32).
LEAVE A.
IF (E LT A AND A LE L) W = W + 1.
END REPEAT.
SELECT IF (W GT 0).
SAVE OUTFILE = (output dataset).

TITLE 'Simple random sampling without replacement; n=8, N=32'.
GET FILE = (input dataset).
SAMPLE 8 FROM 32.
SAVE OUTFILE = (output dataset).

TITLE 'Systematic sampling; n=8, sampling interval=4'.
* Draw one random start between 0 and 3 and attach it to every case as INT.
MATRIX.
COMPUTE RAND = TRUNC(4 * UNIFORM(1, 1)).
COMPUTE INT = RAND * MAKE(32, 1, 1).
SAVE INT /OUTFILE = * /VARIABLES = INT.
END MATRIX.
MATCH FILES FILE = (input dataset) /FILE = *.
COMPUTE INDEX = MOD($CASENUM, 4).
SELECT IF (INDEX = INT).
SAVE OUTFILE = (output dataset) /DROP = INDEX INT.

The following routines are for PPS sampling (selection with probability proportional to size). The size measure is HOU85, whose population total is 91753.

TITLE 'PPS Poisson sampling with expected size of 8'.
GET FILE = (input dataset).
* Inclusion probability proportional to HOU85.
COMPUTE PI = 8 * HOU85 / 91753.
COMPUTE EPSN = UNIFORM(1).
SELECT IF (EPSN LE PI).
SAVE OUTFILE = (output dataset).

TITLE 'PPS sampling with replacement; n=8'.
GET FILE = (input dataset).
* Case i covers the interval (E, L] of cumulated HOU85 values.
COMPUTE L = L + HOU85.
LEAVE L.
COMPUTE E = L - HOU85.
NUMERIC W (F2).
COMPUTE W = 0.
DO REPEAT A = A1 TO A8.
IF ($CASENUM = 1) A = UNIFORM(91753).
LEAVE A.
IF (E LT A AND A LE L) W = W + 1.
END REPEAT.
SELECT IF (W GT 0).
SAVE OUTFILE = (output dataset).

TITLE 'PPS systematic sampling; n=8'.
GET FILE = (input dataset).
* Scratch (#) variables start at 0 and keep their values from case to case.
COMPUTE #C = #C + 1.
COMPUTE CASE = #C.
COMPUTE #SN = 8.
COMPUTE #PN = 91753.
COMPUTE #INT = TRUNC(#PN / #SN).
COMPUTE #RAN = TRUNC(UNIFORM(#INT) + 1).
DO IF (CASE = 1).
COMPUTE #COMP = #RAN.
COMPUTE RAN = #RAN.
END IF.
* SAMIND = number of selection points in this case's range of the cumulated size CUMHOU85.
COMPUTE SAMIND = 0.
LOOP IF (#COMP LE CUMHOU85).
+ COMPUTE SAMIND = SAMIND + 1.
+ COMPUTE #COMP = #COMP + #INT.
END LOOP.
EXECUTE.
SAVE OUTFILE = (output dataset).
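For readers without SPSS, the PPS systematic routine above can be sketched in Python. This is an illustrative translation, not one of the original programs: the function name is hypothetical, `sizes` plays the role of the size variable HOU85 and is cumulated internally instead of reading a cumulated size variable, and, as in the SPSS code, the interval is truncated, so when the total is not a multiple of n the sample can contain n + 1 selections.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, start=None, rng=random):
    """Number of times each unit is hit by PPS systematic sampling.

    sizes: positive integer size measures, one per unit.
    start: random start in 1..interval; drawn with rng when not given.
    """
    interval = sum(sizes) // n              # as TRUNC(#PN / #SN)
    if interval < 1:
        raise ValueError("n exceeds the total size")
    if start is None:
        start = rng.randint(1, interval)    # as TRUNC(UNIFORM(#INT) + 1)
    point = start
    hits = []
    for cumulative in accumulate(sizes):   # running total of the sizes
        count = 0
        while point <= cumulative:         # as the LOOP IF in the SPSS code
            count += 1
            point += interval
        hits.append(count)
    return hits


# Hypothetical size measures for 8 units (total 80), sample of 4.
sizes = [5, 20, 10, 15, 5, 10, 10, 5]
print(pps_systematic(sizes, 4, rng=random.Random(42)))
```

Units larger than the interval can be hit more than once, which is why the result is a count per unit (like SAMIND) rather than a yes/no flag.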

References and Further Readings:
Bethel J., Sample allocation in multivariate surveys, Survey Methodology, 15, 1989, 47-57.

Valliant R., and Gentle J., An application of mathematical programming to a sample allocation problem, Computational Statistics and Data Analysis, 25, 1997, 337-360.

Visit also Survey Samplings and Survey Software.

Cronbach's Alpha (Coefficient α)

Perhaps the best way to conceptualize Cronbach's Alpha is to think of it as the average of all possible split-half reliabilities for a set of items. A split-half reliability is simply the reliability between two parts of a test or instrument where those two parts are halves of the total instrument. In general, the reliabilities of these two halves should then be stepped up (Spearman-Brown Prophecy Formula) to estimate the reliability of the full-length test rather than the reliability between two half-length tests. Assuming, for ease of interpretation, that a test has an even number of items (e.g., 10), items 1-5 versus 6-10 would be one split, evens versus odds would be another and, in fact, with 10 items chosen 5 at a time, there are (10 choose 5)/2 = 126 distinct split halves for this test (10 choose 5 = 252 counts each split twice). If we computed each of these split-half reliabilities and averaged them all, this average would be Cronbach's Alpha. (Strictly, the equality is exact when each split-half coefficient is computed with the Flanagan-Rulon formula; averaged Spearman-Brown coefficients equal alpha only if every split has halves of equal variance.) Since some splits will be better than others in terms of creating two more closely parallel halves, and the reliability between parallel halves is probably the most appropriate estimate of an instrument's reliability, Cronbach's alpha is often considered a relatively conservative estimate of the internal consistency of a test.
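The "average of all split halves" view can be checked numerically. The Python sketch below computes alpha from the usual formula, alpha = k/(k - 1) · (1 - sum of item variances / variance of total), and compares it with the average of the Flanagan-Rulon split-half coefficients, 2 · (1 - (var(A) + var(B)) / var(A + B)), over all distinct splits. The Flanagan-Rulon form is used because it is the version for which the equality is exact; the item scores are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def cronbach_alpha(items):
    """items[i][r] is respondent r's score on item i."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def split_halves(k):
    """Distinct ways to split k items (k even) into two halves of k/2 items."""
    return [half for half in combinations(range(k), k // 2) if 0 in half]


def flanagan_rulon(items, half):
    """Split-half reliability 2 * (1 - (var(A) + var(B)) / var(A + B))."""
    other = [i for i in range(len(items)) if i not in half]
    a = [sum(scores) for scores in zip(*(items[i] for i in half))]
    b = [sum(scores) for scores in zip(*(items[i] for i in other))]
    total = [x + y for x, y in zip(a, b)]
    return 2 * (1 - (pvariance(a) + pvariance(b)) / pvariance(total))


def mean_split_half(items):
    halves = split_halves(len(items))
    return sum(flanagan_rulon(items, h) for h in halves) / len(halves)


# Hypothetical 4-item scale answered by 6 respondents.
items = [
    [3, 4, 2, 5, 4, 1],
    [2, 4, 3, 5, 3, 2],
    [3, 5, 2, 4, 4, 1],
    [2, 3, 3, 5, 5, 2],
]
print(len(split_halves(10)))                          # 126 splits of 10 items
print(cronbach_alpha(items), mean_split_half(items))  # the two values agree
```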

The following is a SAS program for computing coefficient alpha, or Cronbach's Alpha. Note that it is an option in the PROC CORR procedure. In SAS, for a WORK data set called ONE, if we want the internal consistency (coefficient alpha, or Cronbach's alpha) for X1-X10, the syntax is:

PROC CORR DATA=WORK.ONE ALPHA;
   VAR X1-X10;
RUN;

There are at least three important caveats to consider when computing coefficient alpha.

Note 1: How to handle "missing" values. In achievement testing, a missing value or a not-reached value is traditionally coded as 0, or wrong. The CORR procedure in SAS DOES NOT treat missing as wrong. It is not difficult to write code to force this to happen, but we must write the code. In the above example we could do so as follows:

DATA WORK.ONE;
   SET WORK.ONE;
   ARRAY X {10} X1-X10;            /* DEFINING AN ARRAY FOR THE 10 ITEMS */
   DO I = 1 TO 10;
      IF X(I) = . THEN X(I) = 0;   /* FOR EACH ITEM X1-X10, CHANGE MISSING VALUES (.) TO 0 */
   END;
   DROP I;                         /* DO NOT KEEP THE LOOP COUNTER */
RUN;

Note 2: The use of the NOMISS option in the CORR procedure. This is related to Note 1 above. Another way of handling missing observations is to use the NOMISS option in the CORR procedure. The syntax is as follows:

PROC CORR DATA=WORK.ONE ALPHA NOMISS;
   VAR X1-X10;
RUN;

The effect of this is to exclude from the analysis any record in which at least one of the items X1-X10 is missing. Obviously, for achievement testing, especially for speeded tests, where most examinees might not be expected to complete all items, this would be a problem. The use of the NOMISS option would restrict the analysis to the subset of examinees who did complete all items, and this quite often would not be the population of interest when wishing to establish an internal consistency reliability estimate.

One common approach to resolving this problem might be to define a number of items that must be attempted for the record to be included. Some health status measures, for example the SF-36, have scoring rules that require at least 50% of the items to be answered for the scale to be defined. If fewer than half of the items are attempted, then the scale is not interpreted. If the scale is considered valid by THEIR definition, then all missing values on that scale are replaced by the average of the non-missing items on that scale. The SAS code to implement this scoring algorithm is summarized below under the assumption that the scale has 10 items.

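DATA WORK.ONE;
  SET WORK.ONE;
  ARRAY X {10} X1-X10;
  IF NMISS(OF X1-X10) > 5 THEN DO I=1 TO 10;  /* FEWER THAN HALF ANSWERED: SCALE NOT DEFINED */
    X(I) = .;
  END;
  ELSE DO;                                     /* AT LEAST HALF ANSWERED: FILL IN MISSING ITEMS */
    ITEMMEAN = MEAN(OF X1-X10);                /* AVERAGE OF THE NON-MISSING ITEMS */
    DO I=1 TO 10;
      IF X(I) = . THEN X(I) = ITEMMEAN;
    END;
  END;
  DROP I ITEMMEAN;
RUN;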
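The same scoring rule can be sketched in Python for readers who want to check it outside SAS. This is a minimal sketch, assuming one record is a list of 10 item scores with `None` for missing; the function name `score_half_rule` and the example records are hypothetical.

```python
from typing import List, Optional

Record = List[Optional[float]]


def score_half_rule(record: Record) -> Record:
    """Apply the SF-36-style rule described above to one record.

    If more than half of the items are missing, the scale is not defined,
    so every item is set to missing (None). Otherwise each missing item is
    replaced by the mean of the items the respondent did answer.
    """
    answered = [v for v in record if v is not None]
    n_missing = len(record) - len(answered)
    if n_missing > len(record) / 2:          # like NMISS(OF X1-X10) > 5
        return [None] * len(record)
    person_mean = sum(answered) / len(answered)
    return [person_mean if v is None else v for v in record]


# Three of ten items missing: the gaps are filled with the mean of the
# seven answered items, (4+3+4+2+3+2+3)/7 = 3.0.
print(score_half_rule([4, 3, None, 4, 2, None, 3, 2, None, 3]))
# Six of ten items missing: the whole scale becomes missing.
print(score_half_rule([4, None, None, None, 2, None, 3, None, None, 4]))
```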

Note that replacing all missing values with the average of the non-missing values, in the cases where the number of missing values is not greater than half of the total number of items, will result in an inflated Cronbach's alpha, because the filled-in values are built from each respondent's own answers and so make the items look more consistent with one another than they really are. A better approach would be to remove from consideration records where fewer than 50% of the items are completed and to leave the remaining records intact, with their missing values still in. In other words, implement the first IF statement above but eliminate the ELSE branch, and then run PROC CORR without the NOMISS option. The bottom line: the NOMISS option in PROC CORR in general, and with the ALPHA option in particular, must be considered carefully.
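The alternative recommended above, which drops sparse records but does not impute, can be sketched the same way. This Python sketch shows only the filtering step, with `None` for missing; the function name `drop_sparse_records` and the records are hypothetical, and the handling of the retained missing values is left to PROC CORR as the text describes.

```python
from typing import List, Optional

Record = List[Optional[float]]


def drop_sparse_records(records: List[Record]) -> List[Record]:
    """Keep the records in which at least half of the items were answered.

    Unlike the imputation rule, kept records are left intact: their
    missing items stay None instead of being filled with the person mean.
    """
    kept = []
    for record in records:
        n_missing = sum(v is None for v in record)
        if n_missing <= len(record) / 2:
            kept.append(list(record))  # copy; missing values untouched
    return kept


records = [
    [4, 3, None, 4, 2, None, 3, 2, None, 3],           # 3 missing: kept as is
    [4, None, None, None, 2, None, 3, None, None, 4],  # 6 missing: dropped
]
print(drop_sparse_records(records))
```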

Note 3: Making sure that all items in the set are coded in the same direction. This is rarely a problem with 0/1 (wrong/right) coding, but for Likert or other scales with more than 2 points it is not uncommon for the response scale to remain constant (e.g., Strongly Agree, Agree, Disagree, Strongly Disagree) while the wording of the questions reverses the appropriate interpretation of the scale. For example,

Q1. The Social Security System must be reformed.   SA  A  D  SD
Q2. The Social Security System should remain the same.   SA  A  D  SD

Clearly, the two questions are on the same scale, but the meanings of the end points are opposite.

In SAS, the way to adjust for this problem is to pick the direction in which we want the scale to be coded (that is, do we want SA to be a positive statement about the Social Security System or a negative one?) and then reverse scale those items where SA reflects negatively (or positively) on the Social Security System. In the above example, SA for Q1 is a negative position relative to the Social Security System and therefore should be reverse scaled if the decision is to scale so that SA implies positive attitudes.

If the coding of the 4-point Likert scale was SA-0, A-1, D-2, SD-3, then the item will be reverse scaled as follows: Q1 = 3-Q1. In this way 0 becomes 3-0 = 3; 1 becomes 3-1 = 2; 2 becomes 3-2 = 1; and 3 becomes 3-3 = 0.

If the coding of the 4-point Likert scale was SA-1, A-2, D-3, SD-4, then the item will be reverse scaled as follows: Q1 = 5-Q1. In this way 1 becomes 5-1 = 4; 2 becomes 5-2 = 3; 3 becomes 5-3 = 2; and 4 becomes 5-4 = 1.
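Both rules above are instances of one formula: on a scale whose lowest code is lo and highest code is hi, the reversed score is lo + hi - x, which gives 3 - x for 0-3 coding and 5 - x for 1-4 coding. A minimal Python sketch; the function name `reverse_score` is hypothetical.

```python
def reverse_score(x: int, lo: int, hi: int) -> int:
    """Reverse a response coded lo..hi so that lo <-> hi: x -> lo + hi - x."""
    if not lo <= x <= hi:
        raise ValueError(f"response {x} is outside the scale {lo}..{hi}")
    return lo + hi - x


print([reverse_score(x, 0, 3) for x in range(0, 4)])  # [3, 2, 1, 0]
print([reverse_score(x, 1, 4) for x in range(1, 5)])  # [4, 3, 2, 1]
```

Applying the function twice returns the original response, which is a quick sanity check that an item has not been reversed twice by mistake.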

Returning to the earlier example, if items X1, X3, X5, X7, and X9 need to be reverse scaled before computing an internal consistency estimate, then the following SAS code will do the job, assuming the 4-point Likert scale illustrated above with 1-4 scoring.

DATA WORK.ONE;
  SET WORK.ONE;
  ARRAY X {10} X1-X10;   /* DEFINING AN ARRAY FOR THE 10 ITEMS */
  DO I=1,3,5,7,9;        /* INDICATING WHICH ITEMS IN THE ARRAY ARE TO BE REVERSE SCALED */
    X(I) = 5-X(I);       /* REVERSE SCALING FOR 1-4 CODING OF A 4-POINT LIKERT SCALE */
  END;
  DROP I;
RUN;
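For comparison, here is a Python sketch of the same step, assuming a record is a list of 10 scores coded 1-4 with `None` for missing. Python counts from 0, so SAS items X1, X3, X5, X7, X9 become positions 0, 2, 4, 6, 8; the names `reverse_selected` and `REVERSED_POSITIONS` are hypothetical.

```python
from typing import List, Optional, Sequence

# X1, X3, X5, X7, X9 in the SAS array are positions 0, 2, 4, 6, 8 here.
REVERSED_POSITIONS = (0, 2, 4, 6, 8)


def reverse_selected(record: List[Optional[int]],
                     positions: Sequence[int] = REVERSED_POSITIONS,
                     lo: int = 1, hi: int = 4) -> List[Optional[int]]:
    """Return a copy of one record with the chosen items reverse scaled on a
    lo..hi scale (5 - x for 1-4 coding). Missing items (None) stay missing,
    just as 5 - . is missing in SAS."""
    out = list(record)
    for i in positions:
        if out[i] is not None:
            out[i] = lo + hi - out[i]
    return out


print(reverse_selected([1, 1, 2, 2, 3, 3, 4, 4, None, 1]))
# [4, 1, 3, 2, 2, 3, 1, 4, None, 1]
```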

It should be noted that some of the output from PROC CORR with the ALPHA option, such as the correlation of each item with the total and the internal consistency estimate for the scale with that individual item removed, provides very useful diagnostics that should alert the researcher to either poorly functioning items or items that were missed when considering reverse scaling. An item that correlates negatively with the total usually either needs to be reverse scaled or is poorly formed.
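The diagnostics described above can be computed by hand to see why they flag a missed reversal. This Python sketch uses hypothetical 1-4 Likert data and computes Cronbach's alpha from the usual formula, each item's correlation with the total of the remaining items, and alpha with that item deleted. It is meant to be in the spirit of PROC CORR's output, not a guaranteed match of its numbers; all function names are hypothetical.

```python
from statistics import variance
from typing import List, Sequence, Tuple

Items = Sequence[Sequence[float]]  # items[j][i]: score of respondent i on item j


def cronbach_alpha(items: Items) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_diagnostics(items: Items) -> List[Tuple[int, float, float]]:
    """For each item: (item number, correlation with the total of the
    other items, alpha of the scale with that item deleted)."""
    rows = []
    for j, item in enumerate(items):
        rest = [other for m, other in enumerate(items) if m != j]
        rest_total = [sum(scores) for scores in zip(*rest)]
        rows.append((j + 1, pearson(item, rest_total), cronbach_alpha(rest)))
    return rows


# Hypothetical data, 5 respondents; item 4 was worded in the opposite
# direction and has NOT been reverse scaled.
items = [
    [4, 3, 2, 1, 4],
    [4, 3, 2, 2, 3],
    [3, 3, 1, 2, 4],
    [1, 2, 3, 4, 1],
]
print(f"alpha = {cronbach_alpha(items):.3f}")  # alpha = -0.848
for number, r_rest, alpha_del in item_diagnostics(items):
    print(f"item {number}: r with rest = {r_rest:+.3f}, "
          f"alpha if deleted = {alpha_del:.3f}")

fixed = items[:3] + [[5 - x for x in items[3]]]  # reverse scale item 4
print(f"alpha after reversing item 4 = {cronbach_alpha(fixed):.3f}")  # 0.944
```

Item 4 shows a negative correlation with the rest of the scale and the highest alpha-if-deleted, which is exactly the warning sign described above. The sample variance is used throughout; because the same n-1 divisor appears in the numerator and denominator, the choice does not change alpha.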



Instrumentality Theory

Suppose two corresponding items, one from the dimension being rated and its mate giving the relative importance of that topic (called the "valence"), are cross-multiplied, then added up across all such pairs, then divided by the number of such pairs. This produces a weighted score: the average of the ratings, each weighted by its relative importance. The higher the average weighted score, the greater the overall importance and rating of the topic. The technique has been popular because it considers two issues at once: how satisfied (or prepared, or ...) someone is, and how important that topic is to them. The approach has been applied to multivariate issues such as factors affecting leaving an organization, job satisfaction, managerial behavior, etc.
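The computation described above is an average of products. A minimal Python sketch, assuming the ratings and valences come from the same respondent in matching order and are numeric; the function name `weighted_score` and the 1-5 scales in the example are hypothetical.

```python
from typing import Sequence


def weighted_score(ratings: Sequence[float], valences: Sequence[float]) -> float:
    """Cross-multiply each rating with the valence (importance) of the same
    topic, add the products, and divide by the number of pairs."""
    if not ratings or len(ratings) != len(valences):
        raise ValueError("need equally many ratings and valences, at least one pair")
    return sum(r * v for r, v in zip(ratings, valences)) / len(ratings)


# Hypothetical respondent: satisfaction with three job facets (1-5)
# and the importance of each facet to that person (1-5).
satisfaction = [4, 2, 5]
importance = [5, 3, 1]
print(weighted_score(satisfaction, importance))  # (20 + 6 + 5) / 3, about 10.33
```

Note how the very satisfying third facet contributes little because the respondent considers it unimportant; that interaction is what the technique is designed to capture.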

Value Measurements Survey Instruments: Rokeach's Value Survey

Anthropologists have traditionally observed the behavior of members of a specific society and inferred from such behavior the dominant or underlying values of the society. In recent years, however, there has been a gradual shift to measuring values directly by means of survey questionnaire research. Researchers use data collection instruments called value instruments to ask people how they feel about such basic personal and social concepts as freedom, comfort, national security, and peace.

Research into the relationship between people's values and their actions as consumers is still in its infancy. However, it is an area that is destined to receive increased attention, for it taps a broad dimension of human behavior that could not be explored effectively before the availability of standardized value instruments.

A popular value instrument that has been employed in consumer behavior studies is the Rokeach Value Survey (RVS). This self-administered value inventory is divided into two parts, with each part measuring different but complementary types of personal values. The first part consists of eighteen terminal value items, which are designed to measure the relative importance of end-states of existence (i.e., personal goals). The second part consists of eighteen instrumental value items, which measure basic approaches an individual might take to reach end-state values. Thus, the first half of the measurement instrument deals with ends, while the second half considers means.

The items are not reworded to accommodate the Likert format; instead, respondents are asked to indicate the degree of personal importance each RVS value holds, from "very unimportant" to "very important", using a standard Likert scale placed next to each RVS value. Some applications use, for example, a 5-point scale followed by a rank-ordering of the top three RVS values after each list has been rated, for use in correcting for end-piling (the tendency to place many values at the top of the scale). It has been shown that, in many cases, the Likert procedure yields slightly, but not significantly, lower test-retest reliabilities than the rank-ordered procedure.
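The text does not say how those test-retest reliabilities were computed. For rank-ordered RVS responses, one common choice (an assumption here) is Spearman's rank correlation between a respondent's rankings at test and at retest; for the Likert version, a Pearson correlation of the ratings would play the same role. A minimal Python sketch for complete rankings without ties; the function name and the data are hypothetical.

```python
from typing import Sequence


def spearman_rho(ranks_test: Sequence[int], ranks_retest: Sequence[int]) -> float:
    """Spearman rank correlation between two complete rankings without
    ties, e.g. the ranks 1..18 one respondent gives the 18 terminal
    values at test and at retest: rho = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(ranks_test)
    expected = list(range(1, n + 1))
    if n < 2 or sorted(ranks_test) != expected or sorted(ranks_retest) != expected:
        raise ValueError("each argument must be a ranking 1..n without ties, n >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_test, ranks_retest))
    return 1 - 6 * d_squared / (n * (n * n - 1))


test = list(range(1, 19))                        # value i given rank i
retest = [2, 1] + list(range(3, 17)) + [18, 17]  # two adjacent swaps
print(round(spearman_rho(test, retest), 3))      # 0.996
```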

Since a common reason for preferring to use the RVS in a Likert format is to be able to perform normative statistical tests on the data, it is worth pointing out that, under some conditions, there are good arguments in favor of using normative statistical tests on RVS data with the scale in its original, rank-ordered format.


Some Interesting and Useful Sites:

Search and General Resources
| All In One Search | AAA Search | ASA: American Statistical Association | The International Biometric Society | Institute of Mathematical Statistics |

| Lecture Hall | Syllabits | How to Study Statistics | Statistical Terms | Statistics Education |

| All Topics on the Web | All Topics Periodicals | Statistical Resources on the Web |

| Teaching Resources | Statistics and OR Resources | Statistical Resources | Statistics Archive |

| World Wide Resources | Virtual Library | Books and Journal | OnLine Text Books | ASA Washington D.C. Chapter |

| Online Statistical Textbooks | Statistical Page Resources | Statistical Resources | Psychometrics | StatServ | Argus Clearinghouse |

| Job Listing | Careers in Statistics | Conferences |

| Statistical Computing Journals | Rainer's Home Page for Statisticians | Statistics Around the World | International Statistical Institute |

Demos and Interactive
| Coin Flipping | Equation Solver by Newton's Method | Probability Lessons | Real Analysis | Java Applets | XploRe |

| ANOVA Applet | Outliers and Regression Line | Confidence Interval | Let's Make a Deal |

| Histogram | Central Limit Theorem | Power of T-test | Computer-based Learning Statistics |

| Learning Statistics by Demo | Normal Approx. to Binomial | Small Sample Size Effect | Statistics Demonstrations |

| Normal Curve Area | Critical Values for the t-Distribution | Critical Values for the F-Distribution | Critical Values for the Chi-square Distribution |

| Tests for Binomial | Chi-square Goodness-of-fit | Confidence Intervals on a Proportion | Confidence Intervals on a Correlation | Confidence Intervals on a Mean | Confidence Intervals on the Difference of Two Independent Means |

| T-test Paired | Statistical Calculators | Test of Two Proportions | Mean | Two Independent Means | Two Dependent Means | Median Test | Sign Test | McNemar's Test | Wilcoxon Matched-Pairs Test | Wilcoxon Test | Randomization Plans |

| Rank Correlation Coefficient | Correlation Coefficient | Comparing Correlation Coefficients |

Survey Analysis
| Resources for Methods in Evaluation and Social Research | Social Research and Statistical Links | Survey Analysis Software | ASA Survey Research | American Association for Public Opinion Research | Association for Survey Computing |

| Applied Social Surveys | General Social Survey | Bureau of the Census | Social Surveys Question Bank | National Mapping Information | On-line Survey Response |

Statistical Software
| StatiBot | Analyse-it | Autobox | BMDP | Data Desk | Lisp-Stat | Matlab | Minitab | SAS | SPSS | Stata | Biostatistical Software Routines | Ecological Data Analysis | Multiple Imputation | Statistica | ASSUME | The Spreadsheet page | Add-ins for Excel | Data analysis and statistical solutions for Excel |

| Choosing a Statistical Analysis Package | Statistical Software Review | Statistical Software Providers | Visual Statistics System | WebStat | QDStat | Statistical Calculators on Web | Modstat | The AssiStat | XLStatistics |

Data and Data Analysis
| Economic Data and Links | Agricultural Statistics | The Gallup Organization | The Data and Story Library | Dr. B's Data | Datasets | Research Resources |

| Forum on Applied Data Analysis | Multivariate Analysis | Multivariate Statistics | Statistical data analysis FAQ | Data Analysis | Cast Your Vote! |

| National Center for Health Statistics (NCHS) | FEDSTAT: USA | Economic Data sources |

| Computer Routines and Data Files | Globally Accessible Statistical Procedures | Data Mining and Knowledge Discovery | USDA Economics and Statistics System |

| Brixton Books: Epi Info | Software FAQs | Experimental Interactive Statistics | Discrete Genetic Data Analysis | Software Providers | Software Metrics |

Statistics and Probability
| Test for Randomness | Confidence Intervals | ANOVA in Detail | The Probability Web | Business Statistics | Against All Odds |

| A New View of Statistics | Statistics Homepage | MathForum | SIIP | Introductory Statistics | Cases | HyperStat | Introduction to Statistics |

| Descriptive Statistics | Statistics Lab | Linear Regression | K-12 Statistics | Statistics Sites |

| Statistics Links | Use and abuse of statistics | Statistics on Web | CHANCE Magazine | Statistical Links | Statistics Handouts |

| Basic Statistics | SurfStat | Design Research | Statistics Mailing Lists | Statistics Links | Introduction to Statistics | Topics in Statistics | Statistics Refresher | World Lecture Hall | Probability and Statistics | Statistics Online | Lecture Notes | Conceptual Statistics |

| SimStat Package | Ordination Methods | Glossary for Ordination Methods | Guide for Statisticians | Power Analysis: How many samples? | Statistics for Everyone | American Risk and Insurance Association | Multilevel Models | International Association for Statistical Computing | Risk Theory Society | StatLib | Statistical Education through Problem Solving |

| AllStat | Institute of Mathematical Statistics | Statistics and Social Sciences | An Introduction to Geostatistics | Annotated Bibliography of Articles for the Statistics User |

| Bibliography for Computational Probability and Statistics |

| Probability Archive | Probability Resources |

A selection of:

| BUBL Catalogue | Computational Probability | Connected University | CTI Statistics | Links2Go |

| McGraw-Hill | Math Forum Phone-soft Cyber-world | Physical Sciences | Probability and Statistics |

Search Engines Directory
| AltaVista | AOL | Dogpile | Excite | HotBot | Looksmart | Lycos | Netscape | NetFirst | OpenDirectory | OpenHere | Webcrawler | Yahoo |

| Second Moment | Smallshop | Suite101 | World Lecture Hall |


The Copyright Statement: The fair use, according to the 1996 Fair Use Guidelines for Educational Multimedia, of materials presented on this Web site is permitted for non-commercial and classroom purposes only. This site may be mirrored intact (including these notices) on any server with public access, and linked to other Web pages.

Kindly e-mail me your comments, suggestions, and concerns. Thank you.

Professor Hossein Arsham   


This site was launched on 2/18/1994, and its intellectual materials have been thoroughly revised on a yearly basis. The current version is the 8th Edition. All external links are checked once a month.
