Quantitative Analysis
Managing the Data. Coding of Questionnaires and Production of Codebook.
Quantitative Measurement: Nominal, ordinal & interval variables.
Simple statistics: Frequencies (percentages); Descriptives (mean, median, mode).
Data Analysis
Having spent time designing your survey and collecting data, it is important to make a good job of your data analysis.
Your analysis is guided by the hypotheses that you are exploring and relies on your ingenuity to relate the data to the theory.
Data analysis is helped by efficient management of data, particularly coding.
Managing Data:
Keep all questionnaires somewhere secure, so that you do not break your promise of anonymity and confidentiality (cabinet and key!).
Check your data input regularly on the computer. In SPSS, for example, run frequencies and cross-tabulations of the responses to each variable to make sure there are no strange entries!
Also check frequencies of case numbers (numbers representing your respondents) to prevent entering a case (respondent) twice!
When finished processing data, add frequencies of variables to the codebook.
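The check for double-entered cases can be sketched in a few lines of Python. This is only an illustration: the case numbers below are hypothetical and `find_duplicate_cases` is a helper name made up for this example, not part of SPSS or any other package.

```python
from collections import Counter

# Hypothetical case numbers as typed in; case 3768 was entered twice.
case_ids = [3125, 3768, 4448, 3768]

def find_duplicate_cases(ids):
    """Return the case numbers that appear more than once, in sorted order."""
    counts = Counter(ids)  # frequency of each case number
    return sorted(case for case, n in counts.items() if n > 1)

duplicates = find_duplicate_cases(case_ids)
print(duplicates)
```

Any case number returned here has a frequency above 1 and should be checked against the paper questionnaires.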
Coding the data
You need to extract the data from the questionnaires and put them into a form that is easier to refer to and to manipulate:
1. You code data and create a data file.
2. Draw up a coding frame. This lists all the alternative values for a given variable and allocates a number to each possible answer.
3. The coding frame for closed questions can be taken from the questionnaire. Additional codes for open questions need to be created when inputting data. The coding frame is usually drawn up before you attempt to code data and enter it into the data matrix.
The job of analysis is made easier if you use a computer to create data files – e.g. SPSS, STATA, R, SAS or even Excel.
Coding Questions
What sex are you?
1. Male 2. Female
How old are you? ________
What is your marital status?
1. Never married 2. Married 3. Cohabiting 4. Separated 5. Divorced 6. Widowed
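A coding frame for these closed questions could be held as a simple lookup table, which also makes it easy to spot codes that should not exist. The sketch below is a minimal illustration in Python. The names `CODING_FRAME` and `invalid_codes` are invented for the example, and 99 is assumed to be the missing-data code, following the convention used in these slides.

```python
# Coding frame taken from the closed questions above.
CODING_FRAME = {
    "Sex": {1: "Male", 2: "Female"},
    "Marital": {
        1: "Never married", 2: "Married", 3: "Cohabiting",
        4: "Separated", 5: "Divorced", 6: "Widowed",
    },
}
MISSING = 99  # assumed missing-data code

def invalid_codes(variable, values):
    """Return the entered values that are not in the coding frame."""
    allowed = set(CODING_FRAME[variable]) | {MISSING}
    return [v for v in values if v not in allowed]

# Hypothetical input: 7 is a typing error, 99 is a legitimate missing code.
entered_marital = [1, 2, 3, 7, 99]
print(invalid_codes("Marital", entered_marital))
```

Age is left out of the frame because it is recorded as a number rather than as a code.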
Data File
This involves producing a grid on which to record the appropriate codes for each respondent.
Normally each row represents the respondent and each column a variable.
ID no Sex Age Marital EdLevel
3125 1 26 1 8
3768 2 57 2 7
4448 2 39 3 99
Data File
In some computer programmes you can click on a data label and reveal what each code means.
99 or 999 are usually the codes we use when there is missing data, i.e. item non-response.
ID no  Sex     Age  Marital        EdLevel
3125   male    26   Never married  degree
3768   female  57   Married        diploma
4448   female  39   Cohabiting     missing
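What the software does when you switch from codes to labels can be sketched as follows. This is an illustration in Python, not how SPSS works internally. The EdLevel labels cover only the two codes shown in the example (7 and 8), and treating 99 as missing for EdLevel only is an assumption made so that a respondent aged 99 would not be turned into missing data.

```python
# Value labels for the coded variables (only codes shown in the example).
VALUE_LABELS = {
    "Sex": {1: "male", 2: "female"},
    "EdLevel": {7: "diploma", 8: "degree"},
}
# Missing codes are declared per variable (assumption for this sketch).
MISSING_CODES = {"EdLevel": {99}}

# The data file: one row per respondent, one column per variable.
rows = [
    {"ID": 3125, "Sex": 1, "Age": 26, "Marital": 1, "EdLevel": 8},
    {"ID": 3768, "Sex": 2, "Age": 57, "Marital": 2, "EdLevel": 7},
    {"ID": 4448, "Sex": 2, "Age": 39, "Marital": 3, "EdLevel": 99},
]

def label(variable, code):
    """Return the value label for a code, or the code itself if unlabelled."""
    if code in MISSING_CODES.get(variable, set()):
        return "missing"
    return VALUE_LABELS.get(variable, {}).get(code, code)

labelled = [{var: label(var, code) for var, code in row.items()} for row in rows]
for row in labelled:
    print(row)
```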
Notes on the Codebook:
Start creating the codebook with the questionnaire.
For each question:
1. Write down the actual question.
2. The name(s) of the variable(s) given to represent the question [e.g. Q1walk].
3. The label of the variable [e.g. whether or not respondent walks to work].
4. Values (codes) and value labels for each attribute [e.g. 1 (value) = walks to work (label); 0 (value) = does not walk to work (label); 99 = missing data].
5. Variable type – nominal, ordinal or continuous.
Code open questions (string variables) and ‘other’ categories afterwards and manually.
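If you keep the codebook electronically, each entry can follow the five points above. The following is a minimal sketch in Python. `CodebookEntry` and its field names are invented for this illustration, and the frequencies dictionary is left empty until the data have been processed.

```python
from dataclasses import dataclass, field

@dataclass
class CodebookEntry:
    question: str   # 1. the actual question wording
    variable: str   # 2. variable name
    label: str      # 3. variable label
    values: dict    # 4. codes and their value labels
    var_type: str   # 5. nominal, ordinal or continuous
    frequencies: dict = field(default_factory=dict)  # added after data entry

q1walk = CodebookEntry(
    question="Which mode of transport do you use most often to work?",
    variable="Q1walk",
    label="Whether or not respondent walks to work",
    values={1: "walks to work", 0: "does not walk to work", 99: "missing data"},
    var_type="nominal",
)

print(q1walk.variable, q1walk.values)
```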
Example of Tricky Question to Code:
Q1. Which mode of transport do you use most often to work?
(please tick all those that apply)
Walk
Bicycle
Rail
Bus
Car as driver
Car as passenger
Motorbike as driver
Motorbike as passenger
Coding of ‘Tricky’ Question:
Split the question into different variables, i.e. create multi-dichotomies.
Walk variable (Q1walk):
1 = walks to work
0 = does not walk to work
99 = missing data
Bicycle variable (Q1bicycle):
1 = cycles to work
0 = does not cycle to work
99 = missing data
Etc.
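Splitting a tick-all-that-apply question into 0/1 variables can be sketched in Python as below. Only Q1walk and Q1bicycle are variable names from the slides; the remaining names are made up in the same pattern. Treating a completely blank question as `None` (and coding it 99 on every variable) is an assumption of this sketch.

```python
# One 0/1 variable per answer option.
MODES = [
    "walk", "bicycle", "rail", "bus",
    "car_driver", "car_passenger",
    "motorbike_driver", "motorbike_passenger",
]
MISSING = 99

def to_dichotomies(ticked):
    """Turn the set of ticked options into one 0/1 variable per option.

    ticked is a set of option names, or None if the question was left blank.
    """
    if ticked is None:
        return {f"Q1{mode}": MISSING for mode in MODES}
    return {f"Q1{mode}": int(mode in ticked) for mode in MODES}

# A respondent who ticked both 'Walk' and 'Bus'.
coded = to_dichotomies({"walk", "bus"})
print(coded)
```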
Categorical Variables

Binary
Definition: 2 categories. Usually coded as 1/2 or 0/1.
Examples: Male/female; Employed/unemployed; Supports/opposes Euro.
Descriptive statistics: Frequencies; Descriptives of 0/1 variable; Crosstabs.

Nominal
Definition: More than 2 unordered categories. Usually coded as 1, 2, 3 but these are labels.
Examples: Social class; Region of residence; Political party vote.
Descriptive statistics: Frequencies; Crosstabs.
Continuous Variables

Interval/ratio
Definition: Differences have the same meaning at different points on the scale.
Examples: Calendar year; Income; Weight; Height.
Descriptive statistics: Descriptives; Group frequencies; Scatter plots.

Discrete counts
Definition: Number of counts in a given area or period of time.
Examples: Number of children in a family; Number of heart attacks in Oxford in 2005.
Descriptive statistics: Frequencies (if few values); Descriptives.
Ranked Variable

Ordinal
Definition: Categorical with ordered categories. Numeric codes 1, 2, 3… but numeric order corresponds to the ordering of categories.
Examples: Class of university degree; Strength of agreement or disagreement about an issue; Level of job satisfaction.
Descriptive statistics: Frequencies; Crosstabs.
Counting Responses
After compiling your data in your data matrix, the first step in data analysis is to summarise your data.
You can do this by:
Tabulating the data (‘Frequencies’)
Calculating the ‘Descriptives’, i.e. summaries and variability of the data
Graphs
This initial step is sometimes called univariate analysis – i.e. description of one variable.
Frequencies
From a frequency table, you can tell how often (frequently) people gave each response.
It tells you how many people selected each response to a question.
Frequencies can also be used to check codes. If a code appears in the frequency table that wasn’t used in the coding scheme, you know that an error has happened in entering the data.
IMPJOB – Importance to respondent of having a fulfilling job.

Value Label            Value  Frequency  Percent  Valid Percent  Cum Percent
One of the most imp.   1      316        21.1     21.4           21.4
Very imp.              2      833        55.5     56.3           77.7
Somewhat               3      238        15.9     16.1           93.8
Not too import         4      62         4.1      4.2            98.0
Not at all             5      30         2.0      2.0            100.0
DK                     8      7          0.5      Missing
NA                     9      14         0.9      Missing
Total                         1500       100      100
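The percentage columns in the IMPJOB table can be reproduced from the counts alone. The sketch below, in Python, is one way to do the arithmetic. The function name `frequency_table` is made up for the example; the counts and the missing codes (8 = DK, 9 = NA) are those in the table. Percent uses all 1500 cases as the base, while valid and cumulative percent use only the 1479 valid cases.

```python
# Counts per code from the IMPJOB table; 8 (DK) and 9 (NA) are missing.
counts = {1: 316, 2: 833, 3: 238, 4: 62, 5: 30, 8: 7, 9: 14}
missing = {8, 9}

def frequency_table(counts, missing_codes):
    """Return rows of (code, frequency, percent, valid percent, cum percent)."""
    total = sum(counts.values())
    valid_total = sum(n for code, n in counts.items() if code not in missing_codes)
    rows, running = [], 0
    for code in sorted(counts):
        n = counts[code]
        percent = round(100 * n / total, 1)
        if code in missing_codes:
            # Missing codes get no valid or cumulative percentage.
            rows.append((code, n, percent, None, None))
        else:
            running += n
            valid = round(100 * n / valid_total, 1)
            cumulative = round(100 * running / valid_total, 1)
            rows.append((code, n, percent, valid, cumulative))
    return rows

table = frequency_table(counts, missing)
for row in table:
    print(row)
```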
Frequencies
A frequency count alone is not a very good summary of the data so use both counts and percentages.
Percentages are easier to visualise – unlike counts, you can compare percentages across surveys with different numbers of cases.
Use valid percentages – i.e. exclude the missing data.
Simple Statistics
Descriptives:
1. Mode (most frequently recurring response)
2. Mean (average)
3. Median (middle value if all responses were laid out in a row from smallest to largest).
Used instead of the mean for ordinal variables, i.e. it is the mid-rank.
It is also much less affected by ‘outliers’, so it is often a better measure than the mean for continuous variables like age and income.
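The effect of an outlier on the mean and the median can be seen in a small worked example. The incomes below are hypothetical (in thousands) and the calculation uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical incomes in thousands; 300 is an outlier.
incomes = [18, 20, 20, 22, 25, 27, 300]

mean = statistics.mean(incomes)      # pulled upwards by the outlier
median = statistics.median(incomes)  # middle value of the sorted list
mode = statistics.mode(incomes)      # most frequently recurring value

print(round(mean, 1), median, mode)

# Dropping the outlier changes the mean a great deal, the median very little.
without_outlier = incomes[:-1]
print(statistics.mean(without_outlier), statistics.median(without_outlier))
```

With the outlier the mean is about 61.7 while the median is 22; without it the mean drops to 22 and the median only moves to 21.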
Graphs
You can use a number of graphs to illustrate your findings, such as: pie chart, bar chart, histogram.
It depends on the type of variable!