
Upload: corey-cole

Post on 22-Dec-2015


Quantitative Analysis

Managing the Data. Coding of Questionnaires and Production of Codebook.

Quantitative Measurement: Nominal, ordinal & interval variables.

Simple statistics: Frequencies (percentages) Descriptives (mean, median, mode)

Data Analysis

Having spent time designing your survey and collecting data, it is important to make a good job of your data analysis.

Your analysis is guided by the hypotheses that you are exploring and relies on your ingenuity to relate the data to the theory.

Data analysis is helped by efficient management of data, particularly coding.

Managing Data:

Keep all questionnaires in a safe place where your promise of anonymity and confidentiality cannot be violated (locked cabinet and key!).

Check your data input regularly on the computer. In SPSS, for example, check frequencies and cross-tabulations of responses to variables to make sure there are no strange entries!

Also check frequencies of case numbers (numbers representing your respondents) to prevent entering a case (respondent) twice!

When finished processing data, add frequencies of variables to the codebook.
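The two checks above can also be scripted. The following is a minimal Python sketch, not a substitute for the SPSS frequency checks: the rows, the variable names and the helper functions `duplicate_ids` and `invalid_entries` are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical data file: one dict per respondent (values are illustrative only).
rows = [
    {"id": 3125, "sex": 1, "marital": 1},
    {"id": 3768, "sex": 2, "marital": 2},
    {"id": 4448, "sex": 2, "marital": 3},
    {"id": 3768, "sex": 2, "marital": 2},   # same case entered twice
    {"id": 5012, "sex": 3, "marital": 99},  # 3 is not a valid code for sex
]

# Codes allowed by the coding frame, including 99 for missing data.
valid_codes = {"sex": {1, 2, 99}, "marital": {1, 2, 3, 4, 5, 6, 99}}


def duplicate_ids(rows):
    """Return the case numbers that occur more than once."""
    counts = Counter(row["id"] for row in rows)
    return sorted(case for case, n in counts.items() if n > 1)


def invalid_entries(rows, valid_codes):
    """Return (case number, variable, value) for every code outside the frame."""
    return [
        (row["id"], var, row[var])
        for row in rows
        for var, allowed in valid_codes.items()
        if row[var] not in allowed
    ]


print(duplicate_ids(rows))
print(invalid_entries(rows, valid_codes))
```

Any case number or code reported by these checks should be traced back to the paper questionnaire before the data file is corrected.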

Coding the data

You need to extract the data from the questionnaires and put them into a form that is easier to refer to and to manipulate:

1. You code data and create a data file.

2. Draw up a coding frame. This lists all the alternative values for a given variable and allocates a number for each possible answer.

3. Coding frame for closed questions can be taken from the questionnaire. Additional coding for open questions needs to be created when inputting data. The coding frame is usually drawn up before you attempt to code data and input data into the matrix.

The job of analysis is made easier if you use a computer to create data files – e.g. SPSS, STATA, R, SAS or even Excel.

Coding Questions

What sex are you?
1. Male 2. Female

How old are you? ________

What is your marital status?
1. Never married 2. Married 3. Cohabiting 4. Separated 5. Divorced 6. Widowed

Data File

This involves producing a grid on which to record the appropriate codes for each respondent.

Normally each row represents a respondent and each column a variable.

ID no  Sex  Age  Marital  EdLevel
3125   1    26   1        8
3768   2    57   2        7
4448   2    39   3        99

Data File

In some computer programmes you can click on a data label and reveal what each code means.

99 or 999 are usually the codes we use when there is missing data, i.e. item non-response.

ID no  Sex     Age  Marital        EdLevel
3125   Male    26   Never married  Degree
3768   Female  57   Married        Diploma
4448   Female  39   Cohabiting     Missing
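Turning codes back into labels is a simple lookup. Here is a minimal Python sketch, assuming the marital status codes from the Coding Questions slide (where 3 = Cohabiting) and only the two education labels that appear in the example. The names `value_labels`, `data_file` and `decode` are hypothetical.

```python
MISSING = 99  # code used for missing data

# Value labels per variable. Sex and Marital follow the coding questions;
# EdLevel lists only the two labels shown in the example data file.
value_labels = {
    "Sex": {1: "Male", 2: "Female"},
    "Marital": {1: "Never married", 2: "Married", 3: "Cohabiting",
                4: "Separated", 5: "Divorced", 6: "Widowed"},
    "EdLevel": {8: "Degree", 7: "Diploma"},
}

data_file = [
    {"ID": 3125, "Sex": 1, "Age": 26, "Marital": 1, "EdLevel": 8},
    {"ID": 3768, "Sex": 2, "Age": 57, "Marital": 2, "EdLevel": 7},
    {"ID": 4448, "Sex": 2, "Age": 39, "Marital": 3, "EdLevel": 99},
]


def decode(row):
    """Replace each code with its value label; leave ID and Age as numbers."""
    decoded = {}
    for var, value in row.items():
        labels = value_labels.get(var)
        if labels is None:
            decoded[var] = value          # no labels defined: keep the number
        elif value == MISSING:
            decoded[var] = "Missing"
        else:
            decoded[var] = labels.get(value, "Unlabelled code")
    return decoded


for row in data_file:
    print(decode(row))
```

Keeping the numeric codes in the data file and the labels in a separate lookup means a label can be corrected without touching the data.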

Notes on the Codebook:

Start creating the codebook with the questionnaire.

For each question:

1. Write down the actual question.

2. The name(s) of the variable(s) given to represent the question [e.g. Q1walk].

3. The label of the variable [e.g. whether or not respondent walks to work].

4. Values (codes) and value labels for each attribute [e.g. 1 (value) = walks to work (label); 0 (value) = does not walk to work (label); 99 = missing data].

5. Variable type: nominal, ordinal or continuous.

Code open questions (string variables) and ‘other’ categories afterwards and manually.
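One way to keep a codebook alongside the data is as a small nested structure with the five items listed above. This is only a sketch: the layout and the helper `describe` are assumptions, not a standard codebook format.

```python
# One codebook entry per variable, holding the five items listed above.
codebook = {
    "Q1walk": {
        "question": "Which mode of transport do you use most often to work?",
        "label": "Whether or not respondent walks to work",
        "values": {1: "Walks to work", 0: "Does not walk to work",
                   99: "Missing data"},
        "type": "nominal",
    },
}


def describe(name):
    """Format one codebook entry as plain text."""
    entry = codebook[name]
    lines = [
        f"Variable: {name}",
        f"Question: {entry['question']}",
        f"Label: {entry['label']}",
        f"Type: {entry['type']}",
    ]
    lines += [f"  {value} = {label}" for value, label in entry["values"].items()]
    return "\n".join(lines)


print(describe("Q1walk"))
```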

Example of Tricky Question to Code:

Q1. Which mode of transport do you use most often to work?

(please tick all those that apply)

Walk
Bicycle
Rail
Bus
Car as driver
Car as passenger
Motorbike as driver
Motorbike as passenger

Coding of ‘Tricky’ Question:

Split the question into different variables, i.e. create multi-dichotomies.

Walk variable (Q1walk):
1 = walks to work
0 = does not walk to work
99 = missing data

Bicycle variable (Q1bicycle):
1 = cycles to work
0 = does not cycle to work
99 = missing data

Etc.
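The split into multi-dichotomies can be sketched in Python as follows. Q1walk and Q1bicycle are the variable names used above; the remaining variable names and the function `code_q1` are hypothetical, and treating a question with no ticks as missing is an assumption you should decide on in your own coding frame.

```python
MISSING = 99

# One 0/1 variable per answer option on the questionnaire.
MODES = ["walk", "bicycle", "rail", "bus", "car_driver", "car_passenger",
         "motorbike_driver", "motorbike_passenger"]


def code_q1(ticked):
    """Turn the set of ticked boxes into one 0/1 variable per mode.

    `ticked` is a set of mode names, or None/empty if nothing was ticked,
    in which case every variable is coded as missing (an assumption).
    """
    if not ticked:
        return {f"Q1{mode}": MISSING for mode in MODES}
    return {f"Q1{mode}": int(mode in ticked) for mode in MODES}


print(code_q1({"walk", "bus"}))
print(code_q1(None))
```

Each respondent then contributes eight columns to the data file instead of one, and each column can be tabulated like any other binary variable.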

Quantitative Measurement

Types of Variables: Binary, Nominal, Ordinal, Interval, Discrete Count

Categorical Variables

Binary
Description: 2 categories. Usually coded as 1/2 or 0/1.
Examples: Male/female; Employed/unemployed; Supports/opposes Euro.
Descriptive statistics: Frequencies; Descriptives of 0/1 variable; Crosstabs.

Nominal
Description: More than 2 unordered categories. Usually coded as 1, 2, 3 but these are labels.
Examples: Social class; Region of residence; Political party vote.
Descriptive statistics: Frequencies; Crosstabs.

Continuous Variables

Interval/ratio
Description: Differences have the same meaning at different points on the scale.
Examples: Calendar year; Income; Weight; Height.
Descriptive statistics: Descriptives; Grouped frequencies; Scatter plots.

Discrete Counts
Description: Number of counts in a given area or period of time.
Examples: Number of children in a family; Number of heart attacks in Oxford in 2005.
Descriptive statistics: Frequencies (if few values); Descriptives.

Ranked Variable

Ordinal
Description: Categorical with ordered categories. Numeric codes 1, 2, 3… but numeric order corresponds to the ordering of categories.
Examples: Class of university degree; Strength of agreement or disagreement about an issue; Level of job satisfaction.
Descriptive statistics: Frequencies; Crosstabs.

Counting Responses

After compiling your data in your data matrix, the first step in data analysis is to summarise your data.

You can do this by:
Tabulating the data (‘Frequencies’)
Calculating the ‘Descriptives’, i.e. summaries and variability of the data
Graphs

This initial step is sometimes called univariate analysis, i.e. description of one variable.

Frequencies

From a frequency table, you can tell how often (frequently) people gave each response.

It tells you how many people selected each response to a question.

Frequencies can also be used to check codes. If a code appears in the frequency table that wasn’t used in the coding scheme, you know that an error was made when inputting the data.

IMPJOB – Importance to respondent of having a fulfilling job.

Value Label            Value  Frequency  Percent  Valid Percent  Cum Percent
One of the most imp.   1      316        21.1     21.4           21.4
Very imp.              2      833        55.5     56.3           77.7
Somewhat               3      238        15.9     16.1           93.8
Not too import         4      62         4.1      4.2            98.0
Not at all             5      30         2.0      2.0            100.0
DK                     8      7          0.5      Missing
NA                     9      14         0.9      Missing
Total                         1500       100.0    100.0

Frequencies

A frequency count alone is not a very good summary of the data so use both counts and percentages.

Percentages are easier to visualise. Unlike counts, you can compare percentages across surveys with different numbers of cases.

Use valid percentages – i.e. exclude the missing data.
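The percent, valid percent and cumulative percent columns of the IMPJOB table can be reproduced from the counts alone. The Python sketch below uses the frequencies from that table; the function name `frequency_table` and the dictionary layout are hypothetical, and codes 8 (DK) and 9 (NA) are treated as missing, as in the table.

```python
# Frequencies from the IMPJOB table: code -> number of respondents.
counts = {1: 316, 2: 833, 3: 238, 4: 62, 5: 30, 8: 7, 9: 14}
missing_codes = {8, 9}  # DK and NA are excluded from the valid percentages


def frequency_table(counts, missing_codes):
    """Build percent, valid percent and cumulative percent for each code."""
    total = sum(counts.values())
    valid_total = sum(n for code, n in counts.items()
                      if code not in missing_codes)
    table = []
    cumulative = 0  # running count of valid cases
    for code in sorted(counts):
        n = counts[code]
        row = {"value": code, "frequency": n,
               "percent": round(100 * n / total, 1)}
        if code in missing_codes:
            row["valid_percent"] = None
            row["cum_percent"] = None
        else:
            cumulative += n
            row["valid_percent"] = round(100 * n / valid_total, 1)
            row["cum_percent"] = round(100 * cumulative / valid_total, 1)
        table.append(row)
    return table


for row in frequency_table(counts, missing_codes):
    print(row)
```

Percent divides by all 1500 cases, while valid percent divides by the 1479 cases with a valid answer, which is why the two columns differ slightly.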

Simple Statistics

Descriptives:

1. Mode (most frequently recurring response)

2. Mean (average)

3. Median (middle value if all responses were laid out in a row from smallest to largest).

The median is used instead of the mean for ordinal variables, i.e. it is the mid-rank.

It is also not strongly affected by ‘outliers’, so it is often a better measure than the mean for continuous variables like age and income.
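The effect of an outlier on the mean and the median is easy to see with a few numbers. This is a small Python sketch using the standard `statistics` module; the incomes are made up for illustration and the function `summarise` is hypothetical.

```python
import statistics

# Hypothetical incomes in thousands; the last value added below is an outlier.
incomes = [18, 20, 20, 22, 25, 27, 30]
with_outlier = incomes + [400]


def summarise(values):
    """Return the three simple descriptives for a list of numbers."""
    return {
        "mode": statistics.mode(values),
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
    }


print(summarise(incomes))       # {'mode': 20, 'mean': 23.14, 'median': 22}
print(summarise(with_outlier))  # {'mode': 20, 'mean': 70.25, 'median': 23.5}
```

One extreme income roughly triples the mean, while the median moves only from 22 to 23.5.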

Graphs

You can use a number of graphs to illustrate your findings, such as: pie chart, bar chart, histogram.

It depends on the type of variable!

Simple Statistics

Bivariate Analysis – this involves the relationship between two variables.

The simplest form is a cross-tabulation.

You need categorical variables for this.

For continuous variables, use a scatter plot to graph the relationship.
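A cross-tabulation is simply a count of cases for every combination of two categorical variables. The Python sketch below assumes made-up data on sex (1 = male, 2 = female) and employment (1 = employed, 0 = unemployed); the function `crosstab` is hypothetical and much simpler than what SPSS produces.

```python
from collections import Counter

# Hypothetical respondents as (sex, employed) pairs.
respondents = [(1, 1), (1, 0), (2, 1), (2, 1), (1, 1), (2, 0)]


def crosstab(pairs):
    """Count cases for each combination of row value and column value."""
    cells = Counter(pairs)
    row_values = sorted({r for r, _ in pairs})
    col_values = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: cells[(r, c)] for c in col_values} for r in row_values}


table = crosstab(respondents)
for sex, row in table.items():
    print(sex, row)
```

Here each row is one sex and each column one employment status, so reading across a row shows how that group splits between employed and unemployed.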