data analysis and presentation

95
Research Fundamentals and Terminology 1-1 Excel Books 4-1 4 UNIT Data Analysis and Presentation

Upload: jignesh-kariya

Post on 05-Apr-2017

136 views

Category:

Education


0 download

TRANSCRIPT

Page 1: Data analysis and Presentation

Research Fundamentals and Terminology

1-1 Excel Books4-1

4UNIT

Data Analysis and

Presentation

Page 2: Data analysis and Presentation

Research Fundamentals and Terminology

1-2

Data Preparation and

Data Presentation

Page 3: Data analysis and Presentation

Research Fundamentals and Terminology

1-3

1. Introduction to Data Preparation Process

The data preparation process is as shown in the diagram below.

We are going to discuss each element of data preparation process in subsequent sections.

Page 4: Data analysis and Presentation

Research Fundamentals and Terminology

1-4

1. Prepare preliminary plan data analysis

2. Check questionnaire

3. Edit

4. Code

5. Transcribe

6. Clean data

7. Statistically adjust the data

8. Select data analysis strategy

Page 5: Data analysis and Presentation

Research Fundamentals and Terminology

1-5

2. Check questionnaire

A questionnaire returned from the field may be unacceptable for several reasons.• Parts of the questionnaire may be incomplete.•The pattern of responses may indicate that the respondent did not understand or follow the instructions.•The responses show little variance.•One or more pages are missing.•The questionnaire is received after the pre-established cutoff date.•The questionnaire is answered by someone who does not qualify for participation.

Page 6: Data analysis and Presentation

Research Fundamentals and Terminology

1-6

Information gathered during data collection may lack uniformity. Example: Data collected through questionnaire and schedules may have answers which may not be ticked at proper places, or some questions may be left unanswered. Sometimes information may be given in a form which needs reconstruction in a category designed for analysis, e.g., converting daily/monthly income in annual income and so on. The researcher has to take a decision as to how to edit it.

Editing also needs that data are relevant and appropriate and errors are modified. Occasionally, the investigator makes a mistake and records and impossible answer. “How much red chilies do you use in a month” The answer is written as “4 kilos”. Can a family of three members use four kilo chilies in a month? The correct answer could be “0.4 kilo”.

3. Editing

Page 7: Data analysis and Presentation

Research Fundamentals and Terminology

1-7

Care should be taken in editing (re-arranging) answers to open-ended questions. Example: Sometimes “don’t know” answer is edited as “no response”. This is wrong. “Don’t know” means that the respondent is not sure and is in a double mind about his reaction or considers the questions personal and does not want to answer it. “No response” means that the respondent is not familiar with the situation/object/event/individual about which he is asked.

Page 8: Data analysis and Presentation

Research Fundamentals and Terminology

1-8

It is the process of checking and adjusting the data for omissions, legibility and consistency and readying them for coding and storage.

Reasons for editing is to ensure completeness, consistency and to detect questions answered out of order.

Editing means the following treatment of unsatisfactory results.

To ensure consistency in response, the questionnaires with unsatisfactory responses may be returned to the field, where the interviewers re-contact the respondents.

Page 9: Data analysis and Presentation

Research Fundamentals and Terminology

1-9

To ensure completeness, assigning missing values if returning the questionnaires to the field is not feasible, the editor may assign missing values to unsatisfactory responses.

If the questions answered are out of order, discarding unsatisfactory respondents.

In this approach, the respondents with unsatisfactory responses are simply discarded.

Page 10: Data analysis and Presentation

Research Fundamentals and Terminology

1-10

Coding is translating answers into numerical values or assigning numbers to the various categories of a variable to be used in data analysis. Coding is done by using a code book, code sheet, and a computer card. Coding is done on the basis of the instructions given in the codebook. The code book gives a numerical code for each variable.

Now-a-days, codes are assigned before going to the field while constructing the questionnaire/schedule. Pose data collection; pre-coded items are fed to the computer for processing and analysis. For open-ended questions, however, post-coding is necessary. In such cases, all answers to open-ended questions are placed in categories and each category is assigned a code.

4. Coding

Page 11: Data analysis and Presentation

Research Fundamentals and Terminology

1-11

Manual processing is employed when qualitative methods are used or when in quantitative studies, a small sample is used, or when the questionnaire/schedule has a large number of open-ended questions, or when accessibility to computers is difficult or inappropriate. However, coding is done in manual processing also.

Page 12: Data analysis and Presentation

Research Fundamentals and Terminology

1-12

• Coding is the process of identifying and assigning a numerical score or other character symbol to previously edited data.

• Coding means assigning a code, usually a number , to each possible response to each question asked.

• The code includes an indication of the column position (field) and data record it will occupy.

Page 13: Data analysis and Presentation

Research Fundamentals and Terminology

1-13

Coding Questions

• Fixed field codes, which mean that the number of records for each respondent is the same and the same data appear in the same column(s) for all respondents, are highly desirable.

• If possible standard codes should be used for missing data.

• Coding of structured questions is relatively simple, since the responses options are predetermined.• In questions that permit a large number of responses, each possible response option should be assigned a separate column.

Page 14: Data analysis and Presentation

Research Fundamentals and Terminology

1-14

Guidelines for coding unstructured questions:

• Category codes should be mutually exclusive and collectively exhaustive.

• Only a few of the responses should fall in to the other category.

• Category codes should be assigned for critical issues even if no one has mentioned them.

• Data should be coded to retain as much detail as possible.

Page 15: Data analysis and Presentation

Research Fundamentals and Terminology

1-15

A codebook contains coding instructions and the necessary information about variables in the data set. A codebook generally contains the following information:• Column number• Variable number• Question number• Record number• Variable number• Instructions for coding

Page 16: Data analysis and Presentation

Research Fundamentals and Terminology

1-16

Coding Questionnaire:

The respondent code and the record number appear on each record in the data.

The first record contains the additional codes: project code, interview code, date and time codes, and validation code.

It is a good practice to insert blanks between parts.

Page 17: Data analysis and Presentation

Research Fundamentals and Terminology

1-17

Converting raw data into transcribe data

The raw data can be converted into transcribed data by using various devices.

First the raw data input is done by using devices such as optical scanning, mark sense forms, computerized sensory analysis, Key punching and verification to correct etc.

The input data is stored in computer memory, hard disks and magnetic tapes which can be retrieved as transcribed data as and when required.

Page 18: Data analysis and Presentation

Research Fundamentals and Terminology

1-18

Data CleaningConsistency checks identify data that are out of range, logically inconsistent, or have extreme values. Computer packages like MINITAB, SPSS, SAS, and EXCEL can be programmed to identify out-of-range values for each variable and print out respondent code, variable code, variable name, record number, column number and out-of-range value.

Page 19: Data analysis and Presentation

Research Fundamentals and Terminology

1-19

Data cleaning Treatment of Missing Responses

• Substitute a Neutral Value: A neutral value, typically the mean response to the variable, is substituted for the missing responses.• Substitute an Imputed Response: The respondents pattern of responses to other questions are used to impute or calculate a suitable response to the missing questions.•In case wise deletion, cases or respondents with any missing responses are discarded from the analysis.• In Pair wise deletion, instead of discarding all cases with any missing values, the researcher uses only the cases or respondents with complete responses for each calculation

Page 20: Data analysis and Presentation

Research Fundamentals and Terminology

1-20

Statistically adjusting the data

In weighting each case or respondent in the database is assigned a weight to reflect its importance relative to other cases or respondents.Weighting is most widely used to make the sample data more representative of a target population on specific characteristics.Yet another use of weighting is to adjust the sample so greater importance is attached to respondents with certain characteristics.Statistically adjusting the data for variable re-specification.Variable re-specification involves the transformation of data to create new variables or modify existing variables.

Page 21: Data analysis and Presentation

Research Fundamentals and Terminology

1-21

Selecting data analysis strategyThe first data analysis strategy is the transformation of raw data into a form that that will make them easy to understand and interpret; rearranging, ordering, and manipulating data to generate descriptive information is the starting point of data analysis strategy.

Second data analysis strategy is to perform basic data analysis by using descriptive statistics.

Third data analysis strategy is to perform basic analysis using univariate statistics.

Fourth data analysis strategy is to perform data analysis using bivariate analysis such as testing the differences.

Page 22: Data analysis and Presentation

Research Fundamentals and Terminology

1-22

Fifth data analysis strategy is to perform data analysis using measures of association such as regression and correlation and nonparametric statistics.

Sixth data analysis strategy is to perform data analysis using multivariate statistics.

Page 23: Data analysis and Presentation

Research Fundamentals and Terminology

1-23

Data Presentation ToolsThis is the first category of Research Methodology tools and techniques which are used to present data.The list of data presentation tools is as below.

1. Bar chart 2. Pie chart 3. Frequency Distribution

4. Histogram 5. Frequency Polygon6. Ogive7. Stem and Leaf Plot 8. Cross Tabulation table

To summarize Qualitative and quantitative data various charts and graphs are developed.

Page 24: Data analysis and Presentation

Research Fundamentals and Terminology

1-24

Summarizing Qualitative Data

Here we are using various charts and graphs.

1. Frequency distribution2. Relative frequency distribution3. Percent frequency distribution4. Bar graphs5. Pie charts

Page 25: Data analysis and Presentation

Research Fundamentals and Terminology

1-25

Summarizing Quantitative Data

1. Frequency distribution2. Relative frequency distribution3. Percent frequency distribution4. Cumulative frequency distribution5. Cumulative relative frequency6. Cumulative percent frequency distribution7.Histogram 8.Frequency Polygon9.Ogive

Page 26: Data analysis and Presentation

Research Fundamentals and Terminology

1-26

• Tabulation is the final stage in collection and compilation of data, and it is thestepping-stone to the analysis and interpretation of data. In deciding about the type of tabulation one has to keep in mind the nature, scope and the object of research problem. Tabulation of data should be done in such a form that it suits the nature and object of the research investigation.

• Tabulation simply means presenting of data through tables. To be more precise “Tabulation is an orderly arrangement of data in column and rows‟.

Tabulation

Page 27: Data analysis and Presentation

Research Fundamentals and Terminology

1-27

OBJECTIVES, IMPORTANCE AND ADVANTAGES OF TABULATION

• Proper tabulation of data is of great importance because if the tabulation of data is not satisfactory its analysis is bound to be defective.

• Tabulation of data is done in order to achieve simplicity and convenience in processing and interpretation of data collected for research problem.

Page 28: Data analysis and Presentation

Research Fundamentals and Terminology

1-28

Advantages Of Tabulation

1. Ease in Understanding: It is easy to understand tabulated data than the unorganized data.

2. Time Savings: Tabulation of Data leads to immense saving of time during analysis.

3. Ease in Drawing Diagrams: Diagrammatic representation of data is more convenient if it is done on the basis of a tabulated data.

Page 29: Data analysis and Presentation

Research Fundamentals and Terminology

1-29

4. Ease in Comparison: Through tabulated data, it becomes easy to undertake comparative study because it is systematically displayed.

5. Detection of Errors: Errors and omissions in data are easily detected in tabulated data.

6. Space-Saving: In tabulation, the data is displayed in column and rows and so it uses less space.

 7. No Chances of Repetition: Repetition of data may

occur if it is displayed in an unorganized fashion. Tabulation saves the data from being repeated.

Page 30: Data analysis and Presentation

Research Fundamentals and Terminology

1-30

Cross-Tabulation

• A cross tabulation is the merging of the frequency distribution of 2 or more variables in a single table. It helps to understand how 1 variable relates to another variables

• Example: brand loyalty relates to gender

Page 31: Data analysis and Presentation

Research Fundamentals and Terminology

1-31

Example:- Cross-tabulationSample Gender Handedness

1 Female Right-handed

2 Male Left-handed

3 Female Right-handed

4 Male Right-handed

5 Male Left-handed

6 Male Right-handed

7 Female Right-handed

8 Female Left-handed

9 Male Right-handed

10 Female Right-handed

Page 32: Data analysis and Presentation

Research Fundamentals and Terminology

1-32

Left-handed

Right-handed Total

Males 2 3 5

Females 1 4 5

Total 3 7 10

Page 33: Data analysis and Presentation

Research Fundamentals and Terminology

1-33

Advantages1. Cross tabulation analysis & results can be easily

interpreted2. The clarity of interpretation provides a stronger

link between research result & managerial action3. It is simple4. Easy to understand by manager because Not

statistically oriented

Page 34: Data analysis and Presentation

Research Fundamentals and Terminology

1-34

SIGNIFICANCE OF REPORT WRITINGResearch report is considered a major component of the research study for the research task remains incomplete till the report has been presented and/or written. As a matter of fact even the most brilliant hypothesis, highly well designed and conducted research study, and the most strikinggeneralizations and findings are of little value unless they are effectively communicated to others.The purpose of research is not well served unless the findings are made known to others. Research results must invariably enter the general store of knowledge.

Page 35: Data analysis and Presentation

Research Fundamentals and Terminology

1-35

DIFFERENT STEPS IN WRITING REPORT

Research reports are the product of slow, painstaking, accurate inductive work. The usual steps involved in writing report are:

(a)logical analysis of the subject-matter; (b) preparation of the final outline; (c) preparation of the rough draft; (d) rewriting and polishing; (c) preparation of the final bibliography; and (f) writing the final draft.

Page 36: Data analysis and Presentation

Research Fundamentals and Terminology

1-36

Logical analysis of the subject matter: It is the first step which is primarily concerned with the development of a subject. There are two ways in which to develop a subject (a)logically and (b) chronologically. The logical development is made on the basis of

mentalconnections and associations between the one thing andanother by means of analysis. Logical treatment often consists in developing the material from the simple possible to the most complex structures. Chronological development is based on a connection or sequence in time or occurrence. The directions for doing or making something usually follow the chronological order.

Page 37: Data analysis and Presentation

Research Fundamentals and Terminology

1-37

Preparation of the final outline:

It is the next step in writing the research report “Outlines are the framework upon which long written works are constructed.

They are an aid to the logical organisation of the material and a reminder of the points to be stressed in the report.

Page 38: Data analysis and Presentation

Research Fundamentals and Terminology

1-38

Preparation of the rough draft

This follows the logical analysis of the subject and the preparation of the final outline. Such a step is of utmost importance for the researcher now sits to write down what he has done in the context of his research study.

He will write down the procedure adopted by him in collecting the material for his study along with various limitations faced by him, the technique of analysis adopted by him, the broad findings and generalizations and the various suggestions he wants to offer regarding the problem concerned.

Page 39: Data analysis and Presentation

Research Fundamentals and Terminology

1-39

Rewriting and polishing of the rough draft: This step happens to be most difficult part of all formal writing. Usually this step requires more time than the writing of the rough draft. The careful revision makes the difference between a mediocre and a good piece of writing. While rewriting and polishing, one should check the report for weaknesses in logical development or presentation.

The researcher should also “see whether or not the material, as it is presented, has unity and cohesion; does the report stand upright and firm and exhibit a definite pattern, like a marble arch? Or does it resemble an old wall of moldering cement and loose brick.”

In addition the researcher should give due attention to the fact that in his rough draft he has been consistent or not. He should check the mechanics of writing—grammar, spelling and usage.

Page 40: Data analysis and Presentation

Research Fundamentals and Terminology

1-40

Preparation of the final bibliography: Next in order comes the task of the preparation of the final bibliography. The bibliography, which is generally appended to the research report, is a list of books in some way pertinent to the research which has been done. It should contain all those works which the researcher has consulted. The bibliography should be arranged alphabetically and may be divided into two parts; the first part may contain the names of books and pamphlets, and the second part may contain the names of magazine and newspaper articles. Generally, this pattern of bibliography is considered convenient and satisfactory from the point of view of reader, though it is not the only way of presenting bibliography.The entries in bibliography should be made adopting the following order:

Page 41: Data analysis and Presentation

Research Fundamentals and Terminology

1-41

For books and pamphlets the order may be as under:1. Name of author, last name first.2. Title, underlined to indicate italics.3. Place, publisher, and date of publication.4. Number of volumes.

Page 42: Data analysis and Presentation

Research Fundamentals and Terminology

1-42

Writing the final draft: This constitutes the last step. The final draft should be written in a concise and objective style and in simple language, avoiding vague expressions such as “it seems”, “there may be”, and the like ones. While writing the final draft, the researcher must avoid abstract terminology and technical jargon. Illustrations and examples based on common experiences must be incorporated in the final draft as they happen to be most effective in communicating the research findings toothers.

A research report should not be dull, but must enthuse people and maintain interest and must show originality. It must be remembered that every report should be an attempt to solve some intellectual problem and must contribute to the solution of a problem and must add to the knowledge of both the researcher and the reader.

Page 43: Data analysis and Presentation

Research Fundamentals and Terminology

1-43

LAYOUT OF THE RESEARCH REPORT

Anybody, who is reading the research report, must necessarily be conveyed enough about the study so that he can place it in its general scientific context, judge the adequacy of its methods and thus form an opinion of how seriously the findings are to be taken. For this purpose there is the need of proper layout of the report. The layout of the report means as to what the research report should contain. A comprehensive layout of the research report should comprise (A) preliminary pages; (B) the main text; and (C) the end matter. Let us deal with them separately.

Page 44: Data analysis and Presentation

Research Fundamentals and Terminology

1-44

(A)Preliminary Pages

In its preliminary pages the report should carry a title and date, followed by acknowledgements in the form of ‘Preface’ or ‘Foreword’. Then there should be a table of contents followed by list of tables and illustrations so that the decision-maker or anybody interested in reading the report can easily locate the required information in the report.

Page 45: Data analysis and Presentation

Research Fundamentals and Terminology

1-45

(B) Main Text

The main text provides the complete outline of the research report along with all details. Title of the research study is repeated at the top of the first page of the main text and then follows the other details on pages numbered consecutively, beginning with the second page. Each main section of the report should begin on a new page. The main text of the report should have the following sections:

(i) Introduction; (ii) Statement of findings and recommendations; (iii) The results; (iv) The implications drawn from the results; and (v) The summary.

Page 46: Data analysis and Presentation

Research Fundamentals and Terminology

1-46

(i) Introduction: The purpose of introduction is to introduce the research project to the readers. It should contain a clear statement of the objectives of research i.e., enough background should be given to make clear to the reader why the problem was considered worth investigating.

A brief summary of other relevant research may also

be stated so that the present study can be seen in that context. The hypotheses of study, if any, and the definitions of the major concepts employed in the study should be explicitly stated in the introduction of the report.

Page 47: Data analysis and Presentation

Research Fundamentals and Terminology

1-47

(ii) Statement of findings and recommendations:

After introduction, the research report must contain a statement of findings and recommendations in non-technical language so that it can be easily understood by all concerned. If the findings happen to be extensive, at this point they should be put in the summarised form.

Page 48: Data analysis and Presentation

Research Fundamentals and Terminology

1-48

(iii) Results: A detailed presentation of the findings of the study, with supporting data in the form of tables and charts together with a validation of results, is the next step in writing the main text of the report. This generally comprises the main body of the report, extending over several chapters. The result section of the report should contain statistical summaries and reductions of the data rather than the raw data. All the results should be presented in logical sequence and splitted into readily identifiable sections. All relevant results must find a place in the report. But how one is to decide about what is relevant is the basic question.

Page 49: Data analysis and Presentation

Research Fundamentals and Terminology

1-49

(iv) Implications of the results: Toward the end of the main text, the researcher should again put down the results of his research clearly and precisely. He should, state the implications that flow from the results of the study, for the general reader is interested in the implications for understanding the human behaviour. Such implications may have three aspects as stated below:

(a) A statement of the inferences drawn from the present study which may be expected to apply in similar circumstances.(b) The conditions of the present study which may limit the extent of legitimate generalizations of the inferences drawn from the study.(c) The relevant questions that still remain unanswered or new questions raised by the study along with suggestions for the kind of research that would provide answers for them.

Page 50: Data analysis and Presentation

Research Fundamentals and Terminology

1-50

It is considered a good practice to finish the report with a short conclusion which summarises and recapitulates the main points of the study. The conclusion drawn from the study should be clearly related to the hypotheses that were stated in the introductory section. At the same time, a forecast of the probable future of the subject and an indication of the kind of research which needs to be done in that particular field is useful and desirable.

(v) Summary: It has become customary to conclude the research report with a very brief summary, resting in brief the research problem, the methodology, the major findings and the major conclusions drawn from the research results.

Page 51: Data analysis and Presentation

Research Fundamentals and Terminology

1-51

(C) End MatterAt the end of the report, appendices should be enlisted in respect of all technical data such as questionnaires, sample information, mathematical derivations and the like ones. Bibliography of sources consulted should also be given. Index (an alphabetical listing of names, places and topics along with the numbers of the pages in a book or report on which they are mentioned or discussed) should invariably be given at the end of the report.

The value of index lies in the fact that it works as a guide to the reader for the contents in the report.

Page 52: Data analysis and Presentation

Research Fundamentals and Terminology

1-52

Types of ReportsResearch reports vary greatly in length and type. In each individual case, both the length and the form are largely dictated by the problems at hand. For instance, business firms prefer reports in the letter form, just one or two pages in length. Banks, insurance organisations and financial institutions are generally fond of the short balance-sheet type of tabulation for their annual reports to their customers and shareholders. Mathematicians prefer to write the results of their investigations in the form of algebraic notations. Chemists report their results in symbols and formulae.

Page 53: Data analysis and Presentation

Research Fundamentals and Terminology

1-53

A technical report is used whenever a full written report of the study is required whether for recordkeeping or for public dissemination.

A popular report is used if the research results have policy implications. We give below a few details about the said two types of reports

Page 54: Data analysis and Presentation

Research Fundamentals and Terminology

1-54

In the technical report the main emphasis is on (i) the methods employed, (ii) assumptions made in the course of the study, (iii) the detailed presentation of the findings including their limitations

and supporting data.A general outline of a technical report can be as follows:1. Summary of results: A brief review of the main findings just in

two or three pages.

2. Nature of the study: Description of the general objectives of study, formulation of the problem in operational terms, the working hypothesis, the type of analysis and data required, etc.

3. Methods employed: Specific methods used in the study and their limitations. For instance, in sampling studies we should give details of sample design viz., sample size, sample selection, etc.

1. Technical Reports

Page 55: Data analysis and Presentation

Research Fundamentals and Terminology

1-55

4. Data: Discussion of data collected, their sources, characteristics and limitations. If secondary data are used, their suitability to the problem at hand be fully assessed. In case of a survey, the manner in which data were collected should be fully described.

5. Analysis of data and presentation of findings: The analysis of data and presentation of the findings of the study with supporting data in the form of tables and charts be fully narrated. This, in fact, happens to be the main body of the report usually extending over several chapters.

6. Conclusions: A detailed summary of the findings and the policy implications drawn from the results be explained.

Page 56: Data analysis and Presentation

Research Fundamentals and Terminology

1-56

7. Bibliography: Bibliography of various sources consulted be prepared and attached.

8. Technical appendices: Appendices be given for all technical matters relating to questionnaire, mathematical derivations, elaboration on particular technique of analysis and the like ones.

9. Index: Index must be prepared and be given invariably in the report at the end.

Page 57: Data analysis and Presentation

Research Fundamentals and Terminology

1-57

The popular report is one which gives emphasis on simplicity and attractiveness. The simplification should be sought through clear writing, minimization of technical, particularly mathematical, details and liberal use of charts and diagrams. Attractive layout along with large print, many subheadings, even an occasional cartoon now and then is another characteristic feature of the popular report.

Besides, in such a report emphasis is given on practical aspects and policy implications.

We give below a general outline of a popular report.

2. Popular Report

Page 58: Data analysis and Presentation

Research Fundamentals and Terminology

1-58

1.The findings and their implications: Emphasis in the report is given on the findings of most practical interest and on the implications of these findings.

2. Recommendations for action: Recommendations for action on the basis of the findings of the study is made in this section of the report.

3. Objective of the study: A general review of how the problem arise is presented along with the specific objectives of the project under study.

4. Methods employed: A brief and non-technical description of the methods and techniques used, including a short review of the data on which the study is based, is given in this part of the report.

Page 59: Data analysis and Presentation

Research Fundamentals and Terminology

1-59

5. Results: This section constitutes the main body of the report wherein the results of the study are presented in clear and non-technical terms with liberal use of all sorts of illustrations such as charts, diagrams and the like ones.

6. Technical appendices: More detailed information on methods used, forms, etc. is presented in the form of appendices. But the appendices are often not detailed if the report is entirely meant for general public.

Page 60: Data analysis and Presentation

Research Fundamentals and Terminology

1-60

All statistical techniques which simultaneously analyse more than two variables on a sample of observations can be categorized as multivariate techniques. We may as well use the term ‘multivariate analysis’ which is a collection of methods for analyzing data in which a number of observations are available for each object

Multi variate Analysis Technique

Page 61: Data analysis and Presentation

Research Fundamentals and Terminology

1-61

VARIABLES IN MULTIVARIATE ANALYSIS(i) Explanatory variable and criterion variable: If X may be

considered to be the cause of Y, then X is described as explanatory variable (also termed as causal or independent variable) and Y is described as criterion variable (also termed as resultant or dependent variable).

In some cases both explanatory variable and criterion variable may consist of a set of many variables in which case set (X1, X2, X3, …., Xp) may be called a set of explanatory variables and the set (Y1, Y2, Y3, …., Yq) may be called a set of criterion variables if the variation of the former may be supposed to cause the variation of the latter as a whole. In economics, the explanatory variables are called external or exogenous variables and the criterion variables are called endogenous variables. Some people use the term external criterion for explanatory variable and the term internal criterion for criterion variable.

Page 62: Data analysis and Presentation

Research Fundamentals and Terminology

1-62

ii) Observable variables and latent variables: Explanatory variables described above are supposed to be observable directly in some situations, and if this is so, the same are termed as observable variables. However, there are some unobservable variables which may influence the criterion variables.We call such unobservable variables as latent variables.

(iii) Discrete variable and continuous variable: Discrete variable is that variable which when measured may take only the integer value whereas continuous variable is one which, when measured, can assume any real value (even in decimal points).

(iv) Dummy variable (or Pseudo variable): This term is being used in a technical sense and is useful in algebraic manipulations in context of multivariate analysis. We call Xi ( i = 1, …., m) a dummy variable, if only one of Xi is 1 and the others are all zero

Page 63: Data analysis and Presentation

Research Fundamentals and Terminology

1-63

IMPORTANT MULTIVARIATE TECHNIQUES

(i) Multiple regression

(ii) Multiple discriminant analysis

(iii) Multivariate analysis of variance

(iv) Canonical correlation analysis

(v) Factor Analysis

Page 64: Data analysis and Presentation

Research Fundamentals and Terminology

1-64

(i) Multiple regression: In multiple regression we form a linear composite of explanatory variables in such way that it has maximum correlation with a criterion variable. This technique is appropriate when the researcher has a single, metric criterion variable. Which is supposed to be a function of other explanatory variables.

The main objective in using this technique is to predict the variability the dependent variable based on its covariance with all the independent variables. One can predict the level of the dependent phenomenon through multiple regression analysis model, given the levels of independent variables. Given a dependent variable, the linear-multiple regression problem is to estimate constants B1, B2, ... Bk and A such that the expression Y = B1X1 + B2X2 + ... + BkXk + A pare provides a good estimate of an individual’s Y score based on his X scores.

Page 65: Data analysis and Presentation

Research Fundamentals and Terminology

1-65

In practice, Y and the several X variables are converted to standard scores; zy, zl, z2, ... zk; each z has a mean of 0 and standard deviation of 1. Then the problem is to estimate constants, bi , such That z¢y = b1z1 + b2z2 + ...+ bk zkwhere z'y stands for the predicted value of the standardized Y score, zy. The expression on the right side of the above equation is the linear combination of explanatory variables. The constant A is eliminated in the process of converting X’s to z’s. The least-squares-method is used, to estimate the beta weights in such a way that the sum of the squared prediction errors is kept as small as possible

Page 66: Data analysis and Presentation

Research Fundamentals and Terminology

1-66

Sometimes the researcher may use step-wise regression techniques to have a better idea of the independent contribution of each explanatory variable. Under these techniques, the investigator adds the independent contribution of each explanatory variable into the prediction equation one by one, computing betas and R2 at each step.

Formal computerized techniques are available for the purpose and the same can be used in the context of a particular problem being studied by the researcher.

Page 67: Data analysis and Presentation

Research Fundamentals and Terminology

1-67

Multiple discriminant analysis

Through discriminant analysis technique, researcher may classify individuals or objects into one of two or more mutually exclusive and exhaustive groups on the basis of a set of independent variables. Discriminant analysis requires interval independent variables and a nominal dependent variable.

For example, suppose that brand preference (say brand x or y) is the dependent variable of interest and its relationship to an individual’s income, age, education, etc. Is being investigated, then we should use the technique of discriminant analysis.

Page 68: Data analysis and Presentation

Research Fundamentals and Terminology

1-68

Regression analysis in such a situation is not suitable because the dependent variable is, not intervally scaled.

Thus discriminant analysis is considered an appropriate technique when the single dependent variable happens to be non-metric and is to be classified into two or more groups, depending upon its relationship with several independent variables which all happen to be metric.

The objective in discriminant analysis happens to be to predict an object’s likelihood of belonging to a particular group based on several independent variables. In case we classify the dependent variable in more than two groups, then we use the name multiple discriminant analysis; but in case only two groups are to be formed, we simply use the term discriminant analysis.

Page 69: Data analysis and Presentation

Research Fundamentals and Terminology

1-69

(iii) Multivariate analysis of variance:

Multivariate analysis of variance is an extension of bivariate analysis of variance in which the ratio of among-groups variance to within-groups variance is calculated on a set of variables instead of a single variable.

This technique is considered appropriate when several metric dependent variables are involved in a research study along with many non-metric explanatory variables.

Page 70: Data analysis and Presentation

Research Fundamentals and Terminology

1-70

In other words, multivariate analysis of variance is specially applied whenever the researcher wants to test hypotheses concerning multivariate differences in group responses to experimental manipulations.

For instance, the market researcher may be interested in using one test market and one control market to examine the effect of an advertising campaign on sales as well as awareness, knowledge and attitudes. In that case he should use the technique of multivariate analysis of variance for meeting his objective.

Page 71: Data analysis and Presentation

Research Fundamentals and Terminology

1-71

(iv) Canonical correlation analysis:

This technique was first developed by Hotelling wherein an effort is made to simultaneously predict a set of criterion variables from their joint co-variance with a set of explanatory variables. Both metric and non-metric data can be used in the context of this multivariate technique.

The procedure followed is to obtain a set of weights for the dependent and independent variables in such a way that linear composite of the criterion variables has a maximum correlation with the linear composite of the explanatory variables.

Page 72: Data analysis and Presentation

Research Fundamentals and Terminology

1-72

For example, if we want to relate grade school adjustment to health and physical maturity of the child, we can then use canonical correlation analysis, provided we have for each child a number of adjustment scores (such as tests, teacher’s ratings, parent’s ratings and so on) and also we have for each child a number of health and physical maturity scores (such as heart rate, height, weight, index of intensity of illness and so on).

The main objective of canonical correlation analysis is to discover factors separately in the two sets of variables such that the multiple correlation between sets of factors will be the maximum possible

Page 73: Data analysis and Presentation

Research Fundamentals and Terminology

1-73

(v) Factor analysis:

Factor analysis is by far the most often used multivariate technique of research studies, specially pertaining to social and behavioural sciences.

It is a technique applicable when there is a systematic interdependence among a set of observed or manifest variables and the researcher is interested in finding out something more fundamental or latent which creates this commonality.

For instance, we might have data, say, about an individual’s income, education, occupation and dwelling area and want to infer from these some factor (such as social class) which summarises the commonality of all the said four variables

Page 74: Data analysis and Presentation

Research Fundamentals and Terminology

1-74

The technique used for such purpose is generally described as factor analysis. Factor analysis, thus, seeks to resolve a large set of measured variables in terms of relatively few categories, known as factors.

This technique allows the researcher to group variables into factors (based on correlation between variables) and the factors so derived may be treated as new variables (often termed as latent variables) and their value derived by summing the values of the original variables which have been grouped into the factor.

The meaning and name of such new variable is subjectively determined by the researcher. Since the factors happen to be linear combinations of data, the coordinates of each observation or variable is measured to obtain what are called factor loadings.

Such factor loadings represent the correlation between the particular variable and the factor, and are usually place in a matrix of correlations between the variable and the factors.

Page 75: Data analysis and Presentation

Research Fundamentals and Terminology

1-75

A Classification of Multivariate Techniques

Page 76: Data analysis and Presentation

Research Fundamentals and Terminology

1-76

More Than One Dependent

Variable* Multivariate

Analysis of Variance and Covariance

* Canonical Correlation

* Multiple Discriminant Analysis

* Cross- Tabulation

* Analysis of Variance and Covariance

* Multiple Regression

* Conjoint Analysis

* Factor Analysis

One Dependent Variable

Variable Interdependenc

e

Interobject Similarity

*Cluster Analysis*Multidimensiona

l Scaling

Dependence Technique

Interdependence Technique

Multivariate Techniques

Page 77: Data analysis and Presentation

Research Fundamentals and Terminology

1-77

• When a single variable is analyzed alone is called Univariate analysis.

• Univariate data is used for the simplest form of analysis. It is the type of data in which analysis are made only based on one variable

Eg: sample statistic such as “mean” which might refers to the average consumption of a particular kind of food or the age of certain group of students or people, it is known as univariate analysis

Univariate Analysis

Page 78: Data analysis and Presentation

Research Fundamentals and Terminology

1-78

A Classification of Univariate Techniques

Page 79: Data analysis and Presentation

Research Fundamentals and Terminology

1-79

Independent

Related

Independent

Related* Two- Group test

* Z test * One-Way

ANOVA

* Paired t test

* Chi-Square* Mann-Whitney* Median* K-S* K-W ANOVA

* Sign* Wilcoxon* McNemar* Chi-Square

Metric Data Non-numeric Data

Univariate Techniques

One Sample Two or More Samples

One Sample Two or More Samples

* t test* Z test

* Frequency* Chi-Square* K-S* Runs* Binomial

Page 80: Data analysis and Presentation

Research Fundamentals and Terminology

1-80

• Bivariate data is used for little complex analysis than as compared with univariate data.

• Bivariate data is the data in which analysis are based on two variables per observation simultaneously.

Eg: cross-classification of age group & cross consumption of two products.

Bivariate Analysis

Page 81: Data analysis and Presentation

Research Fundamentals and Terminology

1-81

HYPOTHESIS TESTING

Page 82: Data analysis and Presentation

Research Fundamentals and Terminology

1-82

Steps involved in hypothesis testing

Page 83: Data analysis and Presentation

Research Fundamentals and Terminology

1-83 Draw Marketing Research Conclusion

Formulate H0 and H1

Select Appropriate TestChoose Level of Significance

Determine Probability Associated with Test

Statistic

Determine Critical Value of Test

Statistic TSCR

Determine if TSCR falls into (Non)

Rejection RegionCompare with Level of

Significance,

Reject or Do not Reject H0

Collect Data and Calculate Test Statistic

Page 84: Data analysis and Presentation

Research Fundamentals and Terminology

1-84

A Broad Classification of Hypothesis Tests

Page 85: Data analysis and Presentation

Research Fundamentals and Terminology

1-85

Median/ RankingsDistribution

sMeans Proportions

Tests of Association

Tests of Differences

Hypothesis Tests

Page 86: Data analysis and Presentation

Research Fundamentals and Terminology

1-86

A Classification of Hypothesis Testing Procedures for Examining Differences

Page 87: Data analysis and Presentation

Research Fundamentals and Terminology

1-87

Hypothesis Testing Related to Differences• Parametric tests assume that the variables of interest are

measured on at least an interval scale. • Nonparametric tests assume that the variables are

measured on a nominal or ordinal scale. • These tests can be further classified based on whether one

or two or more samples are involved. • The samples are independent if they are drawn randomly

from different populations. For the purpose of analysis, data pertaining to different groups of respondents, e.g., males and females, are generally treated as independent samples.

• The samples are paired when the data for the two samples relate to the same group of respondents.

Page 88: Data analysis and Presentation

Research Fundamentals and Terminology

1-88

Independent Samples

Paired Samples Independent

SamplesPaired

Samples* Two-Group t test

* Z test * Paired t test * Chi-Square

* Mann-Whitney* Median* K-S

* Sign* Wilcoxon* McNemar* Chi-Square

Hypothesis Tests

One Sample Two or More Samples

One Sample Two or More Samples

* t test* Z test

* Chi-Square * K-S * Runs* Binomial

Parametric Tests (Metric Tests)

Non-parametric Tests (Nonmetric Tests)

Page 89: Data analysis and Presentation

Research Fundamentals and Terminology

1-89

Type I and Type II errorsBecause the hypothesis testing process uses sample statistics calculated from random data to reach conclusion about population parameters, it is possible to make an incorrect decision about the null hypothesis, in particular, two types of errors can be made in testing hypotheses:

• Type I and • Type II

Page 90: Data analysis and Presentation

Research Fundamentals and Terminology

1-90

Type IIt is committed by rejecting a true null hypothesis. With a type I error, the null hypothesis is true, but the business researcher decides that it is not.

For example, suppose the flour packaging process actually is “in control” and is averaging 40 ounces of flour per package.

Suppose also that a business researcher randomly selects 100 packages, weighs the contents of each, and computes a sample mean.

It is possible, by chance, to randomly select 100 of the more extreme packages (mostly heavy weighted or mostly light weighted) resulting in a mean that falls in the rejection area.

The decision is to reject the null hypothesis even though the population mean is actually 40 ounces. In this case, the business researcher has committed a Type I error.

Page 91: Data analysis and Presentation

Research Fundamentals and Terminology

1-91

If a manager fires an employee because some evidence indicates that she is stealing from the company and if she really is not stealing from the company, then the manager has committed Type I error.

Non rejection areaCritical

Value

Rejection Area

40 ounces

Page 92: Data analysis and Presentation

Research Fundamentals and Terminology

1-92

The preceded figure shows, the rejection regions represents the possibility of committing a Type I error.Mean that fall beyond the critical values will be considered so extreme that the business researcher chooses to reject the null hypothesis.However, if the null hypothesis is true, any mean that falls in a rejection region will result in a decision that produces a Type I error.An analogous courtroom example of a Type I error is when an innocent person is sent to jail.

Page 93: Data analysis and Presentation

Research Fundamentals and Terminology

1-93

The probability of committing a Type I error is called alpha (α)or level of significance.

Alpha equals the area under that is in the rejection region beyond the critical value(s).

The value of alpha is always set before the experiment or study is undertaken, common values of alpha are 0.05, 0.01, 0.10,and 0.001.

Page 94: Data analysis and Presentation

Research Fundamentals and Terminology

1-94

Type IIA Type II error is committed when a business researcher fails to reject a false null hypothesis. In this case, the null hypothesis is false, but a decision is made to not reject it.

Suppose in the case of flour problem that the packaging process is actually producing a population mean of 41 ounces even though the null hypothesis is 40 ounces. A sample of 100 packages yields a sample mean 40.2 ounces, which falls in the no rejection region.The business decision maker decides not to reject the null hypothesis. A type II error has been committed.The packaging procedure is out of control and hypothesis testing process does not identify it.

Page 95: Data analysis and Presentation

Research Fundamentals and Terminology

1-95

Suppose in the business world an employee is stealing from the company. A manager sees some evidence that the stealing is occurring but lacks enough evidence to conclude that the employee is stealing from the company.The manager decides not to fire the employee based on theft.The manager has committed Type II error.

The probability of committing a Type II error is beta (β). Unlike alpha, beta is not usually stated at the beginning of the hypothesis testing procedure. Because beta occurs only when the null hypothesis is not true, the computation of beta varies with the many possible alternative parameters might occur.