Important Statistical Terminologies in Research Methodology

Important terminologies

1. Statistics vs. Parameters: A statistic is a numerical measure computed from a sample, and a parameter is a numerical measure computed from a population. For this reason, the two terms are also referred to as sample statistics and population parameters.
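
As a minimal sketch of the distinction (in Python, with a made-up population of marks), the same measure, the mean, is a parameter when computed over the whole population and a statistic when computed over a random sample drawn from it:

```python
import random

# Hypothetical population: marks of all 1,000 students in a school (made-up data).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(30, 100) for _ in range(1000)]

# Parameter: a measure computed from the whole population.
population_mean = sum(population) / len(population)

# Statistic: the same measure computed from a random sample of 50 students.
sample = rng.sample(population, 50)
sample_mean = sum(sample) / len(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, and the statistic is used to estimate it.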

2. Frequency Distribution: The frequency (f) is the number of times a variable takes on a particular value, and any variable has a frequency distribution. For example, roll a pair of dice several times and record the resulting totals (which must lie between 2 and 12), count the number of times each value occurs (the frequency of that value), and take these counts together to form a frequency distribution. Frequencies are absolute when they give the actual count of occurrences, and relative when each absolute frequency is normalized by dividing it by the total number of observations, so that relative frequencies lie in [0, 1] and sum to 1. Relative frequencies are particularly useful for comparing distributions drawn from two different sources, even when the numbers of observations from the sources differ.
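
The dice example can be simulated in a few lines of Python. This is a sketch under stated assumptions: the rolls are pseudo-random (seeded so the run is repeatable), and `frequency_distribution` is a hypothetical helper name, not a library function:

```python
import random
from collections import Counter

def frequency_distribution(values):
    """Return {value: (absolute frequency, relative frequency)}, ordered by value."""
    counts = Counter(values)
    total = len(values)
    return {v: (counts[v], counts[v] / total) for v in sorted(counts)}

# Simulate rolling a pair of dice 1,000 times; each total lies between 2 and 12.
rng = random.Random(1)
rolls = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(1000)]

for value, (absolute, relative) in frequency_distribution(rolls).items():
    print(f"{value:>2}: f = {absolute:>3}, relative f = {relative:.3f}")
```

Because the relative frequencies always sum to 1, a second experiment with, say, 250 rolls could be compared with this one value by value.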

3. Mean, Median, Mode and Range: The mean is the numerical average of the data set. It is computed by adding all the values in the set and dividing the sum by the number of values. The median is the value in the middle of a data set: arrange the numbers in order from least to greatest, then find the number in the middle. What if the set contains an even number of values? In that case, take the two central numbers, add them and divide by 2 to obtain the median. For example, if a student’s scores in eight different subjects are 45, 67, 74, 82, 88, 91, 92 and 93, his/her median score is (82 + 88)/2 = 170/2 = 85. Note that the data must be sorted in ascending or descending order before the median is computed. The mode is the value that occurs most frequently in the data set. A data set can have one mode, more than one mode, or no mode at all (the student’s scores above have no mode, since no score repeats). The range is the difference between the highest and lowest values in a data set; for the student’s marks above, Range = 93 – 45 = 48. It shows the numerical width over which the data are spread.
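
A minimal Python sketch of these four measures, applied to the student’s scores from the example. The helper names are illustrative; Python’s standard `statistics` module also provides `mean`, `median` and `multimode`, though `multimode` returns every value when none repeats rather than reporting "no mode":

```python
from collections import Counter

def mean(data):
    return sum(data) / len(data)

def median(data):
    ordered = sorted(data)                        # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                # odd count: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two central values

def modes(data):
    """Return all most frequent values; an empty list means the data have no mode."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:                                  # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

def data_range(data):
    return max(data) - min(data)                  # highest minus lowest value

scores = [45, 67, 74, 82, 88, 91, 92, 93]         # the student's scores from the text
print(mean(scores))        # 79.0
print(median(scores))      # 85.0
print(modes(scores))       # [] (no mode)
print(data_range(scores))  # 48
```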

4. Variance and Standard Deviation: The variance is the average squared deviation from the mean of a set of data, and it is used to find the standard deviation. Process: 1. Find the mean of the data (the mean is the average, so add up the values and divide by the number of items). 2. Subtract the mean from each value; the result is called the deviation from the mean. 3. Square each deviation from the mean. 4. Find the sum of the squares. 5. Divide this sum by the number of items. The variance formula, $\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}$, uses Sigma notation, $\Sigma$, which represents the sum of all the items to the right of Sigma; here, the mean is represented by $\bar{x}$ and $n$ is the number of items. The standard deviation shows the variation in data. If the data are close together, the standard deviation will be small; if the data are spread out, it will be large. The standard deviation is often denoted by the lowercase Greek letter sigma ($\sigma$). Notice that the standard deviation formula is the square root of the variance: $\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}$. The greater the value of the standard deviation, the further the data tend to be dispersed from the mean. A z-score is the number of standard deviations an observation lies from the mean: $z_i = \frac{x_i - \bar{x}}{\sigma}$.
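
The five-step process translates directly into code. This sketch assumes the population variance (dividing by n, as in the text); the sample variance used to estimate a population parameter divides by n − 1 instead, which is what Python’s `statistics.variance` does, while `statistics.pvariance` matches the version below. The data set is made up for illustration:

```python
import math

def population_variance(data):
    n = len(data)
    mean = sum(data) / n                         # step 1: find the mean
    deviations = [x - mean for x in data]        # step 2: deviation from the mean
    squared = [d ** 2 for d in deviations]       # step 3: square each deviation
    total = sum(squared)                         # step 4: sum of the squares
    return total / n                             # step 5: divide by the number of items

def standard_deviation(data):
    return math.sqrt(population_variance(data))  # square root of the variance

def z_scores(data):
    mean = sum(data) / len(data)
    sd = standard_deviation(data)
    return [(x - mean) / sd for x in data]       # standard deviations away from the mean

data = [2, 4, 4, 4, 5, 5, 7, 9]                  # made-up data, mean = 5
print(population_variance(data))  # 4.0
print(standard_deviation(data))   # 2.0
print(z_scores(data))             # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```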

5. Skewness and Kurtosis: A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes analysis of its skewness and kurtosis. A measure of dispersion tells us about the variation in a data set, whereas skewness tells us about the direction of that variation. Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left (negative side) and right (positive side) of the center point. The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set.

There are further statistics that describe the shape of the distribution, using formulae similar to those of the mean and variance. They are usually described in terms of moments: 1st moment - Mean (describes central value); 2nd (central) moment - Variance (describes dispersion); standardized 3rd moment - Skewness (describes asymmetry); and standardized 4th moment - Kurtosis (describes peakedness).

Kurtosis measures how peaked the histogram is.

The kurtosis of a normal distribution is 0 when kurtosis is measured as excess kurtosis, i.e., with 3 subtracted. Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution. Platykurtic: when the kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is flatter and wider), so negative kurtosis indicates a relatively flat distribution. Leptokurtic: when the kurtosis > 0, there are high frequencies in only a small part of the curve (i.e., the curve is more peaked), so positive kurtosis indicates a relatively peaked distribution.

Kurtosis is based on the size of a distribution's tails. Negative kurtosis (platykurtic): distributions with short tails. Positive kurtosis (leptokurtic): distributions with relatively long tails.

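Kurtosis formula: $\text{kurtosis} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^4}{n s^4} - 3$, where $s$ is the standard deviation and $n$ is the number of items.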
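
The kurtosis formula, together with a skewness formula built the same way from the third moment, can be sketched as follows. The skewness formula used here, sum of (x_i − x̄)³ divided by (n s³), is an assumption (the common moment-based definition; the text does not state one), and s is computed with n in the denominator. The data sets are made up:

```python
def moments_summary(data):
    """Return (mean, variance, skewness, kurtosis) for a list of numbers.

    Every denominator uses n, and the kurtosis has 3 subtracted,
    so data from a normal distribution would give a kurtosis near 0.
    """
    n = len(data)
    mean = sum(data) / n                                   # 1st moment: central value
    variance = sum((x - mean) ** 2 for x in data) / n      # 2nd central moment: dispersion
    if variance == 0:
        raise ValueError("skewness and kurtosis are undefined when all values are equal")
    s = variance ** 0.5                                    # standard deviation
    # Standardized 3rd moment: its sign shows the direction of asymmetry.
    skewness = sum((x - mean) ** 3 for x in data) / (n * s ** 3)
    # Standardized 4th moment minus 3: < 0 platykurtic, > 0 leptokurtic.
    kurtosis = sum((x - mean) ** 4 for x in data) / (n * s ** 4) - 3
    return mean, variance, skewness, kurtosis

flat = [1, 2, 3, 4, 5]                  # made-up: symmetric and flat
right_skewed = [1, 1, 1, 2, 2, 3, 10]   # made-up: long right tail
for name, values in [("flat", flat), ("right_skewed", right_skewed)]:
    m, v, sk, k = moments_summary(values)
    print(f"{name}: mean={m:.2f}, variance={v:.2f}, skewness={sk:.3f}, kurtosis={k:.3f}")
```

For the flat data the skewness is 0 (symmetric) and the kurtosis is about −1.3 (platykurtic); the long right tail in the second list gives a positive skewness.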

6. Hypothesis: A hypothesis is a hunch, assumption, suspicion, assertion or idea about a phenomenon, relationship or situation, the truth of which one does not know. A researcher calls these assumptions, assertions, statements or hunches hypotheses, and they become the basis of an inquiry. In most cases, a hypothesis is based on previous studies or on the researcher’s own or someone else’s observations. A hypothesis is a conjectural statement of the relationship between two or more variables (Kerlinger, 1986). It is a proposition, condition or principle which is assumed, perhaps without belief, in order to draw its logical consequences and by this method to test its accord with facts which are known or may be determined (Webster’s New International Dictionary of English). According to Black and Dean (1976), a hypothesis is a tentative statement about something, the validity of which is usually unknown. Similarly, Baily (1978) defined it as a proposition that is stated in a testable form and that predicts a particular relationship between two or more variables. In other words, if we think that a relationship exists, we first state it as a hypothesis and then test the hypothesis in the field. In fact, a hypothesis may be defined as a tentative theory or supposition set up and adopted provisionally as a basis for explaining certain facts or relationships and as a guide in the further investigation of other facts or relationships.

A hypothesis has three characteristics: i. it is a tentative proposition, ii. its validity is unknown, and iii. it specifies a relationship between two or more variables.

Functions of a hypothesis: A hypothesis brings clarity to the research problem and provides a study with focus. It signifies which specific aspects of a research problem are to be investigated, and it helps delimit which data should and should not be collected. It enhances the objectivity of the study, and it is highly instrumental in formulating theory, enabling the researcher to conclude what is true and what is false.

Types of hypotheses: The three types of hypotheses are the working hypothesis, the null hypothesis and the alternate hypothesis. A working hypothesis is provisionally adopted to explain the relationship between some observed facts and to guide a researcher in the investigation of a problem. Such a statement constitutes a trial or working hypothesis, which is to be tested and confirmed, modified or even abandoned as the investigation proceeds.

A null hypothesis is formulated against the working hypothesis: it opposes, and is contrary to, the positive statement made in the working hypothesis. It is formulated so that it can be disproved; when a researcher rejects the null hypothesis, he/she obtains support for the working hypothesis (rather than strictly proving it). The null hypothesis is normally denoted by H0. Normally, only the null hypothesis is written in research papers.

An alternate hypothesis is formulated, with adequate reasons, only after a researcher rejects the null hypothesis. It is normally denoted by H1.

Examples of different hypotheses:

Working hypothesis: Population influences the number of bank branches in a town.

Null hypothesis (H0): Population may not have any significant influence on the number of bank branches in a town.

Alternate hypothesis (H1): Population might have a significant effect on the number of bank branches in a town.

7. Statistical Tests: Different statistical tests have to be performed for different types of data.

For continuous data: If comparing 2 groups (treatment/control), t-test. If comparing > 2 groups, ANOVA (F-test). If measuring association between 2 variables, Pearson r correlation. If trying to predict an outcome (crystal ball), regression or multiple regression.
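
For continuous data, the t statistic and Pearson’s r can be computed from their textbook formulas. This sketch assumes Student’s pooled-variance t-test (equal variances in both groups) and uses made-up scores; it stops at the test statistic, since turning it into a p-value needs the t distribution (for example via SciPy’s `scipy.stats.ttest_ind`). ANOVA and regression are not shown:

```python
import math
from statistics import mean, variance  # variance() divides by n - 1 (sample variance)

def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t statistic, assuming equal variances (pooled)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for a treatment group and a control group.
treatment = [78, 85, 82, 90, 88]
control = [72, 75, 80, 70, 78]
print("t =", round(pooled_t_statistic(treatment, control), 3))

# Hypothetical study hours and marks for five students.
hours = [1, 2, 3, 4, 5]
marks = [52, 60, 61, 70, 77]
print("r =", round(pearson_r(hours, marks), 3))
```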

For ordinal data: Likert-type scales produce ordinal data. If comparing 2 groups, Mann-Whitney U (treatment vs. control) or Wilcoxon signed-rank (matched pre vs. post). If comparing > 2 groups, Kruskal-Wallis (median test). If measuring association between 2 variables, Spearman rho (ρ).
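
For ordinal data such as Likert responses, Spearman’s rho is the Pearson correlation of the ranks, with tied values given the average of their rank positions. A minimal sketch with made-up responses; the helper names are hypothetical:

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical Likert responses (1 = strongly disagree ... 5 = strongly agree).
satisfaction = [1, 2, 2, 3, 4, 5, 5]
loyalty = [1, 1, 3, 3, 4, 4, 5]
print(round(spearman_rho(satisfaction, loyalty), 3))
```

Working with ranks means only the order of the responses matters, which is why rho suits ordinal scales where the distance between "agree" and "strongly agree" is not meaningful.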

For categorical data: These are called tests of frequency, i.e., of how often something is observed (also known as the Goodness of Fit Test or the Test of Homogeneity). Chi-Square (χ²). Examples of burning research questions: Do negative ads change how people vote? Is there a relationship between marital status and health insurance coverage? Do blonds have more fun?
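
For categorical data, the chi-square statistic compares each observed count with the count expected if the two variables were independent, expected = (row total × column total) / grand total. The sketch below uses an invented 2 × 2 table for the marital status and health insurance question; the counts are hypothetical:

```python
def chi_square_statistic(observed):
    """Pearson's chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

# Hypothetical counts: rows = married / not married, columns = insured / uninsured.
table = [[60, 15],
         [40, 35]]
chi2, dof = chi_square_statistic(table)
print(f"chi-square = {chi2:.3f}, df = {dof}")  # chi-square = 12.000, df = 1
```

With these made-up counts the statistic (12.0, df = 1) exceeds the 5% critical value of about 3.84 for one degree of freedom, so independence would be rejected for this table.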

8. Descriptive and Inferential Statistics: Descriptive statistics provide an overview of the attributes of a data set. These include frequency distributions (e.g., frequency histograms), measures of central tendency (mean, median and mode) and measures of dispersion (range, variance and standard deviation). Inferential statistics, such as significance tests, indicate how well your data support your hypothesis and whether your results can be generalized beyond what was tested.
