data collection in research
1. Overview of Statistics & Collection of Data
1.1 Introduction to Statistics – definition, types, basic terms, levels of data measurement.
1.2 Methods of Collection of Data – Census & Sampling Methods
Definition of Statistics (Shaya'a Othman)
“Statistics is a scientific method of collecting, organizing, presenting, analyzing and interpreting of numerical information, developed from mathematical theory of probability, to assist in making effective and efficient decision.”
OVERVIEW OF STATISTICS
DEFINITION
Collecting and publishing numerical data.
A scientific method of collecting, organizing, presenting, analyzing and interpreting numerical information, developed from the mathematical theory of probability, to assist in making effective and efficient decisions.
DESCRIPTIVE STATISTICS: Methods of organizing and presenting data in an informative way.
INFERENTIAL STATISTICS: Methods of determining something about a population on the basis of a sample.
DATA
Qualitative or attribute (type of car owned)
Quantitative or numerical
  Discrete (number of children)
  Continuous (time taken for an exam)
Levels of Measurement
Nominal
Ordinal
Interval
Ratio
[Diagram labels: Data, Types, Variables, Levels, Inferential, Descriptive, Science, Common]
ETHICS
Misleading Data
Use of Average
Use of Graphic
Use of Association
COMPUTER APPLICATIONS IN STATISTICS
Microsoft Excel
SPSS
NVivo (CAQDAS)
COLLECTING DATA

SOURCES
Primary Data
Secondary Data
Government Publications (Malaysian Government): Statistics Dept., PM Dept.; Econ. Planning Unit, PM Dept.; Research Institutions (RRI, PORIM, MARDI)
International Organizations: United Nations, OIC, ASEAN, World Bank, Islamic Dev. Bank
Private Publications/Data: Private Survey/Research Co.
Internet, Websites (CIA Data)

TECHNIQUES
Census [Total Count]: total count of the population
Sample [Selected Count]: selected count of the population
Sampling techniques: systematic sampling, stratified sampling, multi-stage sampling, cluster sampling, quota sampling

METHODS
Interviews and forms (direct or by phone)
Mailed questionnaires
Computer (e-mail, e-fax, etc.)
Mobile phone (SMS)
Internet
RESEARCH METHODOLOGY
WHAT IS A HYPOTHESIS?
5-STEP PROCEDURE FOR TESTING A HYPOTHESIS
STEP 1: State the null and alternative hypotheses.
Null hypothesis: $H_0: \mu = 0$. Alternative hypothesis: $H_1: \mu \neq 0$.
Note: 1. Use a two-tailed test if the alternative hypothesis does not state a direction [greater or less]. 2. Use a one-tailed test if the alternative hypothesis states a direction.
STEP 2: Select the level of significance.
1. .01 level [1% level]: for consumer research. 2. .05 level [5% level]: for quality assurance. 3. .10 level [10% level]: for political polling.
STEP 3: Identify the test statistic.
z and t as test statistics, and others such as F and $\chi^2$ (chi-square; a non-parametric test).
STEP 4: Formulate the decision rule.
Find the critical value of z from the normal distribution table, or the value of t from the t distribution table, where appropriate.
STEP 5: Take a sample and arrive at a decision.
Only ONE DECISION is possible in hypothesis testing: either do not reject the null hypothesis, or reject the null hypothesis and accept the alternative hypothesis.
Two types of error are possible [Type I and Type II].

One-Sample Tests of Hypothesis
Large sample [n more than 30]: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, using the normal distribution table.
Small sample [n less than 30]: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, with $df = n - 1$, using the t distribution table.

Two-Sample Tests of Hypothesis
Large sample [n more than 30]: $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
Small sample [n less than 30]: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, with $df = n_1 + n_2 - 2$, using the t distribution table.

Two-Tail Test [no direction]
One-Tail Test [with direction: greater or less than]
[Figure: one-tailed test on the normal curve. Critical value 1.65; "do not reject" region with probability .95; region of rejection with probability .05.]
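The one-sample z statistic and the one-tailed decision rule can be sketched in Python as follows. This is only an illustration: the function names and the sample figures (a sample mean of 52 against a hypothesized mean of 50) are hypothetical and do not come from the lecture, and the sketch assumes an upper-tail test with a known population standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """Test statistic z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def reject_null_upper_tail(z, alpha=0.05):
    """Upper-tail decision rule: reject H0 when z exceeds the critical value."""
    critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 when alpha = .05
    return z > critical


# Hypothetical numbers, chosen only to illustrate the five steps.
z = one_sample_z(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
print(f"z = {z:.2f}, reject H0: {reject_null_upper_tail(z)}")
```

The critical value is taken from the standard normal distribution rather than from a printed table, which is why it comes out as roughly 1.645 instead of the rounded 1.65 shown in the figure.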
STATISTICAL TEST OF HYPOTHESIS
Hypothesis: “A supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation.” (Oxford Dictionary)
Hypothesis: “A statement or conjecture which is neither true nor false, subjected to be verified.” (Shayaa Othman, KUIS)
Hypothesis: “A statement about a population parameter developed for the purpose of testing.” (Douglas A. Lind, Statistical Techniques in Business and Economics)
Hypothesis Testing: “A procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement.” (Douglas A. Lind, Statistical Techniques in Business and Economics)
Null Hypothesis: “A statement about the value of a population parameter.” (Douglas A. Lind, Statistical Techniques in Business and Economics)
Alternative Hypothesis: “A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false.” (Douglas A. Lind, Statistical Techniques in Business and Economics)
Describing Data – Measures of Location
Population Mean = (Sum of all the values in the population) / (Number of values in the population)
Sample Mean = (Sum of values in the sample) / (Number of values in the sample) = $\frac{\sum x}{n}$
Weighted Mean = $\frac{\sum wx}{\sum w}$
Parameter = A characteristic of a population
Median = The midpoint of the values after they have been ordered from the smallest to the largest
Mode = The value of the observation that appears most frequently
Describing Data – Measures of Dispersion
Range = Largest value – Smallest value
Mean Deviation = The arithmetic mean of the absolute values of the deviations from the arithmetic mean
$MD = \frac{\sum |X - \bar{X}|}{n}$
where $\sum$ is sigma [sum of]; $X$ is the value of each observation; $\bar{X}$ is the arithmetic mean of the values; $n$ is the number of observations; $| \cdot |$ indicates the absolute value.
Variance = The arithmetic mean of the squared deviations from the mean
Standard Deviation = The square root of the variance
Location of Percentiles: $L_p = (n + 1)\frac{P}{100}$
Characteristics of the Mean
The Arithmetic Mean is the most widely used measure of location and shows the central value of the data.
It is calculated by summing the values and dividing by the number of values.
The major characteristics of the mean are:
It requires the interval scale.
All values are used.
It is unique.
The sum of the deviations from the mean is 0.
Population Mean
For ungrouped data, the Population Mean is the sum of all the population values divided by the total number of population values:
$\mu = \frac{\sum X}{N}$
where $\mu$ is the population mean, $N$ is the total number of observations, $X$ is a particular value, and $\sum$ indicates the operation of adding.
Example 1
A Parameter is a measurable characteristic of a population.
AHMAD’s family owns four cars. The following is the current mileage on each of the four cars: 56,000; 23,000; 42,000; 73,000.
Find the mean mileage for the cars.
$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500$
Sample Mean
For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
$\bar{X} = \frac{\sum X}{n}$
where $n$ is the total number of values in the sample.
Example 2
A statistic is a measurable characteristic of a sample.
A sample of five executives received the following bonuses last year ($000): 14.0, 15.0, 17.0, 16.0, 15.0
$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$
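Examples 1 and 2 can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, and the bonus list assumes the five distinct readings 14.0, 15.0, 17.0, 16.0, 15.0, which is what the total of 77 in the worked calculation implies.

```python
from statistics import mean

# Example 1: mileage of all four cars (a population, so the result is a parameter).
mileage = [56_000, 23_000, 42_000, 73_000]
population_mean = mean(mileage)

# Example 2: bonuses of five sampled executives, in $000 (a statistic).
bonuses = [14.0, 15.0, 17.0, 16.0, 15.0]
sample_mean = mean(bonuses)

print(population_mean, sample_mean)
```

The arithmetic is identical for a population and a sample; only the interpretation (parameter versus statistic) differs.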
Example 4
During a one-hour period on a hot Saturday afternoon in Langkawi, Ahmad sold fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.
$\bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89$
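The weighted mean in Example 4 can be reproduced with a few lines of Python. The helper name `weighted_mean` is illustrative, and the sketch uses the $1.15 price that appears in the worked calculation, since that is the figure that yields the stated total of $44.50.

```python
def weighted_mean(values, weights):
    """Weighted mean = sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


prices = [0.50, 0.75, 0.90, 1.15]   # price per drink, in dollars
drinks_sold = [5, 15, 15, 15]       # number of drinks sold at each price

average_price = weighted_mean(prices, drinks_sold)
print(f"${average_price:.2f}")
```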
The Median
The Median is the midpoint of the values after they have been ordered from the smallest to the largest.
There are as many values above the median as below it in the data array.
For an even set of values, the median will be the arithmetic average of the two middle numbers and is found at the (n+1)/2 ranked observation.
The Median (continued)
The ages for a sample of seven INSANIAH students visiting the Islamic Artifact Exhibition: 21, 25, 19, 20, 22, 18, 27.
Arranging the data in ascending order gives: 18, 19, 20, 21, 22, 25, 27
Thus the median is 21.
Example 5
The heights of 3 INSANIAH lecturers, in inches, are: 76, 73, 80.
Arranging the data in ascending order gives: 73, 76, 80
The median is found at the (n+1)/2 = (3+1)/2 = 2nd data point.
Thus the median is 76.
The Mode: Example 6
The Mode is another measure of location and represents the value of the observation that appears most frequently.
Example 6: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.
Data can have more than one mode. If it has two modes, it is referred to as bimodal; three modes, trimodal; and the like.
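The median and mode examples can be verified with the standard `statistics` module. This is a small sketch; `multimode` (available from Python 3.8) is used here because it returns every mode, which also covers the bimodal case. The last list is a hypothetical data set added only to show two modes.

```python
from statistics import median, multimode

ages = [21, 25, 19, 20, 22, 18, 27]                  # student ages
heights = [76, 73, 80]                               # Example 5, inches
scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]    # Example 6

print(median(ages))        # middle value after sorting
print(median(heights))
print(multimode(scores))   # list of the most frequent value(s)

# Hypothetical bimodal data: two values tie for most frequent.
print(multimode([1, 1, 2, 2, 3]))
```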
The Relative Positions of the Mean, Median, and Mode
Symmetric distribution: A distribution having the same shape on either side of the center.
Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution. It can be positively or negatively skewed, or bimodal.
The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution
Zero skewness: Mean = Median = Mode
[Figure: symmetric distribution with the mode, median, and mean at the same point]
The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution
Positively skewed: Mean and median are to the right of the mode.
Mean > Median > Mode
[Figure: right-skewed distribution showing the mode, median, and mean from left to right]
The Relative Positions of the Mean, Median, and Mode: Left Skewed Distribution
Negatively skewed: Mean and median are to the left of the mode.
Mean < Median < Mode
[Figure: left-skewed distribution showing the mean, median, and mode from left to right]
Geometric Mean
The Geometric Mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula is:
$GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)}$
The geometric mean is used to average percents, indexes, and relatives.
Example 7
The interest rates on three bonds were 5, 21, and 4 percent.
The arithmetic mean is (5 + 21 + 4)/3 = 10.0.
The geometric mean is $GM = \sqrt[3]{(5)(21)(4)} = 7.49$
The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.
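Example 7 can be reproduced directly from the definition of the geometric mean. The helper below is a sketch with an illustrative name; it assumes all values are positive.

```python
from math import prod


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    return prod(values) ** (1 / len(values))


rates = [5, 21, 4]                      # bond interest rates, in percent
arithmetic = sum(rates) / len(rates)
geometric = geometric_mean(rates)

print(f"arithmetic mean = {arithmetic:.1f}, geometric mean = {geometric:.2f}")
```

The standard library also offers `statistics.geometric_mean`, which gives the same result; the explicit version is shown only because it mirrors the formula.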
Geometric Mean (continued)
Another use of the geometric mean is to determine the percent increase in sales, production or other business or economic series from one time period to another.
$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$
[Figure: Growth in Sales 1999-2004. Horizontal axis: Year (1999 to 2004); vertical axis: Sales in Millions ($), 0 to 50.]
Example 8
The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.
$GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = .0127$
That is, the geometric mean rate of increase is 1.27%.
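The growth-rate form of the geometric mean in Example 8 can be sketched as follows. The function name is illustrative; the number of periods is 8 because 1992 to 2000 spans eight yearly intervals.

```python
def average_growth_rate(begin_value, end_value, periods):
    """Geometric mean rate of increase: (end / begin) ** (1 / periods) - 1."""
    return (end_value / begin_value) ** (1 / periods) - 1


rate = average_growth_rate(755_000, 835_000, periods=2000 - 1992)
print(f"{rate:.2%}")
```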
Describing Data – Measures of Dispersion
Measures of Dispersion
Dispersion refers to the spread or variability in the data.
Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.
Range = Largest value – Smallest value
Example 9
The following represents the current year’s Return on Equity of the 25 companies in an investor’s portfolio.
-8.1    3.2    5.9    8.1   12.3
-5.1    4.1    6.3    9.2   13.3
-3.1    4.6    7.9    9.5   14.0
-1.4    4.8    7.9    9.7   15.0
 1.2    5.7    8.0   10.3   22.1
Highest value: 22.1. Lowest value: -8.1.
Range = Highest value – Lowest value = 22.1 - (-8.1) = 30.2
Mean Deviation
The arithmetic mean of the absolute values of the deviations from the arithmetic mean.
$MD = \frac{\sum |X - \bar{X}|}{n}$
The main features of the mean deviation are:
All values are used in the calculation.
It is not unduly influenced by large or small values.
The absolute values are difficult to manipulate.
Example 10
The weights of a sample of crates containing books for the INSANIAH Library (in pounds) are: 103, 97, 101, 106, 103.
Find the mean deviation.
$\bar{X} = 102$
The mean deviation is:
$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4$
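The mean deviation in Example 10 can be computed with a short helper. The function name `mean_deviation` is illustrative; the standard library has no built-in for this measure, so the sketch follows the formula directly.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    centre = sum(values) / len(values)
    return sum(abs(x - centre) for x in values) / len(values)


crate_weights = [103, 97, 101, 106, 103]   # pounds
md = mean_deviation(crate_weights)
print(md)
```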
Variance and Standard Deviation
Variance: The arithmetic mean of the squared deviations from the mean.
Standard deviation: The square root of the variance.
Population Variance
The major characteristics of the Population Variance are:
It is influenced by extreme values, because each deviation from the mean is squared.
The units are awkward: the square of the original units.
All values are used in the calculation.
Variance and Standard Deviation
Population Variance formula:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
$X$ is the value of an observation in the population.
$\mu$ is the arithmetic mean of the population.
$N$ is the number of observations in the population.
Population Standard Deviation formula:
$\sigma = \sqrt{\sigma^2}$
Example 9 continued
In Example 9, the variance and standard deviation are:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(-8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \dots + (22.1 - 6.62)^2}{25} = 42.227$
$\sigma = \sqrt{42.227} = 6.498$
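The range, population variance, and population standard deviation for the 25 Return on Equity values in Example 9 can be checked with the standard `statistics` module. This is a sketch in which the portfolio is treated as the whole population, so the divisor is N; the variable names are illustrative.

```python
from statistics import mean, pstdev, pvariance

roe = [
    -8.1, 3.2, 5.9, 8.1, 12.3,
    -5.1, 4.1, 6.3, 9.2, 13.3,
    -3.1, 4.6, 7.9, 9.5, 14.0,
    -1.4, 4.8, 7.9, 9.7, 15.0,
    1.2, 5.7, 8.0, 10.3, 22.1,
]

value_range = max(roe) - min(roe)
mu = mean(roe)
sigma_squared = pvariance(roe)   # divides by N (population formula)
sigma = pstdev(roe)

print(f"range = {value_range:.1f}, mean = {mu:.2f}")
print(f"variance = {sigma_squared:.3f}, standard deviation = {sigma:.3f}")
```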
Sample Variance and Standard Deviation
Sample variance ($s^2$):
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Sample standard deviation ($s$):
$s = \sqrt{s^2}$
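Example 11
The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6.
Find the sample variance and standard deviation.
$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$
$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$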
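Example 11 can be checked with `statistics.variance` and `statistics.stdev`, which use the sample formulas with divisor n - 1. A minimal sketch with illustrative variable names:

```python
from statistics import mean, stdev, variance

hourly_wages = [7, 5, 11, 8, 6]   # dollars

x_bar = mean(hourly_wages)
s_squared = variance(hourly_wages)   # divides by n - 1 (sample formula)
s = stdev(hourly_wages)

print(f"mean = {x_bar:.2f}, s^2 = {s_squared:.2f}, s = {s:.2f}")
```

Using `pvariance` here instead would divide by n and give 4.24, which is the population formula rather than the sample formula the example asks for.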
[Figures: Histogram and Frequency Polygon; Cumulative Frequency Polygon]
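Describing Data – Measures of Location [For Grouped Data]
MEAN, MEDIAN, MODE
The Mean of Grouped Data
The Mean of a sample of data organized in a frequency distribution is computed by the following formula:
$\bar{X} = \frac{\sum fX}{n}$
where $f$ is the class frequency, $X$ is the class midpoint, and $n$ is the total number of observations.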
Example 12
A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.
Movies showing    Frequency f    Class midpoint X    (f)(X)
1 up to 3         1              2                   2
3 up to 5         2              4                   8
5 up to 7         3              6                   18
7 up to 9         1              8                   8
9 up to 11        3              10                  30
Total             10                                 66
$\bar{X} = \frac{\sum fX}{n} = \frac{66}{10} = 6.6$
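The grouped mean in Example 12 can be computed from the frequency table as follows. The representation of each class as a `((lower, upper), frequency)` pair and the function name are assumptions made for this sketch.

```python
# Each entry: ((lower limit, upper limit), frequency)
movie_classes = [
    ((1, 3), 1),
    ((3, 5), 2),
    ((5, 7), 3),
    ((7, 9), 1),
    ((9, 11), 3),
]


def grouped_mean(classes):
    """Mean of grouped data: sum(f * midpoint) / n."""
    n = sum(f for _, f in classes)
    total = sum(f * (lower + upper) / 2 for (lower, upper), f in classes)
    return total / n


print(grouped_mean(movie_classes))
```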
The Median of Grouped Data
The Median of a sample of data organized in a frequency distribution is computed by:
$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i)$
where $L$ is the lower limit of the median class, $CF$ is the cumulative frequency preceding the median class, $f$ is the frequency of the median class, and $i$ is the median class interval.
Finding the Median Class
To determine the median class for grouped data:
1. Construct a cumulative frequency distribution.
2. Divide the total number of data values by 2.
3. Determine which class will contain this value. For example, if n = 50, then 50/2 = 25, so determine which class will contain the 25th value.
Example 12 continued
Movies showing    Frequency    Cumulative Frequency
1 up to 3         1            1
3 up to 5         2            3
5 up to 7         3            6
7 up to 9         1            7
9 up to 11        3            10
Example 12 continued
From the table, L = 5, n = 10, f = 3, i = 2, CF = 3.
$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) = 5 + \frac{\frac{10}{2} - 3}{3}(2) = 6.33$
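The median-class search and the grouped median formula can be combined in one short function. This is a sketch under stated assumptions: classes are given in ascending order as `((lower, upper), frequency)` pairs, and the median class is taken to be the first class whose cumulative frequency reaches n/2. The function name is illustrative.

```python
movie_classes = [
    ((1, 3), 1),
    ((3, 5), 2),
    ((5, 7), 3),
    ((7, 9), 1),
    ((9, 11), 3),
]


def grouped_median(classes):
    """Median = L + ((n/2 - CF) / f) * i, using the first class that reaches n/2."""
    n = sum(f for _, f in classes)
    if n == 0:
        raise ValueError("frequency table is empty")
    cumulative = 0  # CF: cumulative frequency before the current class
    for (lower, upper), f in classes:
        if f > 0 and cumulative + f >= n / 2:
            return lower + ((n / 2 - cumulative) / f) * (upper - lower)
        cumulative += f
    raise ValueError("median class not found")


print(round(grouped_median(movie_classes), 2))
```

For the movie data the loop stops at the class "5 up to 7", where the cumulative frequency before the class is 3 and the class frequency is 3, matching the values L = 5, CF = 3, f = 3, i = 2 read from the table.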
BUSINESS STATISTICS ; LECTURE NOTE [ShayaaOthman]