STATISTICS REVIEWER
Chapter One
Introduction
Meaning
Statistics (singular) - the science that deals with the principles and procedures for the collection, organization, summarization, presentation and analysis of numerical data.
Statistics (plural) - a set of data or a mass of observations
Fields of statistics
Mathematical statistics- development and exposition of theories
Applied statistics- application of statistical methods to solve real problems
Branches of Statistics
Descriptive statistics- methods of summarizing and presenting data (collection, extraction, summarization, presentation, measures of central tendency, measures of location and measures of variability)
Inferential statistics - the process of drawing conclusions and making decisions about the population based on evidence obtained from a sample (estimation and hypothesis testing)
Classification of Statistics
Parametric statistics - an approach which assumes a random sample from a normal distribution and involves testing of hypotheses about the population parameters
Appropriate for interval and ratio data
Requires a sample size large enough to appeal to normality
Nonparametric statistics/distribution-free methods - an approach for estimation and hypothesis testing when no underlying distribution is assumed
Can be used for nominal and ordinal scaled data
Can be used for interval and ratio scaled data where the distribution of the random variable of interest is unspecified
Good when the sample size is not large enough to assess the form of the distribution
Data- quantities or qualities measured or observed
Types of Data
1. Categorical data - typically analyzed with nonparametric statistics
Nominal scales - values or categories with no particular order
Ex: cause of death, gender
Lowest (least precise) scale
Can be stored in a computer using numbers
Nominal scale of measurement - uses numbers merely as a means of separating the properties into different categories
Ordinal scales - values or categories with a particular order
Ex: pain level, social status
Can be stored in a computer using numbers
Ordinal scale of measurement - refers to measurements where only the comparisons greater, less or equal between measurements are relevant
2. Continuous data - typically analyzed with parametric statistics
Interval scales - measured on a continuum, and differences between any two numbers on the scale are of known size; no true zero
Ex: temperature in degrees Celsius or Fahrenheit
Ratio scales - data with both equal intervals and an absolute zero
Ex: weight in pounds, height in centimeters, age in years, income, tons of garbage, number of arrests
Highest (most precise) scale
Ratio scale of measurement - used when not only the order and interval size are important, but the ratio between two measurements is also meaningful
Variable - a property that can take on different values or categories which cannot be predicted with certainty
Ex: undergraduate majors (BSBA, CRIMINOLOGY), smoking habit, attitude toward the head, height, faculty ranks
Types of Variables
1. Explanatory/Independent/X variables - may be continuous
2. Response/Dependent/Y variables - may be continuous or categorical
3. Control/Z variables
Classification of variables
a. Qualitative variable - a variable whose categories are used as labels to distinguish one group from another
Ex: cause of death, nationality, race, gender, severity of pain
b. Quantitative variable - a variable whose values can be measured and ordered according to quantity
Ex: number of children in a family, age
Quantitative variable classification:
i. Discrete variables - take a set of possible values that is either finite or countably infinite
There are gaps between the possible values
Ex: number of missing teeth, number of household members
ii. Continuous variables - have a set of possible values that includes all values in an interval of the real line
There are no gaps between the possible values
Ex: Body Mass Index, height
Sources of Data
a) Primary source
b) Secondary source
Presentation of Data
a) Textual presentation - uses statements
b) Tabular presentation - uses a statistical table
c) Graphical presentation - uses a graph
I. Bar graph
II. Circle graph
III. Line graph
Chapter Two
Sampling Techniques
Population [N] - the totality of the individuals or units under study
Target population - the entire set of individuals from which we require information
Sampled population - the finite set of individuals from which the sample is drawn
Coverage error - occurs when the sampled population and the target population are not identical
Sample - a representative portion of the population under study
Principle of Randomization
“Four basic reasons for the use of samples”
1. It allows us to obtain information with greater speed
2. It allows us to obtain information with reduced cost
3. It allows us to obtain information over a greater scope
4. It allows us to obtain information with greater accuracy
Probability sampling and Non-probability Sampling
Probability sampling-uses random selection
Methods of Probability Sampling
1. Simple random sampling (SRS) - choosing a sample from a set so that every member has an equal chance of being selected
Techniques in drawing an SRS
a) Table of random numbers
b) Lottery or fishbowl technique
May be done in two ways
a) With replacement
b) Without replacement
Advantages
Simple and more easily understood than other sampling designs
Disadvantages
A list of all the members of the population is needed.
Assigning numbers to each member of the sampled population is frequently impractical.
It may be difficult to collect the sample data with SRS if the samples are spread inconveniently throughout the population.
It is often less precise than other sampling plans because it disregards any information already known about the population.
When to use
When the population is homogeneous with respect to the characteristics under study.
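The two ways of drawing a simple random sample can be sketched in Python as follows. This is only an illustration under assumptions: the population of 100 numbered units, the sample size of 10, and the function names are hypothetical, and the standard library's random module stands in for the table of random numbers or the fishbowl.

```python
import random

def srs_without_replacement(population, n, seed=None):
    """Draw n units; a unit that has been drawn cannot be drawn again."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def srs_with_replacement(population, n, seed=None):
    """Draw n units; a drawn unit is 'returned' and may be drawn again."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)

# Hypothetical sampled population: units numbered 1 to N, as in the lottery technique.
population = list(range(1, 101))  # N = 100

without = srs_without_replacement(population, 10, seed=1)
with_rep = srs_with_replacement(population, 10, seed=1)
print("Without replacement:", without)
print("With replacement:   ", with_rep)
```

Note that both functions need the full numbered list of units, which is the first disadvantage listed above.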
2. Systematic random sampling - selection of the desired sample from a list by taking every k-th unit after a random start
Formula: k = N/n
Where
k = sampling interval
N = population size
n = sample size
Advantages:
Easier to apply and less likely to produce mistakes.
It is possible to select a sample in the field without a sampling frame.
It could give a more precise estimate than SRS when there is order in the population list.
Disadvantages:
If periodic regularities are found in the lists, a systematic sample may consist only of similar types
If the population is not in random order, one cannot validly estimate the variance of the mean from a single systematic sample
Could be less precise than SRS
When to use:
If the ordering of the population is essentially random
When stratification with numerous strata is used
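A minimal sketch of systematic sampling with the sampling interval k = N/n is given below. It assumes that N/n is rounded down to a whole number and that the starting point is chosen at random among the first k units; the function name and the list of 100 units are hypothetical.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select every k-th unit after a random start, with k = N // n."""
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and N")
    k = N // n                 # sampling interval (integer part of N/n)
    rng = random.Random(seed)
    start = rng.randrange(k)   # random start among the first k units
    return [units[start + i * k] for i in range(n)]

# Hypothetical list of N = 100 units; n = 10 gives k = 10.
units = list(range(1, 101))
sample = systematic_sample(units, 10, seed=0)
print(sample)
```

Because the selected units are always exactly k positions apart, any pattern in the list that repeats every k units would put only similar units in the sample, which is the periodicity problem noted under the disadvantages.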
3. Stratified random sampling - the population [N] is divided into a number of strata (non-overlapping subgroups), and a random sample is drawn from each stratum
Advantages:
Gain in the precision of the estimates of characteristics of the population
It allows for more comprehensive data analysis since information is provided for each stratum
Accommodates administrative convenience: fieldwork is organized by strata, which usually results in cost savings
Accommodates different sampling plans in different strata
Disadvantages:
A sampling frame is necessary for every stratum
Prior information about the population and its subpopulations is necessary for stratification purposes
When to use:
Population is known to be heterogeneous or the population can be subdivided into mutually exclusive and exhaustive groups
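The following sketch shows one possible way to draw a stratified random sample. It assumes proportional allocation (each stratum contributes in proportion to its size), which is only one of several allocation schemes and is not stated in these notes; the strata names and sizes are hypothetical.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)   # stratum share; rounding may shift the total
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical population of N = 100 divided into three mutually exclusive strata.
strata = {
    "first_year": list(range(0, 60)),
    "second_year": list(range(60, 90)),
    "third_year": list(range(90, 100)),
}
result = stratified_sample(strata, 20, seed=2)
for name, units in result.items():
    print(name, len(units), units)
```

The function needs the list of units in every stratum, which reflects the requirement that a sampling frame be available for each stratum.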
4. Cluster sampling - groups (clusters) of units, rather than individual units, are randomly selected; used when the population is very large and spread out over a wide geographical area
Advantages:
Sampling frame for the entire population is not necessary, only a frame of clusters, i.e., a list of clusters in the population
Reduced listing and transportation costs
The procedure saves time, effort and money
Disadvantages:
Entails more statistical analysis
Estimation procedures are difficult, especially when the clusters are of unequal size
When to use
Sampling frame is not available and the cost of constructing such a frame is very high
For economic consideration, i.e., when the time, effort and cost involved in obtaining information on the population units increase as the distances separating these units increase
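A minimal sketch of cluster sampling, assuming the one-stage form in which every unit of a selected cluster is observed. The village names and household labels are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly select m clusters and keep every unit in each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # only the list of clusters is needed
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters of unequal size (for example, households by village).
clusters = {
    "village_A": ["A1", "A2", "A3"],
    "village_B": ["B1", "B2"],
    "village_C": ["C1", "C2", "C3", "C4"],
    "village_D": ["D1"],
}
picked = cluster_sample(clusters, 2, seed=3)
print(picked)
```

The random selection operates on the cluster names only, so no list of individual units is required before the clusters are chosen.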
5. Multistage sampling- the selection of sample is done in two or more stages
Advantages:
Like cluster sampling, transportation and listing costs are reduced
Disadvantages:
The procedures are difficult, especially when the first-stage units are not of the same size
The sampling procedure entails much planning before selection is done
When to use
If no list of the population (sampling frame) is available
If the population covers a wide area
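Multistage sampling can be sketched with two stages, as below. The sketch assumes simple random sampling at both stages, which is a common choice but not the only one; the school names, student labels, and sample sizes are hypothetical.

```python
import random

def two_stage_sample(primary_units, m, n_per_unit, seed=None):
    """Stage 1: select m primary units. Stage 2: select units within each of them."""
    rng = random.Random(seed)
    stage_one = rng.sample(sorted(primary_units), m)
    sample = {}
    for name in stage_one:
        members = primary_units[name]
        size = min(n_per_unit, len(members))   # a small primary unit is taken whole
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical first-stage units (schools) with their second-stage units (students).
schools = {
    "school_1": ["s1_%d" % i for i in range(8)],
    "school_2": ["s2_%d" % i for i in range(5)],
    "school_3": ["s3_%d" % i for i in range(12)],
}
two_stage = two_stage_sample(schools, 2, 3, seed=4)
print(two_stage)
```

A list of second-stage units is needed only for the primary units actually selected, which is why listing costs are reduced.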
Non-probability Sampling-does not involve random selection
Methods of Non-probability Sampling
1. Convenience sampling - samples are chosen because they are readily available
2. Accidental sampling - chooses samples by chance or accident
3. Quota sampling - samples are selected according to some fixed quota
4. Judgment sampling - chooses samples on the basis of an expert’s opinion
Chapter Three
Descriptive Statistics
Three major characteristics of a single variable
1. Distribution - summary of the frequency of the individual observations for a variable
a) Frequency distribution - used when there are many individual observations for a variable, i.e., when the number of observations is greater than or equal to 30
2. The measures of central tendency
a) Mean - most reliable (sum of all observations divided by the number of observations)
b) Mode - the most frequently repeated value among the observations
c) Median - the value found at the exact middle of the ordered observations
3. The measures of dispersion
a) Range - difference between the highest and lowest observations
b) Standard deviation - involves all observations in the distribution
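The three characteristics above can be computed with Python's standard library, as in the sketch below. The six observations are hypothetical and are kept small so the arithmetic can be checked by hand; the notes do not say whether the sample or the population standard deviation is meant, so both are shown.

```python
from collections import Counter
import statistics

# Hypothetical observations for a single variable.
data = [2, 4, 4, 5, 7, 8]

# 1. Distribution: how often each value occurs.
frequency = Counter(data)

# 2. Measures of central tendency.
mean = statistics.mean(data)        # sum of observations / number of observations
median = statistics.median(data)    # middle of the ordered observations
mode = statistics.mode(data)        # most frequently repeated value

# 3. Measures of dispersion.
value_range = max(data) - min(data)        # highest minus lowest observation
sample_sd = statistics.stdev(data)         # divides by n - 1
population_sd = statistics.pstdev(data)    # divides by n

print(dict(frequency))
print(mean, median, mode, value_range, sample_sd, population_sd)
```

Here the sum is 30, so the mean is 30/6 = 5; the two middle values are 4 and 5, so the median is 4.5; and 4 is the only value that occurs twice, so it is the mode.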