STATISTICS REVIEWER

Upload: university-of-cebu

Post on 06-May-2015


DESCRIPTION

Statistics Reviewer: a. Introduction (definition of terms) b. Sampling Techniques c. Descriptive Statistics

TRANSCRIPT

Page 1: Statistics


Chapter One

Introduction

Meaning

Statistics (singular sense) - the science that deals with the principles and procedures for the collection, organization, summarization, presentation, and analysis of numerical data.

Statistics (plural sense) - a set of data or a mass of observations.

Fields of statistics

Mathematical statistics - the development and exposition of statistical theories

Applied statistics - the application of statistical methods to solve real problems

Branches of Statistics

Descriptive statistics - methods of summarizing and presenting data (collection, extraction, summarization, presentation, measures of central tendency, measures of location, and measures of variability)

Inferential statistics - the process of drawing conclusions and making decisions about the population based on evidence obtained from a sample (estimation and hypothesis testing)

Classification of Statistics

Parametric statistics - an approach that assumes a random sample from a normal distribution and involves testing hypotheses about a population parameter

Appropriate for interval and ratio data

Requires a large sample size to appeal to normality

Nonparametric statistics (distribution-free methods) - an approach to estimation and hypothesis testing in which no underlying distribution is assumed

Can be used for nominal and ordinal scaled data

Can be used for interval and ratio scaled data where the distribution of the random variable of interest is unspecified

Good when the sample size is too small to assess the form of the distribution


Data- quantities or qualities measured or observed

Types of Data

1. Categorical data - uses nonparametric statistics

Nominal scales - values or categories with no particular order

Ex: cause of death, gender

Lowest or least precise scale

Can be stored in a computer using numbers

Nominal scale of measurement - uses numbers merely as a means of separating the properties into different categories

Ordinal scales - values or categories with a particular order

Ex: pain level, social status

Can be stored in a computer using numbers

Ordinal scale of measurement - refers to measurements where only the comparisons greater, less, or equal between measurements are relevant

2. Continuous data - uses parametric statistics

Interval scales - measured on a continuum, and differences between any two numbers on the scale are of known size; there is no true zero

Ex: temperature in degrees Celsius or Fahrenheit (tons of garbage, number of arrests, income, and age have a true zero, so they are measured on a ratio scale)

Ratio scales - data with both equal intervals and an absolute zero

Ex: weight in pounds, height in centimeters, age in years

Highest or most precise scale

Ratio scale of measurement - used when not only the order and interval size are important, but the ratio between two measurements is also meaningful
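A small arithmetic sketch in Python can illustrate why ratios are meaningful on a ratio scale but not on an interval scale. The two temperatures are made-up values, and the helper name `celsius_to_kelvin` is only for illustration.

```python
# Illustrative values only: two temperatures in degrees Celsius (interval scale).
c_low, c_high = 10.0, 20.0


def celsius_to_kelvin(c):
    """Convert to Kelvin, which has an absolute zero (ratio scale)."""
    return c + 273.15


# Differences keep their meaning when the zero point moves.
diff_c = c_high - c_low
diff_k = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# Ratios do not: 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
ratio_c = c_high / c_low
ratio_k = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

print(diff_c, round(diff_k, 6))
print(ratio_c, round(ratio_k, 3))
```

The difference is 10 degrees on both scales, but the ratio is 2.0 in Celsius and only about 1.035 in Kelvin, because the Celsius zero is arbitrary.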

Variable - a property that can take on different values or categories that cannot be predicted with certainty

Ex: undergraduate majors (BSBA, CRIMINOLOGY), smoking habit, attitude toward the head, height, faculty ranks

Types of Variables

1. Explanatory/Independent/X variables - may be continuous

2. Response/Dependent/Y variables - may be continuous or categorical

3. Control/Z variables

Classification of variables


a. Qualitative variable - categories are used as labels to distinguish one group from another

Ex: cause of death, nationality, race, gender, severity of pain

b. Quantitative variable - a variable whose categories can be measured and ordered according to quantity

Ex: number of children in a family, age

Quantitative variable classification:

i. Discrete variable - has a set of possible values that is either finite or countably infinite

There are gaps between its possible values

Ex: number of missing teeth, number of household members

ii. Continuous variable - has a set of possible values that includes all values in an interval of the real line

There are no gaps between its possible values

Ex: Body Mass Index, height

Sources of Data

a) Primary source

b) Secondary source

Presentation of Data

a) Textual presentation - uses statements

b) Tabular presentation - uses a statistical table

c) Graphical presentation - uses a graph

I. Bar graph

II. Circle graph

III. Line graph


Chapter Two

Sampling Techniques

Population [N] - the totality of the individuals

Target population - the entire set of individuals about which we require information

Sampled population - the finite set of individuals from which the sample is drawn

Coverage error - occurs when the sampled population and the target population are not identical

Sample - a representative portion of the population under study

Principle of Randomization

“Four basic reasons for the use of samples”

1. It allows us to obtain information with greater speed

2. It allows us to obtain information with reduced cost

3. It allows us to obtain information over a greater scope

4. It allows us to obtain information with greater accuracy

Probability sampling and Non-probability Sampling

Probability sampling-uses random selection

Methods of Probability Sampling

1. Simple random sampling (SRS) - choosing a sample from a set in such a way that every member has an equal chance of being selected

Techniques in drawing an SRS

a) Table of random numbers

b) Lottery or fishbowl technique

May be done in two ways

a) With replacement

b) Without replacement

Advantages

Simple and more easily understood than other sampling designs

Disadvantages

A list of all the members in the population is needed.

Assigning numbers to each member of the sampled population is frequently impractical.

It may be difficult to collect the sample data with SRS if the sampled units are spread inconveniently throughout the population.

It is often less precise than other sampling plans because it disregards any information already known about the population.

When to use

When the population is homogeneous with respect to the characteristics under study.
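The two ways of drawing an SRS can be sketched with Python's standard `random` module. The population of 50 numbered units, the sample size, and the seed are assumptions made only for this illustration.

```python
import random

# Hypothetical sampling frame: population units numbered 1..N.
N = 50
population = list(range(1, N + 1))
n = 10

rng = random.Random(2015)  # fixed seed so the draw can be repeated

# Without replacement: each unit can appear at most once in the sample.
srs_without = rng.sample(population, n)

# With replacement: the same unit may be drawn more than once.
srs_with = rng.choices(population, k=n)

print(sorted(srs_without))
print(sorted(srs_with))
```

Numbering the units and letting a random number generator pick them plays the same role as a table of random numbers or a fishbowl draw.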

2. Systematic random sampling - selection of the desired sample from a list by taking every k-th unit after a random start

Formula: k = N/n

where k = sampling interval, N = population size, n = sample size

Advantages:

Easier to apply and less likely to lead to mistakes.

It is possible to select a sample in the field without a sampling frame.

It could give a more precise estimate than SRS when there is order in the list of units.

Disadvantages:

If periodic regularities are found in the lists, a systematic sample may consist only of similar types

If the population is not in random order, one cannot validly estimate the variance of the mean from a single systematic sample

Could be less precise than SRS

When to use:

If the ordering of the population is essentially random

When stratification with numerous strata is used
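A minimal sketch of systematic sampling with k = N/n, assuming that N is an exact multiple of n and that the list order is already fixed. The function name `systematic_sample` and the frame of 100 numbered units are hypothetical.

```python
import random


def systematic_sample(units, n, rng=random):
    """Pick a random start in the first interval, then take every k-th unit.

    Assumes len(units) is a multiple of n so that k = N / n is a whole number.
    """
    N = len(units)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch needs N to be a positive multiple of n")
    k = N // n                  # sampling interval
    start = rng.randrange(k)    # 0-based random start within the first k units
    return units[start::k]


frame = list(range(1, 101))     # hypothetical list of N = 100 units
sample = systematic_sample(frame, n=20, rng=random.Random(1))
print(sample)
```

With N = 100 and n = 20 the interval is k = 5, so the sample consists of one unit from the first five and every fifth unit after it.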

3. Stratified random sampling - the population [N] is divided into a number of non-overlapping subgroups (strata), and a random sample is drawn from each stratum

Advantages:

Gain in the precision of the estimates of characteristics of the population


It allows for more comprehensive data analysis since information is provided for each stratum

Accommodates administrative convenience: fieldwork is organized by strata, which usually results in cost savings

Accommodates different sampling plans in different strata

Disadvantages:

A sampling frame is necessary for every stratum

Prior information about the population and its subpopulations is necessary for stratification purposes

When to use:

Population is known to be heterogeneous or the population can be subdivided into mutually exclusive and exhaustive groups
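One possible way to draw a stratified random sample is proportional allocation, where each stratum contributes in proportion to its size. Proportional allocation is an assumption here, not something the reviewer specifies; the strata and the function name `stratified_sample` are made up for illustration.

```python
import random


def stratified_sample(strata, n, rng):
    """Draw an SRS from each stratum, sized in proportion to the stratum.

    `strata` maps a stratum label to the list of its units. Rounding can make
    the total differ slightly from n; the example below avoids that case.
    """
    N = sum(len(units) for units in strata.values())
    sample = {}
    for label, units in strata.items():
        n_h = round(n * len(units) / N)   # proportional allocation
        sample[label] = rng.sample(units, n_h)
    return sample


# Hypothetical population of 200 students in three year levels.
strata = {
    "first year": [f"F{i}" for i in range(100)],
    "second year": [f"S{i}" for i in range(60)],
    "third year": [f"T{i}" for i in range(40)],
}
picked = stratified_sample(strata, n=20, rng=random.Random(7))
sizes = {label: len(units) for label, units in picked.items()}
print(sizes)
```

Because every stratum is sampled separately, a sampling frame is needed for each one, which is the disadvantage noted above.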

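4. Cluster sampling - used when the population is very large and spread over a wide geographical area; the population is divided into groups (clusters), and whole clusters, rather than individual units, are selected at random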

Advantages:

Sampling frame for the entire population is not necessary, only a frame of clusters, i.e., a list of clusters in the population

Reduced listing and transportation costs

The procedure saves time, effort, and money

Disadvantages:

Entails more statistical analysis

Estimation procedures are difficult, especially when the clusters are of unequal size

When to use

Sampling frame is not available and the cost of constructing such a frame is very high

For economic consideration, i.e., when the time, effort and cost involved in obtaining information on the population units increase as the distances separating these units increase
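A rough sketch of one-stage cluster sampling, assuming only a list of clusters is available. The villages and households are invented for illustration, and the cluster sizes are deliberately unequal.

```python
import random

# Hypothetical frame of clusters: each village maps to its households.
clusters = {
    f"village_{v}": [f"v{v}_house_{h}" for h in range(1, 6 + v)]
    for v in range(1, 9)
}

rng = random.Random(3)

# Select whole clusters at random from the list of clusters.
chosen = rng.sample(sorted(clusters), 3)

# Every unit inside a chosen cluster enters the sample.
sample = [house for name in chosen for house in clusters[name]]

print(chosen)
print(len(sample))
```

Only the list of villages had to be prepared in advance. The final sample size depends on which clusters are drawn, which is one reason estimation is harder when clusters are of unequal size.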

5. Multistage sampling - the selection of the sample is done in two or more stages

Advantages:


Like cluster sampling, transportation and listing costs are reduced

Disadvantages:

The procedures are difficult, especially when the first-stage units are not of the same size

The sampling procedure entails much planning before selection is done

When to use

If no population list is available

If the population covers a wide area
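A two-stage version can be sketched as follows: clusters are selected first, then a simple random sample is drawn inside each selected cluster. The schools, the students, and the function name `two_stage_sample` are hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: SRS of units within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical population: 10 schools with 30 students each.
schools = {
    f"school_{s}": [f"s{s}_student_{i}" for i in range(30)]
    for s in range(10)
}
sample = two_stage_sample(schools, n_clusters=4, n_per_cluster=5,
                          rng=random.Random(11))
for name, students in sample.items():
    print(name, students)
```

A list of students is needed only for the four selected schools, not for the whole population.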

Non-probability Sampling-does not involve random selection

Methods of Non-probability Sampling

1. Convenience sampling - samples are chosen because they are readily available

2. Accidental sampling - chooses the sample by chance or accident

3. Quota sampling - samples are selected according to some fixed quota

4. Judgment sampling - chooses samples on the basis of an expert's opinion

Chapter Three

Descriptive Statistics

Three major characteristics of a single variable

1. Distribution - a summary of the frequency of the individual observations for a variable

a) Frequency distribution - used when there are many individual observations for a variable, that is, when the number of observations is greater than or equal to 30.
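A frequency distribution can be built by grouping the observations into classes and counting them. The sketch below uses 30 made-up exam scores and a class width of 10; both are assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical exam scores (30 observations).
scores = [52, 67, 71, 58, 90, 84, 76, 63, 69, 77,
          81, 55, 73, 88, 92, 61, 79, 66, 70, 85,
          59, 74, 68, 83, 95, 72, 64, 78, 87, 60]

width = 10  # class width chosen for illustration

# Map each score to the lower limit of its class, e.g. 67 -> 60.
counts = Counter((x // width) * width for x in scores)

for lower in sorted(counts):
    upper = lower + width - 1
    print(f"{lower}-{upper}: {counts[lower]}")
```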

2. The measures of central tendency

a) Mean - most reliable (sum of all observations divided by the number of observations)

b) Mode - the most frequently repeated value among the observations

c) Median - the value found at the exact middle of the ordered observations
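The three measures of central tendency can be computed with Python's standard `statistics` module. The nine observations are made-up values.

```python
import statistics

# Hypothetical observations.
data = [4, 8, 6, 5, 3, 8, 9, 8, 3]

mean = statistics.mean(data)      # sum of observations / number of observations
median = statistics.median(data)  # middle value of the sorted observations
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
```

Here the sum is 54 over 9 observations, so the mean is 6; the sorted data are 3, 3, 4, 5, 6, 8, 8, 8, 9, so the median is 6; and 8 occurs most often, so the mode is 8.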

3. The measures of dispersion

a) Range - the difference between the highest and lowest observations

b) Standard deviation - involves all observations in the distribution
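Both measures of dispersion can be sketched with the standard `statistics` module. The data are made-up values, and since the reviewer does not say which form of the standard deviation it intends, both the population and the sample versions are shown.

```python
import statistics

# Hypothetical observations.
data = [4, 8, 6, 5, 3, 8, 9, 8, 3]

# Range uses only the two extreme observations.
value_range = max(data) - min(data)

# Standard deviation uses every observation.
population_sd = statistics.pstdev(data)  # divides the sum of squares by N
sample_sd = statistics.stdev(data)       # divides the sum of squares by n - 1

print(value_range, round(population_sd, 3), round(sample_sd, 3))
```

The sample version is slightly larger than the population version for the same data because it divides by n - 1 instead of N.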