Problems & Considerations in FD (1)

PROBLEMS & CONSIDERATIONS IN FREQUENCY DISTRIBUTION

SUBMITTED TO: MR. DINESH DHANKHAR, Asst. Professor
SUBMITTED BY: AARTI, Ph.D. - 09

DEPARTMENT OF TOURISM & HOTEL MANAGEMENT, KUK

Upload: ohlyanaarti

Post on 15-Apr-2017

FREQUENCY DISTRIBUTION

FREQUENCY

It is the number of times a particular value of the variable occurs.

It is usually abbreviated as f.

FREQUENCY DISTRIBUTION

A tabular organization of statistical data, where each piece of data is assigned its corresponding frequency.

Used for both qualitative and quantitative data.

One variable is considered at a time.

It indicates the shape of the empirical distribution of the variable.

TYPES OF FREQUENCIES

Absolute Frequency

The absolute frequency is the number of times that a certain value appears in a statistical study. It is denoted by fi.

The sum of the absolute frequencies is equal to the total number of data, which is denoted by N.

The sum is written with the Greek letter Σ (capital sigma), which stands for 'sum': Σfi = N.
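A minimal Python sketch of absolute frequencies, assuming the data sit in a plain list (the daily maximum temperatures from the worked example in this deck are reused). `collections.Counter` does the tallying, and the total confirms that Σfi = N.

```python
from collections import Counter

# Daily maximum temperatures from the worked example in this deck
temperatures = [32, 31, 28, 29, 33, 32, 31, 30, 31, 31, 27, 28, 29, 30, 32, 31,
                31, 30, 30, 29, 29, 30, 30, 31, 30, 31, 34, 33, 33, 29, 29]

# Absolute frequency fi: the number of times each value xi occurs
abs_freq = Counter(temperatures)

# N: the sum of the absolute frequencies equals the total number of data
N = sum(abs_freq.values())

for value in sorted(abs_freq):
    print(value, abs_freq[value])
print("N =", N)
```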

Cumulative Frequency

It is the sum of the absolute frequencies of all values less than or equal to the value considered.

It is denoted by Fi.
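Cumulative frequencies are a running total of the absolute frequencies taken in ascending order of xi. A small sketch, assuming the distinct values and their fi are already known (the figures below are the counts from the temperature example in this deck):

```python
from itertools import accumulate

# Distinct values xi in ascending order and their absolute frequencies fi,
# taken from the temperature example in this deck
values = [27, 28, 29, 30, 31, 32, 33, 34]
abs_freq = [1, 2, 6, 7, 8, 3, 3, 1]

# Fi: sum of fi over all values less than or equal to xi
cum_freq = list(accumulate(abs_freq))

for x, f, F in zip(values, abs_freq, cum_freq):
    print(x, f, F)
```

The last Fi always equals N, which is a quick check on the tally.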

Relative Frequency

It is the quotient of the absolute frequency of a certain value and the total number of data: ni = fi/N.

It can be expressed as a fraction, a decimal, or a percentage.

Denoted by ni.

The sum of the relative frequencies is equal to 1.
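A sketch of ni = fi/N, again assuming the fi counts from the temperature example in this deck. Python's `fractions.Fraction` keeps each ni exact, so the sum comes out as exactly 1 instead of a rounded decimal such as 0.999.

```python
from fractions import Fraction

values = [27, 28, 29, 30, 31, 32, 33, 34]
abs_freq = [1, 2, 6, 7, 8, 3, 3, 1]
N = sum(abs_freq)  # total number of data

# ni = fi / N, stored as exact fractions
rel_freq = [Fraction(f, N) for f in abs_freq]

for x, n in zip(values, rel_freq):
    # show each ni as a fraction, a decimal and a percentage
    print(x, n, f"{float(n):.3f}", f"{float(n) * 100:.1f}%")

print("sum of ni =", sum(rel_freq))
```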

Example

A city has recorded the following daily maximum temperatures during the month:

32, 31, 28, 29, 33, 32, 31, 30, 31, 31, 27, 28, 29, 30, 32, 31, 31, 30, 30, 29, 29, 30, 30, 31, 30, 31, 34, 33, 33, 29, 29.

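xi    fi    Fi    Ni
27     1     1    0.032
28     2     3    0.097
29     6     9    0.290
30     7    16    0.516
31     8    24    0.774
32     3    27    0.871
33     3    30    0.968
34     1    31    1.000
Σ     31

(xi = temperature; fi = absolute frequency; Fi = cumulative frequency; Ni = Fi/N, the cumulative relative frequency)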
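The whole table can be rebuilt from the raw readings in a few lines. This is a sketch, not the author's procedure; it assumes the Ni column is the cumulative relative frequency Fi/N rounded to three decimals, which is what the printed values match (for example, the row for 30 works out to 16/31 ≈ 0.516).

```python
from collections import Counter
from itertools import accumulate

temperatures = [32, 31, 28, 29, 33, 32, 31, 30, 31, 31, 27, 28, 29, 30, 32, 31,
                31, 30, 30, 29, 29, 30, 30, 31, 30, 31, 34, 33, 33, 29, 29]

counts = Counter(temperatures)
xs = sorted(counts)                   # distinct values xi, ascending
fs = [counts[x] for x in xs]          # absolute frequencies fi
Fs = list(accumulate(fs))             # cumulative frequencies Fi
N = Fs[-1]                            # total number of data
Ns = [round(F / N, 3) for F in Fs]    # cumulative relative frequencies Ni = Fi / N

print(f"{'xi':>4}{'fi':>5}{'Fi':>5}{'Ni':>8}")
for x, f, F, n in zip(xs, fs, Fs, Ns):
    print(f"{x:>4}{f:>5}{F:>5}{n:>8.3f}")
print(f"{'Σ':>4}{N:>5}")
```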

REASONS for constructing an FD

To organize the data in a meaningful, intelligent way.

To enable the reader to make comparisons among different data sets.

To facilitate computational procedures for measures of average and spread.

To enable the reader to determine the nature and shape of distribution.

To enable the researcher to draw charts and graphs for presentation of data.

TYPES of Frequency Distribution

CATEGORICAL

Used when the data can be placed in specific categories, such as nominal- or ordinal-level data.

Example: Political affiliations, Blood types
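For categorical data the frequency distribution is simply a count per category. A sketch with a made-up sample of blood types (hypothetical data, not from this deck); since nominal categories have no natural numeric order, they are listed here from most to least frequent.

```python
from collections import Counter

# Hypothetical sample of blood types (illustrative data only)
blood_types = ["A", "O", "B", "O", "AB", "A", "O", "A", "B", "O"]

freq = Counter(blood_types)
N = len(blood_types)

# category, absolute frequency, relative frequency as a percentage
for category, f in freq.most_common():
    print(f"{category:<3}{f:>3}{f / N:>7.0%}")
```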

UNGROUPED

Used when the data take only a few distinct values. Example: number of incoming calls per day over the first 20 days.

GROUPED

Used when a large amount of data is to be organized. The values are grouped in intervals (classes) of the same width, and each class is assigned its corresponding frequency. Example: miles traveled to work every day by 50 employees of a company.

CONSTRUCTION

Classify the data.

Number of classes: decide how many classes of equal width the data will be divided into.

Range: find the range of the scores (highest score − lowest score).

Width: divide the range by the number of class intervals. Round the interval width in either direction to a convenient number, even if that means adjusting the number of class intervals.

Frequencies: count the number of observations that occur in each interval and enter the count as the frequency of the interval.
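The construction steps can be sketched in Python as follows. Assumptions: integer data, inclusive class limits, a requested number of classes (4 here, an illustrative choice), and a width that is rounded up to a whole number. Classes keep being added until the highest score is covered, which is one way of "adjusting the number of class intervals" so that nothing is left out. `grouped_frequency` is a hypothetical helper name.

```python
import math


def grouped_frequency(data, num_classes):
    """Sketch: equal-width classes for integer data, with a frequency per class."""
    low, high = min(data), max(data)
    data_range = high - low                               # highest score - lowest score
    width = max(1, math.ceil(data_range / num_classes))   # width rounded up
    table = []
    lower = low
    # Add classes until the highest score falls inside one of them
    while lower <= high:
        upper = lower + width - 1                         # inclusive upper limit
        freq = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, freq))
        lower += width
    return width, table


# Temperature readings from the worked example in this deck
temperatures = [32, 31, 28, 29, 33, 32, 31, 30, 31, 31, 27, 28, 29, 30, 32, 31,
                31, 30, 30, 29, 29, 30, 30, 31, 30, 31, 34, 33, 33, 29, 29]

width, table = grouped_frequency(temperatures, num_classes=4)
print("width =", width)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

With only eight distinct temperatures the result has fewer than five classes; larger data sets would normally be split more finely.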

CONSIDERATIONS

Must:

There should be between 5 and 20 classes.

The classes must be exhaustive: there must be enough classes to accommodate all the data.

The class width should be an odd number. This ensures that the midpoint of each class has the same place value as the data. E.g. width = (H − L) / number of classes, where H is the highest and L the lowest value; round the result up.

The classes must be mutually exclusive: no overlapping class limits.

The classes must be continuous: there are no gaps between consecutive classes, even when a class has a frequency of zero.

The classes must be equal in width (exception: "open-ended distributions", which have no specific beginning value or no specific ending value). This makes it easier to compare the frequency in one class with that in another.
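The "must" rules above can be checked mechanically. A sketch, assuming integer data and classes given as inclusive (lower, upper) limits; the class list and data below are hypothetical. The `midpoint` helper shows why an odd width is preferred: with integer limits, the midpoint of an odd-width class is a whole number, like the data.

```python
def check_classes(classes, data):
    """Sketch: test integer classes, given as (lower, upper) limits, against the rules above."""
    classes = sorted(classes)
    pairs = list(zip(classes, classes[1:]))             # consecutive classes
    widths = {upper - lower + 1 for lower, upper in classes}
    return {
        "5-20 classes": 5 <= len(classes) <= 20,
        "exhaustive": all(any(lo <= x <= hi for lo, hi in classes) for x in data),
        "mutually exclusive": all(a[1] < b[0] for a, b in pairs),
        "continuous": all(b[0] == a[1] + 1 for a, b in pairs),
        "equal width": len(widths) == 1,
        "odd width": all(w % 2 == 1 for w in widths),
    }


def midpoint(lower, upper):
    # For an odd width, (lower + upper) is even, so the midpoint is a whole number
    return (lower + upper) / 2


# Hypothetical classes of width 5 and hypothetical data
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
print(check_classes(classes, [10, 12, 18, 22, 27, 34]))
print([midpoint(lo, hi) for lo, hi in classes])
```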

Mere SUGGESTIONS:

Avoid open-ended classes if possible such as "75 and over".

Try to use between 5 and 20 classes if possible. With fewer than 5 classes you are not really breaking up the data, and with more than 20 classes you will probably overload the reader with information.

It is usually convenient to use class sizes of 5 or 10, that is, to have each class contain 5 or 10 possible values.

It is usually convenient to make the lower limit of the first category a multiple of the class size.

It is necessary to include scores with zero frequency in order to draw the frequency polygons correctly.
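A sketch of the last three suggestions together, assuming integer scores: a fixed class size (5 by default), a first lower limit rounded down to a multiple of that size, and empty classes kept in the table so that a frequency polygon can be drawn through them. The scores are hypothetical and were chosen so that two classes come out empty.

```python
def convenient_classes(data, size=5):
    """Sketch: fixed-size classes starting at a multiple of the class size."""
    start = (min(data) // size) * size       # first lower limit: a multiple of size
    table = []
    lower = start
    while lower <= max(data):
        upper = lower + size - 1
        count = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, count))  # zero-frequency classes are kept
        lower += size
    return table


# Hypothetical scores
for row in convenient_classes([3, 7, 8, 21, 24], size=5):
    print(row)
```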

PROBLEMS

Selection of classes: there are no hard and fast rules.

It depends on a number of factors, such as:
The number of classes into which the data are to be classified
The magnitude of the class interval
The accuracy desired
The ease of calculation for further processing of the data

It is difficult to find out values with zero frequencies.

PRESENTATION

BAR GRAPH

Used for discrete variables, often nominal or ordinal data.

Bars represent separate groups, so they should be separated.

HISTOGRAM

It was first introduced by Karl Pearson.

It is a graphical representation of a single dataset, which is tallied into classes.

It consists of a series of rectangles: their widths are defined by the limits of the classes, and their heights are determined by the frequency in each interval.

Used for continuous variables.

Bars represent segments of a range, so they should touch.
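A sketch of the rectangle geometry behind a histogram. It assumes integer data recorded to the nearest unit, so each bar runs between class boundaries (lower limit − 0.5, upper limit + 0.5); that is what makes adjacent bars touch. The table is the temperature data from this deck grouped into classes of width 2, an illustrative choice. No plotting library is assumed; the bars are drawn as text.

```python
def histogram_bars(table):
    """Sketch: (left, right, height) rectangles from (lower, upper, frequency) rows.

    Each bar spans the class boundaries lower - 0.5 and upper + 0.5, so the
    right edge of one bar is the left edge of the next.
    """
    return [(lower - 0.5, upper + 0.5, freq) for lower, upper, freq in table]


# Temperature data from this deck, grouped into classes of width 2 (illustrative)
table = [(27, 28, 3), (29, 30, 13), (31, 32, 11), (33, 34, 4)]

for left, right, height in histogram_bars(table):
    # crude text rendering: one '#' per observation
    print(f"{left:>5}-{right:<5} {'#' * height}")
```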

A RELATIVE FREQUENCY HISTOGRAM

A relative frequency histogram is made by taking the relative frequencies as the heights of the rectangles.

Don’t forget to close the tails to the X axis.

PIE GRAPH

[Figure: pie graph, "1999 Top Company Employers in Central Florida": Tourism 35%, Retail 20%, Health Care 16%, Others 29%.]

Pie graphs are used to show the relationship between the parts and the whole.
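Each slice of a pie graph gets the same share of the full 360° circle that its part has of the whole. A short sketch using the percentages from the 1999 Central Florida pie graph in this deck:

```python
# Shares from the 1999 "Top Company Employers in Central Florida" pie graph
shares = {"Tourism": 35, "Retail": 20, "Health Care": 16, "Others": 29}

total = sum(shares.values())   # the whole (100 percent)

# Central angle of each slice = its share of 360 degrees
angles = {name: 360 * pct / total for name, pct in shares.items()}

for name, angle in angles.items():
    print(f"{name:<12}{shares[name]:>4}%{angle:>8.1f} deg")
```

Tourism, for instance, gets 0.35 × 360 = 126°.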

ABSOLUTE FREQUENCY POLYGON

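An absolute frequency polygon is drawn exactly like a histogram except that, instead of bars, a point is plotted at the midpoint of each class at a height equal to its frequency, and consecutive points are joined by straight lines.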

RELATIVE FREQUENCY POLYGON

The relative frequency polygon is drawn exactly like the absolute frequency polygon except the Y-axis is labeled and incremented with relative frequency rather than absolute frequency.
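A sketch of the points behind both polygons, assuming equal-width integer classes. One extra class with zero frequency is added at each end so the polygon is closed down to the X axis; `relative=True` divides every height by N, which turns the absolute polygon into the relative one without changing its shape. The table reuses the temperature data grouped into width-2 classes (an illustrative grouping).

```python
def polygon_points(table, relative=False):
    """Sketch: (class midpoint, height) points for a frequency polygon.

    A zero-frequency point one class width beyond each end closes the
    polygon to the X axis.
    """
    width = table[0][1] - table[0][0] + 1          # equal-width integer classes assumed
    n = sum(freq for _, _, freq in table)
    points = [((lo + hi) / 2, freq / n if relative else freq) for lo, hi, freq in table]
    first_mid, last_mid = points[0][0], points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


table = [(27, 28, 3), (29, 30, 13), (31, 32, 11), (33, 34, 4)]
print(polygon_points(table))                  # absolute frequency polygon
print(polygon_points(table, relative=True))   # relative frequency polygon
```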

CUMULATIVE FREQUENCY POLYGON/OGIVES

A cumulative frequency polygon is always monotonically non-decreasing.

The line never goes down; it either stays at the same level or rises.
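A sketch of ogive points, assuming integer classes whose upper class boundary is the upper limit + 0.5. The curve starts at zero at the lower boundary of the first class, and each later point carries the cumulative frequency, so the heights can never decrease. The table is the temperature data grouped into width-2 classes (an illustrative grouping).

```python
from itertools import accumulate


def ogive_points(table):
    """Sketch: (upper class boundary, cumulative frequency) points for an ogive."""
    first_lower = table[0][0]
    cum = list(accumulate(freq for _, _, freq in table))
    points = [(first_lower - 0.5, 0)]          # start at zero, first lower boundary
    points += [(upper + 0.5, F) for (_, upper, _), F in zip(table, cum)]
    return points


table = [(27, 28, 3), (29, 30, 13), (31, 32, 11), (33, 34, 4)]
pts = ogive_points(table)
print(pts)
```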

PARETO CHART

It is named after Vilfredo Pareto. It is a chart that contains both bars and a line graph, where individual values are represented in descending order by bars, and the cumulative total is represented by the line.

The purpose of the Pareto chart is to highlight the most important among a (typically large) set of factors.

Used to show frequencies for nominal variables.
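The data behind a Pareto chart are the category counts sorted in descending order (bar heights) plus the running cumulative percentage (line heights). A sketch with hypothetical hotel complaint counts, labelled as made-up data:

```python
def pareto_table(freq):
    """Sketch: categories in descending order with cumulative percentages."""
    total = sum(freq.values())
    rows, running = [], 0
    for category, f in sorted(freq.items(), key=lambda item: item[1], reverse=True):
        running += f
        rows.append((category, f, 100 * running / total))  # bar height, line height
    return rows


# Hypothetical guest complaints at a hotel, by category
complaints = {"Noise": 12, "Cleanliness": 30, "Check-in delay": 18, "Wi-Fi": 6, "Other": 4}

for category, f, cum_pct in pareto_table(complaints):
    print(f"{category:<15}{f:>4}{cum_pct:>7.1f}%")
```

The first few bars show which categories account for most of the total, which is the point of the chart.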

TIME SERIES GRAPHS

A line chart, also called a time plot, is a series of data plotted at various time intervals. Measuring time along the horizontal axis and the numerical quantity of interest along the vertical axis yields a point on the graph for each observation.

Joining points adjacent in time by straight lines produces a time plot.

Used to show a pattern or trend that occurs over time.

[Figure: line chart, "Growth Trends in Internet Use by Age, 1997 to 1999". Horizontal axis: April 1997 to July 1999; vertical axis: millions of adults; series: Age 18 to 29, Age 30 to 49, Age 50+.]

STEM and LEAF PLOT

It is an alternative to the histogram. Data are grouped according to their leading digits (called the stem), while the final digits (called leaves) are listed separately for each member of a class.

The leaves are displayed individually in ascending order after each of the stems.
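A minimal stem-and-leaf sketch, assuming non-negative integers where the stem is every digit except the last and the leaf is the final digit. Sorting the data first keeps the leaves in ascending order after each stem. The temperature readings from this deck give stems 2 and 3.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Sketch: map each stem (all but the last digit) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for value in sorted(data):           # sorted input keeps leaves ascending
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


temperatures = [32, 31, 28, 29, 33, 32, 31, 30, 31, 31, 27, 28, 29, 30, 32, 31,
                31, 30, 30, 29, 29, 30, 30, 31, 30, 31, 34, 33, 33, 29, 29]

for stem, leaves in stem_and_leaf(temperatures).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```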

SCATTER PLOT

[Figure: scatter plot of Grades (vertical axis, 40 to 95) against Absences (horizontal axis, 0 to 16).]

Absences (x)    8    2    5   12   15    9    6
Grades (y)     78   92   90   58   43   74   81

First introduced by Sir Francis Galton

SHAPES

The Normal Distribution

A bell-shaped curve, called the normal curve or a normal distribution.

It is symmetrical.

The far left and right portions, containing the low-frequency extreme scores, are called the tails of the distribution.

Variations in peakedness (kurtosis) relative to the normal curve:
Mesokurtic = like the normal distribution
Leptokurtic = thin
Platykurtic = broad or flat
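One common way to put a number on these shape variations is the moment-based excess kurtosis m4/m2² − 3, which is 0 for a normal curve. The sketch below uses that population formula on two hypothetical data sets; the ±0.1 band treated as "roughly mesokurtic" is an arbitrary assumption for illustration, not a standard cut-off.

```python
def excess_kurtosis(data):
    """Sketch: moment-based excess kurtosis m4 / m2**2 - 3 (population form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3          # 0 for a normal (mesokurtic) curve


def describe(k, tol=0.1):
    # tol is an arbitrary illustrative threshold
    if k > tol:
        return "leptokurtic (thin)"
    if k < -tol:
        return "platykurtic (broad or flat)"
    return "roughly mesokurtic"


# Hypothetical data sets
flat = list(range(1, 11))            # evenly spread values
peaked = [0] * 8 + [-10, 10]         # most values at the centre, a few far out

print(describe(excess_kurtosis(flat)))
print(describe(excess_kurtosis(peaked)))
```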

Copyright © Houghton Mifflin Company. All rights reserved. Chapter 3 - 30

Skewed Distributions

It is not symmetrical, as it has only one pronounced tail. A distribution may be either negatively skewed or positively skewed. Whether a skewed distribution is negative or positive corresponds to whether the distinct tail slopes toward zero (negative) or away from zero (positive).

Negatively Skewed Distribution

A negatively skewed distribution contains extreme low scores that have a low frequency, but does not contain low frequency extreme high scores

Positively Skewed Distribution

A positively skewed distribution contains extreme high scores that have a low frequency, but does not contain low frequency extreme low scores.
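The direction of skew can be checked numerically with the moment coefficient of skewness g1 = m3/m2^1.5: a long tail of rare high scores makes it positive, a long tail of rare low scores makes it negative. A sketch on hypothetical scores, where the second data set is the mirror image of the first:

```python
def skewness(data):
    """Sketch: moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Hypothetical scores: one rare, extremely high score gives a long right tail
scores = [1, 2, 2, 3, 3, 3, 10]
print(skewness(scores))                  # > 0: positively skewed
print(skewness([-x for x in scores]))    # < 0: the mirror image is negatively skewed
```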

Bimodal Distribution

A bimodal distribution is a symmetrical distribution containing two distinct humps.

Rectangular Distribution

A rectangular distribution is a symmetrical distribution shaped like a rectangle
