4- graphic representation of data .2011

Upload: emran-leghari

Post on 10-Mar-2016


GLE 1

4. Graphical Representation of Data

Representation: A representation is the type of graph, table, diagram, or model used to display data. The representations covered in this course include:

Frequency tables

Tables

Bar graphs

Circle graphs

Line graphs

Stem-and-leaf plots

Scatter plots

Histograms

1. Frequency table: A frequency table is a table that indicates the number of times an observation or category occurs. There are different types of frequency tables. The type shown in Example 1.1 is called an absolute frequency table.

Example 1.1 Halloween Candy: The Number of Candy Bars Students Brought to School the Day after Halloween

Number of Candy Bars    Number of Students
0                       1
1                       1
2                       1
3                       3
4                       0
5                       4
6                       2
7                       1
8                       2
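As a rough illustration of how an absolute frequency table is tallied, the sketch below rebuilds the Halloween candy table from raw observations. The list `candy_bars` is an assumption: it is reconstructed from the table itself (one entry per student), and the helper name `absolute_frequency_table` is hypothetical.

```python
from collections import Counter

# Assumed raw data, reconstructed from the Example 1.1 table:
# one entry per student, value = number of candy bars brought to school.
candy_bars = [0, 1, 2, 3, 3, 3, 5, 5, 5, 5, 6, 6, 7, 8, 8]


def absolute_frequency_table(observations, lowest, highest):
    """Return (value, count) rows for every whole number from lowest to highest."""
    counts = Counter(observations)
    # Counter returns 0 for values that never occur, so empty rows are kept.
    return [(value, counts[value]) for value in range(lowest, highest + 1)]


table = absolute_frequency_table(candy_bars, 0, 8)
for value, count in table:
    print(f"{value:>2}  {count}")
```

Listing every value from 0 to 8, including the value 4 that no student reported, keeps the gap in the data visible.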

2. Line plot: A line plot is a representation in which Xs, symbols, or objects are used to show a frequency along a number line.

Example 2.1 Line plot with key:

Example 2.2 Line plot with a label on the vertical axis and no key:
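The figures for these examples did not survive in the text, so the following is only a sketch of the idea: it stacks one X per observation above a number line. It assumes the candy bar frequencies from Example 1.1 as input, and the function name `line_plot` is hypothetical.

```python
# Assumed input: the frequencies from the Example 1.1 candy bar table.
frequencies = {0: 1, 1: 1, 2: 1, 3: 3, 4: 0, 5: 4, 6: 2, 7: 1, 8: 2}


def line_plot(freqs):
    """Build a text line plot: rows of Xs stacked above a number line."""
    values = sorted(freqs)
    tallest = max(freqs.values())
    rows = []
    # Work from the top row down so the tallest stack is printed first.
    for level in range(tallest, 0, -1):
        marks = ["X" if freqs[v] >= level else " " for v in values]
        rows.append("  ".join(marks).rstrip())
    # The last row is the number line itself.
    rows.append("  ".join(str(v) for v in values))
    return rows


rows = line_plot(frequencies)
print("\n".join(rows))
```

The column alignment relies on every value being a single digit, which holds for this data set.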

3. Bar graph: A bar graph is a representation that is used to make comparisons among groups. Bar graphs are constructed using horizontal or vertical bars of equal widths. The heights (lengths) of the bars are equal to the values (e.g., counts, percents, temperature scales) being represented. Bar graphs are used to display both categorical data (e.g., countries, favorite colors, months, eye color, ages, and favorite whole number) and discrete numerical data, and they are one of the most common ways to display categorical data. An important characteristic of a bar graph displaying categorical data is that there is not always one correct order in which to put the categories.

Example 3.1 Bar graph categorical data (countries) with counts:

Example 3.2 Bar graph categorical data (countries) with percents:

Example 3.3 Multiple bars to compare within and across sets of data:

Example 3.4 Bar graph representing discrete numerical data:

Mrs. Plum's Class

Name       Percent*
Barbara    94
Beth       61
Bob        62
Brian      84
Chris      80
Frank      93
Hannah     84
Ian        88
Karen      84
Kim        51
Marge      72
Nick       98
Patty      96
Phil       71
Ralph      53
Richard    89
Sam        60
Sam R.     87
Sonia      97
Ted        72

*Percents rounded to the nearest whole percent

4. Histogram: A histogram is a graphical representation of frequency distributions (e.g., counts, percents) used to compare groups whose values are continuous and can be ordered from least to greatest. Some variables that can be represented using histograms include ages, heights, test scores, and times. A histogram is constructed of contiguous rectangles (rectangles touching each other) with:

widths of the rectangles representing numerical intervals (groups) that are determined by the person who organizes the histogram (see Examples 4.2 and 4.3);

widths of the rectangles proportional to the size of the respective intervals (e.g., if all the numerical intervals are the same length, then the widths of the rectangles are the same length; however, if one interval spans 10 years and another interval spans 20 years, then the rectangle representing 20 years is twice as wide as the rectangle representing 10 years); and

areas of the rectangles proportional to the percent or frequency (height of rectangle).

Test Score Intervals    Frequency
50–59                   2
60–69                   3
70–79                   3
80–89                   7
90–99                   5

Example 4.1 Histogram with class intervals equal in size (10 percentage points):

Example 4.2 Histogram with class intervals equal in size (10 percentage points) with intervals on a scale:

Test Score Intervals    Frequency
50–59                   2
60–69                   3
70–79                   3
80–89                   7
90–99                   5

Example 4.3 Histogram with class intervals equal in size (5 percentage points) with intervals on a scale:

Test Score Intervals    Frequency
50–54                   2
55–59                   0
60–64                   3
65–69                   0
70–74                   3
75–79                   0
80–84                   4
85–89                   3
90–94                   2
95–99                   3
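The interval tables for Examples 4.1 through 4.3 can be reproduced from Mrs. Plum's class scores. The sketch below is one way to do the grouping, assuming intervals that include their left endpoint and exclude their right endpoint; the function name `bin_counts` is hypothetical.

```python
# Mrs. Plum's class percents, copied from the class table in this section.
scores = [94, 61, 62, 84, 80, 93, 84, 88, 84, 51,
          72, 98, 96, 71, 53, 89, 60, 87, 97, 72]


def bin_counts(values, start, stop, width):
    """Count values per interval [left, left + width), left endpoint included."""
    counts = {left: 0 for left in range(start, stop, width)}
    for value in values:
        if start <= value < stop:
            # Snap the value down to the left endpoint of its interval.
            left = start + ((value - start) // width) * width
            counts[left] += 1
    return counts


ten_point = bin_counts(scores, 50, 100, 10)
five_point = bin_counts(scores, 50, 100, 5)

for left, count in ten_point.items():
    print(f"{left}-{left + 9}: {count}")
```

Changing only the `width` argument switches between the 10-point intervals and the 5-point intervals, which is the choice the person organizing the histogram makes.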

5. Circle graph (Pie Graph): A circle graph is a pictorial representation used to compare parts of a whole. The area of the circle represents the whole and is divided into pieces called sectors that represent parts of a whole. The size of each sector is determined by the percent of the whole that its respective category represents.

Example 5.1 Circle graph:

6. Stem-and-leaf plot: A stem-and-leaf plot organizes data to show their shape and distribution. Each data value is split into a stem and a leaf. Typically, the stem is the first digit(s) of a data value and the leaf is the last digit of the data value (e.g., if 27 is a data value, then 2 is the stem and 7 is the leaf).

Example 6.1 Mathematics Test Scores, Mrs. Plum and Mr. Scarlet:

Mr. Scarlet's class

Name       Percent*
Beth       97
Bob        76
Brian      78
Chris      65
Darlene    76
Debbie     70
Glenn      70
Ian        85
Kim        68
Lee        70
Leslie     80
Lorie      79
Lucinda    81
Marilyn    88
Mike       68
Mike R.    76
Nick       85
Ralph      98
Richard    82
Ted        68
Tom        88

*Percents rounded to the nearest whole percent

Mr. Scarlet and Mrs. Plum gave the same test to two different classes. The results from the tests are organized into the tables and stem-and-leaf plots below.

Mrs. Plum's class

Name       Percent*
Barbara    94
Beth       61
Bob        62
Brian      84
Chris      80
Frank      93
Hannah     84
Ian        88
Karen      84
Kim        51
Marge      72
Nick       98
Patty      96
Phil       71
Ralph      53
Richard    89
Sam        60
Sam R.     87
Sonia      97
Ted        72

*Percents rounded to the nearest whole percent

Mrs. Plum's Class

5 | 1, 3
6 | 0, 1, 2
7 | 1, 2, 2
8 | 0, 4, 4, 4, 7, 8, 9
9 | 3, 4, 6, 7, 8

Mr. Scarlet's Class

6 | 5, 8, 8, 8
7 | 0, 0, 0, 6, 6, 6, 8, 9
8 | 0, 1, 2, 5, 5, 8, 8
9 | 7, 8
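The two plots above can be generated mechanically from the class tables. The following sketch assumes two-digit whole-number percents, so the tens digit is the stem and the ones digit is the leaf; the function name `stem_and_leaf` and the `stem | leaves` text layout are illustrative choices.

```python
from collections import defaultdict

# Percents copied from the two class tables in Example 6.1.
plum = [94, 61, 62, 84, 80, 93, 84, 88, 84, 51,
        72, 98, 96, 71, 53, 89, 60, 87, 97, 72]
scarlet = [97, 76, 78, 65, 76, 70, 70, 85, 68, 70, 80,
           79, 81, 88, 68, 76, 85, 98, 82, 68, 88]


def stem_and_leaf(values):
    """Return one 'stem | leaves' line per stem, with leaves in ascending order."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)  # tens digit is the stem
    return [
        f"{stem} | {', '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]


plum_plot = stem_and_leaf(plum)
scarlet_plot = stem_and_leaf(scarlet)
print("\n".join(plum_plot))
```

Sorting the values first is what puts the leaves in order, which in turn makes the median easy to count off from the finished plot.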

Time of Day    Temperature (°F)
8:00 A.M.      60
9:00 A.M.      62
10:00 A.M.     70
11:00 A.M.     75
12:00 P.M.     76
1:00 P.M.      78
2:00 P.M.      77
3:00 P.M.      74
4:00 P.M.      70
5:00 P.M.      66
6:00 P.M.      50
7:00 P.M.      50
8:00 P.M.      45

7. Line graph: A line graph is typically used to represent serial data (data that can be collected without gaps, such as the change in temperature over a time period). Typically, observations are made at regular intervals. These discrete points are then plotted, and adjacent points are connected with line segments to indicate the serial nature of the data and to allow for interpolation.

Example 7.1 Line Graph:
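Reading a value between two plotted points on a line graph amounts to linear interpolation along the connecting segment. The sketch below illustrates this with the hourly temperature readings tabulated just before this definition; hours are written on a 24-hour clock for convenience, and the function name `interpolate` is hypothetical.

```python
# (hour on a 24-hour clock, temperature in degrees Fahrenheit),
# copied from the Time of Day / Temperature table.
readings = [(8, 60), (9, 62), (10, 70), (11, 75), (12, 76), (13, 78), (14, 77),
            (15, 74), (16, 70), (17, 66), (18, 50), (19, 50), (20, 45)]


def interpolate(points, hour):
    """Estimate the temperature at `hour` along the segment joining its neighbors."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= hour <= x1:
            # Move along the segment in proportion to the elapsed time.
            return y0 + (y1 - y0) * (hour - x0) / (x1 - x0)
    raise ValueError("hour is outside the observed range")


estimate = interpolate(readings, 10.5)  # 10:30 A.M.
print(estimate)
```

The estimate is only as good as the assumption that temperature changed steadily between the two neighboring readings.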

Average Daily Maximum Temperatures (Degrees Fahrenheit): Providence, RI (1961–1990)

Month          J    F    M    A    M    J    J    A    S    O    N    D
Temperature    37   38   46   57   67   77   82   81   74   64   53   41

Example 7.2 Broken Line Graph:

Example 7.3 Double line graph:

Ice Cream Consumption          Price                       Temperature
(pints consumed per capita)    (per pint in dollars)       (degrees Fahrenheit)
0.386                          0.270                       41
0.374                          0.282                       56
0.393                          0.277                       63
0.425                          0.280                       68
0.406                          0.272                       69
0.344                          0.262                       65
0.327                          0.275                       61
0.288                          0.267                       47
0.269                          0.265                       32
0.256                          0.277                       24
0.286                          0.282                       26
0.298                          0.270                       26
0.329                          0.272                       32
0.318                          0.287                       40
0.381                          0.277                       55
0.381                          0.287                       63
0.470                          0.280                       72
0.443                          0.277                       72
0.386                          0.277                       67
0.342                          0.277                       60
0.319                          0.292                       44
0.307                          0.287                       40
0.284                          0.277                       32
0.326                          0.285                       27
0.309                          0.282                       28
0.359                          0.265                       33
0.376                          0.265                       41
0.416                          0.265                       52
0.437                          0.268                       64
0.548                          0.260                       71

8. Scatter plot: Scatter plots are plots of observations that are used to determine if there is a relationship between two variables being studied. Scatter plots help to answer questions about how two variables are related to each other (e.g., Is there a relationship between the amount of time students watch TV and their grades?).

The examples contain data collected over 30 four-week periods from March 18, 1951 to July 11, 1953.

Example 10.1 Relationship of Ice Cream Consumed to Price of Ice Cream: The question being asked in this example was "Is ice cream consumption dependent upon price?"

Example 10.2 Relationship of Ice Cream Consumed to Air Temperature: The question being asked in this example was "Is there a relationship between air temperature and the amount of ice cream consumed?"
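The handout judges these two relationships by eye from the scatter plots. As a supplementary numeric check, the sketch below computes Pearson's correlation coefficient for each pair of variables. Pearson's r is not part of the original material; it is introduced here only as one common way to summarize how closely points follow a straight-line trend, and the helper name `pearson_r` is hypothetical.

```python
from math import sqrt

# (consumption in pints per capita, price in dollars per pint, temperature in °F),
# copied row by row from the ice cream table.
observations = [
    (0.386, 0.270, 41), (0.374, 0.282, 56), (0.393, 0.277, 63), (0.425, 0.280, 68),
    (0.406, 0.272, 69), (0.344, 0.262, 65), (0.327, 0.275, 61), (0.288, 0.267, 47),
    (0.269, 0.265, 32), (0.256, 0.277, 24), (0.286, 0.282, 26), (0.298, 0.270, 26),
    (0.329, 0.272, 32), (0.318, 0.287, 40), (0.381, 0.277, 55), (0.381, 0.287, 63),
    (0.470, 0.280, 72), (0.443, 0.277, 72), (0.386, 0.277, 67), (0.342, 0.277, 60),
    (0.319, 0.292, 44), (0.307, 0.287, 40), (0.284, 0.277, 32), (0.326, 0.285, 27),
    (0.309, 0.282, 28), (0.359, 0.265, 33), (0.376, 0.265, 41), (0.416, 0.265, 52),
    (0.437, 0.268, 64), (0.548, 0.260, 71),
]


def pearson_r(xs, ys):
    """Pearson correlation: near +1 or -1 is a strong linear trend, near 0 is none."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


consumption = [row[0] for row in observations]
price = [row[1] for row in observations]
temperature = [row[2] for row in observations]

r_price = pearson_r(price, consumption)
r_temperature = pearson_r(temperature, consumption)
print(f"price vs. consumption:       r = {r_price:.2f}")
print(f"temperature vs. consumption: r = {r_temperature:.2f}")
```

A coefficient much closer to zero for price than for temperature would be consistent with reading no clear trend from the first scatter plot and an upward trend from the second.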

Note: Additional studies were conducted to determine if other variables have an impact on ice cream consumption, including Lag-Temp (Temp lagged by one time period) and Year, which were both significant predictors of ice cream consumption. For more information about the study go to http://lib.stat.cmu.edu/DASL/Stories/IceCreamConsumption.htm.

9. Estimated line of best fit: An estimated line of best fit is a line drawn through the data points on a scatter plot that represents the best approximation of the trend reflected in the data. The line can then be used to make predictions.

Example 11.1 Estimated line of best fit (Data from Example 10.2):

An ice cream company would be very interested in the data from the study in Example 10.2. However, the company would not just be interested in the fact that there is a trend; it would want to use the data to determine how much ice cream to produce at different times of the year for different regions of the country (or world) based upon regional temperatures. To make a prediction about production, find an estimated line of best fit and use it to make a prediction by interpolating the values along the line at a given point.

10. Outlier: An outlier is a data point that is widely separated from the general distribution of a set of data.

Example 13.1 Scatter plot showing the general distribution of a set of data with an outlier:

Data set constructed based on National Center for Health Statistics Growth Chart http://www.cdc.gov/nchs/data/nhanes/growthcharts/set2/chart%2007.pdf.

When interpreting histograms with a scale, the typical convention is to include the left endpoint in the interval and exclude the right endpoint (e.g., test scores 50, 51, 52, 53, 54, 55, 56, 57, 58, and 59 are included in the first bar in the graph, but not test score 60; test score 60 is included in the second bar of the graph).

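[Figure labels: "Counts" (vertical axis) and "Test Scores (Percents)" (horizontal axis) on the histograms; "Outlier" on the scatter plot titled "The height of 12-year-old boys"; "Number line" on the line plots.]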

The area of each rectangle is proportional to the frequency of its respective class. (i.e., The area of each rectangle is ten times the frequency.)

Each sector is a part of the whole (water on Earth).

The 12-year-old boy who is 5'10" is an outlier for this set of data.

A broken line graph can be used to show the trends in non-serial data, such as the average daily maximum temperatures from month to month. The line is broken because the data points are NOT serial as they are in Example 7.1. That is, drawing a solid line between any two points would imply that an average daily maximum temperature exists for, say, some month between January and February.

One student has no candy bars, 1 student has one candy bar, 1 student has two candy bars, 3 students have three candy bars, no student has four candy bars, and so on.

Stem-and-leaf plots concisely organize large amounts of data in a way that reveals the distribution of the data and the mode, and they provide an easy way to determine the median of the data set.

[Figure labels on the stem-and-leaf plots: "Stems" and "Leaves"; e.g., 5 | 1 represents 51%.]

There does not appear to be any discernible trend in the data.

In general, as the temperature increases from 24°F to 72°F, so does ice cream consumption.

[Figure label: "Estimated line of best fit".]

Interpolation: Based on the estimated line of best fit, at 50°F approximately 0.36 pints of ice cream per capita would be consumed.

Reads as "Four students each had 5 candy bars."

Whoever organizes the histogram decides the length of the intervals.

The order in which the bars are constructed does not change the meaning of the data.

A property of the line of best fit known as the least squares regression line is that the sum of the errors (the differences between the actual points and the points on the line for given x-values) is zero. This means that the errors above the line and the errors below the line balance each other. Estimating a line with this property by eye leads to an appropriate estimated line of best fit.
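The zero-sum property can be checked numerically. The sketch below fits a least squares line to the temperature and consumption columns of the ice cream table, sums the errors, and reads off a prediction at 50°F. The handout estimates its line by eye, so this computed line is a stand-in rather than the line drawn in Example 11.1, and the function name `least_squares_line` is hypothetical.

```python
# Temperature (°F) and consumption (pints per capita), copied in row order
# from the ice cream table.
temperature = [41, 56, 63, 68, 69, 65, 61, 47, 32, 24, 26, 26, 32, 40, 55,
               63, 72, 72, 67, 60, 44, 40, 32, 27, 28, 33, 41, 52, 64, 71]
consumption = [0.386, 0.374, 0.393, 0.425, 0.406, 0.344, 0.327, 0.288, 0.269, 0.256,
               0.286, 0.298, 0.329, 0.318, 0.381, 0.381, 0.470, 0.443, 0.386, 0.342,
               0.319, 0.307, 0.284, 0.326, 0.309, 0.359, 0.376, 0.416, 0.437, 0.548]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through the point of means.
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_line(temperature, consumption)

# Errors: actual value minus the value on the line at the same temperature.
errors = [y - (intercept + slope * x) for x, y in zip(temperature, consumption)]
predicted_at_50 = intercept + slope * 50

print(f"sum of errors: {sum(errors):.12f}")
print(f"predicted consumption at 50 degrees F: {predicted_at_50:.2f}")
```

The sum of the errors should come out as zero up to floating-point rounding, and the prediction at 50°F should land near the value of about 0.36 pints per capita that the handout reads from its estimated line.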

An ice cream company would have collected many more data points over longer periods of time than were collected in this study to make this type of decision. However, this example is used for illustrative purposes.
