edited economic statistics note

UNIT ONE

INTRODUCTION

1.1. MEANING OF ECONOMIC STATISTICS

Statistics is a science that deals with the methods of collecting, organizing and analyzing data and of interpreting the results. It is a science of decision making under uncertainty. There are four stages of statistics:

Collection of data, Presentation of data, Analysis and Interpretation.

Economic Statistics is an area which uses statistical methods in presenting, analyzing and interpreting economic data. It includes those statistical methods which are frequently used in economics.

1.2. FUNCTIONS OF ECONOMIC STATISTICS

1. It presents facts in a definite form.
2. It simplifies a mass of figures.
3. It facilitates comparison.
4. It helps in formulating and testing hypotheses.
5. It helps in prediction.
6. It helps in the formulation of suitable policies.

1.3. TYPES OF STATISTICAL DATA

Data are raw facts about a phenomenon. Data are records of the actual state of some measurable aspect of the universe at a particular point in time. Data are not abstract; they are concrete measurements of the tangible and countable features of the world. When data are processed, they generate information. There are different types of statistical data, depending on the basis used to classify them.

A) Based on Scale of measurement

Based on the scale of measurement, there are four basic types of data: nominal, ordinal, interval and ratio. The most accepted basis for scaling has three characteristics:

1. Numbers are ordered. One number is less than, greater than, or equal to another number.
2. Differences between numbers are ordered. The difference between any pair of numbers is greater than, less than, or equal to the difference between any other pair of numbers.
3. The number series has a unique origin indicated by the number zero.

Combination of these characteristics of order, distance, and origin provide the following widely used classification of measurement scales.


Nominal Scales: When we use nominal scale, we partition a set into categories that are mutually exclusive and collectively exhaustive. The counting of members is the only possible arithmetic operation and as a result the researcher is restricted to the use of the mode as the measure of central tendency. If we use numbers to identify categories, they are recognized as labels only and have no quantitative value. Nominal scales are the least powerful of the four types. They suggest no order or distance relationship and have no arithmetic origin. Examples can be respondents’ marital status, gender, students’ Id number, etc.

Ordinal Scales: Ordinal scales include the characteristics of the nominal scale plus an indicator of order. The use of an ordinal scale implies a statement of ‘greater than’ or ‘less than’ (an equality statement is also acceptable) without stating how much greater or less. Thus the real difference between ranks 1 and 2 may be more or less than the difference between ranks 2 and 3. The appropriate measure of central tendency for ordinal scales is the median. Examples of ordinal scales include opinion or preference scales.

Interval Scales: The interval scale has the powers of nominal and ordinal scales plus one additional strength: It incorporates the concept of equality of interval (the distance between 1 and 2 equals the distance between 2 and 3). When a scale is interval, you use the arithmetic mean as the measure of central tendency. Calendar time is such a scale. For example, the elapsed time between 4 and 6 A.M. equals the time between 5 and 7 A.M. One cannot say, however, 6 A.M is twice as late as 3 A.M. because zero time is an arbitrary origin. Centigrade and Fahrenheit temperature scales are other examples of classical interval scales.

Ratio Scales: Ratio scales incorporate all of the powers of the previous ones plus the provision for an absolute zero or origin. The ratio scale represents the actual amounts of a variable. Multiplication and division can be used with this scale but not with the other scales mentioned. Money values, population counts, distances, return rates, weight, height, and area are examples of ratio scales.

Summary of measurement scales

Type of scale   Characteristics                                Basic empirical operation
Nominal         No order, distance, or origin                  Determination of equality
Ordinal         Order but no distance or unique origin         Determination of greater or lesser values
Interval        Both order and distance but no unique origin   Determination of equality of intervals or differences
Ratio           Order, distance, and unique origin             Determination of equality of ratios

B) Based on Time Reference


On the basis of time reference, statistical data are of four types:

Time Series Data: These are data collected over periods of time. Data which can take different values in different periods of time are normally referred to as time series data.

Cross-Sectional Data: These are data collected at a single point in time from different places or units.

Pooled Data: These are data collected over periods of time from different places. They are a combination of time series and cross-sectional data.

Panel Data: Also known as longitudinal data, these are data collected from the same sample over several periods of time.

C) Based on the Sources

Depending on the source, the data collected can be primary or secondary in nature.

Primary data are those which are collected afresh and for the first time, and thus happen to be original in character. Their advantage is their relevance to the user, but they are also likely to be expensive to collect in terms of time and money.

Secondary data are those which have already been collected by someone else and which have already been passed through the statistical process. They are information extracted from an existing source, probably published or held on a computer database. From a practical point of view, this type of information was collected for a purpose other than the current research objectives and is not always up to date. For this reason it may not precisely meet the needs of the secondary user. However, it is less expensive and less time-consuming to obtain. Therefore, it provides a good starting point and very often helps the investigator to formulate and generate ideas which can later be refined further by collecting primary data.

UNIT TWO


OVERVIEW OF DATA PRESENTATION AND ANALYSIS TECHNIQUES

2.1 DATA PRESENTATION TECHNIQUES

Data Presentation is the process of summarizing the collected data in a meaningful and suitable form. Presentation can be done in two basic forms: statistical tables and statistical charts.

A statistical table is the presentation of numbers in a logical arrangement, with some brief explanation to show what the data represent.

Statistical charts or graphs, on the other hand, are pictorial devices for presenting data.

2.1.1 Presentation of Quantitative Data

The important tools for presenting quantitative data are:

1. Frequency Distribution
2. Histograms
3. Frequency Polygons
4. Cumulative Frequency Curves or Ogives

Frequency Distribution

It is the method of arranging data in some order and counting the number of times each observation appears in the data set. Frequency is the number of times that each observation appears in the data set. Important elements of a frequency distribution are:

1. Class interval: the difference between the upper limit and the lower limit of a class.
2. Class limits: the lowest and the highest values that can be included in the class.
3. Class frequency: the number of observations corresponding to the particular class.
4. Class mark: the midpoint of the class interval.

$$\text{class mark} = \frac{\text{upper class limit} + \text{lower class limit}}{2}$$

5. Class width: the size of the class.
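The elements listed above can be illustrated with a short sketch. The data set `scores`, the starting limit, the class width and the function name `frequency_distribution` are all hypothetical choices made for illustration; they do not come from these notes. Classes are assumed to be equal-width and of the form "lower limit up to, but not including, upper limit".

```python
def frequency_distribution(values, lower, width, n_classes):
    """Tabulate values into equal-width classes [lo, hi)."""
    rows = []
    for k in range(n_classes):
        lo = lower + k * width          # lower class limit
        hi = lo + width                 # upper class limit
        freq = sum(1 for v in values if lo <= v < hi)   # class frequency
        rows.append({
            "lower": lo,
            "upper": hi,
            "class_mark": (lo + hi) / 2,   # midpoint of the class
            "frequency": freq,
        })
    return rows


# Hypothetical raw data (assumed for illustration only).
scores = [12, 15, 21, 24, 27, 30, 33, 35, 38, 41, 44, 47]
table = frequency_distribution(scores, lower=10, width=10, n_classes=4)

for row in table:
    print(row["lower"], "-", row["upper"], row["class_mark"], row["frequency"])
```

Each printed row gives the class limits, the class mark and the class frequency; the frequencies add up to the number of observations.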

Histogram

It is a graph consisting of rectangles having:

1. Bases on the horizontal axis, with centers at the class marks and lengths equal to the class widths.
2. Areas proportional to the class frequencies.

Frequency Polygon

It is a graph of frequency distribution. It facilitates comparison of two or more frequency distributions on the same graph. It can be drawn with or without histogram.

Ogive Curve

It is a graph that shows the cumulative frequency less than any upper class boundary or more than any lower class boundary. There are two types of ogive curves: the less than ogive and the more than ogive. In the less than ogive curve we use the upper class boundaries on the horizontal axis and the cumulative class frequencies on the vertical axis, whereas in the more than ogive curve we use the lower class boundaries on the horizontal axis and the cumulative class frequencies on the vertical axis.
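The points plotted on the two ogives can be computed as running totals. The sketch below is only an illustration: the class boundaries and frequencies are hypothetical, not taken from these notes.

```python
from itertools import accumulate

# Hypothetical class boundaries and class frequencies (assumed for illustration).
lower_boundaries = [9.5, 19.5, 29.5, 39.5]
upper_boundaries = [19.5, 29.5, 39.5, 49.5]
frequencies = [2, 3, 4, 3]

# "Less than" cumulative frequencies: running total from the first class.
less_than = list(accumulate(frequencies))

# "More than" cumulative frequencies: running total from the last class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

# Points to plot: (boundary on horizontal axis, cumulative frequency on vertical axis).
less_than_points = list(zip(upper_boundaries, less_than))
more_than_points = list(zip(lower_boundaries, more_than))

print(less_than_points)
print(more_than_points)
```

The less than ogive rises from the first class to the total frequency, while the more than ogive falls from the total frequency to the frequency of the last class.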

2.1.2 PRESENTATION OF QUALITATIVE DATA

Important methods of presenting qualitative data include:

1. Bar charts
2. Categorical distribution
3. Pie-charts

Bar charts: are one-dimensional rectangular diagrams used to present qualitative data. Bar charts are drawn in such a way that the height of each bar is proportional to the amount of the given category, the width is the same for all bars, and the space between any two adjacent bars is the same. There are different types of bar graphs such as simple, subdivided and special bar graphs.

Categorical Distribution: is a distribution used to present categorical data. It is the frequency distribution counterpart for categorical data. In this method of presenting data, categories should be defined in such a manner that they are mutually exclusive and collectively exhaustive.

Ex. Employees of Organization X by level of education.

No    Name of Employee   Level of Education
1     Alemu              Diploma
2     Yalew              Certificate
3     Chala              Bachelor
4     Toga               MSc.
.     .                  .
.     .                  .
60    Gebremedhin        PhD

Present the data using a categorical distribution.

Solution: First, we have to group the levels of education into mutually exclusive and collectively exhaustive categories, so that each employee belongs to exactly one category.

Second, we have to count the number of employees belonging to each category.

Categorical distribution of Employees of Organization X by LOE


Education     Number
Certificate   5
Diploma       15
Bachelor      25
Master        12
PhD           3
Total         60

Pie-chart: is a circular diagram used to display the percentage of the total number of measurements falling into each category. The 360 degrees of the circle are divided among the categories in proportion to the share of each category in the total data.

Ex. Present the data of employees of Organization X by LOE using a pie-chart.

Solution: First, calculate the share of each category in the total; second, convert each share into the number of degrees of the circle that the category should occupy (share × 360).

Employees of Organization X by LOE

Education     Number   Percentage   Degree of a category
Certificate   5        8.33         30
Diploma       15       25           90
Bachelor      25       41.67        150
Master        12       20           72
PhD           3        5            18
Total         60       100          360
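The percentage and degree columns of the pie-chart table can be reproduced with a few lines of code. This is a minimal sketch; the variable names are illustrative, and the counts are the ones given in the categorical distribution above.

```python
# Number of employees in each level-of-education category (from the example).
counts = {"Certificate": 5, "Diploma": 15, "Bachelor": 25, "Master": 12, "PhD": 3}
total = sum(counts.values())

# Share of each category as a percentage of the total.
percentages = {k: round(100 * v / total, 2) for k, v in counts.items()}

# Angle of each sector: the same share applied to the 360 degrees of the circle.
degrees = {k: round(360 * v / total, 2) for k, v in counts.items()}

for category in counts:
    print(category, counts[category], percentages[category], degrees[category])
```

The angles add up to 360 degrees, which is a quick check that no category has been left out or counted twice.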

2.2 DATA ANALYTICAL TECHNIQUES

REGRESSION AND CORRELATION ANALYSIS

Very often data are given in pairs of measurements where one variable is dependent on the other variable.

Ex.
Income and years of service of workers
Saving and family size
Food consumption and weight of people
University and high school level performance
Etc.

Regression and correlation analysis show us how to determine both the nature and the strength of the relationship between a series of paired observations of two or more variables. Regression deals with the mathematical method that depicts the relationship, while correlation is concerned with measuring and expressing the closeness of the relationship between variables.

2.2.1 Correlation Analysis


Correlation is the degree of relationship that exists between two or more variables. It is the measure of the degree of co-variation or association between two or more variables. Two variables are said to be correlated if an increase or a decrease in one variable is accompanied, on average, by an increase or decrease in the other; otherwise they are uncorrelated.

Types of Correlation: There are different types of correlation, depending on the nature of the relationship between the variables. Correlation may be:

1. Positive or negative
2. Simple, partial or multiple
3. Linear or non-linear

Positive or Negative Correlation

If an increase (or a decrease) in one variable is accompanied by an increase (or a decrease) in the other, so that both variables change in the same direction, we have a positive correlation. There are many economic variables which are positively correlated. Some examples include: quantity supplied and price of a commodity, income of a consumer and demand for a normal good, consumption and family size, saving and disposable income, etc.

If an increase (or a decrease) in one variable is accompanied by a decrease (or an increase) in the other, so that the two variables change in opposite directions, we have a negative correlation. There are also several economic variables which are negatively correlated. Some of these include: quantity demanded and price of a commodity, interest rate and investment, saving and family size, supply and price of an input, etc.

Simple, Partial or Multiple Correlations

A correlation is said to be simple if it exists between two variables only. A correlation is said to be partial if it exists between two variables when all other variables connected to those two are kept constant, and a correlation is said to be multiple if it exists between more than two variables. Simple and partial correlations can take any value (positive, zero or negative), but multiple correlations cannot be negative.

Linear or Non-linear Correlation

A correlation is said to be linear if a change in one variable brings, on average, a constant change in the other variable. A correlation is said to be non-linear if a change in one variable brings, on average, a varying change in the other variable.

METHODS OF STUDYING CORRELATION

Correlation is studied by one of the following three methods:

1. The scatter diagram (graphic method)
2. Simple linear correlation coefficient
3. The coefficient of rank correlation

The Scatter Diagram: It is a rectangular diagram which helps us to visualize the relationship between two phenomena. We plot the data on an X-Y plane, starting from the minimum values of the X and Y variables. When there is a strong correlation (either positive or negative) the dots lie close to one another, and as the degree of correlation decreases they become more and more scattered. If the two variables are positively correlated, the points run from the lower left to the upper right of the diagram. If the two variables are negatively correlated, the points run from the upper left to the lower right. If the two variables are uncorrelated, the points do not show any pattern.

Simple Linear Correlation Coefficient: The sample correlation coefficient is denoted by r. It is the measure of the degree of relationship that exists between two variables. It is applied only to linear relationships and to simple correlation. The value of the correlation coefficient cannot be less than -1 and cannot be greater than +1. In other words, the values of the correlation coefficient always range between -1 and +1. A good measure of correlation is one which gives the answer as a pure number, independent of the unit of measurement, and which indicates the direction and extent of the correlation.

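$$r = \frac{\sum xy}{n\,\delta_x\,\delta_y}, \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$

Here $\delta_x$ and $\delta_y$ are the standard deviations of X and Y.

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$$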

If the correlation coefficient lies between -1 and zero, there is a negative correlation. Movement from zero towards -1 increases the degree of negative correlation, and vice versa. If the correlation coefficient takes the value of -1, there is a perfect or exact negative correlation between the two variables.

If the correlation coefficient lies between zero and +1, there is a positive correlation. Movement from zero towards +1 increases the degree of positive correlation, and vice versa. If the correlation coefficient takes the value of +1, there is a perfect or exact positive correlation between the two variables.

If r = 0, there is no linear correlation between the two variables.

Properties of Correlation Coefficient

1. The values of a correlation coefficient range between -1 and +1.

2. The correlation coefficient is symmetric: $r_{YX} = r_{XY}$.

3. The correlation coefficient is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{YX} \cdot b_{XY}}$, taking the sign of the regression coefficients.

4. The correlation coefficient has the same sign as the regression coefficients. If the two regression coefficients are positive, the correlation coefficient will be positive, and vice versa.

5. The correlation coefficient is independent of change of origin and change of scale. By change of origin we mean adding or subtracting any constant from the values of the variables. Independence of change of origin indicates that adding or subtracting any constant value from the values of the two variables does not change the correlation coefficient. By change of scale we mean multiplying or dividing the values of the two variables by any positive constant. Independence of change of scale indicates that multiplying or dividing the values of the two variables by any positive constant does not change the correlation coefficient.

Example: Calculate Karl Pearson’s correlation coefficient from the following paired data

X: 28 41 40 38 35 33 40 32 36 33
Y: 23 34 33 34 30 26 28 31 36 38

Solution: We can apply the formula for computing the correlation coefficient after we compute the arithmetic means of the two variables. We can also compute the correlation coefficient by using property five. Let us subtract 35 from all values of X and 31 from all values of Y and compute the coefficient.
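As a cross-check on the change-of-origin shortcut described here, the coefficient can be computed both from the raw values and from the shifted values. The sketch below assumes plain Python and uses the raw-sums formula for r; the function name `pearson_r` is an illustrative choice, not something defined in these notes.

```python
from math import sqrt

# Paired data from the example.
X = [28, 41, 40, 38, 35, 33, 40, 32, 36, 33]
Y = [23, 34, 33, 34, 30, 26, 28, 31, 36, 38]


def pearson_r(xs, ys):
    """Karl Pearson's r computed from raw sums (no means needed)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


r_raw = pearson_r(X, Y)
# Change of origin: subtract 35 from every X and 31 from every Y.
r_shifted = pearson_r([x - 35 for x in X], [y - 31 for y in Y])

print(round(r_raw, 2), round(r_shifted, 2))
```

Both calls give the same value, about 0.44, which illustrates that r does not depend on the origin chosen.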

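X     Y     (X-35)   (X-35)²   (Y-31)   (Y-31)²   (X-35)(Y-31)
28    23    -7       49        -8       64        56
41    34    6        36        3        9         18
40    33    5        25        2        4         10
38    34    3        9         3        9         9
35    30    0        0         -1       1         0
33    26    -2       4         -5       25        10
40    28    5        25        -3       9         -15
32    31    -3       9         0        0         0
36    36    1        1         5        25        5
33    38    -2       4         7        49        -14
Total       6        162       3        195       79

Because 35 and 31 are assumed origins rather than the exact means ($\bar{X} = 35.6$, $\bar{Y} = 31.3$), the sums of the deviations are not zero, so the general formula is applied to $u = X - 35$ and $v = Y - 31$:

$$r = \frac{n\sum uv - \sum u \sum v}{\sqrt{\left(n\sum u^2 - (\sum u)^2\right)\left(n\sum v^2 - (\sum v)^2\right)}} = \frac{10(79) - (6)(3)}{\sqrt{\left(10(162) - 6^2\right)\left(10(195) - 3^2\right)}} = \frac{772}{\sqrt{1584 \times 1941}}$$

$r \approx 0.44$, which is a moderate positive correlation.

Rank Correlation Coefficient

Karl Pearson's coefficient of correlation cannot be used in cases where direct quantitative measurement of the phenomenon under study is not possible. In such cases one may rank the different items and apply Spearman's method of rank differences to find the degree of correlation. The rank correlation coefficient is denoted by R. Its value also ranges from -1 to +1.

$$R = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$

where $D_i$ is the difference between the two ranks given to item i and n is the number of items ranked.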

Ex. A group of workers of a factory are ranked according to their efficiency by two judges as follows:

Name of Worker   Judgment of Judge A   Judgment of Judge B
A                4                     3
B                8                     9
C                6                     6
D                7                     5
E                1                     1
F                3                     2
G                2                     4
H                5                     7
I                10                    8
J                9                     10

Compute the rank correlation coefficient and interpret your result.

Solution:

Name of Worker   R1   R2   Di   Di²
A                4    3    1    1
B                8    9    -1   1
C                6    6    0    0
D                7    5    2    4
E                1    1    0    0
F                3    2    1    1
G                2    4    -2   4
H                5    7    -2   4
I                10   8    2    4
J                9    10   -1   1

$\sum D_i^2 = 20$

$$R = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(20)}{10(100 - 1)} = 1 - \frac{120}{990} = 0.88$$

Therefore, we can interpret the result as showing a high degree of agreement between the opinions of the two judges with regard to the efficiency of the workers.
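The rank correlation in this example can be recomputed directly from the two sets of ranks. This is a minimal sketch that assumes there are no tied ranks (as in the example); the function name `rank_correlation` is illustrative.

```python
# Ranks given by the two judges to workers A to J (from the example).
judge_a = [4, 8, 6, 7, 1, 3, 2, 5, 10, 9]
judge_b = [3, 9, 6, 5, 1, 2, 4, 7, 8, 10]


def rank_correlation(ranks_1, ranks_2):
    """Spearman's R = 1 - 6*sum(D^2) / (n*(n^2 - 1)), assuming no tied ranks."""
    n = len(ranks_1)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


sum_d2 = sum((a - b) ** 2 for a, b in zip(judge_a, judge_b))
R = rank_correlation(judge_a, judge_b)
print(sum_d2, round(R, 2))
```

Only the squared rank differences enter the formula, so the sign of each individual difference does not affect R.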

2.2.2 Regression Analysis

Regression describes the average relationship between variables, in the sense that a change in one or more variables brings a certain change in another variable. A variable or group of variables that causes the change in the other variable is called an independent or explanatory variable. A variable which is affected by the change in other variables is called a dependent or explained variable. Regression describes the cause and effect relationship among variables. A regression can be simple or multiple, and it can also be linear or non-linear. A regression is said to be simple if it studies the relationship between one independent and one dependent variable, while a regression is said to be multiple if it studies the relationship between one dependent and more than one independent variable. A regression is said to be linear if a change in an independent variable brings a constant change in the dependent variable, while a regression is said to be non-linear if a change in an independent variable brings a non-constant change in the dependent variable.

Estimating the Parameters of a Function

Ordinary Least Squares Estimating Method (OLS): There are different methods of estimating the unknown parameters of a regression function. Ordinary least squares is the most prominent of these and is frequently used by statisticians because of its simplicity and because it has the desirable statistical properties that a reliable estimator should have. For simplicity, we will discuss here only simple linear regression, in which there are two variables in the model (one dependent and one independent variable) and the function is linear.

$Y_i = \beta_0 + \beta_1 X_i + U_i$. There are five elements in this function: $Y$, $X$, $\beta_0$, $\beta_1$ and $U_i$. Y, X and U are known as variables, and $\beta_0$ and $\beta_1$ are known as parameters. Y and X are observed variables whose values are collected from the field or from secondary sources. Since $\beta_0$, $\beta_1$ and $U_i$ are not observed, the above function cannot be estimated as it is. Thus, we replace the unobserved elements with their sample estimators:

$$Y_i = b_0 + b_1 X_i + e_i$$

The estimated regression line is given by the equation

$$\hat{Y}_i = b_0 + b_1 X_i$$

The difference between the actual value of Y and its estimated value ($\hat{Y}_i$) is the error term, which is given as $e_i = Y_i - \hat{Y}_i$. Replacing $\hat{Y}_i$ by $b_0 + b_1 X_i$ gives the equation for the error term:

$$e_i = Y_i - b_0 - b_1 X_i$$

The ordinary least squares method is designed to compute the estimated values of the parameters in such a way that the sum of the squared error terms is the minimum possible. To do so, we first form the sum of squares of the error terms and then apply the classical optimization criteria for minimization:

$$\sum e_i^2 = \sum (Y_i - b_0 - b_1 X_i)^2$$

We then partially differentiate this function with respect to $b_0$ and $b_1$ and set each derivative equal to zero to find the critical point.

$$\frac{\partial \sum e_i^2}{\partial b_0} = 2\sum (Y_i - b_0 - b_1 X_i)(-1) = 0$$

$$\frac{\partial \sum e_i^2}{\partial b_1} = 2\sum (Y_i - b_0 - b_1 X_i)(-X_i) = 0$$

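After some algebraic manipulation, we get the values of the parameter estimators that minimize the sum of squared error terms:

$$b_0 = \bar{Y} - b_1\bar{X}$$

$$b_1 = \frac{\sum xy}{\sum x^2} \quad \text{or} \quad b_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$

$$b_1 = \frac{n\sum XY - \sum Y \sum X}{n\sum X^2 - (\sum X)^2}$$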
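The estimator formulas above can be tried out numerically. The notes give no regression data set at this point, so the sketch below reuses the paired X and Y values from the correlation example purely as an illustration; the function name `ols_fit` is also an assumption. It checks that the fitted line satisfies the two first-order conditions (the residuals sum to zero and are uncorrelated with X).

```python
# Paired data reused from the correlation example, for illustration only.
X = [28, 41, 40, 38, 35, 33, 40, 32, 36, 33]
Y = [23, 34, 33, 34, 30, 26, 28, 31, 36, 38]


def ols_fit(xs, ys):
    """Return (b0, b1) for the least squares line Y = b0 + b1*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n      # b0 = mean(Y) - b1 * mean(X)
    return b0, b1


b0, b1 = ols_fit(X, Y)
residuals = [y - (b0 + b1 * x) for x, y in zip(X, Y)]

print(round(b0, 2), round(b1, 3))
# First-order conditions: both sums should be zero up to rounding error.
print(abs(sum(residuals)) < 1e-9)
print(abs(sum(e * x for e, x in zip(residuals, X))) < 1e-8)
```

For these data the slope is 772/1584, roughly 0.487, and the intercept is roughly 13.95.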

It should also be noted that $b_1 = b_{YX}$, that is, the regression coefficient of Y on X. This regression coefficient can also be given in terms of the correlation coefficient:

$$b_{YX} = r \cdot \frac{\delta_Y}{\delta_X}$$

where r is the correlation coefficient, $\delta_Y$ is the standard deviation of Y and $\delta_X$ is the standard deviation of X.

The inverse of the function $Y = f(X)$ can be written as $X = f^{-1}(Y)$, and its regression is known as inverse regression. The inverse regression is given as

$$X_i = c + dY_i$$

where c and d are parameter estimates. By the same token, c and d are computed as follows:

$$c = \bar{X} - d\bar{Y}$$

$$d = \frac{\sum xy}{\sum y^2} \quad \text{or} \quad d = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}$$

The regression coefficient d is also known as the regression coefficient of X on Y.

$$d = b_{XY} = r \cdot \frac{\delta_X}{\delta_Y}$$

Properties of Regression Coefficients

1. The regression coefficients are not symmetric. That is, the value of the regression coefficient of Y on X is generally not equal to the value of the regression coefficient of X on Y: $b_{YX} \neq b_{XY}$.

2. The regression coefficients must have the same sign. If the regression coefficient of Y on X is negative, then so is the regression coefficient of X on Y, and vice versa.

3. If one of the regression coefficients is greater than one in absolute value, the other must be less than one in absolute value, because their product equals $r^2$, which cannot exceed one.

4. Regression coefficients are independent of change of origin but not independent of change of scale.
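The four properties can be checked numerically. The sketch below again borrows the paired data from the correlation example as an illustration; the helper name `slope` and the particular shift (35, 31) and scale factor (10) are assumptions chosen only for the demonstration.

```python
X = [28, 41, 40, 38, 35, 33, 40, 32, 36, 33]
Y = [23, 34, 33, 34, 30, 26, 28, 31, 36, 38]


def slope(dep, ind):
    """Regression coefficient of the dependent variable on the independent one."""
    n = len(dep)
    numerator = n * sum(d * i for d, i in zip(dep, ind)) - sum(dep) * sum(ind)
    denominator = n * sum(i * i for i in ind) - sum(ind) ** 2
    return numerator / denominator


b_yx = slope(Y, X)          # regression coefficient of Y on X
b_xy = slope(X, Y)          # regression coefficient of X on Y
r_squared = b_yx * b_xy     # product of the two coefficients equals r^2

# Change of origin: subtracting constants leaves the coefficient unchanged.
b_yx_shifted = slope([y - 31 for y in Y], [x - 35 for x in X])

# Change of scale: multiplying X by 10 divides b_yx by 10.
b_yx_scaled = slope(Y, [10 * x for x in X])

print(round(b_yx, 3), round(b_xy, 3), round(r_squared, 3))
print(round(b_yx_shifted, 3), round(b_yx_scaled, 4))
```

The two coefficients differ in value but share the same sign, and their product stays below one, in line with properties 1 to 3.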

The Coefficient of Determination

After we estimate the unknown parameters, we have to check to what extent the estimators are reliable representatives of the parameters. We can test them using either a t-test or a z-test. The second test, which helps us in testing the goodness of fit, is the coefficient of determination.

The coefficient of determination is the measure of the explanatory power of the model. It measures the proportion or percentage of the total variation of the dependent variable explained or determined by the model. The coefficient of determination is denoted by the symbol $R^2$.

UNIT FIVE

TIME SERIES ANALYSIS

5.1 Introduction

Time series Analysis

Generally, four types of data may be available for empirical analysis: time series, cross-section, panel and pooled (combination of time series and cross section) data. A time series is a set of observations on the values that a variable takes at different times. Cross section data are data on one or more variables collected at the same point in time.

There are two major methods of analyzing time series data: Conventional and econometric methods. Econometric method of analysis can also be divided into two; frequency domain approach or spectral analysis and time domain approach. For ease of understanding, we are going to discuss the conventional method of time series analysis.

A time series is a set of observations taken at specified times, usually at equal intervals. Mathematically, a time series is defined by the values Y1, Y2, . . . , Yt; thus Y is a function of time, symbolically Y = f(t). Thus, when we observe numerical data at different points of time, the set of observations is known as a time series. A good example is the production of teff in each production year.

Role of Time series Analysis

Time series analysis is of great significance in business decision making for the following reasons:

1. It helps in the understanding of past behavior. By observing data over a period of time, one can easily understand what changes have taken place in the past. Such an exercise is important in understanding and predicting the future.

2. It helps in planning future operations. If the regularity of occurrence of any feature over a sufficiently long period can be clearly established, then prediction of probable future variations becomes possible.

3. It helps in evaluating current accomplishments. Time series analysis helps in comparing actual performance with expected performance and in analyzing the cause of any variation.

4. It facilitates comparison – Different time series are often compared and important conclusions drawn from them.

5.2 Components of Time Series


Time series elements are classified into four basic types of variations which account for the changes in the series over a period of time. These four types of patterns, variations or movements are often called components or elements of a time series. These are:

1) Secular trend
2) Seasonal variations
3) Cyclical variations
4) Irregular variations

In traditional or classical time series analysis, it is ordinarily assumed that there is a multiplicative relationship between these four components. That is, it is assumed that any particular value in a series is the product of factors that can be attributed to the various components. Symbolically, it is given as:

Y = T × S × C × I

where T = Trend, S = Seasonal, C = Cyclical and I = Irregular.

If the above model is employed, the seasonal, cyclical and irregular items are not viewed as absolute amounts, but rather as relative magnitudes.

1. Secular Trend

Trend is the variation in the value of a variable that can be observed over a long period of time. It is the general tendency of the data to grow or to decline over a long period of time. Trend is broadly divided under two heads: linear trends (which we are going to see) and non-linear trends.

Methods of measuring Trend

The following methods are used for measuring trend:

1) Graphic method
2) The semi-average method
3) The method of least squares

Graphic method: This is the simplest method of studying trend. Under this method the given data are plotted on graph paper and a trend line is fitted to the data just by inspecting the graph of the series. There is no formal statistical criterion whereby the adequacy of such a line can be judged, and the judgment depends on the discretion of the individual researcher. However, as a rough guide, the line should be drawn so that it passes between the plotted points in such a manner that the fluctuations in one direction are approximately equal to those in the other direction, and so that it shows the general movement. This method is not frequently used since its approach is subjective and no statistical method is used.

Method of semi-averages: In this method the given data are divided into two parts, preferably with an equal number of years. For example, if we are given data from 1982 to 1999, that is, over a period of 18 years, the two equal parts will be the first nine years, from 1982 to 1990, and the last nine years, from 1991 to 1999. In the case of an odd number of years such as 9, 13 or 17, two equal parts can be made simply by ignoring the middle year. For example, if the data are given for 19 years from 1981 to 1999, the two equal parts would be from 1981 to 1989 and from 1991 to 1999; the middle year 1990 would be ignored.

Example: Fit a trend line to the following data by the method of semi-averages:

Year   Sales
1994   102
1995   105
1996   114
1997   110
1998   108
1999   116
2000   112

Solution: Since seven years are given, the middle year is omitted and the averages of the first three years and of the last three years are obtained. The average of the first three years is

$$\frac{102 + 105 + 114}{3} = \frac{321}{3} = 107$$

and the average of the last three years is

$$\frac{108 + 116 + 112}{3} = \frac{336}{3} = 112$$

Thus, we get two points, 107 and 112, which shall be plotted corresponding to their respective middle years, i.e. 1995 and 1999. By joining these two points; we obtain the required trend line.

[Figure: semi-average trend line joining the points (1995, 107) and (1999, 112). Vertical axis: Y; horizontal axis: Time, 1994 to 2000.]
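The two semi-average points of this example can be computed as follows. This is a minimal sketch: the function name `semi_averages` is an illustrative choice, and the slope of the line joining the two points is an extra step that the notes leave to the graph.

```python
# Sales data from the semi-average example.
years = [1994, 1995, 1996, 1997, 1998, 1999, 2000]
sales = [102, 105, 114, 110, 108, 116, 112]


def semi_averages(years, values):
    """Return the two (middle year, average) points of the semi-average method.

    With an odd number of years the middle year is ignored.
    """
    n = len(values)
    half = n // 2
    first_years, first_values = years[:half], values[:half]
    last_years, last_values = years[n - half:], values[n - half:]
    point_1 = (sum(first_years) / half, sum(first_values) / half)
    point_2 = (sum(last_years) / half, sum(last_values) / half)
    return point_1, point_2


point_1, point_2 = semi_averages(years, sales)
# Annual change implied by the straight line through the two points.
annual_change = (point_2[1] - point_1[1]) / (point_2[0] - point_1[0])

print(point_1, point_2, annual_change)
```

The two points are plotted against the middle years of their halves, and the line through them is the semi-average trend line.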

Method of least squares

This method is the most widely used in practice. When this method is applied, a trend line is fitted to the data in such a manner that the following two conditions are satisfied:

1) $\sum (Y - Y_C) = 0$. The sum of the deviations of the actual values of Y from the computed values of Y is zero.

2) $\sum (Y - Y_C)^2$ is the least, that is, the sum of the squares of the deviations of the actual values from the computed values is the least.

The method of least squares can be used to fit either a straight line trend or a parabolic trend. The straight line trend is represented by the equation

$$Y_C = a + bX$$

In order to determine the values of the constants a and b, the following two normal equations are to be solved:

$$\sum Y = na + b\sum X$$

$$\sum XY = a\sum X + b\sum X^2$$

where n represents the number of years and X is the time period.

We can measure the variable X from any point of time as origin, such as the first year. However, the calculations are very much simplified when the midpoint in time is taken as the origin, because in that case the negative values in the first half of the series balance the positive values in the second half, so that $\sum X = 0$. The above two normal equations then take the form:

$$\sum Y = na$$

$$\sum XY = b\sum X^2$$

so that

$$a = \frac{\sum Y}{n}, \qquad b = \frac{\sum XY}{\sum X^2}$$

The constant 'a' gives the arithmetic mean of Y and the constant 'b' indicates the rate of change.

Example: Based on the following figures of production of a sugar factory (in thousand quintals), fit a straight line trend and estimate the likely production of the factory in 1990.

Year         1983   1984   1985   1986   1987   1988   1989
Production   80     90     92     83     94     99     92

Solution Year Production (Y) Time (X) XY X2

1983 80 -3 -240 91984 90 -2 -180 41985 92 -1 - 92 11986 83 0 0 01987 84 1 94 11988 99 2 198 41989 92 3 276 9

Y=630 X=0 XY=56 X2=28

$Y_c = a + bX$

$a = \frac{\sum Y}{n} = \frac{630}{7} = 90$

$b = \frac{\sum xY}{\sum x^2} = \frac{56}{28} = 2$

$Y_c = 90 + 2X$
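The fit above can be checked numerically. The following is a minimal sketch, assuming an odd number of consecutive years so that the middle year can serve as the origin; the function names `fit_centered_trend` and `trend_value` are illustrative and not part of the original notes.

```python
def fit_centered_trend(years, values):
    """Fit Y_c = a + b*x, with x measured from the middle year.

    Assumes an odd number of consecutive years, so that sum(x) == 0
    and the normal equations reduce to a = sum(Y)/n, b = sum(xY)/sum(x^2).
    """
    n = len(years)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of years")
    origin = years[n // 2]                       # middle year is the origin
    x = [year - origin for year in years]        # deviations: ..., -1, 0, 1, ...
    a = sum(values) / n
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
    return origin, a, b


def trend_value(year, origin, a, b):
    """Computed trend value Y_c for any year, inside or outside the sample."""
    return a + b * (year - origin)


# Sugar factory production (thousand quintals), as in the example above.
years = [1983, 1984, 1985, 1986, 1987, 1988, 1989]
production = [80, 90, 92, 83, 94, 99, 92]

origin, a, b = fit_centered_trend(years, production)
print(origin, a, b)                        # origin year, intercept, slope
print(trend_value(1990, origin, a, b))     # extrapolated trend value for 1990
```

Taking the middle year as origin is only a computational convenience; the fitted line itself does not depend on the choice of origin.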

Forecasting for the year 1990: since 1990 is four years later than the base year 1986, x = 4. Therefore, we have to find the value of $Y_c$ when x = 4:

$Y_c = 90 + 2(4) = 98$

i.e. the likely production of the sugar factory in 1990 is 98,000 quintals.

Example 2: Calculate the trend values by the method of least squares from the data given below and estimate the sales for the year 2003.

Year    1996  1997  1998  1999  2000
Sales     12    18    20    23    27

Solution

Year    Sales (Y)   Time (X)     XY     X²     Y_c
1996       12          -2        -24     4     13
1997       18          -1        -18     1     16.5
1998       20           0          0     0     20
1999       23           1         23     1     23.5
2000       27           2         54     4     27
Total   ∑Y = 100     ∑X = 0   ∑XY = 35  ∑X² = 10

$Y_c = a + bX$

$a = \frac{\sum Y}{n} = \frac{100}{5} = 20$

$b = \frac{\sum xY}{\sum x^2} = \frac{35}{10} = 3.5$

$Y_c = 20 + 3.5X$

The forecast of sales for 2003 is obtained by substituting x = 5, since 2003 is five years later than the base year 1998:

$Y_{2003} = 20 + 3.5(5) = 37.5$

Example 3: Fit a straight line trend to the following data.

Year         1995  1996  1997  1998  1999  2000
Production     64    70    75    82    88    95

Year    Production (Y)   Time (X)      XY      X²
1995         64             -3        -192      9
1996         70             -2        -140      4
1997         75             -1         -75      1
1998         82              0           0      0
1999         88              1          88      1
2000         95              2         190      4
Total     ∑Y = 474       ∑X = -3   ∑XY = -129  ∑X² = 19

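Here the number of years is even and 1998 is taken as the origin, so $\sum X = -3 \neq 0$ and the two normal equations must be solved simultaneously:

$\sum Y = na + b\sum X$

$\sum XY = a\sum X + b\sum X^2$

474 = 6a - 3b      (1)
-129 = -3a + 19b   (2)

Multiplying equation (2) by 2 gives -258 = -6a + 38b. Adding this to equation (1):

216 = 35b

$b = \frac{216}{35} = 6.17$

Substituting b in equation (1):

474 = 6a - 3(6.17)
6a = 474 + 18.51 = 492.51

$a = \frac{492.51}{6} = 82.085$

$Y_c = 82.085 + 6.17X$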
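When the origin is not the midpoint, as in Example 3, the two normal equations can be solved directly. The sketch below does this by elimination for any choice of X; the function name `fit_trend` is illustrative only.

```python
def fit_trend(x, y):
    """Solve the two normal equations
         sum(Y)  = n*a      + b*sum(X)
         sum(XY) = a*sum(X) + b*sum(X^2)
    for a and b, without assuming sum(X) == 0.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all X values are identical; no unique trend line")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Example 3: origin at 1998, so X runs from -3 to 2.
x = [-3, -2, -1, 0, 1, 2]
production = [64, 70, 75, 82, 88, 95]

a, b = fit_trend(x, production)
print(round(a, 3), round(b, 3))
```

The result agrees with the hand calculation up to rounding, because the hand calculation rounds b to 6.17 before computing a.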

Seasonal variations

Seasonal variations are periodic movements in business activity which occur regularly every year and have their origin in the nature of the year itself. They appear only when data are recorded for periods shorter than a year (semi-annually, quarterly, monthly, weekly, daily, etc.); they do not appear in data given on an annual basis or at longer intervals. Nearly every type of business activity is liable to seasonal influence to a greater or lesser degree and, as such, these variations are regarded as a normal phenomenon recurring every year. Although the word 'seasonal' seems to imply a connection with the seasons of the year, the term is meant to include any kind of variation which is of a periodic nature and whose repeating cycles are of relatively short duration. The factors that cause seasonal variations are:

1) Climate and weather conditions. The most important factor causing seasonal variation is the climate. Changes in climate and weather conditions such as rainfall, humidity, heat, etc. act on different products and industries differently.

2) Customs, traditions and habits. Though nature is mainly responsible for seasonal variations in time series, customs and traditions also have their impact.

Measurement of seasonal variations

When data are expressed annually there is no seasonal variation. However, monthly or quarterly data frequently exhibit strong seasonal movements, and considerable interest attaches to devising a pattern of average seasonal variation. There are several methods of measuring seasonal variation. The following methods are popularly used in practice:

1. Method of simple averages
2. Ratio to trend method
3. Ratio to moving average method
4. Link relatives method

Method of simple averages

This is the simplest method of obtaining a seasonal index. The following steps are necessary for computing the index:

1) Arrange the unadjusted data by years and by months (or by quarters if the data are given quarterly).
2) Find the total of the data for each month, quarter or other period in which the data are given.
3) Divide each total by the number of years for which data are given. This gives the monthly averages.
4) Obtain the average of the monthly averages by dividing the total of the monthly averages by 12.
5) Taking the average of the monthly averages as 100, express each monthly average as a percentage of it:

$\text{Seasonal index for January} = \frac{\text{Monthly average for January}}{\text{Average of monthly averages}} \times 100$

Example: Monthly consumption of electric power (in kWh) for street lighting at Haramaya University from 1995 to 1999.

Year   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1995   318  281  278  250  231  216  223  245  269  302  325  347
1996   342  309  299  268  249  236  242  262  288  321  342  364
1997   367  328  320  287  269  251  259  284  309  345  367  394
1998   392  349  342  311  290  273  282  305  328  364  389  417
1999   420  378  370  334  314  296  305  330  356  396  422  452


Find the seasonal variation by the method of monthly averages.

Solution:

Month     1995  1996  1997  1998  1999    Total   Average   Index (%)
Jan        318   342   367   392   420     1839    367.8      116.1
Feb        281   309   328   349   378     1645    329.0      103.9
Mar        278   299   320   342   370     1609    321.8      101.6
Apr        250   268   287   311   334     1450    290.0       91.6
May        231   249   269   290   314     1353    270.6       85.4
Jun        216   236   251   273   296     1272    254.4       80.3
Jul        223   242   259   282   305     1311    262.2       82.8
Aug        245   262   284   305   330     1426    285.2       90.1
Sep        269   288   309   328   356     1550    310.0       97.9
Oct        302   321   345   364   396     1728    345.6      109.1
Nov        325   342   367   389   422     1845    369.0      116.5
Dec        347   364   394   417   452     1974    394.8      124.7
Total                                     19002   3800.4     1200.0
Average                                  1583.5    316.7      100.0

$\text{Seasonal index for January} = \frac{367.8}{316.7} \times 100 = 116.1$

$\text{Seasonal index for February} = \frac{329}{316.7} \times 100 = 103.9$

$\text{Seasonal index for July} = \frac{262.2}{316.7} \times 100 = 82.8$
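The five steps of the method of simple averages can be sketched in a few lines. The data are the street lighting figures from the example; the function name `simple_average_indices` is illustrative only.

```python
def simple_average_indices(data):
    """Seasonal indices by the method of simple averages.

    data maps each period (month or quarter) to its figures, one per year.
    Each period average is expressed as a percentage of the average of
    all period averages.
    """
    averages = {period: sum(v) / len(v) for period, v in data.items()}
    grand_average = sum(averages.values()) / len(averages)
    return {period: 100 * avg / grand_average
            for period, avg in averages.items()}


# Electric power consumption (kWh), 1995 to 1999, by month.
consumption = {
    "Jan": [318, 342, 367, 392, 420], "Feb": [281, 309, 328, 349, 378],
    "Mar": [278, 299, 320, 342, 370], "Apr": [250, 268, 287, 311, 334],
    "May": [231, 249, 269, 290, 314], "Jun": [216, 236, 251, 273, 296],
    "Jul": [223, 242, 259, 282, 305], "Aug": [245, 262, 284, 305, 330],
    "Sep": [269, 288, 309, 328, 356], "Oct": [302, 321, 345, 364, 396],
    "Nov": [325, 342, 367, 389, 422], "Dec": [347, 364, 394, 417, 452],
}

indices = simple_average_indices(consumption)
for month, value in indices.items():
    print(month, round(value, 1))
```

Because each index is a percentage of the average of the monthly averages, the twelve indices always sum to 1200 and no further adjustment is needed with this method.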

Ratio to trend method

This method of calculating a seasonal index is relatively simple and yet an improvement over the method of simple averages explained in the preceding section. The method assumes that the seasonal variation for a given month is a constant fraction of the trend. It first eliminates the trend component by dividing the original data by the trend values:

$\frac{T \times S \times C \times I}{T} = S \times C \times I$

Random elements are supposed to disappear when the ratios are averaged. A careful selection of the period of years used in the computation is expected to cause the influences of prosperity and depression to offset each other and thus remove the cycle.


This method requires the following steps:

1. Compute the trend values by applying the method of least squares.
2. Divide the original data, month by month, by the corresponding trend values and multiply the ratios by 100. The values obtained are now free from trend.
3. In order to free the values from irregular and cyclical movements, average the ratios obtained for the various years for January, February, etc.
4. Express the seasonal index for each month as a percentage of the average month. The sum of the 12 values must equal 1,200, i.e. an average of 100 per month. If it does not, an adjustment is made by multiplying each index by a suitable factor (1,200 divided by the sum of the 12 values). This gives the final seasonal index.

Example: Find the seasonal variations by the ratio to trend method from the data given below.

Year   1st quarter   2nd quarter   3rd quarter   4th quarter
1996       30            40            36            34
1997       34            52            50            44
1998       40            58            54            48
1999       54            76            68            62
2000       80            92            86            82

Solution: To determine the seasonal variation by the ratio to trend method, we first determine the trend of the yearly data and then convert it to quarterly trend values. First calculate the yearly trend values:

Year    Yearly total   Yearly average (Y)   Time (X)     XY      X²    Trend value
1996        140               35               -2        -70      4        32
1997        180               45               -1        -45      1        44
1998        200               50                0          0      0        56
1999        260               65                1         65      1        68
2000        340               85                2        170      4        80
Total                      ∑Y = 280          ∑X = 0   ∑XY = 120  ∑X² = 10

$Y_c = a + bX$

$a = \frac{\sum Y}{n} = \frac{280}{5} = 56$

$b = \frac{\sum xY}{\sum x^2} = \frac{120}{10} = 12$

$\text{Quarterly increment} = \frac{12}{4} = 3$

Calculation of quarterly trend values

Consider 1997. The trend value for 1997 (44) is the trend value at the middle of the year, which falls halfway between the 2nd and the 3rd quarters. Therefore, the trend value of the 2nd quarter is 44 - 3/2 = 42.5 and the trend value of the 3rd quarter is 44 + 3/2 = 45.5. After this, subtract 3 from the 2nd quarter trend value to get the trend value of the 1st quarter (39.5), and add 3 to the 3rd quarter trend value to get the trend value of the 4th quarter (48.5).

Quarterly trend values

Year   1st quarter   2nd quarter   3rd quarter   4th quarter
1996      27.5          30.5          33.5          36.5
1997      39.5          42.5          45.5          48.5
1998      51.5          54.5          57.5          60.5
1999      63.5          66.5          69.5          72.5
2000      75.5          78.5          81.5          84.5

The ratio to trend values are found by dividing the original data by the corresponding trend values and expressing the result as a percentage.

Quarterly values as percentage of trend values

Year      1st quarter   2nd quarter   3rd quarter   4th quarter
1996         109.1         131.1         107.5          93.1
1997          86.1         122.4         109.9          90.7
1998          77.7         106.4          93.9          79.3
1999          85.0         114.3          97.8          85.5
2000         106.0         117.1         105.5          97.0
Total        463.9         591.3         514.6         445.6
Average       92.78        118.26        102.92         89.12

Since 92.78 + 118.26 + 102.92 + 89.12 = 403.08 is greater than 400, we have to find the correction factor and multiply each seasonal index by it.

$CF = \frac{400}{\text{sum of the averages of the 4 quarters}} = \frac{400}{403.08} = 0.9924$

The adjusted seasonal indices are then:

1st quarter = 92.1
2nd quarter = 117.4
3rd quarter = 102.1
4th quarter = 88.4
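The whole ratio to trend calculation can be sketched as follows. This is an illustration under stated assumptions: an odd number of consecutive years, a straight line trend fitted to the yearly averages, and 1999 figures of 54, 76, 68 and 62, which are the values consistent with the yearly total of 260 and with the percentage table. The function name `ratio_to_trend_indices` is hypothetical.

```python
def ratio_to_trend_indices(data):
    """Seasonal indices by the ratio to trend method.

    data maps year -> list of figures for each period within the year.
    Assumes an odd number of consecutive years (middle year as origin).
    """
    years = sorted(data)
    n = len(years)
    periods = len(data[years[0]])
    mid = years[n // 2]
    x = [year - mid for year in years]
    y = [sum(data[year]) / periods for year in years]      # yearly averages

    # Least squares trend of the yearly averages (sum of x is zero).
    a = sum(y) / n
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    step = b / periods                                     # increment per period

    ratios = [[] for _ in range(periods)]
    for xi, year in zip(x, years):
        yearly_trend = a + b * xi                          # trend at mid-year
        for q, value in enumerate(data[year]):
            # Shift from mid-year to the centre of period q.
            trend = yearly_trend + (q - (periods - 1) / 2) * step
            ratios[q].append(100 * value / trend)

    means = [sum(r) / len(r) for r in ratios]
    factor = 100 * periods / sum(means)                    # correction factor
    return [m * factor for m in means]


quarterly = {
    1996: [30, 40, 36, 34],
    1997: [34, 52, 50, 44],
    1998: [40, 58, 54, 48],
    1999: [54, 76, 68, 62],   # assumed figures, see the note above
    2000: [80, 92, 86, 82],
}

indices = ratio_to_trend_indices(quarterly)
print([round(v, 2) for v in indices])
```

Because the code carries full precision through every step, its results can differ from the hand calculation in the last decimal place, where intermediate values were rounded.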

Ratio to moving average method

The ratio to moving average method is the most widely used method of measuring seasonal variations. The following steps are followed:

1) Compute the centered 12-month moving average from the original data. This moving average contains the trend and cyclical variations (T × C).

2) Express the original data for each month as a percentage of the corresponding centered 12-month moving average.

3) That is, divide each month's figure by the corresponding centered 12-month moving average and list the quotients:

$\frac{T \times S \times C \times I}{T \times C} = S \times I$

4) Compute the average, for each month, of the quotients obtained in step 3. By doing so, the irregular component is removed:

$\frac{S \times I}{I} = S$

The sum of the seasonal indices should be 1200. If the sum is different from 1200, compute the correction factor and multiply each month's seasonal index by it. The correction factor is obtained as

$CF = \frac{1200}{\text{sum of the means for the 12 months}}$
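The steps of the ratio to moving average method can be sketched as below. The notes give no worked data for this method, so the series used here is hypothetical: a constant level of 100 multiplied by a made-up quarterly pattern. The sketch assumes an even period (4 or 12), and the function names are illustrative.

```python
def centered_moving_average(values, period):
    """Centered moving average for an even period (e.g. 4 or 12).

    Each window spans period + 1 observations, with the two end
    observations given half weight. Positions without a full window
    are returned as None.
    """
    half = period // 2
    cma = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        total = sum(window) - 0.5 * (window[0] + window[-1])
        cma[t] = total / period
    return cma


def ratio_to_moving_average_indices(values, period):
    """Seasonal indices: average the ratios Y / CMA per season, then
    rescale so that the indices sum to 100 * period."""
    cma = centered_moving_average(values, period)
    ratios = [[] for _ in range(period)]
    for t, (y, m) in enumerate(zip(values, cma)):
        if m is not None:
            ratios[t % period].append(100 * y / m)     # S x I, in percent
    means = [sum(r) / len(r) for r in ratios]          # averaging removes I
    factor = 100 * period / sum(means)                 # correction factor
    return [m * factor for m in means]


# Hypothetical quarterly series: level 100 times a fixed seasonal pattern.
pattern = [0.9, 1.2, 1.0, 0.9]
series = [100 * s for s in pattern] * 3

indices = ratio_to_moving_average_indices(series, period=4)
print([round(v, 1) for v in indices])
```

For monthly data the same functions would be called with `period=12`, in which case the rescaled indices sum to 1200 as stated above.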

Link Relatives Method

This is also one of the methods of measuring seasonal variations. When this method is adopted, the following steps need to be followed:

1. Calculate the link relatives of the seasonal figures:

$LR = \frac{\text{Current season's figure}}{\text{Previous season's figure}} \times 100$

2. Calculate the average of the link relatives for each season.
3. Convert the averages into chain relatives on the base of the first season, which is taken as 100.
4. Calculate the chain relative of the first season on the base of the last season.
5. For correction, the chain relative of the first season obtained in step 3 (100) is deducted from the chain relative of the first season obtained in step 4. The difference is divided by the number of seasons, and one, two, three, ... times this amount is subtracted from the chain relatives of the second, third, fourth, ... seasons.
6. Express the corrected chain relatives as percentages of their average. These provide the required seasonal indices by the method of link relatives.

Example: Apply the method of link relatives to the following data and calculate the seasonal indices.

Quarter   1998   1999   2000   2001   2002
I          6.0    5.4    6.8    7.2    6.6
II         6.5    7.9    6.5    5.8    7.3
III        7.8    8.4    9.3    7.5    8.0
IV         8.7    7.3    6.4    8.5    7.1

Solution: Calculation of seasonal indices by link relatives

Year                           I        II       III       IV
1998                           -      108.3    120.0    111.5
1999                         62.1     146.3    106.3     86.9
2000                         93.2      95.6    143.1     68.8
2001                        112.5      80.6    129.3    113.3
2002                         77.6     110.6    109.6     88.8
Mean                         86.35    108.28   121.66    93.86
Chain relative              100       108.28   131.73   123.64
Corrected chain relative    100       106.59   128.35   118.57
Seasonal index               88.20     94.01   113.20   104.58

The chain relatives are obtained from the mean link relatives, taking the first quarter as 100:

$\text{II} = \frac{108.28 \times 100}{100} = 108.28$

$\text{III} = \frac{121.66 \times 108.28}{100} = 131.73$

$\text{IV} = \frac{93.86 \times 131.73}{100} = 123.64$

The chain relative of the first quarter calculated on the base of the last quarter is

$\frac{86.35 \times 123.64}{100} = 106.76$

Difference between the two chain relatives of the first quarter = 106.76 - 100 = 6.76

$\text{Difference per quarter} = \frac{6.76}{4} = 1.69$

The corrected chain relatives are obtained by subtracting 1 × 1.69 from the second quarter, 2 × 1.69 from the third quarter and 3 × 1.69 from the fourth quarter:

II: 108.28 - 1.69 = 106.59
III: 131.73 - 3.38 = 128.35
IV: 123.64 - 5.07 = 118.57

The average of the corrected chain relatives is

$\frac{100 + 106.59 + 128.35 + 118.57}{4} = \frac{453.51}{4} = 113.38$

and the seasonal indices are calculated as

$\text{Seasonal index} = \frac{\text{Corrected chain relative}}{113.38} \times 100$

which gives 88.20, 94.01, 113.20 and 104.58 for the four quarters.
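The six steps of the link relatives method can be sketched as follows, using the quarterly data of the example. The function name `link_relative_indices` is illustrative only, and the code keeps full precision throughout, so its output can differ from a hand calculation in the second decimal place.

```python
def link_relative_indices(rows):
    """Seasonal indices by the method of link relatives.

    rows[s][y] is the figure for season s in year y (seasons in order).
    """
    k = len(rows)                       # number of seasons
    n_years = len(rows[0])
    # Chronological series: year by year, season by season.
    series = [rows[s][y] for y in range(n_years) for s in range(k)]

    # Step 1: link relatives (no link relative for the very first figure).
    links = [[] for _ in range(k)]
    for t in range(1, len(series)):
        links[t % k].append(100 * series[t] / series[t - 1])

    # Step 2: mean link relative for each season.
    means = [sum(l) / len(l) for l in links]

    # Step 3: chain relatives, first season = 100.
    chain = [100.0]
    for s in range(1, k):
        chain.append(means[s] * chain[-1] / 100)

    # Step 4: chain relative of the first season via the last season.
    first_via_last = means[0] * chain[-1] / 100

    # Step 5: spread the discrepancy evenly over the seasons.
    d = (first_via_last - 100) / k
    corrected = [chain[s] - s * d for s in range(k)]

    # Step 6: express as percentages of their average.
    average = sum(corrected) / k
    return [100 * c / average for c in corrected]


quarters = [
    [6.0, 5.4, 6.8, 7.2, 6.6],   # I
    [6.5, 7.9, 6.5, 5.8, 7.3],   # II
    [7.8, 8.4, 9.3, 7.5, 8.0],   # III
    [8.7, 7.3, 6.4, 8.5, 7.1],   # IV
]

indices = link_relative_indices(quarters)
print([round(v, 2) for v in indices])
```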

Cyclical Variations

The term cycle refers to recurrent variations in a time series that usually last longer than a year and are regular neither in amplitude nor in length. Cyclical fluctuations are long term movements that represent consistently recurring rises and declines in activity. They result mainly from business cycles. A business cycle consists of the recurrence of the up and down movements of business activity around some sort of statistical trend. There are four well defined periods or phases in the business cycle: prosperity, decline, depression and improvement. The study of cyclical variations is extremely useful in framing suitable policies for stabilizing the level of business activity, i.e. for avoiding periods of boom and depression, as both are bad for the economy.

Measurement of Cyclical Variations


Business cycles are an important type of fluctuation in economic data and receive a lot of attention in the economic literature. Despite their importance, they are the most difficult type of fluctuation to measure, because successive cycles vary widely in timing, amplitude and pattern. For this reason, it is impossible to construct meaningful typical cycle indices or curves similar to those that have been developed for trend and seasonality. The important methods used for measuring cyclical variations are:

1. Residual Method
2. Reference Cycle Analysis Method
3. Direct Method
4. Harmonic Analysis Method

Because of its frequent use and for reasons of time, only the first method is discussed here.

Residual Method: Among all the methods of arriving at estimates of the cyclical movements of a time series, the residual method is the most commonly used. This method consists of eliminating the seasonal variation and then the trend to obtain the cyclical and irregular movements:

$\frac{T \times S \times C \times I}{S} = T \times C \times I$

$\frac{T \times C \times I}{T} = C \times I$

The resulting data are usually smoothed in order to obtain the cyclical movements, which are sometimes termed cyclical relatives since they are always expressed in percentages. Because the cyclical and irregular movements are what remains as a residual after the seasonal and trend components have been removed, this procedure is referred to as the residual method.
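The residual method can be sketched in code as a division by the seasonal index and the trend, followed by a short moving average. The notes give no data for this method, so the quarterly figures below are hypothetical, and the 3-term moving average is only one possible choice of smoothing; both function names are illustrative.

```python
def cyclical_irregular(values, trend, seasonal_indices):
    """Return C x I in percent: original data divided by S and then by T.

    seasonal_indices are given as percentages (100 = average season)
    and are reused cyclically over the series.
    """
    period = len(seasonal_indices)
    result = []
    for t, (y, tr) in enumerate(zip(values, trend)):
        deseasonalized = y / (seasonal_indices[t % period] / 100)   # T x C x I
        result.append(100 * deseasonalized / tr)                    # C x I
    return result


def smooth(series, window=3):
    """Simple moving average used to damp the irregular component.
    Assumes an odd window; the result is shorter than the input."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]


# Hypothetical quarterly data built as T x S x C (no irregular part).
trend = [100, 102, 104, 106, 108, 110, 112, 114]
seasonal = [90, 110, 105, 95]
cycle = [1.04, 1.02, 1.00, 0.98, 0.96, 0.98, 1.00, 1.02]
values = [t * (seasonal[i % 4] / 100) * c
          for i, (t, c) in enumerate(zip(trend, cycle))]

relatives = cyclical_irregular(values, trend, seasonal)
print([round(r, 1) for r in relatives])
print([round(r, 1) for r in smooth(relatives)])
```

In this constructed case the cyclical relatives recover the assumed cycle exactly, since no irregular component was added; with real data the smoothing step only reduces the irregular movements and does not remove them completely.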

Irregular Variations

Irregular variations refer to variations in business activity which do not repeat in a definite pattern. They include all types of variation other than those accounted for by the trend, seasonal and cyclical movements. Irregular movements are considered to be largely random, being the result of chance factors which, like the fall of a coin, are wholly unpredictable. Irregular variations are caused by such special occurrences as floods, earthquakes, strikes and wars. Sudden changes in demand or rapid technological progress may also be included in this category. By their nature, these movements are very irregular and unpredictable. Quantitatively, it is almost impossible to separate the irregular movements from the cyclical movements. Therefore, while analyzing time series, the trend and seasonal variations are measured separately and the cyclical and irregular variations are left together.

Measurement of Irregular Variations

The irregular component in a time series represents the residue of fluctuations after trend, seasonal and cyclical movements have been accounted for. Thus, if the original data are divided by T, S and C, we get I:

$\frac{T \times S \times C \times I}{T \times S \times C} = I$

In practice, the cycle itself is so erratic and so interwoven with irregular movements that it is impossible to separate them. In the analysis of a time series into its components, trend and seasonal movements are usually measured directly, while cyclical and irregular fluctuations are left together after the other elements have been removed.
