
IUT de Saint-Etienne – Département TC – J.F. Ferraris – Math – S2 – Stat2Var – TEx – Rev2018

SALES AND MARKETING Department

MATHEMATICS

2nd Semester

________ Bivariate statistics ________

Tutorials and exercises

Online document: http://jff-dut-tc.weebly.com section DUT Maths S2.


Exercise 1. (Tutorial for lesson page 5)

Is people’s behaviour in relation to tobacco related to their gender, at a 10% significance level?

Here are the results of a survey made on a sample of 51 men and 66 women:

G: variable "gender"          B: variable "behaviour in relation to tobacco"
Gm: men                       Bn: never smoked
Gw: women                     Bs: smoke
                              Bss: stopped smoking

Observed frequencies:

        Gm    Gw
Bn      12    23
Bs      31    26
Bss      8    17

Theoretical frequencies according to H0:

        Gm    Gw
Bn
Bs
Bss

Detailed Chi-squares and total:

        Gm    Gw
Bn
Bs
Bss

1) Place the subtotals and the general total in the first table, and in the second one, identically.

2) Fill the second table (6 central theoretical values) following proportional calculations.

3) Table #3: calculate the six partial Chi-squares, then add them to get the value χ²calc.

4) Test writing:

Null hypothesis:

Observed χ²

Value of the variable χ² between the observed and the theoretical samples: χ²calc =

Rejection area

Significance level: α =

Number of dof: (r-1)(k-1) =

Limit value of the variable χ² beyond which H0 is rejected: χ²lim =

Comparison and decision:
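The manual steps of this tutorial (theoretical frequencies, partial Chi-squares, total, degrees of freedom) can be cross-checked with a short script. This is only a minimal sketch in Python, which is not part of the original handout; the function name `chi2_independence` is hypothetical, and the limit value χ²lim still has to be read from your own χ² table.

```python
# Observed counts from Exercise 1: rows Bn, Bs, Bss; columns Gm, Gw.
observed = [[12, 23], [31, 26], [8, 17]]


def chi2_independence(table):
    """Return (expected, partial, chi2, dof) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Theoretical frequency under H0: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # Partial Chi-square of each cell: (observed - expected)^2 / expected.
    partial = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    chi2 = sum(sum(row) for row in partial)
    # Degrees of freedom: (number of rows - 1) * (number of columns - 1).
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return expected, partial, chi2, dof


expected, partial, chi2_calc, dof = chi2_independence(observed)
print(f"chi2_calc = {chi2_calc:.3f} with {dof} degrees of freedom")
```

The same function can be reused for the tables of exercises 2 to 4 by replacing `observed`.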

Exercise 2.

Two candidates compete in a presidential election: NS and FH. In a small town, there are 500 voters: 100 are retired people, 50 are unemployed and 350 are employees. There, the vote results are:

voters \ candidates     FH     NS    blank/abstention
unemployed              24     16    10
employees              122    148    80
retired                 36     27    37

1) Decide, with a 1% significance level, whether people’s opinion depends on their social group or not.

2) What can we say if we do not include blank votes and abstentions?


Exercise 3.

The table shows attendance in two stores A and B: how many people made at least one purchase. These clients have been sorted by age group (10 to 15 years old, and so on).

age \ store     A     B
10 - 15        46    24
15 - 20        29    35
20 - 40        14    17
> 40           12    18

1) Say, with a 5% significance level, whether the chosen store depends on the age of a client.

2) What age group mostly contributes to the previous result? Explain.

3) Give the meaning of the “5% significance level” on your first answer.

4) According to your χ² table, can you be more precise about the risk taken in this statement (your first answer)?

Exercise 4.

In a survey, 100 people were asked about their age and their cinema attendance. We name X the variable "age" and Y the variable "number of cinema shows attended per year". The survey result is the following table of counts (fr.: citations):

Y \ X        [15 ; 25[   [25 ; 50[   ≥ 50
none             4           6        13
1 to 11         10          16        15
12 to 23        13           8         4
≥ 24             6           3         2

1) By a χ² independence test, with a 2% significance level, decide whether there’s a link or not between the

age and the level of attendance at the cinema.

2) Using the table in your form, discuss the level of confidence you can assign to the assertion: “they are dependent”.

3) Identify the largest partial Chi-squares and explain the meaning of these high values.

Exercise 5. (Tutorial for lesson page 6)

Let’s take a close look at how a company’s turnover evolves over time.

Turnover (M€) per quarter:

        Q1    Q2    Q3    Q4
2009    28    45    49    36
2010    30    44    48    40
2011    28    46    52    37
2012    31    42    54    39

Although there are large seasonal variations, due to the company’s particular activity, is it possible to identify an overall trend over several years?


Let’s calculate and display the moving means, taken over 5 consecutive values:

(do it as group work: share out the calculations with your neighbours and pool your results)

1-5 2-6 3-7 …

X

Y

calculations:
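The moving means can also be checked with a few lines of code. The sketch below is an illustration in Python that is not part of the original handout; it assumes the sixteen quarterly values are numbered 1 to 16 in chronological order, and the names `moving_means` and `centres` are hypothetical.

```python
# Quarterly turnover (M€), 2009 Q1 to 2012 Q4, ranks 1 to 16.
turnover = [28, 45, 49, 36, 30, 44, 48, 40, 28, 46, 52, 37, 31, 42, 54, 39]


def moving_means(values, window=5):
    """Mean of each run of `window` consecutive values (1-5, 2-6, 3-7, ...)."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


means = moving_means(turnover)
# Each mean is attached to the central rank of its window: 3 for 1-5, 4 for 2-6, ...
centres = [i + 3 for i in range(len(means))]

for rank, value in zip(centres, means):
    print(rank, round(value, 1))
```

Plotting the pairs (rank, moving mean) next to the raw series shows how much of the seasonal variation is smoothed out.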

Exercise 6. (Tutorial for lesson page 7)

Let’s return to one of the examples introduced on page 3 of the lesson document: the effect of the amount of fertilizer on the harvested production.

plot #    fertilizer X (kg.ha-1)    harvest Y (q.ha-1)
1         150                       46
2          80                       37
3         120                       46
4         220                       51
5         100                       43

1) For each half-cloud, determine the coordinates of the mean point.

2) Determine the expression of Mayer’s line (G1G2).

3) On a graph, plot the data of the table and draw this line.
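A possible way to check the result is sketched below in Python (not part of the original handout). It sorts the points by X, splits the cloud into two half-clouds, and draws the line through their mean points G1 and G2. With an odd number of points the split is a convention: here the first half-cloud receives the extra point, which is an assumption that may differ from the lesson, so the size can be overridden with the hypothetical parameter `first_size`.

```python
# (fertilizer X, harvest Y) for the five plots of Exercise 6.
points = [(150, 46), (80, 37), (120, 46), (220, 51), (100, 43)]


def mean_point(pts):
    """Mean point of a set of (x, y) pairs."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def mayer_line(pts, first_size=None):
    """Return (G1, G2, a, b) where y = a*x + b is the line through G1 and G2."""
    ordered = sorted(pts)  # sort by increasing X
    if first_size is None:
        # Assumption: with an odd count, the first half-cloud is the larger one.
        first_size = (len(ordered) + 1) // 2
    g1 = mean_point(ordered[:first_size])
    g2 = mean_point(ordered[first_size:])
    a = (g2[1] - g1[1]) / (g2[0] - g1[0])
    b = g1[1] - a * g1[0]
    return g1, g2, a, b


g1, g2, a, b = mayer_line(points)
print("G1 =", g1, "G2 =", g2)
print(f"y = {a:.4f} x + {b:.4f}")
```

Calling `mayer_line(points, first_size=2)` gives the other possible split, which is useful to see how sensitive the line is to this choice.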

Exercise 7.

Determine the expression of Mayer’s line for the data given in exercise 5.

Exercise 8. (Tutorial for lesson page 8)

Calculate or display on your calculator: the means and standard deviations; the covariance.

1) Using the data of exercise 6 (fertilizer/harvest).

2) Using the data of exercise 4 (age / number of cinema shows): take 60 as the average age for the class "50 and more", and 36 as the average number of shows for the class "24 and more".
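For question 1, the calculator results can be compared with the sketch below (Python, not part of the original handout). It assumes population formulas, i.e. division by n for the standard deviations and the covariance; a calculator may also display the sample versions (division by n - 1). The function name `describe` is hypothetical.

```python
from math import sqrt

# Exercise 6 data: fertilizer X (kg/ha) and harvest Y (q/ha).
xs = [150, 80, 120, 220, 100]
ys = [46, 37, 46, 51, 43]


def describe(xs, ys):
    """Means, population standard deviations and covariance of two series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population variance: mean of the squared deviations from the mean.
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    # Covariance: mean of the products of the deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return mean_x, mean_y, sqrt(var_x), sqrt(var_y), cov


mean_x, mean_y, sd_x, sd_y, cov_xy = describe(xs, ys)
print(mean_x, mean_y, round(sd_x, 3), round(sd_y, 3), round(cov_xy, 3))
```

For question 2, each cell of the contingency table would have to be counted as many times as its frequency, using the class centres as values.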

Exercise 9. (Tutorial for lesson page 9)

Let’s consider the following time series: a company’s annual expenses in advertising.

X : year 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013

Y : expense (k€) 41 60 55 66 87 61 90 95 82 120 125 118

The corresponding scatter plot is represented:


Determine the expression of the fitting line of Y on X, using the least squares method; then draw it.
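As a cross-check of the calculator output, the least squares line can be sketched in Python as follows (not part of the original handout). It uses the usual formulas a = Cov(X, Y) / V(X) and b = mean(Y) - a × mean(X); the function name `least_squares` is hypothetical.

```python
# Exercise 9 data: year X and advertising expense Y (k€).
years = list(range(2002, 2014))
expenses = [41, 60, 55, 66, 87, 61, 90, 95, 82, 120, 125, 118]


def least_squares(xs, ys):
    """Return (a, b) of the least squares line y = a*x + b of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products; their ratio equals Cov(X, Y) / V(X).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x  # the line passes through the mean point G
    return a, b


a, b = least_squares(years, expenses)
print(f"y = {a:.4f} x + {b:.2f}")
```

Because X is a year, the intercept b is a large negative number; numbering the years 1 to 12 instead would change b but not the slope a.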

Exercise 10.

500 people who have passed their driving licence exam are sorted in the table below. They are distributed according to the number X of times they took the exam before passing it and the number Y of hours of driving lessons taken before their first attempt.

1) Define a marginal frequency. Then give an example from the table.

2) Briefly describe how to enter the data set in your calculator.

3) Calculate the covariance of the pair (X, Y) and comment concretely on this value.

4) Among those who took between 15 and 25 hours of driving lessons, what proportion passed the exam on the third attempt?

5) Among those who passed the exam on the third attempt, what proportion took between 15 and 25 hours of driving lessons?

Exercise 11.

A sales agent wishes to analyse his (or her) activity and efficiency. For each appointment with a prospect, the length of the product presentation (X, in minutes) and the quantity sold (Y) were recorded. The twelve values inside the table give the number of appointments corresponding to each pair (X, Y).

1) Give the meaning of the frequency "8" found inside the table.

2) Calculate, manually, the average time spent per appointment.

3) Give the covariance of the pair (X, Y).

Exercise 12.

The following table gives the sales price (€) of a piece of equipment and the number of items sold, over 4 years (year 1: 2006).

year rank               1      2      3      4
sales price (€) X     300    210    270    375
# of sold items Y     198    240    222    160

1) Build the scatter plot in an orthogonal coordinate system. The axes must intersect at the point (210, 160); scales: 1 cm for €15 on the x-axis, 1 cm for 10 items on the y-axis.


2) Determine the coordinates of G, mean point of the cloud.

3) a. Determine the expression of the fitting line of Y on X, using the least squares method.

The coefficients will be expressed with 6 significant figures.

b. Draw this regression line on the graph.

4) Which year saw the highest turnover? What was its amount?

going further:

5) Now, we assume that, each year, the number of sold items y and the sales price x are related this way:

y = – 0.498 x + 349. We denote S(x) the turnover achieved by selling y items, €x each.

a. Express S(x) with respect to x.

b. Find the variations of the function S defined in [210 ; 375].

c. Deduce the sales price we would have to set for a fifth year if we want a maximum turnover. How many

items will be sold (round to one unit)? For what turnover?

Exercise 13.

A survey compares people's spending on high-tech equipment with their income. Each column of the table T below gives, for a given French region, the average monthly income of people (X) and their average monthly expense (Y) on high-tech equipment.

region            A       B       C       D       E       F
income X (€)    1550    1620    1770    1850    1930    2000
expense Y (€)     57      61      66      73      76      82

1) Calculate the covariance and then the linear correlation coefficient of the pair (X, Y).

Give an interpretation of both parameters.

2) a. Using your calculator, give the expression of the regression line of Y on X.

b. Obtain the expression of Mayer's line for the series, from the table T.

c. The two lines differ slightly. Find the income for which they both give the same expense. What makes this common point special within the point cloud?
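For question 1, the linear correlation coefficient is the covariance divided by the product of the two standard deviations, r = Cov(X, Y) / (σX σY). The sketch below (Python, not part of the original handout, with the hypothetical function name `correlation`) applies this definition with population formulas so that the calculator value can be checked.

```python
from math import sqrt

# Table T of Exercise 13: income X (€) and expense Y (€).
income = [1550, 1620, 1770, 1850, 1930, 2000]
expense = [57, 61, 66, 73, 76, 82]


def correlation(xs, ys):
    """Return (cov, r): population covariance and linear correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov, cov / (sd_x * sd_y)


cov_xy, r = correlation(income, expense)
print(f"Cov(X, Y) = {cov_xy:.2f}, r = {r:.4f}")
```

Unlike the covariance, r has no unit and always lies between -1 and 1, which is what makes it easier to interpret.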

Exercise 14. (Tutorial for lesson page 12)

Data about the fuel consumption of a motorcycle have been collected (consumption: Y, in L/100 km; speed: X, in km/h):

X    10     20     30    40    50    60    70    80    90
Y    15.2   11.6   9.3   7.8   7     6.6   6.9   8     9.6

The scatter plot clearly shows that a linear regression would be inappropriate to describe how the consumption evolves with the speed. We therefore propose a change of variable.

1) Let’s define the variable T by: T = (X – 60)².

Complete the following table:

T
Y    15.2   11.6   9.3   7.8   7     6.6   6.9   8     9.6

2) Perform a linear regression of Y on T.

3) Deduce the expression of the regression curve for the initial scatter plot.
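The three questions follow one pattern: transform X into T, fit a line of Y on T, then substitute T = (X – 60)² back into the line. A minimal sketch in Python (not part of the original handout; `least_squares` and `consumption_model` are hypothetical names) could look like this:

```python
# Exercise 14 data: speed X (km/h) and consumption Y (L/100 km).
speeds = [10, 20, 30, 40, 50, 60, 70, 80, 90]
consumption = [15.2, 11.6, 9.3, 7.8, 7, 6.6, 6.9, 8, 9.6]

# Question 1: change of variable T = (X - 60)^2.
t_values = [(x - 60) ** 2 for x in speeds]


def least_squares(xs, ys):
    """Return (a, b) of the least squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Question 2: linear regression of Y on T.
a, b = least_squares(t_values, consumption)


# Question 3: regression curve in terms of the original variable X.
def consumption_model(speed):
    return a * (speed - 60) ** 2 + b


print(f"y = {a:.5f} (x - 60)^2 + {b:.3f}")
```

The resulting curve is a parabola whose minimum is reached at 60 km/h, by construction of T.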


Exercise 15. quadratic fitting

A company recorded its profits Y with respect to the quantity X produced and sold:

X (tons)     2     3     5     7    11
Y (k€)      38    55    72    69    24
T

1) Using your calculator, give the linear correlation coefficient between X and Y. Comment.

2) Let’s define the variable T = -(X - 6)².

a. Complete the table.

b. Calculate Cov(T, Y) and then the linear correlation coefficient between both variables.

c. Is a linear fitting of Y on T appropriate?

d. Determine the expression of the fitting line of Y on T, using the least squares method.

e. Deduce an expression of the regression of Y on X.

Exercise 16. quadratic fitting

A market study was conducted on a new type of product. The table below gives, for several proposed sales prices, the number of people willing to pay that price.

unit price (€) X        2     3     4     5     6     7
number of people Y     66    47    34    25    18    14

1) Calculate the covariance of the variables X and Y, then comment on its sign.

2) We set T = X(X - 20).

a. Calculate the linear correlation coefficient between the variables T and Y.

b. Comment on its value.

c. Determine the expression of the fitting line of Y on T, using the least squares method.

d. Deduce an expanded expression of the regression of Y with respect to X.

3) Here we examine the expected turnover (unit selling price × number of sales), assuming that the numbers of citations obtained in the survey are the numbers of units sold.

a. Calculate the turnovers that can be obtained from the initial table.

b. Calculate, for the same values of X, the turnovers CA' given by the formula obtained in question 2)d.

c. What unit selling price should we set to reach the highest turnover?

Exercise 17. inverse fitting

A perfumery, analysing its turnover, relates the quantities sold (Y) to the prices (X) of various perfume brands and models. The results are gathered in the following table:

X, bottle’s price (€)     15     25     30    40    45    60    75    90
Y, # of sold bottles     202    117    107    82    78    60    55    48

Answer the questions beginning with "calculate" by using your calculator’s results.

1) a. Calculate the covariance of X and Y; comment on its sign.

b. Calculate the linear correlation coefficient of X and Y; comment on its value.

2) In order to have a more precise idea of how X and Y are related, we set the change of variable: T = 850 / X.

a. After having calculated the list of values of T in a third list (calculator), justify that the linear correlation between T and Y is excellent.

b. Give the expression of the regression line of Y on T, according to the least squares method.

c. What is the least squares criterion?

d. Deduce from question 2)b a modelled expression of Y with respect to X.

e. According to this model, how many bottles priced at €150 would the perfumery expect to sell?


Exercise 18. (Tutorial for lesson page 13)

Calculate the point estimates, in the given situations.

1) Returning to exercise 9, give an estimate of the expense in 2015.

2) Returning to exercise 6, give an estimate of the quantity of fertilizer that would yield a harvest of 60 q/ha.

3) Returning to exercise 14, give an estimate of the fuel consumption when the speed is 100 km/h.

Exercise 19. (Tutorial for lesson page 13)

Let’s return to exercise 9. We want to estimate the expense for the year 2015 by a 95% confidence interval.

1) a. Compute the values of Y’ from the values of X and the expression of the fitting line;

b. Compute the values of Z by dividing Y by Y’; c. Then give the mean and standard deviation of Z.

2) Give the point estimate of the expense in 2015.

3) Give the coefficient u corresponding to the confidence level.

4) Then, give the confidence interval.
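The preparatory quantities of questions 1 to 3 can be sketched in Python as below (not part of the original handout). Several choices here are assumptions: the fitting line is the least squares line of exercise 9, the standard deviation of Z is the population one, and u is taken as the two-sided quantile of the standard normal distribution. The final interval of question 4 is deliberately left out, since it must follow the formula of the lesson, which is not reproduced in this document.

```python
from statistics import NormalDist, mean, pstdev

# Exercise 9 data: year X and advertising expense Y (k€).
years = list(range(2002, 2014))
expenses = [41, 60, 55, 66, 87, 61, 90, 95, 82, 120, 125, 118]


def least_squares(xs, ys):
    """Return (a, b) of the least squares line y = a*x + b."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


a, b = least_squares(years, expenses)

# 1) a. fitted values Y'; b. ratios Z = Y / Y'; c. mean and standard deviation of Z.
fitted = [a * x + b for x in years]
z = [y / y_fit for y, y_fit in zip(expenses, fitted)]
mean_z, sd_z = mean(z), pstdev(z)  # assumption: population standard deviation

# 2) Point estimate of the expense in 2015.
estimate_2015 = a * 2015 + b

# 3) Coefficient u for a 95% confidence level (assumption: normal distribution).
u = NormalDist().inv_cdf(1 - 0.05 / 2)

print(round(mean_z, 4), round(sd_z, 4), round(estimate_2015, 1), round(u, 2))
```

The same steps apply to exercise 20, with the data of exercise 6 and a 99% confidence level.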

Exercise 20. (Tutorial for lesson page 13)

Using exercise 6, estimate by a 99% confidence interval the harvest obtained with 300 kg/ha of fertilizer.

1) a. Get the values of Y’, from the values of X and the expression of the fitting line;

b. Get the values of Z, by dividing Y by Y’; c. Then, give the mean and standard deviation of Z.

2) Give a point estimate of the harvest.

3) Give the coefficient u corresponding to the confidence level.

4) Then, give the confidence interval.

Exercise 21. (Tutorial for lesson page 13)

For each person in a sample, a survey recorded the age class (X) and the visual acuity (Y, 1/10 = 0.1):

Y \ X    [5 ; 35[   [35 ; 45[   [45 ; 55[   [55 ; 65[
0.3          1           5          10          20
0.6          8          12          25          18
0.9         55          30          14           6

Estimate the visual acuity of an 80-year-old person, by a 99% confidence interval.

Exercise 22.

In a country, two variables are compared: the consumer force index and the turnover of its car industry:

consumer force (index) X        3.26    3.85    3.44    3.08    3.6
car industry turnover (G€) Y    9.3     9.56    9.36    9.24    9.47

1) Give the expression of Mayer’s line of Y on X.

2) By means of a point estimate, give a value of the consumer force that would correspond to a car industry turnover of G€ 10.

3) Is a strong correlation between two variables a sign of a cause and effect relationship between them?

Exercise 23. least square + confidence interval

Monthly revenues of a commercial website are listed below, from January to December 2015:

in k€ : 3 5 4 8 10 9 13 12 17 18 18 21

1) In a few words, describe the least squares method.

2) Using the overall trend of the monthly revenue, give the 95% confidence interval of the predicted revenue in December 2016 (number the months from 1 for January 2015).

3) Give the probability that, in December 2016, the revenue will be less than k€ 29.23.

4) Build the scatter plot (scale: 2 cm for one month), draw the regression line and finally represent the

confidence interval.


Exercise 24. Mayer + confidence interval

The given table includes eight of the major cities of a country. The variable X gives, in thousands, the number of city residents; the variable Y gives, in thousands, the number of students in this city.

city      X      Y
A       850     58
B       623     37
C       587     38
D       360     20
E       312     16
F       275     15
G       262     12
H       244     12

1) Build the scatter plot from this data series.

2) Give the coordinates of the mean point of the cloud.

3) a. Using Mayer’s method, determine manually the expression of the regression line of Y on X.

b. Draw this line. Does G belong to it?

c. State "Mayer’s principle".

4) We will now use another fitting line, whose expression is: y' = 0.07 x – 6.

a. With this line, give the 95% confidence interval of the predicted number of students in a town that has two million inhabitants.

b. What can we say about the chances that the number of students exceeds 155,000 in such a town?

Exercise 25. logarithmic fitting + confidence interval

The service life of some identical office equipment has been studied. In the following table, ti represents the duration of use - expressed in thousands of hours - and R(ti) the rate of equipment still in use at the time ti (e.g.: after 1,000 hours, ti = 1, 90 % of the equipment is still in use, R(ti) = 0.90).

ti       1      2      3      4      5      6      7      8      9
R(ti)    0.9    0.66   0.53   0.4    0.32   0.25   0.19   0.14   0.1

1) We set yi = ln[R(ti)] where ln is the natural logarithm. Fill in the following table, then build the scatter plot, using the points Mi (ti, yi), in an orthogonal coordinate system.

ti       1      2      3      4      5      6      7      8      9
yi

2) Is a linear fit relevant for the previous scatter plot?

Calculate the linear correlation coefficient between T and Y.

3) Using the least squares method, determine an expression of the regression line of Y on T.

Deduce from this expression that there are two positive real numbers k and λ such that: R(t) = k e^(-λt).

4) In this question, we'll take k = 1.174 and λ = 0.266.

a. Determine the predictable rate of equipment still in use after 10,000 hours.

b. After how long are there exactly 50 % of equipment still in use?

5) Give a 99% confidence interval of the rate of equipment still in use after 10,000 hours of service.
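The link between questions 1 to 4 is that a line y = a t + b fitted on y = ln R(t) gives R(t) = e^b × e^(a t), so that k = e^b and λ = -a. The sketch below (Python, not part of the original handout; `least_squares` and `rate_in_use` are hypothetical names) carries out this logarithmic fit and recovers values close to the k and λ quoted in question 4.

```python
import math

# Exercise 25 data: duration t (thousands of hours) and rate R(t) still in use.
t_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
rates = [0.9, 0.66, 0.53, 0.4, 0.32, 0.25, 0.19, 0.14, 0.1]

# Question 1: change of variable y = ln R(t).
y_values = [math.log(r) for r in rates]


def least_squares(xs, ys):
    """Return (a, b) of the least squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Question 3: regression line of Y on T, then back to R(t) = k * exp(-lam * t).
a, b = least_squares(t_values, y_values)
k = math.exp(b)
lam = -a


def rate_in_use(t):
    """Modelled rate of equipment still in use after t thousand hours."""
    return k * math.exp(-lam * t)


print(f"k = {k:.3f}, lambda = {lam:.3f}")
print(f"R(10) = {rate_in_use(10):.4f}")
# Question 4b: R(t) = 0.5 gives t = ln(k / 0.5) / lambda.
print(f"half of the equipment is left after {math.log(k / 0.5) / lam:.2f} thousand hours")
```

The confidence interval of question 5 is not covered by this sketch.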

Exercise 26.

100 children have been classified by age (X) and height (Y):

X \ Y      [95 ; 105[   [105 ; 125[   [125 ; 135[
[3 ; 5[        15            10             0
[5 ; 7[         8            32             5
[7 ; 9[         2            13            15

1) Enter this table in your calculator.

2) Give the means and standard deviations of X and Y, and calculate their covariance.

3) Calculate their linear correlation coefficient. Comment on this value.

4) Nevertheless, does the table allow us to see some trend?

5) Assuming that the relationship between age and height is linear until the age of 12, give the 95% confidence interval of the height of a 12-year-old child.


IUT TC MATHEMATICS FORM FOR BIVARIATE STATISTICS